linux.git
3 years agogfs2: lookup local statfs inodes prior to journal recovery
Abhi Das [Tue, 20 Oct 2020 20:58:04 +0000 (15:58 -0500)]
gfs2: lookup local statfs inodes prior to journal recovery

We need to lookup the master statfs inode and the local statfs
inodes earlier in the mount process (in init_journal) so journal
recovery can use them when it attempts to recover the statfs info.
We lookup all the local statfs inodes and store them in a linked
list to allow a node to recover statfs info for other nodes in the
cluster.

Signed-off-by: Abhi Das <adas@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
3 years agogfs2: Add fields for statfs info in struct gfs2_log_header_host
Abhi Das [Tue, 20 Oct 2020 20:58:03 +0000 (15:58 -0500)]
gfs2: Add fields for statfs info in struct gfs2_log_header_host

And read these in __get_log_header() from the log header.
Also make gfs2_statfs_change_out() non-static so it can be used
outside of super.c

Signed-off-by: Abhi Das <adas@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
3 years agogfs2: Ignore subsequent errors after withdraw in rgrp_go_sync
Andreas Gruenbacher [Tue, 20 Oct 2020 12:18:24 +0000 (14:18 +0200)]
gfs2: Ignore subsequent errors after withdraw in rgrp_go_sync

Once a withdraw has occurred, ignore errors that are the consequence of the
withdraw.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
3 years agogfs2: Eliminate gl_vm
Bob Peterson [Thu, 15 Oct 2020 16:07:26 +0000 (11:07 -0500)]
gfs2: Eliminate gl_vm

The gfs2_glock structure has a gl_vm member, introduced in commit 7005c3e4ae428
("GFS2: Use range based functions for rgrp sync/invalidation"), which stores
the location of resource groups within their address space.  This structure is
in a union with iopen glock specific fields.  It was introduced because at
unmount time, the resource group objects were destroyed before flushing out any
pending resource group glock work, and flushing out such work could require
flushing / truncating the address space.

Since commit b3422cacdd7e6 ("gfs2: Rework how rgrp buffer_heads are managed"),
any pending resource group glock work is flushed out before destroying the
resource group objects.  So the resource group objects will now always exist in
rgrp_go_sync and rgrp_go_inval, and we now simply compute the gl_vm values
where needed instead of caching them.  This also eliminates the union.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
3 years agogfs2: Only access gl_delete for iopen glocks
Bob Peterson [Thu, 15 Oct 2020 16:16:48 +0000 (11:16 -0500)]
gfs2: Only access gl_delete for iopen glocks

Only initialize gl_delete for iopen glocks, but more importantly, only access
it for iopen glocks in flush_delete_work: flush_delete_work is called for
different types of glocks including rgrp glocks, and those use gl_vm which is
in a union with gl_delete.  Without this fix, we'll end up clobbering gl_vm,
which results in general memory corruption.

Fixes: a0e3cc65fa29 ("gfs2: Turn gl_delete into a delayed work")
Cc: stable@vger.kernel.org # v5.8+
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
3 years agogfs2: Fix comments to glock_hash_walk
Bob Peterson [Mon, 12 Oct 2020 20:04:20 +0000 (15:04 -0500)]
gfs2: Fix comments to glock_hash_walk

The comments before function glock_hash_walk had the wrong name and
an extra parameter. This simply fixes the comments.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
3 years agogfs2: eliminate GLF_QUEUED flag in favor of list_empty(gl_holders)
Bob Peterson [Mon, 12 Oct 2020 18:57:37 +0000 (13:57 -0500)]
gfs2: eliminate GLF_QUEUED flag in favor of list_empty(gl_holders)

Before this patch, glock.c maintained a flag, GLF_QUEUED, which indicated
when a glock had a holder queued. It was only checked for inode glocks,
although set and cleared by all glocks, and it was only used to determine
whether the glock should be held for the minimum hold time before releasing.

The problem is that the flag is not accurate at all. If a process holds
the glock, the flag is set. When they dequeue the glock, it only cleared
the flag in cases when the state actually changed. So if the state doesn't
change, the flag may still be set, even when nothing is queued.

This happens to iopen glocks often: the get held in SH, then the file is
closed, but the glock remains in SH mode.

We don't need a special flag to indicate this: we can simply tell whether
the glock has any items queued to the holders queue. It's a waste of cpu
time to maintain it.

This patch eliminates the flag in favor of simply checking list_empty
on the glock holders.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
3 years agogfs2: Ignore journal log writes for jdata holes
Bob Peterson [Thu, 27 Aug 2020 17:18:37 +0000 (12:18 -0500)]
gfs2: Ignore journal log writes for jdata holes

When flushing out its ail1 list, gfs2_write_jdata_page calls function
__block_write_full_page passing in function gfs2_get_block_noalloc.
But there was a problem when a process wrote to a jdata file, then
truncated it or punched a hole, leaving references to the blocks within
the new hole in its ail list, which are to be written to the journal log.

In writing them to the journal, after calling gfs2_block_map, function
gfs2_get_block_noalloc determined that the (hole-punched) block was not
mapped, so it returned -EIO to generic_writepages, which passed it back
to gfs2_ail1_start_one. This, in turn, performed a withdraw, assuming
there was a real IO error writing to the journal.

This might be a valid error when writing metadata to the journal, but for
journaled data writes, it does not warrant a withdraw.

This patch adds a check to function gfs2_block_map that makes an exception
for journaled data writes that correspond to jdata holes: If the iomap
get function returns a block type of IOMAP_HOLE, it instead returns
-ENODATA which does not cause the withdraw. Other errors are returned as
before.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
3 years agogfs2: simplify gfs2_block_map
Bob Peterson [Wed, 19 Aug 2020 14:24:48 +0000 (09:24 -0500)]
gfs2: simplify gfs2_block_map

Function gfs2_block_map had a lot of redundancy between its create and
no_create paths. This patch simplifies the code to eliminate the redundancy.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
3 years agogfs2: Only set PageChecked if we have a transaction
Bob Peterson [Wed, 19 Aug 2020 13:55:18 +0000 (08:55 -0500)]
gfs2: Only set PageChecked if we have a transaction

With jdata writes, we frequently got into situations where gfs2 deadlocked
because of this calling sequence:

gfs2_ail1_start
   gfs2_ail1_flush - for every tr on the sd_ail1_list:
      gfs2_ail1_start_one - for every bd on the tr's tr_ail1_list:
         generic_writepages
    write_cache_pages passing __writepage()
       calls clear_page_dirty_for_io which calls set_page_dirty:
          which calls jdata_set_page_dirty which sets PageChecked.
       __writepage() calls
          mapping->a_ops->writepage AKA gfs2_jdata_writepage

However, gfs2_jdata_writepage checks if PageChecked is set, and if so, it
ignores the write and redirties the page. The problem is that write_cache_pages
calls clear_page_dirty_for_io, which often calls set_page_dirty(). See comments
in page-writeback.c starting with "Yes, Virginia". If it's jdata,
set_page_dirty will call jdata_set_page_dirty which will set PageChecked.
That causes a conflict because it makes it look like the page has been
redirtied by another writer, in which case we need to skip writing it and
redirty the page. That ends up in a deadlock because it isn't a "real" writer
and nothing will ever clear PageChecked.

If we do have a real writer, it will have started a transaction. So this
patch checks if a transaction is in use, and if not, it skips setting
PageChecked. That way, the page will be dirtied, cleaned, and written
appropriately.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
3 years agogfs2: don't lock sd_ail_lock in gfs2_releasepage
Bob Peterson [Tue, 18 Aug 2020 16:50:28 +0000 (11:50 -0500)]
gfs2: don't lock sd_ail_lock in gfs2_releasepage

Patch 380f7c65a7eb3288e4b6812acf3474a1de230707 changed gfs2_releasepage
so that it held the sd_ail_lock spin_lock for most of its processing.
It did this for some mysterious undocumented bug somewhere in the
evict code path. But in the nine years since, evict has been reworked
and fixed many times, and so have the transactions and ail list.
I can't see a reason to hold the sd_ail_lock unless it's protecting
the actual ail lists hung off the transactions. Therefore, this patch
removes the locking to increase speed and efficiency, and to further help
us rework the log flush code to be more concurrent with transactions.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
3 years agogfs2: make gfs2_ail1_empty_one return the count of active items
Bob Peterson [Tue, 18 Aug 2020 13:05:14 +0000 (08:05 -0500)]
gfs2: make gfs2_ail1_empty_one return the count of active items

This patch is one baby step toward simplifying the journal management.
It simply changes function gfs2_ail1_empty_one from a void to an int and
makes it return a count of active items. This allows the caller to check
the return code rather than list_empty on the tr_ail1_list. This way
we can, in a later patch, combine transaction ail1 and ail2 lists.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
3 years agogfs2: Wipe jdata and ail1 in gfs2_journal_wipe, formerly gfs2_meta_wipe
Bob Peterson [Wed, 22 Jul 2020 15:27:50 +0000 (10:27 -0500)]
gfs2: Wipe jdata and ail1 in gfs2_journal_wipe, formerly gfs2_meta_wipe

Before this patch, when blocks were freed, it called gfs2_meta_wipe to
take the metadata out of the pending journal blocks. It did this mostly
by calling another function called gfs2_remove_from_journal. This is
shortsighted because it does not do anything with jdata blocks which
may also be in the journal.

This patch expands the function so that it wipes out jdata blocks from
the journal as well, and it wipes it from the ail1 list if it hasn't
been written back yet. Since it now processes jdata blocks as well,
the function has been renamed from gfs2_meta_wipe to gfs2_journal_wipe.

New function gfs2_ail1_wipe wants a static view of the ail list, so it
locks the sd_ail_lock when removing items. To accomplish this, function
gfs2_remove_from_journal no longer locks the sd_ail_lock, and it's now
the caller's responsibility to do so.

I was going to make sd_ail_lock locking conditional, but the practice is
generally frowned upon. For details, see: https://lwn.net/Articles/109066/

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
3 years agogfs2: enhance log_blocks trace point to show log blocks free
Bob Peterson [Thu, 20 Aug 2020 13:59:28 +0000 (08:59 -0500)]
gfs2: enhance log_blocks trace point to show log blocks free

This patch adds some code to enhance the log_blocks trace point. It
reports the number of free log blocks. This makes the trace point much
more useful, especially for debugging performance problems when we can
tell when the journal gets full and needs to wait for flushes, etc.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
3 years agogfs2: add missing log_blocks trace points in gfs2_write_revokes
Bob Peterson [Thu, 20 Aug 2020 13:53:29 +0000 (08:53 -0500)]
gfs2: add missing log_blocks trace points in gfs2_write_revokes

Function gfs2_write_revokes was incrementing and decrementing the number
of log blocks free, but there was never a log_blocks trace point for it.
Thus, the free blocks from a log_blocks trace would jump around
mysteriously.

This patch adds the missing trace points so the trace makes more sense.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
3 years agogfs2: rename gfs2_write_full_page to gfs2_write_jdata_page, remove parm
Bob Peterson [Wed, 22 Jul 2020 16:51:09 +0000 (11:51 -0500)]
gfs2: rename gfs2_write_full_page to gfs2_write_jdata_page, remove parm

Since the function is only used for writing jdata pages, this patch
simply renames function gfs2_write_full_page to a more appropriate
name: gfs2_write_jdata_page. This makes the code easier to understand.

The function was only called in one place, which passed in a pointer to
function gfs2_get_block_noalloc. The function doesn't need to be
passed in. Therefore, this also eliminates the unnecessary parameter
to increase efficiency.

I also took the liberty of cleaning up the function comments.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
3 years agogfs2: add validation checks for size of superblock
Anant Thazhemadam [Wed, 14 Oct 2020 16:31:09 +0000 (22:01 +0530)]
gfs2: add validation checks for size of superblock

In gfs2_check_sb(), no validation checks are performed with regards to
the size of the superblock.
syzkaller detected a slab-out-of-bounds bug that was primarily caused
because the block size for a superblock was set to zero.
A valid size for a superblock is a power of 2 between 512 and PAGE_SIZE.
Performing validation checks and ensuring that the size of the superblock
is valid fixes this bug.

Reported-by: syzbot+af90d47a37376844e731@syzkaller.appspotmail.com
Tested-by: syzbot+af90d47a37376844e731@syzkaller.appspotmail.com
Suggested-by: Andrew Price <anprice@redhat.com>
Signed-off-by: Anant Thazhemadam <anant.thazhemadam@gmail.com>
[Minor code reordering.]
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
3 years agogfs2: use-after-free in sysfs deregistration
Jamie Iles [Mon, 12 Oct 2020 13:13:09 +0000 (14:13 +0100)]
gfs2: use-after-free in sysfs deregistration

syzkaller found the following splat with CONFIG_DEBUG_KOBJECT_RELEASE=y:

  Read of size 1 at addr ffff000028e896b8 by task kworker/1:2/228

  CPU: 1 PID: 228 Comm: kworker/1:2 Tainted: G S                5.9.0-rc8+ #101
  Hardware name: linux,dummy-virt (DT)
  Workqueue: events kobject_delayed_cleanup
  Call trace:
   dump_backtrace+0x0/0x4d8
   show_stack+0x34/0x48
   dump_stack+0x174/0x1f8
   print_address_description.constprop.0+0x5c/0x550
   kasan_report+0x13c/0x1c0
   __asan_report_load1_noabort+0x34/0x60
   memcmp+0xd0/0xd8
   gfs2_uevent+0xc4/0x188
   kobject_uevent_env+0x54c/0x1240
   kobject_uevent+0x2c/0x40
   __kobject_del+0x190/0x1d8
   kobject_delayed_cleanup+0x2bc/0x3b8
   process_one_work+0x96c/0x18c0
   worker_thread+0x3f0/0xc30
   kthread+0x390/0x498
   ret_from_fork+0x10/0x18

  Allocated by task 1110:
   kasan_save_stack+0x28/0x58
   __kasan_kmalloc.isra.0+0xc8/0xe8
   kasan_kmalloc+0x10/0x20
   kmem_cache_alloc_trace+0x1d8/0x2f0
   alloc_super+0x64/0x8c0
   sget_fc+0x110/0x620
   get_tree_bdev+0x190/0x648
   gfs2_get_tree+0x50/0x228
   vfs_get_tree+0x84/0x2e8
   path_mount+0x1134/0x1da8
   do_mount+0x124/0x138
   __arm64_sys_mount+0x164/0x238
   el0_svc_common.constprop.0+0x15c/0x598
   do_el0_svc+0x60/0x150
   el0_svc+0x34/0xb0
   el0_sync_handler+0xc8/0x5b4
   el0_sync+0x15c/0x180

  Freed by task 228:
   kasan_save_stack+0x28/0x58
   kasan_set_track+0x28/0x40
   kasan_set_free_info+0x24/0x48
   __kasan_slab_free+0x118/0x190
   kasan_slab_free+0x14/0x20
   slab_free_freelist_hook+0x6c/0x210
   kfree+0x13c/0x460

Use the same pattern as f2fs + ext4 where the kobject destruction must
complete before allowing the FS itself to be freed.  This means that we
need an explicit free_sbd in the callers.

Cc: Bob Peterson <rpeterso@redhat.com>
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Jamie Iles <jamie@nuviainc.com>
[Also go to fail_free when init_names fails.]
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
3 years agogfs2: Fix NULL pointer dereference in gfs2_rgrp_dump
Andrew Price [Wed, 7 Oct 2020 11:30:58 +0000 (12:30 +0100)]
gfs2: Fix NULL pointer dereference in gfs2_rgrp_dump

When an rindex entry is found to be corrupt, compute_bitstructs() calls
gfs2_consist_rgrpd() which calls gfs2_rgrp_dump() like this:

    gfs2_rgrp_dump(NULL, rgd->rd_gl, fs_id_buf);

gfs2_rgrp_dump then dereferences the gl without checking it and we get

    BUG: KASAN: null-ptr-deref in gfs2_rgrp_dump+0x28/0x280

because there's no rgrp glock involved while reading the rindex on mount.

Fix this by changing gfs2_rgrp_dump to take an rgrp argument.

Reported-by: syzbot+43fa87986bdd31df9de6@syzkaller.appspotmail.com
Signed-off-by: Andrew Price <anprice@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
3 years agogfs2: use iomap for buffered I/O in ordered and writeback mode
Christoph Hellwig [Mon, 1 Jul 2019 21:54:39 +0000 (23:54 +0200)]
gfs2: use iomap for buffered I/O in ordered and writeback mode

Switch to using the iomap readpage and writepage helpers for all I/O in
the ordered and writeback modes, and thus eliminate using buffer_heads
for I/O in these cases.  The journaled data mode is left untouched.

(Andreas Gruenbacher: In gfs2_unstuffer_page, switch from mark_buffer_dirty
to set_page_dirty instead of accidentally leaving the page / buffer clean.)

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
3 years agogfs2: call truncate_inode_pages_final for address space glocks
Bob Peterson [Wed, 16 Sep 2020 16:06:23 +0000 (11:06 -0500)]
gfs2: call truncate_inode_pages_final for address space glocks

Before this patch, we were not calling truncate_inode_pages_final for the
address space for glocks, which left the possibility of a leak. We now
take care of the problem instead of complaining, and we do it during
glock tear-down..

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
3 years agogfs2: simplify the logic in gfs2_evict_inode
Bob Peterson [Wed, 16 Sep 2020 13:50:44 +0000 (08:50 -0500)]
gfs2: simplify the logic in gfs2_evict_inode

Now that we've factored out the deleted and undeleted dinode cases
in gfs2_evict_inode, we can greatly simplify the logic. Now the
function is easy to read and understand.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
3 years agogfs2: factor evict_linked_inode out of gfs2_evict_inode
Bob Peterson [Fri, 11 Sep 2020 19:53:52 +0000 (14:53 -0500)]
gfs2: factor evict_linked_inode out of gfs2_evict_inode

Now that we've factored out the delete-dinode case to simplify
gfs2_evict_inode, we take it a step further and factor out the other
case: where we don't delete the inode.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
3 years agogfs2: further simplify gfs2_evict_inode with new func evict_should_delete
Bob Peterson [Fri, 11 Sep 2020 15:30:26 +0000 (10:30 -0500)]
gfs2: further simplify gfs2_evict_inode with new func evict_should_delete

This patch further simplifies function gfs2_evict_inode() by adding a
new function evict_should_delete. The function may also lock the inode
glock.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
3 years agogfs2: factor evict_unlinked_inode out of gfs2_evict_inode
Bob Peterson [Fri, 11 Sep 2020 14:29:25 +0000 (09:29 -0500)]
gfs2: factor evict_unlinked_inode out of gfs2_evict_inode

Function gfs2_evict_inode is way too big, complex and unreadable. This
is a baby step toward breaking it apart to be more readable. It factors
out the portion that deletes the online bits for a dinode that is
unlinked and needs to be deleted. A future patch will factor out more.
(If I factor out too much, the patch itself becomes unreadable).

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
3 years agogfs2: rename variable error to ret in gfs2_evict_inode
Bob Peterson [Fri, 11 Sep 2020 15:56:31 +0000 (10:56 -0500)]
gfs2: rename variable error to ret in gfs2_evict_inode

Function gfs2_evict_inode is too big and unreadable. This patch is just
a baby step toward improving that. This first step just renames variable
error to ret. This will help make future patches more readable.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
3 years agogfs2: convert to use DEFINE_SEQ_ATTRIBUTE macro
Liu Shixin [Wed, 16 Sep 2020 02:50:21 +0000 (10:50 +0800)]
gfs2: convert to use DEFINE_SEQ_ATTRIBUTE macro

Use DEFINE_SEQ_ATTRIBUTE macro to simplify the code.

Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
3 years agogfs2: Fix bad comment for trans_drain
Bob Peterson [Mon, 31 Aug 2020 13:12:15 +0000 (08:12 -0500)]
gfs2: Fix bad comment for trans_drain

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
3 years agogfs2: Make sure we don't miss any delayed withdraws
Andreas Gruenbacher [Fri, 28 Aug 2020 21:44:36 +0000 (23:44 +0200)]
gfs2: Make sure we don't miss any delayed withdraws

Commit ca399c96e96e changes gfs2_log_flush to not withdraw the
filesystem while holding the log flush lock, but it fails to check if
the filesystem needs to be withdrawn once the log flush lock has been
released.  Likewise, commit f05b86db314d depends on gfs2_log_flush to
trigger for delayed withdraws.  Add that and clean up the code flow
somewhat.

In gfs2_put_super, add a check for delayed withdraws that have been
missed to prevent these kinds of bugs in the future.

Fixes: ca399c96e96e ("gfs2: flesh out delayed withdraw for gfs2_log_flush")
Fixes: f05b86db314d ("gfs2: Prepare to withdraw as soon as an IO error occurs in log write")
Cc: stable@vger.kernel.org # v5.7+: 462582b99b607: gfs2: add some much needed cleanup for log flushes that fail
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
3 years agoLinux 5.9 v5.9
Linus Torvalds [Sun, 11 Oct 2020 21:15:50 +0000 (14:15 -0700)]
Linux 5.9

3 years agoMerge branch 'akpm' (patches from Andrew)
Linus Torvalds [Sun, 11 Oct 2020 18:18:04 +0000 (11:18 -0700)]
Merge branch 'akpm' (patches from Andrew)

Merge misc fixes from Andrew Morton:
 "Five fixes.

  Subsystems affected by this patch series: MAINTAINERS, mm/pagemap,
  mm/swap, and mm/hugetlb"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mm: khugepaged: recalculate min_free_kbytes after memory hotplug as expected by khugepaged
  mm: validate inode in mapping_set_error()
  mm: mmap: Fix general protection fault in unlink_file_vma()
  MAINTAINERS: Antoine Tenart's email address
  MAINTAINERS: change hardening mailing list

3 years agoMerge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Linus Torvalds [Sun, 11 Oct 2020 18:11:35 +0000 (11:11 -0700)]
Merge branch 'fixes' of git://git./linux/kernel/git/viro/vfs

Pull vfs fix from Al Viro:
 "Fixes an obvious bug (memory leak introduced in 5.8)"

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  pipe: Fix memory leaks in create_pipe_files()

3 years agoMerge tag 'x86-urgent-2020-10-11' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 11 Oct 2020 17:53:37 +0000 (10:53 -0700)]
Merge tag 'x86-urgent-2020-10-11' of git://git./linux/kernel/git/tip/tip

Pull x86 fixes from Ingo Molnar:
 "Two fixes:

   - Fix a (hopefully final) IRQ state tracking bug vs MCE handling

   - Fix a documentation link"

* tag 'x86-urgent-2020-10-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  Documentation/x86: Fix incorrect references to zero-page.txt
  x86/mce: Use idtentry_nmi_enter/exit()

3 years agoMerge tag 'perf-urgent-2020-10-11' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 11 Oct 2020 17:43:37 +0000 (10:43 -0700)]
Merge tag 'perf-urgent-2020-10-11' of git://git./linux/kernel/git/tip/tip

Pull perf fix from Ingo Molnar:
 "Fix an error handling bug that can cause a lockup if a CPU is offline
  (doh ...)"

* tag 'perf-urgent-2020-10-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf: Fix task_function_call() error handling

3 years agomm: khugepaged: recalculate min_free_kbytes after memory hotplug as expected by khuge...
Vijay Balakrishna [Sun, 11 Oct 2020 06:16:40 +0000 (23:16 -0700)]
mm: khugepaged: recalculate min_free_kbytes after memory hotplug as expected by khugepaged

When memory is hotplug added or removed the min_free_kbytes should be
recalculated based on what is expected by khugepaged.  Currently after
hotplug, min_free_kbytes will be set to a lower default and higher
default set when THP enabled is lost.

This change restores min_free_kbytes as expected for THP consumers.

[vijayb@linux.microsoft.com: v5]
Link: https://lkml.kernel.org/r/1601398153-5517-1-git-send-email-vijayb@linux.microsoft.com
Fixes: f000565adb77 ("thp: set recommended min free kbytes")
Signed-off-by: Vijay Balakrishna <vijayb@linux.microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Allen Pais <apais@microsoft.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/1600305709-2319-2-git-send-email-vijayb@linux.microsoft.com
Link: https://lkml.kernel.org/r/1600204258-13683-1-git-send-email-vijayb@linux.microsoft.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agomm: validate inode in mapping_set_error()
Minchan Kim [Sun, 11 Oct 2020 06:16:37 +0000 (23:16 -0700)]
mm: validate inode in mapping_set_error()

The swap address_space doesn't have host. Thus, it makes kernel crash once
swap write meets error. Fix it.

Fixes: 735e4ae5ba28 ("vfs: track per-sb writeback errors and report them to syncfs")
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Jeff Layton <jlayton@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Andres Freund <andres@anarazel.de>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: David Howells <dhowells@redhat.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20201010000650.750063-1-minchan@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agomm: mmap: Fix general protection fault in unlink_file_vma()
Miaohe Lin [Sun, 11 Oct 2020 06:16:34 +0000 (23:16 -0700)]
mm: mmap: Fix general protection fault in unlink_file_vma()

The syzbot reported the below general protection fault:

  general protection fault, probably for non-canonical address
  0xe00eeaee0000003b: 0000 [#1] PREEMPT SMP KASAN
  KASAN: maybe wild-memory-access in range [0x00777770000001d8-0x00777770000001df]
  CPU: 1 PID: 10488 Comm: syz-executor721 Not tainted 5.9.0-rc3-syzkaller #0
  RIP: 0010:unlink_file_vma+0x57/0xb0 mm/mmap.c:164
  Call Trace:
     free_pgtables+0x1b3/0x2f0 mm/memory.c:415
     exit_mmap+0x2c0/0x530 mm/mmap.c:3184
     __mmput+0x122/0x470 kernel/fork.c:1076
     mmput+0x53/0x60 kernel/fork.c:1097
     exit_mm kernel/exit.c:483 [inline]
     do_exit+0xa8b/0x29f0 kernel/exit.c:793
     do_group_exit+0x125/0x310 kernel/exit.c:903
     get_signal+0x428/0x1f00 kernel/signal.c:2757
     arch_do_signal+0x82/0x2520 arch/x86/kernel/signal.c:811
     exit_to_user_mode_loop kernel/entry/common.c:136 [inline]
     exit_to_user_mode_prepare+0x1ae/0x200 kernel/entry/common.c:167
     syscall_exit_to_user_mode+0x7e/0x2e0 kernel/entry/common.c:242
     entry_SYSCALL_64_after_hwframe+0x44/0xa9

It's because the ->mmap() callback can change vma->vm_file and fput the
original file.  But the commit d70cec898324 ("mm: mmap: merge vma after
call_mmap() if possible") failed to catch this case and always fput()
the original file, hence add an extra fput().

[ Thanks Hillf for pointing this extra fput() out. ]

Fixes: d70cec898324 ("mm: mmap: merge vma after call_mmap() if possible")
Reported-by: syzbot+c5d5a51dcbb558ca0cb5@syzkaller.appspotmail.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Christian König <ckoenig.leichtzumerken@gmail.com>
Cc: Hongxiang Lou <louhongxiang@huawei.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Link: https://lkml.kernel.org/r/20200916090733.31427-1-linmiaohe@huawei.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agoMAINTAINERS: Antoine Tenart's email address
Antoine Tenart [Sun, 11 Oct 2020 06:16:30 +0000 (23:16 -0700)]
MAINTAINERS: Antoine Tenart's email address

Use my kernel.org address instead of my bootlin.com one.

Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Link: https://lkml.kernel.org/r/20201005164533.16811-1-atenart@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agoMAINTAINERS: change hardening mailing list
Kees Cook [Sun, 11 Oct 2020 06:16:27 +0000 (23:16 -0700)]
MAINTAINERS: change hardening mailing list

As more email from git history gets aimed at the OpenWall
kernel-hardening@ list, there has been a desire to separate "new topics"
from "on-going" work.

To handle this, the superset of hardening email topics are now to be
directed to linux-hardening@vger.kernel.org.

Update the MAINTAINERS file and the .mailmap to accomplish this, so that
linux-hardening@ can be treated like any other regular upstream kernel
development list.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Emese Revfy <re.emese@gmail.com>
Cc: "Tobin C. Harding" <me@tobin.cc>
Cc: Tycho Andersen <tycho@tycho.pizza>
Cc: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/linux-hardening/202010051443.279CC265D@keescook/
Link: https://lkml.kernel.org/r/20201006000012.2768958-1-keescook@chromium.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agoMerge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa...
Linus Torvalds [Sat, 10 Oct 2020 23:09:12 +0000 (16:09 -0700)]
Merge branch 'i2c/for-current' of git://git./linux/kernel/git/wsa/linux

Pull i2c fixes from Wolfram Sang:
 "Some more driver bugfixes for I2C. Including a revert - the updated
  series for it will come during the next merge window"

* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: owl: Clear NACK and BUS error bits
  Revert "i2c: imx: Fix reset of I2SR_IAL flag"
  i2c: meson: fixup rate calculation with filter delay
  i2c: meson: keep peripheral clock enabled
  i2c: meson: fix clock setting overwrite
  i2c: imx: Fix reset of I2SR_IAL flag

3 years agocifs: Fix incomplete memory allocation on setxattr path
Vladimir Zapolskiy [Sat, 10 Oct 2020 18:25:54 +0000 (21:25 +0300)]
cifs: Fix incomplete memory allocation on setxattr path

On setxattr() syscall path due to an apprent typo the size of a dynamically
allocated memory chunk for storing struct smb2_file_full_ea_info object is
computed incorrectly, to be more precise the first addend is the size of
a pointer instead of the wanted object size. Coincidentally it makes no
difference on 64-bit platforms, however on 32-bit targets the following
memcpy() writes 4 bytes of data outside of the dynamically allocated memory.

  =============================================================================
  BUG kmalloc-16 (Not tainted): Redzone overwritten
  -----------------------------------------------------------------------------

  Disabling lock debugging due to kernel taint
  INFO: 0x79e69a6f-0x9e5cdecf @offset=368. First byte 0x73 instead of 0xcc
  INFO: Slab 0xd36d2454 objects=85 used=51 fp=0xf7d0fc7a flags=0x35000201
  INFO: Object 0x6f171df3 @offset=352 fp=0x00000000

  Redzone 5d4ff02d: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc  ................
  Object 6f171df3: 00 00 00 00 00 05 06 00 73 6e 72 75 62 00 66 69  ........snrub.fi
  Redzone 79e69a6f: 73 68 32 0a                                      sh2.
  Padding 56254d82: 5a 5a 5a 5a 5a 5a 5a 5a                          ZZZZZZZZ
  CPU: 0 PID: 8196 Comm: attr Tainted: G    B             5.9.0-rc8+ #3
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
  Call Trace:
   dump_stack+0x54/0x6e
   print_trailer+0x12c/0x134
   check_bytes_and_report.cold+0x3e/0x69
   check_object+0x18c/0x250
   free_debug_processing+0xfe/0x230
   __slab_free+0x1c0/0x300
   kfree+0x1d3/0x220
   smb2_set_ea+0x27d/0x540
   cifs_xattr_set+0x57f/0x620
   __vfs_setxattr+0x4e/0x60
   __vfs_setxattr_noperm+0x4e/0x100
   __vfs_setxattr_locked+0xae/0xd0
   vfs_setxattr+0x4e/0xe0
   setxattr+0x12c/0x1a0
   path_setxattr+0xa4/0xc0
   __ia32_sys_lsetxattr+0x1d/0x20
   __do_fast_syscall_32+0x40/0x70
   do_fast_syscall_32+0x29/0x60
   do_SYSENTER_32+0x15/0x20
   entry_SYSENTER_32+0x9f/0xf2

Fixes: 5517554e4313 ("cifs: Add support for writing attributes on SMB2+")
Signed-off-by: Vladimir Zapolskiy <vladimir@tuxera.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agomm/khugepaged: fix filemap page_to_pgoff(page) != offset
Hugh Dickins [Sat, 10 Oct 2020 03:07:59 +0000 (20:07 -0700)]
mm/khugepaged: fix filemap page_to_pgoff(page) != offset

There have been elusive reports of filemap_fault() hitting its
VM_BUG_ON_PAGE(page_to_pgoff(page) != offset, page) on kernels built
with CONFIG_READ_ONLY_THP_FOR_FS=y.

Suren has hit it on a kernel with CONFIG_READ_ONLY_THP_FOR_FS=y and
CONFIG_NUMA is not set: and he has analyzed it down to how khugepaged
without NUMA reuses the same huge page after collapse_file() failed
(whereas NUMA targets its allocation to the respective node each time).
And most of us were usually testing with CONFIG_NUMA=y kernels.

collapse_file(old start)
  new_page = khugepaged_alloc_page(hpage)
  __SetPageLocked(new_page)
  new_page->index = start // hpage->index=old offset
  new_page->mapping = mapping
  xas_store(&xas, new_page)

                          filemap_fault
                            page = find_get_page(mapping, offset)
                            // if offset falls inside hpage then
                            // compound_head(page) == hpage
                            lock_page_maybe_drop_mmap()
                              __lock_page(page)

  // collapse fails
  xas_store(&xas, old page)
  new_page->mapping = NULL
  unlock_page(new_page)

collapse_file(new start)
  new_page = khugepaged_alloc_page(hpage)
  __SetPageLocked(new_page)
  new_page->index = start // hpage->index=new offset
  new_page->mapping = mapping // mapping becomes valid again

                            // since compound_head(page) == hpage
                            // page_to_pgoff(page) got changed
                            VM_BUG_ON_PAGE(page_to_pgoff(page) != offset)

An initial patch replaced __SetPageLocked() by lock_page(), which did
fix the race which Suren illustrates above.  But testing showed that it's
not good enough: if the racing task's __lock_page() gets delayed long
after its find_get_page(), then it may follow collapse_file(new start)'s
successful final unlock_page(), and crash on the same VM_BUG_ON_PAGE.

It could be fixed by relaxing filemap_fault()'s VM_BUG_ON_PAGE to a
check and retry (as is done for mapping), with similar relaxations in
find_lock_entry() and pagecache_get_page(): but it's not obvious what
else might get caught out; and khugepaged non-NUMA appears to be unique
in exposing a page to page cache, then revoking, without going through
a full cycle of freeing before reuse.

Instead, non-NUMA khugepaged_prealloc_page() release the old page
if anyone else has a reference to it (1% of cases when I tested).

Although never reported on huge tmpfs, I believe its find_lock_entry()
has been at similar risk; but huge tmpfs does not rely on khugepaged
for its normal working nearly so much as READ_ONLY_THP_FOR_FS does.

Reported-by: Denis Lisov <dennis.lissov@gmail.com>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=206569
Link: https://lore.kernel.org/linux-mm/?q=20200219144635.3b7417145de19b65f258c943%40linux-foundation.org
Reported-by: Qian Cai <cai@lca.pw>
Link: https://lore.kernel.org/linux-xfs/?q=20200616013309.GB815%40lca.pw
Reported-and-analyzed-by: Suren Baghdasaryan <surenb@google.com>
Fixes: 87c460a0bded ("mm/khugepaged: collapse_shmem() without freezing new_page")
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: stable@vger.kernel.org # v4.9+
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agoi2c: owl: Clear NACK and BUS error bits
Cristian Ciocaltea [Thu, 8 Oct 2020 21:44:39 +0000 (00:44 +0300)]
i2c: owl: Clear NACK and BUS error bits

When the NACK and BUS error bits are set by the hardware, the driver is
responsible for clearing them by writing "1" into the corresponding
status registers.

Hence perform the necessary operations in owl_i2c_interrupt().

Fixes: d211e62af466 ("i2c: Add Actions Semiconductor Owl family S900 I2C driver")
Reported-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@gmail.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
3 years agoRevert "i2c: imx: Fix reset of I2SR_IAL flag"
Wolfram Sang [Sat, 10 Oct 2020 11:03:54 +0000 (13:03 +0200)]
Revert "i2c: imx: Fix reset of I2SR_IAL flag"

This reverts commit fa4d30556883f2eaab425b88ba9904865a4d00f3. An updated
version was sent. So, revert this version and give the new version more
time for testing.

Signed-off-by: Wolfram Sang <wsa@kernel.org>
3 years agoMerge tag 'spi-fix-v5.9-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Linus Torvalds [Sat, 10 Oct 2020 01:05:12 +0000 (18:05 -0700)]
Merge tag 'spi-fix-v5.9-rc8' of git://git./linux/kernel/git/broonie/spi

Pull spi fix from Mark Brown:
 "One last minute fix for v5.9 which has been causing crashes in test
  systems with the fsl-dspi driver when they hit deferred probe (and
  which I probably let cook in next a bit longer than is ideal).

  And an update to MAINTAINERS reflecting Serge's extensive and
  detailed recent work on the DesignWare driver"

* tag 'spi-fix-v5.9-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  MAINTAINERS: Add maintainer of DW APB SSI driver
  spi: fsl-dspi: fix NULL pointer dereference

3 years agoMerge tag 'riscv-for-linus-5.9' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 9 Oct 2020 18:49:22 +0000 (11:49 -0700)]
Merge tag 'riscv-for-linus-5.9' of git://git./linux/kernel/git/riscv/linux

Pull RISC-V fixes from Palmer Dabbelt:
 "Two fixes this week:

   - A fix to actually reserve the device tree's memory. Without this
     the device tree can be overwritten on systems that don't otherwise
     reserve it. This issue should only manifest on !MMU systems.

   - A workaround for a BUG() that triggers when the memory that
     originally contained initdata is freed and later repurposed. This
     triggers a BUG() on builds that had HARDENED_USERCOPY enabled"

* tag 'riscv-for-linus-5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  riscv: Fixup bootup failure with HARDENED_USERCOPY
  RISC-V: Make sure memblock reserves the memory containing DT

3 years agoMerge tag 'for-v5.9-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux...
Linus Torvalds [Fri, 9 Oct 2020 18:38:07 +0000 (11:38 -0700)]
Merge tag 'for-v5.9-rc' of git://git./linux/kernel/git/sre/linux-power-supply

Pull power supply fix from Sebastian Reichel:
 "Just a single change to revert enablement of packet error checking for
  battery data on Chromebooks, since some of their embedded controllers
  do not handle it correctly"

* tag 'for-v5.9-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply:
  power: supply: sbs-battery: chromebook workaround for PEC

3 years agoMerge tag 'gpio-v5.9-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux...
Linus Torvalds [Fri, 9 Oct 2020 18:33:48 +0000 (11:33 -0700)]
Merge tag 'gpio-v5.9-3' of git://git./linux/kernel/git/linusw/linux-gpio

Pull GPIO fixes from Linus Walleij:
 "Some late fixes: one IRQ issue and one compilation issue for UML.

   - Fix a compilation issue with User Mode Linux

   - Handle spurious interrupts properly in the PCA953x driver"

* tag 'gpio-v5.9-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
  gpio: pca953x: Survive spurious interrupts
  gpiolib: Disable compat ->read() code in UML case

3 years agoMerge tag 'mmc-v5.9-rc4-4' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Linus Torvalds [Fri, 9 Oct 2020 17:10:52 +0000 (10:10 -0700)]
Merge tag 'mmc-v5.9-rc4-4' of git://git./linux/kernel/git/ulfh/mmc

Pull MMC fix from Ulf Hansson:
 "Assign a proper discard granularity rather than incorrectly set it to
  zero"

* tag 'mmc-v5.9-rc4-4' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
  mmc: core: don't set limits.discard_granularity as 0

3 years agoMerge tag 'drm-fixes-2020-10-09' of git://anongit.freedesktop.org/drm/drm
Linus Torvalds [Fri, 9 Oct 2020 16:59:36 +0000 (09:59 -0700)]
Merge tag 'drm-fixes-2020-10-09' of git://anongit.freedesktop.org/drm/drm

Pull amdgpu drm fixes from Dave Airlie:
 "Fixes trickling in this week.

  Alex had a final fix for the newest GPU they introduced in rc1, along
  with one build regression and one crasher fix.

  Cross my fingers that's it for 5.9:

   - Fix a crash on renoir if you override the IP discovery parameter

   - Fix the build on ARC platforms

   - Display fix for Sienna Cichlid"

* tag 'drm-fixes-2020-10-09' of git://anongit.freedesktop.org/drm/drm:
  drm/amd/display: Change ABM config init interface
  drm/amdgpu/swsmu: fix ARC build errors
  drm/amdgpu: fix NULL pointer dereference for Renoir

3 years agommc: core: don't set limits.discard_granularity as 0
Coly Li [Fri, 2 Oct 2020 01:38:52 +0000 (09:38 +0800)]
mmc: core: don't set limits.discard_granularity as 0

In mmc_queue_setup_discard() the mmc driver queue's discard_granularity
might be set as 0 (when card->pref_erase > max_discard) while the mmc
device still declares to support discard operation. This is buggy and
triggered the following kernel warning message,

WARNING: CPU: 0 PID: 135 at __blkdev_issue_discard+0x200/0x294
CPU: 0 PID: 135 Comm: f2fs_discard-17 Not tainted 5.9.0-rc6 #1
Hardware name: Google Kevin (DT)
pstate: 00000005 (nzcv daif -PAN -UAO BTYPE=--)
pc : __blkdev_issue_discard+0x200/0x294
lr : __blkdev_issue_discard+0x54/0x294
sp : ffff800011dd3b10
x29: ffff800011dd3b10 x28: 0000000000000000 x27: ffff800011dd3cc4 x26: ffff800011dd3e18 x25: 000000000004e69b x24: 0000000000000c40 x23: ffff0000f1deaaf0 x22: ffff0000f2849200 x21: 00000000002734d8 x20: 0000000000000008 x19: 0000000000000000 x18: 0000000000000000 x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000 x14: 0000000000000394 x13: 0000000000000000 x12: 0000000000000000 x11: 0000000000000000 x10: 00000000000008b0 x9 : ffff800011dd3cb0 x8 : 000000000004e69b x7 : 0000000000000000 x6 : ffff0000f1926400 x5 : ffff0000f1940800 x4 : 0000000000000000 x3 : 0000000000000c40 x2 : 0000000000000008 x1 : 00000000002734d8 x0 : 0000000000000000 Call trace:
__blkdev_issue_discard+0x200/0x294
__submit_discard_cmd+0x128/0x374
__issue_discard_cmd_orderly+0x188/0x244
__issue_discard_cmd+0x2e8/0x33c
issue_discard_thread+0xe8/0x2f0
kthread+0x11c/0x120
ret_from_fork+0x10/0x1c
---[ end trace e4c8023d33dfe77a ]---

This patch fixes the issue by setting discard_granularity as SECTOR_SIZE
instead of 0 when (card->pref_erase > max_discard) is true. Now no more
complain from __blkdev_issue_discard() for the improper value of discard
granularity.

This issue is exposed after commit b35fd7422c2f ("block: check queue's
limits.discard_granularity in __blkdev_issue_discard()"), a "Fixes:" tag
is also added for the commit to make sure people won't miss this patch
after applying the change of __blkdev_issue_discard().

Fixes: e056a1b5b67b ("mmc: queue: let host controllers specify maximum discard timeout")
Fixes: b35fd7422c2f ("block: check queue's limits.discard_granularity in __blkdev_issue_discard()").
Reported-and-tested-by: Vicente Bergas <vicencb@gmail.com>
Signed-off-by: Coly Li <colyli@suse.de>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://lore.kernel.org/r/20201002013852.51968-1-colyli@suse.de
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
3 years agoperf: Fix task_function_call() error handling
Kajol Jain [Thu, 27 Aug 2020 06:47:32 +0000 (12:17 +0530)]
perf: Fix task_function_call() error handling

The error handling introduced by commit:

  2ed6edd33a21 ("perf: Add cond_resched() to task_function_call()")

looses any return value from smp_call_function_single() that is not
{0, -EINVAL}. This is a problem because it will return -EXNIO when the
target CPU is offline. Worse, in that case it'll turn into an infinite
loop.

Fixes: 2ed6edd33a21 ("perf: Add cond_resched() to task_function_call()")
Reported-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Barret Rhoden <brho@google.com>
Tested-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Link: https://lkml.kernel.org/r/20200827064732.20860-1-kjain@linux.ibm.com
3 years agoMerge tag 'amd-drm-fixes-5.9-2020-10-08' of git://people.freedesktop.org/~agd5f/linux...
Dave Airlie [Fri, 9 Oct 2020 03:02:49 +0000 (13:02 +1000)]
Merge tag 'amd-drm-fixes-5.9-2020-10-08' of git://people.freedesktop.org/~agd5f/linux into drm-fixes

amd-drm-fixes-5.9-2020-10-08:

amdgpu:
- Fix a crash on renoir if you override the IP discovery parameter
- Fix the build on ARC platforms
- Display fix for Sienna Cichlid

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201009024917.3984-1-alexander.deucher@amd.com
3 years agoMerge tag 'block5.9-2020-10-08' of git://git.kernel.dk/linux-block
Linus Torvalds [Fri, 9 Oct 2020 01:48:34 +0000 (18:48 -0700)]
Merge tag 'block5.9-2020-10-08' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:
 "A few fixes that should go into this release:

   - NVMe controller error path reference fix (Chaitanya)

   - Fix regression with IBM partitions on non-dasd devices (Christoph)

   - Fix a missing clear in the compat CDROM packet structure (Peilin)"

* tag 'block5.9-2020-10-08' of git://git.kernel.dk/linux-block:
  partitions/ibm: fix non-DASD devices
  nvme-core: put ctrl ref when module ref get fail
  block/scsi-ioctl: Fix kernel-infoleak in scsi_put_cdrom_generic_arg()

3 years agopower: supply: sbs-battery: chromebook workaround for PEC
Sebastian Reichel [Sun, 4 Oct 2020 22:46:01 +0000 (00:46 +0200)]
power: supply: sbs-battery: chromebook workaround for PEC

Looks like the I2C tunnel implementation from Chromebook's
embedded controller does not handle PEC correctly. Fix this
by disabling PEC for batteries behind those I2C tunnels as
a workaround.

Note, that some Chromebooks actually have been reported to
have working PEC support (with I2C tunnel). Since the problem
has not yet been fully understood this simply reverts all
Chromebooks to not use PEC for now.

Reported-by: "Milan P. Stanić" <mps@arvanta.net>
Reported-by: Vicente Bergas <vicencb@gmail.com>
CC: Enric Balletbo i Serra <enric.balletbo@collabora.com>
Fixes: 7222bd603dd2 ("power: supply: sbs-battery: add PEC support")
Tested-by: Vicente Bergas <vicencb@gmail.com>
Tested-by: "Milan P. Stanić" <mps@arvanta.net>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
3 years agoMerge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Linus Torvalds [Thu, 8 Oct 2020 21:25:46 +0000 (14:25 -0700)]
Merge tag 'for_linus' of git://git./linux/kernel/git/mst/vhost

Pull vhost fixes from Michael Tsirkin:
 "Some last minute vhost,vdpa fixes.

  The last two of them haven't been in next but they do seem kind of
  obvious, very small and safe, fix bugs reported in the field, and they
  are both in a new mlx5 vdpa driver, so it's not like we can introduce
  regressions"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  vdpa/mlx5: Fix dependency on MLX5_CORE
  vdpa/mlx5: should keep avail_index despite device status
  vhost-vdpa: fix page pinning leakage in error path
  vhost-vdpa: fix vhost_vdpa_map() on error condition
  vhost: Don't call log_access_ok() when using IOTLB
  vhost: Use vhost_get_used_size() in vhost_vring_set_addr()
  vhost: Don't call access_ok() when using IOTLB
  vhost vdpa: fix vhost_vdpa_open error handling

3 years agodrm/amd/display: Change ABM config init interface
Yongqiang Sun [Fri, 31 Jul 2020 17:57:05 +0000 (13:57 -0400)]
drm/amd/display: Change ABM config init interface

[Why & How]
change abm config init interface to support multiple ABMs.

Signed-off-by: Yongqiang Sun <yongqiang.sun@amd.com>
Reviewed-by: Chris Park <Chris.Park@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Thu, 8 Oct 2020 21:11:21 +0000 (14:11 -0700)]
Merge git://git./linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "One more set of fixes from the networking tree:

   - add missing input validation in nl80211_del_key(), preventing
     out-of-bounds access

   - last minute fix / improvement of a MRP netlink (uAPI) interface
     introduced in 5.9 (current) release

   - fix "unresolved symbol" build error under CONFIG_NET w/o
     CONFIG_INET due to missing tcp_timewait_sock and inet_timewait_sock
     BTF.

   - fix 32 bit sub-register bounds tracking in the bpf verifier for OR
     case

   - tcp: fix receive window update in tcp_add_backlog()

   - openvswitch: handle DNAT tuple collision in conntrack-related code

   - r8169: wait for potential PHY reset to finish after applying a FW
     file, avoiding unexpected PHY behaviour and failures later on

   - mscc: fix tail dropping watermarks for Ocelot switches

   - avoid use-after-free in macsec code after a call to the GRO layer

   - avoid use-after-free in sctp error paths

   - add a device id for Cellient MPL200 WWAN card

   - rxrpc fixes:
      - fix the xdr encoding of the contents read from an rxrpc key
      - fix a BUG() for a unsupported encoding type.
      - fix missing _bh lock annotations.
      - fix acceptance handling for an incoming call where the incoming
        call is encrypted.
      - the server token keyring isn't network namespaced - it belongs
        to the server, so there's no need. Namespacing it means that
        request_key() fails to find it.
      - fix a leak of the server keyring"

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (21 commits)
  net: usb: qmi_wwan: add Cellient MPL200 card
  macsec: avoid use-after-free in macsec_handle_frame()
  r8169: consider that PHY reset may still be in progress after applying firmware
  openvswitch: handle DNAT tuple collision
  sctp: fix sctp_auth_init_hmacs() error path
  bridge: Netlink interface fix.
  net: wireless: nl80211: fix out-of-bounds access in nl80211_del_key()
  bpf: Fix scalar32_min_max_or bounds tracking
  tcp: fix receive window update in tcp_add_backlog()
  net: usb: rtl8150: set random MAC address when set_ethernet_addr() fails
  mptcp: more DATA FIN fixes
  net: mscc: ocelot: warn when encoding an out-of-bounds watermark value
  net: mscc: ocelot: divide watermark value by 60 when writing to SYS_ATOP
  net: qrtr: ns: Fix the incorrect usage of rcu_read_lock()
  rxrpc: Fix server keyring leak
  rxrpc: The server keyring isn't network-namespaced
  rxrpc: Fix accept on a connection that need securing
  rxrpc: Fix some missing _bh annotations on locking conn->state_lock
  rxrpc: Downgrade the BUG() for unsupported token type in rxrpc_read()
  rxrpc: Fix rxkad token xdr encoding
  ...

3 years agovdpa/mlx5: Fix dependency on MLX5_CORE
Eli Cohen [Wed, 7 Oct 2020 06:40:11 +0000 (09:40 +0300)]
vdpa/mlx5: Fix dependency on MLX5_CORE

Remove propmt for selecting MLX5_VDPA by the user and modify
MLX5_VDPA_NET to select MLX5_VDPA. Also modify MLX5_VDPA_NET to depend
on mlx5_core.

This fixes an issue where configuration sets 'y' for MLX5_VDPA_NET while
MLX5_CORE is compiled as a module causing link errors.

Reported-by: kernel test robot <lkp@intel.com>
Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 device")s
Signed-off-by: Eli Cohen <elic@nvidia.com>
Link: https://lore.kernel.org/r/20201007064011.GA50074@mtl-vdi-166.wap.labs.mlnx
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
3 years agovdpa/mlx5: should keep avail_index despite device status
Si-Wei Liu [Thu, 1 Oct 2020 20:18:31 +0000 (16:18 -0400)]
vdpa/mlx5: should keep avail_index despite device status

A VM with mlx5 vDPA has below warnings while being reset:

vhost VQ 0 ring restore failed: -1: Resource temporarily unavailable (11)
vhost VQ 1 ring restore failed: -1: Resource temporarily unavailable (11)

We should allow userspace emulating the virtio device be
able to get to vq's avail_index, regardless of vDPA device
status. Save the index that was last seen when virtq was
stopped, so that userspace doesn't complain.

Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
Link: https://lore.kernel.org/r/1601583511-15138-1-git-send-email-si-wei.liu@oracle.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Eli Cohen <elic@nvidia.com>
3 years agonet: usb: qmi_wwan: add Cellient MPL200 card
Wilken Gottwalt [Thu, 8 Oct 2020 07:21:38 +0000 (09:21 +0200)]
net: usb: qmi_wwan: add Cellient MPL200 card

Add usb ids of the Cellient MPL200 card.

Signed-off-by: Wilken Gottwalt <wilken.gottwalt@mailbox.org>
Acked-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agomacsec: avoid use-after-free in macsec_handle_frame()
Eric Dumazet [Wed, 7 Oct 2020 08:42:46 +0000 (01:42 -0700)]
macsec: avoid use-after-free in macsec_handle_frame()

De-referencing skb after call to gro_cells_receive() is not allowed.
We need to fetch skb->len earlier.

Fixes: 5491e7c6b1a9 ("macsec: enable GRO and RPS on macsec devices")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agor8169: consider that PHY reset may still be in progress after applying firmware
Heiner Kallweit [Wed, 7 Oct 2020 11:34:51 +0000 (13:34 +0200)]
r8169: consider that PHY reset may still be in progress after applying firmware

Some firmware files trigger a PHY soft reset and don't wait for it to
be finished. PHY register writes directly after applying the firmware
may fail or provide unexpected results therefore. Fix this by waiting
for bit BMCR_RESET to be cleared after applying firmware.

There's nothing wrong with the referenced change, it's just that the
fix will apply cleanly only after this change.

Fixes: 89fbd26cca7e ("r8169: fix firmware not resetting tp->ocp_base")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoopenvswitch: handle DNAT tuple collision
Dumitru Ceara [Wed, 7 Oct 2020 15:48:03 +0000 (17:48 +0200)]
openvswitch: handle DNAT tuple collision

With multiple DNAT rules it's possible that after destination
translation the resulting tuples collide.

For example, two openvswitch flows:
nw_dst=10.0.0.10,tp_dst=10, actions=ct(commit,table=2,nat(dst=20.0.0.1:20))
nw_dst=10.0.0.20,tp_dst=10, actions=ct(commit,table=2,nat(dst=20.0.0.1:20))

Assuming two TCP clients initiating the following connections:
10.0.0.10:5000->10.0.0.10:10
10.0.0.10:5000->10.0.0.20:10

Both tuples would translate to 10.0.0.10:5000->20.0.0.1:20 causing
nf_conntrack_confirm() to fail because of tuple collision.

Netfilter handles this case by allocating a null binding for SNAT at
egress by default.  Perform the same operation in openvswitch for DNAT
if no explicit SNAT is requested by the user and allocate a null binding
for SNAT for packets in the "original" direction.

Reported-at: https://bugzilla.redhat.com/1877128
Suggested-by: Florian Westphal <fw@strlen.de>
Fixes: 05752523e565 ("openvswitch: Interface with NAT.")
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agosctp: fix sctp_auth_init_hmacs() error path
Eric Dumazet [Thu, 8 Oct 2020 08:38:31 +0000 (01:38 -0700)]
sctp: fix sctp_auth_init_hmacs() error path

After freeing ep->auth_hmacs we have to clear the pointer
or risk use-after-free as reported by syzbot:

BUG: KASAN: use-after-free in sctp_auth_destroy_hmacs net/sctp/auth.c:509 [inline]
BUG: KASAN: use-after-free in sctp_auth_destroy_hmacs net/sctp/auth.c:501 [inline]
BUG: KASAN: use-after-free in sctp_auth_free+0x17e/0x1d0 net/sctp/auth.c:1070
Read of size 8 at addr ffff8880a8ff52c0 by task syz-executor941/6874

CPU: 0 PID: 6874 Comm: syz-executor941 Not tainted 5.9.0-rc8-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x198/0x1fd lib/dump_stack.c:118
 print_address_description.constprop.0.cold+0xae/0x497 mm/kasan/report.c:383
 __kasan_report mm/kasan/report.c:513 [inline]
 kasan_report.cold+0x1f/0x37 mm/kasan/report.c:530
 sctp_auth_destroy_hmacs net/sctp/auth.c:509 [inline]
 sctp_auth_destroy_hmacs net/sctp/auth.c:501 [inline]
 sctp_auth_free+0x17e/0x1d0 net/sctp/auth.c:1070
 sctp_endpoint_destroy+0x95/0x240 net/sctp/endpointola.c:203
 sctp_endpoint_put net/sctp/endpointola.c:236 [inline]
 sctp_endpoint_free+0xd6/0x110 net/sctp/endpointola.c:183
 sctp_destroy_sock+0x9c/0x3c0 net/sctp/socket.c:4981
 sctp_v6_destroy_sock+0x11/0x20 net/sctp/socket.c:9415
 sk_common_release+0x64/0x390 net/core/sock.c:3254
 sctp_close+0x4ce/0x8b0 net/sctp/socket.c:1533
 inet_release+0x12e/0x280 net/ipv4/af_inet.c:431
 inet6_release+0x4c/0x70 net/ipv6/af_inet6.c:475
 __sock_release+0xcd/0x280 net/socket.c:596
 sock_close+0x18/0x20 net/socket.c:1277
 __fput+0x285/0x920 fs/file_table.c:281
 task_work_run+0xdd/0x190 kernel/task_work.c:141
 exit_task_work include/linux/task_work.h:25 [inline]
 do_exit+0xb7d/0x29f0 kernel/exit.c:806
 do_group_exit+0x125/0x310 kernel/exit.c:903
 __do_sys_exit_group kernel/exit.c:914 [inline]
 __se_sys_exit_group kernel/exit.c:912 [inline]
 __x64_sys_exit_group+0x3a/0x50 kernel/exit.c:912
 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x43f278
Code: Bad RIP value.
RSP: 002b:00007fffe0995c38 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 000000000043f278
RDX: 0000000000000000 RSI: 000000000000003c RDI: 0000000000000000
RBP: 00000000004bf068 R08: 00000000000000e7 R09: ffffffffffffffd0
R10: 0000000020000000 R11: 0000000000000246 R12: 0000000000000001
R13: 00000000006d1180 R14: 0000000000000000 R15: 0000000000000000

Allocated by task 6874:
 kasan_save_stack+0x1b/0x40 mm/kasan/common.c:48
 kasan_set_track mm/kasan/common.c:56 [inline]
 __kasan_kmalloc.constprop.0+0xbf/0xd0 mm/kasan/common.c:461
 kmem_cache_alloc_trace+0x174/0x300 mm/slab.c:3554
 kmalloc include/linux/slab.h:554 [inline]
 kmalloc_array include/linux/slab.h:593 [inline]
 kcalloc include/linux/slab.h:605 [inline]
 sctp_auth_init_hmacs+0xdb/0x3b0 net/sctp/auth.c:464
 sctp_auth_init+0x8a/0x4a0 net/sctp/auth.c:1049
 sctp_setsockopt_auth_supported net/sctp/socket.c:4354 [inline]
 sctp_setsockopt+0x477e/0x97f0 net/sctp/socket.c:4631
 __sys_setsockopt+0x2db/0x610 net/socket.c:2132
 __do_sys_setsockopt net/socket.c:2143 [inline]
 __se_sys_setsockopt net/socket.c:2140 [inline]
 __x64_sys_setsockopt+0xba/0x150 net/socket.c:2140
 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Freed by task 6874:
 kasan_save_stack+0x1b/0x40 mm/kasan/common.c:48
 kasan_set_track+0x1c/0x30 mm/kasan/common.c:56
 kasan_set_free_info+0x1b/0x30 mm/kasan/generic.c:355
 __kasan_slab_free+0xd8/0x120 mm/kasan/common.c:422
 __cache_free mm/slab.c:3422 [inline]
 kfree+0x10e/0x2b0 mm/slab.c:3760
 sctp_auth_destroy_hmacs net/sctp/auth.c:511 [inline]
 sctp_auth_destroy_hmacs net/sctp/auth.c:501 [inline]
 sctp_auth_init_hmacs net/sctp/auth.c:496 [inline]
 sctp_auth_init_hmacs+0x2b7/0x3b0 net/sctp/auth.c:454
 sctp_auth_init+0x8a/0x4a0 net/sctp/auth.c:1049
 sctp_setsockopt_auth_supported net/sctp/socket.c:4354 [inline]
 sctp_setsockopt+0x477e/0x97f0 net/sctp/socket.c:4631
 __sys_setsockopt+0x2db/0x610 net/socket.c:2132
 __do_sys_setsockopt net/socket.c:2143 [inline]
 __se_sys_setsockopt net/socket.c:2140 [inline]
 __x64_sys_setsockopt+0xba/0x150 net/socket.c:2140
 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: 1f485649f529 ("[SCTP]: Implement SCTP-AUTH internals")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoMerge tag 'mac80211-for-net-2020-10-08' of git://git.kernel.org/pub/scm/linux/kernel...
Jakub Kicinski [Thu, 8 Oct 2020 19:18:34 +0000 (12:18 -0700)]
Merge tag 'mac80211-for-net-2020-10-08' of git://git./linux/kernel/git/jberg/mac80211

Johannes Berg says:

====================
pull-request: mac80211 2020-10-08

A single fix for missing input validation in nl80211.
====================

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Jakub Kicinski [Thu, 8 Oct 2020 19:05:37 +0000 (12:05 -0700)]
Merge git://git./pub/scm/linux/kernel/git/bpf/bpf

Daniel Borkmann says:

====================
pull-request: bpf 2020-10-08

The main changes are:

1) Fix "unresolved symbol" build error under CONFIG_NET w/o CONFIG_INET due
   to missing tcp_timewait_sock and inet_timewait_sock BTF, from Yonghong Song.

2) Fix 32 bit sub-register bounds tracking for OR case, from Daniel Borkmann.
====================

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agobridge: Netlink interface fix.
Henrik Bjoernlund [Wed, 7 Oct 2020 12:07:00 +0000 (12:07 +0000)]
bridge: Netlink interface fix.

This commit is correcting NETLINK br_fill_ifinfo() to be able to
handle 'filter_mask' with multiple flags asserted.

Fixes: 36a8e8e265420 ("bridge: Extend br_fill_ifinfo to return MPR status")

Signed-off-by: Henrik Bjoernlund <henrik.bjoernlund@microchip.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Suggested-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Tested-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoMerge tag 'drm-fixes-2020-10-08' of git://anongit.freedesktop.org/drm/drm
Linus Torvalds [Thu, 8 Oct 2020 18:14:17 +0000 (11:14 -0700)]
Merge tag 'drm-fixes-2020-10-08' of git://anongit.freedesktop.org/drm/drm

Pull drm nouveau fixes from Dave Airlie:
 "Karol found two last minute nouveau fixes, they both fix crashes, the
  TTM one follows what other drivers do already, and the other is for
  bailing on load on unrecognised chipsets.

   - fix crash in TTM alloc fail path

   - return error earlier for unknown chipsets"

* tag 'drm-fixes-2020-10-08' of git://anongit.freedesktop.org/drm/drm:
  drm/nouveau/mem: guard against NULL pointer access in mem_del
  drm/nouveau/device: return error for unknown chipsets

3 years agoMerge tag 'exfat-for-5.9-rc9' of git://git.kernel.org/pub/scm/linux/kernel/git/linkin...
Linus Torvalds [Thu, 8 Oct 2020 18:10:13 +0000 (11:10 -0700)]
Merge tag 'exfat-for-5.9-rc9' of git://git./linux/kernel/git/linkinjeon/exfat

Pull exfat fixes from Namjae Jeon:

 - Fix use of uninitialized spinlock on error path

 - Fix missing err assignment in exfat_build_inode()

* tag 'exfat-for-5.9-rc9' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat:
  exfat: fix use of uninitialized spinlock on error path
  exfat: fix pointer error checking

3 years agoMerge tag 'for-linus-5.9b-rc9-tag' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Thu, 8 Oct 2020 18:01:53 +0000 (11:01 -0700)]
Merge tag 'for-linus-5.9b-rc9-tag' of git://git./linux/kernel/git/xen/tip

Pull xen fix from Juergen Gross:
 "One fix for a regression when booting as a Xen guest on ARM64
  introduced probably during the 5.9 cycle. It is very low risk as it is
  modifying Xen specific code only.

  The exact commit introducing the bug hasn't been identified yet, but
  everything was fine in 5.8 and only in 5.9 some configurations started
  to fail"

* tag 'for-linus-5.9b-rc9-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  arm/arm64: xen: Fix to convert percpu address to gfn correctly

3 years agoafs: Fix deadlock between writeback and truncate
David Howells [Wed, 7 Oct 2020 13:22:12 +0000 (14:22 +0100)]
afs: Fix deadlock between writeback and truncate

The afs filesystem has a lock[*] that it uses to serialise I/O operations
going to the server (vnode->io_lock), as the server will only perform one
modification operation at a time on any given file or directory.  This
prevents the the filesystem from filling up all the call slots to a server
with calls that aren't going to be executed in parallel anyway, thereby
allowing operations on other files to obtain slots.

  [*] Note that is probably redundant for directories at least since
      i_rwsem is used to serialise directory modifications and
      lookup/reading vs modification.  The server does allow parallel
      non-modification ops, however.

When a file truncation op completes, we truncate the in-memory copy of the
file to match - but we do it whilst still holding the io_lock, the idea
being to prevent races with other operations.

However, if writeback starts in a worker thread simultaneously with
truncation (whilst notify_change() is called with i_rwsem locked, writeback
pays it no heed), it may manage to set PG_writeback bits on the pages that
will get truncated before afs_setattr_success() manages to call
truncate_pagecache().  Truncate will then wait for those pages - whilst
still inside io_lock:

    # cat /proc/8837/stack
    [<0>] wait_on_page_bit_common+0x184/0x1e7
    [<0>] truncate_inode_pages_range+0x37f/0x3eb
    [<0>] truncate_pagecache+0x3c/0x53
    [<0>] afs_setattr_success+0x4d/0x6e
    [<0>] afs_wait_for_operation+0xd8/0x169
    [<0>] afs_do_sync_operation+0x16/0x1f
    [<0>] afs_setattr+0x1fb/0x25d
    [<0>] notify_change+0x2cf/0x3c4
    [<0>] do_truncate+0x7f/0xb2
    [<0>] do_sys_ftruncate+0xd1/0x104
    [<0>] do_syscall_64+0x2d/0x3a
    [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

The writeback operation, however, stalls indefinitely because it needs to
get the io_lock to proceed:

    # cat /proc/5940/stack
    [<0>] afs_get_io_locks+0x58/0x1ae
    [<0>] afs_begin_vnode_operation+0xc7/0xd1
    [<0>] afs_store_data+0x1b2/0x2a3
    [<0>] afs_write_back_from_locked_page+0x418/0x57c
    [<0>] afs_writepages_region+0x196/0x224
    [<0>] afs_writepages+0x74/0x156
    [<0>] do_writepages+0x2d/0x56
    [<0>] __writeback_single_inode+0x84/0x207
    [<0>] writeback_sb_inodes+0x238/0x3cf
    [<0>] __writeback_inodes_wb+0x68/0x9f
    [<0>] wb_writeback+0x145/0x26c
    [<0>] wb_do_writeback+0x16a/0x194
    [<0>] wb_workfn+0x74/0x177
    [<0>] process_one_work+0x174/0x264
    [<0>] worker_thread+0x117/0x1b9
    [<0>] kthread+0xec/0xf1
    [<0>] ret_from_fork+0x1f/0x30

and thus deadlock has occurred.

Note that whilst afs_setattr() calls filemap_write_and_wait(), the fact
that the caller is holding i_rwsem doesn't preclude more pages being
dirtied through an mmap'd region.

Fix this by:

 (1) Use the vnode validate_lock to mediate access between afs_setattr()
     and afs_writepages():

     (a) Exclusively lock validate_lock in afs_setattr() around the whole
       RPC operation.

     (b) If WB_SYNC_ALL isn't set on entry to afs_writepages(), trying to
       shared-lock validate_lock and returning immediately if we couldn't
       get it.

     (c) If WB_SYNC_ALL is set, wait for the lock.

     The validate_lock is also used to validate a file and to zap its cache
     if the file was altered by a third party, so it's probably a good fit
     for this.

 (2) Move the truncation outside of the io_lock in setattr, using the same
     hook as is used for local directory editing.

     This requires the old i_size to be retained in the operation record as
     we commit the revised status to the inode members inside the io_lock
     still, but we still need to know if we reduced the file size.

Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agomm: avoid early COW write protect games during fork()
Linus Torvalds [Mon, 28 Sep 2020 19:50:03 +0000 (12:50 -0700)]
mm: avoid early COW write protect games during fork()

In commit 70e806e4e645 ("mm: Do early cow for pinned pages during fork()
for ptes") we write-protected the PTE before doing the page pinning
check, in order to avoid a race with concurrent fast-GUP pinning (which
doesn't take the mm semaphore or the page table lock).

That trick doesn't actually work - it doesn't handle memory ordering
properly, and doing so would be prohibitively expensive.

It also isn't really needed.  While we're moving in the direction of
allowing and supporting page pinning without marking the pinned area
with MADV_DONTFORK, the fact is that we've never really supported this
kind of odd "concurrent fork() and page pinning", and doing the
serialization on a pte level is just wrong.

We can add serialization with a per-mm sequence counter, so we know how
to solve that race properly, but we'll do that at a more appropriate
time.  Right now this just removes the write protect games.

It also turns out that the write protect games actually break on Power,
as reported by Aneesh Kumar:

 "Architecture like ppc64 expects set_pte_at to be not used for updating
  a valid pte. This is further explained in commit 56eecdb912b5 ("mm:
  Use ptep/pmdp_set_numa() for updating _PAGE_NUMA bit")"

and the code triggered a warning there:

  WARNING: CPU: 0 PID: 30613 at arch/powerpc/mm/pgtable.c:185 set_pte_at+0x2a8/0x3a0 arch/powerpc/mm/pgtable.c:185
  Call Trace:
    copy_present_page mm/memory.c:857 [inline]
    copy_present_pte mm/memory.c:899 [inline]
    copy_pte_range mm/memory.c:1014 [inline]
    copy_pmd_range mm/memory.c:1092 [inline]
    copy_pud_range mm/memory.c:1127 [inline]
    copy_p4d_range mm/memory.c:1150 [inline]
    copy_page_range+0x1f6c/0x2cc0 mm/memory.c:1212
    dup_mmap kernel/fork.c:592 [inline]
    dup_mm+0x77c/0xab0 kernel/fork.c:1355
    copy_mm kernel/fork.c:1411 [inline]
    copy_process+0x1f00/0x2740 kernel/fork.c:2070
    _do_fork+0xc4/0x10b0 kernel/fork.c:2429

Link: https://lore.kernel.org/lkml/CAHk-=wiWr+gO0Ro4LvnJBMs90OiePNyrE3E+pJvc9PzdBShdmw@mail.gmail.com/
Link: https://lore.kernel.org/linuxppc-dev/20201008092541.398079-1-aneesh.kumar@linux.ibm.com/
Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Tested-by: Leon Romanovsky <leonro@nvidia.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Kirill Shutemov <kirill@shutemov.name>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agonet: wireless: nl80211: fix out-of-bounds access in nl80211_del_key()
Anant Thazhemadam [Wed, 7 Oct 2020 03:54:01 +0000 (09:24 +0530)]
net: wireless: nl80211: fix out-of-bounds access in nl80211_del_key()

In nl80211_parse_key(), key.idx is first initialized as -1.
If this value of key.idx remains unmodified and gets returned, and
nl80211_key_allowed() also returns 0, then rdev_del_key() gets called
with key.idx = -1.
This causes an out-of-bounds array access.

Handle this issue by checking if the value of key.idx after
nl80211_parse_key() is called and return -EINVAL if key.idx < 0.

Cc: stable@vger.kernel.org
Reported-by: syzbot+b1bb342d1d097516cbda@syzkaller.appspotmail.com
Tested-by: syzbot+b1bb342d1d097516cbda@syzkaller.appspotmail.com
Signed-off-by: Anant Thazhemadam <anant.thazhemadam@gmail.com>
Link: https://lore.kernel.org/r/20201007035401.9522-1-anant.thazhemadam@gmail.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
3 years agoi2c: meson: fixup rate calculation with filter delay
Nicolas Belin [Wed, 7 Oct 2020 08:07:51 +0000 (10:07 +0200)]
i2c: meson: fixup rate calculation with filter delay

Apparently, 15 cycles of the peripheral clock are used by the controller
for sampling and filtering. Because this was not known before, the rate
calculation is slightly off.

Clean up and fix the calculation taking this filtering delay into account.

Fixes: 30021e3707a7 ("i2c: add support for Amlogic Meson I2C controller")
Signed-off-by: Nicolas Belin <nbelin@baylibre.com>
Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
3 years agoi2c: meson: keep peripheral clock enabled
Jerome Brunet [Wed, 7 Oct 2020 08:07:50 +0000 (10:07 +0200)]
i2c: meson: keep peripheral clock enabled

SCL rate appears to be different than what is expected. For example,
We get 164kHz on i2c3 of the vim3 when 400kHz is expected. This is
partially due to the peripheral clock being disabled when the clock is
set.

Let's keep the peripheral clock on after probe to fix the problem. This
does not affect the SCL output which is still gated when i2c is idle.

Fixes: 09af1c2fa490 ("i2c: meson: set clock divider in probe instead of setting it for each transfer")
Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
3 years agoi2c: meson: fix clock setting overwrite
Jerome Brunet [Wed, 7 Oct 2020 08:07:49 +0000 (10:07 +0200)]
i2c: meson: fix clock setting overwrite

When the slave address is written in do_start(), SLAVE_ADDR is written
completely. This may overwrite some setting related to the clock rate
or signal filtering.

Fix this by writing only the bits related to slave address. To avoid
causing unexpected changed, explicitly disable filtering or high/low
clock mode which may have been left over by the bootloader.

Fixes: 30021e3707a7 ("i2c: add support for Amlogic Meson I2C controller")
Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
3 years agoi2c: imx: Fix reset of I2SR_IAL flag
Christian Eggers [Wed, 7 Oct 2020 08:45:22 +0000 (10:45 +0200)]
i2c: imx: Fix reset of I2SR_IAL flag

According to the "VFxxx Controller Reference Manual" (and the comment
block starting at line 97), Vybrid requires writing a one for clearing
an interrupt flag. Syncing the method for clearing I2SR_IIF in
i2c_imx_isr().

Signed-off-by: Christian Eggers <ceggers@arri.de>
Fixes: 4b775022f6fd ("i2c: imx: add struct to hold more configurable quirks")
Reviewed-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: Wolfram Sang <wsa@kernel.org>
3 years agobpf: Fix scalar32_min_max_or bounds tracking
Daniel Borkmann [Wed, 7 Oct 2020 13:48:58 +0000 (15:48 +0200)]
bpf: Fix scalar32_min_max_or bounds tracking

Simon reported an issue with the current scalar32_min_max_or() implementation.
That is, compared to the other 32 bit subreg tracking functions, the code in
scalar32_min_max_or() stands out that it's using the 64 bit registers instead
of 32 bit ones. This leads to bounds tracking issues, for example:

  [...]
  8: R0=map_value(id=0,off=0,ks=4,vs=48,imm=0) R10=fp0 fp-8=mmmmmmmm
  8: (79) r1 = *(u64 *)(r0 +0)
   R0=map_value(id=0,off=0,ks=4,vs=48,imm=0) R10=fp0 fp-8=mmmmmmmm
  9: R0=map_value(id=0,off=0,ks=4,vs=48,imm=0) R1_w=inv(id=0) R10=fp0 fp-8=mmmmmmmm
  9: (b7) r0 = 1
  10: R0_w=inv1 R1_w=inv(id=0) R10=fp0 fp-8=mmmmmmmm
  10: (18) r2 = 0x600000002
  12: R0_w=inv1 R1_w=inv(id=0) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
  12: (ad) if r1 < r2 goto pc+1
   R0_w=inv1 R1_w=inv(id=0,umin_value=25769803778) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
  13: R0_w=inv1 R1_w=inv(id=0,umin_value=25769803778) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
  13: (95) exit
  14: R0_w=inv1 R1_w=inv(id=0,umax_value=25769803777,var_off=(0x0; 0x7ffffffff)) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
  14: (25) if r1 > 0x0 goto pc+1
   R0_w=inv1 R1_w=inv(id=0,umax_value=0,var_off=(0x0; 0x7fffffff),u32_max_value=2147483647) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
  15: R0_w=inv1 R1_w=inv(id=0,umax_value=0,var_off=(0x0; 0x7fffffff),u32_max_value=2147483647) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
  15: (95) exit
  16: R0_w=inv1 R1_w=inv(id=0,umin_value=1,umax_value=25769803777,var_off=(0x0; 0x77fffffff),u32_max_value=2147483647) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
  16: (47) r1 |= 0
  17: R0_w=inv1 R1_w=inv(id=0,umin_value=1,umax_value=32212254719,var_off=(0x1; 0x700000000),s32_max_value=1,u32_max_value=1) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
  [...]

The bound tests on the map value force the upper unsigned bound to be 25769803777
in 64 bit (0b11000000000000000000000000000000001) and then lower one to be 1. By
using OR they are truncated and thus result in the range [1,1] for the 32 bit reg
tracker. This is incorrect given the only thing we know is that the value must be
positive and thus 2147483647 (0b1111111111111111111111111111111) at max for the
subregs. Fix it by using the {u,s}32_{min,max}_value vars instead. This also makes
sense, for example, for the case where we update dst_reg->s32_{min,max}_value in
the else branch we need to use the newly computed dst_reg->u32_{min,max}_value as
we know that these are positive. Previously, in the else branch the 64 bit values
of umin_value=1 and umax_value=32212254719 were used and latter got truncated to
be 1 as upper bound there. After the fix the subreg range is now correct:

  [...]
  8: R0=map_value(id=0,off=0,ks=4,vs=48,imm=0) R10=fp0 fp-8=mmmmmmmm
  8: (79) r1 = *(u64 *)(r0 +0)
   R0=map_value(id=0,off=0,ks=4,vs=48,imm=0) R10=fp0 fp-8=mmmmmmmm
  9: R0=map_value(id=0,off=0,ks=4,vs=48,imm=0) R1_w=inv(id=0) R10=fp0 fp-8=mmmmmmmm
  9: (b7) r0 = 1
  10: R0_w=inv1 R1_w=inv(id=0) R10=fp0 fp-8=mmmmmmmm
  10: (18) r2 = 0x600000002
  12: R0_w=inv1 R1_w=inv(id=0) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
  12: (ad) if r1 < r2 goto pc+1
   R0_w=inv1 R1_w=inv(id=0,umin_value=25769803778) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
  13: R0_w=inv1 R1_w=inv(id=0,umin_value=25769803778) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
  13: (95) exit
  14: R0_w=inv1 R1_w=inv(id=0,umax_value=25769803777,var_off=(0x0; 0x7ffffffff)) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
  14: (25) if r1 > 0x0 goto pc+1
   R0_w=inv1 R1_w=inv(id=0,umax_value=0,var_off=(0x0; 0x7fffffff),u32_max_value=2147483647) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
  15: R0_w=inv1 R1_w=inv(id=0,umax_value=0,var_off=(0x0; 0x7fffffff),u32_max_value=2147483647) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
  15: (95) exit
  16: R0_w=inv1 R1_w=inv(id=0,umin_value=1,umax_value=25769803777,var_off=(0x0; 0x77fffffff),u32_max_value=2147483647) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
  16: (47) r1 |= 0
  17: R0_w=inv1 R1_w=inv(id=0,umin_value=1,umax_value=32212254719,var_off=(0x0; 0x77fffffff),u32_max_value=2147483647) R2_w=inv25769803778 R10=fp0 fp-8=mmmmmmmm
  [...]

Fixes: 3f50f132d840 ("bpf: Verifier, do explicit ALU32 bounds tracking")
Reported-by: Simon Scannell <scannell.smn@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
3 years agodrm/amdgpu/swsmu: fix ARC build errors
Alex Deucher [Tue, 6 Oct 2020 13:20:47 +0000 (09:20 -0400)]
drm/amdgpu/swsmu: fix ARC build errors

We want to use the dev_* functions here rather than the pr_* variants.
Switch to using dev_warn() which mirrors what we do on other asics.

Fixes the following build errors on ARC:

../drivers/gpu/drm/amd/amdgpu/../powerplay/navi10_ppt.c: In function 'navi10_fill_i2c_req':
../arch/arc/include/asm/bug.h:24:2: error: implicit declaration of function 'pr_warn'; did you mean 'drm_warn'? [-Werror=implicit-function-declaration]

../drivers/gpu/drm/amd/amdgpu/../powerplay/sienna_cichlid_ppt.c: In function 'sienna_cichlid_fill_i2c_req':
../arch/arc/include/asm/bug.h:24:2: error: implicit declaration of function 'pr_warn'; did you mean 'drm_warn'? [-Werror=implicit-function-declaration]

Reported-by: kernel test robot <lkp@intel.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Evan Quan <evan.quan@amd.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: linux-snps-arc@lists.infradead.org
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu: fix NULL pointer dereference for Renoir
Dirk Gouders [Thu, 1 Oct 2020 19:55:25 +0000 (21:55 +0200)]
drm/amdgpu: fix NULL pointer dereference for Renoir

Commit c1cf79ca5ced46 ("drm/amdgpu: use IP discovery table for renoir")
introduced a NULL pointer dereference when booting with
amdgpu.discovery=0, because it removed the call of vega10_reg_base_init()
for that case.

Fix this by calling that funcion if amdgpu_discovery == 0 in addition to
the case that amdgpu_discovery_reg_base_init() failed.

Fixes: c1cf79ca5ced46 ("drm/amdgpu: use IP discovery table for renoir")
Signed-off-by: Dirk Gouders <dirk@gouders.net>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Cc: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agoMerge tag 'nvme-5.9-2020-10-07' of git://git.infradead.org/nvme into block-5.9
Jens Axboe [Wed, 7 Oct 2020 14:24:09 +0000 (08:24 -0600)]
Merge tag 'nvme-5.9-2020-10-07' of git://git.infradead.org/nvme into block-5.9

Pull NVMe fix from Christoph:

"nvme fix for 5.9:

 - fix a recently introduced controller leak (Logan Gunthorpe)"

* tag 'nvme-5.9-2020-10-07' of git://git.infradead.org/nvme:
  nvme-core: put ctrl ref when module ref get fail

3 years agopartitions/ibm: fix non-DASD devices
Christoph Hellwig [Wed, 7 Oct 2020 12:40:09 +0000 (14:40 +0200)]
partitions/ibm: fix non-DASD devices

Don't error out if the dasd_biodasdinfo symbol is not available.

Cc: stable@vger.kernel.org
Fixes: 26d7e28e3820 ("s390/dasd: remove ioctl_by_bdev calls")
Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Stefan Haberland <sth@linux.ibm.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agogpio: pca953x: Survive spurious interrupts
Marc Zyngier [Mon, 5 Oct 2020 14:02:17 +0000 (15:02 +0100)]
gpio: pca953x: Survive spurious interrupts

The pca953x driver never checks the result of irq_find_mapping(),
which returns 0 when no mapping is found. When a spurious interrupt
is delivered (which can happen under obscure circumstances), the
kernel explodes as it still tries to handle the error code as
a real interrupt.

Handle this particular case and warn on spurious interrupts.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201005140217.1390851-1-maz@kernel.org
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
3 years agogpiolib: Disable compat ->read() code in UML case
Andy Shevchenko [Mon, 5 Oct 2020 13:10:44 +0000 (16:10 +0300)]
gpiolib: Disable compat ->read() code in UML case

It appears that UML (arch/um) has no compat.h header defined and hence
can't compile a recently provided piece of code in GPIO library.

Disable compat ->read() code in UML case to avoid compilation errors.

While at it, use pattern which is already being used in the kernel elsewhere.

Fixes: 5ad284ab3a01 ("gpiolib: Fix line event handling in syscall compatible mode")
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20201005131044.87276-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
3 years agonvme-core: put ctrl ref when module ref get fail
Chaitanya Kulkarni [Tue, 6 Oct 2020 23:36:47 +0000 (16:36 -0700)]
nvme-core: put ctrl ref when module ref get fail

When try_module_get() fails in the nvme_dev_open() it returns without
releasing the ctrl reference which was taken earlier.

Put the ctrl reference which is taken before calling the
try_module_get() in the error return code path.

Fixes: 52a3974feb1a "nvme-core: get/put ctrl and transport module in nvme_dev_open/release()"
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agodrm/nouveau/mem: guard against NULL pointer access in mem_del
Karol Herbst [Tue, 6 Oct 2020 22:05:28 +0000 (00:05 +0200)]
drm/nouveau/mem: guard against NULL pointer access in mem_del

other drivers seems to do something similar

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Cc: dri-devel <dri-devel@lists.freedesktop.org>
Cc: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201006220528.13925-2-kherbst@redhat.com
3 years agodrm/nouveau/device: return error for unknown chipsets
Karol Herbst [Tue, 6 Oct 2020 22:05:27 +0000 (00:05 +0200)]
drm/nouveau/device: return error for unknown chipsets

Previously the code relied on device->pri to be NULL and to fail probing
later. We really should just return an error inside nvkm_device_ctor for
unsupported GPUs.

Fixes: 24d5ff40a732 ("drm/nouveau/device: rework mmio mapping code to get rid of second map")

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Cc: dann frazier <dann.frazier@canonical.com>
Cc: dri-devel <dri-devel@lists.freedesktop.org>
Cc: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Reviewed-by: Jeremy Cline <jcline@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201006220528.13925-1-kherbst@redhat.com
3 years agoexfat: fix use of uninitialized spinlock on error path
Namjae Jeon [Tue, 29 Sep 2020 00:09:49 +0000 (09:09 +0900)]
exfat: fix use of uninitialized spinlock on error path

syzbot reported warning message:

Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x1d6/0x29e lib/dump_stack.c:118
 register_lock_class+0xf06/0x1520 kernel/locking/lockdep.c:893
 __lock_acquire+0xfd/0x2ae0 kernel/locking/lockdep.c:4320
 lock_acquire+0x148/0x720 kernel/locking/lockdep.c:5029
 __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
 _raw_spin_lock+0x2a/0x40 kernel/locking/spinlock.c:151
 spin_lock include/linux/spinlock.h:354 [inline]
 exfat_cache_inval_inode+0x30/0x280 fs/exfat/cache.c:226
 exfat_evict_inode+0x124/0x270 fs/exfat/inode.c:660
 evict+0x2bb/0x6d0 fs/inode.c:576
 exfat_fill_super+0x1e07/0x27d0 fs/exfat/super.c:681
 get_tree_bdev+0x3e9/0x5f0 fs/super.c:1342
 vfs_get_tree+0x88/0x270 fs/super.c:1547
 do_new_mount fs/namespace.c:2875 [inline]
 path_mount+0x179d/0x29e0 fs/namespace.c:3192
 do_mount fs/namespace.c:3205 [inline]
 __do_sys_mount fs/namespace.c:3413 [inline]
 __se_sys_mount+0x126/0x180 fs/namespace.c:3390
 do_syscall_64+0x31/0x70 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

If exfat_read_root() returns an error, spinlock is used in
exfat_evict_inode() without initialization. This patch combines
exfat_cache_init_inode() with exfat_inode_init_once() to initialize
spinlock by slab constructor.

Fixes: c35b6810c495 ("exfat: add exfat cache")
Cc: stable@vger.kernel.org # v5.7+
Reported-by: syzbot <syzbot+b91107320911a26c9a95@syzkaller.appspotmail.com>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
3 years agoexfat: fix pointer error checking
Tetsuhiro Kohada [Wed, 26 Aug 2020 01:18:29 +0000 (10:18 +0900)]
exfat: fix pointer error checking

Fix missing result check of exfat_build_inode().
And use PTR_ERR_OR_ZERO instead of PTR_ERR.

Signed-off-by: Tetsuhiro Kohada <kohada.t2@gmail.com>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
3 years agoarm/arm64: xen: Fix to convert percpu address to gfn correctly
Masami Hiramatsu [Tue, 6 Oct 2020 06:49:31 +0000 (15:49 +0900)]
arm/arm64: xen: Fix to convert percpu address to gfn correctly

Use per_cpu_ptr_to_phys() instead of virt_to_phys() for per-cpu
address conversion.

In xen_starting_cpu(), per-cpu xen_vcpu_info address is converted
to gfn by virt_to_gfn() macro. However, since the virt_to_gfn(v)
assumes the given virtual address is in linear mapped kernel memory
area, it can not convert the per-cpu memory if it is allocated on
vmalloc area.

This depends on CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK.
If it is enabled, the first chunk of percpu memory is linear mapped.
In the other case, that is allocated from vmalloc area. Moreover,
if the first chunk of percpu has run out until allocating
xen_vcpu_info, it will be allocated on the 2nd chunk, which is
based on kernel memory or vmalloc memory (depends on
CONFIG_NEED_PER_CPU_KM).

Without this fix and kernel configured to use vmalloc area for
the percpu memory, the Dom0 kernel will fail to boot with following
errors.

[    0.466172] Xen: initializing cpu0
[    0.469601] ------------[ cut here ]------------
[    0.474295] WARNING: CPU: 0 PID: 1 at arch/arm64/xen/../../arm/xen/enlighten.c:153 xen_starting_cpu+0x160/0x180
[    0.484435] Modules linked in:
[    0.487565] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.9.0-rc4+ #4
[    0.493895] Hardware name: Socionext Developer Box (DT)
[    0.499194] pstate: 00000005 (nzcv daif -PAN -UAO BTYPE=--)
[    0.504836] pc : xen_starting_cpu+0x160/0x180
[    0.509263] lr : xen_starting_cpu+0xb0/0x180
[    0.513599] sp : ffff8000116cbb60
[    0.516984] x29: ffff8000116cbb60 x28: ffff80000abec000
[    0.522366] x27: 0000000000000000 x26: 0000000000000000
[    0.527754] x25: ffff80001156c000 x24: fffffdffbfcdb600
[    0.533129] x23: 0000000000000000 x22: 0000000000000000
[    0.538511] x21: ffff8000113a99c8 x20: ffff800010fe4f68
[    0.543892] x19: ffff8000113a9988 x18: 0000000000000010
[    0.549274] x17: 0000000094fe0f81 x16: 00000000deadbeef
[    0.554655] x15: ffffffffffffffff x14: 0720072007200720
[    0.560037] x13: 0720072007200720 x12: 0720072007200720
[    0.565418] x11: 0720072007200720 x10: 0720072007200720
[    0.570801] x9 : ffff8000100fbdc0 x8 : ffff800010715208
[    0.576182] x7 : 0000000000000054 x6 : ffff00001b790f00
[    0.581564] x5 : ffff800010bbf880 x4 : 0000000000000000
[    0.586945] x3 : 0000000000000000 x2 : ffff80000abec000
[    0.592327] x1 : 000000000000002f x0 : 0000800000000000
[    0.597716] Call trace:
[    0.600232]  xen_starting_cpu+0x160/0x180
[    0.604309]  cpuhp_invoke_callback+0xac/0x640
[    0.608736]  cpuhp_issue_call+0xf4/0x150
[    0.612728]  __cpuhp_setup_state_cpuslocked+0x128/0x2c8
[    0.618030]  __cpuhp_setup_state+0x84/0xf8
[    0.622192]  xen_guest_init+0x324/0x364
[    0.626097]  do_one_initcall+0x54/0x250
[    0.630003]  kernel_init_freeable+0x12c/0x2c8
[    0.634428]  kernel_init+0x1c/0x128
[    0.637988]  ret_from_fork+0x10/0x18
[    0.641635] ---[ end trace d95b5309a33f8b27 ]---
[    0.646337] ------------[ cut here ]------------
[    0.651005] kernel BUG at arch/arm64/xen/../../arm/xen/enlighten.c:158!
[    0.657697] Internal error: Oops - BUG: 0 [#1] SMP
[    0.662548] Modules linked in:
[    0.665676] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W         5.9.0-rc4+ #4
[    0.673398] Hardware name: Socionext Developer Box (DT)
[    0.678695] pstate: 00000005 (nzcv daif -PAN -UAO BTYPE=--)
[    0.684338] pc : xen_starting_cpu+0x178/0x180
[    0.688765] lr : xen_starting_cpu+0x144/0x180
[    0.693188] sp : ffff8000116cbb60
[    0.696573] x29: ffff8000116cbb60 x28: ffff80000abec000
[    0.701955] x27: 0000000000000000 x26: 0000000000000000
[    0.707344] x25: ffff80001156c000 x24: fffffdffbfcdb600
[    0.712718] x23: 0000000000000000 x22: 0000000000000000
[    0.718107] x21: ffff8000113a99c8 x20: ffff800010fe4f68
[    0.723481] x19: ffff8000113a9988 x18: 0000000000000010
[    0.728863] x17: 0000000094fe0f81 x16: 00000000deadbeef
[    0.734245] x15: ffffffffffffffff x14: 0720072007200720
[    0.739626] x13: 0720072007200720 x12: 0720072007200720
[    0.745008] x11: 0720072007200720 x10: 0720072007200720
[    0.750390] x9 : ffff8000100fbdc0 x8 : ffff800010715208
[    0.755771] x7 : 0000000000000054 x6 : ffff00001b790f00
[    0.761153] x5 : ffff800010bbf880 x4 : 0000000000000000
[    0.766534] x3 : 0000000000000000 x2 : 00000000deadbeef
[    0.771916] x1 : 00000000deadbeef x0 : ffffffffffffffea
[    0.777304] Call trace:
[    0.779819]  xen_starting_cpu+0x178/0x180
[    0.783898]  cpuhp_invoke_callback+0xac/0x640
[    0.788325]  cpuhp_issue_call+0xf4/0x150
[    0.792317]  __cpuhp_setup_state_cpuslocked+0x128/0x2c8
[    0.797619]  __cpuhp_setup_state+0x84/0xf8
[    0.801779]  xen_guest_init+0x324/0x364
[    0.805683]  do_one_initcall+0x54/0x250
[    0.809590]  kernel_init_freeable+0x12c/0x2c8
[    0.814016]  kernel_init+0x1c/0x128
[    0.817583]  ret_from_fork+0x10/0x18
[    0.821226] Code: d0006980 f9427c00 cb000300 17ffffea (d4210000)
[    0.827415] ---[ end trace d95b5309a33f8b28 ]---
[    0.832076] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[    0.839815] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Link: https://lore.kernel.org/r/160196697165.60224.17470743378683334995.stgit@devnote2
Signed-off-by: Juergen Gross <jgross@suse.com>
3 years agoriscv: Fixup bootup failure with HARDENED_USERCOPY
Guo Ren [Tue, 6 Oct 2020 16:49:33 +0000 (16:49 +0000)]
riscv: Fixup bootup failure with HARDENED_USERCOPY

6184358da000 ("riscv: Fixup static_obj() fail") attempted to elide a lockdep
failure by rearranging our kernel image to place all initdata within [_stext,
_end], thus triggering lockdep to treat these as static objects.  These objects
are released and eventually reallocated, causing check_kernel_text_object() to
trigger a BUG().

This backs out the change to make [_stext, _end] all-encompassing, instead just
moving initdata.  This results in initdata being outside of [__init_begin,
__init_end], which means initdata can't be freed.

Link: https://lore.kernel.org/linux-riscv/1593266228-61125-1-git-send-email-guoren@kernel.org/T/#t
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Reported-by: Aurelien Jarno <aurelien@aurel32.net>
Tested-by: Aurelien Jarno <aurelien@aurel32.net>
[Palmer: Clean up commit text]
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
3 years agoMerge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Linus Torvalds [Tue, 6 Oct 2020 19:09:29 +0000 (12:09 -0700)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux

Pull arm64 fix from Catalin Marinas:
 "Fix a kernel panic in the AES crypto code caused by a BR tail call not
  matching the target BTI instruction (when branch target identification
  is enabled)"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  crypto: arm64: Use x16 with indirect branch to bti_c

3 years agoMerge tag 'platform-drivers-x86-v5.9-3' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Tue, 6 Oct 2020 19:00:52 +0000 (12:00 -0700)]
Merge tag 'platform-drivers-x86-v5.9-3' of git://git./linux/kernel/git/pdx86/platform-drivers-x86

Pull another x86 platform driver fix from Hans de Goede:
 "One final pdx86 fix for Tablet Mode reporting regressions (which make
  the keyboard and touchpad unusable) on various Asus notebooks.

  These regressions were caused by the asus-nb-wmi and the intel-vbtn
  drivers both receiving recent patches to start reporting Tablet Mode /
  to report it on more models.

  Due to a miscommunication between Andy and me, Andy's earlier pull-req
  only contained the fix for the intel-vbtn driver and not the fix for
  the asus-nb-wmi code.

  This fix has been tested as a downstream patch in Fedora kernels for
  approx two weeks with no problems being reported"

* tag 'platform-drivers-x86-v5.9-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
  platform/x86: asus-wmi: Fix SW_TABLET_MODE always reporting 1 on many different models

3 years agoMerge tag 'drm-fixes-2020-10-06-1' of git://anongit.freedesktop.org/drm/drm
Linus Torvalds [Tue, 6 Oct 2020 18:05:44 +0000 (11:05 -0700)]
Merge tag 'drm-fixes-2020-10-06-1' of git://anongit.freedesktop.org/drm/drm

Pull drm fixes from Dave Airlie:
 "Daniel queued these up last week and I took a long weekend so didn't
  get them out, but fixing the OOB access on get font seems like
  something we should land and it's cc'ed stable as well.

  The other big change is a partial revert for a regression on android
  on the clcd fbdev driver, and one other docs fix.

  fbdev:
   - Re-add FB_ARMCLCD for android
   - Fix global-out-of-bounds read in fbcon_get_font()

  core:
   - Small doc fix"

* tag 'drm-fixes-2020-10-06-1' of git://anongit.freedesktop.org/drm/drm:
  drm: drm_dsc.h: fix a kernel-doc markup
  Partially revert "video: fbdev: amba-clcd: Retire elder CLCD driver"
  fbcon: Fix global-out-of-bounds read in fbcon_get_font()
  Fonts: Support FONT_EXTRA_WORDS macros for built-in fonts
  fbdev, newport_con: Move FONT_EXTRA_WORDS macros into linux/font.h

3 years agousermodehelper: reset umask to default before executing user process
Linus Torvalds [Mon, 5 Oct 2020 17:56:22 +0000 (10:56 -0700)]
usermodehelper: reset umask to default before executing user process

Kernel threads intentionally do CLONE_FS in order to follow any changes
that 'init' does to set up the root directory (or cwd).

It is admittedly a bit odd, but it avoids the situation where 'init'
does some extensive setup to initialize the system environment, and then
we execute a usermode helper program, and it uses the original FS setup
from boot time that may be very limited and incomplete.

[ Both Al Viro and Eric Biederman point out that 'pivot_root()' will
  follow the root regardless, since it fixes up other users of root (see
  chroot_fs_refs() for details), but overmounting root and doing a
  chroot() would not. ]

However, Vegard Nossum noticed that the CLONE_FS not only means that we
follow the root and current working directories, it also means we share
umask with whatever init changed it to. That wasn't intentional.

Just reset umask to the original default (0022) before actually starting
the usermode helper program.

Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agosplice: teach splice pipe reading about empty pipe buffers
Linus Torvalds [Mon, 5 Oct 2020 18:26:27 +0000 (11:26 -0700)]
splice: teach splice pipe reading about empty pipe buffers

Tetsuo Handa reports that splice() can return 0 before the real EOF, if
the data in the splice source pipe is an empty pipe buffer.  That empty
pipe buffer case doesn't happen in any normal situation, but you can
trigger it by doing a write to a pipe that fails due to a page fault.

Tetsuo has a test-case to show the behavior:

  #define _GNU_SOURCE
  #include <sys/types.h>
  #include <sys/stat.h>
  #include <fcntl.h>
  #include <unistd.h>

  int main(int argc, char *argv[])
  {
const int fd = open("/tmp/testfile", O_WRONLY | O_CREAT, 0600);
int pipe_fd[2] = { -1, -1 };
pipe(pipe_fd);
write(pipe_fd[1], NULL, 4096);
/* This splice() should wait unless interrupted. */
return !splice(pipe_fd[0], NULL, fd, NULL, 65536, 0);
  }

which results in

    write(5, NULL, 4096)                    = -1 EFAULT (Bad address)
    splice(4, NULL, 3, NULL, 65536, 0)      = 0

and this can confuse splice() users into believing they have hit EOF
prematurely.

The issue was introduced when the pipe write code started pre-allocating
the pipe buffers before copying data from user space.

This is modified verion of Tetsuo's original patch.

Fixes: a194dfe6e6f6 ("pipe: Rearrange sequence in pipe_write() to preallocate slot")
Link:https://lore.kernel.org/linux-fsdevel/20201005121339.4063-1-penguin-kernel@I-love.SAKURA.ne.jp/
Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Acked-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agocrypto: arm64: Use x16 with indirect branch to bti_c
Jeremy Linton [Tue, 6 Oct 2020 16:33:26 +0000 (11:33 -0500)]
crypto: arm64: Use x16 with indirect branch to bti_c

The AES code uses a 'br x7' as part of a function called by
a macro. That branch needs a bti_j as a target. This results
in a panic as seen below. Using x16 (or x17) with an indirect
branch keeps the target bti_c.

  Bad mode in Synchronous Abort handler detected on CPU1, code 0x34000003 -- BTI
  CPU: 1 PID: 265 Comm: cryptomgr_test Not tainted 5.8.11-300.fc33.aarch64 #1
  pstate: 20400c05 (nzCv daif +PAN -UAO BTYPE=j-)
  pc : aesbs_encrypt8+0x0/0x5f0 [aes_neon_bs]
  lr : aesbs_xts_encrypt+0x48/0xe0 [aes_neon_bs]
  sp : ffff80001052b730

  aesbs_encrypt8+0x0/0x5f0 [aes_neon_bs]
   __xts_crypt+0xb0/0x2dc [aes_neon_bs]
   xts_encrypt+0x28/0x3c [aes_neon_bs]
  crypto_skcipher_encrypt+0x50/0x84
  simd_skcipher_encrypt+0xc8/0xe0
  crypto_skcipher_encrypt+0x50/0x84
  test_skcipher_vec_cfg+0x224/0x5f0
  test_skcipher+0xbc/0x120
  alg_test_skcipher+0xa0/0x1b0
  alg_test+0x3dc/0x47c
  cryptomgr_test+0x38/0x60

Fixes: 0e89640b640d ("crypto: arm64 - Use modern annotations for assembly functions")
Cc: <stable@vger.kernel.org> # 5.6.x-
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Suggested-by: Dave P Martin <Dave.Martin@arm.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20201006163326.2780619-1-jeremy.linton@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
3 years agoMerge tag 'rxrpc-fixes-20201005' of git://git.kernel.org/pub/scm/linux/kernel/git...
David S. Miller [Tue, 6 Oct 2020 13:18:20 +0000 (06:18 -0700)]
Merge tag 'rxrpc-fixes-20201005' of git://git./linux/kernel/git/dhowells/linux-fs

David Howells says:

====================
rxrpc: Miscellaneous fixes

Here are some miscellaneous rxrpc fixes:

 (1) Fix the xdr encoding of the contents read from an rxrpc key.

 (2) Fix a BUG() for a unsupported encoding type.

 (3) Fix missing _bh lock annotations.

 (4) Fix acceptance handling for an incoming call where the incoming call
     is encrypted.

 (5) The server token keyring isn't network namespaced - it belongs to the
     server, so there's no need.  Namespacing it means that request_key()
     fails to find it.

 (6) Fix a leak of the server keyring.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agotcp: fix receive window update in tcp_add_backlog()
Eric Dumazet [Mon, 5 Oct 2020 13:48:13 +0000 (06:48 -0700)]
tcp: fix receive window update in tcp_add_backlog()

We got reports from GKE customers flows being reset by netfilter
conntrack unless nf_conntrack_tcp_be_liberal is set to 1.

Traces seemed to suggest ACK packet being dropped by the
packet capture, or more likely that ACK were received in the
wrong order.

 wscale=7, SYN and SYNACK not shown here.

 This ACK allows the sender to send 1871*128 bytes from seq 51359321 :
 New right edge of the window -> 51359321+1871*128=51598809

 09:17:23.389210 IP A > B: Flags [.], ack 51359321, win 1871, options [nop,nop,TS val 10 ecr 999], length 0

 09:17:23.389212 IP B > A: Flags [.], seq 51422681:51424089, ack 1577, win 268, options [nop,nop,TS val 999 ecr 10], length 1408
 09:17:23.389214 IP A > B: Flags [.], ack 51422681, win 1376, options [nop,nop,TS val 10 ecr 999], length 0
 09:17:23.389253 IP B > A: Flags [.], seq 51424089:51488857, ack 1577, win 268, options [nop,nop,TS val 999 ecr 10], length 64768
 09:17:23.389272 IP A > B: Flags [.], ack 51488857, win 859, options [nop,nop,TS val 10 ecr 999], length 0
 09:17:23.389275 IP B > A: Flags [.], seq 51488857:51521241, ack 1577, win 268, options [nop,nop,TS val 999 ecr 10], length 32384

 Receiver now allows to send 606*128=77568 from seq 51521241 :
 New right edge of the window -> 51521241+606*128=51598809

 09:17:23.389296 IP A > B: Flags [.], ack 51521241, win 606, options [nop,nop,TS val 10 ecr 999], length 0

 09:17:23.389308 IP B > A: Flags [.], seq 51521241:51553625, ack 1577, win 268, options [nop,nop,TS val 999 ecr 10], length 32384

 It seems the sender exceeds RWIN allowance, since 51611353 > 51598809

 09:17:23.389346 IP B > A: Flags [.], seq 51553625:51611353, ack 1577, win 268, options [nop,nop,TS val 999 ecr 10], length 57728
 09:17:23.389356 IP B > A: Flags [.], seq 51611353:51618393, ack 1577, win 268, options [nop,nop,TS val 999 ecr 10], length 7040

 09:17:23.389367 IP A > B: Flags [.], ack 51611353, win 0, options [nop,nop,TS val 10 ecr 999], length 0

 netfilter conntrack is not happy and sends RST

 09:17:23.389389 IP A > B: Flags [R], seq 92176528, win 0, length 0
 09:17:23.389488 IP B > A: Flags [R], seq 174478967, win 0, length 0

 Now imagine ACK were delivered out of order and tcp_add_backlog() sets window based on wrong packet.
 New right edge of the window -> 51521241+859*128=51631193

Normally TCP stack handles OOO packets just fine, but it
turns out tcp_add_backlog() does not. It can update the window
field of the aggregated packet even if the ACK sequence
of the last received packet is too old.

Many thanks to Alexandre Ferrieux for independently reporting the issue
and suggesting a fix.

Fixes: 4f693b55c3d2 ("tcp: implement coalescing on backlog queue")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Alexandre Ferrieux <alexandre.ferrieux@orange.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>