linux.git
2 days agoLinux 4.8-rc8 master torvalds/master v4.8-rc8
Linus Torvalds [Mon, 26 Sep 2016 01:47:13 +0000 (18:47 -0700)]
Linux 4.8-rc8

2 days agoMerge tag 'trace-v4.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt...
Linus Torvalds [Mon, 26 Sep 2016 01:40:13 +0000 (18:40 -0700)]
Merge tag 'trace-v4.8-rc7' of git://git./linux/kernel/git/rostedt/linux-trace

Pull tracefs fixes from Steven Rostedt:
 "Al Viro has been looking at the tracefs code, and has pointed out some
  issues.  This contains one fix by me and one by Al.  I'm sure that
  he'll come up with more but for now I tested these patches and they
  don't appear to have any negative impact on tracing"

* tag 'trace-v4.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  fix memory leaks in tracing_buffers_splice_read()
  tracing: Move mutex to protect against resetting of seq data

2 days agofault_in_multipages_readable() throws set-but-unused error
Dave Chinner [Sun, 25 Sep 2016 23:57:33 +0000 (09:57 +1000)]
fault_in_multipages_readable() throws set-but-unused error

When building XFS with -Werror, it now fails with:

  include/linux/pagemap.h: In function 'fault_in_multipages_readable':
  include/linux/pagemap.h:602:16: error: variable 'c' set but not used [-Werror=unused-but-set-variable]
    volatile char c;
                  ^

This is a regression caused by commit e23d4159b109 ("fix
fault_in_multipages_...() on architectures with no-op access_ok()").
Fix it by re-adding the "(void)c" trick taht was previously used to make
the compiler think the variable is used.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 days agomm: check VMA flags to avoid invalid PROT_NONE NUMA balancing
Lorenzo Stoakes [Sun, 11 Sep 2016 22:54:25 +0000 (23:54 +0100)]
mm: check VMA flags to avoid invalid PROT_NONE NUMA balancing

The NUMA balancing logic uses an arch-specific PROT_NONE page table flag
defined by pte_protnone() or pmd_protnone() to mark PTEs or huge page
PMDs respectively as requiring balancing upon a subsequent page fault.
User-defined PROT_NONE memory regions which also have this flag set will
not normally invoke the NUMA balancing code as do_page_fault() will send
a segfault to the process before handle_mm_fault() is even called.

However if access_remote_vm() is invoked to access a PROT_NONE region of
memory, handle_mm_fault() is called via faultin_page() and
__get_user_pages() without any access checks being performed, meaning
the NUMA balancing logic is incorrectly invoked on a non-NUMA memory
region.

A simple means of triggering this problem is to access PROT_NONE mmap'd
memory using /proc/self/mem which reliably results in the NUMA handling
functions being invoked when CONFIG_NUMA_BALANCING is set.

This issue was reported in bugzilla (issue 99101) which includes some
simple repro code.

There are BUG_ON() checks in do_numa_page() and do_huge_pmd_numa_page()
added at commit c0e7cad to avoid accidentally provoking strange
behaviour by attempting to apply NUMA balancing to pages that are in
fact PROT_NONE.  The BUG_ON()'s are consistently triggered by the repro.

This patch moves the PROT_NONE check into mm/memory.c rather than
invoking BUG_ON() as faulting in these pages via faultin_page() is a
valid reason for reaching the NUMA check with the PROT_NONE page table
flag set and is therefore not always a bug.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=99101
Reported-by: Trevor Saunders <tbsaunde@tbsaunde.org>
Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 days agoMerge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus
Linus Torvalds [Sun, 25 Sep 2016 20:59:52 +0000 (13:59 -0700)]
Merge branch 'upstream' of git://git.linux-mips.org/ralf/upstream-linus

Pull MIPS fixes from Ralf Baechle:
 "A round of 4.8 fixes:

  MIPS generic code:
   - Add a missing ".set pop" in an early commit
   - Fix memory regions reaching top of physical
   - MAAR: Fix address alignment
   - vDSO: Fix Malta EVA mapping to vDSO page structs
   - uprobes: fix incorrect uprobe brk handling
   - uprobes: select HAVE_REGS_AND_STACK_ACCESS_API
   - Avoid a BUG warning during PR_SET_FP_MODE prctl
   - SMP: Fix possibility of deadlock when bringing CPUs online
   - R6: Remove compact branch policy Kconfig entries
   - Fix size calc when avoiding IPIs for small icache flushes
   - Fix pre-r6 emulation FPU initialisation
   - Fix delay slot emulation count in debugfs

  ATH79:
   - Fix test for error return of clk_register_fixed_factor.

  Octeon:
   - Fix kernel header to work for VDSO build.
   - Fix initialization of platform device probing.

  paravirt:
   - Fix undefined reference to smp_bootstrap"

* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
  MIPS: Fix delay slot emulation count in debugfs
  MIPS: SMP: Fix possibility of deadlock when bringing CPUs online
  MIPS: Fix pre-r6 emulation FPU initialisation
  MIPS: vDSO: Fix Malta EVA mapping to vDSO page structs
  MIPS: Select HAVE_REGS_AND_STACK_ACCESS_API
  MIPS: Octeon: Fix platform bus probing
  MIPS: Octeon: mangle-port: fix build failure with VDSO code
  MIPS: Avoid a BUG warning during prctl(PR_SET_FP_MODE, ...)
  MIPS: c-r4k: Fix size calc when avoiding IPIs for small icache flushes
  MIPS: Add a missing ".set pop" in an early commit
  MIPS: paravirt: Fix undefined reference to smp_bootstrap
  MIPS: Remove compact branch policy Kconfig entries
  MIPS: MAAR: Fix address alignment
  MIPS: Fix memory regions reaching top of physical
  MIPS: uprobes: fix incorrect uprobe brk handling
  MIPS: ath79: Fix test for error return of clk_register_fixed_factor().

2 days agoMerge tag 'powerpc-4.8-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc...
Linus Torvalds [Sun, 25 Sep 2016 20:52:59 +0000 (13:52 -0700)]
Merge tag 'powerpc-4.8-7' of git://git./linux/kernel/git/powerpc/linux

Pull one more powerpc fix from Michael Ellerman:
 "powernv/pci: Fix m64 checks for SR-IOV and window alignment from
  Russell Currey"

* tag 'powerpc-4.8-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/powernv/pci: Fix m64 checks for SR-IOV and window alignment

2 days agoradix tree: fix sibling entry handling in radix_tree_descend()
Linus Torvalds [Sun, 25 Sep 2016 20:32:46 +0000 (13:32 -0700)]
radix tree: fix sibling entry handling in radix_tree_descend()

The fixes to the radix tree test suite show that the multi-order case is
broken.  The basic reason is that the radix tree code uses tagged
pointers with the "internal" bit in the low bits, and calculating the
pointer indices was supposed to mask off those bits.  But gcc will
notice that we then use the index to re-create the pointer, and will
avoid doing the arithmetic and use the tagged pointer directly.

This cleans the code up, using the existing is_sibling_entry() helper to
validate the sibling pointer range (instead of open-coding it), and
using entry_to_node() to mask off the low tag bit from the pointer.  And
once you do that, you might as well just use the now cleaned-up pointer
directly.

[ Side note: the multi-order code isn't actually ever used in the kernel
  right now, and the only reason I didn't just delete all that code is
  that Kirill Shutemov piped up and said:

    "Well, my ext4-with-huge-pages patchset[1] uses multi-order entries.
     It also converts shmem-with-huge-pages and hugetlb to them.

     I'm okay with converting it to other mechanism, but I need
     something.  (I looked into Konstantin's RFC patchset[2].  It looks
     okay, but I don't feel myself qualified to review it as I don't
     know much about radix-tree internals.)"

  [1] http://lkml.kernel.org/r/20160915115523.29737-1-kirill.shutemov@linux.intel.com
  [2] http://lkml.kernel.org/r/147230727479.9957.1087787722571077339.stgit@zurg ]

Reported-by: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Cedric Blancher <cedric.blancher@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 days agoradix tree test suite: Test radix_tree_replace_slot() for multiorder entries
Matthew Wilcox [Thu, 22 Sep 2016 18:53:34 +0000 (11:53 -0700)]
radix tree test suite: Test radix_tree_replace_slot() for multiorder entries

When we replace a multiorder entry, check that all indices reflect the
new value.

Also, compile the test suite with -O2, which shows other problems with
the code due to some dodgy pointer operations in the radix tree code.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 days agofix memory leaks in tracing_buffers_splice_read()
Al Viro [Sat, 17 Sep 2016 22:31:46 +0000 (18:31 -0400)]
fix memory leaks in tracing_buffers_splice_read()

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2 days agotracing: Move mutex to protect against resetting of seq data
Steven Rostedt (Red Hat) [Sat, 24 Sep 2016 02:57:13 +0000 (22:57 -0400)]
tracing: Move mutex to protect against resetting of seq data

The iter->seq can be reset outside the protection of the mutex. So can
reading of user data. Move the mutex up to the beginning of the function.

Fixes: d7350c3f45694 ("tracing/core: make the read callbacks reentrants")
Cc: stable@vger.kernel.org # 2.6.30+
Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
3 days agoMIPS: Fix delay slot emulation count in debugfs
Paul Burton [Thu, 22 Sep 2016 14:47:40 +0000 (15:47 +0100)]
MIPS: Fix delay slot emulation count in debugfs

Commit 432c6bacbd0c ("MIPS: Use per-mm page to execute branch delay slot
instructions") accidentally removed use of the MIPS_FPU_EMU_INC_STATS
macro from do_dsemulret, leading to the ds_emul file in debugfs always
returning zero even though we perform delay slot emulations.

Fix this by re-adding the use of the MIPS_FPU_EMU_INC_STATS macro.

Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Fixes: 432c6bacbd0c ("MIPS: Use per-mm page to execute branch delay slot instructions")
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/14301/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
3 days agoMIPS: SMP: Fix possibility of deadlock when bringing CPUs online
Matt Redfearn [Thu, 22 Sep 2016 16:15:47 +0000 (17:15 +0100)]
MIPS: SMP: Fix possibility of deadlock when bringing CPUs online

This patch fixes the possibility of a deadlock when bringing up
secondary CPUs.
The deadlock occurs because the set_cpu_online() is called before
synchronise_count_slave(). This can cause a deadlock if the boot CPU,
having scheduled another thread, attempts to send an IPI to the
secondary CPU, which it sees has been marked online. The secondary is
blocked in synchronise_count_slave() waiting for the boot CPU to enter
synchronise_count_master(), but the boot cpu is blocked in
smp_call_function_many() waiting for the secondary to respond to it's
IPI request.

Fix this by marking the CPU online in cpu_callin_map and synchronising
counters before declaring the CPU online and calculating the maps for
IPIs.

Signed-off-by: Matt Redfearn <matt.redfearn@imgtec.com>
Reported-by: Justin Chen <justinpopo6@gmail.com>
Tested-by: Justin Chen <justinpopo6@gmail.com>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: stable@vger.kernel.org # v4.1+
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/14302/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
3 days agoMerge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 24 Sep 2016 19:44:28 +0000 (12:44 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull perf fixes from Thomas Gleixner:
 "Three fixlets for perf:

   - add a missing NULL pointer check in the intel BTS driver

   - make BTS an exclusive PMU because BTS can only handle one event at
     a time

   - ensure that exclusive events are limited to one PMU so that several
     exclusive events can be scheduled on different PMU instances"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/core: Limit matching exclusive events to one PMU
  perf/x86/intel/bts: Make it an exclusive PMU
  perf/x86/intel/bts: Make sure debug store is valid

3 days agoMerge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 24 Sep 2016 19:41:19 +0000 (12:41 -0700)]
Merge branch 'locking-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull locking fixes from Thomas Gleixner:
 "Two smallish fixes:

   - use the proper asm constraint in the Super-H atomic_fetch_ops

   - a trivial typo fix in the Kconfig help text"

* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  locking/hung_task: Fix typo in CONFIG_DETECT_HUNG_TASK help text
  locking/atomic, arch/sh: Fix ATOMIC_FETCH_OP()

3 days agoMerge branch 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 24 Sep 2016 19:35:26 +0000 (12:35 -0700)]
Merge branch 'efi-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull EFI fixes from Thomas Gleixner:
 "Two fixes for EFI/PAT:

   - a 32bit overflow bug in the PAT code which was unearthed by the
     large EFI mappings

   - prevent a boot hang on large systems when EFI mixed mode is enabled
     but not used"

* 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/efi: Only map RAM into EFI page tables if in mixed-mode
  x86/mm/pat: Prevent hang during boot when mapping pages

3 days agoMerge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 24 Sep 2016 19:30:12 +0000 (12:30 -0700)]
Merge branch 'irq-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull irq fixes from Thomas Gleixner:
 "Three fixes for irq core and irq chip drivers:

   - Do not set the irq type if type is NONE.  Fixes a boot regression
     on various SoCs

   - Use the proper cpu for setting up the GIC target list.  Discovered
     by the cpumask debugging code.

   - A rather large fix for the MIPS-GIC so per cpu local interrupts
     work again.  This was discovered late because the code falls back
     to slower timers which use normal device interrupts"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/mips-gic: Fix local interrupts
  irqchip/gicv3: Silence noisy DEBUG_PER_CPU_MAPS warning
  genirq: Skip chained interrupt trigger setup if type is IRQ_TYPE_NONE

3 days agoMerge branch 'hughd-fixes' (patches from Hugh Dickins)
Linus Torvalds [Sat, 24 Sep 2016 18:31:45 +0000 (11:31 -0700)]
Merge branch 'hughd-fixes' (patches from Hugh Dickins)

Merge VM fixes from High Dickins:
 "I get the impression that Andrew is away or busy at the moment, so I'm
  going to send you three independent uncontroversial little mm fixes
  directly - though none is strictly a 4.8 regression fix.

   - shmem: fix tmpfs to handle the huge= option properly from Toshi
     Kani is a one-liner to fix a major embarrassment in 4.8's hugepages
     on tmpfs feature: although Hillf pointed it out in June, somehow
     both Kirill and I repeatedly dropped the ball on this one.  You
     might wonder if the feature got tested at all with that bug in:
     yes, it did, but for wider testing coverage, Kirill and I had each
     relied too much on an override which bypasses that condition.

   - huge tmpfs: fix Committed_AS leak just a run-of-the-mill accounting
     fix in the same feature.

   - mm: delete unnecessary and unsafe init_tlb_ubc() is an unrelated
     fix to 4.3's TLB flush batching in reclaim: the bug would be rare,
     and none of us will be shamed if this one misses 4.8; but it got
     such a quick ack from Mel today that I'm inclined to offer it along
     with the first two"

* emailed patches from Hugh Dickins <hughd@google.com>:
  mm: delete unnecessary and unsafe init_tlb_ubc()
  huge tmpfs: fix Committed_AS leak
  shmem: fix tmpfs to handle the huge= option properly

3 days agomm: delete unnecessary and unsafe init_tlb_ubc()
Hugh Dickins [Sat, 24 Sep 2016 03:27:04 +0000 (20:27 -0700)]
mm: delete unnecessary and unsafe init_tlb_ubc()

init_tlb_ubc() looked unnecessary to me: tlb_ubc is statically
initialized with zeroes in the init_task, and copied from parent to
child while it is quiescent in arch_dup_task_struct(); so I went to
delete it.

But inserted temporary debug WARN_ONs in place of init_tlb_ubc() to
check that it was always empty at that point, and found them firing:
because memcg reclaim can recurse into global reclaim (when allocating
biosets for swapout in my case), and arrive back at the init_tlb_ubc()
in shrink_node_memcg().

Resetting tlb_ubc.flush_required at that point is wrong: if the upper
level needs a deferred TLB flush, but the lower level turns out not to,
we miss a TLB flush.  But fortunately, that's the only part of the
protocol that does not nest: with the initialization removed, cpumask
collects bits from upper and lower levels, and flushes TLB when needed.

Fixes: 72b252aed506 ("mm: send one IPI per CPU to TLB flush all entries after unmapping pages")
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: stable@vger.kernel.org # 4.3+
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 days agohuge tmpfs: fix Committed_AS leak
Hugh Dickins [Sat, 24 Sep 2016 03:24:23 +0000 (20:24 -0700)]
huge tmpfs: fix Committed_AS leak

Under swapping load on huge tmpfs, /proc/meminfo's Committed_AS grows
bigger and bigger: just a cosmetic issue for most users, but disabling
for those who run without overcommit (/proc/sys/vm/overcommit_memory 2).

shmem_uncharge() was forgetting to unaccount __vm_enough_memory's
charge, and shmem_charge() was forgetting it on the filesystem-full
error path.

Fixes: 800d8c63b2e9 ("shmem: add huge pages support")
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 days agoshmem: fix tmpfs to handle the huge= option properly
Toshi Kani [Sat, 24 Sep 2016 03:21:56 +0000 (20:21 -0700)]
shmem: fix tmpfs to handle the huge= option properly

shmem_get_unmapped_area() checks SHMEM_SB(sb)->huge incorrectly, which
leads to a reversed effect of "huge=" mount option.

Fix the check in shmem_get_unmapped_area().

Note, the default value of SHMEM_SB(sb)->huge remains as
SHMEM_HUGE_NEVER.  User will need to specify "huge=" option to enable
huge page mappings.

Reported-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 days agoMerge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa...
Linus Torvalds [Fri, 23 Sep 2016 23:44:12 +0000 (16:44 -0700)]
Merge branch 'i2c/for-current' of git://git./linux/kernel/git/wsa/linux

Pull i2c fixes from Wolfram Sang:
 "Three driver bugfixes: fixing uninitialized memory pointers (eg20t),
  pm/clock imbalance (qup), and a wrongly set cached variable (pc954x)"

* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: qup: skip qup_i2c_suspend if the device is already runtime suspended
  i2c: mux: pca954x: retry updating the mux selection on failure
  i2c-eg20t: fix race between i2c init and interrupt enable

4 days agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Linus Torvalds [Fri, 23 Sep 2016 23:34:24 +0000 (16:34 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/dtor/input

Pull input updates from Dmitry Torokhov:
 "Just a fix up for the firmware handling to the Silead driver (which is
  a new driver in this release)"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: silead_gsl1680 - use "silead/" prefix for firmware loading
  Input: silead_gsl1680 - document firmware-name, fix implementation

4 days agoMerge branch 'for-linus' of git://git.kernel.dk/linux-block
Linus Torvalds [Fri, 23 Sep 2016 23:24:36 +0000 (16:24 -0700)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:
 "Three fixes, two regressions and one that poses a problem in blk-mq
  with the new nvmef code"

* 'for-linus' of git://git.kernel.dk/linux-block:
  blk-mq: skip unmapped queues in blk_mq_alloc_request_hctx
  nvme-rdma: only clear queue flags after successful connect
  blk-throttle: Extend slice if throttle group is not empty

4 days agoMerge branch 'for-linus-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/mason...
Linus Torvalds [Fri, 23 Sep 2016 20:39:37 +0000 (13:39 -0700)]
Merge branch 'for-linus-4.8' of git://git./linux/kernel/git/mason/linux-btrfs

Pull btrfs fixes from Chris Mason:
 "Josef fixed a problem when quotas are enabled with his latest ENOSPC
  rework, and Jeff added more checks into the subvol ioctls to avoid
  tripping up lookup_one_len"

* 'for-linus-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  btrfs: ensure that file descriptor used with subvol ioctls is a dir
  Btrfs: handle quota reserve failure properly

4 days agoMerge tag 'regmap-fix-v4.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 23 Sep 2016 18:50:49 +0000 (11:50 -0700)]
Merge tag 'regmap-fix-v4.8-rc7' of git://git./linux/kernel/git/broonie/regmap

Pull regmap fix from Mark Brown:
 "A fix for an issue with double locking that was introduced earlier
  this release.  I'd missed in review that we were already in a locked
  region when trying to drop part of the cache"

* tag 'regmap-fix-v4.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
  regmap: fix deadlock on _regmap_raw_write() error path

4 days agoMerge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Linus Torvalds [Fri, 23 Sep 2016 18:28:04 +0000 (11:28 -0700)]
Merge branch 'linus' of git://git./linux/kernel/git/herbert/crypto-2.6

Pull crypto fixes from Herbert Xu:
 "This fixes a regression in RSA that was only half-fixed earlier in the
  cycle.  It also fixes an older regression that breaks the keyring
  subsystem"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: rsa-pkcs1pad - Handle leading zero for decryption
  KEYS: Fix skcipher IV clobbering

4 days agoMerge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Linus Torvalds [Fri, 23 Sep 2016 18:24:42 +0000 (11:24 -0700)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux

Pull arm64 fixes from Catalin Marinas:
 "A couple of last-minute arm64 fixes for 4.8:

   - Fix secondary CPU to NUMA node assignment

   - Fix kgdb breakpoint insertion in read-only text sections (when
     CONFIG_DEBUG_RODATA or CONFIG_DEBUG_SET_MODULE_RONX are enabled)"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: kgdb: handle read-only text / modules
  arm64: Call numa_store_cpu_info() earlier.

4 days agoMerge tag 'tags/nand-fixes-for-4.8-rc8' of git://git.infradead.org/linux-ubifs
Linus Torvalds [Fri, 23 Sep 2016 18:15:00 +0000 (11:15 -0700)]
Merge tag 'tags/nand-fixes-for-4.8-rc8' of git://git.infradead.org/linux-ubifs

Pull MTD fixes from Richard Weinberger:
 "NAND Fixes for 4.8-rc8.

  This contains fixes for bugs which got introduced in -rc1.  Usually
  Brian takes NAND patches from Boris, but since Brian is very busy
  these days with other stuff and Boris is not yet member of the
  kernel.org web of trust I stepped in.

  Boris will be in Berlin at ELCE, I'll sign his key and hopefully other
  Kernel developers too such that he can issue his own pull requests
  soon.

  Summary:

   - Fix a wrong OOB layout definition in the mxc driver
   - Fix incorrect ECC handling in the mtk driver"

* tag 'tags/nand-fixes-for-4.8-rc8' of git://git.infradead.org/linux-ubifs:
  mtd: nand: mxc: fix obiwan error in mxc_nand_v[12]_ooblayout_free() functions
  mtd: nand: fix chances to create incomplete ECC data when writing
  mtd: nand: fix generating over-boundary ECC data when writing

4 days agoMerge tag 'mmc-v4.8-rc7' of git://git.linaro.org/people/ulf.hansson/mmc
Linus Torvalds [Fri, 23 Sep 2016 18:10:53 +0000 (11:10 -0700)]
Merge tag 'mmc-v4.8-rc7' of git://git.linaro.org/people/ulf.hansson/mmc

Pull MMC fix from Ulf Hansson:
 "MMC host:

   - dw_mmc: fix the spamming log message"

* tag 'mmc-v4.8-rc7' of git://git.linaro.org/people/ulf.hansson/mmc:
  mmc: dw_mmc: fix the spamming log message

4 days agoMerge tag 'configfs-for-4.8-2' of git://git.infradead.org/users/hch/configfs
Linus Torvalds [Fri, 23 Sep 2016 16:45:15 +0000 (09:45 -0700)]
Merge tag 'configfs-for-4.8-2' of git://git.infradead.org/users/hch/configfs

Pull configfs fix from Christoph Hellwig:
 "One more trivial fix for the binary attribute code from Phil Turnbull"

* tag 'configfs-for-4.8-2' of git://git.infradead.org/users/hch/configfs:
  configfs: Return -EFBIG from configfs_write_bin_file.

4 days agoblk-mq: skip unmapped queues in blk_mq_alloc_request_hctx
Christoph Hellwig [Fri, 23 Sep 2016 16:25:48 +0000 (10:25 -0600)]
blk-mq: skip unmapped queues in blk_mq_alloc_request_hctx

This provides the caller a feedback that a given hctx is not mapped and thus
no command can be sent on it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
4 days agoMIPS: Fix pre-r6 emulation FPU initialisation
Paul Burton [Fri, 23 Sep 2016 14:13:53 +0000 (15:13 +0100)]
MIPS: Fix pre-r6 emulation FPU initialisation

In the mipsr2_decoder() function, used to emulate pre-MIPSr6
instructions that were removed in MIPSr6, the init_fpu() function is
called if a removed pre-MIPSr6 floating point instruction is the first
floating point instruction used by the task. However, init_fpu()
performs varous actions that rely upon not being migrated. For example
in the most basic case it sets the coprocessor 0 Status.CU1 bit to
enable the FPU & then loads FP register context into the FPU registers.
If the task were to migrate during this time, it may end up attempting
to load FP register context on a different CPU where it hasn't set the
CU1 bit, leading to errors such as:

    do_cpu invoked from kernel context![#2]:
    CPU: 2 PID: 7338 Comm: fp-prctl Tainted: G      D         4.7.0-00424-g49b0c82 #2
    task: 838e4000 ti: 88d38000 task.ti: 88d38000
    $ 0   : 00000000 00000001 ffffffff 88d3fef8
    $ 4   : 838e4000 88d38004 00000000 00000001
    $ 8   : 3400fc01 801f8020 808e9100 24000000
    $12   : dbffffff 807b69d8 807b0000 00000000
    $16   : 00000000 80786150 00400fc4 809c0398
    $20   : 809c0338 0040273c 88d3ff28 808e9d30
    $24   : 808e9d30 00400fb4
    $28   : 88d38000 88d3fe88 00000000 8011a2ac
    Hi    : 0040273c
    Lo    : 88d3ff28
    epc   : 80114178 _restore_fp+0x10/0xa0
    ra    : 8011a2ac mipsr2_decoder+0xd5c/0x1660
    Status: 1400fc03 KERNEL EXL IE
    Cause : 1080002c (ExcCode 0b)
    PrId  : 0001a920 (MIPS I6400)
    Modules linked in:
    Process fp-prctl (pid: 7338, threadinfo=88d38000, task=838e4000, tls=766527d0)
    Stack : 00000000 00000000 00000000 88d3fe98 00000000 00000000 809c0398 809c0338
       808e9100 00000000 88d3ff28 00400fc4 00400fc4 0040273c 7fb69e18 004a0000
       004a0000 004a0000 7664add0 8010de18 00000000 00000000 88d3fef8 88d3ff28
       808e9100 00000000 766527d0 8010e534 000c0000 85755000 8181d580 00000000
       00000000 00000000 004a0000 00000000 766527d0 7fb69e18 004a0000 80105c20
       ...
    Call Trace:
    [<80114178>] _restore_fp+0x10/0xa0
    [<8011a2ac>] mipsr2_decoder+0xd5c/0x1660
    [<8010de18>] do_ri+0x90/0x6b8
    [<80105c20>] ret_from_exception+0x0/0x10

Fix this by disabling preemption around the call to init_fpu(), ensuring
that it starts & completes on one CPU.

Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Fixes: b0a668fb2038 ("MIPS: kernel: mips-r2-to-r6-emul: Add R2 emulator for MIPS R6")
Cc: linux-mips@linux-mips.org
Cc: stable@vger.kernel.org # v4.0+
Patchwork: https://patchwork.linux-mips.org/patch/14305/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
4 days agoarm64: kgdb: handle read-only text / modules
AKASHI Takahiro [Fri, 23 Sep 2016 07:42:08 +0000 (16:42 +0900)]
arm64: kgdb: handle read-only text / modules

Handle read-only cases when CONFIG_DEBUG_RODATA (4.0) or
CONFIG_DEBUG_SET_MODULE_RONX (3.18) are enabled by using
aarch64_insn_write() instead of probe_kernel_write() as introduced by
commit 2f896d586610 ("arm64: use fixmap for text patching") in 4.0.

Fixes: 11d91a770f1f ("arm64: Add CONFIG_DEBUG_SET_MODULE_RONX support")
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Jason Wessel <jason.wessel@windriver.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
4 days agoarm64: Call numa_store_cpu_info() earlier.
David Daney [Tue, 20 Sep 2016 18:46:35 +0000 (11:46 -0700)]
arm64: Call numa_store_cpu_info() earlier.

The wq_numa_init() function makes a private CPU to node map by calling
cpu_to_node() early in the boot process, before the non-boot CPUs are
brought online.  Since the default implementation of cpu_to_node()
returns zero for CPUs that have never been brought online, the
workqueue system's view is that *all* CPUs are on node zero.

When the unbound workqueue for a non-zero node is created, the
tsk_cpus_allowed() for the worker threads is the empty set because
there are, in the view of the workqueue system, no CPUs on non-zero
nodes.  The code in try_to_wake_up() using this empty cpumask ends up
using the cpumask empty set value of NR_CPUS as an index into the
per-CPU area pointer array, and gets garbage as it is one past the end
of the array.  This results in:

[    0.881970] Unable to handle kernel paging request at virtual address fffffb1008b926a4
[    1.970095] pgd = fffffc00094b0000
[    1.973530] [fffffb1008b926a4] *pgd=0000000000000000, *pud=0000000000000000, *pmd=0000000000000000
[    1.982610] Internal error: Oops: 96000004 [#1] SMP
[    1.987541] Modules linked in:
[    1.990631] CPU: 48 PID: 295 Comm: cpuhp/48 Tainted: G        W       4.8.0-rc6-preempt-vol+ #9
[    1.999435] Hardware name: Cavium ThunderX CN88XX board (DT)
[    2.005159] task: fffffe0fe89cc300 task.stack: fffffe0fe8b8c000
[    2.011158] PC is at try_to_wake_up+0x194/0x34c
[    2.015737] LR is at try_to_wake_up+0x150/0x34c
[    2.020318] pc : [<fffffc00080e7468>] lr : [<fffffc00080e7424>] pstate: 600000c5
[    2.027803] sp : fffffe0fe8b8fb10
[    2.031149] x29: fffffe0fe8b8fb10 x28: 0000000000000000
[    2.036522] x27: fffffc0008c63bc8 x26: 0000000000001000
[    2.041896] x25: fffffc0008c63c80 x24: fffffc0008bfb200
[    2.047270] x23: 00000000000000c0 x22: 0000000000000004
[    2.052642] x21: fffffe0fe89d25bc x20: 0000000000001000
[    2.058014] x19: fffffe0fe89d1d00 x18: 0000000000000000
[    2.063386] x17: 0000000000000000 x16: 0000000000000000
[    2.068760] x15: 0000000000000018 x14: 0000000000000000
[    2.074133] x13: 0000000000000000 x12: 0000000000000000
[    2.079505] x11: 0000000000000000 x10: 0000000000000000
[    2.084879] x9 : 0000000000000000 x8 : 0000000000000000
[    2.090251] x7 : 0000000000000040 x6 : 0000000000000000
[    2.095621] x5 : ffffffffffffffff x4 : 0000000000000000
[    2.100991] x3 : 0000000000000000 x2 : 0000000000000000
[    2.106364] x1 : fffffc0008be4c24 x0 : ffffff0ffffada80
[    2.111737]
[    2.113236] Process cpuhp/48 (pid: 295, stack limit = 0xfffffe0fe8b8c020)
[    2.120102] Stack: (0xfffffe0fe8b8fb10 to 0xfffffe0fe8b90000)
[    2.125914] fb00:                                   fffffe0fe8b8fb80 fffffc00080e7648
.
.
.
[    2.442859] Call trace:
[    2.445327] Exception stack(0xfffffe0fe8b8f940 to 0xfffffe0fe8b8fa70)
[    2.451843] f940: fffffe0fe89d1d00 0000040000000000 fffffe0fe8b8fb10 fffffc00080e7468
[    2.459767] f960: fffffe0fe8b8f980 fffffc00080e4958 ffffff0ff91ab200 fffffc00080e4b64
[    2.467690] f980: fffffe0fe8b8f9d0 fffffc00080e515c fffffe0fe8b8fa80 0000000000000000
[    2.475614] f9a0: fffffe0fe8b8f9d0 fffffc00080e58e4 fffffe0fe8b8fa80 0000000000000000
[    2.483540] f9c0: fffffe0fe8d10000 0000000000000040 fffffe0fe8b8fa50 fffffc00080e5ac4
[    2.491465] f9e0: ffffff0ffffada80 fffffc0008be4c24 0000000000000000 0000000000000000
[    2.499387] fa00: 0000000000000000 ffffffffffffffff 0000000000000000 0000000000000040
[    2.507309] fa20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[    2.515233] fa40: 0000000000000000 0000000000000000 0000000000000000 0000000000000018
[    2.523156] fa60: 0000000000000000 0000000000000000
[    2.528089] [<fffffc00080e7468>] try_to_wake_up+0x194/0x34c
[    2.533723] [<fffffc00080e7648>] wake_up_process+0x28/0x34
[    2.539275] [<fffffc00080d3764>] create_worker+0x110/0x19c
[    2.544824] [<fffffc00080d69dc>] alloc_unbound_pwq+0x3cc/0x4b0
[    2.550724] [<fffffc00080d6bcc>] wq_update_unbound_numa+0x10c/0x1e4
[    2.557066] [<fffffc00080d7d78>] workqueue_online_cpu+0x220/0x28c
[    2.563234] [<fffffc00080bd288>] cpuhp_invoke_callback+0x6c/0x168
[    2.569398] [<fffffc00080bdf74>] cpuhp_up_callbacks+0x44/0xe4
[    2.575210] [<fffffc00080be194>] cpuhp_thread_fun+0x13c/0x148
[    2.581027] [<fffffc00080dfbac>] smpboot_thread_fn+0x19c/0x1a8
[    2.586929] [<fffffc00080dbd64>] kthread+0xdc/0xf0
[    2.591776] [<fffffc0008083380>] ret_from_fork+0x10/0x50
[    2.597147] Code: b00057e1 91304021 91005021 b8626822 (b8606821)
[    2.603464] ---[ end trace 58c0cd36b88802bc ]---
[    2.608138] Kernel panic - not syncing: Fatal exception

Fix by moving call to numa_store_cpu_info() for all CPUs into
smp_prepare_cpus(), which happens before wq_numa_init().  Since
smp_store_cpu_info() now contains only a single function call,
simplify by removing the function and out-lining its contents.

Suggested-by: Robert Richter <rric@kernel.org>
Fixes: 1a2db300348b ("arm64, numa: Add NUMA support for arm64 platforms.")
Cc: <stable@vger.kernel.org> # 4.7.x-
Signed-off-by: David Daney <david.daney@cavium.com>
Reviewed-by: Robert Richter <rrichter@cavium.com>
Tested-by: Yisheng Xie <xieyisheng1@huawei.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
4 days agolocking/hung_task: Fix typo in CONFIG_DETECT_HUNG_TASK help text
Vivien Didelot [Thu, 22 Sep 2016 20:55:13 +0000 (16:55 -0400)]
locking/hung_task: Fix typo in CONFIG_DETECT_HUNG_TASK help text

Fix the indefinitiley -> indefinitely typo in Kconfig.debug.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160922205513.17821-1-vivien.didelot@savoirfairelinux.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
5 days agonvme-rdma: only clear queue flags after successful connect
Sagi Grimberg [Fri, 23 Sep 2016 01:58:17 +0000 (19:58 -0600)]
nvme-rdma: only clear queue flags after successful connect

Otherwise, nvme_rdma_stop_and_clear_queue() will incorrectly
try to stop/free rdma qps/cm_ids that are already freed.

Fixes: e89ca58f9c90 ("nvme-rdma: add DELETING queue flag")
Reported-by: Steve Wise <swise@opengridcomputing.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
5 days agoi2c: qup: skip qup_i2c_suspend if the device is already runtime suspended
Sudeep Holla [Thu, 25 Aug 2016 11:23:39 +0000 (12:23 +0100)]
i2c: qup: skip qup_i2c_suspend if the device is already runtime suspended

If the i2c device is already runtime suspended, if qup_i2c_suspend is
executed during suspend-to-idle or suspend-to-ram it will result in the
following splat:

WARNING: CPU: 3 PID: 1593 at drivers/clk/clk.c:476 clk_core_unprepare+0x80/0x90
Modules linked in:

CPU: 3 PID: 1593 Comm: bash Tainted: G        W       4.8.0-rc3 #14
Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
PC is at clk_core_unprepare+0x80/0x90
LR is at clk_unprepare+0x28/0x40
pc : [<ffff0000086eecf0>] lr : [<ffff0000086f0c58>] pstate: 60000145
Call trace:
 clk_core_unprepare+0x80/0x90
 qup_i2c_disable_clocks+0x2c/0x68
 qup_i2c_suspend+0x10/0x20
 platform_pm_suspend+0x24/0x68
 ...

This patch fixes the issue by executing qup_i2c_pm_suspend_runtime
conditionally in qup_i2c_suspend.

Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Reviewed-by: Andy Gross <andy.gross@linaro.org>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Cc: stable@kernel.org
5 days agoMerge tag 'media/v4.8-7' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab...
Linus Torvalds [Thu, 22 Sep 2016 16:04:49 +0000 (09:04 -0700)]
Merge tag 'media/v4.8-7' of git://git./linux/kernel/git/mchehab/linux-media

Pull media fixes from Mauro Carvalho Chehab:

 - several fixes for new drivers added for Kernel 4.8 addition (cec
   core, pulse8 cec driver and Mediatek vcodec)

 - a regression fix for cx23885 and saa7134 drivers

 - an important fix for rcar-fcp, making rcar_fcp_enable() return 0 on
   success

* tag 'media/v4.8-7' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (25 commits)
  [media] cx23885/saa7134: assign q->dev to the PCI device
  [media] rcar-fcp: Make sure rcar_fcp_enable() returns 0 on success
  [media] cec: fix ioctl return code when not registered
  [media] cec: don't Feature Abort broadcast msgs when unregistered
  [media] vcodec:mediatek: Refine VP8 encoder driver
  [media] vcodec:mediatek: Refine H264 encoder driver
  [media] vcodec:mediatek: change H264 profile default to profile high
  [media] vcodec:mediatek: Add timestamp and timecode copy for V4L2 Encoder
  [media] vcodec:mediatek: Fix visible_height larger than coded_height issue in s_fmt_out
  [media] vcodec:mediatek: Fix fops_vcodec_release flow for V4L2 Encoder
  [media] vcodec:mediatek:code refine for v4l2 Encoder driver
  [media] cec-funcs.h: add missing vendor-specific messages
  [media] cec-edid: check for IEEE identifier
  [media] pulse8-cec: fix error handling
  [media] pulse8-cec: set correct Signal Free Time
  [media] mtk-vcodec: add HAS_DMA dependency
  [media] cec: ignore messages when log_addr_mask == 0
  [media] cec: add item to TODO
  [media] cec: set unclaimed addresses to CEC_LOG_ADDR_INVALID
  [media] cec: add CEC_LOG_ADDRS_FL_ALLOW_UNREG_FALLBACK flag
  ...

5 days agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Linus Torvalds [Thu, 22 Sep 2016 15:49:25 +0000 (08:49 -0700)]
Merge git://git./linux/kernel/git/davem/net

Pull networking fixes from David Miller:
 "Mostly small bits scattered all over the place, which is usually how
  things go this late in the -rc series.

   1) Proper driver init device resets in bnx2, from Baoquan He.

   2) Fix accounting overflow in __tcp_retransmit_skb(),
      sk_forward_alloc, and ip_idents_reserve, from Eric Dumazet.

   3) Fix crash in bna driver ethtool stats handling, from Ivan Vecera.

   4) Missing check of skb_linearize() return value in mac80211, from
      Johannes Berg.

   5) Endianness fix in nf_table_trace dumps, from Liping Zhang.

   6) SSN comparison fix in SCTP, from Marcelo Ricardo Leitner.

   7) Update DSA and b44 MAINTAINERS entries.

   8) Make input path of vti6 driver work again, from Nicolas Dichtel.

   9) Off-by-one in mlx4, from Sebastian Ott.

  10) Fix fallback route lookup handling in ipv6, from Vincent Bernat.

  11) Fix stack corruption on probe in qed driver, from Yuval Mintz.

  12) PHY init fixes in r8152 from Hayes Wang.

  13) Missing SKB free in irda_accept error path, from Phil Turnbull"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (61 commits)
  tcp: properly account Fast Open SYN-ACK retrans
  tcp: fix under-accounting retransmit SNMP counters
  MAINTAINERS: Update b44 maintainer.
  net: get rid of an signed integer overflow in ip_idents_reserve()
  net/mlx4_core: Fix to clean devlink resources
  net: can: ifi: Configure transmitter delay
  vti6: fix input path
  ipmr, ip6mr: return lastuse relative to now
  r8152: disable ALDPS and EEE before setting PHY
  r8152: remove r8153_enable_eee
  r8152: move PHY settings to hw_phy_cfg
  r8152: move enabling PHY
  r8152: move some functions
  cxgb4/cxgb4vf: Allocate more queues for 25G and 100G adapter
  qed: Fix stack corruption on probe
  MAINTAINERS: Add an entry for the core network DSA code
  net: ipv6: fallback to full lookup if table lookup is unsuitable
  net/mlx5: E-Switch, Handle mode change failures
  net/mlx5: E-Switch, Fix error flow in the SRIOV e-switch init code
  net/mlx5: Fix flow counter bulk command out mailbox allocation
  ...

5 days agoperf/core: Limit matching exclusive events to one PMU
Alexander Shishkin [Tue, 20 Sep 2016 15:48:11 +0000 (18:48 +0300)]
perf/core: Limit matching exclusive events to one PMU

An "exclusive" PMU is the one that can only have one event scheduled in
at any given time. There may be more than one of such PMUs in a system,
though, like Intel PT and BTS. It should be allowed to have one event
for either of those inside the same context (there may be other constraints
that may prevent this, but those would be hardware-specific). However,
the exclusivity code is written so that only one event from any of the
"exclusive" PMUs is allowed in a context.

Fix this by making the exclusive event filter explicitly match two events'
PMUs.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/20160920154811.3255-3-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
5 days agoperf/x86/intel/bts: Make it an exclusive PMU
Alexander Shishkin [Tue, 20 Sep 2016 15:48:10 +0000 (18:48 +0300)]
perf/x86/intel/bts: Make it an exclusive PMU

Just like intel_pt, intel_bts can only handle one event at a time,
which is the reason we introduced PERF_PMU_CAP_EXCLUSIVE in the first
place. However, at the moment one can have as many intel_bts events
within the same context at the same time as one pleases. Only one of
them, however, will get scheduled and receive the actual trace data.

Fix this by making intel_bts an "exclusive" PMU.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/20160920154811.3255-2-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
5 days agolocking/atomic, arch/sh: Fix ATOMIC_FETCH_OP()
Peter Zijlstra [Fri, 26 Aug 2016 13:06:04 +0000 (15:06 +0200)]
locking/atomic, arch/sh: Fix ATOMIC_FETCH_OP()

We cannot use the "z" constraint twice, since its a single register
(r0). Change the one not used by movli.l/movco.l to "r".

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Tested-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
5 days agoregmap: fix deadlock on _regmap_raw_write() error path
Nikita Yushchenko [Thu, 22 Sep 2016 09:02:25 +0000 (12:02 +0300)]
regmap: fix deadlock on _regmap_raw_write() error path

Commit 815806e39bf6 ("regmap: drop cache if the bus transfer error")
added a call to regcache_drop_region() to error path in
_regmap_raw_write(). However that path runs with regmap lock taken,
and regcache_drop_region() tries to re-take it, causing a deadlock.
Fix that by calling map->cache_ops->drop() directly.

Signed-off-by: Nikita Yushchenko <nikita.yoush@cogentembedded.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
5 days agocrypto: rsa-pkcs1pad - Handle leading zero for decryption
Herbert Xu [Thu, 22 Sep 2016 09:04:57 +0000 (17:04 +0800)]
crypto: rsa-pkcs1pad - Handle leading zero for decryption

As the software RSA implementation now produces fixed-length
output, we need to eliminate leading zeros in the calling code
instead.

This patch does just that for pkcs1pad decryption while signature
verification was fixed in an earlier patch.

Fixes: 9b45b7bba3d2 ("crypto: rsa - Generate fixed-length output")
Reported-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
5 days agoKEYS: Fix skcipher IV clobbering
Herbert Xu [Tue, 20 Sep 2016 12:35:55 +0000 (20:35 +0800)]
KEYS: Fix skcipher IV clobbering

The IV must not be modified by the skcipher operation so we need
to duplicate it.

Fixes: c3917fd9dfbc ("KEYS: Use skcipher")
Cc: stable@vger.kernel.org
Reported-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
5 days agommc: dw_mmc: fix the spamming log message
Jaehoon Chung [Thu, 22 Sep 2016 05:12:00 +0000 (14:12 +0900)]
mmc: dw_mmc: fix the spamming log message

When there is no Card which is set to "broken-cd", it's displayed a clock
information continuously. Because it's polling for detecting card.
This patch is fixed this problem.

Fixes: 65257a0deed5 ("mmc: dw_mmc: remove UBSAN warning in dw_mci_setup_bus()")
Reported-by: Tobias Jakobi <tjakobi@math.uni-bielefeld.de>
Signed-off-by: Jaehoon Chung <jh80.chung@samsung.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
5 days agotcp: properly account Fast Open SYN-ACK retrans
Yuchung Cheng [Wed, 21 Sep 2016 23:16:15 +0000 (16:16 -0700)]
tcp: properly account Fast Open SYN-ACK retrans

Since the TFO socket is accepted right off SYN-data, the socket
owner can call getsockopt(TCP_INFO) to collect ongoing SYN-ACK
retransmission or timeout stats (i.e., tcpi_total_retrans,
tcpi_retransmits). Currently those stats are only updated
upon handshake completes. This patch fixes it.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 days agotcp: fix under-accounting retransmit SNMP counters
Yuchung Cheng [Wed, 21 Sep 2016 23:16:14 +0000 (16:16 -0700)]
tcp: fix under-accounting retransmit SNMP counters

This patch fixes these under-accounting SNMP rtx stats
LINUX_MIB_TCPFORWARDRETRANS
LINUX_MIB_TCPFASTRETRANS
LINUX_MIB_TCPSLOWSTARTRETRANS
when retransmitting TSO packets

Fixes: 10d3be569243 ("tcp-tso: do not split TSO packets at retransmit time")
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 days agoMerge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec
David S. Miller [Thu, 22 Sep 2016 06:56:23 +0000 (02:56 -0400)]
Merge branch 'master' of git://git./linux/kernel/git/klassert/ipsec

Steffen Klassert says:

====================
pull request (net): ipsec 2016-09-21

1) Propagate errors on security context allocation.
   From Mathias Krause.

2) Fix inbound policy checks for inter address family tunnels.
   From Thomas Zeitlhofer.

3) Fix an old memory leak on aead algorithm usage.
   From Ilan Tayari.

4) A recent patch fixed a possible NULL pointer dereference
   but broke the vti6 input path.
   Fix from Nicolas Dichtel.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 days agoMerge tag 'linux-can-fixes-for-4.8-20160921' of git://git.kernel.org/pub/scm/linux...
David S. Miller [Thu, 22 Sep 2016 06:47:46 +0000 (02:47 -0400)]
Merge tag 'linux-can-fixes-for-4.8-20160921' of git://git./linux/kernel/git/mkl/linux-can

Marc Kleine-Budde says:

====================
pull-request: can 2016-09-21

this is another pull request of one patch for the upcoming linux-4.8 release.

Marek Vasut fixes the CAN-FD bit rate switch in the ifi driver by configuring
the transmitter delay.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 days agoMAINTAINERS: Update b44 maintainer.
Michael Chan [Wed, 21 Sep 2016 03:33:15 +0000 (23:33 -0400)]
MAINTAINERS: Update b44 maintainer.

Taking over as maintainer since Gary Zambrano is no longer working
for Broadcom.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 days agonet: get rid of an signed integer overflow in ip_idents_reserve()
Eric Dumazet [Wed, 21 Sep 2016 01:06:17 +0000 (18:06 -0700)]
net: get rid of an signed integer overflow in ip_idents_reserve()

Jiri Pirko reported an UBSAN warning happening in ip_idents_reserve()

[] UBSAN: Undefined behaviour in ./arch/x86/include/asm/atomic.h:156:11
[] signed integer overflow:
[] -2117905507 + -695755206 cannot be represented in type 'int'

Since we do not have uatomic_add_return() yet, use atomic_cmpxchg()
so that the arithmetics can be done using unsigned int.

Fixes: 04ca6973f7c1 ("ip: make IP identifiers less predictable")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 days agonet/mlx4_core: Fix to clean devlink resources
Kamal Heib [Tue, 20 Sep 2016 11:55:31 +0000 (14:55 +0300)]
net/mlx4_core: Fix to clean devlink resources

This patch cleans devlink resources by calling devlink_port_unregister()
to avoid the following issues:

- Kernel panic when triggering reset flow.
- Memory leak due to unfreed resources in mlx4_init_port_info().

Fixes: 09d4d087cd48 ("mlx4: Implement devlink interface")
Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 days agoMerge tag 'wireless-drivers-for-davem-2016-09-20' of git://git.kernel.org/pub/scm...
David S. Miller [Thu, 22 Sep 2016 01:45:19 +0000 (21:45 -0400)]
Merge tag 'wireless-drivers-for-davem-2016-09-20' of git://git./linux/kernel/git/kvalo/wireless-drivers

Kalle Valo says:

====================
wireless-drivers fixes for 4.8

iwlwifi

* fix to prevent firmware crash when sending off-channel frames
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 days agobtrfs: ensure that file descriptor used with subvol ioctls is a dir
Jeff Mahoney [Wed, 21 Sep 2016 12:31:29 +0000 (08:31 -0400)]
btrfs: ensure that file descriptor used with subvol ioctls is a dir

If the subvol/snapshot create/destroy ioctls are passed a regular file
with execute permissions set, we'll eventually Oops while trying to do
inode->i_op->lookup via lookup_one_len.

This patch ensures that the file descriptor refers to a directory.

Fixes: cb8e70901d (Btrfs: Fix subvolume creation locking rules)
Fixes: 76dda93c6a (Btrfs: add snapshot/subvolume destroy ioctl)
Cc: <stable@vger.kernel.org> #v2.6.29+
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
6 days agoBtrfs: handle quota reserve failure properly
Josef Bacik [Thu, 15 Sep 2016 18:57:48 +0000 (14:57 -0400)]
Btrfs: handle quota reserve failure properly

btrfs/022 was spitting a warning for the case that we exceed the quota.  If we
fail to make our quota reservation we need to clean up our data space
reservation.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fb.com>
Tested-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
6 days agoi2c: mux: pca954x: retry updating the mux selection on failure
Peter Rosin [Wed, 14 Sep 2016 13:24:12 +0000 (15:24 +0200)]
i2c: mux: pca954x: retry updating the mux selection on failure

The cached value of the last selected channel prevents retries on the
next call, even on failure to update the selected channel. Fix that.

Signed-off-by: Peter Rosin <peda@axentia.se>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Cc: stable@kernel.org
6 days agoi2c-eg20t: fix race between i2c init and interrupt enable
Yadi.hu [Sun, 18 Sep 2016 10:52:31 +0000 (18:52 +0800)]
i2c-eg20t: fix race between i2c init and interrupt enable

the eg20t driver call request_irq() function before the pch_base_address,
base address of i2c controller's register, is assigned an effective value.

there is one possible scenario that an interrupt which isn't inside eg20t
arrives immediately after request_irq() is executed when i2c controller
shares an interrupt number with others. since the interrupt handler
pch_i2c_handler() has already active as shared action, it will be called
and read its own register to determine if this interrupt is from itself.

At that moment, since base address of i2c registers is not remapped
in kernel space yet,so the INT handler will access an illegal address
and then a error occurs.

Signed-off-by: Yadi.hu <yadi.hu@windriver.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Cc: stable@kernel.org
6 days agoMIPS: vDSO: Fix Malta EVA mapping to vDSO page structs
James Hogan [Wed, 7 Sep 2016 12:37:01 +0000 (13:37 +0100)]
MIPS: vDSO: Fix Malta EVA mapping to vDSO page structs

The page structures associated with the vDSO pages in the kernel image
are calculated using virt_to_page(), which uses __pa() under the hood to
find the pfn associated with the virtual address. The vDSO data pointers
however point to kernel symbols, so __pa_symbol() should really be used
instead.

Since there is no equivalent to virt_to_page() which uses __pa_symbol(),
fix init_vdso_image() to work directly with pfns, calculated with
__phys_to_pfn(__pa_symbol(...)).

This issue broke the Malta Enhanced Virtual Addressing (EVA)
configuration which has a non-default implementation of __pa_symbol().
This is because it uses a physical alias so that the kernel executes
from KSeg0 (VA 0x80000000 -> PA 0x00000000), while RAM is provided to
the kernel in the KUSeg range (VA 0x00000000 -> PA 0x80000000) which
uses the same underlying RAM.

Since there are no page structures associated with the low physical
address region, some arbitrary kernel memory would be interpreted as a
page structure for the vDSO pages and badness ensues.

Fixes: ebb5e78cc634 ("MIPS: Initial implementation of a VDSO")
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Leonid Yegoshin <leonid.yegoshin@imgtec.com>
Cc: linux-mips@linux-mips.org
Cc: <stable@vger.kernel.org> # 4.4.x-
Patchwork: https://patchwork.linux-mips.org/patch/14229/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
6 days agonet: can: ifi: Configure transmitter delay
Marek Vasut [Mon, 19 Sep 2016 19:34:01 +0000 (21:34 +0200)]
net: can: ifi: Configure transmitter delay

Configure the transmitter delay register at +0x1c to correctly handle
the CAN FD bitrate switch (BRS). This moves the SSP (secondary sample
point) to a proper offset, so that the TDC mechanism works and won't
generate error frames on the CAN link.

Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Marc Kleine-Budde <mkl@pengutronix.de>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Oliver Hartkopp <socketcan@hartkopp.net>
Cc: Wolfgang Grandegger <wg@grandegger.com>
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
6 days agovti6: fix input path
Nicolas Dichtel [Mon, 19 Sep 2016 14:17:57 +0000 (16:17 +0200)]
vti6: fix input path

Since commit 1625f4529957, vti6 is broken, all input packets are dropped
(LINUX_MIB_XFRMINNOSTATES is incremented).

XFRM_TUNNEL_SKB_CB(skb)->tunnel.ip6 is set by vti6_rcv() before calling
xfrm6_rcv()/xfrm6_rcv_spi(), thus we cannot set to NULL that value in
xfrm6_rcv_spi().

A new function xfrm6_rcv_tnl() that enables to pass a value to
xfrm6_rcv_spi() is added, so that xfrm6_rcv() is not touched (this function
is used in several handlers).

CC: Alexey Kodanev <alexey.kodanev@oracle.com>
Fixes: 1625f4529957 ("net/xfrm_input: fix possible NULL deref of tunnel.ip6->parms.i_key")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
7 days agoipmr, ip6mr: return lastuse relative to now
Nikolay Aleksandrov [Tue, 20 Sep 2016 14:17:22 +0000 (16:17 +0200)]
ipmr, ip6mr: return lastuse relative to now

When I introduced the lastuse member I made a subtle error because it was
returned as an absolute value but that is meaningless to user-space as it
doesn't allow to see how old exactly an entry is. Let's make it similar to
how the bridge returns such values and make it relative to "now" (jiffies).
This allows us to show the actual age of the entries and is much more
useful (e.g. user-space daemons can age out entries, iproute2 can display
the lastuse properly).

Fixes: 43b9e1274060 ("net: ipmr/ip6mr: add support for keeping an entry age")
Reported-by: Satish Ashok <sashok@cumulusnetworks.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 days agoMerge branch 'r8152-phy-fixes'
David S. Miller [Wed, 21 Sep 2016 04:53:53 +0000 (00:53 -0400)]
Merge branch 'r8152-phy-fixes'

Hayes Wang says:

====================
r8152: correct the flow of PHY

First, to enable the PHY as early as possible. Some settings may fail if the
PHY is power down.

Move the other PHY settings to hw_phy_cfg() to make sure the order is correct.

Finally, disable ALDPS and EEE before updating the PHY for RTL8153.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 days agor8152: disable ALDPS and EEE before setting PHY
hayeswang [Tue, 20 Sep 2016 08:22:09 +0000 (16:22 +0800)]
r8152: disable ALDPS and EEE before setting PHY

Disable ALDPS and EEE to avoid the possible failure when setting the PHY.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 days agor8152: remove r8153_enable_eee
hayeswang [Tue, 20 Sep 2016 08:22:08 +0000 (16:22 +0800)]
r8152: remove r8153_enable_eee

Remove r8153_enable_eee().

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 days agor8152: move PHY settings to hw_phy_cfg
hayeswang [Tue, 20 Sep 2016 08:22:07 +0000 (16:22 +0800)]
r8152: move PHY settings to hw_phy_cfg

Move the PHY relative settings together to hw_phy_cfg().

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 days agor8152: move enabling PHY
hayeswang [Tue, 20 Sep 2016 08:22:06 +0000 (16:22 +0800)]
r8152: move enabling PHY

Move enabling PHY to init(), otherwise some other settings may fail.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 days agor8152: move some functions
hayeswang [Tue, 20 Sep 2016 08:22:05 +0000 (16:22 +0800)]
r8152: move some functions

Move the following functions forward.

r8152_mmd_indirect()
r8152_mmd_read()
r8152_mmd_write()
r8152_eee_en()
r8152b_enable_eee()
r8153_eee_en()
r8153_enable_eee()
r8152b_enable_fc()
r8153_aldps_en()

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 days agocxgb4/cxgb4vf: Allocate more queues for 25G and 100G adapter
Hariprasad Shenai [Tue, 20 Sep 2016 06:30:52 +0000 (12:00 +0530)]
cxgb4/cxgb4vf: Allocate more queues for 25G and 100G adapter

We were missing check for 25G and 100G while checking port speed,
which lead to less number of queues getting allocated for 25G & 100G
adapters and leading to low throughput. Adding the missing check for
both NIC and vNIC driver.

Also fixes port advertisement for 25G and 100G in ethtool output.

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 days agopowerpc/powernv/pci: Fix m64 checks for SR-IOV and window alignment
Russell Currey [Wed, 14 Sep 2016 06:37:17 +0000 (16:37 +1000)]
powerpc/powernv/pci: Fix m64 checks for SR-IOV and window alignment

Commit 5958d19a143e checks for prefetchable m64 BARs by comparing the
addresses instead of using resource flags.  This broke SR-IOV as the m64
check in pnv_pci_ioda_fixup_iov_resources() fails.

The condition in pnv_pci_window_alignment() also changed to checking
only IORESOURCE_MEM_64 instead of both IORESOURCE_MEM_64 and
IORESOURCE_PREFETCH.

Revert these cases to the previous behaviour, adding a new helper function
to do so.  This is named pnv_pci_is_m64_flags() to make it clear this
function is only looking at resource flags and should not be relied on for
non-SRIOV resources.

Fixes: 5958d19a143e ("Fix incorrect PE reservation attempt on some 64-bit BARs")
Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Russell Currey <ruscur@russell.cc>
Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
7 days agoMerge tag 'linux-can-fixes-for-4.8-20160919' of git://git.kernel.org/pub/scm/linux...
David S. Miller [Wed, 21 Sep 2016 02:46:14 +0000 (22:46 -0400)]
Merge tag 'linux-can-fixes-for-4.8-20160919' of git://git./linux/kernel/git/mkl/linux-can

Marc Kleine-Budde says:

====================
pull-request: can 2016-09-19

this is a pull request of one patch for the upcoming linux-4.8 release.

The patch by Fabio Estevam fixes the pm handling in the flexcan driver.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 days agoMerge tag 'usercopy-v4.8-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/kees...
Linus Torvalds [Wed, 21 Sep 2016 00:11:19 +0000 (17:11 -0700)]
Merge tag 'usercopy-v4.8-rc8' of git://git./linux/kernel/git/kees/linux

Pull usercopy hardening fix from Kees Cook:
 "Expand the arm64 vmalloc check to include skipping the module space
  too"

* tag 'usercopy-v4.8-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  mm: usercopy: Check for module addresses

7 days agofix fault_in_multipages_...() on architectures with no-op access_ok()
Al Viro [Tue, 20 Sep 2016 19:07:42 +0000 (20:07 +0100)]
fix fault_in_multipages_...() on architectures with no-op access_ok()

Switching iov_iter fault-in to multipages variants has exposed an old
bug in underlying fault_in_multipages_...(); they break if the range
passed to them wraps around.  Normally access_ok() done by callers will
prevent such (and it's a guaranteed EFAULT - ERR_PTR() values fall into
such a range and they should not point to any valid objects).

However, on architectures where userland and kernel live in different
MMU contexts (e.g. s390) access_ok() is a no-op and on those a range
with a wraparound can reach fault_in_multipages_...().

Since any wraparound means EFAULT there, the fix is trivial - turn
those

    while (uaddr <= end)
    ...
into

    if (unlikely(uaddr > end))
    return -EFAULT;
    do
    ...
    while (uaddr <= end);

Reported-by: Jan Stancek <jstancek@redhat.com>
Tested-by: Jan Stancek <jstancek@redhat.com>
Cc: stable@vger.kernel.org # v3.5+
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 days agomm: usercopy: Check for module addresses
Laura Abbott [Tue, 20 Sep 2016 15:56:36 +0000 (08:56 -0700)]
mm: usercopy: Check for module addresses

While running a compile on arm64, I hit a memory exposure

usercopy: kernel memory exposure attempt detected from fffffc0000f3b1a8 (buffer_head) (1 bytes)
------------[ cut here ]------------
kernel BUG at mm/usercopy.c:75!
Internal error: Oops - BUG: 0 [#1] SMP
Modules linked in: ip6t_rpfilter ip6t_REJECT
nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_broute bridge stp
llc ebtable_nat ip6table_security ip6table_raw ip6table_nat
nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle
iptable_security iptable_raw iptable_nat nf_conntrack_ipv4
nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle
ebtable_filter ebtables ip6table_filter ip6_tables vfat fat xgene_edac
xgene_enet edac_core i2c_xgene_slimpro i2c_core at803x realtek xgene_dma
mdio_xgene gpio_dwapb gpio_xgene_sb xgene_rng mailbox_xgene_slimpro nfsd
auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c sdhci_of_arasan
sdhci_pltfm sdhci mmc_core xhci_plat_hcd gpio_keys
CPU: 0 PID: 19744 Comm: updatedb Tainted: G        W 4.8.0-rc3-threadinfo+ #1
Hardware name: AppliedMicro X-Gene Mustang Board/X-Gene Mustang Board, BIOS 3.06.12 Aug 12 2016
task: fffffe03df944c00 task.stack: fffffe00d128c000
PC is at __check_object_size+0x70/0x3f0
LR is at __check_object_size+0x70/0x3f0
...
[<fffffc00082b4280>] __check_object_size+0x70/0x3f0
[<fffffc00082cdc30>] filldir64+0x158/0x1a0
[<fffffc0000f327e8>] __fat_readdir+0x4a0/0x558 [fat]
[<fffffc0000f328d4>] fat_readdir+0x34/0x40 [fat]
[<fffffc00082cd8f8>] iterate_dir+0x190/0x1e0
[<fffffc00082cde58>] SyS_getdents64+0x88/0x120
[<fffffc0008082c70>] el0_svc_naked+0x24/0x28

fffffc0000f3b1a8 is a module address. Modules may have compiled in
strings which could get copied to userspace. In this instance, it
looks like "." which matches with a size of 1 byte. Extend the
is_vmalloc_addr check to be is_vmalloc_or_module_addr to cover
all possible cases.

Signed-off-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
7 days agoirqchip/mips-gic: Fix local interrupts
Paul Burton [Tue, 13 Sep 2016 16:53:35 +0000 (17:53 +0100)]
irqchip/mips-gic: Fix local interrupts

Since the device hierarchy domain was added by commit c98c1822ee13
("irqchip/mips-gic: Add device hierarchy domain"), GIC local interrupts
have been broken.

Users attempting to setup a per-cpu local IRQ, for example the GIC timer
clock events code in drivers/clocksource/mips-gic-timer.c, the
setup_percpu_irq function would refuse with -EINVAL because the GIC
irqchip driver never called irq_set_percpu_devid so the
IRQ_PER_CPU_DEVID flag was never set for the IRQ. This happens because
irq_set_percpu_devid was being called from the gic_irq_domain_map
function which is no longer called.

Doing only that runs into further problems because gic_dev_domain_alloc
set the struct irq_chip for all interrupts, local or shared, to
gic_level_irq_controller despite that only being suitable for shared
interrupts. The typical outcome of this is that gic_level_irq_controller
callback functions are called for local interrupts, and then hwirq
number calculations overflow & the driver ends up attempting to access
some invalid register with an address calculated from an invalid hwirq
number. Best case scenario is that this then leads to a bus error. This
is fixed by abstracting the setup of the hwirq & chip to a new function
gic_setup_dev_chip which is used by both the root GIC IRQ domain & the
device domain.

Finally, decoding local interrupts failed because gic_dev_domain_alloc
only called irq_domain_alloc_irqs_parent for shared interrupts. Local
ones were therefore never associated with hwirqs in the root GIC IRQ
domain and the virq in gic_handle_local_int would always be 0. This is
fixed by calling irq_domain_alloc_irqs_parent unconditionally & having
gic_irq_domain_alloc handle both local & shared interrupts, which is
easy due to the aforementioned abstraction of chip setup into
gic_setup_dev_chip.

This fixes use of the MIPS GIC timer for clock events, which has been
broken since c98c1822ee13 ("irqchip/mips-gic: Add device hierarchy
domain") but hadn't been noticed due to a silent fallback to the MIPS
coprocessor 0 count/compare clock events device.

Fixes: c98c1822ee13 ("irqchip/mips-gic: Add device hierarchy domain")
Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Cc: linux-mips@linux-mips.org
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: Qais Yousef <qsyousef@gmail.com>
Cc: stable@vger.kernel.org
Cc: Marc Zyngier <marc.zyngier@arm.com>
Link: http://lkml.kernel.org/r/20160913165335.31389-1-paul.burton@imgtec.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
7 days agofs/proc/kcore.c: Add bounce buffer for ktext data
Jiri Olsa [Thu, 8 Sep 2016 07:57:08 +0000 (09:57 +0200)]
fs/proc/kcore.c: Add bounce buffer for ktext data

We hit hardened usercopy feature check for kernel text access by reading
kcore file:

  usercopy: kernel memory exposure attempt detected from ffffffff8179a01f (<kernel text>) (4065 bytes)
  kernel BUG at mm/usercopy.c:75!

Bypassing this check for kcore by adding bounce buffer for ktext data.

Reported-by: Steve Best <sbest@redhat.com>
Fixes: f5509cc18daa ("mm: Hardened usercopy")
Suggested-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 days agofs/proc/kcore.c: Make bounce buffer global for read
Jiri Olsa [Thu, 8 Sep 2016 07:57:07 +0000 (09:57 +0200)]
fs/proc/kcore.c: Make bounce buffer global for read

Next patch adds bounce buffer for ktext area, so it's
convenient to have single bounce buffer for both
vmalloc/module and ktext cases.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 days agoMerge tag 'efi-urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/mfleming...
Ingo Molnar [Tue, 20 Sep 2016 14:56:56 +0000 (16:56 +0200)]
Merge tag 'efi-urgent' of git://git./linux/kernel/git/mfleming/efi into efi/urgent

Pull EFI fixes from Matt Fleming:

 * Fix a boot hang on large memory machines (multiple terabyte) caused
   by type conversion errors in the x86 PAT code (Matt Fleming)

Signed-off-by: Ingo Molnar <mingo@kernel.org>
7 days agoperf/x86/intel/bts: Make sure debug store is valid
Sebastian Andrzej Siewior [Tue, 20 Sep 2016 13:12:21 +0000 (15:12 +0200)]
perf/x86/intel/bts: Make sure debug store is valid

Since commit 4d4c47412464 ("perf/x86/intel/bts: Fix BTS PMI detection")
my box goes boom on boot:

| .... node  #0, CPUs:      #1 #2 #3 #4 #5 #6 #7
| BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
| IP: [<ffffffff8100c463>] intel_bts_interrupt+0x43/0x130
| Call Trace:
|  <NMI> d [<ffffffff8100b341>] intel_pmu_handle_irq+0x51/0x4b0
|  [<ffffffff81004d47>] perf_event_nmi_handler+0x27/0x40

This happens because the code introduced in this commit dereferences the
debug store pointer unconditionally. The debug store is not guaranteed to
be available, so a NULL pointer check as on other places is required.

Fixes: 4d4c47412464 ("perf/x86/intel/bts: Fix BTS PMI detection")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: vince@deater.net
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/20160920131220.xg5pbdjtznszuyzb@breakpoint.cc
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
7 days agox86/efi: Only map RAM into EFI page tables if in mixed-mode
Matt Fleming [Mon, 19 Sep 2016 12:09:09 +0000 (13:09 +0100)]
x86/efi: Only map RAM into EFI page tables if in mixed-mode

Waiman reported that booting with CONFIG_EFI_MIXED enabled on his
multi-terabyte HP machine results in boot crashes, because the EFI
region mapping functions loop forever while trying to map those
regions describing RAM.

While this patch doesn't fix the underlying hang, there's really no
reason to map EFI_CONVENTIONAL_MEMORY regions into the EFI page tables
when mixed-mode is not in use at runtime.

Reported-by: Waiman Long <waiman.long@hpe.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
CC: Theodore Ts'o <tytso@mit.edu>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Scott J Norton <scott.norton@hpe.com>
Cc: Douglas Hatch <doug.hatch@hpe.com>
Cc: <stable@vger.kernel.org> # v4.6+
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
7 days agox86/mm/pat: Prevent hang during boot when mapping pages
Matt Fleming [Tue, 20 Sep 2016 13:26:21 +0000 (14:26 +0100)]
x86/mm/pat: Prevent hang during boot when mapping pages

There's a mixture of signed 32-bit and unsigned 32-bit and 64-bit data
types used for keeping track of how many pages have been mapped.

This leads to hangs during boot when mapping large numbers of pages
(multiple terabytes, as reported by Waiman) because those values are
interpreted as being negative.

commit 742563777e8d ("x86/mm/pat: Avoid truncation when converting
cpa->numpages to address") fixed one of those bugs, but there is
another lurking in __change_page_attr_set_clr().

Additionally, the return value type for the populate_*() functions can
return negative values when a large number of pages have been mapped,
triggering the error paths even though no error occurred.

Consistently use 64-bit types on 64-bit platforms when counting pages.
Even in the signed case this gives us room for regions 8PiB
(pebibytes) in size whilst still allowing the usual negative value
error checking idiom.

Reported-by: Waiman Long <waiman.long@hpe.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
CC: Theodore Ts'o <tytso@mit.edu>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Scott J Norton <scott.norton@hpe.com>
Cc: Douglas Hatch <doug.hatch@hpe.com>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
7 days agoqed: Fix stack corruption on probe
Yuval Mintz [Mon, 19 Sep 2016 14:47:41 +0000 (17:47 +0300)]
qed: Fix stack corruption on probe

Commit fe56b9e6a8d95 ("qed: Add module with basic common support")
has introduced a stack corruption during probe, where filling a
local struct with data to be sent to management firmware is incorrectly
filled; The data is written outside of the struct and corrupts
the stack.

Changes from v1:
----------------
 - Correct the value written [Caught by David Laight]

Fixes: fe56b9e6a8d95 ("qed: Add module with basic common support")
Signed-off-by: Yuval Mintz <Yuval.Mintz@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 days agoMAINTAINERS: Add an entry for the core network DSA code
Andrew Lunn [Sun, 18 Sep 2016 19:17:19 +0000 (21:17 +0200)]
MAINTAINERS: Add an entry for the core network DSA code

The core distributed switch architecture code currently does not have
a MAINTAINERS entry, which results in some contributions not landing
in the right peoples inbox.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 days agonet: ipv6: fallback to full lookup if table lookup is unsuitable
Vincent Bernat [Sun, 18 Sep 2016 15:46:07 +0000 (17:46 +0200)]
net: ipv6: fallback to full lookup if table lookup is unsuitable

Commit 8c14586fc320 ("net: ipv6: Use passed in table for nexthop
lookups") introduced a regression: insertion of an IPv6 route in a table
not containing the appropriate connected route for the gateway but which
contained a non-connected route (like a default gateway) fails while it
was previously working:

    $ ip link add eth0 type dummy
    $ ip link set up dev eth0
    $ ip addr add 2001:db8::1/64 dev eth0
    $ ip route add ::/0 via 2001:db8::5 dev eth0 table 20
    $ ip route add 2001:db8:cafe::1/128 via 2001:db8::6 dev eth0 table 20
    RTNETLINK answers: No route to host
    $ ip -6 route show table 20
    default via 2001:db8::5 dev eth0  metric 1024  pref medium

After this patch, we get:

    $ ip route add 2001:db8:cafe::1/128 via 2001:db8::6 dev eth0 table 20
    $ ip -6 route show table 20
    2001:db8:cafe::1 via 2001:db8::6 dev eth0  metric 1024  pref medium
    default via 2001:db8::5 dev eth0  metric 1024  pref medium

Fixes: 8c14586fc320 ("net: ipv6: Use passed in table for nexthop lookups")
Signed-off-by: Vincent Bernat <vincent@bernat.im>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Tested-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 days agoMerge branch 'mlx5-fixes'
David S. Miller [Tue, 20 Sep 2016 02:10:25 +0000 (22:10 -0400)]
Merge branch 'mlx5-fixes'

Or Gerlitz says:

====================
mlx5 fixes to 4.8-rc6

This series series has a fix from Roi to memory corruption bug in
the bulk flow counters code and two late and hopefully last fixes
from me to the new eswitch offloads code.

Series done over net commit 37dd348 "bna: fix crash in bnad_get_strings()"
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 days agonet/mlx5: E-Switch, Handle mode change failures
Or Gerlitz [Sun, 18 Sep 2016 15:20:29 +0000 (18:20 +0300)]
net/mlx5: E-Switch, Handle mode change failures

E-switch mode changes involve creating HW tables, potentially allocating
netdevices, etc, and things can fail. Add an attempt to rollback to the
existing mode when changing to the new mode fails. Only if rollback fails,
getting proper SRIOV functionality requires module unload or sriov
disablement/enablement.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 days agonet/mlx5: E-Switch, Fix error flow in the SRIOV e-switch init code
Or Gerlitz [Sun, 18 Sep 2016 15:20:28 +0000 (18:20 +0300)]
net/mlx5: E-Switch, Fix error flow in the SRIOV e-switch init code

When enablement of the SRIOV e-switch in certain mode (switchdev or legacy)
fails, we must set the mode to none. Otherwise, we'll run into double free
based crashes when further attempting to deal with the e-switch (such
as when disabling sriov or unloading the driver).

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 days agonet/mlx5: Fix flow counter bulk command out mailbox allocation
Roi Dayan [Sun, 18 Sep 2016 15:20:27 +0000 (18:20 +0300)]
net/mlx5: Fix flow counter bulk command out mailbox allocation

The FW command output length should be only the length of struct
mlx5_cmd_fc_bulk out field. Failing to do so will cause the memcpy
call which is invoked later in the driver to write over wrong memory
address and corrupt kernel memory which results in random crashes.

This bug was found using the kernel address sanitizer (kasan).

Fixes: a351a1b03bf1 ('net/mlx5: Introduce bulk reading of flow counters')
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 days agoirqchip/gicv3: Silence noisy DEBUG_PER_CPU_MAPS warning
James Morse [Mon, 19 Sep 2016 17:29:15 +0000 (18:29 +0100)]
irqchip/gicv3: Silence noisy DEBUG_PER_CPU_MAPS warning

gic_raise_softirq() walks the list of cpus using for_each_cpu(), it calls
gic_compute_target_list() which advances the iterator by the number of
CPUs in the cluster.

If gic_compute_target_list() reaches the last CPU it leaves the iterator
pointing at the last CPU. This means the next time round the for_each_cpu()
loop cpumask_next() will be called with an invalid CPU.

This triggers a warning when built with CONFIG_DEBUG_PER_CPU_MAPS:
[    3.077738] GICv3: CPU1: found redistributor 1 region 0:0x000000002f120000
[    3.077943] CPU1: Booted secondary processor [410fd0f0]
[    3.078542] ------------[ cut here ]------------
[    3.078746] WARNING: CPU: 1 PID: 0 at ../include/linux/cpumask.h:121 gic_raise_softirq+0x12c/0x170
[    3.078812] Modules linked in:
[    3.078869]
[    3.078930] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.8.0-rc5+ #5188
[    3.078994] Hardware name: Foundation-v8A (DT)
[    3.079059] task: ffff80087a1a0080 task.stack: ffff80087a19c000
[    3.079145] PC is at gic_raise_softirq+0x12c/0x170
[    3.079226] LR is at gic_raise_softirq+0xa4/0x170
[    3.079296] pc : [<ffff0000083ead24>] lr : [<ffff0000083eac9c>] pstate: 200001c9
[    3.081139] Call trace:
[    3.081202] Exception stack(0xffff80087a19fbe0 to 0xffff80087a19fd10)

[    3.082269] [<ffff0000083ead24>] gic_raise_softirq+0x12c/0x170
[    3.082354] [<ffff00000808e614>] smp_send_reschedule+0x34/0x40
[    3.082433] [<ffff0000080e80a0>] resched_curr+0x50/0x88
[    3.082512] [<ffff0000080e89d0>] check_preempt_curr+0x60/0xd0
[    3.082593] [<ffff0000080e8a60>] ttwu_do_wakeup+0x20/0xe8
[    3.082672] [<ffff0000080e8bb8>] ttwu_do_activate+0x90/0xc0
[    3.082753] [<ffff0000080ea9a4>] try_to_wake_up+0x224/0x370
[    3.082836] [<ffff0000080eabc8>] default_wake_function+0x10/0x18
[    3.082920] [<ffff000008103134>] __wake_up_common+0x5c/0xa0
[    3.083003] [<ffff0000081031f4>] __wake_up_locked+0x14/0x20
[    3.083086] [<ffff000008103f80>] complete+0x40/0x60
[    3.083168] [<ffff00000808df7c>] secondary_start_kernel+0x15c/0x1d0
[    3.083240] [<00000000808911a4>] 0x808911a4
[    3.113401] Detected PIPT I-cache on CPU2

Avoid updating the iterator if the next call to cpumask_next() would
cause the for_each_cpu() loop to exit.

There is no change to gic_raise_softirq()'s behaviour, (cpumask_next()s
eventual call to _find_next_bit() will return early as start >= nbits),
this patch just silences the warning.

Fixes: 021f653791ad ("irqchip: gic-v3: Initial support for GICv3")
Signed-off-by: James Morse <james.morse@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Jason Cooper <jason@lakedaemon.net>
Link: http://lkml.kernel.org/r/1474306155-3303-1-git-send-email-james.morse@arm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
8 days agoMerge branch 'akpm' (patches from Andrew)
Linus Torvalds [Mon, 19 Sep 2016 23:08:03 +0000 (16:08 -0700)]
Merge branch 'akpm' (patches from Andrew)

Merge fixes from Andrew Morton:
 "20 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  rapidio/rio_cm: avoid GFP_KERNEL in atomic context
  Revert "ocfs2: bump up o2cb network protocol version"
  ocfs2: fix start offset to ocfs2_zero_range_for_truncate()
  cgroup: duplicate cgroup reference when cloning sockets
  mm: memcontrol: make per-cpu charge cache IRQ-safe for socket accounting
  ocfs2: fix double unlock in case retry after free truncate log
  fanotify: fix list corruption in fanotify_get_response()
  fsnotify: add a way to stop queueing events on group shutdown
  ocfs2: fix trans extend while free cached blocks
  ocfs2: fix trans extend while flush truncate log
  ipc/shm: fix crash if CONFIG_SHMEM is not set
  mm: fix the page_swap_info() BUG_ON check
  autofs: use dentry flags to block walks during expire
  MAINTAINERS: update email for VLYNQ bus entry
  mm: avoid endless recursion in dump_page()
  mm, thp: fix leaking mapped pte in __collapse_huge_page_swapin()
  khugepaged: fix use-after-free in collapse_huge_page()
  MAINTAINERS: Maik has moved
  ocfs2/dlm: fix race between convert and migration
  mem-hotplug: don't clear the only node in new_node_page()

8 days agorapidio/rio_cm: avoid GFP_KERNEL in atomic context
Alexandre Bounine [Mon, 19 Sep 2016 21:44:47 +0000 (14:44 -0700)]
rapidio/rio_cm: avoid GFP_KERNEL in atomic context

As reported by Alexey Khoroshilov (https://lkml.org/lkml/2016/9/9/737):
riocm_send_close() is called from rio_cm_shutdown() under
spin_lock_bh(idr_lock), but riocm_send_close() uses a GFP_KERNEL
allocation.

Fix by taking riocm_send_close() outside of spinlock protected code.

[akpm@linux-foundation.org: remove unneeded `if (!list_empty())']
Link: http://lkml.kernel.org/r/20160915175402.10122-1-alexandre.bounine@idt.com
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Reported-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 days agoRevert "ocfs2: bump up o2cb network protocol version"
Junxiao Bi [Mon, 19 Sep 2016 21:44:44 +0000 (14:44 -0700)]
Revert "ocfs2: bump up o2cb network protocol version"

This reverts commit 38b52efd218b ("ocfs2: bump up o2cb network protocol
version").

This commit made rolling upgrade fail.  When one node is upgraded to new
version with this commit, the remaining nodes will fail to establish
connections to it, then the application like VMs on the remaining nodes
can't be live migrated to the upgraded one.  This will cause an outage.
Since negotiate hb timeout behavior didn't change without this commit,
so revert it.

Fixes: 38b52efd218bf ("ocfs2: bump up o2cb network protocol version")
Link: http://lkml.kernel.org/r/1471396924-10375-1-git-send-email-junxiao.bi@oracle.com
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Joseph Qi <joseph.qi@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 days agoocfs2: fix start offset to ocfs2_zero_range_for_truncate()
Ashish Samant [Mon, 19 Sep 2016 21:44:42 +0000 (14:44 -0700)]
ocfs2: fix start offset to ocfs2_zero_range_for_truncate()

If we punch a hole on a reflink such that following conditions are met:

1. start offset is on a cluster boundary
2. end offset is not on a cluster boundary
3. (end offset is somewhere in another extent) or
   (hole range > MAX_CONTIG_BYTES(1MB)),

we dont COW the first cluster starting at the start offset.  But in this
case, we were wrongly passing this cluster to
ocfs2_zero_range_for_truncate() to zero out.  This will modify the
cluster in place and zero it in the source too.

Fix this by skipping this cluster in such a scenario.

To reproduce:

1. Create a random file of say 10 MB
     xfs_io -c 'pwrite -b 4k 0 10M' -f 10MBfile
2. Reflink  it
     reflink -f 10MBfile reflnktest
3. Punch a hole at starting at cluster boundary  with range greater that
1MB. You can also use a range that will put the end offset in another
extent.
     fallocate -p -o 0 -l 1048615 reflnktest
4. sync
5. Check the  first cluster in the source file. (It will be zeroed out).
    dd if=10MBfile iflag=direct bs=<cluster size> count=1 | hexdump -C

Link: http://lkml.kernel.org/r/1470957147-14185-1-git-send-email-ashish.samant@oracle.com
Signed-off-by: Ashish Samant <ashish.samant@oracle.com>
Reported-by: Saar Maoz <saar.maoz@oracle.com>
Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <joseph.qi@huawei.com>
Cc: Eric Ren <zren@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 days agocgroup: duplicate cgroup reference when cloning sockets
Johannes Weiner [Mon, 19 Sep 2016 21:44:38 +0000 (14:44 -0700)]
cgroup: duplicate cgroup reference when cloning sockets

When a socket is cloned, the associated sock_cgroup_data is duplicated
but not its reference on the cgroup.  As a result, the cgroup reference
count will underflow when both sockets are destroyed later on.

Fixes: bd1060a1d671 ("sock, cgroup: add sock->sk_cgroup")
Link: http://lkml.kernel.org/r/20160914194846.11153-2-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Cc: <stable@vger.kernel.org> [4.5+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 days agomm: memcontrol: make per-cpu charge cache IRQ-safe for socket accounting
Johannes Weiner [Mon, 19 Sep 2016 21:44:36 +0000 (14:44 -0700)]
mm: memcontrol: make per-cpu charge cache IRQ-safe for socket accounting

During cgroup2 rollout into production, we started encountering css
refcount underflows and css access crashes in the memory controller.
Splitting the heavily shared css reference counter into logical users
narrowed the imbalance down to the cgroup2 socket memory accounting.

The problem turns out to be the per-cpu charge cache.  Cgroup1 had a
separate socket counter, but the new cgroup2 socket accounting goes
through the common charge path that uses a shared per-cpu cache for all
memory that is being tracked.  Those caches are safe against scheduling
preemption, but not against interrupts - such as the newly added packet
receive path.  When cache draining is interrupted by network RX taking
pages out of the cache, the resuming drain operation will put references
of in-use pages, thus causing the imbalance.

Disable IRQs during all per-cpu charge cache operations.

Fixes: f7e1cb6ec51b ("mm: memcontrol: account socket memory in unified hierarchy memory controller")
Link: http://lkml.kernel.org/r/20160914194846.11153-1-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Cc: <stable@vger.kernel.org> [4.5+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 days agoocfs2: fix double unlock in case retry after free truncate log
Joseph Qi [Mon, 19 Sep 2016 21:44:33 +0000 (14:44 -0700)]
ocfs2: fix double unlock in case retry after free truncate log

If ocfs2_reserve_cluster_bitmap_bits() fails with ENOSPC, it will try to
free truncate log and then retry.  Since ocfs2_try_to_free_truncate_log
will lock/unlock global bitmap inode, we have to unlock it before
calling this function.  But when retry reserve and it fails with no
global bitmap inode lock taken, it will unlock again in error handling
branch and BUG.

This issue also exists if no need retry and then ocfs2_inode_lock fails.
So fix it.

Fixes: 2070ad1aebff ("ocfs2: retry on ENOSPC if sufficient space in truncate log")
Link: http://lkml.kernel.org/r/57D91939.6030809@huawei.com
Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Jiufei Xue <xuejiufei@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 days agofanotify: fix list corruption in fanotify_get_response()
Jan Kara [Mon, 19 Sep 2016 21:44:30 +0000 (14:44 -0700)]
fanotify: fix list corruption in fanotify_get_response()

fanotify_get_response() calls fsnotify_remove_event() when it finds that
group is being released from fanotify_release() (bypass_perm is set).

However the event it removes need not be only in the group's notification
queue but it can have already moved to access_list (userspace read the
event before closing the fanotify instance fd) which is protected by a
different lock.  Thus when fsnotify_remove_event() races with
fanotify_release() operating on access_list, the list can get corrupted.

Fix the problem by moving all the logic removing permission events from
the lists to one place - fanotify_release().

Fixes: 5838d4442bd5 ("fanotify: fix double free of pending permission events")
Link: http://lkml.kernel.org/r/1473797711-14111-3-git-send-email-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Reported-by: Miklos Szeredi <mszeredi@redhat.com>
Tested-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Miklos Szeredi <mszeredi@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 days agofsnotify: add a way to stop queueing events on group shutdown
Jan Kara [Mon, 19 Sep 2016 21:44:27 +0000 (14:44 -0700)]
fsnotify: add a way to stop queueing events on group shutdown

Implement a function that can be called when a group is being shutdown
to stop queueing new events to the group.  Fanotify will use this.

Fixes: 5838d4442bd5 ("fanotify: fix double free of pending permission events")
Link: http://lkml.kernel.org/r/1473797711-14111-2-git-send-email-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Miklos Szeredi <mszeredi@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 days agoocfs2: fix trans extend while free cached blocks
Junxiao Bi [Mon, 19 Sep 2016 21:44:24 +0000 (14:44 -0700)]
ocfs2: fix trans extend while free cached blocks

The root cause of this issue is the same with the one fixed by the last
patch, but this time credits for allocator inode and group descriptor
may not be consumed before trans extend.

The following error was caught:

  WARNING: CPU: 0 PID: 2037 at fs/jbd2/transaction.c:269 start_this_handle+0x4c3/0x510 [jbd2]()
  Modules linked in: ocfs2 nfsd lockd grace nfs_acl auth_rpcgss sunrpc autofs4 ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs sd_mod sg ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ppdev xen_kbdfront fb_sys_fops sysimgblt sysfillrect syscopyarea xen_netfront parport_pc parport pcspkr i2c_piix4 i2c_core acpi_cpufreq ext4 jbd2 mbcache xen_blkfront floppy pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod
  CPU: 0 PID: 2037 Comm: rm Tainted: G        W       4.1.12-37.6.3.el6uek.bug24573128v2.x86_64 #2
  Hardware name: Xen HVM domU, BIOS 4.4.4OVM 02/11/2016
  Call Trace:
    dump_stack+0x48/0x5c
    warn_slowpath_common+0x95/0xe0
    warn_slowpath_null+0x1a/0x20
    start_this_handle+0x4c3/0x510 [jbd2]
    jbd2__journal_restart+0x161/0x1b0 [jbd2]
    jbd2_journal_restart+0x13/0x20 [jbd2]
    ocfs2_extend_trans+0x74/0x220 [ocfs2]
    ocfs2_free_cached_blocks+0x16b/0x4e0 [ocfs2]
    ocfs2_run_deallocs+0x70/0x270 [ocfs2]
    ocfs2_commit_truncate+0x474/0x6f0 [ocfs2]
    ocfs2_truncate_for_delete+0xbd/0x380 [ocfs2]
    ocfs2_wipe_inode+0x136/0x6a0 [ocfs2]
    ocfs2_delete_inode+0x2a2/0x3e0 [ocfs2]
    ocfs2_evict_inode+0x28/0x60 [ocfs2]
    evict+0xab/0x1a0
    iput_final+0xf6/0x190
    iput+0xc8/0xe0
    do_unlinkat+0x1b7/0x310
    SyS_unlinkat+0x22/0x40
    system_call_fastpath+0x12/0x71
  ---[ end trace a62437cb060baa71 ]---
  JBD2: rm wants too many credits (149 > 128)

Link: http://lkml.kernel.org/r/1473674623-11810-2-git-send-email-junxiao.bi@oracle.com
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 days agoocfs2: fix trans extend while flush truncate log
Junxiao Bi [Mon, 19 Sep 2016 21:44:21 +0000 (14:44 -0700)]
ocfs2: fix trans extend while flush truncate log

Every time, ocfs2_extend_trans() included a credit for truncate log
inode, but as that inode had been managed by jbd2 running transaction
first time, it will not consume that credit until
jbd2_journal_restart().

Since total credits to extend always included the un-consumed ones,
there will be more and more un-consumed credit, at last
jbd2_journal_restart() will fail due to credit number over the half of
max transction credit.

The following error was caught when unlinking a large file with many
extents:

  ------------[ cut here ]------------
  WARNING: CPU: 0 PID: 13626 at fs/jbd2/transaction.c:269 start_this_handle+0x4c3/0x510 [jbd2]()
  Modules linked in: ocfs2 nfsd lockd grace nfs_acl auth_rpcgss sunrpc autofs4 ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs sd_mod sg ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ppdev xen_kbdfront xen_netfront fb_sys_fops sysimgblt sysfillrect syscopyarea parport_pc parport pcspkr i2c_piix4 i2c_core acpi_cpufreq ext4 jbd2 mbcache xen_blkfront floppy pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod
  CPU: 0 PID: 13626 Comm: unlink Tainted: G        W       4.1.12-37.6.3.el6uek.x86_64 #2
  Hardware name: Xen HVM domU, BIOS 4.4.4OVM 02/11/2016
  Call Trace:
    dump_stack+0x48/0x5c
    warn_slowpath_common+0x95/0xe0
    warn_slowpath_null+0x1a/0x20
    start_this_handle+0x4c3/0x510 [jbd2]
    jbd2__journal_restart+0x161/0x1b0 [jbd2]
    jbd2_journal_restart+0x13/0x20 [jbd2]
    ocfs2_extend_trans+0x74/0x220 [ocfs2]
    ocfs2_replay_truncate_records+0x93/0x360 [ocfs2]
    __ocfs2_flush_truncate_log+0x13e/0x3a0 [ocfs2]
    ocfs2_remove_btree_range+0x458/0x7f0 [ocfs2]
    ocfs2_commit_truncate+0x1b3/0x6f0 [ocfs2]
    ocfs2_truncate_for_delete+0xbd/0x380 [ocfs2]
    ocfs2_wipe_inode+0x136/0x6a0 [ocfs2]
    ocfs2_delete_inode+0x2a2/0x3e0 [ocfs2]
    ocfs2_evict_inode+0x28/0x60 [ocfs2]
    evict+0xab/0x1a0
    iput_final+0xf6/0x190
    iput+0xc8/0xe0
    do_unlinkat+0x1b7/0x310
    SyS_unlink+0x16/0x20
    system_call_fastpath+0x12/0x71
  ---[ end trace 28aa7410e69369cf ]---
  JBD2: unlink wants too many credits (251 > 128)

Link: http://lkml.kernel.org/r/1473674623-11810-1-git-send-email-junxiao.bi@oracle.com
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>