Simon J. Gerraty [Fri, 20 Nov 2020 06:02:31 +0000 (06:02 +0000)]
Merge bmake-
20201117
o allow env var MAKE_OBJDIR_CHECK_WRITABLE=no to skip writable
checks in InitObjdir. Explicit .OBJDIR target always allows
read-only directory.
o More code cleanup and refactoring.
o More unit tests
MFC after: 1 week
Alexander Motin [Fri, 20 Nov 2020 05:46:27 +0000 (05:46 +0000)]
Microoptimize cam_num_doneqs math in xpt_done().
MFC after: 1 week
Simon J. Gerraty [Fri, 20 Nov 2020 03:54:37 +0000 (03:54 +0000)]
Import bmake-
20201117
o allow env var MAKE_OBJDIR_CHECK_WRITABLE=no to skip writable
checks in InitObjdir. Explicit .OBJDIR target always allows
read-only directory.
o Fix building and unit-tests on non-BSD.
o More code cleanup and refactoring.
o More unit tests
Alexander Motin [Fri, 20 Nov 2020 02:03:58 +0000 (02:03 +0000)]
Fix r367857 build without ISP_TARGET_MODE.
Alexander Motin [Fri, 20 Nov 2020 01:15:48 +0000 (01:15 +0000)]
Remove parallel SCSI and 1/2Gb FC support from isp(4).
This removes 288KB (36%) of the driver code and zillions of hacks and
workarounds, making single driver uniformly support several different
generations of hardware interfaces, not counting minor card variations.
After years of the hopeless fight, I don't think it worth to continue
support for hardware obsolete for 15-20 years. Instead much cleaner
now code should allow to move forward toward better locking, multiple
queues and other cool features.
All the remaining Qlogic cards starting from 4Gb 24xx to 32Gb 27xx use
the same hardware/firmware interface with minor incremental improvements,
so it seems to be a good new starting point. Except one PCI-X model all
all of them are PCIe and so still usable in modern systems.
Discussed with: ken, scottl, jpaetzel, imp
Relnotes: yes
Vladimir Kondratyev [Fri, 20 Nov 2020 00:13:30 +0000 (00:13 +0000)]
psm(4): Disable AUX multiplexer probing on all Lenovo laptops.
Rudimentary AUX multiplexing support was added to kernel to make possible
touchpad initialization on some HP EliteBook laptops with trackpoint.
Disable multiplexer probing on all Lenovo laptops now as they use touchpad
pass-through port rather than AUX multiplexer to connect trackpoint and
at least two model (X120e and X121e) is known for getting PS/2 AUX port
dysfunctional after switching back to hidden multiplexing mode.
AUX MUX probing can be reenabled with setting of hw.psm.mux_disabled loader
tunable to 0.
PR: 249987
Reported by: jwb
MFC after: 2 weeks
Ed Maste [Thu, 19 Nov 2020 21:10:36 +0000 (21:10 +0000)]
addr2line: swap if conditions for diff reduction in upcoming change
No functional change intended.
Mateusz Guzik [Thu, 19 Nov 2020 19:25:47 +0000 (19:25 +0000)]
pipe: thundering herd problem in pipelock
All reads and writes are serialized with a hand-rolled lock, but unlocking it
always wakes up all waiters. Existing flag fields get resized to make room for
introduction of waiter counter without growing the struct.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D27273
Fernando Apesteguía [Thu, 19 Nov 2020 19:05:16 +0000 (19:05 +0000)]
fstat(1): Add EXAMPLES section
* Add examples covering -f, -m and -p flags.
While here, extend the initial description paragraph to note that fstat(1)
will report on all opened files, belonging to processes the user has access to.
The current paragraph may lead to understand that you can get information on
opened files from processes belonging to other users.
Reviewed by: bjk@, danfe@, gbe@
Approved by: manpages (gbe@)
Differential Revision: https://reviews.freebsd.org/D26949
Fernando Apesteguía [Thu, 19 Nov 2020 18:58:15 +0000 (18:58 +0000)]
grep(1): Add more EXAMPLES
* Add more EXAMPLES covering flags: -A, -B, -c, -f, -i, -H, -l, -q, -R, -w
* While here, change existing wording to use the imperative (remove "To
find")
* Reword first example to be consistent with how grep(1) understand
words (-w)
Approved by: manpages (bcr@)
Differential Revision: https://reviews.freebsd.org/D27264
Mark Johnston [Thu, 19 Nov 2020 18:37:28 +0000 (18:37 +0000)]
callout(9): Fix a race between CPU migration and callout_drain()
Suppose a running callout re-arms itself, and before the callout
finishes running another CPU calls callout_drain() and goes to sleep.
softclock_call_cc() will wake up the draining thread, which may not run
immediately if there is a lot of CPU load. Furthermore, the callout is
still in the callout wheel so it can continue to run and re-arm itself.
Then, suppose that the callout migrates to another CPU before the
draining thread gets a chance to run. The draining thread is in this
loop in _callout_stop_safe():
while (cc_exec_curr(cc) == c) {
CC_UNLOCK(cc);
sleep();
CC_LOCK(cc);
}
but after the migration, cc points to the wrong CPU's callout state.
Then the draining thread goes off and removes the callout from the
wheel, but does so using the wrong lock and per-CPU callout state.
Fix the problem by doing a re-lookup of the callout CPU after sleeping.
Reported by: syzbot+
79569cd4d76636b2cc1c@syzkaller.appspotmail.com
Reported by: syzbot+
1b27e0237aa22d8adffa@syzkaller.appspotmail.com
Reported by: syzbot+
e21aa5b85a9aff90ef3e@syzkaller.appspotmail.com
Reviewed by: emaste, hselasky
Tested by: pho
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D27266
Mitchell Horne [Thu, 19 Nov 2020 18:03:40 +0000 (18:03 +0000)]
Add an option for entering KDB on recursive panics
There are many cases where one would choose avoid entering the debugger
on a normal panic, opting instead to reboot and possibly save a kernel
dump. However, recursive kernel panics are an unusual case that might
warrant attention from a human, so provide a secondary tunable,
debug.debugger_on_recursive_panic, to allow entering the debugger only
when this occurs.
For for simplicity in maintaining existing behaviour, the tunable
defaults to zero.
Reviewed by: cem, markj
Sponsored by: NetApp, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D27271
Warner Losh [Thu, 19 Nov 2020 17:54:41 +0000 (17:54 +0000)]
Document disk ioctl
First stab at documenting the different disk ioctl commands defined in
sys/disk.h.
Reviewed by: phk (prior version)
Differential Revision: https://reviews.freebsd.org/D26994
Daniel Ebdrup Jensen [Thu, 19 Nov 2020 16:57:45 +0000 (16:57 +0000)]
intro.7: Add missing manual page
Section 7 of the manual pages contain lots of very useful information, but
finding the pages is not always obvious - to assist people in finding the
information, add missing cross-references.
Reviewed by: 0mp (mentor), mhorne, yuripv
Approved by: 0mp (mentor
Differential Revision: https://reviews.freebsd.org/D27284
Mark Johnston [Thu, 19 Nov 2020 15:41:42 +0000 (15:41 +0000)]
Wrap a long line in vm_pqbatch_process_page()
Mark Johnston [Thu, 19 Nov 2020 15:40:58 +0000 (15:40 +0000)]
Micro-optimize vm_page_pqbatch_submit()
Avoid calling vm_page_domain() twice.
Discussed with: alc (in D27207)
Emmanuel Vadot [Thu, 19 Nov 2020 14:27:01 +0000 (14:27 +0000)]
release: Switch the Allwinner board to GPT
Allwinner bootrom have an alternate location for u-boot at 128k.
Work was made recently in u-boot to relocate correctly if loaded from
there.
The advantage of this offset is that we can now use a GPT scheme.
Mateusz Guzik [Thu, 19 Nov 2020 10:00:48 +0000 (10:00 +0000)]
thread: numa-aware zombie reaping
The current global list is a significant problem, in particular induces a lot
of cross-domain thread frees. When running poudriere on a 2 domain box about
half of all frees were of that nature.
Patch below introduces per-domain thread data containing zombie lists and
domain-aware reaping. By default it only reaps from the current domain, only
reaping from others if there is free TID shortage.
A dedicated callout is introduced to reap lingering threads if there happens
to be no activity.
Reviewed by: kib, markj
Differential Revision: https://reviews.freebsd.org/D27185
Andrew Turner [Thu, 19 Nov 2020 09:26:51 +0000 (09:26 +0000)]
Fall back to use the GICR address from the generic interrupt struct
When there is no ACPI redistributor sub-table in the MADT we need to
fall back to use the GICR base address from the GIC CPU interface
structure.
Handle this fallback when adding memory to the device and when counting
the number of redistributors.
PR: 251171
Reported by: Andrey Fesenko <f0andrey_gmail.com>
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D27247
Mateusz Guzik [Thu, 19 Nov 2020 08:16:45 +0000 (08:16 +0000)]
pipe: tidy up pipelock
Peter Grehan [Thu, 19 Nov 2020 07:23:39 +0000 (07:23 +0000)]
Advance RIP after userspace instruction decode
Add update to RIP after a userspace instruction decode (as is done for
the in-kernel counterpart of this case).
Submitted by: adam_fenn.io
Reviewed by: cem, markj
Approved by: grehan (bhyve)
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D27243
Mateusz Guzik [Thu, 19 Nov 2020 06:30:25 +0000 (06:30 +0000)]
pipe: allow for lockless pipe_stat
pipes get stated all thet time and this avoidably contributed to contention.
The pipe lock is only held to accomodate MAC and to check the type.
Since normally there is no probe for pipe stat depessimize this by having the
flag.
The pipe_state field gets modified with locks held all the time and it's not
feasible to convert them to use atomic store. Move the type flag away to a
separate variable as a simple cleanup and to provide stable field to read.
Use short for both fields to avoid growing the struct.
While here short-circuit MAC for pipe_poll as well.
Dag-Erling Smørgrav [Thu, 19 Nov 2020 05:46:59 +0000 (05:46 +0000)]
Merge upstream r948: fix race condition in openpam_ttyconv(3).
Dag-Erling Smørgrav [Thu, 19 Nov 2020 05:44:41 +0000 (05:44 +0000)]
Merge upstream r948: fix race condition in openpam_ttyconv(3).
Mateusz Guzik [Thu, 19 Nov 2020 04:28:39 +0000 (04:28 +0000)]
cred: fix minor nits in r367695
Noted by: jhb
Mateusz Guzik [Thu, 19 Nov 2020 04:27:51 +0000 (04:27 +0000)]
smp: fix smp_rendezvous_cpus_retry usage before smp starts
Since none of the other CPUs are running there is nobody to clear their
entries and the routine spins indefinitely.
Mark Johnston [Thu, 19 Nov 2020 03:59:21 +0000 (03:59 +0000)]
vm_phys: Try to clean up NUMA KPIs
It can useful for code outside the VM system to look up the NUMA domain
of a page backing a virtual or physical address, specifically when
creating NUMA-aware data structures. We have _vm_phys_domain() for
this, but the leading underscore implies that it's an internal function,
and vm_phys.h has dependencies on a number of other headers.
Rename vm_phys_domain() to vm_page_domain(), and _vm_phys_domain() to
vm_phys_domain(). Make the latter an inline function.
Add _vm_phys.h and define struct vm_phys_seg there so that it's easier
to use in other headers. Include it from vm_page.h so that
vm_page_domain() can be defined there.
Include machine/vmparam.h from _vm_phys.h since it depends directly on
some constants defined there.
Reviewed by: alc
Reviewed by: dougm, kib (earlier versions)
Differential Revision: https://reviews.freebsd.org/D27207
Mark Johnston [Thu, 19 Nov 2020 02:53:29 +0000 (02:53 +0000)]
Move kern_clocksource.c to sys/conf/files
Sponsored by: The FreeBSD Foundation
Mark Johnston [Thu, 19 Nov 2020 02:50:48 +0000 (02:50 +0000)]
Remove NO_EVENTTIMERS support
The arm configs that required it have been removed from the tree.
Removing this option makes the callout code easier to read and
discourages developers from adding new configs without eventtimer
drivers.
Reviewed by: ian, imp, mav
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D27270
Gleb Smirnoff [Thu, 19 Nov 2020 02:20:38 +0000 (02:20 +0000)]
Add '-u' switch that would uncompress cores that were compressed by
kernel during dump time.
A real life scenario is that cores are compressed to reduce
size of dumpon partition, but we either don't care about space
in the /var/crash or we have a filesystem level compression of
/var/crash. And we want cores to be uncompressed in /var/crash
because we'd like to instantily read them with kgdb. In this
case we want kernel to write cores compressed, but savecore(1)
write them uncompressed.
Reviewed by: markj, gallatin
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D27245
Ed Maste [Thu, 19 Nov 2020 00:03:15 +0000 (00:03 +0000)]
libc: fix undefined behavior from signed overflow in strstr and memmem
unsigned char promotes to int, which can overflow when shifted left by
24 bits or more. this has been reported multiple times but then
forgotten. it's expected to be benign UB, but can trap when built with
explicit overflow catching (ubsan or similar). fix it now.
note that promotion to uint32_t is safe and portable even outside of
the assumptions usually made in musl, since either uint32_t has rank
at least unsigned int, so that no further default promotions happen,
or int is wide enough that the shift can't overflow. this is a
desirable property to have in case someone wants to reuse the code
elsewhere.
musl commit:
593caa456309714402ca4cb77c3770f4c24da9da
Obtained from: musl
Ed Maste [Thu, 19 Nov 2020 00:02:12 +0000 (00:02 +0000)]
libc: optimize memmem two-way bad character shift
first, the condition (mem && k < p) is redundant, because mem being
nonzero implies the needle is periodic with period exactly p, in which
case any byte that appears in the needle must appear in the last p
bytes of the needle, bounding the shift (k) by p.
second, the whole point of replacing the shift k by mem (=l-p) is to
prevent shifting by less than mem when discarding the memory on shift,
in which case linear time could not be guaranteed. but as written, the
check also replaced shifts greater than mem by mem, reducing the
benefit of the shift. there is no possible benefit to this reduction of
the shift; since mem is being cleared, the full shift is valid and
more optimal. so only replace the shift by mem when it would be less
than mem.
musl commits:
8f5a820d147da36bcdbddd201b35d293699dacd8
122d67f846cb0be2c9e1c3880db9eb9545bbe38c
Obtained from: musl
MFC after: 2 weeks
Ed Maste [Wed, 18 Nov 2020 22:01:34 +0000 (22:01 +0000)]
clang-format libc string functions imported from musl
We have adopted these and don't consider them 'contrib' code, so bring
them closer to style(9). This is a followon to r315467 and r351700.
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Mariusz Zaborski [Wed, 18 Nov 2020 21:26:14 +0000 (21:26 +0000)]
Add CTLFLAG_MPSAFE to the suser_enabled sysctl.
Pointed out by: mjg
Mariusz Zaborski [Wed, 18 Nov 2020 21:07:08 +0000 (21:07 +0000)]
jail: introduce per jail suser_enabled setting
The suser_enable sysctl allows to remove a privileged rights from uid 0.
This change introduce per jail setting which allow to make root a
normal user.
Reviewed by: jamie
Previous version reviewed by: kevans, emaste, markj, me_igalic.co
Discussed with: pjd
Differential Revision: https://reviews.freebsd.org/D27128
Mariusz Zaborski [Wed, 18 Nov 2020 20:59:58 +0000 (20:59 +0000)]
Fix style nits.
Conrad Meyer [Wed, 18 Nov 2020 20:20:03 +0000 (20:20 +0000)]
msdosfs(5): Fix debug-only format string
No functional change; MSDOSFS_DEBUG isn't a real build option, so this isn't
covered by LINT kernels.
Stefan Eßer [Wed, 18 Nov 2020 20:00:55 +0000 (20:00 +0000)]
Make use of the getlocalbase() function for run-time adjustment of the
local software base directory, as committed in SVN rev. 367813.
The pkg and mailwrapper programs used the LOCALBASE environment variable
for this purpose and this functionality is preserved by getlocalbase().
After this change, the value of the user.localbase sysctl variable is used
if present (and not overridden in the environment).
The nvmecontrol program gains support of a dynamic path to its plugin
directory with this update.
Differential Revision: https://reviews.freebsd.org/D27237
Dimitry Andric [Wed, 18 Nov 2020 19:55:24 +0000 (19:55 +0000)]
For llvm's internal function which retrieves the number of available
"hardware threads", use cpuset_getaffinity(2) on FreeBSD, so it will
honor processor sets configured by the cpuset(1) command.
This should make it possible to avoid e.g. lld creating a huge number of
threads on a machine with many cores, even for linking simple programs.
This will also be submitted upstream.
Submitted by: mjg
MFC after: 1 week
Mateusz Guzik [Wed, 18 Nov 2020 19:47:24 +0000 (19:47 +0000)]
fd: reorder struct file to reduce false sharing
The size on LP64 is 80 bytes, which is just more than a cacheline, does
not lend itself to easy shrinking and rounding up to 2 would be a huge
waste given NOFREE marker.
The least which can be done is to reorder it so that most commonly used
fields are less likely to span different lines, and consequently suffer
less false sharing.
With the change at hand most commonly used fields land in the same line
about 3/4 of the time, as opposed to 2/4.
Stefan Eßer [Wed, 18 Nov 2020 19:44:30 +0000 (19:44 +0000)]
Add function getlocalbase() to libutil.
This function returns the path to the local software base directory, by
default "/usr/local" (or the value of _PATH_LOCALBASE in include/paths.h
when building the world).
The value returned can be overridden by 2 methods:
- the LOCALBASE environment variable (ignored by SUID programs)
- else a non-default user.localbase sysctl value
Reviewed by: hps (earlier version)
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D27236
Li-Wen Hsu [Wed, 18 Nov 2020 19:35:30 +0000 (19:35 +0000)]
ipheth(4): Fix for iOS 14
Fix USB tethering for iOS 14.
Inspired by: https://github.com/libimobiledevice/libimobiledevice/issues/1038
PR: 249979
Reviewed by: hselasky
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D27250
Alfredo Dal'Ava Junior [Wed, 18 Nov 2020 19:23:30 +0000 (19:23 +0000)]
msun tests: use standard floating-point exception flags on lrint and fenv tests
Some platforms have additional architecture-specific floating-point flags.
Msun test cases lrint and test_fegsetenv (fenv) expects only standard flags,
so make sure to mask them appropriately.
This makes test pass on PowerPC64.
Reviewed by: jhibbits, ngie
Sponsored by: Eldorado Research Institute (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D27202
Warner Losh [Wed, 18 Nov 2020 19:22:24 +0000 (19:22 +0000)]
mergemaster: handle symbolic links during update.
/etc/os-release is now a symbolic link to a generated file. Make
mergemaster cope with symbolic links generically. I'm no longer
a big mergemaster user, so this has only been lightly tested
by me, though Kimura-san has ran it through its paces.
Submitted by: Yasushiro KIMURA-san
PR: 242212
MFC After: 2 weeks
Dimitry Andric [Wed, 18 Nov 2020 18:40:58 +0000 (18:40 +0000)]
When elftoolchain's objcopy (or strip) is rewriting a file in-place,
make it create the temporary file in the same directory as the source
file by default, instead of always using $TMPDIR or /tmp. If creating
that file fails because the directory is not writable, also fallback to
$TMPDIR or /tmp.
This has also been submitted upstream as:
https://sourceforge.net/p/elftoolchain/tickets/597/
Reported by: cem
PR: 250872
MFC after: 2 weeks
Justin Hibbits [Wed, 18 Nov 2020 17:37:01 +0000 (17:37 +0000)]
Fix octeon_pmc post-r334827
MFC after: 3 days
Sponsored by: Juniper Networks, Inc
John Baldwin [Wed, 18 Nov 2020 16:21:37 +0000 (16:21 +0000)]
Fix a few nits in vn_printf().
- Mask out recently added VV_* bits to avoid printing them twice.
- Keep VI_LOCKed on the same line as the rest of the flags.
Reviewed by: kib
Obtained from: CheriBSD
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D27261
Marcin Wojtas [Wed, 18 Nov 2020 15:25:38 +0000 (15:25 +0000)]
Update ENA driver version to v2.3.0
The v2.3.0 introduces new ena_com layer, ENI metrics updates and SPDX
license tags.
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D27120
Nick Hibma [Wed, 18 Nov 2020 15:23:43 +0000 (15:23 +0000)]
Fix mandoc lint warnings.
Marcin Wojtas [Wed, 18 Nov 2020 15:20:01 +0000 (15:20 +0000)]
Rename descriptions of the supported ENA devices
Some of the PCI ID were described as ENA with LLQ support - it's not
fully accurate and because of that, their names were changed.
Instead of LLQ, use RSERV0 for the description of those devices.
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D27119
Marcin Wojtas [Wed, 18 Nov 2020 15:17:55 +0000 (15:17 +0000)]
Add ENI metrics for the ENA driver
The new HAL allows the driver to read extra ENI stats. Exact meaning of
each of them can be found in base/ena_defs/ena_admin_defs.h file and
structure ena_admin_eni_stats.
Those stats are being updated inside of the timer service, which is
executed every second.
ENI metrics are turned off by default. They can be enabled, using the
sysctl node: dev.ena.X.eni_metrics.update_delay
0 value in this node means that the update is turned off. Other values
determine how many seconds must pass, before ENI metrics will be
updated.
They can be acquired, using sysctl:
sysctl dev.ena.X.eni_metrics
Where X stands for the interface number.
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D27118
Marcin Wojtas [Wed, 18 Nov 2020 15:07:34 +0000 (15:07 +0000)]
Add SPDX license tag to the ENA driver files
Refering to guide: https://wiki.freebsd.org/SPDX the SPDX tag should not
replace the standard license text, however it should be added over the
standard license text to make the automation easier.
Because of that, the old license was kept, but the SPDX tag was added
on top of every ENA driver file.
Submited by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D27117
Marcin Wojtas [Wed, 18 Nov 2020 15:02:12 +0000 (15:02 +0000)]
Add Rx offsets support for the ENA driver
For the first descriptor in a chain the data may start at an offset.
It is optional feature of some devices, so the driver must ack that
it supports it.
The data pointer of the mbuf is simply shifted by the given value.
Submitted by: Maciej Bielski <mba@semihalf.com>
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D27116
Marcin Wojtas [Wed, 18 Nov 2020 14:59:22 +0000 (14:59 +0000)]
Adjust ENA driver files to latest ena-com changes
* Use the new API of ena_trace_*
* Fix typo syndrom --> syndrome
* Remove validation of the Rx req ID (already performed in the ena-com)
* Remove usage of deprecated ENA_ASSERT macro
Submitted by: Ido Segev <idose@amazon.com>
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D27115
Andrew Gallatin [Wed, 18 Nov 2020 14:55:49 +0000 (14:55 +0000)]
LACP: When suppressing distributing, return ENOBUFS
When links come and go, lacp goes into a "suppress distributing" mode
where it drops traffic for 3 seconds. When in this mode, lagg/lacp
historiclally drops traffic with ENETDOWN. That return value causes TCP
to close any connection where it gets that value back from the lower
parts of the stack. This means that any TCP connection with active
traffic during a 3-second windown when an LACP link comes or goes
would get closed.
TCP treats return values of ENOBUFS as transient errors, and re-schedules
transmission later. So rather than returning ENETDOWN, lets
return ENOBUFS instead. This allows TCP connections to be preserved.
I've tested this by repeatedly bouncing links on a Netlfix CDN server
under a moderate (20Gb/s) load and overved ENOBUFS reported back to
the TCP stack (as reported by a RACK TCP sysctl).
Reviewed by: jhb, jtl, rrs
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D27188
Marcin Wojtas [Wed, 18 Nov 2020 14:54:55 +0000 (14:54 +0000)]
Upgrade ENA HAL to the latest version (26/10/20)
Add support for the ENI metrics, bug fix for destroying wait event and
also other minor bug fixes, improvements, etc.
Submitted by: Ido Segev <idose@amazon.com>
Obtained from: Amazon, Inc.
Marcin Wojtas [Wed, 18 Nov 2020 14:50:12 +0000 (14:50 +0000)]
Fix completion descriptors alignment for the ENA
The latest generation hardware requires IO CQ (completion queue)
descriptors memory to be aligned to a 4K. It needs that feature for
the best performance.
Allocating unaligned descriptors will have a big performance impact as
the packet processing in a HW won't be optimized properly. For that
purpose adjust ena_dma_alloc() to support it.
It's a critical fix, especially for the arm64 EC2 instances.
Submitted by: Ido Segev <idose@amazon.com>
Obtained from: Amazon, Inc
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D27114
Marcin Wojtas [Wed, 18 Nov 2020 14:30:59 +0000 (14:30 +0000)]
ena-com: Fix ena-com to allocate cdesc aligned to 4k
The latest generation hardware requires IO CQ (completion queue)
descriptors memory to be aligned to a 4K. It needs that feature for
the best performance.
Allocating unaligned descriptors will have a big performance impact as
the packet processing in a HW won't be optimized properly.
It's a critical fix, especially for the arm64 EC2 instances.
Hans Petter Selasky [Wed, 18 Nov 2020 13:47:11 +0000 (13:47 +0000)]
Allow LinuxKPI types to be used in bootloaders, by checking for the
_STANDALONE definition.
No functional change intended.
MFC after: 1 week
Sponsored by: Mellanox Technologies // NVIDIA Networking
Hans Petter Selasky [Wed, 18 Nov 2020 13:45:32 +0000 (13:45 +0000)]
Add missing header file when building the LinuxKPI module separately.
MFC after: 1 week
Sponsored by: Mellanox Technologies // NVIDIA Networking
Hans Petter Selasky [Wed, 18 Nov 2020 13:22:22 +0000 (13:22 +0000)]
Fix build of USB bootloader code by adding checks for _STANDALONE being defined.
Currently the USB bootloader code is not part of buildworld.
MFC after: 1 week
Sponsored by: Mellanox Technologies // NVIDIA Networking
Alan Somers [Wed, 18 Nov 2020 04:35:49 +0000 (04:35 +0000)]
nfs: Mark unused statistics variable as reserved
FreeBSD's NFS exporter has long exported some unused statistics fields.
Revision r366992 removed them from nfsstat. This revision renames those
fields in the kernel's exported structures to make it clear to other
consumers that they are unused.
Reported by: emaste
Reviewed by: emaste
Sponsored by: Axcient
Differential Revision: https://reviews.freebsd.org/D27258
Alexander Motin [Wed, 18 Nov 2020 03:43:03 +0000 (03:43 +0000)]
Move ecmd memory allocation itto separate DMA tag.
Ecmd memory is not directly related to the request queue, only referenced
from it sometimes in target mode. Separate allocation should be easier
in case of fragmented memory and can be skipped when target is not built.
MFC after: 1 month
Kyle Evans [Wed, 18 Nov 2020 03:30:31 +0000 (03:30 +0000)]
_umtx_op: fix robust lists after r367744
A copy-pasto left us copying in 24-bytes at the address of the rb pointer
instead of the intended target.
Reported by: sigsys@gmail.com
Sighing: kevans
Alexander Motin [Wed, 18 Nov 2020 02:54:05 +0000 (02:54 +0000)]
Remove bus_dma locking/sleeping when not needed.
MFC after: 1 month
Alexander Motin [Wed, 18 Nov 2020 02:12:51 +0000 (02:12 +0000)]
Don't allocate full XCMD_SIZE (512 bytes) on stack.
We need only 24 bytes (fcp_rsp_iu_t) there for isp_put_fcp_rsp_iu().
MFC after: 1 month
Cy Schubert [Wed, 18 Nov 2020 01:18:45 +0000 (01:18 +0000)]
Restore identification of VDEVs using non-native block size.
NAME STATE READ WRITE CKSUM
dsk02 ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
ada1s4a ONLINE 0 0 0
ada2s4a ONLINE 0 0 0 block size: 512B configured,
4096B native
Reviewed by: tsoome (previous FreeBSD phab version)
Differential Revision: https://reviews.freebsd.org/D26880
Upstream commit:
3928ec53395fcc26be7844dd6b63df757166c281
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Reviewed-by: Toomas Soome <tsoome@me.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed off by: Cy Schubert <cy@FreeBSD.org>
Closes #11088
Conrad Meyer [Tue, 17 Nov 2020 21:20:11 +0000 (21:20 +0000)]
linux(4) clone(2): Correctly handle CLONE_FS and CLONE_FILES
The two flags are distinct and it is impossible to correctly handle clone(2)
without the assistance of fork1(). This change depends on the pwddesc split
introduced in r367777.
I've added a fork_req flag, FR2_SHARE_PATHS, which indicates that p_pd
should be treated the opposite way p_fd is (based on RFFDG flag). This is a
little ugly, but the benefit is that existing RFFDG API is preserved.
Holding FR2_SHARE_PATHS disabled, RFFDG indicates both p_fd and p_pd are
copied, while !RFFDG indicates both should be cloned.
In Chrome, clone(2) is used with CLONE_FS, without CLONE_FILES, and expects
independent fd tables.
The previous conflation of CLONE_FS and CLONE_FILES was introduced in
r163371 (2006).
Discussed with: markj, trasz (earlier version)
Differential Revision: https://reviews.freebsd.org/D27016
Conrad Meyer [Tue, 17 Nov 2020 21:14:13 +0000 (21:14 +0000)]
Split out cwd/root/jail, cmask state from filedesc table
No functional change intended.
Tracking these structures separately for each proc enables future work to
correctly emulate clone(2) in linux(4).
__FreeBSD_version is bumped (to 1300130) for consumption by, e.g., lsof.
Reviewed by: kib
Discussed with: markj, mjg
Differential Revision: https://reviews.freebsd.org/D27037
Conrad Meyer [Tue, 17 Nov 2020 20:01:21 +0000 (20:01 +0000)]
unix(4): Enhance LOCAL_CREDS_PERSISTENT ABI
As this ABI is still fresh (r367287), let's correct some mistakes now:
- Version the structure to allow for future changes
- Include sender's pid in control message structure
- Use a distinct control message type from the cmsgcred / sockcred mess
Discussed with: kib, markj, trasz
Differential Revision: https://reviews.freebsd.org/D27084
Conrad Meyer [Tue, 17 Nov 2020 19:56:47 +0000 (19:56 +0000)]
linprocfs(5): Add rudimentary /proc/<pid>/mountinfo
This is used by some Linux programs using filehandles (r367773) to locate
the mountpoint for a given fsid.
Differential Revision: https://reviews.freebsd.org/D27136
Conrad Meyer [Tue, 17 Nov 2020 19:53:59 +0000 (19:53 +0000)]
'make sysent' for r367773
X-MFC-With: r367773
Conrad Meyer [Tue, 17 Nov 2020 19:51:47 +0000 (19:51 +0000)]
linux(4): Implement name_to_handle_at(), open_by_handle_at()
They are similar to our getfhat(2) and fhopen(2) syscalls.
Differential Revision: https://reviews.freebsd.org/D27111
Ed Maste [Tue, 17 Nov 2020 18:28:20 +0000 (18:28 +0000)]
uplcom: add ATen/Prolific USB-232 Controller D USB ID
PR: 251166
Submitted by: marcus
MFC after: 2 weeks
Adrian Chadd [Tue, 17 Nov 2020 17:12:28 +0000 (17:12 +0000)]
[nvmecontrol] Fix type signedness warning-to-error on gcc-6.4
This fixes a type signedness comparison warning-to-error on
gcc-6.4. The ternary operation casts it right but the actual
assignment doesn't.
Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D26791
Adrian Chadd [Tue, 17 Nov 2020 17:11:07 +0000 (17:11 +0000)]
[cddl] Fix lz4 function definitions to not tri pup compile.
This tripped up in llvm compilation on amd64 noting that lz4_init/lz4_fini
were lacking in being previously defined.
Reviewed by: emaste, freqlabs, brooks
Differential Revision: https://reviews.freebsd.org/D27240
Mateusz Piotrowski [Tue, 17 Nov 2020 16:54:12 +0000 (16:54 +0000)]
Partially revert r367756 (chpass(1) synopsis changes)
Let's have two entries in the synopsis:
- chpass now lists options which can be used for non-NIS-specific
functionalities.
- ypchpass additionally lists the NIS-specific flags.
Technically, it is an artificial distinction, as chpass and ypchpass behave
identically. Nevertheless, it might help navigating the synopsis section.
Reviewed by: imp
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D27251
Alexander Motin [Tue, 17 Nov 2020 16:34:58 +0000 (16:34 +0000)]
Stop using NVME_MAX_XFER_SIZE constant.
This constant depends on MAXPHYS and does not respect device capabilities.
Use proper dynamic ioctl(NVME_GET_MAX_XFER_SIZE) instead.
MFC after: 1 month
Emmanuel Vadot [Tue, 17 Nov 2020 14:59:58 +0000 (14:59 +0000)]
syscon: Add syscon_get_by_ofw_node
This allow to get a syscon node defined under a specific fdt node (which isn't
always the device one).
Emmanuel Vadot [Tue, 17 Nov 2020 14:58:30 +0000 (14:58 +0000)]
arm64: allwinner: Init the Display Engine clock
In case u-boot was compiled without video support set the PLL
to 432Mhz (which allow us to use most of the HDMI resolution for
tcon) and set it as the parent for the DE clock.
Emmanuel Vadot [Tue, 17 Nov 2020 14:57:34 +0000 (14:57 +0000)]
arm: allwinner: Add DE2 Clock support for H3 SoC
While here also enable the clock and deassert the reset
Emmanuel Vadot [Tue, 17 Nov 2020 14:41:23 +0000 (14:41 +0000)]
vchiq: Rename timer func so they do not conflict with linuxkpi
Jonathan T. Looney [Tue, 17 Nov 2020 14:07:27 +0000 (14:07 +0000)]
When copying types from one CTF container to another, ensure that we
always copy intrinsic data types before copying bitfields which are
based on those types. This ensures the type ordering in the destination
CTF container matches the assumption made elsewhere in the CTF code
that instrinsic data types will always appear before bitfields based on
those types.
This resolves the following error message some users have seen after
r366908:
"/usr/lib/dtrace/ipfw.d", line 121: failed to copy type of 'ip6p':
Conflicting type is already defined
Reviewed by: markj
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D27213
Peter Grehan [Tue, 17 Nov 2020 13:14:04 +0000 (13:14 +0000)]
Add legacy debug/test interfaces for kvm unit tests.
Implement the legacy debug/test interfaces expected by KVM-unit-tests'
realmode, emulator, and ioapic tests.
Submitted by: adam_fenn.io
Reviewed by: markj, grehan
Approved by: grehan (bhyve)
MFC after: 3 weeks
Relnotes: Yes
Differential Revision: https://reviews.freebsd.org/D27130
Alfredo Dal'Ava Junior [Tue, 17 Nov 2020 12:36:59 +0000 (12:36 +0000)]
[POWERPC] msun: fix incorrect flag in fesetexceptflag
Fix incorrect mask being used when FE_INVALID bit is wanted by user.
The problem was noticed thanks to msun fenv tests.
Reviewed by: jhibbits, luporl
Sponsored by: Eldorado Research Institute (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D27201
Alfredo Dal'Ava Junior [Tue, 17 Nov 2020 12:33:12 +0000 (12:33 +0000)]
[POWERPC] fix signal race condition
r367416 should have called save_fpu() before kern_sigprocmask to avoid
race condition
Thanks jhibbits and bdragon for pointing it out
Reviewed by: jhibbits
Sponsored by: Eldorado Research Institute (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D27241
Mateusz Piotrowski [Tue, 17 Nov 2020 12:04:29 +0000 (12:04 +0000)]
Add an example for the -s flag
MFC after: 2 weeks
Leandro Lupori [Tue, 17 Nov 2020 11:36:31 +0000 (11:36 +0000)]
[PowerPC] Don't overwrite vm.pmap sysctl node
After r367417, both mmu_oea64 and mmu_radix were defining the vm.pmap
sysctl node, resulting in the later definition hiding the properties of
the previous one. Avoid this issue by defining vm.pmap in a common
source file and declaring it where needed.
This change also standardizes the tunable name used to enable superpages
and change its default to disabled on radix MMU, because it still has some
issues with superpages.
Reviewed by: bdragon, jhibbits
Sponsored by: Eldorado Research Institute (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D27156
Mateusz Piotrowski [Tue, 17 Nov 2020 10:57:28 +0000 (10:57 +0000)]
Improve readability of the lists of options
- Sort options alphabetically
- Add missing arguments (e.g., "list" to -a)
- Adjust the width of Bl
MFC after: 1 week
Mateusz Piotrowski [Tue, 17 Nov 2020 10:48:01 +0000 (10:48 +0000)]
Clean up the synopsis section & fix mandoc warnings
The synopsis section had two very similar entries. The flags documented by
the first one were a strict subset of the second one. Let's just keep only
the second entry for simplicity.
MFC after: 1 week
Andrew Turner [Tue, 17 Nov 2020 10:27:42 +0000 (10:27 +0000)]
Stop calling gic_v3_detach when we haven't called gic_v3_attach
The former tries to dereference memory allocated by the latter. If counting
the redistributor fails it may try to dereference memory that was never
allocated.
Sponsored by: Innovate UK
Andrew Turner [Tue, 17 Nov 2020 10:17:18 +0000 (10:17 +0000)]
Allow the GICv3 ACPI driver to attach to a GICv4
The same driver works on both, allow the driver to attach to a GICv4
controller with the ACPI attachment.
Reported by: Andrey Fesenko <f0andrey_gmail.com>
Sponsored by: Innovate UK
Kyle Evans [Tue, 17 Nov 2020 04:22:10 +0000 (04:22 +0000)]
Fix !COMPAT_FREEBSD32 kernel build
One of the last shifts inadvertently moved these static assertions out of a
COMPAT_FREEBSD32 block, which the relevant definitions are limited to.
Fix it.
Pointy hat: kevans
Kyle Evans [Tue, 17 Nov 2020 04:06:35 +0000 (04:06 +0000)]
sys/proc.h: improve comment for new TDP2 flag
This was suggested by kib and integrated locally, but somehow did not make
it into the committed version.
Kyle Evans [Tue, 17 Nov 2020 03:36:58 +0000 (03:36 +0000)]
umtx_op: reduce redundancy required for compat32
All of the compat32 variants are substantially the same, save for
copyin/copyout (mostly). Apply the same kind of technique used with kevent
here by having the syscall routines supply a umtx_copyops describing the
operations needed.
umtx_copyops carries the bare minimum needed- size of timespec and
_umtx_time are used for determining if copyout is needed in the sem2_wait
case.
Reviewed by: kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D27222
Kyle Evans [Tue, 17 Nov 2020 03:34:01 +0000 (03:34 +0000)]
_umtx_op: fix a compat32 bug in UMTX_OP_NWAKE_PRIVATE
Specifically, if we're waking up some value n > BATCH_SIZE, then the
copyin(9) is wrong on the second iteration due to upp being the wrong type.
upp is currently a uint32_t**, so upp + pos advances it by twice as many
elements as it should (host pointer size vs. compat32 pointer size).
Fix it by just making upp a uint32_t*; it's still technically a double
pointer, but the distinction doesn't matter all that much here since we're
just doing arithmetic on it.
Add a test case that demonstrates the problem, placed with the libthr tests
since one messing with _umtx_op should be running these tests. Running under
compat32, the new test case will hang as threads after the first 128 get
missed in the wake. it's not immediately clear how to hit it in practice,
since pthread_cond_broadcast() uses a smaller (sleepq batch?) size observed
to be around ~50 -- I did not spend much time digging into it.
The uintptr_t change makes no functional difference, but i've tossed it in
since it's more accurate (semantically).
Reported by: Andrew Gierth (andrew_tao173.riddles.org.uk, inspection)
Reviewed by: kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D27231
Kyle Evans [Tue, 17 Nov 2020 03:26:56 +0000 (03:26 +0000)]
_umtx_op: document UMTX_OP_SEM2_WAIT copyout behavior
This clever technique to get a time remaining back was added to support sem_clockwait_np.
Reviewed by: kib, vangyzen
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D27160
Konstantin Belousov [Tue, 17 Nov 2020 02:18:34 +0000 (02:18 +0000)]
vmem: trivial warning and style fixes.
Add __unused to some args.
Change type of the iterator variables to match loop control.
Remove excessive {}.
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D27220
Mateusz Guzik [Tue, 17 Nov 2020 00:04:30 +0000 (00:04 +0000)]
cpuset: reorder so that cs_mask does not share cacheline with cs_ref
Mateusz Guzik [Tue, 17 Nov 2020 00:04:05 +0000 (00:04 +0000)]
cpuset: refcount-clean