Sascha Wildner [Tue, 24 Mar 2020 15:52:43 +0000 (16:52 +0100)]
ssh-copy-id(1): Fix a printf(1) missing format character warning.
Reported-by: Pierre-Alain TORET <pierre-alain.toret@protonmail.com>
Sascha Wildner [Tue, 24 Mar 2020 15:24:01 +0000 (16:24 +0100)]
Hook ssh-copy-id into the build and adjust README.DELETED.
Sascha Wildner [Tue, 24 Mar 2020 15:14:57 +0000 (16:14 +0100)]
Merge branch 'vendor/OPENSSH'
Sascha Wildner [Tue, 24 Mar 2020 15:14:20 +0000 (16:14 +0100)]
Import OpenSSH-8.0p1's ssh-copy-id and manual page on the vendor branch.
Sascha Wildner [Mon, 23 Mar 2020 16:17:27 +0000 (17:17 +0100)]
config(8): Replace the old "= {" syntax in config.y with just "{".
According to FreeBSD it was only for compatibility with
6th edition Unix' yacc(1).
Sascha Wildner [Fri, 20 Mar 2020 14:48:16 +0000 (15:48 +0100)]
Adjust iwm.4/iwmfw.4/fstab.5 a bit.
* iwm.4/iwmfw.4: Mention new devices and firmware.
Reported-by: noob237 (IRC)
* fstab.5: Mention HAMMER2.
Reported-by: daftaupe (IRC)
Tomohiro Kusumi [Fri, 20 Mar 2020 14:39:06 +0000 (23:39 +0900)]
sys/vfs/autofs: Make autofs(5) timeout messages include affected process name and PID
from freebsd/freebsd@4a10e991b50fa9c8e0ec6af5f8cc81aa63d0e1f3
Matthew Dillon [Thu, 19 Mar 2020 00:29:08 +0000 (17:29 -0700)]
dsynth - Fix bug in dequote()
* Fix bug, the buffer was declared on the stack instead of as a
static.
* Repurpose the static buffer to be a buffer pointer instead,
removing the 256 char limit. The limit was messing up the
:|: delimiter.
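The difference between a stack buffer, a fixed static buffer, and a growable static buffer pointer can be sketched as follows. This is a minimal illustration only, not dsynth's actual dequote(): the function name dequote_sketch() and its behavior (stripping one pair of surrounding double quotes) are assumptions made for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, not the dsynth implementation.
 *
 * Returning a pointer to a stack array would hand the caller dead
 * storage.  A static pointer that is grown with realloc() stays valid
 * after return and has no fixed length limit.  The result is only
 * valid until the next call.
 */
const char *
dequote_sketch(const char *src)
{
	static char *buf;	/* survives the call, unlike a stack array */
	static size_t cap;	/* current capacity of buf in bytes */
	size_t len;

	assert(src != NULL);
	len = strlen(src);

	/* Strip one pair of surrounding double quotes, if present. */
	if (len >= 2 && src[0] == '"' && src[len - 1] == '"') {
		++src;
		len -= 2;
	}

	/* Grow the buffer on demand instead of capping at 256 chars. */
	if (len + 1 > cap) {
		char *nbuf = realloc(buf, len + 1);

		if (nbuf == NULL)
			return NULL;
		buf = nbuf;
		cap = len + 1;
	}
	memcpy(buf, src, len);
	buf[len] = '\0';
	return buf;
}
```

A caller that needs to keep the result across calls would have to copy it, since the sketch reuses the same storage each time.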
Matthew Dillon [Wed, 18 Mar 2020 23:35:39 +0000 (16:35 -0700)]
kernel - Generate POLLHUP for fully disconnected socket
* Properly generate POLLHUP for fully disconnected sockets.
However, there is still a possible issue. We do not set POLLHUP
for half-closed sockets and it is really unclear whether we should
or not once read data has been exhausted.
Matthew Dillon [Wed, 18 Mar 2020 18:04:58 +0000 (11:04 -0700)]
dsynth - Fix escaping, skipped count
* Fix backslashes in info fields that were causing json to implode.
* Remove parens around (%d) skipped count reporting so the field
can be sorted.
Reported-by: tuxillo
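As an illustration of the escaping problem, the sketch below doubles backslashes and escapes double quotes before a string is embedded in a JSON value. The function name json_escape_sketch() and its return convention are assumptions for this example; dsynth's real code may handle more characters or work differently.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not the dsynth implementation.
 *
 * Copies src into dst, prefixing '\' and '"' with a backslash so the
 * result can sit inside a JSON string.  Returns the number of bytes
 * written (excluding the terminating NUL), or (size_t)-1 if dst is
 * too small.
 */
size_t
json_escape_sketch(const char *src, char *dst, size_t dstsize)
{
	size_t n = 0;

	assert(src != NULL && dst != NULL);
	for (; *src != '\0'; ++src) {
		size_t need = (*src == '\\' || *src == '"') ? 2 : 1;

		/* Always leave room for the terminating NUL. */
		if (n + need + 1 > dstsize)
			return (size_t)-1;
		if (need == 2)
			dst[n++] = '\\';
		dst[n++] = *src;
	}
	if (n + 1 > dstsize)
		return (size_t)-1;
	dst[n] = '\0';
	return n;
}
```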
François Tigeot [Wed, 18 Mar 2020 14:06:02 +0000 (15:06 +0100)]
drm/i915: Revert Clean up DDI DDC/AUX CH sanitation
* This reverts Linux commit 0ce140d45a8398b501934ac289aef0eb7f47c596
* It caused phantom screens to be detected on some Skylake machines
Reported-by: Peeter Must
Sascha Wildner [Tue, 17 Mar 2020 14:24:15 +0000 (15:24 +0100)]
Update the pciconf(8) database.
March 7, 2020 snapshot from https://pci-ids.ucw.cz
Matthew Dillon [Mon, 16 Mar 2020 18:39:40 +0000 (11:39 -0700)]
kernel - Fix rare vm_map_entry exhaustion panic (2)
* Increase per-cpu fast-cache hysteresis from its absurdly small
value to a significantly larger value.
* Missing header file update for prior commit
Matthew Dillon [Mon, 16 Mar 2020 18:24:52 +0000 (11:24 -0700)]
kernel - Fix rare vm_map_entry exhaustion panic
* Fix a rare situation where many processes blocked in zget() on
the same CPU can cause the kernel's per-cpu vm_map_entry entry
to be exhausted, causing a panic.
The situation arises because the zget() operation can wake other
threads up via the deeper vm_map lock after burning vm_map_entry's
to expand the space but prior to adding the new structural objects
to the vm_zone.
* Caused a leaf.dragonflybsd.org panic on concurrent git fork/exec's
via the web server.
Sascha Wildner [Sat, 14 Mar 2020 18:15:48 +0000 (19:15 +0100)]
<sys/conf.h>: Remove some more dead prototypes.
Sascha Wildner [Sat, 14 Mar 2020 18:14:36 +0000 (19:14 +0100)]
kernel: Remove some get_dev() remnants. We no longer have this function.
zrj [Sat, 14 Mar 2020 09:20:30 +0000 (11:20 +0200)]
kernel: Adjust description for kern.tls_extra
zrj [Thu, 13 Feb 2020 12:07:37 +0000 (14:07 +0200)]
kernel: Add handling for R_X86_64_PLT32 (type 4) in kernel linker.
Newer binutils can emit R_X86_64_PLT32 for -shared compilations.
Tested-with: binutils234
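The x86-64 psABI defines R_X86_64_PLT32 as L + A - P, where L is the address of the symbol's PLT entry. A loader that has no PLT can substitute the symbol address S for L, which makes the computation identical to R_X86_64_PC32 (S + A - P). The sketch below assumes that approach; the commit message does not spell out how the kernel linker handles it, and the function name apply_reloc_sketch() and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define R_X86_64_PC32	2	/* S + A - P */
#define R_X86_64_PLT32	4	/* L + A - P; L taken as S when there is no PLT */

/*
 * Hypothetical relocation helper, not the DragonFly kernel linker.
 *
 * where:  location in the loaded image to patch
 * place:  run-time address of that location (P)
 * symval: resolved symbol address (S)
 * addend: relocation addend (A)
 *
 * Returns 0 on success, -1 on an unhandled type or 32-bit overflow.
 */
int
apply_reloc_sketch(unsigned type, uint8_t *where, uint64_t place,
    uint64_t symval, int64_t addend)
{
	int64_t val;
	int32_t val32;

	assert(where != NULL);
	switch (type) {
	case R_X86_64_PC32:
	case R_X86_64_PLT32:	/* assumed: treated exactly like PC32 */
		val = (int64_t)(symval + (uint64_t)addend - place);
		if (val < INT32_MIN || val > INT32_MAX)
			return -1;
		val32 = (int32_t)val;
		memcpy(where, &val32, sizeof(val32));
		return 0;
	default:
		return -1;
	}
}
```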
Sascha Wildner [Fri, 13 Mar 2020 20:42:59 +0000 (21:42 +0100)]
kernel/kprintf: Add a tunable for the kern.kprintf_logging sysctl.
While here, add one too for security.unprivileged_read_msgbuf and
document the tunables affecting kprintf(9).
François Tigeot [Thu, 12 Mar 2020 06:16:53 +0000 (07:16 +0100)]
drm/i915: Update DRIVER_DATE to 20161024
Matthew Dillon [Wed, 11 Mar 2020 18:57:57 +0000 (11:57 -0700)]
kernel - Rework vfs_timestamp(), adjust default
* Rework the vfs_timestamp() precision mode as follows:
    0  TSP_SEC           seconds granularity
    1  TSP_HZ            ticks granularity
    2  TSP_USEC          ticks granularity modulo microseconds
    3  TSP_NSEC          ticks granularity modulo nanoseconds
    4  TSP_USEC_PRECISE  precise microseconds (expensive)
    5  TSP_NSEC_PRECISE  precise nanoseconds (expensive)
The default is TSP_USEC (with tick granularity)
* Change numerous bits of code that were calling getmicrotime()
or calling microtime()/nanotime() explicitly instead of calling
vfs_timestamp(). procfs and devfs in particular.
Reported-by: mjg
Matthew Dillon [Wed, 11 Mar 2020 18:55:57 +0000 (11:55 -0700)]
kernel - Do not use rdtsc() in the spinlock loop when virtualized
* When running as a guest, do not use rdtsc() in the spinlock loop
as numerous HVM subsystems will trap-out on the instruction.
Reported-by: mjg
Matthew Dillon [Wed, 11 Mar 2020 18:50:36 +0000 (11:50 -0700)]
kernel - Allow 8254 timer to be forced, clean-up user/sys/intr/idle
* Allows the 8254 timer to be forced on for machines which do not
support the LAPIC timer during deep-sleep. Fix an assertion that
occurs in this situation.
hw.i8254.intr_disable="0"
* Adjust the statclock to calculate user/sys/intr/idle time
properly when the clock interrupt occurs from an interrupt
thread instead of from a hard interrupt.
Basically when the clock interrupt occurs from an interrupt thread,
we have to look at curthread->td_preempted instead of curthread.
In addition RQF_INTPEND will be set across the call due to the way
processing works and we have to look at the bitmask of interrupt
sources instead of this bit.
Reported-by: CuteLarva
Sascha Wildner [Wed, 11 Mar 2020 15:13:49 +0000 (16:13 +0100)]
BSD.include.dist: Fix indentation (we use spaces in these files).
François Tigeot [Wed, 11 Mar 2020 11:19:45 +0000 (12:19 +0100)]
world: Install Linux headers required by Mesa >= 19.3
Avoiding many patches in dports
François Tigeot [Wed, 11 Mar 2020 11:15:31 +0000 (12:15 +0100)]
linux/types.h: Fix compilation with userland C++ programs
Such as newer Mesa versions
François Tigeot [Mon, 9 Mar 2020 22:28:12 +0000 (23:28 +0100)]
drm/linux: Rewrite the tasklet implementation
Newer drm/i915 driver versions expect tasklets to run in dedicated
threads and no longer work with synchronous calls.
Thanks to Matthew Dillon for advice on locking issues and how best
to resolve mp races.
François Tigeot [Sun, 8 Mar 2020 21:12:42 +0000 (22:12 +0100)]
drm/linux: Add put_pid()
Sascha Wildner [Sun, 8 Mar 2020 14:19:35 +0000 (15:19 +0100)]
libkvm: No need to include <sys/proc.h> when <sys/user.h> is included.
Sascha Wildner [Sun, 8 Mar 2020 13:34:21 +0000 (14:34 +0100)]
kernel: Include <sys/lock.h> instead of <sys/mutex.h> in linux/kfifo.h.
This should have been changed in 45aa70c6e8cc2435e82aabd4d0d233948c7cb105.
Sascha Wildner [Sun, 8 Mar 2020 13:18:39 +0000 (14:18 +0100)]
kernel: Add missing newlines at the end of two files.
Sascha Wildner [Sun, 8 Mar 2020 07:18:25 +0000 (08:18 +0100)]
Revert "Remove unneeded *_if.c from SRCS in kernel module Makefiles that have it."
This reverts commit 99bd8089615e30757d8327c0a5afe0b8fe69d337.
Oops, this seems to have broken a few things after all. I'll investigate better.
Reported-by: Peeter Must
Sascha Wildner [Sun, 8 Mar 2020 05:48:48 +0000 (06:48 +0100)]
Remove unneeded *_if.c from SRCS in kernel module Makefiles that have it.
Those are always compiled into the kernel, per NORMAL_M in kern.pre.mk,
so they don't need to be in a module's SRCS. This removes the few cases
where they were added by mistake.
François Tigeot [Sat, 7 Mar 2020 18:28:17 +0000 (19:28 +0100)]
drm/linux: Add disable_irq() and enable_irq()
François Tigeot [Sat, 7 Mar 2020 18:13:58 +0000 (19:13 +0100)]
drm/linux: Add atomic_fetch_xor()
François Tigeot [Sat, 7 Mar 2020 18:09:24 +0000 (19:09 +0100)]
drm/linux: Add io_mapping_init_wc() and _fini()
Matthew Dillon [Sat, 7 Mar 2020 17:41:28 +0000 (09:41 -0800)]
rtld - Use kern.tls_extra
* Use kern.tls_extra, if available, to calculate the extra tls
space to allocate for late library loads.
* If not available, default to 6144 bytes instead of 1280 bytes
to support greater use of static tls sections in late-loaded
libraries (read: mesa 19.3).
Reported-by: ftigeot
Matthew Dillon [Sat, 7 Mar 2020 17:39:03 +0000 (09:39 -0800)]
kernel - Add sysctl kern.tls_extra
* Add sysctl kern.tls_extra, defaulting to 6144, which rtld will query
to get the amount of extra tls space to allocate to accommodate late
library loads.
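A userland query with the fallback described in the rtld commit might look like the sketch below. It assumes kern.tls_extra is exported as an int, which these commit messages do not state, and the function name tls_extra_sketch() is hypothetical. On systems other than DragonFly the sysctl path is compiled out and the 6144-byte default is returned.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__DragonFly__)
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

#define TLS_EXTRA_FALLBACK	6144	/* default used when the sysctl is absent */

/*
 * Hypothetical helper, not the rtld implementation.
 *
 * Returns the extra static TLS space to reserve for late library
 * loads: the kern.tls_extra value if it can be read, else the
 * fallback.  The int type of the sysctl is an assumption.
 */
size_t
tls_extra_sketch(void)
{
#if defined(__DragonFly__)
	int val;
	size_t len = sizeof(val);

	if (sysctlbyname("kern.tls_extra", &val, &len, NULL, 0) == 0 &&
	    len == sizeof(val) && val > 0)
		return (size_t)val;
#endif
	return TLS_EXTRA_FALLBACK;
}
```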
François Tigeot [Fri, 6 Mar 2020 10:13:03 +0000 (11:13 +0100)]
drm/linux: Add wake_up_bit() and wait_on_bit_timeout()
François Tigeot [Fri, 6 Mar 2020 10:12:39 +0000 (11:12 +0100)]
drm: Improve linux/timer.h
François Tigeot [Fri, 6 Mar 2020 10:12:08 +0000 (11:12 +0100)]
drm/linux: Add list_for_each_entry_from()
Obtained-from: FreeBSD
François Tigeot [Fri, 6 Mar 2020 10:11:50 +0000 (11:11 +0100)]
drm/linux: Add __add_wait_queue_tail()
Obtained-from: FreeBSD
François Tigeot [Fri, 6 Mar 2020 10:08:55 +0000 (11:08 +0100)]
drm/linux: Add atomic_set_release()
Obtained-from: FreeBSD
Matthew Dillon [Thu, 5 Mar 2020 18:40:54 +0000 (10:40 -0800)]
kernel - Add minor VM shortcuts (2)
* Fix bug last commit. I was trying to shortcut the case where the
vm_page was not flagged MAPPED or WRITEABLE, but didn't read my
own code comment above the conditional and issued a vm_page_free()
without first checking to see if the VM object could be locked.
This led to a livelock in the kernel under heavy loads.
* Rejigger the fix to do the shortcut in a slightly different
place.
François Tigeot [Thu, 5 Mar 2020 08:33:22 +0000 (09:33 +0100)]
drm/linux: Add oops_in_progress
François Tigeot [Thu, 5 Mar 2020 08:32:59 +0000 (09:32 +0100)]
drm/linux: Add reboot_notifier functions
François Tigeot [Thu, 5 Mar 2020 08:31:54 +0000 (09:31 +0100)]
drm/linux: Add the "noinline" compiler directive
François Tigeot [Thu, 5 Mar 2020 08:31:08 +0000 (09:31 +0100)]
drm/linux: Add spin_lock_irqsave_nested()
François Tigeot [Thu, 5 Mar 2020 08:24:46 +0000 (09:24 +0100)]
drm/linux: Implement static_branch_xxx functions
François Tigeot [Thu, 5 Mar 2020 08:24:04 +0000 (09:24 +0100)]
drm: Add a few linux/gfp.h constants and functions
François Tigeot [Thu, 5 Mar 2020 08:23:26 +0000 (09:23 +0100)]
drm/linux: Add the X86_FEATURE_XMM4_1 flag
Matthew Dillon [Thu, 5 Mar 2020 01:21:41 +0000 (17:21 -0800)]
tmpfs - Fix bug last commit
* TMPFS_NODE_LOCK -> TMPFS_NODE_UNLOCK
Reported-by: Aaron LI
Matthew Dillon [Wed, 4 Mar 2020 17:23:09 +0000 (09:23 -0800)]
kernel - Minor pmap optimizations, minor swap_pager*() optimizations
* Avoid the atomic_fcmpset_int() in pmap_enter() if the page already
has the appropriate PG_* bits set. Because the page is soft-busied,
this check should not be able to race any clearing of the bits (which
can only be done when the page is hard-busied).
* swap_pager_freespace*() no longer bothers to acquire the object
token if the object's swblock_count is 0.
* Misc __read_* optimizations
* Misc __predict* optimizations
Matthew Dillon [Wed, 4 Mar 2020 17:17:47 +0000 (09:17 -0800)]
kernel - Add minor VM shortcuts
* Adjust vm_page_hash_elm heuristic to save the full pindex field
instead of just the lower 32 bits.
* Refactor the hash table and hash lookup to index directly to the
potential hit rather than masking to the SET size (~3). This
improves our chances of finding the requested page without having
to iterate.
The hash table is now N + SET sized and the SET iteration runs
from the potential direct-hit point forwards.
* Minor __predict* code optimizations.
* Shortcut vm_page_alloc() when PG_MAPPED|PG_WRITEABLE are clear
to avoid unnecessary code paths.
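The direct-index lookup with an N + SET sized table can be sketched as below. This is an illustration of the idea only: the table size, hash function, replacement policy, and all names ending in _sketch are assumptions, and the real vm_page hash uses different heuristics.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HASH_N		64	/* hypothetical table size, power of 2 */
#define HASH_SET	4	/* entries probed per lookup */

struct hash_elm_sketch {
	const void *object;
	uint64_t pindex;	/* full 64-bit page index, not just the low 32 bits */
	void *page;
};

/* N + SET entries, so a probe starting at the last index never wraps. */
static struct hash_elm_sketch hash_table_sketch[HASH_N + HASH_SET];

/* Index directly to the most likely slot rather than to a SET-aligned base. */
size_t
hash_index_sketch(const void *object, uint64_t pindex)
{
	uintptr_t h = ((uintptr_t)object >> 6) + (uintptr_t)pindex;

	return (size_t)(h & (HASH_N - 1));
}

/* Probe forward from the direct-hit slot; the first probe usually hits. */
void *
hash_lookup_sketch(const void *object, uint64_t pindex)
{
	struct hash_elm_sketch *e;
	int i;

	e = &hash_table_sketch[hash_index_sketch(object, pindex)];
	for (i = 0; i < HASH_SET; ++i) {
		if (e[i].page != NULL && e[i].object == object &&
		    e[i].pindex == pindex)
			return e[i].page;
	}
	return NULL;
}

/* Use the first empty or matching slot, else evict the direct-hit slot. */
void
hash_enter_sketch(const void *object, uint64_t pindex, void *page)
{
	struct hash_elm_sketch *e, *victim;
	int i;

	assert(page != NULL);
	e = &hash_table_sketch[hash_index_sketch(object, pindex)];
	victim = &e[0];
	for (i = 0; i < HASH_SET; ++i) {
		if (e[i].page == NULL ||
		    (e[i].object == object && e[i].pindex == pindex)) {
			victim = &e[i];
			break;
		}
	}
	victim->object = object;
	victim->pindex = pindex;
	victim->page = page;
}
```

Because adjacent indices have overlapping probe windows, two entries that hash to the same slot can coexist without a separate chain.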
Matthew Dillon [Wed, 4 Mar 2020 17:16:09 +0000 (09:16 -0800)]
kernel - Minor optimizations
* Minor __predict and __read_mostly/frequently optimizations.
Matthew Dillon [Wed, 4 Mar 2020 17:10:37 +0000 (09:10 -0800)]
tmpfs - Fix minor deadlock, refactor tn_links
* Fix a minor deadlock. tmpfs_alloc_vp() can rarely race a vnode
and leave a dangling lock, causing a later umount to deadlock.
* Refactor tn_links to use atomic ops, mainly to clean-up an
almost impossible race that can happen at umount time.
François Tigeot [Wed, 4 Mar 2020 16:37:25 +0000 (17:37 +0100)]
drm/linux: Add __test_and_set_bit()
François Tigeot [Wed, 4 Mar 2020 16:29:16 +0000 (17:29 +0100)]
drm: Add a few Linux headers
Matthew Dillon [Wed, 4 Mar 2020 04:06:12 +0000 (20:06 -0800)]
kernel - Rename spinlock counter trick API
* Rename the access side of the API from spin_update_*() to
spin_access_*() to avoid confusion.
Matthew Dillon [Wed, 4 Mar 2020 03:35:59 +0000 (19:35 -0800)]
kernel - Refactor cache_vref() using counter trick
* Refactor cache_vref() such that it is able to validate that a vnode
(whose ref count might be 0) is not in VRECLAIM, without acquiring the
vnode lock. This is the normal case.
If cache_vref() is unable to do this, it backs down to the old method
which was to get a vnode lock, validate that the vnode is not in
VRECLAIM, then release the lock.
* NOTE: In DragonFlyBSD, holding a vref on a vnode (vref, NOT vhold) will
prevent the vnode from transitioning to VRECLAIM.
* Use the new feature for nlookup's naccess() tests and for the *stat*()
series of system calls.
This significantly increases performance. However, we are not entirely
cache-contention free as both the namecache entry and the vnode are still
referenced, requiring atomic adds.
Matthew Dillon [Wed, 4 Mar 2020 03:34:08 +0000 (19:34 -0800)]
kernel - More counter trick spinlock APIs
* Add spin_update_start_only(), spin_update_check_inprog(), and
spin_update_end_only(). Same as the non-only versions but do not
acquire a spin-lock in the (v & 1) case.
Matthew Dillon [Wed, 4 Mar 2020 01:58:43 +0000 (17:58 -0800)]
kernel - Fix minor scheduler bugs
* Fix an issue where the ucount and/or uload could get out of sync,
giving a cpu a permanent bias one way or the other.
* Fix an issue where the scheduler pull feature (0x01 or 0x04) would
ignore cpus with non-zero run-queue counts in favor of cpus with
high uloads (that can be due to processes sleeping < 1 second).
* By default, the scheduler helper's pull feature now runs once a tick.
This may be changed as testing progresses.
Matthew Dillon [Tue, 3 Mar 2020 21:26:48 +0000 (13:26 -0800)]
kernel - Normalize the vx_*() vnode interface
* The vx_*() vnode interface is used for initial allocations, reclaims,
and terminations.
Normalize all use cases to prevent the mixing together of the vx_*()
API and the vn_*() API. For example, vx_lock() should not be paired
with vn_unlock(), and so forth.
* Integrate an update-counter mechanism into the vx_*() API, assert
reasonability.
* Change vfs_cache.c to use an int update counter instead of a long.
The vfs_cache code can't quite use the spin-lock update counter API
yet.
Use proper atomics for load and store.
* Implement VOP_GETATTR_QUICK, meant to be a 'quick' version of
VOP_GETATTR() that only retrieves information related to permissions
and ownership. This will be fast-pathed in a later commit.
* Implement vx_downgrade() to convert an exclusive vx_lock into an
exclusive vn_lock (for vnodes). Adjust all use cases in the
getnewvnode() path.
* Remove unnecessary locks in tmpfs_getattr() and don't use
any in tmpfs_getattr_quick().
* Remove unnecessary locks in hammer2_vop_getattr() and don't use
any in hammer2_vop_getattr_quick()
Matthew Dillon [Tue, 3 Mar 2020 18:21:33 +0000 (10:21 -0800)]
kernel - Integrate the counter & 1 trick into the spinlock API
* The counter trick allows read accessors to sample a data structure
without any further locks or ref-counts, as long as the data structure
is free-safe.
It does not necessarily protect pointers within the data structure so e.g.
ref'ing some sub-structure via a data structure pointer is not safe
on its own unless the sub-structure is able to provide some sort of
additional guarantee.
* Our struct spinlock has always been 8 bytes, but only uses 4 bytes for
the lock. Implement the new API using the second field.
Accessor side:
spin_update_start()
spin_update_end()
Modifier side:
spin_lock_update()
spin_unlock_update()
* On the accessor side, if spin_update_start() detects a change in-progress
it will obtain a shared spin-lock, else remains unlocked.
spin_update_end() tells the caller whether it must retry the operation,
i.e. if a change occurred between start and end. This can only happen
if spin_update_start() remained unlocked.
If the start did a shared lock then no changes are assumed to have occurred
and spin_update_end() will release the shared spinlock and return 0.
* On the modifier side, spin_lock_update() obtains an exclusive spinlock
and increments the update counter, making it odd ((spin->update & 1) != 0).
spin_unlock_update() increments the counter again, making it even but
different, and releases the exclusive spinlock.
Matthew Dillon [Tue, 3 Mar 2020 04:42:22 +0000 (20:42 -0800)]
kernel - syscall path optimizations
* Shortcut checks in dfly_acquire_curproc(), significantly reducing
system call overhead.
* Move one of the TDF_MP_BATCH_DEMARC tests out of dfly_acquire_curproc()
and into the scheduler clock.
* Add appropriate __predict*() macros to various conditionals in the
system call path and convert the terminal switch() for syscall to
a sequence of if()'s.
* Remove SYF_ARGMASK.
Aaron LI [Tue, 3 Mar 2020 14:04:37 +0000 (22:04 +0800)]
development.7: Modernize git commands for vendor import recipe
Since Git v1.7.2, 'git checkout --orphan' is the proper way to create an
orphan branch for vendor import. Use the new git commands to simplify
the vendor import recipe.
In addition, polish the recipe a bit.
Credit: https://stackoverflow.com/a/4288660
Aaron LI [Tue, 3 Mar 2020 09:15:09 +0000 (17:15 +0800)]
release.7: Rename MAKE_JOBS to NREL_MAKE_JOBS
Follow the change made in 834a13062350ab14a8c6aa28e5f9419613c173c2.
François Tigeot [Tue, 3 Mar 2020 07:51:13 +0000 (08:51 +0100)]
drm/linux: handle NULL pointers in kmap_to_page()
This prevents i915-related crashes in some rare circumstances.
Sascha Wildner [Tue, 3 Mar 2020 03:09:43 +0000 (04:09 +0100)]
psm.4/vga.4: Clean up a bit.
Sascha Wildner [Tue, 3 Mar 2020 02:57:49 +0000 (03:57 +0100)]
Remove historic references to config(8) from various manual pages.
Running it is normally not needed nowadays as it is part of the
buildkernel target.
Sascha Wildner [Tue, 3 Mar 2020 02:43:54 +0000 (03:43 +0100)]
ukbd.4: Fix wrong info.
Sascha Wildner [Tue, 3 Mar 2020 02:30:43 +0000 (03:30 +0100)]
cd.4: Remove a section that is pointless nowadays.
Matthew Dillon [Tue, 3 Mar 2020 01:28:44 +0000 (17:28 -0800)]
dsynth - Add 'debug' directive to the manual page
* Describe the debug directive in the manual page. It's been in dsynth
for a while and can be quite useful.
Matthew Dillon [Tue, 3 Mar 2020 01:11:35 +0000 (17:11 -0800)]
kernel - Refactor vfs_cache 4/N
* Refactor cache_findmount() to operate conflict-free and cache line
bounce free for the most part. The counter trick is used to probe
cache entries and combined with a local pcpu spinlock to interlock
against unmounts.
The umount code is now a bit more expensive (it has to acquire all
pcpu umount spinlocks before cleaning the cache out).
This code is not ideal but it performs up to 6x better on multiple cpus.
* Refactor _cache_mntref() to use a 4-way set association.
* Rewrite cache_copy() and cache_drop_and_cache()'s caching algorithm.
* Use cache_dvpref() from nlookup() instead of rolling the code twice.
* Rewrite the nlookup*() and cache_nlookup*() code to generally leave
namecache records unlocked throughout, removing one layer of shared
locks from cpu contention. Only the last element is locked.
* Refactor nlookup*()'s handling of absolute paths a bit more.
* Refactor nlookup*()'s handling of NLC_REFDVP to better-validate
that the parent directory is actually the parent directory.
This also necessitates a nlookupdata.nl_dvp check in various system
calls using NLC_REFDVP to detect the mount-point case and return
the proper error code (usually EINVAL, but e.g. mkdir would return
EEXIST).
* Clean up _cache_lock() and friends to restore the diagnostic messages
when a namecache lock stalls for too long.
* FIX: Fix bugs in nlookup*() retry code. The retry code was not properly
unwinding symlink path construction during the loop and also not properly
resetting the base directory when looping up. This primarily affects NFS.
* NOTE: Using iscsi_crc32() at the moment to get a good hash distribution.
This is obviously expensive, but at least it is per-cpu.
* NOTE: The cache_nlookup() nchpp cache still has a shared spin-lock
that will cache-line-bounce concurrent acquisitions.
Matthew Dillon [Tue, 3 Mar 2020 01:09:54 +0000 (17:09 -0800)]
nrelease - Rename MAKE_JOBS to NREL_MAKE_JOBS
* Remove some confusion by renaming this variable. Should have no
effect on nrelease builds.
Matthew Dillon [Tue, 3 Mar 2020 01:08:34 +0000 (17:08 -0800)]
libc - Use fixed block size for db hash and btree method
* Use a fixed block size for newly created hash and btree DB
files instead of querying the filesystem for st_blksize.
st_blksize has little to do with what an efficient blocksize
for I/O would be on a modern system.
Matthew Dillon [Tue, 3 Mar 2020 01:07:40 +0000 (17:07 -0800)]
tmpfs - Close a hole in tmpfs_vop_nremove()
* Close a race condition in tmpfs_vop_nremove() that could result
in an assertion.
Matthew Dillon [Tue, 3 Mar 2020 01:05:59 +0000 (17:05 -0800)]
kernel - Enable stosq based pagezero()
* For page-sized blocks both AMD and Intel seem to do a good job
with stosq, though with real workloads it doesn't seem to really
improve performance. Go back to stosq for now though.
Matthew Dillon [Sat, 29 Feb 2020 06:27:39 +0000 (22:27 -0800)]
kernel - Improve cache_fullpath(), plus cleanup
* Improve cache_fullpath(). It can use a shared lock rather than an
exclusive lock, significantly improving concurrency. Important now
since realpath() indirectly uses this function.
* Code cleanup. Remove unused vfscache_rollup_all()
Matthew Dillon [Sat, 29 Feb 2020 06:25:35 +0000 (22:25 -0800)]
kernel - Comment future vrele() code intention
* vrele() currently uses atomic_fcmpset_*() and will in the future
use atomic_fetchadd_*() instead, but I can't change it without a
bit more work.
* Avoid updating v_flag and v_act if the values do not change, reducing
SMP contention a bit.
Matthew Dillon [Sat, 29 Feb 2020 06:23:53 +0000 (22:23 -0800)]
kernel - Improve nlookup() performance w/absolute paths
* Improve nlookup() performance when handed an absolute path by
initializing the base ncp to the rootnch instead of the current
directory nch when an absolute path is detected.
Matthew Dillon [Sat, 29 Feb 2020 06:18:15 +0000 (22:18 -0800)]
kernel - Rearrange struct vnode fields
* Rearrange vnode fields for improved SMP performance. Place v_auxrefs
and v_refcnt together but in a different cache line than v_lock. This
allows concurrent SMP operations to effectively pipeline atomic ops
on these fields that we haven't yet been able to get rid of, improving
performance.
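The layout idea (reference counts together, lock in a different cache line) can be expressed with C11 alignment specifiers. The struct below is a stand-in, not struct vnode: the 64-byte line size, the int field types, and the name vnode_sketch are assumptions for illustration.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

#define CACHE_LINE_SKETCH	64	/* assumed cache line size in bytes */

/* Hypothetical stand-in for the relevant part of struct vnode. */
struct vnode_sketch {
	alignas(CACHE_LINE_SKETCH) int v_lock_word;	/* stand-in for v_lock */
	alignas(CACHE_LINE_SKETCH) int v_refcnt;	/* starts a new cache line */
	int v_auxrefs;					/* shares the line with v_refcnt */
};

/* The two ref counters share one line ... */
static_assert(offsetof(struct vnode_sketch, v_refcnt) / CACHE_LINE_SKETCH ==
    offsetof(struct vnode_sketch, v_auxrefs) / CACHE_LINE_SKETCH,
    "ref counters must share a cache line");

/* ... and the lock lives in a different one. */
static_assert(offsetof(struct vnode_sketch, v_lock_word) / CACHE_LINE_SKETCH !=
    offsetof(struct vnode_sketch, v_refcnt) / CACHE_LINE_SKETCH,
    "lock and ref counters must not share a cache line");
```

The compile-time assertions document the intent, so a later field reordering that breaks the separation fails the build rather than silently costing performance.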
Matthew Dillon [Fri, 28 Feb 2020 04:38:58 +0000 (20:38 -0800)]
kernel - Refactor vfs_cache 3/N
* Leave the vnode held for each linked namecache entry, allowing us to
remove all the hold/drop code for 0->1 and 1->0 lock transitions of
ncps.
This significantly simplifies the cache_lock*() and cache_unlock()
functions.
* Adjust the vnode recycling code to check v_auxrefs against
v_namecache_count instead of against 0.
Matthew Dillon [Fri, 28 Feb 2020 02:14:08 +0000 (18:14 -0800)]
kernel - Refactor vfs_cache 2/N
* Use lockmgr locks for the ncp lock. Convert nc_lockstatus / nc_locktd
to struct lock.
lockmgr locks use atomic_fetchadd_*() instead of atomic_fcmpset_*()
for nominal shared and exclusive lock count operations, which avoids
contention loops on failed fcmpset operations. There is still cache
line contention but since the code doesn't have to loop so much it
scales to core count a whole lot better.
* Two experimental __cachealign's added to bloat struct namecache. It
won't stay this way.
* Retain the non-optimal nc_vp ref count mess, which is why nc_vprefs
is needed. This will be fixed next.
Matthew Dillon [Thu, 27 Feb 2020 20:01:35 +0000 (12:01 -0800)]
kernel - Continue pmap work
* Conditionalize this work on PMAP_ADVANCED, default enabled.
* Remove md_page.pmap_count and md_page.writeable_count, no longer
track these counts which cause tons of cache line interactions.
However, there are still a few stubborn hold-overs.
* The vm_page still needs to be soft-busied in the page fault path
* For now we need to have a md_page.interlock_count to flag pages
being replaced by pmap_enter() (e.g. COW faults) in order to be
able to safely dispose of the page without busying it.
This need will eventually go away, hopefully just leaving us with
the soft-busy-count issue.
Matthew Dillon [Thu, 27 Feb 2020 18:41:08 +0000 (10:41 -0800)]
kernel - Refactor vfs_cache ncp->nc_refs
* Refactor namecache->nc_refs to use atomic_fetchadd_int() instead
of atomic_fcmpset_int(), which really helps in heavily contended
situations.
This is accomplished by having a ref for every possible access point,
so that the 1->0 transition can lead directly to termination without
requiring further surgery.
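A reference count whose 1->0 transition leads directly to termination can be driven by a single fetch-and-add, with no retry loop. The sketch below uses C11 atomics in place of the kernel's atomic_fetchadd_int(); the structure and function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical ref-counted object, not struct namecache. */
struct refobj_sketch {
	atomic_int refs;
};

/* Take a reference.  One unconditional atomic add, never loops. */
void
ref_hold_sketch(struct refobj_sketch *o)
{
	atomic_fetch_add_explicit(&o->refs, 1, memory_order_relaxed);
}

/*
 * Drop a reference.  Returns true if this call performed the 1->0
 * transition, meaning the caller now owns the object and must
 * destroy it.
 */
bool
ref_drop_sketch(struct refobj_sketch *o)
{
	int old;

	old = atomic_fetch_sub_explicit(&o->refs, 1, memory_order_acq_rel);
	assert(old > 0);	/* dropping a ref that was never held is a bug */
	return old == 1;
}
```

This only works if every access point holds its own reference, as the commit describes; otherwise something could still reach the object after the count hits zero.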
Matthew Dillon [Thu, 27 Feb 2020 02:41:24 +0000 (18:41 -0800)]
kernel - Update vfs_cache to use fcmpset
* Update vfs_cache from using atomic_cmpset_*() to using
atomic_fcmpset_*().
Sascha Wildner [Fri, 28 Feb 2020 17:09:59 +0000 (18:09 +0100)]
Minor Makefiles cleanup in systat and flame_graph.
Matthew Dillon [Thu, 27 Feb 2020 19:47:15 +0000 (11:47 -0800)]
flame_graph - Add initial code to support flame graphs
* Add better PC sampling code to the kernel, capable of generating
call stack traces.
* Implement an initial flame_graph utility.
flame_graph > /tmp/x.out &
(let it run a while)
flame_graph -p < /tmp/x.out
Requested-by: mjg
Sascha Wildner [Thu, 27 Feb 2020 16:32:15 +0000 (17:32 +0100)]
hier.7: Adjust for binutils234.
Sascha Wildner [Wed, 26 Feb 2020 19:40:46 +0000 (20:40 +0100)]
kernel: Fix kernel build with 'options KTR' after 2ff21866646c375554d6.
While here, indent the KTR_COND_LOG() macro like the KTR_LOG() macro,
which looks slightly better.
Matthew Dillon [Wed, 26 Feb 2020 06:15:11 +0000 (22:15 -0800)]
kernel - Add kern.usched_dfly.poll_ticks
* Add kern.usched_dfly.poll_ticks (default 0, same as previous
operation) for testing a more aggressive scheduler 'pulling'
mode.
Matthew Dillon [Wed, 26 Feb 2020 06:07:22 +0000 (22:07 -0800)]
kernel - Code path optimization for kmalloc/kfree
* Use __read_frequently for numerous variables.
* Increase the pcpu slab cache on high-memory machines. This reduces
kernel_map and smpinvltlb interactions.
* Get rid of the ZoneGenAlloc and ZoneBigAlloc sysctl tracking
variables. These were causing unnecessary frequent cache line
bouncing between cpus.
Also get rid of SlabsFreed and SlabsAllocated.
* Get rid of unused sysctl variables.
* Make the on-kfree bcopy of weirdary optional (default OFF now).
It was previously unconditional and on by default.
Matthew Dillon [Wed, 26 Feb 2020 06:05:43 +0000 (22:05 -0800)]
kernel - Simple code path optimizations
* Add __read_mostly and __read_frequently to numerous variables as
appropriate to reduce unnecessary cache line ping-ponging.
* Adjust conditionals in the syscall code with __predict_true/false
to clean up the execution path.
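Branch prediction hints of this kind are typically thin wrappers around the GCC/Clang __builtin_expect() builtin. The sketch below assumes such a compiler; the macro and function names carry a _sketch suffix because they are illustrative, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed GCC/Clang builtin; !! normalizes the condition to 0 or 1. */
#define predict_true_sketch(exp)	__builtin_expect(!!(exp), 1)
#define predict_false_sketch(exp)	__builtin_expect(!!(exp), 0)

/*
 * Hypothetical argument copy on a hot path.  The error check is
 * marked unlikely so the compiler can lay out the common case as the
 * straight-line path.
 */
int
copy_args_sketch(int *dst, const int *src, size_t narg, size_t maxarg)
{
	size_t i;

	assert(dst != NULL && src != NULL);
	if (predict_false_sketch(narg > maxarg))
		return -1;
	for (i = 0; i < narg; ++i)
		dst[i] = src[i];
	return 0;
}
```

The hint changes code layout only, never the result, so a wrong guess costs performance but not correctness.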
Matthew Dillon [Wed, 26 Feb 2020 05:57:52 +0000 (21:57 -0800)]
kernel - Rearrange uidinfo structure a bit
* Rearrange the structure to move ui_lock and ui_refs
into a cache-line isolated area of the structure.
Sascha Wildner [Tue, 25 Feb 2020 15:48:03 +0000 (16:48 +0100)]
kernel/options: Fix wrong LINT->LINT64 replacement in a comment.
Matthew Dillon [Tue, 25 Feb 2020 05:29:36 +0000 (21:29 -0800)]
kernel - Simple cache line optimizations
* Reorder struct vm_page, struct vnode, and struct vm_object a bit
to improve cache-line locality.
* Use atomic_fcmpset_*() instead of atomic_cmpset_*() in several
places to reduce the inter-cpu cache coherency load a bit.
Matthew Dillon [Mon, 24 Feb 2020 23:00:00 +0000 (15:00 -0800)]
kernel - Try to fix tcp ISN generator
* The ISN generator couldn't stand the test of time. Very fast port
reuse can catch the destination host inpcb still in a TIME_WAIT
state and a bad ISN results in the destination ignoring the new SYN.
The old ISN generator could wind up returning the same sequence
number for fast reconnects occurring within the same tick.
Reimplement the ISN generator and also make it SMP friendly and
cache friendly. Because... it really wasn't before. Also attempt
to modernize the monotonic sequence space algorithm, reseed the
secret every 20 seconds, and make the reseeding non-disruptive to
sequence space monotonicity.
* Change the TH_SYN + TIME_WAIT state handling. Generally speaking it
is intended that a new SYN when the inpcb is in TIME_WAIT recycle the
port/address pair and allow the new connection.
The sequence space checks for the TH_SYN may have been too strict.
Change the check to allow the recycling of the port/address pair
as long as the SYN has a different sequence number than the previous
connection.
I believe this is relatively safe since the recycling can only happen
if the socket is already in a TIME_WAIT state, but consider the code
still under test.
Matthew Dillon [Mon, 24 Feb 2020 22:56:05 +0000 (14:56 -0800)]
jail - add jail.defaults.allow_listen_override (3)
* Normalize the nominal jail IP conversions to the system call
interface whenever it is convenient. Remove conversions that
were previously in the udp and tcp connect and send code.
* Also do jail IP conversions in bind(), connect(), extconnect(),
sendto(), sendmsg(), recvfrom(), recvmsg().
* Refactor in_pcbladdr_find() to improve jail bindings, try to find
the correct interface IP to bind to. When a route is utilized,
iterate available interface IPs to locate a jail-acceptable IP
on the same interface.
Sascha Wildner [Mon, 24 Feb 2020 14:13:47 +0000 (15:13 +0100)]
boot/loader: Small improvement to merge_help.awk.
FreeBSD's r162742:
Ignore a sub-topic match if it is inside the command description.
Otherwise, merge-help can get confused by a command description that
includes a word that starts with a capital S.