Sascha Wildner [Tue, 17 Mar 2020 14:24:15 +0000 (15:24 +0100)]
Update the pciconf(8) database.
March 7, 2020 snapshot from https://pci-ids.ucw.cz
Matthew Dillon [Mon, 16 Mar 2020 18:39:40 +0000 (11:39 -0700)]
kernel - Fix rare vm_map_entry exhaustion panic (2)
* Increase per-cpu fast-cache hysteresis from its absurdly small
value to a significantly larger value.
* Missing header file update for prior commit
Matthew Dillon [Mon, 16 Mar 2020 18:24:52 +0000 (11:24 -0700)]
kernel - Fix rare vm_map_entry exhaustion panic
* Fix a rare situation where many processes blocked in zget() on
the same CPU can cause the kernel's per-cpu vm_map_entry entry
to be exhausted, causing a panic.
The situation arises because the zget() operation can wake other
threads up via the deeper vm_map lock after burning vm_map_entry's
to expand the space but prior to adding the new structural objects
to the vm_zone.
* Caused a leaf.dragonflybsd.org panic on concurrent git fork/exec's
via the web server.
Sascha Wildner [Sat, 14 Mar 2020 18:15:48 +0000 (19:15 +0100)]
<sys/conf.h>: Remove some more dead prototypes.
Sascha Wildner [Sat, 14 Mar 2020 18:14:36 +0000 (19:14 +0100)]
kernel: Remove some get_dev() remains. We no longer have this function.
zrj [Sat, 14 Mar 2020 09:20:30 +0000 (11:20 +0200)]
kernel: Adjust description for kern.tls_extra
zrj [Thu, 13 Feb 2020 12:07:37 +0000 (14:07 +0200)]
kernel: Add handling for R_X86_64_PLT32 (type 4) in kernel linker.
Newer binutils can emit R_X86_64_PLT32 for -shared compilations.
Tested-with: binutils234
Sascha Wildner [Fri, 13 Mar 2020 20:42:59 +0000 (21:42 +0100)]
kernel/kprintf: Add a tunable for the kern.kprintf_logging sysctl.
While here, add one too for security.unprivileged_read_msgbuf and
document the tunables affecting kprintf(9).
François Tigeot [Thu, 12 Mar 2020 06:16:53 +0000 (07:16 +0100)]
drm/i915: Update DRIVER_DATE to 20161024
Matthew Dillon [Wed, 11 Mar 2020 18:57:57 +0000 (11:57 -0700)]
kernel - Rework vfs_timestamp(), adjust default
* Rework the vfs_timestamp() precision mode as follows:
0 TSP_SEC seconds granularity
1 TSP_HZ ticks granularity
2 TSP_USEC ticks granularity modulo microseconds
3 TSP_NSEC ticks granularity modulo nanoseconds
4 TSP_USEC_PRECISE precise microseconds (expensive)
5 TSP_NSEC_PRECISE precise nanoseconds (expensive)
The default is TSP_USEC (with tick granularity)
* Change numerous bits of code that were calling getmicrotime()
or calling microtime()/nanotime() explicitly instead of calling
vfs_timestamp(). procfs and devfs in particular.
Reported-by: mjg
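The precision modes listed above can be written down as a small C sketch. The enum names and numbers come from the list in this commit message; the two helper functions are hypothetical, and reading "modulo microseconds" as trimming the nanosecond field to a multiple of 1000 is an assumption, not something the message states.

```c
#include <assert.h>

/* Mode numbers and names as listed in the commit message. */
enum tsp_mode {
    TSP_SEC = 0,            /* seconds granularity */
    TSP_HZ = 1,             /* ticks granularity */
    TSP_USEC = 2,           /* ticks granularity modulo microseconds */
    TSP_NSEC = 3,           /* ticks granularity modulo nanoseconds */
    TSP_USEC_PRECISE = 4,   /* precise microseconds (expensive) */
    TSP_NSEC_PRECISE = 5    /* precise nanoseconds (expensive) */
};

/* Hypothetical helper: does the mode need the expensive precise clock? */
static int tsp_is_precise(enum tsp_mode mode)
{
    return mode == TSP_USEC_PRECISE || mode == TSP_NSEC_PRECISE;
}

/*
 * Hypothetical helper: trim the nanosecond field of a timestamp.
 * Assumption: the microsecond modes drop the sub-microsecond part and
 * the seconds mode drops the fraction entirely.
 */
static long tsp_trim_nsec(enum tsp_mode mode, long nsec)
{
    switch (mode) {
    case TSP_SEC:
        return 0;
    case TSP_USEC:
    case TSP_USEC_PRECISE:
        return nsec - nsec % 1000;
    default:
        return nsec;
    }
}
```

Under these assumptions the default mode, TSP_USEC, keeps the cheap tick-based clock source and only trims the value it returns.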
Matthew Dillon [Wed, 11 Mar 2020 18:55:57 +0000 (11:55 -0700)]
kernel - Do not use rdtsc() in the spinlock loop when virtualized
* When running as a guest, do not use rdtsc() in the spinlock loop
as numerous HVM subsystems will trap-out on the instruction.
Reported-by: mjg
Matthew Dillon [Wed, 11 Mar 2020 18:50:36 +0000 (11:50 -0700)]
kernel - Allow 8254 timer to be forced, clean-up user/sys/intr/idle
* Allows the 8254 timer to be forced on for machines which do not
support the LAPIC timer during deep-sleep. Fix an assertion that
occurs in this situation.
hw.i8254.intr_disable="0"
* Adjust the statclock to calculate user/sys/intr/idle time
properly when the clock interrupt occurs from an interrupt
thread instead of from a hard interrupt.
Basically when the clock interrupt occurs from an interrupt thread,
we have to look at curthread->td_preempted instead of curthread.
In addition RQF_INTPEND will be set across the call due to the way
processing works and we have to look at the bitmask of interrupt
sources instead of this bit.
Reported-by: CuteLarva
Sascha Wildner [Wed, 11 Mar 2020 15:13:49 +0000 (16:13 +0100)]
BSD.include.dist: Fix indentation (we use spaces in these files).
François Tigeot [Wed, 11 Mar 2020 11:19:45 +0000 (12:19 +0100)]
world: Install Linux headers required by Mesa >= 19.3
Avoiding many patches in dports
François Tigeot [Wed, 11 Mar 2020 11:15:31 +0000 (12:15 +0100)]
linux/types.h: Fix compilation with userland C++ programs
Such as newer Mesa versions
François Tigeot [Mon, 9 Mar 2020 22:28:12 +0000 (23:28 +0100)]
drm/linux: Rewrite the tasklet implementation
Newer drm/i915 driver versions expect tasklets to run in dedicated
threads and no longer work with synchronous calls.
Thanks to Matthew Dillon for advice on locking issues and how best
to resolve mp races.
François Tigeot [Sun, 8 Mar 2020 21:12:42 +0000 (22:12 +0100)]
drm/linux: Add put_pid()
Sascha Wildner [Sun, 8 Mar 2020 14:19:35 +0000 (15:19 +0100)]
libkvm: No need to include <sys/proc.h> when <sys/user.h> is included.
Sascha Wildner [Sun, 8 Mar 2020 13:34:21 +0000 (14:34 +0100)]
kernel: Include <sys/lock.h> instead of <sys/mutex.h> in linux/kfifo.h.
This should have been changed in 45aa70c6e8cc2435e82aabd4d0d233948c7cb105.
Sascha Wildner [Sun, 8 Mar 2020 13:18:39 +0000 (14:18 +0100)]
kernel: Add missing newlines at the end of two files.
Sascha Wildner [Sun, 8 Mar 2020 07:18:25 +0000 (08:18 +0100)]
Revert "Remove unneeded *_if.c from SRCS in kernel module Makefiles that have it."
This reverts commit 99bd8089615e30757d8327c0a5afe0b8fe69d337.
Oops, this seems to have broken a few things after all. I'll investigate better.
Reported-by: Peeter Must
Sascha Wildner [Sun, 8 Mar 2020 05:48:48 +0000 (06:48 +0100)]
Remove unneeded *_if.c from SRCS in kernel module Makefiles that have it.
Those are always compiled into the kernel, per NORMAL_M in kern.pre.mk,
so they don't need to be in a module's SRCS. This removes the few cases
where they were added by mistake.
François Tigeot [Sat, 7 Mar 2020 18:28:17 +0000 (19:28 +0100)]
drm/linux: Add disable_irq() and enable_irq()
François Tigeot [Sat, 7 Mar 2020 18:13:58 +0000 (19:13 +0100)]
drm/linux: Add atomic_fetch_xor()
François Tigeot [Sat, 7 Mar 2020 18:09:24 +0000 (19:09 +0100)]
drm/linux: Add io_mapping_init_wc() and _fini()
Matthew Dillon [Sat, 7 Mar 2020 17:41:28 +0000 (09:41 -0800)]
rtld - Use kern.tls_extra
* Use kern.tls_extra, if available, to calculate the extra tls
space to allocate for late library loads.
* If not available, default to 6144 bytes instead of 1280 bytes
to support greater use of static tls sections in late-loaded
libraries (read: mesa 19.3).
Reported-by: ftigeot
Matthew Dillon [Sat, 7 Mar 2020 17:39:03 +0000 (09:39 -0800)]
kernel - Add sysctl kern.tls_extra
* Add sysctl kern.tls_extra, defaulting to 6144, which rtld will query
to get the amount of extra tls space to allocate to accommodate late
library loads.
François Tigeot [Fri, 6 Mar 2020 10:13:03 +0000 (11:13 +0100)]
drm/linux: Add wake_up_bit() and wait_on_bit_timeout()
François Tigeot [Fri, 6 Mar 2020 10:12:39 +0000 (11:12 +0100)]
drm: Improve linux/timer.h
François Tigeot [Fri, 6 Mar 2020 10:12:08 +0000 (11:12 +0100)]
drm/linux: Add list_for_each_entry_from()
Obtained-from: FreeBSD
François Tigeot [Fri, 6 Mar 2020 10:11:50 +0000 (11:11 +0100)]
drm/linux: Add __add_wait_queue_tail()
Obtained-from: FreeBSD
François Tigeot [Fri, 6 Mar 2020 10:08:55 +0000 (11:08 +0100)]
drm/linux: Add atomic_set_release()
Obtained-from: FreeBSD
Matthew Dillon [Thu, 5 Mar 2020 18:40:54 +0000 (10:40 -0800)]
kernel - Add minor VM shortcuts (2)
* Fix bug last commit. I was trying to shortcut the case where the
vm_page was not flagged MAPPED or WRITEABLE, but didn't read my
own code comment above the conditional and issued a vm_page_free()
without first checking to see if the VM object could be locked.
This led to a livelock in the kernel under heavy loads.
* Rejigger the fix to do the shortcut in a slightly different
place.
François Tigeot [Thu, 5 Mar 2020 08:33:22 +0000 (09:33 +0100)]
drm/linux: Add oops_in_progress
François Tigeot [Thu, 5 Mar 2020 08:32:59 +0000 (09:32 +0100)]
drm/linux: Add reboot_notifier functions
François Tigeot [Thu, 5 Mar 2020 08:31:54 +0000 (09:31 +0100)]
drm/linux: Add the "noinline" compiler directive
François Tigeot [Thu, 5 Mar 2020 08:31:08 +0000 (09:31 +0100)]
drm/linux: Add spin_lock_irqsave_nested()
François Tigeot [Thu, 5 Mar 2020 08:24:46 +0000 (09:24 +0100)]
drm/linux: Implement static_branch_xxx functions
François Tigeot [Thu, 5 Mar 2020 08:24:04 +0000 (09:24 +0100)]
drm: Add a few linux/gfp.h constants and functions
François Tigeot [Thu, 5 Mar 2020 08:23:26 +0000 (09:23 +0100)]
drm/linux: Add the X86_FEATURE_XMM4_1 flag
Matthew Dillon [Thu, 5 Mar 2020 01:21:41 +0000 (17:21 -0800)]
tmpfs - Fix bug last commit
* TMPFS_NODE_LOCK -> TMPFS_NODE_UNLOCK
Reported-by: Aaron LI
Matthew Dillon [Wed, 4 Mar 2020 17:23:09 +0000 (09:23 -0800)]
kernel - Minor pmap optimizations, minor swap_pager*() optimizations
* Avoid the atomic_fcmpset_int() in pmap_enter() if the page already
has the appropriate PG_* bits set. Because the page is soft-busied,
this check should not be able to race any clearing of the bits (which
can only be done when the page is hard-busied).
* swap_pager_freespace*() no longer bothers to acquire the object
token if the object's swblock_count is 0.
* Misc __read_* optimizations
* Misc __predict* optimizations
Matthew Dillon [Wed, 4 Mar 2020 17:17:47 +0000 (09:17 -0800)]
kernel - Add minor VM shortcuts
* Adjust vm_page_hash_elm heuristic to save the full pindex field
instead of just the lower 32 bits.
* Refactor the hash table and hash lookup to index directly to the
potential hit rather than masking to the SET size (~3). This
improves our chances of finding the requested page without having
to iterate.
The hash table is now N + SET sized and the SET iteration runs
from the potential direct-hit point forwards.
* Minor __predict* code optimizations.
* Shortcut vm_page_alloc() when PG_MAPPED|PG_WRITEABLE are clear
to avoid unnecessary code paths.
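The direct-index lookup described above can be sketched in userland C. This is a minimal illustration, not the kernel's vm_page_hash code: the table size, the probe width of 4, the hash function, and all names here are assumptions chosen for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HASH_N   1024   /* hypothetical number of directly indexed slots */
#define HASH_SET 4      /* hypothetical probe width */

struct hash_elm {
    const void *object;   /* owner; NULL marks an empty slot */
    uint64_t    pindex;   /* full 64-bit page index, not just the low bits */
    void       *page;
};

/* N + SET slots, so a probe that starts at the last index never wraps. */
static struct hash_elm hash_table[HASH_N + HASH_SET];

/* Hypothetical hash: index straight to the slot where a hit is expected. */
static size_t hash_index(const void *object, uint64_t pindex)
{
    uint64_t h = (uint64_t)(uintptr_t)object ^
                 (pindex * 0x9E3779B97F4A7C15ull);
    return (size_t)((h ^ (h >> 32)) % HASH_N);
}

static void hash_enter(const void *object, uint64_t pindex, void *page)
{
    size_t base = hash_index(object, pindex);
    size_t victim = base;   /* fall back to overwriting the direct slot */
    int have_empty = 0;

    assert(object != NULL);
    for (size_t k = 0; k < HASH_SET; k++) {
        struct hash_elm *e = &hash_table[base + k];
        if (e->object == object && e->pindex == pindex) {
            victim = base + k;          /* refresh the existing entry */
            break;
        }
        if (e->object == NULL && !have_empty) {
            victim = base + k;          /* remember the first free slot */
            have_empty = 1;
        }
    }
    hash_table[victim].object = object;
    hash_table[victim].pindex = pindex;
    hash_table[victim].page = page;
}

/* Probe forwards from the direct-hit point; NULL means a cache miss. */
static void *hash_lookup(const void *object, uint64_t pindex)
{
    size_t base = hash_index(object, pindex);

    assert(object != NULL);
    for (size_t k = 0; k < HASH_SET; k++) {
        const struct hash_elm *e = &hash_table[base + k];
        if (e->object == object && e->pindex == pindex)
            return e->page;
    }
    return NULL;
}
```

Because the first probe lands on the expected slot, the common case needs one comparison, and storing the full pindex keeps entries that differ only in the upper 32 bits apart.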
Matthew Dillon [Wed, 4 Mar 2020 17:16:09 +0000 (09:16 -0800)]
kernel - Minor optimizations
* Minor __predict and __read_mostly/frequently optimizations.
Matthew Dillon [Wed, 4 Mar 2020 17:10:37 +0000 (09:10 -0800)]
tmpfs - Fix minor deadlock, refactor tn_links
* Fix a minor deadlock. tmpfs_alloc_vp() can rarely race a vnode
and leave a dangling lock, causing a later umount to deadlock.
* Refactor tn_links to use atomic ops, mainly to clean-up an
almost impossible race that can happen at umount time.
François Tigeot [Wed, 4 Mar 2020 16:37:25 +0000 (17:37 +0100)]
drm/linux: Add __test_and_set_bit()
François Tigeot [Wed, 4 Mar 2020 16:29:16 +0000 (17:29 +0100)]
drm: Add a few Linux headers
Matthew Dillon [Wed, 4 Mar 2020 04:06:12 +0000 (20:06 -0800)]
kernel - Rename spinlock counter trick API
* Rename the access side of the API from spin_update_*() to
spin_access_*() to avoid confusion.
Matthew Dillon [Wed, 4 Mar 2020 03:35:59 +0000 (19:35 -0800)]
kernel - Refactor cache_vref() using counter trick
* Refactor cache_vref() such that it is able to validate that a vnode
(whose ref count might be 0) is not in VRECLAIM, without acquiring the
vnode lock. This is the normal case.
If cache_vref() is unable to do this, it backs down to the old method
which was to get a vnode lock, validate that the vnode is not in
VRECLAIM, then release the lock.
* NOTE: In DragonFlyBSD, holding a vref on a vnode (vref, NOT vhold) will
prevent the vnode from transitioning to VRECLAIM.
* Use the new feature for nlookup's naccess() tests and for the *stat*()
series of system calls.
This significantly increases performance. However, we are not entirely
cache-contention free as both the namecache entry and the vnode are still
referenced, requiring atomic adds.
Matthew Dillon [Wed, 4 Mar 2020 03:34:08 +0000 (19:34 -0800)]
kernel - More counter trick spinlock APIs
* Add spin_update_start_only(), spin_update_check_inprog(), and
spin_update_end_only(). Same as the non-only versions but do not
acquire a spin-lock in the (v & 1) case.
Matthew Dillon [Wed, 4 Mar 2020 01:58:43 +0000 (17:58 -0800)]
kernel - Fix minor scheduler bugs
* Fix an issue where the ucount and/or uload could get out of sync,
giving a cpu a permanent bias one way or the other.
* Fix an issue where the scheduler pull feature (0x01 or 0x04) would
ignore cpus with non-zero run-queue counts in favor of cpus with
high uloads (that can be due to processes sleeping < 1 second).
* By default, the scheduler helper's pull feature now runs once a tick.
This may be changed as testing progresses.
Matthew Dillon [Tue, 3 Mar 2020 21:26:48 +0000 (13:26 -0800)]
kernel - Normalize the vx_*() vnode interface
* The vx_*() vnode interface is used for initial allocations, reclaims,
and terminations.
Normalize all use cases to prevent the mixing together of the vx_*()
API and the vn_*() API. For example, vx_lock() should not be paired
with vn_unlock(), and so forth.
* Integrate an update-counter mechanism into the vx_*() API, assert
reasonability.
* Change vfs_cache.c to use an int update counter instead of a long.
The vfs_cache code can't quite use the spin-lock update counter API
yet.
Use proper atomics for load and store.
* Implement VOP_GETATTR_QUICK, meant to be a 'quick' version of
VOP_GETATTR() that only retrieves information related to permissions
and ownership. This will be fast-pathed in a later commit.
* Implement vx_downgrade() to convert an exclusive vx_lock into an
exclusive vn_lock (for vnodes). Adjust all use cases in the
getnewvnode() path.
* Remove unnecessary locks in tmpfs_getattr() and don't use
any in tmpfs_getattr_quick().
* Remove unnecessary locks in hammer2_vop_getattr() and don't use
any in hammer2_vop_getattr_quick()
Matthew Dillon [Tue, 3 Mar 2020 18:21:33 +0000 (10:21 -0800)]
kernel - Integrate the counter & 1 trick into the spinlock API
* The counter trick allows read accessors to sample a data structure
without any further locks or ref-counts, as long as the data structure
is free-safe.
It does not necessarily protect pointers within the data structure so e.g.
ref'ing some sub-structure via a data structure pointer is not safe
on its own unless the sub-structure is able to provide some sort of
additional guarantee.
* Our struct spinlock has always been 8 bytes, but only uses 4 bytes for
the lock. Implement the new API using the second field.
Accessor side:
spin_update_start()
spin_update_end()
Modifier side:
spin_lock_update()
spin_unlock_update()
* On the accessor side if spin_update_start() detects a change in-progress
it will obtain a shared spin-lock, else remains unlocked.
spin_update_end() tells the caller whether it must retry the operation,
i.e. if a change occurred between start and end. This can only happen
if spin_update_start() remained unlocked.
If the start did a shared lock then no changes are assumed to have occurred
and spin_update_end() will release the shared spinlock and return 0.
* On the modifier side, spin_lock_update() obtains an exclusive spinlock
and increments the update counter, making it odd ((spin->update & 1) != 0).
spin_unlock_update() increments the counter again, making it even but
different, and releases the exclusive spinlock.
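The counter & 1 protocol described above can be sketched in userland with C11 atomics. This is not the kernel API: struct counted and both functions are hypothetical, a single modifier is assumed (the kernel serializes modifiers with the exclusive spinlock), and where spin_update_start() would fall back to a shared spinlock this sketch simply reports that a retry is needed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userland analogue; not the kernel's struct spinlock. */
struct counted {
    atomic_uint update;   /* even: stable, odd: change in progress */
    atomic_int  a;        /* two fields that must be read as a pair */
    atomic_int  b;
};

/* Modifier side: make the counter odd, change the data, make it even. */
static void counted_write(struct counted *c, int a, int b)
{
    atomic_fetch_add(&c->update, 1);    /* now odd */
    atomic_store(&c->a, a);
    atomic_store(&c->b, b);
    atomic_fetch_add(&c->update, 1);    /* even again, but different */
}

/*
 * Accessor side: sample the data without taking a lock.
 * Returns true if the snapshot is consistent, false if the caller
 * must retry because a change was in progress or completed meanwhile.
 */
static bool counted_try_read(struct counted *c, int *a, int *b)
{
    unsigned v = atomic_load(&c->update);

    if (v & 1)
        return false;   /* the kernel API takes a shared spinlock here */
    *a = atomic_load(&c->a);
    *b = atomic_load(&c->b);
    return atomic_load(&c->update) == v;
}
```

A caller would loop on counted_try_read() until it returns true. As the commit message notes, this only protects the sampled fields themselves, not whatever a sampled pointer refers to.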
Matthew Dillon [Tue, 3 Mar 2020 04:42:22 +0000 (20:42 -0800)]
kernel - syscall path optimizations
* Shortcut checks in dfly_acquire_curproc(), significantly reducing
system call overhead.
* Move one of the TDF_MP_BATCH_DEMARC tests out of dfly_acquire_curproc()
and into the scheduler clock.
* Add appropriate __predict*() macros to various conditionals in the
system call path and convert the terminal switch() for syscall to
a sequence of if()'s.
* Remove SYF_ARGMASK.
Aaron LI [Tue, 3 Mar 2020 14:04:37 +0000 (22:04 +0800)]
development.7: Modernize git commands for vendor import recipe
Since Git v1.7.2, 'git checkout --orphan' is the proper way to create an
orphan branch for vendor import. Use the new git commands to simplify
the vendor import recipe.
In addition, polish the recipe a bit.
Credit: https://stackoverflow.com/a/4288660
Aaron LI [Tue, 3 Mar 2020 09:15:09 +0000 (17:15 +0800)]
release.7: Rename MAKE_JOBS to NREL_MAKE_JOBS
Follow the change made in 834a13062350ab14a8c6aa28e5f9419613c173c2.
François Tigeot [Tue, 3 Mar 2020 07:51:13 +0000 (08:51 +0100)]
drm/linux: handle NULL pointers in kmap_to_page()
This prevents i915-related crashes in some rare circumstances.
Sascha Wildner [Tue, 3 Mar 2020 03:09:43 +0000 (04:09 +0100)]
psm.4/vga.4: Clean up a bit.
Sascha Wildner [Tue, 3 Mar 2020 02:57:49 +0000 (03:57 +0100)]
Remove historic references to config(8) from various manual pages.
Running it is normally not needed nowadays as it is part of the
buildkernel target.
Sascha Wildner [Tue, 3 Mar 2020 02:43:54 +0000 (03:43 +0100)]
ukbd.4: Fix wrong info.
Sascha Wildner [Tue, 3 Mar 2020 02:30:43 +0000 (03:30 +0100)]
cd.4: Remove a section that is pointless nowadays.
Matthew Dillon [Tue, 3 Mar 2020 01:28:44 +0000 (17:28 -0800)]
dsynth - Add 'debug' directive to the manual page
* Describe the debug directive in the manual page. It's been in dsynth
for a while and can be quite useful.
Matthew Dillon [Tue, 3 Mar 2020 01:11:35 +0000 (17:11 -0800)]
kernel - Refactor vfs_cache 4/N
* Refactor cache_findmount() to operate conflict-free and cache line
bounce free for the most part. The counter trick is used to probe
cache entries and combined with a local pcpu spinlock to interlock
against unmounts.
The umount code is now a bit more expensive (it has to acquire all
pcpu umount spinlocks before cleaning the cache out).
This code is not ideal but it performs up to 6x better on multiple cpus.
* Refactor _cache_mntref() to use a 4-way set association.
* Rewrite cache_copy() and cache_drop_and_cache()'s caching algorithm.
* Use cache_dvpref() from nlookup() instead of rolling the code twice.
* Rewrite the nlookup*() and cache_nlookup*() code to generally leave
namecache records unlocked throughout, removing one layer of shared
locks from cpu contention. Only the last element is locked.
* Refactor nlookup*()'s handling of absolute paths a bit more.
* Refactor nlookup*()'s handling of NLC_REFDVP to better-validate
that the parent directory is actually the parent directory.
This also necessitates a nlookupdata.nl_dvp check in various system
calls using NLC_REFDVP to detect the mount-point case and return
the proper error code (usually EINVAL, but e.g. mkdir would return
EEXIST).
* Clean up _cache_lock() and friends to restore the diagnostic messages
when a namecache lock stalls for too long.
* FIX: Fix bugs in nlookup*() retry code. The retry code was not properly
unwinding symlink path construction during the loop and also not properly
resetting the base directory when looping up. This primarily affects NFS.
* NOTE: Using iscsi_crc32() at the moment to get a good hash distribution.
This is obviously expensive, but at least it is per-cpu.
* NOTE: The cache_nlookup() nchpp cache still has a shared spin-lock
that will cache-line-bounce concurrent acquisitions.
Matthew Dillon [Tue, 3 Mar 2020 01:09:54 +0000 (17:09 -0800)]
nrelease - Rename MAKE_JOBS to NREL_MAKE_JOBS
* Remove some confusion by renaming this variable. Should have no
effect on nrelease builds.
Matthew Dillon [Tue, 3 Mar 2020 01:08:34 +0000 (17:08 -0800)]
libc - Use fixed block size for db hash and btree method
* Use a fixed block size for newly created hash and btree DB
files instead of querying the filesystem for st_blksize.
st_blksize has little to do with what an efficient blocksize
for I/O would be on a modern system.
Matthew Dillon [Tue, 3 Mar 2020 01:07:40 +0000 (17:07 -0800)]
tmpfs - Close a hole in tmpfs_vop_nremove()
* Close a race condition in tmpfs_vop_nremove() that could result
in an assertion.
Matthew Dillon [Tue, 3 Mar 2020 01:05:59 +0000 (17:05 -0800)]
kernel - Enable stosq based pagezero()
* For page-sized blocks both AMD and Intel seem to do a good job
with stosq, though with real workloads it doesn't seem to really
improve performance. Go back to stosq for now though.
Matthew Dillon [Sat, 29 Feb 2020 06:27:39 +0000 (22:27 -0800)]
kernel - Improve cache_fullpath(), plus cleanup
* Improve cache_fullpath(). It can use a shared lock rather than an
exclusive lock, significantly improving concurrency. Important now
since realpath() indirectly uses this function.
* Code cleanup. Remove unused vfscache_rollup_all()
Matthew Dillon [Sat, 29 Feb 2020 06:25:35 +0000 (22:25 -0800)]
kernel - Comment future vrele() code intention
* vrele() currently uses atomic_fcmpset_*() and will in the future
use atomic_fetchadd_*() instead, but I can't change it without a
bit more work.
* Avoid updating v_flag and v_act if the values do not change, reducing
SMP contention a bit.
Matthew Dillon [Sat, 29 Feb 2020 06:23:53 +0000 (22:23 -0800)]
kernel - Improve nlookup() performance w/absolute paths
* Improve nlookup() performance when handed an absolute path by
initializing the base ncp to the rootnch instead of the current
directory nch when an absolute path is detected.
Matthew Dillon [Sat, 29 Feb 2020 06:18:15 +0000 (22:18 -0800)]
kernel - Rearrange struct vnode fields
* Rearrange vnode fields for improved SMP performance. Place v_auxrefs
and v_refcnt together but in a different cache line than v_lock. This
allows concurrent SMP operations to effectively pipeline atomic ops
on these fields that we haven't yet been able to get rid of, improving
performance.
Matthew Dillon [Fri, 28 Feb 2020 04:38:58 +0000 (20:38 -0800)]
kernel - Refactor vfs_cache 3/N
* Leave the vnode held for each linked namecache entry, allowing us to
remove all the hold/drop code for 0->1 and 1->0 lock transitions of
ncps.
This significantly simplifies the cache_lock*() and cache_unlock()
functions.
* Adjust the vnode recycling code to check v_auxrefs against
v_namecache_count instead of against 0.
Matthew Dillon [Fri, 28 Feb 2020 02:14:08 +0000 (18:14 -0800)]
kernel - Refactor vfs_cache 2/N
* Use lockmgr locks for the ncp lock. Convert nc_lockstatus / nc_locktd
to struct lock.
lockmgr locks use atomic_fetchadd_*() instead of atomic_fcmpset_*()
for nominal shared and exclusive lock count operations, which avoids
contention loops on failed fcmpset operations. There is still cache
line contention but since the code doesn't have to loop so much it
scales to core count a whole lot better.
* Two experimental __cachealign's added to bloat struct namecache. It
won't stay this way.
* Retain the non-optimal nc_vp ref count mess, which is why nc_vprefs
is needed. This will be fixed next.
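The difference between the two atomic styles mentioned above can be sketched with C11 atomics as a userland stand-in. atomic_compare_exchange_weak() plays the role of atomic_fcmpset_*() and atomic_fetch_add() the role of atomic_fetchadd_*(); the function names are hypothetical, and the mapping to the kernel primitives is an analogy rather than their definition.

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * Compare-and-set style: each failed attempt reloads the value and
 * loops, so under contention a cpu may spin several times.
 * Returns the number of failed attempts.
 */
static int count_cas(atomic_int *cnt)
{
    int old = atomic_load(cnt);
    int loops = 0;

    /* On failure 'old' is refreshed with the current value. */
    while (!atomic_compare_exchange_weak(cnt, &old, old + 1))
        loops++;
    return loops;
}

/*
 * Fetch-add style: one unconditional atomic op, no retry loop.
 * Returns the value seen before the increment.
 */
static int count_fetchadd(atomic_int *cnt)
{
    return atomic_fetch_add(cnt, 1);
}
```

Both versions still move the cache line between cpus. The fetch-add form just never has to go around the loop again.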
Matthew Dillon [Thu, 27 Feb 2020 20:01:35 +0000 (12:01 -0800)]
kernel - Continue pmap work
* Conditionalize this work on PMAP_ADVANCED, default enabled.
* Remove md_page.pmap_count and md_page.writeable_count, no longer
track these counts which cause tons of cache line interactions.
However, there are still a few stubborn hold-overs.
* The vm_page still needs to be soft-busied in the page fault path
* For now we need to have a md_page.interlock_count to flag pages
being replaced by pmap_enter() (e.g. COW faults) in order to be
able to safely dispose of the page without busying it.
This need will eventually go away, hopefully just leaving us with
the soft-busy-count issue.
Matthew Dillon [Thu, 27 Feb 2020 18:41:08 +0000 (10:41 -0800)]
kernel - Refactor vfs_cache ncp->nc_refs
* Refactor namecache->nc_refs to use atomic_fetchadd_int() instead
of atomic_fcmpset_int(), which really helps in heavily contended
situations.
This is accomplished by having a ref for every possible access point,
so that the 1->0 transition can lead directly to termination without
requiring further surgery.
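A minimal sketch of the idea above, assuming every access point holds its own reference so that the drop which sees the count go 1->0 knows it is the last one. The struct and function names are hypothetical, and the flag stands in for the real teardown.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical object; 'destroyed' stands in for real teardown. */
struct obj_sketch {
    atomic_int refs;
    bool       destroyed;
};

/* Each access point takes its own reference with one atomic add. */
static void obj_ref(struct obj_sketch *o)
{
    atomic_fetch_add(&o->refs, 1);
}

/*
 * Drop a reference. atomic_fetch_sub() returns the previous value,
 * so seeing 1 means this call made the 1->0 transition and may
 * terminate the object directly. Returns true in that case.
 */
static bool obj_drop(struct obj_sketch *o)
{
    if (atomic_fetch_sub(&o->refs, 1) == 1) {
        o->destroyed = true;
        return true;
    }
    return false;
}
```

No compare-and-set loop is needed on either path, which is the property the commit relies on in heavily contended situations.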
Matthew Dillon [Thu, 27 Feb 2020 02:41:24 +0000 (18:41 -0800)]
kernel - Update vfs_cache to use fcmpset
* Update vfs_cache from using atomic_cmpset_*() to using
atomic_fcmpset_*().
Sascha Wildner [Fri, 28 Feb 2020 17:09:59 +0000 (18:09 +0100)]
Minor Makefiles cleanup in systat and flame_graph.
Matthew Dillon [Thu, 27 Feb 2020 19:47:15 +0000 (11:47 -0800)]
flame_graph - Add initial code to support flame graphs
* Add better PC sampling code to the kernel, capable of generating
call stack traces.
* Implement an initial flame_graph utility.
flame_graph > /tmp/x.out &
(let it run a while)
flame_graph -p < /tmp/x.out
Requested-by: mjg
Sascha Wildner [Thu, 27 Feb 2020 16:32:15 +0000 (17:32 +0100)]
hier.7: Adjust for binutils234.
Sascha Wildner [Wed, 26 Feb 2020 19:40:46 +0000 (20:40 +0100)]
kernel: Fix kernel build with 'options KTR' after 2ff21866646c375554d6.
While here, indent the KTR_COND_LOG() macro like the KTR_LOG() macro,
which looks slightly better.
Matthew Dillon [Wed, 26 Feb 2020 06:15:11 +0000 (22:15 -0800)]
kernel - Add kern.usched_dfly.poll_ticks
* Add kern.usched_dfly.poll_ticks (default 0, same as previous
operation) for testing a more aggressive scheduler 'pulling'
mode.
Matthew Dillon [Wed, 26 Feb 2020 06:07:22 +0000 (22:07 -0800)]
kernel - Code path optimization for kmalloc/kfree
* Use __read_frequently for numerous variables.
* Increase the pcpu slab cache on high-memory machines. This reduces
kernel_map and smpinvltlb interactions.
* Get rid of the ZoneGenAlloc and ZoneBigAlloc sysctl tracking
variables. These were causing unnecessary frequent cache line
bouncing between cpus.
Also get rid of SlabsFreed and SlabsAllocated.
* Get rid of unused sysctl variables.
* Make the on-kfree bcopy of weirdary optional (default OFF now).
It was previously unconditional and on by default.
Matthew Dillon [Wed, 26 Feb 2020 06:05:43 +0000 (22:05 -0800)]
kernel - Simple code path optimizations
* Add __read_mostly and __read_frequently to numerous variables as
appropriate to reduce unnecessary cache line ping-ponging.
* Adjust conditionals in the syscall code with __predict_true/false
to clean up the execution path.
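As an illustration of the branch hints mentioned above, here is a small sketch using the GCC/Clang __builtin_expect() builtin. The macro names are local to this example; the assumption is that the kernel's __predict_true()/__predict_false() expand to something similar, and checked_div() is a hypothetical function, not code from the syscall path.

```c
#include <assert.h>

/* Local stand-ins for branch-prediction hints (GCC/Clang builtin). */
#define predict_true(exp)   __builtin_expect(!!(exp), 1)
#define predict_false(exp)  __builtin_expect(!!(exp), 0)

/*
 * Hypothetical example: the error check is marked unlikely so the
 * compiler can lay out the common path as straight-line code.
 * Returns 0 on success, -1 on division by zero.
 */
static int checked_div(int num, int den, int *out)
{
    if (predict_false(den == 0))
        return -1;
    *out = num / den;
    return 0;
}
```

The hints change code layout only. The result is the same with or without them.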
Matthew Dillon [Wed, 26 Feb 2020 05:57:52 +0000 (21:57 -0800)]
kernel - Rearrange uidinfo structure a bit
* Rearrange the structure to move ui_lock and ui_refs
into a cache-line isolated area of the structure.
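A sketch of what cache-line isolation can look like in C11. Only the names ui_lock and ui_refs come from the commit message; the struct name, the other fields, the field types, and the 64-byte line size are assumptions made for the example.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

#define CACHE_LINE 64   /* assumed cache line size */

struct uidinfo_sketch {
    /* Read-mostly fields (placeholders for this example). */
    unsigned ui_uid;
    long     ui_limit;

    /*
     * Frequently written fields start on their own cache line.
     * Aligning the member also rounds the struct size up to a
     * multiple of the line size, so nothing trails them on it.
     */
    alignas(CACHE_LINE) int ui_lock;   /* stand-in for the real lock */
    int ui_refs;
};

/* Helper: do two byte offsets fall into the same cache line? */
static int shares_cache_line(size_t off_a, size_t off_b)
{
    return off_a / CACHE_LINE == off_b / CACHE_LINE;
}
```

With this layout, atomic updates to ui_lock and ui_refs do not invalidate the line holding the read-mostly fields on other cpus.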
Sascha Wildner [Tue, 25 Feb 2020 15:48:03 +0000 (16:48 +0100)]
kernel/options: Fix wrong LINT->LINT64 replacement in a comment.
Matthew Dillon [Tue, 25 Feb 2020 05:29:36 +0000 (21:29 -0800)]
kernel - Simple cache line optimizations
* Reorder struct vm_page, struct vnode, and struct vm_object a bit
to improve cache-line locality.
* Use atomic_fcmpset_*() instead of atomic_cmpset_*() in several
places to reduce the inter-cpu cache coherency load a bit.
Matthew Dillon [Mon, 24 Feb 2020 23:00:00 +0000 (15:00 -0800)]
kernel - Try to fix tcp ISN generator
* The ISN generator couldn't stand the test of time. Very fast port
reuse can catch the destination host inpcb still in a TIME_WAIT
state and a bad ISN results in the destination ignoring the new SYN.
The old ISN generator could wind up returning the same sequence
number for fast reconnects occurring within the same tick.
Reimplement the ISN generator and also make it SMP friendly and
cache friendly. Because... it really wasn't before. Also attempt
to modernize the monotonic sequence space algorithm, reseed the
secret every 20 seconds, and make the reseeding non-disruptive to
sequence space monotonicity.
* Change the TH_SYN + TIME_WAIT state handling. Generally speaking it
is intended that a new SYN when the inpcb is in TIME_WAIT recycle the
port/address pair and allow the new connection.
The sequence space checks for the TH_SYN may have been too strict.
Change the check to allow the recycling of the port/address pair
as long as the SYN has a different sequence number than the previous
connection.
I believe this is relatively safe since the recycling can only happen
if the socket is already in a TIME_WAIT state, but consider the code
still under test.
Matthew Dillon [Mon, 24 Feb 2020 22:56:05 +0000 (14:56 -0800)]
jail - add jail.defaults.allow_listen_override (3)
* Normalize the nominal jail IP conversions to the system call
interface whenever it is convenient. Remove conversions that
were previously in the udp and tcp connect and send code.
* Also do jail IP conversions in bind(), connect(), extconnect(),
sendto(), sendmsg(), recvfrom(), recvmsg().
* Refactor in_pcbladdr_find() to improve jail bindings, try to find
the correct interface IP to bind to. When a route is utilized,
iterate available interface IPs to locate a jail-acceptable IP
on the same interface.
Sascha Wildner [Mon, 24 Feb 2020 14:13:47 +0000 (15:13 +0100)]
boot/loader: Small improvement to merge_help.awk.
FreeBSD's r162742:
Ignore a sub-topic match if it is inside the command description.
Otherwise, merge-help can get confused by a command description that
includes a word that starts with a capital S.
Sascha Wildner [Mon, 24 Feb 2020 10:20:54 +0000 (11:20 +0100)]
syscons.4: Mention restrictions on the resolution of ttyv0.
Matthew Dillon [Mon, 24 Feb 2020 08:48:00 +0000 (00:48 -0800)]
jail - add jail.defaults.allow_listen_override (2)
* Also munge the returned sockaddr for accept4() when inside a
jail.
Matthew Dillon [Mon, 24 Feb 2020 07:05:42 +0000 (23:05 -0800)]
jail - add jail.defaults.allow_listen_override
* Add jail.defaults.allow_listen_override (also per-jail settable).
This feature is disabled by default.
When enabled, this feature allows both wildcard and non-wildcard listen
sockets in the jail to override wildcard listen sockets on the host.
These sockets will be masked by the jail's IP list, meaning that a
wildcard socket in the jail effectively covers just the jail's IP list.
Non-wildcard listen sockets on the host are not overridden.
Use of this feature allows the host to operate normally, without having
to make its services jail-friendly. Only those services which bind to
specific IPs that might conflict with the jail IPs will need modification,
and only if the jail needs to have that service as well.
* In order to use the feature safely each jail should be given its
own unique IPs for both localhost and its externally routable IP.
For example:
jail -u root / tr3990xJ 127.0.0.2,10.0.0.139 /bin/csh
ifconfig can be used on the host to create multiple 127.0.0.X aliases
on lo0 and to assign additional routable IPs to the machine for use
in its jails. For example:
ifconfig lo0 inet 127.0.0.2 alias
ifconfig lo0 inet 127.0.0.3 alias
ifconfig lo0 inet6 ::2 alias
ifconfig lo0 inet6 ::3 alias
ifconfig em0 inet 10.0.0.139 netmask 255.255.0.0 alias
ifconfig em0 inet 10.0.0.140 netmask 255.255.0.0 alias
...
* Within a jail, use of localhost (127.0.0.1 or ::1) will automatically
be converted to the jail's localhost IP (such as 127.0.0.2). Also,
accept(), getsockname(), and getpeername() will translate the jail's
localhost IP back to 127.0.0.1 or ::1. Most services within the
jail can thus use localhost without being the wiser.
* Listen address/port pairs within a jail can now be overloaded with the
same address/port pairs on the host, or overloaded versus other jails
without generating an error. However, accessibility to these ports is
governed by the 'jail.defaults.allow_listen_override' sysctl setting
for the jail (or the jail-specific version of the same sysctl).
Any jail-to-jail overloading of identical address/port pairs is allowed,
but operationally undefined. Only one jail will receive connections.
It is best to supply each jail with its own unique local and routable
IPs.
* IPV6 is now fully supported using the same mechanisms. You can supply
a mix of IPV4 and IPV6 addresses in the jail command if desired. The
overloading feature works the same.
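The localhost translation described above for jails can be sketched for the IPv4 case as follows. This is an illustration only: the struct, the function names, and the use of host-order 32-bit addresses are assumptions, and the kernel's actual conversion code is not shown in this log.

```c
#include <assert.h>
#include <stdint.h>

/* Build a host-order IPv4 address from its four octets. */
#define IPV4(a, b, c, d) \
    (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
     ((uint32_t)(c) << 8) | (uint32_t)(d))

#define IPV4_LOOPBACK IPV4(127, 0, 0, 1)

/* Hypothetical per-jail state: the jail's own localhost alias. */
struct jail_sketch {
    uint32_t localhost_ip;   /* e.g. 127.0.0.2 */
};

/* Inbound to the stack (bind, connect, ...): 127.0.0.1 -> jail alias. */
static uint32_t jail_map_in(const struct jail_sketch *j, uint32_t ip)
{
    return (ip == IPV4_LOOPBACK) ? j->localhost_ip : ip;
}

/* Back to the process (accept, getsockname, ...): alias -> 127.0.0.1. */
static uint32_t jail_map_out(const struct jail_sketch *j, uint32_t ip)
{
    return (ip == j->localhost_ip) ? IPV4_LOOPBACK : ip;
}
```

Under these assumptions a service inside the jail keeps using 127.0.0.1 in both directions while the stack only ever sees the jail's alias.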
Matthew Dillon [Mon, 24 Feb 2020 06:55:13 +0000 (22:55 -0800)]
nfs - Strip out cr_prison from cached creds
* Strip out cr_prison from creds cached in struct nfs_node to
prevent exited jails from sticking around indefinitely.
Matthew Dillon [Sun, 23 Feb 2020 20:02:27 +0000 (12:02 -0800)]
jail - Fix broken port matching code
* in_pcblookup_local() and in_pcblookup_localremote() were trying to
use the cred to distinguish between jails, but these routines are used
to locate a free port for binding purposes and could wind up returning
a lookup failure for an occupied port.
The code may have been present in an early isolation attempt for jails.
* Remove the code. Isolating the IPs for a jail basically requires using
IP aliases, not by trying to isolate port number sets between jails.
Matthew Dillon [Sun, 23 Feb 2020 18:11:28 +0000 (10:11 -0800)]
jail - Allow loopback interface in in_pcbladdr_find()
* Prior jail adjustments to allow loopback IPs to be specified in
the ip-list missed this bit of code which caused the binding
code to ignore routes to loopback interfaces.
* Adjust the code to accept such routes. If a loopback IP is in
the jail's ip-list, it can now be bound to. If not, and a loopback
route is returned, it will use the first non-loopback IP in the jail's
ip-list.
* Note that listen sockets within a jail are not overloaded and so can
connect to listen sockets on the host or in other jails when a common
IP (such as 127.0.0.1) is in the ip-list for both. In this regard,
shared loopback IPs now work identically to shared NIC IPs.
IP aliases may be used to create a separation. If you use e.g. 127.0.0.2
in a jail, bindings to 127.0.0.1 will automatically be adjusted to
use 127.0.0.2.
Matthew Dillon [Sat, 22 Feb 2020 19:12:35 +0000 (11:12 -0800)]
tmpfs - Fix races in tmpfs_nrename() and tmpfs_nrmdir()
* Lock all nrename elements before checks. This is particularly
important when renaming over a file or empty directory, but other
manipulations done by this code without locks could also cause
races which result in corruption, particularly with the link count.
* Lock all nrmdir elements before checks, for the same reason.
Matthew Dillon [Sat, 22 Feb 2020 18:48:50 +0000 (10:48 -0800)]
tmpfs - Cleanup, refactor tmpfs_alloc_vp()
* Refactor tmpfs_alloc_vp() to handle races without having to have
a weird intermediate TMPFS_VNODE_ALLOCATING state. This also
removes the related ALLOCATING/WAIT code which had a totally broken
tsleep() call in it.
* Properly zero fields in tmpfs_alloc_node().
* Cleanup some comments
Matthew Dillon [Mon, 17 Feb 2020 08:09:37 +0000 (00:09 -0800)]
kernel and world - Replace bcmp/bcopy/bzero/memcmp/memcpy/memmove/memset
* Replace bcmp/bcopy/bzero/memcmp/memcpy/memmove/memset with mjg's
code, with some minor adjustments.
* mjg's code has been given its own header file,
<machine/asm_mjgmacros.h>
* Also replaces copyin and copyout.
* Around a 1.7% improvement in bulk-build performance.
Tomohiro Kusumi [Sun, 23 Feb 2020 16:43:00 +0000 (01:43 +0900)]
sbin/fsck_msdosfs: Use humanize_number to format available and bad space sizes
from freebsd/freebsd@996c6aefcf2ed50629407b6ad8794ebfce8ac794
François Tigeot [Sun, 23 Feb 2020 14:51:19 +0000 (15:51 +0100)]
drm/i915: Update base driver to 20160808