Matthew Dillon [Thu, 9 Nov 2017 03:35:14 +0000 (19:35 -0800)]
kernel - Fix bug in vm_fault_page()
* Fix a bug in vm_fault_page() and vm_fault_page_quick(). The code
is not intended to update the user pmap, but if the vm_map_lookup()
results in a COW, any existing page in the underlying pmap will no
longer match the page that should be there.
The user process will still work correctly in that it will fault the
COW'd page if/when it tries to issue a write to that address, but
userland will not have visibility to any kernel use of vm_fault_page()
that modifies the page and causes a COW if the page has already been
faulted in.
* Fixed by detecting the COW and at least removing the pte from the pmap
to force userland to re-fault it.
* This fixes gdb operation on programs. The problem did not rear its
head before because the kernel did not pre-populate as many pages in the
initial exec as it does now.
* Enhance vm_map_lookup()'s &wired argument to return wflags instead,
which includes FS_WIRED and also now has FS_DIDCOW.
Reported-by: profmakx
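Illustrative note: a minimal sketch of the check described above, assuming hypothetical bit values for FS_WIRED and FS_DIDCOW and a hypothetical helper name (the real definitions live in the kernel sources and may differ). The caller looks at the returned wflags and, if the lookup performed a COW, treats any existing pte as stale.

```c
#include <stdbool.h>

/* Hypothetical bit values, chosen only for illustration. */
#define FS_WIRED	0x01
#define FS_DIDCOW	0x02

/*
 * Returns true when the lookup reported a copy-on-write, meaning a pte
 * already entered in the user pmap may reference the old page and
 * should be removed so that userland re-faults it.
 */
bool
must_remove_pte(int wflags)
{
	return (wflags & FS_DIDCOW) != 0;
}
```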
Matthew Dillon [Wed, 8 Nov 2017 19:06:54 +0000 (11:06 -0800)]
kernel - Enhance debugging wakeup sysctl
* Add a second debugging sysctl to issue wakeups for the UMTX
domain. This sysctl is only used for debugging purposes.
Matthew Dillon [Wed, 8 Nov 2017 19:05:03 +0000 (11:05 -0800)]
kernel - Try to fix 'busy buffer' panic.
* Improve the shutdown sequence to require three passes with
0 buffers remaining before proceeding. This should fix the
'busy buffer' panic.
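Illustrative note: a minimal sketch of the "three clean passes" rule, assuming the passes must be consecutive (the commit text does not say so explicitly). The function name and the array-based interface are hypothetical and exist only to make the rule testable in userland.

```c
#include <assert.h>
#include <stddef.h>

#define REQUIRED_CLEAN_PASSES	3	/* from the commit text: three passes */

/*
 * busy[i] is the number of busy buffers seen on pass i.
 * Returns the index of the pass on which shutdown may proceed, or -1
 * if the sequence never contains three consecutive passes with 0 buffers.
 */
int
shutdown_ready_pass(const int *busy, size_t npasses)
{
	int clean = 0;

	assert(busy != NULL || npasses == 0);
	for (size_t i = 0; i < npasses; i++) {
		/* Any pass that still sees busy buffers resets the streak. */
		clean = (busy[i] == 0) ? clean + 1 : 0;
		if (clean == REQUIRED_CLEAN_PASSES)
			return (int)i;
	}
	return -1;
}
```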
Matthew Dillon [Wed, 8 Nov 2017 18:56:06 +0000 (10:56 -0800)]
libc and pthreads - Fix atfork issues with nmalloc, update dmalloc
* Implement atfork handling for nmalloc. As part of this, refactor
some of nmalloc.
* Remove ZERO_LENGTH_PTR from nmalloc. Instead, force 0-byte
allocations to allocate 1 byte. The standard requires unique
pointers to be returned.
* For now go back to a single depot lock instead of a per-zone
lock. It is unclear whether multi-threaded performance will
suffer or not, but it's the only way to implement atfork handling.
* Implement proper atfork interlocks for nmalloc via pthreads to avoid
corruption when heavily threaded programs call fork().
* Bring dmalloc up to date in various ways, including properly
implementing a minimum 16-byte alignment for allocations >= 16 bytes,
and atfork handling. Also use a global depot lock for the same
reason we use it in nmalloc, and implement a front-end magazine
shortcut for any allocations <= 2MB.
Reported-by: mneumann
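Illustrative note: a minimal sketch of the 0-byte allocation rule described above. The wrapper name is hypothetical and this is not nmalloc's code; it only shows why rounding a 0-byte request up to 1 byte yields distinct pointers for distinct live allocations.

```c
#include <stdlib.h>

/*
 * Hypothetical wrapper: round a 0-byte request up to 1 byte so every
 * successful call returns its own unique, freeable pointer.
 */
void *
alloc_unique(size_t size)
{
	if (size == 0)
		size = 1;
	return malloc(size);
}
```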
Matthew Dillon [Tue, 7 Nov 2017 06:08:21 +0000 (22:08 -0800)]
kernel - Update umtx documentation
* Update the umtx(2) documentation with better examples and include
fine detail and warnings on use.
* Update the fork(2) documentation to include warnings about using
the function in threaded programs.
Sascha Wildner [Wed, 8 Nov 2017 09:45:22 +0000 (10:45 +0100)]
rtld: Remove unneeded CSTD, our default is gnu99.
zrj [Wed, 8 Nov 2017 08:59:38 +0000 (10:59 +0200)]
file: Allow NOSHARED build, bring back -lz.
This effectively reverts commit
e3b736ab82b63ca197aa7e3508f8a7e88c8db3fc.
Static libmagic.a does not know about its dependencies. Some mechanism to track
shared/static libs dependencies would be needed, especially for btools where we
have to keep temporal dimension of the base libraries evolution in mind.
zrj [Wed, 8 Nov 2017 08:52:23 +0000 (10:52 +0200)]
tcpdchk,tcpdmatch: Allow the NOSHARED build.
This one is technically not a bug; the global int rfc931_timeout just gets
initialized to different values.
zrj [Wed, 8 Nov 2017 08:48:02 +0000 (10:48 +0200)]
libcrypt: Fix symbol conflict with LIBRECRYPTO.
The MD5Transform() could be made static, but for now just mangle the symbol.
Solves issue of -static build with ${LIBCRYPT} and ${LIBRECRYPTO}.
zrj [Wed, 8 Nov 2017 08:05:12 +0000 (10:05 +0200)]
bsd.dep.mk: Document MKDEPINTDEPS addition.
zrj [Wed, 1 Nov 2017 15:32:28 +0000 (17:32 +0200)]
lib/csu: Fix longstanding quickworld issue.
Now /bin/sh and friends will have correct DragonFly version in NOTES section.
Full buildworld upon major version changes still should be preferred.
zrj [Wed, 1 Nov 2017 15:29:48 +0000 (17:29 +0200)]
bsd.lib.mk: Allow to skip implicit dependencies.
Only to be used for corner cases.
zrj [Wed, 1 Nov 2017 15:26:53 +0000 (17:26 +0200)]
bsd.dep.mk: Add support for internal dependencies.
This will be used for all cases where intermediate compilation
products are used due to one reason or another.
zrj [Tue, 7 Nov 2017 06:35:41 +0000 (08:35 +0200)]
cal(1): Fix locales handling.
* Remove long deprecated locale name symlinks.
* Remove 'mkdir -p' that was hiding missing mtree entries.
* Add missing mtree entries.
Matthew Dillon [Mon, 6 Nov 2017 19:21:43 +0000 (11:21 -0800)]
hammer2 - Fix divide by 0 race
* Fix a statfs/statvfs race which can cause a divide-by-0.
Reported-by: arcade@b1t.name
Matthew Dillon [Mon, 6 Nov 2017 18:31:04 +0000 (10:31 -0800)]
hammer2 - Add vfs.hammer2.supported_version
* Add vfs.hammer2.supported_version, which newfs_hammer2 probes
and complains about when it isn't there. This stops the complaints.
Sascha Wildner [Mon, 6 Nov 2017 15:38:15 +0000 (16:38 +0100)]
share/syscons/scrnmaps: Use FILES instead of 'all' target.
Reported-by: zrj
Sascha Wildner [Mon, 6 Nov 2017 15:35:58 +0000 (16:35 +0100)]
Clean up the namespace better in <netdb.h>, <spawn.h> and <sys/statvfs.h>.
Christian Groessler [Fri, 3 Nov 2017 13:28:53 +0000 (14:28 +0100)]
telnetd: print system information (OS and architecture) before login prompt.
Sascha Wildner [Mon, 6 Nov 2017 10:06:59 +0000 (11:06 +0100)]
arcmsr(4): Upgrade to Areca's Revision 1.40.00.00.
This adds support for various adapters: ARC1203, ARC1216, ARC1226, and
ARC1884.
Detailed list of changes:
1. fix ADAPTER_TYPE_D scanning device timeout.
2. fix ADAPTER_TYPE_D getting configuration data.
3. eliminate ending white-space of code.
4. align some code for better readability.
5. modify code of ADAPTER_TYPE_B to support ARC-1203
6. add a new ADAPTER_TYPE_E to support ARC-1884
7. adapters ARC-1215, 1225, 1216, 1226 belong to ADAPTER_TYPE_C and
are also supported by this driver.
8. redefine ADAPTER_TYPE_A,B,C,D value
Thanks to Areca for actively supporting DragonFly!!
Submitted-by: Ching Huang <ching2048@areca.com.tw>
Tested-by: kerma (Michael)
Sascha Wildner [Mon, 6 Nov 2017 09:17:34 +0000 (10:17 +0100)]
/usr/share/examples/etc: Remove pam.conf from the README.
Reported-by: zrj
zrj [Sun, 5 Nov 2017 16:53:10 +0000 (18:53 +0200)]
world: Honour the NO_SHARE in make.conf
Mark all cases that touch ${DESTDIR}/usr/share in one way or another.
While there, adjust libmagic to use MAGICSHAREDIR instead of FILESDIR.
In-discussion-with: swildner
Sascha Wildner [Mon, 6 Nov 2017 00:42:17 +0000 (01:42 +0100)]
Say 'hammer2' instead of 'hammer' in various places.
Sascha Wildner [Sun, 5 Nov 2017 23:34:51 +0000 (00:34 +0100)]
hammer2.8: Fix typo.
Imre Vadász [Wed, 18 Oct 2017 21:40:12 +0000 (23:40 +0200)]
syscons - Add minimal fbio support for "scfb" xorg driver with sc->fbi fb.
* At the moment there is no support for safely handling the case where
userspace has mapped the EFI framebuffer, and a drm graphics driver is
loaded, that uses the same hardware as the EFI framebuffer.
(Either loading the drm(4) driver should be prevented in this case, or
the framebuffer should be forcibly unmapped from the userspace
application).
Sascha Wildner [Fri, 3 Nov 2017 18:27:38 +0000 (19:27 +0100)]
dhclient(8): Use SCRIPTS instead of beforeinstall target.
Sascha Wildner [Fri, 3 Nov 2017 17:32:30 +0000 (18:32 +0100)]
indxbib(1): Use FILES instead of beforeinstall target.
Reported-by: zrj
Sascha Wildner [Fri, 3 Nov 2017 17:23:02 +0000 (18:23 +0100)]
makewhatis.local(8): Remove unneeded SCRIPTSDIR variable.
Sascha Wildner [Fri, 3 Nov 2017 17:13:29 +0000 (18:13 +0100)]
efisetup(8): Remove unneeded SCRIPTSNAME variable.
The .sh suffix will be stripped automatically.
Sascha Wildner [Fri, 3 Nov 2017 16:56:21 +0000 (17:56 +0100)]
bc(1): Use FILES instead of SCRIPTS.
Remove unneeded variables as well.
Reported-by: zrj
Matthew Dillon [Thu, 2 Nov 2017 23:21:13 +0000 (16:21 -0700)]
pthreads - Fix rtld-elf and libthread_xu
* Fixes chrome, thunderbird, and multiple other issues with recent
libpthreads work.
Testing-by: mneumann, dillon
zrj [Thu, 2 Nov 2017 09:05:56 +0000 (11:05 +0200)]
sys: Remove a.out from comments.
While there, remove inherited htags tweak too.
zrj [Sat, 28 Oct 2017 14:25:31 +0000 (17:25 +0300)]
bootstrap: Remove helpers for upgrading directly from pre 4.4
This is partly needed to smooth out the transition from c++98 to c++14
without switching to a rapid release cycle.
Many changes went in since 4.0 involving btools:
gnu/usr.bin/cc50/cc_tools - iconv() c++ issue, pre 4.4
gnu/usr.bin/grep - max_align_t issue, pre 4.2
usr.bin/chflags - chflagsat(2), pre 4.2
usr.bin/gencat - locales, pre 3.6(4.1 for safety)
usr.bin/sort - isnan(), pre 4.4
It is highly recommended to take an intermediate update step to any of
4.4, 4.6, 4.8 or 5.0 releases first when upgrading from older releases.
zrj [Sat, 28 Oct 2017 13:22:50 +0000 (16:22 +0300)]
bootstrap: Remove helpers for upgrading directly from pre 4.0
gnu/usr.bin/grep - <xlocale.h> addition, pre 3.6
usr.bin/basename - mbrlen(), pre 1.4
usr.bin/find - rpmatch(), pre 3.6
zrj [Sat, 28 Oct 2017 12:23:58 +0000 (15:23 +0300)]
bootstrap: Remove inherited helpers.
Present since initial fork from FreeBSD RELENG_4.
While there, remove no longer needed s/getline/get_line/
Matthew Dillon [Thu, 2 Nov 2017 03:33:24 +0000 (20:33 -0700)]
kernel - Refactor vm_fault and vm_map a bit (3)
* Fix bug in vm_map_split() where bobject was being released
and dropped out of order on a certain condition, causing an
assertion. bobject is released properly later so we should
be able to simply remove the offending code.
Matthew Dillon [Thu, 2 Nov 2017 02:32:56 +0000 (19:32 -0700)]
kernel - Refactor vm_fault and vm_map a bit (2)
* Remove debugging.
Matthew Dillon [Thu, 2 Nov 2017 01:53:30 +0000 (18:53 -0700)]
libc - Add rtld stubs for pthreads
* Add needed rtld stubs for -static -pthreads links.
Matthew Dillon [Thu, 2 Nov 2017 00:47:48 +0000 (17:47 -0700)]
kernel - Improve uidinfo
* Improve uifind() to check td_cred for likely uid's, avoiding all
locking on hits.
* Create proc0 cred's cr_uidinfo and cr_ruidinfo using uicreate().
All creds should now never have a NULL cr_uidinfo or cr_ruidinfo,
so also remove conditionals that test for NULL.
Suggested-by: __mjg
Matthew Dillon [Thu, 2 Nov 2017 00:36:14 +0000 (17:36 -0700)]
kernel - Refactor vm_fault and vm_map a bit.
* Allow the virtual copy feature to be disabled via a sysctl.
Default enabled.
* Fix a bug in the virtual copy test. Multiple elements were
not being retested after reacquiring the map lock.
* Change the auto-partitioning of vm_map_entry structures from
16MB to 32MB. Add a sysctl to allow the feature to be disabled.
Default enabled.
* Cleanup map->timestamp bumps. Basically we bump it in
vm_map_lock(), and also fix a bug where it was not being
bumped after relocking the map in the virtual copy feature.
* Fix an incorrect assertion in vm_map_split(). Refactor tests
in vm_map_split(). Also, acquire the chain lock for the VM
object in the caller to vm_map_split() instead of in vm_map_split()
itself, allowing us to include the pmap adjustment within the
locked area.
* Make sure OBJ_ONEMAPPING is cleared for nobject in vm_map_split().
* Fix a bug in a call to vm_map_transition_wait() that
double-locked the vm_map in the partitioning code.
* General cleanups in vm/vm_object.c
Matthew Dillon [Thu, 2 Nov 2017 00:18:56 +0000 (17:18 -0700)]
libthread_xu - Fix rtld and refactor locks
* Add a separate atfork facility for internal pthread atfork entities
(sem and rtld) which must execute after all user atfork entities
pre-fork and before all user atfork entities post-fork.
* Install an atfork handler for rtld-elf (also requires rtld-elf to
be updated). The handler will ensure that RTLD locks are in a sane
state prior to fork (by acquiring them), and will then release the
locks post-fork. This is the primary fix for lang/rust and cargo.
Also do not issue _thr_rtld_fini() when threading drops to 0.
Once threading has been set, rtld's pthread locks remain installed.
* Refactor thr_cond.c. Refactor condition variables to perform
according to the spec. Use a TAILQ to make pthread_cond_signal()
work exactly as described in the manual (that is, waking up only
one waiter at a time).
* Refactor thr_mutex.c. Primarily instrumentation for debugging and
cleanup. Also deal with improper EINTR handling.
* Refactor thr_fork.c. Implement the new atfork facility for
internal atfork handlers.
* Refactor thr_rwlock.c. Add debugging, cleanup.
* thr_sem.c now uses the internal atfork handler to ensure proper
ordering.
* thr_sig.c implements debugging features.
* Refactor thr_umtx.c... the low level mutex code. Store the id
for additional verification and use an atomic lock to clear the
lock instead of an assignment. Properly ignore EINTR.
* Cleanup init_private() a bit.
* Add PTHREADS_DEBUGGING=TRUE and PTHREADS_DEBUGGING2=TRUE make
flags. The first writes out a garbage file in /tmp for all
locking operations as they occur. The second is used for
point debugging and writes out a file when signal 63 is sent
to the program.
* Add cpu_ccfence() in various places that might need it (a hack
for the moment, userland cannot currently #include
"machine/cpufunc.h").
* Should fix lang/rust and 'cargo'
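Illustrative note: a minimal sketch of a TAILQ-based waiter queue in which a signal wakes exactly one waiter, as the thr_cond.c item above describes. The structure and function names are hypothetical, FIFO ordering is an assumption, and the real libthread_xu code also has to deal with locking, timeouts and actual sleeping, all omitted here.

```c
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical waiter record; the real libthread_xu structures differ. */
struct waiter {
	int			id;
	int			woken;
	TAILQ_ENTRY(waiter)	link;
};

TAILQ_HEAD(waitq, waiter);

void
waitq_init(struct waitq *q)
{
	TAILQ_INIT(q);
}

/* A thread about to block appends itself to the tail of the queue. */
void
waitq_enqueue(struct waitq *q, struct waiter *w)
{
	w->woken = 0;
	TAILQ_INSERT_TAIL(q, w, link);
}

/* Signal: wake exactly one waiter, the oldest. Returns NULL if none. */
struct waiter *
waitq_signal_one(struct waitq *q)
{
	struct waiter *w = TAILQ_FIRST(q);

	if (w != NULL) {
		TAILQ_REMOVE(q, w, link);
		w->woken = 1;
	}
	return w;
}
```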
Matthew Dillon [Thu, 2 Nov 2017 00:15:26 +0000 (17:15 -0700)]
rtld - Add fork hooks for libthread_xu to install
* Add fork hooks for libthread_xu to install. rtld must acquire its
locks exclusively during a fork, and then release them after the
fork is complete, to prevent the fork() from catching the locks in
a bad state. See libthread_xu.
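Illustrative note: a minimal userland sketch of the "acquire before fork, release after" pattern described above. It uses the public pthread_atfork() interface rather than the internal hook mechanism these commits add, and every demo_* name is hypothetical. On some systems it needs to be built with -pthread.

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;

/* Pre-fork: take the lock so no other thread holds it across fork(). */
static void demo_prepare(void) { pthread_mutex_lock(&demo_lock); }
/* Post-fork: both parent and child release their copy of the lock. */
static void demo_parent(void)  { pthread_mutex_unlock(&demo_lock); }
static void demo_child(void)   { pthread_mutex_unlock(&demo_lock); }

int
demo_install_fork_hooks(void)
{
	return pthread_atfork(demo_prepare, demo_parent, demo_child);
}

/*
 * Fork and have the child verify that it can take the lock.
 * Returns the child's exit status (0 on success) or -1 on error.
 */
int
demo_fork_and_check(void)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		int rc = pthread_mutex_trylock(&demo_lock);
		_exit(rc == 0 ? 0 : 1);
	}
	if (waitpid(pid, &status, 0) != pid)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```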
Sascha Wildner [Wed, 1 Nov 2017 11:02:38 +0000 (12:02 +0100)]
Install hammer2 periodic script.
Reported-by: ftigeot
zrj [Wed, 1 Nov 2017 09:26:57 +0000 (11:26 +0200)]
bsd.links.mk: Add some dir debug for SYMLINKS.
For now not fatal.
Matthew Dillon [Sun, 29 Oct 2017 17:52:36 +0000 (10:52 -0700)]
hammer2 - Add KVABIO support for hammer2
* Add KVABIO support for H2. This allows H2 to manipulate the buffer
cache without having to fully synchronize buffer data to all cpus,
saving us a boatload of global IPIs.
* This more than doubles uncached read throughput from NVMe media.
A simple test showed an increase from ~600 MBytes/sec to
~1400 MBytes/sec through the filesystem. The IPI rate was reduced
from 25000 IPIs/cpu/sec to less than 200 IPIs/cpu/sec.
Read throughput was likely improved even further. The NVMe device
used for the test has roughly a ~1500 MB/sec cap.
Matthew Dillon [Sun, 29 Oct 2017 05:50:55 +0000 (22:50 -0700)]
kernel - Add KVABIO debugging, flesh out inlines
* Add vfs.debug_kvabio to dump a limited number of stack backtraces
when a buffer needs full SMP synchronization.
* Add a cluster_read_kvabio() inline which makes the appropriate
call to cluster_readx().
Matthew Dillon [Sun, 29 Oct 2017 05:49:34 +0000 (22:49 -0700)]
devfs - propagate D_KVABIO to vnode
* If si_ops->head.flags has D_KVABIO set, then set VKVABIO for the
related vnode. This enables KVABIO in vn_strategy() calls through
to devices which support it (aka NVME driver).
Matthew Dillon [Sat, 28 Oct 2017 23:15:20 +0000 (16:15 -0700)]
kernel - Remove vfs.cache.numfullpath* sysctl statistics
* Remove vfs.cache.numfullpath* sysctl statistics. Nearly all
full path lookups are now cached and the statistics no longer
serve any purpose.
Matthew Dillon [Sat, 28 Oct 2017 01:59:18 +0000 (18:59 -0700)]
test - Add lockmgr1, lockmgr2, lockmgr3 tests
* Add tests which perform various system calls intended to exercise
lockmgr() locks.
* Generally shows a roughly 40% improvement in SMP performance
from the recent lockmgr changes when shared locks have high
collision rates.
Matthew Dillon [Sat, 28 Oct 2017 01:55:43 +0000 (18:55 -0700)]
kernel - Refactor lockmgr() (2)
* Remove the global vfs_spin() lock and single vnode_active_list
and single vnode_inactive_list.
* Replace with a pcpu array of spinlocks and lists. However, for
this initial push the array is simply hashed based on the vnode
pointer, so it isn't really being acted on pcpu.
* Significantly reduces numerous bottlenecks when vnodes start to get
recycled by vnlru(). Cache line bounces are still a problem,
but direct spinlock conflicts are essentially gone.
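Illustrative note: a minimal sketch of hashing a vnode pointer into an array of lists, as the second item above describes. The array size, the shift amount and the function name are assumptions made for illustration and are not taken from the kernel.

```c
#include <stdint.h>

#define NLISTS	64	/* hypothetical; a power of two so masking works */

/*
 * Hypothetical hash: discard low bits that tend to be identical for
 * aligned allocations, then mask the result down to an array index.
 */
unsigned
vnode_list_index(const void *vp)
{
	uintptr_t v = (uintptr_t)vp;

	return (unsigned)((v >> 6) & (NLISTS - 1));
}
```

Because the index depends only on the pointer, the same vnode always maps to the same spinlock and list, regardless of which cpu performs the operation.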
Matthew Dillon [Tue, 24 Oct 2017 02:08:13 +0000 (19:08 -0700)]
kernel - Refactor lwkt_token shared lock release
* We can finally get rid of the atomic_cmpset*() loop in the token
release code. The exclusive release can now simply clear the
TOK_EXCLUSIVE bit, and the shared release can now simply reduce
tok->t_count by TOK_INCR.
* This significantly improves heavily contested shared token
performance.
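Illustrative note: a minimal sketch of the two release paths described above, written with C11 atomics instead of the kernel's atomic ops. The bit values given to TOK_EXCLUSIVE and TOK_INCR and the struct layout are assumptions; the point is only that each release is a single atomic operation with no compare-and-set loop.

```c
#include <stdatomic.h>

/* Hypothetical bit layout; the kernel's real TOK_* values may differ. */
#define TOK_EXCLUSIVE	0x1L
#define TOK_INCR	0x4L

struct token {
	atomic_long	t_count;
};

/* Exclusive release: clear the exclusive bit in one atomic op. */
void
token_release_exclusive(struct token *tok)
{
	atomic_fetch_and(&tok->t_count, ~TOK_EXCLUSIVE);
}

/* Shared release: drop one shared reference in one atomic op. */
void
token_release_shared(struct token *tok)
{
	atomic_fetch_sub(&tok->t_count, TOK_INCR);
}
```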
Matthew Dillon [Tue, 24 Oct 2017 01:39:16 +0000 (18:39 -0700)]
kernel - Refactor lockmgr()
* Seriously refactor lockmgr() so we can use atomic_fetchadd_*() for
shared locks and reduce unnecessary atomic ops and atomic op loops.
The main win here is being able to use atomic_fetchadd_*() when
acquiring and releasing shared locks. A simple fstat() loop (which
utilizes a LK_SHARED lockmgr lock on the vnode) improves from 191ns
to around 110ns per loop with 32 concurrent threads (on a 16-core/
32-thread xeon).
* To accomplish this, the 32-bit lk_count field becomes 64-bits. The
shared count is separated into the high 32-bits, allowing it to be
manipulated for both blocking shared requests and the shared lock
count field. The low count bits are used for exclusive locks.
Control bits are adjusted to manage lockmgr features.
LKC_SHARED Indicates shared lock count is active, else excl lock
count. Can predispose the lock when the related count
is 0 (does not have to be cleared, for example).
LKC_UPREQ Queued upgrade request. Automatically granted by
releasing entity (UPREQ -> ~SHARED|1).
LKC_EXREQ Queued exclusive request (only when lock held shared).
Automatically granted by releasing entity
(EXREQ -> ~SHARED|1).
LKC_EXREQ2 Aggregated exclusive request. When EXREQ cannot be
obtained due to the lock being held exclusively or
EXREQ already being queued, EXREQ2 is flagged for
wakeup/retries.
LKC_CANCEL Cancel API support
LKC_SMASK Shared lock count mask (LKC_SCOUNT increments).
LKC_XMASK Exclusive lock count mask (+1 increments)
The 'no lock' condition occurs when LKC_XMASK is 0 and LKC_SMASK is
0, regardless of the state of LKC_SHARED.
* Lockmgr still supports exclusive priority over shared locks. The
semantics have slightly changed. The priority mechanism only applies
to the EXREQ holder. Once an exclusive lock is obtained, any blocking
shared or exclusive locks will have equal priority until the exclusive
lock is released. Once released, shared locks can squeeze in, but
then the next pending exclusive lock will assert its priority over
any new shared locks when it wakes up and loops.
This isn't quite what I wanted, but it seems to work quite well. I
had to make a trade-off in the EXREQ lock-grant mechanism to improve
performance.
* In addition, we use atomic_fcmpset_long() instead of
atomic_cmpset_long() to reduce cache line flip flopping at least
a little.
* Remove lockcount() and lockcountnb(), which tried to count lock refs.
Replace with lockinuse(), which simply tells the caller whether the
lock is referenced or not.
* Expand some of the copyright notices (years and authors) for major
rewrites. Really there are a lot more and I have to pay more attention
to adjustments.
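Illustrative note: a minimal sketch of the 64-bit lk_count layout described above, with the shared count in the high 32 bits and the exclusive count in the low bits. All numeric values, including the position of LKC_SHARED, are assumptions chosen for illustration, and C11 atomics stand in for the kernel's atomic_fetchadd_*(). Blocking, upgrades and exclusive requests are omitted.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative layout only; the kernel's real LKC_* values may differ. */
#define LKC_SSHIFT	32
#define LKC_SCOUNT	(UINT64_C(1) << LKC_SSHIFT)	/* one shared ref */
#define LKC_SMASK	(UINT64_C(0xffffffff) << LKC_SSHIFT)
#define LKC_XMASK	UINT64_C(0x0fffffff)		/* exclusive count */
#define LKC_SHARED	UINT64_C(0x10000000)		/* control bit */

struct lock {
	_Atomic uint64_t	lk_count;
};

/* Shared acquire is a single fetch-add; returns the previous value. */
uint64_t
lock_shared_get(struct lock *lk)
{
	return atomic_fetch_add(&lk->lk_count, LKC_SCOUNT);
}

/* Shared release is a single fetch-sub on the same field. */
void
lock_shared_put(struct lock *lk)
{
	atomic_fetch_sub(&lk->lk_count, LKC_SCOUNT);
}

unsigned
shared_refs(uint64_t count)
{
	return (unsigned)((count & LKC_SMASK) >> LKC_SSHIFT);
}

/* 'No lock': both counts are zero, whatever LKC_SHARED says. */
bool
lock_is_free(uint64_t count)
{
	return (count & (LKC_SMASK | LKC_XMASK)) == 0;
}
```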
Matthew Dillon [Tue, 24 Oct 2017 01:36:46 +0000 (18:36 -0700)]
kernel - Add #define for atomic_add_64()
* Add #define for atomic_add_64() ( -> atomic_add_long() on x86-64 )
Matthew Dillon [Sat, 21 Oct 2017 06:43:15 +0000 (23:43 -0700)]
kernel - Fix bug in machdep.pmap_mmu_optimize
* Fix a bug in the pmap_mmu_optimize feature (default disabled). When
enabled, this feature will automatically share page table pages with
equivalent permissions for objects covering the whole page table page.
* However, the code which cleaned out the 'old' page table page failed
to properly lock its pindex across the replacement operation, likely
allowing threaded programs to sometimes race it and potentially
lose track of one or more PTEs.
The code tried to hold onto proc_pd_pv to prevent races, but there is
still a small window due to the fact that pmap_allocpte() allocates
pv's from the leaf upward. So if the pmap optimization is good in
one thread but then fails in another for the same page table (for
example, due to a vm_map_entry split), a race can ensue.
* Use the existing pt_placemarker feature to properly lock the empty
page table page slot while it is being replaced.
* Add a soft-locking mechanism to temporarily work around a
(usually short-lived) allocation live-lock which can crop up
when one thread is trying to replace a page table page while
another is trying to allocate to it.
Matthew Dillon [Sat, 21 Oct 2017 06:11:30 +0000 (23:11 -0700)]
kernel - Fix vm.max_proc_mmap
* The vm.max_proc_mmap calculation always overflowed, and was only
saved by the result always being some ridiculously large number
(98M). vm.max_proc_mmap is an int and the calculation was based on
KvaSize which is ... a huge number much larger than anything an
int can hold.
* Replace the mess with a hard-coded value of 1000000. The value can
be changed via sysctl as before.
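Illustrative note: a minimal sketch of the failure class described above, where a large 64-bit intermediate is stored into an int. The helper names are hypothetical and the original KvaSize-based formula is not reproduced. The commit itself simply hard-codes 1000000; clamping is shown only as one generic way to avoid silent truncation.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* True if a 64-bit intermediate result can be stored in an int unchanged. */
bool
fits_in_int(int64_t v)
{
	return v >= INT_MIN && v <= INT_MAX;
}

/* Clamp rather than silently truncate when storing into an int tunable. */
int
clamp_to_int(int64_t v)
{
	if (v > INT_MAX)
		return INT_MAX;
	if (v < INT_MIN)
		return INT_MIN;
	return (int)v;
}
```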
Matthew Dillon [Fri, 20 Oct 2017 19:12:54 +0000 (12:12 -0700)]
kernel - Fix userldt refcnt races
* Fix userldt refcnt races. Note that at the moment, DragonFlyBSD
doesn't implement userldt support anyway, so this does not fix
any actual bugs. But make sure the code is correct.
Suggested-by: mjg
Matthew Dillon [Fri, 20 Oct 2017 19:01:43 +0000 (12:01 -0700)]
kernel - Refactor kern_sendfile()
* Refactor kern_sendfile() to greatly improve performance.
* Use vm_page_lookup_sbusy_try() exclusively to acquire VM pages
to assign to the mbufs.
* Instead of holding pages in a fancy manner, just issue the
UIO_NOCOPY / VMIO VOP_READ() in the blind and loop up.
* The VOP_READ() is still synchronous. It is really unclear
whether asynchronizing VOP_READ() via the pagerops would
really improve performance versus simply implementing a
limited number of connections per worker. At least in
localhost tests, we seem to be hitting a hardware memory
bottleneck long before we hit a cpu bottleneck.
Matthew Dillon [Fri, 20 Oct 2017 19:01:03 +0000 (12:01 -0700)]
kernel - Enhance vm_page_lookup_sbusy_try() API
* Add a pgoff/pgbytes to allow a more fine-grained test of the
page's validity. Will be used by kern_sendfile().
Matthew Dillon [Fri, 20 Oct 2017 18:59:53 +0000 (11:59 -0700)]
kernel - Increase MAGAZINE_CAPACITY_MAX
* Increase MAGAZINE_CAPACITY_MAX from 128 to 4096 to improve
mbuf objcache recycling performance and reduce lock contention.
Sepherosa Ziehau [Tue, 31 Oct 2017 04:51:56 +0000 (12:51 +0800)]
em: Free tx mbufs proactively.
This is preparation for dillon's upcoming sendfile patch.
Sepherosa Ziehau [Mon, 30 Oct 2017 05:54:56 +0000 (13:54 +0800)]
emx: Free tx mbufs proactively.
This is preparation for dillon's upcoming sendfile patch.
Markus Pfeiffer [Sun, 29 Oct 2017 22:47:18 +0000 (22:47 +0000)]
kernel: Rename struct tmpfs_args to tmpfs_mount_info
This makes the names of vfs argument structures slightly more
uniform.
Since they were not installed before this should not break
any userland software.
Markus Pfeiffer [Sun, 29 Oct 2017 21:36:47 +0000 (21:36 +0000)]
kernel: Rename tmpfs_args.h to tmpfs_mount.h
This is slightly more consistent with the other VFS.
Markus Pfeiffer [Sun, 29 Oct 2017 15:46:59 +0000 (15:46 +0000)]
Install vfs/tmpfs headers
Matthew Dillon [Sun, 29 Oct 2017 21:37:32 +0000 (14:37 -0700)]
kernel - Fix boot issues with > 512GB of ram
* Fix DMAP installation issues for kernels with > 512GB of ram.
The page table was not being laid out properly for PML4e
entries past the first one.
* Fix early panic reporting. Conditionalize the lapic access as the
lapic might not exist yet.
* Tested to 1TB of ram. Theoretically DragonFlyBSD can support up
to 32TB of ram (and slightly less than ~64TB with one #define
change).
Reported-by: zrj
Testing-by: zrj
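Illustrative note: with 4-level x86-64 paging each PML4 entry spans 512 GiB (2^39 bytes), which is why systems above 512GB need more than one entry for the direct map. The sketch below only does that arithmetic; the function name is hypothetical and it does not model the kernel's actual DMAP layout.

```c
#include <stdint.h>

#define PML4_ENTRY_SPAN	(UINT64_C(1) << 39)	/* 512 GiB per PML4 entry */

/* Number of PML4 entries needed for a direct map covering 'bytes'. */
unsigned
dmap_pml4_entries(uint64_t bytes)
{
	return (unsigned)((bytes + PML4_ENTRY_SPAN - 1) / PML4_ENTRY_SPAN);
}
```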
Matthew Dillon [Sun, 29 Oct 2017 17:48:15 +0000 (10:48 -0700)]
hammer2 - Fix "hammer2_chain_getparent: no parent" assertion
* Inodes are placed on the pmp->sideq when a flush action is required
but no vnode association exists. This is most typically done when
a vnode is reclaimed. The sideq code also handles destroying an
unlinked inode on last-close.
* It is possible for an already-deleted inode (not just unlinked, but
also deleted from the topology) to wind up on the sideq list, resulting
in the above assertion.
* Fix the assertion by handling the case. Just flush the inode normally
instead of trying to re-delete it. The related in-memory topology will
be destroyed automatically.
Sascha Wildner [Sun, 29 Oct 2017 10:46:32 +0000 (11:46 +0100)]
Sync zoneinfo database with tzdata2017c from ftp://ftp.iana.org/tz/releases
* Northern Cyprus switches from +03 to +02/+03 on 2017-10-29.
* Fiji ends DST 2018-01-14, not 2018-01-21.
* Namibia switches from +01/+02 to +02 on 2018-04-01.
* Sudan switches from +03 to +02 on 2017-11-01.
* Tonga likely switches from +13/+14 to +13 on 2017-11-05.
* Turks & Caicos switches from -04 to -05/-04 on 2018-11-04.
For a detailed list of changes, see share/zoneinfo/NEWS.
Sascha Wildner [Sun, 29 Oct 2017 10:41:51 +0000 (11:41 +0100)]
Remove two no longer needed directories.
/usr/lib/aout
/usr/libexec/sm.bin
Reported-by: zrj
Sepherosa Ziehau [Sat, 28 Oct 2017 23:34:36 +0000 (07:34 +0800)]
ix: Fix possible TX desc GC missing.
This would theoretically happen, if the polling rate was extremely high.
Sepherosa Ziehau [Sat, 28 Oct 2017 19:01:14 +0000 (03:01 +0800)]
igb: Free tx mbufs proactively.
For 82575, which is the earliest product of this product line, the RS
bit is set on every packet's last TX desc by default, since in the
'head write back' mode, the content of TDH register does not move,
if the content of the 'header', which is memory based, does not
move. It is still allowed to reduce the density of RS TX descs,
which will be useful for workloads w/o using sendfile.
This is preparation for dillon's upcoming sendfile patch.
Matthew Dillon [Sat, 28 Oct 2017 23:04:58 +0000 (16:04 -0700)]
kernel - Fix bugs and refactor namecache cleaning code
* Refactor the cleaning code. For positive namecache entries,
track based on NCHHASH linkages rather than v_namecache linkages.
Also maintain a count of freeable (leaf) entries and use that
in the _cache_cleanpos() code.
This should hopefully fix a bug where the system can get stuck
constantly calling _cache_cleanpos() in situations where there
are not any freeable entries.
* Refactor the negative namecache tracking code. Move to a per-cpu
structure where entries are made based on the current cpu and removed
based on the recorded cpu. Refactor the cleaning code to iterate
the cpu list on a per-call basis which should hopefully allow multiple
cpus to clean the ncneg lists concurrently.
This reduces a SMP bottleneck, but does not deal with cache
ping-ponging issues on related structures.
Reported-by: ftigeot
Matthew Dillon [Sat, 28 Oct 2017 22:14:02 +0000 (15:14 -0700)]
kernel - Fix cluster_awrite() race
* Fix a race between cluster_awrite() and vnode destruction. We
have to finish working the cluster pbuf before disposing of the
component elements. b_vp in the cluster pbuf is held only by
the presence of the components.
* Fixes NULL pointer indirection panic associated with heavy
paging during tmpfs operations. This typically only occurs
when maxvnodes is set to a relatively low value, but it can
eventually occur in any modestly paging environment when
tmpfs is used.
Sepherosa Ziehau [Sat, 28 Oct 2017 06:46:31 +0000 (14:46 +0800)]
x86_64: Add pauses in the TSC mpsync testing loop.
This fixes Intel N3450 deadlock in the tight rdtsc/IPI loop.
Suggested-by: dillon@
Tested-by: mneumann@
Dragonfly-bug: http://bugs.dragonflybsd.org/issues/3087
Sepherosa Ziehau [Sat, 21 Oct 2017 22:06:27 +0000 (06:06 +0800)]
ix: Free tx mbufs proactively.
This is preparation for dillon's upcoming sendfile patch.
Sascha Wildner [Sat, 28 Oct 2017 10:48:15 +0000 (12:48 +0200)]
Remove the ancient rdist(1) tool along with related periodic(8) scripts.
There are substitutes in dports' net/44bsd-rdist and net/rdist6.
Sascha Wildner [Sat, 28 Oct 2017 10:22:41 +0000 (12:22 +0200)]
kernel/hptmv: Use __DragonFly__ instead of __DragonFly_version.
Matthew Dillon [Sat, 28 Oct 2017 01:37:16 +0000 (18:37 -0700)]
kernel - Rewrite umtx_sleep() and umtx_wakeup() (2)
* Refactor some of the umtx code to do a better job dealing
with pageout and fork() races.
This still is not ideal.
Reported-by: profmakx
zrj [Fri, 27 Oct 2017 10:31:02 +0000 (13:31 +0300)]
kldload.8: Mention /boot/modules.local purpose.
Imre Vadász [Wed, 25 Oct 2017 19:11:08 +0000 (21:11 +0200)]
boot - Abort boot if EFI-framebuffer format is unsupported.
* At the moment we support 24-bit and 32-bit pixel formats, make sure we
notify the user and abort booting, when encountering an unsupported
framebuffer format.
Imre Vadász [Sun, 22 Oct 2017 20:47:24 +0000 (22:47 +0200)]
syscons - Add 24bit pixel format support for EFI framebuffer.
Sepherosa Ziehau [Tue, 24 Oct 2017 04:54:44 +0000 (12:54 +0800)]
x86_64: Allow TSC MP synchronization test to be disabled.
Matthew Dillon [Tue, 24 Oct 2017 06:07:13 +0000 (23:07 -0700)]
vmstat - Fix formatting
* 'fre' memory formatting width was incorrect, causing
the rest of the field to be incorrectly offset.
* Display more precision as the field width allows.
* Add -b for 'brief' mode to display less precision.
* Add -u for 'unformatted' mode to display raw numbers (columnar
output will not be aligned).
Matthew Dillon [Sun, 22 Oct 2017 07:02:18 +0000 (00:02 -0700)]
kernel - Use different queue iterator for emergency pager
* Adjust q1iterator and q2iterator to minimize collisions between
the two pageout demons. The pageout demon will iterate forwards
while the emergency demon will iterate backwards.
Matthew Dillon [Sun, 22 Oct 2017 06:17:26 +0000 (23:17 -0700)]
kernel - Use different cache_rover for emergency pager
* Fix an issue where the same cache_rover index was being used for both
pageout threads. This could result in a great deal of contention
and cache line bouncing between the threads due to the vm pagerq
spinlock.
* Fix by changing cache_rover to an array[2]. In addition, the
one pageout thread iterates its rover forwards while the other
runs its rover backwards, plus a little more code, to minimize
conflicts.
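Illustrative note: a minimal sketch of two rovers walking the same set of queues in opposite directions, as described above. The queue count, the function name and the use of plain unsigned counters are assumptions; each rover is meant to be touched by only one thread, so no atomics are shown.

```c
#include <assert.h>

#define NQUEUES	8	/* hypothetical queue count; a power of two */

/* rover[0]: primary pageout thread, rover[1]: emergency pager. */
static unsigned rover[2];

/* Returns the queue index to scan next for the given pager (0 or 1). */
unsigned
next_queue(int pager)
{
	assert(pager == 0 || pager == 1);
	if (pager == 0)
		return rover[0]++ & (NQUEUES - 1);	/* iterates forwards */
	return --rover[1] & (NQUEUES - 1);		/* iterates backwards */
}
```

Starting from zero, the first rover visits 0, 1, 2, ... while the second visits 7, 6, 5, ..., so the two threads rarely contend for the same queue at the same time.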
Aaron LI [Tue, 17 Oct 2017 15:19:48 +0000 (23:19 +0800)]
pf: Make pf_print_host() print IPv6 addresses correctly
Taken-from: OpenBSD sys/net/pf.c v.1.615
Aaron LI [Tue, 17 Oct 2017 14:55:45 +0000 (22:55 +0800)]
pf: Always skip "urpf-failed" test for IPv6 link local addresses
We could re-embed the scope-id before we do the route lookup,
but then we would just find the very interface we've received
the packet on anyway.
Taken-from: OpenBSD sys/net/pf.c v.1.625
Aaron LI [Tue, 17 Oct 2017 14:41:42 +0000 (22:41 +0800)]
pf: use IN6_IS_SCOPE_EMBED to check kernel-internal form addresses
Use IN6_IS_SCOPE_EMBED to check kernel-internal form addresses
(s6_addr16[1] filled).
Taken-from: OpenBSD sys/net/pf.c v.1.520
Matthew Dillon [Sun, 22 Oct 2017 00:31:43 +0000 (17:31 -0700)]
kernel - Zero out syncache_percpu properly
* The kmalloc for the syncache_percpu was not using M_ZERO
which I believe can cause cache_count to be some random
value. If this value is close to or larger than the syncache
limit, the garbage collector may run with no entries to reuse,
causing a NULL pointer dereference and panic.
Reported-by: pa3k #3088
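Illustrative note: a userland analogue of the fix above, assuming a hypothetical stand-in structure. calloc() plays the role that M_ZERO plays for kmalloc(): the counter starts at 0 instead of whatever happened to be in the memory.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the per-cpu syncache state. */
struct percpu_cache {
	int	cache_count;
	void	*entries;
};

/* Zero-filled allocation: cache_count is guaranteed to start at 0. */
struct percpu_cache *
percpu_cache_alloc(void)
{
	return calloc(1, sizeof(struct percpu_cache));
}
```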
Matthew Dillon [Sat, 21 Oct 2017 23:56:06 +0000 (16:56 -0700)]
hdaa - Remove dead code
* Remove dead code (an impossible condition).
Reported-by: dcb #3077
Matthew Dillon [Sat, 21 Oct 2017 23:52:06 +0000 (16:52 -0700)]
bc - Adjust bad syntax
* Adjust a badly syntaxed expression.
Reported-by: dcb #3079
Matthew Dillon [Sat, 21 Oct 2017 23:31:30 +0000 (16:31 -0700)]
swapon - Fix minor memory leak
* Fix a minor memory leak
Reported-by: liweitianux bug #3086
Matthew Dillon [Sat, 21 Oct 2017 22:02:05 +0000 (15:02 -0700)]
kernel - Cleanup token code, add simple exclusive priority (2)
* The priority mechanism revealed an issue with lwkt_switch()'s
fall-back code in dealing with contended tokens. The code was
refusing to schedule a lower-priority thread on a cpu requesting an
exclusive lock as another on that same cpu requesting a shared lock.
This creates a problem for the exclusive priority feature. More
pointedly, it also creates a fairness problem in the mixed lock
type use case generally.
* Change the mechanism to allow any thread polling on tokens to be
scheduled. The scheduler will still iterate in priority order.
This imposes a little extra overhead with regards to userspace
returns as a thread might be scheduled that then tries to return
to userland without being the designated user thread.
* This also fixes another bug that cropped up recently where a
32-way threaded program would sometimes not quickly schedule to
all 32 cpus, sometimes leaving one or two cpus idle for a few
seconds.
Sepherosa Ziehau [Sat, 21 Oct 2017 07:31:40 +0000 (15:31 +0800)]
inet6: Make non-prefix and directly reachable inet6 routes work.
e.g. inet6 routes added w/ -interface:
sysctl net.inet6.icmp6.nd6_onlink_ns_rfc4861=1
ifconfig ix0 inet6 2003:db8::1
route add -inet6 2003:db8:1::/64 -interface ix0
NOTE: net.inet6.icmp6.nd6_onlink_ns_rfc4861 MUST be on.
Sascha Wildner [Sat, 21 Oct 2017 10:30:50 +0000 (12:30 +0200)]
pstat.8: Add markup.
Aaron LI [Sat, 21 Oct 2017 05:34:39 +0000 (13:34 +0800)]
pstat.8: Remove a duplicate option of swapinfo
Matthew Dillon [Fri, 20 Oct 2017 23:42:42 +0000 (16:42 -0700)]
initrd - Add 'fetch'
* Add the 'fetch' program to the recovery shell. This is just too
useful a program to not have on the rescue ramdisk.
Matthew Dillon [Thu, 19 Oct 2017 20:27:22 +0000 (13:27 -0700)]
kernel - Cleanup token code, add simple exclusive priority
* Cleanup the token code and bring the comments up to date.
* Implement exclusive priority for the situation where a thread is
acquiring only a single shared token. We cannot implement exclusive
priority when multiple tokens are held because that can lead to
deadlocks. The token code guarantees no deadlocks.
Matthew Dillon [Thu, 19 Oct 2017 19:09:56 +0000 (12:09 -0700)]
kernel - Add p_ppid
* We have proc->p_pptr, but still needed a shared p->p_token to access
the ppid. Buckle under and add proc->p_ppid as well so getppid() can
run lockless.
* Adjust the vmtotal proc scan to use a shared proc->p_token instead
of an exclusive one.
Matthew Dillon [Thu, 19 Oct 2017 19:08:05 +0000 (12:08 -0700)]
kernel - Adjust tsc_delay()
* Add more cpu_pause()'s to the tsc_delay() loop to
be more hyper-thread friendly.
Sascha Wildner [Thu, 19 Oct 2017 20:17:09 +0000 (22:17 +0200)]
kernel/acpi: Ouch, add forgotten semicolon.