Matthew Dillon [Wed, 18 Oct 2017 06:25:24 +0000 (23:25 -0700)]
kernel - refactor vm_page busy
* Move PG_BUSY, PG_WANTED, PG_SBUSY, and PG_SWAPINPROG out of m->flags.
* Add m->busy_count with PBUSY_LOCKED, PBUSY_WANTED, PBUSY_SWAPINPROG,
and PBUSY_MASK (for the soft-busy count).
* Add support for acquiring a soft-busy count without a hard-busy.
This requires that there not already be a hard-busy. The purpose
of this is to allow a vm_page to be 'locked' in a shared manner
via the soft-busy for situations where we only intend to read from
it.
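A minimal userland sketch of the shared soft-busy idea. It assumes one 32-bit counter whose low bits hold the soft-busy count and whose high bits hold the flags. The bit values, the struct, and the function names are hypothetical and are not the kernel's definitions. The rule that a hard-busy also waits for soft-busies to drain is a simplifying assumption of this sketch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit layout, for illustration only. */
#define PBUSY_LOCKED	0x80000000u	/* hard-busy held */
#define PBUSY_WANTED	0x40000000u	/* a waiter wants a wakeup */
#define PBUSY_MASK	0x0000ffffu	/* soft-busy count */

struct page_sketch {
	_Atomic uint32_t busy_count;
};

/* Take a shared (soft) busy; fails if a hard-busy is already held. */
static bool
sbusy_try(struct page_sketch *m)
{
	uint32_t v = atomic_load(&m->busy_count);

	while ((v & PBUSY_LOCKED) == 0) {
		assert((v & PBUSY_MASK) != PBUSY_MASK);	/* no overflow */
		/* On failure v is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&m->busy_count, &v, v + 1))
			return true;
	}
	return false;
}

static void
sbusy_drop(struct page_sketch *m)
{
	uint32_t old = atomic_fetch_sub(&m->busy_count, 1);

	assert((old & PBUSY_MASK) != 0);
}

/* Sketch assumption: the hard-busy also requires zero soft-busies. */
static bool
hbusy_try(struct page_sketch *m)
{
	uint32_t v = atomic_load(&m->busy_count);

	while ((v & (PBUSY_LOCKED | PBUSY_MASK)) == 0) {
		if (atomic_compare_exchange_weak(&m->busy_count, &v,
		    v | PBUSY_LOCKED))
			return true;
	}
	return false;
}

static void
hbusy_drop(struct page_sketch *m)
{
	atomic_fetch_and(&m->busy_count, ~(PBUSY_LOCKED | PBUSY_WANTED));
}
```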
Imre Vadász [Tue, 17 Oct 2017 20:06:23 +0000 (22:06 +0200)]
if_vtnet - Handle missing IFCAP_VLAN_* flags nicer. Comment IFCAP_LRO stuff.
* The if_vtnet driver used to define the IFCAP_LRO, IFCAP_VLAN_HWFILTER and
IFCAP_VLAN_HWTSO flags itself, to make the code from FreeBSD build.
Instead define IFCAP_VLAN_HWFILTER and IFCAP_VLAN_HWTSO to 0, when they
are not defined already. This allows the code to build, but all checks
for the flags fail. (Inspired by the vmxnet3 driver port).
* The IFCAP_LRO flag is unavailable in DragonFly, but the LRO offload seems
to work somehow.
* According to the virtio specification, LRO support should be possible
without rx checksum support as well.
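The define-to-0 fallback described above can be sketched as follows. When a capability bit is 0, any `if_capenable & FLAG` test is false, so the imported code compiles and takes the "unsupported" branch. The `cap_enabled()` helper is hypothetical and only mirrors the shape of the driver's checks.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Fallbacks: if the platform headers do not provide these capability
 * bits, define them as 0 so the code still builds and every test
 * against them evaluates to false.
 */
#ifndef IFCAP_VLAN_HWFILTER
#define IFCAP_VLAN_HWFILTER	0
#endif
#ifndef IFCAP_VLAN_HWTSO
#define IFCAP_VLAN_HWTSO	0
#endif

/* Hypothetical helper mirroring checks like "if_capenable & IFCAP_...". */
static bool
cap_enabled(int if_capenable, int cap)
{
	return (if_capenable & cap) != 0;
}
```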
Matthew Dillon [Tue, 17 Oct 2017 21:57:19 +0000 (14:57 -0700)]
kernel - Cleanup vm_page_repurpose()
* Remove the now unused vm_page_repurpose() function.
* Remove emrunning variable.
Imre Vadász [Tue, 17 Oct 2017 20:11:08 +0000 (22:11 +0200)]
if_vtnet - Disable rx csum offload due to unsupported ipv6 rx csum offload.
* Ignoring the checksum offloading in the receive path of the driver isn't
sufficient, since we might receive only partially checksummed packets
from the host.
* Unfortunately there is only a single feature flag for both ipv4 and ipv6
receive checksum offloading, so we need to disable both for now.
* At the moment we don't support a way to explicitly enable the rx csum
feature at runtime, but this will be easily possible by adding support
for the VIRTIO_NET_F_CTRL_GUEST_OFFLOADS feature.
* Mention this as a caveat in the manpage.
* Correct the default value of the hw.vtnet.lro_disable tunable in the
manpage to match the code again.
Sascha Wildner [Tue, 17 Oct 2017 20:18:35 +0000 (22:18 +0200)]
kernel: Remove <sys/sysref{,2}.h> inclusion from files that don't need it.
Some of the headers are public in one way or another so bump
__DragonFly_version for safety.
While here, add a missing <sys/objcache.h> include to kern_exec.c which
was previously relying on it coming in via <sys/sysref.h> (which was
included by <sys/vm_map.h> prior to this commit).
Sascha Wildner [Tue, 17 Oct 2017 20:15:17 +0000 (22:15 +0200)]
<sys/indefinite2.h>: Add missing include for VKERNEL64.
Matthew Dillon [Tue, 17 Oct 2017 18:55:24 +0000 (11:55 -0700)]
kernel - Remove 'Emergency Pager' debugging messages
* Remove these messages. They were for debugging only and, in fact,
the activation of the anonymous-only pager is not really an
'Emergency'.
Sascha Wildner [Tue, 17 Oct 2017 18:31:29 +0000 (20:31 +0200)]
Stitch LINT64 build back together.
b1793cc6ba47622ab6ad154905f5c1385a6825bd removed the debuglockmgr()
code in kern_lock.c that was enabled with the DEBUG_LOCKS kernel
option. Its only consumer was in vfs_vnops.c for vn_lock.
For now, remove all associated remains.
Justin C. Sherrill [Tue, 17 Oct 2017 18:28:14 +0000 (14:28 -0400)]
Add mount_hammer2 and newfs_hammer2 to initrd list.
Sascha Wildner [Tue, 17 Oct 2017 07:36:30 +0000 (09:36 +0200)]
Remove "kernel ppp", i.e. if_ppp.ko and pppd(8).
It has been replaced by ppp(8), in conjunction with tun(4).
While here, rename the ppp-user rc script to 'ppp' and fix up
REQUIRE/PROVIDE situation.
Matthew Dillon [Mon, 16 Oct 2017 22:17:42 +0000 (15:17 -0700)]
mkinitrd - Add missing /var/db (3)
* When /var is mounted via tmpfs we have to mkdir the subdirs
manually.
* Add /var/db and /var/empty to the directories initrd creates
in its rc.
Submitted-by: amonk
Imre Vadász [Mon, 16 Oct 2017 22:00:32 +0000 (00:00 +0200)]
virtio_blk - Fix capacity calculation, when host sets large disk block size.
* The disk capacity in the virtio configuration space is always specified
in 512 byte sectors, so info.d_media_blksize should be 512.
* Also check for VIRTIO_BLK_F_GEOMETRY feature before reading the disk
geometry from configuration space.
* Add some device_printf calls to report the disk size and (if available)
geometry during bootup.
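A sketch of the size arithmetic. The capacity field counts 512-byte sectors whatever block size the host advertises, so the byte size is always capacity × 512. The struct and helper names are hypothetical stand-ins, not the driver's definitions.

```c
#include <assert.h>
#include <stdint.h>

#define VTBLK_SECTOR_SIZE 512u	/* unit of the virtio capacity field */

struct disk_info_sketch {
	uint32_t media_blksize;	/* bytes per block given to the disk layer */
	uint64_t media_blocks;	/* number of such blocks */
};

static struct disk_info_sketch
vtblk_size_sketch(uint64_t capacity)
{
	struct disk_info_sketch info;

	/* Always 512, even if the host advertises a larger blk_size. */
	info.media_blksize = VTBLK_SECTOR_SIZE;
	info.media_blocks = capacity;
	return info;
}

static uint64_t
disk_bytes(const struct disk_info_sketch *info)
{
	return info->media_blocks * info->media_blksize;
}
```

For example, a capacity of 2097152 sectors is 1 GiB. If the host's 4096-byte block size were used as the block size instead, the same capacity would be read as 8 GiB.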
Matthew Dillon [Mon, 16 Oct 2017 07:28:11 +0000 (00:28 -0700)]
kernel - Rewrite umtx_sleep() and umtx_wakeup()
* Rewrite umtx_sleep() and umtx_wakeup() to no longer use
vm_fault_page_quick(). Calling the VM fault code incurs a huge
overhead and creates massive contention when many threads are
using these calls.
The new code uses fuword() and translates to the physical address via
PTmap, which has very low overhead and basically zero contention.
* Instead, impose a mandatory timeout for umtx_sleep() and cap it
at 2 seconds (adjustable via sysctl kern.umtx_timeout_max, set
in microseconds). When the memory mapping underpinning a umtx
changes, userland will not stall for more than 2 seconds.
* The common remapping case caused by fork() is handled by the kernel
by immediately waking up all sleeping umtx_sleep() calls for the
related process.
* Any other copy-on-write or remapping cases will stall no more
than the maximum timeout (2 seconds). This might include paging
to/from swap, for example, which can remap the physical page
underpinning the umtx. This could also include user application
snafus or weirdness.
* umtx_sleep() and umtx_wakeup() still translate the user virtual
address to a physical address for the tsleep() and wakeup() operation.
This is done via a fault-protected access to the PTmap (the page-table
self-mapping).
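The mandatory timeout can be sketched as a clamp. This assumes, as umtx_sleep(2) documents, that a timeout of 0 means "wait forever", and that kern.umtx_timeout_max holds microseconds. The variable and function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical stand-in for kern.umtx_timeout_max (microseconds). */
static int umtx_timeout_max = 2000000;

/*
 * Turn the caller's timeout (0 = forever) into the bounded timeout
 * actually slept, so a sleep keyed on a stale physical address cannot
 * stall userland longer than umtx_timeout_max.
 */
static int
umtx_effective_timeout(int timeout_us)
{
	assert(timeout_us >= 0);
	if (timeout_us == 0 || timeout_us > umtx_timeout_max)
		return umtx_timeout_max;
	return timeout_us;
}
```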
Matthew Dillon [Mon, 16 Oct 2017 01:57:43 +0000 (18:57 -0700)]
world - World build for ucred changes
* Adjust mountd and fstat kernel structure access for
changes.
Matthew Dillon [Mon, 16 Oct 2017 00:42:26 +0000 (17:42 -0700)]
kernel - Clean up ucred and plimit cache line locality
* Move struct plimit's p_spin and p_refcnt fields into their own
cacheline. This structure is massively shared and read often.
Doing this avoids unnecessary cache line ping-pongs.
* Only use p_spin to modify a resource limit. Do not use it to
access the resource limit.
* Integrate plimit's exclusivity flag into p_refcnt.
* Move struct ucred's cr_ref into its own cacheline. This structure
is massively shared and read often. Doing this avoids unnecessary
cache line ping-pongs.
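A sketch of the layout idea, assuming 64-byte cache lines. The write-heavy spinlock and refcount start a fresh cache line, so atomic refcount updates do not invalidate the line that readers of the limits use. Apart from p_spin and p_refcnt, the field names and sizes are hypothetical, and this is not the real struct plimit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHELINE_SIZE 64	/* assumption: 64-byte cache lines */

struct plimit_sketch {
	/* Read-mostly limits, shared by many processes. */
	uint64_t pl_rlimit[16][2];

	/* Write-heavy fields on their own cache line. */
	_Alignas(CACHELINE_SIZE) uint32_t p_spin;
	uint32_t p_refcnt;	/* a high bit could carry the exclusive flag */
};

static_assert(offsetof(struct plimit_sketch, p_spin) % CACHELINE_SIZE == 0,
    "p_spin must start a new cache line");
```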
Matthew Dillon [Sun, 15 Oct 2017 21:26:20 +0000 (14:26 -0700)]
kernel - Use fcmpset in lockmgr and tokens
* Use fcmpset for lockmgr and token locks.
Matthew Dillon [Sun, 15 Oct 2017 21:20:56 +0000 (14:20 -0700)]
kernel - Add atomic_fcmpset_*()
* Add atomic_fcmpset_*(). GCC has gotten good enough that it no longer
forces &count onto the stack.
These functions work like atomic_cmpset_*() but update the originating
value on failure, allowing us to avoid reloading it from memory.
Suggested-by: mjg__
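A sketch of the fcmpset retry pattern. It uses C11 atomic_compare_exchange_strong(), which has the same "update the expected value on failure" behavior. fcmpset_int_sketch() and the DEAD_BIT refcount are hypothetical illustrations, not the kernel's atomic_fcmpset_int().

```c
#include <assert.h>
#include <stdatomic.h>

/* Like cmpset, but on failure *old receives the value actually found. */
static int
fcmpset_int_sketch(_Atomic unsigned int *p, unsigned int *old,
    unsigned int newval)
{
	return atomic_compare_exchange_strong(p, old, newval);
}

#define DEAD_BIT 0x80000000u	/* hypothetical "being freed" marker */

/* Take a reference unless the object is dead; no reload inside the loop. */
static int
ref_if_alive(_Atomic unsigned int *count)
{
	unsigned int v = atomic_load(count);

	for (;;) {
		if (v & DEAD_BIT)
			return 0;
		assert((v & ~DEAD_BIT) != ~DEAD_BIT);	/* no overflow */
		if (fcmpset_int_sketch(count, &v, v + 1))
			return 1;
		/* v already holds the current value; just retry. */
	}
}
```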
Matthew Dillon [Sun, 15 Oct 2017 19:26:28 +0000 (12:26 -0700)]
kernel - Partition large anon mappings, optimize vm_map_entry_reserve*()
* Partition large anonymous mappings in (for now) 16MB chunks.
The purpose of this is to improve concurrent VM faults for
threaded programs. Note that the pmap itself is still a
bottleneck.
* Refactor vm_map_entry_reserve() and related code to remove
unnecessary critical sections.
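The commit does not say how the partition boundaries are chosen. The sketch below assumes they fall on 16MB-aligned addresses and splits an anonymous range into per-partition pieces, which could then become separate map entries. split_range() is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PARTITION_SIZE (16ULL * 1024 * 1024)	/* 16MB, per the commit */

/*
 * Split [start, end) at PARTITION_SIZE-aligned boundaries, storing up
 * to max pieces in starts[]/ends[].  Returns the number of pieces.
 */
static size_t
split_range(uint64_t start, uint64_t end, uint64_t *starts, uint64_t *ends,
    size_t max)
{
	size_t n = 0;

	assert(start < end);
	while (start < end && n < max) {
		/* Next aligned boundary strictly above start. */
		uint64_t next = (start & ~(PARTITION_SIZE - 1)) + PARTITION_SIZE;

		starts[n] = start;
		ends[n] = next < end ? next : end;
		start = ends[n];
		++n;
	}
	return n;
}
```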
Matthew Dillon [Sun, 15 Oct 2017 18:25:21 +0000 (11:25 -0700)]
kernel - Optimize struct uidinfo
* Refactor struct uidinfo. Use atomic ops for ui_posixlocks
and ui_proccnt. They were already being used for ui_openfiles
and ui_ref.
* Refactor ui_ref a bit to improve the drop code. Use a cute
trick for the transition. When we transition to 0 we allow
ui_ref to actually go to 0, and then do an independent lookup
of the uid with the hash table spinlock to conditionally free
it if it remains 0.
This allows us to completely avoid using atomic_cmpset_int(),
which can be seriously inefficient due to races in SMP
environments.
Suggested-by: mjg__
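A sketch of the drop trick, using a single hash chain and a C11 atomic_flag spinlock as hypothetical stand-ins for the uid hash table. The count is allowed to reach 0 with one atomic decrement. The dropper then re-looks-up the uid under the lock and frees the entry only if it is still 0. This is safe because lookups take their reference while holding the same lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

struct uinfo_sketch {
	int uid;
	_Atomic int ref;
	struct uinfo_sketch *next;
};

static struct uinfo_sketch *uihash_head;
static atomic_flag uihash_spin = ATOMIC_FLAG_INIT;

static void
uihash_lock(void)
{
	while (atomic_flag_test_and_set_explicit(&uihash_spin,
	    memory_order_acquire))
		;	/* spin */
}

static void
uihash_unlock(void)
{
	atomic_flag_clear_explicit(&uihash_spin, memory_order_release);
}

/* Lookup takes its reference while holding the hash lock. */
static struct uinfo_sketch *
uifind_sketch(int uid)
{
	struct uinfo_sketch *ui;

	uihash_lock();
	for (ui = uihash_head; ui != NULL; ui = ui->next) {
		if (ui->uid == uid) {
			atomic_fetch_add(&ui->ref, 1);
			break;
		}
	}
	uihash_unlock();
	return ui;
}

static void
uidrop_sketch(struct uinfo_sketch *ui)
{
	struct uinfo_sketch **pp;
	int uid = ui->uid;	/* read before ui can be freed by others */
	int old;

	/* One fetchadd, no cmpset loop, even on the 1->0 transition. */
	old = atomic_fetch_sub(&ui->ref, 1);
	assert(old > 0);
	if (old != 1)
		return;

	/* Independent lookup: a racing uifind_sketch() may have revived it. */
	uihash_lock();
	for (pp = &uihash_head; *pp != NULL; pp = &(*pp)->next) {
		if ((*pp)->uid == uid) {
			if (atomic_load(&(*pp)->ref) == 0) {
				struct uinfo_sketch *dead = *pp;

				*pp = dead->next;
				free(dead);
			}
			break;
		}
	}
	uihash_unlock();
}
```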
Matthew Dillon [Sun, 15 Oct 2017 18:02:15 +0000 (11:02 -0700)]
kernel - pmap->pm_spin now uses a shared spinlock
* A shared spinlock is used whenever possible for pmap->pm_spin.
This is particularly beneficial for umtx_sleep/umtx_wakeup
operations.
Matthew Dillon [Sun, 15 Oct 2017 18:01:11 +0000 (11:01 -0700)]
kernel - Increase pmap placemarks hash from 16 to 64 entries
* Increase the pmap placemarks hash from 16 to 64 entries,
improving concurrent fault performance for threads a bit.
Matthew Dillon [Sun, 15 Oct 2017 17:54:59 +0000 (10:54 -0700)]
kernel - Simplify umtx_sleep and umtx_wakeup support
* Rip out the vm_page_action / vm_page_event() API. This code was
fairly SMP unfriendly and created serious bottlenecks with large
threaded user programs using mutexes.
* Replace with a simpler mechanism that simply wakes up any UMTX
domain tsleeps after a fork().
* Implement a 4uS spin loop in umtx_sleep() similar to what the
pipe code does.
Matthew Dillon [Sat, 14 Oct 2017 06:26:56 +0000 (23:26 -0700)]
kernel - Increase ncmount_cache array
* Increase the ncmount_cache hash from 1009 to 16301. The
slow-path (which can contend heavily on the mountlist_token)
was getting hit too often in the synth test due to the
number of mounts synth maintains.
* Improve the hash function to reduce chances of collisions.
Matthew Dillon [Sat, 14 Oct 2017 04:26:30 +0000 (21:26 -0700)]
kernel - Reoptimize sys_pipe
* Use atomic ops for state updates, allowing us to avoid acquiring
the other side's token. This removes all remaining contention.
* Performance boosted by around 35%. On the Ryzen, bulk buffer
write->read tests between localized cpu cores went from 9.2 GB/sec
to around 13 GB/sec. Cross-die performance increased from
2.5 GB/sec to around 4.5 GB/sec.
1-byte ping-ponging (write-1/read-1/turn-around/write-back-1/
read-back-1) fell from 1.0-2.0uS to 0.7-1.7uS.
* Add kern.pipe.size, allowing the kernel pipe buffer size to be
changed (affects new pipes only). The default buffer size has
been increased to 32KB (it was 16KB).
* Refactor pipelining optimizations, further reducing unnecessary
tsleep/wakeup IPIs.
* Improve kern.pipe.delay operation (an IPI avoidance mechanism),
and reduce from 5uS to 4uS.
Also add cpu_pause() in the TSC loop (suggested-by mjg_).
Matthew Dillon [Sat, 14 Oct 2017 00:55:41 +0000 (17:55 -0700)]
kernel - Refactor sys_pipe
* Refactor the pipe code in preparation for optimization. Get rid of
the dual-pipe structure and instead have one pipe structure with
two buffers.
* Scrap a ton of global statistics variables that nobody uses any more,
get rid of pipe_peer, and get rid of the slock.
Matthew Dillon [Fri, 13 Oct 2017 05:59:02 +0000 (22:59 -0700)]
kernel - Improve mountlist_scan() performance, track vfs_getvfs()
* Use a shared token whenever possible, and do not hold the token
across the callback in the mountlist_scan() call.
* vfs_getvfs() mount_hold()'s the returned mp. The caller is now
expected to mount_drop() it when done. This fixes a very rare
race.
Matthew Dillon [Fri, 13 Oct 2017 03:42:33 +0000 (20:42 -0700)]
kernel - Refactor smp collision statistics (2)
* tsc_uclock_t and tsc_sclock_t need to be exposed for now for
userland.
Matthew Dillon [Thu, 5 Oct 2017 16:09:27 +0000 (09:09 -0700)]
kernel - Refactor smp collision statistics (2)
* Refactor indefinite_info mechanics. Instead of tracking indefinite
loops on a per-thread basis for tokens, track them on a scheduler
basis. The scheduler records the overhead while it is live-looping
on tokens, but the moment it finds a thread it can actually schedule
it stops (then restarts later the next time it is entered), even
if some of the other threads still have unresolved tokens.
This gives us a fairer representation of how many cpu cycles are
actually being wasted waiting for tokens.
* Go back to using a local indefinite_info in the lockmgr*(), mutex*(),
and spinlock code.
* Refactor lockmgr() by implementing an __inline frontend to
interpret the directive. Since this argument is usually a constant,
the change effectively removes the switch().
Use LK_NOCOLLSTATS to create a clean recursion to wrap the blocking
case with the indefinite*() API.
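A sketch of why an inline frontend removes the switch(). When the directive is a compile-time constant at the call site, the compiler can resolve the switch and emit a direct call to the specific handler. All names here (LKS_*, lk_shared(), lockmgr_sketch(), and so on) are hypothetical and are not DragonFly's lockmgr internals. The handlers are single-threaded placeholders.

```c
#include <assert.h>

/* Hypothetical directive values, not DragonFly's LK_* constants. */
#define LKS_TYPE_MASK	0x0f
#define LKS_SHARED	0x01
#define LKS_EXCLUSIVE	0x02
#define LKS_RELEASE	0x03

struct lk_sketch {
	int shared;	/* number of shared holders */
	int excl;	/* 1 while exclusively held */
};

static int
lk_shared(struct lk_sketch *lk)
{
	assert(lk->excl == 0);
	++lk->shared;
	return 0;
}

static int
lk_exclusive(struct lk_sketch *lk)
{
	assert(lk->excl == 0 && lk->shared == 0);
	lk->excl = 1;
	return 0;
}

static int
lk_release(struct lk_sketch *lk)
{
	if (lk->excl)
		lk->excl = 0;
	else
		--lk->shared;
	return 0;
}

/* With a constant 'flags' this folds to a single direct call. */
static inline int
lockmgr_sketch(struct lk_sketch *lk, int flags)
{
	switch (flags & LKS_TYPE_MASK) {
	case LKS_SHARED:
		return lk_shared(lk);
	case LKS_EXCLUSIVE:
		return lk_exclusive(lk);
	case LKS_RELEASE:
		return lk_release(lk);
	default:
		return -1;	/* unknown directive */
	}
}
```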
Matthew Dillon [Thu, 5 Oct 2017 05:04:13 +0000 (22:04 -0700)]
kernel - Optimize shared -> excl spinlock contention
* When an exclusive request is spinning waiting for shared holders to
release, throw in additional cpu_pause() calls based on the number of
shared holders.
Suggested-by: mjg_
Matthew Dillon [Thu, 5 Oct 2017 04:46:57 +0000 (21:46 -0700)]
kernel - Refactor smp collision statistics
* Add an indefinite wait timing API (sys/indefinite.h,
sys/indefinite2.h). This interface uses the TSC and will
record lock latencies to our pcpu stats in microseconds.
The systat -pv 1 display shows this under smpcoll.
Note that latencies generated by tokens, lockmgr, and mutex
locks do not necessarily reflect actual lost cpu time as the
kernel will schedule other threads while those are blocked,
if other threads are available.
* Formalize TSC operations more, supply a type (tsc_uclock_t and
tsc_sclock_t).
* Reinstrument lockmgr, mutex, token, and spinlocks to use the new
indefinite timing interface.
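A sketch of the TSC bookkeeping. It assumes tsc_uclock_t is an unsigned 64-bit count and tsc_sclock_t its signed counterpart; these typedefs are assumptions, not copied from the headers. The conversion splits the division so that `delta * 1000000` cannot overflow. The signed cast makes deadline comparisons robust across counter wrap (two's complement, as with GCC).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed definitions, for illustration. */
typedef uint64_t tsc_uclock_t;
typedef int64_t tsc_sclock_t;

/* Convert a TSC interval to microseconds without overflowing delta * 1e6. */
static uint64_t
tsc_delta_to_us(tsc_uclock_t start, tsc_uclock_t end, uint64_t tsc_frequency)
{
	tsc_uclock_t delta = end - start;	/* unsigned math survives wrap */

	assert(tsc_frequency != 0);
	return (delta / tsc_frequency) * 1000000 +
	    (delta % tsc_frequency) * 1000000 / tsc_frequency;
}

/* True once 'now' has reached 'deadline', even across counter wrap. */
static bool
tsc_reached(tsc_uclock_t now, tsc_uclock_t deadline)
{
	return (tsc_sclock_t)(now - deadline) >= 0;
}
```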
Matthew Dillon [Thu, 5 Oct 2017 03:28:55 +0000 (20:28 -0700)]
kernel - KVABIO allocbuf() optimization
* When using allocbuf() to set bufsize to 0 during buffer reuse,
do not bother synchronizing the pmap.
Matthew Dillon [Wed, 4 Oct 2017 03:06:04 +0000 (20:06 -0700)]
kernel - KVABIO stabilization
* bp->b_cpumask must be cleared in vfs_vmio_release().
* Generally speaking, it is desirable for the kernel to set
B_KVABIO when flushing or disposing of a buffer, as long as b_cpumask
is also correct. This avoids unnecessary synchronization when
underlying device drivers support KVABIO, even if the filesystem does
not.
* In findblk() we cannot just gratuitously clear B_KVABIO. We must issue
a bkvasync_all() to clear the flag in order to ensure proper
synchronization with the caller's desired B_KVABIO state.
* It was intended that bkvasync_all() clear the B_KVABIO flag. Make
sure it does.
* In contrast, B_KVABIO can always be set at any time, so long as the
cpumask is cleared whenever the mappings are changed, and also as long
as the caller's B_KVABIO state is respected if the buffer is later
returned to the caller in a locked state. If the buffer will simply
be disposed of by the kernel instead, the flag can be set. The
wrapper (typically a vn_strategy() or dev_dstrategy() call) will clear
the flag via bkvasync_all() if the target does not support KVABIO.
* Kernel support code outside of filesystem and device drivers is
expected to support KVABIO.
* nvtruncbuf() and nvextendbuf() now use bread_kvabio() (i.e. they now
properly support KVABIO).
* The buf_countdeps(), buf_checkread(), and buf_checkwrite() callbacks
call bkvasync_all() in situations where the vnode does not support
KVABIO. This is because the kernel might have set the flag for other
incidental operations even if the filesystem did not.
* As per above, devfs_spec_strategy() now sets B_KVABIO and properly
calls bkvasync() when it needs to operate directly on buf->b_data.
* Fix bug in tmpfs(). tmpfs() was using bread_kvabio() as intended,
but failed to call bkvasync() prior to operating directly on
buf->b_data (prior to calling uiomovebp()).
* Any VFS function that calls BUF_LOCK*() itself may also have to
call bkvasync_all() if it wishes to operate directly on buf->b_data,
even if the VFS is not KVABIO aware. This is because the VFS bypassed
the normal buffer cache APIs to obtain a locked buffer.
Matthew Dillon [Tue, 3 Oct 2017 01:49:28 +0000 (18:49 -0700)]
kernel - Adjust ipiq execution code a bit
* Remove unnecessary fences
* Adjust documentation
Matthew Dillon [Tue, 3 Oct 2017 01:48:19 +0000 (18:48 -0700)]
kernel - Add wakeup() probe sysctl
* Add a sysctl to allow us to probe wakeups.
* Add a few assertions in the optimized wakeup() path.
* Adjust documentation.
Matthew Dillon [Mon, 2 Oct 2017 02:42:59 +0000 (19:42 -0700)]
kernel - Implement KVABIO API in TMPFS
* TMPFS now fully supports the KVABIO API. This removes nearly all
IPIs from buffer cache operations related to TMPFS.
* In synth tests on 32-way and 48-way servers, the number of IPIs/cpu/sec
drops from 5000-12000 down to 200-1000. Needless to say, this is a
huge win, particularly on VMs.
Recommended-by: mjg_ (Mateusz Guzik)
Matthew Dillon [Mon, 2 Oct 2017 02:39:33 +0000 (19:39 -0700)]
kernel - Add KVABIO support to NVMe, disk translation layer, and swap
* Add KVABIO support to the NVMe driver. The driver no longer
requires that buffers be synchronized to all cpus.
* Add KVABIO support to the disk translation layer. The layer no
longer requires that buffers be synchronized to all cpus (note
however that the underlying device may still require such).
* Add KVABIO support to the swap subsystem. Again, actual avoidance
of buffer memory synchronization depends on the underlying devices.
Matthew Dillon [Mon, 2 Oct 2017 02:28:56 +0000 (19:28 -0700)]
kernel - Add KVABIO API (ability to avoid global TLB syncs)
* Add KVABIO support. This works as follows:
(1) Devices can set D_KVABIO in the ops flags to specify that the
device strategy routine supports the API.
The dev_dstrategy() wrapper will fully synchronize the buffer to
all cpus prior to dispatch if the device flag is not set.
(2) Vnodes can set VKVABIO in v_flag to indicate that VOP_STRATEGY
supports the API.
The vn_strategy() wrapper will fully synchronize the buffer to
all cpus prior to dispatch if the vnode flag is not set.
(3) GETBLK_KVABIO and FINDBLK_KVABIO flags added to allow buffer
cache consumers (primarily filesystem code) to indicate that
they support the API. B_KVABIO flag added to struct buf.
This occurs on a per-acquisition basis. For example, a standard
bread() will clear the flag, indicating no support. A bread_kvabio()
will set the flag, indicating support.
* The getblk(), getcacheblk(), and cluster*() interfaces set the flag for
any I/O they dispatch, and then adjust the flag as necessary upon return
according to the caller's wishes.
Matthew Dillon [Sun, 1 Oct 2017 22:11:21 +0000 (15:11 -0700)]
kernel - Remove geteblk()
* Remove geteblk(), the last B_MALLOC buffer cache API. Generally
use getpbuf_mem() instead.
Matthew Dillon [Sun, 1 Oct 2017 22:09:52 +0000 (15:09 -0700)]
kernel - Add pmap_qenter_noinval()
* Add pmap_qenter_noinval() API
Matthew Dillon [Sun, 1 Oct 2017 19:11:10 +0000 (12:11 -0700)]
kernel - Remove repurposebuf
* Remove the repurposebuf hack to prepare for the buffer cache
KVABIO API, which is a better solution.
Matthew Dillon [Sat, 30 Sep 2017 19:14:21 +0000 (12:14 -0700)]
kernel - Remove B_MALLOC
* Remove B_MALLOC buffer support. All primary buffer cache buffer
operations should now use pages. B_VMIO is required for all
vnode-centric operations like allocbuf(), but does not have to be set
for nominal I/O.
* Remove vm_hold_load_pages() and vm_hold_free_pages(). This code was
used to support mapping ad-hoc data buffers into buf structures, but
the only remaining use case in the CAM periph code can just use
getpbuf_mem() instead. So this code is no longer used.
Sepherosa Ziehau [Mon, 16 Oct 2017 05:16:18 +0000 (13:16 +0800)]
ipfw: Factor out ipfw_init_args()
Sepherosa Ziehau [Mon, 16 Oct 2017 04:52:17 +0000 (12:52 +0800)]
ipfw: Flush the rules before unload the module.
Sepherosa Ziehau [Mon, 16 Oct 2017 04:19:23 +0000 (12:19 +0800)]
ipfw: Factor out ipfw_defrag_redispatch.
Remove no longer needed IP_FW_CONTINUE.
Sepherosa Ziehau [Mon, 16 Oct 2017 04:07:45 +0000 (12:07 +0800)]
kern: Remove ncpus2 and friends.
They are no longer used now that netisr_ncpus has been deployed.
Reminded-by: dillon@
Sepherosa Ziehau [Mon, 16 Oct 2017 03:44:45 +0000 (11:44 +0800)]
mpls: Use netisr_ncpus
Reminded-by: dillon@
Sascha Wildner [Sun, 15 Oct 2017 20:52:51 +0000 (22:52 +0200)]
Update the pciconf(8) database.
October 12, 2017 snapshot from http://pciids.sourceforge.net/
Sascha Wildner [Sun, 15 Oct 2017 11:07:04 +0000 (13:07 +0200)]
LINT64: Sort vmx a bit better.
Matthew Dillon [Sun, 15 Oct 2017 07:44:38 +0000 (00:44 -0700)]
Revert "libthread_xu - Wakeup all waiters"
This reverts commit
de7ba607e4500e7df6ade3916977cc8a91e1b4e9.
* Didn't intend to push this.
Sepherosa Ziehau [Sat, 30 Sep 2017 06:39:48 +0000 (14:39 +0800)]
ipfw: Implement state based "redirect", i.e. without using libalias.
Redirection creates two states, i.e. one before the translation (xlat0)
and one after the translation (xlat1). If the hash of the translated
packet indicates that it is owned by a remote CPU:
- If the packet triggers the state pair creation, the 'xlat1' will be
piggybacked on the translated packet, which will be forwarded to the
remote CPU for further evaluation, and the 'xlat1' will be installed
on the remote CPU before the translated packet is evaluated.
- Otherwise, only the translated packet will be forwarded to the remote
CPU for further evaluation.
The 'xlat1' is called the slave state, which will be deleted only when
the 'xlat0' (the master state) is deleted. The state pair is always
deleted on the CPU owning the 'xlat1'; the 'xlat0' will be forwarded
there.
The reference count of the state pair is maintained independently
in each state; the memory of the state pair is freed only after
the sum of the counters in both states reaches 0. This avoids expensive
per-packet atomic ops.
As far as I have tested, this implementation of "redirect" does _not_
introduce any noticeable performance reduction, latency increase, or
latency instability.
This commit makes most of the necessary bits for NAT ready too.
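One way such split counting can work, shown as a model rather than ipfw's code. Each counter is a plain int touched only by the cpu owning that state, so no atomic ops are needed. The sketch also assumes (this is not stated by the commit) that a reference taken through one state may be dropped through the other, so a single counter can go negative and only the sum is meaningful. The sum must be checked where both counters are stable; per the commit, that is the CPU owning 'xlat1'. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

struct xlat_pair_sketch {
	int xlat0_refs;		/* modified only on the cpu owning xlat0 */
	int xlat1_refs;		/* modified only on the cpu owning xlat1 */
};

static void
xlat_ref(int *cnt)
{
	++*cnt;
}

static void
xlat_unref(int *cnt)
{
	--*cnt;		/* may go negative: only the sum is meaningful */
}

/* Evaluate where both counters are stable (the cpu owning xlat1). */
static bool
xlat_pair_can_free(const struct xlat_pair_sketch *p)
{
	int sum = p->xlat0_refs + p->xlat1_refs;

	assert(sum >= 0);
	return sum == 0;
}
```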
Matthew Dillon [Sun, 15 Oct 2017 07:13:42 +0000 (00:13 -0700)]
libthread_xu - Wakeup all waiters
* For now, punt on trying to wake up an optimized number of waiters.
Wake up all waiters and let them sort it out.
* This may fix specific count races in threaded programs using
pthread mutexes.
Matthew Dillon [Sat, 14 Oct 2017 22:28:12 +0000 (15:28 -0700)]
hammer2 - Handle error on rename in media out of space case
* Process the error code from hammer2_chain_delete() in
hammer2_xop_nrename() to ensure that we do not try to reattach
the chain under another parent.
Reported-by: arcade (Bug #3055)
Matthew Dillon [Sat, 14 Oct 2017 21:18:39 +0000 (14:18 -0700)]
sshd - Disable tunneled clear text passwords by default
* Reapply
1cb3a32c13b and
c866a462b3. sshd on DragonFlyBSD disables
cleartext passwords by default.
Reminded-by: ivadasz
Sascha Wildner [Sat, 14 Oct 2017 19:06:14 +0000 (21:06 +0200)]
cpdup(1): Some improvements.
* Make cpdup retry failed rmdirs after chflags. It already does this
for remove().
* When deciding whether to copy a file, cpdup should ignore the
UF_ARCHIVE file flag. If that flag is supported by the destination
file system but it's cleared on a source file, then multiple
invocations of cpdup would all copy the source file because its
flags wouldn't match. OTOH, if the destination filesystem doesn't
support UF_ARCHIVE, then there's no point in cpdup setting it.
Submitted-by: Will Andrews <will@firepipe.net>
Dragonfly-bug: https://bugs.dragonflybsd.org/issues/2987
https://bugs.dragonflybsd.org/issues/2988
https://bugs.dragonflybsd.org/issues/3067
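The UF_ARCHIVE rule can be sketched as a masked comparison: two flag words are treated as equal if they differ only in the archive bit. flags_differ() is a hypothetical helper, not cpdup's code. The fallback UF_ARCHIVE value exists only so the sketch compiles on systems whose headers lack the flag, and is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/stat.h>

#ifndef UF_ARCHIVE
#define UF_ARCHIVE 0x00000800	/* assumed value, for non-BSD hosts only */
#endif

/* True if the file flags differ in anything other than UF_ARCHIVE. */
static bool
flags_differ(unsigned long src_flags, unsigned long dst_flags)
{
	return ((src_flags ^ dst_flags) & ~(unsigned long)UF_ARCHIVE) != 0;
}
```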
Matthew Dillon [Sat, 14 Oct 2017 17:59:30 +0000 (10:59 -0700)]
hammer2 - Slightly reduce LZ4 output buffer limit
* LZ4_compress_limitedOutput() appears to be able to overrun the
supplied buffer.
* Slightly reduce the LZ4 output buffer limit from a 4-byte alignment
to an 8-byte alignment to try to fix the problem.
Lubos Boucek [Fri, 13 Oct 2017 21:33:01 +0000 (21:33 +0000)]
Fix additional cases of seg-faults on crypt(3) failure
* On failure, crypt(3) returns NULL, which is then used as a
strcmp(3) argument
opieftpd.c and opiesu.c are not actually used anywhere.
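The shape of the fix can be sketched as a comparison helper that treats a NULL result from crypt(3) as a failed match instead of passing it to strcmp(3). POSIX specifies that crypt() returns a null pointer on failure. password_matches() is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Compare a crypt(3) result with a stored hash; NULL never matches. */
static bool
password_matches(const char *crypted, const char *stored)
{
	if (crypted == NULL || stored == NULL)
		return false;
	return strcmp(crypted, stored) == 0;
}
```

A caller would write `password_matches(crypt(pass, pw->pw_passwd), pw->pw_passwd)` rather than calling strcmp() on crypt()'s return value directly.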
Sascha Wildner [Sat, 14 Oct 2017 08:48:04 +0000 (10:48 +0200)]
rc.8: Clarify foo.sh behavior.
Improve wording a bit. See NetBSD's revision 1.38.
Reported-by: Aaron LI <aly@aaronly.me>
Aaron LI [Fri, 13 Oct 2017 04:26:29 +0000 (12:26 +0800)]
disklabel64: Fix an error message
Sascha Wildner [Sat, 14 Oct 2017 08:38:45 +0000 (10:38 +0200)]
ifconfig(8): Add 'lscan'. Like 'scan', but displays long SSIDs.
Submitted-by: Max Herrgard <herrgard@gmail.com>
Matthew Dillon [Sat, 14 Oct 2017 06:14:31 +0000 (23:14 -0700)]
mkinitrd - Add missing /var/db
* dhclient also needs /var/db to exist, make sure it does.
Reported-by: amonk
Matthew Dillon [Sat, 14 Oct 2017 05:39:31 +0000 (22:39 -0700)]
mkinitrd - Add missing /var/empty
* /var/empty is required by dhclient, which will SIGHUP itself
without it.
Reported-by: amonk
Matthew Dillon [Sat, 14 Oct 2017 04:44:06 +0000 (21:44 -0700)]
kernel - Rearrange namecache globals a bit
* Make sure ncspin and ncneglist are in the same cache line, and
do not overlap other globals in that cache line.
Suggested-by: mjg_
Matthew Dillon [Sat, 14 Oct 2017 04:41:46 +0000 (21:41 -0700)]
test - Cleanup pipe2
* Cleanup the pipe2 code a bit
Matthew Dillon [Fri, 13 Oct 2017 01:33:48 +0000 (18:33 -0700)]
Import OpenSSH-7.6p1
* Adjust build to match import
Matthew Dillon [Fri, 13 Oct 2017 01:32:28 +0000 (18:32 -0700)]
Import OpenSSH-7.6p1
* Import OpenSSH-7.6p1. Couldn't really merge from the vendor branch
so just brought it in.
* Adjustments for WARNS issues
Tomohiro Kusumi [Thu, 12 Oct 2017 20:03:00 +0000 (23:03 +0300)]
usr.sbin/fstyp: Update fstyp(8) for HAMMER2
Tomohiro Kusumi [Thu, 12 Oct 2017 20:08:52 +0000 (23:08 +0300)]
sbin/newfs_hammer2: Add missing free() for uuid_to_string'd strings
Tomohiro Kusumi [Thu, 12 Oct 2017 19:57:45 +0000 (22:57 +0300)]
sbin/newfs_hammer2: Fix compile-time warning on Linux distros (gcc6)
--
warning: pointer targets in passing argument 1 of 'snprintf' differ in signedness [-Wpointer-sign]
warning: pointer targets in passing argument 1 of 'strlen' differ in signedness [-Wpointer-sign]
Tomohiro Kusumi [Thu, 12 Oct 2017 19:48:11 +0000 (22:48 +0300)]
sbin/newfs_hammer2: Check S_ISREG()
The comment says as follows, so check S_ISREG().
/* Allow the formatting of regular files as HAMMER2 volumes */
This is also what the same function in HAMMER1 does.
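The check amounts to a stat(2) plus S_ISREG() before taking the regular-file path. The helper below is a hypothetical sketch, not newfs_hammer2's function. What the tool does for non-regular targets (presumably querying the device size) is not shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/stat.h>

/* True only if path exists and is a regular file. */
static bool
is_regular_file(const char *path)
{
	struct stat st;

	if (stat(path, &st) < 0)
		return false;
	return S_ISREG(st.st_mode);
}
```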
Sascha Wildner [Thu, 12 Oct 2017 08:16:06 +0000 (10:16 +0200)]
<netdb.h>: Adjust comment a bit.
Sascha Wildner [Thu, 12 Oct 2017 08:13:31 +0000 (10:13 +0200)]
libc/net: Add NI_NUMERICSCOPE flag for getnameinfo().
Code to handle it is already present in getnameinfo() but we
were missing the flag so far.
See http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/netdb.h.html
Thomas Nikolajsen [Wed, 11 Oct 2017 23:12:47 +0000 (01:12 +0200)]
Merge commit 'crater/master'
Thomas Nikolajsen [Wed, 11 Oct 2017 23:11:00 +0000 (01:11 +0200)]
Merge commit 'crater/master'
Thomas Nikolajsen [Wed, 11 Oct 2017 23:03:46 +0000 (01:03 +0200)]
systat.1: Update man page: sync to current program & improve markup a bit
Substantial changes have happened, especially for the vmstat display.
Sascha Wildner [Wed, 11 Oct 2017 20:05:26 +0000 (22:05 +0200)]
kernel/atkbdc: Fix a prototype.
Sascha Wildner [Wed, 11 Oct 2017 20:04:43 +0000 (22:04 +0200)]
kernel/cam: Add some missing parameter names.
Just like the rest of the file.
Sascha Wildner [Wed, 11 Oct 2017 17:31:25 +0000 (19:31 +0200)]
<vfs/hammer2/hammer2.h>: Fix parameter names in two prototypes.
Discussed-with: dillon
Sascha Wildner [Wed, 11 Oct 2017 14:55:23 +0000 (16:55 +0200)]
kernel: Simplify various redundant conditions.
Found-by: cppcheck
One was reported by dcb in <https://bugs.dragonflybsd.org/issues/3078>.
Matthew Dillon [Tue, 10 Oct 2017 22:38:08 +0000 (15:38 -0700)]
libc - Bring in s_ceill.c v1.2 from OpenBSD (2)
* Note, correction, v1.3 from OpenBSD, not v1.2
* Restore a cast that we need to compile with our higher WARNS level.
Reported-by: marino, xenu
Lubos Boucek [Mon, 2 Oct 2017 02:16:10 +0000 (02:16 +0000)]
kernel/mrsas: Simplify redundant conditions and remove never used variable
Reported-by: dcb
Matthew Dillon [Tue, 10 Oct 2017 02:17:32 +0000 (19:17 -0700)]
libc - Bring in s_ceill.c v1.2 from OpenBSD
fix a case where ceill() returns 1.0L: in the x86 extended precision format
the fraction part has no implicit bit.
Reported-by: xenu
Taken-from: OpenBSD
Sascha Wildner [Sun, 8 Oct 2017 07:47:58 +0000 (09:47 +0200)]
hammer2.8/pthread_attr_setaffinity_np.3: Fix mdoc issues.
Thomas Nikolajsen [Sat, 7 Oct 2017 14:14:52 +0000 (16:14 +0200)]
disklabel64.8: Add HAMMER2 fstype info.
Thomas Nikolajsen [Sat, 7 Oct 2017 14:09:36 +0000 (16:09 +0200)]
periodic.conf.5: Add hammer2 variables.
Add description for periodic HAMMER2 script variables: 161.clean_hammer2.
While here add HAMMER man pages to SEE ALSO section.
Thomas Nikolajsen [Sat, 7 Oct 2017 13:53:37 +0000 (15:53 +0200)]
etc/periodic/daily/161.clean-hammer2: Fix typo
The pfslist variable for HAMMER, not HAMMER2, was used.
This will typically have no effect, as pfslist is typically empty.
Thomas Nikolajsen [Sat, 7 Oct 2017 13:19:41 +0000 (15:19 +0200)]
periodic.conf: Fix typo in comment
Thomas Nikolajsen [Sat, 7 Oct 2017 13:13:11 +0000 (15:13 +0200)]
mount_hammer2(8): Add man page.
Matthew Dillon [Fri, 6 Oct 2017 05:59:40 +0000 (22:59 -0700)]
kernel - Refuse to swapoff under certain conditions
* Both tmpfs and vn can't handle swapoff's method of bringing pages
back in from the swap partition being decommissioned.
* Fixing this properly is fairly involved. The normal swapoff procedure
is to page swap into the related VM object, but tmpfs and vn use their
VM objects ONLY to track swap blocks and not for vm_page manipulation,
so that just won't work. In addition, the swap code may associate
a swap block with a VM object before issuing the write I/O to page
out the data, and the swapoff code's asynchronous pagein might cause
problems.
For now, just make sure that swapoff refuses to remove the partition
under these conditions, so it doesn't blow up tmpfs or vn.
Matthew Dillon [Fri, 6 Oct 2017 01:57:33 +0000 (18:57 -0700)]
tmpfs - Fix bug in call to vinitvmio()
* TMPFS_BLKMASK was being passed to vinitvmio() instead of
TMPFS_BLKSIZE. It is unclear if this caused any particular
issue other than an occasional console warning. Fixed.
Matthew Dillon [Thu, 5 Oct 2017 20:46:54 +0000 (13:46 -0700)]
kernel - Change index fields from unsigned to signed
* We use a signed trick for (j), fix the code so it actually works.
* The chipset field used to index (i) cannot exceed 1024 anyway.
Reported-by: lubos Bug #3020
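The commit does not include the loop in question. As a generic illustration of why a countdown index has to be signed: with `unsigned int j`, the test `j >= 0` is always true, so the loop never terminates and indexes out of bounds once `j` wraps. last_nonzero() is a hypothetical example, not the chipset code.

```c
#include <assert.h>

/* Index of the last non-zero element, or -1 if there is none. */
static int
last_nonzero(const int *v, int n)
{
	int j;	/* must be signed: the loop exits when j reaches -1 */

	assert(n >= 0);
	for (j = n - 1; j >= 0; --j) {
		if (v[j] != 0)
			break;
	}
	return j;
}
```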
Lubos Boucek [Sat, 23 Sep 2017 07:12:28 +0000 (07:12 +0000)]
Fix seg-faults on crypt(3) failure
Lubos Boucek [Fri, 22 Sep 2017 22:27:18 +0000 (22:27 +0000)]
Improve kdump.1 and ktrace.1
Aaron LI [Wed, 27 Sep 2017 10:24:05 +0000 (18:24 +0800)]
nologin(8): Sync with FreeBSD; Symlink as /usr/sbin/nologin
* Sync "nologin.c" with FreeBSD. Login attempts are logged into syslog.
* Create symlink "/usr/sbin/nologin" to "/sbin/nologin". FreeBSD
(and Linux) installs "nologin" at "/usr/sbin/nologin", and the users
created by DPorts/packages also use "/usr/sbin/nologin" (see
"/usr/dports/UIDs").
* Statically link "nologin" as done by FreeBSD.
Sascha Wildner [Thu, 5 Oct 2017 18:16:24 +0000 (20:16 +0200)]
camcontrol(8): Check scsiserial()'s error, too.
After some testing with devices that have no serial number, it looks
like this is safe to add nowadays.
Reported-by: dcb
Submitted-by: Lubos Boucek <bouceklubos@gmail.com>
Dragonfly-bug: <https://bugs.dragonflybsd.org/issues/3059>
Sepherosa Ziehau [Thu, 5 Oct 2017 06:06:11 +0000 (14:06 +0800)]
socket: Limit the number of accepted sockets that kevent reports.
By default it is limited to 32. It can be changed through:
sysctl kern.ipc.soavailconn=X
This change does _not_ affect userland using accept(2) in the following
way:
    for (;;) {
        s = accept(ls, NULL, NULL);
        if (s < 0 && errno == EAGAIN)
            break;
        /* Processing accepted socket. */
    }
This change only affects optimized userland using kevent.data to avoid
extra accept(2) syscall:
    for (i = 0; i < kevent.data; ++i) {
        s = accept(ls, NULL, NULL);
        /* Processing accepted socket. */
    }
The above logic is used by nginx. However, due to the cost of the
"Processing accepted socket" part, this kind of loop can increase
latency and make it less stable.
Comparison with 30K concurrent connections, 1 request per connection:
1K web object
         | performance  | lat-avg  | lat-stdev | lat-99%
---------+--------------+----------+-----------+----------
no limit | 210279.88tps | 59.19ms  | 4.60ms    | 69.02ms
---------+--------------+----------+-----------+----------
32 limit | 217599.01tps | 32.00ms  | 2.35ms    | 35.59ms
========
8K web object
         | performance  | lat-avg  | lat-stdev | lat-99%
---------+--------------+----------+-----------+----------
no limit | 180627.61tps | 70.53ms  | 4.95ms    | 80.61ms
---------+--------------+----------+-----------+----------
32 limit | 186324.41tps | 37.41ms  | 4.81ms    | 48.69ms
========
16K web object
         | performance  | lat-avg  | lat-stdev | lat-99%
---------+--------------+----------+-----------+----------
no limit | 138667.84tps | 95.93ms  | 14.90ms   | 135.47ms
---------+--------------+----------+-----------+----------
32 limit | 138778.11tps | 60.90ms  | 11.80ms   | 92.07ms
This change significantly reduces average and 99th-percentile latency,
and slightly improves performance.
Sascha Wildner [Wed, 4 Oct 2017 17:01:17 +0000 (19:01 +0200)]
Bring in vmx(4) (VMware virtual network driver, aka vmxnet3).
Some features are still disabled, namely LRO, TSO, VLAN_HWFILTER,
and MSI-X support. That being said, it works and seems stable.
Tested-by: swildner (VMware Player 7.1.4 build-3848939)
tuxillo (VMware ESXi 6.5.0 (Build 4887370))
Taken-from: FreeBSD (in turn based on OpenBSD's driver)
Matthew Dillon [Tue, 3 Oct 2017 01:42:34 +0000 (18:42 -0700)]
kernel - Fix GCC reordering problem with td_critcount
* Wrap all ++td->td_critcount and --td->td_critcount use cases
with an inline which executes cpu_ccfence() before and after,
to guarantee that GCC does not try to reorder the operation around
critical memory changes.
* This fixes a race in lockmgr() and possibly a few other places
too.
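A sketch of the wrapping described above. This assumes cpu_ccfence() is a pure compiler barrier (a GCC/Clang empty asm with a "memory" clobber). Such a barrier emits no instructions but prevents the compiler from moving memory accesses across the counter update. The _sketch names and struct are hypothetical.

```c
#include <assert.h>

/* Assumed definition: compiler-only barrier, no CPU fence instruction. */
#define cpu_ccfence_sketch()	__asm__ __volatile__("" : : : "memory")

struct thread_sketch {
	int td_critcount;
};

static inline void
crit_enter_raw_sketch(struct thread_sketch *td)
{
	cpu_ccfence_sketch();	/* earlier accesses stay before the ++ */
	++td->td_critcount;
	cpu_ccfence_sketch();	/* later accesses stay after the ++ */
}

static inline void
crit_exit_raw_sketch(struct thread_sketch *td)
{
	assert(td->td_critcount > 0);
	cpu_ccfence_sketch();
	--td->td_critcount;
	cpu_ccfence_sketch();
}
```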
Matthew Dillon [Sun, 1 Oct 2017 18:18:49 +0000 (11:18 -0700)]
kernel - Fix rare lockmgr() state transition (2)
* Fix two lock timeout cases for LK_EXCLUPGRADE and LK_UPGRADE, and
fix a bug in undo_upreq().
* A tsleep failure (such as the LK_TIMELOCK case via
vm_map_lock_read_to()) was not properly backing-out a LKC_UPREQ,
resulting in a situation where the lock becomes exclusively owned
by nobody and deadlocks against all-comers. Fix by properly
calling undo_upreq().
* Fix a bug in undo_upreq() itself. When undoing a granted UPREQ,
the lockholder must be set prior to releasing the now-granted
exclusive lock in order to avoid an assertion panic.
* While we are at it, replace a weird cmpset count,count with a
fetchadd(count, 0).
Tomohiro Kusumi [Sun, 1 Oct 2017 12:37:54 +0000 (15:37 +0300)]
sbin/hammer: Fix compile-time warning by some Linux distros
--
test_dupkey.c: In function 'main':
test_dupkey.c:54:1: warning: control reaches end of non-void function [-Wreturn-type]
}
Sascha Wildner [Sun, 1 Oct 2017 10:09:02 +0000 (12:09 +0200)]
Fix some minor issues in several manual pages.