Sascha Wildner [Sat, 23 Sep 2017 19:23:05 +0000 (21:23 +0200)]
Merge branch 'vendor/FILE'
Sascha Wildner [Sat, 23 Sep 2017 19:22:44 +0000 (21:22 +0200)]
Revert "Import file-5.22."
This reverts commit 89a9c80e537ed7238142c9af2cdc03401742a18a.
For some reason the 5.22 upgrade was not git-merged; it looks like it was
copied in instead. This caused merge conflicts with 5.32.
Sascha Wildner [Sat, 23 Sep 2017 19:13:08 +0000 (21:13 +0200)]
Import file-5.32.
See ChangeLog for details.
Imre Vadász [Sat, 23 Sep 2017 15:04:38 +0000 (17:04 +0200)]
microuptime.9 microtime.9: Fix documentation of the get* function versions.
The kern.timecounter sysctl tree doesn't exist nowadays; the getmicrotime(),
getnanotime(), getmicrouptime() and getnanouptime() functions always
return the less precise time.
Tomohiro Kusumi [Sat, 23 Sep 2017 11:27:20 +0000 (14:27 +0300)]
sbin/newfs_hammer2: Fix typo in newfs_hammer2(8)
of of
Tomohiro Kusumi [Fri, 22 Sep 2017 22:17:20 +0000 (01:17 +0300)]
usr.sbin/fstyp: Add initial HAMMER2 support
-l option and multiple/partial volumes are not supported yet.
Tomohiro Kusumi [Thu, 21 Sep 2017 16:06:37 +0000 (19:06 +0300)]
sys/vfs/hammer: Add typedef hammer_uuid_t
Add typedef for uuid_t for better portability,
similar to hammer_crc_t and other hammer_xxx_t.
(Some platforms have char[16] for uuid_t instead of struct value)
No functional changes.
Tomohiro Kusumi [Thu, 21 Sep 2017 16:06:16 +0000 (19:06 +0300)]
sbin/hammer: Add uuid.c
Add a simple wrapper over uuid functions for better portability,
similar to sys/vfs/hammer/hammer_crc.h (which helped implement
version 7 CRC).
No functional changes.
Imre Vadász [Sat, 23 Sep 2017 11:12:34 +0000 (13:12 +0200)]
psm: Drop bpsm%d device files. Instead support non-blocking reads on psm%d.
The /dev/psm%d vs. /dev/bpsm%d separation doesn't serve any clear purpose
nowadays. Userland can just use fcntl(2) to switch the fd to non-blocking
or blocking mode as needed.
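As a rough illustration of the fcntl(2) approach described above, the following Python sketch toggles O_NONBLOCK on a descriptor. A pipe stands in for the mouse device so the example runs anywhere; the helper names set_nonblocking() and try_read() are made up for this sketch and are not part of the driver or of any userland tool.

```python
import fcntl
import os


def set_nonblocking(fd, enable=True):
    """Switch fd between non-blocking and blocking mode via fcntl(2)."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    if enable:
        flags |= os.O_NONBLOCK
    else:
        flags &= ~os.O_NONBLOCK
    fcntl.fcntl(fd, fcntl.F_SETFL, flags)


def try_read(fd, size=64):
    """Return available bytes, or None if a non-blocking read would block."""
    try:
        return os.read(fd, size)
    except BlockingIOError:
        return None


# A pipe is used here as a stand-in for an opened /dev/psm%d descriptor.
r, w = os.pipe()
set_nonblocking(r)
print(try_read(r))      # nothing queued yet, so the read does not block
os.write(w, b"abc")
print(try_read(r))
```

The same descriptor can be switched back with set_nonblocking(fd, False), which is the point of the change: one device node, with the mode chosen per descriptor.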
Matthew Dillon [Fri, 22 Sep 2017 16:27:04 +0000 (09:27 -0700)]
hammer2 - Fix hammer2 snapshot operation
* Bring the hammer2 snapshot code up-to-date with the pfs-create
code.
* Fix initial inode number assignment for hammer2 snapshot code (it
was starting at 1024 which obviously won't work).
* Correct hammer2_vop_ncreate() error code - it was not converting
the hammer2 error code to a system error code.
Imre Vadász [Fri, 22 Sep 2017 15:46:45 +0000 (17:46 +0200)]
psm: Get rid of PSM_LEVEL_NATIVE, and the psmwrite method used with that.
* Nothing in userspace ever uses this feature. This apparently was intended
to allow implementing the complete mouse packet parsing in userspace.
Imre Vadász [Fri, 22 Sep 2017 12:21:52 +0000 (14:21 +0200)]
psm: Remove dead unused code: psmpoll(), enable_lordless(), is_a_mouse().
* The is_a_mouse() check method was already disabled in the original
FreeBSD commit, which added the psm(4) driver
(git b3062b5d6a9d9631bf6a1612e27107ea9aa6801d).
Sepherosa Ziehau [Fri, 22 Sep 2017 01:09:10 +0000 (09:09 +0800)]
inet/inet6: Randomize local port
Because it avoids a lock instruction, this also improves connect(2)
performance a bit.
Sepherosa Ziehau [Thu, 21 Sep 2017 23:35:21 +0000 (07:35 +0800)]
arc4random: Make arc4random context per-cpu.
Critical section is commented out, no consumers from ISRs/ithreads.
Matthew Dillon [Fri, 22 Sep 2017 05:01:03 +0000 (22:01 -0700)]
hammer2 - Fix panic related to the accounting for pfs-create
* Properly disconnect the inode created by pfs-create from the spmp so it
can be reassociated with the pmp.
* Do not allow the newly created inode to be emplaced on the spmp's sideq,
which will cause a duplicate inode structure to be created if the
pfs is then mounted.
Reported-by: Romick
Matthew Dillon [Fri, 22 Sep 2017 00:35:56 +0000 (17:35 -0700)]
hammer2 - Fix flush issues with unmounted PFSs and shutdown panic
* Fix flush and shutdown issues when unmounted PFS's are present.
These PFSs do not get flushed by the filesystem sync code because
they haven't been mounted, but may still contain modified or
referenced chains, as well as sideq'd inodes.
* Fix some other cleanup issues when unmounting. Clean out vchain.pmp
and fchain.pmp for the spmp during the unmount scan, which fixes a
hammer2 pfs_memory_*() panic.
Reported-by: yellowrabbit2010
Sepherosa Ziehau [Thu, 21 Sep 2017 07:04:18 +0000 (15:04 +0800)]
arc4random: Minor style changes.
Use uintX_t instead of u_intX_t.
Sepherosa Ziehau [Thu, 21 Sep 2017 05:46:41 +0000 (13:46 +0800)]
x86: Use kmem_alloc3 for cpu0's ipiq
Matthew Dillon [Thu, 21 Sep 2017 06:49:51 +0000 (23:49 -0700)]
hammer2 - performance pass
* Get rid of vfs.hammer2.cluster_write and stop using cluster_write()
for the block device I/O. This coupled into common unlock/lock
situations on chains which would acquire and retire the DIO, and
usually thus also the underlying buffer, many times before it
really needed to be committed.
This greatly reduces unnecessary writes to disk.
* Increase HAMMER2_FLUSH_DEPTH_LIMIT to 60. It was set to 10 for
debugging purposes. This created an O(N^2) overhead situation
in hammer2_flush(). 20,000 dirty inodes would translate to
30 million chain scans, resulting in cpu-bound stalls for long
periods of time.
Fixing this value reduces a 20,000 dirty inode flush to around
200,000 chain scans (100x faster).
* Use hammer2_chain_ref_hold() and hammer2_chain_drop_unhold()
to reduce the amount of buffer cache buffer cycling that occurs
during a flush, by retaining the DIO associated with a parent
chain across its unlock/recurse/relock sequence.
The number of buffers held locked is limited by the flush recursion
depth.
Sepherosa Ziehau [Wed, 20 Sep 2017 05:40:08 +0000 (13:40 +0800)]
ipfw: Factor out function to set up local variables.
Sepherosa Ziehau [Wed, 20 Sep 2017 00:21:58 +0000 (08:21 +0800)]
ipfw: Add ipfrag filter.
Unlike 'frag' filter, which only matches non-first IP fragments,
this filter matches all IP fragments.
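The difference between the two filters can be sketched in terms of the IPv4 fragment field. This is only an illustration of the semantics stated above, assuming the usual IP_MF and IP_OFFMASK bit layout of ip_off; the function names are hypothetical and the actual ipfw code may test the header differently.

```python
IP_MF = 0x2000       # "more fragments" flag in the ip_off field
IP_OFFMASK = 0x1FFF  # fragment offset, in 8-byte units


def matches_frag(ip_off):
    """'frag': only non-first fragments, i.e. a non-zero fragment offset."""
    return (ip_off & IP_OFFMASK) != 0


def matches_ipfrag(ip_off):
    """'ipfrag': any fragment, including the first one (MF set, offset 0)."""
    return (ip_off & (IP_MF | IP_OFFMASK)) != 0


first = IP_MF            # first fragment: MF set, offset 0
middle = IP_MF | 185     # middle fragment: MF set, offset non-zero
last = 370               # last fragment: MF clear, offset non-zero
whole = 0                # unfragmented packet

for name, off in [("first", first), ("middle", middle),
                  ("last", last), ("whole", whole)]:
    print(name, matches_frag(off), matches_ipfrag(off))
```

Only the first fragment is treated differently by the two predicates, which matches the description above.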
Sepherosa Ziehau [Wed, 20 Sep 2017 00:13:57 +0000 (08:13 +0800)]
ipfw: Remove unnecessary complexity
Matthew Dillon [Wed, 20 Sep 2017 00:31:03 +0000 (17:31 -0700)]
hammer2 - Remove debugging, adjust iocom
* Call hammer2_iocom_uninit() before we start cleaning up the hmp.
* Remove numerous debug messages.
Matthew Dillon [Wed, 20 Sep 2017 00:29:42 +0000 (17:29 -0700)]
kernel - Fix races in kern_dmsg.c (hammer2)
* Fix kdmsg races during shutdown which can assert or panic
* Fixes numerous hammer2 assertions or panics related to unmounting,
including mount failures due to missing labels.
Matthew Dillon [Tue, 19 Sep 2017 21:20:25 +0000 (14:20 -0700)]
kernel - Remove some kdmsg debugging
* Remove '<blah> thread terminating' kdmsg debug messages.
Matthew Dillon [Tue, 19 Sep 2017 21:13:57 +0000 (14:13 -0700)]
kernel - support dummy reallocblks in devfs
* cluster_write() calls VOP_REALLOCBLKS() in certain situations.
* Supply a dummy for devfs's .vop_reallocblks to avoid a panic.
Reported-by: tuxillo
François Tigeot [Tue, 19 Sep 2017 20:15:35 +0000 (22:15 +0200)]
gpt(8): Add HAMMER and HAMMER2 support
This makes it possible to create HAMMER or HAMMER2 partitions
with simple commands such as:
gpt add -t hammer2 /dev/device
Sascha Wildner [Tue, 19 Sep 2017 18:24:03 +0000 (20:24 +0200)]
boot/loader: Fix the 'crc' command to the intended code.
It doesn't change the result, but fixes a cppcheck warning.
Reported-by: dcb
Fix-submitted-by: Lubos Boucek
Dragonfly-bug: <https://bugs.dragonflybsd.org/issues/3060>
Tomohiro Kusumi [Sat, 16 Sep 2017 17:53:35 +0000 (20:53 +0300)]
sbin/hammer: Use uuid_compare(3) instead of bcmp(3)
Tomohiro Kusumi [Sat, 16 Sep 2017 16:02:55 +0000 (19:02 +0300)]
sbin/newfs_hammer: Use uuid_create(3) instead of uuidgen(2)
HAMMER userspace uses uuid_create(3) except for this one.
uuidgen(2) syscall isn't part of the specification.
Tomohiro Kusumi [Sat, 16 Sep 2017 12:23:36 +0000 (15:23 +0300)]
sbin/newfs_hammer: Use hwarnx() instead of hwarn()
This one should be with x.
Sascha Wildner [Tue, 19 Sep 2017 14:23:12 +0000 (16:23 +0200)]
hammer2(8): Fix printf.
Sepherosa Ziehau [Sat, 16 Sep 2017 06:17:52 +0000 (14:17 +0800)]
ipfw: Add defrag action.
IP fragment reassembly is almost required for a stateful firewall,
and will be needed for in-kernel NAT.
NOTE: Reassembled IP packets will be passed to the next rule for
further evaluation.
Matthew Dillon [Tue, 19 Sep 2017 09:07:51 +0000 (02:07 -0700)]
hammer2 - Fix corruption on sync (2)
* Looping on ONFLUSH to call RB_SCAN() can be endless due to deferrals.
Just do it twice to catch the indirect block maintenance issue.
Matthew Dillon [Tue, 19 Sep 2017 08:35:41 +0000 (01:35 -0700)]
hammer2 - Fix corruption on sync, fix excessive stall, optimize sideq
* Fix topology corruption which can occur due to the new
hammer2_chain_indirect_maintenance() code. This code can make
modifications to the parent from inside the flush code itself.
This can cause the flush code's RB_SCAN() recursion to miss
mandatory chains during the flush, resulting in some of the
topology missing from the synchronized flush.
This bug could cause corruption due to a crash, but not due to
a normal unmount, shutdown, or reboot, because that code always
runs extra sync() calls which correct the problem.
Fix the bug by detecting that UPDATE was again set in the parent
and running the RB_SCAN() again.
* Fix an excessive stall that can occur in the XOP code due to a
sleep/wakeup race. This race could cause a VOP operation to stall
for 60 seconds (it then hit some failsafe code and continued running
normally).
Fix this issue by removing hammer2_xop_head->check_counter and
integrating its flagging functions into run_mask. Increase run_mask
to 64 bits to accommodate the counter in the upper 32 bits.
* Optimize hammer2_inode_run_sideq(). Avoid running the sideq if the
number of detached inodes is not significant, except when flushing
in which case we always want to run the entire sideq.
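The run_mask layout mentioned above (flag bits in the low half, a counter in the upper 32 bits) can be sketched as follows. This is a model of the packing only, with hypothetical helper names; the presumed benefit is that flags and counter live in one word that can be updated with a single atomic operation, which is an interpretation and not something the commit message spells out.

```python
MASK64 = (1 << 64) - 1
FLAG_BITS = 32
FLAG_MASK = (1 << FLAG_BITS) - 1
COUNTER_INCR = 1 << FLAG_BITS   # adding this bumps the upper-32-bit counter


def set_flags(word, flags):
    """Set flag bits in the low 32 bits without touching the counter."""
    return (word | (flags & FLAG_MASK)) & MASK64


def bump_counter(word):
    """Increment the counter in the upper 32 bits; wraps at 2**32."""
    return (word + COUNTER_INCR) & MASK64


def split(word):
    """Return (counter, flags) from the packed 64-bit word."""
    return word >> FLAG_BITS, word & FLAG_MASK


w = set_flags(0, 0b101)
w = bump_counter(w)
print(split(w))
```

Because the counter occupies the top bits, an overflow simply wraps and never spills into the flag bits.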
Matthew Dillon [Tue, 19 Sep 2017 08:34:37 +0000 (01:34 -0700)]
hammer2 - augment freemap directive
* The hammer2 freemap debugging dump now sums up free blocks and
displays the results, allowing the actual free bytes to be
compared against df output.
Sascha Wildner [Mon, 18 Sep 2017 07:01:58 +0000 (09:01 +0200)]
Update the pciconf(8) database.
September 17, 2017 snapshot from http://pciids.sourceforge.net/
Matthew Dillon [Mon, 18 Sep 2017 02:36:14 +0000 (19:36 -0700)]
hammer2 - push missing file (cmd_destroy.c)
* Push missing file for the 'destroy' directive.
Sascha Wildner [Sun, 17 Sep 2017 16:07:54 +0000 (18:07 +0200)]
shm_open(3): Set the FD_CLOEXEC flag for the new fd, per POSIX.
See:
http://pubs.opengroup.org/onlinepubs/9699919799/functions/shm_open.html
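Whether a descriptor carries FD_CLOEXEC can be checked from userland with fcntl(2). The sketch below uses a pipe as a stand-in for a descriptor returned by shm_open(3), since the Python standard library has no public shm_open wrapper; has_cloexec() is a name made up for this example.

```python
import fcntl
import os


def has_cloexec(fd):
    """Return True if the close-on-exec flag is set on fd."""
    return bool(fcntl.fcntl(fd, fcntl.F_GETFD) & fcntl.FD_CLOEXEC)


# Python creates pipe descriptors non-inheritable, i.e. with FD_CLOEXEC set.
r, w = os.pipe()
print(has_cloexec(r))

# Clearing the flag makes the descriptor survive exec again.
os.set_inheritable(r, True)
print(has_cloexec(r))
```

A program that needs the shared memory descriptor to survive exec would have to clear the flag explicitly in the same way.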
Sepherosa Ziehau [Sat, 16 Sep 2017 23:47:54 +0000 (07:47 +0800)]
ip: Don't double check length.
Sepherosa Ziehau [Sat, 16 Sep 2017 22:40:50 +0000 (06:40 +0800)]
dummynet: ip_input expects ip_off/ip_len in network byte order.
Sepherosa Ziehau [Sat, 16 Sep 2017 22:27:11 +0000 (06:27 +0800)]
ipfw/ipfw3: Use INTWAIT|NULLOK for mtag allocation.
Sepherosa Ziehau [Sat, 16 Sep 2017 22:21:05 +0000 (06:21 +0800)]
dummynet: Don't deliver freed mbuf to callers.
Sepherosa Ziehau [Sat, 16 Sep 2017 22:02:30 +0000 (06:02 +0800)]
ip: Move mbuf length assertion into an earlier place.
Before the mbuf is cast to ip.
Matthew Dillon [Sun, 17 Sep 2017 04:50:11 +0000 (21:50 -0700)]
kernel - Order ipfw3 module before other ipfw3_* modules
* Order ipfw3 first, i.e. before any other ipfw3_* modules. This avoids
an assertion in the other modules during their init.
Reported-by: shassard (irc)
Matthew Dillon [Sun, 17 Sep 2017 01:17:16 +0000 (18:17 -0700)]
hammer2 - Add directive to destroy bad directory entries
* Add a directive and ioctl that is capable of destroying bad hammer2
directory entries. If topological corruption occurs due to a crash
(which theoretically shouldn't be possible with HAMMER2), this directive
allows you to destroy directory entries which do not have working inodes
and cannot otherwise be destroyed with 'rm'.
* Sysops should only use this directive when absolutely necessary.
Sepherosa Ziehau [Sat, 16 Sep 2017 06:45:42 +0000 (14:45 +0800)]
mtag: Use kmalloc flags, instead of just M_WAITOK or M_NOWAIT.
This allows more fine-grained mtag allocation control, e.g.
M_INTWAIT|M_NULLOK.
Sepherosa Ziehau [Sat, 16 Sep 2017 02:54:49 +0000 (10:54 +0800)]
netisr: Make dynamic netisr rollup register/unregister MPSAFE.
Sepherosa Ziehau [Sat, 16 Sep 2017 02:07:33 +0000 (10:07 +0800)]
netisr: Use kmem_alloc3 for netisr thread and netlastfunc.
Matthew Dillon [Fri, 15 Sep 2017 17:21:14 +0000 (10:21 -0700)]
hammer2 - Fix inode nlinks / directory-entry desynchronization on crash
* Hammer2 must flush dirty inodes, buffers, and chains when doing a sync,
before writing-out the volume header.
* Inodes are flushed in two stages... we flush inodes via vfsyncscan()
which runs through dirty vnodes, but inodes disassociated from vnodes
are recorded separately and must also be flushed. This is handled by
hammer2_inode_run_sideq().
* Fix an ordering bug where hammer2_inode_run_sideq() was being called
before vfsyncscan() instead of after. This could result in some dirty
inodes slipping through the cracks by getting retired by the system
after the hammer2_inode_run_sideq() call but before vfsyncscan() can
get to them.
Fixed by calling hammer2_inode_run_sideq() after vfsyncscan() instead
of before.
Note that vnodes cannot normally be dirtied during the serialized portion
of the flush because the flush serializes against modifying VOPs. So we
should not have a second source of desynchronization from that sort of
activity. In fact, strategy calls via shared R/W mmap()'s can execute
concurrent with a flush, but these will have no effect on inode size
or nlinks.
Sepherosa Ziehau [Fri, 15 Sep 2017 05:20:39 +0000 (13:20 +0800)]
tcp: Use primary hash for TCP ports.
This fixes the hash aliasing issue, which is caused by port space
division. Improves TCP connection establishment performance a bit.
Sepherosa Ziehau [Fri, 15 Sep 2017 04:32:41 +0000 (12:32 +0800)]
tcp/udp: Make sure hash size macro is powerof2
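A likely reason for requiring a power-of-two hash size is that buckets can then be selected with a mask instead of a modulo; that motivation is an assumption, not something stated in the commit. The sketch below shows the usual bit test and the masking it enables, with hypothetical function names.

```python
def powerof2(x):
    """True if x is a power of two (bit trick; zero is rejected here)."""
    return x > 0 and (x & (x - 1)) == 0


def bucket(hashval, size):
    """Pick a bucket by masking; only valid when size is a power of two."""
    if not powerof2(size):
        raise ValueError("hash size must be a power of two")
    return hashval & (size - 1)


print(powerof2(1024), powerof2(1000))
print(bucket(0x12345, 256) == 0x12345 % 256)
```

With a size that is not a power of two, the mask would skip some buckets entirely, which is why the check belongs at build time.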
Matthew Dillon [Thu, 14 Sep 2017 20:31:45 +0000 (13:31 -0700)]
hammer2 - Instrument error path for indirect block maintenance
* Instrument error path, fix a crash case when 'chain' cannot be modified
(usually due to a filesystem full error). Just complain instead.
* Add some temporary debugging for another possible issue under test.
Reported-by: arcade@b1t.name
Matthew Dillon [Thu, 14 Sep 2017 17:36:23 +0000 (10:36 -0700)]
kernel - Fix memory ordering race
* Fix a race in the mtx wait/wakeup code for situations where the
releasing thread hands lock ownership to the waiter. In this
situation the waiter can sometimes succeed without having to do
additional atomic ops. However, this also allows speculative reads
by the waiting cpu to precede the lock handover.
* Add an mfence to fix this problem. Add a few cpu_sfence()s (which
are basically NOPs on Intel) to clarify other bits of code too.
Matthew Dillon [Wed, 13 Sep 2017 23:07:27 +0000 (16:07 -0700)]
hammer2 - Remove dead code, clarify comment
* Remove some dead code.
* Clarify the flags passed in to hammer2_chain_getparent() and
hammer2_chain_repparent().
Matthew Dillon [Wed, 13 Sep 2017 23:03:19 +0000 (16:03 -0700)]
kernel - Fix shared lock bug in kern_mutex.c
* When the last exclusive lock is unlocked or when downgrading an exclusive
lock to a shared lock, pending shared links must be processed. The
last 'lock count' is transferred to the first link, thus preventing the
lock from getting ripped out from under the transfer code.
* However, when multiple shared links are pending, it is possible for the
first recipient link to wakeup and release its lock before the unlock/drop
code is able to finish its scan, which places the lock in an unexpected
state. The lock count was only being incremented within the link scan
loop, once at a time.
* Fix the problem by applying a lock count representing ALL pending
shared lock links after the first one before processing the first link.
This ensures that the lock remains in a shared-lock state while the loop
is running.
* This fixes a race that can occur in HAMMER2.
Antonio Huete Jimenez [Wed, 13 Sep 2017 20:32:12 +0000 (22:32 +0200)]
installer - Avoid endless loop for UEFI installations
- During a UEFI installation, if the dialog to write changes to the disk
was cancelled after selecting the disk, there was no way to get back
to the previous screen.
- Fix it by going to the select disk state.
Sascha Wildner [Wed, 13 Sep 2017 20:23:43 +0000 (22:23 +0200)]
kernel/hammer2: Rename DEBUG to H2_ZLIB_DEBUG in the zlib code.
This unbreaks LINT64, to which hammer2 was added in cf4ab83ee58092c57
without actually having tested it.
There is a DEBUG kernel option that this conflicts with. Also, most of
this code is userland code, not kernel code.
H2's zlib really needs to be cleaned up better.
Matthew Dillon [Wed, 13 Sep 2017 17:33:27 +0000 (10:33 -0700)]
hammer2 - Allow simple 'mount @label <target>' shortcut for snapshots
* If any hammer2 PFS on a device is already mounted, all other PFS's on
the device can be mounted simply by specifying their label. There is
no need to specify the device. e.g.:
# hammer2 pfs-list /build
Type        ClusterId (pfs_clid)                 Label
MASTER      726d8ab1-9839-11e7-98a7-6145cb9ac050 ROOT
MASTER      726d8a72-9839-11e7-98a7-6145cb9ac050 LOCAL
SNAPSHOT    eb19b5fa-98a7-11e7-98a7-6145cb9ac050 ROOT.20170913.102102
#
# mount @ROOT.20170913.102102 /mnt
Sascha Wildner [Wed, 13 Sep 2017 12:10:57 +0000 (14:10 +0200)]
ipfw: WARNS=6 isn't necessary, it's in the parent Makefile.inc.
Sepherosa Ziehau [Wed, 13 Sep 2017 01:41:43 +0000 (09:41 +0800)]
ipfw: Raise WARNS to 6
Sepherosa Ziehau [Wed, 13 Sep 2017 01:28:18 +0000 (09:28 +0800)]
ipfw: Raise WARNS to 3
Sepherosa Ziehau [Wed, 13 Sep 2017 01:07:45 +0000 (09:07 +0800)]
sshlockout: Add ipfw(8) table support.
Matthew Dillon [Wed, 13 Sep 2017 02:50:47 +0000 (19:50 -0700)]
kernel - Fix sys% time reporting
* Fix system time reporting in systat -vm 1, systat -pv 1, and process
stats.
* Basically the issue is that when coincident systimer interrupts occur,
such as when the statclock, hardclock, and schedclock all fire at the
same time, the statclock must execute first in order to properly detect
the state the current thread is in. If it does not, it may see a lwkt
thread scheduled by one of the other systimers and improperly dock the
current thread as being in 'system' time.
* The various systimer interrupts could wind up out of phase and
desynchronized due to the tsc_frequency not being perfectly divisible
by the requested frequencies. In addition, various timers could queue
in an undesirable order due to being different integral frequencies of
each other.
* Refactor the systimer API a bit, adding new functions which guarantee
synchronization for nominally requested frequencies and which guarantee
ordering for coincident systimer events (which statclock uses). This
should completely solve the problem.
* Also, if the RQF_INTPEND flag is set, count as interrupt time. This
will give us a slightly more accurate understanding of interrupt overhead
(alternatively we could do this test for just the case where curthread is
the idlethread, which might be more accurate).
Matthew Dillon [Tue, 12 Sep 2017 23:42:08 +0000 (16:42 -0700)]
kernel - Change legacy MBR partition type from 0xA5 to 0x6C
* Should have done this years ago but finally change the legacy MBR
partition type DragonFlyBSD uses from 0xA5 (which was shared with
FreeBSD), to something different 0x6C.
* Makes it less confusing for Grub.
* Does not change EFI boot, which uses 16-byte UUIDs (we already have
our own) and does not use 8-bit partition ids.
* Boot code and kernel now recognize both 0xA5 and 0x6C. Existing users
do *NOT* need to reinstall their boot code.
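Recognizing both type bytes amounts to a simple check on each legacy MBR entry. The sketch below assumes the standard MBR layout (four 16-byte entries at offset 446, type byte at offset 4 of each entry, 0x55 0xAA signature at offset 510); the function name and the returned tuple shape are hypothetical and do not mirror the boot code or kernel.

```python
MBR_TABLE_OFFSET = 446
MBR_ENTRY_SIZE = 16
MBR_SIGNATURE = b"\x55\xaa"
DFLY_TYPES = (0xA5, 0x6C)   # old shared type and the new DragonFly type


def dragonfly_slices(sector):
    """Return (index, type) for MBR entries whose type is 0xA5 or 0x6C."""
    if len(sector) < 512 or sector[510:512] != MBR_SIGNATURE:
        raise ValueError("not a valid MBR sector")
    found = []
    for i in range(4):
        off = MBR_TABLE_OFFSET + i * MBR_ENTRY_SIZE
        ptype = sector[off + 4]
        if ptype in DFLY_TYPES:
            found.append((i, ptype))
    return found


# Build a fake sector: entry 0 uses 0x6C, entry 1 uses 0xA5, entry 2 is Linux.
sector = bytearray(512)
sector[510:512] = MBR_SIGNATURE
sector[MBR_TABLE_OFFSET + 4] = 0x6C
sector[MBR_TABLE_OFFSET + MBR_ENTRY_SIZE + 4] = 0xA5
sector[MBR_TABLE_OFFSET + 2 * MBR_ENTRY_SIZE + 4] = 0x83
print(dragonfly_slices(bytes(sector)))
```

Accepting both values is what lets existing installations keep booting without rewriting their partition tables.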
Sascha Wildner [Tue, 12 Sep 2017 19:09:45 +0000 (21:09 +0200)]
mount_udf.8: Correct typo in arguments.
Sepherosa Ziehau [Tue, 12 Sep 2017 07:22:19 +0000 (15:22 +0800)]
sshlockout: Style changes; no functional changes.
Sepherosa Ziehau [Thu, 7 Sep 2017 00:56:57 +0000 (08:56 +0800)]
ipfw: Add per-cpu table support.
This is intended to improve performance and reduce latency for
matching discrete addresses. Table itself is radix tree.
For example, nginx, 1KB web object, 30K concurrent connections,
1 request/connection. ipfw is running on the server side.
Comparison between no-match rules and no-match table entries:
                   | perf-avg  | lat-avg | lat-stdev | lat-99%
                   |   (tps)   |  (ms)   |   (ms)    |  (ms)
-------------------+-----------+---------+-----------+---------
100 nomatch rules  | 184752.65 |   67.50 |      5.69 |   79.11
-------------------+-----------+---------+-----------+---------
100 nomatch tblent | 200754.53 |   61.18 |      5.72 |   73.10
-------------------+-----------+---------+-----------+---------
1K nomatch rules   |  90836.43 |  144.72 |     12.28 |  168.97
-------------------+-----------+---------+-----------+---------
1K nomatch tblent  | 199750.35 |   61.54 |      5.73 |   72.90
-------------------+-----------+---------+-----------+---------
10K nomatch rules  |  14836.69 |  864.46 |    157.49 | 1110.00
-------------------+-----------+---------+-----------+---------
10K nomatch tblent | 198412.93 |   62.17 |      5.66 |   73.08
Comparison between number of no-match table entries:
                   | perf-avg  | lat-avg | lat-stdev | lat-99%
                   |   (tps)   |  (ms)   |   (ms)    |  (ms)
-------------------+-----------+---------+-----------+---------
no-ipfw            | 210658.80 |   58.01 |      5.20 |   68.73
-------------------+-----------+---------+-----------+---------
100 nomatch tblent | 200754.53 |   61.18 |      5.72 |   73.10
-------------------+-----------+---------+-----------+---------
1K nomatch tblent  | 199750.35 |   61.54 |      5.73 |   72.90
-------------------+-----------+---------+-----------+---------
10K nomatch tblent | 198412.93 |   62.17 |      5.66 |   73.08
It scales pretty well with the number of no-match table entries.
Even when compared with the no-ipfw case, the performance and latency
impact of ipfw after this commit is pretty small.
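The relative cost can be read off the figures above with a small calculation. The sketch below only reuses the perf-avg numbers from the two tables; drop_pct() is a helper invented for this example.

```python
NO_IPFW_TPS = 210658.80

# perf-avg (tps) values copied from the tables in this commit message.
TABLE_TPS = {100: 200754.53, 1000: 199750.35, 10000: 198412.93}
RULES_TPS = {100: 184752.65, 1000: 90836.43, 10000: 14836.69}


def drop_pct(tps, baseline=NO_IPFW_TPS):
    """Throughput lost relative to the baseline, in percent."""
    return (baseline - tps) / baseline * 100.0


for n in sorted(TABLE_TPS):
    print(f"{n:>6} entries: table -{drop_pct(TABLE_TPS[n]):.1f}%, "
          f"rules -{drop_pct(RULES_TPS[n]):.1f}%")
```

By this measure the table cases stay within roughly 5 to 6 percent of the no-ipfw baseline at every size, while 10K no-match rules lose over 90 percent of the throughput.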
Matthew Dillon [Tue, 12 Sep 2017 04:26:06 +0000 (21:26 -0700)]
hammer2 - Add daily periodic for hammer2 cleanup
* Add a daily periodic for hammer2 cleanups
Matthew Dillon [Tue, 12 Sep 2017 00:57:56 +0000 (17:57 -0700)]
installer - Add hammer2 support to the installer
* hammer2 can now be selected as a filesystem in the installer.
* Note that we still force /boot to use UFS. The boot loader *CAN*
access a hammer2 /boot, but the small size of the filesystem makes
it too easy to fill up when doing installkernel or installworld.
* Also fix a minor bug in the installer. When issuing a 'dumpon device'
be sure to first issue a 'dumpon off' to avoid dumpon complaints about
a dump device already being specified.
Matthew Dillon [Tue, 12 Sep 2017 00:55:22 +0000 (17:55 -0700)]
hammer2 - Include by default in kernel build
* Include hammer2 by default in the kernel build
Matthew Dillon [Tue, 12 Sep 2017 00:53:41 +0000 (17:53 -0700)]
hammer2 - Add 'cleanup' command, retool h2 build for conf/files inclusion
* Add a preliminary 'hammer2 cleanup' command that works similar to
hammer1.
* Retool xxhash and zlib prefixing to avoid kernel conflicts and to
allow hammer2 to be included in conf/files.
Matthew Dillon [Mon, 11 Sep 2017 21:46:31 +0000 (14:46 -0700)]
hammer2 - Limit bulkfree cpu and SSD I/O
* Limit resource utilization when running bulkfree. The default is 5000
tps (meta-data blocks per second) and can be changed via the
vfs.hammer2.bulkfree_tps sysctl.
* Designed primarily to limit cpu utilization when meta-data is cached,
and to limit SSD utilization otherwise. This feature generally cannot
be used to limit HDD utilization because it cannot currently distinguish
between cached and uncached I/O. Setting a number low enough to accommodate
an HDD will cause bulkfree to take way too long to run.
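The trade-off behind the tps limit is plain arithmetic: the limit puts a floor on how long a pass takes. The sketch below uses the 5000 tps default named above; the block count and the lower HDD-friendly value are made-up inputs for illustration only.

```python
DEFAULT_BULKFREE_TPS = 5000  # default named in the commit message


def min_runtime_seconds(meta_blocks, tps=DEFAULT_BULKFREE_TPS):
    """Lower bound on a bulkfree pass limited to `tps` meta-data blocks/sec."""
    if tps <= 0:
        raise ValueError("tps must be positive")
    return meta_blocks / tps


blocks = 10_000_000                      # hypothetical meta-data block count
print(min_runtime_seconds(blocks))       # at the default limit
print(min_runtime_seconds(blocks, 100))  # at a hypothetical HDD-friendly limit
```

Dropping the limit by a factor of 50 stretches the minimum run time by the same factor, which is why a value tuned for a hard drive is impractical.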
Matthew Dillon [Mon, 11 Sep 2017 07:11:31 +0000 (00:11 -0700)]
kernel - Fix callout_stop/callout_reset rearm race
* If a callout_reset() occurs while a callout_stop() is running, the
callout_stop() can wind up blocking forever. Change the conditional
to break out of the processing loop to simply wait for the IPI to finish
executing, and if the callout is still armed due to a callout_reset()
the callout_stop() simply loops back to the top and retries the stop.
* Can be reproduced when itimers are used heavily (typically ghc processes
that run during a bulk synth run).
* Race tested and verified to occur, fix appears to solve the problem.
Matthew Dillon [Mon, 11 Sep 2017 07:08:24 +0000 (00:08 -0700)]
hammer2 - Fix inode count statistics
* Fix two inode count statistics bugs related to recent work.
(1) We have to zero out the stats when deleting an indirect block in
the indirect block collapse case because the underlying children
will add them back in when they are moved into the parent. This is
because we are using the 'trick' of removing the indirect block
before moving its children to avoid significant code complexity.
(2) inode_count accounting was improperly counting certain DATA records
as inodes.
Imre Vadász [Sun, 10 Sep 2017 22:21:24 +0000 (00:21 +0200)]
if_vtnet - Fix potential vtnet_set_hwaddr call before virtqueues are set up.
* When VIRTIO_NET_F_MAC isn't negotiated (i.e. the host doesn't give us
a macaddress), we generate a random mac address. The vtnet_set_hwaddr()
call to set this mac address, would have tried to use the control
virtqueue too early, where it isn't yet allocated.
* To fix this case, do the vtnet_set_hwaddr() call for this a bit later,
after the virtqueues have been set up. (We are already disabling
promiscuous mode there, so we know the controlq is working there)
Antonio Huete Jimenez [Sun, 10 Sep 2017 21:56:30 +0000 (23:56 +0200)]
vkernel - Remove 'bootdev' related sysctl.
- This fixes VKERNEL64 build.
Matthew Dillon [Sun, 10 Sep 2017 17:03:09 +0000 (10:03 -0700)]
hammer2 - Add indirect node collapse code
* Move the contents of an indirect node into its parent when either
becomes empty enough to accommodate the move. This is done during
the flush and incurs no extra overhead or I/O.
* This is not a rebalancing algorithm but it does do a pretty good
job reducing degenerate indirect nodes in the topology.
* Note that I am not using bref->leaf_count yet. This will be used
in a later rebalancing algorithm.
* Fix minor bug in hammer2_chain_create_indirect() where a chain's
bref was being tested without holding a lock on the chain.
* Remove misc debugging that we no longer need.
Matthew Dillon [Sat, 9 Sep 2017 22:59:25 +0000 (15:59 -0700)]
hammer2 - Track leaf counts for topological collapse
* Track leaf counts through indirect blocks. This is a prereq to
being able to efficiently collapse indirect nodes that have become
too empty to be useful.
* Leaf count is capped at 65535. Attempting to decrement the count will
flag the chain to recount (in a later commit).
* Because this count will be used to determine when a collapse is possible,
we do not track leaf counts through inodes. That is, an inode counts as
a leaf.
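A capped counter of this kind can be modelled as a saturating 16-bit value. The sketch below assumes that the recount flag is raised when a decrement hits a saturated count (because the true value is then unknown); the commit message does not spell out that policy, and the class and attribute names are hypothetical.

```python
LEAF_COUNT_MAX = 65535  # cap stated in the commit message


class LeafCount:
    """Saturating leaf counter with a 'needs recount' flag (illustrative)."""

    def __init__(self):
        self.count = 0
        self.needs_recount = False

    def add(self, n=1):
        # Saturate at the cap instead of overflowing.
        self.count = min(self.count + n, LEAF_COUNT_MAX)

    def remove(self, n=1):
        if self.count == LEAF_COUNT_MAX:
            # The true count may exceed the cap, so a decrement cannot be
            # trusted; request a recount instead (assumed policy).
            self.needs_recount = True
        else:
            self.count = max(self.count - n, 0)


c = LeafCount()
c.add(70000)
c.remove()
print(c.count, c.needs_recount)
```

Counts below the cap behave like an ordinary counter; only the saturated case needs the slower recount path.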
Matthew Dillon [Sat, 9 Sep 2017 17:22:53 +0000 (10:22 -0700)]
debug - Remove PG_ZERO
* PG_ZERO no longer exists, remove from debugging utilities.
Matthew Dillon [Sat, 9 Sep 2017 17:20:22 +0000 (10:20 -0700)]
kernel - Remove kernel 'bootdev' variable
* The 'bootdev' variable is no longer used. It used to default-out the
root mount to UFS using boot device info passed-in from the boot code,
but that was disconnected long ago and this code no longer serves
any purpose.
* We have depended on vfs.root.mountfrom in /boot/loader.conf to tell
the kernel where the root mount is for a long time now.
Matthew Dillon [Sat, 9 Sep 2017 17:16:36 +0000 (10:16 -0700)]
hammer2 - API breadnx / cluster_read API, bulkfree adj
* API adjustments for breadnx() and cluster_readx() calls. Properly
separate data and meta-data flagging for better swapcache operation.
* Separate cluster read parameters for data and meta-data. Default
data to 4 (+3 read-ahead) and meta-data to 1 (no read-ahead).
Matthew Dillon [Sat, 9 Sep 2017 17:15:24 +0000 (10:15 -0700)]
hammer - Adjust hammer to new breadnx / cluster_readx API
* API adjustments for breadnx() and cluster_readx() calls
Matthew Dillon [Sat, 9 Sep 2017 17:12:35 +0000 (10:12 -0700)]
kernel - Expand breadnx/breadcb/cluster_readx/cluster_readcb API
* Pass B_NOTMETA flagging into breadnx(), breadcb(), cluster_readx(),
and cluster_readcb().
Solve issues where data can wind up not being tagged B_NOTMETA
in read-ahead and clustered buffers.
* Adjust the standard bread(), breadn(), and cluster_read() inlines
to pass B_NOTMETA.
Matthew Dillon [Fri, 8 Sep 2017 18:00:40 +0000 (11:00 -0700)]
hammer2 - Improve swapcache support
* Properly set B_NOTMETA on media buffers, which allows swapcache to
distinguish between data and meta-data buffers.
This ensures that a user desiring to only swapcache meta-data does
not blow all the SSD swap on file data, particularly if there is an
excessive amount of file data.
* Greatly improves bulkfree performance on hard drives when SSD swapcache
is enabled for meta-data only.
Matthew Dillon [Fri, 8 Sep 2017 15:47:47 +0000 (08:47 -0700)]
hammer2 - Do not use transaction for bulkfree pass
* Remove the remaining transaction layer in bulkfree passes. Do not
update the modify_tid. Generally speaking, what this code does is
allow bulkfree to operate independent of normal flushes, which means
it no longer stalls flushes (or vice versa).
Theoretically it should be ok for a normal flush to 'catch' bulkfree
modifications in the middle of their work.
Matthew Dillon [Fri, 8 Sep 2017 15:44:13 +0000 (08:44 -0700)]
kernel - vfsync() use synchronous bwrite() in low-memory situations
* For now, make vfsync() use a synchronous bwrite() in low-memory
situations and do not call vm_wait_nominal(). This could wind up
being a chicken-or-egg issue unfortunately.
* Addresses issues where the pageout daemon gets indirectly deadlocked
when other unrelated kernel threads (aka H2 support threads) are
flushing buffers.
Matthew Dillon [Fri, 8 Sep 2017 15:42:06 +0000 (08:42 -0700)]
kernel - Adjust emergency pager, add D_NOEMERGPGR
* Adjust emergency pager and pager thread tests a little. Allow the
emergency pager to also page to VCHR devices as long as D_NOEMERGPGR
is not flagged.
* Add the D_NOEMERGPGR flag and apply to "vn" and "mfs" block devices.
Matthew Dillon [Thu, 7 Sep 2017 22:42:49 +0000 (15:42 -0700)]
kernel - Adjust emergency pager
* Adjust the emergency pager to always try to move some pages from the
active queue to the inactive queue, just in case the inactive queue
has plenty of pages (and thus does not trigger the active scan),
but none of those pages are anonymous.
* Should fix a rare low-memory deadlock situation.
Matthew Dillon [Thu, 7 Sep 2017 22:35:54 +0000 (15:35 -0700)]
kernel - Fix panic() on AMD
* The cpu_smp_stopped() function assumed that cpu_mwait_hints[] exists,
but it might not (could be NULL).
* Conditionalize to avoid insta-panicking all cpus all at the same time
when one cpu tries to stop the others during a panic(). This fixes
panics on AMD cpus.
Matthew Dillon [Thu, 7 Sep 2017 03:47:23 +0000 (20:47 -0700)]
hammer2 - Allow rm and rmdir to ignore free space reserve
* Allow rm and rmdir to ignore the free space reserve. These operations
do eat more disk space, though if you are doing a rm -rf it's actually
fairly minimal because H2 is able to optimize it heavily.
This allows the user to free up disk space by removing files. The user
must still run hammer2 bulkfree (possibly twice) to actually free the
space up.
Matthew Dillon [Thu, 7 Sep 2017 02:56:24 +0000 (19:56 -0700)]
hammer2 - Implement error processing and free reserve enforcement
* newfs_hammer2 calculates the correct amount of reserved space. We
have to reserve 4MB per 1GB, not 4MB per 2GB, due to a snafu. This
is still only 0.4% of the storage.
* Flesh out HAMMER2_ERROR_* codes and make most hammer2 functions return
a proper error code.
* Add error handling to nearly all code that can dirty a chain, in
particular to handle ENOSPC issues. Any dirty buffers that cannot be
flushed will incur a write error (which in DragonFly typically causes
the buffer to be retried later). Any dirty chain that cannot be
flushed will remain in the topology and can be completed in a later
flush if space has been freed up.
We try to avoid allowing the filesystem to get into this situation in
the first place, but if it does, it should be possible to flush these
asynchronous modifying chains and buffers once space is freed up via
bulkfree.
* Relax class match requirements in the freemap allocator when the freemap
gets close to full. This will allow e.g. inodes to be allocated out of
DATA bitmaps and vice versa, and so forth. This fixes edge conditions
where there is enough free space available but it has all been earmarked
for the wrong data class.
* Try to fix a bug in live_count tracking when destroying an indirect
block chain or inode chain that has not yet been blockmapped due to
a drop. This situation only occurs when chains cannot be flushed due
to I/O errors or disk full conditions, and are then later destroyed
(e.g. such as when the governing file is removed).
This should fix a live_count assertion that can occur under these
circumstances. See hammer2_chain_lastdrop().
* Enforce the free reserve requirement for all modifying VOP calls.
Root users can nominally fill the file system to 97.5%, non-root
users to 95%. At 90%, write()s will enforce bawrite() versus bdwrite()
to try to avoid buffer cache flushes from actually running the
filesystem out of space.
This is needed because we do not actually know how much disk space is
going to be needed at write() time. Deduplication and compression
occur later, at buffer-flush time.
* Do NOT flush the volume header when a vfs sync is unable to completely
flush a device due to errors. This ensures that the underlying media
does not become corrupt.
* Fix an issue where bref.check.freemap.bigmask was not being properly
reset to -1 when bulkfree is able to free an element. This bug
prevented the allocator from recognizing that free space was available
in that bitmap.
* Modify bulkfree operation to use the live topology when flushing and
snapshot operations fail due to errors, allowing bulkfree to run.
* Nominal bulkfree operations now run on the snapshot without a
transaction (more testing is needed). This theoretically should allow
bulkfree to run concurrent with just about any operation including
flushes.
* Add a freespace tracking heuristic to reduce the overhead that modifying
VOP calls incur in checking the free reserve requirement.
* hammer2 show dumps additional info for freemap nodes.
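The reserve figures and fill thresholds quoted in this commit can be put into a small model. This is a sketch under stated assumptions: rounding up per started gigabyte and the exact threshold comparisons are guesses, and reserved_bytes() and write_policy() are names invented here, not hammer2 functions.

```python
MB = 1 << 20
GB = 1 << 30

RESERVE_PER_GB = 4 * MB          # "4MB per 1GB" from the commit message


def reserved_bytes(volume_bytes):
    """Reserved space, counting each started 1GB of volume (assumed rounding)."""
    gigs = -(-volume_bytes // GB)   # ceiling division
    return gigs * RESERVE_PER_GB


def write_policy(used_fraction, is_root):
    """Classify a modifying operation by how full the filesystem is."""
    limit = 0.975 if is_root else 0.95
    if used_fraction >= limit:
        return "ENOSPC"
    if used_fraction >= 0.90:
        return "bawrite"   # start the asynchronous write right away
    return "bdwrite"       # normal delayed write


print(reserved_bytes(500 * GB) // MB, "MB reserved on a 500GB volume")
print(f"{RESERVE_PER_GB / GB:.2%} of the storage")
print(write_policy(0.96, is_root=True), write_policy(0.96, is_root=False))
```

The second line of output corresponds to the "0.4%" figure quoted above, and the last line shows the gap between the root and non-root limits.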
Sepherosa Ziehau [Wed, 6 Sep 2017 05:42:15 +0000 (13:42 +0800)]
ipfw: Stringent assertions.
Sepherosa Ziehau [Wed, 6 Sep 2017 05:22:51 +0000 (13:22 +0800)]
ipfw: Utilize netisr_domsg_global, which is more expressive.
While I'm here, add assertions.
Sepherosa Ziehau [Wed, 6 Sep 2017 05:11:21 +0000 (13:11 +0800)]
ipfw: Use netisr wrappers
Sepherosa Ziehau [Wed, 6 Sep 2017 05:10:54 +0000 (13:10 +0800)]
netisr: Add wrapper for lwkt_dropmsg()
Sepherosa Ziehau [Wed, 6 Sep 2017 05:02:23 +0000 (13:02 +0800)]
ipfw: It can only be configured in netisr0; make it explicit.
Imre Vadász [Sun, 3 Sep 2017 10:49:20 +0000 (12:49 +0200)]
pc64: Setup TSC_AUX register with each cpu's id, when rdtscp is available.
* This allows userland to utilize the rdtscp instruction, to associate
timestamps to physical cores, and detect migration between 2 measurements.
Imre Vadász [Tue, 5 Sep 2017 20:06:34 +0000 (22:06 +0200)]
pc64: An mfence is supposed to suffice for TSC_DEADLINE vs. xAPIC ordering.
* I accidentally used an outdated version of the Intel SDM documentation,
which still described that complicated serialization method, but the newest
documentation states that an mfence should be used for serializing the
xAPIC write vs. the wrmsr to the TSC_DEADLINE register.
Sepherosa Ziehau [Tue, 5 Sep 2017 05:58:43 +0000 (13:58 +0800)]
ipfw3: Simplify sockopt.