kernel - Add per-process capability-based restrictions * This new system allows userland to set capability restrictions that turn off numerous kernel features and root accesses. These restrictions are inherited by sub-processes recursively. Once set, restrictions cannot be removed. Basic restrictions that mimic an unadorned jail can be enabled without creating a jail, but generally speaking real security also requires creating a chrooted filesystem topology, and a jail is still needed to really segregate processes from each other. If you do so, however, you can (for example) disable mount/umount and most global root-only features. * Add new system calls and a manual page for syscap_get(2) and syscap_set(2) * Add sys/caps.h * Add the "setcaps" userland utility and manual page. * Remove priv.9 and the priv_check infrastructure, replacing it with a newly designed caps infrastructure. * The intention is to add path restriction lists and similar features to improve jail-less security in the near future, and to optimize the priv_check code.
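The two properties described above (restrictions are inherited recursively and can never be removed) can be illustrated with a small user-space model. This is only a sketch: the `toy_*` names and the capability bits are hypothetical and are not the identifiers defined in sys/caps.h, and it does not call syscap_get(2) or syscap_set(2).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical capability bits; the real names live in sys/caps.h. */
enum {
    TOY_CAP_MOUNT = 1u << 0,    /* mount/umount */
    TOY_CAP_ROOT  = 1u << 1     /* some global root-only feature */
};

/* Toy process: a set bit means the feature is turned off. */
struct toy_proc {
    uint32_t restricted;
};

/* Restrictions are one-way: bits are OR'd in and never cleared. */
static void toy_restrict(struct toy_proc *p, uint32_t caps)
{
    p->restricted |= caps;
}

/* A child starts with a copy of its parent's restrictions. */
static struct toy_proc toy_fork(const struct toy_proc *parent)
{
    struct toy_proc child = { parent->restricted };
    return child;
}

/* A feature is usable only while its bit is still clear. */
static bool toy_allowed(const struct toy_proc *p, uint32_t cap)
{
    return (p->restricted & cap) == 0;
}
```

In this model there is deliberately no function that clears a bit, so a restriction set on a process follows every descendant created through `toy_fork()`.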
kernel - Try to fix tcp ISN generator * The ISN generator couldn't stand the test of time. Very fast port reuse can catch the destination host inpcb still in a TIME_WAIT state and a bad ISN results in the destination ignoring the new SYN. The old ISN generator could wind up returning the same sequence number for fast reconnects occurring within the same tick. Reimplement the ISN generator and also make it SMP friendly and cache friendly. Because... it really wasn't before. Also attempt to modernize the monotonic sequence space algorithm, reseed the secret every 20 seconds, and make the reseeding non-disruptive to sequence space monotonicity. * Change the TH_SYN + TIME_WAIT state handling. Generally speaking it is intended that a new SYN when the inpcb is in TIME_WAIT recycle the port/address pair and allow the new connection. The sequence space checks for the TH_SYN may have been too strict. Change the check to allow the recycling of the port/address pair as long as the SYN has a different sequence number than the previous connection. I believe this is relatively safe since the recycling can only happen if the socket is already in a TIME_WAIT state, but consider the code still under test.
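The same-tick problem and the relaxed TIME_WAIT check can be sketched as follows. This is not the kernel's generator: the mixer is a non-cryptographic stand-in, the `toy_*` names and step sizes are assumptions, and secret reseeding is not modeled. The only point is that adding a per-call bump to a keyed hash of the 4-tuple keeps two ISNs handed out in the same tick distinct and increasing.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical generator state. */
struct toy_isn_state {
    uint32_t secret;    /* stand-in for the periodically reseeded secret */
    uint32_t bump;      /* advances on every ISN handed out */
};

/* Toy integer mixer (NOT cryptographic; a real generator uses a keyed hash). */
static uint32_t toy_mix(uint32_t x)
{
    x ^= x >> 16; x *= 0x7feb352dU;
    x ^= x >> 15; x *= 0x846ca68bU;
    x ^= x >> 16;
    return x;
}

/* ISN = hash(4-tuple, secret) + time component + per-call bump. */
static uint32_t toy_isn(struct toy_isn_state *st,
                        uint32_t laddr, uint16_t lport,
                        uint32_t faddr, uint16_t fport,
                        uint32_t ticks)
{
    uint32_t ports = ((uint32_t)lport << 16) | fport;
    uint32_t base = toy_mix(laddr ^ toy_mix(faddr ^ toy_mix(ports ^ st->secret)));

    st->bump += 4096;   /* guarantees progress even within a single tick */
    return base + (ticks << 10) + st->bump;
}

/*
 * Relaxed TIME_WAIT check as described above: a new SYN may recycle the
 * port/address pair if its sequence number differs from the one remembered
 * for the previous connection (what exactly is remembered is an assumption).
 */
static bool toy_time_wait_accepts_syn(uint32_t prev_seq, uint32_t syn_seq)
{
    return syn_seq != prev_seq;
}
```

Comparing two results with `(int32_t)(b - a) > 0` is the usual way to test ordering in a 32-bit sequence space that wraps.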
kernel - Rejigger random number generator to be per-cpu 1/2 * Refactor all the kernel random number generation code to operate on a per-cpu basis. The csprng, ibaa, and l15 structures are now per-cpu. * RDRAND now runs a periodic timer callback on all available cpus rather than just on cpu 0, allowing rdrand data to mix on each cpu's rng independently. * The nrandom helper thread now chains state with an iteration between cpus, injecting a random data buffer generated from the previous cpu into the mix of the current.
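The cross-cpu chaining in the last item can be pictured with a toy model. Everything here is an assumption made for illustration: xorshift64 stands in for the real csprng/ibaa/l15 generators, and `TOY_NCPUS`, `TOY_BUFWORDS` and the `toy_*` functions are hypothetical names. Each pass draws a buffer from the previous cpu's generator and mixes it into the current one.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_NCPUS    4
#define TOY_BUFWORDS 8

/* Toy per-cpu generator state (must be seeded non-zero). */
struct toy_rng {
    uint64_t state;
};

/* xorshift64 step; a non-zero state stays non-zero. */
static uint64_t toy_next(struct toy_rng *r)
{
    uint64_t x = r->state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    r->state = x;
    return x;
}

/* Mix an external buffer into one cpu's state. */
static void toy_inject(struct toy_rng *r, const uint64_t *buf, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        r->state ^= buf[i];
        if (r->state == 0)              /* xorshift cannot leave state 0 */
            r->state = 0x9e3779b97f4a7c15ULL;
        (void)toy_next(r);
    }
}

/* One helper pass: data generated on the previous cpu feeds the current one. */
static void toy_chain_pass(struct toy_rng cpus[], size_t ncpus)
{
    uint64_t buf[TOY_BUFWORDS];

    for (size_t cur = 0; cur < ncpus; cur++) {
        size_t prev = (cur + ncpus - 1) % ncpus;

        for (size_t i = 0; i < TOY_BUFWORDS; i++)
            buf[i] = toy_next(&cpus[prev]);
        toy_inject(&cpus[cur], buf, TOY_BUFWORDS);
    }
}
```

Because each cpu's input depends on the state its neighbor had after its own injection, entropy added on one cpu propagates around the ring over successive passes.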
kernel - Rewrite the callout_*() API * Rewrite the entire API from scratch and improve compatibility with FreeBSD. This is not an attempt to achieve full API compatibility, as FreeBSD's API has unnecessary complexity that coders would frequently make mistakes interpreting. * Remove the IPI mechanisms in favor of fine-grained spin-locks. * Add some robustness features in an attempt to track down corrupted callwheel lists due to originating subsystems freeing structures out from under an active callout. * The code supports a full-blown type-stable/adhoc-reuse structural separation between the front-end and the back-end, but this feature is currently not operational and may be removed at some future point. Instead we currently just embed the struct _callout inside the struct callout. * Replace callout_stop_sync() with callout_cancel(). * callout_drain() is now implemented as a synchronous cancel instead of an asynchronous stop, which is closer to the FreeBSD API and expected operation for ported code (usb stack in particular). We will just have to fix any deadlocks which we come across. * Retain our callout_terminate() function as the 'better' way to stop using a callout, as it will not only cancel the callout but also de-flag the structure so it can no longer be used.
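The difference between cancelling and terminating a callout, as described in the last item, can be shown with a toy flag model. This is not the kernel implementation: the `toy_*` names and flag values are hypothetical, and the synchronous waiting that the real callout_cancel() performs is not modeled.

```c
#include <stdbool.h>

/* Hypothetical flags for a toy callout. */
enum {
    TOY_CO_INITED = 1 << 0,     /* structure may be used */
    TOY_CO_ARMED  = 1 << 1      /* a timeout is pending */
};

struct toy_callout {
    unsigned flags;
};

static void toy_init(struct toy_callout *c)
{
    c->flags = TOY_CO_INITED;
}

/* Arm the callout; refuses to operate on a terminated structure. */
static bool toy_reset(struct toy_callout *c)
{
    if ((c->flags & TOY_CO_INITED) == 0)
        return false;
    c->flags |= TOY_CO_ARMED;
    return true;
}

/* Cancel: the pending timeout goes away, the structure stays usable. */
static void toy_cancel(struct toy_callout *c)
{
    c->flags &= ~(unsigned)TOY_CO_ARMED;
}

/* Terminate: cancel and de-flag, so later use is rejected. */
static void toy_terminate(struct toy_callout *c)
{
    c->flags = 0;
}
```

In this model a callout that has been cancelled can be armed again, while one that has been terminated cannot until it is re-initialized, which is why terminate is the safer choice just before freeing the enclosing structure.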
Remove IPsec and related code from the system. It was unmaintained ever since we inherited it from FreeBSD 4.8. In fact, we had two implementations from that time: IPSEC and FAST_IPSEC. FAST_IPSEC is the implementation to which FreeBSD has moved since, but it didn't even build in DragonFly. Fixes for dports have been committed to DeltaPorts. Requested-by: dillon Dports-testing-and-fixing: zrj
tcp: Reduce minimum retransmit timeout to 190ms. Increase retransmit timeout slop to ~160ms and reduce TCPTV_MIN to ~30ms. Bring in dillon's comment about TCPTV_MIN reduction and retransmit timeout slop from FreeBSD. And make sure that tcp_rexmit_min is valid for a low kern.hz setting.
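The low kern.hz concern comes down to integer arithmetic when converting milliseconds to ticks. The sketch below is an illustration under stated assumptions, not the committed code: the `TOY_*` constants and `toy_*` helpers are hypothetical names, and treating the 190ms floor as TCPTV_MIN plus the slop (30 + 160) is an inference from the numbers in the message.

```c
/* Values taken from the commit message; names are hypothetical. */
#define TOY_TCPTV_MIN_MS    30
#define TOY_REXMIT_SLOP_MS  160

/* Convert milliseconds to ticks, rounding up and never returning 0. */
static int toy_ms_to_ticks(int ms, int hz)
{
    int t = (ms * hz + 999) / 1000;

    return (t < 1) ? 1 : t;
}

/* Smallest retransmit timeout in ticks: minimum plus slop. */
static int toy_rexmit_floor_ticks(int hz)
{
    return toy_ms_to_ticks(TOY_TCPTV_MIN_MS, hz) +
           toy_ms_to_ticks(TOY_REXMIT_SLOP_MS, hz);
}
```

With hz = 1000 the floor is 190 ticks (190ms) and with hz = 100 it is 19 ticks (still 190ms). With hz = 10, plain truncation would turn 30ms into 0 ticks, whereas the rounded and clamped conversion yields 1 tick, so the minimum stays valid.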