kernel - Add per-process capability-based restrictions * This new system allows userland to set capability restrictions which turn off numerous kernel features and root accesses. These restrictions are inherited by sub-processes recursively. Once set, restrictions cannot be removed. Basic restrictions that mimic an unadorned jail can be enabled without creating a jail, but generally speaking real security also requires creating a chrooted filesystem topology, and a jail is still needed to really segregate processes from each other. If you do so, however, you can (for example) disable mount/umount and most global root-only features. * Add new system calls and a manual page for syscap_get(2) and syscap_set(2). * Add sys/caps.h. * Add the "setcaps" userland utility and manual page. * Remove priv.9 and the priv_check infrastructure, replacing it with a newly designed caps infrastructure. * The intention is to add path restriction lists and similar features to improve jail-less security in the near future, and to optimize the priv_check code.
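The inheritance and one-way rules described above can be modelled in a few lines of userland C. This is only a sketch: the DEMO_* bits, struct demo_proc and the demo_*() helpers are hypothetical names made up for illustration and are not the constants or interfaces from sys/caps.h, syscap_get(2) or syscap_set(2).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical restriction bits; NOT the real sys/caps.h constants. */
#define DEMO_CAP_NOMOUNT   (1u << 0)
#define DEMO_CAP_NOROOTNET (1u << 1)

/* Hypothetical per-process state: a set bit means the feature is turned off. */
struct demo_proc {
    uint32_t restricted;
};

/* Restrictions can only be added; there is deliberately no "clear" helper. */
static void demo_cap_restrict(struct demo_proc *p, uint32_t bits)
{
    assert(p);
    p->restricted |= bits;
}

/* A child starts with a copy of the parent's restrictions (inheritance). */
static struct demo_proc demo_fork(const struct demo_proc *parent)
{
    struct demo_proc child = { parent->restricted };
    return child;
}

/* Returns nonzero when the operation guarded by 'bit' is still permitted. */
static int demo_cap_allowed(const struct demo_proc *p, uint32_t bit)
{
    return (p->restricted & bit) == 0;
}
```

A parent that calls demo_cap_restrict() before demo_fork() passes the restriction on to the child, while a restriction added later in the child does not flow back to the parent.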
kernel - Remove P_SWAPPEDOUT flag and paging mode * This code basically no longer functions in any worthwhile or useful manner; remove it. The code harkens back to a time when machines had very little memory and had to time-share processes by actually descheduling them for long periods of time (like 20 seconds) and paging out the related memory. In modern times the chooser algorithm just doesn't work well because we can no longer assume that programs with large memory footprints can be demoted. * In modern times machines have sufficient memory to rely almost entirely on the VM fault and pageout scan. The latencies caused by fault-ins are usually sufficient to demote paging-intensive processes while allowing the machine to continue to function. If functionality needs to be added back in, it can be added back in on the fault path and not here.
Rename some functions to better names. devfs_find_device_by_udev() -> devfs_find_device_by_devid() dev2udev() -> devid_from_dev() udev2dev() -> dev_from_devid() This fits with the rest of the code. 'dev' usually means a cdev_t, such as in make_dev(), etc. Instead of 'udev', use 'devid', since that's what dev_t is, a "Device ID".
kernel: Remove explicit dependencies on <sys/malloc.h> in headers. All except <net/if_var.h> for now; it needs decoupling in drm first. * Include <sys/malloc.h> in foo.c if it has kmalloc()/kfree() calls. * Consistently check if MALLOC_DECLARE was declared before. * <sys/mountctl.h>: include <sys/thread.h> for _KERNEL_STRUCTURES too since the "struct journal" embeds "struct thread". * <sys/tty.h>: Only two kernel sources make use of M_TTYS. * <sys/socketvar2.h>: Make it a kernel-only header.
<termios.h>: Add TABDLY, TAB0 and TAB3 to satisfy POSIX a bit better. * TAB3 is what we already have as OXTABS. Make the latter an alias of the former in <sys/_termios.h>. * Add 'tab0' and 'tab3' operands to stty(1) too. Most other output flags from the POSIX spec deal with actual time delays 'to allow for mechanical or other movement when certain characters are sent to the terminal'. Blast from the past. Taken-from: FreeBSD (with some adjustments)
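A minimal sketch of how the new names are meant to be used from portable code: TABDLY is the mask, TAB0 and TAB3 are values inside it. The helpers set_tab_mode() and tabs_expanded() are hypothetical names made up for this note, and the _XOPEN_SOURCE request is an assumption that only matters on systems which hide the XSI output flags by default.

```c
/* Assumption: ask for the XSI names (TABDLY, TAB0, TAB3) explicitly. */
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <termios.h>

/* Hypothetical helper: replace the tab-handling field of an output-flag word. */
static tcflag_t set_tab_mode(tcflag_t oflag, tcflag_t mode)
{
    /* 'mode' must lie entirely inside the TABDLY mask (TAB0 or TAB3 here). */
    assert((mode & ~(tcflag_t)TABDLY) == 0);
    return (oflag & ~(tcflag_t)TABDLY) | mode;
}

/* Hypothetical helper: nonzero when tabs are expanded to spaces on output. */
static int tabs_expanded(tcflag_t oflag)
{
    return (oflag & TABDLY) == TAB3;
}
```

On a real terminal the flag word would come from tcgetattr() as c_oflag and be written back with tcsetattr(); the stty(1) operands 'tab0' and 'tab3' do the same from the shell.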
kernel: Cleanup <sys/uio.h> issues. The iovec_free() inline greatly complicates inclusion of this header. The NULL check is not always seen from <sys/_null.h>. Luckily only three kernel sources need it: kern_subr.c, sys_generic.c and uipc_syscalls.c. Also just a single dev/drm source makes use of 'struct uio'. * Include <sys/uio.h> explicitly first in drm_fops.c to avoid kfree() macro override in drm compat layer. * Use <sys/_uio.h> where only enums and struct uio are needed, but ensure that userland will not include it for possible later <sys/user.h> use. * Stop using <sys/vnode.h> as a shortcut for uiomove*() prototypes. The uiomove*() family of functions possibly transfers data across the kernel/user space boundary. The presence of this header explicitly marks sources as such. * Prefer to add <sys/uio.h> after <sys/systm.h>, but before <sys/proc.h> and definitely before <sys/malloc.h> (except for the 3 mentioned sources). This will allow removing <sys/malloc.h> from <sys/uio.h> later on. * Adjust <sys/user.h> to use component headers instead of <sys/uio.h>. While there, use the opportunity for a minimal whitespace cleanup. No functional differences observed in compiler intermediates.
kernel - Refactor tty clist code * Remove all the old cruft, completely rewrite the clist code to use a single linear buffer and a FIFO mechanism. * The linear buffer just uses 16-bit elements in order to record TTY_QUOTE along with the character. * Fixes bug in last commit (lack of global locks around global clist caches) by removing the cache entirely.
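The idea of a single buffer of 16-bit elements can be sketched in userland as below. This is not the kernel code: DEMO_QUOTE, DEMO_CLIST_SIZE, struct demo_clist and the demo_*() helpers are hypothetical, and the wrap-around indexing is an assumption about how the FIFO is run over the linear buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_QUOTE      0x100u /* hypothetical flag bit kept above the 8-bit character */
#define DEMO_CLIST_SIZE 8u     /* tiny capacity, chosen only for the demonstration */

struct demo_clist {
    uint16_t buf[DEMO_CLIST_SIZE]; /* one linear buffer of 16-bit elements */
    size_t   head;                 /* index of the oldest element */
    size_t   count;                /* number of queued elements */
};

/* Append a character, optionally marked as quoted. Returns 0, or -1 if full. */
static int demo_putc(struct demo_clist *cl, unsigned char c, int quoted)
{
    assert(cl->count <= DEMO_CLIST_SIZE);
    if (cl->count == DEMO_CLIST_SIZE)
        return -1;
    size_t tail = (cl->head + cl->count) % DEMO_CLIST_SIZE;
    cl->buf[tail] = (uint16_t)(c | (quoted ? DEMO_QUOTE : 0u));
    cl->count++;
    return 0;
}

/* Remove the oldest element. Returns the 16-bit element, or -1 if empty. */
static int demo_getc(struct demo_clist *cl)
{
    if (cl->count == 0)
        return -1;
    uint16_t v = cl->buf[cl->head];
    cl->head = (cl->head + 1) % DEMO_CLIST_SIZE;
    cl->count--;
    return v;
}
```

Because the quote flag travels in the same element as the character, no separate quote bitmap or per-block cache is needed, which is the property that lets the global clist cache go away.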
kernel - Refactor tty_token, fix SMP performance issues * Remove most uses of tty_token in favor of per-tty tp->t_token. This is particularly important for removing bottlenecks related to PTYs, which are used all over the place. tty_token remains in a few places managing overall registration and global list manipulation. * tty structures are now required to be persistent. Implement a separate ttyinit() function. Continue to allow ttyregister() and ttyunregister() calls, but these no longer presume destruction of the structure. * Refactor ttymalloc() to take a **tty pointer and interlock allocations. Allocations are intended to be one-time. ttymalloc() only requires the tty_token for initial allocations. * Remove all critical section use that was combined with tty_token and tp->t_token. Leave only the tokens. The critical sections were hold-overs going all the way back to pre-SMP days. * syscons now gets its own token, vga_token. The ISA VGA code and the framebuffer code also now use this token instead of tty_token. * The keyboard subsystem now uses kbd_token instead of tty_token. * A few remaining serial-like devices (snp, nmdm) also get their own tokens, as well as use the now required tp->t_token. * Remove use of tty_token in the session management code. This fixes a niggling performance path since sessions almost universally go hand-in-hand with fork/exec/exit sequences. Instead we use the already-existing per-hash session token.
kernel: Remove the COMPAT_43 kernel option along with all related code. It has been commented out in our default kernel config files for almost five years now, since 9466f37df5258f3bc3d99ae43627a71c1c085e7d. Approved-by: dillon Dragonfly-bug: <https://bugs.dragonflybsd.org/issues/2946>
kernel - Redo struct vmspace allocator and ref-count handling. * Get rid of the sysref-based allocator and ref-count handler and replace with objcache. Replace all sysref API calls in other kernel modules with vmspace_*() API calls (adding new API calls as needed). * Roll-our-own hopefully safer ref-count handling. We get rid of exitingcnt and instead just leave holdcnt bumped during the exit/reap sequence. We add vm_refcnt and redo vm_holdcnt. Now a formal reference (vm_refcnt) is ALSO covered by a holdcnt. Stage-1 termination occurs when vm_refcnt transitions from 1->0. Stage-2 termination occurs when vm_holdcnt transitions from 1->0. * Should fix rare reported panic under heavy load.
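The two-stage termination rule can be illustrated with a small single-threaded model. It is a sketch only: struct demo_vmspace and the demo_*() helpers are hypothetical, plain ints stand in for what would need to be atomic counters in the kernel, and the stage flags merely record where the real teardown work would happen.

```c
#include <assert.h>

struct demo_vmspace {
    int refcnt;      /* formal references (models vm_refcnt) */
    int holdcnt;     /* holds; each formal reference also owns one (models vm_holdcnt) */
    int stage1_done; /* set on the 1->0 transition of refcnt */
    int stage2_done; /* set on the 1->0 transition of holdcnt */
};

/* Take a hold without taking a formal reference. */
static void demo_hold(struct demo_vmspace *vm)
{
    vm->holdcnt++;
}

/* Drop a hold; the last drop triggers stage-2 termination. */
static void demo_drop(struct demo_vmspace *vm)
{
    assert(vm->holdcnt > 0);
    if (--vm->holdcnt == 0)
        vm->stage2_done = 1;
}

/* Take a formal reference, which is also covered by a hold. */
static void demo_ref(struct demo_vmspace *vm)
{
    demo_hold(vm);
    vm->refcnt++;
}

/* Release a formal reference; the last release triggers stage-1 termination. */
static void demo_rel(struct demo_vmspace *vm)
{
    assert(vm->refcnt > 0);
    if (--vm->refcnt == 0)
        vm->stage1_done = 1;
    demo_drop(vm); /* give back the hold that covered this reference */
}
```

Since every reference carries a hold, stage 2 can never run before stage 1, and an extra hold taken for the exit/reap sequence keeps the structure alive after the last formal reference is gone.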
kernel - Performance tuning (3) * The VOP_CLOSE issues revealed a bigger issue with vn_lock(). Many callers do not check the return code for vn_lock() and in nearly all of those cases it wouldn't fail anyway due to a prior ref, but it creates an API issue. * Add the LK_FAILRECLAIM flag to vn_lock(). This flag explicitly allows vn_lock() to fail if the vnode is undergoing reclamation. This fixes numerous issues, particularly when VOP_CLOSE() is called during a reclaim due to recent LK_UPGRADE's that we do in some VFS *_close() functions. * Remove some unused LK_ defines.
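The caller-visible contract of the new flag can be modelled as follows. Everything here is an assumption made for illustration: DEMO_LK_FAILRECLAIM, struct demo_vnode and demo_vn_lock() are hypothetical, the ENOENT return value is a guess, and the model simply assumes that a lock request without the flag always succeeds.

```c
#include <assert.h>
#include <errno.h>

#define DEMO_LK_FAILRECLAIM 0x1 /* hypothetical value; models the new flag */

/* Hypothetical stand-in for a vnode: only the two bits of state we need. */
struct demo_vnode {
    int reclaiming; /* nonzero while the vnode is being reclaimed */
    int locked;     /* nonzero once the lock has been granted */
};

/*
 * Model of the contract: the lock request may fail only when the caller
 * opted in with the flag AND the vnode is undergoing reclamation.
 */
static int demo_vn_lock(struct demo_vnode *vp, int flags)
{
    assert(vp);
    if (vp->reclaiming && (flags & DEMO_LK_FAILRECLAIM))
        return ENOENT; /* assumed error code for the sketch */
    vp->locked = 1;
    return 0;
}
```

Callers that never check the return code keep working because they do not pass the flag, while callers that do pass it are the ones expected to handle the failure.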
kernel - proc_token removal pass stage 1/2 * Remove proc_token use from all subsystems except kern/kern_proc.c. * The token had become mostly useless in these subsystems now that process locking is more fine-grained. Do the final wipe of proc_token except for allproc/zombproc list use in kern_proc.c