kernel - Add per-process capability-based restrictions

* This new system allows userland to set capability restrictions which
  turn off numerous kernel features and root accesses. These restrictions
  are inherited by sub-processes recursively. Once set, restrictions
  cannot be removed.

  Basic restrictions that mimic an unadorned jail can be enabled without
  creating a jail, but generally speaking real security also requires
  creating a chrooted filesystem topology, and a jail is still needed to
  really segregate processes from each other. If you do so, however, you
  can (for example) disable mount/umount and most global root-only
  features.

* Add new system calls and a manual page for syscap_get(2) and
  syscap_set(2)

* Add sys/caps.h

* Add the "setcaps" userland utility and manual page.

* Remove priv.9 and the priv_check infrastructure, replacing it with
  a newly designed caps infrastructure.

* The intention is to add path restriction lists and similar features
  to improve jail-less security in the near future, and to optimize the
  priv_check code.

kernel - Fix ktrace's handling of system call return values

* Distinguish between int and long returns, properly recording long
  returns (e.g. lseek(), mmap(), read(), write(), etc). kdump will then
  display the correct return value instead of a truncated return value
  for the affected system calls.

* Involves including a return value type width in the sysent and
  adjusting the generator to calculate it.

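To illustrate the truncation described above, here is a minimal userland sketch of what happens when a 64-bit return value is stored in an int-sized field. It is not the kernel code: the `ret_width` enum and `recorded_retval()` are hypothetical names invented for this example, and the real sysent field and ktrace record layout may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a per-syscall return width. */
enum ret_width { RET_INT, RET_LONG };

/*
 * Return the value a trace record would carry for a syscall that
 * returned `raw`, given the width the record uses for it.
 */
int64_t recorded_retval(int64_t raw, enum ret_width width)
{
    uint32_t low;

    if (width == RET_LONG)
        return raw;            /* full 64-bit value survives */

    /* Keep only the low 32 bits and sign-extend them by hand. */
    low = (uint32_t)raw;
    if (low >= 0x80000000u)
        return (int64_t)low - 4294967296LL;
    return (int64_t)low;
}
```

With this sketch, an lseek()-style offset of 5000000000 recorded as RET_INT comes back as 705032704, while RET_LONG preserves it.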
sys/kern: Add fdatasync(2)

Based on the following FreeBSD commits in 2016.
295af703a0d7987c6cf4987e7b7f5f07b3ca1221
1c1cc89580f0fbfabaf6f6c7f0f6440eef0c128e

Add the syscall and also add it to pthread's cancellation points.

The default behavior is the same as fsync(2), which is fine but
inefficient.

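A minimal usage sketch from the caller's side, assuming the standard POSIX prototype `int fdatasync(int fd)` from <unistd.h>. The helper name `append_durable()` is made up for this example.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Append `record` to `path` and flush the file data to stable storage.
 * Returns 0 on success, -1 on any failure.
 */
int append_durable(const char *path, const char *record)
{
    size_t len;
    ssize_t n;
    int fd;

    assert(path != NULL && record != NULL);

    fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0)
        return -1;

    len = strlen(record);
    n = write(fd, record, len);
    if (n < 0 || (size_t)n != len) {
        close(fd);
        return -1;
    }

    /* Flush the data; unlike fsync(2), unrelated metadata may be skipped. */
    if (fdatasync(fd) != 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

Since the default behavior matches fsync(2), callers get the same durability either way; the call simply leaves room for a cheaper implementation later.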
Implement the fexecve(2) system call

The fexecve(2) function is equivalent to execve(2), except that the file
to be executed is determined by the file descriptor fd instead of a
pathname.

The purpose of fexecve(2) is to enable executing a file which has been
verified to be the intended file. It is possible to actively check the
file by reading from the file descriptor and be sure that the file is
not exchanged for another between the reading and the execution.

See https://pubs.opengroup.org/onlinepubs/9699919799/functions/fexecve.html

This work is partially based on swildner's patch and FreeBSD's
implementation (revisions 177787, 182191, 238220).

XXX: We're missing O_EXEC support in open(2).

Reviewed-by: dillon

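The verify-then-execute pattern described above can be sketched as follows. This is an illustration only: `hash_fd()` and `exec_verified()` are hypothetical helpers, and the FNV-1a hash is a non-cryptographic stand-in for whatever real check (a signature or a strong digest) a caller would use. The file is opened with O_RDONLY because O_EXEC is not available.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hash the whole file behind fd with 64-bit FNV-1a. Returns 0 or -1. */
int hash_fd(int fd, uint64_t *out)
{
    unsigned char buf[4096];
    uint64_t h = 14695981039346656037ULL;   /* FNV-1a offset basis */
    off_t off = 0;
    ssize_t n, i;

    while ((n = pread(fd, buf, sizeof(buf), off)) > 0) {
        for (i = 0; i < n; i++) {
            h ^= buf[i];
            h *= 1099511628211ULL;          /* FNV-1a prime */
        }
        off += n;
    }
    if (n < 0)
        return -1;
    *out = h;
    return 0;
}

/*
 * Open `path`, check its contents against `expected`, then run it in a
 * child via the same descriptor. Returns the child's exit status, or -1
 * if verification or any system call fails.
 */
int exec_verified(const char *path, uint64_t expected,
                  char *const argv[], char *const envp[])
{
    uint64_t h;
    pid_t pid;
    int fd, status;

    assert(path != NULL && argv != NULL && envp != NULL);

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (hash_fd(fd, &h) != 0 || h != expected) {
        close(fd);
        return -1;
    }

    pid = fork();
    if (pid < 0) {
        close(fd);
        return -1;
    }
    if (pid == 0) {
        /* Executes the file we just read, not whatever `path` names now. */
        fexecve(fd, argv, envp);
        _exit(127);                         /* only reached on failure */
    }
    close(fd);
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

The descriptor is left open across the exec in the child here; a caller that cares about descriptor leakage would need to weigh close-on-exec against the needs of interpreted scripts.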
Implement clock_nanosleep(2) system call

* Extend the nanosleep1() function in kern_time.c to support the
  clock_nanosleep(2) system call. Add {kern,sys}_clock_nanosleep()
  functions and update kern_nanosleep() accordingly.
* Add clock_nanosleep() syscall to syscalls.master and regenerate
  syscall-related files.
* Update libc symbols with the new syscall.
* Add clock_nanosleep() wrapper in libthread_xu.
* Update nanosleep.2 man page to describe clock_nanosleep().
* Update <time.h> and bump __DragonFly_version.

This work is loosely based on the FreeBSD implementation:
https://reviews.freebsd.org/rS315526

This clock_nanosleep(2) syscall passed all tests in the Open POSIX Test
Suite [0]:

conformance/interfaces/clock_nanosleep/1-1: execution: PASS
conformance/interfaces/clock_nanosleep/1-2: execution: PASS
conformance/interfaces/clock_nanosleep/1-3: execution: PASS
conformance/interfaces/clock_nanosleep/1-4: execution: PASS
conformance/interfaces/clock_nanosleep/1-5: execution: PASS
conformance/interfaces/clock_nanosleep/2-1: execution: PASS
conformance/interfaces/clock_nanosleep/2-2: execution: PASS
conformance/interfaces/clock_nanosleep/2-3: execution: PASS
conformance/interfaces/clock_nanosleep/3-1: execution: PASS
conformance/interfaces/clock_nanosleep/4-1: execution: PASS
conformance/interfaces/clock_nanosleep/5-1: execution: PASS
conformance/interfaces/clock_nanosleep/6-1: execution: PASS
conformance/interfaces/clock_nanosleep/9-1: execution: PASS
conformance/interfaces/clock_nanosleep/8-1: execution: PASS
conformance/interfaces/clock_nanosleep/10-1: execution: PASS
conformance/interfaces/clock_nanosleep/11-1: execution: PASS
conformance/interfaces/clock_nanosleep/13-1: execution: PASS
conformance/interfaces/clock_nanosleep/15-1: execution: PASS

[0] Open POSIX Test Suite: http://posixtest.sourceforge.net/

Reviewed-by: swildner, dillon, tuxillo, zach

kernel - more cleanup of syscall2()

* Implement an actual SYS___nosys system call.

* Convert one conditional to something that can use CMOV, using the
  new SYS___nosys system call code.

* Get rid of special checks for SYS_syscall and SYS___syscall.
  Instead, provide real vectors for these functions. This also
  cleans up a few other bits of code in syscall2().

kernel: GC a few old system calls which are libc functions in DragonFly.

Namely, getdomainname, setdomainname, and uname, all of which were
deprecated in early FreeBSD but never really phased out.

They were likely never used (as system calls) in DragonFly at all.

For more information on the FreeBSD history, see FreeBSD's r184789.

kernel - Add __realpath() and getrandom() system calls

* Add a kernel __realpath() system call. libc must still implement
  the realpath() function to handle NULL buffers (malloc()d buffer
  returned).

  The libc implementation checks the osversion for backwards
  compatibility before attempting to use the new system call.

* Add a kernel getrandom() system call.

* Bump __DragonFly_version to 500710.

Suggested-by: tuxillo, mjg

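A short usage sketch for the new getrandom() call, assuming the commonly used prototype `ssize_t getrandom(void *buf, size_t buflen, unsigned int flags)` declared in <sys/random.h>. The wrapper name `fill_random()` is made up for this example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/random.h>

/*
 * Fill `buf` with `len` random bytes, retrying on EINTR and on short
 * returns. Returns 0 on success, -1 on failure (errno set).
 */
int fill_random(void *buf, size_t len)
{
    unsigned char *p = buf;
    ssize_t n;

    assert(buf != NULL || len == 0);

    while (len > 0) {
        n = getrandom(p, len, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal, try again */
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}
```

Looping on short returns keeps the helper correct for large requests, where the call is allowed to hand back fewer bytes than asked for.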
system - Add wait6(), waitid(), and si_pid/si_uid siginfo support

* Add the wait6() system call (header definitions taken from FreeBSD).
  This required rearranging kern_wait() a bit. In particular, we now
  maintain a hold count of 1 on the process during processing instead
  of releasing the hold count early.

* Add waitid() to libc (waitid.c taken from FreeBSD).

* Adjust manual pages (taken from FreeBSD).

* Add siginfo si_pid and si_uid support. This basically allows a
  process taking a signal to determine where the signal came from.
  The fields already existed in siginfo but were not implemented.

  Implemented using a non-queued per-process array of signal numbers.
  The last originator sending any given signal is recorded and passed
  through to userland in the siginfo.

* Fixes the 'lightdm' X display manager. lightdm relies on si_pid
  support.

  In addition, note that avoiding long lightdm related latencies and
  timeouts requires a softlink from libmozjs-52.so to libmozjs-52.so.0
  (must be addressed in dports, not addressed in this commit).

Loosely-taken-from: FreeBSD (wait6, waitid support only)
Reviewed-by: swildner

libc/libpthread: Add clock_getcpuclockid() and pthread_getcpuclockid().

* Adjust clock_gettime() and clock_getres() to accept values obtained
  this way.

* Also set _POSIX_CPUTIME and _POSIX_THREAD_CPUTIME, although we should
  really support values obtained by these functions in clock_settime()
  too.

Based on and taken from FreeBSD's code.

Reviewed-by: sephe

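A minimal sketch of how a clock ID obtained this way feeds into clock_gettime(), assuming the POSIX prototype `int clock_getcpuclockid(pid_t, clockid_t *)`, which returns an error number rather than setting errno. The helper name `process_cpu_ns()` is invented for this example, and only the process variant is shown.

```c
#include <assert.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/*
 * Return the CPU time consumed by process `pid` in nanoseconds
 * (pid 0 means the calling process), or -1 on failure.
 */
long long process_cpu_ns(pid_t pid)
{
    clockid_t cid;
    struct timespec ts;

    assert(pid >= 0);

    /* Look up the CPU-time clock that belongs to the process. */
    if (clock_getcpuclockid(pid, &cid) != 0)
        return -1;

    /* Read it like any other clock. */
    if (clock_gettime(cid, &ts) != 0)
        return -1;

    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}
```

Sampling the value before and after a region of code gives the CPU time that region consumed, independent of wall-clock time spent sleeping.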
kernel: Remove the COMPAT_43 kernel option along with all related code.

It has been commented out in our default kernel config files for almost
five years now, since 9466f37df5258f3bc3d99ae43627a71c1c085e7d.

Approved-by: dillon
Dragonfly-bug: <https://bugs.dragonflybsd.org/issues/2946>