kernel - Add per-process capability-based restrictions

* This new system allows userland to set capability restrictions that
  turn off numerous kernel features and root accesses. These
  restrictions are inherited by sub-processes recursively. Once set,
  restrictions cannot be removed.

  Basic restrictions that mimic an unadorned jail can be enabled
  without creating a jail, but generally speaking real security also
  requires creating a chrooted filesystem topology, and a jail is still
  needed to really segregate processes from each other. If you do so,
  however, you can (for example) disable mount/umount and most global
  root-only features.

* Add new system calls and a manual page for syscap_get(2) and
  syscap_set(2).

* Add sys/caps.h.

* Add the "setcaps" userland utility and manual page.

* Remove priv.9 and the priv_check infrastructure, replacing it with a
  newly designed caps infrastructure.

* The intention is to add path restriction lists and similar features
  to improve jail-less security in the near future, and to optimize the
  priv_check code.

sys/kern: Add fdatasync(2)

Based on the following FreeBSD commits in 2016:
295af703a0d7987c6cf4987e7b7f5f07b3ca1221
1c1cc89580f0fbfabaf6f6c7f0f6440eef0c128e

Add the syscall and also add it to pthread's cancellation points.

The default behavior is the same as fsync(2), which is fine but
inefficient.

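A minimal usage sketch for the call added above, not taken from the commit itself. The helper name write_durably() is made up for illustration. It writes a buffer and then calls fdatasync(2) so the file data is flushed before the descriptor is closed. Per the message above, the call currently behaves like fsync(2).

```c
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: write len bytes to path, then flush the file
 * data with fdatasync(2). Returns 0 on success, -1 on failure.
 */
int write_durably(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* write(2) may be partial, so loop until everything is written. */
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            close(fd);
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }

    /* Flush the data before reporting success. */
    if (fdatasync(fd) != 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

A caller would use it as write_durably("/tmp/example.txt", "hello\n", 6) and treat a non-zero return as "data may not be on disk".
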
Implement the fexecve(2) system call

The fexecve(2) function is equivalent to execve(2), except that the
file to be executed is determined by the file descriptor fd instead of
a pathname.

The purpose of fexecve(2) is to enable executing a file which has been
verified to be the intended file. It is possible to actively check the
file by reading from the file descriptor and be sure that the file is
not exchanged for another between the reading and the execution.

See https://pubs.opengroup.org/onlinepubs/9699919799/functions/fexecve.html

This work is partially based on swildner's patch and FreeBSD's
implementation (revisions 177787, 182191, 238220).

XXX: We're missing O_EXEC support in open(2).

Reviewed-by: dillon

Implement clock_nanosleep(2) system call

* Extend the nanosleep1() function in kern_time.c to support the
  clock_nanosleep(2) system call. Add {kern,sys}_clock_nanosleep()
  functions and update kern_nanosleep() accordingly.
* Add clock_nanosleep() syscall to syscalls.master and regenerate
  syscall-related files.
* Update libc symbols with the new syscall.
* Add clock_nanosleep() wrapper in libthread_xu.
* Update nanosleep.2 man page to describe clock_nanosleep().
* Update <time.h> and bump __DragonFly_version.

This work is loosely based on the FreeBSD implementation:
https://reviews.freebsd.org/rS315526

This clock_nanosleep(2) syscall passed all tests in the Open POSIX
Test Suite [0]:

conformance/interfaces/clock_nanosleep/1-1: execution: PASS
conformance/interfaces/clock_nanosleep/1-2: execution: PASS
conformance/interfaces/clock_nanosleep/1-3: execution: PASS
conformance/interfaces/clock_nanosleep/1-4: execution: PASS
conformance/interfaces/clock_nanosleep/1-5: execution: PASS
conformance/interfaces/clock_nanosleep/2-1: execution: PASS
conformance/interfaces/clock_nanosleep/2-2: execution: PASS
conformance/interfaces/clock_nanosleep/2-3: execution: PASS
conformance/interfaces/clock_nanosleep/3-1: execution: PASS
conformance/interfaces/clock_nanosleep/4-1: execution: PASS
conformance/interfaces/clock_nanosleep/5-1: execution: PASS
conformance/interfaces/clock_nanosleep/6-1: execution: PASS
conformance/interfaces/clock_nanosleep/9-1: execution: PASS
conformance/interfaces/clock_nanosleep/8-1: execution: PASS
conformance/interfaces/clock_nanosleep/10-1: execution: PASS
conformance/interfaces/clock_nanosleep/11-1: execution: PASS
conformance/interfaces/clock_nanosleep/13-1: execution: PASS
conformance/interfaces/clock_nanosleep/15-1: execution: PASS

[0] Open POSIX Test Suite: http://posixtest.sourceforge.net/

Reviewed-by: swildner, dillon, tuxillo, zach

<sys/sysproto.h>: Remove unneeded inclusion of <sys/sysmsg.h>.

After 80d831e1ad5c5886e45827bf13837cf84baba296, which removed the
struct sysmsg's in the *_args structures, this is no longer needed.

It also resolves circular #include issues because that commit at the
same time added #include <sys/sysproto.h> to <sys/sysmsg.h>.

kernel - more cleanup of syscall2()

* Implement an actual SYS___nosys system call.

* Convert one conditional to something that can use CMOV, using the
  new SYS___nosys system call code.

* Get rid of special checks for SYS_syscall and SYS___syscall.
  Instead, provide real vectors for these functions. This also cleans
  up a few other bits of code in syscall2().

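One plausible reading of the CMOV point above, as a user-space mock and not the actual syscall2() code: when an out-of-range syscall number can be replaced by the index of a real "nosys" entry, the range check becomes a plain value selection rather than a separate error path. Every name below (mock_sysent, MOCK_SYS_nosys, and so on) is hypothetical.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical handler type for this mock; not the kernel's sy_call_t. */
typedef int mock_call_t(const long *args, long *result);

static int mock_nosys(const long *args, long *result)
{
    (void)args;
    *result = -1;
    return ENOSYS;
}

static int mock_getanswer(const long *args, long *result)
{
    (void)args;
    *result = 42;
    return 0;
}

enum { MOCK_SYS_nosys = 0, MOCK_SYS_getanswer = 1, MOCK_NSYSENT = 2 };

static mock_call_t *const mock_sysent[MOCK_NSYSENT] = {
    mock_nosys,
    mock_getanswer,
};

int mock_syscall(unsigned int code, const long *args, long *result)
{
    /*
     * Select a valid index instead of branching to an error path.
     * A compiler is free to lower this to a conditional move.
     */
    code = (code < (unsigned int)MOCK_NSYSENT) ? code : MOCK_SYS_nosys;
    return mock_sysent[code](args, result);
}
```

In this mock, mock_syscall(999, ...) lands in mock_nosys() through the same table lookup as any valid number.
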
kernel - Refactor in-kernel system call API to remove bcopy()

* Change the in-kernel system call prototype to take the system call
  arguments as a separate pointer, and make the contents read-only.

  Old: int sy_call_t (void *);
  New: int sy_call_t (struct sysmsg *sysmsg, const void *);

* System calls with 6 arguments or fewer no longer need to copy the
  arguments from the trapframe to a holding structure. Instead, we
  simply point into the trapframe.

  The L1 cache footprint will be a bit smaller, but in simple tests
  the results are not noticeably faster... maybe 1ns or so
  (roughly 1%).

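The difference between the two call shapes can be illustrated with a small user-space mock. This is a sketch under loud assumptions: none of the mock_* types or functions exist in the kernel, the real trapframe and sysmsg layouts differ, and memcpy() stands in for the bcopy() being removed.

```c
#include <stdint.h>
#include <string.h>

/* All types below are hypothetical stand-ins, not kernel structures. */
struct mock_trapframe {
    uint64_t regs[6];           /* six register-passed arguments */
};

struct mock_sysmsg {
    int64_t result;             /* slot for the return value */
};

/* Old shape: one pointer to a private, writable copy of the arguments. */
struct mock_old_args {
    struct mock_sysmsg sysmsg;
    uint64_t arg[6];
};

static int mock_old_add(void *p)
{
    struct mock_old_args *uap = p;
    uap->sysmsg.result = (int64_t)(uap->arg[0] + uap->arg[1]);
    return 0;
}

/* New shape: message and arguments are separate; arguments are const. */
static int mock_new_add(struct mock_sysmsg *sysmsg, const void *p)
{
    const uint64_t *arg = p;
    sysmsg->result = (int64_t)(arg[0] + arg[1]);
    return 0;
}

int64_t mock_dispatch_old(const struct mock_trapframe *tf)
{
    struct mock_old_args held;

    memset(&held, 0, sizeof(held));
    memcpy(held.arg, tf->regs, sizeof(held.arg));  /* the extra copy */
    mock_old_add(&held);
    return held.sysmsg.result;
}

int64_t mock_dispatch_new(const struct mock_trapframe *tf)
{
    struct mock_sysmsg msg = { 0 };

    mock_new_add(&msg, tf->regs);   /* point into the frame, no copy */
    return msg.result;
}
```

Both dispatchers compute the same result; the new shape just skips the holding structure, which is where the smaller cache footprint mentioned above would come from.
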
kernel: GC a few old system calls which are libc functions in DragonFly.

Namely, getdomainname, setdomainname, and uname, all of which were
deprecated in early FreeBSD but never really phased out. They were
likely never used (as system calls) in DragonFly at all.

For more information on the FreeBSD history, see FreeBSD's r184789.

kernel - Add __realpath() and getrandom() system calls

* Add a kernel __realpath() system call. libc must still implement
  the realpath() function to handle NULL buffers (malloc()d buffer
  returned).

  The libc implementation checks the osversion for backwards
  compatibility before attempting to use the new system call.

* Add a kernel getrandom() system call.

* Bump __DragonFly_version to 500710.

Suggested-by: tuxillo, mjg

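A minimal usage sketch for getrandom(), not taken from the commit. It assumes the userland declaration matches the one found on FreeBSD and Linux, that is ssize_t getrandom(void *, size_t, unsigned int) in <sys/random.h>. The helper name fill_random() is hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/random.h>
#include <sys/types.h>

/*
 * Hypothetical helper: fill buf with len random bytes.
 * Returns 0 on success, -1 on failure (errno is left set).
 */
int fill_random(void *buf, size_t len)
{
    unsigned char *p = buf;

    while (len > 0) {
        /* flags == 0 requests the default behavior. */
        ssize_t n = getrandom(p, len, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted, try again */
            return -1;
        }
        /* Short reads are possible, so advance and loop. */
        p += n;
        len -= (size_t)n;
    }
    return 0;
}
```

A caller could fill a 32-byte key with fill_random(key, sizeof(key)) without opening a random device node.
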
libc: Change getlogin_r()'s second argument to size_t, as POSIX likes it.

https://pubs.opengroup.org/onlinepubs/9699919799/functions/getlogin.html

Also adjust the getconf syscall, aka _getconf() in userland, which is
called by getconf(3) and getconf_r(3).

Approved-by: dillon
Tested-by: zrj

Add <sys/cpumask.h>.

Collect the scattered cpumask bits into the correct headers. This
cleans up the namespace and simplifies platform handling in asm
macros.

The cpumask_t type, together with its macros, is already a non-MI
feature that is used in userland utilities, libraries, the kernel
scheduler and syscalls. It deserves a sys/ header.

Adjust syscalls.master and rerun sysent.

While there, fix an issue with ports that set a POSIX environment but
have an implementation of setting thread names through
pthread_set_name_np().

kernel: Make chflags syscalls argument types consistent with userland.

There was an inconsistency between the userland and syscall argument
types that was inherited from the initial fork. Adjust prototypes to
use u_long and add the missing const char * too. Rerun sysent.

Change tmpfs/dirfs to use u_int for flags, since the mask for
superuser-changeable flags is SF_SETTABLE 0xffff0000 (most fs use
uint32_t), and adjust the mksubr script. Remove the no longer needed
(u_long) casts I could find elsewhere.

While there, adjust unistd.h prototypes to use generic types too.

<unistd.h>: Fix profil(2) prototype.

First off, vm_offset_t is somewhat bogus in this context; the same
could be said about the size_t variant in sys/sysproto.h. Just use the
plain u_long type, as is already done in "struct uprof", and fix the
fourth argument to take u_int.

In public headers, prefer to use generic types.