kernel - Add per-process capability-based restrictions * This new system allows userland to set capability restrictions which turn off numerous kernel features and root accesses. These restrictions are inherited by sub-processes recursively. Once set, restrictions cannot be removed. Basic restrictions that mimic an unadorned jail can be enabled without creating a jail, but generally speaking real security also requires creating a chrooted filesystem topology, and a jail is still needed to really segregate processes from each other. If you do so, however, you can (for example) disable mount/umount and most global root-only features. * Add new system calls and a manual page for syscap_get(2) and syscap_set(2). * Add sys/caps.h. * Add the "setcaps" userland utility and manual page. * Remove priv.9 and the priv_check infrastructure, replacing it with a newly designed caps infrastructure. * The intention is to add path restriction lists and similar features to improve jail-less security in the near future, and to optimize the priv_check code.
<sys/time.h>: Add 3rd arg to timespecadd()/sub() and make them public. * Switch to the three argument versions of the timespecadd() and timespecsub() macros. These are now the predominant ones. FreeBSD, OpenBSD, NetBSD, and Solaris (albeit only for the kernel) have them. * Make those macros public too. This allows for a number of cleanups where they were defined locally. Pointed-out-by: zrj Reviewed-by: dillon
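A minimal sketch of how the three-argument forms read in calling code. The fallback macro bodies below follow the usual BSD shape and are only used when the system headers do not already provide the macros; deadline_after() and time_remaining() are hypothetical helpers introduced for illustration, not part of this commit.

```c
#include <sys/time.h>
#include <time.h>

/* Fallbacks for systems whose headers lack the three-argument macros. */
#ifndef timespecadd
#define timespecadd(tsp, usp, vsp)                                  \
    do {                                                            \
        (vsp)->tv_sec = (tsp)->tv_sec + (usp)->tv_sec;              \
        (vsp)->tv_nsec = (tsp)->tv_nsec + (usp)->tv_nsec;           \
        if ((vsp)->tv_nsec >= 1000000000L) {                        \
            (vsp)->tv_sec++;                                        \
            (vsp)->tv_nsec -= 1000000000L;                          \
        }                                                           \
    } while (0)
#endif
#ifndef timespecsub
#define timespecsub(tsp, usp, vsp)                                  \
    do {                                                            \
        (vsp)->tv_sec = (tsp)->tv_sec - (usp)->tv_sec;              \
        (vsp)->tv_nsec = (tsp)->tv_nsec - (usp)->tv_nsec;           \
        if ((vsp)->tv_nsec < 0) {                                   \
            (vsp)->tv_sec--;                                        \
            (vsp)->tv_nsec += 1000000000L;                          \
        }                                                           \
    } while (0)
#endif

/* Hypothetical helper: absolute deadline = now + timeout. */
static struct timespec
deadline_after(struct timespec now, struct timespec timeout)
{
    struct timespec res;

    /* Third argument receives the result; the inputs are not modified. */
    timespecadd(&now, &timeout, &res);
    return res;
}

/* Hypothetical helper: time left until the deadline. */
static struct timespec
time_remaining(struct timespec deadline, struct timespec now)
{
    struct timespec res;

    timespecsub(&deadline, &now, &res);
    return res;
}
```

With the older two-argument forms the first operand was overwritten in place, so callers had to copy it first when the original value was still needed.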
kernel - per-thread fd cache, p_fd lock bypass * Implement a per-thread (fd,fp) cache. Cache hits can keep fp's in a held state (avoiding the need to fhold()/fdrop() the ref count) and bypass the p_fd spinlock. This allows the file pointer structure to generally be shared across cpu caches. * Can cache up to four descriptors in each thread, LRU. This is the common case. Highly threaded programs tend to focus work on distinct file descriptors in each thread. * One file descriptor can be cached in up to four threads. This is a significant limitation, though relatively uncommon. On a cache miss the code drops into the normal shared p_fd spinlock lookup.
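The four-slot LRU behaviour described above can be modelled in userland roughly as follows. This is a sketch under stated assumptions, not the kernel code: every name here (struct fdcache, fdcache_lookup(), fdcache_insert()) is hypothetical, a void pointer stands in for the file pointer, and neither the held reference counts nor the limit of four threads per descriptor is modelled.

```c
#include <stddef.h>

#define FDCACHE_SLOTS 4          /* four descriptors per thread, per the text */

struct fdcache_entry {
    int            fd;           /* cached descriptor, -1 if the slot is empty */
    void          *fp;           /* stand-in for the kernel file pointer */
    unsigned long  lru;          /* last-use tick, larger is more recent */
};

struct fdcache {
    struct fdcache_entry slot[FDCACHE_SLOTS];
    unsigned long        tick;
};

static void
fdcache_init(struct fdcache *c)
{
    int i;

    c->tick = 0;
    for (i = 0; i < FDCACHE_SLOTS; i++) {
        c->slot[i].fd = -1;
        c->slot[i].fp = NULL;
        c->slot[i].lru = 0;
    }
}

/* Returns the cached fp on a hit, NULL on a miss (caller falls back). */
static void *
fdcache_lookup(struct fdcache *c, int fd)
{
    int i;

    for (i = 0; i < FDCACHE_SLOTS; i++) {
        if (c->slot[i].fd == fd) {
            c->slot[i].lru = ++c->tick;   /* refresh recency on a hit */
            return c->slot[i].fp;
        }
    }
    return NULL;
}

/*
 * Called after a miss. Uses an empty slot if there is one, otherwise
 * replaces the least recently used entry. Returns the evicted fd, or -1.
 */
static int
fdcache_insert(struct fdcache *c, int fd, void *fp)
{
    int i, victim = 0, evicted;

    for (i = 0; i < FDCACHE_SLOTS; i++) {
        if (c->slot[i].fd == -1) {
            victim = i;
            break;
        }
        if (c->slot[i].lru < c->slot[victim].lru)
            victim = i;
    }
    evicted = c->slot[victim].fd;
    c->slot[victim].fd = fd;
    c->slot[victim].fp = fp;
    c->slot[victim].lru = ++c->tick;
    return evicted;
}
```

Because the structure is private to one thread, the lookup needs no locking at all, which is the point of bypassing the shared p_fd spinlock on a hit.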
smbfs - Fix mount_smbfs authentication error (but 'ls' still broken) * Fixes an authentication error with mount_smbfs. Most Windows file servers require a later crypto rev and man-in-the-middle protection. * Note however that while mounting works, and files can be copied by name, 'ls' currently returns empty and the mount appears to get stuck, so more work is needed.
devfs(9): Rename DEVFS_DECLARE_CLONE_BITMAP to DEVFS_DEFINE_CLONE_BITMAP. Also, add DEVFS_DECLARE_CLONE_BITMAP() for extern declarations, analogous to MALLOC_DEFINE() and MALLOC_DECLARE(). In the sound code, replace some externs with DEVFS_DECLARE_CLONE_BITMAP() and remove one unneeded extern.
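The define/declare split follows the same pattern as MALLOC_DEFINE() and MALLOC_DECLARE(): one macro emits the storage, the other emits only an extern declaration for headers. The sketch below illustrates the pattern with hypothetical EX_* names and a made-up bitmap layout; it is not the actual devfs macro text.

```c
#include <stddef.h>

/* Made-up bitmap layout, for illustration only. */
struct ex_bitmap {
    unsigned long bits[4];
};

#define EX_BITMAP_NAME(name)    ex_bitmap_##name

/* Goes in a shared header: declares the object without allocating it. */
#define EX_DECLARE_BITMAP(name) extern struct ex_bitmap EX_BITMAP_NAME(name)

/* Goes in exactly one .c file: provides the actual storage. */
#define EX_DEFINE_BITMAP(name)  struct ex_bitmap EX_BITMAP_NAME(name)

EX_DECLARE_BITMAP(dsp);         /* what a header would carry */
EX_DEFINE_BITMAP(dsp);          /* what the owning source file would carry */

/* Returns 1 if no bit is set in the bitmap, 0 otherwise. */
static int
ex_bitmap_is_empty(const struct ex_bitmap *bm)
{
    size_t i;

    for (i = 0; i < sizeof(bm->bits) / sizeof(bm->bits[0]); i++) {
        if (bm->bits[i] != 0)
            return 0;
    }
    return 1;
}
```

Other source files then include the header instead of spelling out their own extern lines, which is the cleanup applied to the sound code.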
kernel: Move us to using M_NOWAIT and M_WAITOK for mbuf functions. The main reason is that our having to use the MB_WAIT and MB_DONTWAIT flags was a recurring issue when porting drivers from FreeBSD because it tended to get forgotten and the code would compile anyway with the wrong constants. And since MB_WAIT and MB_DONTWAIT ended up as ocflags for an objcache_get() or objcache_reclaimlist call (which use M_WAITOK and M_NOWAIT), it was just one big converting back and forth with some sanitization in between. This commit allows M_* again for the mbuf functions and keeps the sanitizing as it was before: when M_WAITOK is among the passed flags, objcache functions will be called with M_WAITOK and when it is absent, they will be called with M_NOWAIT. All other flags are scrubbed by the MB_OCFLAG() macro which does the same as the former MBTOM(). Approved-by: dillon
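The sanitizing rule described above (M_WAITOK present means M_WAITOK, anything else means M_NOWAIT, all other bits dropped) can be sketched as below. The EX_* constants and mb_ocflag_sketch() are hypothetical, with illustrative flag values; this is not the kernel's MB_OCFLAG() definition.

```c
/* Illustrative flag values only; the real ones live in the kernel headers. */
#define EX_M_NOWAIT  0x0001
#define EX_M_WAITOK  0x0002
#define EX_M_ZERO    0x0100      /* example of an unrelated flag */

/*
 * Reduce an arbitrary flag word to the single wait/nowait flag that is
 * handed on to the object cache. Every other bit is scrubbed.
 */
static int
mb_ocflag_sketch(int how)
{
    return (how & EX_M_WAITOK) ? EX_M_WAITOK : EX_M_NOWAIT;
}
```

For example, mb_ocflag_sketch(EX_M_WAITOK | EX_M_ZERO) yields EX_M_WAITOK, while a call that passes only EX_M_ZERO falls back to EX_M_NOWAIT.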
Remove support for the IPX and NCP protocols, and for NWFS. This has been on the list for a long time now. FreeBSD removed it recently, too. Their commit msg has some more info: "IPX was a network transport protocol in Novell's NetWare network operating system from late 80s and then 90s. The NetWare itself switched to TCP/IP as default transport in 1998. Later, in this century the Novell Open Enterprise Server became successor of Novell NetWare. The last release that claimed to still support IPX was OES 2 in 2007. Routing equipment vendors (e.g. Cisco) discontinued support for IPX in 2011." The commit removes support for NCP (NetWare Core Protocol) and NWFS (NetWare File System) along with it (both have been gone from FreeBSD for a while, too).
tcp: Implement asynchronous pru_connect This is mainly used to improve TCP nonblocking connect(2) performance. Before this commit, a user space thread using nonblocking connect(2) had to wait for the netisr to complete the SYN output. This could be a performance hit for nonblocking connect(2). First, the user space thread is put to sleep, even if the connect(2) is nonblocking. Second, it does not make much sense for nonblocking connect(2) to wait for the SYN output. TCP's asynchronous pru_connect implementation sets ISCONNECTING before dispatching the netmsg to netisr0. Errors like EADDRNOTAVAIL, i.e. out of local port space, are reported through kevent(2) or getsockopt(2) SOL_SOCKET/SO_ERROR. NFS and other kernel code still use the old synchronized pru_connect. This commit only affects the connect(2) syscall. Sysctl node kern.ipc.soconnect_async is added to enable and disable asynchronous pru_connect. It is enabled by default. The performance measurement (i7-2600 w/ bnx(4)), using tools/tools/netrate/accept_connect/kq_connect_client: kq_connect_client -4 SERVADDR -p SERVPORT -i 8 -c 32 -l 30 (8 processes, each creates 32 connections simultaneously, run 30 secs) 16 runs average: asynchronous pru_connect: 220979.89 conns/s; synchronized pru_connect: 189106.88 conns/s. This commit gives ~16% performance improvement for nonblocking connect(2).
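From userland, deferred connect errors are picked up the usual way: start the nonblocking connect(2), wait for the socket to become writable, then read SO_ERROR. The sketch below uses only standard socket calls and waits with poll(2) rather than kevent(2) to stay portable; connect_nonblocking() is a hypothetical helper, and EINTR handling is omitted for brevity.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/*
 * Returns 0 and stores the connected socket in *fdp on success,
 * otherwise returns an errno value and leaves *fdp untouched.
 */
static int
connect_nonblocking(const struct sockaddr *sa, socklen_t salen,
    int timeout_ms, int *fdp)
{
    struct pollfd pfd;
    socklen_t len;
    int fd, flags, n, soerr, saved;

    fd = socket(sa->sa_family, SOCK_STREAM, 0);
    if (fd < 0)
        return errno;

    flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        goto fail;

    if (connect(fd, sa, salen) == 0) {
        *fdp = fd;                  /* completed immediately */
        return 0;
    }
    if (errno != EINPROGRESS)
        goto fail;                  /* error reported synchronously */

    /* Wait until the connection attempt has finished either way. */
    pfd.fd = fd;
    pfd.events = POLLOUT;
    pfd.revents = 0;
    n = poll(&pfd, 1, timeout_ms);
    if (n == 0)
        errno = ETIMEDOUT;
    if (n <= 0)
        goto fail;

    /* Deferred errors (e.g. EADDRNOTAVAIL) surface through SO_ERROR. */
    soerr = 0;
    len = sizeof(soerr);
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &soerr, &len) < 0)
        goto fail;
    if (soerr != 0) {
        errno = soerr;
        goto fail;
    }
    *fdp = fd;
    return 0;

fail:
    saved = errno;
    close(fd);
    return saved;
}
```

Code written this way should behave the same with kern.ipc.soconnect_async on or off; the difference is only whether connect(2) itself sleeps before returning EINPROGRESS.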