kernel - Do not block indefinitely in exit1() when draining controlling tty

* exit1() tries to drain the controlling terminal. This can block
  indefinitely at a point where signals are no longer operational.
  Force a 1-second timeout for drain attempts. This is a bit of a
  hack, but it works.

Reported-by: zach, pikrzyszt, piecuch, bug #3239
kernel - Add PROC_PDEATHSIG_CTL and PROC_PDEATHSIG_STATUS

* Add PROC_PDEATHSIG_CTL and PROC_PDEATHSIG_STATUS to procctl(2).
  This follows the Linux and FreeBSD semantics. However, it should be
  noted that since the child of a fork() clears the setting, these
  semantics have a fork/exit race between an exiting parent and a
  child which has not yet set up its death wish.

* Also fix a number of signal ranging checks.

Requested-by: zrj
kernel - Refactor in-kernel system call API to remove bcopy()

* Change the in-kernel system call prototype to take the system call
  arguments as a separate pointer, and make the contents read-only:

      int sy_call_t (void *);                                (old)
      int sy_call_t (struct sysmsg *sysmsg, const void *);   (new)

* System calls with 6 arguments or fewer no longer need to copy the
  arguments from the trapframe to a holding structure. Instead, we
  simply point into the trapframe. The L1 cache footprint will be a
  bit smaller, but in simple tests the results are not noticeably
  faster... maybe 1ns or so (roughly 1%).
kernel - Fix rare wait*() deadlock

* It is possible for the kernel to deadlock two processes or process
  threads attempting to wait*() on the same pid.

* Fix by adding a bit of magic to give ownership of the reaping
  operation to one of the waiters, and causing the other waiters to
  skip/reject that pid.
kernel - sigblockall()/sigunblockall() support (per thread shared page)

* Implement /dev/lpmap, a per-thread RW shared page between userland
  and the kernel. Each thread in the process receives a unique shared
  page for communication with the kernel when memory-mapping
  /dev/lpmap, and can access various variables via this map.

* The current thread's TID is retained for both fork() and vfork().
  Previously it was only retained for vfork(). This avoids userland
  code confusion for any bits and pieces that are indexed based on
  the TID.

* Implement support for a per-thread block-all-signals feature that
  does not require any system calls (see next commit to libc). The
  functions will be called sigblockall() and sigunblockall().

  The lpmap->blockallsigs variable prevents normal signals from being
  dispatched. They will still be queued to the LWP as per normal.
  The behavior is not quite that of a signal mask when dealing with
  global signals.

  The low 31 bits represent a recursion counter, allowing recursive
  use of the functions. The high bit (bit 31) is set by the kernel
  if a signal was prevented from being dispatched. When userland
  decrements the counter to 0 (the low 31 bits), it can check and
  clear bit 31, and if it was found to be set, userland can then make
  a dummy 'real' system call to cause pending signals to be
  delivered.

  Synchronous TRAPs (e.g. kernel-generated SIGFPE, SIGSEGV, etc.) are
  not affected by this feature and will still be dispatched
  synchronously.

* PThreads is expected to unmap the mapped page upon thread exit.
  The kernel will force-unmap the page upon thread exit if pthreads
  does not, because programs using LWP primitives instead of pthreads
  might not realize that libc has mapped the page.

  XXX needs work - currently, if the page has not been faulted in,
  the kernel has no visibility into the mapping and will not unmap
  it, but neither will it get confused if the address is accessed.
  To be fixed soon.
* The TID is reset to 1 on a successful exec*().

* On [v]fork(), if lpmap exists for the current thread, the kernel
  will copy the lpmap->blockallsigs value to the lpmap for the new
  thread in the new process. This way sigblock*() state is retained
  across the [v]fork().

  This feature not only reduces code confusion in userland, it also
  allows [v]fork() to be implemented by the userland program in a way
  that ensures no signal races in either the parent or the new child
  process until it is ready for them.

* The implementation leverages our vm_map_backing extents by having
  the per-thread memory mappings indexed within the lwp. This allows
  the lwp to remove the mappings when it exits (since not doing so
  would result in a wild pmap entry and kernel memory disclosure).

* The implementation currently delays instantiation of the mapped
  page(s) and some side structures until the first fault. XXX this
  will have to be changed.
drm - Refactor task_struct and implement mm_struct

* Change td->td_linux_task from an embedded structure to a pointer.

* Add p->p_linux_mm to support tracking mm_struct's.

* Change the 'current' macro to test td->td_linux_task and call a
  support function, linux_task_alloc(), if it is NULL.

* Implement callbacks from the main kernel for thread exit and
  process exit to support functions that drop the td_linux_task and
  p_linux_mm pointers. Initialize and clear these callbacks in the
  module load/unload in drm_drv.c.

* Implement required support functions in linux_sched.c.
kernel - Don't block in tstop() with locks held

* There are several places where the kernel improperly blocks on a
  STOP signal while locks might be held. This is a particular problem
  when PCATCH is specified, e.g. in the middle of the NFS code.
  PCATCH is meant to catch INTR, but it also improperly allowed STOP
  to function and left the vnode lock held. Several other places in
  the kernel also use PCATCH and don't expect the kernel to actually
  block indefinitely on a STOP.

* Don't block in STOP in these situations. Simply mark the thread as
  stopped and wait until it tries to return to userland before
  actually stopping. Any kernel subsystem which desires to act on the
  STOP in-line, instead of upon return to userland, can do so
  manually, as long as it releases all locks for the duration.
kernel - Rewrite the callout_*() API

* Rewrite the entire API from scratch and improve compatibility with
  FreeBSD. This is not an attempt to achieve full API compatibility,
  as FreeBSD's API has unnecessary complexity that coders frequently
  misinterpret.

* Remove the IPI mechanisms in favor of fine-grained spin-locks.

* Add some robustness features in an attempt to track down corrupted
  callwheel lists caused by originating subsystems freeing structures
  out from under an active callout.

* The code supports a full-blown type-stable/adhoc-reuse structural
  separation between the front-end and the back-end, but this feature
  is currently not operational and may be removed at some future
  point. Instead, we currently just embed the struct _callout inside
  the struct callout.

* Replace callout_stop_sync() with callout_cancel().

* callout_drain() is now implemented as a synchronous cancel instead
  of an asynchronous stop, which is closer to the FreeBSD API and the
  expected operation for ported code (the usb stack in particular).
  We will just have to fix any deadlocks we come across.

* Retain our callout_terminate() function as the 'better' way to stop
  using a callout, as it will not only cancel the callout but also
  de-flag the structure so it can no longer be used.
kernel: Remove numerous #include <sys/thread2.h>.

Most of them were added when we converted spl*() calls to
crit_enter()/crit_exit(), almost 14 years ago. We can now remove a
good chunk of them again in files where crit_*() is no longer used.

I had to adjust some files that were relying on thread2.h, or headers
that it includes, coming in via other headers from which it was
removed.
kernel - Fix nstopped SMP race during core dump

* During a process core dump, p->p_nstopped can be adjusted without
  holding p->p_token, resulting in an SMP race which can cause
  p_nstopped to become permanently desynchronized and deadlock the
  process.

* Be robust in a p_nstopped handling case in kern_exit, just in case.
kernel - Fix missing tokens in killalllwps()

* There appears to be at least one code path where killalllwps() is
  being called without the necessary tokens held.

* Just have the routine itself obtain the necessary tokens.

* Might be responsible for extremely rare core-dump stop/wait stalls.
system - Add wait6(), waitid(), and si_pid/si_uid siginfo support

* Add the wait6() system call (header definitions taken from
  FreeBSD). This required rearranging kern_wait() a bit. In
  particular, we now maintain a hold count of 1 on the process during
  processing instead of releasing the hold count early.

* Add waitid() to libc (waitid.c taken from FreeBSD).

* Adjust manual pages (taken from FreeBSD).

* Add siginfo si_pid and si_uid support. This basically allows a
  process taking a signal to determine where the signal came from.
  The fields already existed in siginfo but were not implemented.

  Implemented using a non-queued per-process array of signal numbers.
  The last originator sending any given signal is recorded and passed
  through to userland in the siginfo.

* Fixes the 'lightdm' X display manager. lightdm relies on si_pid
  support. In addition, note that avoiding long lightdm-related
  latencies and timeouts requires a softlink from libmozjs-52.so to
  libmozjs-52.so.0 (must be addressed in dports, not addressed in
  this commit).

Loosely-taken-from: FreeBSD (wait6, waitid support only)
Reviewed-by: swildner
kernel - Remove SMP bottlenecks on uidinfo, descriptors, and lockf

* Use an eventcounter and the per-thread fd cache to fix bottlenecks
  in checkfdclosed(). This will work well for the vast majority of
  applications and test benches.

* Batch holdfp*() operations on kqueue collections when implementing
  poll() and select(). This significantly improves performance.
  Full scaling has not yet been achieved, however.

* Increase copyin item batching from 8 to 32 for select() and poll().

* Give the uidinfo structure a pcpu array to hold the posixlocks and
  openfiles count fields, with a rollup contained in the uidinfo
  structure itself. This removes numerous global bottlenecks related
  to open(), close(), dup*(), and lockf operations (posixlocks
  count).

  ui_openfiles will force a rollup on limit reached to be sure that
  the limit was actually reached. ui_posixlocks stays fairly loose.
  Each cpu generally rolls up only when the pcpu count exceeds +32 or
  goes below -32.

* Give the proc structure a pcpu array for the same counts, in order
  to properly support seteuid() and such.

* Replace P_ADVLOCK with a char field proc->p_advlock_flag, and
  remove token operations around the field.
kernel - per-thread fd cache, p_fd lock bypass

* Implement a per-thread (fd,fp) cache. Cache hits can keep fp's in
  a held state (avoiding the need to fhold()/fdrop() the ref count)
  and bypass the p_fd spinlock. This allows the file pointer
  structure to generally be shared across cpu caches.

* Up to four descriptors can be cached in each thread, LRU. This is
  the common case. Highly threaded programs tend to focus work on
  distinct file descriptors in each thread.

* One file descriptor can be cached in up to four threads. This is a
  significant limitation, though a relatively uncommon case. On a
  cache miss the code drops into the normal shared p_fd spinlock
  lookup.