kernel - Refactor sysclock_t from 32 to 64 bits

* Refactor the core cpu timer API, changing sysclock_t from 32 to 64 bits. Provide a full 64-bit count from all sources.

* Implement muldivu64() using gcc's 128-bit integer type. This function takes three 64-bit values, performs (a * b) / d using a 128-bit intermediate calculation, and returns a 64-bit result. Change all timer scaling functions to use this function, which effectively gives systimers the capability of handling any timeout that fits in 64 bits at the timer's resolution.

* Remove TSC frequency scaling; it is no longer needed. The TSC timer is now used at its full resolution.

* Use atomic_fcmpset_long() instead of a clock spinlock when updating the msb bits for hardware timer sources less than 64 bits wide.

* Properly recalculate existing systimers when the clock source is changed. Existing systimers were not being recalculated, leading to the system failing to boot when time sources had radically different clock frequencies.
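A minimal user-space sketch of the muldivu64() idea, assuming only gcc's 128-bit integer extension as the commit describes; the function body here is illustrative, not the kernel's exact code:

```c
#include <stdint.h>

/*
 * Sketch of muldivu64(): returns (a * b) / d, computing the product
 * in 128 bits so it cannot overflow before the divide.
 */
static uint64_t
muldivu64(uint64_t a, uint64_t b, uint64_t d)
{
	unsigned __int128 t;

	t = (unsigned __int128)a * b;	/* full 128-bit product */
	return ((uint64_t)(t / d));	/* scale back down to 64 bits */
}
```

Because the intermediate is 128 bits wide, a systimer can scale any 64-bit timeout against the timer's frequency without the overflow a 64-bit intermediate would hit.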
kernel - Incidental MPLOCK removal (usched, affinity)

* Affinity code needs to be protected via p->p_token and lwp->lwp_token. Remove use of the mplock.

* If tid is -1, getaffinity() will look up the lowest-numbered thread, and setaffinity() will adjust ALL threads associated with the process.

* usched doesn't need mplock2.h
kernel - Reduce BSS size (2)

* Fix a bunch of other places in the kernel where large BSS arrays are declared. Reduces the kernel image by another ~2MB or so on top of the ~6MB saved in the last commit.

* Primarily these are places where a 'struct thread' is being embedded in a structure which is being declared [MAXCPU]. With MAXCPU at 256 the result is pretty bloated. Changing the embedded thread to a thread pointer removes most of the bloat.
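The savings can be seen with a toy per-cpu structure; the names and the 4KB thread size below are hypothetical stand-ins, not the kernel's actual layout:

```c
#include <stddef.h>

#define MAXCPU 256

/* Hypothetical stand-in for 'struct thread'; the real one is large. */
struct thread {
	char	td_pad[4096];
};

/* Before: embedding the thread costs sizeof(struct thread) per cpu. */
struct percpu_old {
	struct thread	pc_thread;
	int		pc_flags;
};
static struct percpu_old percpu_old_ary[MAXCPU];	/* ~1MB of BSS */

/* After: a pointer costs 8 bytes per cpu; the threads themselves can
 * be allocated at boot only for the cpus that actually exist. */
struct percpu_new {
	struct thread	*pc_thread;
	int		pc_flags;
};
static struct percpu_new percpu_new_ary[MAXCPU];	/* ~4KB of BSS */
```

With MAXCPU at 256 the embedded form reserves BSS for every possible cpu whether present or not, which is where the multi-megabyte bloat comes from.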
kernel: Move semicolon from the definition of SYSINIT() to its invocations.

This affected around 70 of our (more or less) 270 SYSINIT() calls. style(9) advocates that the terminating semicolon be supplied by the invocation, because it can make life easier for editors and other source-code parsing programs.
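The change can be illustrated with a simplified macro (this is not the real SYSINIT() definition from <sys/kernel.h>):

```c
/* Before: the macro body itself ended with ';'. An invocation that
 * also wrote 'MY_SYSINIT_OLD(foo);' then expanded to a stray empty
 * statement, which trips up editors and source parsers. */
#define MY_SYSINIT_OLD(name)	static int name##_done = 1;

/* After: the semicolon is omitted from the definition, so the call
 * site supplies it and the invocation parses as one declaration. */
#define MY_SYSINIT_NEW(name)	static int name##_done = 1

MY_SYSINIT_NEW(foo);	/* terminating ';' now comes from the invocation */
```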
kernel - Refactor cpumask_t to extend cpus past 64, part 1/2

* 64-bit systems only. 32-bit builds use the macros but cannot be expanded past 32 cpus.

* Change cpumask_t from __uint64_t to a structure. This commit implements one 64-bit sub-element (the next one will implement four, for 256 cpus).

* Create a CPUMASK_*() macro API for non-atomic and atomic cpumask manipulation. These macros generally take lvalues as arguments, allowing for a fairly optimal implementation.

* Change all C code operating on cpumasks to use the newly created CPUMASK_*() macro API.

* Compile-test 32-bit and 64-bit. Run-test 64-bit.

* Adjust sbin/usched, usr.sbin/powerd. usched currently needs more work.
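A condensed sketch of what the structure and a few of the CPUMASK_*() macros might look like with one 64-bit sub-element; the macro names follow the commit, but the bodies are illustrative guesses:

```c
#include <stdint.h>

/* cpumask_t as a structure wrapping one 64-bit element (this commit);
 * the follow-up widens ary[] to four elements for 256 cpus. */
typedef struct {
	uint64_t	ary[1];
} cpumask_t;

/* Non-atomic manipulation macros. Taking lvalues lets them assign
 * directly into the caller's mask with no copying through pointers. */
#define CPUMASK_ASSZERO(mask)	 ((mask).ary[0] = 0)
#define CPUMASK_ORBIT(mask, i)	 ((mask).ary[0] |= 1ULL << (i))
#define CPUMASK_TESTBIT(mask, i) (((mask).ary[0] >> (i)) & 1ULL)

/* Small demo: set bit 5 in an empty mask and test membership. */
static int
cpumask_demo(void)
{
	cpumask_t mask;

	CPUMASK_ASSZERO(mask);
	CPUMASK_ORBIT(mask, 5);
	return (CPUMASK_TESTBIT(mask, 5) && !CPUMASK_TESTBIT(mask, 6));
}
```

Wrapping the mask in a struct means existing integer operations on cpumasks no longer compile, which is exactly what forces every call site through the new macro API.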
kernel - Fix panic when usched is used to force a cpu w/the dfly scheduler

* Fix a panic for 'usched dfly:0x1 sh', or other similar cpu-forcing mechanics.

* The scheduler was not being notified of the forced migration, which caused it to assert on a sanity check later on. Add the needed infrastructure.

Reported-by: vsrinivas
kernel - usched_dfly revamp (7), bring back td_release, sysv_sem, weights

* Bring back the td_release kernel priority adjustment.

* sysv_sem now attempts to delay wakeups until after releasing its token.

* Tune default weights.

* Do not depress priority until we've become the uschedcp.

* Fix priority sort for LWKT and usched_dfly to avoid context-switching across all runnable threads twice.
kernel - usched_dfly revamp (6), reimplement shared spinlocks & misc others

* Rename gd_spinlocks_wr to just gd_spinlocks.

* Reimplement shared spinlocks and optimize the shared spinlock path. Contended exclusive spinlocks are less optimal with this change.

* Use shared spinlocks for all file descriptor accesses. This includes not only most IO calls like read() and write(), but also callbacks from kqueue to double-check the validity of a file descriptor.

* Use getnanouptime() instead of nanouptime() in kqueue_sleep() and kern_kevent(), removing a hardware I/O serialization (to read the HPET) from the critical path.

* These changes significantly reduce kernel spinlock contention when running postgres/pgbench benchmarks.
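A toy sketch of the shared-spinlock idea: readers atomically increment a count while a high bit marks exclusive ownership. This is illustrative user-space C using gcc atomic builtins, not the kernel's spinlock implementation, and it omits contention handling entirely:

```c
#include <stdint.h>

#define SPINLOCK_EXCLUSIVE	0x80000000U	/* hypothetical writer bit */

struct spinlock {
	uint32_t	counta;		/* shared-holder count + flags */
};

static void
spin_lock_shared(struct spinlock *spin)
{
	uint32_t v;

	for (;;) {
		v = spin->counta;
		/* Shared acquisition succeeds only while no writer holds
		 * the lock; multiple readers can hold it concurrently,
		 * which is what cheapens read-mostly fd accesses. */
		if ((v & SPINLOCK_EXCLUSIVE) == 0 &&
		    __sync_bool_compare_and_swap(&spin->counta, v, v + 1))
			return;
	}
}

static void
spin_unlock_shared(struct spinlock *spin)
{
	__sync_fetch_and_sub(&spin->counta, 1);
}

/* Demo: two concurrent shared holders are permitted. */
static uint32_t
spin_demo(void)
{
	struct spinlock spin = { 0 };
	uint32_t held;

	spin_lock_shared(&spin);
	spin_lock_shared(&spin);
	held = spin.counta;		/* both holds visible in the count */
	spin_unlock_shared(&spin);
	spin_unlock_shared(&spin);
	return (held);
}
```

The trade-off the commit notes falls out of this shape: readers only need one atomic increment, but a contended exclusive acquirer must now wait for the whole shared count to drain.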
kernel - usched_dfly revamp (4), improve tail

* Improve tail performance (many more cpu-bound processes than available cpus).

* Experiment with removing the LWKT priority adjustments for kernel vs user. Instead give LWKT a hint about the user scheduler when scheduling a thread. LWKT's round-robin is left unhinted to hopefully round-robin starved LWKTs running in kernel mode.

* Implement a better calculation for the per-thread uload than the priority. Instead, use estcpu.

* Adjust default weightings for the new uload calculation scale.
kernel - usched_dfly revamp

* NOTE: This introduces a few regressions at high loads. They've been identified and will be fixed in another iteration. We've identified an issue with weight2. When weight2 successfully schedules a process pair on the same cpu, it can lead to inefficiencies elsewhere in the scheduler related to user-mode and kernel-mode priority switching. In this situation, testing pgbench/postgres pairs (e.g. -j $ncpus -c $ncpus), we sometimes see serious regressions on multi-socket machines, and other times see remarkably high performance.

* Fix a reported panic.

* Revamp the weights and algorithms significantly. Fix algorithmic errors and improve the accuracy of weight3. Add weight4, which basically tells the scheduler to try harder to find a free cpu to schedule the lwp on when the current cpu is busy doing something else.