kernel - Change time_second to time_uptime for all expiration calculations

* Vet the entire kernel and change use cases for expiration
  calculations using time_second to use time_uptime instead.

* Protects these expiration calculations from step changes in the wall
  time, which is particularly needed for route table entries.

* Probably requires further variable type adjustments, but the use of
  time_uptime instead of time_second is highly unlikely to ever overrun
  any demotions to int still present.
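A minimal sketch of the pattern this change applies; the cache entry
struct, field names, and TTL below are illustrative assumptions, not
code from the commit.  Stamping with the monotonic time_uptime means a
step adjustment of the wall clock (settimeofday(2), ntpd) can no longer
prematurely expire an entry or keep it alive indefinitely:

    #include <sys/time.h>           /* time_second, time_uptime */

    /* Hypothetical cached object used only for illustration. */
    struct cache_entry {
            time_t  expire;         /* expiration stamp, uptime seconds */
    };

    #define CACHE_TTL       600     /* hypothetical 10 minute lifetime */

    /* Old pattern: wall clock, vulnerable to step changes. */
    static void
    entry_arm_old(struct cache_entry *ce)
    {
            ce->expire = time_second + CACHE_TTL;
    }

    /* New pattern: monotonic uptime, immune to wall-clock steps. */
    static void
    entry_arm(struct cache_entry *ce)
    {
            ce->expire = time_uptime + CACHE_TTL;
    }

    static int
    entry_expired(const struct cache_entry *ce)
    {
            return (time_uptime > ce->expire);
    }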
clock/tsc: Detect invariant TSC CPU synchronization

The detected result can be used to determine whether the TSC can be
used as the cputimer, and by other consumers, e.g. CoDel AQM packet
time stamping.

- Only the invariant TSC is tested.

- If there is only one CPU, the invariant TSC is always synchronized.

- Only CPUs from Intel are tested (*).

The test is conducted using the lwkt_cpusync interfaces: the BSP reads
the TSC, then asks the APs to read the TSC.  If the TSC read on any AP
is less than the BSP's TSC, the invariant TSC is not synchronized
across CPUs.  Currently the test runs for ~100ms.

(*) AMD family 15h model 00h-0fh may also have a synchronized TSC
    across CPUs, as pointed out by vsrinivas@.  However, according to
    AMD:
        <Revision Guide for AMD Family 15h Models 00h-0Fh Processors
         Rev. 3.18 October 2012>
        759 One Core May Observe a Time Stamp Counter Skew
    AMD family 15h model 00h-0fh is _not_ ready yet.
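A condensed sketch of the test's logic.  run_on_remote_cpus() is a
hypothetical stand-in for the actual lwkt_cpusync interlock/deinterlock
sequence; it is assumed to run the callback on every other CPU and wait
for completion:

    #include <sys/types.h>
    #include <machine/cpufunc.h>    /* rdtsc() */

    /* Hypothetical rendezvous helper, see note above. */
    void    run_on_remote_cpus(void (*func)(void *), void *arg);

    static volatile int     tsc_synced = 1;
    static uint64_t         bsp_tsc;

    /* Runs on each AP via the rendezvous. */
    static void
    tsc_sync_test_remote(void *arg __unused)
    {
            /*
             * The BSP read its TSC strictly before this callback was
             * started, so a synchronized invariant TSC must read a
             * value >= bsp_tsc here.
             */
            if (rdtsc() < bsp_tsc)
                    tsc_synced = 0;
    }

    static void
    tsc_sync_test(void)
    {
            bsp_tsc = rdtsc();
            run_on_remote_cpus(tsc_sync_test_remote, NULL);
    }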
clock/tsc: Detect invariant TSC

According to Intel's description:
"The invariant TSC will run at a constant rate in all ACPI P-, C-, and
T-states."

The difference between the invariant TSC and the constant TSC is that
the invariant TSC is not affected by frequency changes or deep ACPI
C-states.

The constant TSC could be detected based on the CPU model (Intel has
the model list, while there is no such information in AMD's documents);
the constant TSC is not detected yet.
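On both Intel and AMD the invariant TSC is advertised by CPUID leaf
0x80000007, EDX bit 8.  A minimal sketch of that detection using the
BSD-style do_cpuid() wrapper; the constants are defined locally here
for illustration (the bit corresponds to AMDPM_TSC_INVARIANT in the
BSD <machine/specialreg.h>):

    #include <machine/cpufunc.h>    /* do_cpuid() */

    #define CPUID_APM_LEAF          0x80000007
    #define APM_TSC_INVARIANT       0x00000100      /* EDX bit 8 */

    /*
     * Returns non-zero if the CPU advertises an invariant TSC, i.e.
     * a TSC ticking at a constant rate in all P-, C- and T-states.
     */
    static int
    tsc_is_invariant(void)
    {
            u_int regs[4];

            /* Make sure the extended leaf exists at all. */
            do_cpuid(0x80000000, regs);
            if (regs[0] < CPUID_APM_LEAF)
                    return (0);

            do_cpuid(CPUID_APM_LEAF, regs);
            return ((regs[3] & APM_TSC_INVARIANT) != 0);
    }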
kernel - usched_dfly revamp (6), reimplement shared spinlocks & misc others

* Rename gd_spinlocks_wr to just gd_spinlocks.

* Reimplement shared spinlocks and optimize the shared spinlock path.
  Contended exclusive spinlocks are less optimal with this change.

* Use shared spinlocks for all file descriptor accesses.  This includes
  not only most I/O calls like read() and write(), but also callbacks
  from kqueue to double-check the validity of a file descriptor.

* Use getnanouptime() instead of nanouptime() in kqueue_sleep() and
  kern_kevent(), removing a hardware I/O serialization (to read the
  HPET) from the critical path.

* These changes significantly reduce kernel spinlock contention when
  running postgres/pgbench benchmarks.
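For the timestamp change in particular: nanouptime() reads the
underlying hardware timecounter (e.g. the HPET) on every call, while
getnanouptime() returns a cheaper, tick-granularity cached timestamp,
which is sufficient for computing sleep timeouts.  A sketch of the
substitution in a hypothetical deadline helper (not the actual
kqueue_sleep() code):

    #include <sys/time.h>

    /* Hypothetical helper; illustrates the substitution only. */
    static void
    compute_deadline(struct timespec *deadline,
                     const struct timespec *timeout)
    {
            struct timespec now;

            getnanouptime(&now);    /* was: nanouptime(&now) */

            deadline->tv_sec = now.tv_sec + timeout->tv_sec;
            deadline->tv_nsec = now.tv_nsec + timeout->tv_nsec;
            if (deadline->tv_nsec >= 1000000000L) {
                    deadline->tv_sec++;
                    deadline->tv_nsec -= 1000000000L;
            }
    }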
intr: Per-cpu MI interrupt information array

- Interrupt information is only recorded in its target CPU's interrupt
  information array.

- Interrupt threads, emergency polling threads, interrupt livelock
  processing and hardware interrupt thread scheduling only access the
  interrupt information of the CPU they are running on; they have
  already been locked to the interrupt's target CPU.

- The location of SWI information is saved in a global array,
  swi_info_ary: since scheduling an SWI does not necessarily happen on
  the CPU its thread is running on, we need a quick and correct way to
  find the SWI information.

- Factor out sched_ithd_intern(), which accepts interrupt information
  (struct intr_info) instead of an interrupt number.  Split the
  original sched_ithd() into sched_ithd_soft(), which schedules SWI
  threads, and sched_ithd_hard(), which schedules hardware interrupt
  threads.

- vmstat(8) interrupt reporting with -v is augmented to print each
  interrupt's target CPU.

This paves the way for the per-cpu MD interrupt description table.
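A sketch of the resulting layout; the array sizes, struct fields, and
use of mycpuid below are illustrative assumptions rather than the
commit's exact definitions:

    #include <sys/param.h>          /* MAXCPU */
    #include <sys/globaldata.h>     /* mycpuid */

    #define MAX_INTS        256     /* hypothetical per-cpu slots */
    #define MAX_SOFTINTS    32      /* hypothetical number of SWIs */

    struct intr_info {
            int     i_intr;         /* interrupt number */
            int     i_cpuid;        /* target CPU owning this entry */
            /* ... handler list, livelock state, ithread ... */
    };

    static void     sched_ithd_intern(struct intr_info *info);

    /*
     * One interrupt information array per CPU.  An interrupt's entry
     * lives only in its target CPU's array, so interrupt threads,
     * emergency polling and livelock processing never touch remote
     * state.
     */
    static struct intr_info intr_info_ary[MAXCPU][MAX_INTS];

    /*
     * SWIs may be scheduled from any CPU, so their entries are also
     * reachable through a global lookup table.
     */
    struct intr_info        *swi_info_ary[MAX_SOFTINTS];

    /* Hardware interrupt: caller is already on the target CPU. */
    void
    sched_ithd_hard(int intr)
    {
            sched_ithd_intern(&intr_info_ary[mycpuid][intr]);
    }

    /* Software interrupt: may be scheduled from any CPU. */
    void
    sched_ithd_soft(int swi)
    {
            sched_ithd_intern(swi_info_ary[swi]);
    }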