kernel - Remove dsched * After consultation, remove dsched from the kernel. The original idea is still valid but the current implementation has had lingering bugs for several years now and we've determined that it's just got its fingers into too many structures. Also, the implementation was designed before SSDs, and doesn't play well with SSDs. * Leave various empty entry points in so we can revisit at some future date.
kern/dsched: Fix a panic at proc exit * When a proc exits, dsched tries to destroy the thread context. But if the thread context is not empty, it needs to wait for the thread's I/Os to be drained. * Add a callback to wake up the thread when the last queued I/O is completed and resume the in-progress destruction. * Should fix #2645
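The deferred-destruction idea in the commit above can be modelled in user space. This is a minimal single-threaded sketch, not the kernel code: `struct thread_ctx` and every `ctx_*` function are hypothetical names, and the real sleep/wakeup of the exiting thread is replaced by a direct call from the completion path.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model of a thread context; not the kernel struct. */
struct thread_ctx {
    int  queued_ios;   /* I/Os still in flight for this thread */
    bool exiting;      /* destruction was requested at proc exit */
    bool destroyed;    /* set once teardown has actually run */
};

/* Final teardown; only legal once no I/O references the context. */
static void ctx_finish_destroy(struct thread_ctx *ctx)
{
    assert(ctx->queued_ios == 0);
    ctx->destroyed = true;
}

/* Account for one newly queued I/O. */
static void ctx_io_queued(struct thread_ctx *ctx)
{
    ctx->queued_ios++;
}

/* Called at exit: destroy now if idle, otherwise defer to the last completion. */
static void ctx_request_destroy(struct thread_ctx *ctx)
{
    ctx->exiting = true;
    if (ctx->queued_ios == 0)
        ctx_finish_destroy(ctx);
}

/* Completion callback: the last completion resumes a pending destruction. */
static void ctx_io_done(struct thread_ctx *ctx)
{
    assert(ctx->queued_ios > 0);
    if (--ctx->queued_ios == 0 && ctx->exiting)
        ctx_finish_destroy(ctx);
}
```

In this model, a context with two queued I/Os survives `ctx_request_destroy()` and is torn down only by the second `ctx_io_done()` call.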
kernel - Performance tuning * Use a shared lock in the exec*() code, open, close, chdir, fchdir, access, stat, and readlink. * Adjust nlookup() to allow the last namecache record in a path to be locked shared if it is already resolved, and the caller requests it. * Remove nearly all global locks from critical dsched paths. Defer creation of the tdio until an I/O actually occurs (huge savings in the fork/exit paths). * Improves fork/exec concurrency on monster of static binaries from 14200/sec to 55000/sec+. For dynamic binaries, improves from around 2500/sec to 9000/sec or so (48 cores fork/exec'ing different dynamic binaries). For the same dynamic binary it's more around 5000/sec or so. Lots of issues here, including the fact that all dynamic binaries load many shared resources, even when the binaries are different programs, AKA libc.so.X and ld-elf.so.2, as well as /dev/urandom (from libc), and access numerous common path elements. Nearly all of these paths are now non-contending. The major remaining contention is in per-vm_page/PMAP manipulation. This is per-page, and concurrent execs of the same program tend to pipeline, so it isn't a big problem.
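The "defer creation of the tdio until an I/O actually occurs" point is a lazy-allocation pattern. The sketch below only illustrates that pattern in user space. `struct tdio_model`, `struct thread_model` and the `thread_model_*` helpers are hypothetical and do not mirror the kernel structures or their locking.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-thread I/O state; allocated only when first needed. */
struct tdio_model {
    unsigned long ios_issued;
};

/* Hypothetical thread record: tdio stays NULL until the first I/O. */
struct thread_model {
    struct tdio_model *tdio;
};

/* Fork path: no allocation at all, just a NULL pointer. */
static void thread_model_init(struct thread_model *td)
{
    td->tdio = NULL;
}

/* Return the tdio, creating it on first use. NULL on allocation failure. */
static struct tdio_model *thread_model_get_tdio(struct thread_model *td)
{
    if (td->tdio == NULL)
        td->tdio = calloc(1, sizeof(*td->tdio));
    return td->tdio;
}

/* I/O path: the only place that pays for the allocation. */
static int thread_model_issue_io(struct thread_model *td)
{
    struct tdio_model *tdio = thread_model_get_tdio(td);

    if (tdio == NULL)
        return -1;
    tdio->ios_issued++;
    return 0;
}

/* Exit path: free(NULL) is a no-op, so threads that never did I/O pay nothing. */
static void thread_model_exit(struct thread_model *td)
{
    free(td->tdio);
    td->tdio = NULL;
}
```

A thread that forks and exits without touching a disk never allocates or frees a tdio in this model, which is the saving the commit message attributes to the fork/exit paths.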
kernel - Fix races in disk iteration and diskctx handling * Add disk->d_refs to prevent a disk structure from being destroyed out from under an iteration. * Redo the disk_enumeration() API to use markers and d_refs. * Make adjustments to the dsched API. In particular, do not return unreferenced tdio pointers in situations where they aren't used by the caller. * Properly implement the ref count on the tdio's, one for each of the two lists the tdio belongs to, and ensure that dsched_thread_io_alloc() keeps an extra ref on the tdio after releasing the diskctx lock to prevent it from being ripped out while the code is pondering whether to place the tdio on the tdctx list. * When deleting the tdio's for a tdctx try to destroy the diskctx. That is, simply dereferencing it from the thread is not sufficient. * When deleting the tdio's for a diskctx try to destroy the tdctx. That is, simply dereferencing it from the diskctx is not sufficient. * Handle destroy/ref races.
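The reference-counting rule described above (one ref per list membership, plus a temporary ref held across the unlocked window) can be sketched as follows. This is a single-threaded illustration under stated assumptions: all names are hypothetical, locks appear only as comments, and a `freed` flag stands in for the actual free so the state stays inspectable.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a tdio that can sit on two lists at once. */
struct tdio_ref_model {
    int  refs;             /* one per list membership, plus temporary refs */
    bool on_diskctx_list;
    bool on_tdctx_list;
    bool freed;            /* stands in for the real free */
};

static void tdio_model_ref(struct tdio_ref_model *t)
{
    assert(!t->freed);
    t->refs++;
}

static void tdio_model_unref(struct tdio_ref_model *t)
{
    assert(t->refs > 0);
    if (--t->refs == 0)
        t->freed = true;
}

/* Allocation path: hold a temporary ref while deciding about the tdctx list. */
static void tdio_model_alloc(struct tdio_ref_model *t, bool want_tdctx)
{
    t->refs = 0;
    t->on_diskctx_list = t->on_tdctx_list = t->freed = false;

    tdio_model_ref(t);              /* temporary ref owned by the allocator */
    tdio_model_ref(t);              /* ref owned by the diskctx list */
    t->on_diskctx_list = true;
    /* (diskctx lock would be released here; the temporary ref keeps t alive) */
    if (want_tdctx) {
        tdio_model_ref(t);          /* ref owned by the tdctx list */
        t->on_tdctx_list = true;
    }
    tdio_model_unref(t);            /* drop the temporary ref */
}

/* Leaving a list drops exactly the ref that list owned. */
static void tdio_model_leave_diskctx(struct tdio_ref_model *t)
{
    if (t->on_diskctx_list) {
        t->on_diskctx_list = false;
        tdio_model_unref(t);
    }
}

static void tdio_model_leave_tdctx(struct tdio_ref_model *t)
{
    if (t->on_tdctx_list) {
        t->on_tdctx_list = false;
        tdio_model_unref(t);
    }
}
```

With this accounting, removal from one list never frees a tdio that the other list still points at; the object goes away only when the last owner lets go.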
dsched - Add request polling wrapper * Add a request polling emulation layer to dsched. This emulates request polling, as if a disk driver polled for requests instead of having requests actively pushed down to it. * The policy->polling_func() callback is called whenever a BIO completes. * Fields in the diskctx showing the current tag queue depth and the maximum tag queue depth (currently a fixed value of 32) are used directly by the policies that use request polling; the limit is not enforced in the dsched layer. That is, a policy using request polling emulation should take care not to have (many) more BIOs in flight than max_tag_queue_depth. Sponsored-by: Google Summer of Code
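A rough user-space model of the polling emulation may help. Only `polling_func`, `max_tag_queue_depth` and the value 32 come from the commit message; the struct layout, the other field names and the helper functions are assumptions made for illustration, and BIOs are reduced to counters.

```c
#include <assert.h>

#define MODEL_MAX_TAG_QUEUE_DEPTH 32   /* fixed value quoted in the commit */

/* Hypothetical disk context; real BIOs are reduced to counters here. */
struct diskctx_model {
    int current_tag_queue_depth;   /* BIOs currently in flight */
    int max_tag_queue_depth;       /* advisory limit, honoured by the policy */
    int pending;                   /* BIOs the policy is holding back */
    void (*polling_func)(struct diskctx_model *);
};

/* Policy side: dispatch held-back BIOs while there is room in the tag queue. */
static void policy_model_poll(struct diskctx_model *d)
{
    while (d->pending > 0 &&
           d->current_tag_queue_depth < d->max_tag_queue_depth) {
        d->pending--;
        d->current_tag_queue_depth++;
    }
}

static void diskctx_model_init(struct diskctx_model *d)
{
    d->current_tag_queue_depth = 0;
    d->max_tag_queue_depth = MODEL_MAX_TAG_QUEUE_DEPTH;
    d->pending = 0;
    d->polling_func = policy_model_poll;
}

/* A new BIO arrives: the policy queues it and tries to dispatch right away. */
static void policy_model_queue_bio(struct diskctx_model *d)
{
    d->pending++;
    d->polling_func(d);
}

/* Emulation layer: every completion frees a tag and "polls" the policy. */
static void diskctx_model_bio_done(struct diskctx_model *d)
{
    assert(d->current_tag_queue_depth > 0);
    d->current_tag_queue_depth--;
    d->polling_func(d);
}
```

Note that the depth check lives in `policy_model_poll()`, not in the completion path, mirroring the statement that the limit is the policy's responsibility rather than something the dsched layer enforces.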
dsched - Add debugging & fix rare problem conditions * Add a bunch of debugging to see whether a particular tdio was initialized and to which policy it belongs. * Reorder some locking to ensure the whole switch of policy is protected as expected. * Make sure the tdio from the newest policy is used when there are tdios for several scheduling policies in the tdctx->tdio_list. Reported-by: Brills Peng
dsched - serno support * Add support for serno loader tunables, e.g. dsched.policy.WD1293193 = "fq". * Incidentally do another name change on the loader tunables from kern.dsched.* to dsched.* to have the same namespace as the sysctls. * Add a sysctl dsched.policies showing the currently available policies.
dsched - expand framework to track threads * The dsched framework now takes care of tracking threads/procs and bufs. Most of this code was factored out of dsched_fq. * fq now uses the new, much simplified API, reducing the lines of code by about 50%. * This will also allow for runtime policy switching, even to other policies that need to track procs/threads. Previously it was only possible to have one policy that tracked threads. * Now all policies can be loaded at any time and will still be able to track all the threads. * dsched_fq is now a module that can be loaded if required. Once loaded, the policy is registered and ready to use with any disk. * There is also a kernel option DSCHED_FQ now; otherwise dsched_fq_load="YES" has to be set in loader.conf to be able to use fq from boot onward. * Make a dsched sysctl tree. Suggested-by: Aggelos Economopoulos
dsched, dsched_fq - Major cleanup SHORT version: major cleanup and rename to useful names LONG version: dsched: * Unify dsched_ops with dsched_policy, remove fallout * Rename dsched_{create,destroy} -> dsched_disk_{create,destroy}_callback * Kill .head in dsched_policy * Kill dead code dsched_fq: * Rename fqp -> thread_io/tdio * Rename fqmp -> thread_ctx/tdctx * Rename dpriv -> disk_ctx/diskctx * Several related renames of functions (alloc/ref/unref). * Remove dead code * rename tdctx->s_* -> tdctx->interval_* * comment struct members * ... and some more changes I probably forgot Huge-Thanks-To: Aggelos Economopoulos