dsched - expand framework to track threads

* The dsched framework now takes care of tracking threads/procs and
  bufs. Most of this code was factored out of dsched_fq.

* fq now uses the new, much simplified API, reducing its line count by
  about 50%.

* This also allows for runtime policy switching, even to other policies
  that need to track procs/threads. Previously only one thread-tracking
  policy could exist.

* All policies can now be loaded at any time and will still be able to
  track all threads.

* dsched_fq is now a module that can be loaded if required. Once
  loaded, the policy is registered and ready to use with any disk.

* There is also a kernel option DSCHED_FQ now; otherwise
  dsched_fq_load="YES" has to be set in loader.conf to be able to use
  fq from boot on (see the snippet after this message).

* Make a dsched sysctl tree.

Suggested-by: Aggelos Economopoulos
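For reference, the two ways to make fq available from boot, as described
above (the option and loader variable names come from this commit; the
file locations follow stock conventions):

    # In the kernel configuration file, to compile the policy in:
    options         DSCHED_FQ

    # Or in /boot/loader.conf, to load the policy module at boot:
    dsched_fq_load="YES"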
dsched, dsched_fq - Major cleanup

SHORT version: major cleanup and rename to useful names

LONG version:

dsched:
* Unify dsched_ops with dsched_policy, remove fallout
* Rename dsched_{create,destroy} -> dsched_disk_{create,destroy}_callback
* Kill .head in dsched_policy
* Kill dead code

dsched_fq:
* Rename fqp -> thread_io/tdio
* Rename fqmp -> thread_ctx/tdctx
* Rename dpriv -> disk_ctx/diskctx (see the naming sketch after this
  message)
* Several related renames of functions (alloc/ref/unref)
* Remove dead code
* Rename tdctx->s_* -> tdctx->interval_*
* Comment struct members
* ... and some more changes I probably forgot

Huge-Thanks-To: Aggelos Economopoulos
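A rough sketch of the resulting naming scheme. Only the concept names
and variable names come from this commit; the struct tags assume the
usual fq_ prefix convention, and any member layout is omitted:

    /* Per-disk context; formerly "dpriv", instances now named "diskctx". */
    struct fq_disk_ctx;

    /* Per-thread/process context; formerly "fqmp", now "tdctx". */
    struct fq_thread_ctx;

    /* Per-thread, per-disk I/O context; formerly "fqp", now "tdio". */
    struct fq_thread_io;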
dsched - Add the FQ policy

* Add the FQ (fair queueing) policy for the dsched I/O scheduler
  framework.

* Right now this is at best experimental; it only starts rate limiting
  when the disk is busy. Each process is allocated an equal fair share
  of disk time, based on the average request latency and tps. If the
  disk is busy and a process exceeds its fair share, its bios are
  queued for later dispatch. To avoid starving heavy write processes,
  heavy writes are interleaved once every 3 scheduler rebalances. The
  scheduler rebalance interval is currently set to 1s, so processes
  exceeding their share are limited after this period (see the sketch
  after this message).

* While I've done some limited testing of switching policies at
  runtime, even with heavy I/O going on, doing so is not recommended,
  as some problems will crop up.

* Future work to do:
  - stabilization pass
  - adding bucket support (i.e. having different priority buckets for
    groups of processes, so that for example processes A, B and C get a
    total aggregate of 80% disk time, while processes D and E get a
    total aggregate of 20%, instead of each process getting 20%)
  - adding an "ionice" userland tool to allow changing the
    bucket/priority of a process
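A minimal sketch of the rebalance pass described above, not the actual
dsched_fq code: all struct layouts, field names and function names here
are illustrative assumptions. The only parts taken from this commit are
the busy check, the fair-share idea, the 1s rebalance interval and the
3-pass write interleaving.

    #include <sys/queue.h>

    #define FQ_REBALANCE_INTERVAL   1  /* seconds between rebalance passes */
    #define FQ_WRITE_INTERLEAVE     3  /* let heavy writers through every 3rd pass */

    struct fq_thread_io {                   /* per-thread, per-disk state */
            TAILQ_ENTRY(fq_thread_io) link;
            int     disk_time_used;         /* disk time consumed this interval */
            int     heavy_writer;           /* marked as a heavy write process */
            int     rate_limited;           /* bios get queued, not dispatched */
    };

    struct fq_disk_ctx {                    /* per-disk state */
            TAILQ_HEAD(, fq_thread_io) tdio_list;
            int     disk_busy;              /* averaged busy state of the disk */
            int     avg_latency;            /* average request latency */
            int     tps;                    /* transactions per second */
            int     nthreads;               /* threads doing I/O on this disk */
            int     rebalance_count;        /* rebalance passes so far */
    };

    static void
    fq_dispatch_queued(struct fq_thread_io *tdio)
    {
            /* Dispatch the thread's deferred bios; elided in this sketch. */
            (void)tdio;
    }

    /* Runs once per FQ_REBALANCE_INTERVAL. */
    static void
    fq_rebalance(struct fq_disk_ctx *diskctx)
    {
            struct fq_thread_io *tdio;
            int fair_share;

            /* Rate limiting only kicks in while the disk is busy. */
            if (!diskctx->disk_busy || diskctx->nthreads == 0)
                    return;

            /*
             * Equal fair share of disk time per thread, derived from
             * the average request latency and tps.
             */
            fair_share = diskctx->avg_latency * diskctx->tps / diskctx->nthreads;

            TAILQ_FOREACH(tdio, &diskctx->tdio_list, link) {
                    if (tdio->disk_time_used <= fair_share) {
                            tdio->rate_limited = 0;
                            continue;
                    }
                    /*
                     * Exceeder: queue its bios for later dispatch, but
                     * interleave heavy writers once every
                     * FQ_WRITE_INTERLEAVE passes so they are limited
                     * rather than starved.
                     */
                    tdio->rate_limited = 1;
                    if (tdio->heavy_writer &&
                        (diskctx->rebalance_count % FQ_WRITE_INTERLEAVE) == 0)
                            fq_dispatch_queued(tdio);
            }
            diskctx->rebalance_count++;
    }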