.\" .\" Copyright (c) 2010 .\" The DragonFly Project. All rights reserved. .\" .\" Redistribution and use in source and binary forms, with or without .\" modification, are permitted provided that the following conditions .\" are met: .\" .\" 1. Redistributions of source code must retain the above copyright .\" notice, this list of conditions and the following disclaimer. .\" 2. Redistributions in binary form must reproduce the above copyright .\" notice, this list of conditions and the following disclaimer in .\" the documentation and/or other materials provided with the .\" distribution. .\" 3. Neither the name of The DragonFly Project nor the names of its .\" contributors may be used to endorse or promote products derived .\" from this software without specific, prior written permission. .\" .\" THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS .\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT .\" LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS .\" FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE .\" COPYRIGHT HOLDERS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, .\" INCIDENTAL, SPECIAL, EXEMPLARY OR CONSEQUENTIAL DAMAGES (INCLUDING, .\" BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; .\" LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED .\" AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, .\" OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT .\" OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF .\" SUCH DAMAGE. .\" .Dd August 7, 2012 .Dt DSCHED 9 .Os .Sh NAME .Nm dsched , .Nm dsched_cancel_bio , .Nm dsched_debug , .Nm dsched_disk_ctx_ref , .Nm dsched_disk_ctx_unref , .Nm dsched_new_policy_thread_tdio , .Nm dsched_register , .Nm dsched_strategy_async , .Nm dsched_strategy_raw , .Nm dsched_thread_io_ref , .Nm dsched_thread_io_unref , .Nm dsched_unregister , .Nm DSCHED_POLICY_MODULE , .Nm DSCHED_DISK_CTX_LOCK , .Nm DSCHED_DISK_CTX_UNLOCK , .Nm DSCHED_THREAD_IO_LOCK , .Nm DSCHED_THREAD_IO_UNLOCK , .Nm dsched_get_bio_dp , .Nm dsched_get_bio_priv , .Nm dsched_get_disk_priv .Nd kernel disk scheduler framework .Sh SYNOPSIS .In sys/dsched.h .Pp Functions: .Ft void .Fn dsched_cancel_bio "struct bio *bp" .Ft int .Fn dsched_debug "int level" "char *fmt" "..." .Ft void .Fn dsched_disk_ctx_ref "struct dsched_disk_ctx *diskctx" .Ft void .Fn dsched_disk_ctx_unref "struct dsched_disk_ctx *diskctx" .Ft struct dsched_thread_io * .Fn dsched_new_policy_thread_tdio "struct dsched_disk_ctx *diskctx" "struct dsched_policy *pol" .Ft int .Fn dsched_register "struct dsched_policy *d_policy" .Ft void .Fn dsched_strategy_async "struct disk *dp" "struct bio *bp" "biodone_t *done" "void *priv" .Ft void .Fn dsched_strategy_raw "struct disk *dp" "struct bio *bp" .Ft void .Fn dsched_thread_io_ref "struct dsched_thread_io *tdio" .Ft void .Fn dsched_thread_io_unref "struct dsched_thread_io *tdio" .Ft int .Fn dsched_unregister "struct dsched_policy *d_policy" .Pp Macros: .Fn DSCHED_POLICY_MODULE "name" "modeventhand_t evh" "version" .Fn DSCHED_DISK_CTX_LOCK "struct dsched_disk_ctx *diskctx" .Fn DSCHED_DISK_CTX_UNLOCK "struct dsched_disk_ctx *diskctx" .Fn DSCHED_THREAD_IO_LOCK "struct dsched_thread_io *tdio" .Fn DSCHED_THREAD_IO_UNLOCK "struct dsched_thread_io *tdio" .Fn dsched_get_bio_dp "struct bio *bio" .Fn dsched_get_bio_priv "struct bio *bio" .Fn dsched_get_disk_priv "struct disk *dp" "void *priv" .Pp Callbacks: .Ft typedef int .Fn dsched_prepare_t "struct dsched_disk_ctx *diskctx" .Ft typedef void .Fn dsched_teardown_t "struct dsched_disk_ctx *diskctx" .Ft typedef void .Fn dsched_cancel_t "struct dsched_disk_ctx *diskctx" .Ft typedef int .Fn dsched_queue_t "struct dsched_disk_ctx *diskctx" "struct dsched_thread_io *tdio" "struct bio *bio" .Ft typedef void .Fn dsched_new_tdio_t "struct dsched_thread_io *tdio" .Ft typedef void .Fn dsched_destroy_tdio_t "struct dsched_thread_io *tdio" .Ft typedef void .Fn dsched_new_diskctx_t "struct dsched_disk_ctx *diskctx" .Ft typedef void .Fn dsched_destroy_diskctx_t "struct dsched_disk_ctx *diskctx" .Sh DESCRIPTION To create a new dsched policy .Sq foo the following is required: .Bd -literal DSCHED_POLICY_MODULE(dsched_foo, foo_mod_handler, 1); struct dsched_policy dsched_foo_policy = { .name = "foo", .prepare = foo_prepare, .teardown = foo_teardown, .cancel_all = foo_cancel, .bio_queue = foo_queue, /* The following are optional */ .new_tdio = foo_tdio_ctor, .new_diskctx = foo_diskctx_ctor, .destroy_tdio = foo_tdio_dtor, .destroy_diskctx = foo_diskctx_dtor }; .Ed .Pp The .Fa name is the unique identifier of the dsched policy and the name the user specifies to set this .Nm policy. .Pp The .Fa prepare callback is called whenever the new .Nm policy is set for a new disk. This can be used to create per disk threads for the .Nm policy instance. Note that any thread created during .Fa prepare will not have a .Ft dsched_thread_ctx or .Ft dsched_thread_io associated with it. If this is required because the thread will do I/O, the thread itself needs to call .Fn dsched_new_policy_thread_tdio . .Pp The .Fa teardown callback is called whenever a .Nm policy is unset/detached from a disk or when a disk is disconnected. It should clean up all per-disk resources such as any thread created in .Fa prepare . The .Nm framework guarantees that no more calls to any other method such as .Fa bio_queue will occur once .Fa teardown has been called. .Pp The .Fa cancel_all callback is called immediately before .Fa teardown . It is required to cancel all .Vt bio Ns s currently queued or stalled in the .Nm policy instance for the given disk. The .Nm framework guarantees that no more calls to any other method such as .Fa bio_queue will occur once .Fa cancel_all has been called. .Pp The .Fa bio_queue callback is called for every .Vt bio intended for the disk(s) with the given .Nm policy. It needs to either dispatch it, queue it in any other form for later dispatch, or return a non-zero return value, in which case the .Nm framework will dispatch that .Vt bio directly. If the function took care of the .Vt bio and does not want dsched to dispatch it, 0 must be returned. .Pp The .Fa new_tdio callback is called for every .Vt dsched_thread_io created for a disk with this .Nm policy. Similarly, the .Fa destroy_tdio callback is called on destruction (release of all references) of the .Vt dsched_thread_io . These functions don't have to be specified; if they are left out or set to .Dv NULL they simply won't be called. .Pp The .Fa new_diskctx callback is called for every .Vt dsched_disk_ctx created for a disk with this .Nm policy. Similarly, the .Fa destroy_diskctx callback is called on destruction (release of all references) of the .Vt dsched_disk_ctx . These functions don't have to be specified; if they are left out or set to .Dv NULL , they simply won't be called. .Pp For convenience, the structs .Vt dsched_thread_io and .Vt dsched_disk_ctx are allocated with plenty of spare space, so that each policy can extend these, for example as follows: .Bd -literal struct foo_thread_io { struct dsched_thread_io head; int foo; int bar; }; struct foo_disk_ctx { struct dsched_disk_ctx head; int foo; int bar; }; CTASSERT(sizeof(struct foo_thread_io) <= DSCHED_THREAD_IO_MAX_SZ); CTASSERT(sizeof(struct foo_disk_ctx) <= DSCHED_DISK_CTX_MAX_SZ); .Ed .Pp It is important that the first member of the new struct is one of type .Vt dsched_thread_io or .Vt dsched_disk_ctx , respectively. The .Fn CTASSERT must be used to ensure that the new structs fit into the space provided by .Nm dsched . Not including these asserts can cause serious and difficult to debug issues. For all the functions described in .Sx FUNCTIONS that require a .Vt dsched_thread_io or .Vt dsched_disk_ctx , the address of the .Fa head element should be passed, or alternatively the address of the new struct be cast to the right type and that passed. .Sh FUNCTIONS The .Fn DSCHED_POLICY_MODULE macro declares a .Nm policy kernel module. .Fa evh is the event handler for the module (see .Xr DECLARE_MODULE 9 for more information). The event handler is supposed to register a .Nm policy with .Fn dsched_register on load and to unregister it using .Fn dsched_unregister when it is unloaded. .Fa version is the version number of the module (see .Xr MODULE_VERSION 9 for more information). .Pp The .Fn dsched_strategy_async function dispatches a .Vt bio Fa bp in an asynchronous manner to the disk specified by .Fa dp . The private data .Fa priv will be attached to the .Vt bio and is later retrievable via .Fn dsched_get_bio_priv . The .Vt biodone_t routine .Fa done will be called once the .Vt bio completes. The .Fa done routine can use .Fn dsched_get_disk_priv , .Fn dsched_get_bio_dp and .Fn dsched_get_bio_priv to retrieve the context. Since .Fn dsched_strategy_async also saves the current time (via .Fn getmicrotime ) in .Fa bio->bio_caller_info3.tv , the .Fa done routine can also calculate the time passed from dispatch to completion by getting the current time again (via .Fn getmicrotime ) and calculating the timeval difference to the value stored in .Fa bio->bio_caller_info3.tv . At the end of the .Fa done routine it needs to call .Fn pop_bio and .Fn biodone as for any other .Vt biodone_t routine. .Pp The .Fn dsched_cancel_bio function cancels the .Vt bio and sets .Er ENXIO as error on the buf. .Pp The .Fn dsched_strategy_raw function simply dispatches the .Vt bio directly to the disk specified by .Fa dp using .Fn dev_dstrategy . .Pp The .Fn dsched_debug function works as a conditional .Fn kprintf . Depending on the setting of the .Va dsched.debug .Xr sysctl 8 variable, the debug info will be shown or not. .Pp The .Fn dsched_register function registers the policy described by .Fa d_policy as a valid .Nm policy which can then be used as a scheduler policy for the disks. If a policy with the given name already exists, .Er EEXIST is returned (otherwise 0). .Pp The .Fn dsched_unregister function unregisters the policy described by .Fa d_policy . The given .Nm policy will no longer be valid as a scheduler policy. If the given policy is currently in use, .Er EBUSY will be returned and the policy won't be unregistered; otherwise 0 is returned. .Pp The .Fn DSCHED_THREAD_IO_LOCK and .Fn DSCHED_THREAD_IO_UNLOCK functions lock and unlock a .Vt dsched_thread_io .Fa tdio , respectively. The lock must be held whenever the members .Fa queue and .Fa qlength are manipulated to avoid messing up the .Vt TAILQ . It can also be used to serialize any other access to the derived .Vt foo_thread_io members. .Pp The .Fn DSCHED_DISK_CTX_LOCK and .Fn DSCHED_DISK_CTX_UNLOCK functions lock and unlock a .Vt dsched_disk_ctx .Fa diskctx , respectively. The lock must be held whenever the member .Fa queue is manipulated to avoid messing up the .Vt TAILQ . It can also be used to serialize any other access to the derived .Vt foo_disk_ctx members. .Pp The .Fn dsched_thread_io_ref and .Fn dsched_thread_io_unref functions increase and decrease the reference count on a .Vt dsched_thread_io .Fa tdio , respectively. Whenever the reference count drops to 0, the .Fa tdio will be released. Be aware that it is possible that the .Nm framework holds references on the .Fa tdio , too, so it can be that the object is not freed when all references are dropped. .Pp The .Fn dsched_disk_ctx_ref and .Fn dsched_disk_ctx_unref functions increase and decrease the reference count on a .Vt dsched_disk_ctx .Fa diskctx , respectively. Whenever the reference count drops to 0, the .Fa diskctx will be released. Be aware that it is possible that the .Nm framework holds references on the .Fa diskctx , too, so it can be that the object is not freed when all references are dropped. .Pp The .Fn dsched_get_bio_dp , .Fn dsched_get_disk_priv and .Fn dsched_get_bio_priv are intended for use in the .Vt biodone_t routine specified in the call to .Fn dsched_strategy_async . .Fn dsched_get_bio_dp retrieves the .Vt struct disk associated with the .Vt bio . This can then be used to retrieve the .Vt struct dsched_disk_ctx via .Fn dsched_get_disk_priv . The .Fn dsched_get_bio_priv function returns the private data associated with the .Fa bio on the call to .Fn dsched_strategy_async . .Pp The .Fn dsched_new_policy_thread_tdio function must be called from any thread created within the .Fa prepare method that will perform I/O, since these won't have a .Vt dsched_thread_io associated with them. The function returns a new .Vt dsched_thread_io for the current thread, for the .Fa diskctx and .Fa policy specified. .Sh FILES The uncontended path of the .Nm implementation is in .Pa /sys/kern/kern_dsched.c . The data structures are in .Pa /sys/sys/dsched.h . .Sh SEE ALSO .Xr dsched 4 .Sh HISTORY The .Nm framework first appeared in .Dx 2.5 . .Sh AUTHORS The .Nm framework was written by .An Alex Hornung .