kernel - Add usched_dfly algorithm, set as default for now (8)
* Fix additional edge cases, in particular improving the process pairing
algorithm to reduce flapping.
* Reorder conditionals in dd->uschedcp assignment to improve the hot path.
* Rewrite the balancing rover. The rover will now move one process per
tick from a very heavily loaded cpu queue to a lightly loaded cpu queue.
The rover iterates over the target cpus, one target per tick.
* Reformulate dfly_chooseproc_locked() and friends. Add a capability to
choose the 'worst' process (from the end of the queue), which is used
by the rover.
* When pulling a random thread we require the queue it is taken from to
be MUCH more heavily loaded than our own queue, which avoids ping-ponging
processes back and forth when the load is not balanced against the number
of cpu cores (e.g. 6 servers, 4 cores).
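
A minimal userland sketch of the rover and the "much more heavily loaded"
rule described above, not the actual kernel code. Everything here is an
assumption made for illustration: load is approximated by queue length,
and the names runq, PULL_MARGIN, runq_push(), runq_pop_worst(),
busiest_cpu() and rover_tick() are hypothetical and do not appear in
usched_dfly.

```c
#include <stddef.h>

#define NCPUS       4
#define QCAP        16
/* Hypothetical margin: the source queue must hold at least this many
 * more processes than the target before anything is moved. */
#define PULL_MARGIN 2

/* Simplified per-cpu run queue: index 0 is the best process,
 * index len-1 is the 'worst' one. */
struct runq {
    int pids[QCAP];
    int len;
};

struct runq queues[NCPUS];
int rover_target;               /* cpu the rover examines on the next tick */

/* Append pid at the tail. Returns 0 on success, -1 if the queue is full. */
int runq_push(struct runq *q, int pid)
{
    if (q->len >= QCAP)
        return -1;
    q->pids[q->len++] = pid;
    return 0;
}

/* Remove and return the 'worst' process (tail), or -1 if the queue is empty. */
int runq_pop_worst(struct runq *q)
{
    if (q->len == 0)
        return -1;
    return q->pids[--q->len];
}

/* Index of the longest queue; ties go to the lowest cpu number. */
int busiest_cpu(void)
{
    int best = 0;
    for (int i = 1; i < NCPUS; i++) {
        if (queues[i].len > queues[best].len)
            best = i;
    }
    return best;
}

/* One rover tick: look at one target cpu and move at most one process
 * to it from the busiest queue. Returns the pid moved, or -1 if the
 * imbalance was below PULL_MARGIN and nothing was moved. */
int rover_tick(void)
{
    int dst = rover_target;
    int src = busiest_cpu();
    int pid = -1;

    rover_target = (rover_target + 1) % NCPUS;

    if (src != dst && queues[src].len >= queues[dst].len + PULL_MARGIN) {
        pid = runq_pop_worst(&queues[src]);
        runq_push(&queues[dst], pid);
    }
    return pid;
}
```

In this sketch, six processes spread over four queues settle at lengths
2, 2, 1, 1. The remaining difference of one is below PULL_MARGIN, so
further ticks move nothing, which is the ping-pong avoidance the last
item describes.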