kernel - Add usched_dfly algorithm, set as default for now (3)
* Add a field to the thread structure, td_wakefromcpu. All wakeup()
family calls will load this field with the cpu the thread was woken
up FROM.
* Use this field in usched_dfly to weight scheduling such that pairs
of synchronously-dependent threads (for example, a pgbench thread
and a postgres server process) are placed closer to each other in
the cpu topology.
* Weighting:
- Load matters the most
- The cpu the thread is currently scheduled on is next
- Synchronous wait/wakeup weighting is last
* Tests on monster yield better all-around results with a new all-time
high w/ pgbench -j 40 -c 40 -T 60 -S bench:
25% idle at 40:40 tps = 215293.173300 (excluding connections establishing)
Without the wait/wakeup weighting (but with load and current cpu
weighting):
41% idle at 40:40 tps = 162352.813046 (excluding connections establishing)
Without wait/wakeup or current-cpu weighting. Load balancing only:
43% idle at 40:40 tps = 159047.440641 (excluding connections establishing)
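
For illustration only, the weighting order described above (load first,
current cpu second, wait/wakeup affinity last) could be sketched roughly as
follows. This is not the usched_dfly code. Only the field name
td_wakefromcpu comes from this commit; the struct, the function names, the
weight values, the load units (100 per runnable thread) and the
2-package x 4-core topology are all assumptions made for the example.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical topology: 2 packages x 4 cores (not from the commit). */
#define NCPUS           8
#define CORES_PER_PKG   4

/*
 * Hypothetical weights. Load is assumed to be 100 per runnable thread,
 * so a one-thread load difference (100) outweighs both bonuses combined
 * (50 + 20), and the current-cpu bonus outweighs the wakeup bonus.
 */
#define WEIGHT_CURCPU   50
#define WEIGHT_WAKEFROM 20

struct thread_sketch {
    int td_curcpu;       /* hypothetical: cpu the thread last ran on */
    int td_wakefromcpu;  /* cpu the thread was woken up from, -1 if none */
};

static int
same_package(int a, int b)
{
    return (a / CORES_PER_PKG == b / CORES_PER_PKG);
}

/* What a wakeup()-family call would record, per the commit text. */
static void
sketch_wakeup(struct thread_sketch *td, int waking_cpu)
{
    td->td_wakefromcpu = waking_cpu;
}

/* Pick the cpu with the lowest score; on a tie the lowest cpu wins. */
static int
sketch_choose_cpu(const struct thread_sketch *td, const int load[NCPUS])
{
    int best = -1;
    int best_score = INT_MAX;

    assert(td->td_curcpu >= 0 && td->td_curcpu < NCPUS);

    for (int cpu = 0; cpu < NCPUS; ++cpu) {
        int score = load[cpu];                  /* load matters most */

        if (cpu == td->td_curcpu)               /* current cpu is next */
            score -= WEIGHT_CURCPU;
        if (td->td_wakefromcpu >= 0 &&
            same_package(cpu, td->td_wakefromcpu))
            score -= WEIGHT_WAKEFROM;           /* wait/wakeup is last */

        if (score < best_score) {
            best_score = score;
            best = cpu;
        }
    }
    return best;
}
```

In this sketch the wakeup bonus only decides the outcome when load and the
current-cpu bonus leave candidates close together, which mirrors the
ordering listed under "Weighting".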