hammer2 - Optimize hammer2 support threads and dispatch
* Refactor the XOP groups so that strategy calls can be queued,
whenever possible, to the same CPU as the issuer. This
optimizes several cases and reduces unnecessary IPI traffic between
cores. The next best thing to do would be to not queue certain XOPs
to an H2 support thread at all, but I would like to keep the threads
intact for later clustering work.
The best scaling case for this is when one has a large number of user
threads doing I/O. One instance of a single-threaded program on
an otherwise idle machine might see a slight reduction in performance,
but at the same time we completely avoid unnecessarily spamming all
cores in the system on behalf of a single program, so overhead is
also significantly lower.
* This will tend to increase the number of H2 support threads since
we need a certain degree of multiplication for domain separation.
* This should significantly increase I/O performance for multi-threaded
workloads.
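
A rough illustration of the dispatch idea described above, not the
actual hammer2 code. It assumes one support thread per (group, cpu)
pair, which is one way the thread count could multiply for domain
separation. The struct, the function names, and the index layout are
all hypothetical.

```c
#include <assert.h>

/*
 * Hypothetical sketch, not the hammer2 implementation: one worker
 * per (group, cpu) pair. All names are invented for illustration.
 */
struct xop_layout {
	unsigned ncpus;		/* CPUs in the system */
	unsigned ngroups;	/* separation domains (assumed) */
};

/* Total number of workers grows as ngroups * ncpus. */
static unsigned
xop_worker_count(const struct xop_layout *l)
{
	return l->ngroups * l->ncpus;
}

/*
 * Pick the worker in 'group' that sits on the issuer's CPU, so the
 * request is queued locally instead of waking another core.
 */
static unsigned
xop_pick_worker(const struct xop_layout *l, unsigned group,
    unsigned issuer_cpu)
{
	assert(l->ncpus > 0 && group < l->ngroups);
	return group * l->ncpus + (issuer_cpu % l->ncpus);
}
```

With 4 CPUs and 2 groups this layout has 8 workers, and a request
issued on CPU 3 for group 1 maps to worker index 7.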