kernel - Refactor Xinvltlb a little, turn off the idle-thread invltlb opt
* Turn off the idle-thread invltlb optimization. It can be re-enabled
with the sysctl machdep.optimized_invltlb (default off) and will be
turned on by default once live testing shows that it works properly.
* Remove excess critical sections and interrupt disablements. All entries
into smp_invlpg() now occur with interrupts already disabled and the
thread already in a critical section. This also defers critical-section
1->0 transition handling away from smp_invlpg() and into its caller.
* Refactor the Xinvltlb APIs a bit. Have Xinvltlb enter the critical
section (it didn't before). Remove the critical section from
smp_inval_intr(). The critical section is now handled by the assembly,
and by any other callers.
* Add additional tsc-based loop/counter debugging to try to catch problems.
* Move inner-loop handling of smp_invltlb_mask to act on invltlbs a little
faster.
* Disable interrupts a little later inside pmap_inval_smp() and
pmap_inval_smp_cmpset().
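
The calling convention described above for smp_invlpg() can be sketched
as a small userspace model. This is only an illustration under stated
assumptions: struct model_cpu and every model_* function are hypothetical
stand-ins invented for the sketch, not the kernel's real structures or
code. It shows the callee merely checking that it was entered inside a
critical section with interrupts disabled, while the caller owns the
enter/exit pair and therefore the 1->0 transition handling.

```c
#include <assert.h>

/* Hypothetical per-cpu state for the model; not a kernel structure. */
struct model_cpu {
	int crit_count;    /* critical-section nesting depth */
	int intr_enabled;  /* 1 if interrupts are enabled */
	int deferred_runs; /* times the 1->0 exit handling ran */
	int invlpg_calls;  /* times the callee was entered */
};

static void
model_crit_enter(struct model_cpu *c)
{
	c->crit_count++;
}

static void
model_crit_exit(struct model_cpu *c)
{
	assert(c->crit_count > 0);
	/* The 1->0 transition handling runs here, on the caller's exit. */
	if (--c->crit_count == 0)
		c->deferred_runs++;
}

/* Stand-in for the callee: it only checks the contract. */
static void
model_smp_invlpg(struct model_cpu *c)
{
	assert(c->crit_count > 0); /* already in a critical section */
	assert(!c->intr_enabled);  /* interrupts already disabled */
	c->invlpg_calls++;
}

/* Stand-in for a caller: it owns the critical section and intr state. */
static void
model_caller(struct model_cpu *c)
{
	int saved;

	model_crit_enter(c);
	saved = c->intr_enabled;
	c->intr_enabled = 0;
	model_smp_invlpg(c);
	c->intr_enabled = saved;
	model_crit_exit(c);
}
```

Calling model_caller() once on a cpu whose crit_count starts at 0 with
interrupts enabled leaves both as they were, and the exit handling runs
exactly once, in the caller rather than in the callee.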