kernel - Greatly improve concurrent forks and concurrent execs
* Rewrite all the vm_fault*() API functions to use a two-stage methodology
which keeps track of whether a shared or exclusive lock is being used
on fs.first_object and fs.object. For most VM faults a shared lock is
sufficient, particularly under fork and exec circumstances.
If the shared lock is not sufficient, the functions will fall back to an
exclusive lock on either or both elements.
* Implement shared chain locks for use by the above.
* kern_exec - exec_map_page() now attempts to access the page with a
shared lock first, and backs down to an exclusive lock if the page
is not conveniently available.
* vm_object ref-counting now uses atomic ops across the board. The
acquisition call can operate with a shared object lock. The deallocate
call decrements ref_count with an atomic op, without taking any lock at
all, when the value is above 3.
* vm_map_split(), vm_object_collapse(), and associated functions are now
smart about handling terminal (e.g. OBJT_VNODE) VM objects and will use
a shared lock when possible.
* When creating new shadow chains in front of an OBJT_VNODE object, we no
longer enter those objects onto the OBJT_VNODE object's shadow_head.
That is, only DEFAULT and SWAP objects need to track who might be shadowing
them. This removes another exclusive object lock from the critical path.
TODO: This code needs to be cleaned up a bit.
* vm_page_grab() will use a shared object lock when possible.
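
The lockless ref_count fast path described above can be sketched in
userland C11 roughly as follows. This is an illustration only, not the
kernel code: struct demo_object, demo_deallocate_fast() and
DEMO_FAST_THRESHOLD are hypothetical names, and the slow path (taking the
object lock once the count is 3 or lower) is assumed to be handled by the
caller.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical threshold: counts above this take the lockless path. */
#define DEMO_FAST_THRESHOLD 3

/* Hypothetical stand-in for a VM object; not the kernel's struct. */
struct demo_object {
    atomic_int ref_count;
};

/*
 * Try to drop one reference without any lock.
 * Returns true if the reference was dropped atomically.
 * Returns false, leaving ref_count untouched, when the count is at or
 * below the threshold and the caller must use a locked slow path.
 */
static bool
demo_deallocate_fast(struct demo_object *obj)
{
    int count = atomic_load(&obj->ref_count);

    while (count > DEMO_FAST_THRESHOLD) {
        /* On failure the CAS reloads 'count' with the current value. */
        if (atomic_compare_exchange_weak(&obj->ref_count,
                                         &count, count - 1))
            return true;
    }
    return false;
}
```

A caller would try demo_deallocate_fast() first and take the object lock
only when it returns false, so under this sketch the lock is needed only
when the count is low enough that teardown work might follow.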