locks: slightly depessimize lockstat
The slow path is always taken when lockstat is enabled. This induces
rdtsc (or other) calls to get the cycle count even when there was no
contention.
Still go to the slow path to not mess with the fast path, but avoid
the heavy lifting unless necessary.
This reduces sys and real time during -j 80 buildkernel:
before: 3651.84s user 1105.59s system 5394% cpu 1:28.18 total
after: 3685.99s user 975.74s system 5450% cpu 1:25.53 total
disabled: 3697.96s user 411.13s system 5261% cpu 1:18.10 total
So note this is still a significant hit.
LOCK_PROFILING results are not affected.