kernel - Reduce excessive rdrand harvesting
* Our rdrand driver harvests 512 bytes on each cpu thread at a rate
of 10Hz. Ryzen CPUs appear to burn about 0.73uS per word, creating
an overhead of about 460uS/sec on EACH cpu thread in the system.
Combined with the even higher overhead of the add_buffer_randomness()
call, the result is a roughly 3% loss of performance across the board.
* Reduce the harvest size from 512 bytes to 16 bytes, which honestly is
still plenty of entropy to inject.
* Change some symbolic branch targets to local branch targets in the
rdrand and padlock code to avoid generating symbols that can cause
weird output in our PC sampler (I was getting 'loop+N' and 'out+N'
while testing the above).
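
For reference, the rdrand overhead figures above can be reproduced with a
rough back-of-the-envelope sketch. This is not code from the driver: the
helper names are hypothetical, and it assumes one rdrand word is 8 bytes
(64-bit operand size), which the message does not state. It only models
the rdrand instruction cost, not the add_buffer_randomness() call.

```c
#include <stddef.h>

/* ASSUMPTION: one rdrand "word" is 8 bytes (64-bit operand size). */
#define RDRAND_WORD_BYTES	8

/*
 * Hypothetical helper: number of rdrand words fetched per second on
 * one cpu thread.  A partial trailing word is rounded up.
 */
static unsigned long
harvest_words_per_sec(size_t harvest_bytes, unsigned int rate_hz)
{
	size_t words;

	words = (harvest_bytes + RDRAND_WORD_BYTES - 1) / RDRAND_WORD_BYTES;
	return ((unsigned long)words * rate_hz);
}

/*
 * Hypothetical helper: estimated time spent in rdrand, in microseconds
 * per second of wall time, on one cpu thread.
 */
static double
harvest_overhead_us(size_t harvest_bytes, unsigned int rate_hz,
    double us_per_word)
{
	return (harvest_words_per_sec(harvest_bytes, rate_hz) * us_per_word);
}
```

Under that assumption, harvest_overhead_us(512, 10, 0.73) works out to
roughly 467uS/sec per cpu thread, in line with the ~460uS/sec figure
above, while harvest_overhead_us(16, 10, 0.73) is about 15uS/sec.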