kernel - Fix floating point save state structure and minor npx issues
* The floating point save structure(s) used by the kernel, and possibly
also userland, were too large on x86-64 due to a porting error: 'long'
fields that should have been converted to 32-bit fields were left
intact.
The oversized structures have no known adverse effect, but we have to
get it right.
* npxexit() was not being called in a kernel thread exit case. Kernel
threads do not use the FP unit so the case was never hit, but fix it
anyway.
* Move a critical section to cover a flags test to handle a very rare
preemptive thread switch issue. Since the preempting thread is a
kernel thread, which does not use the FP unit, this case was never hit,
but fix it anyway.
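
For illustration only, a minimal sketch of the size problem described in
the first item. The struct and field names here are hypothetical and are
not taken from the actual kernel headers; the sketch assumes a layout of
seven environment words, matching the 28-byte x87 environment area that
the hardware saves in 32-bit protected mode format. On an LP64 target
such as x86-64, 'long' is 8 bytes, so the unconverted layout comes out
twice as large as the hardware format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical layout as carried over from the 32-bit port: every field
 * is a 'long', so each one grows to 8 bytes on an LP64 target.
 */
struct env87_ported {
	long	cw;	/* control word */
	long	sw;	/* status word */
	long	tw;	/* tag word */
	long	fip;	/* instruction pointer offset */
	long	fcs;	/* instruction pointer selector */
	long	foo;	/* operand pointer offset */
	long	fos;	/* operand pointer selector */
};

/*
 * Hypothetical corrected layout: explicit 32-bit fields keep the
 * structure the same size regardless of the width of 'long'.
 */
struct env87_fixed {
	uint32_t	cw;
	uint32_t	sw;
	uint32_t	tw;
	uint32_t	fip;
	uint32_t	fcs;
	uint32_t	foo;
	uint32_t	fos;
};

/* Compile-time check that the fixed layout matches the 28-byte format. */
_Static_assert(sizeof(struct env87_fixed) == 28,
    "x87 environment must be 28 bytes");

/* Size of the unconverted layout; depends on sizeof(long). */
size_t
env87_ported_size(void)
{
	return sizeof(struct env87_ported);
}

/* Size of the corrected layout; always 28 bytes. */
size_t
env87_fixed_size(void)
{
	return sizeof(struct env87_fixed);
}
```

A compile-time size assertion of this kind is one way to catch a similar
porting slip early, since the build fails as soon as the layout drifts
from the hardware format.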