nvme - Fix BUF_KERNPROC() SMP race
* BUF_KERNPROC() must be issued before we submit the request. The subq
lock is not sufficient to interlock request completion (which only needs
the comq lock).
* The race only occurs under extreme load, probably when an IPI or
Xinvltlb causes enough of a pause after submission that the completion
can run before BUF_KERNPROC() is issued. NVMe is fast enough that
probably no other controller would hit this particular race condition.
* Also fix a bio queueing race which can leave a bio hanging. If no
requests are available (which can only happen under very heavy I/O
loads), the signal sent to the admin thread on the next I/O completion
can race the queueing of the bio. Fix the race by making sure the
admin thread is signalled *after* queueing the bio.
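
The queue-then-signal ordering from the last item can be sketched in
userspace C with pthreads. This is an illustrative analogy only, not the
driver code: every name below (admin_ctx, pending_bio, defer_bio, and so
on) is hypothetical, and a mutex plus condition variable stands in for
whatever signalling mechanism the driver actually uses.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these names come from the nvme driver. */
struct pending_bio {
    struct pending_bio *next;
};

struct admin_ctx {
    pthread_mutex_t lock;
    pthread_cond_t cv;
    struct pending_bio *head;    /* oldest deferred bio */
    struct pending_bio **tailp;  /* where the next bio gets linked */
    int signalled;               /* set by producers, cleared by the admin side */
};

static void admin_init(struct admin_ctx *ac)
{
    pthread_mutex_init(&ac->lock, NULL);
    pthread_cond_init(&ac->cv, NULL);
    ac->head = NULL;
    ac->tailp = &ac->head;
    ac->signalled = 0;
}

/* Step 1: make the bio visible on the queue. */
static void admin_enqueue(struct admin_ctx *ac, struct pending_bio *bio)
{
    assert(bio != NULL);
    bio->next = NULL;
    pthread_mutex_lock(&ac->lock);
    *ac->tailp = bio;
    ac->tailp = &bio->next;
    pthread_mutex_unlock(&ac->lock);
}

/* Step 2: wake the admin side. */
static void admin_signal(struct admin_ctx *ac)
{
    pthread_mutex_lock(&ac->lock);
    ac->signalled = 1;
    pthread_cond_signal(&ac->cv);
    pthread_mutex_unlock(&ac->lock);
}

/*
 * Deferral path: queue first, signal second. With the opposite order the
 * admin side could wake, find an empty queue, and go back to sleep just
 * before the bio is linked in, leaving it hanging.
 */
static void defer_bio(struct admin_ctx *ac, struct pending_bio *bio)
{
    admin_enqueue(ac, bio);
    admin_signal(ac);
}

/* Admin side: sleep until signalled, then drain; returns the bio count. */
static int admin_wait_and_drain(struct admin_ctx *ac)
{
    int n = 0;

    pthread_mutex_lock(&ac->lock);
    while (!ac->signalled)
        pthread_cond_wait(&ac->cv, &ac->lock);
    ac->signalled = 0;
    while (ac->head != NULL) {
        ac->head = ac->head->next;
        n++;
    }
    ac->tailp = &ac->head;
    pthread_mutex_unlock(&ac->lock);
    return n;
}
```

Because defer_bio() links the bio before raising the signal, any wakeup
the admin side observes is guaranteed to find the bio already queued.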