ethernet: If caller thread cpu is fixed, pass cpuid to ether_input_pkt()
This allows the optimized lwkt_sendmsg_oncpu() to be used instead of
lwkt_sendmsg() if the target netisr is on the same cpu as the caller
thread. The main goal is to avoid unnecessary wakeup() IPIs to other
cpus.
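
The dispatch decision described above can be sketched roughly as follows.
This is a userland illustration only, not the kernel code: CPUID_ANY,
dispatch_pkt() and the two mock_* senders are hypothetical names, and the
mocks merely count calls in place of the real lwkt_sendmsg_oncpu() and
lwkt_sendmsg().

```c
#include <assert.h>

#define NCPUS		8
#define CPUID_ANY	(-1)	/* hypothetical: caller thread cpu is not fixed */

enum send_path { SEND_ONCPU, SEND_GENERIC };

static int sends_oncpu;		/* messages sent through the same-cpu path */
static int sends_generic;	/* messages sent through the generic path */

/* Mock standing in for lwkt_sendmsg_oncpu(); it only counts the call. */
static void
mock_sendmsg_oncpu(int target_cpu)
{
	(void)target_cpu;
	sends_oncpu++;
}

/* Mock standing in for lwkt_sendmsg(); it only counts the call. */
static void
mock_sendmsg(int target_cpu)
{
	(void)target_cpu;
	sends_generic++;
}

/*
 * Pick the send path for one packet.  caller_cpuid is the cpu passed
 * down by the caller, or CPUID_ANY if the caller thread is not fixed
 * to a cpu.  target_cpu is the cpu of the target netisr.
 */
static enum send_path
dispatch_pkt(int caller_cpuid, int target_cpu)
{
	assert(target_cpu >= 0 && target_cpu < NCPUS);

	if (caller_cpuid != CPUID_ANY && caller_cpuid == target_cpu) {
		/* Same cpu: the optimized path can be used. */
		mock_sendmsg_oncpu(target_cpu);
		return SEND_ONCPU;
	}
	/* Unknown or different cpu: fall back to the generic path. */
	mock_sendmsg(target_cpu);
	return SEND_GENERIC;
}
```

In this sketch only a caller with a known cpuid that matches the target
netisr's cpu takes the optimized path; every other case falls back to
the generic send.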
THE RESULT:
On i7-3770 w/ HT enabled (8 logical cpus); NIC is 82599ES w/ 8 RX rings
and 8 TX rings. Run:
repeat 10 tcp_stream -H ... -i 256 -l 10 -r
(256 netperf TCP_MAERTS instances for 10 seconds, 10 rounds)
The total number of cross IPIs before this commit is 6946097. The total
number of cross IPIs as of this commit is 5445324. That is 1500773
fewer cross IPIs, i.e. ~22% of them were unnecessary wakeup() IPIs that
are now avoided!