udp: Support asynchronous pru_send for connected sockets
The result:
192.168.3.1 PhenomII 970 (runs netperf, hw 82571EB)
192.168.3.2 Phenom 9550 (runs netserver, hw 82574L)
netperf -H 192.168.3.2 -t UDP_STREAM -P0 -l 30 -- -n -m 18
(10 seconds of `netstat -nI emx0 -w 1` output, unit: pps)
old new
204736 1225536
203712 1224960
203520 1224640
202880 1228416
203392 1225408
203648 1224960
203456 1219968
203648 1224064
203712 1218880
204224 1222464
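
As a quick cross-check of the samples above, the following sketch averages
both columns and derives the ratio between them. The variable names are
illustrative only and are not part of the change itself.

```python
# Per-second TX packet rates (pps) copied from the table above.
old_pps = [204736, 203712, 203520, 202880, 203392,
           203648, 203456, 203648, 203712, 204224]
new_pps = [1225536, 1224960, 1224640, 1228416, 1225408,
           1224960, 1219968, 1224064, 1218880, 1222464]


def mean(samples):
    """Arithmetic mean of a non-empty list of samples."""
    return sum(samples) / len(samples)


old_mean = mean(old_pps)
new_mean = mean(new_pps)

# Ratio of the new rate to the old rate, and the same figure
# expressed as a percentage increase over the old rate.
speedup = new_mean / old_mean
increase_pct = (speedup - 1.0) * 100.0

print(f"old mean: {old_mean:.1f} pps")
print(f"new mean: {new_mean:.1f} pps")
print(f"speedup:  {speedup:.2f}x ({increase_pct:.0f}% increase)")
```

The new rate is roughly six times the old one, which corresponds to an
increase of about 500% over the old rate.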
This improves tiny UDP packet TX performance by ~6x (~204Kpps -> ~1.22Mpps).
The current tiny UDP packet TX rate (1.22Mpps) is quite near
the 1.48Mpps 1000baseT limit.
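
For reference, the 1.48Mpps figure can be reproduced from standard Ethernet
framing overheads. The sketch below assumes IPv4 without options and no VLAN
tag (assumptions, not stated in the test setup); with those assumptions the
18-byte payload from `-m 18` yields exactly a minimum-size 64-byte frame.

```python
# Assumed framing overheads in bytes (IPv4 without options, no VLAN tag).
UDP_PAYLOAD = 18        # netperf -m 18
UDP_HEADER = 8
IPV4_HEADER = 20
ETH_HEADER = 14
ETH_FCS = 4
PREAMBLE_SFD = 8        # preamble + start-of-frame delimiter
INTERFRAME_GAP = 12

LINK_BPS = 1_000_000_000  # 1000baseT

# Frame as counted by Ethernet (header through FCS).
frame_bytes = UDP_PAYLOAD + UDP_HEADER + IPV4_HEADER + ETH_HEADER + ETH_FCS

# Bytes each frame actually occupies on the wire.
wire_bytes = frame_bytes + PREAMBLE_SFD + INTERFRAME_GAP

# Maximum frames per second at line rate.
line_rate_pps = LINK_BPS / (wire_bytes * 8)

# Fraction of line rate reached by the measured 1.22Mpps.
measured_pps = 1_220_000
utilization = measured_pps / line_rate_pps

print(f"frame size:     {frame_bytes} bytes")
print(f"line rate:      {line_rate_pps:.0f} pps")
print(f"utilization:    {utilization:.1%}")
```

Under these assumptions the measured rate is a little over 80% of the
theoretical maximum for minimum-size frames.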