sosendudp: Try to optimize out the additional mbuf allocation on the output path

This optimization leaves enough space at the beginning of the mbuf that a later M_PREPEND() will probably not have to allocate an additional mbuf. It is unlikely to benefit data that will be fragmented, e.g. by IPv4, so it is only performed when the size of the data plus the maximum size of the protocol and link headers fits into one mbuf cluster.

The optimization can be turned off through net.inet.udp.sosend_prepend, which is on by default.
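The headroom idea can be illustrated outside the kernel with a minimal userspace sketch. Everything here is an assumption made for illustration: `struct pktbuf`, `pktbuf_init()`, `pktbuf_prepend()` and `BUF_SIZE` are hypothetical stand-ins and not the mbuf API, and a failed prepend only marks the point where a real stack would have to allocate another buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BUF_SIZE 2048	/* assumed stand-in for one mbuf cluster */

/* Hypothetical stand-in for an mbuf: valid data sits inside storage[]. */
struct pktbuf {
	unsigned char storage[BUF_SIZE];
	size_t off;	/* offset of the first valid byte */
	size_t len;	/* number of valid bytes */
};

/*
 * Copy the payload in, leaving 'headroom' free bytes in front of it.
 * Returns -1 when payload plus headroom does not fit in one buffer,
 * which is the case where the optimization would simply be skipped.
 */
static int
pktbuf_init(struct pktbuf *b, const void *payload, size_t len, size_t headroom)
{
	assert(b != NULL && payload != NULL);
	if (headroom > BUF_SIZE || len > BUF_SIZE - headroom)
		return -1;
	b->off = headroom;
	b->len = len;
	memcpy(b->storage + b->off, payload, len);
	return 0;
}

/*
 * Prepend a header in place. Returns 0 if the leading space was large
 * enough, -1 if a real stack would now need an additional buffer.
 */
static int
pktbuf_prepend(struct pktbuf *b, const void *hdr, size_t hdrlen)
{
	assert(b != NULL && hdr != NULL);
	if (hdrlen > b->off)
		return -1;
	b->off -= hdrlen;
	b->len += hdrlen;
	memcpy(b->storage + b->off, hdr, hdrlen);
	return 0;
}
```

With 42 bytes of headroom reserved (8 UDP + 20 IPv4 + 14 Ethernet), all three headers can be prepended to an 18 byte payload without touching a second buffer.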
mbuf: Save link layer, IP and TCP/UDP header lengths

This should ease TSO handling in most drivers and avoid extra accesses to the data area while setting up TSO. It should also help hardware checksum offloading in Intel's 1000M/10G drivers, which requires the protocol header length.
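A rough sketch of why cached header lengths help: a driver can derive header offsets from a few small integers instead of parsing the packet data again. The struct and field names below are hypothetical and are not the fields this commit adds to the mbuf packet header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-packet metadata; names are illustrative only. */
struct pkt_hdrlens {
	uint8_t l2_len;	/* link layer header length */
	uint8_t l3_len;	/* IP header length */
	uint8_t l4_len;	/* TCP/UDP header length */
};

/* Offset of the TCP/UDP header from the start of the frame. */
static size_t
pkt_l4_offset(const struct pkt_hdrlens *h)
{
	assert(h != NULL);
	return (size_t)h->l2_len + h->l3_len;
}

/* Total header bytes in front of the payload. */
static size_t
pkt_total_hdrlen(const struct pkt_hdrlens *h)
{
	assert(h != NULL);
	return pkt_l4_offset(h) + h->l4_len;
}
```

For an untagged Ethernet frame carrying TCP over IPv4 without options (14, 20 and 20 bytes), the TCP header starts at offset 34 and the payload at offset 54.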
pru_send: Allow non-NULL address parameter to be passed

Currently the passed-in address is copied into newly allocated memory (grr, an additional blocking kmalloc), and PRUS_FREEADDR is set so that the protocol thread knows when to free the address.

Before this change netperf UDP_STREAM (unconnected socket) could only do ~200Kpps (w/ -m 18); now it does ~990Kpps (w/ -m 18). That is roughly a 5x improvement in tiny UDP packet TX rate. The improvement is not as good as for the connected socket, which is roughly 6x, mainly because of the additional memory allocation for the address. We _may_ further optimize out the address allocation.
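The copy-and-flag pattern described above can be sketched in userspace C. This is only an illustration under stated assumptions: `struct send_msg`, `send_msg_set_addr()`, `send_msg_done()` and `MSG_FREEADDR` are hypothetical names modelled on the description of PRUS_FREEADDR, and `malloc()` stands in for the kernel allocator.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MSG_FREEADDR	0x1	/* hypothetical flag: consumer owns msg->addr */

/* Hypothetical message handed from the sender to a protocol thread. */
struct send_msg {
	void	*addr;
	size_t	 addrlen;
	int	 flags;
};

/*
 * Sender side: the caller's address may go away before the message is
 * processed, so take a private copy and mark it as owned by the message.
 */
static int
send_msg_set_addr(struct send_msg *m, const void *addr, size_t len)
{
	void *copy;

	assert(m != NULL && addr != NULL);
	if (len == 0)
		return -1;
	copy = malloc(len);
	if (copy == NULL)
		return -1;
	memcpy(copy, addr, len);
	m->addr = copy;
	m->addrlen = len;
	m->flags |= MSG_FREEADDR;
	return 0;
}

/* Consumer side: release the address only if the flag says we own it. */
static void
send_msg_done(struct send_msg *m)
{
	assert(m != NULL);
	if (m->flags & MSG_FREEADDR) {
		free(m->addr);
		m->flags &= ~MSG_FREEADDR;
	}
	m->addr = NULL;
	m->addrlen = 0;
}
```

The extra allocation per send is the cost the commit message complains about; the flag is what lets the consumer tell an owned copy from a borrowed pointer.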
udp: Support asynchronous pru_send for connected sockets

The result:

192.168.3.1  PhenomII 970 (runs netperf,   hw 82571EB)
192.168.3.2  Phenom 9550  (runs netserver, hw 82574L)

netperf -H 192.168.3.2 -t UDP_STREAM -P0 -l 30 -- -n -m 18

(10 seconds of `netstat -nI emx0 -w 1`, unit: pps)

   old      new
204736  1225536
203712  1224960
203520  1224640
202880  1228416
203392  1225408
203648  1224960
203456  1219968
203648  1224064
203712  1218880
204224  1222464

This is roughly a 6x improvement in tiny UDP packet TX rate. The current tiny UDP packet TX rate (1.22Mpps) is quite near the 1.48Mpps 1000baseT limit.
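The 1.48Mpps figure follows from Ethernet framing arithmetic, sketched below. The sketch assumes an IPv4 header without options and no VLAN tag; the function names are made up for illustration. An 18 byte UDP payload plus 8 bytes of UDP header and 20 bytes of IPv4 header is exactly the 46 byte minimum Ethernet payload, so `-m 18` produces minimum-size frames.

```c
#include <assert.h>
#include <stdint.h>

#define ETH_PREAMBLE_SFD	8	/* preamble + start frame delimiter */
#define ETH_HDR_LEN		14	/* dst MAC, src MAC, ethertype */
#define ETH_MIN_PAYLOAD		46	/* shorter payloads are padded */
#define ETH_FCS_LEN		4	/* frame check sequence */
#define ETH_IFG			12	/* inter-frame gap */
#define IPV4_HDR_LEN		20	/* assumes no IP options */
#define UDP_HDR_LEN		8

/* Bytes one UDP datagram occupies on the wire, including framing overhead. */
static unsigned
udp_wire_bytes(unsigned udp_payload)
{
	unsigned l3 = udp_payload + UDP_HDR_LEN + IPV4_HDR_LEN;

	if (l3 < ETH_MIN_PAYLOAD)
		l3 = ETH_MIN_PAYLOAD;
	return ETH_PREAMBLE_SFD + ETH_HDR_LEN + l3 + ETH_FCS_LEN + ETH_IFG;
}

/* Upper bound on packets per second for a given link speed. */
static uint64_t
udp_max_pps(uint64_t link_bps, unsigned udp_payload)
{
	assert(link_bps > 0);
	return link_bps / (8ULL * udp_wire_bytes(udp_payload));
}
```

Each minimum-size frame takes 84 bytes (672 bits) on the wire, so a 1Gbps link tops out at 1488095 packets per second; the measured ~1.22Mpps is about 82% of that bound.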
network - Major netmsg retooling, part 1

* Remove all the netmsg shims and make all pr_usrreqs and some proto->pr_* requests directly netmsg'd.

* Fix issues with tcp implied connects and tcp6->tcp4 fallbacks with implied connects.

* Fix an issue with a stack-based udp netmsg (allocate it)

* Consolidate struct ip6protosw and struct protosw into a single structure and normalize the API functions which differed between the two (primarily proto->pr_input()).

* Remove protosw->pr_soport()

* Replace varargs protocol *_input() functions (ongoing) with fixed arguments.