bge: Improve tiny packets transmission performance on low frequency CPU
Update TX HOST_PROD register a little bit often; mainly to improve
timeliness of packets transmission:
The TX HOST_PROD register is updated after certain amount of TX
descriptors are added to the hardware TX ring. The default value of the
amount of TX descriptors are 16. This value could be further tuned by
per-device sysctl node hw.bgeX.tx_wreg.
The default value improves tiny packets transmission performance w/ 5721
on AMD970@800Mhz (770Kpps -> 800Kpps) and it does not increase CPU usage
on AMD970@3500Mhz (CPU usage stays @20%, 5721 could only do 810Kpps).
NOTE: Tuning hw.bgeX.tx_coal_bds to 32 could make packets transmission
performance w/ 5721 on AMD970@800Mhz to be 810Kpps, however, this shows
no performance improvement on commonly used CPU frequency like 2200Mhz
and 3500Mhz. And this tuning could burden systems not using polling(4).