jme: Improve tiny packet transmission performance on low frequency CPUs

Update the TXCSR register a bit more often, mainly to improve the timeliness of packet transmission: the TXCSR register is now updated after a certain number of TX descriptors have been added to the hardware TX ring. The default number of TX descriptors is 16. This value can be tuned further through the per-device sysctl node hw.jmeX.tx_wreg.

The default value improves tiny packet transmission performance w/ JMC250 on AMD970@2200MHz (831Kpps -> 911Kpps) and on AMD970@800MHz (484Kpps -> 834Kpps), and it does not increase CPU usage on AMD970@3500MHz (CPU usage stays @26%; JMC250 can only do 911Kpps).
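The batching described above can be sketched roughly as follows. This is a minimal user-space model, not the driver code: `struct tx_model`, `tx_enqueue()`, `tx_kick()` and `tx_flush()` are hypothetical names, the TXCSR write is replaced by a counter, and the final flush of leftover descriptors is an assumption that the commit message does not spell out.

```c
#include <assert.h>

#define TX_WREG_DEFAULT 16      /* default threshold named in the commit message */

/* Hypothetical stand-in for the device state; counts simulated TXCSR writes. */
struct tx_model {
    int tx_wreg;                /* descriptors to queue before touching TXCSR */
    int pending;                /* descriptors queued since the last TXCSR write */
    unsigned kicks;             /* number of simulated TXCSR writes */
};

/* A real driver would write the TXCSR register here. */
static void tx_kick(struct tx_model *m)
{
    m->kicks++;
    m->pending = 0;
}

/* Account for ndesc newly queued descriptors; kick once the threshold is hit. */
static void tx_enqueue(struct tx_model *m, int ndesc)
{
    assert(ndesc > 0);
    m->pending += ndesc;
    if (m->pending >= m->tx_wreg)
        tx_kick(m);
}

/* Assumption: leftover descriptors are flushed when the send queue drains. */
static void tx_flush(struct tx_model *m)
{
    if (m->pending > 0)
        tx_kick(m);
}
```

A smaller tx_wreg trades extra register writes for lower latency; a larger one does the opposite, which is presumably why the value is exposed as a sysctl.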
jme: Rework MSI-X mapping, so RX MSI-X need not read register

The RX empty event rarely happens (I didn't see it even when the card was sinking full speed tiny packets on one RX ring). Move the RX empty events to an independent MSI-X vector, so that the hot path RX MSI-X handler need not read any register at all.
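The idea of the remapping can be illustrated with a small model. Everything here is hypothetical (`struct dev_model`, the vector names, the handlers); in particular, the assumption that the RX empty handler reads one status register is made only for illustration and is not taken from the driver.

```c
#include <assert.h>

/* Hypothetical vector layout: RX empty gets its own vector. */
enum msix_vec { VEC_RX, VEC_RX_EMPTY, VEC_COUNT };

/* Counters standing in for real hardware accesses. */
struct dev_model {
    unsigned reg_reads;         /* simulated register reads */
    unsigned rx_runs;           /* hot path RX handler invocations */
    unsigned rx_empty_runs;     /* RX empty handler invocations */
};

/* Hot path: the vector itself says "RX work", so no register is read. */
static void rx_intr(struct dev_model *d)
{
    d->rx_runs++;
}

/* Rare path: assumed to read one register before handling the event. */
static void rx_empty_intr(struct dev_model *d)
{
    d->reg_reads++;
    d->rx_empty_runs++;
}

typedef void (*intr_fn)(struct dev_model *);

static const intr_fn msix_handlers[VEC_COUNT] = {
    [VEC_RX] = rx_intr,
    [VEC_RX_EMPTY] = rx_empty_intr,
};

/* Dispatch by vector number, as the interrupt controller would. */
static void msix_dispatch(struct dev_model *d, int vec)
{
    assert(vec >= 0 && vec < VEC_COUNT);
    msix_handlers[vec](d);
}
```

In this model, the number of register reads tracks only the rare RX empty events, however many RX interrupts arrive.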
jme: Don't immediately recycle the TX descriptor even if it is owned by us

This chip always updates the TX descriptor's 32-bit fields in order, so even if the status field has been updated, i.e. OWN is cleared, it still does not mean that the buflen field has been updated. To avoid this race, we don't immediately recycle the TX descriptor that is currently being checked. Instead, the next TX descriptor's OWN bit is checked; if it is cleared, then the chip has really finished updating the currently checked TX descriptor.

This is intended to fix the occasional watchdog timeouts that were observed on this chip.

Many thanks to devinchiu@jmicron.com for providing the necessary information.
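The deferred recycling rule can be sketched as below. This is a simplified user-space model under stated assumptions: `struct tx_ring`, `tx_reclaim()`, the ring size and the position of `DESC_OWN` are all hypothetical, each packet is taken to use a single descriptor, and the slot at the producer index is treated as software-owned (OWN clear). How the real driver handles the last in-flight descriptor is not stated in the commit message.

```c
#include <assert.h>
#include <stdint.h>

#define RING_SIZE 8
#define DESC_OWN  0x80000000u   /* hypothetical position of the OWN bit */

struct tx_desc {
    uint32_t flags;             /* status word; OWN set means the chip owns it */
    uint32_t buflen;            /* written back by the chip after flags */
};

struct tx_ring {
    struct tx_desc desc[RING_SIZE];
    int cons;                   /* next descriptor to recycle */
    int prod;                   /* next free slot */
};

/* Recycle completed descriptors; returns how many were recycled. */
static int tx_reclaim(struct tx_ring *r)
{
    int done = 0;

    while (r->cons != r->prod) {
        int next = (r->cons + 1) % RING_SIZE;

        /* The chip still owns the current descriptor. */
        if (r->desc[r->cons].flags & DESC_OWN)
            break;
        /*
         * OWN is clear, but buflen may not have been written back yet.
         * Only trust the current descriptor once the chip has moved on
         * and cleared OWN on the next one as well.
         */
        if (r->desc[next].flags & DESC_OWN)
            break;

        r->desc[r->cons].buflen = 0;    /* descriptor is safe to reuse */
        r->cons = next;
        done++;
    }
    return done;
}
```

With three descriptors in flight, the first one is not recycled until the second one's OWN bit has also been cleared, so reclamation always trails the chip by one descriptor while traffic is flowing.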