serialize: Optimize atomic_intr_cond_{enter,try,exit}()
author Sepherosa Ziehau <sephe@dragonflybsd.org>
Fri, 3 Jan 2014 12:10:32 +0000 (20:10 +0800)
committer Sepherosa Ziehau <sephe@dragonflybsd.org>
Tue, 7 Jan 2014 06:58:50 +0000 (14:58 +0800)
commit 24befe94b31f6491cdca91f4f4f87bbd74787db1
tree bafe242ada7d68ee9aaac90c68d77566e32a4f55
parent 744bce873dcf831d2979daacfcc3b3bf200128e0

Use the counter (30 bits) of __atomic_intr_t as a wait counter instead of a
request counter:
- This avoids counter updates in atomic_intr_cond_try().
- Move counter decrement from atomic_intr_cond_exit() to
  atomic_intr_cond_enter().
- Try obtaining intr_cond first in atomic_intr_cond_enter().  The counter
  is incremented only if that try fails.
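
The wait-counter scheme described above can be sketched in portable C11
atomics.  This is only an illustration, not the inline assembly in
atomic.h: the names intr_cond_t, intr_cond_try/enter/exit and count_call
are hypothetical, bit 31 as the "held" bit is an assumption, and the
wait/wake callbacks stand in for whatever sleep and wakeup functions the
caller supplies.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: bit 31 = held, low 30 bits = wait counter. */
#define INTR_COND_HELD    0x80000000u
#define INTR_COND_WAITERS 0x3fffffffu

/* Hypothetical stand-in for __atomic_intr_t. */
typedef _Atomic uint32_t intr_cond_t;

/* One atomic read-modify-write; the wait counter is never touched. */
static bool
intr_cond_try(intr_cond_t *p)
{
	uint32_t old = atomic_fetch_or(p, INTR_COND_HELD);

	return (old & INTR_COND_HELD) == 0;
}

/*
 * Try first.  Only a failed try registers the caller as a waiter, and
 * the waiter removes itself again once it finally gets intr_cond.
 */
static void
intr_cond_enter(intr_cond_t *p, void (*wait_fn)(void *), void *arg)
{
	if (intr_cond_try(p))
		return;			/* uncontended: one atomic op */
	atomic_fetch_add(p, 1);		/* become a waiter */
	while (!intr_cond_try(p))
		wait_fn(arg);		/* e.g. sleep until woken */
	atomic_fetch_sub(p, 1);		/* decrement now lives in enter */
}

/* Clear the held bit; wake someone only if the wait counter is nonzero. */
static void
intr_cond_exit(intr_cond_t *p, void (*wake_fn)(void *), void *arg)
{
	uint32_t old = atomic_fetch_and(p, ~INTR_COND_HELD);

	if (old & INTR_COND_WAITERS)
		wake_fn(arg);
}

/* Hypothetical callback for demonstration: counts how often it runs. */
static void
count_call(void *arg)
{
	++*(int *)arg;
}
```

In this sketch a "try ok/exit" pair and an uncontended "enter/exit" pair
each perform two atomic read-modify-write operations, and a failed try
performs one.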

This reduces the number of locked bus cycle instructions:
- For "try ok/exit" sequence: 4 -> 2.
- For "try fail": 3 -> 1.
- For uncontended "enter/exit" sequence: 3 -> 2.

For the contended "enter/exit" sequence, this increases the number of locked
bus cycle instructions from 3 to 4.  Compared with the cost of the sleep,
this should be relatively cheap.

Tested on an 8 HT (i7-3770) box, using kq_accept_server/kq_connect_client:
- 4/4 TX/RX rings device (BCM5719, using MSI-X), slight improvement.
- 8/8 TX/RX rings device (Intel 82580, using MSI-X), slight improvement.
- 1/2 TX/RX rings device (Intel 82599, using MSI), no observable
  improvement.
sys/cpu/i386/include/atomic.h
sys/cpu/x86_64/include/atomic.h