altq: Implement a two-level "rough" priority queue for the plain sub-queue

The "rough" part comes from two sources:
- The hardware queue can be deep, normally 512 descriptors or more even
  for GigE.
- Round-robin on the transmission queues is used by all of the multiple
  transmission queue capable hardware supported by DragonFly as of this
  commit.

These two sources affect the packet priority set by DragonFly.

DragonFly's "rough" priority queue has only two levels, i.e. high
priority and normal priority, which should be enough.  Each queue has
its own header.  The normal priority queue is dequeued only when there
are no packets in the high priority queue.  During enqueue, if the
sub-queue is full and the high priority queue length is less than half
of the sub-queue length (both packet count and byte count), drop-head
is applied on the normal priority queue.

The M_PRIO mbuf flag is added to mark that the mbuf is destined for the
high priority queue.  Currently TCP uses it to prioritize SYN, SYN|ACK,
and pure ACK w/o FIN and RST.  This behaviour can be turned off via
net.inet.tcp.prio_synack, which is on by default.

The performance improvement!

The test environment (all three boxes use an Intel i7-2600 w/ HT
enabled):

+-----+             +-----+
|     |        +->- emx1  |
|  A  | bnx0 --+    |  B  |  TCP_MAERTS
|     |        |    +-----+
+-----+        |
               |    +-----+
               +-<- emx1  |
                    |  C  |  TCP_STREAM/TCP_RR
                    +-----+

A's kernel has this commit compiled in.  bnx0 has all four transmission
queues enabled.  For bnx0, the hardware's transmission queue round-robin
is on TSO segment boundaries.

Some baseline measurements:

B<--A TCP_MAERTS (raw stats, 128 clients): 984 Mbps
      (tcp_stream -H A -l 15 -i 128 -r)
C-->A TCP_STREAM (128 clients):            942 Mbps
      (tcp_stream -H A -l 15 -i 128)
C-->A TCP_CC (768 clients):             221199 conns/s
      (tcp_cc -H A -l 15 -i 768)

To effectively measure TCP_CC, the prefix route's MSL is changed to
10ms:
      route change 10.1.0.0/24 -msl 10

All stats gathered in the following measurements are below the baseline
measurements (well, they should be).

C-->A TCP_CC improvement, while the B<--A TCP_MAERTS test is running:

                       TCP_MAERTS (raw)   TCP_CC
TSO     prio_synack=1  948 Mbps           15988 conns/s
TSO     prio_synack=0  965 Mbps            8867 conns/s
non-TSO prio_synack=1  943 Mbps           18128 conns/s
non-TSO prio_synack=0  959 Mbps           11371 conns/s

* 80% TCP_CC performance improvement w/ TSO and 60% w/o TSO!

C-->A TCP_STREAM improvement, while the B<--A TCP_MAERTS test is
running:

                       TCP_MAERTS (raw)   TCP_STREAM
TSO     prio_synack=1  969 Mbps           920 Mbps
TSO     prio_synack=0  969 Mbps           865 Mbps
non-TSO prio_synack=1  969 Mbps           920 Mbps
non-TSO prio_synack=0  969 Mbps           879 Mbps

* 6% TCP_STREAM performance improvement w/ TSO and 4% w/o TSO.
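The enqueue/dequeue policy described by the commit above can be modeled
in userland roughly as follows.  This is a minimal sketch, not the
kernel code: the fixed-size rings, the `struct pkt`/`struct subq` names
and the packet-count-only limit are illustrative assumptions (the
kernel applies the same test to the byte count as well).

```c
/*
 * Userland model of the two-level "rough" priority queue: the normal
 * queue is served only when the high priority queue is empty, and a
 * full sub-queue drop-heads the normal queue as long as the high
 * priority queue holds less than half of the sub-queue limit.
 */
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define SUBQ_MAXLEN	8	/* illustrative; real queues are deeper */

struct pkt {
	int	id;
	bool	prio;		/* M_PRIO analogue */
};

struct subq {
	struct pkt	hi[SUBQ_MAXLEN];	/* high priority queue */
	struct pkt	lo[SUBQ_MAXLEN];	/* normal priority queue */
	int		hi_len;
	int		lo_len;
};

static int
subq_len(const struct subq *q)
{
	return (q->hi_len + q->lo_len);
}

/*
 * Enqueue one packet; returns false if it had to be dropped.
 */
static bool
subq_enqueue(struct subq *q, struct pkt p)
{
	if (subq_len(q) >= SUBQ_MAXLEN) {
		if (q->hi_len < SUBQ_MAXLEN / 2 && q->lo_len > 0) {
			/* drop-head on the normal priority queue */
			memmove(&q->lo[0], &q->lo[1],
			    (q->lo_len - 1) * sizeof(struct pkt));
			q->lo_len--;
		} else {
			return (false);	/* drop the new packet */
		}
	}
	if (p.prio)
		q->hi[q->hi_len++] = p;
	else
		q->lo[q->lo_len++] = p;
	return (true);
}

/*
 * Dequeue: normal priority only when the high priority queue is empty.
 */
static bool
subq_dequeue(struct subq *q, struct pkt *p)
{
	if (q->hi_len > 0) {
		*p = q->hi[0];
		memmove(&q->hi[0], &q->hi[1],
		    (q->hi_len - 1) * sizeof(struct pkt));
		q->hi_len--;
		return (true);
	}
	if (q->lo_len > 0) {
		*p = q->lo[0];
		memmove(&q->lo[0], &q->lo[1],
		    (q->lo_len - 1) * sizeof(struct pkt));
		q->lo_len--;
		return (true);
	}
	return (false);
}
```

With this policy a burst of bulk-data packets cannot starve a SYN,
SYN|ACK or pure ACK: the prioritized packet evicts the oldest normal
packet instead of being tail-dropped.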
altq: Add byte based limit and counter

- This avoids having too many mbufs sitting on the send queue for TSO
  capable devices.  Even though DragonFly already limits a TSO burst to
  at most 4 TCP segments by default, a TSO capable device could still
  have 4 times as many mbufs sitting on its send queue as a non-TSO
  capable device.
- This paves the way for AQMs that require a send queue byte counter,
  e.g. CoDel.

For ethernet devices, the byte based limit is (1514 x max_packets).
For other devices, e.g. pseudo devices, the byte based limit is
(MCLBYTES x max_packets).
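The limit computation above is simple enough to sketch.  The structure
and function names here are illustrative, not the kernel's; only the
constants (1514 per ethernet frame, MCLBYTES otherwise) come from the
commit.

```c
/*
 * Sketch of the byte based sub-queue limit: a packet is admitted only
 * when both the packet counter and the byte counter have room.
 */
#include <assert.h>
#include <stdbool.h>

#define MCLBYTES	2048	/* mbuf cluster size */
#define ETHER_FRAMELEN	1514	/* max ethernet frame w/o FCS */

struct ifq_limits {
	int	max_packets;	/* packet based limit */
	long	max_bytes;	/* byte based limit */
	int	pkts;		/* current packet counter */
	long	bytes;		/* current byte counter */
};

static void
ifq_set_maxlen(struct ifq_limits *q, int max_packets, bool is_ether)
{
	q->max_packets = max_packets;
	q->max_bytes = (long)(is_ether ? ETHER_FRAMELEN : MCLBYTES) *
	    max_packets;
}

/* Both limits must have room for the packet to be enqueued. */
static bool
ifq_room(const struct ifq_limits *q, long pktlen)
{
	return (q->pkts < q->max_packets &&
	    q->bytes + pktlen <= q->max_bytes);
}
```

An AQM such as CoDel needs exactly the `bytes` counter maintained here
to estimate queue sojourn, which is the second motivation listed above.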
ifsq: Let ifaltq_subque know its related hardware TX queue's serializer

This avoids the following operations on the packet transmission hot
path:
- Dereferencing device driver supplied serialize function pointers
- Locating the hardware TX queue's serializer

Compared to the lwkt_serialize functions, the above two operations are
costly.

Driver changes:
- For device drivers which use the default ifnet serializer, no
  additional code is needed; if_attach() will assign the ifnet
  serializer to the ifaltq_subque.
- For device drivers which use independent serializers for the main
  function, RX queues and TX queues, ifsq_set_hw_serialize() must be
  called to properly assign the hardware TX queue's serializer to the
  ifaltq_subque.  Drivers in this category are bce(4), emx(4), igb(4)
  and jme(4).
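The idea is just pointer caching: resolve the serializer once at attach
time and store it in the sub-queue.  A minimal userland sketch, with
lwkt_serialize reduced to a lock word and all names illustrative:

```c
/*
 * Cache the hardware TX queue's serializer in the sub-queue so the
 * transmit hot path does not chase driver-supplied function pointers.
 */
#include <assert.h>
#include <stddef.h>

struct lwkt_serialize {
	int	locked;		/* stand-in for the real serializer */
};

struct ifaltq_subque {
	struct lwkt_serialize	*ifsq_hw_serialize;
};

/* Called at attach time by drivers with independent TX serializers. */
static void
ifsq_set_hw_serialize(struct ifaltq_subque *ifsq,
    struct lwkt_serialize *slz)
{
	ifsq->ifsq_hw_serialize = slz;
}

/* Hot path: direct access, no indirection through the driver. */
static void
ifsq_serialize_hw(struct ifaltq_subque *ifsq)
{
	ifsq->ifsq_hw_serialize->locked = 1;
}

static void
ifsq_deserialize_hw(struct ifaltq_subque *ifsq)
{
	ifsq->ifsq_hw_serialize->locked = 0;
}
```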
if: Multiple TX queue support step 3 of 3; map CPUID to subqueue

Add a CPUID to subqueue mapping method to ifaltq.  A driver can provide
its own CPUID to subqueue mapping method through ifnet.if_mapsubq,
which is used when no ALTQ packet scheduler is enabled.  ALTQ's packet
schedulers always map CPUID to the default subqueue.
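The hook can be sketched as below.  The modulo default mapping is an
assumption for illustration; the commit only specifies that drivers may
install their own method and that ALTQ schedulers always pick the
default subqueue.

```c
/*
 * CPUID to sub-queue mapping hook: the driver installs a mapping
 * function; ALTQ packet schedulers pin everything to sub-queue 0.
 */
#include <assert.h>

struct ifaltq {
	int	altq_subq_cnt;		/* number of sub-queues */
	int	(*altq_mapsubq)(struct ifaltq *, int);
};

/* Illustrative default: spread CPUs across sub-queues. */
static int
ifq_mapsubq_default(struct ifaltq *ifq, int cpuid)
{
	return (cpuid % ifq->altq_subq_cnt);
}

/* Used when an ALTQ packet scheduler is enabled. */
static int
ifq_mapsubq_altq(struct ifaltq *ifq, int cpuid)
{
	(void)ifq;
	(void)cpuid;
	return (0);		/* default sub-queue only */
}

static int
ifq_map_subq(struct ifaltq *ifq, int cpuid)
{
	return (ifq->altq_mapsubq(ifq, cpuid));
}
```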
if: Multiple TX queue support step 1 of many; introduce ifaltq subqueue

Put the plain queue information, e.g. queue head and tail, serializer,
packet staging scoreboard and ifnet.if_start schedule netmsg etc., into
its own structure (subqueue).  The ifaltq structure can contain
multiple subqueues, based on the count that drivers specify.

A subqueue's enqueue, dequeue, purging and state updating are protected
by the subqueue's serializer, so for hardware supporting multiple TX
queues, contention on queuing operations can be greatly reduced.

The subqueue is passed to if_start to let the driver know which
hardware TX queue to work on.  Only the related driver TX queue's
serializer is held, so for hardware supporting multiple TX queues,
contention on the driver's TX queue serializer can be greatly reduced.

A bunch of ifsq_ prefixed functions are added, which are used to
perform various operations on subqueues.  Commonly used ifq_ prefixed
functions are still kept, mainly for the drivers which do not support
multiple TX queues (well, these functions also ease the netif/
conversion in this step :).

All of the pseudo network devices under sys/net are converted to use
the new subqueue operations.  netproto/802_11 is converted too.  igb(4)
is converted to use the new subqueue operations; the rest of the
network drivers are only changed for the if_start interface
modification.

For ALTQs which have a packet scheduler enabled, only the first
subqueue is used (*).

(*) Whether we should utilize multiple TX queues when an ALTQ packet
scheduler is enabled is quite questionable, mainly because the
hardware's multiple TX queue packet dequeue mechanism could have a
negative impact on the ALTQ packet scheduler's decisions.
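The ifaltq/subqueue split can be modeled as follows.  The field names
and the allocation helper are illustrative; the point is only the shape
of the data: one ifaltq owning a driver-sized array of sub-queues, each
with its own per-queue state.

```c
/*
 * Userland model of the ifaltq / sub-queue relationship: the driver
 * specifies the sub-queue count (e.g. one per hardware TX queue), and
 * the ifq_ prefixed compat wrappers operate on the first sub-queue.
 */
#include <assert.h>
#include <stdlib.h>

struct ifaltq_subque {
	int	ifsq_index;	/* which hardware TX queue this maps to */
	int	ifsq_len;	/* queued packet count */
	int	ifsq_started;	/* if_start interlock */
};

struct ifaltq {
	int			altq_subq_cnt;
	struct ifaltq_subque	*altq_subq;
};

/* Driver sets the sub-queue count at attach time. */
static void
ifq_set_subq_cnt(struct ifaltq *ifq, int cnt)
{
	ifq->altq_subq_cnt = cnt;
	ifq->altq_subq = calloc(cnt, sizeof(struct ifaltq_subque));
	for (int i = 0; i < cnt; i++)
		ifq->altq_subq[i].ifsq_index = i;
}

/* ifq_ prefixed compat functions use the first (default) sub-queue. */
static struct ifaltq_subque *
ifq_get_subq_default(struct ifaltq *ifq)
{
	return (&ifq->altq_subq[0]);
}
```

Because each sub-queue carries its own serializer and state in the real
structure, two CPUs enqueueing to different hardware TX queues never
contend on the same lock.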
if: Move if_cpuid into ifaltq; prepare multiple TX queues support

if_cpuid and if_npoll_cpuid are merged and moved into ifaltq as
altq_cpuid, which indicates the owner CPU of the TX queue.  Since we
already have code in if_start_dispatch() to catch TX queue owner CPU
changes, this merge is quite safe.
if: Move IFF_OACTIVE bit into ifaltq; prepare multiple TX queues support

ifaltq.altq_hw_oactive is now used to record that the NIC's TX queue is
full.  IFF_OACTIVE is removed from the kernel; the user space
IFF_OACTIVE is kept for compatibility.

ifaltq.altq_hw_oactive should not be accessed directly.  The following
set of functions is provided and should be used:
ifq_is_oactive(ifnet.if_snd)  - Whether the NIC's TX queue is full
ifq_set_oactive(ifnet.if_snd) - The NIC's TX queue is full
ifq_clr_oactive(ifnet.if_snd) - The NIC's TX queue is no longer full
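The three accessors are plain wrappers around the new field.  A
userland sketch (the struct is reduced to the one field relevant here;
in the kernel they take ifnet.if_snd):

```c
/*
 * oactive accessors replacing the IFF_OACTIVE interface flag: the
 * "TX queue full" state lives in the ifaltq and is only touched
 * through these helpers.
 */
#include <assert.h>
#include <stdbool.h>

struct ifaltq {
	int	altq_hw_oactive;	/* NIC TX queue full */
};

static bool
ifq_is_oactive(const struct ifaltq *ifq)
{
	return (ifq->altq_hw_oactive != 0);
}

static void
ifq_set_oactive(struct ifaltq *ifq)
{
	ifq->altq_hw_oactive = 1;
}

static void
ifq_clr_oactive(struct ifaltq *ifq)
{
	ifq->altq_hw_oactive = 0;
}
```

Keeping the state per-ifaltq rather than in if_flags is what lets each
future TX queue track "full" independently instead of sharing one
interface-wide bit.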
ifq/staging: Perform IFQ packet staging for if_start scheduling

IFQ packet staging is now performed for ifnet's if_start scheduling,
i.e. if_start_schedule(), in addition to direct ifnet if_start calling.

The IFQ packet staging stopping condition
- The if_start interlock (if_snd.altq_started) is not released.
is now changed to
- if_start_schedule() is not pending on the current CPU and the
  if_start interlock (if_snd.altq_started) is not released.

By setting net.link.stage_cntmax to 8 and hw.igbX.tx_wreg_nsegs to 16,
the following performance improvement is gained:
+80Kpps for normal IP forwarding
+30Kpps for fast IP forwarding
ifq/staging: Initial implementation of the IFQ packet staging mechanism

The packets enqueued into IFQ are staged to a certain amount before the
ifnet's if_start is called.  In this way, the driver can avoid writing
to hardware registers upon every packet; instead, hardware registers
can be written once a certain amount of packets have been put onto the
hardware TX ring.  Measurement on several modern NICs (emx(4), igb(4),
bnx(4), bge(4), jme(4)) shows that aggregating the hardware register
writes can save ~20% CPU time when 18-byte UDP datagrams are
transmitted at 1.48Mpps.

IFQ packet staging is performed for direct ifnet if_start calling, i.e.
ifq_try_ifstart().

IFQ packet staging will be stopped upon any of the following
conditions:
- The count of packets enqueued on the current CPU is greater than or
  equal to ifq_stage_cntmax.
- The total length of packets enqueued on the current CPU is greater
  than or equal to the hardware's MTU - max_protohdr.  max_protohdr is
  subtracted from the hardware's MTU mainly because a full TCP
  segment's size is usually less than the hardware's MTU.
- The if_start interlock (if_snd.altq_started) is not released.
- if_start_rollup(), which is registered as a low priority netisr
  rollup function, is called; probably because no more work is pending
  for netisr.

Currently IFQ packet staging is only performed in netisr threads.

Inspired-by: Luigi Rizzo's netmap paper
             (http://info.iet.unipi.it/~luigi/netmap/)
Also-Suggested-by: dillon@
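The first two stopping conditions, the per-CPU counter thresholds, can
be sketched directly; the interlock and rollup conditions depend on
netisr state and are omitted here.  The `struct ifq_stage` layout and
the default value of 4 are illustrative assumptions.

```c
/*
 * Sketch of the staging cut-off: packets are staged per CPU until the
 * staged packet count reaches stage_cntmax or the staged bytes reach
 * MTU - max_protohdr, at which point if_start is dispatched.
 */
#include <assert.h>
#include <stdbool.h>

struct ifq_stage {
	int	stg_pktcnt;	/* packets staged on this CPU */
	long	stg_bytes;	/* bytes staged on this CPU */
};

static int	ifq_stage_cntmax = 4;	/* net.link.stage_cntmax analogue */

/*
 * Returns true when staging must stop and if_start be called, i.e.
 * when either the packet count or the byte threshold is reached.
 */
static bool
ifq_stage_done(const struct ifq_stage *stage, long mtu, long max_protohdr)
{
	if (stage->stg_pktcnt >= ifq_stage_cntmax)
		return (true);
	if (stage->stg_bytes >= mtu - max_protohdr)
		return (true);
	return (false);
}
```

The byte threshold means a single full-sized TCP segment is always
flushed immediately, which is why max_protohdr is subtracted from the
MTU.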