.\" Copyright (c) 2002 Luigi Rizzo
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD: src/share/man/man4/polling.4,v 1.27 2007/04/06 14:25:14 brueffer Exp $
.\"
.Dd May 23, 2013
.Dt POLLING 4
.Os
.Sh NAME
.Nm polling
.Nd network device driver polling support
.Sh SYNOPSIS
.Cd "options IFPOLL_ENABLE"
.Sh DESCRIPTION
Network device polling
.Nm (
for brevity) refers to a technique that
lets the operating system periodically poll network devices, instead of
relying on the network devices to generate interrupts when they need attention.
This might seem inefficient and counterintuitive, but when done
properly,
.Nm
gives the operating system more control over
when and how to handle network devices, with a number of advantages in terms
of system responsiveness and performance.
.Pp
In particular,
.Nm
reduces the context switch overhead
incurred when servicing interrupts, and
gives more control over the scheduling of a CPU between various
tasks (user processes, software interrupts, device handling),
which ultimately reduces the chances of livelock in the system.
.Ss Principles of Operation
In the normal, interrupt-based mode, network devices generate an interrupt
whenever they need attention.
This in turn causes a
context switch and the execution of an interrupt handler
which performs whatever processing is needed by the network device.
The duration of the interrupt handler is potentially unbounded
unless the network device driver has been programmed with real-time
concerns in mind (which is generally not the case for
.Dx
drivers).
Furthermore, under heavy traffic load, the system might be
persistently processing interrupts without being able to
complete other work, either in the kernel or in userland.
.Pp
Network device polling disables device interrupts and instead polls
network devices on clock interrupts.
This way, the context switch overhead is removed.
Furthermore,
the operating system can control accurately how much time to spend
handling network device events, and thus prevent livelock by reserving
some amount of CPU for other tasks.
.Pp
Enabling
.Nm
also changes the way software network interrupts
are scheduled, so there is never a risk of livelock caused by
packets not being processed to completion.
.Ss Enabling polling
Polling is turned on and off with the
.Xr ifconfig 8
command.
An interface does not have to be
.Dq up
in order to turn on its
.Nm
feature.
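.Pp
A minimal sketch, assuming a hypothetical interface named
.Li em0
whose driver supports
.Nm ,
and assuming that the
.Cm polling
keyword is the one documented in
.Xr ifconfig 8
on the running system:
.Bd -literal -offset indent
# Turn polling on for em0 (the interface name is an assumption).
ifconfig em0 polling
# Turn it off again, returning to interrupt-based operation.
ifconfig em0 -polling
.Ed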
.Ss Loader Tunables
The following tunables can be set from
.Xr loader.conf 5
.Em ( X
is the CPU number):
.Bl -tag -width indent -compact
.It Va net.ifpoll.burst_max
Default value for
.Va net.ifpoll.X.rx.burst_max
sysctl nodes.
.Pp
.It Va net.ifpoll.each_burst
Default value for
.Va net.ifpoll.X.rx.each_burst
sysctl nodes.
.Pp
.It Va net.ifpoll.user_frac
Default value for
.Va net.ifpoll.X.rx.user_frac
sysctl nodes.
.Pp
.It Va net.ifpoll.pollhz
Default value for
.Va net.ifpoll.X.pollhz
sysctl nodes.
.Pp
.It Va net.ifpoll.status_frac
Default value for
.Va net.ifpoll.0.status_frac
sysctl node.
.Pp
.It Va net.ifpoll.tx_frac
Default value for
.Va net.ifpoll.X.tx_frac
sysctl nodes.
.El
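.Pp
For illustration, a
.Xr loader.conf 5
fragment using two of the tunables above might look as follows.
The values are examples only, not recommendations: 2000 is simply a value
inside the documented
.Va pollhz
range, and 250 restates the documented default.
.Bd -literal -offset indent
# Illustrative values only.
net.ifpoll.pollhz="2000"
net.ifpoll.burst_max="250"
.Ed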
.Ss MIB Variables
The operation of
.Nm
is controlled by the following per-CPU
.Xr sysctl 8
MIB variables
.Em ( X
is the CPU number):
.Pp
.Bl -tag -width indent -compact
.It Va net.ifpoll.X.pollhz
The polling frequency, whose range is 1 to 30000.
Default is 6000.
.Pp
.It Va net.ifpoll.X.rx.user_frac
When
.Nm
is enabled, and provided that there is some work to do,
up to this percent of the CPU cycles is reserved for userland tasks,
the remaining fraction being available for
.Nm
processing.
Default is 50.
.Pp
.It Va net.ifpoll.X.rx.burst
Maximum number of packets grabbed from each network interface in
each timer tick.
This number is dynamically adjusted by the kernel,
according to the programmed
.Va user_frac , burst_max ,
CPU speed, and system load.
.Pp
.It Va net.ifpoll.X.rx.each_burst
The burst above is split into smaller chunks of this number of
packets, going round-robin among all interfaces registered for
.Nm .
This prevents a large burst from a single interface
from saturating the IP interrupt queue.
Default is 50.
.Pp
.It Va net.ifpoll.X.rx.burst_max
Upper bound for
.Va net.ifpoll.X.rx.burst .
Note that when
.Nm
is enabled, each interface can receive at most
.Pq Va pollhz No * Va burst_max
packets per second unless there are spare CPU cycles available for
.Nm
in the idle loop.
This number should be tuned to match the expected load.
Default is 250, which is adequate for a 1000Mbit network with pollhz=6000.
.Pp
.It Va net.ifpoll.X.rx.handlers
How many active network devices have registered for packet reception
.Nm .
.Pp
.It Va net.ifpoll.X.tx_frac
Controls how often (every
.Va tx_frac No / Va pollhz
seconds) the transmission queue is checked for packet transmission
done events.
Increasing this value reduces the time spent checking for packet
transmission done events and thus reduces bus load,
but it also increases the chance
of the transmission queue becoming saturated.
Default is 1.
.Pp
.It Va net.ifpoll.X.tx.handlers
How many active network devices have registered for packet transmission
.Nm .
.Pp
.It Va net.ifpoll.0.status_frac
Controls how often (every
.Va status_frac No / Va pollhz
seconds) the status registers of the network device are checked for error
conditions and the like.
Increasing this value reduces the load on the bus,
but also delays the error detection.
Default is 120.
.Pp
.It Va net.ifpoll.0.status.handlers
How many active network devices have registered for status
.Nm .
.Pp
.It Va net.ifpoll.X.rx.short_ticks
.It Va net.ifpoll.X.rx.lost_polls
.It Va net.ifpoll.X.rx.pending_polls
.It Va net.ifpoll.X.rx.residual_burst
.It Va net.ifpoll.X.rx.phase
.It Va net.ifpoll.X.rx.suspect
.It Va net.ifpoll.X.rx.stalled
.It Va net.ifpoll.X.tx.short_ticks
.It Va net.ifpoll.X.tx.lost_polls
.It Va net.ifpoll.X.tx.pending_polls
.It Va net.ifpoll.X.tx.residual_burst
.It Va net.ifpoll.X.tx.phase
.It Va net.ifpoll.X.tx.suspect
.It Va net.ifpoll.X.tx.stalled
Debugging variables.
.El
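.Pp
As a rough check of the documented defaults, the figures can be worked
through by hand.
This is a sketch that assumes minimum-size Ethernet frames, each occupying
84 bytes (672 bits) on the wire once preamble and inter-frame gap are counted;
that frame size is an assumption, not something the kernel reports:
.Bd -literal -offset indent
receive ceiling:  pollhz * burst_max   = 6000 * 250 = 1500000 packets/s
1000Mbit rate:    1000000000 / 672    ~= 1488095 packets/s
status interval:  status_frac / pollhz = 120 / 6000 = 0.02 s
tx interval:      tx_frac / pollhz     = 1 / 6000  ~= 0.00017 s
.Ed
.Pp
Under these assumptions the default receive ceiling sits slightly above the
worst-case packet rate of a 1000Mbit link.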
.Sh SUPPORTED DEVICES
Network device polling requires explicit modifications to
the network device drivers.
As of this writing, the
.Xr bce 4 ,
.Xr bge 4 ,
.Xr bnx 4 ,
.Xr dc 4 ,
.Xr em 4 ,
.Xr emx 4 ,
.Xr fwe 4 ,
.Xr fxp 4 ,
.Xr igb 4 ,
.Xr ix 4 ,
.Xr jme 4 ,
.Xr mxge 4 ,
.Xr nfe 4 ,
.Xr nge 4 ,
.Xr re 4 ,
.Xr rl 4 ,
.Xr sis 4 ,
.Xr stge 4 ,
.Xr vge 4 ,
.Xr vr 4 ,
and
.Xr xl 4
devices are supported,
with others in the works.
The
.Xr bce 4 ,
.Xr bnx 4 ,
.Xr emx 4 ,
.Xr igb 4 ,
.Xr ix 4 ,
.Xr jme 4 ,
and
.Xr mxge 4
devices support multiple reception queue based
.Nm .
The
.Xr bce 4 ,
.Xr bnx 4 ,
certain types of
.Xr emx 4 ,
.Xr igb 4 ,
and
.Xr ix 4
devices support multiple transmission queue based
.Nm .
The modifications are rather straightforward, consisting of
extracting the inner part of the interrupt service routine
and writing a callback function,
.Fn *_npoll ,
which is invoked
to probe the network device for events and process them.
(See the
conditionally compiled sections of the network device drivers mentioned above
for more details.)
.Pp
In order to reduce the latency in processing packets,
it is advisable to set the
.Xr sysctl 8
variable
.Va net.ifpoll.X.pollhz
to at least 1000.
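.Pp
For example, assuming CPU 0 and an illustrative value of 2000
(both are assumptions chosen for the example):
.Bd -literal -offset indent
# Show the current polling frequency for CPU 0.
sysctl net.ifpoll.0.pollhz
# Change it at runtime (the value is an example only).
sysctl net.ifpoll.0.pollhz=2000
.Ed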
.Sh HISTORY
Network device polling first appeared in
.Fx 4.6 .
It was rewritten in
.Dx 1.3 .
.Sh AUTHORS
.An -nosplit
The network device polling code was rewritten by
.An Matt Dillon
based on the original code by
.An Luigi Rizzo Aq Mt luigi@iet.unipi.it .
.An Sepherosa Ziehau
made the polling frequency settable at runtime,
added per-CPU polling
and added multiple reception and transmission queue polling support.