.\" Copyright (c) 2002 Luigi Rizzo
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD: src/share/man/man4/polling.4,v 1.27 2007/04/06 14:25:14 brueffer Exp $
.\"
.Dd May 23, 2013
.Dt POLLING 4
.Os
.Sh NAME
.Nm polling
.Nd network device driver polling support
.Sh SYNOPSIS
.Cd "options IFPOLL_ENABLE"
.Sh DESCRIPTION
Network device polling
.Nm (
for brevity) refers to a technique that
lets the operating system periodically poll network devices, instead of
relying on the network devices to generate interrupts when they need attention.
This might seem inefficient and counterintuitive, but when done
properly,
.Nm
gives the operating system more control over
when and how to handle network devices, with a number of advantages in terms
of system responsiveness and performance.
.Pp
In particular,
.Nm
reduces the context switch overhead
incurred when servicing interrupts, and
gives more control over the scheduling of a CPU between various
tasks (user processes, software interrupts, device handling),
which ultimately reduces the chances of livelock in the system.
.Ss Principles of Operation
In the normal, interrupt-based mode, network devices generate an interrupt
whenever they need attention.
This in turn causes a
context switch and the execution of an interrupt handler
which performs whatever processing is needed by the network device.
The duration of the interrupt handler is potentially unbounded
unless the network device driver has been programmed with real-time
concerns in mind (which is generally not the case for
.Dx
drivers).
Furthermore, under heavy traffic load, the system might be
persistently processing interrupts without being able to
complete other work, either in the kernel or in userland.
.Pp
Network device polling disables device interrupts and instead polls
network devices on clock interrupts.
This removes the context switch overhead.
Furthermore,
the operating system can accurately control how much time is spent
handling network device events, and thus prevent livelock by reserving
some amount of CPU time for other tasks.
.Pp
Enabling
.Nm
also changes the way software network interrupts
are scheduled, so there is never a risk of the livelock that arises when
packets are not processed to completion.
.Ss Enabling polling
Polling is turned on and off with the
.Xr ifconfig 8
command.
An interface does not have to be
.Dq up
in order to turn on its
.Nm
feature.
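.Pp
As a minimal sketch, assuming an interface named
.Li em0
(a hypothetical name) and assuming that
.Xr ifconfig 8
accepts the
.Cm polling
and
.Cm -polling
parameters,
.Nm
could be toggled as follows:
.Bd -literal -offset indent
# enable polling on the (assumed) interface em0
ifconfig em0 polling
# disable polling again and return to interrupt mode
ifconfig em0 -polling
.Ed
.Pp
Consult
.Xr ifconfig 8
for the exact parameter names supported by the installed system.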
.Ss Loader Tunables
The following tunables can be set from
.Xr loader.conf 5
.Em ( X
is the CPU number):
.Bl -tag -width indent -compact
.It Va net.ifpoll.burst_max
Default value for
.Va net.ifpoll.X.rx.burst_max
sysctl nodes.
.Pp
.It Va net.ifpoll.each_burst
Default value for
.Va net.ifpoll.X.rx.each_burst
sysctl nodes.
.Pp
.It Va net.ifpoll.user_frac
Default value for
.Va net.ifpoll.X.rx.user_frac
sysctl nodes.
.Pp
.It Va net.ifpoll.pollhz
Default value for
.Va net.ifpoll.X.pollhz
sysctl nodes.
.Pp
.It Va net.ifpoll.status_frac
Default value for
.Va net.ifpoll.0.status_frac
sysctl node.
.Pp
.It Va net.ifpoll.tx_frac
Default value for
.Va net.ifpoll.X.tx_frac
sysctl nodes.
.El
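.Pp
For illustration, a
.Xr loader.conf 5
fragment setting some of these tunables might look like the following.
The values are simply the defaults documented for the corresponding
sysctl nodes, not tuning recommendations:
.Bd -literal -offset indent
# illustrative values only (the documented defaults)
net.ifpoll.pollhz="6000"
net.ifpoll.burst_max="250"
net.ifpoll.each_burst="50"
net.ifpoll.user_frac="50"
.Ed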
.Ss MIB Variables
The operation of
.Nm
is controlled by the following per-CPU
.Xr sysctl 8
MIB variables
.Em ( X
is the CPU number):
.Pp
.Bl -tag -width indent -compact
.It Va net.ifpoll.X.pollhz
The polling frequency, whose range is 1 to 30000.
Default is 6000.
.Pp
.It Va net.ifpoll.X.rx.user_frac
When
.Nm
is enabled, and provided that there is some work to do,
up to this percentage of the CPU cycles is reserved for userland tasks,
the remaining fraction being available for
.Nm
processing.
Default is 50.
.Pp
.It Va net.ifpoll.X.rx.burst
Maximum number of packets grabbed from each network interface in
each timer tick.
This number is dynamically adjusted by the kernel,
according to the programmed
.Va user_frac , burst_max ,
CPU speed, and system load.
.Pp
.It Va net.ifpoll.X.rx.each_burst
The burst above is split into smaller chunks of this number of
packets, going round-robin among all interfaces registered for
.Nm .
This prevents a large burst from a single interface
from saturating the IP interrupt queue.
Default is 50.
.Pp
.It Va net.ifpoll.X.rx.burst_max
Upper bound for
.Va net.ifpoll.X.rx.burst .
Note that when
.Nm
is enabled, each interface can receive at most
.Pq Va pollhz No * Va burst_max
packets per second unless there are spare CPU cycles available for
.Nm
in the idle loop.
This number should be tuned to match the expected load.
Default is 250, which is adequate for a 1000Mbit network with pollhz=6000.
.Pp
.It Va net.ifpoll.X.rx.handlers
How many active network devices have registered for packet reception
.Nm .
.Pp
.It Va net.ifpoll.X.tx_frac
Controls how often (every
.Va tx_frac No / Va pollhz
seconds) the transmission queue is checked for packet transmission
done events.
Increasing this value reduces the time spent checking for packet
transmission done events and thus reduces bus load,
but it also increases the chance
that the transmission queue becomes saturated.
Default is 1.
.Pp
.It Va net.ifpoll.X.tx.handlers
How many active network devices have registered for packet transmission
.Nm .
.Pp
.It Va net.ifpoll.0.status_frac
Controls how often (every
.Va status_frac No / Va pollhz
seconds) the status registers of the network device are checked for error
conditions and the like.
Increasing this value reduces the load on the bus,
but also delays error detection.
Default is 120.
.Pp
.It Va net.ifpoll.0.status.handlers
How many active network devices have registered for status
.Nm .
.Pp
.It Va net.ifpoll.X.rx.short_ticks
.It Va net.ifpoll.X.rx.lost_polls
.It Va net.ifpoll.X.rx.pending_polls
.It Va net.ifpoll.X.rx.residual_burst
.It Va net.ifpoll.X.rx.phase
.It Va net.ifpoll.X.rx.suspect
.It Va net.ifpoll.X.rx.stalled
.It Va net.ifpoll.X.tx.short_ticks
.It Va net.ifpoll.X.tx.lost_polls
.It Va net.ifpoll.X.tx.pending_polls
.It Va net.ifpoll.X.tx.residual_burst
.It Va net.ifpoll.X.tx.phase
.It Va net.ifpoll.X.tx.suspect
.It Va net.ifpoll.X.tx.stalled
Debugging variables.
.El
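.Pp
As a rough worked example using the default values listed above, and
ignoring any extra polling done in the idle loop:
.Bd -literal -offset indent
pollhz * burst_max   = 6000 * 250  = 1500000 packets per second
tx_frac / pollhz     = 1 / 6000 s  (about 167 microseconds)
status_frac / pollhz = 120 / 6000 s = 20 milliseconds
.Ed
.Pp
For comparison, a 1000Mbit Ethernet link carries at most about 1488000
minimum-size frames per second, which is just under the receive bound
computed above.
.Pp
A sketch of inspecting and adjusting the polling frequency with
.Xr sysctl 8 ,
assuming CPU 0 and an illustrative target value of 2000:
.Bd -literal -offset indent
# read the current polling frequency on CPU 0
sysctl net.ifpoll.0.pollhz
# set it to 2000 (illustrative; valid range is 1 to 30000)
sysctl net.ifpoll.0.pollhz=2000
.Ed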
.Sh SUPPORTED DEVICES
Network device polling requires explicit modifications to
the network device drivers.
As of this writing, the
.Xr bce 4 ,
.Xr bge 4 ,
.Xr bnx 4 ,
.Xr dc 4 ,
.Xr em 4 ,
.Xr emx 4 ,
.Xr fwe 4 ,
.Xr fxp 4 ,
.Xr igb 4 ,
.Xr ix 4 ,
.Xr jme 4 ,
.Xr mxge 4 ,
.Xr nfe 4 ,
.Xr nge 4 ,
.Xr re 4 ,
.Xr rl 4 ,
.Xr sis 4 ,
.Xr stge 4 ,
.Xr vge 4 ,
.Xr vr 4 ,
and
.Xr xl 4
devices are supported,
with others in the works.
The
.Xr bce 4 ,
.Xr bnx 4 ,
.Xr emx 4 ,
.Xr igb 4 ,
.Xr ix 4 ,
.Xr jme 4 ,
and
.Xr mxge 4
devices support
.Nm
on multiple reception queues.
The
.Xr bce 4 ,
.Xr bnx 4 ,
certain types of
.Xr emx 4 ,
.Xr igb 4 ,
and
.Xr ix 4
devices support
.Nm
on multiple transmission queues.
The modifications are rather straightforward, consisting of
extracting the inner part of the interrupt service routine
and writing a callback function,
.Fn *_npoll ,
which is invoked
to probe the network device for events and process them.
(See the
conditionally compiled sections of the network devices mentioned above
for more details.)
.Pp
In order to reduce the latency in processing packets,
it is advisable to set the
.Xr sysctl 8
variable
.Va net.ifpoll.X.pollhz
to at least 1000.
.Sh HISTORY
Network device polling first appeared in
.Fx 4.6 .
It was rewritten in
.Dx 1.3 .
.Sh AUTHORS
.An -nosplit
The network device polling code was rewritten by
.An Matt Dillon
based on the original code by
.An Luigi Rizzo Aq Mt luigi@iet.unipi.it .
.An Sepherosa Ziehau
made the polling frequency settable at runtime,
added per-CPU polling,
and added multiple reception and transmission queue polling support.