polling: Increase default status polling fraction to 80
.\" Copyright (c) 2002 Luigi Rizzo
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD: src/share/man/man4/polling.4,v 1.27 2007/04/06 14:25:14 brueffer Exp $
.\" $DragonFly: src/share/man/man4/polling.4,v 1.13 2007/11/03 07:35:52 swildner Exp $
.\"
.Dd November 16, 2012
.Dt POLLING 4
.Os
.Sh NAME
.Nm polling
.Nd network device driver polling support
.Sh SYNOPSIS
.Cd "options IFPOLL_ENABLE"
.Sh DESCRIPTION
Device polling
.Nm (
for brevity) refers to a technique that
lets the operating system periodically poll devices, instead of
relying on the devices to generate interrupts when they need attention.
This might seem inefficient and counterintuitive, but when done
properly,
.Nm
gives more control to the operating system on
when and how to handle devices, with a number of advantages in terms
of system responsiveness and performance.
.Pp
In particular,
.Nm
reduces the overhead for context
switches which is incurred when servicing interrupts, and
gives more control on the scheduling of a CPU between various
tasks (user processes, software interrupts, device handling)
which ultimately reduces the chances of livelock in the system.
.Ss Principles of Operation
In the normal, interrupt-based mode, devices generate an interrupt
whenever they need attention.
This in turn causes a
context switch and the execution of an interrupt handler
which performs whatever processing is needed by the device.
The duration of the interrupt handler is potentially unbounded
unless the device driver has been programmed with real-time
concerns in mind (which is generally not the case for
.Dx
drivers).
Furthermore, under heavy traffic load, the system might be
persistently processing interrupts without being able to
complete other work, either in the kernel or in userland.
.Pp
Device polling disables device interrupts and instead polls devices on clock
interrupts.
This way, the context switch overhead is removed.
Furthermore,
the operating system can control accurately how much time is spent
handling device events, and thus prevent livelock by reserving
some amount of CPU for other tasks.
.Pp
Enabling
.Nm
also changes the way software network interrupts
are scheduled, so there is never the risk of livelock caused by
packets not being processed to completion.
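.Pp
The bounded, round-robin servicing described above can be illustrated with
a toy model.
This is only a sketch and is not taken from the kernel sources: the function
.Fn poll_tick
and its queue representation are hypothetical, while the parameter names
.Va burst
and
.Va each_burst
mirror the sysctl variables described under
.Sx MIB Variables .
.Bd -literal -offset indent
```python
from collections import deque


def poll_tick(queues, burst, each_burst):
    """Serve one timer tick of a toy polling loop.

    queues maps an interface name to a deque of pending packets.
    Each interface gives up at most `burst` packets per tick, taken in
    chunks of at most `each_burst`, visiting interfaces round-robin.
    Returns the number of packets taken from each interface.
    """
    taken = {name: 0 for name in queues}
    progress = True
    while progress:
        progress = False
        for name, pending in queues.items():
            # Chunk size is limited by each_burst, by the remaining
            # per-tick budget, and by what is actually queued.
            chunk = min(each_burst, burst - taken[name], len(pending))
            for _ in range(chunk):
                pending.popleft()  # "process" one packet
            if chunk > 0:
                taken[name] += chunk
                progress = True
    return taken
```
.Ed
.Pp
Because the work done per tick is capped, whatever remains queued simply
waits for the next tick, leaving the rest of the CPU time to other tasks.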
.Ss Enabling polling
Currently only network interface drivers support the
.Nm
feature.
It is turned on and off with the
.Xr ifconfig 8
command.
An interface does not have to be
.Dq up
in order to turn on its
.Nm
feature.
.Ss Loader Tunables
The following tunables can be set from
.Xr loader.conf 5
.Em ( X
is the CPU number):
.Bl -tag -width indent -compact
.It Va net.ifpoll.burst_max
Default value for
.Va net.ifpoll.X.rx.burst_max
sysctl nodes.
.Pp
.It Va net.ifpoll.each_burst
Default value for
.Va net.ifpoll.X.rx.each_burst
sysctl nodes.
.Pp
.It Va net.ifpoll.pollhz
Default value for
.Va net.ifpoll.X.pollhz
sysctl nodes.
.Pp
.It Va net.ifpoll.status_frac
Default value for
.Va net.ifpoll.0.status_frac
sysctl node.
.Pp
.It Va net.ifpoll.tx_frac
Default value for
.Va net.ifpoll.X.tx_frac
sysctl nodes.
.El
.Ss MIB Variables
The operation of
.Nm
is controlled by the following per-CPU
.Xr sysctl 8
MIB variables
.Em ( X
is the CPU number):
.Pp
.Bl -tag -width indent -compact
.It Va net.ifpoll.X.pollhz
The polling frequency, whose range is 1 to 30000.
Default is 4000.
.Pp
.It Va net.ifpoll.X.rx.user_frac
When
.Nm
is enabled, and provided that there is some work to do,
up to this percent of the CPU cycles is reserved for userland tasks,
the remaining fraction being available for
.Nm
processing.
Default is 50.
.Pp
.It Va net.ifpoll.X.rx.burst
Maximum number of packets grabbed from each network interface in
each timer tick.
This number is dynamically adjusted by the kernel,
according to the programmed
.Va user_frac , burst_max ,
CPU speed, and system load.
.Pp
.It Va net.ifpoll.X.rx.each_burst
The burst above is split into smaller chunks of this number of
packets, going round-robin among all interfaces registered for
.Nm .
This prevents a large burst from a single interface
from saturating the IP interrupt queue.
Default is 15.
.Pp
.It Va net.ifpoll.X.rx.burst_max
Upper bound for
.Va net.ifpoll.X.rx.burst .
Note that when
.Nm
is enabled, each interface can receive at most
.Pq Va pollhz No * Va burst_max
packets per second unless there are spare CPU cycles available for
.Nm
in the idle loop.
This number should be tuned to match the expected load.
Default is 375, which is adequate for a 1000Mbit network and pollhz=4000.
.Pp
.It Va net.ifpoll.X.rx.handlers
How many active devices have registered for packet reception
.Nm .
.Pp
.It Va net.ifpoll.X.tx_frac
Controls how often (every
.Va tx_frac No / Va pollhz
seconds) the transmission queue is checked for packet transmission
done events.
Increasing this value reduces the time spent checking for packet
transmission done events and thus reduces bus load,
but it also increases the chance
of the transmission queue becoming saturated.
Default is 1.
.Pp
.It Va net.ifpoll.X.tx.handlers
How many active devices have registered for packet transmission
.Nm .
.Pp
.It Va net.ifpoll.0.status_frac
Controls how often (every
.Va status_frac No / Va pollhz
seconds) the status registers of the device are checked for error
conditions and the like.
Increasing this value reduces the load on the bus,
but also delays the error detection.
Default is 80.
.Pp
.It Va net.ifpoll.0.status.handlers
How many active devices have registered for status
.Nm .
.Pp
.It Va net.ifpoll.X.rx.short_ticks
.It Va net.ifpoll.X.rx.lost_polls
.It Va net.ifpoll.X.rx.pending_polls
.It Va net.ifpoll.X.rx.residual_burst
.It Va net.ifpoll.X.rx.phase
.It Va net.ifpoll.X.rx.suspect
.It Va net.ifpoll.X.rx.stalled
.It Va net.ifpoll.X.tx.short_ticks
.It Va net.ifpoll.X.tx.lost_polls
.It Va net.ifpoll.X.tx.pending_polls
.It Va net.ifpoll.X.tx.residual_burst
.It Va net.ifpoll.X.tx.phase
.It Va net.ifpoll.X.tx.suspect
.It Va net.ifpoll.X.tx.stalled
Debugging variables.
.El
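.Pp
The receive ceiling of
.Pq Va pollhz No * Va burst_max
packets per second can be checked against the defaults with a little
arithmetic.
The sketch below is illustrative only: the function names are hypothetical,
and comparing against minimum-size 64-byte Ethernet frames (plus 20 bytes of
preamble and inter-frame gap per frame) is an assumption made here about what
a worst-case gigabit load looks like, not something stated by the driver code.
.Bd -literal -offset indent
```python
def max_polled_pps(pollhz, burst_max):
    """Packets per second one interface can deliver when each poll
    takes at most burst_max packets (idle-loop polls are ignored)."""
    return pollhz * burst_max


def line_rate_pps(link_bps, frame_bytes=64, overhead_bytes=20):
    """Frames per second on an Ethernet link of link_bps bits/s.

    overhead_bytes covers the 8-byte preamble/SFD and the 12-byte
    inter-frame gap that accompany every frame on the wire.
    """
    return link_bps / ((frame_bytes + overhead_bytes) * 8)


# Defaults from this page: pollhz=4000, burst_max=375.
ceiling = max_polled_pps(4000, 375)
# Assumed worst case: minimum-size frames on a 1000Mbit link.
worst_case = line_rate_pps(1_000_000_000)
```
.Ed
.Pp
Under these assumptions the ceiling is 1,500,000 packets per second, just
above the roughly 1,488,000 minimum-size frames per second that a gigabit
link can carry.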
.Sh SUPPORTED DEVICES
Device polling requires explicit modifications to the device drivers.
As of this writing, the
.Xr bce 4 ,
.Xr bge 4 ,
.Xr bnx 4 ,
.Xr dc 4 ,
.Xr em 4 ,
.Xr emx 4 ,
.Xr fwe 4 ,
.Xr fxp 4 ,
.Xr igb 4 ,
.Xr jme 4 ,
.Xr nfe 4 ,
.Xr nge 4 ,
.Xr re 4 ,
.Xr rl 4 ,
.Xr sis 4 ,
.Xr stge 4 ,
.Xr vge 4 ,
.Xr vr 4 ,
and
.Xr xl 4
devices are supported,
with others in the works.
The
.Xr emx 4 ,
.Xr igb 4 ,
and
.Xr jme 4
drivers support
.Nm
based on multiple reception queues.
The modifications are rather straightforward, consisting of
the extraction of the inner part of the interrupt service routine
and writing a callback function,
.Fn *_npoll ,
which is invoked
to probe the device for events and process them.
(See the
conditionally compiled sections of the devices mentioned above
for more details.)
.Pp
In order to reduce the latency in processing packets,
it is advisable to set the
.Xr sysctl 8
variable
.Va net.ifpoll.X.pollhz
to at least 1000.
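.Pp
The effect of
.Va pollhz
on latency, and on the
.Va tx_frac
and
.Va status_frac
check intervals, follows directly from the formulas given under
.Sx MIB Variables .
The helper names below are hypothetical, and the result is only a rough
estimate: it assumes a packet waits at most one polling period and ignores
the time spent processing it.
.Bd -literal -offset indent
```python
def poll_interval_us(pollhz):
    """Time between two consecutive polls, in microseconds."""
    return 1_000_000 / pollhz


def check_interval_ms(frac, pollhz):
    """Interval of a check performed every frac / pollhz seconds,
    expressed in milliseconds (applies to tx_frac and status_frac)."""
    return 1000 * frac / pollhz


# pollhz=1000 bounds the wait for the next poll at about 1 ms.
wait_at_1000hz = poll_interval_us(1000)
# With the defaults pollhz=4000, tx_frac=1 and status_frac=80:
wait_at_default = poll_interval_us(4000)
tx_check = check_interval_ms(1, 4000)
status_check = check_interval_ms(80, 4000)
```
.Ed
.Pp
With the defaults this gives a poll every 250 microseconds, a transmission
check on every poll, and a status check every 20 milliseconds.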
.Sh HISTORY
Device polling first appeared in
.Fx 4.6 .
It was rewritten in
.Dx 1.3 .
.Sh AUTHORS
.An -nosplit
The device polling code was rewritten by
.An Matt Dillon
based on the original code by
.An Luigi Rizzo Aq luigi@iet.unipi.it .
.An Sepherosa Ziehau
made the polling frequency settable at runtime,
added per CPU polling
and added multiple reception queue polling support.