.\" Copyright (c) 2002 Luigi Rizzo
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD: src/share/man/man4/polling.4,v 1.27 2007/04/06 14:25:14 brueffer Exp $
.\" $DragonFly: src/share/man/man4/polling.4,v 1.13 2007/11/03 07:35:52 swildner Exp $
.\"
.Dd October 2, 2007
.Dt POLLING 4
.Os
.Sh NAME
.Nm polling
.Nd device polling support
.Sh SYNOPSIS
.Cd "options DEVICE_POLLING"
.Sh DESCRIPTION
Device polling
.Nm (
for brevity) refers to a technique that
lets the operating system periodically poll devices, instead of
relying on the devices to generate interrupts when they need attention.
This might seem inefficient and counterintuitive, but when done
properly,
.Nm
gives the operating system more control over
when and how to handle devices, with a number of advantages in terms
of system responsiveness and performance.
.Pp
In particular,
.Nm
reduces the context switch overhead
incurred when servicing interrupts, and
gives more control over how a CPU is scheduled among various
tasks (user processes, software interrupts, device handling),
which ultimately reduces the chances of livelock in the system.
.Ss Principles of Operation
In the normal, interrupt-based mode, devices generate an interrupt
whenever they need attention.
This in turn causes a
context switch and the execution of an interrupt handler
which performs whatever processing is needed by the device.
The duration of the interrupt handler is potentially unbounded
unless the device driver has been programmed with real-time
concerns in mind (which is generally not the case for
.Dx
drivers).
Furthermore, under heavy traffic load, the system might be
persistently processing interrupts without being able to
complete other work, either in the kernel or in userland.
.Pp
Device polling disables device interrupts and instead polls the devices
on clock interrupts.
This way, the context switch overhead is removed.
Furthermore,
the operating system can accurately control how much work to spend
in handling device events, and thus prevent livelock by reserving
some amount of CPU for other tasks.
.Pp
Enabling
.Nm
also changes the way software network interrupts
are scheduled, so that livelock caused by packets
not being processed to completion can no longer occur.
.Ss Enabling polling
Currently only network interface drivers support the
.Nm
feature.
It is turned on and off with the
.Xr ifconfig 8
command.
An interface does not have to be
.Dq up
in order to turn on its
.Nm
feature.
.Ss Loader Tunables
The following tunables can be set from
.Xr loader.conf 5 :
.Bl -tag -width indent -compact
.It Va kern.polling.enable
If set to non-zero,
.Nm
is enabled.
Default is enabled.
.Pp
.It Va kern.polling.cpumask
A bitmask that controls which CPUs support device polling.
Default is 0xffffffff.
.El
.Ss MIB Variables
The operation of
.Nm
is controlled by the following per-CPU
.Xr sysctl 8
MIB variables
.Em ( X
is the CPU number):
.Pp
.Bl -tag -width indent -compact
.It Va kern.polling.X.enable
If set to non-zero,
.Nm
is enabled.
Default is enabled.
.Pp
.It Va kern.polling.X.pollhz
The polling frequency, whose range is 1 to 30000.
Default is 2000.
.Pp
.It Va kern.polling.cpumask
A read-only bitmask of the CPUs that support device polling.
.Pp
.It Va kern.polling.defcpu
The default CPU used to run device polling (read-only).
.Pp
.It Va kern.polling.X.user_frac
When
.Nm
is enabled, and provided that there is some work to do,
up to this percent of the CPU cycles is reserved for userland tasks,
the remaining fraction being available for
.Nm
processing.
Default is 50.
.Pp
.It Va kern.polling.X.burst
Maximum number of packets grabbed from each network interface in
each timer tick.
This number is dynamically adjusted by the kernel,
according to the programmed
.Va user_frac , burst_max ,
CPU speed, and system load.
.Pp
.It Va kern.polling.X.each_burst
The burst above is split into smaller chunks of this number of
packets, going round-robin among all interfaces registered for
.Nm .
This prevents a large burst from a single interface
from saturating the IP interrupt queue
.Pq Va net.inet.ip.intr_queue_maxlen .
Default is 5.
.Pp
.It Va kern.polling.X.burst_max
Upper bound for
.Va kern.polling.X.burst .
Note that when
.Nm
is enabled, each interface can receive at most
.Pq Va pollhz No * Va burst_max
packets per second unless there are spare CPU cycles available for
.Nm
in the idle loop.
This number should be tuned to match the expected load
(which can be quite high with GigE cards).
Default is 150, which is adequate for a 100Mbit network with pollhz=1000.
.Pp
.It Va kern.polling.X.reg_frac
Controls how often (every
.Va reg_frac No / Va pollhz
seconds) the status registers of the device are checked for error
conditions and the like.
Increasing this value reduces the load on the bus, but also delays
the error detection.
Default is 20.
.Pp
.It Va kern.polling.X.handlers
How many active devices have registered for
.Nm .
.Pp
.It Va kern.polling.X.short_ticks
.It Va kern.polling.X.lost_polls
.It Va kern.polling.X.pending_polls
.It Va kern.polling.X.residual_burst
.It Va kern.polling.X.phase
.It Va kern.polling.X.suspect
.It Va kern.polling.X.stalled
Debugging variables.
.El
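.Pp
The arithmetic behind several of the variables above can be sketched in C.
This is an illustrative userland model only, not the kernel code:
the function names are hypothetical, and the round-robin loop simply
follows the description of burst and each_burst given above.
It computes the per-interface packet ceiling (pollhz * burst_max),
the status check interval (reg_frac / pollhz, here in microseconds),
and how one burst is split into each_burst sized chunks across interfaces.

```c
#include <assert.h>

/* Ceiling on packets per second per interface: pollhz * burst_max. */
static long max_pps(long pollhz, long burst_max)
{
    return pollhz * burst_max;
}

/* Interval between status register checks, reg_frac / pollhz seconds,
 * expressed in microseconds to stay in integer arithmetic. */
static long status_check_usec(long reg_frac, long pollhz)
{
    assert(pollhz > 0);
    return reg_frac * 1000000L / pollhz;
}

/*
 * Model of one polling tick (an assumption based on the text, not the
 * kernel source): the burst is consumed in chunks of at most each_burst
 * packets, and every chunk is offered to each interface in turn.
 * pending[i] is the number of packets waiting on interface i,
 * handled[i] accumulates what was processed.
 * Returns the number of round-robin passes made.
 */
static int poll_tick(int pending[], int handled[], int nif,
                     int burst, int each_burst)
{
    int passes = 0;
    int residual = burst;

    assert(each_burst > 0);
    while (residual > 0) {
        int chunk = residual < each_burst ? residual : each_burst;

        residual -= chunk;
        for (int i = 0; i < nif; i++) {
            int n = pending[i] < chunk ? pending[i] : chunk;

            pending[i] -= n;
            handled[i] += n;
        }
        passes++;
    }
    return passes;
}
```

With pollhz=1000 and burst_max=150 the ceiling is 150000 packets per second,
slightly above the roughly 148810 minimum-size frames per second that
100Mbit Ethernet can carry.
With the default pollhz of 2000 it is 300000, and the default reg_frac of 20
gives a status check every 10000 microseconds.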
.Sh SUPPORTED DEVICES
Device polling requires explicit modifications to the device drivers.
As of this writing, the
.Xr bce 4 ,
.Xr bge 4 ,
.Xr dc 4 ,
.Xr em 4 ,
.Xr fwe 4 ,
.Xr fxp 4 ,
.Xr jme 4 ,
.Xr nfe 4 ,
.Xr nge 4 ,
.Xr re 4 ,
.Xr rl 4 ,
.Xr sis 4 ,
.Xr stge 4 ,
.Xr vge 4 ,
.Xr vr 4 ,
.Xr wi 4
and
.Xr xl 4
devices are supported, with others in the works.
The modifications are rather straightforward, consisting of
extracting the inner part of the interrupt service routine
and writing a callback function,
.Fn *_poll ,
which is invoked
to probe the device for events and process them.
(See the
conditionally compiled sections of the drivers for the devices mentioned above
for more details.)
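.Pp
The shape of such a driver modification can be sketched as follows.
This is a self-contained userland model under stated assumptions, not code
from any real driver: the fake_dev structure, the fake_poll_cmd values and
all function names are hypothetical stand-ins.
It shows the inner receive loop factored out of the interrupt routine so
that a poll callback can reuse it with a bounded packet count.

```c
/* Hypothetical stand-in for a network device's state. */
struct fake_dev {
    int rx_pending;     /* frames waiting in the receive ring */
    int rx_done;        /* frames processed so far */
    int status_checks;  /* how often the status registers were read */
};

/* Hypothetical commands passed to the poll callback. */
enum fake_poll_cmd { FAKE_POLL_ONLY, FAKE_POLL_AND_CHECK_STATUS };

/*
 * Inner part of the interrupt service routine, extracted so that both
 * paths can share it.  A negative count means "no limit".
 * Returns the number of frames processed.
 */
static int fake_rxeof(struct fake_dev *d, int count)
{
    int n = 0;

    while (d->rx_pending > 0 && (count < 0 || n < count)) {
        d->rx_pending--;
        d->rx_done++;
        n++;
    }
    return n;
}

/* Stand-in for reading the status registers and handling errors. */
static void fake_check_status(struct fake_dev *d)
{
    d->status_checks++;
}

/* Interrupt path: checks status and drains the ring without a bound. */
static void fake_intr(struct fake_dev *d)
{
    fake_check_status(d);
    fake_rxeof(d, -1);
}

/* Polling path: processes at most count frames per call and touches
 * the status registers only when asked to. */
static void fake_poll(struct fake_dev *d, enum fake_poll_cmd cmd, int count)
{
    if (cmd == FAKE_POLL_AND_CHECK_STATUS)
        fake_check_status(d);
    fake_rxeof(d, count);
}
```

Bounding the work per call is what lets the system limit the time spent in
device handling, and making the status check optional corresponds to the
reduced bus load described for reg_frac.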
.Pp
In order to reduce the latency in processing packets,
it is advisable to set the
.Xr sysctl 8
variable
.Va kern.polling.X.pollhz
to at least 1000.
.Sh HISTORY
Device polling first appeared in
.Fx 4.6 .
It was rewritten in
.Dx 1.3 .
.Sh AUTHORS
.An -nosplit
The device polling code was rewritten by
.An Matt Dillon
based on the original code by
.An Luigi Rizzo Aq luigi@iet.unipi.it .
.An Sepherosa Ziehau
made the polling frequency settable at runtime and added per-CPU polling.