.\" Copyright (c) 2002 Luigi Rizzo
.\" All rights reserved.
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\" $FreeBSD: src/share/man/man4/polling.4,v 1.27 2007/04/06 14:25:14 brueffer Exp $
.\" $DragonFly: src/share/man/man4/polling.4,v 1.13 2007/11/03 07:35:52 swildner Exp $
.Nd network device driver polling support
.Cd "options IFPOLL_ENABLE"
for brevity) refers to a technique that
lets the operating system periodically poll devices, instead of
relying on the devices to generate interrupts when they need attention.
This might seem inefficient and counterintuitive, but when done
gives more control to the operating system over
when and how to handle devices, with a number of advantages in terms
of system responsiveness and performance.
reduces the overhead of the context
switches incurred when servicing interrupts, and
gives more control over the scheduling of a CPU between various
tasks (user processes, software interrupts, device handling),
which ultimately reduces the chances of livelock in the system.
.Ss Principles of Operation
In the normal, interrupt-based mode, devices generate an interrupt
whenever they need attention.
context switch and the execution of an interrupt handler
which performs whatever processing is needed by the device.
The duration of the interrupt handler is potentially unbounded
unless the device driver has been programmed with real-time
concerns in mind (which is generally not the case for
Furthermore, under heavy traffic load, the system might be
persistently processing interrupts without being able to
complete other work, either in the kernel or in userland.
Device polling disables interrupts by polling devices on clock
This way, the context switch overhead is removed.
the operating system can accurately control how much time to spend
handling device events, and thus prevent livelock by reserving
some amount of CPU for other tasks.
also changes the way software network interrupts
are scheduled, so there is never the risk of livelock because
packets are not processed to completion.
Currently only network interface drivers support the
It is turned on and off with the help of
An interface does not have to be
in order to turn on its
The following tunables can be set from
.Bl -tag -width indent -compact
.It Va net.ifpoll.burst_max
.Va net.ifpoll.X.rx.burst_max
.It Va net.ifpoll.each_burst
.Va net.ifpoll.X.rx.each_burst
.It Va net.ifpoll.user_frac
.Va net.ifpoll.X.rx.user_frac
.It Va net.ifpoll.pollhz
.Va net.ifpoll.X.pollhz
.It Va net.ifpoll.status_frac
.Va net.ifpoll.0.status_frac
.It Va net.ifpoll.tx_frac
.Va net.ifpoll.X.tx_frac
is controlled by the following per-CPU
.Bl -tag -width indent -compact
.It Va net.ifpoll.X.pollhz
The polling frequency, whose range is 1 to 30000.
.It Va net.ifpoll.X.rx.user_frac
is enabled, and provided that there is some work to do,
up to this percent of the CPU cycles is reserved for userland tasks,
the remaining fraction being available for
.It Va net.ifpoll.X.rx.burst
Maximum number of packets grabbed from each network interface in
This number is dynamically adjusted by the kernel,
according to the programmed
.Va user_frac , burst_max ,
CPU speed, and system load.
.It Va net.ifpoll.X.rx.each_burst
The burst above is split into smaller chunks of this number of
packets, going round-robin among all interfaces registered for
This prevents a large burst from a single interface
from saturating the IP interrupt queue.
.It Va net.ifpoll.X.rx.burst_max
.Va net.ifpoll.X.rx.burst .
is enabled, each interface can receive at most
.Pq Va pollhz No * Va burst_max
packets per second unless there are spare CPU cycles available for
This number should be tuned to match the expected load.
The default is 375, which is adequate for a 1000Mbit network with
.Va pollhz
set to 4000.
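As a rough check of that default, assuming minimum-sized Ethernet
frames (64 bytes plus 20 bytes of preamble and inter-frame gap,
or 672 bits on the wire), a 1000Mbit link carries at most about
1488095 packets per second, while the bound above works out to:
.Bd -literal -offset indent
pollhz * burst_max = 4000 * 375 = 1500000 packets/s
1000000000 / 672                = 1488095 packets/s (line rate)
.Ed
.Pp
Since only the product matters, lowering
.Va pollhz
without raising
.Va burst_max
lowers this bound proportionally.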
.It Va net.ifpoll.X.rx.handlers
How many active devices have registered for packet reception
.It Va net.ifpoll.X.tx_frac
Controls how often (every
.Va tx_frac No / Va pollhz
seconds) the transmission queue is checked for packet transmission
Increasing this value reduces the time spent checking for packet
transmission done events, and thus reduces bus load,
but it also increases the chance
that the transmission queue becomes saturated.
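For illustration only, with
.Va pollhz
set to 4000 and a hypothetical
.Va tx_frac
of 2 (a value picked for the arithmetic, not a documented default),
the transmission queue would be checked every 0.5 milliseconds:
.Bd -literal -offset indent
tx_frac / pollhz = 2 / 4000 = 0.0005 s between checks
pollhz / tx_frac = 4000 / 2 = 2000 checks per second
.Ed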
.It Va net.ifpoll.X.tx.handlers
How many active devices have registered for packet transmission
.It Va net.ifpoll.0.status_frac
Controls how often (every
.Va status_frac No / Va pollhz
seconds) the status registers of the device are checked for error
conditions and the like.
Increasing this value reduces the load on the bus,
but also delays error detection.
.It Va net.ifpoll.0.status.handlers
How many active devices have registered for status
.It Va net.ifpoll.X.rx.short_ticks
.It Va net.ifpoll.X.rx.lost_polls
.It Va net.ifpoll.X.rx.pending_polls
.It Va net.ifpoll.X.rx.residual_burst
.It Va net.ifpoll.X.rx.phase
.It Va net.ifpoll.X.rx.suspect
.It Va net.ifpoll.X.rx.stalled
.It Va net.ifpoll.X.tx.short_ticks
.It Va net.ifpoll.X.tx.lost_polls
.It Va net.ifpoll.X.tx.pending_polls
.It Va net.ifpoll.X.tx.residual_burst
.It Va net.ifpoll.X.tx.phase
.It Va net.ifpoll.X.tx.suspect
.It Va net.ifpoll.X.tx.stalled
.Sh SUPPORTED DEVICES
Device polling requires explicit modifications to the device drivers.
As of this writing, the
devices are supported,
with others in the works.
support multiple reception queues based
The modifications are rather straightforward, consisting of
the extraction of the inner part of the interrupt service routine
and writing a callback function,
to probe the device for events and process them.
conditionally compiled sections of the devices mentioned above
In order to reduce the latency in processing packets,
it is advisable to set the
.Va net.ifpoll.X.pollhz
Device polling first appeared in
The device polling code was rewritten by
based on the original code by
.An Luigi Rizzo Aq luigi@iet.unipi.it .
made the polling frequency settable at runtime,
added per-CPU polling,
and added multiple reception queue polling support.