| Commit | Line | Data |
|---|---|---|
| 250c8cec SW | 1 | .\" Copyright (c) 2002 Luigi Rizzo |
| 2 | .\" All rights reserved. | |
| 984263bc | 3 | .\" |
| 250c8cec SW | 4 | .\" Redistribution and use in source and binary forms, with or without |
| 5 | .\" modification, are permitted provided that the following conditions | |
| 6 | .\" are met: | |
| 7 | .\" 1. Redistributions of source code must retain the above copyright | |
| 8 | .\" notice, this list of conditions and the following disclaimer. | |
| 9 | .\" 2. Redistributions in binary form must reproduce the above copyright | |
| 10 | .\" notice, this list of conditions and the following disclaimer in the | |
| 11 | .\" documentation and/or other materials provided with the distribution. | |
| 984263bc | 12 | .\" |
| 250c8cec SW | 13 | .\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND |
| 14 | .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE | |
| 15 | .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE | |
| 16 | .\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE | |
| 17 | .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL | |
| 18 | .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS | |
| 19 | .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) | |
| 20 | .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT | |
| 21 | .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY | |
| 22 | .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF | |
| 23 | .\" SUCH DAMAGE. | |
| 24 | .\" | |
| 25 | .\" $FreeBSD: src/share/man/man4/polling.4,v 1.27 2007/04/06 14:25:14 brueffer Exp $ | |
| 577491c1 | 26 | .\" $DragonFly: src/share/man/man4/polling.4,v 1.13 2007/11/03 07:35:52 swildner Exp $ |
| 250c8cec SW | 27 | .\" |
| 28 | .Dd October 2, 2007 | |
| 984263bc MD | 29 | .Dt POLLING 4 |
| 30 | .Os | |
| 31 | .Sh NAME | |
| 32 | .Nm polling | |
| 33 | .Nd device polling support | |
| 34 | .Sh SYNOPSIS | |
| 35 | .Cd "options DEVICE_POLLING" | |
| 984263bc | 36 | .Sh DESCRIPTION |
| 250c8cec SW | 37 | Device polling |
| 38 | .Nm ( | |
| 39 | for brevity) refers to a technique that | |
| 40 | lets the operating system periodically poll devices, instead of | |
| 41 | relying on the devices to generate interrupts when they need attention. | |
| 984263bc MD | 42 | This might seem inefficient and counterintuitive, but when done |
| 43 | properly, | |
| 44 | .Nm | |
| 45 | gives the operating system more control over | |
| 46 | when and how to handle devices, with a number of advantages in terms | |
| 250c8cec | 47 | of system responsiveness and performance. |
| 984263bc MD | 48 | .Pp |
| 49 | In particular, | |
| 50 | .Nm | |
| 51 | reduces the overhead for context | |
| 52 | switches which is incurred when servicing interrupts, and | |
| 250c8cec | 53 | gives more control over the scheduling of a CPU among various |
| 984263bc MD | 54 | tasks (user processes, software interrupts, device handling) |
| 55 | which ultimately reduces the chances of livelock in the system. | |
| 250c8cec | 56 | .Ss Principles of Operation |
| 984263bc MD | 57 | In the normal, interrupt-based mode, devices generate an interrupt |
| 58 | whenever they need attention. | |
| 59 | This in turn causes a | |
| 60 | context switch and the execution of an interrupt handler | |
| 61 | which performs whatever processing is needed by the device. | |
| 62 | The duration of the interrupt handler is potentially unbounded | |
| 63 | unless the device driver has been programmed with real-time | |
| 64 | concerns in mind (which is generally not the case for | |
| 9bb2a92d | 65 | .Dx |
| 984263bc | 66 | drivers). |
| 250c8cec | 67 | Furthermore, under heavy traffic load, the system might be |
| 984263bc MD | 68 | persistently processing interrupts without being able to |
| 69 | complete other work, either in the kernel or in userland. | |
| 70 | .Pp | |
| 250c8cec SW | 71 | Device polling disables device interrupts and instead polls devices on clock |
| 72 | interrupts. | |
| 984263bc MD | 73 | This way, the context switch overhead is removed. |
| 74 | Furthermore, | |
| 75 | the operating system can accurately control how much work to spend | |
| 76 | in handling device events, and thus prevent livelock by reserving | |
| 77 | some amount of CPU for other tasks. | |
| 78 | .Pp | |
| 250c8cec SW | 79 | Enabling |
| 80 | .Nm | |
| 81 | also changes the way software network interrupts | |
| 82 | are scheduled, so there is never the risk of livelock caused by | |
| 83 | packets not being processed to completion. | |
| 84 | .Ss Enabling polling | |
| 85 | Currently only network interface drivers support the | |
| 86 | .Nm | |
| 87 | feature. | |
| 88 | It is turned on and off with the | |
| 89 | .Xr ifconfig 8 | |
| 90 | command. | |
| 577491c1 SW | 91 | An interface does not have to be |
| 92 | .Dq up | |
| 93 | in order to turn on its | |
| 94 | .Nm | |
| 95 | feature. | |
| 250c8cec SW | 96 | .Ss Loader Tunables |
| 97 | The following tunables can be set from | |
| 98 | .Xr loader.conf 5 : | |
| 99 | .Bl -tag -width indent -compact | |
| 100 | .It Va kern.polling.enable | |
| 101 | If set to non-zero, | |
| 102 | .Nm | |
| 103 | is enabled. | |
| 7d5ac269 | 104 | Default is enabled. |
| 250c8cec SW | 105 | .Pp |
| 106 | .It Va kern.polling.cpumask | |
| 107 | A bitmask that controls which CPUs support device polling. | |
| 108 | Default is 0xffffffff. | |
| 109 | .El | |
| 110 | .Ss MIB Variables | |
| 111 | The operation of | |
| 112 | .Nm | |
| 113 | is controlled by the following per-CPU | |
| 984263bc | 114 | .Xr sysctl 8 |
| 250c8cec SW | 115 | MIB variables |
| 116 | .Em ( X | |
| 117 | is the CPU number): | |
| 984263bc | 118 | .Pp |
| 250c8cec SW | 119 | .Bl -tag -width indent -compact |
| 120 | .It Va kern.polling.X.enable | |
| 121 | If set to non-zero, | |
| 122 | .Nm | |
| 123 | is enabled. | |
| 7d5ac269 | 124 | Default is enabled. |
| 250c8cec SW | 125 | .Pp |
| 126 | .It Va kern.polling.X.pollhz | |
| 127 | The polling frequency, whose range is 1 to 30000. | |
| 128 | Default is 2000. | |
| 129 | .Pp | |
| 130 | .It Va kern.polling.cpumask | |
| 131 | A read-only bitmask of the CPUs that support device polling. | |
| 132 | .Pp | |
| 133 | .It Va kern.polling.defcpu | |
| 134 | The default CPU used to run device polling (read-only). | |
| 135 | .Pp | |
| 136 | .It Va kern.polling.X.user_frac | |
| 984263bc MD | 137 | When |
| 138 | .Nm | |
| 250c8cec SW | 139 | is enabled, and provided that there is some work to do, |
| 140 | up to this percentage of the CPU cycles is reserved for userland tasks, | |
| 141 | the remaining fraction being available for | |
| 142 | .Nm | |
| 143 | processing. | |
| 144 | Default is 50. | |
| 984263bc | 145 | .Pp |
| 250c8cec SW | 146 | .It Va kern.polling.X.burst |
| 147 | Maximum number of packets grabbed from each network interface in | |
| 148 | each timer tick. | |
| 149 | This number is dynamically adjusted by the kernel, | |
| 150 | according to the programmed | |
| 151 | .Va user_frac , burst_max , | |
| 152 | CPU speed, and system load. | |
| 153 | .Pp | |
| 154 | .It Va kern.polling.X.each_burst | |
| 155 | The burst above is split into smaller chunks of this number of | |
| 156 | packets, going round-robin among all interfaces registered for | |
| 157 | .Nm . | |
| 158 | This prevents a large burst from a single interface | |
| 159 | from saturating the IP interrupt queue | |
| 160 | .Pq Va net.inet.ip.intr_queue_maxlen . | |
| 161 | Default is 5. | |
| 162 | .Pp | |
| 163 | .It Va kern.polling.X.burst_max | |
| 164 | Upper bound for | |
| 165 | .Va kern.polling.X.burst . | |
| 166 | Note that when | |
| 984263bc | 167 | .Nm |
| 250c8cec SW | 168 | is enabled, each interface can receive at most |
| 169 | .Pq Va pollhz No * Va burst_max | |
| 170 | packets per second unless there are spare CPU cycles available for | |
| 171 | .Nm | |
| 172 | in the idle loop. | |
| 173 | This number should be tuned to match the expected load | |
| 174 | (which can be quite high with GigE cards). | |
| 175 | Default is 150, which is adequate for a 100Mbit network and pollhz=1000. | |
| 176 | .Pp | |
| 177 | .It Va kern.polling.X.reg_frac | |
| 178 | Controls how often (every | |
| 179 | .Va reg_frac No / Va pollhz | |
| 180 | seconds) the status registers of the device are checked for error | |
| 181 | conditions and the like. | |
| 182 | Increasing this value reduces the load on the bus, but also delays | |
| 183 | the error detection. | |
| 184 | Default is 20. | |
| 185 | .Pp | |
| 186 | .It Va kern.polling.X.handlers | |
| 187 | How many active devices have registered for | |
| 188 | .Nm . | |
| 984263bc | 189 | .Pp |
| 250c8cec SW | 190 | .It Va kern.polling.X.short_ticks |
| 191 | .It Va kern.polling.X.lost_polls | |
| 192 | .It Va kern.polling.X.pending_polls | |
| 193 | .It Va kern.polling.X.residual_burst | |
| 194 | .It Va kern.polling.X.phase | |
| 195 | .It Va kern.polling.X.suspect | |
| 196 | .It Va kern.polling.X.stalled | |
| 197 | Debugging variables. | |
| 198 | .El | |
| 984263bc | 199 | .Sh SUPPORTED DEVICES |
| 250c8cec | 200 | Device polling requires explicit modifications to the device drivers. |
| 984263bc | 201 | As of this writing, the |
| 0fa5b73e | 202 | .Xr bce 4 , |
| 20f020b4 | 203 | .Xr bge 4 , |
| 984263bc MD | 204 | .Xr dc 4 , |
| 205 | .Xr em 4 , | |
| ea303db7 | 206 | .Xr fwe 4 , |
| 984263bc | 207 | .Xr fxp 4 , |
| 9de40864 | 208 | .Xr jme 4 , |
| 01fe1724 | 209 | .Xr nfe 4 , |
| ea303db7 JS | 210 | .Xr nge 4 , |
| 211 | .Xr re 4 , | |
| 984263bc | 212 | .Xr rl 4 , |
| 28e5ef00 | 213 | .Xr sis 4 , |
| 01fe1724 SW | 214 | .Xr stge 4 , |
| 215 | .Xr vge 4 , | |
| 28e5ef00 | 216 | .Xr vr 4 , |
| 28e5ef00 | 217 | .Xr wi 4 |
| 01fe1724 SW | 218 | and |
| 219 | .Xr xl 4 | |
| 250c8cec | 220 | devices are supported, with others in the works. |
| 984263bc MD | 221 | The modifications are rather straightforward, consisting of |
| 222 | extracting the inner part of the interrupt service routine | |
| 223 | and writing a callback function, | |
| 224 | .Fn *_poll , | |
| 225 | which is invoked | |
| 226 | to probe the device for events and process them. | |
| 250c8cec | 227 | (See the |
| 984263bc | 228 | conditionally compiled sections of the devices mentioned above |
| 250c8cec | 229 | for more details.) |
| 984263bc | 230 | .Pp |
| 28e5ef00 SW | 231 | In order to reduce the latency in processing packets, |
| 232 | it is advisable to set the | |
| 233 | .Xr sysctl 8 | |
| 234 | variable | |
| 250c8cec | 235 | .Va kern.polling.X.pollhz |
| 28e5ef00 | 236 | to at least 1000. |
| 984263bc | 237 | .Sh HISTORY |
| 250c8cec SW | 238 | Device polling first appeared in |
| 239 | .Fx 4.6 . | |
| 240 | It was rewritten in | |
| 241 | .Dx 1.3 . | |
| 242 | .Sh AUTHORS | |
| 243 | .An -nosplit | |
| 244 | The device polling code was rewritten by | |
| 245 | .An Matt Dillon | |
| 246 | based on the original code by | |
| 984263bc | 247 | .An Luigi Rizzo Aq luigi@iet.unipi.it . |
| 250c8cec SW | 248 | .An Sepherosa Ziehau |
| 249 | made the polling frequency settable at runtime and added per-CPU polling. |
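
As a worked check of the figures given for kern.polling.X.burst_max and kern.polling.X.reg_frac, the short Python sketch below evaluates the two formulas from the text. The function names are hypothetical helpers for illustration only, not kernel or sysctl interfaces. The 84-byte wire size of a minimum Ethernet frame (64-byte frame, 8 bytes of preamble, 12 bytes of inter-frame gap) is an assumption that this page does not state.

```python
# Illustrative arithmetic only. All helper names are hypothetical and do
# not correspond to any kernel, sysctl, or ifconfig interface.


def max_packets_per_second(pollhz: int, burst_max: int) -> int:
    """Per-interface receive ceiling, pollhz * burst_max.

    Ignores any extra polls performed from the idle loop.
    """
    return pollhz * burst_max


def status_check_interval(reg_frac: int, pollhz: int) -> float:
    """Seconds between status-register checks, reg_frac / pollhz."""
    return reg_frac / pollhz


def min_frame_rate(link_bits_per_second: int) -> float:
    """Worst-case packet rate of an Ethernet link.

    Assumes 84 bytes on the wire per minimum-size frame
    (64-byte frame + 8-byte preamble + 12-byte inter-frame gap).
    """
    return link_bits_per_second / (84 * 8)


if __name__ == "__main__":
    # Values quoted in the text: burst_max=150 with pollhz=1000.
    ceiling = max_packets_per_second(pollhz=1000, burst_max=150)
    worst_case = min_frame_rate(100_000_000)
    print("ceiling (packets/s):", ceiling)
    print("100Mbit worst case (packets/s):", round(worst_case))
    print("ceiling covers worst case:", ceiling >= worst_case)

    # Documented defaults: reg_frac=20, pollhz=2000.
    print("status check interval (s):", status_check_interval(20, 2000))
```

Under these assumptions, pollhz=1000 and burst_max=150 give a ceiling slightly above the minimum-size frame rate of a 100Mbit link, which is consistent with the default being described as adequate for that case. A faster link would need a larger burst_max or pollhz to keep the same margin.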