.\" Copyright (c) 2002 Luigi Rizzo
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD: src/share/man/man4/polling.4,v 1.27 2007/04/06 14:25:14 brueffer Exp $
.\"
.Dd May 23, 2013
.Dt POLLING 4
.Os
.Sh NAME
.Nm polling
.Nd network device driver polling support
.Sh SYNOPSIS
.Cd "options IFPOLL_ENABLE"
.Sh DESCRIPTION
Network device polling
.Nm (
for brevity) refers to a technique that
lets the operating system periodically poll network devices, instead of
relying on the network devices to generate interrupts when they need attention.
This might seem inefficient and counterintuitive, but when done
properly,
.Nm
gives the operating system more control over
when and how to handle network devices, with a number of advantages in terms
of system responsiveness and performance.
.Pp
In particular,
.Nm
reduces the context switch overhead
incurred when servicing interrupts, and
gives more control over the scheduling of a CPU between various
tasks (user processes, software interrupts, device handling),
which ultimately reduces the chances of livelock in the system.
.Ss Principles of Operation
In the normal, interrupt-based mode, network devices generate an interrupt
whenever they need attention.
This in turn causes a
context switch and the execution of an interrupt handler
which performs whatever processing is needed by the network device.
The duration of the interrupt handler is potentially unbounded
unless the network device driver has been programmed with real-time
concerns in mind (which is generally not the case for
.Dx
drivers).
Furthermore, under heavy traffic load, the system might be
persistently processing interrupts without being able to
complete other work, either in the kernel or in userland.
.Pp
Network device polling disables network device interrupts and instead
polls the network devices on clock interrupts.
This way, the context switch overhead is removed.
Furthermore,
the operating system can accurately control how much work is done
in handling network device events, and thus prevent livelock by reserving
some amount of CPU time for other tasks.
.Pp
Enabling
.Nm
also changes the way software network interrupts
are scheduled, so there is never the risk of livelock caused by
packets not being processed to completion.
.Ss Enabling Polling
Network device polling is turned on and off with the
.Xr ifconfig 8
command.
An interface does not have to be
.Dq up
in order to turn on its
.Nm
feature.
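.Pp
As a minimal sketch, assuming a hypothetical interface named
.Li em0
whose driver supports
.Nm ,
and assuming the
.Cm polling
and
.Cm -polling
parameters as documented in
.Xr ifconfig 8 :
.Bd -literal -offset indent
# Turn on polling for em0 (hypothetical interface name).
ifconfig em0 polling
# Return em0 to interrupt-driven operation.
ifconfig em0 -polling
.Ed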
.Ss Loader Tunables
The following tunables can be set from
.Xr loader.conf 5
.Em ( X
is the CPU number):
.Bl -tag -width indent -compact
.It Va net.ifpoll.burst_max
Default value for
.Va net.ifpoll.X.rx.burst_max
sysctl nodes.
.Pp
.It Va net.ifpoll.each_burst
Default value for
.Va net.ifpoll.X.rx.each_burst
sysctl nodes.
.Pp
.It Va net.ifpoll.user_frac
Default value for
.Va net.ifpoll.X.rx.user_frac
sysctl nodes.
.Pp
.It Va net.ifpoll.pollhz
Default value for
.Va net.ifpoll.X.pollhz
sysctl nodes.
.Pp
.It Va net.ifpoll.status_frac
Default value for
.Va net.ifpoll.0.status_frac
sysctl node.
.Pp
.It Va net.ifpoll.tx_frac
Default value for
.Va net.ifpoll.X.tx_frac
sysctl nodes.
.El
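.Pp
For illustration only, a
.Xr loader.conf 5
fragment setting two of these tunables might look like the following
sketch.
The values shown simply repeat the defaults documented for the
corresponding sysctl nodes and are not a tuning recommendation:
.Bd -literal -offset indent
# Initial value for every net.ifpoll.X.pollhz node.
net.ifpoll.pollhz="6000"
# Initial value for every net.ifpoll.X.rx.burst_max node.
net.ifpoll.burst_max="250"
.Ed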
.Ss MIB Variables
The operation of
.Nm
is controlled by the following per-CPU
.Xr sysctl 8
MIB variables
.Em ( X
is the CPU number):
.Pp
.Bl -tag -width indent -compact
.It Va net.ifpoll.X.pollhz
The polling frequency, whose range is 1 to 30000.
Default is 6000.
.Pp
.It Va net.ifpoll.X.rx.user_frac
When
.Nm
is enabled, and provided that there is some work to do,
up to this percentage of the CPU cycles is reserved for userland tasks,
the remaining fraction being available for
.Nm
processing.
Default is 50.
.Pp
.It Va net.ifpoll.X.rx.burst
Maximum number of packets grabbed from each network interface in
each timer tick.
This number is dynamically adjusted by the kernel,
according to the programmed
.Va user_frac , burst_max ,
CPU speed, and system load.
.Pp
.It Va net.ifpoll.X.rx.each_burst
The burst above is split into smaller chunks of this number of
packets, going round-robin among all interfaces registered for
.Nm .
This prevents a large burst from a single interface
from saturating the IP interrupt queue.
Default is 50.
.Pp
.It Va net.ifpoll.X.rx.burst_max
Upper bound for
.Va net.ifpoll.X.rx.burst .
Note that when
.Nm
is enabled, each interface can receive at most
.Pq Va pollhz No * Va burst_max
packets per second unless there are spare CPU cycles available for
.Nm
in the idle loop.
This number should be tuned to match the expected load.
Default is 250, which is adequate for a 1000Mbit network with
.Va pollhz
set to 6000.
.Pp
.It Va net.ifpoll.X.rx.handlers
How many active network devices have registered for packet reception
.Nm .
.Pp
.It Va net.ifpoll.X.tx_frac
Controls how often (every
.Va tx_frac No / Va pollhz
seconds) the transmission queue is checked for packet transmission
done events.
Increasing this value reduces the time spent checking for packet
transmission done events and thus reduces the bus load,
but it also increases the chance
of the transmission queue becoming saturated.
Default is 1.
.Pp
.It Va net.ifpoll.X.tx.handlers
How many active network devices have registered for packet transmission
.Nm .
.Pp
.It Va net.ifpoll.0.status_frac
Controls how often (every
.Va status_frac No / Va pollhz
seconds) the status registers of the network device are checked for error
conditions and the like.
Increasing this value reduces the load on the bus,
but also delays the error detection.
Default is 120.
.Pp
.It Va net.ifpoll.0.status.handlers
How many active network devices have registered for status
.Nm .
.Pp
.It Va net.ifpoll.X.rx.short_ticks
.It Va net.ifpoll.X.rx.lost_polls
.It Va net.ifpoll.X.rx.pending_polls
.It Va net.ifpoll.X.rx.residual_burst
.It Va net.ifpoll.X.rx.phase
.It Va net.ifpoll.X.rx.suspect
.It Va net.ifpoll.X.rx.stalled
.It Va net.ifpoll.X.tx.short_ticks
.It Va net.ifpoll.X.tx.lost_polls
.It Va net.ifpoll.X.tx.pending_polls
.It Va net.ifpoll.X.tx.residual_burst
.It Va net.ifpoll.X.tx.phase
.It Va net.ifpoll.X.tx.suspect
.It Va net.ifpoll.X.tx.stalled
Debugging variables.
.El
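.Pp
As a worked example of the formulas above, using only the defaults
listed in this section:
.Bd -literal -offset indent
pollhz * burst_max   = 6000 * 250 = 1500000 packets per second
tx_frac / pollhz     = 1 / 6000   = about 0.000167 seconds
status_frac / pollhz = 120 / 6000 = 0.02 seconds
.Ed
.Pp
For comparison, and as a figure that does not come from the polling code
itself, a 1000Mbit Ethernet link carries at most about 1488000
minimum-sized frames per second,
which is consistent with the default
.Va burst_max
being described as adequate for such a network.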
.Sh SUPPORTED DEVICES
Network device polling requires explicit modifications to
the network device drivers.
As of this writing, the
.Xr bce 4 ,
.Xr bge 4 ,
.Xr bnx 4 ,
.Xr dc 4 ,
.Xr em 4 ,
.Xr emx 4 ,
.Xr fwe 4 ,
.Xr fxp 4 ,
.Xr igb 4 ,
.Xr ix 4 ,
.Xr jme 4 ,
.Xr mxge 4 ,
.Xr nfe 4 ,
.Xr nge 4 ,
.Xr re 4 ,
.Xr rl 4 ,
.Xr sis 4 ,
.Xr stge 4 ,
.Xr vge 4 ,
.Xr vr 4 ,
and
.Xr xl 4
devices are supported,
with others in the works.
The
.Xr bce 4 ,
.Xr bnx 4 ,
.Xr emx 4 ,
.Xr igb 4 ,
.Xr ix 4 ,
.Xr jme 4 ,
and
.Xr mxge 4
drivers support multiple reception queue based
.Nm .
The
.Xr bce 4 ,
.Xr bnx 4 ,
certain types of
.Xr emx 4 ,
.Xr igb 4 ,
and
.Xr ix 4
drivers support multiple transmission queue based
.Nm .
The modifications are rather straightforward, consisting of
extracting the inner part of the interrupt service routine
and writing a callback function,
.Fn *_npoll ,
which is invoked
to probe the network device for events and process them.
(See the
conditionally compiled sections of the network device drivers mentioned above
for more details.)
.Pp
In order to reduce the latency in processing packets,
it is advisable to set the
.Xr sysctl 8
variable
.Va net.ifpoll.X.pollhz
to at least 1000.
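.Pp
A minimal sketch of doing so at runtime, assuming CPU number 0 and an
arbitrarily chosen value of 2000
(the devices are then polled every 1/2000 of a second,
that is, every 0.5 milliseconds):
.Bd -literal -offset indent
# Show the current polling frequency for CPU 0.
sysctl net.ifpoll.0.pollhz
# Set it to 2000; the value must lie within 1 to 30000.
sysctl net.ifpoll.0.pollhz=2000
.Ed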
.Sh HISTORY
Network device polling first appeared in
.Fx 4.6 .
It was rewritten in
.Dx 1.3 .
.Sh AUTHORS
.An -nosplit
The network device polling code was rewritten by
.An Matt Dillon
based on the original code by
.An Luigi Rizzo Aq Mt luigi@iet.unipi.it .
.An Sepherosa Ziehau
made the polling frequency settable at runtime,
added per-CPU polling
and added multiple reception and transmission queue polling support.