.\" Copyright (c) 2001 Matthew Dillon. Terms and conditions are those of
.\" the BSD Copyright as specified in the file "/usr/src/COPYRIGHT" in
.\" the source tree.
.\"
.\" $FreeBSD: src/share/man/man7/tuning.7,v 1.1.2.30 2002/12/17 19:32:08 dillon Exp $
.\" $DragonFly: src/share/man/man7/tuning.7,v 1.5 2005/08/01 01:49:17 swildner Exp $
.\"
.Dd May 25, 2001
.Dt TUNING 7
.Os
.Sh NAME
.Nm tuning
.Nd performance tuning under DragonFly
.Sh SYSTEM SETUP - DISKLABEL, NEWFS, TUNEFS, SWAP
When using
.Xr disklabel 8
or
.Xr sysinstall 8
to lay out your filesystems on a hard disk it is important to remember
that hard drives can transfer data much more quickly from outer tracks
than they can from inner tracks.
To take advantage of this you should
try to pack your smaller filesystems and swap closer to the outer tracks,
follow with the larger filesystems, and end with the largest filesystems.
It is also important to size system standard filesystems such that you
will not be forced to resize them later as you scale the machine up.
I usually create, in order, a 128M root, 1G swap, 128M
.Pa /var ,
128M
.Pa /var/tmp ,
3G
.Pa /usr ,
and use any remaining space for
.Pa /home .
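.Pp
As an illustration only (the sizes mirror the layout above; this is a
sketch, not literal
.Xr disklabel 8
input):
.Bd -literal -offset indent
a:  128M    /
b:  1G      swap
d:  128M    /var
e:  128M    /var/tmp
f:  3G      /usr
g:  (rest)  /home
.Ed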
.Pp
You should typically size your swap space to approximately 2x main memory.
If you do not have a lot of RAM, though, you will generally want a lot
more swap.
It is not recommended that you configure any less than
256M of swap on a system and you should keep in mind future memory
expansion when sizing the swap partition.
The kernel's VM paging algorithms are tuned to perform best when there is
at least 2x swap versus main memory.
Configuring too little swap can lead
to inefficiencies in the VM page scanning code as well as create issues
later on if you add more memory to your machine.
Finally, on larger systems
with multiple SCSI disks (or multiple IDE disks operating on different
controllers), we strongly recommend that you configure swap on each drive
(up to four drives).
The swap partitions on the drives should be approximately the same size.
The kernel can handle arbitrary sizes but
internal data structures scale to 4 times the largest swap partition.
Keeping
the swap partitions near the same size will allow the kernel to optimally
stripe swap space across the N disks.
Do not worry about overdoing it a
little; swap space is the saving grace of
.Ux
and even if you do not normally use much swap, it can give you more time to
recover from a runaway program before being forced to reboot.
.Pp
How you size your
.Pa /var
partition depends heavily on what you intend to use the machine for.
This
partition is primarily used to hold mailboxes, the print spool, and log
files.
Some people even make
.Pa /var/log
its own partition (but except for extreme cases it is not worth the waste
of a partition ID).
If your machine is intended to act as a mail
or print server,
or you are running a heavily visited web server, you should consider
creating a much larger partition \(en perhaps a gig or more.
It is very easy
to underestimate log file storage requirements.
.Pp
Sizing
.Pa /var/tmp
depends on the kind of temporary file usage you think you will need.
128M is
the minimum we recommend.
Also note that sysinstall will create a
.Pa /tmp
directory.
Dedicating a partition for temporary file storage is important for
two reasons: first, it reduces the possibility of filesystem corruption
in a crash, and second, it reduces the chance of a runaway process that
fills up
.Oo Pa /var Oc Ns Pa /tmp
from blowing up more critical subsystems (mail,
logging, etc.).
Filling up
.Oo Pa /var Oc Ns Pa /tmp
is a very common problem to have.
.Pp
In the old days there were differences between
.Pa /tmp
and
.Pa /var/tmp ,
but the introduction of
.Pa /var
(and
.Pa /var/tmp )
led to massive confusion
among program writers, so today programs haphazardly use one or the
other and thus no real distinction can be made between the two.
So it makes sense to have just one temporary directory and
softlink to it from the other tmp directory locations.
However you handle
.Pa /tmp ,
the one thing you do not want to do is leave it sitting
on the root partition where it might cause root to fill up or possibly
corrupt root in a crash/reboot situation.
.Pp
The
.Pa /usr
partition holds the bulk of the files required to support the system and
a subdirectory within it called
.Pa /usr/local
holds the bulk of the files installed from the
.Xr ports 7
hierarchy.
If you do not use ports all that much and do not intend to keep
system source
.Pq Pa /usr/src
on the machine, you can get away with
a 1 gigabyte
.Pa /usr
partition.
However, if you install a lot of ports
(especially window managers and Linux-emulated binaries), we recommend
at least a 2 gigabyte
.Pa /usr
and if you also intend to keep system source
on the machine, we recommend a 3 gigabyte
.Pa /usr .
Do not underestimate the
amount of space you will need in this partition; it can creep up and
surprise you!
.Pp
The
.Pa /home
partition is typically used to hold user-specific data.
I usually size it to the remainder of the disk.
.Pp
Why partition at all?
Why not create one big
.Pa /
partition and be done with it?
Then I do not have to worry about undersizing things!
Well, there are several reasons this is not a good idea.
First,
each partition has different operational characteristics and separating them
allows the filesystem to tune itself to those characteristics.
For example,
the root and
.Pa /usr
partitions are read-mostly, with very little writing, while
a lot of reading and writing could occur in
.Pa /var
and
.Pa /var/tmp .
By properly
partitioning your system, fragmentation introduced in the smaller more
heavily write-loaded partitions will not bleed over into the mostly-read
partitions.
Additionally, keeping the write-loaded partitions closer to
the edge of the disk (i.e. before the really big partitions instead of after
in the partition table) will increase I/O performance in the partitions
where you need it the most.
Now it is true that you might also need I/O
performance in the larger partitions, but they are so large that shifting
them more towards the edge of the disk will not lead to a significant
performance improvement whereas moving
.Pa /var
to the edge can have a huge impact.
Finally, there are safety concerns.
Having a small neat root partition that
is essentially read-only gives it a greater chance of surviving a bad crash
intact.
.Pp
Properly partitioning your system also allows you to tune
.Xr newfs 8
and
.Xr tunefs 8
parameters.
Tuning
.Xr newfs 8
requires more experience but can lead to significant improvements in
performance.
There are three parameters that are relatively safe to tune:
.Em blocksize , bytes/i-node ,
and
.Em cylinders/group .
.Pp
.Dx
performs best when using 8K or 16K filesystem block sizes.
The default filesystem block size is 16K,
which provides best performance for most applications,
with the exception of those that perform random access on large files
(such as database server software).
Such applications tend to perform better with a smaller block size,
although modern disk characteristics are such that the performance
gain from using a smaller block size may not be worth consideration.
Using a block size larger than 16K
can cause fragmentation of the buffer cache and
lead to lower performance.
.Pp
The defaults may be unsuitable
for a filesystem that requires a very large number of i-nodes
or is intended to hold a large number of very small files.
Such a filesystem should be created with an 8K or 4K block size.
This also requires you to specify a smaller
fragment size.
We recommend always using a fragment size that is 1/8
the block size (less testing has been done on other fragment size factors).
The
.Xr newfs 8
options for this would be
.Dq Li "newfs -f 1024 -b 8192 ..." .
.Pp
If a large partition is intended to be used to hold fewer, larger files, such
as database files, you can increase the
.Em bytes/i-node
ratio which reduces the number of i-nodes (maximum number of files and
directories that can be created) for that partition.
Decreasing the number
of i-nodes in a filesystem can greatly reduce
.Xr fsck 8
recovery times after a crash.
Do not use this option
unless you are actually storing large files on the partition, because if you
overcompensate you can wind up with a filesystem that has lots of free
space remaining but cannot accommodate any more files.
Using 32768, 65536, or 262144 bytes/i-node is recommended.
You can go higher but
it will have only incremental effects on
.Xr fsck 8
recovery times.
For example,
.Dq Li "newfs -i 32768 ..." .
.Pp
.Xr tunefs 8
may be used to further tune a filesystem.
This command can be run in
single-user mode without having to reformat the filesystem.
However, this is possibly the most abused program in the system.
Many people attempt to
increase available filesystem space by setting the min-free percentage to 0.
This can lead to severe filesystem fragmentation and we do not recommend
that you do this.
Really the only
.Xr tunefs 8
option worthwhile here is turning on
.Em softupdates
with
.Dq Li "tunefs -n enable /filesystem" .
(Note: in
.Fx 4.5
and later, softupdates can be turned on using the
.Fl U
option to
.Xr newfs 8 ,
and
.Xr sysinstall 8
will typically enable softupdates automatically for non-root filesystems).
Softupdates drastically improves meta-data performance, mainly file
creation and deletion.
We recommend enabling softupdates on most filesystems; however, there
are two limitations to softupdates that you should be aware of when
determining whether to use it on a filesystem.
First, softupdates guarantees filesystem consistency in the
case of a crash but could very easily be several seconds (even a minute!)
behind on pending writes to the physical disk.
If you crash you may lose more work
than otherwise.
Secondly, softupdates delays the freeing of filesystem
blocks.
If you have a filesystem (such as the root filesystem) which is
close to full, doing a major update of it, e.g.\&
.Dq Li "make installworld" ,
can run it out of space and cause the update to fail.
For this reason, softupdates will not be enabled on the root filesystem
during a typical install.
There is no loss of performance since the root
filesystem is rarely written to.
.Pp
A number of run-time
.Xr mount 8
options exist that can help you tune the system.
The most obvious and most dangerous one is
.Cm async .
Do not ever use it; it is far too dangerous.
A less dangerous and more
useful
.Xr mount 8
option is called
.Cm noatime .
.Ux
filesystems normally update the last-accessed time of a file or
directory whenever it is accessed.
This operation is handled in
.Dx
with a delayed write and normally does not create a burden on the system.
However, if your system is accessing a huge number of files on a continuing
basis the buffer cache can wind up getting polluted with atime updates,
creating a burden on the system.
For example, if you are running a heavily
loaded web site, or a news server with lots of readers, you might want to
consider turning off atime updates on your larger partitions with this
.Xr mount 8
option.
However, you should not gratuitously turn off atime
updates everywhere.
For example, the
.Pa /var
filesystem customarily
holds mailboxes, and atime (in combination with mtime) is used to
determine whether a mailbox has new mail.
You might as well leave
atime turned on for mostly read-only partitions such as
.Pa /
and
.Pa /usr .
This is especially useful for
.Pa /
since some system utilities
use the atime field for reporting.
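.Pp
As an illustrative sketch (the device and mount point are hypothetical),
atime updates can be disabled on a mounted filesystem with
.Dq Li "mount -u -o noatime /home" ,
or made permanent with an
.Xr fstab 5
entry such as:
.Bd -literal -offset indent
/dev/ad0s1g  /home  ufs  rw,noatime  2  2
.Ed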
.Sh STRIPING DISKS
In larger systems you can stripe partitions from several drives together
to create a much larger overall partition.
Striping can also improve
the performance of a filesystem by splitting I/O operations across two
or more disks.
The
.Xr vinum 8
and
.Xr ccdconfig 8
utilities may be used to create simple striped filesystems.
Generally
speaking, striping smaller partitions such as the root and
.Pa /var/tmp ,
or essentially read-only partitions such as
.Pa /usr
is a complete waste of time.
You should only stripe partitions that require serious I/O performance,
typically
.Pa /var , /home ,
or custom partitions used to hold databases and web pages.
Choosing the proper stripe size is also
important.
Filesystems tend to store meta-data on power-of-2 boundaries
and you usually want to reduce seeking rather than increase seeking.
This
means you want to use a large off-center stripe size such as 1152 sectors
so sequential I/O does not seek both disks and so meta-data is distributed
across both disks rather than concentrated on a single disk.
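.Pp
A minimal sketch of such a stripe using
.Xr ccdconfig 8
(the disk devices are hypothetical, and the resulting
.Pa ccd0
device must still be labeled and newfs'd) might be:
.Bd -literal -offset indent
ccdconfig ccd0 1152 none /dev/da0s1e /dev/da1s1e
.Ed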
If
you really need to get sophisticated, we recommend using a real hardware
RAID controller from the list of
.Dx
supported controllers.
.Sh SYSCTL TUNING
.Xr sysctl 8
variables permit system behavior to be monitored and controlled at
run-time.
Some sysctls simply report on the behavior of the system; others allow
the system behavior to be modified;
some may be set at boot time using
.Xr rc.conf 5 ,
but most will be set via
.Xr sysctl.conf 5 .
There are several hundred sysctls in the system, including many that appear
to be candidates for tuning but actually are not.
In this document we will only cover the ones that have the greatest effect
on the system.
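.Pp
As a brief illustration (the variable shown is discussed below), a sysctl
may be read and set at run-time, or persisted across reboots via
.Xr sysctl.conf 5 :
.Bd -literal -offset indent
# read the current value
sysctl vfs.vmiodirenable
# change it for the running system
sysctl vfs.vmiodirenable=1
# make the change permanent
echo 'vfs.vmiodirenable=1' >> /etc/sysctl.conf
.Ed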
.Pp
The
.Va kern.ipc.shm_use_phys
sysctl defaults to 0 (off) and may be set to 0 (off) or 1 (on).
Setting
this parameter to 1 will cause all System V shared memory segments to be
mapped to unpageable physical RAM.
This feature only has an effect if you
are either (A) mapping small amounts of shared memory across many (hundreds)
of processes, or (B) mapping large amounts of shared memory across any
number of processes.
This feature allows the kernel to remove a great deal
of internal memory management page-tracking overhead at the cost of wiring
the shared memory into core, making it unswappable.
.Pp
The
.Va vfs.vmiodirenable
sysctl defaults to 1 (on).
This parameter controls how directories are cached
by the system.
Most directories are small and use but a single fragment
(typically 1K) in the filesystem and even less (typically 512 bytes) in
the buffer cache.
However, when operating in the default mode the buffer
cache will only cache a fixed number of directories even if you have a huge
amount of memory.
Turning on this sysctl allows the buffer cache to use
the VM Page Cache to cache the directories.
The advantage is that all of
memory is now available for caching directories.
The disadvantage is that
the minimum in-core memory used to cache a directory is the physical page
size (typically 4K) rather than 512 bytes.
We recommend turning this option off in memory-constrained environments;
however, when on, it will substantially improve the performance of services
that manipulate a large number of files.
Such services can include web caches, large mail systems, and news systems.
Turning on this option will generally not reduce performance even with the
wasted memory but you should experiment to find out.
.Pp
The
.Va vfs.write_behind
sysctl defaults to 1 (on).
This tells the filesystem to issue media
writes as full clusters are collected, which typically occurs when writing
large sequential files.
The idea is to avoid saturating the buffer
cache with dirty buffers when it would not benefit I/O performance.
However,
this may stall processes and under certain circumstances you may wish to turn
it off.
.Pp
The
.Va vfs.hirunningspace
sysctl determines how much outstanding write I/O may be queued to
disk controllers system-wide at any given instant.
The default is
usually sufficient but on machines with lots of disks you may want to bump
it up to four or five megabytes.
Note that setting too high a value
(exceeding the buffer cache's write threshold) can lead to extremely
bad clustering performance.
Do not set this value arbitrarily high!
Also,
higher write queueing values may add latency to reads occurring at the same
time.
.Pp
There are various other buffer-cache and VM page cache related sysctls.
We do not recommend modifying these values.
As of
.Fx 4.3 ,
the VM system does an extremely good job tuning itself.
.Pp
The
.Va net.inet.tcp.sendspace
and
.Va net.inet.tcp.recvspace
sysctls are of particular interest if you are running network intensive
applications.
They control the amount of send and receive buffer space
allowed for any given TCP connection.
The default sending buffer is 32K; the default receiving buffer
is 64K.
You can often
improve bandwidth utilization by increasing the default at the cost of
eating up more kernel memory for each connection.
We do not recommend
increasing the defaults if you are serving hundreds or thousands of
simultaneous connections because it is possible to quickly run the system
out of memory due to stalled connections building up.
But if you need
high bandwidth over fewer connections, especially if you have
gigabit Ethernet, increasing these defaults can make a huge difference.
You can adjust the buffer size for incoming and outgoing data separately.
For example, if your machine is primarily doing web serving you may want
to decrease the recvspace in order to be able to increase the
sendspace without eating too much kernel memory.
Note that the routing table (see
.Xr route 8 )
can be used to introduce route-specific send and receive buffer size
defaults.
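.Pp
For example, a host pushing high bandwidth over a small number of
connections might raise both buffers to 64K (illustrative values only):
.Bd -literal -offset indent
sysctl net.inet.tcp.sendspace=65536
sysctl net.inet.tcp.recvspace=65536
.Ed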
.Pp
As an additional management tool you can use pipes in your
firewall rules (see
.Xr ipfw 8 )
to limit the bandwidth going to or from particular IP blocks or ports.
For example, if you have a T1 you might want to limit your web traffic
to 70% of the T1's bandwidth in order to leave the remainder available
for mail and interactive use.
Normally a heavily loaded web server
will not introduce significant latencies into other services even if
the network link is maxed out, but enforcing a limit can smooth things
out and lead to longer term stability.
Many people also enforce artificial
bandwidth limitations in order to ensure that they are not charged for
using too much bandwidth.
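.Pp
A sketch of such a limit using
.Xr ipfw 8
pipes follows; the rule number and the rate (roughly 70% of a 1.544Mbit/s
T1) are illustrative only:
.Bd -literal -offset indent
ipfw pipe 1 config bw 1080Kbit/s
ipfw add 1000 pipe 1 tcp from any 80 to any out
.Ed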
.Pp
Setting the send or receive TCP buffer to values larger than 65535 will
result in only a marginal performance improvement unless both hosts support
the window scaling extension of the TCP protocol, which is controlled by the
.Va net.inet.tcp.rfc1323
sysctl.
These extensions should be enabled and the TCP buffer size should be set
to a value larger than 65536 in order to obtain good performance from
certain types of network links; specifically, gigabit WAN links and
high-latency satellite links.
RFC1323 support is enabled by default.
.Pp
The
.Va net.inet.tcp.always_keepalive
sysctl determines whether or not the TCP implementation should attempt
to detect dead TCP connections by intermittently delivering
.Dq keepalives
on the connection.
By default, this is enabled for all applications; by setting this
sysctl to 0, only applications that specifically request keepalives
will use them.
In most environments, TCP keepalives will improve the management of
system state by expiring dead TCP connections, particularly for
systems serving dialup users who may not always terminate individual
TCP connections before disconnecting from the network.
However, in some environments, temporary network outages may be
incorrectly identified as dead sessions, resulting in unexpectedly
terminated TCP connections.
In such environments, setting the sysctl to 0 may reduce the occurrence of
TCP session disconnections.
.Pp
The
.Va net.inet.tcp.delayed_ack
TCP feature is largely misunderstood.
Historically speaking, this feature
was designed to allow the acknowledgement of transmitted data to be returned
along with the response.
For example, when you type over a remote shell
the acknowledgement of the character you send can be returned along with the
data representing the echo of the character.
With delayed acks turned off
the acknowledgement may be sent in its own packet before the remote service
has a chance to echo the data it just received.
This same concept also
applies to any interactive protocol (e.g. SMTP, WWW, POP3) and can cut the
number of tiny packets flowing across the network in half.
The
.Dx
delayed-ack implementation also follows the TCP protocol rule that
at least every other packet be acknowledged even if the standard 100ms
timeout has not yet passed.
Normally the worst a delayed ack can do is
slightly delay the teardown of a connection, or slightly delay the ramp-up
of a slow-start TCP connection.
While we are not sure, we believe that
the several FAQs related to packages such as SAMBA and SQUID which advise
turning off delayed acks may be referring to the slow-start issue.
In
.Dx
it would be more beneficial to increase the slow-start flightsize via
the
.Va net.inet.tcp.slowstart_flightsize
sysctl rather than to disable delayed acks.
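.Pp
For example (the value 4 is purely illustrative):
.Bd -literal -offset indent
sysctl net.inet.tcp.slowstart_flightsize=4
.Ed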
.Pp
The
.Va net.inet.tcp.inflight_enable
sysctl turns on bandwidth delay product limiting for all TCP connections.
The system will attempt to calculate the bandwidth delay product for each
connection and limit the amount of data queued to the network to just the
amount required to maintain optimum throughput.
This feature is useful
if you are serving data over modems, GigE, or high speed WAN links (or
any other link with a high bandwidth*delay product), especially if you are
also using window scaling or have configured a large send window.
If
you enable this option you should also be sure to set
.Va net.inet.tcp.inflight_debug
to 0 (disable debugging), and for production use setting
.Va net.inet.tcp.inflight_min
to at least 6144 may be beneficial.
Note, however, that setting high
minimums may effectively disable bandwidth limiting depending on the link.
The limiting feature reduces the amount of data built up in intermediate
router and switch packet queues as well as reduces the amount of data built
up in the local host's interface queue.
With fewer packets queued up,
interactive connections, especially over slow modems, will also be able
to operate with lower round trip times.
However, note that this feature
only affects data transmission (uploading / server-side).
It does not
affect data reception (downloading).
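.Pp
A production-oriented sketch using the values suggested above:
.Bd -literal -offset indent
sysctl net.inet.tcp.inflight_enable=1
sysctl net.inet.tcp.inflight_debug=0
sysctl net.inet.tcp.inflight_min=6144
.Ed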
.Pp
Adjusting
.Va net.inet.tcp.inflight_stab
is not recommended.
This parameter defaults to 20, representing 2 maximal packets added
to the bandwidth delay product window calculation.
The additional
window is required to stabilize the algorithm and improve responsiveness
to changing conditions, but it can also result in higher ping times
over slow links (though still much lower than you would get without
the inflight algorithm).
In such cases you may
wish to try reducing this parameter to 15, 10, or 5, and you may also
have to reduce
.Va net.inet.tcp.inflight_min
(for example, to 3500) to get the desired effect.
Reducing these parameters
should be done as a last resort only.
.Pp
The
.Va net.inet.ip.portrange.*
sysctls control the port number ranges automatically bound to TCP and UDP
sockets.
There are three ranges: a low range, a default range, and a
high range, selectable via an
.Dv IP_PORTRANGE
.Xr setsockopt 2
call.
Most
network programs use the default range which is controlled by
.Va net.inet.ip.portrange.first
and
.Va net.inet.ip.portrange.last ,
which default to 1024 and 5000, respectively.
Bound port ranges are
used for outgoing connections and it is possible to run the system out
of ports under certain circumstances.
This most commonly occurs when you are
running a heavily loaded web proxy.
The port range is not an issue
when running servers which handle mainly incoming connections, such as a
normal web server, or which have a limited number of outgoing connections,
such as a mail relay.
For situations where you may run yourself out of
ports we recommend increasing
.Va net.inet.ip.portrange.last
modestly.
A value of 10000 or 20000 or 30000 may be reasonable.
You should
also consider firewall effects when changing the port range.
Some firewalls
may block large ranges of ports (usually low-numbered ports) and expect
systems to use higher ranges of ports for outgoing connections.
For this reason
we do not recommend that
.Va net.inet.ip.portrange.first
be lowered.
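.Pp
For example, using one of the values suggested above:
.Bd -literal -offset indent
sysctl net.inet.ip.portrange.last=20000
.Ed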
.Pp
The
.Va kern.ipc.somaxconn
sysctl limits the size of the listen queue for accepting new TCP connections.
The default value of 128 is typically too low for robust handling of new
connections in a heavily loaded web server environment.
For such environments,
we recommend increasing this value to 1024 or higher.
The service daemon
may itself limit the listen queue size (e.g.\&
.Xr sendmail 8 ,
apache) but will
often have a directive in its configuration file to adjust the queue size up.
Larger listen queues also do a better job of fending off denial of service
attacks.
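.Pp
To make the suggested value permanent, a line such as the following can be
added to
.Xr sysctl.conf 5 :
.Bd -literal -offset indent
kern.ipc.somaxconn=1024
.Ed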
.Pp
The
.Va kern.maxfiles
sysctl determines how many open files the system supports.
The default is
typically a few thousand but you may need to bump this up to ten or twenty
thousand if you are running databases or large descriptor-heavy daemons.
The read-only
.Va kern.openfiles
sysctl may be interrogated to determine the current number of open files
on the system.
.Pp
The
.Va vm.swap_idle_enabled
sysctl is useful in large multi-user systems where you have lots of users
entering and leaving the system and lots of idle processes.
Such systems
tend to generate a great deal of continuous pressure on free memory reserves.
Turning this feature on and adjusting the swapout hysteresis (in idle
seconds) via
.Va vm.swap_idle_threshold1
and
.Va vm.swap_idle_threshold2
allows you to depress the priority of pages associated with idle processes
more quickly than the normal pageout algorithm.
This gives a helping hand
to the pageout daemon.
Do not turn this option on unless you need it,
because the tradeoff you are making is to essentially pre-page memory sooner
rather than later, eating more swap and disk bandwidth.
In a small system
this option will have a detrimental effect but in a large system that is
already doing moderate paging this option allows the VM system to stage
whole processes into and out of memory more easily.
.Sh LOADER TUNABLES
Some aspects of the system behavior may not be tunable at runtime because
memory allocations they perform must occur early in the boot process.
To change loader tunables, you must set their values in
.Xr loader.conf 5
and reboot the system.
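.Pp
For example, a
.Xr loader.conf 5
sketch using two tunables discussed below (the values are illustrative
only):
.Bd -literal -offset indent
kern.maxusers="256"
kern.ipc.nmbclusters="32768"
.Ed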
.Pp
.Va kern.maxusers
controls the scaling of a number of static system tables, including defaults
for the maximum number of open files, sizing of network memory resources, etc.
As of
.Fx 4.5 ,
.Va kern.maxusers
is automatically sized at boot based on the amount of memory available in
the system, and may be determined at run-time by inspecting the value of the
read-only
.Va kern.maxusers
sysctl.
Some sites will require larger or smaller values of
.Va kern.maxusers
and may set it as a loader tunable; values of 64, 128, and 256 are not
uncommon.
We do not recommend going above 256 unless you need a huge number
of file descriptors; many of the tunable values set to their defaults by
.Va kern.maxusers
may be individually overridden at boot-time or run-time as described
elsewhere in this document.
Systems older than
.Fx 4.4
must set this value via the kernel
.Xr config 8
option
.Cd maxusers
instead.
.Pp
.Va kern.ipc.nmbclusters
may be adjusted to increase the number of network mbufs the system is
willing to allocate.
Each cluster represents approximately 2K of memory,
so a value of 1024 represents 2M of kernel memory reserved for network
buffers.
You can do a simple calculation to figure out how many you need.
If you have a web server which maxes out at 1000 simultaneous connections,
and each connection eats a 16K receive and 16K send buffer, you need
approximately 32MB worth of network buffers to deal with it.
A good rule of
thumb is to multiply by 2, so 32MB x 2 = 64MB; at 2K per cluster,
64MB/2K = 32768.
So for this case
you would want to set
.Va kern.ipc.nmbclusters
to 32768.
We recommend values between
1024 and 4096 for machines with moderate amounts of memory, and between 4096
and 32768 for machines with greater amounts of memory.
Under no circumstances
should you specify an arbitrarily high value for this parameter as it could
lead to a boot-time crash.
The
.Fl m
option to
.Xr netstat 1
may be used to observe network cluster use.
Older versions of
.Fx
do not have this tunable and require that the
kernel
.Xr config 8
option
.Dv NMBCLUSTERS
be set instead.
.Pp
More and more programs are using the
.Xr sendfile 2
system call to transmit files over the network.
The
.Va kern.ipc.nsfbufs
sysctl controls the number of filesystem buffers
.Xr sendfile 2
is allowed to use to perform its work.
This parameter nominally scales
with
.Va kern.maxusers
so you should not need to modify it except under extreme
circumstances.
.Sh KERNEL CONFIG TUNING
There are a number of kernel options that you may have to fiddle with in
a large-scale system.
In order to change these options you need to be
able to compile a new kernel from source.
The
.Xr config 8
manual page and the handbook are good starting points for learning how to
do this.
Generally the first thing you do when creating your own custom
kernel is to strip out all the drivers and services you do not use.
Removing things like
.Dv INET6
and drivers you do not have will reduce the size of your kernel, sometimes
by a megabyte or more, leaving more memory available for applications.
.Pp
.Dv SCSI_DELAY
and
.Dv IDE_DELAY
may be used to reduce system boot times.
The defaults are fairly high and
can be responsible for 15+ seconds of delay in the boot process.
Reducing
.Dv SCSI_DELAY
to 5 seconds usually works (especially with modern drives).
Reducing
.Dv IDE_DELAY
also works but you have to be a little more careful.
.Pp
There are a number of
.Dv *_CPU
options that can be commented out.
If you only want the kernel to run
on a Pentium class CPU, you can easily remove
.Dv I386_CPU
and
.Dv I486_CPU ,
but only remove
.Dv I586_CPU
if you are sure your CPU is being recognized as a Pentium II or better.
Some clones may be recognized as a Pentium or even a 486 and not be able
to boot without those options.
If it works, great!
The operating system
will be able to better use higher-end CPU features for MMU, task switching,
timebase, and even device operations.
Additionally, higher-end CPUs support
4MB MMU pages, which the kernel uses to map the kernel itself into memory,
increasing its efficiency under heavy syscall loads.
.Sh IDE WRITE CACHING
.Fx 4.3
flirted with turning off IDE write caching.
This reduced write bandwidth
to IDE disks but was considered necessary due to serious data consistency
issues introduced by hard drive vendors.
Basically the problem is that
IDE drives lie about when a write completes.
With IDE write caching turned
on, IDE hard drives will not only write data to disk out of order, they
will sometimes delay some of the blocks indefinitely under heavy disk
load.
A crash or power failure can result in serious filesystem
corruption.
So our default was changed to be safe.
Unfortunately, the
result was such a huge loss in performance that we caved in and changed the
default back to on after the release.
You should check the default on
your system by observing the
.Va hw.ata.wc
sysctl variable.
If IDE write caching is turned off, you can turn it back
on by setting the
.Va hw.ata.wc
loader tunable to 1.
More information on tuning the ATA driver system may be found in the
.Xr ata 4
man page.
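.Pp
For example, to re-enable IDE write caching at boot, add the following to
.Xr loader.conf 5 :
.Bd -literal -offset indent
hw.ata.wc="1"
.Ed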
.Pp
There is a new experimental feature for IDE hard drives called
.Va hw.ata.tags
(you also set this in the boot loader) which allows write caching to be safely
turned on.
This brings SCSI tagging features to IDE drives.
As of this
writing only IBM DPTA and DTLA drives support the feature.
Warning!
These
drives apparently have quality control problems and I do not recommend
purchasing them at this time.
If you need performance, go with SCSI.
.Sh CPU, MEMORY, DISK, NETWORK
The type of tuning you do depends heavily on where your system begins to
bottleneck as load increases.
If your system runs out of CPU (idle times
are perpetually 0%) then you need to consider upgrading the CPU or moving to
an SMP motherboard (multiple CPUs), or perhaps you need to revisit the
programs that are causing the load and try to optimize them.
If your system
is paging to swap a lot you need to consider adding more memory.
If your
system is saturating the disk you typically see high CPU idle times and
total disk saturation.
.Xr systat 1
can be used to monitor this.
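.Pp
For example, to watch memory, paging, and disk activity at a one-second
refresh interval:
.Bd -literal -offset indent
systat -vmstat 1
.Ed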
There are many solutions to saturated disks:
increasing memory for caching, mirroring disks, distributing operations across
several machines, and so forth.
If disk performance is an issue and you
are using IDE drives, switching to SCSI can help a great deal.
While modern
IDE drives compare with SCSI in raw sequential bandwidth, the moment you
start seeking around the disk SCSI drives usually win.
.Pp
Finally, you might run out of network suds.
The first line of defense for
improving network performance is to make sure you are using switches instead
of hubs, especially these days where switches are almost as cheap.
Hubs
have severe problems under heavy loads due to collision backoff and one bad
host can severely degrade the entire LAN.
Second, optimize the network path
as much as possible.
For example, in
.Xr firewall 7
we describe a firewall protecting internal hosts with a topology where
the externally visible hosts are not routed through it.
Use 100BaseT rather
than 10BaseT, or use 1000BaseT rather than 100BaseT, depending on your needs.
Most bottlenecks occur at the WAN link (e.g.\&
modem, T1, DSL, whatever).
If expanding the link is not an option it may be possible to use the
.Xr dummynet 4
feature to implement peak shaving or other forms of traffic shaping to
prevent the overloaded service (such as web services) from affecting other
services (such as email), or vice versa.
In home installations this could
be used to give interactive traffic (your browser,
.Xr ssh 1
logins) priority
over services you export from your box (web services, email).
.Sh SEE ALSO
.Xr netstat 1 ,
.Xr systat 1 ,
.Xr ata 4 ,
.Xr dummynet 4 ,
.Xr login.conf 5 ,
.Xr rc.conf 5 ,
.Xr sysctl.conf 5 ,
.Xr firewall 7 ,
.Xr hier 7 ,
.Xr ports 7 ,
.Xr boot 8 ,
.Xr ccdconfig 8 ,
.Xr config 8 ,
.Xr disklabel 8 ,
.Xr fsck 8 ,
.Xr ifconfig 8 ,
.Xr ipfw 8 ,
.Xr loader 8 ,
.Xr mount 8 ,
.Xr newfs 8 ,
.Xr route 8 ,
.Xr sysctl 8 ,
.Xr sysinstall 8 ,
.Xr tunefs 8 ,
.Xr vinum 8
.Sh HISTORY
The
.Nm
manual page was originally written by
.An Matthew Dillon
and first appeared
in
.Fx 4.3 ,
May 2001.