.\" Copyright (c) 2001 Matthew Dillon. Terms and conditions are those of
.\" the BSD Copyright as specified in the file "/usr/src/COPYRIGHT" in
.\" the source tree.
.\"
.\" $FreeBSD: src/share/man/man7/tuning.7,v 1.1.2.30 2002/12/17 19:32:08 dillon Exp $
.\" $DragonFly: src/share/man/man7/tuning.7,v 1.18 2007/12/13 20:51:36 swildner Exp $
.\"
.Dd March 4, 2007
.Dt TUNING 7
.Os
.Sh NAME
.Nm tuning
.Nd performance tuning under DragonFly
.Sh SYSTEM SETUP - DISKLABEL, NEWFS, TUNEFS, SWAP
When using
.Xr disklabel 8
or the
.Dx
installer
to lay out your filesystems on a hard disk it is important to remember
that hard drives can transfer data much more quickly from outer tracks
than they can from inner tracks.
To take advantage of this you should
try to pack your smaller filesystems and swap closer to the outer tracks,
follow with the larger filesystems, and end with the largest filesystems.
It is also important to size system standard filesystems such that you
will not be forced to resize them later as you scale the machine up.
I usually create, in order, a 128M root, 1G swap, 128M
.Pa /var ,
128M
.Pa /var/tmp ,
3G
.Pa /usr ,
and use any remaining space for
.Pa /home .
.Pp
You should typically size your swap space to approximately 2x main memory.
If you do not have a lot of RAM, though, you will generally want a lot
more swap.
It is not recommended that you configure any less than
256M of swap on a system and you should keep in mind future memory
expansion when sizing the swap partition.
The kernel's VM paging algorithms are tuned to perform best when there is
at least 2x swap versus main memory.
Configuring too little swap can lead
to inefficiencies in the VM page scanning code as well as create issues
later on if you add more memory to your machine.
Finally, on larger systems
with multiple SCSI disks (or multiple IDE disks operating on different
controllers), we strongly recommend that you configure swap on each drive
(up to four drives).
The swap partitions on the drives should be approximately the same size.
The kernel can handle arbitrary sizes but
internal data structures scale to 4 times the largest swap partition.
Keeping
the swap partitions near the same size will allow the kernel to optimally
stripe swap space across the N disks.
Do not worry about overdoing it a
little, swap space is the saving grace of
.Ux
and even if you do not normally use much swap, it can give you more time to
recover from a runaway program before being forced to reboot.
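.Pp
The sizing rule above (2x main memory, with a 256M floor) can be sketched as a quick calculation; the swap_for helper below is purely illustrative and not part of any system tool:

```shell
#!/bin/sh
# Sketch of the swap-sizing rule described above: 2x RAM with a
# 256M floor. In practice ram_mb would come from hw.physmem.
swap_for() {
    ram_mb=$1
    swap_mb=$((ram_mb * 2))
    # never configure less than 256M of swap
    [ "$swap_mb" -lt 256 ] && swap_mb=256
    echo "$swap_mb"
}
swap_for 512    # prints 1024 (1G swap for 512M RAM)
swap_for 64     # prints 256 (the floor applies)
```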
.Pp
How you size your
.Pa /var
partition depends heavily on what you intend to use the machine for.
This
partition is primarily used to hold mailboxes, the print spool, and log
files.
Some people even make
.Pa /var/log
its own partition (but except for extreme cases it is not worth the waste
of a partition ID).
If your machine is intended to act as a mail
or print server,
or you are running a heavily visited web server, you should consider
creating a much larger partition \(en perhaps a gig or more.
It is very easy
to underestimate log file storage requirements.
.Pp
Sizing
.Pa /var/tmp
depends on the kind of temporary file usage you think you will need.
128M is
the minimum we recommend.
Also note that the
.Dx
installer will create a
.Pa /tmp
directory.
Dedicating a partition for temporary file storage is important for
two reasons: first, it reduces the possibility of filesystem corruption
in a crash, and second it reduces the chance of a runaway process that
fills up
.Oo Pa /var Oc Ns Pa /tmp
from blowing up more critical subsystems (mail,
logging, etc).
Filling up
.Oo Pa /var Oc Ns Pa /tmp
is a very common problem to have.
.Pp
In the old days there were differences between
.Pa /tmp
and
.Pa /var/tmp ,
but the introduction of
.Pa /var
(and
.Pa /var/tmp )
led to massive confusion
by program writers so today programs haphazardly use one or the
other and thus no real distinction can be made between the two.
So it makes sense to have just one temporary directory and
softlink to it from the other tmp directory locations.
However you handle
.Pa /tmp ,
the one thing you do not want to do is leave it sitting
on the root partition where it might cause root to fill up or possibly
corrupt root in a crash/reboot situation.
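.Pp
The single-temporary-directory approach can be sketched as follows; the commands operate on a scratch prefix rather than the live root, and the layout shown is an assumption about how one might arrange it:

```shell
#!/bin/sh
# Sketch: keep one real temporary directory under var and make tmp a
# softlink to it. PREFIX is a scratch directory for safe experimentation.
PREFIX=$(mktemp -d)
mkdir -p "$PREFIX/var/tmp"
chmod 1777 "$PREFIX/var/tmp"   # sticky, world-writable, as tmp dirs should be
ln -s var/tmp "$PREFIX/tmp"    # tmp -> var/tmp (relative softlink)
readlink "$PREFIX/tmp"         # prints: var/tmp
```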
.Pp
The
.Pa /usr
partition holds the bulk of the files required to support the system and
a subdirectory within it called
.Pa /usr/pkg
holds the bulk of the files installed from the
.Xr pkgsrc 7
collection.
If you do not use
.Xr pkgsrc 7
all that much and do not intend to keep system source
.Pq Pa /usr/src
on the machine, you can get away with
a 1 gigabyte
.Pa /usr
partition.
However, if you install a lot of packages
(especially window managers and Linux-emulated binaries), we recommend
at least a 2 gigabyte
.Pa /usr
and if you also intend to keep system source
on the machine, we recommend a 3 gigabyte
.Pa /usr .
Do not underestimate the
amount of space you will need in this partition, it can creep up and
surprise you!
.Pp
The
.Pa /home
partition is typically used to hold user-specific data.
I usually size it to the remainder of the disk.
.Pp
Why partition at all?
Why not create one big
.Pa /
partition and be done with it?
Then I do not have to worry about undersizing things!
Well, there are several reasons this is not a good idea.
First,
each partition has different operational characteristics and separating them
allows the filesystem to tune itself to those characteristics.
For example,
the root and
.Pa /usr
partitions are read-mostly, with very little writing, while
a lot of reading and writing could occur in
.Pa /var
and
.Pa /var/tmp .
By properly
partitioning your system, fragmentation introduced in the smaller, more
heavily write-loaded partitions will not bleed over into the mostly-read
partitions.
Additionally, keeping the write-loaded partitions closer to
the edge of the disk (i.e. before the really big partitions instead of after
in the partition table) will increase I/O performance in the partitions
where you need it the most.
Now it is true that you might also need I/O
performance in the larger partitions, but they are so large that shifting
them more towards the edge of the disk will not lead to a significant
performance improvement whereas moving
.Pa /var
to the edge can have a huge impact.
Finally, there are safety concerns.
Having a small, neat root partition that
is essentially read-only gives it a greater chance of surviving a bad crash
intact.
.Pp
Properly partitioning your system also allows you to tune
.Xr newfs 8
and
.Xr tunefs 8
parameters.
Tuning
.Xr newfs 8
requires more experience but can lead to significant improvements in
performance.
There are three parameters that are relatively safe to tune:
.Em blocksize , bytes/i-node ,
and
.Em cylinders/group .
.Pp
.Dx
performs best when using 8K or 16K filesystem block sizes.
The default filesystem block size is 16K,
which provides best performance for most applications,
with the exception of those that perform random access on large files
(such as database server software).
Such applications tend to perform better with a smaller block size,
although modern disk characteristics are such that the performance
gain from using a smaller block size may not be worth consideration.
Using a block size larger than 16K
can cause fragmentation of the buffer cache and
lead to lower performance.
.Pp
The defaults may be unsuitable
for a filesystem that requires a very large number of i-nodes
or is intended to hold a large number of very small files.
Such a filesystem should be created with an 8K or 4K block size.
This also requires you to specify a smaller
fragment size.
We recommend always using a fragment size that is \(18
the block size (less testing has been done on other fragment size factors).
The
.Xr newfs 8
options for this would be
.Dq Li "newfs -f 1024 -b 8192 ..." .
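.Pp
The one-eighth relationship between the fragment size and the block size can be verified with trivial arithmetic, matching the example just given:

```shell
#!/bin/sh
# The fragment size passed to newfs -f should be 1/8 of the
# block size passed to -b, per the recommendation above.
block=8192
frag=$((block / 8))
echo "$frag"    # prints 1024, the value used in the newfs example
```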
.Pp
If a large partition is intended to be used to hold fewer, larger files, such
as database files, you can increase the
.Em bytes/i-node
ratio which reduces the number of i-nodes (maximum number of files and
directories that can be created) for that partition.
Decreasing the number
of i-nodes in a filesystem can greatly reduce
.Xr fsck 8
recovery times after a crash.
Do not use this option
unless you are actually storing large files on the partition, because if you
overcompensate you can wind up with a filesystem that has lots of free
space remaining but cannot accommodate any more files.
Using 32768, 65536, or 262144 bytes/i-node is recommended.
You can go higher but
it will have only incremental effects on
.Xr fsck 8
recovery times.
For example,
.Dq Li "newfs -i 32768 ..." .
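.Pp
To see what a given bytes/i-node ratio means in practice, you can estimate the i-node count it yields for a partition of a given size. This is a rough sketch only; the real
.Xr newfs 8
rounds allocations per cylinder group:

```shell
#!/bin/sh
# Rough estimate of the i-node count produced by newfs -i:
# partition size divided by the bytes-per-i-node ratio.
inodes_for() {
    size_mb=$1
    bytes_per_inode=$2
    echo $(( size_mb * 1024 * 1024 / bytes_per_inode ))
}
inodes_for 1024 32768    # ~32768 i-nodes for a 1G partition at -i 32768
```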
.Pp
.Xr tunefs 8
may be used to further tune a filesystem.
This command can be run in
single-user mode without having to reformat the filesystem.
However, this is possibly the most abused program in the system.
Many people attempt to
increase available filesystem space by setting the min-free percentage to 0.
This can lead to severe filesystem fragmentation and we do not recommend
that you do this.
Really the only
.Xr tunefs 8
option worthwhile here is turning on
.Em softupdates
with
.Dq Li "tunefs -n enable /filesystem" .
(Note: in
.Dx ,
softupdates can be turned on using the
.Fl U
option to
.Xr newfs 8 ,
and the
.Dx
installer will typically enable softupdates automatically for
non-root filesystems).
Softupdates drastically improves meta-data performance, mainly file
creation and deletion.
We recommend enabling softupdates on most filesystems; however, there
are two limitations to softupdates that you should be aware of when
determining whether to use it on a filesystem.
First, softupdates guarantees filesystem consistency in the
case of a crash but could very easily be several seconds (even a minute!)
behind on pending writes to the physical disk.
If you crash you may lose more work
than otherwise.
Secondly, softupdates delays the freeing of filesystem
blocks.
If you have a filesystem (such as the root filesystem) which is
close to full, doing a major update of it, e.g.\&
.Dq Li "make installworld" ,
can run it out of space and cause the update to fail.
For this reason, softupdates will not be enabled on the root filesystem
during a typical install.
There is no loss of performance since the root
filesystem is rarely written to.
.Pp
A number of run-time
.Xr mount 8
options exist that can help you tune the system.
The most obvious and most dangerous one is
.Cm async .
Do not ever use it; it is far too dangerous.
A less dangerous and more
useful
.Xr mount 8
option is called
.Cm noatime .
.Ux
filesystems normally update the last-accessed time of a file or
directory whenever it is accessed.
This operation is handled in
.Dx
with a delayed write and normally does not create a burden on the system.
However, if your system is accessing a huge number of files on a continuing
basis the buffer cache can wind up getting polluted with atime updates,
creating a burden on the system.
For example, if you are running a heavily
loaded web site, or a news server with lots of readers, you might want to
consider turning off atime updates on your larger partitions with this
.Xr mount 8
option.
However, you should not gratuitously turn off atime
updates everywhere.
For example, the
.Pa /var
filesystem customarily
holds mailboxes, and atime (in combination with mtime) is used to
determine whether a mailbox has new mail.
You might as well leave
atime turned on for mostly read-only partitions such as
.Pa /
and
.Pa /usr
as well.
This is especially useful for
.Pa /
since some system utilities
use the atime field for reporting.
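.Pp
A hypothetical
.Xr fstab 5
entry enabling noatime on a busy data partition might look like the following; the device name is illustrative only and must match your own disklabel:

```
# /etc/fstab -- noatime on a large, heavily accessed partition
# Device        Mountpoint  FStype  Options      Dump  Pass#
/dev/ad0s1g     /home       ufs     rw,noatime   2     2
```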
.Sh STRIPING DISKS
In larger systems you can stripe partitions from several drives together
to create a much larger overall partition.
Striping can also improve
the performance of a filesystem by splitting I/O operations across two
or more disks.
The
.Xr vinum 8
and
.Xr ccdconfig 8
utilities may be used to create simple striped filesystems.
Generally
speaking, striping smaller partitions such as the root and
.Pa /var/tmp ,
or essentially read-only partitions such as
.Pa /usr
is a complete waste of time.
You should only stripe partitions that require serious I/O performance,
typically
.Pa /var , /home ,
or custom partitions used to hold databases and web pages.
Choosing the proper stripe size is also
important.
Filesystems tend to store meta-data on power-of-2 boundaries
and you usually want to reduce seeking rather than increase seeking.
This
means you want to use a large off-center stripe size such as 1152 sectors
so sequential I/O does not seek both disks and so meta-data is distributed
across both disks rather than concentrated on a single disk.
If
you really need to get sophisticated, we recommend using a real hardware
RAID controller from the list of
.Dx
supported controllers.
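.Pp
The reasoning behind an off-center stripe size like 1152 sectors can be illustrated numerically (a 512-byte sector is assumed):

```shell
#!/bin/sh
# 1152 sectors * 512 bytes = 589824 bytes (576K). This is deliberately
# NOT a power of 2, so meta-data stored on power-of-2 boundaries is
# spread across both member disks instead of piling up on one.
stripe_bytes=$((1152 * 512))
echo "$stripe_bytes"                            # prints 589824
# A power of 2 AND'ed with itself-minus-one is zero; this is non-zero.
echo $(( stripe_bytes & (stripe_bytes - 1) ))   # prints a non-zero value
```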
.Sh SYSCTL TUNING
.Xr sysctl 8
variables permit system behavior to be monitored and controlled at
run-time.
Some sysctls simply report on the behavior of the system; others allow
the system behavior to be modified;
some may be set at boot time using
.Xr rc.conf 5 ,
but most will be set via
.Xr sysctl.conf 5 .
There are several hundred sysctls in the system, including many that appear
to be candidates for tuning but actually are not.
In this document we will only cover the ones that have the greatest effect
on the system.
.Pp
The
.Va kern.ipc.shm_use_phys
sysctl defaults to 0 (off) and may be set to 0 (off) or 1 (on).
Setting
this parameter to 1 will cause all System V shared memory segments to be
mapped to unpageable physical RAM.
This feature only has an effect if you
are either (A) mapping small amounts of shared memory across many (hundreds)
of processes, or (B) mapping large amounts of shared memory across any
number of processes.
This feature allows the kernel to remove a great deal
of internal memory management page-tracking overhead at the cost of wiring
the shared memory into core, making it unswappable.
.Pp
The
.Va vfs.write_behind
sysctl defaults to 1 (on).
This tells the filesystem to issue media
writes as full clusters are collected, which typically occurs when writing
large sequential files.
The idea is to avoid saturating the buffer
cache with dirty buffers when it would not benefit I/O performance.
However,
this may stall processes and under certain circumstances you may wish to turn
it off.
.Pp
The
.Va vfs.hirunningspace
sysctl determines how much outstanding write I/O may be queued to
disk controllers system wide at any given instance.
The default is
usually sufficient but on machines with lots of disks you may want to bump
it up to four or five megabytes.
Note that setting too high a value
(exceeding the buffer cache's write threshold) can lead to extremely
bad clustering performance.
Do not set this value arbitrarily high!
Also,
higher write queueing values may add latency to reads occurring at the same
time.
.Pp
There are various other buffer-cache and VM page cache related sysctls.
We do not recommend modifying these values.
As of
.Fx 4.3 ,
the VM system does an extremely good job tuning itself.
.Pp
The
.Va net.inet.tcp.sendspace
and
.Va net.inet.tcp.recvspace
sysctls are of particular interest if you are running network intensive
applications.
They control the amount of send and receive buffer space
allowed for any given TCP connection.
The default sending buffer is 32K; the default receiving buffer
is 64K.
You can often
improve bandwidth utilization by increasing the default at the cost of
eating up more kernel memory for each connection.
We do not recommend
increasing the defaults if you are serving hundreds or thousands of
simultaneous connections because it is possible to quickly run the system
out of memory due to stalled connections building up.
But if you need
high bandwidth over a smaller number of connections, especially if you have
gigabit Ethernet, increasing these defaults can make a huge difference.
You can adjust the buffer size for incoming and outgoing data separately.
For example, if your machine is primarily doing web serving you may want
to decrease the recvspace in order to be able to increase the
sendspace without eating too much kernel memory.
Note that the routing table (see
.Xr route 8 )
can be used to introduce route-specific send and receive buffer size
defaults.
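.Pp
The kernel-memory cost of raising these defaults can be estimated with quick arithmetic. This is a rough worst-case bound, since actual usage depends on how full each socket buffer is:

```shell
#!/bin/sh
# Worst-case kernel memory for TCP socket buffers:
# connections * (sendspace + recvspace).
# Using the defaults mentioned above: 32K send, 64K recv.
conns=1000
sendspace=$((32 * 1024))
recvspace=$((64 * 1024))
echo $(( conns * (sendspace + recvspace) / 1024 / 1024 ))   # 93 (MB, truncated)
```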
.Pp
As an additional management tool you can use pipes in your
firewall rules (see
.Xr ipfw 8 )
to limit the bandwidth going to or from particular IP blocks or ports.
For example, if you have a T1 you might want to limit your web traffic
to 70% of the T1's bandwidth in order to leave the remainder available
for mail and interactive use.
Normally a heavily loaded web server
will not introduce significant latencies into other services even if
the network link is maxed out, but enforcing a limit can smooth things
out and lead to longer term stability.
Many people also enforce artificial
bandwidth limitations in order to ensure that they are not charged for
using too much bandwidth.
.Pp
Setting the send or receive TCP buffer to values larger than 65535 will
result in only a marginal performance improvement unless both hosts support
the window scaling extension of the TCP protocol, which is controlled by the
.Va net.inet.tcp.rfc1323
sysctl.
These extensions should be enabled and the TCP buffer size should be set
to a value larger than 65536 in order to obtain good performance from
certain types of network links; specifically, gigabit WAN links and
high-latency satellite links.
RFC 1323 support is enabled by default.
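.Pp
Why the 65535-byte limit matters follows from the bandwidth-delay product: without window scaling, TCP throughput is capped at window size divided by round-trip time. A sketch with an assumed 100 ms RTT:

```shell
#!/bin/sh
# Max TCP throughput without window scaling = window / RTT.
# With a 100 ms RTT, the 65535-byte window caps throughput at
# 65535 / 0.1 s = 655350 bytes/sec (~5 Mbit/s), far below gigabit.
window=65535
rtt_ms=100
echo $(( window * 1000 / rtt_ms ))   # prints 655350 (bytes per second)
```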
.Pp
The
.Va net.inet.tcp.always_keepalive
sysctl determines whether or not the TCP implementation should attempt
to detect dead TCP connections by intermittently delivering
.Dq keepalives
on the connection.
By default, this is disabled for all applications; only applications
that specifically request keepalives will use them.
In most environments, TCP keepalives will improve the management of
system state by expiring dead TCP connections, particularly for
systems serving dialup users who may not always terminate individual
TCP connections before disconnecting from the network.
However, in some environments, temporary network outages may be
incorrectly identified as dead sessions, resulting in unexpectedly
terminated TCP connections.
In such environments, setting the sysctl to 0 may reduce the occurrence of
TCP session disconnections.
.Pp
The
.Va net.inet.tcp.delayed_ack
TCP feature is largely misunderstood.
Historically speaking this feature
was designed to allow the acknowledgement to transmitted data to be returned
along with the response.
For example, when you type over a remote shell
the acknowledgement to the character you send can be returned along with the
data representing the echo of the character.
With delayed acks turned off
the acknowledgement may be sent in its own packet before the remote service
has a chance to echo the data it just received.
This same concept also
applies to any interactive protocol (e.g. SMTP, WWW, POP3) and can cut the
number of tiny packets flowing across the network in half.
The
.Dx
delayed-ack implementation also follows the TCP protocol rule that
at least every other packet be acknowledged even if the standard 100ms
timeout has not yet passed.
Normally the worst a delayed ack can do is
slightly delay the teardown of a connection, or slightly delay the ramp-up
of a slow-start TCP connection.
While we are not sure, we believe that
the several FAQs related to packages such as SAMBA and SQUID which advise
turning off delayed acks are referring to the slow-start issue.
.Pp
The
.Va net.inet.tcp.inflight_enable
sysctl turns on bandwidth delay product limiting for all TCP connections.
The system will attempt to calculate the bandwidth delay product for each
connection and limit the amount of data queued to the network to just the
amount required to maintain optimum throughput.
This feature is useful
if you are serving data over modems, GigE, or high speed WAN links (or
any other link with a high bandwidth*delay product), especially if you are
also using window scaling or have configured a large send window.
If
you enable this option you should also be sure to set
.Va net.inet.tcp.inflight_debug
to 0 (disable debugging), and for production use setting
.Va net.inet.tcp.inflight_min
to at least 6144 may be beneficial.
Note, however, that setting high
minimums may effectively disable bandwidth limiting depending on the link.
The limiting feature reduces the amount of data built up in intermediate
router and switch packet queues as well as reduces the amount of data built
up in the local host's interface queue.
With fewer packets queued up,
interactive connections, especially over slow modems, will also be able
to operate with lower round trip times.
However, note that this feature
only affects data transmission (uploading / server-side).
It does not
affect data reception (downloading).
.Pp
Adjusting
.Va net.inet.tcp.inflight_stab
is not recommended.
This parameter defaults to 20, representing 2 maximal packets added
to the bandwidth delay product window calculation.
The additional
window is required to stabilize the algorithm and improve responsiveness
to changing conditions, but it can also result in higher ping times
over slow links (though still much lower than you would get without
the inflight algorithm).
In such cases you may
wish to try reducing this parameter to 15, 10, or 5, and you may also
have to reduce
.Va net.inet.tcp.inflight_min
(for example, to 3500) to get the desired effect.
Reducing these parameters
should be done as a last resort only.
.Pp
The
.Va net.inet.ip.portrange.*
sysctls control the port number ranges automatically bound to TCP and UDP
sockets.
There are three ranges: a low range, a default range, and a
high range, selectable via an IP_PORTRANGE setsockopt() call.
Most
network programs use the default range which is controlled by
.Va net.inet.ip.portrange.first
and
.Va net.inet.ip.portrange.last ,
which default to 1024 and 5000 respectively.
Bound port ranges are
used for outgoing connections and it is possible to run the system out
of ports under certain circumstances.
This most commonly occurs when you are
running a heavily loaded web proxy.
The port range is not an issue
when running servers which handle mainly incoming connections, such as a
normal web server, or which have a limited number of outgoing connections,
such as a mail relay.
For situations where you may run yourself out of
ports we recommend increasing
.Va net.inet.ip.portrange.last
modestly.
A value of 10000 or 20000 or 30000 may be reasonable.
You should
also consider firewall effects when changing the port range.
Some firewalls
may block large ranges of ports (usually low-numbered ports) and expect systems
to use higher ranges of ports for outgoing connections.
For this reason
we do not recommend that
.Va net.inet.ip.portrange.first
be lowered.
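.Pp
The default range provides relatively few ephemeral ports, which is why a busy proxy can exhaust them; a quick illustrative calculation:

```shell
#!/bin/sh
# Ephemeral ports available with the default portrange settings
# (first=1024, last=5000), versus raising last to 30000 as above.
first=1024
last=5000
echo $(( last - first + 1 ))     # prints 3977 ports by default
last=30000
echo $(( last - first + 1 ))     # prints 28977 ports after tuning
```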
582 | .Pp | |
583 | The | |
584 | .Va kern.ipc.somaxconn | |
585 | sysctl limits the size of the listen queue for accepting new TCP connections. | |
586 | The default value of 128 is typically too low for robust handling of new | |
587 | connections in a heavily loaded web server environment. | |
588 | For such environments, | |
589 | we recommend increasing this value to 1024 or higher. | |
590 | The service daemon | |
591 | may itself limit the listen queue size (e.g.\& | |
592 | .Xr sendmail 8 , | |
593 | apache) but will | |
594 | often have a directive in its configuration file to adjust the queue size up. | |
595 | Larger listen queues also do a better job of fending off denial of service | |
596 | attacks. | |
597 | .Pp | |
598 | The | |
599 | .Va kern.maxfiles | |
600 | sysctl determines how many open files the system supports. | |
601 | The default is | |
602 | typically a few thousand but you may need to bump this up to ten or twenty | |
603 | thousand if you are running databases or large descriptor-heavy daemons. | |
604 | The read-only | |
605 | .Va kern.openfiles | |
606 | sysctl may be interrogated to determine the current number of open files | |
607 | on the system. | |
608 | .Pp | |
609 | The | |
610 | .Va vm.swap_idle_enabled | |
611 | sysctl is useful in large multi-user systems where you have lots of users | |
612 | entering and leaving the system and lots of idle processes. | |
613 | Such systems | |
614 | tend to generate a great deal of continuous pressure on free memory reserves. | |
615 | Turning this feature on and adjusting the swapout hysteresis (in idle | |
616 | seconds) via | |
617 | .Va vm.swap_idle_threshold1 | |
618 | and | |
619 | .Va vm.swap_idle_threshold2 | |
620 | allows you to depress the priority of pages associated with idle processes | |
621 | more quickly then the normal pageout algorithm. | |
622 | This gives a helping hand | |
623 | to the pageout daemon. | |
624 | Do not turn this option on unless you need it, | |
625 | because the tradeoff you are making is to essentially pre-page memory sooner | |
23265324 | 626 | rather than later, eating more swap and disk bandwidth. |
984263bc MD |
627 | In a small system |
628 | this option will have a detrimental effect but in a large system that is | |
629 | already doing moderate paging this option allows the VM system to stage | |
630 | whole processes into and out of memory more easily. | |
.Sh LOADER TUNABLES
Some aspects of the system behavior may not be tunable at runtime because
memory allocations they perform must occur early in the boot process.
To change loader tunables, you must set their values in
.Xr loader.conf 5
and reboot the system.
.Pp
.Va kern.maxusers
controls the scaling of a number of static system tables, including defaults
for the maximum number of open files, sizing of network memory resources, etc.
On
.Dx ,
.Va kern.maxusers
is automatically sized at boot based on the amount of memory available in
the system, and may be determined at run-time by inspecting the value of the
read-only
.Va kern.maxusers
sysctl.
Some sites will require larger or smaller values of
.Va kern.maxusers
and may set it as a loader tunable; values of 64, 128, and 256 are not
uncommon.
We do not recommend going above 256 unless you need a huge number
of file descriptors; many of the tunable values set to their defaults by
.Va kern.maxusers
may be individually overridden at boot-time or run-time as described
elsewhere in this document.
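.Pp
For example, a site that needs many descriptors might pin the value in
.Xr loader.conf 5
(the number shown is only an example, not a recommendation):

```conf
# /boot/loader.conf -- hypothetical override of the auto-sized default
kern.maxusers="256"
```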
.Pp
The
.Va kern.dfldsiz
and
.Va kern.dflssiz
tunables set the default soft limits for process data and stack size
respectively.
Processes may increase these up to the hard limits by calling
.Xr setrlimit 2 .
The
.Va kern.maxdsiz ,
.Va kern.maxssiz ,
and
.Va kern.maxtsiz
tunables set the hard limits for process data, stack, and text size
respectively; processes may not exceed these limits.
The
.Va kern.sgrowsiz
tunable controls how much the stack segment will grow when a process
needs to allocate more stack.
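.Pp
The soft/hard distinction is easy to see from the shell, whose
.Ic ulimit
builtin wraps
.Xr setrlimit 2
(a sketch; the numbers printed depend on your system's limits):

```sh
# Show the soft and hard data segment limits (in kilobytes),
# then raise the soft limit all the way to the hard limit --
# an unprivileged process may do this, but may never exceed
# the hard limit.
ulimit -d
ulimit -H -d
ulimit -d "$(ulimit -H -d)"
```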
.Pp
.Va kern.ipc.nmbclusters
may be adjusted to increase the number of network mbufs the system is
willing to allocate.
Each cluster represents approximately 2K of memory,
so a value of 1024 represents 2M of kernel memory reserved for network
buffers.
You can do a simple calculation to figure out how many you need.
If you have a web server which maxes out at 1000 simultaneous connections,
and each connection eats a 16K receive and 16K send buffer, you need
approximately 32MB worth of network buffers to deal with it.
A good rule of
thumb is to multiply by 2, so 32MB x 2 = 64MB / 2K = 32768.
So for this case
you would want to set
.Va kern.ipc.nmbclusters
to 32768.
We recommend values between
1024 and 4096 for machines with moderate amounts of memory, and between 4096
and 32768 for machines with greater amounts of memory.
Under no circumstances
should you specify an arbitrarily high value for this parameter, as it could
lead to a boot-time crash.
The
.Fl m
option to
.Xr netstat 1
may be used to observe network cluster use.
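.Pp
The arithmetic above can be repeated for other workloads with a quick
shell calculation (the 32MB buffer estimate is the assumption you must
supply for your own load):

```sh
# Rule of thumb from the text: 32MB of socket buffers, doubled
# for headroom, divided into 2K mbuf clusters.
buffers_mb=32
clusters=$(( buffers_mb * 2 * 1024 / 2 ))   # 64MB worth of 2K clusters
echo "$clusters"                            # prints 32768
```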
.Pp
More and more programs are using the
.Xr sendfile 2
system call to transmit files over the network.
The
.Va kern.ipc.nsfbufs
sysctl controls the number of filesystem buffers
.Xr sendfile 2
is allowed to use to perform its work.
This parameter nominally scales
with
.Va kern.maxusers
so you should not need to modify this parameter except under extreme
circumstances.
.Sh KERNEL CONFIG TUNING
There are a number of kernel options that you may have to fiddle with in
a large-scale system.
In order to change these options you need to be
able to compile a new kernel from source.
The
.Xr config 8
manual page and the handbook are good starting points for learning how to
do this.
Generally the first thing you do when creating your own custom
kernel is to strip out all the drivers and services you do not use.
Removing things like
.Dv INET6
and drivers you do not have will reduce the size of your kernel, sometimes
by a megabyte or more, leaving more memory available for applications.
.Pp
.Dv SCSI_DELAY
may be used to reduce system boot times.
The default is fairly high and
can be responsible for 15+ seconds of delay in the boot process.
Reducing
.Dv SCSI_DELAY
to 5 seconds usually works (especially with modern drives).
.Pp
There are a number of
.Dv *_CPU
options that can be commented out.
If you only want the kernel to run
on a Pentium class CPU, you can easily remove
.Dv I486_CPU ,
but only remove
.Dv I586_CPU
if you are sure your CPU is being recognized as a Pentium II or better.
Some clones may be recognized as a Pentium and not be able to boot
without those options.
If it works, great!
The operating system
will be able to better use higher-end CPU features for MMU, task switching,
timebase, and even device operations.
Additionally, higher-end CPUs support
4MB MMU pages, which the kernel uses to map the kernel itself into memory,
increasing its efficiency under heavy syscall loads.
.Sh IDE WRITE CACHING
.Fx 4.3
flirted with turning off IDE write caching.
This reduced write bandwidth
to IDE disks but was considered necessary due to serious data consistency
issues introduced by hard drive vendors.
Basically the problem is that
IDE drives lie about when a write completes.
With IDE write caching turned
on, IDE hard drives will not only write data to disk out of order, they
will sometimes delay some of the blocks indefinitely under heavy disk
load.
A crash or power failure can result in serious filesystem
corruption.
So our default was changed to be safe.
Unfortunately, the
result was such a huge loss in performance that we caved in and changed the
default back to on after the release.
You should check the default on
your system by observing the
.Va hw.ata.wc
sysctl variable.
If IDE write caching is turned off, you can turn it back
on by setting the
.Va hw.ata.wc
loader tunable to 1.
More information on tuning the ATA driver system may be found in the
.Xr ata 4
man page.
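.Pp
Since this is a loader tunable, re-enabling it means adding a line to
.Xr loader.conf 5
and rebooting, for example:

```conf
# /boot/loader.conf -- turn IDE write caching back on
hw.ata.wc="1"
```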
.Pp
There is a new experimental feature for IDE hard drives called
.Va hw.ata.tags
(you also set this in the boot loader) which allows write caching to be safely
turned on.
This brings SCSI tagging features to IDE drives.
As of this
writing only IBM DPTA and DTLA drives support the feature.
Warning!
These
drives apparently have quality control problems and I do not recommend
purchasing them at this time.
If you need performance, go with SCSI.
.Sh CPU, MEMORY, DISK, NETWORK
The type of tuning you do depends heavily on where your system begins to
bottleneck as load increases.
If your system runs out of CPU (idle times
are perpetually 0%) then you need to consider upgrading the CPU or moving to
an SMP motherboard (multiple CPUs), or perhaps you need to revisit the
programs that are causing the load and try to optimize them.
If your system
is paging to swap a lot you need to consider adding more memory.
If your
system is saturating the disk you typically see high CPU idle times and
total disk saturation.
.Xr systat 1
can be used to monitor this.
There are many solutions to saturated disks:
increasing memory for caching, mirroring disks, distributing operations across
several machines, and so forth.
If disk performance is an issue and you
are using IDE drives, switching to SCSI can help a great deal.
While modern
IDE drives compare with SCSI in raw sequential bandwidth, the moment you
start seeking around the disk SCSI drives usually win.
.Pp
Finally, you might run out of network suds.
The first line of defense for
improving network performance is to make sure you are using switches instead
of hubs, especially these days where switches are almost as cheap.
Hubs
have severe problems under heavy loads due to collision backoff and one bad
host can severely degrade the entire LAN.
Second, optimize the network path
as much as possible.
For example, in
.Xr firewall 7
we describe a firewall protecting internal hosts with a topology where
the externally visible hosts are not routed through it.
Use 100BaseT rather
than 10BaseT, or use 1000BaseT rather than 100BaseT, depending on your needs.
Most bottlenecks occur at the WAN link (e.g.\&
modem, T1, DSL, whatever).
If expanding the link is not an option it may be possible to use the
.Xr dummynet 4
feature to implement peak shaving or other forms of traffic shaping to
prevent the overloaded service (such as web services) from affecting other
services (such as email), or vice versa.
In home installations this could
be used to give interactive traffic (your browser,
.Xr ssh 1
logins) priority
over services you export from your box (web services, email).
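.Pp
As a hypothetical sketch (the bandwidth figure is an assumption; see
.Xr ipfw 8
and
.Xr dummynet 4
for the authoritative syntax), outbound web traffic could be capped so it
cannot starve other services:

```sh
# Create a dummynet pipe limited to 2Mbit/s and route outgoing
# HTTP responses through it, leaving other traffic untouched.
ipfw pipe 1 config bw 2Mbit/s
ipfw add pipe 1 tcp from any 80 to any out
```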
.Sh SEE ALSO
.Xr netstat 1 ,
.Xr systat 1 ,
.Xr ata 4 ,
.Xr dummynet 4 ,
.Xr loader.conf 5 ,
.Xr login.conf 5 ,
.Xr rc.conf 5 ,
.Xr sysctl.conf 5 ,
.Xr firewall 7 ,
.Xr hier 7 ,
.Xr boot 8 ,
.Xr ccdconfig 8 ,
.Xr config 8 ,
.Xr disklabel 8 ,
.Xr fsck 8 ,
.Xr ifconfig 8 ,
.Xr ipfw 8 ,
.Xr loader 8 ,
.Xr mount 8 ,
.Xr newfs 8 ,
.Xr route 8 ,
.Xr sysctl 8 ,
.Xr tunefs 8 ,
.Xr vinum 8
.Sh HISTORY
The
.Nm
manual page was originally written by
.An Matthew Dillon
and first appeared
in
.Fx 4.3 ,
May 2001.