.\" Copyright (c) 2001, Matthew Dillon. Terms and conditions are those of
.\" the BSD Copyright as specified in the file "/usr/src/COPYRIGHT" in
.\" the source tree.
.\"
.\" $FreeBSD: src/share/man/man7/tuning.7,v 1.1.2.30 2002/12/17 19:32:08 dillon Exp $
.\"
.Dd May 25, 2001
.Dt TUNING 7
.Os
.Sh NAME
.Nm tuning
.Nd performance tuning under FreeBSD
.Sh SYSTEM SETUP - DISKLABEL, NEWFS, TUNEFS, SWAP
When using
.Xr disklabel 8
or
.Xr sysinstall 8
to lay out your filesystems on a hard disk it is important to remember
that hard drives can transfer data much more quickly from outer tracks
than they can from inner tracks.
To take advantage of this you should
try to pack your smaller filesystems and swap closer to the outer tracks,
follow with the larger filesystems, and end with the largest filesystems.
It is also important to size system standard filesystems such that you
will not be forced to resize them later as you scale the machine up.
I usually create, in order, a 128M root, 1G swap, 128M
.Pa /var ,
128M
.Pa /var/tmp ,
3G
.Pa /usr ,
and use any remaining space for
.Pa /home .
.Pp
You should typically size your swap space to approximately 2x main memory.
If you do not have a lot of RAM, though, you will generally want a lot
more swap.
It is not recommended that you configure any less than
256M of swap on a system and you should keep in mind future memory
expansion when sizing the swap partition.
The kernel's VM paging algorithms are tuned to perform best when there is
at least 2x swap versus main memory.
Configuring too little swap can lead
to inefficiencies in the VM page scanning code as well as create issues
later on if you add more memory to your machine.
Finally, on larger systems
with multiple SCSI disks (or multiple IDE disks operating on different
controllers), we strongly recommend that you configure swap on each drive
(up to four drives).
The swap partitions on the drives should be approximately the same size.
The kernel can handle arbitrary sizes but
internal data structures scale to 4 times the largest swap partition.
Keeping
the swap partitions near the same size will allow the kernel to optimally
stripe swap space across the N disks.
Do not worry about overdoing it a
little; swap space is the saving grace of
.Ux
and even if you do not normally use much swap, it can give you more time to
recover from a runaway program before being forced to reboot.
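.Pp
As a sketch of the multi-drive layout described above, the
.Xr fstab 5
entries below configure one swap partition on each of two disks.
The device names are hypothetical and will differ on your system.
.Bd -literal -offset indent
# Device	Mountpoint	FStype	Options	Dump	Pass#
/dev/da0s1b	none		swap	sw	0	0
/dev/da1s1b	none		swap	sw	0	0
.Ed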
.Pp
How you size your
.Pa /var
partition depends heavily on what you intend to use the machine for.
This
partition is primarily used to hold mailboxes, the print spool, and log
files.
Some people even make
.Pa /var/log
its own partition (but except for extreme cases it is not worth the waste
of a partition ID).
If your machine is intended to act as a mail
or print server,
or you are running a heavily visited web server, you should consider
creating a much larger partition \(en perhaps a gigabyte or more.
It is very easy
to underestimate log file storage requirements.
.Pp
Sizing
.Pa /var/tmp
depends on the kind of temporary file usage you think you will need.
128M is
the minimum we recommend.
Also note that sysinstall will create a
.Pa /tmp
directory.
Dedicating a partition for temporary file storage is important for
two reasons: first, it reduces the possibility of filesystem corruption
in a crash, and second, it reduces the chance of a runaway process that
fills up
.Oo Pa /var Oc Ns Pa /tmp
from blowing up more critical subsystems (mail,
logging, etc).
Filling up
.Oo Pa /var Oc Ns Pa /tmp
is a very common problem to have.
.Pp
In the old days there were differences between
.Pa /tmp
and
.Pa /var/tmp ,
but the introduction of
.Pa /var
(and
.Pa /var/tmp )
led to massive confusion
by program writers so today programs haphazardly use one or the
other and thus no real distinction can be made between the two.
So it makes sense to have just one temporary directory and
softlink to it from the other tmp directory locations.
However you handle
.Pa /tmp ,
the one thing you do not want to do is leave it sitting
on the root partition where it might cause root to fill up or possibly
corrupt root in a crash/reboot situation.
.Pp
The
.Pa /usr
partition holds the bulk of the files required to support the system and
a subdirectory within it called
.Pa /usr/local
holds the bulk of the files installed from the
.Xr ports 7
hierarchy.
If you do not use ports all that much and do not intend to keep
system source
.Pq Pa /usr/src
on the machine, you can get away with
a 1 gigabyte
.Pa /usr
partition.
However, if you install a lot of ports
(especially window managers and Linux-emulated binaries), we recommend
at least a 2 gigabyte
.Pa /usr
and if you also intend to keep system source
on the machine, we recommend a 3 gigabyte
.Pa /usr .
Do not underestimate the
amount of space you will need in this partition; it can creep up and
surprise you!
.Pp
The
.Pa /home
partition is typically used to hold user-specific data.
I usually size it to the remainder of the disk.
.Pp
Why partition at all?
Why not create one big
.Pa /
partition and be done with it?
Then I do not have to worry about undersizing things!
Well, there are several reasons this is not a good idea.
First,
each partition has different operational characteristics and separating them
allows the filesystem to tune itself to those characteristics.
For example,
the root and
.Pa /usr
partitions are read-mostly, with very little writing, while
a lot of reading and writing could occur in
.Pa /var
and
.Pa /var/tmp .
By properly
partitioning your system, fragmentation introduced in the smaller more
heavily write-loaded partitions will not bleed over into the mostly-read
partitions.
Additionally, keeping the write-loaded partitions closer to
the edge of the disk (i.e. before the really big partitions instead of after
in the partition table) will increase I/O performance in the partitions
where you need it the most.
Now it is true that you might also need I/O
performance in the larger partitions, but they are so large that shifting
them more towards the edge of the disk will not lead to a significant
performance improvement whereas moving
.Pa /var
to the edge can have a huge impact.
Finally, there are safety concerns.
Having a small neat root partition that
is essentially read-only gives it a greater chance of surviving a bad crash
intact.
.Pp
Properly partitioning your system also allows you to tune
.Xr newfs 8
and
.Xr tunefs 8
parameters.
Tuning
.Xr newfs 8
requires more experience but can lead to significant improvements in
performance.
There are three parameters that are relatively safe to tune:
.Em blocksize , bytes/i-node ,
and
.Em cylinders/group .
.Pp
.Fx
performs best when using 8K or 16K filesystem block sizes.
The default filesystem block size is 16K,
which provides best performance for most applications,
with the exception of those that perform random access on large files
(such as database server software).
Such applications tend to perform better with a smaller block size,
although modern disk characteristics are such that the performance
gain from using a smaller block size may not be worth consideration.
Using a block size larger than 16K
can cause fragmentation of the buffer cache and
lead to lower performance.
.Pp
The defaults may be unsuitable
for a filesystem that requires a very large number of i-nodes
or is intended to hold a large number of very small files.
Such a filesystem should be created with an 8K or 4K block size.
This also requires you to specify a smaller
fragment size.
We recommend always using a fragment size that is 1/8
the block size (less testing has been done on other fragment size factors).
The
.Xr newfs 8
options for this would be
.Dq Li "newfs -f 1024 -b 8192 ..." .
.Pp
If a large partition is intended to be used to hold fewer, larger files, such
as database files, you can increase the
.Em bytes/i-node
ratio which reduces the number of i-nodes (maximum number of files and
directories that can be created) for that partition.
Decreasing the number
of i-nodes in a filesystem can greatly reduce
.Xr fsck 8
recovery times after a crash.
Do not use this option
unless you are actually storing large files on the partition, because if you
overcompensate you can wind up with a filesystem that has lots of free
space remaining but cannot accommodate any more files.
Using 32768, 65536, or 262144 bytes/i-node is recommended.
You can go higher but
it will have only incremental effects on
.Xr fsck 8
recovery times.
For example,
.Dq Li "newfs -i 32768 ..." .
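.Pp
Putting these options together, a large partition intended for a few big
files might be created as sketched below.
The device name is hypothetical, and the values are only one plausible
combination of the block size, fragment size, and bytes/i-node settings
discussed above.
.Bd -literal -offset indent
newfs -b 16384 -f 2048 -i 65536 /dev/da1s1e
.Ed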
.Pp
.Xr tunefs 8
may be used to further tune a filesystem.
This command can be run in
single-user mode without having to reformat the filesystem.
However, this is possibly the most abused program in the system.
Many people attempt to
increase available filesystem space by setting the min-free percentage to 0.
This can lead to severe filesystem fragmentation and we do not recommend
that you do this.
Really the only
.Xr tunefs 8
option worthwhile here is turning on
.Em softupdates
with
.Dq Li "tunefs -n enable /filesystem" .
(Note: in
.Fx 4.5
and later, softupdates can be turned on using the
.Fl U
option to
.Xr newfs 8 ,
and
.Xr sysinstall 8
will typically enable softupdates automatically for non-root filesystems).
Softupdates drastically improves meta-data performance, mainly file
creation and deletion.
We recommend enabling softupdates on most filesystems; however, there
are two limitations to softupdates that you should be aware of when
determining whether to use it on a filesystem.
First, softupdates guarantees filesystem consistency in the
case of a crash but could very easily be several seconds (even a minute!)
behind on pending writes to the physical disk.
If you crash you may lose more work
than otherwise.
Secondly, softupdates delays the freeing of filesystem
blocks.
If you have a filesystem (such as the root filesystem) which is
close to full, doing a major update of it, e.g.\&
.Dq Li "make installworld" ,
can run it out of space and cause the update to fail.
For this reason, softupdates will not be enabled on the root filesystem
during a typical install.
There is no loss of performance since the root
filesystem is rarely written to.
.Pp
A number of run-time
.Xr mount 8
options exist that can help you tune the system.
The most obvious and most dangerous one is
.Cm async .
Do not ever use it; it is far too dangerous.
A less dangerous and more
useful
.Xr mount 8
option is called
.Cm noatime .
.Ux
filesystems normally update the last-accessed time of a file or
directory whenever it is accessed.
This operation is handled in
.Fx
with a delayed write and normally does not create a burden on the system.
However, if your system is accessing a huge number of files on a continuing
basis the buffer cache can wind up getting polluted with atime updates,
creating a burden on the system.
For example, if you are running a heavily
loaded web site, or a news server with lots of readers, you might want to
consider turning off atime updates on your larger partitions with this
.Xr mount 8
option.
However, you should not gratuitously turn off atime
updates everywhere.
For example, the
.Pa /var
filesystem customarily
holds mailboxes, and atime (in combination with mtime) is used to
determine whether a mailbox has new mail.
You might as well leave
atime turned on for mostly read-only partitions such as
.Pa /
and
.Pa /usr
as well.
This is especially useful for
.Pa /
since some system utilities
use the atime field for reporting.
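.Pp
For instance, assuming a large
.Pa /home
partition on a hypothetical device, atime updates could be turned off
either in
.Xr fstab 5
or on an already mounted filesystem:
.Bd -literal -offset indent
# /etc/fstab entry
/dev/da0s1g	/home	ufs	rw,noatime	2	2

# or, at run-time
mount -u -o noatime /home
.Ed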
.Sh STRIPING DISKS
In larger systems you can stripe partitions from several drives together
to create a much larger overall partition.
Striping can also improve
the performance of a filesystem by splitting I/O operations across two
or more disks.
The
.Xr vinum 8
and
.Xr ccdconfig 8
utilities may be used to create simple striped filesystems.
Generally
speaking, striping smaller partitions such as the root and
.Pa /var/tmp ,
or essentially read-only partitions such as
.Pa /usr
is a complete waste of time.
You should only stripe partitions that require serious I/O performance,
typically
.Pa /var , /home ,
or custom partitions used to hold databases and web pages.
Choosing the proper stripe size is also
important.
Filesystems tend to store meta-data on power-of-2 boundaries
and you usually want to reduce seeking rather than increase seeking.
This
means you want to use a large off-center stripe size such as 1152 sectors
so sequential I/O does not seek both disks and so meta-data is distributed
across both disks rather than concentrated on a single disk.
If
you really need to get sophisticated, we recommend using a real hardware
RAID controller from the list of
.Fx
supported controllers.
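.Pp
A minimal
.Xr ccdconfig 8
sketch of a two-disk stripe using the 1152 sector interleave mentioned
above follows.
The partition names are hypothetical, and the kernel must include
ccd support.
.Bd -literal -offset indent
ccdconfig ccd0 1152 0 /dev/da0s1e /dev/da1s1e
newfs /dev/ccd0c
.Ed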
.Sh SYSCTL TUNING
.Xr sysctl 8
variables permit system behavior to be monitored and controlled at
run-time.
Some sysctls simply report on the behavior of the system; others allow
the system behavior to be modified;
some may be set at boot time using
.Xr rc.conf 5 ,
but most will be set via
.Xr sysctl.conf 5 .
There are several hundred sysctls in the system, including many that appear
to be candidates for tuning but actually are not.
In this document we will only cover the ones that have the greatest effect
on the system.
.Pp
The
.Va kern.ipc.shm_use_phys
sysctl defaults to 0 (off) and may be set to 0 (off) or 1 (on).
Setting
this parameter to 1 will cause all System V shared memory segments to be
mapped to unpageable physical RAM.
This feature only has an effect if you
are either (A) mapping small amounts of shared memory across many (hundreds)
of processes, or (B) mapping large amounts of shared memory across any
number of processes.
This feature allows the kernel to remove a great deal
of internal memory management page-tracking overhead at the cost of wiring
the shared memory into core, making it unswappable.
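.Pp
As an illustration, the setting could be changed at run-time, or made
persistent with a line in
.Xr sysctl.conf 5 :
.Bd -literal -offset indent
# at run-time
sysctl -w kern.ipc.shm_use_phys=1

# in /etc/sysctl.conf
kern.ipc.shm_use_phys=1
.Ed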
.Pp
The
.Va vfs.vmiodirenable
sysctl defaults to 1 (on).
This parameter controls how directories are cached
by the system.
Most directories are small and use but a single fragment
(typically 1K) in the filesystem and even less (typically 512 bytes) in
the buffer cache.
However, when this sysctl is turned off the buffer
cache will only cache a fixed number of directories even if you have a huge
amount of memory.
Turning on this sysctl allows the buffer cache to use
the VM Page Cache to cache the directories.
The advantage is that all of
memory is now available for caching directories.
The disadvantage is that
the minimum in-core memory used to cache a directory is the physical page
size (typically 4K) rather than 512 bytes.
We recommend turning this option off in memory-constrained environments;
however, when on, it will substantially improve the performance of services
that manipulate a large number of files.
Such services can include web caches, large mail systems, and news systems.
Turning on this option will generally not reduce performance even with the
wasted memory but you should experiment to find out.
.Pp
The
.Va vfs.write_behind
sysctl defaults to 1 (on).
This tells the filesystem to issue media
writes as full clusters are collected, which typically occurs when writing
large sequential files.
The idea is to avoid saturating the buffer
cache with dirty buffers when it would not benefit I/O performance.
However,
this may stall processes and under certain circumstances you may wish to turn
it off.
.Pp
The
.Va vfs.hirunningspace
sysctl determines how much outstanding write I/O may be queued to
disk controllers system-wide at any given instant.
The default is
usually sufficient but on machines with lots of disks you may want to bump
it up to four or five megabytes.
Note that setting too high a value
(exceeding the buffer cache's write threshold) can lead to extremely
bad clustering performance.
Do not set this value arbitrarily high!
Also,
higher write queueing values may add latency to reads occurring at the same
time.
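.Pp
For example, assuming the value is expressed in bytes, the current
setting could be inspected and a four megabyte limit tried with:
.Bd -literal -offset indent
sysctl vfs.hirunningspace
sysctl -w vfs.hirunningspace=4194304
.Ed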
.Pp
There are various other buffer-cache and VM page cache related sysctls.
We do not recommend modifying these values.
As of
.Fx 4.3 ,
the VM system does an extremely good job tuning itself.
.Pp
The
.Va net.inet.tcp.sendspace
and
.Va net.inet.tcp.recvspace
sysctls are of particular interest if you are running network intensive
applications.
They control the amount of send and receive buffer space
allowed for any given TCP connection.
The default sending buffer is 32K; the default receiving buffer
is 64K.
You can often
improve bandwidth utilization by increasing the default at the cost of
eating up more kernel memory for each connection.
We do not recommend
increasing the defaults if you are serving hundreds or thousands of
simultaneous connections because it is possible to quickly run the system
out of memory due to stalled connections building up.
But if you need
high bandwidth over a fewer number of connections, especially if you have
gigabit Ethernet, increasing these defaults can make a huge difference.
You can adjust the buffer size for incoming and outgoing data separately.
For example, if your machine is primarily doing web serving you may want
to decrease the recvspace in order to be able to increase the
sendspace without eating too much kernel memory.
Note that the routing table (see
.Xr route 8 )
can be used to introduce route-specific send and receive buffer size
defaults.
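.Pp
A web server of the kind just described might use
.Xr sysctl.conf 5
entries along these lines.
The numbers are illustrative only and should be sized against your
own connection counts and available memory.
.Bd -literal -offset indent
net.inet.tcp.sendspace=65536
net.inet.tcp.recvspace=16384
.Ed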
.Pp
As an additional management tool you can use pipes in your
firewall rules (see
.Xr ipfw 8 )
to limit the bandwidth going to or from particular IP blocks or ports.
For example, if you have a T1 you might want to limit your web traffic
to 70% of the T1's bandwidth in order to leave the remainder available
for mail and interactive use.
Normally a heavily loaded web server
will not introduce significant latencies into other services even if
the network link is maxed out, but enforcing a limit can smooth things
out and lead to longer term stability.
Many people also enforce artificial
bandwidth limitations in order to ensure that they are not charged for
using too much bandwidth.
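.Pp
The T1 example above could be sketched roughly as follows.
A T1 carries about 1544Kbit/s, so 70% is about 1080Kbit/s.
The rule assumes that the web server listens on port 80 and that the
kernel was built with dummynet support; adapt it to your own rule set.
.Bd -literal -offset indent
ipfw pipe 1 config bw 1080Kbit/s
ipfw add pipe 1 tcp from any 80 to any out
.Ed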
.Pp
Setting the send or receive TCP buffer to values larger than 65535 will result
in only a marginal performance improvement unless both hosts support the window
scaling extension of the TCP protocol, which is controlled by the
.Va net.inet.tcp.rfc1323
sysctl.
These extensions should be enabled and the TCP buffer size should be set
to a value larger than 65536 in order to obtain good performance from
certain types of network links; specifically, gigabit WAN links and
high-latency satellite links.
RFC1323 support is enabled by default.
.Pp
The
.Va net.inet.tcp.always_keepalive
sysctl determines whether or not the TCP implementation should attempt
to detect dead TCP connections by intermittently delivering
.Dq keepalives
on the connection.
By default, this is enabled for all applications; by setting this
sysctl to 0, only applications that specifically request keepalives
will use them.
In most environments, TCP keepalives will improve the management of
system state by expiring dead TCP connections, particularly for
systems serving dialup users who may not always terminate individual
TCP connections before disconnecting from the network.
However, in some environments, temporary network outages may be
incorrectly identified as dead sessions, resulting in unexpectedly
terminated TCP connections.
In such environments, setting the sysctl to 0 may reduce the occurrence of
TCP session disconnections.
.Pp
The
.Va net.inet.tcp.delayed_ack
TCP feature is largely misunderstood.
Historically speaking this feature
was designed to allow the acknowledgement to transmitted data to be returned
along with the response.
For example, when you type over a remote shell
the acknowledgement to the character you send can be returned along with the
data representing the echo of the character.
With delayed acks turned off
the acknowledgement may be sent in its own packet before the remote service
has a chance to echo the data it just received.
This same concept also
applies to any interactive protocol (e.g. SMTP, WWW, POP3) and can cut the
number of tiny packets flowing across the network in half.
The FreeBSD
delayed-ack implementation also follows the TCP protocol rule that
at least every other packet be acknowledged even if the standard 100ms
timeout has not yet passed.
Normally the worst a delayed ack can do is
slightly delay the teardown of a connection, or slightly delay the ramp-up
of a slow-start TCP connection.
While we are not sure, we believe that
the several FAQs related to packages such as SAMBA and SQUID which advise
turning off delayed acks may be referring to the slow-start issue.
In FreeBSD
it would be more beneficial to increase the slow-start flightsize via
the
.Va net.inet.tcp.slowstart_flightsize
sysctl rather than disable delayed acks.
.Pp
The
.Va net.inet.tcp.inflight_enable
sysctl turns on bandwidth delay product limiting for all TCP connections.
The system will attempt to calculate the bandwidth delay product for each
connection and limit the amount of data queued to the network to just the
amount required to maintain optimum throughput.
This feature is useful
if you are serving data over modems, GigE, or high speed WAN links (or
any other link with a high bandwidth*delay product), especially if you are
also using window scaling or have configured a large send window.
If
you enable this option you should also be sure to set
.Va net.inet.tcp.inflight_debug
to 0 (disable debugging), and for production use setting
.Va net.inet.tcp.inflight_min
to at least 6144 may be beneficial.
Note, however, that setting high
minimums may effectively disable bandwidth limiting depending on the link.
The limiting feature reduces the amount of data built up in intermediate
router and switch packet queues as well as reduces the amount of data built
up in the local host's interface queue.
With fewer packets queued up,
interactive connections, especially over slow modems, will also be able
to operate with lower round trip times.
However, note that this feature
only affects data transmission (uploading / server-side).
It does not
affect data reception (downloading).
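.Pp
The settings suggested above correspond to the following
.Xr sysctl.conf 5
entries:
.Bd -literal -offset indent
net.inet.tcp.inflight_enable=1
net.inet.tcp.inflight_debug=0
net.inet.tcp.inflight_min=6144
.Ed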
.Pp
Adjusting
.Va net.inet.tcp.inflight_stab
is not recommended.
This parameter defaults to 20, representing 2 maximal packets added
to the bandwidth delay product window calculation.
The additional
window is required to stabilize the algorithm and improve responsiveness
to changing conditions, but it can also result in higher ping times
over slow links (though still much lower than you would get without
the inflight algorithm).
In such cases you may
wish to try reducing this parameter to 15, 10, or 5, and you may also
have to reduce
.Va net.inet.tcp.inflight_min
(for example, to 3500) to get the desired effect.
Reducing these parameters
should be done as a last resort only.
.Pp
The
.Va net.inet.ip.portrange.*
sysctls control the port number ranges automatically bound to TCP and UDP
sockets.
There are three ranges: a low range, a default range, and a
high range, selectable via an IP_PORTRANGE setsockopt() call.
Most
network programs use the default range which is controlled by
.Va net.inet.ip.portrange.first
and
.Va net.inet.ip.portrange.last ,
which default to 1024 and 5000 respectively.
Bound port ranges are
used for outgoing connections and it is possible to run the system out
of ports under certain circumstances.
This most commonly occurs when you are
running a heavily loaded web proxy.
The port range is not an issue
when running servers which handle mainly incoming connections such as a
normal web server, or have a limited number of outgoing connections such
as a mail relay.
For situations where you may run yourself out of
ports we recommend increasing
.Va net.inet.ip.portrange.last
modestly.
A value of 10000 or 20000 or 30000 may be reasonable.
You should
also consider firewall effects when changing the port range.
Some firewalls
may block large ranges of ports (usually low-numbered ports) and expect systems
to use higher ranges of ports for outgoing connections.
For this reason
we do not recommend that
.Va net.inet.ip.portrange.first
be lowered.
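.Pp
For a busy proxy, a
.Xr sysctl.conf 5
entry such as the following raises the top of the default range to one of
the values suggested above:
.Bd -literal -offset indent
net.inet.ip.portrange.last=20000
.Ed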
.Pp
The
.Va kern.ipc.somaxconn
sysctl limits the size of the listen queue for accepting new TCP connections.
The default value of 128 is typically too low for robust handling of new
connections in a heavily loaded web server environment.
For such environments,
we recommend increasing this value to 1024 or higher.
The service daemon
may itself limit the listen queue size (e.g.\&
.Xr sendmail 8 ,
apache) but will
often have a directive in its configuration file to adjust the queue size up.
Larger listen queues also do a better job of fending off denial of service
attacks.
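.Pp
For example, in
.Xr sysctl.conf 5 :
.Bd -literal -offset indent
kern.ipc.somaxconn=1024
.Ed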
.Pp
The
.Va kern.maxfiles
sysctl determines how many open files the system supports.
The default is
typically a few thousand but you may need to bump this up to ten or twenty
thousand if you are running databases or large descriptor-heavy daemons.
The read-only
.Va kern.openfiles
sysctl may be interrogated to determine the current number of open files
on the system.
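.Pp
A quick way to judge whether the limit needs raising is to compare the
two values before changing anything.
The new limit shown is only an example.
.Bd -literal -offset indent
sysctl kern.openfiles kern.maxfiles
sysctl -w kern.maxfiles=20000
.Ed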
.Pp
The
.Va vm.swap_idle_enabled
sysctl is useful in large multi-user systems where you have lots of users
entering and leaving the system and lots of idle processes.
Such systems
tend to generate a great deal of continuous pressure on free memory reserves.
Turning this feature on and adjusting the swapout hysteresis (in idle
seconds) via
.Va vm.swap_idle_threshold1
and
.Va vm.swap_idle_threshold2
allows you to depress the priority of pages associated with idle processes
more quickly than the normal pageout algorithm.
This gives a helping hand
to the pageout daemon.
Do not turn this option on unless you need it,
because the tradeoff you are making is to essentially pre-page memory sooner
rather than later, eating more swap and disk bandwidth.
In a small system
this option will have a detrimental effect but in a large system that is
already doing moderate paging this option allows the VM system to stage
whole processes into and out of memory more easily.
.Sh LOADER TUNABLES
Some aspects of the system behavior may not be tunable at runtime because
memory allocations they perform must occur early in the boot process.
To change loader tunables, you must set their values in
.Xr loader.conf 5
and reboot the system.
.Pp
.Va kern.maxusers
controls the scaling of a number of static system tables, including defaults
for the maximum number of open files, sizing of network memory resources, etc.
As of
.Fx 4.5 ,
.Va kern.maxusers
is automatically sized at boot based on the amount of memory available in
the system, and may be determined at run-time by inspecting the value of the
read-only
.Va kern.maxusers
sysctl.
Some sites will require larger or smaller values of
.Va kern.maxusers
and may set it as a loader tunable; values of 64, 128, and 256 are not
uncommon.
We do not recommend going above 256 unless you need a huge number
of file descriptors; many of the tunable values set to their defaults by
.Va kern.maxusers
may be individually overridden at boot-time or run-time as described
elsewhere in this document.
Systems older than
.Fx 4.4
must set this value via the kernel
.Xr config 8
option
.Cd maxusers
instead.
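.Pp
For example, a
.Xr loader.conf 5
entry using one of the common values mentioned above:
.Bd -literal -offset indent
kern.maxusers="128"
.Ed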
.Pp
.Va kern.ipc.nmbclusters
may be adjusted to increase the number of network mbufs the system is
willing to allocate.
Each cluster represents approximately 2K of memory,
so a value of 1024 represents 2M of kernel memory reserved for network
buffers.
You can do a simple calculation to figure out how many you need.
If you have a web server which maxes out at 1000 simultaneous connections,
and each connection eats a 16K receive and 16K send buffer, you need
approximately 32MB worth of network buffers to deal with it.
A good rule of
thumb is to multiply by 2, so 32MB x 2 = 64MB, and 64MB / 2K = 32768.
So for this case
you would want to set
.Va kern.ipc.nmbclusters
to 32768.
We recommend values between
1024 and 4096 for machines with moderate amounts of memory, and between 4096
and 32768 for machines with greater amounts of memory.
Under no circumstances
should you specify an arbitrarily high value for this parameter; it could
lead to a boot-time crash.
The
.Fl m
option to
.Xr netstat 1
may be used to observe network cluster use.
Older versions of
.Fx
do not have this tunable and require that the
kernel
.Xr config 8
option
.Dv NMBCLUSTERS
be set instead.
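.Pp
The calculation above can be written out step by step, together with the
resulting
.Xr loader.conf 5
entry:
.Bd -literal -offset indent
# 1000 connections x (16K + 16K)  = approx. 32MB
# 32MB x 2 (safety margin)        = 64MB
# 64MB / 2K per cluster           = 32768 clusters
kern.ipc.nmbclusters="32768"
.Ed
.Pp
After rebooting,
.Dq Li "netstat -m"
shows how many clusters are actually in use.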
.Pp
More and more programs are using the
.Xr sendfile 2
system call to transmit files over the network.
The
.Va kern.ipc.nsfbufs
sysctl controls the number of filesystem buffers
.Xr sendfile 2
is allowed to use to perform its work.
This parameter nominally scales
with
.Va kern.maxusers
so you should not need to modify this parameter except under extreme
circumstances.
.Sh KERNEL CONFIG TUNING
There are a number of kernel options that you may have to fiddle with in
a large-scale system.
In order to change these options you need to be
able to compile a new kernel from source.
The
.Xr config 8
manual page and the handbook are good starting points for learning how to
do this.
Generally the first thing you do when creating your own custom
kernel is to strip out all the drivers and services you do not use.
Removing things like
.Dv INET6
and drivers you do not have will reduce the size of your kernel, sometimes
by a megabyte or more, leaving more memory available for applications.
.Pp
.Dv SCSI_DELAY
and
.Dv IDE_DELAY
may be used to reduce system boot times.
The defaults are fairly high and
can be responsible for 15+ seconds of delay in the boot process.
Reducing
.Dv SCSI_DELAY
to 5 seconds usually works (especially with modern drives).
Reducing
.Dv IDE_DELAY
also works but you have to be a little more careful.
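.Pp
In a kernel configuration file this might look like the fragment below.
It assumes that
.Dv SCSI_DELAY
is given in milliseconds, as in the stock
.Fx 4
configuration files; check the comments in your own
.Pa GENERIC
or
.Pa LINT
file before relying on the unit.
.Bd -literal -offset indent
options 	SCSI_DELAY=5000
.Ed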
.Pp
There are a number of
.Dv *_CPU
options that can be commented out.
If you only want the kernel to run
on a Pentium class CPU, you can easily remove
.Dv I386_CPU
and
.Dv I486_CPU ,
but only remove
.Dv I586_CPU
if you are sure your CPU is being recognized as a Pentium II or better.
Some clones may be recognized as a Pentium or even a 486 and not be able
to boot without those options.
If it works, great!
The operating system
will be able to make better use of higher-end CPU features for MMU, task
switching, timebase, and even device operations.
Additionally, higher-end CPUs support
4MB MMU pages, which the kernel uses to map the kernel itself into memory,
increasing its efficiency under heavy syscall loads.
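.Pp
As a sketch, a configuration meant only for Pentium class and newer CPUs
could comment out the older CPU types as shown below.
.Dv I686_CPU
is assumed here to be the option covering Pentium Pro and later processors;
keep
.Dv I586_CPU
unless you have confirmed how your CPU is recognized at boot:
.Bd -literal -offset indent
#cpu            I386_CPU
#cpu            I486_CPU
cpu             I586_CPU
cpu             I686_CPU
.Ed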
.Sh IDE WRITE CACHING
.Fx 4.3
flirted with turning off IDE write caching.
This reduced write bandwidth
to IDE disks but was considered necessary due to serious data consistency
issues introduced by hard drive vendors.
Basically the problem is that
IDE drives lie about when a write completes.
With IDE write caching turned
on, IDE hard drives will not only write data to disk out of order, they
will sometimes delay some of the blocks indefinitely under heavy disk
load.
A crash or power failure can result in serious filesystem
corruption.
So our default was changed to be safe.
Unfortunately, the
result was such a huge loss in performance that we caved in and changed the
default back to on after the release.
You should check the default on
your system by observing the
.Va hw.ata.wc
sysctl variable.
If IDE write caching is turned off, you can turn it back
on by setting the
.Va hw.ata.wc
loader tunable to 1.
More information on tuning the ATA driver system may be found in the
.Xr ata 4
man page.
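.Pp
In practice this amounts to something like the following.
The placement in
.Pa /boot/loader.conf
is an assumption about how you set loader tunables on your system:
.Bd -literal -offset indent
# Check the current setting (1 = write caching on, 0 = off)
sysctl hw.ata.wc
.Ed
.Bd -literal -offset indent
# /boot/loader.conf: turn write caching on at the next boot
hw.ata.wc="1"
.Ed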
.Pp
There is a new experimental feature for IDE hard drives called
.Va hw.ata.tags
(you also set this in the boot loader) which allows write caching to be safely
turned on.
This brings SCSI tagging features to IDE drives.
As of this
writing only IBM DPTA and DTLA drives support the feature.
Warning!
These
drives apparently have quality control problems and I do not recommend
purchasing them at this time.
If you need performance, go with SCSI.
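.Pp
If you do want to experiment with tagging on a supported drive, the loader
setting would presumably look like this.
The value 1 is assumed to enable the feature, in the same way as
.Va hw.ata.wc ;
see
.Xr ata 4
for the authoritative description:
.Bd -literal -offset indent
# /boot/loader.conf (experimental; assumes a drive that supports tagging)
hw.ata.tags="1"
.Ed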
.Sh CPU, MEMORY, DISK, NETWORK
The type of tuning you do depends heavily on where your system begins to
bottleneck as load increases.
If your system runs out of CPU (idle times
are perpetually 0%) then you need to consider upgrading the CPU or moving to
an SMP motherboard (multiple CPUs), or perhaps you need to revisit the
programs that are causing the load and try to optimize them.
If your system
is paging to swap a lot you need to consider adding more memory.
If your
system is saturating the disk you typically see high CPU idle times and
total disk saturation.
.Xr systat 1
can be used to monitor this.
There are many solutions to saturated disks:
increasing memory for caching, mirroring disks, distributing operations across
several machines, and so forth.
If disk performance is an issue and you
are using IDE drives, switching to SCSI can help a great deal.
While modern
IDE drives compare with SCSI in raw sequential bandwidth, the moment you
start seeking around the disk SCSI drives usually win.
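.Pp
A few commands that may help locate the bottleneck are sketched below.
The one second refresh interval is arbitrary, and the exact columns shown
vary between releases:
.Bd -literal -offset indent
# CPU idle time, paging activity and per-disk load on one screen
systat -vmstat 1

# Per-disk throughput
systat -iostat 1

# How much swap is in use
swapinfo
.Ed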
.Pp
Finally, you might run out of network capacity.
The first line of defense for
improving network performance is to make sure you are using switches instead
of hubs, especially now that switches are almost as cheap.
Hubs
have severe problems under heavy loads due to collision backoff and one bad
host can severely degrade the entire LAN.
Second, optimize the network path
as much as possible.
For example, in
.Xr firewall 7
we describe a firewall protecting internal hosts with a topology where
the externally visible hosts are not routed through it.
Use 100BaseT rather
than 10BaseT, or use 1000BaseT rather than 100BaseT, depending on your needs.
Most bottlenecks occur at the WAN link (e.g.\&
modem, T1, DSL, and so on).
If expanding the link is not an option it may be possible to use the
.Xr dummynet 4
feature to implement peak shaving or other forms of traffic shaping to
prevent the overloaded service (such as web services) from affecting other
services (such as email), or vice versa.
In home installations this could
be used to give interactive traffic (your browser,
.Xr ssh 1
logins) priority
over services you export from your box (web services, email).
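.Pp
As a minimal sketch of such shaping, the following
.Xr ipfw 8
commands send outgoing web traffic through a bandwidth-limited pipe so that
it cannot fill the whole uplink.
The rule number, pipe number, and the 512Kbit/s limit are all arbitrary
illustrative choices, and the example assumes a kernel built with
.Xr ipfw 8
and
.Xr dummynet 4
support:
.Bd -literal -offset indent
# Cap responses from the local web server (source port 80)
ipfw add 100 pipe 1 tcp from any 80 to any out
ipfw pipe 1 config bw 512Kbit/s
.Ed
.Pp
Traffic that does not match the rule is not affected by the pipe.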
.Sh SEE ALSO
.Xr netstat 1 ,
.Xr systat 1 ,
.Xr ata 4 ,
.Xr dummynet 4 ,
.Xr login.conf 5 ,
.Xr rc.conf 5 ,
.Xr sysctl.conf 5 ,
.Xr firewall 7 ,
.Xr hier 7 ,
.Xr ports 7 ,
.Xr boot 8 ,
.Xr ccdconfig 8 ,
.Xr config 8 ,
.Xr disklabel 8 ,
.Xr fsck 8 ,
.Xr ifconfig 8 ,
.Xr ipfw 8 ,
.Xr loader 8 ,
.Xr mount 8 ,
.Xr newfs 8 ,
.Xr route 8 ,
.Xr sysctl 8 ,
.Xr sysinstall 8 ,
.Xr tunefs 8 ,
.Xr vinum 8
.Sh HISTORY
The
.Nm
manual page was originally written by
.An Matthew Dillon
and first appeared
in
.Fx 4.3 ,
May 2001.