.\" Copyright (c) 1983, 1991, 1993
.\" The Regents of the University of California. All rights reserved.
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\" 3. All advertising materials mentioning features or use of this software
.\" must display the following acknowledgement:
.\" This product includes software developed by the University of
.\" California, Berkeley and its contributors.
.\" 4. Neither the name of the University nor the names of its contributors
.\" may be used to endorse or promote products derived from this software
.\" without specific prior written permission.
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" From: @(#)tcp.4 8.1 (Berkeley) 6/5/93
.\" $FreeBSD: src/share/man/man4/tcp.4,v 1.11.2.14 2002/12/29 16:35:38 schweikh Exp $
.\" $DragonFly: src/share/man/man4/tcp.4,v 1.9 2008/10/17 11:30:24 swildner Exp $
.Nd Internet Transmission Control Protocol
.Fn socket AF_INET SOCK_STREAM 0
protocol provides reliable, flow-controlled, two-way
transmission of data. It is a byte-stream protocol used to
abstraction. TCP uses the standard
Internet address format and, in addition, provides a per-host
Thus, each address is composed
of an Internet address specifying the host and network, with
port on the host identifying the peer entity.
Sockets utilizing the tcp protocol are either
Active sockets initiate connections to passive
sockets are created active; to create a
system call must be used
after binding the socket with the
passive sockets may use the
call to accept incoming connections. Only active sockets may
call to initiate connections.
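.Pp
The distinction between passive and active sockets can be illustrated
with a minimal sketch over the loopback interface.
The helper names
.Fn make_passive ,
.Fn make_active
and
.Fn loopback_roundtrip
are hypothetical and chosen only for this example;
error handling is reduced to returning \-1.
.Bd -literal -offset indent
```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Fill in a loopback address (127.0.0.1) with the given port. */
static void
loopback_addr(struct sockaddr_in *sin, unsigned short port)
{
	memset(sin, 0, sizeof(*sin));
	sin->sin_family = AF_INET;
	sin->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	sin->sin_port = htons(port);
}

/*
 * Passive side: bind, then listen.  Port 0 asks the system to pick
 * a port, which is read back with getsockname.
 * Returns the descriptor, or -1 on error.
 */
int
make_passive(unsigned short *port)
{
	struct sockaddr_in sin;
	socklen_t len = sizeof(sin);
	int s = socket(AF_INET, SOCK_STREAM, 0);

	if (s < 0)
		return -1;
	loopback_addr(&sin, 0);
	if (bind(s, (struct sockaddr *)&sin, sizeof(sin)) < 0 ||
	    listen(s, 1) < 0 ||
	    getsockname(s, (struct sockaddr *)&sin, &len) < 0) {
		close(s);
		return -1;
	}
	*port = ntohs(sin.sin_port);
	return s;
}

/* Active side: a newly created socket may call connect directly. */
int
make_active(unsigned short port)
{
	struct sockaddr_in sin;
	int s = socket(AF_INET, SOCK_STREAM, 0);

	if (s < 0)
		return -1;
	loopback_addr(&sin, port);
	if (connect(s, (struct sockaddr *)&sin, sizeof(sin)) < 0) {
		close(s);
		return -1;
	}
	return s;
}

/*
 * Send one byte from the active socket to the accepted one.
 * Returns the byte received, or -1 on any failure.
 */
int
loopback_roundtrip(void)
{
	unsigned short port = 0;
	char c = 0;
	int got = -1;
	int ls = make_passive(&port);
	int cs = (ls < 0) ? -1 : make_active(port);
	int as = (cs < 0) ? -1 : accept(ls, NULL, NULL);

	if (as >= 0 && write(cs, "x", 1) == 1 && read(as, &c, 1) == 1)
		got = c;
	if (as >= 0)
		close(as);
	if (cs >= 0)
		close(cs);
	if (ls >= 0)
		close(ls);
	return got;
}
```
.Ed
.Pp
In this sketch the listening descriptor is used only with accept,
and data flows over the descriptor that accept returns.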
their location to match
incoming connection requests from multiple networks. This
.Dq wildcard addressing ,
server to provide service to clients on multiple networks.
To create a socket which listens on all networks, the Internet
port may still be specified
at this time; if the port is not specified the system will assign one.
Once a connection has been established the socket's address is
fixed by the peer entity's location. The address assigned to the
socket is the address associated with the network interface
through which packets are being transmitted and received. Normally
this address corresponds to the peer entity's network.
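.Pp
As a hedged illustration of wildcard addressing, the following sketch
binds a listener to
.Dv INADDR_ANY
with port 0 so that the system assigns the port, connects to it over
the loopback interface, and then checks the local address of the
accepted socket.
The function names
.Fn wildcard_listener
and
.Fn accepted_addr_is_loopback
are hypothetical, and the use of 127.0.0.1 as the client's destination
is an assumption made to keep the example self-contained.
.Bd -literal -offset indent
```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Listen on all networks (INADDR_ANY) with a system-assigned port.
 * Stores the assigned port in *port (host byte order).
 * Returns the descriptor, or -1 on error.
 */
int
wildcard_listener(unsigned short *port)
{
	struct sockaddr_in sin;
	socklen_t len = sizeof(sin);
	int s = socket(AF_INET, SOCK_STREAM, 0);

	if (s < 0)
		return -1;
	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_addr.s_addr = htonl(INADDR_ANY);
	sin.sin_port = 0;		/* let the system choose */
	if (bind(s, (struct sockaddr *)&sin, sizeof(sin)) < 0 ||
	    listen(s, 1) < 0 ||
	    getsockname(s, (struct sockaddr *)&sin, &len) < 0) {
		close(s);
		return -1;
	}
	*port = ntohs(sin.sin_port);
	return s;
}

/*
 * Connect to the wildcard listener through 127.0.0.1 and inspect the
 * accepted socket.  Returns 1 if its local address is 127.0.0.1,
 * 0 if it is some other address, and -1 on error.
 */
int
accepted_addr_is_loopback(void)
{
	struct sockaddr_in sin;
	socklen_t len = sizeof(sin);
	unsigned short port = 0;
	int result = -1;
	int ls = wildcard_listener(&port);
	int cs = (ls < 0) ? -1 : socket(AF_INET, SOCK_STREAM, 0);
	int as = -1;

	if (cs >= 0) {
		memset(&sin, 0, sizeof(sin));
		sin.sin_family = AF_INET;
		sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
		sin.sin_port = htons(port);
		if (connect(cs, (struct sockaddr *)&sin, sizeof(sin)) == 0)
			as = accept(ls, NULL, NULL);
	}
	/* The accepted socket no longer carries the wildcard address. */
	if (as >= 0 && getsockname(as, (struct sockaddr *)&sin, &len) == 0)
		result = (sin.sin_addr.s_addr == htonl(INADDR_LOOPBACK));
	if (as >= 0)
		close(as);
	if (cs >= 0)
		close(cs);
	if (ls >= 0)
		close(ls);
	return result;
}
```
.Ed
.Pp
The listener itself keeps the wildcard address;
only the socket returned by accept has its address fixed.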
supports a number of socket options which can be set with
.Bl -tag -width TCP_NODELAYx
Under most circumstances,
sends data when it is presented;
when outstanding data has not yet been acknowledged, it gathers
small amounts of output to be sent in a single packet once
an acknowledgement is received.
For a small number of clients, such as window systems
that send a stream of mouse events which receive no replies,
this packetization may cause significant delays.
defeats this algorithm.
By default, a sender\- and receiver-TCP
will negotiate among themselves to determine the maximum segment size
to be used for each connection. The
option allows the user to determine the result of this negotiation,
and to reduce it if desired.
usually sends a number of options in each packet, corresponding to
extensions which are provided in this implementation. The boolean
is provided to disable
option use on a per-connection basis.
By convention, the sender-TCP
bit and begin transmission immediately (if permitted) at the end of
option is set to a non-zero value,
will delay sending any data at all until either the socket is closed,
or the internal send buffer is filled.
.It Dv TCP_SIGNATURE_ENABLE
This option enables the use of MD5 digests (also known as TCP-MD5)
on writes to the specified socket.
In the current release, only outgoing traffic is digested;
digests on incoming traffic are not verified.
The current default behavior for the system is to respond to a system
advertising this option with TCP-MD5; this may change.
One common use for this is to allow DragonFly-based routers
to interwork with Cisco equipment at peering points.
Support for this feature conforms to RFC 2385.
Only IPv4 (AF_INET) sessions are supported.
In order for this option to function correctly, it is necessary for the
administrator to add a tcp-md5 key entry to the system's security
associations database (SADB) using the
This entry must have an SPI of 0x1000 and can therefore only be specified
on a per-host basis at this time.
If an SADB entry cannot be found for the destination, the outgoing traffic
will have an invalid digest option prepended, and the following error message
will be visible on the system console:
.Em "tcpsignature_compute: SADB lookup failed for %d.%d.%d.%d" .
connection cannot be established within a period of time,
will time out the connection attempt.
option specifies the number of milliseconds to wait
before the connection attempt times out.
The default value for
is tcp.keepinit milliseconds.
For the accepted sockets, the
option value is inherited from the listening socket.
sends a keepalive probe to the remote system of a connection
that has been idle for a period of time.
specifies the number of milliseconds before
will send the initial keepalive probe.
The default value for
is tcp.keepidle milliseconds.
For the accepted sockets,
option value is inherited from the listening socket.
sends a keepalive probe to the remote system of a connection
that has been idle for a period of time.
option specifies the number of milliseconds to wait
before retransmitting a keepalive probe.
The default value for
is tcp.keepintvl milliseconds.
For the accepted sockets,
option value is inherited from the listening socket.
sends a keepalive probe to the remote system of a connection
that has been idle for a period of time.
option specifies the maximum number of keepalive
probes to be sent before dropping the connection.
The default value for
is tcp.keepcnt.
For the accepted sockets,
option value is inherited from the listening socket.
The option level for the
call is the protocol number for
.Xr getprotobyname 3 ,
All options are declared in
transport level may be used with
Incoming connection requests that are source-routed are noted,
and the reverse source route is used in responding.
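.Pp
Setting and reading back a boolean TCP option can be sketched as
follows.
The example assumes that the constant
.Dv IPPROTO_TCP
from
.In netinet/in.h
is used as the option level in place of a
.Xr getprotobyname 3
lookup, and the function names
.Fn set_nodelay ,
.Fn get_nodelay
and
.Fn nodelay_demo
are hypothetical.
.Bd -literal -offset indent
```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/* Enable (on != 0) or disable TCP_NODELAY; returns 0 or -1. */
int
set_nodelay(int s, int on)
{
	return setsockopt(s, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on));
}

/*
 * Query TCP_NODELAY.  Returns 1 if set, 0 if clear, -1 on error.
 * Any non-zero value is treated as "set", since the exact value
 * returned may differ between systems.
 */
int
get_nodelay(int s)
{
	int value = 0;
	socklen_t len = sizeof(value);

	if (getsockopt(s, IPPROTO_TCP, TCP_NODELAY, &value, &len) < 0)
		return -1;
	return value != 0;
}

/* Create a socket, turn on TCP_NODELAY, and report the result. */
int
nodelay_demo(void)
{
	int result = -1;
	int s = socket(AF_INET, SOCK_STREAM, 0);

	if (s < 0)
		return -1;
	if (set_nodelay(s, 1) == 0)
		result = get_nodelay(s);
	close(s);
	return result;
}
```
.Ed
.Pp
The same pattern applies to the other integer-valued options in the
list above, with the option name and value changed accordingly.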
protocol implements a number of variables in the
.Bl -tag -width TCPCTL_DO_RFC1644
.It Dv TCPCTL_DO_RFC1323
Implement the window scaling and timestamp options of RFC 1323
.It Dv TCPCTL_MSSDFLT
The default value used for the maximum segment size
when no advice to the contrary is received from MSS negotiation.
.It Dv TCPCTL_SENDSPACE
Maximum TCP send window.
.It Dv TCPCTL_RECVSPACE
Maximum TCP receive window.
Log any connection attempts to ports where there is not a socket
accepting connections.
A value of 1 limits the logging to SYN (connection establishment)
A value of 2 results in any TCP packets to closed ports being logged.
Any value unlisted above disables the logging
(default is 0, i.e., the logging is disabled).
The Maximum Segment Lifetime for a packet.
Timeout for new, non-established TCP connections.
Amount of time the connection should be idle before keepalive
probes (if enabled) are sent.
The interval between keepalive probes sent to remote machines.
(default 8) probes are sent, with no response, the connection is dropped.
The maximum number of keepalive probes to be sent
before dropping the connection.
.It tcp.always_keepalive
connections, the kernel will
periodically send a packet to the remote host to verify the connection
unreachable messages may abort connections in
reassembly queue if the system is low on mbufs.
If enabled, disable sending of RST when a connection is attempted
to a port where there is not a socket accepting connections.
Delay ACK to try and piggyback it onto a data packet.
Maximum amount of time before a delayed ACK is sent.
Enable TCP NewReno Fast Recovery algorithm,
as described in RFC 2582.
.It tcp.path_mtu_discovery
Enables Path MTU Discovery. PMTU Discovery is helpful for avoiding
IP fragmentation when transferring lots of data to the same client.
For web servers, where most of the connections are short and to
different clients, PMTU Discovery actually hurts performance due
to unnecessary retransmissions. Turn this on only if most of your
TCP connections are long transfers or are repeatedly to the same
control-block hashtable
This may be tuned using the kernel option
.Va net.inet.tcp.tcbhashsize
Number of active process control blocks
Determines whether or not SYN cookies should be generated for
outbound SYN-ACK packets. SYN cookies are a great help during
SYN flood attacks, and are enabled by default.
.It tcp.isn_reseed_interval
The interval (in seconds) specifying how often the secret data used in
RFC 1948 initial sequence number calculations should be reseeded.
By default, this variable is set to zero, indicating that
no reseeding will occur.
Reseeding should not be necessary, and will break
recycling for a few minutes.
.It tcp.rexmit_{min,slop}
Adjust the retransmit timer calculation for TCP. The slop is
typically added to the raw calculation to take into account
occasional variances that the SRTT (smoothed round trip time)
is unable to accommodate, while the minimum specifies an
absolute minimum. While a number of TCP RFCs suggest a 1
second minimum, these RFCs tend to focus on streaming behavior
and fail to deal with the fact that a 1 second minimum has severe
detrimental effects over lossy interactive connections, such
as an 802.11b wireless link, and over very fast but lossy
connections for those cases not covered by the fast retransmit
code. For this reason we suggest changing the slop to 200ms and
setting the minimum to something out of the way, like 20ms,
which gives you an effective minimum of 200ms (similar to Linux).
.It tcp.inflight_enable
bandwidth delay product limiting. An attempt will be made to calculate
the bandwidth delay product for each individual TCP connection and limit
the amount of inflight data being transmitted to avoid building up
unnecessary packets in the network. This option is recommended if you
are serving a lot of data over connections with high bandwidth-delay
products, such as modems, GigE links, and fast long-haul WANs, and/or
you have configured your machine to accommodate large TCP windows. In such
situations, without this option, you may experience high interactive
latencies or packet loss due to the overloading of intermediate routers
and switches. Note that bandwidth delay product limiting only affects
the transmit side of a TCP connection.
.It tcp.inflight_debug
Enable debugging for the bandwidth delay product algorithm. This may
default to on (1), so if you enable the algorithm you should probably also
disable debugging by setting this variable to 0.
This puts a lower bound on the bandwidth delay product window, in bytes.
A value of 1024 is typically used for debugging. 6000-16000 is more typical
in a production installation. Setting this value too low may result in
slow ramp-up times for bursty connections. Setting this value too high
effectively disables the algorithm.
This puts an upper bound on the bandwidth delay product window, in bytes.
This value should not generally be modified but may be used to set a
global per-connection limit on queued data, potentially allowing you to
intentionally set a less than optimum limit to smooth data flow over a
network while still being able to specify huge internal TCP buffers.
.It tcp.inflight_stab
The bandwidth delay product algorithm requires a slightly larger window
than it otherwise calculates for stability. This parameter determines the
extra window in maximal packets / 10. The default value of 20 represents
2 maximal packets. Reducing this value is not recommended but you may
come across a situation with very slow links where the ping time
reduction of the default inflight code is not sufficient. If this case
occurs you should first try reducing tcp.inflight_min and, if that does not
work, reduce both tcp.inflight_min and tcp.inflight_stab, trying values of
15, 10, or 5 for the latter. Never use a value less than 5. Reducing
tcp.inflight_stab can lead to upwards of a 20% underutilization of the link
as well as reducing the algorithm's ability to adapt to changing
situations and should only be done as a last resort.
A socket operation may fail with one of the following errors returned:
when trying to establish a connection on a socket which
when the system runs out of memory for
an internal data structure;
when a connection was dropped
due to excessive retransmissions;
forces the connection to be closed;
.It Bq Er ECONNREFUSED
peer actively refuses connection establishment (usually because
no process is listening to the port);
is made to create a socket with a port which has already been
.It Bq Er EADDRNOTAVAIL
when an attempt is made to create a
socket with a network address for which no network interface
.It Bq Er EAFNOSUPPORT
when an attempt is made to bind or connect a socket to a multicast
.%T "TCP Extensions for High Performance"
.%T "Protection of BGP Sessions via the TCP MD5 Signature Option"
The RFC 1323 extensions for window scaling and timestamps were added