.\" Copyright (c) 1983, 1991, 1993
.\" The Regents of the University of California. All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\" 3. Neither the name of the University nor the names of its contributors
.\" may be used to endorse or promote products derived from this software
.\" without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" From: @(#)tcp.4 8.1 (Berkeley) 6/5/93
.\" $FreeBSD: src/share/man/man4/tcp.4,v 1.11.2.14 2002/12/29 16:35:38 schweikh Exp $
.\"
.Dd April 21, 2018
.Dt TCP 4
.Os
.Sh NAME
.Nm tcp
.Nd Internet Transmission Control Protocol
.Sh SYNOPSIS
.In sys/types.h
.In sys/socket.h
.In netinet/in.h
.Ft int
.Fn socket AF_INET SOCK_STREAM 0
.Sh DESCRIPTION
The
.Tn TCP
protocol provides reliable, flow-controlled, two-way
transmission of data. It is a byte-stream protocol used to
support the
.Dv SOCK_STREAM
abstraction. TCP uses the standard
Internet address format and, in addition, provides a per-host
collection of
.Dq port addresses .
Thus, each address is composed
of an Internet address specifying the host and network, with
a specific
.Tn TCP
port on the host identifying the peer entity.
.Pp
Sockets utilizing the
.Tn TCP
protocol are either
.Dq active
or
.Dq passive .
Active sockets initiate connections to passive
sockets. By default,
.Tn TCP
sockets are created active; to create a
passive socket, the
.Xr listen 2
system call must be used
after binding the socket with the
.Xr bind 2
system call. Only
passive sockets may use the
.Xr accept 2
call to accept incoming connections. Only active sockets may
use the
.Xr connect 2
call to initiate connections.
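.Pp
The following is a minimal sketch of the two socket roles described above,
not a reference implementation.
The helper names
tcp_passive_open
and
tcp_active_open
are hypothetical, and the use of the loopback address is an assumption made
only to keep the example self-contained.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>
#include <unistd.h>

/*
 * Create a passive (listening) TCP socket on the loopback address.
 * port is in host byte order; 0 lets the system choose one.
 * Returns the descriptor, or -1 on failure.
 */
int
tcp_passive_open(unsigned short port, int backlog)
{
	struct sockaddr_in sin;
	int s;

	s = socket(AF_INET, SOCK_STREAM, 0);	/* sockets start out active */
	if (s == -1)
		return (-1);
	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	sin.sin_port = htons(port);
	/* bind(2) first, then listen(2) turns the socket passive. */
	if (bind(s, (struct sockaddr *)&sin, sizeof(sin)) == -1 ||
	    listen(s, backlog) == -1) {
		close(s);
		return (-1);
	}
	return (s);
}

/*
 * Create an active TCP socket and connect it to addr.
 * Returns the descriptor, or -1 on failure.
 */
int
tcp_active_open(const struct sockaddr_in *addr)
{
	int s;

	s = socket(AF_INET, SOCK_STREAM, 0);
	if (s == -1)
		return (-1);
	if (connect(s, (const struct sockaddr *)addr, sizeof(*addr)) == -1) {
		close(s);
		return (-1);
	}
	return (s);
}
```

A server would call
.Xr accept 2
on the descriptor returned by the first helper; a client would read and
write on the descriptor returned by the second.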
.Pp
Passive sockets may
.Dq underspecify
their location to match
incoming connection requests from multiple networks. This
technique, termed
.Dq wildcard addressing ,
allows a single
server to provide service to clients on multiple networks.
To create a socket which listens on all networks, the Internet
address
.Dv INADDR_ANY
must be bound. The
.Tn TCP
port may still be specified
at this time; if the port is not specified, the system will assign one.
Once a connection has been established, the socket's address is
fixed by the peer entity's location. The address assigned to the
socket is the address associated with the network interface
through which packets are being transmitted and received. Normally
this address corresponds to the peer entity's network.
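.Pp
Wildcard addressing can be sketched as follows.
The helper name
tcp_listen_wildcard
and the backlog of 5 are assumptions chosen for illustration; the calls
themselves are the standard socket interfaces.
Passing a port of 0 asks the system to assign one, which the helper then
reads back with
.Xr getsockname 2 .

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>
#include <unistd.h>

/*
 * Listen on all local networks by binding INADDR_ANY.
 * On entry *portp is the requested port in host byte order (0 = any);
 * on success it is updated to the port actually bound.
 * Returns the listening descriptor, or -1 on failure.
 */
int
tcp_listen_wildcard(unsigned short *portp)
{
	struct sockaddr_in sin;
	socklen_t len = sizeof(sin);
	int s;

	s = socket(AF_INET, SOCK_STREAM, 0);
	if (s == -1)
		return (-1);
	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_addr.s_addr = htonl(INADDR_ANY);	/* wildcard address */
	sin.sin_port = htons(*portp);
	if (bind(s, (struct sockaddr *)&sin, sizeof(sin)) == -1 ||
	    listen(s, 5) == -1 ||
	    getsockname(s, (struct sockaddr *)&sin, &len) == -1) {
		close(s);
		return (-1);
	}
	*portp = ntohs(sin.sin_port);	/* report the assigned port */
	return (s);
}
```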
.Pp
.Tn TCP
supports a number of socket options which can be set with
.Xr setsockopt 2
and tested with
.Xr getsockopt 2 :
.Bl -tag -width TCP_NODELAYx
.It Dv TCP_NODELAY
Under most circumstances,
.Tn TCP
sends data when it is presented;
when outstanding data has not yet been acknowledged, it gathers
small amounts of output to be sent in a single packet once
an acknowledgement is received.
For a small number of clients, such as window systems
that send a stream of mouse events which receive no replies,
this packetization may cause significant delays.
The boolean option
.Dv TCP_NODELAY
defeats this algorithm.
.It Dv TCP_MAXSEG
By default, a sender\- and receiver-TCP
will negotiate among themselves to determine the maximum segment size
to be used for each connection. The
.Dv TCP_MAXSEG
option allows the user to determine the result of this negotiation,
and to reduce it if desired.
.It Dv TCP_NOOPT
.Tn TCP
usually sends a number of options in each packet, corresponding to
various
.Tn TCP
extensions which are provided in this implementation. The boolean
option
.Dv TCP_NOOPT
is provided to disable
.Tn TCP
option use on a per-connection basis.
.It Dv TCP_NOPUSH
By convention, the sender-TCP
will set the
.Dq push
bit and begin transmission immediately (if permitted) at the end of
every user call to
.Xr write 2
or
.Xr writev 2 .
When the
.Dv TCP_NOPUSH
option is set to a non-zero value,
.Tn TCP
will delay sending any data at all until either the socket is closed,
or the internal send buffer is filled.
.\".It Dv TCP_SIGNATURE_ENABLE
.\"This option enables the use of MD5 digests (also known as TCP-MD5)
.\"on writes to the specified socket.
.\"In the current release, only outgoing traffic is digested;
.\"digests on incoming traffic are not verified.
.\"The current default behavior for the system is to respond to a system
.\"advertising this option with TCP-MD5; this may change.
.\".Pp
.\"One common use for this in a DragonFlyBSD router deployment is to enable
.\"based routers to interwork with Cisco equipment at peering points.
.\"Support for this feature conforms to RFC 2385.
.\"Only IPv4 (AF_INET) sessions are supported.
.\".Pp
.\"In order for this option to function correctly, it is necessary for the
.\"administrator to add a tcp-md5 key entry to the system's security
.\"associations database (SADB) using the
.\".Xr setkey 8
.\"utility.
.\"This entry must have an SPI of 0x1000 and can therefore only be specified
.\"on a per-host basis at this time.
.\".Pp
.\"If an SADB entry cannot be found for the destination, the outgoing traffic
.\"will have an invalid digest option prepended, and the following error message
.\"will be visible on the system console:
.\".Em "tcpsignature_compute: SADB lookup failed for %d.%d.%d.%d" .
.It Dv TCP_KEEPINIT
If a
.Tn TCP
connection cannot be established within a period of time,
.Tn TCP
will time out the connection attempt.
The
.Dv TCP_KEEPINIT
option specifies the number of seconds to wait
before the connection attempt times out.
The default value for
.Dv TCP_KEEPINIT
is tcp.keepinit seconds.
For accepted sockets, the
.Dv TCP_KEEPINIT
option value is inherited from the listening socket.
.It Dv TCP_KEEPIDLE
When the
.Dv SO_KEEPALIVE
option is enabled,
.Tn TCP
sends a keepalive probe to the remote system of a connection
that has been idle for a period of time.
The
.Dv TCP_KEEPIDLE
option specifies the number of seconds before
.Tn TCP
will send the initial keepalive probe.
The default value for
.Dv TCP_KEEPIDLE
is tcp.keepidle seconds.
For accepted sockets, the
.Dv TCP_KEEPIDLE
option value is inherited from the listening socket.
.It Dv TCP_KEEPINTVL
When the
.Dv SO_KEEPALIVE
option is enabled,
.Tn TCP
sends a keepalive probe to the remote system of a connection
that has been idle for a period of time.
The
.Dv TCP_KEEPINTVL
option specifies the number of seconds to wait
before retransmitting a keepalive probe.
The default value for
.Dv TCP_KEEPINTVL
is tcp.keepintvl seconds.
For accepted sockets, the
.Dv TCP_KEEPINTVL
option value is inherited from the listening socket.
.It Dv TCP_KEEPCNT
When the
.Dv SO_KEEPALIVE
option is enabled,
.Tn TCP
sends a keepalive probe to the remote system of a connection
that has been idle for a period of time.
The
.Dv TCP_KEEPCNT
option specifies the maximum number of keepalive
probes to be sent before dropping the connection.
The default value for
.Dv TCP_KEEPCNT
is tcp.keepcnt probes.
For accepted sockets, the
.Dv TCP_KEEPCNT
option value is inherited from the listening socket.
.El
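.Pp
As an illustration of the keepalive options in the list above, the sketch
below enables
.Dv SO_KEEPALIVE
and then overrides the per-socket timers.
The helper name
tcp_tune_keepalive
is hypothetical, and the preprocessor guards are an assumption added so
that the example also builds on systems that lack one of the option names.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/*
 * Enable keepalive probing on socket s and override the system-wide
 * defaults for this socket only.  idle and intvl are in seconds, cnt is
 * a number of probes.  Options whose names are not defined on the build
 * host are skipped.
 * Returns 0 on success, -1 on the first failing setsockopt(2).
 */
int
tcp_tune_keepalive(int s, int idle, int intvl, int cnt)
{
	int on = 1;

	(void)idle;
	(void)intvl;
	(void)cnt;
	if (setsockopt(s, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) == -1)
		return (-1);
#ifdef TCP_KEEPIDLE
	if (setsockopt(s, IPPROTO_TCP, TCP_KEEPIDLE, &idle,
	    sizeof(idle)) == -1)
		return (-1);
#endif
#ifdef TCP_KEEPINTVL
	if (setsockopt(s, IPPROTO_TCP, TCP_KEEPINTVL, &intvl,
	    sizeof(intvl)) == -1)
		return (-1);
#endif
#ifdef TCP_KEEPCNT
	if (setsockopt(s, IPPROTO_TCP, TCP_KEEPCNT, &cnt,
	    sizeof(cnt)) == -1)
		return (-1);
#endif
	return (0);
}
```

Because accepted sockets inherit these values, a server would typically
apply the helper once to the listening socket.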
.Pp
The option level for the
.Xr setsockopt 2
call is the protocol number for
.Tn TCP ,
available from
.Xr getprotobyname 3 ,
or
.Dv IPPROTO_TCP .
All options are declared in
.In netinet/tcp.h .
.Pp
Options at the
.Tn IP
transport level may be used with
.Tn TCP ;
see
.Xr ip 4 .
Incoming connection requests that are source-routed are noted,
and the reverse source route is used in responding.
.Sh MIB VARIABLES
The
.Nm
protocol implements a number of variables in the
.Li net.inet
branch of the
.Xr sysctl 3
MIB.
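.Pp
A minimal sketch of reading one integer variable from this branch by name,
for example
.Li net.inet.tcp.keepidle .
The helper name
tcp_sysctl_int
is hypothetical.
The platform guard is an assumption of this example: where
sysctlbyname
is not available, the helper simply fails with
.Er ENOSYS .

```c
#include <sys/types.h>
#include <errno.h>
#include <stddef.h>
#if defined(__DragonFly__) || defined(__FreeBSD__)
#include <sys/sysctl.h>
#endif

/*
 * Read an integer MIB variable by its full name.
 * Returns 0 and stores the value in *valp on success, -1 on failure.
 */
int
tcp_sysctl_int(const char *name, int *valp)
{
#if defined(__DragonFly__) || defined(__FreeBSD__)
	size_t len = sizeof(*valp);

	/* No new value is supplied, so this is a read-only query. */
	return (sysctlbyname(name, valp, &len, NULL, 0));
#else
	(void)name;
	(void)valp;
	errno = ENOSYS;		/* interface not provided on this system */
	return (-1);
#endif
}
```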
.Bl -tag -width TCPCTL_DO_RFC1644
.It Dv TCPCTL_DO_RFC1323
.Pq tcp.rfc1323
Implement the window scaling and timestamp options of RFC 1323
(default true).
.It Dv TCPCTL_MSSDFLT
.Pq tcp.mssdflt
The default value used for the maximum segment size
.Pq Dq MSS
when no advice to the contrary is received from MSS negotiation.
.It Dv TCPCTL_SENDSPACE
.Pq tcp.sendspace
Maximum TCP send window.
.It Dv TCPCTL_RECVSPACE
.Pq tcp.recvspace
Maximum TCP receive window.
.It tcp.log_in_vain
Log any connection attempts to ports where there is not a socket
accepting connections.
A value of 1 limits the logging to SYN (connection establishment)
packets only.
A value of 2 logs any TCP packet sent to a closed port.
Any other value disables the logging
(default is 0, i.e., the logging is disabled).
.It tcp.msl
The Maximum Segment Lifetime for a packet.
.It tcp.keepinit
Timeout for new, non-established TCP connections.
.It tcp.keepidle
Amount of time the connection should be idle before keepalive
probes (if enabled) are sent.
.It tcp.keepintvl
The interval between keepalive probes sent to remote machines.
After
tcp.keepcnt
(default 8) probes are sent with no response, the connection is dropped.
.It tcp.keepcnt
The maximum number of keepalive probes to be sent
before dropping the connection.
.It tcp.always_keepalive
Assume that
.Dv SO_KEEPALIVE
is set on all
.Tn TCP
connections; the kernel will then
periodically send a packet to the remote host to verify the connection
is still up.
.It tcp.icmp_may_rst
Certain
.Tn ICMP
unreachable messages may abort connections in
.Tn SYN-SENT
state.
.It tcp.do_tcpdrain
Flush packets in the
.Tn TCP
reassembly queue if the system is low on mbufs.
.It tcp.blackhole
If enabled, do not send an RST when a connection is attempted
to a port where there is not a socket accepting connections.
See
.Xr blackhole 4 .
.It tcp.delayed_ack
Delay ACK to try to piggyback it onto a data packet.
.It tcp.delacktime
Maximum amount of time before a delayed ACK is sent.
.It tcp.newreno
Enable the TCP NewReno Fast Recovery algorithm,
as described in RFC 2582.
.It tcp.path_mtu_discovery
Enables Path MTU Discovery. PMTU Discovery is helpful for avoiding
IP fragmentation when transferring lots of data to the same client.
For web servers, where most of the connections are short and to
different clients, PMTU Discovery actually hurts performance due
to unnecessary retransmissions. Turn this on only if most of your
TCP connections are long transfers or are repeatedly to the same
set of clients.
.It tcp.tcbhashsize
Size of the
.Tn TCP
control-block hashtable
(read-only).
This may be tuned using the kernel option
.Dv TCBHASHSIZE
or by setting
.Va net.inet.tcp.tcbhashsize
in the
.Xr loader 8 .
.It tcp.pcbcount
Number of active protocol control blocks
(read-only).
.It tcp.syncookies
Determines whether or not SYN cookies should be generated for
outbound SYN-ACK packets. SYN cookies are a great help during
SYN flood attacks, and are enabled by default.
.It tcp.isn_reseed_interval
The interval (in seconds) specifying how often the secret data used in
RFC 1948 initial sequence number calculations should be reseeded.
By default, this variable is set to zero, indicating that
no reseeding will occur.
Reseeding should not be necessary, and will break
.Dv TIME_WAIT
recycling for a few minutes.
.It tcp.rexmit_{min,slop}
Adjust the retransmit timer calculation for TCP. The slop is
typically added to the raw calculation to take into account
occasional variances that the SRTT (smoothed round trip time)
is unable to accommodate, while the minimum specifies an
absolute minimum. While a number of TCP RFCs suggest a 1
second minimum, these RFCs tend to focus on streaming behavior
and fail to deal with the fact that a 1 second minimum has severe
detrimental effects over lossy interactive connections, such
as an 802.11b wireless link, and over very fast but lossy
connections for those cases not covered by the fast retransmit
code. For this reason we suggest changing the slop to 200ms and
setting the minimum to something out of the way, like 20ms,
which gives you an effective minimum of 200ms (similar to Linux).
.It tcp.inflight_enable
Enable
.Tn TCP
bandwidth delay product limiting. An attempt will be made to calculate
the bandwidth delay product for each individual TCP connection and limit
the amount of inflight data being transmitted to avoid building up
unnecessary packets in the network. This option is recommended if you
are serving a lot of data over connections with high bandwidth-delay
products, such as modems, GigE links, and fast long-haul WANs, and/or
you have configured your machine to accommodate large TCP windows. In such
situations, without this option, you may experience high interactive
latencies or packet loss due to the overloading of intermediate routers
and switches. Note that bandwidth delay product limiting only affects
the transmit side of a TCP connection.
.It tcp.inflight_debug
Enable debugging for the bandwidth delay product algorithm. This may
default to on (1), so if you enable the algorithm you should probably also
disable debugging by setting this variable to 0.
.It tcp.inflight_min
This puts a lower bound on the bandwidth delay product window, in bytes.
A value of 1024 is typically used for debugging. 6000-16000 is more typical
in a production installation. Setting this value too low may result in
slow ramp-up times for bursty connections. Setting this value too high
effectively disables the algorithm.
.It tcp.inflight_max
This puts an upper bound on the bandwidth delay product window, in bytes.
This value should not generally be modified but may be used to set a
global per-connection limit on queued data, potentially allowing you to
intentionally set a less than optimum limit to smooth data flow over a
network while still being able to specify huge internal TCP buffers.
.It tcp.inflight_stab
This value stabilizes the bwnd (write window) calculation at high speeds
by increasing the bandwidth calculation in 1/10% increments. The default
value of 50 represents a +5% increase. In addition, bwnd is further increased
by a fixed 2*maxseg bytes to stabilize the algorithm at low speeds.
Changing the stab value is not recommended, but you may come across
situations where tuning is beneficial.
However, our recommendation for tuning is to stick with only adjusting
tcp.inflight_min.
Reducing tcp.inflight_stab too much can lead to upwards of a 20%
underutilization of the link and prevent the algorithm from properly adapting
to changing situations. Increasing tcp.inflight_stab too much can lead to
an excessive packet buffering situation.
.El
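.Pp
The arithmetic behind the retransmit and inflight variables above can be
sketched as follows.
This is an illustrative model built only from the descriptions in this
section, not the kernel's code; the function names, the parameter units,
and the exact order of operations are assumptions.

```c
#include <stdint.h>

/*
 * Assumed retransmit timeout model: the slop is added to the raw
 * estimate and the result is never allowed below the minimum.
 * All values are in milliseconds.
 */
unsigned int
rexmit_timeout_ms(unsigned int raw_ms, unsigned int min_ms,
    unsigned int slop_ms)
{
	unsigned int t = raw_ms + slop_ms;

	return (t < min_ms ? min_ms : t);
}

/*
 * Assumed inflight window model: the bandwidth delay product is raised
 * by stab tenths of a percent, then by 2 * maxseg bytes, and finally
 * clamped to [wnd_min, wnd_max].  Bandwidth is in bytes per second and
 * the round trip time in microseconds.
 */
uint64_t
inflight_bwnd(uint64_t bw_bytes_per_sec, uint64_t rtt_us, unsigned int stab,
    unsigned int maxseg, uint64_t wnd_min, uint64_t wnd_max)
{
	uint64_t bdp = bw_bytes_per_sec * rtt_us / 1000000;
	uint64_t bwnd = bdp + bdp * stab / 1000 + 2 * (uint64_t)maxseg;

	if (bwnd < wnd_min)
		bwnd = wnd_min;
	if (bwnd > wnd_max)
		bwnd = wnd_max;
	return (bwnd);
}
```

Under this model, a slop of 200ms with a minimum of 20ms yields a timeout
of at least 200ms, which matches the effective minimum quoted for
tcp.rexmit_{min,slop}.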
.Sh ERRORS
A socket operation may fail with one of the following errors returned:
.Bl -tag -width Er
.It Bq Er EISCONN
when trying to establish a connection on a socket which
already has one;
.It Bq Er ENOBUFS
when the system runs out of memory for
an internal data structure;
.It Bq Er ETIMEDOUT
when a connection was dropped
due to excessive retransmissions;
.It Bq Er ECONNRESET
when the remote peer
forces the connection to be closed;
.It Bq Er ECONNREFUSED
when the remote
peer actively refuses connection establishment (usually because
no process is listening to the port);
.It Bq Er EADDRINUSE
when an attempt
is made to create a socket with a port which has already been
allocated;
.It Bq Er EADDRNOTAVAIL
when an attempt is made to create a
socket with a network address for which no network interface
exists;
.It Bq Er EAFNOSUPPORT
when an attempt is made to bind or connect a socket to a multicast
address.
.El
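.Pp
One way an application might turn the error codes above into diagnostics
is sketched below.
The function name
tcp_errno_reason
is hypothetical, and the message strings are paraphrased from the list
above rather than taken from any system interface.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Map an errno value from a failed TCP socket operation to a short
 * explanation.  Returns NULL for values not listed in this section.
 */
const char *
tcp_errno_reason(int error)
{
	switch (error) {
	case EISCONN:
		return ("socket is already connected");
	case ENOBUFS:
		return ("system out of memory for an internal structure");
	case ETIMEDOUT:
		return ("connection dropped after excessive retransmissions");
	case ECONNRESET:
		return ("remote peer forced the connection closed");
	case ECONNREFUSED:
		return ("remote peer refused the connection");
	case EADDRINUSE:
		return ("port is already allocated");
	case EADDRNOTAVAIL:
		return ("no network interface has that address");
	case EAFNOSUPPORT:
		return ("multicast address not usable with TCP");
	default:
		return (NULL);
	}
}
```

A caller would pass the value of
.Va errno
saved immediately after the failing call.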
.Sh SEE ALSO
.Xr getsockopt 2 ,
.Xr socket 2 ,
.Xr sysctl 3 ,
.Xr blackhole 4 ,
.Xr inet 4 ,
.Xr intro 4 ,
.Xr ip 4
.Rs
.%A V. Jacobson
.%A R. Braden
.%A D. Borman
.%T "TCP Extensions for High Performance"
.%O RFC 1323
.Re
.Rs
.%A "A. Heffernan"
.%T "Protection of BGP Sessions via the TCP MD5 Signature Option"
.%O "RFC 2385"
.Re
.Sh HISTORY
The
.Nm
protocol appeared in
.Bx 4.2 .
The RFC 1323 extensions for window scaling and timestamps were added
in
.Bx 4.4 .