Commit | Line | Data |
---|---|---|
984263bc MD |
1 | .\" Copyright (c) 1983, 1991, 1993 |
2 | .\" The Regents of the University of California. All rights reserved. | |
3 | .\" | |
4 | .\" Redistribution and use in source and binary forms, with or without | |
5 | .\" modification, are permitted provided that the following conditions | |
6 | .\" are met: | |
7 | .\" 1. Redistributions of source code must retain the above copyright | |
8 | .\" notice, this list of conditions and the following disclaimer. | |
9 | .\" 2. Redistributions in binary form must reproduce the above copyright | |
10 | .\" notice, this list of conditions and the following disclaimer in the | |
11 | .\" documentation and/or other materials provided with the distribution. | |
dc71b7ab | 12 | .\" 3. Neither the name of the University nor the names of its contributors |
984263bc MD |
13 | .\" may be used to endorse or promote products derived from this software |
14 | .\" without specific prior written permission. | |
15 | .\" | |
16 | .\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND | |
17 | .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE | |
18 | .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE | |
19 | .\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE | |
20 | .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL | |
21 | .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS | |
22 | .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) | |
23 | .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT | |
24 | .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY | |
25 | .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF | |
26 | .\" SUCH DAMAGE. | |
27 | .\" | |
28 | .\" From: @(#)tcp.4 8.1 (Berkeley) 6/5/93 | |
29 | .\" $FreeBSD: src/share/man/man4/tcp.4,v 1.11.2.14 2002/12/29 16:35:38 schweikh Exp $ | |
30 | .\" | |
755d70b8 | 31 | .Dd April 21, 2018 |
984263bc MD |
32 | .Dt TCP 4 |
33 | .Os | |
34 | .Sh NAME | |
35 | .Nm tcp | |
36 | .Nd Internet Transmission Control Protocol | |
37 | .Sh SYNOPSIS | |
38 | .In sys/types.h | |
39 | .In sys/socket.h | |
40 | .In netinet/in.h | |
41 | .Ft int | |
42 | .Fn socket AF_INET SOCK_STREAM 0 | |
43 | .Sh DESCRIPTION | |
44 | The | |
45 | .Tn TCP | |
46 | protocol provides reliable, flow-controlled, two-way | |
47 | transmission of data. It is a byte-stream protocol used to | |
48 | support the | |
49 | .Dv SOCK_STREAM | |
50 | abstraction. TCP uses the standard | |
51 | Internet address format and, in addition, provides a per-host | |
52 | collection of | |
53 | .Dq port addresses . | |
54 | Thus, each address is composed | |
55 | of an Internet address specifying the host and network, with | |
56 | a specific | |
57 | .Tn TCP | |
58 | port on the host identifying the peer entity. | |
59 | .Pp | |
60 | Sockets utilizing the TCP protocol are either | |
61 | .Dq active | |
62 | or | |
63 | .Dq passive . | |
64 | Active sockets initiate connections to passive | |
65 | sockets. By default | |
66 | .Tn TCP | |
67 | sockets are created active; to create a | |
68 | passive socket the | |
69 | .Xr listen 2 | |
70 | system call must be used | |
71 | after binding the socket with the | |
72 | .Xr bind 2 | |
73 | system call. Only | |
74 | passive sockets may use the | |
75 | .Xr accept 2 | |
76 | call to accept incoming connections. Only active sockets may | |
77 | use the | |
78 | .Xr connect 2 | |
79 | call to initiate connections. | |
984263bc MD |
80 | .Pp |
81 | Passive sockets may | |
82 | .Dq underspecify | |
83 | their location to match | |
84 | incoming connection requests from multiple networks. This | |
85 | technique, termed | |
86 | .Dq wildcard addressing , | |
87 | allows a single | |
88 | server to provide service to clients on multiple networks. | |
89 | To create a socket which listens on all networks, the Internet | |
90 | address | |
91 | .Dv INADDR_ANY | |
92 | must be bound. The | |
93 | .Tn TCP | |
94 | port may still be specified | |
95 | at this time; if the port is not specified the system will assign one. | |
96 | Once a connection has been established the socket's address is | |
97 | fixed by the peer entity's location. The address assigned to the | |
98 | socket is the address associated with the network interface | |
99 | through which packets are being transmitted and received. Normally | |
100 | this address corresponds to the peer entity's network. | |
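The wildcard binding described above can be sketched in C as follows.
This is a minimal illustration, not a reference implementation: the helper name tcp_listen_any and the backlog of 5 are assumptions made for the example.
Passing a port of 0 asks the system to assign one, and getsockname(2) reports which port was chosen.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not a system API): create a passive TCP socket
 * bound to the wildcard address INADDR_ANY.  A port of 0 lets the
 * system assign one.  Returns the listening descriptor or -1 on error;
 * on success *bound_port holds the port in use, in host byte order.
 */
static int
tcp_listen_any(in_port_t port, in_port_t *bound_port)
{
	struct sockaddr_in sin;
	socklen_t len = sizeof(sin);
	int fd;

	fd = socket(AF_INET, SOCK_STREAM, 0);
	if (fd == -1)
		return (-1);

	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_addr.s_addr = htonl(INADDR_ANY);	/* all networks */
	sin.sin_port = htons(port);			/* 0: system assigns */

	/* bind(2) first, then listen(2) turns the socket passive. */
	if (bind(fd, (struct sockaddr *)&sin, sizeof(sin)) == -1 ||
	    listen(fd, 5) == -1 ||			/* backlog 5 is arbitrary */
	    getsockname(fd, (struct sockaddr *)&sin, &len) == -1) {
		close(fd);
		return (-1);
	}
	*bound_port = ntohs(sin.sin_port);
	return (fd);
}
```

A caller would then pass the returned descriptor to accept(2); only after a connection is established does the accepted socket acquire a specific local address.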
101 | .Pp | |
102 | .Tn TCP | |
103 | supports a number of socket options which can be set with | |
104 | .Xr setsockopt 2 | |
105 | and tested with | |
106 | .Xr getsockopt 2 : | |
107 | .Bl -tag -width TCP_NODELAYx | |
108 | .It Dv TCP_NODELAY | |
109 | Under most circumstances, | |
110 | .Tn TCP | |
111 | sends data when it is presented; | |
112 | when outstanding data has not yet been acknowledged, it gathers | |
113 | small amounts of output to be sent in a single packet once | |
114 | an acknowledgement is received. | |
115 | For a small number of clients, such as window systems | |
116 | that send a stream of mouse events which receive no replies, | |
117 | this packetization may cause significant delays. | |
118 | The boolean option | |
119 | .Dv TCP_NODELAY | |
120 | defeats this algorithm. | |
121 | .It Dv TCP_MAXSEG | |
122 | By default, a sender\- and receiver-TCP | |
123 | will negotiate among themselves to determine the maximum segment size | |
124 | to be used for each connection. The | |
125 | .Dv TCP_MAXSEG | |
126 | option allows the user to determine the result of this negotiation, | |
127 | and to reduce it if desired. | |
128 | .It Dv TCP_NOOPT | |
129 | .Tn TCP | |
130 | usually sends a number of options in each packet, corresponding to | |
131 | various | |
132 | .Tn TCP | |
133 | extensions which are provided in this implementation. The boolean | |
134 | option | |
135 | .Dv TCP_NOOPT | |
136 | is provided to disable | |
137 | .Tn TCP | |
138 | option use on a per-connection basis. | |
139 | .It Dv TCP_NOPUSH | |
140 | By convention, the sender-TCP | |
141 | will set the | |
142 | .Dq push | |
143 | bit and begin transmission immediately (if permitted) at the end of | |
144 | every user call to | |
145 | .Xr write 2 | |
146 | or | |
147 | .Xr writev 2 . | |
68b63b2b | 148 | When the |
984263bc | 149 | .Dv TCP_NOPUSH |
68b63b2b | 150 | option is set to a non-zero value, |
984263bc MD |
151 | .Tn TCP |
152 | will delay sending any data at all until either the socket is closed, | |
153 | or the internal send buffer is filled. | |
86de01bd SW |
154 | .\".It Dv TCP_SIGNATURE_ENABLE |
155 | .\"This option enables the use of MD5 digests (also known as TCP-MD5) | |
156 | .\"on writes to the specified socket. | |
157 | .\"In the current release, only outgoing traffic is digested; | |
158 | .\"digests on incoming traffic are not verified. | |
159 | .\"The current default behavior for the system is to respond to a system | |
160 | .\"advertising this option with TCP-MD5; this may change. | |
161 | .\".Pp | |
162 | .\"One common use for this in a DragonFlyBSD router deployment is to enable | |
163 | .\"based routers to interwork with Cisco equipment at peering points. | |
164 | .\"Support for this feature conforms to RFC 2385. | |
165 | .\"Only IPv4 (AF_INET) sessions are supported. | |
166 | .\".Pp | |
167 | .\"In order for this option to function correctly, it is necessary for the | |
168 | .\"administrator to add a tcp-md5 key entry to the system's security | |
169 | .\"associations database (SADB) using the | |
170 | .\".Xr setkey 8 | |
171 | .\"utility. | |
172 | .\"This entry must have an SPI of 0x1000 and can therefore only be specified | |
173 | .\"on a per-host basis at this time. | |
174 | .\".Pp | |
175 | .\"If an SADB entry cannot be found for the destination, the outgoing traffic | |
176 | .\"will have an invalid digest option prepended, and the following error message | |
177 | .\"will be visible on the system console: | |
178 | .\".Em "tcpsignature_compute: SADB lookup failed for %d.%d.%d.%d" . | |
efb6b593 SZ |
179 | .It Dv TCP_KEEPINIT |
180 | If a | |
181 | .Tn TCP | |
182 | connection cannot be established within a period of time, | |
183 | .Tn TCP | |
184 | will time out the connection attempt. | |
185 | The | |
186 | .Dv TCP_KEEPINIT | |
1931d00d | 187 | option specifies the number of seconds to wait |
efb6b593 SZ |
188 | before the connection attempt times out. |
189 | The default value for | |
190 | .Dv TCP_KEEPINIT | |
1931d00d | 191 | is tcp.keepinit seconds. |
efb6b593 SZ |
192 | For the accepted sockets, the |
193 | .Dv TCP_KEEPINIT | |
194 | option value is inherited from the listening socket. | |
195 | .It Dv TCP_KEEPIDLE | |
196 | When the | |
197 | .Dv SO_KEEPALIVE | |
198 | option is enabled, | |
199 | .Tn TCP | |
200 | sends a keepalive probe to the remote system of a connection | |
201 | that has been idle for a period of time. | |
202 | The | |
203 | .Dv TCP_KEEPIDLE | |
1931d00d | 204 | specifies the number of seconds before |
efb6b593 SZ |
205 | .Tn TCP |
206 | will send the initial keepalive probe. | |
207 | The default value for | |
208 | .Dv TCP_KEEPIDLE | |
1931d00d | 209 | is tcp.keepidle seconds. |
efb6b593 SZ |
210 | For the accepted sockets, |
211 | the | |
212 | .Dv TCP_KEEPIDLE | |
213 | option value is inherited from the listening socket. | |
214 | .It Dv TCP_KEEPINTVL | |
215 | When the | |
216 | .Dv SO_KEEPALIVE | |
217 | option is enabled, | |
218 | .Tn TCP | |
219 | sends a keepalive probe to the remote system of a connection | |
220 | that has been idle for a period of time. | |
221 | The | |
222 | .Dv TCP_KEEPINTVL | |
1931d00d | 223 | option specifies the number of seconds to wait |
efb6b593 SZ |
224 | before retransmitting a keepalive probe. |
225 | The default value for | |
226 | .Dv TCP_KEEPINTVL | |
1931d00d | 227 | is tcp.keepintvl seconds. |
efb6b593 SZ |
228 | For the accepted sockets, |
229 | the | |
230 | .Dv TCP_KEEPINTVL | |
231 | option value is inherited from the listening socket. | |
232 | .It Dv TCP_KEEPCNT | |
233 | When the | |
234 | .Dv SO_KEEPALIVE | |
235 | option is enabled, | |
236 | .Tn TCP | |
237 | sends a keepalive probe to the remote system of a connection | |
238 | that has been idle for a period of time. | |
239 | The | |
240 | .Dv TCP_KEEPCNT | |
241 | option specifies the maximum number of keepalive | |
242 | probes to be sent before dropping the connection. | |
243 | The default value for | |
244 | .Dv TCP_KEEPCNT | |
1931d00d | 245 | is tcp.keepcnt. |
efb6b593 SZ |
246 | For the accepted sockets, |
247 | the | |
248 | .Dv TCP_KEEPCNT | |
249 | option value is inherited from the listening socket. | |
984263bc MD |
250 | .El |
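As a rough worked example of how the three keepalive options combine, the sketch below estimates how long an unresponsive peer can go undetected.
It assumes that the first probe is sent after TCP_KEEPIDLE seconds of idleness, that unanswered probes are spaced TCP_KEEPINTVL seconds apart, and that the connection is dropped one interval after the last of TCP_KEEPCNT probes; the exact timing in the kernel may differ.
The function name is hypothetical.

```c
/*
 * Hypothetical helper: approximate number of seconds between the last
 * traffic on a connection and the moment it is dropped, assuming no
 * keepalive probe is ever answered.  All arguments use the same units
 * as the socket options (seconds for the timers, a count for keepcnt).
 */
static unsigned long
keepalive_drop_after(unsigned int keepidle, unsigned int keepintvl,
    unsigned int keepcnt)
{
	return ((unsigned long)keepidle +
	    (unsigned long)keepintvl * keepcnt);
}
```

With illustrative values of 7200 seconds idle, 75 seconds between probes, and 8 probes, the estimate is 7200 + 75 * 8 = 7800 seconds.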
251 | .Pp | |
252 | The option level for the | |
253 | .Xr setsockopt 2 | |
254 | call is the protocol number for | |
255 | .Tn TCP , | |
256 | available from | |
257 | .Xr getprotobyname 3 , | |
258 | or | |
259 | .Dv IPPROTO_TCP . | |
260 | All options are declared in | |
44cb301e | 261 | .In netinet/tcp.h . |
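A minimal sketch of setting and testing these options at the IPPROTO_TCP level follows.
The helper names are assumptions made for illustration, and the keepalive options are guarded with #ifdef because not every system declares all of them in netinet/tcp.h.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/* Hypothetical helper: enable (on != 0) or disable TCP_NODELAY. */
static int
tcp_set_nodelay(int fd, int on)
{
	return (setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on)));
}

/* Hypothetical helper: returns 1 if TCP_NODELAY is set, 0 if not, -1 on error. */
static int
tcp_get_nodelay(int fd)
{
	int val = 0;
	socklen_t len = sizeof(val);

	if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &val, &len) == -1)
		return (-1);
	return (val != 0);
}

/*
 * Hypothetical helper: turn on SO_KEEPALIVE (a socket-level option) and
 * then override the per-connection keepalive timers where available.
 */
static int
tcp_tune_keepalive(int fd, int idle, int intvl, int cnt)
{
	int on = 1;

	if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) == -1)
		return (-1);
#ifdef TCP_KEEPIDLE
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle)) == -1)
		return (-1);
#endif
#ifdef TCP_KEEPINTVL
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl, sizeof(intvl)) == -1)
		return (-1);
#endif
#ifdef TCP_KEEPCNT
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof(cnt)) == -1)
		return (-1);
#endif
	return (0);
}
```

Because accepted sockets inherit the keepalive option values from the listening socket, a server could call tcp_tune_keepalive() once on the listening descriptor instead of on every accepted connection.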
984263bc MD |
262 | .Pp |
263 | Options at the | |
264 | .Tn IP | |
265 | transport level may be used with | |
266 | .Tn TCP ; | |
267 | see | |
268 | .Xr ip 4 . | |
269 | Incoming connection requests that are source-routed are noted, | |
270 | and the reverse source route is used in responding. | |
271 | .Sh MIB VARIABLES | |
272 | The | |
273 | .Nm | |
274 | protocol implements a number of variables in the | |
275 | .Li net.inet | |
276 | branch of the | |
277 | .Xr sysctl 3 | |
278 | MIB. | |
279 | .Bl -tag -width TCPCTL_DO_RFC1644 | |
280 | .It Dv TCPCTL_DO_RFC1323 | |
281 | .Pq tcp.rfc1323 | |
282 | Implement the window scaling and timestamp options of RFC 1323 | |
283 | (default true). | |
984263bc MD |
284 | .It Dv TCPCTL_MSSDFLT |
285 | .Pq tcp.mssdflt | |
286 | The default value used for the maximum segment size | |
287 | .Pq Dq MSS | |
288 | when no advice to the contrary is received from MSS negotiation. | |
289 | .It Dv TCPCTL_SENDSPACE | |
290 | .Pq tcp.sendspace | |
291 | Maximum TCP send window. | |
292 | .It Dv TCPCTL_RECVSPACE | |
293 | .Pq tcp.recvspace | |
294 | Maximum TCP receive window. | |
295 | .It tcp.log_in_vain | |
296 | Log any connection attempts to ports where there is not a socket | |
297 | accepting connections. | |
298 | The value of 1 limits the logging to SYN (connection establishment) | |
299 | packets only. | |
300 | That of 2 results in any TCP packets to closed ports being logged. | |
301 | Any value unlisted above disables the logging | |
302 | (default is 0, i.e., the logging is disabled). | |
984263bc MD |
303 | .It tcp.msl |
304 | The Maximum Segment Lifetime for a packet. | |
305 | .It tcp.keepinit | |
306 | Timeout for new, non-established TCP connections. | |
307 | .It tcp.keepidle | |
308 | Amount of time the connection should be idle before keepalive | |
309 | probes (if enabled) are sent. | |
310 | .It tcp.keepintvl | |
311 | The interval between keepalive probes sent to remote machines. | |
312 | After | |
ed02878f | 313 | tcp.keepcnt |
984263bc | 314 | (default 8) probes are sent, with no response, the connection is dropped. |
ed02878f SZ |
315 | .It tcp.keepcnt |
316 | The maximum number of keepalive probes to be sent | |
317 | before dropping the connection. | |
984263bc MD |
318 | .It tcp.always_keepalive |
319 | Assume that | |
320 | .Dv SO_KEEPALIVE | |
321 | is set on all | |
322 | .Tn TCP | |
323 | connections; the kernel will | |
324 | periodically send a packet to the remote host to verify the connection | |
325 | is still up. | |
326 | .It tcp.icmp_may_rst | |
327 | Certain | |
328 | .Tn ICMP | |
329 | unreachable messages may abort connections in | |
330 | .Tn SYN-SENT | |
331 | state. | |
332 | .It tcp.do_tcpdrain | |
333 | Flush packets in the | |
334 | .Tn TCP | |
335 | reassembly queue if the system is low on mbufs. | |
336 | .It tcp.blackhole | |
337 | If enabled, disable sending of RST when a connection is attempted | |
338 | to a port where there is not a socket accepting connections. | |
339 | See | |
340 | .Xr blackhole 4 . | |
341 | .It tcp.delayed_ack | |
e9d1c8d1 | 342 | Delay ACK to try to piggyback it onto a data packet. |
984263bc MD |
343 | .It tcp.delacktime |
344 | Maximum amount of time before a delayed ACK is sent. | |
345 | .It tcp.newreno | |
346 | Enable TCP NewReno Fast Recovery algorithm, | |
347 | as described in RFC 2582. | |
348 | .It tcp.path_mtu_discovery | |
7c3b84db JH |
349 | Enables Path MTU Discovery. PMTU Discovery is helpful for avoiding |
350 | IP fragmentation when transferring lots of data to the same client. | |
351 | For web servers, where most of the connections are short and to | |
352 | different clients, PMTU Discovery actually hurts performance due | |
353 | to unnecessary retransmissions. Turn this on only if most of your | |
354 | TCP connections are long transfers or are repeatedly to the same | |
355 | set of clients. | |
984263bc MD |
356 | .It tcp.tcbhashsize |
357 | Size of the | |
358 | .Tn TCP | |
359 | control-block hashtable | |
360 | (read-only). | |
361 | This may be tuned using the kernel option | |
362 | .Dv TCBHASHSIZE | |
363 | or by setting | |
364 | .Va net.inet.tcp.tcbhashsize | |
365 | in the | |
366 | .Xr loader 8 . | |
367 | .It tcp.pcbcount | |
368 | Number of active protocol control blocks | |
369 | (read-only). | |
370 | .It tcp.syncookies | |
371 | Determines whether or not syn cookies should be generated for | |
372 | outbound syn-ack packets. Syn cookies are a great help during | |
373 | syn flood attacks, and are enabled by default. | |
374 | .It tcp.isn_reseed_interval | |
375 | The interval (in seconds) specifying how often the secret data used in | |
376 | RFC 1948 initial sequence number calculations should be reseeded. | |
377 | By default, this variable is set to zero, indicating that | |
378 | no reseeding will occur. | |
379 | Reseeding should not be necessary, and will break | |
380 | .Dv TIME_WAIT | |
381 | recycling for a few minutes. | |
382 | .It tcp.rexmit_{min,slop} | |
383 | Adjust the retransmit timer calculation for TCP. The slop is | |
384 | typically added to the raw calculation to take into account | |
385 | occasional variances that the SRTT (smoothed round trip time) | |
3f625015 | 386 | is unable to accommodate, while the minimum specifies an |
984263bc MD |
387 | absolute minimum. While a number of TCP RFCs suggest a 1 |
388 | second minimum, these RFCs tend to focus on streaming behavior | |
389 | and fail to deal with the fact that a 1 second minimum has severe | |
390 | detrimental effects over lossy interactive connections, such | |
391 | as an 802.11b wireless link, and over very fast but lossy | |
392 | connections for those cases not covered by the fast retransmit | |
393 | code. For this reason we suggest changing the slop to 200ms and | |
394 | setting the minimum to something out of the way, like 20ms, | |
395 | which gives you an effective minimum of 200ms (similar to Linux). | |
396 | .It tcp.inflight_enable | |
397 | Enable | |
398 | .Tn TCP | |
399 | bandwidth delay product limiting. An attempt will be made to calculate | |
400 | the bandwidth delay product for each individual TCP connection and limit | |
1bf4b486 | 401 | the amount of inflight data being transmitted to avoid building up |
984263bc MD |
402 | unnecessary packets in the network. This option is recommended if you |
403 | are serving a lot of data over connections with high bandwidth-delay | |
404 | products, such as modems, GigE links, and fast long-haul WANs, and/or | |
3f625015 | 405 | you have configured your machine to accommodate large TCP windows. In such |
984263bc MD |
406 | situations, without this option, you may experience high interactive |
407 | latencies or packet loss due to the overloading of intermediate routers | |
68b2c890 | 408 | and switches. Note that bandwidth delay product limiting only affects |
984263bc MD |
409 | the transmit side of a TCP connection. |
410 | .It tcp.inflight_debug | |
411 | Enable debugging for the bandwidth delay product algorithm. This may | |
412 | default to on (1) so if you enable the algorithm you should probably also | |
413 | disable debugging by setting this variable to 0. | |
414 | .It tcp.inflight_min | |
415 | This puts a lower bound on the bandwidth delay product window, in bytes. | |
416 | A value of 1024 is typically used for debugging. 6000-16000 is more typical | |
417 | in a production installation. Setting this value too low may result in | |
418 | slow ramp-up times for bursty connections. Setting this value too high | |
419 | effectively disables the algorithm. | |
420 | .It tcp.inflight_max | |
421 | This puts an upper bound on the bandwidth delay product window, in bytes. | |
422 | This value should not generally be modified but may be used to set a | |
423 | global per-connection limit on queued data, potentially allowing you to | |
f79ec571 | 424 | intentionally set a less than optimum limit to smooth data flow over a |
984263bc MD |
425 | network while still being able to specify huge internal TCP buffers. |
426 | .It tcp.inflight_stab | |
a4bb2daa MD |
427 | This value stabilizes the bwnd (write window) calculation at high speeds |
428 | by increasing the bandwidth calculation in 1/10% increments. The default | |
429 | value of 50 represents a +5% increase. In addition, bwnd is further increased | |
430 | by a fixed 2*maxseg bytes to stabilize the algorithm at low speeds. | |
431 | Changing the stab value is not recommended, but you may come across | |
432 | situations where tuning is beneficial. | |
433 | However, our recommendation for tuning is to stick with only adjusting | |
434 | tcp.inflight_min. | |
435 | Reducing tcp.inflight_stab too much can lead to upwards of a 20% | |
436 | underutilization of the link and prevent the algorithm from properly adapting | |
437 | to changing situations. Increasing tcp.inflight_stab too much can lead to | |
438 | an excessive packet buffering situation. | |
984263bc MD |
439 | .El |
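The arithmetic described for tcp.inflight_stab, tcp.inflight_min, and tcp.inflight_max can be illustrated with the sketch below.
It is only a model built from the descriptions above, not the kernel's code: the function name, the choice of units, and the point at which the result is clamped to the min and max bounds are all assumptions.

```c
#include <stdint.h>

/*
 * Hypothetical model of the bandwidth delay product window (bwnd).
 *   bw      estimated bandwidth in bytes per second
 *   rtt_us  round trip time in microseconds
 *   stab    tcp.inflight_stab, in units of 1/10 percent (50 = +5%)
 *   maxseg  maximum segment size in bytes
 *   lo, hi  tcp.inflight_min and tcp.inflight_max, in bytes
 */
static uint64_t
inflight_bwnd_model(uint64_t bw, uint64_t rtt_us, unsigned int stab,
    unsigned int maxseg, uint64_t lo, uint64_t hi)
{
	uint64_t bdp = bw * rtt_us / 1000000;	/* raw product, bytes */
	uint64_t bwnd;

	/* Inflate by stab/1000, then add the fixed 2*maxseg cushion. */
	bwnd = bdp + bdp * stab / 1000 + 2 * (uint64_t)maxseg;

	/* Assumed: the bounds are applied to the final value. */
	if (bwnd < lo)
		bwnd = lo;
	if (bwnd > hi)
		bwnd = hi;
	return (bwnd);
}
```

For example, at 1,000,000 bytes per second with a 100 ms round trip time the raw product is 100,000 bytes; a stab of 50 adds 5,000 bytes and a maxseg of 1460 adds 2,920 more, giving 107,920 bytes under this model.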
440 | .Sh ERRORS | |
441 | A socket operation may fail with one of the following errors returned: | |
442 | .Bl -tag -width Er | |
443 | .It Bq Er EISCONN | |
444 | when trying to establish a connection on a socket which | |
445 | already has one; | |
446 | .It Bq Er ENOBUFS | |
447 | when the system runs out of memory for | |
448 | an internal data structure; | |
449 | .It Bq Er ETIMEDOUT | |
450 | when a connection was dropped | |
451 | due to excessive retransmissions; | |
452 | .It Bq Er ECONNRESET | |
453 | when the remote peer | |
454 | forces the connection to be closed; | |
455 | .It Bq Er ECONNREFUSED | |
456 | when the remote | |
457 | peer actively refuses connection establishment (usually because | |
458 | no process is listening to the port); | |
459 | .It Bq Er EADDRINUSE | |
460 | when an attempt | |
461 | is made to create a socket with a port which has already been | |
462 | allocated; | |
463 | .It Bq Er EADDRNOTAVAIL | |
464 | when an attempt is made to create a | |
465 | socket with a network address for which no network interface | |
466 | exists; | |
467 | .It Bq Er EAFNOSUPPORT | |
468 | when an attempt is made to bind or connect a socket to a multicast | |
469 | address. | |
470 | .El | |
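One way an application might turn the errors listed above into diagnostics is sketched below.
The function name and the message strings are made up for this example; anything not in the list falls back to strerror(3).

```c
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper: map the TCP-related errno values listed in this
 * section to short descriptions.  The strings are illustrative only.
 */
static const char *
tcp_errno_hint(int error)
{
	switch (error) {
	case EISCONN:
		return ("socket is already connected");
	case ENOBUFS:
		return ("system out of memory for an internal data structure");
	case ETIMEDOUT:
		return ("connection dropped after excessive retransmissions");
	case ECONNRESET:
		return ("connection closed by remote peer");
	case ECONNREFUSED:
		return ("connection refused by peer");
	case EADDRINUSE:
		return ("port already allocated");
	case EADDRNOTAVAIL:
		return ("no network interface has that address");
	case EAFNOSUPPORT:
		return ("cannot bind or connect to a multicast address");
	default:
		return (strerror(error));	/* not specific to TCP */
	}
}
```

After a failed connect(2), for instance, a caller could print tcp_errno_hint(errno) to standard error before deciding whether to retry.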
471 | .Sh SEE ALSO | |
472 | .Xr getsockopt 2 , | |
473 | .Xr socket 2 , | |
474 | .Xr sysctl 3 , | |
475 | .Xr blackhole 4 , | |
476 | .Xr inet 4 , | |
477 | .Xr intro 4 , | |
755d70b8 | 478 | .Xr ip 4 |
984263bc MD |
479 | .Rs |
480 | .%A V. Jacobson | |
481 | .%A R. Braden | |
482 | .%A D. Borman | |
483 | .%T "TCP Extensions for High Performance" | |
484 | .%O RFC 1323 | |
485 | .Re | |
51006084 MD |
486 | .Rs |
487 | .%A "A. Heffernan" | |
488 | .%T "Protection of BGP Sessions via the TCP MD5 Signature Option" | |
489 | .%O "RFC 2385" | |
490 | .Re | |
984263bc MD |
491 | .Sh HISTORY |
492 | The | |
493 | .Nm | |
494 | protocol appeared in | |
495 | .Bx 4.2 . | |
496 | The RFC 1323 extensions for window scaling and timestamps were added | |
497 | in | |
498 | .Bx 4.4 . |