share/doc/psd/20.ipctut/tutor.me

   1 .\" Copyright (c) 1986, 1993
   2 .\"     The Regents of the University of California.  All rights reserved.
   3 .\"
   4 .\" Redistribution and use in source and binary forms, with or without
   5 .\" modification, are permitted provided that the following conditions
   6 .\" are met:
   7 .\" 1. Redistributions of source code must retain the above copyright
   8 .\"    notice, this list of conditions and the following disclaimer.
   9 .\" 2. Redistributions in binary form must reproduce the above copyright
  10 .\"    notice, this list of conditions and the following disclaimer in the
  11 .\"    documentation and/or other materials provided with the distribution.
  12 .\" 3. All advertising materials mentioning features or use of this software
  13 .\"    must display the following acknowledgement:
  14 .\"     This product includes software developed by the University of
  15 .\"     California, Berkeley and its contributors.
  16 .\" 4. Neither the name of the University nor the names of its contributors
  17 .\"    may be used to endorse or promote products derived from this software
  18 .\"    without specific prior written permission.
  19 .\"
  20 .\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
  21 .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
  22 .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
  23 .\" ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
  24 .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
  25 .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
  26 .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
  27 .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
  28 .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
  29 .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
  30 .\" SUCH DAMAGE.
  31 .\"
  32 .\"     @(#)tutor.me    8.1 (Berkeley) 8/14/93
  33 .\"
  34 .oh 'Introductory 4.4BSD IPC''PSD:20-%'
  35 .eh 'PSD:20-%''Introductory 4.4BSD IPC'
  36 .rs
  37 .sp 2
  38 .sz 14
  39 .ft B
  40 .ce 2
  41 An Introductory 4.4BSD
  42 Interprocess Communication Tutorial
  43 .sz 10
  44 .sp 2
  45 .ce
  46 .i "Stuart Sechrest"
  47 .ft
  48 .sp
  49 .ce 4
  50 Computer Science Research Group
  51 Computer Science Division
  52 Department of Electrical Engineering and Computer Science
  53 University of California, Berkeley
  54 .sp 2
  55 .ce
  56 .i ABSTRACT
  57 .sp
  58 .(c
  59 .pp
  60 Berkeley UNIX\(dg 4.4BSD offers several choices for interprocess communication.
  61 To aid the programmer in developing programs that are composed of
  62 cooperating
  63 processes, the different choices are discussed and a series of example
  64 programs is presented.  These programs
  65 demonstrate in a simple way the use of pipes, socketpairs, sockets
  66 and the use of datagram and stream communication.  The intent of this
  67 document is to present a few simple example programs, not to describe the
  68 networking system in full.
  69 .)c
  70 .sp 2
  71 .(f
  72 \(dg\|UNIX is a trademark of AT&T Bell Laboratories.
  73 .)f
  74 .b
  75 .sh 1 "Goals"
  76 .r
  77 .pp
  78 Facilities for interprocess communication (IPC) and networking
  79 were a major addition to UNIX in the Berkeley UNIX 4.2BSD release.
  80 These facilities required major additions and some changes
  81 to the system interface.
  82 The basic idea of this interface is to make IPC similar to file I/O.
  83 In UNIX a process has a set of I/O descriptors, from which one reads
  84 and to which one writes.
  85 Descriptors may refer to normal files, to devices (including terminals),
  86 or to communication channels.
  87 The use of a descriptor has three phases: its creation,
  88 its use for reading and writing, and its destruction.  By using descriptors
  89 to write files, rather than simply naming the target file in the write
  90 call, one gains a surprising amount of flexibility.  Often, the program that
  91 creates a descriptor will be different from the program that uses the
  92 descriptor.  For example, the shell can create a descriptor for the output
  93 of the `ls'
  94 command that will cause the listing to appear in a file rather than
  95 on a terminal.
  96 Pipes are another form of descriptor that has been used in UNIX
  97 for some time.
  98 Pipes allow one-way data transmission from one process
  99 to another; the two processes and the pipe must be set up by a common
 100 ancestor.
 101 .pp
 102 The use of descriptors is not the only communication interface
 103 provided by UNIX.
 104 The signal mechanism sends a tiny amount of information from one
 105 process to another.
 106 The signaled process receives only the signal type,
 107 not the identity of the sender,
 108 and the number of possible signals is small.
 109 The signal semantics limit the flexibility of the signaling mechanism
 110 as a means of interprocess communication.
 111 .pp
 112 The identification of IPC with I/O is quite longstanding in UNIX and
 113 has proved quite successful.  At first, however, IPC was limited to
 114 processes communicating within a single machine.  With Berkeley UNIX
 115 4.2BSD this expanded to include IPC between machines.  This expansion
 116 has necessitated some change in the way that descriptors are created.
 117 Additionally, new possibilities for the meaning of read and write have
 118 been admitted.  Originally the meanings, or semantics, of these terms
 119 were fairly simple.  When you wrote something it was delivered.  When
 120 you read something, you were blocked until the data arrived.
 121 Other possibilities exist,
 122 however.  One can write without full assurance of delivery if one can
 123 check later to catch occasional failures.  Messages can be kept as
 124 discrete units or merged into a stream.
 125 One can ask to read, but insist on not waiting if nothing is immediately
 126 available.  These new possibilities are allowed in the Berkeley UNIX IPC
 127 interface.
 128 .pp
 129 Thus Berkeley UNIX 4.4BSD offers several choices for IPC.
 130 This paper presents simple examples that illustrate some of
 131 the choices.
 132 The reader is presumed to be familiar with the C programming language
 133 [Kernighan & Ritchie 1978],
 134 but not necessarily with the system calls of the UNIX system or with
 135 processes and interprocess communication.
 136 The paper reviews the notion of a process and the types of
 137 communication that are supported by Berkeley UNIX 4.4BSD.
 138 A series of examples is presented that create processes that communicate
 139 with one another.  The programs show different ways of establishing
 140 channels of communication.
 141 Finally, the calls that actually transfer data are reviewed.
 142 To clearly present how communication can take place,
 143 the example programs have been cleared of anything that
 144 might be construed as useful work.
 145 They can, therefore, serve as models
 146 for the programmer trying to construct programs that are composed of
 147 cooperating processes.
 148 .b
 149 .sh 1 "Processes"
 150 .pp
 151 A \fIprogram\fP is both a sequence of statements and a rough way of referring
 152 to the computation that occurs when the compiled statements are run.
 153 A \fIprocess\fP can be thought of as a single line of control in a program.
 154 Most programs execute some statements, go through a few loops, branch in
 155 various directions and then end.  These are single process programs.
 156 Programs can also have a point where control splits into two independent lines,
 157 an action called \fIforking.\fP
 158 In UNIX these lines can never join again.  A call to the system routine
 159 \fIfork()\fP causes a process to split in this way.
 160 The result of this call is that two independent processes will be
 161 running, executing exactly the same code.
 162 Memory values will be the same for all values set before the fork, but,
 163 subsequently, each version will be able to change only the
 164 value of its own copy of each variable.
 165 Initially, the only difference between the two will be the value returned by
 166 \fIfork().\fP  The parent will receive a process id for the child;
 167 the child will receive a zero.
 168 Calls to \fIfork(),\fP
 169 therefore, typically precede, or are included in, an if-statement.
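.pp
The way the value returned by \fIfork()\fP separates parent from child
can be sketched as follows.
This is a minimal illustration, not one of the tutorial's figures:
the helper name \fIchild_exit_status()\fP is hypothetical, and the use of
\fIwaitpid()\fP to collect the child's exit status is an assumption added
so that the parent can observe what the child did.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that exits at once with the given code; the parent
 * waits for it and returns the exit status the child reported.
 * Returns -1 if fork() or waitpid() fails.
 * (Hypothetical helper, for illustration only.)
 */
int
child_exit_status(int code)
{
	int status;
	pid_t pid = fork();

	if (pid == -1)
		return -1;	/* fork failed; no child was created */
	if (pid == 0)
		_exit(code);	/* child: fork() returned zero */

	/* parent: fork() returned the process id of the child */
	if (waitpid(pid, &status, 0) == -1)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Both processes execute the same if-statements after the fork; only the
value of \fIpid\fP tells them which branch to take.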
 170 .pp
 171 A process views the rest of the system through a private table of descriptors.
 172 The descriptors can represent open files or sockets (sockets are communication
 173 objects that will be discussed below).  Descriptors are referred to
 174 by their index numbers in the table.  The first three descriptors are often
 175 known by special names, \fIstdin, stdout\fP and \fIstderr\fP.
 176 These are the standard input, output and error.
 177 When a process forks, its descriptor table is copied to the child.
 178 Thus, if the parent's standard input is being taken from a terminal
 179 (devices are also treated as files in UNIX), the child's input will
 180 be taken from the
 181 same terminal.  Whoever reads first will get the input.  If, before forking,
 182 the parent changes its standard input so that it is reading from a
 183 new file, the child will take its input from the new file.  It is
 184 also possible to take input from a socket, rather than from a file.
 185 .b
 186 .sh 1 "Pipes"
 187 .r
 188 .pp
 189 Most users of UNIX know that they can pipe the output of a
 190 program ``prog1'' to the input of another, ``prog2,'' by typing the command
 191 \fI``prog1 | prog2.''\fP
 192 This is called ``piping'' the output of one program
 193 to another because the mechanism used to transfer the output is called a
 194 pipe.
 195 When the user types a command, the command is read by the shell, which
 196 decides how to execute it.  If the command is simple, for example,
 197 .i "``prog1,''"
 198 the shell forks a process, which executes the program, prog1, and then dies.
 199 The shell waits for this termination and then prompts for the next
 200 command.
 201 If the command is a compound command,
 202 .i "``prog1 | prog2,''"
 203 the shell creates two processes connected by a pipe. One process
 204 runs the program, prog1, the other runs prog2.  The pipe is an I/O
 205 mechanism with two ends, or sockets.  Data that is written into one socket
 206 can be read from the other.
 207 .(z
 208 .ft CW
 209 .so pipe.c
 210 .ft
 211 .ce 1
 212 Figure 1\ \ Use of a pipe
 213 .)z
 214 .pp
 215 Since a program specifies its input and output only by the descriptor table
 216 indices, which appear as variables or constants,
 217 the input source and output destination can be changed without
 218 changing the text of the program.
 219 It is in this way that the shell is able to set up pipes.  Before executing
 220 prog1, the process can close whatever is at \fIstdout\fP
 221 and replace it with one
 222 end of a pipe.  Similarly, the process that will execute prog2 can substitute
 223 the opposite end of the pipe for
 224 \fIstdin.\fP
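.pp
The substitution described above might be sketched as follows.
The sketch assumes the \fIdup2()\fP call, which the text does not name,
as the means of replacing \fIstdout\fP; the helper name
\fIcapture_child_stdout()\fP is hypothetical, and the child here writes a
fixed message instead of executing another program.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <string.h>
#include <unistd.h>

/*
 * Run a child whose standard output has been replaced by the write
 * end of a pipe, and read what the child writes there.
 * Returns the number of bytes read, or -1 on failure.
 * (Hypothetical helper; assumes msg is short enough for one read.)
 */
ssize_t
capture_child_stdout(const char *msg, char *buf, size_t buflen)
{
	int fds[2];
	pid_t pid;
	ssize_t n;

	if (pipe(fds) == -1)
		return -1;
	pid = fork();
	if (pid == -1) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}
	if (pid == 0) {
		close(fds[0]);		/* the child does not read */
		if (dup2(fds[1], STDOUT_FILENO) == -1)
			_exit(1);
		close(fds[1]);		/* descriptor 1 now refers to the pipe */
		/* From here on the child simply writes to its stdout. */
		if (write(STDOUT_FILENO, msg, strlen(msg)) == -1)
			_exit(1);
		_exit(0);
	}
	close(fds[1]);			/* the parent does not write */
	n = read(fds[0], buf, buflen);
	close(fds[0]);
	waitpid(pid, NULL, 0);
	return n;
}
```

The code that writes the message never mentions the pipe; it refers only
to descriptor 1, which is why the program text need not change.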
 225 .pp
 226 Let us now examine a program that creates a pipe for communication between
 227 its child and itself (Figure 1).
 228 A pipe is created by a parent process, which then forks.
 229 When a process forks, the parent's descriptor table is copied into
 230 the child's.
 231 .pp
 232 In Figure 1, the parent process makes a call to the system routine
 233 \fIpipe().\fP
 234 This routine creates a pipe and places descriptors for the sockets
 235 for the two ends of the pipe in the process's descriptor table.
 236 \fIPipe()\fP
 237 is passed an array into which it places the index numbers of the
 238 sockets it created.
 239 The two ends are not equivalent.  The socket whose index is
 240 returned in the first element of the array is opened for reading only,
 241 while the socket in the second element is opened only for writing.
 242 This corresponds to the fact that the standard input is the first
 243 descriptor of a process's descriptor table and the standard output
 244 is the second.  After creating the pipe, the parent creates the child
 245 with which it will share the pipe by calling \fIfork().\fP
 246 Figure 2 illustrates the effect of a fork.
 247 The parent process's descriptor table points to both ends of the pipe.
 248 After the fork, both parent's and child's descriptor tables point to
 249 the pipe.
 250 The child can then use the pipe to send a message to the parent.
 251 .(z
 252 .so fig2.pic
 253 .ce 2
 254 Figure 2\ \ Sharing a pipe between parent and child
 255 .ce 0
 256 .)z
 257 .pp
 258 Just what is a pipe?
 259 It is a one-way communication mechanism, with one end opened
 260 for reading and the other end for writing.
 261 Therefore, parent and child need to agree on which way to turn
 262 the pipe, from parent to child or the other way around.
 263 Using the same pipe for communication both from parent to child and
 264 from child to parent would be possible (since both processes have
 265 references to both ends), but very complicated.
 266 If the parent and child are to have a two-way conversation,
 267 the parent creates two pipes, one for use in each direction.
 268 (In accordance with their plans, both parent and child in the example above
 269 close the socket that they will not use.  It is not required that unused
 270 descriptors be closed, but it is good practice.)
 271 A pipe is also a \fIstream\fP communication mechanism; that
 272 is, all messages sent through the pipe are placed in order
 273 and reliably delivered.  When the reader asks for a certain
 274 number of bytes from this
 275 stream, he is given as many bytes as are available, up
 276 to the amount of the request. Note that these bytes may have come from
 277 the same call to \fIwrite()\fP or from several calls to \fIwrite()\fP
 278 which were concatenated.
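.pp
The stream behavior of a pipe can be seen in a small sketch that performs
two writes followed by a single read.
The helper name \fItwo_writes_one_read()\fP is hypothetical, and for
brevity one process holds both ends of the pipe, which is not how the
program in Figure 1 is organized.

```c
#include <sys/types.h>
#include <unistd.h>

/*
 * Write "abc" and then "def" into a pipe with two separate calls,
 * then issue one read for up to buflen bytes.
 * Returns the number of bytes delivered by that read, or -1 on failure.
 * (Hypothetical helper, for illustration only.)
 */
ssize_t
two_writes_one_read(char *buf, size_t buflen)
{
	int fds[2];
	ssize_t n;

	if (pipe(fds) == -1)
		return -1;
	if (write(fds[1], "abc", 3) != 3 || write(fds[1], "def", 3) != 3) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}
	close(fds[1]);			/* nothing more will be written */
	n = read(fds[0], buf, buflen);	/* no message boundaries are kept */
	close(fds[0]);
	return n;
}
```

With a 16-byte buffer the read delivers all six bytes at once; with a
4-byte buffer it delivers only the first four, which cut across the two
original writes.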
 279 .b
 280 .sh 1 "Socketpairs"
 281 .r
 282 .pp
 283 Berkeley UNIX 4.4BSD provides a slight generalization of pipes.  A pipe is a
 284 pair of connected sockets for one-way stream communication.  One may
 285 obtain a pair of connected sockets for two-way stream communication
 286 by calling the routine \fIsocketpair().\fP
 287 The program in Figure 3 calls \fIsocketpair()\fP
 288 to create such a connection.  The program uses the link for
 289 communication in both directions.  Since socketpairs are
 290 an extension of pipes, their use resembles that of pipes.
 291 Figure 4 illustrates the result of a fork following a call to
 292 \fIsocketpair().\fP
 293 .pp
 294 \fISocketpair()\fP
 295 takes as
 296 arguments a specification of a domain, a style of communication, and a
 297 protocol.
 298 These are the parameters shown in the example.
 299 Domains and protocols will be discussed in the next section.
 300 Briefly,
 301 a domain is a space of names that may be bound
 302 to sockets and implies certain other conventions.
 303 Currently, socketpairs have only been implemented for one
 304 domain, called the UNIX domain.
 305 The UNIX domain uses UNIX path names for naming sockets.
 306 It only allows communication
 307 between sockets on the same machine.
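.pp
A two-way exchange over a socketpair might look like the following
sketch.
It is not the program of Figure 3: the helper name \fIecho_upper()\fP and
the upper-casing reply are hypothetical, and the sketch assumes messages
short enough to arrive in a single read.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <ctype.h>
#include <string.h>
#include <unistd.h>

/*
 * The parent sends msg to the child over one end of a socketpair;
 * the child answers on the same connection with the message in
 * upper case.  Returns the number of reply bytes, or -1 on failure.
 * (Hypothetical helper; msg must be 1 to 256 bytes long.)
 */
ssize_t
echo_upper(const char *msg, char *reply, size_t replylen)
{
	int sv[2];
	pid_t pid;
	ssize_t n = -1;
	size_t len = strlen(msg);

	if (len == 0 || len > 256)
		return -1;
	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
		return -1;
	pid = fork();
	if (pid == -1) {
		close(sv[0]);
		close(sv[1]);
		return -1;
	}
	if (pid == 0) {
		char buf[256];
		ssize_t i, got;

		close(sv[0]);		/* the child uses sv[1] only */
		got = read(sv[1], buf, sizeof buf);
		for (i = 0; i < got; i++)
			buf[i] = (char)toupper((unsigned char)buf[i]);
		/* The reply travels back over the same socket. */
		if (got > 0 && write(sv[1], buf, (size_t)got) != got)
			_exit(1);
		_exit(0);
	}
	close(sv[1]);			/* the parent uses sv[0] only */
	if (write(sv[0], msg, len) == (ssize_t)len)
		n = read(sv[0], reply, replylen);
	close(sv[0]);
	waitpid(pid, NULL, 0);
	return n;
}
```

Unlike a pipe, each process both reads and writes on the one descriptor
it keeps, so a second channel is not needed for the reply.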
 308 .pp
 309 Note that the header files
 310 .i "<sys/socket.h>"
 311 and
 312 .i "<sys/types.h>"
 313 are required in this program.
 314 The constants AF_UNIX and SOCK_STREAM are defined in
 315 .i "<sys/socket.h>,"
 316 which in turn requires the file
 317 .i "<sys/types.h>"
 318 for some of its definitions.
 319 .(z
 320 .ft CW
 321 .so socketpair.c
 322 .ft
 323 .ce 1
 324 Figure 3\ \ Use of a socketpair
 325 .)z
 326 .(z
 327 .so fig3.pic
 328 .ce 1
 329 Figure 4\ \ Sharing a socketpair between parent and child
 330 .)z
 331 .b
 332 .sh 1 "Domains and Protocols"
 333 .r
 334 .pp
 335 Pipes and socketpairs are a simple solution for communicating between
 336 a parent and child or between child processes.
 337 What if we wanted to have processes that have no common ancestor
 338 with whom to set up communication?
 339 Neither standard UNIX pipes nor socketpairs are
 340 the answer here, since both mechanisms require a common ancestor to
 341 set up the communication.
 342 We would like to have two processes separately create sockets
 343 and then have messages sent between them.  This is often the
 344 case when providing or using a service in the system.  This is
 345 also the case when the communicating processes are on separate machines.
 346 In Berkeley UNIX 4.4BSD one can create individual sockets, give them names and
 347 send messages between them.
 348 .pp
 349 Sockets created by different programs use names to refer to one another;
 350 names generally must be translated into addresses for use.
 351 The space from which an address is drawn is referred to as a
 352 .i domain.
 353 There are several domains for sockets.
 354 Two that will be used in the examples here are the UNIX domain (or AF_UNIX,
 355 for Address Family UNIX) and the Internet domain (or AF_INET).
 356 UNIX domain IPC is an experimental facility in 4.2BSD and 4.3BSD.
 357 In the UNIX domain, a socket is given a path name within the file system
 358 name space.
 359 A file system node is created for the socket and other processes may
 360 then refer to the socket by giving the proper pathname.
 361 UNIX domain names, therefore, allow communication between any two processes
 362 that work in the same file system.
 363 The Internet domain is the UNIX implementation of the DARPA Internet
 364 standard protocols IP/TCP/UDP.
 365 Addresses in the Internet domain consist of a machine network address
 366 and an identifying number, called a port.
 367 Internet domain names allow communication between machines.
 368 .pp
 369 Communication follows some particular ``style.''
 370 Currently, communication is either through a \fIstream\fP
 371 or by \fIdatagram.\fP
 372 Stream communication implies several things.  Communication takes
 373 place across a connection between two sockets.  The communication
 374 is reliable, error-free, and, as in pipes, no message boundaries are
 375 kept. Reading from a stream may result in reading the data sent from
 376 one or several calls to \fIwrite()\fP
 377 or only part of the data from a single call, if there is not enough room
 378 for the entire message, or if not all the data from a large message
 379 has been transferred.
 380 The protocol implementing such a style will retransmit messages
 381 received with errors. It will also return error messages if one tries to
 382 send a message after the connection has been broken.
 383 Datagram communication does not use connections.  Each message is
 384 addressed individually.  If the address is correct, it will generally
 385 be received, although this is not guaranteed.  Often datagrams are
 386 used for requests that require a response from the
 387 recipient.  If no response
 388 arrives in a reasonable amount of time, the request is repeated.
 389 The individual datagrams will be kept separate when they are read, that
 390 is, message boundaries are preserved.
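.pp
The difference in message boundaries between the two styles can be
observed with a short sketch.
It assumes that a connected pair of UNIX domain sockets obtained from
\fIsocketpair()\fP is an acceptable stand-in for sockets created
separately; the helper name \fIfirst_read_size()\fP is hypothetical.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Send "abc" and "def" as two separate messages over a connected
 * pair of sockets of the given style (SOCK_STREAM or SOCK_DGRAM),
 * then perform one read on the other end.
 * Returns the number of bytes that read delivers, or -1 on failure.
 * (Hypothetical helper, for illustration only.)
 */
ssize_t
first_read_size(int style, char *buf, size_t buflen)
{
	int sv[2];
	ssize_t n;

	if (socketpair(AF_UNIX, style, 0, sv) == -1)
		return -1;
	if (write(sv[0], "abc", 3) != 3 || write(sv[0], "def", 3) != 3) {
		close(sv[0]);
		close(sv[1]);
		return -1;
	}
	n = read(sv[1], buf, buflen);
	close(sv[0]);
	close(sv[1]);
	return n;
}
```

With SOCK_DGRAM the read delivers exactly the three bytes of the first
message.
With SOCK_STREAM the read may deliver bytes from both writes, since a
stream keeps no record of where one message ended and the next began.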
 391 .pp
 392 The difference in performance between the two styles of communication is
 393 generally less important than the difference in semantics.  The
 394 performance gain that one might find in using datagrams must be weighed
 395 against the increased complexity of the program, which must now concern
 396 itself with lost or out-of-order messages.  If lost messages may simply be
 397 ignored, the quantity of traffic may be a consideration. The expense
 398 of setting up a connection is best justified by frequent use of the connection.
 399 Since the performance of a protocol changes as it is tuned for different
 400 situations, it is best to seek the most up-to-date information when
 401 making choices for a program in which performance is crucial.
 402 .pp
 403 A protocol is a set of rules, data formats and conventions that regulate the
 404 transfer of data between participants in the communication.
 405 In general, there is one protocol for each socket type (stream,
 406 datagram, etc.) within each domain.
 407 The code that implements a protocol
 408 keeps track of the names that are bound to sockets,
 409 sets up connections,
 410 and transfers data between sockets,
 411 perhaps sending the data across a network.
 412 It is possible for several protocols, differing only in low level
 413 details, to implement the same style of communication within
 414 a particular domain.  Although it is possible to select
 415 which protocol should be used, for nearly all uses it is sufficient to
 416 request the default protocol.  This has been done in all of the example
 417 programs.
 418 .pp
 419 One specifies the domain, style and protocol of a socket when
 420 it is created.  For example, in Figure 5a the call to \fIsocket()\fP
 421 causes the creation of a datagram socket with the default protocol
 422 in the UNIX domain.
 423 .b
 424 .sh 1 "Datagrams in the UNIX Domain"
 425 .r
 426 .(z
 427 .ft CW
 428 .so udgramread.c
 429 .ft
 430 .ce 1
 431 Figure 5a\ \ Reading UNIX domain datagrams
 432 .)z
 433 .pp
 434 Let us now look at two programs that create sockets separately.
 435 The programs in Figures 5a and 5b use datagram communication
 436 rather than a stream.
 437 The structure used to name UNIX domain sockets is defined
 438 in the file \fI<sys/un.h>.\fP
 439 The definition has also been included in the example for clarity.
 440 .pp
 441 Each program creates a socket with a call to \fIsocket().\fP
 442 These sockets are in the UNIX domain.
 443 Once a name has been decided upon it is attached to a socket by the
 444 system call \fIbind().\fP
 445 The program in Figure 5a uses the name ``socket'',
 446 which it binds to its socket.
 447 This name will appear in the working directory of the program.
 448 The routines in Figure 5b use its
 449 socket only for sending messages.  It does not create a name for
 450 the socket because no other process has to refer to it.
 451 .(z
 452 .ft CW
 453 .so udgramsend.c
 454 .ft
 455 .ce 1
 456 Figure 5b\ \ Sending a UNIX domain datagram
 457 .)z
 458 .pp
 459 Names in the UNIX domain are path names.  Like file path names they may
 460 be either absolute (e.g. ``/dev/imaginary'') or relative (e.g. ``socket'').
 461 Because these names are used to allow processes to rendezvous, relative
 462 path names can pose difficulties and should be used with care.
 463 When a name is bound into the name space, a file (inode) is allocated in the
 464 file system.  If
 465 the inode is not deallocated, the name will continue to exist even after
 466 the bound socket is closed.  This can cause subsequent runs of a program
 467 to find that a name is unavailable, and can cause
 468 directories to fill up with these
 469 objects.  The names are removed by calling \fIunlink()\fP or using
 470 the \fIrm\fP\|(1) command.
 471 Names in the UNIX domain are only used for rendezvous.  They are not used
 472 for message delivery once a connection is established.  Therefore, in
 473 contrast with the Internet domain, unbound sockets need not be (and are
 474 not) automatically given addresses when they are connected.
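.pp
The life cycle of a UNIX domain name (bind, use, unlink) might be
sketched as follows.
This is not the code of Figures 5a and 5b: the helper name
\fIunix_dgram_roundtrip()\fP is hypothetical, a single process holds both
sockets for brevity, and removing a stale name before calling
\fIbind()\fP is an assumption about what the caller wants.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <string.h>
#include <unistd.h>

/*
 * Bind a datagram socket to the path name `path', send msg to that
 * name from a second, unnamed socket, read it back, and remove the
 * name again.  Returns the number of bytes received, or -1.
 * (Hypothetical helper, for illustration only.)
 */
ssize_t
unix_dgram_roundtrip(const char *path, const char *msg,
    char *buf, size_t buflen)
{
	struct sockaddr_un name;
	int rsock, ssock;
	ssize_t n = -1;

	if (strlen(path) >= sizeof name.sun_path)
		return -1;		/* name would not fit */
	memset(&name, 0, sizeof name);
	name.sun_family = AF_UNIX;
	strcpy(name.sun_path, path);

	rsock = socket(AF_UNIX, SOCK_DGRAM, 0);	/* receiver, to be named */
	ssock = socket(AF_UNIX, SOCK_DGRAM, 0);	/* sender, left unnamed */
	if (rsock == -1 || ssock == -1)
		goto out;

	unlink(path);	/* assumption: a name left by an earlier run may go */
	if (bind(rsock, (struct sockaddr *)&name, sizeof name) == -1)
		goto out;
	if (sendto(ssock, msg, strlen(msg), 0,
	    (struct sockaddr *)&name, sizeof name) != -1)
		n = read(rsock, buf, buflen);
	unlink(path);	/* otherwise the name outlives the socket */
out:
	if (rsock != -1)
		close(rsock);
	if (ssock != -1)
		close(ssock);
	return n;
}
```

After the call returns, the path name no longer exists in the file
system, so a later run can bind the same name again.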
 475 .pp
 476 There is no established means of communicating names to interested
 477 parties.  In the example, the program in Figure 5b gets the
 478 name of the socket to which it will send its message through its
 479 command line arguments.  Once a line of communication has been created,
 480 one can send the names of additional, perhaps new, sockets over the link.
 481 Facilities will have to be built that will make the distribution of
 482 names less of a problem than it now is.
 483 .b
 484 .sh 1 "Datagrams in the Internet Domain"
 485 .r
 486 .(z
 487 .ft CW
 488 .so dgramread.c
 489 .ft
 490 .ce 1
 491 Figure 6a\ \ Reading Internet domain datagrams
 492 .)z
 493 .pp
 494 The examples in Figures 6a and 6b are very close to the previous example
 495 except that the socket is in the Internet domain.
 496 The structure of Internet domain addresses is defined in the file
 497 \fI<netinet/in.h>\fP.
 498 Internet addresses specify a host address (a 32-bit number)
 499 and a delivery slot, or port, on that
 500 machine.  These ports are managed by the system routines that implement
 501 a particular protocol.
 502 Unlike UNIX domain names, Internet socket names are not entered into
 503 the file system and, therefore,
 504 they do not have to be unlinked after the socket has been closed.
 505 When a message must be sent between machines it is sent to
 506 the protocol routine on the destination machine, which interprets the
 507 address to determine to which socket the message should be delivered.
 508 Several different protocols may be active on
 509 the same machine, but, in general, they will not communicate with one another.
 510 As a result, different protocols are allowed to use the same port numbers.
 511 Thus, implicitly, an Internet address is a triple including a protocol as
 512 well as the port and machine address.
 513 An \fIassociation\fP is a temporary or permanent specification
 514 of a pair of communicating sockets.
 515 An association is thus identified by the tuple
 516 <\fIprotocol, local machine address, local port,
 517 remote machine address, remote port\fP>.
 518 An association may be transient when using datagram sockets;
 519 the association actually exists during a \fIsend\fP operation.
 520 .(z
 521 .ft CW
 522 .so dgramsend.c
 523 .ft
 524 .ce 1
 525 Figure 6b\ \ Sending an Internet domain datagram
 526 .)z
 527 .pp
 528 The protocol for a socket is chosen when the socket is created.  The
 529 local machine address for a socket can be any valid network address of the
 530 machine, if it has more than one, or it can be the wildcard value
 531 INADDR_ANY.
 532 The wildcard value is used in the program in Figure 6a.
 533 If a machine has several network addresses, it is likely
 534 that messages sent to any of the addresses should be deliverable to
 535 a socket.  This will be the case if the wildcard value has been chosen.
 536 Note that even if the wildcard value is chosen, a program sending messages
 537 to the named socket must specify a valid network address.  One can be willing
 538 to receive from ``anywhere,'' but one cannot send a message ``anywhere.''
 539 The program in Figure 6b is given the destination host name as a command
 540 line argument.
 541 To determine a network address to which it can send the message, it looks
 542 up
 543 the host address by the call to \fIgethostbyname()\fP.
 544 The returned structure includes the host's network address,
 545 which is copied into the structure specifying the
 546 destination of the message.
 547 .pp
 548 The port number can be thought of as the number of a mailbox, into
 549 which the protocol places one's messages.  Certain daemons, offering
 550 certain advertised services, have reserved
 551 or ``well-known'' port numbers.  These fall in the range
 552 from 1 to 1023.  Higher numbers are available to general users.
 553 Only servers need to ask for a particular number.
 554 The system will assign an unused port number when an address
 555 is bound to a socket.
 556 This may happen when an explicit \fIbind\fP
 557 call is made with a port number of 0, or
 558 when a \fIconnect\fP or \fIsend\fP
 559 is performed on an unbound socket.
 560 Note that port numbers are not automatically reported back to the user.
 561 After calling \fIbind(),\fP asking for port 0, one may call
 562 \fIgetsockname()\fP to discover what port was actually assigned.
 563 The routine \fIgetsockname()\fP
 564 will not work for names in the UNIX domain.
 565 .pp
 566 The format of the socket address is specified in part by standards within the
 567 Internet domain.  The specification includes the order of the bytes in
 568 the address.  Because machines differ in the internal representation
 569 they ordinarily use
 570 to represent integers, printing out the port number as returned by
 571 \fIgetsockname()\fP may result in a misinterpretation.  To
 572 print out the number, it is necessary to use the routine \fIntohs()\fP
 573 (for \fInetwork to host: short\fP) to convert the number from the
 574 network representation to the host's representation.  On some machines,
 575 such as 68000-based machines, this is a null operation.  On others,
 576 such as VAXes, this results in a swapping of bytes.  Another routine
 577 exists to convert a short integer from the host format to the network format,
 578 called \fIhtons()\fP; similar routines exist for long integers.
 579 For further information, refer to the
 580 entry for \fIbyteorder\fP in section 3 of the manual.
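.pp
Asking for port 0 and then discovering the assigned port could be
sketched as follows.
The helper name \fIassigned_port()\fP is hypothetical; the sketch binds a
datagram socket to the wildcard address, as the program in Figure 6a
does, and converts the port with \fIntohs()\fP before returning it.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>
#include <unistd.h>

/*
 * Bind a datagram socket to the wildcard address with port 0, ask
 * the system which port it chose, and return that port in host
 * byte order.  Returns -1 on failure.  The socket is closed before
 * returning, so the port is only reported, not kept.
 * (Hypothetical helper, for illustration only.)
 */
int
assigned_port(void)
{
	struct sockaddr_in name;
	socklen_t len = sizeof name;
	int sock, port = -1;

	sock = socket(AF_INET, SOCK_DGRAM, 0);
	if (sock == -1)
		return -1;
	memset(&name, 0, sizeof name);
	name.sin_family = AF_INET;
	name.sin_addr.s_addr = htonl(INADDR_ANY);
	name.sin_port = htons(0);	/* let the system pick a port */
	if (bind(sock, (struct sockaddr *)&name, sizeof name) == 0 &&
	    getsockname(sock, (struct sockaddr *)&name, &len) == 0)
		port = ntohs(name.sin_port);	/* network to host order */
	close(sock);
	return port;
}
```

Printing \fIname.sin_port\fP directly, without \fIntohs()\fP, would show
a byte-swapped value on machines whose native byte order differs from
the network order.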
 581 .b
 582 .sh 1 "Connections"
 583 .r
 584 .pp
 585 To send data between stream sockets (having communication style SOCK_STREAM),
 586 the sockets must be connected.
 587 Figures 7a and 7b show two programs that create such a connection.
 588 The program in Figure 7a is relatively simple.
 589 To initiate a connection, this program simply creates
 590 a stream socket, then calls \fIconnect()\fP,
 591 specifying the address of the socket to which
 592 it wishes its socket connected.  Provided that the target socket exists and
 593 is prepared to handle a connection, the connection will be completed,
 594 and the program can begin to send
 595 messages.  Messages will be delivered in order without message
 596 boundaries, as with pipes.  The connection is destroyed when either
 597 socket is closed (or soon thereafter).  If a process persists
 598 in sending messages after the connection is closed, a SIGPIPE signal
 599 is sent to the process by the operating system.  Unless explicit action
 600 is taken to handle the signal (see the manual page for \fIsignal\fP
 601 or \fIsigvec\fP),
 602 the process will terminate and the shell
 603 will print the message ``broken pipe.''
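.pp
One form of explicit action is to ignore SIGPIPE, in which case the
failed write returns an error instead of terminating the process.
The sketch below illustrates this under two assumptions: it uses a pipe
whose read end has been closed as a stand-in for a broken stream
connection, and the helper name \fIwrite_after_close()\fP is hypothetical.

```c
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/*
 * Ignore SIGPIPE, then write to a pipe whose read end is closed.
 * Returns the errno value of the failed write, 0 if the write
 * unexpectedly succeeded, or -1 if the setup failed.
 * The previous SIGPIPE disposition is restored before returning.
 * (Hypothetical helper, for illustration only.)
 */
int
write_after_close(void)
{
	int fds[2], err = 0;
	void (*old)(int);

	old = signal(SIGPIPE, SIG_IGN);
	if (old == SIG_ERR)
		return -1;
	if (pipe(fds) == -1) {
		signal(SIGPIPE, old);
		return -1;
	}
	close(fds[0]);			/* no reader remains */
	if (write(fds[1], "x", 1) == -1)
		err = errno;		/* the process survives the write */
	close(fds[1]);
	signal(SIGPIPE, old);
	return err;
}
```

With the signal ignored, the write fails with EPIPE and the program can
decide for itself how to recover.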
 604 .(z
 605 .ft CW
 606 .so streamwrite.c
 607 .ft
 608 .ce 1
 609 Figure 7a\ \ Initiating an Internet domain stream connection
 610 .)z
 611 .(z
 612 .ft CW
 613 .so streamread.c
 614 .ft
 615 .ce 1
 616 Figure 7b\ \ Accepting an Internet domain stream connection
 617 .sp 2
 618 .ft CW
 619 .so strchkread.c
 620 .ft
 621 .ce 1
 622 Figure 7c\ \ Using select() to check for pending connections
 623 .)z
 624 .(z
 625 .so fig8.pic
 626 .sp
 627 .ce 1
 628 Figure 8\ \ Establishing a stream connection
 629 .)z
 630 .pp
 631 Forming a connection is asymmetrical; one process, such as the
 632 program in Figure 7a, requests a connection with a particular socket,
 633 while the other process accepts connection requests.
 634 Before a connection can be accepted a socket must be created and an address
 635 bound to it.  This
 636 situation is illustrated in the top half of Figure 8.  Process 2
 637 has created a socket and bound a port number to it.  Process 1 has created an
 638 unnamed socket.
 639 The address bound to Process 2's socket is then made known to Process 1 and,
 640 perhaps, to several other potential communicants as well.
 641 If there are several possible communicants,
 642 this one socket might receive several requests for connections.
 643 As a result, a new socket is created for each connection.  This new socket
 644 is the endpoint for communication within this process for this connection.
 645 A connection may be destroyed by closing the corresponding socket.
 646 .pp
 647 The program in Figure 7b is a rather trivial example of a server.  It
 648 creates a socket to which it binds a name, which it then advertises.
 649 (In this case it prints out the port number.)  The program then calls
 650 \fIlisten()\fP for this socket.
 651 Since several clients may attempt to connect more or less
 652 simultaneously, a queue of pending connections is maintained in the system
 653 address space.  \fIListen()\fP
 654 marks the socket as willing to accept connections and initializes the queue.
 655 When a connection is requested, it is listed in the queue.  If the
 656 queue is full, an error status may be returned to the requester.
 657 The maximum length of this queue is specified by the second argument of
 658 \fIlisten()\fP; the maximum length is limited by the system.
 659 Once the listen call has been completed, the program enters
 660 an infinite loop.  On each pass through the loop, a new connection is
 661 accepted and removed from the queue, and, hence, a new socket for the
 662 connection is created.  The bottom half of Figure 8 shows the result of
 663 Process 1 connecting with the named socket of Process 2, and Process 2
 664 accepting the connection.  After the connection is created, the
 665 service, in this case printing out the messages, is performed and the
 666 connection socket closed.  The \fIaccept()\fP
 667 call will take a pending connection
 668 request from the queue if one is available, or block waiting for a request.
 669 Messages are read from the connection socket.
 670 Reads from an active connection will normally block until data is available.
The number of bytes read is returned.  When a connection is destroyed,
the read call returns immediately with a byte count of zero.
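.pp
The zero return from \fIread()\fP described above is what lets a server
notice that its peer has closed the connection.
The following is a minimal sketch of that loop, not the code of Figure 7b;
the function name drain_connection() is invented for illustration.

```c
#include <sys/types.h>
#include <unistd.h>

/*
 * Read from a connected descriptor until the peer closes it.
 * Returns the total number of bytes read, or -1 on error.
 * drain_connection() is an illustrative name, not a system call.
 */
long
drain_connection(int fd)
{
	char buf[1024];
	long total = 0;
	ssize_t cc;

	for (;;) {
		cc = read(fd, buf, sizeof(buf));
		if (cc < 0)
			return (-1);	/* read error */
		if (cc == 0)
			break;		/* connection destroyed: no more data */
		total += cc;
	}
	return (total);
}
```

A server would typically call such a routine on the descriptor returned by
\fIaccept()\fP and then close that descriptor.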
 674 .pp
 675 The program in Figure 7c is a slight variation on the server in Figure 7b.
 676 It avoids blocking when there are no pending connection requests by
 677 calling \fIselect()\fP
 678 to check for pending requests before calling \fIaccept().\fP
 679 This strategy is useful when connections may be received
 680 on more than one socket, or when data may arrive on other connected
 681 sockets before another connection request.
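.pp
The check made before calling \fIaccept()\fP can be isolated in a small
helper.
The sketch below assumes a single descriptor and a timeout given in whole
seconds; ready_to_read() is a hypothetical name and is not taken from
Figure 7c.

```c
#include <sys/types.h>
#include <sys/time.h>
#include <sys/select.h>
#include <unistd.h>

/*
 * Return 1 if sock becomes readable within the given number of
 * seconds, 0 if the wait timed out, and -1 on error.
 * For a listening socket, "readable" means that a connection
 * request is pending, so accept() would not block.
 */
int
ready_to_read(int sock, int seconds)
{
	fd_set ready;
	struct timeval to;

	FD_ZERO(&ready);
	FD_SET(sock, &ready);
	to.tv_sec = seconds;
	to.tv_usec = 0;
	if (select(sock + 1, &ready, (fd_set *)0, (fd_set *)0, &to) < 0)
		return (-1);
	return (FD_ISSET(sock, &ready) ? 1 : 0);
}
```

To watch several sockets at once, each descriptor would be added to the same
set and the first argument of \fIselect()\fP would be one more than the
largest descriptor.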
 682 .pp
The programs in Figures 9a and 9b use stream communication
 684 in the UNIX domain.  Streams in the UNIX domain can be used for this sort
 685 of program in exactly the same way as Internet domain streams, except for
 686 the form of the names and the restriction of the connections to a single
 687 file system.  There are some differences, however, in the functionality of
 688 streams in the two domains, notably in the handling of
 689 \fIout-of-band\fP data (discussed briefly below).  These differences
 690 are beyond the scope of this paper.
 691 .(z
 692 .ft CW
 693 .so ustreamwrite.c
 694 .ft
 695 .ce 1
 696 Figure 9a\ \ Initiating a UNIX domain stream connection
 697 .sp 2
 698 .ft CW
 699 .so ustreamread.c
 700 .ft
 701 .ce 1
 702 Figure 9b\ \ Accepting a UNIX domain stream connection
 703 .)z
 704 .b
 705 .sh 1 "Reads, Writes, Recvs, etc."
 706 .r
 707 .pp
 708 UNIX 4.4BSD has several system calls for reading and writing information.
The simplest calls are \fIread()\fP and \fIwrite().\fP  \fIWrite()\fP
takes as arguments the index of a descriptor, a pointer to a buffer
containing the data, and the size of the data.
 712 The descriptor may indicate either a file or a connected socket.
 713 ``Connected'' can mean either a connected stream socket (as described
 714 in Section 8) or a datagram socket for which a \fIconnect()\fP
 715 call has provided a default destination (see the \fIconnect()\fP manual page).
 716 \fIRead()\fP also takes a descriptor that indicates either a file or a socket.
 717 \fIWrite()\fP requires a connected socket since no destination is
 718 specified in the parameters of the system call.
 719 \fIRead()\fP can be used for either a connected or an unconnected socket.
 720 These calls are, therefore, quite flexible and may be used to
 721 write applications that require no assumptions about the source of
 722 their input or the destination of their output.
There are variations on \fIread()\fP and \fIwrite()\fP
that allow the source and destination of the input and output to use
several separate buffers, while retaining the flexibility to handle
both files and sockets.  These are \fIreadv()\fP and \fIwritev(),\fP
for read and write \fIvector.\fP
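.pp
As an illustration of the vector calls, the sketch below writes a header and
a body from two separate buffers with one \fIwritev()\fP call.
The function name write_two_parts() and the choice of two string buffers are
assumptions made for this example.

```c
#include <sys/types.h>
#include <sys/uio.h>
#include <string.h>
#include <unistd.h>

/*
 * Write a header and a body with a single writev() call.
 * The descriptor may refer to a file or to a connected socket.
 * Returns the byte count from writev(), or -1 on error.
 */
ssize_t
write_two_parts(int fd, const char *header, const char *body)
{
	struct iovec iov[2];

	iov[0].iov_base = (void *)header;
	iov[0].iov_len = strlen(header);
	iov[1].iov_base = (void *)body;
	iov[1].iov_len = strlen(body);
	return (writev(fd, iov, 2));
}
```

The receiver sees the two buffers as one contiguous sequence of bytes; the
boundary between them is not preserved on a stream.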
 728 .pp
 729 It is sometimes necessary to send high priority data over a
 730 connection that may have unread low priority data at the
 731 other end.  For example, a user interface process may be interpreting
 732 commands and sending them on to another process through a stream connection.
 733 The user interface may have filled the stream with as yet unprocessed
 734 requests when the user types
 735 a command to cancel all outstanding requests.
 736 Rather than have the high priority data wait
 737 to be processed after the low priority data, it is possible to
 738 send it as \fIout-of-band\fP
 739 (OOB) data.  The notification of pending OOB data results in the generation of
 740 a SIGURG signal, if this signal has been enabled (see the manual
 741 page for \fIsignal\fP or \fIsigvec\fP).
See [Leffler et al. 1986] for a more complete description of the OOB mechanism.
There is a pair of calls similar to \fIread()\fP and \fIwrite()\fP
that allow options, including sending
and receiving OOB information; these are \fIsend()\fP
and \fIrecv().\fP
 747 These calls are used only with sockets; specifying a descriptor for a file will
 748 result in the return of an error status.  These calls also allow
 749 \fIpeeking\fP at data in a stream.
 750 That is, they allow a process to read data without removing the data from
 751 the stream.  One use of this facility is to read ahead in a stream
 752 to determine the size of the next item to be read.
 753 When not using these options, these calls have the same functions as
 754 \fIread()\fP and \fIwrite().\fP
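.pp
The read-ahead use of peeking can be sketched as follows.
The example assumes, purely for illustration, that every item on the stream
begins with a one-byte length; that framing and the name next_item_size()
are not part of any system interface.

```c
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Return the size of the next item without removing any data
 * from the stream.  Assumes (for illustration only) that each
 * item starts with a one-byte length.
 * Returns -1 on error or if no length byte could be read.
 */
int
next_item_size(int sock)
{
	unsigned char len;

	if (recv(sock, &len, 1, MSG_PEEK) != 1)
		return (-1);
	return ((int)len);
}
```

Because MSG_PEEK leaves the data in place, a later \fIrecv()\fP or
\fIread()\fP still returns the length byte together with the item.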
 755 .pp
 756 To send datagrams, one must be allowed to specify the destination.
 757 The call \fIsendto()\fP
 758 takes a destination address as an argument and is therefore used for
 759 sending datagrams.  The call \fIrecvfrom()\fP
 760 is often used to read datagrams, since this call returns the address
 761 of the sender, if it is available, along with the data.
 762 If the identity of the sender does not matter, one may use \fIread()\fP
 763 or \fIrecv().\fP
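.pp
A minimal sketch of the datagram calls follows.
It assumes Internet domain datagram sockets bound to the loopback address;
the helper names loopback_dgram(), send_datagram() and recv_datagram() are
invented for this example.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>
#include <unistd.h>

/*
 * Create a datagram socket bound to an unused port on the loopback
 * address and store the resulting address in *name.
 * Returns the descriptor, or -1 on error.
 */
int
loopback_dgram(struct sockaddr_in *name)
{
	socklen_t len = sizeof(*name);
	int sock = socket(AF_INET, SOCK_DGRAM, 0);

	if (sock < 0)
		return (-1);
	memset(name, 0, sizeof(*name));
	name->sin_family = AF_INET;
	name->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	name->sin_port = 0;		/* let the system choose a port */
	if (bind(sock, (struct sockaddr *)name, sizeof(*name)) < 0 ||
	    getsockname(sock, (struct sockaddr *)name, &len) < 0) {
		close(sock);
		return (-1);
	}
	return (sock);
}

/* Send msg as one datagram to the address in *to. */
ssize_t
send_datagram(int sock, const char *msg, const struct sockaddr_in *to)
{
	return (sendto(sock, msg, strlen(msg), 0,
	    (const struct sockaddr *)to, sizeof(*to)));
}

/* Receive one datagram and record the sender's address in *from. */
ssize_t
recv_datagram(int sock, char *buf, size_t len, struct sockaddr_in *from)
{
	socklen_t fromlen = sizeof(*from);

	return (recvfrom(sock, buf, len, 0,
	    (struct sockaddr *)from, &fromlen));
}
```

The address filled in by recv_datagram() can be passed straight back to
send_datagram() to reply to the sender.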
 764 .pp
Finally, there is a pair of calls that allow the sending and
 766 receiving of messages from multiple buffers, when the address of the
 767 recipient must be specified.  These are \fIsendmsg()\fP and
 768 \fIrecvmsg().\fP
 769 These calls are actually quite general and have other uses,
 770 including, in the UNIX domain, the transmission of a file descriptor from one
 771 process to another.
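.pp
The transmission of a file descriptor can be sketched as shown below.
This example assumes the control-message form of the interface (SCM_RIGHTS
with the CMSG macros) found on current POSIX systems, which may differ from
the declarations in older manual pages; send_fd() and recv_fd() are
hypothetical helper names.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <string.h>

/* Control buffer large enough for one descriptor, suitably aligned. */
union fd_control {
	struct cmsghdr align;
	char buf[CMSG_SPACE(sizeof(int))];
};

/*
 * Send descriptor fd over the UNIX domain socket sock, together
 * with one byte of ordinary data.  Returns 0 on success, -1 on error.
 */
int
send_fd(int sock, int fd)
{
	struct msghdr msg;
	struct iovec iov;
	struct cmsghdr *cm;
	union fd_control ctl;
	char data = 'F';

	memset(&msg, 0, sizeof(msg));
	memset(&ctl, 0, sizeof(ctl));
	iov.iov_base = &data;
	iov.iov_len = 1;
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf);
	cm = CMSG_FIRSTHDR(&msg);
	cm->cmsg_level = SOL_SOCKET;
	cm->cmsg_type = SCM_RIGHTS;	/* payload is a descriptor */
	cm->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cm), &fd, sizeof(int));
	return (sendmsg(sock, &msg, 0) == 1 ? 0 : -1);
}

/*
 * Receive a descriptor sent by send_fd().
 * Returns the new descriptor, or -1 on error.
 */
int
recv_fd(int sock)
{
	struct msghdr msg;
	struct iovec iov;
	struct cmsghdr *cm;
	union fd_control ctl;
	char data;
	int fd;

	memset(&msg, 0, sizeof(msg));
	iov.iov_base = &data;
	iov.iov_len = 1;
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf);
	if (recvmsg(sock, &msg, 0) != 1)
		return (-1);
	cm = CMSG_FIRSTHDR(&msg);
	if (cm == NULL || cm->cmsg_level != SOL_SOCKET ||
	    cm->cmsg_type != SCM_RIGHTS)
		return (-1);
	memcpy(&fd, CMSG_DATA(cm), sizeof(int));
	return (fd);
}
```

The descriptor returned by recv_fd() usually has a different number from the
one that was sent, but it refers to the same open file or socket.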
 772 .pp
 773 The various options for reading and writing are shown in Figure 10,
 774 together with their parameters.  The parameters for each system call
 775 reflect the differences in function of the different calls.
 776 In the examples given in this paper, the calls \fIread()\fP and
 777 \fIwrite()\fP have been used whenever possible.
 778 .(z
 779 .ft CW
        /*
         * The variable descriptor may be the descriptor of either a file
         * or of a socket.
         */
        cc = read(descriptor, buf, nbytes)
        int cc, descriptor; char *buf; int nbytes;

        /*
         * An iovec array can describe several separate buffers.
         */
        cc = readv(descriptor, iov, iovcnt)
        int cc, descriptor; struct iovec *iov; int iovcnt;

        cc = write(descriptor, buf, nbytes)
        int cc, descriptor; char *buf; int nbytes;

        cc = writev(descriptor, iovec, ioveclen)
        int cc, descriptor; struct iovec *iovec; int ioveclen;

        /*
         * The variable ``sock'' must be the descriptor of a socket.
         * Flags may include MSG_OOB and MSG_PEEK.
         */
        cc = send(sock, msg, len, flags)
        int cc, sock; char *msg; int len, flags;

        cc = sendto(sock, msg, len, flags, to, tolen)
        int cc, sock; char *msg; int len, flags;
        struct sockaddr *to; int tolen;

        cc = sendmsg(sock, msg, flags)
        int cc, sock; struct msghdr msg[]; int flags;

        cc = recv(sock, buf, len, flags)
        int cc, sock; char *buf; int len, flags;

        cc = recvfrom(sock, buf, len, flags, from, fromlen)
        int cc, sock; char *buf; int len, flags;
        struct sockaddr *from; int *fromlen;

        cc = recvmsg(sock, msg, flags)
        int cc, sock; struct msghdr msg[]; int flags;
 822 .ft
 823 .sp 1
 824 .ce 1
 825 Figure 10\ \ Varieties of read and write commands
 826 .)z
 827 .b
 828 .sh 1 "Choices"
 829 .r
 830 .pp
 831 This paper has presented examples of some of the forms
 832 of communication supported by
 833 Berkeley UNIX 4.4BSD.  These have been presented in an order chosen for
 834 ease of presentation.  It is useful to review these options emphasizing the
 835 factors that make each attractive.
 836 .pp
 837 Pipes have the advantage of portability, in that they are supported in all
 838 UNIX systems.  They also are relatively
 839 simple to use.  Socketpairs share this simplicity and have the additional
 840 advantage of allowing bidirectional communication.  The major shortcoming
 841 of these mechanisms is that they require communicating processes to be
 842 descendants of a common process.  They do not allow intermachine communication.
 843 .pp
 844 The two communication domains, UNIX and Internet, allow processes with no common
 845 ancestor to communicate.
 846 Of the two, only the Internet domain allows
 847 communication between machines.
 848 This makes the Internet domain a necessary
 849 choice for processes running on separate machines.
 850 .pp
 851 The choice between datagrams and stream communication is best made by
 852 carefully considering the semantic and performance
 853 requirements of the application.
 854 Streams can be both advantageous and disadvantageous.  One disadvantage
 855 is that a process is only allowed a limited number of open streams,
 856 as there are usually only 64 entries available in the open descriptor
 857 table.  This can cause problems if a single server must talk with a large
 858 number of clients.
Another is that, for delivering a short message, the stream setup and
teardown time can be unnecessarily long.  Weighed against these is
the reliability built into streams.  This will often be the
deciding factor in favor of streams.
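.pp
The size of the open descriptor table varies from system to system, so a
server that expects many clients may want to check the limit at run time.
The sketch below assumes that \fIgetrlimit()\fP with RLIMIT_NOFILE is
available; the name descriptor_limit() is invented for illustration.

```c
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <limits.h>

/*
 * Return the current (soft) limit on the number of open
 * descriptors for this process, or -1 on error.
 * An unlimited setting is reported as LONG_MAX.
 */
long
descriptor_limit(void)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_NOFILE, &rl) < 0)
		return (-1);
	if (rl.rlim_cur == RLIM_INFINITY)
		return (LONG_MAX);
	return ((long)rl.rlim_cur);
}
```

Each stream connection accepted by a server consumes one descriptor, so this
value bounds the number of simultaneous clients a single process can serve.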
 863 .b
 864 .sh 1 "What to do Next"
 865 .r
 866 .pp
 867 Many of the examples presented here can serve as models for multiprocess
 868 programs and for programs distributed across several machines.
 869 In developing a new multiprocess program, it is often easiest to
 870 first write the code to create the processes and communication paths.
 871 After this code is debugged, the code specific to the application can
 872 be added.
 873 .pp
 874 An introduction to the UNIX system and programming using UNIX system calls
 875 can be found in [Kernighan and Pike 1984].
 876 Further documentation of the Berkeley UNIX 4.4BSD IPC mechanisms can be
 877 found in [Leffler et al. 1986].
 878 More detailed information about particular calls and protocols
 879 is provided in sections
 880 2, 3 and 4 of the
 881 UNIX Programmer's Manual [CSRG 1986].
 882 In particular the following manual pages are relevant:
 883 .(b
 884 .TS
 885 l l.
creating and naming sockets	socket(2), bind(2)
establishing connections	listen(2), accept(2), connect(2)
transferring data	read(2), write(2), send(2), recv(2)
addresses	inet(4F)
protocols	tcp(4P), udp(4P)
 891 .TE
 892 .)b
 893 .(b
 894 .sp
 895 .b
 896 Acknowledgements
 897 .pp
 898 I would like to thank Sam Leffler and Mike Karels for their help in
 899 understanding the IPC mechanisms and all the people whose comments
 900 have helped in writing and improving this report.
 901 .pp
 902 This work was sponsored by the Defense Advanced Research Projects Agency
 903 (DoD), ARPA Order No. 4031, monitored by the Naval Electronics Systems
 904 Command under contract No. N00039-C-0235.
 905 The views and conclusions contained in this document are those of the
 906 author and should not be interpreted as representing official policies,
either expressed or implied, of the Defense Advanced Research Projects Agency
 908 or of the US Government.
 909 .)b
 910 .(b
 911 .sp
 912 .b
 913 References
 914 .r
 915 .sp
 916 .ls 1
 917 B.W. Kernighan & R. Pike, 1984,
 918 .i "The UNIX Programming Environment."
 919 Englewood Cliffs, N.J.: Prentice-Hall.
 920 .sp
 921 .ls 1
 922 B.W. Kernighan & D.M. Ritchie, 1978,
.i "The C Programming Language."
 924 Englewood Cliffs, N.J.: Prentice-Hall.
 925 .sp
 926 .ls 1
 927 S.J. Leffler, R.S. Fabry, W.N. Joy, P. Lapsley, S. Miller & C. Torek, 1986,
 928 .i "An Advanced 4.4BSD Interprocess Communication Tutorial."
 929 Computer Systems Research Group,
 930 Department of Electrical Engineering and Computer Science,
 931 University of California, Berkeley.
 932 .sp
 933 .ls 1
 934 Computer Systems Research Group, 1986,
 935 .i "UNIX Programmer's Manual, 4.4 Berkeley Software Distribution."
 936 Computer Systems Research Group,
 937 Department of Electrical Engineering and Computer Science,
 938 University of California, Berkeley.
 939 .)b