2 .\" Must use -- tbl -- with this one
4 .\" @(#)nfs.rfc.ms 2.2 88/08/05 4.0 RPCSRC
5 .\" $FreeBSD: src/lib/libc/rpc/PSD.doc/nfs.rfc.ms,v 1.1.14.1 2000/11/24 09:36:30 ru Exp $
9 .if \\n%=1 .tl ''- % -''
12 .\" prevent excess underlining in nroff
14 .OH 'Network File System: Version 2 Protocol Specification''Page %'
15 .EH 'Page %''Network File System: Version 2 Protocol Specification'
18 \&Network File System: Version 2 Protocol Specification
19 .IX NFS "" "" "" PAGE MAJOR
20 .IX "Network File System" "" "" "" PAGE MAJOR
21 .IX NFS "version-2 protocol specification"
22 .IX "Network File System" "version-2 protocol specification"
25 \&Status of this Standard
27 Note: This document specifies a protocol that Sun Microsystems, Inc.,
28 and others are using. It specifies it in standard ARPA RFC form.
33 The Sun Network Filesystem (NFS) protocol provides transparent remote
34 access to shared filesystems over local area networks. The NFS
35 protocol is designed to be machine, operating system, network architecture,
36 and transport protocol independent. This independence is
37 achieved through the use of Remote Procedure Call (RPC) primitives
38 built on top of an External Data Representation (XDR). Implementations
39 exist for a variety of machines, from personal computers to supercomputers.
42 The supporting mount protocol allows the server to hand out remote
43 access privileges to a restricted set of clients. It performs the
44 operating system-specific functions that allow, for example,
45 attaching remote directory trees to a local file system.
47 \&Remote Procedure Call
48 .IX "Remote Procedure Call"
50 Sun's remote procedure call specification provides a procedure-
51 oriented interface to remote services. Each server supplies a
52 program that is a set of procedures. NFS is one such "program".
53 The combination of host address, program number, and procedure
54 number specifies one remote service procedure. RPC does not depend
55 on services provided by specific protocols, so it can be used with
56 any underlying transport protocol. See the
57 .I "Remote Procedure Calls: Protocol Specification"
58 chapter of this manual.
60 \&External Data Representation
61 .IX "External Data Representation"
63 The External Data Representation (XDR) standard provides a common
64 way of representing a set of data types over a network.
66 The NFS Protocol Specification is written using the RPC data description language.
68 For more information, see the
69 .I "External Data Representation Standard: Protocol Specification."
70 Sun provides implementations of XDR and
71 RPC, but NFS does not require their use. Any software that
72 provides equivalent functionality can be used, and if the encoding
73 is exactly the same it can interoperate with other implementations of NFS.
77 .IX "stateless servers"
80 The NFS protocol is stateless. That is, a server does not need to
81 maintain any extra state information about any of its clients in
82 order to function correctly. Stateless servers have a distinct
83 advantage over stateful servers in the event of a failure. With
84 stateless servers, a client need only retry a request until the
85 server responds; it does not even need to know that the server has
86 crashed, or the network temporarily went down. The client of a
87 stateful server, on the other hand, needs to either detect a server
88 crash and rebuild the server's state when it comes back up, or
89 cause client operations to fail.
91 This may not sound like an important issue, but it affects the
92 protocol in some unexpected ways. We feel that it is worth a bit
93 of extra complexity in the protocol to be able to write very simple
94 servers that do not require fancy crash recovery.
96 On the other hand, NFS deals with objects such as files and
97 directories that inherently have state -- what good would a file be
98 if it did not keep its contents intact? The goal is to not
99 introduce any extra state in the protocol itself. Another way to
100 simplify recovery is by making operations "idempotent" whenever
101 possible (so that they can potentially be repeated).
103 \&NFS Protocol Definition
104 .IX NFS "protocol definition"
107 Servers have been known to change over time, and so can the
108 protocol that they use. So RPC provides a version number with each
109 RPC request. This RFC describes version two of the NFS protocol.
110 Even in the second version, there are various obsolete procedures
111 and parameters, which will be removed in later versions. An RFC
112 for version three of the NFS protocol is currently under preparation.
118 NFS assumes a file system that is hierarchical, with directories as
119 all but the bottom-level files. Each entry in a directory (file,
120 directory, device, etc.) has a string name. Different operating
121 systems may have restrictions on the depth of the tree or the names
122 used, as well as using different syntax to represent the "pathname",
123 which is the concatenation of all the "components" (directory and
124 file names) in the name. A "file system" is a tree on a single
125 server (usually a single disk or physical partition) with a specified
126 "root". Some operating systems provide a "mount" operation to make
127 all file systems appear as a single tree, while others maintain a
128 "forest" of file systems. Files are unstructured streams of
129 uninterpreted bytes. Version 3 of NFS uses a slightly more general file system model.
132 NFS looks up one component of a pathname at a time. It may not be
133 obvious why it does not just take the whole pathname, traipse down
134 the directories, and return a file handle when it is done. There are
135 several good reasons not to do this. First, pathnames need
136 separators between the directory components, and different operating
137 systems use different separators. We could define a Network Standard
138 Pathname Representation, but then every pathname would have to be
139 parsed and converted at each end. Other issues are discussed in
140 \fINFS Implementation Issues\fP below.
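The component-at-a-time lookup described above can be sketched as follows. This is a minimal illustration, not part of the protocol definition: the function "lookup" is a hypothetical stand-in for one LOOKUP call, the status value 0 for success is taken from the published version 2 specification, and the separator is purely a client-side convention.

```python
from typing import Callable, Tuple

# Hypothetical signature: lookup(dir_handle, name) -> (status, file_handle).
# It stands in for one NFSPROC_LOOKUP call; it is not a real library API.
LookupFn = Callable[[bytes, str], Tuple[int, bytes]]

NFS_OK = 0  # success status (assumption: value from the version 2 specification)


def walk_path(root: bytes, path: str, lookup: LookupFn, sep: str = "/") -> bytes:
    """Resolve `path` one component at a time, starting from `root`."""
    handle = root
    # The separator is a client-side convention; the server never sees it.
    for name in (c for c in path.split(sep) if c):
        status, handle = lookup(handle, name)
        if status != NFS_OK:
            raise OSError(status, "lookup failed for component %r" % name)
    return handle
```

Because the client splits the pathname, a client with a different separator only has to pass a different "sep"; nothing changes on the wire.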
142 Although files and directories are similar objects in many ways,
143 different procedures are used to read directories and files. This
144 provides a network standard format for representing directories. The
145 same argument as above could have been used to justify a procedure
146 that returns only one directory entry per call. The problem is
147 efficiency. Directories can contain many entries, and a remote call
148 to return each would be just too slow.
151 .IX NFS "RPC information"
152 .IP \fIAuthentication\fP
159 authentication, except in the NULL procedure where
162 .IP "\fITransport Protocols\fP"
163 NFS currently is supported on UDP/IP only.
164 .IP "\fIPort Number\fP"
165 The NFS protocol currently uses the UDP port number 2049. This is
166 not an officially assigned port, so later versions of the protocol
167 use the \*QPortmapping\*U facility of RPC.
169 \&Sizes of XDR Structures
170 .IX "XDR structure sizes"
172 These are the sizes, given in decimal bytes, of various XDR
173 structures used in the protocol:
175 /* \fIThe maximum number of bytes of data in a READ or WRITE request\fP */
176 const MAXDATA = 8192;
178 /* \fIThe maximum number of bytes in a pathname argument\fP */
179 const MAXPATHLEN = 1024;
181 /* \fIThe maximum number of bytes in a file name argument\fP */
182 const MAXNAMLEN = 255;
184 /* \fIThe size in bytes of the opaque "cookie" passed by READDIR\fP */
185 const COOKIESIZE = 4;
187 /* \fIThe size in bytes of the opaque file handle\fP */
188 const FHSIZE = 32;
193 .IX NFS "basic data types"
195 The following XDR definitions are basic structures and types used
196 in other structures described further on.
200 .IX "NFS data types" stat "" \fIstat\fP
216 NFSERR_NAMETOOLONG=63,
227 The \fIstat\fP type is returned with every procedure's results. A value of
230 \fINFS_OK\fP indicates that the call completed successfully and
231 the results are valid. The other values indicate some kind of
232 error occurred on the server side during the servicing of the
233 procedure. The error values are derived from UNIX error numbers.
234 .IP \fBNFSERR_PERM\fP:
235 Not owner. The caller does not have correct ownership
236 to perform the requested operation.
237 .IP \fBNFSERR_NOENT\fP:
238 No such file or directory. The file or directory
239 specified does not exist.
240 .IP \fBNFSERR_IO\fP:
241 Some sort of hard error occurred when the operation was
242 in progress. This could be a disk error, for example.
243 .IP \fBNFSERR_NXIO\fP:
244 No such device or address.
245 .IP \fBNFSERR_ACCES\fP:
246 Permission denied. The caller does not have the
247 correct permission to perform the requested operation.
248 .IP \fBNFSERR_EXIST\fP:
249 File exists. The file specified already exists.
250 .IP \fBNFSERR_NODEV\fP:
251 No such device.
252 .IP \fBNFSERR_NOTDIR\fP:
253 Not a directory. The caller specified a
254 non-directory in a directory operation.
255 .IP \fBNFSERR_ISDIR\fP:
256 Is a directory. The caller specified a directory in
257 a non-directory operation.
258 .IP \fBNFSERR_FBIG\fP:
259 File too large. The operation caused a file to grow
260 beyond the server's limit.
261 .IP \fBNFSERR_NOSPC\fP:
262 No space left on device. The operation caused the
263 server's filesystem to reach its limit.
264 .IP \fBNFSERR_ROFS\fP:
265 Read-only filesystem. Write attempted on a read-only filesystem.
266 .IP \fBNFSERR_NAMETOOLONG\fP:
267 File name too long. The file name in an operation was too long.
268 .IP \fBNFSERR_NOTEMPTY\fP:
269 Directory not empty. Attempted to remove a
270 directory that was not empty.
271 .IP \fBNFSERR_DQUOT\fP:
272 Disk quota exceeded. The client's disk quota on the
273 server has been exceeded.
274 .IP \fBNFSERR_STALE\fP:
275 The "fhandle" given in the arguments was invalid.
276 That is, the file referred to by that file handle no longer exists,
277 or access to it has been revoked.
278 .IP \fBNFSERR_WFLUSH\fP:
279 The server's write cache used in the \fIWRITECACHE\fP
281 call got flushed to disk.
286 .IX "NFS data types" ftype "" \fIftype\fP
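300 The enumeration \fIftype\fP gives the type of a file. The type \fINFNON\fP
302 indicates a non-file, \fINFREG\fP is a regular file, \fINFDIR\fP is a directory, \fINFBLK\fP
308 is a block-special device, \fINFCHR\fP
310 is a character-special device, and \fINFLNK\fP is a symbolic link.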
316 .IX "NFS data types" fhandle "" \fIfhandle\fP
318 typedef opaque fhandle[FHSIZE];
323 The \fIfhandle\fP is the file handle passed between the server and the client.
324 All file operations are done using file handles to refer to a file or
325 directory. The file handle can contain whatever information the server
326 needs to distinguish an individual file.
330 .IX "NFS data types" timeval "" \fItimeval\fP
333 unsigned int seconds;
334 unsigned int useconds;
340 The \fItimeval\fP structure is the number of seconds and microseconds
341 since midnight January 1, 1970, Greenwich Mean Time. It is used to
342 pass time and date information.
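As an illustration of the timeval encoding, the following sketch converts the two fields into a calendar time. The function name is hypothetical; only the field names "seconds" and "useconds" and the 1970 epoch come from the text above.

```python
from datetime import datetime, timedelta, timezone

# Midnight January 1, 1970, GMT, the reference point named in the text.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def timeval_to_datetime(seconds: int, useconds: int) -> datetime:
    """Convert an NFS timeval (seconds, useconds) to an aware UTC datetime."""
    return EPOCH + timedelta(seconds=seconds, microseconds=useconds)
```

Since both fields are unsigned, times before the epoch cannot be represented.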
346 .IX "NFS data types" fattr "" \fIfattr\fP
355 unsigned int blocksize;
368 The \fIfattr\fP structure contains the attributes of a file; "type" is the type of
369 the file; "nlink" is the number of hard links to the file (the number
370 of different names for the same file); "uid" is the user
371 identification number of the owner of the file; "gid" is the group
372 identification number of the group of the file; "size" is the size in
373 bytes of the file; "blocksize" is the size in bytes of a block of the
374 file; "rdev" is the device number of the file if it is type \fINFCHR\fP or \fINFBLK\fP;
378 "blocks" is the number of blocks the file takes up on disk; "fsid" is
379 the file system identifier for the filesystem containing the file;
380 "fileid" is a number that uniquely identifies the file within its
381 filesystem; "atime" is the time when the file was last accessed for
382 either read or write; "mtime" is the time when the file data was last
383 modified (written); and "ctime" is the time when the status of the
384 file was last changed. Writing to the file also changes "ctime" if
385 the size of the file changes.
387 "mode" is the access mode encoded as a set of bits. Notice that the
388 file type is specified both in the mode bits and in the file type.
389 This is really a bug in the protocol and will be fixed in future
390 versions. The descriptions given below specify the bit positions using octal numbers.
398 0040000&This is a directory; "type" field should be NFDIR.
399 0020000&This is a character special file; "type" field should be NFCHR.
400 0060000&This is a block special file; "type" field should be NFBLK.
401 0100000&This is a regular file; "type" field should be NFREG.
402 0120000&This is a symbolic link file; "type" field should be NFLNK.
403 0140000&This is a named socket; "type" field should be NFNON.
404 0004000&Set user id on execution.
405 0002000&Set group id on execution.
406 0001000&Save swapped text even after use.
407 0000400&Read permission for owner.
408 0000200&Write permission for owner.
409 0000100&Execute and search permission for owner.
410 0000040&Read permission for group.
411 0000020&Write permission for group.
412 0000010&Execute and search permission for group.
413 0000004&Read permission for others.
414 0000002&Write permission for others.
415 0000001&Execute and search permission for others.
420 The bits are the same as the mode bits returned by the \fIstat(2)\fP
422 system call in the UNIX system. The file type is specified both in
423 the mode bits and in the file type. This is fixed in future versions.
426 The "rdev" field in the attributes structure is an operating system
427 specific device specifier. It will be removed and generalized in
428 the next revision of the protocol.
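Because the file type appears both in "mode" and in "type", a client may want to check that the two agree. The sketch below splits a mode value using the octal codes from the table above. The mask 0170000 is an assumption borrowed from the usual UNIX convention; it is not stated in this document, and the function name is hypothetical.

```python
# Octal type codes from the mode table, mapped to the expected "type" name.
TYPE_MASK = 0o170000  # assumption: the usual UNIX mask for the type bits
MODE_TYPES = {
    0o040000: "NFDIR",
    0o020000: "NFCHR",
    0o060000: "NFBLK",
    0o100000: "NFREG",
    0o120000: "NFLNK",
    0o140000: "NFNON",
}


def split_mode(mode: int):
    """Return (expected type name or None, remaining mode bits) for a mode value."""
    # The low twelve bits hold setuid, setgid, save-text and the permissions.
    return MODE_TYPES.get(mode & TYPE_MASK), mode & 0o7777
```

For example, split_mode(0o100644) yields the pair ("NFREG", 0o644), which a client could compare against the "type" field of the same attributes.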
434 .IX "NFS data types" sattr "" \fIsattr\fP
448 The \fIsattr\fP structure contains the file attributes that can be set
449 from the client. The fields are the same as for \fIfattr\fP
451 above. A "size" of zero means the file should be truncated.
452 A value of -1 indicates a field that should be ignored.
457 .IX "NFS data types" filename "" \fIfilename\fP
459 typedef string filename<MAXNAMLEN>;
464 The type \fIfilename\fP is used for passing file names or pathname components.
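For readers unfamiliar with XDR, a variable-length string such as a file name travels as a four-byte big-endian length, the bytes themselves, and zero padding up to a multiple of four bytes. A minimal sketch of that encoding follows; the function name is hypothetical, and the limit check simply applies MAXNAMLEN from the size constants.

```python
import struct

MAXNAMLEN = 255  # from the size constants given earlier


def xdr_string(value: bytes, maxlen: int = MAXNAMLEN) -> bytes:
    """Encode a byte string as an XDR string: length word, data, zero padding."""
    if len(value) > maxlen:
        raise ValueError("string longer than %d bytes" % maxlen)
    pad = (4 - len(value) % 4) % 4
    # bytes(pad) produces `pad` zero bytes.
    return struct.pack(">I", len(value)) + value + bytes(pad)
```

A three-byte name therefore occupies eight bytes on the wire: four for the length, three for the data, and one of padding.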
469 .IX "NFS data types" path "" \fIpath\fP
471 typedef string path<MAXPATHLEN>;
476 The type \fIpath\fP is a pathname. The server considers it as a string
477 with no internal structure, but to the client it is the name of a
478 node in a filesystem tree.
483 .IX "NFS data types" attrstat "" \fIattrstat\fP
485 union attrstat switch (stat status) {
495 The \fIattrstat\fP structure is a common procedure result. It contains
496 a "status" and, if the call succeeded, it also contains the
497 attributes of the file on which the operation was done.
502 .IX "NFS data types" diropargs "" \fIdiropargs\fP
512 The \fIdiropargs\fP structure is used in directory operations. The
513 "fhandle" "dir" is the directory in which to find the file "name".
514 A directory operation is one in which the directory is affected.
519 .IX "NFS data types" diropres "" \fIdiropres\fP
521 union diropres switch (stat status) {
532 The results of a directory operation are returned in a \fIdiropres\fP
534 structure. If the call succeeded, a new file handle "file" and the
535 "attributes" associated with that file are returned along with the "status".
539 .IX "NFS server procedures" "" "" "" PAGE MAJOR
541 The protocol definition is given as a set of procedures with
542 arguments and results defined using the RPC language. A brief
543 description of the function of each procedure should provide enough
544 information to allow implementation.
546 All of the procedures in the NFS protocol are assumed to be
547 synchronous. When a procedure returns to the client, the client
548 can assume that the operation has completed and any data associated
549 with the request is now on stable storage. For example, a client \fIWRITE\fP
551 request may cause the server to update data blocks,
552 filesystem information blocks (such as indirect blocks), and file
553 attribute information (size and modify times). When the \fIWRITE\fP
555 returns to the client, it can assume that the write is safe, even
556 in case of a server crash, and it can discard the data written.
557 This is a very important part of the statelessness of the server.
558 If the server waited to flush data from remote requests, the client
559 would have to save those requests so that it could resend them in
560 case of a server crash.
566 * Remote file service routines
569 program NFS_PROGRAM {
570 version NFS_VERSION {
571 void NFSPROC_NULL(void) = 0;
572 attrstat NFSPROC_GETATTR(fhandle) = 1;
573 attrstat NFSPROC_SETATTR(sattrargs) = 2;
574 void NFSPROC_ROOT(void) = 3;
575 diropres NFSPROC_LOOKUP(diropargs) = 4;
576 readlinkres NFSPROC_READLINK(fhandle) = 5;
577 readres NFSPROC_READ(readargs) = 6;
578 void NFSPROC_WRITECACHE(void) = 7;
579 attrstat NFSPROC_WRITE(writeargs) = 8;
580 diropres NFSPROC_CREATE(createargs) = 9;
581 stat NFSPROC_REMOVE(diropargs) = 10;
582 stat NFSPROC_RENAME(renameargs) = 11;
583 stat NFSPROC_LINK(linkargs) = 12;
584 stat NFSPROC_SYMLINK(symlinkargs) = 13;
585 diropres NFSPROC_MKDIR(createargs) = 14;
586 stat NFSPROC_RMDIR(diropargs) = 15;
587 readdirres NFSPROC_READDIR(readdirargs) = 16;
588 statfsres NFSPROC_STATFS(fhandle) = 17;
595 .IX "NFS server procedures" NFSPROC_NULL() "" \fINFSPROC_NULL()\fP
598 NFSPROC_NULL(void) = 0;
601 This procedure does no work. It is made available in all RPC
602 services to allow server response testing and timing.
605 \&Get File Attributes
606 .IX "NFS server procedures" NFSPROC_GETATTR() "" \fINFSPROC_GETATTR()\fP
609 NFSPROC_GETATTR (fhandle) = 1;
612 If the reply status is \fINFS_OK\fP,
614 then the reply attributes contain
615 the attributes for the file given by the input fhandle.
618 \&Set File Attributes
619 .IX "NFS server procedures" NFSPROC_SETATTR() "" \fINFSPROC_SETATTR()\fP
627 NFSPROC_SETATTR (sattrargs) = 2;
630 The "attributes" argument contains fields which are either -1 or
631 are the new value for the attributes of "file". If the reply status is \fINFS_OK\fP,
634 then the reply attributes have the attributes of
635 the file after the "SETATTR" operation has completed.
637 Note: The use of -1 to indicate an unused field in "attributes" is
638 changed in the next version of the protocol.
641 \&Get Filesystem Root
642 .IX "NFS server procedures" NFSPROC_ROOT "" \fINFSPROC_ROOT\fP
645 NFSPROC_ROOT(void) = 3;
648 Obsolete. This procedure is no longer used because finding the
649 root file handle of a filesystem requires moving pathnames between
650 client and server. To do this right we would have to define a
651 network standard representation of pathnames. Instead, the
652 function of looking up the root file handle is done by the mount protocol (see the
655 .I "Mount Protocol Definition"
656 later in this chapter for details).
660 .IX "NFS server procedures" NFSPROC_LOOKUP() "" \fINFSPROC_LOOKUP()\fP
663 NFSPROC_LOOKUP(diropargs) = 4;
666 If the reply "status" is \fINFS_OK\fP,
668 then the reply "file" and reply
669 "attributes" are the file handle and attributes for the file "name"
670 in the directory given by "dir" in the argument.
673 \&Read From Symbolic Link
674 .IX "NFS server procedures" NFSPROC_READLINK() "" \fINFSPROC_READLINK()\fP
676 union readlinkres switch (stat status) {
684 NFSPROC_READLINK(fhandle) = 5;
687 If "status" has the value \fINFS_OK\fP,
689 then the reply "data" is the data in
690 the symbolic link given by the file referred to by the fhandle argument.
692 Note: since NFS always parses pathnames on the client, the
693 pathname in a symbolic link may mean something different (or be
694 meaningless) on a different client or on the server if a different
695 pathname syntax is used.
699 .IX "NFS server procedures" NFSPROC_READ "" \fINFSPROC_READ\fP
708 union readres switch (stat status) {
711 opaque data<NFS_MAXDATA>;
717 NFSPROC_READ(readargs) = 6;
720 Returns up to "count" bytes of "data" from the file given by
721 "file", starting at "offset" bytes from the beginning of the file.
722 The first byte of the file is at offset zero. The file attributes
723 after the read takes place are returned in "attributes".
725 Note: The argument "totalcount" is unused, and is removed in the
726 next protocol revision.
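To make the READ arguments concrete, the sketch below packs them into XDR form. It assumes the arguments are laid out in the order "file", "offset", "count", "totalcount", with the file handle as a fixed-size opaque of 32 bytes (the FHSIZE value from the published version 2 specification) and the other three as unsigned 32-bit words. The function name is hypothetical.

```python
import struct

FHSIZE = 32  # assumption: file handle size from the version 2 specification


def pack_readargs(file: bytes, offset: int, count: int, totalcount: int = 0) -> bytes:
    """Pack READ arguments: fixed-size opaque handle, then three unsigned words."""
    if len(file) != FHSIZE:
        raise ValueError("file handle must be exactly %d bytes" % FHSIZE)
    # Fixed-length opaque data carries no length prefix in XDR.
    return file + struct.pack(">III", offset, count, totalcount)
```

Since "totalcount" is unused, the sketch defaults it to zero. Note that a 32-bit "offset" limits the addressable file size to 4 gigabytes.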
730 .IX "NFS server procedures" NFSPROC_WRITECACHE() "" \fINFSPROC_WRITECACHE()\fP
733 NFSPROC_WRITECACHE(void) = 7;
736 To be used in the next protocol revision.
740 .IX "NFS server procedures" NFSPROC_WRITE() "" \fINFSPROC_WRITE()\fP
744 unsigned beginoffset;
747 opaque data<NFS_MAXDATA>;
751 NFSPROC_WRITE(writeargs) = 8;
754 Writes "data" beginning "offset" bytes from the beginning of
755 "file". The first byte of the file is at offset zero. If the
756 reply "status" is NFS_OK, then the reply "attributes" contains the
757 attributes of the file after the write has completed. The write
758 operation is atomic. Data from this call to \fIWRITE\fP
760 will not be mixed with data from another client's calls.
762 Note: The arguments "beginoffset" and "totalcount" are ignored and
763 are removed in the next protocol revision.
767 .IX "NFS server procedures" NFSPROC_CREATE() "" \fINFSPROC_CREATE()\fP
775 NFSPROC_CREATE(createargs) = 9;
778 The file "name" is created in the directory given by "dir". The
779 initial attributes of the new file are given by "attributes". A
780 reply "status" of NFS_OK indicates that the file was created, and
781 reply "file" and reply "attributes" are its file handle and
782 attributes. Any other reply "status" means that the operation
783 failed and no file was created.
785 Note: This routine should pass an exclusive create flag, meaning
786 "create the file only if it is not already there".
790 .IX "NFS server procedures" NFSPROC_REMOVE() "" \fINFSPROC_REMOVE()\fP
793 NFSPROC_REMOVE(diropargs) = 10;
796 The file "name" is removed from the directory given by "dir". A
797 reply of NFS_OK means the directory entry was removed.
799 Note: possibly non-idempotent operation.
803 .IX "NFS server procedures" NFSPROC_RENAME() "" \fINFSPROC_RENAME()\fP
811 NFSPROC_RENAME(renameargs) = 11;
814 The existing file "from.name" in the directory given by "from.dir"
815 is renamed to "to.name" in the directory given by "to.dir". If the reply is \fINFS_OK\fP,
818 the file was renamed. The RENAME operation is
821 atomic on the server; it cannot be interrupted in the middle.
823 Note: possibly non-idempotent operation.
826 \&Create Link to File
827 .IX "NFS server procedures" NFSPROC_LINK() "" \fINFSPROC_LINK()\fP
835 NFSPROC_LINK(linkargs) = 12;
838 Creates the file "to.name" in the directory given by "to.dir",
839 which is a hard link to the existing file given by "from". If the return value is \fINFS_OK\fP,
842 a link was created. Any other return value
843 indicates an error, and the link was not created.
845 A hard link should have the property that changes to either of the
846 linked files are reflected in both files. When a hard link is made
847 to a file, the attributes for the file should have a value for
848 "nlink" that is one greater than the value before the link.
850 Note: possibly non-idempotent operation.
853 \&Create Symbolic Link
854 .IX "NFS server procedures" NFSPROC_SYMLINK() "" \fINFSPROC_SYMLINK()\fP
863 NFSPROC_SYMLINK(symlinkargs) = 13;
866 Creates the file "from.name" with ftype \fINFLNK\fP in the directory
869 given by "from.dir". The new file contains the pathname "to" and
870 has initial attributes given by "attributes". If the return value is \fINFS_OK\fP,
873 a link was created. Any other return value indicates an
874 error, and the link was not created.
876 A symbolic link is a pointer to another file. The name given in
877 "to" is not interpreted by the server, only stored in the newly
878 created file. When the client references a file that is a symbolic
879 link, the contents of the symbolic link are normally transparently
880 reinterpreted as a pathname to substitute. A \fIREADLINK\fP
882 operation returns the data to the client for interpretation.
884 Note: On UNIX servers the attributes are never used, since
885 symbolic links always have mode 0777.
889 .IX "NFS server procedures" NFSPROC_MKDIR() "" \fINFSPROC_MKDIR()\fP
892 NFSPROC_MKDIR (createargs) = 14;
895 The new directory "where.name" is created in the directory given by
896 "where.dir". The initial attributes of the new directory are given
897 by "attributes". A reply "status" of NFS_OK indicates that the new
898 directory was created, and reply "file" and reply "attributes" are
899 its file handle and attributes. Any other reply "status" means
900 that the operation failed and no directory was created.
902 Note: possibly non-idempotent operation.
906 .IX "NFS server procedures" NFSPROC_RMDIR() "" \fINFSPROC_RMDIR()\fP
909 NFSPROC_RMDIR(diropargs) = 15;
912 The existing empty directory "name" in the directory given by "dir"
913 is removed. If the reply is \fINFS_OK\fP,
915 the directory was removed.
917 Note: possibly non-idempotent operation.
920 \&Read From Directory
921 .IX "NFS server procedures" NFSPROC_READDIR() "" \fINFSPROC_READDIR()\fP
936 union readdirres switch (stat status) {
947 NFSPROC_READDIR (readdirargs) = 16;
950 Returns a variable number of directory entries, with a total size
951 of up to "count" bytes, from the directory given by "dir". If the
952 returned value of "status" is \fINFS_OK\fP,
954 then it is followed by a
955 variable number of "entry"s. Each "entry" contains a "fileid"
956 which consists of a unique number to identify the file within a
957 filesystem, the "name" of the file, and a "cookie" which is an
958 opaque pointer to the next entry in the directory. The cookie is used in the next \fIREADDIR\fP
961 call to get more entries starting at a
962 given point in the directory. The special cookie zero (all bits
963 zero) can be used to get the entries starting at the beginning of
964 the directory. The "fileid" field should be the same number as the
965 "fileid" in the attributes of the file. (See the
966 .I "Basic Data Types"
967 section.)
968 The "eof" flag has a value of TRUE
970 if there are no more entries in the directory.
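The cookie and "eof" mechanism implies a simple client loop, sketched below. The function "readdir" is a hypothetical stand-in for one READDIR call on a fixed directory; it is assumed to take a cookie and a byte count and to return the decoded entries together with the "eof" flag.

```python
from typing import Callable, Iterator, List, Tuple

# Hypothetical shape of one decoded entry: (fileid, name, cookie).
Entry = Tuple[int, str, bytes]
# Hypothetical stand-in for one NFSPROC_READDIR call:
# (cookie, count) -> (entries, eof).
ReaddirFn = Callable[[bytes, int], Tuple[List[Entry], bool]]

COOKIESIZE = 4
START_COOKIE = bytes(COOKIESIZE)  # all bits zero: start of the directory


def list_directory(readdir: ReaddirFn, count: int = 1024) -> Iterator[Tuple[int, str]]:
    """Yield (fileid, name) for every entry, following cookies until eof."""
    cookie = START_COOKIE
    while True:
        entries, eof = readdir(cookie, count)
        for fileid, name, next_cookie in entries:
            yield fileid, name
            cookie = next_cookie  # resume after the last entry seen
        # Stop at eof; also stop on an empty reply to avoid looping forever.
        if eof or not entries:
            break
```

The client treats the cookie as opaque: it only stores the value from the last entry and hands it back unchanged on the next call.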
973 \&Get Filesystem Attributes
974 .IX "NFS server procedures" NFSPROC_STATFS() "" \fINFSPROC_STATFS()\fP
976 union statfsres switch (stat status) {
990 NFSPROC_STATFS(fhandle) = 17;
993 If the reply "status" is
995 then the reply "info" gives the
996 attributes for the filesystem that contains file referred to by the
997 input fhandle. The attribute fields contain the following values:
999 The optimum transfer size of the server in bytes. This is
1000 the number of bytes the server would like to have in the
1001 data part of READ and WRITE requests.
1003 The block size in bytes of the filesystem.
1005 The total number of "bsize" blocks on the filesystem.
1007 The number of free "bsize" blocks on the filesystem.
1009 The number of "bsize" blocks available to non-privileged users.
1011 Note: This call does not work well if a filesystem has variable size blocks.
1014 \&NFS Implementation Issues
1015 .IX NFS implementation
1017 The NFS protocol is designed to be operating system independent, but
1018 since this version was designed in a UNIX environment, many
1019 operations have semantics similar to the operations of the UNIX file
1020 system. This section discusses some of the implementation-specific issues.
1023 \&Server/Client Relationship
1024 .IX NFS "server/client relationship"
1026 The NFS protocol is designed to allow servers to be as simple and
1027 general as possible. Sometimes the simplicity of the server can be a
1028 problem, if the client wants to implement complicated filesystem semantics.
1031 For example, some operating systems allow removal of open files. A
1032 process can open a file and, while it is open, remove it from the
1033 directory. The file can be read and written as long as the process
1034 keeps it open, even though the file has no name in the filesystem.
1035 It is impossible for a stateless server to implement these semantics.
1036 The client can do some tricks such as renaming the file on remove,
1037 and only removing it on close. We believe that the server provides
1038 enough functionality to implement most file system semantics on the client.
1041 Every NFS client can also potentially be a server, and remote and
1042 local mounted filesystems can be freely intermixed. This leads to
1043 some interesting problems when a client travels down the directory
1044 tree of a remote filesystem and reaches the mount point on the server
1045 for another remote filesystem. Allowing the server to follow the
1046 second remote mount would require loop detection, server lookup, and
1047 user revalidation. Instead, we decided not to let clients cross a
1048 server's mount point. When a client does a LOOKUP on a directory on
1049 which the server has mounted a filesystem, the client sees the
1050 underlying directory instead of the mounted directory. A client can
1051 do remote mounts that match the server's mount points to maintain the server's view.
1055 \&Pathname Interpretation
1056 .IX NFS "pathname interpretation"
1058 There are a few complications to the rule that pathnames are always
1059 parsed on the client. For example, symbolic links could have
1060 different interpretations on different clients. Another common
1061 problem for non-UNIX implementations is the special interpretation of
1062 the pathname ".." to mean the parent of a given directory. The next
1063 revision of the protocol uses an explicit flag to indicate the parent instead.
1067 .IX NFS "permission issues"
1069 The NFS protocol, strictly speaking, does not define the permission
1070 checking used by servers. However, it is expected that a server
1071 will do normal operating system permission checking using
1073 style authentication as the basis of its protection mechanism. The
1074 server gets the client's effective "uid", effective "gid", and groups
1075 on each call and uses them to check permission. There are various
1076 problems with this method that can be resolved in interesting ways.
1078 Using "uid" and "gid" implies that the client and server share the
1079 same "uid" list. Every server and client pair must have the same
1080 mapping from user to "uid" and from group to "gid". Since every
1081 client can also be a server, this tends to imply that the whole
1082 network shares the same "uid/gid" space.
1085 revision of the NFS protocol) uses string names instead of numbers,
1086 but there are still complex problems to be solved.
1088 Another problem arises due to the usually stateful open operation.
1089 Most operating systems check permission at open time, and then check
1090 that the file is open on each read and write request. With stateless
1091 servers, the server has no idea that the file is open and must do
1092 permission checking on each read and write call. On a local
1093 filesystem, a user can open a file and then change the permissions so
1094 that no one is allowed to touch it, but will still be able to write
1095 to the file because it is open. On a remote filesystem, by contrast,
1096 the write would fail. To get around this problem, the server's
1097 permission checking algorithm should allow the owner of a file to
1098 access it regardless of the permission setting.
1100 A similar problem has to do with paging in from a file over the
1101 network. The operating system usually checks for execute permission
1102 before opening a file for demand paging, and then reads blocks from
1103 the open file. The file may not have read permission, but after it
1104 is opened it doesn't matter. An NFS server can not tell the
1105 difference between a normal file read and a demand page-in read. To
1106 make this work, the server allows reading of files if the "uid" given
1107 in the call has execute or read permission on the file.
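The two relaxations described above (the owner may always access the file, and execute permission also grants read) can be sketched as a server-side check. This is a simplified illustration under stated assumptions: it considers only the caller's effective "uid" and "gid", ignores the supplementary group list, and uses hypothetical function names.

```python
READ, WRITE, EXEC = 4, 2, 1  # permission bits within one octal digit


def class_bits(mode: int, uid: int, gid: int, owner: int, group: int) -> int:
    """Pick the owner, group, or other permission digit that applies to the caller."""
    if uid == owner:
        return (mode >> 6) & 7
    if gid == group:
        return (mode >> 3) & 7
    return mode & 7


def may_access(want: int, mode: int, uid: int, gid: int, owner: int, group: int) -> bool:
    """Server-side check with the two relaxations described in the text."""
    if uid == owner:
        return True  # the owner is allowed regardless of the mode bits
    bits = class_bits(mode, uid, gid, owner, group)
    if want == READ:
        return bool(bits & (READ | EXEC))  # execute permission also grants read
    return bool(bits & want)
```

With mode 0711, for instance, a caller who is neither owner nor group member may read the file through this check, which is what allows demand paging of an execute-only binary.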
1109 In most operating systems, a particular user (on UNIX, the user with ID zero)
1110 has access to all files no matter what permission and ownership they
1111 have. This "super-user" permission may not be allowed on the server,
1112 since anyone who can become super-user on their workstation could
1113 gain access to all remote files. The UNIX server by default maps
1114 user id 0 to -2 before doing its access checking. This works except
1115 for NFS root filesystems, where super-user access cannot be avoided.
1117 \&Setting RPC Parameters
1118 .IX NFS "setting RPC parameters"
1120 Various file system parameters and options should be set at mount
1121 time. The mount protocol is described in the appendix below. For
1122 example, "Soft" mounts as well as "Hard" mounts are usually both
1123 provided. Soft mounted file systems return errors when RPC
1124 operations fail (after a given number of optional retransmissions),
1125 while hard mounted file systems continue to retransmit forever.
1126 Clients and servers may need to keep caches of recent operations to
1127 help avoid problems with non-idempotent operations.
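The difference between soft and hard mounts can be illustrated with a
small retransmission loop. This is a sketch, not a client
implementation: "call_with_retry", "RpcTimeout", and the simulated
transport "flaky" are hypothetical names, and a real client would wait
for a timeout (usually with backoff) between transmissions.

```python
class RpcTimeout(Exception):
    """Raised by a soft mount once its retransmissions are used up."""


def call_with_retry(send_once, hard=False, retrans=3):
    """Send a request until a reply arrives.

    send_once() returns the reply, or None if the attempt timed out.
    A soft mount gives up after the first transmission plus `retrans`
    retransmissions; a hard mount keeps retransmitting forever.
    Returns (reply, number_of_transmissions).
    """
    attempts = 0
    while True:
        attempts += 1
        reply = send_once()
        if reply is not None:
            return reply, attempts
        if not hard and attempts > retrans:
            raise RpcTimeout("no reply after %d transmissions" % attempts)


def flaky(failures):
    """Simulated transport that times out `failures` times, then replies."""
    state = {"left": failures}

    def send_once():
        if state["left"] > 0:
            state["left"] -= 1
            return None
        return "ok"

    return send_once
```

With this sketch a server that is silent for ten transmissions makes a
soft mount raise an error, while a hard mount simply returns once the
eleventh transmission is answered.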
1129 \&Mount Protocol Definition
1130 .IX "mount protocol" "" "" "" PAGE MAJOR
1134 .IX "mount protocol" introduction
1136 The mount protocol is separate from, but related to, the NFS
1137 protocol. It provides operating-system-specific services to get
1138 NFS off the ground -- looking up server path names, validating user
1139 identity, and checking access permissions. Clients use the mount
1140 protocol to get the first file handle, which allows them entry into a
1143 The mount protocol is kept separate from the NFS protocol to make it
1144 easy to plug in new access checking and validation methods without
1145 changing the NFS server protocol.
1147 Notice that the protocol definition implies stateful servers because
1148 the server maintains a list of clients' mount requests. The mount
1149 list information is not critical for the correct functioning of
1150 either the client or the server. It is intended for advisory use
1151 only, for example, to warn possible clients when a server is going
1154 Version one of the mount protocol is used with version two of the NFS
1155 protocol. The only connecting point is the
1157 structure, which is the same for both protocols.
1160 .IX "mount protocol" "RPC information"
1161 .IP \fIAuthentication\fP
1162 The mount service uses
1166 style authentication only.
1167 .IP "\fITransport Protocols\fP"
1168 The mount service is currently supported on UDP/IP only.
1169 .IP "\fIPort Number\fP"
1170 Consult the server's portmapper, described in the chapter
1171 .I "Remote Procedure Calls: Protocol Specification",
1172 to find the port number on which the mount service is registered.
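As an illustration of the port lookup, the sketch below builds a
portmapper GETPORT call for the mount service and parses an accepted
reply. It assumes the conventional Sun RPC values (portmapper program
100000 version 2, procedure 3; mount program 100005 version 1; UDP
protocol number 17), uses null credentials for the lookup itself, and
does not send anything on the network. The function names are
hypothetical.

```python
import struct

PMAP_PROG, PMAP_VERS, PMAPPROC_GETPORT = 100000, 2, 3
MOUNT_PROG, MOUNT_VERS = 100005, 1
IPPROTO_UDP = 17


def build_getport_call(xid, prog=MOUNT_PROG, vers=MOUNT_VERS, prot=IPPROTO_UDP):
    """Return the XDR-encoded RPC call asking where `prog` is registered."""
    # xid, msg_type CALL (0), RPC version 2, program, version, procedure
    header = struct.pack(">6I", xid, 0, 2, PMAP_PROG, PMAP_VERS, PMAPPROC_GETPORT)
    null_auth = struct.pack(">2I", 0, 0)           # flavor 0, zero-length body
    args = struct.pack(">4I", prog, vers, prot, 0)  # port field is ignored
    return header + null_auth + null_auth + args    # credential + verifier


def parse_getport_reply(data, xid):
    """Return the port from an accepted reply, or None if unusable.

    Assumes the reply carries a zero-length verifier.
    """
    if len(data) < 28:
        return None
    rxid, mtype, reply_stat, _vflavor, vlen, accept_stat, port = struct.unpack(
        ">7I", data[:28]
    )
    if rxid != xid or mtype != 1 or reply_stat != 0 or vlen != 0 or accept_stat != 0:
        return None
    return port or None  # port 0 means the program is not registered
```

A client would send the call to the portmapper's well-known port on the
server and then direct its mount requests to the port returned.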
1174 \&Sizes of XDR Structures
1175 .IX "mount protocol" "XDR structure sizes"
1177 These are the sizes, given in decimal bytes, of various XDR
1178 structures used in the protocol:
1180 /* \fIThe maximum number of bytes in a pathname argument\fP */
1181 const MNTPATHLEN = 1024;
1183 /* \fIThe maximum number of bytes in a name argument\fP */
1184 const MNTNAMLEN = 255;
1186 /* \fIThe size in bytes of the opaque file handle\fP */
1191 .IX "mount protocol" "basic data types"
1192 .IX "mount data types"
1194 This section presents the data types used by the mount protocol.
1195 In many cases they are similar to the types used in NFS.
1199 .IX "mount data types" fhandle "" \fIfhandle\fP
1201 typedef opaque fhandle[FHSIZE];
1206 is the file handle that the server passes to the
1207 client. All file operations are done using file handles to refer
1208 to a file or directory. The file handle can contain whatever
1209 information the server needs to distinguish an individual file.
1211 This is the same as the "fhandle" XDR definition in version 2 of
1212 the NFS protocol; see
1213 .I "Basic Data Types"
1214 in the definition of the NFS protocol, above.
1218 .IX "mount data types" fhstatus "" \fIfhstatus\fP
1220 union fhstatus switch (unsigned status) {
1230 is a union. If a "status" of zero is returned,
1231 the call completed successfully, and a file handle for the
1232 "directory" follows. A non-zero status indicates some sort of
1233 error. In this case the status is a UNIX error number.
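A client might decode this union as sketched below. The sketch assumes
FHSIZE is 32 bytes, as in version 2 of the NFS protocol, and that the
status is a 4-byte big-endian unsigned integer, as XDR requires; the
function name is hypothetical.

```python
import struct

FHSIZE = 32  # size in bytes of the opaque file handle (NFS version 2)


def decode_fhstatus(data):
    """Decode an fhstatus reply body.

    Returns (0, file_handle_bytes) on success, or (errno_value, None)
    when the server reports an error.
    """
    (status,) = struct.unpack(">I", data[:4])
    if status != 0:
        return status, None          # status is a UNIX error number
    handle = data[4:4 + FHSIZE]
    if len(handle) != FHSIZE:
        raise ValueError("truncated file handle")
    return 0, handle
```

The handle is treated as opaque: the client stores the 32 bytes and
passes them back unchanged in later NFS calls.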
1237 .IX "mount data types" dirpath "" \fIdirpath\fP
1239 typedef string dirpath<MNTPATHLEN>;
1244 is a server pathname of a directory.
1248 .IX "mount data types" name "" \fIname\fP
1250 typedef string name<MNTNAMLEN>;
1255 is an arbitrary string used for various names.
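Both "dirpath" and "name" are XDR variable-length strings: a 4-byte
length followed by the bytes, padded with zeros to a multiple of four.
The sketch below shows the encoding together with the size limits given
earlier; the helper names are hypothetical, and encoding text as UTF-8
is an assumption (the limits apply to the byte count).

```python
import struct

MNTPATHLEN = 1024  # maximum number of bytes in a pathname argument
MNTNAMLEN = 255    # maximum number of bytes in a name argument


def xdr_string(value, maxlen):
    """Encode `value` as an XDR string<maxlen>."""
    data = value.encode("utf-8")
    if len(data) > maxlen:
        raise ValueError("string exceeds %d bytes" % maxlen)
    pad = (4 - len(data) % 4) % 4          # round up to a 4-byte boundary
    return struct.pack(">I", len(data)) + data + b"\0" * pad


def encode_dirpath(path):
    return xdr_string(path, MNTPATHLEN)


def encode_name(name):
    return xdr_string(name, MNTNAMLEN)
```

The encoded "dirpath" is the whole argument of a mount request, so
encode_dirpath("/usr") yields the eight bytes that follow the RPC call
header.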
1258 .IX "mount server procedures"
1260 The following sections define the RPC procedures supplied by a
1266 * Protocol description for the mount program
1273 * Version 1 of the mount protocol used with
1274 * version 2 of the NFS protocol.
1278 void MOUNTPROC_NULL(void) = 0;
1279 fhstatus MOUNTPROC_MNT(dirpath) = 1;
1280 mountlist MOUNTPROC_DUMP(void) = 2;
1281 void MOUNTPROC_UMNT(dirpath) = 3;
1282 void MOUNTPROC_UMNTALL(void) = 4;
1283 exportlist MOUNTPROC_EXPORT(void) = 5;
1290 .IX "mount server procedures" MNTPROC_NULL() "" \fIMNTPROC_NULL()\fP
1293 MNTPROC_NULL(void) = 0;
1296 This procedure does no work. It is made available in all RPC
1297 services to allow server response testing and timing.
1301 .IX "mount server procedures" MNTPROC_MNT() "" \fIMNTPROC_MNT()\fP
1304 MNTPROC_MNT(dirpath) = 1;
1307 If the reply "status" is 0, then the reply "directory" contains the
1308 file handle for the directory "dirname". This file handle may be
1309 used in the NFS protocol. This procedure also adds a new entry to
1310 the mount list for this client mounting "dirname".
1313 \&Return Mount Entries
1314 .IX "mount server procedures" MNTPROC_DUMP() "" \fIMNTPROC_DUMP()\fP
1319 mountlist nextentry;
1323 MNTPROC_DUMP(void) = 2;
1326 Returns the list of remote mounted filesystems. The "mountlist"
1327 contains one entry for each "hostname" and "directory" pair.
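On the wire, a linked list such as "mountlist" is encoded as a sequence
of entries, each preceded by a 4-byte boolean that is non-zero while
another entry follows. A decoder might look like the sketch below; it
assumes each entry consists of the "hostname" string followed by the
"directory" string, and the function names are hypothetical.

```python
import struct


def _take_string(buf, off):
    """Read an XDR string at `off`; return (text, offset_after_padding)."""
    (n,) = struct.unpack_from(">I", buf, off)
    start = off + 4
    text = buf[start:start + n].decode("utf-8")
    return text, start + n + (4 - n % 4) % 4


def decode_mountlist(buf):
    """Return a list of (hostname, directory) pairs from a DUMP reply body."""
    entries, off = [], 0
    while True:
        (more,) = struct.unpack_from(">I", buf, off)
        off += 4
        if not more:                 # a zero boolean terminates the list
            return entries
        host, off = _take_string(buf, off)
        directory, off = _take_string(buf, off)
        entries.append((host, directory))
```

An empty mount list is therefore a single zero word.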
1330 \&Remove Mount Entry
1331 .IX "mount server procedures" MNTPROC_UMNT() "" \fIMNTPROC_UMNT()\fP
1334 MNTPROC_UMNT(dirpath) = 3;
1337 Removes the mount list entry for the input "dirpath".
1340 \&Remove All Mount Entries
1341 .IX "mount server procedures" MNTPROC_UMNTALL() "" \fIMNTPROC_UMNTALL()\fP
1344 MNTPROC_UMNTALL(void) = 4;
1347 Removes all of the mount list entries for this client.
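The bookkeeping behind the mount, dump, and unmount procedures can be
sketched as a small in-memory table. This is an illustration only: the
class and method names are hypothetical, and identifying a client by
its host name is an assumption about how the server keys its entries.

```python
class MountTable:
    """Advisory list of (hostname, directory) pairs kept by a mount server."""

    def __init__(self):
        self._entries = set()

    def mnt(self, host, dirpath):
        """Record that `host` mounted `dirpath` (done on a successful MNT)."""
        self._entries.add((host, dirpath))

    def umnt(self, host, dirpath):
        """Remove one entry; unknown entries are ignored."""
        self._entries.discard((host, dirpath))

    def umntall(self, host):
        """Remove every entry belonging to `host`."""
        self._entries = {e for e in self._entries if e[0] != host}

    def dump(self):
        """Return all entries in a stable order."""
        return sorted(self._entries)
```

Because the list is advisory, losing it (for example across a server
restart) does not affect the validity of file handles already handed
out.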
1350 \&Return Export List
1351 .IX "mount server procedures" MNTPROC_EXPORT() "" \fIMNTPROC_EXPORT()\fP
1358 struct *exportlist {
1365 MNTPROC_EXPORT(void) = 5;
1368 Returns a variable number of export list entries. Each entry
1369 contains a filesystem name and a list of groups that are allowed to
1370 import it. The filesystem name is in "filesys", and the group name
1371 is in the list "groups".
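The export list nests one linked list inside another: each export entry
carries its own list of group names. The decoder sketched below assumes
the wire order is "filesys", then the "groups" list, then the next
export entry, with each list element preceded by a 4-byte boolean; the
function names are hypothetical.

```python
import struct


def _take_u32(buf, off):
    """Read a 4-byte big-endian unsigned integer; return (value, new_offset)."""
    (value,) = struct.unpack_from(">I", buf, off)
    return value, off + 4


def _take_string(buf, off):
    """Read an XDR string; return (text, offset_after_padding)."""
    n, start = _take_u32(buf, off)
    text = buf[start:start + n].decode("utf-8")
    return text, start + n + (4 - n % 4) % 4


def decode_exportlist(buf):
    """Return a list of (filesys, [group, ...]) from an EXPORT reply body."""
    exports, off = [], 0
    while True:
        more, off = _take_u32(buf, off)
        if not more:
            return exports
        filesys, off = _take_string(buf, off)
        groups = []
        while True:                      # inner list of group names
            more_groups, off = _take_u32(buf, off)
            if not more_groups:
                break
            group, off = _take_string(buf, off)
            groups.append(group)
        exports.append((filesys, groups))
```

An entry with an empty group list is returned with an empty Python
list; how a server interprets that case is outside this sketch.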
1373 Note: The exportlist should contain
1374 more information about the status of the filesystem, such as a