TAR(5) FreeBSD File Formats Manual TAR(5)

NAME
tar -- format of tape archive files

DESCRIPTION
The tar archive format collects any number of files, directories, and
other file system objects (symbolic links, device nodes, etc.) into a
single stream of bytes. The format was originally designed to be used
with tape drives that operate with fixed-size blocks, but is widely used
as a general packaging mechanism.
General Format
A tar archive consists of a series of 512-byte records. Each file system
object requires a header record, which stores basic metadata (pathname,
owner, permissions, etc.), followed by zero or more records containing the
file data. The end of the archive is indicated by two records consisting
entirely of zero bytes.

For compatibility with tape drives that use fixed block sizes, programs
that read or write tar files always read or write a fixed number of
records with each I/O operation. These ``blocks'' are always a multiple
of the record size. The most common block size, and the maximum supported
by historic implementations, is 10240 bytes (20 records). (Note: the terms
``block'' and ``record'' here are not entirely standard; this document
follows the convention established by John Gilmore in documenting pdtar.)
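The record and block arithmetic above can be sketched in C. This is a minimal illustration, assuming a writer that pads its final block with zero bytes to the common 10240-byte blocking factor (as fixed-size I/O implies); the helper names entry_records and archive_bytes are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TAR_RECORD_SIZE 512u    /* every header and data record */
#define TAR_BLOCK_SIZE  10240u  /* assumed blocking: 20 records per I/O */

/* Records used by one entry: a header record plus its data rounded up
 * to a whole number of 512-byte records. */
static uint64_t entry_records(uint64_t data_size)
{
    return 1 + (data_size + TAR_RECORD_SIZE - 1) / TAR_RECORD_SIZE;
}

/* Bytes a writer emits for entries with the given data sizes: all entry
 * records, the two all-zero end-of-archive records, then zero padding up
 * to a whole number of blocks. */
static uint64_t archive_bytes(const uint64_t *sizes, size_t n)
{
    uint64_t records = 2;                       /* end-of-archive marker */
    for (size_t i = 0; i < n; i++)
        records += entry_records(sizes[i]);
    uint64_t bytes = records * TAR_RECORD_SIZE;
    assert(TAR_BLOCK_SIZE % TAR_RECORD_SIZE == 0);
    return (bytes + TAR_BLOCK_SIZE - 1) / TAR_BLOCK_SIZE * TAR_BLOCK_SIZE;
}
```

For example, a single empty file needs only 3 records (1536 bytes), but a blocked writer still emits one full 10240-byte block.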
Old-Style Archive Format
The original tar archive format has been extended many times to include
additional information that various implementors found necessary. This
section describes the variant implemented by the tar command included in
Version 7 AT&T UNIX, which is one of the earliest widely-used versions of
the tar program.

The header record for an old-style tar archive consists of the following:

struct header_old_tar {
    char name[100];
    char mode[8];
    char uid[8];
    char gid[8];
    char size[12];
    char mtime[12];
    char checksum[8];
    char linkflag[1];
    char linkname[100];
    char pad[255];
};

All unused bytes in the header record are filled with nulls.
name Pathname, stored as a null-terminated string. Early tar
implementations only stored regular files (including hardlinks to those
files). One common early convention used a trailing "/" character to
indicate a directory name, allowing directory permissions and owner
information to be archived and restored.

mode File mode, stored as an octal number in ASCII.

uid, gid
User id and group id of owner, as octal numbers in ASCII.

size Size of file, as an octal number in ASCII. For regular files only,
this indicates the amount of data that follows the header. In
particular, this field was ignored by early tar implementations when
extracting hardlinks. Modern writers should always store a zero length
for hardlink entries.

mtime Modification time of file, as an octal number in ASCII. This
indicates the number of seconds since the start of the epoch, 00:00:00
UTC January 1, 1970. Note that negative values should be avoided here,
as they are handled inconsistently.

checksum
Header checksum, stored as an octal number in ASCII. To compute the
checksum, set the checksum field to all spaces, then sum all bytes in
the header using unsigned arithmetic. This field should be stored as six
octal digits followed by a null and a space character. Note that many
early implementations of tar used signed arithmetic for the checksum
field, which can cause interoperability problems when transferring
archives between systems. Modern robust readers compute the checksum
both ways and accept the header if either computation matches.

linkflag, linkname
In order to preserve hardlinks and conserve tape, a file with multiple
links is only written to the archive the first time it is encountered.
The next time it is encountered, the linkflag is set to an ASCII `1' and
the linkname field holds the first name under which this file appears.
(Note that regular files have a null value in the linkflag field.)
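The checksum rule, including the tolerant signed/unsigned comparison used by robust readers, can be sketched as follows. It assumes the checksum field occupies header bytes 148-155, which follows from the sizes of the preceding fields (name, mode, uid, gid, size, mtime); the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define CHKSUM_OFF 148  /* checksum field occupies header bytes 148..155 */
#define CHKSUM_LEN 8

/* Sum all 512 header bytes, counting the checksum field as eight spaces.
 * use_signed != 0 reproduces the historic signed-char arithmetic. */
static long tar_checksum(const unsigned char *header, int use_signed)
{
    long sum = 0;
    for (size_t i = 0; i < 512; i++) {
        int in_field = i >= CHKSUM_OFF && i < CHKSUM_OFF + CHKSUM_LEN;
        unsigned char c = in_field ? ' ' : header[i];
        sum += use_signed ? (long)(signed char)c : (long)c;
    }
    return sum;
}

/* Parse the stored octal checksum (optional leading spaces, digits, then
 * NUL or space) and accept the header if either sum matches. */
static int tar_checksum_ok(const unsigned char *header)
{
    assert(header != NULL);
    size_t i = CHKSUM_OFF, end = CHKSUM_OFF + CHKSUM_LEN;
    long stored = 0;
    while (i < end && header[i] == ' ')
        i++;
    for (; i < end && header[i] >= '0' && header[i] <= '7'; i++)
        stored = stored * 8 + (header[i] - '0');
    return stored == tar_checksum(header, 0) ||
           stored == tar_checksum(header, 1);
}
```

The two sums differ only when the header contains bytes above 0x7F, which is exactly the case where old signed-arithmetic writers and modern readers disagree.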
Early tar implementations varied in how they terminated these fields.
The tar command in Version 7 AT&T UNIX used the following conventions
(this is also documented in early BSD manpages): the pathname must be
null-terminated; the mode, uid, and gid fields must end in a space and a
null byte; the size and mtime fields must end in a space; the checksum is
terminated by a null and a space. Early implementations filled the
numeric fields with leading spaces. This seems to have been common
practice until the IEEE Std 1003.1-1988 (``POSIX.1'') standard was
released. For best portability, modern implementations should fill the
numeric fields with leading zeros.
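A writer following the modern convention can format numeric fields as zero-padded octal. This sketch, using a hypothetical format_octal helper, assumes a single NUL terminator (which POSIX permits) rather than the Version 7 space/null combinations listed above. With 11 digits in the 12-byte size field, the largest storable value is 8^11 - 1 bytes, the historic 8GB limit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store 'value' in a numeric header field of 'width' bytes as width-1
 * zero-padded octal digits followed by a NUL terminator.
 * Returns 0 on success, -1 if the value needs more digits than fit. */
static int format_octal(uint64_t value, char *field, size_t width)
{
    assert(width >= 2);
    field[width - 1] = '\0';
    for (size_t i = width - 1; i-- > 0; ) {   /* fill right to left */
        field[i] = (char)('0' + (value & 7));
        value >>= 3;
    }
    return value == 0 ? 0 : -1;
}
```

For instance, mode 0644 in the 8-byte mode field becomes the seven digits "0000644" plus a NUL.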
Pre-POSIX Archives
An early draft of IEEE Std 1003.1-1988 (``POSIX.1'') served as the basis
for John Gilmore's pdtar program and many system implementations from the
late 1980s and early 1990s. These archives generally follow the POSIX
ustar format described below with the following variations:
o The magic value is ``ustar '' (note the following space). The
  version field contains a space character followed by a null.
o The numeric fields are generally filled with leading spaces (not
  leading zeros as recommended in the final standard).
o The prefix field is often not used, limiting pathnames to the 100
  characters of old-style archives.
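POSIX ustar Archives
IEEE Std 1003.1-1988 (``POSIX.1'') defined a standard tar file format to
be read and written by compliant implementations of tar(1). This format
is often called the ``ustar'' format, after the magic value used in the
header. (The name is an acronym for ``Unix Standard TAR''.) It extends
the historic format with new fields:

struct header_posix_ustar {
    char name[100];
    char mode[8];
    char uid[8];
    char gid[8];
    char size[12];
    char mtime[12];
    char checksum[8];
    char typeflag[1];
    char linkname[100];
    char magic[6];
    char version[2];
    char uname[32];
    char gname[32];
    char devmajor[8];
    char devminor[8];
    char prefix[155];
    char pad[12];
};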
typeflag
Type of entry. POSIX extended the earlier linkflag field with several
new type values:
``0'' Regular file. NULL should be treated as a synonym, for
      compatibility purposes.
``1'' Hard link.
``2'' Symbolic link.
``3'' Character device node.
``4'' Block device node.
``5'' Directory.
``6'' FIFO.
``7'' Reserved.
Other A POSIX-compliant implementation must treat any unrecognized
      typeflag value as a regular file. In particular, writers should
      ensure that all entries have a valid filename so that they can be
      restored by readers that do not support the corresponding
      extension. Uppercase letters "A" through "Z" are reserved for
      custom extensions. Note that sockets and whiteout entries are not
      archivable.
It is worth noting that the size field, in particular, has different
meanings depending on the type. For regular files, of course, it
indicates the amount of data following the header. For directories, it
may be used to indicate the total size of all files in the directory,
for use by operating systems that pre-allocate directory space. For all
other types, it should be set to zero by writers and ignored by readers.

magic Contains the magic value ``ustar'' followed by a NULL byte to
indicate that this is a POSIX standard archive. Full compliance requires
that the uname and gname fields be properly set.

version
Version. This should be ``00'' (two copies of the ASCII digit zero) for
POSIX standard archives.

uname, gname
User and group names, as null-terminated ASCII strings. These should be
used in preference to the uid/gid values when they are set and the
corresponding names exist on the system.

devmajor, devminor
Major and minor numbers for character device or block device entries.

prefix First part of pathname. If the pathname is too long to fit in the
100 bytes provided by the standard format, it can be split at any /
character with the first portion going here. If the prefix field is not
empty, the reader will prepend the prefix value and a / character to the
regular name field to obtain the full pathname.

Note that all unused bytes must be set to NULL.
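The prefix rule can be sketched as a search for a usable / character. This minimal version (ustar_split is a hypothetical name) picks the rightmost slash that keeps the prefix within 155 bytes, which yields the shortest possible name part; if that still leaves more than 100 bytes, no split can work and the path needs an extension such as a pax path record.

```c
#include <assert.h>
#include <string.h>

/* Decide how to store 'path' in the ustar name[100] and prefix[155]
 * fields. Returns -1 if the path fits in name unchanged, the index of
 * the '/' to split at (prefix = path[0..i), name = path[i+1..]), or -2
 * if no split fits. The split slash itself is not stored. */
static int ustar_split(const char *path)
{
    assert(path != NULL);
    size_t len = strlen(path);
    if (len <= 100)
        return -1;
    /* Keep name non-empty (i <= len - 2) and prefix within 155 bytes. */
    size_t i = len - 2 < 155 ? len - 2 : 155;
    for (; i > 0; i--) {
        if (path[i] == '/')
            /* Rightmost usable slash gives the shortest name part. */
            return len - i - 1 <= 100 ? (int)i : -2;
    }
    return -2;
}
```

A 256-character path whose 156th character is / splits at index 155, filling both fields completely, which matches the limit noted in the field-termination rules.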
Field termination is specified slightly differently by POSIX than by
previous implementations. The magic, uname, and gname fields must have a
trailing NULL. The pathname, linkname, and prefix fields must have a
trailing NULL unless they fill the entire field. (In particular, it is
possible to store a 256-character pathname if it happens to have a / as
the 156th character.) POSIX requires numeric fields to be zero-padded in
the front, and allows them to be terminated with either space or NULL
characters.

Currently, most tar implementations comply with the ustar format,
occasionally extending it by adding new fields to the blank area at the
end of the header record.
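The magic and version values described above are enough to tell the common header variants apart. This sketch assumes the magic field starts at byte 257 and the version field at byte 263, offsets that follow from the field sizes in the ustar header; the enum and function names are hypothetical.

```c
#include <assert.h>
#include <string.h>

enum tar_format { TAR_OLD_STYLE, TAR_PRE_POSIX, TAR_POSIX_USTAR, TAR_USTAR_OTHER };

/* Classify a header by its magic (bytes 257..262) and version (bytes
 * 263..264). Old-style headers leave this area filled with nulls. */
static enum tar_format tar_detect(const unsigned char *h)
{
    static const unsigned char posix[8] = {'u','s','t','a','r','\0','0','0'};
    static const unsigned char prepo[8] = {'u','s','t','a','r',' ',' ','\0'};
    assert(h != NULL);
    if (memcmp(h + 257, posix, 8) == 0)
        return TAR_POSIX_USTAR;
    if (memcmp(h + 257, prepo, 8) == 0)
        return TAR_PRE_POSIX;        /* also the form written by GNU tar */
    if (memcmp(h + 257, "ustar", 5) == 0)
        return TAR_USTAR_OTHER;      /* some other ustar-like variant */
    return TAR_OLD_STYLE;
}
```

A real reader would combine this with the checksum test, since a header of all nulls (the end-of-archive marker) also classifies as old-style here.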
Pax Interchange Format
There are many attributes that cannot be portably stored in a POSIX
ustar archive. IEEE Std 1003.1-2001 (``POSIX.1'') defined a ``pax
interchange format'' that uses two new types of entries to hold
text-formatted metadata that applies to following entries. Note that a
pax interchange format archive is a ustar archive in every respect. The
new data is stored in ustar-compatible archive entries that use the
``x'' or ``g'' typeflag. In particular, older implementations that do
not fully support these extensions will extract the metadata into
regular files, where the metadata can be examined as necessary.

An entry in a pax interchange format archive consists of one or two
standard ustar entries, each with its own header and data. The first,
optional entry stores the extended attributes for the following entry.
This optional first entry has an "x" typeflag and a size field that
indicates the total size of the extended attributes. The extended
attributes themselves are stored as a series of text-format lines
encoded in the portable UTF-8 encoding. Each line consists of a decimal
number, a space, a key string, an equals sign, a value string, and a
newline. The decimal number indicates the length of the entire line,
including the initial length field and the trailing newline. An example
of such a field is:

    25 ctime=1084839148.1212\n
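Because the length prefix counts its own digits, a writer has to find a length that is consistent with itself. The following sketch (pax_record_len and pax_format are hypothetical helpers) iterates until the digit count stops changing; it assumes keys and values are C strings without embedded NUL bytes.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Length of the pax record "LEN key=value\n", where LEN counts its own
 * decimal digits. Adding digits can change LEN, so iterate until the
 * digit count is stable (this converges after at most a few steps). */
static size_t pax_record_len(const char *key, const char *value)
{
    assert(key != NULL && value != NULL);
    size_t body = strlen(key) + strlen(value) + 3;  /* ' ', '=', '\n' */
    size_t len = body + 1;                          /* guess one digit */
    for (;;) {
        char digits[24];
        size_t n = (size_t)snprintf(digits, sizeof digits, "%zu", len);
        if (body + n == len)
            return len;
        len = body + n;
    }
}

/* Write the record into buf (NUL-terminated for convenience) and return
 * its length, or 0 if buf is too small. */
static size_t pax_format(char *buf, size_t cap, const char *key,
                         const char *value)
{
    size_t len = pax_record_len(key, value);
    if (len >= cap)
        return 0;
    snprintf(buf, cap, "%zu %s=%s\n", len, key, value);
    return len;
}
```

For the ctime example, the text after the length field is 23 bytes, so the record is 25 bytes long. A record whose text after the length is 9 bytes cannot have length 10, because writing "10" makes it 11 bytes; the loop settles on 11.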
Keys in all lowercase are standard keys. Vendors can add their own keys
by prefixing them with an all-uppercase vendor name and a period. Note
that, unlike the historic header, numeric values are stored using
decimal, not octal. A description of some common keys follows:

atime, ctime, mtime
File access, inode change, and modification times. These fields can be
negative or include a decimal point and a fractional component.

uname, uid, gname, gid
User name, group name, and numeric UID and GID values. The user name and
group name stored here are encoded in UTF-8 and can thus include
non-ASCII characters. The UID and GID fields can be of arbitrary length.

linkpath
The full path of the linked-to file. Note that this is encoded in UTF-8
and can thus include non-ASCII characters.

path The full pathname of the entry. Note that this is encoded in UTF-8
and can thus include non-ASCII characters.

realtime.*, security.*
These keys are reserved and may be used for future standardization.

size The size of the file. Note that there is no length limit on this
field, allowing conforming archives to store files much larger than the
historic 8GB limit.

SCHILY.*
Vendor-specific attributes used by Joerg Schilling's star
implementation.

SCHILY.acl.access, SCHILY.acl.default
Stores the access and default ACLs as textual strings in a format that
is an extension of the format specified by POSIX.1e draft 17. In
particular, each user or group access specification can include a
fourth colon-separated field with the numeric UID or GID. This allows
ACLs to be restored on systems that may not have complete user or group
information available (such as when NIS/YP or LDAP services are
temporarily unavailable).
SCHILY.devminor, SCHILY.devmajor
The full minor and major numbers for device nodes.

SCHILY.dev, SCHILY.ino, SCHILY.nlinks
The device number, inode number, and link count for the entry. In
particular, note that a pax interchange format archive using Joerg
Schilling's SCHILY.* extensions can store all of the data from struct
stat.

LIBARCHIVE.xattr.namespace.key
Libarchive stores POSIX.1e-style extended attributes using keys of this
form. The key itself is URL-encoded: all non-ASCII characters and the two
special characters ``='' and ``%'' are encoded as ``%'' followed by two
uppercase hexadecimal digits. The value of this key is the extended
attribute value encoded in base 64.
XXX Detail the base-64 format here XXX

XXX document other vendor-specific extensions XXX
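The key-encoding rule for LIBARCHIVE.xattr keys can be sketched directly from the description above: bytes at or above 0x80, ``='' and ``%'' become ``%XX''. The function name xattr_key_encode is hypothetical, and the base-64 encoding of the value is deliberately left out because its details are not given here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* URL-encode an extended attribute name as described for
 * LIBARCHIVE.xattr.namespace.key: bytes >= 0x80 (non-ASCII), '=' and
 * '%' become '%' plus two uppercase hex digits; other bytes are copied.
 * 'cap' must be at least 3 * strlen(in) + 1. */
static void xattr_key_encode(const char *in, char *out, size_t cap)
{
    static const char hex[] = "0123456789ABCDEF";
    assert(cap >= 3 * strlen(in) + 1);
    for (const unsigned char *p = (const unsigned char *)in; *p != '\0'; p++) {
        if (*p >= 0x80 || *p == '=' || *p == '%') {
            *out++ = '%';
            *out++ = hex[*p >> 4];
            *out++ = hex[*p & 0x0F];
        } else {
            *out++ = (char)*p;
        }
    }
    *out = '\0';
}
```

The full pax key would then be formed by prefixing the encoded name with "LIBARCHIVE.xattr.".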
Any values stored in an extended attribute override the corresponding
values in the regular tar header. Note that compliant readers should
ignore the regular fields when they are overridden. This is important,
as existing archivers are known to store non-compliant values in the
standard header fields in this situation. There are no limits on length
for any of these fields. In particular, numeric fields can be
arbitrarily large. All text fields are encoded in UTF-8. Compliant
writers should store only portable 7-bit ASCII characters in the
standard ustar header and use extended attributes whenever a text value
contains non-ASCII characters.
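A reader that honors these overrides needs to locate keys in the body of an x entry. The following sketch (pax_find is a hypothetical helper) walks the length-prefixed records, rejects malformed ones, and returns a key's value, which the caller would then use in place of the corresponding ustar field, for example path instead of prefix and name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Search the body of a pax "x" entry (records "LEN key=value\n") for
 * 'key'. Returns 1 and sets *val/*vlen when found, 0 when absent, and
 * -1 when a record is malformed. The value is not NUL-terminated. */
static int pax_find(const char *data, size_t size, const char *key,
                    const char **val, size_t *vlen)
{
    size_t klen = strlen(key), pos = 0;
    assert(val != NULL && vlen != NULL);
    while (pos < size) {
        size_t len = 0, i = pos;
        while (i < size && data[i] >= '0' && data[i] <= '9') {
            len = len * 10 + (size_t)(data[i++] - '0');
            if (len > size)
                return -1;               /* longer than the whole body */
        }
        /* Need digits, a space, room for a key, and a final newline. */
        if (i == pos || i >= size || data[i] != ' ' || len > size - pos ||
            pos + len < i + 2 || data[pos + len - 1] != '\n')
            return -1;
        const char *k = data + i + 1;            /* key starts after space */
        const char *nl = data + pos + len - 1;   /* terminating newline */
        const char *eq = memchr(k, '=', (size_t)(nl - k));
        if (eq == NULL)
            return -1;
        if ((size_t)(eq - k) == klen && memcmp(k, key, klen) == 0) {
            *val = eq + 1;
            *vlen = (size_t)(nl - eq - 1);
            return 1;
        }
        pos += len;
    }
    return 0;
}
```

Using the record length to skip ahead, rather than scanning for newlines, matters because values may themselves contain newline characters.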
In addition to the x entry described above, the pax interchange format
also supports a g entry. The g entry is identical in format, but
specifies attributes that serve as defaults for all subsequent archive
entries. The g entry is not widely used.

Besides the new x and g entries, the pax interchange format has a few
other minor variations from the earlier ustar format. The most troubling
one is that hardlinks are permitted to have data following them. This
allows readers to restore any hardlink to a file without having to
rewind the archive to find an earlier entry. However, it creates
complications for robust readers, as it is no longer clear whether or
not they should ignore the size field for hardlink entries.
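GNU Tar Archives
The GNU tar program started with a pre-POSIX format similar to that
described earlier and has extended it using several different
mechanisms: it added new fields to the empty space in the header (some
of which was later used by POSIX for conflicting purposes); it allowed
the header to be continued over multiple records; and it defined new
entries that modify following entries (similar in principle to the x
entry described above, but each GNU special entry is single-purpose,
unlike the general-purpose x entry). As a result, GNU tar archives are
not POSIX compatible, although more lenient POSIX-compliant readers can
successfully extract most GNU tar archives.

struct header_gnu_tar {
    char name[100];
    char mode[8];
    char uid[8];
    char gid[8];
    char size[12];
    char mtime[12];
    char checksum[8];
    char typeflag[1];
    char linkname[100];
    char magic[6];
    char version[2];
    char uname[32];
    char gname[32];
    char devmajor[8];
    char devminor[8];
    char atime[12];
    char ctime[12];
    char offset[12];
    char longnames[4];
    char unused[1];
    struct {
        char offset[12];
        char numbytes[12];
    } sparse[4];
    char isextended[1];
    char realsize[12];
    char pad[17];
};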
typeflag
GNU tar uses the following special entry types, in addition to those
defined by POSIX:

7 GNU tar treats type "7" records identically to type "0" records,
  except on one obscure RTOS where they are used to indicate the
  pre-allocation of a contiguous file on disk.

D This indicates a directory entry. Unlike the POSIX-standard "5"
  typeflag, the header is followed by data records listing the names of
  files in this directory. Each name is preceded by an ASCII "Y" if the
  file is stored in this archive or "N" if the file is not stored in
  this archive. Each name is terminated with a null, and an extra null
  marks the end of the name list. The purpose of this entry is to
  support incremental backups; a program restoring from such an archive
  may wish to delete files on disk that did not exist in the directory
  when the archive was made.

  Note that the "D" typeflag specifically violates POSIX, which requires
  that unrecognized typeflags be restored as normal files. In this case,
  restoring the "D" entry as a file could interfere with subsequent
  creation of the like-named directory.

K The data for this entry is a long linkname for the following regular
  entry.

L The data for this entry is a long pathname for the following regular
  entry.

M This is a continuation of the last file on the previous volume. GNU
  multi-volume archives guarantee that each volume begins with a valid
  entry header. To ensure this, a file may be split, with part stored at
  the end of one volume, and part stored at the beginning of the next
  volume. The "M" typeflag indicates that this entry continues an
  existing file. Such entries can only occur as the first or second
  entry in an archive (the latter only if the first entry is a volume
  label). The size field specifies the size of this entry. The offset
  field at bytes 369-380 specifies the offset where this file fragment
  begins. The realsize field specifies the total size of the file (which
  must equal size plus offset). When extracting, GNU tar checks that the
  header file name is the one it is expecting, that the header offset is
  in the correct sequence, and that the sum of offset and size is equal
  to realsize. FreeBSD's version of GNU tar does not handle the corner
  case of an archive's being continued in the middle of a long name or
  other extension header.
N Type "N" records are no longer generated by GNU tar. They contained a
  list of files to be renamed or symlinked after extraction; this was
  originally used to support long names. The contents of this record are
  a text description of the operations to be done, in the form ``Rename
  %s to %s\n'' or ``Symlink %s to %s\n''; in either case, both filenames
  are escaped using K&R C syntax.

S This is a ``sparse'' regular file. Sparse files are stored as a series
  of fragments. The header contains a list of fragment offset/length
  pairs. If more than four such entries are required, the header is
  extended as necessary with ``extra'' header extensions (an older format
  that is no longer used), or ``sparse'' extensions.

V The name field should be interpreted as a tape/volume header name.
  This entry should generally be ignored on extraction.

magic The magic field holds the five characters ``ustar'' followed by a
space. Note that POSIX ustar archives have a trailing null.

version
The version field holds a space character followed by a null. Note that
POSIX ustar archives use two copies of the ASCII digit zero.

atime, ctime
The time the file was last accessed and the time of last change of file
information, stored in octal as with mtime.

longnames
This field is apparently no longer used.

Sparse offset / numbytes
Each such structure specifies a single fragment of a sparse file. The
two fields store values as octal numbers. The fragments are each padded
to a multiple of 512 bytes in the archive. On extraction, the list of
fragments is collected from the header (including any extension
headers), and the data is then read and written to the file at
appropriate offsets.

isextended
If this is set to non-zero, the header will be followed by additional
``sparse header'' records. Each such record contains information about
as many as 21 additional sparse blocks as shown below:

struct gnu_sparse_header {
    struct {
        char offset[12];
        char numbytes[12];
    } sparse[21];
    char isextended[1];
    char padding[7];
};

realsize
A binary representation of the file's complete size, with a much larger
range than the POSIX file size. In particular, with M type files, the
current entry is only a portion of the file. In that case, the POSIX
size field will indicate the size of this entry; the realsize field will
indicate the total size of the file.

XXX More Details Needed XXX
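The sparse extraction step described under Sparse offset / numbytes can be sketched in memory. This is an illustration only: it assumes the fragment list has already been decoded from the octal fields (including any extension records), and it writes into a zero-filled buffer instead of seeking within a file. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One fragment, already decoded from the octal offset/numbytes fields. */
struct sparse_frag {
    uint64_t offset;
    uint64_t numbytes;
};

/* Rebuild a sparse file into 'out' (realsize bytes, pre-zeroed so holes
 * read as zeros). 'data' holds the archive data after the header(s);
 * each fragment occupies numbytes rounded up to a multiple of 512.
 * Returns 0 on success, -1 if a fragment overruns the input or output. */
static int sparse_rebuild(const struct sparse_frag *frags, size_t nfrags,
                          const unsigned char *data, size_t data_len,
                          unsigned char *out, uint64_t realsize)
{
    size_t pos = 0;
    assert(frags != NULL || nfrags == 0);
    for (size_t i = 0; i < nfrags; i++) {
        uint64_t off = frags[i].offset, n = frags[i].numbytes;
        uint64_t padded = (n + 511) / 512 * 512;
        if (off > realsize || n > realsize - off || padded > data_len - pos)
            return -1;
        memcpy(out + off, data + pos, (size_t)n);
        pos += (size_t)padded;
    }
    return 0;
}
```

Checking each fragment against realsize guards against damaged or hostile headers that would otherwise write past the end of the output.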
Solaris Tar
Solaris tar (beginning with SunOS XXX 5.7 ?? XXX) supports an
``extended'' format that is fundamentally similar to pax interchange
format, with the following differences:
o Extended attributes are stored in an entry whose type is X, not x, as
  used by pax interchange format. The detailed format of this entry
  appears to be the same as detailed above for the x entry.
o An additional A entry is used to store an ACL for the following
  regular entry. The body of this entry contains a seven-digit octal
  number (whose value is 01000000 plus the number of ACL entries)
  followed by a zero byte, followed by the ACL in textual form.
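The leading count in a Solaris A entry can be decoded as below. This sketch interprets the ``zero byte'' as a NUL byte and leaves the textual ACL unparsed; solaris_acl_count is a hypothetical name, and the exact ACL text syntax is not covered here.

```c
#include <assert.h>
#include <stddef.h>

/* Parse the start of a Solaris "A" entry body: seven octal digits whose
 * value is 01000000 plus the ACL entry count, then a NUL byte, then the
 * ACL text. Returns the entry count and sets *text, or -1 if malformed. */
static long solaris_acl_count(const char *body, size_t len, const char **text)
{
    long v = 0;
    assert(text != NULL);
    if (len < 8 || body[7] != '\0')
        return -1;
    for (int i = 0; i < 7; i++) {
        if (body[i] < '0' || body[i] > '7')
            return -1;
        v = v * 8 + (body[i] - '0');
    }
    if (v < 01000000)
        return -1;
    *text = body + 8;
    return v - 01000000;
}
```

For example, a body starting with the digits "1000003" and a NUL describes three ACL entries.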
Other Extensions
One common extension, utilized by GNU tar, star, and other newer tar
implementations, permits binary numbers in the standard numeric fields.
This is flagged by setting the high bit of the first character. This
permits 95-bit values for the length and time fields and 63-bit values
for the uid, gid, and device numbers. GNU tar supports this extension
for the length, mtime, ctime, and atime fields. Joerg Schilling's star
program supports this extension for all numeric fields. Note that this
extension is largely obsoleted by the extended attribute record provided
by the pax interchange format.
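A reader can accept both encodings with one routine. In this sketch (tar_number is a hypothetical name), a field whose first byte has the high bit set is read as a big-endian binary number in the remaining bits; as an assumption, bit 0x40 of that byte is treated as a sign bit and negative values are simply rejected, along with anything that does not fit in 64 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode a numeric header field of 'width' bytes. If the high bit of the
 * first byte is set, the remaining bits hold a big-endian binary number;
 * otherwise the field is octal ASCII with optional leading spaces.
 * Assumption: bit 0x40 of a binary field's first byte is a sign bit and
 * negative values are rejected. Returns -1 on error or overflow. */
static int64_t tar_number(const unsigned char *f, size_t width)
{
    int64_t v = 0;
    size_t i = 0;
    assert(width > 0);
    if (f[0] & 0x80) {
        if (f[0] & 0x40)
            return -1;                     /* negative: not handled here */
        v = f[0] & 0x3F;
        for (i = 1; i < width; i++) {
            if (v > (INT64_MAX >> 8))
                return -1;                 /* would overflow int64_t */
            v = (v << 8) | f[i];
        }
        return v;
    }
    while (i < width && f[i] == ' ')       /* historic leading spaces */
        i++;
    for (; i < width && f[i] >= '0' && f[i] <= '7'; i++) {
        if (v > (INT64_MAX >> 3))
            return -1;
        v = v * 8 + (f[i] - '0');
    }
    return v;
}
```

A 12-byte size field holding 0x80 followed by the big-endian value 2^33 decodes to 8589934592, one more than the largest value 11 octal digits can represent.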
Another early GNU extension allowed base-64 values rather than octal.
This extension was short-lived and such archives are almost never seen.
However, there is still code in GNU tar to support them; this code is
responsible for a very cryptic warning message that is sometimes seen
when GNU tar encounters a damaged archive.
SEE ALSO
ar(1), pax(1), tar(1)

STANDARDS
The tar utility is no longer a part of POSIX or the Single UNIX
Specification. It last appeared in Version 2 of the Single UNIX
Specification (``SUSv2''). It has been supplanted in subsequent
standards by pax(1). The ustar format is currently part of the
specification for the pax(1) utility. The pax interchange file format is
new with IEEE Std 1003.1-2001 (``POSIX.1'').

HISTORY
A tar command appeared in Seventh Edition Unix, which was released in
January, 1979. It replaced the tp program from Fourth Edition Unix,
which in turn replaced the tap program from First Edition Unix. John
Gilmore's pdtar public-domain implementation (circa 1987) was highly
influential and formed the basis of GNU tar. Joerg Schilling's star
archiver is another open-source (GPL) archiver (originally developed
circa 1985) which features complete support for pax interchange format.

FreeBSD 6.0 May 20, 2004 FreeBSD 6.0