From: Peter Avalos
Date: Wed, 15 Nov 2006 23:13:45 +0000 (+0000)
Subject: Use the vendor-supplied man pages for file.1 and magic.5 instead of the
X-Git-Tag: v2.0.1~4115^2
X-Git-Url: https://gitweb.dragonflybsd.org/dragonfly.git/commitdiff_plain/59136389b5bf4e7bd88318576d916a7eb0154a1d

Use the vendor-supplied man pages for file.1 and magic.5 instead of the
stale ones we have now.

Reviewed-by: vbd
---
diff --git a/contrib/file-4/doc/file.man b/contrib/file-4/doc/file.man
new file mode 100644
index 0000000000..ce8caccf93
--- /dev/null
+++ b/contrib/file-4/doc/file.man
@@ -0,0 +1,530 @@
+.TH FILE __CSECTION__ "Copyright but distributable"
+.\" $Id: file.man,v 1.58 2006/05/03 19:20:25 christos Exp $
+.SH NAME
+file
+\- determine file type
+.SH SYNOPSIS
+.B file
+[
+.B \-bchikLnNprsvz
+]
+[
+.B \-f
+.I namefile
+]
+[
+.B \-F
+.I separator
+]
+[
+.B \-m
+.I magicfiles
+]
+.I file
+\&...
+.br
+.B file
+.B \-C
+[
+.B \-m
+magicfile ]
+.SH DESCRIPTION
+This manual page documents version __VERSION__ of the
+.B file
+command.
+.PP
+.B File
+tests each argument in an attempt to classify it.
+There are three sets of tests, performed in this order:
+filesystem tests, magic number tests, and language tests.
+The
+.I first
+test that succeeds causes the file type to be printed.
+.PP
+The type printed will usually contain one of the words
+.B text
+(the file contains only
+printing characters and a few common control
+characters and is probably safe to read on an
+.SM ASCII
+terminal),
+.B executable
+(the file contains the result of compiling a program
+in a form understandable to some \s-1UNIX\s0 kernel or another),
+or
+.B data
+meaning anything else (data is usually `binary' or non-printable).
+Exceptions are well-known file formats (core files, tar archives)
+that are known to contain binary data.
+When modifying the file
+.I __MAGIC__
+or the program itself,
+.B "preserve these keywords" .
+People depend on knowing that all the readable files in a directory
+have the word ``text'' printed.
+Don't do as Berkeley did and change ``shell commands text''
+to ``shell script''.
+Note that the file
+.I __MAGIC__
+is built mechanically from a large number of small files in
+the subdirectory
+.I Magdir
+in the source distribution of this program.
+.PP
+The filesystem tests are based on examining the return from a
+.BR stat (2)
+system call.
+The program checks to see if the file is empty,
+or if it's some sort of special file.
+Any known file types appropriate to the system you are running on
+(sockets, symbolic links, or named pipes (FIFOs) on those systems that
+implement them)
+are intuited if they are defined in
+the system header file
+.IR <sys/stat.h> .
+.PP
+The magic number tests are used to check for files with data in
+particular fixed formats.
+The canonical example of this is a binary executable (compiled program)
+.I a.out
+file, whose format is defined in
+.I a.out.h
+and possibly
+.I exec.h
+in the standard include directory.
+These files have a `magic number' stored in a particular place
+near the beginning of the file that tells the \s-1UNIX\s0 operating system
+that the file is a binary executable, and which of several types thereof.
+The concept of `magic number' has been applied by extension to data files.
+Any file with some invariant identifier at a small fixed
+offset into the file can usually be described in this way.
+The information identifying these files is read from the compiled
+magic file
+.I __MAGIC__.mgc ,
+or
+.I __MAGIC__
+if the compiled file does not exist. In addition,
+.B file
+will look in
+.I $HOME/.magic.mgc ,
+or
+.I $HOME/.magic
+for magic entries.
+.PP
+If a file does not match any of the entries in the magic file,
+it is examined to see if it seems to be a text file.
+ASCII, ISO-8859-x, non-ISO 8-bit extended-ASCII character sets
+(such as those used on Macintosh and IBM PC systems),
+UTF-8-encoded Unicode, UTF-16-encoded Unicode, and EBCDIC
+character sets can be distinguished by the different
+ranges and sequences of bytes that constitute printable text
+in each set.
+If a file passes any of these tests, its character set is reported.
+ASCII, ISO-8859-x, UTF-8, and extended-ASCII files are identified
+as ``text'' because they will be mostly readable on nearly any terminal;
+UTF-16 and EBCDIC are only ``character data'' because, while
+they contain text, it is text that will require translation
+before it can be read.
+In addition,
+.B file
+will attempt to determine other characteristics of text-type files.
+If the lines of a file are terminated by CR, CRLF, or NEL, instead
+of the Unix-standard LF, this will be reported.
+Files that contain embedded escape sequences or overstriking
+will also be identified.
+.PP
+Once
+.B file
+has determined the character set used in a text-type file,
+it will
+attempt to determine in what language the file is written.
+The language tests look for particular strings (cf
+.IR names.h )
+that can appear anywhere in the first few blocks of a file.
+For example, the keyword
+.B .br
+indicates that the file is most likely a
+.BR troff (1)
+input file, just as the keyword
+.B struct
+indicates a C program.
+These tests are less reliable than the previous
+two groups, so they are performed last.
+The language test routines also test for some miscellany
+(such as
+.BR tar (1)
+archives).
+.PP
+Any file that cannot be identified as having been written
+in any of the character sets listed above is simply said to be ``data''.
+.SH OPTIONS
+.TP 8
+.B "\-b, \-\-brief"
+Do not prepend filenames to output lines (brief mode).
+.TP 8
+.B "\-c, \-\-checking\-printout"
+Cause a checking printout of the parsed form of the magic file.
+This is usually used in conjunction with
+.B \-m
+to debug a new magic file before installing it.
+.TP 8
+.B "\-C, \-\-compile"
+Write a magic.mgc output file that contains a pre-parsed version of
+the magic file.
+.TP 8
+.BI "\-f, \-\-files\-from" " namefile"
+Read the names of the files to be examined from
+.I namefile
+(one per line)
+before the argument list.
+Either
+.I namefile
+or at least one filename argument must be present;
+to test the standard input, use ``\-'' as a filename argument.
+.TP 8
+.BI "\-F, \-\-separator" " separator"
+Use the specified string as the separator between the filename and the
+file result returned. Defaults to ``:''.
+.TP 8
+.B "\-h, \-\-no\-dereference"
+Causes symlinks not to be followed
+(on systems that support symbolic links). This is the default if the
+environment variable
+.I POSIXLY_CORRECT
+is not defined.
+.TP 8
+.B "\-i, \-\-mime"
+Causes the file command to output MIME type strings rather than the more
+traditional human-readable ones. Thus it may say
+``text/plain; charset=us-ascii''
+rather
+than ``ASCII text''.
+In order for this option to work, file changes the way
+it handles files recognized by the command itself (such as many of the
+text file types, directories, etc.), and makes use of an alternative
+``magic'' file.
+(See the ``FILES'' section below.)
+.TP 8
+.B "\-k, \-\-keep\-going"
+Don't stop at the first match; keep going.
+.TP 8
+.B "\-L, \-\-dereference"
+Causes symlinks to be followed, like the option of the same name in
+.BR ls (1)
+(on systems that support symbolic links).
+This is the default if the environment variable
+.I POSIXLY_CORRECT
+is defined.
+.TP 8
+.BI "\-m, \-\-magic\-file" " list"
+Specify an alternate list of files containing magic numbers.
+This can be a single file, or a colon-separated list of files.
+If a compiled magic file is found alongside, it will be used instead.
+With the \-i or \-\-mime option, the program adds ".mime" to each file name.
+.TP 8
+.B "\-n, \-\-no\-buffer"
+Force stdout to be flushed after checking each file.
+This is only useful if checking a list of files.
+It is intended to be used by programs that want file type output from a pipe.
+.TP 8
+.B "\-N, \-\-no\-pad"
+Don't pad filenames so that they align in the output.
+.TP 8
+.B "\-p, \-\-preserve\-date"
+On systems that support
+.BR utime (2)
+or
+.BR utimes (2),
+attempt to preserve the access time of files analyzed, to pretend that
+.B file
+never read them.
+.TP 8
+.B "\-r, \-\-raw"
+Don't translate unprintable characters to \eooo.
+Normally
+.B file
+translates unprintable characters to their octal representation.
+.TP 8
+.B "\-s, \-\-special\-files"
+Normally,
+.B file
+only attempts to read and determine the type of argument files which
+.BR stat (2)
+reports are ordinary files.
+This prevents problems, because reading special files may have peculiar
+consequences.
+Specifying the
+.BR \-s
+option causes
+.B file
+to also read argument files which are block or character special files.
+This is useful for determining the filesystem types of the data in raw
+disk partitions, which are block special files.
+This option also causes
+.B file
+to disregard the file size as reported by
+.BR stat (2)
+since on some systems it reports a zero size for raw disk partitions.
+.TP 8
+.B "\-v, \-\-version"
+Print the version of the program and exit.
+.TP 8
+.B "\-z, \-\-uncompress"
+Try to look inside compressed files.
+.TP 8
+.B "\-\-help"
+Print a help message and exit.
+.SH FILES
+.TP
+.I __MAGIC__.mgc
+Default compiled list of magic numbers
+.TP
+.I __MAGIC__
+Default list of magic numbers
+.TP
+.I __MAGIC__.mime.mgc
+Default compiled list of magic numbers, used to output MIME types when
+the \-i option is specified.
+.TP
+.I __MAGIC__.mime
+Default list of magic numbers, used to output MIME types when the \-i option
+is specified.
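The lookup rule spread across DESCRIPTION, the -m option and the FILES list (a compiled .mgc file is preferred when it sits alongside the plain-text magic file, and -i appends ".mime" to the name first) can be sketched in C. This is a hypothetical helper, not code taken from file itself: the name choose_magic_file, its return convention, and the use of access(2) to test readability are assumptions, and it deliberately ignores $HOME/.magic, the MAGIC environment variable, and colon-separated lists.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical sketch (not file's actual source): decide which magic file
 * to load for the base name BASE. When MIME is non-zero, ".mime" is
 * appended first, mirroring what -i does. A compiled "<name>.mgc" is
 * preferred if it is readable; otherwise the plain-text "<name>" is used.
 *
 * On success the chosen path is written to OUT and the function returns
 * 1 (compiled file chosen) or 0 (text file chosen). It returns -1 if
 * neither file is readable or the path does not fit in OUT.
 */
static int
choose_magic_file(const char *base, int mime, char *out, size_t outlen)
{
	const char *suffix = mime ? ".mime" : "";
	int n;

	assert(base != NULL && out != NULL);

	/* First choice: the pre-parsed (compiled) file, e.g. "magic.mgc". */
	n = snprintf(out, outlen, "%s%s.mgc", base, suffix);
	if (n >= 0 && (size_t)n < outlen && access(out, R_OK) == 0)
		return 1;

	/* Fallback: the plain-text source form, e.g. "magic". */
	n = snprintf(out, outlen, "%s%s", base, suffix);
	if (n >= 0 && (size_t)n < outlen && access(out, R_OK) == 0)
		return 0;

	if (outlen > 0)
		out[0] = '\0';
	return -1;
}
```

With both magic and magic.mgc readable in the current directory, choose_magic_file("magic", 0, buf, sizeof buf) returns 1 and leaves "magic.mgc" in buf; with only magic present it returns 0.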
+
+.SH ENVIRONMENT
+The environment variable
+.B MAGIC
+can be used to set the default magic number file name.
+If that variable is set, then
+.B file
+will not attempt to open
+.B $HOME/.magic .
+.B file
+adds ".mime" and/or ".mgc" to the value of this variable as appropriate.
+The environment variable
+.B POSIXLY_CORRECT
+controls (on systems that support symbolic links) whether
+.B file
+will attempt to follow symlinks. If it is set,
+.B file
+follows symlinks; otherwise it does not. This is also controlled
+by the
+.B \-L
+and
+.B \-h
+options.
+.SH SEE ALSO
+.BR magic (__FSECTION__)
+\- description of magic file format.
+.br
+.BR strings (1), " od" (1), " hexdump" (1)
+\- tools for examining non-text files.
+.SH STANDARDS CONFORMANCE
+This program is believed to exceed the System V Interface Definition
+of FILE(CMD), as near as one can determine from the vague language
+contained therein.
+Its behavior is mostly compatible with the System V program of the same name.
+This version knows more magic, however, so it will produce
+different (albeit more accurate) output in many cases.
+.\" URL: http://www.opengroup.org/onlinepubs/009695399/utilities/file.html
+.PP
+The one significant difference
+between this version and System V
+is that this version treats any white space
+as a delimiter, so that spaces in pattern strings must be escaped.
+For example,
+.br
+>10 string language impress\ (imPRESS data)
+.br
+in an existing magic file would have to be changed to
+.br
+>10 string language\e impress (imPRESS data)
+.br
+In addition, in this version, if a pattern string contains a backslash,
+it must be escaped.
+For example,
+.br
+0 string \ebegindata Andrew Toolkit document
+.br
+in an existing magic file would have to be changed to
+.br
+0 string \e\ebegindata Andrew Toolkit document
+.br
+.PP
+SunOS releases 3.2 and later from Sun Microsystems include a
+.BR file (1)
+command derived from the System V one, but with some extensions.
+My version differs from Sun's only in minor ways.
+It includes the extension of the `&' operator, used as,
+for example,
+.br
+>16 long&0x7fffffff >0 not stripped
+.SH MAGIC DIRECTORY
+The magic file entries have been collected from various sources,
+mainly USENET, and contributed by various authors.
+Christos Zoulas (address below) will collect additional
+or corrected magic file entries.
+A consolidation of magic file entries
+will be distributed periodically.
+.PP
+The order of entries in the magic file is significant.
+Depending on what system you are using, the order that
+they are put together may be incorrect.
+If your old
+.B file
+command uses a magic file,
+keep the old magic file around for comparison purposes
+(rename it to
+.IR __MAGIC__.orig ).
+.SH EXAMPLES
+.nf
+$ file file.c file /dev/{wd0a,hda}
+file.c: C program text
+file: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV),
+ dynamically linked (uses shared libs), stripped
+/dev/wd0a: block special (0/0)
+/dev/hda: block special (3/0)
+$ file -s /dev/wd0{b,d}
+/dev/wd0b: data
+/dev/wd0d: x86 boot sector
+$ file -s /dev/hda{,1,2,3,4,5,6,7,8,9,10}
+/dev/hda: x86 boot sector
+/dev/hda1: Linux/i386 ext2 filesystem
+/dev/hda2: x86 boot sector
+/dev/hda3: x86 boot sector, extended partition table
+/dev/hda4: Linux/i386 ext2 filesystem
+/dev/hda5: Linux/i386 swap file
+/dev/hda6: Linux/i386 swap file
+/dev/hda7: Linux/i386 swap file
+/dev/hda8: Linux/i386 swap file
+/dev/hda9: empty
+/dev/hda10: empty
+
+$ file -i file.c file /dev/{wd0a,hda}
+file.c: text/x-c
+file: application/x-executable, dynamically linked (uses shared libs),
+not stripped
+/dev/hda: application/x-not-regular-file
+/dev/wd0a: application/x-not-regular-file
+
+.fi
+.SH HISTORY
+There has been a
+.B file
+command in every \s-1UNIX\s0 since at least Research Version 4
+(man page dated November, 1973).
+The System V version introduced one significant change:
+the external list of magic number types.
+This slowed the program down slightly but made it a lot more flexible.
+.PP
+This program, based on the System V version,
+was written by Ian Darwin
+without looking at anybody else's source code.
+.PP
+John Gilmore revised the code extensively, making it better than
+the first version.
+Geoff Collyer found several inadequacies
+and provided some magic file entries.
+The `&' operator was contributed by Rob McMahon, cudcv@warwick.ac.uk, 1989.
+.PP
+Guy Harris, guy@netapp.com, made many changes from 1993 to the present.
+.PP
+Primary development and maintenance from 1990 to the present by
+Christos Zoulas (christos@astron.com).
+.PP
+Altered by Chris Lowth, chris@lowth.com, 2000:
+Handle the ``\-i'' option to output MIME type strings, using an alternative
+magic file and internal logic.
+.PP
+Altered by Eric Fischer (enf@pobox.com), July, 2000,
+to identify character codes and attempt to identify the languages
+of non-ASCII files.
+.PP
+The list of contributors to the "Magdir" directory (source for the
+.I __MAGIC__
+file) is too long to include here.
+You know who you are; thank you.
+.SH LEGAL NOTICE
+Copyright (c) Ian F. Darwin, Toronto, Canada, 1986-1999.
+Covered by the standard Berkeley Software Distribution copyright; see the file
+LEGAL.NOTICE in the source distribution.
+.PP
+The files
+.I tar.h
+and
+.I is_tar.c
+were written by John Gilmore from his public-domain
+.B tar
+program, and are not covered by the above license.
+.SH BUGS
+There must be a better way to automate the construction of the Magic
+file from all the glop in Magdir.
+What is it?
+Better yet, the magic file should be compiled into binary (say,
+.BR ndbm (3)
+or, better yet, fixed-length
+.SM ASCII
+strings for use in heterogeneous network environments) for faster startup.
+Then the program would run as fast as the Version 7 program of the same name,
+with the flexibility of the System V version.
+.PP
+.B File
+uses several algorithms that favor speed over accuracy,
+so it can be misled about the contents of
+text
+files.
+.PP
+The support for
+text
+files (primarily for programming languages)
+is simplistic, inefficient and requires recompilation to update.
+.PP
+There should be an ``else'' clause to follow a series of continuation lines.
+.PP
+The magic file and keywords should have regular expression support.
+Their use of
+.SM "ASCII TAB"
+as a field delimiter is ugly and makes
+it hard to edit the files, but is entrenched.
+.PP
+It might be advisable to allow upper-case letters in keywords,
+e.g., for
+.BR troff (1)
+commands vs man page macros.
+Regular expression support would make this easy.
+.PP
+The program doesn't grok \s-2FORTRAN\s0.
+It should be able to recognize \s-2FORTRAN\s0 by seeing some keywords which
+appear indented at the start of a line.
+Regular expression support would make this easy.
+.PP
+The list of keywords in
+.I ascmagic
+probably belongs in the Magic file.
+This could be done by using some keyword like `*' for the offset value.
+.PP
+Another optimization would be to sort
+the magic file so that we can just run down all the
+tests for the first byte, first word, first long, etc., once we
+have fetched it.
+Complain about conflicts in the magic file entries.
+Make a rule that the magic entries sort based on file offset rather
+than position within the magic file?
+.PP
+The program should provide a way to give an estimate
+of ``how good'' a guess is.
+We end up removing guesses (e.g., ``From '' as the first 5 chars of a file) because
+they are not as good as other guesses (e.g., ``Newsgroups:'' versus
+``Return-Path:'').
+Still, if the others don't pan out, it should be possible to use the
+first guess.
+.PP
+This program is slower than some vendors' file commands.
+The new support for multiple character codes makes it even slower.
+.PP
+This manual page, and particularly this section, is too long.
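The character-set tests described under DESCRIPTION (telling text apart from ``data'' by the ranges and sequences of bytes that make up printable text) can be illustrated with a short C sketch. It is not file's own algorithm: it only distinguishes plain ASCII, well-formed UTF-8, and everything else, and the set of control characters it treats as "common" (BEL, BS, HT, LF, VT, FF, CR and ESC) is an assumption made for this example. The names classify_text and enum text_kind are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum text_kind { TEXT_ASCII, TEXT_UTF8, TEXT_OTHER };

/* Assumed set of acceptable ASCII bytes: printable characters plus
 * BEL, BS, HT, LF, VT, FF, CR (0x07-0x0d) and ESC (0x1b). */
static int
is_text_ascii(unsigned char c)
{
	return (c >= 0x20 && c < 0x7f) || (c >= 0x07 && c <= 0x0d) ||
	    c == 0x1b;
}

/* Hypothetical sketch: classify BUF as ASCII text, UTF-8 text, or other. */
static enum text_kind
classify_text(const unsigned char *buf, size_t len)
{
	size_t i = 0;
	int saw_multibyte = 0;

	assert(buf != NULL || len == 0);

	while (i < len) {
		unsigned char c = buf[i];
		unsigned long cp;
		size_t k, ncont;

		if (c < 0x80) {
			if (!is_text_ascii(c))
				return TEXT_OTHER;	/* NUL, DEL, other controls */
			i++;
			continue;
		}
		/* The lead byte determines how many continuation bytes follow. */
		if (c >= 0xc2 && c <= 0xdf) {
			ncont = 1;
			cp = c & 0x1f;
		} else if (c >= 0xe0 && c <= 0xef) {
			ncont = 2;
			cp = c & 0x0f;
		} else if (c >= 0xf0 && c <= 0xf4) {
			ncont = 3;
			cp = c & 0x07;
		} else {
			return TEXT_OTHER;	/* 0x80-0xc1, 0xf5-0xff never start UTF-8 */
		}
		if (len - i - 1 < ncont)
			return TEXT_OTHER;	/* truncated sequence */
		for (k = 1; k <= ncont; k++) {
			if ((buf[i + k] & 0xc0) != 0x80)
				return TEXT_OTHER;	/* not a continuation byte */
			cp = (cp << 6) | (buf[i + k] & 0x3f);
		}
		/* Reject overlong forms, UTF-16 surrogates, and values past U+10FFFF. */
		if ((ncont == 2 && cp < 0x800) || (ncont == 3 && cp < 0x10000) ||
		    (cp >= 0xd800 && cp <= 0xdfff) || cp > 0x10ffff)
			return TEXT_OTHER;
		saw_multibyte = 1;
		i += ncont + 1;
	}
	return saw_multibyte ? TEXT_UTF8 : TEXT_ASCII;
}
```

Under these assumptions, "caf" followed by the two-byte UTF-8 encoding of e-acute is TEXT_UTF8, while any buffer containing a NUL byte is TEXT_OTHER, the case the page reports as ``data''.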
+.SH AVAILABILITY
+You can obtain the original author's latest version by anonymous FTP
+on
+.B ftp.astron.com
+as
+.I /pub/file/file-X.YZ.tar.gz
diff --git a/contrib/file-4/doc/magic.man b/contrib/file-4/doc/magic.man
new file mode 100644
index 0000000000..9bd9fa797f
--- /dev/null
+++ b/contrib/file-4/doc/magic.man
@@ -0,0 +1,425 @@
+.TH MAGIC __FSECTION__ "Public Domain"
+.\" install as magic.4 on USG, magic.5 on V7 or Berkeley systems.
+.SH NAME
+magic \- file command's magic number file
+.SH DESCRIPTION
+This manual page documents the format of the magic file as
+used by the
+.BR file (__CSECTION__)
+command, version __VERSION__.
+The
+.BR file
+command identifies the type of a file using,
+among other tests,
+a test for whether the file begins with a certain
+.IR "magic number" .
+The file
+.I __MAGIC__
+specifies what magic numbers are to be tested for,
+what message to print if a particular magic number is found,
+and additional information to extract from the file.
+.PP
+Each line of the file specifies a test to be performed.
+A test compares the data starting at a particular offset
+in the file with a 1-byte, 2-byte, 4-byte, or 8-byte numeric value or
+a string.
+If the test succeeds, a message is printed.
+The line consists of the following fields:
+.IP offset \w'message'u+2n
+A number specifying the offset, in bytes, into the file of the data
+which is to be tested.
+.IP type
+The type of the data to be tested.
+The possible values are:
+.RS
+.IP byte \w'message'u+2n
+A one-byte value.
+.IP short
+A two-byte value (on most systems) in this machine's native byte order.
+.IP long
+A four-byte value (on most systems) in this machine's native byte order.
+.IP quad
+An eight-byte value (on most systems) in this machine's native byte order.
+.IP string
+A string of bytes.
+The string type specification can be optionally followed
+by /[Bbc]*.
+The ``B'' flag compacts whitespace in the target, which must
+contain at least one whitespace character.
+If the magic has
+.I n
+consecutive blanks, the target needs at least
+.I n
+consecutive blanks to match.
+The ``b'' flag treats every blank in the target as an optional blank.
+Finally, the ``c'' flag specifies case-insensitive matching: lowercase
+characters in the magic match both lower- and uppercase characters in the
+target, whereas uppercase characters in the magic match only uppercase
+characters in the target.
+.IP pstring
+A Pascal-style string where the first byte is interpreted as an
+unsigned length. The string is not NUL-terminated.
+.IP date
+A four-byte value interpreted as a UNIX date.
+.IP qdate
+An eight-byte value interpreted as a UNIX date.
+.IP ldate
+A four-byte value interpreted as a UNIX-style date, but interpreted as
+local time rather than UTC.
+.IP qldate
+An eight-byte value interpreted as a UNIX-style date, but interpreted as
+local time rather than UTC.
+.IP beshort
+A two-byte value (on most systems) in big-endian byte order.
+.IP belong
+A four-byte value (on most systems) in big-endian byte order.
+.IP bequad
+An eight-byte value (on most systems) in big-endian byte order.
+.IP bedate
+A four-byte value (on most systems) in big-endian byte order,
+interpreted as a UNIX date.
+.IP beqdate
+An eight-byte value (on most systems) in big-endian byte order,
+interpreted as a UNIX date.
+.IP beldate
+A four-byte value (on most systems) in big-endian byte order,
+interpreted as a UNIX-style date, but interpreted as local time rather
+than UTC.
+.IP beqldate
+An eight-byte value (on most systems) in big-endian byte order,
+interpreted as a UNIX-style date, but interpreted as local time rather
+than UTC.
+.IP bestring16
+A Unicode (UCS16) string of two-byte characters in big-endian byte order.
+.IP leshort
+A two-byte value (on most systems) in little-endian byte order.
+.IP lelong
+A four-byte value (on most systems) in little-endian byte order.
+.IP lequad
+An eight-byte value (on most systems) in little-endian byte order.
+.IP ledate
+A four-byte value (on most systems) in little-endian byte order,
+interpreted as a UNIX date.
+.IP leqdate
+An eight-byte value (on most systems) in little-endian byte order,
+interpreted as a UNIX date.
+.IP leldate
+A four-byte value (on most systems) in little-endian byte order,
+interpreted as a UNIX-style date, but interpreted as local time rather
+than UTC.
+.IP leqldate
+An eight-byte value (on most systems) in little-endian byte order,
+interpreted as a UNIX-style date, but interpreted as local time rather
+than UTC.
+.IP lestring16
+A Unicode (UCS16) string of two-byte characters in little-endian byte order.
+.IP melong
+A four-byte value (on most systems) in middle-endian (PDP-11) byte order.
+.IP medate
+A four-byte value (on most systems) in middle-endian (PDP-11) byte order,
+interpreted as a UNIX date.
+.IP meldate
+A four-byte value (on most systems) in middle-endian (PDP-11) byte order,
+interpreted as a UNIX-style date, but interpreted as local time rather
+than UTC.
+.IP regex
+A regular expression match in extended POSIX regular expression syntax
+(much like egrep).
+The type specification can be optionally followed by
+.B /c
+for case-insensitive matches.
+The regular expression is always
+tested against the first
+.B N
+lines, where
+.B N
+is the given offset, so it
+is only useful for (single-byte encoded) text.
+.B ^
+and
+.B $
+will match the beginning and end of individual lines, respectively,
+not the beginning and end of the file.
+.IP search
+A literal string search starting at the given offset. It must be followed by
+.BI / number
+which specifies how many matches shall be attempted (the range).
+This is suitable for searching larger binary expressions with variable
+offsets, using
+.B \e
+escapes for special characters.
+.RE
+.PP
+The numeric types may optionally be followed by
+.B &
+and a numeric value,
+to specify that the value is to be AND'ed with the
+numeric value before any comparisons are done.
+Prepending a
+.B u
+to the type indicates that ordered comparisons should be unsigned.
+.IP test
+The value to be compared with the value from the file.
+If the type is
+numeric, this value
+is specified in C form; if it is a string, it is specified as a C string
+with the usual escapes permitted (e.g., \en for newline).
+.IP
+Numeric values
+may be preceded by a character indicating the operation to be performed.
+It may be
+.BR = ,
+to specify that the value from the file must equal the specified value,
+.BR < ,
+to specify that the value from the file must be less than the specified
+value,
+.BR > ,
+to specify that the value from the file must be greater than the specified
+value,
+.BR & ,
+to specify that the value from the file must have set all of the bits
+that are set in the specified value,
+.BR ^ ,
+to specify that at least one of the bits set in the specified value
+must be clear in the value from the file,
+.BR ~ ,
+to specify that the value specified after it is negated before testing, or
+.BR x ,
+to specify that any value will match.
+If the character is omitted, it is assumed to be
+.BR = .
+For all tests except
+.B string
+and
+.B regex,
+operation
+.BR !
+specifies that the line matches if the test does
+.B not
+succeed.
+.IP
+Numeric values are specified in C form; e.g.
+.B 13
+is decimal,
+.B 013
+is octal, and
+.B 0x13
+is hexadecimal.
+.IP
+For string values, the byte string from the
+file must match the specified byte string.
+The operators
+.BR = ,
+.B <
+and
+.B >
+(but not
+.BR & )
+can be applied to strings.
+The length used for matching is that of the string argument
+in the magic file.
+This means that a line can match any string, and
+then presumably print that string, by doing
+.B >\e0
+(because all strings are greater than the null string).
+.IP message
+The message to be printed if the comparison succeeds. If the string
+contains a
+.BR printf (3)
+format specification, the value from the file (with any specified masking
+performed) is printed using the message as the format string.
+.PP
+Some file formats contain additional information which is to be printed
+along with the file type or need additional tests to determine the true
+file type.
+These additional tests are introduced by one or more
+.B >
+characters preceding the offset.
+The number of
+.B >
+on the line indicates the level of the test; a line with no
+.B >
+at the beginning is considered to be at level 0.
+Tests are arranged in a tree-like hierarchy:
+If the test on a line at level
+.I n
+succeeds, all following tests at level
+.I n+1
+are performed, and the messages printed if the tests succeed, until a line
+with level
+.I n
+(or less) appears.
+For more complex files, one can use empty messages to get just the
+"if/then" effect, in the following way:
+.sp
+.nf
+ 0 string MZ
+ >0x18 leshort <0x40 MS-DOS executable
+ >0x18 leshort >0x3f extended PC executable (e.g., MS Windows)
+.fi
+.PP
+Offsets do not need to be constant, but can also be read from the file
+being examined.
+If the first character following the last
+.B >
+is a
+.B (
+then the string after the parenthesis is interpreted as an indirect offset.
+That means that the number after the parenthesis is used as an offset in
+the file.
+The value at that offset is read, and is used again as an offset
+in the file.
+Indirect offsets are of the form:
+.BI (( x [.[bslBSLm]][+\-][ y ]).
+The value of
+.I x
+is used as an offset in the file. A byte, short or long is read at that offset
+depending on the
+.B [bslBSLm]
+type specifier.
+The capitalized types interpret the number as a big-endian
+value, whereas the small letter versions interpret the number as a little-endian
+value;
+the
+.B m
+type interprets the number as a middle-endian (PDP-11) value.
+To that number the value of
+.I y
+is added and the result is used as an offset in the file.
+The default type if one is not specified is long.
+.PP
+That way, variable-length structures can be examined:
+.sp
+.nf
+ # MS Windows executables are also valid MS-DOS executables
+ 0 string MZ
+ >0x18 leshort <0x40 MZ executable (MS-DOS)
+ # skip the whole block below if it is not an extended executable
+ >0x18 leshort >0x3f
+ >>(0x3c.l) string PE\e0\e0 PE executable (MS-Windows)
+ >>(0x3c.l) string LX\e0\e0 LX executable (OS/2)
+.fi
+.PP
+This strategy of examining has one drawback: you must make sure that
+you eventually print something, or users may get empty output (for example, when
+there is neither PE\e0\e0 nor LX\e0\e0 in the above example).
+.PP
+If this indirect offset cannot be used as-is, there are simple calculations
+possible: appending an operator from
+.BI [+-*/%&|^]
+and a number inside the parentheses allows one to modify
+the value read from the file before it is used as an offset:
+.sp
+.nf
+ # MS Windows executables are also valid MS-DOS executables
+ 0 string MZ
+ # sometimes, the value at 0x18 is less than 0x40 but there's still an
+ # extended executable, simply appended to the file
+ >0x18 leshort <0x40
+ >>(4.s*512) leshort 0x014c COFF executable (MS-DOS, DJGPP)
+ >>(4.s*512) leshort !0x014c MZ executable (MS-DOS)
+.fi
+.PP
+Sometimes you do not know the exact offset because it depends on the length or
+position (when indirection was used before) of preceding fields. You can
+specify an offset relative to the end of the last up-level field using
+.BI &
+as a prefix to the offset:
+.sp
+.nf
+ 0 string MZ
+ >0x18 leshort >0x3f
+ >>(0x3c.l) string PE\e0\e0 PE executable (MS-Windows)
+ # immediately following the PE signature is the CPU type
+ >>>&0 leshort 0x14c for Intel 80386
+ >>>&0 leshort 0x184 for DEC Alpha
+.fi
+.PP
+Indirect and relative offsets can be combined:
+.sp
+.nf
+ 0 string MZ
+ >0x18 leshort <0x40
+ >>(4.s*512) leshort !0x014c MZ executable (MS-DOS)
+ # if it's not COFF, go back 512 bytes and add the offset taken
+ # from byte 2/3, which is yet another way of finding the start
+ # of the extended executable
+ >>>&(2.s-514) string LE LE executable (MS Windows VxD driver)
+.fi
+.PP
+Or the other way around:
+.sp
+.nf
+ 0 string MZ
+ >0x18 leshort >0x3f
+ >>(0x3c.l) string LE\e0\e0 LE executable (MS-Windows)
+ # at offset 0x80 (-4, since relative offsets start at the end
+ # of the up-level match) inside the LE header, we find the absolute
+ # offset to the code area, where we look for a specific signature
+ >>>(&0x7c.l+0x26) string UPX \eb, UPX compressed
+.fi
+.PP
+Or even both!
+.sp
+.nf
+ 0 string MZ
+ >0x18 leshort >0x3f
+ >>(0x3c.l) string LE\e0\e0 LE executable (MS-Windows)
+ # at offset 0x58 inside the LE header, we find the relative offset
+ # to a data area where we look for a specific signature
+ >>>&(&0x54.l-3) string UNACE \eb, ACE self-extracting archive
+.fi
+.PP
+Finally, if you have to deal with offset/length pairs in your file, even the
+second value in a parenthesized expression can be taken from the file itself,
+using another set of parentheses. Note that this additional indirect offset
+is always relative to the start of the main indirect offset.
+.sp
+.nf
+ 0 string MZ
+ >0x18 leshort >0x3f
+ >>(0x3c.l) string PE\e0\e0 PE executable (MS-Windows)
+ # search for the PE section called ".idata"...
+ >>>&0xf4 search/0x140 .idata
+ # ...and go to the end of it, calculated from start+length;
+ # these are located 14 and 10 bytes after the section name
+ >>>>(&0xe.l+(-4)) string PK\e3\e4 \eb, ZIP self-extracting archive
+.fi
+.SH BUGS
+The formats
+.IR long ,
+.IR belong ,
+.IR lelong ,
+.IR melong ,
+.IR short ,
+.IR beshort ,
+.IR leshort ,
+.IR date ,
+.IR bedate ,
+.IR medate ,
+.IR ledate ,
+.IR beldate ,
+.IR leldate ,
+and
+.I meldate
+are system-dependent; perhaps they should be specified as a number
+of bytes (2B, 4B, etc.),
+since the files being recognized typically come from
+a system on which the lengths are invariant.
+.SH SEE ALSO
+.BR file (__CSECTION__)
+\- the command that reads this file.
+.\"
+.\" From: guy@sun.uucp (Guy Harris)
+.\" Newsgroups: net.bugs.usg
+.\" Subject: /etc/magic's format isn't well documented
+.\" Message-ID: <2752@sun.uucp>
+.\" Date: 3 Sep 85 08:19:07 GMT
+.\" Organization: Sun Microsystems, Inc.
+.\" Lines: 136
+.\"
+.\" Here's a manual page for the format accepted by the "file" made by adding
+.\" the changes I posted to the S5R2 version.
+.\"
+.\" Modified for Ian Darwin's version of the file command.
+.\" @(#)$Id: magic.man,v 1.33 2006/10/31 19:37:16 christos Exp $
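To make the level, indirect-offset and relative-offset rules in magic.man concrete, here is a hypothetical C function that hard-codes the PE example from the text instead of interpreting a magic file. It is not how file evaluates magic entries; the function names and the choice to return the combined message as a string are assumptions. It follows the rules as the page states them: a lowercase .l specifier reads a little-endian long, &0 means "just past the end of the up-level match", and ordered comparisons without a u prefix are signed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a little-endian 16-bit value at OFF; -1 if out of bounds. */
static int
get_le16(const unsigned char *buf, size_t len, size_t off, uint32_t *v)
{
	if (off > len || len - off < 2)
		return -1;
	*v = (uint32_t)buf[off] | (uint32_t)buf[off + 1] << 8;
	return 0;
}

/* Read a little-endian 32-bit value at OFF; -1 if out of bounds. */
static int
get_le32(const unsigned char *buf, size_t len, size_t off, uint32_t *v)
{
	if (off > len || len - off < 4)
		return -1;
	*v = (uint32_t)buf[off] | (uint32_t)buf[off + 1] << 8 |
	    (uint32_t)buf[off + 2] << 16 | (uint32_t)buf[off + 3] << 24;
	return 0;
}

/*
 * Hypothetical hand-coded equivalent of these entries:
 *   0           string   MZ
 *   >0x18       leshort  >0x3f
 *   >>(0x3c.l)  string   PE\0\0  PE executable (MS-Windows)
 *   >>>&0       leshort  0x14c   for Intel 80386
 *   >>>&0       leshort  0x184   for DEC Alpha
 * Returns NULL if the level-0 test fails, "" if it matches but nothing
 * below it prints a message, or the combined message otherwise.
 */
static const char *
classify_mz(const unsigned char *buf, size_t len)
{
	uint32_t v, pe, cpu;
	long s;

	assert(buf != NULL || len == 0);

	/* Level 0: "MZ" at offset 0. */
	if (len < 2 || memcmp(buf, "MZ", 2) != 0)
		return NULL;

	/* Level 1: leshort at 0x18 must be > 0x3f, compared as signed
	 * because the type has no "u" prefix. */
	if (get_le16(buf, len, 0x18, &v) != 0)
		return "";
	s = v >= 0x8000 ? (long)v - 0x10000 : (long)v;
	if (s <= 0x3f)
		return "";

	/* Level 2: indirect offset (0x3c.l): the little-endian long at 0x3c
	 * is used as the offset of the "PE\0\0" signature. */
	if (get_le32(buf, len, 0x3c, &pe) != 0 || pe > len ||
	    len - pe < 4 || memcmp(buf + pe, "PE\0\0", 4) != 0)
		return "";

	/* Level 3: relative offset &0, i.e. just past the 4-byte match. */
	if (get_le16(buf, len, (size_t)pe + 4, &cpu) == 0) {
		if (cpu == 0x14c)
			return "PE executable (MS-Windows) for Intel 80386";
		if (cpu == 0x184)
			return "PE executable (MS-Windows) for DEC Alpha";
	}
	return "PE executable (MS-Windows)";
}
```

The "" result corresponds to the drawback the page mentions: the top-level test matched, but no deeper test printed anything, so the user would see empty output.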