1 .\" $FreeBSD: src/contrib/awk/doc/awk.1,v 1.5 1999/09/27 08:57:04 sheldonh Exp $
2 .\" $DragonFly: src/contrib/awk/doc/Attic/awk.1,v 1.2 2003/06/17 04:23:58 dillon Exp $
6 .TH GAWK 1 "Apr 28 1999" "Free Software Foundation" "Utility Commands"
8 gawk \- pattern scanning and processing language
11 [ POSIX or GNU style options ]
19 [ POSIX or GNU style options ]
27 is the GNU Project's implementation of the AWK programming language.
28 It conforms to the definition of the language in
29 the \*(PX 1003.2 Command Language And Utilities Standard.
30 This version in turn is based on the description in
31 .IR "The AWK Programming Language" ,
32 by Aho, Kernighan, and Weinberger,
33 with the additional features found in the System V Release 4 version
37 also provides more recent Bell Labs
39 extensions, and some GNU-specific extensions.
41 The command line consists of options to
43 itself, the AWK program text (if not supplied via the
47 options), and values to be made
52 pre-defined AWK variables.
56 options may be either the traditional \*(PX one letter options,
57 or the GNU style long options. \*(PX options start with a single ``\-'',
58 while long options start with ``\-\^\-''.
59 Long options are provided for both GNU-specific features and
60 for \*(PX mandated features.
62 Following the \*(PX standard,
64 options are supplied via arguments to the
68 options may be supplied
71 option has a corresponding long option, as detailed below.
72 Arguments to long options are either joined with the option
75 sign, with no intervening spaces, or they may be provided in the
76 next command line argument.
77 Long options may be abbreviated, as long as the abbreviation
82 accepts the following options.
88 .BI \-\^\-field-separator " fs"
91 for the input field separator (the value of the
97 \fB\-v\fI var\fB\^=\^\fIval\fR
100 \fB\-\^\-assign \fIvar\fB\^=\^\fIval\fR
105 before execution of the program begins.
106 Such variable values are available to the
108 block of an AWK program.
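A minimal sketch of -F and -v working together. It assumes a POSIX-compatible awk (gawk or another implementation) on PATH. The shell function sum_field is a hypothetical helper, not part of gawk. -F sets the input field separator, and -v assigns col before any input is read.

```shell
#!/bin/sh
# sum_field is a hypothetical helper, not part of gawk.
# $1: field separator (passed to -F), $2: field number (passed to -v).
sum_field() {
  awk -F "$1" -v col="$2" '
    BEGIN { total = 0 }   # -v assignments are done before BEGIN runs
    { total += $col }     # a variable selects which field to add
    END { print total }'
}

printf 'a:1\nb:2\nc:39\n' | sum_field ':' 2   # prints 42
```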
111 .BI \-f " program-file"
114 .BI \-\^\-file " program-file"
115 Read the AWK program source from the file
117 instead of from the first command line argument.
129 Set various memory limits to the value
133 flag sets the maximum number of fields, and the
135 flag sets the maximum record size. These two flags and the
137 option are from the Bell Labs research version of \*(UX
143 has no pre-defined limits.
158 mode. In compatibility mode,
160 behaves identically to \*(UX
162 none of the GNU-specific extensions are recognized.
165 is preferred over the other forms of this option.
167 .BR "GNU EXTENSIONS" ,
168 below, for more information.
181 Print the short version of the GNU copyright information message on
182 the standard output, and exit successfully.
195 Print a relatively short summary of the available options on
198 .IR "GNU Coding Standards" ,
199 these options cause an immediate, successful exit.)
206 Provide warnings about constructs that are
207 dubious or non-portable to other AWK implementations.
214 Provide warnings about constructs that are
215 not portable to the original version of Unix
218 .\" This option is left undocumented, on purpose.
225 Provide a moment of nostalgia for long-time
237 mode, with the following additional restrictions:
242 escape sequences are not recognized.
245 Only space and tab act as field separators when
247 is set to a single space; newline does not.
261 cannot be used in place of
269 function is not available.
273 .B "\-W re\-interval"
276 .B \-\^\-re\-interval
278 .I "interval expressions"
279 in regular expression matching
281 .BR "Regular Expressions" ,
283 Interval expressions were not traditionally available in the
284 AWK language. The POSIX standard added them, to make
288 consistent with each other.
289 However, their use is likely
290 to break old AWK programs, so
292 only provides them if they are requested with this option, or when
297 .BI "\-W source " program-text
300 .BI \-\^\-source " program-text"
303 as AWK program source code.
304 This option allows the easy intermixing of library functions (used via the
308 options) with source code entered on the command line.
309 It is intended primarily for medium to large AWK programs used
317 Print version information for this particular copy of
319 on the standard output.
320 This is useful mainly for knowing if the current copy of
323 is up to date with respect to whatever the Free Software Foundation
325 This is also useful when reporting bugs.
327 .IR "GNU Coding Standards" ,
328 these options cause an immediate, successful exit.)
331 Signal the end of options. This is useful to allow further arguments to the
332 AWK program itself to start with a ``\-''.
333 This is mainly for consistency with the argument parsing convention used
334 by most other \*(PX programs.
336 In compatibility mode,
337 any other options are flagged as illegal, but are otherwise ignored.
338 In normal operation, as long as program text has been supplied, unknown
339 options are passed on to the AWK program in the
341 array for processing. This is particularly useful for running AWK
342 programs via the ``#!'' executable interpreter mechanism.
343 .SH AWK PROGRAM EXECUTION
345 An AWK program consists of a sequence of pattern-action statements
346 and optional function definitions.
349 \fIpattern\fB { \fIaction statements\fB }\fR
351 \fBfunction \fIname\fB(\fIparameter list\fB) { \fIstatements\fB }\fR
355 first reads the program source from the
360 or from the first non-option argument on the command line.
365 options may be used multiple times on the command line.
367 will read the program text as if all the
369 and command line source texts
370 had been concatenated together. This is useful for building libraries
371 of AWK functions, without having to include them in each new AWK
372 program that uses them. It also provides the ability to mix library
373 functions with command line programs.
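The following sketch combines a small function library with a separate program file. It assumes a POSIX awk and mktemp on PATH. The file names lib.awk and main.awk, the awk function double, and the shell function double_all are all hypothetical. awk reads both -f files as if they were one concatenated program.

```shell
#!/bin/sh
# double_all is a hypothetical helper; lib.awk and main.awk are
# throwaway files created only for this sketch.
double_all() {
  tmpdir=$(mktemp -d) || return 1
  # Library file: a reusable function definition, no rules.
  printf '%s\n' 'function double(n) { return 2 * n }' > "$tmpdir/lib.awk"
  # Main program: one rule that calls the library function.
  printf '%s\n' '{ print double($1) }' > "$tmpdir/main.awk"
  # Both -f files are read as though they had been concatenated.
  awk -f "$tmpdir/lib.awk" -f "$tmpdir/main.awk"
  status=$?
  rm -rf "$tmpdir"
  return $status
}

printf '3\n20\n' | double_all   # prints 6, then 40
```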
375 The environment variable
377 specifies a search path to use when finding source files named with
380 option. If this variable does not exist, the default path is
381 \fB".:/usr/local/share/awk"\fR.
382 (The actual directory may vary, depending upon how
384 was built and installed.)
385 If a file name given to the
387 option contains a ``/'' character, no path search is performed.
390 executes AWK programs in the following order.
392 all variable assignments specified via the
394 option are performed.
397 compiles the program into an internal form.
400 executes the code in the
403 and then proceeds to read
404 each file named in the
407 If there are no files named on the command line,
409 reads the standard input.
411 If a filename on the command line has the form
413 it is treated as a variable assignment. The variable
415 will be assigned the value
417 (This happens after any
419 block(s) have been run.)
420 Command line variable assignment
421 is most useful for dynamically assigning values to the variables
422 AWK uses to control how input is broken into fields and records. It
423 is also useful for controlling state if multiple passes are needed over
426 If the value of a particular element of
432 For each record in the input,
434 tests to see if it matches any
437 For each pattern that the record matches, the associated
440 The patterns are tested in the order they occur in the program.
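You can observe the execution order directly. This sketch assumes a POSIX awk and mktemp on PATH, and the input files one and two are hypothetical, created in a temporary directory. The -v value is visible in BEGIN. The command-line assignment tag=second takes effect only when awk reaches it between the two files, and END runs after all input.

```shell
#!/bin/sh
# order_demo is a hypothetical helper used only for this sketch.
order_demo() {
  tmpdir=$(mktemp -d) || return 1
  printf 'x\n' > "$tmpdir/one"
  printf 'y\n' > "$tmpdir/two"
  (
    cd "$tmpdir" &&
    awk -v tag=first '
      BEGIN { print "BEGIN tag=" tag }
            { print FILENAME ": " $0 " tag=" tag }
      END   { print "END NR=" NR }' one tag=second two
  )
  status=$?
  rm -rf "$tmpdir"
  return $status
}

order_demo
# BEGIN tag=first
# one: x tag=first
# two: y tag=second
# END NR=2
```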
442 Finally, after all the input is exhausted,
444 executes the code in the
447 .SH VARIABLES, RECORDS AND FIELDS
448 AWK variables are dynamic; they come into existence when they are
449 first used. Their values are either floating-point numbers or strings,
451 depending upon how they are used. AWK also has one-dimensional
452 arrays; arrays with multiple dimensions may be simulated.
453 Several pre-defined variables are set as a program
454 runs; these will be described as needed and summarized below.
456 Normally, records are separated by newline characters. You can control how
457 records are separated by assigning values to the built-in variable
461 is any single character, that character separates records.
464 is a regular expression. Text in the input that matches this
465 regular expression will separate the record.
466 However, in compatibility mode,
467 only the first character of its string
468 value is used for separating records.
471 is set to the null string, then records are separated by
475 is set to the null string, the newline character always acts as
476 a field separator, in addition to whatever value
481 As each input record is read,
483 splits the record into
485 using the value of the
487 variable as the field separator.
490 is a single character, fields are separated by that character.
493 is the null string, then each individual character becomes a
497 is expected to be a full regular expression.
498 In the special case that
500 is a single space, fields are separated
501 by runs of spaces and/or tabs and/or newlines.
502 (But see the discussion of
505 Note that the value of
507 (see below) will also affect how fields are split when
509 is a regular expression, and how records are separated when
511 is a regular expression.
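A short sketch of the three ways FS is interpreted, assuming a POSIX awk on PATH. split_demo is a hypothetical helper name.

```shell
#!/bin/sh
# split_demo is a hypothetical helper used only for this sketch.
split_demo() {
  # FS is a single space (the default): leading and trailing blanks are
  # ignored, and runs of spaces/tabs separate fields.
  printf '  a \t b  \n' | awk '{ print NF ": " $1 "," $2 }'
  # FS is a single character: every ":" separates, so empty fields appear.
  echo 'a::b' | awk -F: '{ print NF ": " $1 "," $2 "," $3 }'
  # FS is longer than one character: it is treated as a regular expression.
  echo 'a1b22c' | awk -F '[0-9]+' '{ print NF ": " $1 "," $2 "," $3 }'
}

split_demo
# 2: a,b
# 3: a,,b
# 3: a,b,c
```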
515 variable is set to a space-separated list of numbers, each field is
516 expected to have fixed width, and
518 will split up the record using the specified widths. The value of
521 Assigning a new value to
525 and restores the default behavior.
527 Each field in the input record may be referenced by its position,
532 is the whole record. The value of a field may be assigned to as well.
533 Fields need not be referenced by constants:
543 prints the fifth field in the input record.
546 is set to the total number of fields in the input record.
548 References to non-existent fields (i.e., fields after
550 produce the null string. However, assigning to a non-existent field
553 will increase the value of
555 create any intervening fields with the null string as their value, and
558 to be recomputed, with the fields being separated by the value of
560 References to negative-numbered fields cause a fatal error.
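The following sketch assigns to a field past $NF. It assumes a POSIX awk, and fields_demo is a hypothetical name. OFS is set to "-" so the rebuilt $0 is easy to see. Assigning $5 creates an empty $4, raises NF to 5, and rebuilds the record with OFS.

```shell
#!/bin/sh
# fields_demo is a hypothetical helper used only for this sketch.
fields_demo() {
  echo 'a b c' | awk 'BEGIN { OFS = "-" }
  {
    print NF, $0   # 3-a b c  ($0 is untouched, only the print uses OFS)
    $5 = "e"       # creates an empty $4 and rebuilds $0 with OFS
    print NF, $0   # 5-a-b-c--e
  }'
}

fields_demo
```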
563 causes the values of fields past the new value to be lost, and the value of
565 to be recomputed, with the fields being separated by the value of
567 .SS Built-in Variables
570 built-in variables are:
572 .TP \w'\fBFIELDWIDTHS\fR'u+1n
574 The number of command line arguments (does not include options to
576 or the program source).
581 of the current file being processed.
584 Array of command line arguments. The array is indexed from
588 Dynamically changing the contents of
590 can control the files used for data.
593 The conversion format for numbers, \fB"%.6g"\fR, by default.
596 An array containing the values of the current environment.
597 The array is indexed by the environment variables, each element being
598 the value of that variable (e.g., \fBENVIRON["HOME"]\fP might be
600 Changing this array does not affect the environment seen by programs which
602 spawns via redirection or the
605 (This may change in a future version of
607 .\" but don't hold your breath...
610 If a system error occurs either doing a redirection for
619 a string describing the error.
622 A whitespace-separated list of field widths. When set,
624 parses the input into fields of fixed width, instead of using the
627 variable as the field separator.
628 The fixed field width facility is still experimental; the
629 semantics may change as
634 The name of the current input file.
635 If no files are specified on the command line, the value of
640 is undefined inside the
645 The input record number in the current input file.
648 The input field separator, a space by default. See
653 Controls the case-sensitivity of all regular expression
654 and string operations. If
656 has a non-zero value, then string comparisons and
657 pattern matching in rules,
660 record separating with
675 pre-defined functions will all ignore case when doing regular expression
678 is not equal to zero,
680 matches all of the strings \fB"ab"\fP, \fB"aB"\fP, \fB"Ab"\fP,
682 As with all AWK variables, the initial value of
684 is zero, so all regular expression and string
685 operations are normally case-sensitive.
686 Under Unix, the full ISO 8859-1 Latin-1 character set is used
693 only affected regular expression operations. It now affects string
697 The number of fields in the current input record.
700 The total number of input records seen so far.
703 The output format for numbers, \fB"%.6g"\fR, by default.
706 The output field separator, a space by default.
709 The output record separator, by default a newline.
712 The input record separator, by default a newline.
715 The record terminator.
719 to the input text that matched the character or regular expression
724 The index of the first character matched by
729 The length of the string matched by
734 The character used to separate multiple subscripts in array
735 elements, by default \fB"\e034"\fR.
738 Arrays are subscripted with an expression between square brackets
740 If the expression is an expression list
741 .RI ( expr ", " expr " ...)"
742 then the array subscript is a string consisting of the
743 concatenation of the (string) value of each expression,
744 separated by the value of the
747 This facility is used to simulate multiply dimensioned
752 i = "A";\^ j = "B";\^ k = "C"
754 x[i, j, k] = "hello, world\en"
758 assigns the string \fB"hello, world\en"\fR to the element of the array
760 which is indexed by the string \fB"A\e034B\e034C"\fR. All arrays in AWK
761 are associative, i.e. indexed by string values.
769 statement to see if an array has an index consisting of a particular
781 If the array has multiple subscripts, use
782 .BR "(i, j) in array" .
786 construct may also be used in a
788 loop to iterate over all the elements of an array.
790 An element may be deleted from an array using the
795 statement may also be used to delete the entire contents of an array,
796 just by specifying the array name without a subscript.
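A sketch of simulated multi-dimensional arrays, assuming a POSIX awk; grid_demo is a hypothetical name. Calling split() with SUBSEP as the separator recovers the individual subscripts from a combined key. Testing membership with in does not create the element.

```shell
#!/bin/sh
# grid_demo is a hypothetical helper used only for this sketch.
grid_demo() {
  awk 'BEGIN {
    grid[1, 2] = "x"                        # key is "1" SUBSEP "2"
    if ((1, 2) in grid) print "found (1,2)"
    if (!((2, 1) in grid)) print "no (2,1)"
    for (key in grid) {
      split(key, part, SUBSEP)              # undo the SUBSEP join
      print part[1] "," part[2] "=" grid[key]
    }
    delete grid[1, 2]                       # remove one element
    n = 0
    for (key in grid) n++
    print "left:", n
  }'
}

grid_demo
# found (1,2)
# no (2,1)
# 1,2=x
# left: 0
```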
797 .SS Variable Typing And Conversion
800 may be (floating point) numbers, or strings, or both. How the
801 value of a variable is interpreted depends upon its context. If used in
802 a numeric expression, it will be treated as a number; if used as a string,
803 it will be treated as a string.
805 To force a variable to be treated as a number, add 0 to it; to force it
806 to be treated as a string, concatenate it with the null string.
808 When a string must be converted to a number, the conversion is accomplished
811 A number is converted to a string by using the value of
813 as a format string for
815 with the numeric value of the variable as the argument.
816 However, even though all numbers in AWK are floating-point,
819 converted as integers. Thus, given
833 has a string value of \fB"12"\fR and not \fB"12.00"\fR.
836 performs comparisons as follows:
837 If two variables are numeric, they are compared numerically.
838 If one value is numeric and the other has a string value that is a
839 ``numeric string,'' then comparisons are also done numerically.
840 Otherwise, the numeric value is converted to a string and a string
841 comparison is performed.
842 Two strings are compared, of course, as strings.
843 According to the \*(PX standard, even if two strings are
844 numeric strings, a numeric comparison is performed. However, this is
845 clearly incorrect, and
849 Note that string constants, such as \fB"57"\fP, are
851 numeric strings, they are string constants. The idea of ``numeric string''
852 only applies to fields,
859 elements and the elements of an array created by
861 that are numeric strings.
862 The basic idea is that
864 and only user input, that looks numeric,
865 should be treated that way.
867 Uninitialized variables have the numeric value 0 and the string value ""
868 (the null, or empty, string).
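The conversion and comparison rules above can be checked with a small sketch. It assumes a POSIX awk, and conv_demo is a hypothetical name. The CONVFMT value mirrors the "%2.2f" case in the text. Fields that look numeric compare numerically, string constants compare as strings, and an uninitialized variable equals both 0 and "".

```shell
#!/bin/sh
# conv_demo is a hypothetical helper used only for this sketch.
conv_demo() {
  echo '10 9' | awk '{
    CONVFMT = "%2.2f"
    a = 12; b = a ""          # integral value: converted as an integer
    c = 3.14159; d = c ""     # non-integral value: formatted with CONVFMT
    print b, d                # 12 3.14
    print ($1 > $2)           # numeric-looking fields: 10 > 9, prints 1
    print ("10" > "9")        # string constants: "10" < "9", prints 0
    z1 = (x == 0); z2 = (x == "")
    print z1, z2              # uninitialized x: 1 1
  }'
}

conv_demo
```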
869 .SH PATTERNS AND ACTIONS
870 AWK is a line-oriented language. The pattern comes first, and then the
871 action. Action statements are enclosed in
875 Either the pattern may be missing, or the action may be missing, but,
876 of course, not both. If the pattern is missing, the action will be
877 executed for every single record of input.
878 A missing action is equivalent to
884 which prints the entire record.
886 Comments begin with the ``#'' character, and continue until the
888 Blank lines may be used to separate statements.
889 Normally, a statement ends with a newline; however, this is not the
890 case for lines ending in
902 also have their statements automatically continued on the following line.
903 In other cases, a line can be continued by ending it with a ``\e'',
904 in which case the newline will be ignored.
906 Multiple statements may
907 be put on one line by separating them with a ``;''.
908 This applies to both the statements within the action part of a
909 pattern-action pair (the usual case),
910 and to the pattern-action statements themselves.
912 AWK patterns may be one of the following:
918 .BI / "regular expression" /
919 .I "relational expression"
920 .IB pattern " && " pattern
921 .IB pattern " || " pattern
922 .IB pattern " ? " pattern " : " pattern
925 .IB pattern1 ", " pattern2
932 are two special kinds of patterns which are not tested against
934 The action parts of all
936 patterns are merged as if all the statements had
937 been written in a single
939 block. They are executed before any
940 of the input is read. Similarly, all the
943 and executed when all the input is exhausted (or when an
945 statement is executed).
949 patterns cannot be combined with other patterns in pattern expressions.
953 patterns cannot have missing action parts.
956 .BI / "regular expression" /
957 patterns, the associated statement is executed for each input record that matches
958 the regular expression.
959 Regular expressions are the same as those in
961 and are summarized below.
964 .I "relational expression"
965 may use any of the operators defined below in the section on actions.
966 These generally test whether certain fields match certain regular expressions.
973 operators are logical AND, logical OR, and logical NOT, respectively, as in C.
974 They do short-circuit evaluation, also as in C, and are used for combining
975 more primitive pattern expressions. As in most languages, parentheses
976 may be used to change the order of evaluation.
980 operator is like the same operator in C. If the first pattern is true
981 then the pattern used for testing is the second pattern, otherwise it is
982 the third. Only one of the second and third patterns is evaluated.
985 .IB pattern1 ", " pattern2
986 form of an expression is called a
987 .IR "range pattern" .
988 It matches all input records starting with a record that matches
990 and continuing until a record that matches
992 inclusive. It does not combine with any other sort of pattern expression.
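A sketch of a range pattern with no action, so the default action prints each matching record. It assumes a POSIX awk, and range_demo is a hypothetical name. The second range starts at the second START and, because no END line follows, it runs to the end of input.

```shell
#!/bin/sh
# range_demo is a hypothetical helper used only for this sketch.
range_demo() {
  printf 'a\nSTART\nb\nEND\nc\nSTART\nd\n' | awk '/^START$/, /^END$/'
}

range_demo
# START
# b
# END
# START
# d
```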
993 .SS Regular Expressions
994 Regular expressions are the extended kind found in
996 They are composed of characters as follows:
997 .TP \w'\fB[^\fIabc...\fB]\fR'u+2n
999 matches the non-metacharacter
1003 matches the literal character
1007 matches any character
1012 matches the beginning of a string.
1015 matches the end of a string.
1018 character list, matches any of the characters
1022 negated character list, matches any character except
1026 alternation: matches either
1032 concatenation: matches
1042 matches zero or more
1061 One or two numbers inside braces denote an
1062 .IR "interval expression" .
1063 If there is one number in the braces, the preceding regexp
1067 times. If there are two numbers separated by a comma,
1074 If there is one number followed by a comma, then
1076 is repeated at least
1080 Interval expressions are only available if either
1083 .B \-\^\-re\-interval
1084 is specified on the command line.
1087 matches the empty string at either the beginning or the
1091 matches the empty string within a word.
1094 matches the empty string at the beginning of a word.
1097 matches the empty string at the end of a word.
1100 matches any word-constituent character (letter, digit, or underscore).
1103 matches any character that is not word-constituent.
1106 matches the empty string at the beginning of a buffer (string).
1109 matches the empty string at the end of a buffer.
1111 The escape sequences that are valid in string constants (see below)
1112 are also legal in regular expressions.
1114 .I "Character classes"
1115 are a new feature introduced in the POSIX standard.
1116 A character class is a special notation for describing
1117 lists of characters that have a specific attribute, but where the
1118 actual characters themselves can vary from country to country and/or
1119 from character set to character set. For example, the notion of what
1120 is an alphabetic character differs in the USA and in France.
1122 A character class is only valid in a regexp
1124 the brackets of a character list. Character classes consist of
1126 a keyword denoting the class, and
1128 Here are the character
1129 classes defined by the POSIX standard.
1132 Alphanumeric characters.
1135 Alphabetic characters.
1138 Space or tab characters.
1147 Characters that are both printable and visible.
1148 (A space is printable, but not visible, while an
1153 Lower-case alphabetic characters.
1156 Printable characters (characters that are not control characters.)
1159 Punctuation characters (characters that are not letters, digits,
1160 control characters, or space characters).
1163 Space characters (such as space, tab, and formfeed, to name a few).
1166 Upper-case alphabetic characters.
1169 Characters that are hexadecimal digits.
1171 For example, before the POSIX standard, to match alphanumeric
1172 characters, you would have had to write
1173 .BR /[A\-Za\-z0\-9]/ .
1174 If your character set had other alphabetic characters in it, this would not
1175 match them. With the POSIX character classes, you can write
1179 the alphabetic and numeric characters in your character set.
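A minimal sketch using a POSIX character class in place of an explicit range. It assumes an awk whose regexp matching supports POSIX classes, as gawk does. count_alnum is a hypothetical name. The example counts input lines made entirely of letters and digits.

```shell
#!/bin/sh
# count_alnum is a hypothetical helper used only for this sketch.
count_alnum() {
  awk '/^[[:alnum:]]+$/ { n++ } END { print n + 0 }'
}

printf 'abc123\nhello world\nx\n' | count_alnum   # prints 2
```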
1181 Two additional special sequences can appear in character lists.
1182 These apply to non-ASCII character sets, which can have single symbols
1184 .IR "collating elements" )
1185 that are represented with more than one
1186 character, as well as several characters that are equivalent for
1188 or sorting, purposes. (E.g., in French, a plain ``e''
1189 and a grave-accented e\` are equivalent.)
1192 A collating symbol is a multi-character collating element enclosed in
1198 is a collating element, then
1200 is a regexp that matches this collating element, while
1202 is a regexp that matches either
1208 An equivalence class is a locale-specific name for a list of
1209 characters that are equivalent. The name is enclosed in
1213 For example, the name
1215 might be used to represent all of
1216 ``e,'' ``e\`,'' and ``e\`.''
1226 These features are very valuable in non-English speaking locales.
1227 The library functions that
1229 uses for regular expression matching
1230 currently only recognize POSIX character classes; they do not recognize
1231 collating symbols or equivalence classes.
1243 operators are specific to
1245 they are extensions based on facilities in the GNU regexp libraries.
1247 The various command line options
1250 interprets characters in regexps.
1253 In the default case,
1255 provides all the facilities of
1256 POSIX regexps and the GNU regexp operators described above.
1257 However, interval expressions are not supported.
1260 Only POSIX regexps are supported; the GNU operators are not special.
1265 Interval expressions are allowed.
1267 .B \-\^\-traditional
1270 regexps are matched. The GNU operators
1271 are not special, interval expressions are not available, and neither
1272 are the POSIX character classes
1275 Characters described by octal and hexadecimal escape sequences are
1276 treated literally, even if they represent regexp metacharacters.
1278 .B \-\^\-re\-interval
1279 Allow interval expressions in regexps, even if
1280 .B \-\^\-traditional
1283 Action statements are enclosed in braces,
1287 Action statements consist of the usual assignment, conditional, and looping
1288 statements found in most languages. The operators, control statements,
1289 and input/output statements
1290 available are patterned after those in C.
1293 The operators in AWK, in order of decreasing precedence, are
1295 .TP "\w'\fB*= /= %= ^=\fR'u+1n"
1303 Increment and decrement, both prefix and postfix.
1306 Exponentiation (\fB**\fR may also be used, and \fB**=\fR for
1307 the assignment operator).
1310 Unary plus, unary minus, and logical negation.
1313 Multiplication, division, and modulus.
1316 Addition and subtraction.
1319 String concatenation.
1329 The regular relational operators.
1332 Regular expression match, negated match.
1334 Do not use a constant regular expression
1336 on the left-hand side of a
1340 Only use one on the right-hand side. The expression
1342 has the same meaning as \fB(($0 ~ /foo/) ~ \fIexp\fB)\fR.
1357 The C conditional expression. This has the form
1358 .IB expr1 " ? " expr2 " : " expr3\c
1361 is true, the value of the expression is
1376 Assignment. Both absolute assignment
1377 .BI ( var " = " value )
1378 and operator-assignment (the other forms) are supported.
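A small sketch of several operators from the table, assuming a POSIX awk; ops_demo is a hypothetical name. It shows regexp matching with a constant regexp on the right-hand side, right-associative exponentiation, and the fact that addition binds tighter than concatenation. Parentheses around the conditional expression keep > from being read as output redirection.

```shell
#!/bin/sh
# ops_demo is a hypothetical helper used only for this sketch.
ops_demo() {
  echo 'foo42 bar' | awk '{
    m = ($1 ~ /^foo/)          # 1: "foo42" matches
    n = ($2 !~ /^foo/)         # 1: "bar" does not match
    print m, n
    print 2 ^ 3 ^ 2            # right-associative: 2 ^ 9 = 512
    print 1 " " 2 + 3          # + binds tighter than concatenation: 1 5
    print (NF > 1 ? "many" : "one")
  }'
}

ops_demo
```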
1379 .SS Control Statements
1381 The control statements are
1386 \fBif (\fIcondition\fB) \fIstatement\fR [ \fBelse\fI statement \fR]
1387 \fBwhile (\fIcondition\fB) \fIstatement \fR
1388 \fBdo \fIstatement \fBwhile (\fIcondition\fB)\fR
1389 \fBfor (\fIexpr1\fB; \fIexpr2\fB; \fIexpr3\fB) \fIstatement\fR
1390 \fBfor (\fIvar \fBin\fI array\fB) \fIstatement\fR
1393 \fBdelete \fIarray\^\fB[\^\fIindex\^\fB]\fR
1394 \fBdelete \fIarray\^\fR
1395 \fBexit\fR [ \fIexpression\fR ]
1396 \fB{ \fIstatements \fB}
1399 .SS "I/O Statements"
1401 The input/output statements are as follows:
1403 .TP "\w'\fBprintf \fIfmt, expr-list\fR'u+1n"
1405 Close file (or pipe, see below).
1410 from next input record; set
1415 .BI "getline <" file
1426 from next input record; set
1430 .BI getline " var" " <" file
1437 Stop processing the current input record. The next input record
1438 is read and processing starts over with the first pattern in the
1439 AWK program. If the end of the input data is reached, the
1441 block(s), if any, are executed.
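A sketch of next, assuming a POSIX awk; sum_skipping_comments is a hypothetical name. Records that start with "#" reach next, so later rules never see them, but END still runs once input is exhausted.

```shell
#!/bin/sh
# sum_skipping_comments is a hypothetical helper used only for this sketch.
sum_skipping_comments() {
  awk '/^#/ { next } { total += $1 } END { print total + 0 }'
}

printf '# header\n1\n2\n# note\n3\n' | sum_skipping_comments   # prints 6
```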
1444 Stop processing the current input file. The next input record read
1445 comes from the next input file.
1451 is reset to 1, and processing starts over with the first pattern in the
1452 AWK program. If the end of the input data is reached, the
1454 block(s), if any, are executed.
1456 Earlier versions of gawk used
1458 as two words. While this usage is still recognized, it generates a
1459 warning message and will eventually be removed.
1462 Prints the current record.
1463 The output record is terminated with the value of the
1467 .BI print " expr-list"
1469 Each expression is separated by the value of the
1472 The output record is terminated with the value of the
1476 .BI print " expr-list" " >" file
1477 Prints expressions on
1479 Each expression is separated by the value of the
1481 variable. The output record is terminated with the value of the
1485 .BI printf " fmt, expr-list"
1488 .BI printf " fmt, expr-list" " >" file
1492 .BI system( cmd-line )
1495 and return the exit status.
1496 (This may not be available on non-\*(PX systems.)
1498 \&\fBfflush(\fR[\fIfile\^\fR]\fB)\fR
1499 Flush any buffers associated with the open output file or pipe
1503 is missing, then standard output is flushed.
1507 then all open output files and pipes
1508 have their buffers flushed.
1510 Other input/output redirections are also allowed. For
1515 appends output to the
1520 In a similar fashion,
1521 .IB command " | getline"
1526 command will return 0 on end of file, and \-1 on an error.
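The following sketch shows the getline forms that read from a file and from a command, with the usual loop on a positive return value. It assumes a POSIX awk and mktemp. io_demo and the file name words are hypothetical. close() is called so that the file or command could be read again from the start.

```shell
#!/bin/sh
# io_demo is a hypothetical helper; "words" is a throwaway file.
io_demo() {
  tmpdir=$(mktemp -d) || return 1
  printf 'alpha\nbeta\n' > "$tmpdir/words"
  awk -v f="$tmpdir/words" 'BEGIN {
    while ((getline line < f) > 0)   # stop on 0 (end of file) or -1 (error)
      n++
    close(f)
    "echo hi" | getline greeting     # one line of command output
    close("echo hi")
    print n, greeting                # 2 hi
  }'
  status=$?
  rm -rf "$tmpdir"
  return $status
}

io_demo
```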
1527 .SS The \fIprintf\fP\^ Statement
1529 The AWK versions of the
1535 accept the following conversion specification formats:
1538 An \s-1ASCII\s+1 character.
1539 If the argument used for
1541 is numeric, it is treated as a character and printed.
1542 Otherwise, the argument is assumed to be a string, and only the first
1543 character of that string is printed.
1550 A decimal number (the integer part).
1557 A floating point number of the form
1558 .BR [\-]d.dddddde[+\^\-]dd .
1567 A floating point number of the form
1568 .BR [\-]ddd.dddddd .
1579 conversion, whichever is shorter, with nonsignificant zeros suppressed.
1588 An unsigned octal number (again, an integer).
1598 An unsigned hexadecimal number (an integer).
1609 character; no argument is converted.
1611 There are optional, additional parameters that may lie between the
1613 and the control letter:
1616 The expression should be left-justified within its field.
1619 For numeric conversions, prefix positive values with a space, and
1620 negative values with a minus sign.
1623 The plus sign, used before the width modifier (see below),
1624 says to always supply a sign for numeric conversions, even if the data
1625 to be formatted is positive. The
1627 overrides the space modifier.
1630 Use an ``alternate form'' for certain control letters.
1633 supply a leading zero.
1649 the result will always contain a
1655 trailing zeros are not removed from the result.
1660 (zero) acts as a flag that indicates output should be
1661 padded with zeroes instead of spaces.
1662 This applies even to non-numeric output formats.
1663 This flag only has an effect when the field width is wider than the
1664 value to be printed.
1667 The field should be padded to this width. The field is normally padded
1670 flag has been used, it is padded with zeroes.
1673 A number that specifies the precision to use when printing.
1679 formats, this specifies the
1680 number of digits you want printed to the right of the decimal point.
1685 formats, it specifies the maximum number
1686 of significant digits. For the
1694 formats, it specifies the minimum number of
1695 digits to print. For a string, it specifies the maximum number of
1696 characters from the string that should be printed.
1702 capabilities of the \*(AN C
1704 routines are supported.
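A sketch of the conversions and modifiers above, including a width and precision supplied with *. It assumes a POSIX awk whose printf passes these formats through to C printf semantics. printf_demo is a hypothetical name.

```shell
#!/bin/sh
# printf_demo is a hypothetical helper used only for this sketch.
printf_demo() {
  awk 'BEGIN {
    printf "[%5d|%-5d|%05d]\n", 42, 42, 42           # width, -, 0 flags
    printf "[%.2f|%e|%g]\n", 3.14159, 1234.5, 0.0001 # float conversions
    printf "[%c|%c]\n", 65, "hello"                  # number vs. string
    printf "[%x|%o|%+d|% d]\n", 255, 8, 7, 7         # hex, octal, signs
    printf "[%*.*s]\n", 6, 3, "abcdef"               # width 6, precision 3
  }'
}

printf_demo
# [   42|42   |00042]
# [3.14|1.234500e+03|0.0001]
# [A|h]
# [ff|10|+7| 7]
# [   abc]
```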
1707 in place of either the
1711 specifications will cause their values to be taken from
1712 the argument list to
1716 .SS Special File Names
1718 When doing I/O redirection from either
1727 recognizes certain special filenames internally. These filenames
1728 allow access to open file descriptors inherited from
1730 parent process (usually the shell).
1731 Other special filenames provide access to information about the running
1735 .TP \w'\fB/dev/stdout\fR'u+1n
1737 Reading this file returns the process ID of the current process,
1738 in decimal, terminated with a newline.
1741 Reading this file returns the parent process ID of the current process,
1742 in decimal, terminated with a newline.
1745 Reading this file returns the process group ID of the current process,
1746 in decimal, terminated with a newline.
1749 Reading this file returns a single record terminated with a newline.
1750 The fields are separated with spaces.
1767 If there are any additional fields, they are the group IDs returned by
1769 Multiple groups may not be supported on all systems.
1775 The standard output.
1778 The standard error output.
1781 The file associated with the open file descriptor
1784 These are particularly useful for error messages. For example:
1788 print "You blew it!" > "/dev/stderr"
1792 whereas you would otherwise have to use
1796 print "You blew it!" | "cat 1>&2"
1800 These file names may also be used on the command line to name data files.
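A sketch of sending diagnostics to /dev/stderr while data goes to standard output, assuming gawk or another awk that recognizes the name. Many systems also provide /dev/stderr as a device. warn_demo is a hypothetical name.

```shell
#!/bin/sh
# warn_demo is a hypothetical helper used only for this sketch.
warn_demo() {
  echo 'ok' | awk '{
    print "processing " $0 > "/dev/stderr"   # diagnostic, to stderr
    print $0                                 # data, to stdout
  }'
}

warn_demo 2>/dev/null   # prints only "ok"
```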
1801 .SS Numeric Functions
1803 AWK has the following pre-defined arithmetic functions:
1805 .TP \w'\fBsrand(\fR[\fIexpr\^\fR]\fB)\fR'u+1n
1806 .BI atan2( y , " x" )
1807 returns the arctangent of
1812 returns the cosine of
1814 which is in radians.
1817 the exponential function.
1820 truncates to integer.
1823 the natural logarithm function.
1826 returns a random number between 0 and 1.
1831 which is in radians.
1834 the square root function.
1836 \&\fBsrand(\fR[\fIexpr\^\fR]\fB)\fR
1839 as a new seed for the random number generator. If no
1841 is provided, the time of day will be used.
1842 The return value is the previous seed for the random number generator.
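A minimal sketch exercising these functions from a POSIX shell, assuming awk on the path is gawk or another POSIX awk. numeric_demo is a hypothetical name and the seed 42 is arbitrary. Reusing the same explicit seed should reproduce the same sequence, while srand() with no argument seeds from the time of day.

```shell
#!/bin/sh
# Hedged sketch: the numeric built-ins listed above, run through a POSIX awk.
# numeric_demo is a hypothetical helper name used only for illustration.
numeric_demo() {
    awk 'BEGIN {
        printf "pi=%.5f\n", atan2(0, -1)   # arctangent of 0/-1 is pi radians
        printf "int=%d\n", int(-3.7)       # int() truncates toward zero
        printf "misc=%g %g %g %g\n", cos(0), exp(0), log(1), sqrt(16)
        srand(42); a = rand()              # an explicit seed ...
        srand(42); b = rand()              # ... reproduces the same sequence
        same = (a == b) ? "repeatable" : "different"
        ok = (a >= 0 && a < 1) ? "in-range" : "out-of-range"
        print same, ok
    }'
}
```

Run as numeric_demo, it should print pi=3.14159, int=-3, misc=1 1 0 4, and repeatable in-range on four lines.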
1844 .SS String Functions
1847 has the following pre-defined string functions:
1849 .TP "\w'\fBsprintf(\^\fIfmt\fB\^, \fIexpr-list\^\fB)\fR'u+1n"
1850 \fBgensub(\fIr\fB, \fIs\fB, \fIh \fR[\fB, \fIt\fR]\fB)\fR
1851 search the target string
1853 for matches of the regular expression
1857 is a string beginning with
1861 then replace all matches of
1867 is a number indicating which match of
1875 Within the replacement text
1881 is a digit from 1 to 9, may be used to indicate just the text that
1884 parenthesized subexpression. The sequence
1886 represents the entire matched text, as does the character
1892 the modified string is returned as the result of the function,
1893 and the original target string is \fInot\fR changed.
1896 .TP "\w'\fBsprintf(\^\fIfmt\fB\^, \fIexpr-list\^\fB)\fR'u+1n"
1897 \fBgsub(\fIr\fB, \fIs \fR[\fB, \fIt\fR]\fB)\fR
1898 for each substring matching the regular expression
1902 substitute the string
1904 and return the number of substitutions.
1907 is not supplied, use
1911 in the replacement text is replaced with the text that was actually matched.
1917 .I "AWK Language Programming"
1918 for a fuller discussion of the rules for
1920 and backslashes in the replacement text of
1926 .BI index( s , " t" )
1927 returns the index of the string
1935 \fBlength(\fR[\fIs\fR]\fB)
1936 returns the length of the string
1944 .BI match( s , " r" )
1945 returns the position in
1947 where the regular expression
1951 is not present, and sets the values of
1956 \fBsplit(\fIs\fB, \fIa \fR[\fB, \fIr\fR]\fB)\fR
1961 on the regular expression
1963 and returns the number of fields. If
1971 Splitting behaves identically to field splitting, described above.
1973 .BI sprintf( fmt , " expr-list" )
1978 and returns the resulting string.
1980 \fBsub(\fIr\fB, \fIs \fR[\fB, \fIt\fR]\fB)\fR
1983 but only the first matching substring is replaced.
1985 \fBsubstr(\fIs\fB, \fIi \fR[\fB, \fIn\fR]\fB)\fR
1994 is omitted, the rest of
1999 returns a copy of the string
2001 with all the upper-case characters in
2003 translated to their corresponding lower-case counterparts.
2004 Non-alphabetic characters are left unchanged.
2007 returns a copy of the string
2009 with all the lower-case characters in
2011 translated to their corresponding upper-case counterparts.
2012 Non-alphabetic characters are left unchanged.
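The following sketch exercises most of the string functions above in one BEGIN block. It assumes a POSIX shell with awk on the path (gawk behaves the same here), and string_demo is a hypothetical name. gensub() is left out because it is a gawk extension. Note the use of "&" for the matched text and of "\\&" (which awk reads as a backslash followed by an ampersand) for a literal ampersand.

```shell
#!/bin/sh
# Hedged sketch: POSIX awk string built-ins (gawk behaves the same here).
# string_demo is a hypothetical helper name used only for illustration.
string_demo() {
    awk 'BEGIN {
        s = "Hello, World"
        print index(s, "World"), length(s)   # 1-based position; character count
        if (match(s, /o, W/))                # match() sets RSTART and RLENGTH
            print RSTART, RLENGTH
        print substr(s, 8), substr(s, 1, 5)  # rest from 8; 5 chars from 1
        print toupper(s) "|" tolower(s)

        t = u = "foo foo foo"
        sub(/foo/, "bar", t)                 # first match only
        n = gsub(/foo/, "bar", u)            # every match; returns the count
        print t "|" u "|" n

        v = "a1b22c333"
        gsub(/[0-9]+/, "[&]", v)             # & stands for the matched text
        w = "salt and pepper"
        gsub(/ and /, " \\& ", w)            # \\& in the string yields a literal &
        print v "|" w

        m = split("a:b::c", parts, ":")      # empty field between "::" is kept
        k = split("  x   y  ", words)        # default: blank runs, edges trimmed
        print m, parts[4], k, words[2]

        print sprintf("%-4s|%3d", "ab", 7)   # formatted like printf, returned as a string
    }'
}
```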
2015 Since one of the primary uses of AWK programs is processing log files
2016 that contain time stamp information,
2018 provides the following two functions for obtaining time stamps and
2021 .TP "\w'\fBsystime()\fR'u+1n"
2023 returns the current time of day as the number of seconds since the Epoch
2024 (Midnight UTC, January 1, 1970 on \*(PX systems).
2026 \fBstrftime(\fR[\fIformat \fR[\fB, \fItimestamp\fR]]\fB)\fR
2029 according to the specification in
2033 should be of the same form as returned by
2037 is missing, the current time of day is used.
2040 is missing, a default format equivalent to the output of
2043 See the specification for the
2045 function in \*(AN C for the format conversions that are
2046 guaranteed to be available.
2047 A public-domain version of
2049 and a man page for it come with
2051 if that version was used to build
2053 then all of the conversions described in that man page are available to \fIgawk\fR.
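Because systime() and strftime() are gawk extensions, this sketch calls gawk explicitly. epoch_to_utc and clock_ok are hypothetical helper names, and setting TZ=UTC0 is an assumption used only to make the output independent of the local time zone.

```shell
#!/bin/sh
# Hedged sketch: strftime() and systime() are gawk extensions, so this calls
# gawk explicitly. epoch_to_utc and clock_ok are hypothetical helper names.
epoch_to_utc() {
    # $1 = seconds since the Epoch; TZ=UTC0 makes the result independent of
    # the local time zone.
    TZ=UTC0 gawk -v ts="$1" 'BEGIN { print strftime("%Y-%m-%d %H:%M:%S", ts) }'
}

clock_ok() {
    # systime() should return a positive count of seconds on a sane clock.
    gawk 'BEGIN { if (systime() > 0) print "yes"; else print "no" }'
}
```

For instance, epoch_to_utc 86400 should print 1970-01-02 00:00:00, one day after the Epoch.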
2055 .SS String Constants
2057 String constants in AWK are sequences of characters enclosed
2058 between double quotes (\fB"\fR). Within strings, certain
2059 .I "escape sequences"
2060 are recognized, as in C. These are:
2062 .TP \w'\fB\e\^\fIddd\fR'u+1n
2064 A literal backslash.
2067 The ``alert'' character; usually the \s-1ASCII\s+1 \s-1BEL\s+1 character.
2087 .BI \ex "\^hex digits"
2088 The character represented by the string of hexadecimal digits following
2091 As in \*(AN C, all following hexadecimal digits are considered part of
2092 the escape sequence.
2093 (This feature should tell us something about language design by committee.)
2094 E.g., \fB"\ex1B"\fR is the \s-1ASCII\s+1 \s-1ESC\s+1 (escape) character.
2097 The character represented by the 1-, 2-, or 3-digit sequence of octal
2098 digits. E.g., \fB"\e033"\fR is the \s-1ASCII\s+1 \s-1ESC\s+1 (escape) character.
2101 The literal character
2104 The escape sequences may also be used inside constant regular expressions
2106 .B "/[\ \et\ef\en\er\ev]/"
2107 matches whitespace characters).
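A short sketch of escape sequences in a string constant and in a regexp constant. escape_demo is a hypothetical name. Only octal and single-letter escapes are used, since \x is not required by POSIX awk.

```shell
#!/bin/sh
# Hedged sketch of string-constant escapes and their use in a regexp constant.
# escape_demo is a hypothetical helper name.
escape_demo() {
    awk 'BEGIN {
        s = "\101\102\103"              # octal escapes for A, B, C
        t = "a\tb"                      # \t is a single tab character
        n = gsub(/[ \t\n]/, "_", t)     # the same escapes inside a regexp
        print s, length("a\tb"), t, n
    }'
}
```

It should print ABC 3 a_b 1.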
2109 In compatibility mode, the characters represented by octal and
2110 hexadecimal escape sequences are treated literally when used in
2111 regexp constants. Thus,
2116 Functions in AWK are defined as follows:
2119 \fBfunction \fIname\fB(\fIparameter list\fB) { \fIstatements \fB}\fR
2122 Functions are executed when they are called from within expressions
2123 in either patterns or actions. Actual parameters supplied in the function
2124 call are used to instantiate the formal parameters declared in the function.
2125 Arrays are passed by reference; other variables are passed by value.
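A minimal sketch of the difference, using the hypothetical names byref_demo and bump. After the call, the scalar v is unchanged while the array element a["k"] has been incremented, so it should print 1 2.

```shell
#!/bin/sh
# Hedged sketch: scalars are passed by value, arrays by reference.
# byref_demo and bump are hypothetical names used only for illustration.
byref_demo() {
    awk '
    function bump(x, arr) {
        x++              # changes only the local copy of the scalar
        arr["k"]++       # changes the caller array element
    }
    BEGIN {
        v = 1; a["k"] = 1
        bump(v, a)
        print v, a["k"]
    }'
}
```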
2127 Since functions were not originally part of the AWK language, the provision
2128 for local variables is rather clumsy: They are declared as extra parameters
2129 in the parameter list. The convention is to separate local variables from
2130 real parameters by extra spaces in the parameter list. For example:
2135 function f(p, q, a, b) # a & b are local
2140 /abc/ { ... ; f(1, 2) ; ... }
2145 The left parenthesis in a function call is required
2146 to immediately follow the function name,
2147 without any intervening white space.
2148 This is to avoid a syntactic ambiguity with the concatenation operator.
2149 This restriction does not apply to the built-in functions listed above.
2151 Functions may call each other and may be recursive.
2152 Function parameters used as local variables are initialized
2153 to the null string and the number zero upon function invocation.
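A sketch that combines recursion with the extra-parameter convention for locals (locals_demo, fact, and counter are hypothetical names). Because c is reinitialized on every call, both counter() calls return 1, so the expected output is 120 1 1.

```shell
#!/bin/sh
# Hedged sketch: recursion, and a local variable declared as an extra
# parameter. locals_demo, fact, and counter are hypothetical names.
locals_demo() {
    awk '
    function fact(n) {
        return n <= 1 ? 1 : n * fact(n - 1)     # recursive call
    }
    function counter(    c) {   # extra spaces mark c as a local
        c++                     # c starts as "" and 0 on every call
        return c
    }
    BEGIN { print fact(5), counter(), counter() }'
}
```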
2157 to return a value from a function. The return value is undefined if no
2158 value is provided, or if the function returns by ``falling off'' the end.
2165 will warn about calls to undefined functions at parse time,
2166 instead of at run time.
2167 Calling an undefined function at run time is a fatal error.
2171 may be used in place of
2175 Print and sort the login names of all users:
2179 { print $1 | "sort" }
2182 Count lines in a file:
2186 END { print nlines }
2189 Precede each line by its number in the file:
2195 Concatenate and line number (a variation on a theme):
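The example programs in this section can be run from a POSIX shell roughly as follows. The function names are hypothetical. The + 0 in count_lines is an addition so that empty input prints 0 rather than an empty line. The bodies of number_in_file and number_overall are assumed forms of the two line-numbering examples, using FNR (which restarts with each file) and NR (which keeps counting across files).

```shell
#!/bin/sh
# Hedged sketch: the EXAMPLES programs wrapped as shell functions.
# All function names are hypothetical; file operands default to standard input.
login_names()    { awk -F: '{ print $1 | "sort" }' "$@"; }      # e.g. /etc/passwd
count_lines()    { awk '{ nlines++ } END { print nlines + 0 }' "$@"; }
number_in_file() { awk '{ print FNR, $0 }' "$@"; }   # FNR restarts in each file
number_overall() { awk '{ print NR, $0 }' "$@"; }    # NR counts across all files
```

Given two two-line files, the last line printed by number_in_file starts with 2, while the last line printed by number_overall starts with 4.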
2212 .IR "The AWK Programming Language" ,
2213 Alfred V. Aho, Brian W. Kernighan, Peter J. Weinberger,
2214 Addison-Wesley, 1988. ISBN 0-201-07981-X.
2216 .IR "AWK Language Programming" ,
2217 Edition 1.0, published by the Free Software Foundation, 1995.
2218 .SH POSIX COMPATIBILITY
2221 is compatibility with the \*(PX standard, as well as with the
2222 latest version of \*(UX
2226 incorporates the following user visible
2227 features which are not described in the AWK book,
2228 but are part of the Bell Labs version of
2230 and are in the \*(PX standard.
2234 option for assigning variables before program execution starts is new.
2235 The book indicates that command line variable assignment happens when
2237 would otherwise open the argument as a file, which is after the
2239 block is executed. However, in earlier implementations, when such an
2240 assignment appeared before any file names, the assignment would happen
2244 block was run. Applications came to depend on this ``feature.''
2247 was changed to match its documentation, this option was added to
2248 accommodate applications that depended upon the old behavior.
2249 (This feature was agreed upon by both the AT&T and GNU developers.)
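The difference in timing shows up in a small sketch (assign_timing is a hypothetical name). The -v value is already set in BEGIN, while the operand x=2 is processed only when awk reaches it in the argument list, just before it reads standard input (named here as -). It should print BEGIN: 1 and then main: 2.

```shell
#!/bin/sh
# Hedged sketch: a -v assignment is visible in BEGIN, while an operand
# assignment (x=2 below) takes effect only when awk reaches it in the
# argument list, after BEGIN has run. assign_timing is a hypothetical name.
assign_timing() {
    printf 'record\n' |
        awk -v x=1 'BEGIN { print "BEGIN:", x } { print "main:", x }' x=2 -
}
```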
2253 option for implementation-specific features is from the \*(PX standard.
2255 When processing arguments,
2257 uses the special option ``\fB\-\^\-\fP'' to signal the end of
2259 In compatibility mode, it will warn about, but otherwise ignore,
2261 In normal operation, such arguments are passed on to the AWK program for it to process.
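A minimal sketch of ending option processing with --. show_args is a hypothetical name. After --, a later -v is not treated as an option but is handed to the program in ARGV, so show_args -v foo should print -v and foo on separate lines.

```shell
#!/bin/sh
# Hedged sketch: "--" ends option processing, so a later "-v" is handed to
# the program in ARGV instead of being parsed as an option.
# show_args is a hypothetical helper name.
show_args() {
    awk -- 'BEGIN { for (i = 1; i < ARGC; i++) print ARGV[i] }' "$@"
}
```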
2264 The AWK book does not define the return value of
2267 has it return the seed it was using, to allow keeping track
2268 of random number sequences. Therefore
2272 also returns its current seed.
2274 Other new features are:
2285 escape sequences (done originally in
2287 and fed back into AT&T's); the
2291 built-in functions (from AT&T); and the \*(AN C conversion specifications in
2293 (done first in AT&T's version).
2296 has a number of extensions to \*(PX
2298 They are described in this section. All the extensions described here
2303 .B \-\^\-traditional
2306 The following features of
2308 are not available in
2336 The special file names available for I/O redirection are not recognized.
2344 variables are not special.
2349 variable and its side-effects are not available.
2354 variable and fixed-width field splitting.
2359 as a regular expression.
2362 The ability to split out individual characters using the null string
2365 and as the third argument to
2369 No path search is performed for files named via the
2371 option. Therefore the
2373 environment variable is not special.
2378 to abandon processing of the current input file.
2383 to delete the entire contents of an array.
2386 The AWK book does not define the return value of the
2391 returns the value from
2395 when closing a file or pipe, respectively.
2400 .B \-\^\-traditional
2406 option is ``t'', then
2408 will be set to the tab character.
2410 .B "gawk \-F\et \&..."
2411 simply causes the shell to quote the ``t'' and does not pass
2415 Since this is a rather ugly special case, it is not the default behavior.
2416 This behavior also does not occur if
2419 To really get a tab character as the field separator, it is best to use
2421 .BR "gawk \-F'\et' \&..." .
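A sketch of the recommended quoting, with a hypothetical helper second_tab_field. The shell leaves \t inside single quotes alone, so awk itself turns it into a tab and splits only on tabs, keeping the embedded space in the second field.

```shell
#!/bin/sh
# Hedged sketch: quoting \t keeps the backslash for awk, so -F sees a tab.
# second_tab_field is a hypothetical helper name.
second_tab_field() {
    awk -F'\t' '{ print $2 }'
}
```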
2426 was compiled for debugging, it will
2427 accept the following additional options:
2438 debugging output during program parsing.
2439 This option should only be of interest to the
2441 maintainers, and may not even be compiled into
2444 .SH HISTORICAL FEATURES
2445 There are two features of historical AWK implementations that
2448 First, it is possible to call the
2450 built-in function not only with no argument, but even without parentheses!
2455 a = length # Holy Algol 60, Batman!
2459 is the same as either of
2469 This feature is marked as ``deprecated'' in the \*(PX standard, and
2471 will issue a warning about its use if
2473 is specified on the command line.
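All three spellings below should give the same result for the current record (record_lengths is a hypothetical name). Since the bare length form is the deprecated one, the parenthesized forms are preferable in new programs.

```shell
#!/bin/sh
# Hedged sketch: three spellings of the length of the current record.
# record_lengths is a hypothetical helper name.
record_lengths() {
    awk '{ a = length; b = length(); c = length($0); print a, b, c }'
}
```

For example, echo hello | record_lengths should print 5 5 5.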
2475 The other feature is the use of either the
2479 statements outside the body of a
2484 loop. Traditional AWK implementations have treated such usage as
2489 will support this usage if
2490 .B \-\^\-traditional
2495 exists in the environment, then
2497 behaves exactly as if
2499 had been specified on the command line.
2504 will issue a warning message to this effect.
2508 environment variable can be used to provide a list of directories that
2510 will search when looking for files named via the
2518 option is not necessary given the command line variable assignment feature;
2519 it remains only for backwards compatibility.
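A sketch of that equivalence, with hypothetical helper names. Each function prints the second colon-separated field: the first uses -F, the second a -v assignment, and the third an FS=: operand assignment placed before the input (named here as -).

```shell
#!/bin/sh
# Hedged sketch: three equivalent ways to set the field separator to ":".
# sep_with_F, sep_with_v, and sep_with_operand are hypothetical names.
sep_with_F()       { awk -F: '{ print $2 }'; }
sep_with_v()       { awk -v FS=: '{ print $2 }'; }
sep_with_operand() { awk '{ print $2 }' FS=: -; }
```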
2521 If your system actually has support for
2528 files, you may get different output from
2530 than you would get on a system without those files. When
2532 interprets these files internally, it synchronizes output to the standard
2533 output with output to
2535 while on a system with those files, the output is actually to different
2539 Syntactically invalid single-character programs tend to overflow
2540 the parse stack, generating a rather unhelpful message. Such programs
2541 are surprisingly difficult to diagnose in the completely general case,
2542 and the effort to do so really is not worth it.
2543 .SH VERSION INFORMATION
2544 This man page documents
2548 The original version of \*(UX
2550 was designed and implemented by Alfred Aho,
2551 Peter Weinberger, and Brian Kernighan of AT&T Bell Labs. Brian Kernighan
2552 continues to maintain and enhance it.
2554 Paul Rubin and Jay Fenlason,
2555 of the Free Software Foundation, wrote
2557 to be compatible with the original version of
2559 distributed in Seventh Edition \*(UX.
2560 John Woods contributed a number of bug fixes.
2561 David Trueman, with contributions
2562 from Arnold Robbins, made
2564 compatible with the new version of \*(UX
2566 Arnold Robbins is the current maintainer.
2568 The initial DOS port was done by Conrad Kwok and Scott Garfinkle.
2569 Scott Deifik is the current DOS maintainer. Pat Rankin did the
2570 port to VMS, and Michal Jaegermann did the port to the Atari ST.
2571 The port to OS/2 was done by Kai Uwe Rommel, with contributions and
2572 help from Darrel Hankerson. Fred Fish supplied support for the Amiga.
2574 If you find a bug in
2576 please send electronic mail to
2577 .BR bug-gnu-utils@gnu.org ,
2580 .BR arnold@gnu.org .
2581 Please include your operating system and its revision, the version of
2583 what C compiler you used to compile it, and a test program
2584 and data that are as small as possible for reproducing the problem.
2586 Before sending a bug report, please do two things. First, verify that
2587 you have the latest version of
2589 Many bugs (usually subtle ones) are fixed at each release, and if
2590 yours is out of date, the problem may already have been solved.
2591 Second, please read this man page and the reference manual carefully to
2592 be sure that what you think is a bug really is, instead of just a quirk
2597 post a bug report in
2601 developers occasionally read this newsgroup, posting bug reports there
2602 is an unreliable way to report bugs. Instead, please use the electronic mail
2603 addresses given above.
2604 .SH ACKNOWLEDGEMENTS
2605 Brian Kernighan of Bell Labs
2606 provided valuable assistance during testing and debugging.
2608 .SH COPYING PERMISSIONS
2609 Copyright \(co 1996,97,98,99 Free Software Foundation, Inc.
2611 Permission is granted to make and distribute verbatim copies of
2612 this manual page provided the copyright notice and this permission
2613 notice are preserved on all copies.
2615 Permission is granted to process this file through troff and print the
2616 results, provided the printed document carries copying permission
2617 notice identical to this one except for the removal of this paragraph
2618 (this paragraph not being relevant to the printed manual page).
2621 Permission is granted to copy and distribute modified versions of this
2622 manual page under the conditions for verbatim copying, provided that
2623 the entire resulting derived work is distributed under the terms of a
2624 permission notice identical to this one.
2626 Permission is granted to copy and distribute translations of this
2627 manual page into another language, under the above conditions for
2628 modified versions, except that this permission notice may be stated in
2629 a translation approved by the Foundation.