1 .\" $FreeBSD: src/contrib/awk/doc/awk.1,v 1.5 1999/09/27 08:57:04 sheldonh Exp $
5 .TH GAWK 1 "Apr 28 1999" "Free Software Foundation" "Utility Commands"
7 gawk \- pattern scanning and processing language
10 [ POSIX or GNU style options ]
18 [ POSIX or GNU style options ]
26 is the GNU Project's implementation of the AWK programming language.
27 It conforms to the definition of the language in
28 the \*(PX 1003.2 Command Language And Utilities Standard.
29 This version in turn is based on the description in
30 .IR "The AWK Programming Language" ,
31 by Aho, Kernighan, and Weinberger,
32 with the additional features found in the System V Release 4 version
36 also provides more recent Bell Labs
38 extensions, and some GNU-specific extensions.
40 The command line consists of options to
42 itself, the AWK program text (if not supplied via the
46 options), and values to be made
51 pre-defined AWK variables.
55 options may be either the traditional \*(PX one letter options,
56 or the GNU style long options. \*(PX options start with a single ``\-'',
57 while long options start with ``\-\^\-''.
58 Long options are provided for both GNU-specific features and
59 for \*(PX mandated features.
61 Following the \*(PX standard,
63 options are supplied via arguments to the
67 options may be supplied
70 option has a corresponding long option, as detailed below.
71 Arguments to long options are either joined with the option
74 sign, with no intervening spaces, or they may be provided in the
75 next command line argument.
76 Long options may be abbreviated, as long as the abbreviation
81 accepts the following options.
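The long-option rules above (joined `=value` or next-word argument, unique abbreviations, `--` ending the options) can be illustrated with a small Python sketch of a getopt_long-style resolver. The table `LONG_OPTIONS` and the function `resolve_long_option` are hypothetical and cover only a few of the options below; this is not gawk's actual parser.

```python
# Hypothetical subset of gawk's long options; True means the option takes an argument.
LONG_OPTIONS = {
    "field-separator": True,
    "assign": True,
    "file": True,
    "source": True,
    "posix": False,
    "traditional": False,
    "version": False,
}


def resolve_long_option(args):
    """Parse one "--name[=value]" option from the front of args.

    Returns (option, value, remaining_args). A bare "--" ends option
    processing and is returned as (None, None, remaining_args).
    Raises ValueError for unknown, ambiguous, or malformed options.
    """
    arg, rest = args[0], list(args[1:])
    if arg == "--":
        return None, None, rest
    name, sep, value = arg[2:].partition("=")
    if name in LONG_OPTIONS:
        matches = [name]  # an exact name always wins
    else:
        matches = [opt for opt in LONG_OPTIONS if opt.startswith(name)]
    if len(matches) != 1:
        raise ValueError(f"unknown or ambiguous option: --{name}")
    option = matches[0]
    if not LONG_OPTIONS[option]:
        if sep:
            raise ValueError(f"--{option} does not take an argument")
        return option, None, rest
    if not sep:
        # "--option value": the argument is the next command line word.
        if not rest:
            raise ValueError(f"--{option} requires an argument")
        value = rest.pop(0)
    return option, value, rest
```

With this table, `--fie=,` resolves to `field-separator`, while `--fi` is rejected as ambiguous because both `field-separator` and `file` start with it.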
87 .BI \-\^\-field-separator " fs"
90 for the input field separator (the value of the
96 \fB\-v\fI var\fB\^=\^\fIval\fR
99 \fB\-\^\-assign \fIvar\fB\^=\^\fIval\fR
104 before execution of the program begins.
105 Such variable values are available to the
107 block of an AWK program.
110 .BI \-f " program-file"
113 .BI \-\^\-file " program-file"
114 Read the AWK program source from the file
116 instead of from the first command line argument.
128 Set various memory limits to the value
132 flag sets the maximum number of fields, and the
134 flag sets the maximum record size. These two flags and the
136 option are from the Bell Labs research version of \*(UX
142 has no pre-defined limits.
157 mode. In compatibility mode,
159 behaves identically to \*(UX
161 none of the GNU-specific extensions are recognized.
164 is preferred over the other forms of this option.
166 .BR "GNU EXTENSIONS" ,
167 below, for more information.
180 Print the short version of the GNU copyright information message on
181 the standard output, and exit successfully.
194 Print a relatively short summary of the available options on
197 .IR "GNU Coding Standards" ,
198 these options cause an immediate, successful exit.)
205 Provide warnings about constructs that are
206 dubious or non-portable to other AWK implementations.
213 Provide warnings about constructs that are
214 not portable to the original version of Unix
217 .\" This option is left undocumented, on purpose.
224 Provide a moment of nostalgia for long time
236 mode, with the following additional restrictions:
241 escape sequences are not recognized.
244 Only space and tab act as field separators when
246 is set to a single space, newline does not.
260 cannot be used in place of
268 function is not available.
272 .B "\-W re\-interval"
275 .B \-\^\-re\-interval
277 .I "interval expressions"
278 in regular expression matching
280 .BR "Regular Expressions" ,
282 Interval expressions were not traditionally available in the
283 AWK language. The POSIX standard added them, to make
287 consistent with each other.
288 However, their use is likely
289 to break old AWK programs, so
291 only provides them if they are requested with this option, or when
296 .BI "\-W source " program-text
299 .BI \-\^\-source " program-text"
302 as AWK program source code.
303 This option allows the easy intermixing of library functions (used via the
307 options) with source code entered on the command line.
308 It is intended primarily for medium to large AWK programs used
316 Print version information for this particular copy of
318 on the standard output.
319 This is useful mainly for knowing if the current copy of
322 is up to date with respect to whatever the Free Software Foundation
324 This is also useful when reporting bugs.
326 .IR "GNU Coding Standards" ,
327 these options cause an immediate, successful exit.)
330 Signal the end of options. This is useful to allow further arguments to the
331 AWK program itself to start with a ``\-''.
332 This is mainly for consistency with the argument parsing convention used
333 by most other \*(PX programs.
335 In compatibility mode,
336 any other options are flagged as illegal, but are otherwise ignored.
337 In normal operation, as long as program text has been supplied, unknown
338 options are passed on to the AWK program in the
340 array for processing. This is particularly useful for running AWK
341 programs via the ``#!'' executable interpreter mechanism.
342 .SH AWK PROGRAM EXECUTION
344 An AWK program consists of a sequence of pattern-action statements
345 and optional function definitions.
348 \fIpattern\fB { \fIaction statements\fB }\fR
350 \fBfunction \fIname\fB(\fIparameter list\fB) { \fIstatements\fB }\fR
354 first reads the program source from the
359 or from the first non-option argument on the command line.
364 options may be used multiple times on the command line.
366 will read the program text as if all the
368 and command line source texts
369 had been concatenated together. This is useful for building libraries
370 of AWK functions, without having to include them in each new AWK
371 program that uses them. It also provides the ability to mix library
372 functions with command line programs.
374 The environment variable
376 specifies a search path to use when finding source files named with
379 option. If this variable does not exist, the default path is
380 \fB".:/usr/local/share/awk"\fR.
381 (The actual directory may vary, depending upon how
383 was built and installed.)
384 If a file name given to the
386 option contains a ``/'' character, no path search is performed.
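The search rule just described can be sketched in Python. The function name `find_program_file` is hypothetical, the default path is the one quoted above (which, as noted, depends on how gawk was built), and treating an empty path component as the current directory is an assumption of this sketch.

```python
import os

# Default search path quoted above; the real value depends on how gawk was built.
DEFAULT_AWKPATH = ".:/usr/local/share/awk"


def find_program_file(name, environ=None):
    """Return the path that a "-f name" lookup would use, or None if not found."""
    environ = os.environ if environ is None else environ
    if "/" in name:
        # A name containing "/" is used as given; no path search is done.
        return name if os.path.isfile(name) else None
    search_path = environ.get("AWKPATH", DEFAULT_AWKPATH)
    for directory in search_path.split(":"):
        # Treating an empty component as "." is an assumption of this sketch.
        candidate = os.path.join(directory or ".", name)
        if os.path.isfile(candidate):
            return candidate
    return None
```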
389 executes AWK programs in the following order.
391 all variable assignments specified via the
393 option are performed.
396 compiles the program into an internal form.
399 executes the code in the
402 and then proceeds to read
403 each file named in the
406 If there are no files named on the command line,
408 reads the standard input.
410 If a filename on the command line has the form
412 it is treated as a variable assignment. The variable
414 will be assigned the value
416 (This happens after any
418 block(s) have been run.)
419 Command line variable assignment
420 is most useful for dynamically assigning values to the variables
421 AWK uses to control how input is broken into fields and records. It
422 is also useful for controlling state if multiple passes are needed over
425 If the value of a particular element of
431 For each record in the input,
433 tests to see if it matches any
436 For each pattern that the record matches, the associated
439 The patterns are tested in the order they occur in the program.
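The execution order in this section (-v assignments, BEGIN, each operand in turn, every record tested against the patterns in program order, then END) can be modeled with a small Python driver. Everything here is hypothetical: `run_program`, `process`, rules represented as `(pattern, action)` pairs of Python callables, and a `state` dict standing in for AWK variables. The sketch ignores getline, next, exit, and field splitting.

```python
import re

# An operand of the form var=val (a valid AWK variable name, then "=").
ASSIGNMENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*=")


def process(record, rules, state):
    """Run one record through the rules; patterns are tested in program order."""
    state["$0"] = record
    for pattern, action in rules:
        if pattern is None or pattern(state):   # a missing pattern matches every record
            action(state)


def run_program(assignments, begin, rules, end, operands, read_file):
    """Model the top-level execution order (hypothetical helper).

    assignments: (var, val) pairs from -v options
    begin, end:  lists of callables taking the state dict
    rules:       (pattern, action) pairs; pattern is a callable or None
    operands:    the remaining command line words, i.e. ARGV[1] onward
    read_file:   callable returning the records of a named file ("-" is stdin)
    """
    state = {}
    for var, val in assignments:      # 1. -v assignments, before anything else
        state[var] = val
    for action in begin:              # 2. BEGIN actions, before any input is read
        action(state)
    saw_file = False
    for operand in operands:          # 3. operands, left to right
        if operand == "":
            continue                  # an empty ARGV element is skipped
        if ASSIGNMENT.match(operand):
            var, _, val = operand.partition("=")
            state[var] = val          # var=val is an assignment, not a file name
            continue
        saw_file = True
        for record in read_file(operand):
            process(record, rules, state)
    if not saw_file:
        for record in read_file("-"): # no files named: read standard input
            process(record, rules, state)
    for action in end:                # 4. END actions, once all input is exhausted
        action(state)
    return state
```

Because command line assignments are handled while walking the operands, a value such as `m=2` is visible to the rules for files named after it, but not to the BEGIN actions.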
441 Finally, after all the input is exhausted,
443 executes the code in the
446 .SH VARIABLES, RECORDS AND FIELDS
447 AWK variables are dynamic; they come into existence when they are
448 first used. Their values are either floating-point numbers or strings,
450 depending upon how they are used. AWK also has one-dimensional
451 arrays; arrays with multiple dimensions may be simulated.
452 Several pre-defined variables are set as a program
453 runs; these will be described as needed and summarized below.
455 Normally, records are separated by newline characters. You can control how
456 records are separated by assigning values to the built-in variable
460 is any single character, that character separates records.
463 is a regular expression. Text in the input that matches this
464 regular expression separates records.
465 However, in compatibility mode,
466 only the first character of its string
467 value is used for separating records.
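A simplified Python model of these record-separation rules follows. The function `split_records` and its `compat` flag are hypothetical. The null-RS branch follows gawk's documented paragraph mode (records separated by blank lines), and a multi-character RS is assumed to contain no capturing groups, since `re.split` would return their contents.

```python
import re


def split_records(text, rs="\n", compat=False):
    """Split input text into records according to RS (simplified model).

    A multi-character RS is treated as a regular expression; this sketch
    assumes it contains no capturing groups.
    """
    if rs == "":
        # Null RS: records are separated by blank lines, and leading or
        # trailing newlines in the input are ignored.
        body = text.strip("\n")
        return re.split(r"\n\n+", body) if body else []
    if compat or len(rs) == 1:
        separator = re.escape(rs[0])  # a single (or, in compatibility mode, the first) character
    else:
        separator = rs                # otherwise RS is a regular expression
    records = re.split(separator, text)
    if records and records[-1] == "":
        records.pop()                 # a final terminator does not start an empty record
    return records
```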
470 is set to the null string, then records are separated by
474 is set to the null string, the newline character always acts as
475 a field separator, in addition to whatever value
480 As each input record is read,
482 splits the record into
484 using the value of the
486 variable as the field separator.
489 is a single character, fields are separated by that character.
492 is the null string, then each individual character becomes a
496 is expected to be a full regular expression.
497 In the special case that
499 is a single space, fields are separated
500 by runs of spaces and/or tabs and/or newlines.
501 (But see the discussion of
504 Note that the value of
506 (see below) will also affect how fields are split when
508 is a regular expression, and how records are separated when
510 is a regular expression.
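The FS cases above can be summarized in a short Python model. `split_fields` is a hypothetical name; it ignores IGNORECASE and the paragraph-mode rule that newline always separates fields, and it assumes a regular-expression FS has no capturing groups.

```python
import re


def split_fields(record, fs=" "):
    """Split one record into fields according to FS (simplified model).

    A multi-character FS is treated as a regular expression; this sketch
    assumes it contains no capturing groups.
    """
    if record == "":
        return []                      # an empty record has no fields (NF == 0)
    if fs == " ":
        # Default: runs of spaces, tabs and newlines separate fields, and
        # leading and trailing whitespace is ignored.
        stripped = record.strip(" \t\n")
        return re.split(r"[ \t\n]+", stripped) if stripped else []
    if fs == "":
        return list(record)            # gawk: each character becomes a separate field
    if len(fs) == 1:
        return record.split(fs)        # any other single character is taken literally
    return re.split(fs, record)        # otherwise FS is a regular expression
```

Note the asymmetry the model makes visible: leading whitespace is discarded with the default FS, but a leading match of a regular-expression FS produces an empty first field.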
514 variable is set to a space separated list of numbers, each field is
515 expected to have fixed width, and
517 will split up the record using the specified widths. The value of
520 Assigning a new value to
524 and restores the default behavior.
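Fixed-width splitting can be sketched as taking consecutive substrings. The function `split_fixed_width` is hypothetical, and it simply stops when the record runs out; it does not try to reproduce gawk's exact handling of records shorter than the listed widths (the facility is described above as experimental).

```python
def split_fixed_width(record, fieldwidths):
    """Split a record into fixed-width fields (simplified FIELDWIDTHS model).

    fieldwidths is the whitespace-separated string assigned to FIELDWIDTHS.
    This sketch stops when the record runs out.
    """
    fields, pos = [], 0
    for width in (int(w) for w in fieldwidths.split()):
        if pos >= len(record):
            break
        fields.append(record[pos:pos + width])
        pos += width
    return fields
```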
526 Each field in the input record may be referenced by its position,
531 is the whole record. The value of a field may be assigned to as well.
532 Fields need not be referenced by constants:
542 prints the fifth field in the input record.
545 is set to the total number of fields in the input record.
547 References to non-existent fields (i.e. fields after
549 produce the null string. However, assigning to a non-existent field
552 will increase the value of
554 create any intervening fields with the null string as their value, and
557 to be recomputed, with the fields being separated by the value of
559 References to negative numbered fields cause a fatal error.
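The field rules above (null string for missing fields, empty intervening fields on assignment, $0 rebuilt with OFS, fatal error on negative indexes) can be modeled with a small hypothetical `Record` class. It splits with Python's `str.split()`, which only approximates the default FS.

```python
class Record:
    """Hypothetical model of $0, $1..$NF, NF and OFS.

    Splitting uses str.split(), which approximates the default FS.
    """

    def __init__(self, text, ofs=" "):
        self.ofs = ofs
        self._load(text)

    def _load(self, text):
        self.text = text
        self.fields = text.split()

    @property
    def nf(self):
        return len(self.fields)

    def get(self, n):
        """Value of $n; $0 is the whole record."""
        if n < 0:
            raise IndexError("negative field number")      # a fatal error in gawk
        if n == 0:
            return self.text
        return self.fields[n - 1] if n <= self.nf else ""  # non-existent field: null string

    def set(self, n, value):
        """Assign to $n, extending NF and rebuilding $0 as needed."""
        if n < 0:
            raise IndexError("negative field number")
        if n == 0:
            self._load(value)                     # assigning $0 re-splits the record
            return
        self.fields.extend([""] * (n - self.nf))  # create intervening empty fields
        self.fields[n - 1] = value
        self.text = self.ofs.join(self.fields)    # $0 is recomputed using OFS
```

Reading a missing field, as in `get(5)` on a three-field record, leaves NF unchanged; only assignment extends the record.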
562 causes the values of fields past the new value to be lost, and the value of
564 to be recomputed, with the fields being separated by the value of
566 .SS Built-in Variables
569 built-in variables are:
571 .TP \w'\fBFIELDWIDTHS\fR'u+1n
573 The number of command line arguments (does not include options to
575 or the program source).
580 of the current file being processed.
583 Array of command line arguments. The array is indexed from
587 Dynamically changing the contents of
589 can control the files used for data.
592 The conversion format for numbers, \fB"%.6g"\fR, by default.
595 An array containing the values of the current environment.
596 The array is indexed by the environment variables, each element being
597 the value of that variable (e.g., \fBENVIRON["HOME"]\fP might be
599 Changing this array does not affect the environment seen by programs which
601 spawns via redirection or the
604 (This may change in a future version of
606 .\" but don't hold your breath...
609 If a system error occurs either doing a redirection for
618 a string describing the error.
621 A whitespace-separated list of field widths. When set,
623 parses the input into fields of fixed width, instead of using the
626 variable as the field separator.
627 The fixed field width facility is still experimental; the
628 semantics may change as
633 The name of the current input file.
634 If no files are specified on the command line, the value of
639 is undefined inside the
644 The input record number in the current input file.
647 The input field separator, a space by default. See
652 Controls the case-sensitivity of all regular expression
653 and string operations. If
655 has a non-zero value, then string comparisons and
656 pattern matching in rules,
659 record separating with
674 pre-defined functions will all ignore case when doing regular expression
677 is not equal to zero,
679 matches all of the strings \fB"ab"\fP, \fB"aB"\fP, \fB"Ab"\fP,
681 As with all AWK variables, the initial value of
683 is zero, so all regular expression and string
684 operations are normally case-sensitive.
685 Under Unix, the full ISO 8859-1 Latin-1 character set is used
692 only affected regular expression operations. It now affects string
696 The number of fields in the current input record.
699 The total number of input records seen so far.
702 The output format for numbers, \fB"%.6g"\fR, by default.
705 The output field separator, a space by default.
708 The output record separator, by default a newline.
711 The input record separator, by default a newline.
714 The record terminator.
718 to the input text that matched the character or regular expression
723 The index of the first character matched by
728 The length of the string matched by
733 The character used to separate multiple subscripts in array
734 elements, by default \fB"\e034"\fR.
737 Arrays are subscripted with an expression between square brackets
739 If the expression is an expression list
740 .RI ( expr ", " expr " ...)"
741 then the array subscript is a string consisting of the
742 concatenation of the (string) value of each expression,
743 separated by the value of the
746 This facility is used to simulate multiply dimensioned
751 i = "A";\^ j = "B";\^ k = "C"
753 x[i, j, k] = "hello, world\en"
757 assigns the string \fB"hello, world\en"\fR to the element of the array
759 which is indexed by the string \fB"A\e034B\e034C"\fR. All arrays in AWK
760 are associative, i.e. indexed by string values.
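The subscript rule above can be mirrored with a plain Python dict, since AWK arrays are keyed by strings. The helper `subscript` is hypothetical, and `str()` stands in for AWK's number-to-string conversion.

```python
SUBSEP = "\034"  # default value of SUBSEP


def subscript(*exprs):
    """Build the single string key that AWK uses for array[e1, e2, ...].

    str() stands in for AWK's number-to-string conversion here.
    """
    return SUBSEP.join(str(e) for e in exprs)


x = {}                 # every AWK array is associative: string keys only
i, j, k = "A", "B", "C"
x[subscript(i, j, k)] = "hello, world\n"

# (i, j, k) in x is a membership test on the joined key.
found = subscript(i, j, k) in x
# The components can be recovered by splitting the key on SUBSEP.
parts = next(iter(x)).split(SUBSEP)
```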
768 statement to see if an array has an index consisting of a particular
780 If the array has multiple subscripts, use
781 .BR "(i, j) in array" .
785 construct may also be used in a
787 loop to iterate over all the elements of an array.
789 An element may be deleted from an array using the
794 statement may also be used to delete the entire contents of an array,
795 just by specifying the array name without a subscript.
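The reason to test with `in` rather than by referencing an element is that, in AWK, a plain reference such as `arr["x"]` creates the element. A Python dict subclass (the hypothetical `AwkArray`) can model that, along with deleting one element or the whole array.

```python
class AwkArray(dict):
    """Hypothetical model of an AWK associative array."""

    def __missing__(self, key):
        # Referencing a missing element creates it with the null string value.
        self[key] = ""
        return ""


def demo():
    arr = AwkArray()
    seen = ["x" in arr]          # ("x" in arr): tests membership, creates nothing
    _ = arr["x"]                 # arr["x"] as a plain reference creates the element
    seen.append("x" in arr)
    arr["y"] = 1
    seen.append(sorted(arr))     # for (k in arr) visits every index, in no defined order
    del arr["x"]                 # delete arr["x"]
    seen.append(sorted(arr))
    arr.clear()                  # delete arr (the whole array)
    seen.append(len(arr))
    return seen
```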
796 .SS Variable Typing And Conversion
799 may be (floating point) numbers, or strings, or both. How the
800 value of a variable is interpreted depends upon its context. If used in
801 a numeric expression, it will be treated as a number; if used as a string
802 it will be treated as a string.
804 To force a variable to be treated as a number, add 0 to it; to force it
805 to be treated as a string, concatenate it with the null string.
807 When a string must be converted to a number, the conversion is accomplished
810 A number is converted to a string by using the value of
812 as a format string for
814 with the numeric value of the variable as the argument.
815 However, even though all numbers in AWK are floating-point,
818 converted as integers. Thus, given
832 has a string value of \fB"12"\fR and not \fB"12.00"\fR.
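Both conversions can be sketched in Python. `str_to_num` approximates C's atof(3) by taking the longest leading decimal number (it ignores the hexadecimal and "inf"/"nan" forms some C libraries accept), and `num_to_str` applies CONVFMT except to integral values. Both names, and the `"%2.2f"` format used in the usage line, are illustrative assumptions.

```python
import re

# Longest leading decimal number, in the style of C's atof(3).
# (Hexadecimal, "inf" and "nan" forms that some C libraries accept are ignored.)
NUMERIC_PREFIX = re.compile(r"\s*[-+]?(\d+\.?\d*|\.\d+)([eE][-+]?\d+)?")


def str_to_num(s):
    """Convert a string to a number: the numeric prefix, or 0 if there is none."""
    m = NUMERIC_PREFIX.match(s)
    return float(m.group(0)) if m else 0.0


def num_to_str(x, convfmt="%.6g"):
    """Convert a finite number to a string using CONVFMT, except for integral values."""
    if x == int(x):
        return str(int(x))   # integral values are converted as integers
    return convfmt % x
```

With a CONVFMT of `"%2.2f"`, `num_to_str(12.0, "%2.2f")` gives `"12"` while `num_to_str(3.14159, "%2.2f")` gives `"3.14"`.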
835 performs comparisons as follows:
836 If two variables are numeric, they are compared numerically.
837 If one value is numeric and the other has a string value that is a
838 ``numeric string,'' then comparisons are also done numerically.
839 Otherwise, the numeric value is converted to a string and a string
840 comparison is performed.
841 Two strings are compared, of course, as strings.
842 According to the \*(PX standard, even if two strings are
843 numeric strings, a numeric comparison is performed. However, this is
844 clearly incorrect, and
848 Note that string constants, such as \fB"57"\fP, are
850 numeric strings; they are string constants. The idea of ``numeric string''
851 only applies to fields,
858 elements and the elements of an array created by
860 that are numeric strings.
861 The basic idea is that
863 and only user input, that looks numeric,
864 should be treated that way.
866 Uninitialized variables have the numeric value 0 and the string value ""
867 (the null, or empty, string).
868 .SH PATTERNS AND ACTIONS
869 AWK is a line oriented language. The pattern comes first, and then the
870 action. Action statements are enclosed in
874 Either the pattern may be missing, or the action may be missing, but,
875 of course, not both. If the pattern is missing, the action will be
876 executed for every single record of input.
877 A missing action is equivalent to
883 which prints the entire record.
885 Comments begin with the ``#'' character, and continue until the
887 Blank lines may be used to separate statements.
888 Normally, a statement ends with a newline; however, this is not the
889 case for lines ending in
901 also have their statements automatically continued on the following line.
902 In other cases, a line can be continued by ending it with a ``\e'',
903 in which case the newline will be ignored.
905 Multiple statements may
906 be put on one line by separating them with a ``;''.
907 This applies to both the statements within the action part of a
908 pattern-action pair (the usual case),
909 and to the pattern-action statements themselves.
911 AWK patterns may be one of the following:
917 .BI / "regular expression" /
918 .I "relational expression"
919 .IB pattern " && " pattern
920 .IB pattern " || " pattern
921 .IB pattern " ? " pattern " : " pattern
924 .IB pattern1 ", " pattern2
931 are two special kinds of patterns which are not tested against
933 The action parts of all
935 patterns are merged as if all the statements had
936 been written in a single
938 block. They are executed before any
939 of the input is read. Similarly, all the
942 and executed when all the input is exhausted (or when an
944 statement is executed).
948 patterns cannot be combined with other patterns in pattern expressions.
952 patterns cannot have missing action parts.
955 .BI / "regular expression" /
956 patterns, the associated statement is executed for each input record that matches
957 the regular expression.
958 Regular expressions are the same as those in
960 and are summarized below.
963 .I "relational expression"
964 may use any of the operators defined below in the section on actions.
965 These generally test whether certain fields match certain regular expressions.
972 operators are logical AND, logical OR, and logical NOT, respectively, as in C.
973 They do short-circuit evaluation, also as in C, and are used for combining
974 more primitive pattern expressions. As in most languages, parentheses
975 may be used to change the order of evaluation.
979 operator is like the same operator in C. If the first pattern is true
980 then the pattern used for testing is the second pattern, otherwise it is
981 the third. Only one of the second and third patterns is evaluated.
984 .IB pattern1 ", " pattern2
985 form of an expression is called a
986 .IR "range pattern" .
987 It matches all input records starting with a record that matches
989 and continuing until a record that matches
991 inclusive. It does not combine with any other sort of pattern expression.
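A range pattern is a small state machine, which a Python generator can model. The name `range_matches` is hypothetical; note that a record matching both patterns forms a one-record range, because the end test is applied to the same record that opened the range.

```python
import re


def range_matches(records, start, end):
    """Yield the records selected by the range pattern /start/, /end/."""
    active = False
    for record in records:
        if not active and re.search(start, record):
            active = True             # the range begins at a record matching start
        if active:
            yield record
            if re.search(end, record):
                active = False        # the range ends, inclusively, at a record matching end
```

After a range closes, a later record matching `start` opens a new one.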
992 .SS Regular Expressions
993 Regular expressions are the extended kind found in
995 They are composed of characters as follows:
996 .TP \w'\fB[^\fIabc...\fB]\fR'u+2n
998 matches the non-metacharacter
1002 matches the literal character
1006 matches any character
1011 matches the beginning of a string.
1014 matches the end of a string.
1017 character list, matches any of the characters
1021 negated character list, matches any character except
1025 alternation: matches either
1031 concatenation: matches
1041 matches zero or more
1060 One or two numbers inside braces denote an
1061 .IR "interval expression" .
1062 If there is one number in the braces, the preceding regexp
1066 times. If there are two numbers separated by a comma,
1073 If there is one number followed by a comma, then
1075 is repeated at least
1079 Interval expressions are only available if either
1082 .B \-\^\-re\-interval
1083 is specified on the command line.
1086 matches the empty string at either the beginning or the
1090 matches the empty string within a word.
1093 matches the empty string at the beginning of a word.
1096 matches the empty string at the end of a word.
1099 matches any word-constituent character (letter, digit, or underscore).
1102 matches any character that is not word-constituent.
1105 matches the empty string at the beginning of a buffer (string).
1108 matches the empty string at the end of a buffer.
1110 The escape sequences that are valid in string constants (see below)
1111 are also legal in regular expressions.
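For readers more familiar with Python's `re` module, the GNU operators listed above (`\y`, `\B`, `\<`, `\>`, `\w`, `\W`, `` \` ``, `\'`) map roughly onto Python syntax. The table and `translate_gnu_regex` below are a hypothetical, approximate translation: they do not handle bracket expressions, and Python's `\w` also matches non-ASCII word characters.

```python
import re

# Approximate Python equivalents of the GNU regexp operators listed above.
GNU_TO_PYTHON = {
    "y": r"\b",          # word boundary
    "B": r"\B",          # not a word boundary (inside a word)
    "<": r"\b(?=\w)",    # beginning of a word
    ">": r"\b(?<=\w)",   # end of a word
    "w": r"\w",          # word-constituent character
    "W": r"\W",          # non-word-constituent character
    "`": r"\A",          # beginning of the buffer (string)
    "'": r"\Z",          # end of the buffer
}


def translate_gnu_regex(pattern):
    """Rewrite GNU regexp operators into Python re syntax (bracket expressions not handled)."""
    out, i = [], 0
    while i < len(pattern):
        if pattern[i] == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            # Known GNU operators are translated; other escapes pass through unchanged.
            out.append(GNU_TO_PYTHON.get(nxt, "\\" + nxt))
            i += 2
        else:
            out.append(pattern[i])
            i += 1
    return "".join(out)
```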
1113 .I "Character classes"
1114 are a new feature introduced in the POSIX standard.
1115 A character class is a special notation for describing
1116 lists of characters that have a specific attribute, but where the
1117 actual characters themselves can vary from country to country and/or
1118 from character set to character set. For example, the notion of what
1119 is an alphabetic character differs in the USA and in France.
1121 A character class is only valid in a regexp
1123 the brackets of a character list. Character classes consist of
1125 a keyword denoting the class, and
1127 Here are the character
1128 classes defined by the POSIX standard.
1131 Alphanumeric characters.
1134 Alphabetic characters.
1137 Space or tab characters.
1146 Characters that are both printable and visible.
1147 (A space is printable, but not visible, while an
1152 Lower-case alphabetic characters.
1155 Printable characters (characters that are not control characters.)
1158 Punctuation characters (characters that are not letters, digits,
1159 control characters, or space characters).
1162 Space characters (such as space, tab, and formfeed, to name a few).
1165 Upper-case alphabetic characters.
1168 Characters that are hexadecimal digits.
1170 For example, before the POSIX standard, to match alphanumeric
1171 characters, you would have had to write
1172 .BR /[A\-Za\-z0\-9]/ .
1173 If your character set had other alphabetic characters in it, this would not
1174 match them. With the POSIX character classes, you can write
1178 the alphabetic and numeric characters in your character set.
1180 Two additional special sequences can appear in character lists.
1181 These apply to non-ASCII character sets, which can have single symbols
1183 .IR "collating elements" )
1184 that are represented with more than one
1185 character, as well as several characters that are equivalent for
1187 or sorting, purposes. (E.g., in French, a plain ``e''
1188 and a grave-accented e\` are equivalent.)
1191 A collating symbol is a multi-character collating element enclosed in
1197 is a collating element, then
1199 is a regexp that matches this collating element, while
1201 is a regexp that matches either
1207 An equivalence class is a locale-specific name for a list of
1208 characters that are equivalent. The name is enclosed in
1212 For example, the name
1214 might be used to represent all of
1215 ``e,'' ``e\`,'' and ``e\'.''
1225 These features are very valuable in non-English speaking locales.
1226 The library functions that
1228 uses for regular expression matching
1229 currently only recognize POSIX character classes; they do not recognize
1230 collating symbols or equivalence classes.
1242 operators are specific to
1244 they are extensions based on facilities in the GNU regexp libraries.
1246 The various command line options
1249 interprets characters in regexps.
1252 In the default case,
1254 provide all the facilities of
1255 POSIX regexps and the GNU regexp operators described above.
1256 However, interval expressions are not supported.
1259 Only POSIX regexps are supported; the GNU operators are not special.
1264 Interval expressions are allowed.
1266 .B \-\^\-traditional
1269 regexps are matched. The GNU operators
1270 are not special, interval expressions are not available, and neither
1271 are the POSIX character classes
1274 Characters described by octal and hexadecimal escape sequences are
1275 treated literally, even if they represent regexp metacharacters.
1277 .B \-\^\-re\-interval
1278 Allow interval expressions in regexps, even if
1279 .B \-\^\-traditional
1282 Action statements are enclosed in braces,
1286 Action statements consist of the usual assignment, conditional, and looping
1287 statements found in most languages. The operators, control statements,
1288 and input/output statements
1289 available are patterned after those in C.
1292 The operators in AWK, in order of decreasing precedence, are
1294 .TP "\w'\fB*= /= %= ^=\fR'u+1n"
1302 Increment and decrement, both prefix and postfix.
1305 Exponentiation (\fB**\fR may also be used, and \fB**=\fR for
1306 the assignment operator).
1309 Unary plus, unary minus, and logical negation.
1312 Multiplication, division, and modulus.
1315 Addition and subtraction.
1318 String concatenation.
1328 The regular relational operators.
1331 Regular expression match, negated match.
1333 Do not use a constant regular expression
1335 on the left-hand side of a
1339 Only use one on the right-hand side. The expression
1341 has the same meaning as \fB(($0 ~ /foo/) ~ \fIexp\fB)\fR.
1356 The C conditional expression. This has the form
1357 .IB expr1 " ? " expr2 " : " expr3\c
1360 is true, the value of the expression is
1375 Assignment. Both absolute assignment
1376 .BI ( var " = " value )
1377 and operator-assignment (the other forms) are supported.
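The warning above about a constant regular expression on the left of `~` follows from the stated equivalence: the bare `/foo/` is evaluated first as `($0 ~ /foo/)`, giving 1 or 0, and only that value is matched against the right-hand side. A minimal Python sketch, with the hypothetical name `regex_on_left`:

```python
import re


def regex_on_left(record, left_regex, right_regex):
    """Evaluate `/left_regex/ ~ right_regex` against the record $0."""
    # A bare /left_regex/ in an expression means ($0 ~ /left_regex/): 1 or 0.
    left_value = "1" if re.search(left_regex, record) else "0"
    # Only that "1" or "0" is then matched against the right-hand side.
    return 1 if re.search(right_regex, left_value) else 0
```

So `/foo/ ~ "foo"` is false even for a record containing "foo", which is rarely what the author intended.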
1378 .SS Control Statements
1380 The control statements are
1385 \fBif (\fIcondition\fB) \fIstatement\fR [ \fBelse\fI statement \fR]
1386 \fBwhile (\fIcondition\fB) \fIstatement \fR
1387 \fBdo \fIstatement \fBwhile (\fIcondition\fB)\fR
1388 \fBfor (\fIexpr1\fB; \fIexpr2\fB; \fIexpr3\fB) \fIstatement\fR
1389 \fBfor (\fIvar \fBin\fI array\fB) \fIstatement\fR
1392 \fBdelete \fIarray\^\fB[\^\fIindex\^\fB]\fR
1393 \fBdelete \fIarray\^\fR
1394 \fBexit\fR [ \fIexpression\fR ]
1395 \fB{ \fIstatements \fB}
1398 .SS "I/O Statements"
1400 The input/output statements are as follows:
1402 .TP "\w'\fBprintf \fIfmt, expr-list\fR'u+1n"
1404 Close file (or pipe, see below).
1409 from next input record; set
1414 .BI "getline <" file
1425 from next input record; set
1429 .BI getline " var" " <" file
1436 Stop processing the current input record. The next input record
1437 is read and processing starts over with the first pattern in the
1438 AWK program. If the end of the input data is reached, the
1440 block(s), if any, are executed.
1443 Stop processing the current input file. The next input record read
1444 comes from the next input file.
1450 is reset to 1, and processing starts over with the first pattern in the
1451 AWK program. If the end of the input data is reached, the
1453 block(s), if any, are executed.
1455 Earlier versions of gawk used
1457 as two words. While this usage is still recognized, it generates a
1458 warning message and will eventually be removed.
1461 Prints the current record.
1462 The output record is terminated with the value of the
1466 .BI print " expr-list"
1468 Each expression is separated by the value of the
1471 The output record is terminated with the value of the
1475 .BI print " expr-list" " >" file
1476 Prints expressions on
1478 Each expression is separated by the value of the
1480 variable. The output record is terminated with the value of the
1484 .BI printf " fmt, expr-list"
1487 .BI printf " fmt, expr-list" " >" file
1491 .BI system( cmd-line )
1494 and return the exit status.
1495 (This may not be available on non-\*(PX systems.)
1497 \&\fBfflush(\fR[\fIfile\^\fR]\fB)\fR
1498 Flush any buffers associated with the open output file or pipe
1502 is missing, then standard output is flushed.
1506 then all open output files and pipes
1507 have their buffers flushed.
1509 Other input/output redirections are also allowed. For
1514 appends output to the
1519 In a similar fashion,
1520 .IB command " | getline"
1525 command will return 0 on end of file, and \-1 on an error.
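The return convention of `command | getline` can be modeled in Python with `subprocess`. The class `CommandReader` is hypothetical: it starts the command through the shell on first use and reports 1 with a record, 0 at end of file, or -1 if the command cannot be started.

```python
import subprocess


class CommandReader:
    """Hypothetical model of the return values of `command | getline`."""

    def __init__(self, command):
        self.command = command
        self.proc = None

    def getline(self):
        """Return (status, record): 1 and a record, 0 at end of file, -1 on error."""
        try:
            if self.proc is None:
                self.proc = subprocess.Popen(self.command, shell=True,
                                             stdout=subprocess.PIPE, text=True)
            line = self.proc.stdout.readline()
        except OSError:
            return -1, None
        if line == "":
            return 0, None
        return 1, line.rstrip("\n")

    def close(self):
        """Like close(command): finish the pipe and return the exit status."""
        if self.proc is None:
            return -1
        self.proc.stdout.close()
        return self.proc.wait()
```

A reading loop should test for a status greater than zero, the Python analogue of `while ((cmd | getline line) > 0)`, because a loop that only tests for a nonzero status never terminates when getline keeps returning -1.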
1526 .SS The \fIprintf\fP\^ Statement
1528 The AWK versions of the
1534 accept the following conversion specification formats:
1537 An \s-1ASCII\s+1 character.
1538 If the argument used for
1540 is numeric, it is treated as a character and printed.
1541 Otherwise, the argument is assumed to be a string, and only the first
1542 character of that string is printed.
1549 A decimal number (the integer part).
1556 A floating point number of the form
1557 .BR [\-]d.dddddde[+\^\-]dd .
1566 A floating point number of the form
1567 .BR [\-]ddd.dddddd .
1578 conversion, whichever is shorter, with nonsignificant zeros suppressed.
1587 An unsigned octal number (again, an integer).
1597 An unsigned hexadecimal number (an integer).
1608 character; no argument is converted.
1610 There are optional, additional parameters that may lie between the
1612 and the control letter:
1615 The expression should be left-justified within its field.
1618 For numeric conversions, prefix positive values with a space, and
1619 negative values with a minus sign.
1622 The plus sign, used before the width modifier (see below),
1623 says to always supply a sign for numeric conversions, even if the data
1624 to be formatted is positive. The
1626 overrides the space modifier.
1629 Use an ``alternate form'' for certain control letters.
1632 supply a leading zero.
1648 the result will always contain a
1654 trailing zeros are not removed from the result.
1659 (zero) acts as a flag that indicates output should be
1660 padded with zeroes instead of spaces.
1661 This applies even to non-numeric output formats.
1662 This flag only has an effect when the field width is wider than the
1663 value to be printed.
1666 The field should be padded to this width. The field is normally padded
1669 flag has been used, it is padded with zeroes.
1672 A number that specifies the precision to use when printing.
1678 formats, this specifies the
1679 number of digits you want printed to the right of the decimal point.
1684 formats, it specifies the maximum number
1685 of significant digits. For the
1693 formats, it specifies the minimum number of
1694 digits to print. For a string, it specifies the maximum number of
1695 characters from the string that should be printed.
1701 capabilities of the \*(AN C
1703 routines are supported.
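Python's `%` operator is modeled on C printf, so it can illustrate most of the flags, widths, and precisions described above, including `*`. It is not gawk's implementation, and the two differ in places (for example, Python's `%#o` writes a `0o` prefix rather than a single leading zero, so `%#x` is shown instead). The helper `awk_char` is hypothetical and models the `%c` rule, which Python's own `%c` does not follow for strings longer than one character.

```python
def awk_char(arg):
    """%c: a number prints as the character with that code; a string prints its first character."""
    if isinstance(arg, (int, float)):
        return chr(int(arg))
    return arg[:1]


# Python's % operator follows C printf closely enough to illustrate these cases.
examples = {
    "left": "%-5d|" % 42,          # '-' left-justifies within the field
    "space": "% d" % 42,           # ' ' prefixes positive values with a space
    "plus": "%+ d" % 42,           # '+' overrides ' '
    "alt_x": "%#x" % 255,          # '#' with %x supplies a leading 0x
    "zero": "%05d" % 42,           # '0' pads with zeroes up to the width
    "prec_f": "%.2f" % 3.14159,    # precision: digits after the decimal point
    "prec_s": "%.3s" % "abcdef",   # precision: maximum characters of a string
    "star": "%*d" % (5, 42),       # '*' takes the width from the argument list
}
```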
1706 in place of either the
1710 specifications will cause their values to be taken from
1711 the argument list to
1715 .SS Special File Names
1717 When doing I/O redirection from either
1726 recognizes certain special filenames internally. These filenames
1727 allow access to open file descriptors inherited from
1729 parent process (usually the shell).
1730 Other special filenames provide access to information about the running
1734 .TP \w'\fB/dev/stdout\fR'u+1n
1736 Reading this file returns the process ID of the current process,
1737 in decimal, terminated with a newline.
1740 Reading this file returns the parent process ID of the current process,
1741 in decimal, terminated with a newline.
1744 Reading this file returns the process group ID of the current process,
1745 in decimal, terminated with a newline.
1748 Reading this file returns a single record terminated with a newline.
1749 The fields are separated with spaces.
1766 If there are any additional fields, they are the group IDs returned by
1768 Multiple groups may not be supported on all systems.
1774 The standard output.
1777 The standard error output.
1780 The file associated with the open file descriptor
1783 These are particularly useful for error messages. For example:
1787 print "You blew it!" > "/dev/stderr"
1791 whereas you would otherwise have to use
1795 print "You blew it!" | "cat 1>&2"
1799 These file names may also be used on the command line to name data files.
1800 .SS Numeric Functions
1802 AWK has the following pre-defined arithmetic functions:
1804 .TP \w'\fBsrand(\fR[\fIexpr\^\fR]\fB)\fR'u+1n
1805 .BI atan2( y , " x" )
1806 returns the arctangent of
1811 returns the cosine of
1813 which is in radians.
1816 the exponential function.
1819 truncates to integer.
1822 the natural logarithm function.
1825 returns a random number between 0 and 1.
1830 which is in radians.
1833 the square root function.
1835 \&\fBsrand(\fR[\fIexpr\^\fR]\fB)\fR
1838 as a new seed for the random number generator. If no
1840 is provided, the time of day will be used.
1841 The return value is the previous seed for the random
1843 .SS String Functions
1846 has the following pre-defined string functions:
1848 .TP "\w'\fBsprintf(\^\fIfmt\fB\^, \fIexpr-list\^\fB)\fR'u+1n"
1849 \fBgensub(\fIr\fB, \fIs\fB, \fIh \fR[\fB, \fIt\fR]\fB)\fR
1850 search the target string
1852 for matches of the regular expression
1856 is a string beginning with
1860 then replace all matches of
1866 is a number indicating which match of
1874 Within the replacement text
1880 is a digit from 1 to 9, may be used to indicate just the text that
1883 parenthesized subexpression. The sequence
1885 represents the entire matched text, as does the character
1891 the modified string is returned as the result of the function,
1892 and the original target string is
1895 .TP "\w'\fBsprintf(\^\fIfmt\fB\^, \fIexpr-list\^\fB)\fR'u+1n"
1896 \fBgsub(\fIr\fB, \fIs \fR[\fB, \fIt\fR]\fB)\fR
1897 for each substring matching the regular expression
1901 substitute the string
1903 and return the number of substitutions.
1906 is not supplied, use
1910 in the replacement text is replaced with the text that was actually matched.
1916 .I "AWK Language Programming"
1917 for a fuller discussion of the rules for
1919 and backslashes in the replacement text of
1925 .BI index( s , " t" )
1926 returns the index of the string
1934 \fBlength(\fR[\fIs\fR]\fB)
1935 returns the length of the string
1943 .BI match( s , " r" )
1944 returns the position in
1946 where the regular expression
1950 is not present, and sets the values of
1955 \fBsplit(\fIs\fB, \fIa \fR[\fB, \fIr\fR]\fB)\fR
1960 on the regular expression
1962 and returns the number of fields. If
1970 Splitting behaves identically to field splitting, described above.
1972 .BI sprintf( fmt , " expr-list" )
1977 and returns the resulting string.
1979 \fBsub(\fIr\fB, \fIs \fR[\fB, \fIt\fR]\fB)\fR
1982 but only the first matching substring is replaced.
1984 \fBsubstr(\fIs\fB, \fIi \fR[\fB, \fIn\fR]\fB)\fR
1993 is omitted, the rest of
1998 returns a copy of the string
2000 with all the upper-case characters in
2002 translated to their corresponding lower-case counterparts.
2003 Non-alphabetic characters are left unchanged.
2006 returns a copy of the string
2008 with all the lower-case characters in
2010 translated to their corresponding upper-case counterparts.
2011 Non-alphabetic characters are left unchanged.
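.PP
The following is only an illustrative sketch of several of the
\*(PX string functions described above. The variable names
.BR s ,
.BR t ,
and
.B parts
are arbitrary. The outputs noted in the comments assume the default
output field separator, which is a single space.
.PP
.RS
.ft B
.nf
BEGIN {
	# s, t and parts are arbitrary example names
	s = "foo bar foo"
	n = gsub(/foo/, "[&]", s)	# & is the matched text
	print n, s			# prints: 2 [foo] bar [foo]
	t = "foo bar foo"
	sub(/foo/, "baz", t)		# only the first match changes
	print t				# prints: baz bar foo
	k = split("a:b:c", parts, ":")
	print k, parts[2]		# prints: 3 b
	print substr("hello", 2, 3)	# prints: ell
	print index("hello", "ll")	# prints: 3
	if (match("xxabcxx", /abc/))
		print RSTART, RLENGTH	# prints: 3 3
	print toupper("MixEd 123")	# prints: MIXED 123
}
.fi
.ft R
.RE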
2014 Since one of the primary uses of AWK programs is processing log files
2015 that contain time stamp information,
2017 provides the following two functions for obtaining time stamps and
2020 .TP "\w'\fBsystime()\fR'u+1n"
2022 returns the current time of day as the number of seconds since the Epoch
(midnight UTC, January 1, 1970, on \*(PX systems).
2025 \fBstrftime(\fR[\fIformat \fR[\fB, \fItimestamp\fR]]\fB)\fR
2028 according to the specification in
2032 should be of the same form as returned by
2036 is missing, the current time of day is used.
2039 is missing, a default format equivalent to the output of
2042 See the specification for the
2044 function in \*(AN C for the format conversions that are
2045 guaranteed to be available.
2046 A public-domain version of
2048 and a man page for it come with
2050 if that version was used to build
2052 then all of the conversions described in that man page are available to
2054 .SS String Constants
2056 String constants in AWK are sequences of characters enclosed
2057 between double quotes (\fB"\fR). Within strings, certain
2058 .I "escape sequences"
2059 are recognized, as in C. These are:
2061 .TP \w'\fB\e\^\fIddd\fR'u+1n
2063 A literal backslash.
2066 The ``alert'' character; usually the \s-1ASCII\s+1 \s-1BEL\s+1 character.
2086 .BI \ex "\^hex digits"
2087 The character represented by the string of hexadecimal digits following
2090 As in \*(AN C, all following hexadecimal digits are considered part of
2091 the escape sequence.
2092 (This feature should tell us something about language design by committee.)
2093 E.g., \fB"\ex1B"\fR is the \s-1ASCII\s+1 \s-1ESC\s+1 (escape) character.
2096 The character represented by the 1-, 2-, or 3-digit sequence of octal
digits. E.g., \fB"\e033"\fR is the \s-1ASCII\s+1 \s-1ESC\s+1 (escape) character.
2100 The literal character
2103 The escape sequences may also be used inside constant regular expressions
2105 .B "/[\ \et\ef\en\er\ev]/"
2106 matches whitespace characters).
2108 In compatibility mode, the characters represented by octal and
2109 hexadecimal escape sequences are treated literally when used in
2110 regexp constants. Thus,
2115 Functions in AWK are defined as follows:
2118 \fBfunction \fIname\fB(\fIparameter list\fB) { \fIstatements \fB}\fR
2121 Functions are executed when they are called from within expressions
2122 in either patterns or actions. Actual parameters supplied in the function
2123 call are used to instantiate the formal parameters declared in the function.
Arrays are passed by reference; other variables are passed by value.
2126 Since functions were not originally part of the AWK language, the provision
2127 for local variables is rather clumsy: They are declared as extra parameters
2128 in the parameter list. The convention is to separate local variables from
2129 real parameters by extra spaces in the parameter list. For example:
function f(p, q,     a, b)    # a & b are local
2139 /abc/ { ... ; f(1, 2) ; ... }
2144 The left parenthesis in a function call is required
2145 to immediately follow the function name,
2146 without any intervening white space.
2147 This is to avoid a syntactic ambiguity with the concatenation operator.
2148 This restriction does not apply to the built-in functions listed above.
2150 Functions may call each other and may be recursive.
2151 Function parameters used as local variables are initialized
2152 to the null string and the number zero upon function invocation.
2156 to return a value from a function. The return value is undefined if no
2157 value is provided, or if the function returns by ``falling off'' the
2164 will warn about calls to undefined functions at parse time,
2165 instead of at run time.
2166 Calling an undefined function at run time is a fatal error.
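.PP
As a sketch only, the program below illustrates these rules. It contains
a recursive function, an array argument that the callee changes (because
arrays are passed by reference), and a local variable declared as an
extra parameter.
The names
.BR fact ,
.BR fill ,
and
.B sq
are hypothetical.
.PP
.RS
.ft B
.nf
# fact, fill and sq are hypothetical example names
function fact(n)
{
	if (n <= 1)
		return 1
	return n * fact(n - 1)	# recursive call
}

function fill(arr, n,    i)	# i is a local variable
{
	for (i = 1; i <= n; i++)
		arr[i] = i * i	# changes the caller's array
}

BEGIN {
	print fact(5)		# prints 120
	fill(sq, 3)
	print sq[3]		# prints 9
}
.fi
.ft R
.RE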
2170 may be used in place of
2174 Print and sort the login names of all users:
2178 { print $1 | "sort" }
2181 Count lines in a file:
2185 END { print nlines }
2188 Precede each line by its number in the file:
2194 Concatenate and line number (a variation on a theme):
2211 .IR "The AWK Programming Language" ,
2212 Alfred V. Aho, Brian W. Kernighan, Peter J. Weinberger,
2213 Addison-Wesley, 1988. ISBN 0-201-07981-X.
2215 .IR "AWK Language Programming" ,
2216 Edition 1.0, published by the Free Software Foundation, 1995.
2217 .SH POSIX COMPATIBILITY
2220 is compatibility with the \*(PX standard, as well as with the
2221 latest version of \*(UX
2225 incorporates the following user visible
2226 features which are not described in the AWK book,
2227 but are part of the Bell Labs version of
2229 and are in the \*(PX standard.
2233 option for assigning variables before program execution starts is new.
2234 The book indicates that command line variable assignment happens when
2236 would otherwise open the argument as a file, which is after the
2238 block is executed. However, in earlier implementations, when such an
2239 assignment appeared before any file names, the assignment would happen
2243 block was run. Applications came to depend on this ``feature.''
2246 was changed to match its documentation, this option was added to
2247 accommodate applications that depended upon the old behavior.
2248 (This feature was agreed upon by both the AT&T and GNU developers.)
option for implementation-specific features is from the \*(PX standard.
2254 When processing arguments,
2256 uses the special option ``\fB\-\^\-\fP'' to signal the end of
2258 In compatibility mode, it will warn about, but otherwise ignore,
2260 In normal operation, such arguments are passed on to the AWK program for
2263 The AWK book does not define the return value of
2266 has it return the seed it was using, to allow keeping track
2267 of random number sequences. Therefore
2271 also returns its current seed.
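.PP
The following sketch uses an arbitrary seed of 42. It shows that supplying
the same seed again replays the same random sequence. It also shows that the
value returned by
.B srand()
is the seed that was in effect before the call:
.PP
.RS
.ft B
.nf
BEGIN {
	srand(42); a = rand()	# 42 is an arbitrary seed
	prev = srand(42)	# returns the previous seed, 42
	b = rand()		# same seed, so same first value
	print prev, (a == b)	# prints: 42 1
}
.fi
.ft R
.RE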
2273 Other new features are:
2284 escape sequences (done originally in
2286 and fed back into AT&T's); the
2290 built-in functions (from AT&T); and the \*(AN C conversion specifications in
2292 (done first in AT&T's version).
2295 has a number of extensions to \*(PX
2297 They are described in this section. All the extensions described here
2302 .B \-\^\-traditional
2305 The following features of
2307 are not available in
2335 The special file names available for I/O redirection are not recognized.
2343 variables are not special.
2348 variable and its side-effects are not available.
2353 variable and fixed-width field splitting.
2358 as a regular expression.
2361 The ability to split out individual characters using the null string
2364 and as the third argument to
2368 No path search is performed for files named via the
2370 option. Therefore the
2372 environment variable is not special.
2377 to abandon processing of the current input file.
2382 to delete the entire contents of an array.
2385 The AWK book does not define the return value of the
2390 returns the value from
2394 when closing a file or pipe, respectively.
2399 .B \-\^\-traditional
2405 option is ``t'', then
2407 will be set to the tab character.
2409 .B "gawk \-F\et \&..."
simply causes the shell to quote the ``t'' and does not pass
2414 Since this is a rather ugly special case, it is not the default behavior.
2415 This behavior also does not occur if
2418 To really get a tab character as the field separator, it is best to use
2420 .BR "gawk \-F'\et' \&..." .
2425 was compiled for debugging, it will
2426 accept the following additional options:
2437 debugging output during program parsing.
2438 This option should only be of interest to the
2440 maintainers, and may not even be compiled into
2443 .SH HISTORICAL FEATURES
2444 There are two features of historical AWK implementations that
2447 First, it is possible to call the
2449 built-in function not only with no argument, but even without parentheses!
2454 a = length # Holy Algol 60, Batman!
2458 is the same as either of
2468 This feature is marked as ``deprecated'' in the \*(PX standard, and
2470 will issue a warning about its use if
2472 is specified on the command line.
2474 The other feature is the use of either the
2478 statements outside the body of a
2483 loop. Traditional AWK implementations have treated such usage as
2488 will support this usage if
2489 .B \-\^\-traditional
2494 exists in the environment, then
2496 behaves exactly as if
2498 had been specified on the command line.
2503 will issue a warning message to this effect.
2507 environment variable can be used to provide a list of directories that
2509 will search when looking for files named via the
2517 option is not necessary given the command line variable assignment feature;
2518 it remains only for backwards compatibility.
2520 If your system actually has support for
2527 files, you may get different output from
2529 than you would get on a system without those files. When
2531 interprets these files internally, it synchronizes output to the standard
2532 output with output to
2534 while on a system with those files, the output is actually to different
2538 Syntactically invalid single character programs tend to overflow
2539 the parse stack, generating a rather unhelpful message. Such programs
2540 are surprisingly difficult to diagnose in the completely general case,
2541 and the effort to do so really is not worth it.
2542 .SH VERSION INFORMATION
2543 This man page documents
2547 The original version of \*(UX
2549 was designed and implemented by Alfred Aho,
2550 Peter Weinberger, and Brian Kernighan of AT&T Bell Labs. Brian Kernighan
2551 continues to maintain and enhance it.
2553 Paul Rubin and Jay Fenlason,
2554 of the Free Software Foundation, wrote
2556 to be compatible with the original version of
2558 distributed in Seventh Edition \*(UX.
2559 John Woods contributed a number of bug fixes.
2560 David Trueman, with contributions
2561 from Arnold Robbins, made
2563 compatible with the new version of \*(UX
2565 Arnold Robbins is the current maintainer.
2567 The initial DOS port was done by Conrad Kwok and Scott Garfinkle.
2568 Scott Deifik is the current DOS maintainer. Pat Rankin did the
2569 port to VMS, and Michal Jaegermann did the port to the Atari ST.
2570 The port to OS/2 was done by Kai Uwe Rommel, with contributions and
2571 help from Darrel Hankerson. Fred Fish supplied support for the Amiga.
2573 If you find a bug in
2575 please send electronic mail to
2576 .BR bug-gnu-utils@gnu.org ,
2579 .BR arnold@gnu.org .
2580 Please include your operating system and its revision, the version of
2582 what C compiler you used to compile it, and a test program
2583 and data that are as small as possible for reproducing the problem.
2585 Before sending a bug report, please do two things. First, verify that
2586 you have the latest version of
2588 Many bugs (usually subtle ones) are fixed at each release, and if
2589 yours is out of date, the problem may already have been solved.
2590 Second, please read this man page and the reference manual carefully to
2591 be sure that what you think is a bug really is, instead of just a quirk
2596 post a bug report in
2600 developers occasionally read this newsgroup, posting bug reports there
2601 is an unreliable way to report bugs. Instead, please use the electronic mail
2602 addresses given above.
2603 .SH ACKNOWLEDGEMENTS
2604 Brian Kernighan of Bell Labs
2605 provided valuable assistance during testing and debugging.
2607 .SH COPYING PERMISSIONS
Copyright \(co 1996,97,98,99 Free Software Foundation, Inc.
2610 Permission is granted to make and distribute verbatim copies of
2611 this manual page provided the copyright notice and this permission
2612 notice are preserved on all copies.
2614 Permission is granted to process this file through troff and print the
2615 results, provided the printed document carries copying permission
2616 notice identical to this one except for the removal of this paragraph
2617 (this paragraph not being relevant to the printed manual page).
2620 Permission is granted to copy and distribute modified versions of this
2621 manual page under the conditions for verbatim copying, provided that
2622 the entire resulting derived work is distributed under the terms of a
2623 permission notice identical to this one.
2625 Permission is granted to copy and distribute translations of this
2626 manual page into another language, under the above conditions for
2627 modified versions, except that this permission notice may be stated in
2628 a translation approved by the Foundation.