9 . if \w'\(lq' .ds lq "\(lq
13 . if \w'\(rq' .ds rq "\(rq
16 .TH GAWK 1 "May 17 2000" "Free Software Foundation" "Utility Commands"
18 gawk \- pattern scanning and processing language
21 [ \*(PX or \*(GN style options ]
29 [ \*(PX or \*(GN style options ]
37 is the \*(GN Project's implementation of the \*(AK programming language.
38 It conforms to the definition of the language in
39 the \*(PX 1003.2 Command Language And Utilities Standard.
40 This version in turn is based on the description in
41 .IR "The AWK Programming Language" ,
42 by Aho, Kernighan, and Weinberger,
43 with the additional features found in the System V Release 4 version
47 also provides more recent Bell Labs
49 extensions, and some \*(GN-specific extensions.
51 The command line consists of options to
53 itself, the \*(AK program text (if not supplied via the
57 options), and values to be made
62 pre-defined \*(AK variables.
66 options may be either the traditional \*(PX one-letter options,
67 or the \*(GN style long options. \*(PX options start with a single \*(lq\-\*(rq,
68 while long options start with \*(lq\-\^\-\*(rq.
69 Long options are provided for both \*(GN-specific features and
70 for \*(PX mandated features.
72 Following the \*(PX standard,
74 options are supplied via arguments to the
78 options may be supplied
81 option has a corresponding long option, as detailed below.
82 Arguments to long options are either joined with the option
85 sign, with no intervening spaces, or they may be provided in the
86 next command line argument.
87 Long options may be abbreviated, as long as the abbreviation
92 accepts the following options.
98 .BI \-\^\-field-separator " fs"
101 for the input field separator (the value of the
107 \fB\-v\fI var\fB\^=\^\fIval\fR
110 \fB\-\^\-assign \fIvar\fB\^=\^\fIval\fR
115 before execution of the program begins.
116 Such variable values are available to the
118 block of an \*(AK program.
121 .BI \-f " program-file"
124 .BI \-\^\-file " program-file"
125 Read the \*(AK program source from the file
127 instead of from the first command line argument.
139 Set various memory limits to the value
143 flag sets the maximum number of fields, and the
145 flag sets the maximum record size. These two flags and the
147 option are from the Bell Labs research version of \*(UX
153 has no pre-defined limits.
168 mode. In compatibility mode,
170 behaves identically to \*(UX
172 none of the \*(GN-specific extensions are recognized.
175 is preferred over the other forms of this option.
177 .BR "GNU EXTENSIONS" ,
178 below, for more information.
191 Print the short version of the \*(GN copyright information message on
192 the standard output, and exit successfully.
205 Print a relatively short summary of the available options on
208 .IR "GNU Coding Standards" ,
209 these options cause an immediate, successful exit.)
216 Provide warnings about constructs that are
217 dubious or non-portable to other \*(AK implementations.
224 Provide warnings about constructs that are
225 not portable to the original version of Unix
228 .\" This option is left undocumented, on purpose.
235 Provide a moment of nostalgia for long-time
247 mode, with the following additional restrictions:
252 escape sequences are not recognized.
255 Only space and tab act as field separators when
257 is set to a single space, newline does not.
271 cannot be used in place of
279 function is not available.
283 .B "\-W re\-interval"
286 .B \-\^\-re\-interval
288 .I "interval expressions"
289 in regular expression matching
291 .BR "Regular Expressions" ,
293 Interval expressions were not traditionally available in the
294 \*(AK language. The \*(PX standard added them, to make
298 consistent with each other.
299 However, their use is likely
300 to break old \*(AK programs, so
302 only provides them if they are requested with this option, or when
307 .BI "\-W source " program-text
310 .BI \-\^\-source " program-text"
313 as \*(AK program source code.
314 This option allows the easy intermixing of library functions (used via the
318 options) with source code entered on the command line.
319 It is intended primarily for medium to large \*(AK programs used
327 Print version information for this particular copy of
329 on the standard output.
330 This is useful mainly for knowing if the current copy of
333 is up to date with respect to whatever the Free Software Foundation
335 This is also useful when reporting bugs.
337 .IR "GNU Coding Standards" ,
338 these options cause an immediate, successful exit.)
341 Signal the end of options. This is useful to allow further arguments to the
342 \*(AK program itself to start with a \*(lq\-\*(rq.
343 This is mainly for consistency with the argument parsing convention used
344 by most other \*(PX programs.
346 In compatibility mode,
347 any other options are flagged as illegal, but are otherwise ignored.
348 In normal operation, as long as program text has been supplied, unknown
349 options are passed on to the \*(AK program in the
351 array for processing. This is particularly useful for running \*(AK
352 programs via the \*(lq#!\*(rq executable interpreter mechanism.
353 .SH AWK PROGRAM EXECUTION
355 An \*(AK program consists of a sequence of pattern-action statements
356 and optional function definitions.
359 \fIpattern\fB { \fIaction statements\fB }\fR
361 \fBfunction \fIname\fB(\fIparameter list\fB) { \fIstatements\fB }\fR
365 first reads the program source from the
370 or from the first non-option argument on the command line.
375 options may be used multiple times on the command line.
377 will read the program text as if all the
379 and command line source texts
380 had been concatenated together. This is useful for building libraries
381 of \*(AK functions, without having to include them in each new \*(AK
382 program that uses them. It also provides the ability to mix library
383 functions with command line programs.
385 The environment variable
387 specifies a search path to use when finding source files named with
390 option. If this variable does not exist, the default path is
391 \fB".:/usr/local/share/awk"\fR.
392 (The actual directory may vary, depending upon how
394 was built and installed.)
395 If a file name given to the
397 option contains a \*(lq/\*(rq character, no path search is performed.
400 executes \*(AK programs in the following order.
402 all variable assignments specified via the
404 option are performed.
407 compiles the program into an internal form.
410 executes the code in the
413 and then proceeds to read
414 each file named in the
417 If there are no files named on the command line,
419 reads the standard input.
421 If a filename on the command line has the form
423 it is treated as a variable assignment. The variable
425 will be assigned the value
427 (This happens after any
429 block(s) have been run.)
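The ordering above can be checked with a short sketch. It assumes a POSIX-conforming awk and a POSIX shell; the names greeting and sep are purely illustrative. A -v assignment is already visible in BEGIN, while a var=value operand only takes effect once awk reaches it in the argument list:

```shell
#!/bin/sh
# Assumes a POSIX awk on the PATH; "greeting" and "sep" are illustrative names.
# -v greeting=hello is performed before BEGIN runs.
# The operand sep=- is performed only when awk reaches it in the operand
# list, i.e. after BEGIN; the following "-" operand names standard input.
out=$(printf 'a b\n' | awk -v greeting=hello '
    BEGIN { printf "BEGIN: greeting=[%s] sep=[%s]\n", greeting, sep }
    { print $1 sep $2 }
' sep=- -)
printf '%s\n' "$out"
```

Run as is, this should print BEGIN: greeting=[hello] sep=[] followed by a-b.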
430 Command line variable assignment
431 is most useful for dynamically assigning values to the variables
432 \*(AK uses to control how input is broken into fields and records.
433 It is also useful for controlling state if multiple passes are needed over
436 If the value of a particular element of
442 For each record in the input,
444 tests to see if it matches any
446 in the \*(AK program.
447 For each pattern that the record matches, the associated
450 The patterns are tested in the order they occur in the program.
452 Finally, after all the input is exhausted,
454 executes the code in the
457 .SH VARIABLES, RECORDS AND FIELDS
458 \*(AK variables are dynamic; they come into existence when they are
459 first used. Their values are either floating-point numbers or strings,
461 depending upon how they are used. \*(AK also has one-dimensional
462 arrays; arrays with multiple dimensions may be simulated.
463 Several pre-defined variables are set as a program
464 runs; these will be described as needed and summarized below.
466 Normally, records are separated by newline characters. You can control how
467 records are separated by assigning values to the built-in variable
471 is any single character, that character separates records.
474 is a regular expression. Text in the input that matches this
475 regular expression will separate the record.
476 However, in compatibility mode,
477 only the first character of its string
478 value is used for separating records.
481 is set to the null string, then records are separated by
485 is set to the null string, the newline character always acts as
486 a field separator, in addition to whatever value
491 As each input record is read,
493 splits the record into
495 using the value of the
497 variable as the field separator.
500 is a single character, fields are separated by that character.
503 is the null string, then each individual character becomes a
507 is expected to be a full regular expression.
508 In the special case that
510 is a single space, fields are separated
511 by runs of spaces and/or tabs and/or newlines.
512 (But see the discussion of
515 Note that the value of
517 (see below) will also affect how fields are split when
519 is a regular expression, and how records are separated when
521 is a regular expression.
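A small sketch of the three field-splitting cases, assuming a POSIX awk; each command prints NF and one field so the splitting is visible:

```shell
#!/bin/sh
# Assumes a POSIX awk on the PATH.
# FS = " " (default): leading/trailing blanks ignored, runs of blanks/tabs split.
a=$(printf '  one \t two   three  \n' | awk '{ print NF ":" $2 }')
# FS = ":" (single character): every ":" splits, so empty fields survive.
b=$(printf 'x::z\n' | awk -F: '{ print NF ":" $3 }')
# FS = "[,;]+" (regular expression): any run of commas/semicolons splits.
c=$(printf 'p,;q;;r\n' | awk -F'[,;]+' '{ print NF ":" $2 }')
printf '%s\n%s\n%s\n' "$a" "$b" "$c"
```

The expected lines are 3:two, 3:z and 3:q. The single-character separator keeps the empty field between the two colons, while the default single-space FS collapses runs of blanks.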
525 variable is set to a space-separated list of numbers, each field is
526 expected to have fixed width, and
528 will split up the record using the specified widths. The value of
531 Assigning a new value to
535 and restores the default behavior.
537 Each field in the input record may be referenced by its position,
542 is the whole record. The value of a field may be assigned to as well.
543 Fields need not be referenced by constants:
553 prints the fifth field in the input record.
556 is set to the total number of fields in the input record.
558 References to non-existent fields (i.e. fields after
560 produce the null string. However, assigning to a non-existent field
563 will increase the value of
565 create any intervening fields with the null string as their value, and
568 to be recomputed, with the fields being separated by the value of
570 References to negative-numbered fields cause a fatal error.
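A minimal sketch (assuming a POSIX awk) of assigning past the last field. The gap is filled with empty fields and $0 is rebuilt with OFS, set here to "-" so the separators are visible:

```shell
#!/bin/sh
# Assumes a POSIX awk on the PATH.
# $5 does not exist for a 3-field record: NF becomes 5, $4 is created empty,
# and $0 is recomputed with OFS between the fields.
out=$(echo 'a b c' | awk 'BEGIN { OFS = "-" } { $5 = "e"; print NF; print $0; print "[" $4 "]" }')
printf '%s\n' "$out"
```

This prints 5, then a-b-c--e, then [] for the empty fourth field.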
573 causes the values of fields past the new value to be lost, and the value of
575 to be recomputed, with the fields being separated by the value of
577 .SS Built-in Variables
580 built-in variables are:
582 .TP \w'\fBFIELDWIDTHS\fR'u+1n
584 The number of command line arguments (does not include options to
586 or the program source).
591 of the current file being processed.
594 Array of command line arguments. The array is indexed from
598 Dynamically changing the contents of
600 can control the files used for data.
603 The conversion format for numbers, \fB"%.6g"\fR, by default.
606 An array containing the values of the current environment.
607 The array is indexed by the environment variables, each element being
608 the value of that variable (e.g., \fBENVIRON["HOME"]\fP might be
610 Changing this array does not affect the environment seen by programs which
612 spawns via redirection or the
615 (This may change in a future version of
617 .\" but don't hold your breath...
620 If a system error occurs either doing a redirection for
629 a string describing the error.
632 A whitespace-separated list of field widths. When set,
634 parses the input into fields of fixed width, instead of using the
637 variable as the field separator.
638 The fixed field width facility is still experimental; the
639 semantics may change as
644 The name of the current input file.
645 If no files are specified on the command line, the value of
650 is undefined inside the
655 The input record number in the current input file.
658 The input field separator, a space by default. See
663 Controls the case-sensitivity of all regular expression
664 and string operations. If
666 has a non-zero value, then string comparisons and
667 pattern matching in rules,
670 record separating with
685 pre-defined functions will all ignore case when doing regular expression
688 is not equal to zero,
690 matches all of the strings \fB"ab"\fP, \fB"aB"\fP, \fB"Ab"\fP,
692 As with all \*(AK variables, the initial value of
694 is zero, so all regular expression and string
695 operations are normally case-sensitive.
696 Under Unix, the full ISO 8859-1 Latin-1 character set is used
703 only affected regular expression operations. It now affects string
707 The number of fields in the current input record.
710 The total number of input records seen so far.
713 The output format for numbers, \fB"%.6g"\fR, by default.
716 The output field separator, a space by default.
719 The output record separator, by default a newline.
722 The input record separator, by default a newline.
725 The record terminator.
729 to the input text that matched the character or regular expression
734 The index of the first character matched by
739 The length of the string matched by
744 The character used to separate multiple subscripts in array
745 elements, by default \fB"\e034"\fR.
748 Arrays are subscripted with an expression between square brackets
750 If the expression is an expression list
751 .RI ( expr ", " expr " .\|.\|.)"
752 then the array subscript is a string consisting of the
753 concatenation of the (string) value of each expression,
754 separated by the value of the
757 This facility is used to simulate multiply dimensioned
762 i = "A";\^ j = "B";\^ k = "C"
764 x[i, j, k] = "hello, world\en"
768 assigns the string \fB"hello, world\en"\fR to the element of the array
770 which is indexed by the string \fB"A\e034B\e034C"\fR. All arrays in \*(AK
771 are associative, i.e. indexed by string values.
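The SUBSEP mechanism can be made visible with a sketch like the following (assuming a POSIX awk; x, key and part are illustrative names):

```shell
#!/bin/sh
# Assumes a POSIX awk on the PATH.
out=$(awk 'BEGIN {
    i = "A"; j = "B"; k = "C"
    x[i, j, k] = "hello, world"
    # The comma list is stored as one string key joined with SUBSEP.
    if (("A" SUBSEP "B" SUBSEP "C") in x) print "joined key found"
    if (("A", "B", "C") in x) print "list key found"
    # split() on SUBSEP recovers the individual subscripts.
    for (key in x) {
        n = split(key, part, SUBSEP)
        print n, part[1], part[2], part[3]
    }
}')
printf '%s\n' "$out"
```

Expected output: joined key found, list key found, and 3 A B C.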
779 statement to see if an array has an index consisting of a particular
791 If the array has multiple subscripts, use
792 .BR "(i, j) in array" .
796 construct may also be used in a
798 loop to iterate over all the elements of an array.
800 An element may be deleted from an array using the
805 statement may also be used to delete the entire contents of an array,
806 just by specifying the array name without a subscript.
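A short sketch of membership tests and element deletion, assuming a POSIX awk (the array name count is illustrative). Deleting a whole array with a bare array name is left out, since older awk implementations may not accept it:

```shell
#!/bin/sh
# Assumes a POSIX awk on the PATH.
out=$(awk 'BEGIN {
    count["apple"] = 3; count["pear"] = 0
    # "in" tests for an index without creating it; count["plum"] would create one.
    a = ("plum" in count); b = ("pear" in count)
    print a, b
    delete count["apple"]          # remove a single element
    n = 0
    for (k in count) n++           # iterate over the remaining indices
    print n
}')
printf '%s\n' "$out"
```

It should print 0 1, then 1: pear is present even though its value is 0, and only pear remains after the deletion.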
807 .SS Variable Typing And Conversion
810 may be (floating point) numbers, or strings, or both. How the
811 value of a variable is interpreted depends upon its context. If used in
812 a numeric expression, it will be treated as a number; if used as a string,
813 it will be treated as a string.
815 To force a variable to be treated as a number, add 0 to it; to force it
816 to be treated as a string, concatenate it with the null string.
818 When a string must be converted to a number, the conversion is accomplished
821 A number is converted to a string by using the value of
823 as a format string for
825 with the numeric value of the variable as the argument.
826 However, even though all numbers in \*(AK are floating-point,
829 converted as integers. Thus, given
843 has a string value of \fB"12"\fR and not \fB"12.00"\fR.
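A sketch of both conversion directions, assuming a POSIX awk; CONVFMT is set to "%2.2f" here only as an illustrative choice:

```shell
#!/bin/sh
# Assumes a POSIX awk on the PATH.
out=$(awk 'BEGIN {
    CONVFMT = "%2.2f"
    a = 12;      b = a ""     # integral value: converted as an integer, "12"
    c = 3.14159; d = c ""     # non-integral value: converted with CONVFMT, "3.14"
    print b; print d
    x = "3.0abc"; print x + 0 # string-to-number conversion uses the leading numeric prefix
}')
printf '%s\n' "$out"
```

Expected output: 12, 3.14 and 3.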
846 performs comparisons as follows:
847 If two variables are numeric, they are compared numerically.
848 If one value is numeric and the other has a string value that is a
849 \*(lqnumeric string,\*(rq then comparisons are also done numerically.
850 Otherwise, the numeric value is converted to a string and a string
851 comparison is performed.
852 Two strings are compared, of course, as strings.
853 According to the \*(PX standard, even if two strings are
854 numeric strings, a numeric comparison is performed. However, this is
855 clearly incorrect, and
859 Note that string constants, such as \fB"57"\fP, are
861 numeric strings, they are string constants.
862 The idea of \*(lqnumeric string\*(rq
863 only applies to fields,
870 elements and the elements of an array created by
872 that are numeric strings.
873 The basic idea is that
875 and only user input, that looks numeric,
876 should be treated that way.
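The comparison rules can be exercised with a sketch along these lines (assuming a POSIX awk; the input 10 9 is arbitrary):

```shell
#!/bin/sh
# Assumes a POSIX awk on the PATH.
out=$(echo '10 9' | awk '{
    print ($1 > $2)      # both fields look numeric: numeric comparison, 1
    print ("10" > "9")   # two string constants: string comparison, 0
    print ($1 > "9")     # a string constant is never a numeric string: 0
}')
printf '%s\n' "$out"
```

It should print 1, 0 and 0: only the comparison between the two fields is numeric.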
878 Uninitialized variables have the numeric value 0 and the string value ""
879 (the null, or empty, string).
880 .SH PATTERNS AND ACTIONS
881 \*(AK is a line-oriented language. The pattern comes first, and then the
882 action. Action statements are enclosed in
886 Either the pattern may be missing, or the action may be missing, but,
887 of course, not both. If the pattern is missing, the action will be
888 executed for every single record of input.
889 A missing action is equivalent to
895 which prints the entire record.
897 Comments begin with the \*(lq#\*(rq character, and continue until the
899 Blank lines may be used to separate statements.
900 Normally, a statement ends with a newline; however, this is not the
901 case for lines ending in
913 also have their statements automatically continued on the following line.
914 In other cases, a line can be continued by ending it with a \*(lq\e\*(rq,
915 in which case the newline will be ignored.
917 Multiple statements may
918 be put on one line by separating them with a \*(lq;\*(rq.
919 This applies to both the statements within the action part of a
920 pattern-action pair (the usual case),
921 and to the pattern-action statements themselves.
923 \*(AK patterns may be one of the following:
929 .BI / "regular expression" /
930 .I "relational expression"
931 .IB pattern " && " pattern
932 .IB pattern " || " pattern
933 .IB pattern " ? " pattern " : " pattern
936 .IB pattern1 ", " pattern2
943 are two special kinds of patterns which are not tested against
945 The action parts of all
947 patterns are merged as if all the statements had
948 been written in a single
950 block. They are executed before any
951 of the input is read. Similarly, all the
954 and executed when all the input is exhausted (or when an
956 statement is executed).
960 patterns cannot be combined with other patterns in pattern expressions.
964 patterns cannot have missing action parts.
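A minimal sketch, assuming a POSIX awk, showing two BEGIN blocks running in order before any input, a pattern-less rule running for each record, and END running once the input is exhausted:

```shell
#!/bin/sh
# Assumes a POSIX awk on the PATH.
out=$(printf '1\n2\n3\n' | awk '
    BEGIN { total = 0 }            # BEGIN blocks run in order, before any input
    BEGIN { print "start" }
    { total += $1 }                # no pattern: runs for every record
    END { print "total", total }   # runs once the input is exhausted
')
printf '%s\n' "$out"
```

Expected output: start, then total 6.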
967 .BI / "regular expression" /
968 patterns, the associated statement is executed for each input record that matches
969 the regular expression.
970 Regular expressions are the same as those in
972 and are summarized below.
975 .I "relational expression"
976 may use any of the operators defined below in the section on actions.
977 These generally test whether certain fields match certain regular expressions.
984 operators are logical AND, logical OR, and logical NOT, respectively, as in C.
985 They do short-circuit evaluation, also as in C, and are used for combining
986 more primitive pattern expressions. As in most languages, parentheses
987 may be used to change the order of evaluation.
991 operator is like the same operator in C. If the first pattern is true
992 then the pattern used for testing is the second pattern, otherwise it is
993 the third. Only one of the second and third patterns is evaluated.
996 .IB pattern1 ", " pattern2
997 form of an expression is called a
998 .IR "range pattern" .
999 It matches all input records starting with a record that matches
1001 and continuing until a record that matches
1003 inclusive. It does not combine with any other sort of pattern expression.
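A sketch of a range pattern, assuming a POSIX awk. A pattern with no action prints each matching record, and a range that is still open at the end of input simply runs to the end:

```shell
#!/bin/sh
# Assumes a POSIX awk on the PATH.
# Records from a START line through the next END line (inclusive) are printed;
# the second range never sees an END, so it stays open until end of input.
out=$(printf 'a\nSTART\nb\nEND\nc\nSTART\nd\n' | awk '/START/, /END/')
printf '%s\n' "$out"
```

This should print START, b, END, START and d.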
1004 .SS Regular Expressions
1005 Regular expressions are the extended kind found in
1007 They are composed of characters as follows:
1008 .TP \w'\fB[^\fIabc.\|.\|.\fB]\fR'u+2n
1010 matches the non-metacharacter
1014 matches the literal character
1018 matches any character
1023 matches the beginning of a string.
1026 matches the end of a string.
1029 character list, matches any of the characters
1033 negated character list, matches any character except
1037 alternation: matches either
1043 concatenation: matches
1053 matches zero or more
1072 One or two numbers inside braces denote an
1073 .IR "interval expression" .
1074 If there is one number in the braces, the preceding regexp
1078 times. If there are two numbers separated by a comma,
1085 If there is one number followed by a comma, then
1087 is repeated at least
1091 Interval expressions are only available if either
1094 .B \-\^\-re\-interval
1095 is specified on the command line.
1098 matches the empty string at either the beginning or the
1102 matches the empty string within a word.
1105 matches the empty string at the beginning of a word.
1108 matches the empty string at the end of a word.
1111 matches any word-constituent character (letter, digit, or underscore).
1114 matches any character that is not word-constituent.
1117 matches the empty string at the beginning of a buffer (string).
1120 matches the empty string at the end of a buffer.
1122 The escape sequences that are valid in string constants (see below)
1123 are also legal in regular expressions.
1125 .I "Character classes"
1126 are a new feature introduced in the \*(PX standard.
1127 A character class is a special notation for describing
1128 lists of characters that have a specific attribute, but where the
1129 actual characters themselves can vary from country to country and/or
1130 from character set to character set. For example, the notion of what
1131 is an alphabetic character differs in the USA and in France.
1133 A character class is only valid in a regexp
1135 the brackets of a character list. Character classes consist of
1137 a keyword denoting the class, and
1139 Here are the character
1140 classes defined by the \*(PX standard.
1143 Alphanumeric characters.
1146 Alphabetic characters.
1149 Space or tab characters.
1158 Characters that are both printable and visible.
1159 (A space is printable, but not visible, while an
1164 Lower-case alphabetic characters.
1167 Printable characters (characters that are not control characters).
1170 Punctuation characters (characters that are not letters, digits,
1171 control characters, or space characters).
1174 Space characters (such as space, tab, and formfeed, to name a few).
1177 Upper-case alphabetic characters.
1180 Characters that are hexadecimal digits.
1182 For example, before the \*(PX standard, to match alphanumeric
1183 characters, you would have had to write
1184 .BR /[A\-Za\-z0\-9]/ .
1185 If your character set had other alphabetic characters in it, this would not
1186 match them. With the \*(PX character classes, you can write
1190 the alphabetic and numeric characters in your character set.
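A sketch using [:alnum:], assuming an awk that implements the POSIX bracket expressions described above; LC_ALL=C pins the character set so the count does not depend on the locale:

```shell
#!/bin/sh
# Assumes a POSIX awk with character-class support on the PATH.
# Count lines made up entirely of alphanumeric characters.
out=$(printf 'ab12\n--\nX_9\nZ\n' | LC_ALL=C awk '/^[[:alnum:]]+$/ { n++ } END { print n }')
printf '%s\n' "$out"
```

Only ab12 and Z match, so it prints 2; the underscore in X_9 is not alphanumeric.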
1192 Two additional special sequences can appear in character lists.
1193 These apply to non-ASCII character sets, which can have single symbols
1195 .IR "collating elements" )
1196 that are represented with more than one
1197 character, as well as several characters that are equivalent for
1199 or sorting, purposes. (E.g., in French, a plain \*(lqe\*(rq
1200 and a grave-accented e\` are equivalent.)
1203 A collating symbol is a multi-character collating element enclosed in
1209 is a collating element, then
1211 is a regexp that matches this collating element, while
1213 is a regexp that matches either
1219 An equivalence class is a locale-specific name for a list of
1220 characters that are equivalent. The name is enclosed in
1224 For example, the name
1226 might be used to represent all of
1227 \*(lqe,\*(rq \*(lqe\`,\*(rq and \*(lqe\'.\*(rq
1237 These features are very valuable in non-English speaking locales.
1238 The library functions that
1240 uses for regular expression matching
1241 currently only recognize \*(PX character classes; they do not recognize
1242 collating symbols or equivalence classes.
1254 operators are specific to
1256 they are extensions based on facilities in the \*(GN regexp libraries.
1258 The various command line options
1261 interprets characters in regexps.
1264 In the default case,
1266 provide all the facilities of
1267 \*(PX regexps and the \*(GN regexp operators described above.
1268 However, interval expressions are not supported.
1271 Only \*(PX regexps are supported; the \*(GN operators are not special.
1276 Interval expressions are allowed.
1278 .B \-\^\-traditional
1281 regexps are matched. The \*(GN operators
1282 are not special, interval expressions are not available, and neither
1283 are the \*(PX character classes
1286 Characters described by octal and hexadecimal escape sequences are
1287 treated literally, even if they represent regexp metacharacters.
1289 .B \-\^\-re\-interval
1290 Allow interval expressions in regexps, even if
1291 .B \-\^\-traditional
1294 Action statements are enclosed in braces,
1298 Action statements consist of the usual assignment, conditional, and looping
1299 statements found in most languages. The operators, control statements,
1300 and input/output statements
1301 available are patterned after those in C.
1304 The operators in \*(AK, in order of decreasing precedence, are
1306 .TP "\w'\fB*= /= %= ^=\fR'u+1n"
1314 Increment and decrement, both prefix and postfix.
1317 Exponentiation (\fB**\fR may also be used, and \fB**=\fR for
1318 the assignment operator).
1321 Unary plus, unary minus, and logical negation.
1324 Multiplication, division, and modulus.
1327 Addition and subtraction.
1330 String concatenation.
1340 The regular relational operators.
1343 Regular expression match, negated match.
1345 Do not use a constant regular expression
1347 on the left-hand side of a
1351 Only use one on the right-hand side. The expression
1353 has the same meaning as \fB(($0 ~ /foo/) ~ \fIexp\fB)\fR.
1368 The C conditional expression. This has the form
1369 .IB expr1 " ? " expr2 " : " expr3\c
1373 is true, the value of the expression is
1388 Assignment. Both absolute assignment
1389 .BI ( var " = " value )
1390 and operator-assignment (the other forms) are supported.
1391 .SS Control Statements
1393 The control statements are
1398 \fBif (\fIcondition\fB) \fIstatement\fR [ \fBelse\fI statement \fR]
1399 \fBwhile (\fIcondition\fB) \fIstatement \fR
1400 \fBdo \fIstatement \fBwhile (\fIcondition\fB)\fR
1401 \fBfor (\fIexpr1\fB; \fIexpr2\fB; \fIexpr3\fB) \fIstatement\fR
1402 \fBfor (\fIvar \fBin\fI array\fB) \fIstatement\fR
1405 \fBdelete \fIarray\^\fB[\^\fIindex\^\fB]\fR
1406 \fBdelete \fIarray\^\fR
1407 \fBexit\fR [ \fIexpression\fR ]
1408 \fB{ \fIstatements \fB}
1411 .SS "I/O Statements"
1413 The input/output statements are as follows:
1415 .TP "\w'\fBprintf \fIfmt, expr-list\fR'u+1n"
1417 Close file (or pipe, see below).
1422 from next input record; set
1427 .BI "getline <" file
1438 from next input record; set
1442 .BI getline " var" " <" file
1449 Stop processing the current input record. The next input record
1450 is read and processing starts over with the first pattern in the
1451 \*(AK program. If the end of the input data is reached, the
1453 block(s), if any, are executed.
1456 Stop processing the current input file. The next input record read
1457 comes from the next input file.
1463 is reset to 1, and processing starts over with the first pattern in the
1464 \*(AK program. If the end of the input data is reached, the
1466 block(s), if any, are executed.
1468 Earlier versions of gawk used
1470 as two words. While this usage is still recognized, it generates a
1471 warning message and will eventually be removed.
1474 Prints the current record.
1475 The output record is terminated with the value of the
1479 .BI print " expr-list"
1481 Each expression is separated by the value of the
1484 The output record is terminated with the value of the
1488 .BI print " expr-list" " >" file
1489 Prints expressions on
1491 Each expression is separated by the value of the
1493 variable. The output record is terminated with the value of the
1497 .BI printf " fmt, expr-list"
1500 .BI printf " fmt, expr-list" " >" file
1504 .BI system( cmd-line )
1507 and return the exit status.
1508 (This may not be available on non-\*(PX systems.)
1510 \&\fBfflush(\fR[\fIfile\^\fR]\fB)\fR
1511 Flush any buffers associated with the open output file or pipe
1515 is missing, then standard output is flushed.
1519 then all open output files and pipes
1520 have their buffers flushed.
1522 Other input/output redirections are also allowed. For
1527 appends output to the
1532 In a similar fashion,
1533 .IB command " | getline"
1538 command will return 0 on end of file, and \-1 on an error.
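A sketch of reading a command's output with getline, assuming a POSIX awk and /bin/sh; the command string is illustrative:

```shell
#!/bin/sh
# Assumes a POSIX awk on the PATH.
out=$(awk 'BEGIN {
    cmd = "echo one; echo two"
    # Each call reads the next output line into line; 0 means end of file.
    while ((cmd | getline line) > 0)
        print "got " line
    close(cmd)    # close the pipe so the same command could be run again
}')
printf '%s\n' "$out"
```

It should print got one and got two. The loop tests for a result greater than 0 so that the \-1 error return also ends it.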
1540 NOTE: If using a pipe to
1550 to create new instances of the command.
1551 \*(AK does not automatically close pipes when
1553 .SS The \fIprintf\fP\^ Statement
1555 The \*(AK versions of the
1561 accept the following conversion specification formats:
1564 An \s-1ASCII\s+1 character.
1565 If the argument used for
1567 is numeric, it is treated as a character and printed.
1568 Otherwise, the argument is assumed to be a string, and only the first
1569 character of that string is printed.
1576 A decimal number (the integer part).
1583 A floating point number of the form
1584 .BR [\-]d.dddddde[+\^\-]dd .
1593 A floating point number of the form
1594 .BR [\-]ddd.dddddd .
1605 conversion, whichever is shorter, with nonsignificant zeros suppressed.
1614 An unsigned octal number (also an integer).
1618 An unsigned decimal number (again, an integer).
1628 An unsigned hexadecimal number (an integer).
1639 character; no argument is converted.
1641 There are optional, additional parameters that may lie between the
1643 and the control letter:
1646 The expression should be left-justified within its field.
1649 For numeric conversions, prefix positive values with a space, and
1650 negative values with a minus sign.
1653 The plus sign, used before the width modifier (see below),
1654 says to always supply a sign for numeric conversions, even if the data
1655 to be formatted is positive. The
1657 overrides the space modifier.
1660 Use an \*(lqalternate form\*(rq for certain control letters.
1663 supply a leading zero.
1679 the result will always contain a
1685 trailing zeros are not removed from the result.
1690 (zero) acts as a flag that indicates output should be
1691 padded with zeroes instead of spaces.
1692 This applies even to non-numeric output formats.
1693 This flag only has an effect when the field width is wider than the
1694 value to be printed.
1697 The field should be padded to this width. The field is normally padded
1700 flag has been used, it is padded with zeroes.
1703 A number that specifies the precision to use when printing.
1709 formats, this specifies the
1710 number of digits you want printed to the right of the decimal point.
1715 formats, it specifies the maximum number
1716 of significant digits. For the
1724 formats, it specifies the minimum number of
1725 digits to print. For a string, it specifies the maximum number of
1726 characters from the string that should be printed.
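The flags, width and precision described above can be combined as in this sketch (assuming a POSIX awk; the brackets only make the padding visible):

```shell
#!/bin/sh
# Assumes a POSIX awk on the PATH.
out=$(awk 'BEGIN {
    printf "[%-5s]\n", "ab"       # "-": left-justify within a width of 5
    printf "[%05d]\n", 42         # "0": pad with zeroes instead of spaces
    printf "[%+.2f]\n", 3.14159   # "+": always show a sign; .2 = two decimals
    printf "[%.3s]\n", "abcdef"   # precision on %s: at most 3 characters
    printf "[%c]\n", 65           # numeric argument to %c: that character
    printf "[%c]\n", "hello"      # string argument to %c: its first character
}')
printf '%s\n' "$out"
```

Expected lines: [ab   ], [00042], [+3.14], [abc], [A], [h].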
1732 capabilities of the \*(AN C
1734 routines are supported.
1737 in place of either the
1741 specifications will cause their values to be taken from
1742 the argument list to
1746 .SS Special File Names
1748 When doing I/O redirection from either
1757 recognizes certain special filenames internally. These filenames
1758 allow access to open file descriptors inherited from
1760 parent process (usually the shell).
1761 Other special filenames provide access to information about the running
1765 .TP \w'\fB/dev/stdout\fR'u+1n
1767 Reading this file returns the process ID of the current process,
1768 in decimal, terminated with a newline.
1771 Reading this file returns the parent process ID of the current process,
1772 in decimal, terminated with a newline.
1775 Reading this file returns the process group ID of the current process,
1776 in decimal, terminated with a newline.
1779 Reading this file returns a single record terminated with a newline.
1780 The fields are separated with spaces.
1797 If there are any additional fields, they are the group IDs returned by
1799 Multiple groups may not be supported on all systems.
1805 The standard output.
1808 The standard error output.
1811 The file associated with the open file descriptor
1814 These are particularly useful for error messages. For example:
1818 print "You blew it!" > "/dev/stderr"
1822 whereas you would otherwise have to use
1826 print "You blew it!" | "cat 1>&2"
1830 These file names may also be used on the command line to name data files.
1831 .SS Numeric Functions
1833 \*(AK has the following pre-defined arithmetic functions:
1835 .TP \w'\fBsrand(\fR[\fIexpr\^\fR]\fB)\fR'u+1n
1836 .BI atan2( y , " x" )
1837 returns the arctangent of
1842 returns the cosine of
1844 which is in radians.
1847 the exponential function.
1850 truncates to integer.
1853 the natural logarithm function.
1856 returns a random number \fIN\fR such that 0 \(<= \fIN\fR < 1.
1861 which is in radians.
1864 the square root function.
1866 \&\fBsrand(\fR[\fIexpr\^\fR]\fB)\fR
1869 as a new seed for the random number generator. If no
1871 is provided, the time of day will be used.
1872 The return value is the previous seed for the random
1874 .SS String Functions
1877 has the following pre-defined string functions:
1879 .TP "\w'\fBsprintf(\^\fIfmt\fB\^, \fIexpr-list\^\fB)\fR'u+1n"
1880 \fBgensub(\fIr\fB, \fIs\fB, \fIh \fR[\fB, \fIt\fR]\fB)\fR
1881 search the target string
1883 for matches of the regular expression
1887 is a string beginning with
1891 then replace all matches of
1897 is a number indicating which match of
1905 Within the replacement text
1911 is a digit from 1 to 9, may be used to indicate just the text that
1914 parenthesized subexpression. The sequence
1916 represents the entire matched text, as does the character
1922 the modified string is returned as the result of the function,
1923 and the original target string is
1926 .TP "\w'\fBsprintf(\^\fIfmt\fB\^, \fIexpr-list\^\fB)\fR'u+1n"
1927 \fBgsub(\fIr\fB, \fIs \fR[\fB, \fIt\fR]\fB)\fR
1928 for each substring matching the regular expression
1932 substitute the string
1934 and return the number of substitutions.
1937 is not supplied, use
1941 in the replacement text is replaced with the text that was actually matched.
1947 .I "Effective AWK Programming"
1948 for a fuller discussion of the rules for
1950 and backslashes in the replacement text of
1956 .BI index( s , " t" )
1957 returns the index of the string
1965 \fBlength(\fR[\fIs\fR]\fB)\fR
1966 returns the length of the string
1974 .BI match( s , " r" )
1975 returns the position in
1977 where the regular expression
1981 is not present, and sets the values of
1986 \fBsplit(\fIs\fB, \fIa \fR[\fB, \fIr\fR]\fB)\fR
1991 on the regular expression
1993 and returns the number of fields. If
2001 Splitting behaves identically to field splitting, described above.
2003 .BI sprintf( fmt , " expr-list" )
2008 and returns the resulting string.
2010 \fBsub(\fIr\fB, \fIs \fR[\fB, \fIt\fR]\fB)\fR
2013 but only the first matching substring is replaced.
2015 \fBsubstr(\fIs\fB, \fIi \fR[\fB, \fIn\fR]\fB)\fR
2024 is omitted, the rest of
2029 returns a copy of the string
2031 with all the upper-case characters in
2033 translated to their corresponding lower-case counterparts.
2034 Non-alphabetic characters are left unchanged.
2037 returns a copy of the string
2039 with all the lower-case characters in
2041 translated to their corresponding upper-case counterparts.
2042 Non-alphabetic characters are left unchanged.
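.PP
A minimal sketch exercising the portable string functions described above (gensub is skipped because it is specific to gawk). It assumes a POSIX shell and awk; string_demo is a hypothetical name, and the comments show the expected output of each print.
.nf
```shell
# Hypothetical helper exercising the portable string functions above.
# gensub() is left out because it is a gawk-only extension.
string_demo() {
  awk 'BEGIN {
    s = "foo bar foo"
    n = gsub(/foo/, "[&]", s)        # & stands for the matched text
    print n, s                       # 2 [foo] bar [foo]

    t = "foo bar foo"
    sub(/foo/, "baz", t)             # only the first match is replaced
    print t                          # baz bar foo

    k = split("a:b:c", parts, ":")   # parts[1] is "a", parts[3] is "c"
    print k, parts[3]                # 3 c

    print index("hello", "ll"), substr("hello", 2, 3)   # 3 ell

    m = match("foobar", /ob/)        # also sets RSTART and RLENGTH
    print m, RSTART, RLENGTH         # 3 3 2

    print toupper("abc1"), tolower("XYZ!")              # ABC1 xyz!
  }'
}

string_demo
```
.fi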
2045 Since one of the primary uses of \*(AK programs is processing log files
2046 that contain time stamp information,
2048 provides the following two functions for obtaining time stamps and
2051 .TP "\w'\fBsystime()\fR'u+1n"
2053 returns the current time of day as the number of seconds since the Epoch
2054 (midnight UTC, January 1, 1970 on \*(PX systems).
2056 \fBstrftime(\fR[\fIformat \fR[\fB, \fItimestamp\fR]]\fB)\fR
2059 according to the specification in
2063 should be of the same form as returned by
2067 is missing, the current time of day is used.
2070 is missing, a default format equivalent to the output of
2073 See the specification for the
2075 function in \*(AN C for the format conversions that are
2076 guaranteed to be available.
2077 A public-domain version of
2079 and a man page for it come with
2081 if that version was used to build
2083 then all of the conversions described in that man page are available to
2085 .SS String Constants
2087 String constants in \*(AK are sequences of characters enclosed
2088 between double quotes (\fB"\fR). Within strings, certain
2089 .I "escape sequences"
2090 are recognized, as in C. These are:
2092 .TP \w'\fB\e\^\fIddd\fR'u+1n
2094 A literal backslash.
2097 The \*(lqalert\*(rq character; usually the \s-1ASCII\s+1 \s-1BEL\s+1 character.
2117 .BI \ex "\^hex digits"
2118 The character represented by the string of hexadecimal digits following
2121 As in \*(AN C, all following hexadecimal digits are considered part of
2122 the escape sequence.
2123 (This feature should tell us something about language design by committee.)
2124 E.g., \fB"\ex1B"\fR is the \s-1ASCII\s+1 \s-1ESC\s+1 (escape) character.
2127 The character represented by the 1-, 2-, or 3-digit sequence of octal
2129 E.g., \fB"\e033"\fR is the \s-1ASCII\s+1 \s-1ESC\s+1 (escape) character.
2132 The literal character
2135 The escape sequences may also be used inside constant regular expressions
2137 .B "/[\ \et\ef\en\er\ev]/"
2138 matches whitespace characters).
2140 In compatibility mode, the characters represented by octal and
2141 hexadecimal escape sequences are treated literally when used in
2142 regexp constants. Thus,
2147 Functions in \*(AK are defined as follows:
2150 \fBfunction \fIname\fB(\fIparameter list\fB) { \fIstatements \fB}\fR
2153 Functions are executed when they are called from within expressions
2154 in either patterns or actions. Actual parameters supplied in the function
2155 call are used to instantiate the formal parameters declared in the function.
2156 Arrays are passed by reference; other variables are passed by value.
2158 Since functions were not originally part of the \*(AK language, the provision
2159 for local variables is rather clumsy: They are declared as extra parameters
2160 in the parameter list. The convention is to separate local variables from
2161 real parameters by extra spaces in the parameter list. For example:
2166 function f(p, q, a, b) # a & b are local
2171 /abc/ { .\|.\|. ; f(1, 2) ; .\|.\|. }
2176 The left parenthesis in a function call is required
2177 to immediately follow the function name,
2178 without any intervening white space.
2179 This is to avoid a syntactic ambiguity with the concatenation operator.
2180 This restriction does not apply to the built-in functions listed above.
2182 Functions may call each other and may be recursive.
2183 Function parameters used as local variables are initialized
2184 to the null string and the number zero upon function invocation.
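.PP
The following sketch puts these rules together: a recursive function with a local variable declared as an extra parameter, an array passed by reference, and a local that is checked to start out as both the null string and zero. It assumes a POSIX shell and awk; func_demo and the awk function names fact, fill, and fresh are hypothetical.
.nf
```shell
# Hypothetical helper showing user-defined functions: recursion, an
# array passed by reference, and extra parameters used as locals.
func_demo() {
  awk '
  function fact(n,    r) {       # r is a local, not a real argument
    if (n <= 1)
      return 1
    r = n * fact(n - 1)          # each recursive call gets its own r
    return r
  }
  function fill(arr,    i) {     # arr is an array, passed by reference
    for (i = 1; i <= 3; i++)
      arr[i] = i * i
  }
  function fresh(    tmp) {      # called with no arguments at all
    return (tmp == "" && tmp == 0)   # locals start as "" and 0
  }
  BEGIN {
    print fact(5)                # 120
    split("", sq)                # make sq an (empty) array first
    fill(sq)
    print sq[1], sq[2], sq[3]    # 1 4 9
    print fresh()                # 1
  }'
}

func_demo
```
.fi
The split("", sq) call is a common portable idiom for creating an empty array before handing it to a function.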
2188 to return a value from a function. The return value is undefined if no
2189 value is provided, or if the function returns by \*(lqfalling off\*(rq the
2196 will warn about calls to undefined functions at parse time,
2197 instead of at run time.
2198 Calling an undefined function at run time is a fatal error.
2202 may be used in place of
2206 Print and sort the login names of all users:
2210 { print $1 | "sort" }
2213 Count lines in a file:
2217 END { print nlines }
2220 Precede each line by its number in the file:
2226 Concatenate and line number (a variation on a theme):
2243 .IR "The AWK Programming Language" ,
2244 Alfred V. Aho, Brian W. Kernighan, Peter J. Weinberger,
2245 Addison-Wesley, 1988. ISBN 0-201-07981-X.
2247 .IR "Effective AWK Programming" ,
2248 Edition 1.0, published by the Free Software Foundation, 1995.
2249 .SH POSIX COMPATIBILITY
2252 is compatibility with the \*(PX standard, as well as with the
2253 latest version of \*(UX
2257 incorporates the following user-visible
2258 features which are not described in the \*(AK book,
2259 but are part of the Bell Labs version of
2261 and are in the \*(PX standard.
2265 option for assigning variables before program execution starts is new.
2266 The book indicates that command line variable assignment happens when
2268 would otherwise open the argument as a file, which is after the
2270 block is executed. However, in earlier implementations, when such an
2271 assignment appeared before any file names, the assignment would happen
2275 block was run. Applications came to depend on this \*(lqfeature.\*(rq
2278 was changed to match its documentation, this option was added to
2279 accommodate applications that depended upon the old behavior.
2280 (This feature was agreed upon by both the AT&T and \*(GN developers.)
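.PP
A minimal sketch of the difference, assuming a POSIX shell and awk (assign_demo is a hypothetical name): the value given with the -v option is already visible in BEGIN, while an assignment operand only takes effect when awk reaches it in the argument list.
.nf
```shell
# Hypothetical helper: "-v x=early" is applied before BEGIN runs, while
# the operand "x=late" is applied only when awk reaches it in the
# argument list, just before it reads "-" (standard input).
assign_demo() {
  echo line | awk -v x=early '
    BEGIN { print "BEGIN sees [" x "]" }
          { print "main sees [" x "]" }
  ' x=late -
}

assign_demo
# BEGIN sees [early]
# main sees [late]
```
.fi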
2284 option for implementation-specific features is from the \*(PX standard.
2286 When processing arguments,
2288 uses the special option \*(lq\-\^\-\*(rq to signal the end of
2290 In compatibility mode, it will warn about, but otherwise ignore,
2292 In normal operation, such arguments are passed on to the \*(AK program for
2295 The \*(AK book does not define the return value of
2298 has it return the seed it was using, to allow keeping track
2299 of random number sequences. Therefore
2303 also returns its current seed.
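.PP
A small sketch of recording and replaying a seed, assuming a POSIX shell and an awk whose srand returns the previous seed as described here (seed_demo is a hypothetical name).
.nf
```shell
# Hypothetical helper: srand() returns the previous seed, so a program
# can record a seed and later replay the same random sequence.
seed_demo() {
  awk 'BEGIN {
    srand(10)
    prev = srand(20)      # returns 10, the seed set just above
    srand(prev)           # reseed with 10 ...
    a = rand()
    srand(prev)           # ... and again
    b = rand()
    print prev, (a == b)  # 10 1
  }'
}

seed_demo
```
.fi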
2305 Other new features are:
2316 escape sequences (done originally in
2318 and fed back into AT&T's); the
2322 built-in functions (from AT&T); and the \*(AN C conversion specifications in
2324 (done first in AT&T's version).
2327 has a number of extensions to \*(PX
2329 They are described in this section. All the extensions described here
2334 .B \-\^\-traditional
2337 The following features of
2339 are not available in
2367 The special file names available for I/O redirection are not recognized.
2375 variables are not special.
2380 variable and its side effects are not available.
2385 variable and fixed-width field splitting.
2390 as a regular expression.
2393 The ability to split out individual characters using the null string
2396 and as the third argument to
2400 No path search is performed for files named via the
2402 option. Therefore the
2404 environment variable is not special.
2409 to abandon processing of the current input file.
2414 to delete the entire contents of an array.
2417 The \*(AK book does not define the return value of the
2422 returns the value from
2426 when closing a file or pipe, respectively.
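.PP
As an illustration of one item in the list above, the following sketch empties an array with a single delete statement. It assumes a POSIX shell and an awk that supports this extension (gawk in its default mode does); delete_demo is a hypothetical name.
.nf
```shell
# Hypothetical helper: "delete a" with no subscript empties the whole array.
delete_demo() {
  awk 'BEGIN {
    n = split("x y z", a, " ")   # a now holds three elements
    delete a                     # remove them all in one statement
    count = 0
    for (k in a)
      count++
    print n, count               # 3 0
  }'
}

delete_demo
```
.fi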
2431 .B \-\^\-traditional
2437 option is \*(lqt\*(rq, then
2439 will be set to the tab character.
2441 .B "gawk \-F\et \&.\|.\|."
2442 simply causes the shell to quote the \*(lqt\*(rq and does not pass
2443 \*(lq\et\*(rq to the
2446 Since this is a rather ugly special case, it is not the default behavior.
2447 This behavior also does not occur if
2450 To really get a tab character as the field separator, it is best to use
2452 .BR "gawk \-F'\et' \&.\|.\|." .
2457 was compiled for debugging, it will
2458 accept the following additional options:
2469 debugging output during program parsing.
2470 This option should only be of interest to the
2472 maintainers, and may not even be compiled into
2475 .SH HISTORICAL FEATURES
2476 There are two features of historical \*(AK implementations that
2479 First, it is possible to call the
2481 built-in function not only with no argument, but even without parentheses!
2486 a = length # Holy Algol 60, Batman!
2490 is the same as either of
2500 This feature is marked as \*(lqdeprecated\*(rq in the \*(PX standard, and
2502 will issue a warning about its use if
2504 is specified on the command line.
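.PP
A quick check that the three forms agree, assuming a POSIX shell and awk (length_demo is a hypothetical name):
.nf
```shell
# Hypothetical helper: three equivalent ways to get the length of $0.
length_demo() {
  echo hello | awk '{
    a = length          # historical form, no parentheses
    b = length()        # explicit empty argument list
    c = length($0)      # explicit argument
    print a, b, c       # 5 5 5
  }'
}

length_demo
```
.fi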
2506 The other feature is the use of either the
2510 statements outside the body of a
2515 loop. Traditional \*(AK implementations have treated such usage as
2520 will support this usage if
2521 .B \-\^\-traditional
2523 .SH ENVIRONMENT VARIABLES
2526 exists in the environment, then
2528 behaves exactly as if
2530 had been specified on the command line.
2535 will issue a warning message to this effect.
2539 environment variable can be used to provide a list of directories that
2541 will search when looking for files named via the
2549 option is not necessary given the command line variable assignment feature;
2550 it remains only for backwards compatibility.
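.PP
For example, the two pipelines in this sketch split on colons in the same way; the second uses a command line assignment instead of the -F option. The assignment only takes effect when awk reaches it in the argument list, so unlike -F it is not yet in effect in a BEGIN rule. This assumes a POSIX shell and awk; fs_demo is a hypothetical name.
.nf
```shell
# Hypothetical helper: the same field split twice, first with -F and
# then with an FS=: operand placed before the input operand "-".
fs_demo() {
  echo 'root:x:0' | awk -F: '{ print $3 }'
  echo 'root:x:0' | awk '{ print $3 }' FS=: -
}

fs_demo    # prints 0 twice
```
.fi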
2552 If your system actually has support for
2559 files, you may get different output from
2561 than you would get on a system without those files. When
2563 interprets these files internally, it synchronizes output to the standard
2564 output with output to
2566 while on a system with those files, the output is actually to different
2570 Syntactically invalid single-character programs tend to overflow
2571 the parse stack, generating a rather unhelpful message. Such programs
2572 are surprisingly difficult to diagnose in the completely general case,
2573 and the effort to do so really is not worth it.
2574 .SH VERSION INFORMATION
2575 This man page documents
2579 The original version of \*(UX
2581 was designed and implemented by Alfred Aho,
2582 Peter Weinberger, and Brian Kernighan of AT&T Bell Labs. Brian Kernighan
2583 continues to maintain and enhance it.
2585 Paul Rubin and Jay Fenlason,
2586 of the Free Software Foundation, wrote
2588 to be compatible with the original version of
2590 distributed in Seventh Edition \*(UX.
2591 John Woods contributed a number of bug fixes.
2592 David Trueman, with contributions
2593 from Arnold Robbins, made
2595 compatible with the new version of \*(UX
2597 Arnold Robbins is the current maintainer.
2599 The initial DOS port was done by Conrad Kwok and Scott Garfinkle.
2600 Scott Deifik is the current DOS maintainer. Pat Rankin did the
2601 port to VMS, and Michal Jaegermann did the port to the Atari ST.
2602 The port to OS/2 was done by Kai Uwe Rommel, with contributions and
2603 help from Darrel Hankerson. Fred Fish supplied support for the Amiga.
2605 If you find a bug in
2607 please send electronic mail to
2608 .BR bug-gawk@gnu.org .
2609 Please include your operating system and its revision, the version of
2611 what C compiler you used to compile it, and a test program
2612 and data that are as small as possible for reproducing the problem.
2614 Before sending a bug report, please do two things. First, verify that
2615 you have the latest version of
2617 Many bugs (usually subtle ones) are fixed at each release, and if
2618 yours is out of date, the problem may already have been solved.
2619 Second, please read this man page and the reference manual carefully to
2620 be sure that what you think is a bug really is, instead of just a quirk
2625 post a bug report in
2629 developers occasionally read this newsgroup, posting bug reports there
2630 is an unreliable way to report bugs. Instead, please use the electronic mail
2631 addresses given above.
2632 .SH ACKNOWLEDGEMENTS
2633 Brian Kernighan of Bell Labs
2634 provided valuable assistance during testing and debugging.
2636 .SH COPYING PERMISSIONS
2637 Copyright \(co 1996\-2000 Free Software Foundation, Inc.
2639 Permission is granted to make and distribute verbatim copies of
2640 this manual page provided the copyright notice and this permission
2641 notice are preserved on all copies.
2643 Permission is granted to process this file through troff and print the
2644 results, provided the printed document carries copying permission
2645 notice identical to this one except for the removal of this paragraph
2646 (this paragraph not being relevant to the printed manual page).
2649 Permission is granted to copy and distribute modified versions of this
2650 manual page under the conditions for verbatim copying, provided that
2651 the entire resulting derived work is distributed under the terms of a
2652 permission notice identical to this one.
2654 Permission is granted to copy and distribute translations of this
2655 manual page into another language, under the above conditions for
2656 modified versions, except that this permission notice may be stated in
2657 a translation approved by the Foundation.