.TH FLEX 1 "May 21, 2013" "Version 2.5.37"
flex, lex \- fast lexical analyzer generator
.B [\-bcdfhilnpstvwBFILTV78+? \-C[aefFmr] \-ooutput \-Pprefix \-Sskeleton]
.B [\-\-help \-\-version]
a tool for generating programs that perform pattern-matching on text.
The manual includes both tutorial and reference sections:
        a brief overview of the tool
    Format Of The Input File
        the extended regular expressions used by flex
    How The Input Is Matched
        the rules for determining what has been matched
        how to specify what to do when a pattern is matched
        details regarding the scanner that flex produces;
        how to control the input source
        introducing context into your scanners, and
        managing "mini-scanners"
    Multiple Input Buffers
        how to manipulate multiple input sources; how to
        scan from strings instead of files
        special rules for matching the end of the input
        a summary of macros available to the actions
    Values Available To The User
        a summary of values available to the actions
        connecting flex scanners together with yacc parsers
        flex command-line options, and the "%option"
    Performance Considerations
        how to make your scanner go as fast as possible
    Generating C++ Scanners
        the (experimental) facility for generating C++
    Incompatibilities With Lex And POSIX
        how flex differs from AT&T lex and the POSIX lex
        those error messages produced by flex (or scanners
        it generates) whose meanings might not be apparent
        known problems with flex
        other documentation, related tools
        includes contact information
is a tool for generating
programs which recognize lexical patterns in text.
the given input files, or its standard input if no file names are given,
for a description of a scanner to generate.
The description is in the form of pairs
of regular expressions and C code, called
generates as output a C source file,
which defines a routine
This file is compiled and linked with the
library to produce an executable.
When the executable is run,
it analyzes its input for occurrences
of the regular expressions.
Whenever it finds one, it executes
the corresponding C code.
.SH SOME SIMPLE EXAMPLES
First some simple examples to get the flavor of how one uses
input specifies a scanner which, whenever it encounters the string
"username", will replace it with the user's login name:
    username    printf( "%s", getlogin() );
By default, any text not matched by a
is copied to the output, so the net effect of this scanner is
to copy its input file to its output with each occurrence
of "username" expanded.
In this input, there is just one rule.
and the "printf" is the
The "%%" marks the beginning of the rules.
Here's another simple example:
            int num_lines = 0, num_chars = 0;
    \\n      ++num_lines; ++num_chars;
            printf( "# of lines = %d, # of chars = %d\\n",
                    num_lines, num_chars );
This scanner counts the number of characters and the number
of lines in its input (it produces no output other than the
final report on the counts).
declares two globals, "num_lines" and "num_chars", which are accessible
routine declared after the second "%%".
There are two rules, one
which matches a newline ("\\n") and increments both the line count and
the character count, and one which matches any character other than
a newline (indicated by the "." regular expression).
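.PP
For comparison, the following is a minimal hand-written C sketch of what
those two rules compute.
It is not code generated by flex, and the function name
count_lines_chars is a hypothetical one chosen for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of flex: counts what the two rules count.
 * Every character bumps *num_chars; a newline also bumps *num_lines. */
void count_lines_chars(const char *text, int *num_lines, int *num_chars)
{
    assert(text != NULL && num_lines != NULL && num_chars != NULL);
    *num_lines = 0;
    *num_chars = 0;
    for (const char *p = text; *p != '\0'; ++p) {
        if (*p == '\n')
            ++*num_lines;   /* the newline rule */
        ++*num_chars;       /* both rules increment the character count */
    }
}
```

Unlike a real scanner, this sketch stops at the first NUL byte because it
works on a C string.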
A somewhat more complicated example:
    /* scanner for a toy Pascal-like language */
    /* need this for the call to atof() below */
                printf( "An integer: %s (%d)\\n", yytext,
    {DIGIT}+"."{DIGIT}*        {
                printf( "A float: %s (%g)\\n", yytext,
    if|then|begin|end|procedure|function        {
                printf( "A keyword: %s\\n", yytext );
    {ID}        printf( "An identifier: %s\\n", yytext );
    "+"|"-"|"*"|"/"   printf( "An operator: %s\\n", yytext );
    "{"[^}\\n]*"}"     /* eat up one-line comments */
    [ \\t\\n]+          /* eat up whitespace */
    .           printf( "Unrecognized character: %s\\n", yytext );
        ++argv, --argc;  /* skip over program name */
                yyin = fopen( argv[0], "r" );
This is the beginning of a simple scanner for a language like
It identifies different types of
and reports on what it has seen.
The details of this example will be explained in the following
.SH FORMAT OF THE INPUT FILE
input file consists of three sections, separated by a line with just
section contains declarations of simple
definitions to simplify the scanner specification, and declarations of
which are explained in a later section.
Name definitions have the form:
The "name" is a word beginning with a letter or an underscore ('_')
followed by zero or more letters, digits, '_', or '-' (dash).
The definition is taken to begin at the first non-white-space character
following the name and continuing to the end of the line.
The definition can subsequently be referred to using "{name}", which
will expand to "(definition)".
defines "DIGIT" to be a regular expression which matches a
"ID" to be a regular expression which matches a letter
followed by zero-or-more letters-or-digits.
A subsequent reference to
and matches one-or-more digits followed by a '.' followed
by zero-or-more digits.
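.PP
The naming rule for definitions stated earlier in this section (a letter
or underscore, then letters, digits, '_' or '-') can be checked
mechanically.
The following is only a sketch of that rule in plain C; the function
name is_definition_name is hypothetical, and the sketch relies on the
C locale's isalpha() and isalnum() rather than on anything flex does
internally.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical checker for the definition-name rule described above. */
bool is_definition_name(const char *s)
{
    assert(s != NULL);
    unsigned char first = (unsigned char)s[0];
    if (!(isalpha(first) || first == '_'))
        return false;               /* also rejects the empty string */
    for (const char *p = s + 1; *p != '\0'; ++p) {
        unsigned char c = (unsigned char)*p;
        if (!(isalnum(c) || c == '_' || c == '-'))
            return false;
    }
    return true;
}
```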
input contains a series of rules of the form:
where the pattern must be unindented and the action must begin
See below for a further description of patterns and actions.
Finally, the user code section is simply copied to
It is used for companion routines which call or are called
The presence of this section is optional;
if it is missing, the second
in the input file may be skipped, too.
In the definitions and rules sections, any
text or text enclosed in
is copied verbatim to the output (with the %{}'s removed).
The %{}'s must appear unindented on lines by themselves.
In the rules section,
any indented or %{} text appearing before the
first rule may be used to declare variables
which are local to the scanning routine and (after the declarations)
code which is to be executed whenever the scanning routine is entered.
Other indented or %{} text in the rules section is still copied to the output,
but its meaning is not well-defined and it may well cause compile-time
errors (this feature is present for
compliance; see below for other such features).
In the definitions section (but not in the rules section),
an unindented comment (i.e., a line
beginning with "/*") is also copied verbatim to the output up
The patterns in the input are written using an extended set of regular
    x          match the character 'x'
    .          any character (byte) except newline
    [xyz]      a "character class"; in this case, the pattern
                 matches either an 'x', a 'y', or a 'z'
    [abj-oZ]   a "character class" with a range in it; matches
                 an 'a', a 'b', any letter from 'j' through 'o',
    [^A-Z]     a "negated character class", i.e., any character
                 but those in the class.  In this case, any
                 character EXCEPT an uppercase letter.
    [^A-Z\\n]   any character EXCEPT an uppercase letter or
    r*         zero or more r's, where r is any regular expression
    r?         zero or one r's (that is, "an optional r")
    r{2,5}     anywhere from two to five r's
    r{2,}      two or more r's
    {name}     the expansion of the "name" definition
                 the literal string: [xyz]"foo
    \\X         if X is an 'a', 'b', 'f', 'n', 'r', 't', or 'v',
                 then the ANSI-C interpretation of \\x.
                 Otherwise, a literal 'X' (used to escape
                 operators such as '*')
    \\0         a NUL character (ASCII code 0)
    \\123       the character with octal value 123
    \\x2a       the character with hexadecimal value 2a
    (r)        match an r; parentheses are used to override
                 precedence (see below)
    rs         the regular expression r followed by the
                 regular expression s; called "concatenation"
    r|s        either an r or an s
    r/s        an r but only if it is followed by an s.  The
                 text matched by s is included when determining
                 whether this rule is the "longest match",
                 but is then returned to the input before
                 the action is executed.  So the action only
                 sees the text matched by r.  This type
                 of pattern is called "trailing context".
                 (There are some combinations of r/s that flex
                 cannot match correctly; see notes in the
                 Deficiencies / Bugs section below regarding
                 "dangerous trailing context".)
    ^r         an r, but only at the beginning of a line (i.e.,
                 when just starting to scan, or right after a
                 newline has been scanned).
    r$         an r, but only at the end of a line (i.e., just
                 before a newline).  Equivalent to "r/\\n".
                 Note that flex's notion of "newline" is exactly
                 whatever the C compiler used to compile flex
                 interprets '\\n' as; in particular, on some DOS
                 systems you must either filter out \\r's in the
                 input yourself, or explicitly use r/\\r\\n for "r$".
    <s>r       an r, but only in start condition s (see
                 below for discussion of start conditions)
                 same, but in any of start conditions s1,
    <*>r       an r in any start condition, even an exclusive one.
    <<EOF>>    an end-of-file
                 an end-of-file when in start condition s1 or s2
Note that inside of a character class, all regular expression operators
lose their special meaning except escape ('\\') and the character class
operators, '-', ']', and, at the beginning of the class, '^'.
The regular expressions listed above are grouped according to
precedence, from highest precedence at the top to lowest at the bottom.
Those grouped together have equal precedence.
since the '*' operator has higher precedence than concatenation,
and concatenation higher than alternation ('|').
the string "ba" followed by zero-or-more r's.
To match "foo" or zero-or-more "bar"'s, use:
and to match zero-or-more "foo"'s-or-"bar"'s:
In addition to characters and ranges of characters, character classes
can also contain character class
These are expressions enclosed inside
delimiters (which themselves must appear between the '[' and ']' of the
character class; other elements may occur inside the character class, too).
The valid expressions are:
    [:alnum:] [:alpha:] [:blank:]
    [:cntrl:] [:digit:] [:graph:]
    [:lower:] [:print:] [:punct:]
    [:space:] [:upper:] [:xdigit:]
These expressions all designate a set of characters equivalent to
the corresponding standard C
designates those characters for which
returns true - i.e., any alphabetic or numeric.
Some systems don't provide
For example, the following character classes are all equivalent:
If your scanner is case-insensitive (the
Some notes on patterns:
A negated character class such as the example "[^A-Z]"
.I will match a newline
unless "\\n" (or an equivalent escape sequence) is one of the
characters explicitly present in the negated character class
This is unlike how many other regular
expression tools treat negated character classes, but unfortunately
the inconsistency is historically entrenched.
Matching newlines means that a pattern like [^"]* can match the entire
input unless there's another quote in the input.
A rule can have at most one instance of trailing context (the '/' operator
or the '$' operator).
The start condition, '^', and "<<EOF>>" patterns
can only occur at the beginning of a pattern and, like '/' and '$',
cannot be grouped inside parentheses.
A '^' which does not occur at
the beginning of a rule or a '$' which does not occur at the end of
a rule loses its special properties and is treated as a normal character.
The following are illegal:
Note that the first of these can be written "foo/bar\\n".
The following will result in '$' or '^' being treated as a normal character:
If what's wanted is a "foo" or a bar-followed-by-a-newline, the following
could be used (the special '|' action is explained below):
    bar$        /* action goes here */
A similar trick will work for matching a foo or a
bar-at-the-beginning-of-a-line.
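.PP
The first note above, that a negated character class matches newlines
unless "\\n" is listed in it, can be illustrated outside of flex with the
standard C function strcspn().
This is only an analogy, not how flex matches: the two helper names are
hypothetical, and unlike a flex pattern a C string stops at the first
NUL byte.

```c
#include <assert.h>
#include <string.h>

/* Length of the prefix that a pattern like [^"]* would cover:
 * everything up to the next double quote, newlines included. */
size_t span_not_quote(const char *s)
{
    assert(s != NULL);
    return strcspn(s, "\"");
}

/* Length of the prefix that [^"\n]* would cover: the newline is
 * listed in the class, so the span stops there as well. */
size_t span_not_quote_or_newline(const char *s)
{
    assert(s != NULL);
    return strcspn(s, "\"\n");
}
```

On text with no further quote, the first helper spans the whole
remaining string, which mirrors the warning about matching the entire
input.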
.SH HOW THE INPUT IS MATCHED
When the generated scanner is run, it analyzes its input looking
for strings which match any of its patterns.
If it finds more than
one match, it takes the one matching the most text (for trailing
context rules, this includes the length of the trailing part, even
though it will then be returned to the input).
or more matches of the same length, the
rule listed first in the
input file is chosen.
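.PP
The selection rule just described (longest match wins, and among
matches of equal length the earliest rule wins) can be sketched in plain
C.
To keep the sketch short it assumes every "pattern" is a literal string
rather than a regular expression; the function name choose_rule is
hypothetical, and the generated scanner uses tables rather than a loop
like this.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns the index of the chosen literal pattern, or -1 if none
 * matches at the start of input.  *match_len receives the length. */
int choose_rule(const char *input, const char *const patterns[], size_t n,
                size_t *match_len)
{
    assert(input != NULL && patterns != NULL && match_len != NULL);
    int best = -1;
    size_t best_len = 0;
    for (size_t i = 0; i < n; ++i) {
        size_t len = strlen(patterns[i]);
        /* A literal matches when it is a prefix of the remaining input. */
        if (len > 0 && strncmp(input, patterns[i], len) == 0) {
            /* Strictly longer wins; equal length keeps the earlier rule. */
            if (best < 0 || len > best_len) {
                best = (int)i;
                best_len = len;
            }
        }
    }
    *match_len = best_len;
    return best;
}
```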
Once the match is determined, the text corresponding to the match
is made available in the global character pointer
and its length in the global integer
corresponding to the matched pattern is then executed (a more
detailed description of actions follows), and then the remaining
input is scanned for another match.
If no match is found, then the
is executed: the next character in the input is considered matched and
copied to the standard output.
Thus, the simplest legal
which generates a scanner that simply copies its input (one character
at a time) to its output.
can be defined in two different ways: either as a character
You can control which definition
uses by including one of the special directives
in the first (definitions) section of your flex input.
lex compatibility option, in which case
The advantage of using
is substantially faster scanning and no buffer overflow when matching
very large tokens (unless you run out of dynamic memory).
is that you are restricted in how your actions can modify
(see the next section), and calls to the
function destroy the present contents of
which can be a considerable porting headache when moving between different
is that you can then modify
to your heart's content, and calls to
Furthermore, existing
programs sometimes access
externally using declarations of the form:
    extern char yytext[];
This definition is erroneous when used with
characters, which defaults to a fairly large value.
the size by simply #define'ing
to a different value in the first section of your
As mentioned above, with
yytext grows dynamically to accommodate large tokens.
While this means your
scanner can accommodate very large tokens (such as matching entire blocks
of comments), bear in mind that each time the scanner must resize
it also must rescan the entire token from the beginning, so matching such
tokens can prove slow.
dynamically grow if a call to
results in too much text being pushed back; instead, a run-time error results.
Also note that you cannot use
with C++ scanner classes
Each pattern in a rule has a corresponding action, which can be any
arbitrary C statement.
The pattern ends at the first non-escaped
whitespace character; the remainder of the line is its action.
action is empty, then when the pattern is matched the input token
For example, here is the specification for a program
which deletes all occurrences of "zap me" from its input:
(It will copy all other characters in the input to the output since
they will be matched by the default rule.)
Here is a program which compresses multiple blanks and tabs down to
a single blank, and throws away whitespace found at the end of a line:
    [ \\t]+        putchar( ' ' );
    [ \\t]+$       /* ignore this token */
If the action contains a '{', then the action spans till the balancing '}'
is found, and the action may cross multiple lines.
knows about C strings and comments and won't be fooled by braces found
within them, but also allows actions to begin with
and will consider the action to be all the text up to the next
(regardless of ordinary braces inside the action).
An action consisting solely of a vertical bar ('|') means "same as
the action for the next rule."  See below for an illustration.
Actions can include arbitrary C code, including
statements to return a value to whatever routine called
is called it continues processing tokens from where it last left
off until it either reaches
the end of the file or executes a return.
Actions are free to modify
except for lengthening it (adding
characters to its end--these will overwrite later characters in the
This however does not apply when using
(see above); in that case,
may be freely modified in any way.
Actions are free to modify
except they should not do so if the action also includes use of
There are a number of special directives which can be included within
copies yytext to the scanner's output.
followed by the name of a start condition places the scanner in the
corresponding start condition (see below).
directs the scanner to proceed on to the "second best" rule which matched the
input (or a prefix of the input).
The rule is chosen as described
above in "How the Input is Matched", and
set up appropriately.
It may either be one which matched as much text
as the originally chosen rule but came later in the
input file, or one which matched less text.
For example, the following will both count the
words in the input and call the routine special() whenever "frob" is seen:
    frob        special(); REJECT;
    [^ \\t\\n]+   ++word_count;
any "frob"'s in the input would not be counted as words, since the
scanner normally executes only one action per token.
are allowed, each one finding the next best choice to the currently
For example, when the following scanner scans the token
"abcd", it will write "abcdabcaba" to the output:
    .|\\n     /* eat up any unmatched character */
(The first three rules share the fourth's action since they use
the special '|' action.)
is a particularly expensive feature in terms of scanner performance;
of the scanner's actions it will slow down
of the scanner's matching.
cannot be used with the
Note also that unlike the other special actions,
code immediately following it in the action will
tells the scanner that the next time it matches a rule, the corresponding
onto the current value of
rather than replacing it.
For example, given the input "mega-kludge"
the following will write "mega-mega-kludge" to the output:
    mega-    ECHO; yymore();
First "mega-" is matched and echoed to the output.
is matched, but the previous "mega-" is still hanging around at the
for the "kludge" rule will actually write "mega-kludge".
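.PP
The append-instead-of-replace behavior in the "mega-kludge" example can
be modelled with a small amount of plain C.
This is only a model under stated assumptions, not flex's implementation:
the struct and the functions request_more and accept_match are
hypothetical, and the fixed TOKEN_MAX buffer is a simplification.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum { TOKEN_MAX = 256 };

struct token_state {
    char text[TOKEN_MAX];  /* plays the role of yytext */
    size_t leng;           /* plays the role of yyleng */
    bool more;             /* set by request_more(), used by the next match */
};

/* Ask for the next match to be appended to the current token text. */
void request_more(struct token_state *st) { st->more = true; }

/* Record a new match: append if "more" was requested, else replace. */
void accept_match(struct token_state *st, const char *match)
{
    size_t keep = st->more ? st->leng : 0;
    size_t add = strlen(match);
    assert(keep + add < TOKEN_MAX);
    memcpy(st->text + keep, match, add + 1);  /* +1 copies the '\0' */
    st->leng = keep + add;
    st->more = false;                          /* the request is one-shot */
}
```

The model also shows why the token length must stay accurate: the
append position is taken from it.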
Two notes regarding use of
depends on the value of
correctly reflecting the size of the current token, so you must not
Second, the presence of
in the scanner's action entails a minor performance penalty in the
scanner's matching speed.
returns all but the first
characters of the current token back to the input stream, where they
will be rescanned when the scanner looks for the next match.
are adjusted appropriately (e.g.,
For example, on the input "foobar" the following will write out
    foobar    ECHO; yyless(3);
will cause the entire current input string to be scanned again.
changed how the scanner will subsequently process its input (using
for example), this will result in an endless loop.
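.PP
The effect of returning all but the first n characters of a token can be
pictured as moving a read position backwards.
The sketch below is a plain C model of that idea, not the macro flex
provides; the struct cursor and the functions take_token and keep_first
are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

struct cursor {
    const char *input;   /* whole input text */
    size_t start;        /* offset where the current token begins */
    size_t leng;         /* current token length (role of yyleng) */
    size_t next;         /* offset where the next match will start */
};

/* Accept a token of len characters beginning at the read position. */
void take_token(struct cursor *c, size_t len)
{
    c->start = c->next;
    c->leng = len;
    c->next = c->start + len;
}

/* Keep only the first n characters; the rest will be scanned again. */
void keep_first(struct cursor *c, size_t n)
{
    assert(n <= c->leng);
    c->leng = n;
    c->next = c->start + n;
}
```

With n equal to 0 the read position returns to the start of the token,
which is why an unchanged scanner would then match the same text
forever.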
is a macro and can only be used in the flex input file, not from
back onto the input stream.
It will be the next character scanned.
The following action will take the current token and cause it
to be rescanned enclosed in parentheses.
        /* Copy yytext because unput() trashes yytext */
        char *yycopy = strdup( yytext );
        for ( i = yyleng - 1; i >= 0; --i )
puts the given character back at the
of the input stream, pushing back strings must be done back-to-front.
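.PP
Why strings must be pushed back-to-front can be seen with a small
last-in, first-out buffer written in plain C.
This is a sketch of the ordering argument only; the struct pushback and
the functions push_char, next_char and push_string are hypothetical and
do not reflect how the scanner's buffer is laid out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { PUSHBACK_MAX = 64 };

struct pushback {
    char buf[PUSHBACK_MAX];
    size_t count;          /* number of characters waiting to be re-read */
};

/* Put one character back; it becomes the next character read. */
void push_char(struct pushback *pb, char c)
{
    assert(pb->count < PUSHBACK_MAX);
    pb->buf[pb->count++] = c;
}

/* Read the most recently pushed character, or '\0' if none is pending. */
char next_char(struct pushback *pb)
{
    return pb->count > 0 ? pb->buf[--pb->count] : '\0';
}

/* Push a whole string so that it is re-read in its original order. */
void push_string(struct pushback *pb, const char *s)
{
    for (size_t i = strlen(s); i > 0; --i)
        push_char(pb, s[i - 1]);   /* last character first */
}
```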
An important potential problem when using
is that if you are using
(the default), a call to
starting with its rightmost character and devouring one character to
the left with each call.
If you need the value of yytext preserved
(as in the above example),
you must either first copy it elsewhere, or build your scanner using
instead (see How The Input Is Matched).
Finally, note that you cannot put back
to attempt to mark the input stream with an end-of-file.
reads the next character from the input stream.
the following is one way to eat up C comments:
            while ( (c = input()) != '*' &&
                ;    /* eat up text of comment */
                while ( (c = input()) == '*' )
                    break;    /* found the end */
                error( "EOF in comment" );
(Note that if the scanner is compiled using
is instead referred to as
in order to avoid a name clash with the
stream by the name of
flushes the scanner's internal buffer
so that the next time the scanner attempts to match a token, it will
first refill the buffer using
(see The Generated Scanner, below).
This action is a special case
.B yy_flush_buffer()
function, described below in the section Multiple Input Buffers.
can be used in lieu of a return statement in an action.
the scanner and returns a 0 to the scanner's caller, indicating "all done".
is also called when an end-of-file is encountered.
It is a macro and may be redefined.
.SH THE GENERATED SCANNER
which contains the scanning routine
a number of tables used by it for matching tokens, and a number
of auxiliary routines and macros.
is declared as follows:
        ... various definitions and the actions in here ...
(If your environment supports function prototypes, then it will
be "int yylex( void )".)  This definition may be changed by defining
the "YY_DECL" macro.
For example, you could use:
    #define YY_DECL float lexscan( a, b ) float a, b;
to give the scanning routine the name
returning a float, and taking two floats as arguments.
if you give arguments to the scanning routine using a
K&R-style/non-prototyped function declaration, you must terminate
the definition with a semi-colon (;).
is called, it scans tokens from the global input file
(which defaults to stdin).
It continues until it either reaches
an end-of-file (at which point it returns the value 0) or
one of its actions executes a
If the scanner reaches an end-of-file, subsequent calls are undefined
is pointed at a new input file (in which case scanning continues from
takes one argument, a
pointer (which can be nil, if you've set up
to scan from a source other than
for scanning from that file.
Essentially there is no difference between
to a new input file or using
to do so; the latter is available for compatibility with previous versions
and because it can be used to switch input files in the middle of scanning.
It can also be used to throw away the current input buffer, by calling
it with an argument of
but better is to use
reset the start condition to
(see Start Conditions, below).
stops scanning due to executing a
statement in one of the actions, the scanner may then be called again and it
will resume scanning where it left off.
By default (and for purposes of efficiency), the scanner uses
block-reads rather than simple
calls to read characters from
The nature of how it gets its input can be controlled by defining the
YY_INPUT's calling sequence is "YY_INPUT(buf,result,max_size)".
Its action is to place up to
characters in the character array
and return in the integer variable
number of characters read or the constant YY_NULL (0 on Unix systems)
The default YY_INPUT reads from the
global file-pointer "yyin".
A sample definition of YY_INPUT (in the definitions
section of the input file):
    #define YY_INPUT(buf,result,max_size) \\
        int c = getchar(); \\
        result = (c == EOF) ? YY_NULL : (buf[0] = c, 1); \\
This definition will change the input processing to occur
one character at a time.
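.PP
The same calling sequence can also deliver input in blocks from memory.
The following plain C sketch follows the contract described above (fill
buf with at most max_size characters and store the count, or YY_NULL at
the end, in result).
The names STRING_INPUT, pending_text and read_chunk are hypothetical,
and YY_NULL is defined locally only so that the sketch compiles outside
a generated scanner.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#ifndef YY_NULL
#define YY_NULL 0   /* assumption: normally supplied by the scanner */
#endif

const char *pending_text = "";   /* hypothetical in-memory input source */

/* Same shape as YY_INPUT(buf,result,max_size), reading from a string. */
#define STRING_INPUT(buf, result, max_size)                 \
    do {                                                    \
        size_t n_ = strlen(pending_text);                   \
        if (n_ > (size_t)(max_size))                        \
            n_ = (size_t)(max_size);                        \
        memcpy((buf), pending_text, n_);                    \
        pending_text += n_;                                 \
        (result) = (n_ == 0) ? YY_NULL : (int)n_;           \
    } while (0)

/* Small wrapper so the macro can be exercised as a function. */
int read_chunk(char *buf, int max_size)
{
    int result;
    assert(buf != NULL && max_size > 0);
    STRING_INPUT(buf, result, max_size);
    return result;
}
```

For scanning strings in a real scanner, the in-memory buffer routines
mentioned later in this section are usually the simpler choice.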
When the scanner receives an end-of-file indication from YY_INPUT,
returns false (zero), then it is assumed that the
function has gone ahead and set up
to point to another input file, and scanning continues.
true (non-zero), then the scanner terminates, returning 0 to its
Note that in either case, the start condition remains unchanged;
If you do not supply your own version of
then you must either use
(in which case the scanner behaves as though
returned 1), or you must link with
to obtain the default version of the routine, which always returns 1.
Three routines are available for scanning from in-memory buffers rather
.B yy_scan_string(), yy_scan_bytes(),
.B yy_scan_buffer().
See the discussion of them below in the section Multiple Input Buffers.
The scanner writes its
global (default, stdout), which may be redefined by the user simply
by assigning it to some other
.SH START CONDITIONS
provides a mechanism for conditionally activating rules.
whose pattern is prefixed with "<sc>" will only be active when
the scanner is in the start condition named "sc".
    <STRING>[^"]*        { /* eat up the string body ... */
will be active only when the scanner is in the "STRING" start
    <INITIAL,STRING,QUOTE>\\.        { /* handle an escape ... */
will be active only when the current start condition is
either "INITIAL", "STRING", or "QUOTE".
are declared in the definitions (first) section of the input
using unindented lines beginning with either
followed by a list of names.
start conditions, the latter
A start condition is activated using the
action is executed, rules with the given start
condition will be active and
rules with other start conditions will be inactive.
If the start condition is
then rules with no start conditions at all will also be active.
rules qualified with the start condition will be active.
A set of rules contingent on the same exclusive start condition
describe a scanner which is independent of any of the other rules in the
exclusive start conditions make it easy to specify "mini-scanners"
which scan portions of the input that are syntactically different
from the rest (e.g., comments).
If the distinction between inclusive and exclusive start conditions
is still a little vague, here's a simple example illustrating the
connection between the two.
    <example>foo   do_something();
    bar            something_else();
    <example>foo   do_something();
    <INITIAL,example>bar    something_else();
.B <INITIAL,example>
pattern in the second example wouldn't be active (i.e., couldn't match)
when in start condition
though, then it would only be active in
while in the first example it's active in both, because in the first
start condition is an
Also note that the special start-condition specifier
matches every start condition.
Thus, the above example could also have been written:
    <example>foo   do_something();
    <*>bar    something_else();
The default rule (to
any unmatched character) remains active in start conditions.
returns to the original state where only the rules with
no start conditions are active.
This state can also be
referred to as the start-condition "INITIAL", so
(The parentheses around the start condition name are not required but
are considered good style.)
actions can also be given as indented code at the beginning
of the rules section.
For example, the following will cause
the scanner to enter the "SPECIAL" start condition whenever
is called and the global variable
            if ( enter_special )
    <SPECIAL>blahblahblah
    ...more rules follow...
To illustrate the uses of start conditions,
here is a scanner which provides two different interpretations
of a string like "123.456".
By default it will treat it as
three tokens, the integer "123", a dot ('.'), and the integer "456".
But if the string is preceded earlier in the line by the string
it will treat it as a single token, the floating-point number
    expect-floats        BEGIN(expect);
    <expect>[0-9]+"."[0-9]+      {
                printf( "found a float, = %f\\n",
                /* that's the end of the line, so
                 * we need another "expect-floats"
                 * before we'll recognize any more
                printf( "found an integer, = %d\\n",
    "."         printf( "found a dot\\n" );
Here is a scanner which recognizes (and discards) C comments while
maintaining a count of the current input line.
    "/*"         BEGIN(comment);
    <comment>[^*\\n]*        /* eat anything that's not a '*' */
    <comment>"*"+[^*/\\n]*   /* eat up '*'s not followed by '/'s */
    <comment>\\n             ++line_num;
    <comment>"*"+"/"        BEGIN(INITIAL);
This scanner goes to a bit of trouble to match as much
text as possible with each rule.
In general, when attempting to write
a high-speed scanner try to match as much as possible in each rule, as
Note that start-condition names are really integer values and
can be stored as such.
Thus, the above could be extended in the
                 comment_caller = INITIAL;
                 comment_caller = foo;
    <comment>[^*\\n]*        /* eat anything that's not a '*' */
    <comment>"*"+[^*/\\n]*   /* eat up '*'s not followed by '/'s */
    <comment>\\n             ++line_num;
    <comment>"*"+"/"        BEGIN(comment_caller);
Furthermore, you can access the current start condition using
For example, the above assignments to
could instead be written
    comment_caller = YY_START;
(since that is what's used by AT&T
Note that start conditions do not have their own name-space; %s's and %x's
declare names in the same fashion as #define's.
Finally, here's an example of how to match C-style quoted strings using
exclusive start conditions, including expanded escape sequences (but
not including checking for a string that's too long):
            char string_buf[MAX_STR_CONST];
            char *string_buf_ptr;
    \\"      string_buf_ptr = string_buf; BEGIN(str);
    <str>\\"        { /* saw closing quote - all done */
            *string_buf_ptr = '\\0';
            /* return string constant token type and
            /* error - unterminated string constant */
            /* generate error message */
    <str>\\\\[0-7]{1,3} {
            /* octal escape sequence */
            (void) sscanf( yytext + 1, "%o", &result );
            if ( result > 0xff )
                    /* error, constant is out-of-bounds */
            *string_buf_ptr++ = result;
            /* generate error - bad escape sequence; something
             * like '\\48' or '\\0777777'
    <str>\\\\n  *string_buf_ptr++ = '\\n';
    <str>\\\\t  *string_buf_ptr++ = '\\t';
    <str>\\\\r  *string_buf_ptr++ = '\\r';
    <str>\\\\b  *string_buf_ptr++ = '\\b';
    <str>\\\\f  *string_buf_ptr++ = '\\f';
    <str>\\\\(.|\\n)  *string_buf_ptr++ = yytext[1];
    <str>[^\\\\\\n\\"]+        {
            char *yptr = yytext;
                    *string_buf_ptr++ = *yptr++;
1564 Often, such as in some of the examples above, you wind up writing a
1565 whole bunch of rules all preceded by the same start condition(s).
1566 Flex makes this a little easier and cleaner by introducing a notion of
1569 A start condition scope is begun with:
1577 is a list of one or more start conditions.
1578 Inside the start condition
1579 scope, every rule automatically has the prefix
1581 applied to it, until a
1583 which matches the initial
1589 "\\\\n" return '\\n';
1590 "\\\\r" return '\\r';
1591 "\\\\f" return '\\f';
1592 "\\\\0" return '\\0';
1599 <ESC>"\\\\n" return '\\n';
1600 <ESC>"\\\\r" return '\\r';
1601 <ESC>"\\\\f" return '\\f';
1602 <ESC>"\\\\0" return '\\0';
1605 Start condition scopes may be nested.
1607 Three routines are available for manipulating stacks of start conditions:
1609 .B void yy_push_state(int new_state)
1610 pushes the current start condition onto the top of the start condition
1611 stack and switches to
1613 as though you had used
1615 (recall that start condition names are also integers).
1617 .B void yy_pop_state()
1618 pops the top of the stack and switches to it via
1621 .B int yy_top_state()
1622 returns the top of the stack without altering the stack's contents.
1624 The start condition stack grows dynamically and so has no built-in
1626 If memory is exhausted, program execution aborts.
1628 To use start condition stacks, your scanner must include a
1630 directive (see Options below).
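The behavior of the three stack routines can be modeled in a few lines of plain C. This is a sketch of the semantics described above, not flex's implementation: the names sc_push(), sc_pop(), sc_top() and sc_current are hypothetical stand-ins for yy_push_state(), yy_pop_state(), yy_top_state() and the scanner's active start condition, and the value 0 is assumed to represent the initial start condition.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical model state; a generated scanner keeps its own. */
static int *sc_stack = NULL;     /* saved start conditions */
static size_t sc_depth = 0;      /* entries in use */
static size_t sc_capacity = 0;   /* entries allocated */
static int sc_current = 0;       /* active start condition (0 assumed initial) */

/* Model of yy_push_state(): save the current condition, then switch. */
void sc_push(int new_state)
{
    if (sc_depth == sc_capacity) {
        /* Grow on demand; there is no fixed size limit. */
        size_t new_cap = sc_capacity ? sc_capacity * 2 : 8;
        int *grown = realloc(sc_stack, new_cap * sizeof *grown);
        if (grown == NULL) {
            fprintf(stderr, "out of memory growing start condition stack\n");
            abort();             /* memory exhausted: execution aborts */
        }
        sc_stack = grown;
        sc_capacity = new_cap;
    }
    sc_stack[sc_depth++] = sc_current;
    sc_current = new_state;
}

/* Model of yy_pop_state(): switch to the condition on top of the stack. */
void sc_pop(void)
{
    if (sc_depth == 0) {
        fprintf(stderr, "start condition stack underflow\n");
        abort();
    }
    sc_current = sc_stack[--sc_depth];
}

/* Model of yy_top_state(): peek at the top without changing the stack. */
int sc_top(void)
{
    if (sc_depth == 0) {
        fprintf(stderr, "start condition stack is empty\n");
        abort();
    }
    return sc_stack[sc_depth - 1];
}
```

Note that the value pushed is the condition being left, not the one being entered, so a matching pop returns the scanner to where it was before the push.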
1631 .SH MULTIPLE INPUT BUFFERS
1632 Some scanners (such as those which support "include" files)
1633 require reading from several input streams.
1636 scanners do a large amount of buffering, one cannot control
1637 where the next input will be read from by simply writing a
1639 which is sensitive to the scanning context.
1641 is only called when the scanner reaches the end of its buffer, which
1642 may be a long time after scanning a statement such as an "include"
1643 which requires switching the input source.
1645 To negotiate these sorts of problems,
1647 provides a mechanism for creating and switching between multiple
1649 An input buffer is created by using:
1652 YY_BUFFER_STATE yy_create_buffer( FILE *file, int size )
1657 pointer and a size and creates a buffer associated with the given
1658 file and large enough to hold
1660 characters (when in doubt, use
1665 handle, which may then be passed to other routines (see below).
1668 type is a pointer to an opaque
1669 .B struct yy_buffer_state
1670 structure, so you may safely initialize YY_BUFFER_STATE variables to
1671 .B ((YY_BUFFER_STATE) 0)
1672 if you wish, and also refer to the opaque structure in order to
1673 correctly declare input buffers in source files other than that
1677 pointer in the call to
1679 is only used as the value of
1685 so it no longer uses
1687 then you can safely pass a nil
1690 .B yy_create_buffer.
1691 You select a particular buffer to scan from using:
1694 void yy_switch_to_buffer( YY_BUFFER_STATE new_buffer )
1697 switches the scanner's input buffer so subsequent tokens will
1701 .B yy_switch_to_buffer()
1702 may be used by yywrap() to set things up for continued scanning, instead
1703 of opening a new file and pointing
1706 Note also that switching input sources via either
1707 .B yy_switch_to_buffer()
1712 change the start condition.
1715 void yy_delete_buffer( YY_BUFFER_STATE buffer )
1718 is used to reclaim the storage associated with a buffer.
1721 can be nil, in which case the routine does nothing.)
1722 You can also clear the current contents of a buffer using:
1725 void yy_flush_buffer( YY_BUFFER_STATE buffer )
1728 This function discards the buffer's contents,
1729 so the next time the scanner attempts to match a token from the
1730 buffer, it will first fill the buffer anew using
1735 .B yy_create_buffer(),
1736 provided for compatibility with the C++ use of
1740 for creating and destroying dynamic objects.
1743 .B YY_CURRENT_BUFFER
1746 handle to the current buffer.
1748 Here is an example of using these features for writing a scanner
1749 which expands include files (the
1751 feature is discussed below):
1754 /* the "incl" state is used for picking up the name
1755 * of an include file
1760 #define MAX_INCLUDE_DEPTH 10
1761 YY_BUFFER_STATE include_stack[MAX_INCLUDE_DEPTH];
1762 int include_stack_ptr = 0;
1766 include BEGIN(incl);
1769 [^a-z\\n]*\\n? ECHO;
1771 <incl>[ \\t]* /* eat the whitespace */
1772 <incl>[^ \\t\\n]+ { /* got the include file name */
1773 if ( include_stack_ptr >= MAX_INCLUDE_DEPTH )
1775 fprintf( stderr, "Includes nested too deeply\\n" );
1779 include_stack[include_stack_ptr++] =
1782 yyin = fopen( yytext, "r" );
1787 yy_switch_to_buffer(
1788 yy_create_buffer( yyin, YY_BUF_SIZE ) );
1794 if ( --include_stack_ptr < 0 )
1801 yy_delete_buffer( YY_CURRENT_BUFFER );
1802 yy_switch_to_buffer(
1803 include_stack[include_stack_ptr] );
1808 Three routines are available for setting up input buffers for
1809 scanning in-memory strings instead of files.
1811 a new input buffer for scanning the string, and return a corresponding
1813 handle (which you should delete with
1814 .B yy_delete_buffer()
1816 They also switch to the new buffer using
1817 .B yy_switch_to_buffer(),
1820 will start scanning the string.
1822 .B yy_scan_string(const char *str)
1823 scans a NUL-terminated string.
1825 .B yy_scan_bytes(const char *bytes, int len)
1828 bytes (including possibly NUL's)
1829 starting at location
1832 Note that both of these functions create and scan a
1834 of the string or bytes.
1835 (This may be desirable, since
1837 modifies the contents of the buffer it is scanning.) You can avoid the
1840 .B yy_scan_buffer(char *base, yy_size_t size)
1841 which scans in place the buffer starting at
1845 bytes, the last two bytes of which
1848 .B YY_END_OF_BUFFER_CHAR
1850 These last two bytes are not scanned; thus, scanning
1857 If you fail to set up
1859 in this manner (i.e., forget the final two
1860 .B YY_END_OF_BUFFER_CHAR
1863 returns a nil pointer instead of creating a new input buffer.
1867 is an integral type to which you can cast an integer expression
1868 reflecting the size of the buffer.
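Preparing memory in the layout that yy_scan_buffer() expects is easy to get wrong, so a small helper can be useful. The C sketch below is an illustration under stated assumptions: make_scan_buffer() is a hypothetical name, and SKETCH_END_OF_BUFFER_CHAR stands in for YY_END_OF_BUFFER_CHAR (ASCII NUL) so that the code compiles outside a generated scanner.

```c
#include <stdlib.h>
#include <string.h>

/* Stand-in for YY_END_OF_BUFFER_CHAR (ASCII NUL); assumed for this sketch. */
#define SKETCH_END_OF_BUFFER_CHAR '\0'

/*
 * Copy the first len bytes of text into newly allocated memory followed
 * by the two end-of-buffer bytes. On success, returns the buffer and
 * stores its total size (len + 2) in *size_out; returns NULL if the
 * allocation fails. The caller keeps ownership of the memory.
 * Hypothetical helper for illustration; not provided by flex.
 */
char *make_scan_buffer(const char *text, size_t len, size_t *size_out)
{
    char *base = malloc(len + 2);

    if (base == NULL)
        return NULL;

    memcpy(base, text, len);
    base[len] = SKETCH_END_OF_BUFFER_CHAR;      /* first terminator */
    base[len + 1] = SKETCH_END_OF_BUFFER_CHAR;  /* second terminator */
    *size_out = len + 2;
    return base;
}
```

Inside a scanner, the result would be passed on as yy_scan_buffer( base, size ), and a nil return from that call would indicate that the two terminating bytes were not in place.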
1869 .SH END-OF-FILE RULES
1870 The special rule "<<EOF>>" indicates
1871 actions which are to be taken when an end-of-file is
1872 encountered and yywrap() returns non-zero (i.e., indicates
1873 no further files to process).
1874 The action must finish
1875 by doing one of four things:
1879 to a new input file (in previous versions of flex, after doing the
1880 assignment you had to call the special action
1882 this is no longer necessary);
1888 executing the special
1892 or, switching to a new buffer using
1893 .B yy_switch_to_buffer()
1894 as shown in the example above.
1896 <<EOF>> rules may not be used with other
1897 patterns; they may only be qualified with a list of start
1899 If an unqualified <<EOF>> rule is given, it
1902 start conditions which do not already have <<EOF>> actions.
1904 specify an <<EOF>> rule for only the initial start condition, use
1911 These rules are useful for catching things like unclosed comments.
1918 ...other rules for dealing with quotes...
1921 error( "unterminated quote" );
1926 yyin = fopen( *filelist, "r" );
1932 .SH MISCELLANEOUS MACROS
1935 can be defined to provide an action
1936 which is always executed prior to the matched rule's action.
1938 it could be #define'd to call a routine to convert yytext to lower-case.
1941 is invoked, the variable
1943 gives the number of the matched rule (rules are numbered starting with 1).
1944 Suppose you want to profile how often each of your rules is matched.
1945 The following would do the trick:
1948 #define YY_USER_ACTION ++ctr[yy_act]
1953 is an array to hold the counts for the different rules.
1956 gives the total number of rules (including the default rule, even if
1959 so a correct declaration for
1964 int ctr[YY_NUM_RULES];
1970 may be defined to provide an action which is always executed before
1971 the first scan (and before the scanner's internal initializations are done).
1972 For example, it could be used to call a routine to read
1973 in a data table or open a logging file.
1976 .B yy_set_interactive(is_interactive)
1977 can be used to control whether the current buffer is considered
1979 An interactive buffer is processed more slowly,
1980 but must be used when the scanner's input source is indeed
1981 interactive to avoid problems due to waiting to fill buffers
1982 (see the discussion of the
1986 in the macro invocation marks the buffer as interactive, a zero
1987 value as non-interactive.
1988 Note that use of this macro overrides
1989 .B %option interactive ,
1990 .B %option always-interactive
1992 .B %option never-interactive
1993 (see Options below).
1994 .B yy_set_interactive()
1995 must be invoked prior to beginning to scan the buffer that is
1996 (or is not) to be considered interactive.
1999 .B yy_set_bol(at_bol)
2000 can be used to control whether the current buffer's scanning
2001 context for the next token match is done as though at the
2002 beginning of a line.
2003 A non-zero macro argument makes rules anchored with
2004 '^' active, while a zero argument makes '^' rules inactive.
2008 returns true if the next token scanned from the current buffer
2009 will have '^' rules active, false otherwise.
2011 In the generated scanner, the actions are all gathered in one large
2012 switch statement and separated using
2014 which may be redefined.
2015 By default, it is simply a "break", to separate
2016 each rule's action from the following rule's.
2019 allows, for example, C++ users to
2020 #define YY_BREAK to do nothing (while being very careful that every
2021 rule ends with a "break" or a "return"!) to avoid suffering from
2022 unreachable statement warnings where, because a rule's action ends with
2026 .SH VALUES AVAILABLE TO THE USER
2027 This section summarizes the various values available to the user
2028 in the rule actions.
2031 holds the text of the current token.
2032 It may be modified but not lengthened
2033 (you cannot append characters to the end).
2035 If the special directive
2037 appears in the first section of the scanner description, then
2040 .B char yytext[YYLMAX],
2043 is a macro definition that you can redefine in the first section
2044 if you don't like the default value (generally 8KB).
2047 results in somewhat slower scanners, but the value of
2049 becomes immune to calls to
2053 which potentially destroy its value when
2055 is a character pointer.
2060 which is the default.
2064 when generating C++ scanner classes
2070 holds the length of the current token.
2073 is the file which by default
2076 It may be redefined but doing so only makes sense before
2077 scanning begins or after an EOF has been encountered.
2078 Changing it in the midst of scanning will have unexpected results since
2080 buffers its input; use
2083 Once scanning terminates because an end-of-file
2084 has been seen, you can assign
2086 at the new input file and then call the scanner again to continue scanning.
2088 .B void yyrestart( FILE *new_file )
2089 may be called to point
2091 at the new input file.
2092 The switch-over to the new file is immediate
2093 (any previously buffered-up input is lost).
2098 as an argument thus throws away the current input buffer and continues
2099 scanning the same input file.
2102 is the file to which
2105 It can be reassigned by the user.
2107 .B YY_CURRENT_BUFFER
2110 handle to the current buffer.
2113 returns an integer value corresponding to the current start
2115 You can subsequently use this value with
2117 to return to that start condition.
2118 .SH INTERFACING WITH YACC
2119 One of the main uses of
2121 is as a companion to the
2125 parsers expect to call a routine named
2127 to find the next input token.
2128 The routine is supposed to
2129 return the type of the next token as well as putting any associated
2140 to instruct it to generate the file
2142 containing definitions of all the
2147 This file is then included in the
2150 For example, if one of the tokens is "TOK_NUMBER",
2151 part of the scanner might look like:
2160 [0-9]+ yylval = atoi( yytext ); return TOK_NUMBER;
2165 has the following options:
2168 Generate backing-up information to
2170 This is a list of scanner states which require backing up
2171 and the input characters on which they do so.
2173 can remove backing-up states.
2176 backing-up states are eliminated and
2180 is used, the generated scanner will run faster (see the
2183 Only users who wish to squeeze every last cycle out of their
2184 scanners need worry about this option.
2185 (See the section on Performance Considerations below.)
2188 is a do-nothing, deprecated option included for POSIX compliance.
2191 makes the generated scanner run in
2194 Whenever a pattern is recognized and the global
2196 is non-zero (which is the default),
2197 the scanner will write to
2202 --accepting rule at line 53 ("the matched text")
2205 The line number refers to the location of the rule in the file
2206 defining the scanner (i.e., the file that was fed to flex).
2207 Messages are also generated when the scanner backs up, accepts the
2208 default rule, reaches the end of its input buffer (or encounters
2209 a NUL; at this point, the two look the same as far as the scanner's concerned),
2210 or reaches an end-of-file.
2215 No table compression is done and stdio is bypassed.
2216 The result is large but fast.
2217 This option is equivalent to
2222 generates a "help" summary of
2233 .B \-i, \-\-case-insensitive
2239 The case of letters given in the
2242 be ignored, and tokens in the input will be matched regardless of case.
2243 The matched text given in
2245 will have the preserved case (i.e., it will not be folded).
2247 .B \-l, \-\-lex\-compat
2248 turns on maximum compatibility with the original AT&T
2251 Note that this does not mean
2254 Use of this option costs a considerable amount of
2255 performance, and it cannot be used with the
2256 .B \-+, -f, -F, -Cf,
2260 For details on the compatibilities it provides, see the section
2261 "Incompatibilities With Lex And POSIX" below.
2262 This option also results
2264 .B YY_FLEX_LEX_COMPAT
2265 being #define'd in the generated scanner.
2268 is another do-nothing, deprecated option included only for
2271 .B \-p, \-\-perf\-report
2272 generates a performance report to stderr.
2273 The report consists of comments regarding features of the
2275 input file which will cause a serious loss of performance in the resulting
2277 If you give the flag twice, you will also get comments regarding
2278 features that lead to minor performance losses.
2280 Note that the use of
2282 .B %option yylineno,
2283 and variable trailing context (see the Deficiencies / Bugs section below)
2284 entails a substantial performance penalty; use of
2291 flag entail minor performance penalties.
2293 .B \-s, \-\-no\-default
2296 (that unmatched scanner input is echoed to
2299 If the scanner encounters input that does not
2300 match any of its rules, it aborts with an error.
2302 useful for finding holes in a scanner's rule set.
2307 to write the scanner it generates to standard output instead
2316 a summary of statistics regarding the scanner it generates.
2317 Most of the statistics are meaningless to the casual
2319 user, but the first line identifies the version of
2321 (same as reported by
2323 and the next line the flags used when generating the scanner, including
2324 those that are on by default.
2327 suppresses warning messages.
2334 scanner, the opposite of
2336 scanners generated by
2343 that your scanner will never be used interactively, and you want to
2346 more performance out of it.
2347 If your goal is instead to squeeze out a
2349 more performance, you should be using the
2353 options (discussed below), which turn on
2355 automatically anyway.
2360 scanner table representation should be used (and stdio
2362 This representation is about as fast as the full table representation
2364 and for some sets of patterns will be considerably smaller (and for
2366 In general, if the pattern set contains both "keywords"
2367 and a catch-all, "identifier" rule, such as in the set:
2370 "case" return TOK_CASE;
2371 "switch" return TOK_SWITCH;
2373 "default" return TOK_DEFAULT;
2374 [a-z]+ return TOK_ID;
2377 then you're better off using the full table representation.
2379 the "identifier" rule is present and you then use a hash table or some such
2380 to detect the keywords, you're better off using
2383 This option is equivalent to
2386 It cannot be used with
2389 .B \-I, \-\-interactive
2395 An interactive scanner is one that only looks ahead to decide
2396 what token has been matched if it absolutely must.
2398 always looking one extra character ahead, even if the scanner has already
2399 seen enough text to disambiguate the current token, is a bit faster than
2400 only looking ahead when necessary.
2401 But scanners that always look ahead
2402 give dreadful interactive performance; for example, when a user types
2403 a newline, it is not recognized as a newline token until they enter
2405 token, which often means typing in another whole line.
2414 table-compression options (see below).
2415 That's because if you're looking
2416 for high-performance you should be using one of these options, so if you
2419 assumes you'd rather trade off a bit of run-time performance for intuitive
2420 interactive behavior.
2429 Thus, this option is not really needed; it is on by default for all those
2430 cases in which it is allowed.
2434 returns false for the scanner input, flex will revert to batch mode, even if
2437 To force interactive mode no matter what, use
2438 .B %option always-interactive
2439 (see Options below).
2441 You can force a scanner to
2443 be interactive by using
2453 Without this option,
2455 peppers the generated scanner
2456 with #line directives so error messages in the actions will be correctly
2457 located with respect to either the original
2459 input file (if the errors are due to code in the input file), or
2463 fault -- you should report these sorts of errors to the email address
2472 It will generate a lot of messages to
2475 the form of the input and the resultant non-deterministic and deterministic
2477 This option is mostly for use in maintaining
2481 prints the version number to
2491 to generate a 7-bit scanner, i.e., one which can only recognize 7-bit
2492 characters in its input.
2493 The advantage of using
2495 is that the scanner's tables can be up to half the size of those generated
2499 The disadvantage is that such scanners often hang
2500 or crash if their input contains an 8-bit character.
2502 Note, however, that unless you generate your scanner using the
2506 table compression options, use of
2508 will save only a small amount of table space, and make your scanner
2509 considerably less portable.
2511 default behavior is to generate an 8-bit scanner unless you use the
2517 defaults to generating 7-bit scanners unless your site was always
2518 configured to generate 8-bit scanners (as will often be the case
2519 with non-USA sites).
2520 You can tell whether flex generated a 7-bit
2521 or an 8-bit scanner by inspecting the flag summary in the
2523 output as described above.
2525 Note that if you use
2529 (those table compression options, but also using equivalence classes as
2530 discussed below), flex still defaults to generating an 8-bit
2531 scanner, since usually with these compression options full 8-bit tables
2532 are not much more expensive than 7-bit tables.
2537 to generate an 8-bit scanner, i.e., one which can recognize 8-bit
2539 This flag is only needed for scanners generated using
2543 as otherwise flex defaults to generating an 8-bit scanner anyway.
2545 See the discussion of
2547 above for flex's default behavior and the tradeoffs between 7-bit
2551 specifies that you want flex to generate a C++
2553 See the section on Generating C++ Scanners below for
2557 controls the degree of table compression and, more generally, trade-offs
2558 between small scanners and fast scanners.
2561 ("align") instructs flex to trade off larger tables in the
2562 generated scanner for faster performance because the elements of
2563 the tables are better aligned for memory access and computation.
2565 RISC architectures, fetching and manipulating longwords is more efficient
2566 than with smaller-sized units such as shortwords.
2568 double the size of the tables used by your scanner.
2574 .I equivalence classes,
2575 i.e., sets of characters
2576 which have identical lexical properties (for example, if the only
2577 appearance of digits in the
2579 input is in the character class
2580 "[0-9]" then the digits '0', '1', ..., '9' will all be put
2581 in the same equivalence class).
2582 Equivalence classes usually give
2583 dramatic reductions in the final table/object file sizes (typically
2584 a factor of 2-5) and are pretty cheap performance-wise (one array
2585 look-up per character scanned).
2590 scanner tables should be generated -
2592 should not compress the
2593 tables by taking advantage of similar transition functions for
2597 specifies that the alternative fast scanner representation (described
2602 This option cannot be used with
2605 .B \-Cm, \-\-meta-ecs
2609 .I meta-equivalence classes,
2610 which are sets of equivalence classes (or characters, if equivalence
2611 classes are not being used) that are commonly used together.
2613 classes are often a big win when using compressed tables, but they
2614 have a moderate performance impact (one or two "if" tests and one
2615 array look-up per character scanned).
2618 causes the generated scanner to
2620 use of the standard I/O library (stdio) for input.
2625 the scanner will use the
2627 system call, resulting in a performance gain which varies from system
2628 to system, but in general is probably negligible unless you are also using
2634 can cause strange behavior if, for example, you read from
2636 using stdio prior to calling the scanner (because the scanner will miss
2637 whatever text your previous reads left in the stdio input buffer).
2640 has no effect if you define
2642 (see The Generated Scanner above).
2646 specifies that the scanner tables should be compressed but neither
2647 equivalence classes nor meta-equivalence classes should be used.
2655 do not make sense together - there is no opportunity for meta-equivalence
2656 classes if the table is not being compressed.
2657 Otherwise the options
2658 may be freely mixed, and are cumulative.
2660 The default setting is
2662 which specifies that
2664 should generate equivalence classes
2665 and meta-equivalence classes.
2666 This setting provides the highest degree of table compression.
2668 faster-executing scanners at the cost of larger tables with
2669 the following generally being true:
2683 Note that scanners with the smallest tables are usually generated and
2684 compiled the quickest, so
2685 during development you will usually want to use the default, maximal
2689 is often a good compromise between speed and size for production
2692 .B \-ooutput, \-\-outputfile=FILE
2693 directs flex to write the scanner to the file
2701 option, then the scanner is written to
2707 option above) refer to the file
2710 .B \-Pprefix, \-\-prefix=STRING
2715 for all globally-visible variable and function names to instead be
2723 It also changes the name of the default output file from
2727 Here are all of the names affected:
2735 yy_load_buffer_state
2747 (If you are using a C++ scanner, then only
2752 Within your scanner itself, you can still refer to the global variables
2753 and functions using either version of their name; but externally, they
2754 have the modified name.
2756 This option lets you easily link together multiple
2758 programs into the same executable.
2759 Note, though, that using this option also renames
2764 provide your own (appropriately-named) version of the routine for your
2766 .B %option noyywrap,
2769 no longer provides one for you by default.
2771 .B \-Sskeleton_file, \-\-skel=FILE
2772 overrides the default skeleton file from which
2774 constructs its scanners.
2775 You'll never need this option unless you are doing
2777 maintenance or development.
2779 .B \-X, \-\-posix\-compat
2780 maximal compatibility with POSIX lex.
2783 track line count in yylineno.
2788 .B \-\-header\-file=FILE
2789 create a C header file in addition to the scanner.
2791 .B \-\-tables\-file[=FILE]
2792 write tables to FILE.
2795 #define macro defn (default defn is '1').
2797 .B \-R, \-\-reentrant
2798 generate a reentrant C scanner
2800 .B \-\-bison\-bridge
2801 scanner for bison pure parser.
2803 .B \-\-bison\-locations
2804 include yylloc support.
2807 initialize yyin/yyout to stdin/stdout.
2809 .B \-\-noansi\-definitions
old\-style function definitions.
2811 .B \-\-noansi\-prototypes
2812 empty parameter list in prototypes.
2815 do not include <unistd.h>.
2818 do not generate a particular FUNCTION.
2821 also provides a mechanism for controlling options within the
2822 scanner specification itself, rather than from the flex command-line.
2823 This is done by including
2825 directives in the first section of the scanner specification.
2826 You can specify multiple options with a single
2828 directive, and multiple directives in the first section of your flex input
2831 Most options are given simply as names, optionally preceded by the
2832 word "no" (with no intervening whitespace) to negate their meaning.
2833 A number are equivalent to flex flags or their negation:
2844 case-sensitive opposite of -i (default)
2850 default opposite of -s option
2854 interactive -I option
2855 lex-compat -l option
2857 perf-report -p option
2861 warn opposite of -w option
2862 (use "%option nowarn" for -w)
2864 array equivalent to "%array"
2865 pointer equivalent to "%pointer" (default)
2870 provide features otherwise not available:
2872 .B always-interactive
2873 instructs flex to generate a scanner which always considers its input
2875 Normally, on each new input file the scanner calls
2877 in an attempt to determine whether
2878 the scanner's input source is interactive and thus should be read a
2879 character at a time.
2880 When this option is used, however, no
2884 directs flex to provide a default
2886 program for the scanner, which simply calls
2892 .B never-interactive
2893 instructs flex to generate a scanner which never considers its input
2894 "interactive" (again, no call made to
2896 This is the opposite of
2897 .B always-interactive.
2900 enables the use of start condition stacks (see Start Conditions above).
2913 instead of the default of
2917 programs depend on this behavior, even though it is not compliant with
2918 ANSI C, which does not require
2922 to be compile-time constant.
2927 to generate a scanner that maintains the number of the current line
2928 read from its input in the global variable
2930 This option is implied by
2931 .B %option lex-compat.
2935 .B %option noyywrap),
2936 makes the scanner not call
2938 upon an end-of-file, but simply assume that there are no more
2939 files to scan (until the user points
2941 at a new file and calls
2946 scans your rule actions to determine whether you use the
2955 options are available to override its decision as to whether you use the
2956 options, either by setting them (e.g.,
2958 to indicate the feature is indeed used, or
2959 unsetting them to indicate it actually is not used
2961 .B %option noyymore).
2963 Three options take string-delimited values, offset with '=':
2966 %option outfile="ABC"
2974 %option prefix="XYZ"
2982 %option yyclass="foo"
2985 only applies when generating a C++ scanner (
2990 that you have derived
2996 will place your actions in the member function
2999 .B yyFlexLexer::yylex().
3001 .B yyFlexLexer::yylex()
3002 member function that emits a run-time error (by invoking
3003 .B yyFlexLexer::LexerError())
3005 See Generating C++ Scanners, below, for additional information.
3007 A number of options are available for lint purists who want to suppress
3008 the appearance of unneeded routines in the generated scanner.
3009 Each of the following, if unset
3012 ), results in the corresponding routine not appearing in
3013 the generated scanner:
3017 yy_push_state, yy_pop_state, yy_top_state
3018 yy_scan_buffer, yy_scan_bytes, yy_scan_string
3023 and friends won't appear anyway unless you use
3025 .SH PERFORMANCE CONSIDERATIONS
3026 The main design goal of
3028 is that it generate high-performance scanners.
3029 It has been optimized
3030 for dealing well with large sets of rules.
3031 Aside from the effects on scanner speed of the table compression
3033 options outlined above,
3034 there are a number of options/actions which degrade performance.
3035 These are, from most expensive to least:
3040 arbitrary trailing context
3042 pattern sets that require backing up
3045 %option always-interactive
3047 '^' beginning-of-line operator
3051 with the first three all being quite expensive and the last two
3055 is implemented as a routine call that potentially does quite a bit of
3058 is a quite-cheap macro; so if just putting back some excess text you
3063 should be avoided at all costs when performance is important.
3064 It is a particularly expensive option.
3066 Getting rid of backing up is messy and can often require an enormous
3067 amount of work for a complicated scanner.
3068 In principle, one begins by using the
3073 For example, on the input
3077 foo return TOK_KEYWORD;
3078 foobar return TOK_KEYWORD;
3081 the file looks like:
3084 State #6 is non-accepting -
3085 associated rule line numbers:
3087 out-transitions: [ o ]
3088 jam-transitions: EOF [ \\001-n p-\\177 ]
3090 State #8 is non-accepting -
3091 associated rule line numbers:
3093 out-transitions: [ a ]
3094 jam-transitions: EOF [ \\001-` b-\\177 ]
3096 State #9 is non-accepting -
3097 associated rule line numbers:
3099 out-transitions: [ r ]
3100 jam-transitions: EOF [ \\001-q s-\\177 ]
3102 Compressed tables always back up.
3105 The first few lines tell us that there's a scanner state in
3106 which it can make a transition on an 'o' but not on any other
3107 character, and that in that state the currently scanned text does not match
3109 The state occurs when trying to match the rules found
3110 at lines 2 and 3 in the input file.
3111 If the scanner is in that state and then reads
3112 something other than an 'o', it will have to back up to find
3113 a rule which is matched.
3114 With a bit of headscratching one can see that this must be the
3115 state it's in when it has seen "fo".
3116 When this has happened,
3117 if anything other than another 'o' is seen, the scanner will
3118 have to back up to simply match the 'f' (by the default rule).
3120 The comment regarding State #8 indicates there's a problem
3121 when "foob" has been scanned.
3122 Indeed, on any character other
3123 than an 'a', the scanner will have to back up to accept "foo".
3124 Similarly, the comment for State #9 concerns when "fooba" has
3125 been scanned and an 'r' does not follow.
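The three cases can be reproduced with a hand-written matcher. The C sketch below is a model for illustration only and does not reflect how flex lays out its tables: match_keyword() and the token codes are hypothetical. It matches the rules "foo" and "foobar" plus a one-character default rule, remembers the last position at which some rule accepted, and reports how many characters consumed beyond that position have to be given back.

```c
#include <stddef.h>

/* Hypothetical token codes for this sketch. */
enum { TOK_NONE = 0, TOK_DEFAULT = 1, TOK_KEYWORD = 2 };

/*
 * Match one token at the start of s. Stores the length of the match in
 * *len and the number of characters consumed beyond the last accepting
 * position (the amount of backing up) in *backed_up.
 */
int match_keyword(const char *s, size_t *len, size_t *backed_up)
{
    static const char word[] = "foobar";
    size_t i = 0;              /* characters consumed so far */
    size_t last_accept;        /* length of the longest match so far */
    int token;

    if (s[0] == '\0') {        /* nothing to scan */
        *len = 0;
        *backed_up = 0;
        return TOK_NONE;
    }

    /* Any single character is accepted by the default rule. */
    last_accept = 1;
    token = TOK_DEFAULT;

    while (word[i] != '\0' && s[i] == word[i]) {
        ++i;
        if (i == 3 || i == 6) {    /* "foo" and "foobar" accept here */
            last_accept = i;
            token = TOK_KEYWORD;
        }
    }
    if (i == 0)
        i = 1;                 /* the first character is always consumed */

    *len = last_accept;
    *backed_up = i - last_accept;
    return token;
}
```

With this model, "fo" followed by anything but 'o' gives back one character and matches 'f' by the default rule, "foob" gives back one character to accept "foo", and "fooba" gives back two.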
3127 The final comment reminds us that there's no point going to
3128 all the trouble of removing backing up from the rules unless
3133 since there's no performance gain doing so with compressed scanners.
3135 The way to remove the backing up is to add "error" rules:
3139 foo return TOK_KEYWORD;
3140 foobar return TOK_KEYWORD;
3145 /* false alarm, not really a keyword */
3151 Eliminating backing up among a list of keywords can also be
3152 done using a "catch-all" rule:
3156 foo return TOK_KEYWORD;
3157 foobar return TOK_KEYWORD;
3159 [a-z]+ return TOK_ID;
3162 This is usually the best solution when appropriate.
3164 Backing up messages tend to cascade.
3165 With a complicated set of rules it's not uncommon to get hundreds
3167 If one can decipher them, though, it often
3168 only takes a dozen or so rules to eliminate the backing up (though
3169 it's easy to make a mistake and have an error rule accidentally match
3173 feature will be to automatically add rules to eliminate backing up).
3175 It's important to keep in mind that you gain the benefits of eliminating
3176 backing up only if you eliminate
3178 instance of backing up.
3179 Leaving just one means you gain nothing.
3182 trailing context (where both the leading and trailing parts do not have
3183 a fixed length) entails almost the same performance loss as
3185 (i.e., substantial).
3186 So when possible a rule like:
3190 mouse|rat/(cat|dog) run();
3197 mouse/cat|dog run();
3205 mouse|rat/cat run();
3206 mouse|rat/dog run();
3209 Note that here the special '|' action does
3211 provide any savings, and can even make things worse (see
3212 Deficiencies / Bugs below).
3214 Another area where the user can increase a scanner's performance
3215 (and one that's easier to implement) arises from the fact that
3216 the longer the tokens matched, the faster the scanner will run.
3217 This is because with long tokens the processing of most input
3218 characters takes place in the (short) inner scanning loop, and
3219 does not often have to go through the additional work of setting up
the scanning environment (e.g.,
.B yytext)
for the action.
3223 Recall the scanner for C comments:
.nf

    %x comment
    %%
            int line_num = 1;

    "/*"         BEGIN(comment);

    <comment>[^*\\n]*
    <comment>"*"+[^*/\\n]*
    <comment>\\n             ++line_num;
    <comment>"*"+"/"        BEGIN(INITIAL);

.fi
3238 This could be sped up by writing it as:
.nf

    %x comment
    %%
            int line_num = 1;

    "/*"         BEGIN(comment);

    <comment>[^*\\n]*
    <comment>[^*\\n]*\\n      ++line_num;
    <comment>"*"+[^*/\\n]*
    <comment>"*"+[^*/\\n]*\\n ++line_num;
    <comment>"*"+"/"        BEGIN(INITIAL);

.fi
3254 Now instead of each newline requiring the processing of another
3255 action, recognizing the newlines is "distributed" over the other rules
3256 to keep the matched text as long as possible.
Note that
.I adding
rules does
.I not
slow down the scanner!  The speed of the scanner is independent
3262 of the number of rules or (modulo the considerations given at the
3263 beginning of this section) how complicated the rules are with
3264 regard to operators such as '*' and '|'.
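.PP
The effect of longer tokens on the number of actions can be estimated
with a small C++ sketch.
It uses std::regex as a stand-in for the generated scanner, which is an
assumption made purely for illustration; count_matches(),
short_token_rules() and long_token_rules() are hypothetical names.
At each position it takes the longest match among the rules, as a flex
scanner would, and counts how many matches are needed to consume the
text that follows the opening "/*".

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Comment-body rules with newlines handled by a rule of their own.
std::vector<std::string> short_token_rules()
{
    return {R"([^*\n]+)", R"(\*+[^*/\n]*)", R"(\n)", R"(\*+/)"};
}

// Comment-body rules with newline recognition folded into the other rules.
std::vector<std::string> long_token_rules()
{
    return {R"([^*\n]+)", R"([^*\n]*\n)", R"(\*+[^*/\n]*)",
            R"(\*+[^*/\n]*\n)", R"(\*+/)"};
}

// Count the longest-match steps needed to consume `text`.
// Returns -1 if no rule matches at some position (a convention of this sketch).
int count_matches(const std::string& text, const std::vector<std::string>& rules)
{
    std::vector<std::regex> compiled;
    for (const std::string& r : rules) compiled.emplace_back(r);

    int matches = 0;
    std::size_t pos = 0;
    while (pos < text.size()) {
        std::size_t best = 0;
        for (const std::regex& re : compiled) {
            std::smatch m;
            // match_continuous anchors the attempt at the current position.
            if (std::regex_search(text.cbegin() + static_cast<std::ptrdiff_t>(pos),
                                  text.cend(), m, re,
                                  std::regex_constants::match_continuous)) {
                best = std::max(best, static_cast<std::size_t>(m.length(0)));
            }
        }
        if (best == 0) return -1;
        pos += best;
        ++matches;
    }
    assert(pos == text.size());
    return matches;
}
```

For the comment body " a\\n b\\n * c\\n */" (written here with the man
page's escaping) the first rule set needs nine matches and the second
needs six, although the second has more rules.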
3266 A final example in speeding up a scanner: suppose you want to scan
3267 through a file containing identifiers and keywords, one per line
and with no other extraneous characters, and recognize all the
keywords.
A natural first approach is:
.nf

    %%
    asm      |
    auto     |
    break    |
    ... etc ...
    volatile |
    while    /* it's a keyword */

    .|\\n     /* it's not a keyword */

.fi
To eliminate the backing up, introduce a catch-all rule:
.nf

    %%
    asm      |
    auto     |
    break    |
    ... etc ...
    volatile |
    while    /* it's a keyword */

    [a-z]+   |
    .|\\n     /* it's not a keyword */

.fi
3299 Now, if it's guaranteed that there's exactly one word per line,
3300 then we can reduce the total number of matches by a half by
merging in the recognition of newlines with that of the other
tokens:
.nf

    %%
    asm\\n    |
    auto\\n   |
    break\\n  |
    ... etc ...
    volatile\\n |
    while\\n  /* it's a keyword */

    [a-z]+\\n |
    .|\\n     /* it's not a keyword */

.fi
One has to be careful here, as we have now reintroduced backing up
into the scanner.
In particular, while
.I we
know that there will never be any characters in the input stream
other than letters or newlines,
.I flex
can't figure this out, and it will plan for possibly needing to back up
when it has scanned a token like "auto" and then the next character
is something other than a newline or a letter.
Previously it would
then just match the "auto" rule and be done, but now it has no "auto"
rule, only an "auto\\n" rule.
3330 To eliminate the possibility of backing up,
3331 we could either duplicate all rules but without final newlines, or,
since we never expect to encounter such an input and therefore don't
care how it's classified, we can introduce one more catch-all rule, this
3334 one which doesn't include a newline:
.nf

    %%
    asm\\n    |
    auto\\n   |
    break\\n  |
    ... etc ...
    volatile\\n |
    while\\n  /* it's a keyword */

    [a-z]+\\n |
    [a-z]+   |
    .|\\n     /* it's not a keyword */

.fi
Compiled with
.B \-Cf,
this is about as fast as one can get a
.I flex
scanner to go for this particular problem.
.PP
A final note:
.I flex
is slow when matching NUL's, particularly when a token contains
multiple NUL's.
It's best to write rules which match
.I short
amounts of text if it's anticipated that the text will often include NUL's.
3364 Another final note regarding performance: as mentioned above in the section
How the Input is Matched, dynamically resizing
.B yytext
to accommodate huge tokens is a slow process because it presently requires that
the (huge) token be rescanned from the beginning.
Thus if performance is
vital, you should attempt to match "large" quantities of text but not
"huge" quantities, where the cutoff between the two is at about 8K
characters/token.
3373 .SH GENERATING C++ SCANNERS
.I flex
provides two different ways to generate scanners for use with C++.
The first way is to simply compile a scanner generated by
.I flex
using a C++ compiler instead of a C compiler.
You should not encounter
any compilation errors (please report any you find to the email address
given in the Author section below).
You can then use C++ code in your rule actions instead of C code.
Note that the default input source for your scanner remains
.I yyin,
and default echoing is still done to
.I yyout.
Both of these remain
.I FILE *
variables and not C++
.I streams.
.PP
You can also use
.I flex
to generate a C++ scanner class, using the
.B \-+
option (or, equivalently,
.B %option c++),
which is automatically specified if the name of the flex
executable ends in a '+', such as
.I flex++.
When using this option, flex defaults to generating the scanner to the file
.B lex.yy.cc
instead of
.B lex.yy.c.
The generated scanner includes the header file
.I FlexLexer.h,
which defines the interface to two C++ classes.
.PP
The first class,
.B FlexLexer,
provides an abstract base class defining the general scanner class
interface.
It provides the following member functions:
.TP
.B const char* YYText()
returns the text of the most recently matched token, the equivalent of
.B yytext.
.TP
.B int YYLeng()
returns the length of the most recently matched token, the equivalent of
.B yyleng.
.TP
.B int lineno() const
returns the current input line number
(see
.B %option yylineno),
or
.B 1
if
.B %option yylineno
was not used.
.TP
.B void set_debug( int flag )
sets the debugging flag for the scanner, equivalent to assigning to
.B yy_flex_debug
(see the Options section above).
Note that you must build the scanner using
.B %option debug
to include debugging information in it.
.TP
.B int debug() const
returns the current setting of the debugging flag.
.PP
Also provided are member functions equivalent to
.B yy_switch_to_buffer(),
.B yy_create_buffer()
(though the first argument is an
.B std::istream*
object pointer and not a
.B FILE*),
.B yy_flush_buffer(),
.B yy_delete_buffer(),
and
.B yyrestart()
(again, the first argument is a
.B std::istream*
object pointer).
.PP
The second class defined in
.I FlexLexer.h
is
.B yyFlexLexer,
which is derived from
.B FlexLexer.
It defines the following additional member functions:
.TP
.B
yyFlexLexer( std::istream* arg_yyin = 0, std::ostream* arg_yyout = 0 )
constructs a
.B yyFlexLexer
object using the given streams for input and output.
If not specified, the streams default to
.B cin
and
.B cout,
respectively.
.TP
.B virtual int yylex()
performs the same role as
.B yylex()
does for ordinary flex scanners: it scans the input stream, consuming
tokens, until a rule's action returns a value.
If you derive a subclass
.B S
from
.B yyFlexLexer
and want to access the member functions and variables of
.B S
inside
.B yylex(),
then you need to use
.B %option yyclass="S"
to inform
.I flex
that you will be using that subclass instead of
.B yyFlexLexer.
In this case, rather than generating
.B yyFlexLexer::yylex(),
.I flex
generates
.B S::yylex()
(and also generates a dummy
.B yyFlexLexer::yylex()
that calls
.B yyFlexLexer::LexerError()
if called).
.TP
.B
virtual void switch_streams(std::istream* new_in = 0,
.B
std::ostream* new_out = 0)
reassigns
.B yyin
to
.B new_in
(if non-nil)
and
.B yyout
to
.B new_out
(ditto), deleting the previous input buffer if
.B yyin
is reassigned.
.TP
.B
int yylex( std::istream* new_in, std::ostream* new_out = 0 )
first switches the input streams via
.B switch_streams( new_in, new_out )
and then returns the value of
.B yylex().
.PP
In addition,
.B yyFlexLexer
defines the following protected virtual functions which you can redefine
in derived classes to tailor the scanner:
.TP
.B
virtual int LexerInput( char* buf, int max_size )
reads up to
.B max_size
characters into
.B buf
and returns the number of characters read.
To indicate end-of-input, return 0 characters.
Note that "interactive" scanners (see the
.B \-B
and
.B \-I
flags) define the macro
.B YY_INTERACTIVE.
If you redefine
.B LexerInput()
and need to take different actions depending on whether or not
the scanner might be scanning an interactive input source, you can
test for the presence of this name via
.B #ifdef.
.TP
.B
virtual void LexerOutput( const char* buf, int size )
writes out
.B size
characters from the buffer
.B buf,
which, while NUL-terminated, may also contain "internal" NUL's if
the scanner's rules can match text with NUL's in them.
.TP
.B
virtual void LexerError( const char* msg )
reports a fatal error message.
The default version of this function writes the message to the stream
.B cerr
and exits.
.PP
Note that a
.B yyFlexLexer
object contains its
.I entire
scanning state.
Thus you can use such objects to create reentrant scanners.
You can instantiate multiple instances of the same
.B yyFlexLexer
class, and you can also combine multiple C++ scanner classes together
in the same program using the
.B \-P
option discussed above.
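.PP
The contracts of LexerInput() and LexerOutput() described above can be
sketched without the flex headers.
The free functions below are hypothetical stand-ins, not members of
yyFlexLexer; a real override would be a member of a class derived from
it and would use the scanner's own streams.
The sketch assumes a plain, non-interactive std::istream.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Stand-in for LexerInput(): store at most max_size characters from `in`
// in buf and return how many were stored; 0 signals end-of-input.
int lexer_input(std::istream& in, char* buf, int max_size)
{
    assert(buf != nullptr);
    if (max_size <= 0 || !in)
        return 0;
    in.read(buf, max_size);
    return static_cast<int>(in.gcount());
}

// Stand-in for LexerOutput(): write exactly `size` characters, relying on
// the length rather than strlen() so that internal NULs are preserved.
void lexer_output(std::ostream& out, const char* buf, int size)
{
    assert(buf != nullptr && size >= 0);
    out.write(buf, size);
}

// Usage sketch: pass a stream through both helpers in chunks of `chunk` bytes.
std::string copy_through(std::istream& in, int chunk)
{
    std::ostringstream out;
    std::vector<char> buf(chunk > 0 ? static_cast<std::size_t>(chunk) : 1);
    int n;
    while ((n = lexer_input(in, buf.data(), chunk)) > 0)
        lexer_output(out, buf.data(), n);
    return out.str();
}
```

Because the output helper is driven by the character count, text that
contains a NUL in the middle survives the round trip unchanged.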
.PP
Finally, note that the
.B %array
feature is not available to C++ scanner classes; you must use
.B %pointer
(the default).
3593 Here is an example of a simple C++ scanner:
.nf

        // An example of using the flex C++ scanner class.

    %{
    int mylineno = 0;
    %}

    string  \\"[^\\n"]+\\"

    ws      [ \\t]+

    alpha   [A-Za-z]
    dig     [0-9]
    name    ({alpha}|{dig}|\\$)({alpha}|{dig}|[_.\\-/$])*
    num1    [-+]?{dig}+\\.?([eE][-+]?{dig}+)?
    num2    [-+]?{dig}*\\.{dig}+([eE][-+]?{dig}+)?
    number  {num1}|{num2}

    %%

    {ws}    /* skip blanks and tabs */

    "/*"    {
            int c;

            while((c = yyinput()) != 0)
                {
                if(c == '\\n')
                    ++mylineno;

                else if(c == '*')
                    {
                    if((c = yyinput()) == '/')
                        break;
                    else
                        unput(c);
                    }
                }
            }

    {number}  cout << "number " << YYText() << '\\n';

    \\n        mylineno++;

    {name}    cout << "name " << YYText() << '\\n';

    {string}  cout << "string " << YYText() << '\\n';

    %%

    int main( int /* argc */, char** /* argv */ )
        {
        FlexLexer* lexer = new yyFlexLexer;
        while(lexer->yylex() != 0)
            ;
        return 0;
        }
.fi
.PP
If you want to create multiple (different) lexer classes, you use the
.B \-P
flag (or the
.B prefix=
option) to rename each
.B yyFlexLexer
to some other
.B xxFlexLexer.
You then can include
.B <FlexLexer.h>
in your other sources once per lexer class, first renaming
.B yyFlexLexer
as follows:
.nf

    #undef yyFlexLexer
    #define yyFlexLexer xxFlexLexer
    #include <FlexLexer.h>

    #undef yyFlexLexer
    #define yyFlexLexer zzFlexLexer
    #include <FlexLexer.h>

.fi
if, for example, you used
.B %option prefix="xx"
for one of your scanners and
.B %option prefix="zz"
for the other.
.PP
IMPORTANT: the present form of the scanning class is
.I experimental
and may change considerably between major releases.
3686 .SH INCOMPATIBILITIES WITH LEX AND POSIX
.I flex
is a rewrite of the AT&T Unix
.I lex
tool (the two implementations do not share any code, though),
with some extensions and incompatibilities, both of which
are of concern to those who wish to write scanners acceptable
to either implementation.
Flex is fully compliant with the POSIX
.I lex
specification, except that when using
.B %pointer
(the default), a call to
.B unput()
destroys the contents of
.B yytext,
which is counter to the POSIX specification.
.PP
In this section we discuss all of the known areas of incompatibility
between flex, AT&T lex, and the POSIX specification.
.PP
.I flex's
.B \-l
option turns on maximum compatibility with the original AT&T
.I lex
implementation, at the cost of a major loss in the generated scanner's
performance.
We note below which incompatibilities can be overcome
using the
.B \-l
option.
.PP
.I flex
is fully compatible with
.I lex
with the following exceptions:
.IP -
The undocumented
.I lex
scanner internal variable
.B yylineno
is not supported unless
.B \-l
or
.B %option yylineno
is used.
.IP
.B yylineno
should be maintained on a per-buffer basis, rather than a per-scanner
(single global variable) basis.
.IP
.B yylineno
is not part of the POSIX specification.
.IP -
The
.B input()
routine is not redefinable, though it may be called to read characters
following whatever has been matched by a rule.
If
.B input()
encounters an end-of-file the normal
.B yywrap()
processing is done.
A ``real'' end-of-file is returned by
.B input()
as
.I EOF.
.IP
Input is instead controlled by defining the
.B YY_INPUT
macro.
.IP
The
.I flex
restriction that
.B input()
cannot be redefined is in accordance with the POSIX specification,
which simply does not specify any way of controlling the
scanner's input other than by making an initial assignment to
.I yyin.
.IP -
The
.B unput()
routine is not redefinable.
This restriction is in accordance with POSIX.
.IP -
.I flex
scanners are not as reentrant as
.I lex
scanners.
In particular, if you have an interactive scanner and
an interrupt handler which long-jumps out of the scanner, and
the scanner is subsequently called again, you may get the following
message:
.nf

    fatal flex scanner internal error--end of buffer missed

.fi
To reenter the scanner, first use
.nf

    yyrestart( yyin );

.fi
Note that this call will throw away any buffered input; usually this
isn't a problem with an interactive scanner.
.IP
Also note that flex C++ scanner classes
.I are
reentrant, so if using C++ is an option for you, you should use
them instead.
See "Generating C++ Scanners" above for details.
.IP -
.B output()
is not supported.
Output from the
.B ECHO
macro is done to the file-pointer
.I yyout
(default
.I stdout).
.IP
.B output()
is not part of the POSIX specification.
.IP -
.I lex
does not support exclusive start conditions (%x), though they
are in the POSIX specification.
.IP -
When definitions are expanded,
.I flex
encloses them in parentheses.
With lex, the following:
.nf

    NAME    [A-Z][A-Z0-9]*
    %%
    foo{NAME}?      printf( "Found it\\n" );
    %%

.fi
will not match the string "foo" because when the macro
is expanded the rule is equivalent to "foo[A-Z][A-Z0-9]*?"
and the precedence is such that the '?' is associated with
"[A-Z0-9]*".
With
.I flex,
the rule will be expanded to
"foo([A-Z][A-Z0-9]*)?" and so the string "foo" will match.
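.PP
The difference between the two expansions can be checked with a short
C++ sketch that uses std::regex in place of a generated scanner.
This is an illustration only: expand_rule() and rule_matches() are
hypothetical names, and std::regex reads "*?" as a non-greedy star
rather than an optional star, which for whole-string matching accepts
the same strings as the lex reading.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Build the rule "foo{NAME}?" with NAME expanded either textually (the lex
// behaviour described above) or wrapped in parentheses (flex and POSIX).
std::regex expand_rule(const std::string& name_definition, bool parenthesize)
{
    assert(!name_definition.empty());
    const std::string expansion =
        parenthesize ? "(" + name_definition + ")" : name_definition;
    return std::regex("foo" + expansion + "?");
}

// True if the whole of `text` is matched by the expanded rule.
bool rule_matches(const std::string& text, bool parenthesize)
{
    return std::regex_match(text, expand_rule("[A-Z][A-Z0-9]*", parenthesize));
}
```

Only the parenthesized expansion accepts the bare string "foo"; both
expansions accept "fooBAR".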
.IP
Note that if the definition begins with
.B ^
or ends with
.B $
then it is
.I not
expanded with parentheses, to allow these operators to appear in
definitions without losing their special meanings.
But the
.B <s>, /,
and
.B <<EOF>>
operators cannot be used in a
.I flex
definition.
.IP
Using
.B \-l
results in the
.I lex
behavior of no parentheses around the definition.
.IP
The POSIX specification is that the definition be enclosed in parentheses.
.IP -
Some implementations of
.I lex
allow a rule's action to begin on a separate line, if the rule's pattern
has trailing whitespace:
.nf

    %%
    foo|bar<space here>
      { foobar_action(); }

.fi
.I flex
does not support this feature.
.IP -
The
.I lex
.B %r
(generate a Ratfor scanner) option is not supported.
It is not part
of the POSIX specification.
.IP -
After a call to
.B unput(),
.I yytext
is undefined until the next token is matched, unless the scanner
was built using
.B %array.
This is not the case with
.I lex
or the POSIX specification.
The
.B \-l
option does away with this incompatibility.
.IP -
The precedence of the
.B {}
(numeric range) operator is different.
.I lex
interprets "abc{1,3}" as "match one, two, or
three occurrences of 'abc'", whereas
.I flex
interprets it as "match 'ab'
followed by one, two, or three occurrences of 'c'".
The latter is in agreement with the POSIX specification.
.IP -
The precedence of the
.B ^
operator is different.
.I lex
interprets "^foo|bar" as "match either 'foo' at the beginning of a line,
or 'bar' anywhere", whereas
.I flex
interprets it as "match either 'foo' or 'bar' if they come at the beginning
of a line".
The latter is in agreement with the POSIX specification.
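.IP
The two precedence differences above (numeric range and '^') can be
spelled out with explicitly grouped patterns.
The C++ sketch below uses std::regex purely as an illustration; the
pattern constants and helper names are hypothetical, and each constant
writes one reading out in full so that no precedence rule is involved.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Each reading of "abc{1,3}" and "^foo|bar", written with explicit grouping.
const char* const range_as_one_char = "ab(c){1,3}";    // 'ab' then 1-3 'c'
const char* const range_as_whole    = "(abc){1,3}";    // 1-3 copies of 'abc'
const char* const anchor_both       = "^(foo|bar)";    // either word, at line start
const char* const anchor_first_only = "(^foo)|(bar)";  // 'foo' at start, 'bar' anywhere

// True if `pattern` matches the whole of `text`.
bool matches_whole(const std::string& text, const std::string& pattern)
{
    return std::regex_match(text, std::regex(pattern));
}

// True if `pattern` matches somewhere in the single-line `text`.
bool found_in(const std::string& text, const std::string& pattern)
{
    return std::regex_search(text, std::regex(pattern));
}

// Usage sketch: the inputs on which the two readings disagree.
void check_precedence_examples()
{
    assert(matches_whole("abccc", range_as_one_char));
    assert(!matches_whole("abccc", range_as_whole));
    assert(matches_whole("abcabc", range_as_whole));
    assert(!matches_whole("abcabc", range_as_one_char));
    assert(found_in("x bar", anchor_first_only));
    assert(!found_in("x bar", anchor_both));
}
```

A scanner specification that must behave the same under both tools can
use the explicitly grouped form of whichever reading is intended.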
.IP -
The special table-size declarations such as
.B %a
supported by
.I lex
are not required by
.I flex
scanners;
.I flex
ignores them.
.IP -
The name
.B FLEX_SCANNER
is #define'd so scanners may be written for use with either
.I flex
or
.I lex.
Scanners also include
.B YY_FLEX_MAJOR_VERSION
and
.B YY_FLEX_MINOR_VERSION
indicating which version of
.I flex
generated the scanner
(for example, for the 2.5 release, these defines would be 2 and 5
respectively).
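.IP
As a sketch of how these names might be used, the C++ function below
selects a description at compile time.
scanner_generator() is a hypothetical helper; in a real scanner the
macros are defined by the generated file, so code like this would sit in
the user code section of the specification.

```cpp
#include <cassert>
#include <string>

// Describe which tool generated the enclosing scanner, decided at compile time.
std::string scanner_generator()
{
#if defined(FLEX_SCANNER) && defined(YY_FLEX_MAJOR_VERSION) && defined(YY_FLEX_MINOR_VERSION)
    return "flex " + std::to_string(YY_FLEX_MAJOR_VERSION) + "." +
           std::to_string(YY_FLEX_MINOR_VERSION);
#elif defined(FLEX_SCANNER)
    return "flex (version macros not defined)";
#else
    return "lex or another generator";
#endif
}

// Usage sketch: compiled on its own, outside any generated scanner,
// none of the macros exist and the last branch is taken.
void check_standalone_build()
{
#ifndef FLEX_SCANNER
    assert(scanner_generator() == "lex or another generator");
#endif
}
```

Testing FLEX_SCANNER first keeps the code valid for scanners generated
by tools that do not define the version macros.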
.PP
The following
.I flex
features are not included in
.I lex
or the POSIX specification:
.nf

    C++ scanners
    %option
    start condition scopes
    start condition stacks
    interactive/non-interactive scanners
    yy_scan_string() and friends
    yyterminate()
    yy_set_interactive()
    yy_set_bol()
    YY_AT_BOL()
    <<EOF>>
    <*>
    YY_DECL
    YY_START
    YY_USER_ACTION
    YY_USER_INIT
    #line directives
    %{}'s around actions
    multiple actions on a line

.fi
plus almost all of the flex flags.
The last feature in the list refers to the fact that with
.I flex
you can put multiple actions on the same line, separated with
semi-colons, while with
.I lex,
the following
.nf

    foo    handle_foo(); ++num_foos_seen;

.fi
is (rather surprisingly) truncated to
.nf

    foo    handle_foo();

.fi
.I flex
does not truncate the action.
Actions that are not enclosed in
braces are simply terminated at the end of the line.
.SH DIAGNOSTICS
.I warning, rule cannot be matched
indicates that the given rule
cannot be matched because it follows other rules that will
always match the same text as it.
For example, in the following "foo" cannot be matched because it comes after
an identifier "catch-all" rule:
.nf

    [a-z]+    got_identifier();
    foo       got_foo();

.fi
Using
.B REJECT
in a scanner suppresses this warning.
.PP
.I warning,
.B \-s
.I
option given but default rule can be matched
means that it is possible (perhaps only in a particular start condition)
that the default rule (match any single character) is the only one
that will match a particular input.
Since
.B \-s
was given, presumably this is not intended.
.PP
.I reject_used_but_not_detected undefined
or
.I yymore_used_but_not_detected undefined -
These errors can occur at compile time.
They indicate that the scanner uses
.B REJECT
or
.B yymore()
but that
.I flex
failed to notice the fact, meaning that
.I flex
scanned the first two sections looking for occurrences of these actions
and failed to find any, but somehow you snuck some in (via a #include
file, for example).
Use
.B %option reject
or
.B %option yymore
to indicate to flex that you really do use these features.
.PP
.I flex scanner jammed -
a scanner compiled with
.B \-s
has encountered an input string which wasn't matched by
any of its rules.
This error can also occur due to internal problems.
.PP
.I token too large, exceeds YYLMAX -
your scanner uses
.B %array
and one of its rules matched a string longer than the
.B YYLMAX
constant (8K bytes by default).
You can increase the value by
#define'ing
.B YYLMAX
in the definitions section of your
.I flex
input.
.PP
.I scanner requires \-8 flag to
.I use the character 'x' -
Your scanner specification includes recognizing the 8-bit character
.I 'x'
and you did not specify the \-8 flag, and your scanner defaulted to 7-bit
because you used the
.B \-Cf
or
.B \-CF
table compression options.
See the discussion of the
.B \-7
flag for details.
.PP
.I flex scanner push-back overflow -
you used
.B unput()
to push back so much text that the scanner's buffer could not hold
both the pushed-back text and the current token in
.B yytext.
Ideally the scanner should dynamically resize the buffer in this case, but at
present it does not.
.PP
.I
input buffer overflow, can't enlarge buffer because scanner uses REJECT -
the scanner was working on matching an extremely large token and needed
to expand the input buffer.
This doesn't work with scanners that use
.B REJECT.
.PP
.I
fatal flex scanner internal error--end of buffer missed -
This can occur in a scanner which is reentered after a long-jump
has jumped out (or over) the scanner's activation frame.
Before reentering the scanner, use:
.nf

    yyrestart( yyin );

.fi
or, as noted above, switch to using the C++ scanner class.
4105 .I too many start conditions in <> construct! -
4106 you listed more start conditions in a <> construct than exist (so
4107 you must have listed at least one of them twice).
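.SH FILES
.TP
.B \-lfl
library with which scanners must be linked.
.TP
.I lex.yy.c
generated scanner (called
.I lexyy.c
on some systems).
.TP
.I lex.yy.cc
generated C++ scanner class, when using
.B -+.
.TP
.I <FlexLexer.h>
header file defining the C++ scanner base class,
.B FlexLexer,
and its derived class,
.B yyFlexLexer.
.TP
.I flex.skl
skeleton scanner.
This file is only used when building flex, not when flex executes.
.TP
.I lex.backup
backing-up information for
.B \-b
flag (called
.I lex.bck
on some systems).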
4138 .SH DEFICIENCIES / BUGS
4139 Some trailing context
4140 patterns cannot be properly matched and generate
4141 warning messages ("dangerous trailing context").
4142 These are patterns where the ending of the
4143 first part of the rule matches the beginning of the second
4144 part, such as "zx*/xy*", where the 'x*' matches the 'x' at
4145 the beginning of the trailing context.
4146 (Note that the POSIX draft
4147 states that the text matched by such patterns is undefined.)
4149 For some trailing context rules, parts which are actually fixed-length are
4150 not recognized as such, leading to the above mentioned performance loss.
4151 In particular, parts using '|' or {n} (such as "foo{3}") are always
4152 considered variable-length.
.PP
Combining trailing context with the special '|' action can result in
.I fixed
trailing context being turned into the more expensive
.I variable
trailing context.
For example, in the following:
.nf

    %%
    abc      |
    xyz/def

.fi
.PP
Use of
.B unput()
invalidates yytext and yyleng, unless the
.B %array
directive
or the
.B \-l
option has been used.
.PP
Pattern-matching of NUL's is substantially slower than matching other
characters.
.PP
Dynamic resizing of the input buffer is slow, as it entails rescanning
all the text matched so far by the current (generally huge) token.
.PP
Due to both buffering of input and read-ahead, you cannot intermix
calls to <stdio.h> routines, such as, for example,
.B getchar(),
with
.I flex
rules and expect it to work.
Call
.B input()
instead.
.PP
The total table entries listed by the
.B \-v
flag exclude the number of table entries needed to determine
what rule has been matched.
The number of entries is equal
to the number of DFA states if the scanner does not use
.B REJECT,
and somewhat greater than the number of states if it does.
.PP
.B REJECT
cannot be used with the
.B \-f
or
.B \-F
options.
.PP
The
.I flex
internal algorithms need documentation.
.SH SEE ALSO
lex(1), yacc(1), sed(1), awk(1).
.PP
John Levine, Tony Mason, and Doug Brown,
.I Lex & Yacc,
O'Reilly and Associates.
Be sure to get the 2nd edition.
.PP
M. E. Lesk and E. Schmidt,
.I LEX \- Lexical Analyzer Generator
.PP
Alfred Aho, Ravi Sethi and Jeffrey Ullman,
.I Compilers: Principles, Techniques and Tools,
Addison-Wesley (1986).
Describes the pattern-matching techniques used by
.I flex
(deterministic finite automata).
.SH AUTHOR
Vern Paxson, with the help of many ideas and much inspiration from
Van Jacobson.
Original version by Jef Poskanzer.
The fast table
representation is a partial implementation of a design done by Van
Jacobson.
The implementation was done by Kevin Gong and Vern Paxson.
.PP
Thanks to the many
.I flex
beta-testers, feedbackers, and contributors, especially Francois Pinard,
Casey Leedom,
Robert Abramovitz,
4243 Stan Adermann, Terry Allen, David Barker-Plummer, John Basrai,
4244 Neal Becker, Nelson H.F. Beebe, benson@odi.com,
4245 Karl Berry, Peter A. Bigot, Simon Blanchard,
4246 Keith Bostic, Frederic Brehm, Ian Brockbank, Kin Cho, Nick Christopher,
4247 Brian Clapper, J.T. Conklin,
4248 Jason Coughlin, Bill Cox, Nick Cropper, Dave Curtis, Scott David
4249 Daniels, Chris G. Demetriou, Theo de Raadt,
4250 Mike Donahue, Chuck Doucette, Tom Epperly, Leo Eskin,
4251 Chris Faylor, Chris Flatters, Jon Forrest, Jeffrey Friedl,
4252 Joe Gayda, Kaveh R. Ghazi, Wolfgang Glunz,
4253 Eric Goldman, Christopher M. Gould, Ulrich Grepel, Peer Griebel,
4254 Jan Hajic, Charles Hemphill, NORO Hideo,
4255 Jarkko Hietaniemi, Scott Hofmann,
4256 Jeff Honig, Dana Hudes, Eric Hughes, John Interrante,
4257 Ceriel Jacobs, Michal Jaegermann, Sakari Jalovaara, Jeffrey R. Jones,
4258 Henry Juengst, Klaus Kaempf, Jonathan I. Kamens, Terrence O Kane,
4259 Amir Katz, ken@ken.hilco.com, Kevin B. Kenny,
4260 Steve Kirsch, Winfried Koenig, Marq Kole, Ronald Lamprecht,
4261 Greg Lee, Rohan Lenard, Craig Leres, John Levine, Steve Liddle,
4262 David Loffredo, Mike Long,
4263 Mohamed el Lozy, Brian Madsen, Malte, Joe Marshall,
4264 Bengt Martensson, Chris Metcalf,
4265 Luke Mewburn, Jim Meyering, R. Alexander Milowski, Erik Naggum,
4266 G.T. Nicol, Landon Noll, James Nordby, Marc Nozell,
4267 Richard Ohnemus, Karsten Pahnke,
4268 Sven Panne, Roland Pesch, Walter Pelissero, Gaumond
4269 Pierre, Esmond Pitt, Jef Poskanzer, Joe Rahmeh, Jarmo Raiha,
4270 Frederic Raimbault, Pat Rankin, Rick Richardson,
4271 Kevin Rodgers, Kai Uwe Rommel, Jim Roskind, Alberto Santini,
4272 Andreas Scherer, Darrell Schiebel, Raf Schietekat,
4273 Doug Schmidt, Philippe Schnoebelen, Andreas Schwab,
Larry Schwimmer, Alex Siegel, Eckehard Stolz, Jan-Erik Strömquist,
4275 Mike Stump, Paul Stuart, Dave Tallman, Ian Lance Taylor,
4276 Chris Thewalt, Richard M. Timoney, Jodi Tsai,
4277 Paul Tuinenga, Gary Weik, Frank Whaley, Gerhard Wilhelms, Kent Williams, Ken
4278 Yap, Ron Zellar, Nathan Zelle, David Zuhn,
4279 and those whose names have slipped my marginal
mail-archiving skills but whose contributions are appreciated all the
same.
4283 Thanks to Keith Bostic, Jon Forrest, Noah Friedman,
4284 John Gilmore, Craig Leres, John Levine, Bob Mulcahy, G.T.
4285 Nicol, Francois Pinard, Rich Salz, and Richard Stallman for help with various
4286 distribution headaches.
4288 Thanks to Esmond Pitt and Earle Horton for 8-bit character support; to
4289 Benson Margulies and Fred Burke for C++ support; to Kent Williams and Tom
4290 Epperly for C++ class support; to Ove Ewerlid for support of NUL's; and to
4291 Eric Hughes for support of multiple buffers.
4293 This work was primarily done when I was with the Real Time Systems Group
4294 at the Lawrence Berkeley Laboratory in Berkeley, CA.
4295 Many thanks to all there for the support I received.
4297 Send comments to vern@ee.lbl.gov.