.\" Copyright (c) 1991, 1993
.\"	The Regents of the University of California.  All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. Neither the name of the University nor the names of its contributors
.\"    may be used to endorse or promote products derived from this software
.\"    without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\"     @(#)regexp.3	8.1 (Berkeley) 6/4/93
.\" $FreeBSD: src/lib/libcompat/regexp/regexp.3,v 1.6.2.2 2001/12/17 10:08:29 ru Exp $
.\" $DragonFly: src/lib/libcompat/regexp/regexp.3,v 1.3 2006/04/08 08:17:06 swildner Exp $
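.Dd June 4, 1993
.Dt REGEXP 3
.Os
.Sh NAME
.Nm regcomp ,
.Nm regexec ,
.Nm regsub ,
.Nm regerror
.Nd regular expression handlers
.Sh LIBRARY
.Lb libcompat
.Sh SYNOPSIS
.In regexp.h
.Ft regexp *
.Fn regcomp "const char *exp"
.Ft int
.Fn regexec "const regexp *prog" "const char *string"
.Ft void
.Fn regsub "const regexp *prog" "const char *source" "char *dest"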
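.Sh DESCRIPTION
.Bf -symbolic
This interface is made obsolete by
.Xr regex 3 .
.Ef
.Pp
The
.Fn regcomp ,
.Fn regexec ,
.Fn regsub ,
and
.Fn regerror
functions implement
.Xr egrep 1 Ns -style
regular expressions and supporting facilities.
.Pp
The
.Fn regcomp
function
compiles a regular expression into a structure of type
.Vt regexp ,
and returns a pointer to it.
The space is allocated using
.Xr malloc 3
and may be released by
.Xr free 3 .
.Pp
The
.Fn regexec
function
matches a
.Dv NUL Ns -terminated
.Fa string
against the compiled regular expression
in
.Fa prog .
It returns 1 for success and 0 for failure, and adjusts the contents of
.Fa prog Ns 's
.Em startp
and
.Em endp
(see below) accordingly.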
.Pp
The members of a
.Vt regexp
structure include at least the following (not necessarily in order):
.Bd -literal -offset indent
char *startp[NSUBEXP];
char *endp[NSUBEXP];
.Ed
.Pp
where
.Dv NSUBEXP
is defined (as 10) in the header file.
Once a successful
.Fn regexec
has been done using the
.Vt regexp ,
each
.Em startp Ns - Em endp
pair describes one substring
within the
.Fa string ,
with the
.Em startp
pointing to the first character of the substring and
the
.Em endp
pointing to the first character following the substring.
The 0th substring is the substring of
.Fa string
that matched the whole
regular expression.
The others are those substrings that matched parenthesized expressions
within the regular expression, with parenthesized expressions numbered
in left-to-right order of their opening parentheses.
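.Pp
The following C fragment is a minimal sketch, not part of the library, of how a
caller might copy out substring
.Em n
from such a pair.
The
.Vt struct submatches
type and the
.Fn copy_submatch
helper are hypothetical stand-ins used for illustration only, and the sketch
assumes that unused entries are
.Dv NULL .

```c
#include <stddef.h>
#include <string.h>

#define NSUBEXP 10

/* Hypothetical stand-in for the two regexp members documented above. */
struct submatches {
	const char *startp[NSUBEXP];
	const char *endp[NSUBEXP];
};

/*
 * Copy substring n into buf, truncating to size - 1 bytes and always
 * NUL-terminating when size > 0.  Returns the full length of the
 * substring, or -1 if n is out of range or the entry is unset
 * (assumed here to mean a NULL pointer).
 */
long
copy_submatch(const struct submatches *m, int n, char *buf, size_t size)
{
	size_t len, ncopy;

	if (n < 0 || n >= NSUBEXP ||
	    m->startp[n] == NULL || m->endp[n] == NULL)
		return -1;

	/* endp points one past the last matched character. */
	len = (size_t)(m->endp[n] - m->startp[n]);
	if (size > 0) {
		ncopy = len < size - 1 ? len : size - 1;
		memcpy(buf, m->startp[n], ncopy);
		buf[ncopy] = '\0';
	}
	return (long)len;
}
```

With entry 0 delimiting `abbbb' inside `xabbbby', a call for substring 0
stores `abbbb' in the buffer and returns 5; a call for an unset entry
returns -1.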
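.Pp
The
.Fn regsub
function
copies
.Fa source
to
.Fa dest ,
making substitutions according to the
most recent
.Fn regexec
performed using
.Fa prog .
Each instance of `&' in
.Fa source
is replaced by the substring
indicated by
.Em startp Ns Bq 0
and
.Em endp Ns Bq 0 .
Each instance of
.Sq \e Ns Em n ,
where
.Em n
is a digit, is replaced by
the substring indicated by
.Em startp Ns Bq Em n
and
.Em endp Ns Bq Em n .
To get a literal `&' or
.Sq \e Ns Em n
into
.Fa dest ,
prefix it with `\e';
to get a literal `\e' preceding `&' or
.Sq \e Ns Em n ,
prefix it with
another `\e'.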
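.Pp
The substitution rules for
.Fn regsub
can be sketched in C as shown here.
This is an illustration under stated assumptions, not the library source:
.Vt struct submatches
and
.Fn sketch_regsub
are hypothetical names, unset entries are assumed to be
.Dv NULL ,
and, like the interface described here, the sketch does not check the size of
the destination buffer.

```c
#include <stddef.h>
#include <string.h>

#define NSUBEXP 10

/* Hypothetical stand-in for the startp/endp members of a regexp. */
struct submatches {
	const char *startp[NSUBEXP];
	const char *endp[NSUBEXP];
};

/* Copy source to dest, expanding `&' and `\n' from the recorded pairs. */
void
sketch_regsub(const struct submatches *m, const char *source, char *dest)
{
	const char *src = source;
	char *dst = dest;
	char c;

	while ((c = *src++) != '\0') {
		int no = -1;		/* -1 means an ordinary character */

		if (c == '&')
			no = 0;		/* whole match */
		else if (c == '\\' && *src >= '0' && *src <= '9')
			no = *src++ - '0';	/* numbered substring */

		if (no < 0) {
			/* A backslash before `\' or `&' yields that character. */
			if (c == '\\' && (*src == '\\' || *src == '&'))
				c = *src++;
			*dst++ = c;
		} else if (m->startp[no] != NULL && m->endp[no] != NULL) {
			size_t len = (size_t)(m->endp[no] - m->startp[no]);

			memcpy(dst, m->startp[no], len);
			dst += len;
		}
	}
	*dst = '\0';
}
```

For a match of `key=value' whose substrings 1 and 2 are `key' and `value',
the source text `\e2=\e1' expands to `value=key', and `\e&' expands to a
literal `&'.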
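.Pp
The
.Fn regerror
function
is called whenever an error is detected in
.Fn regcomp ,
.Fn regexec ,
or
.Fn regsub .
The default
.Fn regerror
writes the string
.Fa msg ,
with a suitable indicator of origin,
on the standard
error output
and invokes
.Xr exit 3 .
The
.Fn regerror
function
can be replaced by the user if other actions are desirable.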
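.Pp
As one possible replacement, the sketch below records the message instead of
terminating the program.
It assumes the prototype
.Ft void
.Fn regerror "const char *msg" ,
which is not listed in the SYNOPSIS; the buffer and the
.Fn last_error
accessor are hypothetical names introduced for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical storage for the most recent message; not part of the library. */
static char last_message[128];

/*
 * User-supplied replacement (prototype assumed): keep a truncated copy
 * of the message and return, so that the caller can inspect it after
 * the failing call instead of having the process exit.
 */
void
regerror(const char *msg)
{
	size_t n = strlen(msg);

	if (n >= sizeof last_message)
		n = sizeof last_message - 1;
	memcpy(last_message, msg, n);
	last_message[n] = '\0';
}

/* Hypothetical accessor for the recorded message. */
const char *
last_error(void)
{
	return last_message;
}
```

A caller using this approach would check the return value of the failing
function first and only then consult the recorded message.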
.Sh REGULAR EXPRESSION SYNTAX
A regular expression is zero or more
.Em branches ,
separated by `|'.
It matches anything that matches one of the branches.
.Pp
A branch is zero or more
.Em pieces ,
concatenated.
It matches a match for the first, followed by a match for the second, etc.
.Pp
A piece is an
.Em atom
possibly followed by `*', `+', or `?'.
An atom followed by `*' matches a sequence of 0 or more matches of the atom.
An atom followed by `+' matches a sequence of 1 or more matches of the atom.
An atom followed by `?' matches a match of the atom, or the null string.
.Pp
An atom is a regular expression in parentheses (matching a match for the
regular expression), a
.Em range
(see below), `.'
(matching any single character), `^' (matching the null string at the
beginning of the input string), `$' (matching the null string at the
end of the input string), a `\e' followed by a single character (matching
that character), or a single character with no other significance
(matching that character).
.Pp
A
.Em range
is a sequence of characters enclosed in `[]'.
It normally matches any single character from the sequence.
If the sequence begins with `^',
it matches any single character
.Em not
from the rest of the sequence.
If two characters in the sequence are separated by `\-', this is shorthand
for the full list of
.Tn ASCII
characters between them
(e.g. `[0-9]' matches any decimal digit).
To include a literal `]' in the sequence, make it the first character
(following a possible `^').
To include a literal `\-', make it the first or last character.
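.Pp
The range rules can be sketched in C as shown here.
The
.Fn range_matches
helper is a hypothetical illustration, not the library's implementation: it
takes a pointer to the opening `[' and tests a single character, and it does
not diagnose a reversed range such as `[z-a]'.

```c
#include <stdbool.h>

/* Test whether c is matched by the range that starts at pat[0] == '['. */
bool
range_matches(const char *pat, char c)
{
	const char *p = pat + 1;	/* skip the opening '[' */
	bool negate = false;
	bool found = false;

	if (*p == '^') {		/* leading '^' inverts the result */
		negate = true;
		p++;
	}
	/* A ']' or '-' in first position is an ordinary member. */
	if (*p == ']' || *p == '-') {
		if (*p == c)
			found = true;
		p++;
	}
	while (*p != '\0' && *p != ']') {
		if (*p == '-') {
			p++;
			if (*p == ']' || *p == '\0') {
				/* A trailing '-' is an ordinary member. */
				if (c == '-')
					found = true;
			} else {
				/* p[-2] is the character before the '-'. */
				unsigned char lo = (unsigned char)p[-2];
				unsigned char hi = (unsigned char)*p;
				unsigned char uc = (unsigned char)c;

				if (uc >= lo && uc <= hi)
					found = true;
				p++;
			}
		} else {
			if (*p == c)
				found = true;
			p++;
		}
	}
	return negate ? !found : found;
}
```

Under these rules `[0-9]' accepts `5' and rejects `a', `[^0-9]' accepts `a',
`[]a]' accepts `]', and `[a-]' accepts `-'.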
.Sh AMBIGUITY
If a regular expression could match two different parts of the input string,
it will match the one which begins earliest.
If both begin in the same place but match different lengths, or match
the same length in different ways, life gets messier, as follows.
.Pp
In general, the possibilities in a list of branches are considered in
left-to-right order, the possibilities for `*', `+', and `?' are
considered longest-first, nested constructs are considered from the
outermost in, and concatenated constructs are considered leftmost-first.
The match that will be chosen is the one that uses the earliest
possibility in the first choice that has to be made.
If there is more than one choice, the next will be made in the same manner
(earliest possibility) subject to the decision on the first choice.
And so forth.
.Pp
For example, `(ab|a)b*c' could match
`abc' in one of two ways.
The first choice is between `ab' and `a'; since `ab' is earlier, and does
lead to a successful overall match, it is chosen.
Since the `b' is already spoken for,
the `b*' must match its last possibility\(emthe empty string\(emsince
it must respect the earlier choice.
.Pp
In the particular case where no `|'s are present and there is only one
`*', `+', or `?', the net effect is that the longest possible
match will be chosen.
For example, `ab*',
presented with `xabbbby', will match `abbbb'.
Note that if `ab*'
is tried against `xabyabbbz', it
will match `ab' just after `x', due to the begins-earliest rule.
(In effect, the decision on where to start the match is the first choice
to be made, hence subsequent choices must respect it even if this leads them
to less-preferred alternatives.)
.Sh RETURN VALUES
The
.Fn regcomp
function
returns
.Dv NULL
for a failure
.Pf ( Fn regerror
permitting),
where failures are syntax errors, exceeding implementation limits,
or applying `+' or `*' to a possibly-null operand.
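.Sh HISTORY
Both code and manual page for
.Fn regcomp ,
.Fn regexec ,
.Fn regsub ,
and
.Fn regerror
were written at the University of Toronto
and appeared in
.Bx 4.3 tahoe .
They are intended to be compatible with the Bell V8
.Xr regexp 3 ,
but are not derived from Bell code.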
.Sh BUGS
Empty branches and empty regular expressions are not portable to V8.
.Pp
The restriction against
applying `*' or `+' to a possibly-null operand is an artifact of the
simplistic implementation.
.Pp
Does not support
.Xr egrep 1 Ns 's
newline-separated branches;
neither does the V8
.Xr regexp 3 ,
though.
.Pp
Due to emphasis on
compactness and simplicity,
it's not strikingly fast.
It does give special attention to handling simple cases quickly.