libc/regex: Replace old regex library with modified TRE
author John Marino <draco@marino.st>
Thu, 6 Aug 2015 21:26:49 +0000 (23:26 +0200)
committer John Marino <draco@marino.st>
Thu, 6 Aug 2015 22:07:33 +0000 (00:07 +0200)
commit 6af9a77b394698e42f3a7ec6126497a3fc2fd470
tree 42f8bd1f113048dc88eb1c9ce650aabbce4e6f4d
parent 18c838a6f835d7872dcb7c7260d301665d615c22

The existing DragonFly regex library has several limitations, including
no wide character support and no collation support, because it is locked
to the POSIX/C locale.  It's also slow and fails a number of tests in
the AT&T Research regex test suite:

   basic       : TEST testregex, 539 tests,  0 errors
   categorize  : TEST testregex,  20 tests,  0 errors
   nullsubexpr : TEST testregex,  84 tests, 31 errors
   leftassoc   : TEST testregex,  12 tests, 12 errors
   rightassoc  : TEST testregex,  24 tests,  0 errors
   forcedassoc : TEST testregex,  48 tests,  8 errors
   repetition  : TEST testregex, 129 tests, 37 errors

With TRE, libc achieves these scores (the test counts are higher because
of the new regnexec support):

   basic       : TEST testregex, 808 tests,  0 errors
   categorize  : TEST testregex,  26 tests,  0 errors
   nullsubexpr : TEST testregex, 172 tests,  0 errors
   leftassoc   : TEST testregex,  12 tests, 12 errors
   rightassoc  : TEST testregex,  36 tests,  0 errors
   forcedassoc : TEST testregex,  84 tests,  0 errors
   repetition  : TEST testregex, 241 tests,  0 errors

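Here's proof that the regex library is now locale sensitive.  In the C
locale, [a-z] does not match the accented letters, so those lines are
kept; in the fr_FR locale it does match them, so they are deleted: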

> env LANG=C sed /abandonn[a-z]/d fwl-sort-C.txt
a
abandonnâmes
abandonnât
abandonnâtes
abandonnèrent
abandonné
abandonnée
abandonnées
abandonnés
abord
abords
absence

> env LANG=fr_FR sed /abandonn[a-z]/d fwl-sort-C.txt
a
abord
abords
absence
accepta
acceptai
acceptaient
acceptais
acceptait
acceptant
acceptas
acceptasse
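
The same comparison can be made from C.  What follows is only a minimal
sketch, not code from this commit: it uses the standard regcomp/regexec
interface, and matches_in_locale is a hypothetical helper name introduced
for illustration.  It assumes the requested locale is installed; if
setlocale fails, it returns -1 instead of guessing.

```c
#include <locale.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of libc): switch to locale_name, compile
 * pattern as a basic regular expression (what sed uses by default), and
 * try to match it against text.
 * Returns 1 on a match, 0 on no match, -1 on any error.
 */
static int
matches_in_locale(const char *locale_name, const char *pattern,
    const char *text)
{
	regex_t re;
	int rc;

	/* The locale must be set before regcomp so ranges use it. */
	if (setlocale(LC_ALL, locale_name) == NULL)
		return (-1);

	/* REG_NOSUB: only a yes/no answer is needed, no submatches. */
	if (regcomp(&re, pattern, REG_NOSUB) != 0)
		return (-1);

	rc = regexec(&re, text, 0, NULL, 0);
	regfree(&re);

	if (rc == 0)
		return (1);
	return (rc == REG_NOMATCH ? 0 : -1);
}
```

Calling matches_in_locale("C", "abandonn[a-z]", ...) on a word such as
"abandonner" and then on "abandonné" mirrors the first sed run above.
Repeating the calls with a French locale name (the exact name, for
example fr_FR.UTF-8, depends on the system and is an assumption here)
mirrors the second.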

Several new functions have been added to libc:

  variations of regcomp: regcomp_l,
    regncomp,  regncomp_l,
    regwcomp,  regwcomp_l,
    regnwcomp, regnwcomp_l

  variations of regexec: regnexec, regwexec, regwnexec

The regex.3 and re_format.7 man pages have been updated and symlinked
accordingly.

23 files changed:
include/Makefile
include/regex.h [deleted file]
lib/libc/Makefile.inc
lib/libc/regex/COPYRIGHT [deleted file]
lib/libc/regex/Makefile.inc [deleted file]
lib/libc/regex/Symbol.map [deleted file]
lib/libc/regex/WHATSNEW [deleted file]
lib/libc/regex/cname.h [deleted file]
lib/libc/regex/engine.c [deleted file]
lib/libc/regex/regcomp.c [deleted file]
lib/libc/regex/regerror.c [deleted file]
lib/libc/regex/regex2.h [deleted file]
lib/libc/regex/regexec.c [deleted file]
lib/libc/regex/regfree.c [deleted file]
lib/libc/regex/utils.h [deleted file]
lib/libc/tre-regex/Makefile.inc [new file with mode: 0644]
lib/libc/tre-regex/Symbol.map [new file with mode: 0644]
lib/libc/tre-regex/cname.h [new file with mode: 0644]
lib/libc/tre-regex/config.h [new file with mode: 0644]
lib/libc/tre-regex/re_format.7 [moved from lib/libc/regex/re_format.7 with 56% similarity]
lib/libc/tre-regex/regex.3 [moved from lib/libc/regex/regex.3 with 69% similarity]
lib/libc/tre-regex/regex.h [new file with mode: 0644]
lib/libc/tre-regex/tre.h [new file with mode: 0644]