Add localedef(1), a locale definition generator tool
author John Marino <draco@marino.st>
Tue, 28 Jul 2015 16:31:53 +0000 (18:31 +0200)
committer John Marino <draco@marino.st>
Tue, 28 Jul 2015 17:55:15 +0000 (19:55 +0200)
commit cd1c60858dc5ce0eb760fdf2eb6eed57cb251abd
tree 79b09c8ae90d47a1508bf0627c12a4f44b462666
parent f88a4ba1c1ad595fb9d578a46f445c17b742daf5

The localedef tool can read entire, unmodified CLDR POSIX definition
files and generate all 6 LC categories: LC_COLLATE, LC_CTYPE, LC_TIME,
LC_NUMERIC, LC_MONETARY and LC_MESSAGES.

The last 4 of those aren't needed.  We already have a tool that generates
msgdef, timedef, moneydef and numericdef.  In the immediate future,
localedef will only be used to generate LC_COLLATE files in a new format.
This will render colldef files unreadable, so colldef will be removed
when that happens.

In the future, localedef will be tasked to generate LC_CTYPE files as
well.  When that happens, the mklocale tool will be retired.

While localedef *can* read pristine POSIX files (which causes 6 files
to be generated), it will be given files containing only the LC_COLLATE
part, with all the white space removed.  Removing just the spaces can
save megabytes.

This tool has a long history with Solaris [1].  The Nexenta developers
modified it to read CLDR files and created the much richer collation
formats.  The libc collation functions have to be modified to read the
new format (called "DragonFly-4.4") and to handle the new data structures.

The result will be that locale-sensitive tools and functions will now
properly sort multibyte and Unicode strings.  Our "BSD" sort is not locale
sensitive, so it will probably have to be replaced with GNU sort in order
to leverage our new collation capabilities.
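
To illustrate what "locale sensitive" means here, the sketch below sorts
an array of strings two ways: with strcoll(3), which consults the active
LC_COLLATE data, and with strcmp(3), which compares raw bytes.  It is not
part of this commit; the names cmp_collate, cmp_bytes and sort_strings are
made up for the example, and only standard C library calls are used.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator: order two strings by the current LC_COLLATE rules. */
static int
cmp_collate(const void *a, const void *b)
{
	const char *const *sa = a;
	const char *const *sb = b;

	return (strcoll(*sa, *sb));
}

/* qsort comparator: plain byte-wise order, ignoring the locale. */
static int
cmp_bytes(const void *a, const void *b)
{
	const char *const *sa = a;
	const char *const *sb = b;

	return (strcmp(*sa, *sb));
}

/*
 * Hypothetical helper: sort an array of n strings in place.
 * With use_locale != 0 the order follows LC_COLLATE, otherwise it is
 * raw byte order.
 */
void
sort_strings(const char **v, size_t n, int use_locale)
{
	qsort(v, n, sizeof(*v), use_locale ? cmp_collate : cmp_bytes);
}
```

A caller would normally run setlocale(LC_COLLATE, "") once at startup so
that strcoll(3) picks up the user's locale.  In the "C" locale both paths
give the same byte order; any difference only shows up once a locale with
real collation data is loaded.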

This can't be hooked into the build yet.  It needs the new header for
collate.c to define the data structures.  Until that happens, this is
actually unbuildable.

[1] Linux also has a tool called localedef, but I do not know if it shares
    a common history or if it uses CLDR POSIX files.  It seems to have the
    same purpose though.
17 files changed:
usr.bin/localedef/Makefile [new file with mode: 0644]
usr.bin/localedef/avl.c [new file with mode: 0644]
usr.bin/localedef/avl.h [new file with mode: 0644]
usr.bin/localedef/avl_impl.h [new file with mode: 0644]
usr.bin/localedef/charmap.c [new file with mode: 0644]
usr.bin/localedef/collate.c [new file with mode: 0644]
usr.bin/localedef/ctype.c [new file with mode: 0644]
usr.bin/localedef/localedef.1 [new file with mode: 0644]
usr.bin/localedef/localedef.c [new file with mode: 0644]
usr.bin/localedef/localedef.h [new file with mode: 0644]
usr.bin/localedef/messages.c [new file with mode: 0644]
usr.bin/localedef/monetary.c [new file with mode: 0644]
usr.bin/localedef/numeric.c [new file with mode: 0644]
usr.bin/localedef/parser.y [new file with mode: 0644]
usr.bin/localedef/scanner.c [new file with mode: 0644]
usr.bin/localedef/time.c [new file with mode: 0644]
usr.bin/localedef/wide.c [new file with mode: 0644]