cldr2def: Slim down ctype src files
author John Marino <draco@marino.st>
Sat, 15 Aug 2015 13:04:22 +0000 (15:04 +0200)
committer John Marino <draco@marino.st>
Sat, 15 Aug 2015 13:28:23 +0000 (15:28 +0200)
I originally modified the tool to use "ranges" to define the CTYPE
categories, e.g. "<a>;...;<z>" rather than "<a>;<b>;<c> ... <z>".  This
worked well for UTF-8, but conversion to other encodings is not
supported: part of the range may not exist in the target encoding, or
the upper boundary may come before the lower boundary there.  I therefore
had to remove that work, but I was able to keep the removal of the now
redundant "print" section.
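A small illustration of why a range written in Unicode order does not survive conversion. KOI8-R is used here only as an example target (the commit does not say which encodings failed), and the script relies solely on the core Encode module. The Unicode range CYRILLIC SMALL LETTER A .. YA (U+0430..U+044F) is contiguous, but in KOI8-R its endpoints map to bytes that enclose only part of the alphabet, and a two-letter range can even come out reversed:

```perl
#!/usr/bin/perl
# Sketch (not part of cldr2def.pl): a contiguous Unicode range is not
# necessarily a contiguous or ordered byte range in a legacy encoding.
use strict;
use warnings;
use Encode qw(encode);

# CYRILLIC SMALL LETTER A .. CYRILLIC SMALL LETTER YA (32 letters).
my ($lo, $hi) = (0x0430, 0x044F);

# Byte value of a single code point in KOI8-R (illustrative target).
sub koi8r { return ord(encode('koi8-r', chr($_[0]))); }

my ($blo, $bhi) = (koi8r($lo), koi8r($hi));

# Letters inside the Unicode range whose KOI8-R byte lies outside
# [$blo, $bhi]; a byte range "<a>;...;<ya>" would silently drop them.
my @missing = grep {
    my $byte = koi8r($_);
    $byte < $blo || $byte > $bhi;
} $lo .. $hi;

printf "KOI8-R range 0x%02X..0x%02X misses %d of %d letters\n",
    $blo, $bhi, scalar(@missing), $hi - $lo + 1;

# U+0432 (ve) and U+0433 (ghe) are adjacent in Unicode, but their
# KOI8-R bytes are in the opposite order, so the upper bound < lower.
printf "<ve>;...;<ghe> becomes 0x%02X..0x%02X\n",
    koi8r(0x0432), koi8r(0x0433);
```

Run as-is, it reports that the byte range spanned by the endpoints leaves out roughly half of the letters, which is the first failure mode described above; the second printf shows the reversed-boundary case.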

I confirmed that the output without the "print" section was identical to
before, and then I added back a "print" section with a single element:
NO-BREAK_SPACE.  This character is used in quite a few monetary
definitions, but it was never mapped to CTYPE, which I believe is a
mistake.  NO-BREAK_SPACE is also defined as a blank, which the localedef
tool also treats as a space (so there is no need to define a separate
"space" section).

The net change is that in multibyte encodings, non-breaking spaces are
now 1) recognized and 2) defined as printable.
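The filtering that the diff adds to transform_ctypes can be exercised on its own. The sketch below wraps the same loop in a hypothetical helper, strip_print, which returns the filtered lines instead of printing them to FOUT; the locale name "en_US" and the sample CTYPE lines are made up for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper mirroring the loop added to transform_ctypes();
# it returns lines instead of writing them to the FOUT handle.
sub strip_print {
    my ($actfile, @lines) = @_;
    # xx_Comm_US is copied through unchanged, as in the diff.
    return @lines if $actfile eq "xx_Comm_US";
    my @out;
    my $category = '';
    foreach my $line (@lines) {
        # A line such as "upper\t<A>;..." starts a new category; lines that
        # do not match (e.g. indented continuations) keep the current one.
        if ($line =~ /^([a-z]{3,})\s+</) {
            $category = $1;
            if ($category eq 'print') {
                # Replace the whole print section with NO-BREAK_SPACE only.
                push @out, "blank\t<NO-BREAK_SPACE>\n",
                           "print\t<NO-BREAK_SPACE>\n\n";
            }
        }
        next if $category eq 'print';   # drop the original print lines
        push @out, $line;
    }
    return @out;
}

# Made-up input: a print section with one continuation line.
my @in = (
    "upper\t<A>;<B>\n",
    "print\t<A>;<B>;/\n",
    "\t<a>;<b>\n",
    "punct\t<comma>\n",
);
print strip_print("en_US", @in);
```

For this input the "upper" and "punct" lines pass through unchanged, while the two lines of the original print section are replaced by the blank and print entries for NO-BREAK_SPACE.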

tools/tools/locale/tools/cldr2def.pl [changed mode: 0755->0644]

old mode 100755 (executable)
new mode 100644 (file)
index 4ec2c91..b70b72d
@@ -58,7 +58,7 @@ my %FILESNAMES = (
        "timedef"       => "LC_TIME",
        "msgdef"        => "LC_MESSAGES",
        "numericdef"    => "LC_NUMERIC",
-       "colldef"       => "LC_COLLATE",
+       "colldef"       => "LC_COLLATE",
        "ctypedef"      => "LC_CTYPE"
 );
 
@@ -388,7 +388,22 @@ sub transform_ctypes {
 # CLDR project, obtained from http://cldr.unicode.org/
 # -----------------------------------------------------------------------------
 EOF
-               print FOUT @lines;
+               my $category = '';
+               foreach my $line (@lines) {
+                       if ($actfile eq "xx_Comm_US") {
+                               print FOUT $line;
+                               next;
+                       }
+                       if ($line =~ /^([a-z]{3,})\s+</) {
+                               $category = $1;
+                               if ($category eq 'print') {
+                                       print FOUT "blank\t<NO-BREAK_SPACE>\n";
+                                       print FOUT "print\t<NO-BREAK_SPACE>\n\n";
+                               }
+                       }
+                       next if ($category eq 'print');
+                       print FOUT $line;
+               }
                close(FOUT);
 
                foreach my $enc (sort keys(%{$languages{$l}{$f}{data}{$c}})) {