.\" Copyright (c) 1995 Alex Tatmanjants <alex@elvisti.kiev.ua>
.\"		at Electronni Visti IA, Kiev, Ukraine.
.\"			All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD: head/usr.bin/colldef/colldef.1 213573 2010-10-08 12:40:16Z uqs $
.\"
.Dd January 27, 1995
.Dt COLLDEF 1
.Os
.Sh NAME
.Nm colldef
.Nd convert collation sequence source definition
.Sh SYNOPSIS
.Nm
.Op Fl I Ar map_dir
.Op Fl o Ar out_file
.Op Ar filename
.Sh DESCRIPTION
The
.Nm
utility converts a collation sequence source definition
into a format usable by the
.Fn strxfrm
and
.Fn strcoll
functions.
It is used to define the many ways in which
strings can be ordered and collated.
The
.Fn strxfrm
function transforms
its second argument and places the result in its first
argument.
The transformed string is such that it can be
correctly ordered with other transformed strings by using
.Fn strcmp ,
.Fn strncmp ,
or
.Fn memcmp .
The
.Fn strcoll
function transforms its arguments and does a
comparison.
.Pp
The
.Nm
utility reads the collation sequence source definition from
.Ar filename ,
or from the standard input if no file is given, and stores the converted
definition in
.Ar out_file .
The output file produced contains the
database with collating sequence information in a form
usable by system commands and routines.
.Pp
The following options are available:
.Bl -tag -width indent
.It Fl I Ar map_dir
Set directory name where
.Ar charmap
files can be found, current directory by default.
.It Fl o Ar out_file
Set output file name,
.Ar LC_COLLATE
by default.
.El
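.Pp
As a hypothetical invocation (the file and directory names are
placeholders chosen for this example, not files shipped with the system),
the following reads
.Pa example.src ,
looks for charmap files in
.Pa ./maps ,
and writes the converted definition to
.Pa LC_COLLATE.example :
.Bd -literal -offset indent
colldef -I ./maps -o LC_COLLATE.example example.src
.Ed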
.Pp
The collation sequence definition specifies a set of collating elements and
the rules defining how strings containing these should be ordered.
This is most useful for different language definitions.
.Pp
The specification file can consist of three statements:
.Ar charmap ,
.Ar substitute
and
.Ar order .
.Pp
Of these, only the
.Ar order
statement is required.
When
.Ar charmap
or
.Ar substitute
is
supplied, these statements must be ordered as above.
Any
statements after the order statement are ignored.
.Pp
Lines in the specification file beginning with a
.Ql #
are
treated as comments and are ignored.
Blank lines are also
ignored.
.Pp
.Dl "charmap charmapfile"
.Pp
.Ar Charmap
defines where a mapping of the character
and collating element symbols to the actual
character encoding can be found.
.Pp
The format of
.Ar charmapfile
is shown below.
Symbol
names are separated from their values by TAB or
SPACE characters.
Symbol-value can be specified in
a hexadecimal (\ex\fI??\fR) or octal (\e\fI???\fR)
representation, and can be only one character in length.
.Bd -literal -offset indent
symbol-name1 symbol-value1
symbol-name2 symbol-value2
\&...
.Ed
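.Pp
For illustration, a hypothetical
.Ar charmapfile
fragment (the symbol names are invented for this example) could map two
symbols to the ASCII codes of
.Ql a
and
.Ql b ,
the first in octal and the second in hexadecimal:
.Bd -literal -offset indent
letterA \e141
letterB \ex62
.Ed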
.Pp
Symbol names cannot be specified in
.Ar substitute
fields.
.Pp
The
.Ar charmap
statement is optional.
.Bd -literal -offset indent
substitute "symbol" with "repl_string"
.Ed
.Pp
The
.Ar substitute
statement substitutes the character
.Ar symbol
with the string
.Ar repl_string .
Symbol names cannot be specified in the
.Ar repl_string
field.
The
.Ar substitute
statement is optional.
.Pp
.Dl "order order_list"
.Pp
.Ar Order_list
is a list of symbols, separated by semicolons, that defines the
collating sequence.
The
special symbol
.Ar ...
specifies, in a short-hand
form, symbols that are sequential in machine code
order.
.Pp
An order list element
can be represented in any one of the following
ways:
.Bl -bullet
.It
The symbol itself (for example,
.Ar a
for the lower-case letter
.Ar a ) .
.It
The symbol in octal representation (for example,
.Ar \e141
for the letter
.Ar a ) .
.It
The symbol in hexadecimal representation (for example,
.Ar \ex61
for the letter
.Ar a ) .
.It
The symbol name as defined in the
.Ar charmap
file (for example,
.Ar <letterA>
for the
.Ar letterA \e023
record in
.Ar charmapfile ) .
If a character map name contains the
.Ar >
character, it must be escaped as
.Ar /> ;
a single
.Ar /
must be escaped as
.Ar // .
.It
Symbols
.Ar \ea ,
.Ar \eb ,
.Ar \ef ,
.Ar \en ,
.Ar \er ,
.Ar \ev
are permitted in their usual C-language meaning.
.It
The symbol chain (for example,
.Ar abc ,
.Ar <letterA><letterB>c ,
.Ar \exf1b\exf2 ) .
.It
The symbol range (for example,
.Ar a;...;z ) .
.It
Comma-separated symbols, ranges and chains enclosed in parentheses (for example,
.Ar \&(
.Ar sym1 ,
.Ar sym2 ,
.Ar ...
.Ar \&) )
are assigned the
same primary ordering but different secondary
ordering.
.It
Comma-separated symbols, ranges and chains enclosed in curly brackets (for example,
.Ar \&{
.Ar sym1 ,
.Ar sym2 ,
.Ar ...
.Ar \&} )
are assigned the same primary ordering only.
.El
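.Pp
The element forms above can be combined in one
.Ar order_list .
The following is a hypothetical fragment for illustration only, not a
complete specification: it places the digits first, gives
.Ql a
and
.Ql A
the same primary ordering but different secondary ordering, and then
lists the remaining lower-case letters as a range.
.Bd -literal -offset indent
order 0;...;9;(a,A);b;...;z
.Ed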
.Pp
The backslash character
.Ar \e
is used for continuation.
In this case, no characters are permitted
after the backslash character.
.Sh FILES
.Bl -tag -width indent
.It Pa /usr/share/locale/ Ns Ao Ar language Ac Ns Pa /LC_COLLATE
The standard shared location for collation orders
under the locale
.Aq Ar language .
.El
.Sh EXIT STATUS
The
.Nm
utility exits with the following values:
.Bl -tag -width indent
.It Li 0
No errors were found and the output was successfully created.
.It Li !=0
Errors were found.
.El
.Sh SEE ALSO
.Xr mklocale 1 ,
.Xr setlocale 3 ,
.Xr strcoll 3 ,
.Xr strxfrm 3