.\" $NetBSD: nls.7,v 1.11 2003/06/26 11:55:56 wiz Exp $
.\"
.\" Copyright (c) 2003 The NetBSD Foundation, Inc.
.\" All rights reserved.
.\"
.\" This code is derived from software contributed to The NetBSD Foundation
.\" by Gregory McGarry.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\" 3. All advertising materials mentioning features or use of this software
.\" must display the following acknowledgement:
.\" This product includes software developed by the NetBSD
.\" Foundation, Inc. and its contributors.
.\" 4. Neither the name of The NetBSD Foundation nor the names of its
.\" contributors may be used to endorse or promote products derived
.\" from this software without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
.\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
.\" TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
.\" PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
.\" BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
.\" INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
.\" CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
.\" ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
.\" POSSIBILITY OF SUCH DAMAGE.
.\"
.\" $DragonFly: src/share/man/man7/nls.7,v 1.5 2006/10/14 23:59:59 swildner Exp $
.\"
.Dt NLS 7
.Os
.Sh NAME
.Nm NLS
.Nd Native Language Support Overview
.Sh DESCRIPTION
Native Language Support (NLS) provides commands for a single
worldwide operating system base.
An internationalized system has no built-in assumptions or dependencies
on language-specific or culture-specific conventions such as:
.Pp
.Bl -bullet -offset indent -compact
.It
Character classifications
.It
Character comparison rules
.It
Character collation order
.It
Numeric and monetary formatting
.It
Date and time formatting
.El
.Pp
All information pertaining to cultural conventions and language is
obtained at program run time.
.Pp
.Dq Internationalization
refers to the operation by which system software is developed to support
multiple culture-specific and language-specific conventions.
This is a generalization process by which the system is untied from
calling only English strings or other English-specific conventions.
.Dq Localization
refers to the operations by which the user environment is customized to
handle input and output appropriately for specific language and cultural
conventions.
This is a specialization process, by which generic methods already
implemented in an internationalized system are used in specific ways.
The formal description of cultural conventions for some country, together
with all associated translations targeted to the native language, is
called a
.Dq locale .
.Pp
.Dx
provides extensive support to programmers and system developers to
enable internationalized software to be developed.
.Dx
also supplies a large variety of locales for system localization.
.Ss Localization of Information
All locale information is accessible to programs at run time so that
data is processed and displayed correctly for specific cultural
conventions and language.
.Pp
A locale is divided into categories.
A category is a group of language-specific and culture-specific conventions
as outlined in the list above.
ISO C and POSIX specify the following six standard categories, which are
supported by
.Dx :
.Pp
.Bl -tag -compact -width LC_MONETARYXX
.It LC_COLLATE
string-collation order information
.It LC_CTYPE
character classification, case conversion, and other character attributes
.It LC_MESSAGES
the format for affirmative and negative responses
.It LC_MONETARY
rules and symbols for formatting monetary numeric information
.It LC_NUMERIC
rules and symbols for formatting nonmonetary numeric information
.It LC_TIME
rules and symbols for formatting time and date information
.El
.Pp
Localization of the system is achieved by setting appropriate values
in environment variables to identify which locale should be used.
The environment variables have the same names as their respective
locale categories.
Additionally, the
.Ev LANG ,
.Ev LC_ALL ,
and
.Ev NLSPATH
environment variables are used.
The
.Ev NLSPATH
environment variable specifies a colon-separated list of directory names
where the message catalog files of the NLS database are located.
The
.Ev LC_ALL
and
.Ev LANG
environment variables also determine the current locale.
.Pp
The value of each of these environment variables is a string of the form:
.Pp
.Bd -literal -offset indent
language[_territory][.codeset][@modifier]
.Ed
.Pp
Valid values for the language field come from the ISO 639 standard, which
defines two-character codes for many languages.
Some common language codes are:
.Pp
.nf
.ta \w'SERBO-CROATIAN'u+2n +\w'DE'u+5n +\w'OCEANIC/INDONESIAN'u+2nC
\fILanguage Name\fP	\fICode\fP	\fILanguage Family\fP
.ta \w'SERBO-CROATIAN'u+2n +\w'DE'u+5n +\w'OCEANIC/INDONESIAN'u+2nC
ABKHAZIAN	AB	IBERO-CAUCASIAN
AFAN (OROMO)	OM	HAMITIC
AFRIKAANS	AF	GERMANIC
ALBANIAN	SQ	INDO-EUROPEAN (OTHER)
ARMENIAN	HY	INDO-EUROPEAN (OTHER)
AZERBAIJANI	AZ	TURKIC/ALTAIC
BASHKIR	BA	TURKIC/ALTAIC
BYELORUSSIAN	BE	SLAVIC
ESPERANTO	EO	INTERNATIONAL AUX.
ESTONIAN	ET	FINNO-UGRIC
FIJI	FJ	OCEANIC/INDONESIAN
FINNISH	FI	FINNO-UGRIC
GEORGIAN	KA	IBERO-CAUCASIAN
GREENLANDIC	KL	ESKIMO
GUARANI	GN	AMERINDIAN
HAUSA	HA	NEGRO-AFRICAN
HUNGARIAN	HU	FINNO-UGRIC
ICELANDIC	IS	GERMANIC
INDONESIAN	ID	OCEANIC/INDONESIAN
INTERLINGUA	IA	INTERNATIONAL AUX.
INTERLINGUE	IE	INTERNATIONAL AUX.
JAVANESE	JV	OCEANIC/INDONESIAN
KAZAKH	KK	TURKIC/ALTAIC
KINYARWANDA	RW	NEGRO-AFRICAN
KIRGHIZ	KY	TURKIC/ALTAIC
KURUNDI	RN	NEGRO-AFRICAN
LINGALA	LN	NEGRO-AFRICAN
MALAGASY	MG	OCEANIC/INDONESIAN
MALAY	MS	OCEANIC/INDONESIAN
MALAYALAM	ML	DRAVIDIAN
MAORI	MI	OCEANIC/INDONESIAN
NORWEGIAN	NO	GERMANIC
PERSIAN (farsi)	FA	IRANIAN
PORTUGUESE	PT	ROMANCE
QUECHUA	QU	AMERINDIAN
RHAETO-ROMANCE	RM	ROMANCE
SAMOAN	SM	OCEANIC/INDONESIAN
SANGHO	SG	NEGRO-AFRICAN
SCOTS GAELIC	GD	CELTIC
SERBO-CROATIAN	SH	SLAVIC
SESOTHO	ST	NEGRO-AFRICAN
SETSWANA	TN	NEGRO-AFRICAN
SHONA	SN	NEGRO-AFRICAN
SISWATI	SS	NEGRO-AFRICAN
SUNDANESE	SU	OCEANIC/INDONESIAN
SWAHILI	SW	NEGRO-AFRICAN
TAGALOG	TL	OCEANIC/INDONESIAN
TATAR	TT	TURKIC/ALTAIC
TONGA	TO	OCEANIC/INDONESIAN
TSONGA	TS	NEGRO-AFRICAN
TURKISH	TR	TURKIC/ALTAIC
TURKMEN	TK	TURKIC/ALTAIC
UZBEK	UZ	TURKIC/ALTAIC
VOLAPUK	VO	INTERNATIONAL AUX.
WOLOF	WO	NEGRO-AFRICAN
XHOSA	XH	NEGRO-AFRICAN
YORUBA	YO	NEGRO-AFRICAN
ZULU	ZU	NEGRO-AFRICAN
.fi
.Pp
For example, the locale for the Danish language spoken in Denmark
using the ISO8859-1 character set is da_DK.ISO8859-1.
The da stands for the Danish language and the DK stands for Denmark.
The short form of da_DK is sufficient to indicate this locale.
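.Pp
As a minimal sketch of how a locale name decomposes, the following C
fragment splits a string of the form language[_territory][.codeset][@modifier]
into its four fields.
The
.Vt struct locale_parts
type, the
.Fn copy_field
and
.Fn parse_locale_name
helpers, and the buffer sizes are illustrative assumptions; none of them
is part of any system interface.
.Bd -literal
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical container for the four fields of a locale name. */
struct locale_parts {
	char language[16];
	char territory[16];
	char codeset[32];
	char modifier[32];
};

/* Copy at most dstsize - 1 bytes of src into dst and terminate dst. */
static void
copy_field(char *dst, size_t dstsize, const char *src, size_t len)
{
	if (len >= dstsize)
		len = dstsize - 1;
	memcpy(dst, src, len);
	dst[len] = 0;
}

/* Split "language[_territory][.codeset][@modifier]" into its fields. */
static void
parse_locale_name(const char *name, struct locale_parts *out)
{
	const char *p;
	size_t len;

	assert(name != NULL && out != NULL);
	memset(out, 0, sizeof(*out));

	/* The language runs up to the first '_', '.' or '@'. */
	len = strcspn(name, "_.@");
	copy_field(out->language, sizeof(out->language), name, len);
	p = name + len;

	if (*p == '_') {	/* optional territory */
		p++;
		len = strcspn(p, ".@");
		copy_field(out->territory, sizeof(out->territory), p, len);
		p += len;
	}
	if (*p == '.') {	/* optional codeset */
		p++;
		len = strcspn(p, "@");
		copy_field(out->codeset, sizeof(out->codeset), p, len);
		p += len;
	}
	if (*p == '@') {	/* optional modifier: the rest of the string */
		p++;
		copy_field(out->modifier, sizeof(out->modifier), p, strlen(p));
	}
}
```
.Ed
.Pp
Applied to da_DK.ISO8859-1, this sketch yields the language da, the
territory DK, and the codeset ISO8859-1, with an empty modifier.
Fields that are absent from the name are left as empty strings.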
.Pp
The environment variable settings are queried by their priority level
in the following manner:
.Pp
.Bl -enum
.It
If the
.Ev LC_ALL
environment variable is set, all six categories use the locale it
specifies.
.It
If the
.Ev LC_ALL
environment variable is not set, each individual category uses the
locale specified by its corresponding environment variable.
.It
If the
.Ev LC_ALL
environment variable is not set, and a value for a particular
.Ev LC_*
environment variable is not set, the value of the
.Ev LANG
environment variable specifies the default locale for all categories.
Only the
.Ev LANG
environment variable should be set in
.Pa /etc/profile ,
since this makes it easiest for the user to override the system default
using the individual
.Ev LC_*
variables.
.It
If the
.Ev LC_ALL
environment variable is not set, a value for a particular
.Ev LC_*
environment variable is not set, and the value of the
.Ev LANG
environment variable is not set, the locale for that specific
category defaults to the C locale.
The C or POSIX locale assumes the 7-bit ASCII character set and defines
information for the six categories.
.El
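.Pp
The priority order above can be sketched in C as follows.
This is an illustration only and not how the C library is implemented;
.Fn is_set ,
.Fn choose_locale
and
.Fn effective_locale
are hypothetical names.
The sketch also assumes that an empty value counts as unset, which is how
POSIX treats empty locale variables.
.Bd -literal
```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* A variable counts as set only if it exists and is not empty. */
static int
is_set(const char *value)
{
	return value != NULL && strlen(value) > 0;
}

/*
 * Pick the locale name for one category from the values of LC_ALL,
 * the category's own variable, and LANG, in that order of priority.
 */
static const char *
choose_locale(const char *lc_all, const char *lc_category, const char *lang)
{
	if (is_set(lc_all))
		return lc_all;
	if (is_set(lc_category))
		return lc_category;
	if (is_set(lang))
		return lang;
	return "C";	/* nothing set: default to the C locale */
}

/* Same decision, reading the process environment. */
static const char *
effective_locale(const char *category_var)
{
	assert(category_var != NULL);
	return choose_locale(getenv("LC_ALL"), getenv(category_var),
	    getenv("LANG"));
}
```
.Ed
.Pp
For instance, calling the hypothetical
.Fn effective_locale
with the string LC_TIME reports which locale name would govern date and
time formatting in the current environment.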
.Ss Character Sets
A character is any symbol used for the organization, control, or
representation of data.
A group of such symbols used to describe a
particular language makes up a character set.
It is the encoding values in a character set that provide
the interface between the system and its input and output devices.
.Pp
The following character sets are supported in
.Dx :
.Bl -tag -width ISO8859_family
.It ISO8859 family
Industry-standard character sets are provided by means of the ISO8859
family of character sets, which provide a range of single-byte character set
support that includes Latin-1, Latin-2, Arabic, Cyrillic, and Hebrew,
among others.
The eucJP character set is the industry-standard character set used to support
the Japanese language.
.It Unicode
A Unicode environment based on the UTF-8 character set is supported for all
supported language/territories.
UTF-8 provides character support for most of the major languages of the
world and can be used in environments where multiple languages must be
processed simultaneously.
.El
.Ss Font Sets
A font set contains the glyphs to be displayed on the screen for a
corresponding character in a character set.
A display must support a suitable font to display a character set.
If suitable fonts are available to the X server, then X clients can
include support for different character sets.
.Xr xterm 1
includes support for UTF-8 character sets.
.Xr xfd 1
is useful for displaying all the characters in an X font.
.Pp
The
.Dx
.Xr syscons 4
console provides support for loading a variety of fonts using the
.Xr vidcontrol 1
utility.
Available fonts can be found in
.Pa /usr/share/syscons/fonts .
.Ss Internationalization for Programmers
To facilitate translations of messages into various languages and to
make the translated messages available to the program based on a
user's locale, it is necessary to keep messages separate from the
programs and provide them in the form of message catalogs that a
program can access at run time.
.Pp
Access to locale information is provided through the
.Xr setlocale 3
and
.Xr nl_langinfo 3
interfaces.
See their respective man pages for further information.
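.Pp
As a hedged sketch of querying locale information at run time, the
following C fragment selects the locale named by the environment and then
reads two items from it.
It relies only on the standard
.Fn setlocale
and POSIX
.Fn nl_langinfo
calls; the wrapper names
.Fn init_locale ,
.Fn current_codeset
and
.Fn current_radix
are hypothetical.
.Bd -literal
```c
#define _XOPEN_SOURCE 700	/* request the POSIX/XSI declarations */
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>

/*
 * Select the locale named by the environment; the empty string asks
 * setlocale() to consult the locale environment variables.
 * Falls back to the C locale if that name is not usable.
 * Returns the name of the locale now in effect.
 */
static const char *
init_locale(void)
{
	const char *name = setlocale(LC_ALL, "");

	if (name == NULL)
		name = setlocale(LC_ALL, "C");
	assert(name != NULL);
	return name;
}

/* Name of the codeset used by the current LC_CTYPE locale. */
static const char *
current_codeset(void)
{
	return nl_langinfo(CODESET);
}

/* Radix (decimal point) string of the current LC_NUMERIC locale. */
static const char *
current_radix(void)
{
	return nl_langinfo(RADIXCHAR);
}
```
.Ed
.Pp
A program would normally call the initialization helper once at startup,
before any locale-dependent processing.
The strings returned by
.Fn nl_langinfo
may be overwritten by later calls, so a caller that needs to keep them
should copy them.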
.Pp
Message source files containing application messages are created by
the programmer and converted to message catalogs.
These catalogs are used by the application to retrieve and display
messages.
.Pp
.Dx
supports two message catalog interfaces: the X/Open
.Xr catgets 3
interface and the Uniforum
.Xr gettext 3
interface.
The
.Xr catgets 3
interface has the advantage of belonging to a standard.
Unfortunately, the interface is complicated to use and
maintenance of the catalogs is difficult.
The implementation also does not support different character sets.
The
.Xr gettext 3
interface has not been standardized yet; however, it is supported
by an increasing number of systems.
It also provides many additional tools which make programming and
catalog maintenance much easier.
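.Pp
A minimal sketch of message retrieval through the X/Open catalog calls
.Fn catopen
and
.Fn catgets
is shown below.
The set and message numbers and the helper names
.Fn open_catalog
and
.Fn lookup_message
are assumptions made for illustration; real numbers come from the message
source file of the application.
.Bd -literal
```c
#define _XOPEN_SOURCE 700	/* request the POSIX/XSI declarations */
#include <assert.h>
#include <nl_types.h>
#include <stddef.h>

/* Hypothetical set and message numbers from a message source file. */
#define EXAMPLE_SET		1
#define EXAMPLE_MSG_HELLO	1

/* Open a catalog by name, selected according to the LC_MESSAGES locale. */
static nl_catd
open_catalog(const char *name)
{
	assert(name != NULL);
	return catopen(name, NL_CAT_LOCALE);
}

/*
 * Return the message text, or fallback if the catalog could not be
 * opened or does not contain the message.
 * Text returned from a catalog is only valid while the catalog is open.
 */
static const char *
lookup_message(nl_catd catd, int set_id, int msg_id, const char *fallback)
{
	if (catd == (nl_catd)-1)
		return fallback;
	return catgets(catd, set_id, msg_id, fallback);
}
```
.Ed
.Pp
Passing the untranslated text as the fallback keeps the program usable
when no catalog is installed for the user's locale.
The catalog should be released with
.Fn catclose
once no returned message is needed any longer.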
.Ss Support for Multibyte Characters and Wide Characters
Character sets with multibyte characters may be difficult to decode, or may
contain state (i.e., adjacent characters are dependent).
ISO C specifies a set of functions using
.Dq wide characters
which can handle multibyte characters properly.
A wide character is specified in ISO C
as being a fixed number of bits wide and is stateless.
.Pp
There are two types for wide characters:
.Vt wchar_t
and
.Vt wint_t .
.Vt wchar_t
is a type which can contain one wide character and operates as the
.Vt char
type does for one character.
.Vt wint_t
can contain one wide character or WEOF (wide EOF).
.Pp
There are functions that operate on
.Vt wchar_t
and substitute for functions operating on
.Vt char .
There are some additional functions that operate on
.Vt wchar_t .
.Pp
Wide characters should be used for all I/O processing which may rely
on locale-specific strings.
The two primary issues requiring special use of wide characters are:
.Bl -bullet -offset indent
.It
All I/O is performed using multibyte characters.
Input data is converted into wide characters immediately after
reading and data for output is converted from wide characters to
multibyte characters immediately before writing.
Conversion is achieved using functions such as
.Xr mbstowcs 3
and
.Xr wcstombs 3 .
.It
Wide characters are used directly for I/O, using functions such as
.Xr getwchar 3
and
.Xr putwchar 3 .
They are also used by the formatted I/O functions for wide characters,
such as
.Xr wprintf 3 ,
and by the %lc, %C, %ls, and %S conversion specifiers of the conventional
formatted I/O functions.
.El
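.Pp
The first approach, converting at the I/O boundary, can be sketched with
the ISO C functions
.Fn mbstowcs ,
.Fn towupper
and
.Fn wcstombs .
The helper name
.Fn mb_toupper
and the fixed buffer size are illustrative assumptions; a real program
would size its buffers from the input.
.Bd -literal
```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical limit on the number of wide characters handled. */
#define EXAMPLE_MAX_WCHARS	128

/*
 * Upper-case a multibyte string in the current locale: convert it to
 * wide characters, process those, and convert the result back.
 * Returns 0 on success and -1 on an invalid or unconvertible character
 * or when a buffer is too small.
 */
static int
mb_toupper(const char *in, char *out, size_t outsize)
{
	wchar_t wbuf[EXAMPLE_MAX_WCHARS];
	size_t i, n;

	assert(in != NULL && out != NULL);

	/* Multibyte input becomes wide characters right after reading. */
	n = mbstowcs(wbuf, in, EXAMPLE_MAX_WCHARS);
	if (n == (size_t)-1 || n >= EXAMPLE_MAX_WCHARS)
		return -1;

	/* All processing happens on fixed-width, stateless characters. */
	for (i = 0; i < n; i++)
		wbuf[i] = (wchar_t)towupper((wint_t)wbuf[i]);

	/* Wide characters become multibyte output right before writing. */
	n = wcstombs(out, wbuf, outsize);
	if (n == (size_t)-1 || n >= outsize)
		return -1;
	return 0;
}
```
.Ed
.Pp
The result depends on the LC_CTYPE category of the current locale, so a
program would call
.Fn setlocale
before using such a helper on non-ASCII text.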
.Sh BUGS
This man page is incomplete.