.\" $NetBSD: nls.7,v 1.15 2009/04/09 02:51:54 joerg Exp $
.\"
.\" Copyright (c) 2003 The NetBSD Foundation, Inc.
.\" All rights reserved.
.\"
.\" This code is derived from software contributed to The NetBSD Foundation
.\" by Gregory McGarry.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
.\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
.\" TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
.\" PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
.\" BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
.\" INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
.\" CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
.\" ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
.\" POSSIBILITY OF SUCH DAMAGE.
.Nd Native Language Support Overview
Native Language Support (NLS) provides commands for a single
worldwide operating system base.
An internationalized system has no built-in assumptions or dependencies
on language-specific or culture-specific conventions such as:
.Bl -bullet -offset indent -compact
.It
Character classifications
.It
Character comparison rules
.It
Character collation order
.It
Numeric and monetary formatting
.It
Date and time formatting
.El
.Pp
All information pertaining to cultural conventions and language is
obtained at program run time.
.Pp
.Dq Internationalization
refers to the operation by which system software is developed to support
multiple culture-specific and language-specific conventions.
This is a generalization process by which the system is untied from
calling only English strings or other English-specific conventions.
.Dq Localization
refers to the operations by which the user environment is customized to
handle its input and output appropriately for specific language and cultural
conventions.
This is a specialization process, by which generic methods already
implemented in an internationalized system are used in specific ways.
The formal description of cultural conventions for some country, together
with all associated translations targeted to the native language, is
called the
.Dq locale .
.Pp
.Nx
provides extensive support to programmers and system developers to
enable internationalized software to be developed.
.Nx
also supplies a large variety of locales for system localization.
.Ss Localization of Information
All locale information is accessible to programs at run time so that
data is processed and displayed correctly for specific cultural
conventions and language.
.Pp
A locale is divided into categories.
A category is a group of language-specific and culture-specific conventions
as outlined in the list above.
ISO C and POSIX together specify the following six standard categories
supported by
.Nx :
.Bl -tag -compact -width ".Ev LC_MONETARY"
.It Ev LC_COLLATE
string-collation order information
.It Ev LC_CTYPE
character classification, case conversion, and other character attributes
.It Ev LC_MESSAGES
the format for affirmative and negative responses
.It Ev LC_MONETARY
rules and symbols for formatting monetary numeric information
.It Ev LC_NUMERIC
rules and symbols for formatting nonmonetary numeric information
.It Ev LC_TIME
rules and symbols for formatting time and date information
.El
.Pp
Localization of the system is achieved by setting appropriate values
in environment variables to identify which locale should be used.
The environment variables have the same names as their respective
locale categories.
Additionally, the
.Ev LANG ,
.Ev LC_ALL ,
and
.Ev NLSPATH
environment variables are used.
The
.Ev NLSPATH
environment variable specifies a colon-separated list of directory names
where the message catalog files of the NLS database are located.
The
.Ev LANG
and
.Ev LC_ALL
environment variables also determine the current locale.
.Pp
The values of these environment variables are strings of the form:
.Bd -literal -offset indent
language[_territory][.codeset][@modifier]
.Ed
.Pp
Valid values for the language field come from the ISO 639 standard, which
defines two-character codes for many languages.
Some common language codes are:
.Bl -column "PERSIAN (farsi)" "Sy Code" "OCEANIC/INDONESIAN"
.It Sy Language Name Ta Sy Code Ta Sy Language Family
.It ABKHAZIAN Ta AB Ta IBERO-CAUCASIAN
.It AFAN (OROMO) Ta OM Ta HAMITIC
.It AFAR Ta AA Ta HAMITIC
.It AFRIKAANS Ta AF Ta GERMANIC
.It ALBANIAN Ta SQ Ta INDO-EUROPEAN (OTHER)
.It AMHARIC Ta AM Ta SEMITIC
.It ARABIC Ta AR Ta SEMITIC
.It ARMENIAN Ta HY Ta INDO-EUROPEAN (OTHER)
.It ASSAMESE Ta AS Ta INDIAN
.It AYMARA Ta AY Ta AMERINDIAN
.It AZERBAIJANI Ta AZ Ta TURKIC/ALTAIC
.It BASHKIR Ta BA Ta TURKIC/ALTAIC
.It BASQUE Ta EU Ta BASQUE
.It BENGALI Ta BN Ta INDIAN
.It BHUTANI Ta DZ Ta ASIAN
.It BIHARI Ta BH Ta INDIAN
.It BISLAMA Ta BI Ta ""
.It BRETON Ta BR Ta CELTIC
.It BULGARIAN Ta BG Ta SLAVIC
.It BURMESE Ta MY Ta ASIAN
.It BYELORUSSIAN Ta BE Ta SLAVIC
.It CAMBODIAN Ta KM Ta ASIAN
.It CATALAN Ta CA Ta ROMANCE
.It CHINESE Ta ZH Ta ASIAN
.It CORSICAN Ta CO Ta ROMANCE
.It CROATIAN Ta HR Ta SLAVIC
.It CZECH Ta CS Ta SLAVIC
.It DANISH Ta DA Ta GERMANIC
.It DUTCH Ta NL Ta GERMANIC
.It ENGLISH Ta EN Ta GERMANIC
.It ESPERANTO Ta EO Ta INTERNATIONAL AUX.
.It ESTONIAN Ta ET Ta FINNO-UGRIC
.It FAROESE Ta FO Ta GERMANIC
.It FIJI Ta FJ Ta OCEANIC/INDONESIAN
.It FINNISH Ta FI Ta FINNO-UGRIC
.It FRENCH Ta FR Ta ROMANCE
.It FRISIAN Ta FY Ta GERMANIC
.It GALICIAN Ta GL Ta ROMANCE
.It GEORGIAN Ta KA Ta IBERO-CAUCASIAN
.It GERMAN Ta DE Ta GERMANIC
.It GREEK Ta EL Ta LATIN/GREEK
.It GREENLANDIC Ta KL Ta ESKIMO
.It GUARANI Ta GN Ta AMERINDIAN
.It GUJARATI Ta GU Ta INDIAN
.It HAUSA Ta HA Ta NEGRO-AFRICAN
.It HEBREW Ta HE Ta SEMITIC
.It HINDI Ta HI Ta INDIAN
.It HUNGARIAN Ta HU Ta FINNO-UGRIC
.It ICELANDIC Ta IS Ta GERMANIC
.It INDONESIAN Ta ID Ta OCEANIC/INDONESIAN
.It INTERLINGUA Ta IA Ta INTERNATIONAL AUX.
.It INTERLINGUE Ta IE Ta INTERNATIONAL AUX.
.It INUKTITUT Ta IU Ta ""
.It INUPIAK Ta IK Ta ESKIMO
.It IRISH Ta GA Ta CELTIC
.It ITALIAN Ta IT Ta ROMANCE
.It JAPANESE Ta JA Ta ASIAN
.It JAVANESE Ta JV Ta OCEANIC/INDONESIAN
.It KANNADA Ta KN Ta DRAVIDIAN
.It KASHMIRI Ta KS Ta INDIAN
.It KAZAKH Ta KK Ta TURKIC/ALTAIC
.It KINYARWANDA Ta RW Ta NEGRO-AFRICAN
.It KIRGHIZ Ta KY Ta TURKIC/ALTAIC
.It KURUNDI Ta RN Ta NEGRO-AFRICAN
.It KOREAN Ta KO Ta ASIAN
.It KURDISH Ta KU Ta IRANIAN
.It LAOTHIAN Ta LO Ta ASIAN
.It LATIN Ta LA Ta LATIN/GREEK
.It LATVIAN Ta LV Ta BALTIC
.It LINGALA Ta LN Ta NEGRO-AFRICAN
.It LITHUANIAN Ta LT Ta BALTIC
.It MACEDONIAN Ta MK Ta SLAVIC
.It MALAGASY Ta MG Ta OCEANIC/INDONESIAN
.It MALAY Ta MS Ta OCEANIC/INDONESIAN
.It MALAYALAM Ta ML Ta DRAVIDIAN
.It MALTESE Ta MT Ta SEMITIC
.It MAORI Ta MI Ta OCEANIC/INDONESIAN
.It MARATHI Ta MR Ta INDIAN
.It MOLDAVIAN Ta MO Ta ROMANCE
.It MONGOLIAN Ta MN Ta ""
.It NAURU Ta NA Ta ""
.It NEPALI Ta NE Ta INDIAN
.It NORWEGIAN Ta NO Ta GERMANIC
.It OCCITAN Ta OC Ta ROMANCE
.It ORIYA Ta OR Ta INDIAN
.It PASHTO Ta PS Ta IRANIAN
.It PERSIAN (farsi) Ta FA Ta IRANIAN
.It POLISH Ta PL Ta SLAVIC
.It PORTUGUESE Ta PT Ta ROMANCE
.It PUNJABI Ta PA Ta INDIAN
.It QUECHUA Ta QU Ta AMERINDIAN
.It RHAETO-ROMANCE Ta RM Ta ROMANCE
.It ROMANIAN Ta RO Ta ROMANCE
.It RUSSIAN Ta RU Ta SLAVIC
.It SAMOAN Ta SM Ta OCEANIC/INDONESIAN
.It SANGHO Ta SG Ta NEGRO-AFRICAN
.It SANSKRIT Ta SA Ta INDIAN
.It SCOTS GAELIC Ta GD Ta CELTIC
.It SERBIAN Ta SR Ta SLAVIC
.It SERBO-CROATIAN Ta SH Ta SLAVIC
.It SESOTHO Ta ST Ta NEGRO-AFRICAN
.It SETSWANA Ta TN Ta NEGRO-AFRICAN
.It SHONA Ta SN Ta NEGRO-AFRICAN
.It SINDHI Ta SD Ta INDIAN
.It SINGHALESE Ta SI Ta INDIAN
.It SISWATI Ta SS Ta NEGRO-AFRICAN
.It SLOVAK Ta SK Ta SLAVIC
.It SLOVENIAN Ta SL Ta SLAVIC
.It SOMALI Ta SO Ta HAMITIC
.It SPANISH Ta ES Ta ROMANCE
.It SUNDANESE Ta SU Ta OCEANIC/INDONESIAN
.It SWAHILI Ta SW Ta NEGRO-AFRICAN
.It SWEDISH Ta SV Ta GERMANIC
.It TAGALOG Ta TL Ta OCEANIC/INDONESIAN
.It TAJIK Ta TG Ta IRANIAN
.It TAMIL Ta TA Ta DRAVIDIAN
.It TATAR Ta TT Ta TURKIC/ALTAIC
.It TELUGU Ta TE Ta DRAVIDIAN
.It THAI Ta TH Ta ASIAN
.It TIBETAN Ta BO Ta ASIAN
.It TIGRINYA Ta TI Ta SEMITIC
.It TONGA Ta TO Ta OCEANIC/INDONESIAN
.It TSONGA Ta TS Ta NEGRO-AFRICAN
.It TURKISH Ta TR Ta TURKIC/ALTAIC
.It TURKMEN Ta TK Ta TURKIC/ALTAIC
.It TWI Ta TW Ta NEGRO-AFRICAN
.It UIGUR Ta UG Ta ""
.It UKRAINIAN Ta UK Ta SLAVIC
.It URDU Ta UR Ta INDIAN
.It UZBEK Ta UZ Ta TURKIC/ALTAIC
.It VIETNAMESE Ta VI Ta ASIAN
.It VOLAPUK Ta VO Ta INTERNATIONAL AUX.
.It WELSH Ta CY Ta CELTIC
.It WOLOF Ta WO Ta NEGRO-AFRICAN
.It XHOSA Ta XH Ta NEGRO-AFRICAN
.It YIDDISH Ta YI Ta GERMANIC
.It YORUBA Ta YO Ta NEGRO-AFRICAN
.It ZHUANG Ta ZA Ta ""
.It ZULU Ta ZU Ta NEGRO-AFRICAN
.El
.Pp
For example, the locale for the Danish language spoken in Denmark
using the ISO 8859-1 character set is da_DK.ISO8859-1.
The da stands for the Danish language and the DK stands for Denmark.
The short form of da_DK is sufficient to indicate this locale.
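.Pp
The locale name format described above can be split into its fields
with a few standard string calls.
The following is only a sketch: the
struct locale_name type and the parse_locale_name() helper are
hypothetical names invented for this illustration and are not part of
any system library.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper type: the four fields of a locale name. */
struct locale_name {
	char language[16];
	char territory[16];
	char codeset[32];
	char modifier[32];
};

/* Copy at most dstsize - 1 bytes of src and terminate the result. */
static void
copy_field(char *dst, size_t dstsize, const char *src, size_t len)
{
	if (len >= dstsize)
		len = dstsize - 1;
	memcpy(dst, src, len);
	dst[len] = 0;
}

/*
 * Split "language[_territory][.codeset][@modifier]" into its fields.
 * Fields that are absent from the name are left as empty strings.
 */
static void
parse_locale_name(const char *name, struct locale_name *out)
{
	size_t n = strcspn(name, "_.@");
	const char *p = name + n;

	memset(out, 0, sizeof(*out));
	copy_field(out->language, sizeof(out->language), name, n);

	if (*p == '_') {
		p++;
		n = strcspn(p, ".@");
		copy_field(out->territory, sizeof(out->territory), p, n);
		p += n;
	}
	if (*p == '.') {
		p++;
		n = strcspn(p, "@");
		copy_field(out->codeset, sizeof(out->codeset), p, n);
		p += n;
	}
	if (*p == '@') {
		p++;
		copy_field(out->modifier, sizeof(out->modifier), p, strlen(p));
	}
}
```

Given da_DK.ISO8859-1, this sketch yields the language da, the
territory DK, the codeset ISO8859-1, and an empty modifier.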
.Pp
The environment variable settings are queried by their priority level
in the following manner:
.Bl -enum
.It
If the
.Ev LC_ALL
environment variable is set, all six categories use the locale it
specifies.
.It
If the
.Ev LC_ALL
environment variable is not set, each individual category uses the
locale specified by its corresponding environment variable.
.It
If the
.Ev LC_ALL
environment variable is not set, and a value for a particular
.Ev LC_*
environment variable is not set, the value of the
.Ev LANG
environment variable specifies the default locale for all categories.
The
.Ev LANG
environment variable should be set in /etc/profile, since that makes it
easiest for the user to override the system default using the individual
.Ev LC_*
environment variables.
.It
If the
.Ev LC_ALL
environment variable is not set, a value for a particular
.Ev LC_*
environment variable is not set, and the value of the
.Ev LANG
environment variable is not set, the locale for that specific
category defaults to the C locale.
The C or POSIX locale assumes the ASCII character set and defines
information for the six categories.
.El
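.Pp
The lookup order above can be sketched in a few lines of C.
This is an illustration of the documented precedence only, not the
code the C library actually runs; effective_locale() is a hypothetical
helper, and treating an empty value as unset is an assumption taken
from the POSIX description of these variables.

```c
#include <stdlib.h>

/*
 * Return the locale name in effect for one category, following the
 * order LC_ALL, then the category's own variable, then LANG, and
 * finally the "C" locale.
 * category_var is the name of the variable, for example "LC_TIME".
 */
static const char *
effective_locale(const char *category_var)
{
	const char *names[3];
	size_t i;

	names[0] = "LC_ALL";
	names[1] = category_var;
	names[2] = "LANG";

	for (i = 0; i < 3; i++) {
		const char *value = getenv(names[i]);

		/* Assumption: an empty value counts as unset. */
		if (value != NULL && value[0] != 0)
			return value;
	}
	return "C";
}
```

With only LANG set, every category resolves to the value of LANG;
setting LC_TIME as well changes the result for that category alone.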
.Pp
A character is any symbol used for the organization, control, or
representation of data.
A group of such symbols used to describe a
particular language makes up a character set.
It is the encoding values in a character set that provide
the interface between the system and its input and output devices.
.Pp
The following character sets are supported in
.Nx :
.Bl -tag -width ISO_8859_family
.It ASCII
The American Standard Code for Information Interchange (ASCII) standard
specifies 128 Roman characters and control codes, encoded in a 7-bit
character encoding scheme.
.It ISO_8859 family
Industry-standard character sets specified by the ISO/IEC 8859
standard.
The standard is divided into 15 numbered parts, with each
part specifying broad script similarities.
Examples include Western European, Central European, Arabic, Cyrillic,
Hebrew, Greek, and Turkish.
The character sets use an 8-bit character encoding scheme which is
compatible with the ASCII character set.
.It Unicode
The Unicode character set is the full set of known abstract characters of
all real-world scripts.
It can be used in environments where multiple
scripts must be processed simultaneously.
Unicode is compatible with ISO 8859-1 (Western European) and ASCII.
Many character encoding schemes are available for Unicode, including UTF-8,
UTF-16, and UTF-32.
These encoding schemes are multi-byte encodings.
The UTF-8 encoding scheme uses 8-bit, variable-width encodings and is
compatible with ASCII.
The UTF-16 encoding scheme uses 16-bit, variable-width encodings.
The UTF-32 encoding scheme uses 32-bit, fixed-width encodings.
.El
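.Pp
To make the variable-width nature of UTF-8 concrete, the following
sketch encodes a single Unicode code point by hand.
The utf8_encode() function is a hypothetical helper written for this
page; programs would normally rely on the C library's multi-byte
conversion functions instead.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Encode one Unicode code point as UTF-8 into out.
 * Returns the number of bytes written (1 to 4), or 0 if cp is a
 * surrogate or lies above U+10FFFF.
 */
static size_t
utf8_encode(uint32_t cp, unsigned char out[4])
{
	if (cp < 0x80) {
		/* One byte, identical to the ASCII encoding. */
		out[0] = (unsigned char)cp;
		return 1;
	}
	if (cp < 0x800) {
		out[0] = (unsigned char)(0xC0 | (cp >> 6));
		out[1] = (unsigned char)(0x80 | (cp & 0x3F));
		return 2;
	}
	if (cp >= 0xD800 && cp <= 0xDFFF)
		return 0;	/* surrogates are not characters */
	if (cp < 0x10000) {
		out[0] = (unsigned char)(0xE0 | (cp >> 12));
		out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
		out[2] = (unsigned char)(0x80 | (cp & 0x3F));
		return 3;
	}
	if (cp <= 0x10FFFF) {
		out[0] = (unsigned char)(0xF0 | (cp >> 18));
		out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
		out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
		out[3] = (unsigned char)(0x80 | (cp & 0x3F));
		return 4;
	}
	return 0;
}
```

Code points below U+0080 come out as a single byte equal to their
ASCII value, which is what makes UTF-8 compatible with ASCII.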
.Pp
A font set contains the glyphs to be displayed on the screen for a
corresponding character in a character set.
A display must support a suitable font to display a character set.
If suitable fonts are available to the X server, then X clients can
include support for different character sets.
includes support for Unicode with UTF-8 encoding.
is useful for displaying all the characters in an X font.
console provides support for loading a variety of fonts using the
utility.
Available fonts can be found in
.Pa /usr/share/syscons/fonts .
.Ss Internationalization for Programmers
To facilitate translations of messages into various languages and to
make the translated messages available to the program based on a
user's locale, it is necessary to keep messages separate from the
programs and provide them in the form of message catalogs that a
program can access at run time.
.Pp
Access to locale information is provided through the
See their respective man pages for further information.
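.Pp
As one illustration of obtaining locale information at run time, the
sketch below uses the ISO C functions setlocale() and localeconv().
The helper names decimal_point_for() and format_number() are
hypothetical, and the choice of these two library calls is an
assumption made for the example rather than a list of the interfaces
this page refers to.

```c
#include <locale.h>
#include <stdio.h>

/*
 * Return the decimal-point string of the named locale, or NULL if
 * the locale cannot be selected.  An empty name selects the locale
 * given by the environment.
 */
static const char *
decimal_point_for(const char *locale_name)
{
	if (setlocale(LC_NUMERIC, locale_name) == NULL)
		return NULL;
	return localeconv()->decimal_point;
}

/*
 * Format value with two decimals under the named locale.
 * Returns 0 on success, -1 if the locale cannot be selected.
 */
static int
format_number(const char *locale_name, double value, char *buf,
    size_t bufsize)
{
	if (setlocale(LC_NUMERIC, locale_name) == NULL)
		return -1;
	/* printf-family functions honour the LC_NUMERIC decimal point. */
	snprintf(buf, bufsize, "%.2f", value);
	return 0;
}
```

A program would typically pass an empty locale name so that the
user's environment variables decide the result.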
.Pp
Message source files containing application messages are created by
the programmer and converted to message catalogs.
These catalogs are used by the application to retrieve and display
messages.
.Pp
.Nx
supports two message catalog interfaces: the X/Open
.Xr catgets 3
interface and the Uniforum
.Xr gettext 3
interface.
The
.Xr catgets 3
interface has the advantage that it belongs to a standard.
Unfortunately, the interface is complicated to use and
maintenance of the catalogs is difficult.
The implementation also doesn't support different character sets.
The
.Xr gettext 3
interface has not been standardized yet; however, it is being supported
by an increasing number of systems.
It also provides many additional tools which make programming and
catalog maintenance much easier.
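.Pp
A minimal sketch of a lookup through the X/Open catalog interface
follows.
The catalog_message() wrapper and its buffer-copy convention are
hypothetical; catopen(), catgets(), and catclose() are the interface
functions declared in nl_types.h.
The catalog name and the numeric set and message identifiers are
chosen by the application.

```c
#include <nl_types.h>
#include <stdio.h>

/*
 * Copy message msg_id of set set_id from the named catalog into buf.
 * If the catalog or the message is unavailable, the fallback text is
 * copied instead.
 */
static void
catalog_message(const char *catalog, int set_id, int msg_id,
    const char *fallback, char *buf, size_t bufsize)
{
	const char *msg = fallback;
	nl_catd catd = catopen(catalog, NL_CAT_LOCALE);

	if (catd != (nl_catd)-1)
		msg = catgets(catd, set_id, msg_id, fallback);

	/* Copy before closing: the text may live inside the catalog. */
	snprintf(buf, bufsize, "%s", msg);

	if (catd != (nl_catd)-1)
		catclose(catd);
}
```

Passing the untranslated text as the fallback keeps the program
usable when no catalog is installed for the user's locale.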
.Ss Support for Multi-byte Encodings
Some character sets with multi-byte encodings may be difficult to decode,
or may contain state (i.e., adjacent characters are dependent).
ISO C specifies a set of functions using 'wide characters' which can handle
multi-byte encodings properly.
The behaviour of these functions is affected
by the
.Ev LC_CTYPE
category of the current locale.
.Pp
A wide character is specified in ISO C
as being a fixed number of bits wide and is stateless.
There are two types for wide characters:
.Bl -tag -width wchar_t
.It Vt wchar_t
is a type which can contain one wide character and operates like the 'char'
type does for one character.
.It Vt wint_t
can contain one wide character or WEOF (wide EOF).
.El
.Pp
There are functions that operate on
.Vt wchar_t
and substitute for functions operating on 'char'.
There are some additional functions that operate on
.Pp
Wide characters should be used for all I/O processing which may rely
on locale-specific strings.
The two primary issues requiring special use of wide characters are:
.Bl -bullet -offset indent
.It
All I/O is performed using multi-byte characters.
Input data is converted into wide characters immediately after
reading and data for output is converted from wide characters to
multi-byte encoding immediately before writing.
Conversion is controlled by the
.It
Wide characters are used directly for I/O, using
They are also used for formatted I/O functions for wide characters
and wide character conversion specifiers %lc, %C, %ls, and %S for
conventional formatted I/O functions.
.El
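.Pp
The first approach, converting to wide characters right after input
and back to a multi-byte encoding right before output, might look like
the sketch below.
The upcase_multibyte() helper and its fixed 256-element work buffer are
assumptions made for the example; mbstowcs(), towupper(), and
wcstombs() are the ISO C functions doing the work.
The caller is assumed to have selected a locale with setlocale()
beforehand.

```c
#include <stdlib.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Convert a multi-byte string to wide characters, upper-case each
 * wide character, and convert the result back into out.
 * Returns 0 on success, or -1 on an invalid multi-byte sequence or
 * when either buffer is too small.
 */
static int
upcase_multibyte(const char *in, char *out, size_t outsize)
{
	wchar_t wide[256];
	size_t n, i;

	/* Input side: multi-byte to wide, governed by LC_CTYPE. */
	n = mbstowcs(wide, in, 256);
	if (n == (size_t)-1 || n >= 256)
		return -1;

	/* Processing happens on fixed-width wide characters. */
	for (i = 0; i < n; i++)
		wide[i] = (wchar_t)towupper((wint_t)wide[i]);

	/* Output side: wide back to multi-byte. */
	n = wcstombs(out, wide, outsize);
	if (n == (size_t)-1 || n >= outsize)
		return -1;
	return 0;
}
```

Because the case conversion runs on whole wide characters, it never
splits a multi-byte sequence in the middle.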
.Xr gettext 3 Pq Pa devel/gettext ,
This man page is incomplete.