Copyright (C) 2006, 2007, 2008, 2009 Free Software Foundation, Inc.

Permission is granted to make and distribute verbatim copies of
this manual provided the copyright notice and this permission notice
are preserved on all copies.

Permission is granted to copy and distribute modified versions of this
manual under the conditions for verbatim copying, provided that the
entire resulting derived work is distributed under the terms of a
permission notice identical to this one.

Permission is granted to copy and distribute translations of this
manual into another language, under the above conditions for modified
versions, except that this permission notice may be included in
translations approved by the Free Software Foundation instead of in
.TH PRECONV @MAN1EXT@ "@MDATE@" "Groff Version @VERSION@"
preconv \- convert encoding of input files to something GNU troff understands
It is possible to have whitespace between the
command line option and its parameter.
and converts its encoding(s) to a form GNU
can process, sending the data to standard output.
Currently, this means ASCII characters and `\e[uXXXX]' entities, where
`XXXX' is a hexadecimal number with four to six digits, representing a
should be invoked with the
Emit debugging messages to standard error (mainly the encoding used).
Specify the default encoding to use if everything else fails (see below).
Specify the input encoding explicitly, overriding all other methods.
uses the algorithm described below to select the input encoding.
Do not add .lf requests.
Print the version number.
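The output form described above, plain ASCII plus `\e[uXXXX]' entities, can be illustrated with a short C sketch.
It is not taken from the preconv sources.
It assumes the input has already been decoded into Unicode code points, and the function name format_code_point is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Sketch only, not preconv's source: write one Unicode code point to
   `out' in the form described above.  ASCII code points (below 0x80)
   are copied unchanged; anything else becomes a \[uXXXX] entity with
   four to six uppercase hexadecimal digits.  Returns the length of the
   formatted text as reported by snprintf. */
static size_t format_code_point(unsigned long cp, char *out, size_t outsize)
{
    int n;

    if (cp < 0x80)
        n = snprintf(out, outsize, "%c", (int) cp);
    else
        /* %04lX pads to at least four digits; code points never
           exceed U+10FFFF, so at most six digits are produced. */
        n = snprintf(out, outsize, "\\[u%04lX]", cp);
    return n < 0 ? 0 : (size_t) n;
}
```

For example, U+00E4 becomes `\e[u00E4]' and U+1F600 becomes `\e[u1F600]', which stays within the four-to-six-digit range stated above.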
tries to find the input encoding with the following algorithm.
If the input encoding has been explicitly specified with option
Otherwise, check whether the input starts with a
Finally, check whether there is a known
(see below) in either the first or second input line.
If everything else fails, use a default encoding as given with option
by the current locale, or `latin1' if the locale is set to `C',
`POSIX', or empty (in that order).
environment variable which is ultimately expanded to option
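The selection order above amounts to a chain of fallbacks.
The following minimal C sketch shows that chain under a few assumptions.
Every function and parameter name is hypothetical, and each argument is NULL or empty when that source of information is unavailable.
The locale is represented by its name and by the codeset it implies.

```c
#include <stddef.h>
#include <string.h>

/* True if s carries a usable value. */
static int present(const char *s)
{
    return s != NULL && *s != '\0';
}

/* Sketch of the fallback chain described above; all names are
   hypothetical.  Pass NULL (or "") for any unavailable source.
   locale_name is the current locale (e.g. "de_DE.UTF-8") and
   locale_codeset the encoding it implies. */
static const char *select_encoding(const char *explicit_enc, /* -e */
                                   const char *bom_enc,      /* BOM */
                                   const char *tag_enc,      /* coding tag */
                                   const char *default_enc,  /* -D */
                                   const char *locale_name,
                                   const char *locale_codeset)
{
    if (present(explicit_enc))
        return explicit_enc;
    if (present(bom_enc))
        return bom_enc;
    if (present(tag_enc))
        return tag_enc;
    if (present(default_enc))
        return default_enc;
    /* A `C', `POSIX', or empty locale falls back to latin1.  Assumption:
       a locale without a known codeset does the same. */
    if (present(locale_name)
        && strcmp(locale_name, "C") != 0
        && strcmp(locale_name, "POSIX") != 0
        && present(locale_codeset))
        return locale_codeset;
    return "latin1";
}
```

Each step is consulted only when all earlier ones yielded nothing.
This is why an explicit encoding overrides a BOM, and a BOM overrides a coding tag.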
.SS "Byte Order Mark"
The Unicode Standard defines character U+FEFF as the Byte Order Mark
On the other hand, value U+FFFE is guaranteed not to be a Unicode character at
This makes it possible to detect the byte order within the data stream (either
big-endian or little-endian), and the MIME encodings \%`UTF-16' and
\%`UTF-32' mandate that the data stream starts with U+FEFF.
Similarly, a data stream encoded as \%`UTF-8' may start with a BOM (to
ease conversion from and to \%UTF-16 and \%UTF-32).
In all cases, the byte order mark is
.I not
part of the data but part of the encoding protocol; in other words,
output doesn't contain it.
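Checking for a byte order mark only requires comparing the first few input bytes with the serialized forms of U+FEFF.
The following C sketch shows this; it is not preconv's actual code, and the function name detect_bom is hypothetical.
It returns the length of the mark so that a caller can skip it, since the BOM is not copied to the output.

```c
#include <stddef.h>
#include <string.h>

/* Sketch, not preconv's code: report which encoding a leading byte
   order mark announces and store the BOM's length in *bom_len so the
   caller can skip it.  Returns NULL, with *bom_len set to 0, if the
   input does not start with a BOM. */
static const char *detect_bom(const unsigned char *buf, size_t len,
                              size_t *bom_len)
{
    /* U+FEFF serialized in each encoding scheme.  UTF-32LE must come
       before UTF-16LE because FF FE 00 00 begins with FF FE. */
    static const struct {
        const char *name;
        unsigned char bytes[4];
        size_t n;
    } marks[] = {
        { "UTF-32BE", { 0x00, 0x00, 0xFE, 0xFF }, 4 },
        { "UTF-32LE", { 0xFF, 0xFE, 0x00, 0x00 }, 4 },
        { "UTF-8",    { 0xEF, 0xBB, 0xBF },       3 },
        { "UTF-16BE", { 0xFE, 0xFF },             2 },
        { "UTF-16LE", { 0xFF, 0xFE },             2 },
    };
    size_t i;

    for (i = 0; i < sizeof marks / sizeof marks[0]; i++) {
        if (len >= marks[i].n
            && memcmp(buf, marks[i].bytes, marks[i].n) == 0) {
            *bom_len = marks[i].n;
            return marks[i].name;
        }
    }
    *bom_len = 0;
    return NULL;
}
```

Because of the ordering noted in the code, \%UTF-16LE data whose first character after the BOM is U+0000 cannot be told apart from \%UTF-32LE by this test alone.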
Note that a U+FEFF that does not occur at the start of the input data is emitted;
in that case it has the meaning of a `zero width no-break space' character \[en]
something not normally needed in
Editors that support more than one character encoding need tags
within the input files to mark each file's encoding.
While it is possible to guess the right input encoding with the help of
heuristic algorithms for data containing a substantial amount of
natural-language text, the result is still just a guess.
Moreover, all such algorithms fail easily on input that is either too
short or not natural-language text.
supports the coding tag convention (with some restrictions) as used by
(and probably other programs too).
are stored in so-called
.IR "File Variables" .
recognizes the following syntax, which must be placed in a troff comment
in the first or second line.
The only relevant tag for
is `coding', which can take the values listed below.
Here is an example line that tells
to edit a file in troff mode and to use \%latin2 as its encoding.
\&.\[rs]" \-*\- mode: troff; coding: latin-2 \-*\-
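The following simplified C sketch extracts the `coding' value from a line like the one above, as it appears in the raw input file without the escapes used to display it here.
It assumes the line begins with the troff comment `.\e"' and that the variables lie between the first two `-*-' markers.
The function name find_coding_tag is hypothetical, and the real parser may accept more variants.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Simplified sketch: look for `coding: VALUE' between a pair of -*-
   markers in a line that starts with the troff comment .\" and copy
   VALUE into out.  Returns 1 on success, 0 if there is no such tag.
   This grammar is an assumption, narrower than what Emacs accepts. */
static int find_coding_tag(const char *line, char *out, size_t outsize)
{
    const char *start, *end, *p;
    size_t n = 0;

    if (outsize == 0 || strncmp(line, ".\\\"", 3) != 0)
        return 0;                       /* not a troff comment line */
    start = strstr(line, "-*-");
    if (start == NULL)
        return 0;
    start += 3;
    end = strstr(start, "-*-");
    if (end == NULL)
        return 0;                       /* unterminated variable block */
    p = strstr(start, "coding:");
    if (p == NULL || p >= end)
        return 0;                       /* no coding tag inside -*- ... -*- */
    p += strlen("coding:");
    while (p < end && isspace((unsigned char) *p))
        p++;
    /* The value ends at `;', whitespace, or the closing -*-. */
    while (p < end && *p != ';' && !isspace((unsigned char) *p)
           && n + 1 < outsize)
        out[n++] = *p++;
    out[n] = '\0';
    return n > 0;
}
```

A caller would apply this to the first input line and, if no tag is found there, to the second line only.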
The following list gives all MIME coding tags (either lowercase or
uppercase) supported by
this list is hard-coded in the source.
\%big5, \%cp1047, \%euc-jp, \%euc-kr, \%gb2312, \%iso-8859-1, \%iso-8859-2,
\%iso-8859-5, \%iso-8859-7, \%iso-8859-9, \%iso-8859-13, \%iso-8859-15,
\%koi8-r, \%us-ascii, \%utf-8, \%utf-16, \%utf-16be, \%utf-16le
In addition, the following hard-coded list of other tags is recognized;
each of them ultimately maps to a value from the list above.
\%ascii, \%chinese-big5, \%chinese-euc, \%chinese-iso-8bit, \%cn-big5,
\%cn-gb, \%cn-gb-2312, \%cp878, \%csascii, \%csisolatin1,
\%cyrillic-iso-8bit, \%cyrillic-koi8, \%euc-china, \%euc-cn, \%euc-japan,
\%euc-japan-1990, \%euc-korea, \%greek-iso-8bit, \%iso-10646/utf8,
\%iso-10646/utf-8, \%iso-latin-1, \%iso-latin-2, \%iso-latin-5,
\%iso-latin-7, \%iso-latin-9, \%japanese-euc, \%japanese-iso-8bit, \%jis8,
\%koi8, \%korean-euc, \%korean-iso-8bit, \%latin-0, \%latin1, \%latin-1,
\%latin-2, \%latin-5, \%latin-7, \%latin-9, \%mule-utf-8, \%mule-utf-16,
\%mule-utf-16be, \%mule-utf-16-be, \%mule-utf-16be-with-signature,
\%mule-utf-16le, \%mule-utf-16-le, \%mule-utf-16le-with-signature, \%utf8,
\%utf-16-be, \%utf-16-be-with-signature, \%utf-16be-with-signature,
\%utf-16-le, \%utf-16-le-with-signature, \%utf-16le-with-signature
Those tags are taken from
together with some aliases.
Trailing \%`-dos', \%`-unix', and \%`-mac' suffixes of coding tags (which
give the end-of-line convention used in the file) are stripped off before
the comparison with the above tags happens.
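The normalization just described, dropping an end-of-line suffix and then comparing without regard to case, can be sketched in C as follows.
The table holds only a handful of entries chosen for illustration and is not preconv's hard-coded list.
All names in the sketch are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Illustrative subset only; preconv's real tables are much longer. */
static const struct {
    const char *tag;
    const char *mime;
} coding_tags[] = {
    { "us-ascii",   "us-ascii"   },   /* MIME tags map to themselves */
    { "iso-8859-1", "iso-8859-1" },
    { "utf-8",      "utf-8"      },
    { "ascii",      "us-ascii"   },   /* aliases */
    { "latin1",     "iso-8859-1" },
    { "latin-1",    "iso-8859-1" },
    { "latin-2",    "iso-8859-2" },
    { "utf8",       "utf-8"      },
};

/* Copy tag into buf in lowercase and strip one trailing -dos, -unix,
   or -mac suffix.  Returns 0 if buf is too small. */
static int normalize_tag(const char *tag, char *buf, size_t bufsize)
{
    static const char *const suffixes[] = { "-dos", "-unix", "-mac" };
    size_t len = strlen(tag);
    size_t i;

    if (len >= bufsize)
        return 0;
    for (i = 0; i < len; i++)
        buf[i] = (char) tolower((unsigned char) tag[i]);
    buf[len] = '\0';
    for (i = 0; i < sizeof suffixes / sizeof suffixes[0]; i++) {
        size_t slen = strlen(suffixes[i]);
        if (len > slen && strcmp(buf + len - slen, suffixes[i]) == 0) {
            buf[len - slen] = '\0';
            break;
        }
    }
    return 1;
}

/* Map a coding tag to its MIME name, or return NULL if unknown. */
static const char *lookup_coding_tag(const char *tag)
{
    char buf[64];
    size_t i;

    if (!normalize_tag(tag, buf, sizeof buf))
        return NULL;
    for (i = 0; i < sizeof coding_tags / sizeof coding_tags[0]; i++)
        if (strcmp(buf, coding_tags[i].tag) == 0)
            return coding_tags[i].mime;
    return NULL;
}
```

With this scheme, `Latin-1-unix' and `latin-1' both resolve to \%iso-8859-1.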
by itself supports only three encodings: \%latin-1, \%cp1047, and \%UTF-8;
all other encodings are passed to the
At compile time it is searched and checked for a valid
implementation; a call to `preconv \-\-version' shows whether
.I "local variable lists"
This is a different syntax for specifying local variables at the end of a
.BR groff (@MAN1EXT@)