contrib/xz/src/xz/xz.1

   1 '\" t
   2 .\"
   3 .\" Author: Lasse Collin
   4 .\"
   5 .\" This file has been put into the public domain.
   6 .\" You can do whatever you want with this file.
   7 .\"
   8 .TH XZ 1 "2017-04-19" "Tukaani" "XZ Utils"
   9 .
  10 .SH NAME
  11 xz, unxz, xzcat, lzma, unlzma, lzcat \- Compress or decompress .xz and .lzma files
  12 .
  13 .SH SYNOPSIS
  14 .B xz
  15 .RI [ option... ]
  16 .RI [ file... ]
  17 .
  18 .SH COMMAND ALIASES
  19 .B unxz
  20 is equivalent to
  21 .BR "xz \-\-decompress" .
  22 .br
  23 .B xzcat
  24 is equivalent to
  25 .BR "xz \-\-decompress \-\-stdout" .
  26 .br
  27 .B lzma
  28 is equivalent to
  29 .BR "xz \-\-format=lzma" .
  30 .br
  31 .B unlzma
  32 is equivalent to
  33 .BR "xz \-\-format=lzma \-\-decompress" .
  34 .br
  35 .B lzcat
  36 is equivalent to
  37 .BR "xz \-\-format=lzma \-\-decompress \-\-stdout" .
  38 .PP
  39 When writing scripts that need to decompress files,
  40 it is recommended to always use the name
  41 .B xz
  42 with appropriate arguments
  43 .RB ( "xz \-d"
  44 or
  45 .BR "xz \-dc" )
  46 instead of the names
  47 .B unxz
  48 and
  49 .BR xzcat .
  50 .
  51 .SH DESCRIPTION
  52 .B xz
  53 is a general-purpose data compression tool with
  54 command line syntax similar to
  55 .BR gzip (1)
  56 and
  57 .BR bzip2 (1).
  58 The native file format is the
  59 .B .xz
  60 format, but the legacy
  61 .B .lzma
  62 format used by LZMA Utils and
  63 raw compressed streams with no container format headers
  64 are also supported.
  65 .PP
  66 .B xz
  67 compresses or decompresses each
  68 .I file
  69 according to the selected operation mode.
  70 If no
  71 .I files
  72 are given or
  73 .I file
  74 is
  75 .BR \- ,
  76 .B xz
  77 reads from standard input and writes the processed data
  78 to standard output.
  79 .B xz
  80 will refuse (display an error and skip the
  81 .IR file )
  82 to write compressed data to standard output if it is a terminal.
  83 Similarly,
  84 .B xz
  85 will refuse to read compressed data
  86 from standard input if it is a terminal.
  87 .PP
  88 Unless
  89 .B \-\-stdout
  90 is specified,
  91 .I files
  92 other than
  93 .B \-
  94 are written to a new file whose name is derived from the source
  95 .I file
  96 name:
  97 .IP \(bu 3
  98 When compressing, the suffix of the target file format
  99 .RB ( .xz
 100 or
 101 .BR .lzma )
 102 is appended to the source filename to get the target filename.
 103 .IP \(bu 3
 104 When decompressing, the
 105 .B .xz
 106 or
 107 .B .lzma
 108 suffix is removed from the filename to get the target filename.
 109 .B xz
 110 also recognizes the suffixes
 111 .B .txz
 112 and
 113 .BR .tlz ,
 114 and replaces them with the
 115 .B .tar
 116 suffix.
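.PP
The naming rules above can be sketched in Python.
This is only an illustration under stated assumptions, not the code that
.B xz
itself uses: the function name
.B target_name
and its parameters are hypothetical, and the sketch returns
.B None
where
.B xz
would display a warning and skip the file because of its suffix.
.PP
.nf
```python
def target_name(source, decompress=False, fmt="xz"):
    """Return the target filename, or None if the suffix rules skip the file.

    Hypothetical helper: it models only the suffix handling described in
    the text, not the other checks (file type, hard links, mode bits).
    """
    if not decompress:
        # Suffixes that mean "already compressed in the target format".
        own = {"xz": (".xz", ".txz"), "lzma": (".lzma", ".tlz")}[fmt]
        if source.endswith(own):
            return None
        # Append the suffix of the target file format.
        return source + own[0]

    # When decompressing, strip .xz/.lzma and map .txz/.tlz to .tar.
    rules = ((".xz", ""), (".lzma", ""), (".txz", ".tar"), (".tlz", ".tar"))
    for suffix, replacement in rules:
        if source.endswith(suffix):
            return source[: -len(suffix)] + replacement
    return None  # no suffix of any supported file format


print(target_name("backup.tar"))                   # compress to .xz
print(target_name("backup.txz", decompress=True))  # decompress a .txz
```
.fi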
 117 .PP
 118 If the target file already exists, an error is displayed and the
 119 .I file
 120 is skipped.
 121 .PP
 122 Unless writing to standard output,
 123 .B xz
 124 will display a warning and skip the
 125 .I file
 126 if any of the following applies:
 127 .IP \(bu 3
 128 .I File
 129 is not a regular file.
 130 Symbolic links are not followed,
 131 and thus they are not considered to be regular files.
 132 .IP \(bu 3
 133 .I File
 134 has more than one hard link.
 135 .IP \(bu 3
 136 .I File
 137 has setuid, setgid, or sticky bit set.
 138 .IP \(bu 3
 139 The operation mode is set to compress and the
 140 .I file
 141 already has a suffix of the target file format
 142 .RB ( .xz
 143 or
 144 .B .txz
 145 when compressing to the
 146 .B .xz
 147 format, and
 148 .B .lzma
 149 or
 150 .B .tlz
 151 when compressing to the
 152 .B .lzma
 153 format).
 154 .IP \(bu 3
 155 The operation mode is set to decompress and the
 156 .I file
 157 doesn't have a suffix of any of the supported file formats
 158 .RB ( .xz ,
 159 .BR .txz ,
 160 .BR .lzma ,
 161 or
 162 .BR .tlz ).
 163 .PP
 164 After successfully compressing or decompressing the
 165 .IR file ,
 166 .B xz
 167 copies the owner, group, permissions, access time,
 168 and modification time from the source
 169 .I file
 170 to the target file.
 171 If copying the group fails, the permissions are modified
 172 so that the target file doesn't become accessible to users
 173 who didn't have permission to access the source
 174 .IR file .
 175 .B xz
 176 doesn't support copying other metadata like access control lists
 177 or extended attributes yet.
 178 .PP
 179 Once the target file has been successfully closed, the source
 180 .I file
 181 is removed unless
 182 .B \-\-keep
 183 was specified.
 184 The source
 185 .I file
 186 is never removed if the output is written to standard output.
 187 .PP
 188 Sending
 189 .B SIGINFO
 190 or
 191 .B SIGUSR1
 192 to the
 193 .B xz
 194 process makes it print progress information to standard error.
 195 This has only limited use since when standard error
 196 is a terminal, using
 197 .B \-\-verbose
 198 will display an automatically updating progress indicator.
 199 .
 200 .SS "Memory usage"
 201 The memory usage of
 202 .B xz
 203 varies from a few hundred kilobytes to several gigabytes
 204 depending on the compression settings.
 205 The settings used when compressing a file determine
 206 the memory requirements of the decompressor.
 207 Typically the decompressor needs 5\ % to 20\ % of
 208 the amount of memory that the compressor needed when
 209 creating the file.
 210 For example, decompressing a file created with
 211 .B xz \-9
 212 currently requires 65\ MiB of memory.
 213 Still, it is possible to have
 214 .B .xz
 215 files that require several gigabytes of memory to decompress.
 216 .PP
  217 Users of older systems in particular may find
  218 the possibility of very large memory usage annoying.
 219 To prevent uncomfortable surprises,
 220 .B xz
 221 has a built-in memory usage limiter, which is disabled by default.
 222 While some operating systems provide ways to limit
  223 the memory usage of processes, relying on them
 224 wasn't deemed to be flexible enough (e.g. using
 225 .BR ulimit (1)
 226 to limit virtual memory tends to cripple
 227 .BR mmap (2)).
 228 .PP
 229 The memory usage limiter can be enabled with
 230 the command line option \fB\-\-memlimit=\fIlimit\fR.
 231 Often it is more convenient to enable the limiter
 232 by default by setting the environment variable
 233 .BR XZ_DEFAULTS ,
 234 e.g.\&
 235 .BR XZ_DEFAULTS=\-\-memlimit=150MiB .
 236 It is possible to set the limits separately
 237 for compression and decompression
 238 by using \fB\-\-memlimit\-compress=\fIlimit\fR and
 239 \fB\-\-memlimit\-decompress=\fIlimit\fR.
 240 Using these two options outside
 241 .B XZ_DEFAULTS
 242 is rarely useful because a single run of
 243 .B xz
 244 cannot do both compression and decompression and
 245 .BI \-\-memlimit= limit
 246 (or \fB\-M\fR \fIlimit\fR)
 247 is shorter to type on the command line.
 248 .PP
 249 If the specified memory usage limit is exceeded when decompressing,
 250 .B xz
 251 will display an error and decompressing the file will fail.
 252 If the limit is exceeded when compressing,
 253 .B xz
 254 will try to scale the settings down so that the limit
 255 is no longer exceeded (except when using \fB\-\-format=raw\fR
 256 or \fB\-\-no\-adjust\fR).
 257 This way the operation won't fail unless the limit is very small.
 258 The scaling of the settings is done in steps that don't
 259 match the compression level presets, e.g. if the limit is
 260 only slightly less than the amount required for
 261 .BR "xz \-9" ,
 262 the settings will be scaled down only a little,
 263 not all the way down to
 264 .BR "xz \-8" .
 265 .
 266 .SS "Concatenation and padding with .xz files"
 267 It is possible to concatenate
 268 .B .xz
 269 files as is.
 270 .B xz
 271 will decompress such files as if they were a single
 272 .B .xz
 273 file.
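.PP
As a small illustration of concatenation, the following sketch uses
Python's standard
.B lzma
module (bindings to liblzma) rather than the
.B xz
command line tool; that choice is an assumption made for the example.
Two independently compressed
.B .xz
streams are joined byte for byte and then decompressed in a single call.
.PP
.nf
```python
import lzma

# Each call produces one complete .xz stream (the module's default format).
first = lzma.compress(b"hello ")
second = lzma.compress(b"world")

# Joining the two streams as is yields data that decompresses
# to the concatenation of both payloads.
combined = lzma.decompress(first + second)
print(combined)
```
.fi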
 274 .PP
 275 It is possible to insert padding between the concatenated parts
 276 or after the last part.
 277 The padding must consist of null bytes and the size
 278 of the padding must be a multiple of four bytes.
 279 This can be useful e.g. if the
 280 .B .xz
 281 file is stored on a medium that measures file sizes
 282 in 512-byte blocks.
 283 .PP
 284 Concatenation and padding are not allowed with
 285 .B .lzma
 286 files or raw streams.
 287 .
 288 .SH OPTIONS
 289 .
 290 .SS "Integer suffixes and special values"
 291 In most places where an integer argument is expected,
 292 an optional suffix is supported to easily indicate large integers.
 293 There must be no space between the integer and the suffix.
 294 .TP
 295 .B KiB
 296 Multiply the integer by 1,024 (2^10).
 297 .BR Ki ,
 298 .BR k ,
 299 .BR kB ,
 300 .BR K ,
 301 and
 302 .B KB
 303 are accepted as synonyms for
 304 .BR KiB .
 305 .TP
 306 .B MiB
 307 Multiply the integer by 1,048,576 (2^20).
 308 .BR Mi ,
 309 .BR m ,
 310 .BR M ,
 311 and
 312 .B MB
 313 are accepted as synonyms for
 314 .BR MiB .
 315 .TP
 316 .B GiB
 317 Multiply the integer by 1,073,741,824 (2^30).
 318 .BR Gi ,
 319 .BR g ,
 320 .BR G ,
 321 and
 322 .B GB
 323 are accepted as synonyms for
 324 .BR GiB .
 325 .PP
 326 The special value
 327 .B max
 328 can be used to indicate the maximum integer value
 329 supported by the option.
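.PP
The suffix rules can be summarized with a short Python sketch.
It is not the parser used by
.BR xz ;
the names
.B MULTIPLIERS
and
.B parse_size
are hypothetical, and the default used for
.B max
is a placeholder because the real maximum depends on the option.
.PP
.nf
```python
# Build the suffix table from the synonyms listed in the text.
MULTIPLIERS = {}
for power, names in (
    (10, "KiB Ki k kB K KB"),
    (20, "MiB Mi m M MB"),
    (30, "GiB Gi g G GB"),
):
    for name in names.split():
        MULTIPLIERS[name] = 2 ** power


def parse_size(text, maximum=2 ** 64 - 1):
    """Parse an integer with an optional suffix (hypothetical helper)."""
    if text == "max":
        return maximum  # placeholder: the real maximum is option-specific

    # Split the leading decimal digits from the suffix.
    index = 0
    while index < len(text) and text[index] in "0123456789":
        index += 1
    number, suffix = text[:index], text[index:]

    if not number:
        raise ValueError("missing integer: " + repr(text))
    # A space between the integer and the suffix ends up in the suffix,
    # so it is rejected here like any other unknown suffix.
    if suffix and suffix not in MULTIPLIERS:
        raise ValueError("unknown suffix: " + repr(suffix))
    return int(number) * MULTIPLIERS.get(suffix, 1)


print(parse_size("150MiB"))
```
.fi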
 330 .
 331 .SS "Operation mode"
 332 If multiple operation mode options are given,
 333 the last one takes effect.
 334 .TP
 335 .BR \-z ", " \-\-compress
 336 Compress.
 337 This is the default operation mode when no operation mode option
 338 is specified and no other operation mode is implied from
 339 the command name (for example,
 340 .B unxz
 341 implies
 342 .BR \-\-decompress ).
 343 .TP
 344 .BR \-d ", " \-\-decompress ", " \-\-uncompress
 345 Decompress.
 346 .TP
 347 .BR \-t ", " \-\-test
 348 Test the integrity of compressed
 349 .IR files .
 350 This option is equivalent to
 351 .B "\-\-decompress \-\-stdout"
 352 except that the decompressed data is discarded instead of being
 353 written to standard output.
 354 No files are created or removed.
 355 .TP
 356 .BR \-l ", " \-\-list
 357 Print information about compressed
 358 .IR files .
 359 No uncompressed output is produced,
 360 and no files are created or removed.
 361 In list mode, the program cannot read
 362 the compressed data from standard
 363 input or from other unseekable sources.
 364 .IP ""
 365 The default listing shows basic information about
 366 .IR files ,
 367 one file per line.
  368 To get more detailed information, also use the
 369 .B \-\-verbose
 370 option.
 371 For even more information, use
 372 .B \-\-verbose
 373 twice, but note that this may be slow, because getting all the extra
 374 information requires many seeks.
 375 The width of verbose output exceeds
 376 80 characters, so piping the output to e.g.\&
 377 .B "less\ \-S"
 378 may be convenient if the terminal isn't wide enough.
 379 .IP ""
 380 The exact output may vary between
 381 .B xz
 382 versions and different locales.
 383 For machine-readable output,
 384 .B \-\-robot \-\-list
 385 should be used.
 386 .
 387 .SS "Operation modifiers"
 388 .TP
 389 .BR \-k ", " \-\-keep
 390 Don't delete the input files.
 391 .TP
 392 .BR \-f ", " \-\-force
 393 This option has several effects:
 394 .RS
 395 .IP \(bu 3
 396 If the target file already exists,
 397 delete it before compressing or decompressing.
 398 .IP \(bu 3
 399 Compress or decompress even if the input is
 400 a symbolic link to a regular file,
 401 has more than one hard link,
 402 or has the setuid, setgid, or sticky bit set.
 403 The setuid, setgid, and sticky bits are not copied
 404 to the target file.
 405 .IP \(bu 3
 406 When used with
 407 .B \-\-decompress
 408 .BR \-\-stdout
 409 and
 410 .B xz
 411 cannot recognize the type of the source file,
 412 copy the source file as is to standard output.
 413 This allows
 414 .B xzcat
 415 .B \-\-force
 416 to be used like
 417 .BR cat (1)
 418 for files that have not been compressed with
 419 .BR xz .
  420 Note that in the future,
 421 .B xz
 422 might support new compressed file formats, which may make
 423 .B xz
 424 decompress more types of files instead of copying them as is to
 425 standard output.
 426 .BI \-\-format= format
 427 can be used to restrict
 428 .B xz
 429 to decompress only a single file format.
 430 .RE
 431 .TP
 432 .BR \-c ", " \-\-stdout ", " \-\-to\-stdout
 433 Write the compressed or decompressed data to
 434 standard output instead of a file.
 435 This implies
 436 .BR \-\-keep .
 437 .TP
 438 .B \-\-single\-stream
 439 Decompress only the first
 440 .B .xz
 441 stream, and
 442 silently ignore possible remaining input data following the stream.
 443 Normally such trailing garbage makes
 444 .B xz
 445 display an error.
 446 .IP ""
 447 .B xz
 448 never decompresses more than one stream from
 449 .B .lzma
 450 files or raw streams, but this option still makes
 451 .B xz
 452 ignore the possible trailing data after the
 453 .B .lzma
 454 file or raw stream.
 455 .IP ""
 456 This option has no effect if the operation mode is not
 457 .B \-\-decompress
 458 or
 459 .BR \-\-test .
 460 .TP
 461 .B \-\-no\-sparse
 462 Disable creation of sparse files.
 463 By default, if decompressing into a regular file,
 464 .B xz
 465 tries to make the file sparse if the decompressed data contains
 466 long sequences of binary zeros.
 467 It also works when writing to standard output
 468 as long as standard output is connected to a regular file
 469 and certain additional conditions are met to make it safe.
 470 Creating sparse files may save disk space and speed up
 471 the decompression by reducing the amount of disk I/O.
 472 .TP
 473 \fB\-S\fR \fI.suf\fR, \fB\-\-suffix=\fI.suf
 474 When compressing, use
 475 .I .suf
 476 as the suffix for the target file instead of
 477 .B .xz
 478 or
 479 .BR .lzma .
 480 If not writing to standard output and
 481 the source file already has the suffix
 482 .IR .suf ,
 483 a warning is displayed and the file is skipped.
 484 .IP ""
 485 When decompressing, recognize files with the suffix
 486 .I .suf
 487 in addition to files with the
 488 .BR .xz ,
 489 .BR .txz ,
 490 .BR .lzma ,
 491 or
 492 .B .tlz
 493 suffix.
 494 If the source file has the suffix
 495 .IR .suf ,
 496 the suffix is removed to get the target filename.
 497 .IP ""
 498 When compressing or decompressing raw streams
 499 .RB ( \-\-format=raw ),
 500 the suffix must always be specified unless
 501 writing to standard output,
 502 because there is no default suffix for raw streams.
 503 .TP
 504 \fB\-\-files\fR[\fB=\fIfile\fR]
 505 Read the filenames to process from
 506 .IR file ;
 507 if
 508 .I file
 509 is omitted, filenames are read from standard input.
 510 Filenames must be terminated with the newline character.
 511 A dash
 512 .RB ( \- )
 513 is taken as a regular filename; it doesn't mean standard input.
  514 If filenames are also given as command line arguments, they are
 515 processed before the filenames read from
 516 .IR file .
 517 .TP
 518 \fB\-\-files0\fR[\fB=\fIfile\fR]
 519 This is identical to \fB\-\-files\fR[\fB=\fIfile\fR] except
 520 that each filename must be terminated with the null character.
 521 .
 522 .SS "Basic file format and compression options"
 523 .TP
 524 \fB\-F\fR \fIformat\fR, \fB\-\-format=\fIformat
 525 Specify the file
 526 .I format
 527 to compress or decompress:
 528 .RS
 529 .TP
 530 .B auto
 531 This is the default.
 532 When compressing,
 533 .B auto
 534 is equivalent to
 535 .BR xz .
 536 When decompressing,
 537 the format of the input file is automatically detected.
 538 Note that raw streams (created with
 539 .BR \-\-format=raw )
 540 cannot be auto-detected.
 541 .TP
 542 .B xz
 543 Compress to the
 544 .B .xz
 545 file format, or accept only
 546 .B .xz
 547 files when decompressing.
 548 .TP
 549 .BR lzma ", " alone
 550 Compress to the legacy
 551 .B .lzma
 552 file format, or accept only
 553 .B .lzma
 554 files when decompressing.
 555 The alternative name
 556 .B alone
 557 is provided for backwards compatibility with LZMA Utils.
 558 .TP
 559 .B raw
 560 Compress or uncompress a raw stream (no headers).
 561 This is meant for advanced users only.
  562 To decode raw streams, you need to use
 563 .B \-\-format=raw
 564 and explicitly specify the filter chain,
 565 which normally would have been stored in the container headers.
 566 .RE
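.IP ""
To illustrate why raw streams need an explicitly specified filter chain,
here is a minimal sketch using Python's standard
.B lzma
module instead of the
.B xz
command; the single LZMA2 filter at preset 6 is just an example choice.
The same chain has to be passed to both the encoder and the decoder,
because the raw stream carries no headers that could describe it.
.IP ""
.nf
```python
import lzma

# Example filter chain: a single LZMA2 filter using preset 6.
filters = [{"id": lzma.FILTER_LZMA2, "preset": 6}]

data = b"example data " * 100

# Encode a raw stream: no container headers are written.
raw = lzma.compress(data, format=lzma.FORMAT_RAW, filters=filters)

# Decoding requires the caller to supply the same filter chain again.
restored = lzma.decompress(raw, format=lzma.FORMAT_RAW, filters=filters)
print(len(raw), len(restored))
```
.fi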
 567 .TP
 568 \fB\-C\fR \fIcheck\fR, \fB\-\-check=\fIcheck
 569 Specify the type of the integrity check.
 570 The check is calculated from the uncompressed data and
 571 stored in the
 572 .B .xz
 573 file.
 574 This option has an effect only when compressing into the
 575 .B .xz
 576 format; the
 577 .B .lzma
 578 format doesn't support integrity checks.
 579 The integrity check (if any) is verified when the
 580 .B .xz
 581 file is decompressed.
 582 .IP ""
 583 Supported
 584 .I check
 585 types:
 586 .RS
 587 .TP
 588 .B none
 589 Don't calculate an integrity check at all.
 590 This is usually a bad idea.
 591 This can be useful when integrity of the data is verified
 592 by other means anyway.
 593 .TP
 594 .B crc32
 595 Calculate CRC32 using the polynomial from IEEE-802.3 (Ethernet).
 596 .TP
 597 .B crc64
 598 Calculate CRC64 using the polynomial from ECMA-182.
 599 This is the default, since it is slightly better than CRC32
 600 at detecting damaged files and the speed difference is negligible.
 601 .TP
 602 .B sha256
 603 Calculate SHA-256.
 604 This is somewhat slower than CRC32 and CRC64.
 605 .RE
 606 .IP ""
 607 Integrity of the
 608 .B .xz
 609 headers is always verified with CRC32.
 610 It is not possible to change or disable it.
 611 .TP
 612 .B \-\-ignore\-check
 613 Don't verify the integrity check of the compressed data when decompressing.
 614 The CRC32 values in the
 615 .B .xz
 616 headers will still be verified normally.
 617 .IP ""
 618 .B "Do not use this option unless you know what you are doing."
 619 Possible reasons to use this option:
 620 .RS
 621 .IP \(bu 3
 622 Trying to recover data from a corrupt .xz file.
 623 .IP \(bu 3
 624 Speeding up decompression.
 625 This matters mostly with SHA-256 or
 626 with files that have compressed extremely well.
 627 It's recommended to not use this option for this purpose
 628 unless the file integrity is verified externally in some other way.
 629 .RE
 630 .TP
 631 .BR \-0 " ... " \-9
 632 Select a compression preset level.
 633 The default is
 634 .BR \-6 .
 635 If multiple preset levels are specified,
 636 the last one takes effect.
 637 If a custom filter chain was already specified, setting
 638 a compression preset level clears the custom filter chain.
 639 .IP ""
 640 The differences between the presets are more significant than with
 641 .BR gzip (1)
 642 and
 643 .BR bzip2 (1).
 644 The selected compression settings determine
 645 the memory requirements of the decompressor,
  646 so using too high a preset level might make it painful
 647 to decompress the file on an old system with little RAM.
 648 Specifically,
 649 .B "it's not a good idea to blindly use \-9 for everything"
 650 like it often is with
 651 .BR gzip (1)
 652 and
 653 .BR bzip2 (1).
 654 .RS
 655 .TP
 656 .BR "\-0" " ... " "\-3"
 657 These are somewhat fast presets.
 658 .B \-0
 659 is sometimes faster than
 660 .B "gzip \-9"
 661 while compressing much better.
 662 The higher ones often have speed comparable to
 663 .BR bzip2 (1)
 664 with comparable or better compression ratio,
 665 although the results
 666 depend a lot on the type of data being compressed.
 667 .TP
 668 .BR "\-4" " ... " "\-6"
 669 Good to very good compression while keeping
 670 decompressor memory usage reasonable even for old systems.
 671 .B \-6
 672 is the default, which is usually a good choice
 673 e.g. for distributing files that need to be decompressible
 674 even on systems with only 16\ MiB RAM.
 675 .RB ( \-5e
 676 or
 677 .B \-6e
 678 may be worth considering too.
 679 See
 680 .BR \-\-extreme .)
 681 .TP
  682 .BR "\-7" " ... " "\-9"
 683 These are like
 684 .B \-6
 685 but with higher compressor and decompressor memory requirements.
 686 These are useful only when compressing files bigger than
 687 8\ MiB, 16\ MiB, and 32\ MiB, respectively.
 688 .RE
 689 .IP ""
 690 On the same hardware, the decompression speed is approximately
 691 a constant number of bytes of compressed data per second.
 692 In other words, the better the compression,
 693 the faster the decompression will usually be.
 694 This also means that the amount of uncompressed output
 695 produced per second can vary a lot.
 696 .IP ""
 697 The following table summarises the features of the presets:
 698 .RS
 699 .RS
 700 .PP
 701 .TS
 702 tab(;);
 703 c c c c c
 704 n n n n n.
 705 Preset;DictSize;CompCPU;CompMem;DecMem
 706 \-0;256 KiB;0;3 MiB;1 MiB
 707 \-1;1 MiB;1;9 MiB;2 MiB
 708 \-2;2 MiB;2;17 MiB;3 MiB
 709 \-3;4 MiB;3;32 MiB;5 MiB
 710 \-4;4 MiB;4;48 MiB;5 MiB
 711 \-5;8 MiB;5;94 MiB;9 MiB
 712 \-6;8 MiB;6;94 MiB;9 MiB
 713 \-7;16 MiB;6;186 MiB;17 MiB
 714 \-8;32 MiB;6;370 MiB;33 MiB
 715 \-9;64 MiB;6;674 MiB;65 MiB
 716 .TE
 717 .RE
 718 .RE
 719 .IP ""
 720 Column descriptions:
 721 .RS
 722 .IP \(bu 3
 723 DictSize is the LZMA2 dictionary size.
  724 It is a waste of memory to use a dictionary bigger than
 725 the size of the uncompressed file.
 726 This is why it is good to avoid using the presets
 727 .BR \-7 " ... " \-9
 728 when there's no real need for them.
 729 At
 730 .B \-6
 731 and lower, the amount of memory wasted is
 732 usually low enough to not matter.
 733 .IP \(bu 3
 734 CompCPU is a simplified representation of the LZMA2 settings
 735 that affect compression speed.
 736 The dictionary size affects speed too,
 737 so while CompCPU is the same for levels
 738 .BR \-6 " ... " \-9 ,
 739 higher levels still tend to be a little slower.
 740 To get even slower and thus possibly better compression, see
 741 .BR \-\-extreme .
 742 .IP \(bu 3
 743 CompMem contains the compressor memory requirements
 744 in the single-threaded mode.
 745 It may vary slightly between
 746 .B xz
 747 versions.
 748 Memory requirements of some of the future multithreaded modes may
 749 be dramatically higher than that of the single-threaded mode.
 750 .IP \(bu 3
 751 DecMem contains the decompressor memory requirements.
 752 That is, the compression settings determine
 753 the memory requirements of the decompressor.
 754 The exact decompressor memory usage is slightly more than
 755 the LZMA2 dictionary size, but the values in the table
 756 have been rounded up to the next full MiB.
 757 .RE
 758 .TP
 759 .BR \-e ", " \-\-extreme
 760 Use a slower variant of the selected compression preset level
 761 .RB ( \-0 " ... " \-9 )
 762 to hopefully get a little bit better compression ratio,
 763 but with bad luck this can also make it worse.
 764 Decompressor memory usage is not affected,
 765 but compressor memory usage increases a little at preset levels
 766 .BR \-0 " ... " \-3 .
 767 .IP ""
 768 Since there are two presets with dictionary sizes
 769 4\ MiB and 8\ MiB, the presets
 770 .B \-3e
 771 and
 772 .B \-5e
 773 use slightly faster settings (lower CompCPU) than
 774 .B \-4e
 775 and
 776 .BR \-6e ,
 777 respectively.
 778 That way no two presets are identical.
 779 .RS
 780 .RS
 781 .PP
 782 .TS
 783 tab(;);
 784 c c c c c
 785 n n n n n.
 786 Preset;DictSize;CompCPU;CompMem;DecMem
 787 \-0e;256 KiB;8;4 MiB;1 MiB
 788 \-1e;1 MiB;8;13 MiB;2 MiB
 789 \-2e;2 MiB;8;25 MiB;3 MiB
 790 \-3e;4 MiB;7;48 MiB;5 MiB
 791 \-4e;4 MiB;8;48 MiB;5 MiB
 792 \-5e;8 MiB;7;94 MiB;9 MiB
 793 \-6e;8 MiB;8;94 MiB;9 MiB
 794 \-7e;16 MiB;8;186 MiB;17 MiB
 795 \-8e;32 MiB;8;370 MiB;33 MiB
 796 \-9e;64 MiB;8;674 MiB;65 MiB
 797 .TE
 798 .RE
 799 .RE
 800 .IP ""
 801 For example, there are a total of four presets that use
 802 8\ MiB dictionary, whose order from the fastest to the slowest is
 803 .BR \-5 ,
 804 .BR \-6 ,
 805 .BR \-5e ,
 806 and
 807 .BR \-6e .
 808 .TP
 809 .B \-\-fast
 810 .PD 0
 811 .TP
 812 .B \-\-best
 813 .PD
 814 These are somewhat misleading aliases for
 815 .B \-0
 816 and
 817 .BR \-9 ,
 818 respectively.
 819 These are provided only for backwards compatibility
 820 with LZMA Utils.
 821 Avoid using these options.
 822 .TP
 823 .BI \-\-block\-size= size
 824 When compressing to the
 825 .B .xz
 826 format, split the input data into blocks of
 827 .I size
 828 bytes.
 829 The blocks are compressed independently from each other,
 830 which helps with multi-threading and
 831 makes limited random-access decompression possible.
 832 This option is typically used to override the default
 833 block size in multi-threaded mode,
 834 but this option can be used in single-threaded mode too.
 835 .IP ""
 836 In multi-threaded mode about three times
 837 .I size
 838 bytes will be allocated in each thread for buffering input and output.
 839 The default
 840 .I size
 841 is three times the LZMA2 dictionary size or 1 MiB,
 842 whichever is more.
 843 Typically a good value is 2\-4 times
 844 the size of the LZMA2 dictionary or at least 1 MiB.
 845 Using
 846 .I size
  847 less than the LZMA2 dictionary size is a waste of RAM
 848 because then the LZMA2 dictionary buffer will never get fully used.
 849 The sizes of the blocks are stored in the block headers,
 850 which a future version of
 851 .B xz
 852 will use for multi-threaded decompression.
 853 .IP ""
 854 In single-threaded mode no block splitting is done by default.
 855 Setting this option doesn't affect memory usage.
 856 No size information is stored in block headers,
 857 thus files created in single-threaded mode
 858 won't be identical to files created in multi-threaded mode.
 859 The lack of size information also means that a future version of
 860 .B xz
  861 won't be able to decompress the files in multi-threaded mode.
 862 .TP
 863 .BI \-\-block\-list= sizes
 864 When compressing to the
 865 .B .xz
 866 format, start a new block after
 867 the given intervals of uncompressed data.
 868 .IP ""
 869 The uncompressed
 870 .I sizes
 871 of the blocks are specified as a comma-separated list.
 872 Omitting a size (two or more consecutive commas) is a shorthand
 873 to use the size of the previous block.
 874 .IP ""
 875 If the input file is bigger than the sum of
 876 .IR sizes ,
 877 the last value in
 878 .I sizes
 879 is repeated until the end of the file.
 880 A special value of
 881 .B 0
 882 may be used as the last value to indicate that
 883 the rest of the file should be encoded as a single block.
 884 .IP ""
 885 If one specifies
 886 .I sizes
 887 that exceed the encoder's block size
 888 (either the default value in threaded mode or
 889 the value specified with \fB\-\-block\-size=\fIsize\fR),
 890 the encoder will create additional blocks while
 891 keeping the boundaries specified in
 892 .IR sizes .
 893 For example, if one specifies
 894 .B \-\-block\-size=10MiB
 895 .B \-\-block\-list=5MiB,10MiB,8MiB,12MiB,24MiB
 896 and the input file is 80 MiB,
 897 one will get 11 blocks:
 898 5, 10, 8, 10, 2, 10, 10, 4, 10, 10, and 1 MiB.
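.IP ""
The arithmetic of this example can be reproduced with a short Python sketch.
It models only the splitting rules stated above and is not the encoder's
actual code: the function name
.B split_blocks
is hypothetical, all sizes are plain numbers in MiB,
and the special value
.B 0
is deliberately not handled.
.IP ""
.nf
```python
def split_blocks(input_size, block_list, block_size):
    """Return the block sizes for the given input (hypothetical helper).

    block_list holds the intervals from --block-list and block_size the
    value of --block-size, all in the same unit as input_size.
    """
    if not block_list or any(size <= 0 for size in block_list):
        raise ValueError("this sketch needs positive sizes only")

    blocks = []
    remaining = input_size
    index = 0
    while remaining > 0:
        # Past the end of the list, the last value is repeated.
        interval = block_list[min(index, len(block_list) - 1)]
        interval = min(interval, remaining)
        index += 1
        remaining -= interval
        # Intervals larger than the block size get additional blocks,
        # while the requested boundary is kept.
        while interval > 0:
            chunk = min(interval, block_size)
            blocks.append(chunk)
            interval -= chunk
    return blocks


blocks = split_blocks(80, [5, 10, 8, 12, 24], 10)
print(len(blocks), blocks)
```
.fi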
 899 .IP ""
 900 In multi-threaded mode the sizes of the blocks
 901 are stored in the block headers.
 902 This isn't done in single-threaded mode,
 903 so the encoded output won't be
 904 identical to that of the multi-threaded mode.
 905 .TP
 906 .BI \-\-flush\-timeout= timeout
 907 When compressing, if more than
 908 .I timeout
  909 milliseconds (a positive integer) have passed since the previous flush and
 910 reading more input would block,
 911 all the pending input data is flushed from the encoder and
 912 made available in the output stream.
 913 This can be useful if
 914 .B xz
 915 is used to compress data that is streamed over a network.
 916 Small
 917 .I timeout
 918 values make the data available at the receiving end
 919 with a small delay, but large
 920 .I timeout
 921 values give better compression ratio.
 922 .IP ""
 923 This feature is disabled by default.
 924 If this option is specified more than once, the last one takes effect.
 925 The special
 926 .I timeout
 927 value of
 928 .B 0
 929 can be used to explicitly disable this feature.
 930 .IP ""
 931 This feature is not available on non-POSIX systems.
 932 .IP ""
 933 .\" FIXME
 934 .B "This feature is still experimental."
 935 Currently
 936 .B xz
 937 is unsuitable for decompressing the stream in real time due to how
 938 .B xz
 939 does buffering.
 940 .TP
 941 .BI \-\-memlimit\-compress= limit
 942 Set a memory usage limit for compression.
 943 If this option is specified multiple times,
 944 the last one takes effect.
 945 .IP ""
 946 If the compression settings exceed the
 947 .IR limit ,
 948 .B xz
 949 will adjust the settings downwards so that
 950 the limit is no longer exceeded and display a notice that
 951 automatic adjustment was done.
 952 Such adjustments are not made when compressing with
 953 .B \-\-format=raw
 954 or if
 955 .B \-\-no\-adjust
 956 has been specified.
 957 In those cases, an error is displayed and
 958 .B xz
 959 will exit with exit status 1.
 960 .IP ""
 961 The
 962 .I limit
 963 can be specified in multiple ways:
 964 .RS
 965 .IP \(bu 3
 966 The
 967 .I limit
 968 can be an absolute value in bytes.
 969 Using an integer suffix like
 970 .B MiB
 971 can be useful.
 972 Example:
 973 .B "\-\-memlimit\-compress=80MiB"
 974 .IP \(bu 3
 975 The
 976 .I limit
 977 can be specified as a percentage of total physical memory (RAM).
 978 This can be useful especially when setting the
 979 .B XZ_DEFAULTS
 980 environment variable in a shell initialization script
 981 that is shared between different computers.
 982 That way the limit is automatically bigger
 983 on systems with more memory.
 984 Example:
 985 .B "\-\-memlimit\-compress=70%"
 986 .IP \(bu 3
 987 The
 988 .I limit
 989 can be reset back to its default value by setting it to
 990 .BR 0 .
 991 This is currently equivalent to setting the
 992 .I limit
 993 to
 994 .B max
 995 (no memory usage limit).
 996 Once multithreading support has been implemented,
 997 there may be a difference between
 998 .B 0
 999 and
1000 .B max
1001 for the multithreaded case, so it is recommended to use
1002 .B 0
1003 instead of
1004 .B max
1005 until the details have been decided.
1006 .RE
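.IP ""
The three ways of specifying the
.I limit
can be sketched in Python as follows.
This is a simplified illustration, not the option parser of
.BR xz :
the function name
.B resolve_memlimit
is hypothetical, only the
.BR KiB ,
.BR MiB ,
and
.B GiB
suffixes are recognized, the rounding of percentages is an assumption,
and the amount of RAM is passed in by the caller.
.IP ""
.nf
```python
def resolve_memlimit(limit, total_ram):
    """Return the limit in bytes, or None for "no memory usage limit".

    Hypothetical helper; total_ram is the physical memory in bytes.
    """
    # 0 resets the limit to its default, which currently equals max.
    if limit in ("0", "max"):
        return None

    # Percentage of total physical memory.
    if limit.endswith("%"):
        return total_ram * int(limit[:-1]) // 100

    # Absolute value, optionally with an integer suffix.
    for suffix, factor in (("KiB", 2 ** 10), ("MiB", 2 ** 20), ("GiB", 2 ** 30)):
        if limit.endswith(suffix):
            return int(limit[: -len(suffix)]) * factor
    return int(limit)


four_gib = 4 * 2 ** 30
print(resolve_memlimit("80MiB", four_gib))
print(resolve_memlimit("70%", four_gib))
```
.fi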
1007 .IP ""
1008 See also the section
1009 .BR "Memory usage" .
1010 .TP
1011 .BI \-\-memlimit\-decompress= limit
1012 Set a memory usage limit for decompression.
1013 This also affects the
1014 .B \-\-list
1015 mode.
1016 If the operation is not possible without exceeding the
1017 .IR limit ,
1018 .B xz
1019 will display an error and decompressing the file will fail.
1020 See
1021 .BI \-\-memlimit\-compress= limit
1022 for possible ways to specify the
1023 .IR limit .
1024 .TP
1025 \fB\-M\fR \fIlimit\fR, \fB\-\-memlimit=\fIlimit\fR, \fB\-\-memory=\fIlimit
1026 This is equivalent to specifying \fB\-\-memlimit\-compress=\fIlimit
1027 \fB\-\-memlimit\-decompress=\fIlimit\fR.
1028 .TP
1029 .B \-\-no\-adjust
1030 Display an error and exit if the compression settings exceed
1031 the memory usage limit.
1032 The default is to adjust the settings downwards so
1033 that the memory usage limit is not exceeded.
1034 Automatic adjusting is always disabled when creating raw streams
1035 .RB ( \-\-format=raw ).
1036 .TP
1037 \fB\-T\fR \fIthreads\fR, \fB\-\-threads=\fIthreads
1038 Specify the number of worker threads to use.
1039 Setting
1040 .I threads
1041 to a special value
1042 .B 0
1043 makes
1044 .B xz
1045 use as many threads as there are CPU cores on the system.
1046 The actual number of threads can be less than
1047 .I threads
1048 if the input file is not big enough
1049 for threading with the given settings or
1050 if using more threads would exceed the memory usage limit.
1051 .IP ""
1052 Currently the only threading method is to split the input into
1053 blocks and compress them independently from each other.
1054 The default block size depends on the compression level and
1055 can be overridden with the
1056 .BI \-\-block\-size= size
1057 option.
1058 .IP ""
1059 Threaded decompression hasn't been implemented yet.
Once implemented, it will work only on files that contain multiple blocks
1061 with size information in block headers.
1062 All files compressed in multi-threaded mode meet this condition,
1063 but files compressed in single-threaded mode don't even if
1064 .BI \-\-block\-size= size
1065 is used.
1066 .
1067 .SS "Custom compressor filter chains"
1068 A custom filter chain allows specifying
1069 the compression settings in detail instead of relying on
the settings associated with the presets.
1071 When a custom filter chain is specified,
1072 preset options (\fB\-0\fR ... \fB\-9\fR and \fB\-\-extreme\fR)
1073 earlier on the command line are forgotten.
1074 If a preset option is specified
1075 after one or more custom filter chain options,
1076 the new preset takes effect and
1077 the custom filter chain options specified earlier are forgotten.
1078 .PP
1079 A filter chain is comparable to piping on the command line.
1080 When compressing, the uncompressed input goes to the first filter,
1081 whose output goes to the next filter (if any).
1082 The output of the last filter gets written to the compressed file.
1083 The maximum number of filters in the chain is four,
1084 but typically a filter chain has only one or two filters.
1085 .PP
1086 Many filters have limitations on where they can be
1087 in the filter chain:
1088 some filters can work only as the last filter in the chain,
1089 some only as a non-last filter, and some work in any position
1090 in the chain.
1091 Depending on the filter, this limitation is either inherent to
1092 the filter design or exists to prevent security issues.
1093 .PP
1094 A custom filter chain is specified by using one or more
1095 filter options in the order they are wanted in the filter chain.
1096 That is, the order of filter options is significant!
1097 When decoding raw streams
1098 .RB ( \-\-format=raw ),
1099 the filter chain is specified in the same order as
1100 it was specified when compressing.
1101 .PP
1102 Filters take filter-specific
1103 .I options
1104 as a comma-separated list.
1105 Extra commas in
1106 .I options
1107 are ignored.
1108 Every option has a default value, so you need to
1109 specify only those you want to change.
1110 .PP
1111 To see the whole filter chain and
1112 .IR options ,
1113 use
1114 .B "xz \-vv"
1115 (that is, use
1116 .B \-\-verbose
1117 twice).
1118 This works also for viewing the filter chain options used by presets.
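.PP
As an illustration of filter ordering, the following sketch uses
Python's standard
.B lzma
module (a binding to liblzma and not part of
.BR xz ;
its use here is an assumption made for the example)
to build a chain comparable to
.BR "xz \-\-delta=dist=4 \-\-lzma2=preset=6" .
The sample data and the helper name
.B roundtrip
are made up for the example.
.RS
.PP
.nf
.ft CW
```python
import lzma

# Filter chain in the order the data flows when compressing:
# Delta first (a non-last filter), LZMA2 last.
FILTERS = [
    {"id": lzma.FILTER_DELTA, "dist": 4},
    {"id": lzma.FILTER_LZMA2, "preset": 6},
]


def roundtrip(data):
    """Compress to the .xz format with FILTERS and decompress again."""
    packed = lzma.compress(data, format=lzma.FORMAT_XZ, filters=FILTERS)
    # The .xz headers record the chain, so no filters are passed here.
    return lzma.decompress(packed)


sample = bytes(range(256)) * 64
restored = roundtrip(sample)
```
.ft R
.fi
.RE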
1119 .TP
1120 \fB\-\-lzma1\fR[\fB=\fIoptions\fR]
1121 .PD 0
1122 .TP
1123 \fB\-\-lzma2\fR[\fB=\fIoptions\fR]
1124 .PD
1125 Add LZMA1 or LZMA2 filter to the filter chain.
1126 These filters can be used only as the last filter in the chain.
1127 .IP ""
1128 LZMA1 is a legacy filter,
1129 which is supported almost solely due to the legacy
1130 .B .lzma
1131 file format, which supports only LZMA1.
1132 LZMA2 is an updated
1133 version of LZMA1 to fix some practical issues of LZMA1.
1134 The
1135 .B .xz
1136 format uses LZMA2 and doesn't support LZMA1 at all.
1137 Compression speed and ratios of LZMA1 and LZMA2
1138 are practically the same.
1139 .IP ""
1140 LZMA1 and LZMA2 share the same set of
1141 .IR options :
1142 .RS
1143 .TP
1144 .BI preset= preset
1145 Reset all LZMA1 or LZMA2
1146 .I options
1147 to
1148 .IR preset .
1149 .I Preset
consists of an integer, which may be followed by single-letter
1151 preset modifiers.
1152 The integer can be from
1153 .B 0
1154 to
1155 .BR 9 ,
1156 matching the command line options \fB\-0\fR ... \fB\-9\fR.
1157 The only supported modifier is currently
1158 .BR e ,
1159 which matches
1160 .BR \-\-extreme .
1161 If no
1162 .B preset
1163 is specified, the default values of LZMA1 or LZMA2
1164 .I options
1165 are taken from the preset
1166 .BR 6 .
1167 .TP
1168 .BI dict= size
1169 Dictionary (history buffer)
1170 .I size
1171 indicates how many bytes of the recently processed
uncompressed data are kept in memory.
1173 The algorithm tries to find repeating byte sequences (matches) in
1174 the uncompressed data, and replace them with references
1175 to the data currently in the dictionary.
1176 The bigger the dictionary, the higher is the chance
1177 to find a match.
1178 Thus, increasing dictionary
1179 .I size
1180 usually improves compression ratio, but
a dictionary bigger than the uncompressed file is a waste of memory.
1182 .IP ""
1183 Typical dictionary
1184 .I size
1185 is from 64\ KiB to 64\ MiB.
1186 The minimum is 4\ KiB.
1187 The maximum for compression is currently 1.5\ GiB (1536\ MiB).
1188 The decompressor already supports dictionaries up to
1189 one byte less than 4\ GiB, which is the maximum for
1190 the LZMA1 and LZMA2 stream formats.
1191 .IP ""
1192 Dictionary
1193 .I size
1194 and match finder
1195 .RI ( mf )
1196 together determine the memory usage of the LZMA1 or LZMA2 encoder.
Decompressing requires a dictionary
.I size
at least as big as the one used when compressing,
1200 thus the memory usage of the decoder is determined
1201 by the dictionary size used when compressing.
1202 The
1203 .B .xz
1204 headers store the dictionary
1205 .I size
1206 either as
1207 .RI "2^" n
1208 or
1209 .RI "2^" n " + 2^(" n "\-1),"
1210 so these
1211 .I sizes
1212 are somewhat preferred for compression.
1213 Other
1214 .I sizes
1215 will get rounded up when stored in the
1216 .B .xz
1217 headers.
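.IP ""
The rounding can be sketched as follows.
This Python helper is hypothetical (the name
.B stored_dict_size
is not part of
.B xz
or liblzma), and it assumes a
.I size
within the supported range, so the upper limit is not checked:
.RS
.PP
.nf
.ft CW
```python
def stored_dict_size(size):
    """Round size up to the next 2^n or 2^n + 2^(n-1)."""
    n = 12  # 2^12 = 4 KiB is the documented minimum
    while True:
        for candidate in (1 << n, (1 << n) + (1 << (n - 1))):
            if candidate >= size:
                return candidate
        n += 1


MIB = 1024 * 1024
rounded = stored_dict_size(5 * MIB)  # 5 MiB is stored as 6 MiB
```
.ft R
.fi
.RE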
1218 .TP
1219 .BI lc= lc
1220 Specify the number of literal context bits.
1221 The minimum is 0 and the maximum is 4; the default is 3.
1222 In addition, the sum of
1223 .I lc
1224 and
1225 .I lp
1226 must not exceed 4.
1227 .IP ""
1228 All bytes that cannot be encoded as matches
1229 are encoded as literals.
1230 That is, literals are simply 8-bit bytes
1231 that are encoded one at a time.
1232 .IP ""
1233 The literal coding makes an assumption that the highest
1234 .I lc
1235 bits of the previous uncompressed byte correlate
1236 with the next byte.
1237 E.g. in typical English text, an upper-case letter is
1238 often followed by a lower-case letter, and a lower-case
1239 letter is usually followed by another lower-case letter.
1240 In the US-ASCII character set, the highest three bits are 010
1241 for upper-case letters and 011 for lower-case letters.
1242 When
1243 .I lc
1244 is at least 3, the literal coding can take advantage of
1245 this property in the uncompressed data.
1246 .IP ""
1247 The default value (3) is usually good.
1248 If you want maximum compression, test
1249 .BR lc=4 .
1250 Sometimes it helps a little, and
1251 sometimes it makes compression worse.
1252 If it makes it worse, test e.g.\&
1253 .B lc=2
1254 too.
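.IP ""
The bit patterns mentioned above can be checked with a short
Python sketch; the helper name
.B top_bits
is made up for this illustration and it expects
.I lc
to be from 1 to 4:
.RS
.PP
.nf
.ft CW
```python
def top_bits(byte, lc=3):
    """Return the highest lc bits of an 8-bit value as a bit string."""
    return format(byte >> (8 - lc), "0{}b".format(lc))


# Collect the distinct patterns seen for a few US-ASCII letters.
upper = {top_bits(ord(c)) for c in "AMZ"}
lower = {top_bits(ord(c)) for c in "amz"}
```
.ft R
.fi
.RE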
1255 .TP
1256 .BI lp= lp
1257 Specify the number of literal position bits.
1258 The minimum is 0 and the maximum is 4; the default is 0.
1259 .IP ""
1260 .I Lp
1261 affects what kind of alignment in the uncompressed data is
1262 assumed when encoding literals.
1263 See
1264 .I pb
1265 below for more information about alignment.
1266 .TP
1267 .BI pb= pb
1268 Specify the number of position bits.
1269 The minimum is 0 and the maximum is 4; the default is 2.
1270 .IP ""
1271 .I Pb
1272 affects what kind of alignment in the uncompressed data is
1273 assumed in general.
1274 The default means four-byte alignment
1275 .RI (2^ pb =2^2=4),
1276 which is often a good choice when there's no better guess.
1277 .IP ""
When the alignment is known, setting
1279 .I pb
1280 accordingly may reduce the file size a little.
1281 E.g. with text files having one-byte
1282 alignment (US-ASCII, ISO-8859-*, UTF-8), setting
1283 .B pb=0
1284 can improve compression slightly.
1285 For UTF-16 text,
1286 .B pb=1
1287 is a good choice.
1288 If the alignment is an odd number like 3 bytes,
1289 .B pb=0
1290 might be the best choice.
1291 .IP ""
1292 Even though the assumed alignment can be adjusted with
1293 .I pb
1294 and
1295 .IR lp ,
1296 LZMA1 and LZMA2 still slightly favor 16-byte alignment.
1297 It might be worth taking into account when designing file formats
1298 that are likely to be often compressed with LZMA1 or LZMA2.
1299 .TP
1300 .BI mf= mf
1301 Match finder has a major effect on encoder speed,
1302 memory usage, and compression ratio.
1303 Usually Hash Chain match finders are faster than Binary Tree
1304 match finders.
1305 The default depends on the
1306 .IR preset :
1307 0 uses
1308 .BR hc3 ,
1309 1\-3
1310 use
1311 .BR hc4 ,
1312 and the rest use
1313 .BR bt4 .
1314 .IP ""
1315 The following match finders are supported.
1316 The memory usage formulas below are rough approximations,
1317 which are closest to the reality when
1318 .I dict
1319 is a power of two.
1320 .RS
1321 .TP
1322 .B hc3
1323 Hash Chain with 2- and 3-byte hashing
1324 .br
1325 Minimum value for
1326 .IR nice :
1327 3
1328 .br
1329 Memory usage:
1330 .br
1331 .I dict
1332 * 7.5 (if
1333 .I dict
1334 <= 16 MiB);
1335 .br
1336 .I dict
1337 * 5.5 + 64 MiB (if
1338 .I dict
1339 > 16 MiB)
1340 .TP
1341 .B hc4
1342 Hash Chain with 2-, 3-, and 4-byte hashing
1343 .br
1344 Minimum value for
1345 .IR nice :
1346 4
1347 .br
1348 Memory usage:
1349 .br
1350 .I dict
1351 * 7.5 (if
1352 .I dict
1353 <= 32 MiB);
1354 .br
1355 .I dict
1356 * 6.5 (if
1357 .I dict
1358 > 32 MiB)
1359 .TP
1360 .B bt2
1361 Binary Tree with 2-byte hashing
1362 .br
1363 Minimum value for
1364 .IR nice :
1365 2
1366 .br
1367 Memory usage:
1368 .I dict
1369 * 9.5
1370 .TP
1371 .B bt3
1372 Binary Tree with 2- and 3-byte hashing
1373 .br
1374 Minimum value for
1375 .IR nice :
1376 3
1377 .br
1378 Memory usage:
1379 .br
1380 .I dict
1381 * 11.5 (if
1382 .I dict
1383 <= 16 MiB);
1384 .br
1385 .I dict
1386 * 9.5 + 64 MiB (if
1387 .I dict
1388 > 16 MiB)
1389 .TP
1390 .B bt4
1391 Binary Tree with 2-, 3-, and 4-byte hashing
1392 .br
1393 Minimum value for
1394 .IR nice :
1395 4
1396 .br
1397 Memory usage:
1398 .br
1399 .I dict
1400 * 11.5 (if
1401 .I dict
1402 <= 32 MiB);
1403 .br
1404 .I dict
1405 * 10.5 (if
1406 .I dict
1407 > 32 MiB)
1408 .RE
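.IP ""
The formulas above can be collected into one function.
This is a Python sketch of the rough approximations listed here,
not the calculation done by liblzma; the function name
.B encoder_memory_estimate
is made up for the example:
.RS
.PP
.nf
.ft CW
```python
MIB = 1024 * 1024


def encoder_memory_estimate(mf, dict_size):
    """Approximate encoder memory use in bytes for match finder mf."""
    if mf == "hc3":
        if dict_size <= 16 * MIB:
            return dict_size * 7.5
        return dict_size * 5.5 + 64 * MIB
    if mf == "hc4":
        return dict_size * (7.5 if dict_size <= 32 * MIB else 6.5)
    if mf == "bt2":
        return dict_size * 9.5
    if mf == "bt3":
        if dict_size <= 16 * MIB:
            return dict_size * 11.5
        return dict_size * 9.5 + 64 * MIB
    if mf == "bt4":
        return dict_size * (11.5 if dict_size <= 32 * MIB else 10.5)
    raise ValueError("unknown match finder: " + mf)


estimate = encoder_memory_estimate("bt4", 8 * MIB)
```
.ft R
.fi
.RE
.IP ""
With these formulas,
.B bt4
and an 8\ MiB dictionary give an estimate of 92\ MiB.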
1409 .TP
1410 .BI mode= mode
1411 Compression
1412 .I mode
1413 specifies the method to analyze
1414 the data produced by the match finder.
1415 Supported
1416 .I modes
1417 are
1418 .B fast
1419 and
1420 .BR normal .
1421 The default is
1422 .B fast
1423 for
1424 .I presets
1425 0\-3 and
1426 .B normal
1427 for
1428 .I presets
1429 4\-9.
1430 .IP ""
1431 Usually
1432 .B fast
1433 is used with Hash Chain match finders and
1434 .B normal
1435 with Binary Tree match finders.
1436 This is also what the
1437 .I presets
1438 do.
1439 .TP
1440 .BI nice= nice
1441 Specify what is considered to be a nice length for a match.
1442 Once a match of at least
1443 .I nice
1444 bytes is found, the algorithm stops
1445 looking for possibly better matches.
1446 .IP ""
1447 .I Nice
1448 can be 2\-273 bytes.
1449 Higher values tend to give better compression ratio
1450 at the expense of speed.
1451 The default depends on the
1452 .IR preset .
1453 .TP
1454 .BI depth= depth
1455 Specify the maximum search depth in the match finder.
1456 The default is the special value of 0,
1457 which makes the compressor determine a reasonable
1458 .I depth
1459 from
1460 .I mf
1461 and
1462 .IR nice .
1463 .IP ""
1464 Reasonable
1465 .I depth
1466 for Hash Chains is 4\-100 and 16\-1000 for Binary Trees.
1467 Using very high values for
1468 .I depth
1469 can make the encoder extremely slow with some files.
1470 Avoid setting the
1471 .I depth
1472 over 1000 unless you are prepared to interrupt
1473 the compression in case it is taking far too long.
1474 .RE
1475 .IP ""
1476 When decoding raw streams
1477 .RB ( \-\-format=raw ),
1478 LZMA2 needs only the dictionary
1479 .IR size .
LZMA1 also needs
1481 .IR lc ,
1482 .IR lp ,
1483 and
1484 .IR pb .
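.IP ""
As a sketch of this, the following uses Python's standard
.B lzma
module (a liblzma binding, not part of
.BR xz )
to encode a raw LZMA2 stream with non-default
.I lc
and
.I pb
and then decode it while specifying only the dictionary size.
The payload and variable names are arbitrary:
.RS
.PP
.nf
.ft CW
```python
import lzma

DICT_SIZE = 1 << 20  # 1 MiB

# Encoder side: LZMA2 with non-default literal context and position bits.
ENCODE_FILTERS = [
    {"id": lzma.FILTER_LZMA2, "dict_size": DICT_SIZE, "lc": 2, "pb": 0},
]
# Decoder side: only the dictionary size is given.
DECODE_FILTERS = [{"id": lzma.FILTER_LZMA2, "dict_size": DICT_SIZE}]

payload = b"raw streams carry no container headers. " * 200
raw = lzma.compress(payload, format=lzma.FORMAT_RAW, filters=ENCODE_FILTERS)
decoded = lzma.decompress(raw, format=lzma.FORMAT_RAW, filters=DECODE_FILTERS)
```
.ft R
.fi
.RE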
1485 .TP
1486 \fB\-\-x86\fR[\fB=\fIoptions\fR]
1487 .PD 0
1488 .TP
1489 \fB\-\-powerpc\fR[\fB=\fIoptions\fR]
1490 .TP
1491 \fB\-\-ia64\fR[\fB=\fIoptions\fR]
1492 .TP
1493 \fB\-\-arm\fR[\fB=\fIoptions\fR]
1494 .TP
1495 \fB\-\-armthumb\fR[\fB=\fIoptions\fR]
1496 .TP
1497 \fB\-\-sparc\fR[\fB=\fIoptions\fR]
1498 .PD
1499 Add a branch/call/jump (BCJ) filter to the filter chain.
1500 These filters can be used only as a non-last filter
1501 in the filter chain.
1502 .IP ""
1503 A BCJ filter converts relative addresses in
1504 the machine code to their absolute counterparts.
1505 This doesn't change the size of the data,
1506 but it increases redundancy,
which can help LZMA2 to produce a 0\-15\ % smaller
1508 .B .xz
1509 file.
1510 The BCJ filters are always reversible,
so using a BCJ filter for the wrong type of data
1512 doesn't cause any data loss, although it may make
1513 the compression ratio slightly worse.
1514 .IP ""
1515 It is fine to apply a BCJ filter on a whole executable;
1516 there's no need to apply it only on the executable section.
1517 Applying a BCJ filter on an archive that contains both executable
1518 and non-executable files may or may not give good results,
1519 so it generally isn't good to blindly apply a BCJ filter when
1520 compressing binary packages for distribution.
1521 .IP ""
1522 These BCJ filters are very fast and
use an insignificant amount of memory.
1524 If a BCJ filter improves compression ratio of a file,
1525 it can improve decompression speed at the same time.
1526 This is because, on the same hardware,
1527 the decompression speed of LZMA2 is roughly
1528 a fixed number of bytes of compressed data per second.
1529 .IP ""
1530 These BCJ filters have known problems related to
1531 the compression ratio:
1532 .RS
1533 .IP \(bu 3
1534 Some types of files containing executable code
1535 (e.g. object files, static libraries, and Linux kernel modules)
1536 have the addresses in the instructions filled with filler values.
1537 These BCJ filters will still do the address conversion,
1538 which will make the compression worse with these files.
1539 .IP \(bu 3
1540 Applying a BCJ filter on an archive containing multiple similar
1541 executables can make the compression ratio worse than not using
1542 a BCJ filter.
1543 This is because the BCJ filter doesn't detect the boundaries
1544 of the executable files, and doesn't reset
1545 the address conversion counter for each executable.
1546 .RE
1547 .IP ""
1548 Both of the above problems will be fixed
1549 in the future in a new filter.
1550 The old BCJ filters will still be useful in embedded systems,
1551 because the decoder of the new filter will be bigger
1552 and use more memory.
1553 .IP ""
Different instruction sets have different alignment:
1555 .RS
1556 .RS
1557 .PP
1558 .TS
1559 tab(;);
1560 l n l
1561 l n l.
1562 Filter;Alignment;Notes
1563 x86;1;32-bit or 64-bit x86
1564 PowerPC;4;Big endian only
1565 ARM;4;Little endian only
1566 ARM-Thumb;2;Little endian only
1567 IA-64;16;Big or little endian
1568 SPARC;4;Big or little endian
1569 .TE
1570 .RE
1571 .RE
1572 .IP ""
1573 Since the BCJ-filtered data is usually compressed with LZMA2,
1574 the compression ratio may be improved slightly if
1575 the LZMA2 options are set to match the
1576 alignment of the selected BCJ filter.
1577 For example, with the IA-64 filter, it's good to set
1578 .B pb=4
1579 with LZMA2 (2^4=16).
1580 The x86 filter is an exception;
1581 it's usually good to stick to LZMA2's default
1582 four-byte alignment when compressing x86 executables.
1583 .IP ""
1584 All BCJ filters support the same
1585 .IR options :
1586 .RS
1587 .TP
1588 .BI start= offset
1589 Specify the start
1590 .I offset
1591 that is used when converting between relative
1592 and absolute addresses.
1593 The
1594 .I offset
1595 must be a multiple of the alignment of the filter
1596 (see the table above).
1597 The default is zero.
1598 In practice, the default is good; specifying a custom
1599 .I offset
1600 is almost never useful.
1601 .RE
1602 .TP
1603 \fB\-\-delta\fR[\fB=\fIoptions\fR]
1604 Add the Delta filter to the filter chain.
1605 The Delta filter can be only used as a non-last filter
1606 in the filter chain.
1607 .IP ""
1608 Currently only simple byte-wise delta calculation is supported.
1609 It can be useful when compressing e.g. uncompressed bitmap images
1610 or uncompressed PCM audio.
1611 However, special purpose algorithms may give significantly better
1612 results than Delta + LZMA2.
1613 This is true especially with audio,
1614 which compresses faster and better e.g. with
1615 .BR flac (1).
1616 .IP ""
1617 Supported
1618 .IR options :
1619 .RS
1620 .TP
1621 .BI dist= distance
1622 Specify the
1623 .I distance
1624 of the delta calculation in bytes.
1625 .I distance
1626 must be 1\-256.
1627 The default is 1.
1628 .IP ""
1629 For example, with
1630 .B dist=2
1631 and eight-byte input A1 B1 A2 B3 A3 B5 A4 B7, the output will be
1632 A1 B1 01 02 01 02 01 02.
1633 .RE
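.IP ""
The example above can be reproduced with a small Python sketch
of byte-wise delta coding.
It assumes that bytes before the start of the input count as zero,
which matches the example; the function names are made up:
.RS
.PP
.nf
.ft CW
```python
def delta_encode(data, dist=1):
    """Subtract the byte dist positions earlier, modulo 256."""
    out = bytearray()
    for i, byte in enumerate(data):
        previous = data[i - dist] if i >= dist else 0
        out.append((byte - previous) & 0xFF)
    return bytes(out)


def delta_decode(data, dist=1):
    """Inverse of delta_encode: add back the already decoded byte."""
    out = bytearray()
    for i, byte in enumerate(data):
        previous = out[i - dist] if i >= dist else 0
        out.append((byte + previous) & 0xFF)
    return bytes(out)


original = bytes.fromhex("A1B1A2B3A3B5A4B7")
encoded = delta_encode(original, dist=2)
```
.ft R
.fi
.RE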
1634 .
1635 .SS "Other options"
1636 .TP
1637 .BR \-q ", " \-\-quiet
1638 Suppress warnings and notices.
1639 Specify this twice to suppress errors too.
1640 This option has no effect on the exit status.
1641 That is, even if a warning was suppressed,
1642 the exit status to indicate a warning is still used.
1643 .TP
1644 .BR \-v ", " \-\-verbose
1645 Be verbose.
1646 If standard error is connected to a terminal,
1647 .B xz
1648 will display a progress indicator.
1649 Specifying
1650 .B \-\-verbose
1651 twice will give even more verbose output.
1652 .IP ""
1653 The progress indicator shows the following information:
1654 .RS
1655 .IP \(bu 3
1656 Completion percentage is shown
1657 if the size of the input file is known.
1658 That is, the percentage cannot be shown in pipes.
1659 .IP \(bu 3
1660 Amount of compressed data produced (compressing)
1661 or consumed (decompressing).
1662 .IP \(bu 3
1663 Amount of uncompressed data consumed (compressing)
1664 or produced (decompressing).
1665 .IP \(bu 3
1666 Compression ratio, which is calculated by dividing
1667 the amount of compressed data processed so far by
1668 the amount of uncompressed data processed so far.
1669 .IP \(bu 3
1670 Compression or decompression speed.
1671 This is measured as the amount of uncompressed data consumed
1672 (compression) or produced (decompression) per second.
1673 It is shown after a few seconds have passed since
1674 .B xz
1675 started processing the file.
1676 .IP \(bu 3
1677 Elapsed time in the format M:SS or H:MM:SS.
1678 .IP \(bu 3
1679 Estimated remaining time is shown
1680 only when the size of the input file is
1681 known and a couple of seconds have already passed since
1682 .B xz
1683 started processing the file.
1684 The time is shown in a less precise format which
1685 never has any colons, e.g. 2 min 30 s.
1686 .RE
1687 .IP ""
1688 When standard error is not a terminal,
1689 .B \-\-verbose
1690 will make
1691 .B xz
1692 print the filename, compressed size, uncompressed size,
1693 compression ratio, and possibly also the speed and elapsed time
1694 on a single line to standard error after compressing or
1695 decompressing the file.
1696 The speed and elapsed time are included only when
1697 the operation took at least a few seconds.
1698 If the operation didn't finish, e.g. due to user interruption,
the completion percentage is also printed
1700 if the size of the input file is known.
1701 .TP
1702 .BR \-Q ", " \-\-no\-warn
1703 Don't set the exit status to 2
1704 even if a condition worth a warning was detected.
1705 This option doesn't affect the verbosity level, thus both
1706 .B \-\-quiet
1707 and
1708 .B \-\-no\-warn
1709 have to be used to not display warnings and
1710 to not alter the exit status.
1711 .TP
1712 .B \-\-robot
1713 Print messages in a machine-parsable format.
1714 This is intended to ease writing frontends that want to use
1715 .B xz
1716 instead of liblzma, which may be the case with various scripts.
1717 The output with this option enabled is meant to be stable across
1718 .B xz
1719 releases.
1720 See the section
1721 .B "ROBOT MODE"
1722 for details.
1723 .TP
1724 .BR \-\-info\-memory
1725 Display, in human-readable format, how much physical memory (RAM)
1726 .B xz
1727 thinks the system has and the memory usage limits for compression
1728 and decompression, and exit successfully.
1729 .TP
1730 .BR \-h ", " \-\-help
1731 Display a help message describing the most commonly used options,
1732 and exit successfully.
1733 .TP
1734 .BR \-H ", " \-\-long\-help
1735 Display a help message describing all features of
1736 .BR xz ,
and exit successfully.
1738 .TP
1739 .BR \-V ", " \-\-version
1740 Display the version number of
1741 .B xz
and liblzma in human-readable format.
1743 To get machine-parsable output, specify
1744 .B \-\-robot
1745 before
1746 .BR \-\-version .
1747 .
1748 .SH "ROBOT MODE"
1749 The robot mode is activated with the
1750 .B \-\-robot
1751 option.
1752 It makes the output of
1753 .B xz
1754 easier to parse by other programs.
1755 Currently
1756 .B \-\-robot
1757 is supported only together with
1758 .BR \-\-version ,
1759 .BR \-\-info\-memory ,
1760 and
1761 .BR \-\-list .
1762 It will be supported for compression and
1763 decompression in the future.
1764 .
1765 .SS Version
1766 .B "xz \-\-robot \-\-version"
1767 will print the version number of
1768 .B xz
1769 and liblzma in the following format:
1770 .PP
1771 .BI XZ_VERSION= XYYYZZZS
1772 .br
1773 .BI LIBLZMA_VERSION= XYYYZZZS
1774 .TP
1775 .I X
1776 Major version.
1777 .TP
1778 .I YYY
1779 Minor version.
1780 Even numbers are stable.
1781 Odd numbers are alpha or beta versions.
1782 .TP
1783 .I ZZZ
1784 Patch level for stable releases or
1785 just a counter for development releases.
1786 .TP
1787 .I S
1788 Stability.
1789 0 is alpha, 1 is beta, and 2 is stable.
1790 .I S
should always be 2 when
1792 .I YYY
1793 is even.
1794 .PP
1795 .I XYYYZZZS
1796 are the same on both lines if
1797 .B xz
1798 and liblzma are from the same XZ Utils release.
1799 .PP
1800 Examples: 4.999.9beta is
1801 .B 49990091
1802 and
1803 5.0.0 is
1804 .BR 50000002 .
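.PP
A script could split such a value as sketched below in Python.
The function name
.B parse_robot_version
is hypothetical, and the sketch assumes the eight-digit
layout shown above (a single-digit major version):
.RS
.PP
.nf
.ft CW
```python
def parse_robot_version(value):
    """Split an XYYYZZZS string into (major, minor, patch, stability)."""
    if len(value) != 8 or not value.isdigit():
        raise ValueError("expected eight digits: " + value)
    names = {"0": "alpha", "1": "beta", "2": "stable"}
    if value[7] not in names:
        raise ValueError("unknown stability digit: " + value[7])
    return int(value[0]), int(value[1:4]), int(value[4:7]), names[value[7]]


beta = parse_robot_version("49990091")
stable = parse_robot_version("50000002")
```
.ft R
.fi
.RE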
1805 .
1806 .SS "Memory limit information"
1807 .B "xz \-\-robot \-\-info\-memory"
1808 prints a single line with three tab-separated columns:
1809 .IP 1. 4
Total amount of physical memory (RAM) in bytes.
1811 .IP 2. 4
1812 Memory usage limit for compression in bytes.
1813 A special value of zero indicates the default setting,
1814 which for single-threaded mode is the same as no limit.
1815 .IP 3. 4
1816 Memory usage limit for decompression in bytes.
1817 A special value of zero indicates the default setting,
1818 which for single-threaded mode is the same as no limit.
1819 .PP
1820 In the future, the output of
1821 .B "xz \-\-robot \-\-info\-memory"
1822 may have more columns, but never more than a single line.
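.PP
A parser for this line might look like the following Python sketch.
The function name
.B parse_info_memory
and the sample line are made up; the sample is not output
captured from a real system:
.RS
.PP
.nf
.ft CW
```python
TAB = chr(9)  # columns are separated by tab characters


def parse_info_memory(line):
    """Return (total_ram, compress_limit, decompress_limit) in bytes."""
    columns = line.strip().split(TAB)
    # Only the first three columns are defined; later ones are ignored
    # so that the parser keeps working if new columns are added.
    total_ram, compress_limit, decompress_limit = (
        int(column) for column in columns[:3]
    )
    return total_ram, compress_limit, decompress_limit


sample_line = TAB.join(["8589934592", "0", "0"])
limits = parse_info_memory(sample_line)
```
.ft R
.fi
.RE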
1823 .
1824 .SS "List mode"
1825 .B "xz \-\-robot \-\-list"
1826 uses tab-separated output.
1827 The first column of every line has a string
1828 that indicates the type of the information found on that line:
1829 .TP
1830 .B name
1831 This is always the first line when starting to list a file.
1832 The second column on the line is the filename.
1833 .TP
1834 .B file
1835 This line contains overall information about the
1836 .B .xz
1837 file.
1838 This line is always printed after the
1839 .B name
1840 line.
1841 .TP
1842 .B stream
1843 This line type is used only when
1844 .B \-\-verbose
1845 was specified.
1846 There are as many
1847 .B stream
1848 lines as there are streams in the
1849 .B .xz
1850 file.
1851 .TP
1852 .B block
1853 This line type is used only when
1854 .B \-\-verbose
1855 was specified.
1856 There are as many
1857 .B block
1858 lines as there are blocks in the
1859 .B .xz
1860 file.
1861 The
1862 .B block
1863 lines are shown after all the
1864 .B stream
1865 lines; different line types are not interleaved.
1866 .TP
1867 .B summary
1868 This line type is used only when
1869 .B \-\-verbose
1870 was specified twice.
1871 This line is printed after all
1872 .B block
1873 lines.
1874 Like the
1875 .B file
1876 line, the
1877 .B summary
1878 line contains overall information about the
1879 .B .xz
1880 file.
1881 .TP
1882 .B totals
1883 This line is always the very last line of the list output.
1884 It shows the total counts and sizes.
1885 .PP
1886 The columns of the
1887 .B file
1888 lines:
1889 .PD 0
1890 .RS
1891 .IP 2. 4
1892 Number of streams in the file
1893 .IP 3. 4
1894 Total number of blocks in the stream(s)
1895 .IP 4. 4
1896 Compressed size of the file
1897 .IP 5. 4
1898 Uncompressed size of the file
1899 .IP 6. 4
1900 Compression ratio, for example
.BR 0.123 .
1902 If ratio is over 9.999, three dashes
1903 .RB ( \-\-\- )
1904 are displayed instead of the ratio.
1905 .IP 7. 4
1906 Comma-separated list of integrity check names.
1907 The following strings are used for the known check types:
1908 .BR None ,
1909 .BR CRC32 ,
1910 .BR CRC64 ,
1911 and
1912 .BR SHA\-256 .
1913 For unknown check types,
1914 .BI Unknown\- N
1915 is used, where
1916 .I N
1917 is the Check ID as a decimal number (one or two digits).
1918 .IP 8. 4
1919 Total size of stream padding in the file
1920 .RE
1921 .PD
1922 .PP
1923 The columns of the
1924 .B stream
1925 lines:
1926 .PD 0
1927 .RS
1928 .IP 2. 4
1929 Stream number (the first stream is 1)
1930 .IP 3. 4
1931 Number of blocks in the stream
1932 .IP 4. 4
1933 Compressed start offset
1934 .IP 5. 4
1935 Uncompressed start offset
1936 .IP 6. 4
1937 Compressed size (does not include stream padding)
1938 .IP 7. 4
1939 Uncompressed size
1940 .IP 8. 4
1941 Compression ratio
1942 .IP 9. 4
1943 Name of the integrity check
1944 .IP 10. 4
1945 Size of stream padding
1946 .RE
1947 .PD
1948 .PP
1949 The columns of the
1950 .B block
1951 lines:
1952 .PD 0
1953 .RS
1954 .IP 2. 4
1955 Number of the stream containing this block
1956 .IP 3. 4
1957 Block number relative to the beginning of the stream
1958 (the first block is 1)
1959 .IP 4. 4
1960 Block number relative to the beginning of the file
1961 .IP 5. 4
1962 Compressed start offset relative to the beginning of the file
1963 .IP 6. 4
1964 Uncompressed start offset relative to the beginning of the file
1965 .IP 7. 4
1966 Total compressed size of the block (includes headers)
1967 .IP 8. 4
1968 Uncompressed size
1969 .IP 9. 4
1970 Compression ratio
1971 .IP 10. 4
1972 Name of the integrity check
1973 .RE
1974 .PD
1975 .PP
1976 If
1977 .B \-\-verbose
1978 was specified twice, additional columns are included on the
1979 .B block
1980 lines.
1981 These are not displayed with a single
1982 .BR \-\-verbose ,
1983 because getting this information requires many seeks
1984 and can thus be slow:
1985 .PD 0
1986 .RS
1987 .IP 11. 4
1988 Value of the integrity check in hexadecimal
1989 .IP 12. 4
1990 Block header size
1991 .IP 13. 4
1992 Block flags:
1993 .B c
1994 indicates that compressed size is present, and
1995 .B u
1996 indicates that uncompressed size is present.
1997 If the flag is not set, a dash
1998 .RB ( \- )
1999 is shown instead to keep the string length fixed.
2000 New flags may be added to the end of the string in the future.
2001 .IP 14. 4
2002 Size of the actual compressed data in the block (this excludes
2003 the block header, block padding, and check fields)
2004 .IP 15. 4
2005 Amount of memory (in bytes) required to decompress
2006 this block with this
2007 .B xz
2008 version
2009 .IP 16. 4
2010 Filter chain.
2011 Note that most of the options used at compression time
2012 cannot be known, because only the options
2013 that are needed for decompression are stored in the
2014 .B .xz
2015 headers.
2016 .RE
2017 .PD
2018 .PP
2019 The columns of the
2020 .B summary
2021 lines:
2022 .PD 0
2023 .RS
2024 .IP 2. 4
2025 Amount of memory (in bytes) required to decompress
2026 this file with this
2027 .B xz
2028 version
2029 .IP 3. 4
2030 .B yes
2031 or
2032 .B no
2033 indicating if all block headers have both compressed size and
2034 uncompressed size stored in them
2035 .PP
2036 .I Since
2037 .B xz
2038 .I 5.1.2alpha:
2039 .IP 4. 4
2040 Minimum
2041 .B xz
2042 version required to decompress the file
2043 .RE
2044 .PD
2045 .PP
2046 The columns of the
2047 .B totals
2048 line:
2049 .PD 0
2050 .RS
2051 .IP 2. 4
2052 Number of streams
2053 .IP 3. 4
2054 Number of blocks
2055 .IP 4. 4
2056 Compressed size
2057 .IP 5. 4
2058 Uncompressed size
2059 .IP 6. 4
2060 Average compression ratio
2061 .IP 7. 4
2062 Comma-separated list of integrity check names
2063 that were present in the files
2064 .IP 8. 4
2065 Stream padding size
2066 .IP 9. 4
2067 Number of files.
2068 This is here to
2069 keep the order of the earlier columns the same as on
2070 .B file
2071 lines.
2072 .PD
2073 .RE
2074 .PP
2075 If
2076 .B \-\-verbose
2077 was specified twice, additional columns are included on the
2078 .B totals
2079 line:
2080 .PD 0
2081 .RS
2082 .IP 10. 4
2083 Maximum amount of memory (in bytes) required to decompress
2084 the files with this
2085 .B xz
2086 version
2087 .IP 11. 4
2088 .B yes
2089 or
2090 .B no
2091 indicating if all block headers have both compressed size and
2092 uncompressed size stored in them
2093 .PP
2094 .I Since
2095 .B xz
2096 .I 5.1.2alpha:
2097 .IP 12. 4
2098 Minimum
2099 .B xz
2100 version required to decompress the file
2101 .RE
2102 .PD
2103 .PP
2104 Future versions may add new line types and
2105 new columns can be added to the existing line types,
2106 but the existing columns won't be changed.
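.PP
Because new line types and columns may appear,
a parser should tolerate both.
The following Python sketch groups lines by their first column;
the function name
.B parse_robot_list
and the sample lines are made up and are not output
captured from a real file:
.RS
.PP
.nf
.ft CW
```python
TAB = chr(9)  # columns are separated by tab characters


def parse_robot_list(lines):
    """Group the remaining columns of each line by the line type."""
    grouped = {}
    for line in lines:
        kind, *columns = line.rstrip().split(TAB)
        # Unknown line types and extra columns are kept, not rejected.
        grouped.setdefault(kind, []).append(columns)
    return grouped


sample_lines = [
    TAB.join(["name", "foo.xz"]),
    TAB.join(["file", "1", "1", "100", "1000", "0.100", "CRC64", "0"]),
    TAB.join(["totals", "1", "1", "100", "1000", "0.100", "CRC64", "0", "1"]),
]
grouped = parse_robot_list(sample_lines)
# Column 2 of a line is columns[0] here, so column 5 is columns[3].
uncompressed_size = int(grouped["file"][0][3])
```
.ft R
.fi
.RE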
2107 .
2108 .SH "EXIT STATUS"
2109 .TP
2110 .B 0
2111 All is good.
2112 .TP
2113 .B 1
2114 An error occurred.
2115 .TP
2116 .B 2
2117 Something worth a warning occurred,
2118 but no actual errors occurred.
2119 .PP
2120 Notices (not warnings or errors) printed on standard error
2121 don't affect the exit status.
2122 .
2123 .SH ENVIRONMENT
2124 .B xz
2125 parses space-separated lists of options
2126 from the environment variables
2127 .B XZ_DEFAULTS
2128 and
2129 .BR XZ_OPT ,
2130 in this order, before parsing the options from the command line.
2131 Note that only options are parsed from the environment variables;
2132 all non-options are silently ignored.
2133 Parsing is done with
2134 .BR getopt_long (3)
2135 which is used also for the command line arguments.
2136 .TP
2137 .B XZ_DEFAULTS
2138 User-specific or system-wide default options.
2139 Typically this is set in a shell initialization script to enable
2140 .BR xz 's
2141 memory usage limiter by default.
2142 Excluding shell initialization scripts
2143 and similar special cases, scripts must never set or unset
2144 .BR XZ_DEFAULTS .
2145 .TP
2146 .B XZ_OPT
2147 This is for passing options to
2148 .B xz
2149 when it is not possible to set the options directly on the
2150 .B xz
2151 command line.
2152 This is the case e.g. when
2153 .B xz
2154 is run by a script or tool, e.g. GNU
2155 .BR tar (1):
2156 .RS
2157 .RS
2158 .PP
2159 .nf
2160 .ft CW
2161 XZ_OPT=\-2v tar caf foo.tar.xz foo
2162 .ft R
2163 .fi
2164 .RE
2165 .RE
2166 .IP ""
2167 Scripts may use
2168 .B XZ_OPT
2169 e.g. to set script-specific default compression options.
2170 It is still recommended to allow users to override
2171 .B XZ_OPT
2172 if that is reasonable, e.g. in
2173 .BR sh (1)
2174 scripts one may use something like this:
2175 .RS
2176 .RS
2177 .PP
2178 .nf
2179 .ft CW
2180 XZ_OPT=${XZ_OPT\-"\-7e"}
2181 export XZ_OPT
2182 .ft R
2183 .fi
2184 .RE
2185 .RE
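.IP ""
A script written in another language can follow the same pattern.
The Python sketch below is only an illustration; the function name
.B env_with_default_xz_opt
is hypothetical:
.RS
.PP
.nf
.ft CW
```python
import os


def env_with_default_xz_opt(default="-7e", environ=None):
    """Return a copy of the environment with XZ_OPT set if it was unset."""
    env = dict(os.environ if environ is None else environ)
    # Like ${XZ_OPT-"-7e"}: an existing value, even an empty one, is kept.
    env.setdefault("XZ_OPT", default)
    return env


unset_case = env_with_default_xz_opt(environ={})
user_case = env_with_default_xz_opt(environ={"XZ_OPT": "-2v"})
```
.ft R
.fi
.RE
.IP ""
The returned mapping can then be passed as the
.B env
argument of
.B subprocess.run
when starting
.B xz
or a tool that runs it.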
2186 .
2187 .SH "LZMA UTILS COMPATIBILITY"
2188 The command line syntax of
2189 .B xz
2190 is practically a superset of
2191 .BR lzma ,
2192 .BR unlzma ,
2193 and
2194 .BR lzcat
as found in LZMA Utils 4.32.x.
2196 In most cases, it is possible to replace
2197 LZMA Utils with XZ Utils without breaking existing scripts.
2198 There are some incompatibilities though,
2199 which may sometimes cause problems.
2200 .
2201 .SS "Compression preset levels"
2202 The numbering of the compression level presets is not identical in
2203 .B xz
2204 and LZMA Utils.
2205 The most important difference is how dictionary sizes
2206 are mapped to different presets.
2207 Dictionary size is roughly equal to the decompressor memory usage.
2208 .RS
2209 .PP
2210 .TS
2211 tab(;);
2212 c c c
2213 c n n.
2214 Level;xz;LZMA Utils
2215 \-0;256 KiB;N/A
2216 \-1;1 MiB;64 KiB
2217 \-2;2 MiB;1 MiB
2218 \-3;4 MiB;512 KiB
2219 \-4;4 MiB;1 MiB
2220 \-5;8 MiB;2 MiB
2221 \-6;8 MiB;4 MiB
2222 \-7;16 MiB;8 MiB
2223 \-8;32 MiB;16 MiB
2224 \-9;64 MiB;32 MiB
2225 .TE
2226 .RE
2227 .PP
2228 The dictionary size differences affect
2229 the compressor memory usage too,
2230 but there are some other differences between
2231 LZMA Utils and XZ Utils, which
2232 make the difference even bigger:
2233 .RS
2234 .PP
2235 .TS
2236 tab(;);
2237 c c c
2238 c n n.
2239 Level;xz;LZMA Utils 4.32.x
2240 \-0;3 MiB;N/A
2241 \-1;9 MiB;2 MiB
2242 \-2;17 MiB;12 MiB
2243 \-3;32 MiB;12 MiB
2244 \-4;48 MiB;16 MiB
2245 \-5;94 MiB;26 MiB
2246 \-6;94 MiB;45 MiB
2247 \-7;186 MiB;83 MiB
2248 \-8;370 MiB;159 MiB
2249 \-9;674 MiB;311 MiB
2250 .TE
2251 .RE
2252 .PP
2253 The default preset level in LZMA Utils is
2254 .B \-7
2255 while in XZ Utils it is
2256 .BR \-6 ,
2257 so both use an 8 MiB dictionary by default.
2258 .
2259 .SS "Streamed vs. non-streamed .lzma files"
2260 The uncompressed size of the file can be stored in the
2261 .B .lzma
2262 header.
2263 LZMA Utils does that when compressing regular files.
The alternative is to mark the uncompressed size as unknown
and use an end-of-payload marker to indicate
where the decompressor should stop.
LZMA Utils uses this method when the uncompressed size isn't known,
which is the case, for example, in pipes.
2269 .PP
2270 .B xz
2271 supports decompressing
2272 .B .lzma
files with or without an end-of-payload marker, but all
2274 .B .lzma
2275 files created by
2276 .B xz
will use an end-of-payload marker and have the uncompressed size
marked as unknown in the
2279 .B .lzma
2280 header.
2281 This may be a problem in some uncommon situations.
2282 For example, a
2283 .B .lzma
2284 decompressor in an embedded device might work
only with files that have a known uncompressed size.
2286 If you hit this problem, you need to use LZMA Utils
2287 or LZMA SDK to create
2288 .B .lzma
files with a known uncompressed size.
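.PP
To find out which kind of
.B .lzma
file one has, the header can be inspected directly.
The following
.BR sh (1)
sketch assumes the usual 13-byte
.B .lzma
header layout (one properties byte, a four-byte dictionary size,
and an eight-byte uncompressed size in which all bits set means unknown).
That layout is not described in this manual, and the function name
is made up for illustration:
.RS
.PP
.nf
.ft CW
```shell
# Hypothetical helper: succeed if the .lzma header of file $1
# stores a known uncompressed size (assumes the 13-byte header).
lzma_size_known() {
    # Bytes 5..12 hold the size field; dump them as hex digits.
    field=$(od \-An \-tx1 \-j5 \-N8 "$1" | tr \-d '[:space:]')
    # A short or unreadable file gives fewer than 16 hex digits.
    [ "${#field}" \-eq 16 ] && [ "$field" != ffffffffffffffff ]
}
```
.ft R
.fi
.RE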
2290 .
2291 .SS "Unsupported .lzma files"
2292 The
2293 .B .lzma
2294 format allows
2295 .I lc
2296 values up to 8, and
2297 .I lp
2298 values up to 4.
2299 LZMA Utils can decompress files with any
2300 .I lc
2301 and
2302 .IR lp ,
2303 but always creates files with
2304 .B lc=3
2305 and
2306 .BR lp=0 .
2307 Creating files with other
2308 .I lc
2309 and
2310 .I lp
2311 is possible with
2312 .B xz
2313 and with LZMA SDK.
2314 .PP
2315 The implementation of the LZMA1 filter in liblzma
2316 requires that the sum of
2317 .I lc
2318 and
2319 .I lp
2320 must not exceed 4.
2321 Thus,
2322 .B .lzma
files that exceed this limitation cannot be decompressed with
2324 .BR xz .
2325 .PP
2326 LZMA Utils creates only
2327 .B .lzma
2328 files which have a dictionary size of
2329 .RI "2^" n
2330 (a power of 2) but accepts files with any dictionary size.
2331 liblzma accepts only
2332 .B .lzma
2333 files which have a dictionary size of
2334 .RI "2^" n
2335 or
2336 .RI "2^" n " + 2^(" n "\-1)."
2337 This is to decrease false positives when detecting
2338 .B .lzma
2339 files.
2340 .PP
2341 These limitations shouldn't be a problem in practice,
2342 since practically all
2343 .B .lzma
2344 files have been compressed with settings that liblzma will accept.
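.PP
The liblzma limits described in this section can be expressed as a short
.BR sh (1)
sketch.
The function names are made up for illustration, and the checks model
only the rules stated here, not everything liblzma validates:
.RS
.PP
.nf
.ft CW
```shell
# Hypothetical helper: succeed if lc ($1) plus lp ($2) is at most 4.
lc_lp_ok() {
    [ $(($1 + $2)) \-le 4 ]
}
# Hypothetical helper: succeed if the dictionary size $1 (in bytes)
# is of the form 2^n or 2^n + 2^(n\-1).
dict_size_ok() {
    d=$1
    [ "$d" \-gt 0 ] || return 1
    # Drop trailing zero bits; 2^n leaves 1, 2^n + 2^(n\-1) leaves 3.
    while [ $((d % 2)) \-eq 0 ]; do
        d=$((d / 2))
    done
    [ "$d" \-eq 1 ] || [ "$d" \-eq 3 ]
}
```
.ft R
.fi
.RE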
2345 .
2346 .SS "Trailing garbage"
2347 When decompressing,
LZMA Utils silently ignores everything after the first
.B .lzma
stream.
In most situations, this is a bug.
This also means that LZMA Utils
doesn't support decompressing concatenated
2354 .B .lzma
2355 files.
2356 .PP
2357 If there is data left after the first
2358 .B .lzma
2359 stream,
2360 .B xz
2361 considers the file to be corrupt unless
2362 .B \-\-single\-stream
2363 was used.
2364 This may break obscure scripts which have
2365 assumed that trailing garbage is ignored.
2366 .
2367 .SH NOTES
2368 .
2369 .SS "Compressed output may vary"
2370 The exact compressed output produced from
2371 the same uncompressed input file
2372 may vary between XZ Utils versions even if
2373 compression options are identical.
2374 This is because the encoder can be improved
2375 (faster or better compression)
2376 without affecting the file format.
2377 The output can vary even between different
2378 builds of the same XZ Utils version,
2379 if different build options are used.
2380 .PP
2381 The above means that once
2382 .B \-\-rsyncable
2383 has been implemented,
2384 the resulting files won't necessarily be rsyncable
2385 unless both old and new files have been compressed
2386 with the same xz version.
2387 This problem can be fixed if a part of the encoder
2388 implementation is frozen to keep rsyncable output
2389 stable across xz versions.
2390 .
2391 .SS "Embedded .xz decompressors"
2392 Embedded
2393 .B .xz
2394 decompressor implementations like XZ Embedded don't necessarily
2395 support files created with integrity
2396 .I check
2397 types other than
2398 .B none
2399 and
2400 .BR crc32 .
2401 Since the default is
2402 .BR \-\-check=crc64 ,
2403 you must use
2404 .B \-\-check=none
2405 or
2406 .B \-\-check=crc32
2407 when creating files for embedded systems.
2408 .PP
2409 Outside embedded systems, all
2410 .B .xz
2411 format decompressors support all the
2412 .I check
2413 types, or at least are able to decompress
2414 the file without verifying the
2415 integrity check if the particular
2416 .I check
2417 is not supported.
2418 .PP
2419 XZ Embedded supports BCJ filters,
2420 but only with the default start offset.
2421 .
2422 .SH EXAMPLES
2423 .
2424 .SS Basics
2425 Compress the file
2426 .I foo
2427 into
2428 .I foo.xz
2429 using the default compression level
2430 .RB ( \-6 ),
2431 and remove
2432 .I foo
2433 if compression is successful:
2434 .RS
2435 .PP
2436 .nf
2437 .ft CW
2438 xz foo
2439 .ft R
2440 .fi
2441 .RE
2442 .PP
2443 Decompress
2444 .I bar.xz
2445 into
2446 .I bar
2447 and don't remove
2448 .I bar.xz
2449 even if decompression is successful:
2450 .RS
2451 .PP
2452 .nf
2453 .ft CW
2454 xz \-dk bar.xz
2455 .ft R
2456 .fi
2457 .RE
2458 .PP
2459 Create
2460 .I baz.tar.xz
2461 with the preset
2462 .B \-4e
2463 .RB ( "\-4 \-\-extreme" ),
2464 which is slower than e.g. the default
2465 .BR \-6 ,
2466 but needs less memory for compression and decompression (48\ MiB
2467 and 5\ MiB, respectively):
2468 .RS
2469 .PP
2470 .nf
2471 .ft CW
2472 tar cf \- baz | xz \-4e > baz.tar.xz
2473 .ft R
2474 .fi
2475 .RE
2476 .PP
2477 A mix of compressed and uncompressed files can be decompressed
2478 to standard output with a single command:
2479 .RS
2480 .PP
2481 .nf
2482 .ft CW
2483 xz \-dcf a.txt b.txt.xz c.txt d.txt.lzma > abcd.txt
2484 .ft R
2485 .fi
2486 .RE
2487 .
2488 .SS "Parallel compression of many files"
2489 On GNU and *BSD,
2490 .BR find (1)
2491 and
2492 .BR xargs (1)
2493 can be used to parallelize compression of many files:
2494 .RS
2495 .PP
2496 .nf
2497 .ft CW
2498 find . \-type f \e! \-name '*.xz' \-print0 \e
2499     | xargs \-0r \-P4 \-n16 xz \-T1
2500 .ft R
2501 .fi
2502 .RE
2503 .PP
2504 The
2505 .B \-P
2506 option to
2507 .BR xargs (1)
2508 sets the number of parallel
2509 .B xz
2510 processes.
2511 The best value for the
2512 .B \-n
2513 option depends on how many files there are to be compressed.
2514 If there are only a couple of files,
2515 the value should probably be 1;
2516 with tens of thousands of files,
2517 100 or even more may be appropriate to reduce the number of
2518 .B xz
2519 processes that
2520 .BR xargs (1)
2521 will eventually create.
2522 .PP
2523 The option
2524 .B \-T1
2525 for
2526 .B xz
2527 is there to force it to single-threaded mode, because
2528 .BR xargs (1)
2529 is used to control the amount of parallelization.
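.PP
As a rough guide for choosing
.BR \-n ,
the number of
.B xz
invocations can be estimated by dividing the number of files by the
.B \-n
value and rounding up.
The following sketch does that arithmetic.
The function name is made up for illustration, and
.BR xargs (1)
may start more processes than this if a command line would become too long:
.RS
.PP
.nf
.ft CW
```shell
# Hypothetical helper: estimate how many xz processes xargs starts
# for $1 files when at most $2 files are passed per invocation.
xz_invocations() {
    n=$(($1 / $2))
    # A final, partially filled invocation handles the remainder.
    [ $(($1 % $2)) \-eq 0 ] || n=$((n + 1))
    echo "$n"
}
xz_invocations 50000 16     # prints 3125
xz_invocations 50000 100    # prints 500
```
.ft R
.fi
.RE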
2530 .
2531 .SS "Robot mode"
2532 Calculate how many bytes have been saved in total
2533 after compressing multiple files:
2534 .RS
2535 .PP
2536 .nf
2537 .ft CW
2538 xz \-\-robot \-\-list *.xz | awk '/^totals/{print $5\-$4}'
2539 .ft R
2540 .fi
2541 .RE
2542 .PP
2543 A script may want to know that it is using new enough
2544 .BR xz .
2545 The following
2546 .BR sh (1)
2547 script checks that the version number of the
2548 .B xz
2549 tool is at least 5.0.0.
2550 This method is compatible with old beta versions,
2551 which didn't support the
2552 .B \-\-robot
2553 option:
2554 .RS
2555 .PP
2556 .nf
2557 .ft CW
2558 if ! eval "$(xz \-\-robot \-\-version 2> /dev/null)" ||
2559         [ "$XZ_VERSION" \-lt 50000002 ]; then
2560     echo "Your xz is too old."
2561 fi
2562 unset XZ_VERSION LIBLZMA_VERSION
2563 .ft R
2564 .fi
2565 .RE
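.PP
The value of
.B XZ_VERSION
packs the version into a single integer.
Assuming the
.I XYYYZZZS
layout (major, minor, patch level, and a stability digit
where 2 means stable), under which 50000002 is 5.0.0 stable,
the number can be unpacked with
.BR sh (1)
arithmetic.
The function name is made up for illustration:
.RS
.PP
.nf
.ft CW
```shell
# Hypothetical helper: split an XZ_VERSION integer into its fields,
# assuming the XYYYZZZS layout.
decode_xz_version() {
    major=$(($1 / 10000000))
    minor=$(($1 / 10000 % 1000))
    patch=$(($1 / 10 % 1000))
    stability=$(($1 % 10))
    echo "$major.$minor.$patch stability=$stability"
}
decode_xz_version 50000002    # prints 5.0.0 stability=2
```
.ft R
.fi
.RE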
2566 .PP
2567 Set a memory usage limit for decompression using
2568 .BR XZ_OPT ,
2569 but if a limit has already been set, don't increase it:
2570 .RS
2571 .PP
2572 .nf
2573 .ft CW
2574 NEWLIM=$((123 << 20))  # 123 MiB
2575 OLDLIM=$(xz \-\-robot \-\-info\-memory | cut \-f3)
if [ "$OLDLIM" \-eq 0 ] || [ "$OLDLIM" \-gt "$NEWLIM" ]; then
2577     XZ_OPT="$XZ_OPT \-\-memlimit\-decompress=$NEWLIM"
2578     export XZ_OPT
2579 fi
2580 .ft R
2581 .fi
2582 .RE
2583 .
2584 .SS "Custom compressor filter chains"
2585 The simplest use for custom filter chains is
customizing an LZMA2 preset.
2587 This can be useful,
2588 because the presets cover only a subset of the
2589 potentially useful combinations of compression settings.
2590 .PP
2591 The CompCPU columns of the tables
2592 from the descriptions of the options
2593 .BR "\-0" " ... " "\-9"
2594 and
2595 .B \-\-extreme
2596 are useful when customizing LZMA2 presets.
2597 Here are the relevant parts collected from those two tables:
2598 .RS
2599 .PP
2600 .TS
2601 tab(;);
2602 c c
2603 n n.
2604 Preset;CompCPU
2605 \-0;0
2606 \-1;1
2607 \-2;2
2608 \-3;3
2609 \-4;4
2610 \-5;5
2611 \-6;6
2612 \-5e;7
2613 \-6e;8
2614 .TE
2615 .RE
2616 .PP
2617 If you know that a file requires
2618 somewhat big dictionary (e.g. 32 MiB) to compress well,
2619 but you want to compress it quicker than
2620 .B "xz \-8"
2621 would do, a preset with a low CompCPU value (e.g. 1)
2622 can be modified to use a bigger dictionary:
2623 .RS
2624 .PP
2625 .nf
2626 .ft CW
2627 xz \-\-lzma2=preset=1,dict=32MiB foo.tar
2628 .ft R
2629 .fi
2630 .RE
2631 .PP
2632 With certain files, the above command may be faster than
2633 .B "xz \-6"
2634 while compressing significantly better.
2635 However, it must be emphasized that only some files benefit from
2636 a big dictionary while keeping the CompCPU value low.
2637 The most obvious situation,
2638 where a big dictionary can help a lot,
2639 is an archive containing very similar files
2640 of at least a few megabytes each.
2641 The dictionary size has to be significantly bigger
2642 than any individual file to allow LZMA2 to take
2643 full advantage of the similarities between consecutive files.
2644 .PP
2645 If very high compressor and decompressor memory usage is fine,
2646 and the file being compressed is
2647 at least several hundred megabytes, it may be useful
2648 to use an even bigger dictionary than the 64 MiB that
2649 .B "xz \-9"
2650 would use:
2651 .RS
2652 .PP
2653 .nf
2654 .ft CW
2655 xz \-vv \-\-lzma2=dict=192MiB big_foo.tar
2656 .ft R
2657 .fi
2658 .RE
2659 .PP
2660 Using
2661 .B \-vv
2662 .RB ( "\-\-verbose \-\-verbose" )
2663 like in the above example can be useful
2664 to see the memory requirements
2665 of the compressor and decompressor.
2666 Remember that using a dictionary bigger than
2667 the size of the uncompressed file is waste of memory,
2668 so the above command isn't useful for small files.
2669 .PP
2670 Sometimes the compression time doesn't matter,
2671 but the decompressor memory usage has to be kept low
2672 e.g. to make it possible to decompress the file on
2673 an embedded system.
2674 The following command uses
2675 .B \-6e
2676 .RB ( "\-6 \-\-extreme" )
2677 as a base and sets the dictionary to only 64\ KiB.
2678 The resulting file can be decompressed with XZ Embedded
2679 (that's why there is
2680 .BR \-\-check=crc32 )
2681 using about 100\ KiB of memory.
2682 .RS
2683 .PP
2684 .nf
2685 .ft CW
2686 xz \-\-check=crc32 \-\-lzma2=preset=6e,dict=64KiB foo
2687 .ft R
2688 .fi
2689 .RE
2690 .PP
2691 If you want to squeeze out as many bytes as possible,
2692 adjusting the number of literal context bits
2693 .RI ( lc )
2694 and number of position bits
2695 .RI ( pb )
2696 can sometimes help.
2697 Adjusting the number of literal position bits
2698 .RI ( lp )
2699 might help too, but usually
2700 .I lc
2701 and
2702 .I pb
2703 are more important.
2704 E.g. a source code archive contains mostly US-ASCII text,
2705 so something like the following might give
2706 slightly (like 0.1\ %) smaller file than
2707 .B "xz \-6e"
2708 (try also without
2709 .BR lc=4 ):
2710 .RS
2711 .PP
2712 .nf
2713 .ft CW
2714 xz \-\-lzma2=preset=6e,pb=0,lc=4 source_code.tar
2715 .ft R
2716 .fi
2717 .RE
2718 .PP
2719 Using another filter together with LZMA2 can improve
2720 compression with certain file types.
2721 E.g. to compress a x86-32 or x86-64 shared library
2722 using the x86 BCJ filter:
2723 .RS
2724 .PP
2725 .nf
2726 .ft CW
2727 xz \-\-x86 \-\-lzma2 libfoo.so
2728 .ft R
2729 .fi
2730 .RE
2731 .PP
2732 Note that the order of the filter options is significant.
2733 If
2734 .B \-\-x86
2735 is specified after
2736 .BR \-\-lzma2 ,
2737 .B xz
2738 will give an error,
2739 because there cannot be any filter after LZMA2,
2740 and also because the x86 BCJ filter cannot be used
2741 as the last filter in the chain.
2742 .PP
2743 The Delta filter together with LZMA2
2744 can give good results with bitmap images.
2745 It should usually beat PNG,
2746 which has a few more advanced filters than simple
2747 delta but uses Deflate for the actual compression.
2748 .PP
2749 The image has to be saved in uncompressed format,
2750 e.g. as uncompressed TIFF.
2751 The distance parameter of the Delta filter is set
2752 to match the number of bytes per pixel in the image.
2753 E.g. 24-bit RGB bitmap needs
2754 .BR dist=3 ,
2755 and it is also good to pass
2756 .B pb=0
2757 to LZMA2 to accommodate the three-byte alignment:
2758 .RS
2759 .PP
2760 .nf
2761 .ft CW
2762 xz \-\-delta=dist=3 \-\-lzma2=pb=0 foo.tiff
2763 .ft R
2764 .fi
2765 .RE
2766 .PP
2767 If multiple images have been put into a single archive (e.g.\&
2768 .BR .tar ),
2769 the Delta filter will work on that too as long as all images
2770 have the same number of bytes per pixel.
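.PP
The Delta distance is the number of bits per pixel divided by eight.
A minimal
.BR sh (1)
sketch of that arithmetic follows.
The function name is made up for illustration, and pixel formats
that don't occupy a whole number of bytes are rejected:
.RS
.PP
.nf
.ft CW
```shell
# Hypothetical helper: print the Delta distance for $1 bits per pixel.
delta_dist() {
    [ "$1" \-gt 0 ] || return 1
    # Only whole-byte pixel sizes map to a Delta distance.
    [ $(($1 % 8)) \-eq 0 ] || return 1
    echo $(($1 / 8))
}
dist=$(delta_dist 24) && echo "dist=$dist"    # prints dist=3
```
.ft R
.fi
.RE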
2771 .
2772 .SH "SEE ALSO"
2773 .BR xzdec (1),
2774 .BR xzdiff (1),
2775 .BR xzgrep (1),
2776 .BR xzless (1),
2777 .BR xzmore (1),
2778 .BR gzip (1),
2779 .BR bzip2 (1),
2780 .BR 7z (1)
2781 .PP
2782 XZ Utils: <https://tukaani.org/xz/>
2783 .br
2784 XZ Embedded: <https://tukaani.org/xz/embedded.html>
2785 .br
2786 LZMA SDK: <http://7-zip.org/sdk.html>