.\"
.\" Copyright (c) 2008
.\" The DragonFly Project. All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\"
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in
.\" the documentation and/or other materials provided with the
.\" distribution.
.\" 3. Neither the name of The DragonFly Project nor the names of its
.\" contributors may be used to endorse or promote products derived
.\" from this software without specific, prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
.\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
.\" LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
.\" FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
.\" COPYRIGHT HOLDERS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
.\" INCIDENTAL, SPECIAL, EXEMPLARY OR CONSEQUENTIAL DAMAGES (INCLUDING,
.\" BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
.\" LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED
.\" AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
.\" OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
.\" OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.Dd April 19, 2011
.Dt HAMMER 5
.Os
.Sh NAME
.Nm HAMMER
.Nd HAMMER file system
.Sh SYNOPSIS
To compile this driver into the kernel,
place the following line in your
kernel configuration file:
.Bd -ragged -offset indent
.Cd "options HAMMER"
.Ed
.Pp
Alternatively, to load the driver as a
module at boot time, place the following line in
.Xr loader.conf 5 :
.Bd -literal -offset indent
hammer_load="YES"
.Ed
.Pp
To mount via
.Xr fstab 5 :
.Bd -literal -offset indent
/dev/ad0s1d[:/dev/ad1s1d:...] /mnt hammer rw 2 0
.Ed
.Sh DESCRIPTION
The
.Nm
file system provides facilities to store file system data onto disk devices
and is intended to replace
.Xr ffs 5
as the default file system for
.Dx .
.Pp
Among its features are instant crash recovery,
large file systems spanning multiple volumes,
data integrity checking,
data deduplication,
fine-grained history retention and snapshots,
pseudo-filesystems (PFSs),
mirroring capability and
an unlimited number of files and links.
.Pp
All functions related to managing
.Nm
file systems are provided by the
.Xr newfs_hammer 8 ,
.Xr mount_hammer 8 ,
.Xr hammer 8 ,
.Xr sysctl 8 ,
.Xr chflags 1 ,
and
.Xr undo 1
utilities.
.Pp
For a more detailed introduction refer to the paper and slides listed in the
.Sx SEE ALSO
section.
For some common usages of
.Nm
see the
.Sx EXAMPLES
section below.
.Pp
Description of
.Nm
features:
.Ss Instant Crash Recovery
After a non-graceful system shutdown,
.Nm
file systems will be brought back into a fully coherent state
when mounting the file system, usually within a few seconds.
.Pp
Related commands:
.Xr mount_hammer 8
.Ss Large File Systems & Multi Volume
A
.Nm
file system can be up to 1 Exabyte in size.
It can span up to 256 volumes;
each volume occupies a
.Dx
disk slice or partition, or another special file,
and can be up to 4096 TB in size.
The minimum recommended
.Nm
file system size is 50 GB.
For volumes over 2 TB in size,
.Xr gpt 8
and
.Xr disklabel64 8
normally need to be used.
.Pp
Related
.Xr hammer 8
commands:
.Cm volume-add ,
.Cm volume-del ,
.Cm volume-list ;
see also
.Xr newfs_hammer 8
.Ss Data Integrity Checking
.Nm
places a strong focus on data integrity:
CRC checks are made for all major structures and data.
.Nm
snapshots implement features to make data integrity checking easier:
The atime and mtime fields are locked to the ctime
for files accessed via a snapshot.
The
.Fa st_dev
field is based on the PFS
.Ar shared-uuid
and not on any real device.
This means that archiving the contents of a snapshot with e.g.\&
.Xr tar 1
and piping it to something like
.Xr md5 1
will yield a consistent result.
The consistency is also retained on mirroring targets.
.Ss Data Deduplication
Data deduplication can be used to save disk space.
It identifies data blocks which occur multiple times
and stores only one copy; multiple references are then made to this copy.
.Pp
Related
.Xr hammer 8
commands:
.Cm dedup ,
.Cm dedup-simulate ,
.Cm cleanup ,
.Cm config
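.Pp
As a sketch of how deduplication might be scheduled,
assuming the per PFS configuration format documented for
.Nm hammer Cm cleanup
in
.Xr hammer 8
(a period followed by a maximum run time),
a line such as the following in the configuration edited with
.Nm hammer Cm viconfig
would request a daily deduplication run limited to five minutes.
The two values are illustrative only:
.Bd -literal -offset indent
dedup 1d 5m
.Ed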
.Ss Transaction IDs
The
.Nm
file system uses 64-bit transaction ids to refer to historical
file or directory data.
Transaction ids used by
.Nm
are monotonically increasing over time.
In other words:
when a transaction is made,
.Nm
will always use higher transaction ids for subsequent transactions.
A transaction id is given in hexadecimal format
.Li 0x%016llx ,
such as
.Li 0x00000001061a8ba6 .
.Pp
Related
.Xr hammer 8
commands:
.Cm snapshot ,
.Cm snap ,
.Cm snaplo ,
.Cm snapq ,
.Cm snapls ,
.Cm synctid
.Ss History & Snapshots
History metadata on the media is written with every sync operation, so that
by default the resolution of a file's history is 30-60 seconds until the next
prune operation.
Prior versions of files and directories are generally accessible by appending
.Ql @@
and a transaction id to the name.
The common way of accessing history, however, is by taking snapshots.
.Pp
Snapshots are softlinks to prior versions of directories and their files.
Their data will be retained across prune operations for as long as the
softlink exists.
Removing the softlink enables the file system to reclaim the space
again upon the next prune & reblock operations.
In
.Nm
Version 3+, snapshots are also maintained as file system meta-data.
.Pp
Related
.Xr hammer 8
commands:
.Cm cleanup ,
.Cm history ,
.Cm snapshot ,
.Cm snap ,
.Cm snaplo ,
.Cm snapq ,
.Cm snaprm ,
.Cm snapls ,
.Cm config ,
.Cm viconfig ;
see also
.Xr undo 1
.Ss Pruning & Reblocking
Pruning is the act of deleting file system history.
By default only history used by the given snapshots
and history from after the latest snapshot will be retained.
By setting the per PFS parameter
.Cm prune-min ,
history is guaranteed to be retained for at least this time interval.
All other history is deleted.
Reblocking will reorder all elements and thus defragment the file system and
free space for reuse.
After pruning, a file system must be reblocked to recover all available space.
Reblocking is needed even when using the
.Cm nohistory
.Xr mount_hammer 8
option or
.Xr chflags 1
flag.
.Pp
Related
.Xr hammer 8
commands:
.Cm cleanup ,
.Cm snapshot ,
.Cm prune ,
.Cm prune-everything ,
.Cm rebalance ,
.Cm reblock ,
.Cm reblock-btree ,
.Cm reblock-inodes ,
.Cm reblock-dirs ,
.Cm reblock-data
.Ss Pseudo-Filesystems (PFSs)
A pseudo-filesystem, PFS for short, is a sub file system in a
.Nm
file system.
Each PFS has independent inode numbers.
All disk space in a
.Nm
file system is shared between all PFSs in it,
so each PFS is free to use all remaining space.
A
.Nm
file system supports up to 65536 PFSs.
The root of a
.Nm
file system is PFS# 0; it is called the root PFS and is always a master PFS.
.Pp
A PFS can be either master or slave.
Slaves are always read-only,
so they can't be updated by normal file operations, only by
.Xr hammer 8
operations like mirroring and pruning.
Upgrading slaves to masters and downgrading masters to slaves are supported.
.Pp
It is recommended to use a
.Nm null
mount to access a PFS, except for the root PFS;
this way no tools are confused by the PFS root being a symlink
and inodes not being unique across a
.Nm
file system.
.Pp
Many
.Xr hammer 8
operations operate per PFS;
these include mirroring, offline deduping, pruning, reblocking and rebalancing.
.Pp
Related
.Xr hammer 8
commands:
.Cm pfs-master ,
.Cm pfs-slave ,
.Cm pfs-status ,
.Cm pfs-update ,
.Cm pfs-destroy ,
.Cm pfs-upgrade ,
.Cm pfs-downgrade ;
see also
.Xr mount_null 8
.Ss Mirroring
Mirroring is the copying of all data in a file system, including snapshots
and other historical data.
In order to allow inode numbers to be duplicated on the slaves, the
.Nm
mirroring feature uses PFSs.
A master or slave PFS can be mirrored to a slave PFS.
That is, multiple slaves per master are supported for mirroring,
but multiple masters per slave are not.
.Pp
Related
.Xr hammer 8
commands:
.Cm mirror-copy ,
.Cm mirror-stream ,
.Cm mirror-read ,
.Cm mirror-read-stream ,
.Cm mirror-write ,
.Cm mirror-dump
.Ss Fsync Flush Modes
The
.Nm
file system implements several different
.Fn fsync
flush modes; the mode used is set via the
.Va vfs.hammer.flush_mode
sysctl, see
.Xr hammer 8
for details.
.Ss Unlimited Number of Files and Links
There is no limit on the number of files or links in a
.Nm
file system, apart from available disk space.
.Ss NFS Export
.Nm
file systems support NFS export.
NFS export of PFSs is done using
.Nm null
mounts (for files and directories in the root PFS a
.Nm null
mount is not needed).
For example, to export the PFS
.Pa /hammer/pfs/data ,
create a
.Nm null
mount, e.g.\& to
.Pa /hammer/data
and export the latter path.
.Pp
Don't export a directory containing a PFS (e.g.\&
.Pa /hammer/pfs
above).
Only the
.Nm null
mount for a PFS root
(e.g.\&
.Pa /hammer/data
above) should be exported (a subdirectory may be escaped if exported).
.Ss File System Versions
As new features have been introduced to
.Nm ,
its version number has been bumped.
Each
.Nm
file system has a version, which can be upgraded to support new features.
.Pp
Related
.Xr hammer 8
commands:
.Cm version ,
.Cm version-upgrade ;
see also
.Xr newfs_hammer 8
.Sh EXAMPLES
.Ss Preparing the File System
To create and mount a
.Nm
file system use the
.Xr newfs_hammer 8
and
.Xr mount_hammer 8
commands.
Note that all
.Nm
file systems must have a unique name on a per-machine basis.
.Bd -literal -offset indent
newfs_hammer -L HOME /dev/ad0s1d
mount_hammer /dev/ad0s1d /home
.Ed
.Pp
Similarly, multi-volume file systems can be created and mounted by
specifying additional arguments.
.Bd -literal -offset indent
newfs_hammer -L MULTIHOME /dev/ad0s1d /dev/ad1s1d
mount_hammer /dev/ad0s1d /dev/ad1s1d /home
.Ed
.Pp
Once created and mounted,
.Nm
file systems need periodic cleanup (making snapshots, pruning and reblocking)
in order to retain access to history and to keep the file system from
filling up.
For this it is recommended to use the
.Xr hammer 8
.Cm cleanup
metacommand.
.Pp
By default,
.Dx
is set up to run
.Nm hammer Cm cleanup
nightly via
.Xr periodic 8 .
.Pp
It is also possible to perform these operations individually via
.Xr crontab 5 .
For example, to reblock the
.Pa /home
file system every night at 2:15 for up to 5 minutes:
.Bd -literal -offset indent
15 2 * * * hammer -c /var/run/HOME.reblock -t 300 reblock /home \e
    >/dev/null 2>&1
.Ed
.Ss Snapshots
The
.Xr hammer 8
utility's
.Cm snapshot
command provides several ways of taking snapshots.
They all assume a directory where snapshots are kept.
.Bd -literal -offset indent
mkdir /snaps
hammer snapshot /home /snaps/snap1
(...after some changes in /home...)
hammer snapshot /home /snaps/snap2
.Ed
.Pp
The softlinks in
.Pa /snaps
point to the state of the
.Pa /home
directory at the time each snapshot was taken, and could now be used to copy
the data somewhere else for backup purposes.
.Pp
By default,
.Dx
is set up to create nightly snapshots of all
.Nm
file systems via
.Xr periodic 8
and to keep them for 60 days.
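.Pp
As an illustrative sketch, assuming the per PFS configuration format
documented for
.Nm hammer Cm cleanup
in
.Xr hammer 8
(a period followed by a retention time),
a nightly snapshot kept for 60 days would be expressed by a line like:
.Bd -literal -offset indent
snapshots 1d 60d
.Ed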
.Ss Pruning
A snapshot directory is also the argument to the
.Xr hammer 8
.Cm prune
command, which frees historical data from the file system that is not
pointed to by any snapshot link, is not from after the latest snapshot,
and is older than
.Cm prune-min .
.Bd -literal -offset indent
rm /snaps/snap1
hammer prune /snaps
.Ed
.Ss Mirroring
Mirroring is set up using
.Nm
pseudo-filesystems (PFSs).
To associate the slave with the master, its shared UUID should be set to
the master's shared UUID as output by the
.Nm hammer Cm pfs-master
command.
.Bd -literal -offset indent
hammer pfs-master /home/pfs/master
hammer pfs-slave /home/pfs/slave shared-uuid=<master's shared uuid>
.Ed
.Pp
The
.Pa /home/pfs/slave
link is unusable for as long as no mirroring operation has taken place.
.Pp
To mirror the master's data, either pipe a
.Cm mirror-read
command into a
.Cm mirror-write
or, as a short-cut, use the
.Cm mirror-copy
command (which works across a
.Xr ssh 1
connection as well).
The initial mirroring operation has to be done to the PFS path (as
.Xr mount_null 8
can't access it yet).
.Bd -literal -offset indent
hammer mirror-copy /home/pfs/master /home/pfs/slave
.Ed
.Pp
It is also possible to have the target PFS created automatically
by just issuing the same
.Cm mirror-copy
command; if the target PFS doesn't exist, you will be asked
whether you would like to create it.
You can even omit the prompting by using the
.Fl y
flag:
.Bd -literal -offset indent
hammer -y mirror-copy /home/pfs/master /home/pfs/slave
.Ed
.Pp
After this initial step, a
.Nm null
mount can be set up for
.Pa /home/pfs/slave .
Further operations can use
.Nm null
mounts.
.Bd -literal -offset indent
mount_null /home/pfs/master /home/master
mount_null /home/pfs/slave /home/slave

hammer mirror-copy /home/master /home/slave
.Ed
.Ss NFS Export
To NFS export from the
.Nm
file system
.Pa /hammer
the directory
.Pa /hammer/non-pfs
without PFSs, and the PFS
.Pa /hammer/pfs/data ,
the latter is
.Nm null
mounted to
.Pa /hammer/data .
.Pp
Add to
.Pa /etc/fstab
(see
.Xr fstab 5 ) :
.Bd -literal -offset indent
/hammer/pfs/data /hammer/data null rw
.Ed
.Pp
Add to
.Pa /etc/exports
(see
.Xr exports 5 ) :
.Bd -literal -offset indent
/hammer/non-pfs
/hammer/data
.Ed
.Sh DIAGNOSTICS
.Bl -diag
.It "hammer: System has insuffient buffers to rebalance the tree. nbuf < %d"
Rebalancing a
.Nm
PFS uses quite a bit of memory and
can't be done on low memory systems.
It has been reported to fail on 512MB systems.
Rebalancing isn't critical for
.Nm
file system operation;
it is done by
.Nm hammer
.Cm rebalance ,
often as part of
.Nm hammer
.Cm cleanup .
.El
.Sh SEE ALSO
.Xr chflags 1 ,
.Xr md5 1 ,
.Xr tar 1 ,
.Xr undo 1 ,
.Xr exports 5 ,
.Xr ffs 5 ,
.Xr fstab 5 ,
.Xr disklabel64 8 ,
.Xr gpt 8 ,
.Xr hammer 8 ,
.Xr mount_hammer 8 ,
.Xr mount_null 8 ,
.Xr newfs_hammer 8 ,
.Xr periodic 8 ,
.Xr sysctl 8
.Rs
.%A Matthew Dillon
.%D June 2008
.%O http://www.dragonflybsd.org/hammer/hammer.pdf
.%T "The HAMMER Filesystem"
.Re
.Rs
.%A Matthew Dillon
.%D October 2008
.%O http://www.dragonflybsd.org/hammer/nycbsdcon/
.%T "Slideshow from NYCBSDCon 2008"
.Re
.Rs
.%A Michael Neumann
.%D January 2010
.%O http://www.ntecs.de/sysarch09/HAMMER.pdf
.%T "Slideshow for a presentation held at KIT (http://www.kit.edu)"
.Re
.Sh FILESYSTEM PERFORMANCE
The
.Nm
file system has a front-end which processes VNOPS and issues necessary
block reads from disk, and a back-end which handles meta-data updates
on-media and performs all meta-data write operations.
Bulk file write operations are handled by the front-end.
Because
.Nm
defers meta-data updates, virtually no meta-data read operations will be
issued by the front-end while writing large amounts of data to the file system
or even when creating new files or directories.
Even though the kernel prioritizes reads over writes, the fact that writes
are cached by the drive itself tends to lead to excessive priority given
to writes.
.Pp
There are four bioq sysctls, shown below with default values,
which can be adjusted to give reads a higher priority:
.Bd -literal -offset indent
kern.bioq_reorder_minor_bytes: 262144
kern.bioq_reorder_burst_bytes: 3000000
kern.bioq_reorder_minor_interval: 5
kern.bioq_reorder_burst_interval: 60
.Ed
.Pp
If a higher read priority is desired it is recommended that the
.Va kern.bioq_reorder_minor_interval
be increased to 15, 30, or even 60, and the
.Va kern.bioq_reorder_burst_bytes
be decreased to 262144 or 524288.
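.Pp
As a minimal sketch, such a change could be made persistent with
.Xr sysctl.conf 5
lines like the following.
The values are simply the lowest ones from the ranges suggested above;
they are an assumed starting point for experimentation,
not tuned settings:
.Bd -literal -offset indent
# Assumed starting values; adjust for the actual workload.
kern.bioq_reorder_minor_interval=15
kern.bioq_reorder_burst_bytes=262144
.Ed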
.Sh HISTORY
The
.Nm
file system first appeared in
.Dx 1.11 .
.Sh AUTHORS
.An -nosplit
The
.Nm
file system was designed and implemented by
.An Matthew Dillon Aq dillon@backplane.com ,
data deduplication was added by
.An Ilya Dryomov .
This manual page was written by
.An Sascha Wildner
and updated by
.An Thomas Nikolajsen .
.Sh CAVEATS
Data deduplication is considered experimental.