.\"
.\" Copyright (c) 2008
.\" The DragonFly Project. All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\"
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in
.\" the documentation and/or other materials provided with the
.\" distribution.
.\" 3. Neither the name of The DragonFly Project nor the names of its
.\" contributors may be used to endorse or promote products derived
.\" from this software without specific, prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
.\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
.\" LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
.\" FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
.\" COPYRIGHT HOLDERS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
.\" INCIDENTAL, SPECIAL, EXEMPLARY OR CONSEQUENTIAL DAMAGES (INCLUDING,
.\" BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
.\" LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED
.\" AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
.\" OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
.\" OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.Dd April 8, 2010
.Os
.Dt HAMMER 5
.Sh NAME
.Nm HAMMER
.Nd HAMMER file system
.Sh SYNOPSIS
To compile this driver into the kernel,
place the following line in your
kernel configuration file:
.Bd -ragged -offset indent
.Cd options HAMMER
.Ed
.Pp
Alternatively, to load the driver as a
module at boot time, place the following line in
.Xr loader.conf 5 :
.Bd -literal -offset indent
hammer_load="YES"
.Ed
.Pp
To mount via
.Xr fstab 5 :
.Bd -literal -offset indent
/dev/ad0s1d[:/dev/ad1s1d:...] /mnt hammer rw 2 0
.Ed
.Sh DESCRIPTION
The
.Nm
file system provides facilities to store file system data onto disk devices
and is intended to replace
.Xr ffs 5
as the default file system for
.Dx .
Among its features are instant crash recovery,
large file systems spanning multiple volumes,
data integrity checking,
fine-grained history retention,
mirroring capability, and pseudo file systems.
.Pp
All functions related to managing
.Nm
file systems are provided by the
.Xr newfs_hammer 8 ,
.Xr mount_hammer 8 ,
.Xr hammer 8 ,
.Xr chflags 1 ,
and
.Xr undo 1
utilities.
.Pp
For a more detailed introduction, refer to the paper and slides listed in the
.Sx SEE ALSO
section.
For some common usages of
.Nm ,
see the
.Sx EXAMPLES
section below.
.Ss Instant Crash Recovery
After a non-graceful system shutdown,
.Nm
file systems will be brought back into a fully coherent state
when mounting the file system, usually within a few seconds.
.Ss Large File Systems & Multi Volume
A
.Nm
file system can be up to 1 Exabyte in size.
It can span up to 256 volumes;
each volume occupies a
.Dx
disk slice or partition, or another special file,
and can be up to 4096 TB in size.
The minimum recommended
.Nm
file system size is 50 GB.
For volumes over 2 TB in size,
.Xr gpt 8
and
.Xr disklabel64 8
normally need to be used.
.Ss Data Integrity Checking
.Nm
has a strong focus on data integrity;
CRC checks are made for all major structures and data.
.Nm
snapshots implement features that make data integrity checking easier:
The atime and mtime fields are locked to the ctime
for files accessed via a snapshot.
The
.Fa st_dev
field is based on the PFS
.Ar shared-uuid
and not on any real device.
This means that archiving the contents of a snapshot with e.g.\&
.Xr tar 1
and piping it to something like
.Xr md5 1
will yield a consistent result.
The consistency is also retained on mirroring targets.
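.Pp
For example, assuming a snapshot softlink such as the hypothetical
.Pa /snaps/snap1 ,
a pipeline like the following should print the same digest each time it is
run:
.Bd -literal -offset indent
# archive the contents of a (hypothetical) snapshot to stdout
# and hash the resulting stream
tar -cf - -C /snaps/snap1 . | md5
.Ed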
.Ss Transaction IDs
The
.Nm
file system uses 64-bit hexadecimal transaction IDs to refer to historical
file or directory data.
An ID has the
.Xr printf 3
format
.Li %#018llx ,
such as
.Li 0x00000001061a8ba6 .
.Pp
Related
.Xr hammer 8
commands:
.Ar snapshot ,
.Ar synctid
.Ss History & Snapshots
History metadata on the media is written with every sync operation, so that
by default the resolution of a file's history is 30-60 seconds until the next
prune operation.
Prior versions of files or directories are generally accessible by appending
.Li @@
and a transaction ID to the name.
The common way of accessing history, however, is by taking snapshots.
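.Pp
As a sketch of accessing history directly, the commands below obtain a
transaction ID for the current state of a file system, list the transaction
IDs recorded for a file, and read one prior version of that file.
The file name
.Pa /home/notes.txt
is hypothetical, and the transaction ID is only the example value given
above; in practice, substitute one reported by the
.Ar history
command.
.Bd -literal -offset indent
# print a transaction ID representing the current state of /home
hammer synctid /home
# list the transaction IDs in the history of a (hypothetical) file
hammer history /home/notes.txt
# read the file as it was at one of those transaction IDs
cat /home/notes.txt@@0x00000001061a8ba6
.Ed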
.Pp
Snapshots are softlinks to prior versions of directories and their files.
Their data will be retained across prune operations for as long as the
softlink exists.
Removing the softlink enables the file system to reclaim the space
again upon the next prune and reblock operations.
.Pp
Related
.Xr hammer 8
commands:
.Ar cleanup ,
.Ar history ,
.Ar snapshot ;
see also
.Xr undo 1
.Ss Pruning & Reblocking
Pruning is the act of deleting file system history.
By default, only history used by the given snapshots
and history from after the latest snapshot will be retained.
By setting the per-PFS parameter
.Cm prune-min ,
history is guaranteed to be retained for at least this time interval.
All other history is deleted.
Reblocking will reorder all elements and thus defragment the file system and
free space for reuse.
After pruning, a file system must be reblocked to recover all available space.
Reblocking is needed even when using the
.Ar nohistory
.Xr mount_hammer 8
option or
.Xr chflags 1
flag.
.Pp
Related
.Xr hammer 8
commands:
.Ar cleanup ,
.Ar snapshot ,
.Ar prune ,
.Ar prune-everything ,
.Ar rebalance ,
.Ar reblock ,
.Ar reblock-btree ,
.Ar reblock-inodes ,
.Ar reblock-dirs ,
.Ar reblock-data
.Ss Mirroring & Pseudo File Systems
In order to allow inode numbers to be duplicated on the slaves,
.Nm Ap s
mirroring feature uses
.Dq Pseudo File Systems
(PFSs).
A
.Nm
file system supports up to 65535 PFSs.
Multiple slaves per master are supported, but multiple masters per slave
are not.
Slaves are always read-only.
Upgrading slaves to masters and downgrading masters to slaves are supported.
.Pp
It is recommended to use a
.Nm null
mount to access a PFS;
this way, tools are not confused by the PFS root being a symlink
and inodes not being unique across a
.Nm
file system.
.Pp
Related
.Xr hammer 8
commands:
.Ar pfs-master ,
.Ar pfs-slave ,
.Ar pfs-cleanup ,
.Ar pfs-status ,
.Ar pfs-update ,
.Ar pfs-destroy ,
.Ar pfs-upgrade ,
.Ar pfs-downgrade ,
.Ar mirror-copy ,
.Ar mirror-stream ,
.Ar mirror-read ,
.Ar mirror-read-stream ,
.Ar mirror-write ,
.Ar mirror-dump
.Ss NFS Export
.Nm
file systems support NFS export.
NFS export of PFSs is done using
.Nm null
mounts.
For example, to export the PFS
.Pa /hammer/pfs/data ,
create a
.Nm null
mount, e.g.\& to
.Pa /hammer/data ,
and export the latter path.
.Pp
Do not export a directory containing a PFS (e.g.\&
.Pa /hammer/pfs
above).
Only the
.Nm null
mount of a PFS root
(e.g.\&
.Pa /hammer/data
above)
should be exported
(a subdirectory, if exported, may be escaped).
.Sh EXAMPLES
.Ss Preparing the File System
To create and mount a
.Nm
file system, use the
.Xr newfs_hammer 8
and
.Xr mount_hammer 8
commands.
Note that all
.Nm
file systems must have a unique name on a per-machine basis.
.Bd -literal -offset indent
newfs_hammer -L HOME /dev/ad0s1d
mount_hammer /dev/ad0s1d /home
.Ed
.Pp
Similarly, multi-volume file systems can be created and mounted by
specifying additional arguments.
.Bd -literal -offset indent
newfs_hammer -L MULTIHOME /dev/ad0s1d /dev/ad1s1d
mount_hammer /dev/ad0s1d /dev/ad1s1d /home
.Ed
.Pp
Once created and mounted,
.Nm
file systems need periodic clean up (taking snapshots, pruning, and
reblocking) in order to retain access to history and to keep the file system
from filling up.
For this, it is recommended to use the
.Xr hammer 8
.Ar cleanup
metacommand.
.Pp
By default,
.Dx
is set up to run
.Nm hammer Ar cleanup
nightly via
.Xr periodic 8 .
.Pp
It is also possible to perform these operations individually via
.Xr crontab 5 .
For example, to reblock the
.Pa /home
file system every night at 2:15 for up to 5 minutes
(the entry must be on a single line, since
.Xr crontab 5
does not support line continuation):
.Bd -literal -offset indent
15 2 * * * hammer -c /var/run/HOME.reblock -t 300 reblock /home >/dev/null 2>&1
.Ed
.Ss Snapshots
The
.Xr hammer 8
utility's
.Ar snapshot
command provides several ways of taking snapshots.
They all assume a directory where snapshots are kept.
.Bd -literal -offset indent
mkdir /snaps
hammer snapshot /home /snaps/snap1
(...after some changes in /home...)
hammer snapshot /home /snaps/snap2
.Ed
.Pp
The softlinks in
.Pa /snaps
point to the state of the
.Pa /home
directory at the time each snapshot was taken, and could now be used to copy
the data somewhere else for backup purposes.
.Pp
By default,
.Dx
is set up to create nightly snapshots of all
.Nm
file systems via
.Xr periodic 8
and to keep them for 60 days.
.Ss Pruning
A snapshot directory is also the argument to the
.Xr hammer 8
.Ar prune
command, which frees historical data from the file system that is not
pointed to by any snapshot link and is not from after the latest snapshot.
.Bd -literal -offset indent
rm /snaps/snap1
hammer prune /snaps
.Ed
.Ss Mirroring
Mirroring can be set up using
.Nm Ap s
pseudo file systems.
To associate the slave with the master, its shared UUID should be set to
the master's shared UUID as output by the
.Nm hammer Ar pfs-master
command.
.Bd -literal -offset indent
hammer pfs-master /home/pfs/master
hammer pfs-slave /home/pfs/slave shared-uuid=<master's shared uuid>
.Ed
.Pp
The
.Pa /home/pfs/slave
link is unusable until a mirroring operation has taken place.
.Pp
To mirror the master's data, either pipe a
.Ar mirror-read
command into a
.Ar mirror-write
command or, as a short-cut, use the
.Ar mirror-copy
command (which works across an
.Xr ssh 1
connection as well).
The initial mirroring operation has to be done to the PFS path (as
.Xr mount_null 8
cannot access it yet).
.Bd -literal -offset indent
hammer mirror-copy /home/pfs/master /home/pfs/slave
.Ed
.Pp
After this initial step, a
.Nm null
mount can be set up for
.Pa /home/pfs/slave .
Further operations can use
.Nm null
mounts.
.Bd -literal -offset indent
mount_null /home/pfs/master /home/master
mount_null /home/pfs/slave /home/slave

hammer mirror-copy /home/master /home/slave
.Ed
.Ss NFS Export
To NFS export, from the
.Nm
file system
.Pa /hammer ,
both the directory
.Pa /hammer/non-pfs ,
which contains no PFSs, and the PFS
.Pa /hammer/pfs/data ,
null mount the latter to
.Pa /hammer/data .
.Pp
Add to
.Pa /etc/fstab
(see
.Xr fstab 5 ) :
.Bd -literal -offset indent
/hammer/pfs/data /hammer/data null rw
.Ed
.Pp
Add to
.Pa /etc/exports
(see
.Xr exports 5 ) :
.Bd -literal -offset indent
/hammer/non-pfs
/hammer/data
.Ed
.Sh SEE ALSO
.Xr chflags 1 ,
.Xr md5 1 ,
.Xr tar 1 ,
.Xr undo 1 ,
.Xr exports 5 ,
.Xr ffs 5 ,
.Xr fstab 5 ,
.Xr disklabel64 8 ,
.Xr gpt 8 ,
.Xr hammer 8 ,
.Xr mount_hammer 8 ,
.Xr mount_null 8 ,
.Xr newfs_hammer 8
.Rs
.%A Matthew Dillon
.%D June 2008
.%O http://www.dragonflybsd.org/hammer/hammer.pdf
.%T "The HAMMER Filesystem"
.Re
.Rs
.%A Matthew Dillon
.%D October 2008
.%O http://www.dragonflybsd.org/hammer/nycbsdcon/
.%T "Slideshow from NYCBSDCon 2008"
.Re
.Rs
.%A Michael Neumann
.%D January 2010
.%O http://www.ntecs.de/sysarch09/HAMMER.pdf
.%T "Slideshow for a presentation held at KIT (http://www.kit.edu)."
.Re
.Sh FILESYSTEM PERFORMANCE
The
.Nm
file system has a front-end which processes VNOPS and issues necessary
block reads from disk, and a back-end which handles meta-data updates
on-media and performs all meta-data write operations.
Bulk file write operations are handled by the front-end.
Because
.Nm
defers meta-data updates, virtually no meta-data read operations will be
issued by the front-end while writing large amounts of data to the file
system, or even when creating new files or directories.
Even though the kernel prioritizes reads over writes, the fact that writes
are cached by the drive itself tends to give writes excessive priority.
.Pp
There are four bioq sysctls, shown below with default values,
which can be adjusted to give reads a higher priority:
.Bd -literal -offset indent
kern.bioq_reorder_minor_bytes: 262144
kern.bioq_reorder_burst_bytes: 3000000
kern.bioq_reorder_minor_interval: 5
kern.bioq_reorder_burst_interval: 60
.Ed
.Pp
If a higher read priority is desired, it is recommended that
.Va kern.bioq_reorder_minor_interval
be increased to 15, 30, or even 60, and that
.Va kern.bioq_reorder_burst_bytes
be decreased to 262144 or 524288.
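.Pp
For example, the following
.Xr sysctl.conf 5
entries would apply one of the suggested combinations at boot time;
the particular values are just one choice from the ranges given above.
The same
.Ar name Ns = Ns Ar value
assignments can also be passed to
.Xr sysctl 8
to change the settings on a running system.
.Bd -literal -offset indent
# favor reads over writes (defaults are 5 and 3000000)
kern.bioq_reorder_minor_interval=15
kern.bioq_reorder_burst_bytes=524288
.Ed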
.Sh HISTORY
The
.Nm
file system first appeared in
.Dx 1.11 .
.Sh AUTHORS
.An -nosplit
The
.Nm
file system was designed and implemented by
.An Matthew Dillon Aq dillon@backplane.com .
This manual page was written by
.An Sascha Wildner .