$DragonFly: src/sys/vfs/hammer/Attic/hammer.txt,v 1.1 2007/10/10 19:37:25 dillon Exp $

(I) General Storage Abstraction

HAMMER uses a basic 16K filesystem buffer for all I/O. Buffers are
collected into clusters, clusters are collected into volumes, and a
single HAMMER filesystem may span multiple volumes.

HAMMER maintains a small hinted radix tree for block management in
each layer. A small radix tree in the volume header manages cluster
allocations within a volume, one in the cluster header manages buffer
allocations within a cluster, and most buffers (pure data buffers
excepted) embed a small tree to manage item allocations within the
buffer.
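
The idea of a hinted allocator can be sketched as follows. This is a
deliberately simplified stand-in, not HAMMER's radix tree: it uses a
flat bitmap plus a search hint, and every name in it (alist,
alist_alloc, alist_free, ALIST_SIZE) is hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define ALIST_SIZE 32768            /* elements managed (assumed per-layer limit) */
#define ALIST_NONE (-1)

/* Hypothetical allocator state: one bit per element plus a search hint. */
struct alist {
    uint32_t bits[ALIST_SIZE / 32]; /* bit set = element allocated */
    int      hint;                  /* index where the next search starts */
};

void alist_init(struct alist *al)
{
    memset(al, 0, sizeof(*al));
}

/* Allocate one element, scanning from the hint and wrapping around once. */
int alist_alloc(struct alist *al)
{
    for (int n = 0; n < ALIST_SIZE; n++) {
        int i = (al->hint + n) % ALIST_SIZE;
        uint32_t mask = UINT32_C(1) << (i % 32);

        if ((al->bits[i / 32] & mask) == 0) {
            al->bits[i / 32] |= mask;
            al->hint = (i + 1) % ALIST_SIZE;
            return i;
        }
    }
    return ALIST_NONE;              /* every element is in use */
}

/* Free an element and pull the hint back so the slot is found quickly. */
void alist_free(struct alist *al, int i)
{
    al->bits[i / 32] &= ~(UINT32_C(1) << (i % 32));
    if (i < al->hint)
        al->hint = i;
}
```

The hint only shortens the search. A real radix tree would also let
the allocator skip whole subtrees that are known to be full.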

Volumes are typically specified as disk partitions, with one volume
designated as the root volume containing the root cluster. The root
cluster does not need to be contained in volume 0 nor does it have to
be located at any particular offset.

Data can be migrated on a cluster-by-cluster or volume-by-volume basis
and any given volume may be expanded or contracted while the filesystem
is live. Whole volumes can be added and (with appropriate data
migration) removed from the filesystem.

HAMMER's storage management limits it to 32768 volumes, 32768 clusters
per volume, and 32768 16K filesystem buffers per cluster. A volume
is thus limited to 16TB and a HAMMER filesystem as a whole is limited
to 524288TB. HAMMER's on-disk structures are designed so that these
limits can be raised in the future. In particular, the volume id is
intended to be expanded to a full 32 bits, and using a larger buffer
size will also greatly increase the cluster and volume size limits by
increasing the number of elements the buffer-restricted radix trees
can manage.
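
The limits above follow directly from the three 32768-way fan-outs and
the 16K buffer size. A small sketch of the arithmetic, with constant
and function names that are illustrative and not taken from HAMMER's
headers:

```c
#include <stdint.h>

#define BUFFER_BYTES         (UINT64_C(16) * 1024)  /* 16K filesystem buffer */
#define BUFFERS_PER_CLUSTER  UINT64_C(32768)
#define CLUSTERS_PER_VOLUME  UINT64_C(32768)
#define VOLUMES_PER_FS       UINT64_C(32768)
#define TB                   (UINT64_C(1) << 40)

/* 2^15 buffers * 2^14 bytes = 2^29 bytes (512MB) per cluster. */
uint64_t max_cluster_bytes(void)
{
    return BUFFERS_PER_CLUSTER * BUFFER_BYTES;
}

/* 2^15 clusters * 2^29 bytes = 2^44 bytes (16TB) per volume. */
uint64_t max_volume_bytes(void)
{
    return CLUSTERS_PER_VOLUME * max_cluster_bytes();
}

/* 2^15 volumes * 2^44 bytes = 2^59 bytes (524288TB) per filesystem. */
uint64_t max_fs_bytes(void)
{
    return VOLUMES_PER_FS * max_volume_bytes();
}
```

The largest value, 2^59 bytes, still fits in an unsigned 64 bit
integer.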

HAMMER breaks all of its information down into objects and records.
Each record carries a creation and a deletion transaction id, which
allows HAMMER to maintain a historical store. Information is only
physically deleted based on the data retention policy. Those portions
of the data retention policy affecting near-term modifications may be
acted upon by the live filesystem, but all historical vacuuming is
handled by a helper process.

All information in a HAMMER filesystem is CRCd to detect corruption.
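
This document does not say which CRC HAMMER uses. As an illustration
only, the sketch below assumes the common reflected CRC-32 polynomial
0xEDB88320 and shows how a stored CRC can be checked against a
payload. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 using the reflected polynomial 0xEDB88320 (assumed). */
uint32_t crc32_buf(const void *data, size_t len)
{
    const unsigned char *p = data;
    uint32_t crc = 0xFFFFFFFFu;

    while (len--) {
        crc ^= *p++;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Returns 1 if the stored CRC matches the payload, 0 if it does not. */
int crc_check(const void *data, size_t len, uint32_t stored_crc)
{
    return crc32_buf(data, len) == stored_crc;
}
```

A production implementation would normally be table-driven. The
bitwise form is used here only because it is short.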

(II) Filesystem Object Topology

The objects and records making up a HAMMER filesystem are organized into
a single, unified B-Tree. Each cluster maintains a B-Tree of the
records contained in that cluster and a unified B-Tree is constructed by
linking clusters together. HAMMER issues PUSH and PULL operations
internally to open up space for new records and to balance the global
B-Tree. These operations may have the side effect of allocating
new clusters or freeing clusters which become unused.

B-Tree operations tend to be limited to a single cluster. That is,
the B-Tree insertion and deletion algorithms are not extended to the
whole unified tree. If insufficient space exists in a cluster, HAMMER
will allocate a new cluster, PUSH a portion of the existing
cluster's record store to the new cluster, and link the existing
cluster's B-Tree to the new one.
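
The PUSH step can be pictured with a toy model. The sketch below
assumes a cluster is just a small sorted array of record keys and that
the link between clusters is a plain pointer. Both are simplifications
made for illustration, and all names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define CLUSTER_CAP 8   /* tiny capacity, for illustration only */

/* Toy cluster: a sorted key array and a link to the next cluster. */
struct cluster {
    uint64_t        keys[CLUSTER_CAP];  /* sorted record keys */
    int             count;              /* keys currently stored */
    struct cluster *next;               /* cluster holding the higher keys */
};

/*
 * PUSH the upper half of src's records into the empty cluster dst,
 * then link src to dst so the key space stays connected.
 */
void cluster_push(struct cluster *src, struct cluster *dst)
{
    int keep = src->count / 2;
    int move = src->count - keep;

    memcpy(dst->keys, &src->keys[keep], (size_t)move * sizeof(src->keys[0]));
    dst->count = move;
    dst->next = src->next;
    src->count = keep;
    src->next = dst;
}
```

Only the two clusters involved are touched, which matches the point
that insertion work stays local rather than rebalancing the whole tree.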

Because B-Tree operations tend to be restricted to a single cluster and
because HAMMER tries to avoid balancing clusters in the critical path,
HAMMER employs a background process to keep the topology as a whole in
balance. One side effect of this is that HAMMER is fairly loose when it
comes to inserting new clusters into the topology.

HAMMER objects revolve around the concept of an object identifier.
The obj_id is a 64 bit quantity which uniquely identifies a filesystem
object for the entire life of the filesystem. This uniqueness allows
backups and mirrors to retain varying amounts of filesystem history by
removing any possibility of conflict through identifier reuse. HAMMER
typically iterates object identifiers sequentially and expects to never
run out. At a creation rate of 100,000 objects per second it would
take HAMMER around 6 million years to run out of identifier space.
The characteristics of the HAMMER obj_id also allow HAMMER to operate
in a multi-master clustered environment.
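
The exhaustion estimate can be checked with a few lines of arithmetic.
The sketch assumes the full 64 bit space (2^64 identifiers) is usable
and a year of 365.25 days. The function name is illustrative.

```c
#define SECONDS_PER_YEAR (365.25 * 24 * 60 * 60)   /* assumed year length */

/* Years needed to consume a 64 bit identifier space at a fixed rate. */
double years_to_exhaust(double objects_per_second)
{
    double ids = 18446744073709551616.0;            /* 2^64 */

    return ids / objects_per_second / SECONDS_PER_YEAR;
}
```

At 100,000 objects per second this works out to roughly 5.8 million
years, consistent with the figure of around 6 million quoted above.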

A filesystem object is made up of records. Each record references a
variable-length store of related data, a 64 bit key, and creation
and deletion transaction ids which are indexed along with the key.

HAMMER utilizes a 64 bit key to index all records. Regular files use
the base data offset of the record as the key while directories use a
namekey hash as the key and store one directory entry per record. For
all intents and purposes a directory can store an unlimited number of
files.
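
One way to picture the index element described above is as a small
struct with a comparison function. The layout, the field names, and
the ordering (object, then key, then creation transaction id) are
assumptions made for this sketch, not HAMMER's on-disk format.

```c
#include <stdint.h>

/* Hypothetical index element for one record. */
struct rec_key {
    uint64_t obj_id;      /* object the record belongs to */
    uint64_t key;         /* data offset, or namekey hash for directories */
    uint64_t create_tid;  /* transaction that created the record */
    uint64_t delete_tid;  /* transaction that deleted it, 0 if still live */
};

/* Order by object, then key, then creation transaction id. */
int rec_key_cmp(const struct rec_key *a, const struct rec_key *b)
{
    if (a->obj_id != b->obj_id)
        return a->obj_id < b->obj_id ? -1 : 1;
    if (a->key != b->key)
        return a->key < b->key ? -1 : 1;
    if (a->create_tid != b->create_tid)
        return a->create_tid < b->create_tid ? -1 : 1;
    return 0;
}
```

With this ordering, all versions of the same key sort next to each
other, oldest first.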

HAMMER is also capable of associating any number of out-of-band
attributes with a filesystem object using a separate key space. This
key space may be used for extended attributes, ACLs, and anything else

(III) Access to historical information

A HAMMER filesystem can be mounted with an as-of date to access a
snapshot of the system. Snapshots do not have to be explicitly taken
but are instead based on the retention policy you specify for any
given HAMMER filesystem. It is also possible to access individual files
or directories (and their contents) using an as-of extension on the
file name.

HAMMER uses the transaction ids stored in records to present a snapshot
view of the filesystem as-of any time in the past, with a granularity
based on the retention policy chosen by the system administrator. This
feature also effectively implements file versioning.
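
The as-of view reduces to a visibility test on each record's
transaction ids. The sketch below assumes that a delete_tid of 0 means
the record has not been deleted and that a record deleted in
transaction T is no longer visible as-of T. Both conventions and all
names are assumptions made for illustration.

```c
#include <stdint.h>

/* Hypothetical pair of transaction ids carried by a record. */
struct rec_tids {
    uint64_t create_tid;  /* transaction that created the record */
    uint64_t delete_tid;  /* transaction that deleted it, 0 if still live */
};

/* Returns 1 if the record is part of the snapshot as-of asof_tid. */
int visible_as_of(const struct rec_tids *r, uint64_t asof_tid)
{
    if (r->create_tid > asof_tid)
        return 0;                       /* created after the snapshot */
    if (r->delete_tid != 0 && r->delete_tid <= asof_tid)
        return 0;                       /* already deleted by then */
    return 1;
}
```

A lookup as-of some transaction id would apply this test to every
candidate record and ignore those that fail it.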

(IV) Mirrors and Backups

HAMMER is organized in a way that allows an information stream to be
generated for mirroring and backup purposes. This stream includes all
historical information available in the source. No queueing is required
so there is no limit to the number of mirrors or backups you can have
and no limit to how long any given mirror or backup can be taken offline.
Resynchronization of the stream is not considered to be an expensive
operation.

Mirrors and backups are maintained logically, not physically, and may
have their own, independent retention policies. For example, your live
filesystem could have a fairly rough retention policy, even none at all,
then be streamed to an on-site backup and from there to an off-site
backup, each with different retention policies.

(V) Transactions and Recovery

HAMMER implements an instant-mount capability and will recover information
on a cluster-by-cluster basis as it is being accessed.

HAMMER numbers each record it lays down and stores a synchronization
point in the cluster header. Clusters are synchronously marked 'open'
when undergoing modification. If HAMMER encounters a cluster which is
unexpectedly marked open it will perform a recovery operation on the
cluster and throw away any records beyond the synchronization point.
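
The recovery rule can be sketched with a toy cluster header. The
header layout, the flag value, and the function name are hypothetical.
The sketch only models the bookkeeping (which record numbers survive),
not the actual on-disk cleanup.

```c
#include <stdint.h>

#define CLUSTER_OPEN 0x1u   /* hypothetical flag: cluster is being modified */

/* Toy cluster header holding the fields the recovery rule needs. */
struct cluster_hdr {
    uint32_t flags;       /* CLUSTER_OPEN while undergoing modification */
    uint64_t sync_recno;  /* last record number known to be synchronized */
    uint64_t last_recno;  /* last record number laid down */
};

/*
 * If the cluster was left open, discard every record numbered beyond
 * the synchronization point. Returns the number of records discarded.
 */
uint64_t cluster_recover(struct cluster_hdr *h)
{
    uint64_t discarded = 0;

    if (h->flags & CLUSTER_OPEN) {
        if (h->last_recno > h->sync_recno)
            discarded = h->last_recno - h->sync_recno;
        h->last_recno = h->sync_recno;
        h->flags &= ~CLUSTER_OPEN;
    }
    return discarded;
}
```

Because the check is per cluster, it can run the first time a cluster
is accessed instead of at mount time.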

HAMMER supports a userland transactional facility. Userland can query
the current (filesystem wide) transaction id, issue numerous operations,
and on recovery can tell HAMMER to revert all records with a greater
transaction id for any particular set of files. Multiple userland
applications can use this feature simultaneously as long as the files
they are accessing do not overlap. It is also possible for userland
to set up an ordering dependency and maintain completely asynchronous
operation while still being able to guarantee recovery to a fairly
recent transaction id.

(VI) Database files

HAMMER uses 64 bit keys internally and makes key-based files directly
available to userland. Key-based files are not regular files and do not
operate using a normal data offset space.

You cannot copy a database file using a regular file copier. The
file type will not be S_IFREG but instead will be S_IFDB, and the file
must be opened with O_DATABASE. Reads, which normally seek the file
forward, will instead iterate through the records, and lseek/qseek can
be used to acquire or set the key prior to the read/write operation.