2 * Copyright (c) 2011-2012 The DragonFly Project. All rights reserved.
4 * This code is derived from software contributed to The DragonFly Project
5 * by Matthew Dillon <dillon@dragonflybsd.org>
7 * Redistribution and use in source and binary forms, with or without
8 * modification, are permitted provided that the following conditions
11 * 1. Redistributions of source code must retain the above copyright
12 * notice, this list of conditions and the following disclaimer.
13 * 2. Redistributions in binary form must reproduce the above copyright
14 * notice, this list of conditions and the following disclaimer in
15 * the documentation and/or other materials provided with the
17 * 3. Neither the name of The DragonFly Project nor the names of its
18 * contributors may be used to endorse or promote products derived
19 * from this software without specific, prior written permission.
21 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
22 * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
23 * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
24 * FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
25 * COPYRIGHT HOLDERS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
26 * INCIDENTAL, SPECIAL, EXEMPLARY OR CONSEQUENTIAL DAMAGES (INCLUDING,
27 * BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
28 * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED
29 * AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
30 * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
31 * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
38 #ifndef _SYS_MALLOC_H_
39 #include <sys/malloc.h>
49 * Mesh network protocol structures.
53 * The mesh is constructed from point-to-point streaming links with varying
54 * levels of interconnectedness, forming a graph. Terminii in the graph
55 * are entities such as a HAMMER2 PFS or a network mount or other types
58 * The spanning tree protocol runs symmetrically on every node. Each node
59 * transmits a representitive LNK_SPAN out all available connections. Nodes
60 * also receive LNK_SPANs from other nodes (obviously), and must aggregate,
61 * reduce, and relay those LNK_SPANs out all available connections, thus
62 * propagating the spanning tree. Any connection failure or topology change
63 * causes changes in the LNK_SPAN propagation.
65 * Each LNK_SPAN or LNK_SPAN relay represents a virtual circuit for routing
66 * purposes. In addition, each relay is chained in one direction,
67 * representing a 1:N fan-out (i.e. one received LNK_SPAN can be relayed out
68 * multiple connections). In order to be able to route a message via a
69 * LNK_SPAN over a deterministic route THE MESSAGE CAN ONLY FLOW FROM A
70 * REMOTE NODE TOWARDS OUR NODE (N:1 fan-in).
72 * This supports the requirement that we have both message serialization
73 * and positive feedback if a topology change breaks the chain of VCs
74 * the message is flowing over. A remote node sending a message to us
75 * will get positive feedback that the route was broken and can take suitable
76 * action to terminate the transaction with an error.
78 * TRANSACTIONAL REPLIES
80 * However, when we receive a command message from a remote node and we want
81 * to reply to it, we have a problem. We want the remote node to have
82 * positive feedback if our reply fails to make it, but if we use a virtual
83 * circuit based on the remote node's LNK_SPAN to us it will be a DIFFERENT
84 * virtual circuit than the one the remote node used to message us. That's
85 * a problem because it means we have no reliable way to notify the remote
86 * node if we get notified that our reply has failed.
88 * The solution is to first note the fact that the remote chose an optimal
89 * route to get to us, so the reverse should be true. The reason the VC
90 * might not exist over the same route in the reverse is because there may
91 * be multiple paths available with the same distance metric.
93 * But this also means that we can adjust the messaging protocols to
94 * propagate a LNK_SPAN from the remote to us WHILE the remote's command
95 * message is being sent to us, and it will not only likely be optimal but
96 * it might also already exist, and it will also guarantee that a reply
97 * failure will propagate back to both sides (because even though each
98 * direction is using a different VC chain, the two chains are still
99 * going along the same path).
101 * We communicate the return VC by having the relay adjust both the target
102 * and the source fields in the message, rather than just the target, on
103 * each relay. As of when the message gets to us the 'source' field will
104 * represent the VC for the return direction (and of course also identify
105 * the node the message came from).
107 * This way both sides get positive feedback if a topology change disrupts
108 * the VC for the transaction. We also get one additional guarantee, and
109 * that is no spurious messages. Messages simply die when the VC they are
110 * traveling over is broken, in either direction, simple as that.
111 * It makes managing message transactional states very easy.
113 * MESSAGE TRANSACTIONAL STATES
115 * Message state is handled by the CREATE, DELETE, REPLY, and ABORT
116 * flags. Message state is typically recorded at the end points and
117 * at each hop until a DELETE is received from both sides.
119 * One-way messages such as those used by spanning tree commands are not
120 * recorded. These are sent without the CREATE, DELETE, or ABORT flags set.
121 * ABORT is not supported for one-off messages. The REPLY bit can be used
122 * to distinguish between command and status if desired.
124 * Persistent-state messages are messages which require a reply to be
125 * returned. These messages can also consist of multiple message elements
126 * for the command or reply or both (or neither). The command message
127 * sequence sets CREATE on the first message and DELETE on the last message.
128 * A single message command sets both (CREATE|DELETE). The reply message
129 * sequence works the same way but of course also sets the REPLY bit.
131 * Persistent-state messages can be aborted by sending a message element
132 * with the ABORT flag set. This flag can be combined with either or both
133 * the CREATE and DELETE flags. When combined with the CREATE flag the
134 * command is treated as non-blocking but still executes. Whem combined
135 * with the DELETE flag no additional message elements are required.
137 * ABORT SPECIAL CASE - Mid-stream aborts. A mid-stream abort can be sent
138 * when supported by the sender by sending an ABORT message with neither
139 * CREATE or DELETE set. This effectively turns the message into a
140 * non-blocking message (but depending on what is being represented can also
141 * cut short prior data elements in the stream).
143 * ABORT SPECIAL CASE - Abort-after-DELETE. Persistent messages have to be
144 * abortable if the stream/pipe/whatever is lost. In this situation any
145 * forwarding relay needs to unconditionally abort commands and replies that
146 * are still active. This is done by sending an ABORT|DELETE even in
147 * situations where a DELETE has already been sent in that direction. This
148 * is done, for example, when links are in a half-closed state. In this
149 * situation it is possible for the abort request to race a transition to the
150 * fully closed state. ABORT|DELETE messages which race the fully closed
151 * state are expected to be discarded by the other end.
155 * All base and extended message headers are 64-byte aligned, and all
156 * transports must support extended message headers up to DMSG_HDR_MAX.
157 * Currently we allow extended message headers up to 2048 bytes. Note
158 * that the extended header size is encoded in the 'cmd' field of the header.
160 * Any in-band data is padded to a 64-byte alignment and placed directly
161 * after the extended header (after the higher-level cmd/rep structure).
162 * The actual unaligned size of the in-band data is encoded in the aux_bytes
163 * field in this case. Maximum data sizes are negotiated during registration.
165 * Auxillary data can be in-band or out-of-band. In-band data sets aux_descr
166 * equal to 0. Any out-of-band data must be negotiated by the SPAN protocol.
168 * Auxillary data, whether in-band or out-of-band, must be at-least 64-byte
169 * aligned. The aux_bytes field contains the actual byte-granular length
170 * and not the aligned length.
172 * hdr_crc is calculated over the entire, ALIGNED extended header. For
173 * the purposes of calculating the crc, the hdr_crc field is 0. That is,
174 * if calculating the crc in HW a 32-bit '0' must be inserted in place of
175 * the hdr_crc field when reading the entire header and compared at the
176 * end (but the actual hdr_crc must be left intact in memory). A simple
177 * counter to replace the field going into the CRC generator does the job
178 * in HW. The CRC endian is based on the magic number field and may have
179 * to be byte-swapped, too (which is also easy to do in HW).
181 * aux_crc is calculated over the entire, ALIGNED auxillary data.
183 * SHARED MEMORY IMPLEMENTATIONS
185 * Shared-memory implementations typically use a pipe to transmit the extended
186 * message header and shared memory to store any auxilary data. Auxillary
187 * data in one-way (non-transactional) messages is typically required to be
188 * inline. CRCs are still recommended and required at the beginning, but
189 * may be negotiated away later.
191 * MULTI-PATH MESSAGE DUPLICATION
193 * Redundancy can be negotiated but is not required in the current spec.
194 * Basically you send the same message, with the same msgid, via several
195 * paths to the target. The msgid is the rendezvous. The first copy that
196 * makes it to the target is used, the second is ignored. Similarly for
197 * replies. This can improve performance during span flapping. Only
198 * transactional messages will be serialized. The target might receive
199 * multiple copies of one-way messages in higher protocol layers (potentially
200 * out of order, too).
203 uint16_t magic; /* 00 sanity, synchro, endian */
204 uint16_t reserved02; /* 02 */
205 uint32_t salt; /* 04 random salt helps w/crypto */
207 uint64_t msgid; /* 08 message transaction id */
208 uint64_t source; /* 10 originator or 0 */
209 uint64_t target; /* 18 destination or 0 */
211 uint32_t cmd; /* 20 flags | cmd | hdr_size / ALIGN */
212 uint32_t aux_crc; /* 24 auxillary data crc */
213 uint32_t aux_bytes; /* 28 auxillary data length (bytes) */
214 uint32_t error; /* 2C error code or 0 */
215 uint64_t aux_descr; /* 30 negotiated OOB data descr */
216 uint32_t reserved38; /* 38 */
217 uint32_t hdr_crc; /* 3C (aligned) extended header crc */
220 typedef struct dmsg_hdr dmsg_hdr_t;
222 #define DMSG_HDR_MAGIC 0x4832
223 #define DMSG_HDR_MAGIC_REV 0x3248
224 #define DMSG_HDR_CRCOFF offsetof(dmsg_hdr_t, salt)
225 #define DMSG_HDR_CRCBYTES (sizeof(dmsg_hdr_t) - DMSG_HDR_CRCOFF)
228 * Administrative protocol limits.
230 #define DMSG_HDR_MAX 2048 /* <= 65535 */
231 #define DMSG_AUX_MAX 65536 /* <= 1MB */
232 #define DMSG_BUF_SIZE (DMSG_HDR_MAX * 4)
233 #define DMSG_BUF_MASK (DMSG_BUF_SIZE - 1)
236 * The message (cmd) field also encodes various flags and the total size
237 * of the message header. This allows the protocol processors to validate
238 * persistency and structural settings for every command simply by
239 * switch()ing on the (cmd) field.
241 #define DMSGF_CREATE 0x80000000U /* msg start */
242 #define DMSGF_DELETE 0x40000000U /* msg end */
243 #define DMSGF_REPLY 0x20000000U /* reply path */
244 #define DMSGF_ABORT 0x10000000U /* abort req */
245 #define DMSGF_AUXOOB 0x08000000U /* aux-data is OOB */
246 #define DMSGF_FLAG2 0x04000000U
247 #define DMSGF_FLAG1 0x02000000U
248 #define DMSGF_FLAG0 0x01000000U
250 #define DMSGF_FLAGS 0xFF000000U /* all flags */
251 #define DMSGF_PROTOS 0x00F00000U /* all protos */
252 #define DMSGF_CMDS 0x000FFF00U /* all cmds */
253 #define DMSGF_SIZE 0x000000FFU /* N*32 */
255 #define DMSGF_CMDSWMASK (DMSGF_CMDS | \
260 #define DMSGF_BASECMDMASK (DMSGF_CMDS | \
264 #define DMSGF_TRANSMASK (DMSGF_CMDS | \
271 #define DMSG_PROTO_LNK 0x00000000U
272 #define DMSG_PROTO_DBG 0x00100000U
273 #define DMSG_PROTO_DOM 0x00200000U
274 #define DMSG_PROTO_CAC 0x00300000U
275 #define DMSG_PROTO_QRM 0x00400000U
276 #define DMSG_PROTO_BLK 0x00500000U
277 #define DMSG_PROTO_VOP 0x00600000U
280 * Message command constructors, sans flags
282 #define DMSG_ALIGN 64
283 #define DMSG_ALIGNMASK (DMSG_ALIGN - 1)
284 #define DMSG_DOALIGN(bytes) (((bytes) + DMSG_ALIGNMASK) & \
287 #define DMSG_HDR_ENCODE(elm) (((uint32_t)sizeof(struct elm) + \
291 #define DMSG_LNK(cmd, elm) (DMSG_PROTO_LNK | \
293 DMSG_HDR_ENCODE(elm))
295 #define DMSG_DBG(cmd, elm) (DMSG_PROTO_DBG | \
297 DMSG_HDR_ENCODE(elm))
299 #define DMSG_DOM(cmd, elm) (DMSG_PROTO_DOM | \
301 DMSG_HDR_ENCODE(elm))
303 #define DMSG_CAC(cmd, elm) (DMSG_PROTO_CAC | \
305 DMSG_HDR_ENCODE(elm))
307 #define DMSG_QRM(cmd, elm) (DMSG_PROTO_QRM | \
309 DMSG_HDR_ENCODE(elm))
311 #define DMSG_BLK(cmd, elm) (DMSG_PROTO_BLK | \
313 DMSG_HDR_ENCODE(elm))
315 #define DMSG_VOP(cmd, elm) (DMSG_PROTO_VOP | \
317 DMSG_HDR_ENCODE(elm))
320 * Link layer ops basically talk to just the other side of a direct
323 * LNK_PAD - One-way message on link-0, ignored by target. Used to
324 * pad message buffers on shared-memory transports. Not
325 * typically used with TCP.
327 * LNK_PING - One-way message on link-0, keep-alive, run by both sides
328 * typically 1/sec on idle link, link is lost after 10 seconds
331 * LNK_AUTH - Authenticate the connection, negotiate administrative
332 * rights & encryption, protocol class, etc. Only PAD and
333 * AUTH messages (not even PING) are accepted until
334 * authentication is complete. This message also identifies
337 * LNK_CONN - Enable the SPAN protocol on link-0, possibly also installing
338 * a PFS filter (by cluster id, unique id, and/or wildcarded
341 * LNK_SPAN - A SPAN transaction on link-0 enables messages to be relayed
342 * to/from a particular cluster node. SPANs are received,
343 * sorted, aggregated, and retransmitted back out across all
344 * applicable connections.
346 * The leaf protocol also uses this to make a PFS available
347 * to the cluster (e.g. on-mount).
349 * LNK_VOLCONF - Volume header configuration change. All hammer2
350 * connections (hammer2 connect ...) stored in the volume
351 * header are spammed at the link level to the hammer2
352 * service daemon, and any live configuration change
355 #define DMSG_LNK_PAD DMSG_LNK(0x000, dmsg_hdr)
356 #define DMSG_LNK_PING DMSG_LNK(0x001, dmsg_hdr)
357 #define DMSG_LNK_AUTH DMSG_LNK(0x010, dmsg_lnk_auth)
358 #define DMSG_LNK_CONN DMSG_LNK(0x011, dmsg_lnk_conn)
359 #define DMSG_LNK_SPAN DMSG_LNK(0x012, dmsg_lnk_span)
360 #define DMSG_LNK_VOLCONF DMSG_LNK(0x020, dmsg_lnk_volconf)
361 #define DMSG_LNK_ERROR DMSG_LNK(0xFFF, dmsg_hdr)
364 * LNK_CONN - Register connection for SPAN (transaction, left open)
366 * One LNK_CONN transaction may be opened on a stream connection, registering
367 * the connection with the SPAN subsystem and allowing the subsystem to
368 * accept and relay SPANs to this connection.
370 * The LNK_CONN message may contain a filter, limiting the desireable SPANs.
372 * This message contains a lot of the same info that a SPAN message contains,
373 * but is not a SPAN. That is, without this message the SPAN subprotocol will
374 * not be executed on the connection, nor is this message a promise that the
375 * sending end is a client or node of a cluster.
377 struct dmsg_lnk_auth {
383 * LNK_CONN identifies a streaming connection into the cluster. The other
384 * fields serve as a filter when supported for a particular peer and are
385 * not necessarily all used.
387 * peer_mask serves to filter the SPANs we receive by peer. A cluster
388 * controller typically sets this to (uint64_t)-1, a block devfs
389 * interface might set it to 1 << DMSG_PEER_DISK, and a hammer2
390 * mount might set it to 1 << DMSG_PEER_HAMMER2.
392 * mediaid allows multiple (e.g. HAMMER2) connections belonging to the same
393 * media, in terms of LNK_VOLCONF updates.
395 * pfs_clid, pfs_fsid, pfs_type, and label are peer-specific and must be
396 * left empty (zero-fill) if not supported by a particular peer.
398 * DMSG_PEER_CLUSTER filter: none
399 * DMSG_PEER_DISK filter: label
400 * DMSG_PEER_HAMMER2 filter: pfs_clid if not empty, and label
402 struct dmsg_lnk_conn {
404 uuid_t mediaid; /* media configuration id */
405 uuid_t pfs_clid; /* rendezvous pfs uuid */
406 uuid_t pfs_fsid; /* unique pfs uuid */
407 uint64_t peer_mask; /* PEER mask for SPAN filtering */
408 uint8_t peer_type; /* see DMSG_PEER_xxx */
409 uint8_t pfs_type; /* pfs type */
410 uint16_t proto_version; /* high level protocol support */
411 uint32_t status; /* status flags */
412 uint8_t reserved02[8];
413 int32_t dist; /* span distance */
414 uint32_t reserved03[14];
415 char label[256]; /* PFS label (can be wildcard) */
418 typedef struct dmsg_lnk_conn dmsg_lnk_conn_t;
421 * LNK_SPAN - Relay a SPAN (transaction, left open)
423 * This message registers a PFS/PFS_TYPE with the other end of the connection,
424 * telling the other end who we are and what we can provide or what we want
425 * to consume. Multiple registrations can be maintained as open transactions
426 * with each one specifying a unique {source} linkid.
428 * Registrations are sent from {source}=S {1...n} to {target}=0 and maintained
429 * as open transactions. Registrations are also received and maintains as
430 * open transactions, creating a matrix of linkid's.
432 * While these transactions are open additional transactions can be executed
433 * between any two linkid's {source}=S (registrations we sent) to {target}=T
434 * (registrations we received).
436 * Closure of any registration transaction will automatically abort any open
437 * transactions using the related linkids. Closure can be initiated
438 * voluntarily from either side with either end issuing a DELETE, or they
441 * Status updates are performed via the open transaction.
445 * A registration identifies a node and its various PFS parameters including
446 * the PFS_TYPE. For example, a diskless HAMMER2 client typically identifies
447 * itself as PFSTYPE_CLIENT.
449 * Any node may serve as a cluster controller, aggregating and passing
450 * on received registrations, but end-points do not have to implement this
451 * ability. Most end-points typically implement a single client-style or
452 * server-style PFS_TYPE and rendezvous at a cluster controller.
454 * The cluster controller does not aggregate/pass-on all received
455 * registrations. It typically filters what gets passed on based on
458 * STATUS UPDATES: Status updates use the same structure but typically
459 * only contain incremental changes to pfs_type, with the
460 * label field containing a text status.
462 struct dmsg_lnk_span {
464 uuid_t pfs_clid; /* rendezvous pfs uuid */
465 uuid_t pfs_fsid; /* unique pfs uuid */
466 uint8_t pfs_type; /* PFS type */
467 uint8_t peer_type; /* PEER type */
468 uint16_t proto_version; /* high level protocol support */
469 uint32_t status; /* status flags */
470 uint8_t reserved02[8];
471 int32_t dist; /* span distance */
472 uint32_t reserved03[15];
473 char label[256]; /* PFS label (can be wildcard) */
476 typedef struct dmsg_lnk_span dmsg_lnk_span_t;
478 #define DMSG_SPAN_PROTO_1 1
485 * All HAMMER2 directories directly under the super-root on your local
486 * media can be mounted separately, even if they share the same physical
489 * When you do a HAMMER2 mount you are effectively tying into a HAMMER2
490 * cluster via local media. The local media does not have to participate
491 * in the cluster, other than to provide the hammer2_copy_data[] array and
492 * root inode for the mount.
494 * This is important: The mount device path you specify serves to bootstrap
495 * your entry into the cluster, but your mount will make active connections
496 * to ALL copy elements in the hammer2_copy_data[] array which match the
497 * PFSID of the directory in the super-root that you specified. The local
498 * media path does not have to be mentioned in this array but becomes part
499 * of the cluster based on its type and access rights. ALL ELEMENTS ARE
500 * TREATED ACCORDING TO TYPE NO MATTER WHICH ONE YOU MOUNT FROM.
502 * The actual cluster may be far larger than the elements you list in the
503 * hammer2_copy_data[] array. You list only the elements you wish to
504 * directly connect to and you are able to access the rest of the cluster
505 * indirectly through those connections.
507 * This structure must be exactly 128 bytes long.
509 * WARNING! dmsg_vol_data is embedded in the hammer2 media volume header
511 struct dmsg_vol_data {
512 uint8_t copyid; /* 00 copyid 0-255 (must match slot) */
513 uint8_t inprog; /* 01 operation in progress, or 0 */
514 uint8_t chain_to; /* 02 operation chaining to, or 0 */
515 uint8_t chain_from; /* 03 operation chaining from, or 0 */
516 uint16_t flags; /* 04-05 flags field */
517 uint8_t error; /* 06 last operational error */
518 uint8_t priority; /* 07 priority and round-robin flag */
519 uint8_t remote_pfs_type;/* 08 probed direct remote PFS type */
520 uint8_t reserved08[23]; /* 09-1F */
521 uuid_t pfs_clid; /* 20-2F copy target must match this uuid */
522 uint8_t label[16]; /* 30-3F import/export label */
523 uint8_t path[64]; /* 40-7F target specification string or key */
526 typedef struct dmsg_vol_data dmsg_vol_data_t;
528 #define DMSG_VOLF_ENABLED 0x0001
529 #define DMSG_VOLF_INPROG 0x0002
530 #define DMSG_VOLF_CONN_RR 0x80 /* round-robin at same priority */
531 #define DMSG_VOLF_CONN_EF 0x40 /* media errors flagged */
532 #define DMSG_VOLF_CONN_PRI 0x0F /* select priority 0-15 (15=best) */
534 struct dmsg_lnk_volconf {
536 dmsg_vol_data_t copy; /* copy spec */
540 int64_t reserved02[32];
543 typedef struct dmsg_lnk_volconf dmsg_lnk_volconf_t;
546 * Debug layer ops operate on any link
548 * SHELL - Persist stream, access the debug shell on the target
549 * registration. Multiple shells can be operational.
551 #define DMSG_DBG_SHELL DMSG_DBG(0x001, dmsg_dbg_shell)
553 struct dmsg_dbg_shell {
556 typedef struct dmsg_dbg_shell dmsg_dbg_shell_t;
559 * Domain layer ops operate on any link, link-0 may be used when the
560 * directory connected target is the desired registration.
566 * Cache layer ops operate on any link, link-0 may be used when the
567 * directly connected target is the desired registration.
569 * LOCK - Persist state, blockable, abortable.
571 * Obtain cache state (MODIFIED, EXCLUSIVE, SHARED, or INVAL)
572 * in any of three domains (TREE, INUM, ATTR, DIRENT) for a
573 * particular key relative to cache state already owned.
575 * TREE - Effects entire sub-tree at the specified element
576 * and will cause existing cache state owned by
577 * other nodes to be adjusted such that the request
580 * INUM - Only effects inode creation/deletion of an existing
581 * element or a new element, by inumber and/or name.
582 * typically can be held for very long periods of time
583 * (think the vnode cache), directly relates to
584 * hammer2_chain structures representing inodes.
586 * ATTR - Only effects an inode's attributes, such as
587 * ownership, modes, etc. Used for lookups, chdir,
588 * open, etc. mtime has no affect.
590 * DIRENT - Only affects an inode's attributes plus the
591 * attributes or names related to any directory entry
592 * directly under this inode (non-recursively). Can
593 * be retained for medium periods of time when doing
596 * This function may block and can be aborted. You may be
597 * granted cache state that is more broad than the state you
598 * requested (e.g. a different set of domains and/or an element
599 * at a higher layer in the tree). When quorum operations
600 * are used you may have to reconcile these grants to the
601 * lowest common denominator.
603 * In order to grant your request either you or the target
604 * (or both) may have to obtain a quorum agreement. Deadlock
605 * resolution may be required. When doing it yourself you
606 * will typically maintain an active message to each master
607 * node in the system. You can only grant the cache state
608 * when a quorum of nodes agree.
610 * The cache state includes transaction id information which
611 * can be used to resolve data requests.
613 #define DMSG_CAC_LOCK DMSG_CAC(0x001, dmsg_cac_lock)
616 * Quorum layer ops operate on any link, link-0 may be used when the
617 * directly connected target is the desired registration.
619 * COMMIT - Persist state, blockable, abortable
621 * Issue a COMMIT in two phases. A quorum must acknowledge
622 * the operation to proceed to phase-2. Message-update to
623 * proceed to phase-2.
625 #define DMSG_QRM_COMMIT DMSG_QRM(0x001, dmsg_qrm_commit)
628 * NOTE!!!! ALL EXTENDED HEADER STRUCTURES MUST BE 64-BYTE ALIGNED!!!
630 * General message errors
632 * 0x00 - 0x1F Local iocomm errors
633 * 0x20 - 0x2F Global errors
635 #define DMSG_ERR_NOSUPP 0x20
638 char buf[DMSG_HDR_MAX];
640 dmsg_lnk_span_t lnk_span;
641 dmsg_lnk_conn_t lnk_conn;
642 dmsg_lnk_volconf_t lnk_volconf;
645 typedef union dmsg_any dmsg_any_t;