hammer2 - Config notifications, cleanup HAMMER2 VFS API

* A hammer2 volume has a PERSISTENT table of 256 entries in the media volume header specifying how the cluster connects together. Various hammer2 directives can list, add, or remove entries from this table. This table will be used for several different aspects of the filesystem, one of which is telling the userland hammer2 service daemon what other machines to connect to. That is, we want the cluster configuration to be persistently stored as part of a HAMMER2 filesystem. (A rough sketch of such a table follows below.)

* Add a notification message from the kernel to the daemon whenever this table is modified. The kernel will also spam the contents of the table to the daemon when it first connects. The service daemon tracks the table and will connect to (or disconnect from) the listed targets in real time. In addition, the service daemon will retry a failed connection (or failed DNS lookup) forever as long as the entry is intact, the idea being that a machine in the cluster will recover once transitory failures are resolved. This is a bit messy at the moment as two pthreads have to be created for each connection... one to handle connect, disconnect, and retry operations and the other to handle the actual message stream over the connection.

* Clean up the HAMMER2 VFS code's messaging APIs a bit, bringing them closer to the hammer2 userland APIs (though of course there will always be major differences).

* Adjust the hammer2 VFS to try to clean up open transactional states when a socket failure occurs before proceeding with a umount, so the related functional states can be triggered and cleaned up.

* Added an ioctl to reconnect a hammer2 mount to the userland hammer2 service daemon (not yet used). This will allow us to kill and restart the daemon and have it recover the communications pipes between itself and the HAMMER2 mounts in the kernel.
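As a rough illustration of the persistent table described above, a minimal sketch of how a 256-entry connection table might be laid out. The struct and field names here are assumptions for illustration, not the actual on-media format:

    #include <stdint.h>

    #define H2_CONN_COUNT 256            /* table size from the text above */

    struct h2_conn_entry {
        uint8_t  flags;                  /* hypothetical: entry valid, etc. */
        uint8_t  reserved[7];
        char     target[56];             /* peer address or DNS name */
    };

    /* embedded in the media volume header */
    struct h2_volhdr_conn_table {
        struct h2_conn_entry entries[H2_CONN_COUNT];
    };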
hammer2 - Message routing work

* Further API simplification.

* Start adding router infrastructure to the kernel VFS.

* Looks like we will need a 'source' and 'target' field in the message header after all (replacing the single 'spanid' field), in order to track the reply VC as a message is relayed.
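A minimal sketch of the routing portion of the header implied by the last item, assuming 16-bit circuit ids (field widths and names are illustrative):

    #include <stdint.h>

    /* Replaces the single 'spanid': a relay needs both endpoints recorded
     * so it knows which virtual circuit a reply should travel back on. */
    struct h2_msg_hdr_routing {
        uint16_t source;                 /* originating circuit id */
        uint16_t target;                 /* destination circuit id */
    };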
hammer2 - SPAN protocol work, router work

* Fix SPAN relay sort and sequencing bugs.

* Start reworking the APIs to accommodate routed messages. Start by creating a hammer2_router structure and adjusting most of the msg functions to pass it instead of the iocom.

* Fix hammer2_state races by moving the state allocation to hammer2_msg_alloc() instead of hammer2_msg_write(). This gives code a chance to assign the state->any.* field without having to worry about the state getting ripped out from under us.
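A sketch of the router indirection and the reworked allocation order, under the assumption that the router simply wraps the iocom plus optional relay state (the field set is hypothetical):

    /* forward declarations for the real types */
    struct hammer2_iocom;
    struct hammer2_state;

    struct hammer2_router {
        struct hammer2_iocom *iocom;     /* underlying connection */
        struct hammer2_state *relay;     /* hypothetical relay state */
    };

    /*
     * Usage pattern after the change: state is allocated inside
     * hammer2_msg_alloc(), so the caller can safely fill in
     * state->any.* before hammer2_msg_write() fires the message off.
     */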
hammer2 - SPAN protocol work

* Because we allow loops in the graph, the loss of a feeder node can result in a tail-chasing loop of SPAN updates with an ever-growing distance parameter. To deal with this a spanning tree distance limit is required, beyond which no propagation occurs; this terminates the chase. The tail then catches up to the head and the node is finally removed from the spanning tree entirely. This fixes the propagation of spanning tree deletions, e.g. when we umount a HAMMER2 PFS.

* Fix a state insertion bug. A structure was being inserted into the red-black tree before the required fields were initialized. Corrects a SPAN propagation fault.
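The cutoff can be a one-line test in the relay path; a sketch, assuming a hop-count style distance (the constant name and its value are hypothetical):

    #include <stdint.h>

    #define H2_SPAN_MAXDIST 16u          /* hypothetical propagation limit */

    /* A SPAN is relayed with dist+1; refusing to relay past the limit
     * lets the deletion "tail" catch up with the update "head" when the
     * graph contains loops. */
    static int
    h2_span_should_relay(uint32_t dist)
    {
        return (dist + 1 <= H2_SPAN_MAXDIST);
    }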
hammer2 - userland API / span work

* Fix a stall bug in the streaming write code.

* Add some pretty-printing support for debug output. Also remove a ton of debug messages.

* Add a remote shell 'tree' command to dump the spanning tree.

* Cleanup some of the state handling and error codes.
hammer2 - SPAN protocol work

* Initial implementation of the LNK_SPAN protocol between two hammer2 service daemons running on different machines. There's still lots to do.

    mount x P PFSs (P pipes)    (machine A)
            |
    service daemon (machine A)  (handling P + 1 connections)
            |
       INET SOCKET
            |
    service daemon (machine B)  (handling Q + 1 connections)
            |
    mount x Q PFSs (Q pipes)    (machine B)

* Service daemons start with LNK_CONN and then interconnect SPANs.

* The SPAN protocol allows any number of connections between service daemons and from service daemons to physical HAMMER2 mounts.

* Fixed a message write() sequencing bug.

* Added some additional debug directives, and also added a remote debug directive to connect from one already-running service daemon to another.
hammer2 - spanning tree and messaging work

* Fix numerous bugs and clean up the messaging infrastructure further. Fix issues with state tracking and incorrect message flags; assert that flags are correct.

* Fix issues with connection termination. All active transactions must be completely closed from both ends before the iocom can be destroyed. Fix bugs in the MSGF_DELETE message simulator when a socket error occurs (simulating the other end closing any active transactions going over the iocom); a sketch of that path follows below.

* Implement the spanning tree relay code. The relay code is even relatively optimal, though ultimately we need to add additional filters to make client<->service rendezvous less cpu intensive.
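A sketch of the failure-simulation path described in the second item, using stand-in types (the real code keeps open transactions in trees on the iocom; the simple list and all names here are illustrative):

    #include <stdint.h>

    #define MSGF_DELETE 0x40000000u      /* flag value is an assumption */

    struct h2_state {
        struct h2_state *next;           /* stand-in list of open txns */
        uint32_t rxcmd;                  /* flags seen from the remote */
    };

    /*
     * On socket failure, post a simulated LNK_ERROR|MSGF_DELETE for every
     * transaction the (dead) remote never closed, so local consumers run
     * their normal close paths before the iocom is destroyed.
     */
    static void
    h2_simulate_remote_closure(struct h2_state *states,
                               void (*post_delete)(struct h2_state *))
    {
        struct h2_state *st;

        for (st = states; st != NULL; st = st->next) {
            if ((st->rxcmd & MSGF_DELETE) == 0)
                post_delete(st);
        }
    }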
hammer2 - Flesh out span code, API cleanups

* Cleanup the transactional APIs and add a few functions to help with simple (error code only) message replies.

* Better message protocol layering for both the kernel and userland code.

* Kernel now opens a LNK_CONN transaction which will enable the SPAN protocol on the link and also serve to install a PFS filter (which is not yet implemented). Upon success the kernel then initiates the SPAN. Basically for the kernel:

    send LNK_CONN
    wait for streaming reply (transaction remains open on both sides)
    send LNK_SPAN

  TODO: Receive/track LNK_SPANs, each representing a virtual circuit.
  TODO: Track LNK_SPANs that match our PFS.
  TODO: Issue higher level protocol transaction messages over these circuits based on VNOPS, caching, mirroring, etc. (transactional failures can occur when the LNK_SPAN state changes, forcing a retry, etc).

* Userland now accepts the LNK_CONN and uses the open transaction to install tracking structures for those connections participating in the SPAN protocol.

* Userland now installs tracking structures for received SPAN messages.

* Start fleshing out the userland side of the SPAN relay/transmit code. This will involve yet more structures to track which SPANs are being relayed over each connection, so changes can be propagated (not yet implemented). For userland the TODO is very large so no point iterating it here.

* Kernel now accepts DBG_SHELL replies (basically debug output messages) and will kprintf() them. DBG_SHELL commands not yet accepted by the kernel.
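The kernel-side sequence above, sketched with the message API names used in this series; the command/flag encodings and the exact hammer2_msg_alloc() signature are assumptions:

    #include <stddef.h>
    #include <stdint.h>

    /* placeholder encodings; the real values live in the hammer2 headers */
    #define HAMMER2_LNK_CONN    0x00000011u
    #define HAMMER2_LNK_SPAN    0x00000012u
    #define HAMMER2_MSGF_CREATE 0x80000000u

    struct hammer2_router;
    struct hammer2_msg;

    struct hammer2_msg *hammer2_msg_alloc(struct hammer2_router *router,
                                          uint32_t cmd, size_t aux_size);
    void hammer2_msg_write(struct hammer2_msg *msg);

    static void
    h2_kern_link_start(struct hammer2_router *router)
    {
        struct hammer2_msg *msg;

        /* open LNK_CONN; the transaction stays open for the link's life */
        msg = hammer2_msg_alloc(router,
                                HAMMER2_LNK_CONN | HAMMER2_MSGF_CREATE, 0);
        /* ...fill in the PFS filter fields in the CONN payload... */
        hammer2_msg_write(msg);
    }

    /* when the streaming reply to LNK_CONN arrives, initiate the SPAN */
    static void
    h2_kern_conn_reply(struct hammer2_router *router)
    {
        struct hammer2_msg *msg;

        msg = hammer2_msg_alloc(router,
                                HAMMER2_LNK_SPAN | HAMMER2_MSGF_CREATE, 0);
        hammer2_msg_write(msg);
    }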
hammer2 - More work on userland hammer2 msg infrastructure

* When a link error occurs, generate a LNK_ERROR message for each open transaction before setting the iocom ERROR flag and returning the final non-transactional LNK_ERROR.

* Command processing now switches on the original transactional head.cmd instead of the current msg->any.head.cmd, which allows the use of mixed cmds in a transactional message stream. The target function then handles the actual msg->any.head.cmd. Thus we can consolidate all sub-commands used within a transaction into the target function, which greatly improves code quality. This also allows us to send LNK_ERROR messages over active transactions.

* Print the pfs_id and label for the received LNK_SPAN message the kernel sends to the userland hammer2 service process, and verify LNK_ERROR processing for connection terminations. Yup, it works.
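The dispatch rule in the second item, as a stand-alone sketch (minimal stand-in structures; the real layout differs and the mask name is hypothetical):

    #include <stdint.h>

    struct h2_msg;
    struct h2_state { struct h2_msg *msg; };   /* msg that opened the txn */
    struct h2_msg {
        struct h2_state *state;                /* NULL if non-transactional */
        struct { uint32_t cmd; } head;
    };

    #define H2_CMD_BASEMASK 0x0fffffffu        /* hypothetical: strip flags */

    /* Switch on the transaction's originating cmd, not the current one,
     * so mixed cmds (e.g. LNK_ERROR) still route to the function that
     * owns the transaction; that function inspects the current cmd. */
    static uint32_t
    h2_dispatch_cmd(const struct h2_msg *msg)
    {
        uint32_t cmd = msg->state ? msg->state->msg->head.cmd
                                  : msg->head.cmd;
        return (cmd & H2_CMD_BASEMASK);
    }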
hammer2 - Bring in the transaction state code from the hammer2 vfs

* Bring in the transaction state management code from the kernel hammer2 module and cleanup the APIs to use similar mechanics.

* Basic replymsg operations now use the HAMMER2_LNK_ERROR directive instead of the original command, for now.
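A sketch of that reply path, assuming a msg/state API like the one used elsewhere in this series (the helper names and the placeholder encodings are assumptions; only HAMMER2_LNK_ERROR comes from the text):

    #include <stddef.h>
    #include <stdint.h>

    #define HAMMER2_LNK_ERROR  0x00000000u    /* placeholder encoding */
    #define HAMMER2_MSGF_REPLY 0x40000000u    /* placeholder flag value */

    struct h2_state;
    struct h2_msg { struct { uint32_t cmd; uint32_t error; } head; };

    struct h2_msg *h2_msg_alloc(struct h2_state *state, uint32_t cmd,
                                size_t aux_size);
    void h2_msg_write(struct h2_msg *msg);

    /* Reply with just an error code: the reply carries LNK_ERROR rather
     * than echoing the transaction's original command. */
    static void
    h2_replymsg(struct h2_state *state, uint32_t error)
    {
        struct h2_msg *msg;

        msg = h2_msg_alloc(state, HAMMER2_LNK_ERROR | HAMMER2_MSGF_REPLY, 0);
        msg->head.error = error;
        h2_msg_write(msg);
    }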
hammer2 - Implement and test first SPAN message transaction

* The hammer2 VFS now sends a dummy SPAN message to the hammer2 service daemon. SPANs are used to register capabilities (primarily PFS services and PFS consumers). SPAN messages are left as open transactions for the duration of the link and/or until the graph changes (mainly a spanning tree mechanic that will be coded as a function of the hammer2 service daemon in userland).

* Basic open transaction and simple reply message tested. Use a dummy message for testing.

* hammer2_msg_write() detects CREATE, allocates state, and assigns a msgid. State allocation moved out of hammer2_state_msgtx() and into hammer2_msg_write() so we can calculate the proper CRCs.

* Fixed a couple of expected bugs. The userland code was swapping msg_hdr.source and msg_hdr.target in the reply, but I adjusted the message spec to NOT do that (meaning any message routing has to select {source} or {target} based on whether the REPLY bit is set or not). A sketch of that selection follows below.

* Memory seems to get cleaned up properly, so far.
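The routing consequence of not swapping the fields, sketched (the flag value and header layout are illustrative):

    #include <stdint.h>

    #define MSGF_REPLY 0x20000000u       /* flag value is an assumption */

    struct h2_msg_hdr {
        uint32_t cmd;
        uint16_t source;
        uint16_t target;
    };

    /* Since replies keep source/target exactly as originally sent, a
     * router forwards commands toward {target} and routes replies back
     * toward {source}. */
    static uint16_t
    h2_route_on(const struct h2_msg_hdr *hdr)
    {
        return (hdr->cmd & MSGF_REPLY) ? hdr->source : hdr->target;
    }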
hammer2 - Implement aes_256_cbc session encryption

* The AES session key and initial iv[] are transmitted in the public key exchange.

* The actual AES session key and initial iv[] are the data received XOR'd with the data sent, so if the public key exchange is broken (even if the verifier succeeds), the rest of the session will die a horrible death.

* We use aes_256_cbc, and in addition to the iv[] being adjusted by the data in-flight we also inject some random data into each message header to mix the iv[] up even more than it would be normally.

* We also check the message sequence number, which is embedded in the random data (the raw msg header's salt field), though the iv[] should catch any replays.

* NOTE: The verifier is still weak, but the session key and iv[] exchange is very strong.
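A sketch of the key/iv mixing described above, using OpenSSL's EVP API; the buffer layout, sizes, and the function name are assumptions:

    #include <openssl/evp.h>

    #define H2_AES_KEY_SIZE 32           /* aes-256 key */
    #define H2_AES_IV_SIZE  16           /* cbc block size */

    static int
    h2_crypto_setup(EVP_CIPHER_CTX *ctx,
                    const unsigned char sent[H2_AES_KEY_SIZE + H2_AES_IV_SIZE],
                    const unsigned char rcvd[H2_AES_KEY_SIZE + H2_AES_IV_SIZE],
                    int encrypt)
    {
        unsigned char mixed[H2_AES_KEY_SIZE + H2_AES_IV_SIZE];
        size_t i;

        /* session key/iv = material sent XOR material received: if the
         * public key exchange was tampered with, the two sides derive
         * different keys and the stream dies immediately */
        for (i = 0; i < sizeof(mixed); ++i)
            mixed[i] = sent[i] ^ rcvd[i];

        /* key = first 32 bytes, initial iv = next 16 bytes */
        return EVP_CipherInit_ex(ctx, EVP_aes_256_cbc(), NULL,
                                 mixed, mixed + H2_AES_KEY_SIZE, encrypt);
    }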