hammer2 - More involved refactoring of chain_repparent, cleanup

* Remove unused locking flags (remove the NOLOCK and NOUNLOCK features).
* Add HAMMER2_RESOLVE_NONBLOCK to hammer2_chain_lock() for use only by hammer2_chain_getparent() and hammer2_chain_repparent().
* Refactor hammer2_chain_getparent() and hammer2_chain_repparent(). Add a hot-path that uses HAMMER2_RESOLVE_NONBLOCK. If this fails we now do a much more involved tracking operation via 'reptrack' to deal with races against indirect block deletions.
* Clean up the copyright messages.
* Fix an issue where a sync could be held up indefinitely by ongoing overlapping modifying operations.
* Install a proper initial inode count when creating a snapshot.
* Fix a deadlock in checkdirempty(). A chain lock was winding up being ordered incorrectly.
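The hot-path idea above can be illustrated with a generic try-lock sketch. This is not the hammer2 code and it does not implement 'reptrack': struct node, node_trylock() and lock_parent_of() are hypothetical names, the lock is a simple C11 atomic_flag spin lock, and the slow path shown here is only the textbook "drop the child, lock in parent-then-child order, revalidate" fallback.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a lockable chain; not the hammer2 structure. */
struct node {
    atomic_flag  locked;
    struct node *parent;
};

static void node_init(struct node *n, struct node *parent)
{
    atomic_flag_clear(&n->locked);
    n->parent = parent;
}

/* Non-blocking attempt: returns true only if the lock was acquired. */
static bool node_trylock(struct node *n)
{
    return !atomic_flag_test_and_set_explicit(&n->locked,
                                              memory_order_acquire);
}

static void node_lock(struct node *n)
{
    while (!node_trylock(n))
        ;                       /* spin until the holder releases */
}

static void node_unlock(struct node *n)
{
    atomic_flag_clear_explicit(&n->locked, memory_order_release);
}

/*
 * Lock the parent of an already-locked child and return it.
 * *slow_path reports whether the fallback was needed.
 */
static struct node *lock_parent_of(struct node *child, bool *slow_path)
{
    struct node *parent = child->parent;

    /* Hot path: the parent is uncontended, take it without blocking. */
    if (node_trylock(parent)) {
        *slow_path = false;
        return parent;
    }

    /* Fallback: re-acquire in parent-then-child order and revalidate. */
    *slow_path = true;
    for (;;) {
        node_unlock(child);
        node_lock(parent);
        node_lock(child);
        if (child->parent == parent)
            return parent;      /* relationship unchanged, done */
        node_unlock(parent);    /* child was reparented, retry */
        parent = child->parent;
    }
}
```

The non-blocking attempt matters because the child is locked first here, the reverse of the usual parent-then-child order, so blocking on the parent could deadlock against a thread descending the tree.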
hammer2 - locking revamp

* Temporarily remove hammer2_ccms.c from the build and remove the ccms container. The CCMS container will be used again when we get cache coherency in, but for now it isn't needed.
* Replace the ccms state lock with a mtx lock and move into hammer2_chain_core. Note that the mtx lock being used here has abort and async locking support and these features will be required by HAMMER2.
* Replace the ccms spin lock with a normal spin lock and move into hammer2_chain_core.
* Refactor the OS locking interface to use hammer2_* prefixes for easier porting.
* Use a shared spin lock for the ONFLUSH bit update instead of an exclusive spin lock.
hammer2 - major simplification 1/many (stabilization B)

* Change hammer2_cluster_bytes() to hammer2_cluster_need_resize() to check for cluster size mismatches against desired. Used for data block resizing.
* Fix panic - allow data blocks to have a chain->dio. This will be the case when compression or other data filters are used.
* Fix null pointer panic - chain->dio can be NULL for data blocks.
* Fix null pointer panic - hlinkp is allowed to be NULL in hammer2_unlink_file().
* Do not assert if a hardlink target cannot be found. There is a known bug case when a directory is moved to another part of the topology where underlying hardlinks can get lost. kprintf() instead.
* Fix inode deadlock, add missing inode unlock in hammer2_hardlink_find().
* Remove OBJTYPE_HARDLINK tests from hammer2_inode_lock_*(). It is no longer possible for an inode's chain to point to a hardlink pointer, it will always point to the hardlink target.
* Add some lock count tracking to the VOPs to catch leftover locks on return. (Note that read-ahead operations mess up the lock count because the shared lock is inherited by the async op, so lock count tracking is not done in code which handles logical file data).
* Hammer2 survives cpdup, blogbench, fsx, fsstress.
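The lock count tracking mentioned above might look roughly like the following. This is a minimal sketch, not the hammer2 implementation: held_locks, tracked_lock(), tracked_unlock() and run_checked() are hypothetical names, and a per-thread counter is an assumption about how such tracking could be kept.

```c
#include <assert.h>

/* Hypothetical per-thread count of locks currently held. */
static _Thread_local int held_locks;

static void tracked_lock(void)
{
    ++held_locks;
}

static void tracked_unlock(void)
{
    assert(held_locks > 0);     /* unlock without a matching lock */
    --held_locks;
}

/*
 * Run an operation and return how many locks it left behind.
 * 0 means the operation released everything it acquired.
 */
static int run_checked(void (*op)(void))
{
    int before = held_locks;

    op();
    return held_locks - before;
}

/* Example operations: one balanced, one that forgets to unlock. */
static void balanced_op(void)
{
    tracked_lock();
    tracked_unlock();
}

static void leaky_op(void)
{
    tracked_lock();             /* missing tracked_unlock() */
}
```

A wrapper like run_checked() only works when the operation finishes with its locks on the calling thread, which is why a lock handed to an asynchronous read-ahead would throw the count off.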
hammer2 - Refactor flush mechanics

* Greatly simplify and reduce special cases in the flush code.
* Remove the multi-layer rbtree and replace with two discrete rbtrees (rbtree and dbtree) representing the live state and one linked list (dbq) representing set-aside deleted chains that are not part of the live state.
* Clean up some debugging junk, add more debugging junk.
* Separate flushing state flags and TIDs into their own fields instead of trying to use the live state flags and bref TIDs.
* Simplify transaction TID tracking.
hammer2 - pfsmount -> clustermount separation part 2

* Further separate the high-level VNOPS/inode (hammer2_pfsmount) layer from the lower level device (hammer2_mount, hammer2_chain) layer.
* Remove hmp fields from hammer2_trans and hammer2_inode.
* Add hammer2_cluster to the pfsmount as a degenerate case for now. This will be used to list all devices backing the PFS mount, pertaining to the copies mechanism.
* Run all logical (file) buffer cache operations through the device buffer cache. Remove previous direct-mapped shortcuts and disable BMAP for now. Basically the issue here is that with multiple devices backing a HAMMER2 mount, the normal file buffer cache 'cached disk offset' operations used to shortcut I/O just won't work. We can add the shortcut back in later for single-backing-device mounts but for now separate them out entirely and bcopy() between them.
* This will also make it easier for the GSOC H2 file compression project.
* Restore some of the lost performance by using the newly implemented cluster_readcb() buffer cache function.
hammer2 - flush sequencing part 5 - more flush synchronization work

* Get rid of chain->parent, replacing it with chain->above which is a pointer to the core common to the possibly multiple parents. Due to the multi-parenting, chain->parent was rather ad-hoc so getting rid of it makes the code clearer.
* Adjust several APIs which used to take a locked parent of chain to instead take the core common to multiple parents of chain.
* Rework how CHAIN_MOVED is cleared. The code works better but still has bugs which can leave chains hanging and unflushed on umount.
* Rework the lastdrop function significantly.
* Continue working on automatic delete/duplicate operation when a modification crosses a synchronization boundary. This code is now mostly implemented.
* Continue working on the flush filter which is responsible for differentiating modifications made before and after the synchronization point. The filter is now mostly implemented.
* Use spinlock protection on the rbtree, allowing manipulation of children without having to lock a specific parent chain (which wouldn't help much anyway since there can be more than one parent).
* Fix numerous assertions and panics.
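The role of a flush filter can be sketched as follows, assuming each modification carries the id of the transaction that made it and the synchronization point is itself a transaction id. struct mod, flush_filter() and flush_to() are hypothetical names for illustration; the actual filter in hammer2 works on chains and is considerably more involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one modification; not a hammer2 structure. */
struct mod {
    uint64_t modify_tid;        /* transaction that made the change */
    int      flushed;           /* nonzero once written out */
};

/* A change belongs to the flush only if it is at or before sync_tid. */
static int flush_filter(const struct mod *m, uint64_t sync_tid)
{
    return m->modify_tid <= sync_tid;
}

/* Flush everything up to sync_tid; return how many entries were flushed. */
static size_t flush_to(struct mod *mods, size_t n, uint64_t sync_tid)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        if (!mods[i].flushed && flush_filter(&mods[i], sync_tid)) {
            mods[i].flushed = 1;
            count++;
        }
    }
    return count;
}
```

Changes made after the synchronization point are simply left for a later flush, which is what lets the frontend keep modifying while a flush is in progress.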
hammer2 - Major restructuring, part 4/several

* Add inumber -> inode structure tracking and lookup. This is needed to ensure that only a single inode structure is used to track multiple hardlinks to the same place.
* Continue stabilization. Remove modify_tid/delete_tid checks in the flush code and (for now) only flush along the live path. Refactor held chains when creating new chains. The creation of a new chain can move around existing chains, causing the held chain to be marked deleted. When a hardlink is consolidated in a parent directory the source chain used in the duplication is not deleted. Numerous chain->duplink handling code was assuming that the source was always deleted. Fix that.
* The shared inode lock now refactors ip->chain (the exclusive inode lock already did so).
* Fix most ref-counting of the chain structure, fixing most of the memory leakage issues on unmount.
* There are still some issues with small files not inheriting their data on duplication. cpdup /usr/share works but a significant number of small files lose their data references on re-mount.
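One way to picture the inumber -> inode tracking is a lookup table that returns the existing structure when one is already registered for that inode number. The sketch below assumes a small chained hash table purely for brevity; struct itable and inode_get() are hypothetical names and the commit does not say which index structure hammer2 uses.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define NBUCKETS 16

/* Hypothetical in-memory inode; not the hammer2_inode layout. */
struct inode {
    uint64_t      inum;
    int           refs;
    struct inode *next;         /* hash chain */
};

struct itable {
    struct inode *buckets[NBUCKETS];
};

/*
 * Return the single inode structure for inum, creating it on first use.
 * Every caller asking for the same inum gets the same pointer.
 */
static struct inode *inode_get(struct itable *t, uint64_t inum)
{
    struct inode **head = &t->buckets[inum % NBUCKETS];
    struct inode *ip;

    for (ip = *head; ip != NULL; ip = ip->next) {
        if (ip->inum == inum) {
            ip->refs++;         /* another hardlink path found it */
            return ip;
        }
    }

    ip = calloc(1, sizeof(*ip));
    if (ip == NULL)
        return NULL;
    ip->inum = inum;
    ip->refs = 1;
    ip->next = *head;
    *head = ip;
    return ip;
}
```

Two directory entries that are hardlinks to the same file resolve to the same inumber, so both lookups land on one structure and share its state.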
hammer2 - Major restructuring, part 1/several

* This breaks a lot of things. The next few commits will get it all working again.
* Significantly rework the data structures. Instead of embedding the RBTREE for a chain's children in the chain, the chain instead points to a secondary structure containing the RBTREE.

  Chains can no longer be moved within the in-memory topology. That is, if a file is renamed or a block is resized or a block is moved into or out of an indirect block, the in-memory chain representing that block is NOT moved. Instead, the in-memory chain is marked deleted and a copy is created at the new location. Both the old and the new chain reference the same secondary structure and thus share the same RBTREE, and reference the same media storage.

  In addition, chain->duplink points from the deleted chain to its relocated copy and maintains a reference on the target until the deleted chain is deallocated. It is possible for the linked list to span more than one element. This link will soon be used to retarget inode->chain pointers (which can wind up pointing to stale data) and also eventually affect chain->parent traversals (real parent becomes chain->parent->[duplink*]). A rethink might be needed down the line.
* This will allow the flush code to run 100% asynchronous from the frontend and still be able to flush to a synchronization point no matter how complex a set of changes have occurred to the filesystem concurrent to the flush (but after its synchronization point).
* The change also stabilizes chain->parent, which simplifies quite a bit of code.
* Simplify nearly all the hammer2_chain_*() API functions, and other functions.
* Add a hammer2_trans (transaction) structure to keep track of modifying transactions. This will be fleshed out later and used to detect flush synchronization points. It currently contains the transaction id.
* Start adding API infrastructure and start reworking the flush and other tree-modifying code to work under the new abstraction.
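The delete-and-duplicate scheme described in this commit can be sketched as below. This is an illustration under assumptions, not the hammer2 data layout: struct core stands in for the secondary structure holding the RBTREE, and chain_new(), chain_relocate() and chain_resolve() are hypothetical helper names.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for the shared secondary structure (child tree omitted). */
struct core {
    int refs;                   /* number of chains sharing this core */
};

/* Hypothetical chain; only the fields needed for the illustration. */
struct chain {
    struct core  *core;         /* shared by the original and its copies */
    struct chain *duplink;      /* deleted chain -> relocated copy */
    int           deleted;
    int           refs;
};

static struct chain *chain_new(struct core *core)
{
    struct chain *c = calloc(1, sizeof(*c));

    if (c == NULL)
        return NULL;
    c->core = core;
    c->refs = 1;
    core->refs++;
    return c;
}

/* "Move" a chain: mark it deleted and create a copy sharing its core. */
static struct chain *chain_relocate(struct chain *old)
{
    struct chain *dup = chain_new(old->core);

    if (dup == NULL)
        return NULL;
    old->deleted = 1;
    old->duplink = dup;
    dup->refs++;                /* reference held through old->duplink */
    return dup;
}

/* Follow duplink* from a possibly stale chain to the live copy. */
static struct chain *chain_resolve(struct chain *c)
{
    while (c->deleted && c->duplink != NULL)
        c = c->duplink;
    return c;
}
```

A pointer cached before two successive relocations still reaches the live chain through chain_resolve(), which is the kind of retargeting the commit anticipates for inode->chain.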
hammer2 - serialized flush work part 1

This is preliminary work required to support chain structure replication for the purposes of recording modifications which are then separated by serialization points (by transaction id). Ultimately this will allow the flush code to flush to an exact serialization point and in the process operate completely asynchronously from any further modifications being made on the frontend after that serialization point.

* Separate hammer2_inode from hammer2_chain.
* Split the locking APIs for inodes and chains into their own functions.
* Move ip_data into chain->data->ipdata (kmalloc'd), instead of embedding it in hammer2_inode. This allows the inode structure to disconnect from the chain.
hammer2 - Integrate CCMS thread lock into hammer2 chain structure

* Integrate the CCMS thread lock into the hammer2 chain structure.
* Implement shared and exclusive modes (hammer2 was only really using exclusive mode before). Rework all the chain and inode locking functions to use CCMS via chain->cst. This also required changing the SPLAY trees into RB trees.
* Start reworking non-modifying VNOPS to use shared CCMS locks.
* Rework the hammer2_chain_drop() function to avoid deadlocks due to the mixed shared/exclusive locks we now support.
* Major performance improvements for concurrent access. SHARED locks now extend to hammer2_chain and hammer2_inode structural accesses, recursions, and cached data (buffer cache) accesses. In particular, multiple threads can now access the same bp via a hammer2_chain locked shared. The bp's themselves are still exclusive only (the kernel APIs haven't changed), but the hammer2_chain structure can now share the bp's data across several threads accessing it via the chain.
hammer2 - Initial CCMS locking tie-in

This is a necessary precursor step to being able to integrate the cache state grants with our chain locks. Basically we are replacing the hammer2 chain lockmgr lock (hammer2_chain->lk) with a CCMS cst structure (hammer2_chain->cst). This structure will become the attribute CST for hammer2 inodes. The topological CST is built into the hammer2_inode. Data-space CSTs will initially be the hammer2_chain->cst for indirect blocks, though we will probably also need one or more in hammer2_inode to handle generic cases.
hammer2 - Initial CCMS adaptation and code-up

This is an initial code-up and compiles-without-error pass, untested and likely full of bugs. CCMS needed a makeover but I managed to retain the guts of the original block/wakeup and CST partitioning code.

* The frontend code now creates a larger CCMS topology which will mirror the chain topology (the ccms_inode will be embedded in a hammer2_inode), and places the data ranging in ccms_inode.
* CCMS inode creation and deletion is broken up into two stages, e.g. a deletion requires a 'delete' plus 'uninit' sequence allowing the 'delete' to reflect a topological deletion but for the CCMS node to remain intact (e.g. if open descriptors on the related file or directory remain), then a final uninit when the descriptors finally go away.
* Enhanced the original CCMS code and the new ccms_inode to track three different cache coherency domains: (1) A recursive topological domain which covers the inode and entire subtree, (2) An attribute domain covering only the inode attributes, and (3) A data domain covering a data offset range or directory key range.
* Local cache states are implemented for the attribute and data range domains; the topological domain is not yet properly recursive.
* Remotely-granted cache states are not yet implemented.
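The two-stage 'delete' plus 'uninit' sequence can be pictured with a small state sketch. The names below (struct cnode, cnode_delete(), cnode_uninit() and friends) are hypothetical and do not come from the CCMS code; the open-descriptor count is an assumption used to model "descriptors remain".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical coherency node; not the ccms_inode layout. */
struct cnode {
    bool linked;                /* still present in the topology */
    bool initialized;           /* node storage still valid */
    int  open_refs;             /* open descriptors on the object */
};

static void cnode_init(struct cnode *n)
{
    n->linked = true;
    n->initialized = true;
    n->open_refs = 0;
}

static void cnode_open(struct cnode *n)
{
    n->open_refs++;
}

static void cnode_close(struct cnode *n)
{
    assert(n->open_refs > 0);
    n->open_refs--;
}

/* Stage 1: topological deletion. The node itself stays intact. */
static void cnode_delete(struct cnode *n)
{
    n->linked = false;
}

/*
 * Stage 2: final teardown. Succeeds only once the node is unlinked
 * and no descriptors remain; returns whether the teardown happened.
 */
static bool cnode_uninit(struct cnode *n)
{
    if (n->linked || n->open_refs > 0)
        return false;
    n->initialized = false;
    return true;
}
```

Splitting the stages mirrors unlink semantics: the name disappears from the topology immediately while the node keeps serving any descriptors that are still open.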