hammer2 - Try to reduce no-activity stalls during complex flushes

* Hammer2 keeps track of directory dependencies to maintain meta-data
  consistency at flush boundaries. This can cause issues when heavy
  simultaneous frontend activity blows out dirty buffer limits and
  stalls in 'h2memw'.

  These frontend stalls are not supposed to be holding vnodes, but
  there do appear to be cases where the backend flusher is not able to
  immediately acquire some vnode locks during the flush. This causes
  the backend flush to skip that vnode but also introduce some static
  delays (rather than becoming cpu-bound). The backend flush ultimately
  restarts the flush and tries again.

  Situations can develop where the backend also stalls in a sequence of
  'h2syndel' tsleep delays, resulting in zero cpu activity (frontend is
  stalled in 'h2memw') and zero disk activity (backend is also stalled)
  for a short period of time.

* This problem does not lead to permanent deadlocks, however. H2 is
  always able to recover.

* Rearrange a 'h2syndel' tsleep() call in the backend flusher. Instead
  of calling tsleep() once per vnode that failed to lock, we now finish
  flushing the remaining vnodes, then try to wake up processes blocked
  in 'h2memw' on the frontend, and THEN sleep for a few ticks before
  restarting.

  This is an attempt to close the gap causing these periods of
  no-activity.
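The reordering described above can be illustrated with a small user-space sketch. It is not the hammer2 flusher: `sim_vnode`, `pass_stats`, and `flush_pass()` are hypothetical names, the lock attempt is simulated with a flag, and the wakeup and sleep are only counted. It also assumes the wakeup and delay are needed only when at least one vnode was skipped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a vnode; not the kernel's struct vnode. */
struct sim_vnode {
    bool held_by_frontend;  /* when set, a non-blocking lock attempt fails */
    bool flushed;
};

/* Counters so the behaviour of one pass can be inspected. */
struct pass_stats {
    int flushed;   /* vnodes flushed during this pass */
    int skipped;   /* vnodes skipped because the lock was unavailable */
    int wakeups;   /* stands in for waking 'h2memw' waiters */
    int sleeps;    /* stands in for one short 'h2syndel' tsleep() */
};

/*
 * One backend pass. The loop never sleeps; a vnode that cannot be
 * locked is only counted. After the loop, blocked frontends are woken
 * first and a single short delay follows (assumption: only when
 * something was skipped and the pass has to be restarted).
 */
static struct pass_stats
flush_pass(struct sim_vnode *vns, size_t n)
{
    struct pass_stats st = { 0, 0, 0, 0 };
    size_t i;

    assert(vns != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (vns[i].flushed)
            continue;
        if (vns[i].held_by_frontend) {
            st.skipped++;       /* skip, but do not sleep here */
            continue;
        }
        vns[i].flushed = true;
        st.flushed++;
    }
    if (st.skipped > 0) {
        st.wakeups++;           /* give blocked frontends a chance first */
        st.sleeps++;            /* then one delay before the restart */
    }
    return st;
}
```

With the older per-vnode placement, a pass that skipped k vnodes would have taken k delays; in this sketch it takes at most one, and only after the remaining vnodes have been flushed.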
hammer2 - Multitude of SMP contention fixes, work on flush

* Change the hammer2_io RBTREE to a hash table with per-entry locks.
  This reduces contention in the hammer2 block I/O subsystem, which
  used to be protected by a single lock.

* Change the hammer2_inode RBTREE to a hash table with per-entry locks.
  This reduces contention in the hammer2 inode cache, which used to be
  protected by a single lock.

* Replace the hammer2_chain LRU cache with a per-inode cluster cache,
  which caches the last cluster-related chain. These caches are
  designed to hold a deep chain with 0 refs (and thus its parent
  recursion) to avoid having to reconstitute and recheck the chains on
  every VOP, for example when doing sequential I/O on a file.

  Probably needs more work.

* Use the new trigger_syncer_start() and trigger_syncer_end() API to
  fix flush waits when the frontend is asked to do large bulk modifying
  operations (such as file creation). The old code still worked but
  could sometimes cause processes to pause for up to 30 seconds when
  the flush wait raced the syncer. The flush wait wound up waiting for
  the next filesystem sync.
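As a rough illustration of why a hash table with per-entry locks reduces contention compared with one lock around a tree, here is a minimal user-space sketch. All names (`sketch_table`, `sketch_insert()`, `sketch_lookup()`, `SKETCH_HASH_SIZE`) are hypothetical, and the C11 `atomic_flag` spinlock merely stands in for whatever lock type the kernel code uses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_HASH_SIZE 64u  /* hypothetical size; must be a power of two */

struct sketch_node {
    uint64_t key;
    struct sketch_node *next;
};

struct sketch_bucket {
    atomic_flag lock;          /* one spinlock per hash entry */
    struct sketch_node *head;
};

static struct sketch_bucket sketch_table[SKETCH_HASH_SIZE];

/* Put every bucket lock into the clear state and empty every list. */
static void
sketch_table_init(void)
{
    size_t i;

    for (i = 0; i < SKETCH_HASH_SIZE; i++) {
        atomic_flag_clear(&sketch_table[i].lock);
        sketch_table[i].head = NULL;
    }
}

static struct sketch_bucket *
sketch_bucket_for(uint64_t key)
{
    return &sketch_table[key & (SKETCH_HASH_SIZE - 1)];
}

/* Insert locks only the bucket the key hashes to. */
static void
sketch_insert(struct sketch_node *node)
{
    struct sketch_bucket *b;

    assert(node != NULL);
    b = sketch_bucket_for(node->key);
    while (atomic_flag_test_and_set_explicit(&b->lock, memory_order_acquire))
        ;   /* spin: contention is limited to this one bucket */
    node->next = b->head;
    b->head = node;
    atomic_flag_clear_explicit(&b->lock, memory_order_release);
}

/* Lookup also touches a single bucket lock. */
static struct sketch_node *
sketch_lookup(uint64_t key)
{
    struct sketch_bucket *b = sketch_bucket_for(key);
    struct sketch_node *n;

    while (atomic_flag_test_and_set_explicit(&b->lock, memory_order_acquire))
        ;
    for (n = b->head; n != NULL; n = n->next) {
        if (n->key == key)
            break;
    }
    atomic_flag_clear_explicit(&b->lock, memory_order_release);
    return n;   /* real code would take a reference before unlocking */
}
```

Two threads working on keys that hash to different buckets never touch the same lock, whereas a single lock around a tree serializes every lookup and insert.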
sys/vfs/hammer2: Fix memory leak for second PFS mount and after

When mounting, hmp and its ->devvpl are initialized only if it's the
first PFS mount of that hmp. For the second and later PFS mounts, the
malloc'd memory in a devvpl list (64 bytes per device per PFS) was
leaked.

Found on FreeBSD on kldunload.
--
Warning: memory type hammer2_mount leaked memory on destroy
(1 allocations, 64 bytes leaked).
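The ownership pattern behind this kind of leak can be sketched in user space as follows. This is not the actual fix: `sketch_hmp`, `sketch_devlist`, `sketch_mount()`, `sketch_destroy()`, and the `live_allocs` counter are hypothetical, and the sketch assumes the list is allocated on every mount but only kept on the first one.

```c
#include <assert.h>
#include <stdlib.h>

static int live_allocs;   /* outstanding allocations, for this sketch only */

struct sketch_devlist { char pad[64]; };   /* hypothetical 64-byte entry */

struct sketch_hmp {
    struct sketch_devlist *devl;   /* owned by hmp after the first mount */
    int mount_count;
};

/* Each mount builds a device list, but only the first mount keeps it. */
static int
sketch_mount(struct sketch_hmp *hmp)
{
    struct sketch_devlist *devl;

    assert(hmp != NULL);
    devl = malloc(sizeof(*devl));
    if (devl == NULL)
        return -1;
    live_allocs++;

    if (hmp->mount_count == 0) {
        hmp->devl = devl;      /* first PFS mount: hmp takes ownership */
    } else {
        free(devl);            /* later mounts: not kept, so release it */
        live_allocs--;
    }
    hmp->mount_count++;
    return 0;
}

/* Tear down the hmp and release the list it owns, if any. */
static void
sketch_destroy(struct sketch_hmp *hmp)
{
    assert(hmp != NULL);
    if (hmp->devl != NULL) {
        free(hmp->devl);
        hmp->devl = NULL;
        live_allocs--;
    }
    hmp->mount_count = 0;
}
```

Dropping the `else` branch reproduces the reported symptom in miniature: every mount after the first leaves one allocation that nothing owns.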
sys/vfs/hammer2: Fix -Wpointer-sign warnings

The warning shows up when building in Linux user space.

The name pointer points to the namecache name, which is a char *.
vnops passes the name pointer to hammer2_dirent_create(), which takes
a char *.

warning: pointer targets in passing argument 2 of
'hammer2_dirent_create' differ in signedness [-Wpointer-sign]
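For readers unfamiliar with this warning, the following sketch shows the general shape of the problem and one way to avoid it. `name_len_sketch()` and `caller_sketch()` are hypothetical and do not reflect the real hammer2_dirent_create() signature beyond the fact that it takes a char pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callee that, like the real one, takes a plain char pointer. */
static size_t
name_len_sketch(const char *name)
{
    assert(name != NULL);
    return strlen(name);
}

static size_t
caller_sketch(const char *nc_name)
{
    /*
     * Holding the name in a pointer of different signedness, e.g.
     *
     *     const unsigned char *name = (const unsigned char *)nc_name;
     *     name_len_sketch(name);
     *
     * makes GCC report -Wpointer-sign at the call. Keeping the local
     * as plain char, matching both the source and the callee, avoids
     * the warning without a cast.
     */
    const char *name = nc_name;

    return name_len_sketch(name);
}
```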
sys/vfs/hammer2: Fix many comments

* "lru_spin; /* inumber lookup */"
  -> lru_spin isn't for inodes.

* "If an error occurred we eat the lock"
  -> "eat the lock" was removed in
     c603b86b77206805493fc181d3576ecd1786e056 in 2015.
     The rest of the comment (not removed) seems obsolete too.

* "removed from the parent's btree"
  -> Typo for rbtree.

* "pointing it to an embedded data structure and copying the data from
  the buffer"
  -> No longer implemented like this since
     01eabad4d93a8dc8f0f01a6209b384b1e010bb8c in 2012.

* "Called to clean up cached DIOs on umount"
  -> This isn't specific to unmount; it is called regularly if
     iofree_count > dio_limit.

* "voldata is not yet loaded"
  -> It's already loaded.

* "so do not pass cluster", etc
  -> "cluster" no longer appears here since
     b7add6753e221920947c96fab3314c39a2f67fe4 in 2015.

* "multiple hammer2_inode structures can be aliased to the same chain
  element, for example for hardlinks"
  -> This comment from 2013 seems to apply only to the hardlink
     mechanism back then, which was something completely different.
sys/vfs/hammer2: Make sure PFS exists after chain lookup on mount

If pmp is NULL at this point, the code panics, so return EINVAL.

What makes the existing code complicated is that pmp could have
already been found via @label matching about 300 lines above,
regardless of the result of the chain lookup right before this diff.
So return EINVAL on `pmp == NULL` rather than on `chain->error != 0`.

In any case, something is wrong if the chain lookup failed even though
pmp was found.
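The decision described above, keyed on the pointer rather than on the lookup error, can be sketched like this. `sketch_pfs` and `mount_check_sketch()` are hypothetical and omit everything else the mount path does.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct sketch_pfs { int id; };   /* hypothetical stand-in for the PFS */

/*
 * Returns 0 when a PFS was found and EINVAL otherwise. The chain
 * lookup error is deliberately not the test here, because the PFS may
 * already have been found by an earlier label match.
 */
static int
mount_check_sketch(const struct sketch_pfs *pmp, int chain_error)
{
    (void)chain_error;   /* intentionally not used for this decision */
    if (pmp == NULL)
        return EINVAL;   /* would otherwise be dereferenced later */
    return 0;
}
```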
sys/vfs/hammer2: Rename hammer2_chain_core_init() -> hammer2_chain_init()

Currently this function is just a common function to initialize
chains. What it initializes is no longer limited to chain->core, unlike
when it first appeared in 0dea3156dc9c037aae4fd9fb00c631a401f62e5a in
2013.

Remove "core" for clarity and remove an irrelevant comment.