kernel: Move GPL'd kernel files to sys/gnu to have them all in one place. This affects files in sys/dev/sound/pci/gnu and sys/vfs/gnu/ext2fs. sys/gnu is analogous to the gnu directory for userland, i.e. below it, we follow the same hierarchy as in /usr/src/sys. This commit changes the location of the ext2fs headers, which are public, so I've bumped __DragonFly_version in order to have something to patch against in pkgsrc, in case this causes build breakage for any packages.
kevent: Restore old EV_EOF semantics - EV_EOF should be set when the other side closed the connection, even if there are data pending in the read buffer (the old semantics). - EV_NODATA is added to indicate there are no more data pending in the buffer and EOF is detected (EV_EOF is also set in this situation). Kernel code now tests EV_NODATA instead of EV_EOF, since EV_NODATA delivers the information which was delivered by the EV_EOF before this commit. DragonFly-Bug: http://bugs.dragonflybsd.org/issue1998
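The relationship between the two flags can be sketched as a small user-space model. The flag values and the helper name `model_read_flags` are hypothetical, chosen only for illustration; they are not the kernel's definitions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative values only; not the kernel's actual definitions. */
#define MODEL_EV_EOF    0x1u
#define MODEL_EV_NODATA 0x2u

/*
 * Hypothetical helper: compute the flags a read filter would report,
 * given whether the peer closed and how many bytes are still buffered.
 */
unsigned
model_read_flags(bool peer_closed, size_t pending)
{
    unsigned flags = 0;

    if (peer_closed) {
        flags |= MODEL_EV_EOF;          /* set even if data remains */
        if (pending == 0)
            flags |= MODEL_EV_NODATA;   /* EOF seen and buffer drained */
    }
    return flags;
}
```

Under this model, a caller that wants the pre-commit meaning of EV_EOF ("nothing left to read") tests the NODATA bit instead.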
kernel - Do a better job with the filesystem background sync * Adjust code for MNT_LAZY, MNT_NOWAIT, and MNT_WAITOK to reflect the fact that they are three different flags and not enumeration constants. * HAMMER now sets VMSC_ONEPASS for MNT_LAZY syncs (background filesystem sync). This generally reduces instances where the background sync winds up running continuously when heavy filesystem ops saturate the disk. Fewer vnodes dirtied after the sync is initiated will get caught up in the sync.
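The flag-versus-enumeration distinction in the first bullet can be illustrated with a minimal sketch. The bit values and function names below are assumptions made for the example; the real MNT_* definitions may differ.

```c
#include <stdbool.h>

/* Illustrative bit values; the real MNT_* definitions may differ. */
#define MODEL_MNT_WAITOK 0x1
#define MODEL_MNT_NOWAIT 0x2
#define MODEL_MNT_LAZY   0x4

/* Fragile: treats the argument as an enumeration constant. */
bool
is_lazy_enum_style(int waitfor)
{
    return waitfor == MODEL_MNT_LAZY;
}

/* Robust: tests the bit, so combined flags are still recognized. */
bool
is_lazy_flag_style(int waitfor)
{
    return (waitfor & MODEL_MNT_LAZY) != 0;
}
```

With combined flags such as `MODEL_MNT_LAZY | MODEL_MNT_NOWAIT`, only the bit test still reports a lazy sync.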
kernel - Fix read event on file for select/poll API * select/poll have always returned an immediate read event on regular files, but kqueue is expected to only return an EVFILT_READ event when not sitting at the file EOF. * The kernel adds a NOTE_OLDAPI flag which filter functions can use to discern between select/poll and kqueue related knotes. * Adjust filesystem filter function to always return an immediate event for reads via select/poll. * Fixes guile, which for some reason beyond our ken select()'s for a read event on a file. Reported-by: Johannes Hofmann <johannes.hofmann@gmx.de>
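A minimal user-space sketch of the decision a filesystem read filter makes after this change. The struct, the flag value, and the function name are hypothetical stand-ins, not the kernel's knote layout.

```c
#include <stdbool.h>

#define MODEL_NOTE_OLDAPI 0x1u   /* illustrative value */

/* Hypothetical, simplified stand-in for a knote on a regular file. */
struct model_knote {
    unsigned  flags;     /* MODEL_NOTE_OLDAPI if created via select/poll */
    long long offset;    /* current file offset */
    long long size;      /* file size */
};

/* Returns true if a read event should fire. */
bool
model_file_read_ready(const struct model_knote *kn)
{
    if (kn->flags & MODEL_NOTE_OLDAPI)
        return true;                  /* select/poll: always ready */
    return kn->offset != kn->size;    /* kqueue: only when not at EOF */
}
```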
kernel - Make filters able to be marked MPSAFE * Change struct filterops f_isfd field to f_flags, taking FILTEROP_ISFD and/or FILTEROP_MPSAFE. * Convert all existing filter definitions to use new flags. * Create filter_attach/detach/event wrapper functions for calling through the struct filterops vector that grab the MPLOCK as necessary. * kern_event() uses kq->kq_count to determine whether or not to sleep; kqueue_scan() removes events from the TAILQ and can possibly sleep, releasing the global kq token, before updating kq->kq_count.
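The wrapper idea can be sketched in user space as follows. Everything here is a hypothetical model: the `MODEL_` flag values, the simplified `struct model_filterops`, and the integer that stands in for holding the MP lock.

```c
/* Illustrative flag values; not the kernel's definitions. */
#define MODEL_FILTEROP_ISFD   0x1u
#define MODEL_FILTEROP_MPSAFE 0x2u

/* Hypothetical, simplified filter vector with a flags field. */
struct model_filterops {
    unsigned f_flags;
    int    (*f_event)(void *arg);
};

/* Stand-in for "the MP lock is currently held". */
static int model_mplock_held;

/* Sample filter: reports whether it ran under the modeled MP lock. */
static int
model_sample_event(void *arg)
{
    (void)arg;
    return model_mplock_held;
}

/* Wrapper: take the modeled MP lock only for non-MPSAFE filters. */
int
model_filter_event(const struct model_filterops *fops, void *arg)
{
    int ret;

    if (fops->f_flags & MODEL_FILTEROP_MPSAFE)
        return fops->f_event(arg);

    model_mplock_held = 1;
    ret = fops->f_event(arg);
    model_mplock_held = 0;
    return ret;
}

/* Convenience driver for the sample filter with the given flags. */
int
model_run_sample(unsigned f_flags)
{
    struct model_filterops fops = { f_flags, model_sample_event };

    return model_filter_event(&fops, 0);
}
```

Callers always go through the wrapper, so an individual filter can be switched to MPSAFE by changing only its flags.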
kernel - Fix kqfilter error return codes * Some kqfilters returned an Exxx error, others returned 1 on error, and the device kq code returned -1 on error. * All kqfilters now return a proper Exxx error. * When an EVFILT is not implemented, EOPNOTSUPP is now returned. EPERM is no longer returned.
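The resulting convention might look like the sketch below. The filter constants and the function name are hypothetical; only `EOPNOTSUPP` comes from `<errno.h>`.

```c
#include <errno.h>

/* Illustrative filter identifiers; not the kernel's EVFILT_* values. */
#define MODEL_EVFILT_READ  1
#define MODEL_EVFILT_WRITE 2
#define MODEL_EVFILT_OTHER 3

/* Hypothetical device kqfilter: 0 on success, an errno value on failure. */
int
model_dev_kqfilter(int filter)
{
    switch (filter) {
    case MODEL_EVFILT_READ:
    case MODEL_EVFILT_WRITE:
        return 0;
    default:
        return EOPNOTSUPP;   /* not 1, -1, or EPERM */
    }
}
```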
kernel - lwkt_token revamp * Simplify the token API. Hide the lwkt_tokref mechanics and simplify the lwkt_gettoken()/lwkt_reltoken() API to remove the need to declare and pass a lwkt_tokref along with the token. This makes tokens operate more like locks. There is a minor restriction that tokens must be unlocked in exactly the reverse order they were locked in, and another restriction limiting the maximum number of tokens a thread can hold to a defined value (32 for now). The tokrefs are now an array embedded in the thread structure. * Improve performance when blocking and unblocking threads with recursively held tokens. * Improve performance when acquiring the same token recursively. This operation is now O(1) and requires no locks or critical sections of any sort. This will allow us to acquire redundant tokens in deep call paths without having to worry about performance issues. * Add a flags field to the lwkt_token and lwkt_tokref structures and add a flagged feature which will acquire the MP lock along with a particular token. This will be used as a transitory mechanism in upcoming MPSAFE work. The mplock feature in the token structure can be directly connected to a mpsafe sysctl without being vulnerable to state-change races.
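The two restrictions (reverse-order release and a per-thread limit of 32) follow naturally from keeping the references in a fixed array used as a stack. The sketch below is a user-space model with hypothetical names; it returns false where a real kernel would more likely panic, and it omits all actual locking.

```c
#include <stdbool.h>

#define MODEL_MAXTOKENS 32   /* limit stated in the commit message */

/* Hypothetical placeholder for a token. */
struct model_token {
    int unused;
};

/* Hypothetical thread with its token references embedded as an array. */
struct model_thread {
    const struct model_token *refs[MODEL_MAXTOKENS];
    int                       nrefs;
};

/* Push a reference; fails when the per-thread limit is reached. */
bool
model_gettoken(struct model_thread *td, const struct model_token *tok)
{
    if (td->nrefs == MODEL_MAXTOKENS)
        return false;
    td->refs[td->nrefs++] = tok;
    return true;
}

/* Pop a reference; only the most recently acquired token may be released. */
bool
model_reltoken(struct model_thread *td, const struct model_token *tok)
{
    if (td->nrefs == 0 || td->refs[td->nrefs - 1] != tok)
        return false;
    td->nrefs--;
    return true;
}
```

In this model, acquiring the same token again simply pushes another reference, so recursive acquisition costs a constant amount of work.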
kernel - VM PAGER part 2/2 - Expand vinitvmio() and vnode_pager_alloc() * vinitvmio() is responsible for assigning the initial VM object size based on the file size. Adjust vinitvmio() to conform to the new nvextendbuf() and nvtruncbuf() API. * vinitvmio() has been given two additional parameters, blksize and boff, to allow it to determine how much larger the VM object must be relative to the byte-granular file size passed to it. * Remove vm_pager_alloc() and remove the pgo_alloc vector from struct pagerops. Convert all the VM pager allocation procedures into global procedures which are called directly. Trying to feed everything through a single function was a joke when all the callers knew precisely what kind of VM object they were creating anyway. Add the extra arguments to vnode_pager_alloc() which vinitvmio() needs to pass in.
kernel - ufs, ext2fs getpages/putpages cleanup * Completely remove the original ffs_getpages/ffs_putpages code and remove the vfs.ffs.getpages_uses_bufcache sysctl. UFS/FFS now unconditionally use vop_stdgetpages and vop_stdputpages. * ext2fs already unconditionally calls vnode_pager_generic_getpages(). Remove the shim and adjust ext2fs's .vop_getpages to point directly to vop_stdgetpages().
kernel - Finish implementing PG_RAM / pipelined mmap operation * Finish implementing the PG_RAM read-ahead mark code. This code allows the VM system to generate pipelining faults when reading a memory mapped file sequentially. This allows programs which scan files via mmap() to max out the I/O system, similar to read(). Before this change programs using mmap() could not get better than ~70-80% disk utilization for sequential I/O. This commit passes the sequential access flag through to the VOP_GETPAGES code which then adjusts the sequential access heuristic in the ioflags accordingly.
Kernel - fix access checks * VOP_ACCESS() is used for more than just access(). UFS and other filesystems (but not HAMMER) were calling it in the open/create/rename/unlink paths. The uid/gid must be used in those cases, not the ruid/rgid. Add a VOP_EACCESS() macro which passes the appropriate flag to use the uid/gid instead of the ruid/rgid, and adjust the filesystems to use this macro. Reported-by: Stathis Kamperis <ekamperi@gmail.com>
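The difference between the two entry points can be modeled with a deliberately simplified owner-only check. The credential struct, the flag name `MODEL_AT_EACCESS`, and both macros are hypothetical illustrations, not the kernel's definitions.

```c
#include <stdbool.h>

#define MODEL_AT_EACCESS 0x1   /* illustrative flag: use effective ids */

/* Hypothetical credential with effective and real user ids. */
struct model_cred {
    unsigned uid;    /* effective uid */
    unsigned ruid;   /* real uid */
};

/* Simplified owner-only check; the flag selects which uid is compared. */
bool
model_access(const struct model_cred *cr, unsigned owner, int flags)
{
    unsigned id = (flags & MODEL_AT_EACCESS) ? cr->uid : cr->ruid;

    return id == owner;
}

/* access()-style check: real uid. */
#define MODEL_VOP_ACCESS(cr, owner) \
    model_access((cr), (owner), 0)

/* open/create/rename/unlink-style check: effective uid. */
#define MODEL_VOP_EACCESS(cr, owner) \
    model_access((cr), (owner), MODEL_AT_EACCESS)
```

For a process with effective uid 0 and real uid 1000, the two macros give different answers for a root-owned file, which is the case the commit addresses.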
DEVFS - rollup - all kernel devices * Make changes needed to kernel devices to use devfs. * Also pre-generate some devices (usually 4) to support system utilities which do not yet deal with the auto-cloning device support. * Adjust the spec_vnops for various filesystems to vector to dummy code for read and write, for VBLK/VCHR nodes in old filesystems which are no longer supported. Submitted-by: Alex Hornung <ahornung@gmail.com>
HAMMER / VFS_VGET - Add optional dvp argument to VFS_VGET(). Fix readdirplus * VGET is used by NFS to acquire a vnode given an inode number. HAMMER requires additional information to determine the PFS the inode is being acquired from. Add an optional directory vnode argument to the VGET. If non-NULL, HAMMER will extract the PFS information from this vnode. * Adjust NFS to pass the dvp to VGET when doing a readdirplus. Note that the PFS is already encoded in file handles, but readdirplus acquires the attributes for each directory entry it scans (readdir does not). This fixes readdirplus for NFS served HAMMER PFS exports.
buffer cache - Control all access to the buf red-black trees with vp->v_token Access to the buffer cache's RB trees is now controlled via vp->v_token instead of a critical section. We still hold the BGL but this is not quite as simple as it seems because an interrupt calling biodone() on a B_INVAL buffer may now potentially block, whereas it would not have before. The buffer is locked.