kernel - TMPFS - Add infrastructure to main kernel to help support TMPFS

* Add buwrite(), similar to bdwrite() except it fakes the write, marks the pages as valid and dirty, and returns them to the VM system leaving the buffer cache buffer clean. This is used by tmpfs in tmpfs_write() and allows the entire VM page cache to be used to cache dirty tmpfs data instead of just the buffer cache. Also add vm_page_set_validdirty() to support buwrite(). (See the usage sketch after this list.)

* Implement MNTK_SG_MPSAFE for future use by tmpfs.

* Fix a bug in swap_strategy(). When the entire block being requested is sparse (has no swap assignments) the function was not properly biodone()'ing the original bio after zero-filling the space.
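A minimal sketch of the intended buwrite() usage in kernel C. fs_write_done() and the use_vm_cache flag are illustrative, not part of the commit:

    /* Kernel context; <sys/param.h> and <sys/buf.h> assumed. */
    static void
    fs_write_done(struct buf *bp, int use_vm_cache)
    {
        if (use_vm_cache) {
            /*
             * Fake the write: mark the underlying VM pages valid
             * and dirty and release the buffer clean, so the VM
             * page cache (not the buffer cache) holds the dirty
             * data.
             */
            buwrite(bp);
        } else {
            /* Classic delayed write via the buffer cache. */
            bdwrite(bp);
        }
    }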
kernel - SWAP CACHE part 19/many - distinguish bulk data in HAMMER block dev

* Add buf->flags/B_NOTMETA and vm_page->flags/PG_NOTMETA. If set, the pages underlying the buffer will not be considered meta-data from the point of view of the swapcache.

* HAMMER must sometimes access bulk data via the block device instead of via a file vnode, for example in the reblocking and mirroring code. We do not want this data to be misinterpreted as meta-data when the meta-data-only swapcache is turned on, otherwise it will blow out the actual meta-data in the swapcache. HAMMER_RECTYPE_DATA and HAMMER_RECTYPE_DB are considered normal data. All other record types (e.g. direntry, inode, etc) are meta-data. A sketch of the tagging follows.
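A sketch of how a buffer might be tagged; hammer_io_tag_bulk() is a hypothetical helper, while B_NOTMETA, HAMMER_RECTYPE_DATA, and HAMMER_RECTYPE_DB are the names referenced by this commit:

    /* Kernel context; <sys/buf.h> and the HAMMER headers assumed. */
    static void
    hammer_io_tag_bulk(struct buf *bp, uint16_t rec_type)
    {
        /*
         * Bulk data read via the block device must not be
         * classified as meta-data by the swapcache, or it will
         * evict the real meta-data.
         */
        if (rec_type == HAMMER_RECTYPE_DATA ||
            rec_type == HAMMER_RECTYPE_DB) {
            bp->b_flags |= B_NOTMETA;
        }
    }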
kernel - Improve VM fault performance for sequential access

* VM fault I/O pipelining was not working properly.

* Temporarily fix pipelining by introducing PG_RAM, a read-ahead mark for vm_page_t, and adjust vm_fault to pass VM pages through to getpages calls if PG_RAM is set, even if they are fully valid (a sketch follows below).

* Remove code in vnode_pager_generic_getpages() which short-circuited the operation when the requested page was fully valid. This prevented read-aheads from being issued.

* A more permanent solution is in the works (basically getting rid of the whole VM read-ahead/read-behind array entirely, just passing a single page through to vnode_pager_generic_getpages(), and letting the filesystem handle the read-ahead in a more efficient fashion).

Reported-by: "Mikhail T." <mi+thun@aldan.algebra.com>
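A sketch of the PG_RAM pass-through in the fault path; the surrounding variables and the exact pager call shown are illustrative, not lifted from the commit:

    /*
     * Even a fully valid page is pushed through the getpages path
     * when PG_RAM is set, so the read-ahead mark can trigger the
     * next chunk of I/O.  (m is the faulted vm_page_t.)
     */
    if ((m->flags & PG_RAM) || m->valid != VM_PAGE_BITS_ALL) {
        vm_page_flag_clear(m, PG_RAM);      /* consume the mark */
        rv = vm_pager_get_pages(object, &m, 1, 0);
    }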
Revert "rename amd64 architecture to x86_64" This reverts commit c1543a890188d397acca9fe7f76bcd982481a763. I'm reverting it because: 1) the change didn't get properly discussed 2) it was based on false premises: "The rest of the world seems to call amd64 x86_64." 3) no pkgsrc bulk build was done to test the change 4) the original committer acted irresponsibly by committing such a big change just before going on vacation.
Kernel - more NFS fixes, more dirty bit fixes, remove vfs_bio_set_validclean()

* Remove vfs_bio_set_validclean(). It is no longer needed.

* General getpages operations must clear dirty bits non-inclusive of the end of the range. A read which partially overlaps dirty VM pages shouldn't happen in the first place, but if it were to happen we don't want to lose the dirty status on the DEV_BSIZE'd chunk straddling the end of the read.

* General truncation support. Replace the previous fix with a call to a new inline, vm_page_clear_dirty_beg_nonincl(). Similar to the getpages() issue, we do not want to lose the dirty status on the DEV_BSIZE'd chunk straddling the beginning of a truncation. (Side note: this only affects NFS, as all other filesystems DEV_BSIZE-align their operations, but it is a good general fix in any case.) A sketch of the non-inclusive clearing follows.
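A sketch of the non-inclusive dirty-bit clearing. The inline below combines both described behaviors (straddling DEV_BSIZE'd chunks at either boundary keep their dirty bit); the helper name is illustrative:

    /*
     * Round the range inward to DEV_BSIZE boundaries before
     * clearing dirty bits, so partially covered chunks stay dirty.
     */
    static __inline void
    page_clear_dirty_nonincl(vm_page_t m, int base, int size)
    {
        int end = (base + size) & ~(DEV_BSIZE - 1);       /* round down */

        base = (base + DEV_BSIZE - 1) & ~(DEV_BSIZE - 1); /* round up */
        if (base < end)
            vm_page_clear_dirty(m, base, end - base);
    }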
Fix many bugs and issues in the VM system, particularly related to heavy paging.

* (cleanup) PG_WRITEABLE is now set by the low level pmap code and not by high level code. It means 'this page may contain a managed page table mapping which is writeable', i.e. hardware can dirty the page at any time. The page must be tested via appropriate pmap calls before being disposed of.

* (cleanup) PG_MAPPED is now handled by the low level pmap code and only applies to managed mappings. There is still a bit of cruft left over related to the pmap code's page table pages, but the high level code is now clean.

* (bug) Various XIO, SFBUF, and MSFBUF routines which bypass normal paging operations were not properly dirtying pages when the caller intended to write to them.

* (bug) vfs_busy_pages() in kern/vfs_bio.c had a busy race. Separate the code out to ensure that we have marked all the pages as undergoing I/O before we call vm_page_protect(). vm_page_protect(... VM_PROT_NONE) can block under very heavy paging conditions, and if the pages haven't been marked for I/O that could blow up the code. (See the ordering sketch after this list.)

* (optimization) When busying pages for write I/O, downgrade the page table mappings to read-only instead of removing them entirely.

* (bug) In platform/pc32/i386/pmap.c fix various places where pmap_inval_add() was being called at the wrong point. Only one was critical: in pmap_enter(), pmap_inval_add() was being called so far away from the pmap entry being modified that it could wind up being flushed out prior to the modification, breaking the cpusync required. pmap.c also contains most of the work involved in the PG_MAPPED and PG_WRITEABLE changes.

* (bug) Close numerous pte updating races with hardware setting the modified bit. There is still one race left (in pmap_enter()).

* (bug) Disable pmap_copy() entirely. Most of its bugs are fixed anyway, but one remains in the handling of the srcmpte variable.

* (cleanup) Change vm_page_dirty() from an inline to a real procedure, and move the code which sets the object to writeable/maybedirty into vm_page_dirty().

* (bug) Calls to vm_page_protect(... VM_PROT_NONE) can block. Fix all cases where this call was made with a non-busied page. All such calls are now made with a busied page, preventing blocking races from re-dirtying or remapping the page unexpectedly. (Such blockages could only occur during heavy paging activity where the underlying page table pages are being actively recycled.)

* (bug) Fix the pageout code to properly mark pages as undergoing I/O before changing their protection bits.

* (bug) Busy pages undergoing zeroing or partial zeroing in the vnode pager (vm/vnode_pager.c) to avoid unexpected effects.
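A sketch of the corrected ordering in vfs_busy_pages(); the wrapper name and the writing flag are illustrative and the loop bodies are simplified:

    static void
    busy_pages_ordered(struct buf *bp, int writing)
    {
        int i;

        /* Pass 1: flag every page as undergoing I/O first. */
        for (i = 0; i < bp->b_xio.xio_npages; ++i)
            vm_page_io_start(bp->b_xio.xio_pages[i]);

        /*
         * Pass 2: only now touch the mappings; vm_page_protect()
         * can block under heavy paging.  For write I/O, downgrade
         * to read-only instead of removing the mapping entirely.
         */
        for (i = 0; i < bp->b_xio.xio_npages; ++i) {
            vm_page_protect(bp->b_xio.xio_pages[i],
                            writing ? VM_PROT_READ : VM_PROT_NONE);
        }
    }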
Fix a bug in umtx_sleep(). This function sleeps on the mutex's physical address and will get lost if the physical page underlying the VM address is copied on write. This case can occur when a threaded program fork()'s.

Introduce a VM page event notification mechanism and use it to wake up the umtx_sleep() if the underlying page takes a COW fault. A sketch of the idea follows.

Reported-by: Jordan Gordeev <jgordeev@dir.bg>, "Simon 'corecode' Schubert" <corecode@xxxxxxxxxxxx>
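An illustrative sketch only, since the commit message does not spell out the notification API; the handler name and registration shape are hypothetical:

    /*
     * Hypothetical COW-event handler: when the page backing the
     * umtx is copied on write, wake any umtx_sleep() waiters keyed
     * on the old physical address so they can re-resolve the
     * mutex's physical page instead of sleeping on a stale one.
     */
    static void
    umtx_cow_event(vm_page_t m, void *waitchan)
    {
        wakeup(waitchan);
    }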
Replace the global VM page hash table with a per-VM-object RB tree. No performance degradation was observed (probably due to locality of reference in the RB tree improving cache characteristics for searches). This also significantly reduces the kernel memory footprint (no global VM page hash table) and reduces the size of the vm_page structure. Future MP work should benefit from this change.

Prior work in the VM tree guaranteed that VM pages only existed in the hash table while also associated with a VM object; this commit uses that guarantee to make the VM page lookup structures VM-object-centric.
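A sketch of the object-centric lookup, assuming the tree is keyed on the page's pindex within its object. The RB_* macros are the standard ones from <sys/tree.h>; the tree, field, and function names here are illustrative:

    /* Tree head type; RB_PROTOTYPE/RB_GENERATE wiring omitted. */
    RB_HEAD(vm_page_rb_tree, vm_page);

    static int
    rb_vm_page_compare(vm_page_t a, vm_page_t b)
    {
        if (a->pindex < b->pindex)
            return (-1);
        if (a->pindex > b->pindex)
            return (1);
        return (0);
    }

    /* Per-object lookup: no global hash, just the object's tree. */
    static vm_page_t
    page_lookup(vm_object_t object, vm_pindex_t pindex)
    {
        struct vm_page key;

        key.pindex = pindex;
        return (RB_FIND(vm_page_rb_tree, &object->rb_memq, &key));
    }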
I'm growing tired of having to add #include lines for header files that the include file(s) I really want depend on. Go through nearly all major system include files and add appropriately #ifndef'd #include lines to include all dependent header files. Kernel source files now only need to #include the header files they directly depend on.

So, for example, if I wanted to add a SYSCTL to a kernel source file, I would only have to #include <sys/sysctl.h> to bring in the support for it, rather than four or five header files in addition to <sys/sysctl.h>.
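The pattern in question is the standard guarded include; for instance, a header that depends on <sys/types.h> pulls it in itself:

    #ifndef _SYS_TYPES_H_
    #include <sys/types.h>
    #endif

With that in place, a consumer adding a SYSCTL needs only:

    #include <sys/sysctl.h>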