From e9b560582904689977f992db6d5a91304b89ec4a Mon Sep 17 00:00:00 2001
From: Matthew Dillon
Date: Tue, 16 Feb 2010 23:52:14 -0800
Subject: [PATCH] kernel - SWAP CACHE part 20/many - add 'cache' and 'noscache' chflags.

* Allow directory hierarchies to be selected for data caching when
  using vm.swapcache.data_enable.

* Add the vm.swapcache.use_chflags sysctl which defaults to ON and
  enables use of the new chflags flags to determine what directory
  trees the swapcache will cache data from.

* Add chflags cache and noscache.  The flags are tracked recursively
  by the namecache and do *NOT* have to be set recursively in the
  directory tree.  Setting a flag in a top-level directory is
  sufficient to cover the entire subtree.

  chflags cache    - Any regular file in the subtree will be cached
                     by swapcache.

  chflags noscache - Disables any swapcaching of data in the subtree,
                     overrides any use of chflags cache in the subtree.

  NOTE: Only applies to file data.  The caching of file meta-data by
  swapcache is controlled globally by vm.swapcache.meta_enable and
  ignores chflags flags.

* Adjust the manual pages for swapcache and chflags.

* NOTE!  The default has been changed to require the use of chflags.
  Data caching will not occur unless you either turn off the
  vm.swapcache.use_chflags sysctl (which enables data caching
  globally) or do something like 'chflags cache /'.  Of course
  vm.swapcache.read_enable must also be turned on for swapcache to
  cache file data.

* NOTE!  World must be rebuilt for libc, chflags, and ls to understand
  the new flags.
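The data-caching policy described above can be sketched in userland C. This is an illustration only, not the kernel code: `struct swapcache_policy` and `should_cache_data()` are hypothetical names, and the `vnode_marked` argument stands in for the VSWAPCACHE vnode flag that this patch sets at open time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the two sysctls involved in the decision. */
struct swapcache_policy {
	bool data_enable;	/* models vm.swapcache.data_enable */
	bool use_chflags;	/* models vm.swapcache.use_chflags */
};

/*
 * Decide whether a regular file's data is eligible for swapcache.
 * vnode_marked models the VSWAPCACHE bit on the vnode (assumption:
 * it was derived from the chflags state of the file and its parents).
 */
static bool
should_cache_data(const struct swapcache_policy *p, bool vnode_marked)
{
	assert(p != NULL);

	/* With data caching off, nothing is eligible. */
	if (!p->data_enable)
		return false;

	/* With use_chflags on, only marked vnodes are eligible. */
	if (p->use_chflags && !vnode_marked)
		return false;

	/* Otherwise any regular file is eligible. */
	return true;
}
```

With the new default (use_chflags on), an unmarked file is skipped even when data_enable is on; turning use_chflags off makes every regular file eligible again.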
--- lib/libc/gen/strtofflags.c | 7 +++++- share/man/man8/swapcache.8 | 46 ++++++++++++++++++++++++++++++++++++-- sys/kern/vfs_nlookup.c | 42 +++++++++++++++++++++++++++------- sys/kern/vfs_vnops.c | 16 +++++++++++++ sys/sys/namecache.h | 8 +++---- sys/sys/nlookup.h | 1 - sys/sys/stat.h | 2 ++ sys/sys/vnode.h | 3 +-- sys/vm/vm_swapcache.c | 23 ++++++++++++++++--- test/test/baaz | 2 ++ usr.bin/chflags/chflags.1 | 39 +++++++++++++++++++++++++++++++- 11 files changed, 167 insertions(+), 22 deletions(-) diff --git a/lib/libc/gen/strtofflags.c b/lib/libc/gen/strtofflags.c index 2ab28c43d2..f505563fa7 100644 --- a/lib/libc/gen/strtofflags.c +++ b/lib/libc/gen/strtofflags.c @@ -73,7 +73,12 @@ static struct { { "nohistory", UF_NOHISTORY, 1 }, #endif { "nouunlnk", UF_NOUNLINK, 1 }, - { "nouunlink", UF_NOUNLINK, 1 } + { "nouunlink", UF_NOUNLINK, 1 }, +#ifdef UF_CACHE + { "nocache", UF_CACHE, 0 }, + { "noucache", UF_CACHE, 0 }, + { "noscache", SF_NOCACHE, 1 }, +#endif }; #define longestflaglen 12 #define nmappings (sizeof(mapping) / sizeof(mapping[0])) diff --git a/share/man/man8/swapcache.8 b/share/man/man8/swapcache.8 index efce81bb14..1714dbef73 100644 --- a/share/man/man8/swapcache.8 +++ b/share/man/man8/swapcache.8 @@ -26,6 +26,7 @@ data and meta-data. .Cd sysctl vm.swapcache.read_enable=0 .Cd sysctl vm.swapcache.meta_enable=0 .Cd sysctl vm.swapcache.data_enable=0 +.Cd sysctl vm.swapcache.use_chflags=1 .Cd sysctl vm.swapcache.maxlaunder=256 .Sh DESCRIPTION .Nm @@ -47,7 +48,8 @@ recovered sufficiently for write activity to resume. .Cd vm.swapcache.meta_enable enables the writing of filesystem meta-data to the swapcache. Filesystem metadata is any data which the filesystem accesses via the disk device -using buffercache. +using buffercache. Meta-data is cached globally regardless of file +or directory flags. .Pp .Cd vm.swapcache.data_enable enables the writing of filesystem file-data to the swapcache. 
Filesystem @@ -56,6 +58,17 @@ In technical terms, when the buffer cache is used to access a regular file through its vnode. Please do not blindly turn on this option, see the PERFORMANCE TUNING section for more information. .Pp +.Cd vm.swapcache.use_chflags +enables the use of the +.Cm cache +and +.Cm noscache +.Xr chflags 1 +flags to control which files will be data-cached. +If this sysctl is disabled and data_enable is enabled, +the system will ignore file flags and attempt to swapcache all +regular files. +.Pp .Cd vm.swapcache.read_enable enables reading from the swapcache and should be set to 1 for normal operation. @@ -130,6 +143,19 @@ increase to cover the entire directory topology being served. Each vnode requires about 1K of physical ram. .Pp +When data caching is turned on you generally want to use +.Xr chflags 1 +with the +.Cm cache +flag to enable data caching on a directory. +This flag is tracked by the namecache and does not need to be +recursively set in the directory tree. +Simply setting the flag in a top level directory is sufficient. +A typical setup is something like this: +.Pp +.Dl chflags cache /etc /sbin /bin /usr /home +.Dl chflags noscache /usr/obj +.Pp .It Cd vm.swapcache.maxfilesize This may be used to reduce cache thrashing when a focus on a small potentially fragmented filespace is desired, leaving the @@ -150,6 +176,10 @@ larger bursts. The larger bursts also tend to improve SSD performance as the SSD itself can do a better job write-combining and erasing blocks. .Pp +.It Cd vm.swapcache.maxswappct +This controls the maximum amount of swapspace +.Nm +may use, in percentage terms. .El .Pp Finally, interleaved swap (multiple SSDs) may be used to increase @@ -215,7 +245,8 @@ swapcache will become fragmented within a single regular file and the constant back-and-forth between the swapcache and the hard drive will result in excessive seeking on the hard drive. 
.Sh SWAPCACHE SIZE & MANAGEMENT -The swapcache feature will use up to 75% of configured swap space. +The swapcache feature will use up to 75% of configured swap space +by default. The remaining 25% is reserved for normal paging operation. The system operator should configure at least 4 times the SWAP space versus main memory and no less than 8G of swap space. @@ -223,6 +254,17 @@ If a 40G SSD is used the recommendation is to configure 16G to 32G of swap (note: 32-bit is limited to 32G of swap by default, for 64-bit it is 512G of swap). .Pp +The +.Cd vm.swapcache.maxswappct +sysctl may be used to change the default. +You may have to change this default if you also use +.Xr tmpfs 5 , +.Xr vn 4 , +or if you have not allocated enough swap for reasonable normal paging +activity to occur (in which case you probably shouldn't be using +.Nm +anyway). +.Pp If swapcache reaches the 75% limit it will begin tearing down swap in linear bursts by iterating through available VM objects, until swap space use drops to 70%. The tear-down is limited by the rate at diff --git a/sys/kern/vfs_nlookup.c b/sys/kern/vfs_nlookup.c index 385b5e1488..59e24d3101 100644 --- a/sys/kern/vfs_nlookup.c +++ b/sys/kern/vfs_nlookup.c @@ -71,6 +71,9 @@ #include #endif +static int naccess(struct nchandle *nch, int vmode, struct ucred *cred, + int *stickyp, int *cflagsp); + /* * Initialize a nlookup() structure, early error return for copyin faults * or a degenerate empty string (which is not allowed). @@ -372,6 +375,7 @@ nlookup(struct nlookupdata *nd) int error; int len; int dflags; + int cflags; #ifdef KTRACE if (KTRPOINT(nd->nl_td, KTR_NAMEI)) @@ -395,6 +399,7 @@ nlookup(struct nlookupdata *nd) * Loop on the path components. At the top of the loop nd->nl_nch * is ref'd and unlocked and represents our current position. 
*/ + cflags = nd->nl_nch.ncp->nc_flag & (NCF_SF_PNOCACHE | NCF_UF_PCACHE); for (;;) { /* * Make sure nl_nch is locked so we can access the vnode, resolution @@ -439,7 +444,8 @@ nlookup(struct nlookupdata *nd) * Check directory search permissions. */ dflags = 0; - if ((error = naccess(&nd->nl_nch, NLC_EXEC, nd->nl_cred, &dflags)) != 0) + error = naccess(&nd->nl_nch, NLC_EXEC, nd->nl_cred, &dflags, &cflags); + if (error) break; /* @@ -533,7 +539,8 @@ nlookup(struct nlookupdata *nd) par.mount = nch.mount; cache_hold(&par); cache_lock(&par); - error = naccess(&par, 0, nd->nl_cred, &dflags); + cflags = par.ncp->nc_flag & (NCF_SF_PNOCACHE | NCF_UF_PCACHE); + error = naccess(&par, 0, nd->nl_cred, &dflags, &cflags); cache_put(&par); } } @@ -586,7 +593,7 @@ nlookup(struct nlookupdata *nd) error = EROFS; } else { error = naccess(&nch, nd->nl_flags | dflags, - nd->nl_cred, NULL); + nd->nl_cred, NULL, &cflags); } } if (error == 0 && wasdotordotdot && @@ -737,7 +744,7 @@ nlookup(struct nlookupdata *nd) */ if (nch.ncp->nc_vp && (nd->nl_flags & NLC_ALLCHKS)) { error = naccess(&nch, nd->nl_flags | dflags, - nd->nl_cred, NULL); + nd->nl_cred, NULL, &cflags); if (error) { cache_put(&nch); break; @@ -880,12 +887,13 @@ fail: * The passed ncp must be referenced and locked. 
*/ int -naccess(struct nchandle *nch, int nflags, struct ucred *cred, int *nflagsp) +naccess(struct nchandle *nch, int nflags, struct ucred *cred, + int *nflagsp, int *cflagsp) { struct vnode *vp; struct vattr va; int error; - int sticky; + int cflags; ASSERT_NCH_LOCKED(nch); if (nch->ncp->nc_flag & NCF_UNRESOLVED) { @@ -912,10 +920,10 @@ naccess(struct nchandle *nch, int nflags, struct ucred *cred, int *nflagsp) error = EINVAL; } else if (error == 0 || error == ENOENT) { par.mount = nch->mount; - sticky = 0; cache_hold(&par); cache_lock(&par); - error = naccess(&par, NLC_WRITE, cred, NULL); + cflags = par.ncp->nc_flag & (NCF_SF_PNOCACHE | NCF_UF_PCACHE); + error = naccess(&par, NLC_WRITE, cred, NULL, &cflags); cache_put(&par); } } @@ -1005,6 +1013,24 @@ naccess(struct nchandle *nch, int nflags, struct ucred *cred, int *nflagsp) *nflagsp |= NLC_IMMUTABLE; } + /* + * Track swapcache management flags in the namecache. + * (*cflagsp) tracks and returns the cumulative parent state + * while nc_flag gets the old parent state and the new + * flags state from the vap. + */ + cflags = *cflagsp; + nch->ncp->nc_flag &= ~(NCF_SF_PNOCACHE | NCF_UF_PCACHE); + nch->ncp->nc_flag |= cflags; + + if (va.va_flags & SF_NOCACHE) + cflags |= NCF_SF_PNOCACHE | NCF_SF_NOCACHE; + if (va.va_flags & UF_CACHE) + cflags |= NCF_UF_PCACHE | NCF_UF_CACHE; + *cflagsp = cflags & (NCF_SF_PNOCACHE | NCF_UF_PCACHE); + nch->ncp->nc_flag &= ~(NCF_SF_NOCACHE | NCF_UF_CACHE); + nch->ncp->nc_flag |= cflags & (NCF_SF_NOCACHE | NCF_UF_CACHE); + /* * Process general access. 
*/ diff --git a/sys/kern/vfs_vnops.c b/sys/kern/vfs_vnops.c index abe40630a4..6263e70574 100644 --- a/sys/kern/vfs_vnops.c +++ b/sys/kern/vfs_vnops.c @@ -105,6 +105,7 @@ vn_open(struct nlookupdata *nd, struct file *fp, int fmode, int cmode) struct vattr vat; struct vattr *vap = &vat; int error; + u_int flags; /* * Certain combinations are illegal @@ -241,6 +242,21 @@ again: goto bad; } + /* + * Set or clear VSWAPCACHE on the vp based on nd->nl_nch.ncp->nc_flag. + * These particular bits are tracked all the way from the root. + * + * NOTE: Might not work properly on NFS servers due to the + * disconnected namecache. + */ + flags = nd->nl_nch.ncp->nc_flag; + if ((flags & (NCF_UF_CACHE | NCF_UF_PCACHE)) && + (flags & (NCF_SF_NOCACHE | NCF_SF_PNOCACHE)) == 0) { + vsetflags(vp, VSWAPCACHE); + } else { + vclrflags(vp, VSWAPCACHE); + } + /* * Setup the fp so VOP_OPEN can override it. No descriptor has been * associated with the fp yet so we own it clean. diff --git a/sys/sys/namecache.h b/sys/sys/namecache.h index 3efbddd14a..c50e92401f 100644 --- a/sys/sys/namecache.h +++ b/sys/sys/namecache.h @@ -154,10 +154,10 @@ struct nchandle { #define NCF_WHITEOUT 0x0002 /* negative entry corresponds to whiteout */ #define NCF_UNRESOLVED 0x0004 /* invalid or unresolved entry */ #define NCF_ISMOUNTPT 0x0008 /* someone may have mounted on us here */ -#define NCF_UNUSED10 0x0010 -#define NCF_UNUSED20 0x0020 -#define NCF_UNUSED40 0x0040 -#define NCF_UNUSED80 0x0080 +#define NCF_SF_NOCACHE 0x0010 /* track swapcache chflags from attr */ +#define NCF_UF_CACHE 0x0020 +#define NCF_SF_PNOCACHE 0x0040 /* track from parent */ +#define NCF_UF_PCACHE 0x0080 #define NCF_ISSYMLINK 0x0100 /* represents a symlink */ #define NCF_ISDIR 0x0200 /* represents a directory */ #define NCF_DESTROYED 0x0400 /* name association is considered destroyed */ diff --git a/sys/sys/nlookup.h b/sys/sys/nlookup.h index ac0934396b..29eb7d3176 100644 --- a/sys/sys/nlookup.h +++ b/sys/sys/nlookup.h @@ -160,7 +160,6 @@ int 
nlookup_mp(struct mount *mp, struct nchandle *nch); int nlookup(struct nlookupdata *); int nreadsymlink(struct nlookupdata *nd, struct nchandle *nch, struct nlcomponent *nlc); -int naccess(struct nchandle *nch, int vmode, struct ucred *cred, int *stickyp); int naccess_va(struct vattr *va, int nflags, struct ucred *cred); #endif diff --git a/sys/sys/stat.h b/sys/sys/stat.h index 607e57492e..693080e874 100644 --- a/sys/sys/stat.h +++ b/sys/sys/stat.h @@ -183,6 +183,7 @@ struct stat { #define UF_NOUNLINK 0x00000010 /* file may not be removed or renamed */ #define UF_FBSDRSVD20 0x00000020 /* (unused) */ #define UF_NOHISTORY 0x00000040 /* do not retain history/snapshots */ +#define UF_CACHE 0x00000080 /* enable data swapcache */ /* * Super-user changeable flags. */ @@ -193,6 +194,7 @@ struct stat { #define SF_NOUNLINK 0x00100000 /* file may not be removed or renamed */ #define SF_FBSDRSVD20 0x00200000 /* (used by FreeBSD for snapshots) */ #define SF_NOHISTORY 0x00400000 /* do not retain history/snapshots */ +#define SF_NOCACHE 0x00800000 /* disable data swapcache */ #ifdef _KERNEL /* diff --git a/sys/sys/vnode.h b/sys/sys/vnode.h index 503d526fa1..93d42c67c0 100644 --- a/sys/sys/vnode.h +++ b/sys/sys/vnode.h @@ -310,8 +310,7 @@ struct vnode { #define VONWORKLST 0x00200000 /* On syncer work-list */ #define VMOUNT 0x00400000 /* Mount in progress */ #define VOBJDIRTY 0x00800000 /* object might be dirty */ - -/* open for business 0x01000000 */ +#define VSWAPCACHE 0x01000000 /* enable swapcache */ /* open for business 0x02000000 */ /* open for business 0x04000000 */ diff --git a/sys/vm/vm_swapcache.c b/sys/vm/vm_swapcache.c index ee316c7ee8..3cf5d45101 100644 --- a/sys/vm/vm_swapcache.c +++ b/sys/vm/vm_swapcache.c @@ -99,6 +99,8 @@ static int vm_swapcache_sleep; static int vm_swapcache_maxlaunder = 256; static int vm_swapcache_data_enable = 0; static int vm_swapcache_meta_enable = 0; +static int vm_swapcache_maxswappct = 75; +static int vm_swapcache_use_chflags = 1; /* 
require chflags cache */ static int64_t vm_swapcache_minburst = 10000000LL; /* 10MB */ static int64_t vm_swapcache_curburst = 4000000000LL; /* 4G after boot */ static int64_t vm_swapcache_maxburst = 2000000000LL; /* 2G nominal max */ @@ -115,6 +117,10 @@ SYSCTL_INT(_vm_swapcache, OID_AUTO, meta_enable, CTLFLAG_RW, &vm_swapcache_meta_enable, 0, ""); SYSCTL_INT(_vm_swapcache, OID_AUTO, read_enable, CTLFLAG_RW, &vm_swapcache_read_enable, 0, ""); +SYSCTL_INT(_vm_swapcache, OID_AUTO, maxswappct, + CTLFLAG_RW, &vm_swapcache_maxswappct, 0, ""); +SYSCTL_INT(_vm_swapcache, OID_AUTO, use_chflags, + CTLFLAG_RW, &vm_swapcache_use_chflags, 0, ""); SYSCTL_QUAD(_vm_swapcache, OID_AUTO, minburst, CTLFLAG_RW, &vm_swapcache_minburst, 0, ""); @@ -129,6 +135,9 @@ SYSCTL_QUAD(_vm_swapcache, OID_AUTO, accrate, SYSCTL_QUAD(_vm_swapcache, OID_AUTO, write_count, CTLFLAG_RW, &vm_swapcache_write_count, 0, ""); +#define SWAPMAX(adj) \ + ((int64_t)vm_swap_max * (vm_swapcache_maxswappct + (adj)) / 100) + /* * vm_swapcached is the high level pageout daemon. */ @@ -185,10 +194,10 @@ vm_swapcached(void) * repeat. */ if (state == SWAPC_WRITING) { - if (vm_swap_cache_use > (int64_t)vm_swap_max * 75 / 100) + if (vm_swap_cache_use > SWAPMAX(0)) state = SWAPC_CLEANING; } else { - if (vm_swap_cache_use < (int64_t)vm_swap_max * 70 / 100) + if (vm_swap_cache_use < SWAPMAX(-5)) state = SWAPC_WRITING; } @@ -267,8 +276,16 @@ vm_swapcache_writing(vm_page_t marker) switch(vp->v_type) { case VREG: - if (vm_swapcache_data_enable == 0) + /* + * If data_enable is 0 do not try to swapcache data. + * If use_chflags is set then only swapcache data for + * VSWAPCACHE marked vnodes, otherwise any vnode. 
+ */ + if (vm_swapcache_data_enable == 0 || + ((vp->v_flag & VSWAPCACHE) == 0 && + vm_swapcache_use_chflags)) { continue; + } if (vm_swapcache_maxfilesize && object->size > (vm_swapcache_maxfilesize >> PAGE_SHIFT)) { diff --git a/test/test/baaz b/test/test/baaz index ad48ffac7e..ecf88001fb 100644 --- a/test/test/baaz +++ b/test/test/baaz @@ -4,3 +4,5 @@ I knew a crooked man who walked a crooked mile just to steal a penny leant against a crooked stile +Typing test: Now is the time for all good men to come to the aid of + their country. diff --git a/usr.bin/chflags/chflags.1 b/usr.bin/chflags/chflags.1 index ca1fcc4fd0..bad595851b 100644 --- a/usr.bin/chflags/chflags.1 +++ b/usr.bin/chflags/chflags.1 @@ -126,6 +126,8 @@ set the user append-only flag (owner or super-user only) set the user immutable flag (owner or super-user only) .It Cm uunlnk , uunlink set the user undeletable flag (owner or super-user only) +.It Cm cache , ucache , noscache +control the data swapcache (recursive) .El .Pp Putting the letters @@ -203,6 +205,40 @@ setting. See .Xr security 7 for more information on this setting. +.Sh SWAPCACHE FLAGS +The +.Cm [u]cache +bit may be set to enable swapcache data caching. +The superuser flag, +.Cm noscache +may be used to disable swapcache data caching and overrides the +user flag. +.Pp +The flag is recursive and need only be set on a top-level directory +to automatically apply to the entire subtree, though you may have +to refresh the namecache with a dummy +.Xr find 1 +command. +You do not have to recursively set the flag with +.Nm +.Op R +and, in fact, we do not recommend it under any circumstances. +.Pp +If you intend to use swapcache data caching the +.Cm vm.swapcache.use_chflags +sysctl determines whether the chflags flags are used or not. +If turned off and +.Cm vm.swapcache.data_enable +is turned on, data caching is turned on globally and the +file flags are ignored. 
+If use_chflags is turned on along with data_enable then only +subtrees marked cacheable will be swapcached. +.Pp +You would typically want to enable the cache on /usr, /home, and /bin +and disable it for /usr/obj. +.Pp +This only applies to data caching. Meta-data caching is universal when +enabled. .Sh EXIT STATUS .Ex -std .Sh SEE ALSO @@ -211,7 +247,8 @@ for more information on this setting. .Xr stat 2 , .Xr fts 3 , .Xr security 7 , -.Xr symlink 7 +.Xr symlink 7 , +.Xr swapcache 8 .Sh HISTORY The .Nm -- 2.41.0
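The flag inheritance performed in naccess() and the check made in vn_open() can be modeled in a few lines of standalone C. This is a sketch under stated assumptions, not the kernel implementation: `struct toy_ncp`, `track_cache_flags()`, and `wants_swapcache()` are hypothetical names, the numeric values are copied from the defines in this patch, and locking, vnode attributes, and the rest of the lookup path are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Values copied from the namecache.h and stat.h hunks of this patch. */
#define NCF_SF_NOCACHE	0x0010u	/* noscache set on this entry */
#define NCF_UF_CACHE	0x0020u	/* cache set on this entry */
#define NCF_SF_PNOCACHE	0x0040u	/* noscache set on some parent */
#define NCF_UF_PCACHE	0x0080u	/* cache set on some parent */
#define UF_CACHE	0x00000080ul
#define SF_NOCACHE	0x00800000ul

#define PARENT_MASK	(NCF_SF_PNOCACHE | NCF_UF_PCACHE)
#define SELF_MASK	(NCF_SF_NOCACHE | NCF_UF_CACHE)

/* Hypothetical, heavily simplified namecache entry. */
struct toy_ncp {
	unsigned nc_flag;
};

/*
 * Model of the tracking step: the entry inherits the cumulative parent
 * state from *cflagsp, records its own file flags, and passes the
 * updated cumulative state on to the next path component.
 */
static void
track_cache_flags(struct toy_ncp *ncp, unsigned long va_flags,
    unsigned *cflagsp)
{
	unsigned cflags;

	assert(ncp != NULL && cflagsp != NULL);
	cflags = *cflagsp;
	ncp->nc_flag &= ~PARENT_MASK;
	ncp->nc_flag |= cflags;

	if (va_flags & SF_NOCACHE)
		cflags |= NCF_SF_PNOCACHE | NCF_SF_NOCACHE;
	if (va_flags & UF_CACHE)
		cflags |= NCF_UF_PCACHE | NCF_UF_CACHE;
	*cflagsp = cflags & PARENT_MASK;
	ncp->nc_flag &= ~SELF_MASK;
	ncp->nc_flag |= cflags & SELF_MASK;
}

/* Model of the open-time test: cache requested and not vetoed. */
static bool
wants_swapcache(const struct toy_ncp *ncp)
{
	unsigned flags = ncp->nc_flag;

	return (flags & (NCF_UF_CACHE | NCF_UF_PCACHE)) != 0 &&
	    (flags & (NCF_SF_NOCACHE | NCF_SF_PNOCACHE)) == 0;
}
```

Walking a path component by component with one shared `cflags` accumulator shows why the flag only needs to be set on the top-level directory: a file under a `cache` directory inherits NCF_UF_PCACHE, while anything under a `noscache` subdirectory also inherits NCF_SF_PNOCACHE, which takes precedence.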