kernel - Normalize the vx_*() vnode interface

* The vx_*() vnode interface is used for initial allocations, reclaims,
  and terminations.  Normalize all use cases to prevent the mixing
  together of the vx_*() API and the vn_*() API.  For example, vx_lock()
  should not be paired with vn_unlock(), and so forth.

* Integrate an update-counter mechanism into the vx_*() API and assert
  reasonability.

* Change vfs_cache.c to use an int update counter instead of a long.
  The vfs_cache code can't quite use the spin-lock update counter API
  yet.  Use proper atomics for load and store.

* Implement VOP_GETATTR_QUICK, meant to be a 'quick' version of
  VOP_GETATTR() that only retrieves information related to permissions
  and ownership.  This will be fast-pathed in a later commit.

* Implement vx_downgrade() to convert an exclusive vx_lock into an
  exclusive vn_lock (for vnodes).  Adjust all use cases in the
  getnewvnode() path.

* Remove unnecessary locks in tmpfs_getattr() and don't use any in
  tmpfs_getattr_quick().

* Remove unnecessary locks in hammer2_vop_getattr() and don't use any
  in hammer2_vop_getattr_quick().
kernel - Rejigger mount code to add vfs_flags in struct vfsops

* Rejigger the mount code so we can add a vfs_flags field to vfsops,
  which mount_init() has visibility to.

* Allows nullfs to flag that its mounts do not need a syncer thread.
  Previously nullfs would destroy the syncer thread after the fact.

* Improves dsynth performance (it does lots of nullfs mounts).
Rename some functions to better names.

devfs_find_device_by_udev() -> devfs_find_device_by_devid()
dev2udev()                  -> devid_from_dev()
udev2dev()                  -> dev_from_devid()

This fits with the rest of the code.  'dev' usually means a cdev_t, such
as in make_dev(), etc.  Instead of 'udev', use 'devid', since that's
what dev_t is, a "Device ID".
kernel: Save some indent here and there and some small cleanup.

All these are related to an inspection of the places where we do:

    if (...) {
        ...
        goto blah;
    } else {
        ...
    }

in which case the 'else' is not needed.  I only changed places where I
thought that it improves readability or is just as readable without the
'else'.
sys/kern: Don't implement .vfs_sync unless sync is supported

The only reason filesystems with no need for syncing (e.g. no backing
storage) implement .vfs_sync is that they need a sync returning 0 on
unmount.  If unmount accepts a sync return value of EOPNOTSUPP for
filesystems that do not support sync, those filesystems no longer have
to implement .vfs_sync with vfs_stdsync() only to pass dounmount().

The drawback is when there is a sync (other than vfs_stdnosync) that
returns EOPNOTSUPP for real errors.  The existing filesystems in
DragonFly don't do this (and shouldn't either).

Also see https://bugs.dragonflybsd.org/issues/2912.

# grep "\.vfs_sync" sys/vfs sys/gnu/vfs -rI | grep vfs_stdsync
sys/vfs/udf/udf_vfsops.c:               .vfs_sync = vfs_stdsync,
sys/vfs/portal/portal_vfsops.c:         .vfs_sync = vfs_stdsync
sys/vfs/devfs/devfs_vfsops.c:           .vfs_sync = vfs_stdsync,
sys/vfs/isofs/cd9660/cd9660_vfsops.c:   .vfs_sync = vfs_stdsync,
sys/vfs/autofs/autofs_vfsops.c:         .vfs_sync = vfs_stdsync, /* for unmount(2) */
sys/vfs/tmpfs/tmpfs_vfsops.c:           .vfs_sync = vfs_stdsync,
sys/vfs/dirfs/dirfs_vfsops.c:           .vfs_sync = vfs_stdsync,
sys/vfs/ntfs/ntfs_vfsops.c:             .vfs_sync = vfs_stdsync,
sys/vfs/procfs/procfs_vfsops.c:         .vfs_sync = vfs_stdsync
sys/vfs/hpfs/hpfs_vfsops.c:             .vfs_sync = vfs_stdsync,
sys/vfs/nullfs/null_vfsops.c:           .vfs_sync = vfs_stdsync,
kernel - Performance tuning

* Use a shared lock in the exec*() code, open, close, chdir, fchdir,
  access, stat, and readlink.

* Adjust nlookup() to allow the last namecache record in a path to be
  locked shared if it is already resolved and the caller requests it.

* Remove nearly all global locks from critical dsched paths.  Defer
  creation of the tdio until an I/O actually occurs (huge savings in
  the fork/exit paths).

* Improves fork/exec concurrency of static binaries on monster from
  14200/sec to 55000/sec+.  For dynamic binaries, improve from around
  2500/sec to 9000/sec or so (48 cores fork/exec'ing different dynamic
  binaries).  For the same dynamic binary it's more around 5000/sec or
  so.

  There are lots of issues here, including the fact that all dynamic
  binaries load many shared resources even when the binaries are
  different programs, e.g. libc.so.X and ld-elf.so.2, as well as
  /dev/urandom (from libc), and access numerous common path elements.
  Nearly all of these paths are now non-contending.

  The major remaining contention is in per-vm_page/PMAP manipulation.
  This is per-page, and concurrent execs of the same program tend to
  pipeline, so it isn't a big problem.
hpfs - Fix a couple of panics and a little cleanup.

* Fix compilation with HPFS_DEBUG.

* Fix a panic due to the CNP_PDIRUNLOCK flag not being cleared.

* Fix a panic where the vnode returned after a lookup is not NULL in
  the ENOENT case.

* Disable write support completely.  It was pretty minimal, and
  operations like create or rename were not supported.

It has been tested with a filesystem created by OS/2 Warp 2.1.  Copying
data out of it worked fine, but there is still an outstanding issue
with overlapping buffers.
kernel - Greatly improve shared memory fault rate concurrency / shared tokens

This commit rolls up a lot of work to improve postgres database
operations and the system in general.  With these changes we can
pgbench -j 8 -c 40 on our 48-core opteron monster at 140000+ tps, and
the shm vm_fault rate hits 3.1M pps.

* Implement shared tokens.  They work as advertised, with some caveats.
  It is acceptable to acquire a shared token while you already hold the
  same token exclusively, but you will deadlock if you acquire an
  exclusive token while you hold the same token shared.

  Currently exclusive tokens are not given priority over shared tokens,
  so starvation is possible under certain circumstances.

* Create a critical code path in vm_fault() using the new shared token
  feature to quickly fault-in pages which already exist in the VM
  cache.  pmap_object_init_pt() also uses the new feature.

  This increases fault-in concurrency by a ridiculously huge amount,
  particularly on SHM segments (say when you have a large number of
  postgres clients).  Scaling for large numbers of clients on large
  numbers of cores is significantly improved.  This also increases
  fault-in concurrency for MAP_SHARED file maps.

* Expand the breadn() and cluster_read() APIs.  Implement breadnx() and
  cluster_readx() which allow a getblk()'d bp to be passed.  If *bpp is
  not NULL a bp is being passed in, otherwise the routines call
  getblk().

* Modify the HAMMER read path to use the new API.  Instead of calling
  getcacheblk(), HAMMER now calls getblk() and checks the B_CACHE flag.
  This gives getblk() a chance to regenerate a fully cached buffer from
  VM backing store without having to acquire any hammer-related locks,
  resulting in even faster operation.

* If kern.ipc.shm_use_phys is set to 2 the VM pages will be
  pre-allocated.  This can take quite a while for a large map and can
  also lock the machine up for a few seconds.  Defaults to off.

* Reorder the smp_invltlb()/cpu_invltlb() combos in a few places,
  running cpu_invltlb() last.
* An invalidation interlock might be needed in pmap_enter() under
  certain circumstances, so enable the code for now.

* vm_object_backing_scan_callback() was failing to properly check the
  validity of a vm_object after acquiring its token.  Add the required
  check + some debugging.

* Make vm_object_set_writeable_dirty() a bit more cache friendly.

* The vmstats sysctl was scanning every process's vm_map (requiring a
  vm_map read lock to do so), which can stall for long periods of time
  when the system is paging heavily.  Change the mechanic to a LWP flag
  which can be tested with minimal locking.

* Have the phys_pager mark the page as dirty too, to make sure nothing
  tries to free it.

* Remove the spinlock in pmap_prefault_ok(); since we do not delete
  page table pages it shouldn't be needed.

* Add a required cpu_ccfence() in pmap_inval.c.  The code generated
  prior to this fix was still correct, and this makes sure it stays
  that way.

* Replace several manual wiring cases with calls to vm_page_wire().
kernel: Add missing MODULE_VERSION()s for file systems.

The loader will figure out by itself whether to load a module or not,
depending on whether it's already in the kernel config or not, if (and
only if) MODULE_VERSION() is present.  I.e., if MSDOSFS (which has
MODULE_VERSION()) is in the config and msdos_load="YES" is in
/boot/loader.conf, msdos.ko will not be loaded by the loader at all.

Without MODULE_VERSION() this leads (in the best case) to whining in
dmesg, as for ahci, or (in the worst case) to weird behavior, such as
for nullfs:

# mount -a
null: vfsload(null): No such file or directory

Therefore, we definitely want MODULE_VERSION() for all new modules.
This commit is the first in a series to add the missing
MODULE_VERSION()s.

I know that ufs is not a module, it is just included for completeness'
sake.

Reported-by: marino, tuxillo
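For reference, the declarations being added look roughly like this (the
module name and placement shown are an illustrative guess; the exact
name and version are per-filesystem):

```c
/* e.g. in a filesystem's *_vfsops.c, near its existing VFS_SET() and
 * MODULE_DEPEND() declarations: */
MODULE_VERSION(msdosfs, 1);
```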
kernel - Add additional fields to kinfo_cputime

* Add a message field and address to allow the kernel to report
  contention points on the cpus to userland.

* Enhance the mplock and token subsystems to record contention points.

* Enhance the scheduler to record contention information in the per-cpu
  cpu_time structure.
kernel - lwkt_token revamp

* Simplify the token API.  Hide the lwkt_tokref mechanics and simplify
  the lwkt_gettoken()/lwkt_reltoken() API to remove the need to declare
  and pass a lwkt_tokref along with the token.  This makes tokens
  operate more like locks.

  There is a minor restriction that tokens must be unlocked in exactly
  the reverse order they were locked in, and another restriction
  limiting the maximum number of tokens a thread can hold to a defined
  value (32 for now).  The tokrefs are now an array embedded in the
  thread structure.

* Improve performance when blocking and unblocking threads with
  recursively held tokens.

* Improve performance when acquiring the same token recursively.  This
  operation is now O(1) and requires no locks or critical sections of
  any sort.

  This will allow us to acquire redundant tokens in deep call paths
  without having to worry about performance issues.

* Add a flags field to the lwkt_token and lwkt_tokref structures and
  add a flagged feature which will acquire the MP lock along with a
  particular token.  This will be used as a transitory mechanism in
  upcoming MPSAFE work.  The mplock feature in the token structure can
  be directly connected to a mpsafe sysctl without being vulnerable to
  state-change races.
kernel - fine-grained namecache and partial vnode MPSAFE work

Namecache subsystem

* All vnode->v_flag modifications now use vsetflags() and vclrflags().
  Because some flags are set and cleared by vhold()/vdrop(), which do
  not require any locks to be held, all modifications must use atomic
  ops.

* Clean up and revamp the namecache MPSAFE work.  Namecache operations
  now use a fine-grained MPSAFE locking model which loosely follows
  these rules:

  - Lock ordering is child to parent, e.g. lock file, then lock parent
    directory.  This allows resolver recursions up the parent directory
    chain.

  - Downward-traversing namecache invalidations and path lookups will
    unlock the parent (but leave it referenced) before attempting to
    lock the child.

  - Namecache hash table lookups utilize a per-bucket spinlock.

  - Vnode locks may be acquired while holding namecache locks, but not
    vice-versa.  Vnodes are not destroyed until all namecache
    references go away, but can enter reclamation.  Namecache lookups
    detect the case and re-resolve to overcome the race.  Namecache
    entries are not destroyed while referenced.

* Remove vfs_token; the namecache MPSAFE model is now totally
  fine-grained.

* Revamp namecache locking primitives (cache_lock/cache_unlock and
  friends).  Use atomic ops and nc_exlocks instead of nc_locktd, and
  build in a request flag.  This solves busy/tsleep races between lock
  holder and lock requester.

* Revamp namecache parent/child linkages.  Instead of using vfs_token
  to lock such operations we simply lock both child and parent
  namecache entries.  Hash table operations are also fully integrated
  with the parent/child linking operations.

* The vnode->v_namecache list is locked via vnode->v_spinlock, which is
  actually vnode->v_lock.lk_spinlock.

* Revamp cache_vref() and cache_vget().  The passed namecache entry
  must be referenced and locked.  Internals are simplified.
* Fix a deadlock by moving the call to _cache_hysteresis() to a place
  where the current thread otherwise does not hold any locked ncp's.

* Revamp nlookup() to follow the new namecache locking rules.

* Fix a number of places, e.g. in vfs/nfs/nfs_subs.c, where
  ncp->nc_parent or ncp->nc_vp was being accessed with an unlocked ncp.
  nc_parent and nc_vp accesses are only valid if the ncp is locked.

* Add the vfs.cache_mpsafe sysctl, which defaults to 0.  This may be
  set to 1 to enable MPSAFE namecache operations for [l,f]stat() and
  open() system calls (for the moment).

VFS/VNODE subsystem

* Use a global spinlock, for now called vfs_spin, to manage
  vnode_free_list.  Use vnode->v_spinlock (and vfs_spin) to manage
  vhold/vdrop ops and to interlock v_auxrefs tests against vnode
  terminations.

* Integrate per-mount mnt_token and (for now) the MP lock into VOP_*()
  and VFS_*() operations.  This allows the MP lock to be shifted
  further inward from the system calls, but we don't do it quite yet.

* HAMMER: VOP_GETATTR, VOP_READ, and VOP_INACTIVE are now MPSAFE.  The
  corresponding sysctls have been removed.

* FIFOFS: Needed some MPSAFE work in order to allow HAMMER to make
  things MPSAFE above, since HAMMER forwards vops for in-filesystem
  fifos to fifofs.

* Add some debugging kprintf()s when certain MP races are averted, for
  testing only.

MISC

* Add some assertions to the VM system.

* Document existing and newly MPSAFE code.
DEVFS - rollup - all kernel devices

* Make changes needed to kernel devices to use devfs.

* Also pre-generate some devices (usually 4) to support system
  utilities which do not yet deal with the auto-cloning device support.

* Adjust the spec_vnops for various filesystems to vector to dummy code
  for read and write, for VBLK/VCHR nodes in old filesystems which are
  no longer supported.

Submitted-by: Alex Hornung <ahornung@gmail.com>
HAMMER / VFS_VGET - Add optional dvp argument to VFS_VGET().  Fix readdirplus

* VGET is used by NFS to acquire a vnode given an inode number.  HAMMER
  requires additional information to determine the PFS the inode is
  being acquired from.  Add an optional directory vnode argument to the
  VGET.  If non-NULL, HAMMER will extract the PFS information from this
  vnode.

* Adjust NFS to pass the dvp to VGET when doing a readdirplus.  Note
  that the PFS is already encoded in file handles, but readdirplus
  acquires the attributes for each directory entry it scans (readdir
  does not).

  This fixes readdirplus for NFS-served HAMMER PFS exports.