gitweb.dragonflybsd.org Git - dragonfly.git/atom - sys/kern/vfs_journal.c history

: Get rid of udev_t.

2019-12-01T10:59:08Z

: Get rid of udev_t.

In a time long long ago, dev_t was a pointer, which later became cdev_t
during the great cleanups, until it ended up being a uint32_t, just like
udev_t. See for example the definitions of __dev_t in .

This commit cleans up further by removing the udev_t type, leaving just
the POSIX dev_t type for both kernel and userland. Put it inside a
_DEV_T_DECLARED to prepare for further cleanups in .

[D B] sys/kern/vfs_journal.c

kernel: Remove numerous #include .

2019-03-02T20:34:21Z

kernel: Remove numerous #include .

Most of them were added when we converted spl*() calls to
crit_enter()/crit_exit(), almost 14 years ago. We can now
remove a good chunk of them again for where crit_*() are
no longer used.

I had to adjust some files that were relying on thread2.h
or headers that it includes coming in via other headers
that it was removed from.

[D B] sys/kern/vfs_journal.c

kernel - Major signal path adjustments to fix races, tsleep race fixes, +more

2011-11-15T23:23:41Z

kernel - Major signal path adjustments to fix races, tsleep race fixes, +more

* Refactor the signal code to properly hold the lp->lwp_token.  In
  particular the ksignal() and lwp_signotify() paths.

* The tsleep() path must also hold lp->lwp_token to properly handle
  lp->lwp_stat states and interlocks.

* Refactor the timeout code in tsleep() to ensure that endtsleep() is only
  called from the proper context, and fix races between endtsleep() and
  lwkt_switch().

* Rename proc->p_flag to proc->p_flags

* Rename lwp->lwp_flag to lwp->lwp_flags

* Add lwp->lwp_mpflags and move flags which require atomic ops (are adjusted
  when not the current thread) to the new field.

* Add td->td_mpflags and move flags which require atomic ops (are adjusted
  when not the current thread) to the new field.

* Add some freeze testing code to the x86-64 trap code (default disabled).

[D B] sys/kern/vfs_journal.c

kernel - Major SMP performance patch / VM system, bus-fault/seg-fault fixes

2011-10-18T17:36:11Z

kernel - Major SMP performance patch / VM system, bus-fault/seg-fault fixes

This is a very large patch which reworks locking in the entire VM subsystem,
concentrated on VM objects and the x86-64 pmap code.  These fixes remove
nearly all the spin lock contention for non-threaded VM faults and narrows
contention for threaded VM faults to just the threads sharing the pmap.

Multi-socket many-core machines will see a 30-50% improvement in parallel
build performance (tested on a 48-core opteron), depending on how well
the build parallelizes.

As part of this work a long-standing problem on 64-bit systems where programs
would occasionally seg-fault or bus-fault for no reason has been fixed.  The
problem was related to races between vm_fault, the vm_object collapse code,
and the vm_map splitting code.

* Most uses of vm_token have been removed.  All uses of vm_spin have been
  removed.  These have been replaced with per-object tokens and per-queue
  (vm_page_queues[]) spin locks.

  Note in particular that since we still have the page coloring code the
  PQ_FREE and PQ_CACHE queues are actually many queues, individually
  spin-locked, resulting in very excellent MP page allocation and freeing
  performance.

* Reworked vm_page_lookup() and vm_object->rb_memq.  All (object,pindex)
  lookup operations are now covered by the vm_object hold/drop system,
  which utilize pool tokens on vm_objects.  Calls now require that the
  VM object be held in order to ensure a stable outcome.

  Also added vm_page_lookup_busy_wait(), vm_page_lookup_busy_try(),
  vm_page_busy_wait(), vm_page_busy_try(), and other API functions
  which integrate the PG_BUSY handling.

* Added OBJ_CHAINLOCK.  Most vm_object operations are protected by
  the vm_object_hold/drop() facility which is token-based.  Certain
  critical functions which must traverse backing_object chains use
  a hard-locking flag and lock almost the entire chain as it is traversed
  to prevent races against object deallocation, collapses, and splits.

  The last object in the chain (typically a vnode) is NOT locked in
  this manner, so concurrent faults which terminate at the same vnode will
  still have good performance.  This is important e.g. for parallel compiles
  which might be running dozens of the same compiler binary concurrently.

* Created a per vm_map token and removed most uses of vmspace_token.

* Removed the mp_lock in sys_execve().  It has not been needed in a while.

* Add kmem_lim_size() which returns approximate available memory (reduced
  by available KVM), in megabytes.  This is now used to scale up the
  slab allocator cache and the pipe buffer caches to reduce unnecessary
  global kmem operations.

* Rewrote vm_page_alloc(), various bits in vm/vm_contig.c, the swapcache
  scan code, and the pageout scan code.  These routines were rewritten
  to use the per-queue spin locks.

* Replaced the exponential backoff in the spinlock code with something
  a bit less complex and cleaned it up.

* Restructured the IPIQ func/arg1/arg2 array for better cache locality.
  Removed the per-queue ip_npoll and replaced it with a per-cpu gd_npoll,
  which is used by other cores to determine if they need to issue an
  actual hardware IPI or not.  This reduces hardware IPI issuance
  considerably (and the removal of the decontention code reduced it even
  more).

* Temporarily removed the lwkt thread fairq code and disabled a number of
  features.  These will be worked back in once we track down some of the
  remaining performance issues.

  Temproarily removed the lwkt thread resequencer for tokens for the same
  reason.  This might wind up being permanent.

  Added splz_check()s in a few critical places.

* Increased the number of pool tokens from 1024 to 4001 and went to a
  prime-number mod algorithm to reduce overlaps.

* Removed the token decontention code.  This was a bit of an eyesore and
  while it did its job when we had global locks it just gets in the way now
  that most of the global locks are gone.

  Replaced the decontention code with a fall back which acquires the
  tokens in sorted order, to guarantee that deadlocks will always be
  resolved eventually in the scheduler.

* Introduced a simplified spin-for-a-little-while function
  _lwkt_trytoken_spin() that the token code now uses rather than giving
  up immediately.

* The vfs_bio subsystem no longer uses vm_token and now uses the
  vm_object_hold/drop API for buffer cache operations, resulting
  in very good concurrency.

* Gave the vnode its own spinlock instead of sharing vp->v_lock.lk_spinlock,
  which fixes a deadlock.

* Adjusted all platform pamp.c's to handle the new main kernel APIs.  The
  i386 pmap.c is still a bit out of date but should be compatible.

* Completely rewrote very large chunks of the x86-64 pmap.c code.  The
  critical path no longer needs pmap_spin but pmap_spin itself is still
  used heavily, particularin the pv_entry handling code.

  A per-pmap token and per-pmap object are now used to serialize pmamp
  access and vm_page lookup operations when needed.

  The x86-64 pmap.c code now uses only vm_page->crit_count instead of
  both crit_count and hold_count, which fixes races against other parts of
  the kernel uses vm_page_hold().

  _pmap_allocpte() mechanics have been completely rewritten to remove
  potential races.  Much of pmap_enter() and pmap_enter_quick() has also
  been rewritten.

  Many other changes.

* The following subsystems (and probably more) no longer use the vm_token
  or vmobj_token in critical paths:

  x The swap_pager now uses the vm_object_hold/drop API instead of vm_token.

  x mmap() and vm_map/vm_mmap in general now use the vm_object_hold/drop API
    instead of vm_token.

  x vnode_pager

  x zalloc

  x vm_page handling

  x vfs_bio

  x umtx system calls

  x vm_fault and friends

* Minor fixes to fill_kinfo_proc() to deal with process scan panics (ps)
  revealed by recent global lock removals.

* lockmgr() locks no longer support LK_NOSPINWAIT.  Spin locks are
  unconditionally acquired.

* Replaced netif/e1000's spinlocks with lockmgr locks.  The spinlocks
  were not appropriate owing to the large context they were covering.

* Misc atomic ops added

[D B] sys/kern/vfs_journal.c

spinlocks - Rename API to spin_{try,un,}lock

2010-08-30T10:10:17Z

spinlocks - Rename API to spin_{try,un,}lock

* Rename the API to spin_trylock, spin_unlock and spin_lock instead of
  spin_lock_wr, spin_unlock_wr and spin_trylock_wr now that we only have
  exclusive spinlocks.

* 99% of this patch was generated by a semantic coccinelle patch

[D B] sys/kern/vfs_journal.c

kernel - All lwkt thread now start out mpsafe part 2/2

2010-08-29T00:30:29Z

kernel - All lwkt thread now start out mpsafe part 2/2

* Remove the TDF_MPSAFE flag entirely.  All thread creation of all
  types now start running the thread without the mplock.

  Drivers which aren't mpsafe immediately acquire the mplock.

[D B] sys/kern/vfs_journal.c

kernel - All lwkt thread now start out mpsafe part 1/2

2010-08-29T00:18:34Z

kernel - All lwkt thread now start out mpsafe part 1/2

* All callers of lwkt_init_thread(), lwkt_create() and lwkt_alloc_thread()
  now always pass TDF_MPSAFE and the flag is asserted in the low level
  thread creation code.

[D B] sys/kern/vfs_journal.c

kernel - Remove msf_buf dependency from VFS Journals

2010-02-25T16:11:16Z

kernel - Remove msf_buf dependency from VFS Journals

* UIO writes (VOP_WRITE case) are now handled by a copyin() directly to the
journal buffer.

* VOP_PUTPAGES case is now handled with XIO directly instead of with msf_buf's
as an abstraction over XIO.

[D B] sys/kern/vfs_journal.c

kernel - namecache MPSAFE work

2009-12-21T16:15:18Z

kernel - namecache MPSAFE work

* Most of the MPSAFE coding required for namecache operation.  The MP
  lock still surrounds this code.

* Use a per-bucket spinlock for nchashtbl[] lookups.

* Use a global spinlock for ncneglist.

* Use a global token for nc_parent interlocks.

* Use a per-vnode spinlock (v_spinlock == v_lock.lk_spinlock) to
  manage access to the v_namecache list.

* Recode namecache locks to use atomic_cmpset_ptr() based around
  nc_locktd instead of nc_exlocks.  nc_exlocks is still used to
  track the exclusive lock count.

  NOTE: There may be an issue with the way nc_lockreq is handled.

* Recode cache_hold/drop to use atomic_cmpset_int().

* Carefully code cache_zap() for MPSAFEness.  In particular, multiple
  locks must be held before it can be determined that a namecache
  structure no longer has any referers.  Here is where having the global
  token is really useful.

* cache_fullpath() and vn_fullpath() previously assumed the good graces
  of the MP lock and didn't bother holding refs on the namecache pointers
  they were traversing.  Now they do.

* nlookup*() functions also previously made assumptions with regards
  to holding refs.  Now they properly hold refs.

* struct namecache's nc_flag field is no longer modified outside of
  holding a lock on the structure, so we do not have to resort to
  atomic ops.

[D B] sys/kern/vfs_journal.c

AMD64 - Fix many compile-time warnings. int/ptr type mismatches, %llx, etc.

2009-06-24T19:31:02Z

AMD64 - Fix many compile-time warnings.  int/ptr type mismatches, %llx, etc.

[D B] sys/kern/vfs_journal.c

Give the device major / minor numbers their own separate 32 bit fields

2007-05-09T00:53:36Z

Give the device major / minor numbers their own separate 32 bit fields
in the kernel.  Change dev_ops to use a RB tree to index major device
numbers and remove the 256 device major number limitation.

Build a dynamic major number assignment feature into dev_ops_add() and
adjust ASR (which already had a hand-rolled one), and MFS to use the
feature.  MFS at least does not require any filesystem visibility to
access its backing device.  Major devices numbers >= 256 are used for
dynamic assignment.

Retain filesystem compatibility for device numbers that fall within the
range that can be represented in UFS or struct stat (which is a single
32 bit field supporting 8 bit major numbers and 24 bit minor numbers).

[D B] sys/kern/vfs_journal.c

Fix the incorrect addition of a leading '/' in file paths in the journaling

2007-01-25T18:19:31Z

Fix the incorrect addition of a leading '/' in file paths in the journaling
stream.  This addition was a side effect of changes made to the
namecache topology.

Submitted-by: "Steve O'Hara-Smith"

[D B] sys/kern/vfs_journal.c

Fix a number of places where the kernel assumed it could directly access

2007-01-12T06:06:58Z

Fix a number of places where the kernel assumed it could directly access
user memory.  Primarily the core dump code.

[D B] sys/kern/vfs_journal.c

Rename printf -> kprintf in sys/ and add some defines where necessary

2006-12-23T00:35:05Z

Rename printf -> kprintf in sys/ and add some defines where necessary
(files which are used in userland, too).

[D B] sys/kern/vfs_journal.c

Major namecache work primarily to support NULLFS.

2006-10-27T04:56:34Z

Major namecache work primarily to support NULLFS.

* Move the nc_mount field out of the namecache{} record and use a new
  namecache handle structure called nchandle { mount, ncp } for all
  API accesses to the namecache.

* Remove all mount point linkages from the namecache topology.  Each mount
  now has its own namecache topology rooted at the root of the mount point.

  Mount points are flagged in their underlying filesystem's namecache
  topology but instead of linking the mount into the topology, the flag
  simply triggers a mountlist scan to locate the mount.  ".." is handled
  the same way... when the root of a topology is encountered the scan
  can traverse to the underlying filesystem via a field stored in the
  mount structure.

* Ref the mount structure based on the number of nchandle structures
  referencing it, and do not kfree() the mount structure during a forced
  unmount if refs remain.

These changes have the following effects:

* Traversal across mount points no longer require locking of any sort,
  preventing process blockages occuring in one mount from leaking across
  a mount point to another mount.

* Aliased namespaces such as occurs with NULLFS no longer duplicate the
  namecache topology of the underlying filesystem.  Instead, a NULLFS
  mount simply shares the underlying topology (differentiating between
  it and the underlying topology by the fact that the name cache
  handles { mount, ncp } contain NULLFS's mount pointer.

  This saves an immense amount of memory and allows NULLFS to be used
  heavily within a system without creating any adverse impact on kernel
  memory or performance.

* Since the namecache topology for a NULLFS mount is shared with the
  underyling mount, the namecache records are in fact the same records
  and thus full coherency between the NULLFS mount and the underlying
  filesystem is maintained by design.

* Future efforts, such as a unionfs or shadow fs implementation, now
  have a mount structure to work with.  The new API is a lot more
  flexible then the old one.

[D B] sys/kern/vfs_journal.c

Rename malloc->kmalloc, free->kfree, and realloc->krealloc. Pass 1

2006-09-05T00:55:51Z

Rename malloc->kmalloc, free->kfree, and realloc->krealloc.  Pass 1

[D B] sys/kern/vfs_journal.c

Split kern/vfs_journal.c. Leave the low level journal support code in

2006-05-08T18:45:53Z

Split kern/vfs_journal.c.  Leave the low level journal support code in
kern/vfs_journal.c and move all the mount-based journaling code and
journaling VNOPS to kern/vfs_jops.c.  This is in preparation for utilizing
the core journaling protocol for userland VFS support.

[D B] sys/kern/vfs_journal.c

Recode the streamid selector. The streamid was faked before. Do it for

2006-05-07T00:24:58Z

Recode the streamid selector.  The streamid was faked before.  Do it for
real now, guarenteeing that parallel transactions will have unique stream
identifiers.

In addition, while not required, streamid calculations are such that non
parallel transactions will have a tendancy to use the same id, so someone
observing the streamid's in a journaling stream can easily pick out when
parallel transactions occur.

[D B] sys/kern/vfs_journal.c

The fdrop() procedure no longer needs a thread argument, remove it.

2006-05-06T06:38:39Z

The fdrop() procedure no longer needs a thread argument, remove it.

[D B] sys/kern/vfs_journal.c

The thread/proc pointer argument in the VFS subsystem originally existed

2006-05-06T02:43:15Z

The thread/proc pointer argument in the VFS subsystem originally existed
for...  well, I'm not sure *WHY* it originally existed when most of the
time the pointer couldn't be anything other then curthread or curproc or
the code wouldn't work.  This is particularly true of lockmgr locks.

Remove the pointer argument from all VOP_*() functions, all fileops functions,
and most ioctl functions.

[D B] sys/kern/vfs_journal.c