dragonfly.git
12 years agoAdd some lines about lwkt_serialize_adaptive_enter().
Sascha Wildner [Wed, 7 May 2008 20:03:09 +0000 (20:03 +0000)]
Add some lines about lwkt_serialize_adaptive_enter().

Submitted-by: sephe
12 years agoBump base development version to 197700 so it is properly distinct from
Matthew Dillon [Wed, 7 May 2008 17:26:28 +0000 (17:26 +0000)]
Bump base development version to 197700 so it is properly distinct from
the 1.12 release version.

Reported-by: Hasso Tepper
12 years agoCorrect comments and minor variable naming and sysctl issues.
Matthew Dillon [Wed, 7 May 2008 17:19:47 +0000 (17:19 +0000)]
Correct comments and minor variable naming and sysctl issues.

Reported-by: Fabio Checconi <fabio@gandalf.sssup.it>
12 years agoFix a sizeof() that used the wrong variable name. The correct variable was the same
Matthew Dillon [Tue, 6 May 2008 21:40:40 +0000 (21:40 +0000)]
Fix a sizeof() that used the wrong variable name.  The correct variable was the same
size so no harm done, but get it right.

Submitted-by: Fabio Checconi <fabio@gandalf.sssup.it>
12 years agoThe vkernel's maximum number of CPUs is now 16.
Sascha Wildner [Tue, 6 May 2008 18:55:01 +0000 (18:55 +0000)]
The vkernel's maximum number of CPUs is now 16.

12 years agoEnable kern.trap_mpsafe and kern.syscall_mpsafe by default for vkernels.
Matthew Dillon [Tue, 6 May 2008 18:43:02 +0000 (18:43 +0000)]
Enable kern.trap_mpsafe and kern.syscall_mpsafe by default for vkernels.

12 years agoRemove the SMP_MAXCPU override for vkernels, causing the build to revert
Matthew Dillon [Tue, 6 May 2008 18:37:58 +0000 (18:37 +0000)]
Remove the SMP_MAXCPU override for vkernels, causing the build to revert
to the i386 limit of 16.  This is not because vkernels couldn't handle more
(up to 31), but because I want the installed world to be compatible between
vkernel and pc32.

This unbreaks programs like 'vmstat -m'.

12 years agoAdd strings for some AMD features
Sepherosa Ziehau [Tue, 6 May 2008 10:05:02 +0000 (10:05 +0000)]
Add strings for some AMD features

12 years agoHAMMER 41B/Many: Cleanup.
Matthew Dillon [Tue, 6 May 2008 00:21:08 +0000 (00:21 +0000)]
HAMMER 41B/Many: Cleanup.

* Disable (most) debugging kprintfs unless a hammer debug sysctl is set.

* Do not allow buffers to be synced on panic.

12 years agoHAMMER Utilities: Sync with recent changes.
Matthew Dillon [Tue, 6 May 2008 00:15:35 +0000 (00:15 +0000)]
HAMMER Utilities: Sync with recent changes.

* Add some missing crc initializations.

* Fix an assertion that was breaking newfs_hammer on large disks.

12 years agoKeep track of the number of buffers undergoing IO, and include that number
Matthew Dillon [Tue, 6 May 2008 00:14:12 +0000 (00:14 +0000)]
Keep track of the number of buffers undergoing IO, and include that number
in calculations involving numdirtybuffers.  This prevents the kernel from
believing that there are only a few dirty buffers when, in fact, all the
dirty buffers are running IOs.
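
The accounting described above can be sketched as follows.  This is a
minimal illustration, not the kernel's code: the structure, the field
name runningbufs, the hidirty threshold and both helpers are
hypothetical, and it assumes a buffer leaves numdirtybuffers when its
write is started.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical counters; only numdirtybuffers is named in the commit. */
struct buf_stats {
	int numdirtybuffers;	/* dirty buffers not yet queued for I/O */
	int runningbufs;	/* assumed name: buffers with I/O in flight */
};

/*
 * A dirty buffer starts its write.  It leaves the dirty count, but it
 * is not clean until the I/O completes, so track it separately.
 */
static void
buf_start_write(struct buf_stats *st)
{
	assert(st->numdirtybuffers > 0);
	st->numdirtybuffers--;
	st->runningbufs++;
}

/* Decide whether writers should be throttled (hidirty is hypothetical). */
static bool
too_many_dirty(const struct buf_stats *st, int hidirty)
{
	/* Count in-flight buffers too, or the load looks deceptively low. */
	return st->numdirtybuffers + st->runningbufs >= hidirty;
}
```

With ten dirty buffers of which nine are already being written, a test
on numdirtybuffers alone would see only one dirty buffer, while the sum
still reports ten.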

12 years agoOnly call bwillwrite() for regular file write()s, instead of for all write()s.
Matthew Dillon [Mon, 5 May 2008 22:09:44 +0000 (22:09 +0000)]
Only call bwillwrite() for regular file write()s, instead of for all write()s.
This stops hiccuping on things like pty's and sockets during heavy file
activity.

12 years agoHAMMER Utilities: Feature add
Matthew Dillon [Mon, 5 May 2008 20:34:52 +0000 (20:34 +0000)]
HAMMER Utilities: Feature add

* Check record crc and signature in extra-verbose mode

* Adjustments for structural changes

* Generate proper CRCs for structures laid down by newfs_hammer

12 years agoHAMMER 41/Many: Implement CRC checking (WARNING: On-media structures changed)
Matthew Dillon [Mon, 5 May 2008 20:34:48 +0000 (20:34 +0000)]
HAMMER 41/Many: Implement CRC checking (WARNING: On-media structures changed)

* Generate and check on-media CRC fields.

* Clean up the record modification API

* Add a header signature field for future critical recovery

* Rearrange CRC fields for on-media structures to make them easier to
  deal with.

* Adjust the ioctl API

* When trying to back-out of an operation that errored, free allocated
  data blocks.

12 years ago- Add lwkt_serialize_adaptive_enter(9), it is the same as lwkt_serialize_enter(9)
Sepherosa Ziehau [Mon, 5 May 2008 12:35:03 +0000 (12:35 +0000)]
- Add lwkt_serialize_adaptive_enter(9), it is the same as lwkt_serialize_enter(9)
  except that it spins a little bit before sleeping.
- Under debug sysctl tree, add sysctl nodes to tune various backoff related
  parameters for lwkt_serialize_adaptive_enter(9).
- Add ktr for serializer enter end, exit begin, spin backoff and spin backoff
  failure.

Reviewed-by: corecode@
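
The "spin a little bit before sleeping" idea can be sketched in
portable C11.  This is an illustration under stated assumptions, not
the lwkt serializer: struct toy_serializer and all toy_* functions are
hypothetical, and the sleeping path is left to the caller.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a serializer; not the DragonFly structure. */
struct toy_serializer {
	atomic_flag held;
};

/* One non-blocking attempt; true means the caller now holds the lock. */
static bool
toy_try_enter(struct toy_serializer *s)
{
	return !atomic_flag_test_and_set_explicit(&s->held,
	    memory_order_acquire);
}

static void
toy_exit(struct toy_serializer *s)
{
	atomic_flag_clear_explicit(&s->held, memory_order_release);
}

/*
 * Spin phase of an adaptive enter: retry up to max_spins times.
 * Returns true if the lock was acquired while spinning; false tells
 * the caller to fall back to the sleeping path.
 */
static bool
toy_adaptive_spin(struct toy_serializer *s, int max_spins)
{
	assert(max_spins > 0);
	for (int i = 0; i < max_spins; i++) {
		if (toy_try_enter(s))
			return true;
	}
	return false;
}
```

Spinning pays off only when the holder is expected to release the lock
soon, which is presumably why the spin limits are exposed as tunables.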
12 years agoUse mask instead of modulo, since bo->backoff is always a power of 2
Sepherosa Ziehau [Mon, 5 May 2008 11:07:48 +0000 (11:07 +0000)]
Use mask instead of modulo, since bo->backoff is always a power of 2

Suggested-by: dillon@
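
For reference, the identity behind this change is that for a power of
two n, x % n equals x & (n - 1) for unsigned x.  A minimal sketch; the
helper name reduce_pow2 is hypothetical and does not appear in the
tree:

```c
#include <assert.h>
#include <stdint.h>

/* Reduce value into [0, limit); limit must be a power of two. */
static inline uint32_t
reduce_pow2(uint32_t value, uint32_t limit)
{
	/* A power of two has exactly one bit set. */
	assert(limit != 0 && (limit & (limit - 1)) == 0);

	/* Same result as value % limit, without a division. */
	return value & (limit - 1);
}
```

The mask form is only valid while the power-of-two invariant holds,
hence the assertion.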
12 years agoHAMMER 40G/Many: UNDO cleanup & stabilization.
Matthew Dillon [Sun, 4 May 2008 19:57:42 +0000 (19:57 +0000)]
HAMMER 40G/Many: UNDO cleanup & stabilization.

* Fix a race in the undo record allocator that could result in a
  corrupted UNDO FIFO.

* Fix improperly placed calls to hammer_test_inode().

* Properly account for nlinks when a deleted ADD record is to be
  converted to a DEL record by the flush.  In this case the frontend's
  notion of nlinks accounts for the deletion but the backend must sync
  the record anyway, so the backend needs to bump the link count by one.

12 years agoHAMMER Utilities: enhanced show, timeout option
Matthew Dillon [Sun, 4 May 2008 19:18:17 +0000 (19:18 +0000)]
HAMMER Utilities: enhanced show, timeout option

* Enhance the show command when used with -vvv.  The command now reports
  directory entries and basic information about inodes.

* Add the [-t timeout] option.  The idea is to use this to limit the amount
  of time hammer spends reblocking or pruning a filesystem when running the
  command from a cron job.

* Adjust the format of the softlink option to be more consistent.

12 years agoAdjust to our current directory layout on pkgbox.
Sascha Wildner [Sun, 4 May 2008 17:07:49 +0000 (17:07 +0000)]
Adjust to our current directory layout on pkgbox.

Reported-by: aggelos and others
12 years agoHAMMER 40F/Many: UNDO cleanup & stabilization.
Matthew Dillon [Sun, 4 May 2008 09:06:45 +0000 (09:06 +0000)]
HAMMER 40F/Many: UNDO cleanup & stabilization.

* Properly classify UNDO zone buffers so they are flushed at the correct
  point in time.

* Minor rewrite of the code tracking the UNDO demark for the next flush.

* Introduce a considerably better backend flushing activation algorithm
  to avoid single-buffer flushes.

* Put a lock around the freemap allocator.

12 years agoThe direct-write pipe code has a bug in it somewhere when the system is
Matthew Dillon [Sun, 4 May 2008 08:42:03 +0000 (08:42 +0000)]
The direct-write pipe code has a bug in it somewhere when the system is
paging heavily.  Disable it for now.

12 years ago- Randomize spinlock exponential backoff value, which reduces the chance of
Sepherosa Ziehau [Sun, 4 May 2008 04:48:47 +0000 (04:48 +0000)]
- Randomize spinlock exponential backoff value, which reduces the chance of
  serious spinlock contention (probably) caused by same backoff steps
- Ktr spinlock backoff value and backoff failure
- Under debug sysctl tree, add sysctl node for spinlock backoff limit
- Break long lines

Reviewed-by: dillon@
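
A randomized exponential backoff of the kind described above might look
like the following.  This is a sketch under assumptions, not the
spinlock code: struct backoff_state, its limit field and backoff_next()
are hypothetical, and the caller supplies the random bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical backoff state; the window is kept a power of two. */
struct backoff_state {
	uint32_t backoff;	/* current window size */
	uint32_t limit;		/* assumed upper bound for the window */
};

/*
 * Pick a random spin count in [1, window], then double the window
 * until it reaches the limit.  Contending CPUs that would otherwise
 * back off in lockstep end up with different delays.
 */
static uint32_t
backoff_next(struct backoff_state *bo, uint32_t random_bits)
{
	assert(bo->backoff != 0 && (bo->backoff & (bo->backoff - 1)) == 0);

	uint32_t spins = (random_bits & (bo->backoff - 1)) + 1;

	if (bo->backoff < bo->limit)
		bo->backoff <<= 1;
	return spins;
}
```

Without the random term every waiter would use the same sequence of
delays and could keep colliding at each step.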
12 years agoAdd more missing entries and fix more mistakes (some of which I introduced
Sascha Wildner [Sun, 4 May 2008 04:17:11 +0000 (04:17 +0000)]
Add more missing entries and fix more mistakes (some of which I introduced
in my last commit).

12 years agoFix some mistakes and add some missing entries.
Sascha Wildner [Sun, 4 May 2008 03:44:07 +0000 (03:44 +0000)]
Fix some mistakes and add some missing entries.

12 years agoPrint the 64 bit inode as a 64 bit quantity rather than a 32 bit quantity.
Matthew Dillon [Sun, 4 May 2008 02:28:28 +0000 (02:28 +0000)]
Print the 64 bit inode as a 64 bit quantity rather than a 32 bit quantity.

12 years agoAdd missing names and MLINKS.
Sascha Wildner [Sun, 4 May 2008 00:55:01 +0000 (00:55 +0000)]
Add missing names and MLINKS.

12 years agoCorrect a bug in seekdir/readdir which could cause the directory entry
Matthew Dillon [Sat, 3 May 2008 22:07:37 +0000 (22:07 +0000)]
Correct a bug in seekdir/readdir which could cause the directory entry
after a deleted entry to be skipped when seeking past the deleted entry.

NOTE: DragonFly has a specific issue even after this fix which currently
causes seekdirs to be unreliable if any files are deleted.  DragonFly
translates directory entries into a filesystem-independent form and if
the real filesystem collapses the entry, the offsets will not be maintained
in the machine-independent form.

Submitted-by: Marc Balmer <marc@msys.ch>
12 years agoHAMMER 40F/Many: Inode/link-count sequencer cleanup pass, UNDO cache.
Matthew Dillon [Sat, 3 May 2008 20:21:20 +0000 (20:21 +0000)]
HAMMER 40F/Many: Inode/link-count sequencer cleanup pass, UNDO cache.

* Implement an UNDO cache.  If we have already laid down an UNDO in the
  current flush cycle we do not have to lay down another one for the same
  address.  This greatly reduces the number of UNDOs we generate during
  a flush.

* Properly get the vnode in order to be able to issue vfsync()'s from the
  backend.  We may also have to acquire the vnode when doing an unload
  check for a file deletion.

* Properly generate UNDO records for the volume header.  During crash recovery
  we have to UNDO the volume header along with any partially written
  meta-data, because the volume header refers to the meta-data.

* Add another record type, GENERAL, representing inode or softlink records.

* Move the setting of HAMMER_INODE_WRITE_ALT to the backend, allowing
  the kernel to flush buffers up to the point where the backend syncs
  the inode.
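
The UNDO cache from the first item can be pictured as a small table
that is reset at the start of each flush cycle.  The sketch below is an
assumption-laden illustration, not HAMMER's implementation: the slot
count, the 16KB granularity, and every name in it are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define UNDO_CACHE_SLOTS	64	/* hypothetical size, power of two */
#define UNDO_BUF_SHIFT		14	/* hypothetical 16KB granularity */

struct undo_cache {
	uint64_t offset[UNDO_CACHE_SLOTS];
	bool	 valid[UNDO_CACHE_SLOTS];
};

/* Called at the start of each flush cycle. */
static void
undo_cache_reset(struct undo_cache *uc)
{
	memset(uc, 0, sizeof(*uc));
}

/*
 * Returns true if an UNDO record still has to be laid down for this
 * offset, false if one was already recorded in the current cycle.
 * A collision evicts the older entry, which only costs a redundant
 * UNDO later.
 */
static bool
undo_cache_need(struct undo_cache *uc, uint64_t offset)
{
	size_t slot = (size_t)((offset >> UNDO_BUF_SHIFT) &
	    (UNDO_CACHE_SLOTS - 1));

	if (uc->valid[slot] && uc->offset[slot] == offset)
		return false;
	uc->offset[slot] = offset;
	uc->valid[slot] = true;
	return true;
}
```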

12 years agoHAMMER 40E/Many: Inode/link-count sequencer cleanup pass.
Matthew Dillon [Sat, 3 May 2008 07:59:06 +0000 (07:59 +0000)]
HAMMER 40E/Many: Inode/link-count sequencer cleanup pass.

* An inode can go inactive before it is deleted, add an unload check
  in hammer_ip_del_directory to catch the nlinks == 0 case on an inactive
  inode.  Otherwise the inode would not be deleted on-media until umount.

* Add a missing resignaling case.

* Clean out a few more of the debug kprintf()'s

12 years agoHAMMER 40D/Many: Inode/link-count sequencer cleanup pass.
Matthew Dillon [Sat, 3 May 2008 05:28:55 +0000 (05:28 +0000)]
HAMMER 40D/Many: Inode/link-count sequencer cleanup pass.

* Move the vfsync from the frontend to the backend.  This allows the
  frontend to passively move inodes to the backend without having to
  actually start the flush, greatly improving performance.

* Use an inode lock to deal with directory entry syncing races between
  the frontend and the backend.  It isn't optimal but it's ok for now.

* Massively optimize the backend code by initializing a single cursor
  for an inode and passing the cursor to procedures, instead of having
  each procedure initialize its own cursor.

* Fix a sequencing issue with the backend.  While building the flush
  state for an inode another process could get in and initiate its own
  flush, screwing up the flush group and creating confusion.
  (hmp->flusher_lock)

* Don't lose track of HAMMER_FLUSH_SIGNAL flush requests.  If we get
  such a request but have to flag a reflush, also flag that the reflush
  is to be signaled (done immediately when the current flush is done).

* Remove shared inode locks from hammer_vnops.c.  Their original purpose
  no longer exists.

* Simplify the arguments passed to numerous procedures (hammer_ip_first(),
  etc).

12 years agoPrint the path even if we do not understand the filesystem type.
Matthew Dillon [Sat, 3 May 2008 04:13:12 +0000 (04:13 +0000)]
Print the path even if we do not understand the filesystem type.

Fix a switch/case compiler warning.

12 years agoElaborate a bit more on lexical conventions and ISA device configuration.
Sascha Wildner [Fri, 2 May 2008 22:10:58 +0000 (22:10 +0000)]
Elaborate a bit more on lexical conventions and ISA device configuration.

Taken-from: FreeBSD

12 years agoHAMMER 40C/Many: Inode/link-count sequencer cleanup pass.
Matthew Dillon [Fri, 2 May 2008 16:41:26 +0000 (16:41 +0000)]
HAMMER 40C/Many: Inode/link-count sequencer cleanup pass.

* Fix a forever-syncing inode issue by properly clearing the XDIRTY flag
  when the last record is removed from ip->rec_tree.

12 years ago- Put exit ktr in proper place
Sepherosa Ziehau [Fri, 2 May 2008 11:17:19 +0000 (11:17 +0000)]
- Put exit ktr in proper place
- Add sleep_{beg,end} and wakeup_{beg,end} ktr

12 years agoWhite space
Sepherosa Ziehau [Fri, 2 May 2008 10:57:33 +0000 (10:57 +0000)]
White space

12 years agoUse a list with tags.
Sascha Wildner [Fri, 2 May 2008 10:46:33 +0000 (10:46 +0000)]
Use a list with tags.

12 years agoIntroduce ETHER_INPUT_CHAIN option:
Sepherosa Ziehau [Fri, 2 May 2008 07:40:32 +0000 (07:40 +0000)]
Introduce ETHER_INPUT_CHAIN option:
1) During RXEOF, we aggregate packets, which have same target CPU, instead of
   calling lwkt_sendmsg() for each input packet.
2) At the end of RXEOF, low level ipiq sending is used to dispatch mbuf chain
   to the target CPU.
3) On the target CPU, the ipi function puts mbuf to their belonging msgport.
   Note, though lwkt_sendmsg() is used in ipi function, no further ipi activity
   will happen, since we are on target CPU.

em(4) is made aware of this option.
This option is off by default and has no effect on vlan(4) operation.
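
The aggregation in steps 1 and 2 can be sketched with plain linked
lists.  This is an illustration only: struct pkt stands in for an mbuf,
NCPU, struct rx_chains and the chains_* helpers are hypothetical, and
the send callback stands in for the low level ipiq dispatch.

```c
#include <assert.h>
#include <stddef.h>

#define NCPU	4	/* hypothetical CPU count for the sketch */

/* Stand-in for an mbuf that already knows its target CPU. */
struct pkt {
	int		 target_cpu;
	struct pkt	*next;
};

/* One chain per target CPU, with tail pointers for O(1) append. */
struct rx_chains {
	struct pkt	 *head[NCPU];
	struct pkt	**tailp[NCPU];
};

static void
chains_init(struct rx_chains *c)
{
	for (int cpu = 0; cpu < NCPU; cpu++) {
		c->head[cpu] = NULL;
		c->tailp[cpu] = &c->head[cpu];
	}
}

/* During RXEOF: queue the packet instead of sending a message now. */
static void
chains_add(struct rx_chains *c, struct pkt *p)
{
	assert(p->target_cpu >= 0 && p->target_cpu < NCPU);
	p->next = NULL;
	*c->tailp[p->target_cpu] = p;
	c->tailp[p->target_cpu] = &p->next;
}

/*
 * At the end of RXEOF: hand each non-empty chain to its CPU with one
 * dispatch.  Returns the number of dispatches made.
 */
static int
chains_flush(struct rx_chains *c, void (*send)(int cpu, struct pkt *chain))
{
	int sends = 0;

	for (int cpu = 0; cpu < NCPU; cpu++) {
		if (c->head[cpu] == NULL)
			continue;
		if (send != NULL)
			send(cpu, c->head[cpu]);
		c->head[cpu] = NULL;
		c->tailp[cpu] = &c->head[cpu];
		sends++;
	}
	return sends;
}
```

Five packets spread over two CPUs then cost two dispatches instead of
five per-packet messages.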

12 years agoHAMMER 40B/Many: Inode/link-count sequencer cleanup pass.
Matthew Dillon [Fri, 2 May 2008 06:51:57 +0000 (06:51 +0000)]
HAMMER 40B/Many: Inode/link-count sequencer cleanup pass.

* Fix data record leakage w/ final inode disposition on disk.

* Fix numerous live locks with infinitely re-syncing inodes.

12 years agoSweep over our manual pages and remove .Pp before a .Bd or .Bl without
Sascha Wildner [Fri, 2 May 2008 02:05:08 +0000 (02:05 +0000)]
Sweep over our manual pages and remove .Pp before a .Bd or .Bl without
-compact because it has no effect.

12 years agoHAMMER 40A/Many: Inode/link-count sequencer.
Matthew Dillon [Fri, 2 May 2008 01:00:42 +0000 (01:00 +0000)]
HAMMER 40A/Many: Inode/link-count sequencer.

* Remove the hammer_depend structure and build the dependencies directly
  into the hammer_record structure.

* Attempt to implement layout rules to ensure connectivity is maintained.
  This means, for example, that before HAMMER can flush a newly created
  file it will make sure the file has namespace connectivity to the
  directory it was created in, recursively to the root.

NOTE: 40A destabilizes the filesystem a bit, it's going to take a few
passes to get everything working properly.  There are numerous issues
with this commit.

12 years agoProperly yield to userland processes.
Simon Schubert [Fri, 2 May 2008 00:19:52 +0000 (00:19 +0000)]
Properly yield to userland processes.

12 years agoMove text that doesn't belong to a list outside of it.
Sascha Wildner [Thu, 1 May 2008 23:36:43 +0000 (23:36 +0000)]
Move text that doesn't belong to a list outside of it.

12 years agoAdd .It
Sascha Wildner [Thu, 1 May 2008 23:29:10 +0000 (23:29 +0000)]
Add .It

12 years agoMove .Pp outside of .Bl
Sascha Wildner [Thu, 1 May 2008 22:06:06 +0000 (22:06 +0000)]
Move .Pp outside of .Bl

12 years agoReduce vertical space.
Sascha Wildner [Thu, 1 May 2008 21:51:43 +0000 (21:51 +0000)]
Reduce vertical space.

12 years agoRemove some obsolete lines.
Sascha Wildner [Thu, 1 May 2008 20:24:01 +0000 (20:24 +0000)]
Remove some obsolete lines.

12 years agoTurn off yy_flex_realloc() related warnings (such as the one issued when
Sascha Wildner [Thu, 1 May 2008 20:01:24 +0000 (20:01 +0000)]
Turn off yy_flex_realloc() related warnings (such as the one issued when
building usr.sbin/config) by marking the function unused.

There are a number of things which decide whether it's used or not, such
as using REJECT, %option yylineno, and some trailing context patterns.

See NetBSD's revisions 1.10 & 1.19.

Taken-from: NetBSD

12 years agoSet a sensible mode on /etc/upgrade/Makefile_upgrade.inc .
Thomas E. Spanjaard [Thu, 1 May 2008 19:44:37 +0000 (19:44 +0000)]
Set a sensible mode on /etc/upgrade/Makefile_upgrade.inc .

12 years agoRegenerate the pciconf(8) database from the following files:
Sascha Wildner [Thu, 1 May 2008 18:02:45 +0000 (18:02 +0000)]
Regenerate the pciconf(8) database from the following files:

Hart:    Jan 22, 2008 (version 671)
Boemler: May  1, 2008
Mares:   Mar  1, 2008

12 years agoAdd FreeBSD 7.1 (which is already referenced in cmx.4).
Sascha Wildner [Thu, 1 May 2008 13:04:51 +0000 (13:04 +0000)]
Add FreeBSD 7.1 (which is already referenced in cmx.4).

12 years agoMention that BCM430[69] chips do not work properly on channel 1/2/3
Sepherosa Ziehau [Thu, 1 May 2008 12:34:06 +0000 (12:34 +0000)]
Mention that BCM430[69] chips do not work properly on channel 1/2/3

12 years agoSync with FreeBSD (adds OpenBSD 4.3).
Sascha Wildner [Thu, 1 May 2008 12:27:18 +0000 (12:27 +0000)]
Sync with FreeBSD (adds OpenBSD 4.3).

12 years agoktr the end of various ipiq sending operations.
Sepherosa Ziehau [Thu, 1 May 2008 09:37:48 +0000 (09:37 +0000)]
ktr the end of various ipiq sending operations.

12 years agoRemove obsolete keywords: conflicts, controller, disk, tape
Sascha Wildner [Thu, 1 May 2008 09:24:42 +0000 (09:24 +0000)]
Remove obsolete keywords: conflicts, controller, disk, tape

Remove obsolete option: -n

12 years agoktr cpu_send_ipiq
Sepherosa Ziehau [Thu, 1 May 2008 02:11:39 +0000 (02:11 +0000)]
ktr cpu_send_ipiq

12 years ago- Promote em(4) polling begin/end ktr into polling(4)
Sepherosa Ziehau [Thu, 1 May 2008 02:03:28 +0000 (02:03 +0000)]
- Promote em(4) polling begin/end ktr into polling(4)
- Add crit section around if_poll

12 years agoEnforce proper sequencing of world and kernel targets.
Simon Schubert [Wed, 30 Apr 2008 23:05:33 +0000 (23:05 +0000)]
Enforce proper sequencing of world and kernel targets.

.ORDER: does *not* take an arbitrary list of targets of which all pairs
are supposed to be built in their specified sequence,
instead it specifies which adjacent pairs need to be built in sequence.
As a result, given a sequence "buildworld buildkernel quickkernel" and
the make targets "buildworld" and "quickkernel", make would still
parallelize the build of these targets.

Additionally, introduce quickworld to the sequencing.

12 years ago* Mention that bmake must be used for pkgsrc.
Sascha Wildner [Wed, 30 Apr 2008 21:45:28 +0000 (21:45 +0000)]
* Mention that bmake must be used for pkgsrc.

* Add references to pkg_radd(1) and pkg_search(1).

* Adjust documentation URL (taken from NetBSD).

12 years agoHave vfsync() call buf_checkwrite() on buffers with bioops to determine
Matthew Dillon [Wed, 30 Apr 2008 17:34:11 +0000 (17:34 +0000)]
Have vfsync() call buf_checkwrite() on buffers with bioops to determine
whether it is ok to write out a buffer or not.  Used by HAMMER to prevent
specfs from syncing out meta-data at the wrong time.

12 years agoAdd pmap_unmapdev() calls to detach functions for drivers which used
Matthew Dillon [Wed, 30 Apr 2008 17:28:17 +0000 (17:28 +0000)]
Add pmap_unmapdev() calls to detach functions for drivers which used
pmap_mapdev(), when possible.

12 years agoCothreads do not have a globaldata context and cannot handle signals
Matthew Dillon [Wed, 30 Apr 2008 16:59:45 +0000 (16:59 +0000)]
Cothreads do not have a globaldata context and cannot handle signals
which require one.  Add SIGTERM, SIGWINCH, and SIGUSR2 to the list of
signals cothreads mask.

Reported-by: Rumko <rumcic@gmail.com>
12 years agoAdd tunable for each_burst.
Sepherosa Ziehau [Wed, 30 Apr 2008 09:30:59 +0000 (09:30 +0000)]
Add tunable for each_burst.

12 years agoAdd KTR_TESTLOG to LINT.
Sascha Wildner [Wed, 30 Apr 2008 09:15:08 +0000 (09:15 +0000)]
Add KTR_TESTLOG to LINT.

Mention KTR_SERIALIZER & KTR_TESTLOG in ktr(4).

12 years agoChange the SMP wakeup() code to send an IPI to the target cpu's in parallel
Matthew Dillon [Wed, 30 Apr 2008 04:19:57 +0000 (04:19 +0000)]
Change the SMP wakeup() code to send an IPI to the target cpu's in parallel
instead of chaining the message.  This fixes a stack depth assertion in the
IPI processing code that Sephe was hitting in his network work.  The target
cpu _wakeup() code no longer recurses the IPI subsystem.

Reported-by: "Sepherosa Ziehau" <sepherosa@gmail.com>
12 years agoAdd some assertions when a buffer is reused
Matthew Dillon [Wed, 30 Apr 2008 04:11:44 +0000 (04:11 +0000)]
Add some assertions when a buffer is reused

12 years agoThe driver was improperly using kmem_free() instead of pmap_unmapdev(),
Matthew Dillon [Wed, 30 Apr 2008 04:05:21 +0000 (04:05 +0000)]
The driver was improperly using kmem_free() instead of pmap_unmapdev(),
and hitting a recently added assertion.  Use the proper call.

Reported-by: Rumko <rumcic@gmail.com>
12 years agoKTR various serializer operations
Sepherosa Ziehau [Tue, 29 Apr 2008 16:00:12 +0000 (16:00 +0000)]
KTR various serializer operations

12 years agoThree int arguments are used in IPIQ_STRING
Sepherosa Ziehau [Tue, 29 Apr 2008 14:26:09 +0000 (14:26 +0000)]
Three int arguments are used in IPIQ_STRING

12 years agoRemove unneeded argument.
Sascha Wildner [Tue, 29 Apr 2008 11:59:08 +0000 (11:59 +0000)]
Remove unneeded argument.

12 years agoUse 'MS-DOS' and not 'MS DOS' or 'MSDOS'.
Sascha Wildner [Tue, 29 Apr 2008 09:33:41 +0000 (09:33 +0000)]
Use 'MS-DOS' and not 'MS DOS' or 'MSDOS'.

12 years agoFix section ref.
Sascha Wildner [Tue, 29 Apr 2008 09:02:45 +0000 (09:02 +0000)]
Fix section ref.

12 years agoHAMMER 39B/Many: Cleanup pass
Matthew Dillon [Tue, 29 Apr 2008 04:43:08 +0000 (04:43 +0000)]
HAMMER 39B/Many: Cleanup pass

* Correct an issue where an inode flush was not being immediately picked
  up by the flusher, causing frontend I/O to stall.

12 years agoHAMMER Utilities: zone limit
Matthew Dillon [Tue, 29 Apr 2008 01:11:02 +0000 (01:11 +0000)]
HAMMER Utilities: zone limit

* newfs_hammer now sets a default zone limit in the volume header.

12 years agoHAMMER 39/Many: Parallel operations optimizations
Matthew Dillon [Tue, 29 Apr 2008 01:10:37 +0000 (01:10 +0000)]
HAMMER 39/Many: Parallel operations optimizations

* Implement a per-directory cache of new object IDs.  Up to 128 directories
  will be managed in LRU fashion.  The cache provides a pool of object
  IDs to better localize the object ids of files created in a directory,
  so parallel operations on the filesystem do not create a fragmented
  object id space.

* Cache numerous fields in the root volume's header to avoid creating
  undo records for them, greatly improving

  (ultimately we can sync an undo space representing the volume header
  using a direct comparison mechanic but for now we assume the write of
  the volume header to be atomic).

* Implement a zone limit for the blockmap which newfs_hammer can install.
  The blockmap zones have an ultimate limit of 2^60 bytes, or around
  one million terabytes.  If you create a 100G filesystem there is no
  reason to let the blockmap iterate over its entire range as that would
  result in a lot of fragmentation and blockmap overhead.  By default
  newfs_hammer sets the zone limit to 100x the size of the filesystem.

* Fix a bug in the crash recovery code.  Do not sync newly added inodes
  once the flusher is running, otherwise the volume header can get out
  of sync.  Just create a dummy marker structure and move it to the tail
  of the inode flush_list when the flush starts, and stop when we hit it.

* Adjust hammer_vfs_sync() to sync twice.  The second sync is needed to
  update the volume header's undo fifo indices, otherwise HAMMER will
  believe that it must undo the last fully synchronized flush.
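
The default zone limit described above (100x the filesystem size, under
an ultimate limit of 2^60 bytes) works out as in the sketch below.  The
function name and the clamping to 2^60 are assumptions made for the
example; the commit only states the two numbers.

```c
#include <assert.h>
#include <stdint.h>

/* Ultimate blockmap zone limit stated in the commit: 2^60 bytes. */
#define ZONE_MAX_BYTES	(UINT64_C(1) << 60)

/*
 * Default zone limit: 100x the filesystem size.  Clamping to the
 * ultimate limit is an assumption of this sketch.
 */
static uint64_t
default_zone_limit(uint64_t fs_bytes)
{
	assert(fs_bytes != 0);

	/* Compare before multiplying so the product cannot overflow. */
	if (fs_bytes > ZONE_MAX_BYTES / 100)
		return ZONE_MAX_BYTES;
	return fs_bytes * 100;
}
```

For a 100G filesystem this gives a zone limit of roughly 10T, a small
fraction of the 2^60 byte range the blockmap could otherwise iterate
over.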

12 years agoPaging and swapping system fixes.
Matthew Dillon [Mon, 28 Apr 2008 21:16:27 +0000 (21:16 +0000)]
Paging and swapping system fixes.

* Do not try to free a VM page after a failed IO read from swap.  It is
  illegal to free a VM page from an interrupt.  Just deactivate it instead.

* Do not attempt to move a VM page into the cache queue after a successful
  pageout from the vnode or swap pagers, and do not try to adjust page
  protections to read-only (they should already be read-only).  Both
  operations require making serious pmap calls which we really do not
  want to do from an interrupt.

  Instead, leave the page on its current queue or, if the system is low
  on pages, deactivate the page.

The pmap protection code is supposed to be runnable from an interrupt but
testing with vkernels shows program corruption occurring under severe paging
loads.  Pmap protection changes were only being made from pageout interrupts.
brelse() itself, which can also be called from an interrupt via biodone(),
does not make such changes for asynchronous I/O.

With these changes in place the program corruption stopped or has been
greatly reduced.  Further testing in a 64MB vkernel environment is ongoing.

In addition, trying to move the page after a completed pageout/swapout
to the cache queue was improperly depressing the priority of read-heavy
pages.  Under severe paging loads we now only deactivate the page.  Plus
moving a page to the cache queue causes pmap operations to be run which
we again do not want to run from an interrupt.

12 years agoFix various names of defined values.
Sascha Wildner [Mon, 28 Apr 2008 21:16:17 +0000 (21:16 +0000)]
Fix various names of defined values.

12 years agoKTR_TESTLOG is a valid kernel option (it enables the KTR ipi performance
Matthew Dillon [Mon, 28 Apr 2008 20:00:18 +0000 (20:00 +0000)]
KTR_TESTLOG is a valid kernel option (it enables the KTR ipi performance
testing sysctls).

12 years agoFix a NULL pointer dereference in the statistics collecting code as
Matthew Dillon [Mon, 28 Apr 2008 18:04:08 +0000 (18:04 +0000)]
Fix a NULL pointer dereference in the statistics collecting code as
used by 'systat -vm 1'.  p->p_vmspace can be NULL.

12 years agoFix typo: spam -> span
Sascha Wildner [Mon, 28 Apr 2008 09:38:38 +0000 (09:38 +0000)]
Fix typo: spam -> span

Reported-by: sjg (on #dragonflybsd)
12 years agoMinor code reordering and documentation adjustments.
Matthew Dillon [Mon, 28 Apr 2008 07:07:02 +0000 (07:07 +0000)]
Minor code reordering and documentation adjustments.

12 years agoFix some pmap races in pc32 and vkernel, and other vkernel issues.
Matthew Dillon [Mon, 28 Apr 2008 07:05:09 +0000 (07:05 +0000)]
Fix some pmap races in pc32 and vkernel, and other vkernel issues.

* Fix a case where a vm_page_sleep_busy() loop can cause a page's hold_count
  to go stale, potentially resulting in a double action being taken on
  the page.  This can only occur during very heavy paging loads.  Fix
  the issue by doing the vm_page_unhold() after the blocking condition
  instead of before.  This fix applies to both the vkernel and pc32.

* The pmap_allocpte() call in pmap_copy() can block, causing the cached
  page directory page to become stale.  Detect the case and take appropriate
  action.  This fix applies to both the vkernel and pc32.

* Add numerous KKASSERT()s to assert that the pmap tracking counters are
  correct, to try to detect races in the future.

* Fix a bug in the vkernel's signal handling. We have to catch SIGBUS as
  well as SIGSEGV.  We also have to tell the signal handler to not
  block the signal on entry or a page fault in the vkernel itself can
  cause the signal to be masked, the vkernel may then block and switch
  threads, and another page fault in the vkernel itself would wind up
  being masked.  The result is an endless loop in the vkernel retrying the
  same faulting instruction forever (until you kill the vkernel).
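
The signal setup in the last item can be sketched with POSIX
sigaction().  This is a userland illustration, not the vkernel source:
the function names are hypothetical, and it assumes SA_NODEFER is the
mechanism meant by "not block the signal on entry".

```c
#define _POSIX_C_SOURCE 200809L	/* for sigaction() and siginfo_t */

#include <signal.h>
#include <stdbool.h>
#include <string.h>

/* Placeholder handler; a vkernel would resolve the page fault here. */
static void
fault_handler(int sig, siginfo_t *info, void *ctx)
{
	(void)sig;
	(void)info;
	(void)ctx;
}

/*
 * Catch both SIGSEGV and SIGBUS.  SA_NODEFER keeps the signal
 * unblocked while the handler runs, so a nested fault is delivered
 * instead of being masked.  Returns 0 on success, -1 on failure.
 */
static int
install_fault_handlers(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = fault_handler;
	sa.sa_flags = SA_SIGINFO | SA_NODEFER;
	sigemptyset(&sa.sa_mask);

	if (sigaction(SIGSEGV, &sa, NULL) != 0)
		return -1;
	if (sigaction(SIGBUS, &sa, NULL) != 0)
		return -1;
	return 0;
}

/* Read back the disposition and report whether SA_NODEFER is set. */
static bool
fault_handler_is_nodefer(int sig)
{
	struct sigaction cur;

	if (sigaction(sig, NULL, &cur) != 0)
		return false;
	return (cur.sa_flags & SA_NODEFER) != 0;
}
```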

12 years agoHAMMER 38F/Many: Undo/Synchronization and crash recovery, stabilization pass
Matthew Dillon [Sun, 27 Apr 2008 21:07:15 +0000 (21:07 +0000)]
HAMMER 38F/Many: Undo/Synchronization and crash recovery, stabilization pass

* Fix a bug in the front-end's cached truncation off (ip->trunc_off).

* Fix a bug in the memory record visibility check when called from the backend.

12 years agoAdd basic support for 8111C; hardware checksum offload does not seem to work
Sepherosa Ziehau [Sun, 27 Apr 2008 15:10:37 +0000 (15:10 +0000)]
Add basic support for 8111C; hardware checksum offload does not seem to work
on 8111C yet.

12 years agoPrint unknown hardware version.
Sepherosa Ziehau [Sun, 27 Apr 2008 14:18:16 +0000 (14:18 +0000)]
Print unknown hardware version.

12 years agoHAMMER 38E/Many: Undo/Synchronization and crash recovery
Matthew Dillon [Sun, 27 Apr 2008 00:45:37 +0000 (00:45 +0000)]
HAMMER 38E/Many: Undo/Synchronization and crash recovery

* Fix a couple of deadlocks.

* Fix a kernel buffer cache exhaustion issue.

* Get the 'hammer prune' and 'hammer reblock' commands working again.  The
  commands are now properly synchronized for crash recovery.

12 years agoHAMMER utilities: Misc documentation and new options.
Matthew Dillon [Sun, 27 Apr 2008 00:43:57 +0000 (00:43 +0000)]
HAMMER utilities: Misc documentation and new options.

* Add the -u <undoareasize> option.  This option allows the size of the
  undo FIFO buffer to be specified.

* Document missing newfs_hammer's options.

12 years agoRevert rev 1.40, which will cause deadlock, if task's function tries to
Sepherosa Ziehau [Sat, 26 Apr 2008 23:09:40 +0000 (23:09 +0000)]
Revert rev 1.40, which will cause deadlock, if task's function tries to
enqueue itself.

Approved-by: dillon@
12 years agoHAMMER 38E/Many: Undo/Synchronization and crash recovery
Matthew Dillon [Sat, 26 Apr 2008 19:08:14 +0000 (19:08 +0000)]
HAMMER 38E/Many: Undo/Synchronization and crash recovery

* Clean up interlocks between the frontend and backend.

* Deal with the case where the backend needs to sync a record to disk that
  the frontend wishes to delete.  This basically just involves converting
  the record from a deleted in-memory record to a delete-on-disk record,
  so the synced record does not become visible to userland.

* Deal with the case when an inode is being destroyed where the backend
  wishes to delete an in-memory record without syncing it to disk.

* Document numerous special cases for future attention.

12 years agowi(4) depends on wlan(4)
Sepherosa Ziehau [Sat, 26 Apr 2008 14:11:06 +0000 (14:11 +0000)]
wi(4) depends on wlan(4)

12 years agoDon't do the following optimization in udp_disconnect():
Sepherosa Ziehau [Sat, 26 Apr 2008 14:08:52 +0000 (14:08 +0000)]
Don't do the following optimization in udp_disconnect():
Conditionally free cached pcb route entry by predicting new laddr.

During soclose() on a connected UDP socket, this optimization will cause
the cached pcb route entry to be freed on the wrong CPU, since f{port,addr}
have been changed.

Fix comment in udp_connect().

12 years agoRemove old FreeBSD upgrade tools.
Sascha Wildner [Sat, 26 Apr 2008 09:49:50 +0000 (09:49 +0000)]
Remove old FreeBSD upgrade tools.

12 years agoFix compilation in tools/tools/crypto and fix some minor issues.
Sascha Wildner [Sat, 26 Apr 2008 09:19:10 +0000 (09:19 +0000)]
Fix compilation in tools/tools/crypto and fix some minor issues.

12 years agoHAMMER 38E/Many: Undo/Synchronization and crash recovery
Matthew Dillon [Sat, 26 Apr 2008 08:02:17 +0000 (08:02 +0000)]
HAMMER 38E/Many: Undo/Synchronization and crash recovery

* Add record<->inode dependencies for file creation and deletion.  If a
  directory entry representing a new file is synced out, the file is also
  synced out at the same time, and vice versa.

* Dirty reclaimed inodes are now forwarded to the flusher, which should
  prevent leaks of hammer_inode structures.  (Still needs work).

* Force finalization if the undo fifo becomes more than half full.
  This can currently break dependencies.  (Still needs work).

* Misc stabilization fixes to recent commits.

12 years agoHAMMER 38D/Many: Undo/Synchronization and crash recovery
Matthew Dillon [Sat, 26 Apr 2008 02:54:00 +0000 (02:54 +0000)]
HAMMER 38D/Many: Undo/Synchronization and crash recovery

* The flusher now waits for I/O to complete at the appropriate points.

* Implement instant crash recovery.  The UNDO FIFO is scanned backwards
  and reapplied to the filesystem on mount.  There is still more work
  to do here; inode<->inode associations (e.g. directory entry vs file)
  are not yet bound together.

* Clean up I/O sequencing a lot and get rid of a ton of unnecessary flusher
  wakeups.
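
The backwards UNDO scan described above can be illustrated with a minimal
sketch.  This is not HAMMER's code: struct undo_rec, its fixed 8-byte
payload, and undo_replay() are hypothetical names invented for the example,
and the "media" is just a byte array.  The point it shows is why the scan
runs newest-to-oldest: when the same location was modified more than once,
the oldest saved image must be applied last so that it wins.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical undo record (an assumption for this sketch): the bytes
 * that were stored at `offset` before a modification was made.
 */
struct undo_rec {
	size_t		offset;		/* where the old bytes came from */
	size_t		len;		/* number of valid bytes in old[] */
	unsigned char	old[8];		/* pre-modification contents */
};

/*
 * Replay the log from the newest record to the oldest, restoring the
 * saved contents.  Records are assumed to be stored oldest-first.
 */
static void
undo_replay(unsigned char *media, size_t media_size,
	    const struct undo_rec *log, size_t nrecs)
{
	size_t i;

	for (i = nrecs; i > 0; --i) {
		const struct undo_rec *r = &log[i - 1];

		assert(r->len <= sizeof(r->old));
		assert(r->offset + r->len <= media_size);
		memcpy(media + r->offset, r->old, r->len);
	}
}
```

Replaying the same records oldest-first would leave an intermediate value in
any location that was modified more than once.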

12 years agoHAMMER 38C/Many: Undo/Synchronization and crash recovery
Matthew Dillon [Fri, 25 Apr 2008 21:49:49 +0000 (21:49 +0000)]
HAMMER 38C/Many: Undo/Synchronization and crash recovery

* Classify buffers as meta, undo, or data buffers, and collect them
  into separate lists so they can be flushed in the proper order.

* Flush the META buffers and volume header only under HAMMER's direct
  control, as part of the UNDO sequencing.

* Major work on the flusher thread.  Flush the various buffer classes in
  the correct order (the synchronization points are not yet coded, however).

* Update the volume header's UNDO fifo indices.

* Add a ton of sanity checks on buffer modifications and narrow the size
  of some of the UNDO records.

* Clean-up after loose IOs.  An IO can be loose when its ref count drops
  to zero and the kernel attempts to reclaim its bp.  We can't garbage
  collect the IO in the kernel bioops callback so we have to remember
  that the IO is now loose and do it later (in the flusher).

* Temporarily comment out an allocator iterator feature which we cannot
  do right now because it may result in new data allocations overwriting
  old deletions which are still subject to UNDO.

12 years agoHAMMER 38B/Many: Undo/Synchronization and crash recovery
Matthew Dillon [Thu, 24 Apr 2008 22:05:13 +0000 (22:05 +0000)]
HAMMER 38B/Many: Undo/Synchronization and crash recovery

* Properly requeue an inode synchronization when BIOs are present on
  ip->bio_alt_list.  Fixes a sync stall.

12 years agoHAMMER 38A/Many: Undo/Synchronization and crash recovery
Matthew Dillon [Thu, 24 Apr 2008 21:20:33 +0000 (21:20 +0000)]
HAMMER 38A/Many: Undo/Synchronization and crash recovery

* Separate all frontend operations from all backend media synchronization.
  The frontend VNOPs make all changes in-memory and in the frontend
  buffer cache.  The backend buffer cache used to manage meta-data is
  not touched.

  - In-memory inode contains two copies of critical meta-data structures
  - In-memory record tree distinguishes between records undergoing
    synchronization and records not undergoing synchronization.
  - Frontend buffer cache buffers are tracked to determine which ones
    to synchronize and which ones not to.
  - Deletions are cached in-memory.  Any number of file truncations
    simply cache the lowest truncation offset and on-media records
    beyond that point are ignored.  Record deletions are cached as
    a negative entry in the in-memory record tree until the backend
    can execute the operation on the media.
  - Frontend operations continue to have full, direct read access to
    the media.

* Backend synchronization to the disk media is able to take place
  simultaneously with frontend operations on the same inodes.  This
  will need some tuning but it basically works.

* In-memory records are no longer removed from the B-Tree when deleted.
  They are marked for deletion and removed when the last reference goes
  away.

* An inode whose last reference is being released is handed over to the
  backend flusher for its final disposition.

* There are some bad hacks and debugging tests in this commit.  In
  particular, when the backend needs to do a truncation it special-cases
  any negative entries it finds in the in-memory record tree.  Also, if a
  rename operation hits a deadlock it currently breaks atomicity.

* The transaction API has been simplified.  The frontend no longer allocates
  transaction ids.  Instead the backend does a full flush with a single
  transaction id (since that is the granularity the crash recovery code will
  have anyway).
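
The cached-truncation behaviour described above ("any number of file
truncations simply cache the lowest truncation offset") can be sketched as
follows.  This is an illustration only, not the HAMMER implementation:
struct mem_inode, inode_truncate() and media_record_visible() are
hypothetical names, and a record is reduced to its starting offset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical in-memory inode state (an assumption for this sketch). */
struct mem_inode {
	bool	trunc_pending;	/* a truncation is cached in memory */
	int64_t	trunc_off;	/* lowest truncation offset seen so far */
};

/*
 * Frontend side: remember only the lowest truncation offset.  Nothing
 * is done to the media here.
 */
static void
inode_truncate(struct mem_inode *ip, int64_t new_size)
{
	assert(new_size >= 0);
	if (!ip->trunc_pending || new_size < ip->trunc_off) {
		ip->trunc_pending = true;
		ip->trunc_off = new_size;
	}
}

/*
 * Read side: an on-media record starting at or beyond the cached
 * offset is ignored until the backend has executed the truncation.
 */
static bool
media_record_visible(const struct mem_inode *ip, int64_t rec_off)
{
	return (!ip->trunc_pending || rec_off < ip->trunc_off);
}
```

Keeping a single offset means repeated truncations cost nothing extra in
memory, and the backend only has to perform one truncation on the media when
it synchronizes the inode.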

12 years agoFix panics which can occur when killing a threaded program. lwp_exit()
Matthew Dillon [Thu, 24 Apr 2008 08:53:02 +0000 (08:53 +0000)]
Fix panics which can occur when killing a threaded program.  lwp_exit()
was being called without the BGL from userret().  It needs the BGL.

12 years agoMake description of -C option more clear: describe what option does first.
Thomas Nikolajsen [Wed, 23 Apr 2008 22:09:07 +0000 (22:09 +0000)]
Make description of -C option more clear: describe what option does first.
Update usage() to version in fdisk.8.

12 years agoAdd HAMMER to disklabel.8
Thomas Nikolajsen [Wed, 23 Apr 2008 21:59:22 +0000 (21:59 +0000)]
Add HAMMER to disklabel.8
Add cross references to mount_hammer.8 and hammer.8

While here, add 'B' (Byte) to the comment part of the disklabel output, to make the unit explicit.