Matthew Dillon [Fri, 9 May 2008 16:03:27 +0000 (16:03 +0000)]
Return EINVAL if a NULL pointer is passed to the mutex routines, instead
of crashing. This appears to be what the standard intended and what libc_r
does.
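A minimal sketch of the NULL check described above. The wrapper names checked_mutex_lock() and checked_mutex_unlock() are hypothetical and this is not the library's actual code; it only illustrates returning EINVAL instead of dereferencing a NULL pointer.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/*
 * Hypothetical wrappers (not the real thread library code): reject a
 * NULL mutex pointer with EINVAL before the mutex is ever touched.
 */
int
checked_mutex_lock(pthread_mutex_t *mutex)
{
    if (mutex == NULL)
        return (EINVAL);
    return (pthread_mutex_lock(mutex));
}

int
checked_mutex_unlock(pthread_mutex_t *mutex)
{
    if (mutex == NULL)
        return (EINVAL);
    return (pthread_mutex_unlock(mutex));
}
```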
Matthew Dillon [Fri, 9 May 2008 15:49:42 +0000 (15:49 +0000)]
Sync sysperf with some random stuff, and add a cld instruction tester.
Simon Schubert [Fri, 9 May 2008 11:24:08 +0000 (11:24 +0000)]
Make the code we're running under total signal mask as short as possible.
This fixes a hang when calling pthread_create(NULL, ...), which was due
to the fact that we hit a SIGSEGV while under total signal mask, leading
to infinite page faults.
Matthew Dillon [Fri, 9 May 2008 07:26:51 +0000 (07:26 +0000)]
HAMMER 42/Many: Cleanup.
* Finish up the general code to associate an inode with a cursor.
Matthew Dillon [Fri, 9 May 2008 07:24:48 +0000 (07:24 +0000)]
Fix many bugs and issues in the VM system, particularly related to
heavy paging.
* (cleanup) PG_WRITEABLE is now set by the low level pmap code and not by
high level code. It means 'This page may contain a managed page table
mapping which is writeable', meaning that hardware can dirty the page
at any time. The page must be tested via appropriate pmap calls before
being disposed of.
* (cleanup) PG_MAPPED is now handled by the low level pmap code and only
applies to managed mappings. There is still a bit of cruft left over
related to the pmap code's page table pages but the high level code is now
clean.
* (bug) Various XIO, SFBUF, and MSFBUF routines which bypass normal paging
operations were not properly dirtying pages when the caller intended
to write to them.
* (bug) vfs_busy_pages in kern/vfs_bio.c had a busy race. Separate the code
out to ensure that we have marked all the pages as undergoing IO before we
call vm_page_protect(). vm_page_protect(... VM_PROT_NONE) can block
under very heavy paging conditions and if the pages haven't been marked
for IO that could blow up the code.
* (optimization) Make a minor optimization. When busying pages for write
IO, downgrade the page table mappings to read-only instead of removing
them entirely.
* (bug) In platform/pc32/i386/pmap.c fix various places where
pmap_inval_add() was being called at the wrong point. Only one was
critical, in pmap_enter(), where pmap_inval_add() was being called so far
away from the pmap entry being modified that it could wind up being flushed
out prior to the modification, breaking the cpusync required.
pmap.c also contains most of the work involved in the PG_MAPPED and
PG_WRITEABLE changes.
* (bug) Close numerous pte updating races with hardware setting the
modified bit. There is still one race left (in pmap_enter()).
* (bug) Disable pmap_copy() entirely. Fix most of the bugs anyway, but
there is still one left in the handling of the srcmpte variable.
* (cleanup) Change vm_page_dirty() from an inline to a real procedure, and
move the code which set the object to writeable/maybedirty into
vm_page_dirty().
* (bug) Calls to vm_page_protect(... VM_PROT_NONE) can block. Fix all cases
where this call was made with a non-busied page. All such calls are
now made with a busied page, preventing blocking races from re-dirtying
or remapping the page unexpectedly.
(Such blockages could only occur during heavy paging activity where the
underlying page table pages are being actively recycled).
* (bug) Fix the pageout code to properly mark pages as undergoing I/O before
changing their protection bits.
* (bug) Busy pages undergoing zeroing or partial zeroing in the vnode pager
(vm/vnode_pager.c) to avoid unexpected effects.
Matthew Dillon [Fri, 9 May 2008 06:38:19 +0000 (06:38 +0000)]
Fix fork/vfork statistics. forks and vforks were being improperly counted
as rforks.
Matthew Dillon [Fri, 9 May 2008 06:35:12 +0000 (06:35 +0000)]
Fix a nasty memory corruption issue which can occur due to the kernel bcopy's
use of the FP unit. If the destination address faults the NPX code can
lose track of the fact that the kernel was using the FP unit. When the
fault is resolved the kernel bcopy resumes with corrupted FP registers.
The most common situation where this could occur is with pipes, and generally
only when the system is paging heavily and causing multiple processes to
fault in the kernel FP bcopy code.
Matthew Dillon [Thu, 8 May 2008 01:41:07 +0000 (01:41 +0000)]
Fix a race between the namecache and the vnode recycler. A vnode cannot be
recycled if its namecache entry represents a directory with locked children.
The various VOP_N*() functions require the parent dvp to be stable.
The main fix is in vrecycle() (kern/vfs_subr.c). Do not vgone() the vnode
if we can't clean out the children.
Also create an API to assert that the parent dvp is stable, and make it
vhold/vdrop the dvp.
The race primarily affected HAMMER which uses the VOP_N*() API.
Matthew Dillon [Thu, 8 May 2008 01:31:01 +0000 (01:31 +0000)]
Fix some lock ordering issues in the pipe code.
In particular fix a bug in the pipe_write() code when multiple writers
are present that could cause garbage to be injected into the pipe due
to a resize possibly occurring while wpipe->pipe_buffer.cnt is non-zero.
Matthew Dillon [Thu, 8 May 2008 01:26:01 +0000 (01:26 +0000)]
Recode the resource limit core (struct plimit) to fix a few races and
generally make it work better. Also make changes with an eye towards
making it MPSAFE.
Matthew Dillon [Thu, 8 May 2008 01:21:06 +0000 (01:21 +0000)]
Clear the direction flag (CLD) on entry to the kernel, to support future
kernels compiled with GCC-4.3 or later.
Sascha Wildner [Wed, 7 May 2008 20:03:09 +0000 (20:03 +0000)]
Add some lines about lwkt_serialize_adaptive_enter().
Submitted-by: sephe
Matthew Dillon [Wed, 7 May 2008 17:26:28 +0000 (17:26 +0000)]
Bump base development version to 197700 so it is properly distinct from
the 1.12 release version.
Reported-by: Hasso Tepper
Matthew Dillon [Wed, 7 May 2008 17:19:47 +0000 (17:19 +0000)]
Correct comments and minor variable naming and sysctl issues.
Reported-by: Fabio Checconi <fabio@gandalf.sssup.it>
Matthew Dillon [Tue, 6 May 2008 21:40:40 +0000 (21:40 +0000)]
Fix a sizeof() of the wrong variable. The correct variable was the same
size so no harm done, but get it right.
Submitted-by: Fabio Checconi <fabio@gandalf.sssup.it>
Sascha Wildner [Tue, 6 May 2008 18:55:01 +0000 (18:55 +0000)]
The vkernel's maximum number of CPUs is now 16.
Matthew Dillon [Tue, 6 May 2008 18:43:02 +0000 (18:43 +0000)]
Enable kern.trap_mpsafe and kern.syscall_mpsafe by default for vkernels.
Matthew Dillon [Tue, 6 May 2008 18:37:58 +0000 (18:37 +0000)]
Remove the SMP_MAXCPU override for vkernels, causing the build to revert
to the i386 limit of 16. This is not because vkernels couldn't handle more
(up to 31), but because I want the installed world to be compatible between
vkernel and pc32.
This unbreaks programs like 'vmstat -m'.
Sepherosa Ziehau [Tue, 6 May 2008 10:05:02 +0000 (10:05 +0000)]
Add strings for some AMD features
Matthew Dillon [Tue, 6 May 2008 00:21:08 +0000 (00:21 +0000)]
HAMMER 41B/Many: Cleanup.
* Disable (most) debugging kprintfs unless a hammer debug sysctl is set.
* Do not allow buffers to be synced on panic.
Matthew Dillon [Tue, 6 May 2008 00:15:35 +0000 (00:15 +0000)]
HAMMER Utilities: Sync with recent changes.
* Add some missing crc initializations.
* Fix an assertion that was breaking newfs_hammer on large disks.
Matthew Dillon [Tue, 6 May 2008 00:14:12 +0000 (00:14 +0000)]
Keep track of the number of buffers undergoing IO, and include that number
in calculations involving numdirtybuffers. This prevents the kernel from
believing that there are only a few dirty buffers when, in fact, all the
dirty buffers are running IOs.
Matthew Dillon [Mon, 5 May 2008 22:09:44 +0000 (22:09 +0000)]
Only call bwillwrite() for regular file write()s, instead of for all write()s.
This stops hiccuping on things like ptys and sockets during heavy file
activity.
Matthew Dillon [Mon, 5 May 2008 20:34:52 +0000 (20:34 +0000)]
HAMMER Utilities: Feature add
* Check record crc and signature in extra-verbose mode
* Adjustments for structural changes
* Generate proper CRCs for structures laid down by newfs_hammer
Matthew Dillon [Mon, 5 May 2008 20:34:48 +0000 (20:34 +0000)]
HAMMER 41/Many: Implement CRC checking (WARNING: On-media structures changed)
* Generate and check on-media CRC fields.
* Clean up the record modification API
* Add a header signature field for future critical recovery
* Rearrange CRC fields for on-media structures to make them easier to
deal with.
* Adjust the ioctl API
* When trying to back-out of an operation that errored, free allocated
data blocks.
Sepherosa Ziehau [Mon, 5 May 2008 12:35:03 +0000 (12:35 +0000)]
- Add lwkt_serialize_adaptive_enter(9), it is the same as lwkt_serialize_enter(9)
except that it spins a little bit before sleeping.
- Under debug sysctl tree, add sysctl nodes to tune various backoff related
parameters for lwkt_serialize_adaptive_enter(9).
- Add ktr for serializer enter end, exit begin, spin backoff and spin backoff
failure.
Reviewed-by: corecode@
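The "spin a little bit before sleeping" idea can be sketched in userland with pthreads. This is only an analogy, not the LWKT serializer code: adaptive_enter() and the ADAPTIVE_SPINS budget are assumptions made for illustration.

```c
#include <pthread.h>

#define ADAPTIVE_SPINS 100  /* assumed spin budget; the commit makes this tunable */

/*
 * Hypothetical userland analogy: try to take the lock without sleeping
 * for a bounded number of attempts, then fall back to a blocking acquire.
 * Returns 1 if the lock was obtained while spinning, 0 if we had to block.
 */
int
adaptive_enter(pthread_mutex_t *lock)
{
    int i;

    for (i = 0; i < ADAPTIVE_SPINS; ++i) {
        if (pthread_mutex_trylock(lock) == 0)
            return (1);
    }
    pthread_mutex_lock(lock);
    return (0);
}
```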
Sepherosa Ziehau [Mon, 5 May 2008 11:07:48 +0000 (11:07 +0000)]
Use mask instead of modulo, since bo->backoff is always a power of 2
Suggested-by: dillon@
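For a power-of-2 value n, v % n equals v & (n - 1) for unsigned v, which is why the mask can replace the modulo. A small sketch, with backoff_reduce() as a hypothetical helper name:

```c
#include <stdint.h>

/*
 * Hypothetical helper: reduce v into [0, backoff).  Only valid when
 * backoff is a power of 2, in which case it matches v % backoff.
 */
uint32_t
backoff_reduce(uint32_t v, uint32_t backoff)
{
    return (v & (backoff - 1));
}
```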
Matthew Dillon [Sun, 4 May 2008 19:57:42 +0000 (19:57 +0000)]
HAMMER 40G/Many: UNDO cleanup & stabilization.
* Fix a race in the undo record allocator that could result in a
corrupted UNDO FIFO.
* Fix improperly placed calls to hammer_test_inode().
* Properly account for nlinks when a deleted ADD record is to be
converted to a DEL record by the flush. In this case the frontend's
notion of nlinks accounts for the deletion but the backend must sync
the record anyway, so the backend needs to bump the link count by one.
Matthew Dillon [Sun, 4 May 2008 19:18:17 +0000 (19:18 +0000)]
HAMMER Utilities: enhanced show, timeout option
* Enhance the show command when used with -vvv. The command now reports
directory entries and basic information about inodes.
* Add the [-t timeout] option. The idea is to use this to limit the amount
of time hammer spends reblocking or pruning a filesystem when running the
command from a cron job.
* Adjust the format of the softlink option to be more consistent.
Sascha Wildner [Sun, 4 May 2008 17:07:49 +0000 (17:07 +0000)]
Adjust to our current directory layout on pkgbox.
Reported-by: aggelos and others
Matthew Dillon [Sun, 4 May 2008 09:06:45 +0000 (09:06 +0000)]
HAMMER 40F/Many: UNDO cleanup & stabilization.
* Properly classify UNDO zone buffers so they are flushed at the correct
point in time.
* Minor rewrite of the code tracking the UNDO demark for the next flush.
* Introduce a considerably better backend flushing activation algorithm
to avoid single-buffer flushes.
* Put a lock around the freemap allocator.
Matthew Dillon [Sun, 4 May 2008 08:42:03 +0000 (08:42 +0000)]
The direct-write pipe code has a bug in it somewhere when the system is
paging heavily. Disable it for now.
Sepherosa Ziehau [Sun, 4 May 2008 04:48:47 +0000 (04:48 +0000)]
- Randomize spinlock exponential backoff value, which reduces the chance of
serious spinlock contention (probably) caused by same backoff steps
- Ktr spinlock backoff value and backoff failure
- Under debug sysctl tree, add sysctl node for spinlock backoff limit
- Break long lines
Reviewed-by: dillon@
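A minimal sketch of randomized exponential backoff, assuming a power-of-2 window that doubles up to a limit. The struct and function names are hypothetical and rand() stands in for whatever randomness source the kernel code uses.

```c
#include <stdint.h>
#include <stdlib.h>

struct backoff {
    uint32_t cur;    /* current window, always a power of 2 */
    uint32_t limit;  /* upper bound on the window */
};

/*
 * Hypothetical helper: pick a random spin count in [0, cur) so that
 * contending CPUs do not all retry in lock step, then double the
 * window until it reaches the limit.
 */
uint32_t
backoff_next(struct backoff *bo)
{
    uint32_t spins = (uint32_t)rand() & (bo->cur - 1);

    if (bo->cur < bo->limit)
        bo->cur <<= 1;
    return (spins);
}
```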
Sascha Wildner [Sun, 4 May 2008 04:17:11 +0000 (04:17 +0000)]
Add more missing entries and fix more mistakes (some of which I introduced
in my last commit).
Sascha Wildner [Sun, 4 May 2008 03:44:07 +0000 (03:44 +0000)]
Fix some mistakes and add some missing entries.
Matthew Dillon [Sun, 4 May 2008 02:28:28 +0000 (02:28 +0000)]
Print the 64 bit inode as a 64 bit quantity rather than a 32 bit quantity.
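A sketch of formatting a 64 bit inode number portably, assuming the value is held in a uint64_t. format_inode() is a hypothetical helper; the commit's actual format string is not shown in the log.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: format a 64 bit inode number without truncating
 * it to 32 bits.  Returns the number of characters snprintf() produced.
 */
int
format_inode(char *buf, size_t len, uint64_t ino)
{
    return (snprintf(buf, len, "%" PRIu64, ino));
}
```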
Sascha Wildner [Sun, 4 May 2008 00:55:01 +0000 (00:55 +0000)]
Add missing names and MLINKS.
Matthew Dillon [Sat, 3 May 2008 22:07:37 +0000 (22:07 +0000)]
Correct a bug in seekdir/readdir which could cause the directory entry
after a deleted entry to be skipped when seeking past the deleted entry.
NOTE: DragonFly has a specific issue even after this fix which currently
causes seekdirs to be unreliable if any files are deleted. DragonFly
translates directory entries into a filesystem-independent form and if
the real filesystem collapses the entry, the offsets will not be maintained
in the machine-independent form.
Submitted-by: Marc Balmer <marc@msys.ch>
Matthew Dillon [Sat, 3 May 2008 20:21:20 +0000 (20:21 +0000)]
HAMMER 40F/Many: Inode/link-count sequencer cleanup pass, UNDO cache.
* Implement an UNDO cache. If we have already laid down an UNDO in the
current flush cycle we do not have to lay down another one for the same
address. This greatly reduces the number of UNDOs we generate during
a flush.
* Properly get the vnode in order to be able to issue vfsync()'s from the
backend. We may also have to acquire the vnode when doing an unload
check for a file deletion.
* Properly generate UNDO records for the volume header. During crash recovery
we have to UNDO the volume header along with any partially written
meta-data, because the volume header refers to the meta-data.
* Add another record type, GENERAL, representing inode or softlink records.
* Move the setting of HAMMER_INODE_WRITE_ALT to the backend, allowing
the kernel to flush buffers up to the point where the backend syncs
the inode.
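The UNDO cache from the first item can be pictured as a small per-flush table of addresses that already have an UNDO. The sketch below is an assumption-laden illustration, not HAMMER's code: the direct-mapped layout, the size, and all names are hypothetical. A collision only causes a redundant UNDO, never a missing one.

```c
#include <stdint.h>
#include <string.h>

#define UNDO_CACHE_SIZE 64  /* assumed size, not taken from HAMMER */

struct undo_cache {
    uint64_t addr[UNDO_CACHE_SIZE];
    uint8_t  valid[UNDO_CACHE_SIZE];
};

/* Forget everything; intended to run at the start of each flush cycle. */
void
undo_cache_reset(struct undo_cache *uc)
{
    memset(uc, 0, sizeof(*uc));
}

/*
 * Return 1 if an UNDO still has to be laid down for addr, or 0 if one
 * was already recorded for the same address in this flush cycle.
 */
int
undo_cache_check(struct undo_cache *uc, uint64_t addr)
{
    unsigned int slot = (unsigned int)(addr % UNDO_CACHE_SIZE);

    if (uc->valid[slot] && uc->addr[slot] == addr)
        return (0);
    uc->addr[slot] = addr;
    uc->valid[slot] = 1;
    return (1);
}
```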
Matthew Dillon [Sat, 3 May 2008 07:59:06 +0000 (07:59 +0000)]
HAMMER 40E/Many: Inode/link-count sequencer cleanup pass.
* An inode can go inactive before it is deleted, add an unload check
in hammer_ip_del_directory to catch the nlinks == 0 case on an inactive
inode. Otherwise the inode would not be deleted on-media until umount.
* Add a missing resignaling case.
* Clean out a few more of the debug kprintf()'s
Matthew Dillon [Sat, 3 May 2008 05:28:55 +0000 (05:28 +0000)]
HAMMER 40D/Many: Inode/link-count sequencer cleanup pass.
* Move the vfsync from the frontend to the backend. This allows the
frontend to passively move inodes to the backend without having to
actually start the flush, greatly improving performance.
* Use an inode lock to deal with directory entry syncing races between
the frontend and the backend. It isn't optimal but it's ok for now.
* Massively optimize the backend code by initializing a single cursor
for an inode and passing the cursor to procedures, instead of having
each procedure initialize its own cursor.
* Fix a sequencing issue with the backend. While building the flush
state for an inode another process could get in and initiate its own
flush, screwing up the flush group and creating confusion.
(hmp->flusher_lock)
* Don't lose track of HAMMER_FLUSH_SIGNAL flush requests. If we get
such a request but have to flag a reflush, also flag that the reflush
is to be signaled (done immediately when the current flush is done).
* Remove shared inode locks from hammer_vnops.c. Their original purpose
no longer exists.
* Simplify the arguments passed to numerous procedures (hammer_ip_first(),
etc).
Matthew Dillon [Sat, 3 May 2008 04:13:12 +0000 (04:13 +0000)]
Print the path even if we do not understand the filesystem type.
Fix a switch/case compiler warning.
Sascha Wildner [Fri, 2 May 2008 22:10:58 +0000 (22:10 +0000)]
Elaborate a bit more on lexical conventions and ISA device configuration.
Taken-from: FreeBSD
Matthew Dillon [Fri, 2 May 2008 16:41:26 +0000 (16:41 +0000)]
HAMMER 40C/Many: Inode/link-count sequencer cleanup pass.
* Fix a forever-syncing inode issue by properly clearing the XDIRTY flag
when the last record is removed from ip->rec_tree.
Sepherosa Ziehau [Fri, 2 May 2008 11:17:19 +0000 (11:17 +0000)]
- Put exit ktr in proper place
- Add sleep_{beg,end} and wakeup_{beg,end} ktr
Sepherosa Ziehau [Fri, 2 May 2008 10:57:33 +0000 (10:57 +0000)]
White space
Sascha Wildner [Fri, 2 May 2008 10:46:33 +0000 (10:46 +0000)]
Use a list with tags.
Sepherosa Ziehau [Fri, 2 May 2008 07:40:32 +0000 (07:40 +0000)]
Introduce ETHER_INPUT_CHAIN option:
1) During RXEOF, we aggregate packets which have the same target CPU, instead of
calling lwkt_sendmsg() for each input packet.
2) At the end of RXEOF, low level ipiq sending is used to dispatch mbuf chain
to the target CPU.
3) On the target CPU, the ipi function puts each mbuf on the msgport it belongs to.
Note, though lwkt_sendmsg() is used in the ipi function, no further ipi activity
will happen, since we are on the target CPU.
em(4) is made aware of this option.
This option is off by default and has no effect on vlan(4) operation.
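The aggregation in steps 1) and 2) can be sketched as per-CPU packet chains that are dispatched once at the end of receive processing. Everything here is hypothetical (struct pkt, NCPU, the dispatch callback); the real code works on mbufs and uses the ipiq.

```c
#include <stddef.h>

#define NCPU 4  /* assumed CPU count for the sketch */

struct pkt {
    struct pkt *next;
    int target_cpu;
};

struct pkt_chain {
    struct pkt *head;
    struct pkt *tail;
};

/* Append p to the chain of its target CPU instead of dispatching it now. */
void
chain_enqueue(struct pkt_chain chains[NCPU], struct pkt *p)
{
    struct pkt_chain *c = &chains[p->target_cpu];

    p->next = NULL;
    if (c->tail != NULL)
        c->tail->next = p;
    else
        c->head = p;
    c->tail = p;
}

/*
 * Hand every non-empty chain to dispatch() exactly once and empty it.
 * dispatch may be NULL (chains are then just dropped from the table).
 * Returns the number of chains that were dispatched.
 */
int
chain_flush(struct pkt_chain chains[NCPU],
    void (*dispatch)(int cpu, struct pkt *head))
{
    int cpu, n = 0;

    for (cpu = 0; cpu < NCPU; ++cpu) {
        if (chains[cpu].head == NULL)
            continue;
        if (dispatch != NULL)
            dispatch(cpu, chains[cpu].head);
        chains[cpu].head = chains[cpu].tail = NULL;
        ++n;
    }
    return (n);
}
```

With three packets for two distinct CPUs, chain_flush() performs two dispatches instead of three per-packet messages.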
Matthew Dillon [Fri, 2 May 2008 06:51:57 +0000 (06:51 +0000)]
HAMMER 40B/Many: Inode/link-count sequencer cleanup pass.
* Fix data record leakage w/ final inode disposition on disk.
* Fix numerous live locks with infinitely re-syncing inodes.
Sascha Wildner [Fri, 2 May 2008 02:05:08 +0000 (02:05 +0000)]
Sweep over our manual pages and remove .Pp before a .Bd or .Bl without
-compact because it has no effect.
Matthew Dillon [Fri, 2 May 2008 01:00:42 +0000 (01:00 +0000)]
HAMMER 40A/Many: Inode/link-count sequencer.
* Remove the hammer_depend structure and build the dependencies directly
into the hammer_record structure.
* Attempt to implement layout rules to ensure connectivity is maintained.
This means, for example, that before HAMMER can flush a newly created
file it will make sure the file has namespace connectivity to the
directory it was created in, recursively to the root.
NOTE: 40A destabilizes the filesystem a bit, it's going to take a few
passes to get everything working properly. There are numerous issues
with this commit.
Simon Schubert [Fri, 2 May 2008 00:19:52 +0000 (00:19 +0000)]
Properly yield to userland processes.
Sascha Wildner [Thu, 1 May 2008 23:36:43 +0000 (23:36 +0000)]
Move text that doesn't belong to a list outside of it.
Sascha Wildner [Thu, 1 May 2008 23:29:10 +0000 (23:29 +0000)]
Add .It
Sascha Wildner [Thu, 1 May 2008 22:06:06 +0000 (22:06 +0000)]
Move .Pp outside of .Bl
Sascha Wildner [Thu, 1 May 2008 21:51:43 +0000 (21:51 +0000)]
Reduce vertical space.
Sascha Wildner [Thu, 1 May 2008 20:24:01 +0000 (20:24 +0000)]
Remove some obsolete lines.
Sascha Wildner [Thu, 1 May 2008 20:01:24 +0000 (20:01 +0000)]
Turn off yy_flex_realloc() related warnings (such as the one issued when
building usr.sbin/config) by marking the function unused.
There are a number of things which decide whether it's used or not, such
as using REJECT, %option yylineno, and some trailing context patterns.
See NetBSD's revisions 1.10 & 1.19.
Taken-from: NetBSD
Thomas E. Spanjaard [Thu, 1 May 2008 19:44:37 +0000 (19:44 +0000)]
Set a sensible mode on /etc/upgrade/Makefile_upgrade.inc .
Sascha Wildner [Thu, 1 May 2008 18:02:45 +0000 (18:02 +0000)]
Regenerate the pciconf(8) database from the following files:
Hart: Jan 22, 2008 (version 671)
Boemler: May 1, 2008
Mares: Mar 1, 2008
Sascha Wildner [Thu, 1 May 2008 13:04:51 +0000 (13:04 +0000)]
Add FreeBSD 7.1 (which is already referenced in cmx.4).
Sepherosa Ziehau [Thu, 1 May 2008 12:34:06 +0000 (12:34 +0000)]
Mention that BCM430[69] chips do not work properly on channel 1/2/3
Sascha Wildner [Thu, 1 May 2008 12:27:18 +0000 (12:27 +0000)]
Sync with FreeBSD (adds OpenBSD 4.3).
Sepherosa Ziehau [Thu, 1 May 2008 09:37:48 +0000 (09:37 +0000)]
ktr the end of various ipiq sending operations.
Sascha Wildner [Thu, 1 May 2008 09:24:42 +0000 (09:24 +0000)]
Remove obsolete keywords: conflicts, controller, disk, tape
Remove obsolete option: -n
Sepherosa Ziehau [Thu, 1 May 2008 02:11:39 +0000 (02:11 +0000)]
ktr cpu_send_ipiq
Sepherosa Ziehau [Thu, 1 May 2008 02:03:28 +0000 (02:03 +0000)]
- Promote em(4) polling begin/end ktr into polling(4)
- Add crit section around if_poll
Simon Schubert [Wed, 30 Apr 2008 23:05:33 +0000 (23:05 +0000)]
Enforce proper sequencing of world and kernel targets.
.ORDER: does *not* take an arbitrary list of targets of which all pairs
are supposed to be built in their specified sequence,
instead it specifies which adjacent pairs need to be built in sequence.
As a result, given a sequence "buildworld buildkernel quickkernel" and
the make targets "buildworld" and "quickkernel", make would still
parallelize the build of these targets.
Additionally, introduce quickworld to the sequencing.
Sascha Wildner [Wed, 30 Apr 2008 21:45:28 +0000 (21:45 +0000)]
* Mention that bmake must be used for pkgsrc.
* Add references to pkg_radd(1) and pkg_search(1).
* Adjust documentation URL (taken from NetBSD).
Matthew Dillon [Wed, 30 Apr 2008 17:34:11 +0000 (17:34 +0000)]
Have vfsync() call buf_checkwrite() on buffers with bioops to determine
whether it is ok to write out a buffer or not. Used by HAMMER to prevent
specfs from syncing out meta-data at the wrong time.
Matthew Dillon [Wed, 30 Apr 2008 17:28:17 +0000 (17:28 +0000)]
Add pmap_unmapdev() calls to detach functions for drivers which used
pmap_mapdev(), when possible.
Matthew Dillon [Wed, 30 Apr 2008 16:59:45 +0000 (16:59 +0000)]
Cothreads do not have a globaldata context and cannot handle signals
which require one. Add SIGTERM, SIGWINCH, and SIGUSR2 to the list of
signals cothreads mask.
Reported-by: Rumko <rumcic@gmail.com>
Sepherosa Ziehau [Wed, 30 Apr 2008 09:30:59 +0000 (09:30 +0000)]
Add tunable for each_burst.
Sascha Wildner [Wed, 30 Apr 2008 09:15:08 +0000 (09:15 +0000)]
Add KTR_TESTLOG to LINT.
Mention KTR_SERIALIZER & KTR_TESTLOG in ktr(4).
Matthew Dillon [Wed, 30 Apr 2008 04:19:57 +0000 (04:19 +0000)]
Change the SMP wakeup() code to send an IPI to the target cpus in parallel
instead of chaining the message. This fixes a stack depth assertion in the
IPI processing code that Sephe was hitting in his network work. The target
cpu _wakeup() code no longer recurses the IPI subsystem.
Reported-by: "Sepherosa Ziehau" <sepherosa@gmail.com>
Matthew Dillon [Wed, 30 Apr 2008 04:11:44 +0000 (04:11 +0000)]
Add some assertions when a buffer is reused
Matthew Dillon [Wed, 30 Apr 2008 04:05:21 +0000 (04:05 +0000)]
The driver was improperly using kmem_free() instead of pmap_unmapdev(),
and hitting a recently added assertion. Use the proper call.
Reported-by: Rumko <rumcic@gmail.com>
Sepherosa Ziehau [Tue, 29 Apr 2008 16:00:12 +0000 (16:00 +0000)]
KTR various serializer operations
Sepherosa Ziehau [Tue, 29 Apr 2008 14:26:09 +0000 (14:26 +0000)]
Three int arguments are used in IPIQ_STRING
Sascha Wildner [Tue, 29 Apr 2008 11:59:08 +0000 (11:59 +0000)]
Remove unneeded argument.
Sascha Wildner [Tue, 29 Apr 2008 09:33:41 +0000 (09:33 +0000)]
Use 'MS-DOS' and not 'MS DOS' or 'MSDOS'.
Sascha Wildner [Tue, 29 Apr 2008 09:02:45 +0000 (09:02 +0000)]
Fix section ref.
Matthew Dillon [Tue, 29 Apr 2008 04:43:08 +0000 (04:43 +0000)]
HAMMER 39B/Many: Cleanup pass
* Correct an issue where an inode flush was not being immediately picked
up by the flusher, causing frontend I/O to stall.
Matthew Dillon [Tue, 29 Apr 2008 01:11:02 +0000 (01:11 +0000)]
HAMMER Utilities: zone limit
* newfs_hammer now sets a default zone limit in the volume header.
Matthew Dillon [Tue, 29 Apr 2008 01:10:37 +0000 (01:10 +0000)]
HAMMER 39/Many: Parallel operations optimizations
* Implement a per-directory cache of new object IDs. Up to 128 directories
will be managed in LRU fashion. The cache provides a pool of object
IDs to better localize the object ids of files created in a directory,
so parallel operations on the filesystem do not create a fragmented
object id space.
* Cache numerous fields in the root volume's header to avoid creating
undo records for them, greatly improving
(ultimately we can sync an undo space representing the volume header
using a direct comparison mechanic but for now we assume the write of
the volume header to be atomic).
* Implement a zone limit for the blockmap which newfs_hammer can install.
The blockmap zones have an ultimate limit of 2^60 bytes, or around
one million terabytes. If you create a 100G filesystem there is no
reason to let the blockmap iterate over its entire range as that would
result in a lot of fragmentation and blockmap overhead. By default
newfs_hammer sets the zone limit to 100x the size of the filesystem.
* Fix a bug in the crash recovery code. Do not sync newly added inodes
once the flusher is running, otherwise the volume header can get out
of sync. Just create a dummy marker structure and move it to the tail
of the inode flush_list when the flush starts, and stop when we hit it.
* Adjust hammer_vfs_sync() to sync twice. The second sync is needed to
update the volume header's undo fifo indices, otherwise HAMMER will
believe that it must undo the last fully synchronized flush.
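The zone limit arithmetic above works out as follows: a 100G filesystem gets a default limit of 100 x 100G = 10000G, far below the 2^60 byte maximum. A sketch, where default_zone_limit() is a hypothetical name and the clamp to 2^60 is an assumption about how an oversized product would be handled:

```c
#include <stdint.h>

#define ZONE_MAX_BYTES (UINT64_C(1) << 60)  /* ultimate blockmap zone limit */

/*
 * Hypothetical helper: default zone limit of 100x the filesystem size,
 * clamped (an assumption) to the 2^60 byte maximum.  The comparison is
 * done by division so the multiplication can never overflow.
 */
uint64_t
default_zone_limit(uint64_t fs_bytes)
{
    if (fs_bytes > ZONE_MAX_BYTES / 100)
        return (ZONE_MAX_BYTES);
    return (fs_bytes * 100);
}
```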
Matthew Dillon [Mon, 28 Apr 2008 21:16:27 +0000 (21:16 +0000)]
Paging and swapping system fixes.
* Do not try to free a VM page after a failed IO read from swap. It is
illegal to free a VM page from an interrupt. Just deactivate it instead.
* Do not attempt to move a VM page into the cache queue after a successful
pageout from the vnode or swap pagers, and do not try to adjust page
protections to read-only (they should already be read-only). Both
operations require making serious pmap calls which we really do not
want to do from an interrupt.
Instead, leave the page on its current queue or, if the system is low
on pages, deactivate the page.
The pmap protection code is supposed to be runnable from an interrupt but
testing with vkernels shows program corruption occurring under severe paging
loads. Pmap protection changes were only being made from pageout interrupts.
brelse() itself, which can also be called from an interrupt via biodone(),
does not make such changes for asynchronous I/O.
With these changes in place the program corruption stopped or has been
greatly reduced. Further testing in a 64MB vkernel environment is ongoing.
In addition, trying to move the page after a completed pageout/swapout
to the cache queue was improperly depressing the priority of read-heavy
pages. Under severe paging loads we now only deactivate the page. Plus
moving a page to the cache queue causes pmap operations to be run which
we again do not want to run from an interrupt.
Sascha Wildner [Mon, 28 Apr 2008 21:16:17 +0000 (21:16 +0000)]
Fix various names of defined values.
Matthew Dillon [Mon, 28 Apr 2008 20:00:18 +0000 (20:00 +0000)]
KTR_TESTLOG is a valid kernel option (it enables the KTR ipi performance
testing sysctls).
Matthew Dillon [Mon, 28 Apr 2008 18:04:08 +0000 (18:04 +0000)]
Fix a NULL pointer dereference in the statistics collecting code as
used by 'systat -vm 1'. p->p_vmspace can be NULL.
Sascha Wildner [Mon, 28 Apr 2008 09:38:38 +0000 (09:38 +0000)]
Fix typo: spam -> span
Reported-by: sjg (on #dragonflybsd)
Matthew Dillon [Mon, 28 Apr 2008 07:07:02 +0000 (07:07 +0000)]
Minor code reordering and documentation adjustments.
Matthew Dillon [Mon, 28 Apr 2008 07:05:09 +0000 (07:05 +0000)]
Fix some pmap races in pc32 and vkernel, and other vkernel issues.
* Fix a case where a vm_page_sleep_busy() loop can cause a page's hold_count
to go stale, potentially resulting in a double action being taken on
the page. This can only occur during very heavy paging loads. Fix
the issue by doing the vm_page_unhold() after the blocking condition
instead of before. This fix applies to both the vkernel and pc32.
* The pmap_allocpte() call in pmap_copy() can block, causing the cached
page directory page to become stale. Detect the case and take appropriate
action. This fix applies to both the vkernel and pc32.
* Add numerous KKASSERT()s to assert that the pmap tracking counters are
correct, to try to detect races in the future.
* Fix a bug in the vkernel's signal handling. We have to catch SIGBUS as
well as SIGSEGV. We also have to tell the signal handler to not
block the signal on entry or a page fault in the vkernel itself can
cause the signal to be masked, the vkernel may then block and switch
threads, and another page fault in the vkernel itself would wind up
being masked. The result is an endless loop in the vkernel retrying the
same faulting instruction forever (until you kill the vkernel).
Matthew Dillon [Sun, 27 Apr 2008 21:07:15 +0000 (21:07 +0000)]
HAMMER 38F/Many: Undo/Synchronization and crash recovery, stabilization pass
* Fix a bug in the front-end's cached truncation off (ip->trunc_off).
* Fix a bug in the memory record visibility check when called from the backend.
Sepherosa Ziehau [Sun, 27 Apr 2008 15:10:37 +0000 (15:10 +0000)]
Add basic support for 8111C; hardware checksum offload does not seem to work
on 8111C yet.
Sepherosa Ziehau [Sun, 27 Apr 2008 14:18:16 +0000 (14:18 +0000)]
Print unknown hardware version.
Matthew Dillon [Sun, 27 Apr 2008 00:45:37 +0000 (00:45 +0000)]
HAMMER 38E/Many: Undo/Synchronization and crash recovery
* Fix a couple of deadlocks.
* Fix a kernel buffer cache exhaustion issue.
* Get the 'hammer prune' and 'hammer reblock' commands working again. The
commands are now properly synchronized for crash recovery.
Matthew Dillon [Sun, 27 Apr 2008 00:43:57 +0000 (00:43 +0000)]
HAMMER utilities: Misc documentation and new options.
* Add the -u <undoareasize> option. This option allows the size of the
undo FIFO buffer to be specified.
* Document newfs_hammer's previously undocumented options.
Sepherosa Ziehau [Sat, 26 Apr 2008 23:09:40 +0000 (23:09 +0000)]
Revert rev 1.40, which will cause deadlock, if task's function tries to
enqueue itself.
Approved-by: dillon@
Matthew Dillon [Sat, 26 Apr 2008 19:08:14 +0000 (19:08 +0000)]
HAMMER 38E/Many: Undo/Synchronization and crash recovery
* Clean up interlocks between the frontend and backend.
* Deal with the case where the backend needs to sync a record to disk that
the frontend wishes to delete. This basically just involves converting
the record from a deleted in-memory record to a delete-on-disk record,
so the synced record does not become visible to userland.
* Deal with the case when an inode is being destroyed where the backend
wishes to delete an in-memory record without syncing it to disk.
* Document numerous special cases for future attention.
Sepherosa Ziehau [Sat, 26 Apr 2008 14:11:06 +0000 (14:11 +0000)]
wi(4) depends on wlan(4)