Matthew Dillon [Wed, 10 Feb 2010 08:54:00 +0000 (00:54 -0800)]
Merge branch 'master' of ssh://crater.dragonflybsd.org/repository/git/dragonfly
Matthew Dillon [Wed, 10 Feb 2010 08:45:02 +0000 (00:45 -0800)]
kernel - SMP - "Fix AP #%d (PHY# %d) failed" issues
Ok, here's what is going on. If an SMI interrupt occurs while
an AP is going through the INIT/STARTUP IPI sequence the AP will
brick, and nothing you do will resurrect it.
BIOSes typically set up SMI interrupts when emulating (for example)
a PS/2 keyboard with a USB keyboard, or even if just implementing
BIOS support for a USB keyboard. Even worse, the BIOS may set up
the interrupt to poll at 1000Hz. And, EVEN WORSE, it can totally
depend on which USB port you've plugged your keyboard in. And, on top
of all of that, the SMI interrupt is not consistent.
The INIT/STARTUP code contains a 10ms delay (as per Intel spec) between
the INIT IPI and the STARTUP IPI. Well, you can do the math.
In order to reliably boot an SMP system where the BIOS has set up
SMI interrupts this patch uses a nifty bit of code to detect when
the SMI interrupt has occurred and tries to shift the INIT/STARTUP
sequence into a gap between SMI interrupts. If it has to it will
reduce the 10ms spec delay all the way down to 150us. In many
cases we really have no choice for reliable operation. Even a 300us
delay is too much in the tests I performed on a Shuttle Phenom and
Phenom II cube. I don't honestly know if this will break other SMP
configurations, we'll have to see.
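The commit does not show the detection code itself. As a rough illustration only, the idea can be sketched in user-space C, assuming an SMI shows up as an unusually long gap between back-to-back reads of a free-running clock. The helper name find_smi_gap() and its threshold parameter are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: 'ts' holds n timestamps read back-to-back in a
 * tight loop.  Return the index of the first sample after the most
 * recent gap longer than 'threshold' (the CPU was stolen, most likely
 * by an SMI).  Returns 0 if no such gap was observed.
 */
static size_t
find_smi_gap(const uint64_t *ts, size_t n, uint64_t threshold)
{
	size_t last = 0;

	assert(ts != NULL || n == 0);
	for (size_t i = 1; i < n; ++i) {
		if (ts[i] - ts[i - 1] > threshold)
			last = i;
	}
	return last;
}
```

Starting the INIT/STARTUP sequence right after the returned sample gives it the longest SMI-free window available. With SMIs firing at 1000Hz that window is at most about 1ms, which is why a 10ms INIT-to-STARTUP delay cannot fit between them.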
On the particular shuttle I tested on, one of the four USB connections
on the backpanel (the upper left when looking at it from the back)
seemed to cause the BIOS to set up SMI interrupts at a high rate and
caused kernel boots to fail. With this commit those boots now succeed.
Constantine A. Murenin [Wed, 10 Feb 2010 02:43:37 +0000 (21:43 -0500)]
man4: MLINK acpi_thermal.4 acpi_tz.4
Constantine A. Murenin [Wed, 10 Feb 2010 02:37:03 +0000 (21:37 -0500)]
acpi_tz(4): zero temperature in ACPI refers to -273.2 degC -- convert to uK appropriately
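A minimal sketch of the conversion, assuming the ACPI convention that thermal zone temperatures are reported in tenths of a kelvin (so 0 is absolute zero and 2732 is roughly 0 degC). The helper names are hypothetical, not the acpi_tz(4) code. Converting to micro-kelvin needs only a scale factor, no offset:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers; ACPI temperatures are in tenths of a kelvin. */
#define ACPI_DK_ZERO_C	2732	/* 273.2 K, ACPI's approximation of 0 degC */

/* Tenths of a kelvin -> micro-kelvin (0.1 K == 100000 uK). */
static int64_t
acpi_dk_to_uk(int32_t dk)
{
	assert(dk >= 0);
	return (int64_t)dk * 100000;
}

/* Tenths of a kelvin -> tenths of a degree Celsius. */
static int32_t
acpi_dk_to_dc(int32_t dk)
{
	return dk - ACPI_DK_ZERO_C;
}
```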
Sascha Wildner [Tue, 9 Feb 2010 15:40:41 +0000 (16:40 +0100)]
ktrdump.8: Fix typo in xref.
Sascha Wildner [Tue, 9 Feb 2010 15:10:11 +0000 (16:10 +0100)]
make upgrade: Remove obsolete fortunes2* files.
Matthew Dillon [Tue, 9 Feb 2010 10:17:55 +0000 (02:17 -0800)]
kernel - NFS - fix additional problems with readdirplus
* Ok, give up trying to hack a fix for readdirplus. Instead, fix it
the right way by properly reordering namecache lookups and vnodes.
* Do not create a namecache entry for '.' or '..'. These entries are
superfluous (ignored by the lookup code).
Constantine A. Murenin [Tue, 9 Feb 2010 05:11:36 +0000 (00:11 -0500)]
aibs.4: sprinkle a few markup tags
Matthew Dillon [Tue, 9 Feb 2010 08:59:42 +0000 (00:59 -0800)]
kernel - NFS - fix deadlock in NFS client-side readdirplus (part 2)
* Missed a vnode in the last commit. Two vnodes have to potentially
be unlocked.
Matthew Dillon [Tue, 9 Feb 2010 08:46:26 +0000 (00:46 -0800)]
kernel - NFS - fix deadlock in NFS client-side readdirplus
* readdirplus holds a vnode lock while attempting to do a namecache
lookup, which is not legal. Unlock the vnode while doing the
lookup.
Matthew Dillon [Tue, 9 Feb 2010 08:10:26 +0000 (00:10 -0800)]
HAMMER VFS - Improve initial B-Tree packing
* Detect the case where B-Tree leaves are being laid down sequentially,
such as when creating a large file. When linear operation is detected,
split leaves 75:25 instead of 50:50. This greatly improves fill ratios.
It should be noted that the HAMMER flush sorts by inode so directory
entries will also tend to benefit.
* This only affects (improves) the initial B-Tree layout. The overnight
hammer cleanup will refactor the B-Tree to a more optimal state
regardless.
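A minimal sketch of the split heuristic described above. The helper names and the rule used to detect linear insertion (the new key sorts after every existing key) are assumptions, not HAMMER's actual code:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical: number of elements kept in the left leaf when a full
 * leaf of 'nelms' elements is split.  Sequential (append-only) insertion
 * keeps 75% on the left; otherwise split 50:50.
 */
static int
leaf_split_point(int nelms, bool sequential)
{
	assert(nelms >= 2);
	if (sequential)
		return nelms * 3 / 4;
	return nelms / 2;
}

/* Hypothetical linear-append test: new key sorts after the leaf's last key. */
static bool
is_sequential_insert(long newkey, long lastkey)
{
	return newkey > lastkey;
}
```

With pure appends, a 50:50 split leaves every completed leaf half empty, since nothing is inserted into it again; a 75:25 split leaves it three-quarters full.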
Matthew Dillon [Tue, 9 Feb 2010 08:08:32 +0000 (00:08 -0800)]
kernel - struct vm_object - increase paging_in_progress from short to int
* Change the paging_in_progress refcount from an unsigned short to an int.
It is potentially possible to overflow it as a short, especially when
many pages are rolled up into clusters.
This changes the size of the vm_object structure.
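A small illustration of the overflow, using hypothetical helpers rather than the vm_object code: an unsigned short wraps back to 0 after USHRT_MAX + 1 increments (65536 on common platforms), while an int keeps counting:

```c
#include <assert.h>
#include <limits.h>

/* Increment a 16-bit counter 'times' times; wraps modulo USHRT_MAX + 1. */
static unsigned short
bump_short(unsigned int times)
{
	unsigned short pip = 0;

	while (times-- > 0)
		pip++;
	return pip;
}

/* Same with an int counter, which does not wrap in this range. */
static int
bump_int(unsigned int times)
{
	int pip = 0;

	assert(times <= (unsigned int)INT_MAX);
	while (times-- > 0)
		pip++;
	return pip;
}
```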
Matthew Dillon [Tue, 9 Feb 2010 08:05:55 +0000 (00:05 -0800)]
kernel - Fix bug in cache_fromdvp() as used by NFS's readdirplus
* cache_fromdvp() is supposed to return a held ncp for the directory
vnode's namecache entry if one is present and makeit is 0. It
was returning NULL instead.
* NFS readdirplus was kprintf()ing debug info unconditionally when
it was able to successfully construct a vnode. #if 0 out the
kprintf().
Matthew Dillon [Tue, 9 Feb 2010 08:04:44 +0000 (00:04 -0800)]
kernel - slab allocator - Refactor struct kmemusage
* Refactor struct kmemusage to just contain a 32 bit ku_pagecnt
instead of a 16 bit ku_pagecnt and other fields (none of which
were used).
Matthew Dillon [Tue, 9 Feb 2010 08:02:19 +0000 (00:02 -0800)]
kernel - nata - Fix bug in SET_MULTI command
* The command was not properly masking atadev->param.sectors_intr,
resulting in the setting of a value which some hard drives (OCZ SSD)
would reject.
This mainly just gets rid of an error message on the console.
SET_MULTI is typically a NOP on most SATA drives.
Obtained-from: FreeBSD
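A sketch of the masking, assuming the standard ATA IDENTIFY DEVICE layout in which word 47 (sectors_intr) carries a 0x80 signature in its upper byte and the maximum sectors per DRQ block in its lower byte. The helper name is hypothetical and this is not the nata(4) implementation:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical: derive the SET_MULTI sector count from IDENTIFY word 47.
 * Only the low byte is the count; the high byte must be masked off.
 */
static unsigned int
set_multi_count(uint16_t sectors_intr)
{
	return sectors_intr & 0x00ff;
}
```

Without the mask, a drive reporting 0x8010 would be asked to use 32784 sectors per block instead of 16.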
Justin C. Sherrill [Tue, 9 Feb 2010 04:34:46 +0000 (20:34 -0800)]
Sprinkle in some commas to break out dependent clauses, and spelling fixes.
Matthew Dillon [Tue, 9 Feb 2010 03:47:31 +0000 (19:47 -0800)]
docs - More adjustments to the swapcache manual page.
Matthew Dillon [Tue, 9 Feb 2010 01:41:10 +0000 (17:41 -0800)]
kernel - Remove further misuses of %ll* in kprintfs, use intmax_t
* In two minor places
Matthew Dillon [Tue, 9 Feb 2010 01:40:00 +0000 (17:40 -0800)]
kernel - SWAP CACHE part 17/many - Add missing critical sections
* Add missing critical sections in several swap_*() procedures which
are no longer being called with a critical section held.
Matthew Dillon [Tue, 9 Feb 2010 01:37:36 +0000 (17:37 -0800)]
kernel - SWAP CACHE part 16/many - Correct bug in kern_slaballoc.c
* When kmalloc() tries to free oversized allocations it incorrectly
dereferences a structure after it has been freed.
Reported-by: Rumko, Stathis Kamperis <beket@crater.dragonflybsd.org>
Thanks-to: Above for getting a nice kernel dump and doing some git bisecting
Matthew Dillon [Mon, 8 Feb 2010 21:21:48 +0000 (13:21 -0800)]
docs - Improve the swapcache.8 manual page (followup)
* Fix endurance statements for SLC. SLC has approximately 10x the
endurance. Documentation on the web is confused on this matter with
10x and 100x both being thrown around. We will just assume 10x
for now.
Matthew Dillon [Mon, 8 Feb 2010 21:12:55 +0000 (13:12 -0800)]
docs - Improve the swapcache.8 manual page
* Add a ton of useful information to the manual page including how to
read the wear indicator from the SMART data.
Stathis Kamperis [Mon, 8 Feb 2010 20:49:44 +0000 (22:49 +0200)]
awk(1): Increase input field separator width.
POSIX allows -F to be an extended regular expression.
The current width of 10 chars just isn't enough.
FreeBSD changed it to 100. NetBSD has an initial value of 16,
dynamically resizable via malloc().
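A minimal sketch of the NetBSD-style approach, assuming a separator buffer that starts at 16 bytes and is grown with realloc() as needed. struct fsbuf and fsbuf_set() are hypothetical names, not awk's source:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer for the -F field separator. */
struct fsbuf {
	char	*s;
	size_t	 cap;
};

/* Copy 'fs' into the buffer, doubling capacity as needed.  0 on success. */
static int
fsbuf_set(struct fsbuf *fb, const char *fs)
{
	size_t need = strlen(fs) + 1;

	if (need > fb->cap) {
		size_t ncap = fb->cap ? fb->cap : 16;
		char *p;

		while (ncap < need)
			ncap *= 2;
		p = realloc(fb->s, ncap);
		if (p == NULL)
			return -1;
		fb->s = p;
		fb->cap = ncap;
	}
	memcpy(fb->s, fs, need);
	return 0;
}
```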
Matthew Dillon [Mon, 8 Feb 2010 19:35:24 +0000 (11:35 -0800)]
mount_nfs - Make rdirplus the default
* It is really high-time we made rdirplus the default for NFS mounts.
It improves client directory traversals by 300%.
* With an SSD meta-data swapcache on the NFS server 'disk' latencies might
as well be 'fully cached in ram' always. The bottleneck becomes the
network regardless of server load.
* Note that Linux also defaults to using rdirplus mounts.
Matthew Dillon [Mon, 8 Feb 2010 17:51:05 +0000 (09:51 -0800)]
kernel - Improve cluster_read()
* The cluster_read() code was tripping over itself due to a findblk()
call which caused it to believe it had found a buffer hole when it
really found a busy buffer.
Redo the code to use the FINDBLK_TEST flag to locate the next buffer
hole. Also add a shortcut to support efficient coding for larger
read-ahead values.
* Change the single-read-ahead in cluster_read() to a multiple-read-ahead
based on the maxra parameter. Before we just did a single read-ahead
and even though this was a cluster read it still created a situation
where the next cluster_read() operation would stall on the previous read-ahead
before issuing the next one. In other words, it wasn't pipelining requests
as well as it could.
This change tries to keep at least two read-aheads in progress so when
the next cluster_read() stalls on the first one the second one is still
in the pipeline after it unstalls, allowing it to issue the third one
to keep the pipeline hot.
* These changes improve SSD swapcache operation as well as normal HD
cluster_read() pipelining. In addition the read-ahead is now sufficient
to keep the pipeline hot across a 2 x Swap (interleaved) setup.
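A simplified model of keeping several read-aheads in flight. issue_readahead() is a hypothetical helper that only tracks block numbers and does not reflect cluster_read()'s real structure:

```c
#include <assert.h>

/*
 * Hypothetical: '*next_ra' is the first block not yet issued.  After the
 * reader consumes block 'blk', issue read-aheads until blocks blk+1 ..
 * blk+maxra are all in flight.  Returns the number newly issued.
 */
static int
issue_readahead(long blk, long *next_ra, int maxra)
{
	int issued = 0;

	assert(maxra >= 1);
	if (*next_ra <= blk)
		*next_ra = blk + 1;
	while (*next_ra <= blk + maxra) {
		/* a real implementation would start async I/O here */
		(*next_ra)++;
		issued++;
	}
	return issued;
}
```

With maxra = 2, after the reader consumes block 1 both blocks 2 and 3 are outstanding, so a stall on block 2 still leaves block 3 in the pipeline. With maxra = 1 only one request is ever ahead of the reader.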
Aggelos Economopoulos [Mon, 8 Feb 2010 17:43:33 +0000 (19:43 +0200)]
Bring in a simple event tracing library and POC utility
- Import libevtr, a library for abstracting access to an event stream.
libevtr uses its own dump format and can synthesize event attributes
based on known event types.
- Modify ktrdump(8) to be able to dump an event stream to a file
using libevtr.
- Add evtranalyze(1), a proof of concept utility to display events in
a line-oriented text format or to generate an svg file displaying
the events on each processor. This still needs a lot of work.
Matthew Dillon [Mon, 8 Feb 2010 07:37:53 +0000 (23:37 -0800)]
kernel - SWAP CACHE part 15/many - Correct bug in vm.swapcache.maxfilesize
* vm.swapcache.maxfilesize was being applied to meta-data as well as
file data. It is only supposed to be applied to regular file data.
Matthew Dillon [Mon, 8 Feb 2010 05:28:59 +0000 (21:28 -0800)]
kernel - SWAP CACHE part 14/many - Add more features, man page
* Implement write clustering. Swapcache attempts to cluster writes
for optimal matching between swap and the buffer cache. This
also reduces the IOPS for writes by a factor of 16. The SSD should
be able to do write combining and erasing more optimally as well.
* Add vm.swapcache.minburst
This ensures that, once curburst hits 0, it is allowed to recover
sufficiently that a good-sized write burst can be done. Otherwise
swapcache winds up doing tiny bursts which tend to fragment the cache.
* Add vm.swapcache.maxfilesize
If set to non-zero, prevents swapcache from caching files larger than
the specified size. That is, swapcache will only cache smaller files.
This is experimental because there are issues caching small files
anyway (the vnodes get recycled too quickly).
* Allow vm.swapcache.curburst to be manually set larger than
vm.swapcache.maxburst, so the initial load-in can be different
from the maximum reburst.
* Adjust the code which deals with write errors on swap to ensure
that the backing store is destroyed (because it isn't a clean copy).
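The interaction of accrate, maxburst, curburst and minburst can be modeled as a token bucket with a refill threshold. This is a hypothetical sketch (struct burstctl and its helpers are not the vm_swapcache.c code), assuming accumulation stops at maxburst but a manually raised curburst is left alone:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical write-rate controller state. */
struct burstctl {
	int64_t	accrate;	/* refill rate, bytes per second */
	int64_t	maxburst;	/* accumulation cap */
	int64_t	minburst;	/* must recover to this after draining */
	int64_t	curburst;	/* bytes currently allowed */
	bool	recovering;	/* drained to 0, waiting for minburst */
};

/* Account for 'usec' microseconds of elapsed time. */
static void
burst_tick(struct burstctl *bc, int64_t usec)
{
	if (bc->curburst < bc->maxburst) {
		bc->curburst += bc->accrate * usec / 1000000;
		if (bc->curburst > bc->maxburst)
			bc->curburst = bc->maxburst;
	}
	if (bc->recovering && bc->curburst >= bc->minburst)
		bc->recovering = false;
}

/* Returns the number of bytes the caller may write now. */
static int64_t
burst_take(struct burstctl *bc, int64_t want)
{
	int64_t n;

	assert(want >= 0);
	if (bc->recovering)
		return 0;
	n = want < bc->curburst ? want : bc->curburst;
	bc->curburst -= n;
	if (bc->curburst == 0)
		bc->recovering = true;
	return n;
}
```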
Ulrich Spörlein [Wed, 20 Jan 2010 10:05:47 +0000 (11:05 +0100)]
fortune(6): Merge fortunes2 into regular fortunes
- Stop special ROT13 treatment of fortunes-o. Neither murphy-o,
fortunes2-o nor limerick were doing the same and contain even
more possibly offensive stuff.
- Merge the spelling files for fortunes{,-o}, this improves
maintainability in case fortunes are moved between the files
- make the installation of offensive stuff depend on
INSTALL_OFFENSIVE_FORTUNES, like NetBSD (defaults to yes).
Previously you had to edit the Makefile to disable this.
- Drop CVS Ids, which are no longer maintained :(
No fortunes added or removed from the pool.
Ulrich Spörlein [Sat, 9 Jan 2010 15:23:32 +0000 (16:23 +0100)]
fortune(6): Sync improvements with Free/Net/OpenBSD; deduplicate
- Typos, attributions and style improvements.
- Make attributions and style more consistent and conforming to Notes
Some of these are taken from FreeBSD, some from NetBSD and a few from
OpenBSD. Yet quite a few more are by yours truly.
Also:
- Fix typos in fortunes.sp.ok, murphy
- Remove duplicated fortunes (some were present thrice!)
- fortunes is king and loses no cookie
- fortunes-o contains no cookies already in fortunes
- fortunes2-o contains no cookies already in fortunes2, fortunes or
fortunes-o
- fortunes2 contains no cookies already in fortunes
- finally, cookies in fortunes2 were removed if they were already in
fortunes-o
The reasoning for the last step is that when fortunes2 gets merged into
fortunes, no possibly offensive quotes that were already deemed offensive
and moved from fortunes to fortunes-o should show up there.
- Remove some quotes from murphy-o already in other files, sort
- Remove duplicates within limerick (via OpenBSD)
- Sync startrek to NetBSD/OpenBSD; sort
- Typos in zippy
Ulrich Spörlein [Sat, 9 Jan 2010 10:35:53 +0000 (11:35 +0100)]
fortune(6): Fix wording and typos
- "fortunes" is the name of the default fortune file
- fix a couple of typos
Ulrich Spörlein [Sun, 3 Jan 2010 20:47:38 +0000 (21:47 +0100)]
larn(6): remove unused (and stale) holidaysfile
Besides, there's no apparent code that actually uses this.
Matthew Dillon [Sat, 6 Feb 2010 19:29:34 +0000 (11:29 -0800)]
kmapinfo - Adjustments to debug utility
* Fix up for recent kernel changes
* Properly report EMPTY gaps at the beginning and ending of the kernel_map.
Matthew Dillon [Sat, 6 Feb 2010 19:26:39 +0000 (11:26 -0800)]
kernel - SWAP CACHE part 13/many - More vm_pindex_t work for vm_objects on i386
* vm_object->size also needs to be a vm_pindex_t, e.g. when mmap()ing regular
HAMMER files or block devices or HAMMER's own use of block devices,
in order to support vm_object operations past the 16TB mark.
* Introduce a 64-bit-friendly trunc_page64() and round_page64(), just to
make sure we don't cut off page alignment operations on 64-bit offsets.
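A sketch of 64-bit page-alignment helpers, assuming 4KB pages. The names are hypothetical stand-ins, not the kernel's trunc_page64()/round_page64() definitions; trunc_page32_sketch() shows how a 32-bit intermediate loses the upper half of the offset:

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE	4096ULL			/* assumed 4KB pages */
#define SKETCH_PAGE_MASK	(SKETCH_PAGE_SIZE - 1)

/* Round a 64-bit offset down to a page boundary. */
static inline uint64_t
trunc_page64_sketch(uint64_t x)
{
	return x & ~SKETCH_PAGE_MASK;
}

/* Round a 64-bit offset up to a page boundary. */
static inline uint64_t
round_page64_sketch(uint64_t x)
{
	return (x + SKETCH_PAGE_MASK) & ~SKETCH_PAGE_MASK;
}

/* Broken variant: the cast to 32 bits discards everything above 4GB. */
static inline uint64_t
trunc_page32_sketch(uint64_t x)
{
	return (uint32_t)x & ~(uint32_t)SKETCH_PAGE_MASK;
}
```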
Matthew Dillon [Sat, 6 Feb 2010 19:24:37 +0000 (11:24 -0800)]
vmstat - Adjustments for kmalloc size_t changes
* Adjust for changes to struct malloc_type.
* Clean up the column output. Get rid of 'Size(s)' which is no longer
used and increase the width of some of the fields.
Matthew Dillon [Sat, 6 Feb 2010 19:23:21 +0000 (11:23 -0800)]
kernel - More conversions to size_t in struct malloc_type
* Missed ks_inuse.
Matthew Dillon [Sat, 6 Feb 2010 18:11:21 +0000 (10:11 -0800)]
kernel - Expand the x86_64 KVA to 8G part 2
* Fix a loop variable overflow when dumping the entire KVM space.
Matthew Dillon [Sat, 6 Feb 2010 17:52:08 +0000 (09:52 -0800)]
Merge branch 'master' of ssh://crater.dragonflybsd.org/repository/git/dragonfly
Matthew Dillon [Sat, 6 Feb 2010 17:43:06 +0000 (09:43 -0800)]
kernel - Expand the x86_64 KVA to 8G
* Our kmem_init() was mapping out the ~6G of KVA below KERNBASE. KERNBASE
is at the -2G mark and unlike i386 it does not mark the beginning of KVA.
Add two more globals, virtual2_start and virtual2_end, and adjust
kmem_init() to use that space. This fixes kernel_map exhaustion issues
on x86_64. Before the change only ~600M of KVA was available after a
fresh boot.
* Populate the PDPs around both KERNBASE and at virtual2_start for
bootstrapping purposes.
* Adjust kernel_vm_end to start iteration for growkernel purposes at
VM_MIN_KERNEL_ADDRESS and no longer use it to figure out the end
of KVM for the minidump.
In addition, adjust minidump to dump the entire kernel virtual
address space.
* Remove numerous extraneous variables.
* Fix a bug in vm_map_insert() where vm_map->first_free was being
incorrectly set when the map does not begin with reserved space.
Matthew Dillon [Sat, 6 Feb 2010 17:13:11 +0000 (09:13 -0800)]
x86_64 kernel - Increase buffer cache and vnode resources, and more.
* Increase the maximum buffer cache from 200M to 400M. Note that
the buffer cache is backed by the VM page cache which is unlimited.
* Use size_t for kmalloc() tracking
* Allow 0 to be specified for kmalloc_raise_limit() which makes a
kmalloc pool unlimited.
* Adjust the kern.maxvnodes autocalculation for both i386 and x86_64.
i386 boxes with maximum memory will get a slightly lower vnode
limit while x86_64 boxes will get a dramatically higher vnode limit.
* Remove kmalloc pool limits for vnodes, for HAMMER inodes, and
for UFS inodes. These pools track maxvnodes and do not require
limits.
This fixes occasional kmalloc assertions and allows the sysop to
raise kern.maxvnodes on a running system.
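A sketch of a limit check in which 0 means "unlimited", using size_t accounting. struct pool_sketch and pool_may_alloc() are hypothetical and do not mirror struct malloc_type:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pool accounting. */
struct pool_sketch {
	size_t	inuse;
	size_t	limit;		/* 0 == unlimited */
};

static bool
pool_may_alloc(const struct pool_sketch *p, size_t bytes)
{
	if (p->limit == 0)
		return true;
	/* compare without computing inuse + bytes, which could overflow */
	return p->inuse <= p->limit && bytes <= p->limit - p->inuse;
}
```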
Matthew Dillon [Sat, 6 Feb 2010 17:09:22 +0000 (09:09 -0800)]
kernel - Close MP race in vnode allocation code
* vx_lock_nonblock() is used by allocfreevnode() to interlock the
vnode being freed. However, this function will incorrectly succeed
on a vnode recursively held by a caller of allocfreevnode() which
is in the middle of being reclaimed if the vnode in question
allows LK_CANRECURSE locks in the lockinit. UFS vnodes use this
mechanic.
Add a little bit of code to close the hole.
Matthew Dillon [Sat, 6 Feb 2010 16:57:05 +0000 (08:57 -0800)]
kernel - SWAP CACHE part 12/many - Add swapcache cleanup state
* Add a small state machine and hysteresis to flip between swapcache
writing and swapcache cleaning. The swapcache is written to until
(unless) it hits 75% use. If this occurs it switches to cleaning
mode to get rid of swapcache pages until it gets down to 70%. While
in cleaning mode burst accumulation still occurs. Then it flips back.
Currently the cleaning mode tries to choose swap meta-blocks which
are wholly swapped (have no VM pages), running linearly through
the VM object list in order to try to clean contiguous areas of
the swapcache. The idea is to reduce fragmentation that would lead
to excessive disk seeking. At the same time the limited cleaning
run (only 5% of the swap cache) should prevent any large-scale
excessive deletion of the swapcache.
* Add a new VM object type, OBJT_MARKER, which may be used by iterators
running through the vm_object_list.
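The writing/cleaning hysteresis can be sketched as a two-state machine. The helper is hypothetical, and 'budget' is assumed to be the amount of swap the swapcache is allowed to use:

```c
#include <assert.h>
#include <stdint.h>

enum sc_state { SC_WRITING, SC_CLEANING };

/*
 * Hypothetical: switch to cleaning at 75% of the budget and back to
 * writing once use has dropped to 70%.
 */
static enum sc_state
sc_next_state(enum sc_state cur, uint64_t used, uint64_t budget)
{
	assert(budget > 0);
	if (cur == SC_WRITING && used * 100 >= budget * 75)
		return SC_CLEANING;
	if (cur == SC_CLEANING && used * 100 <= budget * 70)
		return SC_WRITING;
	return cur;
}
```

Using separate 75% and 70% thresholds keeps the state from flapping back and forth around a single boundary.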
Matthew Dillon [Sat, 6 Feb 2010 08:26:38 +0000 (00:26 -0800)]
kernel - usb keyboard - Fix polling issue on x86_64 when dropping into DDB
* USB keyboards stop responding when x86_64 drops into DDB. For some reason
this does not occur on 32-bit.
Add a missing call to usbd_dopoll() in ukbd_check() to proactively
solve the problem.
Michael Neumann [Sat, 6 Feb 2010 01:49:57 +0000 (02:49 +0100)]
Merge branch 'master' of ssh://crater.dragonflybsd.org/repository/git/dragonfly
Michael Neumann [Sat, 6 Feb 2010 01:46:21 +0000 (02:46 +0100)]
aac: Add PCI identifier for Adaptec RAID 5405
Obtained-From: FreeBSD (aac_pci.c revision 174368)
Matthew Dillon [Sat, 6 Feb 2010 00:21:10 +0000 (16:21 -0800)]
kernel - SWAP CACHE part 11/many - Write improvements, fix backing store free
* Improve write staging by not counting VM pages which already have a
swap assignment when doing the limited scan of the INACTIVE VM page
queue.
As the swapcache starts to perform, more and more disk I/O goes to it,
radically increasing the data rate and also radically increasing the
rate at which pages are shuffled between VM page queues. At some
point enough data is coming from the swapcache that vm.swapcache.maxlaunder
is unable to keep up even when sufficient burst bandwidth is available.
This led to an asymptotic caching curve. After the fix the caching
curve is linear (for data sets which fit in the swapcache).
* The swapcache associated with meta-data (VCHR vnodes) was not being
destroyed on umount. Adjust a conditional such that it is properly
destroyed. Otherwise stale data might be retained across e.g. a
media change.
Matthew Dillon [Fri, 5 Feb 2010 18:13:51 +0000 (10:13 -0800)]
kernel - SWAP CACHE part 10/many - Fix swap space usage calculation
* The code which limits how much swap space the swap cache uses was
broken. It was using the current amount of free swap space instead
of the total space, causing it to only use 40% of available swap
instead of 66%.
* Fix the calculation and also make it 3/4 (75%) of configured swap.
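A hypothetical model of why the old calculation stalled at 40%: if the cap is 2/3 of the current free swap, the cache stops growing when u = 2/3 (T - u), i.e. at u = 0.4T for total swap T. The helper names are illustrative only:

```c
#include <assert.h>
#include <stdint.h>

/* Old rule: cap derived from currently free swap (shrinks as cache grows). */
static uint64_t
swapcache_cap_buggy(uint64_t free_swap)
{
	return free_swap * 2 / 3;
}

/* New rule: cap derived from total configured swap. */
static uint64_t
swapcache_cap_fixed(uint64_t total_swap)
{
	return total_swap * 3 / 4;
}

/* Grow the cache one unit at a time until the old rule says stop. */
static uint64_t
fill_until_cap_buggy(uint64_t total_swap)
{
	uint64_t used = 0;

	while (used < swapcache_cap_buggy(total_swap - used))
		used++;
	return used;
}
```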
Matthew Dillon [Fri, 5 Feb 2010 18:12:29 +0000 (10:12 -0800)]
kernel - slab allocator
* Track the total number of zones under management, in bytes, so
the value can be reconciled against malloc_type use tracking to
determine how much fragmentation is occurring.
Matthew Dillon [Fri, 5 Feb 2010 18:08:37 +0000 (10:08 -0800)]
AHCI - Fix minor bug. Also AHCI/SILI - use ATA_F_EXCLUSIVE for pass-thru
* The AHCI driver could sometimes queue multiple ATA_F_EXCLUSIVE commands.
This case never actually occurred but fix it anyway.
* Flag CAM pass-through commands as exclusive for safety.
Sascha Wildner [Fri, 5 Feb 2010 15:00:19 +0000 (16:00 +0100)]
Move the prototypes of pthread_kill() and pthread_sigmask() to <signal.h>.
This is in accordance with POSIX and matches FreeBSD and NetBSD.
In-discussion-with: Beket
Sascha Wildner [Fri, 5 Feb 2010 10:44:59 +0000 (11:44 +0100)]
it(4): Add it3 also in the other configs, not just GENERIC.
Matthew Dillon [Fri, 5 Feb 2010 08:24:04 +0000 (00:24 -0800)]
debug utilities - adjust vmpageinfo, add zallocinfo
* Adjust vmpageinfo to match recent changes. Add the symbolic names
for the flags.
* Add zallocinfo which dumps the state of the slab data structures.
Matthew Dillon [Fri, 5 Feb 2010 06:23:43 +0000 (22:23 -0800)]
vmstat - increase the maximum number of kmalloc types we can handle
* Increase from 200 to 1024. 200 wasn't enough.
Constantine A. Murenin [Fri, 5 Feb 2010 04:40:34 +0000 (23:40 -0500)]
kernel: print the amount of ignored memory above 4GB in MB, too
Matthew Dillon [Fri, 5 Feb 2010 04:32:41 +0000 (20:32 -0800)]
kernel - Use intmax_t when printing memory amounts
* Now that vm_pindex_t is 64 bits, fix various printf()s
Constantine A. Murenin [Fri, 5 Feb 2010 04:09:22 +0000 (23:09 -0500)]
kernel: print memory amount in MB instead of KB
* all other BSDs already print memory in MB instead of KB
Matthew Dillon [Fri, 5 Feb 2010 00:23:50 +0000 (16:23 -0800)]
kernel - SWAP CACHE part 9/many - Fix excessive active->cache moves
* Due to a bug the pageout daemon was moving an excessive number
of pages from the active queue to the cache queue, bypassing
the inactive queue.
This was preventing the swapcache from finding pages to write
out.
Matthew Dillon [Fri, 5 Feb 2010 00:16:58 +0000 (16:16 -0800)]
kernel - fix panic on reboot when swap populated
* The swapvp does not have a v_mount so do not try to access
the mount lock through it if v_mount is NULL.
Matthew Dillon [Thu, 4 Feb 2010 22:56:42 +0000 (14:56 -0800)]
kernel - SWAP CACHE part 8/many - Add the swap cache read intercept, rate ctl
* Add vn_cache_strategy() and adjust vn_strategy() to call it. This
implements the read intercept. If vn_cache_strategy() determines that
the entire request can be handled by the swap cache it issues an
appropriate swap_pager_strategy() call and returns 1, else it returns 0
and the normal vn_strategy() function is run.
vn_cache_strategy() only intercepts READs which meet some fairly strict
requirements, including no bogus pages and page alignment (so certain
meta-data in UFS which uses a 6144 byte block size cannot be read via
the swap cache, sorry).
* Implement numerous sysctls.
vm.swapcache.accrate (default 1000000)
The average long-term write rate in bytes/second for writing
data to the swap cache. This is what ultimately controls the
wear rate of the SSD swap.
vm.swapcache.maxburst (default 1000000000)
vm.swapcache.curburst (default starts at 1000000000)
On machine boot curburst defaults to maxburst and will automatically
be trimmed to maxburst if you change maxburst. This allows a high
write-rate after boot.
During normal operation writes reduce curburst and accrate increases
curburst (up to maxburst), so periods of inactivity will allow another
burst of write activity later on.
vm.swapcache.read_enable (default 0 - disabled)
Enable the swap cache read intercept. When turned on vn_strategy()
calls will read from the swap cache if possible. When turned off
vn_strategy() calls read from the underlying vnode whether data
is available in the swap cache or not.
vm.swapcache.meta_enable (default 0 - disabled)
Enable swap caching of meta-data (The VM-backed block devices used
by filesystems). The swapcache code scans the VM page inactive
queue for suitable clean VCHR-backed VM pages and writes them to
the swap cache.
vm.swapcache.data_enable (default 0 - disabled)
Enable swap caching of data (Regular files). The swapcache code
scans the VM page inactive queue for suitable clean VREG-backed VM
pages and writes them to the swap cache.
vm.swapcache.maxlaunder (default 128 pages per 1/10 second)
Specifies the maximum number of pages in the inactive queue to
scan every 1/10 second. Set fairly low for the moment but
the default will ultimately be increased to something like 512
or 1024.
vm.swapcache.write_count
The total amount of data written by the swap cache to swap,
in bytes, since boot.
* Call swap_pager_unswapped() in a few more places that need it.
* NFS doesn't use bread/vn_strategy so it has been modified to call
vn_cache_strategy() directly for async IO. Currently we cannot
easily do it for synchronous IO. But async IO will get most of
it.
* The swap cache will use up to 2/3 of available swap space to
cache clean vnode-backed data. Currently once this limit is
reached it will rely on vnode recycling to clean out space
and make room for more.
Vnode recycling is currently excessively limiting the amount of
data which can be cached, since when a vnode is recycled its
backing VM object is also recycled and the swap cache assignments
are freed. Meta-data has other problems... it can choke the
swap cache.
Dealing with these issues is on the TODO.
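A sketch of the intercept pattern (return 1 if the swap cache handled the request, otherwise 0 and fall through), modeling only the checks named above. struct req_sketch and both functions are hypothetical, and the real code operates on kernel I/O structures rather than a flat descriptor:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096	/* assumed page size */

/* Hypothetical request descriptor. */
struct req_sketch {
	bool		is_read;
	uint64_t	offset;			/* byte offset */
	uint32_t	length;			/* byte count */
	bool		all_pages_cached;	/* every page has swap backing */
	bool		has_bogus_page;
};

/* 1 = handled by the swap cache, 0 = caller must use the vnode path. */
static int
cache_strategy_sketch(const struct req_sketch *r, bool read_enable)
{
	if (!read_enable || !r->is_read || r->has_bogus_page)
		return 0;
	if (r->offset % SKETCH_PAGE_SIZE != 0 ||
	    r->length % SKETCH_PAGE_SIZE != 0)
		return 0;
	if (!r->all_pages_cached)
		return 0;
	/* a real implementation would issue the swap read here */
	return 1;
}

static const char *
strategy_sketch(const struct req_sketch *r, bool read_enable)
{
	if (cache_strategy_sketch(r, read_enable))
		return "swapcache";
	return "vnode";
}
```

A 6144-byte UFS meta-data block fails the page-alignment test and is read from the vnode, as the commit notes.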
Matthew Dillon [Thu, 4 Feb 2010 22:32:11 +0000 (14:32 -0800)]
Merge branch 'master' of ssh://crater.dragonflybsd.org/repository/git/dragonfly
Constantine A. Murenin [Thu, 4 Feb 2010 22:08:47 +0000 (17:08 -0500)]
it(4): it3 at port 0x228
* Port 0x228 is quite popular on many motherboards.
* Makes it(4) work on my GIGABYTE GA-MA78GM-S2H (780G / SB700).
Matthew Dillon [Thu, 4 Feb 2010 17:05:57 +0000 (09:05 -0800)]
kernel - SWAP CACHE part 7/many - Add vm_swapcache.c core (write side)
* Add vm_swapcache.c which will be responsible for assigning swap to clean
vnode-backed VM pages and writing the data out.
Implement a very simple inactive queue scanner and swap-writer for
testing.
* Track swap space use, split up into the piece used for anonymous
data and the piece used for clean vnode-backed data.
* Add PG_SWAPPED tracking for newly allocated VM pages via
swap_pager_page_inserted().
* Conditionalize the swap code's dirtying/undirtying of VM pages. We
don't want to mess with the dirty state when working the swap
cache since it isn't the definitive backing store for the VM page.
Matthew Dillon [Thu, 4 Feb 2010 03:24:44 +0000 (19:24 -0800)]
Merge branch 'master' of ssh://crater.dragonflybsd.org/repository/git/dragonfly
Constantine A. Murenin [Tue, 2 Feb 2010 23:36:27 +0000 (18:36 -0500)]
syslog: introduce /var/log/daemon
* The idea is taken from OpenBSD.
* The immediate benefit is more informational messages from sensorsd,
e.g. stuff like the total number of sensors, configuration reloads
and 'OK' and 'within' status/state events.
Matthew Dillon [Thu, 4 Feb 2010 03:02:45 +0000 (19:02 -0800)]
kernel - SWAP CACHE part 6/many - Refactor swap_pager_freespace()
* Refactor swap_pager_freespace() to use a RB_SCAN() instead of a
vm_pindex_t iteration. This is necessary if we intend to allow
swap backing store for vnodes because the related files & VM objects
can be huge. This is also generally a good idea in 64-bit mode
to help deal with x86_64's massive address space.
* Start adding swap space freeing calls in the OBJT_VNODE handling code
and generic VM object handling code.
* Remove various checks for OBJT_SWAP from swap*() and swp*() functions
to allow them to be used with OBJT_VNODE objects.
* Add checks for degenerate cases to reduce call overheads as the swap
handling functions are now called for vnode objects too.
* Add assertions for pagers which do not need swap support.
Matthew Dillon [Thu, 4 Feb 2010 01:19:36 +0000 (17:19 -0800)]
kernel - SWAP CACHE part 5/many - Change vm_pindex_t to 64 bits on i386
* Change vm_pindex_t from unsigned long (32 bits) to __uint64_t (64 bits).
This change is necessary to support block devices with greater than 16TB
of storage as well as to support the mmap()ing of HAMMER files larger
than 16TB.
Primarily this was done to support block devices greater than 16TB
since HAMMER volumes are allowed to be up to 4096TB each. Filesystem
mounts use VM objects to back block devices.
* On x86_64 vm_pindex_t is already 64 bits but change the typedef from
unsigned long to __uint64_t to match i386.
* Most conversions to and from vm_pindex_t are to 64 bits anyway so this
change does not create any performance issues.
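The 16TB figure follows from the index width, assuming 4KB pages: 2^32 pages x 2^12 bytes = 2^44 bytes = 16TB (binary units). A small check of that arithmetic, and of how many pages a 4096TB HAMMER volume needs, with hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>

/* Bytes addressable by a page index 'bits' wide, assuming 4KB (2^12) pages. */
static uint64_t
max_bytes_for_index_bits(unsigned int bits)
{
	assert(bits + 12 < 64);
	return (uint64_t)1 << (bits + 12);
}

/* 4KB pages needed to cover 'tb' terabytes (2^40 bytes each). */
static uint64_t
pages_for_tb(uint64_t tb)
{
	return tb << (40 - 12);
}
```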
Matthew Dillon [Thu, 4 Feb 2010 00:50:09 +0000 (16:50 -0800)]
kernel - SWAP CACHE part 4/many - Add PG_SWAPPED
* Add the PG_SWAPPED flag to struct vm_page to indicate when
backing store has been assigned to a VM page.
Matthew Dillon [Wed, 3 Feb 2010 23:19:52 +0000 (15:19 -0800)]
kernel - VM - fix vm_pages_needed race
* vm_pages_needed sleep/wakeup can race and cause a wakeup to be missed,
resulting in processes getting stuck in 'pfault' until something else
kicks the pager.
Fix the race.
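The commit does not describe the fix itself. As a generic user-space illustration of the lost-wakeup problem (pthreads, not the kernel's sleep/wakeup primitives), the waiter must test the condition and go to sleep under the same lock the waker holds while setting it. All names here are hypothetical:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

struct pages_needed_sketch {
	pthread_mutex_t	mtx;
	pthread_cond_t	cv;
	bool		pages_available;
};

/* Check and sleep atomically with respect to pages_freed(). */
static void
wait_for_pages(struct pages_needed_sketch *s)
{
	pthread_mutex_lock(&s->mtx);
	while (!s->pages_available)
		pthread_cond_wait(&s->cv, &s->mtx);
	pthread_mutex_unlock(&s->mtx);
}

/* Set the condition and wake sleepers while holding the same mutex. */
static void
pages_freed(struct pages_needed_sketch *s)
{
	pthread_mutex_lock(&s->mtx);
	s->pages_available = true;
	pthread_cond_broadcast(&s->cv);
	pthread_mutex_unlock(&s->mtx);
}

static void *
waiter_thread(void *arg)
{
	wait_for_pages(arg);
	return NULL;
}
```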
Matthew Dillon [Wed, 3 Feb 2010 22:45:32 +0000 (14:45 -0800)]
kernel - SWAP CACHE part 3/many - Rearrange VM pagerops
* Remove pgo_init, pgo_pageunswapped, and pgo_strategy
* The swap pager was the only consumer of pgo_pageunswapped and
pgo_strategy. Since these functions will soon operate on any
VM object type and not just OBJT_SWAP there's no point putting
them in pagerops.
* Make swap_pager_strategy() and swap_pager_unswapped() global
functions and call them directly.
Matthew Dillon [Wed, 3 Feb 2010 21:23:58 +0000 (13:23 -0800)]
kernel - syncache - Fix races due to struct syncache not being stable storage
* struct syncache no longer uses stable storage. Proactively delete
tcpcb references to the syncache instead of letting them hang.
Matthew Dillon [Wed, 3 Feb 2010 21:22:23 +0000 (13:22 -0800)]
kernel - jails - Fix NULL pointer deref in prison_remote_ip()
* This might be a bit of a hack but shortcut the routine if
td->td_ucred is NULL. This occurs if the routine is called
via a kernel support thread.
Matthew Dillon [Wed, 3 Feb 2010 19:04:55 +0000 (11:04 -0800)]
AHCI - Improve warning messages when probing for a port multiplier
* Improve the warning messages on the console so the sysad knows the
PM probe failure is just a notification and not actually an error.
Submitted-by: "Edward O'Callaghan" <eocallaghan@auroraux.org>
Matthew Dillon [Wed, 3 Feb 2010 18:24:36 +0000 (10:24 -0800)]
sshd - Add safety measures to the default installed sshd_config
* Uncomment various sshd_config options to enforce their defaults.
This does not make any changes to the current defaults but ensures that
the configuration state for these particular options will not change
if the default happens to be changed in the distributed codebase.
RhostsRSAAuthentication no
HostbasedAuthentication no
IgnoreRhosts yes
* Change the ChallengeResponseAuthentication default from 'yes' to 'no'.
This only applies to PAM, and PAM is disabled by default, so this change
has no effect unless PAM is enabled by default at some future time.
* For now leave UsePAM commented out, do not enforce its default 'no' state.
The changes above will make it safe if the codebase default changes in
the future. The codebase default is currently 'no'.
* Note that we previously also changed the PasswordAuthentication default
to 'no', so everything is on the same page now.
Suggested-by: Doug Barton <dougb@freebsd.org> (generally)
Matthew Dillon [Wed, 3 Feb 2010 18:23:36 +0000 (10:23 -0800)]
Merge branch 'master' of ssh://crater.dragonflybsd.org/repository/git/dragonfly
Sascha Wildner [Wed, 3 Feb 2010 15:36:28 +0000 (16:36 +0100)]
make upgrade: Don't remove /etc/upgrade/Makefile_upgrade.inc upon completion.
It's not dangerous to 'make upgrade' more than once and it's even useful
when testing.
Matthew Dillon [Wed, 3 Feb 2010 05:58:40 +0000 (21:58 -0800)]
kernel - SWAP CACHE part 2/many - Remove VM pager lists
* VM pager lists were used to associate handles with VM objects. Only the
device_pager actually used them. Store the VM object in cdev_t->si_object
instead and remove the device pager's VM pager list.
* phys_pager and swap_pager only use anonymous objects, the VM pager lists
were implemented but not used. Assert that the handles are NULL and remove
the VM pager lists.
* Remove vm_pager_object_lookup().
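A minimal sketch of the idea, using simplified stand-in types: `struct cdev_sketch`, `struct vm_object_sketch` and `dev_pager_lookup()` are hypothetical and model only the fields involved, not the kernel's real definitions. Once the device caches its VM object in `si_object`, finding the object is a pointer read rather than a search of a pager list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for vm_object and cdev_t; only the fields
 * needed to illustrate the lookup are modeled. */
struct vm_object_sketch {
	void *handle;		/* the device this object backs */
	size_t size;		/* object size in pages */
};

struct cdev_sketch {
	struct vm_object_sketch *si_object;	/* NULL until first mapped */
};

/* Return the device's VM object, allocating it on first use. Because
 * the object is cached on the device, no pager list is searched. */
static struct vm_object_sketch *
dev_pager_lookup(struct cdev_sketch *dev, size_t npages)
{
	struct vm_object_sketch *obj = dev->si_object;

	if (obj == NULL) {
		obj = calloc(1, sizeof(*obj));
		if (obj == NULL)
			return NULL;
		obj->handle = dev;
		obj->size = npages;
		dev->si_object = obj;
	}
	assert(obj->handle == dev);
	return obj;
}
```

Repeated lookups on the same device return the same object, which is the property the removed VM pager list used to provide.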
Matthew Dillon [Wed, 3 Feb 2010 04:36:21 +0000 (20:36 -0800)]
kernel - SWAP CACHE part 1/many - Convert swblock to a Red-Black tree
* Convert struct swblock from being hashed to a per-vm_object RB tree.
This removes two pointers from struct swblock but adds an RB_ENTRY, which
is three pointers and an integer, so swblock gets a little more
bloated.
* Optimize swp_pager_meta_free_all(). We previously indexed through
the entire VM object's size which doesn't scale well for 64-bit
or for swap-cached vnodes. Now we need only iterate the RB tree.
* Move swblock fields out of the VM pager union and make them part of the
native vm_object structure. Swap block assignments will soon be allowed
on vnodes for fast data caching.
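The scaling argument behind the swp_pager_meta_free_all() change can be sketched as follows. This is an illustration, not the kernel code: a plain unbalanced binary search tree stands in for the RB tree, and `struct swblock_sketch`, `swb_insert()` and `swb_free_all()` are hypothetical names. The point is that tearing down the metadata visits only the blocks that exist, however large the object's index space is.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-object swap metadata node, keyed by page index. */
struct swblock_sketch {
	uint64_t index;			/* first page index covered */
	struct swblock_sketch *left;
	struct swblock_sketch *right;
};

/* Insert a node keyed by index. Returns 1 if inserted, 0 if already
 * present, -1 on allocation failure. (Unbalanced BST standing in for
 * an RB tree.) */
static int
swb_insert(struct swblock_sketch **root, uint64_t index)
{
	while (*root != NULL) {
		if (index == (*root)->index)
			return 0;
		root = (index < (*root)->index) ? &(*root)->left
						: &(*root)->right;
	}
	*root = calloc(1, sizeof(**root));
	if (*root == NULL)
		return -1;
	(*root)->index = index;
	return 1;
}

/* Free every node and return how many were visited. The cost tracks
 * the number of allocated blocks, not the object's size, so a sparse
 * 64-bit-sized object is as cheap to tear down as a small one. */
static size_t
swb_free_all(struct swblock_sketch *node)
{
	size_t n;

	if (node == NULL)
		return 0;
	n = 1 + swb_free_all(node->left) + swb_free_all(node->right);
	free(node);
	return n;
}
```

The old hashed scheme, by contrast, had to probe every index from 0 to the object's size to find the blocks to free.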
Stathis Kamperis [Tue, 2 Feb 2010 22:52:07 +0000 (00:52 +0200)]
HAMMER Utility - Handle PFS#0 case in 'snapls' directive
While here, also fix a memory leak.
Stathis Kamperis [Tue, 2 Feb 2010 19:45:24 +0000 (21:45 +0200)]
HAMMER Utility - Extend output in 'snapls' directive.
Matthew Dillon [Tue, 2 Feb 2010 18:17:52 +0000 (10:17 -0800)]
HAMMER Utility - Revise snaprm documentation
* Do a better job documenting the various arguments to the
snaprm directive.
Matthew Dillon [Tue, 2 Feb 2010 17:51:00 +0000 (09:51 -0800)]
HAMMER VFS - Fix assertion when taking snapshot
* hammer_ioc_add_snapshot() issues an ASOF lookup for the snapshot and
then a non-ASOF insertion (insertions never use ASOF). However, the
ASOF lookup can modify the cursor's key (cursor.key_beg).
This mismatch between the cursor's key and the leaf being inserted can
then result in an assertion in the btree insertion code.
* Reloading the key before doing the insertion fixes the problem. Also
document the case.
Reported-by: Stathis Kamperis <ekamperi@gmail.com>
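The shape of the fix can be sketched with hypothetical types: `struct key_sketch`, `struct cursor_sketch`, `asof_lookup()` and `btree_insert()` are stand-ins, not HAMMER's real API. The lookup may rewrite the cursor key, so the key is reloaded from the leaf before the insertion, whose consistency check is modeled here with assert().

```c
#include <assert.h>
#include <stdint.h>

struct key_sketch {
	int64_t obj_id;
	int64_t key;
	uint64_t create_tid;
};

struct cursor_sketch {
	struct key_sketch key_beg;	/* key the cursor is positioned on */
};

/* Hypothetical ASOF lookup: as a side effect it adjusts the cursor
 * key, e.g. to the transaction id it matched against. */
static void
asof_lookup(struct cursor_sketch *cursor, uint64_t asof)
{
	cursor->key_beg.create_tid = asof;
}

/* Hypothetical insertion: requires the cursor key to match the leaf. */
static int
btree_insert(const struct cursor_sketch *cursor, const struct key_sketch *leaf)
{
	assert(cursor->key_beg.obj_id == leaf->obj_id);
	assert(cursor->key_beg.key == leaf->key);
	assert(cursor->key_beg.create_tid == leaf->create_tid);
	return 0;
}

static int
add_snapshot(struct cursor_sketch *cursor, const struct key_sketch *leaf,
	     uint64_t asof)
{
	cursor->key_beg = *leaf;
	asof_lookup(cursor, asof);	/* may clobber key_beg */
	cursor->key_beg = *leaf;	/* reload the key before inserting */
	return btree_insert(cursor, leaf);
}
```

Without the second assignment in add_snapshot() the insertion would see the lookup's create_tid and trip the assertion, which mirrors the reported failure.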
Matthew Dillon [Sun, 31 Jan 2010 23:39:26 +0000 (15:39 -0800)]
kernel - IF_NFE - Continue work on word alignment support
* Add a capability and enable word-alignment conditionally. For now
just enable it for the MCP77 and MCP79 chipsets.
* Note that the CK804 family does not appear to support 2-byte
DMA alignment.
Reported-by: Rumko
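A sketch of gating the feature on a per-chip capability bit. The flag `CAP_WORDALIGN`, the `chip_sketch` enum and `rx_align_offset()` are hypothetical; the real driver identifies chips by PCI ID and its flag name may differ. Chips without the bit keep the unshifted RX buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical capability bit; the real driver's flag may differ. */
#define CAP_WORDALIGN	0x0001

enum chip_sketch { CHIP_CK804, CHIP_MCP77, CHIP_MCP79 };

struct chip_caps_sketch {
	enum chip_sketch chip;
	unsigned caps;
};

/* Only chips known to accept 2-byte RX DMA alignment get the bit. */
static const struct chip_caps_sketch chip_table[] = {
	{ CHIP_CK804, 0 },
	{ CHIP_MCP77, CAP_WORDALIGN },
	{ CHIP_MCP79, CAP_WORDALIGN },
};

/* Return the RX buffer offset to program: 2 if supported, else 0. */
static size_t
rx_align_offset(enum chip_sketch chip)
{
	for (size_t i = 0; i < sizeof(chip_table) / sizeof(chip_table[0]); i++) {
		if (chip_table[i].chip == chip)
			return (chip_table[i].caps & CAP_WORDALIGN) ? 2 : 0;
	}
	return 0;	/* unknown chip: keep the conservative default */
}
```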
Matthew Dillon [Sun, 31 Jan 2010 22:08:56 +0000 (14:08 -0800)]
kernel - NFS - Document an issue with nfs_realign()
* Document the fact that nfs_realign() must use blocking mbuf allocations
or risk locking up TCP NFS mount connections due to TCP NFS mounts not
retrying RPCs unless the link itself is lost.
Matthew Dillon [Sun, 31 Jan 2010 18:18:25 +0000 (10:18 -0800)]
kernel - SILI disk driver - Add support for Sil3124
* Sil3124 uses the same chipset ABI as the 3134 but with 4 ports
instead of 2. It appears to only need a PCI entry.
* This is for the PCI-X 3124. The 3124A is a PCI-e version which
probably will also work (not yet tested), and for which we still
need the PCI ID.
Submitted-by: Tim Darby <t+dfbsd@timdarby.net>
Sascha Wildner [Sun, 31 Jan 2010 17:19:26 +0000 (18:19 +0100)]
Regenerate sysproto.h (forgotten in last commit to syscalls.master).
Sascha Wildner [Sun, 31 Jan 2010 16:50:11 +0000 (17:50 +0100)]
POSIX says mprotect(2)'s first argument shall not be const.
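For reference, the POSIX prototype is `int mprotect(void *addr, size_t len, int prot);`. A minimal userland usage sketch that maps one anonymous page and then makes it read-only; `make_readonly_page()` is an illustrative helper, not part of any library, and the `_DEFAULT_SOURCE` define is an assumption needed to expose `MAP_ANON` on glibc.

```c
#define _DEFAULT_SOURCE		/* expose MAP_ANON on glibc */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map one anonymous page, fill it, then make it read-only.
 * mprotect() takes a plain void *, so the mapping address is passed
 * without a const qualifier. Returns the page, or NULL on failure. */
static void *
make_readonly_page(size_t *lenp)
{
	long pagesz = sysconf(_SC_PAGESIZE);
	void *p;

	if (pagesz <= 0)
		return NULL;
	p = mmap(NULL, (size_t)pagesz, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANON, -1, 0);
	if (p == MAP_FAILED)
		return NULL;
	memset(p, 0xAB, (size_t)pagesz);
	if (mprotect(p, (size_t)pagesz, PROT_READ) != 0) {
		munmap(p, (size_t)pagesz);
		return NULL;
	}
	*lenp = (size_t)pagesz;
	return p;
}
```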
Matthew Dillon [Sun, 31 Jan 2010 05:49:59 +0000 (21:49 -0800)]
kernel - NFE - Align packet data payload
* Offset the RX ring DMA by 2 bytes so the IP header, TCP header, and
payload are aligned after the 14-byte Ethernet header.
EM does the same thing.
* Reduces NFS overhead during bcopy()s and also avoids triggering
nfs_realign.
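The arithmetic behind the 2-byte offset, as a small sketch: an Ethernet header is 14 bytes (two 6-byte MAC addresses plus a 2-byte EtherType), so from a 4-byte-aligned buffer start the IP header lands at offset 14, which is not 32-bit aligned. Shifting the DMA start by 2 moves it to 16. The helper names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

#define ETHER_HDR_LEN_SKETCH	14	/* 6 dst + 6 src + 2 EtherType */

/* Offset of the IP header within a receive buffer whose DMA start is
 * shifted by rx_offset bytes from a 4-byte-aligned base. */
static size_t
ip_header_offset(size_t rx_offset)
{
	return rx_offset + ETHER_HDR_LEN_SKETCH;
}

/* Nonzero if the IP header starts on a 32-bit boundary. */
static int
ip_header_aligned(size_t rx_offset)
{
	return ip_header_offset(rx_offset) % 4 == 0;
}
```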
Sascha Wildner [Sun, 31 Jan 2010 05:26:42 +0000 (06:26 +0100)]
periodic.conf.5: Update for pkgsrc checks.
Describe the recently added variables so that people actually know about
them.
Adapted-from: NetBSD
Sascha Wildner [Sun, 31 Jan 2010 04:37:20 +0000 (05:37 +0100)]
Sync zoneinfo database with tzdata2010b from elsie.
northamerica: 8.28 -> 8.30
zone.tab: 8.31 -> 8.33
Beginning in 2010, several Mexican cities near the north border will share
their DST schedule with the United States.
This requires splitting up several zones (adding new ones for those
cities).
Sascha Wildner [Sun, 31 Jan 2010 04:20:20 +0000 (05:20 +0100)]
md5.1: Clean up the last commit a bit.
Constantine A. Murenin [Sat, 30 Jan 2010 09:45:01 +0000 (04:45 -0500)]
acpi.4: Xr aibs(4)
Constantine A. Murenin [Sat, 30 Jan 2010 09:44:41 +0000 (04:44 -0500)]
aibs(4): s/misformed/malformed/; suggested by Paul Goyette
Constantine A. Murenin [Sat, 30 Jan 2010 09:44:23 +0000 (04:44 -0500)]
aibs(4): use ACPI_INTEGER and PRIx64; suggested by Jukka Ruohonen
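A sketch of the portable formatting idiom, assuming `uint64_t` as a stand-in for a 64-bit ACPI_INTEGER; `format_u64_hex()` is a hypothetical helper. `PRIx64` from `<inttypes.h>` expands to the right length modifier for `uint64_t` on each platform, unlike a hard-coded `%llx` or `%lx`.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Format a 64-bit value as "0x..." hex; returns snprintf's result. */
static int
format_u64_hex(char *buf, size_t len, uint64_t v)
{
	return snprintf(buf, len, "0x%" PRIx64, v);
}
```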
Matthew Dillon [Fri, 29 Jan 2010 18:55:34 +0000 (10:55 -0800)]
kernel - Fix issue in UFS related to new nvtruncbuf() API use
* When a UFS truncation must downsize a block it must sometimes call
FSYNC twice, the second time to flush out softdep block dependencies
related to the original indirect block.
UFS depends on the first FSYNC call to prevent the buffer cache buffer
straddling the new file/directory EOF from becoming dirty. However,
nvtruncbuf() defeats this by re-dirtying the bp.
The solution is to simply undirty the bp prior to the second FSYNC,
which works fine since it will be written out later with a b*write()
anyway.
* Fixes 'locking against myself' panic w/UFS.
Reported-by: Stathis Kamperis <ekamperi@gmail.com>
Matthew Dillon [Thu, 28 Jan 2010 17:04:34 +0000 (09:04 -0800)]
kernel - Even more buffer cache / VM coherency work
* nvtruncbuf/nvextendbuf now clear the cached layer 2 disk offset
from the buffer cache buffer being zero-extended or zero-truncated.
This is required by HAMMER since HAMMER never overwrites data
in the same media block.
* Convert HAMMER over to the new nvtruncbuf/nvextendbuf API.
The new API automatically handles zero-truncations and zero-extensions
within the buffer straddling the file EOF and also changes the way
backing VM pages are handled. Instead of cutting the VM pages off
at the nearest boundary past file EOF any pages in the straddling
buffer are left fully valid and intact, which avoids numerous pitfalls
the old API had in dealing with VM page valid/dirty bits during
file truncations and extensions.
* Make sure the PG_ZERO flag in the VM page is cleared in allocbuf().
* Refactor HAMMER's strategy code to close two small windows of
opportunity where stale data might be read from the media. In
particular, refactor hammer_ip_*_bulk(), hammer_frontend_trunc*(),
and hammer_io_direct_write(). These were detected by the fsx test
program on a heavily paging system with physical memory set artificially
low.
Data flows through three stages in HAMMER:
(1) Buffer cache.
(2) In-memory records referencing the direct-write data offset on the
media until the actual B-Tree is updated on-media at a later time.
(3) Media B-Tree lookups referencing the committed data offset on the
media.
HAMMER must perform a careful, fragile dance to ensure that access to
the data from userland doesn't slip through any cracks while the data
is transitioning between stages. Two cracks were found and fixed:
(A) The direct-write code was allowing the BUF/BIO in the strategy
call to complete before adding the in-memory record to the index
for the stage 1->2 transition. Now fixed.
(B) The HAMMER truncation code was skipping in-memory records queued
to the backend flusher under the assumption that the backend
flusher would deal with them, which it will eventually, but there
was a small window where the data was still accessible by userland
after the truncation if userland did a truncation followed by an
extension. Now fixed.
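Fix (A) is an ordering rule, which can be modeled as a small state sketch. `struct direct_write_sketch`, `data_reachable()` and `direct_write_complete()` are hypothetical and ignore locking; they only capture the invariant that the data must stay reachable through either the buffer or an in-memory record at every step of the stage 1->2 transition.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one direct write. */
struct direct_write_sketch {
	bool record_indexed;	/* stage 2: in-memory record visible */
	bool bio_done;		/* BIO completed; buffer may be discarded */
};

/* Data is reachable if the buffer is still live or a record points at
 * the media copy. The crack was a state where neither held. */
static bool
data_reachable(const struct direct_write_sketch *w)
{
	return !w->bio_done || w->record_indexed;
}

/* Fixed ordering: index the record first, then complete the BIO, so
 * every intermediate state keeps the data reachable. */
static void
direct_write_complete(struct direct_write_sketch *w)
{
	w->record_indexed = true;
	assert(data_reachable(w));
	w->bio_done = true;
	assert(data_reachable(w));
}
```

Reversing the two assignments in direct_write_complete() would pass through the state { record_indexed = false, bio_done = true }, which is exactly the window described above.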
Matthew Dillon [Tue, 26 Jan 2010 20:50:33 +0000 (12:50 -0800)]
HAMMER VFS - Disallow rebalancing on small-memory machines
* Rebalancing may have to hold upwards of 3900 buffers locked
in the worst case, so disallow the operation on machines which
do not configure enough buffer cache buffers.
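A sketch of such a guard. The 3900 figure comes from the commit text; the constant name, `rebalance_check()`, the caller-supplied `reserve` margin and the choice of `ENOSPC` as the error are all assumptions for illustration, not the actual HAMMER check.

```c
#include <assert.h>
#include <errno.h>

/* Worst-case number of buffers a rebalance may hold locked (from the
 * commit text; the real constant and its name may differ). */
#define REBALANCE_WORST_BUFS	3900L

/* Return 0 if the buffer cache can hold a worst-case rebalance while
 * leaving `reserve` buffers for everything else, else ENOSPC. */
static int
rebalance_check(long nbuf, long reserve)
{
	assert(reserve >= 0);
	if (nbuf < REBALANCE_WORST_BUFS + reserve)
		return ENOSPC;
	return 0;
}
```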
Matthew Dillon [Tue, 26 Jan 2010 20:41:03 +0000 (12:41 -0800)]
kernel - More buffer cache / VM coherency work
* Add a buffer offset argument to nvtruncbuf(). The truncation length and
blocksize for the block containing the truncation point alone are
insufficient since prior blocks might be using a different blocksize.
* Add a buffer offset argument to nvnode_pager_setsize() for the same
reason.
* nvtruncbuf() and nvextendbuf() now bdwrite() the buffer being zero-filled.
This fixes a race where the clean buffer might be discarded and read
from the media's pre-truncation backing store again before the filesystem
has a chance to adjust it.
* nvextendbuf() now takes additional arguments. The block offset for the
old and new blocks must be passed.
* Convert UFS over to use the nv*() API, hopefully solving any remaining
fsx VM/BUF coherency issues.
* Correct bugs with swap_burst_read mode, but leave the mode disabled.
There are still unresolved issues when the mode is enabled.
(Reported-by: YONETANI Tomokazu <qhwt+dfly@les.ath.cx>)
* Fix a bug in vm_prefault() which would leak VM pages, eventually
causing the machine to run out of memory.
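A sketch of why the buffer offset argument is needed, and of the zero-fill it feeds, under a hypothetical layout where the first three blocks are 16 KB and later blocks are 64 KB. The constants and the helpers `block_start()`, `naive_block_start()` and `zero_past_eof()` are illustrative, not the kernel API. Rounding the truncation point down by the straddling block's size alone lands in the wrong place once earlier blocks differ in size; with the real block offset, only the bytes past EOF are zeroed and the rest of the buffer stays valid.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: SMALL_COUNT blocks of SMALL_SIZE bytes, then
 * blocks of LARGE_SIZE bytes. */
#define SMALL_SIZE	16384
#define SMALL_COUNT	3
#define LARGE_SIZE	65536

/* Start offset of the block containing file offset `off`. */
static int64_t
block_start(int64_t off)
{
	const int64_t small_end = (int64_t)SMALL_SIZE * SMALL_COUNT;

	if (off < small_end)
		return off - off % SMALL_SIZE;
	return small_end + (off - small_end) / LARGE_SIZE * LARGE_SIZE;
}

/* What a caller holding only the length and the block size would
 * compute: correct only if every earlier block had the same size. */
static int64_t
naive_block_start(int64_t off, int64_t blksize)
{
	return off - off % blksize;
}

/* Zero the bytes of a buffer at file offset `loffset` that lie past
 * `eof`; the bytes before EOF are left intact. Returns bytes zeroed. */
static size_t
zero_past_eof(unsigned char *data, int64_t loffset, size_t bsize, int64_t eof)
{
	size_t keep;

	assert(eof >= loffset && eof <= loffset + (int64_t)bsize);
	keep = (size_t)(eof - loffset);
	memset(data + keep, 0, bsize - keep);
	return bsize - keep;
}
```

For a truncation to 100 KB, block_start() yields 48 KB while the naive rounding yields 64 KB, so a zero-fill driven by the naive offset would target the wrong bytes.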
Jan Lentfer [Tue, 26 Jan 2010 17:33:00 +0000 (18:33 +0100)]
groff: Fixup after new version import
* adds a patch based on current groff cvs
that fixes error messages during man lint
runs
* Fixup tmac/Makefile to take new files
into account
Jan Lentfer [Tue, 26 Jan 2010 09:08:46 +0000 (10:08 +0100)]
groff: Update master to work with v1.20.1
* updated patches to apply cleanly
* removed one obsolete patch
Jan Lentfer [Mon, 25 Jan 2010 22:04:24 +0000 (23:04 +0100)]
groff: update vendor branch to v1.20.1