Matthew Dillon [Sat, 18 Sep 2010 20:23:41 +0000 (13:23 -0700)]
kernel - Optimize kfree() to greatly reduce IPI traffic
* Instead of IPIing the chunk being freed to the originating cpu we
use atomic ops to directly link the chunk onto the target slab.
We then notify the target cpu via an IPI message only in the case where
we believe the slab has to be entered back onto the target cpu's
ZoneAry.
This reduces the IPI messaging load by a factor of 100x or more.
kfree() sends virtually no IPIs any more.
* Move malloc_type accounting to the cpu issuing the kmalloc or kfree
(kfree used to forward the accounting to the target cpu). The
accounting is done using the per-cpu malloc_type accounting array
so large deltas will likely accumulate, but they should all cancel
out properly in the summation.
* Use the kmemusage array and kup->ku_pagecnt to track whether a
SLAB is active or not, which allows the handler for the asynchronous IPI
to validate that the SLAB still exists before trying to access it.
This is necessary because once the cpu doing the kfree() successfully
links the chunk into z_RChunks, the target slab can get ripped out
from under it by the owning cpu.
* The special cpu-competing linked list is different from the linked list
normally used to find free chunks, so the localized code and the
MP code are segregated.
We pay special attention to list ordering to try to avoid unnecessary
cache mastership changes, though it should be noted that the c_Next
link field in the chunk creates an issue no matter what we do.
A 100% lockless algorithm is used. atomic_cmpset_ptr() is used
to manage the z_RChunks singly-linked list.
* Remove the page localization code for now. For the life of the
typical chunk of memory I don't think this provided much of
an advantage.
Prodded-by: Venkatesh Srinivas
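The lockless linking described above can be sketched in portable C11. This is only an illustration, not the kernel code: C11 atomic_compare_exchange_weak() stands in for atomic_cmpset_ptr(), the struct and function names are hypothetical, and treating "the list was empty before the push" as the condition for notifying the owning cpu is an assumption made for the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for the slab chunk and zone; not the kernel's types. */
struct chunk {
    struct chunk *c_next;
};

struct zone {
    _Atomic(struct chunk *) z_rchunks;  /* chunks freed by other cpus */
};

/*
 * Push a chunk onto the zone's remote-free list without taking a lock.
 * Returns 1 if the list was empty before the push (assumed here to be
 * the case where the owner might need a notification), 0 otherwise.
 */
static int
remote_free_push(struct zone *z, struct chunk *c)
{
    struct chunk *head = atomic_load(&z->z_rchunks);

    do {
        c->c_next = head;   /* link to the head we expect to replace */
    } while (!atomic_compare_exchange_weak(&z->z_rchunks, &head, c));

    return (head == NULL);
}

/* Owner side: detach the whole list in one atomic exchange. */
static struct chunk *
remote_free_drain(struct zone *z)
{
    return atomic_exchange(&z->z_rchunks, NULL);
}
```

On a failed compare-exchange, head is reloaded with the current list head, so the loop simply relinks and retries. Because the owner only ever detaches the entire list, a push-only list like this does not run into the ABA problem that a lockless pop would.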
Sascha Wildner [Sat, 18 Sep 2010 18:32:38 +0000 (20:32 +0200)]
kernel: Remove #include <sys/mutex.h> if <sys/mutex2.h> is included too.
Sascha Wildner [Sat, 18 Sep 2010 18:06:33 +0000 (20:06 +0200)]
newfs(8): Remove some leftover defines that are no longer used.
Sascha Wildner [Sat, 18 Sep 2010 10:05:25 +0000 (12:05 +0200)]
Fix some synopses in various manual pages.
Matthew Dillon [Sat, 18 Sep 2010 00:56:27 +0000 (17:56 -0700)]
network - Correct bug in last commit
* Fix a crit_enter() that had to be removed.
Reported-by: YONETANI Tomokazu <qhwt.dfly@les.ath.cx>
Matthew Dillon [Fri, 17 Sep 2010 23:38:37 +0000 (16:38 -0700)]
network - Remove crit_exit/crit_enter wrappers in pf.c
* Note I'm talking about exit/enter wrappers, not enter/exit wrappers.
I believe the enter/exit wrappers can be removed too but for now
we have to remove the exit/enter wrappers which assumed a critical
section would be held on entry.
This is no longer the case. Since so much of the network stack is
now threaded, callers into PF are not necessarily holding a critical
section to exit out of.
Reported-by: lentferj, Rumko
Matthew Dillon [Fri, 17 Sep 2010 22:29:22 +0000 (15:29 -0700)]
network - Zero out m_len / m_pkthdr.len in m_get*() and friends
* Newly allocated mbufs now set m_len and (if a packet header)
m_pkthdr.len to 0 instead of leaving them uninitialized,
allowing us to assert that the mbuf does not have an overrun
later when it is freed.
Reported-by: Jan Lentfer <Jan.Lentfer@web.de>
YONETANI Tomokazu [Fri, 17 Sep 2010 12:03:04 +0000 (21:03 +0900)]
ips - missing part from 667d31bb
Without this change, the added strings are never used.
Noticed-by: Sascha Wildner (via DragonFly BSD Digest)
Antonio Huete Jimenez [Fri, 17 Sep 2010 10:23:37 +0000 (12:23 +0200)]
udevd.8 - Fix SYNOPSIS
Matthias Schmidt [Fri, 17 Sep 2010 10:04:36 +0000 (12:04 +0200)]
Merge branch 'master' of git://git.dragonflybsd.org/dragonfly
Matthias Schmidt [Fri, 17 Sep 2010 09:49:53 +0000 (11:49 +0200)]
dma - Fix the parsing of recipient addresses
Author: Peter Pentchev <roam@ringlet.net>
Matthias Schmidt [Fri, 17 Sep 2010 09:18:13 +0000 (11:18 +0200)]
dma - Fix double free buf
Author: Peter Pentchev <roam@ringlet.net>
Matthias Schmidt [Fri, 17 Sep 2010 09:16:23 +0000 (11:16 +0200)]
dma.8 - Change wording to match dma.conf
dma.conf says "Uncomment", man page says "comment". Correct this to
match dma.conf.
Author: Peter Pentchev <roam@ringlet.net>
Matthias Schmidt [Fri, 17 Sep 2010 09:13:31 +0000 (11:13 +0200)]
dma - Set correct group in Makefile.plain
dma needs to be setgid in group mail
Author: Peter Pentchev <roam@ringlet.net>
Matthias Schmidt [Fri, 17 Sep 2010 08:49:03 +0000 (10:49 +0200)]
dma - Fix typo
Matthew Dillon [Fri, 17 Sep 2010 08:45:04 +0000 (01:45 -0700)]
network - Fix race in accept() - try #2
* The last fix wasn't good enough. Really try to fix it this time. Use
a pool token and validate so_head after acquiring it to deal with races,
interlock against 0-ref races (sockets can be on the so_comp/so_incomp
queues with 0 references), and use it for the accept predicate.
Matthew Dillon [Fri, 17 Sep 2010 08:30:10 +0000 (01:30 -0700)]
kernel - Add lwkt_relpooltoken()
* Add a convenience function that is symmetric with lwkt_getpooltoken(),
though slightly slower than simply lwkt_reltoken()'ing the token returned
by lwkt_getpooltoken().
Matthew Dillon [Fri, 17 Sep 2010 06:00:26 +0000 (23:00 -0700)]
network - Fix race in accept()
* Fix a race where a socket undergoing an accept() was not being
referenced soon enough, resulting in a window of opportunity for
the kernel to attempt to free it if the tcp connection resets
before userland can finish the accept.
This resulted in an assertion panic.
Reported-by: Peter Avalos
Matthew Dillon [Fri, 17 Sep 2010 05:59:23 +0000 (22:59 -0700)]
utilities - Correct variable types to match kernel
* *vnodes, dirtybuf, etc are still int's in the kernel, for
systat -vm output.
Sascha Wildner [Fri, 17 Sep 2010 03:25:59 +0000 (05:25 +0200)]
kernel: Staticize some dev_ops and adjust a name in dev/sound.
Matthew Dillon [Fri, 17 Sep 2010 02:56:17 +0000 (19:56 -0700)]
Merge branch 'master' of ssh://crater.dragonflybsd.org/repository/git/dragonfly
Matthew Dillon [Thu, 16 Sep 2010 21:57:27 +0000 (14:57 -0700)]
kernel - Adjust AHCI driver to deal with AMD braindamage / 880G chipset
* As of this writing AMD has some new chipsets out for AM3 MBs which
support AHCI on 5 SATA + 1 E-SATA connector. My testing was done
on a MB with the 880G chipset.
The AHCI firmware for this chipset is a bit on the rough side. It
seems a bit slow on the INIT/device-detection sequencing (possibly due
to longer PHY training time? It's supposed to be a 6GBit PHY), and it
generates a stream of PCS interrupts for some devices.
My assumption is that the PCS interrupts are not being masked by the
chipset during the INIT phase. Both IFS and PCS interrupts seem to
occur during INIT/RESET and PM probing stages.
In addition, at least one drive... an Intel SSD, caused a large number
of PCS interrupts during the INIT phase even when connected to an
internal SATA port at power-on. This is clearly a bug in the AMD
AHCI chipset, again related to their firmware not internally masking
communications glitches during INIT, and/or taking an extra long time
to train the PHY.
* Adjust the AHCI driver to deal with this situation. Limit the interrupt
rate for PCS errors and do harsh reinitialization of the port when we get
a PCS error, along with allowing extra time for the device detect to
succeed.
* As a side benefit the AHCI driver should be able to deal with device
connection and disconnection on non-hot-swap-capable ports, at least
up to a point.
* Silence some of the console output during probe.
* Try harder to clear the CI/SACT registers when stopping a port. Some
chipsets appear to not clear the registers when we cycle ST when they
have already stopped the command processor, possibly as part of the IFS
or PCS interrupt paths.
* Fix a bug where an IFS or PCS interrupt marks a probe command (software
reset sequence) as complete when it actually errored-out.
* Sleep longer between retries if a command fails due to an IFS error.
When testing with the WD Green drives a drive inserted into a PM
enclosure cold seems to take longer to start up during the COMRESET
sequence. This only seems to occur with the AMD chipset and does
not occur with the older NVidia chipset. IFS errors occur for several
seconds beyond what I would consider a reasonable sleep interval.
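Limiting the rate at which an error interrupt is acted on can be sketched as a fixed-window counter. The structure, field names and tick units below are hypothetical; the log does not show how the AHCI driver actually implements its limit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical limiter state; not the AHCI driver's real fields. */
struct err_limiter {
    uint64_t window_start;    /* tick at which the current window began */
    unsigned count;           /* events accepted in the current window */
    unsigned max_per_window;  /* events allowed per window */
    uint64_t window_ticks;    /* window length in ticks */
};

/* Returns 1 if the event should be handled now, 0 if it should be ignored. */
static int
err_limiter_allow(struct err_limiter *l, uint64_t now)
{
    if (now - l->window_start >= l->window_ticks) {
        l->window_start = now;  /* start a fresh window */
        l->count = 0;
    }
    if (l->count >= l->max_per_window)
        return 0;
    l->count++;
    return 1;
}
```

A caller would consult err_limiter_allow() at the top of the error path and skip the expensive port reinitialization when it returns 0.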
Sascha Wildner [Fri, 17 Sep 2010 02:31:15 +0000 (04:31 +0200)]
pf(4): Fix a kprintf() warning on x86_64.
Matthias Schmidt [Thu, 16 Sep 2010 18:13:24 +0000 (20:13 +0200)]
rconfig(8) - Add new script to setup an encrypted root file system
This is basically a copy of hammer.sh modified to setup an encrypted
HAMMER root file system with cryptsetup/mkinitrd.
Venkatesh Srinivas [Thu, 16 Sep 2010 17:10:28 +0000 (10:10 -0700)]
Merge branch 'master' of /repository/git/dragonfly
Venkatesh Srinivas [Thu, 16 Sep 2010 17:09:00 +0000 (10:09 -0700)]
kernel - tmpfs: Convert tmpfs name malloc zone to a per-mount zone.
Now filenames from one tmpfs do not exhaust space in other ones.
Related to bug 1726.
Matthew Dillon [Thu, 16 Sep 2010 17:05:00 +0000 (10:05 -0700)]
Merge branch 'master' of ssh://crater.dragonflybsd.org/repository/git/dragonfly
Matthew Dillon [Thu, 16 Sep 2010 17:03:30 +0000 (10:03 -0700)]
kernel - Fix NFS panic
* nfs_write() was not wrapped with a token, leading to races.
* Add some queueing assertions while we are here.
Reported-by: Thomas Nikolajsen <thomas.nikolajsen@mail.dk>
Sascha Wildner [Thu, 16 Sep 2010 16:42:50 +0000 (18:42 +0200)]
Make bluetooth(4) compileable into the kernel and add it to LINT.
Sascha Wildner [Thu, 16 Sep 2010 16:42:33 +0000 (18:42 +0200)]
bluetooth(4): Remove an unused variable.
Matthew Dillon [Thu, 16 Sep 2010 16:37:34 +0000 (09:37 -0700)]
network - Fix unconverted netmsg function
* Fix a function I forgot to convert to the netmsg argument format.
Reported-by: swildner
Matthew Dillon [Thu, 16 Sep 2010 15:55:54 +0000 (08:55 -0700)]
HAMMER Utility - Adjust documentation
* Add some missing bits re: checkmap
Reported-by: Ilya Dryomov <idryomov@gmail.com>
Matthew Dillon [Thu, 16 Sep 2010 15:50:39 +0000 (08:50 -0700)]
network - Fix if_gif build when no INET6
* Make if_gif build properly when INET6 is not specified.
Reported-by: Ilya Dryomov <idryomov@gmail.com>
Alex Hornung [Thu, 16 Sep 2010 10:03:01 +0000 (12:03 +0200)]
initrd - Allow realroot to not have /dev for paths
* Check if the realroot (for local and crypt) has a MOUNTFROM that
begins with /dev/, i.e. is a full path. If not, just prepend /dev/.
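The check amounts to a prefix test. A minimal sketch in C, for illustration only: the initrd code itself is not shown in the log, the function name is hypothetical, and the input is assumed to be just the device part of MOUNTFROM.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Write the device path for a MOUNTFROM-style value into out.
 * Values that already start with "/dev/" are copied unchanged;
 * anything else gets "/dev/" prepended.
 * Returns 0 on success, -1 if out is too small.
 */
static int
realroot_devpath(const char *mountfrom, char *out, size_t outlen)
{
    int n;

    if (strncmp(mountfrom, "/dev/", 5) == 0)
        n = snprintf(out, outlen, "%s", mountfrom);
    else
        n = snprintf(out, outlen, "/dev/%s", mountfrom);

    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```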
Matthew Dillon [Thu, 16 Sep 2010 07:52:46 +0000 (00:52 -0700)]
network - Add some serious assertions when MBUF_DEBUG is enabled (2)
* Missed in the first commit, the sys/mbuf.h changes. Note that a full
kernel compile is required when MBUF_DEBUG is added (or removed)
from the kernel config. You can't mix-n-match
Matthew Dillon [Thu, 16 Sep 2010 07:50:40 +0000 (00:50 -0700)]
network - Allow asynchronous shutdown and fix a MP race in soshutdown().
* The ssb_release() call in sorflush() must be protected by
socket->so_rcv.ssb_token. This call is made from the user
side when soshutdown() is called.
* Allow shutdowns to interrupt another thread read()ing from the same
descriptor by removing the user-side interlock in the shutdown code
path.
Matthew Dillon [Thu, 16 Sep 2010 07:49:33 +0000 (00:49 -0700)]
network - Add assertions for direct messaged calls
* Certain pru_* functions are direct-messaged calls and the operation must
be done on return. Assert that the operation is done.
Matthew Dillon [Thu, 16 Sep 2010 07:48:05 +0000 (00:48 -0700)]
network - Add some serious assertions when MBUF_DEBUG is enabled
* Assert that the mbuf field state is sane when pulling a new one out of
the object cache.
* Store the last function (name) to free an mbuf as a debugging aid.
Matthew Dillon [Thu, 16 Sep 2010 07:44:10 +0000 (00:44 -0700)]
kernel - Make interrupt thread preemption programmable
* Add sysctl lwkt.preempt_enable (default on) to allow interrupt thread
preemption to be controlled for debugging purposes.
Matthew Dillon [Thu, 16 Sep 2010 07:38:49 +0000 (00:38 -0700)]
network - Fix nasty bug in udp6_send()
* This bug was causing machines receiving inet6 udp packets to crash
very quickly, but was nearly impossible to find due to the weird
way it caused mbufs to interact.
Reported-by: Peter Avalos <peter@theshell.com>,
Francois Tigeot <ftigeot@wolfpond.org>
Alex Hornung [Thu, 16 Sep 2010 08:37:27 +0000 (10:37 +0200)]
crashinfo - (hopefully) fix hang
* Pipe the commands into kgdb instead of using <. This seems to fix an
issue where kgdb wouldn't really get anything out of the file on the
other side of < and would get stuck in kqread.
Reported-by: Peter Avalos
Matthew Dillon [Thu, 16 Sep 2010 00:41:31 +0000 (17:41 -0700)]
network - Fix MP races in GIF
* GIF used a single route cache across all CPUs causing races. In addition
GIF did not clean out the cache when destroying an interface or changing
the address family.
* Change the single route cache entry to an array[SMP_MAXCPU] and also
separate out the inet4 and inet6 route cache entries.
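The layout change can be pictured roughly as follows. This is a sketch under assumptions: MAXCPU stands in for SMP_MAXCPU, and the struct and function names are invented for illustration rather than taken from if_gif.

```c
#include <assert.h>
#include <stddef.h>

#define MAXCPU 4    /* illustrative; stands in for SMP_MAXCPU */

/* Hypothetical cached route; the real entry would hold a struct route. */
struct rt_cache {
    int valid;
    unsigned dst;
};

struct tunnel_softc {
    struct rt_cache ro4[MAXCPU];  /* inet4 cache, one slot per cpu */
    struct rt_cache ro6[MAXCPU];  /* inet6 cache, one slot per cpu */
};

/* On the fast path each cpu reads and writes only its own slot. */
static struct rt_cache *
tunnel_cache4(struct tunnel_softc *sc, int cpu)
{
    return &sc->ro4[cpu];
}

/* Invalidate every slot, e.g. on interface destroy or family change. */
static void
tunnel_cache_flush(struct tunnel_softc *sc)
{
    for (int i = 0; i < MAXCPU; i++) {
        sc->ro4[i].valid = 0;
        sc->ro6[i].valid = 0;
    }
}
```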
Matthew Dillon [Thu, 16 Sep 2010 00:40:50 +0000 (17:40 -0700)]
vknetd - Adjust unsecure mode (-U) to also pass any IP protocol.
* When running in unsecure mode all IP protocols will now be passed.
Samuel J. Greear [Thu, 16 Sep 2010 00:23:41 +0000 (00:23 +0000)]
kernel - Rename the rcvtok to sndtok
* After all, they can't both be a rcvtok.
Matthew Dillon [Wed, 15 Sep 2010 20:22:25 +0000 (13:22 -0700)]
kernel - Fix MADV_NOSYNC and MAP_NOSYNC, improve vkernel performance
* The vm_prefault() code was not setting PG_NOSYNC so only 1/4 of the
pages of a NOSYNC memory mapping were actually NOSYNC.
* This bug caused the vkernel to essentially flush out all of its
dirty memory pages every 30 seconds. Needless to say this was bad.
The vkernel can now be run with its memory set in the multiples
of gigabytes (if you happen to have that much real memory) without
creating a massive disk load.
Matthew Dillon [Wed, 15 Sep 2010 20:17:18 +0000 (13:17 -0700)]
kernel - Increase x86_64 & vkernel kvm, adjust vm_page_array mapping
* Change the vm_page_array and dmesg space to not use the DMAP area.
The space could not be accessed by userland kvm utilities due
to that issue.
TODO - reoptimize to use 2M super-pages.
* Auto-size NKPT to accommodate the above changes as vm_page_array[]
is now mapped into the kernel page tables.
* Increase NKPDPE to 128 PDPs to accommodate machines with large
amounts of ram. This increases the kernel KVA space to 128G.
Matthew Dillon [Wed, 15 Sep 2010 16:42:06 +0000 (09:42 -0700)]
network - Major netmsg retooling, part 2
* Convert remaining protocols (divert, ipx, mpls, natm).
* Minor code correction in gif (no operational change).
* Remove NS protocol from LINT in preparation for complete removal
from tree.
Alex Hornung [Wed, 15 Sep 2010 11:41:39 +0000 (13:41 +0200)]
dloader - Add support for kernel_options=""
* Add back the support to specify kernel_options a la
kernel_options="-v -a".
Reported-by: Sascha Wildner (swildner@)
Alex Hornung [Wed, 15 Sep 2010 11:07:11 +0000 (13:07 +0200)]
dloader - Allow foo_name for modules
* Allow a foo_name apart from foo_load and foo_type to specify the
actual file name (module name) to load.
* This fixes the acpi dsdt overrides.
Reported-by: Sascha Wildner (swildner@)
Alex Hornung [Wed, 15 Sep 2010 10:41:01 +0000 (12:41 +0200)]
dloader - Allow foo_type for modules
* Allow a foo_type apart from foo_load to specify the type of the module
to be loaded.
* This fixes the use of md_image type for md preloads, and this in turn
fixes the initrd system.
Reported-by: Matthias Schmidt (matthias@)
Alex Hornung [Wed, 15 Sep 2010 09:30:23 +0000 (11:30 +0200)]
mkinitrd - Adjust initrd.img path to new loader
* Adjust the path where to install the initrd.img to /boot/kernel, to be
compatible with dloader.
Reported-by: Matthias Schmidt (matthias@)
Sascha Wildner [Wed, 15 Sep 2010 09:37:27 +0000 (11:37 +0200)]
syscons(4): Move tty token release and acquirement around Debugger().
YONETANI Tomokazu [Wed, 15 Sep 2010 05:07:56 +0000 (14:07 +0900)]
ips - Issue IPS_CACHE_FLUSH_CMD to the controller on BUF_CMD_FLUSH
Previously, BUF_CMD_FLUSH ended up as a zero-byte write command, which
always fails, flooding the console with `iobuf error 5'. Filesystems
other than HAMMER almost never issue this command, so we've never
seen the error message in pre-HAMMER days. This commit adds a new path
for BUF_CMD_FLUSH and issues IPS_CACHE_FLUSH_CMD for it.
Also mention the tunable/sysctl knob debug.ips.ignore_flush_cmd in ips(4)
man page in case the new behavior confuses your controller; when set, the
driver just discards BUF_CMD_FLUSH.
YONETANI Tomokazu [Wed, 15 Sep 2010 05:07:53 +0000 (14:07 +0900)]
Make it easier to find proper manual page for newer ServeRAID controllers.
Obtained-From: FreeBSD r196701
YONETANI Tomokazu [Wed, 15 Sep 2010 05:07:50 +0000 (14:07 +0900)]
ips - Add Adaptec ServeRAID 7x IDs. IDs taken from Linux.
Taken-from: FreeBSD r163024, r163995
Matthew Dillon [Wed, 15 Sep 2010 03:47:10 +0000 (20:47 -0700)]
network - Protect so_rcv sockbuf in udp and unix domain protocols
* The so_rcv sockbuf was not being locked against the user side
when the unix and udp protocols appended to it, resulting in
assertions.
Matthew Dillon [Wed, 15 Sep 2010 03:19:17 +0000 (20:19 -0700)]
network - Increase basic mbuf size from 256 to 384 bytes
* Due to the bloat in m_hdr and m_pkthdr the 256-byte mbuf structure
is no longer large enough and there appears to be quite a bit of
legacy code still using m_get() and making assumptions on the
available space without checking actual space.
We have assertions in place to catch these but stabilizing the
system is more important right now.
* Increase the basic mbuf buffer size from 256 to 384 bytes.
Matthew Dillon [Tue, 14 Sep 2010 23:28:53 +0000 (16:28 -0700)]
network - Major netmsg retooling, part 1
* Remove all the netmsg shims and make all pr_usrreqs and some proto->pr_*
requests directly netmsg'd.
* Fix issues with tcp implied connects and tcp6->tcp4 fallbacks with
implied connects.
* Fix an issue with a stack-based udp netmsg (allocate it)
* Consolidate struct ip6protosw and struct protosw into a single
structure and normalize the API functions which differed between
the two (primarily proto->pr_input()).
* Remove protosw->pr_soport()
* Replace varargs protocol *_input() functions (ongoing) with fixed
arguments.
Matthew Dillon [Tue, 14 Sep 2010 22:59:28 +0000 (15:59 -0700)]
vkernel - Improve memory image file startup
* Remove the code that pre-filled a memory image file with zeros. It's
completely worthless, particularly with HAMMER.
* On startup truncate the memory file to 0 and then extend to the
memory size, deleting any backing store from the prior vkernel run.
The new file will start out full of holes.
This greatly improves vkernel startup time.
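The truncate-then-extend sequence can be sketched with plain POSIX calls. The function name is hypothetical and this is not the vkernel's actual startup code. On filesystems that support sparse files, the extended region is holes that read back as zeros.

```c
#define _XOPEN_SOURCE 700   /* for ftruncate()/pread() under strict ISO C modes */
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Reset a memory image file: drop any old contents, then extend it to
 * size bytes without writing data. Returns the open descriptor or -1.
 */
static int
memimg_reset(const char *path, off_t size)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);

    if (fd < 0)
        return -1;
    /* Truncating to 0 first discards backing store from a prior run. */
    if (ftruncate(fd, 0) < 0 || ftruncate(fd, size) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```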
Sascha Wildner [Tue, 14 Sep 2010 19:48:10 +0000 (21:48 +0200)]
twa(4): Sync with FreeBSD (twa(4) version 3.80.06.003).
Thanks to Xin Li for notifying me of this update.
Tested-by: Damian Lubosch <dl@xiqit.de>
Sascha Wildner [Tue, 14 Sep 2010 18:03:10 +0000 (20:03 +0200)]
ie(4): This driver is ISA only, so remove some unneeded files from SRCS.
Jan Lentfer [Sat, 11 Sep 2010 22:34:08 +0000 (00:34 +0200)]
pf: Make pf work w/ the MPSAFE network stack
add pf_token where appropriate
in pf_socket_lookup() use lwkt_domsg() instead of lwkt_sendmsg()
to make race conditions more unlikely
if_pfsync.c: re-add lost init code
Jan Lentfer [Sat, 11 Sep 2010 18:50:32 +0000 (20:50 +0200)]
pf: Revert commit 5165ac2
I was too hasty changing byte ordering when trying
to track down a NAT problem
Sepherosa Ziehau [Tue, 14 Sep 2010 14:26:47 +0000 (22:26 +0800)]
ACPI P-State: Force P-State to use the first usable entry in P-State table
It looks like on certain boxes P-State will be set to the last usable
P-State (i.e. lowest frequency)
Sepherosa Ziehau [Tue, 14 Sep 2010 13:43:14 +0000 (21:43 +0800)]
ACPI P-State: When there is no _PSD, create one CPU domain for each CPU
Sepherosa Ziehau [Tue, 14 Sep 2010 13:29:00 +0000 (21:29 +0800)]
test commit
Matthew Dillon [Tue, 14 Sep 2010 01:40:54 +0000 (18:40 -0700)]
network - UDP currently only going to one proto thread
* Adjust udp_addrcpu() to always return cpu 0 for now, the UDP
implementation currently only operates on protocol thread 0.
Matthew Dillon [Tue, 14 Sep 2010 01:40:22 +0000 (18:40 -0700)]
network - Protect soreceive() from backend
* Somehow I missed the token required in soreceive() to protect it from
the backend.
Matthew Dillon [Tue, 14 Sep 2010 00:04:03 +0000 (17:04 -0700)]
network - Fix udp self-referential panic
* udp_ctlinput() can't call domsg, it has to start the chain going with
lwkt_sendmsg().
* Currently udp only runs on protocol thread cpu 0.
Matthew Dillon [Mon, 13 Sep 2010 23:53:56 +0000 (16:53 -0700)]
Merge branch 'master' of ssh://crater.dragonflybsd.org/repository/git/dragonfly
Matthew Dillon [Mon, 13 Sep 2010 23:50:55 +0000 (16:50 -0700)]
kernel - swapoff - regenerate system calls
* Added swapoff, regenerate system calls.
Matthew Dillon [Mon, 13 Sep 2010 23:41:40 +0000 (16:41 -0700)]
Kernel - Implement swapoff
* General port of the swapoff implementation from FreeBSD to DragonFly,
with major modifications.
Modifications to handle swapcache issues (VCHR vnodes with VM objects
can have swap associations for swapcache).
* Libkvm changes
So there are two problems with libkvm. The first is not really
swapoff-related - the new sysctl way of reporting numbers bzero'es
swap_max elements in the given swap_ary array. This is in contrast to
the old kvm way, which bzero'es only those elements that will be
actually filled. So if we have 3 swap devices and swap_max is 16, then
the sysctl code will zero out all 16 elements and fill the first 4,
while the old kvm code will zero out exactly 4 elements and fill them.
Since we want to keep API stable (I learned it the hard way :-) ) I
think this fix can be separated out and go to master as a bugfix to the
newly introduced sysctl way of reporting things.
The second problem only shows up if we introduce a swapoff syscall
and enforce using of the old kvm way. It was written with the
assumption that swap devices can only be added, not removed - it
assumes that if I have a swap device with index 3, 4 swap
devices are active. This is not true with swapoff - I can swapon
A, B, C and D, then swapoff B and C and here we are - I have an
active swap device with index 3, but only 2 devices are active.
It turned out to be easier to just rewrite it (based on sysctl way),
because that assumption was rather deep and everything was based on it.
Since along with sysctl way per-device swap accounting was introduced,
the kvm way now uses it instead of scanning blist.
Which brings us to the last change - blist scanning code is now used
only for debugging purposes. getswapinfo_radix() is now called only if
DUMP_TREE flag is set. Pieces that touched swap_ary entries are removed,
swap_ary and swap_max are no longer passed to scanning code.
After all that both ways are now working correctly with regard to
the swapoff call and the old kvm way (the behaviour is exactly the same,
all boundary cases were tested, API remains the same). The only (minor)
difference is that swapctl numbers are a little bit bigger than kvm way
ones. That's because kvm way subtracts dmmax (the assumption is that the
first dmmax is never allocated), and sysctl way does not. I tried to fix
this, but it turns out that we need to introduce a dmmax sysctl for that.
So if you want I can add it, but I want to hear from you first (both on
this thing and my changes to libkvm in general).
* Userspace. Add swapoff & adjust manual pages.
Note: Bounty project ($300)
Submitted-by: Ilya Dryomov <idryomov@gmail.com>
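The broken assumption in the old kvm path (highest index in use implies the device count) can be illustrated with a small sketch. The struct and function below are hypothetical and are not libkvm's API. The point is only that each slot has to be checked once devices can be removed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-device record; not libkvm's struct kvm_swap. */
struct swdev {
    int active;   /* nonzero if the slot currently holds a device */
    long nblks;   /* size of the device in blocks */
};

/*
 * Count active devices and sum their sizes by inspecting every slot,
 * rather than inferring the count from the highest index in use.
 */
static int
swap_summary(const struct swdev *devs, size_t nslots, long *total)
{
    int n = 0;

    *total = 0;
    for (size_t i = 0; i < nslots; i++) {
        if (!devs[i].active)
            continue;   /* slot was vacated by swapoff */
        n++;
        *total += devs[i].nblks;
    }
    return n;
}
```

With devices A, B, C and D added and then B and C removed, slot 3 is still active but the function reports two devices, which matches the scenario in the message above.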
Sascha Wildner [Mon, 13 Sep 2010 17:52:26 +0000 (19:52 +0200)]
Fix some mdoc issues in tcp.4's SEE ALSO.
Sascha Wildner [Mon, 13 Sep 2010 17:08:22 +0000 (19:08 +0200)]
Fix two typos in manual pages and messages.
Matthew Dillon [Mon, 13 Sep 2010 15:58:51 +0000 (08:58 -0700)]
build - Fix vkernel installation target for /usr/src/test/vkernel
* Fix the installation target in /usr/src/test/vkernel/Makefile, it
was assuming the old style /boot kernel layout instead of the new.
Matthew Dillon [Mon, 13 Sep 2010 15:20:16 +0000 (08:20 -0700)]
devfs - Fix use-after-free case when making pty's invisible
* Fix a use-after-free case when making a pty devfs node invisible.
dev_dclose() may destroy the device, so move the test/flag to before the dev_dclose() call.
* Document that the pty code may destroy the device in the dev_dclose()
call, causing the node to become stale.
Reported-by: Francois Tigeot <ftigeot@wolfpond.org>
Reminded-by: sjg
Venkatesh Srinivas [Mon, 13 Sep 2010 11:26:33 +0000 (04:26 -0700)]
Fix !INVARIANTS build and reformat token asserts for easy reading.
Matthew Dillon [Mon, 13 Sep 2010 07:08:53 +0000 (00:08 -0700)]
network - Fix multiple MP races (2)
* MEVENT signaling needs the ssb_token as well as the kq_token for now
to prevent blocking inside the predicate. This is a hack for now.
* Add missing porttoken protection in in_pcbremlists().
Reported-by: lentferj
Matthew Dillon [Mon, 13 Sep 2010 05:33:08 +0000 (22:33 -0700)]
network - Fix multiple MP races
* Fix sonewconn() races. sonewconn() was attaching prior to changing
the socket->so_port, relying on the caller to set the socket->so_port.
This resulted in a race where userland wound up with visibility on the
socket and could issue commands, like close(), which would end up going
to the original protocol thread instead of the post-connect protocol thread
which was handling the sonewconn().
Thus the close() could message the backend to detach and compete
against the sonewconn() because the detach message was going to
a different protocol thread.
* When the socket->so_port is changed the inpcb was not being moved
from the old pcbinfo->pcblisthead list to the new one, resulting
in MP races later on during removal.
* Add more debugging kprintf()s.
* Clean up sosetport() use, remove the now-unused *_soport_attach().
Reported-by: Many
Matthew Dillon [Mon, 13 Sep 2010 03:23:22 +0000 (20:23 -0700)]
network - remove the redispatch local
* Remove the redispatch local variable which is no longer used.
Peter Avalos [Mon, 13 Sep 2010 02:09:43 +0000 (02:09 +0000)]
savecore: Fix typo in comment.
Matthew Dillon [Sun, 12 Sep 2010 20:14:23 +0000 (13:14 -0700)]
network - Add debugging assertions
* Add some assertions to try to catch failure cases earlier.
Matthew Dillon [Sun, 12 Sep 2010 17:30:38 +0000 (10:30 -0700)]
kernel - Fix one-cycle MP race in vshouldmsync()
* vshouldmsync() is the mntvnode fast function, which is called without
any vnode lock. vp->v_object can thus get ripped out from under the
scan function.
Hold vmobj_token through the scan so any pointer accessed via
v_object remains stable (even if no longer related to the vnode
due to the race).
Reported-by: swildner
Sascha Wildner [Sun, 12 Sep 2010 17:12:54 +0000 (19:12 +0200)]
pfctl.8: Adjust date.
Sascha Wildner [Sun, 12 Sep 2010 17:12:36 +0000 (19:12 +0200)]
crashinfo.8: Remove trailing whitespace.
Matthew Dillon [Sun, 12 Sep 2010 04:35:14 +0000 (21:35 -0700)]
network - Assert that the packet's data has not overrun the buffer in m_free()
* Add an assertion to try to catch subsystems which blow up a mbuf's
data buffer.
Sascha Wildner [Sat, 11 Sep 2010 20:55:05 +0000 (22:55 +0200)]
pfctl(8): Fix some printf issues (and buildworld on x86_64).
Matthew Dillon [Sat, 11 Sep 2010 20:37:21 +0000 (13:37 -0700)]
network - Fix tcp inpcb race
* tcbinfo[cpu].porthashbase was being shared across all the cpus,
creating MP races. Change it so it isn't shared.
Reported-by: "Samuel J. Greear" <sjg@evilcode.net>
Matthew Dillon [Sat, 11 Sep 2010 18:53:57 +0000 (11:53 -0700)]
network - More tokens for ipsec
* Get key_token in more places to fix MP races.
Matthew Dillon [Sat, 11 Sep 2010 18:52:45 +0000 (11:52 -0700)]
kernel - cleanup & assertions in mbuf code
* Assert the mbuf's next/nextpkt fields are NULL on allocation.
Matthew Dillon [Sat, 11 Sep 2010 18:50:56 +0000 (11:50 -0700)]
kernel - Fix mprace in kern_objcache
* mag_purge() is interruptible, do not continue the purge
if the magazine is moved. For example, the magazine could
move to the depot and we would wind up continuing to purge
it without the depot lock.
* Add some temporary debugging
Sascha Wildner [Sat, 11 Sep 2010 16:40:54 +0000 (18:40 +0200)]
Re-add RSS_DEBUG to LINT.
Alex Hornung [Sat, 11 Sep 2010 13:21:23 +0000 (14:21 +0100)]
Fix manual break to debugger
* When manually breaking to debugger, we can't hold any tokens as they
get in the way of kbdmux' lockmgr in an interrupt context.
Alex Hornung [Sat, 11 Sep 2010 12:25:34 +0000 (13:25 +0100)]
savecore,crashinfo - fix several problems
* Fix the savecore rc.d script to only run savecore (and crashinfo) when
there's actually a core dump available.
* Limit the kgdb CPU time to 15 seconds to avoid looping forever if we
have a somewhat broken vmcore.
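One way to cap a child's CPU time is the POSIX RLIMIT_CPU resource limit. The sketch below is an assumption about the general technique, not what the crashinfo script necessarily does (a shell script could equally use its ulimit builtin), and the function name is hypothetical.

```c
#define _XOPEN_SOURCE 700   /* for getrlimit()/setrlimit() under strict ISO C modes */
#include <assert.h>
#include <sys/resource.h>

/*
 * Cap the calling process's CPU time at secs seconds (soft limit),
 * leaving the hard limit untouched. Intended to be called in a child
 * process just before exec'ing the debugger.
 * Returns 0 on success, -1 on failure.
 */
static int
limit_cpu_seconds(rlim_t secs)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CPU, &rl) < 0)
        return -1;
    if (rl.rlim_max != RLIM_INFINITY && secs > rl.rlim_max)
        secs = rl.rlim_max;   /* the soft limit cannot exceed the hard limit */
    rl.rlim_cur = secs;
    return setrlimit(RLIMIT_CPU, &rl);
}
```

When the soft limit is reached the kernel delivers SIGXCPU, whose default action terminates the process, so a debugger looping on a damaged vmcore would be stopped after roughly the requested number of CPU seconds.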
Venkatesh Srinivas [Sat, 11 Sep 2010 11:16:39 +0000 (04:16 -0700)]
kernel: bzeront() - Switch out loop instruction in i386 bzero for sub/jnz.
Much as I like loop, it has a ~7-9 cycle latency on AMD CPUs. Let's spend
idlezero time actually zeroing...
Sascha Wildner [Sat, 11 Sep 2010 09:40:56 +0000 (11:40 +0200)]
ftp-proxy.8 & pflogd.8: Fix some mdoc issues.
Sascha Wildner [Sat, 11 Sep 2010 08:57:34 +0000 (10:57 +0200)]
Fix LINT build.
I never get why LINT isn't just checked before pushing such huge
changes. Takes just a couple of minutes, really. :)
Matthew Dillon [Sat, 11 Sep 2010 06:02:39 +0000 (23:02 -0700)]
network - Correct double free of mbuf during reboot
* Correct code which was leaving a stale mbuf pointer intact
when flushing the so_rcv sockbuf in a socket. This normally
occurred during shutdown/reboot.
Matthew Dillon [Sat, 11 Sep 2010 05:41:30 +0000 (22:41 -0700)]
build - Fix netgraph
* Some source files were missing newly required includes for their
use of the mplock and socketvar2.h inlines.
Reported-by: lentferj
Matthew Dillon [Sat, 11 Sep 2010 11:58:35 +0000 (11:58 +0000)]
network - raw_input needs further protection
* We also need the so_rcv.ssb_token to protect against userland