Sascha Wildner [Wed, 27 Feb 2013 19:13:21 +0000 (20:13 +0100)]
kernel: Fix -Wundef in a number of places.
Sepherosa Ziehau [Wed, 27 Feb 2013 01:52:57 +0000 (09:52 +0800)]
gcc47/i386: Add more -mno flags
Matthew Dillon [Tue, 26 Feb 2013 23:20:43 +0000 (15:20 -0800)]
systat - Fix overflowing path lookup fields
* Reduce the field width for Path-lookups from 9 to 6 and
hits from 7 to 6. This normalizes the fields so similar
numbers use the same units and ensures at least one space
between them.
* Fixes display issues on large multi-way systems.
Sascha Wildner [Tue, 26 Feb 2013 21:50:25 +0000 (22:50 +0100)]
isp(4): Remove a duplicate xpt_alloc_ccb() that was causing leakage.
Confirmed-by: mjacob@
Matthew Dillon [Tue, 26 Feb 2013 17:40:55 +0000 (09:40 -0800)]
kernel - Fix mount bug caught by assertion
* A recently added assertion caught a bug in the mount code where
a namecache entry was not being properly locked.
* Fix the bug in checkdirs() (called by the mount code).
Antonio Huete Jimenez [Tue, 26 Feb 2013 12:51:42 +0000 (13:51 +0100)]
kqueue.2 - Mention tmpfs(5) as a kqueue-enabled filesystem.
Sepherosa Ziehau [Tue, 26 Feb 2013 13:18:51 +0000 (21:18 +0800)]
bce: Put interrupt reenabling into each interrupt handler
This way the shared interrupt reenabling code does not need to check the
interrupt type; only legacy interrupts need the extra register write.
Sepherosa Ziehau [Tue, 26 Feb 2013 12:28:49 +0000 (20:28 +0800)]
bce: Move status index's location and cached status index into RX ring
Sascha Wildner [Tue, 26 Feb 2013 09:35:26 +0000 (10:35 +0100)]
Revert "<malloc.h>: Restrict support for <malloc.h> to !defined(__STDC__)."
This reverts commit
1b3342693b737646f3cab0715e31ec6ab5216b38.
It caused too many issues in the package department.
Reported-by: marino
Sepherosa Ziehau [Tue, 26 Feb 2013 09:42:40 +0000 (17:42 +0800)]
gcc47/x86_64: Add more -mno flags
Matthew Dillon [Tue, 26 Feb 2013 08:35:03 +0000 (00:35 -0800)]
kernel - Try harder to unmount a filesystem
* Use LK_TIMELOCK (5 seconds) instead of LK_NOWAIT when getting the mp
lockmgr lock for unforced unmounts.
* Remove the syncer vnode and issue VFS_SYNC prior to checking
mnt_refs instead of after the check. This appears to improve tmpfs's
chances of unmounting, though it is a bit unclear as to why.
* Wait up to 1 second for mnt_refs to drop to 1 before giving up.
* Improves Poudriere's chances of successfully unmounting a tmpfs
filesystem.
Matthew Dillon [Tue, 26 Feb 2013 03:27:05 +0000 (19:27 -0800)]
kernel - Fix shared/excl livelock with vm.shared_fault
* The vop_helper_read_shortcut() code was holding a shared token on
a VM object through a uiomove(). If the uiomove() generated a VM
fault requiring a shadow copy, the shadow copy would try to get
an exclusive token on potentially the same object and livelock.
* Fix by unlocking/relocking across the uiomove().
Matthew Dillon [Tue, 26 Feb 2013 02:31:26 +0000 (18:31 -0800)]
kernel - Fix vm.shared_fault for vkernels and 32-bit
* The pmap code needed the same changes as were made to the 64-bit
pmap code to avoid a live lock.
Reported-by: davshao, tuxillo
Matthew Dillon [Tue, 26 Feb 2013 01:37:14 +0000 (17:37 -0800)]
kernel - Fix panic on ptrace termination
* Fix a panic in the situation where gdb is exiting and terminating
a ptrace, but the original parent process of the process being
debugged no longer exists.
Matthew Dillon [Tue, 26 Feb 2013 00:49:01 +0000 (16:49 -0800)]
kernel - Beef up lwkt_dropmsg() API and fix deadlock in so_async_rcvd*()
* Beef up the lwkt_dropmsg() API. The API now conditionally returns
success (0) or an error (ENOENT).
* so_pru_rcvd_async() improperly calls lwkt_sendmsg() with a spinlock
held. This is not legal. Hack up lwkt_sendmsg() a bit to resolve.
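The conditional return described above can be pictured with a toy queue. This is a minimal user-space sketch, not the LWKT implementation: `drop_port`, `drop_msg`, `drop_sendmsg()` and `drop_dropmsg()` are hypothetical names, and only the 0 / ENOENT contract is taken from the commit message.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical message and port types, for illustration only. */
struct drop_msg {
    struct drop_msg *next;
};

struct drop_port {
    struct drop_msg *head;
};

/* Queue a message on the port (no locking in this sketch). */
static void drop_sendmsg(struct drop_port *port, struct drop_msg *msg)
{
    msg->next = port->head;
    port->head = msg;
}

/* Remove msg if it is still queued: 0 on success, ENOENT otherwise. */
static int drop_dropmsg(struct drop_port *port, struct drop_msg *msg)
{
    struct drop_msg **pp;

    for (pp = &port->head; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == msg) {
            *pp = msg->next;
            msg->next = NULL;
            return 0;
        }
    }
    return ENOENT;
}
```

A caller can then tell "the message was pulled back" apart from "the message was already consumed" by checking the return value.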
Matthew Dillon [Mon, 25 Feb 2013 19:51:15 +0000 (11:51 -0800)]
kernel - Remove symbol space corruption from ncp_conn.h (2)
* libncp also needed adjustment.
Matthew Dillon [Mon, 25 Feb 2013 19:15:26 +0000 (11:15 -0800)]
kernel - Remove symbol space corruption from ncp_conn.h
* ncp_conn.h was #defining 'ipxaddr', 'inaddr', and 'saddr', all
commonly used variable names. This was interfering with netmsg.h.
* Remove the definitions, replace use cases with expansion.
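Why short macro names in a header are a problem, and what "replace use cases with expansion" amounts to, can be sketched as follows. The structure layout and the commented-out macro are assumptions made for illustration; the log does not show the actual definitions in ncp_conn.h.

```c
#include <string.h>

/* Hypothetical argument block with a union of address forms. */
struct conn_args_toy {
    union {
        char generic[16];
        char inet[16];
    } addr;
};

/*
 * A header doing something like
 *     #define saddr addr.generic
 * would rewrite every later use of the identifier 'saddr', including
 * unrelated locals and parameters in any file that includes it.
 * Spelling out the expansion at each use site avoids that.
 */
static void conn_set_generic(struct conn_args_toy *a, const char *saddr)
{
    /* 'saddr' is an ordinary parameter because no macro claims the name. */
    strncpy(a->addr.generic, saddr, sizeof(a->addr.generic) - 1);
    a->addr.generic[sizeof(a->addr.generic) - 1] = '\0';
}
```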
Sascha Wildner [Mon, 25 Feb 2013 17:42:17 +0000 (18:42 +0100)]
kernel/i386: Use offsetof() here.
Sepherosa Ziehau [Mon, 25 Feb 2013 14:13:14 +0000 (22:13 +0800)]
arp: Embed netmsg_inarp in mbuf for asynchronous ARP input processing
- Embed netmsg_inarp, which records necessary states for routing table
updating and later ARP reply, into mbuf; this does not change mbuf
header size.
- If routing tables need updating upon ARP packet reception, the
netmsg_inarp embedded in the input ARP packet is sent asynchronously
to routing threads and the possible ARP reply is deferred until all
routing tables are updated, i.e. the ARP packet is redispatched to
netisr0 for the ARP reply sending from the last routing thread.
- Remove no longer needed dedicated network threads.
Discussed-with: dillon@
Sepherosa Ziehau [Mon, 25 Feb 2013 09:11:12 +0000 (17:11 +0800)]
netmsg: Update comment
Matthew Dillon [Mon, 25 Feb 2013 05:50:51 +0000 (21:50 -0800)]
kernel - Fix incorrect assertion in nlookup()
* Fix an incorrect assertion. When ISLOCKED is set the returned ncp
can be locked shared or exclusive in the error case, rather than
just exclusive.
Sepherosa Ziehau [Mon, 25 Feb 2013 01:33:13 +0000 (09:33 +0800)]
netisr: Dedicated network thread is not netisr
Dedicated network thread should just fetch and run the netmsg on its
own port instead of performing full-fledged netisr operation,
e.g. run rollups
Reported-by: pavalos@
Sascha Wildner [Sun, 24 Feb 2013 15:39:24 +0000 (16:39 +0100)]
ccd(4): Fix operator precedence.
Sepherosa Ziehau [Sun, 24 Feb 2013 14:42:29 +0000 (22:42 +0800)]
bce: Cache TX/RX consumer indices' location
Use them to access TX/RX consumer indices instead of directly
accessing status block; prepare for the MSI-X support
Sepherosa Ziehau [Sun, 24 Feb 2013 14:14:10 +0000 (22:14 +0800)]
bce: Save CID into related TX/RX ring struct
Sascha Wildner [Sun, 24 Feb 2013 09:46:11 +0000 (10:46 +0100)]
ath(4): s/long long unsigned/unsigned long long/
Sascha Wildner [Sun, 24 Feb 2013 05:10:28 +0000 (06:10 +0100)]
kernel/vm_object: Add debugvm_object_hold_maybe_shared() prototype.
Matthew Dillon [Sat, 23 Feb 2013 19:49:31 +0000 (11:49 -0800)]
debug - vmpageinfo changes
* Adjust vmpageinfo to print more information.
Matthew Dillon [Sat, 23 Feb 2013 19:47:01 +0000 (11:47 -0800)]
kernel - Clean up if_bridge bif_state tests
* bif_state is only valid when IFBIF_STP is set, adjust two bits of
code that were using bif_state unconditionally.
* This is a semi-operational change because bif_state's default value
when IFBIF_STP is not set resulted in correct operation anyway.
However, setting STP and then clearing it on a sub-interface could
cause problems with stale state.
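The rule "only consult bif_state when IFBIF_STP is set" can be sketched as below. This is an illustration under assumptions: the flag value, the state constants and `bif_may_forward()` are hypothetical, and the real if_bridge tests may differ.

```c
/* Hypothetical values; only the names IFBIF_STP and bif_state come from the log. */
#define TOY_IFBIF_STP        0x01u
#define TOY_STATE_BLOCKING   0
#define TOY_STATE_FORWARDING 1

struct bif_toy {
    unsigned bif_flags;
    int      bif_state;   /* meaningful only while TOY_IFBIF_STP is set */
};

/* Return 1 if the member interface may forward frames. */
static int bif_may_forward(const struct bif_toy *bif)
{
    /* Without STP the state field may be stale, so it is ignored. */
    if ((bif->bif_flags & TOY_IFBIF_STP) == 0)
        return 1;
    return bif->bif_state == TOY_STATE_FORWARDING;
}
```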
Matthew Dillon [Sat, 23 Feb 2013 19:45:24 +0000 (11:45 -0800)]
kernel - Track slabs allocated and freed
* Add statistics counters kern.slabs_allocated and kern.slabs_freed,
tracking kernel memory allocator slab statistics.
Matthew Dillon [Sat, 23 Feb 2013 19:44:17 +0000 (11:44 -0800)]
kernel - Separate page activity heuristic for anonymous memory vs files
* Add sysctls vm.anonmem_decline and vm.filemem_decline with reasonable
defaults.
* Should improve retention of anonymous memory over file cache.
Matthew Dillon [Sat, 23 Feb 2013 19:22:00 +0000 (11:22 -0800)]
kernel - Implement much deeper use of shared VM object locks
* Use a shared VM object lock on terminal (and likely highly shared)
OBJT_VNODE objects. For example, binaries in the system such as
/bin/sh or /usr/bin/make.
This greatly improves fork/exec and related VM faults on concurrently
executing binaries. Most commonly, parallel builds often exec
hundreds of thousands of sh's and make's.
+50% to +100% nominal improved performance under these conditions.
+200% to +300% improved poudriere performance during the depend
stage.
* Formalize the shared VM object lock with a new API function,
vm_object_lock_maybe_shared(), which determines whether a VM
object meets the requirements for obtaining a shared lock.
* Adjust the vm_fault*() APIs to track whether the VM object is
locked shared or exclusive on entry.
* Clarify that OBJ_ONEMAPPING is only applicable to OBJT_DEFAULT
and OBJT_SWAP objects.
* Heavy work on the exec path. Somewhat lighter work on the exit
path. Tons more work could be done.
Sascha Wildner [Sat, 23 Feb 2013 18:48:32 +0000 (19:48 +0100)]
<malloc.h>: Restrict support for <malloc.h> to !defined(__STDC__).
In essence this is what FreeBSD did: error if __STDC__ is defined, and
silently include <stdlib.h> if not.
Packages are expected to now fail their config checks for <malloc.h>
but to build nevertheless, which was confirmed with building ~500
packages as a test.
Adjust a few config.h files of contrib/ code as well, notably
libssp's, which gets rid of the malloc.h warnings from the buildworld
output.
Sascha Wildner [Sat, 23 Feb 2013 18:46:25 +0000 (19:46 +0100)]
libkern: Stop compiling in (u)cmpdi2.c, because they are not used.
It's 32 bit code that assumes that two longs fit into 64 bits,
hence put it into i386 'files' (commented out).
Sascha Wildner [Sat, 23 Feb 2013 13:33:27 +0000 (14:33 +0100)]
kernel/x86_64: Remove some bogus #ifndefs.
Sascha Wildner [Sat, 23 Feb 2013 12:21:02 +0000 (13:21 +0100)]
kernel/isa: Remove empty isa_init() (formerly used for COMPAT_OLDISA).
Matthew Dillon [Sat, 23 Feb 2013 05:57:45 +0000 (21:57 -0800)]
kernel - Implement shared namecache locks
* Currently highly experimental, so I've added a sysctl and default it
to disabled for now.
sysctl debug.ncp_shared_lock_disable
0 Shared namecache locks enabled
1 Shared namecache locks disabled (default)
* Removes most conflicts when concurrent processes are doing long path
lookups with substantially similar prefixes.
* Also removes directory conflicts when concurrent processes are accessing
different file names under the same directory using short paths.
* Shared mode is only used when the ncp is resolved and in a normal
working state (this includes entries which have resolved to ENOENT).
Otherwise the code falls back to exclusive mode.
* Shared namecache locks have three major complexities:
(1) First, some bits of the nlookup() routine had to be rearranged to
avoid double locking. This is because the last namecache component
always has to be locked exclusively, but a path such as a/b/d/.
references the same ncp entry for both of the last two components.
(2) Second, any lock on a namecache structure vhold()'s the related vp
(if not NULL). Shared locks present a particular issue where a
second cpu may obtain a second shared lock before the first cpu
is able to complete vhold()ing the vnode. The vnode cannot be
vhold()'d prior to the lock. To deal with this an interlock was
implemented (see NC_SHLOCK_VHOLD).
(3) Finally, because there might be many concurrent shared lock users,
we must stall further shared locks while an exclusive request is
pending to avoid starving out an exclusive lock user.
* The implementation specifically does not attempt to implement lock
upgrading. That's another can of worms that I'd rather not open.
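Point (3) above, stalling new shared lockers while an exclusive request is pending, can be sketched with a small try-lock model. This is not the namecache code: the structure and function names are hypothetical, and the sketch is deliberately single-threaded (a real implementation would use atomic operations and sleep/wakeup).

```c
#include <stdbool.h>

/* Hypothetical lock state, for illustration only. */
struct nc_toy_lock {
    int  shared;        /* current shared holders */
    bool exclusive;     /* held exclusively */
    int  excl_waiters;  /* exclusive requests still pending */
};

/* Shared attempts fail while an exclusive holder or request exists. */
static bool nc_toy_try_shared(struct nc_toy_lock *lk)
{
    if (lk->exclusive || lk->excl_waiters > 0)
        return false;
    lk->shared++;
    return true;
}

static void nc_toy_unlock_shared(struct nc_toy_lock *lk)
{
    lk->shared--;
}

/*
 * Exclusive attempt. On failure the caller is registered as pending
 * (once), which blocks further shared acquisitions until it succeeds.
 */
static bool nc_toy_try_exclusive(struct nc_toy_lock *lk, bool *pending)
{
    if (lk->exclusive || lk->shared > 0) {
        if (!*pending) {
            lk->excl_waiters++;
            *pending = true;
        }
        return false;
    }
    if (*pending) {
        lk->excl_waiters--;
        *pending = false;
    }
    lk->exclusive = true;
    return true;
}
```

Without the `excl_waiters` check, a steady stream of shared lockers could keep `shared` above zero indefinitely and the exclusive request would never succeed.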
Matthew Dillon [Sat, 23 Feb 2013 05:44:55 +0000 (21:44 -0800)]
kernel - cpu_pause() needs to be memory-modifying
* __asm __volatile isn't enough, it also needs the "memory"
attribute to prevent gcc from optimizing out memory loads around
loops using cpu_pause().
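A minimal sketch of the difference, assuming GCC or Clang extended asm. `pause_toy()` is a hypothetical stand-in for the kernel's cpu_pause(); the non-x86 branch is only a compiler barrier so the example builds elsewhere.

```c
/* Hypothetical stand-in for cpu_pause(). */
static inline void pause_toy(void)
{
#if defined(__i386__) || defined(__x86_64__)
    /* The "memory" clobber makes the statement a compiler barrier too. */
    __asm__ __volatile__("pause" : : : "memory");
#else
    __asm__ __volatile__("" : : : "memory");
#endif
}

/* Spin until *flag is nonzero or max_spins is reached; returns spins taken. */
static unsigned pause_toy_spin(const int *flag, unsigned max_spins)
{
    unsigned n = 0;

    /*
     * Because of the clobber, *flag must be reloaded on every pass.
     * With __volatile__ alone the compiler may keep the first load in
     * a register and never observe a store from another cpu.
     */
    while (*flag == 0 && n < max_spins) {
        pause_toy();
        n++;
    }
    return n;
}
```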
Sascha Wildner [Sat, 23 Feb 2013 01:56:51 +0000 (02:56 +0100)]
Use NULL for pointers in a couple of places.
Sascha Wildner [Sat, 23 Feb 2013 01:31:06 +0000 (02:31 +0100)]
libdmsg: Fix pointer dereference.
Sascha Wildner [Sat, 23 Feb 2013 00:43:55 +0000 (01:43 +0100)]
hier.7: Document that /boot/kernel is a directory and has the modules too.
While here, also remove /usr/include/objc.
Sascha Wildner [Fri, 22 Feb 2013 23:35:00 +0000 (00:35 +0100)]
iso639: Add Standard Moroccan Tamazight.
See http://www.loc.gov/standards/iso639-2/php/code_changes.php
Matthew Dillon [Fri, 22 Feb 2013 21:03:04 +0000 (13:03 -0800)]
kernel - Fix cross-mount handling in tmpfs hardlink code
* Fix tmpfs to properly report EXDEV when a cross-mount hardlink is
attempted instead of asserting and causing a panic.
Reported-by: ftigeot
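The shape of the fix, returning an error instead of asserting, can be sketched as follows. The types and `tmpfs_toy_link()` are hypothetical; only the EXDEV result for a cross-mount hardlink comes from the commit message.

```c
#include <errno.h>

/* Hypothetical mount and node types, for illustration only. */
struct toy_tmpfs_mount { int id; };

struct toy_tmpfs_node {
    struct toy_tmpfs_mount *mnt;
    int links;
};

/* Link target into dir: EXDEV if they live on different mounts. */
static int tmpfs_toy_link(const struct toy_tmpfs_node *dir,
                          struct toy_tmpfs_node *target)
{
    if (dir->mnt != target->mnt)
        return EXDEV;      /* report to the caller rather than panic */
    target->links++;
    return 0;
}
```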
Matthew Dillon [Fri, 22 Feb 2013 21:01:45 +0000 (13:01 -0800)]
kernel - Fix deadlock in tmpfs
* If the pageout daemon is paging out a file on a tmpfs mount concurrent
with an unmount of same, a deadlock can occur.
* Fix the node vs vnode lock order in the tmpfs umount code.
Matthew Dillon [Fri, 22 Feb 2013 18:16:30 +0000 (10:16 -0800)]
kernel - Remove getnewvnode() bottlenecks
* Move the global mntvnodescan_list into the mount structure and remove
the global mntvnode_token. Adjust the code to use the per-mount
mp->mnt_token instead.
* This removes a major token bottleneck in getnewvnode(), particularly
important when doing concurrent not-yet-cached directory scans or file
creates under different mount points, and when the vnode cache reaches
its nominal maximum.
* Also add a missing piece for the last cache_findmount() commit.
Matthew Dillon [Fri, 22 Feb 2013 18:09:58 +0000 (10:09 -0800)]
kernel - Add negative caching to cache_findmount()
* Add negative caching to cache_findmount(). It turns out that there
are quite a few cases, particularly during poudriere, so this is
needed to avoid dropping down into the slow mountlist scan code.
* Removes remaining bottlenecks in mount-point crossings during path
lookups. The mountlist_token is no longer colliding in critical
paths.
Matthew Dillon [Fri, 22 Feb 2013 09:54:47 +0000 (01:54 -0800)]
kernel - Increase NCMOUNT_NUMCACHE, add enable & statistics
* Increase NCMOUNT_NUMCACHE to 1009 (prime number), change to modulo.
This cache improves long namecache path lookups.
* Add enable and statistics. Cache defaults to enabled.
debug.ncmount_cache_enable (defaults to 1)
debug.ncmount_cache_hit
debug.ncmount_cache_miss
debug.ncmount_cache_overwrite
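One reason a prime table size with modulo indexing helps when the keys are pointers: aligned addresses share low bits, so a power-of-two mask can leave most slots unused. The sketch below illustrates this under assumptions; the 64-byte stride and the mask variant are chosen for contrast only, and the log does not say how the cache was indexed before this change.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_NCMOUNT_NUMCACHE 1009   /* prime, as in the commit */

static size_t slot_prime(uintptr_t v) { return (size_t)(v % TOY_NCMOUNT_NUMCACHE); }
static size_t slot_mask(uintptr_t v)  { return (size_t)(v & 1023u); }

/* Count distinct slots hit by n keys spaced 'stride' bytes apart. */
static size_t slots_used(size_t (*slot)(uintptr_t), uintptr_t stride, size_t n)
{
    unsigned char used[1024] = { 0 };
    size_t i, count = 0;

    for (i = 0; i < n; i++) {
        size_t s = slot((uintptr_t)i * stride);
        if (!used[s]) {
            used[s] = 1;
            count++;
        }
    }
    return count;
}
```

With 1009 keys at a 64-byte stride, the modulo version touches all 1009 slots while the 1024-entry mask version touches only 16.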
Sepherosa Ziehau [Fri, 22 Feb 2013 09:15:11 +0000 (17:15 +0800)]
bce: Move RX serializers before TX serializers
RX serializers will be used to protect MSI-X
Sepherosa Ziehau [Fri, 22 Feb 2013 08:28:27 +0000 (16:28 +0800)]
bce: Free serializer array in detach path
Sepherosa Ziehau [Fri, 22 Feb 2013 07:16:36 +0000 (15:16 +0800)]
altq/hfsc: Fix wrong malloc size
Reported-by: pavalos@
Matthew Dillon [Fri, 22 Feb 2013 06:32:11 +0000 (22:32 -0800)]
kernel - Fix performance issue due to buffer fragmentation
* Systems with a lot of memory have very large buffer pools. Defragmenting
these pools can be expensive. Often the buffer_map becomes full well
before the bufspace actually hits its limits. Filesystems such as HAMMER
which use large buffer sizes (64K) are more likely to cause the problem.
The result is extremely bad I/O performance for data not in the buffer
cache which requires a new buffer to be instantiated.
* To solve this we double the size of the buffer_map's KVA area on
64-bit systems while leaving the maximum buffer space allowed the
same. The larger virtual space greatly reduces KVA allocation
failures due to fragmentation.
* This solves significant performance issues on monster with its 64G
of ram, but should improve performance on any 64-bit system by
reducing buffer cache defrag iterations.
* Also fix a possible intermediate value overflow in vlrureclaim().
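An intermediate value overflow of the kind mentioned in the last item typically arises when two in-range integers are multiplied before a division. The sketch below is a generic illustration; the actual expression in vlrureclaim() is not shown in the log, and `scale_percent()` is a hypothetical helper.

```c
#include <stdint.h>

/*
 * Scale count by percent/100. The product is formed in 64 bits so it
 * cannot overflow even when count * percent exceeds INT32_MAX.
 */
static int32_t scale_percent(int32_t count, int32_t percent)
{
    return (int32_t)(((int64_t)count * percent) / 100);
}
```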
Matthew Dillon [Fri, 22 Feb 2013 04:32:03 +0000 (20:32 -0800)]
kernel - Add frontend cache for cache_findmount()
* When a name lookup crosses a mount point boundary it must call
cache_findmount() to locate the mount linkage. This linkage is
not stored in the vp or ncp because there is a 1:N relationship
between vp/ncp and possible mounts due to DragonFly's ability
to do arbitrary nullfs mounts in the topology.
* The mountlist scan requires an exclusive token to deal with ripouts
during the scan. This creates a bottleneck when highly parallel
filesystem operations are being run on the machine and use mount-crossing
paths or absolute paths.
* The frontend cache is able to use a shared spinlock for the fast path,
and implements a simple non-chained linear array hashed by pointer
values.
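A non-chained array hashed by pointer value can be sketched as below. This is a simplified model under assumptions: the table size, the shift, and all names are hypothetical, and the locking is omitted (the commit uses a shared spinlock for the fast path).

```c
#include <stddef.h>
#include <stdint.h>

#define FM_TOY_SIZE 64   /* arbitrary size for the sketch */

struct fm_toy_entry {
    const void *key;     /* e.g. the ncp being crossed */
    const void *mount;   /* cached result for that key */
};

static struct fm_toy_entry fm_toy_cache[FM_TOY_SIZE];

static size_t fm_toy_index(const void *key)
{
    /* Drop a few low bits, which tend to be zero for aligned pointers. */
    return (size_t)(((uintptr_t)key >> 4) % FM_TOY_SIZE);
}

/* Fast path: one slot, one compare, no chain to walk. */
static const void *fm_toy_lookup(const void *key)
{
    const struct fm_toy_entry *e = &fm_toy_cache[fm_toy_index(key)];

    return (e->key == key) ? e->mount : NULL;
}

/* A colliding insert simply overwrites the previous occupant. */
static void fm_toy_insert(const void *key, const void *mount)
{
    struct fm_toy_entry *e = &fm_toy_cache[fm_toy_index(key)];

    e->key = key;
    e->mount = mount;
}
```

A miss (NULL) would fall back to the slow mountlist scan, after which the result is inserted.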
Matthew Dillon [Fri, 22 Feb 2013 02:14:45 +0000 (18:14 -0800)]
kernel - Fix network lockup due to msgport bug
* Netisr threads (i.e. arp thread) which issue route table updates
use a synchronous netmsg from a 'spin' type port to a 'thread' type
port.
When going spin->thread, the lwkt_thread_putport*() code was not
using an atomic op to manipulate ms_flags. This could interfere
with the originator on the spin port issuing a lwkt_spin_waitmsg()
and cause one or more flags to be lost.
Ensure that lwkt_thread_putport*() uses atomic ops when manipulating
ms_flags.
* Another serious issue is that the lwkt_*_waitmsg() code was testing
MSGF_QUEUED outside of its port lock. This flag can only be tested
while the port is locked.
* lwkt_thread_replyport() must use an atomic op when setting
MSGF_INTRANSIT and MSGF_REPLY to avoid SMP races on ms_flags
updates.
* lwkt_thread_replyport() requires a critical section against
possible preemption when adjusting ms_flags.
* lwkt_forwardmsg() does not need a critical section.
* Other notes: Not all ms_flags manipulation needs an atomic op. For
example, when initializing a new message or when a lock is held to
rendezvous at a reply port when replying. However, all 'put' and 'wait'
interactions on messages absolutely require atomic ops when manipulating
ms_flags.
Finally, note that all msgport queue operations use atomic ops to
adjust MSGF_QUEUED when adding or removing a message to a port queue.
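The lost-flag problem comes from two cpus doing a plain read-modify-write on the same word. A minimal sketch of the atomic alternative, using C11 `<stdatomic.h>` as a stand-in for the kernel's own atomic primitives; the flag values and the `msgf_toy_*` names are hypothetical.

```c
#include <stdatomic.h>

/* Hypothetical bit values; the flag names follow the commit message. */
#define TOY_MSGF_REPLY     0x02u
#define TOY_MSGF_QUEUED    0x04u
#define TOY_MSGF_INTRANSIT 0x08u

struct msgf_toy_msg {
    atomic_uint ms_flags;
};

/* Atomic OR: concurrent updates to other bits are never lost. */
static void msgf_toy_set(struct msgf_toy_msg *m, unsigned bits)
{
    atomic_fetch_or(&m->ms_flags, bits);
}

/* Atomic AND with the complement clears only the requested bits. */
static void msgf_toy_clear(struct msgf_toy_msg *m, unsigned bits)
{
    atomic_fetch_and(&m->ms_flags, ~bits);
}
```

A non-atomic `m->ms_flags |= bits` compiles to load, OR, store; if another cpu changes a different bit between the load and the store, that change is silently overwritten.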
Antonio Huete Jimenez [Wed, 20 Feb 2013 11:09:26 +0000 (12:09 +0100)]
vkd(4) - Announce virtual disks upon initialization.
Antonio Huete Jimenez [Wed, 20 Feb 2013 10:26:54 +0000 (11:26 +0100)]
vke(4) - Show backing tap only if one was used.
Antonio Huete Jimenez [Fri, 22 Feb 2013 00:51:22 +0000 (01:51 +0100)]
vkernel(7) - Minor manpage adjustments.
Antonio Huete Jimenez [Fri, 22 Feb 2013 00:14:01 +0000 (01:14 +0100)]
vkernel - Settable serial numbers for virtual disks.
Users can now specify serial numbers for their virtual disks
from the command line.
Example:
./vkernel -m 128m -r root.img:VKDMYSERNO
Matthew Dillon [Thu, 21 Feb 2013 23:33:26 +0000 (15:33 -0800)]
kernel - Fix issue with ARP packets stalling out entire network
* ARP packets can cause ARP routing table updates to occur. An ARP
routing table update is an expensive synchronous netmsg that is
forwarded through *ALL* cpus.
* ARP was previously being handled by netisr 0 and on large multi-way
machines (aka monster the 48-way opteron) under very heavy loads this
could result in very long stalls for any packet processing forwarded
to cpu 0.
Stalls exceeding 200 seconds were observed on monster when a large
number of ARP packets had to be processed.
* Implement a dedicated thread feature for the NETISR mechanism and
modify NETISR_ARP to use it. This takes the expensive synchronous
ARP packet processing off the general per-cpu netisr threads.
This thread currently runs on cpu (18 % ncpus) (NETISR_ARP == 18).
Thus the general per-cpu (netisr 0) thread will no longer stall
on ARP packets.
* ping latencies under extreme loads improved to (approximately):
ping -i 0.001 monster-nr
11735 packets transmitted, 11735 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 0.073/0.190/27.019/0.382 ms
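The cpu selection "(18 % ncpus)" works out as follows. `dedicated_cpu_toy()` is a hypothetical helper written only to show the arithmetic.

```c
/* Map a netisr number to a cpu index by wrapping at the cpu count. */
static int dedicated_cpu_toy(int netisr_num, int ncpus)
{
    return netisr_num % ncpus;
}
```

On the 48-way machine this gives cpu 18; on a 4-cpu system it gives cpu 2, and on a uniprocessor cpu 0.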
Matthew Dillon [Thu, 21 Feb 2013 23:32:12 +0000 (15:32 -0800)]
kernel - Add critical section in lwkt_yield_quick()
* Add a critical section to protect the clearing of the LWKT reschedule
bit against gd_tdrunq.
Sascha Wildner [Thu, 21 Feb 2013 19:14:23 +0000 (20:14 +0100)]
asr(4): Remove a case that is not a member of the enum being tested.
Apparently xpt_opcode can wind up being REPORT_LUNS here. gcc47 warns
about it because REPORT_LUNS is not an enum xpt_opcode member.
FreeBSD just cast this away. Instead of doing that, remove it and
let the default case handle it (which has the same code as XPT_ABORT).
While here, fix a typo in a comment.
Sepherosa Ziehau [Thu, 21 Feb 2013 13:19:09 +0000 (21:19 +0800)]
bce: Fix tick/pulse callout target CPU setting
Sepherosa Ziehau [Thu, 21 Feb 2013 09:29:47 +0000 (17:29 +0800)]
igb: Fix timer cpuid settings when entering/exiting polling mode
Matthew Dillon [Thu, 21 Feb 2013 07:03:53 +0000 (23:03 -0800)]
kernel - Fix cpu/token starvation, vfs_busy deadlocks. incls sysctl (2)
* Last commit had a bug in the deadlock fix for nlookup(). This fix
is tested and works.
Matthew Dillon [Thu, 21 Feb 2013 07:01:20 +0000 (23:01 -0800)]
kernel - Fix excessive kprintf()s during refcount_wait()
* _refcount_wait() can do itself in with excessive kprintf()s
on large multi-way machines, causing the machine to become
unresponsive.
* Rewrite the code to use a ticks test and only kprintf()
a warning when it takes more than 60 seconds.
* Used by vm_object_pip_wait(). Long I/O queues are possible.
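A ticks-based test of the kind described above might look like the sketch below. The names and the re-arming behaviour are assumptions; the log only states that a warning is printed when the wait exceeds 60 seconds.

```c
/*
 * Return 1 if a warning should be printed now. *last_ticks holds the
 * tick count at the start of the wait (or at the previous warning).
 */
static int refwait_toy_should_warn(int *last_ticks, int now_ticks, int hz)
{
    if (now_ticks - *last_ticks > 60 * hz) {
        *last_ticks = now_ticks;   /* re-arm (assumed behaviour) */
        return 1;
    }
    return 0;
}
```

Each wakeup in the wait loop then costs one subtraction and compare instead of a console write.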
Matthew Dillon [Thu, 21 Feb 2013 06:42:08 +0000 (22:42 -0800)]
kernel - Implement vm.read_shortcut support in tmpfs
* Implement tmpfs support for vm.read_shortcut_enable=1
* Approximately doubles tmpfs read() performance on 64-bit systems
for data sets which exceed the size of the buffer cache.
Example using monster (64G ram, 6.4G buffer cache, 9G test data set)
du -s -k /mp
9037196 /mp
sysctl vm.read_shortcut_enable=1
time tar cf /dev/null /mp
6.763u 13.275s 0:20.05 99.9% 26+66k 0+0io 0pf+0w
7.224u 12.830s 0:20.07 99.9% 26+66k 0+0io 0pf+0w
6.957u 14.924s 0:21.91 99.8% 26+66k 0+0io 0pf+0w
sysctl vm.read_shortcut_enable=0
time tar cf /dev/null /mp
7.510u 23.997s 0:31.52 99.9% 26+66k 0+0io 0pf+0w
7.769u 37.738s 0:45.53 99.9% 25+65k 0+0io 0pf+0w
7.716u 40.306s 0:48.04 99.9% 25+65k 0+0io 0pf+0w
* Note that variations in run time when the feature is disabled
depend on what data is already present in the buffer cache and
the cost of mapping new buffers and tearing down old buffers.
This can be substantial on large multi-way systems due to
SMP/page-table issues.
Matthew Dillon [Thu, 21 Feb 2013 05:10:10 +0000 (21:10 -0800)]
kernel - add yields in the swap pager freeing path
* Add yields in swp_pager_meta_free*(). This routine can loop
heavily on very large VM objects and we don't want it to stall
the cpu.
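The pattern of yielding periodically inside a long loop can be sketched in user space. POSIX `sched_yield()` stands in for the kernel's lwkt_yield(); the function name and the yield interval are hypothetical.

```c
#include <sched.h>
#include <stddef.h>

/*
 * Walk nblocks items, yielding the cpu every yield_every items.
 * Returns the number of yields performed.
 */
static size_t meta_free_toy(size_t nblocks, size_t yield_every)
{
    size_t i, yields = 0;

    for (i = 0; i < nblocks; i++) {
        /* ... release the metadata for block i here ... */
        if (yield_every != 0 && (i + 1) % yield_every == 0) {
            sched_yield();
            yields++;
        }
    }
    return yields;
}
```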
Matthew Dillon [Thu, 21 Feb 2013 05:09:46 +0000 (21:09 -0800)]
kernel - cleanup
* Minor cleanup
Matthew Dillon [Thu, 21 Feb 2013 05:07:00 +0000 (21:07 -0800)]
kernel - Remove remaining mplock use cases from tmpfs
* Use the per-mount lock for remaining cases, including nremove,
truncate, and other operations.
* Also fixes machine stalls against pings when removing very
large files.
Submitted-by: vsrinivas
Sepherosa Ziehau [Thu, 21 Feb 2013 04:44:42 +0000 (12:44 +0800)]
icmp: ICMP is MPSAFE
Sepherosa Ziehau [Thu, 21 Feb 2013 04:36:44 +0000 (12:36 +0800)]
bce: RX and TX ring counts are not required to be same
However, in DragonFly, RX ring count must be greater than TX ring count.
Clue-from: Linux bnx2
Matthew Dillon [Thu, 21 Feb 2013 02:38:33 +0000 (18:38 -0800)]
kernel - Fix cpu/token starvation, vfs_busy deadlocks. incls sysctl
* Remove the mplock around the userland sysctl system call, it should no
longer be needed.
* Remove the mplock around getcwd(), it should no longer be needed.
* Change the vfs_busy(), sys_mount(), and related mount code to use the
per-mount token instead of the mp lock.
* Fix a race in vfs_busy() which could cause it to never get woken up.
* Fix a deadlock in nlookup() when the lookup is racing an unmount. When
the mp is flagged MNTK_UNMOUNT, the unmount is in progress and the lookup
must fail instead of loop.
* per-mount token now protects mp->mnt_kern_flag.
* unmount code now waits for final mnt_refs to return to the proper value,
fixing races with other code that might temporarily ref the mount point.
* Add lwkt_yield()'s in nvtruncbuf*() and nvnode_pager_setsize(), reducing
cpu stalls due to large file-extending I/O's. Also in tmpfs.
* Use a marker in the vm_meter code and check for vmobj_token collisions.
When a collision is detected, give other threads a chance to take the
token. This prevents hogging of this very important token.
Testing-by: dillon, vsrinivas, ftigeot
Sascha Wildner [Wed, 20 Feb 2013 18:39:18 +0000 (19:39 +0100)]
vkernel/vke: Comment out 'ifp', just like the code that uses it.
John Marino [Wed, 20 Feb 2013 15:59:51 +0000 (16:59 +0100)]
build: Only auto-save once per build
suggested-by: tuxillo
Sepherosa Ziehau [Wed, 20 Feb 2013 09:50:22 +0000 (17:50 +0800)]
bce: Switch to IFQ subqueue functions and use per-TX queue watchdog
Sascha Wildner [Wed, 20 Feb 2013 07:38:07 +0000 (08:38 +0100)]
<sys/bus.h>: Fix wording.
Reported-by: marino
Sascha Wildner [Wed, 20 Feb 2013 07:08:39 +0000 (08:08 +0100)]
kernel: Use DEVMETHOD_END in the drivers.
Sascha Wildner [Wed, 20 Feb 2013 07:08:11 +0000 (08:08 +0100)]
<sys/bus.h>: Add DEVMETHOD_END.
Matthew Dillon [Wed, 20 Feb 2013 06:56:58 +0000 (22:56 -0800)]
kernel - Properly account system time for contending tokens
* When the LWKT scheduler gets stuck on a contending token it switches
through the idle thread, the idle thread is told not to halt, and
resolution of the contention is handled by lwkt_switch() from the
idle thread's context.
* This was causing token contention to be improperly accounted for as
idle time in the per-cpu stats. Fix the case by testing the
RQF_AST_LWKT_RESCHED flag which tells the idle thread not to halt,
and account for the tick as system time if the flag is set.
* The improper time accounting was causing powerd to come to the wrong
conclusion in massively parallel fsstress tests on monster.dragonflybsd.org
(48 cpus). With the fix, powerd no longer becomes confused.
Reported-by: vsrinivas
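The accounting decision described above reduces to one flag test per tick taken in the idle thread. A minimal sketch; the flag value and the helper are hypothetical, only the flag name comes from the commit message.

```c
/* Hypothetical bit value for the flag named in the commit. */
#define TOY_RQF_AST_LWKT_RESCHED 0x01u

enum tick_bucket_toy { TOY_TICK_IDLE, TOY_TICK_SYSTEM };

/* Classify a statclock tick that landed in the idle thread. */
static enum tick_bucket_toy idle_tick_bucket_toy(unsigned reqflags)
{
    /* Flag set: the idle thread is really spinning on token contention. */
    if (reqflags & TOY_RQF_AST_LWKT_RESCHED)
        return TOY_TICK_SYSTEM;
    return TOY_TICK_IDLE;
}
```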
Matthew Dillon [Tue, 19 Feb 2013 23:29:55 +0000 (15:29 -0800)]
vkernel - Fix if_vke
* The vkernel device driver threads are cothreads and do not have
any per-cpu data.
* Fix recent stat counter changes which attempted to access per-cpu data
from a cothread. This fixes a vkernel SIGILL by virtue of the trap()
code being called recursively on trap's own attempt to access mycpu,
until its stack runs out.
Peter Avalos [Tue, 19 Feb 2013 18:26:30 +0000 (10:26 -0800)]
Adjust files for libarchive-3.1.2 import.
Peter Avalos [Tue, 19 Feb 2013 18:35:19 +0000 (10:35 -0800)]
Merge branch 'vendor/LIBARCHIVE'
Sascha Wildner [Tue, 19 Feb 2013 17:51:31 +0000 (18:51 +0100)]
patch(1): Fix typo.
Sascha Wildner [Tue, 19 Feb 2013 17:49:16 +0000 (18:49 +0100)]
patch(1): When -C is specified, do not claim to have saved rejects.
Submitted-by: Loganaden Velvindron
Dragonfly-bug: <http://bugs.dragonflybsd.org/issues/2359>
Sepherosa Ziehau [Tue, 19 Feb 2013 13:12:04 +0000 (21:12 +0800)]
bce: Reimplement polling in non-compat mode
Take advantage of the independent TX/RX serializers.
Sepherosa Ziehau [Tue, 19 Feb 2013 09:01:50 +0000 (17:01 +0800)]
bce: Split TX/RX serializer
Sepherosa Ziehau [Tue, 19 Feb 2013 04:57:05 +0000 (12:57 +0800)]
bce: Defer interrupt allocation until the TX/RX rings are allocated
This will be required for implementing MSI-X at least.
Sepherosa Ziehau [Tue, 19 Feb 2013 04:21:34 +0000 (12:21 +0800)]
bce: Regroup function declaration
Sepherosa Ziehau [Tue, 19 Feb 2013 04:12:13 +0000 (12:12 +0800)]
bce: Fix supported devices list in comment
Sepherosa Ziehau [Tue, 19 Feb 2013 03:16:37 +0000 (11:16 +0800)]
bce: Put RX related fields into bce_rx_ring
Matthew Dillon [Mon, 18 Feb 2013 20:08:29 +0000 (12:08 -0800)]
kernel - Fix a race and enable the VM read shortcut feature by default
* Fix a lookup/access race. No known cases hit the race but decided
it needed to be fixed for safety.
Instead of looking up and holding the VM page we now try to busy it,
and only access the content if we are able to do so non-blocking.
This costs a bit more in overhead but handles the page more properly.
/usr/obj/usr/src
time tar cf /dev/null .
0.734u 5.781s 0:06.51 100.0% 24+66k 0+0io 0pf+0w (shortcut disabled)
0.664u 2.382s 0:03.05 99.6% 24+66k 0+0io 0pf+0w (shortcut enabled)
* Default vm.read_shortcut_enable to 1. The feature is now enabled by
default.
* The feature has been in the tree a while default disabled and needs wider
use, so it is being enabled by default. The feature is only useful on
64-bit systems (i.e. so the DMAP can be used). It allows the buffer
cache and the VM page mapping code to be completely bypassed in situations
where the file data is available in the VM page cache.
Sascha Wildner [Mon, 18 Feb 2013 18:25:50 +0000 (19:25 +0100)]
rc.d/addswap: Load the vn(4) module if not already present.
Reported-by: lentferj
Matthew Dillon [Mon, 18 Feb 2013 17:50:22 +0000 (09:50 -0800)]
kernel - Fix rare race in namecache
* Fix a rare race in _cache_cleanneg() where the ncp being cleaned up is
resolved during the moment between where _cache_cleanneg() accesses it
prior to locking and removing it.
* _cache_cleanneg() needed to re-check that the ncp was still on the
negative cache list.
Reported-by: marino
John Marino [Mon, 18 Feb 2013 00:15:59 +0000 (01:15 +0100)]
csu: Fix .eh_frame_hdr errors seen on i386
The libcsu object files should have been generated with
-fno-asynchronous-unwind-tables. The crtbegin*, crtend* objects,
specific to a compiler, were generated with this flag on both platforms
as seen in the vendor build. This commit builds libcsu with the same
cflags on both platforms, and it allows the error frame header to get
built successfully.
This has been seen on i386 for a while, and later bug #2511 hit upon
it outside of the world build.
Samuel J. Greear [Sun, 17 Feb 2013 23:40:21 +0000 (16:40 -0700)]
build - Do not use cp -a
* The -a option to cp was added in November, unbreak installworld for those
running a world built prior to November 2012.
John Marino [Sun, 17 Feb 2013 20:14:48 +0000 (21:14 +0100)]
build: implement automatic world backups
The directives DAYS_BACKUP and NO_BACKUP have been removed.
The "backupworld" target will save important directories to the WORLD_BACKUP
directory just as before, and it is restored with the "restoreworld" target.
Additionally, every time the "installworld" target is executed, the same
directories will be automatically backed up at the location of
${MAKEOBJDIRPREFIX}/world_backup . These directories could be restored
with the new make target "restoreworld-auto".
The WORLD_BACKUP location default is now /var/backups/world_backup .
The directory /usr/lib has been added to the backup list.
The more useless errors seen with a broken world have been removed, these
came in with bmake.
John Marino [Sun, 17 Feb 2013 19:16:53 +0000 (20:16 +0100)]
build: Remove installworld backup check
Dillon wants to rework backup functionality.
1. Remove any check that can halt installworld
2. Backup world automatically and store it in /usr/obj/world_binaries
3. Keep manual backup commands, they still use WORLD_BACKUP as before
4. Get rid of days check
This commit accomplished step 1.
Sepherosa Ziehau [Sun, 17 Feb 2013 13:15:39 +0000 (21:15 +0800)]
bce: Put TX related fields into bce_tx_ring
Antonio Huete Jimenez [Sun, 17 Feb 2013 11:41:45 +0000 (12:41 +0100)]
vkernel - Allow setting MAC addresses from within the command line.
In order to be able to specify the MAC address we want to
use for every interface within the vkernel, an extra argument
has been added to the -I option.
Example:
./kernel -r root.img -m 256m -I auto=aa:bb:cc:dd:ee:ff
John Marino [Sun, 17 Feb 2013 11:05:18 +0000 (12:05 +0100)]
build: add "make backupworld" and "make restoreworld" functionality
Three new make.conf parameters have been defined:
* WORLD_BACKUP - location to store backed up world binaries
default = /var/backups/world_binaries
* DAYS_BACKUP - The number of days since the last backup that must pass
before "make installworld" fails with an error
default = 28
* NO_BACKUP - defining this will prevent backup checks.
The build functionality has been changed. Prior to "make installworld",
the makefile will check to see if system binaries have been previously
backed up. If they haven't, "make installworld" will fail to execute
explaining that the system should be backed up. If a previous backup
does exist, but it's older than the specified number of days, "make
installworld" will fail explaining the backup needs to be refreshed.
Passing NO_BACKUP through the command line or make.conf will inhibit
these checks.
While here, define WORLD_CCVER, LDVER, WORLD_LDVER, WORLD_BINUTILSVER
in make.conf man page too.
Sepherosa Ziehau [Sun, 17 Feb 2013 11:07:06 +0000 (19:07 +0800)]
bce: Factor out bce_xmit()