Thomas Nikolajsen [Mon, 3 Jan 2011 06:26:09 +0000 (07:26 +0100)]
hammer(8) rebalance: change default saturation_percentage a few more places
Antonio Huete Jimenez [Sun, 2 Jan 2011 21:16:48 +0000 (22:16 +0100)]
hammer(8) - Take into account the saturation level passed to rebalance.
Also change the default saturation level from 75% to 85%.
Sascha Wildner [Sun, 2 Jan 2011 19:16:05 +0000 (20:16 +0100)]
kernel: Remove support for the Xerox Network Systems (NS) protocol.
It was previously removed from LINT with commit 67bf99c4 and its removal
was also announced here:
http://leaf.dragonflybsd.org/mailarchive/kernel/2010-09/msg00025.html
To be on the safe side, bump _DragonFly_version due to /usr/include/netns
going away. In the unlikely event of something breaking in pkgsrc we'll
have a version to patch against, if needed.
Sascha Wildner [Sun, 2 Jan 2011 17:43:19 +0000 (18:43 +0100)]
bluetooth(4): Fix loading bluetooth without pf.
bluetooth(4) was previously using pf(4)'s pool_* macros. Now that pool_get()
has been turned into a function in pf(4), this is no longer possible, since
the bluetooth(4) module would have to depend on the pf(4) module for that.
To unbreak module loading, convert all calls to these macros to the zone(9)
calls which they really are.
Sascha Wildner [Sun, 2 Jan 2011 17:38:47 +0000 (18:38 +0100)]
aibs(4): Add missing module dependency.
Samuel J. Greear [Sun, 2 Jan 2011 15:14:55 +0000 (15:14 +0000)]
Merge branch 'master' of ssh://crater.dragonflybsd.org/repository/git/dragonfly
Samuel J. Greear [Sun, 2 Jan 2011 15:12:14 +0000 (15:12 +0000)]
kernel - s/so_pru_control/so_pru_control_direct/
* This enables building without COMPAT_43
Sascha Wildner [Sun, 2 Jan 2011 11:28:37 +0000 (12:28 +0100)]
Rearrange the previous commit a bit and fix a pathname.
Jan Lentfer [Sun, 2 Jan 2011 11:09:20 +0000 (12:09 +0100)]
pf: Do not install pfsync man pages, as we lack support for it.
Jan Lentfer [Sat, 1 Jan 2011 23:22:05 +0000 (00:22 +0100)]
pf: Update man pages, too
Sascha Wildner [Sat, 1 Jan 2011 11:42:33 +0000 (12:42 +0100)]
Fix 64 bit build.
Sascha Wildner [Fri, 31 Dec 2010 20:39:11 +0000 (21:39 +0100)]
Bump the copyright years.
\o/ H A P P Y N E W Y E A R ! \o/
Sascha Wildner [Fri, 31 Dec 2010 18:05:20 +0000 (19:05 +0100)]
twe(4): Update to version 1.50.01.002.
This adds support for the 7000/8000 series adapters (some of which
are SATA controllers).
See the manual page for the full list of supported devices.
The update was tested with a 6200 card.
Taken-from: FreeBSD
Thanks-to: lentferj for providing a controller for testing
Jan Lentfer [Fri, 31 Dec 2010 14:07:41 +0000 (15:07 +0100)]
remove pfsync option from LINT
* pfsync is now built as part of pf directly
Jan Lentfer [Mon, 11 Oct 2010 16:01:08 +0000 (18:01 +0200)]
pf: Update packetfilter to OpenBSD 4.4
* Since correct pf operation now depends directly on pfsync,
compile if_pfsync.c into pf.ko. pflog is already part
of pf.ko.
* Activate pfsync by default. It's not a kernel
option anymore, but pfsync is very unlikely to work.
In any case, our ifconfig is missing all pfsync-related options.
I will try to make pfsync work again after upgrading to
pf from OpenBSD 4.5, as pfsync changes completely then
and is no longer compatible with prior versions.
* Also make the module unloading sane in if_pflog.c
Thanks to Alex Hornung and Aggelos Economopoulos for debugging.
Sepherosa Ziehau [Fri, 31 Dec 2010 11:05:11 +0000 (19:05 +0800)]
apic: Clear all entries in int table
This fixes the case where a missing 8259 entry and a missing IRQ15
occur simultaneously.
Sepherosa Ziehau [Fri, 31 Dec 2010 08:04:29 +0000 (16:04 +0800)]
intr: Start hardware interrupt at IDT_OFFSET
Sepherosa Ziehau [Fri, 31 Dec 2010 06:15:52 +0000 (14:15 +0800)]
intr: Remove call_fast_unpend() declaration
This function is not implemented at all.
Sepherosa Ziehau [Fri, 31 Dec 2010 06:05:46 +0000 (14:05 +0800)]
apic: Remove unused macros
Sepherosa Ziehau [Fri, 31 Dec 2010 05:54:17 +0000 (13:54 +0800)]
apic: Remove unused macros and duplicated comment
Sepherosa Ziehau [Fri, 31 Dec 2010 05:32:34 +0000 (13:32 +0800)]
intr: Move IO/APIC IDT vector offset into isa/intr_machdep.h
Sepherosa Ziehau [Fri, 31 Dec 2010 03:44:15 +0000 (11:44 +0800)]
intr: Make sure that changing IPI offset will also update related TPR
While I'm here, reorganize various IPI offsets
Sepherosa Ziehau [Fri, 31 Dec 2010 03:08:50 +0000 (11:08 +0800)]
intr: Remove unused typedef
Sepherosa Ziehau [Fri, 31 Dec 2010 03:05:40 +0000 (11:05 +0800)]
intr: Remove unused TPR macros
Sepherosa Ziehau [Fri, 31 Dec 2010 02:41:47 +0000 (10:41 +0800)]
intr: Clean up comment of Local APIC TPR
Sepherosa Ziehau [Fri, 31 Dec 2010 02:11:03 +0000 (10:11 +0800)]
pci: MPTable pcib/hostb should not be used, if !apic_io_enable
MP Table is not parsed at all if !apic_io_enable
Jan Lentfer [Thu, 30 Dec 2010 23:36:51 +0000 (00:36 +0100)]
ldns: Re-Add lost README files
Jan Lentfer [Thu, 30 Dec 2010 22:04:30 +0000 (23:04 +0100)]
ldns/drill: Update local files to 1.6.7
Jan Lentfer [Thu, 30 Dec 2010 00:30:02 +0000 (01:30 +0100)]
Update to ldns-1.6.7
Ilya Dryomov [Thu, 30 Dec 2010 11:41:08 +0000 (13:41 +0200)]
HAMMER - Remove unused variable
Although assigned (so GCC was silent), the 'blockmap' variable is
unused in hammer_blockmap_free(), hammer_blockmap_dedup() and
hammer_blockmap_finalize() functions.
Sepherosa Ziehau [Thu, 30 Dec 2010 08:46:43 +0000 (16:46 +0800)]
Nuke forward_fastint_remote(), which has never been used.
Matthew Dillon [Wed, 29 Dec 2010 08:49:59 +0000 (00:49 -0800)]
kernel - Fix lockmgr non-zero exclusive count panic (2)
* Handle another possible case when upgrading a shared lock to an
exclusive lock where the exclusive flag can wiggle its way in.
Reported-by: Peter Avalos <peter@theshell.com>,
YONETANI Tomokazu <qhwt.dfly@les.ath.cx>
Matthew Dillon [Wed, 29 Dec 2010 08:32:27 +0000 (00:32 -0800)]
kernel - Fix lockmgr non-zero exclusive count panic
* The vm_map lock uses shared & exclusive locks and tries to upgrade
shared to exclusive. There is a race where a shared upgrade can
steal an exclusive lock from an exclusive request which has already
acquired the LK_WANT_EXCL flag.
* Deal with the case by having the exclusive lock also acquire
LK_HAVE_EXCL to catch any shared upgrades which beat out the
request.
Reported-by: YONETANI Tomokazu <qhwt.dfly@les.ath.cx>
Researched-by: YONETANI Tomokazu <qhwt.dfly@les.ath.cx>
Sepherosa Ziehau [Wed, 29 Dec 2010 08:13:21 +0000 (16:13 +0800)]
ip_demux: Update comment for tcp_ctlport()
Sepherosa Ziehau [Wed, 29 Dec 2010 07:51:45 +0000 (15:51 +0800)]
ip_demux: Update comment for ip_lengthcheck()
Sepherosa Ziehau [Wed, 29 Dec 2010 07:48:08 +0000 (15:48 +0800)]
mtodoff: Add comment
While I'm here, nuke a stale comment
Sascha Wildner [Tue, 28 Dec 2010 17:42:20 +0000 (18:42 +0100)]
kernel: Remove support for the EISA bus and EISA/VLB devices.
Discussed-with-and-approved-by: dillon, aggelos, and others.
Sascha Wildner [Tue, 28 Dec 2010 15:45:35 +0000 (16:45 +0100)]
Remove redundant settings. These are the same in /etc/defaults/rc.conf.
Sascha Wildner [Tue, 28 Dec 2010 14:17:26 +0000 (15:17 +0100)]
installer: Add swap to /etc/crypttab only if it is actually encrypted.
This caused some nasty warnings and error msgs upon booting.
Sepherosa Ziehau [Tue, 28 Dec 2010 02:49:50 +0000 (10:49 +0800)]
ipflow: cpumask_t should be used instead of uint32_t
Sepherosa Ziehau [Tue, 28 Dec 2010 01:54:55 +0000 (09:54 +0800)]
re(4): Add support for 8168E
Submitted-by: Tim Bisson <bissont@mac.com>
Sascha Wildner [Sun, 26 Dec 2010 15:57:33 +0000 (16:57 +0100)]
Create MLINKS for the mbuf(9) manual page.
Someone with more knowledge than me should review mbuf(9) and update it
to our current state of affairs.
Reported-by: pavalos
Venkatesh Srinivas [Sun, 26 Dec 2010 01:02:11 +0000 (17:02 -0800)]
Merge branch 'master' of /repository/git/dragonfly
Venkatesh Srinivas [Sun, 26 Dec 2010 00:57:07 +0000 (16:57 -0800)]
kernel -- Spinlock debugging.
* Track spinlocks held by a thread in a per-thread array; records the lock
address and the EIP of the lock-taker.
* Panic in lockmgr() if we hold any spinlocks when trying to take a sleeping
lockmgr lock.
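The bookkeeping described in this commit can be pictured with a small userland sketch. Everything here is an assumption made for illustration: the structure names, the fixed capacity of 8, and the helper functions are hypothetical and are not taken from the kernel sources. The caller-address field stands in for the saved EIP.

```c
#include <assert.h>
#include <stddef.h>

#define HELD_SPIN_MAX 8		/* hypothetical capacity of the per-thread array */

/* One record per held spinlock: the lock address and the taker's address. */
struct held_spin_sketch {
	const void *lock;
	const void *taker;	/* stand-in for the saved EIP */
};

/* Hypothetical per-thread state; not the kernel's struct thread. */
struct thread_sketch {
	struct held_spin_sketch held[HELD_SPIN_MAX];
	int nheld;
};

/* Record a spinlock acquisition in the thread's array. */
static void
spin_track_acquire(struct thread_sketch *td, const void *lock,
    const void *taker)
{
	assert(td->nheld < HELD_SPIN_MAX);
	td->held[td->nheld].lock = lock;
	td->held[td->nheld].taker = taker;
	td->nheld++;
}

/* Forget a spinlock on release; returns 0 if it was found, -1 otherwise. */
static int
spin_track_release(struct thread_sketch *td, const void *lock)
{
	for (int i = 0; i < td->nheld; i++) {
		if (td->held[i].lock == lock) {
			/* Fill the hole with the last entry. */
			td->held[i] = td->held[td->nheld - 1];
			td->nheld--;
			return 0;
		}
	}
	return -1;
}

/* A sleeping lock may only be taken when no spinlocks are held. */
static int
may_sleep_sketch(const struct thread_sketch *td)
{
	return td->nheld == 0;
}
```

In this sketch the blocking-lock path would call may_sleep_sketch() first and panic when it returns 0, printing the recorded lock and taker addresses to show who still holds a spinlock.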
Sascha Wildner [Sat, 25 Dec 2010 19:40:08 +0000 (20:40 +0100)]
mpipe.9: Remove trailing whitespace.
Samuel J. Greear [Fri, 24 Dec 2010 03:33:54 +0000 (03:33 +0000)]
vkernel - Catch manpage up with loader/installkernel changes
Reported-by: Tony
Sascha Wildner [Thu, 23 Dec 2010 17:32:36 +0000 (18:32 +0100)]
ips.4: Sort SEE ALSO references.
Sascha Wildner [Thu, 23 Dec 2010 15:02:03 +0000 (16:02 +0100)]
Miscellaneous manual page cleanup.
Sascha Wildner [Thu, 23 Dec 2010 14:58:40 +0000 (15:58 +0100)]
Fix typos in messages and manual pages.
Venkatesh Srinivas [Thu, 23 Dec 2010 06:57:49 +0000 (22:57 -0800)]
kernel -- MPIPE: Don't call a NULL constructor.
Venkatesh Srinivas [Thu, 23 Dec 2010 04:57:59 +0000 (20:57 -0800)]
kernel -- MPIPE: Add a constructor argument and priv ptr.
Matthew Dillon [Thu, 23 Dec 2010 03:35:25 +0000 (19:35 -0800)]
kernel - Fix boot-time lockup with if_igb
* In DragonFly on return from a call to if_start if
(IFF_RUNNING|IFF_OACTIVE) == IFF_RUNNING the if_start code for the
device is expected to have drained the queue and will be re-called
if it has not.
Add a required ifq_purge() in igb_start_locked() for the case where the
adapter's link is not yet active.
* NOTE: In FreeBSD this is not the case, but correctly coding the
driver would probably still be beneficial.
Alex Hornung [Wed, 22 Dec 2010 08:58:13 +0000 (08:58 +0000)]
udevd - Exit from the SIGTERM handler
Reported-by: Tim Darby
Samuel J. Greear [Wed, 22 Dec 2010 05:31:24 +0000 (05:31 +0000)]
Merge branch 'master' of ssh://crater.dragonflybsd.org/repository/git/dragonfly
Samuel J. Greear [Wed, 22 Dec 2010 05:30:06 +0000 (05:30 +0000)]
Merge branch 'master' of ssh://crater.dragonflybsd.org/repository/git/dragonfly
Sascha Wildner [Wed, 22 Dec 2010 05:27:31 +0000 (06:27 +0100)]
vm_map_findspace.9: Turn an Xr to a manpage we don't have (yet) into an Fn.
Samuel J. Greear [Wed, 22 Dec 2010 05:24:21 +0000 (05:24 +0000)]
kernel - Add many sysctl definitions, sysv, vfs, nfs, etc.
* Also take the opportunity to remove some dead (no longer referenced)
sysctls; these are:
vfs.cache.dothits
vfs.cache.dotdothits
vfs.cache.nummiss
vfs.cache.nummisszap
vfs.cache.numposzaps
vfs.cache.numposhits
vfs.cache.numnegzaps
vfs.cache.numneghits
vfs.reassignbufloops
vfs.reassignbufsortgood
vfs.reassignbufsortbad
vfs.reassignbufmethod
vfs.nfs.defect
vfs.cache.numfullpathfail4
vfs.cache.numfullpathfail3
vfs.cache.numfullpathfail2
vfs.cache.numfullpathfail1
vfs.cache.numcwdfail4
vfs.cache.numcwdfail3
vfs.cache.numcwdfail2
vfs.cache.numcwdfail1
* Add back a couple of vfs.cache sysctl's with improved names
vfs.cache.numcwdfailnf
vfs.cache.numcwdfailsz
vfs.cache.numfullpathfailnf
vfs.cache.numfullpathfailsz
Submitted-by: Taras Klaskovsky
Sponsored-by: Google Code-In
Sascha Wildner [Wed, 22 Dec 2010 05:21:47 +0000 (06:21 +0100)]
Hook the vm_page_alloc.9 manual page into the build.
Venkatesh Srinivas [Wed, 22 Dec 2010 02:53:49 +0000 (18:53 -0800)]
Import vm_page_alloc manpage from FreeBSD and add DFly-specifics.
Venkatesh Srinivas [Tue, 21 Dec 2010 22:38:43 +0000 (14:38 -0800)]
Convert netstat/inet6.c to use a standard NELEM. (buildworld fixes).
Reported-by: pavalos@
Venkatesh Srinivas [Tue, 21 Dec 2010 22:02:14 +0000 (14:02 -0800)]
Restore __arysize to stdint.h. Userland was including and using __arysize.
Reported-by: Santabolt on #dragonflybsd
Venkatesh Srinivas [Tue, 21 Dec 2010 21:45:27 +0000 (13:45 -0800)]
kernel -- Convert sfbuf to use kref.
kref_dec was also modified to return an int indicating whether it saw
a 1 -> 0 transition (0) or not (1).
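The return convention described here can be illustrated with a minimal userland sketch built on C11 atomics. The type and function names are hypothetical and this is not the kernel's kref code; the only detail taken from the log is the return value, 0 when the count goes from 1 to 0 and 1 otherwise.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for a reference-counted object header. */
struct ref_sketch {
	atomic_int refcnt;
};

static void
ref_sketch_init(struct ref_sketch *r, int count)
{
	atomic_init(&r->refcnt, count);
}

static void
ref_sketch_inc(struct ref_sketch *r)
{
	atomic_fetch_add(&r->refcnt, 1);
}

/*
 * Drop one reference.  Returns 0 if this call saw the 1 -> 0
 * transition (the caller should tear the object down) and 1 if
 * other references remain.
 */
static int
ref_sketch_dec(struct ref_sketch *r)
{
	int old = atomic_fetch_sub(&r->refcnt, 1);

	assert(old > 0);	/* dropping a reference nobody held */
	return (old == 1) ? 0 : 1;
}
```

A caller would typically write `if (ref_sketch_dec(&obj->ref) == 0) free_obj(obj);`, where obj and free_obj() are again hypothetical.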
Venkatesh Srinivas [Tue, 21 Dec 2010 20:25:54 +0000 (12:25 -0800)]
Code clean -- nuke private defines of __arysize and arysize, replace with NELEM.
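For readers unfamiliar with this kind of macro, an element-count macro is usually written along the following lines. The macro below is a local stand-in with a hypothetical name so that the example compiles on its own; the table and accessor are likewise invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-in for an element-count macro; arrays only, not pointers. */
#define NELEM_SKETCH(ary)	(sizeof(ary) / sizeof((ary)[0]))

/* Hypothetical table used only for this example. */
static const int table_sketch[] = { 2, 3, 5, 7, 11 };

/* Bounds-checked accessor that relies on the element count. */
static int
table_at_sketch(size_t i)
{
	assert(i < NELEM_SKETCH(table_sketch));
	return table_sketch[i];
}

/* Sum every entry without hard-coding the table length. */
static int
table_sum_sketch(void)
{
	int sum = 0;

	for (size_t i = 0; i < NELEM_SKETCH(table_sketch); i++)
		sum += table_at_sketch(i);
	return sum;
}
```

Using one shared macro instead of per-file private defines keeps loops like the one above correct when entries are added to a table.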
Venkatesh Srinivas [Tue, 21 Dec 2010 20:08:55 +0000 (12:08 -0800)]
kernel -- Implement kref, a very lightweight reference counting system.
Sascha Wildner [Tue, 21 Dec 2010 19:22:25 +0000 (20:22 +0100)]
route(8)/routed(8): Raise WARNS to 6.
Venkatesh Srinivas [Tue, 21 Dec 2010 17:23:48 +0000 (09:23 -0800)]
Update MPIPE manual page to describe MPF_CACHEDATA & remove old lock comment.
Sascha Wildner [Tue, 21 Dec 2010 16:57:55 +0000 (17:57 +0100)]
kgdb(1): Adjust comments, too.
Sascha Wildner [Tue, 21 Dec 2010 16:36:34 +0000 (17:36 +0100)]
kgdb(1): Fix finding the right kernel for symbols when using -n.
Due to a change of kernel names in /var/crash (kernel.xx -> kern.xx)
kgdb(1) wasn't finding the kernel anymore and resorted to the one in
/usr/obj.
Sascha Wildner [Tue, 21 Dec 2010 12:43:49 +0000 (13:43 +0100)]
last(1): Raise WARNS to 6.
Sepherosa Ziehau [Tue, 21 Dec 2010 08:26:34 +0000 (16:26 +0800)]
tcp: Don't allow persist timer if TCP connection is not established yet
This probably could move the un-updated snd_nxt panic earlier.
The dump of the panic in http://bugs.dragonflybsd.org/issue1939 shows
- snd_nxt is less than snd_una
- A persist timer was fired (frame 16, tp->tt_msg->tt_prev_tasks).
- The TCP segment that triggered the panic has SYN|ACK (frame 17, th->th_flags).
This TCP segment is considered as valid (frame 17, list), so tp->t_state
was SYN_SENT.
This explains why snd_nxt is less than snd_una:
If tcp_output() is called by the persist timer, then the persist timer is
active and "forced" is turned on, which causes snd_nxt not to be updated
at all.
MISSING LINK IN THE CHAIN:
How did the persist timer get set in SYN_SENT in the first place?
Hopefully the new panic will lift the veil...
Sascha Wildner [Tue, 21 Dec 2010 04:46:53 +0000 (05:46 +0100)]
Bring in mps(4) for LSI Fusion-MPT 2 Serial Attached SCSI controllers.
The driver should support the following controllers:
* LSI Logic SAS2004 (4 Port SAS)
* LSI Logic SAS2008 (8 Port SAS)
* LSI Logic SAS2108 (8 Port SAS)
* LSI Logic SAS2116 (16 Port SAS)
* LSI Logic SAS2208 (8 Port SAS)
Due to it still being in development (Integrated RAID isn't supported,
for example), it's only hooked into the module build and added to LINT.
The port hasn't received any testing at all other than making it build. But
it is known that Matt has such a controller. :-)
Thanks to FreeBSD from which this driver is taken.
Peter Avalos [Tue, 21 Dec 2010 00:32:55 +0000 (14:32 -1000)]
Fix buildworld for latest liblzma/libarchive changes.
Peter Avalos [Mon, 20 Dec 2010 20:23:19 +0000 (10:23 -1000)]
libarchive: link in liblzma.
Reported-by: dillon@
Sascha Wildner [Mon, 20 Dec 2010 12:30:29 +0000 (13:30 +0100)]
Fix VKERNEL/VKERNEL64 build.
Alex Hornung [Mon, 20 Dec 2010 07:11:48 +0000 (07:11 +0000)]
utmpx - Bring in ${foo}x.3 manpages from NetBSD
Matthew Dillon [Mon, 20 Dec 2010 05:52:36 +0000 (21:52 -0800)]
kernel - vm_page BUSY handling, change vm_page_cache() API, maybe fix seg-fault
* All vm_page_deactivate() calls now operate with the caller holding
the page PG_BUSY or the page known not to be busy. Reorder several
cases where a vm_page is unbusied prior to calling deactivate.
* vm_page_cache() now expects the vm_page to be PG_BUSY and will cache
the page and clear the bit.
* Fix a race in vm_pageout_page_free() which calls vm_object_reference()
with an unbusied vm_page, then proceeds to busy and free the page.
The problem is that vm_object_reference() can block on vmobj_token.
This may fix the x86-64 seg-fault issue. Or it may not (throws up hands).
* Remove incorrect KKASSERT which was causing bogus panics.
Matthew Dillon [Mon, 20 Dec 2010 01:24:29 +0000 (17:24 -0800)]
kernel - Implement POLLING support for if_igb, change token->lockmgr lock
* Clean up the polling code so it works.
* Use a lockmgr lock instead of a token, the original driver writer
expected a normal lock.
Matthew Dillon [Sun, 19 Dec 2010 19:17:36 +0000 (11:17 -0800)]
kernel - Optimize idle thread halt
* Count the number of times the idle thread is entered on a cpu without
switching to a non-idle thread. Use the fast-halt (non-ACPI) until the
count exceeds a reasonable machdep.cpu_idle_repeat.
This improves the default performance to levels closer to cpu_idle_hlt
mode 1 but still gives us the power savings from mode 3. Performance is
improved significantly because many threads on SMP boxes are event
or pipe oriented and only sleep for short periods of time or ping-pong
back and forth. For example, a cc -pipe, or typical kernel threads
blocking on tokens or locks for short periods of time.
* Adjust machdep.cpu_idle_hlt modes:
0 Never halt, the idle thread just spins.
1 Always use a fast HLT/MONITOR/MWAIT
2 Hybrid approach: use (1) up to a certain point, then use (3).
(this is the default)
3 Always use the ACPI halt
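The mode table above can be read as a small decision function. The sketch below is only an interpretation of the commit text: the enum, the function name, and the treatment of out-of-range mode values (falling back to the ACPI halt) are assumptions, and the real idle loop is not shown in this log.

```c
#include <assert.h>

/* Hypothetical labels for what the idle thread does next. */
enum idle_action_sketch {
	IDLE_SPIN,		/* mode 0: never halt */
	IDLE_FAST_HALT,		/* fast HLT/MONITOR/MWAIT */
	IDLE_ACPI_HALT		/* ACPI halt */
};

/*
 * cpu_idle_hlt:    the mode, as in machdep.cpu_idle_hlt
 * idle_count:      consecutive idle-thread entries without switching
 *                  to a non-idle thread
 * cpu_idle_repeat: the threshold, as in machdep.cpu_idle_repeat
 */
static enum idle_action_sketch
idle_pick_sketch(int cpu_idle_hlt, int idle_count, int cpu_idle_repeat)
{
	assert(idle_count >= 0 && cpu_idle_repeat >= 0);

	switch (cpu_idle_hlt) {
	case 0:
		return IDLE_SPIN;
	case 1:
		return IDLE_FAST_HALT;
	case 2:
		/* Hybrid: fast halt until the count exceeds the threshold. */
		return (idle_count > cpu_idle_repeat) ?
		    IDLE_ACPI_HALT : IDLE_FAST_HALT;
	default:
		/* Assumption: treat any other value like mode 3. */
		return IDLE_ACPI_HALT;
	}
}
```

The count would be reset whenever the cpu switches to a non-idle thread, so short sleeps keep using the fast halt and only long idle stretches pay for the ACPI halt.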
Matthew Dillon [Sun, 19 Dec 2010 17:25:17 +0000 (09:25 -0800)]
kernel - Add MONITOR/MWAIT support to the LWKT scheduler
* Adjust the FIFO contention resequencer (which deals with spinning
on tokens) to use MONITOR/MWAIT when available instead of DELAY(1)
when waiting to become the head of the queue.
* Adjust the x86-64 idle loop to use MONITOR/MWAIT when available when
the idle halt mode (machdep.cpu_idle_hlt) is set to 1. This
significantly improves performance for event-oriented programs, including
compile pipelines.
NOTE: On the 48-core monster setting machdep.cpu_idle_hlt to 1 improves
performance but at the cost of an additional 200W of power at idle vs
the default value of 2 (ACPI idle halt). Look for a hybrid approach in
a future commit.
Charlie [Sun, 19 Dec 2010 16:44:17 +0000 (11:44 -0500)]
Merge branch 'master' of git://git.dragonflybsd.org/dragonfly
Venkatesh Srinivas [Sun, 19 Dec 2010 16:39:46 +0000 (11:39 -0500)]
Interbench -- Do not compare strings with ==.
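As a reminder of why this matters in C: applying == to two char pointers compares addresses, not contents, so two buffers holding the same text can compare unequal. A minimal sketch of the content comparison follows; the helper name is hypothetical and is not taken from Interbench.

```c
#include <assert.h>
#include <string.h>

/* Compare the characters, not the addresses; returns nonzero on a match. */
static int
same_string_sketch(const char *a, const char *b)
{
	assert(a != NULL && b != NULL);
	return strcmp(a, b) == 0;
}
```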
Peter Avalos [Sun, 19 Dec 2010 11:46:45 +0000 (01:46 -1000)]
Update to libarchive-2.8.4.
This includes support for lzma and no longer relies on OpenSSL for hash
functions.
Peter Avalos [Sun, 19 Dec 2010 12:50:48 +0000 (02:50 -1000)]
Merge branch 'vendor/LIBARCHIVE' into HEAD
Peter Avalos [Sun, 19 Dec 2010 10:32:24 +0000 (00:32 -1000)]
Add SHA384 functions to libmd.
Sepherosa Ziehau [Sun, 19 Dec 2010 12:07:13 +0000 (20:07 +0800)]
jme: Rework software reset procedure
There are weird TX/RX clock synchronization issues during software reset.
To address these issues we have to disable and enable TX/RX clocks several
times according to JMicron's document.
These clock synchronization issues seem to affect JMC250C/JMC260C chips,
however, it is claimed that these issues affect all JMC250/JMC260 chips.
Thanks to JMicron for providing JMC250C samples and a detailed document.
Peter Avalos [Sun, 19 Dec 2010 06:26:34 +0000 (20:26 -1000)]
Hook up liblzma, lzmainfo, xz, and xzdec to the build.
Peter Avalos [Sun, 19 Dec 2010 07:58:21 +0000 (21:58 -1000)]
Merge branch 'vendor/XZ' into HEAD
Peter Avalos [Sun, 19 Dec 2010 01:16:19 +0000 (15:16 -1000)]
Rearrange lib/Makefile.
Use one line per subdir. This greatly improves readability.
Remove libraries we don't have any more from the comments.
Peter Avalos [Sun, 19 Dec 2010 00:24:26 +0000 (14:24 -1000)]
Import xz-5.0.0.
This is from the XZ Utils project: http://tukaani.org/xz/
Venkatesh Srinivas [Sat, 18 Dec 2010 23:28:08 +0000 (15:28 -0800)]
Rename cpu_mmw_pause(l) to cpu_mmw_pause_(int/long)
Venkatesh Srinivas [Sat, 18 Dec 2010 23:20:32 +0000 (15:20 -0800)]
Use xorq, not xorl, for RAX on x64.
Venkatesh Srinivas [Sat, 18 Dec 2010 23:18:52 +0000 (15:18 -0800)]
kernel -- Replace cpu_mmw_mwait with _pause, which doesn't loop. Correct bugs.
Venkatesh Srinivas [Sat, 18 Dec 2010 21:44:05 +0000 (13:44 -0800)]
kernel -- Monitor/Mwait routine for x64; (untested!)
Venkatesh Srinivas [Sat, 18 Dec 2010 21:43:13 +0000 (13:43 -0800)]
kernel -- Correct bug in i386 monitor / mwait routine and change to long type.
Venkatesh Srinivas [Sat, 18 Dec 2010 15:24:58 +0000 (07:24 -0800)]
kernel -- Add MONITOR and MWAIT routine to i386 kernel.
Provides cpu_mmw_spin() and cpu_mmw_mwait(), both of which wait for a given
memory cell to contain a value different from an expected value. _spin()
merely spins on the cell; _mwait() uses the SSE3 MONITOR/MWAIT instructions.
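The spinning variant described here amounts to re-reading a memory cell until its value differs from the expected one. The following is a portable userland sketch using C11 atomics, not the kernel's assembly routine; the function name and the int cell type are assumptions for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/*
 * Wait until *cell holds something other than 'expected' and return
 * the value that was seen.  Returns immediately if it already differs.
 */
static int
wait_for_change_sketch(atomic_int *cell, int expected)
{
	int seen;

	assert(cell != NULL);
	while ((seen = atomic_load_explicit(cell,
	    memory_order_acquire)) == expected) {
		/*
		 * Busy-wait.  A MONITOR/MWAIT variant would instead arm
		 * the monitor on the cell and let the cpu wait here
		 * until the cell's cache line is written.
		 */
	}
	return seen;
}
```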
Matthew Dillon [Sat, 18 Dec 2010 08:42:52 +0000 (00:42 -0800)]
kernel - scheduler adjustments for large ncpus / 48-core monster
* Change the LWKT scheduler's token spinning algorithm. It used to
DELAY a short period of time and then simply retry, creating a lot
of contention between cpus trying to acquire a token.
Now the LWKT scheduler uses a FIFO index mechanic to resequence the
contending cpus into 1uS retry slots using essentially just
atomic_fetchadd_int(), so it is very cache friendly. The spin-retry
thus has a bounded cache management traffic load regardless of
the number of cpus and contending cpus will not be tripping over
each other.
The new algorithm slightly regresses 4-cpu operation (~5% under heavy
contention) but significantly improves 48-cpu operation. It is also
flexible enough for further work down the road. The old algorithm
simply did not scale very well.
Add three sysctls:
sysctl lwkt.spin_method=1
0 Allow a user thread to be scheduled on a cpu while kernel
threads are contended on a token, using the IPI mechanic
to interrupt the user thread and reschedule on decontention.
This can potentially result in excessive IPI traffic.
1 Allow a user thread to be scheduled on a cpu while kernel
threads are contended on a token, reschedule on the next clock
tick (100 Hz typically). Decontention will NOT generate
any IPI traffic. DEFAULT.
2 Do not allow a user thread to be scheduled on a cpu while
kernel threads are contended. Should not be used normally,
for debugging only.
sysctl lwkt.spin_delay=1
Slot time in microseconds, default 1uS. Recommended values are
1 or 2 but not longer.
sysctl lwkt.spin_loops=10
Number of times the LWKT scheduler loops on contended threads
before giving up and allowing an idle-thread HLT. In order to
wake up from the HLT, decontention will cause an IPI, so you do
not want to set this value too small. Values between
10 and 100 are recommended.
* Redo the token decontention algorithm. Use a new gd_reqflags flag,
RQF_WAKEUP, coupled with RQF_AST_LWKT_RESCHED in the per-cpu globaldata
structure to determine what cpus actually need to be IPId on token
decontention (to wakeup their idle threads stuck in HLT).
This requires that all gd_reqflags operations use locked atomic
instructions rather than non-locked instructions.
* Decontention IPIs are a last-gasp effort if the LWKT scheduler has spun
too many times. Under normal conditions, even under heavy contention,
actual IPIing should be minimal.
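The FIFO index mechanic from the first item of this commit can be sketched roughly as follows. This is an interpretation only: the log says that contending cpus are resequenced into retry slots using essentially just an atomic fetch-add, but the structure, field names, and the exact delay computation below are assumptions, written with C11 atomics rather than the kernel's atomic_fetchadd_int().

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical sequencer state; the real LWKT fields are not in the log. */
struct fifo_seq_sketch {
	atomic_uint next_index;	/* next index handed to a contending cpu */
	atomic_uint head_index;	/* index currently at the head of the queue */
};

static void
fifo_seq_init(struct fifo_seq_sketch *q)
{
	atomic_init(&q->next_index, 0);
	atomic_init(&q->head_index, 0);
}

/* A contending cpu takes its place in line with a single fetch-add. */
static unsigned
fifo_seq_enter(struct fifo_seq_sketch *q)
{
	return atomic_fetch_add(&q->next_index, 1);
}

/* Delay before this cpu's retry slot: one slot per cpu queued ahead. */
static unsigned
fifo_seq_delay_us(struct fifo_seq_sketch *q, unsigned my_index,
    unsigned slot_us)
{
	unsigned head = atomic_load(&q->head_index);

	return (my_index - head) * slot_us;
}

/* The head cpu leaves the queue, moving everyone else one slot closer. */
static void
fifo_seq_leave(struct fifo_seq_sketch *q)
{
	assert(atomic_load(&q->head_index) != atomic_load(&q->next_index));
	atomic_fetch_add(&q->head_index, 1);
}
```

Because each contender touches the shared counters only once on entry and once on exit, the cache traffic stays bounded as the number of cpus grows, which matches the motivation given in the commit message. The slot_us argument corresponds to the role of lwkt.spin_delay.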
Matthew Dillon [Sat, 18 Dec 2010 08:35:13 +0000 (00:35 -0800)]
utilities - Print the cpu id for running and runnable threads
* The cpuid was only being printed for the currently running thread on
a cpu. Also print it for any scheduled (runnable) threads on that cpu
even if they aren't the currently running thread.
Matthew Dillon [Sat, 18 Dec 2010 08:34:27 +0000 (00:34 -0800)]
kernel - Fix M_NOWAIT's in e1000/if_igb driver
* M_NOWAITs should be M_INTWAITs. Fixes numerous boot-time problems
with the driver.
Venkatesh Srinivas [Sat, 18 Dec 2010 01:08:22 +0000 (17:08 -0800)]
kernel -- Remove fo_poll.
Venkatesh Srinivas [Fri, 17 Dec 2010 18:06:45 +0000 (10:06 -0800)]
Merge branch 'master' of /repository/git/dragonfly