dragonfly.git
20 years agoper-cpu tcbinfo[]s aren't ready for prime time yet. The tcbinfo is assigned
Matthew Dillon [Mon, 5 Apr 2004 17:47:01 +0000 (17:47 +0000)]
per-cpu tcbinfo[]s aren't ready for prime time yet.  The tcbinfo is assigned
at tcp_attach time, but there is insufficient information available at this
time to select the hash table and the wrong one gets assigned N-1 out of N
times on MP systems (N = number of cpus), causing outgoing tcp connections
to fail.

Add an option, TCP_DISTRIBUTED_TCBINFO, so MP-safe tcbinfo distribution can
continue to be developed without impacting users.

20 years agoWe are DragonFly not FreeBSD, so rename the name in GENERIC, and remove the
Eirik Nygaard [Mon, 5 Apr 2004 13:44:40 +0000 (13:44 +0000)]
We are DragonFly not FreeBSD, so rename the name in GENERIC, and remove the
reference to the local handbook, which we don't have.

20 years agoConsistently use "foreign" and "local", which are invariant on the
Jeffrey Hsu [Mon, 5 Apr 2004 09:17:48 +0000 (09:17 +0000)]
Consistently use "foreign" and "local", which are invariant on the
host machine, instead of "src" and "dst", which vary according
to whether a packet is being received or sent.

20 years agoRemove makewhatis from /usr/bin (it officially resides in /usr/sbin),
Matthew Dillon [Mon, 5 Apr 2004 05:41:43 +0000 (05:41 +0000)]
Remove makewhatis from /usr/bin (it officially resides in /usr/sbin),
remove /usr/sbin/prebind (no longer exists, see 'resident').

20 years agoPartial sync from FreeBSD adds some more support and fixes. Also replace a
Matthew Dillon [Mon, 5 Apr 2004 05:34:36 +0000 (05:34 +0000)]
Partial sync from FreeBSD adds some more support and fixes.  Also replace a
number of hardwired masks with appropriately defined constants.

20 years agoBring in FreeBSD 1.2.2.2. Properly unwind the stack when certain
Matthew Dillon [Mon, 5 Apr 2004 05:31:58 +0000 (05:31 +0000)]
Bring in FreeBSD 1.2.2.2.  Properly unwind the stack when certain
failure cases occur in rfork_thread().

Submitted-by: Igor Sysoev <is@rambler-co.ru>
20 years agoFix buildworld. Document TOOLS_PREFIX and USRDATA_PREFIX, improve INCLUDEDIR
Matthew Dillon [Mon, 5 Apr 2004 05:30:13 +0000 (05:30 +0000)]
Fix buildworld.  Document TOOLS_PREFIX and USRDATA_PREFIX, improve INCLUDEDIR
documentation.  Modify bsd.incs.mk to not install header files if BOOTSTRAPPING
is set (for buildworld), and change lex to install its C++ header file in
${INCLUDEDIR}/c++ instead of ${INCLUDEDIR}/g++.  Set DESTDIR for BMAKEENV,
set BOOTSTRAPPING for XMAKE (cross build tools).  Note that DESTDIR is set
in the bootstrap-tools: target, this will be removed in a later commit.

20 years agoUndo the last commit. Utility programs which install c++ includes have no
Matthew Dillon [Mon, 5 Apr 2004 02:03:24 +0000 (02:03 +0000)]
Undo the last commit.  Utility programs which install c++ includes have no
knowledge and should have no knowledge of particular compiler versions
installed.  They should install their C++ header files in one place.

20 years agoQuake 3 server (running under linux emulation) was failing with odd '
Matthew Dillon [Mon, 5 Apr 2004 00:06:02 +0000 (00:06 +0000)]
Quake 3 server (running under linux emulation) was failing with odd '
Protocol not available' errors.  The problem turned out to be the internal
IP_HDRINCL check that the linux emulation code in the kernel was doing in
linux_sendto().  If the internal check fails with an error, the emulation
code should simply assume that IP_HDRINCL is off rather than return the error.

The bug was introduced during the syscall separation work on this module.
FreeBSD-4.x properly ignores the error.  This patch restores behavior for
DFly.

Reported-by: Sascha Wildner <saw@online.de>
20 years agoFix a missing wildcard binding in the recent wildcard binding hash table work.
Matthew Dillon [Sun, 4 Apr 2004 22:13:38 +0000 (22:13 +0000)]
Fix a missing wildcard binding in the recent wildcard binding hash table work.
This prevented YP from working properly.

Reported-by: Richard Nyberg <rnyberg@it.su.se>
Patch-Supplied-by: Jeffrey Hsu <hsu@freebsd.org>
20 years agoCorrect C++ header handling for gcc2 and lex.
Joerg Sonnenberger [Sun, 4 Apr 2004 21:31:14 +0000 (21:31 +0000)]
Correct C++ header handling for gcc2 and lex.

gcc2 used a "beforeinstall" target instead of the standard bsd.incs.mk way.
Therefore certain headers weren't correctly installed when doing an
installincludes or "make includes" from the src root. The cc1plus part was
still installed to the old location and that broke e.g. textproc/jade.

lex still installed its C++ interface to /usr/include/g++.  Until a decision
about a generic C++ header location is made, a version for both system
compilers is installed.

20 years agoSetting the date/time does not always properly write-back the RTC, causing
Matthew Dillon [Sun, 4 Apr 2004 08:00:06 +0000 (08:00 +0000)]
Setting the date/time does not always properly write-back the RTC, causing
the date/time to be wrong again after a reboot.  This was due to the recent
systimer changes which updated the 'time_second' global via hardclock() only.
Change the writeback code to use microtime() instead of time_second.

Reported-by: esmith <esmith@patmedia.net>
20 years agoPerl is no longer needed by buildworld/buildkernel.
Matthew Dillon [Sun, 4 Apr 2004 01:08:18 +0000 (01:08 +0000)]
Perl is no longer needed by buildworld/buildkernel.

Submitted-by: YONETANI Tomokazu <qhwt+dragonfly-kernel@les.ath.cx>
20 years agoCleanup NXENV so it works properly when running buildworld from FreeBSD.
Matthew Dillon [Sat, 3 Apr 2004 23:07:14 +0000 (23:07 +0000)]
Cleanup NXENV so it works properly when running buildworld from FreeBSD.

20 years agoDispatch reassembled fragment.
Jeffrey Hsu [Sat, 3 Apr 2004 22:18:30 +0000 (22:18 +0000)]
Dispatch reassembled fragment.

20 years agoFix byte-order.
Jeffrey Hsu [Sat, 3 Apr 2004 22:17:59 +0000 (22:17 +0000)]
Fix byte-order.

20 years agoCreate a normal stack frame in generic_bcopy() to aid debugging, so
Matthew Dillon [Sat, 3 Apr 2004 08:21:16 +0000 (08:21 +0000)]
Create a normal stack frame in generic_bcopy() to aid debugging, so
backtraces work properly.

20 years agoFix bugs in xio_copy_*(). We were not using the masked offset when
Matthew Dillon [Sat, 3 Apr 2004 08:20:10 +0000 (08:20 +0000)]
Fix bugs in xio_copy_*().  We were not using the masked offset when
calculating the number of bytes to copy from the first indexed page,
leading to a negative 'n' calculation in situations that could be
triggered with a ^C on programs using pipes (such as a buildworld).
This almost universally resulted in a panic.
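
A minimal sketch of the first-page calculation described above, not the
actual xio_copy_*() code.  The page size, the macro names and the helper
first_page_bytes() are assumptions made for illustration.  Masking the
offset keeps the in-page position below the page size, so the byte count
for the first page can never go negative:

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u                    /* assumed page size */
#define SKETCH_PAGE_MASK (SKETCH_PAGE_SIZE - 1u)

/*
 * Number of bytes that can be copied from the first indexed page,
 * given a byte offset into the page list and the bytes still wanted.
 * Hypothetical helper; the real function names and types differ.
 */
static size_t
first_page_bytes(size_t offset, size_t wanted)
{
	size_t in_page = offset & SKETCH_PAGE_MASK;   /* masked offset */
	size_t n = SKETCH_PAGE_SIZE - in_page;        /* 1..PAGE_SIZE */

	assert(n >= 1 && n <= SKETCH_PAGE_SIZE);
	return (n < wanted) ? n : wanted;
}
```

Using the unmasked offset in the subtraction is the kind of mistake that
yields a negative count once the offset exceeds one page.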

20 years agoAdd `device atapicam' to unbreak TINDERBOX config.
Hiten Pandya [Sat, 3 Apr 2004 07:14:08 +0000 (07:14 +0000)]
Add `device atapicam' to unbreak TINDERBOX config.

20 years agoIn the sysclock commit I tried to make 'boottime' a fixed value, but it
Matthew Dillon [Sat, 3 Apr 2004 05:30:10 +0000 (05:30 +0000)]
In the sysclock commit I tried to make 'boottime' a fixed value, but it
ended up being set to the superblock update time (time of last shutdown)
rather than the real time clock during boot.

Give up on making it a fixed value and just set it to the current time
minus the compensated elapsed time (gd->gd_time_seconds) whenever the
time of day is stepped.  Subsystems which use boottime as an identifier,
such as NFS, already copy it and this change effectively returns boottime
operation to its pre-sysclock algorithm.

Reported-by: YONETANI Tomokazu <qhwt+dragonfly-bugs@les.ath.cx> and others
20 years agoAllocate the DMA segment array in bus_dma_tag_create instead of using a
Joerg Sonnenberger [Fri, 2 Apr 2004 18:16:45 +0000 (18:16 +0000)]
Allocate the DMA segment array in bus_dma_tag_create instead of using a
local variable in bus_dmamap_create et al.

20 years agoCorrect the commented-out example for MODULES_OVERRIDE.
Matthew Dillon [Fri, 2 Apr 2004 18:09:38 +0000 (18:09 +0000)]
Correct the commented-out example for MODULES_OVERRIDE.

Submitted-by: Dheeraj Reddy <dheerajs@comcast.net>
20 years agoGarbage-collect unused variable.
Hiten Pandya [Fri, 2 Apr 2004 12:45:40 +0000 (12:45 +0000)]
Garbage-collect unused variable.

20 years agoAdapt the netisr message handlers to accommodate the available error
Hiten Pandya [Fri, 2 Apr 2004 12:32:27 +0000 (12:32 +0000)]
Adapt the netisr message handlers to accommodate the available error
handling facility by returning an appropriate error value.

While I am there, move pppintr() to top of file to simplify things.

This commit is a followup to: rev. 1.2 of src/sys/kern/uipc_msg.c

20 years agoAdd Makefile for the netif/ie ISA NIC driver.
Hiten Pandya [Fri, 2 Apr 2004 11:31:27 +0000 (11:31 +0000)]
Add Makefile for the netif/ie ISA NIC driver.

20 years agoThe globaldata houses a pointer and not an embedded struct for nchstats;
Hiten Pandya [Fri, 2 Apr 2004 10:50:23 +0000 (10:50 +0000)]
The globaldata houses a pointer and not an embedded struct for nchstats;
use the correct access method.

Fixes build of ext2fs kernel module.

20 years agoMake buildkernel's require a buildworld to be done first, because they
Matthew Dillon [Fri, 2 Apr 2004 06:21:36 +0000 (06:21 +0000)]
Make buildkernel's require a buildworld to be done first, because they
no longer munge the paths to use native apps when buildworld tools aren't
available.

Buildkernel now tells you this and exits if it doesn't think you've done
a buildworld.

Add a new target, 'nativekernel', which just runs config and uses native
tools to build the kernel.  'nativekernel' and 'buildkernel' use the same
object directory but are mutually exclusive.  If you run one, then try to run
the other, it will wipe the directory and start over.

20 years agoPer-CPU VFS Namecache Effectiveness Statistics:
Hiten Pandya [Fri, 2 Apr 2004 05:46:03 +0000 (05:46 +0000)]
Per-CPU VFS Namecache Effectiveness Statistics:

* Convert nchstats into a CPU indexed array

* Export the per-CPU nchstats as a sysctl vfs.cache.nchstats
  and let user-land aggregate them.

* Add a function called kvm_nch_cpuagg() to libkvm; it is
  shared by systat(1) and vmstat(1) and the ncache-stats test
  program.  As the function name suggests, it aggregates
  the per-CPU nchstats.

* Move struct nchstats into a separate header to avoid
  header file namespace pollution; sys/nchstats.h.

* Keep a cached copy of the globaldata pointer in the VFS
  specific LOOKUP op, and use that to increment the
  namecache effectiveness counters (nchstats).

* Modify systat(1) and vmstat(1) to accommodate the new
  behavior of accessing nchstats.  Remove a (now) redundant
  sysctl to get the cpu count (hw.ncpu), instead we just divide
  the total length of the nchstats array returned by sysctl
  by sizeof(struct nchstats) to get the CPU count.

* Garbage-collect unused variables and fix nearby warnings
  in systat(1) and vmstat(1).

* Add a very-cool test program, that prints the nchstats
  per-CPU statistics to show CPU distribution.  Here is the
  output it generates on a 2-processor SMP machine:

  gray# ncache-stats
  VFS Name Cache Effectiveness Statistics
     4207370 total name lookups
  COUNTER             CPU-1       CPU-2           TOTAL
  goodhits            2477657     1060677         (3538334  )
  neghits             107531      47294           (154825   )
  badhits             28968       7720            (36688    )
  falsehits           0           0               (0        )
  misses              339671      137852          (477523   )
  longnames           0           0               (0        )
  passes 2            13104       6813            (19917    )
  2-passes            25134       15257           (40391    )

The SMP machine used for testing this commit was proudly presented
by David Rhodus <drhodus@dragonflybsd.org>.

Reviewed-by: Matthew Dillon <dillon@backplane.com>
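
A rough userland sketch of the aggregation idea described in this commit,
assuming a reduced stand-in for struct nchstats.  The struct nch_sketch
type and both helper names are hypothetical; this is not the
kvm_nch_cpuagg() implementation.  It shows the CPU count being derived
from the byte length of the returned array and the per-CPU counters being
summed:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced stand-in for struct nchstats. */
struct nch_sketch {
	long goodhits;
	long neghits;
	long misses;
};

/* Derive the CPU count from the byte length of the per-CPU array. */
static size_t
nch_cpu_count(size_t len)
{
	assert(len % sizeof(struct nch_sketch) == 0);
	return len / sizeof(struct nch_sketch);
}

/* Sum the per-CPU counters into a single total. */
static struct nch_sketch
nch_aggregate(const struct nch_sketch *percpu, size_t ncpus)
{
	struct nch_sketch total = { 0, 0, 0 };
	size_t i;

	for (i = 0; i < ncpus; i++) {
		total.goodhits += percpu[i].goodhits;
		total.neghits += percpu[i].neghits;
		total.misses += percpu[i].misses;
	}
	return total;
}
```

Fed the two CPU columns from the sample output above, the summed
goodhits, neghits and misses match the TOTAL column.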

20 years agoConsolidate length checks in ip_demux().
Jeffrey Hsu [Thu, 1 Apr 2004 23:04:50 +0000 (23:04 +0000)]
Consolidate length checks in ip_demux().

20 years agoStyle(9) cleanup to src/sys/vfs, stage 3/21: fdesc.
Chris Pressey [Thu, 1 Apr 2004 19:08:15 +0000 (19:08 +0000)]
Style(9) cleanup to src/sys/vfs, stage 3/21: fdesc.

- Convert K&R-style function definitions to ANSI style.

Submitted-by: Andre Nathan <andre@digirati.com.br>
Additional-reformatting-by: cpressey
20 years agoEnhance the pmap_kenter*() API and friends, separating out entries which
Matthew Dillon [Thu, 1 Apr 2004 17:58:08 +0000 (17:58 +0000)]
Enhance the pmap_kenter*() API and friends, separating out entries which
only need invalidation on the local cpu against entries which need invalidation
across the entire system, and provide a synchronization abstraction.

Enhance sf_buf_alloc() and friends to allow the caller to specify whether the
sf_buf's kernel mapping is going to be used on just the current cpu or
whether it needs to be valid across all cpus.  This is done by maintaining
a cpumask of known-synchronized cpus in the struct sf_buf.

Optimize sf_buf_alloc() and friends by removing both TAILQ operations in the
critical path.  TAILQ operations to remove the sf_buf from the free queue
are now done in a lazy fashion.  Most sf_buf operations allocate a buf,
work on it, and free it, so why waste time moving the sf_buf off the freelist
if we are only going to move it back onto the free list a microsecond later?

Fix a bug in sf_buf_alloc() code as it was being used by the PIPE code.
sf_buf_alloc() was unconditionally using PCATCH in its tsleep() call, which
is only correct when called from the sendfile() interface.

Optimize the PIPE code to require only local cpu_invlpg()'s when mapping
sf_buf's, greatly reducing the number of IPIs required.  On a DELL-2550,
a pipe test which explicitly blows out the sf_buf caching by using huge
buffers improves from 350 to 550 MBytes/sec.  However, note that buildworld
times were not found to have changed.

Replace the PIPE code's custom 'struct pipemapping' structure with a
struct xio and use the XIO API functions rather than its own.

20 years agoFix an unused variable warning (non-operational).
Matthew Dillon [Thu, 1 Apr 2004 17:41:19 +0000 (17:41 +0000)]
Fix an unused variable warning (non-operational).

20 years agoImplement a convenient gd_cpumask so we don't have to do 1 << gd->gd_cpuid
Matthew Dillon [Thu, 1 Apr 2004 17:40:59 +0000 (17:40 +0000)]
Implement a convenient gd_cpumask so we don't have to do 1 << gd->gd_cpuid
all the time.
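
As an illustration only, caching the mask next to the cpu id might look
like the sketch below.  The struct, the mask typedef and the init helper
are hypothetical stand-ins, and the 32-bit mask width is an assumption;
the real globaldata layout is not shown in this log:

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t cpumask_sketch_t;        /* assumed 32-bit cpu mask */

/* Hypothetical, reduced stand-in for the per-cpu globaldata. */
struct globaldata_sketch {
	int gd_cpuid;
	cpumask_sketch_t gd_cpumask;          /* cached 1 << gd_cpuid */
};

/* Compute the mask once at setup instead of at every use. */
static void
gd_sketch_init(struct globaldata_sketch *gd, int cpuid)
{
	assert(cpuid >= 0 && cpuid < 32);
	gd->gd_cpuid = cpuid;
	gd->gd_cpumask = (cpumask_sketch_t)1 << cpuid;
}
```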

20 years agoConvert sis(4) from vtophys to busdma.
Joerg Sonnenberger [Thu, 1 Apr 2004 16:24:57 +0000 (16:24 +0000)]
Convert sis(4) from vtophys to busdma.

Obtained-from: FreeBSD 5

20 years agoKObj extension stage II/III
Joerg Sonnenberger [Thu, 1 Apr 2004 13:50:47 +0000 (13:50 +0000)]
KObj extension stage II/III

Tokenize kobj to make it SMP safe. This is based on the assumption that
drivers are responsible for not removing active devices. This allows us
to avoid all locks / critical sections for method lookup and object
instantiation / uninstantiation, leaving only the class management.

20 years agoKObj extension stage I/III
Joerg Sonnenberger [Thu, 1 Apr 2004 08:41:24 +0000 (08:41 +0000)]
KObj extension stage I/III

Isolate the reference counting for kobj classes in special functions to
allow clean locking in the next step.

Merge all calls of kobj_class_compile either into the new
kobj_class_instantiate or into kobj_init and make it static. Same for
kobj_class_free.

Remove kobj_class_compile_static, it wasn't used and is pretty pointless
since the kobj framework is not used before the VM subsystem has been
initialized.

20 years agoRemove struct driver and make driver_t directly defined as kobj_class.
Joerg Sonnenberger [Thu, 1 Apr 2004 07:33:18 +0000 (07:33 +0000)]
Remove struct driver and make driver_t directly defined as kobj_class.
The additional *priv field is only used by the ISA/PCI compat shims and
those can use a local struct instead.

20 years agoAdd the "struct ucred *" argument to the remaining nic ioctls in LINT.
Joerg Sonnenberger [Thu, 1 Apr 2004 07:27:17 +0000 (07:27 +0000)]
Add the "struct ucred *" argument to the remaining nic ioctls in LINT.

20 years agoFix warning about unused variable
Joerg Sonnenberger [Thu, 1 Apr 2004 06:52:45 +0000 (06:52 +0000)]
Fix warning about unused variable

20 years agoRemove unused obsolete drivers.
Joerg Sonnenberger [Thu, 1 Apr 2004 06:23:18 +0000 (06:23 +0000)]
Remove unused obsolete drivers.

20 years agoGet rid of the upper-end malloc() limit for the pipe throughput test.
Matthew Dillon [Thu, 1 Apr 2004 01:47:44 +0000 (01:47 +0000)]
Get rid of the upper-end malloc() limit for the pipe throughput test.

20 years agoRemove the ip_mthread_enable sysctl option. Enough code has been converted
Jeffrey Hsu [Thu, 1 Apr 2004 01:38:53 +0000 (01:38 +0000)]
Remove the ip_mthread_enable sysctl option.  Enough code has been converted
over to threads and message-passing that true dispatching is required for
proper synchronization.

Approved by: Matt Dillon

20 years agoStyle(9) cleanup.
Chris Pressey [Wed, 31 Mar 2004 23:20:22 +0000 (23:20 +0000)]
Style(9) cleanup.

- Convert K&R-style function definitions to ANSI style.
- Remove `register' keywords.
- Remove casts to void when ignoring return values.
- Remove explicit `return' at end of void functions.
- Additional minor whitespace and formatting adjustments.
- No functional changes.

20 years agoStyle(9) cleanup to src/sys/vfs, stage 2/21: deadfs.
Chris Pressey [Wed, 31 Mar 2004 23:13:43 +0000 (23:13 +0000)]
Style(9) cleanup to src/sys/vfs, stage 2/21: deadfs.

- Convert K&R-style function definitions to ANSI style.

Submitted-by: Andre Nathan <andre@digirati.com.br>
Additional-reformatting-by: cpressey
20 years agoAdd missing sf_buf_free()'s.
Matthew Dillon [Wed, 31 Mar 2004 22:08:32 +0000 (22:08 +0000)]
Add missing sf_buf_free()'s.

Reported-by: Jonathan Lemon <jlemon@flugsvamp.com>
20 years agoCorrect type slippage in previous commit: a u_int was accidentally
Chris Pressey [Wed, 31 Mar 2004 21:03:38 +0000 (21:03 +0000)]
Correct type slippage in previous commit: a u_int was accidentally
turned into a u_long.  Change it back.

20 years agoCleanup the forking behavior of the CAPS client test program.
Matthew Dillon [Wed, 31 Mar 2004 20:27:34 +0000 (20:27 +0000)]
Cleanup the forking behavior of the CAPS client test program.

20 years agoAllow the child priority (receive side of the pipe test) to be specified
Matthew Dillon [Wed, 31 Mar 2004 20:27:09 +0000 (20:27 +0000)]
Allow the child priority (receive side of the pipe test) to be specified
on the command line.  Default it to be the same as the parent.

20 years agoClarify the purpose of liby:
Chris Pressey [Wed, 31 Mar 2004 20:25:37 +0000 (20:25 +0000)]
Clarify the purpose of liby:

- Mention it in the FILES section of the yacc(1) man page.
- Create an MLINK from liby.3 -> yacc.1 so users can `man liby'.

Approved-by: dillon
20 years agoCleanup libcaps to support recent LWKT changes. Add TDF_SYSTHREAD back
Matthew Dillon [Wed, 31 Mar 2004 20:23:42 +0000 (20:23 +0000)]
Cleanup libcaps to support recent LWKT changes.  Add TDF_SYSTHREAD back
to sys/thread.h (libcaps needs it).

20 years agoRemove `-ly' and `${LIBY}' from our Makefiles. Linking to liby is not
Chris Pressey [Wed, 31 Mar 2004 20:22:14 +0000 (20:22 +0000)]
Remove `-ly' and `${LIBY}' from our Makefiles.  Linking to liby is not
necessary for any of our programs, as they all supply their own main()
and yyerror() functions.

Also add $DragonFly$ to these files as needed for the commit.

Approved-by: dillon
20 years agoTrash the vmspace_copy() hacks that CAPS was previously using. No other
Matthew Dillon [Wed, 31 Mar 2004 19:29:26 +0000 (19:29 +0000)]
Trash the vmspace_copy() hacks that CAPS was previously using.  No other
subsystem uses these hacks and the new XIO mechanism is far, far superior.

20 years agoChange CAPS over to use XIO instead of the vmspace_copy() junk it was using
Matthew Dillon [Wed, 31 Mar 2004 19:28:29 +0000 (19:28 +0000)]
Change CAPS over to use XIO instead of the vmspace_copy() junk it was using
before.  This almost doubles CAPS IPC messaging performance.

Also correct a number of memory leaks due to incorrect reference counting.

20 years agoHook XIO up to the kernel build.
Matthew Dillon [Wed, 31 Mar 2004 19:24:28 +0000 (19:24 +0000)]
Hook XIO up to the kernel build.

20 years agoInitial XIO implementation. XIOs represent data through a list of VM pages
Matthew Dillon [Wed, 31 Mar 2004 19:24:17 +0000 (19:24 +0000)]
Initial XIO implementation.  XIOs represent data through a list of VM pages
rather than mapped KVM, allowing them to be passed between threads without
having to worry about KVM mapping overheads, TLB invalidation, and so forth.

This initial implementation supports creating XIOs from user or kernel data
and copying from an XIO to a user or kernel buffer or a uio.  XIOs are intended
to be used with CAPS, PIPES, VFS, DEV, and other I/O paths.

The XIO concept is an outgrowth of Alan Cox's unique use of target-side
SF_BUF mapping to improve pipe performance.

20 years agoM_NOWAIT => M_INTWAIT conversion. These subsystems are way too crucial to
Joerg Sonnenberger [Wed, 31 Mar 2004 16:39:20 +0000 (16:39 +0000)]
M_NOWAIT => M_INTWAIT conversion. These subsystems are way too crucial to
have failing memory allocations. At least some of these are handled via
panic anyway.

20 years agoThe existing hash algorithm in bufhash() does not distribute entries
David Rhodus [Wed, 31 Mar 2004 15:32:53 +0000 (15:32 +0000)]
The existing hash algorithm in bufhash() does not distribute entries
very well across buckets, especially in the case of cylinder group blocks
which are located at a sequence of locations that are a multiple of a large
power of two apart.  In the case of large file systems, one or possibly
a few of the hash chains can get excessively long.  Replace the existing
hash algorithm with a variation on the Fibonacci hash.

Merged from FreeBSD
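
For readers unfamiliar with the technique, here is a generic sketch of
Fibonacci (multiplicative) hashing.  It is not the bufhash() code from
this commit: the function name, the 32-bit key width and the choice of
constant are assumptions.  The key is multiplied by 2^32 divided by the
golden ratio and the top bits of the product select the bucket, which
spreads keys that are a large power of two apart:

```c
#include <assert.h>
#include <stdint.h>

#define FIB_MULT 2654435769u    /* floor(2^32 / golden ratio) */

/*
 * Map a 32-bit key to a bucket index in [0, 2^bits).
 * Hypothetical helper illustrating the general technique.
 */
static uint32_t
fib_hash(uint32_t key, unsigned bits)
{
	assert(bits >= 1 && bits <= 32);
	/* Unsigned multiply wraps mod 2^32; keep the high 'bits' bits. */
	return (uint32_t)(key * FIB_MULT) >> (32 - bits);
}
```

A mask-based hash that keeps only the low bits would send keys 0 and
65536 to the same bucket of a 1024-entry table; taking the high bits of
the product separates them.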

20 years agoOnly enter into wildcard hash table if bind succeeds.
Jeffrey Hsu [Wed, 31 Mar 2004 10:23:10 +0000 (10:23 +0000)]
Only enter into wildcard hash table if bind succeeds.

20 years agoOnly enter into wildcard hash table if bind succeeds.
Jeffrey Hsu [Wed, 31 Mar 2004 07:21:38 +0000 (07:21 +0000)]
Only enter into wildcard hash table if bind succeeds.

20 years agoStyle(9) cleanup to src/sys/vfs, stage 1/21: coda.
Chris Pressey [Wed, 31 Mar 2004 02:34:37 +0000 (02:34 +0000)]
Style(9) cleanup to src/sys/vfs, stage 1/21: coda.

- Convert K&R-style function definitions to ANSI style.

Submitted-by: Andre Nathan <andre@digirati.com.br>
Minor-whitespace-tweaks-by: cpressey
20 years agoOnly enter wildcard sockets into the wildcard hash table.
Jeffrey Hsu [Wed, 31 Mar 2004 00:43:09 +0000 (00:43 +0000)]
Only enter wildcard sockets into the wildcard hash table.

20 years agoSecond major scheduler patch. This corrects interactive issues that were
Matthew Dillon [Tue, 30 Mar 2004 19:14:18 +0000 (19:14 +0000)]
Second major scheduler patch.  This corrects interactive issues that were
introduced in the pipe sf_buf patch.

Split need_resched() into need_user_resched() and need_lwkt_resched().
Userland reschedules are requested when a process is scheduled with a higher
priority than the currently running process, and LWKT reschedules are
requested when a thread is scheduled with a higher priority than the
currently running thread.  As before, these are ASTs; LWKTs are not
preemptively switched while running in the kernel.

Exclusively use the resched wanted flags to determine whether to reschedule
or call lwkt_switch() upon return to user mode.  We were previously also
testing the LWKT run queue for higher priority threads, but this was causing
inefficient scheduler interactions when two processes are doing tightly
bound synchronous IPC (e.g. using PIPEs) because in DragonFly the LWKT
priority of a thread is raised when it enters the kernel, and lowered when
it tries to return to userland.  The wakeups occurring in the pipe code
were causing extra quick-flip thread switches.

Introduce a new tsleep() flag which disables the need_lwkt_resched() call
when the sleeping thread is woken up.   This is used by the PIPE code in
the synchronous direct-write PIPE case to avoid the above problem.

Redocument and revamp the ESTCPU code.  The original changes reduced the
interrupt rate from 100Hz (FBsd-4 and FBsd-5) to 20Hz, but did not compensate
for the slower ramp-up time.  This commit introduces a 'virtual' ESTCPU
frequency which compensates without us having to bump up the actual systimer
interrupt rate.

Redo the P_CURPROC methodology, which is used by the userland scheduler
to manage processes running in userland.  Create a globaldata->gd_uschedcp
process pointer which represents the current running-in-userland (or about
to be running in userland) process, and carefully recode acquire_curproc()
to allow this gd_uschedcp designation to be stolen from other threads trying
to return to userland without having to request a reschedule (which would
have to switch back to those threads to release the designation).  This
reduces the number of unnecessary context switches that occur due to
scheduler interactions.  Also note that this specifically solves the case
where there might be several threads running in the kernel which are trying
to return to userland at the same time.  A heuristic check against gd_upri
is used to select the correct thread for scheduling to userland 'most of the
time'.  When the correct thread is not selected, we fall back to the old
behavior of forcing a reschedule.

Add debugging sysctl variables to better track userland scheduler efficiency.

With these changes pipe statistics are further improved.  Though some
scheduling aberrations still exist(1), the previous scheduler had totally
broken interactive processes and this one does not.

BLKSIZE  BEFORE    NEWPIPE    NOW            Tests on AMD64
         MBytes/s  MBytes/s   MBytes/s       3200+ FN85MB
                                             (64KB L1, 1MB L2)
256KB    1900      2200       2250
 64KB    1800      2200       2250
 32KB    -         -          3300
 16KB    1650      2500-3000  2600-3200
  8KB    1400      2300       2000-2400(1)
  4KB    1300      1400-1500  1500-1700
20 years agoAdd SI_SUB_LOCK as sysinit priority for the initialisation of tokens and
Joerg Sonnenberger [Tue, 30 Mar 2004 17:18:58 +0000 (17:18 +0000)]
Add SI_SUB_LOCK as sysinit priority for the initialisation of tokens and
lockmgr locks. This priority should not be abused, since it is higher than
SI_SUB_VM.

20 years agoStyle(9) cleanup.
Chris Pressey [Tue, 30 Mar 2004 02:59:00 +0000 (02:59 +0000)]
Style(9) cleanup.

- Convert K&R-style function definitions to ANSI style.
- Remove ``register'' keywords.
- Remove casts to (void) when ignoring return values.
- Sort #include's.
- Minor whitespace adjustments.
- No functional changes.

20 years agoStyle(9) cleanup.
Chris Pressey [Tue, 30 Mar 2004 02:30:59 +0000 (02:30 +0000)]
Style(9) cleanup.

- Sort #include's.
- Convert to fully ANSI function definitions: use (void) instead of ()
- No functional changes.

20 years agoStyle(9) cleanup.
Chris Pressey [Tue, 30 Mar 2004 01:14:22 +0000 (01:14 +0000)]
Style(9) cleanup.

- Convert K&R-style function definitions to ANSI style.
- Remove ``register'' keywords.
- char *argv[] -> char **argv
- No functional changes.

20 years agoProtect the mntvnode scan for coda with the proper token. Since we do not
Matthew Dillon [Mon, 29 Mar 2004 20:52:17 +0000 (20:52 +0000)]
Protect the mntvnode scan for coda with the proper token.  Since we do not
block we do not need to use the vmntvnodescan() facility (as long as we
take the appropriate precautions), and we can remove the v_mount != mp
test.

20 years agoCount vnodes held on the mount list simply by using the
Matthew Dillon [Mon, 29 Mar 2004 20:43:52 +0000 (20:43 +0000)]
Count vnodes held on the mount list simply by using the
mp->mnt_nvnodelistsize field, instead of physically looping on the
vnode list.

Suggested-by: someone, not sure. Hiten or David maybe.
20 years agoMove vm_fault_quick() out from the machine specific location
David Rhodus [Mon, 29 Mar 2004 17:30:23 +0000 (17:30 +0000)]
Move vm_fault_quick() out from the machine specific location
as the function is now cpu agnostic.

20 years agoMake sure the ELF header size is not too large. This fixes a potential over
David Rhodus [Mon, 29 Mar 2004 17:17:09 +0000 (17:17 +0000)]
Make sure the ELF header size is not too large. This fixes a potential
overflow that could happen in a number of places. In DragonFly we rely on
the ELF header being in the first page. Though the ABI specification does
not require this, it is always true in practice.

Glanced at FreeBSD but found it to be incomplete. Possibly more bounds
checking is needed for other things here, though further investigation is
needed first.

20 years agoUDF was not properly cleaning up getblk'd buffers in the face of error
Matthew Dillon [Mon, 29 Mar 2004 16:38:36 +0000 (16:38 +0000)]
UDF was not properly cleaning up getblk'd buffers in the face of error
conditions.  In some places it was assuming that getblk() would not
return a buffer on error, but in fact getblk() generally always returns
a buffer whether an error occurs or not (and always on an I/O error).

Reported-by: David Rhodus <drhodus@crater.dragonflybsd.org>
20 years agoBring in a bunch of well tested MPIPE changes. Preallocate a minimum
Matthew Dillon [Mon, 29 Mar 2004 16:22:23 +0000 (16:22 +0000)]
Bring in a bunch of well tested MPIPE changes.  Preallocate a minimum
number of mpipe elements when it is initialized.  Use an array to cache
free MPIPE buffers and remove the data structure overloading that was
previously occurring on the buffer itself.  Add a deconstructor.  Separate
the blocking and non-blocking allocation APIs into their own functions.

The new code still needs Giant, but it's getting a lot closer to being
lock free.

20 years agoGenerally bring in additional sf_buf improvements from FreeBSD-5. Separate
Matthew Dillon [Mon, 29 Mar 2004 15:46:21 +0000 (15:46 +0000)]
Generally bring in additional sf_buf improvements from FreeBSD-5.  Separate
the wiring used by sendfile into its own mbuf_ext support code and remove it
from the sf_buf code.  Alan Cox's uiomove_fromphys() was expecting to use
the cleaner version of sf_buf.  This fixes a long standing bug related to
multiple mbuf refs in the sendfile() code and also fixes recent bugs
introduced to the PIPE code from the importation of uiomove_fromphys() (due
to differences in the sf_buf API).  The sf_buf API is now more normalized
towards FBSD-5.

Note that the mbuf_ext API has not changed, and is very different from
FBSD-5 in regards to handling multiple references.  Introduce some temporary
hacks to sf_buf to get around this, which will be pulled when the
mbuf_ext API is updated later on.

20 years ago* Change the offset alignment in vn_rdwr_inchunks()
David Rhodus [Mon, 29 Mar 2004 15:21:42 +0000 (15:21 +0000)]
* Change the offset alignment in vn_rdwr_inchunks()
This is primarily used by the ELF image activator.

FreeBSD src repository

  Modified files:
    sys/kern             vfs_vnops.c
  Log:
  Align the offset in vn_rdwr_inchunks() so that at most the first and
  the last chunk are misaligned relative to a MAXBSIZE byte boundary.
  vn_rdwr_inchunks() is used mainly for elf core dumps, and elf sections
  are usually perfectly misaligned relative to MAXBSIZE, and chunking
  prevents the file system from doing much realigning.

  This gives a surprisingly large speedup for core dumps -- from 50 to
  13 seconds for a 512MB core dump here.  The pessimization was mostly
  from an interaction of the misalignment with IO_DIRECT.  It increased
  the number of i/o's for each chunk by a factor of 5 (3 writes and 2
  read-before-writes instead of 1 write).
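
The alignment idea in the quoted log can be sketched as follows.  This is
an illustration under assumptions, not the vn_rdwr_inchunks() source: the
64KB value used for MAXBSIZE and the helper name next_chunk_len() are
hypothetical.  Each chunk is sized to end on a MAXBSIZE boundary, so only
the first and last chunks can be misaligned:

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAXBSIZE 65536u    /* assumed value, for illustration */

/*
 * Length of the next chunk starting at 'offset' with 'remaining' bytes
 * left, chosen so that the chunk ends on a MAXBSIZE boundary when
 * enough data remains.
 */
static size_t
next_chunk_len(unsigned long long offset, size_t remaining)
{
	size_t chunk;

	assert(remaining > 0);
	chunk = SKETCH_MAXBSIZE - (size_t)(offset % SKETCH_MAXBSIZE);
	return (chunk < remaining) ? chunk : remaining;
}
```

Starting at offset 1000, the first chunk is 64536 bytes; every later
chunk then begins on a boundary and is a full 65536 bytes until the tail.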

20 years ago* Fix an off-by-one problem.
David Rhodus [Mon, 29 Mar 2004 15:17:51 +0000 (15:17 +0000)]
* Fix an off-by-one problem.

* Don't sleep on NULL anymore.

Merged from FreeBSD

20 years agoRevert last commit. This should not have happened.
Joerg Sonnenberger [Mon, 29 Mar 2004 14:16:32 +0000 (14:16 +0000)]
Revert last commit. This should not have happened.

20 years agoReplace the old locking based on memory flags with lockmgr based code.
Joerg Sonnenberger [Mon, 29 Mar 2004 14:08:09 +0000 (14:08 +0000)]
Replace the old locking based on memory flags with lockmgr based code.

Initial effort by Eirik Nygaard.

20 years agokern_sysctl.c
Joerg Sonnenberger [Mon, 29 Mar 2004 14:06:31 +0000 (14:06 +0000)]
kern_sysctl.c

20 years agoInitialize the pcpu clocks after we've activated the cpu bit in
Matthew Dillon [Mon, 29 Mar 2004 07:36:48 +0000 (07:36 +0000)]
Initialize the pcpu clocks after we've activated the cpu bit in
smp_active_mask rather than before.

20 years agoAdd functionality to binutils 2.14's ld to scan /var/run/ld-elf.so.hints
Joerg Sonnenberger [Sun, 28 Mar 2004 16:26:33 +0000 (16:26 +0000)]
Add functionality to binutils 2.14's ld to scan /var/run/ld-elf.so.hints
for dependencies of shared libraries.

Submitted-By: Andreas Hauser <andy@splashground.de>
Obtained-From: FreeBSD / in-tree binutils 2.12

20 years agoAdd the pipe2 sysperf test. This test issues block writes from parent to
Matthew Dillon [Sun, 28 Mar 2004 09:21:53 +0000 (09:21 +0000)]
Add the pipe2 sysperf test.  This test issues block writes from parent to
child over a pipe and reports on the overhead and data rate.  The block size
is specified on the command line.

20 years agoUpdate to style(9) guidelines.
Chris Pressey [Sun, 28 Mar 2004 09:10:03 +0000 (09:10 +0000)]
Update to style(9) guidelines.

- Explicitly state that the ``register'' keyword should be avoided.
- Correct example and description of preferred indentation when wrapping
  function arguments over multiple lines.
- Resolve contradictions in guidelines for ``return''.
- Explicitly state whitespace rules for commas, semicolons, ``->'' and
  ``.'' operators.
- General clarifications.

Approved-by: dillon

20 years agoImport Alan Cox's /usr/src/sys/kern/sys_pipe.c 1.171. This rips out
Matthew Dillon [Sun, 28 Mar 2004 08:25:54 +0000 (08:25 +0000)]
Import Alan Cox's /usr/src/sys/kern/sys_pipe.c 1.171.  This rips out
writer-side KVA mappings and replaces them with writer-side vm_page wiring
(left intact from before) plus reader-side SF_BUF copies.

Import 1.141, which is a simple patch that removes a blocking condition
when space is available in the pipe's write buffer.  That condition was
causing non-blocking I/O select-based writes to spin-wait unnecessarily.

Import FreeBSD-5.x's uiomove_fromphys(), which sys_pipe.c now uses.  This
procedure could become very useful in a number of DragonFly subsystems.

This greatly improves PIPE performance for the direct-mapped case (moderate
to large reads and writes).  Additionally, recent scheduler fixes greatly
improve PIPE performance for both the direct-mapped and small-buffer cases.

NOTE: wired page limits for pipes have not yet been imported, and the heavy
use of sf_buf's may require some tuning in the many-pipes case.

    BLKSIZE  BEFORE    AFTER
             MBytes/s  MBytes/s    Tests on AMD64/3200+ FN85 MB
    -------  ------    ------      (64KB L1, 1MB L2)
    256KB    1900      2200
     64KB    1800      2200
     16KB    1650      2500-3000
      8KB    1400      2300
      4KB    1300      1400-1500   (note 1)

    note 1: The 4KB case is not a direct-write case, the results are due to
    the scheduler fixes only.

Obtained-from: FreeBSD-5.x / FreeBSD's Alan Cox

20 years agoDo some major performance tuning of the userland scheduler.
Matthew Dillon [Sun, 28 Mar 2004 08:03:05 +0000 (08:03 +0000)]
Do some major performance tuning of the userland scheduler.

When determining whether to reschedule, use a relative priority comparison
against PPQ rather than a queue index comparison to avoid the edge case
where two processes are only a p_priority of 1 apart, but fall into
different queues.  This reduces unnecessary preemptive context switches.
Also change the sense of test_resched() and document it.

Properly increment p_ru.ru_nivcsw (involuntary context switches stat counter).

Fix uio_yield().  We have to call lwkt_setpri_self() to cycle our thread
to the end of its runq, and we do not need to call acquire_curproc() and
release_curproc() after switching.

When returning to userland, lower our priority and call lwkt_maybe_switch()
BEFORE acquiring P_CURPROC.  Previously we called lwkt_maybe_switch() after
acquiring P_CURPROC, which could result in us holding P_CURPROC, switching to
another thread which itself returns to usermode at a higher priority, and
that thread having to switch back to us to release P_CURPROC and then us back
to the other thread again.  This reduces the number of unnecessary context
switches that occur in certain situations.  In particular, this cuts the
number of context switches in PIPE situations by 50-75% (1/2 to 2/3).

20 years agoProtect v_usecount with a critical section for now (we depend on the BGL),
Matthew Dillon [Sun, 28 Mar 2004 07:54:00 +0000 (07:54 +0000)]
Protect v_usecount with a critical section for now (we depend on the BGL),
and assert that it does not drop below 0.

Suggested-by: David Rhodus <drhodus@machdep.com>

20 years agoStyle(9) cleanup.
Chris Pressey [Sun, 28 Mar 2004 01:02:54 +0000 (01:02 +0000)]
Style(9) cleanup.

- Remove ``register'' keyword.
- Remove casts to (void) when ignoring return values.
- Add ``static'' to internal function prototypes.
- Change an occurrence of 1 to STDOUT_FILENO.
- Change an occurrence of BUFSIZ to sizeof(ibuf).
- No functional changes.

20 years agoCorrect misspelling of "orphan" and fix up comment structure.
Chris Pressey [Sun, 28 Mar 2004 00:48:00 +0000 (00:48 +0000)]
Correct misspelling of "orphan" and fix up comment structure.

Obtained-from: NetBSD, src/sys/coda/coda_vfsops.c revision 1.32

20 years agoChange sendfile() to send the header out coalesced with the data.
Jeffrey Hsu [Sat, 27 Mar 2004 21:01:03 +0000 (21:01 +0000)]
Change sendfile() to send the header out coalesced with the data.

Inspired by Mike Silbersack's FreeBSD rev 1.171 to uipc_syscalls.c.

20 years agoPull out m_uiomove() functionality from sosend().
Jeffrey Hsu [Sat, 27 Mar 2004 11:50:45 +0000 (11:50 +0000)]
Pull out m_uiomove() functionality from sosend().

20 years agoGive UDP its own sosend() function.
Jeffrey Hsu [Sat, 27 Mar 2004 11:48:48 +0000 (11:48 +0000)]
Give UDP its own sosend() function.

20 years agoCorrect a typo that was introduced in revision 1.2.
Chris Pressey [Sat, 27 Mar 2004 01:46:10 +0000 (01:46 +0000)]
Correct a typo that was introduced in revision 1.2.
In makeargv(), 'margv' should have been 'margc'.
Compiles without warnings now.

Confirmed-with: FreeBSD CVSweb,
                /src/usr.sbin/timed/timedc/timedc.c revision 1.5

20 years agoStyle(9) cleanup: remove ``register'' keywords.
Chris Pressey [Sat, 27 Mar 2004 01:39:13 +0000 (01:39 +0000)]
Style(9) cleanup: remove ``register'' keywords.

20 years agoMake the .nx/.no native program helper binaries work and add some missing
Matthew Dillon [Fri, 26 Mar 2004 21:58:13 +0000 (21:58 +0000)]
Make the .nx/.no native program helper binaries work and add some missing
header file dependencies.

20 years agoThe NXCC (native C compiler) misnamed OBJFORMATPATH; it needs to be
Matthew Dillon [Fri, 26 Mar 2004 21:57:23 +0000 (21:57 +0000)]
The NXCC (native C compiler) misnamed OBJFORMATPATH; it needs to be
OBJFORMAT_PATH, causing 'missing crt1.o' from ld in the buildworld
includes stage.

20 years agoChange this vnode check inside of the VFS_BIO_DEBUG
David Rhodus [Fri, 26 Mar 2004 17:23:42 +0000 (17:23 +0000)]
Change this vnode check inside of the VFS_BIO_DEBUG
code path to check for erroneous hold counts instead of the
reference count, which was an irrelevant check here.

20 years agoUpdate rc.d scripts to use the correct path for the named pidfile.
David Rhodus [Fri, 26 Mar 2004 13:32:27 +0000 (13:32 +0000)]
Update rc.d scripts to use the correct path for the named pidfile.

Sent in by: Craig Dooley <craig@xlnx-x.net>

20 years agoStyle(9) cleanup.
Chris Pressey [Fri, 26 Mar 2004 00:30:13 +0000 (00:30 +0000)]
Style(9) cleanup.

- Convert K&R-style function definitions to ANSI style.
- Remove ``register'' keywords.
- Adjust whitespace and parens w.r.t. style(9) and remove casts to
  (void) when ignoring return values (time.c only.)

20 years agoFour new features and a bugfix.
Chris Pressey [Thu, 25 Mar 2004 23:55:13 +0000 (23:55 +0000)]
Four new features and a bugfix.

- Center the clock on the user's terminal.
- Check that the terminal is sufficiently large to fully display the
  clock (about 61x9.)  If not, exit immediately with an error.
- Introduce a short delay in the scrolling when -s is given, so that
  it can be better appreciated on syscons(4) and local xterm(1).  The
  default delay is 120 milliseconds.
- Add a new option, -d, to allow changing the scroll delay to any
  duration from 0 to 5000 milliseconds.  The -d option implies -s.
- Make it so that, when the optional argument is omitted, the clock
  really does run forever.  (Before this, it would have stopped after
  about 65536 seconds due to wraparound.)

20 years agoAttach mount_udf to the buildworld process now.
David Rhodus [Thu, 25 Mar 2004 22:07:21 +0000 (22:07 +0000)]
Attach mount_udf to the buildworld process now.

20 years agoFix a missing makewhatis related change so buildworld works again.
Matthew Dillon [Thu, 25 Mar 2004 20:52:43 +0000 (20:52 +0000)]
Fix a missing makewhatis related change so buildworld works again.

Reported-by: Chris Pressey <cpressey@catseye.mine.nu>