dragonfly.git
20 years agoAdd predicate message facility.
Jeffrey Hsu [Sat, 10 Apr 2004 00:48:06 +0000 (00:48 +0000)]
Add predicate message facility.

20 years agoSend connects to the right processor.
Jeffrey Hsu [Sat, 10 Apr 2004 00:10:42 +0000 (00:10 +0000)]
Send connects to the right processor.

20 years agoAdd header file to pull in the setting of the TCP_DISTRIBUTED_TCBINFO option.
Jeffrey Hsu [Sat, 10 Apr 2004 00:07:16 +0000 (00:07 +0000)]
Add header file to pull in the setting of the TCP_DISTRIBUTED_TCBINFO option.

20 years agoFix typo with last minute change in last commit.
Jeffrey Hsu [Fri, 9 Apr 2004 23:33:02 +0000 (23:33 +0000)]
Fix typo with last minute change in last commit.

20 years agoPush the lwkt_replymsg() up one level from netisr_service_loop() to
Jeffrey Hsu [Fri, 9 Apr 2004 22:34:10 +0000 (22:34 +0000)]
Push the lwkt_replymsg() up one level from netisr_service_loop() to
the message handler so we can explicitly reply or not reply as appropriate.

20 years agonawk => awk
Joerg Sonnenberger [Fri, 9 Apr 2004 13:06:15 +0000 (13:06 +0000)]
nawk => awk

20 years agoThis is _SYS_XIO_H, not _SYS_UIO_H.
Joerg Sonnenberger [Fri, 9 Apr 2004 12:51:20 +0000 (12:51 +0000)]
This is _SYS_XIO_H, not _SYS_UIO_H.

Noticed by: ibotty

20 years agoIntroduce negative (ENOENT) caching for NFS. Before this, an attempt to
Matthew Dillon [Thu, 8 Apr 2004 22:32:14 +0000 (22:32 +0000)]
Introduce negative (ENOENT) caching for NFS.  Before this, an attempt to
look up a nonexistent path would ALWAYS result in packet traffic.  That is,
NFS was only attribute-caching successful lookups, not failed lookups,
and was making virtually no use of the VFS cache facility.  This new
feature complements the existing attribute caching feature.

Add a sysctl, vfs.nfs.neg_cache_timeout, which controls the timeout for
negatively cached lookups.  The default is 3 seconds.  You can set this
sysctl to 0 to recover the old non-negative-caching behavior.
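
As a rough illustration of the idea (not the actual NFS or namecache code),
a negative cache with a timeout can be sketched as a small table of names
that recently failed to resolve.  All names below (neg_cache, neg_enter,
neg_hit, the slot count) are hypothetical; only the timeout semantics,
including "0 disables it", come from the description above.

```c
#include <assert.h>
#include <string.h>
#include <time.h>

/* Hypothetical sizes chosen for the sketch only. */
#define NEG_SLOTS   64
#define NEG_NAMELEN 64

struct neg_entry {
    char   name[NEG_NAMELEN];   /* name whose lookup returned ENOENT */
    time_t expires;             /* absolute expiry time; 0 = unused slot */
};

struct neg_cache {
    struct neg_entry slot[NEG_SLOTS];
    int timeout;                /* seconds; 0 disables negative caching */
};

/* Simple string hash used to pick a slot. */
static unsigned neg_hash(const char *name)
{
    unsigned h = 5381;

    while (*name)
        h = h * 33 + (unsigned char)*name++;
    return h % NEG_SLOTS;
}

/* Record that a lookup of `name` failed at time `now`. */
static void neg_enter(struct neg_cache *nc, const char *name, time_t now)
{
    struct neg_entry *e;

    assert(nc != NULL && name != NULL);
    if (nc->timeout <= 0 || strlen(name) >= NEG_NAMELEN)
        return;                 /* disabled, or name too long to cache */
    e = &nc->slot[neg_hash(name)];
    strcpy(e->name, name);      /* a colliding older entry is overwritten */
    e->expires = now + nc->timeout;
}

/* Return 1 if `name` is cached as missing and the entry has not expired. */
static int neg_hit(const struct neg_cache *nc, const char *name, time_t now)
{
    const struct neg_entry *e = &nc->slot[neg_hash(name)];

    return nc->timeout > 0 && e->expires > now &&
        strcmp(e->name, name) == 0;
}
```

With timeout set to 3, a name entered at time t is answered from the cache
until t+3 and goes back to the server after that; with timeout 0 nothing is
ever cached, which corresponds to the old behavior.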

This makes a HUGE difference for programs which search nfs directories, such
as compilers (the header file search path), make, and a few other utilities.
NFS packet traffic can be reduced upwards of 90%.  For example, with /usr/src
mounted via NFS, building libc a second time without negative caching
generates 66000 packets of NFS traffic in each direction; building libc
a second time with negative caching enabled generates 9500 packets worth
of NFS traffic, in EACH DIRECTION.  While it is true that negative lookups
are cached on the NFS server, the huge reduction in network traffic and
equivalent reduction in synchronous read latencies result in radically
reduced overheads across the board for operations which generate a lot of
negative hits.

A buildworld test with the default 3 second negative caching timeout went
from 2265 seconds to 1900 seconds.
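
For reference, the figures quoted in this message can be worked through
with a one-line helper.  The function name is hypothetical; the inputs are
the numbers given above.

```c
#include <assert.h>

/* Percentage reduction when going from `before` to `after`. */
static double pct_reduction(double before, double after)
{
    assert(before > 0.0);
    return 100.0 * (before - after) / before;
}
```

pct_reduction(66000, 9500) comes to roughly 85.6 (packets per direction for
the libc rebuild) and pct_reduction(2265, 1900) to roughly 16.1 (buildworld
seconds).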

20 years agonamecache work stage 4a: Do some minor performance cleanups with negative
Matthew Dillon [Thu, 8 Apr 2004 22:00:41 +0000 (22:00 +0000)]
namecache work stage 4a: Do some minor performance cleanups with negative
caching, add a cache entry timeout feature.

20 years agoStyle(9) cleanup to src/sys/vfs, stage 5/21: ext2fs.
Chris Pressey [Thu, 8 Apr 2004 20:57:52 +0000 (20:57 +0000)]
Style(9) cleanup to src/sys/vfs, stage 5/21: ext2fs.

- Convert K&R-style function definitions to ANSI style.

Submitted-by: Andre Nathan <andre@digirati.com.br>
Additional-reformatting-by: cpressey
20 years agoWorkaround for not having a proc context. Use the thread0 context when
Jeffrey Hsu [Thu, 8 Apr 2004 20:13:28 +0000 (20:13 +0000)]
Workaround for not having a proc context.  Use the thread0 context when
the real context is not available.  The real solution is to propagate
the information passed into ngc_send() down to here.  This workaround
implements the same incorrect behavior as FreeBSD as of rev 1.4 of
ng_ksocket.c in 1999.

20 years agonamecache work stage 4:
Matthew Dillon [Thu, 8 Apr 2004 17:56:48 +0000 (17:56 +0000)]
namecache work stage 4:

(1) Remove vnode->v_dd, vnode->v_ddid, namecache->nc_dvp_data, and
namecache->nc_dvp_id.  These identifiers were being used to detect stale
parent directory linkages in the namecache and were leftovers from the
original FreeBSD-4.x namecache topology.  The new namecache topology
actively discards such linkages and does not require them.

(2) Cleanup kern/vfs_cache.c, abstracting out allocation and parent
link/unlink operations into their own procedures.

(3) Formally allow a disjoint topology.  That is, allow the case where
nc_parent is NULL.  When constructing namecache entries (dvp,vp), require
that dvp be associated with a namecache record so we can create the
proper parent->child linkage.  Since no naming information is known for
dvp, formally allow unnamed namecache records to be created in order to
create the association.

(4) Properly relink parent namecache entries when ".." is entered into
the cache.  This is what relinks a disjoint namecache topology after it
has been partially purged or when the namecache is instantiated in the
middle of the logical topology (and thus disjoint).

Note that the original plan was to not allow a disjoint topology, but after
much hair pulling I've come to the conclusion that it is impossible to do
this.  So the work now formally allows a disjoint topology but also, unlike
the original FreeBSD code, takes pains to try to keep the topology intact
by only recycling 'leaf' vnodes.  This is accomplished by vref()ing a vnode
when its namecache records have children.

20 years ago/tmp/motd* files were being left sitting around after a reboot when the
Matthew Dillon [Thu, 8 Apr 2004 17:35:22 +0000 (17:35 +0000)]
/tmp/motd* files were being left sitting around after a reboot when the
motd is found not to have changed.  Make sure both temporary files are
cleaned up.

20 years agoAdd support for AC'97 codec of the AMD-8111 chipset.
Joerg Sonnenberger [Thu, 8 Apr 2004 15:16:50 +0000 (15:16 +0000)]
Add support for AC'97 codec of the AMD-8111 chipset.

Obtained-From: FreeBSD kern/55932

20 years agoTCP statistics structure renamed tcpstat -> tcp_stats.
Matthew Dillon [Wed, 7 Apr 2004 21:40:19 +0000 (21:40 +0000)]
TCP statistics structure renamed tcpstat -> tcp_stats.

20 years agoMake TCP stats per-cpu. (forgot to add new header file)
Matthew Dillon [Wed, 7 Apr 2004 20:56:15 +0000 (20:56 +0000)]
Make TCP stats per-cpu. (forgot to add new header file)

Submitted-by: Hiten Pandya <hmp@crater.dragonflybsd.org>
20 years agoStyle(9) cleanup.
Chris Pressey [Wed, 7 Apr 2004 20:43:24 +0000 (20:43 +0000)]
Style(9) cleanup.

- Remove `register' keywords.
- No functional changes.

20 years agoEnable propolice (stack smashing detector) by default on gcc3.
Matthew Dillon [Wed, 7 Apr 2004 17:48:03 +0000 (17:48 +0000)]
Enable propolice (stack smashing detector) by default on gcc3.

20 years agoMake TCP stats per-cpu.
Matthew Dillon [Wed, 7 Apr 2004 17:01:27 +0000 (17:01 +0000)]
Make TCP stats per-cpu.

Submitted-by: Hiten Pandya <hmp@crater.dragonflybsd.org>
20 years agoAdjust the C++ preprocessor to include /usr/include/c++ by default for
Joerg Sonnenberger [Wed, 7 Apr 2004 14:02:41 +0000 (14:02 +0000)]
Adjust the C++ preprocessor to include /usr/include/c++ by default for
version-independent C++ header files.

For GCC 2.95 this is done by adding a new define GPLUSPLUS_INCLUDE_DIR2;
for GCC 3.3 the version-dependent path is now included in
GPLUSPLUS_TOOL_INCLUDE_DIR and the version-independent path in
GPLUSPLUS_INCLUDE_DIR. If the compiler is updated, it should be checked
that /usr/include/c++/$CCVER is still included before /usr/include/c++.

20 years agoSince GCC 2.95.4 is known to produce bad code for higher optimization
Joerg Sonnenberger [Wed, 7 Apr 2004 12:57:31 +0000 (12:57 +0000)]
Since GCC 2.95.4 is known to produce bad code for higher optimization
levels and CPU-specific instruction sets, disable those for the system
C compiler. Keeping at least the C compiler working is more important
than a slight increase in compilation speed.

20 years agoCosmetic changes.
Jeffrey Hsu [Wed, 7 Apr 2004 09:36:07 +0000 (09:36 +0000)]
Cosmetic changes.

20 years agoGeneral ata malloc() flags cleanup. Use M_INTWAIT where appropriate and
Matthew Dillon [Wed, 7 Apr 2004 06:22:15 +0000 (06:22 +0000)]
General ata malloc() flags cleanup.  Use M_INTWAIT where appropriate and
get rid of unnecessary NULL checks.

20 years agoGeneral bus malloc() flags cleanup, M_NOWAIT -> M_INTWAIT. Note: leave
Matthew Dillon [Wed, 7 Apr 2004 05:54:41 +0000 (05:54 +0000)]
General bus malloc() flags cleanup, M_NOWAIT -> M_INTWAIT.  Note: leave
isa dmabuf bouncebuffer code as is (it uses malloc() M_NOWAIT and then
falls back to contigmalloc).

20 years agoGeneral netif malloc() flags cleanup. Use M_INTWAIT or M_WAITOK instead
Matthew Dillon [Wed, 7 Apr 2004 05:45:30 +0000 (05:45 +0000)]
General netif malloc() flags cleanup.  Use M_INTWAIT or M_WAITOK instead
of M_NOWAIT.  Generally use M_WAITOK in the attach code or ioctl code
typically called from userland, and M_INTWAIT for routines that might
be called during non-boot operations.  Since M*WAIT flags guarantee a
non-NULL result, also remove now-unnecessary NULL checks.

20 years agoUse hex bit values instead of decimal bit values (non operational change).
Matthew Dillon [Wed, 7 Apr 2004 05:18:19 +0000 (05:18 +0000)]
Use hex bit values instead of decimal bit values (non operational change).

20 years agoProtect nfs socket locks with a critical section. Recheck rep->r_mrep just
Matthew Dillon [Wed, 7 Apr 2004 05:15:48 +0000 (05:15 +0000)]
Protect nfs socket locks with a critical section.  Recheck rep->r_mrep just
prior to calling tsleep() in case another thread got in and handled the
request being waited for.  Rewrite the vnode scanning code in nfs_sync()
to use vmntvnodescan(), fixing a number of potential races.  Protect the
commit phase 2 scan in nfs_subs.c with the appropriate token (note: still
needs some work).

20 years agoStyle(9) cleanup to src/sys/vfs, stage 4/21: fifofs.
Chris Pressey [Tue, 6 Apr 2004 21:32:39 +0000 (21:32 +0000)]
Style(9) cleanup to src/sys/vfs, stage 4/21: fifofs.

- Convert K&R-style function definitions to ANSI style.

Submitted-by: Andre Nathan <andre@digirati.com.br>
Additional-reformatting-by: cpressey
20 years agoDo not reset %gs in signal handlers, some programs depend on it (KDE in
Matthew Dillon [Mon, 5 Apr 2004 19:15:57 +0000 (19:15 +0000)]
Do not reset %gs in signal handlers, some programs depend on it (KDE in
particular, and nvidia video driver as well).

20 years agoSubsystems which install an so_upcall may themselves call socket functions
Matthew Dillon [Mon, 5 Apr 2004 18:53:03 +0000 (18:53 +0000)]
Subsystems which install an so_upcall may themselves call socket functions
from the handler thread, which can lead to deadlocks in lwkt_domsg().

Have the netmsg service loop install its own mp_putport() function which
checks for self-referential messages (curthread == port->mp_td) and executes
them synchronously.
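
A minimal sketch of the self-referential check described above, using
simplified stand-ins for the LWKT structures.  Apart from mp_td and
curthread, every name here (netmsg_putport, the ms_* fields,
sample_handler) is hypothetical, and the real port code queues and replies
quite differently.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical stand-ins for the LWKT types. */
struct thread { int td_id; };
struct msg;

struct msgport {
    struct thread *mp_td;       /* thread that services this port */
    struct msg    *mp_queue;    /* pending messages (LIFO for brevity) */
};

struct msg {
    struct msg *ms_next;
    int       (*ms_handler)(struct msg *);  /* work to run for the message */
    int         ms_error;
    int         ms_done;
};

static struct thread *curthread;    /* set by the caller in this sketch */

/* Example handler so the sketch can be exercised. */
static int sample_handler(struct msg *m)
{
    (void)m;
    return 7;
}

/* Returns 1 if the message was executed synchronously, 0 if queued. */
static int netmsg_putport(struct msgport *port, struct msg *msg)
{
    assert(port != NULL && msg != NULL && msg->ms_handler != NULL);
    if (curthread == port->mp_td) {
        /* Queueing to ourselves and then waiting would never complete. */
        msg->ms_error = msg->ms_handler(msg);
        msg->ms_done = 1;
        return 1;
    }
    msg->ms_next = port->mp_queue;
    port->mp_queue = msg;
    return 0;
}
```

The point of the check is that a handler thread waiting on a message queued
to its own port can never get around to servicing it; running the message
inline avoids that wait entirely.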

20 years agoExport the lwkt_default_*() message port default functions so other
Matthew Dillon [Mon, 5 Apr 2004 18:49:19 +0000 (18:49 +0000)]
Export the lwkt_default_*() message port default functions so other
code (e.g. networking) can call them.

20 years agoReadd _G_config.h and the missing std headers. This brings C++ back to where
Joerg Sonnenberger [Mon, 5 Apr 2004 18:02:51 +0000 (18:02 +0000)]
Readd _G_config.h and the missing std headers. This brings C++ back to where
it was a week ago.

20 years agoper-cpu tcbinfo[]s aren't ready for prime time yet. The tcbinfo is assigned
Matthew Dillon [Mon, 5 Apr 2004 17:47:01 +0000 (17:47 +0000)]
per-cpu tcbinfo[]s aren't ready for prime time yet.  The tcbinfo is assigned
at tcp_attach time, but there is insufficient information available at this
time to select the hash table and the wrong one gets assigned N-1 out of N
times on MP systems (N = number of cpus), causing outgoing tcp connections
to fail.

Add an option, TCP_DISTRIBUTED_TCBINFO, so MP-safe tcbinfo distribution can
continue to be developed without impacting users.

20 years agoWe are DragonFly not FreeBSD, so rename the name in GENERIC, and remove the
Eirik Nygaard [Mon, 5 Apr 2004 13:44:40 +0000 (13:44 +0000)]
We are DragonFly not FreeBSD, so rename the name in GENERIC, and remove the
reference to the local handbook, which we don't have.

20 years agoConsistently use "foreign" and "local", which are invariant on the
Jeffrey Hsu [Mon, 5 Apr 2004 09:17:48 +0000 (09:17 +0000)]
Consistently use "foreign" and "local", which are invariant on the
host machine, instead of "src" and "dst", which vary according
to whether a packet is being received or sent.

20 years agoRemove makewhatis from /usr/bin (it officially resides in /usr/sbin),
Matthew Dillon [Mon, 5 Apr 2004 05:41:43 +0000 (05:41 +0000)]
Remove makewhatis from /usr/bin (it officially resides in /usr/sbin),
remove /usr/sbin/prebind (no longer exists, see 'resident').

20 years agoPartial sync from FreeBSD adds some more support and fixes. Also replace a
Matthew Dillon [Mon, 5 Apr 2004 05:34:36 +0000 (05:34 +0000)]
Partial sync from FreeBSD adds some more support and fixes.  Also replace a
number of hardwired masks with appropriately defined constants.

20 years agoBring in FreeBSD 1.2.2.2. Properly unwind the stack when certain
Matthew Dillon [Mon, 5 Apr 2004 05:31:58 +0000 (05:31 +0000)]
Bring in FreeBSD 1.2.2.2.  Properly unwind the stack when certain
failure cases occur in rfork_thread().

Submitted-by: Igor Sysoev <is@rambler-co.ru>
20 years agoFix buildworld. Document TOOLS_PREFIX and USRDATA_PREFIX, improve INCLUDEDIR
Matthew Dillon [Mon, 5 Apr 2004 05:30:13 +0000 (05:30 +0000)]
Fix buildworld.  Document TOOLS_PREFIX and USRDATA_PREFIX, improve INCLUDEDIR
documentation.  Modify bsd.incs.mk to not install header files if BOOTSTRAPPING
is set (for buildworld), and change lex to install its C++ header file in
${INCLUDEDIR}/c++ instead of ${INCLUDEDIR}/g++.  Set DESTDIR for BMAKEENV,
set BOOTSTRAPPING for XMAKE (cross build tools).  Note that DESTDIR is set
in the bootstrap-tools: target, this will be removed in a later commit.

20 years agoUndo the last commit. Utility programs which install c++ includes have no
Matthew Dillon [Mon, 5 Apr 2004 02:03:24 +0000 (02:03 +0000)]
Undo the last commit.  Utility programs which install c++ includes have no
knowledge and should have no knowledge of particular compiler versions
installed.  They should install their C++ header files in one place.

20 years agoQuake 3 server (running under linux emulation) was failing with odd '
Matthew Dillon [Mon, 5 Apr 2004 00:06:02 +0000 (00:06 +0000)]
Quake 3 server (running under linux emulation) was failing with odd
'Protocol not available' errors.  The problem turned out to be the internal
IP_HDRINCL check that the linux emulation code in the kernel was doing in
linux_sendto().  If the internal check fails with an error, the emulation
code should simply assume that IP_HDRINCL is off rather than return the error.

The bug was introduced during the syscall separation work on this module.
FreeBSD-4.x properly ignores the error.  This patch restores behavior for
DFly.

Reported-by: Sascha Wildner <saw@online.de>
20 years agoFix a missing wildcard binding in the recent wildcard binding hash table work.
Matthew Dillon [Sun, 4 Apr 2004 22:13:38 +0000 (22:13 +0000)]
Fix a missing wildcard binding in the recent wildcard binding hash table work.
This prevented YP from working properly.

Reported-by: Richard Nyberg <rnyberg@it.su.se>
Patch-Supplied-by: Jeffrey Hsu <hsu@freebsd.org>
20 years agoCorrect C++ header handling for gcc2 and lex.
Joerg Sonnenberger [Sun, 4 Apr 2004 21:31:14 +0000 (21:31 +0000)]
Correct C++ header handling for gcc2 and lex.

gcc2 used a "beforeinstall" target instead of the standard bsd.incs.mk way.
Therefore certain headers weren't correctly installed when doing an
installincludes or "make includes" from the src root. The cc1plus part was
still installed to the old location and that broke e.g. textproc/jade.

lex still installed its C++ interface to /usr/include/g++.  Until a decision
about a generic C++ header location is made, a version for both system
compilers is installed.

20 years agoSetting the date/time does not always properly write-back the RTC, causing
Matthew Dillon [Sun, 4 Apr 2004 08:00:06 +0000 (08:00 +0000)]
Setting the date/time does not always properly write-back the RTC, causing
the date/time to be wrong again after a reboot.  This was due to the recent
systimer changes which updated the 'time_second' global via hardclock() only.
Change the writeback code to use microtime() instead of time_second.

Reported-by: esmith <esmith@patmedia.net>
20 years agoPerl is no longer needed by buildworld/buildkernel.
Matthew Dillon [Sun, 4 Apr 2004 01:08:18 +0000 (01:08 +0000)]
Perl is no longer needed by buildworld/buildkernel.

Submitted-by: YONETANI Tomokazu <qhwt+dragonfly-kernel@les.ath.cx>
20 years agoCleanup NXENV so it works properly when running buildworld from FreeBSD.
Matthew Dillon [Sat, 3 Apr 2004 23:07:14 +0000 (23:07 +0000)]
Cleanup NXENV so it works properly when running buildworld from FreeBSD.

20 years agoDispatch reassembled fragment.
Jeffrey Hsu [Sat, 3 Apr 2004 22:18:30 +0000 (22:18 +0000)]
Dispatch reassembled fragment.

20 years agoFix byte-order.
Jeffrey Hsu [Sat, 3 Apr 2004 22:17:59 +0000 (22:17 +0000)]
Fix byte-order.

20 years agoCreate a normal stack frame in generic_bcopy() to aid debugging, so
Matthew Dillon [Sat, 3 Apr 2004 08:21:16 +0000 (08:21 +0000)]
Create a normal stack frame in generic_bcopy() to aid debugging, so
backtraces work properly.

20 years agoFix bugs in xio_copy_*(). We were not using the masked offset when
Matthew Dillon [Sat, 3 Apr 2004 08:20:10 +0000 (08:20 +0000)]
Fix bugs in xio_copy_*().  We were not using the masked offset when
calculating the number of bytes to copy from the first indexed page,
leading to a negative 'n' calculation in situations that could be
triggered with a ^C on programs using pipes (such as a buildworld).
This almost universally resulted in a panic.
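
The class of bug can be sketched as follows.  This is not the xio_copy_*()
code itself; the function names and the XPAGE_* constants are hypothetical
and a 4096-byte page is assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed page size for the sketch. */
#define XPAGE_SIZE 4096
#define XPAGE_MASK (XPAGE_SIZE - 1)

/*
 * Bytes to copy from the first page when the copy starts at byte
 * `offset` and `bytes` remain.  The offset is masked so that only its
 * position within the page counts.
 */
static size_t first_page_bytes(size_t offset, size_t bytes)
{
    size_t n = XPAGE_SIZE - (offset & XPAGE_MASK);

    return n < bytes ? n : bytes;
}

/* The unmasked form goes negative once offset exceeds the page size. */
static long first_page_bytes_unmasked(long offset)
{
    return XPAGE_SIZE - offset;
}
```

For an offset of 5000 the masked calculation yields 3192 bytes in the first
page, while the unmasked one yields -904.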

20 years agoAdd `device atapicam' to unbreak TINDERBOX config.
Hiten Pandya [Sat, 3 Apr 2004 07:14:08 +0000 (07:14 +0000)]
Add `device atapicam' to unbreak TINDERBOX config.

20 years agoIn the sysclock commit I tried to make 'boottime' a fixed value, but it
Matthew Dillon [Sat, 3 Apr 2004 05:30:10 +0000 (05:30 +0000)]
In the sysclock commit I tried to make 'boottime' a fixed value, but it
ended up being set to the superblock update time (time of last shutdown)
rather than the real time clock during boot.

Give up on making it a fixed value and just set it to the current time
minus the compensated elapsed time (gd->gd_time_seconds) whenever the
time of day is stepped.  Subsystems which use boottime as an identifier,
such as NFS, already copy it and this change effectively returns boottime
operation to its pre-sysclock algorithm.

Reported-by: YONETANI Tomokazu <qhwt+dragonfly-bugs@les.ath.cx> and others
20 years agoAllocate the DMA segment array in bus_dma_tag_create instead of using a
Joerg Sonnenberger [Fri, 2 Apr 2004 18:16:45 +0000 (18:16 +0000)]
Allocate the DMA segment array in bus_dma_tag_create instead of using a
local variable in bus_dmamap_create et al.

20 years agoCorrect the commented-out example for MODULES_OVERRIDE.
Matthew Dillon [Fri, 2 Apr 2004 18:09:38 +0000 (18:09 +0000)]
Correct the commented-out example for MODULES_OVERRIDE.

Submitted-by: Dheeraj Reddy <dheerajs@comcast.net>
20 years agoGarbage-collect unused variable.
Hiten Pandya [Fri, 2 Apr 2004 12:45:40 +0000 (12:45 +0000)]
Garbage-collect unused variable.

20 years agoAdapt the netisr message handlers to accommodate the available error
Hiten Pandya [Fri, 2 Apr 2004 12:32:27 +0000 (12:32 +0000)]
Adapt the netisr message handlers to accommodate the available error
handling facility by returning an appropriate error value.

While I am there, move pppintr() to top of file to simplify things.

This commit is a followup to: rev. 1.2 of src/sys/kern/uipc_msg.c

20 years agoAdd Makefile for the netif/ie ISA NIC driver.
Hiten Pandya [Fri, 2 Apr 2004 11:31:27 +0000 (11:31 +0000)]
Add Makefile for the netif/ie ISA NIC driver.

20 years agoThe globaldata houses a pointer and not an embedded struct for nchstats;
Hiten Pandya [Fri, 2 Apr 2004 10:50:23 +0000 (10:50 +0000)]
The globaldata houses a pointer and not an embedded struct for nchstats;
use the correct access method.

Fixes build of ext2fs kernel module.

20 years agoMake buildkernels require a buildworld to be done first, because they
Matthew Dillon [Fri, 2 Apr 2004 06:21:36 +0000 (06:21 +0000)]
Make buildkernels require a buildworld to be done first, because they
no longer munge the paths to use native apps when buildworld tools aren't
available.

Buildkernel now tells you this and exits if it doesn't think you've done
a buildworld.

Add a new target, 'nativekernel', which just runs config and uses native
tools to build the kernel.  'nativekernel' and 'buildkernel' use the same
object directory but are mutually exclusive.  If you run one, then try to run
the other, it will wipe the directory and start over.

20 years agoPer-CPU VFS Namecache Effectiveness Statistics:
Hiten Pandya [Fri, 2 Apr 2004 05:46:03 +0000 (05:46 +0000)]
Per-CPU VFS Namecache Effectiveness Statistics:

* Convert nchstats into a CPU indexed array

* Export the per-CPU nchstats as a sysctl vfs.cache.nchstats
  and let user-land aggregate them.

* Add a function called kvm_nch_cpuagg() to libkvm; it is
  shared by systat(1) and vmstat(1) and the ncache-stats test
  program.  As the function name suggests, it aggregates
  the per-CPU nchstats.

* Move struct nchstats into a separate header to avoid
  header file namespace pollution; sys/nchstats.h.

* Keep a cached copy of the globaldata pointer in the VFS
  specific LOOKUP op, and use that to increment the
  namecache effectiveness counters (nchstats).

* Modify systat(1) and vmstat(1) to accommodate the new
  behavior of accessing nchstats.  Remove a (now) redundant
  sysctl to get the cpu count (hw.ncpu), instead we just divide
  the total length of the nchstats array returned by sysctl
  by sizeof(struct nchstats) to get the CPU count.

* Garbage-collect unused variables and fix nearby warnings
  in systat(1) and vmstat(1).

* Add a very-cool test program, that prints the nchstats
  per-CPU statistics to show CPU distribution.  Here is the
  output it generates on a 2-processor SMP machine:

  gray# ncache-stats
  VFS Name Cache Effectiveness Statistics
     4207370 total name lookups
  COUNTER             CPU-1       CPU-2           TOTAL
  goodhits            2477657     1060677         (3538334  )
  neghits             107531      47294           (154825   )
  badhits             28968       7720            (36688    )
  falsehits           0           0               (0        )
  misses              339671      137852          (477523   )
  longnames           0           0               (0        )
  passes 2            13104       6813            (19917    )
  2-passes            25134       15257           (40391    )
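
The aggregation and the "length divided by sizeof" CPU count described
above can be sketched like this.  The structure is a reduced, hypothetical
stand-in for struct nchstats (the real one has more counters), and
nch_ncpus()/nch_aggregate() are illustrative names, not the libkvm
interface.

```c
#include <assert.h>
#include <stddef.h>

/* Reduced, hypothetical version of struct nchstats. */
struct nchstats_sketch {
    long ncs_goodhits;
    long ncs_neghits;
    long ncs_badhits;
    long ncs_miss;
};

/* CPU count implied by the byte length of the per-CPU array. */
static size_t nch_ncpus(size_t len)
{
    return len / sizeof(struct nchstats_sketch);
}

/* Sum the per-CPU records into *total. */
static void nch_aggregate(const struct nchstats_sketch *percpu, size_t ncpus,
                          struct nchstats_sketch *total)
{
    size_t i;

    assert(percpu != NULL && total != NULL);
    total->ncs_goodhits = 0;
    total->ncs_neghits = 0;
    total->ncs_badhits = 0;
    total->ncs_miss = 0;
    for (i = 0; i < ncpus; i++) {
        total->ncs_goodhits += percpu[i].ncs_goodhits;
        total->ncs_neghits  += percpu[i].ncs_neghits;
        total->ncs_badhits  += percpu[i].ncs_badhits;
        total->ncs_miss     += percpu[i].ncs_miss;
    }
}
```

Feeding in the CPU-1 and CPU-2 columns from the output above reproduces the
TOTAL column, for example 2477657 + 1060677 = 3538334 goodhits.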

The SMP machine used for testing this commit was proudly presented
by David Rhodus <drhodus@dragonflybsd.org>.

Reviewed-by: Matthew Dillon <dillon@backplane.com>

20 years agoConsolidate length checks in ip_demux().
Jeffrey Hsu [Thu, 1 Apr 2004 23:04:50 +0000 (23:04 +0000)]
Consolidate length checks in ip_demux().

20 years agoStyle(9) cleanup to src/sys/vfs, stage 3/21: fdesc.
Chris Pressey [Thu, 1 Apr 2004 19:08:15 +0000 (19:08 +0000)]
Style(9) cleanup to src/sys/vfs, stage 3/21: fdesc.

- Convert K&R-style function definitions to ANSI style.

Submitted-by: Andre Nathan <andre@digirati.com.br>
Additional-reformatting-by: cpressey
20 years agoEnhance the pmap_kenter*() API and friends, separating out entries which
Matthew Dillon [Thu, 1 Apr 2004 17:58:08 +0000 (17:58 +0000)]
Enhance the pmap_kenter*() API and friends, separating out entries which
only need invalidation on the local cpu against entries which need invalidation
across the entire system, and provide a synchronization abstraction.

Enhance sf_buf_alloc() and friends to allow the caller to specify whether the
sf_buf's kernel mapping is going to be used on just the current cpu or
whether it needs to be valid across all cpus.  This is done by maintaining
a cpumask of known-synchronized cpus in the struct sf_buf.
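
One way to picture the known-synchronized cpumask, as a sketch only: the
structure and function names are hypothetical, a 32-bit mask is assumed,
and the real code would issue the actual page invalidations (local or via
IPI) where this sketch merely reports which cpus need one.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int cpumask_t;   /* assumption: one bit per cpu, <= 32 cpus */

struct sf_buf_sketch {
    cpumask_t synced;   /* cpus known to see the current kernel mapping */
};

/* A new mapping was installed: no cpu is known to be synchronized yet. */
static void sfb_remap(struct sf_buf_sketch *sf)
{
    assert(sf != NULL);
    sf->synced = 0;
}

/*
 * Make the mapping usable on the cpus in `want`.  Returns the mask of
 * cpus that still needed an invalidation; cpus already synchronized
 * cost nothing.
 */
static cpumask_t sfb_sync(struct sf_buf_sketch *sf, cpumask_t want)
{
    cpumask_t need;

    assert(sf != NULL);
    need = want & ~sf->synced;
    sf->synced |= need;
    return need;
}
```

A caller that only touches the buffer on the current cpu passes a single
bit and never triggers cross-cpu work; a caller that needs the mapping
everywhere passes the full mask and pays only for cpus not yet covered.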

Optimize sf_buf_alloc() and friends by removing both TAILQ operations in the
critical path.  TAILQ operations to remove the sf_buf from the free queue
are now done in a lazy fashion.  Most sf_buf operations allocate a buf,
work on it, and free it, so why waste time moving the sf_buf off the freelist
if we are only going to move it back onto the free list a microsecond later?

Fix a bug in sf_buf_alloc() code as it was being used by the PIPE code.
sf_buf_alloc() was unconditionally using PCATCH in its tsleep() call, which
is only correct when called from the sendfile() interface.

Optimize the PIPE code to require only local cpu_invlpg()'s when mapping
sf_buf's, greatly reducing the number of IPIs required.  On a DELL-2550,
a pipe test which explicitly blows out the sf_buf caching by using huge
buffers improves from 350 to 550 MBytes/sec.  However, note that buildworld
times were not found to have changed.

Replace the PIPE code's custom 'struct pipemapping' structure with a
struct xio and use the XIO API functions rather than its own.

20 years agoFix an unused variable warning (non-operational).
Matthew Dillon [Thu, 1 Apr 2004 17:41:19 +0000 (17:41 +0000)]
Fix an unused variable warning (non-operational).

20 years agoImplement a convenient gd_cpumask so we don't have to do 1 << gd->gd_cpuid
Matthew Dillon [Thu, 1 Apr 2004 17:40:59 +0000 (17:40 +0000)]
Implement a convenient gd_cpumask so we don't have to do 1 << gd->gd_cpuid
all the time.
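
In sketch form, the convenience amounts to computing the bit once when the
per-cpu data is set up.  The structure below is a hypothetical two-field
cut-down of the globaldata structure, and gd_init_sketch() is an
illustrative name.

```c
#include <assert.h>

/* Hypothetical cut-down of the per-cpu globaldata structure. */
struct globaldata_sketch {
    int          gd_cpuid;
    unsigned int gd_cpumask;    /* precomputed 1 << gd_cpuid */
};

/* Set up the cpu id and its mask together so they cannot disagree. */
static void gd_init_sketch(struct globaldata_sketch *gd, int cpuid)
{
    assert(gd != 0 && cpuid >= 0 && cpuid < 32);
    gd->gd_cpuid = cpuid;
    gd->gd_cpumask = 1u << cpuid;
}
```

Code that previously wrote (mask & (1 << gd->gd_cpuid)) can then test
(mask & gd->gd_cpumask) instead.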

20 years agoConvert sis(4) from vtophys to busdma.
Joerg Sonnenberger [Thu, 1 Apr 2004 16:24:57 +0000 (16:24 +0000)]
Convert sis(4) from vtophys to busdma.

Obtained-from: FreeBSD 5

20 years agoKObj extension stage II/III
Joerg Sonnenberger [Thu, 1 Apr 2004 13:50:47 +0000 (13:50 +0000)]
KObj extension stage II/III

Tokenize kobj to make it SMP safe. This is based on the assumption that
drivers are responsible for not removing active devices. This allows us
to avoid all locks / critical sections for method lookup and object
instantiation / uninstantiation, leaving only the class management.

20 years agoKObj extension stage I/III
Joerg Sonnenberger [Thu, 1 Apr 2004 08:41:24 +0000 (08:41 +0000)]
KObj extension stage I/III

Isolate the reference counting for kobj classes in special functions to
allow clean locking in the next step.

Merge all calls of kobj_class_compile either into the new
kobj_class_instantiate or into kobj_init and make it static. Same for
kobj_class_free.

Remove kobj_class_compile_static, it wasn't used and is pretty pointless
since the kobj framework is not used before the VM subsystem has been
initialized.

20 years agoRemove struct driver and make driver_t directly defined as kobj_class.
Joerg Sonnenberger [Thu, 1 Apr 2004 07:33:18 +0000 (07:33 +0000)]
Remove struct driver and make driver_t directly defined as kobj_class.
The additional *priv field is only used by the ISA/PCI compat shims and
those can use a local struct instead.

20 years agoAdd the "struct ucred *" argument to the remaining nic ioctls in LINT.
Joerg Sonnenberger [Thu, 1 Apr 2004 07:27:17 +0000 (07:27 +0000)]
Add the "struct ucred *" argument to the remaining nic ioctls in LINT.

20 years agoFix warning about unused variable
Joerg Sonnenberger [Thu, 1 Apr 2004 06:52:45 +0000 (06:52 +0000)]
Fix warning about unused variable

20 years agoRemove unused obsolete drivers.
Joerg Sonnenberger [Thu, 1 Apr 2004 06:23:18 +0000 (06:23 +0000)]
Remove unused obsolete drivers.

20 years agoGet rid of the upper-end malloc() limit for the pipe throughput test.
Matthew Dillon [Thu, 1 Apr 2004 01:47:44 +0000 (01:47 +0000)]
Get rid of the upper-end malloc() limit for the pipe throughput test.

20 years agoRemove the ip_mthread_enable sysctl option. Enough code has been converted
Jeffrey Hsu [Thu, 1 Apr 2004 01:38:53 +0000 (01:38 +0000)]
Remove the ip_mthread_enable sysctl option.  Enough code has been converted
over to threads and message-passing that true dispatching is required for
proper synchronization.

Approved by: Matt Dillon

20 years agoStyle(9) cleanup.
Chris Pressey [Wed, 31 Mar 2004 23:20:22 +0000 (23:20 +0000)]
Style(9) cleanup.

- Convert K&R-style function definitions to ANSI style.
- Remove `register' keywords.
- Remove casts to void when ignoring return values.
- Remove explicit `return' at end of void functions.
- Additional minor whitespace and formatting adjustments.
- No functional changes.

20 years agoStyle(9) cleanup to src/sys/vfs, stage 2/21: deadfs.
Chris Pressey [Wed, 31 Mar 2004 23:13:43 +0000 (23:13 +0000)]
Style(9) cleanup to src/sys/vfs, stage 2/21: deadfs.

- Convert K&R-style function definitions to ANSI style.

Submitted-by: Andre Nathan <andre@digirati.com.br>
Additional-reformatting-by: cpressey
20 years agoAdd missing sf_buf_free()'s.
Matthew Dillon [Wed, 31 Mar 2004 22:08:32 +0000 (22:08 +0000)]
Add missing sf_buf_free()'s.

Reported-by: Jonathan Lemon <jlemon@flugsvamp.com>
20 years agoCorrect type slippage in previous commit: a u_int was accidentally
Chris Pressey [Wed, 31 Mar 2004 21:03:38 +0000 (21:03 +0000)]
Correct type slippage in previous commit: a u_int was accidentally
turned into a u_long.  Change it back.

20 years agoCleanup the forking behavior of the CAPS client test program.
Matthew Dillon [Wed, 31 Mar 2004 20:27:34 +0000 (20:27 +0000)]
Cleanup the forking behavior of the CAPS client test program.

20 years agoAllow the child priority (receive side of the pipe test) to be specified
Matthew Dillon [Wed, 31 Mar 2004 20:27:09 +0000 (20:27 +0000)]
Allow the child priority (receive side of the pipe test) to be specified
on the command line.  Default it to be the same as the parent.

20 years agoClarify the purpose of liby:
Chris Pressey [Wed, 31 Mar 2004 20:25:37 +0000 (20:25 +0000)]
Clarify the purpose of liby:

- Mention it in the FILES section of the yacc(1) man page.
- Create an MLINK from liby.3 -> yacc.1 so users can `man liby'.

Approved-by: dillon
20 years agoCleanup libcaps to support recent LWKT changes. Add TDF_SYSTHREAD back
Matthew Dillon [Wed, 31 Mar 2004 20:23:42 +0000 (20:23 +0000)]
Cleanup libcaps to support recent LWKT changes.  Add TDF_SYSTHREAD back
to sys/thread.h (libcaps needs it).

20 years agoRemove `-ly' and `${LIBY}' from our Makefiles. Linking to liby is not
Chris Pressey [Wed, 31 Mar 2004 20:22:14 +0000 (20:22 +0000)]
Remove `-ly' and `${LIBY}' from our Makefiles.  Linking to liby is not
necessary for any of our programs, as they all supply their own main()
and yyerror() functions.

Also add $DragonFly$ to these files as needed for the commit.

Approved-by: dillon
20 years agoTrash the vmspace_copy() hacks that CAPS was previously using. No other
Matthew Dillon [Wed, 31 Mar 2004 19:29:26 +0000 (19:29 +0000)]
Trash the vmspace_copy() hacks that CAPS was previously using.  No other
subsystem uses these hacks and the new XIO mechanism is far, far superior.

20 years agoChange CAPS over to use XIO instead of the vmspace_copy() junk it was using
Matthew Dillon [Wed, 31 Mar 2004 19:28:29 +0000 (19:28 +0000)]
Change CAPS over to use XIO instead of the vmspace_copy() junk it was using
before.  This almost doubles CAPS IPC messaging performance.

Also correct a number of memory leaks due to incorrect reference counting.

20 years agoHook XIO up to the kernel build.
Matthew Dillon [Wed, 31 Mar 2004 19:24:28 +0000 (19:24 +0000)]
Hook XIO up to the kernel build.

20 years agoInitial XIO implementation. XIOs represent data through a list of VM pages
Matthew Dillon [Wed, 31 Mar 2004 19:24:17 +0000 (19:24 +0000)]
Initial XIO implementation.  XIOs represent data through a list of VM pages
rather than mapped KVM, allowing them to be passed between threads without
having to worry about KVM mapping overheads, TLB invalidation, and so forth.

This initial implementation supports creating XIOs from user or kernel data
and copying from an XIO to a user or kernel buffer or a uio.  XIOs are intended
to be used with CAPS, PIPES, VFS, DEV, and other I/O paths.

The XIO concept is an outgrowth of Alan Cox's unique use of target-side
SF_BUF mapping to improve pipe performance.
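
A minimal userland sketch of the page-list idea described above, not the
committed kernel code.  The names toy_xio, toy_xio_copy_out, TOY_PAGE_SIZE
and TOY_XIO_MAXPAGES are hypothetical, and plain byte arrays stand in for
VM pages.  It only illustrates copying from a list of pages into a flat
buffer, honoring a starting offset in the first page.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE    4096	/* assumed page size for the sketch */
#define TOY_XIO_MAXPAGES 8	/* hypothetical limit */

/* Hypothetical stand-in for an XIO: data described by a list of pages. */
struct toy_xio {
	unsigned char *pages[TOY_XIO_MAXPAGES];	/* stand-ins for VM pages */
	int	npages;		/* number of valid entries in pages[] */
	size_t	offset;		/* byte offset of the data in pages[0] */
	size_t	bytes;		/* total valid bytes described */
};

/* Copy up to len bytes out of the page list into dst; returns bytes copied. */
static size_t
toy_xio_copy_out(const struct toy_xio *xio, void *dst, size_t len)
{
	unsigned char *out = dst;
	size_t left = len < xio->bytes ? len : xio->bytes;
	size_t off = xio->offset;
	size_t copied = 0;
	int i;

	assert(xio->offset < TOY_PAGE_SIZE);
	for (i = 0; i < xio->npages && left > 0; i++) {
		size_t n = TOY_PAGE_SIZE - off;	/* bytes available in this page */

		if (n > left)
			n = left;
		memcpy(out + copied, xio->pages[i] + off, n);
		copied += n;
		left -= n;
		off = 0;	/* only the first page starts mid-page */
	}
	return copied;
}
```

Because the receiver only needs the page list, no mapping has to be shared
between sender and receiver ahead of time; in this sketch the copy step is
the only place the page contents are touched.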

20 years agoM_NOWAIT => M_INTWAIT conversion. These subsystems are way too crucial to
Joerg Sonnenberger [Wed, 31 Mar 2004 16:39:20 +0000 (16:39 +0000)]
M_NOWAIT => M_INTWAIT conversion. These subsystems are way too crucial to
have failing memory allocations. At least some of these failures are
handled via panic anyway.

20 years agoThe existing hash algorithm in bufhash() does not distribute entries
David Rhodus [Wed, 31 Mar 2004 15:32:53 +0000 (15:32 +0000)]
The existing hash algorithm in bufhash() does not distribute entries
very well across buckets, especially in the case of cylinder group blocks
which are located at a sequence of locations that are a multiple of a large
power of two apart.  In the case of large file systems, one or possibly
a few of the hash chains can get excessively long.  Replace the existing
hash algorithm with a variation on the Fibonacci hash.

Merged from FreeBSD
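
A generic sketch of Fibonacci (multiplicative) hashing, shown only to
illustrate why it spreads keys that are a large power of two apart.  This
is not the bufhash() code from the commit: the helper names fib_hash and
mask_hash, the 64-bit golden-ratio multiplier, and the bucket-count
parameter are all assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* 2^64 divided by the golden ratio, as an odd 64-bit constant. */
#define FIB_MULT_64	0x9E3779B97F4A7C15ULL

/*
 * Hypothetical helper: map a key to one of 2^bits buckets by multiplying
 * and keeping the top bits, so every key bit influences the bucket.
 */
static inline uint32_t
fib_hash(uint64_t key, unsigned bits)
{
	assert(bits >= 1 && bits <= 32);
	return (uint32_t)((key * FIB_MULT_64) >> (64 - bits));
}

/*
 * Naive comparison: keep only the low bits.  Keys that differ by a
 * multiple of 2^bits all collide in the same bucket.
 */
static inline uint32_t
mask_hash(uint64_t key, unsigned bits)
{
	assert(bits >= 1 && bits <= 32);
	return (uint32_t)(key & ((1ULL << bits) - 1));
}
```

With 256 buckets (bits = 8), the keys 1 << 16 and 2 << 16 both land in
bucket 0 under mask_hash(), while fib_hash() places them in buckets 0x79
and 0xF3.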

20 years agoOnly enter into wildcard hash table if bind succeeds.
Jeffrey Hsu [Wed, 31 Mar 2004 10:23:10 +0000 (10:23 +0000)]
Only enter into wildcard hash table if bind succeeds.

20 years agoOnly enter into wildcard hash table if bind succeeds.
Jeffrey Hsu [Wed, 31 Mar 2004 07:21:38 +0000 (07:21 +0000)]
Only enter into wildcard hash table if bind succeeds.

20 years agoStyle(9) cleanup to src/sys/vfs, stage 1/21: coda.
Chris Pressey [Wed, 31 Mar 2004 02:34:37 +0000 (02:34 +0000)]
Style(9) cleanup to src/sys/vfs, stage 1/21: coda.

- Convert K&R-style function definitions to ANSI style.

Submitted-by: Andre Nathan <andre@digirati.com.br>
Minor-whitespace-tweaks-by: cpressey
20 years agoOnly enter wildcard sockets into the wildcard hash table.
Jeffrey Hsu [Wed, 31 Mar 2004 00:43:09 +0000 (00:43 +0000)]
Only enter wildcard sockets into the wildcard hash table.

20 years agoSecond major scheduler patch. This corrects interactive issues that were
Matthew Dillon [Tue, 30 Mar 2004 19:14:18 +0000 (19:14 +0000)]
Second major scheduler patch.  This corrects interactive issues that were
introduced in the pipe sf_buf patch.

Split need_resched() into need_user_resched() and need_lwkt_resched().
Userland reschedules are requested when a process is scheduled with a higher
priority than the currently running process, and LWKT reschedules are
requested when a thread is scheduled with a higher priority than the
currently running thread.  As before, these are ASTs; LWKTs are not
preemptively switched while running in the kernel.
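
A simplified sketch of the two separate reschedule requests, under stated
assumptions: the toy_ names, the flag values and the per-cpu structure are
hypothetical, and a larger number is treated as a higher priority purely
for the example.  The real kernel functions and flags differ.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_AST_USER_RESCHED	0x01	/* hypothetical flag values */
#define TOY_AST_LWKT_RESCHED	0x02

/* Hypothetical per-cpu state for the sketch. */
struct toy_cpu {
	unsigned reqflags;	/* pending AST requests */
	int	cur_proc_pri;	/* priority of the running user process */
	int	cur_thread_pri;	/* priority of the running LWKT thread */
};

/* Request a userland reschedule only for a higher-priority process. */
static void
toy_schedule_proc(struct toy_cpu *cpu, int pri)
{
	if (pri > cpu->cur_proc_pri)
		cpu->reqflags |= TOY_AST_USER_RESCHED;
}

/* Request an LWKT reschedule only for a higher-priority thread. */
static void
toy_schedule_thread(struct toy_cpu *cpu, int pri)
{
	if (pri > cpu->cur_thread_pri)
		cpu->reqflags |= TOY_AST_LWKT_RESCHED;
}

/*
 * Called at an AST point (for example on return to user mode): hand back
 * the pending requests and clear them.  Nothing is switched at the time
 * the request is made.
 */
static unsigned
toy_ast_collect(struct toy_cpu *cpu)
{
	unsigned pending;

	assert(cpu != NULL);
	pending = cpu->reqflags;
	cpu->reqflags = 0;
	return pending;
}
```

Keeping the two requests in separate flags lets the return-to-user path
decide between a userland reschedule and an LWKT switch from the flags
alone.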

Exclusively use the resched wanted flags to determine whether to reschedule
or call lwkt_switch() upon return to user mode.  We were previously also
testing the LWKT run queue for higher priority threads, but this was causing
inefficient scheduler interactions when two processes are doing tightly
bound synchronous IPC (e.g. using PIPEs) because in DragonFly the LWKT
priority of a thread is raised when it enters the kernel, and lowered when
it tries to return to userland.  The wakeups occurring in the pipe code
were causing extra quick-flip thread switches.

Introduce a new tsleep() flag which disables the need_lwkt_resched() call
when the sleeping thread is woken up.   This is used by the PIPE code in
the synchronous direct-write PIPE case to avoid the above problem.

Redocument and revamp the ESTCPU code.  The original changes reduced the
interrupt rate from 100Hz (FBsd-4 and FBsd-5) to 20Hz, but did not compensate
for the slower ramp-up time.  This commit introduces a 'virtual' ESTCPU
frequency which compensates without us having to bump up the actual systimer
interrupt rate.

Redo the P_CURPROC methodology, which is used by the userland scheduler
to manage processes running in userland.  Create a globaldata->gd_uschedcp
process pointer which represents the current running-in-userland (or about
to be running in userland) process, and carefully recode acquire_curproc()
to allow this gd_uschedcp designation to be stolen from other threads trying
to return to userland without having to request a reschedule (which would
have to switch back to those threads to release the designation).  This
reduces the number of unnecessary context switches that occur due to
scheduler interactions.  Also note that this specifically solves the case
where there might be several threads running in the kernel which are trying
to return to userland at the same time.  A heuristic check against gd_upri
is used to select the correct thread for scheduling to userland 'most of the
time'.  When the correct thread is not selected, we fall back to the old
behavior of forcing a reschedule.

Add debugging sysctl variables to better track userland scheduler efficiency.

With these changes pipe statistics are further improved.  Though some
scheduling aberrations still exist(1), the previous scheduler had totally
broken interactive processes and this one does not.

Tests on AMD64 3200+ FN85MB (64KB L1, 1MB L2)

BLKSIZE   BEFORE     NEWPIPE     NOW
          MBytes/s   MBytes/s    MBytes/s
256KB     1900       2200        2250
 64KB     1800       2200        2250
 32KB     -          -           3300
 16KB     1650       2500-3000   2600-3200
  8KB     1400       2300        2000-2400(1)
  4KB     1300       1400-1500   1500-1700

20 years agoAdd SI_SUB_LOCK as sysinit priority for the initialisation of tokens and
Joerg Sonnenberger [Tue, 30 Mar 2004 17:18:58 +0000 (17:18 +0000)]
Add SI_SUB_LOCK as sysinit priority for the initialisation of tokens and
lockmgr locks. This priority should not be abused, since it is higher than
SI_SUB_VM.

20 years agoStyle(9) cleanup.
Chris Pressey [Tue, 30 Mar 2004 02:59:00 +0000 (02:59 +0000)]
Style(9) cleanup.

- Convert K&R-style function definitions to ANSI style.
- Remove ``register'' keywords.
- Remove casts to (void) when ignoring return values.
- Sort #include's.
- Minor whitespace adjustments.
- No functional changes.

20 years agoStyle(9) cleanup.
Chris Pressey [Tue, 30 Mar 2004 02:30:59 +0000 (02:30 +0000)]
Style(9) cleanup.

- Sort #include's.
- Convert to fully ANSI function definitions: use (void) instead of ()
- No functional changes.

20 years agoStyle(9) cleanup.
Chris Pressey [Tue, 30 Mar 2004 01:14:22 +0000 (01:14 +0000)]
Style(9) cleanup.

- Convert K&R-style function definitions to ANSI style.
- Remove ``register'' keywords.
- char *argv[] -> char **argv
- No functional changes.

20 years agoProtect the mntvnode scan for coda with the proper token. Since we do not
Matthew Dillon [Mon, 29 Mar 2004 20:52:17 +0000 (20:52 +0000)]
Protect the mntvnode scan for coda with the proper token.  Since we do not
block we do not need to use the vmntvnodescan() facility (as long as we
take the appropriate precautions), and we can remove the v_mount != mp
test.

20 years agoCount vnodes held on the mount list simply by using the
Matthew Dillon [Mon, 29 Mar 2004 20:43:52 +0000 (20:43 +0000)]
Count vnodes held on the mount list simply by using the
mp->mnt_nvnodelistsize field, instead of physically looping on the
vnode list.

Suggested-by: someone, not sure. Hiten or David maybe.
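
A toy illustration of the idea, not the kernel code: when the list size is
maintained as entries are added, counting becomes a field read instead of
a list walk.  The toy_mount and toy_vnode structures and the helper names
are hypothetical; only the field name nvnodelistsize echoes the commit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for a vnode and its mount point. */
struct toy_vnode {
	struct toy_vnode *next;
};

struct toy_mount {
	struct toy_vnode *list;	/* vnodes held on this mount */
	int	nvnodelistsize;	/* kept in step with the list */
};

/* Insert at the head and keep the counter in step with the list. */
static void
toy_insert(struct toy_mount *mp, struct toy_vnode *vp)
{
	assert(mp != NULL && vp != NULL);
	vp->next = mp->list;
	mp->list = vp;
	mp->nvnodelistsize++;
}

/* Old approach in the sketch: walk the whole list, O(n). */
static int
toy_count_by_loop(const struct toy_mount *mp)
{
	const struct toy_vnode *vp;
	int n = 0;

	for (vp = mp->list; vp != NULL; vp = vp->next)
		n++;
	return n;
}

/* New approach in the sketch: read the maintained field, O(1). */
static int
toy_count_by_field(const struct toy_mount *mp)
{
	return mp->nvnodelistsize;
}
```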