14 years agoWell, ok, if you are going to turn off writable strings, then the code
Matthew Dillon [Tue, 13 Jun 2006 22:12:16 +0000 (22:12 +0000)]
Well, ok, if you are going to turn off writable strings, then the code
has to be fixed to not actually try to write to strings :-)

14 years agoAdd two more system calls, __accept and __connect. The old accept() and
Matthew Dillon [Tue, 13 Jun 2006 21:04:17 +0000 (21:04 +0000)]
Add two more system calls, __accept and __connect.  The old accept() and
connect() are still present but will eventually be replaced with a libc

The new system calls add a flags argument, allowing O_FBLOCKING
or O_FNONBLOCKING to be passed to override the non-blocking setting in
the file pointer.  They are intended to be used by libc_r.

14 years agoUse the _SELECT_DECLARED method to include the select() prototype instead
Matthew Dillon [Tue, 13 Jun 2006 20:01:53 +0000 (20:01 +0000)]
Use the _SELECT_DECLARED method to include the select() prototype instead
of what we had before.  This brings the handling of the prototype in line
with the others.

14 years agoAlso obey securenets when TCP wrappers are enabled.
Simon Schubert [Tue, 13 Jun 2006 12:38:37 +0000 (12:38 +0000)]
Also obey securenets when TCP wrappers are enabled.

Submitted-by: Ancient
Taken-from: FreeBSD SA

14 years agoDon't allow backslash characters in smbfs path requests.
Simon Schubert [Tue, 13 Jun 2006 12:31:57 +0000 (12:31 +0000)]
Don't allow backslash characters in smbfs path requests.

Submitted-by: Ancient
Taken-from: FreeBSD SA

14 years agoThe pread/preadv/pwrite/pwritev system calls have been renamed. Create
Matthew Dillon [Tue, 13 Jun 2006 08:17:42 +0000 (08:17 +0000)]
The pread/preadv/pwrite/pwritev system calls have been renamed.  Create
wrappers in libc for the renamed functions.

14 years agoAdd kernel syscall support for explicit blocking and non-blocking I/O
Matthew Dillon [Tue, 13 Jun 2006 08:12:04 +0000 (08:12 +0000)]
Add kernel syscall support for explicit blocking and non-blocking I/O
regardless of the setting applied to the file pointer.

send/sendmsg/sendto/recv/recvmsg/recvfrom: New MSG_ flags defined in
sys/socket.h may be passed to these functions to override the settings
applied to the file pointer on a per-I/O basis.

MSG_FBLOCKING    - Force the operation to be blocking
MSG_FNONBLOCKING - Force the operation to be non-blocking

pread/preadv/pwrite/pwritev: These system calls have been renamed and
wrappers will be added to libc.  The new system calls are prefixed with
a double underscore (like getcwd vs __getcwd) and include an additional
flags argument.  The new flags are defined in sys/fcntl.h and may be
used to override settings applied to the file pointer on a per-I/O basis.

Additionally, the internal __ versions of these functions now accept an
offset of -1 to mean 'degenerate into a read/readv/write/writev' (i.e.
use the offset in the file pointer and update it on completion).

O_FBLOCKING - Force the operation to be blocking
O_FNONBLOCKING - Force the operation to be non-blocking
O_FAPPEND - Force the write operation to append (to a regular file)
O_FOFFSET - (implied if the offset != -1) - offset is valid
O_FSYNCWRITE - Force a synchronous write
O_FASYNCWRITE - Force an asynchronous write
O_FUNBUFFERED - Force an unbuffered operation (O_DIRECT)
O_FBUFFERED - Force a buffered operation (negate O_DIRECT)

If the flags do not specify an operation (e.g. neither FBLOCKING nor
FNONBLOCKING is set), then the settings in the file pointer are used.
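
The override rule can be sketched as follows.  This is only an illustration:
the EX_ bit values and the io_is_nonblocking() helper are hypothetical, since
this log does not give the real definitions from sys/fcntl.h.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical bit values for illustration only.  The real flags are
 * defined in sys/fcntl.h and their values are not given in this log.
 */
#define EX_O_NONBLOCK     0x0004  /* stand-in for the descriptor's O_NONBLOCK */
#define EX_O_FBLOCKING    0x0100  /* stand-in for O_FBLOCKING */
#define EX_O_FNONBLOCKING 0x0200  /* stand-in for O_FNONBLOCKING */

/* Decide whether one I/O operation should be non-blocking. */
bool
io_is_nonblocking(int file_flags, int io_flags)
{
    /* Passing both overrides at once is treated as a caller bug here. */
    assert((io_flags & EX_O_FBLOCKING) == 0 ||
           (io_flags & EX_O_FNONBLOCKING) == 0);

    if (io_flags & EX_O_FBLOCKING)
        return false;                           /* per-I/O override wins */
    if (io_flags & EX_O_FNONBLOCKING)
        return true;                            /* per-I/O override wins */
    return (file_flags & EX_O_NONBLOCK) != 0;   /* fall back to the file */
}
```

The point of the design is that the descriptor's own flags are never
modified, so a thread library can force one operation to be non-blocking
without a pair of fcntl() calls around it.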

The original system calls will become wrappers in libc, without the flags
arguments.  The new system calls will be made available to libc_r to allow
it to perform non-blocking I/O without having to mess with a descriptor's
file flags.

NOTE: the new __pread and __pwrite system calls are backwards compatible
with the originals due to a pad byte that libc always set to 0.
The new __preadv and __pwritev system calls are NOT backwards compatible,
but since they were added to HEAD just two months ago I have decided
to not renumber them either.

NOTE: The subrev has been bumped to 1.5.4 and installworld will refuse to
install if you are not running at least a 1.5.4 kernel.

14 years agoDon't depend on POSIX namespace pollution with u_char from sys/types.h.
Joerg Sonnenberger [Sun, 11 Jun 2006 22:54:18 +0000 (22:54 +0000)]
Don't depend on POSIX namespace pollution with u_char from sys/types.h.

14 years agoInclude unistd.h to get isatty(). Has been lurking in my release tree
Joerg Sonnenberger [Sun, 11 Jun 2006 12:57:31 +0000 (12:57 +0000)]
Include unistd.h to get isatty(). Has been lurking in my release tree
for ages.

14 years agosync with FreeBSD rev 1.6:
YONETANI Tomokazu [Sun, 11 Jun 2006 08:43:34 +0000 (08:43 +0000)]
sync with FreeBSD rev 1.6:
This driver doesn't need to include <sys/bus_private.h> either.

14 years agoRemove the select_curproc vector from the usched structure. It is used
Matthew Dillon [Sat, 10 Jun 2006 20:19:39 +0000 (20:19 +0000)]
Remove the select_curproc vector from the usched structure.  It is used
locally within each scheduler but is not called by the kernel through
its vector.

Suggested-by: Michal Belczyk <belczyk@bsd.krakow.pl>
14 years agoMove selinfo stuff to the separate header sys/selinfo.h. Make sys/select.h
Matthew Dillon [Sat, 10 Jun 2006 20:00:17 +0000 (20:00 +0000)]
Move selinfo stuff to the separate header sys/selinfo.h.  Make sys/select.h
POSIX compatible.

Note: Modifications from the original patch.  For the moment maintain
compatibility with BSD manual pages by ensuring that the prototype for
the select() function is declared in both sys/select.h and unistd.h.

Submitted-by: Alexey Slynko <slynko@tronet.ru>
14 years agoWe shouldn't have to fninit to make the FP unit usable for MMX based copies.
Matthew Dillon [Sat, 10 Jun 2006 18:07:05 +0000 (18:07 +0000)]
We shouldn't have to fninit to make the FP unit usable for MMX based copies.
fnclex should be sufficient.

Reported-by: "Attilio Rao" <attilio@freebsd.org>
Info-originally-from: Bruce Evans

14 years agoFix namespace pollution.
Matthew Dillon [Sat, 10 Jun 2006 17:37:08 +0000 (17:37 +0000)]
Fix namespace pollution.

Submitted-by: Alexey Slynko <slynko@tronet.ru>
14 years agoFix typo.
Sascha Wildner [Sat, 10 Jun 2006 15:52:07 +0000 (15:52 +0000)]
Fix typo.

14 years agoAdd a new utility, 'pctrack', which dumps program counter tracking data
Matthew Dillon [Thu, 8 Jun 2006 18:48:30 +0000 (18:48 +0000)]
Add a new utility, 'pctrack', which dumps program counter tracking data
recorded by the kernel.  The kernel must be compiled with DEBUG_PCTRACK.

14 years agoAdd an option, DEBUG_PCTRACK, which will record the program counter of
Matthew Dillon [Thu, 8 Jun 2006 18:25:48 +0000 (18:25 +0000)]
Add an option, DEBUG_PCTRACK, which will record the program counter of
the code being interrupted from the statistics clock interrupt.

14 years agoRemove the asynchronous system call interface sendsys/waitsys. It was an
Matthew Dillon [Wed, 7 Jun 2006 03:02:11 +0000 (03:02 +0000)]
Remove the asynchronous system call interface sendsys/waitsys.  It was an
idea before its time.

14 years agoAdd missing crit_exit()
Matthew Dillon [Tue, 6 Jun 2006 19:30:12 +0000 (19:30 +0000)]
Add missing crit_exit()

Reported-by: Sascha Wildner <saw@online.de>
14 years agoSome netisr's are just used to wake up a driver via schednetisr(). The
Matthew Dillon [Tue, 6 Jun 2006 18:04:16 +0000 (18:04 +0000)]
Some netisr's are just used to wake up a driver via schednetisr().  The
netmsg's sent to these ISR's must be replied, whereas the netmsg's sent
to packet-handling ISRs must not be replied (because the netmsg is embedded
in the mbuf).

In the case of notifications via schednetisr(), we reply the message before
we run the queue in order to interlock the wakeup message with the queue.
Otherwise we could end up with a race that leaves packets in the queue
without a wakeup to process them.

Reported-by: Stefan Krueger <skrueger@meinberlikomm.de>
Investigated-by: YONETANI Tomokazu <qhwt+dfly@les.ath.cx>
14 years agoCleanup crit_*() usage to reduce bogus warnings printed to the console
Matthew Dillon [Mon, 5 Jun 2006 21:03:03 +0000 (21:03 +0000)]
Cleanup crit_*() usage to reduce bogus warnings printed to the console
when a kernel is compiled with DEBUG_CRIT_SECTIONS.

NOTE: DEBUG_CRIT_SECTIONS does a direct pointer comparison rather than a
strcmp in order to reduce overhead.  Supply a string constant in cases
where the string identifier might be (intentionally) different otherwise.
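
A minimal sketch of why a shared string constant matters when identifiers are
compared by pointer.  The ex_ names below are hypothetical and are not the
kernel's actual debugging structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* One shared constant: every user passes this exact pointer. */
const char ex_vfsync_id[] = "vfsync_bp";

/* Hypothetical record of the identifier supplied at section entry. */
struct ex_crit_debug {
    const char *id;
};

void
ex_crit_enter(struct ex_crit_debug *cd, const char *id)
{
    assert(id != NULL);
    cd->id = id;
}

/* Cheap check: a single pointer comparison, no string walk. */
bool
ex_crit_exit_matches(const struct ex_crit_debug *cd, const char *id)
{
    return cd->id == id;
}

/* Expensive check, shown only for contrast. */
bool
ex_crit_exit_matches_strcmp(const struct ex_crit_debug *cd, const char *id)
{
    return strcmp(cd->id, id) == 0;
}
```

Two strings with identical contents but different addresses pass the strcmp
check and fail the pointer check, which is the kind of bogus warning that
supplying one explicit constant avoids.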

14 years agoAdd an INVARIANTS test in both the trap code and system call code. The
Matthew Dillon [Mon, 5 Jun 2006 20:59:19 +0000 (20:59 +0000)]
Add an INVARIANTS test in both the trap code and system call code.  The
system will now panic if the critical section count recorded at the beginning
of a system call or trap does not match the count recorded at the end.
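
The check can be pictured roughly like this.  It is a userland sketch with
hypothetical ex_ names; it returns false where the kernel would panic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-thread critical section count. */
struct ex_thread {
    int crit_count;
};

typedef void (*ex_handler_t)(struct ex_thread *);

/*
 * Run a handler and report whether it left the critical section count
 * where it found it.
 */
bool
ex_dispatch_checked(struct ex_thread *td, ex_handler_t handler)
{
    int entry_count;

    assert(handler != NULL);
    entry_count = td->crit_count;   /* recorded at the beginning */
    handler(td);
    return td->crit_count == entry_count;   /* compared at the end */
}

/* A handler that enters and exits a critical section correctly. */
void
ex_balanced_handler(struct ex_thread *td)
{
    td->crit_count++;
    td->crit_count--;
}

/* A handler with a missing exit, the kind of bug the test catches. */
void
ex_leaky_handler(struct ex_thread *td)
{
    td->crit_count++;
}
```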

14 years agoRemove an inappropriate crit_exit() in ehci.c and add a missing crit_exit()
Matthew Dillon [Mon, 5 Jun 2006 20:56:54 +0000 (20:56 +0000)]
Remove an inappropriate crit_exit() in ehci.c and add a missing crit_exit()
in kern/vfs_subr.c.  Specify string IDs in vfsync_bp() so we don't get
complaints on the console when the kernel is compiled with

The missing crit_exit() in kern/vfs_subr.c was causing the kernel to leave
threads in a critical section, causing interrupts to stop operating and
cpu-bound userland programs to lock up the rest of the system.

Reported-by: Sascha Wildner <saw@online.de>, others.
14 years agoRemove lwp_cpumask assignment. lwp_cpumask is handled in the bcopy section.
Matthew Dillon [Mon, 5 Jun 2006 18:02:14 +0000 (18:02 +0000)]
Remove lwp_cpumask assignment. lwp_cpumask is handled in the bcopy section.

14 years agoFix a WARNS=3 gcc warning related to longjmp clobbers, fix a possible use
Matthew Dillon [Mon, 5 Jun 2006 15:55:13 +0000 (15:55 +0000)]
Fix a WARNS=3 gcc warning related to longjmp clobbers, fix a possible use
of an uninitialized variable, retval, in histedit.c.

Submitted-by: "Douglas S. Keester" <dkeester@comcast.net>
14 years agoFix a file descriptor leak in cam_lookup_pass() when the ioctl to
Matthew Dillon [Mon, 5 Jun 2006 15:02:24 +0000 (15:02 +0000)]
Fix a file descriptor leak in cam_lookup_pass() when the ioctl to
find the passthru device fails.

Submitted-by: Gary <gary@velocity-servers.net>
Obtained from: FreeBSD

14 years agoModify kern/makesyscall.sh to prefix all kernel system call procedures
Matthew Dillon [Mon, 5 Jun 2006 07:26:11 +0000 (07:26 +0000)]
Modify kern/makesyscall.sh to prefix all kernel system call procedures
with "sys_".  Modify all related kernel procedures to use the new naming
convention.  This gets rid of most of the namespace overloading between
the kernel and standard header files.

14 years agoFix a minor bug in the last commit. lwp_cpumask has to be in the LWP copy
Matthew Dillon [Mon, 5 Jun 2006 07:23:19 +0000 (07:23 +0000)]
Fix a minor bug in the last commit.  lwp_cpumask has to be in the LWP copy
section, not the LWP zero section.  This prevented the system from booting.

14 years agoRegenerate.
David Xu [Mon, 5 Jun 2006 00:35:05 +0000 (00:35 +0000)]

14 years agoOops, the usched_set syscall prototype should be updated.
David Xu [Mon, 5 Jun 2006 00:33:36 +0000 (00:33 +0000)]
Oops, the usched_set syscall prototype should be updated.

14 years agoAllow userland to bind a process to specific CPUs. The initial
David Xu [Mon, 5 Jun 2006 00:32:37 +0000 (00:32 +0000)]
Allow userland to bind a process to specific CPUs. The initial
implementation only allows current process PID to be used, but
an improved version will allow any PID to be specified.

Reviewed by: dillon

14 years agoRemove LWKT reader-writer locks (kern/lwkt_rwlock.c). Remove lwkt_wait
Matthew Dillon [Sun, 4 Jun 2006 21:09:50 +0000 (21:09 +0000)]
Remove LWKT reader-writer locks (kern/lwkt_rwlock.c).  Remove lwkt_wait
queues (only RW locks used them).  Convert remaining uses of RW locks to
LOCKMGR locks.

In recent months lockmgr locks have been simplified to the point where we
no longer need a lighter-weight fully blocking lock.  The removal also
simplifies lwkt_schedule() in that it no longer needs a special case to
deal with wait lists.

14 years agoFix blocking races in various *_locate() functions within softupdates.
Matthew Dillon [Sun, 4 Jun 2006 19:37:23 +0000 (19:37 +0000)]
Fix blocking races in various *_locate() functions within softupdates.
If softupdates blocks in malloc(), its understanding of the existence
of a data structure may change.  A relookup of the data structure is
required to ensure that the assumed state still holds.
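
The lookup, allocate, relookup pattern looks roughly like the sketch below.
All ex_ names are hypothetical, and the while_blocked hook only simulates
another thread inserting the entry while the allocation sleeps; this is not
the softupdates code itself.

```c
#include <assert.h>
#include <stdlib.h>

struct ex_dep {
    int key;
    struct ex_dep *next;
};

struct ex_table {
    struct ex_dep *head;
};

struct ex_dep *
ex_find(struct ex_table *t, int key)
{
    struct ex_dep *dep;

    for (dep = t->head; dep != NULL; dep = dep->next) {
        if (dep->key == key)
            return dep;
    }
    return NULL;
}

/* Called at the point where a kernel allocation could have blocked. */
typedef void (*ex_block_hook_t)(struct ex_table *, int);

struct ex_dep *
ex_locate(struct ex_table *t, int key, ex_block_hook_t while_blocked)
{
    struct ex_dep *dep, *fresh;

    if ((dep = ex_find(t, key)) != NULL)
        return dep;

    fresh = malloc(sizeof(*fresh));     /* in the kernel this may sleep */
    if (fresh == NULL)
        return NULL;
    if (while_blocked != NULL)
        while_blocked(t, key);          /* the world may change here */

    /* Relookup: the earlier "not found" result may be stale by now. */
    if ((dep = ex_find(t, key)) != NULL) {
        free(fresh);
        return dep;
    }
    fresh->key = key;
    fresh->next = t->head;
    t->head = fresh;
    return fresh;
}

/* Simulated racing thread that inserts the same key first. */
void
ex_racing_insert(struct ex_table *t, int key)
{
    struct ex_dep *dep;

    assert(ex_find(t, key) == NULL);
    dep = malloc(sizeof(*dep));
    if (dep == NULL)
        return;
    dep->key = key;
    dep->next = t->head;
    t->head = dep;
}
```

Without the second ex_find() the table would end up with two entries for the
same key whenever the race occurs.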

Taken-from: FreeBSD/1.166

14 years agoAn inodedep might go away after the bwrite, do not try to access
Matthew Dillon [Sun, 4 Jun 2006 18:25:44 +0000 (18:25 +0000)]
An inodedep might go away after the bwrite, do not try to access
potentially freed memory.

Taken-from: FreeBSD/1.166

14 years agoMisc cleanup - move another namecache list scan into vfs_cache.c
Matthew Dillon [Sun, 4 Jun 2006 17:33:36 +0000 (17:33 +0000)]
Misc cleanup - move another namecache list scan into vfs_cache.c

14 years agoSilence warning.
Sascha Wildner [Sun, 4 Jun 2006 11:29:38 +0000 (11:29 +0000)]
Silence warning.

14 years agoFix `/etc/rc.d/dhclient stop' by explicitly returning 0, otherwise
YONETANI Tomokazu [Sat, 3 Jun 2006 10:41:26 +0000 (10:41 +0000)]
Fix `/etc/rc.d/dhclient stop' by explicitly returning 0, otherwise
the exit code from dhclient_prestop() will inherit the one from
ifalias_down().  Shell functions ipx_down() and ifalias_down()
return false (non-zero) when no IPX addresses or IP aliases are found,
but it shouldn't prevent dhclient from releasing the lease.

14 years agoFix a bug in the linux emulator's getdents_common() function. The function
Matthew Dillon [Sat, 3 Jun 2006 08:06:31 +0000 (08:06 +0000)]
Fix a bug in the linux emulator's getdents_common() function.  The function
was looping without checking for the directory EOF, leading to a situation
where an infinite loop could occur.

Reported-by: Rumko <rumcic@gmail.com>, YONETANI Tomokazu <qhwt+dfly@les.ath.cx>
14 years agoRename arguments to atomic_cmpset_int() to make their function more obvious.
Matthew Dillon [Fri, 2 Jun 2006 20:32:05 +0000 (20:32 +0000)]
Rename arguments to atomic_cmpset_int() to make their function more obvious.

14 years agoFix a file descriptor leak, add a missing vx_put() after linprocfs
Matthew Dillon [Fri, 2 Jun 2006 19:44:39 +0000 (19:44 +0000)]
Fix a file descriptor leak, add a missing vx_put() after linprocfs
destroys a vnode.

Reported-by: joerg@britannica.bec.de
14 years agoAdd an option which dumps the filename from the vnode's namecache link.
Matthew Dillon [Fri, 2 Jun 2006 19:39:48 +0000 (19:39 +0000)]
Add an option which dumps the filename from the vnode's namecache link.

14 years agoRemove vnode->v_id. This field used to be used to identify stale namecache
Matthew Dillon [Fri, 2 Jun 2006 04:59:54 +0000 (04:59 +0000)]
Remove vnode->v_id.  This field used to be used to identify stale namecache
entries related to parent directory linkages.  It was a terrible hack and
fortunately is no longer used.

14 years agonamecache->nc_refs is no longer protected by the MP lock. Atomic ops must
Matthew Dillon [Thu, 1 Jun 2006 22:45:19 +0000 (22:45 +0000)]
namecache->nc_refs is no longer protected by the MP lock.  Atomic ops must
be used.
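
As an illustration of the requirement, here is a reference count kept with
atomic operations.  It uses C11 <stdatomic.h> as a userland stand-in; the
kernel has its own atomic primitives, and the ex_ names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a namecache entry's reference count. */
struct ex_ncache {
    atomic_int refs;
};

void
ex_hold(struct ex_ncache *nc)
{
    atomic_fetch_add(&nc->refs, 1);
}

/* Returns true when this call dropped the last reference. */
bool
ex_drop(struct ex_ncache *nc)
{
    int old = atomic_fetch_sub(&nc->refs, 1);

    assert(old > 0);    /* dropping below zero is a caller bug */
    return old == 1;
}
```

A plain refs++ is a separate load, add, and store, so two cpus doing it at
once can lose an update once no global lock serializes them.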

14 years agoAdd some mdoc markup and remove hard sentence breaks.
Sascha Wildner [Thu, 1 Jun 2006 19:38:06 +0000 (19:38 +0000)]
Add some mdoc markup and remove hard sentence breaks.

Sascha Wildner [Thu, 1 Jun 2006 19:35:59 +0000 (19:35 +0000)]

14 years agoSince we can only hold one shared spinlock at a time anyway, change the
Matthew Dillon [Thu, 1 Jun 2006 19:02:39 +0000 (19:02 +0000)]
Since we can only hold one shared spinlock at a time anyway, change the
gd_spinlocks_rd counter into a gd_spinlock_rd pointer.  This will improve
performance for potentially contested exclusive spinlocks.  Now they can
test the per-cpu spinlock pointer directly against the spinlock being
acquired instead of testing a counter which might represent any shared

This also has the effect of relaxing the requirement that further
exclusive spinlocks cannot be acquired while holding a shared spinlock,
but for now we are going to leave the requirement intact.

14 years agoTeach kdump a handy new trick: -p $pid selects the records of
Joerg Sonnenberger [Thu, 1 Jun 2006 18:18:00 +0000 (18:18 +0000)]
Teach kdump a handy new trick: -p $pid selects the records of
a specific PID, making it much faster than e.g. grep of the output.
Keep track of how long the header was, useful for later additions.

Obtained-from: NetBSD

14 years agoAnother update. Clarify that a shared spinlock can be acquired while holding
Matthew Dillon [Thu, 1 Jun 2006 17:17:35 +0000 (17:17 +0000)]
Another update.  Clarify that a shared spinlock can be acquired while holding
exclusive spinlocks, but only one shared spinlock can be held and no new
exclusive spinlocks can be acquired while holding a shared spinlock.

14 years agoUpdate the manual page to reflect additional spinlock requirements.
Matthew Dillon [Thu, 1 Jun 2006 17:05:01 +0000 (17:05 +0000)]
Update the manual page to reflect additional spinlock requirements.

14 years agoIf the scheduler clock cannot call bsd4_resetpriority() due to spinlock
Matthew Dillon [Thu, 1 Jun 2006 16:49:59 +0000 (16:49 +0000)]
If the scheduler clock cannot call bsd4_resetpriority() due to spinlock
requirements, at least call need_user_resched().

14 years agoUse the MP friendly objcache instead of zalloc to allocate temporary
Matthew Dillon [Thu, 1 Jun 2006 06:10:58 +0000 (06:10 +0000)]
Use the MP friendly objcache instead of zalloc to allocate temporary

14 years agogd_tdallq is not protected by the BGL any more, it can only be manipulated
Matthew Dillon [Thu, 1 Jun 2006 05:38:46 +0000 (05:38 +0000)]
gd_tdallq is not protected by the BGL any more, it can only be manipulated
on the current cpu.  Remove the thread when it exits rather than when it is

14 years agoZap references to Digital's TurboLaser bus.
Sascha Wildner [Wed, 31 May 2006 19:06:13 +0000 (19:06 +0000)]
Zap references to Digital's TurboLaser bus.

14 years agoAdd kobj(9) manual page.
Sascha Wildner [Wed, 31 May 2006 09:42:10 +0000 (09:42 +0000)]
Add kobj(9) manual page.

Taken-from: FreeBSD

14 years agoRemove trailing whitespace and fix references.
Sascha Wildner [Tue, 30 May 2006 08:13:07 +0000 (08:13 +0000)]
Remove trailing whitespace and fix references.

14 years agoFix numerous bugs in the BSD4 scheduler introduced in recent commits.
Matthew Dillon [Mon, 29 May 2006 22:57:24 +0000 (22:57 +0000)]
Fix numerous bugs in the BSD4 scheduler introduced in recent commits.
Primarily, do not try to get a spinlock from a hard interrupt (e.g. IPI)
if spinlocks are already being held by the cpu.

This will probably have to be made an absolute rule - no spinlocks at all
in a hard interrupt / IPI (vs an interrupt thread).

14 years agoShortcut two common spinlock situations and don't bother KTR logging them.
Matthew Dillon [Mon, 29 May 2006 16:50:06 +0000 (16:50 +0000)]
Shortcut two common spinlock situations and don't bother KTR logging them.

14 years agoAdd two KTR (kernel trace) options: KTR_GIANT_CONTENTION and
Matthew Dillon [Mon, 29 May 2006 07:29:15 +0000 (07:29 +0000)]
Add two KTR (kernel trace) options: KTR_GIANT_CONTENTION and
KTR_SPIN_CONTENTION.  These will cause MP lock contention and spin lock
contention to be KTR-logged.

14 years agoRemove conditional memory allocation based on KTR_ALL. Allocate memory
Matthew Dillon [Mon, 29 May 2006 07:18:04 +0000 (07:18 +0000)]
Remove conditional memory allocation based on KTR_ALL.  Allocate memory
for all cpus based on KTR only.

14 years agoClean up compiler warnings when KTR is enabled but KTR_ALL is not.
Matthew Dillon [Mon, 29 May 2006 06:47:29 +0000 (06:47 +0000)]
Clean up compiler warnings when KTR is enabled but KTR_ALL is not.

14 years agoFurther isolate the user process scheduler data by moving more variables
Matthew Dillon [Mon, 29 May 2006 03:57:21 +0000 (03:57 +0000)]
Further isolate the user process scheduler data by moving more variables
from the globaldata structure to the scheduler module(s).

Make the user process scheduler MP safe.  Make the LWKT 'pull thread'
(to a different cpu) feature MP safe.  Streamline the user process
scheduler API.

Do a near complete rewrite of the BSD4 scheduler.  Remote reschedules
(reschedules to other cpus), cpu pickup of queued processes, and locality
of reference handling should make the new BSD4 scheduler a lot more

Add a demonstration user process scheduler called 'dummy'
(kern/usched_dummy.c).  Add a kenv variable 'kern.user_scheduler' that
can be set to the desired scheduler on boot (i.e. 'bsd4' or 'dummy').

NOTE: Until more of the system is taken out from under the MP lock,
these changes actually slow things down slightly.  Buildworlds are
about ~2.7% slower.

14 years agoGet rid of -y/-Y (sort by interactive measure). The interactive measure has
Matthew Dillon [Sun, 28 May 2006 23:12:09 +0000 (23:12 +0000)]
Get rid of -y/-Y (sort by interactive measure).  The interactive measure has
been removed.

14 years agoMark various forms of read() and write() MPSAFE. Note that the MP lock is
Matthew Dillon [Sat, 27 May 2006 20:17:17 +0000 (20:17 +0000)]
Mark various forms of read() and write() MPSAFE.  Note that the MP lock is
still acquired, but now it's a lot deeper in the fileops.

Mark dup(), dup2(), close(), closefrom(), and fcntl() MPSAFE.  Some code
paths don't have to get the MP lock, but most still do deeper into the

14 years agoAdd a spinlock(9) manual page (based on a writeup by Matt).
Sascha Wildner [Sat, 27 May 2006 17:22:46 +0000 (17:22 +0000)]
Add a spinlock(9) manual page (based on a writeup by Matt).

14 years agoCheck cvs commit's -m argument for being a filename and ask the user
Simon Schubert [Sat, 27 May 2006 11:59:44 +0000 (11:59 +0000)]
Check cvs commit's -m argument for being a filename and ask the user
if he is serious.  This should in the future prevent log messages like
"/tmp/mycommit.txt", which happen when -m is used instead of -F.

14 years agoRemove /usr/share/examples/ibcs2 via 'make upgrade'.
Sascha Wildner [Sat, 27 May 2006 10:10:07 +0000 (10:10 +0000)]
Remove /usr/share/examples/ibcs2 via 'make upgrade'.

14 years agoClear the new VMAYHAVELOCKS flag when after an unlock we determine that
Matthew Dillon [Sat, 27 May 2006 02:03:17 +0000 (02:03 +0000)]
Clear the new VMAYHAVELOCKS flag when after an unlock we determine that
there are no more locks and no pending locks.

14 years agoGreatly reduce the MP locking that occurs in closef(), and remove
Matthew Dillon [Sat, 27 May 2006 01:57:42 +0000 (01:57 +0000)]
Greatly reduce the MP locking that occurs in closef(), and remove
unnecessary VOP_ADVLOCK calls in both closef() and fdrop() by adding
a new vnode flag, VMAYHAVELOCKS, that we can check before doing the

14 years agoImplement msleep(). This function is similar to the FreeBSD msleep() except
Matthew Dillon [Sat, 27 May 2006 01:51:27 +0000 (01:51 +0000)]
Implement msleep().  This function is similar to the FreeBSD msleep() except
it interlocks with a spinlock instead of a mutex.  The spinlock must be
exclusively held on entry.  msleep() will atomically sleep and release the
spinlock, then reacquire the spinlock when it wakes up.

A novel approach to the interlock is used.  DragonFly's tsleep/wakeup
mechanism is a per-cpu mechanism, with a local array of cpu masks, one
entry per hash index.  A wakeup simply sends an IPI message to each target
cpu whose bitmap bit is set in the ident's hash entry.

This allows us to interlock simply by entering a critical section and
setting our bit, then releasing the spinlock, then tsleep()ing as per normal.
No additional locks are required.  The critical section will delay any wakeup
race with us simply by delaying the IPI message that is potentially
in-transit to our cpu.

Requested-by: Numerous people, and its time has come now.
14 years agoAdd a read-ahead version of ffs_blkatoff() called ffs_blkatoff_ra(). This
Matthew Dillon [Fri, 26 May 2006 19:57:33 +0000 (19:57 +0000)]
Add a read-ahead version of ffs_blkatoff() called ffs_blkatoff_ra().  This
code was basically extracted from ffs_read().  ffs_read() now calls
ffs_blkatoff_ra().  ufs_readdir() now also calls ffs_blkatoff_ra().

14 years ago- Uniformly use .In for header file references.
Sascha Wildner [Fri, 26 May 2006 19:39:41 +0000 (19:39 +0000)]
- Uniformly use .In for header file references.

- Fix numerous wrong directory names.

14 years agoRemove FFS function hooks used by UFS. Simply make direct calls from ufs
Matthew Dillon [Fri, 26 May 2006 17:07:48 +0000 (17:07 +0000)]
Remove FFS function hooks used by UFS.  Simply make direct calls from ufs
to ffs.  The original ufs routines don't exist anymore anyhow and EXT2 no
longer references UFS files directly.  UFS and FFS have been 'one' filesystem
for two decades.  These hooks are no longer needed.

14 years ago* Fix a number of cases where too much kernel memory might be allocated to
Matthew Dillon [Fri, 26 May 2006 16:56:34 +0000 (16:56 +0000)]
* Fix a number of cases where too much kernel memory might be allocated to
  satisfy a directory read operation.

* Calculate a minimum of (1) allocated directory cookie and limit the maximum
  to 1024.

* Rewrite ufs_readdir() (part 1/2) to use the buffer cache instead of
  allocating a kernel buffer and to do better validation of the scanned
  directory entries.

* Use a simpler fix for EXT2FS.

Reported-by: [NetBSD.org #7471]
14 years agoAdd #include <sys/lock.h> where needed to support get_mplock().
Matthew Dillon [Fri, 26 May 2006 15:55:13 +0000 (15:55 +0000)]
Add #include <sys/lock.h> where needed to support get_mplock().

Reported-by: YONETANI Tomokazu <qhwt+dfly@les.ath.cx>
14 years ago* Make falloc() MPSAFE. filehead (the file list) and nfiles are now
Matthew Dillon [Fri, 26 May 2006 02:26:26 +0000 (02:26 +0000)]
* Make falloc() MPSAFE.  filehead (the file list) and nfiles are now
  static and fully MPSAFE.

* Add a MPSAFE procedure which scans all struct file's in the system.

* Substantially rework unp_gc().  It is not quite MPSAFE yet, but all of
  its struct file accesses and file list scanning should be.

14 years agoMore MP work.
Matthew Dillon [Fri, 26 May 2006 00:33:13 +0000 (00:33 +0000)]
More MP work.

* Incorporate fd_knlistsize initialization into fsetfd().

* Mark all fileops vectors as MPSAFE (but get the mplock for most of them).
  Clean up a number of fileops routines, mainly *_ioctl().

* Make crget(), crhold(), and crfree() MPSAFE.  crfree still needs the mplock
  on the last release.  Give ucred a spinlock to handle the crfree()
  0 transition race.

14 years agoFix several buffer cache issues related to B_NOCACHE.
Matthew Dillon [Thu, 25 May 2006 19:31:15 +0000 (19:31 +0000)]
Fix several buffer cache issues related to B_NOCACHE.

* Do not set B_NOCACHE when calling vinvalbuf(... V_SAVE).  This will
  destroy dirty VM backing store associated with clean buffers before
  the VM system has a chance to check for and flush them.

  Taken-from: FreeBSD

* Properly set B_NOCACHE when destroying buffers related to truncated data.

* Fix a bug in vnode_pager_setsize() that was recently introduced.
  v_filesize was being set before a new/old size comparison, causing a
  file truncation to not destroy related VM pages past the new EOF.

* Remove a bogus B_NOCACHE|B_DIRTY test in brelse().  This was originally
  intended to be a B_NOCACHE|B_DELWRITE test which then cleared B_NOCACHE,
  but now that B_NOCACHE operation has been fixed it really does indicate that
  the buffer, its contents, and its backing store are to be destroyed, even
  if the buffer is marked B_DELWRI.

  Instead of clearing B_NOCACHE when B_DELWRITE is found to be set, clear
  B_DELWRITE when B_NOCACHE is found to be set.

  Note that B_NOCACHE is still cleared when bdirty() is called in order to
  ensure that data is not lost when softupdates and other code do a
  'B_NOCACHE + bwrite' sequence.  Softupdates can redirty a buffer in its
  io completion hook and a write error can also redirty a buffer.

* The VMIO buffer rundown seems to have morphed into a state where the
  distinction between NFS and non-NFS buffers can be removed.  Remove
  the test.

14 years agoConvert almost all of the remaining manual traversals of the allproc
Matthew Dillon [Thu, 25 May 2006 07:36:37 +0000 (07:36 +0000)]
Convert almost all of the remaining manual traversals of the allproc
list over to allproc_scan().

The allproc_scan() code is MPSAFE, and code which before just cached
a proc pointer now PHOLD's it as well, but access to the various proc
fields is *NOT* yet MPSAFE.  Still, we are closer now.
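
The callback style can be sketched as below.  This is a single-threaded toy
with hypothetical ex_ names, not the kernel's allproc_scan(); the real code
also has to protect the list itself while walking it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a process entry with a hold count. */
struct ex_proc {
    int pid;
    int hold;
    struct ex_proc *next;
};

typedef int (*ex_scan_cb_t)(struct ex_proc *p, void *data);

/*
 * Walk the list, holding each entry around the callback so it cannot go
 * away while the callback looks at it.  A negative return stops the scan.
 */
void
ex_proc_scan(struct ex_proc *head, ex_scan_cb_t cb, void *data)
{
    struct ex_proc *p, *next;
    int r;

    for (p = head; p != NULL; p = next) {
        p->hold++;
        r = cb(p, data);
        next = p->next;     /* fetch the link while still holding p */
        p->hold--;
        if (r < 0)
            break;
    }
}

/* Example callback: count the entries visited. */
int
ex_count_cb(struct ex_proc *p, void *data)
{
    assert(p->hold > 0);    /* the scanner holds the entry for us */
    (*(int *)data)++;
    return 0;
}
```

Callers no longer open-code the list traversal, so the locking and hold
rules live in one place.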

14 years agoAdjust pmap_growkernel(), elf_brand_inuse(), and ktrace() to use
Matthew Dillon [Thu, 25 May 2006 04:17:09 +0000 (04:17 +0000)]
Adjust pmap_growkernel(), elf_brand_inuse(), and ktrace() to use
allproc_scan() instead of scanning the process list manually.

14 years agoModifying lk_flags during lock reinitialization requires a spinlock.
Matthew Dillon [Thu, 25 May 2006 02:46:38 +0000 (02:46 +0000)]
Modifying lk_flags during lock reinitialization requires a spinlock.

14 years agoWhen a vnode is vgone()'d its v_ops is replaced with dead_vnode_ops.
Matthew Dillon [Thu, 25 May 2006 01:20:07 +0000 (01:20 +0000)]
When a vnode is vgone()'d its v_ops is replaced with dead_vnode_ops.
dead_vnode_ops replaces VOP_LOCK with a dummy routine that just returns

This blows up anyone actually trying to access the vnode by improperly
returning a successful lock which then panics the machine when the caller
tries to unlock it.  This also screws up VOP_LOCK vs vx_lock() interactions
and can theoretically create other problems.

Normally vgone()'d vnodes have no references and this isn't a problem.
The two notable exceptions are (1) when revoke() is called on a device
(i.e. tty), and (2) when a procfs or linprocfs vnode is destroyed due
to a process exit while another process is accessing it.

Remove dead_lock().  Dead vnodes revert to defaultops which implement
the expected lockmgr lock.

14 years agoFix issues with an incorrectly initialized buffer when formatting a floppy.
Matthew Dillon [Wed, 24 May 2006 21:50:11 +0000 (21:50 +0000)]
Fix issues with an incorrectly initialized buffer when formatting a floppy.

Reported-by: Stefan Krueger <skrueger@meinberlikomm.de>
14 years agoMove the code that inserts a new process into the allproc list into its
Matthew Dillon [Wed, 24 May 2006 18:59:51 +0000 (18:59 +0000)]
Move the code that inserts a new process into the allproc list into its
own procedure, proc_add_allproc().  Make it MPSAFE.

Integrate pid generation for the new process into proc_add_allproc(), move
all related code from kern_fork.c to kern_proc.c.

Change procfs to use the new allproc scanning function.

14 years agoStart consolidating process related code into kern_proc.c. Implement
Matthew Dillon [Wed, 24 May 2006 17:44:04 +0000 (17:44 +0000)]
Start consolidating process related code into kern_proc.c.  Implement
a few MPSAFE functions to mess with the allproc and zombproc lists, and a
callback for scanning allproc.

Adjust linprocfs to use the new callback as a test.

14 years agoregen
Simon Schubert [Wed, 24 May 2006 12:42:01 +0000 (12:42 +0000)]

14 years agounbreak world: spell MPSAFE correctly
Simon Schubert [Wed, 24 May 2006 12:40:19 +0000 (12:40 +0000)]
unbreak world: spell MPSAFE correctly

14 years agospinlock more of the file descriptor code. No appreciable difference in
Matthew Dillon [Wed, 24 May 2006 03:23:35 +0000 (03:23 +0000)]
spinlock more of the file descriptor code.  No appreciable difference in
performance on buildworld tests.

Change getvnode() to holdvnode() and use semantics similar to holdsock().
The old getvnode() code wasn't fhold()ing the file pointer.  The new
holdvnode() code does.
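
The point of the hold semantics is that the reference is taken while the
descriptor table is still locked, so the file cannot be freed between the
lookup and the use.  A minimal sketch, assuming a fixed-size table and a
toy lock; hold_file_sketch() and the other *_sketch names are hypothetical
and are not the kernel's holdvnode()/fhold() implementation:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define NFILES_SKETCH 8

struct file_sketch {
    int f_count;    /* reference count, protected by the table lock here */
};

struct fdtable_sketch {
    atomic_flag lock;                           /* toy spinlock */
    struct file_sketch *files[NFILES_SKETCH];
};

void fdtable_init_sketch(struct fdtable_sketch *t)
{
    atomic_flag_clear(&t->lock);
    for (int i = 0; i < NFILES_SKETCH; i++)
        t->files[i] = NULL;
}

static void table_lock_sketch(struct fdtable_sketch *t)
{
    while (atomic_flag_test_and_set(&t->lock))
        ;   /* spin */
}

static void table_unlock_sketch(struct fdtable_sketch *t)
{
    atomic_flag_clear(&t->lock);
}

/* Look up fd and return the file with an extra reference, or NULL. */
struct file_sketch *hold_file_sketch(struct fdtable_sketch *t, int fd)
{
    struct file_sketch *fp;

    if (fd < 0 || fd >= NFILES_SKETCH)
        return NULL;
    table_lock_sketch(t);
    fp = t->files[fd];
    if (fp != NULL)
        fp->f_count++;      /* taken before the lock is dropped */
    table_unlock_sketch(t);
    return fp;
}

/* Release a reference obtained from hold_file_sketch(). */
void drop_file_sketch(struct fdtable_sketch *t, struct file_sketch *fp)
{
    table_lock_sketch(t);
    assert(fp->f_count > 0);
    fp->f_count--;
    table_unlock_sketch(t);
}
```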

14 years agoMove all the resource limit handling code into a new file, kern/kern_plimit.c.
Matthew Dillon [Tue, 23 May 2006 20:35:12 +0000 (20:35 +0000)]
Move all the resource limit handling code into a new file, kern/kern_plimit.c.
Add spinlocks for access, and mark getrlimit and setrlimit as being MPSAFE.

Document how LWPs will have to be handled - basically we will have to unshare
the resource structure once we start allowing multiple LWPs per process, but
we can otherwise leave it in the proc structure.

14 years agoThe pageout daemon does not usually page out pages it considers active.
Matthew Dillon [Tue, 23 May 2006 01:21:48 +0000 (01:21 +0000)]
The pageout daemon does not usually page out pages it considers active.
However, under certain types of heavy memory use it is possible to keep
nearly all of a machine's pages marked active.  This can result in a
degenerate situation where the pageout daemon pages out so few pages that
it might as well not be operating at all, resulting in a machine lockup.

Adjust the pageout daemon to dig into active pages based on its loop
counter.  This counter will start to go up when the pageout daemon is not
able to keep up.  The higher the counter gets, the more active pages
become candidates for paging.  We depend on fault-in rate limiting to
avoid thrashing to the point of inaccessibility.
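
As a rough illustration of the idea only: the function below grows the
number of active pages considered as the loop counter rises.  The linear
scaling, the SKETCH_MAX_PASS constant and the function name are all
assumptions made for this sketch; the commit does not state the actual
formula used by the pageout daemon.

```c
#include <assert.h>

/* Hypothetical: the pass at which every active page becomes a candidate. */
#define SKETCH_MAX_PASS 8

/*
 * Return how many active pages to consider on this pass.  The share grows
 * linearly with the pass counter and is capped at active_count.
 */
long active_candidates_sketch(long active_count, int pass)
{
    assert(active_count >= 0 && pass >= 0);
    if (pass >= SKETCH_MAX_PASS)
        return active_count;
    return (active_count / SKETCH_MAX_PASS) * pass;
}
```

With 800 active pages this yields 0 candidates on pass 0, 200 on pass 2,
and all 800 from pass 8 onward.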

Also-thanks-to: Peter Holm's filesystem and load testing suite (stress2).

14 years agoSync to head. Add a verbose option to vmpageinfo which dumps all the
Matthew Dillon [Tue, 23 May 2006 01:00:05 +0000 (01:00 +0000)]
Sync to head.  Add a verbose option to vmpageinfo which dumps all the
vm_page structures.

14 years agoFix a minor bug in fdcopy() in the last commit.  Consolidate the
Matthew Dillon [Mon, 22 May 2006 21:33:11 +0000 (21:33 +0000)]
Fix a minor bug in fdcopy() in the last commit.  Consolidate the
fd_lastfile and fd_freefile fixup code into its own little inline function.

14 years agoDo a major cleanup of the file descriptor handling code in preparation for
Matthew Dillon [Mon, 22 May 2006 21:21:26 +0000 (21:21 +0000)]
Do a major cleanup of the file descriptor handling code in preparation for
making the descriptor table MPSAFE.  Introduce a new feature that allows a
file descriptor number to be reserved without having to assign a file
pointer to it.  This allows code such as open(), dup(), etc to reserve
descriptors to work with without having to worry about the related file
being ripped out from under them by another thread sharing the descriptor
table.

falloc() - This function allocates the file pointer and descriptor as
before, but does NOT associate the file pointer with the descriptor.

Before this change another thread could access the file
pointer while the system call creating it was blocked,
before the system call had a chance to completely initialize
the file pointer.

The caller must call fsetfd() to assign or clear the
reserved descriptor.

fsetfd() - Is now responsible for associating a file pointer with a
previously reserved descriptor or clearing the reservation.

fdealloc() - This hack existed to deal with open/dup races against other
threads.  The above changes remove the possibility so this
routine has been deleted.

dup code - kern_dup() and dupfdopen() have been completely rewritten.
They are much cleaner and less obtuse now.  Additional race
conditions in the original code were also found and fixed.

funsetfd() - Now returns the file pointer that was cleared and takes
responsibility for adjusting fd_lastfile.

NOTE: fd_lastfile is inclusive of any reserved descriptors.

fdcopy() - While not yet MPSAFE, fdcopy now properly handles races
against other threads.

fdp->fd_lastfile -
This field was not being properly updated in certain failure
cases.  This commit fixes that.  Also, if all a process's
descriptors were closed this field was incorrectly left at
0 when it should have been set to -1.

fdp->fd_files - A number of code blocks were trying to optimize a for()
loop over all file descriptors by caching a pointer to
fd_files.  This is a problem because fd_files can be
reallocated if code within the loop blocks.  These loops
have been rewritten.
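
The reservation scheme described above can be sketched as follows.  This
is a single-threaded illustration with locking omitted; the structures and
the *_sketch functions are hypothetical simplifications, not the kernel's
falloc()/fsetfd()/funsetfd() code.  It shows a descriptor number being
held with no file pointer visible, and fd_lastfile covering reserved
descriptors and dropping to -1 when nothing is left.

```c
#include <assert.h>
#include <stddef.h>

#define NFILES_SKETCH 8

struct file_sketch {
    int f_count;
};

struct fdnode_sketch {
    struct file_sketch *fp;     /* NULL while free or merely reserved */
    int reserved;               /* nonzero once the number is taken */
};

struct filedesc_sketch {
    struct fdnode_sketch files[NFILES_SKETCH];
    int fd_lastfile;            /* highest used or reserved fd, -1 if none */
};

void fdinit_sketch(struct filedesc_sketch *fdp)
{
    for (int i = 0; i < NFILES_SKETCH; i++) {
        fdp->files[i].fp = NULL;
        fdp->files[i].reserved = 0;
    }
    fdp->fd_lastfile = -1;
}

/* Walk fd_lastfile down past free slots; ends at -1 when none remain. */
static void fixup_lastfile_sketch(struct filedesc_sketch *fdp)
{
    while (fdp->fd_lastfile >= 0 && !fdp->files[fdp->fd_lastfile].reserved)
        fdp->fd_lastfile--;
}

/* Reserve the lowest free descriptor number; no file is attached yet. */
int fdreserve_sketch(struct filedesc_sketch *fdp)
{
    for (int fd = 0; fd < NFILES_SKETCH; fd++) {
        if (!fdp->files[fd].reserved) {
            fdp->files[fd].reserved = 1;
            if (fd > fdp->fd_lastfile)
                fdp->fd_lastfile = fd;
            return fd;
        }
    }
    return -1;
}

/* Attach fp to a reserved descriptor, or drop the reservation if NULL. */
void fsetfd_sketch(struct filedesc_sketch *fdp, struct file_sketch *fp, int fd)
{
    assert(fd >= 0 && fd < NFILES_SKETCH);
    assert(fdp->files[fd].reserved && fdp->files[fd].fp == NULL);
    if (fp != NULL) {
        fp->f_count++;
        fdp->files[fd].fp = fp;
    } else {
        fdp->files[fd].reserved = 0;
        fixup_lastfile_sketch(fdp);
    }
}

/* Detach and return the file pointer, freeing the descriptor number. */
struct file_sketch *funsetfd_sketch(struct filedesc_sketch *fdp, int fd)
{
    struct file_sketch *fp;

    assert(fd >= 0 && fd < NFILES_SKETCH);
    fp = fdp->files[fd].fp;
    fdp->files[fd].fp = NULL;
    fdp->files[fd].reserved = 0;
    fixup_lastfile_sketch(fdp);
    return fp;
}
```

Between fdreserve_sketch() and fsetfd_sketch() the slot's fp stays NULL, so
a lookup by another thread cannot observe a half-initialized file.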

14 years agoMop up remains of the ibcs2/streams/svr4 removal:
Sascha Wildner [Mon, 22 May 2006 06:26:30 +0000 (06:26 +0000)]
Mop up remains of the ibcs2/streams/svr4 removal:

* Remove streams(4) and svr4(4) manual pages.

* Add associated modules and their manual pages to the list of files
  to be removed upon 'make upgrade'.

* Remove IBCS2 and SPX_HACK options.

* Change M_ZOMBIE definition back to static.

* Fix miscellaneous references & comments.

14 years agoGive struct filedesc and struct file a spinlock, and do some initial
Matthew Dillon [Mon, 22 May 2006 00:52:31 +0000 (00:52 +0000)]
Give struct filedesc and struct file a spinlock, and do some initial
(incomplete) lockup work.

Performance impact: No measurable impact.

14 years agoImplement a much faster spinlock.
Matthew Dillon [Sun, 21 May 2006 20:23:29 +0000 (20:23 +0000)]
Implement a much faster spinlock.

* Spinlocks can't conflict with FAST interrupts without deadlocking anyway,
  so instead of using a critical section simply do not allow an interrupt
  thread to preempt the current thread if it is holding a spinlock.  This
  cuts spinlock overhead in half.

* Implement shared spinlocks in addition to exclusive spinlocks.  Shared
  spinlocks would be used, e.g. for file descriptor table lookups.

* Cache a shared spinlock by using the spinlock's lock field as a bitfield,
  one bit for each cpu (bit 31 for exclusive locks).  A shared spinlock sets
  its cpu's shared bit and does not bother clearing it on unlock.

  This means that multiple, parallel shared spinlock accessors do NOT incur
  a cache conflict on the spinlock.  ALL parallel shared accessors operate
  at full speed (~10ns vs ~40-100ns in overhead).  90% of the 10ns in
  overhead is due to a necessary MFENCE to interlock against exclusive
  spinlocks on the mutex.  However, this MFENCE only has to play with
  pending cpu-local memory writes so it will always run at near full speed.

* Exclusive spinlocks in the face of previously cached shared spinlocks
  are now slightly more expensive because they have to clear the cached
  shared spinlock bits by checking the globaldata structure for each
  conflicting cpu to see if it is still holding a shared spinlock.  However,
  only the initial (unavoidable) atomic swap involves potential cache
  conflicts.  The shared bit checks involve only memory reads and the
  situation should be self-correcting from a performance standpoint since
  the shared bits then get cleared.

* Add sysctl's for basic spinlock performance testing.  Setting
  debug.spin_lock_test issues a test.  Tests #2 and #3 loop
  debug.spin_test_count times.  p.s. these tests will stall the whole

1       Test the indefinite wait code
2       Time the best-case exclusive lock overhead
3       Time the best-case shared lock overhead

* TODO: A shared->exclusive spinlock upgrade inline with positive feedback,
  and an exclusive->shared spinlock downgrade inline.
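
The cached shared-bit scheme can be sketched in userland with C11 atomics.
This is a simplified illustration under several assumptions, not the
kernel implementation: all *_sketch names are hypothetical, a per-cpu
array stands in for the globaldata field, each cpu holds at most one
shared spinlock at a time, sequentially consistent atomics replace the
explicit MFENCE, and an atomic OR is used where the commit describes an
atomic swap.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define NCPU_SKETCH      4              /* hypothetical cpu count (< 31) */
#define SPIN_EXCL_SKETCH 0x80000000u    /* bit 31: exclusive holder */

struct spin_sketch {
    _Atomic uint32_t lock;
};

/* Per-cpu record of the lock each cpu currently holds shared, if any. */
static _Atomic(struct spin_sketch *) rd_holding_sketch[NCPU_SKETCH];

void spin_lock_rd_sketch(struct spin_sketch *s, int cpu)
{
    uint32_t bit, v;

    assert(cpu >= 0 && cpu < NCPU_SKETCH);
    bit = 1u << cpu;
    for (;;) {
        /* Publish intent first, then look at the lock word. */
        atomic_store(&rd_holding_sketch[cpu], s);
        v = atomic_load(&s->lock);
        if ((v & SPIN_EXCL_SKETCH) == 0) {
            if (v & bit)
                return;     /* cached bit: no write to the lock word */
            if (atomic_compare_exchange_strong(&s->lock, &v, v | bit))
                return;     /* first use on this cpu: bit is now cached */
        }
        /* Exclusive holder present or a race was lost: back off, retry. */
        atomic_store(&rd_holding_sketch[cpu], NULL);
        while (atomic_load(&s->lock) & SPIN_EXCL_SKETCH)
            ;
    }
}

void spin_unlock_rd_sketch(struct spin_sketch *s, int cpu)
{
    assert(atomic_load(&rd_holding_sketch[cpu]) == s);
    atomic_store(&rd_holding_sketch[cpu], NULL);    /* cached bit stays set */
}

void spin_lock_wr_sketch(struct spin_sketch *s)
{
    uint32_t v;
    int cpu;

    /* The one atomic operation that may contend on the cache line. */
    while ((v = atomic_fetch_or(&s->lock, SPIN_EXCL_SKETCH)) & SPIN_EXCL_SKETCH)
        ;
    /* Wait out cpus whose cached shared bit was set, then clear the bits. */
    for (cpu = 0; cpu < NCPU_SKETCH; cpu++) {
        if (v & (1u << cpu)) {
            while (atomic_load(&rd_holding_sketch[cpu]) == s)
                ;
        }
    }
    atomic_store(&s->lock, SPIN_EXCL_SKETCH);
}

void spin_unlock_wr_sketch(struct spin_sketch *s)
{
    atomic_store(&s->lock, 0);
}
```

After a shared lock and unlock on cpu 1 the lock word still reads 0x2 (the
cached bit); a later exclusive acquisition finds that cpu idle, clears the
bit, and leaves only bit 31 set.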

14 years agoMisc mdoc(7) cleanup:
Sascha Wildner [Sun, 21 May 2006 14:15:06 +0000 (14:15 +0000)]
Misc mdoc(7) cleanup:

* Fix section numbers.

* Fix .Xr abuse.

* Remove reference to obsolete plot(1) manual page.

* Fix typo.

14 years agoOnly _KERNEL code can optimize based on SMP vs UP. User code must always
Matthew Dillon [Sun, 21 May 2006 05:31:14 +0000 (05:31 +0000)]
Only _KERNEL code can optimize based on SMP vs UP.  User code must always
assume SMP and generate a "lock; " prefix.
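
A minimal sketch of the rule, assuming GCC-style inline assembly on x86.
The MPLOCKED_SKETCH macro and atomic_add_int_sketch() are hypothetical
names chosen for illustration; on other architectures the sketch falls
back to a compiler builtin so that it still compiles.

```c
#include <assert.h>

/*
 * Only a kernel built without SMP may drop the prefix.  Userland cannot
 * know what it will run on, so it always gets "lock; ".
 */
#if defined(_KERNEL) && !defined(SMP)
#define MPLOCKED_SKETCH
#else
#define MPLOCKED_SKETCH "lock; "
#endif

static inline void
atomic_add_int_sketch(volatile unsigned int *p, unsigned int v)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__(MPLOCKED_SKETCH "addl %1,%0"
                         : "+m" (*p)
                         : "ir" (v)
                         : "cc", "memory");
#else
    /* Non-x86 fallback so the sketch stays portable. */
    __atomic_fetch_add(p, v, __ATOMIC_SEQ_CST);
#endif
}
```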

14 years agoClean up more #include files. Create an internal __boolean_t so two or
Matthew Dillon [Sun, 21 May 2006 03:43:48 +0000 (03:43 +0000)]
Clean up more #include files.  Create an internal __boolean_t so two or
three sys/ header files don't have to juggle the type.  Use
_KERNEL_STRUCTURES in various pieces of user code that delve into kvm.

Reported-by: Rumko <rumcic@gmail.com>, walt <wa1ter@myrealbox.com>

14 years agoA little script that runs through all the header files and checks that
Matthew Dillon [Sun, 21 May 2006 00:27:59 +0000 (00:27 +0000)]
A little script that runs through all the header files and checks that
they can be singly included, or that they generate the appropriate
#error or #warning.  The following flag combinations are used:

_KERNEL                 kernel access
_KERNEL_STRUCTURES      userland access to kernel structures
[none]                  userland access

14 years agoRemove the (unmaintained for 10+ years) svr4 and ibcs2 emulation code.
Matthew Dillon [Sat, 20 May 2006 18:26:36 +0000 (18:26 +0000)]
Remove the (unmaintained for 10+ years) svr4 and ibcs2 emulation code.
Poof, gone.