dragonfly.git
13 years agoAdd an option (-n ncpus) to specify the number of cpus a virtual kernel
Matthew Dillon [Mon, 2 Jul 2007 02:37:05 +0000 (02:37 +0000)]
Add an option (-n ncpus) to specify the number of cpus a virtual kernel
should simulate.  The virtual kernel must be built with options SMP.
1-32 cpus may be simulated regardless of the number of real cpus.  The
virtual kernel will create a thread for each cpu.
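
As a rough illustration of the range check such an option implies, the sketch below parses the argument to -n and rejects anything outside 1-32. The function and macro names are hypothetical and are not taken from the vkernel source; only the 1-32 range comes from the commit text.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical names; the 1-32 range is the one stated in the commit. */
#define VK_MIN_CPUS	1
#define VK_MAX_CPUS	32

/* Parse the argument to -n. Returns the cpu count, or -1 if invalid. */
static int
parse_ncpus(const char *arg)
{
	char *end;
	long n;

	assert(arg != NULL);
	errno = 0;
	n = strtol(arg, &end, 10);
	if (errno != 0 || end == arg || *end != '\0')
		return -1;		/* not a clean decimal number */
	if (n < VK_MIN_CPUS || n > VK_MAX_CPUS)
		return -1;		/* outside the simulated range */
	return (int)n;
}
```

A caller would typically print a usage message and exit when -1 comes back, then create one thread per simulated cpu.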

13 years agoThe real-kernel madvise and mcontrol system calls handle SMP interactions
Matthew Dillon [Mon, 2 Jul 2007 02:23:00 +0000 (02:23 +0000)]
The real-kernel madvise and mcontrol system calls handle SMP interactions
when manipulating virtual page tables.  Construct a new set of functions
for the virtual kernel to take advantage of this.

Add a cpu cache mask to the pmap structure for the virtual kernel which
allows us to invalidate per-cpu page table mappings simply by clearing
the mask.  Also reload PT1pde after a successful cache hit if the cpu
mask bit is found to be 0.
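
The cpu cache mask idea can be sketched in a few lines. This is a toy model under stated assumptions, not the vkernel pmap code: the structure and function names are invented for illustration, and a real SMP implementation would need atomic operations on the mask.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for a pmap; the real structure has many more fields. */
struct toy_pmap {
	uint32_t cpumask;	/* bit N set: cpu N's cached mapping is valid */
};

/* Invalidate every cpu's cached mapping simply by clearing the mask. */
static void
toy_invalidate_all(struct toy_pmap *pm)
{
	pm->cpumask = 0;
}

/*
 * Called on a cache hit. Returns 1 if this cpu's bit was 0 and the
 * mapping had to be reloaded, 0 if the cached mapping was still valid.
 */
static int
toy_cache_check(struct toy_pmap *pm, int cpu)
{
	uint32_t bit;

	assert(cpu >= 0 && cpu < 32);
	bit = (uint32_t)1 << cpu;
	if (pm->cpumask & bit)
		return 0;
	/* A real kernel would reload the page directory entry here. */
	pm->cpumask |= bit;
	return 1;
}
```

The point of the design is that invalidation is a single store, while the cost of reloading is paid lazily by each cpu the next time it touches the mapping.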

Redo most of the PTE handling code for the virtual kernel.  Use the new
invalidation function set and carefully deal with race conditions between
cpus.  Race conditions are far more serious with a SMP virtual kernel than
with a real kernel because there are effectively two levels of page table
caching instead of one, since the real kernel maintains a separate pmap
for each VM space under the virtual kernel's control in addition to the
standard TLB interactions.

13 years agoThe kernel perfmon support (options PERFMON) was trying to initialize its
Matthew Dillon [Mon, 2 Jul 2007 02:14:32 +0000 (02:14 +0000)]
The kernel perfmon support (options PERFMON) was trying to initialize its
device way too early in the boot sequence, resulting in a panic on SMP
boxes.  Move initialization to a bit later in the boot sequence.

Reported-by: Thomas Nikolajsen <sinknull@crater.dragonflybsd.org>
Dragonfly-bug: <http://bugs.dragonflybsd.org/issue714>

13 years agosched_ithd() must be called from within a critical section.
Matthew Dillon [Mon, 2 Jul 2007 01:47:22 +0000 (01:47 +0000)]
sched_ithd() must be called from within a critical section.

13 years agoCopy a junk file from pc32 needed for <time.h>
Matthew Dillon [Mon, 2 Jul 2007 01:43:30 +0000 (01:43 +0000)]
Copy a junk file from pc32 needed for <time.h>

13 years agoOnly use the symbol returned by dladdr() if its address is <= the
Matthew Dillon [Mon, 2 Jul 2007 01:42:07 +0000 (01:42 +0000)]
Only use the symbol returned by dladdr() if its address is <= the
address we are trying to decode.

13 years agoClean up a kprintf() that was missing a newline.
Matthew Dillon [Mon, 2 Jul 2007 01:41:26 +0000 (01:41 +0000)]
Clean up a kprintf() that was missing a newline.

Submitted-by: Joe Talbott <josepht@cstone.net>
13 years agoImplement an architecture function cpu_mplock_contested() which is
Matthew Dillon [Mon, 2 Jul 2007 01:37:11 +0000 (01:37 +0000)]
Implement an architecture function cpu_mplock_contested() which is
called by the LWKT thread scheduler when the only thread(s) it can
schedule need the MP lock and the scheduler was unable to acquire the
MP lock.

On real systems this function just executes the cpu 'pause' instruction,
and on virtual systems this function sleeps for a millisecond.

Use umtx_sleep() instead of sigpause() in the virtual kernel's idle loop
to interlock threads scheduled via a signal with the idle loop sleep.
This fixes a race condition that caused the vkernel to stop scheduling
(but there may be more issues, stay tuned).

13 years agoDo not allow umtx_sleep() to restart on a restartable signal. We want to
Matthew Dillon [Mon, 2 Jul 2007 01:30:07 +0000 (01:30 +0000)]
Do not allow umtx_sleep() to restart on a restartable signal.  We want to
return EINTR so the caller can handle side effects from the signal.

13 years agoNuke USB_MATCH*, USB_ATTACH* and USB_DETACH* macros.
Hasso Tepper [Sun, 1 Jul 2007 21:24:04 +0000 (21:24 +0000)]
Nuke USB_MATCH*, USB_ATTACH* and USB_DETACH* macros.

13 years agoRemove .Pp before .Sh.
Sascha Wildner [Sun, 1 Jul 2007 17:23:25 +0000 (17:23 +0000)]
Remove .Pp before .Sh.

13 years agoAdd markup and clean up a bit.
Sascha Wildner [Sun, 1 Jul 2007 10:49:36 +0000 (10:49 +0000)]
Add markup and clean up a bit.

13 years agoAlso credit lots of help from Aggelos Economopoulos <aoiko@cc.ece.ntua.gr>
Matthew Dillon [Sun, 1 Jul 2007 04:02:33 +0000 (04:02 +0000)]
Also credit lots of help from Aggelos Economopoulos <aoiko@cc.ece.ntua.gr>
in the SMP virtual kernel commit.

13 years agoUse dladdr() to obtain symbol names when possible and try to dump the
Matthew Dillon [Sun, 1 Jul 2007 03:28:54 +0000 (03:28 +0000)]
Use dladdr() to obtain symbol names when possible and try to dump the
entire stack backtrace instead of just the first call.

13 years agoConditionalize SMP bits for non-SMP builds.
Matthew Dillon [Sun, 1 Jul 2007 03:04:15 +0000 (03:04 +0000)]
Conditionalize SMP bits for non-SMP builds.

13 years agoBring in all of Joe Talbott's SMP virtual kernel work to date, which makes
Matthew Dillon [Sun, 1 Jul 2007 02:51:45 +0000 (02:51 +0000)]
Bring in all of Joe Talbott's SMP virtual kernel work to date, which makes
virtual kernel builds with SMP almost get through a full boot.  This work
includes:

    * Creation of 'cpu' threads via libthread_xu
    * Globaldata initialization
    * AP synchronization
    * Bootstrapping to the idle thread
    * SMP pmap (mmu) functions
    * IPI handling

My part of this commit:

    * Bring all the signal interrupts under DragonFly's machine-independent
      interrupt handler API.  This will properly deal with the MP lock
      and critical section handling.

    * Some additional pmap bits to handle SMP invalidation issues.

Submitted-by: Joe Talbott <josepht@cstone.net>
Additional-bits-by: Matt Dillon
13 years agoMore multi-threaded support for virtualization. Move the save context
Matthew Dillon [Sun, 1 Jul 2007 01:11:38 +0000 (01:11 +0000)]
More multi-threaded support for virtualization.  Move the save context
from the process structure to the lwp structure, cleaning up the vmspace
support structures at the same time.  This allows multiple LWPs in the
same process to be running a virtualization context at the same time.

13 years agomi_switch() and cpu_switch() are gone. Remove manpage and prototype.
Sascha Wildner [Sun, 1 Jul 2007 00:03:49 +0000 (00:03 +0000)]
mi_switch() and cpu_switch() are gone. Remove manpage and prototype.

13 years agoA signal sent to a particular LWP must be delivered to that LWP and never
Matthew Dillon [Sat, 30 Jun 2007 23:38:31 +0000 (23:38 +0000)]
A signal sent to a particular LWP must be delivered to that LWP and never
posted to the process generically.  Otherwise things like seg faults can
end up being posted to the wrong LWP.

13 years agoUse the actual function name in the message.
Sascha Wildner [Sat, 30 Jun 2007 21:52:19 +0000 (21:52 +0000)]
Use the actual function name in the message.

13 years agotvtohz() was split into tvtohz_low() and tvtohz_high() in Jan 2004.
Sascha Wildner [Sat, 30 Jun 2007 21:47:54 +0000 (21:47 +0000)]
tvtohz() was split into tvtohz_low() and tvtohz_high() in Jan 2004.

Update the manual page with some words from the comments in kern_clock.c.

13 years agoNuke PROC_(UN)LOCK, usb_callout_t, usb_kthread_create* and uio_procp.
Hasso Tepper [Sat, 30 Jun 2007 20:39:22 +0000 (20:39 +0000)]
Nuke PROC_(UN)LOCK, usb_callout_t, usb_kthread_create* and uio_procp.

13 years agoFix KASSERT messages.
Hasso Tepper [Sat, 30 Jun 2007 20:17:36 +0000 (20:17 +0000)]
Fix KASSERT messages.

13 years agoClean up a bit.
Sascha Wildner [Sat, 30 Jun 2007 19:20:48 +0000 (19:20 +0000)]
Clean up a bit.

13 years agoUse .Va for errno.
Sascha Wildner [Sat, 30 Jun 2007 19:03:52 +0000 (19:03 +0000)]
Use .Va for errno.

13 years agoTry to avoid accidental foot shooting by not allowing a virtual kernel
Matthew Dillon [Sat, 30 Jun 2007 05:54:03 +0000 (05:54 +0000)]
Try to avoid accidental foot shooting by not allowing a virtual kernel
to be installed unless DESTDIR is explicitly specified.

13 years agoMove the P_WEXIT check from lwpsignal() to kern_kill(). That is, disallow
Matthew Dillon [Sat, 30 Jun 2007 02:33:04 +0000 (02:33 +0000)]
Move the P_WEXIT check from lwpsignal() to kern_kill().  That is, disallow
signals to exiting processes but allow signals to threads that have not gone
through the exit interlock yet.  This allows exit1() to interlock the process
and still signal its LWPs.

Fix a bug in exit1() which was improperly using lwp_signotify() to wake up
LWPs to force them to exit.  This function is basically a NOP if there are
no signals pending to the LWP.  Send a real SIGKILL to the LWP instead.

This fixes a bug where vkernels get stuck in an exiting state and cannot be
killed.

Reported-by: Joe Talbott <josepht@cstone.net>
13 years agoUpdate the documentation for sys_checkpoint().
Matthew Dillon [Sat, 30 Jun 2007 01:59:41 +0000 (01:59 +0000)]
Update the documentation for sys_checkpoint().

13 years agoAdd MLINKS for checkpoint.1, because most people looking for information
Matthew Dillon [Sat, 30 Jun 2007 01:40:56 +0000 (01:40 +0000)]
Add MLINKS for checkpoint.1, because most people looking for information
out of the cold on checkpointing will probably try 'man checkpoint' more
often than 'man checkpt'.

13 years agoFlag the checkpoint descriptor so on restore we can identify it and use the
Matthew Dillon [Fri, 29 Jun 2007 23:40:00 +0000 (23:40 +0000)]
Flag the checkpoint descriptor so on restore we can identify it and use the
descriptor for the restore rather than trying to look up the original
checkpoint file.  This issue occurs when a program calls sys_checkpoint()
manually.

This allows a checkpoint-resume to be done on a copied checkpoint file,
or a gzipped (then gunzipped) checkpoint file, etc.  The original checkpoint
file no longer needs to remain intact.

Requested-by: _why <why@ruby-lang.org>
13 years agoNuke usb_ callout macros.
Hasso Tepper [Fri, 29 Jun 2007 22:56:31 +0000 (22:56 +0000)]
Nuke usb_ callout macros.

13 years agoImplement struct lwp->lwp_vmspace. Leave p_vmspace intact. This allows
Matthew Dillon [Fri, 29 Jun 2007 21:54:15 +0000 (21:54 +0000)]
Implement struct lwp->lwp_vmspace.  Leave p_vmspace intact.  This allows
vkernels to run threaded and to run emulated VM spaces on a per-thread basis.
struct proc->p_vmspace is left intact, making it easy to switch into and out
of an emulated VM space.  This is needed for the virtual kernel SMP work.

This also gives us the flexibility to run emulated VM spaces in their own
threads, or in a limited number of separate threads.  Linux does this and
they say it improved performance.  I don't think it necessarily improved
performance but it's nice to have the flexibility to do it in the future.

13 years agoAdd some useful references to various manual pages which deal with random
Sascha Wildner [Fri, 29 Jun 2007 19:34:41 +0000 (19:34 +0000)]
Add some useful references to various manual pages which deal with random
numbers.

Suggested-by: Robin Carey <robin_carey5@yahoo.co.uk>
Dragonfly-bug: <http://bugs.dragonflybsd.org/issue708>

13 years agoThis is a simple little syslink test program which ping-pongs a 64K
Matthew Dillon [Fri, 29 Jun 2007 17:18:42 +0000 (17:18 +0000)]
This is a simple little syslink test program which ping-pongs a 64K
buffer around.

13 years agoClean up syslink a bit and add an abstraction that will eventually allow
Matthew Dillon [Fri, 29 Jun 2007 05:14:00 +0000 (05:14 +0000)]
Clean up syslink a bit and add an abstraction that will eventually allow
zero-copy support for I/O on syslink DMA bufs.

13 years agoAdd O_MAPONREAD (not yet implemented). This will have the semantics of
Matthew Dillon [Fri, 29 Jun 2007 05:12:40 +0000 (05:12 +0000)]
Add O_MAPONREAD (not yet implemented).  This will have the semantics of
replacing the underlying VM with a copy-on-write page mapping when possible.
Ultimately when used with syslink this will have the semantics of allowing
a fully shared mapping for syslink DMA buffers.

13 years agoAdd a new flag, XIOF_VMLINEAR, which requires that the buffer being mapped
Matthew Dillon [Fri, 29 Jun 2007 05:09:15 +0000 (05:09 +0000)]
Add a new flag, XIOF_VMLINEAR, which requires that the buffer being mapped
be contiguous within a single VM object.

13 years agoGet out-of-band DMA buffers working for user<->user syslinks. This
Matthew Dillon [Fri, 29 Jun 2007 00:18:05 +0000 (00:18 +0000)]
Get out-of-band DMA buffers working for user<->user syslinks.  This
allows the syslink protocol to operate in a manner very similar to the
way sophisticated DMA hardware works, where you have a DMA buffer attached
to a command.

Augment the syslink protocol to implement read, write, and read-modify-write
style commands.

Obtain the MP lock in places where needed because fileops are called without
it held now.  Our VM ops are not MP safe yet.

Use an XIO to map VM pages between userland processes.  Add additional
XIO functions to aid in copying data to and from a userland context.  This
removes an extra buffer copy from the path and allows us to manipulate pure
vm_page_t's for just about everything.

13 years agoClarify cpu localization requirements when using callout_stop() and
Matthew Dillon [Thu, 28 Jun 2007 20:24:57 +0000 (20:24 +0000)]
Clarify cpu localization requirements when using callout_stop() and
callout_reset().

Fix a SMP race in callout_stop() where the callout structure was being
modified in an unsafe manner.

13 years agoNuke SIMPLEQ_* and logprintf.
Hasso Tepper [Thu, 28 Jun 2007 13:55:13 +0000 (13:55 +0000)]
Nuke SIMPLEQ_* and logprintf.

13 years agoRemove duplicate.
Hasso Tepper [Thu, 28 Jun 2007 09:33:33 +0000 (09:33 +0000)]
Remove duplicate.

13 years agoNuke device_ptr_t, USBBASEDEVICE, USBDEVNAME(), USBDEVUNIT(), USBGETSOFTC(),
Hasso Tepper [Thu, 28 Jun 2007 06:32:33 +0000 (06:32 +0000)]
Nuke device_ptr_t, USBBASEDEVICE, USBDEVNAME(), USBDEVUNIT(), USBGETSOFTC(),
USBDEVPTRNAME() and Static with help from sed(1).

13 years agoFix a bug-a-boo, the type uuid was being printed instead of the storage
Matthew Dillon [Wed, 27 Jun 2007 18:15:57 +0000 (18:15 +0000)]
Fix a bug-a-boo, the type uuid was being printed instead of the storage
uuid (so the type was being printed twice).

13 years agoUse kernel functions. I don't understand how I could miss these ...
Hasso Tepper [Wed, 27 Jun 2007 13:26:18 +0000 (13:26 +0000)]
Use kernel functions. I don't understand how I could miss these ...

13 years agoNuke the code specific to NetBSD/OpenBSD/FreeBSD at first. I doubt anyone
Hasso Tepper [Wed, 27 Jun 2007 12:28:00 +0000 (12:28 +0000)]
Nuke the code specific to NetBSD/OpenBSD/FreeBSD at first. I doubt anyone
will update these pieces and I don't intend to review macros for all
platforms.

There is the chance though that I might kill something which should stay
in the code in form "TODO: port it to DF". So, please review and kick me.

13 years agoFix files that included the posix scheduling headers that were merged earlier.
Joe Talbott [Tue, 26 Jun 2007 23:30:05 +0000 (23:30 +0000)]
Fix files that included the posix scheduling headers that were merged earlier.

13 years agoImplement jscan -o. Take the patch from Steve and add some additional
Matthew Dillon [Tue, 26 Jun 2007 20:47:58 +0000 (20:47 +0000)]
Implement jscan -o.  Take the patch from Steve and add some additional
checks to the write() to deal with EINTR and EAGAIN.

Submitted-by: "Steve O'Hara-Smith" <steve@sohara.org>
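
The kind of EINTR and EAGAIN handling described above usually takes the shape of a write loop like the one below. This is a generic POSIX sketch, not the jscan code; the function name `write_all` is hypothetical, and waiting with poll(2) on EAGAIN is one reasonable choice among several.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/* Write the whole buffer. Returns 0 on success, -1 on a real error. */
static int
write_all(int fd, const void *buf, size_t len)
{
	const char *p = buf;

	assert(buf != NULL || len == 0);
	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted by a signal: retry */
			if (errno == EAGAIN || errno == EWOULDBLOCK) {
				/* Non-blocking fd is full: wait until writable. */
				struct pollfd pfd = { .fd = fd, .events = POLLOUT };

				(void)poll(&pfd, 1, -1);
				continue;
			}
			return -1;		/* errno describes the failure */
		}
		p += n;				/* short write: advance and loop */
		len -= (size_t)n;
	}
	return 0;
}
```

The loop also covers short writes, which write(2) is allowed to return even when no error occurs.
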
13 years agoA file descriptor of -1 is legal when accessing journal status. Just allow
Matthew Dillon [Tue, 26 Jun 2007 20:39:33 +0000 (20:39 +0000)]
A file descriptor of -1 is legal when accessing journal status.  Just allow
it generally, the journal command switch will recheck it on a per-command
basis.

13 years agoNuke USBDEV().
Hasso Tepper [Tue, 26 Jun 2007 19:52:10 +0000 (19:52 +0000)]
Nuke USBDEV().

13 years agoRepo-copy numerous files from sys/emulation/posix4 to sys/sys and sys/kern
Matthew Dillon [Tue, 26 Jun 2007 19:31:10 +0000 (19:31 +0000)]
Repo-copy numerous files from sys/emulation/posix4 to sys/sys and sys/kern
and adjust the build to suit.  posix scheduling is here to stay.

Submitted-by: Joe Talbott <josepht@cstone.net>
13 years agoIf RX csum calculation with pseudo header is enabled, bge(4) will
Sepherosa Ziehau [Tue, 26 Jun 2007 15:10:23 +0000 (15:10 +0000)]
If RX csum calculation with pseudo header is enabled, bge(4) will
miscalculate csum of frames carrying UDP datagrams.  If UDP datagrams
are not fragmented by IP, then the rate of miscalculation is low, but
if UDP datagrams are fragmented by IP, then most of the frames will be
delivered to the upper layer with wrong hardware csum, which is quite
common for NFS.

Disable hardware RX csum calculation with pseudo header; it will be
better than doing software csum if hardware csum error happens, since
the error rate is too high.

Reported-by: Thomas Nikolajsen <thomas.nikolajsen@mail.dk>
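
For readers unfamiliar with the term, the "pseudo header" is the 12 bytes of IPv4 information (source address, destination address, a zero byte, the protocol number, and the UDP length) that are summed together with the UDP header and payload. The sketch below shows the standard software computation. It illustrates the checksum itself and says nothing about how the bge(4) hardware goes wrong; all function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add a byte string to a running sum as big-endian 16-bit words. */
static uint32_t
sum16(uint32_t sum, const uint8_t *p, size_t len)
{
	while (len > 1) {
		sum += ((uint32_t)p[0] << 8) | p[1];
		p += 2;
		len -= 2;
	}
	if (len == 1)
		sum += (uint32_t)p[0] << 8;	/* pad the odd byte with zero */
	return sum;
}

/* Fold the carries back in and take the one's complement. */
static uint16_t
fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Checksum over the IPv4 pseudo header plus the UDP header and data. */
static uint16_t
udp_cksum(const uint8_t src[4], const uint8_t dst[4],
    const uint8_t *udp, uint16_t udp_len)
{
	uint8_t pseudo[12];
	int i;

	assert(udp != NULL && udp_len >= 8);
	for (i = 0; i < 4; i++) {
		pseudo[i] = src[i];
		pseudo[4 + i] = dst[i];
	}
	pseudo[8] = 0;
	pseudo[9] = 17;				/* IPPROTO_UDP */
	pseudo[10] = (uint8_t)(udp_len >> 8);
	pseudo[11] = (uint8_t)(udp_len & 0xff);
	return fold(sum16(sum16(0, pseudo, sizeof(pseudo)), udp, udp_len));
}
```

A sender computes this with the checksum field zeroed and stores the result in that field; a receiver that runs the same sum over the received datagram gets 0 when the data is intact. Because the sum covers the whole datagram, it can only be completed once all IP fragments have been reassembled.
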
13 years agoOne callout_stop() is enough.
Hasso Tepper [Tue, 26 Jun 2007 14:56:50 +0000 (14:56 +0000)]
One callout_stop() is enough.

Suggested-by: Joerg
13 years agomalloc -> kmalloc
Hasso Tepper [Tue, 26 Jun 2007 11:53:16 +0000 (11:53 +0000)]
malloc -> kmalloc

13 years ago- Fix headphone jack sensing support for Olivetti Olibook 610-430 XPSE.
Hasso Tepper [Tue, 26 Jun 2007 11:04:50 +0000 (11:04 +0000)]
- Fix headphone jack sensing support for Olivetti Olibook 610-430 XPSE.
- Drain all callout handlers during driver detach appropriately.
- M_NOWAIT -> M_WAITOK

Obtained-from: FreeBSD

13 years agoClean up sys/bus/usb/usb_port.h. Remove not used/dead/old code.
Hasso Tepper [Tue, 26 Jun 2007 08:36:24 +0000 (08:36 +0000)]
Clean up sys/bus/usb/usb_port.h. Remove not used/dead/old code.

13 years agoNuke "is is" stammering.
Hasso Tepper [Tue, 26 Jun 2007 07:47:28 +0000 (07:47 +0000)]
Nuke "is is" stammering.

13 years agoAdd a new option (-i) that allows the insane deviation value to be set, and
Matthew Dillon [Tue, 26 Jun 2007 02:40:20 +0000 (02:40 +0000)]
Add a new option (-i) that allows the insane deviation value to be set, and
change the default to 0.5 seconds.  For example, -i 0.025 would set the test
to be 25ms.

Change the insane check... just map out a server deemed to be insane for 60
minutes, do not disconnect or reset it (which might lead to excessive packet
traffic).

Update the documentation.

13 years agoCreate a default dntpd.conf file for DragonFly using three pool.ntp.org
Matthew Dillon [Tue, 26 Jun 2007 01:41:38 +0000 (01:41 +0000)]
Create a default dntpd.conf file for DragonFly using three pool.ntp.org
hosts as the time source.

13 years agoAdjust debug output so columns line up better.
Matthew Dillon [Tue, 26 Jun 2007 00:40:35 +0000 (00:40 +0000)]
Adjust debug output so columns line up better.

13 years agoRecode the state machine to make it a bit less confusing. Collapse the
Matthew Dillon [Mon, 25 Jun 2007 21:33:36 +0000 (21:33 +0000)]
Recode the state machine to make it a bit less confusing.  Collapse the
two failure states into a single failure state and handle failure processing
in each state.

Handle DNS failures by having dntpd relookup failed DNSes occasionally.

dntpd will now relookup the server name if a server fails, allowing you
to specify domains which front pools of ntp servers.  dntpd will also
check for duplicate IPs and relookup again (up to a point).

Add a sanity check.  If two or more servers are specified a quorum of
servers must agree that the selected time offset is reasonable.  For the
moment do a +/- 30 second check (though we can probably make this +/- 2
seconds).   If a server is determined to be broken, scrap its data and
reconnect.  If it is still broken, permanently disable it.  This is
primarily to handle severely broken servers that are occasionally present
in ntp pools.
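
A quorum test of this kind might look like the sketch below. It assumes that "quorum" means a strict majority and that offsets are expressed in seconds; the function name is hypothetical and the dntpd source may well structure the check differently.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return 1 if a strict majority of servers report an offset within
 * +/- limit seconds of the selected offset, 0 otherwise. With fewer
 * than two servers there is nothing to compare against.
 */
static int
offset_has_quorum(const double *offsets, size_t n, double selected,
    double limit)
{
	size_t i, agree = 0;

	assert(offsets != NULL && limit >= 0.0);
	if (n < 2)
		return 1;
	for (i = 0; i < n; i++) {
		double d = offsets[i] - selected;

		if (d < 0.0)
			d = -d;
		if (d <= limit)
			agree++;
	}
	return agree * 2 > n;
}
```

With offsets of 0.1, -0.2 and 45.0 seconds and a limit of 30 seconds, a selected offset of 0.1 is accepted, while a selected offset of 45.0 is rejected because only the server reporting it agrees.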

13 years agoFix rts_input() which is the only procedure which calls raw_input(). As
Matthew Dillon [Sun, 24 Jun 2007 20:00:00 +0000 (20:00 +0000)]
Fix rts_input() which is the only procedure which calls raw_input().  As
with other packet input routines, the mbuf must be demuxed and forwarded
to the correct protocol thread so it can be cpu-localized for processing.

This allows anyone, including interrupt code, to write to the routing
socket.

Reported-by: "Sepherosa Ziehau" <sepherosa@gmail.com>
13 years agoAdd missing name.
Sascha Wildner [Sun, 24 Jun 2007 17:42:58 +0000 (17:42 +0000)]
Add missing name.

13 years agoFix typo in a diagnostic message.
Sascha Wildner [Sun, 24 Jun 2007 17:37:35 +0000 (17:37 +0000)]
Fix typo in a diagnostic message.

13 years agoFix HISTORY.
Sascha Wildner [Sun, 24 Jun 2007 10:50:43 +0000 (10:50 +0000)]
Fix HISTORY.

13 years agoAdd a slightly modified ataraid(4) manpage from FreeBSD as nataraid(4).
Sascha Wildner [Sun, 24 Jun 2007 10:47:48 +0000 (10:47 +0000)]
Add a slightly modified ataraid(4) manpage from FreeBSD as nataraid(4).

13 years agoFrom FreeBSD:
Peter Avalos [Sun, 24 Jun 2007 05:17:51 +0000 (05:17 +0000)]
From FreeBSD:

Fixed the threshold for using the simple Taylor approximation.

In e_log.c, there was just an off-by-1 (1 ulp) error in the comment
about the threshold.  The precision of the threshold is unimportant,
but the magic numbers in the code are easier to understand when the
threshold is described precisely.

In e_logf.c, mistranslation of the magic numbers gave an off-by-1
(1 * 16 ulps) error in the intended negative bound for the threshold
and an off-by-7 (7 * 16 ulps) error in the intended positive bound for
the threshold, and the intended bounds were not translated from the
double precision bounds so they were unnecessarily small by a factor
of about 2048.

The optimization of using the simple Taylor approximation for args
near a power of 2 is dubious since it only applies to a relatively
small proportion of args, but if it is done then doing it 2048 times
as often _may_ be more efficient.  (My benchmarks show unexplained
dependencies on the data that increase with further optimizations
in this area.)

13 years agoRemove trailing whitespace.
Sascha Wildner [Sat, 23 Jun 2007 20:52:41 +0000 (20:52 +0000)]
Remove trailing whitespace.

13 years agoAdd markup for DIOCGPART.
Sascha Wildner [Sat, 23 Jun 2007 20:51:42 +0000 (20:51 +0000)]
Add markup for DIOCGPART.

13 years agoFix markup.
Sascha Wildner [Sat, 23 Jun 2007 20:39:10 +0000 (20:39 +0000)]
Fix markup.

13 years agoActually process rc_info.
Sascha Wildner [Sat, 23 Jun 2007 10:13:39 +0000 (10:13 +0000)]
Actually process rc_info.

13 years agoUse .Va for rc variables.
Sascha Wildner [Sat, 23 Jun 2007 09:37:24 +0000 (09:37 +0000)]
Use .Va for rc variables.

13 years ago- Add hw.skcX.imtime sysctl node and hw.skc.imtime tunable for interrupt
Sepherosa Ziehau [Sat, 23 Jun 2007 09:25:02 +0000 (09:25 +0000)]
- Add hw.skcX.imtime sysctl node and hw.skc.imtime tunable for interrupt
  moderation time.  Adjusting of hw.skcX.imtime will be committed to NIC
  immediately.
- Increase default interrupt moderation time from 100 usec to 160 usec.
  This reduces host interrupt load without noticeable performance impact.

13 years agoRemove unused variable.
Simon Schubert [Fri, 22 Jun 2007 21:41:16 +0000 (21:41 +0000)]
Remove unused variable.

13 years ago- Factor out bge_{disable,enable}_intr().
Sepherosa Ziehau [Fri, 22 Jun 2007 15:26:18 +0000 (15:26 +0000)]
- Factor out bge_{disable,enable}_intr().
- In bge_enable_intr(), trigger another hardware interrupt after clearing
  interrupt mask, since any writing to BGE_MBX_IRQ0_LO will acknowledge
  interrupts.  Add comment about it.
- In bge_disable_intr(), acknowledge and disable interrupt by writing 1 to
  BGE_MBX_IRQ0_LO, since setting interrupt mask itself does not de-assert
  a currently asserted interrupt.  Add comment about it.
- Since we have explicitly disabled interrupt using BGE_MBX_IRQ0_LO, set
  "RX/TX coalesced BD count during interrupt" to 1.  In this way, RX/TX
  coalescing engine will properly update status block, which contains RX/TX
  descriptor index.  This only affects polling(4) operation, since we don't
  have a "during interrupt" period in our interrupt handler.
- Fix comment.

Tested-with: 5751, 5701(altima)

13 years ago- Add KTR_IF_{BGE,EM} to opt_ktr.h
Sepherosa Ziehau [Fri, 22 Jun 2007 12:08:07 +0000 (12:08 +0000)]
- Add KTR_IF_{BGE,EM} to opt_ktr.h
- Add commented out KTR_IF_{BGE,EM} entries to LINT

Reminded-by: swildner@
# LINT compiling test is conducted with KTR_IF_{BGE,EM}

13 years agoDrain packets even if link is down.
Sepherosa Ziehau [Fri, 22 Jun 2007 11:53:40 +0000 (11:53 +0000)]
Drain packets even if link is down.

Suggested-by: joerg@
Obtained-from: NetBSD (mlelstv@netbsd.org)

13 years agoAdd some KTRs in bge(4) to count RX/TX packets per interrupt.
Sepherosa Ziehau [Thu, 21 Jun 2007 15:00:18 +0000 (15:00 +0000)]
Add some KTRs in bge(4) to count RX/TX packets per interrupt.

13 years agoAdd mpls-in-ip. Bring in some fixes from IANA and FreeBSD in progress.
Hasso Tepper [Thu, 21 Jun 2007 13:36:58 +0000 (13:36 +0000)]
Add mpls-in-ip. Bring in some fixes from IANA and FreeBSD in progress.

13 years agoUpgrade to less-406 fixing some display bugs.
Peter Avalos [Wed, 20 Jun 2007 23:37:33 +0000 (23:37 +0000)]
Upgrade to less-406 fixing some display bugs.

13 years agoAdd our READMEs.
Peter Avalos [Wed, 20 Jun 2007 23:28:28 +0000 (23:28 +0000)]
Add our READMEs.

13 years agoMerge from vendor branch LESS:
Peter Avalos [Wed, 20 Jun 2007 23:25:56 +0000 (23:25 +0000)]
Merge from vendor branch LESS:
Import less-406.

13 years agoImport less-406.
Peter Avalos [Wed, 20 Jun 2007 23:25:56 +0000 (23:25 +0000)]
Import less-406.

13 years agoFix an issue with positive namecache timeouts. Locked children often
Matthew Dillon [Wed, 20 Jun 2007 06:23:24 +0000 (06:23 +0000)]
Fix an issue with positive namecache timeouts.  Locked children often
depend on the resolved vnode in the parent ncp's remaining intact, but
the positive namecache timeout code broke that rule and caused certain
VFS functions which depend on an intact parent (rename & remove primarily)
to occasionally return EPERM.  Only zap the node if it has no children.

Reported-by: Thomas Nikolajsen <thomas.nikolajsen@mail.dk>
13 years agoCorrect a bug in the -S truncation mode where the mode was not being passed
Matthew Dillon [Tue, 19 Jun 2007 19:28:18 +0000 (19:28 +0000)]
Correct a bug in the -S truncation mode where the mode was not being passed
to open(2), resulting in new files being created with weird permissions.

13 years agoDo not blindly allow the block count to overflow. Restrict newfs filesystem
Matthew Dillon [Tue, 19 Jun 2007 19:18:20 +0000 (19:18 +0000)]
Do not blindly allow the block count to overflow.  Restrict newfs filesystem
sizes to just under 1TB and report a fatal error if the media is too large.

13 years agoThe fstype was not being properly tested for a CCD uuid.
Matthew Dillon [Tue, 19 Jun 2007 19:09:46 +0000 (19:09 +0000)]
The fstype was not being properly tested for a CCD uuid.

Correct a bug when generating an interleave table for very large disk
arrays (> 2TB).  A size variable was 32 bits instead of 64 bits.

13 years agoRefuse to label media that is too large to handle a 32 bit disklabel
Matthew Dillon [Tue, 19 Jun 2007 19:07:41 +0000 (19:07 +0000)]
Refuse to label media that is too large to handle a 32 bit disklabel
(aka > 2TB).  disklabel64 will have to be used instead on such media.

13 years agoAdd the -p pidfile option to the vkernel.
Matthew Dillon [Tue, 19 Jun 2007 17:25:48 +0000 (17:25 +0000)]
Add the -p pidfile option to the vkernel.

Submitted-by: Chris Turner <c.turner@199technologies.org>
13 years agoAdd sysctl/tunable for TX/RX interrupt coalescing variables. Default
Sepherosa Ziehau [Tue, 19 Jun 2007 14:59:41 +0000 (14:59 +0000)]
Add sysctl/tunable for TX/RX interrupt coalescing variables.  Default
values are obtained from empirical measurement on bcm5751(PCIe).
Inspired-by: Bruce Evans <brde@optusnet.com.au> on freebsd-net mail list
For a running bge(4), setting these sysctl variables will not take
effect immediately; they are committed to device in the upcoming
interrupt handler.
Adapted-from: NetBSD if_bge.c 1.58 (jonathan@netbsd.org)

# On Altima AC9100 (bcm5701 based), TX/RX interrupt coalescing values
# seem to have no effect at all :(

13 years agoRename d_obj_uuid to d_stor_uuid to conform to the naming convention being
Matthew Dillon [Tue, 19 Jun 2007 06:39:10 +0000 (06:39 +0000)]
Rename d_obj_uuid to d_stor_uuid to conform to the naming convention being
used in other structures.

13 years agoCorrect a couple of uuid retention issues. Output the storage uuid for
Matthew Dillon [Tue, 19 Jun 2007 06:38:33 +0000 (06:38 +0000)]
Correct a couple of uuid retention issues.  Output the storage uuid for
each partition and generate a new storage uuid for any partition missing
one (also regenerate it if the user deletes the storage uuid line for
that partition).

13 years agoMake some adjustments to clean up structural field names. Add type and
Matthew Dillon [Tue, 19 Jun 2007 06:07:57 +0000 (06:07 +0000)]
Make some adjustments to clean up structural field names.  Add type and
storage uuid's to the partinfo structure for the DIOCGPART ioctl and
load the fields up for GPT slices and disklabel64 partitions.

13 years agoImplement non-booting support for the DragonFly 64 bit disklabel:
Matthew Dillon [Tue, 19 Jun 2007 02:53:56 +0000 (02:53 +0000)]
Implement non-booting support for the DragonFly 64 bit disklabel:

* Add full kernel support.  Both 32 and 64 bit labels will be probed.
* Add a new program, disklabel64, which allows you to create and edit
  the new disklabel.
* Add some logic to prevent foot shooting.

DragonFly's 64 bit disklabels start at byte offset 0 on the disk slice
or GPT partition and operate in a slice-relative fashion.  No translation
is required when going from on-disk to in-core or vice versa, unlike the
existing 32 bit disklabels.  512 bytes at the beginning of the label are
reserved for legacy boot code.  Specifically, the label starts at sector 0,
NOT sector 1, which means its location on the disk is the same regardless
of the sector size.

The label has a UUID to uniquely identify the storage and a type and
object uuid for each partition.  All location specifications are 64 bit
byte offsets, NOT logical blocks.  The label enforces an alignment
requirement for label-related I/O and partitions which defaults to 4K
regardless of the sector size.  This makes the label 100% portable across
media with different sector sizes within the constraints of the alignment
requirement.

All partitions are specified using byte offsets and sizes, constrained
by the alignment requirement, relative to the base of the label (i.e.
offset 0 in the slice).  disklabel64 will adjust the offsets for display
purposes to be relative to the partition table area.  The label headers,
partition table, and boot2 areas come BEFORE the partition table area and
partitions which overlap any of those objects are not allowed.

By default, a virgin 64 bit disklabel will reserve 32K for boot2.  As of
this writing, boot1 and boot2 blocks have not yet been implemented.

13 years agoImprove the error message for gpt add a little.
Matthew Dillon [Tue, 19 Jun 2007 02:30:35 +0000 (02:30 +0000)]
Improve the error message for gpt add a little.

13 years agoMake vkernel compile with 'options SMP'. Most functions are stubs that
Joe Talbott [Mon, 18 Jun 2007 18:57:13 +0000 (18:57 +0000)]
Make vkernel compile with 'options SMP'.  Most functions are stubs that
call panic(9).

Reviewed-By: Matt Dillon
13 years agoMove all the code related to handling the current 32 bit disklabel
Matthew Dillon [Mon, 18 Jun 2007 05:13:43 +0000 (05:13 +0000)]
Move all the code related to handling the current 32 bit disklabel
to subr_disklabel32.c.  Move the header file from sys/disklabel.h to
sys/disklabel32.h.  Rename all the related structures and constants
and retire 'struct disklabel'.

Redo the sys/disklabel.h header file to implement a generic disklabel
abstraction.  Modify kern/subr_diskslice.c to use this abstraction, with
some shims for the ops dispatch at the moment which will be cleaned up later.

Adjust all auxiliary code that directly accesses 32 bit disklabels to use
the new structure and constant names.

Remove the snoop-adjust code.  The kernel would snoop reads and writes to
the disklabel area via the raw slice device (e.g. ad0s1) and convert the
disklabel from the in-core format to the on-disk format and vice versa.
The reads and writes made by disklabel -r and the kernel's own internal
readdisklabel and writedisklabel code used the snooping.

Rearrange the kernel's internal code to manually convert the disklabel when
reading and writing.  Rearrange the /sbin/disklabel program to do the same
when the -r option is used.  Have the disklabel program also check which
DragonFly OS it is running under so it can be run on older systems.  Note
that the disklabel binary prior to these changes will NOT operate on the
disklabel properly if running on a NEW kernel.

Introduce skeleton files for 64 bit disklabel support.

13 years agoDisklabel separation work - more.
Matthew Dillon [Mon, 18 Jun 2007 00:38:08 +0000 (00:38 +0000)]
Disklabel separation work - more.

13 years agoDisklabel separation work - Generally shift all disklabel-specific
Matthew Dillon [Sun, 17 Jun 2007 23:50:16 +0000 (23:50 +0000)]
Disklabel separation work - Generally shift all disklabel-specific
procedures for the kernel proper to a new source file, subr_disklabel32.c.
Move the DTYPE_ and FS_ defines out of sys/disklabel.h and into a new
header file, sys/dtype.h.

Make adjustments to the uuids file, renaming "DragonFly Label" to
"DragonFly Label32" and creating a "DragonFly Label64" uuid.

13 years agoMore syslink messaging work. Now basically done except for the I/O 'DMA'
Matthew Dillon [Sun, 17 Jun 2007 21:31:07 +0000 (21:31 +0000)]
More syslink messaging work.  Now basically done except for the I/O 'DMA'
component.

13 years ago* Add a missing KMODDEP to ng_eiface and hook it into the build. [*]
Sascha Wildner [Sun, 17 Jun 2007 20:33:14 +0000 (20:33 +0000)]
* Add a missing KMODDEP to ng_eiface and hook it into the build. [*]

* Add a ng_eiface(4) manual page from FreeBSD-4 [*] and add a reference
  to it in netgraph(4).

* Add a NETGRAPH_EIFACE kernel config option.

* Sync libnetgraph with our node types.

[*] Submitted-by: Nuno-Antunes <nuno.antunes@gmail.com>