9 years agoAdd kernel-layer support for chflags checks, remove (most) from the VFS layer.
Matthew Dillon [Wed, 6 May 2009 02:14:31 +0000 (19:14 -0700)]
Add kernel-layer support for chflags checks, remove (most) from the VFS layer.

Give nlookup() and nlookup_va() the tools to do nearly all chflags related
activities.  Here are the rules:

Immutable (uchg, schg)

    If set on a directory no files associated with the directory may
    be created, deleted, linked, or renamed.  In addition, any files open()ed
    via the directory will be immutable whether they are flagged that
    way or not.

    If set on a file or directory the file or directory may not be
    written to, chmodded, chowned, chgrped, or renamed.  The file can
    still be hardlinked and the file/directory can still be chflagged.
    If you do not wish the file to be linkable then set the immutable bit
    on all directories containing a link of the file.  Once you form
    this closure no further links will be possible.

    NOTE ON REASONING:  Security scripts should check link counts anyway,
    depending on a file flag which can be changed as a replacement for
    checking the link count is stupid.  If you are secure then your closures
    will hold.  If you aren't then nothing will save you.

    This feature is not recursive.  If the directory contains
    subdirectories they must be flagged immutable as well.

Undeletable (uunlnk, sunlnk)

    If set on a file or directory that file or directory cannot be removed
    or renamed.  The file can still otherwise be manipulated, linked, and
    so forth.  However, it should be noted that any hardlinks you create
    will also not be deletable :-)

    If set on a directory this flag has no effect on the contents
    of the directory (yet).  See APPEND-ONLY on directories for what
    you want.

Append-only (uappnd/sappnd)

    If set on a directory no file within the directory may be deleted or
    renamed.  However, new files may be created in the directory and
    the files in the directory can be modified or hardlinked without

    If set on a file the file cannot be truncated, random-written, or
    deleted.  It CAN be chmodded, chowned, renamed, and appended to
    with O_APPEND etc.

    If you do not wish the file to be renameable then you must also
    set the Undeletable flag.  Setting the append-only flag will ensure
    that the file doesn't disappear from the filesystem, but does not
    prevent it from being moved about the filesystem.
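
    The per-file rules above can be summarized as a small decision table.
    The following is only a sketch: the flag bits, the operation names and
    file_op_allowed() are hypothetical and are not the kernel's identifiers
    (the real flags are the UF_ and SF_ bits from <sys/stat.h>).  It looks
    only at flags set on the file itself, not on its directories, and it
    folds chgrp into OP_CHOWN.

```c
#include <stdbool.h>

/* Hypothetical flag bits, for illustration only. */
enum {
    FL_IMMUTABLE   = 1 << 0,   /* uchg / schg */
    FL_UNDELETABLE = 1 << 1,   /* uunlnk / sunlnk */
    FL_APPENDONLY  = 1 << 2    /* uappnd / sappnd */
};

/* Hypothetical operation names, for illustration only. */
enum file_op {
    OP_WRITE_RANDOM, OP_APPEND, OP_TRUNCATE, OP_CHMOD, OP_CHOWN,
    OP_RENAME, OP_DELETE, OP_HARDLINK, OP_CHFLAGS
};

/*
 * Return true if `op` is permitted on a file carrying `flags`.
 * Assumption: deleting an immutable file is not spelled out in the
 * rules, so this sketch leaves OP_DELETE to the other two flags.
 */
bool file_op_allowed(unsigned flags, enum file_op op)
{
    if (flags & FL_IMMUTABLE) {
        switch (op) {
        case OP_WRITE_RANDOM:
        case OP_APPEND:
        case OP_TRUNCATE:
        case OP_CHMOD:
        case OP_CHOWN:
        case OP_RENAME:
            return false;
        default:
            break;              /* hardlink and chflags stay possible */
        }
    }
    if (flags & FL_UNDELETABLE) {
        if (op == OP_DELETE || op == OP_RENAME)
            return false;
    }
    if (flags & FL_APPENDONLY) {
        if (op == OP_TRUNCATE || op == OP_WRITE_RANDOM || op == OP_DELETE)
            return false;
    }
    return true;
}
```

    For example, file_op_allowed(FL_APPENDONLY, OP_RENAME) is true, while
    adding FL_UNDELETABLE to the same file makes the rename fail, which is
    the combination the text recommends.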

Security fix - futimes()

    futimes() could be called on any open descriptor.  Restrict
    it to just those files you own or have write permission on.

Security fix - Hardlinks

    Users can no longer hardlink foreign-owned files which they do not
    have write access to.  The user must now have write permission on
    the file being hardlinked or the user must own the file, or be root.
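
    A minimal sketch of the hardlink rule as a predicate.  may_hardlink()
    and its parameters are hypothetical names, and "root" is assumed to
    mean uid 0; the real check lives in the kernel permission code.

```c
#include <stdbool.h>
#include <sys/types.h>

/*
 * Return true if the caller may create a hardlink to the file.
 * caller_can_write is the outcome of an ordinary write-permission
 * check on the file, computed elsewhere (hypothetical parameter).
 */
bool may_hardlink(uid_t caller_uid, uid_t file_owner, bool caller_can_write)
{
    if (caller_uid == 0)            /* root (assumed to be uid 0) */
        return true;
    if (caller_uid == file_owner)   /* owner may always link */
        return true;
    return caller_can_write;        /* otherwise write access is required */
}
```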

Security fix - fcntl()

    fcntl() can no longer be used to turn off O_APPEND mode if the file
    was flagged append-only.
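
    The fcntl() restriction can be sketched as a check on the old and new
    descriptor flags.  check_setfl() is a hypothetical helper, and the
    choice of EPERM as the error is an assumption; the commit message does
    not say which errno is returned.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>

/*
 * Decide whether an F_SETFL-style change from old_flags to new_flags
 * may proceed.  Returns 0 if allowed, or an errno value if refused.
 * EPERM is an assumed error code for this sketch.
 */
int check_setfl(int old_flags, int new_flags, bool file_is_append_only)
{
    bool clears_append = (old_flags & O_APPEND) && !(new_flags & O_APPEND);

    if (file_is_append_only && clears_append)
        return EPERM;
    return 0;
}
```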


    * Append-only on directories

    * Immutable on directories to control set-in-stone & hardlinking

    * Immutable files can be hardlinked on DragonFly, not on FreeBSD.

    * User must be the owner of the file or have write access to the
      file being hardlinked.

9 years agoAdd tunable to _not_ register i8254 interrupt if lapic timer is used.
Sepherosa Ziehau [Mon, 4 May 2009 11:13:23 +0000 (19:13 +0800)]
Add tunable to _not_ register i8254 interrupt if lapic timer is used.

9 years agoMake dev.cpu.X.cx_usage sysctl also report current average of sleep time.
Hasso Tepper [Mon, 4 May 2009 08:47:30 +0000 (11:47 +0300)]
Make dev.cpu.X.cx_usage sysctl also report current average of sleep time.

9 years agoRework a CPU C-state selection logic a bit.
Hasso Tepper [Mon, 4 May 2009 08:45:16 +0000 (11:45 +0300)]
Rework a CPU C-state selection logic a bit.

Avoid comparing negative signed values to positive unsigned values. This
led to a bug where the C-state did not decrease on sleeps shorter than the
declared transition latency. Fixing this obsoletes the workaround for
broken C-states on some hardware.

While here, change the state selection logic a bit. Instead of the last
sleep time, use a short-term average of it. The global interrupt rate in a
system is too random a value to correlate subsequent sleeps so directly.
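
The two changes can be illustrated with a short sketch.  All names here
are hypothetical, and the 3/4 to 1/4 weighting of the average is an
assumption rather than a value taken from this commit.  The "buggy"
function spells out the conversion that C applies implicitly when a
signed int is compared with an unsigned value.

```c
#include <stdint.h>

/*
 * What the mixed comparison effectively did: the signed sleep time is
 * converted to unsigned, so a negative value looks enormous and is
 * never "shorter than" the latency.
 */
int sleep_shorter_buggy(int sleep_us, uint32_t latency_us)
{
    return (uint32_t)sleep_us < latency_us;
}

/* Handle the negative case before doing the unsigned comparison. */
int sleep_shorter_fixed(int sleep_us, uint32_t latency_us)
{
    if (sleep_us < 0)
        return 1;
    return (uint32_t)sleep_us < latency_us;
}

/*
 * Short-term average of the sleep time: three parts old average, one
 * part newest sample (the weights are assumed for this sketch).
 */
uint32_t update_avg_sleep(uint32_t avg_us, uint32_t last_us)
{
    return (avg_us * 3 + last_us) / 4;
}
```

With an average of 100us and a new sample of 500us, update_avg_sleep()
moves the average to 200us, so a single long or short sleep shifts the
selection only gradually.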

Obtained-from: FreeBSD

9 years agoMove the code to update cpu_cx_count out of acpi_cpu_generic_cx_probe().
Hasso Tepper [Mon, 4 May 2009 08:43:29 +0000 (11:43 +0300)]
Move the code to update cpu_cx_count out of acpi_cpu_generic_cx_probe().

Put it into acpi_cpu_startup() which is where all the other code to update
this global variable lives.  This fixes a bug where cpu_cx_count was not
updated correctly if acpi_cpu_generic_cx_probe() returned early.

Obtained-from: FreeBSD

9 years agoacpi_cpu: fixup for PIIX4E PCI config related to C2.
Hasso Tepper [Mon, 4 May 2009 08:41:24 +0000 (11:41 +0300)]
acpi_cpu: fixup for PIIX4E PCI config related to C2.

If you have seen
cpu0: too many short sleeps, backing off to C1
with this chipset before you may want to try cx_lowest of C2 again.

Obtained-from: FreeBSD

9 years agoMerge branch 'master' of ssh://crater.dragonflybsd.org/repository/git/dragonfly
Matthew Dillon [Mon, 4 May 2009 06:15:18 +0000 (23:15 -0700)]
Merge branch 'master' of ssh://crater.dragonflybsd.org/repository/git/dragonfly

9 years agoThe kernel permissions check code was not checking deletability for
Matthew Dillon [Mon, 4 May 2009 06:08:19 +0000 (23:08 -0700)]
The kernel permissions check code was not checking deletability for
the rename source or the directory sticky bit for rename targets which

This only affected HAMMER which assumes the kernel is responsible for
permissions checks.

Reported-by: YONETANI Tomokazu <qhwt+dfly@les.ath.cx>
9 years agonanosleep: don't overwrite error with copyout success status
Simon Schubert [Sun, 3 May 2009 21:06:53 +0000 (23:06 +0200)]
nanosleep: don't overwrite error with copyout success status

When nanosleep gets interrupted, it returns EINTR.  In the case of a
non-zero error status, sys_nanosleep will copyout() the remaining sleep
time.  However it would overwrite the nanosleep error status with the
error status of copyout() -- which is 0 (success) most of the time.  This
means the important error status of nanosleep (EINTR) would be overwritten
by 0.  Follow FreeBSD and NetBSD and only return the copyout status if it
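
A sketch of the error handling described here.  The function names are
hypothetical, and the fixed variant assumes that the copyout() status is
meant to replace the sleep status only when copyout() itself fails.

```c
#include <errno.h>

/* Old behaviour: any copyout() result replaces the sleep error. */
int nanosleep_result_buggy(int sleep_error, int copyout_error)
{
    if (sleep_error != 0)
        return copyout_error;   /* EINTR is lost when copyout succeeds */
    return sleep_error;
}

/* Keep the sleep error unless copyout() reports a failure of its own. */
int nanosleep_result(int sleep_error, int copyout_error)
{
    if (sleep_error != 0 && copyout_error != 0)
        return copyout_error;
    return sleep_error;
}
```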

Reported-by: walt
9 years agoinstaller: Sync ASCII Fred with loader menu pic.
Sascha Wildner [Sun, 3 May 2009 15:39:09 +0000 (17:39 +0200)]
installer: Sync ASCII Fred with loader menu pic.

9 years agoinstaller: Move the installer from contrib/ to usr.sbin/.
Sascha Wildner [Sun, 3 May 2009 15:22:33 +0000 (17:22 +0200)]
installer: Move the installer from contrib/ to usr.sbin/.

Fix trailing whitespace while I'm doing it.

9 years agoacpi: Select proper one shot timer based on CPUs' C3 state.
Sepherosa Ziehau [Sun, 3 May 2009 08:42:11 +0000 (16:42 +0800)]
acpi: Select proper one shot timer based on CPUs' C3 state.

9 years agoAllow one shot timer to be switched on a running system between i8254 and
Sepherosa Ziehau [Sun, 3 May 2009 07:36:56 +0000 (15:36 +0800)]
Allow one shot timer to be switched on a running system between i8254 and
lapic timer:

- Always register "clk" interrupt.
- Add cputimer_intr_switch(), which could switch one shot timer between
  i8254 and lapic timer on a running system.  It could be used to select
  a proper one shot timer during ACPI C3 transition:
  e.g. ->C3 use i8254, C3-> use lapic timer
- Add sysctl node hw.cputimer_intr_type to test cputimer_intr_switch().

9 years agohpet: Veto Sx state transition, if hpet is the sys_cputimer, since
Sepherosa Ziehau [Sun, 3 May 2009 05:53:52 +0000 (13:53 +0800)]
hpet: Veto Sx state transition, if hpet is the sys_cputimer, since
hpet is not required to function under S1-S5.  Add comment about
the reference to the related hpet standard items.

9 years agolapic timer: Improve lapic timer vector code
Sepherosa Ziehau [Sun, 3 May 2009 04:49:15 +0000 (12:49 +0800)]
lapic timer: Improve lapic timer vector code

- Check for a non-zero td->td_nest_count before allowing the processing
  to occur.  Mainly to allow interrupt thread preemption to work for
  slow interrupts
- Increment V_INTR statistic, so vm.stats.sys.v_intr shows correct value,
  i.e. 'Int' field in systat -vm

Submitted-by: dillon@
9 years agoinstaller: Deactivate "Install extra software packages" option.
Sascha Wildner [Sun, 3 May 2009 08:51:40 +0000 (10:51 +0200)]
installer: Deactivate "Install extra software packages" option.

Our installation unconditionally cpdups the CD/DVD's /usr/pkg to the
disk so this option doesn't make sense and only confused people in the

Leave "Remove software packages" in, though, since it actually seems to

9 years agoinstaller: Fix various issues related to MFS backed partitions.
Sascha Wildner [Sun, 3 May 2009 07:25:44 +0000 (09:25 +0200)]
installer: Fix various issues related to MFS backed partitions.

* The size of the MFS should be what the user specified, and not be
  based on slice size.

* On the fstab line, specify block and fragment size too. Softupdates
  is ignored as it doesn't seem to play nice with MFS.

* MFS backed partitions don't need to be mounted at installation time
  but the mount points have to be created anyway.

* While here, perform some minor cleanup.

9 years agoinstaller: Fix typo that prevented the creation of MFS backed partitions.
Sascha Wildner [Sat, 2 May 2009 20:42:08 +0000 (22:42 +0200)]
installer: Fix typo that prevented the creation of MFS backed partitions.

A bit late, but it was hard to spot. :)

Reported-by: Alec Berryman <alec@thened.net>
Dragonfly-bug: <http://bugs.dragonflybsd.org/issue34>

9 years agoinstaller: Remove some OpenBSD specific code.
Sascha Wildner [Sat, 2 May 2009 07:57:27 +0000 (09:57 +0200)]
installer: Remove some OpenBSD specific code.

9 years agoMerge branch 'master' of ssh://crater.dragonflybsd.org/repository/git/dragonfly
Matthew Dillon [Sat, 2 May 2009 17:52:18 +0000 (10:52 -0700)]
Merge branch 'master' of ssh://crater.dragonflybsd.org/repository/git/dragonfly

9 years agoposix_memalign.3: Add implementation notes from malloc.3
Stathis Kamperis [Sat, 2 May 2009 18:54:09 +0000 (18:54 +0000)]
posix_memalign.3: Add implementation notes from malloc.3

Reviewed-by: swildner@
9 years agoposix_memalign.3: Import manual page from FreeBSD.
Stathis Kamperis [Sat, 2 May 2009 12:30:14 +0000 (12:30 +0000)]
posix_memalign.3: Import manual page from FreeBSD.

Reviewed-by: swildner@
9 years agoFix sticky bit directory handling for deletions.
Matthew Dillon [Sat, 2 May 2009 17:49:58 +0000 (10:49 -0700)]
Fix sticky bit directory handling for deletions.

Properly check the sticky bit and disallow deletions if set on a directory
and the user doing the deletion does not own the directory or the file.

This check is now being done by the kernel layer, VFSs do not need to do
this check any more.
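
The sticky-bit rule can be written as a small predicate.  This is a
sketch only: sticky_allows_delete() is a hypothetical name, and the
exemption for uid 0 is an assumption that the commit message does not
state.

```c
#include <stdbool.h>
#include <sys/types.h>
#include <sys/stat.h>

#ifndef S_ISVTX
#define S_ISVTX 01000   /* sticky bit, in case the header hides it */
#endif

/*
 * Return true if the sticky-bit rule does not forbid `caller` from
 * deleting a file owned by `file_owner` in a directory with mode
 * `dir_mode` owned by `dir_owner`.  Ordinary directory write
 * permission is a separate check and is not modelled here.
 */
bool sticky_allows_delete(mode_t dir_mode, uid_t dir_owner,
                          uid_t file_owner, uid_t caller)
{
    if (!(dir_mode & S_ISVTX))
        return true;            /* no sticky bit: rule does not apply */
    if (caller == 0)
        return true;            /* root exemption (assumed) */
    return caller == dir_owner || caller == file_owner;
}
```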

Reported-by: YONETANI Tomokazu <qhwt+dfly@les.ath.cx>
9 years agoUnbreak VKERNEL compile: Add missing symbol
Sepherosa Ziehau [Sat, 2 May 2009 13:59:29 +0000 (21:59 +0800)]
Unbreak VKERNEL compile: Add missing symbol

9 years agolapic timer: Correct AMD C1E handling
Sepherosa Ziehau [Sat, 2 May 2009 13:35:59 +0000 (21:35 +0800)]
lapic timer: Correct AMD C1E handling

It turns out that AMD C1E only happens after the ACPI-CA module is
running, so we have to broadcast an IPI at the end of the ACPI-CA
attach to clear the C1E related bits and kick start the possibly
stalled lapic timer.

Tested-with: TL-58

9 years agolapic timer: Finish the lapic timer support
Sepherosa Ziehau [Sat, 2 May 2009 09:41:23 +0000 (17:41 +0800)]
lapic timer: Finish the lapic timer support

- Add lapic_timer_process_oncpu(), which fires per-cpu systimer queue.
- Add lapic_timer_intr_reload(), which starts/restarts the lapic timer.
- Change cputimer_intr_reload to a function pointer, so it can be
  overridden when needed.  It remains the original cputimer_intr_reload
  function on amd64 and vkernel.  On i386, APIC initialization will set it
  to lapic_timer_intr_reload if the lapic_timer_enable tunable is set to 1,
  else i8254_intr_reload (the original cputimer_intr_reload) will be used.
- If lapic_timer_enable is 1, then don't try to register "clk" interrupt
  handler at all.

As of this commit, lapic timer support is done.  It is not enabled by
default, set 'hw.lapic_timer_enable' to enable it.
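
The function-pointer arrangement can be sketched as follows.  Every
identifier carries a _sketch suffix to make clear that these are
stand-ins and not the kernel's real symbols; the reload bodies only
record which timer was chosen instead of touching hardware.

```c
#include <stdint.h>

typedef void (*intr_reload_fn)(uint32_t reload);

/* 1 = i8254, 2 = lapic; exists only so the sketch is observable. */
static int last_timer_used_sketch;

static void i8254_intr_reload_sketch(uint32_t reload)
{
    (void)reload;
    last_timer_used_sketch = 1;
}

static void lapic_timer_intr_reload_sketch(uint32_t reload)
{
    (void)reload;
    last_timer_used_sketch = 2;
}

/* Defaults to the i8254 reload function, as before the change. */
intr_reload_fn cputimer_intr_reload_sketch = i8254_intr_reload_sketch;

/* Stand-in for the APIC initialization step that picks the handler. */
void apic_init_sketch(int lapic_timer_enable)
{
    cputimer_intr_reload_sketch = lapic_timer_enable
        ? lapic_timer_intr_reload_sketch
        : i8254_intr_reload_sketch;
}
```

Callers keep invoking cputimer_intr_reload_sketch(reload) unchanged; only
the initialization code needs to know which timer is behind the pointer.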

9 years agolapic timer: Improve lapic timer testing
Sepherosa Ziehau [Sat, 2 May 2009 08:26:22 +0000 (16:26 +0800)]
lapic timer: Improve lapic timer testing

- Add lapic_timer_oneshot_intr_enable(), which sets the lapic timer into
  one shot mode and enables the lapic timer interrupt.  It is called
  during per-cpu systimers initialization.
- Add lapic_timer_oneshot_quick(), which only sets the lapic timer's ICR

9 years agosystimer/cputimer: Add {systimer,cputimer}_intr_enable()
Sepherosa Ziehau [Sat, 2 May 2009 07:57:28 +0000 (15:57 +0800)]
systimer/cputimer: Add {systimer,cputimer}_intr_enable()

9 years agolapic timer: Add lapic timer interrupt delivery testing
Sepherosa Ziehau [Sat, 2 May 2009 07:28:50 +0000 (15:28 +0800)]
lapic timer: Add lapic timer interrupt delivery testing

9 years agolapic timer: Add necessary bits for lapic timer interrupt delivery
Sepherosa Ziehau [Sat, 2 May 2009 05:50:34 +0000 (13:50 +0800)]
lapic timer: Add necessary bits for lapic timer interrupt delivery

The implementation in ipl.s and apic_vector.s is based on our ipiq

9 years agolapic timer: Setup AP lapic timer's divisor
Sepherosa Ziehau [Sat, 2 May 2009 04:45:41 +0000 (12:45 +0800)]
lapic timer: Setup AP lapic timer's divisor

9 years agoMove sysbeepstop_ch initialization to the beginning of cpu_initclocks()
Sepherosa Ziehau [Sat, 2 May 2009 04:33:19 +0000 (12:33 +0800)]
Move sysbeepstop_ch initialization to the beginning of cpu_initclocks()

9 years agoi8254: Adjust cpu_initclocks() a little bit.
Sepherosa Ziehau [Sat, 2 May 2009 04:24:21 +0000 (12:24 +0800)]
i8254: Adjust cpu_initclocks() a little bit.

- Factor out i8254_intr_reload(), and use it in cpu_initclocks()
- In i8254 interrupt delivery testing, add assertion to make sure
  that the current sys_cputimer is i8254

9 years agoclocks.7: Mention HPET cpu timer.
Stathis Kamperis [Sat, 2 May 2009 11:23:25 +0000 (11:23 +0000)]
clocks.7: Mention HPET cpu timer.

Based on the commit message of bea6e27816f976132ff7ad7446c0e90406378f9b.
Also, replace .Sy macros with .Va when referring to sysctl variables.

Discussed-with: sephe@
Reviewed-by: swildner@
9 years agoRaise some WARNS in usr.bin.
Sascha Wildner [Fri, 1 May 2009 22:20:25 +0000 (00:20 +0200)]
Raise some WARNS in usr.bin.

9 years agodoscmd(1): Raise WARNS to 6 and silence all warnings.
Sascha Wildner [Fri, 1 May 2009 20:00:12 +0000 (22:00 +0200)]
doscmd(1): Raise WARNS to 6 and silence all warnings.

9 years agoclocks(7): tsc frequency is found with hw.tsc_frequency now
Stathis Kamperis [Fri, 1 May 2009 21:50:58 +0000 (21:50 +0000)]
clocks(7): tsc frequency is found with hw.tsc_frequency now

Also mention hw.tsc_present sysctl for determining its presence.

9 years agolapic timer: Reimplement set_apic_timer using lapic_timer_oneshot
Sepherosa Ziehau [Fri, 1 May 2009 10:18:42 +0000 (18:18 +0800)]
lapic timer: Reimplement set_apic_timer using lapic_timer_oneshot

9 years agoRemove dead apic timer code.
Sepherosa Ziehau [Fri, 1 May 2009 09:48:30 +0000 (17:48 +0800)]
Remove dead apic timer code.

9 years agolapic timer: Disable C1 Enhanced mode on AMD K8 Family Revision F
Sepherosa Ziehau [Fri, 1 May 2009 09:14:58 +0000 (17:14 +0800)]
lapic timer: Disable C1 Enhanced mode on AMD K8 Family Revision F
and above to keep local APIC timer alive.

Obtained-from: FreeBSD (ariff@freebsd.org)
See-also: FreeBSD PR i386/104678

9 years agolapic timer: Save lapic timer frequency
Sepherosa Ziehau [Fri, 1 May 2009 08:55:35 +0000 (16:55 +0800)]
lapic timer: Save lapic timer frequency

9 years agolapic timer: Add lapic timer calibration code.
Sepherosa Ziehau [Fri, 1 May 2009 07:00:31 +0000 (15:00 +0800)]
lapic timer: Add lapic timer calibration code.

The calibrated information is not used yet.

Obtained-from: FreeBSD (jhb@freebsd.org)

9 years agohpet: Bark loud if 1024B hpet register space couldn't be mapped
Sepherosa Ziehau [Fri, 1 May 2009 04:48:10 +0000 (12:48 +0800)]
hpet: Bark loud if 1024B hpet register space couldn't be mapped

9 years agoPrepare lapic timer: Patch the hardware bug in nForce2 chipset,
Sepherosa Ziehau [Fri, 1 May 2009 04:26:05 +0000 (12:26 +0800)]
Prepare lapic timer: Patch the hardware bug in nForce2 chipset,
which could hang the lapic timer.

"Workaround a hang on some nForce2 systems that can happen if the
 CPU goes into and out of the halt state very quickly."

Obtained-from: FreeBSD rev158881
See-also: FreeBSD PR i386/97785

9 years agoacpi_timer: Timer name change.
Sepherosa Ziehau [Fri, 1 May 2009 03:59:57 +0000 (11:59 +0800)]
acpi_timer: Timer name change.

ACPI-safe   -- 32bit counter
ACPI-safe24 -- 24bit counter

9 years agoFix comment
Sepherosa Ziehau [Fri, 1 May 2009 02:46:24 +0000 (10:46 +0800)]
Fix comment

9 years agoacpi.4: Add debug.acpi.enabled; mention hpet there
Sepherosa Ziehau [Fri, 1 May 2009 02:39:28 +0000 (10:39 +0800)]
acpi.4: Add debug.acpi.enabled; mention hpet there

9 years agoktrdump: ignore ts=0 when searching for earliest_ts()
Simon Schubert [Thu, 30 Apr 2009 10:59:47 +0000 (12:59 +0200)]
ktrdump: ignore ts=0 when searching for earliest_ts()

When merge-printing multiple cpu buffers, we already treat ts=0 as
a condition to prefer a more recent entry.  However when searching for
the first entry, ts=0 (empty) will be treated regularly.  This can lead
to a situation that ktrdump would only print entries from the last CPU:

Assume you had 4 CPUs, and the buffer for CPU #2 and #3 started out with
empty entries (which would not be ignored by earliest_ts()).  When
searching for the next entry, the empty (ts=0) entry of CPU #2 would
always be selected as the first entry.  However a ts=0 entry of CPU #3
would override this.  In this case only the index of CPU #3 would
advance until full entries would be printed.  Once in this situation,
processing the ts of CPU #2 would always reset ts to 0, and this would
be treated as "not found" when processing CPU #3's entries, leading to
an output that only contains CPU #3 entries.
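
A sketch of the corrected search, under the assumption that each CPU
contributes one candidate timestamp and that ts=0 marks an empty entry.
earliest_cpu() is a hypothetical helper, not the actual ktrdump code.

```c
#include <stdint.h>

/*
 * Return the index of the CPU whose candidate entry has the earliest
 * nonzero timestamp, or -1 if every candidate is empty (ts == 0).
 */
int earliest_cpu(const uint64_t *ts, int ncpus)
{
    int best = -1;

    for (int i = 0; i < ncpus; i++) {
        if (ts[i] == 0)
            continue;           /* empty entry: never a candidate */
        if (best < 0 || ts[i] < ts[best])
            best = i;
    }
    return best;
}
```

Because empty entries are skipped instead of compared, a ts=0 entry on
one CPU can no longer reset the search and hide entries from the others.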

9 years agomalloc(3) manual page: Adjust to our new malloc() implementation.
Sascha Wildner [Thu, 30 Apr 2009 10:56:08 +0000 (12:56 +0200)]
malloc(3) manual page: Adjust to our new malloc() implementation.

The "IMPLEMENTATION NOTES" section was submitted by Matt Dillon.

9 years agonmalloc - Further optimize posix_memalign()
Matthew Dillon [Thu, 30 Apr 2009 03:07:07 +0000 (20:07 -0700)]
nmalloc - Further optimize posix_memalign()

Align the requested size to the nearest alignment to improve our chances
of coming up with a power-of-2.

Greatly improve the fitting algorithm for oddly sized requests, e.g.

(1) 32 byte alignment on a 1026 size.  In this case the zone for 1026
    already has a chunking (128) that exceeds the requested alignment,
    so we just do a _slaballoc().

(2) A 256 byte alignment on a 513 byte size.  In this case the zone
    for 513 has a chunking of 64, which is not sufficient, so we
    find the nearest power-of-2 >= 513 and allocate that.  In our
    case we would find 1024.  Since _slaballoc() guarantees that
    power-of-2 allocations within the zone limit will be on the
    same-sized boundary, we then just allocate the nearest power of 2.
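
The size selection in the two cases above can be sketched like this.
The helper names are hypothetical, and the zone chunking is passed in by
the caller because the real zone table is internal to nmalloc; the
sketch only mirrors the arithmetic described in the text.

```c
#include <assert.h>
#include <stddef.h>

/* Smallest power of 2 that is >= n (n must be >= 1 and not huge). */
size_t next_pow2(size_t n)
{
    size_t p = 1;

    while (p < n)
        p <<= 1;
    return p;
}

/* Round size up to a multiple of alignment (a power of 2). */
size_t round_up(size_t size, size_t alignment)
{
    return (size + alignment - 1) & ~(alignment - 1);
}

/*
 * Pick the size to request from the slab allocator.  zone_chunking is
 * the chunk size of the zone that would serve the request (supplied by
 * the caller in this sketch).
 */
size_t memalign_request_size(size_t size, size_t alignment,
                             size_t zone_chunking)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);

    size = round_up(size, alignment);
    if (zone_chunking >= alignment)
        return size;            /* case (1): chunking already suffices */
    return next_pow2(size);     /* case (2): fall back to a power of 2 */
}
```

For case (2), a 513 byte request with 256 byte alignment and a chunking
of 64 is first rounded to 768 and then to 1024, matching the value given
in the text.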

9 years agoHAMMER Utility: Update mirror-dump, mirror-read to reflect protocol changes.
Matthew Dillon [Wed, 29 Apr 2009 22:44:14 +0000 (15:44 -0700)]
HAMMER Utility: Update mirror-dump, mirror-read to reflect protocol changes.

The mirror-dump command now ignores PFSD records instead of complaining
and aborting.

The mirror-read command now includes handling for CRC-errored records.

9 years agoHAMMER VFS - Better CRC handling, bad-file handling.
Matthew Dillon [Wed, 29 Apr 2009 22:34:59 +0000 (15:34 -0700)]
HAMMER VFS - Better CRC handling, bad-file handling.

Data CRC errors should now generate EIO instead of panic()ing the system.
B-Tree CRC errors might still panic() and freemap CRC errors WILL still

Continuing from DDB on a B-Tree node CRC error when debugging is enabled
now no longer marks the B-Tree node as good.

The mirror-read command will now transfer data records with bad CRCs
instead of aborting the transfer, identifying them with a new type field.
The mirror-write ioctl currently ignores such records.

If a directory entry is encountered and the related inode cannot be
looked up, generate a dummy in-memory inode of type FIFO to placemark
the bad directory entry, allowing it to be removed.  Currently it is
possible for a directory entry to be synced to the media in a different
transaction than the related inode (a bug which needs to be fixed).
If a crash occurs at the wrong time the recovery code can leave the media
in a state where the directory entry exists but the inode does not.  This
change allows the bad directory entry to be removed.

Reported-by: Antonio Huete Jimenez
9 years agoMerge branch 'master' of ssh://crater.dragonflybsd.org/repository/git/dragonfly
Matthew Dillon [Wed, 29 Apr 2009 17:38:19 +0000 (10:38 -0700)]
Merge branch 'master' of ssh://crater.dragonflybsd.org/repository/git/dragonfly

9 years agoAs per POSIX, unconstify if_name in <net/if.h>.
Sascha Wildner [Wed, 29 Apr 2009 15:36:13 +0000 (17:36 +0200)]
As per POSIX, unconstify if_name in <net/if.h>.

9 years agoAdd HPET cputimer.
Sepherosa Ziehau [Wed, 29 Apr 2009 14:27:27 +0000 (22:27 +0800)]
Add HPET cputimer.

HPET - High Precision Event Timers.  Only main counter is used
currently.  This cputimer should be faster than ACPI-fast24 and
ACPI-safe, so give it highest priority.

HPET is not enabled by default.  You could add "hpet" to
debug.acpi.enabled to enable it.

Obtained-from: FreeBSD
Submitted-by: Dmitry Komissaroff <aunoor@gmail.com> w/ mod from me
Local change:
Try mapping 0x100 bytes of HPET register space if broken ACPI tables
are encountered (as on one of my test boxes); 0x100 is large enough
to cover the main counter.

9 years agoSplit out core kern_clock_*() calls for the clock_*() system calls.
Matthew Dillon [Wed, 29 Apr 2009 01:28:56 +0000 (18:28 -0700)]
Split out core kern_clock_*() calls for the clock_*() system calls.

Submitted-by: Mohd Farid Kamarudin <mokamaru@gmail.com>
9 years agoMerge branch 'master' of ssh://crater.dragonflybsd.org/repository/git/dragonfly
Matthew Dillon [Tue, 28 Apr 2009 23:52:31 +0000 (16:52 -0700)]
Merge branch 'master' of ssh://crater.dragonflybsd.org/repository/git/dragonfly

9 years agoFix short allocation in libc RTLD for static-compiled programs.
Matthew Dillon [Tue, 28 Apr 2009 23:49:30 +0000 (16:49 -0700)]
Fix short allocation in libc RTLD for static-compiled programs.

libc's __libc_allocate_tls() (weakly bound to _rtld_allocate_tls()) was not
allocating enough space for the TLS segments in statically-compiled
threaded applications.

The old malloc allocated lots of extra space and masked the bug.  The new
slab malloc doesn't and revealed the bug.

Reproduced-by: Sepherosa Ziehau <sepherosa@gmail.com>
9 years agoktrdump: remove debug output
Simon Schubert [Tue, 28 Apr 2009 18:24:23 +0000 (20:24 +0200)]
ktrdump: remove debug output

9 years agoMerge branch 'master' of ssh://crater.dragonflybsd.org/repository/git/dragonfly
Matthew Dillon [Tue, 28 Apr 2009 16:34:46 +0000 (09:34 -0700)]
Merge branch 'master' of ssh://crater.dragonflybsd.org/repository/git/dragonfly

9 years agoAdd posix_memalign(), fix minor bug in nmalloc.
Matthew Dillon [Tue, 28 Apr 2009 16:30:10 +0000 (09:30 -0700)]
Add posix_memalign(), fix minor bug in nmalloc.

Add the posix_memalign() function in all of its glory.  Our new slab
allocator already does most of the job perfectly, particularly when
alignment < size (for things like cache-line aligned allocations).

Correct a bug in _vmem_alloc() for the case where (size) is much larger
than (alignment).  The hack to get mmap() to return an aligned address
was not properly unmapping temporarily-mapped space.

Reformulate how errno is set to support posix_memalign(), which is defined
by the standard to return the error rather than set errno.
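
A short usage sketch of the calling convention: the error comes back as
the return value and errno is not consulted.  alloc_aligned() and
is_aligned() are hypothetical wrappers written for this example; only
posix_memalign() itself is the standard interface.

```c
#define _POSIX_C_SOURCE 200112L
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Allocate `size` bytes aligned to `alignment`.  Returns 0 on success
 * or an error number such as EINVAL (bad alignment) or ENOMEM.
 */
int alloc_aligned(void **out, size_t alignment, size_t size)
{
    int error = posix_memalign(out, alignment, size);

    return error;               /* errno is deliberately not used */
}

/* Return nonzero if p is a multiple of alignment. */
int is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p % alignment) == 0;
}
```

A cache-line aligned buffer would be requested with something like
alloc_aligned(&p, 64, 100) and released with free(p).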

Requested-by: Hasso Tepper <hasso@estpak.ee>
9 years agoemx(4): __cachealign struct emx_rxdata
Sepherosa Ziehau [Tue, 28 Apr 2009 15:03:37 +0000 (23:03 +0800)]
emx(4): __cachealign struct emx_rxdata

9 years agoserializer: Revoke PROFILE_SERIALIZER kernel option
Sepherosa Ziehau [Tue, 28 Apr 2009 13:46:49 +0000 (21:46 +0800)]
serializer: Revoke PROFILE_SERIALIZER kernel option

This kernel option was added by me to do preliminary serializer contention
profiling.  It is kinda invasive and expands struct lwkt_serialize
considerably.  Need to find a better way...

9 years agoRemove unneeded .Pp before .Rs in various manual pages.
Sascha Wildner [Tue, 28 Apr 2009 13:02:19 +0000 (15:02 +0200)]
Remove unneeded .Pp before .Rs in various manual pages.

9 years agoMerge branch 'master' of ssh://crater.dragonflybsd.org/repository/git/dragonfly
Matthew Dillon [Tue, 28 Apr 2009 05:28:56 +0000 (22:28 -0700)]
Merge branch 'master' of ssh://crater.dragonflybsd.org/repository/git/dragonfly

9 years agoFix type-o, the file $syscompatdcldf12 file was not being properly touched.
Matthew Dillon [Tue, 28 Apr 2009 03:44:04 +0000 (20:44 -0700)]
Fix type-o, the file $syscompatdcldf12 file was not being properly touched.

Submitted-by: Mohd Farid Kamarudin <mokamaru@gmail.com>
9 years agoSync zoneinfo database with tzdata2009g from elsie.
Sascha Wildner [Mon, 27 Apr 2009 18:19:43 +0000 (20:19 +0200)]
Sync zoneinfo database with tzdata2009g from elsie.

africa:         8.18 -> 8.19

Due to Ramadan shifting through the Gregorian calendar it will end before
the fourth Thursday in September in 2009 and the next couple of years, so
Egypt is expected to end DST on the last Thursday in September.

9 years agoifpoll: Unbreak UP LINT building
Sepherosa Ziehau [Mon, 27 Apr 2009 13:15:50 +0000 (21:15 +0800)]
ifpoll: Unbreak UP LINT building

9 years agoAdd IFPOLL_ENABLE to LINT
Sepherosa Ziehau [Mon, 27 Apr 2009 12:25:10 +0000 (20:25 +0800)]

9 years agoemx(4): If error happens, we must hold all of the serializers instead
Sepherosa Ziehau [Mon, 27 Apr 2009 12:22:25 +0000 (20:22 +0800)]
emx(4): If error happens, we must hold all of the serializers instead
of trying to hold them, since the register content changes upon next

9 years agoifpoll: Fix comment
Sepherosa Ziehau [Mon, 27 Apr 2009 12:15:42 +0000 (20:15 +0800)]
ifpoll: Fix comment

9 years agoifpoll: Reorganize TX/RX polling sysctl tree
Sepherosa Ziehau [Mon, 27 Apr 2009 11:08:01 +0000 (19:08 +0800)]
ifpoll: Reorganize TX/RX polling sysctl tree

9 years agoifpoll: Use rdtsc() whenever possible to calculate time related states.
Sepherosa Ziehau [Sun, 26 Apr 2009 12:05:13 +0000 (20:05 +0800)]
ifpoll: Use rdtsc() whenever possible to calculate time related states.

9 years agoifpoll: Expose kernel time fraction; currently for debugging only.
Sepherosa Ziehau [Sun, 26 Apr 2009 11:05:31 +0000 (19:05 +0800)]
ifpoll: Expose kernel time fraction; currently for debugging only.

9 years agoifpoll: Put pollmore under crit section
Sepherosa Ziehau [Sun, 26 Apr 2009 06:42:56 +0000 (14:42 +0800)]
ifpoll: Put pollmore under crit section

9 years agoifpoll: crit_{enter,exit}() -> crit_{enter,exit}_gd()
Sepherosa Ziehau [Sun, 26 Apr 2009 06:12:47 +0000 (14:12 +0800)]
ifpoll: crit_{enter,exit}() -> crit_{enter,exit}_gd()

9 years agoifpoll: Let callers of sched_* enter/exit crit section
Sepherosa Ziehau [Sun, 26 Apr 2009 06:01:56 +0000 (14:01 +0800)]
ifpoll: Let callers of sched_* enter/exit crit section

9 years agoifpoll: Put iteration of polling handlers under crit section.
Sepherosa Ziehau [Sun, 26 Apr 2009 05:13:28 +0000 (13:13 +0800)]
ifpoll: Put iteration of polling handlers under crit section.

9 years agoAdd ifpoll, which supports hardware TX/RX queues based polling.
Sepherosa Ziehau [Sun, 26 Apr 2009 03:04:18 +0000 (11:04 +0800)]
Add ifpoll, which supports hardware TX/RX queues based polling.
The implementation is mainly based on the polling(4) code.

Differences from polling(4):
- Instead of registering one polling handler for both TX/RX and status,
  drivers can register multiple TX/RX polling handlers on different
  CPUs based on their own needs.  Drivers can also register one status
  check handler, which is always polled on CPU0.
- TX can be polled at a lower frequency than RX; normally we don't
  need high frequency polling for TX, but for RX we may need a
  relatively higher polling frequency.
- Better serializer integration.

ifnet changes:
- ifnet.if_qpoll is added, which should be implemented by driver which
  supports ifpoll.
- IFF_NPOLLING is added to indicate that the driver is using ifpoll.

- Add 'npolling' and '-npolling'; they are used to turn on/off ifpoll
  on the specified interface.

- emx(4) is converted to use the ifpoll.  Coexistence of ifpoll and
  polling(4) in one driver requires extra effort in driver itself;
  drop polling(4) support in emx(4) for now.

IFPOLL_ENABLE kernel option is added, which is not enabled by default.

9 years agoUse STDERR_FILENO for stderr messages.
Hasso Tepper [Mon, 27 Apr 2009 03:28:55 +0000 (06:28 +0300)]
Use STDERR_FILENO for stderr messages.

9 years agoFix recursive lock in detached close of /dev/tty.
Matthew Dillon [Sun, 26 Apr 2009 19:26:16 +0000 (12:26 -0700)]
Fix recursive lock in detached close of /dev/tty.

A recursive lock and vp-held-after-release issue when close()ing a /dev/tty
descriptor was resulting in a panic.

Reported-by: Hasso Tepper <hasso@estpak.ee>
9 years agoUse MAP_TRYFIXED instead of MAP_FIXED when mapping the red zone.
Matthew Dillon [Sat, 25 Apr 2009 18:43:15 +0000 (11:43 -0700)]
Use MAP_TRYFIXED instead of MAP_FIXED when mapping the red zone.

We want to fail if the user program already faulted through the zone,
though in reality the red zone init occurs before main() is even run so
there is no practical difference.

9 years agoAdd cpdup feature - allow uid/gid/flags changes to fail if running as user
Matthew Dillon [Sat, 25 Apr 2009 18:39:45 +0000 (11:39 -0700)]
Add cpdup feature - allow uid/gid/flags changes to fail if running as user

If running as a user instead of root, uid, gid, and flags changes are
allowed to fail.  Also, if running as a user, a copy is no longer forced
if they differ but the mtime and size are the same.  Generate a single warning

Reorder the call to setutimes to occur after chown/chmod instead of before,
and to occur after a chflags call if IMMUTABLE is not set.

9 years agoMerge branch 'master' of ssh://crater.dragonflybsd.org/repository/git/dragonfly
Matthew Dillon [Sat, 25 Apr 2009 17:42:30 +0000 (10:42 -0700)]
Merge branch 'master' of ssh://crater.dragonflybsd.org/repository/git/dragonfly

9 years agoFix an installworld failure due to kernel fixes and a libthread_xu issue.
Matthew Dillon [Sat, 25 Apr 2009 17:36:03 +0000 (10:36 -0700)]
Fix an installworld failure due to kernel fixes and a libthread_xu issue.

Build the bootstrap version of cpdup without threading to work around a
bug in libthread_xu.  Libthread_xu was trying to map the original user
stack's red zone without using MAP_FIXED or MAP_TRYFIXED or MAP_STACK,
a behavior which the kernel now prohibits.

This fixes running installworld after rebooting with a new kernel.

Sepherosa Ziehau <sepherosa@gmail.com>

9 years agopktgenctl: Update according to recent libc changes
Sepherosa Ziehau [Sat, 25 Apr 2009 11:40:24 +0000 (19:40 +0800)]
pktgenctl: Update according to recent libc changes

9 years agoAdd a dummy offset to the arrays generated by genassym to avoid ary[0]
Matthew Dillon [Sat, 25 Apr 2009 00:11:10 +0000 (17:11 -0700)]
Add a dummy offset to the arrays generated by genassym to avoid ary[0]

The dummy offset avoids the generation of dummy arrays of size zero.
This whole code path is a hack, but after a lot of messing around
Alex and I determined that it was easier to hack it than to try to
redo the code due to complications introduced by cross-compiled

Submitted-by: Alex Hornung <ahornung@gmail.com>

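
A minimal sketch of the dummy-offset idea, assuming the value is encoded as
an array size and recovered by subtracting the same bias.  The macro names
(ASSYM_BIAS, ASSYM_ENCODE, ASSYM_DECODE) and struct demo are hypothetical
and are not taken from genassym itself.

```c
#include <stddef.h>

/* Hypothetical dummy offset: keeps every generated array non-empty. */
#define ASSYM_BIAS 1

/* Encode a constant as the size of a char array. */
#define ASSYM_ENCODE(name, value) char name##_sym[(value) + ASSYM_BIAS]

/* Recover the constant by removing the bias again. */
#define ASSYM_DECODE(name) (sizeof(name##_sym) - ASSYM_BIAS)

struct demo {
    int  first;   /* offset 0: would yield a zero-size array without the bias */
    long second;
};

ASSYM_ENCODE(DEMO_FIRST, offsetof(struct demo, first));
ASSYM_ENCODE(DEMO_SECOND, offsetof(struct demo, second));
```

Here ASSYM_DECODE(DEMO_FIRST) evaluates to 0 even though the backing array
has size 1, so no ary[0] is ever declared.
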
9 years agoMerge branch 'master' of ssh://crater.dragonflybsd.org/repository/git/dragonfly
Matthew Dillon [Fri, 24 Apr 2009 19:23:31 +0000 (12:23 -0700)]
Merge branch 'master' of ssh://crater.dragonflybsd.org/repository/git/dragonfly

9 years agoFix backslashes in an __asm line interfering with an #ifdef
Matthew Dillon [Fri, 24 Apr 2009 19:21:49 +0000 (12:21 -0700)]
Fix backslashes in an __asm line interfering with an #ifdef

Reported-by: Hasso Tepper

9 years agoNo barriers and spinlocks.
Hasso Tepper [Fri, 24 Apr 2009 18:59:40 +0000 (21:59 +0300)]
No barriers and spinlocks.

9 years agoWe don't support barriers and spinlocks yet.
Hasso Tepper [Fri, 24 Apr 2009 18:30:36 +0000 (21:30 +0300)]
We don't support barriers and spinlocks yet.

Fixes a lot of problems with thirdparty software.

9 years agoMerge branch 'master' of ssh://crater.dragonflybsd.org/repository/git/dragonfly
Matthew Dillon [Fri, 24 Apr 2009 17:21:45 +0000 (10:21 -0700)]
Merge branch 'master' of ssh://crater.dragonflybsd.org/repository/git/dragonfly

9 years agoFix various clang compile issues
Alex [Fri, 24 Apr 2009 12:54:22 +0000 (13:54 +0100)]
Fix various clang compile issues

1) remove uses of __label__, which is not supported by llvm/clang
2) remove uses of register type var __asm("ecx") and other bindings of
variables to registers, which are not supported by llvm/clang and are superfluous
3) add an ugly hack, conditionalized on __clang__, to allow correct
compilation of atomic_intr_cond_try()

Submitted-by: Alex Hornung
Cherry-Picked-From: Alex Hornung's leaf repo

9 years agounvis(3) manual page: s/RFCxxxx/RFC xxxx/
Sascha Wildner [Fri, 24 Apr 2009 06:47:11 +0000 (08:47 +0200)]
unvis(3) manual page: s/RFCxxxx/RFC xxxx/

9 years agoypclient(3) manual page: .Pp not needed here.
Sascha Wildner [Fri, 24 Apr 2009 06:46:44 +0000 (08:46 +0200)]
ypclient(3) manual page: .Pp not needed here.

9 years agoDon't call vm_map_findspace() when MAP_TRYFIXED is specified.
Matthew Dillon [Thu, 23 Apr 2009 22:18:10 +0000 (15:18 -0700)]
Don't call vm_map_findspace() when MAP_TRYFIXED is specified.

MAP_TRYFIXED is intended to return the requested address or an error.
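
The "requested address or an error" contract can be sketched as below.
map_exact() is a hypothetical helper, not part of any library.  Where
MAP_TRYFIXED is not defined, the sketch assumes the contract can be emulated
by passing the address as a plain hint and rejecting any other result.

```c
#define _DEFAULT_SOURCE   /* expose MAP_ANON on glibc; harmless elsewhere */
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: map len bytes of anonymous memory at exactly addr.
 * Returns addr on success and NULL otherwise; it never returns a
 * different address and never replaces an existing mapping.
 */
void *
map_exact(void *addr, size_t len)
{
    int flags = MAP_PRIVATE | MAP_ANON;
    void *p;

#ifdef MAP_TRYFIXED
    flags |= MAP_TRYFIXED;      /* kernel enforces "this address or fail" */
#endif
    p = mmap(addr, len, PROT_READ | PROT_WRITE, flags, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    if (p != addr) {
        /* Emulation path: the hint was not honored, so undo and fail. */
        munmap(p, len);
        return NULL;
    }
    return p;
}
```

Calling map_exact() on a range that is already mapped returns NULL instead
of silently relocating the mapping, which is the behavior the red zone
setup relies on.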

9 years agoFix logic when using the umtx_*_err() functions. With these functions
Matthew Dillon [Thu, 23 Apr 2009 22:16:51 +0000 (15:16 -0700)]
Fix logic when using the umtx_*_err() functions.  With these functions
a positive error value is returned, not -1.
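
The difference between the two return conventions can be illustrated as
follows.  try_op_err() is a hypothetical stand-in, not one of the real
umtx_*_err() functions; the only property assumed is the one stated above,
namely that failure is reported as a positive error value.

```c
#include <errno.h>

/*
 * Hypothetical stand-in for an *_err() style call: returns 0 on success
 * or a positive errno value on failure, and does not set errno.
 */
int
try_op_err(int should_fail)
{
    return should_fail ? EBUSY : 0;
}

/*
 * Wrapper that adapts the call to the classic -1/errno convention.
 * The check must be "error != 0"; comparing against -1 would never match.
 */
int
try_op(int should_fail)
{
    int error = try_op_err(should_fail);

    if (error != 0) {
        errno = error;
        return -1;
    }
    return 0;
}
```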

9 years agoMerge branch 'master' of ssh://crater.dragonflybsd.org/repository/git/dragonfly
Matthew Dillon [Thu, 23 Apr 2009 21:46:18 +0000 (14:46 -0700)]
Merge branch 'master' of ssh://crater.dragonflybsd.org/repository/git/dragonfly

9 years agoMake adjustments to how MAP_STACK works to prevent improper mmap()s.
Matthew Dillon [Thu, 23 Apr 2009 21:41:28 +0000 (14:41 -0700)]
Make adjustments to how MAP_STACK works to prevent improper mmap()s.

Record that a vm_map_entry is a stack mapping.  When locating free space
do not allow non-MAP_STACK mappings to use space reserved by MAP_STACK
mappings, unless MAP_FIXED is used of course.

Previously MAP_STACK mappings implied MAP_FIXED, which is not how they are
supposed to work.  Implement proper hinting without MAP_FIXED.

Do not allow a normal mmap() call to use space reserved by a MAP_STACK
mapping (unless MAP_FIXED is used of course).

The proper method of making a MAP_STACK mapping inside another MAP_STACK
mapping is to use MAP_STACK | MAP_TRYFIXED.  For now, though, we silently
imply MAP_TRYFIXED when MAP_STACK is specified and it will work without it.

Document MAP_TRYFIXED and make it also relax additional requirements imposed
by MAP_STACK mappings inside of MAP_STACK mappings.

Adjust libthread_xu to use MAP_STACK | MAP_TRYFIXED.

9 years agocxm(4): Fix a crash by warning if no firmware is compiled in.
Sascha Wildner [Thu, 23 Apr 2009 20:38:03 +0000 (22:38 +0200)]
cxm(4): Fix a crash by warning if no firmware is compiled in.