dragonfly.git
John Marino [Fri, 4 May 2012 16:54:27 +0000 (18:54 +0200)]
rtld: Add two special directives to libmap.conf

include <file>:
    Parse the contents of file before continuing with the current file.

includedir <dir>:
    Parse the contents of every file in dir that ends in .conf before
    continuing with the current file.

Any file or directory encountered while processing include or includedir
directives will be parsed exactly once, even if it is encountered multiple
times.

Taken from FreeBSD SVN 234851 (30 APR 2012) with modifications:

1) DragonFly's realpath works differently from FreeBSD's and doesn't
   accept a NULL value for the resolved_path argument.
2) FreeBSD's debug lines reflect the wrong function, lm_init, instead
   of lmc_parse_file.  lmc_parse_dir also calls lmc_parse_file, so
   the debug message is definitely wrong and was corrected.
3) FreeBSD keeps using path even after determining realpath and putting
   the result in the rpath variable.  It uses path for debug messages
   and opening a file descriptor.  DragonFly doesn't use path again and
   only uses rpath after it is determined.
4) FreeBSD's lmc_parse_file code had a bug in the linked list used to
   track which conf files had already been parsed.  The fix allocates
   memory for each filename so it isn't overwritten after multiple
   passes, which are routine with the includedir functionality.
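The parse-once bookkeeping and the filename fix in item 4 can be illustrated with a small user-space C sketch. It is not rtld's code: the `parsed_file` list and `mark_parsed()` are hypothetical names. The point is that each recorded path is a strdup() copy, so a caller that reuses one path buffer for every *.conf entry of an includedir scan cannot corrupt entries already on the list.

```c
#define _POSIX_C_SOURCE 200809L   /* for strdup() */
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record of a configuration file that was already parsed. */
struct parsed_file {
	char *path;                /* private copy of the resolved path */
	struct parsed_file *next;
};

static struct parsed_file *parsed_head = NULL;

/*
 * Return true if rpath has not been seen before and is now recorded,
 * false if it was already parsed (or if memory could not be allocated).
 * Copying the string means later reuse of the caller's buffer cannot
 * change an entry that is already on the list.
 */
bool
mark_parsed(const char *rpath)
{
	struct parsed_file *p;

	for (p = parsed_head; p != NULL; p = p->next) {
		if (strcmp(p->path, rpath) == 0)
			return false;
	}

	p = malloc(sizeof(*p));
	if (p == NULL)
		return false;
	p->path = strdup(rpath);
	if (p->path == NULL) {
		free(p);
		return false;
	}
	p->next = parsed_head;
	parsed_head = p;
	return true;
}
```

A parser would call mark_parsed() with the realpath(3) result before opening a file and skip the file when it returns false.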

John Marino [Fri, 4 May 2012 16:41:58 +0000 (18:41 +0200)]
rtld: Sync with FreeBSD after gnu_hash import

For the most part, FreeBSD took our gnu_hash implementation without much
modification.  Most of these changes are caused by whitespace differences
due to a different style scheme, and by declaring variables separately
from their assignments.  Notable exceptions were:
* FreeBSD didn't use int_fast32_t type
* FreeBSD keeps checking the first character before doing strcmp
* FreeBSD renamed symlook_obj2 to symlook_obj1_*

The only additions were two debug statements.  This commit syncs the
following back to DragonFly:

FreeBSD SVN 234840 (30 APR 2012)
FreeBSD SVN 234841 (30 APR 2012)

Sascha Wildner [Fri, 4 May 2012 19:15:54 +0000 (21:15 +0200)]
make upgrade: Now that we have devfs, don't try to remove any /dev/*

Sepherosa Ziehau [Fri, 4 May 2012 09:48:24 +0000 (17:48 +0800)]
tcp: Per-connection DupThresh

This makes it easier to implement adaptive DupThresh algorithms, e.g. RFC4653.

Sepherosa Ziehau [Fri, 4 May 2012 04:58:22 +0000 (12:58 +0800)]
tcp: Disable aggressive rescue retransmission for SACK by default

It could cause a moderate amount of spurious retransmission when segments
are reordered but not lost.

Sepherosa Ziehau [Fri, 4 May 2012 04:16:21 +0000 (12:16 +0800)]
tcp: Dragging RescueRxt along with HighRxt should depend on tcp.rescuesack_agg

Sepherosa Ziehau [Fri, 4 May 2012 04:07:14 +0000 (12:07 +0800)]
tcp: Move useless DSACK detection before increasing dupacks

- Avoid the spurious retransmit in the following dump:
  http://leaf.dragonflybsd.org/~sephe/fast1.xpl (~9.755sec)
- Loosely meet the requirement of RFC3042: no new segments should be
  sent upon ACKs carrying useless SACK information
- Add sysctl net.inet.tcp.ignore_redun_dsack to disable useless
  DSACK detection; default on

Sascha Wildner [Thu, 3 May 2012 21:02:27 +0000 (23:02 +0200)]
kernel: Remove the unused HW_WDOG option.

Sascha Wildner [Thu, 3 May 2012 20:58:16 +0000 (22:58 +0200)]
kernel/wdog: Compile in kern_wdog.c by default.

Actually it was always compiled in by default but the code depended
on the WATCHDOG_ENABLED option which is not in the GENERIC kernels.

Simply remove the WATCHDOG_ENABLE option. The code is small: by default it
does nothing but initialize a lock, /dev/wdog and a callout. Removing the
option also makes it easier for people who want to use ichwd(4), since
they can simply kldload it.

Francois Tigeot [Thu, 3 May 2012 07:22:27 +0000 (09:22 +0200)]
Import ichwd(4) from FreeBSD

This is a device driver for the watchdog timer function present on the
LPC interface bridge in Intel ICH-series chipsets.

Sascha Wildner [Thu, 3 May 2012 11:16:18 +0000 (13:16 +0200)]
kernel/ecc: Remove unneeded MFILES line in the Makefile.

Reported-by: ftigeot

François Tigeot [Thu, 3 May 2012 10:43:52 +0000 (12:43 +0200)]
Kernel: do not manipulate watchdog list if empty

Suggested-by: Alex Hornung

Sepherosa Ziehau [Thu, 3 May 2012 08:37:17 +0000 (16:37 +0800)]
tcp: Ignore duplicate ACKs carrying useless DSACK

This is mainly used to avoid unnecessary early retransmits and fast
retransmits, as shown in the following two dumps:
http://leaf.dragonflybsd.org/~sephe/early.xpl (~4.8sec)
http://leaf.dragonflybsd.org/~sephe/fast.xpl (~12.35sec)
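A minimal sketch of D-SACK detection, using the sender-side rule from RFC 2883: the first SACK block is a duplicate report if it starts below the cumulative ACK or lies inside the second block. The `dupack_is_useless()` policy, which treats a duplicate ACK whose only SACK block is a D-SACK as carrying no new loss information, is a simplified assumption for illustration, as are all names here; it is not the tcp_input() code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sequence-number comparisons modulo 2^32. */
#define SEQ_LT(a, b)	((int32_t)((a) - (b)) < 0)
#define SEQ_LEQ(a, b)	((int32_t)((a) - (b)) <= 0)

struct sack_blk {
	uint32_t start;		/* left edge */
	uint32_t end;		/* right edge */
};

/* RFC 2883: is the first SACK block a D-SACK report? */
bool
first_block_is_dsack(uint32_t th_ack, const struct sack_blk *blk, int nblk)
{
	if (nblk < 1)
		return false;
	if (SEQ_LT(blk[0].start, th_ack))	/* below the cumulative ACK */
		return true;
	return nblk > 1 &&			/* or inside the second block */
	    SEQ_LEQ(blk[1].start, blk[0].start) &&
	    SEQ_LEQ(blk[0].end, blk[1].end);
}

/*
 * Simplified policy (assumption): a duplicate ACK whose only SACK block
 * is a D-SACK tells us nothing new about losses, so it should not be
 * counted toward the dupack threshold.
 */
bool
dupack_is_useless(uint32_t th_ack, const struct sack_blk *blk, int nblk)
{
	return nblk == 1 && first_block_is_dsack(th_ack, blk, nblk);
}
```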

Francois Tigeot [Wed, 2 May 2012 17:57:03 +0000 (19:57 +0200)]
Update to per-CPU hardware resources format.

Sepherosa Ziehau [Wed, 2 May 2012 03:27:22 +0000 (11:27 +0800)]
tcp_var.h: White space cleanup

Matthew Dillon [Wed, 2 May 2012 03:06:44 +0000 (20:06 -0700)]
vmstat - Remove the busy_time == 0 hack

Remove a very old busy_time == 0 hack which assumed that a delta busy time
of 0 with transactions present simply meant that a transaction didn't
complete within one second and that the device was 100% busy.

In fact, disk devices are so fast these days (particularly SSDs) that it is
possible for many transactions to complete without causing the busy counter
to tick over.  The device is more likely to be less than 1% busy.

This fixes the '% busy' display for e.g. 'systat -vm 1' to properly report
0% in these situations.
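A minimal sketch of the computation without the hack, assuming busy time and elapsed time are both deltas in microseconds over the same sampling interval; the helper name `busy_percent()` is hypothetical and this is not vmstat's code. The percentage comes only from the busy-time delta, so a zero delta reports 0% even when transactions completed, instead of the old forced 100%.

```c
#include <stdint.h>

/*
 * Percentage of the sampling interval during which the device was busy.
 * There is deliberately no special case for busy_delta_us == 0: many
 * fast transactions can finish without the busy counter ticking over,
 * which really does mean the device was close to 0% busy.
 */
double
busy_percent(uint64_t busy_delta_us, uint64_t elapsed_us)
{
	double pct;

	if (elapsed_us == 0)
		return 0.0;
	pct = 100.0 * (double)busy_delta_us / (double)elapsed_us;
	return pct > 100.0 ? 100.0 : pct;	/* clamp counter skew */
}
```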

Sascha Wildner [Tue, 1 May 2012 17:48:16 +0000 (19:48 +0200)]
kernel: Remove some bogus casts of NULL to something.

Sascha Wildner [Tue, 1 May 2012 14:04:40 +0000 (16:04 +0200)]
hptmv(4): Remove an unneeded NULL check after M_WAITOK.

Sascha Wildner [Tue, 1 May 2012 13:12:05 +0000 (15:12 +0200)]
kernel/plip: A little indent fix.

Sascha Wildner [Tue, 1 May 2012 12:21:57 +0000 (14:21 +0200)]
kernel: Adjust some indentation.

Sascha Wildner [Tue, 1 May 2012 09:48:27 +0000 (11:48 +0200)]
kernel: Remove some unused variables.

Matthew Dillon [Mon, 30 Apr 2012 22:48:32 +0000 (15:48 -0700)]
HAMMER VFS - Only set B_CLUSTEROK on 64K buffers

* Only set B_CLUSTEROK on 64K buffers.  This should fix a fairly rare
  panic related to buffer size mismatches due to the bufdaemon crossing
  the 16K/64K buffer size boundary when clustering buffers.

Matthew Dillon [Mon, 30 Apr 2012 22:44:53 +0000 (15:44 -0700)]
kernel - Fix degenerate cluster_write() cases

* cluster_write() should bdwrite() as a fallback, not bawrite().

  Note that cluster_awrite() always bawrite()'s or equivalent.  The
  DragonFly API split the functions out, so cluster_write() can now
  almost always bdwrite() for the non-clustered case.

* Solves some serious performance and real-time disk space usage issues
  that appeared when HAMMER1 was updated to use the cluster calls.  The disk space
  would be recovered by the daily cleanup but the extra writes could
  end up being quite excessive, 25:1 unnecessary writes vs necessary
  writes.

Reported-by: multiple, testing by tuxillo

Sascha Wildner [Mon, 30 Apr 2012 22:11:13 +0000 (00:11 +0200)]
Revert "kernel/vga: Remove some unneeded #ifdef/#define's."

This reverts commit 617e8b12b140696ebd906ac4c3e01ea56643c624.

Sorry, it wasn't unneeded at all.

Sascha Wildner [Mon, 30 Apr 2012 22:02:13 +0000 (00:02 +0200)]
kernel/vga: Remove some unneeded #ifdef/#define's.

Sascha Wildner [Mon, 30 Apr 2012 20:24:01 +0000 (22:24 +0200)]
kernel: Remove some unused variables.

Sascha Wildner [Mon, 30 Apr 2012 18:02:58 +0000 (20:02 +0200)]
kernel: Remove some unused variables.

Sepherosa Ziehau [Sat, 28 Apr 2012 02:36:17 +0000 (10:36 +0800)]
tcp: Implement part of Eifel Response Algorithm (RFC4015)

It adapts the retransmission timer to avoid further spurious timeouts.

Sascha Wildner [Sun, 29 Apr 2012 18:21:44 +0000 (20:21 +0200)]
vquota.8: Remove extra .El

Sascha Wildner [Sun, 29 Apr 2012 12:12:00 +0000 (14:12 +0200)]
kernel: Use LIST_FOREACH in some places.

Sascha Wildner [Sat, 28 Apr 2012 20:42:08 +0000 (22:42 +0200)]
vquotactl.2: Add back a reference to loader.conf(5).

Add some words about the tunable, too.

Also adjust vquota(8)'s manpage in a similar manner.

Sascha Wildner [Sat, 28 Apr 2012 19:57:52 +0000 (21:57 +0200)]
vquotactl.2: Fix some small mdoc issues.

Francois Tigeot [Sat, 28 Apr 2012 19:25:40 +0000 (21:25 +0200)]
Document the vquotactl() syscall

Sascha Wildner [Sat, 28 Apr 2012 18:23:23 +0000 (20:23 +0200)]
nrelease: Build the git we ship on the LiveCD without Python support.

This saves a lot of space (>60MB) on the ISO.

Sascha Wildner [Sat, 28 Apr 2012 11:41:10 +0000 (13:41 +0200)]
make upgrade: Don't remove /var/heimdal.

pkgsrc's heimdal package will create it, too.

Sascha Wildner [Sat, 21 Apr 2012 09:03:04 +0000 (11:03 +0200)]
kernel: Remove newlines from the panic messages that have one.

panic() itself will add a newline.

Sascha Wildner [Sat, 28 Apr 2012 07:29:07 +0000 (09:29 +0200)]
Add tap(4) to LINT/LINT64.

Sascha Wildner [Sat, 28 Apr 2012 07:28:35 +0000 (09:28 +0200)]
tap(4): Use the number of instances from the kernel config file.

Sepherosa Ziehau [Sat, 28 Apr 2012 01:45:54 +0000 (09:45 +0800)]
msgport: Implement dropmsg for spin port

DragonFly-bug: http://bugs.dragonflybsd.org/issues/2354

Sascha Wildner [Fri, 27 Apr 2012 22:26:16 +0000 (00:26 +0200)]
Remove some CLEANFILES in kernel module Makefiles.

Peter Avalos [Fri, 27 Apr 2012 19:42:09 +0000 (12:42 -0700)]
Update files for OpenSSL-1.0.1b import.

Peter Avalos [Fri, 27 Apr 2012 19:39:35 +0000 (12:39 -0700)]
Merge branch 'vendor/OPENSSL'

Peter Avalos [Fri, 27 Apr 2012 19:35:59 +0000 (12:35 -0700)]
Import OpenSSL-1.0.1b.

      o Make FIPS capable OpenSSL ciphers work in non-FIPS mode.
      o Fix SSL_OP_NO_TLSv1_1 clash with SSL_OP_ALL in OpenSSL 1.0.0

Sepherosa Ziehau [Fri, 27 Apr 2012 08:15:55 +0000 (16:15 +0800)]
tcp: Update snd_last upon spurious timeout retransmission restore

This follows RFC4015 and mainly avoids a delay spike.

Sepherosa Ziehau [Fri, 27 Apr 2012 06:49:07 +0000 (14:49 +0800)]
tcp: Fix window scaling for accepted socket

- Retire tcpcb.requested_s_scale, use tcpcb.snd_scale directly.
- Set tcpcb.snd_wnd in SYN_SENT state only if the TCP flags contain SYN.
- Save the window advertised by the other side into the syncache, and set up
  tcpcb.snd_wnd according to the saved value after the 3-way handshake is done.
- Delay tiwin setup in tcp_input(), specifically until after tcpcb.snd_scale
  is set up on the SO_ACCEPTCONN path.

This should fix the window scaling bug that occurs when the sender accepts
the connection and data only flows from sender to receiver.
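For context, the rule this fix depends on comes from TCP window scaling (RFC 1323, later RFC 7323): the window field of a segment with SYN set is never scaled, and scaling applies only once the peer's shift count is known. A minimal sketch with hypothetical names (`peer_window`, `snd_scale` as a plain parameter), not the actual tcp_input() code:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Effective send window advertised by the peer.
 *   th_win    : 16-bit window field from the TCP header
 *   is_syn    : segment has SYN set; its window is never scaled
 *   snd_scale : shift count from the peer's window-scale option,
 *               or 0 if scaling was not negotiated
 */
uint32_t
peer_window(uint16_t th_win, bool is_syn, uint8_t snd_scale)
{
	if (is_syn)
		return th_win;
	if (snd_scale > 14)		/* RFC 7323: use 14 instead */
		snd_scale = 14;
	return (uint32_t)th_win << snd_scale;
}
```

On the accept path, this is presumably why the window saved from the SYN has to be used as is, while windows from later segments can only be scaled once snd_scale has been set up.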

Sepherosa Ziehau [Thu, 26 Apr 2012 08:09:51 +0000 (16:09 +0800)]
tcp: Balance aggressiveness of SACK rescue retransmission

This commit follows the idea of sustaining the ACK clock whenever
possible to avoid retransmission timeouts during fast recovery, which
is mentioned in both RFC3517 and the "Rescue Retransmission for SACK"
draft.

- Be a little bit more aggressive in NextSeg()

  The main problem of the "Rescue Retransmission for SACK" draft is how
  conservative it is about the number of rescue retransmissions allowed
  during fast recovery; in some situations that is not enough to
  sustain the ACK clock.

  Our aggressive SACK rescue retransmission variant tries to send out
  one rescue segment if no other segment can be sent according to
  RFC3517, so the ACK clock keeps ticking.

- Be conservative in sending out rescue segments.

  The idea of SACK rescue retransmission is just to sustain the ACK clock.
  As long as segments are sent (either new segments or retransmissions)
  during SACK-based fast recovery, the ACK clock will be sustained, so no
  rescue segment is sent in this situation.

SACK rescue retransmission statistics are updated more accurately to
reflect what had happened.

The aggressive variant of SACK rescue retransmission could be disabled
by setting sysctl net.inet.tcp.rescuesack_agg to 0; it is enabled by
default.

Sascha Wildner [Wed, 25 Apr 2012 07:27:44 +0000 (09:27 +0200)]
ntp_{adj,get}time.2: Mention our dntpd(8) and mark ntpd as being in pkgsrc.

Reported-by: Loganaden Velvindron

Sepherosa Ziehau [Thu, 19 Apr 2012 09:30:59 +0000 (17:30 +0800)]
tcp: Implement "Rescue Retransmission for SACK-based Loss Recovery Algorithm"

http://tools.ietf.org/html/draft-nishida-tcpm-rescue-retransmission-00

When hsu@ implemented RFC3517, part of the fix described in the above draft
was already implemented, namely the case where no SACK scoreboard is left,
which the draft uses as an example.  However, the original implementation
still did not cover the case where only a few SACK scoreboards are left
(< 3), and it could be more aggressive than the method suggested in the
above draft.

- Whether to use this new mechanism is controlled by the
  net.inet.tcp.rescuesack sysctl node; it is on by default.  Disabling it
  falls back to the original rescue retransmission behaviour implemented
  by hsu@.
- Save rexmt_high before starting retransmission using RFC3517, so that
  rexmt_high can be restored if nothing is sent.
- Add statistics about rescue retransmission.

We could further examine whether doing more than one rescue retransmission
would be helpful.

Sascha Wildner [Wed, 25 Apr 2012 04:52:50 +0000 (06:52 +0200)]
ntp_adjtime.2: Use the correct function name.

Reported-by: y0n3t4n1

Sascha Wildner [Tue, 24 Apr 2012 17:16:03 +0000 (19:16 +0200)]
ntp_adjtime.2: Use the correct function name.

Reported-by: Loganaden Velvindron

Sascha Wildner [Tue, 24 Apr 2012 17:15:12 +0000 (19:15 +0200)]
<sys/lock.h>: A little whitespace adjustment.

Markus Pfeiffer [Tue, 24 Apr 2012 13:18:41 +0000 (13:18 +0000)]
kernel: Change wmesg type for lockinit, lockreinit

* change type of parameter wmesg to const char * for lockinit and
  lockreinit.
* change type of member wmesg of struct lock to const char *
* adapt manpage lock(9)

Markus Pfeiffer [Tue, 24 Apr 2012 11:34:37 +0000 (11:34 +0000)]
test/test/README: add a few lines

Sascha Wildner [Sun, 22 Apr 2012 11:36:53 +0000 (13:36 +0200)]
kernel/viapm: Makefile cleanup (remove unneeded .PATH and opt_isa.h).

Sascha Wildner [Sun, 22 Apr 2012 11:24:00 +0000 (13:24 +0200)]
kernel/modules: Remove opt_pci.h (which doesn't exist) from some Makefiles.

Venkatesh Srinivas [Sun, 22 Apr 2012 03:05:06 +0000 (20:05 -0700)]
kernel -- ffs: Soft updates locking fixes

1) Do not take mplock in bioops callbacks; the lock was no longer synchronizing
   against mainline code.

2) Do not hold softdep lock around bwillinode()

3) Take softdep lock in softdep_process_worklist bioops callback. This callback
   was previously using the mplock for synchronization (insufficiently!)

4) Modify process_worklist_item to expect the softdep lock to be held on
   entry and release it at appropriate times.

Prevents a panic seen when running fsstress on a UFS+softdep fs, where fsync
finds a null buffer on vnode trees. This arose from a front-end/back-end
race in softdep_process_worklist.

Sascha Wildner [Sat, 21 Apr 2012 20:59:58 +0000 (22:59 +0200)]
kernel: Fix some whitespace from the previous commit.

Matthias Rampke [Tue, 17 Apr 2012 21:33:16 +0000 (23:33 +0200)]
kill zombies if the parent set SIG_IGN on SIGCHLD

fix for http://bugs.dragonflybsd.org/issues/2349
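The POSIX behaviour this fix restores can be observed from user space. A minimal sketch, where `child_reaped_automatically()` is a hypothetical helper: with SIGCHLD set to SIG_IGN, a terminated child does not become a zombie, and waitpid() blocks until the child is gone and then fails with ECHILD instead of returning its status.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that exits immediately while SIGCHLD is ignored.
 * Return true if the kernel reaped it for us, i.e. waitpid() fails
 * with ECHILD rather than reporting the child's exit status.
 */
bool
child_reaped_automatically(void)
{
	struct sigaction ign, old;
	pid_t pid, r;
	bool reaped;

	memset(&ign, 0, sizeof(ign));
	ign.sa_handler = SIG_IGN;
	sigemptyset(&ign.sa_mask);
	if (sigaction(SIGCHLD, &ign, &old) != 0)
		return false;

	pid = fork();
	if (pid == 0)
		_exit(0);			/* child: exit right away */
	if (pid < 0) {
		sigaction(SIGCHLD, &old, NULL);
		return false;
	}

	do {
		r = waitpid(pid, NULL, 0);	/* blocks until child is gone */
	} while (r == -1 && errno == EINTR);
	reaped = (r == -1 && errno == ECHILD);

	sigaction(SIGCHLD, &old, NULL);		/* restore old disposition */
	return reaped;
}
```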

John Marino [Sat, 21 Apr 2012 18:52:45 +0000 (20:52 +0200)]
rtld: Add write-lock to case of filter loading

Propagate the current state of rtld_bind_lock to dlopen_object() calls
through the filter loading call chain. This fixes attempts to
write-lock the already locked rtld_bind_lock when filter loading is
initiated by relocation of the DSO being dlopened.

Taken from: FreeBSD SVN 234170 (12 APR 2012)

Do not try to adjust stacks if dlopen_object is called too early.  This
is a follow-up to FreeBSD SVN 233231 which fixed a similar issue with
object initialization code.

Taken from: FreeBSD SVN 233777 (02 APR 2012)

Venkatesh Srinivas [Sat, 21 Apr 2012 18:40:41 +0000 (11:40 -0700)]
kernel -- vm_pageout: Handle pages w/ NULL vm_objects on the act/in pageqs.

vm_page_unwire could end up putting pages w/ NULL object fields onto the
active or inactive page queues. Allow the active/inactive scans to deal with
these pages rather than panicking. These pages can be disposed of normally.

Closes-bug: 2338
Suggested-by: dillon
Reported-by: sephe, JustinS

Sascha Wildner [Sat, 21 Apr 2012 10:17:07 +0000 (12:17 +0200)]
pthread_join(3): If the target thread is detached, return EINVAL.

We were returning ESRCH previously, which is wrong, as it indicates
that the thread could not be found. Fix this in both libthread_xu
and libc_r.

See http://pubs.opengroup.org/onlinepubs/009695399/functions/pthread_join.html

----->8-----
[ESRCH]
    No thread could be found corresponding to that specified by the
    given thread ID.

...

[EINVAL]
    The value specified by thread does not refer to a joinable thread.
-----8<-----
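A small sketch of the case described above. The detached worker is kept alive until after the join attempt so that its thread ID stays valid; `join_detached_thread()` is a hypothetical helper. Later editions of POSIX make joining a non-joinable thread undefined behaviour, so portable code should not rely on getting EINVAL back.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cv = PTHREAD_COND_INITIALIZER;
static bool release_worker = false;

/* Detached worker: stays alive until join_detached_thread() releases it. */
static void *
worker(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&mtx);
	while (!release_worker)
		pthread_cond_wait(&cv, &mtx);
	pthread_mutex_unlock(&mtx);
	return NULL;
}

/*
 * Start a detached thread, try to join it, and return pthread_join()'s
 * result (or pthread_create()'s error if the thread could not start).
 */
int
join_detached_thread(void)
{
	pthread_attr_t attr;
	pthread_t tid;
	int err;

	pthread_attr_init(&attr);
	pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
	err = pthread_create(&tid, &attr, worker, NULL);
	pthread_attr_destroy(&attr);
	if (err != 0)
		return err;

	err = pthread_join(tid, NULL);	/* not joinable: expect EINVAL */

	pthread_mutex_lock(&mtx);
	release_worker = true;
	pthread_cond_signal(&cv);
	pthread_mutex_unlock(&mtx);
	return err;
}
```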

Peter Avalos [Sat, 21 Apr 2012 03:55:08 +0000 (20:55 -0700)]
Update files for OpenSSL-1.0.1a import.

Peter Avalos [Sat, 21 Apr 2012 03:37:28 +0000 (20:37 -0700)]
Merge branch 'vendor/OPENSSL'

Peter Avalos [Sat, 21 Apr 2012 03:33:46 +0000 (20:33 -0700)]
Import OpenSSL-1.0.1a.

o Fix for ASN1 overflow bug CVE-2012-2110.
o Workarounds for some servers that hang on long client hellos.
o Fix SEGV in AES code.

Sascha Wildner [Fri, 20 Apr 2012 09:02:13 +0000 (11:02 +0200)]
kernel: Remove some leftover references to struct cfdriver.

Sascha Wildner [Thu, 19 Apr 2012 21:55:52 +0000 (23:55 +0200)]
dsched_bfq.4: Fix capitalization.

Sascha Wildner [Thu, 19 Apr 2012 21:49:46 +0000 (23:49 +0200)]
digi.4: Remove some wrong documentation.

Sascha Wildner [Thu, 19 Apr 2012 21:24:39 +0000 (23:24 +0200)]
bsd-family-tree: Sync with FreeBSD.

Sepherosa Ziehau [Thu, 19 Apr 2012 07:16:39 +0000 (15:16 +0800)]
tcp: Reimplement TCP_FASTKEEP socket option using per-pcb keepidle

Retire the now unused TF_FASTKEEP.

Sepherosa Ziehau [Thu, 19 Apr 2012 06:57:36 +0000 (14:57 +0800)]
tcp: Reset keepalive timer, if TCP_KEEPIDLE is changed

This has a side effect if keepalive probing is already underway.
Correcting that side effect would be overkill, so it is suggested
that TCP_KEEPIDLE be set before connect(2) or listen(2).
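Following that advice, a minimal sketch that enables keepalive and sets TCP_KEEPIDLE on a socket before listen(2), so no keepalive timer is already running when the value changes. The helper name `keepalive_listener()` is hypothetical, and the unit of the TCP_KEEPIDLE value differs between systems (check tcp(4) on the target), so the argument is passed through unchanged.

```c
#define _DEFAULT_SOURCE
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>
#include <unistd.h>

/*
 * Create a TCP socket listening on 127.0.0.1 (kernel-chosen port) with
 * SO_KEEPALIVE on and TCP_KEEPIDLE set before listen(2).
 * Return the descriptor, or -1 on error.
 */
int
keepalive_listener(int idle)
{
	struct sockaddr_in sin;
	int on = 1;
	int fd = socket(AF_INET, SOCK_STREAM, 0);

	if (fd < 0)
		return -1;
	if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) != 0 ||
	    setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle)) != 0)
		goto fail;

	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	sin.sin_port = 0;			/* let the kernel pick a port */
	if (bind(fd, (struct sockaddr *)&sin, sizeof(sin)) != 0 ||
	    listen(fd, 16) != 0)
		goto fail;
	return fd;
fail:
	close(fd);
	return -1;
}
```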

Sepherosa Ziehau [Thu, 19 Apr 2012 03:13:37 +0000 (11:13 +0800)]
jme: Unbreak buildkernel w/ KASSERT

Sepherosa Ziehau [Thu, 19 Apr 2012 03:08:33 +0000 (11:08 +0800)]
jme: Move TX descriptor count into chain_data

Improve CPU cache utilization

Sepherosa Ziehau [Thu, 19 Apr 2012 03:02:06 +0000 (11:02 +0800)]
jme: Option file adjustment

- RSS is now globally enabled; opt_rss.h is no longer needed here
- Always enable JME_RSS_DEBUG if module is not built w/ buildkernel

Sepherosa Ziehau [Thu, 19 Apr 2012 03:00:05 +0000 (11:00 +0800)]
jme: Move RX ring count and RX descriptor count into rxdata

Improve CPU cache utilization

Sepherosa Ziehau [Thu, 19 Apr 2012 02:39:41 +0000 (10:39 +0800)]
jme: Use RX data's interrupt mask to test interrupt status

Functionally same, semantically better

Sepherosa Ziehau [Thu, 19 Apr 2012 02:34:09 +0000 (10:34 +0800)]
jme: Pass rxdata to RX functions

François Tigeot [Tue, 17 Apr 2012 20:27:36 +0000 (22:27 +0200)]
Rename vfs_accounting_enabled to vfs_quota_enabled

Sascha Wildner [Wed, 18 Apr 2012 20:49:04 +0000 (22:49 +0200)]
ef(4): Bring in some small fixes from FreeBSD.

Gets rid of a superfluous extra opt_ef.h file in /usr/obj hierarchy.

Sascha Wildner [Wed, 18 Apr 2012 19:49:38 +0000 (21:49 +0200)]
Remove a deleted mpt(4) related header file with 'make upgrade'.

Sepherosa Ziehau [Wed, 18 Apr 2012 12:43:45 +0000 (20:43 +0800)]
jme: Don't enable RSS by default

It does not seem to be stable enough to be used.

Sepherosa Ziehau [Wed, 18 Apr 2012 12:42:05 +0000 (20:42 +0800)]
jme: MSI-X interrupt handler bug fixes

- Always write INTR_STATUS in RX interrupt handler
- Always enable/disable related interrupts in the corresponding interrupt handler

Sepherosa Ziehau [Wed, 18 Apr 2012 07:31:26 +0000 (15:31 +0800)]
tcp: Correct sending idle detection and implement part of RFC2861

This commit mainly changes how cwnd is shrunk after an idle period on
the send side.

- Properly detect sending idle period according to RFC5681.  The problem
  of using reception time to detect sending idle period is described in
  RFC5681 as:

  "...
   Using the last time a segment was received to determine whether or
   not to decrease cwnd can fail to deflate cwnd in the common case of
   persistent HTTP connections [HTH98].  In this case, a Web server
   receives a request before transmitting data to the Web client.  The
   reception of the request makes the test for an idle connection fail,
   and allows the TCP to begin transmission with a possibly
   inappropriately large cwnd.
   ..."

  This mainly affects HTTP/1.1 persistent connection performance after
  the connection has been idle for a long time.  The impact probably should
  not be drastic, since 80% of the delays between two requests on HTTP/1.1
  persistent connections are less than the minimum RTO (1 second), as discovered by:
  "Overclocking the Yahoo! CDN for Faster Web Page Loads"
    http://conferences.sigcomm.org/imc/2011/docs/p569.pdf

  Sysctl node net.inet.tcp.idle_restart is added to disable the cwnd
  shrinking after an idle period.  It is on by default; set it to 0 to
  restore the old behaviour for HTTP/1.1 persistent connections.

- Implement part of RFC2861, which decays cwnd after an idle period according
  to the length of the sending idle period.  The main difference between our
  implementation and RFC2861 is that we don't let cwnd go below the
  value allowed by RFC5681.

  Sysctl node net.inet.tcp.idle_cwv is added to disable CWV after a sending
  idle period.  It is on by default.  Disabling net.inet.tcp.idle_restart
  also indirectly disables CWV after a sending idle period.

  The CWV during the application-limited period is not implemented by this
  commit.  It is just too conservative, as discovered by:
  "Analysing TCP for Bursty Traffic, Int'l J. of Communications,
   Network and System Sciences, 7(3), July 2010."

- Add statistics about how often sending idle periods occurred
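A sketch of the after-idle cwnd computation described above, under stated assumptions: windows are in bytes, times in ticks, the floor is the RFC 5681 (section 4.1) restart window RW = min(IW, cwnd), and the decay halves cwnd once per RTO of idle time as RFC 2861 describes. The function and its parameters are hypothetical; the two flags only mirror the sysctls, and this is not the tcp_output() code.

```c
#include <stdint.h>

/*
 * cwnd to use when transmission resumes after a sending-idle period.
 *   idle_restart : mirrors net.inet.tcp.idle_restart (0 = keep cwnd)
 *   idle_cwv     : mirrors net.inet.tcp.idle_cwv (0 = drop straight to RW)
 */
uint32_t
cwnd_after_idle(uint32_t cwnd, uint32_t iw, uint32_t idle_ticks,
    uint32_t rto_ticks, int idle_restart, int idle_cwv)
{
	uint32_t rw, n;

	if (!idle_restart || rto_ticks == 0 || idle_ticks <= rto_ticks)
		return cwnd;		/* not idle for more than one RTO */

	rw = cwnd < iw ? cwnd : iw;	/* RFC 5681 restart window */
	if (!idle_cwv)
		return rw;

	/* RFC 2861-style decay: halve once per idle RTO, never below RW. */
	for (n = idle_ticks / rto_ticks; n > 0 && cwnd > rw; n--)
		cwnd /= 2;
	return cwnd < rw ? rw : cwnd;
}
```

With cwnd = 64000, IW = 4380 and three RTOs of idle time, the sketch resumes at 8000 bytes rather than collapsing straight to the restart window.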

Sepherosa Ziehau [Tue, 17 Apr 2012 10:26:43 +0000 (18:26 +0800)]
socket: Change sysctl names sosnd -> sosend, no functional changes

François Tigeot [Tue, 17 Apr 2012 20:54:27 +0000 (22:54 +0200)]
vquota(8): document that a limit of 0 means no check

Francois Tigeot [Wed, 4 Apr 2012 08:37:38 +0000 (10:37 +0200)]
VFS quota: enforce user and group limits

Francois Tigeot [Fri, 30 Mar 2012 18:47:36 +0000 (20:47 +0200)]
VFS quota: add a command to set a group quota

Francois Tigeot [Fri, 23 Mar 2012 22:19:12 +0000 (23:19 +0100)]
VFS quota: add a command to set a user's quota

Sascha Wildner [Tue, 17 Apr 2012 16:13:57 +0000 (18:13 +0200)]
mpt(4): Sync with FreeBSD.

Bug fixes and cleanups.

Sascha Wildner [Tue, 17 Apr 2012 16:03:15 +0000 (18:03 +0200)]
mpt(4): Pass INTR_MPSAFE when setting up the interrupt.

It's a porting mistake I did back then.

Francois Tigeot [Thu, 22 Mar 2012 13:33:43 +0000 (14:33 +0100)]
VFS quota: start enforcing limits

* Add a function vq_write_ok() to check if writing a specified amount
  of data to a mounted filesystem is allowed.

* It only checks a global per mount-point limit for now.

* Enforce this limit check in vop_write().
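A sketch of the kind of test vq_write_ok() performs for the global per-mount-point limit. The name `write_ok_sketch()`, its parameters and the byte units are assumptions, not the kernel's signature; treating a limit of 0 as "no check" follows the vquota(8) documentation.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * May 'len' more bytes be written to a mount point that currently uses
 * 'used' bytes under a space limit of 'limit' bytes?
 */
bool
write_ok_sketch(uint64_t used, uint64_t limit, uint64_t len)
{
	if (limit == 0)
		return true;		/* 0 means no check */
	if (used > limit)
		return false;		/* already over the limit */
	return len <= limit - used;	/* cannot overflow: used <= limit */
}
```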

Francois Tigeot [Sat, 17 Mar 2012 21:15:09 +0000 (22:15 +0100)]
VFS quota: add a function to set a per mount-point global space limit

Francois Tigeot [Mon, 16 Apr 2012 17:50:55 +0000 (19:50 +0200)]
VFS accounting: have syscalls fail if not enabled

Francois Tigeot [Sat, 17 Mar 2012 18:30:55 +0000 (19:30 +0100)]
VFS quota: report per mount-point space limits

Sepherosa Ziehau [Mon, 16 Apr 2012 12:00:36 +0000 (20:00 +0800)]
jme: Change how IFCAP_RSS is handled

- Number of RX rings being used is not changed.
- Hardware is always setup w/ RSS support.
- If IFCAP_RSS is not enabled, we simply don't use the RSS hash provided
  by the hardware.

Sepherosa Ziehau [Mon, 16 Apr 2012 11:51:31 +0000 (19:51 +0800)]
jme: Per-device tunable knobs

Sepherosa Ziehau [Mon, 16 Apr 2012 11:41:20 +0000 (19:41 +0800)]
ifnet: Factor out if_ring_count2()

This function calculates the maximum allowed power-of-2 ring count based on
the user-specified value (cnt) and the maximum number of rings supported by
the hardware (cnt_max).  The power-of-2 CPU count is also taken into
consideration.
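A sketch of the calculation described above, under stated assumptions: `ring_count2_sketch()` and `pow2_floor()` are hypothetical, `cnt <= 0` is treated as "auto" (as the emx(4) RX ring knob does), and the auto value is the power-of-2 CPU count, passed in here as `ncpus2`.

```c
/* Largest power of 2 that is <= n; returns 1 for n < 2. */
static int
pow2_floor(int n)
{
	int p = 1;

	while (p <= n / 2)
		p *= 2;
	return p;
}

/*
 *   cnt     : user-requested ring count, <= 0 means "auto"
 *   cnt_max : maximum number of rings the hardware supports
 *   ncpus2  : number of CPUs rounded down to a power of 2
 */
int
ring_count2_sketch(int cnt, int cnt_max, int ncpus2)
{
	if (cnt <= 0 || cnt > ncpus2)
		cnt = ncpus2;		/* auto, or more rings than CPUs */
	if (cnt > cnt_max)
		cnt = cnt_max;		/* hardware limit */
	return pow2_floor(cnt);
}
```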

Sepherosa Ziehau [Mon, 16 Apr 2012 10:43:00 +0000 (18:43 +0800)]
emx: Change how IFCAP_RSS is handled

- Number of RX rings being used is not changed.
- Hardware is always setup w/ RSS support.
- If IFCAP_RSS is not enabled, we simply don't use the RSS hash provided
  by the hardware.

Sepherosa Ziehau [Mon, 16 Apr 2012 10:19:19 +0000 (18:19 +0800)]
emx: Allow user to specify the number of RX rings to use

Default value is 0, which means auto (based on # of cpus in the system)

Sepherosa Ziehau [Mon, 16 Apr 2012 10:07:28 +0000 (18:07 +0800)]
emx: Per-device tunable knobs

- Max TX/RX descriptor count
- Interrupt throttle ceiling

Sepherosa Ziehau [Mon, 16 Apr 2012 09:51:14 +0000 (17:51 +0800)]
bus: Change device_getenv_int interface a little bit

- Pass in the fallback value
- If kgetenv fails, the fallback value is returned
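A user-space analogue of the new calling convention, where the caller supplies the fallback and always gets a usable value back. The `hw.<nameunit>.<knob>` key layout, the example key `hw.emx0.rxd` and the helper `getenv_int_fallback()` are assumptions for illustration; the kernel uses kgetenv rather than getenv(3).

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Look up "hw.<nameunit>.<knob>" and return its integer value, or
 * 'fallback' if the variable is missing or not a valid int.
 */
int
getenv_int_fallback(const char *nameunit, const char *knob, int fallback)
{
	char key[128];
	const char *val;
	char *end;
	long v;
	int n;

	n = snprintf(key, sizeof(key), "hw.%s.%s", nameunit, knob);
	if (n < 0 || (size_t)n >= sizeof(key))
		return fallback;		/* key too long */
	val = getenv(key);
	if (val == NULL || *val == '\0')
		return fallback;		/* not set */
	errno = 0;
	v = strtol(val, &end, 0);
	if (errno != 0 || *end != '\0' || v < INT_MIN || v > INT_MAX)
		return fallback;		/* not a valid int */
	return (int)v;
}
```

With this convention a driver can write its tunable lookup in one line, e.g. taking the default descriptor count as the fallback, with no separate "was it set?" branch.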