dragonfly.git
23 months agokernel - Fix degenerate cluster_write() cases
Matthew Dillon [Mon, 30 Apr 2012 22:44:53 +0000 (15:44 -0700)]
kernel - Fix degenerate cluster_write() cases

* cluster_write() should bdwrite() as a fallback, not bawrite().

  Note that cluster_awrite() always bawrite()'s or equivalent.  The
  DragonFly API split the functions out, so cluster_write() can now
  almost always bdwrite() for the non-clustered case.

* Solves some serious performance and real-time disk space usage issues
  that appeared when HAMMER1 was updated to use the cluster calls.  The
  disk space would be recovered by the daily cleanup but the extra writes
  could end up being quite excessive, 25:1 unnecessary writes vs necessary
  writes.
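
A toy model may help show why the fallback matters.  The sketch below is
not kernel code: ToyBufferCache and count_writes are hypothetical names, and
it assumes only that a bawrite()-style call issues I/O right away while a
bdwrite()-style call just marks the buffer dirty until a later flush.

```python
class ToyBufferCache:
    """Toy model of immediate vs. delayed buffer writes (not kernel code)."""

    def __init__(self):
        self.dirty = set()      # block numbers waiting for a flush
        self.disk_writes = 0    # number of I/Os actually issued

    def write_now(self, blkno):
        # bawrite-like (assumption): every call turns into one disk write.
        self.disk_writes += 1

    def write_delayed(self, blkno):
        # bdwrite-like (assumption): only mark the block dirty, so repeated
        # updates of the same block coalesce.
        self.dirty.add(blkno)

    def flush(self):
        # One disk write per dirty block, however often it was modified.
        self.disk_writes += len(self.dirty)
        self.dirty.clear()


def count_writes(updates, delayed):
    """Return the number of disk writes caused by a sequence of block updates."""
    cache = ToyBufferCache()
    for blkno in updates:
        if delayed:
            cache.write_delayed(blkno)
        else:
            cache.write_now(blkno)
    cache.flush()
    return cache.disk_writes
```

In this model, ten updates of the same block cost ten writes with the
immediate policy and a single write with the delayed one.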

Reported-by: multiple, testing by tuxillo
23 months agoRevert "kernel/vga: Remove some unneeded #ifdef/#define's."
Sascha Wildner [Mon, 30 Apr 2012 22:11:13 +0000 (00:11 +0200)]
Revert "kernel/vga: Remove some unneeded #ifdef/#define's."

This reverts commit 617e8b12b140696ebd906ac4c3e01ea56643c624.

Sorry, it wasn't unneeded at all.

23 months agokernel/vga: Remove some unneeded #ifdef/#define's.
Sascha Wildner [Mon, 30 Apr 2012 22:02:13 +0000 (00:02 +0200)]
kernel/vga: Remove some unneeded #ifdef/#define's.

23 months agokernel: Remove some unused variables.
Sascha Wildner [Mon, 30 Apr 2012 20:24:01 +0000 (22:24 +0200)]
kernel: Remove some unused variables.

23 months agokernel: Remove some unused variables.
Sascha Wildner [Mon, 30 Apr 2012 18:02:58 +0000 (20:02 +0200)]
kernel: Remove some unused variables.

23 months agotcp: Implement part of Eifel Response Algorithm (RFC4015)
Sepherosa Ziehau [Sat, 28 Apr 2012 02:36:17 +0000 (10:36 +0800)]
tcp: Implement part of Eifel Response Algorithm (RFC4015)

It adapts the retransmission timer to avoid further spurious timeouts.

23 months agovquota.8: Remove extra .El
Sascha Wildner [Sun, 29 Apr 2012 18:21:44 +0000 (20:21 +0200)]
vquota.8: Remove extra .El

23 months agokernel: Use LIST_FOREACH in some places.
Sascha Wildner [Sun, 29 Apr 2012 12:12:00 +0000 (14:12 +0200)]
kernel: Use LIST_FOREACH in some places.

23 months agovquotactl.2: Add back a reference to loader.conf(5).
Sascha Wildner [Sat, 28 Apr 2012 20:42:08 +0000 (22:42 +0200)]
vquotactl.2: Add back a reference to loader.conf(5).

Add some words about the tunable, too.

Also adjust vquota(8)'s manpage in a similar manner.

23 months agovquotactl.2: Fix some small mdoc issues.
Sascha Wildner [Sat, 28 Apr 2012 19:57:52 +0000 (21:57 +0200)]
vquotactl.2: Fix some small mdoc issues.

23 months agoDocument the vquotactl() syscall
Francois Tigeot [Sat, 28 Apr 2012 19:25:40 +0000 (21:25 +0200)]
Document the vquotactl() syscall

23 months agonrelease: Build the git we ship on the LiveCD without Python support.
Sascha Wildner [Sat, 28 Apr 2012 18:23:23 +0000 (20:23 +0200)]
nrelease: Build the git we ship on the LiveCD without Python support.

This saves quite some space (>60MB) on the ISO.

23 months agomake upgrade: Don't remove /var/heimdal.
Sascha Wildner [Sat, 28 Apr 2012 11:41:10 +0000 (13:41 +0200)]
make upgrade: Don't remove /var/heimdal.

pkgsrc's heimdal package will create it, too.

23 months agokernel: Remove newlines from the panic messages that have one.
Sascha Wildner [Sat, 21 Apr 2012 09:03:04 +0000 (11:03 +0200)]
kernel: Remove newlines from the panic messages that have one.

panic() itself will add a newline.

23 months agoAdd tap(4) to LINT/LINT64.
Sascha Wildner [Sat, 28 Apr 2012 07:29:07 +0000 (09:29 +0200)]
Add tap(4) to LINT/LINT64.

23 months agotap(4): Use the number of instances from the kernel config file.
Sascha Wildner [Sat, 28 Apr 2012 07:28:35 +0000 (09:28 +0200)]
tap(4): Use the number of instances from the kernel config file.

23 months agomsgport: Implement dropmsg for spin port
Sepherosa Ziehau [Sat, 28 Apr 2012 01:45:54 +0000 (09:45 +0800)]
msgport: Implement dropmsg for spin port

DragonFly-bug: http://bugs.dragonflybsd.org/issues/2354

23 months agoRemove some CLEANFILES in kernel module Makefiles.
Sascha Wildner [Fri, 27 Apr 2012 22:26:16 +0000 (00:26 +0200)]
Remove some CLEANFILES in kernel module Makefiles.

23 months agoUpdate files for OpenSSL-1.0.1b import.
Peter Avalos [Fri, 27 Apr 2012 19:42:09 +0000 (12:42 -0700)]
Update files for OpenSSL-1.0.1b import.

23 months agoMerge branch 'vendor/OPENSSL'
Peter Avalos [Fri, 27 Apr 2012 19:39:35 +0000 (12:39 -0700)]
Merge branch 'vendor/OPENSSL'

23 months agoImport OpenSSL-1.0.1b.
Peter Avalos [Fri, 27 Apr 2012 19:35:59 +0000 (12:35 -0700)]
Import OpenSSL-1.0.1b.

      o Make FIPS capable OpenSSL ciphers work in non-FIPS mode.
      o Fix SSL_OP_NO_TLSv1_1 clash with SSL_OP_ALL in OpenSSL 1.0.0

23 months agotcp: Update snd_last upon spurious timeout retransmission restore
Sepherosa Ziehau [Fri, 27 Apr 2012 08:15:55 +0000 (16:15 +0800)]
tcp: Update snd_last upon spurious timeout retransmission restore

According to RFC4015; mainly to avoid delay spike.

23 months agotcp: Fix window scaling for accepted socket
Sepherosa Ziehau [Fri, 27 Apr 2012 06:49:07 +0000 (14:49 +0800)]
tcp: Fix window scaling for accepted socket

- Retire tcpcb.requested_s_scale, use tcpcb.snd_scale directly.
- Set tcpcb.snd_wnd in SYN_SENT state only if the TCP flags contain SYN.
- Save the window advertised by the other side into the syncache, and set up
  tcpcb.snd_wnd according to the saved value after the 3-way handshake is
  done.
- Delay tiwin setup in tcp_input(), specifically until after tcpcb.snd_scale
  is set up on the SO_ACCEPTCONN path.

This should fix the window scaling bug that shows up when the sender accepts
the connection and data only flows from sender to receiver.
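
The ordering matters because the window field of a SYN segment is never
scaled (RFC 1323 / RFC 7323), while later segments are scaled by the
negotiated shift.  A minimal sketch of that rule, with hypothetical names
that are not the kernel's:

```python
MAX_WINDOW_SHIFT = 14  # largest shift count allowed by the window scale option


def effective_send_window(raw_window, snd_scale, syn_flag):
    """Return the peer's advertised window in bytes.

    raw_window: 16-bit window field taken from the TCP header
    snd_scale:  shift count negotiated for windows advertised by the peer
    syn_flag:   True if the segment carries SYN
    """
    if not 0 <= snd_scale <= MAX_WINDOW_SHIFT:
        raise ValueError("invalid window scale shift")
    if syn_flag:
        # Windows in SYN / SYN+ACK segments are taken as-is.
        return raw_window
    return raw_window << snd_scale
```

Scaling before the shift count is known, or scaling a SYN's window, gives a
send window that is off by a factor of 2^snd_scale.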

23 months agotcp: Balance aggressiveness of SACK rescue retransmission
Sepherosa Ziehau [Thu, 26 Apr 2012 08:09:51 +0000 (16:09 +0800)]
tcp: Balance aggressiveness of SACK rescue retransmission

This commit follows the idea of sustaining ACK clocking whenever
possible to avoid retransmission timeouts during fast recovery, which
is mentioned in both RFC3517 and the "Rescue Retransmission for SACK"
draft.

- Be a little bit more aggressive in NextSeg()

  The main problem of the "Rescue Retransmission for SACK" draft is that it
  is conservative about how many rescue retransmissions may happen
  during fast recovery, which in some situations is not enough to
  sustain the ACK clock.

  Our aggressive SACK rescue retransmission variant tries to send out
  one rescue segment if no other segment could be sent according
  to RFC3517, so the ACK clock is kept ticking.

- Be conservative in sending out rescue segments.

  The idea of SACK rescue retransmission is just to sustain the ACK clock.
  As long as segments are sent (either new segments or retransmissions)
  during SACK-based fast recovery, the ACK clock will be sustained, so
  no rescue segment is sent in this situation.

SACK rescue retransmission statistics are updated more accurately to
reflect what actually happened.

The aggressive variant of SACK rescue retransmission can be disabled
by setting sysctl net.inet.tcp.rescuesack_agg to 0; it is enabled by
default.

23 months agontp_{adj,get}time.2: Mention our dntpd(8) and mark ntpd as being in pkgsrc.
Sascha Wildner [Wed, 25 Apr 2012 07:27:44 +0000 (09:27 +0200)]
ntp_{adj,get}time.2: Mention our dntpd(8) and mark ntpd as being in pkgsrc.

Reported-by: Loganaden Velvindron
23 months agotcp: Implement "Rescue Retransmission for SACK-based Loss Recovery Algorithm"
Sepherosa Ziehau [Thu, 19 Apr 2012 09:30:59 +0000 (17:30 +0800)]
tcp: Implement "Rescue Retransmission for SACK-based Loss Recovery Algorithm"

http://tools.ietf.org/html/draft-nishida-tcpm-rescue-retransmission-00

When hsu@ implemented RFC3517, part of the fix described in the above draft
was already implemented, i.e. the case where no SACK scoreboard blocks are
left, which the draft uses as its example.  However, the original
implementation still did not cover the case where only a small number of
SACK scoreboard blocks are left (< 3), and the original implementation could
be more aggressive than the method suggested in the above draft.

- Whether to use this new mechanism is controlled by the
  net.inet.tcp.rescuesack sysctl node; it is on by default.  Disabling it
  falls back to the original rescue retransmission behaviour implemented
  by hsu@.
- Save rexmt_high before we start retransmission using RFC3517, so that it
  can be restored if nothing is sent.
- Add statistics about rescue retransmission.

We could further examine whether doing more than one rescue retransmission
would be helpful.

23 months agontp_adjtime.2: Use the correct function name.
Sascha Wildner [Wed, 25 Apr 2012 04:52:50 +0000 (06:52 +0200)]
ntp_adjtime.2: Use the correct function name.

Reported-by: y0n3t4n1
23 months agontp_adjtime.2: Use the correct function name.
Sascha Wildner [Tue, 24 Apr 2012 17:16:03 +0000 (19:16 +0200)]
ntp_adjtime.2: Use the correct function name.

Reported-by: Loganaden Velvindron
23 months ago<sys/lock.h>: A little whitespace adjustment.
Sascha Wildner [Tue, 24 Apr 2012 17:15:12 +0000 (19:15 +0200)]
<sys/lock.h>: A little whitespace adjustment.

23 months agokernel: Change wmesg type for lockinit, lockreinit
Markus Pfeiffer [Tue, 24 Apr 2012 13:18:41 +0000 (13:18 +0000)]
kernel: Change wmesg type for lockinit, lockreinit

* change type of parameter wmesg to const char * for lockinit and
  lockreinit.
* change type of member wmesg of struct lock to const char *
* adapt manpage lock(9)

23 months agotest/test/README: add a few lines
Markus Pfeiffer [Tue, 24 Apr 2012 11:34:37 +0000 (11:34 +0000)]
test/test/README: add a few lines

2 years agokernel/viapm: Makefile cleanup (remove unneeded .PATH and opt_isa.h).
Sascha Wildner [Sun, 22 Apr 2012 11:36:53 +0000 (13:36 +0200)]
kernel/viapm: Makefile cleanup (remove unneeded .PATH and opt_isa.h).

2 years agokernel/modules: Remove opt_pci.h (which doesn't exist) from some Makefiles.
Sascha Wildner [Sun, 22 Apr 2012 11:24:00 +0000 (13:24 +0200)]
kernel/modules: Remove opt_pci.h (which doesn't exist) from some Makefiles.

2 years agokernel -- ffs: Soft updates locking fixes
Venkatesh Srinivas [Sun, 22 Apr 2012 03:05:06 +0000 (20:05 -0700)]
kernel -- ffs: Soft updates locking fixes

1) Do not take mplock in bioops callbacks; the lock was no longer synchronizing
   against mainline code.

2) Do not hold softdep lock around bwillinode()

3) Take softdep lock in softdep_process_worklist bioops callback. This callback
   was previously using the mplock for synchronization (insufficiently!)

4) Modify process_worklist_item to expect the softdep lock to be held on
   entry and release it at appropriate times.

Prevents a panic seen when running fsstress on a UFS+softdep fs, where fsync
finds a null buffer on vnode trees. This arose from a front-end/back-end
race in softdep_process_worklist.

2 years agokernel: Fix some whitespace from the previous commit.
Sascha Wildner [Sat, 21 Apr 2012 20:59:58 +0000 (22:59 +0200)]
kernel: Fix some whitespace from the previous commit.

2 years agokill zombies if the parent set SIG_IGN on SIGCHLD
Matthias Rampke [Tue, 17 Apr 2012 21:33:16 +0000 (23:33 +0200)]
kill zombies if the parent set SIG_IGN on SIGCHLD

fix for http://bugs.dragonflybsd.org/issues/2349

2 years agortld: Add write-lock to case of filter loading
John Marino [Sat, 21 Apr 2012 18:52:45 +0000 (20:52 +0200)]
rtld: Add write-lock to case of filter loading

Propagate the current state of rtld_bind_lock to dlopen_object() calls
through the filter loading call chain. This fixes attempts to
write-lock the already locked rtld_bind_lock when filter loading is
initiated by relocation of dlopening dso.

Taken from: FreeBSD SVN 234170 (12 APR 2012)

Do not try to adjust stacks if dlopen_object is called too early.  This
is a follow-up to FreeBSD SVN 233231 which fixed a similar issue with
object initialization code.

Taken from: FreeBSD SVN 233777 (02 APR 2012)

2 years agokernel -- vm_pageout: Handle pages w/ NULL vm_objects on the act/in pageqs.
Venkatesh Srinivas [Sat, 21 Apr 2012 18:40:41 +0000 (11:40 -0700)]
kernel -- vm_pageout: Handle pages w/ NULL vm_objects on the act/in pageqs.

vm_page_unwire could end up putting pages w/ NULL object fields onto the
active or inactive page queues. Allow the active/inactive scans to deal with
these pages rather than panicking. These pages can be disposed of normally.

Closes-bug: 2338
Suggested-by: dillon
Reported-by: sephe, JustinS
2 years agopthread_join(3): If the target thread is detached, return EINVAL.
Sascha Wildner [Sat, 21 Apr 2012 10:17:07 +0000 (12:17 +0200)]
pthread_join(3): If the target thread is detached, return EINVAL.

We were returning ESRCH previously, which is wrong, as it indicates
that the thread could not be found. Fix this in both libthread_xu
and libc_r.

See http://pubs.opengroup.org/onlinepubs/009695399/functions/pthread_join.html

----->8-----
[ESRCH]
    No thread could be found corresponding to that specified by the
    given thread ID.

...

[EINVAL]
    The value specified by thread does not refer to a joinable thread.
-----8<-----

2 years agoUpdate files for OpenSSL-1.0.1a import.
Peter Avalos [Sat, 21 Apr 2012 03:55:08 +0000 (20:55 -0700)]
Update files for OpenSSL-1.0.1a import.

2 years agoMerge branch 'vendor/OPENSSL'
Peter Avalos [Sat, 21 Apr 2012 03:37:28 +0000 (20:37 -0700)]
Merge branch 'vendor/OPENSSL'

2 years agoImport OpenSSL-1.0.1a.
Peter Avalos [Sat, 21 Apr 2012 03:33:46 +0000 (20:33 -0700)]
Import OpenSSL-1.0.1a.

o Fix for ASN1 overflow bug CVE-2012-2110.
o Workarounds for some servers that hang on long client hellos.
o Fix SEGV in AES code.

2 years agokernel: Remove some leftover references to struct cfdriver.
Sascha Wildner [Fri, 20 Apr 2012 09:02:13 +0000 (11:02 +0200)]
kernel: Remove some leftover references to struct cfdriver.

2 years agodsched_bfq.4: Fix capitalization.
Sascha Wildner [Thu, 19 Apr 2012 21:55:52 +0000 (23:55 +0200)]
dsched_bfq.4: Fix capitalization.

2 years agodigi.4: Remove some wrong documentation.
Sascha Wildner [Thu, 19 Apr 2012 21:49:46 +0000 (23:49 +0200)]
digi.4: Remove some wrong documentation.

2 years agobsd-family-tree: Sync with FreeBSD.
Sascha Wildner [Thu, 19 Apr 2012 21:24:39 +0000 (23:24 +0200)]
bsd-family-tree: Sync with FreeBSD.

2 years agotcp: Reimplement TCP_FASTKEEP socket option using per-pcb keepidle
Sepherosa Ziehau [Thu, 19 Apr 2012 07:16:39 +0000 (15:16 +0800)]
tcp: Reimplement TCP_FASTKEEP socket option using per-pcb keepidle

Retire the now unused TF_FASTKEEP

2 years agotcp: Reset keepalive timer, if TCP_KEEPIDLE is changed
Sepherosa Ziehau [Thu, 19 Apr 2012 06:57:36 +0000 (14:57 +0800)]
tcp: Reset keepalive timer, if TCP_KEEPIDLE is changed

This could cause side effects if keepalive probing is already underway.
Correcting these side effects could become overkill, so it is suggested
that TCP_KEEPIDLE be set before connect(2) or listen(2).
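
A minimal userland sketch of the suggested ordering.  The helper names are
hypothetical, the unit of the TCP_KEEPIDLE value is assumed to be seconds
(it is defined by the platform), and the option is skipped where the Python
socket module does not expose it.

```python
import socket


def enable_keepalive(sock, idle_seconds):
    """Turn on keepalive and, where available, set the idle time.

    Returns True if TCP_KEEPIDLE was applied, False if this platform's
    socket module does not expose the option.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    keepidle = getattr(socket, "TCP_KEEPIDLE", None)
    if keepidle is None:
        return False
    sock.setsockopt(socket.IPPROTO_TCP, keepidle, idle_seconds)
    return True


def connect_with_keepalive(host, port, idle_seconds):
    """Create a TCP socket and configure keepalive before connecting."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        enable_keepalive(sock, idle_seconds)  # set before connect(2)
        sock.connect((host, port))
    except OSError:
        sock.close()
        raise
    return sock
```

Setting the option before connect(2) or listen(2) means no keepalive timer
is running yet, so nothing has to be reset.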

2 years agojme: Unbreak buildkernel w/ KASSERT
Sepherosa Ziehau [Thu, 19 Apr 2012 03:13:37 +0000 (11:13 +0800)]
jme: Unbreak buildkernel w/ KASSERT

2 years agojme: Move TX descriptor count into chain_data
Sepherosa Ziehau [Thu, 19 Apr 2012 03:08:33 +0000 (11:08 +0800)]
jme: Move TX descriptor count into chain_data

Improve CPU cache utilization

2 years agojme: Option file adjustment
Sepherosa Ziehau [Thu, 19 Apr 2012 03:02:06 +0000 (11:02 +0800)]
jme: Option file adjustment

- RSS is now globally enabled; opt_rss.h is no longer needed here
- Always enable JME_RSS_DEBUG if module is not built w/ buildkernel

2 years agojme: Move RX ring count and RX descriptor count into rxdata
Sepherosa Ziehau [Thu, 19 Apr 2012 03:00:05 +0000 (11:00 +0800)]
jme: Move RX ring count and RX descriptor count into rxdata

Improve CPU cache utilization

2 years agojme: Use RX data's interrupt mask to test interrupt status
Sepherosa Ziehau [Thu, 19 Apr 2012 02:39:41 +0000 (10:39 +0800)]
jme: Use RX data's interrupt mask to test interrupt status

Functionally same, semantically better

2 years agojme: Pass rxdata to RX functions
Sepherosa Ziehau [Thu, 19 Apr 2012 02:34:09 +0000 (10:34 +0800)]
jme: Pass rxdata to RX functions

2 years agoRename vfs_accounting_enabled to vfs_quota_enabled
François Tigeot [Tue, 17 Apr 2012 20:27:36 +0000 (22:27 +0200)]
Rename vfs_accounting_enabled to vfs_quota_enabled

2 years agoef(4): Bring in some small fixes from FreeBSD.
Sascha Wildner [Wed, 18 Apr 2012 20:49:04 +0000 (22:49 +0200)]
ef(4): Bring in some small fixes from FreeBSD.

Gets rid of a superfluous extra opt_ef.h file in /usr/obj hierarchy.

2 years agoRemove a deleted mpt(4) related header file with 'make upgrade'.
Sascha Wildner [Wed, 18 Apr 2012 19:49:38 +0000 (21:49 +0200)]
Remove a deleted mpt(4) related header file with 'make upgrade'.

2 years agojme: Don't enable RSS by default
Sepherosa Ziehau [Wed, 18 Apr 2012 12:43:45 +0000 (20:43 +0800)]
jme: Don't enable RSS by default

It does not seem to be stable enough to be used.

2 years agojme: MSI-X interrupt handler bug fixes
Sepherosa Ziehau [Wed, 18 Apr 2012 12:42:05 +0000 (20:42 +0800)]
jme: MSI-X interrupt handler bug fixes

- Always write INTR_STATUS in RX interrupt handler
- Always enable/disable related interrupts in the corresponding interrupt
  handler

2 years agotcp: Correct sending idle detection and implement part of RFC2861
Sepherosa Ziehau [Wed, 18 Apr 2012 07:31:26 +0000 (15:31 +0800)]
tcp: Correct sending idle detection and implement part of RFC2861

This commit mainly changes how cwnd is shrunk after an idle period on
the send side.

- Properly detect sending idle period according to RFC5681.  The problem
  of using reception time to detect sending idle period is described in
  RFC5681 as:

  "...
   Using the last time a segment was received to determine whether or
   not to decrease cwnd can fail to deflate cwnd in the common case of
   persistent HTTP connections [HTH98].  In this case, a Web server
   receives a request before transmitting data to the Web client.  The
   reception of the request makes the test for an idle connection fail,
   and allows the TCP to begin transmission with a possibly
   inappropriately large cwnd.
   ..."

  This mainly affects HTTP/1.1 persistent connection performance after
  the connection has been idle for a long time.  The impact probably should
  not be drastic, since 80% of the delays between two requests on HTTP/1.1
  persistent connections are less than the minimum RTO (1 second), as
  discovered by:
  "Overclocking the Yahoo! CDN for Faster Web Page Loads"
    http://conferences.sigcomm.org/imc/2011/docs/p569.pdf

  Sysctl node net.inet.tcp.idle_restart is added to disable the cwnd
  shrinking after idle period.  It is on by default; you can set it
  to 0 to restore the old behaviour for HTTP/1.1 persistent connections.

- Implement part of RFC2861, which decays cwnd after idle period according
  to the length of sending idle period.  The main difference between our
  implementation and RFC2861 is that we don't let cwnd go below the
  value allowed by RFC5681.

  Sysctl node net.inet.tcp.idle_cwv is added to disable CWV after sending
  idle period.  It is on by default.  Disabling net.inet.tcp.idle_restart
  will also indirectly disable CWV after sending idle period.

  The CWV during the application-limited period is not implemented by this
  commit.  It is just too conservative, as discovered by:
  "Analysing TCP for Bursty Traffic, Int'l J. of Communications,
   Network and System Sciences, 7(3), July 2010."

- Add statistics about how often a sending idle period happened
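
The interaction of the two sysctls can be sketched as follows.  This is a
simplified model, not the kernel code: the function names are hypothetical,
the decay is assumed to halve cwnd once per RTO of sending idle time in the
style of RFC 2861, and the ssthresh adjustment described by RFC 2861 is left
out.

```python
def restart_window(cwnd, initial_window):
    """Restart window from RFC 5681: RW = min(IW, cwnd)."""
    return min(initial_window, cwnd)


def cwnd_after_idle(cwnd, initial_window, idle_time, rto,
                    idle_restart=True, idle_cwv=True):
    """Return the congestion window to use when sending resumes.

    idle_time is the time since the last segment was *sent*, not received;
    idle_time and rto use the same unit, cwnd and initial_window are bytes.
    """
    if not idle_restart or idle_time <= rto:
        return cwnd                  # not idle long enough, or feature off
    floor = restart_window(cwnd, initial_window)
    if not idle_cwv:
        return floor                 # plain restart-window behaviour
    decayed = cwnd
    for _ in range(int(idle_time // rto)):
        decayed //= 2                # one halving per RTO of idle time
        if decayed <= floor:
            return floor             # never go below the restart window
    return decayed
```

With cwnd = 64000, an initial window of 4380 and an RTO of 1 second, 2.5
seconds of sending idle time leaves 16000 bytes, while 10 seconds bottoms
out at the restart window.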

2 years agosocket: Change sysctl names sosnd -> sosend, no functional changes
Sepherosa Ziehau [Tue, 17 Apr 2012 10:26:43 +0000 (18:26 +0800)]
socket: Change sysctl names sosnd -> sosend, no functional changes

2 years agovquota(8): document a limit of 0 means no check
François Tigeot [Tue, 17 Apr 2012 20:54:27 +0000 (22:54 +0200)]
vquota(8): document a limit of 0 means no check

2 years agoVFS quota: enforce user and group limits
Francois Tigeot [Wed, 4 Apr 2012 08:37:38 +0000 (10:37 +0200)]
VFS quota: enforce user and group limits

2 years agoVFS quota: add a command to set a group quota
Francois Tigeot [Fri, 30 Mar 2012 18:47:36 +0000 (20:47 +0200)]
VFS quota: add a command to set a group quota

2 years agoVFS quota: add a command to set a user's quota
Francois Tigeot [Fri, 23 Mar 2012 22:19:12 +0000 (23:19 +0100)]
VFS quota: add a command to set a user's quota

2 years agompt(4): Sync with FreeBSD.
Sascha Wildner [Tue, 17 Apr 2012 16:13:57 +0000 (18:13 +0200)]
mpt(4): Sync with FreeBSD.

Bug fixes and cleanups.

2 years agompt(4): Pass INTR_MPSAFE when setting up the interrupt.
Sascha Wildner [Tue, 17 Apr 2012 16:03:15 +0000 (18:03 +0200)]
mpt(4): Pass INTR_MPSAFE when setting up the interrupt.

It's a porting mistake I made back then.

2 years agoVFS quota: start enforcing limits
Francois Tigeot [Thu, 22 Mar 2012 13:33:43 +0000 (14:33 +0100)]
VFS quota: start enforcing limits

* Add a function vq_write_ok() to check if writing a specified amount
  of data to a mounted filesystem is allowed.

* It only checks a global per mount-point limit for now.

* Enforce this limit check in vop_write().
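
A sketch of the kind of check described above.  The function name and
parameters are hypothetical, and treating a limit of 0 as "no check"
follows the vquota(8) note elsewhere in this log; whether the real
comparison is inclusive is an assumption.

```python
def write_allowed(used_bytes, nbytes, limit_bytes):
    """Return True if writing nbytes more keeps the mount within its limit.

    used_bytes:  space currently accounted to the mount point
    nbytes:      size of the write being attempted
    limit_bytes: global per mount-point limit; 0 means no limit is checked
    """
    if limit_bytes == 0:
        return True
    # Assumption: a write that lands exactly on the limit is still allowed.
    return used_bytes + nbytes <= limit_bytes
```

A write path would call such a check before modifying any data and fail the
write when it returns False.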

2 years agoVFS quota: add a function to set a per mount-point global space limit
Francois Tigeot [Sat, 17 Mar 2012 21:15:09 +0000 (22:15 +0100)]
VFS quota: add a function to set a per mount-point global space limit

2 years agoVFS accounting: have syscalls fail if not enabled
Francois Tigeot [Mon, 16 Apr 2012 17:50:55 +0000 (19:50 +0200)]
VFS accounting: have syscalls fail if not enabled

2 years agoVFS quota: report per mount-point space limits
Francois Tigeot [Sat, 17 Mar 2012 18:30:55 +0000 (19:30 +0100)]
VFS quota: report per mount-point space limits

2 years agojme: Change how IFCAP_RSS is handled
Sepherosa Ziehau [Mon, 16 Apr 2012 12:00:36 +0000 (20:00 +0800)]
jme: Change how IFCAP_RSS is handled

- Number of RX rings being used is not changed.
- Hardware is always setup w/ RSS support.
- If IFCAP_RSS is not enabled, we simply don't use the RSS hash provided
  by the hardware.

2 years agojme: Per-device tunable knobs
Sepherosa Ziehau [Mon, 16 Apr 2012 11:51:31 +0000 (19:51 +0800)]
jme: Per-device tunable knobs

2 years agoifnet: Factor out if_ring_count2()
Sepherosa Ziehau [Mon, 16 Apr 2012 11:41:20 +0000 (19:41 +0800)]
ifnet: Factor out if_ring_count2()

This function calculates the maximum allowed power-of-2 ring count based on
the user specified value (cnt) and the maximum number of rings supported by
the hardware (cnt_max).  The power-of-2 cpu count is also taken into
consideration.
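
The calculation can be sketched like this.  It is an illustration only: the
Python names are hypothetical, and treating cnt <= 0 as "auto" is an
assumption borrowed from the emx RX ring tunable described elsewhere in this
log.

```python
def largest_power_of_2(n):
    """Largest power of two that is <= n, for n >= 1."""
    return 1 << (n.bit_length() - 1)


def ring_count2(cnt, cnt_max, ncpus):
    """Pick a power-of-2 ring count.

    cnt:     user specified value; <= 0 is assumed to mean "auto"
    cnt_max: maximum number of rings supported by the hardware (>= 1)
    ncpus:   number of cpus in the system (>= 1)
    """
    limit = min(cnt_max, ncpus)
    if cnt <= 0 or cnt > limit:
        cnt = limit
    # Rounding down at the end is equivalent to rounding each input down
    # first, because rounding down to a power of two is monotonic.
    return largest_power_of_2(cnt)
```

For example, a request for 3 rings on 8-ring hardware with 8 cpus yields 2,
and an automatic setting on 8-ring hardware with 6 cpus yields 4.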

2 years agoemx: Change how IFCAP_RSS is handled
Sepherosa Ziehau [Mon, 16 Apr 2012 10:43:00 +0000 (18:43 +0800)]
emx: Change how IFCAP_RSS is handled

- Number of RX rings being used is not changed.
- Hardware is always setup w/ RSS support.
- If IFCAP_RSS is not enabled, we simply don't use the RSS hash provided
  by the hardware.

2 years agoemx: Allow user to specify the number of RX rings to use
Sepherosa Ziehau [Mon, 16 Apr 2012 10:19:19 +0000 (18:19 +0800)]
emx: Allow user to specify the number of RX rings to use

Default value is 0, which means auto (based on # of cpus in the system)

2 years agoemx: Per-device tunable knobs
Sepherosa Ziehau [Mon, 16 Apr 2012 10:07:28 +0000 (18:07 +0800)]
emx: Per-device tunable knobs

- Max TX/RX descriptor count
- Interrupt throttle ceiling

2 years agobus: Change device_getenv_int interface a little bit
Sepherosa Ziehau [Mon, 16 Apr 2012 09:51:14 +0000 (17:51 +0800)]
bus: Change device_getenv_int interface a little bit

- Pass in the fallback value
- If the kgetenv fails, fallback value will be returned
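
The changed calling convention can be modelled as below.  This is a toy
model rather than the kernel function: the kernel environment is a plain
dict, the "hw.<nameunit>.<knob>" key format is a hypothetical choice for
illustration, and returning the fallback for an unparsable value is an
assumption.

```python
def device_getenv_int(kenv, nameunit, knob, fallback):
    """Look up a per-device integer tunable, returning fallback if unset.

    kenv:     dict standing in for the kernel environment
    nameunit: device name plus unit, e.g. "emx0"
    knob:     name of the tunable
    """
    key = "hw.%s.%s" % (nameunit, knob)   # hypothetical key format
    raw = kenv.get(key)
    if raw is None:
        return fallback                   # lookup failed: use the fallback
    try:
        return int(raw, 0)                # accepts "512" as well as "0x200"
    except ValueError:
        return fallback                   # assumption: bad value, fallback
```

Passing the fallback in keeps the "tunable not set" handling out of every
caller.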

2 years agopci: Utilize device_getenv_int
Sepherosa Ziehau [Mon, 16 Apr 2012 09:16:08 +0000 (17:16 +0800)]
pci: Utilize device_getenv_int

2 years agobus: Add device_getenv_int helper function
Sepherosa Ziehau [Mon, 16 Apr 2012 09:15:16 +0000 (17:15 +0800)]
bus: Add device_getenv_int helper function

To get device specific int tunable knobs.

2 years agoroute: Turn on route_assert_owner_access by default
Sepherosa Ziehau [Mon, 16 Apr 2012 02:10:52 +0000 (10:10 +0800)]
route: Turn on route_assert_owner_access by default

There have been no reports of invalid accesses to CPU-local routing
information since 2009.  Time to use an assertion instead of printing a
backtrace.

2 years agokernel: Use ${.TARGET} in module Makefiles.
Sascha Wildner [Sun, 15 Apr 2012 20:23:33 +0000 (22:23 +0200)]
kernel: Use ${.TARGET} in module Makefiles.

2 years agoVFS accounting: remove unneeded code
François Tigeot [Sun, 15 Apr 2012 20:09:45 +0000 (22:09 +0200)]
VFS accounting: remove unneeded code

2 years agoVFS accounting: do not call initialization functions directly
Francois Tigeot [Sun, 15 Apr 2012 19:31:39 +0000 (21:31 +0200)]
VFS accounting: do not call initialization functions directly

* Use a macro to check first that VFS accounting is enabled globally and
  has been properly enabled for this mount point

* Change the return type of vfs_acdone() to void. This function does
  not return errors and should not make a filesystem unmount fail.

2 years agokernel/netgraph7: Remove some atalk remains.
Sascha Wildner [Sun, 15 Apr 2012 19:17:04 +0000 (21:17 +0200)]
kernel/netgraph7: Remove some atalk remains.

2 years agoVFS accounting: fix vfs_register()
Francois Tigeot [Sun, 15 Apr 2012 18:41:10 +0000 (20:41 +0200)]
VFS accounting: fix vfs_register()

* Do not register per mount-point initialization and destruction
  functions if VFS accounting is not globally enabled.

2 years agoVFS quota: add per mount-point global, uid and gid limit fields.
Francois Tigeot [Sat, 17 Mar 2012 17:17:49 +0000 (18:17 +0100)]
VFS quota: add per mount-point global, uid and gid limit fields.

2 years agoif: Remove experimental ifnet.if_start related sysctls
Sepherosa Ziehau [Sun, 15 Apr 2012 12:39:46 +0000 (20:39 +0800)]
if: Remove experimental ifnet.if_start related sysctls

2 years agomake.conf: Remove the unused LOADER_TFTP_SUPPORT variable.
Sascha Wildner [Sun, 15 Apr 2012 10:26:45 +0000 (12:26 +0200)]
make.conf: Remove the unused LOADER_TFTP_SUPPORT variable.

We have offered both the NFS and TFTP versions of pxeboot and loader since
2005 (see 423d6aa030d43ea1d66b0572688d0964c641c1b5); since that commit,
the LOADER_TFTP_SUPPORT make.conf variable has had no effect.

2 years agoRemove some unnecessary inclusions of <sys/cdefs.h> across the tree.
Sascha Wildner [Sat, 14 Apr 2012 15:34:00 +0000 (17:34 +0200)]
Remove some unnecessary inclusions of <sys/cdefs.h> across the tree.

2 years agokernel -- Enable threaded syncer for NFS mounts.
Venkatesh Srinivas [Sat, 14 Apr 2012 01:10:58 +0000 (18:10 -0700)]
kernel -- Enable threaded syncer for NFS mounts.

NFS mounts will now use a per-mount thread to complete periodic syncs on its
vnodes rather than using the system's syncer0.

Also remove a change that snuck in mistakenly to unmark syncer threads
as verbose.

2 years agokernel -- Per-mount threaded syncer infrastructure.
Venkatesh Srinivas [Sat, 14 Apr 2012 00:46:42 +0000 (17:46 -0700)]
kernel -- Per-mount threaded syncer infrastructure.

Do not shut down syncer thread when unmount fails.

Reminded-by: dillon@
2 years agokernel: Clean up some module Makefiles.
Sascha Wildner [Fri, 13 Apr 2012 17:40:11 +0000 (19:40 +0200)]
kernel: Clean up some module Makefiles.

Remove unneeded .PATHs and variables. Use ${.TARGET}.

2 years agocxm(4): Remove an unused file (opt_cxm.h) from the Makefile.
Sascha Wildner [Fri, 13 Apr 2012 17:20:25 +0000 (19:20 +0200)]
cxm(4): Remove an unused file (opt_cxm.h) from the Makefile.

2 years agoem/emx: Add a note about why MSI-X should not be enabled on 82574
Sepherosa Ziehau [Fri, 13 Apr 2012 10:40:16 +0000 (18:40 +0800)]
em/emx: Add a note about why MSI-X should not be enabled on 82574

2 years agoem/emx: Update to Intel em-7.2.4
Sepherosa Ziehau [Fri, 13 Apr 2012 09:09:10 +0000 (17:09 +0800)]
em/emx: Update to Intel em-7.2.4

- Fix the max frame length settings for 82583 chips
- Workaround 82574 specification update errata #20.

Local changes:
- For 82574 specification update errata #20 workaround, we don't
  disable ASPM L1.  Disabling ASPM L1 is said to be unnecessary
  in the specification update.

2 years agopci: Add definition for PCI express Link capabilities/control
Sepherosa Ziehau [Fri, 13 Apr 2012 10:31:32 +0000 (18:31 +0800)]
pci: Add definition for PCI express Link capabilities/control

2 years agotcp/sack: Further optimize scoreboard block allocation
Sepherosa Ziehau [Fri, 13 Apr 2012 08:34:51 +0000 (16:34 +0800)]
tcp/sack: Further optimize scoreboard block allocation

Use a one-slot cache for freed SACK scoreboard blocks.

30-minute tests were conducted on a heavily congested real-life network
path; the new statistics:

  31254 SACK scoreboard updates
      2 overflows
      0 failures
      13392 records reused
      12703 records fast allocated

Before this commit, ~42% of allocations were avoided (reused); after this
commit, ~83% of allocations are avoided (reused + fast allocated).
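
The two percentages can be reproduced from the counters above, assuming
that the number of SACK scoreboard updates is the right denominator (the
variable names are ours, not netstat's):

```python
def avoided_share(updates, reused, fast_allocated=0):
    """Fraction of scoreboard updates that needed no regular allocation."""
    return (reused + fast_allocated) / updates


UPDATES = 31254         # SACK scoreboard updates
REUSED = 13392          # records reused
FAST_ALLOCATED = 12703  # records fast allocated

before = avoided_share(UPDATES, REUSED)
after = avoided_share(UPDATES, REUSED, FAST_ALLOCATED)
print("reused only:             %.1f%%" % (100 * before))
print("reused + fast allocated: %.1f%%" % (100 * after))
```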

2 years agotcp/sack: Optimize scoreboard block allocation
Sepherosa Ziehau [Fri, 13 Apr 2012 07:30:26 +0000 (15:30 +0800)]
tcp/sack: Optimize scoreboard block allocation

Allocate SACK scoreboard block only if we can't extend the existing
one's right edge (end).

This commit could avoid ~70% of SACK scoreboard block allocations on
leaf.dragonflybsd.org (11528032 updates, 8353353 reused) according
to the "netstat -s -f inet -p tcp" output as of today.  On my testing
sites, this commit could avoid 30%~50% of SACK scoreboard block allocations.
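
The idea of extending the right edge instead of allocating can be sketched
with a toy scoreboard.  It is a simplified model under stated assumptions:
blocks are [start, end) pairs kept in a Python list, sequence number
wraparound and merging with a following block are ignored, and the function
name is hypothetical.

```python
def record_sack_block(scoreboard, start, end):
    """Record a SACKed range [start, end) in the scoreboard.

    scoreboard is a list of [start, end] lists sorted by start.
    Returns "reused" if an existing block covered or absorbed the range,
    or "allocated" if a new block had to be created.
    """
    for block in scoreboard:
        if block[0] <= start <= block[1]:
            # The new range begins inside or right at the end of an
            # existing block: extend that block's right edge if needed.
            if end > block[1]:
                block[1] = end
            return "reused"
    scoreboard.append([start, end])   # no block to extend: allocate one
    scoreboard.sort()
    return "allocated"
```

When data is SACKed in order, each new range starts where the previous
block ends, so most updates take the "reused" path.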

2 years agotcp/sack: Move scoreboard block start/end setup into alloc function
Sepherosa Ziehau [Fri, 13 Apr 2012 06:31:55 +0000 (14:31 +0800)]
tcp/sack: Move scoreboard block start/end setup into alloc function

This paves the way for further SACK scoreboard block allocation optimization