dragonfly.git

Matthew Dillon [Sat, 2 Jun 2012 17:26:54 +0000 (10:26 -0700)]
hammer2 - Move CCMS code from kernel to hammer2

* Move the CCMS cache coherency module from the kernel to hammer2.  It will
  now be hammer2-specific.

Matthew Dillon [Sat, 2 Jun 2012 17:22:04 +0000 (10:22 -0700)]
Merge branches 'hammer2' and 'master' of ssh://crater.dragonflybsd.org/repository/git/dragonfly into hammer2

Matthew Dillon [Sat, 2 Jun 2012 17:21:03 +0000 (10:21 -0700)]
kernel - Add comment on spinlocks_wr

* Document a side effect related to spinlocks_wr in the LWKT scheduler.

Matthew Dillon [Sat, 2 Jun 2012 17:15:51 +0000 (10:15 -0700)]
kernel - Remove kernel-level ccms module (it will be moved into hammer2)

* Remove the CCMS kernel layer.  The CCMS module is going to be moved
  directly into hammer2 in order to make hammer2 more portable.  For
  now that means moving the files into vfs/hammer2 in the hammer2 branch.

* CCMS is a logical cache coherency locking layer that has been in the
  DragonFly tree for a while but was not enabled by default.  Originally
  the plan was to not lock vnodes across operations but to instead acquire
  the appropriate CCMS lock(s), but rewiring all the filesystems proved to
  be too large a task.

* HAMMER2's cluster work is going to need this layer for real, but nothing
  else does.  What we will do instead (eventually) is add a mount flag to
  allow us to avoid locking vnodes across VNOPS calls which HAMMER2 will be
  able to specify.

Matthew Dillon [Thu, 31 May 2012 17:26:35 +0000 (10:26 -0700)]
Merge branches 'hammer2' and 'master' of ssh://crater.dragonflybsd.org/repository/git/dragonfly into hammer2

Sepherosa Ziehau [Sun, 27 May 2012 13:16:46 +0000 (21:16 +0800)]
igb: Optimize TX path

Reduce the number of TX ring status reports: at most 16 reports per
ring's worth (TX descriptor count) of transmitted descriptors.  It is
unnecessary to report status for every TX descriptor.  This could
greatly reduce bus traffic.
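A minimal sketch of the report-reduction idea, assuming the Report Status (RS) bit is requested once every ring-size/16 descriptors. The structure and function names (tx_ring_model, tx_need_report) are hypothetical and not taken from igb(4):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: at most this many status reports per ring's worth. */
#define TX_MAX_REPORTS 16

struct tx_ring_model {
	uint32_t ndesc;        /* number of TX descriptors in the ring */
	uint32_t nsegs_since;  /* descriptors queued since the last RS request */
};

/* Number of descriptors that may be queued between two status reports. */
static uint32_t
tx_report_interval(const struct tx_ring_model *r)
{
	uint32_t n = r->ndesc / TX_MAX_REPORTS;

	return (n > 0 ? n : 1);
}

/*
 * Account for 'nsegs' newly queued descriptors and return true if the
 * last descriptor of this packet should carry the RS bit.
 */
static bool
tx_need_report(struct tx_ring_model *r, uint32_t nsegs)
{
	r->nsegs_since += nsegs;
	if (r->nsegs_since >= tx_report_interval(r)) {
		r->nsegs_since = 0;
		return (true);
	}
	return (false);
}
```

With a 1024-descriptor ring this requests a report every 64 descriptors, so one full pass over the ring produces 16 reports.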

Use "Transmit Completions Head Write Back" as described in the datasheet.
In this model, TX descriptors are no longer written back by hardware, so
cache thrashing is avoided.  This also greatly reduces the complexity of
igb_txeof.
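A rough model of TX cleanup under head write-back. It assumes the NIC DMA-writes its current head index into a host memory word (hdr_wb here). The names are hypothetical, and a real driver would also unload the DMA map and free the mbuf for each reclaimed descriptor:

```c
#include <stdint.h>

struct tx_ring_wb {
	uint32_t ndesc;            /* descriptors in the ring */
	uint32_t next_to_clean;    /* first descriptor not yet reclaimed */
	volatile uint32_t hdr_wb;  /* head index written back by the NIC (assumed) */
	uint32_t avail;            /* free descriptors */
};

/* Reclaim every descriptor the hardware head has moved past. */
static uint32_t
tx_reclaim_hwb(struct tx_ring_wb *t)
{
	/* One memory read replaces checking each descriptor's status. */
	uint32_t head = t->hdr_wb;
	uint32_t freed = 0;

	while (t->next_to_clean != head) {
		/* A real driver would unload the DMA map and free the mbuf here. */
		t->next_to_clean = (t->next_to_clean + 1) % t->ndesc;
		freed++;
	}
	t->avail += freed;
	return (freed);
}
```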

Implementation notes for "Transmit Completions Head Write Back":
- HWBTHRESH is not used, since:
  o  82575 does not support it
  o  The number of status reports is already greatly reduced
- WB_on_EITR is not used, since:
  o  82575 does not support it
  o  It would cause unnecessary head write-backs

Performance is almost the same as with the previous code:
- 1.48Mpps for 18-byte UDP datagrams
- Line rate for 1472-byte UDP datagrams and TCP streams

Sepherosa Ziehau [Thu, 31 May 2012 09:32:08 +0000 (17:32 +0800)]
tcp: Adjust tcpcb fields comment about NewReno fast recovery

We have SACK-based fast recovery; don't limit the fields to NewReno.

Sascha Wildner [Thu, 31 May 2012 05:45:57 +0000 (07:45 +0200)]
kernel/drm: Remove bogus .PATHs.

François Tigeot [Tue, 29 May 2012 21:12:15 +0000 (23:12 +0200)]
drm: Stow drivers for various chip families

Put them into their own subdirectories in sys/dev/drm/.

Inspired-by: David Shao's dflygsocdrm work

Aggelos Economopoulos [Wed, 30 May 2012 14:03:21 +0000 (16:03 +0200)]
Fix for password truncation when using crypt(3) with DES

Passwords containing a 0x80 byte (UTF-8 encoded ones, ASCII and
ISO-8859-* not affected) would get truncated as if a '\0' byte
had been encountered. This could result in some very weak passwords.
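A simplified illustration of this class of bug, not the actual libcrypt source. Each password byte is shifted left by one bit when it is packed into the DES key. If the code tests the shifted value instead of the original byte, 0x80 looks like the terminating '\0'. The function names are hypothetical:

```c
#include <stdint.h>

/*
 * BUGGY (illustrative): the loop advances through the password only when
 * the *shifted* byte is non-zero.  For 0x80, (0x80 << 1) truncated to
 * 8 bits is 0, so the pointer stops there and every remaining key byte
 * is zero, exactly as if '\0' had been reached.
 */
static void
pack_key_buggy(const char *pw, uint8_t keybuf[8])
{
	const uint8_t *key = (const uint8_t *)pw;

	for (int i = 0; i < 8; i++) {
		if ((keybuf[i] = (uint8_t)(*key << 1)) != 0)
			key++;
	}
}

/* FIXED (illustrative): advance based on the original, unshifted byte. */
static void
pack_key_fixed(const char *pw, uint8_t keybuf[8])
{
	const uint8_t *key = (const uint8_t *)pw;

	for (int i = 0; i < 8; i++) {
		keybuf[i] = (uint8_t)(*key << 1);
		if (*key != '\0')
			key++;
	}
}
```

With the buggy packing, any password whose first byte is 0x80 yields the same key as the empty password.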

Reported-by: Rubin Xu, Joseph Bonneau, Donting Yu (CVE-2012-2143)

Sepherosa Ziehau [Wed, 30 May 2012 08:23:48 +0000 (16:23 +0800)]
icmp: Discard ICMP Source Quench per RFC6633

Sepherosa Ziehau [Wed, 30 May 2012 05:16:02 +0000 (13:16 +0800)]
tcp: Only tcpopt.to_flags are needed in tcp_recv_dupack()

While I'm here, change tcpopt.to_flags from u_long to u_int.

Sepherosa Ziehau [Wed, 30 May 2012 03:48:03 +0000 (11:48 +0800)]
tcp: Even for PAWS tolerance, no segments should follow segment with FIN

Sepherosa Ziehau [Tue, 29 May 2012 09:12:07 +0000 (17:12 +0800)]
tcp: Don't let fast retransmit disrupt RTO rebasing

While I'm here, add and adjust comments about spurious timeout
retransmit detection.

Sepherosa Ziehau [Wed, 30 May 2012 03:34:22 +0000 (11:34 +0800)]
tcp/reass: Fix the cases that FIN got lost during reassemble

While I'm here, set the SACK report's right edge correctly if the
current segment could be merged with its succeeding segment.

Sepherosa Ziehau [Wed, 30 May 2012 01:43:50 +0000 (09:43 +0800)]
tcp/sack: If other side reneged, discard the current SACK scoreboard

Other side reneging is detected using the first SACK record:
If its left edge is less than or equal to the cumulative ACK of the
incoming segment, other side probably reneged.

This fixes a failure of the later assertion in
tcp_sack_first_unsacked_len() that the first SACK record's left edge
must be above snd_una.

Add statistics about the other side reneging.
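Here is a minimal sketch of the reneging heuristic described above. It uses BSD-style modular sequence comparison, and the names are hypothetical. A D-SACK block (RFC 2883) can also have its left edge at or below the cumulative ACK, so a real implementation must tell the two cases apart. This sketch does not:

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t tcp_seq;

/* Modular "a <= b" on 32-bit sequence numbers, like BSD's SEQ_LEQ. */
#define SEQ_LEQ(a, b) ((int32_t)((a) - (b)) <= 0)

struct sack_block {
	tcp_seq start;   /* left edge */
	tcp_seq end;     /* right edge */
};

/*
 * If the first SACK block's left edge is at or below the incoming
 * cumulative ACK, the receiver has probably reneged (discarded data it
 * previously SACKed), so the scoreboard should be discarded.
 */
static bool
sack_peer_reneged(tcp_seq th_ack, const struct sack_block *blocks,
    int nblocks)
{
	if (nblocks == 0)
		return (false);
	return (SEQ_LEQ(blocks[0].start, th_ack));
}
```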

Sepherosa Ziehau [Tue, 29 May 2012 08:09:47 +0000 (16:09 +0800)]
socket: Fix wrongly numbered SIOCGIFDATA

While I'm here, add a comment about the numbers already used in the 'i' group.

DragonFly-bug: http://bugs.dragonflybsd.org/issues/1897

Sascha Wildner [Mon, 28 May 2012 11:33:17 +0000 (13:33 +0200)]
drm.4: A little clean up.

François Tigeot [Sun, 27 May 2012 20:06:37 +0000 (22:06 +0200)]
kernel: increase watchdog default period to 30s

This reduces idle CPU time and power consumption a bit.

Sepherosa Ziehau [Mon, 28 May 2012 06:33:28 +0000 (14:33 +0800)]
tcp/sack: Constify function arguments if possible

Sepherosa Ziehau [Mon, 28 May 2012 02:38:08 +0000 (10:38 +0800)]
man/ktr: Adjust for the recent ether function cleanup

Reminded-by: swildner@

Sepherosa Ziehau [Mon, 28 May 2012 02:32:41 +0000 (10:32 +0800)]
tcp/sack: Only retransmit unSACKed data when fast retransmit

Sepherosa Ziehau [Sun, 27 May 2012 11:32:00 +0000 (19:32 +0800)]
pktgen: Unbreak compile

François Tigeot [Sun, 27 May 2012 06:40:27 +0000 (08:40 +0200)]
kernel: in_cksum2.s is needed by inet6 code

Sascha Wildner [Sat, 26 May 2012 23:00:15 +0000 (01:00 +0200)]
Remove a few more casts of NULL to some pointer type.

Francois Tigeot [Sat, 26 May 2012 14:07:50 +0000 (16:07 +0200)]
kernel: tcp_fasttimo() is dead

* It was actually killed in 1999

* Remove its last two remaining references

Sepherosa Ziehau [Sat, 26 May 2012 15:06:07 +0000 (23:06 +0800)]
pci: Disable PCI express memory mapped access method by default

It seems to hang some systems during boot.

Reported-by: y0netan1@

Sepherosa Ziehau [Sat, 26 May 2012 15:05:01 +0000 (23:05 +0800)]
tools: Add netblast

Obtained-from: FreeBSD

Sascha Wildner [Sat, 26 May 2012 11:40:43 +0000 (13:40 +0200)]
acpi: strupr() isn't used anywhere, so remove it.

Sascha Wildner [Sat, 26 May 2012 08:21:02 +0000 (10:21 +0200)]
ndis.4: Comment out an unneeded sentence.

It is supported on all platforms we have.

Venkatesh Srinivas [Sat, 26 May 2012 03:15:00 +0000 (20:15 -0700)]
Merge branch 'master' of /repository/git/dragonfly

Sascha Wildner [Fri, 25 May 2012 21:28:33 +0000 (23:28 +0200)]
kernel: Remove the inclusion of opt_ddb.h from where it is unnecessary.

None of these files uses DDB, DDB_UNATTENDED or GDB_REMOTE_CHAT (which
is what opt_ddb.h defines).

Venkatesh Srinivas [Fri, 25 May 2012 19:43:58 +0000 (12:43 -0700)]
libc -- dmalloc: Call malloc_init as-needed, rather than via ctor (#2)

This commit is a second revision of
e12d3396c777165504d60d2a1408dcd7cb63660d; for details, see the original
commit message.

That commit was reverted quickly, as it broke pthreads; this revision
does not suffer from that problem, as it preserves the __constructor
logic for malloc_init.

Reverts: 4018c6eddd57f4abf9134690cbfa46c9d7103558 (Revert libc ...)
Reported-by: marino@
Closes-bug: 2305

Sascha Wildner [Fri, 25 May 2012 18:07:33 +0000 (20:07 +0200)]
Remove some useless casts of NULL to another pointer type.

Sepherosa Ziehau [Fri, 25 May 2012 08:28:55 +0000 (16:28 +0800)]
pci: Print PCIe memory mapped accessing information a little bit earlier

Sepherosa Ziehau [Fri, 25 May 2012 07:38:50 +0000 (15:38 +0800)]
tcp: Enable RFC3517bis by default

Sepherosa Ziehau [Fri, 25 May 2012 07:23:59 +0000 (15:23 +0800)]
tcp: Function renaming

tcp_recv_dupack() is probably a better name than tcp_fast_recovery(),
since the function does more than fast recovery.

Sepherosa Ziehau [Fri, 25 May 2012 06:18:18 +0000 (14:18 +0800)]
tcp/sack: Fix off-by-one bug when updating rescue SACK information

Sepherosa Ziehau [Thu, 24 May 2012 08:09:57 +0000 (16:09 +0800)]
tcp/sack: Force out more segments allowed by "pipe" during fast recovery

If some segments are cumulatively ACKed or SACKed, and HighRxt equals
snd_una, one segment (new or retransmit) will be forced out even if cwnd
and pipe don't allow it.  When a large number of segments are lost, i.e.
the computed pipe could be large, this avoids unnecessary retransmit
timeouts and could perform as well as NewReno.

The sysctl node net.inet.tcp.force_sackrxt can be tuned to burst out
several retransmits; the default is 1 (which should be good enough).  If
this sysctl is set to 0, SACK-based fast recovery will obey the computed
pipe.

Graphs of several unnecessary retransmit timeouts as described above:
http://leaf.dragonflybsd.org/~sephe/no_force_sack_rexmt2_15.xpl (starts @15s)
http://leaf.dragonflybsd.org/~sephe/no_force_sack_rexmt_54.xpl (starts @54s)
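A hypothetical sketch of how the force_sackrxt knob could gate sending during SACK-based fast recovery. It assumes forcing only kicks in when cwnd - pipe would allow nothing. This illustrates the rule in the message and is not the tcp_output() logic:

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t tcp_seq;

/*
 * Number of segments SACK-based recovery may send now.  Normally this is
 * limited by cwnd - pipe.  If new data was cumulatively ACKed or SACKed
 * and HighRxt is still at snd_una, up to 'force_sackrxt' segments are
 * allowed even when the pipe computation forbids sending.
 */
static uint32_t
sack_segs_allowed(uint32_t cwnd, uint32_t pipe, uint32_t maxseg,
    bool acked_or_sacked, tcp_seq high_rxt, tcp_seq snd_una,
    uint32_t force_sackrxt)
{
	uint32_t nsegs = 0;

	if (cwnd > pipe)
		nsegs = (cwnd - pipe) / maxseg;   /* normal pipe-limited send */

	if (nsegs == 0 && acked_or_sacked && high_rxt == snd_una)
		nsegs = force_sackrxt;            /* 0: always obey the pipe */

	return (nsegs);
}
```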

Sepherosa Ziehau [Thu, 24 May 2012 05:35:36 +0000 (13:35 +0800)]
tcp/sack: Use RFC3517bis IsLost(snd_una) as fallback of early retransmit

Since we are less certain about whether a segment is lost or not when
using IsLost(snd_una), we do not send out any unSACKed segments other
than the first one under this condition.  Sending out other unSACKed
segments could be too aggressive here; just wait for another ACK to
clock out more unSACKed segments.
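For reference, here is a sketch of the RFC 3517bis (published as RFC 6675) IsLost() test. It assumes a scoreboard of non-overlapping [start, end) SACKed ranges. Counting scoreboard records approximates the RFC's "discontiguous SACKed sequences", and the names are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t tcp_seq;

/* Modular "a > b" on 32-bit sequence numbers, like BSD's SEQ_GT. */
#define SEQ_GT(a, b) ((int32_t)((a) - (b)) > 0)

/* One SACK scoreboard record covering [start, end). */
struct sb_rec {
	tcp_seq start;
	tcp_seq end;
};

/*
 * seq is deemed lost if at least 'dupthresh' SACKed ranges lie above it,
 * or more than (dupthresh - 1) * smss bytes above it have been SACKed.
 */
static bool
sack_is_lost(tcp_seq seq, const struct sb_rec *sb, int nrec,
    uint32_t smss, int dupthresh)
{
	int nranges = 0;
	uint32_t sacked = 0;

	for (int i = 0; i < nrec; i++) {
		tcp_seq start = sb[i].start;

		if (!SEQ_GT(sb[i].end, seq + 1))
			continue;          /* no bytes above seq in this record */
		if (!SEQ_GT(start, seq))
			start = seq + 1;   /* count only bytes above seq */
		sacked += sb[i].end - start;
		nranges++;
	}
	return (nranges >= dupthresh ||
	    sacked > (uint32_t)(dupthresh - 1) * smss);
}
```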

Sascha Wildner [Thu, 24 May 2012 18:16:10 +0000 (20:16 +0200)]
kernel: Remove some bogus casts to the own type (FINAL).

Sascha Wildner [Thu, 24 May 2012 17:26:08 +0000 (19:26 +0200)]
kernel: Remove some bogus casts to the own type.

Sascha Wildner [Thu, 24 May 2012 17:19:30 +0000 (19:19 +0200)]
kernel: Remove some bogus casts to the own type.

Sascha Wildner [Thu, 24 May 2012 08:35:00 +0000 (10:35 +0200)]
kernel: Remove some bogus casts to the own type.

Sepherosa Ziehau [Wed, 23 May 2012 09:38:30 +0000 (17:38 +0800)]
tcp/sack: Fix the condition that SACK rescue retransmit can't be done

If we have nothing left above the HighRxt, the first unSACKed segment
will be used as the SACK rescue retransmit.

Sepherosa Ziehau [Wed, 23 May 2012 09:37:39 +0000 (17:37 +0800)]
tcp: Indentation

Venkatesh Srinivas [Thu, 24 May 2012 02:15:25 +0000 (19:15 -0700)]
kernel -- CLFLUSH support

* Introduce a kernel variable, 'vmm_guest', signifying whether the
  kernel is running in a virtual environment, such as KVM.  On real
  kernels this is set based on the CPUID2.VMM flag; on virtual kernels
  it is set automatically.

* Introduce wrappers for CLFLUSH instructions.

* Provide a tunable, hw.clflush_enable, to enable CLFLUSH automatically
  on real hardware (-1), always disable it (0), or always enable it (1).
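A minimal sketch of the tri-state tunable described above. The function name is hypothetical. The sketch assumes that CPUID must still advertise CLFLUSH even when the tunable is 1:

```c
#include <stdbool.h>

/*
 * clflush_enable mirrors hw.clflush_enable:
 *   -1: use CLFLUSH automatically on real hardware, not in a VM guest
 *    0: never use CLFLUSH
 *    1: always use CLFLUSH
 */
static bool
clflush_usable(int clflush_enable, bool cpu_has_clflush, bool vmm_guest)
{
	if (!cpu_has_clflush)
		return (false);         /* assumed: CPU support still required */
	switch (clflush_enable) {
	case 0:
		return (false);
	case 1:
		return (true);
	default:                        /* -1: automatic */
		return (!vmm_guest);
	}
}
```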

Closes-bug: 2363
Reviewed-by: ftigeot@
From: David Shao, FreeBSD

Sascha Wildner [Wed, 23 May 2012 20:42:46 +0000 (22:42 +0200)]
kernel: Remove some bogus casts to the own type.

Sascha Wildner [Wed, 23 May 2012 19:30:05 +0000 (21:30 +0200)]
kernel: Remove some bogus casts to the own type.

Sascha Wildner [Wed, 23 May 2012 19:28:32 +0000 (21:28 +0200)]
kernel/linux: Fix a wrong cast (introduced in e54488bb).

Sascha Wildner [Wed, 23 May 2012 16:36:44 +0000 (18:36 +0200)]
kernel: Remove some bogus casts to the own type.

Sascha Wildner [Wed, 23 May 2012 16:01:25 +0000 (18:01 +0200)]
kernel: Remove some bogus casts to the own type.

Sepherosa Ziehau [Wed, 23 May 2012 05:44:52 +0000 (13:44 +0800)]
tcp: Simplify "extended limited transmit" logic a little bit

Don't follow the RFC4653 or RFC3517bis description of "extended limited
transmit" verbatim; increase cwnd once and let tcp_output() do the job.

Sepherosa Ziehau [Wed, 23 May 2012 03:14:02 +0000 (11:14 +0800)]
tcp: Optimize SACK scoreboard records consolidation a little bit

If the SACK block and SACK scoreboard record are matched exactly,
SACK scoreboard records consolidation is not needed at all.

Sascha Wildner [Tue, 22 May 2012 13:02:01 +0000 (15:02 +0200)]
Revert "libc -- dmalloc: Call malloc_init as-needed, rather than via cc constructor."

This reverts commit e12d3396c777165504d60d2a1408dcd7cb63660d.

Sepherosa Ziehau [Tue, 22 May 2012 08:10:21 +0000 (16:10 +0800)]
acpica: Unbreak LINT/LINT64 building

Sepherosa Ziehau [Tue, 22 May 2012 07:55:45 +0000 (15:55 +0800)]
acpi/timer: Fix return value

Magliano Andrea [Fri, 11 May 2012 13:59:11 +0000 (15:59 +0200)]
acpidb: regenerate osunixxf.c.patch

Someone please take care of the DragonFly header, if necessary;
I applied the patch by hand and pulled in a git diff.

Magliano Andrea [Fri, 11 May 2012 13:58:52 +0000 (15:58 +0200)]
acpidb: add missing evglock.c to Makefile

Magliano Andrea [Fri, 11 May 2012 08:42:56 +0000 (10:42 +0200)]
Fix iasl compilation

Basically a sync with svn://svn.freebsd.org/base/head@220663.

Magliano Andrea [Fri, 11 May 2012 08:19:52 +0000 (10:19 +0200)]
Some files overlooked on first commit...

Magliano Andrea [Fri, 11 May 2012 07:19:24 +0000 (09:19 +0200)]
Revert previous commit (wrong tentative)

and do as in svn://svn.freebsd.org/base/head@220663; it doesn't seem
possible with the BSD Makefile infrastructure to set flags specific to
a single source target.

Magliano Andrea [Fri, 11 May 2012 06:12:10 +0000 (08:12 +0200)]
First import (compiles, seems to run correctly)

Taken from FreeBSD r222544:218590 (patch applied),
not from the acpica repository.

One problem was observed (no longer reproducible; a skewed build?):
in bootverbose mode, 'domain0 misses processors, should be 2, got 1'.
sysctl shows hw.acpi.cpu0 only; the other CPUs are missing.  This seems
to be an error in evaluating the C009 method in the AML code...

TODO:

* iasl compiler Makefile has to be reworked because of specific
  YASL flags for new files dtparser.[yl]

* 'EVENTHANDLER_INVOKE(power_suspend)' to be integrated in acpi.c

* atomic_load_acq_64 isn't implemented (used in acpi_hpet.c)

* sc->tc.tc_quality isn't available; to be investigated

* acpi_timer_test() improved implementation not integrated

* ACPI_CAP_SMP_C3_NATIVE and ACPI_CAP_PX_HW_COORD in acpivar.h
  left out, as FreeBSD doesn't use them either

Sepherosa Ziehau [Mon, 21 May 2012 08:58:31 +0000 (16:58 +0800)]
igb: Add to x86_64 and i386 GENERIC

Sepherosa Ziehau [Mon, 21 May 2012 08:35:02 +0000 (16:35 +0800)]
LINT: Add igb(4)

Matthew Dillon [Sun, 20 May 2012 18:12:58 +0000 (11:12 -0700)]
hammer2 - Fix lost flush

* hammer2 allows the buffer cache buffers related to MODIFIED but unlocked
  chains to be retired by the OS.  In this situation hammer2 does not want
  to bdwrite() the buffer again unless additional modifications are made,
  even though the MODIFIED bit in the chain remains set throughout the
  entire sequence.

* Fix a case where these additional modifications were not properly
  flagging the buffer cache buffer to be retired with a bdwrite(), causing
  data loss.  This is related to the DIRTYBP chain flag.

* Make further adjustments to the DIRTYBP chain flag.

* Also fix a case where the MOVED bit might not get properly set when a
  block is resized.  The problem was masked by the fact that a resize
  only occurs on data blocks and only during a write(), so the related
  buffer was being marked MODIFIED anyway.  However, the resize code still
  needed to be corrected.

* Add some debugging to 'hammer2 stat' to make it easier to poke around
  related kernel structures.

Matthew Dillon [Sun, 20 May 2012 18:12:44 +0000 (11:12 -0700)]
Merge branches 'hammer2' and 'master' of ssh://crater.dragonflybsd.org/repository/git/dragonfly into hammer2

Venkatesh Srinivas [Sun, 20 May 2012 14:10:56 +0000 (07:10 -0700)]
libc -- dmalloc: Call malloc_init as-needed, rather than via cc constructor.

dmalloc requires its own _nmalloc_thr_init be called before it can service
allocations. Applications with preinit arrays were able to call malloc before
constructors ran, which caused them to crash on uninitialized allocator state.

The change uses a flag to test for allocator init state. It is also careful
to not allow _nmalloc_thr_init to be called recursively from within pthread
initialization (slglobal.masked).
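Below is a single-threaded sketch of the "initialize on first use" pattern with a recursion guard. The names are hypothetical, and plain malloc() stands in for the real allocation path. A real libc would also need to make the flag test safe against concurrent first calls:

```c
#include <stdbool.h>
#include <stdlib.h>

static bool allocator_ready;      /* set once allocator state is initialized */
static bool allocator_in_init;    /* guards against re-entry during init */
static int  allocator_init_calls; /* illustration only: how often init ran */

static void
allocator_init(void)
{
	allocator_in_init = true;
	allocator_init_calls++;
	/* ... set up global and per-thread allocator state here ... */
	allocator_ready = true;
	allocator_in_init = false;
}

static void *
lazy_malloc(size_t size)
{
	/*
	 * Cheap flag test on every call: init runs on first use, even when
	 * a preinit_array function allocates before constructors have run,
	 * and is not re-entered if init itself ends up allocating.
	 */
	if (!allocator_ready && !allocator_in_init)
		allocator_init();
	return (malloc(size));    /* stand-in for the real allocation path */
}
```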

Reported-by: marino@
Closes-bug: 2305

Sepherosa Ziehau [Sun, 20 May 2012 13:52:01 +0000 (21:52 +0800)]
netif: Remove no longer used e1000 layout

Sepherosa Ziehau [Wed, 25 Apr 2012 12:42:40 +0000 (20:42 +0800)]
igb: Import Intel igb-2.2.3

Local changes:
- Clean up the code
- Rewrite busdma related code
- Rewrite RX path
- Enable hardware TX IP checksum

Integration w/ DragonFly's RSS and TX path optimization will be
conducted in the repository.

Tested-with: 82576 82575EB

Sepherosa Ziehau [Thu, 19 Apr 2012 14:11:00 +0000 (22:11 +0800)]
ig_hal: Merge Intel igb-2.2.3 HAL w/ em-7.2.4 HAL

Sepherosa Ziehau [Thu, 19 Apr 2012 13:57:50 +0000 (21:57 +0800)]
e1000: Unhook from building, prepare for the new igb

Sascha Wildner [Sun, 20 May 2012 02:47:11 +0000 (04:47 +0200)]
kernel/devfs: Remove the unused devfs Makefile.

Matthew Dillon [Sat, 19 May 2012 22:21:01 +0000 (15:21 -0700)]
hammer2 - Add 'hammer2 stat'

* Add the 'hammer2 stat' directive to access inode information not
  available from a normal stat.

  Currently reports ncopies, data_count, inode_count, data_quota, and
  inode_quota.

Matthew Dillon [Sat, 19 May 2012 22:17:03 +0000 (15:17 -0700)]
hammer2 - Get data-usage aggregation working, add INODE_GET

* Cleanup aggregation of data_count and inode_count in the inode.
  data_count should now work properly (though it requires a 'sync'
  if you want up-to-date information).

  This allows data and inode usage for an entire sub-tree to be
  retrieved from the parent directory inode.  No need to run 'du'
  over millions of inodes.

  The new 'hammer2 stat' command can be used to access the info.

* Add the HAMMER2IOC_INODE_GET/SET ioctls to access information that
  cannot be obtained from a normal stat().
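The sketch below shows usage aggregation under two assumptions: each inode carries subtree totals, and every change is pushed up through its parents. The commit notes that a 'sync' is needed for up-to-date numbers, so hammer2 itself presumably defers this propagation. The names here are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

struct inode_model {
	struct inode_model *parent;  /* NULL at the root */
	uint64_t data_count;         /* bytes used by this subtree */
	uint64_t inode_count;        /* inodes in this subtree, incl. itself */
};

/*
 * Apply a usage delta to 'ip' and every ancestor, so any directory's
 * totals describe its whole subtree.  Negative deltas wrap correctly
 * because unsigned arithmetic is modular.
 */
static void
inode_adjust_counts(struct inode_model *ip, int64_t data_delta,
    int64_t inode_delta)
{
	for (; ip != NULL; ip = ip->parent) {
		ip->data_count += (uint64_t)data_delta;
		ip->inode_count += (uint64_t)inode_delta;
	}
}
```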

Matthew Dillon [Sat, 19 May 2012 19:07:40 +0000 (12:07 -0700)]
Merge branches 'hammer2' and 'master' of ssh://crater.dragonflybsd.org/repository/git/dragonfly into hammer2

Venkatesh Srinivas [Sat, 19 May 2012 03:33:56 +0000 (20:33 -0700)]
kernel -- tmpfs: Convert tmpfs inode counter to per-mount field

tmpfs used a global counter under a spinlock to set inode numbers. This
should be a per-mount field, protected by the mount lock.
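Here is a minimal sketch of a per-mount inode-number counter guarded by a per-mount lock. A pthread mutex stands in for the kernel's mount lock, and the names are hypothetical:

```c
#include <pthread.h>
#include <stdint.h>

struct tmpfs_mount_model {
	pthread_mutex_t lock;   /* stands in for the per-mount lock */
	uint64_t next_ino;      /* next inode number for this mount only */
};

/* Hand out the next inode number for one mount. */
static uint64_t
tmpfs_alloc_ino(struct tmpfs_mount_model *tm)
{
	uint64_t ino;

	pthread_mutex_lock(&tm->lock);
	ino = tm->next_ino++;
	pthread_mutex_unlock(&tm->lock);
	return (ino);
}
```

Each mount keeps its own sequence, and allocations on one mount never contend with another mount's lock.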

Matthew Dillon [Sat, 19 May 2012 02:18:14 +0000 (19:18 -0700)]
hammer2 - Flush ordering fixes

* The flush code is required to write out modified chains, not just bdwrite()
  them.  Otherwise the disk synchronization and volume header write will be
  mis-ordered.

* Don't re-write indirect blocks that the OS had already written out.  This
  check is already being made for data blocks, and inode modifications are
  embedded and thus must always be written out.

* This fixes issues where 'hammer2 show <device>' would find corrupt
  topology during concurrent filesystem write activity.  The disk media
  is always supposed to be consistent.

  We don't care about block-reuse cases for this debug command but we do
  care that, sans block-reuse, a dump will produce a consistent topology.

Matthew Dillon [Sat, 19 May 2012 00:19:17 +0000 (17:19 -0700)]
hammer2 - general stabilization, flusher, mmap, etc

* Revamp the flush logic.  Flushes now stage the blockref related to the
  data written out to the media.  Higher level chains save the staged
  blockref instead of the current blockref.

* This allows flushes to occur concurrent with active modification of the
  topology without having to restart the flush.  Modifications made after
  the flush has started running will remain intact and not be committed
  to media until the next flush (see note).

  NOTE: Currently chain deletions break this, but this is the only issue
  currently.

* Fix lost chains during unmount.  Deleted chains can still have the MOVED
  and/or MODIFIED bits set, which add additional refs and prevent them
  from being freed.

  Detect when a chain is being deleted permanently (versus temporarily due
  to a rename) and clean out the bits in question.

  NOTE: Currently deletions are removed from the in-memory topology, which
  is why the previous NOTE above is still a problem, so we will need
  to fix this and to retain at least the MOVED bit for flushes in
  progress.

* Fix data corruption related to unflagged chains which wind up not getting
  flushed and also due to a bug in the indirect block management code.

* Fix a mmap() access failure for cached direct-data (less than 512 bytes).
  nvextendbuf() was not being called for the direct-data case during the
  write().

* Buildworld with a HAMMER2 /usr/obj now succeeds.

* 'hammer2 pfs-create <label>' now defaults to a pfstype of MASTER,
  instead of requiring that the pfstype always be specified.

Matthew Dillon [Sat, 19 May 2012 00:01:15 +0000 (17:01 -0700)]
Merge branches 'hammer2' and 'master' of ssh://crater.dragonflybsd.org/repository/git/dragonfly into hammer2

Sascha Wildner [Fri, 18 May 2012 23:57:25 +0000 (01:57 +0200)]
amr(4): Some fixes.

* Bring in some small updates from FreeBSD.

* Add MODULE_VERSION.

* Make the interrupt handler MPSAFE. This was a porting oversight by me.

Sascha Wildner [Fri, 18 May 2012 18:10:50 +0000 (20:10 +0200)]
Fix some typos in manual pages.

Sascha Wildner [Fri, 18 May 2012 16:53:28 +0000 (18:53 +0200)]
bsd-family-tree: Sync with FreeBSD.

Sascha Wildner [Fri, 18 May 2012 11:16:32 +0000 (13:16 +0200)]
builtin.1: Bring in some enhancements from FreeBSD.

It is modeled after what they did but based on what we actually have in
our shells' source.

* Use "No**" to mark commands which exist externally but are implemented
  as a script executing the builtin.

* Some further explanations and mdoc fixes.

Sascha Wildner [Fri, 18 May 2012 11:05:09 +0000 (13:05 +0200)]
builtin.1: Add two more built-in commands.

Sepherosa Ziehau [Fri, 18 May 2012 07:29:39 +0000 (15:29 +0800)]
tcp: Implement RFC4653 Non-Congestion Robustness (NCR)

It is enabled by default and can be disabled using sysctl node:
net.inet.tcp.ncr

As far as I have tested on a heavily reordered network path, this
algorithm does avoid most of the spurious fast retransmits, while on
normal network paths fast retransmits are still triggered properly.
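As we read RFC 4653, NCR raises the duplicate-ACK threshold in proportion to the amount of data in flight. The sketch below shows that calculation with hypothetical names. The commit does not say whether the Careful or the Aggressive variant is used:

```c
#include <stdint.h>

/*
 * RFC 4653 duplicate-ACK threshold, as we read the RFC:
 *   DupThresh = max(LT_F * (FlightSize / SMSS), 3)
 * with LT_F = 2/3 (Careful Limited Transmit) or 1/2 (Aggressive).
 */
static uint32_t
ncr_dupthresh(uint32_t flightsize, uint32_t smss, int careful)
{
	uint32_t segs = flightsize / smss;   /* full segments in flight */
	uint32_t thresh = careful ? (segs * 2) / 3 : segs / 2;

	return (thresh > 3 ? thresh : 3);    /* never below the classic 3 */
}
```

With 30 segments in flight, the Careful variant waits for 20 duplicate ACKs instead of 3. This is what lets reordering settle before a fast retransmit is triggered.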

Sepherosa Ziehau [Fri, 18 May 2012 02:33:21 +0000 (10:33 +0800)]
tcp: Improve RFC3517bis support

- Factor out tcp_fast_recovery()
- Delay fast retransmit or fast recovery for duplicate ACKs which
  carry data or update the receive window, so that
  o  The segments sent by fast retransmit/recovery could carry the
     proper ACK sequence and SACK information.
  o  The receive window could get updated, so more new data could be
     injected into the network by the fast recovery.

Matthew Dillon [Fri, 18 May 2012 03:00:27 +0000 (20:00 -0700)]
Merge branches 'hammer2' and 'master' of ssh://crater.dragonflybsd.org/repository/git/dragonfly into hammer2

Matthew Dillon [Fri, 18 May 2012 01:41:51 +0000 (18:41 -0700)]
hammer2 - hardlink stabilization (3), data and inode count propagation.

* Files with cached chains have to be flushed before they can be copied
  to the hardlink target, because the original inode will become an
  OBJTYPE_HARDLINK pointer which isn't allowed to have any sub-chains
  under the inode.

* We also need to flush for the upcoming snapshot function to work properly
  or dirty in-memory data will not show up in the snapshot.

* Propagate the inode and byte use count up the chain.  Tie the inode count
  into df's inode count (per-PFS).  The byte count and quota fields are not
  yet tied in.

* Adjust stat[v]fs() to return filesystem space usage using the allocation
  iterator for now, to aid debugging.

* Adjust the allocation iterator to skip reserved areas at the beginning of
  each 2GB storage zone.

Sascha Wildner [Thu, 17 May 2012 23:52:22 +0000 (01:52 +0200)]
kernel: Remove some bogus casts to the own type.

Sascha Wildner [Thu, 17 May 2012 23:03:10 +0000 (01:03 +0200)]
builtin.1: Sync with what we have.

Sascha Wildner [Thu, 17 May 2012 21:48:20 +0000 (23:48 +0200)]
share/man/man1/Makefile: One MLINK per line.

Sascha Wildner [Thu, 17 May 2012 20:17:53 +0000 (22:17 +0200)]
examples/rconfig: Some fixes to our installation scripts.

* Allow the script to be run in a netbooted scenario, too.

* Raise the default size of the root partition to 768M (like the
  installer's default).

* While here, add some comments and whitespace.

Submitted-by: Joachim de Groot <jdegroot@web.de>

Matthew Dillon [Thu, 17 May 2012 18:34:18 +0000 (11:34 -0700)]
hammer2 - hardlink stabilization pass

* Fix another edge case where nkeybits could exceed 64, resulting in
  an assertion.

Matthew Dillon [Thu, 17 May 2012 18:01:51 +0000 (11:01 -0700)]
hammer2 - hardlink stabilization pass

* Fix infinite loop in hammer2_chain_create_indirect() related to the
  case where the key range is the full 64 bits, which can occur when
  invisible hardlink entries are mixed in with normal entries.

* Fix the nlinks count in a couple of places.

* Don't iterate invisible directory entries.  Lookups of hardlink targets
  by inode number are absolute.  Normal directory entries have a collision
  counter, hardlink targets do not.

Matthew Dillon [Thu, 17 May 2012 18:01:30 +0000 (11:01 -0700)]
Merge branches 'hammer2' and 'master' of ssh://crater.dragonflybsd.org/repository/git/dragonfly into hammer2

Sascha Wildner [Thu, 17 May 2012 14:17:22 +0000 (16:17 +0200)]
vkernel: Fix compilation with profiling support.

The vkernel is a special userland program in that its Makefile is
generated by config(8), which is somewhat tailored to the real kernel.

So first of all, we have to modify config(8) to detect it's a vkernel we
want to build and in this case it should not define GPROF which otherwise
activates the real kernel's profiling bits.

Then, modify libkern's mcount.c to skip kernel specific parts too.

Then, modify the vkernels' Makefiles to take into account ${PROF} (and
while we're here, ${DEBUG} too) which are set by the surrounding Makefile
which is generated by config(8).

The vkernel is now (from profiling point of view) treated like any other
userland program.

Last but not least, add some documentation about building a vkernel with
profiling support to vkernel's manpage.

To build with profiling, simply add CONFIGARGS=-p to the buildkernel
command line. It will need the config(8) program to be in /usr/obj's
btools dir, so either a buildworld with this commit needs to be done,
or config can be installed manually to /usr/sbin and nativekernel can
be used.

Tested-by: tuxillo

Sepherosa Ziehau [Thu, 17 May 2012 09:58:41 +0000 (17:58 +0800)]
tcp: Ignore TCP_NOPUSH socketopt by default

Poorly optimized programs which misuse this sockopt can cause network
stalls of unpredictable length if the total send size is not aligned
to the TCP send segment size.

The sysctl node net.inet.tcp.disable_nopush controls whether TCP_NOPUSH
takes effect or not.

I am not going to fight against the stupid programs in the wild.

DragonFly-bug: http://bugs.dragonflybsd.org/issues/2368

This is actually _not_ a bug on our side.

Matthew Dillon [Thu, 17 May 2012 08:53:11 +0000 (01:53 -0700)]
Merge branches 'hammer2' and 'master' of ssh://crater.dragonflybsd.org/repository/git/dragonfly into hammer2

Matthew Dillon [Thu, 17 May 2012 08:36:51 +0000 (01:36 -0700)]
hammer2 - Complete core hardlink support work

This implements core hardlink support for hammer2.  In order to maintain the
strict bottom-up block modification hierarchy for the chains, hardlinks must
be implemented with special forwarding inodes.

When a hardlink is created (nlinks 1->2) the file is replaced with a
forwarding entry and then recreated as a special hidden directory entry
indexed by its inode number at a higher directory level which is common
to all hardlinks to that file.

The forwarding entry simply specifies the inode number, so our ability to
trivially snapshot a PFS is retained.

Since the real inode is indexed at a higher common directory, locating the
real inode simply requires iterating parent directories until we find a
match.
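Here is a minimal sketch of the parent-walk lookup described above. It assumes each directory exposes the inode numbers of its hidden hardlink targets. The structure and function names are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

struct dir_model {
	struct dir_model *parent;       /* NULL at the PFS root */
	const uint64_t *hidden_inums;   /* hidden hardlink targets stored here */
	size_t nhidden;
};

/*
 * Starting at the directory holding the forwarding entry, walk up the
 * parent chain until a directory indexes the target inode number.
 * Returns the directory holding the real inode, or NULL if none does.
 */
static struct dir_model *
hardlink_find_target(struct dir_model *dp, uint64_t inum)
{
	for (; dp != NULL; dp = dp->parent) {
		for (size_t i = 0; i < dp->nhidden; i++) {
			if (dp->hidden_inums[i] == inum)
				return (dp);
		}
	}
	return (NULL);
}
```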

* Default vfs.hammer2.hardlink_enable to 1 (enabled).

* Track and adjust nlinks.

* Implement OBJTYPE_HARDLINK forwarding directory entry, hidden inode,
  vnode->v_data inode replacement for the nlinks 1->2 case, and hidden
  inode deletion for the nlinks 1->0 case.

* The deconsolidation for the nlinks 2->1 case is not yet implemented.