Matthew Dillon [Sun, 8 Jun 2008 18:16:26 +0000 (18:16 +0000)]
HAMMER 53B/Many: Complete overhaul of strategy code, reservations, etc
* Completely overhaul the strategy code. Implement direct reads and writes
for all cases. REMOVE THE BACKEND BIO QUEUE. BIOs are no longer queued
to the flusher under any circumstances.
Remove numerous hacks that were previously put in place to deal with BIOs
being queued to the flusher.
* Add a mechanism to invalidate buffer cache buffers that might be shadowed
by direct I/O. e.g. if a strategy write uses the vnode's bio directly
there may be a shadow hammer_buffer that will then become stale and must
be invalidated.
* Implement a reservation tracking structure (hammer_reserve) to track
storage reservations made by the frontend. The backend will not attempt
to free or reuse reserved space if it encounters it.
Use reservations to back cached holes (struct hammer_hole) for the
same reason.
* Index hammer_buffer on the zone-X offset instead of the zone-2 offset.
Base the RB tree in the hammer_mount instead of (zone-2) hammer_volume.
This removes nearly all blockmap lookup operations from the critical path.
* Do a much better job tracking cached dirty data for the purposes of
calculating whether the filesystem will become full or not.
* Fix a critical bug in the CRC generation of short data buffers.
* Fix a VM deadlock.
* Use 16-byte alignment for all on-disk data instead of 8-byte alignment.
* Major code cleanup.
As of this commit, write performance is extremely good.
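A minimal sketch of the rounding implied by the switch to 16-byte alignment for on-disk data. The helper name align16 is illustrative and is not taken from the HAMMER sources.

```c
#include <stdint.h>

/* Hypothetical helper: round a byte count up to the next multiple of 16. */
uint64_t
align16(uint64_t bytes)
{
	return (bytes + 15) & ~(uint64_t)15;
}
```

For example, a 17-byte record would occupy 32 bytes under this rule, while a 16-byte record stays at 16.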
Matthew Dillon [Sun, 8 Jun 2008 17:19:09 +0000 (17:19 +0000)]
HAMMER Utilities: Critical bug in newfs_hammer
* newfs_hammer was not properly setting up the small-data zone.
Sepherosa Ziehau [Sun, 8 Jun 2008 10:06:05 +0000 (10:06 +0000)]
Add a tunable to enable/disable PBCC support in acx(4); it is enabled
by default.
Sepherosa Ziehau [Sun, 8 Jun 2008 08:38:06 +0000 (08:38 +0000)]
Parallelize in_ifaddrhead operation
Nicolas Thery [Sun, 8 Jun 2008 07:56:06 +0000 (07:56 +0000)]
Assert that move in directory entry hash table can't fail.
Sepherosa Ziehau [Sun, 8 Jun 2008 03:58:03 +0000 (03:58 +0000)]
- oia is no longer used
- Transform for(;;) loop into while() loop
Michael Neumann [Sat, 7 Jun 2008 12:30:26 +0000 (12:30 +0000)]
Move fetching of "hw.hasbrokenint12" tunable closer to its usage.
Michael Neumann [Sat, 7 Jun 2008 12:15:33 +0000 (12:15 +0000)]
Cosmetic changes (remove whitespace).
Michael Neumann [Sat, 7 Jun 2008 12:03:52 +0000 (12:03 +0000)]
Remove an unnecessary conversion to kilobytes (divide by 1024) whose result
was later multiplied by 1024 again to get back to bytes.
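A small sketch of why a divide-then-multiply round trip is worth removing: with integer arithmetic it is not only redundant, it also truncates any value that is not a multiple of 1024. The function name is hypothetical and the code is not taken from the changed file.

```c
#include <stdint.h>

/* Hypothetical: bytes -> kilobytes -> bytes, as integer arithmetic. */
uint64_t
round_trip_kb(uint64_t bytes)
{
	uint64_t kb = bytes / 1024;	/* fractional kilobyte is discarded here */

	return kb * 1024;
}
```

With this sketch, 4096 survives the round trip unchanged, but 1500 comes back as 1024.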
Michael Neumann [Sat, 7 Jun 2008 11:44:04 +0000 (11:44 +0000)]
Use NULL instead of 0.
Michael Neumann [Sat, 7 Jun 2008 11:37:23 +0000 (11:37 +0000)]
Correct typos.
Matthew Dillon [Sat, 7 Jun 2008 07:41:51 +0000 (07:41 +0000)]
HAMMER 53A/Many: Read and write performance enhancements, etc.
* Add hammer_io_direct_read(). For full-block reads this code allows
a high-level frontend buffer cache buffer associated with the
regular file vnode to directly access the underlying storage,
instead of loading that storage via a hammer_buffer and bcopy()ing it.
* Add a write bypass, allowing the frontend to bypass the flusher and
write full-blocks directly to the underlying storage, greatly improving
frontend write performance. Caveat: See note at bottom.
The write bypass is implemented by adding a feature whereby the frontend
can soft-reserve unused disk space on the physical media without having
to interact (much) with on-disk meta-data structures. This allows the
frontend to flush high-level buffer cache buffers directly to disk
and release the buffer for reuse by the system, resulting in very high
write performance.
To properly associate the reserved space with the filesystem so it can be
accessed in later reads, an in-memory hammer_record is created referencing
it. This record is queued to the backend flusher for final disposition.
The backend disposes of the record by inserting the appropriate B-Tree
element and marking the storage as allocated. At that point the storage
becomes official.
* Clean up numerous procedures to support the above new features. In
particular, do a major cleanup of the cached truncation offset code
(this is the code which allows HAMMER to implement wholly asynchronous
truncate()/ftruncate() support).
Also clean up the flusher triggering code, removing numerous hacks that
had been in place to deal with the lack of a direct-write mechanism.
* Start working on statistics gathering to track record and B-Tree
operations.
* CAVEAT: The backend flusher creates a significant cpu burden when flushing
a large number of in-memory data records. Even though the data itself
has already been written to disk, there is currently a great deal of
overhead involved in manipulating the B-Tree to insert the new records.
Overall write performance will only be modestly improved until these
code paths are optimized.
Sepherosa Ziehau [Sat, 7 Jun 2008 07:22:22 +0000 (07:22 +0000)]
Use ASSERT_IFAC_VALID whenever possible
Sepherosa Ziehau [Sat, 7 Jun 2008 06:34:57 +0000 (06:34 +0000)]
Add ASSERT_IFAC_VALID
Sepherosa Ziehau [Sat, 7 Jun 2008 04:59:01 +0000 (04:59 +0000)]
- Expose ifa_forwardmsg()
- Add ifa_domsg()
# They will be needed soon
Sascha Wildner [Fri, 6 Jun 2008 13:19:25 +0000 (13:19 +0000)]
Don't use NULL where 0 is meant.
Sepherosa Ziehau [Fri, 6 Jun 2008 12:35:27 +0000 (12:35 +0000)]
Make sure that ifac is still valid before unlinking it from or linking it to ifnet
Sepherosa Ziehau [Fri, 6 Jun 2008 10:47:14 +0000 (10:47 +0000)]
Add periodic rf calibration support for the acx111 part. This seems to stabilize
performance during prolonged TX stress.
Sascha Wildner [Thu, 5 Jun 2008 18:06:33 +0000 (18:06 +0000)]
* Fix some cases where NULL was used but 0 was meant (and vice versa).
* Remove some bogus casts of NULL to (void *).
Sascha Wildner [Thu, 5 Jun 2008 18:01:49 +0000 (18:01 +0000)]
Remove some unneeded definitions of NULL.
Sascha Wildner [Thu, 5 Jun 2008 17:53:10 +0000 (17:53 +0000)]
Include <sys/_null.h> for the definition of NULL.
Sascha Wildner [Thu, 5 Jun 2008 17:49:53 +0000 (17:49 +0000)]
Add <sys/_null.h> for the definition of NULL:
[...]
#ifndef __cplusplus
#define NULL ((void *)0)
#else
#define NULL 0
#endif
[...]
Sepherosa Ziehau [Thu, 5 Jun 2008 15:29:47 +0000 (15:29 +0000)]
Add rt_cpuid, which records rtentry's owning CPU id. It could ease route
entry related debugging and sanity checks.
Nicolas Thery [Wed, 4 Jun 2008 04:34:54 +0000 (04:34 +0000)]
Fix bugs in spin_trylock_wr():
- globaldata.gd_spinlock_wr was not decremented back on failure;
- incorrect comparison in loop trying to clear cached shared bits (loop
must fail if spinlock is still held for read by another cpu).
Reviewed-by: dillon@
Matthew Dillon [Tue, 3 Jun 2008 18:47:25 +0000 (18:47 +0000)]
HAMMER 52/Many: Read-only mounts and mount upgrades/downgrades.
* Finish implementing MNT_UPDATE, allowing a HAMMER mount to be upgraded
or downgraded.
* Adjust the recovery code to not flush buffers dirtied by recovery
operations (running the UNDOs) when the mount is read-only. The
buffers will be flushed when the mount is updated to read-write.
* Improve recovery performance by not flushing dirty buffers until the
end (if a read-write mount).
* A crash which occurs during recovery might cause the next recovery to
fail. Delay writing out the recovered volume header until all the other
buffers have been written out to fix the problem.
Matthew Dillon [Tue, 3 Jun 2008 18:43:34 +0000 (18:43 +0000)]
HAMMER Utilities: Enhance mount_hammer
* Allow devices to be specified as dev:dev:dev, so a multi-volume hammer
mount can be specified in /etc/fstab.
* Implement -u (mount update)
Matthew Dillon [Tue, 3 Jun 2008 16:16:40 +0000 (16:16 +0000)]
Do not update f_offset on EINVAL.
Reported-by: VOROSKOI Andras <voroskoi@gmail.com>
Sascha Wildner [Tue, 3 Jun 2008 12:40:09 +0000 (12:40 +0000)]
mdoc cleanup
Sascha Wildner [Tue, 3 Jun 2008 09:33:27 +0000 (09:33 +0000)]
Fix a crash when "Arctic Ocean" was selected.
Taken-from: FreeBSD
Matthew Dillon [Tue, 3 Jun 2008 06:20:30 +0000 (06:20 +0000)]
HAMMER Utilities: More pre-formatting, cleanup
* Fully initialize the large-data and small-data blockmaps in addition
to the B-Tree blockmap.
* Set vol0_stat_bigblocks properly so used space shows as 0, or otherwise
a fairly small number, when the volume is empty.
* Display the total amount of space pre-allocated by newfs_hammer for
the boot-area, memory-log, undo-buffer, and blockmap infrastructure.
Sascha Wildner [Mon, 2 Jun 2008 20:40:07 +0000 (20:40 +0000)]
Add 'options HAMMER' to LINT.
Noticed-by: Dionysus Blazakis <dion.blazakis@gmail.com>
Matthew Dillon [Mon, 2 Jun 2008 20:19:03 +0000 (20:19 +0000)]
HAMMER 51/Many: Filesystem full casework, nohistory flag.
* Track the amount of unsynced information and return ENOSPC if the
filesystem would become full. The idea here is to detect that the
filesystem is full and yet still give the flusher enough runway to
flush cached dirty data and inodes.
* Implement the NOHISTORY flag. Implement inheritance of NOHISTORY and
NODUMP.
The NOHISTORY flag tells HAMMER not to retain historical information on
a filesystem object. If set on a directory any objects created in that
directory will also inherit the flag. For example, it could be set
on /usr/obj.
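The ENOSPC check described above can be sketched roughly as follows. This is an assumption-laden illustration, not the HAMMER code: the structure, its field names, and the fixed runway value are all hypothetical.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical in-memory accounting; field names are illustrative. */
struct space_acct {
	uint64_t free_bytes;	 /* free space recorded on the media */
	uint64_t unsynced_bytes; /* dirty data cached but not yet flushed */
	uint64_t runway_bytes;	 /* slack held back so the flusher can finish */
};

/*
 * Account for a new write of 'len' bytes.  Returns 0 on success, or
 * ENOSPC if cached dirty data plus this write would eat into the runway.
 */
int
space_reserve(struct space_acct *sa, uint64_t len)
{
	uint64_t committed = sa->unsynced_bytes + sa->runway_bytes;

	if (committed > sa->free_bytes || len > sa->free_bytes - committed)
		return ENOSPC;
	sa->unsynced_bytes += len;
	return 0;
}
```

The point of the runway term is that the frontend starts refusing writes before the media is literally full, so the flusher still has room to write out what is already cached.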
Matthew Dillon [Mon, 2 Jun 2008 20:17:07 +0000 (20:17 +0000)]
Report the nohistory, noshistory, and nouhistory flags, and allow them
to be specified with chflags.
Set the inverted field for nosunlink, nosunlnk, nouunlink, nouulnk. This
is an inverted flag.
Matthew Dillon [Mon, 2 Jun 2008 20:13:38 +0000 (20:13 +0000)]
Add the UF_NOHISTORY and SF_NOHISTORY chflags flags. The nohistory flag
allows you to mount a HAMMER filesystem normally and still specify that
certain files and directories not retain historical information.
If set on a directory the flag will be inherited by any objects
created within that directory.
Adjust UFS to inherit the NOHISTORY and NODUMP flags from the parent
directory when creating new objects. NODUMP used to be ufs-specific
and UFS's backup/restore remembered the inheritance. As a more generic
flag it needs to be inherited within the filesystem. Note that UFS
has no historical retention capability and ignores the NOHISTORY flag,
but we might use it with the journaling audit trail later.
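A minimal sketch of the inheritance rule described above: a newly created object picks up NOHISTORY and NODUMP from its parent directory. The EX_ constants use made-up bit values for illustration; the real UF_ and SF_ flag definitions live in <sys/stat.h> and are not reproduced here.

```c
#include <stdint.h>

/* Illustrative bit values only, not the real chflags constants. */
#define EX_NODUMP	0x01u
#define EX_NOHISTORY	0x02u
#define EX_INHERITED	(EX_NODUMP | EX_NOHISTORY)

/* Hypothetical helper: compute the flags for a newly created object. */
uint32_t
inherit_flags(uint32_t parent_flags, uint32_t requested_flags)
{
	/* only the inheritable bits are copied down from the parent */
	return requested_flags | (parent_flags & EX_INHERITED);
}
```

Other flags set on the parent directory are masked out, so only the two inheritable bits propagate.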
Matthew Dillon [Mon, 2 Jun 2008 20:06:36 +0000 (20:06 +0000)]
Disallow negative seek positions for regular files, directories, and
character-special devices to conform to OpenGroup specifications.
Reported-by: VOROSKOI Andras <voroskoi@gmail.com>
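A rough sketch of the seek rule, assuming a relative seek applied to a stored 64-bit offset. The helper is hypothetical and not the kernel code; it also leaves the stored offset untouched on EINVAL, in line with the later "Do not update f_offset on EINVAL" fix.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical helper: apply a relative seek to *offp.
 * Returns 0 on success.  On any error the stored offset is unchanged.
 */
int
seek_relative(int64_t *offp, int64_t delta)
{
	int64_t cur = *offp;

	if (cur < 0)
		return EINVAL;
	/* guard the addition below against signed overflow */
	if (delta > 0 && cur > INT64_MAX - delta)
		return EOVERFLOW;
	/* a negative resulting position is rejected */
	if (cur + delta < 0)
		return EINVAL;
	*offp = cur + delta;
	return 0;
}
```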
Matthew Dillon [Mon, 2 Jun 2008 20:03:22 +0000 (20:03 +0000)]
Add missing exit(1).
Reported-by: Johannes Hofmann <hofmann@blob.baaderstrasse.com>
Matthew Dillon [Mon, 2 Jun 2008 16:57:53 +0000 (16:57 +0000)]
HAMMER Utilities: Correct vol0_stat_freebigblocks.
* The root_vol->ondisk->vol0_stat_freebigblocks was not being properly
decremented when newfs_hammer allocated big blocks, causing it to report
a value that is too large.
Matthew Dillon [Mon, 2 Jun 2008 16:55:08 +0000 (16:55 +0000)]
Fix kernel compile warnings.
Matthew Dillon [Mon, 2 Jun 2008 16:54:21 +0000 (16:54 +0000)]
Even using the objcache we need a one-per-cpu free-thread cache in order
to keep an exiting thread intact throughout its exit sequence.
This fixes a double-fault which can occur on shutdown. The bug was mainly
tickled by exiting kernel threads.
Hasso Tepper [Mon, 2 Jun 2008 06:50:08 +0000 (06:50 +0000)]
According to SUSv3 including just regex.h must be enough. Fixes build of
several pkgsrc packages.
Hasso Tepper [Mon, 2 Jun 2008 06:42:45 +0000 (06:42 +0000)]
Unbreak buildworld.
Matthew Dillon [Sun, 1 Jun 2008 21:05:39 +0000 (21:05 +0000)]
HAMMER 50/Many: VFS_STATVFS() support, stabilization.
* Add support for VFS_STATVFS(), returning 64 bit quantities for available
space, etc.
* When freeing a big-block any holes cached for that block must be
cleaned out.
* Fix a conditional testing whether a layer2 big-block must be allocated in
layer1. The bug could only occur if a layer2 big-block gets freed in
layer1, and we currently never do this.
* Clean-up comments related to freeing blocks.
Matthew Dillon [Sun, 1 Jun 2008 20:59:29 +0000 (20:59 +0000)]
HAMMER Utilities: Performance adjustments, bug fixes.
* Newfs_hammer now pre-allocates the layer1 and layer2 blockmap blocks,
and pre-sizes each blockmap to 4x the initial filesystem size instead
of 100x the initial filesystem size.
The blockmap can be dynamically resized at any time, given a little code.
In addition, there is simply no need to give it a 100x initial dynamic
range. This only bloats the size of the layer-2 map unnecessarily.
* Change alloc_blockmap() to use rootmap->next_offset for allocations
instead of rootmap->alloc_offset and fix a bug where rootmap->phys_offset
was improperly being incremented (it is a fixed field once set).
The bug was in a code-path that could not be executed by current
incarnations of newfs_hammer.
Matthew Dillon [Sun, 1 Jun 2008 20:52:21 +0000 (20:52 +0000)]
Use newly available libc and system calls related to statvfs to make df
work with 64 bit statvfs fields. Tested with a 4TB VN-backed HAMMER
filesystem.
Matthew Dillon [Sun, 1 Jun 2008 20:46:45 +0000 (20:46 +0000)]
Add getmntvinfo() which uses the new getvfsstat() system call.
Matthew Dillon [Sun, 1 Jun 2008 20:44:45 +0000 (20:44 +0000)]
More header file cleanups related to statvfs.
Matthew Dillon [Sun, 1 Jun 2008 20:18:03 +0000 (20:18 +0000)]
Clean up statvfs() and related prototypes. Place the prototypes in the
correct file.
Matthew Dillon [Sun, 1 Jun 2008 19:55:32 +0000 (19:55 +0000)]
Implement a new system call: getvfsstat(). This system call returns
an array of statfs and statvfs structures. Unfortunately there is no way
to just return an array of statvfs structures because the statvfs structure
does not have sufficient information in it to identify the mount point.
getvfsstat(struct statfs *buf, struct statvfs *vbuf,
long vbufsize, int flags);
Matthew Dillon [Sun, 1 Jun 2008 19:27:37 +0000 (19:27 +0000)]
* Implement new system calls in the kernel: statvfs(), fstatvfs(),
fhstatvfs().
* Implement a new VFS op, VFS_STATVFS(). Implement a default for this new
op for VFSs which do not implement VFS_STATVFS(), which calls VFS_STATFS()
and converts the structure (using Joerg's conversion procedure from libc).
* Remove statvfs(), fstatvfs(), and fhstatvfs() from libc. These functions
are now system calls.
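From userland, the statvfs() interface is used the same way whether it is a libc function or a system call. A minimal sketch using only the POSIX statvfs() call; the helper name avail_bytes is an assumption for illustration.

```c
#include <stdint.h>
#include <sys/statvfs.h>

/*
 * Store the space available to unprivileged users, in bytes, in *out.
 * Returns 0 on success or -1 with errno set by statvfs().
 */
int
avail_bytes(const char *path, uint64_t *out)
{
	struct statvfs sv;

	if (statvfs(path, &sv) != 0)
		return -1;
	/* widen before multiplying so large filesystems do not overflow */
	*out = (uint64_t)sv.f_bavail * (uint64_t)sv.f_frsize;
	return 0;
}
```

The widening cast matters for multi-terabyte filesystems, where the block count times the fragment size no longer fits in 32 bits.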
Sascha Wildner [Sun, 1 Jun 2008 08:54:18 +0000 (08:54 +0000)]
Raise the size of the /etc MFS to 12MB (for ssh blacklists).
Sepherosa Ziehau [Sun, 1 Jun 2008 08:09:14 +0000 (08:09 +0000)]
- Rename ifa_portfn() to ifnet_portfn()
- Create inline function ifa_portfn() which simply calls ifnet_portfn()
Sepherosa Ziehau [Sun, 1 Jun 2008 07:44:37 +0000 (07:44 +0000)]
Rename:
ifaddrinit -> ifnetinit
ifaddr_threads -> ifnet_threads
Sepherosa Ziehau [Sun, 1 Jun 2008 07:43:29 +0000 (07:43 +0000)]
Avoid code duplication
Sepherosa Ziehau [Sun, 1 Jun 2008 04:01:24 +0000 (04:01 +0000)]
acx111 parts can't send using short slot time, but they seem to have no problem
receiving packets sent using short slot time. Turn on short slot time
support, so that we don't prevent other STAs from using short slot time.
Sepherosa Ziehau [Sun, 1 Jun 2008 03:58:38 +0000 (03:58 +0000)]
Use 1Mbits/s as the beacon sending rate; it seems to fix a TX performance issue
when using acx(4) as hostap.
Matthew Dillon [Sun, 1 Jun 2008 02:03:10 +0000 (02:03 +0000)]
HAMMER Utilities: New utility 'undo'.
* Add a new utility called 'undo' which makes use of HAMMER capabilities
to retrieve prior versions of a file, even if that file has been deleted.
This utility can dump all prior versions of the file, generate a history
of transaction ids associated with the file, has a 'quick diff' relative
to the most recent change, and can also generate diffs for all versions
of the file.
This utility only works with HAMMER filesystems.
Matthew Dillon [Sun, 1 Jun 2008 01:33:58 +0000 (01:33 +0000)]
HAMMER Utilities: Cleanup
* Cleanup the softprune code a bit.
Matthew Dillon [Sun, 1 Jun 2008 01:33:25 +0000 (01:33 +0000)]
HAMMER 49B/Many: Stabilization pass
* Fix range checks in the pruning ioctl.
* Fix an incorrect assertion in hammer_vop_strategy_read().
Matthew Dillon [Sat, 31 May 2008 18:45:04 +0000 (18:45 +0000)]
HAMMER Utilities: Add the 'hammer softprune' command.
Add a new hammer pruning command called 'hammer softprune'. This command
is much simpler to use than the 'hammer prune' command. You simply specify
a directory containing softlinks to HAMMER snapshots, typically in the form:
"<path_to_hammer_filesystem>/@@0x<16-char-transaction_id>". The command
will scan the directory non-recursively, collect all the softlinks, extract
the transaction ids, and prune the HAMMER filesystem to contain only those
snapshots.
In addition, information created before the snapshot softlink with the
lowest transaction id is destroyed and information created after the
softlink with the highest transaction id is retained (remains fine-grained).
This gives the administrator an easy way to maintain official snapshots
while at the same time retaining our 'undo' capability by leaving recent
modifications intact.
A simple cron job or script coupled with the 'hammer synctid' command
can be used to create a snapshot softlink every so often, and older
snapshots can be cleaned out or thinned simply by removing the associated
softlinks, and then re-running 'hammer softprune' on the directory
containing the softlinks.
Unlike the 'hammer prune' command, the softprune command does not require
the time ranges for snapshots to be well-ordered.
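A sketch of how a transaction id could be extracted from a snapshot softlink target of the form given above ("<path>/@@0x<16-char-transaction_id>"). This is not the softprune implementation; the function name and its strictness (exactly 16 hex digits, nothing after them) are assumptions.

```c
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical parser: extract the transaction id from a softlink target
 * of the form "<path>/@@0x<16 hex digits>".  Returns 0 on success, -1 if
 * the target does not match that form.
 */
int
parse_snapshot_tid(const char *target, uint64_t *tidp)
{
	const char *p = strstr(target, "@@0x");
	size_t i;

	if (p == NULL)
		return -1;
	p += 4;				/* skip over "@@0x" */
	for (i = 0; i < 16; i++) {
		/* also stops safely at an early terminating NUL */
		if (!isxdigit((unsigned char)p[i]))
			return -1;
	}
	if (p[16] != '\0')
		return -1;
	*tidp = strtoull(p, NULL, 16);
	return 0;
}
```

A tool built on this would collect the ids from every softlink in the directory and sort them, since the command does not require the snapshots to be well-ordered.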
Matthew Dillon [Sat, 31 May 2008 18:37:57 +0000 (18:37 +0000)]
HAMMER 49/Many: Enhance pruning code
* Pass the element array in as a pointer rather than embedding it in
the hammer_ioc_prune structure.
* Adjust the modulo calculations to allow non-aligned snapshots to be
pruned, to support the new 'hammer softprune' command.
Sepherosa Ziehau [Sat, 31 May 2008 13:12:59 +0000 (13:12 +0000)]
Reduce log verbosity
Sascha Wildner [Sat, 31 May 2008 12:04:15 +0000 (12:04 +0000)]
mdoc cleanup
Sepherosa Ziehau [Sat, 31 May 2008 11:18:09 +0000 (11:18 +0000)]
Use same naming convention as other host controller stats
Sepherosa Ziehau [Sat, 31 May 2008 08:29:05 +0000 (08:29 +0000)]
- Avoid excessive goto
- Adjust arphdr pointer, if we need to do another m_pullup()
- Indentation in switch block
Obtained-from: FreeBSD w/ mod
Sepherosa Ziehau [Sat, 31 May 2008 06:03:26 +0000 (06:03 +0000)]
Add ifa_listmask field in ifaddr_container; currently it is mainly used
to do sanity checks.
Sascha Wildner [Sat, 31 May 2008 04:51:55 +0000 (04:51 +0000)]
Add some missing manual pages: wcscoll(3), wcswidth(3), wcsxfrm(3), and
wcwidth(3)
Taken-from: NetBSD
Sascha Wildner [Fri, 30 May 2008 22:58:08 +0000 (22:58 +0000)]
Mention that -W, too, only works with -p or -d.
Sascha Wildner [Fri, 30 May 2008 22:51:31 +0000 (22:51 +0000)]
Minor corrections.
Simon Schubert [Fri, 30 May 2008 21:47:04 +0000 (21:47 +0000)]
Implement Farnsworth mode.
Sascha Wildner [Fri, 30 May 2008 18:00:23 +0000 (18:00 +0000)]
Fix typo.
Noticed-by: Antonio Huete Jimenez <tuxillo@quantumachine.net>
Simon Schubert [Fri, 30 May 2008 09:39:46 +0000 (09:39 +0000)]
Include sys/select.h to conform to SUS.
Noticed-by: hasso@
Simon Schubert [Fri, 30 May 2008 08:07:22 +0000 (08:07 +0000)]
Fix type name as well.
Noticed-by: hasso@
Simon Schubert [Fri, 30 May 2008 08:05:49 +0000 (08:05 +0000)]
Fix macro name.
Noticed-by: hasso@
Sepherosa Ziehau [Thu, 29 May 2008 13:30:15 +0000 (13:30 +0000)]
Add brief description about the recently added interrupt moderation sysctl nodes
Sepherosa Ziehau [Wed, 28 May 2008 13:53:42 +0000 (13:53 +0000)]
- Rename bce_init_context() to bce_init_ctx()
- Clear out all quick context entries. This fixes the watchdog timeout, which
usually happens after many tiny packets are transmitted
Obtained-from: FreeBSD if_bce.c rev 1.36
# FreeBSD didn't mention a _single_ word of this change and its effect in the
# commit log. The description of the fix and its effect is obtained from
# the Linux bnx2 commit log.
Sepherosa Ziehau [Wed, 28 May 2008 12:11:13 +0000 (12:11 +0000)]
- ifnet.if_output() should be called without ifnet.if_serializer being
held. Add assertion about it in ether_output().
- ether_output_frame() should be called without the ifnet.if_serializer
being held. Add assertion in it.
- arp_ifinit() will be called with ifnet.if_serializer being held. To
prevent serializer from recursion, ifnet.if_serializer is released
before calling arprequest(), which supposes caller does not hold output
iface's serializer.
- ifnet.if_serializer can't be held when calling arp_ifinit2().
Reported-by: dillon@
Sepherosa Ziehau [Wed, 28 May 2008 10:51:56 +0000 (10:51 +0000)]
- Add tunables and sysctl nodes for interrupt moderation variables.
Settings are committed to device during device initialization or in
interrupt routine.
- The default values of the interrupt moderation variables from Broadcom's
driver seem to be misconfigured. The following changes are made:
Send max coalesced BD count 20 -> 24
Send coalescing ticks 80 -> 1000 (~1000Hz)
Receive max coalesced BD count 6 -> 24
Receive coalescing ticks 18 -> 125 (~8000Hz)
The two changes to the "Receive" interrupt moderation variables avoid livelock.
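The approximate rates quoted above are consistent with one coalescing tick being one microsecond, which is an assumption here rather than something stated in the commit. Under that assumption the conversion is a single division; the function name is illustrative.

```c
#include <stdint.h>

/*
 * Convert a coalescing-ticks setting to an approximate maximum
 * interrupt rate.  Assumes one tick is one microsecond.
 */
uint32_t
coal_ticks_to_hz(uint32_t ticks)
{
	return ticks ? 1000000u / ticks : 0;
}
```

With this sketch, 1000 ticks gives 1000 and 125 ticks gives 8000, matching the figures in the list, while the old settings of 80 and 18 ticks work out to 12500 and 55555.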
Simon Schubert [Wed, 28 May 2008 10:37:25 +0000 (10:37 +0000)]
Include sys/fd_set.h in the BSD_VISIBLE case.
Seems that a lot of software assumes that fd_set will be defined after
including sys/types.h, so restore this property.
Noted-by: hasso@
Simon Schubert [Wed, 28 May 2008 10:35:11 +0000 (10:35 +0000)]
Move definition of fd_set to sys/fd_set.h.
Matthew Dillon [Tue, 27 May 2008 23:44:46 +0000 (23:44 +0000)]
Generate a semi-random MAC address when connecting to a SOCK_SEQPACKET
socket (ala vknetd's /dev/vknet) instead of a TAP interface.
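One common way to build a semi-random MAC address is sketched below. The commit does not say how the vkernel picks its address, so this is only an illustration: it fills the bytes with rand() for brevity, then clears the multicast bit and sets the locally administered bit in the first octet. The function name is hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Fill mac[] with pseudo-random bytes, then force a unicast,
 * locally administered address.  rand() is used only for brevity.
 */
void
make_local_mac(uint8_t mac[6])
{
	int i;

	for (i = 0; i < 6; i++)
		mac[i] = (uint8_t)(rand() & 0xff);
	mac[0] &= (uint8_t)~0x01u;	/* clear the multicast bit */
	mac[0] |= 0x02u;		/* set the locally administered bit */
}
```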
Matthew Dillon [Tue, 27 May 2008 23:26:38 +0000 (23:26 +0000)]
Implement a new utility called vknet. This utility interconnects the
networks between the local machine and a remote machine, typically by
connecting to a TAP interface or socket supplied by a running vknetd on
each machine.
Update manual pages for vknetd and vkernel to include vknet.
Matthew Dillon [Tue, 27 May 2008 22:47:16 +0000 (22:47 +0000)]
Only test the IP protocol (ip_p) for IP frames.
Matthew Dillon [Tue, 27 May 2008 17:31:12 +0000 (17:31 +0000)]
Fix socketvar.h inclusion by userland. This is a temporary hack and,
frankly, a lot more of the header file should be made _KERNEL-only.
Reported-by: Hasso Tepper <hasso@estpak.ee>
Matthew Dillon [Tue, 27 May 2008 17:10:49 +0000 (17:10 +0000)]
Add the notty utility, a program I wrote long ago which I should have
brought into base long ago. This program provides a convenient shortcut
to detaching a command from the controlling terminal and running it in the
background.
Sepherosa Ziehau [Tue, 27 May 2008 13:27:35 +0000 (13:27 +0000)]
Use standard include path
Sascha Wildner [Tue, 27 May 2008 12:08:29 +0000 (12:08 +0000)]
Sync zoneinfo database with tzdata2008c from elsie.
africa: 8.10 -> 8.11
asia: 8.18 -> 8.20
africa - Morocco observes DST in 2008
asia - Choibalsan is now 8 hours off UTC
- Pakistan observes DST in 2008
Sepherosa Ziehau [Tue, 27 May 2008 12:07:01 +0000 (12:07 +0000)]
Only collect 'count' packets when polling(4) is used. Set the softc cached
RX consumer index to the value we have processed before breaking out of the
loop, so we can come back again upon the next poll.
Michael Neumann [Tue, 27 May 2008 12:00:47 +0000 (12:00 +0000)]
Add a boot loader tunable hw.usb.hack_defer_exploration, which if set
to 0 reverts to the old USB behaviour, i.e. USB keyboards should again be
usable early at boot. By default, this is set to 1 which will avoid hanging
the system on qemu and my HP compaq laptop and maybe others.
Note that this is a hack around a shortcoming in the current USB stack and
will go away once the shortcoming has been fixed.
Sepherosa Ziehau [Tue, 27 May 2008 11:39:42 +0000 (11:39 +0000)]
- Apply same adjustment to softc cached TX/RX BD index
- Clear used_tx_bd in bce_free_tx_chain
Obtained-from: FreeBSD if_bce.c rev 1.{36,37}
Matthew Dillon [Tue, 27 May 2008 07:48:00 +0000 (07:48 +0000)]
Do not try to set-up the bridge or tap interfaces when connecting to
a unix domain socket instead of a tap interface.
Matthew Dillon [Tue, 27 May 2008 07:46:57 +0000 (07:46 +0000)]
Add vknetd to the build.
Matthew Dillon [Tue, 27 May 2008 05:25:36 +0000 (05:25 +0000)]
Get rid of an old and terrible hack. Local stream sockets enqueue packets
directly on the peer's sockbuf, rather than the sender's sockbuf. That
part of the code is fine, but in order to prevent the sender from queueing
infinite mbufs (because its sockbuf appears to be empty when you do that)
the code dynamically messed around with the sender's high water mark.
This blew up the new SOCK_SEQPACKET. In particular, it blows up the
use of the PR_ATOMIC on stream sockets and can cause spurious EMSGSIZE
errors to be returned instead of the EWOULDBLOCK that should have been
returned.
Also fix, or partially fix, the resource limit code which tries to reduce the
high water mark when a user is using too many mbufs. This never worked
well and still doesn't, but it is in better shape now.
Get rid of the crufty code and simply add a flag to the signalsockbuf,
SSB_STOP, to stop the sender.
Also adjust the vkernel to increase the default socket buffer when
connecting to vknet instead of if_tap. VKE currently issues non-blocking
writes to vknet/tap and we do not want to lose packets for no good reason.
Matthew Dillon [Tue, 27 May 2008 01:58:02 +0000 (01:58 +0000)]
Create a new daemon called vknetd. This daemon uses the new SOCK_SEQPACKET
feature to create a virtualized packet bridge accessible by userland (in
particular, user-run virtual kernels).
Matthew Dillon [Tue, 27 May 2008 01:10:47 +0000 (01:10 +0000)]
* Implement SOCK_SEQPACKET sockets for local communications. These sockets
operate like SOCK_STREAM but each write() builds a record and each read()
reads a record. That is, the data is not aggregated together or allowed
to be partially read.
This allows local sockets to have the same packetization characteristics
as if_tap when desired.
* Add a feature to the vkernel which allows a unix domain socket to be
specified for the network interface rather than a TAP interface. The
vkernel will connect to the socket using SOCK_SEQPACKET and read and
write packets to it.
* Clean up some libc/kernel namespace collisions related to including
sys/socket.h.
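The record-per-write behaviour described above can be observed with a connected pair of local sockets. A minimal sketch, assuming a platform that supports SOCK_SEQPACKET on AF_UNIX via socketpair(); the function name is illustrative.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Write two records into one end of a SOCK_SEQPACKET pair and read them
 * back.  Stores the byte count returned by each read() in sizes[0..1].
 * Returns 0 on success, -1 on any socket error.
 */
int
seqpacket_demo(ssize_t sizes[2])
{
	int sv[2];
	char buf[64];
	int rc = -1;

	if (socketpair(AF_UNIX, SOCK_SEQPACKET, 0, sv) != 0)
		return -1;
	if (write(sv[0], "abc", 3) == 3 && write(sv[0], "defgh", 5) == 5) {
		/* each read() returns one whole record, never a merged stream */
		sizes[0] = read(sv[1], buf, sizeof(buf));
		sizes[1] = read(sv[1], buf, sizeof(buf));
		rc = (sizes[0] < 0 || sizes[1] < 0) ? -1 : 0;
	}
	close(sv[0]);
	close(sv[1]);
	return rc;
}
```

With SOCK_STREAM the first read() could legally return all eight bytes at once; with SOCK_SEQPACKET the two writes come back as separate 3-byte and 5-byte records, which is the packetization a network interface backend needs.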
Nicolas Thery [Mon, 26 May 2008 17:11:09 +0000 (17:11 +0000)]
Allocate lwkt threads from objcache instead of custom per-cpu cache backed
by zone.
Reviewed-by: dillon@
Sepherosa Ziehau [Mon, 26 May 2008 14:23:25 +0000 (14:23 +0000)]
Avoid panic upon module unloading
Michael Neumann [Mon, 26 May 2008 14:00:46 +0000 (14:00 +0000)]
Allow a NULL pointer as argument to usb_get_next_event(), and don't
allocate a "struct usb_event" on stack in usb_add_event().
Obtained-from: NetBSD/usb.c 1.83
Michael Neumann [Mon, 26 May 2008 13:56:08 +0000 (13:56 +0000)]
Remove __HAVE_GENERIC_SOFT_INTERRUPTS ifdefs as we don't support the
softintr_* API of NetBSD.
Sepherosa Ziehau [Mon, 26 May 2008 13:29:33 +0000 (13:29 +0000)]
Fix the following possible bugs in SIOCSIFADDR handling when in_ifinit() fails
Conditions:
o ifaceX has an AF_INET ia
o SIOCSIFADDR is used to change address, and new address' hash value is
different from ia's
o ia is currently in hash bucket B1
o ia is removed from B1 and installed into hash table using new address
hash value, assume its new hash bucket is B2, and B1 != B2
1) Dangling ia reference in inaddr hash table
o ifnet.if_ioctl fails
o ia is reinstalled into hash bucket B1, but without being first removed
from hash bucket B2
Hash bucket B2 will have a dangling reference to ia
2) ia is left in wrong hash bucket
o rtinit fails
o ia's address is restored to oldaddr
ia itself is left in hash bucket indexed by new address's hash value
- In in_ifinit(), if it fails, unlink ia from inaddr hash table instead of
delaying the unlinking to in_control_internal(). If necessary reinstall
ia into inaddr hash table with original address
- After the above fix, in_control_internal() needs to unlink ia from inaddr
only if cmd is SIOCDIFADDR and ia resides in inaddr hash table. Whether
ia is in inaddr hash table or not, is currently indicated by ia address's
family; add XXX comment that this assumption is not good
- Constify 'sin' parameter to in_ifinit()
Reviewed-by: dillon@
Michael Neumann [Mon, 26 May 2008 13:24:59 +0000 (13:24 +0000)]
Fix typos and cosmetic changes.