Sepherosa Ziehau [Sun, 25 Dec 2016 09:20:47 +0000 (17:20 +0800)]
tcp: Nuke the sysctl to disable local port extension.
Matthew Dillon [Fri, 23 Dec 2016 22:38:13 +0000 (14:38 -0800)]
AHCI - Misc fixes
* Reduce chip reset time from 500ms to 250ms to speed up booting on
machines with multiple AHCI controllers.
* Fix a bug in a piece of the error recovery code that was waiting
forever.
* Implement the hw.ahci.synchronous_boot TUNABLE. Setting this variable
to 0 in loader.conf causes the ahci device probe to be fully asynchronous
during booting. This is HIGHLY experimental and not recommended on
systems with only one controller as the kernel may boot too quickly for
the boot drive to probe before the kernel gets to init.
* Do a pass on the ahci.4 manual page.
Sascha Wildner [Thu, 22 Dec 2016 12:36:05 +0000 (13:36 +0100)]
Update the pciconf(8) database.
December 19, 2016 snapshot from http://pciids.sourceforge.net/
Sascha Wildner [Thu, 22 Dec 2016 11:29:45 +0000 (12:29 +0100)]
libc: Include <unistd.h> for ftell/ftruncate/truncate prototypes.
Currently, <stdio.h> and <sys/types.h> define them too, so this is
only cosmetic.
While here, fix a case in dump(8) too.
Imre Vadász [Thu, 22 Dec 2016 00:03:45 +0000 (01:03 +0100)]
drm/i915: Fix typo in get_bdb_header(), fixes vbt validity check.
Matthew Dillon [Wed, 21 Dec 2016 22:12:46 +0000 (14:12 -0800)]
ahci - Add workarounds for Marvell 88SE9215
* This Marvell chip also needs some quirks. Probably most of the older
Marvell chips need the same quirks, and the newer probably needs the
FR cycling quirk, but for now I'm adding them only specifically as
they are tested.
Reported-by: Edward Berger
Matthew Dillon [Wed, 21 Dec 2016 19:14:17 +0000 (11:14 -0800)]
ahci - Improve port-multiplier detection
* Improve port-multiplier detection by adding workarounds for
poorly-implemented AHCI and PM chipsets. Now detects the popular
Rosewill 4-bay enclosure, which uses chipid 0x575f197b.
Increase device detect timeout from 3/10 second to 2 seconds. This
enclosure stupidly takes extra time on the first COMRESET after a cold
power-on to detect, I'm guessing because it is testing both its USB and
its eSATA port.
This port multiplier sometimes returns ready before its software has
completely initialized, causing PM register READs to succeed, but
return data values of 0. If we get a data value of 0 for the REV register
we sleep a little and try once more.
* Marvell AHCI chip does not immediately latch the signature on the
second FIS during a software reset. Give it 500ms to do so.
Ignore a BSY condition between the first and second FIS during a software
reset probe of the PM.
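The "sleep a little and try once more" handling of a zero REV value can be sketched as below. This is only an illustration, not the driver's code: pm_read_fn, pm_sleep_fn, pm_read_rev_retry() and the fake_* helpers are hypothetical names, and the fake device simply models a port multiplier whose first read succeeds with a data value of 0.

```c
#include <stdint.h>

/* Hypothetical callbacks; the real driver issues the PM register read itself. */
typedef int (*pm_read_fn)(void *ctx, uint32_t *valuep);
typedef void (*pm_sleep_fn)(void *ctx);

/*
 * Read the REV register.  If the read succeeds but returns 0, sleep
 * briefly and retry exactly once.  Returns 0 on success, or the error
 * from the last read attempt.
 */
static int
pm_read_rev_retry(pm_read_fn read_rev, pm_sleep_fn nap, void *ctx,
    uint32_t *revp)
{
	int error;

	error = read_rev(ctx, revp);
	if (error == 0 && *revp == 0) {
		nap(ctx);
		error = read_rev(ctx, revp);
	}
	return (error);
}

/* Fake device for illustration: the first read returns 0, later reads 0x11. */
struct fake_pm {
	int reads;
	int naps;
};

static int
fake_read(void *ctx, uint32_t *valuep)
{
	struct fake_pm *pm = ctx;

	*valuep = (pm->reads++ == 0) ? 0 : 0x11;
	return (0);
}

static void
fake_nap(void *ctx)
{
	struct fake_pm *pm = ctx;

	pm->naps++;
}
```

Calling pm_read_rev_retry(fake_read, fake_nap, &pm, &rev) on a zeroed struct fake_pm performs two reads and one sleep.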
Sepherosa Ziehau [Wed, 21 Dec 2016 14:08:19 +0000 (22:08 +0800)]
ip: Set mbuf hash for output IP packets.
This paves the way to implement Flow-Queue-Codel.
zrj [Wed, 21 Dec 2016 07:56:04 +0000 (09:56 +0200)]
Fix typo.
Also downgrade to warning to ease the update from 4.6-release,
host bmake does not handle: make -f Makefile.inc1 -V WORLD_ALTCOMPILER
John Marino [Tue, 20 Dec 2016 20:16:17 +0000 (14:16 -0600)]
Take II on fallback HOST_BINUTILSVER
The format of BINUTILSVER is binutils2XX, but the previous change included
the libexec prefix. Strip this out too.
Reported-by: zrj
John Marino [Tue, 20 Dec 2016 18:32:53 +0000 (12:32 -0600)]
Fix world build in NO_ALTBINUTILS edge case
In the case that the machine has been updated within 30 days but with
NO_ALTBINUTILS set, the world build fails. This is because the
logic to fall back to earlier binutils versions fails due to empty
directories that are installed regardless of the NO_ALTBINUTILS setting.
The logic was updated to search for binutils programs rather than
directories. In the edge case, the oldest version of binutils on the
system is used to build the native versions during the early build phases.
Sepherosa Ziehau [Tue, 20 Dec 2016 03:09:30 +0000 (11:09 +0800)]
hyperv: Add API to read raw value of Hyper-V timer.
Accelerate Hyper-V event timer reloading.
Sepherosa Ziehau [Tue, 20 Dec 2016 02:56:08 +0000 (10:56 +0800)]
hyperv: Move commonly shared header files to the module's top dir.
Sepherosa Ziehau [Tue, 20 Dec 2016 02:50:15 +0000 (10:50 +0800)]
hyperv: Implement Hyper-V reference TSC cputimer.
This one is at least 2 times faster than its rdmsr counterpart.
Obtained-from: FreeBSD
Sepherosa Ziehau [Tue, 20 Dec 2016 02:49:44 +0000 (10:49 +0800)]
cputimer: Add more IDs for VMM cputimers.
Matthew Dillon [Mon, 19 Dec 2016 18:08:06 +0000 (10:08 -0800)]
ahci - Implement FBS for port-multipliers
* Implement FBS (FIS-Based Switching) for port-multipliers. If the
chipset supports it, the ahci driver now turns on FBS mode which
allows us to queue concurrent requests to different targets.
Most AHCI chipsets do not support FBS resulting in poor port-multiplier
performance.
- FBS is enabled in the PM probe.
- FBS must be disabled when doing a hard reset.
- In FBS mode commands must be queued to PREG_CI one at a time,
and the target must be written to AHCI_PREG_FBS prior to activation
via CI.
- RFIS area is larger, and RFIS responses are copied from the
appropriate target index instead of index 0.
- Issue a COMRESET during the PM probe if a BSY status is
recognized, which helps on chipsets which do not implement
the SCLO cap.
* Clean-up a little logic in ahci_port_stop().
* Use the saved sc_cap to check for the SCLO capability instead of
re-reading AHCI_REG_CAP in a few places.
* Dump the RFIS data to the console on error.
* Fixup sc_cap to directly incorporate quirks.
Matthew Dillon [Mon, 19 Dec 2016 07:21:19 +0000 (23:21 -0800)]
ahci - Add quirks for Marvell devices
* Add some quirks for badly broken Marvell devices.
* 88SE9172 - This badly broken AHCI chipset does not support FR *or*
CR responses.
* 88SE9230 - This badly broken AHCI chipset supports FR and CR, but
cannot maintain FR across a disconnect. FRE must be
cycled on the insertion detect in order to re-assert
FR and be able to detect the new device.
This chipset also seems to have other problems, sometimes
generating an error (TFES error) on SET_FEATURES, which
does not happen when the drive is connected to the Intel
AHCI chipset.
* Implement quirks for these devices. Also, don't enable FRE with
POD and SUD (do it separately), and sequence CMD_ICC_ACTIVE a bit
differently than before.
Matthew Dillon [Mon, 19 Dec 2016 00:45:06 +0000 (16:45 -0800)]
ahci - Adjust a few things
* These changes have no effect on known AHCI devices but are a good idea.
* As suggested in the AHCI spec 10.1.2, zero out the memory pointed to
by the FB and CL port dma addresses.
* Write to FB before FBU, and to CLB before CLBU, just in case hardware
clears the upper bits on a write to the lower bits (no known AHCI
hardware does this but it's something that is commonly implemented in
other hw so...).
* Improved I/O error reporting.
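Why the write order matters can be shown with a small model. This is a sketch under assumptions: struct addr_regs and the helpers are hypothetical, and write_lo() deliberately models the hypothetical hardware that clears the upper half on a write to the lower half.

```c
#include <stdint.h>

/* Hypothetical register pair standing in for CLB/CLBU or FB/FBU. */
struct addr_regs {
	uint32_t lo;
	uint32_t hi;
};

/* Models hardware that clears the upper bits on a write to the lower bits. */
static void
write_lo(struct addr_regs *r, uint32_t v)
{
	r->lo = v;
	r->hi = 0;
}

static void
write_hi(struct addr_regs *r, uint32_t v)
{
	r->hi = v;
}

/* Lower half first, then upper half, so the upper half is never lost. */
static void
set_dma_addr(struct addr_regs *r, uint64_t addr)
{
	write_lo(r, (uint32_t)addr);
	write_hi(r, (uint32_t)(addr >> 32));
}

static uint64_t
get_dma_addr(const struct addr_regs *r)
{
	return (((uint64_t)r->hi << 32) | r->lo);
}
```

With the opposite order, the write to the lower half would wipe the upper half that was just programmed on such hardware.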
Sascha Wildner [Mon, 19 Dec 2016 17:46:11 +0000 (18:46 +0100)]
Some mdoc cleanup in tuning.7 and swapcache.8
Reported-by: zrj
zrj [Mon, 19 Dec 2016 15:55:21 +0000 (17:55 +0200)]
gcc50: Build lto-wrapper even if buildworld is not LTO enabled.
After the default binutils update it is now safe to do that.
Keep in mind that buildworld should still work when downgrading to a non-LTO one.
This finally allows a standard buildworld and an LTO'ed buildkernel.
zrj [Mon, 19 Dec 2016 16:01:05 +0000 (18:01 +0200)]
<sys/param.h>: Bump __DragonFly_version for binutils update.
zrj [Mon, 19 Dec 2016 06:02:20 +0000 (08:02 +0200)]
Switch to binutils227 as default base binutils.
DPorts were fixed to work with ld.gold version 1.12 from binutils 2.27,
some workarounds were added to few ports. Haskell.
ld(ld.gold) has become very strict, in some scenarios LDVER=ld.bfd will help.
Updated binutils bring better support for world/kernel compilation with -flto.
Also updated ld.gold now is able to link chromium without any DSO warnings.
Signed-off-by: marino, swildner
zrj [Mon, 19 Dec 2016 15:37:27 +0000 (17:37 +0200)]
flex: Disable LTO in the libfl.a for clang.
clang has issues with such LTO'ed static library.
This library is small and gains of LTO are minimal.
Unbreaks ports like lang/gscheme.
Sepherosa Ziehau [Mon, 19 Dec 2016 15:49:56 +0000 (23:49 +0800)]
ip: Add parenthesis properly.
Sepherosa Ziehau [Mon, 19 Dec 2016 13:32:41 +0000 (21:32 +0800)]
ip: Move multicast addresses detection into common place.
zrj [Sun, 18 Dec 2016 16:25:33 +0000 (18:25 +0200)]
libc: Avoid negative offsets in link_ntoa().
Discussed-with: swildner
Taken-from: FreeBSD
Tomohiro Kusumi [Sat, 17 Dec 2016 22:52:26 +0000 (07:52 +0900)]
sbin/hammer: Redo e4323571 partly (after reverted by 03d5db37)
> sbin/hammer: Fix bug in get_buffer_data()
>
> The previous commit made clear that xor part of get_buffer_data()
> was wrong. Since buf_offset is in any zone not limited to zone-2,
> xor of two offsets doesn't necessarily show the right result to
> know whether they belong to the same buffer, even if ->zone2_offset
> is originally translated from the same zone within the same buffer.
>
> It needs to take xor of long offsets instead of full 64 bits.
Tomohiro Kusumi [Sat, 17 Dec 2016 21:20:35 +0000 (06:20 +0900)]
Revert "sbin/hammer: Fix bug in get_buffer_data()"
This reverts commit e4323571a2e8310683120148b720a92f801c618f.
HAMMER_OFF_LONG_ENCODE() part is ok, but limiting to direct
zones causes several issues on formatting undo fifo, while
the commit avoids overhead of releasing every time.
Tomohiro Kusumi [Sat, 17 Dec 2016 11:26:17 +0000 (20:26 +0900)]
sbin/hammer: Fix bug in get_buffer_data()
The previous commit made clear that xor part of get_buffer_data()
was wrong. Since buf_offset is in any zone not limited to zone-2,
xor of two offsets doesn't necessarily show the right result to
know whether they belong to the same buffer, even if ->zone2_offset
is originally translated from the same zone within the same buffer.
It needs to take xor of long offsets instead of full 64 bits.
The reason cache releasing is now limited to directly translated
zones is because for indirectly translated zones (i.e. undo zone),
it can't tell overlap by xor of offsets regardless of long format.
Prior to this commit, get_buffer_data() has been releasing buffers
that don't need to be released (i.e. *bufferp being the right cache),
and has resulted in huge overhead as shown in below comparison.
In the first example, get_buffer_data() is releasing *bufferp for
undo fifo entries every time when it doesn't need to release.
-- Prior to this commit
# time newfs_hammer -L TEST /dev/da4
Volume 0 DEVICE /dev/da4 size 4.55TB
initialize freemap volume 0
initializing the undo map (1024 MB)
---------------------------------------------
HAMMER version 6
1 volume total size 4.55TB
root-volume: /dev/da4
boot-area-size: 32.00KB
memory-log-size: 256.00KB
undo-buffer-size: 1.00GB
total-pre-allocated: 1.02GB
<snip>
newfs_hammer -L TEST /dev/da4 3.05s user 1.16s system 41% cpu 10.098 total
-- Using this commit
# time newfs_hammer -L TEST /dev/da4
Volume 0 DEVICE /dev/da4 size 4.55TB
initialize freemap volume 0
initializing the undo map (1024 MB)
---------------------------------------------
HAMMER version 6
1 volume total size 4.55TB
root-volume: /dev/da4
boot-area-size: 32.00KB
memory-log-size: 256.00KB
undo-buffer-size: 1.00GB
total-pre-allocated: 1.02GB
<snip>
newfs_hammer -L TEST /dev/da4 2.72s user 0.04s system 73% cpu 3.755 total
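The difference between xor of full 64-bit offsets and xor of long offsets can be sketched as follows. The masks are assumptions made for illustration (zone in the upper 4 bits, a 16KB buffer size), and the function names are hypothetical. As the message notes, the comparison is only meaningful for directly translated zones.

```c
#include <stdint.h>

/* Assumed layout: zone in the upper 4 bits, the rest is the "long" offset. */
#define DEMO_OFF_LONG_MASK	0x0FFFFFFFFFFFFFFFULL
#define DEMO_BUFSIZE		16384ULL		/* assumed buffer size */
#define DEMO_BUFMASK		(DEMO_BUFSIZE - 1)

/* Full 64-bit xor: offsets in different zones never compare as equal. */
static int
same_buffer_full(uint64_t a, uint64_t b)
{
	return (((a ^ b) & ~DEMO_BUFMASK) == 0);
}

/* Xor of long offsets: the zone bits are masked off before comparing. */
static int
same_buffer_long(uint64_t a, uint64_t b)
{
	return ((((a ^ b) & DEMO_OFF_LONG_MASK) & ~DEMO_BUFMASK) == 0);
}
```

For example, a zone-2 offset and a zone-8 offset with the same long part within one buffer differ under the full xor but match under the long xor.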
Tomohiro Kusumi [Sat, 17 Dec 2016 00:00:44 +0000 (09:00 +0900)]
sbin/hammer: Fix terminology of buf_offset
This commit just renames (local and struct field) variables.
No functional difference.
The way HAMMER userspace uses name "buf_offset" is misleading.
In kernel space, "buf_offset" is for arbitrary zone offsets that
are not limited to zone-2, however in userspace "buf_offset" is
used for zone-2. It should be renamed to "zone2_offset" so the
terminology being used in kernel and userspace are the same.
This is important because the name implies what's stored in
upper 4 bits of 64 bits offset, and having misleading variable
names tends to be error-prone (see the next commit).
Tomohiro Kusumi [Sat, 17 Dec 2016 10:27:38 +0000 (19:27 +0900)]
sys/vfs/hammer: Rename misleading macro hammer_is_zone2_mapped_index()
All zones are mapped to zone2 (whether directly or indirectly),
so hammer_is_zone2_mapped_index() is a misleading name.
It should have indicated it's for B-Tree records related zones.
Tomohiro Kusumi [Sat, 17 Dec 2016 00:34:30 +0000 (09:34 +0900)]
sbin/hammer: Remove redundant blockmap lookup in hammer show
blockmap_lookup() is called via check_data_crc() right before
check_data_crc() gets called. This isn't necessary for checking
data CRC either.
Tomohiro Kusumi [Fri, 16 Dec 2016 18:52:37 +0000 (03:52 +0900)]
sbin/hammer: Use calloc(3) instead of malloc(3)+bzero(3)
Tomohiro Kusumi [Fri, 16 Dec 2016 18:43:17 +0000 (03:43 +0900)]
sbin/hammer: Properly use calloc(3)
It's supposed to be number and then size.
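For reference, a minimal sketch of the argument order and of the malloc(3)+bzero(3) replacement. struct demo_entry and alloc_entries() are hypothetical names, not taken from sbin/hammer.

```c
#include <stdlib.h>

/* Hypothetical element type, for illustration only. */
struct demo_entry {
	int id;
	long offset;
};

/*
 * calloc(number, size): the element count comes first, then the size of
 * one element.  The returned memory is zero-filled, so no separate
 * bzero(3) call is needed.
 */
static struct demo_entry *
alloc_entries(size_t count)
{
	return (calloc(count, sizeof(struct demo_entry)));
}
```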
Tomohiro Kusumi [Fri, 16 Dec 2016 17:24:45 +0000 (02:24 +0900)]
sbin/hammer: Refactor hammer_cache_flush()
Tomohiro Kusumi [Fri, 16 Dec 2016 16:49:12 +0000 (01:49 +0900)]
sbin/hammer: Remove redundant cache counter NCache
Incrementation and decrementation of NCache is always aligned
with CacheUse in a single thread program like /sbin/hammer,
so this cache counter isn't necessary.
Tomohiro Kusumi [Fri, 16 Dec 2016 16:46:05 +0000 (01:46 +0900)]
sbin/hammer: Use HAMMER_BUFSIZE to calculate CacheMax
CacheMax is to be compared with multiple of HAMMER_BUFSIZE,
so use HAMMER_BUFSIZE to initialize CacheMax.
Tomohiro Kusumi [Fri, 16 Dec 2016 15:51:38 +0000 (00:51 +0900)]
sbin/hammer: Change fprintf (without exit) to err variants
In addition to bac217f3 and 02318f07, these are fprintf calls
not followed by exit right after fprintf, but it makes no difference
with err variants (as it'll exit(1) shortly).
The ones in sbin/hammer/cmd_dedup.c should have been changed
in 02318f07.
Tomohiro Kusumi [Fri, 16 Dec 2016 15:08:11 +0000 (00:08 +0900)]
sbin/mount_hammer: Use warn(3) variants
Tomohiro Kusumi [Fri, 16 Dec 2016 14:49:47 +0000 (23:49 +0900)]
sbin/newfs_hammer: Refactoring
Tomohiro Kusumi [Fri, 16 Dec 2016 14:23:46 +0000 (23:23 +0900)]
sbin/newfs_hammer: Use warn(3) variants
Tomohiro Kusumi [Fri, 16 Dec 2016 08:10:18 +0000 (17:10 +0900)]
sbin/newfs_hammer: Mention root volume is volume#0 in manpage
Tomohiro Kusumi [Fri, 16 Dec 2016 06:33:45 +0000 (15:33 +0900)]
sbin/hammer: Don't hardcode 0 for root PFS
HAMMER code doesn't hardcode 0 for root PFS
(e.g. see sbin/newfs_hammer, it could be !=0 if one wants to do so).
Fix the existing error messages using hardcoded 0.
Also add "(root PFS)" for PFS#0 in hammer info command.
Sepherosa Ziehau [Sat, 17 Dec 2016 13:20:58 +0000 (21:20 +0800)]
mbuf: Factor function to set mbuf hash.
Matthew Dillon [Sat, 17 Dec 2016 06:25:00 +0000 (22:25 -0800)]
vmstat - Adjust headers
* Widen some of the header names to make them more readable.
* Adjust manual page.
Matthew Dillon [Sat, 17 Dec 2016 06:15:09 +0000 (22:15 -0800)]
vmstat - Revamp output
* Revamp iterative output, e.g. 'vmstat 1' or 'vmstat'. Make the fields
wider, remove the pdpages column, and format the values to fit. The
previous output format had become completely unusable due to blowing out
available widths.
* Revamp vmstat -z and add support for vmstat -z <interval>. Output the
information in a more useful form.
Matthew Dillon [Sat, 17 Dec 2016 06:13:26 +0000 (22:13 -0800)]
kernel - remove mapzone
* mapzone is no longer being used, remove it.
Matthew Dillon [Sat, 17 Dec 2016 03:43:01 +0000 (19:43 -0800)]
debug - Update kmapinfo, zallocinfo, slabinfo
* Update the kmapinfo, zallocinfo, and slabinfo commands so they work
properly with current kernels.
* kmapinfo now breaks-down who is using each vm_map_entry in the
kernel_map, and prints out aggregate results for each subsystem.
Matthew Dillon [Sat, 17 Dec 2016 03:39:46 +0000 (19:39 -0800)]
kernel - Tag vm_map_entry structure, slight optimization to zalloc, misc.
* Tag the vm_map_entry structure, allowing debugging programs to
break-down how KMEM is being used more easily.
This requires an additional argument to vm_map_find() and most
kmem_alloc*() functions.
* Remove the page chunking parameter to zinit() and zinitna(). It was
only being used degeneratively. Increase the chunking from one page
to four pages, which will reduce the amount of vm_map_entry spam in
the kernel_map.
* Use atomic ops when adjusting zone_kern_pages.
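The point of the atomic adjustment can be illustrated in userland with C11 atomics. This is an analogy only: the kernel uses its own atomic primitives, and zone_pages_demo and the helper names are hypothetical.

```c
#include <stdatomic.h>

/* Hypothetical userland stand-in for a shared kernel page counter. */
static atomic_long zone_pages_demo;

/* Adjust the counter by delta pages without losing concurrent updates. */
static void
zone_pages_adjust(long delta)
{
	atomic_fetch_add(&zone_pages_demo, delta);
}

/* Read the current value of the counter. */
static long
zone_pages_read(void)
{
	return (atomic_load(&zone_pages_demo));
}
```

A plain read-modify-write of a shared counter can drop an update when two CPUs adjust it at the same time; the atomic add avoids that.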
Matthew Dillon [Fri, 16 Dec 2016 19:38:35 +0000 (11:38 -0800)]
drm - Fix memory leak in broadwell or later GPUs
* vunmap() linux compatibility code was not implemented, leading to a
memory leak for certain operations in newer GPUs. Browsers tend to
tickle the code paths in question.
* Implement vunmap() to fix the leak.
Tomohiro Kusumi [Thu, 15 Dec 2016 07:56:17 +0000 (16:56 +0900)]
sbin/hammer: Change fprintf/exit to err variants [2/2]
Change
fprintf(stderr, ...); exit(1);
and
perror(...); exit(1);
to
err(1, ...) or errx(1, ...);
where possible for consistency.
This commit is just conversion to err(3) variants.
Messages themselves are not changed, except for removing
strerror(errno) for err(3), and removing trailing \n,
though err variants add program name in the messages.
err(3) and variants are non standard BSD functions, but HAMMER
userspace has been using BSD stuff aside from the existing
err/errx. Also note that err(3) is available in Linux as well.
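The conversion pattern looks roughly like this. It is a sketch with hypothetical helpers (open_or_die(), check_version_or_die()), not code from sbin/hammer: err(3) appends strerror(errno) and a newline by itself, while errx(3) is for failures that have no meaningful errno.

```c
#include <err.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Before:
 *	fprintf(stderr, "open %s: %s\n", path, strerror(errno));
 *	exit(1);
 * After: err(3) prints the program name, the message, and strerror(errno).
 */
static int
open_or_die(const char *path)
{
	int fd;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		err(1, "open %s", path);
	return (fd);
}

/*
 * Before:
 *	fprintf(stderr, "invalid version %d\n", version);
 *	exit(1);
 * After: errx(3), because errno carries no information here.
 */
static void
check_version_or_die(int version)
{
	if (version < 1)
		errx(1, "invalid version %d", version);
}
```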
Tomohiro Kusumi [Thu, 15 Dec 2016 15:17:19 +0000 (00:17 +0900)]
sbin/hammer: Change fprintf/exit to err variants [1/2]
Change
fprintf(stderr, ...); exit(1);
and
perror(...); exit(1);
to
err(1, ...) or errx(1, ...);
where possible for consistency.
In test_volume(), if open(2)/pread(2) failed, err(1) without
scanning rest of the volumes. This function itself is redundant
anyway as mentioned in 1e297b34, so no one cares.
Other than that this is just conversion to err(3) variants.
Messages themselves are not changed, except for removing
strerror(errno) for err(3), and removing trailing \n,
though err variants add program name in the messages.
err(3) and variants are non standard BSD functions, but HAMMER
userspace has been using BSD stuff aside from the existing
err/errx. Also note that err(3) is available in Linux as well.
Tomohiro Kusumi [Thu, 15 Dec 2016 09:16:36 +0000 (18:16 +0900)]
sbin/hammer: Fix/remove redundant error variable
Check ioctl result right after ioctl.
Since this for-loop expects ioctl(GET_PSEUDOFS) to eventually
return ENOENT for a new slot, error variable is likely to be
overwritten with ENOENT by the time for-loop ends. This could
result in overlooking possible real errors.
If errno is checked (if not ENOENT) right after ioctl, there's
also no need to preserve errno value.
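A sketch of the "check right after the call" pattern described above. The probe callback is a hypothetical stand-in for the ioctl, and find_free_slot() and the fake probes are illustrative names only.

```c
#include <errno.h>

/* Hypothetical probe: returns 0 if the slot is in use, -1 with errno set. */
typedef int (*probe_fn)(int slot);

/*
 * Return the first free slot (probe fails with ENOENT), or -1.
 * Any other errno is reported through *errorp immediately, so it cannot
 * be overwritten by a later ENOENT.
 */
static int
find_free_slot(probe_fn probe, int nslots, int *errorp)
{
	int slot;

	*errorp = 0;
	for (slot = 0; slot < nslots; ++slot) {
		if (probe(slot) < 0) {
			if (errno == ENOENT)
				return (slot);
			*errorp = errno;	/* real error, stop here */
			return (-1);
		}
	}
	return (-1);	/* every slot is in use */
}

/* Fake probe: slots 0..2 are in use, slot 3 is free. */
static int
fake_probe_free_at_3(int slot)
{
	if (slot < 3)
		return (0);
	errno = ENOENT;
	return (-1);
}

/* Fake probe: slot 1 fails with a real error. */
static int
fake_probe_eio_at_1(int slot)
{
	if (slot < 1)
		return (0);
	errno = EIO;
	return (-1);
}
```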
Sascha Wildner [Wed, 14 Dec 2016 20:53:15 +0000 (21:53 +0100)]
kernel/acpica: Fix shutdown issues with ACPICA 20161117.
Observed by ivadasz on a Fujitsu Lifebook E744 where shutdown stopped
working properly after the upgrade to 20161117.
This reverts some AML parser commits. A real fix should be following
later.
Patch-by: Lv Zheng <lv.zheng@intel.com>
Tested-by: ivadasz
Tomohiro Kusumi [Wed, 14 Dec 2016 16:41:12 +0000 (01:41 +0900)]
sbin/hammer: Add a trivial wrapper over blockmap_lookup()
Layer1/2 args are just for debugging, so make blockmap_lookup()
a wrapper over the existing blockmap_lookup(..,NULL,NULL,..);
Tomohiro Kusumi [Wed, 14 Dec 2016 02:24:58 +0000 (11:24 +0900)]
sys/vfs/hammer: Use hammer_is_zone_xxx()
Tomohiro Kusumi [Tue, 13 Dec 2016 21:18:16 +0000 (06:18 +0900)]
sbin/hammer: Cleanup blockmap_lookup()
The original intention of this function was to return result_offset
even if there was an error (probably for debugging in early stage),
but no one cares about the returned offset on error any longer.
Tomohiro Kusumi [Tue, 13 Dec 2016 17:52:57 +0000 (02:52 +0900)]
sbin/hammer: Add __blockmap_xlate_to_zone2() to refactor get_buffer()
No functional difference, but this should make more sense than
how it was implemented before. The only thing this part really
does is convert zone offset to zone-2.
If error is set by blockmap_lookup(), there's nothing it can do
to recover, but get_buffer() mustn't call exit(1) here. A command
like hammer recover (which could possibly pass invalid offsets)
expects get_buffer() to return NULL for invalid offsets.
The reason for not calling blockmap_lookup() via get_buffer()
when zone_offset is in zone-2 is because zone_offset could be
0 when newfs_hammer calls get_buffer_data() on bootstrap when
layer1/2 entries aren't even created. It's ok to directly call
it with zone-2 offset like hammer show and blockmap do.
format_freemap()
 -> get_buffer_data()
     -> get_buffer()
         -> blockmap_lookup()
             -> get_buffer_data(0)
                 -> get_buffer(0)
                     -> blockmap_lookup(0) /* XXX */
Tomohiro Kusumi [Wed, 14 Dec 2016 01:43:34 +0000 (10:43 +0900)]
sbin/hammer: Add __alloc_buffer() to refactor get_buffer()
No functional difference, but this should make more sense than
how get_buffer() (which is one of the complicated functions in
HAMMER userspace) was implemented before.
After the previous commit, volume_info* doesn't need to be
visible to get_buffer().
Tomohiro Kusumi [Tue, 13 Dec 2016 16:50:18 +0000 (01:50 +0900)]
sbin/hammer: Remove redundant volume arg in find_buffer()
Since buf_offset is (and supposed to be) a canonical zone-2 offset,
the function can retrieve volume_info* via decoded id from offset.
This is also for the next commit.
Tomohiro Kusumi [Tue, 13 Dec 2016 18:21:42 +0000 (03:21 +0900)]
sbin/hammer: Fix known bug in full scan recovery mentioned in f2dd4b0c
As mentioned in 3d900665, introducing scan range limit by default,
and preserving the original default behavior as full scan mode
worked around a bug mentioned in f2dd4b0c, but possible assertion
error (by having access to nonexistent volumes) has still been
there as far as full scan mode is concerned.
This commit is to fix that.
Tomohiro Kusumi [Mon, 12 Dec 2016 16:04:50 +0000 (01:04 +0900)]
sbin/hammer: Fix rename printfs to differentiate recover paths
This helps understand recovery path from stdout.
It doesn't really matter if the change makes any sense to real users,
because most printfs by this command aren't understandable anyway
unless one looks at the code.
Tomohiro Kusumi [Mon, 12 Dec 2016 06:44:55 +0000 (15:44 +0900)]
sbin/hammer: Use big-block append offset to limit recovery scan range
This commit is to fix a remaining issue mentioned in e3cefcca,
which recovers irrelevant files from old filesystem even with the
scan range limit introduced by e3cefcca and quick scan mode
introduced by e819b271.
As shown in an example below, whenever a filesystem is recreated
and the current one uses less space than the old filesystem, the
command is likely to recover files from old filesystem (even with
e3cefcca and e819b271), because B-Tree big-blocks could have nodes
from old filesystem after their append offset, especially if the
block is the last one in B-Tree zone.
In order to avoid recovery of irrelevant files, the command needs
to check if scanning offset is beyond append offset of the B-Tree
big-block that contains this offset, and ignore all nodes beyond
the append offset. [*] shows this situation. Note that the append
offset is checked only if layer1/2 entries that point to this
B-Tree big-block have good CRC result.
This applies to both default and quick scan mode, but not to full
scan mode. Full scan scans everything no matter what.
--------------------------------------------------------> offset
|--------------------------------------------------| volume size
|<----------------------------------------->| previously used
|<---->| previously unused
|<----------------------------------->| currently used
|<---------->| currently unused
... -------------------------->| full scan
... ---------------->| default scan
... --->||<------->||<------->||<--->| default scan [*]
... |<-->| ... |<-->| ... |<-->| quick scan
... |<->| ... |<->| ... |<->| quick scan [*]
===== comparison of recovered files
1. Zero clear the first 1GB of /dev/da1.
# dd if=/dev/zero of=/dev/da1 bs=1M count=1K
1024+0 records in
1024+0 records out
1073741824 bytes transferred in 2.714761 secs (395519867 bytes/sec)
2. Create a filesystem and clone 968MB dragonfly source.
# newfs_hammer -L TEST /dev/da1 > /dev/null
# mount_hammer /dev/da1 /HAMMER
# cd /HAMMER
# git clone /usr/local/src/dragonfly > /dev/null 2>&1
# du -sh .
968M .
# cd
# umount /HAMMER
3. Create a filesystem again with 1 regular file.
# newfs_hammer -L TEST /dev/da1 > /dev/null
# mount_hammer /dev/da1 /HAMMER
# cd /HAMMER
# ls -l
total 0
# echo test > test
# cat ./test
test
# cd
# umount /HAMMER
4-1. Recover a filesystem assuming it only has 1 regular file.
# rm -rf /tmp/a
# hammer -f /dev/da1 recover /tmp/a recover > /dev/null
# cat /tmp/a/PFS00000/test
test
# tree /tmp/a | wc -l
19659
# du -a /tmp/a | grep obj_0x | wc -l
19661
4-2. Do the same as 4-1 using this commit.
# rm -rf /tmp/b
# hammer -f /dev/da1 recover /tmp/b recover > /dev/null
# cat /tmp/b/PFS00000/test
test
# tree /tmp/b
/tmp/b
`-- PFS00000
    `-- test
1 directory, 1 file
#
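The append-offset check described in this message can be sketched as below. struct bigblock_info and should_scan_node() are hypothetical, and treating offsets at or past the append offset as stale is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of one B-Tree big-block, for illustration only. */
struct bigblock_info {
	uint64_t base;		/* offset where the big-block starts */
	uint32_t append_off;	/* append offset within the big-block */
	bool crc_ok;		/* layer1/2 entries have a good CRC result */
};

/*
 * Decide whether a node at the given scanning offset should be examined.
 * The append offset is only trusted when the layer1/2 CRC is good;
 * otherwise the node is scanned as before.
 */
static bool
should_scan_node(const struct bigblock_info *bb, uint64_t offset)
{
	if (!bb->crc_ok)
		return (true);
	return ((offset - bb->base) < bb->append_off);
}
```

Full scan mode would bypass such a check entirely, since it scans everything no matter what.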
Matthew Dillon [Mon, 12 Dec 2016 02:08:20 +0000 (18:08 -0800)]
kernel - Re-fix chromebook keyboard
* The elantec commit broke the chromebook keyboard. Re-fix it.
Hopefully elantec support still works.
Sascha Wildner [Sun, 11 Dec 2016 17:59:28 +0000 (18:59 +0100)]
installer: Fix source directory specification.
It got broken in f7df6c8e7a.
While here, pass -o to all backend invocations.
Sascha Wildner [Sun, 11 Dec 2016 17:59:07 +0000 (18:59 +0100)]
installer: Rename is_livecd -> is_installmedia.
Tomohiro Kusumi [Sat, 10 Dec 2016 18:30:05 +0000 (03:30 +0900)]
sys/vfs/hammer: Remove redundant function btree_max_elements()
36211fc6 in 2015 could/should have removed this.
Tomohiro Kusumi [Sat, 10 Dec 2016 18:02:46 +0000 (03:02 +0900)]
sbin/hammer: Cleanup hammer recover
Separate debug code into a different inlined function.
Tomohiro Kusumi [Sat, 10 Dec 2016 06:21:00 +0000 (15:21 +0900)]
sbin/hammer: Add full mode for hammer recover to revive full scan
This commit revives the original full scan recovery by adding full
option, after the previous commit introduced offset limit.
Apparently, both full option and quick option can't be specified.
To summarize 3 modes,
1. default - Full scan, but only up to the last big-block being used.
2. full - Full scan, which scans the entire fs image with no limit.
3. quick - B-Tree only scan, plus associated records in other zones.
1. was introduced (by the previous commit) to fix a bug, as well as
to avoid irrelevant files.
2. was introduced (by this commit) to revive the original full scan
recovery behavior, which is by far the slowest, but most reliable
in terms of recovery except for the above bug.
3. was introduced (by e819b271) to speed up the recovery process,
provided B-Tree zone is not corrupted. This is the fastest.
Tomohiro Kusumi [Sat, 10 Dec 2016 05:41:43 +0000 (14:41 +0900)]
sbin/hammer: Use last active big-block to limit recovery scan range
This commit is to fix a bug mentioned in f2dd4b0c. This commit
uses offset of the last active big-block (big-block with maximum
zone-2 offset whose layer2->zone is none of 0, 4, 15), as an upper
limit of scan range, so it doesn't scan beyond actual consumption.
Note that this upper limit is used only if all layer1/2 entries
have correct CRC values. Otherwise the entire image is scanned
as usual (unless quick option is used).
Note that this upper limit doesn't necessarily equal a big-block
before the first unused big-block offset (i.e. layer2->zone == 0),
because reblock could locate unused big-block between used ones.
Note that using the upper limit also tries to avoid recovery of
irrelevant files from old filesystem that could exist beyond the
upper limit (if not perfect). It also speeds up recovery process.
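Finding the last active big-block can be sketched as a scan over per-big-block zone values. The array-based representation and the function names are hypothetical; only the rule that zones 0, 4 and 15 do not count as active comes from the message above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A big-block counts as active unless its layer2 zone is 0, 4 or 15. */
static bool
zone_is_active(int zone)
{
	return (zone != 0 && zone != 4 && zone != 15);
}

/*
 * Return the index of the last active big-block, or -1 if there is none.
 * The scan does not stop at the first unused big-block, because an unused
 * big-block can sit between used ones.
 */
static long
last_active_bigblock(const uint8_t *zones, size_t count)
{
	long last = -1;
	size_t i;

	for (i = 0; i < count; ++i) {
		if (zone_is_active(zones[i]))
			last = (long)i;
	}
	return (last);
}
```

With zones { 4, 8, 9, 0, 10, 0, 15 } the upper limit is index 4, even though index 3 is unused.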
Tomohiro Kusumi [Fri, 9 Dec 2016 17:23:49 +0000 (02:23 +0900)]
sbin/hammer: Minor fix for hammer recover quick mode
* Remove assert(b); since it's totally possible that the whole
B-Tree zone was corrupted and nothing was found.
* Print B-Tree zone info only when using quick option.
* Rename a local variable limit to zone_limit for the next commit.
Matthew Dillon [Fri, 9 Dec 2016 21:46:46 +0000 (13:46 -0800)]
smbfs - Fix rename operation
* The rename operation was not updating smbfs's internal name hash.
Properly update the hash.
Reported-by: dflyum (Uwe Muenzberg)
Sascha Wildner [Fri, 9 Dec 2016 19:13:00 +0000 (20:13 +0100)]
newfs_msdos(8): Sync with FreeBSD.
* New options: '-C size' to create an empty image of the specified
size and '-@ offset' to add the image at the specified offset.
* Separate some parts into mkfs_msdos.c for later perusal by
makefs(8), which we have yet to bring in.
* Numerous improvements and bug fixes.
* Raise WARNS to 6.
Taken-from-and-thanks-to: FreeBSD and NetBSD
Imre Vadász [Fri, 9 Dec 2016 18:04:45 +0000 (19:04 +0100)]
vgapci: There is no drmn driver in DragonFly, there is only drm.
* So no need to allocate a child device at vgapci for a non-existent
drmn.
Tomohiro Kusumi [Fri, 9 Dec 2016 15:47:58 +0000 (00:47 +0900)]
sbin/hammer: Fix typo from 14331391
One could take this as a typo for both "covers" and "converts",
and "covers" is the right one.
Tomohiro Kusumi [Thu, 8 Dec 2016 10:05:42 +0000 (19:05 +0900)]
sbin/hammer: Add quick mode for hammer recover
Since hammer recover command tries to recover filesystem data
based on assumption on ondisk data bytes that look like B-Tree
nodes/elms, the command can tell the recovery process is done
once scanning offset gets to the point where there is no more
big-blocks for B-Tree zone, without scanning through the whole
address space of all volumes (provided B-Tree zone is alive).
By specifying quick option after the target directory option,
this command makes use of B-Tree big-block info prefetched before
recovery process, and stops recovery once all B-Tree big-blocks
are scanned. As shown in below example, this makes recovery
much faster by cutting unnecessary I/Os.
The drawback is that quick mode is based on assumption that
B-Tree zone isn't corrupted. If B-Tree zone is somehow corrupted,
prefetched info is incomplete or totally wrong, so one needs
to linearly scan the whole address space of all volumes to
check every possible B-Tree nodes/elms without using quick mode
which is what's been done by default.
-- example of default and quick mode
# newfs_hammer -L TEST /dev/da1 > /dev/null
# mount_hammer /dev/da1 /HAMMER
# cd /HAMMER
# git clone /usr/local/src/dragonfly > /dev/null 2>&1
# cd
# umount /HAMMER
# time hammer -f /dev/da1 recover /tmp/a > /dev/null
hammer -f /dev/da1 recover /tmp/a > /dev/null 309.51s user 122.96s system 21% cpu 33:50.17 total
# cd /tmp/a/PFS00000/dragonfly/sys/vfs/hammer
# make > /dev/null 2>&1; echo $?
0
# file hammer.ko
hammer.ko: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped
# time hammer -f /dev/da1 recover /tmp/b quick > /dev/null
hammer -f /dev/da1 recover /tmp/b quick > /dev/null 0.41s user 3.41s system 14% cpu 26.652 total
# cd /tmp/b/PFS00000/dragonfly/sys/vfs/hammer
# make > /dev/null 2>&1; echo $?
0
# file hammer.ko
hammer.ko: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped
Tomohiro Kusumi [Thu, 8 Dec 2016 10:32:45 +0000 (19:32 +0900)]
sbin/hammer: Fix whitespace alignment changed by e0d7dd09
d_read needs one more space.
Tomohiro Kusumi [Thu, 8 Dec 2016 10:04:52 +0000 (19:04 +0900)]
sbin/hammer: Minor cleanup for hammer blockmap
François Tigeot [Thu, 8 Dec 2016 08:42:08 +0000 (09:42 +0100)]
drm/i915: Update to Linux 4.5
* Mostly bugfixes. Lots and lots of bugfixes.
* Skylake and Broxton support improvements
* Initial Kabylake support
Tomohiro Kusumi [Wed, 7 Dec 2016 18:27:39 +0000 (03:27 +0900)]
sbin/hammer: Use HAMMER_OBJID_ROOT
zrj [Mon, 5 Dec 2016 18:11:24 +0000 (20:11 +0200)]
Revamp alt compiler handling for clang 3.9.1 import.
Many users are still constantly asking whether the llvm/clang compiler could be
added into base as an alternative to the currently used ones (gcc50 and gcc47).
There are few issues in doing that:
* It is very hard to keep both compiler flavors in harmony while one or the
other is being updated. There were always two base compilers in base system
in DragonFly and common practice still is to replace previous alternative
compiler with an updated version, performing tests and then flip them up.
* With clang introduction, this scheme would break badly due to both using
slightly different c++ capabilities, flags support (WARNS mechanism), etc.
* Different incompatible libraries libLLVM + libc++ vs libstdc++, also clang
requiring a lot of effort to rewrite cmake logic into Makefiles for make(1).
* SBU costs, gcc47 only has ~4min buildtime overhead at -j5 level(i7 laptop)
while even clang38 tests has shown two-fold increase in buildworld time.
* How DPorts infrastructure would handle both flavors? License roadmap?
So to make compromise it was chosen to provide a way for users and developers
to select the alternative compiler they like while keeping all groups happy,
ones continuing to enjoy the very fast world rebuilds and others having a way
to further develop and integrate clang into the infrastructure. Since DragonFly
is currently x86_64 only, we might as well experiment more with compilers.
This changeset adds some flexibility when it comes to handling base system
compilers. Even if it would be decided that clang does not fit very well in
DragonFly base system (due to complexities, updating/patching problems and
compilation times), we at least will have a very clean way for adding,
testing and finally making base default upcoming gcc70 and later. All of this
would be possible without disturbing both primary and alternative default
compilers, while developers and users will be testing both base and dports.
Also as a bonus we will be able to add compilers like pcc and scc that have
no native c++ frontend support too while reusing default compiler parts.
For now I am keeping this expansion undocumented and candidate for a revert.
While there, mark a few places for further work to reduce the amount of ORDER: for
faster/better parallelism in btools/ctools bootstrapping stages.
Bootstrap is still fine from DragonFly 4.0.6-RELEASE.
Matthew Dillon [Tue, 6 Dec 2016 23:11:12 +0000 (15:11 -0800)]
docs - Modernize swapcache(8)
* Give swapcache(8) an update taking into account our growing knowledge of
the capabilities and limitations of flash storage.
Matthew Dillon [Tue, 6 Dec 2016 22:36:32 +0000 (14:36 -0800)]
hammer - Disallow modifying ioctls when filesystem is read-only
* Disallow modifying ioctls if the filesystem has been mounted read-only
or gone into read-only mode due to an I/O error.
* This is only a partial fix. There are still error-pathing problems
in numerous procedures, particularly the node locking code, that might
result in a token live-lock.
Reported-by: Peter Avalos
Matthew Dillon [Tue, 6 Dec 2016 22:34:24 +0000 (14:34 -0800)]
libc - Take care of minor buffer overrun in link_ntoa()
* Take care of a minor buffer overrun in link_ntoa(). It is unlikely
that any program produces the conditions required to trigger the
problem.
Taken-from: FreeBSD-SA-16:37.libc
Reported-by: swildner, zrj, others
Tomohiro Kusumi [Tue, 6 Dec 2016 18:19:14 +0000 (03:19 +0900)]
sbin/hammer: Fix direntry message in hammer recover
name could have already been free'd, so move it to the beginning.
Also enable it only on -v, just like inode/data rectype cases.
Tomohiro Kusumi [Tue, 6 Dec 2016 17:47:04 +0000 (02:47 +0900)]
sbin/hammer: Fix inode/data messages in hammer recover
Based on other printf messages where "file" indicates regfile,
the first one should be "inode" rather than "file" because it
could be both directory and regfile.
The second one could be "file" because it's for file data, but
just sync with the first format.
Tomohiro Kusumi [Tue, 6 Dec 2016 13:14:06 +0000 (22:14 +0900)]
sbin/hammer: Minor cleanup for hammer recover
The reason for moving "info.pfs_id = dict->pfs_id;" is because
PT_FIGURE only requires strlen of "PFS%05d" (max 65535), and
dict->pfs_id never changes during path lookup by design.
Matthew Dillon [Tue, 6 Dec 2016 18:13:11 +0000 (10:13 -0800)]
dntpd - Fix memory leak
* Every log line leaked a bit of memory. Fixed.
zrj [Tue, 6 Dec 2016 08:14:32 +0000 (10:14 +0200)]
vkernel: Add a dummy cpu_smp_stopped() function (unbreaks build).
Follow-up to 63cff0361caa40216fcb16f79855de833431274b
Matthew Dillon [Tue, 6 Dec 2016 00:49:04 +0000 (16:49 -0800)]
kernel - Increase worst-case maximum exec rate
* The pid reuse algorithm limits the maximum fork rate. This limit
was set too low. Increase the limit from 10000/sec to 100000/sec.
Currently our opteron maxes out at around 43000/sec
(vfork/exec/wait3/exit of a small static binary).
Note that with 999999 pids and a 10-second mandatory reuse time
floor there isn't much of a point increasing the limit beyond
100000/sec.
* The domain reuse array was increased to 1MB to accommodate this
change. In addition, update the array in a cache-friendly manner.
* Modify test/sysperf/exec1 to take a nprocesses argument for the
timing run.
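The reasoning behind the 100000/sec ceiling can be checked with simple arithmetic. The sketch below uses only the two figures quoted in the commit message; the variable names are made up for illustration and are not kernel identifiers.

```python
PID_SPACE = 999999       # number of pids, as quoted in the commit message
REUSE_FLOOR_SECS = 10    # mandatory reuse time floor, as quoted

# A pid cannot be handed out again until the reuse floor has passed,
# so the sustained fork rate is bounded by pids / window.
max_sustained_rate = PID_SPACE / REUSE_FLOOR_SECS
print(int(max_sustained_rate))  # 99999
```

A rate limit above this value could not be sustained anyway, because the pid space would run dry before the first pids became reusable.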
Matthew Dillon [Mon, 5 Dec 2016 23:26:46 +0000 (15:26 -0800)]
kernel - Remove unused process_exit and process_exec eventhandlers
* Remove these two eventhandlers. They are not used in DragonFly.
* Fixes an unnecessary global lock bottleneck in exec and exit.
Suggested-by: Mateusz Guzik (mjg_)
Matthew Dillon [Mon, 5 Dec 2016 23:07:43 +0000 (15:07 -0800)]
kernel - Spiff up locks a bit
* Do a little optimization of _spin_lock_contested(). The critical path
is able to avoid two atomic ops in the initialization portion of the
contested path.
* Optimize _spin_lock_shared_contested() to use atomic_fetchadd_long()
to add a shared-lock count instead of atomic_cmpset_long(). Shared
spinlocks are used heavily and this will prevent a lot of unnecessary
spinning when many cpus are using the same lock at the same time.
* Hold fdp->fd_spin across fdp->fd_cdir and fdp->fd_ncdir modifications.
This completes other work which caches fdp->fd_ncdir and avoids having
to obtain the spin-lock when the cache matches.
Discussed-with: Mateusz Guzik (mjg_)
Matthew Dillon [Mon, 5 Dec 2016 23:01:10 +0000 (15:01 -0800)]
kernel - Make kern_proc cache-friendly
* Make the proc_tokens[], allprocs[], allpgrps[], and allsessn[]
arrays cache-friendly by aggregating them into a cache-aligned
struct procglob.
* Doesn't do much for the token array, but should help
allprocs/allpgrps/allsessn scans whose structures were previously
8-byte aligned.
Tomohiro Kusumi [Sun, 4 Dec 2016 09:57:15 +0000 (18:57 +0900)]
sbin/hammer: Add hammer strip command
This command is inspired by hammer recover command, and does
opposite of what recover command does.
This command zero clears zone-8(B-Tree) big-blocks, zone-9(meta)
big-blocks, and then the whole volume header, except that volume
signature field is overwritten with "STRIPPED" instead of zeros.
After running, a filesystem is no longer mountable or recoverable
with hammer recover command. This command is also fast as it only
zero clears good enough ondisk data to make it unmountable and
unrecoverable.
Keep in mind that this command does _not_ zero clear user data.
Users would normally use software designed to completely shred
a filesystem. This command is not designed to shred a filesystem.
The name "strip" gives better idea of what it really does than
using "shred"/etc.
-- example
# newfs_hammer -L TEST /dev/da1 /dev/da2 /dev/da3 > /dev/null
# mount_hammer /dev/da1:/dev/da2:/dev/da3 /HAMMER
# cd /HAMMER
# dd if=/dev/urandom of=./out bs=1M count=120000
120000+0 records in
120000+0 records out
125829120000 bytes transferred in 1766.417077 secs (71234094 bytes/sec)
# cd
# umount /HAMMER
# hammer -f /dev/da1:/dev/da2:/dev/da3 strip
You have requested that HAMMER filesystem (TEST) be stripped
Do you really want to do this? [y/n] y
Stripping HAMMER filesystem (TEST) in 5 4 3 2 1.. starting destruction pass
8000000021000000
9000000021800000
800000019c000000
800000030c000000
800000047e000000
80000005f7000000
8000000767000000
80000008d8000000
8000000a51800000
8000000bc5000000
8000000d37800000
8000000ead000000
800000101e800000
8000001193000000
8000001304000000
8000001478800000
80000015ee000000
8000001760800000
80000018d1800000
8000001a47000000
8000001bb6000000
801000013c000000
/dev/da1
/dev/da2
/dev/da3
# mount_hammer /dev/da1:/dev/da2:/dev/da3 /HAMMER
mount: Invalid argument
mount_hammer: /dev/da1: Invalid volume signature 4445505049525453
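The hex value reported by mount_hammer above is consistent with the "STRIPPED" signature described in this commit. The sketch below assumes the signature is an 8-byte field printed as a 64-bit integer on a little-endian machine (DragonFly is x86_64 only); that reading is an assumption made here for illustration, not something stated in the commit message.

```python
# Value copied from the mount_hammer error message above.
signature = 0x4445505049525453

# Assumption: the field is 8 bytes wide and stored little-endian on disk.
raw = signature.to_bytes(8, "little")
print(raw.decode("ascii"))  # STRIPPED
```

Read in big-endian order, the same bytes spell "DEPPIRTS", which is why the number looks reversed at first glance.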
Tomohiro Kusumi [Sun, 4 Dec 2016 09:26:22 +0000 (18:26 +0900)]
sbin/hammer: Make hammer_parsedevs() take open(2) flag
This is for the next commit.
No functional change.
Tomohiro Kusumi [Sun, 4 Dec 2016 14:44:00 +0000 (23:44 +0900)]
sbin/hammer: Add "[y/n]" before getyn()
Tomohiro Kusumi [Sun, 4 Dec 2016 09:21:17 +0000 (18:21 +0900)]
sbin/hammer: Fix recursively called hammer_parsedevs()
c2b74c42 had to change recursively called hammer_parsedevs() as well.
Matthew Dillon [Mon, 5 Dec 2016 17:25:56 +0000 (09:25 -0800)]
kernel - Remove debugging kprintf
* Remove the 'exit race handled' debugging kprintf.
Matthew Dillon [Mon, 5 Dec 2016 17:21:19 +0000 (09:21 -0800)]
kernel - Try to idle cpus when in panic()
* Try to use MONITOR/MWAIT to idle cpus while they are stopped in a panic(),
instead of hard-looping. This significantly reduces power consumption while
in a panicked state and is particularly helpful on laptops.
Reported-by: tuxillo
Matthew Dillon [Mon, 5 Dec 2016 17:15:44 +0000 (09:15 -0800)]
kernel - more kmalloc and nlookup performance optimizations
* Give the pcpu counters in struct malloc_type their own cache line per
cpu. This removes a large kmalloc/kfree bottleneck on multi-socket
systems.
* Avoid having to ref, lock, and GETATTR intermediate directory components
in nlookup() by adding the NCF_WXOK flag. This flag is set in the ncp
when the directory permissions are at least 555. This saves significant
overhead in all situations, including single-threaded.
Discussed-with: Mateusz Guzik (mjg_)
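The "at least 555" condition for NCF_WXOK can be illustrated with a mode-bit test. This is a userland sketch, not the kernel code: the function name is hypothetical, and any further conditions the kernel may apply before setting the flag are not described in the commit message and are not modeled here.

```python
def perms_at_least_555(mode):
    """True when owner, group and other all have read and execute bits set."""
    return (mode & 0o555) == 0o555

for mode in (0o755, 0o555, 0o750, 0o711):
    print(oct(mode), perms_at_least_555(mode))
```

Directories such as 0755 and 0555 pass the test, while 0750 and 0711 do not, since at least one class lacks a read or execute bit.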