Sepherosa Ziehau [Thu, 29 Dec 2016 12:47:10 +0000 (20:47 +0800)]
ifq: Factor out if_classq from altq_classq and use it for default ifq.
This reduces the memory footprint of the default ifq and could be used
by the upcoming "flow" of FQ-CoDel.
Matthew Dillon [Sat, 7 Jan 2017 03:25:15 +0000 (19:25 -0800)]
kernel - Fix swap issue, implement dynamic pmap PT/PD/PDP deletion (4)
* Track down and fix another bug. pmap_dynamic_delete was imploding
higher level page table pages still being held by higher call levels.
Add a pv_hold count check to avoid the situation.
Matthew Dillon [Sat, 7 Jan 2017 02:06:14 +0000 (18:06 -0800)]
kernel - Implement CPU localization hinting for low level page allocations
* By default vm_page_alloc() and kmem_alloc*() localize to the calling cpu.
* A cpu override may be passed in the flags to make these functions localize
differently.
* Currently implemented as a test only for the pcpu globaldata, idle
thread, and stacks for kernel threads targeted to specific cpus.
Matthew Dillon [Sat, 7 Jan 2017 00:04:32 +0000 (16:04 -0800)]
kernel - Fix swap issue, implement dynamic pmap PT/PD/PDP deletion (3)
* More pmap fixes. Fix a bug introduced by the original commit that
could still create managed PT/PD/PDP page tables in kernel_pmap when
doing a wiring change.
* Assert that we never create managed PT/PD/PDP page tables in kernel_pmap.
Managed PTEs can still be created (e.g. for pageable kernel memory).
* Add sysctl pmap_dynamic_delete, which defaults to enabled. This sysctl
can be set to 0 to disable dynamic deletion of PT/PD/PDP pages in user
pmaps.
Matthew Dillon [Fri, 6 Jan 2017 20:48:49 +0000 (12:48 -0800)]
kernel - Fix swap issue, implement dynamic pmap PT/PD/PDP deletion (2)
* Fix bug in the PT/PD/PDP code. pmap_allocpte() was improperly trying
to create managed entities for higher-level kernel page tables, which
implodes the kernel. The kernel manages these entities itself.
Matthew Dillon [Fri, 6 Jan 2017 16:35:01 +0000 (08:35 -0800)]
kernel - vmm_init() must run after SMP startup
* vmm_init() must run after SMP startup (fix bug introduced by recent
commits).
* cleanup.
Sepherosa Ziehau [Fri, 6 Jan 2017 14:51:06 +0000 (22:51 +0800)]
ifq: Switch to drop-head for default enqueue method.
This is consistent w/ upcoming CoDel support.
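As an illustration of what drop-head means, here is a minimal sketch of a bounded queue that discards the oldest entry when full, so the newest packet is always kept. Everything in it (struct pktq, PKTQ_LIMIT, pktq_enqueue_drophead(), integer packet ids standing in for mbufs) is hypothetical and is not the kernel's ifq code.

```c
#include <assert.h>
#include <stddef.h>

#define PKTQ_LIMIT 4            /* hypothetical queue limit */

struct pktq {
    int    pkts[PKTQ_LIMIT];    /* packet ids stand in for mbufs */
    size_t head;                /* index of the oldest packet */
    size_t len;                 /* number of queued packets */
};

/*
 * Enqueue with drop-head semantics: when the queue is full the oldest
 * packet is discarded to make room for the new one.
 * Returns the id of the dropped packet, or -1 if nothing was dropped.
 */
int
pktq_enqueue_drophead(struct pktq *q, int pkt)
{
    int dropped = -1;

    assert(q != NULL && q->len <= PKTQ_LIMIT);
    if (q->len == PKTQ_LIMIT) {
        dropped = q->pkts[q->head];
        q->head = (q->head + 1) % PKTQ_LIMIT;
        q->len--;
    }
    q->pkts[(q->head + q->len) % PKTQ_LIMIT] = pkt;
    q->len++;
    return dropped;
}

/* Dequeue the oldest packet, or return -1 if the queue is empty. */
int
pktq_dequeue(struct pktq *q)
{
    int pkt;

    if (q->len == 0)
        return -1;
    pkt = q->pkts[q->head];
    q->head = (q->head + 1) % PKTQ_LIMIT;
    q->len--;
    return pkt;
}
```

A drop-tail queue would instead reject the new packet when full. Dropping at the head keeps the freshest data in the queue, which is the behaviour CoDel-style schemes build on.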
François Tigeot [Fri, 6 Jan 2017 08:46:52 +0000 (10:46 +0200)]
drm/i915: Update to Linux 4.6
* Skylake and Kabylake support improvements
* FBC (FrameBuffer Compression) now enabled by default on Haswell and
Broadwell GPUs
* PSR (Panel Self Refresh) support improved, now enabled by default on
Valleyview, CherryView, Haswell and Broadwell
* Improved DSI panel support
* HDMI hotplug fixes
* Various bugfixes everywhere
Matthew Dillon [Fri, 6 Jan 2017 03:37:27 +0000 (19:37 -0800)]
kernel - Add NUMA awareness to vm_page_alloc() and related functions (2)
* Fix miscellaneous bugs in the recent NUMA commits.
* Add kern.numa_disable, setting this to 1 in /boot/loader.conf will
disable the NUMA code. Note that NUMA is only applicable on multi-socket
systems.
François Tigeot [Fri, 6 Jan 2017 08:08:39 +0000 (09:08 +0100)]
drm/linux: Add USEC_PER_MSEC definition
Commit proofread-by: zrj
Sepherosa Ziehau [Fri, 6 Jan 2017 07:59:25 +0000 (15:59 +0800)]
if: Defer the if_up until the ifnet.if_ioctl is called.
This ensures the interface is initialized by the interface driver
before it can be used by the rest of the system.
Obtained-from: FreeBSD
Sepherosa Ziehau [Fri, 6 Jan 2017 07:50:10 +0000 (15:50 +0800)]
if: Remove unnecessary critical sections.
Sepherosa Ziehau [Fri, 6 Jan 2017 06:42:06 +0000 (14:42 +0800)]
alc: Add Killer E2500 support.
Sepherosa Ziehau [Fri, 6 Jan 2017 06:29:50 +0000 (14:29 +0800)]
hyperv/vmbus: Fix interrupt timer detection logic.
Sepherosa Ziehau [Fri, 6 Jan 2017 05:48:02 +0000 (13:48 +0800)]
hyperv: Reorder the Hyper-V TSC initialization a bit.
This kinda simplifies the initialization logic.
Matthew Dillon [Fri, 6 Jan 2017 02:08:40 +0000 (18:08 -0800)]
kernel - Add NUMA awareness to vm_page_alloc() and related functions
* Add NUMA awareness to the kernel memory subsystem. This first iteration
will primarily affect user pages. kmalloc and objcache are not
NUMA-friendly yet (and it's questionable how useful it would be to make
them so).
* Tested with synth on monster (4-socket opteron / 48 cores) and a 2-socket
xeon (32 threads). Appears to dole out localized pages 5:1 to 10:1.
Matthew Dillon [Fri, 6 Jan 2017 00:33:49 +0000 (16:33 -0800)]
kernel - Refactor phys_avail[] and dump_avail[]
* Refactor phys_avail[] and dump_avail[] into a more understandable
structure.
Sascha Wildner [Thu, 5 Jan 2017 14:20:50 +0000 (15:20 +0100)]
alc.4: Add Killer E2400 to the list of supported devices.
Taken-from: FreeBSD
Sepherosa Ziehau [Thu, 5 Jan 2017 13:45:52 +0000 (21:45 +0800)]
alc: Sync w/ FreeBSD
Mainly
- Fix DMA selection for AR816x family chips.
- Add Killer E2400 support.
Obtained-from: FreeBSD 277907, 295735 (part), 304574, 304584
Sepherosa Ziehau [Thu, 5 Jan 2017 13:15:06 +0000 (21:15 +0800)]
pci: Add a quirk for chips w/ broken MSI support.
These chips (mainly chips supported by alc(4)) will not send MSI,
if INTxDIS is set.
Obtained-from: FreeBSD
Sascha Wildner [Wed, 4 Jan 2017 07:41:18 +0000 (08:41 +0100)]
<sys/vfscache.h>: Sync enum vtagtype with what we have.
Sascha Wildner [Wed, 4 Jan 2017 07:39:55 +0000 (08:39 +0100)]
Remove portal file system, mount_portal and examples.
It has been broken for a long time I think.
Approved-by: dillon
Matthew Dillon [Tue, 3 Jan 2017 02:48:11 +0000 (18:48 -0800)]
cam - Fix bus registration race
* Fix bus registration race. This race could only occur when
hw.ahci.synchronous_boot is set to 0 (it defaults to 1).
Matthew Dillon [Tue, 3 Jan 2017 01:54:25 +0000 (17:54 -0800)]
vmstat - Make vmstat -m more readable (2)
* Follow up on first commit.
Matthew Dillon [Tue, 3 Jan 2017 01:49:36 +0000 (17:49 -0800)]
kernel - vm_object work
* Adjust OBJT_SWAP object management to be more SMP friendly. The hash
table now uses a combined structure to reduce unnecessary cache
interactions.
* Allocate VM objects via kmalloc() instead of zalloc. Remove the zalloc
pool for VM objects and use kmalloc(). Early initialization of the kernel
does not have to access vm_object allocation functions until after basic
VM initialization.
* Remove a vm_page_cache console warning that is no longer applicable.
(It could be triggered by the RSS rlimit handling code).
Matthew Dillon [Tue, 3 Jan 2017 01:47:23 +0000 (17:47 -0800)]
kernel - Add kmalloc_set_unlimited()
* Add kmalloc_set_unlimited() to more trivially unlimit a kmalloc pool.
Matthew Dillon [Tue, 3 Jan 2017 01:46:22 +0000 (17:46 -0800)]
vmstat - Make vmstat -m more readable
* Make vmstat -m more readable by converting to appropriate units.
* Shorten some of the malloc_type pool names.
Matthew Dillon [Tue, 3 Jan 2017 01:42:55 +0000 (17:42 -0800)]
nvme - Adjust manual page
* Adjust the last paragraph describing EFI booting to match our current
state of affairs.
Matthew Dillon [Tue, 3 Jan 2017 01:40:50 +0000 (17:40 -0800)]
kernel - Fix kmalloc pool accounting for M_NETCRED
* Some kfree()'s for M_NETCRED should really have been for M_RTABLE. Fixes
an accounting error that shows up in 'vmstat -m'.
* Rename the kmalloc pool in netinet/ip_encap.c to M_IPENCAP. It was
previously named M_NETCRED and duplicated another pool's name.
Sascha Wildner [Mon, 2 Jan 2017 02:26:31 +0000 (03:26 +0100)]
Sync ACPICA with Intel's version 20161222.
* Fixed a regression where occasionally a valid resource
descriptor was incorrectly detected as invalid at runtime,
and an AE_AML_NO_RESOURCE_END_TAG was returned.
* Fixed a problem with the recently implemented support that
enables control method invocations as Target operands to
many ASL operators. Warnings of this form: "Needed type
[Reference], found [Processor]" were seen at runtime for
some method invocations.
This is the proper fix for 72b7bc0a284cc.
* Enhanced iasl(8) output for Switch/Case statements.
For a more detailed list, please see sys/contrib/dev/acpica/changes.txt.
Matthew Dillon [Mon, 2 Jan 2017 01:52:23 +0000 (17:52 -0800)]
kernel - Fix TRIM bugs in UFS
* Fix serious bug in devfs's implementation of VOP_FREEBLKS. devfs was
running this operation asynchronously, but callers (aka UFS) expect it
to run synchronously.
* Fix minor bug in CAM related to TRIM failures.
* Enforce block count limitations in NVMe for WRITEZ.
* Mostly applicable to NVMe, which will implement FREEBLKS using the WRITEZ
command (at least for now). Trim is disabled on SATA SSDs by default in
the driver.
Fixes UEFI booting issues with NVMe when using a UFS /boot. Writing or
updating the UFS /boot mounted via NVMe resulted in a corrupt partition due
to the asynchronous VOP_FREEBLKS that we fixed above.
Reported-by: mneumann.
Matthew Dillon [Sun, 1 Jan 2017 21:20:49 +0000 (13:20 -0800)]
kernel - Fix bugs in recent RSS/swap commits
* Refactor the vm_page_try_to_cache() call to take a page already busied,
and fix a case where it was previously being called improperly that left
a VM page permanently busy.
Matthew Dillon [Sat, 31 Dec 2016 00:36:29 +0000 (16:36 -0800)]
vmstat - (-m) Make large values more readable
* Display values > 99M in megabytes instead of kilobytes. Makes everything
a whole lot easier to read. For vmstat -m
Sascha Wildner [Sun, 1 Jan 2017 04:04:17 +0000 (05:04 +0100)]
Bump copyrights.
Sascha Wildner [Sat, 31 Dec 2016 00:45:17 +0000 (01:45 +0100)]
humanize_number.3: Fix typo.
Matthew Dillon [Sat, 31 Dec 2016 00:08:50 +0000 (16:08 -0800)]
df, pstat - Use HN_FRACTIONAL
* Use the new HN_FRACTIONAL to display fractional digits in a better
way than HN_DECIMAL. The general problem being solved is that in
numerous cases HN_DECIMAL would only display two digits, which is
not enough precision.
For example, if you have a 32.3G volume it would previously display as
32G, and will now display as 32.3G. If I configured 14.6TB of swap it
would previously display as 14T and will now display as 14.6T.
Matthew Dillon [Sat, 31 Dec 2016 00:07:04 +0000 (16:07 -0800)]
libutil - Add HN_FRACTIONAL to humanize_number()
* Add HN_FRACTIONAL to humanize_number(). This is an expanded HN_DECIMAL
mode. Up to two additional fractional digits will be displayed if they
would fit in the buffer. Fractional digits are not displayed for small
numbers (less than 1000 or less than 1024 depending on the mode).
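The effect can be sketched with a small stand-alone formatter. This is a hypothetical helper, not the libutil humanize_number() implementation: it always divides by 1024, keeps a single truncated fractional digit, and ignores the buffer-fitting logic described above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper, NOT humanize_number(3).
 * Scales a byte count by 1024 and keeps one truncated fractional digit.
 * Values below 1024 are printed without a fractional part.
 * Returns the snprintf(3) result.
 */
int
format_fractional(char *buf, size_t len, uint64_t bytes)
{
    static const char units[] = "BKMGTPE";
    uint64_t whole = bytes;
    uint64_t rem = 0;
    int idx = 0;

    assert(buf != NULL && len > 0);
    while (whole >= 1024 && idx < 6) {
        rem = whole % 1024;     /* remainder at the current scale */
        whole /= 1024;
        idx++;
    }
    if (idx == 0)
        return snprintf(buf, len, "%lluB", (unsigned long long)whole);
    return snprintf(buf, len, "%llu.%llu%c",
        (unsigned long long)whole,
        (unsigned long long)(rem * 10 / 1024),  /* tenths, truncated */
        units[idx]);
}
```

With this sketch, format_fractional(buf, sizeof(buf), (uint64_t)33076 << 20) yields "32.3G", where a formatter limited to whole numbers would print 32G.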
Matthew Dillon [Fri, 30 Dec 2016 22:47:16 +0000 (14:47 -0800)]
libkvm - Interim solution to boost swap statistics fields
* Change ksw_used and ksw_total to unsigned, which increases the maximum
total swap that can be displayed properly from ~8TB to ~16TB.
This is an interim solution, since DragonFly now supports more than 16TB
of swap.
* Noticed when the pstat output was mangled after I configured 16TB of
actual honest to god swap space.
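The ~8TB and ~16TB figures are consistent with a 32-bit field that counts 4KB pages. Both the page size and the field width are assumptions made for this sketch and are not stated in the commit message.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_BYTES 4096ULL      /* assumed page size */

/* Convert a page count to bytes, guarding against 64-bit overflow. */
uint64_t
pages_to_bytes(uint64_t pages)
{
    assert(pages <= UINT64_MAX / PAGE_BYTES);
    return pages * PAGE_BYTES;
}

/* Largest byte count a signed 32-bit page counter can describe (~8TB). */
uint64_t
max_swap_bytes_signed(void)
{
    return pages_to_bytes((uint64_t)INT32_MAX);
}

/* Largest byte count an unsigned 32-bit page counter can describe (~16TB). */
uint64_t
max_swap_bytes_unsigned(void)
{
    return pages_to_bytes((uint64_t)UINT32_MAX);
}
```

Under these assumptions the signed limit is one page short of 2^43 bytes and the unsigned limit is one page short of 2^44 bytes, which is why a 16TB configuration still does not fit.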
Matthew Dillon [Fri, 30 Dec 2016 20:21:26 +0000 (12:21 -0800)]
kernel - Fix swap issue, implement dynamic pmap PT/PD/PDP deletion
* The pmap code is now able to dynamically destroy PT, PD, and PDP
page table pages when they become empty. To do this we had to
recode the higher-level page tables to wire on creation of a lower-level
pv_entry instead of wiring on pte entry.
DragonFly previously left PD and PDP pages intact, and possibly also PTs,
until process exit. In normal operation this had no real impact since
most programs don't bloat up enough for the extra page table pages to
matter, but it's good to finally fix it as it allows the pmap footprint
to be significantly reduced in the very few situations where a program
bloats and unbloats during operation.
* Fix an issue with recent swap changes. We must increase the stripe
between multiple swap devices to match the number of entries available
on a radix leaf, which increased from 32 to 64. This fixes a pstat -s
accounting error that would sometimes attribute swap frees to the wrong
device.
* Refactor the RSS limiting code to scan the pmap instead of scan the
vm_map and related underlying objects. This greatly enhances performance
because the underlying objects might have many pages that are not mapped.
By scanning the pmap, we avoid having to sift through them all.
Also makes use of the dynamic removal feature in the pmap code to restrict
the effort required to do the pmap scan, and allows us to avoid most of
the issues related to stacked VM objects.
Sascha Wildner [Fri, 30 Dec 2016 15:10:22 +0000 (16:10 +0100)]
Raise WARNS to 3 for sftp(1) and sftp-server(8).
Sascha Wildner [Fri, 30 Dec 2016 14:46:56 +0000 (15:46 +0100)]
chmod.c: Remove mention of POSIX in a comment.
POSIX doesn't specify -h.
Submitted-by: Sevan Janiyan
Taken-from: NetBSD
Dragonfly-bug: <http://bugs.dragonflybsd.org/issues/2948>
Sascha Wildner [Thu, 29 Dec 2016 23:10:13 +0000 (00:10 +0100)]
zone.9: Adjust for the removal of the 'zalloc' arg to zinit/zinitna.
Sascha Wildner [Thu, 29 Dec 2016 23:04:38 +0000 (00:04 +0100)]
nlookup.9: Adjust for the removal of nlookup_set_cred().
Matthew Dillon [Wed, 28 Dec 2016 22:07:02 +0000 (14:07 -0800)]
kernel - Add flexibility to the RSS rlimit
* Add sysctl vm.pageout_memuse_mode, defaulting to 1:
0 - disable (behavior prior to memoryuse rlimit commits). RLIMIT_RSS
is ignored. Pagedaemon operates normally based on global page
queues.
1 - passive mode (default). Pagedaemon operates normally, but additional
actions are taken for processes exceeding their RLIMIT_RSS.
Enforces RSS on a per-process basis by removing pages from the pmap,
but simply deactivates the page and does not synchronously free it
or page it out to swap. The deactivated pages are more likely to be
cleaned out by the pagedaemon versus what it would
normally choose from the global page queues.
This mode has the smoothest results for the process being limited,
as well as a lower impact on actual paging to swap, but this mode
has a similar impact on allocatable memory for other unrelated
processes if the limited process continues to allocate large amounts
of memory.
2 - active mode. Pagedaemon operates normally, but additional actions
are taken for processes exceeding their RLIMIT_RSS.
Enforces RSS on a per-process basis by actively freeing clean pages
and actively paging out dirty pages. This has the least impact on
other unrelated processes but can cause the limited process to stall
for short periods of time. This mode has the least impact on
allocatable memory.
However, this mode can cause excessive paging to swap, and thus is
not the default.
Matthew Dillon [Wed, 28 Dec 2016 20:04:03 +0000 (12:04 -0800)]
kernel - Increase KVM from 128G to 511G, further increase maximum swap
* Increase KVM (Kernel Virtual Memory) to the maximum we currently
support. Up to half of it can be used for swblock structures
(SWAPMETA in vmstat -z). This allows the following swap maximums.
128G of ram - 15TB of data can be swapped out.
256G of ram - 30TB of data can be swapped out.
512G+ of ram - 55TB - this is the maximum we can support swapped out.
* We can support > 512G of KVM in the future with only a bit of work on
how KVM is reserved.
* Remove some debugging code.
Matthew Dillon [Wed, 28 Dec 2016 20:03:36 +0000 (12:03 -0800)]
kernel - Cleanup swap comments
* Cleanup some incorrect comments
Sascha Wildner [Wed, 28 Dec 2016 14:46:54 +0000 (15:46 +0100)]
kernel: Fix a -Wundef warning.
Sascha Wildner [Wed, 28 Dec 2016 14:46:06 +0000 (15:46 +0100)]
<errno.h>: Generally include <sys/cdefs.h>.
I forgot this change in 329111df7bb1e6c835b3e1835b384ecf2dd3aaf7.
Sepherosa Ziehau [Wed, 28 Dec 2016 13:02:26 +0000 (21:02 +0800)]
tcp: Fix connect to INADDR_ANY.
Reported-by: mneumann
DragonFly-bug: http://bugs.dragonflybsd.org/issues/2973
Matthew Dillon [Wed, 28 Dec 2016 02:34:26 +0000 (18:34 -0800)]
kernel - Implement RLIMIT_RSS, Increase maximum supported swap
* Implement RLIMIT_RSS by forcing pages out to swap if a process's RSS
exceeds the rlimit. Currently the algorithm used to choose the pages
is fairly unsophisticated (we don't have the luxury of a per-process
vm_page_queues[] array).
* Implement the swap_user_async sysctl, default off. This sysctl can be
set to 1 to enable asynchronous paging in the RSS code. This is mostly
for testing and is not recommended since it allows the process to eat
memory more quickly than it can be paged out.
* Reimplement vm.swap_burst_read so the sysctl now specifies the number
of pages that are allowed to be burst. Still disabled by default (will
be enabled in a followup commit).
* Fix an overflow in the nswap_lowat and nswap_hiwat calculations.
* Refactor some of the pageout code to support synchronous direct
paging, which the RSS code uses. The new code also implements a
feature that will move clean pages to PQ_CACHE, making them immediately
reallocatable.
* Refactor the vm_pageout_deficit variable, using atomic ops.
* Fix an issue in vm_pageout_clean() (originally part of the inactive scan)
which prevented clustering from operating properly on write.
* Refactor kern/subr_blist.c and all associated code that uses it to increase
swblk_t from int32_t to int64_t, and to increase the radix supported from
31 bits to 63 bits.
This increases the maximum supported swap from 2TB to some ungodly large
value. Remember that, by default, space for up to 4 swap devices
is preallocated so if you are allocating insane amounts of swap it is
best to do it with four equal-sized partitions instead of one so kernel
memory is efficiently allocated.
* There are two kernel data structures associated with swap. The blmeta
structure which has approximately a 1:8192 ratio (ram:swap) and is
pre-allocated up-front, and the swmeta structure whose KVA is reserved
but not allocated.
The swmeta structure has a 1:341 ratio. It tracks swap assignments for
pages in vm_object's. The kernel limits the number of structures to
approximately half of physical memory, meaning that if you have a machine
with 16GB of ram the maximum amount of swapped-out data you can support
with that is 16/2*341 = 2.7TB. Not that you would actually want to eat
half your ram to actually do that.
A large system with, say, 128GB of ram, would be able to support
128/2*341 = 21TB of swap. The ultimate limitation is the 512GB of KVM.
The swap system can use up to 256GB of this so the maximum swap currently
supported by DragonFly on a machine with > 512GB of ram is going to be
256/2*341 = 43TB. To expand this further would require some adjustments
to increase the amount of KVM supported by the kernel.
* WARNING! swmeta is allocated via zalloc(). Once allocated, the memory
can be reused for swmeta but cannot be freed for use by other subsystems.
You should only configure as much swap as you are willing to reserve ram
for.
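The swmeta arithmetic quoted above can be reproduced with a one-line helper. The function name is hypothetical, and the formula is taken as written in the message (half of the given size, times the 1:341 ratio) without modelling where the KVM cap kicks in.

```c
#include <assert.h>
#include <stdint.h>

#define SWMETA_RATIO 341ULL     /* 1:341 ram:swap ratio from the message */

/*
 * Hypothetical helper: swapped-out data (in GB) that can be tracked when
 * up to half of size_gb is spent on swmeta structures.
 */
uint64_t
max_swap_gb(uint64_t size_gb)
{
    assert(size_gb / 2 <= UINT64_MAX / SWMETA_RATIO);
    return size_gb / 2 * SWMETA_RATIO;
}
```

For the three cases in the text this gives 2728GB for 16, 21824GB for 128 and 43648GB for 256, matching the 2.7TB, 21TB and 43TB figures.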
Matthew Dillon [Wed, 28 Dec 2016 01:30:23 +0000 (17:30 -0800)]
kernel - Do a better job locking CAM ref counts
* Fix some ref-count races inside CAM which can be triggered particularly
by asynchronous AHCI probing.
Imre Vadász [Tue, 27 Dec 2016 21:46:31 +0000 (22:46 +0100)]
drm: Invert del_timer_sync return value, to match behaviour in Linux.
* DragonFly's callout_drain() returns 1 if the function was executed.
Linux's del_timer_sync() returns 1 if the timer was pending and got
cancelled.
So inverting the callout_drain() return value should make del_timer_sync
behave a bit more like the corresponding function in Linux.
* This fixes the behaviour of the "if (del_timer_sync(&domain->timer) == 0)"
check in intel_uncore_forcewake_reset().
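The inversion can be sketched as follows. The fake_* names are hypothetical stand-ins and not the kernel's callout or drm compatibility code. The stub models only the two states described above: the handler either already ran or was still pending.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a timer; not the kernel's struct callout. */
struct fake_timer {
    int already_ran;    /* nonzero if the handler has already executed */
};

/* Stub following the description above: returns 1 if the function was executed. */
int
fake_callout_drain(struct fake_timer *t)
{
    return t->already_ran ? 1 : 0;
}

/*
 * Linux-style result: 1 if the timer was still pending and got cancelled,
 * 0 otherwise. Implemented by inverting the drain result.
 */
int
fake_del_timer_sync(struct fake_timer *t)
{
    assert(t != NULL);
    return !fake_callout_drain(t);
}
```

With the inverted value, a caller that tests for a result of 0 sees it only when the handler had already run, which is the case the forcewake check cares about.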
Sepherosa Ziehau [Sun, 25 Dec 2016 13:01:54 +0000 (21:01 +0800)]
syncache: Simplify port calculation by reusing ACK's hash for IPv4.
Sepherosa Ziehau [Sun, 25 Dec 2016 12:55:57 +0000 (20:55 +0800)]
loopback: Allow turning off RSS.
Sepherosa Ziehau [Sun, 25 Dec 2016 11:19:49 +0000 (19:19 +0800)]
tcp: Save faddr/fport before lport selection.
So that the inpcb installed onto the lport hash can have correct
4-tuple. Reminded by the "Problem #2" in the following FreeBSD PR:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=174087
Sepherosa Ziehau [Sun, 25 Dec 2016 09:20:47 +0000 (17:20 +0800)]
tcp: Nuke the sysctl to disable local port extension.
Matthew Dillon [Fri, 23 Dec 2016 22:38:13 +0000 (14:38 -0800)]
AHCI - Misc fixes
* Reduce chip reset time from 500ms to 250ms to speed up booting on
machines with multiple AHCI controllers.
* Fix a bug in a piece of the error recovery code that was waiting
forever.
* Implement the hw.ahci.synchronous_boot TUNABLE. Setting this variable
to 0 in loader.conf causes the ahci device probe to be fully asynchronous
during booting. This is HIGHLY experimental and not recommended on
systems with only one controller as the kernel may boot too quickly for
the boot drive to probe before the kernel gets to init.
* Do a pass on the ahci.4 manual page.
Sascha Wildner [Thu, 22 Dec 2016 12:36:05 +0000 (13:36 +0100)]
Update the pciconf(8) database.
December 19, 2016 snapshot from http://pciids.sourceforge.net/
Sascha Wildner [Thu, 22 Dec 2016 11:29:45 +0000 (12:29 +0100)]
libc: Include <unistd.h> for ftell/ftruncate/truncate prototypes.
Currently, <stdio.h> and <sys/types.h> define them too, so this is
only cosmetic.
While here, fix a case in dump(8) too.
Imre Vadász [Thu, 22 Dec 2016 00:03:45 +0000 (01:03 +0100)]
drm/i915: Fix typo in get_bdb_header(), fixes vbt validity check.
Matthew Dillon [Wed, 21 Dec 2016 22:12:46 +0000 (14:12 -0800)]
ahci - Add workarounds for Marvell 88SE9215
* This Marvell chip also needs some quirks. Probably most of the older
Marvell chips need the same quirks, and the newer ones probably need the
FR cycling quirk, but for now I'm adding them only specifically as
they are tested.
Reported-by: Edward Berger
Matthew Dillon [Wed, 21 Dec 2016 19:14:17 +0000 (11:14 -0800)]
ahci - Improve port-multiplier detection
* Improve port-multiplier detection by adding workarounds for
poorly-implemented AHCI and PM chipsets. Now detects the popular
Rosewill 4-bay enclosure, which uses chipid 0x575f197b.
Increase device detect timeout from 3/10 second to 2 seconds. This
enclosure stupidly takes extra time on the first COMRESET after a cold
power-on to detect, I'm guessing because it is testing both its USB and
its eSATA port.
This port multiplier sometimes returns ready before its software has
completely initialized, causing PM register READs to succeed, but
return data values of 0. If we get a data value of 0 for the REV register
we sleep a little and try once more.
* Marvell AHCI chip does not immediately latch the signature on the
second FIS during a software reset. Give it 500ms to do so.
Ignore a BSY condition between the first and second FIS during a software
reset probe of the PM.
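The retry on a zero REV value can be sketched like this. The fake_pm model and all function names are hypothetical, the delay is only counted rather than slept, and the real driver logic is not shown in this log.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a port multiplier that reports ready too early. */
struct fake_pm {
    int      early_reads;   /* reads that still return 0 */
    uint32_t rev;           /* real REV register value */
    int      delays;        /* number of times we "slept" */
};

/* Register read that succeeds but returns 0 until the device is initialized. */
int
fake_pm_read_rev(struct fake_pm *pm, uint32_t *val)
{
    if (pm->early_reads > 0) {
        pm->early_reads--;
        *val = 0;
    } else {
        *val = pm->rev;
    }
    return 0;
}

/* Stand-in for a short sleep; only counts invocations. */
void
fake_pm_delay(struct fake_pm *pm)
{
    pm->delays++;
}

/* Read REV; if the read succeeds but yields 0, wait and try exactly once more. */
int
read_rev_retry_once(struct fake_pm *pm, uint32_t *rev)
{
    int error;

    assert(pm != NULL && rev != NULL);
    error = fake_pm_read_rev(pm, rev);
    if (error == 0 && *rev == 0) {
        fake_pm_delay(pm);
        error = fake_pm_read_rev(pm, rev);
    }
    return error;
}
```

The retry is bounded to a single extra read, so a device that really reports 0 costs only one short delay.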
Sepherosa Ziehau [Wed, 21 Dec 2016 14:08:19 +0000 (22:08 +0800)]
ip: Set mbuf hash for output IP packets.
This paves the way to implement Flow-Queue-Codel.
zrj [Wed, 21 Dec 2016 07:56:04 +0000 (09:56 +0200)]
Fix typo.
Also downgrade to warning to ease the update from 4.6-release,
host bmake does not handle: make -f Makefile.inc1 -V WORLD_ALTCOMPILER
John Marino [Tue, 20 Dec 2016 20:16:17 +0000 (14:16 -0600)]
Take II on fallback HOST_BINUTILSVER
The format of BINUTILSVER is binutils2XX, but the previous change included
the libexec prefix. Strip this out too.
Reported-by: zrj
John Marino [Tue, 20 Dec 2016 18:32:53 +0000 (12:32 -0600)]
Fix world build in NO_ALTBINUTILS edge case
In the case that the machine has been updated within 30 days but with
NO_ALTBINUTILS set, the world build fails. This is because the
logic to fall back to earlier binutils versions fails due to empty
directories that are installed regardless of the NO_ALTBINUTILS setting.
The logic was updated to search for binutils programs rather than
directories. In the edge case, the oldest version of binutils on the
system is used to build the native versions during the early build phases.
Sepherosa Ziehau [Tue, 20 Dec 2016 03:09:30 +0000 (11:09 +0800)]
hyperv: Add API to read raw value of Hyper-V timer.
Accelerate Hyper-V event timer reloading.
Sepherosa Ziehau [Tue, 20 Dec 2016 02:56:08 +0000 (10:56 +0800)]
hyperv: Move commonly shared header files to the module's top dir.
Sepherosa Ziehau [Tue, 20 Dec 2016 02:50:15 +0000 (10:50 +0800)]
hyperv: Implement Hyper-V reference TSC cputimer.
This one is at least 2 times faster than its rdmsr counterpart.
Obtained-from: FreeBSD
Sepherosa Ziehau [Tue, 20 Dec 2016 02:49:44 +0000 (10:49 +0800)]
cputimer: Add more IDs for VMM cputimers.
Matthew Dillon [Mon, 19 Dec 2016 18:08:06 +0000 (10:08 -0800)]
ahci - Implement FBS for port-multipliers
* Implement FBS (FIS-Based Switching) for port-multipliers. If the
chipset supports it, the ahci driver now turns on FBS mode which
allows us to queue concurrent requests to different targets.
Most AHCI chipsets do not support FBS resulting in poor port-multiplier
performance.
- FBS is enabled in the PM probe.
- FBS must be disabled when doing a hard reset.
- In FBS mode commands must be queued to PREG_CI one at a time,
and the target must be written to AHCI_PREG_FBS prior to activation
via CI.
- RFIS area is larger, and RFIS responses are copied from the
appropriate target index instead of index 0.
- Issue a COMRESET during the PM probe if a BSY status is
recognized, which helps on chipsets which do not implement
the SCLO cap.
* Clean-up a little logic in ahci_port_stop().
* Use the saved sc_cap to check for the SCLO capability instead of
re-reading AHCI_REG_CAP in a few places.
* Dump the RFIS data to the console on error.
* Fixup sc_cap to directly incorporate quirks.
Matthew Dillon [Mon, 19 Dec 2016 07:21:19 +0000 (23:21 -0800)]
ahci - Add quirks for Marvell devices
* Add some quirks for badly broken Marvell devices.
* 88SE9172 - This badly broken AHCI chipset does not support FR *or*
CR responses.
* 88SE9230 - This badly broken AHCI chipset supports FR and CR, but
cannot maintain FR across a disconnect. FRE must be
cycled on the insertion detect in order to re-assert
FR and be able to detect the new device.
This chipset also seems to have other problems, sometimes
generating an error (TFES error) on SET_FEATURES, which
does not happen when the drive is connected to the Intel
AHCI chipset.
* Implement quirks for these devices. Also, don't enable FRE with
POD and SUD (do it separately), and sequence CMD_ICC_ACTIVE a bit
differently than before.
Matthew Dillon [Mon, 19 Dec 2016 00:45:06 +0000 (16:45 -0800)]
ahci - Adjust a few things
* These changes have no effect on known AHCI devices but are a good idea.
* As suggested in the AHCI spec 10.1.2, zero out the memory pointed to
by the FB and CL port dma addresses.
* Write to FB before FBU, and to CLB before CLBU, just in case hardware
clears the upper bits on a write to the lower bits (no known AHCI
hardware does this but it's something that is commonly implemented in
other hw so...).
* Improved I/O error reporting.
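The reason for writing the lower register first can be shown with a fake register pair. The struct and accessors below are hypothetical. They model hardware that zeroes the upper half whenever the lower half is written, which, as noted above, no known AHCI hardware actually does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register pair for the FIS base address. */
struct fake_port {
    uint32_t fb;    /* lower 32 bits */
    uint32_t fbu;   /* upper 32 bits */
};

/* Models hardware that clears the upper half on a write to the lower half. */
void
fake_write_fb(struct fake_port *p, uint32_t v)
{
    p->fb = v;
    p->fbu = 0;
}

void
fake_write_fbu(struct fake_port *p, uint32_t v)
{
    p->fbu = v;
}

/* Program a 64-bit DMA address: lower half first, then upper half. */
void
set_fis_base(struct fake_port *p, uint64_t dma_addr)
{
    assert(p != NULL);
    fake_write_fb(p, (uint32_t)(dma_addr & 0xFFFFFFFFu));
    fake_write_fbu(p, (uint32_t)(dma_addr >> 32));
}

/* Reassemble the 64-bit address from the two registers. */
uint64_t
get_fis_base(const struct fake_port *p)
{
    return ((uint64_t)p->fbu << 32) | p->fb;
}
```

Writing in the opposite order on such hardware would lose the upper 32 bits, while the low-then-high order is correct on both kinds of hardware.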
Sascha Wildner [Mon, 19 Dec 2016 17:46:11 +0000 (18:46 +0100)]
Some mdoc cleanup in tuning.7 and swapcache.8
Reported-by: zrj
zrj [Mon, 19 Dec 2016 15:55:21 +0000 (17:55 +0200)]
gcc50: Build lto-wrapper even if buildworld is not LTO enabled.
After the default binutils update it is now safe to do that.
Keep in mind that buildworld should still work when downgrading to a non-LTO one.
This finally allows having a standard buildworld and an LTO'ed buildkernel.
zrj [Mon, 19 Dec 2016 16:01:05 +0000 (18:01 +0200)]
<sys/param.h>: Bump __DragonFly_version for binutils update.
zrj [Mon, 19 Dec 2016 06:02:20 +0000 (08:02 +0200)]
Switch to binutils227 as default base binutils.
DPorts were fixed to work with ld.gold version 1.12 from binutils 2.27,
some workarounds were added to few ports. Haskell.
ld(ld.gold) has become very strict, in some scenarios LDVER=ld.bfd will help.
Updated binutils bring better support for world/kernel compilation with -flto.
Also updated ld.gold now is able to link chromium without any DSO warnings.
Signed-off-by: marino, swildner
zrj [Mon, 19 Dec 2016 15:37:27 +0000 (17:37 +0200)]
flex: Disable LTO in the libfl.a for clang.
clang has issues with such LTO'ed static library.
This library is small and gains of LTO are minimal.
Unbreaks ports like lang/gscheme.
Sepherosa Ziehau [Mon, 19 Dec 2016 15:49:56 +0000 (23:49 +0800)]
ip: Add parenthesis properly.
Sepherosa Ziehau [Mon, 19 Dec 2016 13:32:41 +0000 (21:32 +0800)]
ip: Move multicast addresses detection into common place.
zrj [Sun, 18 Dec 2016 16:25:33 +0000 (18:25 +0200)]
libc: Avoid negative offsets in link_ntoa().
Discussed-with: swildner
Taken-from: FreeBSD
Tomohiro Kusumi [Sat, 17 Dec 2016 22:52:26 +0000 (07:52 +0900)]
sbin/hammer: Redo e4323571 partly (after reverted by 03d5db37)
> sbin/hammer: Fix bug in get_buffer_data()
>
> The previous commit made clear that xor part of get_buffer_data()
> was wrong. Since buf_offset is in any zone not limited to zone-2,
> xor of two offsets doesn't necessarily show the right result to
> know whether they belong to the same buffer, even if ->zone2_offset
> is originally translated from the same zone within the same buffer.
>
> It needs to take xor of long offsets instead of full 64 bits.
Tomohiro Kusumi [Sat, 17 Dec 2016 21:20:35 +0000 (06:20 +0900)]
Revert "sbin/hammer: Fix bug in get_buffer_data()"
This reverts commit e4323571a2e8310683120148b720a92f801c618f.
HAMMER_OFF_LONG_ENCODE() part is ok, but limiting to direct
zones causes several issues on formatting undo fifo, while
the commit avoids the overhead of releasing every time.
Tomohiro Kusumi [Sat, 17 Dec 2016 11:26:17 +0000 (20:26 +0900)]
sbin/hammer: Fix bug in get_buffer_data()
The previous commit made clear that xor part of get_buffer_data()
was wrong. Since buf_offset is in any zone not limited to zone-2,
xor of two offsets doesn't necessarily show the right result to
know whether they belong to the same buffer, even if ->zone2_offset
is originally translated from the same zone within the same buffer.
It needs to take xor of long offsets instead of full 64 bits.
The reason cache releasing is now limited to directly translated
zones is because for indirectly translated zones (i.e. undo zone),
it can't tell overlap by xor of offsets regardless of long format.
Prior to this commit, get_buffer_data() has been releasing buffers
that don't need to be released (i.e. *bufferp being the right cache),
and has resulted in huge overhead as shown in the comparison below.
In the first example, get_buffer_data() is releasing *bufferp for
undo fifo entries every time, even when it doesn't need to release.
-- Prior to this commit
# time newfs_hammer -L TEST /dev/da4
Volume 0 DEVICE /dev/da4 size 4.55TB
initialize freemap volume 0
initializing the undo map (1024 MB)
---------------------------------------------
HAMMER version 6
1 volume total size 4.55TB
root-volume: /dev/da4
boot-area-size: 32.00KB
memory-log-size: 256.00KB
undo-buffer-size: 1.00GB
total-pre-allocated: 1.02GB
<snip>
newfs_hammer -L TEST /dev/da4 3.05s user 1.16s system 41% cpu 10.098 total
-- Using this commit
# time newfs_hammer -L TEST /dev/da4
Volume 0 DEVICE /dev/da4 size 4.55TB
initialize freemap volume 0
initializing the undo map (1024 MB)
---------------------------------------------
HAMMER version 6
1 volume total size 4.55TB
root-volume: /dev/da4
boot-area-size: 32.00KB
memory-log-size: 256.00KB
undo-buffer-size: 1.00GB
total-pre-allocated: 1.02GB
<snip>
newfs_hammer -L TEST /dev/da4 2.72s user 0.04s system 73% cpu 3.755 total
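The xor problem described in this commit can be sketched with two comparisons. The macro and function names are hypothetical, the 16KB buffer size is an assumption, and the layout (zone in the upper 4 bits, long offset in the lower 60) follows the description in these commit messages. As the message notes, this comparison only makes sense for directly translated zones.

```c
#include <assert.h>
#include <stdint.h>

#define BUF_SIZE    16384ULL                /* assumed buffer size */
#define BUF_MASK    (BUF_SIZE - 1)
#define ZONE_MASK   0xF000000000000000ULL   /* upper 4 bits: zone */
#define LONG_MASK   (~ZONE_MASK)            /* lower 60 bits: long offset */

/*
 * Full 64-bit xor: the zone bits take part in the comparison, so two
 * offsets in different zones never look like the same buffer.
 */
int
same_buffer_full(uint64_t a, uint64_t b)
{
    return ((a ^ b) & ~BUF_MASK) == 0;
}

/* Xor of the long offsets only: zone bits are stripped before comparing. */
int
same_buffer_long(uint64_t a, uint64_t b)
{
    assert((BUF_SIZE & BUF_MASK) == 0);     /* mask trick needs a power of two */
    return (((a & LONG_MASK) ^ (b & LONG_MASK)) & ~BUF_MASK) == 0;
}
```

Two offsets that differ only in their zone bits and in their position inside one buffer compare as different with the full xor and as equal with the long-offset xor.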
Tomohiro Kusumi [Sat, 17 Dec 2016 00:00:44 +0000 (09:00 +0900)]
sbin/hammer: Fix terminology of buf_offset
This commit just renames (local and struct field) variables.
No functional difference.
The way HAMMER userspace uses name "buf_offset" is misleading.
In kernel space, "buf_offset" is for arbitrary zone offsets that
are not limited to zone-2, however in userspace "buf_offset" is
used for zone-2. It should be renamed to "zone2_offset" so the
terminology used in kernel and userspace is the same.
This is important because the name implies what's stored in
upper 4 bits of 64 bits offset, and having misleading variable
names tends to be error-prone (see the next commit).
Tomohiro Kusumi [Sat, 17 Dec 2016 10:27:38 +0000 (19:27 +0900)]
sys/vfs/hammer: Rename misleading macro hammer_is_zone2_mapped_index()
All zones are mapped to zone2 (whether directly or indirectly),
so hammer_is_zone2_mapped_index() is a misleading name.
It should have indicated it's for B-Tree records related zones.
Tomohiro Kusumi [Sat, 17 Dec 2016 00:34:30 +0000 (09:34 +0900)]
sbin/hammer: Remove redundant blockmap lookup in hammer show
blockmap_lookup() is called via check_data_crc() right before
check_data_crc() gets called. This isn't necessary for checking
data CRC either.
Tomohiro Kusumi [Fri, 16 Dec 2016 18:52:37 +0000 (03:52 +0900)]
sbin/hammer: Use calloc(3) instead of malloc(3)+bzero(3)
Tomohiro Kusumi [Fri, 16 Dec 2016 18:43:17 +0000 (03:43 +0900)]
sbin/hammer: Properly use calloc(3)
It's supposed to be number and then size.
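A minimal sketch of the argument order, using a hypothetical struct cache_entry rather than any real HAMMER structure:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical element type, for illustration only. */
struct cache_entry {
    int   id;
    void *data;
};

/*
 * Allocate a zeroed array. calloc(3) takes the number of elements first
 * and the size of one element second, and returns zero-filled memory,
 * so no separate bzero(3) call is needed.
 */
struct cache_entry *
alloc_entries(size_t count)
{
    assert(count > 0);
    return calloc(count, sizeof(struct cache_entry));
}
```

Swapping the two arguments usually allocates the same number of bytes, so the fix is about matching the documented interface rather than changing behaviour.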
Tomohiro Kusumi [Fri, 16 Dec 2016 17:24:45 +0000 (02:24 +0900)]
sbin/hammer: Refactor hammer_cache_flush()
Tomohiro Kusumi [Fri, 16 Dec 2016 16:49:12 +0000 (01:49 +0900)]
sbin/hammer: Remove redundant cache counter NCache
Incrementation and decrementation of NCache is always aligned
with CacheUse in a single thread program like /sbin/hammer,
so this cache counter isn't necessary.
Tomohiro Kusumi [Fri, 16 Dec 2016 16:46:05 +0000 (01:46 +0900)]
sbin/hammer: Use HAMMER_BUFSIZE to calculate CacheMax
CacheMax is to be compared with multiple of HAMMER_BUFSIZE,
so use HAMMER_BUFSIZE to initialize CacheMax.
Tomohiro Kusumi [Fri, 16 Dec 2016 15:51:38 +0000 (00:51 +0900)]
sbin/hammer: Change fprintf (without exit) to err variants
In addition to bac217f3 and 02318f07, these are fprintf calls
not followed by exit right after fprintf, but makes no difference
with err variants (as it'll exit(1) shortly).
The ones in sbin/hammer/cmd_dedup.c should have been changed
in 02318f07.
Tomohiro Kusumi [Fri, 16 Dec 2016 15:08:11 +0000 (00:08 +0900)]
sbin/mount_hammer: Use warn(3) variants
Tomohiro Kusumi [Fri, 16 Dec 2016 14:49:47 +0000 (23:49 +0900)]
sbin/newfs_hammer: Refactoring
Tomohiro Kusumi [Fri, 16 Dec 2016 14:23:46 +0000 (23:23 +0900)]
sbin/newfs_hammer: Use warn(3) variants
Tomohiro Kusumi [Fri, 16 Dec 2016 08:10:18 +0000 (17:10 +0900)]
sbin/newfs_hammer: Mention root volume is volume#0 in manpage
Tomohiro Kusumi [Fri, 16 Dec 2016 06:33:45 +0000 (15:33 +0900)]
sbin/hammer: Don't hardcode 0 for root PFS
HAMMER code doesn't hardcode 0 for root PFS
(e.g. see sbin/newfs_hammer, it could be !=0 if one wants to do so).
Fix the existing error messages using hardcoded 0.
Also add "(root PFS)" for PFS#0 in hammer info command.
Sepherosa Ziehau [Sat, 17 Dec 2016 13:20:58 +0000 (21:20 +0800)]
mbuf: Factor function to set mbuf hash.
Matthew Dillon [Sat, 17 Dec 2016 06:25:00 +0000 (22:25 -0800)]
vmstat - Adjust headers
* Widen some of the header names to make them more readable.
* Adjust manual page.