Antonio Huete Jimenez [Sat, 11 Jun 2016 21:47:47 +0000 (14:47 -0700)]
Makefile.usr - A bit of cleanup
- Use targets instead of .if in a few checks.
- Exit on error for better scripting
Imre Vadász [Sat, 11 Jun 2016 15:54:44 +0000 (17:54 +0200)]
if_iwm - Add and use iwm_phy_db_free(), to plug phy_db memory leak.
* Memory leakage in M_DEVBUF is now at ca. 2KB for each iwm(4) module
load/unload cycle.
Taken-From: Linux iwlwifi
Imre Vadász [Sat, 11 Jun 2016 13:21:06 +0000 (15:21 +0200)]
if_iwm - GC unused struct iwm_rx_buf. Two small nitpicks.
Imre Vadász [Sat, 11 Jun 2016 11:46:33 +0000 (13:46 +0200)]
if_iwm - Use mbuf for large firmware commands, like OpenBSD does.
* We also need to consider the size of large firmware commands in
iwm_alloc_tx_ring(), in the dma tag creation, when
qid == IWM_MVM_CMD_QUEUE.
Inspired-by: OpenBSD and existing code in iwm_rx_addbuf()
Sascha Wildner [Sat, 11 Jun 2016 12:39:57 +0000 (14:39 +0200)]
kqueue.2: Improve markup.
Matthew Dillon [Sat, 11 Jun 2016 07:53:32 +0000 (00:53 -0700)]
hammer2 - Fix infinite flush recursion, reduce bulkfree console spam
* Fix infinite flush recursions. Chains marked HAMMER2_CHAIN_DESTROY
must be recursed on so they can be deleted by the chain lastdrop code.
Also add a missing downward propagation.
* Reduce console spam in the bulkfree code reporting the number of inodes
scanned.
Matthew Dillon [Sat, 11 Jun 2016 02:50:13 +0000 (19:50 -0700)]
test - prt() double va_arg use
* prt() needs to reset its va_list before using it a second time.
Matthew Dillon [Sat, 11 Jun 2016 02:46:08 +0000 (19:46 -0700)]
hammer2 - Fix *errorp, instrument strategy errors
* Instrument strategy call errors
* Instrument a number of errors with console messages
* Pre-zero *errorp in hammer2_write_file_core()
* Set trivial in hammer2_write_file() if lbase at or beyond the file EOF.
* use uiomovebp() to avoid mmap()/read() and mmap()/write() deadlocks.
Matthew Dillon [Sat, 11 Jun 2016 02:41:21 +0000 (19:41 -0700)]
kernel - Instrument vnode pager error
* Provide more information when reporting vnode pager errors
* Report nvextendbuf() error. Fix typos.
Matthew Dillon [Fri, 10 Jun 2016 18:25:58 +0000 (11:25 -0700)]
hammer2 - Fix upgrade deadlock
* Fix a deadlock which occurs when hammer2_chain_unlock() tries to
upgrade the 'last' shared lock to exclusive. This can deadlock if
another thread obtains the chain shared before we manage to do the
upgrade.
Just use a 'try' here. If it fails it means someone else got a lock
(of any kind) and we don't have to worry about dropping the chain data.
* Replace hammer2_mtx_upgrade() with hammer2_mtx_upgrade_try(). Remove
support for a blocking 'upgrade'. It is no longer needed and it is
too dangerous to have anyway.
Imre Vadász [Fri, 10 Jun 2016 20:34:17 +0000 (22:34 +0200)]
if_iwm - Use DragonFly specific convenience functions for bus_dma stuff.
* Use bus_dmamap_load_mbuf_defrag() in iwm_tx().
* Use bus_dmamem_coherent() in iwm_dma_contig_alloc().
* Use bus_dmamap_load_mbuf_segment() in iwm_rx_addbuf().
* This means iwm_dma_map_addr() is no longer needed on DragonFly.
* Try to keep around the corresponding/old code for FreeBSD for easier
syncing of changes to/from FreeBSD.
Imre Vadász [Fri, 10 Jun 2016 20:26:21 +0000 (22:26 +0200)]
if_iwm - Compare paylen to datasz instead of sizeof(cmd->data).
Matthew Dillon [Fri, 10 Jun 2016 18:14:31 +0000 (11:14 -0700)]
test - Cleanup some test/debug code
* Cleanup some test/debug code
Matthew Dillon [Fri, 10 Jun 2016 18:09:29 +0000 (11:09 -0700)]
hammer2 - Add truncation lock, change dio persistence
* Add a truncation lock to interlock between write()'s and ftruncate()
calls. This prevents a junk buffer from surviving an ftruncate() if
it happens to get written concurrently with the ftruncate().
* Change dio persistence. Do not automatically persist dio's while refs
are held; this creates problems for ref'd chains held in the
hammer2_inode_t structure and prevents vfsync() from working properly.
Add a persist_refs field which we will use later in the XIO code to
persist a chain's DIO across an unlock/relock sequence, and possibly
in other places too.
Matthew Dillon [Fri, 10 Jun 2016 18:03:18 +0000 (11:03 -0700)]
kernel - Try to improve 'Warning: vfsync skipped dirty bufs'... messages
* Use BUF_TIMELOCK instead of locking non-blocking if the vfsync()
encounters a buffer that it cannot lock.
* This should theoretically reduce (hopefully prevent) instances of the
'vfsync skipped N dirty bufs' warnings on the console which occur under
heavy filesystem loads.
* Also remove 'Warning buffer ... was recycled' kprintfs. This debugging
was originally added to determine if a particular retry path was getting
hit (it does), and is no longer needed.
Tomohiro Kusumi [Fri, 10 Jun 2016 06:52:07 +0000 (15:52 +0900)]
sbin/newfs_hammer2: Change error messages to "hammer2"
Tomohiro Kusumi [Fri, 10 Jun 2016 06:26:28 +0000 (15:26 +0900)]
sbin/newfs_hammer2: Fix ascii-art of initial image
Tomohiro Kusumi [Fri, 10 Jun 2016 05:14:33 +0000 (14:14 +0900)]
sbin/hammer2: Add #include guard
Tomohiro Kusumi [Fri, 10 Jun 2016 02:58:05 +0000 (11:58 +0900)]
sbin/hammer2: Use volatile sig_atomic_t
Tomohiro Kusumi [Fri, 10 Jun 2016 02:41:46 +0000 (11:41 +0900)]
sys/vfs/hammer2: Change u_int{8,16,32,64}_t to uint{8,16,32,64}_t
hammer2 mostly uses uint{8,16,32,64}_t, so fix u_int{8,16,32,64}_t.
Matthew Dillon [Fri, 10 Jun 2016 05:23:27 +0000 (22:23 -0700)]
nvme - Add kernel dump support
* Add kernel dump support to the nvme driver.
* Issue a FLUSH and chip shutdown sequence after the dump completes.
Matthew Dillon [Fri, 10 Jun 2016 04:24:26 +0000 (21:24 -0700)]
hammer2 - Cache chain->data and chain->dio until last release.
* Instead of releasing the chain data and dio on unlock, leave it
cached and intact until the last release. This will allow an upcoming
change to pass unlocked chain structures between threads to work
efficiently.
* Fix a race condition in the chain->parent linkage test that could cause
chain->data to be cleared improperly.
Imre Vadász [Fri, 10 Jun 2016 00:04:31 +0000 (02:04 +0200)]
if_iwm - Fix iwm_dma_contig_free(). dma->map is always NULL here.
* When bus_dmamem_alloc is used, the bus_dmamap_t is set to NULL, so we
were never actually freeing any dma memory allocations done via
iwm_dma_contig_alloc(). So we should check dma->vaddr instead of
dma->map here.
* This reduces dma memory leakage (as displayed by
"sysctl vm.dma_free_pages") to 11 pages for each if_iwm module
load/unload cycle.
Matthew Dillon [Fri, 10 Jun 2016 00:25:19 +0000 (17:25 -0700)]
hammer2 - Rename hammer2_thread.c to hammer2_admin.c
* hammer2_thread.c does a lot more than kernel threading support for H2.
It does all the XOP administration as well, and that is really its
primary function.
Matthew Dillon [Thu, 9 Jun 2016 23:49:09 +0000 (16:49 -0700)]
hammer2 - Allow chains to be cached
* Cache chain structures on the refs 1->0 transition. We still drop the
underlying dio and backing data (future optimizations are possible here
within the DIO subsystem but we have to be careful when it comes to
leaving kernel buffer cache buffers locked).
This allows hammer2 to retain a lot of the infrastructure that gets reused
across multiple system calls without having to constantly reconstitute it,
improving performance.
* Fix a few recent chain->flags modifications that weren't atomic. They
have to be atomic.
Imre Vadász [Thu, 9 Jun 2016 23:25:50 +0000 (01:25 +0200)]
if_iwm - Free rx ring on detach. Free nvm_sections data after parsing.
* Call iwm_free_rx_ring() when detaching.
* Free nvm_sections[i].data allocations after parsing the nvm data.
Matthew Dillon [Thu, 9 Jun 2016 20:23:53 +0000 (13:23 -0700)]
test - Pull in Mark Adler's hw iscsi crc32 bundle
* Pull his sample code into /usr/src/test/debug so we don't lose track of
it.
* Implements iscsi crc32 in hardware; uncached streaming memory bandwidth
is around 13 GBytes/sec.
Matthew Dillon [Thu, 9 Jun 2016 18:39:24 +0000 (11:39 -0700)]
kernel - Scan more pages in vm_pageout to fix OOM killer
* The pageout daemon was not being aggressive enough when working under
the heavy I/O read loads now made possible by nvme. Certain loads could
improperly trigger the process killer.
* Instead of trying to calculate the exact number of pages per pageout
queue to try to free up, which has had numerous edge conditions cause
problems over the years, change it so we are a lot more generous. The
page queues are scanned with an iterator so pulling more pages off each
one should work just fine.
* Fixes an issue with combined tar cf /dev/null /mnt and find /mnt | wc -l
on an nvme mount with 2.4M files on it + one large 16GB file.
Sascha Wildner [Thu, 9 Jun 2016 18:12:00 +0000 (20:12 +0200)]
kernel: Save some indent here and there and some small cleanup.
All these are related to an inspection of the places where we do:
if (...) {
...
goto blah;
} else {
...
}
in which case the 'else' is not needed.
I only changed places where I thought that it improves readability or
is just as readable without the 'else'.
Sascha Wildner [Thu, 9 Jun 2016 17:30:12 +0000 (19:30 +0200)]
kernel/modnext: Improve the flow a bit regarding setting 'error'.
Sascha Wildner [Thu, 9 Jun 2016 16:56:49 +0000 (18:56 +0200)]
Remove am-utils, the Berkeley automounter suite (amd, amq, etc.)
We recently got FreeBSD's autofs(5) which replaces it. FreeBSD
added notes to their am-utils and related manual pages saying
that it is obsolete and advising the use of autofs(5) instead.
DragonFly's port of it is almost surely broken and the last time
I heard from someone trying to get it to work was in 2013 and
back then it just hung (in select(), according to my notes).
So I don't think removing it, instead of trying to fix it, will do
any harm.
François Tigeot [Thu, 9 Jun 2016 08:23:27 +0000 (10:23 +0200)]
drm/i915: Fix hangs on some broadwell machines
This driver failed to correctly initialize on some Broadwell systems,
symptoms being a black screen and an always spinning cpu fan.
Matthew Dillon [Thu, 9 Jun 2016 06:04:18 +0000 (23:04 -0700)]
world - Fix sysctlbyname() errno handling cases
* A number of routines inherited some bad code from each other.
The return value from sysctlbyname() was not being tested prior
to checking errno. Reformulate.
Reported-by: Stephen Welker stephen.welker@nemostar.com.au
Matthew Dillon [Thu, 9 Jun 2016 05:24:51 +0000 (22:24 -0700)]
hammer2 - multi-thread read-ahead XOPs
* Distribute asynchronous logical buffer read-ahead XOPs to multiple
worker threads. XOPs related to a particular inode are usually sent
to just one worker thread to reduce collision retries.
This works around the high messaging overhead(s) associated with the
current XOP architecture by spreading the pain around. And even though
the default check code is now xxhash, distributing the checks also helps
a great deal. The H2 chain topology actually parallelizes quite well for
read operations.
Streaming reads through the filesystem now run at over 1 GByte/sec (they
capped out at ~340MB/sec before). The effect on things like 'tar'
is not quite as pronounced, but small-file scan/read performance will
typically improve by a tiny bit too.
* This change is probably more SSD-friendly than HDD-friendly for streaming
reads due to out-of-order queueing of the I/O requests. I ran a quick
read test on a WD Black and it appeared to perform acceptably, so for
now I'm going to run with it. The read-ahead scale can be adjusted via
vfs.hammer2.cluster_enable to find a good value (for now).
* Remove the 'get race' kprintfs. This case now occurs very often due
to distributed read-aheads.
* chain->flags must use atomic ops, fix some cases I muffed up in recent
commits.
Matthew Dillon [Thu, 9 Jun 2016 05:10:49 +0000 (22:10 -0700)]
hammer2 - Revamp worker thread signaling
* Revamp how worker thread signaling works. Get rid of a number of race
conditions and use atomic ops. We no longer need thr->lk.
* Make hammer2_cluster_enable's scaling factor work with cluster_write()
as well as cluster_read().
Matthew Dillon [Wed, 8 Jun 2016 23:06:51 +0000 (16:06 -0700)]
hammer2 - Add xxhash to H2 and throw in debug stuff for performance testing.
* Add the xxhash. This is a high-speed non-cryptographic hash code
algorithm. Sam pointed me at the site, the code is available on
github and is BSD licensed:
git://github.com/Cyan4973/xxHash.git
This hash has good distribution and is very fast.
* Change HAMMER2 to default to using xxhash64 instead of iscsi_crc32().
xxhash can process data at several GBytes/sec, whereas even the
multi-table iscsi_crc32() can only do around 500 MBytes/sec, which
is too slow for today's modern storage subsystems (NVME can nominally
do 1.5-2.5 GBytes/sec, and high-end cards can do 5GBytes/sec).
* There are four major paths that eat tons of CPU in H2:
- The XIO path does a ton of allocation/deallocation and synchronous
messaging. This has not yet been fixed.
- The check code (when it was iscsi_crc32()) slowed everything down.
This is fixed, the default check code is now xxhash64.
- The check code was being called over and over again for the same cached
buffer due to the hammer2_chain_t structure being thrown away.
Currently a hack involving a mask stored in the underlying DIO is being
used to indicate that the check code was previously valid. This is
strictly temporary. The actual mask will have to be stored in the
device buffer cache buffer and a second one in the chain structure.
The chain structure must be made persistent as well (not yet done).
- The DEDUP code was also calling iscsi_crc32() redundantly (at least for
reads).
The read path has been fixed. The write path is doable but requires more
coding (not yet fixed).
- The logical file cluster_read() in the kernel was not doing any read-ahead
due to H2 not implementing BMAP, creating long synchronous latencies.
The kernel code for cluster_read() and cluster_readcb() has been fixed
to do read-ahead whether a logical BMAP is implemented or not. H2 will
now pipeline reads.
Suggested-by: Samuel J. Greear <sjg@thesjg.com> (xxhash)
Matthew Dillon [Wed, 8 Jun 2016 23:02:34 +0000 (16:02 -0700)]
hammer - Make vfs.hammer.cluster_enable an integer
* Instead of being a boolean, make it an integer and use it to control
how much read-ahead is requested (in 64KB blocks).
* Change cluster_enable from 1 to 2 to roughly match the cluster_read()
changes committed to the kernel.
* Make hammer_io_indirect_read() use cluster_readcb() instead of breadcb()
so we get read-ahead on this path.
Matthew Dillon [Wed, 8 Jun 2016 22:53:31 +0000 (15:53 -0700)]
kernel - Fix some clustering issues
* Change B_RAM functionality. We were previously setting B_RAM
on the last async buffer and doing some cruft to probe ahead.
Instead, set B_RAM in the middle and use a simple heuristic to
estimate where to pick-up the read-ahead again.
* Clean-up the read-ahead. When the caller of cluster_read() asks for
read-ahead, we do the read-ahead whether or not BMAP says it is
contiguous. All a failed BMAP does now is prevent cluster_rbuild()
from getting called (that is, it doesn't try to gang multiple buffers
together).
When thinking about this, the logical buffer cache sequential heuristic
is telling us that userland is going to read the data, so why stop and
then have to stall on an I/O read later when userland actually reads
the data?
* This will improve pipelining for both hammer1 and hammer2.
Sascha Wildner [Wed, 8 Jun 2016 07:58:03 +0000 (09:58 +0200)]
kqueue.2: Add some info about EVFILT_FS.
Matthew Dillon [Wed, 8 Jun 2016 05:30:00 +0000 (22:30 -0700)]
nvme - Add interrupt coalescing support
* Add interrupt coalescing support. However, disable it in the code for
now by setting its parameters to 0. I tried minimal parameters (time
set to 1 which is 100uS and aggregation threshold set to 4) and it
completely destroyed performance in all my tests on the Intel 750.
Even in tests where the interrupt rate was less than 10,000/sec, the
Intel controller is clearly implementing a broken algorithm and is
actually enforcing that 100uS of latency even when the interrupt rate
has not exceeded the configured limit. So even relatively large
transfers had horrible performance.
So for now the code is in, but it's turned off.
Tomohiro Kusumi [Wed, 8 Jun 2016 03:06:40 +0000 (12:06 +0900)]
sys/vfs/hammer: Remove sys/vfs/hammer/hammer_freemap.c
This file does nothing and hasn't done anything since 2008.
> This space reserved for our low-level storage localization manager XXX
The current blockmap code has nothing to do with localization
(which is a higher-level idea than the blockmap), and this probably
won't be implemented in the future either.
Tomohiro Kusumi [Wed, 8 Jun 2016 02:48:43 +0000 (11:48 +0900)]
sys/vfs/hammer: Add HAMMER_VOL_ALLOC for reserved space after volume header
HAMMER_VOL_ALLOC is where the "boot area" starts.
Tomohiro Kusumi [Wed, 8 Jun 2016 02:19:43 +0000 (11:19 +0900)]
sys/kern: Make nlookup() keep ESTALE on retry
This basically doesn't happen with autofs, but keep the ESTALE return
value instead of replacing it with ENOENT on retry.
01:30 (dillon) though you might want to return ESTALE if the second try fails
01:30 (dillon) instead of ENOENT
01:30 (tkusumi) why ?
01:30 (dillon) so the user can distinguish between the two
Tomohiro Kusumi [Wed, 8 Jun 2016 02:13:36 +0000 (11:13 +0900)]
sbin/hammer: err() on readhammerbuf() failure
If readhammerbuf() fails to read a 16KB buffer, it should err()
regardless of the AssertOnFailure flag.
Continuing with a bzero'd buffer doesn't help; it only makes a
problem more difficult to investigate.
Matthew Dillon [Wed, 8 Jun 2016 01:17:14 +0000 (18:17 -0700)]
nvme - Adjust queue mapping
* Add more fu to the manual page.
* Adjust queue mappings. Get rid of the multi-priority read and write
for the optimal mapping (4 queues per cpu). Instead just have 2 (a read
and a write queue), which allows the card to use an optimal mapping
when 31 queues are supported.
Matthew Dillon [Tue, 7 Jun 2016 21:14:46 +0000 (14:14 -0700)]
nvme - Check admin_cap
* Check admin command capabilities, do not attempt to query the controller
list or namespace list if namespace management is not supported.
NOTE: The Intel 750 returns total garbage for unsupported ns management
commands without returning any error code in the status.
* Minor man-page fixes.
Sascha Wildner [Tue, 7 Jun 2016 20:48:07 +0000 (22:48 +0200)]
nvme.4: Remove an unneeded .Pp and use .Dx.
Sascha Wildner [Tue, 7 Jun 2016 19:26:54 +0000 (21:26 +0200)]
README.examples: Remove autofs/. It's not installed at the moment.
Tomohiro Kusumi [Tue, 7 Jun 2016 20:13:32 +0000 (05:13 +0900)]
autofs: Add "Donated to DragonFlyBSD by ..." to manpages
Imre Vadász [Tue, 7 Jun 2016 19:50:29 +0000 (21:50 +0200)]
if_iwm - Avoid bus_dmamap_create()/_destroy() calls in iwm_rx_addbuf().
* Instead of doing bus_dmamap_create() and bus_dmamap_destroy() all the
time, create an extra bus_dmamap_t which we can use to safely
try bus_dmamap_load()-ing the new mbuf. On success we just swap the
spare bus_dmamap_t with the data->map of that ring entry.
Sascha Wildner [Tue, 7 Jun 2016 19:20:23 +0000 (21:20 +0200)]
auto_master.5: Fix .Sx markup.
Sepherosa Ziehau [Tue, 7 Jun 2016 10:11:10 +0000 (18:11 +0800)]
de: Install if_init.
This makes 'ifconfig deX inet x.x.x.x' work.
Sascha Wildner [Tue, 7 Jun 2016 07:04:56 +0000 (09:04 +0200)]
kernel/iwm: Fix building without IWM_DEBUG.
Justin C. Sherrill [Tue, 7 Jun 2016 02:46:32 +0000 (22:46 -0400)]
Merge branch 'master' of ssh://crater.dragonflybsd.org/repository/git/dragonfly
Justin C. Sherrill [Tue, 7 Jun 2016 02:45:25 +0000 (22:45 -0400)]
Remove random download note as it no longer happens that way, and clean up.
Imre Vadász [Mon, 6 Jun 2016 20:22:27 +0000 (22:22 +0200)]
if_iwm - Use vap->iv_myaddr instead of ic->ic_macaddr when vap != NULL.
* ic_macaddr is only used for the initial mac address provided by NVM.
We should rather use vap->iv_myaddr when vap != NULL, to allow the
MAC address to be changed via e.g. ifconfig.
Imre Vadász [Sat, 4 Jun 2016 11:21:06 +0000 (13:21 +0200)]
if_iwm - Add support for Intel AC 8260 chipset.
Taken-From: OpenBSD, Linux (iwlwifi driver)
Imre Vadász [Fri, 3 Jun 2016 21:56:47 +0000 (23:56 +0200)]
iwmfw - Add 8000C firmware for Intel AC 8260 support.
Matthew Dillon [Mon, 6 Jun 2016 17:01:45 +0000 (10:01 -0700)]
nvme - Cleanups, limit nirqs
* Cleanups in the manual page.
* Limit nirqs to ncpus + 1. We don't need more than this number for now.
zrj [Mon, 6 Jun 2016 08:28:15 +0000 (11:28 +0300)]
usr.sbin/autofs: Unbreak make depend.
Add back -I${.CURDIR} path for common.l
zrj [Mon, 6 Jun 2016 07:17:05 +0000 (10:17 +0300)]
sys/cpu: Unbreak world.
For now just hide new functions under _KERNEL.
While there, perform some cleanup.
zrj [Mon, 6 Jun 2016 05:58:23 +0000 (08:58 +0300)]
drm/i915: Remove empty header.
Matthew Dillon [Mon, 6 Jun 2016 06:00:37 +0000 (23:00 -0700)]
nvme - Fix minor cpu mapping issues
* Fix some issues with the cpu mapping. cpu 0 was not getting properly
accounted for due to an array overflow bug. And do a few other things.
* With these changes, extints are nicely distributed across all cpus on
large concurrent workloads, and IPIs are minimal-to-none.
Matthew Dillon [Mon, 6 Jun 2016 03:41:00 +0000 (20:41 -0700)]
nvme - Implement MSIX and reverse comq mapping
* Implement MSIX. Map completion queues to cpus via a rotation.
* Adjust the comq mapping code. For now prioritize assigning a 1:1 cpu
mapping for submission and completion queues over creating separate
queues for reads and writes.
* Tested, systat -pv 1 shows this is capable of pushing 50,000+ interrupts
per second on EACH cpu (all 8 in the xeon box I tested) and running
250,000 IOPS x 2 cards (500,000 IOPS) using interrupt based comq handling.
Matthew Dillon [Sun, 5 Jun 2016 22:35:37 +0000 (15:35 -0700)]
ahci/misc - Add manual links
* Add manual links to nvme.4 to some of the other manual pages.
Matthew Dillon [Sun, 5 Jun 2016 22:33:59 +0000 (15:33 -0700)]
ahci - update manual page
* Fixup the copyright and enhance some of the text. The AHCI driver is
really almost a complete rewrite... even an almost-from-scratch rewrite,
so it is appropriate to add the DragonFly copyright to the comments
section.
Matthew Dillon [Sun, 5 Jun 2016 22:33:24 +0000 (15:33 -0700)]
nvme - Add manual page
* Add the nvme(4) manual page and include some general performance metrics.
Matthew Dillon [Sun, 5 Jun 2016 21:24:02 +0000 (14:24 -0700)]
nvme - Iterate disk units for multiple devices
* Just iterate disk units starting at 0 for now, instead of trying to
use the namespace id, preventing collisions when multiple nvme
controllers are present.
Matthew Dillon [Sun, 5 Jun 2016 18:59:47 +0000 (11:59 -0700)]
nvme - Fix b_resid prior to biodone()
* Oops, forgot to set b_resid to 0 on a successful I/O prior to biodone().
Could cause 'dd' tests to EOF early.
Matthew Dillon [Sun, 5 Jun 2016 18:37:43 +0000 (11:37 -0700)]
debug - fix randread
* Fix randread when used with areas larger than 2GB. random() was
limiting the range.
Matthew Dillon [Sun, 5 Jun 2016 18:37:06 +0000 (11:37 -0700)]
kernel - Add nvme driver to the kernel build as a module.
* Add the nvme driver to the kernel build as a module.
Matthew Dillon [Sun, 5 Jun 2016 18:29:22 +0000 (11:29 -0700)]
kernel - Add bus_space_read_8() and bus_space_write_8()
* Add bus_space_read_8() and bus_space_write_8() for memory-mapped
handles only (used by nvme).
Matthew Dillon [Sun, 5 Jun 2016 18:28:41 +0000 (11:28 -0700)]
kernel - Add PCIS_STORAGE_NVM
* Add #defines for PCIS_STORAGE_NVM and friends
Matthew Dillon [Sun, 5 Jun 2016 18:04:48 +0000 (11:04 -0700)]
nvme - Flesh out the driver more
* Handle the case where there are an insufficient number of queue entries
available to handle BIOs (tested by forcing maxqe to 4).
* Issue delete queue commands and issue and wait for controller shutdown
on a normal halt/reboot as per spec.
* Disallow new device open()s during unload.
Sascha Wildner [Sun, 5 Jun 2016 17:47:08 +0000 (19:47 +0200)]
automountd(8) et al.: Clean up the Makefile a little bit.
* The current dir is searched automatically for includes.
* WARNS=6 is not necessary since it is the default for usr.bin.
* Add DPADD for make depend.
Sascha Wildner [Sun, 5 Jun 2016 16:21:55 +0000 (18:21 +0200)]
Adjust share/examples/etc/README.examples for autofs.
Sascha Wildner [Sun, 5 Jun 2016 16:15:37 +0000 (18:15 +0200)]
Add some assignments to etc/defaults/rc.conf for autofs.
Sascha Wildner [Sun, 5 Jun 2016 16:14:56 +0000 (18:14 +0200)]
Add autofs to LINT64.
Tomohiro Kusumi [Sun, 5 Jun 2016 17:18:57 +0000 (02:18 +0900)]
sys/vfs/autofs: Remove .vfs_sync = vfs_stdsync,
This should have been removed by ef560bee along with the other
filesystems.
Imre Vadász [Sun, 5 Jun 2016 14:58:53 +0000 (16:58 +0200)]
if_iwm - Fix m_defrag() usage. Copy-pasto when copying code from OpenBSD.
Imre Vadász [Sun, 5 Jun 2016 12:31:01 +0000 (14:31 +0200)]
if_iwm - Add and use iwm_is_valid_ether_addr() function.
* While there use IEEE80211_ADDR_EQ and IEEE80211_ADDR_COPY macros where
appropriate.
Imre Vadász [Sun, 5 Jun 2016 11:40:31 +0000 (13:40 +0200)]
if_iwm - Avoid leaking memory, and fix error handling in iwm_rx_addbuf().
Imre Vadász [Sat, 4 Jun 2016 12:49:12 +0000 (14:49 +0200)]
if_iwm - When transitioning to INIT, vap->iv_newstate will just ignore arg.
* Just fix the comment to be more helpful here.
Tomohiro Kusumi [Sun, 5 Jun 2016 11:25:02 +0000 (20:25 +0900)]
sbin/hammer: Cleanup on aac2051d
The if conditional wasn't necessary after aac2051d.
Sascha Wildner [Sun, 5 Jun 2016 10:35:54 +0000 (12:35 +0200)]
kstrdup.9: Mention kstrndup().
Matthew Dillon [Sun, 5 Jun 2016 06:45:59 +0000 (23:45 -0700)]
kernel - Flesh out nvme interrupts (non-msi for now)
* MSI/MSIX not working currently so just turn it off for the moment.
* Normal interrupt now operational. Implement a real nvme_intr() and
clean up some of our polling hacks now that interrupts work.
* Rearrange shutdown so admin polling continues to work while the devfs
disk infrastructure is being torn down.
* Tests with this little Samsung mini-pcie nvme card:
120,000 IOPS (concurrent 512 byte dd)
1.5 GBytes/sec (sequential read of an incompressible file through the filesystem)
1.5 GBytes/sec reading via tar.
test40# ls -la /mnt2/test.dat
-rw-r--r-- 1 root wheel 7516192768 Jun 4 23:50 /mnt2/test.dat
test40# time tar cf /dev/null /mnt2/test.dat
0.062u 3.937s 0:04.84 82.4% 28+69k 28642+0io 0pf+0w (from media)
0.164u 1.367s 0:01.81 83.9% 29+71k 978+0io 0pf+0w (from buffer cache)
Matthew Dillon [Sun, 5 Jun 2016 02:35:29 +0000 (19:35 -0700)]
kernel - Initial native DragonFly NVME driver commit
* Initial from-scratch NVME implementation using the NVM Express 1.2a
chipset specification pdf. Nothing ported from anywhere else.
Basic implementation.
* Not yet connected to the build, interrupts are not yet functional
(it currently just polls at 100hz for testing), some additional error
handling is needed, and we will need ioctl support and a userland utility
to do various administrative tasks like formatting.
* Near full header spec keyed in including the bits we don't use (yet).
* Full SMP BIO interface and four different queue topologies implemented
depending on how many queues the chipset lets us create. The best is
ncpus * 4 queues, i.e. (low, high priority) x (read, write) per cpu.
The second best is just (low, high priority) x (read, write) shared between
all cpus.
Extremely low BIO overhead. Full strategy support and beginnings of
optimizations for low-latency I/Os (currently a hack).
* Initial testing with multiple concurrent sequential dd's on a little
Samsung nvme mini-pcie card:
1.2 GBytes/sec 16KB
2.0 GBytes/sec 32KB
2.5 GBytes/sec 64KB
Imre Vadász [Sat, 4 Jun 2016 21:12:11 +0000 (23:12 +0200)]
if_iwm - Fix iwm_mvm_lmac_scan_fill_channels(), only add 11b and 11a chans.
Fixes breakage introduced in edfc8a0769eef4f5d883c22ee95a6ec79a1d85c6,
after which only channels 36,40,44,48,52,56 from the 5GHz band were
scanned, since all the 11b channels were added twice.
Imre Vadász [Sat, 4 Jun 2016 10:52:08 +0000 (12:52 +0200)]
if_iwm - Make some functions static in if_iwm_led.c, no functional change.
* While there clean up two more cases of
if (ret)
return ret;
return 0;
Tomohiro Kusumi [Sat, 4 Jun 2016 16:23:27 +0000 (01:23 +0900)]
sbin/hammer: Fix used bytes for zone15
zone15 (the physically unavailable zone) is always 100% used,
so bytes_used for a zone15 big-block should be 8MB instead of 0.
Has nothing to do with
http://lists.dragonflybsd.org/pipermail/users/2016-June/thread.html#249681
Tomohiro Kusumi [Sat, 4 Jun 2016 15:27:25 +0000 (00:27 +0900)]
sbin/hammer: Make hammer blockmap check offset/space
Based on this report, which may not be a filesystem issue.
http://lists.dragonflybsd.org/pipermail/users/2016-June/thread.html#249681
Tomohiro Kusumi [Sat, 4 Jun 2016 10:36:36 +0000 (19:36 +0900)]
sbin/hammer: Remove debug printfs
These debug printfs are no longer really used for anything, and
probably not by anyone either, as /sbin/hammer is stable enough.
These printfs generate tons of output to stderr, so it would be
more convenient to just add some printfs to wherever necessary
or use gdb for whatever issues found, rather than using these.
Tomohiro Kusumi [Fri, 3 Jun 2016 11:15:14 +0000 (20:15 +0900)]
sbin/hammer: Cleanup zone statistics functions
Also remove a debug function hammer_dump_layer1_bits().
Imre Vadász [Fri, 3 Jun 2016 21:08:15 +0000 (23:08 +0200)]
if_run - Add missing RUN_LOCK/RUN_UNLOCK around a run_get_tsf() call.
* The RUN_LOCK_ASSERT in run_do_request() got triggered when running
e.g. "tcpdump -i wlan0 -y IEEE802_11_RADIO" on the interface.
Sascha Wildner [Fri, 3 Jun 2016 17:34:24 +0000 (19:34 +0200)]
Create the /etc/autofs directory via mtree.
Sascha Wildner [Fri, 3 Jun 2016 08:51:29 +0000 (10:51 +0200)]
kernel/autofs: Add some missing files to the Makefile.
Tomohiro Kusumi [Sun, 29 May 2016 04:59:24 +0000 (13:59 +0900)]
usr.sbin/autofs: Workaround namecache bug after unmount
autounmountd gets affected by the namecache bug mentioned in
https://bugs.dragonflybsd.org/issues/2908.
This can be worked around by stat(2) (or any syscall that resolves
the name once again) after unmount failed with EBUSY.
Without this workaround, a process on an automounted filesystem will
see ENOTCONN after autounmountd's attempt to unmount. This must
not be removed until the namecache bug is fixed.
Tomohiro Kusumi [Tue, 24 May 2016 15:09:48 +0000 (00:09 +0900)]
sys/kern: Don't implement .vfs_sync unless sync is supported
The only reason filesystems that don't need syncing
(e.g. no backing storage) implement .vfs_sync is that
unmount requires a sync with a return value of 0.
If unmount allows sync with return value of EOPNOTSUPP for fs
that do not support sync, those fs no longer have to implement
.vfs_sync with vfs_stdsync() only to pass dounmount().
The drawback is when there is a sync (other than vfs_stdnosync)
that returns EOPNOTSUPP for real errors. The existing fs in
DragonFly don't do this (and shouldn't either).
Also see https://bugs.dragonflybsd.org/issues/2912.
# grep "\.vfs_sync" sys/vfs sys/gnu/vfs -rI | grep vfs_stdsync
sys/vfs/udf/udf_vfsops.c: .vfs_sync = vfs_stdsync,
sys/vfs/portal/portal_vfsops.c: .vfs_sync = vfs_stdsync
sys/vfs/devfs/devfs_vfsops.c: .vfs_sync = vfs_stdsync,
sys/vfs/isofs/cd9660/cd9660_vfsops.c: .vfs_sync = vfs_stdsync,
sys/vfs/autofs/autofs_vfsops.c: .vfs_sync = vfs_stdsync, /* for unmount(2) */
sys/vfs/tmpfs/tmpfs_vfsops.c: .vfs_sync = vfs_stdsync,
sys/vfs/dirfs/dirfs_vfsops.c: .vfs_sync = vfs_stdsync,
sys/vfs/ntfs/ntfs_vfsops.c: .vfs_sync = vfs_stdsync,
sys/vfs/procfs/procfs_vfsops.c: .vfs_sync = vfs_stdsync
sys/vfs/hpfs/hpfs_vfsops.c: .vfs_sync = vfs_stdsync,
sys/vfs/nullfs/null_vfsops.c: .vfs_sync = vfs_stdsync,
Tomohiro Kusumi [Wed, 18 May 2016 08:01:32 +0000 (17:01 +0900)]
autofs: Port autofs from FreeBSD
Brought in basically from FreeBSD@GitHub
cac9beab7d53f0c37ce2a2a1b893be59028928f4 with lots of changes.
Note that this commit isn't necessarily 1:1 with above commit.
Kernel code is basically a rewrite based on the FreeBSD code.
Userspace is basically 1:1 with FreeBSD except that lots of small
changes (including related commits listed below) were necessary.
This is due to autofs being dependent on FreeBSD specific interface,
command options and such.
For userspace, note that non-functional stuff (e.g. whitespace
warnings via git am) is intentionally left 1:1 with FreeBSD.
Userspace is basically portable, so don't obfuscate the real
changes made for DragonFly by fixing these until things are
considered stable, unless it's a bug from FreeBSD.
Summary of newly added or modified files.
- sys/vfs/autofs - autofs filesystem
- usr.sbin/autofs - autofs userspace command and daemons
- etc/ - configuration files and manpages
- others - changes in misc subsystems (not independent of autofs)
Related DragonFly commits.
- usr.sbin/autofs: Workaround namecache bug after unmount
- sys/kern: Don't implement .vfs_sync unless sync is supported
- usr.sbin/fstyp: Port fstyp from FreeBSD
- sys/kern: Retry nlookup if nresolve returned ESTALE
- sys/sys: Fix IOCPARM_MAX
- sys/sys: Extend IOCPARM_MAX
- usr.bin/showmount: Add -E option
- sbin/mount_nfs: Add -o retrycnt= option
- sys/kern: Add kqueue EVFILT_FS
- sys/kern: Add kstrndup()
Related DragonFly PRs.
- https://bugs.dragonflybsd.org/issues/2900
- https://bugs.dragonflybsd.org/issues/2901
- https://bugs.dragonflybsd.org/issues/2905
- https://bugs.dragonflybsd.org/issues/2907
- https://bugs.dragonflybsd.org/issues/2908
- https://bugs.dragonflybsd.org/issues/2909
- https://bugs.dragonflybsd.org/issues/2912
- https://bugs.dragonflybsd.org/issues/2913
- https://bugs.dragonflybsd.org/issues/2914
Other related resource.
- http://lists.dragonflybsd.org/pipermail/users/2016-May/thread.html#249556
- http://lists.dragonflybsd.org/pipermail/users/2016-June/thread.html#249680
- https://www.dragonflydigest.com/2016/05/06/18066.html
Tomohiro Kusumi [Wed, 18 May 2016 07:58:01 +0000 (16:58 +0900)]
usr.sbin/fstyp: Port fstyp from FreeBSD
Brought in from FreeBSD@GitHub 3e3c248f832f796881ac2ce0d45049552e8d9a9b.
Needed by autofs.
Removed ZFS and GEOM support.
Added HAMMER support.
Note that fstyp has been changed to test a filesystem type vector
fsvtypes[] in addition to the existing fstypes[]. This is a bit
ad-hoc, but was necessary to support partial volume(s) for HAMMER
without a major code rewrite.