freebsd.git
2 years agoSkip snapshot in zfs_iter_mounted()
youzhongyang [Wed, 20 Oct 2021 23:07:19 +0000 (19:07 -0400)]
Skip snapshot in zfs_iter_mounted()

The intention of the zfs_iter_mounted() is to traverse the dataset
and its descendants, not the snapshots. The current code can cause
a mounted snapshot to be included and thus zfs_open() on the snapshot
with ZFS_TYPE_FILESYSTEM would print confusing message such as "cannot
open 'rpool/fs@snap': snapshot delimiter '@' is not expected here".

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Youzhong Yang <yyang@mathworks.com>
Closes #12447
Closes #12448

2 years agovdev_id: Fix enclosure_symlinks feature
Tony Hutter [Wed, 20 Oct 2021 22:48:04 +0000 (15:48 -0700)]
vdev_id: Fix enclosure_symlinks feature

The vdev_id.conf "enclosure_symlinks" option persistently creates
and maps /dev/by-enclosure symlinks to dynamic /dev/sg* devices.

This patch fixes two issues:

1. The enclosure_symlinks feature was accidentally broken in:

   vdev_id: Support daisy-chained JBODs in multipath mode

2. Even when working, the feature numbered the enclosure
   sequentially rather than by HBA port number.  That meant that
   if a port was down or didn't appear in sysfs, then the
   enclosure_sumlinks numbers would be numbered wrong.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #12660

2 years agolibshare: nfs: pass through ipv6 addresses in bracket notation
felixdoerre [Wed, 20 Oct 2021 17:40:00 +0000 (19:40 +0200)]
libshare: nfs: pass through ipv6 addresses in bracket notation

Recognize when the host part of a sharenfs attribute is an ipv6
Literal and pass that through without modification.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Felix Dörre <felix@dogcraft.de>
Closes: #11171
Closes #11939
Closes: #1894

2 years agozpool should call zfs_nicestrtonum() with non-NULL handle
Toomas Soome [Wed, 20 Oct 2021 17:34:34 +0000 (20:34 +0300)]
zpool should call zfs_nicestrtonum() with non-NULL handle

When zfs_nicestrtonum() is called and there will be an error,
the message is left in libzfs handle, if provided. We can use
this message, to provide better feedback for user.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes #12650

2 years agoRemove code duplication
Pawel Jakub Dawidek [Mon, 18 Oct 2021 23:50:33 +0000 (16:50 -0700)]
Remove code duplication

Remove code duplication by moving code responsible for partial block
zeroing to a separate function: dnode_partial_zero().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Closes #12627

2 years agoNotify on UNAVAIL statechange
Francesco Mazzoli [Fri, 15 Oct 2021 22:57:23 +0000 (00:57 +0200)]
Notify on UNAVAIL statechange

`UNAVAIL` is maybe not quite as concerning as `DEGRADED`, but still an
event of notice, in my opinion. For example it is triggered when a
drive goes missing.

Reviewed-by: Don Brady <don.brady@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Francesco Mazzoli <f@mazzo.li>
Closes #12629
Closes #12630

2 years agozdb: fix overflow of time estimation
Teodor Spæren [Fri, 15 Oct 2021 22:55:34 +0000 (00:55 +0200)]
zdb: fix overflow of time estimation

The calculation of estimated time remaining in zdb -cc could overflow,
as reported in #10666. This patch fixes this, by using uint64_t instead
of ints in the calculations.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Teodor Spæren <teodor@sparen.no>
Closes #10666
Closes #12610

2 years agoRemove FreeBSD's local copy of the dmu_buf_hold_array() function
Pawel Jakub Dawidek [Wed, 13 Oct 2021 18:01:01 +0000 (11:01 -0700)]
Remove FreeBSD's local copy of the dmu_buf_hold_array() function

Make the main dmu_buf_hold_array() function non-static.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Closes #12628

2 years agozio: use unsigned values for enum
Teodor Spæren [Mon, 11 Oct 2021 17:58:06 +0000 (19:58 +0200)]
zio: use unsigned values for enum

cppcheck complains about the use of 1 << 31, because enums are signed
ints which cannot represent this. As discussed in issue #12611, it
appears that with C99, we can use an unsiged int for the enum, on most
platforms.

I've crafted this commit for just the include/sys/zio.h header, as it's
the only one with a shift of 31. If this is something we want to adopt
in the rest of the project, I will go through and apply it to the rest
of the project.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Teodor Spæren <teodor@sparen.no>
Closes #12611
Closes #12615

2 years agoExport minimal zfs_refcount interfaces
Brian Behlendorf [Mon, 11 Oct 2021 17:54:39 +0000 (10:54 -0700)]
Export minimal zfs_refcount interfaces

Lustre makes light use of the zfs_refcount interfaces which
isn't a problem when using a non-debug build of OpenZFS. However,
when debugging is enabled the required symbols are not exported.

Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12613

2 years agoZTS: Add known exceptions
Brian Behlendorf [Mon, 11 Oct 2021 17:52:32 +0000 (10:52 -0700)]
ZTS: Add known exceptions

Add the following test failures to the exception list for FreeBSD
to ensure we notice new unexpected failures.

   pool_checkpoint/checkpoint_big_rewind
   pool_checkpoint/checkpoint_indirect

And the following for Linux.

   zvol/zvol_misc/zvol_misc_snapdev

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #12621
Issue #12622
Issue #12623
Closes #12624

2 years agoZTS: deadman_sync fix
Brian Behlendorf [Mon, 11 Oct 2021 17:49:13 +0000 (10:49 -0700)]
ZTS: deadman_sync fix

In the CI environment it's possible for events to be slightly
delayed resulting in 4, instead of 5, events appearing in the
log file.  This isn't a problem and should be considered a
success to avoid false positive test results.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12625

2 years agoinitramfs: use correct dataset for rootfs on rollback=1
nachtgeist [Fri, 8 Oct 2021 18:16:31 +0000 (18:16 +0000)]
initramfs: use correct dataset for rootfs on rollback=1

When booting with root=zfs:rpool/myrootfs@foosnapshot rollback=1,
myrootfs and its descendants get rolled back to foosnapshot, however
ZFS_BOOTFS still contains myrootfs@foosnapshot instead of the
actually desired value of myrootfs.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Daniel Reichelt <hacking@nachtgeist.net>
Closes #12585
Closes #12586

2 years agoFail invalid incremental recursive send gracefully
Ryan Moeller [Fri, 8 Oct 2021 18:14:26 +0000 (14:14 -0400)]
Fail invalid incremental recursive send gracefully

zfs send -R -i snap1 pool/ds@snap1 is an invalid invocation of zfs send
because the incremental source and target snapshots are the same.  We
have an error message for this condition, but we don't make it there
because of a failed assert while iterating through the dataset's
snapshots.

Check for NULL to avoid the assert so we can make it to the error
message.

Test this form of invalid send invocation in rsend tests.  Fix the
rsend_016_neg test while here: log_neg itself doesn't fail the test,
and writing to /dev/null is not supported on all Linux kernels.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11121
Closes #12533

2 years agoCorrect refcount_add in dmu_zfetch
Rich Ercolani [Fri, 8 Oct 2021 18:10:34 +0000 (14:10 -0400)]
Correct refcount_add in dmu_zfetch

refcount_add_many(foo,N) is not the same as
for (i=0; i < N; i++) { refcount_add(foo); }

Unfortunately, this is only actually true with debug kernels and
reference_tracking_enable=1.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12589
Closes #12602

2 years agoarcstat: Fix integer division with python3
Valmiky Arquissandas [Fri, 8 Oct 2021 15:32:27 +0000 (16:32 +0100)]
arcstat: Fix integer division with python3

The arcstat script requests compatibility with python2 and python3, but
PEP 238 modified the / operator and results in erroneous output when
run under python3.

This commit replaces instances of / with //, yielding the expected
result in both versions of Python.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Valmiky Arquissandas <foss@kayvlim.com>
Closes #12603

2 years agoSimplify and document OpenZFS library dependencies
Brian Behlendorf [Thu, 7 Oct 2021 17:31:26 +0000 (10:31 -0700)]
Simplify and document OpenZFS library dependencies

For those not already familiar with the code base it can be a
challenge to understand how the libraries are laid out.  This
has sometimes resulted in functionality being added in the
wrong place.  To help avoid that in the future this commit
documents the high-level dependencies for easy reference in
lib/Makefile.am.  It also simplifies a few things.

- Switched libzpool dependency on libzfs_core to libzutil.
  This change makes it clear libzpool should never depend
  on the ioctl() functionality provided by libzfs_core.

- Moved zfs_ioctl_fd() from libzutil to libzfs_core and
  renamed it lzc_ioctl_fd().  Normal access to the kmods
  should all be funneled through the libzfs_core library.
  The sole exception is the pool_active() which was updated
  to not use lzc_ioctl_fd() to remove the libzfs_core
  dependency.

- Removed libzfs_core dependency on libzutil.

- Removed the lib/libzfs/os/freebsd/libzfs_ioctl_compat.c
  source file which was all dead code.

- Removed libzfs_core dependency from mkbusy and ctime
  test utilities.  It was only needed for some trivial
  wrapper functions and that code is easy to replicate
  to shed the unneeded dependency.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Don Brady <don.brady@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12602

2 years agoRemove zdb and libzpool from initramfs image
Serapheim Dimitropoulos [Thu, 7 Oct 2021 16:52:03 +0000 (09:52 -0700)]
Remove zdb and libzpool from initramfs image

= Motivation

At Delphix we are heavy users of kernel crash dumps that are captured
through a crash kernel that is spawned whenever the main kernel panics.
The way that this works internally is that a certain amount of memory is
reserved while the main system is running so the initramfs of the crash
kernel can be loaded when a panic occurs.

In order to keep reserved memory at minimum we've been historically
trying to identify the binaries that are part of the kernel's initramfs
that are big and finding ways of either making them smaller or do not
include them in the initramfs image. An example is always stripping the
DWARF info of the ZFS kernel module copy that is included in the
initramfs image of both our running and our crash kernel (the difference
in size there is 76MB vs 4MB).

We've recently identified that libzpool has been the largest binary in
our initramfs images - currently sized around 17MB.

= This Patch

The ZFS scripts do not explicitly copy libzpool to initramfs. They copy
zdb which pulls in libzpool as a dependency. Given that both zdb and
libzpool are not really essential for initramfs (e.g. we'll still have
access to the once the root filesystem is unpacked) this patch removes
them from initramfs.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #12616

2 years agoDocument additional -c caveat
Rich Ercolani [Tue, 5 Oct 2021 18:48:17 +0000 (14:48 -0400)]
Document additional -c caveat

One might expect "send data as it is on disk, and cannot trigger
compression changes" to imply "does not attempt to compress data
that was not compressed on the sender."

One would be mistaken.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12570

2 years agoRescan enclosure sysfs path on import
Tony Hutter [Mon, 4 Oct 2021 19:32:16 +0000 (12:32 -0700)]
Rescan enclosure sysfs path on import

When you create a pool, zfs writes vd->vdev_enc_sysfs_path with the
enclosure sysfs path to the fault LEDs, like:

    vdev_enc_sysfs_path = /sys/class/enclosure/0:0:1:0/SLOT8

However, this enclosure path doesn't get updated on successive imports
even if enclosure path to the disk changes.  This patch fixes the issue.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #11950
Closes #12095

2 years agoReject zfs send -RI with nonexistent fromsnap
Rich Ercolani [Mon, 4 Oct 2021 17:17:30 +0000 (13:17 -0400)]
Reject zfs send -RI with nonexistent fromsnap

Right now, zfs send -I dataset@nonexistent dataset@existent fails, but
zfs send -RI dataset@nonexistent dataset@existent does not.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12574
Closes #12575

2 years agoZFS: Remove a redundant if condition (#12598)
Attila Fülöp [Sat, 2 Oct 2021 18:50:57 +0000 (20:50 +0200)]
ZFS: Remove a redundant if condition (#12598)

Commit 0c03d21ac99ebdbe left in a redundant if condition while
removing some code. Just remove it.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #12598

2 years agoProper support for DESTDIR and INSTALL_MOD_PATH
José Luis Salvador Rufo [Fri, 1 Oct 2021 17:44:34 +0000 (19:44 +0200)]
Proper support for DESTDIR and INSTALL_MOD_PATH

The environment variables DESTDIR and INSTALL_MOD_PATH must
be mutually exclusive.

https://www.gnu.org/prep/standards/html_node/DESTDIR.html
https://www.kernel.org/doc/Documentation/kbuild/modules.txt

This issue was discussed in this Buildroot thread:
https://lists.buildroot.org/pipermail/buildroot/2021-August/621350.html

I saw this behavior in other different projects, as:

- Yocto Project:
  https://www.yoctoproject.org/pipermail/meta-freescale/2013-August/004307.html

- Google IA Coral:
  https://coral.googlesource.com/linux-imx-debian/+/refs/heads/master/debian/rules

For the above reasons, INSTALL_MOD_PATH will be set as DESTDIR
by default.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: José Luis Salvador Rufo <salvador.joseluis@gmail.com>
Signed-off-by: Romain Naour <romain.naour@gmail.com>
Closes #12577

2 years agoZTS: Minimize udev_wait in zvol_misc tests
Ryan Moeller [Fri, 1 Oct 2021 15:36:02 +0000 (11:36 -0400)]
ZTS: Minimize udev_wait in zvol_misc tests

The zvol_misc tests, in particular zvol_misc_volmode, make use of a
common udev_wait function to wait for zvol devices in /dev to quiesce
on Linux.  On other platforms this function currently only sleeps for
one second before returning.  This is insufficient, and
zvol_misc_volmode has been flaky on FreeBSD as a result.

Replace udev_wait with block_device_wait, passing through the optional
device parameter where possible.  Rearrange a few checks to strengthen
the verifications we are making and avoid unnecessarily sleeping.  We
must keep udev_wait in a couple places to pass in Github CI workflows.
Remove zvol_misc_volmode from the maybe failing tests on FreeBSD in
zts-report.py.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #12583

2 years agolibspl: fix warning about missing spl_pagesize declaration
Érico Nogueira Rolim [Fri, 24 Sep 2021 22:59:26 +0000 (19:59 -0300)]
libspl: fix warning about missing spl_pagesize declaration

Though it's unlikely anyone will alter its signature, avoids any
possible type mismatch.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Érico Nogueira <erico.erc@gmail.com>
Closes #12567

2 years agoAssorted parameter changes for performance tests
John Wren Kennedy [Tue, 21 Sep 2021 22:17:36 +0000 (16:17 -0600)]
Assorted parameter changes for performance tests

* Add async runs for sequential_writes, random_readwrite_fixed and
  random_writes
* Remove some larger block sizes that give similar results to others
* Remove nthreads == 4 from random_writes_zil test

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: John Kennedy <john.kennedy@delphix.com>
Closes #12576

2 years agoHandle partial reads in zfs_read
Rich Ercolani [Mon, 20 Sep 2021 17:30:50 +0000 (13:30 -0400)]
Handle partial reads in zfs_read

Currently, dmu_read_uio_dnode can read 64K of a requested 1M in one
loop, get EFAULT back from zfs_uiomove() (because the iovec only holds
64k), and return EFAULT, which turns into EAGAIN on the way out. EAGAIN
gets interpreted as "I didn't read anything", the caller tries again
without consuming the 64k we already read, and we're stuck.

This apparently works on newer kernels because the caller which breaks
on older Linux kernels by happily passing along a 1M read request and a
64k iovec just requests 64k at a time.

With this, we now won't return EFAULT if we got a partial read.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12370
Closes #12509
Closes #12516

2 years agoUpstream: unmount snapshots before destroying them on macOS
Jorgen Lundman [Mon, 20 Sep 2021 15:29:59 +0000 (00:29 +0900)]
Upstream: unmount snapshots before destroying them on macOS

Add function zfs_destroy_snaps_nvl_os() call. The main issue is that
macOS needs to unmount any mounted snapshots before they can be
destroyed. Other platforms can handle this in the kernel, but sending
a storm of zed events to unmount seems undesirable when we can do it
in userland to start with.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Co-authored-by: ilovezfs <ilovezfs@icloud.com>
Closes #12550

2 years agoAdded test for being able to read various variants of zstd
Rich Ercolani [Mon, 20 Sep 2021 15:08:20 +0000 (11:08 -0400)]
Added test for being able to read various variants of zstd

As detailed in #12022 and #12008, it turns out the current zstd
implementation is quite nonportable, and results in various
configurations of ondisk header that only each platform can read.

So I've added a test which contains a dataset with a file written by
Linux/x86_64 and one written by FBSD/ppc64.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12030

2 years agoReally zero the zero page
Alexander Motin [Fri, 17 Sep 2021 17:17:18 +0000 (13:17 -0400)]
Really zero the zero page

While switching abd_zero_buf allocation KPI I've missed the fact
that kmem_zalloc() zeroed the allocation, while kmem_cache_alloc()
does not.  Add explicit bzero() after it.

I don't think it should have caused real problems, but leaking one
memory page content all over the pool is not good.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #12569

2 years agoAvoid panic in case of pool errors and missing L2ARC
George Amanakis [Thu, 16 Sep 2021 16:40:15 +0000 (18:40 +0200)]
Avoid panic in case of pool errors and missing L2ARC

In case an ARC buffer is allocated only on L2ARC, and there are
underlying errors in a pool with the cache device in faulty state, a
panic can occur in arc_read_done()->arc_hdr_destroy()->
arc_hdr_l2arc_destroy()->arc_hdr_clear_flags() when trying to free
the ARC buffer.

Fix this by discarding the buffer's identity in arc_hdr_destroy(), in
case the buffer is not empty, before calling arc_hdr_l2hdr_destroy().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #12392

2 years agoLinux 5.14 compat: META
Brian Behlendorf [Wed, 15 Sep 2021 21:09:31 +0000 (14:09 -0700)]
Linux 5.14 compat: META

Increase the Linux-Maximum version in the META file to 5.14.
All of the required compatibility patches have been merged
and the 5.14 kernel has been officially released.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12565

2 years agoTemporarily use root credentials to mount snapshots in .zfs
Allan Jude [Tue, 14 Sep 2021 23:10:00 +0000 (19:10 -0400)]
Temporarily use root credentials to mount snapshots in .zfs

When mounting a snapshot in the .zfs/snapshots control directory,
temporarily assume roots credentials to perform the VFS_MOUNT().

This allows regular users and users inside jails to access these
snapshots.

The regular usermount code is not helpful here, since it requires
that the user performing the mount own the mountpoint, which won't
be the case for .zfs/snapshot/<snapname>

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Sponsored-By: Modirum MDPay
Sponsored-By: Klara Inc.
Closes #11312

2 years agoUse fallthrough macro
Brian Behlendorf [Tue, 14 Sep 2021 16:17:54 +0000 (09:17 -0700)]
Use fallthrough macro

As of the Linux 5.9 kernel a fallthrough macro has been added which
should be used to anotate all intentional fallthrough paths.  Once
all of the kernel code paths have been updated to use fallthrough
the -Wimplicit-fallthrough option will because the default.  To
avoid warnings in the OpenZFS code base when this happens apply
the fallthrough macro.

Additional reading: https://lwn.net/Articles/794944/

Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12441

2 years agoIterate encrypted clones at zvol_create_minor
Jorgen Lundman [Mon, 13 Sep 2021 20:27:07 +0000 (05:27 +0900)]
Iterate encrypted clones at zvol_create_minor

Userland figures out which encryption-root keys are required to load,
and issues ZFS_IOC_LOAD_KEY.
The tail section of spa_keystore_load_wkey() will call
zvol_create_minors() on the encryption-root object.

Any clones of the encrypted zvol will not be plumbed. This commits
adds additional logic to detect if zvol has clones, and is encrypted,
then adds these to the list of zvols to call zvol_create_minors() on.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes #12471

2 years agoFixed data integrity issue when underlying disk returns error
Arun KV [Mon, 13 Sep 2021 20:02:39 +0000 (01:32 +0530)]
Fixed data integrity issue when underlying disk returns error

Errors in zil_lwb_write_done() are not propagated to
zil_lwb_flush_vdevs_done() which can result in zil_commit_impl()
not returning an error to applications even when zfs was not able
to write data to the disk.

Remove the ZIO_FLAG_DONT_PROPAGATE flag from zio_rewrite() to
allow errors to propagate and consolidate the error handling for
flush and write errors to a single location (rather than having
error handling split between the "write done" and "flush done"
handlers).

Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Signed-off-by: Arun KV <arun.kv@datacore.com>
Closes #12391
Closes #12443

2 years agoZTS: Waiting for zvols to be available
Brian Behlendorf [Mon, 13 Sep 2021 19:18:01 +0000 (12:18 -0700)]
ZTS: Waiting for zvols to be available

This is a follow up patch for PR #12515 which addresses some
additional ZTS tests which are unreliable are should explicitly
wait for the required zvols to be available.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: @Theo13111
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12553

2 years agoVerify embedded blkptr's in arc_read()
Brian Behlendorf [Fri, 10 Sep 2021 01:02:07 +0000 (18:02 -0700)]
Verify embedded blkptr's in arc_read()

The block pointer verification check in arc_read() should also
cover embedded block pointers.  While highly unlikely, accessing
a damaged block pointer can result in panic.  To further harden
the code extend the existing check to include embedded block
pointers and add a comment explaining the rational for this
sanity check.  Lastly, correct a flaw in zfs_blkptr_verify()
so the error count is checked even when checking a untrusted
config to verify the non-pool-specific portions of a block
pointer.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12535

2 years agoUpstream: Add snapshot and zvol events
Jorgen Lundman [Thu, 9 Sep 2021 17:44:21 +0000 (02:44 +0900)]
Upstream: Add snapshot and zvol events

For kernel to send snapshot mount/unmount events to zed.

For kernel to send symlink creates/removes on zvol plumbing.
(/dev/run/dsk/zvol/$pool/$zvol -> /dev/diskX)

If zed misses the ENODEV, all errors after are EINVAL. Treat any error
as kernel module failure.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes #12416

2 years agoLinux 5.15 compat: get_acl()
Brian Behlendorf [Thu, 9 Sep 2021 16:38:35 +0000 (09:38 -0700)]
Linux 5.15 compat: get_acl()

Kernel commits

332f606b32b6 ovl: enable RCU'd ->get_acl()
0cad6246621b vfs: add rcu argument to ->get_acl() callback

Added compatibility code to detect the new ->get_acl() interface
and correctly handle the case where the new rcu argument is set.

Reviewed-by: Coleman Kane <ckane@colemankane.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12548

2 years agoAllow sending corrupt snapshots even if metadata is corrupted
Allan Jude [Thu, 9 Sep 2021 14:17:31 +0000 (10:17 -0400)]
Allow sending corrupt snapshots even if metadata is corrupted

When zfs_send_corrupt_data is set, use the TRAVERSE_HARD flag,
so traverse_visitbp() will not fail with ECKSUM if a blockpointer
cannot be read, but rather will continue and send the objects it can.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Sponsored-By: Klara Inc.
Sponsored-By: WHC Online Solutions Inc.
Closes #12541

2 years agoarc: Drop an incorrect assert
Rich Ercolani [Wed, 8 Sep 2021 21:00:03 +0000 (17:00 -0400)]
arc: Drop an incorrect assert

Unfortunately, there was an overzealous assertion that was (in pretty
specific circumstances) false, causing failure.  This assertion was
added in error, so we're removing it.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #9897
Closes #12020
Closes #12246

2 years agoUpstream: zdb inode mapping
Jorgen Lundman [Wed, 8 Sep 2021 20:56:04 +0000 (05:56 +0900)]
Upstream: zdb inode mapping

Unfortunately macOS reserves inode ID numbers 0-15, and we can
not used them. In macOS port we simply map them really high IDs.

Normally this is hidden inside the _os implementation, but this is
the one place in the common source files.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes #12530

2 years agoCompressed receive with different ashift can result in incorrect PSIZE on disk
Paul Dagnelie [Wed, 8 Sep 2021 20:52:28 +0000 (13:52 -0700)]
Compressed receive with different ashift can result in incorrect PSIZE on disk

We round up the psize to the nearest multiple of the asize or to the
lsize, whichever is smaller. Once that's done, we allocate a new
buffer of the appropriate size, zero the tail, and copy the data
into it. This adds a small performance cost to these kinds of writes,
but fixes the bookkeeping problems.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Co-authored-by: Matthew Ahrens <matthew.ahrens@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #12522
Closes #8462

2 years agoLinux 5.15 compat: standalone <linux/stdarg.h>
Alexander [Wed, 8 Sep 2021 19:59:43 +0000 (21:59 +0200)]
Linux 5.15 compat: standalone <linux/stdarg.h>

Kernel commits

39f75da7bcc8 ("isystem: trim/fixup stdarg.h and other headers")
c0891ac15f04 ("isystem: ship and use stdarg.h")
564f963eabd1 ("isystem: delete global -isystem compile option")

(for now can be found in linux-next.git tree, will land into the
 Linus' tree during the ongoing 5.15 cycle with one of akpm merges)

removed the -isystem flag and disallowed the inclusion of any
compiler header files. They also introduced a minimal
<linux/stdarg.h> as a replacement for <stdarg.h>.
include/os/linux/spl/sys/cmn_err.h in the ZFS source tree includes
<stdarg.h> unconditionally. Introduce a test for <linux/stdarg.h>
and include it instead of the compiler's one to prevent module
build breakage.

Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Closes #12531

2 years agoLinux 5.15 compat: block device readahead
Brian Behlendorf [Wed, 8 Sep 2021 15:03:13 +0000 (08:03 -0700)]
Linux 5.15 compat: block device readahead

The 5.15 kernel moved the backing_dev_info structure out of
the request queue structure which causes a build failure.

Rather than look in the new location for the BDI we instead
detect this upstream refactoring by the existance of either
the blk_queue_update_readahead() or disk_update_readahead()
functions.  In either case, there's no longer any reason to
manually set the ra_pages value since it will be overridden
with a reasonable default (2x the block size) when
blk_queue_io_opt() is called.

Therefore, we update the compatibility wrapper to do nothing
for 5.9 and newer kernels.  While it's tempting to do the
same for older kernels we want to keep the compatibility
code to preserve the existing behavior.  Removing it would
effectively increase the default readahead to 128k.

Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12532

2 years agoDetect iSCSI in the zpool cmd vdev media script
Don Brady [Thu, 2 Sep 2021 23:11:53 +0000 (17:11 -0600)]
Detect iSCSI in the zpool cmd vdev media script

Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Don Brady <don.brady@delphix.com>
Closes #12206

2 years agoCI: don't install abigail-tools
George Melikov [Tue, 31 Aug 2021 20:56:45 +0000 (23:56 +0300)]
CI: don't install abigail-tools

We use docker image instead.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #12529

2 years agoUpdate ABI files via new libabigail version
George Melikov [Tue, 31 Aug 2021 19:26:30 +0000 (22:26 +0300)]
Update ABI files via new libabigail version

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #12529

2 years agoLibabigail: make .abi files more consistent
George Melikov [Tue, 31 Aug 2021 18:52:05 +0000 (21:52 +0300)]
Libabigail: make .abi files more consistent

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #12529

2 years agoCI: use fresh libabigail via docker image
George Melikov [Tue, 31 Aug 2021 17:53:12 +0000 (20:53 +0300)]
CI: use fresh libabigail via docker image

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #12529

2 years agoCheck for libabigail version
George Melikov [Tue, 31 Aug 2021 17:49:29 +0000 (20:49 +0300)]
Check for libabigail version

We need to use 1.8.0+ version, older versions
may segfault and give inconsistent results.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #12529

2 years agoautoconf: allow Release to contain hyphen
Jorgen Lundman [Thu, 2 Sep 2021 13:51:29 +0000 (22:51 +0900)]
autoconf: allow Release to contain hyphen

To avoid clashing with tags and releases, we'll use "zfs-macOS".

Meta:          1
Name:          zfs-macOS

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes #12437

2 years agoZTS: Remove exceptions for flaky zhack on FreeBSD
Ryan Moeller [Wed, 1 Sep 2021 20:20:00 +0000 (16:20 -0400)]
ZTS: Remove exceptions for flaky zhack on FreeBSD

Issue #11854 has been resolved, so we can remove the exceptions for it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #12527

2 years agoAdd zpool_disable_datasets_os() / zfs_unmount_os()
Jorgen Lundman [Tue, 31 Aug 2021 15:56:00 +0000 (00:56 +0900)]
Add zpool_disable_datasets_os() / zfs_unmount_os()

zpool_disable_datasets_os():
macOS needs to do a bunch of work to kick everything off zvols.

zfs_unmount_os():
This allows us to unmount any zvols that may be mounted. Like with
zfs destroy foo/vol

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes #12436

2 years agoFreeBSD: Don't remove SA xattr if not SA znode
Ryan Moeller [Mon, 30 Aug 2021 23:01:09 +0000 (19:01 -0400)]
FreeBSD: Don't remove SA xattr if not SA znode

We attempt to remove an existing SA xattr when setting a dir xattr, but
this only makes sense if the znode has been upgraded to the SA format.
Otherwise, we will hit an assert in zfs_sa_get_xattr.

Make sure this is an SA znode before attempting to remove the SA xattr.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #12514

2 years agoFix cross-endian interoperability of zstd
Rich Ercolani [Mon, 30 Aug 2021 21:13:46 +0000 (17:13 -0400)]
Fix cross-endian interoperability of zstd

It turns out that layouts of union bitfields are a pain, and the
current code results in an inconsistent layout between BE and LE
systems, leading to zstd-active datasets on one erroring out on
the other.

Switch everyone over to the LE layout, and add compatibility code
to read both.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12008
Closes #12022

2 years agoZTS: Enable punch-hole tests on FreeBSD
Ka Ho Ng [Sun, 22 Aug 2021 15:22:07 +0000 (23:22 +0800)]
ZTS: Enable punch-hole tests on FreeBSD

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ka Ho Ng <khng@FreeBSD.org>
Sponsored-by: The FreeBSD Foundation
Closes #12458

2 years agoFreeBSD: Implement hole-punching support
Ka Ho Ng [Wed, 4 Aug 2021 15:57:48 +0000 (23:57 +0800)]
FreeBSD: Implement hole-punching support

This adds supports for hole-punching facilities in the FreeBSD kernel
starting from __FreeBSD_version 1400032.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ka Ho Ng <khng@FreeBSD.org>
Sponsored-by: The FreeBSD Foundation
Closes #12458

2 years agoZTS: Waiting for zvols to be available
Brian Behlendorf [Sun, 29 Aug 2021 15:56:58 +0000 (08:56 -0700)]
ZTS: Waiting for zvols to be available

The ZTS block_device_wait helper function should use -e when waiting
for a file to appear since it will be either a block special device
or a symlink.  This didn't cause any failures but when a device path
was specified the function would wait longer than needed.

Additionally update the most flakey test cases to pass the file path
to block_device_wait to try and improve the test reliability.  The
udev behavior on Fedora in particular can result in frequent false
positives.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12515

2 years agoCorrect checking bdev_check_media_change message
Ryan Moeller [Fri, 27 Aug 2021 17:02:54 +0000 (13:02 -0400)]
Correct checking bdev_check_media_change message

We're not looking for bdev_disk_changed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #12492

2 years agoMake 'zpool labelclear -f' work on offlined disks
Tony Hutter [Fri, 27 Aug 2021 16:26:49 +0000 (09:26 -0700)]
Make 'zpool labelclear -f' work on offlined disks

This patch allows you to clear the label on offlined disks in an active
pool with `-f`.  Previously, labelclear wouldn't let you do that.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #12511

2 years agoExtend zpool-iostat to account for ZIO_PRIORITY_REBUILD (#12319)
Trevor Bautista [Thu, 26 Aug 2021 18:26:49 +0000 (11:26 -0700)]
Extend zpool-iostat to account for ZIO_PRIORITY_REBUILD (#12319)

Previously, zpool-iostat did not display any data regarding rebuild I/Os
in either the latency/size histograms (-w/-l/-r) or the queue data (-q).
This fix essentially utilizes the existing infrastructure for tracking
rebuild queue data and displays this data in the proper places within
zpool-iostat's output.

Signed-off-by: Trevor Bautista <tbautista@newmexicoconsortium.org>
Signed-off-by: Trevor Bautista <tbautista@lanl.gov>
Co-authored-by: Trevor Bautista <tbautista@newmexicoconsortium.org>
Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2 years agovdev_id: Return an error if config file is not found (#12508)
Anton Gubarkov [Wed, 25 Aug 2021 20:01:26 +0000 (23:01 +0300)]
vdev_id: Return an error if config file is not found (#12508)

Signed-off-by: Anton Gubarkov <anton.gubarkov@gmail.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
2 years agozpool-remove.8: describe top-level vdev sector size limitation
Sam Hathaway [Mon, 23 Aug 2021 21:59:18 +0000 (17:59 -0400)]
zpool-remove.8: describe top-level vdev sector size limitation

Document that top-level vdevs cannot be removed unless all top-level
vdevs have the same sector size.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Sam Hathaway <sam@sam-hathaway.com>
Closes #11339
Closes #12472

2 years agoInitialize parity blocks before RAID-Z reconstruction benchmarking
Mark Johnston [Mon, 23 Aug 2021 18:10:17 +0000 (14:10 -0400)]
Initialize parity blocks before RAID-Z reconstruction benchmarking

benchmark_raidz() allocates a row to benchmark parity calculation and
reconstruction.  In the latter case, the parity blocks are left
uninitialized, leading to reports from KMSAN.

Initialize parity blocks to 0xAA as we do for the data earlier in the
function.  This does not affect the selected RAID-Z implementation on
any of several systems tested.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #12473

2 years agoZTS: Add tests for creation time
Ryan Moeller [Mon, 26 Jul 2021 20:08:52 +0000 (16:08 -0400)]
ZTS: Add tests for creation time

Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #12432

2 years agoLinux 4.11 compat: statx support
Richard Yao [Sun, 17 Mar 2019 00:43:13 +0000 (20:43 -0400)]
Linux 4.11 compat: statx support

Linux 4.11 added a new statx system call that allows us to expose crtime
as btime. We do this by caching crtime in the znode to match how atime,
ctime and mtime are cached in the inode.

statx also introduced a new way of reporting whether the immutable,
append and nodump bits have been set. It adds support for reporting
compression and encryption, but the semantics on other filesystems is
not just to report compression/encryption, but to allow it to be turned
on/off at the file level. We do not support that.

We could implement semantics where we refuse to allow user modification
of the bit, but we would need to do a dnode_hold() in zfs_znode_alloc()
to find out encryption/compression information. That would introduce
locking that will have a minor (although unmeasured) performance cost.
It also would be inferior to zdb, which reports far more detailed
information. We therefore omit reporting of encryption/compression
through statx in favor of recommending that users interested in such
information use zdb.

Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Closes #8507

2 years agozfs.4: Fix typo s/compatiblity/compatibility/
Gordon Bergling [Tue, 17 Aug 2021 17:01:07 +0000 (19:01 +0200)]
zfs.4: Fix typo s/compatiblity/compatibility/

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Gordon Bergling <gbergling@googlemail.com>
Closes #12464

2 years agoRemove b_pabd/b_rabd allocation from arc_hdr_alloc()
Alexander Motin [Tue, 17 Aug 2021 16:15:54 +0000 (12:15 -0400)]
Remove b_pabd/b_rabd allocation from arc_hdr_alloc()

When a header is allocated for full overwrite it is a waste of time
to allocate b_pabd/b_rabd for it, since arc_write() will free them
without ever being touched.  If it is a read or a partial overwrite
then arc_read() and arc_hdr_decrypt() allocate them explicitly.

Reduced memory allocation in user threads also reduces ARC eviction
throttling there, proportionally increasing it in ZIO threads, that
is not good.  To minimize or even avoid it introduce ARC allocation
reserve, allowing certain arc_get_data_abd() callers to allocate a
bit longer in situations where user threads will already throttle.

Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #12398

2 years agoIncrease default volblocksize from 8KB to 16KB
Alexander Motin [Tue, 17 Aug 2021 15:59:46 +0000 (11:59 -0400)]
Increase default volblocksize from 8KB to 16KB

Many things has changed since previous default was set many years ago.
Nowadays 8KB does not allow adequate compression or even decent space
efficiency on many of pools due to 4KB disk physical block rounding,
especially on RAIDZ and DRAID.  It effectively limits write throughput
to only 2-3GB/s (250-350K blocks/s) due to sync thread, allocation,
vdev queue and other block rate bottlenecks.  It keeps L2ARC expensive
despite many optimizations and dedup just unrealistic.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #12406

2 years agoOptimize arc_l2c_only lists assertions
Alexander Motin [Tue, 17 Aug 2021 15:55:34 +0000 (11:55 -0400)]
Optimize arc_l2c_only lists assertions

It is very expensive and not informative to call multilist_is_empty()
for each arc_change_state() on debug builds to check for impossible.
Instead implement special index function for arc_l2c_only->arcs_list,
multilists, panicking on any attempt to use it.

Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #12421

2 years agoFix/improve dbuf hits accounting
Alexander Motin [Tue, 17 Aug 2021 15:50:31 +0000 (11:50 -0400)]
Fix/improve dbuf hits accounting

Instead of clearing stats inside arc_buf_alloc_impl() do it inside
arc_hdr_alloc() and arc_release().  It fixes statistics being wiped
every time a new dbuf is filled from the ARC.

Remove b_l1hdr.b_l2_hits. L2ARC hits are accounted at b_l2hdr.b_hits.
Since the hits are accounted under hash lock, replace atomics with
simple increments.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #12422

2 years agoAvoid vq_lock drop in vdev_queue_aggregate()
Alexander Motin [Tue, 17 Aug 2021 15:47:00 +0000 (11:47 -0400)]
Avoid vq_lock drop in vdev_queue_aggregate()

vq_lock is already too congested for two more operations per I/O.
Instead of dropping and reacquiring it inside vdev_queue_aggregate()
delegate the zio_vdev_io_bypass() and zio_execute() calls for parent
I/Os to callers, that drop the lock any way to execute the new I/O.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #12297

2 years agoUse more atomics in refcounts
Alexander Motin [Tue, 17 Aug 2021 15:44:34 +0000 (11:44 -0400)]
Use more atomics in refcounts

Use atomic_load_64() for zfs_refcount_count() to prevent torn reads
on 32-bit platforms.  On 64-bit ones it should not change anything.

When built with ZFS_DEBUG but running without tracking enabled use
atomics instead of mutexes same as for builds without ZFS_DEBUG.
Since rc_tracked can't change live we can check it without lock.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #12420

2 years agoZTS: Avoid unset $tmpdir in redacted_panic
Ryan Moeller [Mon, 16 Aug 2021 23:38:34 +0000 (19:38 -0400)]
ZTS: Avoid unset $tmpdir in redacted_panic

The redacted_send tests make use of a $tmpdir variable, except in
redacted_send/redacted_panic the variable is never defined.

Use $TEST_BASE_DIR instead.

Clean up the stream file after the test.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #12455

2 years agoRestore FreeBSD sysctl processing for arc.min and arc.max
Allan Jude [Mon, 16 Aug 2021 15:35:19 +0000 (11:35 -0400)]
Restore FreeBSD sysctl processing for arc.min and arc.max

Before OpenZFS 2.0, trying to set the FreeBSD sysctl vfs.zfs.arc_max
to a disallowed value would return an error.
Since the switch, it instead only generates WARN_IF_TUNING_IGNORED

Keep the ability to set the sysctl's specifically to 0, even though
that is less than the minimum, because some tests depend on this.

Also lost, was the ability to set vfs.zfs.arc_max to a value less
than the default vfs.zfs.arc_min at boot time. Restore this as well.

Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Closes #12161

2 years agozfs: add missed dependency of zfs module on zlib
Ryan Moeller [Fri, 13 Aug 2021 20:42:45 +0000 (16:42 -0400)]
zfs: add missed dependency of zfs module on zlib

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Martin Matuska <mm@FreeBSD.org>
Co-authored-by: Konstantin Belousov <kib@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
External-issue: https://reviews.freebsd.org/D31207
Closes #12442

2 years agoAdd zfs.sh -r flag to reload modules
Ryan Moeller [Fri, 13 Aug 2021 20:37:46 +0000 (16:37 -0400)]
Add zfs.sh -r flag to reload modules

zfs.sh already can load and unload, so why not both?

This is convenient when developing changes to the module and you want
to rapidly make some changes, rebuild the module, reload the module,
and test the changes.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #12450

2 years agoFix usage of find in tests/Makefile.am
Ryan Moeller [Fri, 13 Aug 2021 20:13:57 +0000 (16:13 -0400)]
Fix usage of find in tests/Makefile.am

The path is not optional on FreeBSD.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #12453

2 years agoRun arc_evict thread at higher priority
Tony Nguyen [Tue, 10 Aug 2021 17:36:26 +0000 (11:36 -0600)]
Run arc_evict thread at higher priority

Run arc_evict thread at higher priority, nice=0, to give it more CPU
time which can improve performance for workload with high ARC evict
activities.

On mixed read/write and sequential read workloads, I've seen between
10-40% better performance.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Tony Nguyen <tony.nguyen@delphix.com>
Closes #12397

2 years agoMake get_key_material_file fail more verbosely
Rich Ercolani [Thu, 5 Aug 2021 23:48:33 +0000 (19:48 -0400)]
Make get_key_material_file fail more verbosely

It turns out, there are a lot of possible reasons for fopen to fail.
Let's share which reason we failed for today.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12410

2 years agoEnable /proc/diskstats for zvols
Brian Behlendorf [Thu, 5 Aug 2021 21:35:34 +0000 (14:35 -0700)]
Enable /proc/diskstats for zvols

The /proc/diskstats accounting needs to be explicitly enabled
for block devices which do not use multi-queue.

Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12440
Closes #12066

2 years agoMan zpool-scrub.8: describe sequential scrub
George Melikov [Thu, 5 Aug 2021 21:30:28 +0000 (00:30 +0300)]
Man zpool-scrub.8: describe sequential scrub

Describe sequential scrub and add examples of scrub status.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #12429

2 years agoModify checksum obtain method of QAT
hedongzhang [Tue, 3 Aug 2021 17:46:33 +0000 (01:46 +0800)]
Modify checksum obtain method of QAT

CpaDcGeneratefooter function that obtain the checksum code
does not support the CPA_DC_STATELESS mode. So we get the
adler32 chencksum of the end of the zlib from dc_results.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Chengfei Zhu <chengfeix.zhu@intel.com>
Signed-off-by: hedong.zhang <h_d_zhang@163.com>
Closes #12343

2 years agoAllow disabling of unmapped I/O on FreeBSD
Mark Johnston [Mon, 2 Aug 2021 19:18:24 +0000 (15:18 -0400)]
Allow disabling of unmapped I/O on FreeBSD

We have a tunable which permits one to disable the use of unmapped I/O
for the buffer cache.  Respect it in ZFS as well.  This is useful for
KMSAN, which cannot easily maintain shadow state for unmapped pages.

No functional change intended, as unmapped I/O is permitted by default
and there's no real reason to disable it in practice except for
debugging.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #12446

2 years agoAvoid small buffer copying on write
Alexander Motin [Tue, 27 Jul 2021 23:05:47 +0000 (19:05 -0400)]
Avoid small buffer copying on write

It is wrong for arc_write_ready() to use zfs_abd_scatter_enabled to
decide whether to reallocate/copy the buffer, because the answer is
OS-specific and depends on the buffer size.  Instead of that use
abd_size_alloc_linear(), moved into public header.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #12425

2 years agoFix format specifier warnings
Brian Behlendorf [Tue, 27 Jul 2021 03:22:27 +0000 (20:22 -0700)]
Fix format specifier warnings

Commit 5dbf6c5a66d did not address these format specifier warnings
since they were introduced by an unrelated commit which had not
been rebased on 5dbf6c5a66d when merged.  Fix them.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12435

2 years agoRemove overlooked __sun_attr__ based macros
Brian Behlendorf [Tue, 27 Jul 2021 00:31:00 +0000 (17:31 -0700)]
Remove overlooked __sun_attr__ based macros

The __NORETURN, __CONST, and __PURE macros in the FreeBSD platform
code were based on the __sun_attr__ macro which was removed in
commit 5dbf6c5a6.  This caused a build failure because the
__NORETURN macro was still used in one place in kernel code.
The __CONST and __PURE macros were entirely unused.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12435

2 years agoUpdate ABI files with generated in CI worker
George Melikov [Sun, 18 Jul 2021 15:55:46 +0000 (18:55 +0300)]
Update ABI files with generated in CI worker

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #12388

2 years agostoreabi: add `no-corpus-path` flag
George Melikov [Sun, 18 Jul 2021 14:37:36 +0000 (17:37 +0300)]
storeabi: add `no-corpus-path` flag

Try to minimize diffs.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #12388

2 years agomacOS can also set va_type
Jorgen Lundman [Mon, 26 Jul 2021 23:38:06 +0000 (08:38 +0900)]
macOS can also set va_type

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes #12357

2 years agoMove check_file to os/$platform section
Jorgen Lundman [Mon, 26 Jul 2021 23:34:11 +0000 (08:34 +0900)]
Move check_file to os/$platform section

Keep check_file_generic() in shared code base, and allow special case
code in check_file() in os section. In future, macOS will have
additional checks in check_file().

Linux and FreeBSD wrappers just calls check_file_generic().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes #12385

2 years agoAdd comment on metaslab_class_throttle_reserve() locking
Alexander Motin [Mon, 26 Jul 2021 23:30:20 +0000 (19:30 -0400)]
Add comment on metaslab_class_throttle_reserve() locking

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Issue #12314
Closes #12419

2 years agoAssorted fixes for the performance tests
John Wren Kennedy [Mon, 26 Jul 2021 21:47:08 +0000 (15:47 -0600)]
Assorted fixes for the performance tests

- Bail out early if we're running the perf tests and forget to
  specify disks.
- Allow perf tests to run with any number of disks.
- Remove weekly vs. nightly settings
- Move variables with common values to perf.shlib
- Use zinject to clear the ARC over export/import
- Fix dbuf cache size calculation

When the meaning of `dbuf_cache_max_bytes` changed, the performance
test that covers the dbuf cache started to fail. The test would try to
write files for the test using the max possible size of the cache,
inevitably filling the pool and failing. This change uses
`dbuf_cache_shift` to correctly calculate the dbuf cache size.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: John Kennedy <john.kennedy@delphix.com>
Closes #12408

2 years agoRead past end of argv array in zpool_do_import()
Matthew Ahrens [Mon, 26 Jul 2021 19:51:39 +0000 (12:51 -0700)]
Read past end of argv array in zpool_do_import()

`zpool_do_import()` passes `argv[0]`, (optionally) `argv[1]`, and
`pool_specified` to `import_pools()`.  If `pool_specified==FALSE`, the
`argv[]` arguments are not used.  However, these values may be off the
end of the `argv[]` array, so loading them could dereference unmapped
memory.  This error is reported by the asan build:

```
=================================================================
==6003==ERROR: AddressSanitizer: heap-buffer-overflow
READ of size 8 at 0x6030000004a8 thread T0
    #0 0x562a078b50eb in zpool_do_import zpool_main.c:3796
    #1 0x562a078858c5 in main zpool_main.c:10709
    #2 0x7f5115231bf6 in __libc_start_main
    #3 0x562a07885eb9 in _start

0x6030000004a8 is located 0 bytes to the right of 24-byte region
allocated by thread T0 here:
    #0 0x7f5116ac6b40 in __interceptor_malloc
    #1 0x562a07885770 in main zpool_main.c:10699
    #2 0x7f5115231bf6 in __libc_start_main
```

This commit passes NULL for these arguments if they are off the end
of the `argv[]` array.

Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #12339

2 years agoAdd missing properties to zfs allow manpage
Václav Skála [Tue, 20 Jul 2021 20:21:18 +0000 (22:21 +0200)]
Add missing properties to zfs allow manpage

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Václav Skála <skala@vshosting.cz>
Closes #12402

2 years agoFixes in persistent L2ARC
George Amanakis [Mon, 26 Jul 2021 19:30:24 +0000 (21:30 +0200)]
Fixes in persistent L2ARC

In l2arc_add_vdev() first decide whether the device is eligible for
L2ARC rebuild or whole device trim and then add it to the list of cache
devices. Otherwise l2arc_feed_thread() might already start writing on
the device invalidating previous content as l2ad_hand = l2ad_start.
However l2arc_rebuild_vdev() needs the device present in the cache
device list to figure out its l2arc_dev_t. Fix this by moving most of
l2arc_rebuild_vdev() in a new function l2arc_rebuild_dev() which does
not need to search in the cache device list.

In contrast to l2arc_add_vdev() we do not have to worry about
l2arc_feed_thread() invalidating previous content when onlining a
cache device. The device parameters (l2ad*) are not cleared when
offlining the device and writing new buffers will not invalidate
all previous content. In worst case only buffers that have not had
their log block written to the device will be lost.

Retire persist_l2arc_00{4,5,8} tests since they cover code already
covered by the remaining ones. Test persist_l2arc_006 is renamed to
persist_l2arc_004 and persist_l2arc_007 is renamed to persist_l2arc_005.

Fix a typo in persist_l2arc_004, and remove an assertion that is not
always true from l2arc_arcstats_pos. Also update an assertion in
persist_l2arc_005 and explain why in a comment.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #12365

2 years agoRemove NOTE(CONSTCOND) and note.h
наб [Sat, 5 Jun 2021 14:02:41 +0000 (16:02 +0200)]
Remove NOTE(CONSTCOND) and note.h

These were mostly used to annotate do {} while(0)s

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Issue #12201

2 years agoNormalise /*FALLTHR{OUGH,U}*/
наб [Sat, 5 Jun 2021 13:07:38 +0000 (15:07 +0200)]
Normalise /*FALLTHR{OUGH,U}*/

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Issue #12201