Aaron LI [Sat, 3 Jul 2021 10:03:29 +0000 (18:03 +0800)]
nvmm: Rename a few things for clarity
Aaron LI [Sat, 3 Jul 2021 08:59:33 +0000 (16:59 +0800)]
nvmm: Make FPU state more OS-independent
* Introduce an OS-independent 'nvmm_x64_state_fpu' structure, derived
from NetBSD's current FPU implementation.
* Also introduce the 'nvmm_x86_xsave' structure, containing the FPU area
and the XSAVE header.
* Add the 'nvmm_x86_xsave_size()' that determines the XSAVE area size to
simplify the code.
* Rename gfpu -> gxsave, for clarity.
* Define 'CTASSERT' because 'nvmm.h' and 'nvmm_x86.h' headers will
be used by libnvmm(3), but <sys/cdefs.h> only defines 'CTASSERT' for
kernel.
* Update libnvmm.3 man page accordingly.
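A minimal sketch of how a userland fallback for 'CTASSERT' could look, assuming a C11 compiler. The macro body and the 'example_fpu_area' structure are illustrative assumptions, not the definitions used in 'nvmm.h'.

```c
#include <stdint.h>

/*
 * Userland fallback: only define CTASSERT when the system headers did not.
 * The macro body is an assumption, not the definition from 'nvmm.h'.
 */
#ifndef CTASSERT
#define CTASSERT(expr)	_Static_assert(expr, #expr)
#endif

/* Hypothetical stand-in for a fixed-layout area (not the real structure). */
struct example_fpu_area {
	uint8_t legacy[512];
};

/* The build fails here if the layout ever changes size. */
CTASSERT(sizeof(struct example_fpu_area) == 512);
```

The check costs nothing at run time, which is why it suits headers shared between the kernel and libnvmm(3).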
Aaron LI [Sat, 3 Jul 2021 08:11:46 +0000 (16:11 +0800)]
nvmm: Rewrite vmx_vmx{on,off}() as inline ASM functions
Aaron LI [Sat, 3 Jul 2021 08:04:02 +0000 (16:04 +0800)]
nvmm: Make svm_vmrun() void
Aaron LI [Sat, 3 Jul 2021 08:00:28 +0000 (16:00 +0800)]
nvmm: Add SVM CET definitions
Not actually used. For completeness.
Aaron LI [Sat, 3 Jul 2021 07:14:23 +0000 (15:14 +0800)]
nvmm: Redefine CPUID values to be OS-independent
Redefine all CPUID values locally to be OS-independent.
Remove those compat CPUID defines from nvmm_compat.h, no longer needed.
Aaron LI [Sat, 3 Jul 2021 06:30:35 +0000 (14:30 +0800)]
nvmm: Improve CPUID emulation #5: handle Fn0000_0001:EBX[23:16]
Handle CPUID Fn0000_0001:EBX[23:16] to report the logical CPU count.
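A minimal sketch of filling the 8-bit field EBX[23:16] with a CPU count while preserving the other EBX bits. The macro and function names are hypothetical and not taken from the NVMM sources.

```c
#include <stdint.h>

/* Hypothetical names for the EBX[23:16] field of CPUID Fn0000_0001. */
#define EX_CPUID_1_EBX_LPCOUNT_SHIFT	16
#define EX_CPUID_1_EBX_LPCOUNT_MASK	(0xFFU << EX_CPUID_1_EBX_LPCOUNT_SHIFT)

/* Replace the logical CPU count field, leaving all other bits untouched. */
static inline uint32_t
ex_cpuid1_ebx_set_lpcount(uint32_t ebx, uint32_t ncpus)
{
	if (ncpus > 0xFFU)
		ncpus = 0xFFU;	/* the field is only 8 bits wide */
	ebx &= ~EX_CPUID_1_EBX_LPCOUNT_MASK;
	ebx |= ncpus << EX_CPUID_1_EBX_LPCOUNT_SHIFT;
	return ebx;
}
```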
Aaron LI [Sat, 3 Jul 2021 06:29:15 +0000 (14:29 +0800)]
nvmm: Improve CPUID emulation #4: handle Fn0000_0004 on Intel
Handle CPUID Fn0000_0004 (Deterministic Cache Parameters) on Intel CPUs.
Aaron LI [Sat, 3 Jul 2021 05:07:42 +0000 (13:07 +0800)]
nvmm: Improve CPUID emulation #3: handle Fn8000_0008:ECX on AMD
Properly handle Fn8000_0008:ECX on AMD CPUs to report correct CPU count
info. Similar to Fn0000_000B:ECX on Intel CPUs.
Aaron LI [Sat, 3 Jul 2021 04:04:35 +0000 (12:04 +0800)]
nvmm: Improve CPUID emulation #2: mask upper bits of guest EAX/ECX
Use uint32_t instead of uint64_t for guest EAX/ECX and mask the upper
bits, to prevent wrong results if the upper bits happen to contain
garbage. Not encountered in the wild so far, but could happen.
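The masking can be illustrated with a small sketch. The helper name is hypothetical; the point is only that CPUID consumes the low 32 bits of RAX/RCX, so the upper halves must not take part in leaf comparisons.

```c
#include <stdint.h>

/*
 * Hypothetical helper: keep only the low 32 bits of a 64-bit guest
 * register before using it as a CPUID leaf or subleaf number.
 */
static inline uint32_t
ex_guest_reg32(uint64_t reg)
{
	return (uint32_t)(reg & 0xFFFFFFFFULL);
}
```

Without the truncation, a guest RAX of 0xDEADBEEF0000000B would not compare equal to leaf 0x0000000B.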
Aaron LI [Sat, 3 Jul 2021 03:51:53 +0000 (11:51 +0800)]
nvmm: Improve CPUID emulation #1: flags
* Mask PQE (Platform Quality of Service Enforcement); shouldn't be
exposed.
* Add LA57, for completeness.
* Add more flags in Fn8000_0001:EDX, for AMD CPUs.
Aaron LI [Sat, 3 Jul 2021 03:16:33 +0000 (11:16 +0800)]
nvmm: Clarify state handling
* Make a clear distinction between global host state and per-cpu host
state. The former gets saved in a global structure, while the latter
stays in the per-cpu structure.
* Make the host XCR0 part of the global host state, and stop using
rdxcr() in each world switch because it's unnecessary.
Aaron LI [Sat, 3 Jul 2021 02:03:40 +0000 (10:03 +0800)]
nvmm: Clarify the RESET state
Just use plain values instead of macros.
This also eliminates the PAT* compat code in 'nvmm_compat.h'.
Aaron LI [Sat, 3 Jul 2021 01:50:03 +0000 (09:50 +0800)]
nvmm: Add #CP (control protection exception)
Aaron LI [Sat, 3 Jul 2021 00:29:53 +0000 (08:29 +0800)]
libnvmm: Clarify x86 MOVS emulation
Aaron LI [Wed, 30 Jun 2021 13:57:02 +0000 (21:57 +0800)]
libnvmm: Fix a memory leak in nvmm_machine_create()
Also free the allocated 'pages' when ioctl(NVMM_IOC_MACHINE_CREATE)
fails.
Aaron LI [Tue, 29 Jun 2021 23:29:27 +0000 (07:29 +0800)]
nvmm: Improve FPU support and reduce diff against NetBSD
I was using 'struct savexmm64' to translate NetBSD's 'struct xsave_header'.
This works but isn't good enough, because 'savexmm64' can't deal with
xstate, so I disabled the xstate header related code in the old code.
This commit changes to use 'struct saveymm64' instead. It contains the
XSAVE header and YMM xstate component, allowing us to enable the
originally disabled xstate header code in NVMM.
In addition, define some compat macros to adapt NetBSD's FPU structures
to ours, reducing the NVMM code difference against NetBSD.
Will later work on AVX support in guest VM.
Aaron LI [Tue, 29 Jun 2021 14:33:06 +0000 (22:33 +0800)]
doc: Import nvmm TODO note from NetBSD-current
Aaron LI [Wed, 16 Jun 2021 22:49:59 +0000 (06:49 +0800)]
libnvmm.3: Mention regression tests in FILES section
Aaron LI [Sat, 19 Jun 2021 12:31:52 +0000 (20:31 +0800)]
testcases/libnvmm: Improve makefile to not write in source tree
Both build and dfregress(8) would write output in place in the source
tree, which however may be on a readonly mount via NFS. Improve the
makefile to cpdup the whole directory to /tmp and then do everything
there. (credit to Matt Dillon for the idea)
Aaron LI [Wed, 16 Jun 2021 14:35:26 +0000 (22:35 +0800)]
testcases/libnvmm: Add to dfregress(8) test framework
Rewrite and add makefiles to add these testcases to dfregress(8) test
framework. Add a handy 'make test' target to easily run the tests.
Remove unused ATF test scripts.
Aaron LI [Wed, 16 Jun 2021 14:33:53 +0000 (22:33 +0800)]
testcases/libnvmm: Port to DragonFly
Minor tweaks similar to the porting of libnvmm(3).
Aaron LI [Tue, 15 Jun 2021 23:17:28 +0000 (07:17 +0800)]
Import libnvmm tests from NetBSD-current
Branch: NetBSD-current
Date: 2021-06-25
Path: tests/lib/libnvmm
Aaron LI [Mon, 7 Jun 2021 13:41:21 +0000 (21:41 +0800)]
libnvmm.3: Mention 'calc-vm' and 'demo' test code in FILES
Also sort the items in FILES section.
Aaron LI [Mon, 7 Jun 2021 13:40:58 +0000 (21:40 +0800)]
test/nvmm/demo: Improve progress logs to help test/debug
* Add several more progress logs.
* Reduce accepting trap count to 6, reducing the total test time.
* Update the example output in README.
Aaron LI [Sun, 13 Jun 2021 08:06:05 +0000 (16:06 +0800)]
test/nvmm/demo: Rewrite makefiles to not write in source tree
Rewrite the makefiles so they no longer write to the source tree but
output to the /tmp directory. This is useful for building with a
NFS-exported readonly mount of the source tree (e.g., used by dillon).
Aaron LI [Tue, 8 Jun 2021 06:29:40 +0000 (14:29 +0800)]
test/nvmm/demo: Port 'smallkern' to DragonFly
Aaron LI [Tue, 8 Jun 2021 04:52:46 +0000 (12:52 +0800)]
test/nvmm/demo: Make 'smallkern' more self-contained
Provide local 'asm.h' and 'trap.h' headers (derived from NetBSD),
extract necessary PTE_* and PSL_* defines, making 'smallkern' much more
self-contained, which greatly reduces the needed modifications for
porting it to DragonFly. Moreover, it helps to keep the ported code
working on both operating systems.
Aaron LI [Tue, 8 Jun 2021 04:47:45 +0000 (12:47 +0800)]
test/nvmm/demo: Various cleanups to 'smallkern'
* Remove unused variables, symbols, function prototypes and functions.
* Move function prototypes and 'extern' declarations to header files.
* Add 'static' qualifier for file-local variables.
* Add inclusion guard to header files.
* Various minor adjustments.
Aaron LI [Sun, 6 Jun 2021 04:02:25 +0000 (12:02 +0800)]
test/nvmm/demo: Fix ELF load/mmap issue on DragonFly
The 'smallkern' ELF built on DragonFly has a zero-sized GNU_STACK
segment, which causes mmap() to fail (EINVAL). Add conditionals
in elf_parse() to ignore such a segment (while also checking for
unsupported non-LOAD segments).
Now 'toyvirt' correctly loads the 'smallkern' ELF on DragonFly.
In addition, assert in toyvirt_mem_add() that the size must be
greater than zero.
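A sketch of the segment filtering described above, under stated assumptions: the structure, enum and function names are hypothetical, and the order of the two checks is a guess rather than the actual elf_parse() logic. Only the PT_LOAD and PT_GNU_STACK values are the standard ELF ones.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EX_PT_LOAD	1U		/* same value as PT_LOAD */
#define EX_PT_GNU_STACK	0x6474e551U	/* same value as PT_GNU_STACK */

/* Hypothetical, trimmed-down program header. */
struct ex_phdr {
	uint32_t type;
	uint64_t memsz;
};

enum ex_action { EX_MAP, EX_SKIP, EX_REJECT };

/* Decide what a loader should do with one program header. */
static enum ex_action
ex_classify_segment(const struct ex_phdr *ph)
{
	assert(ph != NULL);
	if (ph->memsz == 0)
		return EX_SKIP;		/* nothing to map; a zero-length mmap() fails */
	if (ph->type != EX_PT_LOAD)
		return EX_REJECT;	/* unsupported non-LOAD segment */
	return EX_MAP;
}
```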
Aaron LI [Tue, 8 Jun 2021 01:34:31 +0000 (09:34 +0800)]
test/nvmm/demo: Port 'toyvirt' to DragonFly
Just some minor substitutions.
Use 'ifdef's to make it work on both NetBSD and DragonFly.
Aaron LI [Fri, 28 May 2021 15:10:04 +0000 (23:10 +0800)]
test/nvmm/demo: Fix some compilation warnings
Aaron LI [Thu, 27 May 2021 23:33:45 +0000 (07:33 +0800)]
test/nvmm/demo: Update 'toyvirt' to current libnvmm(3)
Various adjustments to the demo code to make it work again with the
current libnvmm(3) API in NetBSD 9.1.
In addition, add one more return check of nvmm_vcpu_configure() and
improve the logging messages a bit. Update the example output in
README accordingly.
Tested on NetBSD 9.1. Porting to DragonFly follows.
Aaron LI [Fri, 28 May 2021 10:39:58 +0000 (18:39 +0800)]
test/nvmm/demo: Update makefiles and README
* Adjust makefiles to be a bit more generic.
* Adjust compiler flags to enable more warnings and debug info.
* Add top-level makefile to ease the build.
* Update README.
Aaron LI [Thu, 27 May 2021 23:28:27 +0000 (07:28 +0800)]
test/nvmm: Add a demo for demonstration of libnvmm(3) API
The demo consists of two components:
* toyvirt: a toy virtualizer, that executes in a VM the 64bit ELF binary
given as argument;
* smallkern: an example of such binary.
Obtained from: https://www.netbsd.org/~maxv/nvmm/nvmm-demo.zip
Aaron LI [Sun, 13 Jun 2021 07:12:44 +0000 (15:12 +0800)]
test/nvmm: Add a Makefile and a test script for 'calc-vm'
We write the built binary in /tmp instead of current directory. This is
useful for building with a NFS-exported readonly mount of the source
tree (e.g., used by dillon).
Also add a test script that runs the 'calc-vm' test program in a loop.
It helped reveal the VMCS remote clear bug.
Aaron LI [Sun, 30 May 2021 23:18:21 +0000 (07:18 +0800)]
test/nvmm: Add progress logs in 'calc-vm'
Help test/debug NVMM/libnvmm.
Aaron LI [Sun, 30 May 2021 00:51:01 +0000 (08:51 +0800)]
test/nvmm: Enhance error checks in 'calc-vm'
Enhance error checks to help test/debug NVMM/libnvmm.
Aaron LI [Thu, 27 May 2021 23:22:52 +0000 (07:22 +0800)]
test/nvmm: Add 'calc-vm' (simple VM-based calculator)
A simple calculator. Creates a VM which performs the addition of the two
ints given as argument.
Obtained from: https://www.netbsd.org/~maxv/nvmm/calc-vm.c
Blog: https://blog.netbsd.org/tnf/entry/from_zero_to_nvmm
Aaron LI [Sun, 13 Jun 2021 14:34:25 +0000 (22:34 +0800)]
Bump __DragonFly_version for adding nvmm(4) and libnvmm(3)
Matthew Dillon [Thu, 24 Jun 2021 00:51:40 +0000 (17:51 -0700)]
nvmm - Fix TSC synchronization issues
* Save the guest TSC offset in cpudata as 'gtsc_offset', replacing the
original absolute TSC value stored as 'gtsc'.
* QEMU and other emulators probably have no intention of actually
forcing the TSC state in the SETSTATE call, so don't act on it
if it matches the value we previously returned.
This allows the guest to inherit a completely synchronized TSC from
the host. Without it, the TSC's for the VCPUs wind up being badly
out of sync.
* Updating MSR_TSC completely blows up TSC mp synchronization. We
assume QEMU did not intend to update the TSC if it tries to write
0 or tries to write the value returned in the previous getstate.
* This allows kernels to use the TSC as a clock, which costs nothing,
versus the ACPI or HPET which have horrible overhead and a global
mutex in QEMU.
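The offset-based scheme can be sketched roughly as follows. This is a simplified model, not the NVMM code: the structure and function names are hypothetical, only 'gtsc_offset' comes from the commit message, and the two "ignore the write" cases are folded into one helper.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-vCPU TSC bookkeeping. */
struct ex_vcpu_tsc {
	uint64_t gtsc_offset;	/* guest TSC = host TSC + gtsc_offset */
	uint64_t last_reported;	/* value returned by the previous getstate */
};

/* Report the guest TSC and remember what was handed out. */
static uint64_t
ex_tsc_getstate(struct ex_vcpu_tsc *t, uint64_t host_tsc)
{
	t->last_reported = host_tsc + t->gtsc_offset;
	return t->last_reported;
}

/*
 * Accept a new guest TSC only if the emulator seems to really mean it.
 * A write of 0 or of the value previously returned is ignored, so the
 * guest keeps the host-synchronized TSC. Returns true if the offset changed.
 */
static bool
ex_tsc_setstate(struct ex_vcpu_tsc *t, uint64_t host_tsc, uint64_t new_gtsc)
{
	if (new_gtsc == 0 || new_gtsc == t->last_reported)
		return false;
	t->gtsc_offset = new_gtsc - host_tsc;
	return true;
}
```

Storing an offset instead of an absolute value means that an ignored write leaves every vCPU tracking the same host TSC.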
Matthew Dillon [Wed, 23 Jun 2021 06:35:30 +0000 (23:35 -0700)]
nvmm - Change max emulated RAM from 128GB to 128TB
* Increase the max emulated RAM from 128GB to 128TB. Ok, I'm not
sure what the actual maximum is, but it sure as heck is more
than 128GB.
* Successfully booted an 8TB qemu on the threadripper (host ate
~275GB to boot it, mostly initializing the vm_page_array[]).
This points to other things we could work on in the kernel
to reduce memory overhead. Our really fat struct vm_page's,
for one.
Aaron LI [Sun, 27 Jun 2021 05:36:00 +0000 (13:36 +0800)]
nvmm: Revamp host TLB flush mechanism
* Leverage the pmap layer to track guest pmap generation id and the host
CPUs that the guest pmap is active on. This avoids the inefficient
_tlb_flush() callbacks from NVMM that invalidate all TLB entries.
* Currently just add all CPUs to the backing pmap for guest physical
memory as they are encountered. Do not yet try to remove any CPUs,
because multiple vCPUs may wind up (temporarily) scheduled to the same
physical CPU. So more sophisticated tracking is needed.
* Fix a bug in SVM's host TLB flush handling where breaking out of the
loop and returning, then re-entering the loop on the same cpu, could
improperly clear the machine flush request.
Credit to Matt Dillon.
Aaron LI [Fri, 25 Jun 2021 10:30:26 +0000 (18:30 +0800)]
pmap: Add some API routines to help NVMM manage guest memory
Add the following three routines for NVMM to use. NVMM can use these
routines to manipulate the cpumask for the pmap backing guest physical
memory.
* pmap_add_cpu()
* pmap_del_cpu()
* pmap_del_all_cpus()
NOTE: The scheduler might sometimes overload multiple vCPUs on the same
physical cpu, so operating is not quite as simple as calling
add_cpu/del_cpu in the core vmrun routines.
Credit to Matt Dillon
Aaron LI [Fri, 25 Jun 2021 10:08:46 +0000 (18:08 +0800)]
pmap: Change pmap->pm_invgen to uint64_t to be compatible with NVMM
Change the 'pmap->pm_invgen' member from 'long' to 'uint64_t', to be
compatible with NVMM's machgen.
Update the atomic operation on 'pm_invgen' accordingly; there is no need to
use the '_acq' acquire version (which includes a read barrier).
Credit to Matt Dillon.
Matthew Dillon [Tue, 29 Jun 2021 06:04:31 +0000 (23:04 -0700)]
nvmm - Move *_fpu_enter/leave inside the cli/sti
* Move the host-to-guest and guest-to-host FP code inside the
hard interrupt disablement. The main reason this needs to
be done is that DragonFly's normal interrupt mechanism is
allowed to use the FP unit (using npxpush/npxpop).
In addition, interrupts will allow the 'interrupt thread' to
preempt the current kernel thread outside of a critical section.
And inside a critical section the interrupt still fires, but
just sets a flag.
* I don't want the host kernel dealing with guest FP state at all,
under any circumstances.
Aaron LI [Sun, 27 Jun 2021 03:59:13 +0000 (11:59 +0800)]
nvmm: Check for pending host events before VM entry
mycpu->gd_reqflags can accumulate action items (pending host interrupts,
AST (asynchronous software trap), etc.). Even if not in a critical
section, some action items can accumulate. When in a critical section,
even more action items can accumulate. Thus, gd_reqflags MUST be
checked *after* hard interrupt disablement to determine if the VM entry
has to be aborted, making the state safe for VM entry.
Credit to Matt Dillon.
Aaron LI [Sun, 27 Jun 2021 03:54:20 +0000 (11:54 +0800)]
nvmm: Improve nvmm_return_needed() by using nvmm_break_wanted()
Use the newly added nvmm_break_wanted() routine to check for pending
host events, improving nvmm_return_needed(). Just stuff
nvmm_break_wanted() into nvmm_return_needed() and get rid of
preempt_needed(), making the code clearer.
Also add __predict_false() macro to help performance a bit.
Matthew Dillon [Wed, 23 Jun 2021 05:19:33 +0000 (22:19 -0700)]
kernel - Add RQF_XINVLTLB to gd_reqflags
Add RQF_XINVLTLB to gd_reqflags. This bit is set on every CPU related
to a pmap after a pmap_inval*() operation makes an adjustment in that
pmap, as part of the IPI sequence.
Will be used by NVMM.
Aaron LI [Sat, 26 Jun 2021 23:58:06 +0000 (07:58 +0800)]
NVMM: Sync with NetBSD #2: SVM & VMX backends
This commit syncs the NVMM kernel part to match NetBSD current (as of
2021-06-25). The main changes are as follows:
* Improve host FPU handling. The host FPU state is now saved in PCB
instead of in vCPU data area.
* Clear TS flag from the host's CR0 in _vcpu_init(), because it is also
cleared inside the _vcpu_run() loop. Not clearing it could trigger
DNAs on VMEXITs.
* Set VMCS_HOST_IDTR_BASE on each CPU independently, because the IDT is
now per-CPU (in NetBSD).
NOTE: DragonFly is also using per-CPU IDT, so this change fixes a
porting issue.
* Disable interrupts earlier to prevent possible race against TLB flush
IPIs, because such IPIs don't respect the IPL, so enforcing IPL_HIGH
has no effect.
* VMX: Improve CR0 handling:
- Flush the guest TLB when certain CR0 bits change.
- Employ VMCS_CR0_SHADOW to allow the guest to update certain static
CR0 bits. Guest gets the illusion that the CR0 change was applied,
but the "real" CR0 bits remain unchanged.
- Force CR0_ET to 1 in shadow CR0; force CR0_ET and CR0_NE in real
CR0.
- Add comments to clarify better.
NOTE:
NetBSD has overhauled the FPU handling, so NVMM no longer needs to save
host FPU state in the _cpudata structure. I haven't found a way to do
this on DragonFly yet, so leave it and investigate it later.
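The CR0 shadow idea from the VMX item above can be modelled in a few lines. On VMX, a guest read of CR0 returns the read-shadow value for every bit set in the guest/host mask and the real value for the remaining bits. The function name and the bit values in the usage note are illustrative assumptions, not code from the VMX backend.

```c
#include <stdint.h>

/*
 * Value the guest observes when it reads CR0: shadow bits where the
 * guest/host mask is set, real CR0 bits everywhere else.
 */
static inline uint64_t
ex_guest_visible_cr0(uint64_t real_cr0, uint64_t shadow_cr0, uint64_t mask)
{
	return (shadow_cr0 & mask) | (real_cr0 & ~mask);
}
```

For instance, with ET (bit 4) and NE (bit 5) host-owned, a guest that cleared NE still reads it as 0 from the shadow, while the real CR0 keeps NE set.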
Aaron LI [Sat, 26 Jun 2021 11:14:24 +0000 (19:14 +0800)]
NVMM: Sync with NetBSD #1: copyright headers
Aaron LI [Mon, 14 Jun 2021 23:26:03 +0000 (07:26 +0800)]
nvmm: Fix SVM TSS restore on DragonFly
In DragonFly, PCPU(tss_gdt) points directly to the gdt[] entry for the
current CPU's TSS descriptor; while NetBSD's CPUVAR(GDT) points to the
gdtstore[] table. Fix that 'and' instruction so it works on DragonFly.
(Credit to Matt Dillon for debugging and fixing this.)
The 'and' instruction clears the busy bit (bit 41) so the TSS descriptor
becomes "available" for the reloading, as required by 'ltr' instruction.
(The TSS descriptor was in use prior to launching the guest so it has
been marked busy.)
Credit:
* Illumos: Bug #13029: AMD bhyve should reload TSS ASAP
https://www.illumos.org/issues/13029
* Illumos: 13029 AMD bhyve should reload TSS ASAP
https://github.com/illumos/illumos-gate/commit/
4d3fdeb14779bb6b0838521971d9ac99d65b0572
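The effect of that 'and' instruction, expressed in C for illustration. The names are hypothetical; the only fact relied on is that the busy flag is bit 41 of the low 8 bytes of the TSS descriptor (bit 1 of the type field at bits 40-43).

```c
#include <stdint.h>

/* Busy flag of a TSS descriptor: bit 41 of its low quadword. */
#define EX_TSS_BUSY_BIT	(1ULL << 41)

/* Turn a "busy" TSS descriptor back into an "available" one for 'ltr'. */
static inline uint64_t
ex_tss_desc_mark_available(uint64_t desc_lo)
{
	return desc_lo & ~EX_TSS_BUSY_BIT;
}
```

This changes the descriptor type from 0xB (busy 64-bit TSS) to 0x9 (available 64-bit TSS).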
Aaron LI [Sun, 13 Jun 2021 06:25:39 +0000 (14:25 +0800)]
nvmm: Implement waits for lwkt_send_ipiq_mask()
Unlike lwkt_send_ipiq(), lwkt_send_ipiq_mask() doesn't have a sequence
number to wait for completion, and a wait mechanism like that would be
very expensive.
Here we choose a simple method. Just have {vmx,svm}_change_cpu()
decrement a global with an atomic op and issue a wakeup() when it hits
0. And the callers can just tsleep in a loop until it's zero.
Credit to Matt Dillon for the patch.
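A userland model of the countdown, using C11 atomics in place of the kernel's atomic ops, tsleep() and wakeup(). All names are hypothetical, and the real code runs the decrement from the IPI handlers on the target CPUs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Number of target CPUs that have not yet run the callback. */
static atomic_uint ex_pending;

/* Sender: arm the counter before broadcasting the IPI. */
static void
ex_ipi_begin(unsigned int ncpus)
{
	atomic_store(&ex_pending, ncpus);
}

/*
 * Target CPU: called at the end of the IPI callback. Returns true for
 * the last CPU to finish, which is the one that should issue the wakeup.
 */
static bool
ex_ipi_complete_one(void)
{
	unsigned int prev = atomic_fetch_sub(&ex_pending, 1);

	assert(prev > 0);
	return prev == 1;
}

/* Sender: loop condition for the sleep loop. */
static bool
ex_ipi_all_done(void)
{
	return atomic_load(&ex_pending) == 0;
}
```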
Aaron LI [Sun, 13 Jun 2021 06:16:59 +0000 (14:16 +0800)]
nvmm: Fix VMX VMCS remote clear issues
When clearing a VMCS from a remote CPU, must wait for the IPI to
complete. Otherwise the VMCS may be wrong when the thread migrates to
another CPU and thus cause panics when executing VMX instructions.
Credit to Matt Dillon for the debugging and fix.
Aaron LI [Sun, 13 Jun 2021 07:00:51 +0000 (15:00 +0800)]
nvmm: Fix issues of porting 'curcpu()' as 'mycpu'
In NVMM porting step #10, I ported NetBSD's 'curcpu()' as our 'mycpu'.
This was incorrect, because the 'struct globaldata *' pointer returned
by 'mycpu' is NOT stable and can change. (see the comments in
'pc64/include/thread.h')
Use 'mycpuid' to implement 'curcpu()' and adjust the code accordingly.
Aaron LI [Sun, 6 Jun 2021 11:38:17 +0000 (19:38 +0800)]
nvmm: Fix '-Wnested-externs' warning in nvmm_x86_vmx.c
The 'extern uint8_t vmx_resume_rip' declaration in vmx_vcpu_init()
causes a 'nested extern declaration'. Fix it by changing it to a
function prototype, which is actually an assembly function.
Aaron LI [Sun, 6 Jun 2021 11:31:54 +0000 (19:31 +0800)]
nvmm: Port to DragonFly #24: pmap transform & TLB invalidation
* Port NetBSD's pmap_ept_transform() to DragonFly's. We don't make
'pmap_ept_has_ad' a global in the pmap code, so need to pass extra
flags to our pmap_ept_transform().
* Replace NetBSD's pmap_tlb_shootdown() with our pmap_inval_smp().
* Add two new fields 'pm_data' & 'pm_tlb_flush' to 'struct pmap', which
are used as a callback by NVMM to handle its own TLB invalidation.
Note that pmap_enter() also calls pmap_inval_smp() on EPT/NPT pmap
and requires the old PTE be returned, so we can't place the NVMM TLB
callback at the beginning part of pmap_inval_smp() and return 0.
Aaron LI [Sun, 6 Jun 2021 12:06:59 +0000 (20:06 +0800)]
pmap: Implement pmap_npt_transform() for NVMM
This function will transform an initialized pmap structure for use by
NVMM's AMD SVM backend.
AMD's NPT (nested page table), aka RVI (rapid virtualization indexing)
implementation is more complete than Intel's EPT; it supports A/D bits
and uses the same bit positions as native x86 page tables. So this
function is a simplified version of pmap_ept_transform().
Aaron LI [Sun, 6 Jun 2021 04:51:50 +0000 (12:51 +0800)]
pc64/vmm: Use pmap_ept_transform() to simplify EPT code
Aaron LI [Sun, 6 Jun 2021 03:30:13 +0000 (11:30 +0800)]
pmap: Implement pmap_ept_transform() for NVMM
The pmap_ept_transform() transforms an initialized pmap structure to be
EPT type for Intel VMX hypervisor (e.g., NVMM) use. This implementation
is derived from vmx_ept_init() and vmx_ept_pmap_pinit() in
'pc64/vmm/ept.c'.
Note that this function has a different prototype from NetBSD's,
because we don't make 'pmap_ept_has_ad' a global variable so we need to
pass extra flags to the pmap.
When zeroing out the page directories, note that the valid area is two
pages if there is a pm_pmlpv_iso PTE installed (i.e., the system has
meltdown mitigation enabled), otherwise, it's only one page. (credit to
Matt Dillon)
Aaron LI [Tue, 25 May 2021 23:43:54 +0000 (07:43 +0800)]
nvmmctl(8): Rewrite makefile and hook to build
Aaron LI [Tue, 25 May 2021 23:43:29 +0000 (07:43 +0800)]
nvmmctl(8): Port to DragonFly
* Adjust header inclusions
* Add 'XCR0_FLAGS1' macro define
* Add several '__unused' attributes
* Rename '__dead' to '__dead2'
Aaron LI [Sat, 12 Jun 2021 10:11:10 +0000 (18:11 +0800)]
libnvmm: Adapt to also build on NetBSD
Adapt the libnvmm code to build and work on both DragonFly and NetBSD.
So it can help debug the porting issues.
Aaron LI [Sun, 30 May 2021 23:25:14 +0000 (07:25 +0800)]
libnvmm: Fix mmap() failure with 'permission denied'
The mmap() in nvmm_vcpu_create() was always failing with the EACCES
(permission denied) error code. It was because mmap() was requesting
prot = PROT_READ|PROT_WRITE and flags = MAP_SHARED, but the fd was
opened with O_RDONLY (or O_WRONLY in nvmm_root_init()) and thus
disallowed such a mmap request.
Fix this issue by opening the nvmm fd with O_RDWR flag. This also
requires changing the mode of '/dev/nvmm' from 0640 to 0660.
However, this makes distinguishing the root owner in the nvmm kernel module
useless. So change to identify the root owner by checking whether the
caller has root privilege.
In addition, refactor nvmm_root_init() to also check for root privilege
first and then call nvmm_init().
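The mmap() failure can be reproduced in userland with an ordinary file, as in the sketch below. It uses a temporary file rather than '/dev/nvmm', and the helper name and the /tmp path are assumptions made for the example.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Open a scratch file with the given flags and try a shared read/write
 * mapping on it. Returns 0 on success, otherwise the errno of the
 * failing call.
 */
static int
ex_try_shared_rw_map(int open_flags)
{
	char path[] = "/tmp/ex_mmap_XXXXXX";
	int fd, err = 0;
	void *p;

	fd = mkstemp(path);
	if (fd < 0)
		return errno;
	if (ftruncate(fd, 4096) != 0)
		err = errno;
	close(fd);
	if (err != 0) {
		unlink(path);
		return err;
	}

	/* Reopen with the access mode under test. */
	fd = open(path, open_flags);
	if (fd < 0)
		err = errno;
	unlink(path);
	if (fd < 0)
		return err;

	p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED)
		err = errno;
	else
		munmap(p, 4096);
	close(fd);
	return err;
}
```

Calling it with O_RDONLY is expected to fail with EACCES on POSIX systems, while O_RDWR should succeed, which mirrors the fix of opening the nvmm fd with O_RDWR.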
Aaron LI [Sun, 9 May 2021 23:05:43 +0000 (07:05 +0800)]
libnvmm: Update makefiles and hook to build
Aaron LI [Sun, 9 May 2021 23:05:16 +0000 (07:05 +0800)]
libnvmm: Port to DragonFly
* Add 'nvmm_compat.h' to adapt some macros/constants for DragonFly.
* Add some '__unused' attributes to fix compilation warnings/errors.
* Adjust header inclusions.
* Update nvmm(4) kernel source path in the man page, also update
'struct nvmm_x64_state' to match DragonFly's version.
Aaron LI [Sat, 29 May 2021 15:14:06 +0000 (23:14 +0800)]
nvmm: Improve makefile to allow standalone build
This change makes the nvmm kernel module buildable in its own
directory. This helps debug the module.
Aaron LI [Sun, 9 May 2021 23:35:35 +0000 (07:35 +0800)]
nvmm: Rewrite makefiles and hook to build
Note that kernel header files are installed by the top-level
'include/Makefile'. However, it will install all header files
found in the specified directories, including 'nvmm_compat.h'
and 'nvmm_internal.h'. Therefore, add a guard to prevent them
from being included by userland utilities (e.g., libnvmm, nvmmctl).
Aaron LI [Sun, 9 May 2021 23:17:27 +0000 (07:17 +0800)]
nvmm: Add to sys/conf/files and LINT64
Meanwhile, remove the unused 'files.nvmm'.
Aaron LI [Sun, 23 May 2021 10:52:45 +0000 (18:52 +0800)]
nvmm: Port to DragonFly #23: header inclusion adjustments
Aaron LI [Sun, 23 May 2021 10:21:31 +0000 (18:21 +0800)]
nvmm: Port to DragonFly #22: pmap EPT/NPT base address
Replace NetBSD's pmap->pm_pdirpa[0] with our vtophys(pmap->pm_pml4).
In addition, use vmspace_pmap() to grab the pmap, which is more
consistent with other code in our code base.
Aaron LI [Sun, 23 May 2021 10:13:54 +0000 (18:13 +0800)]
nvmm: Port to DragonFly #21: virtual address space management
Adapt the following NetBSD UVM functions to DragonFly:
* uvmspace_alloc() -> vmspace_alloc()
* uvmspace_free() -> vmspace_rel()
* uvm_fault() -> vm_fault()
* uvm_map() -> vm_map_insert() + vm_map_inherit() + vm_map_madvise() ...
* uvm_map_pageable() -> vm_map_wire()
* uvm_unmap(), uvm_deallocate() -> vm_map_remove()
To support the UVM_FLAG_FIXED & UVM_FLAG_UNMAP flags in uvm_map(),
vm_map_delete() is called unconditionally to make room for the coming
new mapping. Note that vm_map_findspace() cannot be called in this case,
because it's not guaranteed to return the input hint address if the
hint range is available.
Use vm_map_wire() to wire/unwire the mapping; vm_map_unwire() is for
userland mlock operations.
In uvm_deallocate(), need to unwire kernel page before remove, because
vm_map_remove() only handles user wirings.
Reviewed-by: Matt Dillon
Aaron LI [Sat, 22 May 2021 13:58:44 +0000 (21:58 +0800)]
nvmm: Port to DragonFly #20: preemption & critical section
In DragonFly, a normal kernel thread will not migrate to another CPU or be
preempted (except by an interrupt thread), so kpreempt_{disable,enable}()
are not needed. However, we can't use a critical section instead,
because that would also prevent interrupt/reschedule flags from being
set, which would be a problem for nvmm_return_needed() that's called from
vcpu_run() loop. (credit to Matt Dillon)
Port nvmm_return_needed() to DragonFly. But note that the
*_resched_wanted() functions cannot be used in critical sections, which
would prevent the relevant flags from being set. (credit to Matt Dillon)
Port splhigh()/splx() as critical sections in DragonFly for the moment.
Don't worry about it unless we have issues with it later.
Aaron LI [Sat, 22 May 2021 08:28:48 +0000 (16:28 +0800)]
nvmm: Port to DragonFly #19: IPI cross-cpu calls
Replace NetBSD xcall(9) API by our lwkt_send_ipiq() and
lwkt_send_ipiq_mask() to unicast/broadcast a function call to one/all
CPUs.
In DragonFly, a normal kernel thread won't migrate to another CPU,
so no need to implement NetBSD's curlwp_bind() and curlwp_bindx().
Aaron LI [Sat, 22 May 2021 04:03:06 +0000 (12:03 +0800)]
nvmm: Port to DragonFly #18: kernel memory allocation
Use kmem_alloc() and kmem_free() to implement uvm_km_alloc() and
uvm_km_free() as they're used in svm_vcpu_create() and vmx_vcpu_create().
However, our kmem_alloc() may return 0 (i.e., allocation failure), so
need an extra check in the caller functions.
We've defined 'kmem_alloc' and 'kmem_free' macros to adapt NetBSD's
functions to use our kmalloc() and kfree(). Therefore, extra
parentheses are added around 'kmem_alloc' and 'kmem_free' to avoid macro
expansion, so that the original functions are called.
In addition, change the 'kmem_free()' to 'uvm_km_free()' in
vmx_vcpu_create(), aligning with the invocation pattern as well as
the use case in svm_vcpu_create().
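The parentheses trick relies on a standard C rule: a function-like macro is only expanded when its name is directly followed by '('. A self-contained illustration with hypothetical names (not the actual compat macros):

```c
#include <stddef.h>
#include <stdlib.h>

/* Counts calls that reach the real function. */
static int ex_real_calls;

/* The "original" function. */
static void *
ex_alloc(size_t size)
{
	ex_real_calls++;
	return malloc(size);
}

/* A compat macro of the same name that redirects callers elsewhere. */
#define ex_alloc(size)	calloc(1, (size))

/*
 * From here on:
 *   ex_alloc(16)    expands the macro and calls calloc();
 *   (ex_alloc)(16)  is not followed by '(' right after the name, so the
 *                   macro is not expanded and the real function is called.
 */
```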
Aaron LI [Fri, 21 May 2021 15:19:46 +0000 (23:19 +0800)]
nvmm: Port to DragonFly #17: physical page allocation
Implement uvm_pagealloc() and uvm_pagefree() with vm_page_alloczwq() and
vm_page_freezwq(), respectively, which are added for this purpose by
Matt Dillon in
14067db606f14f728f62891ebcdc30366e95aa3d.
These two functions are used in 'nvmm_x86_svm.c' to allocate the HSAVE
memory.
Aaron LI [Fri, 21 May 2021 14:39:59 +0000 (22:39 +0800)]
nvmm: Port to DragonFly #16: contiguous memory allocation
svm_memalloc() and vmx_memalloc() need to allocate memory block that's
both virtually contiguous and physically contiguous. NetBSD achieves
this requirement by first allocating a list of physically contiguous
pages and a virtually contiguous memory address, and then mapping them
page by page.
We can just use contigmalloc(9) to achieve the same goal.
Aaron LI [Wed, 19 May 2021 10:52:22 +0000 (18:52 +0800)]
nvmm: Port to DragonFly #15: anonymous object management
Implement compat code for NetBSD anonymous object management:
uao_create(), uao_reference() and uao_detach().
The created object should be pageable by default, for example, the
object of guest physical memory. So choose the default pager to create
the anonymous object.
If the object needs to be wired (e.g., the object for communicating
between kernel and userland), the uvm_map_pageable() can be called to
wire the object.
Aaron LI [Tue, 18 May 2021 06:05:40 +0000 (14:05 +0800)]
nvmm: Port to DragonFly #14: device & module operations
Replace NetBSD 'cdevsw' and 'fileops' structs with our 'dev_ops' struct,
and port NVMM to support both device open/close and module load/unload
operations.
NetBSD doesn't support cloning devices, so it clones the file descriptor
(fd_clone() function) of the opened device (/dev/nvmm) and reassociates
it to the current process, so that each process sees a separate
instance of the device. See also NetBSD 'sys/net/if_tap.c' for an
example with a detailed explanation of this mechanism.
DragonFly supports per-file-descriptor data with the devfs cdevpriv API,
which is much simpler than the method with autoclone device.
Also credit to Jaromír Doleček for his porting work of NVMM to DragonFly.
See: https://github.com/Moritz-Systems/DragonFlyBSD/commit/
b96e5836fd25b448bb54775ac0107917adc2937d
Aaron LI [Sun, 16 May 2021 09:18:00 +0000 (17:18 +0800)]
nvmm: Port to DragonFly #13: debug register save & restore
Derived from NetBSD's x86_dbregs_save()/x86_dbregs_restore() in
'sys/arch/x86/x86/dbregs.c'.
Aaron LI [Tue, 25 May 2021 06:40:41 +0000 (14:40 +0800)]
nvmm: Port to DragonFly #12: FPU save & restore
Note that the host FPU state is indeterminate and depends on whether
the user program used the FPU or not, so there might not be any state to
save. npxpush() and npxpop() can handle this. Accordingly, need to use
'mcontext_t' to store host FPU state.
At first I used fpu_area_save() and fpu_area_restore() to deal with the
host FPU state, but it caused a hard fault loop when trying to boot an
OS in QEMU, because it failed to handle an uninitialized FPU. Thanks
to Matt Dillon for tracking it down and fixing it.
Credit to FreeBSD vmm code: save_guest_fpustate(), restore_guest_fpustate()
Aaron LI [Sat, 15 May 2021 14:37:08 +0000 (22:37 +0800)]
nvmm: Port to DragonFly #11: CPU features
Aaron LI [Fri, 14 May 2021 12:10:32 +0000 (20:10 +0800)]
nvmm: Port to DragonFly #10: cpu_info etc.
* Replace 'struct cpu_info' with 'struct globaldata'.
* Port cpu_info's ci_tss_sel/ci_tss/ci_gdt.
* Port curcpu(), cpu_number(), cpu_index() functions.
* Port CPU iteration code.
Aaron LI [Fri, 14 May 2021 00:09:58 +0000 (08:09 +0800)]
nvmm: Port to DragonFly #9: atomic operations
Add compat defines for NetBSD's atomic_inc_64(), atomic_{inc,dec}_uint().
However, we don't have an alternative for the type-generic
atomic_load_relaxed() function. So just modify the code accordingly.
Aaron LI [Wed, 12 May 2021 23:25:59 +0000 (07:25 +0800)]
nvmm: Port to DragonFly #8: kcpuset(9) -> cpumask(9)
Translate NetBSD's kcpuset(9) API to our cpumask(9) API. Use the atomic
version to avoid possible races between multiple vCPUs.
Aaron LI [Wed, 12 May 2021 23:23:04 +0000 (07:23 +0800)]
nvmm: Port to DragonFly #7: memory allocation
Add compat code to adapt NetBSD's kmem_alloc()/kmem_zalloc()/kmem_free().
Aaron LI [Tue, 11 May 2021 23:16:23 +0000 (07:16 +0800)]
nvmm: Port to DragonFly #6: mutex/rwlock
Add compat code to adapt NetBSD's mutex and rwlock to use DragonFly's
lockmgr(9).
Aaron LI [Sun, 30 May 2021 08:02:49 +0000 (16:02 +0800)]
nvmm: Port to DragonFly #5: constants/functions/macros
Update nvmm_compat.h to include various compat constant/functions
defines.
Aaron LI [Sun, 30 May 2021 08:01:24 +0000 (16:01 +0800)]
nvmm: Port to DragonFly #4: PAT modes
Adapt NetBSD's PATENTRY() and PAT_* modes to our PAT_VALUE() and PAT_*
defines.
Aaron LI [Tue, 11 May 2021 23:10:55 +0000 (07:10 +0800)]
nvmm: Port to DragonFly #3: CR/MSR defines
Add XCR0 and various MSRs compat defines.
Aaron LI [Tue, 11 May 2021 23:03:56 +0000 (07:03 +0800)]
nvmm: Port to DragonFly #2: CPUID Fn0000_000B for SVM
Add CPUID Fn0000_000B (Extended Topology Enumeration) defines for SVM.
Obtained from NetBSD.
Aaron LI [Tue, 11 May 2021 06:25:17 +0000 (14:25 +0800)]
nvmm: Port to DragonFly #1: nvmm_x86_{svmfunc,vmxfunc}.S
Aaron LI [Tue, 25 May 2021 06:39:10 +0000 (14:39 +0800)]
nvmm: Port to DragonFly #0: initial nvmm_compat.h
Add nvmm_compat.h to hold the major compatibility code for the porting.
Currently there are mostly CPUID2_* and CPUID_SEF_* defines.
Credit to Jaromír Doleček for his initial porting of NVMM to DragonFly.
See: https://github.com/Moritz-Systems/DragonFlyBSD/commit/
b96e5836fd25b448bb54775ac0107917adc2937d
Aaron LI [Sun, 9 May 2021 23:06:54 +0000 (07:06 +0800)]
Add group 'nvmm' and GID_NVMM for nvmm(4) & nvmmctl(8)
Aaron LI [Sun, 9 May 2021 23:03:31 +0000 (07:03 +0800)]
nvmm.4: Add HISTORY and hook to build
Aaron LI [Sun, 9 May 2021 22:52:17 +0000 (06:52 +0800)]
nvmm: Bring some minor changes from NetBSD-current
These changes help port NVMM to DragonFly by reducing the required
difference.
Aaron LI [Wed, 5 May 2021 08:16:06 +0000 (16:16 +0800)]
Import nvmm.4 manpage from NetBSD 9-stable
Branch: NetBSD 9-stable
Date: Fri Apr 30 14:08:16 2021 +0000
Path: share/man/man4/nvmm.4
Aaron LI [Wed, 5 May 2021 08:06:40 +0000 (16:06 +0800)]
Import nvmmctl(8) from NetBSD 9-stable
This is a program to control NVMM(4) virtual machines. It currently
implements the following two commands:
- identify: display the capabilities of the system.
- list: display information on each virtual machine registered in the
system.
Branch: NetBSD 9-stable
Date: Fri Apr 30 14:08:16 2021 +0000
Path: usr.sbin/nvmmctl
Aaron LI [Wed, 5 May 2021 07:58:32 +0000 (15:58 +0800)]
Import libnvmm(3) from NetBSD 9-stable
This is the virtualization API that provides a way for VMM software to
effortlessly create and manage virtual machines via NVMM(4).
Branch: NetBSD 9-stable
Date: Fri Apr 30 14:08:16 2021 +0000
Path: lib/libnvmm
Aaron LI [Wed, 5 May 2021 07:35:16 +0000 (15:35 +0800)]
Import nvmm(4) from NetBSD 9-stable
This is the kernel driver that provides support for hardware-accelerated
virtualization. It is made of an MI frontend with the following two MD
backends:
- x86 Intel VMX
- x86 AMD SVM
Branch: NetBSD 9-stable
Date: Fri Apr 30 14:08:16 2021 +0000
Path: sys/dev/nvmm