.\"
.\" Copyright (c) 2006, 2007
.\" The DragonFly Project. All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\"
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in
.\" the documentation and/or other materials provided with the
.\" distribution.
.\" 3. Neither the name of The DragonFly Project nor the names of its
.\" contributors may be used to endorse or promote products derived
.\" from this software without specific, prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
.\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
.\" LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
.\" FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
.\" COPYRIGHT HOLDERS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
.\" INCIDENTAL, SPECIAL, EXEMPLARY OR CONSEQUENTIAL DAMAGES (INCLUDING,
.\" BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
.\" LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED
.\" AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
.\" OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
.\" OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.Dd September 7, 2021
.Dt VKERNEL 7
.Os
.Sh NAME
.Nm vkernel ,
.Nm vcd ,
.Nm vkd ,
.Nm vke
.Nd virtual kernel architecture
.Sh SYNOPSIS
.Cd "platform vkernel64 # for 64 bit vkernels"
.Cd "device vcd"
.Cd "device vkd"
.Cd "device vke"
.Pp
.Pa /var/vkernel/boot/kernel/kernel
.Op Fl hstUvz
.Op Fl c Ar file
.Op Fl e Ar name Ns = Ns Li value : Ns Ar name Ns = Ns Li value : Ns ...
.Op Fl i Ar file
.Op Fl I Ar interface Ns Op Ar :address1 Ns Oo Ar :address2 Oc Ns Oo Ar /netmask Oc Ns Oo Ar =MAC Oc
.Op Fl l Ar cpulock
.Op Fl m Ar size
.Op Fl n Ar numcpus Ns Op Ar :lbits Ns Oo Ar :cbits Oc
.Op Fl p Ar pidfile
.Op Fl r Ar file Ns Op Ar :serno
.Op Fl R Ar file Ns Op Ar :serno
.Sh DESCRIPTION
The
.Nm
architecture allows for running
.Dx
kernels in userland.
.Pp
The following options are available:
.Bl -tag -width ".Fl m Ar size"
.It Fl c Ar file
Specify a read-only CD-ROM image
.Ar file
to be used by the kernel, with the first
.Fl c
option defining
.Li vcd0 ,
the second one
.Li vcd1 ,
and so on.
The first
.Fl r ,
.Fl R ,
or
.Fl c
option specified on the command line will be the boot disk.
The CD9660 filesystem is assumed when booting from this medium.
.It Fl e Ar name Ns = Ns Li value : Ns Ar name Ns = Ns Li value : Ns ...
Specify an environment to be used by the kernel.
This option can be specified more than once.
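.Pp
As an illustrative sketch, the two invocations below should set the same
two variables, once as a colon-separated list and once with repeated
options.
The variable values are borrowed from the diskless example in
.Sx EXAMPLES
and the image name is an assumption:
.Bd -literal
# one -e option carrying two colon-separated assignments
\&./boot/kernel/kernel -m 1g -r rootimg.01 \e
    -e boot.netif.name=vke0:boot.netif.netmask=255.255.0.0
# the same assignments given as two separate -e options
\&./boot/kernel/kernel -m 1g -r rootimg.01 \e
    -e boot.netif.name=vke0 -e boot.netif.netmask=255.255.0.0
.Ed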
.It Fl h
Show a list of available options, each with a short description.
.It Fl i Ar file
Specify a memory image
.Ar file
to be used by the virtual kernel.
If no
.Fl i
option is given, the kernel will generate a name of the form
.Pa /var/vkernel/memimg.XXXXXX ,
with the trailing
.Ql X Ns s
being replaced by a sequential number, e.g.\&
.Pa memimg.000001 .
.It Fl I Ar interface Ns Op Ar :address1 Ns Oo Ar :address2 Oc Ns Oo Ar /netmask Oc Ns Oo Ar =MAC Oc
Create a virtual network device, with the first
.Fl I
option defining
.Li vke0 ,
the second one
.Li vke1 ,
and so on.
.Pp
The
.Ar interface
argument is the name of a
.Xr tap 4
device node or the path to a
.Xr vknetd 8
socket.
The
.Pa /dev/
path prefix does not have to be specified and will be automatically prepended
for a device node.
Specifying
.Cm auto
will pick the first unused
.Xr tap 4
device.
.Pp
The
.Ar address1
and
.Ar address2
arguments are the IP addresses of the
.Xr tap 4
and
.Nm vke
interfaces, respectively.
Optionally,
.Ar address1
may be of the form
.Li bridge Ns Em X
in which case the
.Xr tap 4
interface is added to the specified
.Xr bridge 4
interface.
The
.Nm vke
address is not assigned until the interface is brought up in the guest.
.Pp
The
.Ar netmask
argument applies to all interfaces for which an address is specified.
.Pp
The
.Ar MAC
argument is the MAC address of the
.Xr vke 4
interface.
If not specified, a pseudo-random one will be generated.
.Pp
When running multiple vkernels it is often more convenient to simply
connect to a
.Xr vknetd 8
socket and let
.Xr vknetd 8
deal with the tap and/or bridge.
An example of this would be
.Pa /var/run/vknet:0.0.0.0:10.2.0.2/16 .
.It Fl l Ar cpulock
Specify which, if any, real CPUs to lock virtual CPUs to.
.Ar cpulock
is one of
.Cm any ,
.Cm map Ns Op , Ns Ar startCPU ,
or
.Ar CPU .
.Pp
.Cm any
does not map virtual CPUs to real CPUs.
This is the default.
.Pp
.Cm map Ns Op , Ns Ar startCPU
maps each virtual CPU to a real CPU starting with real CPU 0 or
.Ar startCPU
if specified.
.Pp
.Ar CPU
locks all virtual CPUs to the real CPU specified by
.Ar CPU .
.Pp
Locking the vkernel to a set of CPUs is recommended on multi-socket systems
to improve NUMA locality of reference.
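.Pp
As a brief sketch of the three forms (the CPU numbers, memory size and
image name are illustrative assumptions, not recommendations):
.Bd -literal
# default: virtual CPUs are not bound to real CPUs
\&./boot/kernel/kernel -m 1g -n 4 -l any -r rootimg.01
# map the virtual CPUs to real CPUs starting with real CPU 4
\&./boot/kernel/kernel -m 1g -n 4 -l map,4 -r rootimg.01
# lock all virtual CPUs to real CPU 2
\&./boot/kernel/kernel -m 1g -n 4 -l 2 -r rootimg.01
.Ed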
.It Fl m Ar size
Specify the amount of memory to be used by the kernel in bytes,
.Cm K
.Pq kilobytes ,
.Cm M
.Pq megabytes
or
.Cm G
.Pq gigabytes .
Lowercase versions of
.Cm K , M ,
and
.Cm G
are allowed.
.It Fl n Ar numcpus Ns Op Ar :lbits Ns Oo Ar :cbits Oc
.Ar numcpus
specifies the number of CPUs you wish to emulate.
Up to 16 CPUs are supported; the default is 2.
.Pp
.Ar lbits
specifies the number of bits within APICID(=CPUID) needed for representing
the logical ID.
It controls the number of threads per core (0 bits - 1 thread, 1 bit - 2 threads).
This parameter is optional (mandatory only if
.Ar cbits
is specified).
.Pp
.Ar cbits
specifies the number of bits within APICID(=CPUID) needed for representing
the core ID.
It controls the number of cores per package (0 bits - 1 core, 1 bit - 2 cores).
This parameter is optional.
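.Pp
As a worked example, assuming that a field of
.Em b
bits encodes 2^b values (which is how the 0-bit and 1-bit cases above
read), and with an illustrative memory size and image name:
.Bd -literal
# 8 virtual CPUs with lbits=1 and cbits=2:
#   2^1 = 2 threads per core
#   2^2 = 4 cores per package
#   2 * 4 = 8 CPUs, i.e. a single package
\&./boot/kernel/kernel -m 1g -n 8:1:2 -r rootimg.01
.Ed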
.It Fl p Ar pidfile
Specify a pidfile in which to store the process ID.
Scripts can use this file to locate the vkernel pid for the purpose of
shutting down or killing it.
.Pp
The vkernel will hold a lock on the pidfile while running.
Scripts may test for the lock to determine if the pidfile is valid or
stale so as to avoid accidentally killing a random process.
Something like
.Ql /usr/bin/lockf -ks -t 0 pidfile echo -n
may be used to test the lock.
A non-zero exit code indicates that the pidfile represents a running
vkernel.
.Pp
An error is issued and the vkernel exits if this file cannot be opened for
writing or if it is already locked by an active vkernel process.
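.Pp
The lock test can be wrapped in a small script.
This is a minimal sketch only: the pidfile path is a hypothetical
example, and
.Dv SIGTERM
is used because it triggers a clean shutdown (see
.Sx SIGNALS ) .
.Bd -literal
#!/bin/sh
PIDFILE=/var/run/vkernel.pid	# hypothetical path given to -p

# lockf exits 0 if it could take the lock, i.e. no vkernel holds it
if /usr/bin/lockf -ks -t 0 "$PIDFILE" echo -n; then
	echo "no running vkernel owns $PIDFILE (stale or unused)"
else
	# the lock is held, so the recorded pid belongs to a live vkernel
	kill -TERM "$(cat "$PIDFILE")"
fi
.Ed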
.It Fl r Ar file Ns Op Ar :serno
Specify a read-write disk image
.Ar file
to be used by the kernel, with the first
.Fl r
option defining
.Li vkd0 ,
the second one
.Li vkd1 ,
and so on.
A serial number for the virtual disk can be specified in
.Ar serno .
.Pp
The first
.Fl r ,
.Fl R ,
or
.Fl c
option specified on the command line will be the boot disk.
.It Fl R Ar file Ns Op Ar :serno
Works like
.Fl r
but treats the disk image as copy-on-write.
This allows a private copy of the image to be modified but does not
modify the image file.
The image file will not be locked in this situation and multiple
vkernels can run off the same image file if desired.
.Pp
Since modifications are thrown away, any data you wish
to retain across invocations needs to be exported over
the network prior to shutdown.
This gives you the flexibility to mount the disk image
either read-only or read-write depending on what is
convenient.
However, keep in mind that when mounting a COW image
read-write, modifications will consume system memory and
swap space until the vkernel is shut down.
.It Fl s
Boot into single-user mode.
.It Fl t
Tell the vkernel to use a precise host timer when calculating clock values.
If the TSC isn't used, this will impose higher overhead on the vkernel as it
will have to make a system call to the real host every time it wants to get
the time.
However, the more precise timer might be necessary for your application.
.Pp
By default, the vkernel uses the TSC CPU timer if possible, or an imprecise
(host-tick-resolution) timer which uses a user-mapped kernel page and does
not have any syscall overhead.
To disable the TSC CPU timer, use the
.Fl e Ar hw.tsc_cputimer_enable=0
flag.
.It Fl U
Enable writing to kernel memory and module loading.
By default, those are disabled for security reasons.
.It Fl v
Turn on verbose booting.
.It Fl z
Force the vkernel's RAM to be pre-zeroed.
This is useful for benchmarking on
single-socket systems where the memory allocation does not have to be
NUMA-friendly.
This option is not recommended on multi-socket systems or when the
.Fl l
option is used.
.El
.Sh DEVICES
A number of virtual device drivers exist to supplement the virtual kernel.
.Ss Disk device
The
.Nm vkd
driver allows for up to 16
.Xr vn 4
based disk devices.
The root device will be
.Li vkd0
(see
.Sx EXAMPLES
for further information on how to prepare a root image).
.Ss CD-ROM device
The
.Nm vcd
driver allows for up to 16 virtual CD-ROM devices.
Basically this is a read-only
.Nm vkd
device with a block size of 2048.
.Ss Network interface
The
.Nm vke
driver supports up to 16 virtual network interfaces which are associated with
.Xr tap 4
devices on the host.
For each
.Nm vke
device, the per-interface read-only
.Xr sysctl 3
variable
.Va hw.vke Ns Em X Ns Va .tap_unit
holds the unit number of the associated
.Xr tap 4
device.
.Pp
By default, half of the total mbuf clusters available is distributed equally
among all the vke devices, up to 256.
This can be overridden with the tunable
.Va hw.vke.max_ringsize .
Note that the value passed is rounded down to a power of two.
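.Pp
A minimal sketch of both knobs, assuming the tunable can be supplied
through the kernel environment with the
.Fl e
option; the value 200, the memory size and the image name are
illustrative only:
.Bd -literal
# on the host: request a ring size of 200, which is rounded down to 128
\&./boot/kernel/kernel -m 1g -r rootimg.01 -I auto:bridge0 \e
    -e hw.vke.max_ringsize=200
# inside the guest: show which tap(4) unit backs vke0
sysctl hw.vke0.tap_unit
.Ed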
.Sh SIGNALS
The virtual kernel only enables
.Dv SIGQUIT
and
.Dv SIGTERM
while operating in regular console mode.
Sending
.Ql \&^\e
.Pq Dv SIGQUIT
to the virtual kernel causes the virtual kernel to enter its internal
.Xr ddb 4
debugger and re-enable all other terminal signals.
Sending
.Dv SIGTERM
to the virtual kernel triggers a clean shutdown by passing a
.Dv SIGUSR2
to the virtual kernel's
.Xr init 8
process.
.Sh DEBUGGING
It is possible to debug the virtual kernel's process directly with
.Xr gdb 1 .
It is recommended that you do a
.Ql handle SIGSEGV noprint
to ignore page faults processed by the virtual kernel itself and
.Ql handle SIGUSR1 noprint
to ignore signals used for simulating inter-processor interrupts.
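.Pp
A session might look like the following sketch, which attaches to an
already running vkernel.
The pidfile path is a hypothetical example of a file given to the
.Fl p
option:
.Bd -literal
# attach to the running vkernel process
gdb -p "$(cat /var/run/vkernel.pid)"
(gdb) handle SIGSEGV noprint
(gdb) handle SIGUSR1 noprint
(gdb) continue
.Ed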
.Sh FILES
.Bl -tag -width ".It Pa /sys/config/VKERNEL64" -compact
.It Pa /dev/vcdX
.Nm vcd
device nodes
.It Pa /dev/vkdX
.Nm vkd
device nodes
.It Pa /sys/config/VKERNEL64
.Nm
configuration file, for
.Xr config 8
.El
.Sh CONFIGURATION FILES
Your virtual kernel is a complete
.Dx
system, but you might not want to run all the services a normal system runs.
Here is what a typical virtual kernel's
.Pa /etc/rc.conf
file looks like, with some additional possibilities commented out.
.Bd -literal
hostname="vkernel"
network_interfaces="lo0 vke0"
ifconfig_vke0="DHCP"
sendmail_enable="NO"
#syslog_enable="NO"
blanktime="NO"
.Ed
.Sh BOOT DRIVE SELECTION
You can override the default boot drive selection and filesystem
using a kernel environment variable.
Note that the filesystem selected must be compiled into the vkernel
and not loaded as a module.
You need to escape some quotes around the variable data
to avoid misinterpretation of the colon in the
.Fl e
option.
For example:
.Pp
.Fl e
vfs.root.mountfrom=\\"hammer:vkd0s1d\\"
.Sh DISKLESS OPERATION
To boot a
.Nm
from an NFS root, a number of tunables need to be set:
.Bl -tag -width indent
.It Va boot.netif.ip
IP address to be set in the vkernel interface.
.It Va boot.netif.netmask
Netmask for the IP to be set.
.It Va boot.netif.name
Network interface name inside the vkernel.
.It Va boot.nfsroot.server
Host running
.Xr nfsd 8 .
.It Va boot.nfsroot.path
Path on the server where the world and distribution
targets have been installed.
.El
.Pp
See the
.Sx EXAMPLES
section for an example of booting a diskless
.Nm .
.Sh EXAMPLES
A couple of steps are necessary in order to prepare the system to build and
run a virtual kernel.
.Ss Setting up the filesystem
The
.Nm
architecture needs a number of files which reside in
.Pa /var/vkernel .
Since these files tend to get rather big and the
.Pa /var
partition is usually of limited size, we recommend creating the directory
in the
.Pa /home
partition with a link to it in
.Pa /var :
.Bd -literal
mkdir -p /home/var.vkernel/boot
ln -s /home/var.vkernel /var/vkernel
.Ed
.Pp
Next, a filesystem image to be used by the virtual kernel has to be
created and populated (assuming world has been built previously).
If the image is created on a UFS filesystem you might want to pre-zero it.
On a HAMMER filesystem you should just truncate-extend to the image size
as HAMMER does not re-use data blocks already present in the file.
.Bd -literal
vnconfig -c -S 2g -T vn0 /var/vkernel/rootimg.01
disklabel -r -w vn0s0 auto
disklabel -e vn0s0 # add `a' partition with fstype `4.2BSD'
newfs /dev/vn0s0a
mount /dev/vn0s0a /mnt
cd /usr/src
make installworld DESTDIR=/mnt
cd etc
make distribution DESTDIR=/mnt
echo '/dev/vkd0s0a / ufs rw 1 1' >/mnt/etc/fstab
echo 'proc /proc procfs rw 0 0' >>/mnt/etc/fstab
.Ed
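.Pp
The image file can also be prepared by hand before it is attached.
The following is a sketch only, assuming the same 2 GB size as above.
Use one of the first two commands, then attach the file without the
.Fl S
and
.Fl T
flags so that
.Xr vnconfig 8
does not resize it:
.Bd -literal
# UFS host filesystem: pre-zero the whole image
dd if=/dev/zero of=/var/vkernel/rootimg.01 bs=1m count=2048
# HAMMER host filesystem: just truncate-extend to the image size
truncate -s 2g /var/vkernel/rootimg.01
# attach the prepared file
vnconfig -c vn0 /var/vkernel/rootimg.01
.Ed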
.Pp
Edit
.Pa /mnt/etc/ttys
and replace the
.Li console
entry with the following line and turn off all other gettys.
.Bd -literal
console "/usr/libexec/getty Pc" cons25 on secure
.Ed
.Pp
Replace
.Li \&Pc
with
.Li al.Pc
if you would like to automatically log in as root.
.Pp
Then, unmount the disk.
.Bd -literal
umount /mnt
vnconfig -u vn0
.Ed
.Ss Compiling the virtual kernel
In order to compile a virtual kernel use the
.Li VKERNEL64
kernel configuration file residing in
.Pa /sys/config
(or a configuration file derived from it):
.Bd -literal
cd /usr/src
make -DNO_MODULES buildkernel KERNCONF=VKERNEL64
make -DNO_MODULES installkernel KERNCONF=VKERNEL64 DESTDIR=/var/vkernel
.Ed
.Ss Enabling virtual kernel operation
A special
.Xr sysctl 8 ,
.Va vm.vkernel_enable ,
must be set to enable
.Nm
operation:
.Bd -literal
sysctl vm.vkernel_enable=1
.Ed
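.Pp
To keep the setting across reboots of the host, it can presumably be
recorded in
.Xr sysctl.conf 5 .
This is a sketch that assumes the stock
.Pa /etc/sysctl.conf
is read at boot:
.Bd -literal
echo 'vm.vkernel_enable=1' >>/etc/sysctl.conf
.Ed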
.Ss Configuring the network on the host system
In order to access a network interface of the host system from the
.Nm ,
you must add the interface to a
.Xr bridge 4
device which will then be passed to the
.Fl I
option:
.Bd -literal
kldload if_bridge.ko
kldload if_tap.ko
ifconfig bridge0 create
ifconfig bridge0 addm re0 # assuming re0 is the host's interface
ifconfig bridge0 up
.Ed
.Ss Running the kernel
Finally, the virtual kernel can be run:
.Bd -literal
cd /var/vkernel
\&./boot/kernel/kernel -m 1g -r rootimg.01 -I auto:bridge0
.Ed
.Pp
You can issue the
.Xr reboot 8 ,
.Xr halt 8 ,
or
.Xr shutdown 8
commands from inside a virtual kernel.
After doing a clean shutdown the
.Xr reboot 8
command will re-exec the virtual kernel binary while the other two will
cause the virtual kernel to exit.
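.Pp
As a sketch of the copy-on-write mode described for the
.Fl R
option, two vkernels might share one root image through a
.Xr vknetd 8
socket.
The first address follows the
.Pa /var/run/vknet:0.0.0.0:10.2.0.2/16
form given in the description of the
.Fl I
option; the second address and the pidfile names are illustrative
assumptions, and
.Xr vknetd 8
is assumed to be running already.
.Bd -literal
cd /var/vkernel
# run each command in its own terminal; -R leaves rootimg.01 unmodified
\&./boot/kernel/kernel -m 1g -R rootimg.01 \e
    -I /var/run/vknet:0.0.0.0:10.2.0.2/16 -p /var/run/vkernel1.pid
\&./boot/kernel/kernel -m 1g -R rootimg.01 \e
    -I /var/run/vknet:0.0.0.0:10.2.0.3/16 -p /var/run/vkernel2.pid
.Ed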
.Ss Diskless operation (vkernel as an NFS client)
The following example boots a
.Nm
with a
.Xr vknetd 8
network configuration.
The line continuation backslashes have been omitted.
For convenience and to reduce confusion, it is recommended to mount
the server's remote vkernel root onto the host running the vkernel binary
using the same path as the NFS mount.
It is assumed that a full system install has been made to
.Pa /var/vkernel/root
using
.Li KERNCONF=VKERNEL64
for the kernel build.
.Bd -literal
\&/var/vkernel/root/boot/kernel/kernel
    -m 1g -n 4 -I /var/run/vknet
    -e boot.netif.ip=10.100.0.2
    -e boot.netif.netmask=255.255.0.0
    -e boot.netif.gateway=10.100.0.1
    -e boot.netif.name=vke0
    -e boot.nfsroot.server=10.0.0.55
    -e boot.nfsroot.path=/var/vkernel/root
.Ed
.Pp
In this example
.Xr vknetd 8
is assumed to have been started as shown below, before
running the vkernel, using an unbridged TAP configuration routed through
the host.
IP forwarding must be turned on, and in this example the server resides
on a different network accessible to the host executing the vkernel but not
directly on the vkernel's subnet.
.Bd -literal
kldload if_tap
sysctl net.inet.ip.forwarding=1
vknetd -t tap0 10.100.0.1/16
.Ed
.Pp
You can run multiple vkernels trivially with the same NFS root as long as
you assign each one a different IP on the subnet (2, 3, 4, etc).
You should also be careful with certain directories, particularly
.Pa /var/run
and possibly also
.Pa /var/db
depending on what your vkernels are going to be doing.
This can complicate matters with
.Pa /var/db/pkg .
.Sh BUILDING THE WORLD UNDER A VKERNEL
The virtual kernel platform does not have all the header files expected
by a world build, so the easiest thing to do right now is to specify a
pc64 (in a 64 bit vkernel) target when building the world under a virtual
kernel, like this:
.Bd -literal
vkernel# make MACHINE_PLATFORM=pc64 buildworld
vkernel# make MACHINE_PLATFORM=pc64 installworld
.Ed
.Sh SEE ALSO
.Xr vknet 1 ,
.Xr bridge 4 ,
.Xr ifmedia 4 ,
.Xr tap 4 ,
.Xr vn 4 ,
.Xr sysctl.conf 5 ,
.Xr build 7 ,
.Xr config 8 ,
.Xr disklabel 8 ,
.Xr ifconfig 8 ,
.Xr vknetd 8 ,
.Xr vnconfig 8
.Rs
.%A Aggelos Economopoulos
.%D March 2007
.%T "A Peek at the DragonFly Virtual Kernel"
.Re
.Sh HISTORY
Virtual kernels were introduced in
.Dx 1.7 .
.Sh AUTHORS
.An -nosplit
.An Matt Dillon
thought up and implemented the
.Nm
architecture and wrote the
.Nm vkd
device driver.
.An Sepherosa Ziehau
wrote the
.Nm vke
device driver.
This manual page was written by
.An Sascha Wildner .