From: Franco Fichtner Date: Wed, 1 Jan 2014 10:40:18 +0000 (+0100) Subject: netmap: revamped documentation X-Git-Tag: v3.9.0~900 X-Git-Url: https://gitweb.dragonflybsd.org/~tuxillo/dragonfly.git/commitdiff_plain/7c417b37f6ba47f2de12ee2ae90d047c179b8bfc netmap: revamped documentation --- diff --git a/share/man/man4/netmap.4 b/share/man/man4/netmap.4 index 01602d5be8..2f7eec78e8 100644 --- a/share/man/man4/netmap.4 +++ b/share/man/man4/netmap.4 @@ -27,7 +27,7 @@ .\" .\" $FreeBSD: head/share/man/man4/netmap.4 228017 2011-11-27 06:55:57Z gjb $ .\" -.Dd October 18, 2013 +.Dd December 26, 2013 .Dt NETMAP 4 .Os .Sh NAME @@ -40,13 +40,15 @@ is a framework for extremely fast and efficient packet I/O (reaching 14.88 Mpps with a single core at less than 1 GHz) for both userspace and kernel clients. -Userspace clients can use the netmap API +Userspace clients can use the +.Nm +API to send and receive raw packets through physical interfaces or ports of the -.Xr VALE 4 +.Xr vale 4 switch. .Pp -.Nm VALE +.Xr vale 4 is a very fast (reaching 20 Mpps per port) and modular software switch, implemented within the kernel, which can interconnect @@ -55,11 +57,12 @@ virtual ports, physical devices, and the native host stack. .Nm uses a memory mapped region to share packet buffers, descriptors and queues with the kernel. -Simple -.Pa ioctl()s -are used to bind interfaces/ports to file descriptors and +.Xr ioctl 2 +is used to bind interfaces/ports to file descriptors and implement non-blocking I/O, whereas blocking I/O uses -.Pa select()/poll() . +.Xr select 2 +and +.Xr poll 2 . .Nm can exploit the parallelism in multiqueue devices and multicore systems. @@ -72,82 +75,96 @@ a generic emulation layer is available to implement the API on top of unmodified device drivers, at the price of reduced performance (but still better than what can be achieved with -sockets or BPF/pcap). +.Xr socket 2 , +.Xr bpf 4 , +or +.Xr pcap 3 ) . 
.Pp For a list of devices with native .Nm -support, see the end of this manual page. -.Pp -.Sh OPERATION - THE NETMAP API +support, see section +.Sx SUPPORTED INTERFACES +at the end of this manual page. +.Sh OPERATING THE API .Nm -clients must first -.Pa open("/dev/netmap") , -and then issue an -.Pa ioctl(fd, NIOCREGIF, (struct nmreq *)arg) -to bind the file descriptor to a specific interface or port. +clients must first issue the following code to open the device +node and to bind the file descriptor to a specific interface or port: +.Bd -literal -offset indent +fd = open("/dev/netmap"); +ioctl(fd, NIOCREGIF, (struct nmreq *)arg); +.Ed +.Pp .Nm has multiple modes of operation controlled by the content of the -.Pa struct nmreq -passed to the -.Pa ioctl() . +.Vt struct nmreq +passed to +.Xr ioctl 2 . In particular, the -.Em nr_name +.Va nr_name field specifies whether the client operates on a physical network interface or on a port of a -.Nm VALE -switch, as indicated below. Additional fields in the -.Pa struct nmreq +.Xr vale 4 +switch, as indicated below. +Additional fields in the +.Vt struct nmreq control the details of operation. -.Pp .Bl -tag -width XXXX -.It Dv Interface name (e.g. 'em0', 'eth1', ... ) +.It Sy Interface name (e.g. 'em0', 'eth1', ...) The data path of the interface is disconnected from the host stack. Depending on additional arguments, the file descriptor is bound to the NIC (one or all queues), or to the host stack. -.It Dv valeXXX:YYY (arbitrary XXX and YYY) -The file descriptor is bound to port YYY of a VALE switch called XXX, +.It Sy valeXXX:YYY (arbitrary XXX and YYY) +The file descriptor is bound to port YYY of a +.Xr vale 4 +switch called XXX, where XXX and YYY are arbitrary alphanumeric strings. The string cannot exceed IFNAMSIZ characters, and YYY cannot matching the name of any existing interface. .Pp The switch and the port are created if not existing. 
-.It Dv valeXXX:ifname (ifname is an existing interface) +.It Sy valeXXX:ifname (ifname is an existing interface) Flags in the argument control whether the physical interface -(and optionally the corrisponding host stack endpoint) -are connected or disconnected from the VALE switch named XXX. +(and optionally the corresponding host stack endpoint) +are connected or disconnected from the +.Xr vale 4 +switch named XXX. .Pp -In this case the -.Pa ioctl() -is used only for configuring the VALE switch, typically through the -.Nm vale-ctl +In this case +.Xr ioctl 2 +is used only for configuring the +.Xr vale 4 +switch, typically through the +.Cm vale-ctl command. -The file descriptor cannot be used for I/O, and should be -.Pa close()d -after issuing the -.Pa ioctl(). +The file descriptor cannot be used for I/O, and should be passed to +.Xr close 2 +after issuing +.Xr ioctl 2 . .El .Pp The binding can be removed (and the interface returns to regular operation, or the virtual port destroyed) with a -.Pa close() +.Xr close 2 on the file descriptor. .Pp The processes owning the file descriptor can then -.Pa mmap() +.Xr mmap 2 the memory region that contains pre-allocated buffers, descriptors and queues, and use them to read/write raw packets. Non blocking I/O is done with special -.Pa ioctl()'s , -whereas the file descriptor can be passed to -.Pa select()/poll() +.Xr ioctl 2 +commands, whereas the file descriptor can be passed to +.Xr select 2 +and +.Xr poll 2 to be notified about incoming packet or available transmit buffers. .Ss DATA STRUCTURES The data structures in the mmapped memory are described below (see -.Xr sys/net/netmap.h +.In net/netmap.h for reference). All physical devices operating in .Nm @@ -160,22 +177,22 @@ Virtual ports instead use separate memory regions, shared only with the kernel. .Pp All references between the shared data structure -are relative (offsets or indexes). Some macros help converting +are relative (offsets or indexes). 
+Some macros help converting them into actual pointers. -.Bl -tag -width XXX -.It Dv struct netmap_if (one per interface) +.Bl -tag -width XXXX +.It Sy struct netmap_if (one per interface) indicates the number of rings supported by an interface, their sizes, and the offsets of the -.Pa netmap_rings -associated to the interface. +.Nm +rings associated to the interface. .Pp -.Pa struct netmap_if +.Vt struct netmap_if is at offset -.Pa nr_offset -in the shared memory region is indicated by the -field in the structure returned by the -.Pa NIOCREGIF -(see below). +.Va nr_offset +in the shared memory region indicated by the +field in the structure returned by +.Dv NIOCREGIF . .Bd -literal struct netmap_if { char ni_name[IFNAMSIZ]; /* name of the interface. */ @@ -185,25 +202,28 @@ struct netmap_if { const ssize_t ring_ofs[]; /* offset of tx and rx rings */ }; .Ed -.It Dv struct netmap_ring (one per ring) +.It Sy struct netmap_ring (one per ring) Contains the positions in the transmit and receive rings to synchronize the kernel and the application, and an array of -.Pa slots -describing the buffers. -'reserved' is used in receive rings to tell the kernel the -number of slots after 'cur' that are still in usr -indicates how many slots starting from 'cur' +.Nm +slots describing the buffers. +.Va reserved +is used in receive rings to tell the kernel the number of slots after +.Va cur +that are still in use indicates how many slots starting from +.Va cur the +.\" XXX Fix and finish this sentence? .Pp Each physical interface has one -.Pa netmap_ring +.Vt struct netmap_ring for each hardware transmit and receive ring, plus one extra transmit and one receive structure that connect to the host stack. 
.Bd -literal struct netmap_ring { - const ssize_t buf_ofs; /* see details */ + const ssize_t buf_ofs; /* see details */ const uint32_t num_slots; /* number of slots in the ring */ uint32_t avail; /* number of usable slots */ uint32_t cur; /* 'current' read/write index */ @@ -219,15 +239,21 @@ struct netmap_ring { } .Ed .Pp -In transmit rings, after a system call 'cur' indicates -the first slot that can be used for transmissions, -and 'avail' reports how many of them are available. -Before the next netmap-related system call on the file +In transmit rings, after a system call +.Va cur +indicates the first slot that can be used for transmissions, and +.Va avail +reports how many of them are available. +Before the next +.Nm Ns -related +system call on the file descriptor, the application should fill buffers and -slots with data, and update 'cur' and 'avail' +slots with data, and update +.Va cur +and +.Va avail accordingly, as shown in the figure below: .Bd -literal - cur |----- avail ---| (after syscall) v @@ -237,17 +263,26 @@ accordingly, as shown in the figure below: |-- avail --| (before syscall) cur .Ed - -In receive rings, after a system call 'cur' indicates -the first slot that contains a valid packet, -and 'avail' reports how many of them are available. -Before the next netmap-related system call on the file +.Pp +In receive rings, after a system call +.Va cur +indicates the first slot that contains a valid packet, and +.Va avail +reports how many of them are available. +Before the next +.Nm Ns -related +system call on the file descriptor, the application can process buffers and release them to the kernel updating -'cur' and 'avail' accordingly, as shown in the figure below. -Receive rings have an additional field called 'reserved' -to indicate how many buffers before 'cur' are still -under processing and cannot be released. +.Va cur +and +.Va avail +accordingly, as shown in the figure below. 
+Receive rings have an additional field called +.Va reserved +to indicate how many buffers before +.Va cur +cannot be released because they are still being processed. .Bd -literal cur |-res-|-- avail --| (after syscall) @@ -257,9 +292,8 @@ under processing and cannot be released. |res|--|flags >> 8) & 0xff) +#define NS_RFRAGS(_slot) (((_slot)->flags >> 8) & 0xff) uint64_t ptr; /* buffer address (indirect buffers) */ }; .Ed +.Pp The flags control how the the buffer associated to the slot should be managed. -.It Dv packet buffers +.It Sy packet buffers are normally fixed size (2 Kbyte) buffers allocated by the kernel -that contain packet data. Buffers addresses are computed through -macros. +that contain packet data. .El .Pp -.Bl -tag -width XXX -Some macros support the access to objects in the shared memory -region. In particular, -.It NETMAP_TXRING(nifp, i) -.It NETMAP_RXRING(nifp, i) -return the address of the i-th transmit and receive ring, -respectively, whereas -.It NETMAP_BUF(ring, buf_idx) -returns the address of the buffer with index buf_idx +Addresses are computed through macros in order to +support access to objects in the shared memory region, e.g.: +.Bl -tag -width ".Fn NETMAP_BUF ring buf_idx" +.It Fn NETMAP_TXRING nifp i +Returns the address of the +.Va i Ns -th +transmit ring. +.It Fn NETMAP_RXRING nifp i +Returns the address of the +.Va i Ns -th +receive ring. +.It Fn NETMAP_BUF ring buf_idx +Returns the address of the buffer with index +.Va buf_idx (which can be part of any ring for the given interface). .El -.Pp +.Ss FLAGS Normally, buffers are associated to slots when interfaces are bound, and one packet is fully contained in a single buffer. -Clients can however modify the mapping using the +Clients can, however, modify the mapping using the following flags: -.Ss FLAGS -.Bl -tag -width XXX -.It NS_BUF_CHANGED -indicates that the buf_idx in the slot has changed. 
+.Bl -tag -width ".Fn NS_RFRAGS slot" +.It Dv NS_BUF_CHANGED +indicates that the +.Va buf_idx +in the slot has changed. This can be useful if the client wants to implement some form of zero-copy forwarding (e.g. by passing buffers from an input interface to an output interface), or needs to process packets out of order. .Pp The flag MUST be used whenever the buffer index is changed. -.It NS_REPORT +.It Dv NS_REPORT indicates that we want to be woken up when this buffer -has been transmitted. This reduces performance but insures +has been transmitted. +This reduces performance but insures a prompt notification when a buffer has been sent. Normally, .Nm notifies transmit completions in batches, hence signals -can be delayed indefinitely. However, we need such notifications +may be delayed indefinitely. +However, we need such notifications before closing a descriptor. -.It NS_FORWARD -When the device is open in 'transparent' mode, -the client can mark slots in receive rings with this flag. +.It Dv NS_FORWARD +When the device is opened in +.Sq transparent +mode, the client can mark slots in receive rings with this flag. For all marked slots, marked packets are forwarded to the other endpoint at the next system call, thus restoring (in a selective way) the connection between the NIC and the host stack. -.It NS_NO_LEARN +.It Dv NS_NO_LEARN tells the forwarding code that the SRC MAC address for this -packet should not be used in the learning bridge -.It NS_INDIRECT -indicates that the packet's payload is not in the netmap -supplied buffer, but in a user-supplied buffer whose -user virtual address is in the 'ptr' field of the slot. +packet should not be used in the learning bridge. +.It Dv NS_INDIRECT +indicates that the packet's payload is not in the +.Nm Ns -supplied +buffer, but in a user-supplied buffer whose +user virtual address is in the +.Va ptr +field of the slot. The size can reach 65535 bytes. 
-.Em This is only supported on the transmit ring of virtual ports -.It NS_MOREFRAG +This is only supported on the transmit ring of virtual ports. +.It Dv NS_MOREFRAG indicates that the packet continues with subsequent buffers; -the last buffer in a packet must have the flag clear. +the last buffer in a packet must have the flag cleared. The maximum length of a chain is 64 buffers. -.Em This is only supported on virtual ports -.It NS_RFRAGS(slot) +This is only supported on virtual ports. +.It Fn NS_RFRAGS slot on receive rings, returns the number of remaining buffers in a packet, including this one. -Slots with a value greater than 1 also have NS_MOREFRAG set. -The length refers to the individual buffer, there is no -field for the total length. +Slots with a value greater than 1 also have +.Dv NS_MOREFRAG +set. +The length refers to the individual buffer; +there is no field for the total length. .Pp -On transmit rings, if NS_DST is set, it is passed to the lookup +On transmit rings, if +.Dv NS_DST +is set, it is passed to the lookup function, which can use it e.g. as the index of the destination port instead of doing an address lookup. .El -.Sh IOCTLS +.Sh SYSTEM CALLS .Nm -supports some ioctl() to synchronize the state of the rings -between the kernel and the user processes, plus some +supports +.Xr ioctl 2 +commands to synchronize the state of the rings +between the kernel and the user processes, as well as to query and configure the interface. 
-The former do not require any argument, whereas the latter -use a -.Pa struct nmreq +The former do not require any argument, whereas the latter use a +.Vt struct nmreq defined as follows: .Bd -literal struct nmreq { @@ -378,100 +429,129 @@ struct nmreq { uint16_t nr_tx_rings; /* number of tx rings */ uint16_t nr_rx_rings; /* number of tx rings */ uint16_t nr_ringid; /* ring(s) we care about */ -#define NETMAP_HW_RING 0x4000 /* low bits indicate one hw ring */ -#define NETMAP_SW_RING 0x2000 /* we process the sw ring */ +#define NETMAP_HW_RING 0x4000 /* low bits indicate one hw ring */ +#define NETMAP_SW_RING 0x2000 /* we process the sw ring */ #define NETMAP_NO_TX_POLL 0x1000 /* no gratuitous txsync on poll */ -#define NETMAP_RING_MASK 0xfff /* the actual ring number */ - uint16_t nr_cmd; -#define NETMAP_BDG_ATTACH 1 /* attach the NIC */ -#define NETMAP_BDG_DETACH 2 /* detach the NIC */ -#define NETMAP_BDG_LOOKUP_REG 3 /* register lookup function */ -#define NETMAP_BDG_LIST 4 /* get bridge's info */ - uint16_t nr_arg1; - uint16_t nr_arg2; - uint32_t spare2[3]; +#define NETMAP_RING_MASK 0xfff /* the actual ring number */ + uint16_t nr_cmd; +#define NETMAP_BDG_ATTACH 1 /* attach the NIC */ +#define NETMAP_BDG_DETACH 2 /* detach the NIC */ +#define NETMAP_BDG_LOOKUP_REG 3 /* register lookup function */ +#define NETMAP_BDG_LIST 4 /* get bridge's info */ + uint16_t nr_arg1; + uint16_t nr_arg2; + uint32_t spare2[3]; }; - .Ed +.Pp A device descriptor obtained through .Pa /dev/netmap -also supports the ioctl supported by network devices. -.Pp -The netmap-specific +supports the .Xr ioctl 2 -command codes below are defined in -.In net/netmap.h -and are: -.Bl -tag -width XXXX +command codes supported by network devices, as well as +specific command codes defined in +.In net/netmap.h . +These specific command codes are as follows: +.Bl -tag -width ".Dv NIOCTXSYNC" .It Dv NIOCGINFO -returns EINVAL if the named device does not support netmap. 
-Otherwise, it returns 0 and (advisory) information +returns +.Dv EINVAL +if the named device does not support +.Nm . +Otherwise, it returns zero and advisory information about the interface. Note that all the information below can change before the -interface is actually put in netmap mode. +interface is actually put into +.Nm +mode. .Pp -.Pa nr_memsize -indicates the size of the netmap -memory region. Physical devices all share the same memory region, -whereas VALE ports may have independent regions for each port. -These sizes can be set through system-wise sysctl variables. -.Pa nr_tx_slots, nr_rx_slots -indicate the size of transmit and receive rings. -.Pa nr_tx_rings, nr_rx_rings -indicate the number of transmit -and receive rings. -Both ring number and sizes may be configured at runtime -using interface-specific functions (e.g. -.Pa sysctl -or -.Pa ethtool . +.Va nr_memsize +indicates the size of the +.Nm +memory region. +Physical devices all share the same memory region, whereas +.Xr vale 4 +ports may have independent regions for each port. +These sizes can be set through system-wide +.Xr sysctl 8 +variables. +.Va nr_tx_slots +and +.Va nr_rx_slots +indicate the size of transmit and receive rings, respectively. +.Va nr_tx_rings +and +.Va nr_rx_rings +indicate the number of transmit and receive rings, respectively. +Both ring number and size may be configured at runtime +using interface-specific functions (e.g.\& +.Xr sysctl 8 +on BSD, or +.Xr ethtool 8 +on Linux). .It Dv NIOCREGIF -puts the interface named in nr_name into netmap mode, disconnecting -it from the host stack, and/or defines which rings are controlled -through this file descriptor. -On return, it gives the same info as NIOCGINFO, and nr_ringid +puts the interface specified via +.Va nr_name +into +.Nm +mode, disconnecting it from the host stack, and/or defines which +rings are controlled through this file descriptor. 
+On return, it gives the same info as +.Dv NIOCGINFO , +and +.Va nr_ringid indicates the identity of the rings controlled through the file descriptor. .Pp -Possible values for nr_ringid are -.Bl -tag -width XXXXX +Possible values for +.Va nr_ringid +are as follows: +.Bl -tag -width "Dv NETMAP_HW_RING + i" .It 0 -default, all hardware rings -.It NETMAP_SW_RING -the ``host rings'' connecting to the host stack -.It NETMAP_HW_RING + i -the i-th hardware ring +default; all hardware rings +.It Dv NETMAP_SW_RING +.Dq host rings +connecting to the host stack +.It Dv NETMAP_HW_RING + i +i-th hardware ring .El +.Pp By default, a -.Nm poll +.Xr poll 2 or -.Nm select +.Xr select 2 call pushes out any pending packets on the transmit ring, even if -no write events are specified. -The feature can be disabled by or-ing -.Nm NETMAP_NO_TX_SYNC -to nr_ringid. -But normally you should keep this feature unless you are using +no write events were specified. +The feature can be disabled by OR-ing the flag +.Dv NETMAP_NO_TX_SYNC +into +.Va nr_ringid . +Normally, you should keep this feature unless you are using separate file descriptors for the send and receive rings, because -otherwise packets are pushed out only if NETMAP_TXSYNC is called, -or the send queue is full. +otherwise packets are pushed out only if +.Dv NETMAP_TXSYNC +is called, or the send queue is full. .Pp -.Pa NIOCREGIF +.Dv NIOCREGIF can be used multiple times to change the association of a file descriptor to a ring pair, always within the same device. .Pp When registering a virtual interface that is dynamically created to a .Xr vale 4 switch, we can specify the desired number of rings (1 by default, -and currently up to 16) on it using nr_tx_rings and nr_rx_rings fields. +and currently up to 16) by setting the +.Va nr_tx_rings +and +.Va nr_rx_rings +fields accordingly. 
.It Dv NIOCTXSYNC -tells the hardware of new packets to transmit, and updates the +tells the hardware about new packets to transmit, and updates the number of slots available for transmission. .It Dv NIOCRXSYNC -tells the hardware of consumed packets, and asks for newly available +tells the hardware about consumed packets, and asks for newly available packets. .El -.Sh SYSTEM CALLS +.Pp .Nm uses .Xr select 2 @@ -483,40 +563,60 @@ to map memory. .Pp Applications may need to create threads and bind them to specific cores to improve performance, using standard -OS primitives, see +OS primitives; see .Xr pthread 3 . In particular, .Xr pthread_setaffinity_np 3 may be of use. .Sh EXAMPLES -The following code implements a traffic generator -.Pp -.Bd -literal -compact -#include +The following code implements a traffic generator: +.Bd -literal +#include +#include +#include +#include +#include #include -struct netmap_if *nifp; -struct netmap_ring *ring; -struct nmreq nmr; -fd = open("/dev/netmap", O_RDWR); -bzero(&nmr, sizeof(nmr)); -strcpy(nmr.nr_name, "ix0"); -nmr.nm_version = NETMAP_API; -ioctl(fd, NIOCREGIF, &nmr); -p = mmap(0, nmr.nr_memsize, fd); -nifp = NETMAP_IF(p, nmr.nr_offset); -ring = NETMAP_TXRING(nifp, 0); -fds.fd = fd; -fds.events = POLLOUT; -for (;;) { - poll(list, 1, -1); - for ( ; ring->avail > 0 ; ring->avail--) { - i = ring->cur; - buf = NETMAP_BUF(ring, ring->slot[i].buf_index); - ... prepare packet in buf ... - ring->slot[i].len = ... packet length ... 
- ring->cur = NETMAP_RING_NEXT(ring, i); - } +#include +#include +#include + +int +main(void) +{ + struct netmap_if *nifp; + struct netmap_ring *ring; + struct pollfd fds; + struct nmreq nmr; + void *p; + int fd; + + fd = open("/dev/netmap", O_RDWR); + bzero(&nmr, sizeof(nmr)); + strcpy(nmr.nr_name, "ix0"); + nmr.nr_version = NETMAP_API; + ioctl(fd, NIOCREGIF, &nmr); + p = mmap(0, nmr.nr_memsize, PROT_WRITE | PROT_READ, + MAP_SHARED, fd, 0); + nifp = NETMAP_IF(p, nmr.nr_offset); + ring = NETMAP_TXRING(nifp, 0); + fds.fd = fd; + fds.events = POLLOUT; + + for (;;) { + poll(&fds, 1, -1); + for (; ring->avail > 0; ring->avail--) { + uint32_t i; + void *buf; + + i = ring->cur; + buf = NETMAP_BUF(ring, ring->slot[i].buf_idx); + /* prepare packet in buf */ + ring->slot[i].len = 0; /* packet length */ + ring->cur = NETMAP_RING_NEXT(ring, i); + } + } } .Ed .Sh SUPPORTED INTERFACES @@ -526,17 +626,26 @@ supports the following interfaces: .Xr igb 4 , .Xr ixgbe 4 , .Xr lem 4 , -.Xr re 4 +and +.Xr re 4 . .Sh SEE ALSO .Xr vale 4 +.Rs +.%A Luigi Rizzo +.%T Revisiting network I/O APIs: the netmap framework +.%J Communications of the ACM +.%V 55 (3) +.%P 45-51 +.%D March 2012 +.Re +.Rs +.%A Luigi Rizzo +.%T netmap: a novel framework for fast packet I/O +.%D June 2012 +.%O USENIX ATC '12, Boston +.Re .Pp -http://info.iet.unipi.it/~luigi/netmap/ -.Pp -Luigi Rizzo, Revisiting network I/O APIs: the netmap framework, -Communications of the ACM, 55 (3), pp.45-51, March 2012 -.Pp -Luigi Rizzo, netmap: a novel framework for fast packet I/O, -Usenix ATC'12, June 2012, Boston +.Lk http://info.iet.unipi.it/~luigi/netmap/ .Sh AUTHORS .An -nosplit The @@ -548,10 +657,11 @@ and further extended with help from .An Matteo Landi , .An Gaetano Catalli , .An Giuseppe Lettieri , +and .An Vincenzo Maffione . 
.Pp .Nm and -.Nm VALE -have been funded by the European Commission within FP7 Projects +.Xr vale 4 +have been funded by the European Commission within the FP7 Projects CHANGE (257422) and OPENLAB (287581). diff --git a/share/man/man4/vale.4 b/share/man/man4/vale.4 index 750ceaf08a..237cc968e1 100644 --- a/share/man/man4/vale.4 +++ b/share/man/man4/vale.4 @@ -27,7 +27,7 @@ .\" .\" $FreeBSD: head/share/man/man4/vale.4 228017 2011-11-27 06:55:57Z gjb $ .\" -.Dd July 27, 2012 +.Dd December 26, 2013 .Dt VALE 4 .Os .Sh NAME @@ -38,18 +38,18 @@ .Sh DESCRIPTION .Nm is a feature of the -.Nm netmap -module that implements multiple Virtual switches that can -be used to interconnect netmap clients, including traffic -sources and sinks, packet forwarders, userspace firewalls, -and so on. +.Xr netmap 4 +module that implements multiple virtual switches that can +be used to interconnect +.Xr netmap 4 +clients, including traffic sources and sinks, packet forwarders, +userspace firewalls, and so on. .Pp .Nm is implemented completely in software, and is extremely fast. On a modern machine it can move almost 20 Million packets per second (Mpps) per core with small frames, and about 70 Gbit/s with 1500 byte frames. -.Pp .Sh OPERATION .Nm dynamically creates switches and ports as client connect @@ -73,105 +73,106 @@ constraint being that the full name must fit within 16 characters. .Pp .Nm -ports can be physical network interfaces that support +ports can be physical network interfaces that support the .Xr netmap 4 API by specifying the interface name for -.Pa [port]. +.Pa [port] . See -.Nm OPERATION -section in .Xr netmap 4 for details of the naming rule. .Pp -Physical interfaces are attached using -.Pa NIOCGREGIF +Physical interfaces are attached using the +.Dv NIOCGREGIF command of -.Pa ioctl(), +.Xr ioctl 2 , and -.Pa NETMAP_BDG_ATTACH -at -.Em nr_cmd +.Dv NETMAP_BDG_ATTACH +for the +.Va nr_cmd field in -.Em struct nmreq . +.Vt struct nmreq . 
The corresponding host stack can also be attached to the bridge, specifying -.Pa NETMAP_BDG_HOST +.Dv NETMAP_BDG_HOST in -.Em nr_arg1 . +.Va nr_arg1 . To detach the interface from the bridge, -.Pa NETMAP_BDG_DETACH -is used instead of NETMAP_BDG_ATTACH. +.Dv NETMAP_BDG_DETACH +is used instead of +.Dv NETMAP_BDG_ATTACH . The host stack is also detached from the bridge at the same -time if it has been attached. +time if it had been attached. .Pp -Physical interfaces are treated as system configuration; -they are kept being attached even after the configuring process dies, -and detached by any process. +Physical interfaces are treated as a system configuration; +they remain attached even after the configuring process died, +and can be detached by any other process. .Pp Once a physical interface is attached, this interface is no longer -available to be directly accessed by netmap clients (user processes) or to be -attached by another bridge. -On the other hand, when any netmap client holds the physical interface, +available to be directly accessed by +.Xr netmap 4 +clients (user processes) or to be attached by another bridge. +On the other hand, when a +.Xr netmap 4 +client holds the physical interface, this interface cannot be attached to a bridge. .Pp -.Pa NETMAP_BDG_LIST -subcommand in nr_cmd of -.Em struct nmreq -is used to obtain bridge and port -information. There are two modes of how it works; +.Va NETMAP_BDG_LIST +subcommand in +.Va nr_cmd +of +.Vt struct nmreq +is used to obtain bridge and port information. +There are two modes of how this works. If any -.Em nr_name +.Va nr_name starting from non '\\0' is provided, -.Pa ioctl() -returning -indicates the position of -the named interface. -This position is represented by an index of the bridge and the port, and -put in -.Em nr_arg1 +.Xr ioctl 2 +returning indicates the position of the named interface. 
+This position is represented by an index of the bridge and the port, +and put into the +.Va nr_arg1 and -.Em nr_arg2 -fields, respectively. If the named interface does not exist, -.Pa ioctl() +.Va nr_arg2 +fields, respectively. +If the named interface does not exist, +.Xr ioctl 2 returns -.Pa EINVAL . +.Dv EINVAL . .Pp If -.Em nr_name +.Va nr_name starting from '\\0' is provided, -.Pa ioctl() +.Xr ioctl 2 returning indicates the first existing interface on and after the position specified in -.Em nr_arg1 +.Va nr_arg1 and -.Em nr_arg2. +.Va nr_arg2 . If the caller specified a port index greater than the highest index of the ports, it is recognized as port index 0 of the -next bridge -( -.Em nr_arg1 +next bridge ( +.Va nr_arg1 + 1, -.Em nr_arg2 +.Va nr_arg2 = 0). -.Pa ioctl() +.Xr ioctl 2 returns -.Pa EINVAL +.Dv EINVAL if the given position is higher than that of any existing interface. On successful return of -.Pa ioctl() , +.Xr ioctl 2 , the interface name is also stored in -.Em nr_name . -.Pa NETMAP_BDG_LIST -is always used with -.Pa NIOCGINFO +.Va nr_name . +.Dv NETMAP_BDG_LIST +is always used with the +.Dv NIOCGINFO command of -.Pa ioctl() +.Xr ioctl 2 . .Pp Below is an example of printing all the existing ports walking through all the bridges. - -.Bd -literal -compact +.Bd -literal struct nmreq nmr; int fd = open("/dev/netmap", O_RDWR); @@ -185,39 +186,39 @@ for (; !ioctl(fd, NIOCGINFO, &nmr); nmr.nr_arg2++) { nmr.nr_name[0] = '\\0'; } .Ed -.Pp -See -.Xr netmap 4 -for details on the API. .Ss LIMITS .Nm currently supports up to 8 switches, 254 ports per switch, -1024 buffers per port. These hard limits will be -changed to sysctl variables in future releases. +1024 buffers per port. +These hard limits will be changed to +.Xr sysctl 8 +variables in future releases. .Pp Attaching the host stack to the bridge imposes significant performance -degradation when many packets are forwarded to the host stack by either -unicast or broadcast. 
-This is because every single packet going to the host stack causes mbuf -allocation in the same thread context as one forwarding packets. -.Pp +degradation when many packets are forwarded to the host stack. +This is because each packet forwarded to the host stack causes +.Xr mbuf 9 +allocation in the same thread context. .Sh SYSCTL VARIABLES .Nm -uses the following sysctl variables to control operation: -.Bl -tag -width 12 +uses the following +.Xr sysctl 8 +variables to control operation: +.Bl -tag -width "dev.netmap.verbose" .It dev.netmap.bridge The maximum number of packets processed internally in each iteration. Defaults to 1024, use lower values to trade latency with throughput. -.Pp .It dev.netmap.verbose Set to non-zero values to enable in-kernel diagnostics. .El -.Pp .Sh EXAMPLES Create one switch, with a traffic generator connected to one -port, and a netmap-enabled tcpdump instance on another port: +port, and a +.Xr netmap 4 Ns -enabled +.Xr tcpdump 1 +instance on another port: .Bd -literal -offset indent tcpdump -ni vale-a:1 & pkt-gen -i vale-a:0 -f tx & @@ -233,21 +234,23 @@ qemu -net nic -net netmap,ifname=vale-2:d ... & .Ed .Sh SEE ALSO .Xr netmap 4 -.Pp -.Xr http://info.iet.unipi.it/~luigi/vale/ -.Pp -Luigi Rizzo, Giuseppe Lettieri: VALE, a switched ethernet for virtual machines, -June 2012, http://info.iet.unipi.it/~luigi/vale/ +.Rs +.%A Luigi Rizzo +.%A Giuseppe Lettieri +.%T VALE, a switched ethernet for virtual machines +.%J http://info.iet.unipi.it/~luigi/vale/ +.%D June 2012 +.Re .Sh AUTHORS .An -nosplit The .Nm -switch has been designed and implemented in 2012 by +switch has been designed and implemented in 2012 by .An Luigi Rizzo and .An Giuseppe Lettieri at the Universita` di Pisa. .Pp .Nm -has been funded by the European Commission within FP7 Projects +has been funded by the European Commission within the FP7 Projects CHANGE (257422) and OPENLAB (287581).
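The ring bookkeeping that the revised netmap.4 text describes (fill a slot, advance
.Va cur ,
decrement
.Va avail )
and the fragment count extracted by
.Fn NS_RFRAGS
can be sketched in isolation. The following is a minimal, hedged illustration and is not part of the commit: the types mock_slot and mock_ring and the helpers ring_next, slot_rfrags and tx_fill are hypothetical stand-ins that mirror only the fields named in the diff. A real client would operate on the structures and macros from
.In net/netmap.h
after mapping the shared memory region.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-ins for the fields the manual page describes.
 * Real clients use struct netmap_slot and struct netmap_ring.
 */
struct mock_slot {
	uint16_t len;		/* packet length */
	uint16_t flags;		/* bits 8..15 carry the fragment count */
};

struct mock_ring {
	uint32_t num_slots;	/* number of slots in the ring */
	uint32_t avail;		/* number of usable slots */
	uint32_t cur;		/* 'current' read/write index */
	struct mock_slot slot[8];
};

/* Advance a slot index by one, wrapping at the end of the ring. */
static uint32_t
ring_next(uint32_t i, uint32_t num_slots)
{
	assert(num_slots > 0 && i < num_slots);
	return (i + 1 == num_slots ? 0 : i + 1);
}

/* Same expression as the NS_RFRAGS() macro shown in the diff. */
static unsigned int
slot_rfrags(const struct mock_slot *slot)
{
	return ((slot->flags >> 8) & 0xff);
}

/*
 * Queue up to 'want' packets of 'len' bytes on a transmit ring,
 * consuming one available slot per packet.
 * Returns the number of slots actually filled.
 */
static uint32_t
tx_fill(struct mock_ring *ring, uint32_t want, uint16_t len)
{
	uint32_t sent = 0;

	while (ring->avail > 0 && sent < want) {
		ring->slot[ring->cur].len = len;
		ring->cur = ring_next(ring->cur, ring->num_slots);
		ring->avail--;
		sent++;
	}
	return (sent);
}
```

In this sketch the caller stops as soon as
.Va avail
reaches zero. That corresponds to the point where the example program in the manual page goes back to
.Xr poll 2
and waits for more transmit slots.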