<sys/time.h>: Add 3rd arg to timespecadd()/sub() and make them public.

* Switch to the three-argument versions of the timespecadd() and timespecsub() macros. These are now the predominant ones: FreeBSD, OpenBSD, NetBSD, and Solaris (albeit only for the kernel) have them.

* Make those macros public too. This allows for a number of cleanups where they were defined locally.

Pointed-out-by: zrj
Reviewed-by: dillon
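As a rough illustration of the three-argument form, the sketch below defines local stand-ins modeled on the BSD macros. The `demo_` names and the two helper functions are hypothetical and exist only so the example builds on systems whose headers do not provide the macros.

```c
#include <time.h>

/*
 * Local stand-ins modeled on the three-argument BSD macros.
 * The demo_ prefix is hypothetical; it avoids clashing with
 * whatever the system's <sys/time.h> may already define.
 */
#define demo_timespecadd(tsp, usp, vsp)                         \
    do {                                                        \
        (vsp)->tv_sec = (tsp)->tv_sec + (usp)->tv_sec;          \
        (vsp)->tv_nsec = (tsp)->tv_nsec + (usp)->tv_nsec;       \
        if ((vsp)->tv_nsec >= 1000000000L) {                    \
            (vsp)->tv_sec++;                                    \
            (vsp)->tv_nsec -= 1000000000L;                      \
        }                                                       \
    } while (0)

#define demo_timespecsub(tsp, usp, vsp)                         \
    do {                                                        \
        (vsp)->tv_sec = (tsp)->tv_sec - (usp)->tv_sec;          \
        (vsp)->tv_nsec = (tsp)->tv_nsec - (usp)->tv_nsec;       \
        if ((vsp)->tv_nsec < 0) {                               \
            (vsp)->tv_sec--;                                    \
            (vsp)->tv_nsec += 1000000000L;                      \
        }                                                       \
    } while (0)

/* Return the point in time that lies `timeout` after `now`. */
static struct timespec
demo_deadline(struct timespec now, struct timespec timeout)
{
    struct timespec res;

    /* The result goes to a third operand; neither input is modified. */
    demo_timespecadd(&now, &timeout, &res);
    return res;
}

/* Return end - start, assuming end is not earlier than start. */
static struct timespec
demo_elapsed(struct timespec start, struct timespec end)
{
    struct timespec res;

    demo_timespecsub(&end, &start, &res);
    return res;
}
```

Writing the result to a separate operand means callers no longer have to copy an input first when both operands must be preserved.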
kernel: Move us to using M_NOWAIT and M_WAITOK for mbuf functions.

The main reason is that having to use the MB_WAIT and MB_DONTWAIT flags was a recurring issue when porting drivers from FreeBSD: it tended to get forgotten, and the code would compile anyway with the wrong constants.

And since MB_WAIT and MB_DONTWAIT ended up as ocflags for an objcache_get() or objcache_reclaimlist() call (which use M_WAITOK and M_NOWAIT), it amounted to converting back and forth with some sanitization in between.

This commit allows M_* again for the mbuf functions and keeps the sanitizing as it was before: when M_WAITOK is among the passed flags, objcache functions will be called with M_WAITOK, and when it is absent, they will be called with M_NOWAIT. All other flags are scrubbed by the MB_OCFLAG() macro, which does the same as the former MBTOM().

Approved-by: dillon
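The sanitizing rule described above can be sketched as follows. This is only an illustration: the `DEMO_` names and the flag values are placeholders and do not reproduce the kernel's definitions of M_WAITOK, M_NOWAIT, or MB_OCFLAG().

```c
/* Placeholder flag values, not the kernel's real ones. */
#define DEMO_M_NOWAIT   0x0001
#define DEMO_M_WAITOK   0x0002
#define DEMO_M_OTHER    0x0100  /* stands in for any other M_* flag */

/*
 * Reduce caller-supplied flags to exactly one of WAITOK or NOWAIT:
 * WAITOK if it is present, NOWAIT otherwise. Every other bit is dropped.
 */
#define DEMO_MB_OCFLAG(how) \
    (((how) & DEMO_M_WAITOK) ? DEMO_M_WAITOK : DEMO_M_NOWAIT)
```

With this shape, a caller that passes no wait flag at all still ends up with the non-sleeping behaviour.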
tcp: Implement asynchronous pru_connect

This is mainly used to improve TCP nonblocking connect(2) performance. Before this commit, a user space thread using nonblocking connect(2) had to wait for the netisr to complete the SYN output. This could be a performance hit for nonblocking connect(2). First, the user space thread is put to sleep, even if the connect(2) is nonblocking. Second, it does not make much sense for a nonblocking connect(2) to wait for the SYN output.

TCP's asynchronous pru_connect implementation sets ISCONNECTING before dispatching the netmsg to netisr0. Errors such as EADDRNOTAVAIL (i.e. out of local port space) are reported through kevent(2) or getsockopt(2) SOL_SOCKET/SO_ERROR.

NFS and other kernel code still use the old synchronized pru_connect. This commit only affects the connect(2) syscall.

Sysctl node kern.ipc.soconnect_async is added to enable and disable asynchronous pru_connect. It is enabled by default.

The performance measurement (i7-2600 w/ bnx(4)), using tools/tools/netrate/accept_connect/kq_connect_client:
    kq_connect_client -4 SERVADDR -p SERVPORT -i 8 -c 32 -l 30
(8 processes, each creates 32 connections simultaneously, run for 30 secs)

16 runs average:
asynchronous pru_connect    synchronized pru_connect
  220979.89 conns/s           189106.88 conns/s

This commit gives ~16% performance improvement for nonblocking connect(2).
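From the application side, errors from a nonblocking connect(2) are collected after the call returns. The sketch below shows one common userland pattern, assuming an IPv4 destination: it waits for writability with poll(2) (a kevent(2) write filter would serve the same purpose) and then reads SO_ERROR. The function name `demo_connect_nonblock` is hypothetical and not part of the commit.

```c
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Start a nonblocking TCP connect and collect its result.
 * Returns 0 and stores the connected descriptor in *fdp on success,
 * otherwise returns an errno value and leaves *fdp untouched.
 */
static int
demo_connect_nonblock(const struct sockaddr_in *sin, int timeout_ms, int *fdp)
{
    struct pollfd pfd;
    socklen_t len;
    int fd, flags, error, n;

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return errno;

    flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
        error = errno;
        close(fd);
        return error;
    }

    if (connect(fd, (const struct sockaddr *)sin, sizeof(*sin)) < 0) {
        if (errno != EINPROGRESS) {
            error = errno;
            close(fd);
            return error;
        }

        /* The connect is in flight; wait until the socket is writable. */
        pfd.fd = fd;
        pfd.events = POLLOUT;
        pfd.revents = 0;
        n = poll(&pfd, 1, timeout_ms);
        if (n <= 0) {
            error = (n == 0) ? ETIMEDOUT : errno;
            close(fd);
            return error;
        }

        /* Deferred errors (e.g. EADDRNOTAVAIL) surface through SO_ERROR. */
        error = 0;
        len = sizeof(error);
        if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &error, &len) < 0)
            error = errno;
        if (error != 0) {
            close(fd);
            return error;
        }
    }

    *fdp = fd;
    return 0;
}
```

Code that already checks SO_ERROR after the socket becomes writable should need no changes, since that is where the deferred errors are delivered.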
network - MP socket free & abort interactions, so_state

* Add so_refs and ref-count the socket structure to deal with MP races on sofree().

* Ref the socket structure for all soabort() operations (they are usually asynchronous). The netmsg_pru_abort() handler will sofree() the ref after calling the protocol stack's abort function.

* Use atomic ops to set and clear bits in so_state, because it is modified by both the frontend and the backend.

* Remove numerous critical sections that are no longer effective.

* Protect the accept queues with so_rcv.ssb_token.

* Protect after-the-fact calls to soisdisconnected() with a soreference() to avoid use-after-free cases.

* Wrap unix domain, mroute, div, raw, and key sockets/protocols with their own private tokens.
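The two techniques named above, a reference count on the structure and atomic updates of a shared flag word, can be sketched in portable C11. Everything here is illustrative: `struct demo_socket`, the helper names, and the flag values are hypothetical, and the kernel uses its own atomic primitives rather than <stdatomic.h>.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Placeholder state bits, not the kernel's SS_* values. */
#define DEMO_SS_ISCONNECTED      0x0002u
#define DEMO_SS_ISDISCONNECTING  0x0008u

struct demo_socket {
    atomic_uint state;  /* flag word touched by more than one context */
    atomic_int  refs;   /* reference count guarding the final free */
};

/* Set bits without losing a concurrent update to other bits. */
static void
demo_state_set(struct demo_socket *so, unsigned bits)
{
    atomic_fetch_or(&so->state, bits);
}

/* Clear bits without losing a concurrent update to other bits. */
static void
demo_state_clear(struct demo_socket *so, unsigned bits)
{
    atomic_fetch_and(&so->state, ~bits);
}

/* Take a reference before handing the socket to an asynchronous operation. */
static void
demo_ref(struct demo_socket *so)
{
    atomic_fetch_add(&so->refs, 1);
}

/*
 * Drop a reference. Returns true when this call released the last
 * reference, meaning the caller is the one that must free the socket.
 */
static bool
demo_unref(struct demo_socket *so)
{
    return atomic_fetch_sub(&so->refs, 1) == 1;
}
```

A plain read-modify-write such as `state |= bits` can silently discard a bit set concurrently by another CPU, which is the kind of race the atomic operations avoid.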
kernel - Fix numerous MP issues with sockbuf's and ssb_flags part 1/2

* Use atomic ops for ssb_flags handling.

* Use atomic_cmpset_int() to interlock SSB_LOCK with SSB_WANT, and SSB_WAIT with SSB_WAKEUP. Note in particular that WAIT/WAKEUP assumes the client side of the socket is single threaded via an appropriate lock. This needs more work.
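A compare-and-set interlock between a lock bit and a "wanted" bit might look roughly like the sketch below. It uses C11 atomic_compare_exchange_weak() as a stand-in for atomic_cmpset_int(); the `demo_` names and flag values are hypothetical, and the sleep/wakeup calls a kernel would perform are only indicated in comments.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Placeholder bits, not the kernel's SSB_* values. */
#define DEMO_SSB_LOCK  0x01u
#define DEMO_SSB_WANT  0x02u

/*
 * Try to take the lock. If it is already held, atomically record that
 * someone wants it and return false (a kernel caller would sleep here).
 */
static bool
demo_ssb_trylock(atomic_uint *flagsp)
{
    unsigned flags, nflags;

    for (;;) {
        flags = atomic_load(flagsp);
        if (flags & DEMO_SSB_LOCK) {
            nflags = flags | DEMO_SSB_WANT;
            if (atomic_compare_exchange_weak(flagsp, &flags, nflags))
                return false;
        } else {
            nflags = flags | DEMO_SSB_LOCK;
            if (atomic_compare_exchange_weak(flagsp, &flags, nflags))
                return true;
        }
        /* The word changed underneath us (or a spurious failure); retry. */
    }
}

/*
 * Release the lock and clear the want bit in one atomic step.
 * Returns true if a waiter had registered and needs a wakeup.
 */
static bool
demo_ssb_unlock(atomic_uint *flagsp)
{
    unsigned flags, nflags;

    for (;;) {
        flags = atomic_load(flagsp);
        nflags = flags & ~(DEMO_SSB_LOCK | DEMO_SSB_WANT);
        if (atomic_compare_exchange_weak(flagsp, &flags, nflags))
            return (flags & DEMO_SSB_WANT) != 0;
    }
}
```

Because the want bit is set in the same compare-and-set that observes the lock bit, the unlocking side cannot miss a waiter that registered just before the release.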
kernel - Tear out socket polling

* Remove existing (now legacy) code that implements socket polling; kq filters are now the "One True (and only) Way".

* Implement a new socket_wait() that can be used to wait for data to arrive on a single descriptor with an optional timeout.
AMD64 - Refactor uio_resid and size_t assumptions.

* uio_resid changed from int to size_t (size_t == unsigned long equivalent).

* size_t assumptions in most kernel code have been refactored to operate in a 64 bit environment.

* In addition, the 2G limitation for VM related system calls such as mmap() has been removed in 32 bit environments. Note however that because read() and write() return ssize_t, these functions are still limited to a 2G byte count in 32 bit environments.
Give the sockbuf structure its own header file and supporting source file.

Move all sockbuf-specific functions from kern/uipc_socket2.c into the new kern/uipc_sockbuf.c and move all the sockbuf-specific structures from sys/socketvar.h to sys/sockbuf.h.

Change the sockbuf structure to only contain those fields required to properly manage a chain of mbufs. Create a signalsockbuf structure to hold the remaining fields (e.g. selinfo, mbmax, etc). Change the so_rcv and so_snd structures in the struct socket from a sockbuf to a signalsockbuf.

Remove the recently added sorecv_direct structure which was being used to provide a direct mbuf path to consumers for socket I/O. Use the newly revamped sockbuf base structure instead. This gives mbuf consumers direct access to the sockbuf API functions for use outside of a struct socket. This will also allow new API functions to be added to the sockbuf interface to ease the job of parsing data out of chained mbufs.
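The split described above can be pictured as a small base structure embedded in a larger one. The sketch below is an assumption-laden illustration: all `demo_` types, the `head`/`tail`/`cc` fields, and the append helper are hypothetical, and only `mbmax` is a field name taken from the description.

```c
#include <stddef.h>

/* Minimal stand-in for an mbuf: a link and a payload length. */
struct demo_mbuf {
    struct demo_mbuf *next;
    size_t len;
};

/* Base structure: only what is needed to manage a chain of mbufs. */
struct demo_sockbuf {
    struct demo_mbuf *head;
    struct demo_mbuf *tail;
    size_t cc;              /* total bytes held in the chain */
};

/* Wrapper used by a socket: the base buffer plus limits and signalling. */
struct demo_signalsockbuf {
    struct demo_sockbuf sb;
    size_t mbmax;           /* limit on buffer space */
    int flags;              /* wait/wakeup style state would live here */
};

/* Append one mbuf to the chain; works on any demo_sockbuf, socket or not. */
static void
demo_sb_append(struct demo_sockbuf *sb, struct demo_mbuf *m)
{
    m->next = NULL;
    if (sb->tail != NULL)
        sb->tail->next = m;
    else
        sb->head = m;
    sb->tail = m;
    sb->cc += m->len;
}
```

Because the chain helpers take only the base structure, a consumer can use them on a standalone buffer as well as on the one embedded in a socket's receive or send side.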
Add kernel syscall support for explicit blocking and non-blocking I/O regardless of the setting applied to the file pointer.

send/sendmsg/sendto/recv/recvmsg/recvfrom: New MSG_ flags defined in sys/socket.h may be passed to these functions to override the settings applied to the file pointer on a per-I/O basis.

    MSG_FBLOCKING    - Force the operation to be blocking
    MSG_FNONBLOCKING - Force the operation to be non-blocking

pread/preadv/pwrite/pwritev: These system calls have been renamed and wrappers will be added to libc. The new system calls are prefixed with a double underscore (like getcwd vs __getcwd) and include an additional flags argument. The new flags are defined in sys/fcntl.h and may be used to override settings applied to the file pointer on a per-I/O basis.

Additionally, the internal __ versions of these functions now accept an offset of -1 to mean 'degenerate into a read/readv/write/writev' (i.e. use the offset in the file pointer and update it on completion).

    O_FBLOCKING    - Force the operation to be blocking
    O_FNONBLOCKING - Force the operation to be non-blocking
    O_FAPPEND      - Force the write operation to append (to a regular file)
    O_FOFFSET      - (implied if the offset != -1) - offset is valid
    O_FSYNCWRITE   - Force a synchronous write
    O_FASYNCWRITE  - Force an asynchronous write
    O_FUNBUFFERED  - Force an unbuffered operation (O_DIRECT)
    O_FBUFFERED    - Force a buffered operation (negate O_DIRECT)

If the flags do not specify an operation (e.g. neither FBLOCKING nor FNONBLOCKING is set), then the settings in the file pointer are used.

The original system calls will become wrappers in libc, without the flags arguments. The new system calls will be made available to libc_r to allow it to perform non-blocking I/O without having to mess with a descriptor's file flags.

NOTE: the new __pread and __pwrite system calls are backwards compatible with the originals due to a pad byte that libc always set to 0. The new __preadv and __pwritev system calls are NOT backwards compatible, but since they were added to HEAD just two months ago I have decided to not renumber them either.

NOTE: The subrev has been bumped to 1.5.4 and installworld will refuse to install if you are not running at least a 1.5.4 kernel.
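The per-I/O override rule and the offset convention described in this commit can be sketched as two small helpers. The `DEMO_` flag values and function names are hypothetical. The commit message does not say what happens when both override flags are passed; this sketch arbitrarily lets the blocking flag win.

```c
#include <stdbool.h>

/* Placeholder values, not the definitions from sys/fcntl.h. */
#define DEMO_O_FBLOCKING     0x01
#define DEMO_O_FNONBLOCKING  0x02

/*
 * Decide whether a single I/O should be non-blocking.
 * An explicit per-I/O flag overrides the file pointer's setting;
 * with neither flag given, the file pointer's setting is used.
 * Assumption: if both flags are given, blocking wins.
 */
static bool
demo_io_nonblocking(int ioflags, bool file_nonblocking)
{
    if (ioflags & DEMO_O_FBLOCKING)
        return false;
    if (ioflags & DEMO_O_FNONBLOCKING)
        return true;
    return file_nonblocking;
}

/*
 * An offset of -1 means "behave like read/write": use the offset
 * stored in the file pointer and update it on completion.
 */
static bool
demo_use_file_offset(long long offset)
{
    return offset == -1;
}
```

A threading library could use such an override to issue one non-blocking read on a descriptor without changing the descriptor's file flags, which other users of the same descriptor would otherwise observe.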