kernel - Fix long-standing bug in kqueue backend for *poll*()

* The poll() family of system calls passes an fds[] array with a series of
  descriptors and event requests. Our kernel implementation uses kqueue, but
  a long-standing bug breaks situations where more than one fds[] entry for
  the poll corresponds to the same { ident, filter } for kqueue, causing
  only the last such entry to be registered with kqueue and breaking poll().

* Added a feature to kqueue to supply further distinctions between knotes
  beyond the nominal { kq, filter, ident } tuple, allowing us to fix poll().

* Added a FreeBSD feature where poll() implements an implied POLLHUP when
  events = 0. This is used by X11 and (perhaps mistakenly) also by sshd.
  Our poll previously ignored fds[] entries with events = 0.

* Note that sshd can generate poll fds[] arrays with both an events = 0 and
  an events = POLLIN entry for the same descriptor, which broke sshd when I
  initially added the events = 0 support, due to the first bug. Now with
  that fixed, sshd works properly (a small repro follows below). However,
  it is unclear whether the authors of sshd intended events = 0 to detect
  POLLHUP or not.

Reported-by: servik (missing events = 0 poll feature)
Testing: servik, dillon
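A minimal userland repro of the duplicate-entry case described above (an
illustration, not code from the commit): two fds[] entries share the same
descriptor, so both map to the same { ident, EVFILT_READ } tuple in the
kqueue backend, and before the fix only the last entry was registered.

    #include <poll.h>
    #include <stdio.h>
    #include <unistd.h>

    int
    main(void)
    {
        int p[2];
        int n;
        struct pollfd fds[2];

        if (pipe(p) < 0)
            return 1;
        write(p[1], "x", 1);        /* make read data available */

        fds[0].fd = p[0];
        fds[0].events = 0;          /* implied-POLLHUP entry (sshd-style) */
        fds[1].fd = p[0];
        fds[1].events = POLLIN;     /* same descriptor, same kqueue ident */

        n = poll(fds, 2, 0);
        printf("poll: %d, revents %#x %#x\n",
            n, (unsigned)fds[0].revents, (unsigned)fds[1].revents);
        /* With the fix, fds[1] reports POLLIN while fds[0] stays quiet
         * until the write side is closed. */
        return 0;
    }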
poll/select: Fix panic in kqueue backend

* The poll and select system calls use kqueue as a backend and attempt to
  cache active events from prior calls to improve performance. However,
  this makes a potential race more likely: in a high-concurrency
  application, one thread close()es a descriptor that another thread had
  previously used in a poll/select operation, and this close() races the
  later poll/select operation that is attempting to remove the kevent.

* The race can sometimes prevent the poll/select kevent copyout code from
  removing previously cached but no-longer-used events, because the removal
  references the events by their descriptor rather than directly, and the
  descriptor is no longer valid. This causes kern_kevent() to loop
  infinitely and hit a panic designed to check for that situation.

* Fix the problem by moving the removal of old events from the poll/select
  copyout code into kqueue_scan(). kqueue_scan() can detect old unused
  events using the sequence id that the poll/select kernel code stores in
  the kevent (sketched below).
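A minimal sketch of the stale-event test now done inside the scan; the
names (knote_sketch, kn_udata_seq) are illustrative stand-ins, not the
actual DragonFly structures.

    #include <stdint.h>

    struct knote_sketch {
        uint64_t kn_udata_seq;  /* serial poll/select stored in the kevent */
    };

    static int
    knote_is_stale(const struct knote_sketch *kn, uint64_t cur_serial)
    {
        /*
         * Events cached by a previous poll/select call carry an older
         * serial. Detecting them here, inside kqueue_scan(), means the
         * copyout code no longer has to look the event up by descriptor
         * later; that lookup fails (and used to loop forever) when the
         * descriptor was concurrently close()d.
         */
        return (kn->kn_udata_seq != cur_serial);
    }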
kernel - Implement POLLHUP for pipes and filesystem fifos (3)

* Add an internal NOTE_HUPONLY flag to allow the poll() system call to tell
  the kevent system that EVFILT_READ should only trigger on a HUP and not
  trigger on read-data-present (sketched below).

* Linux does not trigger POLLHUP on a half-closed socket; make DFly have
  the same behavior. POLLHUP is only triggered on a fully-closed socket.

* Fix a bug where data-present on the pipe, socket, or fifo would trigger
  an EVFILT_READ event when only a HUP was being requested. This caused our
  poll() implementation to complain about spurious events (which then
  resulted in incorrect operation).
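A sketch of the NOTE_HUPONLY semantics described above; the flag value and
struct layout are placeholders, not the kernel's definitions.

    #include <stdint.h>

    #define NOTE_HUPONLY_SK 0x0100

    struct rdfilter_state {
        uint32_t sfflags;       /* filter flags set at registration time */
        int      data_ready;    /* read data is present */
        int      hup;           /* writer side has gone away */
    };

    /* Returns nonzero if the EVFILT_READ knote should fire. */
    static int
    filt_read_sketch(const struct rdfilter_state *st)
    {
        if (st->hup)
            return (1);         /* a HUP always triggers */
        if (st->sfflags & NOTE_HUPONLY_SK)
            return (0);         /* poll() asked for HUP only; suppress data */
        return (st->data_ready);
    }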
kernel - Refactor in-kernel system call API to remove bcopy()

* Change the in-kernel system call prototype to take the system call
  arguments as a separate pointer, and make the contents read-only:

	int sy_call_t (void *);
	int sy_call_t (struct sysmsg *sysmsg, const void *);

* System calls with 6 arguments or less no longer need to copy the
  arguments from the trapframe to a holding structure. Instead, we simply
  point into the trapframe (sketched below). The L1 cache footprint will be
  a bit smaller, but in simple tests the results are not noticeably
  faster... maybe 1ns or so (roughly 1%).
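A sketch of the zero-copy argument convention described above; all names
(sysmsg_sketch, trapframe_sketch, the register layout) are illustrative
stand-ins, not the kernel's actual definitions.

    #include <stdint.h>

    struct sysmsg_sketch {
        int64_t sm_result;
    };

    typedef int sy_call_sketch_t(struct sysmsg_sketch *, const void *);

    struct trapframe_sketch {
        /* first six syscall argument registers, laid out contiguously */
        int64_t tf_arg[6];
    };

    static int
    syscall_dispatch(sy_call_sketch_t *callp, struct sysmsg_sketch *msg,
        struct trapframe_sketch *frame)
    {
        /*
         * With six or fewer arguments already contiguous in the frame,
         * the handler can read them in place: no bcopy() into a holding
         * structure, and the const qualifier keeps the handler from
         * modifying the frame.
         */
        return (callp(msg, frame->tf_arg));
    }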
kernel - Refactor kern_kevent(), fix timeout overflow (ppoll() bug) (2)

* Certain unsupported EV_ERROR events can cause kern_kevent() to live-lock,
  which hits a 'checkloop failed' panic. Silently deregister such events.

* Complain and deregister any kqueue event registered on behalf of *poll()
  which does not set any poll return flags (sketched below).

Reported-by: swildner
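A sketch of the two copyout-time guards described above; the names
(kev_sketch, EV_ERROR_SK, deregister_sketch) are illustrative only.

    #include <stdio.h>

    #define EV_ERROR_SK 0x4000

    struct kev_sketch {
        unsigned short flags;
        unsigned int   pollflags;   /* poll return flags derived from event */
    };

    static void deregister_sketch(struct kev_sketch *kev) { (void)kev; }

    /* Returns nonzero if the event should be reported to *poll(). */
    static int
    poll_copyout_check(struct kev_sketch *kev)
    {
        if (kev->flags & EV_ERROR_SK) {
            /* Unsupported EV_ERROR event: silently drop it so the
             * kern_kevent() scan loop can make progress. */
            deregister_sketch(kev);
            return (0);
        }
        if (kev->pollflags == 0) {
            /* Event produced no poll return flags: complain and drop. */
            fprintf(stderr, "poll: event set no return flags\n");
            deregister_sketch(kev);
            return (0);
        }
        return (1);
    }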
kernel - Generate POLLHUP for fully disconnected socket

* Properly generate POLLHUP for fully disconnected sockets (sketched
  below). However, there is still a possible issue: we do not set POLLHUP
  for half-closed sockets, and it is really unclear whether we should or
  not once read data has been exhausted.
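A sketch of the full-disconnect test described above; the flag values are
illustrative stand-ins for the socket state bits.

    #include <stdint.h>

    #define SS_CANTRCVMORE_SK   0x0001
    #define SS_CANTSENDMORE_SK  0x0002

    static int
    so_reports_pollhup(uint32_t ss_state)
    {
        /*
         * Only a fully-disconnected socket (both directions shut down)
         * reports POLLHUP; a half-closed socket does not.
         */
        return ((ss_state & (SS_CANTRCVMORE_SK | SS_CANTSENDMORE_SK)) ==
            (SS_CANTRCVMORE_SK | SS_CANTSENDMORE_SK));
    }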
kernel - Remove SMP bottlenecks on uidinfo, descriptors, and lockf

* Use an eventcounter and the per-thread fd cache to fix bottlenecks in
  checkfdclosed(). This will work well for the vast majority of
  applications and test benches.

* Batch holdfp*() operations on kqueue collections when implementing poll()
  and select(). This significantly improves performance. Full scaling is
  not yet achieved, however.

* Increase copyin item batching from 8 to 32 for select() and poll().

* Give the uidinfo structure a pcpu array to hold the posixlocks and
  openfiles count fields, with a rollup contained in the uidinfo structure
  itself. This removes numerous global bottlenecks related to open(),
  close(), dup*(), and lockf operations (the posixlocks count).
  ui_openfiles will force a rollup when the limit is hit, to be sure that
  the limit was actually reached. ui_posixlocks stays fairly loose: each
  cpu generally rolls up only when its pcpu count exceeds +32 or drops
  below -32 (sketched below).

* Give the proc structure a pcpu array for the same counts, in order to
  properly support seteuid() and such.

* Replace P_ADVLOCK with a char field, proc->p_advlock_flag, and remove the
  token operations around the field.
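A minimal sketch of the loose pcpu counter rollup described above;
NCPU_SKETCH and the field names are illustrative, and the atomics the real
code would need are elided.

    #include <stdint.h>

    #define NCPU_SKETCH     64
    #define ROLLUP_LIMIT    32

    struct uidinfo_sketch {
        int64_t ui_posixlocks;              /* global rollup */
        int32_t ui_pcpu[NCPU_SKETCH];       /* per-cpu deltas */
    };

    static void
    posixlocks_adjust(struct uidinfo_sketch *ui, int cpu, int delta)
    {
        ui->ui_pcpu[cpu] += delta;
        /*
         * Each cpu only touches the shared field when its local delta
         * drifts past +/-32, so the global cache line is rarely contended.
         */
        if (ui->ui_pcpu[cpu] > ROLLUP_LIMIT ||
            ui->ui_pcpu[cpu] < -ROLLUP_LIMIT) {
            ui->ui_posixlocks += ui->ui_pcpu[cpu];
            ui->ui_pcpu[cpu] = 0;
        }
    }

A limit check against ui_posixlocks alone can therefore be off by up to
32 per cpu, which is why ui_openfiles forces a full rollup before declaring
the limit reached.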
kernel - per-thread fd cache, p_fd lock bypass

* Implement a per-thread (fd,fp) cache. Cache hits can keep fp's in a held
  state (avoiding the need to fhold()/fdrop() the ref count) and bypass the
  p_fd spinlock. This allows the file pointer structure to generally be
  shared across cpu caches (sketched below).

* Up to four descriptors can be cached in each thread, LRU. This is the
  common case: highly threaded programs tend to focus work on distinct file
  descriptors in each thread.

* One file descriptor can be cached in up to four threads. This is a
  significant limitation, though a relatively uncommon case. On a cache
  miss the code drops into the normal shared p_fd spinlock lookup.
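A sketch of the per-thread (fd,fp) cache lookup described above; the struct
and function names are illustrative, not the kernel's.

    #include <stddef.h>

    #define FDCACHE_SLOTS 4

    struct file_sketch;             /* stands in for struct file */

    struct fdcache_sketch {
        int                 fd[FDCACHE_SLOTS];
        struct file_sketch *fp[FDCACHE_SLOTS];
    };

    static struct file_sketch *
    fdcache_lookup(struct fdcache_sketch *fc, int fd)
    {
        int i, j;

        for (i = 0; i < FDCACHE_SLOTS; i++) {
            if (fc->fd[i] == fd) {
                /*
                 * Move the hit to slot 0 (LRU). The fp stays held, so
                 * no fhold()/fdrop() and no p_fd spinlock on this path.
                 */
                struct file_sketch *fp = fc->fp[i];

                for (j = i; j > 0; j--) {
                    fc->fd[j] = fc->fd[j - 1];
                    fc->fp[j] = fc->fp[j - 1];
                }
                fc->fd[0] = fd;
                fc->fp[0] = fp;
                return (fp);
            }
        }
        return (NULL);      /* miss: fall back to shared p_fd lookup */
    }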
poll/select: Use a 64-bit serial for poll/select's kevent.udata.

This fixes the issue mentioned in commit
ce4975442fa0524017fb3c1aef93bbe6880ae770.

It takes ~200 years for a 2.5GHz CPU to wrap the 64-bit serial; even if CPU
speed were 10 times faster tomorrow, it would still take two decades to
wrap it (see the arithmetic check below).

Suggested-by: dillon@
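A quick back-of-envelope check of the wrap-time claim, assuming the worst
case of one serial increment per cycle at 2.5GHz:

    #include <stdio.h>

    int
    main(void)
    {
        double increments = 18446744073709551616.0;     /* 2^64 */
        double hz = 2.5e9;                              /* one bump/cycle */
        double years = increments / hz / (365.25 * 24 * 3600);

        printf("64-bit serial wraps after ~%.0f years\n", years);
        /* ~234 years; at 10x the increment rate, still ~23 years. */
        return 0;
    }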
select: Don't allow unwanted/leftover fds to be returned.

The root cause is that the lwp_kqueue_serial wraps quite quickly (6 seconds
on my laptop) if select(2) is called in a polling loop, either due to heavy
workload or a 0 timeout.

The POC test:
https://leaf.dragonflybsd.org/~sephe/select_wrap.c

Fix this issue by saving the original fd_sets and doing additional kevent
filtering before returning the fds to userland (sketched below).

poll(2) suffers from a similar issue and will be fixed in a later commit.

Reported-by: many
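A sketch of the post-scan filtering described above; the function is
illustrative, not the kernel's actual code. Returned fds are intersected
with the caller's saved interest set so that a stale kevent (left over from
a wrapped serial) cannot surface an fd the caller never asked about.

    #include <sys/select.h>

    static void
    select_filter_results(fd_set *result, fd_set *orig_interest, int nfds)
    {
        int fd;

        for (fd = 0; fd < nfds; fd++) {
            if (FD_ISSET(fd, result) && !FD_ISSET(fd, orig_interest))
                FD_CLR(fd, result);
        }
    }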
kernel - Remove mplock from KTRACE paths

* The mplock is no longer needed for KTRACE; ktrace writes are serialized
  by the vnode lock and everything else is MPSAFE. Note that this change
  means that even fast system calls may interleave in the ktrace output of
  a multi-threaded program.

* Fix a ktrace bug related to vkernels. The syscall2() code assumes that no
  tokens are held on entry (since we are coming from usermode), but a
  system call made from the vkernel may actually be nested inside another
  syscall2(). The mplock held for KTRACE caused this to assert in the
  nested syscall2(). Removing the mplock from the ktrace path also fixes
  this bug.

* Minor comment adjustment in vm_vmspace.c.

Reported-by: tuxillo
kernel - Implement ppoll system call with precise microseconds timeout.

* Implement a maximum timeout of 2000s, because systimer(9) only accepts an
  int timeout in microseconds (see the arithmetic check below).

* Add a kern.kv_sleep_threshold sysctl variable for tuning the threshold
  for the ppoll sleep duration (in nanoseconds), below which we busy-loop
  with DELAY instead of using tsleep for waiting.
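A quick check of why the cap is 2000s rather than something larger:

    #include <stdio.h>
    #include <limits.h>

    int
    main(void)
    {
        /* systimer(9) takes the period as an int count of microseconds */
        printf("INT_MAX microseconds = %.2f seconds\n", INT_MAX / 1e6);
        /* ~2147.48s; capping ppoll at 2000s leaves headroom below the
         * int overflow. */
        return 0;
    }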