gitweb.dragonflybsd.org Git - dragonfly.git/atom - sys/sys/proc.h history

kernel - Add per-process capability-based restrictions

2023-10-13T02:55:19Z

* This new system allows userland to set capability restrictions that
  turn off numerous kernel features and root accesses.  These restrictions
  are inherited by sub-processes recursively.  Once set, restrictions cannot
  be removed.

  Basic restrictions that mimic an unadorned jail can be enabled without
  creating a jail, but generally speaking real security also requires
  creating a chrooted filesystem topology, and a jail is still needed
  to really segregate processes from each other.  If you do so, however,
  you can (for example) disable mount/umount and most global root-only
  features.

* Add new system calls and a manual page for syscap_get(2) and syscap_set(2)

* Add sys/caps.h

* Add the "setcaps" userland utility and manual page.

* Remove priv.9 and the priv_check infrastructure, replacing it with
  a newly designed caps infrastructure.

* The intention is to add path restriction lists and similar features to
  improve jailless security in the near future, and to optimize the
  priv_check code.
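
The "inherited recursively, never removed" rule described above can be
modelled as a bitmask that is only ever OR'd into.  The sketch below is
an illustration of that rule only; the struct, the CAPR_*_SKETCH bits and
the function names are hypothetical and are not the sys/caps.h or
syscap_set(2) interface.

```c
#include <stdint.h>

/* Hypothetical restriction bits, not the real sys/caps.h values. */
#define CAPR_MOUNT_SKETCH	0x1u
#define CAPR_ROOT_SKETCH	0x2u

/* Stand-in for the per-process restriction state. */
struct proc_sketch {
	uint32_t restrictions;
};

/* Restrictions can only be added; there is no operation to clear one. */
static void
caps_restrict(struct proc_sketch *p, uint32_t bits)
{
	p->restrictions |= bits;
}

/* A child starts with a copy of everything its parent has restricted. */
static struct proc_sketch
fork_sketch(const struct proc_sketch *parent)
{
	return *parent;
}

/* A feature is allowed only while its restriction bit is clear. */
static int
caps_allowed(const struct proc_sketch *p, uint32_t cap)
{
	return (p->restrictions & cap) == 0;
}
```

Because the only mutator is an OR, a child can tighten its own set
further without affecting the parent, and nothing can widen it again.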

[D B] sys/sys/proc.h

kernel/libc: Remove the old vmm code.

2021-09-07T16:18:25Z

Removes the kernel code and two system calls.

Bump __DragonFly_version too.

Reviewed-by: aly, dillon

[D B] sys/sys/proc.h

kernel - Fix /dev/fd/N and clean up the old dup error-code-driven path

2021-03-20T02:27:11Z

* When opening /dev/fd/N, replicate the file pointer for descriptors
  that represent vnodes instead of dup()ing.  This ensures that the seek
  offset and other fp-related elements are not shared unexpectedly.

* Refactor the open() path to allow dev_dopen() to replace the
  struct file by passing a struct file ** instead of a struct file *.
  This removes old error-code-based hacks.

* This fixes the shared seek position that fexecve() was operating with
  due to its use of /dev/fd/N for scripts.

Reported-by: aly

[D B] sys/sys/proc.h

kernel - Add PROC_PDEATHSIG_CTL and PROC_PDEATHSIG_STATUS

2020-11-15T19:41:26Z

* Add PROC_PDEATHSIG_CTL and PROC_PDEATHSIG_STATUS to procctl(2).

  This follows the Linux and FreeBSD semantics.  Note, however, that
  since the child of a fork() clears the setting, these semantics have
  a fork/exit race between an exiting parent and a child which has not
  yet set up its death wish.

* Also fix a number of signal ranging checks.

Requested-by: zrj

[D B] sys/sys/proc.h

kernel - Add P_SWAPPEDOUT flag back in for deprecated compat

2020-07-25T19:11:13Z

* Add this flag back in for backwards compatibility with some
  ports.  The flag doesn't do anything any more.

[D B] sys/sys/proc.h

kernel - Remove P_SWAPPEDOUT flag and paging mode

2020-07-25T05:57:11Z

* This code basically no longer functions in any worthwhile or
  useful manner.  Remove it.

  The code harkens back to a time when machines had very little
  memory and had to time-share processes by actually descheduling
  them for long periods of time (like 20 seconds) and paging out
  the related memory.

  In modern times the chooser algorithm just doesn't work well
  because we can no longer assume that programs with large
  memory footprints can be demoted.

* In modern times machines have sufficient memory to rely almost
  entirely on the VM fault and pageout scan.  The latencies caused
  by fault-ins are usually sufficient to demote paging-intensive
  processes while allowing the machine to continue to function.

  If functionality needs to be added back in, it can be added back
  in on the fault path and not here.

[D B] sys/sys/proc.h

kernel - Refactor sysclock_t from 32 to 64 bits

2020-06-09T16:08:16Z

* Refactor the core cpu timer API, changing sysclock_t from 32
  to 64 bits.  Provide a full 64-bit count from all sources.

* Implement muldivu64() using gcc's 128-bit integer type.  This
  function takes three 64-bit values, performs (a * b) / d
  using a 128-bit intermediate calculation, and returns a 64-bit
  result.

  Change all timer scaling functions to use this function which
  effectively gives systimers the capability of handling any
  timeout that fits 64 bits for the timer's resolution.

* Remove TSC frequency scaling, it is no longer needed.  The
  TSC timer is now used at its full resolution.

* Use atomic_fcmpset_long() instead of a clock spinlock when
  updating the msb bits for hardware timer sources less than
  64 bits wide.

* Properly recalculate existing systimers when the clock source
  is changed.  Existing systimers were not being recalculated,
  leading to the system failing to boot when time sources had
  radically different clock frequencies.
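
The muldivu64() computation described above, (a * b) / d with a 128-bit
intermediate, can be sketched as follows.  This is a minimal illustration
and not the kernel's code; the names muldivu64_sketch() and scale_ticks()
are made up here, and the sketch assumes a compiler with the GCC/Clang
unsigned __int128 extension.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Compute (a * b) / d using a 128-bit intermediate so that the product
 * cannot overflow 64 bits.  The quotient is truncated to 64 bits, so
 * callers must make sure it fits.
 */
static inline uint64_t
muldivu64_sketch(uint64_t a, uint64_t b, uint64_t d)
{
	assert(d != 0);
	unsigned __int128 prod = (unsigned __int128)a * b;
	return (uint64_t)(prod / d);
}

/* Hypothetical use: rescale a tick count from one frequency to another. */
static inline uint64_t
scale_ticks(uint64_t ticks, uint64_t to_hz, uint64_t from_hz)
{
	return muldivu64_sketch(ticks, to_hz, from_hz);
}
```

A plain 64-bit multiply would wrap as soon as a * b exceeds 2^64, which
is why timer scaling benefits from the wider intermediate.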

[D B] sys/sys/proc.h

kernel: Clean up a few headers a bit.

2020-05-22T08:06:07Z

Remove 'extern' and parameter names from function prototypes.

[D B] sys/sys/proc.h

kernel - Allow 8254 timer to be forced, clean-up user/sys/intr/idle

2020-03-11T18:50:36Z

* Allows the 8254 timer to be forced on for machines which do not
  support the LAPIC timer during deep-sleep.  Fix an assertion that
  occurs in this situation.

  hw.i8254.intr_disable="0"

* Adjust the statclock to calculate user/sys/intr/idle time
  properly when the clock interrupt occurs from an interrupt
  thread instead of from a hard interrupt.

  Basically when the clock interrupt occurs from an interrupt thread,
  we have to look at curthread->td_preempted instead of curthread.

  In addition RQF_INTPEND will be set across the call due to the way
  processing works and we have to look at the bitmask of interrupt
  sources instead of this bit.

Reported-by: CuteLarva

[D B] sys/sys/proc.h

kernel - Simple code path optimizations

2020-02-26T06:05:43Z

* Add __read_mostly and __read_frequently to numerous variables as
  appropriate to reduce unnecessary cache line ping-ponging.

* Adjust conditionals in the syscall code with __predict_true/false
  to clean up the execution path.

[D B] sys/sys/proc.h

kernel - Fix rare wait*() deadlock

2020-02-15T05:37:32Z

* It is possible for the kernel to deadlock two processes or process
  threads attempting to wait*() on the same pid.

* Fix by adding a bit of magic to give ownership of the reaping
  operation to one of the waiters, and causing the other waiters
  to skip/reject that pid.

[D B] sys/sys/proc.h

: Fix legacy inclusion issues.

2019-11-12T12:43:30Z

 Sadly this header was not being included properly for a long time.
 Make it publicly accessible and put a big NOTE how to do it properly for
 future codes.  This makes the  the only other header that
 defines _KERNEL_STRUCTURES to solve long term inclusion order issues.
 Previous variant was hiding implicit dependencies, adjust netstat(1).

 Any change in this header breaks a lot of ports, try not to change any
 of the structs.  Also make sure KERN_SIGTRAMP has public visibility.

 While there remove two defines that were not used since introduced in
 5dfd06ac148512faf075c4e399e8485fd955578f

[D B] sys/sys/proc.h

Add .

2019-11-11T14:46:30Z

 Collect and gather all scattered cpumask bits into the correct headers.
 This cleans up the namespace and simplifies platform handling in asm
 macros.  The cpumask_t together with its macros is already a non-MI
 feature that is used in userland utilities, libraries, the kernel
 scheduler and syscalls.  It deserves a sys/ header.  Adjust
 syscalls.master and rerun sysent.

 While there, fix an issue in ports that set a POSIX env but have an
 implementation of setting thread names through pthread_set_name_np().

[D B] sys/sys/proc.h

kernel and libc - Reimplement lwp_setname*() using /dev/lpmap

2019-11-12T19:00:42Z

* Generally speaking we are implementing the features necessary
  to allow per-thread titling set via pthread_set_name_np() to
  show up in 'ps' output, and to use lpmap to make it fast.

* The lwp_setname() system call now stores the title in
  lpmap->thread_title[].

* Implement a libc fast-path for lwp_setname() using lpmap.
  If called more than 10 times, libc will use lpmap for any
  further calls, which omits the need to make any system calls.

* setproctitle() now stores the title in upmap->proc_title[]
  instead of replacing proc->p_args.  proc->p_args is now no
  longer modified from its original contents.

* The kernel now includes lpmap->thread_title[] in the following
  priority order when retrieving the process command line:

  lpmap->thread_title[]		User-supplied thread title, if not empty
  upmap->proc_title[]		User-supplied process title, if not empty
  proc->p_args			Original process arguments (no longer modified)

* Put the TID in /dev/lpmap for convenient access

* Enhance the KERN_PROC_ARGS sysctl to allow the TID to be specified.
  The sysctl now accepts { KERN_PROC, KERN_PROC_ARGS, pid, tid }
  in addition to the existing { KERN_PROC, KERN_PROC_ARGS, pid }
  mechanism.

  Enhance libkvm to use the new feature.  libkvm will fall-back to
  the old version if necessary.
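
The title priority order listed above can be sketched as a simple
fall-through lookup.  This is an illustration only: the struct layouts,
the 64-byte buffer sizes and the select_title() name are assumptions and
do not reflect the actual kernel definitions.

```c
#include <stddef.h>

/* Stand-ins for the shared pages; field sizes are assumptions. */
struct lpmap_sketch { char thread_title[64]; };
struct upmap_sketch { char proc_title[64]; };

/*
 * Return the string to report as the command line, in priority order:
 * thread title, then process title, then the original arguments.
 */
static const char *
select_title(const struct lpmap_sketch *lp, const struct upmap_sketch *up,
	     const char *p_args)
{
	if (lp != NULL && lp->thread_title[0] != '\0')
		return lp->thread_title;	/* user-supplied thread title */
	if (up != NULL && up->proc_title[0] != '\0')
		return up->proc_title;		/* user-supplied process title */
	return p_args;				/* unmodified original args */
}
```

Since p_args is never overwritten in this scheme, clearing a title
simply makes the lookup fall through to the next source.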

[D B] sys/sys/proc.h

kernel - sigblockall()/sigunblockall() support (per thread shared page)

2019-11-12T01:06:55Z

* Implement /dev/lpmap, a per-thread RW shared page between userland
  and the kernel.  Each thread in the process will receive a unique
  shared page for communication with the kernel when memory-mapping
  /dev/lpmap and can access various variables via this map.

* The current thread's TID is retained for both fork() and vfork().
  Previously it was only retained for vfork().  This avoids userland
  code confusion for any bits and pieces that are indexed based on the
  TID.

* Implement support for a per-thread block-all-signals feature that
  does not require any system calls (see next commit to libc).  The
  functions will be called sigblockall() and sigunblockall().

  The lpmap->blockallsigs variable prevents normal signals from being
  dispatched.  They will still be queued to the LWP as per normal.
  The behavior is not quite that of a signal mask when dealing with
  global signals.

  The low 31 bits represent a recursion counter, allowing recursive
  use of the functions.  The high bit (bit 31) is set by the kernel
  if a signal was prevented from being dispatched.  When userland decrements
  the counter to 0 (the low 31 bits), it can check and clear bit 31 and
  if found to be set userland can then make a dummy 'real' system call
  to cause pending signals to be delivered.

  Synchronous TRAPs (e.g. kernel-generated SIGFPE, SIGSEGV, etc) are not
  affected by this feature and will still be dispatched synchronously.

* PThreads is expected to unmap the mapped page upon thread exit.
  The kernel will force-unmap the page upon thread exit if pthreads
  does not.

  XXX needs work - currently if the page has not been faulted in
  the kernel has no visibility into the mapping and will not unmap it,
  but neither will it get confused if the address is accessed.  To
  be fixed soon.  Because if we don't, programs using LWP primitives
  instead of pthreads might not realize that libc has mapped the page.

* The TID is reset to 1 on a successful exec*()

* On [v]fork(), if lpmap exists for the current thread, the kernel will
  copy the lpmap->blockallsigs value to the lpmap for the new thread
  in the new process.  This way sigblock*() state is retained across
  the [v]fork().

  This feature not only reduces code confusion in userland, it also
  allows [v]fork() to be implemented by the userland program in a way
  that ensures no signal races in either the parent or the new child
  process until it is ready for them.

* The implementation leverages our vm_map_backing extents by having
  the per-thread memory mappings indexed within the lwp.  This allows
  the lwp to remove the mappings when it exits (since not doing so
  would result in a wild pmap entry and kernel memory disclosure).

* The implementation currently delays instantiation of the mapped
  page(s) and some side structures until the first fault.

  XXX this will have to be changed.
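
The blockallsigs word layout described above (low 31 bits as a recursion
counter, bit 31 set by the kernel when a signal was held back) can be
modelled in userland roughly as follows.  This is a sketch under stated
assumptions, not the libc implementation: the variable stands in for
lpmap->blockallsigs, the *_sketch names are made up, and
kernel_defers_signal() only simulates what the kernel side would do.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define SIGPENDING_BIT	0x80000000U	/* bit 31: a signal was deferred */
#define COUNT_MASK	0x7FFFFFFFU	/* low 31 bits: recursion depth */

/* Stand-in for the lpmap->blockallsigs word in the shared page. */
static _Atomic uint32_t blockallsigs;

/* Enter a blocked section; returns the new recursion depth. */
static uint32_t
sigblockall_sketch(void)
{
	return (atomic_fetch_add(&blockallsigs, 1) + 1) & COUNT_MASK;
}

/*
 * Leave a blocked section.  Returns true when the outermost section was
 * exited and a deferred signal was flagged, meaning the caller should
 * make a real system call so the pending signal gets delivered.
 */
static bool
sigunblockall_sketch(void)
{
	uint32_t v = atomic_fetch_sub(&blockallsigs, 1) - 1;

	if ((v & COUNT_MASK) != 0)
		return false;		/* still nested */
	if ((v & SIGPENDING_BIT) == 0)
		return false;		/* nothing was deferred */
	/* Clear bit 31 and report that a dummy syscall is needed. */
	atomic_fetch_and(&blockallsigs, ~SIGPENDING_BIT);
	return true;
}

/* Simulates the kernel holding back a signal while the count is nonzero. */
static void
kernel_defers_signal(void)
{
	if (atomic_load(&blockallsigs) & COUNT_MASK)
		atomic_fetch_or(&blockallsigs, SIGPENDING_BIT);
}
```

Neither entry nor exit needs a system call in the common case; only an
outermost exit with bit 31 set has to enter the kernel.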

[D B] sys/sys/proc.h

drm - Refactor task_struct and implement mm_struct

2019-08-07T05:34:17Z

* Change td->td_linux_task from an embedded structure to a pointer.

* Add p->p_linux_mm to support tracking mm_struct's.

* Change the 'current' macro to test td->td_linux_task and call
  a support function, linux_task_alloc(), if it is NULL.

* Implement callbacks from the main kernel for thread exit and
  process exit to support functions that drop the td_linux_task and
  p_linux_mm pointers.

  Initialize and clear these callbacks in the module load/unload
  in drm_drv.c

* Implement required support functions in linux_sched.c

[D B] sys/sys/proc.h

kernel - Refactor tty_token, fix SMP performance issues

2018-10-04T17:22:35Z

* Remove most uses of tty_token in favor of per-tty tp->t_token.
  This is particularly important for removing bottlenecks related to PTYs,
  which are used all over the place.  tty_token remains in a few places
  managing overall registration and global list manipulation.

* tty structures are now required to be persistent.  Implement a separate
  ttyinit() function.  Continue to allow ttyregister() and ttyunregister()
  calls, but these no longer presume destruction of the structure.

* Refactor ttymalloc() to take a **tty pointer and interlock allocations.
  Allocations are intended to be one-time.  ttymalloc() only requires the
  tty_token for initial allocations.

* Remove all critical section use that was combined with tty_token and
  tp->t_token.  Leave only the tokens.  The critical sections were
  hold-overs going all the way back to pre-SMP days.

* syscons now gets its own token, vga_token.  The ISA VGA code and
  the framebuffer code also now use this token instead of tty_token.

* The keyboard subsystem now uses kbd_token instead of tty_token.

* A few remaining serial-like devices (snp, nmdm) also get their own
  tokens, as well as use the now required tp->t_token.

* Remove use of tty_token in the session management code.  This fixes
  a niggling performance path since sessions almost universally go
  hand-in-hand with fork/exec/exit sequences.  Instead we use the
  already-existing per-hash session token.

[D B] sys/sys/proc.h

: Fix unused macro name (number == bit number) and comment.

2018-04-24T12:49:10Z

[D B] sys/sys/proc.h

kernel - Remove SMP bottlenecks on uidinfo, descriptors, and lockf

2018-04-22T00:30:42Z

* Use an eventcounter and the per-thread fd cache to fix
  bottlenecks in checkfdclosed().  This will work well for
  the vast majority of applications and test benches.

* Batch holdfp*() operations on kqueue collections when implementing
  poll() and select().  This significantly improves performance.
  Full scaling not yet achieved, however.

* Increase copyin item batching from 8 to 32 for select() and poll().

* Give the uidinfo structure a pcpu array to hold the posixlocks
  and openfiles count fields, with a rollup contained in the uidinfo
  structure itself.

  This removes numerous global bottlenecks related to open(),
  close(), dup*(), and lockf operations (posixlocks count).

  ui_openfiles will force a rollup on limit reached to be sure
  that the limit was actually reached.  ui_posixlocks stays fairly
  loose.  Each cpu rolls up generally only when the pcpu count exceeds
  +32 or goes below -32.

* Give the proc structure a pcpu array for the same counts, in order
  to properly support seteuid() and such.

* Replace P_ADVLOCK with a char field proc->p_advlock_flag, and
  remove token operations around the field.
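
The per-cpu count with a loose rollup described above can be sketched as
follows.  This is a single-threaded illustration, not the kernel code:
the struct, the cpu count of 4 and the function names are assumptions,
and a real implementation would use an atomic add for the rollup.

```c
#define NCPUS_SKETCH	4	/* hypothetical cpu count */
#define PCPU_SLOP	32	/* roll up beyond +/-32, per the text above */

struct uidinfo_sketch {
	long	ui_count;			/* rolled-up global count */
	int	pcpu_count[NCPUS_SKETCH];	/* per-cpu deltas */
};

/* Adjust the count from one cpu, touching the global only past the slop. */
static void
pcpu_count_adjust(struct uidinfo_sketch *ui, int cpu, int delta)
{
	int *pc = &ui->pcpu_count[cpu];

	*pc += delta;
	if (*pc > PCPU_SLOP || *pc < -PCPU_SLOP) {
		/* The kernel would use an atomic add here. */
		ui->ui_count += *pc;
		*pc = 0;
	}
}

/* Force a full rollup, e.g. to confirm that a limit was really reached. */
static long
pcpu_count_exact(struct uidinfo_sketch *ui)
{
	for (int i = 0; i < NCPUS_SKETCH; ++i) {
		ui->ui_count += ui->pcpu_count[i];
		ui->pcpu_count[i] = 0;
	}
	return ui->ui_count;
}
```

The global field is written only once per 33 or so local operations,
which is what removes the shared cache-line contention; the cost is that
the global value can be off by up to 32 per cpu until a rollup is forced.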

[D B] sys/sys/proc.h

kernel - Add p_ppid

2017-10-19T19:09:56Z

* We have proc->p_pptr, but still needed a shared p->p_token to access
  the ppid.  Buckle under and add proc->p_ppid as well so getppid() can
  run lockless.

* Adjust the vmtotal proc scan to use a shared proc->p_token instead
  of an exclusive one.

[D B] sys/sys/proc.h