jail: Simplify a bit by using the new BIT64 sysctl functions

- No functional changes.
- The per-jail settings have been renamed to match the new capability constants. The default settings will be renamed soon too.
- Fix a missing prison chflags check in ufs_setattr() and ext2fs_setattr().
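A minimal sketch of the idea behind folding individual boolean jail settings into capability bits of one 64-bit word, plus the shape of the chflags check a setattr path needs. All names here (`struct prison_sketch`, the `CAP_*` constants, `chflags_permitted()`) are hypothetical illustrations, not DragonFly's actual identifiers or its BIT64 sysctl API, and the check covers only the jail part of the permission decision.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capability bit numbers, one per former boolean setting. */
enum {
    CAP_SET_HOSTNAME     = 0,
    CAP_SYSVIPC          = 1,
    CAP_RAW_SOCKETS      = 2,
    CAP_CHFLAGS          = 3,
    CAP_UNIXIPROUTE_ONLY = 4,
};

struct prison_sketch {
    uint64_t caps;              /* one bit per capability */
};

static inline uint64_t cap_bit(unsigned cap)
{
    assert(cap < 64);           /* must fit in the 64-bit mask */
    return (uint64_t)1 << cap;
}

static bool prison_has_cap(const struct prison_sketch *pr, unsigned cap)
{
    return (pr->caps & cap_bit(cap)) != 0;
}

static void prison_set_cap(struct prison_sketch *pr, unsigned cap, bool on)
{
    if (on)
        pr->caps |= cap_bit(cap);
    else
        pr->caps &= ~cap_bit(cap);
}

/*
 * Jail part of a chflags permission check: an unjailed caller (pr == NULL)
 * passes this test, a jailed caller needs CAP_CHFLAGS. Returns 0 or EPERM.
 */
static int chflags_permitted(const struct prison_sketch *pr)
{
    if (pr == NULL)
        return 0;
    return prison_has_cap(pr, CAP_CHFLAGS) ? 0 : EPERM;
}
```

Keeping all settings in one word means a single sysctl handler can read or flip any bit, instead of one handler per boolean.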
jail - Rework sysctl configuration variables

- Jail sysctls are now jail-specific, so different jails can have different settings. Each jail has its own subtree, which can be operated on directly with sysctl(8). Naming convention: jail.<n>.<setting>
- All previous sysctls have been moved to 'jail.defaults'; they are used as a template for any newly created jail.

Example:

    # jls
       JID  Hostname   Path       IPs
         2  t02.local  /jails/02  10.0.0.3
         1  t01.local  /jails/01  10.0.0.2

    # sysctl jail
    jail.jailed: 0
    jail.list:   2 t02.local /jails/02 10.0.0.3
                 1 t01.local /jails/01 10.0.0.2
    jail.defaults.allow_raw_sockets: 0
    jail.defaults.chflags_allowed: 0
    jail.defaults.sysvipc_allowed: 0
    jail.defaults.socket_unixiproute_only: 1
    jail.defaults.set_hostname_allowed: 1
    jail.1.set_hostname_allowed: 1
    jail.1.socket_unixiproute_only: 1
    jail.1.sysvipc_allowed: 0
    jail.1.chflags_allowed: 0
    jail.1.allow_raw_sockets: 0
    jail.2.set_hostname_allowed: 1
    jail.2.socket_unixiproute_only: 1
    jail.2.sysvipc_allowed: 0
    jail.2.chflags_allowed: 0
    jail.2.allow_raw_sockets: 0

    # sysctl jail.2.allow_raw_sockets=1
    jail.2.allow_raw_sockets: 0 -> 1

    # jexec 2 ping -q -c 1 10.0.0.1
    PING 10.0.0.1 (10.0.0.1): 56 data bytes

    --- 10.0.0.1 ping statistics ---
    1 packets transmitted, 1 packets received, 0.0% packet loss
    round-trip min/avg/max/stddev = 0.766/0.766/0.766/0.000 ms

    # jexec 1 ping -q -c 1 10.0.0.1
    ping: socket: Operation not permitted

    # service jail stop
    Stopping jails: t01.local t02.local.

    # sysctl jail
    jail.jailed: 0
    jail.defaults.allow_raw_sockets: 0
    jail.defaults.chflags_allowed: 0
    jail.defaults.sysvipc_allowed: 0
    jail.defaults.socket_unixiproute_only: 1
    jail.defaults.set_hostname_allowed: 1
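The "defaults as a template" behaviour can be sketched as a copy taken at jail creation time: afterwards, changing jail.2.* touches only jail 2's copy, which is why jail 2 can ping in the transcript while jail 1 still gets "Operation not permitted". The struct and function names below are hypothetical; only the setting names and default values come from the example output.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-jail settings, mirroring the jail.defaults.* names. */
struct jail_settings {
    bool set_hostname_allowed;
    bool socket_unixiproute_only;
    bool sysvipc_allowed;
    bool chflags_allowed;
    bool allow_raw_sockets;
};

/* Template corresponding to jail.defaults.* (values from the example). */
static struct jail_settings jail_defaults = {
    .set_hostname_allowed    = true,
    .socket_unixiproute_only = true,
    .sysvipc_allowed         = false,
    .chflags_allowed         = false,
    .allow_raw_sockets       = false,
};

struct jail_sketch {
    int jid;
    struct jail_settings settings;   /* exposed as jail.<jid>.* */
};

/* Snapshot the template; later edits to either copy are independent. */
static void jail_create(struct jail_sketch *j, int jid)
{
    assert(j != NULL);
    j->jid = jid;
    j->settings = jail_defaults;     /* struct assignment copies all fields */
}
```

Because the copy is taken once, editing jail.defaults later would affect only jails created afterwards in this sketch.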
kernel/sysctl: Switch kern.osrevision to showing __DragonFly_version. It was tied to a historic define (BSD) that started as 199506 and was sporadically bumped in the past until 200708. Revert the define back to 199506, as it is not supposed to be bumped, and add a comment about this (taken from NetBSD). We cannot remove these defines completely because at least some are used by ports.
kernel - Make certain sysctls unlocked

* Automatically flag all SYSCTL_[U]INT, [U]LONG, and [U]QUAD definitions CTLFLAG_NOLOCK. These do not have to be locked. This will improve program startup performance a tad.

* Flag a ton of other sysctls used in program startup and by 'ps' as CTLFLAG_NOLOCK.

* For kern.hostname, interlock changes using XLOCK and allow the sysctl to run NOLOCK, avoiding unnecessary cache line bouncing.
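A userland sketch of the dispatch decision a NOLOCK flag enables: handlers for plain integer reads skip the global sysctl lock, everything else still serializes on it. `SK_FLAG_NOLOCK`, `struct sysctl_oid_sketch` and the pthread mutex are stand-ins assumed for illustration, not the kernel's CTLFLAG_NOLOCK machinery or its lock type. The kern.hostname case (writers taking their own exclusive lock so readers can stay unlocked) is not modeled here.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define SK_FLAG_NOLOCK 0x1u          /* hypothetical stand-in for CTLFLAG_NOLOCK */

struct sysctl_oid_sketch {
    unsigned flags;
    int (*handler)(void *arg, long *out);
    void *arg;
};

static pthread_mutex_t sysctl_global_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned long global_lock_acquisitions;   /* for illustration only */

/* Run a handler, taking the global lock only when the oid is not NOLOCK. */
static int sysctl_dispatch(const struct sysctl_oid_sketch *oid, long *out)
{
    int error;

    assert(oid != NULL && oid->handler != NULL);
    if (oid->flags & SK_FLAG_NOLOCK)
        return oid->handler(oid->arg, out);   /* no shared lock cache line touched */

    pthread_mutex_lock(&sysctl_global_lock);
    global_lock_acquisitions++;
    error = oid->handler(oid->arg, out);
    pthread_mutex_unlock(&sysctl_global_lock);
    return error;
}

/* Reading a single aligned int needs no lock, like SYSCTL_INT. */
static int read_int_handler(void *arg, long *out)
{
    *out = *(int *)arg;
    return 0;
}
```

The benefit comes from never touching the shared lock for the hot read-only oids that every program start and every 'ps' run queries.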
kernel ELF: Reimplement Elf Branding, .note.ABI-tag

Static executables built with the GNU gold linker were not recognized as valid ELF binaries, although the same binaries built by the ld linker did work. It was suspected that gold was keying off the .note.ABI-tag. Primitive support for this tag had been added years ago from NetBSD, and later Corecode disabled it except for ELF program headers.

I removed all the .note.ABI-tag support that had been added after DragonFly forked from FreeBSD and ported over FreeBSD's branding logic and .note.ABI-tag support. In particular, the branding logic is a lot cleaner now, and will easily support 32-bit binaries on x86_64 should DragonFly ever need that feature.

With these changes, gold can now build static executables that are recognized and execute. The Linuxolator had to be modified to work with the new branding scheme as well (i386 only).
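Branding decides which ABI an ELF image belongs to, and one input is the .note.ABI-tag note. The sketch below shows only the note walk: it scans the contents of a PT_NOTE segment for a note with a given name and type and returns its 4-byte version descriptor. The record layout (namesz, descsz, type, then name and descriptor each padded to 4 bytes) is the standard ELF note format; the vendor string "DragonFly" and type 1 used in the checks are assumptions for illustration, and FreeBSD-style branding code also weighs other inputs such as EI_OSABI and the interpreter path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NOTE_ALIGN4(x) (((x) + 3u) & ~(size_t)3u)

struct note_hdr {                 /* same layout as Elf32_Nhdr / Elf64_Nhdr */
    uint32_t n_namesz;            /* name length including the NUL */
    uint32_t n_descsz;
    uint32_t n_type;
};

/* Find a note named `vendor` with type `type` and a 4-byte descriptor. */
static bool find_abi_note(const unsigned char *buf, size_t len,
                          const char *vendor, uint32_t type,
                          uint32_t *version)
{
    size_t off = 0;
    size_t vlen = strlen(vendor) + 1;

    assert(buf != NULL || len == 0);
    while (off + sizeof(struct note_hdr) <= len) {
        struct note_hdr h;
        memcpy(&h, buf + off, sizeof(h));        /* avoid unaligned reads */
        off += sizeof(h);

        size_t name_sz = NOTE_ALIGN4((size_t)h.n_namesz);
        size_t desc_sz = NOTE_ALIGN4((size_t)h.n_descsz);
        if (name_sz > len - off || desc_sz > len - off - name_sz)
            return false;                        /* truncated note: reject */

        const unsigned char *name = buf + off;
        const unsigned char *desc = buf + off + name_sz;
        if (h.n_type == type && h.n_namesz == vlen &&
            memcmp(name, vendor, vlen) == 0 && h.n_descsz == 4) {
            memcpy(version, desc, 4);
            return true;
        }
        off += name_sz + desc_sz;                /* next note record */
    }
    return false;
}
```

Bounds are checked before every read, since note contents come from an untrusted executable.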
Move the following entries from kern to security

- kern.ps_showallprocs
- kern.ps_showallthreads
- kern.unprivileged_read_msgbuf
- kern.hardlink_check_uid
- kern.hardlink_check_gid

This is only a cosmetic change that helps users find the right sysctls more easily. It could also help if we want to add more security-related functionality (e.g. a MAC framework).

While here, add missing descriptions for three of them.
Rename /usr/src/sys/machine to /usr/src/sys/platform.

Give the platform name its own variable, MACHINE_PLATFORM, instead of trying to use MACHINE to name it. Adjust the build infrastructure to match.

Revert MACHINE back to its original definition and remove the uname shims. This removes confusion with third-party software.

This means a pc32 build has MACHINE=i386 and MACHINE_ARCH=i386, and a vkernel build also has MACHINE=i386 and MACHINE_ARCH=i386. The new MACHINE_PLATFORM is pc32 for a pc32 build and vkernel for a vkernel build.

Adjust all kernel configuration files to specify platform, machine, AND machine_arch.
We want the virtual kernel to be default-secure. Disable writes to kernel memory and disable module loading by default when running a virtual kernel. Run the virtual kernel with the -U option (for Unsecure) to run with these enabled. Reads are still allowed since the virtual kernel's memory does not contain any compromising data from the real kernel.
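The default-secure policy amounts to "restrictions on unless -U is given". A minimal sketch of that option handling, assuming hypothetical flag names in `struct vkernel_security`; a plain loop is used instead of getopt(3) to keep it short, and read access is not modeled because it stays allowed either way.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical switches for the two restrictions described above. */
struct vkernel_security {
    bool allow_kmem_write;       /* writes to kernel memory */
    bool allow_module_load;      /* kernel module loading */
};

static struct vkernel_security parse_vkernel_flags(int argc, char **argv)
{
    struct vkernel_security sec = { false, false };   /* secure by default */

    assert(argc >= 0);
    for (int i = 1; i < argc; i++) {
        if (strcmp(argv[i], "-U") == 0) {             /* -U: Unsecure */
            sec.allow_kmem_write = true;
            sec.allow_module_load = true;
        }
    }
    return sec;
}
```

Starting from a zero-initialized "deny" state means a forgotten or mistyped option can only leave the vkernel more restricted, never less.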
- Unhook usr.bin/uname from the bootstrap tools build, because it is not used as a bootstrap tool at all.
- Add hw.machine_uname, which is "i386" on pc32 (machine) / i386 (cpu). It is used by the uname(1) -m option and by uname(3), since most third-party applications understand "i386" much better than "pc32". In uname(3), fall back to hw.machine if hw.machine_uname does not exist, so we stay compatible with older kernels that do not have hw.machine_uname.

Implementation-suggestions-from: dillon@
Approved-by: dillon@
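The uname(3) fallback can be sketched as "try the new name, and on ENOENT try the old one". To keep the example runnable anywhere, `lookup_string()` is a hypothetical stand-in backed by a fixed table rather than the real sysctl(3)/sysctlbyname(3) interface; only the two sysctl names and the pc32/i386 values come from the text.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a kernel's string sysctls. */
struct fake_sysctl {
    const char *name;
    const char *value;
};

static int lookup_string(const struct fake_sysctl *tab, size_t n,
                         const char *name, char *buf, size_t buflen)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(tab[i].name, name) == 0) {
            size_t vlen = strlen(tab[i].value) + 1;
            if (vlen > buflen)
                return ENOMEM;
            memcpy(buf, tab[i].value, vlen);
            return 0;
        }
    }
    return ENOENT;               /* older kernel: name does not exist */
}

/* Prefer hw.machine_uname; fall back to hw.machine on older kernels. */
static int machine_for_uname(const struct fake_sysctl *tab, size_t n,
                             char *buf, size_t buflen)
{
    int error;

    assert(buf != NULL && buflen > 0);
    error = lookup_string(tab, n, "hw.machine_uname", buf, buflen);
    if (error == ENOENT)
        error = lookup_string(tab, n, "hw.machine", buf, buflen);
    return error;
}
```

Only ENOENT triggers the fallback; other errors (such as a too-small buffer) are reported as-is rather than masked by the older name.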
Major kernel build infrastructure changes, part 1/2 (sys).

These changes are primarily designed to create a 2-layer machine and cpu build hierarchy in order to support virtual kernel builds in the near term and future porting efforts in the long term.

* Split arch/ into a set of platform architectures under machine/ and a set of cpu architectures under cpu/. All platform and cpu header files will be accessible via <machine/*.h>. Platform header files may override cpu header files (the platform header file then typically #includes the cpu header file).

* Any cpu header files that are not overridden will be copied directly into /usr/include/machine/, allowing the platform to omit those header files (it does not have to create degenerate forwarding header files).

* All source files access platform and cpu architecture files via the <machine/*.h> path. The <cpu/*.h> path should only be used by platform header files when including the lower-level cpu header files.

* Require both the 'machine' and the 'machine_arch' directives in the kernel config file.

* When building modules in the presence of a kernel config, use the IF files, use_*.h files, and opt_*.h files provided by the kernel config instead of generating them in each module's object directory. This streamlines the module build considerably.
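The override rule for <machine/*.h> can be sketched as a two-level lookup: the platform directory wins if it provides the header, otherwise the cpu directory's copy is used. `struct hdr_dir`, the directory paths and the header names in the checks are hypothetical illustrations, not the actual build scripts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical description of one header directory. */
struct hdr_dir {
    const char *path;              /* e.g. "machine/pc32/include" */
    const char *const *headers;
    size_t nheaders;
};

static bool dir_has(const struct hdr_dir *d, const char *hdr)
{
    for (size_t i = 0; i < d->nheaders; i++)
        if (strcmp(d->headers[i], hdr) == 0)
            return true;
    return false;
}

/*
 * Directory that supplies <machine/hdr>: the platform layer overrides the
 * cpu layer; a cpu header the platform omits is used directly, so no
 * forwarding header is needed. Returns NULL if neither layer has it.
 */
static const char *resolve_machine_header(const struct hdr_dir *platform,
                                          const struct hdr_dir *cpu,
                                          const char *hdr)
{
    assert(hdr != NULL);
    if (dir_has(platform, hdr))
        return platform->path;
    if (dir_has(cpu, hdr))
        return cpu->path;
    return NULL;
}
```

An overriding platform header can still pull in the cpu version through the <cpu/*.h> path, which is the only intended use of that path.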
POSIX lock resource limit part 3/4

This splits "struct lockf" into the general book-keeping of ranges and blocked requests, and "struct lockf_range", which consists of the data for a specific range.

Adjust the interface of lf_advlock to remove one level of pointer indirection and embed "struct lockf" directly in the inodes. Don't mess with wait channels any more.

Change the algorithm for determining locks to a more direct approach, which simplifies both the lock acquisition and the proper book-keeping of the number of ranges currently in use. The latter is necessary to prevent local resource exhaustion. The code is not fully malloc block-safe, but it is as good (or as bad) as the old code.

Add the kernel part of the posixlocks rlimit. This is the maximum number of POSIX lock ranges any user can acquire. These numbers are tracked for each user and process and checked at lock/unlock time. If a process changes uid, its locks are transferred to the new uid, which can effectively boost that number above the limit.

This is based on the patch set from Devon H. O'Dell <dodell@sitetronics.com> for the general infrastructure, with some adjustments to better integrate with the new lockf code.
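A sketch of the per-user range accounting behind the posixlocks rlimit. The check has to run at unlock time too, because unlocking the middle of a held range splits it into two ranges and so increases the count. `struct lock_owner_acct`, both functions, and the choice of ENOLCK as the error are assumptions for illustration, not the actual lockf code.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical per-uid counter of POSIX lock ranges currently held. */
struct lock_owner_acct {
    int ranges;
};

/*
 * Apply a change of `delta` ranges (negative when ranges are freed or
 * merged, positive for new locks or a split caused by a partial unlock).
 * Growth beyond `limit` is refused and leaves the count unchanged.
 */
static int lockf_count_adjust(struct lock_owner_acct *acct, int delta, int limit)
{
    if (delta > 0 && acct->ranges + delta > limit)
        return ENOLCK;
    acct->ranges += delta;
    assert(acct->ranges >= 0);
    return 0;
}

/* On a uid change, move the process's ranges to the new owner unchecked,
 * which is how the new uid can end up above its limit. */
static void lockf_transfer(struct lock_owner_acct *from,
                           struct lock_owner_acct *to, int n)
{
    assert(from->ranges >= n);
    from->ranges -= n;
    to->ranges += n;
}
```

Checking before committing the change keeps the counter exact: a refused request never leaves a partially updated count behind.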