Reorganize the way machine architectures are handled. Consolidate the kernel configurations into a single generic directory. Move machine-specific Makefiles and loader scripts into the appropriate architecture directory. Kernel and module builds also generally add sys/arch to the include path so source files that include architecture-specific headers do not have to be adjusted.

    sys/<ARCH>               -> sys/arch/<ARCH>
    sys/conf/*.<ARCH>        -> sys/arch/<ARCH>/conf/*.<ARCH>
    sys/<ARCH>/conf/<KERNEL> -> sys/config/<KERNEL>
The thread/proc pointer argument in the VFS subsystem originally existed for... well, I'm not sure *WHY* it originally existed, since most of the time the pointer couldn't be anything other than curthread or curproc or the code wouldn't work. This is particularly true of lockmgr locks. Remove the pointer argument from all VOP_*() functions, all fileops functions, and most ioctl functions.
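A minimal sketch of what the change looks like at a single entry point, using a hypothetical vop_example_open() rather than the literal VOP prototypes:

    /*
     * Before: the entry point carried an explicit thread pointer that in
     * practice always had to be curthread:
     *
     *    int vop_example_open(struct vnode *vp, int fmode,
     *                         struct ucred *cred, struct thread *td);
     *
     * After: the argument is gone and the implementation simply uses the
     * current thread where one is actually required.
     */
    int
    vop_example_open(struct vnode *vp, int fmode, struct ucred *cred)
    {
        struct thread *td = curthread;  /* the only value that ever worked */

        /* ... lockmgr and open processing using td ... */
        return (0);
    }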
Remove buf->b_saveaddr, assert that vmapbuf() is only called on pbufs. Pass the user pointer and length to vmapbuf() rather than having it try to pull the information out of the buffer. vmapbuf() is now responsible for setting b_data, b_bufsize, and b_bcount. Also fix a bug in cam_periph_mapmem(). The procedure was failing to unmap earlier vmapped bufs if later vmapbuf() calls in the loop failed.
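The cam_periph_mapmem() fix boils down to unwinding on failure. A sketch of the idea with hypothetical buffer-array plumbing and illustrative argument types; only vmapbuf()/vunmapbuf() and the new user-pointer/length arguments come from the change itself:

    static int
    map_user_bufs(struct buf **bufs, void **uaddrs, size_t *lens, int count)
    {
        int i, error = 0;

        for (i = 0; i < count; ++i) {
            error = vmapbuf(bufs[i], uaddrs[i], lens[i]);
            if (error) {
                /* Unwind: unmap every buffer mapped so far before failing. */
                while (--i >= 0)
                    vunmapbuf(bufs[i]);
                break;
            }
        }
        return (error);
    }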
Do not set the pcb_ext field until the private TSS has been completely initialized, otherwise an interrupt can come along and preempt us, then attempt to restore state using the incompletely initialized TSS.

Do not free the pcb_ext data until after we have switched back to the common TSS, otherwise a blockage in kmem_free() may cause a premature thread switch with the now-invalid private TSS.

Do not depend on need_user_resched() to set a private TSS prior to returning from a system call; it may optimize itself into a NOP and not actually set the private TSS prior to our return to userland. Instead, activate the new private TSS manually by doing a forced thread switch to ourselves.

Reported-by: Sascha Wildner <saw@online.de>
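A sketch of the required ordering, with hypothetical helper names; only pcb_ext and the constraints described above come from the change:

    static void
    install_private_tss(struct pcb *pcb)
    {
        struct pcb_ext *ext;

        ext = allocate_pcb_ext();   /* hypothetical allocator */
        init_private_tss(ext);      /* fully initialize the TSS first */
        pcb->pcb_ext = ext;         /* publish only when complete; an
                                     * interrupt arriving any earlier would
                                     * otherwise restore a half-built TSS */
    }

    static void
    remove_private_tss(struct pcb *pcb)
    {
        struct pcb_ext *ext = pcb->pcb_ext;

        pcb->pcb_ext = NULL;
        switch_to_common_tss();     /* stop referencing the private TSS */
        free_pcb_ext(ext);          /* may block and thread-switch; safe
                                     * only after the common TSS is live */
    }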
Fix a process exit/wait race. The wait*() code was making a faulty test to determine that the exiting process had completely exited and was no longer running. Testing the TDF_RUNNING flag is insufficient because an exiting process may block at various points after becoming a Zombie, but before it deschedules itself for the last time. Add a new flag, TDF_EXITING, which is set just prior to a thread descheduling itself for the last time. The reaper then checks that TDF_EXITING is set and TDF_RUNNING is clear.

Fix a second faulty test in both the exit and the thread cpu migration code. If a thread gets preempted, TDF_RUNNING will be temporarily cleared, so testing TDF_RUNNING is not sufficient by itself. We must also test the TDF_PREEMPT_LOCK flag to be sure that it is also clear.

So the grand result is that to really be sure the zombie process has been completely descheduled and will never run again, the TDF_EXITING, TDF_RUNNING, *and* TDF_PREEMPT_LOCK flags must be tested and all must be clear except for TDF_EXITING. It should be noted that TDF_RUNNING on the previously scheduled process is always cleared AFTER we have context-switched into the next scheduled thread or the idle thread, so seeing a cleared TDF_RUNNING along with the appropriate state for the other flags does in fact guarantee that the thread in question is no longer using its stack in any way.

Reported-by: Stefan Krueger <skrueger@meinberlikomm.de>
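A minimal sketch of the resulting reaper-side test, assuming only the flag names described above:

    /*
     * The zombie is reapable only when TDF_EXITING is set and both
     * TDF_RUNNING and TDF_PREEMPT_LOCK are clear.
     */
    #define ZOMBIE_DONE_MASK    (TDF_EXITING | TDF_RUNNING | TDF_PREEMPT_LOCK)

    static int
    thread_fully_exited(const struct thread *td)
    {
        return ((td->td_flags & ZOMBIE_DONE_MASK) == TDF_EXITING);
    }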
Allow 'options SMP' *WITHOUT* 'options APIC_IO'. That is, an ability to produce an SMP-capable kernel that uses the PIC/ICU instead of the IO APICs for interrupt routing. SMP boxes with broken BIOSes (namely my Shuttle XPC SN95G5) could very well have serious interrupt routing problems when operating in IO APIC mode. One solution is to not use the IO APICs. That is, to run only the Local APICs for the SMP management.

* Don't conditionalize NIDT. Just set it to 256.
* Make the ICU interrupt code MP SAFE. This primarily means using the imen_spinlock to protect accesses to icu_imen (see the sketch after this list).
* When running SMP without APIC_IO, set the LAPIC TPR to prevent unintentional interrupts. Leave LINT0 enabled (normally with APIC_IO, LINT0 is disabled when the IO APICs are activated). LINT0 is the virtual wire between the 8259 and LAPIC 0.
* Get rid of NRSVIDT. Just use IDT_OFFSET instead.
* Clean up all the APIC_IO tests which should have been SMP tests, and all the SMP tests which should have been APIC_IO tests. Explicitly #ifdef out all code related to the IO APICs when APIC_IO is not set.
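A minimal sketch of the MP-safe ICU mask update; the lock/unlock wrappers and the 8259 write helper are hypothetical names, only imen_spinlock and icu_imen come from the change:

    static void
    icu_mask_irq(int irq)
    {
        imen_lock();                /* acquire imen_spinlock (hypothetical
                                     * wrapper name) */
        icu_imen |= (1 << irq);     /* read-modify-write of the shared mask
                                     * is now atomic across cpus */
        write_8259_masks(icu_imen); /* hypothetical: push the mask to the PICs */
        imen_unlock();
    }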
Major cleanup of the interrupt registration subsystem.

* Collapse the separate registrations in the kernel interrupt thread and i386 layers into a single machine-independent kernel interrupt thread layer in kern/kern_intr.c. Get rid of the i386 layer's 'MUX' code entirely.
* Have the interrupt vector assembly code (icu_vector.s and apic_vector.s) call a machine-independent function in the kernel interrupt thread layer to figure out how to process an interrupt.
* Move a lot of assembly into the new C interrupt processing function.
* Add support for INTR_MPSAFE. If a device driver registers an interrupt as being MPSAFE, the Big Giant Lock will not be obtained or required (see the sketch after this list).
* Temporarily just schedule the ithread if a FAST interrupt cannot be executed due to its serializer being locked.
* Add LWKT serialization support for a non-blocking 'try' function.
* Get rid of ointhand2_t and adjust all old ISA code to use inthand2_t.
* Supply the frame as a pointer rather than embedding it on the stack.
* Allow FAST and SLOW interrupts to be mixed on the same IRQ, though this will not necessarily result in optimal operation.
* Remove direct APIC/ICU vector calls from the apic/icu vector assembly code. Everything goes through the new routine in kern/kern_intr.c now.
* Add a new flag, INTR_NOPOLL. Interrupts registered with this flag will not be polled by the upcoming emergency general interrupt polling sysctl (e.g. ATA cannot be safely polled due to the way ATA register access interferes with ATA DMA).
* Remove most of the distinction in the i386 assembly layers between FAST and SLOW interrupts (part 1/2).
* Revamp the interrupt name array returned to userland to list multiple drivers associated with the same IRQ.
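A sketch of the INTR_MPSAFE decision in the new machine-independent dispatch path; the record structure and field names are hypothetical, only the flag semantics come from the change:

    static void
    run_intr_handler(struct intrec *rec, void *frame)
    {
        if (rec->flags & INTR_MPSAFE) {
            rec->handler(rec->arg, frame);  /* driver says it needs no BGL */
        } else {
            get_mplock();                   /* classic drivers still get the
                                             * Big Giant Lock around the call */
            rec->handler(rec->arg, frame);
            rel_mplock();
        }
    }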
Remove all remaining SPL code. Replace the mtd_cpl field in the machine-dependent thread structure and the CPL field in the interrupt stack frame with dummies (so structural sizes do not change, yet). Remove all interrupt handler SPL mask and mask pointer code. Remove all spl*() functions except for splz(). Note that doreti uses a temporary CPL mask internally to accumulate a bitmap of FAST interrupts which could not be executed due to not being able to get the BGL. This mask has no outside visibility. Note that gd_fpending and gd_ipending still exist to support critical section interrupt deferment.
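A sketch of the deferment that gd_fpending supports; the function is hypothetical, only the field name and the idea of recording a blocked FAST interrupt for later replay come from the message:

    static void
    defer_fast_interrupt(struct globaldata *gd, int irq)
    {
        /*
         * Could not run the FAST handler right now (critical section, or
         * the BGL could not be obtained).  Remember it; splz()/doreti
         * replay the bit once the blockage clears.
         */
        gd->gd_fpending |= (1 << irq);
    }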
Implement TLS support, tls manual pages, and link the umtx and tls manual pages together. TLS stands for 'thread local storage' and is used to support efficient userland threading and threaded data access models. Three TLS segments are supported in order to (eventually) support GCC3's __thread qualifier. David Xu's thread library only uses one descriptor for now.

The system calls implement a mostly machine-independent API which returns architecture-specific results. Rather than passing the actual descriptor structure, which would unnecessarily pollute the userland implementation, we pass a more generic (base,size) pair, and the system call returns the %gs load value for IA32. On AMD64 and other architectures the returned value will be whatever is appropriate for that architecture.

The current low level assembly support is not as efficient as it could be, but it is good enough for now. The heavyweight switch code for processes does the work. The lightweight switch code for pure kernel threads has not been changed (since the kernel doesn't use TLS descriptors we can just ignore them).

Based on work by David Xu <davidxu@freebsd.org> and Matthew Dillon <dillon@backplane.com>
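A userland usage sketch. The syscall name, prototype, and struct layout are assumptions here (check the tls manual pages mentioned above for the real ones); only the (base,size) API shape and the IA32 %gs return value come from the message:

    #include <stddef.h>

    struct tls_info {
        void *base;     /* base address of the thread's TLS block */
        long  size;     /* size of the block */
    };

    int set_tls_area(int which, struct tls_info *info, size_t infosize);

    static int
    install_tls(void *tls_block, long tls_size)
    {
        struct tls_info info;

        info.base = tls_block;
        info.size = tls_size;
        /*
         * Descriptor 0 -- the only one David Xu's thread library uses so
         * far.  On IA32 the return value is the %gs load value.
         */
        return (set_tls_area(0, &info, sizeof(info)));
    }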
Add an intrmask_t pointer to register_int() and register_swi(), and make the interrupt thread loop set the spl for the duration of the call to the handler. Allow interrupt threads to be run with or without a critical section based on the kern.int_use_crit_section sysctl. The default is to conservatively run with a critical section for now. Turning this off will cause cpu usage in interrupts to be properly accounted for by top, systat, and ps.
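A small userland sketch of flipping the sysctl ('sysctl kern.int_use_crit_section=0' does the same thing from the command line); the wrapper function is hypothetical:

    #include <sys/types.h>
    #include <sys/sysctl.h>

    /*
     * Turn off the conservative per-handler critical section so interrupt
     * cpu time is charged to the interrupt threads and shows up properly
     * in top, systat, and ps.
     */
    int
    disable_int_crit_section(void)
    {
        int off = 0;

        return (sysctlbyname("kern.int_use_crit_section",
                             NULL, NULL, &off, sizeof(off)));
    }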
Try to close an occasional VM page related panic that is believed to occur due to the VM page queues or free lists being indirectly manipulated by interrupts that are not protected by splvm(). Do this by replacing splvm()'s with critical sections in a number of places. Note: some of this work bled over into the "VFS messaging/interfacing work stage 8/99" commit.
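The replacement pattern, sketched inside a hypothetical function (crit_enter()/crit_exit() are the real primitives):

    static void
    vm_page_queue_op(void)
    {
        /*
         * Previously: s = splvm(); ... splx(s);
         * Now a critical section keeps local interrupts from running and
         * touching the page queues / free lists underneath us.
         */
        crit_enter();
        /* ... manipulate the vm page queues or free lists ... */
        crit_exit();
    }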
Correct a bug in the last FPU optimized bcopy commit. The user FPU state was being corrupted by interrupts. Fix the bug by implementing a feature described as missing in the original FreeBSD comments... add a pointer to the FP saved state in the thread structure so routines which 'borrow' the FP unit can simply revector the pointer temporarily to avoid corruption of the original user FP state. The MMX_*_BLOCK macros in bcopy.s have also been simplified somewhat. We can simplify them even more (in the future) by reserving FPU save space in the per-cpu structure instead of on the stack.
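A sketch of the 'borrow the FPU' pattern this enables; the field and function names are hypothetical, only the revectoring idea comes from the change:

    static union savefpu *
    fpu_borrow_begin(struct thread *td, union savefpu *scratch)
    {
        union savefpu *orig = td->td_savefpu;   /* hypothetical field name */

        td->td_savefpu = scratch;   /* interrupts that save FP state now
                                     * write to the scratch area, not the
                                     * user's save area */
        return (orig);
    }

    static void
    fpu_borrow_end(struct thread *td, union savefpu *orig)
    {
        td->td_savefpu = orig;      /* point back at the user FP save area */
    }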
Do some minor critical path performance improvements in the scheduler and at the user/system boundary. Avoid some unnecessary segment prefix ops, remove some unnecessary memory ops by using more optimal critical section inlines, and use 32 bit arithmetic instead of 64 bit arithmetic when calculating system tick overheads in userret(). This saves a whopping 5ns worth of syscall overhead, which just proves how silly I am sometimes.