VFS messaging/interfacing work stage 9/99: VFS 'NEW' API WORK.
NOTE: unionfs and nullfs are temporarily broken by this commit.
* Remove the old namecache API. vfs_cache_lookup(), cache_lookup(),
cache_enter(), namei() and lookup() are all gone. VOP_LOOKUP() and
VOP_CACHEDLOOKUP() have been collapsed into a single non-caching
VOP_LOOKUP().
* Complete the new VFS CACHE (namecache) API. The new API is able to
supply topological guarantees and is able to reserve namespaces,
including negative cache spaces (whether the target name exists or not),
which the new API uses to reserve namespace for things like NRENAME
and NCREATE (and others).
* Complete the new namecache API. VOP_NRESOLVE, NLOOKUPDOTDOT, NCREATE,
NMKDIR, NMKNOD, NLINK, NSYMLINK, NWHITEOUT, NRENAME, NRMDIR, NREMOVE.
These new calls take (typically locked) namecache pointers rather than
combinations of directory vnodes, file vnodes, and name components. The
new calls are *MUCH* simpler in concept and implementation. For example,
VOP_RENAME() has 8 arguments while VOP_NRENAME() has only 3 arguments.
The new namecache API uses the namecache to lock namespaces without having
to lock the underlying vnodes. For example, this allows the kernel
to reserve the target name of a create function trivially. Namecache
records are maintained BY THE KERNEL for both positive and negative hits.
Generally speaking, the kernel layer is now responsible for resolving
path elements. NRESOLVE is called when an unresolved namecache record
needs to be resolved. Unlike the old VOP_LOOKUP, NRESOLVE is simply
responsible for associating a vnode with a namecache record (positive hit)
or telling the system that it's a negative hit. It is not responsible for
handling symlinks or other special cases, or for doing any of the other
path lookup work.
It should be particularly noted that the new namecache topology does not
allow disconnected namecache records. In rare cases where a vnode must
be converted to a namecache pointer for new API operation via a file handle
(i.e. NFS), the cache_fromdvp() function is provided and a new API VOP,
VOP_NLOOKUPDOTDOT() is provided to allow the namecache to resolve the
topology leading up to the requested vnode. These and other topological
guarantees greatly reduce the complexity of the new namecache API.
The new namei() is called nlookup(). This function uses a combination
of cache_n*() calls, VOP_NRESOLVE(), and standard VOP calls to resolve the
supplied path, deal with symlinks, and so forth, in a nice small compact
compartmentalized procedure.
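
To illustrate the narrow NRESOLVE contract described above, here is a
minimal user-space sketch. It is a toy model, not kernel code: struct
toy_ncp, toy_setvp(), toy_nresolve() and the in-memory toy_dir[] table are
all hypothetical names invented for this example, and the convention that a
NULL vnode marks a negative hit is an assumption of the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-ins (hypothetical); NOT the kernel's vnode/namecache structs. */
struct toy_vnode { int ino; };

enum toy_nc_state { TOY_UNRESOLVED, TOY_POSITIVE, TOY_NEGATIVE };

struct toy_ncp {
    const char        *name;
    enum toy_nc_state  state;
    struct toy_vnode  *vp;      /* non-NULL only for a positive hit */
};

/* Assumed convention: a vnode means positive hit, NULL means negative hit. */
static void toy_setvp(struct toy_ncp *ncp, struct toy_vnode *vp)
{
    ncp->vp = vp;
    ncp->state = (vp != NULL) ? TOY_POSITIVE : TOY_NEGATIVE;
}

/* A tiny fake directory that the toy filesystem searches. */
struct toy_dirent { const char *name; struct toy_vnode vn; };
static struct toy_dirent toy_dir[] = {
    { "passwd", { 11 } },
    { "hosts",  { 12 } },
};

/*
 * Toy NRESOLVE: its only job is to bind a vnode to the record or mark the
 * record negative. No symlink handling, no path walking.
 */
static int toy_nresolve(struct toy_ncp *ncp)
{
    size_t i;

    for (i = 0; i < sizeof(toy_dir) / sizeof(toy_dir[0]); i++) {
        if (strcmp(toy_dir[i].name, ncp->name) == 0) {
            toy_setvp(ncp, &toy_dir[i].vn);
            return 0;
        }
    }
    toy_setvp(ncp, NULL);       /* remember that the name does not exist */
    return ENOENT;
}
```

In this sketch a failed resolve still leaves useful state behind: the
record is marked negative, so a later lookup of the same name need not
consult the filesystem again.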
* The old VFS code is no longer responsible for maintaining namecache records,
a function which was mostly ad hoc cache_purge()s occurring before the VFS
actually knew whether an operation would succeed or not.
The new VFS code is typically responsible for adjusting the state of
locked namecache records passed into it. For example, if NCREATE succeeds
it must call cache_setvp() to associate the passed namecache record with
the vnode representing the successfully created file. The new requirements
are much less complex than the old requirements.
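
The NCREATE rule above (adjust the passed record only once the operation
has succeeded) can be sketched in user space as follows. This is a toy
under stated assumptions: toy_ncreate(), toy_cache_setvp(), the two-entry
vnode pool and the error codes chosen are all hypothetical and do not
reproduce the real VOP_NCREATE() or cache_setvp() signatures.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy stand-ins (hypothetical); NOT the kernel structures. */
struct toy_vnode { int ino; };

struct toy_ncp {
    const char       *name;
    int               locked;   /* caller holds the namespace lock */
    struct toy_vnode *vp;       /* NULL: no vnode associated yet */
};

/* Hypothetical analogue of cache_setvp(): bind a vnode to the record. */
static void toy_cache_setvp(struct toy_ncp *ncp, struct toy_vnode *vp)
{
    ncp->vp = vp;
}

/* A fixed pool standing in for vnode allocation, so failure can be shown. */
static struct toy_vnode toy_pool[2];
static int toy_pool_used;

/*
 * Toy NCREATE: takes a locked record. The record is touched only after
 * the create has succeeded; every failure path leaves it unchanged.
 */
static int toy_ncreate(struct toy_ncp *ncp, struct toy_vnode **vpp)
{
    struct toy_vnode *vp;

    if (!ncp->locked)
        return EINVAL;          /* toy check: caller must lock first */
    if (ncp->vp != NULL)
        return EEXIST;          /* name already resolves to a file */
    if (toy_pool_used == 2)
        return ENOSPC;          /* simulated out-of-space failure */

    vp = &toy_pool[toy_pool_used];
    vp->ino = 100 + toy_pool_used;
    toy_pool_used++;

    toy_cache_setvp(ncp, vp);   /* success: associate record and vnode */
    *vpp = vp;
    return 0;
}
```

The point of the sketch is the ordering: there is no speculative purge
before the outcome is known, only a single update on the success path.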
* Most VFSs still implement the old API calls, albeit somewhat modified,
and in particular the VOP_LOOKUP function is now *MUCH* simpler. However,
the kernel now uses the new API calls almost exclusively and relies on
compatibility code installed in the default ops (vop_compat_*()) to
convert the new calls to the old calls.
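
The compatibility idea above (new-style call in, old-style call out) can be
sketched as a default entry in an ops table. This is a toy user-space
model: struct toy_ops, toy_vop_nremove(), toy_compat_nremove() and
legacy_remove() are hypothetical names, and the real vop_compat_*()
routines are not reproduced here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-ins (hypothetical); NOT the kernel structures. */
struct toy_vnode { int ino; };

struct toy_ncp {
    struct toy_ncp   *parent;   /* records are always connected */
    const char       *name;
    struct toy_vnode *vp;
};

struct toy_ops {
    /* New-style entry point; NULL if the filesystem has not converted. */
    int (*nremove)(struct toy_ncp *ncp);
    /* Old-style entry point: directory vnode, file vnode, name. */
    int (*old_remove)(struct toy_vnode *dvp, struct toy_vnode *vp,
                      const char *name);
};

/* Default shim: unpack the record into the arguments the old call wants. */
static int toy_compat_nremove(const struct toy_ops *ops, struct toy_ncp *ncp)
{
    int error;

    if (ncp->parent == NULL || ncp->parent->vp == NULL || ncp->vp == NULL)
        return ENOENT;
    error = ops->old_remove(ncp->parent->vp, ncp->vp, ncp->name);
    if (error == 0)
        ncp->vp = NULL;         /* toy: the name no longer has a vnode */
    return error;
}

/* Dispatcher: prefer the native new-style op, else fall back to the shim. */
static int toy_vop_nremove(const struct toy_ops *ops, struct toy_ncp *ncp)
{
    if (ops->nremove != NULL)
        return ops->nremove(ncp);
    return toy_compat_nremove(ops, ncp);
}

/* An unconverted filesystem that only records what it was asked to do. */
static int legacy_last_dir_ino, legacy_last_ino;
static const char *legacy_last_name;

static int legacy_remove(struct toy_vnode *dvp, struct toy_vnode *vp,
                         const char *name)
{
    legacy_last_dir_ino = dvp->ino;
    legacy_last_ino = vp->ino;
    legacy_last_name = name;
    return 0;
}

static const struct toy_ops legacy_ops = { NULL, legacy_remove };
```

A caller in this sketch only ever issues toy_vop_nremove(); whether the
filesystem is converted or not is hidden behind the ops table.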
* All kernel system calls and related support functions which used to do
complex and confusing namei() operations now do far less complex and
far less confusing nlookup() operations.
* SPECOPS shortcutting has been implemented. User reads and writes now go
directly to supporting functions which talk to the device via fileops
rather than having to be routed through VOP_READ or VOP_WRITE, saving
significant overhead. Note, however, that these only really affect
/dev/null and /dev/zero.
Implementing this was fairly easy: we now simply pass an optional
struct file pointer to VOP_OPEN() and let spec_open() handle the
override.
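
The override mechanism above can be pictured with a small user-space
sketch. Everything here is hypothetical (struct toy_file, the two fileops
tables, toy_spec_open()); it only models the idea of an open routine
swapping a file's ops vector when it is handed a file pointer, and it does
not reproduce the real VOP_OPEN() or spec_open() signatures.

```c
#include <assert.h>
#include <stddef.h>

struct toy_file;

/* Per-file operations vector, loosely modelled on fileops. */
struct toy_fileops {
    int (*read)(struct toy_file *fp);
};

struct toy_file {
    const struct toy_fileops *ops;
};

/* Generic path: in the real system this would route through VOP_READ. */
static int toy_vnode_read(struct toy_file *fp) { (void)fp; return 1; }

/* Shortcut path: talks to the device directly. */
static int toy_dev_read(struct toy_file *fp) { (void)fp; return 2; }

static const struct toy_fileops toy_vnode_fileops = { toy_vnode_read };
static const struct toy_fileops toy_dev_fileops   = { toy_dev_read };

/*
 * Toy open routine for a device node. The file pointer is optional; when
 * one is supplied, the open installs the direct device fileops on it.
 */
static int toy_spec_open(struct toy_file *fp)
{
    if (fp != NULL)
        fp->ops = &toy_dev_fileops;
    return 0;
}
```

Callers that pass no file pointer keep the generic path, which is why the
argument can stay optional in this model.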
SPECIAL NOTES: It should be noted that we must still lock a directory vnode
LK_EXCLUSIVE before issuing a VOP_LOOKUP(), even for simple lookups, because
a number of VFSs (including UFS) store active directory scanning information
in the directory vnode. The legacy NAMEI_LOOKUP cases can be changed to
use LK_SHARED once these VFS cases are fixed. In particular, we are now
organized well enough to actually be able to do record locking within a
directory for handling NCREATE, NDELETE, and NRENAME situations, but it hasn't
been done yet.
Many thanks to all of the testers and in particular David Rhodus for
finding a large number of panics and other issues.