Rune Machine Object Model

General

Rune outputs machine-independent intermediate assembly, called Rune Assembly, which is then assembled with RAS into Rune Object Code and linked via RLD into Rune Machine libraries and executables. These programs are platform agnostic. They must agree on the width of pointers, but this agreed-upon width does not have to match the pointer width on the target architecture.

Rune programs can be shipped in any form: source, pre-linked Rune objects, post-linked Rune libraries and executables, or any combination. It is also possible (but NOT recommended) to ship final architecture-optimized binaries and libraries. More typically, Rune Machine libraries and executables are shipped and self-translate to the target architecture on first run, caching the resulting target binary locally.

The Rune machine model has many features that are only possible because the model is not intended to have a direct hardware implementation. The primary ones are per-procedure register set isolation and the discrete object cache, which uses negative (e.g. %fp-relative) offsets that can be treated like registers during optimization and translation. That said, the machine model CAN have a direct hardware implementation as long as the self-translate pass handles these issues.

The Rune machine model includes procedural, threading, atomicity, endian conversion, locking, and event-handling primitives. There is also a Rune virtual model that supplies the higher level of sophistication needed to use Rune in an operating system implementation. This added sophistication necessarily includes, for example, MMU support.

Most of the higher-level abstracted sophistication of Rune is implemented via a system library and handled through standard Rune library calls, which the architecture layer can then translate to a more direct target implementation.

Endian

Rune programs are intended to run portably on little-endian and big-endian architectures.
Instructions exist to help with specific endian formatting, and language emitters using the machine model should use them. All data layouts in object code are typed for endian translation purposes. It is illegal for language emitters to optimize sub-field selection in memory objects (e.g. pull a 'char' out of an 'int') because the endianness is not known at that point in time. Endian conversions and object truncation should use explicit instructions and allow the backend, rather than the frontend, to optimize the operation.

Procedural Context

A procedural context in the Rune object model consists of:

* An isolated object-model register set

  Each procedure gets its own register space which is opaque to callers and callees. This abstraction is typically maintained until the final translation to the target architecture is made.

* A positive %fp-relative frame space

  The positive frame space is what one would typically consider a procedure's stack variable space. The size and formatting are under the full control of the assembly emitter and will not be modified by any translation stage. Effective addresses and indirect accesses can be used.

* A negative %fp-relative frame space

  The negative frame space is a dynamically sized but statically realized spill area which may be used by the assembly emitter AND ANY TRANSLATION OR OPTIMIZATION STAGE. Elements in this space may only be directly loaded and stored, without the use of indirect addressing or pointers. That is, elements in this space must be treated as if they were registers. All later translation and optimization stages can freely reorder, overload, add to, and optimize this space.

  To repeat: assembly must treat elements in this space as if they were discrete registers. Effective addresses and indirect accesses are NOT allowed.
  The assembly emitter should use this space for *ALL* local variables whose fields are only directly accessed (never indirectly). This can include structures, as long as their elements are only ever accessed directly and are not passed by reference anywhere. Rune is allowed to optimize all accesses in this space, including moving elements to and from registers and compacting (through overloading) any actual memory use.

  A later translation stage might use this space for register spill when translating to a target architecture with an insufficient number of registers. Alternatively, when the target architecture has extra registers available, it might assign elements FROM this space to machine registers and not have to reserve memory in the spill space for those elements at all.

* A positive %ap-relative argument space

  Similar to the positive frame space. When a procedure call is made, the caller's arguments to the procedure can be accessed via this space. This space is generally used only for arguments and return values that might require effective or indirect addressing to access, or which are too large to fit into the allowed negative frame space area.

* A negative %ap-relative argument space

  Similar to the negative frame space. When a procedure call is made, the caller's arguments to the procedure can be accessed via this space, again via discrete loads and stores only, and object sizes must match. Most call arguments should use the negative argument space, as this is what allows later Rune stages to optimize arguments and return values for automatic registerization on the target architecture. Any argument or return value elements which need to be effectively addressed or are accessed indirectly must use the positive space and cannot use this space.

* An out-of-band exception handling model

  RAISE and LRAISE are supported via relocation data, mostly in an out-of-band manner. Emitting the code that supports the exception handling model is usually left to the final stage.
  This allows exception handling to be sequenced throughout the procedure without having to generate explicit conditional code to deal with it. The final stage usually places exception handling code outside the main body of the procedure so it does not interfere with the critical path.

Frame Spaces

There are multiple two-sided frame spaces implemented in Rune which hang off of various special registers:

    %fp - Procedure frame space (also argument frame space for any calls made).
    %ap - Argument/return value frame space.
    %tp - Thread frame space.
    %db - Library frame space.

These frame spaces are limited to the maximum positive signed offset for the address model on the positive side and to -32768 on the negative side. Negative frame spaces CANNOT extend below -32768. In addition, because negative frame spaces can be augmented by multiple backend layers, the language emitter itself is limited to -16384.

The limitations on the negative space exist to ensure that the various optimization and translation layers can operate in bounded memory and to allow more optimal encoding in the Rune Object Model. E.g. all negative frame offsets can assume 16-bit offset encoding.

The negative frame space can be reordered, extended, compacted, and overloaded at will by the Rune assembler and any later stage, including the final architectural stage. The positive frame space is fixed by the assembly emitter (usually a language backend).

This means that any negative offset chosen by an assembly emitter or any intermediate stage CAN BE RENAMED to another offset, and in fact could even be removed entirely if the target architecture is able to store the data in a real machine register. The more common case, however, is that intermediate stages might have to spill Rune registers into the negative frame space and that the final translation to the target architecture will need additional negative frame space to spill call-used registers.

Procedure Calls and Reg-Args

Procedure calls in Rune are relatively simple.
Just store arguments into the positive and negative frame space and make the procedure call, then pull return values out of the positive and negative frame space. There are several caveats:

* All procedure calls must have a linker relocation prototyping the call arguments. Only negative frame space arguments are required to be prototyped. If you don't want to deal with this, do not use the negative frame space at all (but you will also lose the related registerization optimizations).

* There is no 'stack' per se, just the frame. There is no 'pushing' or 'popping' allowed in the Rune object model. The frame pointer is meant to be deterministic for the duration of the procedure's execution.

* Implied elements of the frame such as the return %pc, saved %db, and saved old %ap are in NEITHER the negative nor the positive frame space. They and any other implied target architectural storage exist in between the negative and positive frame space. Remember that you cannot make any assumption with regard to actual offsets or memory use for negative frame objects.

* Since each procedure has an isolated object-model register set, registerization of parameters and return values is handled via the negative frame space. Data cannot be passed between procedures in 'registers', which is important because it allows the various backend stages to fully analyze the object model register space on a per-procedure basis.

  Callers store arguments in the negative frame space and callees simply load them into registers in a deterministic manner. For return values, callees store values in the negative frame space and the caller loads the results into registers in a deterministic manner (that is, unconditionally, just after the call returns).

  With a small amount of determinism, the backend can translate this mechanic into reg-args, reg-return, and reg-retain (which, if the register remains live for the duration of the target procedure, can be optimized into avoiding a machine save/restore).

* A procedure may NOT make any assumptions as to the contents of the positive frame beyond what the procedure itself allocates, since the backend may need to tack on additional space for various reasons.

* Var-args. All variable arguments must be laid down in the positive frame space. The negative frame space MUST NOT be used for the variable arguments of a var-args call. The assembly emitter can decide where to put any fixed arguments in a var-args call; the negative frame space is typical, if they fit.

* A better reg-retain mechanism is to use negative-library-frame-space globals. Rune already optimizes library descriptor accesses (which are also negative-library-frame-space accesses), so this simply leverages a feature already present. As long as you do not go overboard, the backend can optimize such accesses into a semi-permanent machine register and not have to reload it across procedure calls.

Other Assembly-level Features

* The Rune assembler allows symbolic 'variable' naming for local frame space accesses via the '.local' pseudo-op. More importantly, the Rune assembler will analyze the execution paths and automatically overload the space. However, we recognize that most language backends will calculate offsets for the positive frame space themselves, and this is also perfectly acceptable.

* Register retentions. The language emitting the Rune assembly can choose how it wants to treat registers and variables by how it manipulates arguments and return values. For example, the language emitter can attempt to retain a register across procedure calls (avoid saving and restoring it) by storing it in the negative frame space prior to a call and loading it upon return. The backend may then be able to optimize the case if the call target does not modify the contents.

* Infinite registers.
  The Rune assembler allows assembly emitters (aka language backends) to specify an infinite number of registers. Excess registers which it cannot collapse through overloading are automatically spilled into the negative frame space. This makes language backends a lot easier to implement.

* Symbolic indirect shortcut. The object model of course only allows indirect accesses via pointer registers, but RAS allows the assembly to specify a local symbolic name instead and will automatically load and manage the pointer register.

      LCALL SYS_Read(LIB_SYS)
      MOVE.L 24(Fubar),%r2

  This mechanic is required if you want later stages to optimize library calls by inlining them. RAS is allowed to assume that the indirect pointer is deterministic, that is, constant as of entry into the procedure. RAS is allowed to cache the pointer value at any point in the procedure.

Final Architecture Translation

The final translation to a target architecture usually occurs either on the fly or as a pass on a Rune object model binary when it is run for the first time.

* The target architecture's register model can be substantially different and include call-scratch/call-used registers as well as passing some or all arguments and return values in registers.

* The negative argument frame space is typically optimized into register passing of arguments and return values.

* The final translation pass is responsible for converting the per-procedure Rune register spaces into the target architecture's shared register set.

* Rune requires [L]CALLs to have a linker prototype (basically a pseudo-relocation) for any negative frame space argument or return value elements. Positive frame space arguments and return values do not have to be prototyped, though they can be.

* The translation may be able to optimize library calls by inlining the contents. This generally requires using the RAS assembly shortcut for library calls.

* The translation may be able to optimize often-used indirect registers loaded from globals by retaining the machine register through multiple procedures without having to reload it.

* Library calls exist in a compact form, and the translation must include any appropriate %db spills (for whatever the target uses for %db). The target typically reserves a machine register for %db, but it might have to hang it off of %tp if the target has insufficient pointer registers.

* Rune defines machine optimizations that might be outside the scope of the Rune Object Model. For these situations, special library calls are used and the final translation stage optimizes them into the target architecture's special instructions. Most typically, complex matrix, vector, or graphics operations fall into this category. This is an extremely powerful mechanism when combined with negative-argument-space optimizations.

Thread Model

Rune is intended to support a lightweight threading model as well as a heavyweight (security-separated) process model. Register save and restore can be a big problem due to the sheer number of registers a machine target might have. There can potentially be hundreds of thousands or even millions of lightweight threads active in a large application.

Because Rune abstracts a register isolation model in the higher layers, the actual thread switching has to be handled by the final architectural translation stage. This stage typically allocates negative thread space (%tp-relative) for machine register spills. Depending on the complexity of the thread, it can choose to reduce the number of machine registers it uses in order to optimize the lightweight switch, or it can analyze the thread to avoid saving and restoring machine registers which the thread never uses (often the floating point registers).

Access to both the negative and positive thread frame space typically occurs ONLY through library calls.
The final translation pass is free to translate well-known calls into direct accesses, but in most cases this will not be necessary. Since most thread library calls access %tp-relative data and not %db-relative data, these library calls for the most part will not have to spill/load/restore %db.

Library Model

The Rune library model operates via the %db register. The execution context, including the main() application itself, always has an active %db.

* Exported library calls use LCALL/LRES/LRET and build a %db-relative vector.

* Global variables are not truly global; they actually hang off of %db. Unspecified global variable accesses are assumed to hang off the current %db.

* Any library can access the global space and make library calls to any other library. This is handled through a LIBRARY LINKAGE stored in a global in the library. So, for example, if library A and library B make calls to library Z, library A will have an A-relative global and library B will have a B-relative global containing a pointer to library Z's %db.

* The linker pastes everything together.

* Rune supports library instancing and in-context application exec()s without having to create a new unix-level process. This works because all globals are library-relative accesses.

  An in-context exec is handled simply by loading the desired application into the current process, not allowing it to share existing loaded libraries, and running it.

  Library instancing is supported by virtue of the way library chaining works (described above): create a new instance of a library, then determine whether the libraries chained from it should also be new instances or should share existing copies.

  An in-context fork is not really supported, though it would be theoretically possible with a lot of care. There is no real point to doing it when one can use a real unix-level fork(). The in-context exec feature is actually a lot more useful.

* Negative-frame-space accesses via %db are usually used to access library descriptors but can also be used to hold registerizable globals. When used in this manner, the backend may be able to optimize accesses to negative-frame-space 'globals' into a permanent or semi-permanent machine register.

* This model allows libraries to be loaded into memory independently, with very little linking overhead.

Secure Data Model

The Rune machine does not use a secure data model on its own; that is up to the language generating the code. However, the memory model is quite flexible and the translator can use masking for data and call tables to impose fences around code and data without needing an MMU.

Object Code Layout

The object module is laid out in native endian in a manner which can be converted as needed. If linking mixed-endian object files together, all objects are converted to the machine's native-endian mode for processing. It's easier and faster.

All static data laid out in an object file must specify an object type or have a layout prototype so endian can be properly converted.

Instruction codes are also laid out in the selected endian mode. 32-bit instructions are organized as two 16-bit words, each word in the selected endian mode but organized with the MSB word first and the LSB word second (which is required for proper instruction decoding since the instruction width is calculated from the first word).

Instruction extension words are also laid out in the selected endian mode. Since instructions must be 16-bit aligned, 16-bit extension words work as you would expect. However, because 32-bit and 64-bit extension words do not have to be aligned, their layout is subject to a masking rule to avoid having to make any shifts during reconstruction. Let's take a 64-bit extension word which is misaligned by 16 bits.
In 16-bit chunks the layout is:

    -XXXX---

In order to reconstruct the 64-bit extension a masking operation is used:

    (0XXX | X000) -> endian conversion (if needed) -> result

Object modules must explicitly delineate code and data for endian conversion to work properly:

    code    RO code
    const   RO data
    data    RW data pre-initialized
    bss     RW data zero-fill

All procedures and library vectors must have correct relocation and prototyping records. All laid-out data in const and data sections must have correct relocation and prototyping records for object width.

Making Translation Easier

The Rune object model is designed to make translation and later-stage optimizations trivial. This is why the assembler enforces a fully independent register space, does not allow read-before-first-load (through deterministic analysis), implements the negative/positive frame spaces with special restrictions on the negative space, requires correct alignment for accesses, and does not allow condition codes to persist beyond the next instruction.

Negative Frame Space Domains

This deserves a bit more of an explanation. When using the %ap, %fp, and %db-relative negative frame spaces you have to follow the same rules as you would for using registers. That is, only discrete loads and stores are allowed, a store must be deterministically present before you can load the same location, you can't mix operation sizes, and there is no legal effective address.

You can repurpose the negative frame space however you like, as long as it is deterministic. For example, you could use -4(%fp), -3(%fp), -2(%fp), and -1(%fp) to hold four 8-bit objects and then repurpose -4(%fp) to hold one 32-bit object once those four 8-bit objects are no longer needed. You can't store as one width and load as a different width.

The Rune assembler must be able to validate that the rules are followed, which it does through simple graph-based analysis.
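
The negative frame space rules (store before load, matching widths, repurposing by a later store) can be illustrated with a small checker. This is only a sketch in C and not the Rune assembler's implementation: it handles a single straight-line sequence of accesses rather than a graph of execution paths, it does not check alignment, and the names neg_access and check_neg_accesses are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NEG_LIMIT 16384 /* language emitters may use -16384 .. -1 */

enum access_kind { ACCESS_STORE, ACCESS_LOAD };

struct neg_access {
    enum access_kind kind;
    int offset; /* negative frame offset, e.g. -4 for -4(%fp) */
    int width;  /* object width in bytes: 1, 2, 4, or 8 */
};

/* Live object covering each byte; slot i describes byte offset -(i + 1).
 * obj_width[i] == 0 means no store has defined that byte yet. */
static int obj_base[NEG_LIMIT];
static int obj_width[NEG_LIMIT];

static bool access_in_range(const struct neg_access *a)
{
    bool width_ok = a->width == 1 || a->width == 2 ||
                    a->width == 4 || a->width == 8;
    return width_ok && a->offset >= -NEG_LIMIT && a->offset + a->width <= 0;
}

/* Forget the object that currently covers byte offset `byte`, if any. */
static void kill_object_at(int byte)
{
    int slot = -byte - 1;
    int base = obj_base[slot];
    int width = obj_width[slot];
    for (int k = 0; k < width; k++)
        obj_width[-(base + k) - 1] = 0;
}

/* Returns the index of the first illegal access, or -1 if all are legal. */
int check_neg_accesses(const struct neg_access *ops, size_t count)
{
    memset(obj_width, 0, sizeof(obj_width));
    for (size_t i = 0; i < count; i++) {
        const struct neg_access *a = &ops[i];
        if (!access_in_range(a))
            return (int)i;
        for (int k = 0; k < a->width; k++) {
            int slot = -(a->offset + k) - 1;
            bool same = obj_width[slot] == a->width &&
                        obj_base[slot] == a->offset;
            if (a->kind == ACCESS_LOAD && !same)
                return (int)i; /* load before store, or width mismatch */
            if (a->kind == ACCESS_STORE && !same)
                kill_object_at(a->offset + k); /* repurpose the bytes */
        }
        if (a->kind == ACCESS_STORE) {
            for (int k = 0; k < a->width; k++) {
                int slot = -(a->offset + k) - 1;
                obj_base[slot] = a->offset;
                obj_width[slot] = a->width;
            }
        }
    }
    return -1;
}
```

Feeding it the repurposing example from the text (four 8-bit stores at -4 through -1, then a 32-bit store and load at -4) reports no violation, while a 32-bit store followed by an 8-bit load of the same location is rejected. A real validator would have to run this kind of bookkeeping over every path through the procedure.
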
Rune explicitly does NOT try to validate potentially complex interactions or perform constant arithmetic to determine what code paths might actually be followed.

To make things easier you can declare simple .cache or .lcache objects and allow the Rune assembler to handle the overloading for you. .cache labels are converted to valid negative offsets and not exported.

Rune itself requires that the negative frame space not exceed 32KB, and since the Rune assembler and various intermediate translation stages might need to allocate their own storage in the negative frame space, we restrict language output to the (-16384 to -1) range. This restriction also puts a cap on memory use by the various Rune stages when analyzing a program, guaranteeing that it remains within reason.

File Formats

Rune is meant to be an all-in-one environment. Rune file formats are not ELF, and intentionally so. All Rune files except source files are organized into a hunk file. The top level of the file always contains a single hunk, and any extraneous data past the end of the hunk is ignored (allowing the whole file to be aligned if desired).

    class RuneHunk {
        uint32_t type;          /* hunk type & recursion */
        uint32_t flags;         /* type-specific flags */
        uint32_t size;          /* hunk topology size */
        uint32_t suboffset;     /* &type relative to sub-topology */
    };

    class MappedRuneHunk {
        RuneHunk head;
        uint64_t data_offset;   /* mapped data offset in file or 0 */
        uint64_t data_size;     /* mapped size */
    };

    #define HUNK_TYPE_RECURSE 0x80000000
    #define HUNK_TYPE_MAPPED 0x40000000

A hunk contains three components, all optional:

    (1) Embedded meta-data built into the hunk structure
    (2) A memory-mapped component reference (typically 4KB aligned)
    (3) Embedded recursive hunk topology.

The embedded meta-data is basically hunk-relative offset 0 through the suboffset. If the hunk has a memory-mapped component, it is flagged in the type (it is part of the type id) and specified in fields just after the basic header.
The offset is relative to the base file or (if recursing through a higher-level memory-mapped component) to the higher-level memory-mapped component.

The hunk can be flagged recursive. If a hunk is recursive, the next level down is embedded as a series of hunks running from (suboffset), relative to the hunk head, through (size), which marks the hunk end.

A memory-mapped component can represent a recursive hunk structure. In this situation the memory-mapped component acts like a file and contains a single HUNK (which might itself be recursive), with extraneous data ignored. The offsets for any memory-mapped sub-components are relative to the top-level HUNK.

This document does not describe the precise hunking format, but it roughly follows the abstractions the Rune language itself needs to operate.
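
As a rough illustration of the recursive topology, the following C sketch counts the hunks in an in-memory image. Since the precise format is not specified here, everything beyond the header fields is an assumption: that (size) is the total byte length of a hunk including its header, that (suboffset) is a byte offset from the hunk head, that sub-hunks are packed back to back, and that the image is in native endian. Memory-mapped components are not followed. The names rune_hunk, put_hunk, and count_hunks are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HUNK_TYPE_RECURSE 0x80000000u
#define HUNK_TYPE_MAPPED  0x40000000u

/* Same fields as the RuneHunk declaration above, spelled as a C struct. */
struct rune_hunk {
    uint32_t type;
    uint32_t flags;
    uint32_t size;
    uint32_t suboffset;
};

/* Header bytes: the basic header, plus data_offset and data_size
 * (two uint64_t fields) when the hunk is flagged as mapped. */
static size_t hunk_header_len(const struct rune_hunk *h)
{
    size_t len = sizeof(*h);
    if (h->type & HUNK_TYPE_MAPPED)
        len += 2 * sizeof(uint64_t);
    return len;
}

/* Write a basic hunk header at dst (helper for building test images). */
void put_hunk(uint8_t *dst, uint32_t type, uint32_t size, uint32_t suboffset)
{
    struct rune_hunk h = { type, 0, size, suboffset };
    memcpy(dst, &h, sizeof(h));
}

/* Count the hunk at buf plus all hunks nested below it.
 * Returns -1 if the topology does not fit inside len bytes. */
int count_hunks(const uint8_t *buf, size_t len)
{
    struct rune_hunk h;
    int total = 1;

    if (len < sizeof(h))
        return -1;
    memcpy(&h, buf, sizeof(h)); /* copy out: buf may be unaligned */
    if (h.size < hunk_header_len(&h) || h.size > len)
        return -1;
    if (!(h.type & HUNK_TYPE_RECURSE))
        return total;
    if (h.suboffset < hunk_header_len(&h) || h.suboffset > h.size)
        return -1;

    for (size_t off = h.suboffset; off < h.size; ) {
        struct rune_hunk child;
        int nested = count_hunks(buf + off, h.size - off);
        if (nested < 0)
            return -1;
        total += nested;
        memcpy(&child, buf + off, sizeof(child));
        off += child.size; /* the recursion already checked size >= 16 */
    }
    return total;
}
```

Because only the first (size) bytes of the top-level hunk are examined, any padding after it is ignored, which matches the rule that extraneous data past the end of the top-level hunk is not significant.
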