2 Rune Machine Object Model
6 Rune outputs machine-independent intermediate assembly, called Rune
7 Assembly, which is then assembled with RAS into Rune Object Code and
8 linked via RLD into Rune Machine libraries and executables. These
9 programs are platform agnostic. They must agree on the width of pointers
10 but this agreed-upon width does not have to match the pointer width on
11 the target architecture.
13 Rune programs can be shipped in any form... source, pre-linked Rune
14 objects, post-linked Rune libraries and executables, or any combination.
15 It is also possible (but NOT recommended) to ship final architecture-
16 optimized binaries and libraries. More typically, Rune Machine libraries
17 and executables are shipped and will self-translate to the target
18 architecture on first-run, caching the resulting target binary locally.
20 The Rune machine model has a lot of features that are only possible because
21 the model is not intended to have a direct hardware implementation.
22 Primarily, per-procedure register set isolation and the discrete object
23 cache using negative (e.g. %fp-relative) offsets which can be treated
24 like registers during optimization and translation. That said, the
25 machine model CAN have a direct hardware implementation as long as the
26 self-translate pass handles the issues.
28 The Rune machine model includes procedural, threading, atomicity, endian
29 conversion, locking, and event-handling primitives. There is also a
30 Rune virtual model capable of supplying a higher level of sophistication
31 needed to allow use of Rune in an operating system implementation. This
32 added sophistication (for example) necessarily includes MMU support.
34 Most of the higher-level abstracted sophistication of Rune is implemented
35 via a system library and handled through standard Rune library calls
36 which the architecture layer can then translate to a more direct target
41 Rune programs are intended to run portably on little and big endian
42 architectures. Instructions exist to help with specific endian
43 formatting and language emitters using the machine model should use
46 All data layouts in object code are typed for endian translation purposes.
47 It is illegal for language emitters to optimize sub-field selection in
48 memory objects (e.g. pull a 'char' out of an 'int') because the endian is
49 not known at that point in time. Endian conversions and object truncation
50 should use explicit instructions and allow the backend to optimize the
51 operation instead of the frontend.
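The endian problem with sub-field selection can be shown with a minimal sketch. This is plain Python, not Rune assembly, and the helper names are hypothetical. It shows that the memory offset of the low byte of a 32-bit value depends on byte order, while an explicit truncation does not.

```python
import struct

def low_byte_offset(byte_order: str) -> int:
    """Return the memory offset of the least significant byte of a 32-bit int.

    byte_order is "<" (little endian) or ">" (big endian).
    """
    raw = struct.pack(byte_order + "I", 0x11223344)
    return raw.index(0x44)

def truncate_explicit(value: int) -> int:
    """Endian-independent truncation of a 32-bit value to 8 bits."""
    return value & 0xFF

# An emitter that hard-codes "offset 0" to pull a char out of an int
# is only correct on one of the two byte orders.
print(low_byte_offset("<"))           # 0
print(low_byte_offset(">"))           # 3
print(truncate_explicit(0x11223344))  # 68 (0x44) regardless of byte order
```

This is why the truncation should be left as an explicit operation for the backend, which knows the target byte order.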
55 A procedural context in the Rune object model consists of:
57 * An isolated object-model register set.
59 Each procedure gets its own register space which is opaque to callers
60 and callees. This abstraction is typically maintained until the final
61 translation to the target architecture is made.
63 * A positive %fp-relative frame space
65 The positive frame space is what one would typically consider a
66 procedure's stack variable space. The size and formatting is under
67 the full control of the assembly emitter and will not be modified
68 by any translation stage. Effective addresses and indirect accesses
71 * A negative %fp-relative frame space
73 The negative frame space is a dynamic-sized but statically-realized
74 spill area which may be used by the assembly emitter AND ANY TRANSLATION
75 OR OPTIMIZATION STAGE.
77 Elements in this space may only be directly loaded and stored, without
78 the use of indirect addressing or pointers. That is, elements in this
79 space must be treated as if they were registers. All later
80 translation and optimization stages can freely reorder, overload, add,
81 and optimize this space. I must repeat: Assembly must treat elements
82 in this space as if they were discrete registers. Effective addresses
83 and indirect accesses are NOT allowed.
85 The assembly emitter should use this space for *ALL* local variables
86 whose fields are only directly accessed (never indirectly), and this
87 can include structures as long as their elements are only ever accessed
88 directly and not passed by reference anywhere.
90 Rune is allowed to optimize all accesses in this space including moving
91 elements to and from registers and compacting (through overloading)
92 any actual memory use. A later translation stage might use this space
93 for register spill when translating to a target architecture with
94 an insufficient number of registers, or might assign elements FROM
95 this space to machine registers when the target architecture has extra
96 registers available and not actually have to reserve memory in the
97 spill space for said elements.
99 * A positive %ap-relative argument space
101 Similar to the positive frame space. When a procedure call is made the
102 caller's arguments to the procedure can be accessed via this space.
103 This space is generally used only for arguments and return values that
104 might require effective or indirect addressing to access, or which are
105 too large to fit into the allowed negative frame space area.
107 * A negative %ap-relative argument space
109 Similar to the negative frame space. When a procedure call is made the
110 caller's arguments to the procedure can be accessed via this space.
111 Again, via discrete loads and stores only and object sizes must match.
113 Most call arguments should use the negative argument space as this
114 is what allows later Rune stages to optimize arguments and return
115 values for automatic registerization on the target architecture.
117 Any argument or return value elements which need to be effectively
118 addressed or are accessed indirectly must use the positive space and
119 cannot use this space.
121 * An out-of-band exception handling model
123 RAISE and LRAISE are supported via relocation data, mostly in an
124 out-of-band manner. Code emitted to support the exception handling
125 model is usually left to the final stage. This allows exception
126 handling to be sequenced throughout the procedure without having to
127 generate explicit conditional code to deal with it.
129 The final stage usually stuffs exception handling code outside the
130 main body of the procedure so it does not interfere with the critical
135 There are multiple two-sided frame spaces implemented in Rune which hang
136 off of various special registers:
138 %fp - Procedure frame space (also argument frame space for any calls made).
139 %ap - Argument/return value frame space.
140 %tp - Thread frame space.
141 %db - Library frame space.
143 These frame spaces are limited to the maximum positive signed offset
144 for the address model and -32768 on the negative side. Negative frame
145 spaces CANNOT extend below -32768. In addition, because negative
146 frame spaces can be augmented by multiple backend layers, the language
147 emitter itself is limited to -16384.
149 The limitations to the negative space exist to ensure that various
150 optimization and translation layers can operate in bounded memory and
151 to allow more optimal encoding in the Rune Object Model. E.g. all
152 negative frame offsets can assume 16-bit offset encoding.
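The limits above can be sketched as a small range check plus a 16-bit encoding. This is only an illustration: the function names are hypothetical, and the little-endian byte order used for the encoding is an assumption, since the text only says that a 16-bit offset encoding can be assumed.

```python
import struct

NEG_FRAME_LIMIT = -32768     # hard lower bound for any negative frame space
EMITTER_NEG_LIMIT = -16384   # lower bound for language emitters

def check_negative_offset(offset: int, from_emitter: bool) -> None:
    """Raise ValueError if a negative frame offset is outside the allowed range."""
    limit = EMITTER_NEG_LIMIT if from_emitter else NEG_FRAME_LIMIT
    if not (limit <= offset <= -1):
        raise ValueError(f"offset {offset} outside [{limit}, -1]")

def encode_offset16(offset: int) -> bytes:
    """Encode a negative frame offset as a signed 16-bit value.

    Little-endian byte order is assumed here purely for illustration.
    """
    check_negative_offset(offset, from_emitter=False)
    return struct.pack("<h", offset)

check_negative_offset(-16384, from_emitter=True)   # accepted
print(encode_offset16(-32768).hex())               # 0080
```

The emitter limit leaves the range from -32768 to -16385 free for later stages to add their own spill elements.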
154 The negative frame space can be reordered, extended, compacted, and
155 overloaded at will by the Rune assembler and any later stage, including
156 the final architectural stage. The positive frame space is fixed by
157 the assembly emitter (usually a language backend). This means that
158 any negative offsets chosen by an assembly emitter or any intermediate
159 stage CAN BE RENAMED to another offset, and in fact could even be removed
160 entirely if the target architecture is able to store the data in a
161 real machine register. The more common case, however, is that
162 intermediate stages might have to spill Rune registers into the negative
163 frame space and that the final translation to the target architecture
164 will need additional negative frame space to spill call-used registers.
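Renaming and overloading of negative frame elements can be sketched as a slot assignment over live ranges. This is not Rune's actual algorithm, which this document does not describe. It is a simple linear-scan style illustration that assumes all elements have the same width and that each element has one known live range (first store to last load, as instruction indices). All names are hypothetical.

```python
from typing import Dict, List, Tuple

def compact_slots(live: Dict[str, Tuple[int, int]]) -> Dict[str, int]:
    """Assign slot numbers so elements with disjoint live ranges share a slot.

    live maps an element name to (first_store, last_load), both inclusive.
    Returns a mapping of element name to slot number.
    """
    slot_busy_until: List[int] = []   # per slot: index of the last use so far
    assignment: Dict[str, int] = {}
    for name, (start, end) in sorted(live.items(), key=lambda kv: kv[1][0]):
        for slot, busy_until in enumerate(slot_busy_until):
            if busy_until < start:
                # Previous occupant is dead, so this slot can be overloaded.
                slot_busy_until[slot] = end
                assignment[name] = slot
                break
        else:
            # No free slot: extend the negative frame space by one element.
            slot_busy_until.append(end)
            assignment[name] = len(slot_busy_until) - 1
    return assignment

print(compact_slots({"a": (0, 3), "b": (1, 2), "c": (4, 6)}))
# {'a': 0, 'b': 1, 'c': 0}
```

Here "c" reuses the slot of "a" because their live ranges do not overlap, which is only safe because no effective address of the element can exist.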
166 Procedure Calls and Reg-Args
168 Procedure calls in Rune are relatively simple. Just store arguments into
169 the positive and negative frame space and make the procedure call, then
170 pull return values out of the positive and negative frame space. There
173 * All procedure calls must have a linker relocation prototyping the call
174 arguments. Only negative frame space arguments are required to be
175 prototyped. If you don't want to deal with this, do not use the
176 negative frame space at all (but you will also lose related
177 registerization optimizations).
179 * There is no 'stack' per se, just the frame. There is no 'pushing'
180 or 'popping' allowed in the Rune object model. The frame pointer is
181 meant to be deterministic for the duration of the procedure's execution.
183 * Implied elements of the frame such as the return %pc, saved %db, and
184 saved old %ap are in neither the negative NOR positive frame space.
185 They and any other implied target architectural storage exist in between
186 the negative and positive frame space.
188 Remember that you cannot make any assumption with regards to actual
189 offsets or memory use for negative frame objects.
191 * Since each procedure has an isolated object-model register set,
192 registerization of parameters and return values is handled via the
193 negative frame space. Data cannot be passed between procedures in
194 'registers', which is important because it allows various backend
195 stages to fully analyze the object model register space on a
198 Callers store arguments in the negative frame space and callees simply
199 load them into registers in a deterministic manner. For return values,
200 callees store values in the negative frame space and the caller loads
201 the results into registers in a deterministic manner (that is,
202 unconditionally, just after the call returns).
204 With a small amount of determinism, the backend can translate this
205 mechanic into reg-args, reg-return, and reg-retain (which, if the
206 register remains live for the duration of the target procedure,
207 can be optimized into avoiding a machine save/restore).
209 * A procedure may NOT make any assumptions as to the contents of the
210 positive frame beyond what the procedure itself allocates, since
211 the backend may need to tack on additional space for various reasons.
213 * Var-args. All variable arguments must be laid down in the positive
214 frame space. The negative frame space MUST NOT be used for var-args
215 calls. The assembly emitter can decide where to put any fixed
216 arguments in a var-args call... the negative frame space is typical,
219 * A better reg-retain mechanism is to use negative-library-frame-space
220 globals. Rune already optimizes library descriptor accesses (which
221 are also negative-library-frame-space accesses), so it simply leverages
222 a feature already present.
224 As long as you do not go overboard the backend can optimize such accesses
225 into a semi-permanent machine register and not have to reload it across
228 Other Assembly-level Features
230 * The Rune assembler allows symbolic 'variable' naming for local frame
231 space accesses via the '.local' pseudo-op. More importantly, the
232 Rune assembler will analyze the execution paths and automatically
235 However, we recognize that most language backends will calculate
236 offsets for the positive frame space themselves, and this is also
237 perfectly acceptable.
239 * Register retentions. The language emitting the Rune assembly can
240 choose how it wants to treat registers and variables by how it
241 manipulates arguments and return values. For example, the language
242 emitter can attempt to retain a register across procedure calls (avoid
243 saving and restoring it) by storing it in the negative frame space
244 prior to a call and loading it upon return. The backend may then
245 be able to optimize the case if the call target does not modify the
248 * Infinite registers. The Rune assembler allows assembly emitters
249 (aka language backends) to specify an infinite number of registers
250 and will automatically spill excess registers which it cannot
251 collapse through overloading into the negative frame space. This
252 makes language backends a lot easier to implement.
254 * Symbolic Indirect shortcut. The object model of course only allows
255 indirect accesses via pointer registers, but RAS allows the assembly
256 to specify a local symbolic name instead and will automatically load
257 and manage the pointer register.
259 LCALL SYS_Read(LIB_SYS)
262 This mechanic is required if you want later stages to optimize library
263 calls by inlining them.
265 RAS is allowed to assume that the indirect pointer is deterministic,
266 that is a constant as-of entry into the procedure. RAS is allowed to
267 cache the pointer value at any point in the procedure.
269 Final Architecture Translation
271 The final translation to a target architecture usually occurs either
272 on-the-fly or as a pass on a Rune object model binary when it is run
275 * Target architecture register model can be substantially different
276 and include call-scratch/call-used registers as well as passing
277 some or all arguments and return values in registers.
279 * The negative argument frame space is typically optimized into
280 register passing of arguments and return values.
282 * The final translation pass is responsible for converting the
283 per-procedure Rune register spaces into the target architecture's
286 * Rune requires [L]CALLs to have a linker prototype (basically a
287 pseudo-relocation) for any negative frame space argument or return
288 value elements. Positive frame space arguments and return values
289 do not have to be prototyped, though they can.
291 * May be able to optimize library calls by inlining the contents.
292 Generally requires using the RAS assembly shortcut for library calls.
294 * May be able to optimize often-used indirect registers loaded from
295 globals by retaining the machine register through multiple procedures
296 without having to reload it.
298 * Library calls exist in a compact form and the translation must include
299 any appropriate %db spills (for whatever the target uses for %db).
300 The target typically reserves a machine register for %db but it might
301 have to hang it off of %tp if the target has insufficient pointer
304 * Rune defines machine optimizations that might be outside the scope
305 of the Rune Object Model. For these situations, special library calls
306 are used and the final translation stage optimizes them into the
307 target architecture's special instructions.
309 Most typically, complex matrix, vector, or graphics operations
310 might fall into this category. This is an extremely powerful mechanism
311 when combined with negative-argument-space optimizations.
315 Rune is intended to support a lightweight threading model as well as a
316 heavy weight (security-separated) process model. Register save and
317 restore can be a big problem due to the sheer number of registers a
318 machine target might have. There can potentially be hundreds of thousands
319 or even millions of light weight threads active in a large application.
321 Because Rune abstracts a register isolation model in the higher layers,
322 the actual thread switching has to be handled by the final architectural
325 This stage typically allocates negative thread space (%tp relative) for
326 machine register spills and can choose to reduce the number of machine
327 registers it uses in order to optimize the light-weight switch, or can
328 analyze the thread to avoid saving and restoring machine registers which
329 the thread never uses (which are often floating point registers), depending
330 on the complexity of the thread.
332 Access to both the negative and positive thread frame space typically
333 occurs ONLY through library calls. The final translation pass is free
334 to translate well-known calls into direct accesses but in most cases
335 it will not be necessary. Since most thread library calls access
336 %tp-relative data and not %db-relative data, these library calls for
337 the most part will not have to spill/load/restore %db.
341 The Rune Library model operates via the %db register. The execution
342 context, including the main() application itself, always has an active
345 * Exported library calls use LCALL/LRES/LRET and build a %db-relative
348 * Global variables are not global as in really global, they actually hang
349 off of %db. Unspecified global variable accesses are assumed to hang
352 * Any library can access the global space and make library calls to any
353 other library. This is handled through a LIBRARY LINKAGE stored in
354 a global in the library.
356 So, for example, if library A and library B make calls to library Z,
357 library A will have an A-relative global and library B will have a
358 B-relative global containing a pointer to library Z's %db.
360 * The linker pastes everything together.
362 * Rune supports library instancing and in-context application exec()s
363 without having to create a new unix-level process. This works because
364 all globals are library-relative accesses.
366 An in-context exec is handled simply by loading the desired application
367 into the current process, not allowing it to share existing loaded
368 libraries, and running it.
370 Library instancing is supported by virtue of the way library chaining
371 works (already described above), simply by creating a new instance of
372 a library and determining whether the chaining from that library should
373 also be new instances or share existing copies.
375 An in-context fork is not really supported, though it would be
376 theoretically possible with a lot of care. There's no real point
377 to doing it when one can use a real unix-level fork(). The in-context
378 exec feature is actually a lot more useful.
380 * Negative-frame-space accesses via %db are usually used to access library
381 descriptors but can also be used to hold registerizable globals. When
382 used in this manner, the backend may be able to optimize accesses to
383 negative-frame-space 'globals' into a permanent or semi-permanent
386 * This model allows libraries to be loaded into memory independently,
387 with very little linking overhead.
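The library model described above can be illustrated with a small sketch. It models only the idea that 'globals' hang off a per-instance base (standing in for %db) and that calls to another library go through a linkage stored in the calling library. The class and method names are hypothetical and do not correspond to any Rune API.

```python
class LibraryInstance:
    """One loaded instance of a library.

    `globals` stands in for %db-relative storage, and `linkage` stands in
    for the LIBRARY LINKAGE globals that point at other libraries.
    """

    def __init__(self, name: str) -> None:
        self.name = name
        self.globals: dict = {}
        self.linkage: dict = {}

    def link(self, other: "LibraryInstance") -> None:
        """Record a pointer to another library instance in our own globals."""
        self.linkage[other.name] = other

    def call_bump(self, target_name: str, key: str) -> int:
        """Stand-in for a library call that increments a global in the target."""
        target = self.linkage[target_name]
        target.globals[key] = target.globals.get(key, 0) + 1
        return target.globals[key]

# A and B share one instance of Z; C gets its own new instance of Z.
z_shared = LibraryInstance("Z")
z_private = LibraryInstance("Z")
a, b, c = LibraryInstance("A"), LibraryInstance("B"), LibraryInstance("C")
a.link(z_shared)
b.link(z_shared)
c.link(z_private)

print(a.call_bump("Z", "counter"))  # 1
print(b.call_bump("Z", "counter"))  # 2 (same instance as A's Z)
print(c.call_bump("Z", "counter"))  # 1 (separate instance)
```

Because nothing is truly global, choosing between a shared and a new instance is just a matter of which pointer is stored in the linkage.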
391 The Rune machine does not use a secure data model on its own, that is up
392 to the language generating the code. However, the memory model is quite
393 flexible and the translator can use masking for data and call tables
394 to impose fences around code and data without needing an MMU.
398 The object module is laid out in native endian in a manner which can be
399 converted as needed. If linking mixed-endian object files together,
400 all objects are converted to the machine's native-endian mode for
401 processing. It's easier and faster.
403 All static data laid out in an object file must specify object type or
404 have a layout prototype so endian can be properly converted.
406 Instruction codes are also laid out in the selected endian mode. 32-bit
407 instructions are organized as two 16-bit words, each word in the selected
408 endian mode but organized with the MSB word first and the LSB word second
409 (which is required for proper instruction decoding since the instruction
410 width is calculated from the first word).
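The 32-bit instruction layout can be sketched as follows. This is an illustration of the word ordering only. The function names and the instruction value are made up, and nothing here reflects real Rune opcodes.

```python
import struct

def encode_insn32(insn: int, byte_order: str) -> bytes:
    """Lay out a 32-bit instruction as two 16-bit words, MSB word first.

    byte_order is "<" (little endian) or ">" (big endian) and applies
    within each 16-bit word, not to the order of the two words.
    """
    msb_word = (insn >> 16) & 0xFFFF
    lsb_word = insn & 0xFFFF
    return struct.pack(byte_order + "HH", msb_word, lsb_word)

def first_word(buf: bytes, byte_order: str) -> int:
    """Read the first 16-bit word, which a decoder needs to find the width."""
    return struct.unpack_from(byte_order + "H", buf, 0)[0]

print(encode_insn32(0x12345678, "<").hex())  # 34127856
print(encode_insn32(0x12345678, ">").hex())  # 12345678
```

In both byte orders the decoder sees 0x1234 as the first word, which is what allows the instruction width to be determined before reading the rest.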
412 Instruction extension words are also laid out in the selected endian mode.
413 Since instructions must be 16-bit aligned, 16-bit extension words work as
414 you expect. However, because 32 and 64-bit extension words do not have
415 to be aligned, their layout is subject to a masking rule to avoid having
416 to make any shifts during reconstruction.
418 Let's take a 64 bit extension word which is misaligned by 16 bits. In
419 16-bit chunks the layout is:
423 In order to reconstruct the 64-bit extension a masking operation is used:
425 (0XXX | X000) -> endian conversion (if needed) -> result
427 Object modules must explicitly delineate code and data for endian
428 conversion to work properly:
432 data RW data pre-initialized
433 bss RW data zero-fill
435 All procedures and library vectors must have correct relocation and
438 All laid-out data in const and data sections must have correct relocation
439 and prototyping records for object width.
441 Making Translation Easier
443 The Rune object model is designed to make translation and later-stage
444 optimizations trivial. This is why the assembler enforces a fully
445 independent register space, does not allow read-before-first-load
446 (through deterministic analysis), implements the negative/positive frame
447 spaces with special restrictions on the negative space, requires
448 correct alignment for accesses, and does not allow condition codes to
449 persist beyond the next instruction.
451 Negative Frame Space Domains
453 This deserves a bit more of an explanation. When using the %ap, %fp,
454 and %db-relative negative frame space you have to follow the same rules
455 as you would for using registers. That is, only discrete loads and
456 stores are allowed, a store must be deterministically present before you
457 can load the same location, you can't mix operation sizes, and there is
458 no legal effective address.
460 You can repurpose the negative frame space however you like, as long as
461 it is deterministic. That is, you could use -4(%fp), -3(%fp), -2(%fp),
462 -1(%fp) to hold four 8-bit objects and then repurpose -4(%fp) to hold
463 one 32-bit object once those four 8-bit objects are no longer needed.
465 You can't store as one width and load as a different width. The Rune
466 assembler must be able to validate that the rules are followed which
467 it does through simple graph-based analysis. Rune explicitly does NOT
468 try to validate potentially complex interactions or perform constant
469 arithmetic to determine what code paths might actually be followed.
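The rules above can be sketched as a checker over a straight-line sequence of operations. This is a simplification: the assembler is described as doing graph-based analysis across code paths, while this sketch handles a single path only. The tuple format and function name are hypothetical.

```python
from typing import Dict, List, Tuple

# (kind, offset, width_in_bytes), where kind is "store" or "load"
Op = Tuple[str, int, int]

def validate_negative_frame(ops: List[Op]) -> None:
    """Check store-before-load and matching widths on one straight-line path."""
    live: Dict[int, int] = {}   # element start offset -> width of last store
    for kind, offset, width in ops:
        if offset >= 0 or offset + width > 0:
            raise ValueError(f"{offset}: not inside the negative frame space")
        if kind == "store":
            # Repurposing: a new store kills any element it overlaps.
            overlapped = [s for s, w in live.items()
                          if s < offset + width and offset < s + w]
            for start in overlapped:
                del live[start]
            live[offset] = width
        elif kind == "load":
            if offset not in live:
                raise ValueError(f"{offset}: load before store")
            if live[offset] != width:
                raise ValueError(f"{offset}: width mismatch")
        else:
            raise ValueError(f"unknown operation {kind!r}")

# Four 8-bit objects, later repurposed as one 32-bit object at -4.
ok = [("store", -4, 1), ("store", -3, 1), ("store", -2, 1), ("store", -1, 1),
      ("load", -4, 1), ("load", -1, 1),
      ("store", -4, 4), ("load", -4, 4)]
validate_negative_frame(ok)   # passes silently
```

Appending a ("load", -4, 1) to the sequence above would be rejected, since the last store to -4 was 32 bits wide.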
471 To make things easier you can declare simple .cache or .lcache objects
472 and allow the Rune assembler to handle the overloading for you. .cache
473 labels are converted to valid negative offsets and not exported.
475 Rune itself requires that the negative frame space not exceed 32KB
476 and since the Rune assembler and various intermediate translation
477 stages might need to allocate their own storage in the negative frame
478 space, we restrict language output to the (-16384 to -1) range. This
479 restriction also serves to put a cap on memory use by various Rune stages
480 when analyzing a program to guarantee that it remains within reason.
484 Rune is meant to be an all-in-one environment. Rune file formats are not
485 ELF and intentionally so. All Rune files except source files are organized
486 into a hunk file. The top level of the file always contains a single hunk
487 and any extraneous data past the end of the hunk is ignored (allowing the
488 whole file to be aligned if desired).
491 uint32_t type; /* hunk type & recursion */
492 uint32_t flags; /* type-specific flags */
493 uint32_t size; /* hunk topology size */
494 uint32_t suboffset; /* &type relative to sub-topology */
497 class MappedRuneHunk {
499 uint64_t data_offset; /* mapped data offset in file or 0 */
500 uint64_t data_size; /* mapped size */
503 #define HUNK_TYPE_RECURSE 0x80000000
504 #define HUNK_TYPE_MAPPED 0x40000000
506 A hunk contains three components, all optional:
508 (1) Embedded meta-data built into the hunk structure
509 (2) A memory-mapped component reference (typically 4KB aligned)
510 (3) Embedded recursive hunk topology.
512 The embedded meta-data is basically hunk-relative offset 0 through
513 the suboffset. If the hunk has a memory-mapped component it is
514 flagged in the type (it is part of the type id) and specified in
515 fields just after the basic header. The offset is relative to the
516 base file or (if recursing through a higher-level memory-mapped component),
517 the higher-level memory-mapped component.
519 The hunk can be flagged recursive. If a hunk is recursive the next
520 level down is embedded in a series of hunks from (suboffset) relative
521 to the hunk head to (size) relative to the hunk end.
523 A memory-mapped component can represent a recursive hunk structure.
524 In this situation the memory mapped component acts like a file and
525 contains a single HUNK (which might itself be recursive), with
526 extraneous data ignored. The offsets for any memory-mapped sub-components
527 are relative to the top-level HUNK.
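Reading a hunk header can be sketched as below. Since this document does not give the precise hunking format, several things here are assumptions: the four 32-bit fields are taken to be packed in the order listed, the mapped data_offset and data_size fields are taken to follow the basic header directly, and little-endian byte order is used. The function name is hypothetical.

```python
import struct

HUNK_TYPE_RECURSE = 0x80000000
HUNK_TYPE_MAPPED = 0x40000000

# Assumed layouts (not confirmed by the text): little endian, no padding.
BASIC_HEADER = struct.Struct("<IIII")   # type, flags, size, suboffset
MAPPED_FIELDS = struct.Struct("<QQ")    # data_offset, data_size

def parse_hunk_header(buf: bytes) -> dict:
    """Decode the basic hunk header and, if flagged, the mapped fields."""
    htype, flags, size, suboffset = BASIC_HEADER.unpack_from(buf, 0)
    info = {
        "type": htype,
        "flags": flags,
        "size": size,
        "suboffset": suboffset,
        "recursive": bool(htype & HUNK_TYPE_RECURSE),
        "mapped": bool(htype & HUNK_TYPE_MAPPED),
    }
    if info["mapped"]:
        data_offset, data_size = MAPPED_FIELDS.unpack_from(buf, BASIC_HEADER.size)
        info["data_offset"] = data_offset
        info["data_size"] = data_size
    return info

# A made-up mapped, non-recursive hunk with a 4KB-aligned data component.
sample = (BASIC_HEADER.pack(HUNK_TYPE_MAPPED | 1, 0, 64, 32)
          + MAPPED_FIELDS.pack(4096, 8192))
hdr = parse_hunk_header(sample)
print(hdr["mapped"], hdr["recursive"], hdr["data_offset"])  # True False 4096
```

A recursive hunk would then be walked by parsing further headers between suboffset and size, relative to this hunk.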
529 This document does not describe the precise hunking format but it roughly
530 follows the abstractions the Rune language itself needs to operate.