                          Rune Machine Object Model

                                    General

    Rune outputs machine-independent intermediate assembly, called Rune
    Assembly, which is then assembled with RAS into Rune Object Code and
    linked via RLD into Rune Machine libraries and executables.  These
    programs are platform-agnostic.  They must agree on the width of
    pointers, but this agreed-upon width does not have to match the pointer
    width of the target architecture.

    Rune programs can be shipped in any form: source, pre-linked Rune
    objects, post-linked Rune libraries and executables, or any combination.
    It is also possible (but NOT recommended) to ship final
    architecture-optimized binaries and libraries.  More typically, Rune
    Machine libraries and executables are shipped and self-translate to the
    target architecture on first run, caching the resulting target binary
    locally.

    The Rune machine model has several features that are only possible
    because the model is not intended to have a direct hardware
    implementation.  Chief among them are per-procedure register set
    isolation and the discrete object cache using negative (e.g.
    %fp-relative) offsets, which can be treated like registers during
    optimization and translation.  That said, the machine model CAN have a
    direct hardware implementation as long as the self-translate pass
    handles these issues.

    The Rune machine model includes procedural, threading, atomicity, endian
    conversion, locking, and event-handling primitives.  There is also a
    Rune virtual model capable of supplying the higher level of
    sophistication needed to allow use of Rune in an operating system
    implementation.  This added sophistication necessarily includes, for
    example, MMU support.

    Most of the higher-level abstracted sophistication of Rune is implemented
    via a system library and handled through standard Rune library calls,
    which the architecture layer can then translate to a more direct target
    implementation.

                                    Endian

    Rune programs are intended to run portably on little and big endian
    architectures.  Instructions exist to help with specific endian
    formatting, and language emitters using the machine model should use
    them.

    All data layouts in object code are typed for endian translation purposes.
    It is illegal for language emitters to optimize sub-field selection in
    memory objects (e.g. pull a 'char' out of an 'int') because the
    endianness is not known at that point in time.  Endian conversions and
    object truncation should use explicit instructions, allowing the backend
    rather than the frontend to optimize the operation.

                            Procedural Context

    A procedural context in the Rune object model consists of:

    * An isolated object-model register set.

      Each procedure gets its own register space, which is opaque to callers
      and callees.  This abstraction is typically maintained until the final
      translation to the target architecture is made.

    * A positive %fp-relative frame space

      The positive frame space is what one would typically consider a
      procedure's stack variable space.  Its size and formatting are under
      the full control of the assembly emitter and will not be modified
      by any translation stage.  Effective addresses and indirect accesses
      can be used.

    * A negative %fp-relative frame space

      The negative frame space is a dynamically-sized but statically-realized
      spill area which may be used by the assembly emitter AND ANY TRANSLATION
      OR OPTIMIZATION STAGE.

      Elements in this space may only be directly loaded and stored, without
      the use of indirect addressing or pointers.  That is, elements in this
      space must be treated much as if they were registers.  All later
      translation and optimization stages can freely reorder, overload, add,
      and optimize this space.  I must repeat: assembly must treat elements
      in this space as if they were discrete registers.  Effective addresses
      and indirect accesses are NOT allowed.

      The assembly emitter should use this space for *ALL* local variables
      whose fields are only directly accessed (never indirectly), and this
      can include structures as long as their elements are only ever accessed
      directly and never passed by reference.

      Rune is allowed to optimize all accesses in this space, including moving
      elements to and from registers and compacting (through overloading)
      any actual memory use.  A later translation stage might use this space
      for register spills when translating to a target architecture with
      an insufficient number of registers.  Alternatively, it might assign
      elements FROM this space to machine registers when the target
      architecture has extra registers available, in which case no memory
      has to be reserved in the spill space for those elements.

    * A positive %ap-relative argument space

      Similar to the positive frame space.  When a procedure call is made,
      the caller's arguments to the procedure can be accessed via this space.
      This space is generally used only for arguments and return values that
      might require effective or indirect addressing to access, or which are
      too large to fit into the allowed negative frame space area.

    * A negative %ap-relative argument space

      Similar to the negative frame space.  When a procedure call is made,
      the caller's arguments to the procedure can be accessed via this space.
      Again, access is via discrete loads and stores only, and object sizes
      must match.

      Most call arguments should use the negative argument space, as this
      is what allows later Rune stages to optimize arguments and return
      values for automatic registerization on the target architecture.

      Any argument or return value elements which need to be effectively
      addressed or are accessed indirectly must use the positive space and
      cannot use this space.

    * An out-of-band exception handling model

      RAISE and LRAISE are supported via relocation data, mostly in an
      out-of-band manner.  Code emitted to support the exception handling
      model is usually left to the final stage.  This allows exception
      handling to be sequenced throughout the procedure without having to
      generate explicit conditional code to deal with it.

      The final stage usually places exception handling code outside the
      main body of the procedure so it does not interfere with the critical
      path.

                                Frame Spaces

    There are multiple two-sided frame spaces implemented in Rune which hang
    off of various special registers:

    %fp - Procedure frame space (also argument frame space for any calls made).
    %ap - Argument/return value frame space.
    %tp - Thread frame space.
    %db - Library frame space.

    On the positive side, these frame spaces are limited to the maximum
    positive signed offset for the address model.  On the negative side,
    they are limited to -32768: negative frame space offsets CANNOT extend
    below -32768.  In addition, because negative frame spaces can be
    augmented by multiple backend layers, the language emitter itself is
    limited to -16384.

    The limitations on the negative space exist to ensure that the various
    optimization and translation layers can operate in bounded memory and
    to allow a more efficient encoding in the Rune Object Model.  For
    example, all negative frame offsets can assume a 16-bit offset encoding.
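The limits above reduce to a simple range check.  The sketch below is a minimal C illustration, not RAS code.  The names `neg_slot_ok`, `encode_neg_offset` and the two constants are hypothetical.  It assumes that an object at a negative offset occupies `off` through `off+width-1`, so a 32-bit object at -4(%fp) covers -4 to -1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RUNE_NEG_FRAME_MIN   (-32768)  /* hard limit for any stage */
#define RUNE_NEG_EMITTER_MIN (-16384)  /* limit for language emitters */

/* Assumed: an object of 'width' bytes at negative offset 'off' occupies
 * off .. off+width-1 (e.g. a 32-bit object at -4 covers -4..-1).  It is
 * legal if that whole range lies inside [min, -1]. */
static bool neg_slot_ok(int32_t off, int32_t width, int32_t min)
{
    return width > 0 && off >= min && off + width <= 0;
}

/* Because no stage may go below -32768, every legal negative offset
 * fits a 16-bit signed field. */
static int16_t encode_neg_offset(int32_t off)
{
    assert(off >= RUNE_NEG_FRAME_MIN && off < 0);
    return (int16_t)off;
}
```

The emitter would pass `RUNE_NEG_EMITTER_MIN`.  The remaining range down to -32768 is left for the assembler and later translation stages.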

    The negative frame space can be reordered, extended, compacted, and
    overloaded at will by the Rune assembler and any later stage, including
    the final architectural stage.  The positive frame space is fixed by
    the assembly emitter (usually a language backend).  This means that
    any negative offsets chosen by an assembly emitter or any intermediate
    stage CAN BE RENAMED to another offset, and could even be removed
    entirely if the target architecture is able to store the data in a
    real machine register.  The more common case, however, is that
    intermediate stages might have to spill Rune registers into the negative
    frame space and that the final translation to the target architecture
    will need additional negative frame space to spill call-used registers.

                            Procedure Calls and Reg-Args

    Procedure calls in Rune are relatively simple.  Just store arguments into
    the positive and negative frame space, make the procedure call, then
    pull return values out of the positive and negative frame space.  There
    are several caveats:

    * All procedure calls must have a linker relocation prototyping the call
      arguments.  Only negative frame space arguments are required to be
      prototyped.  If you don't want to deal with this, do not use the
      negative frame space at all (but you will also lose the related
      registerization optimizations).

    * There is no 'stack' per se, just the frame.  There is no 'pushing'
      or 'popping' allowed in the Rune object model.  The frame pointer is
      meant to be deterministic for the duration of the procedure's execution.

    * Implied elements of the frame such as the return %pc, saved %db, and
      saved old %ap are in neither the negative NOR the positive frame space.
      They and any other implied target architectural storage exist in
      between the negative and positive frame spaces.

      Remember that you cannot make any assumptions with regard to actual
      offsets or memory use for negative frame objects.

    * Since each procedure has an isolated object-model register set,
      registerization of parameters and return values is handled via the
      negative frame space.  Data cannot be passed between procedures in
      'registers'.  This is important because it allows various backend
      stages to fully analyze the object model register space on a
      per-procedure basis.

      Callers store arguments in the negative frame space and callees simply
      load them into registers in a deterministic manner.  For return values,
      callees store values in the negative frame space and the caller loads
      the results into registers in a deterministic manner (that is,
      unconditionally, just after the call returns).

      With a small amount of determinism, the backend can translate this
      mechanic into reg-args, reg-return, and reg-retain.  If the register
      remains live for the duration of the target procedure, reg-retain can
      be optimized to avoid a machine save/restore.

    * A procedure may NOT make any assumptions as to the contents of the
      positive frame beyond what the procedure itself allocates, since
      the backend may need to tack on additional space for various reasons.

    * Var-args.  All variable arguments must be laid down in the positive
      frame space.  The negative frame space MUST NOT be used for var-args
      calls.  The assembly emitter can decide where to put any fixed
      arguments in a var-args call.  The negative frame space is typical,
      if they fit.

    * A better reg-retain mechanism is to use negative-library-frame-space
      globals.  Rune already optimizes library descriptor accesses (which
      are also negative-library-frame-space accesses), so this simply
      leverages a feature already present.

      As long as you do not go overboard, the backend can optimize such
      accesses into a semi-permanent machine register and not have to reload
      it across procedure calls.

                            Other Assembly-level Features

    * The Rune assembler allows symbolic 'variable' naming for local frame
      space accesses via the '.local' pseudo-op.  More importantly, the
      Rune assembler will analyze the execution paths and automatically
      overload the space.

      However, we recognize that most language backends will calculate
      offsets for the positive frame space themselves, and this is also
      perfectly acceptable.

    * Register retention.  The language emitting the Rune assembly can
      choose how it wants to treat registers and variables by how it
      manipulates arguments and return values.  For example, the language
      emitter can attempt to retain a register across procedure calls (avoid
      saving and restoring it) by storing it in the negative frame space
      prior to a call and loading it upon return.  The backend may then
      be able to optimize the case if the call target does not modify the
      contents.

    * Infinite registers.  The Rune assembler allows assembly emitters
      (aka language backends) to specify an infinite number of registers.
      It will automatically spill into the negative frame space any excess
      registers that it cannot collapse through overloading.  This makes
      language backends a lot easier to implement.

    * Symbolic indirect shortcut.  The object model of course only allows
      indirect accesses via pointer registers, but RAS allows the assembly
      to specify a local symbolic name instead and will automatically load
      and manage the pointer register.

      LCALL     SYS_Read(LIB_SYS)
      MOVE.L    24(Fubar),%r2

      This mechanic is required if you want later stages to optimize library
      calls by inlining them.

      RAS is allowed to assume that the indirect pointer is deterministic,
      that is, constant as of entry into the procedure.  RAS is allowed to
      cache the pointer value at any point in the procedure.
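RAS's actual overloading algorithm is not described here.  As one possible illustration of what "automatically overload the space" means, the sketch below applies simple greedy interval reuse.  It is not RAS's path analysis.  It assumes the emitter has numbered instructions linearly and computed a live range for each '.local'-style variable.  It reuses a dead slot of the same size before growing the negative frame, and it keeps each new slot naturally aligned.  `Local`, `Slot` and `assign_offsets` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical input: one entry per negative-frame local, with its size
 * in bytes and the instruction range [start, end] over which it is live.
 * Entries must be sorted by 'start'. */
typedef struct {
    int32_t size;
    int32_t start, end;
    int32_t offset;          /* output: assigned negative offset */
} Local;

#define MAX_SLOTS 64

typedef struct {
    int32_t offset, size, busy_until;
} Slot;

/* Greedy overloading: reuse a dead slot of the same size if one exists,
 * otherwise grow the negative frame downward, keeping each new slot
 * aligned to its size.  Returns the number of bytes used. */
static int32_t assign_offsets(Local *v, size_t n)
{
    Slot slots[MAX_SLOTS];
    size_t nslots = 0;
    int32_t used = 0;

    for (size_t i = 0; i < n; i++) {
        int32_t sz = v[i].size;
        size_t s;

        assert(sz > 0);
        for (s = 0; s < nslots; s++) {
            if (slots[s].size == sz && slots[s].busy_until < v[i].start)
                break;                       /* dead slot, same width */
        }
        if (s == nslots) {                   /* nothing reusable: grow */
            assert(nslots < MAX_SLOTS);
            used = (used + sz + sz - 1) / sz * sz;
            slots[nslots].offset = -used;
            slots[nslots].size = sz;
            nslots++;
        }
        slots[s].busy_until = v[i].end;
        v[i].offset = slots[s].offset;
    }
    return used;
}
```

Reusing only same-size slots is a deliberate simplification that keeps the "no mixed operation sizes" rule trivially satisfied.  A real allocator could also repurpose dead space across widths, as the Negative Frame Space Domains section allows.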

                            Final Architecture Translation

    The final translation to a target architecture usually occurs either
    on-the-fly or as a pass on a Rune object model binary when it is run
    for the first time.

    * The target architecture register model can be substantially different
      and include call-scratch/call-used registers as well as passing
      some or all arguments and return values in registers.

    * The negative argument frame space is typically optimized into
      register passing of arguments and return values.

    * The final translation pass is responsible for converting the
      per-procedure Rune register spaces into the target architecture's
      shared register set.

    * Rune requires [L]CALLs to have a linker prototype (basically a
      pseudo-relocation) for any negative frame space argument or return
      value elements.  Positive frame space arguments and return values
      do not have to be prototyped, though they can be.

    * The translation may be able to optimize library calls by inlining
      their contents.  This generally requires using the RAS assembly
      shortcut for library calls.

    * The translation may be able to optimize often-used indirect registers
      loaded from globals by retaining the machine register through multiple
      procedures without having to reload it.

    * Library calls exist in a compact form and the translation must include
      any appropriate %db spills (for whatever the target uses for %db).
      The target typically reserves a machine register for %db but it might
      have to hang it off of %tp if the target has insufficient pointer
      registers.

    * Rune defines machine optimizations that might be outside the scope
      of the Rune Object Model.  For these situations, special library calls
      are used and the final translation stage optimizes them into the
      target architecture's special instructions.

      Most typically, complex matrix, vector, or graphics operations
      might fall into this category.  This is an extremely powerful mechanism
      when combined with negative-argument-space optimizations.

                                Thread Model

    Rune is intended to support a lightweight threading model as well as a
    heavyweight (security-separated) process model.  Register save and
    restore can be a big problem due to the sheer number of registers a
    machine target might have.  There can potentially be hundreds of
    thousands or even millions of lightweight threads active in a large
    application.

    Because Rune abstracts a register isolation model in the higher layers,
    the actual thread switching has to be handled by the final architectural
    translation stage.

    This stage typically allocates negative thread space (%tp-relative) for
    machine register spills.  Depending on the complexity of the thread, it
    can choose to reduce the number of machine registers it uses in order to
    optimize the lightweight switch.  Alternatively, it can analyze the
    thread to avoid saving and restoring machine registers which the thread
    never uses (often the floating point registers).
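A minimal sketch of the register-usage idea follows.  It assumes the translator has already computed a bitmask of the machine registers each thread touches.  `Thread`, `thread_switch` and the 32-register machine are hypothetical, and the `spill` array stands in for the negative %tp-relative spill area.

```c
#include <assert.h>
#include <stdint.h>

#define NMACHREGS 32

/* Hypothetical per-thread state: 'used' is the set of machine registers
 * the translator found this thread uses; 'spill' stands in for the
 * negative %tp-relative spill area. */
typedef struct {
    uint32_t used;
    uint64_t spill[NMACHREGS];
} Thread;

/* Save only the registers 'from' uses and restore only those 'to' uses.
 * Returns the number of register moves performed. */
static int thread_switch(uint64_t regs[NMACHREGS], Thread *from,
                         const Thread *to)
{
    int moves = 0;

    assert(from && to);
    for (int r = 0; r < NMACHREGS; r++) {
        if (from->used & (UINT32_C(1) << r)) {
            from->spill[r] = regs[r];
            moves++;
        }
    }
    for (int r = 0; r < NMACHREGS; r++) {
        if (to->used & (UINT32_C(1) << r)) {
            regs[r] = to->spill[r];
            moves++;
        }
    }
    return moves;
}
```

A thread that never touches, say, the floating point registers simply never has them in its mask, so a switch costs moves proportional to the registers actually in use.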

    Access to both the negative and positive thread frame space typically
    occurs ONLY through library calls.  The final translation pass is free
    to translate well-known calls into direct accesses, but in most cases
    it will not be necessary.  Since most thread library calls access
    %tp-relative data and not %db-relative data, these library calls for
    the most part will not have to spill/load/restore %db.

                                Library Model

    The Rune Library model operates via the %db register.  The execution
    context, including the main() application itself, always has an active
    %db.

    * Exported library calls use LCALL/LRES/LRET and build a %db-relative
      vector.

    * Global variables are not truly global; they actually hang off of
      %db.  Unspecified global variable accesses are assumed to hang off
      the current %db.

    * Any library can access the global space and make library calls to any
      other library.  This is handled through a LIBRARY LINKAGE stored in
      a global in the library.

      So, for example, if library A and library B make calls to library Z,
      library A will have an A-relative global and library B will have a
      B-relative global, each containing a pointer to library Z's %db.

    * The linker pastes everything together.

    * Rune supports library instancing and in-context application exec()s
      without having to create a new unix-level process.  This works because
      all globals are library-relative accesses.

      An in-context exec is handled simply by loading the desired application
      into the current process, not allowing it to share existing loaded
      libraries, and running it.

      Library instancing is supported by virtue of the way library chaining
      works (already described above), simply by creating a new instance of
      a library and determining whether the libraries chained from that
      library should also be new instances or share existing copies.

      An in-context fork is not really supported, though it would be
      theoretically possible with a lot of care.  There's no real point
      to doing it when one can use a real unix-level fork().  The in-context
      exec feature is actually a lot more useful.

    * Negative-frame-space accesses via %db are usually used to access library
      descriptors but can also be used to hold registerizable globals.  When
      used in this manner, the backend may be able to optimize accesses to
      negative-frame-space 'globals' into a permanent or semi-permanent
      machine register.

    * This model allows libraries to be loaded into memory independently,
      with very little linking overhead.
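The linkage and instancing scheme can be modeled with plain pointers.  This is a hypothetical sketch, not the RLD or runtime data structures.  `Library`, its fixed-size global array, and the `share` policy callback are illustrative assumptions.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_LINKS 4

/* Hypothetical model: 'globals' stands in for the %db-relative global
 * space, and each 'links' entry is a LIBRARY LINKAGE global pointing at
 * another library instance. */
typedef struct Library {
    const char *name;
    int globals[8];
    struct Library *links[MAX_LINKS];
    int nlinks;
} Library;

/* Example sharing policies for chained libraries. */
static int share_always(const Library *lib) { (void)lib; return 1; }
static int share_never(const Library *lib)  { (void)lib; return 0; }

/* Instance 'lib'.  For each chained library, 'share' decides whether the
 * new instance links to the existing copy or to a new instance of it.
 * The copy starts from lib's current globals; a real loader would start
 * from the library's initial data. */
static Library *instance_library(const Library *lib,
                                 int (*share)(const Library *))
{
    Library *copy = malloc(sizeof(*copy));

    assert(copy != NULL);            /* sketch: no real OOM handling */
    *copy = *lib;
    for (int i = 0; i < lib->nlinks; i++) {
        if (!share(lib->links[i]))
            copy->links[i] = instance_library(lib->links[i], share);
    }
    return copy;
}
```

An in-context exec roughly corresponds to the `share_never` policy, since the new application must not share existing loaded libraries.  A real loader would also memoize instances so that a library reached through two chains is instanced only once.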

                                Secure Data Model

    The Rune machine does not use a secure data model on its own; that is up
    to the language generating the code.  However, the memory model is quite
    flexible, and the translator can use masking for data and call tables
    to impose fences around code and data without needing an MMU.
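The sketch below illustrates masking-based fences using software-fault-isolation-style masking.  That is a general technique and an assumption about how a translator might apply it here, not a description of the Rune translator.  It assumes a power-of-two data region and a call table padded to a power-of-two length.  All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sandbox: a power-of-two data region. */
#define SANDBOX_BITS 12
#define SANDBOX_SIZE (UINT32_C(1) << SANDBOX_BITS)
#define SANDBOX_MASK (SANDBOX_SIZE - 1)
static_assert((SANDBOX_SIZE & SANDBOX_MASK) == 0, "size must be a power of two");

static uint8_t sandbox[SANDBOX_SIZE];

/* Data fence: whatever address the code computes, masking keeps the
 * access inside the sandbox.  No MMU is involved. */
static uint8_t *fenced_ptr(uint32_t addr)
{
    return &sandbox[addr & SANDBOX_MASK];
}

typedef int (*Handler)(int);
static int h_inc(int x)   { return x + 1; }
static int h_dbl(int x)   { return x * 2; }
static int h_fault(int x) { (void)x; return -1; }   /* padding entry */

/* Call fence: the table is padded to a power-of-two length, so a masked
 * index always selects a valid entry. */
#define CALL_TABLE_LEN 4u
static const Handler call_table[CALL_TABLE_LEN] = {
    h_inc, h_dbl, h_fault, h_fault
};

static int fenced_call(uint32_t index, int arg)
{
    return call_table[index & (CALL_TABLE_LEN - 1)](arg);
}
```

Out-of-range values are not rejected, only folded back into the fenced region.  That is what lets this kind of fence work with no hardware support.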

                                Object Code Layout

    The object module is laid out in native endian in a manner which can be
    converted as needed.  If mixed-endian object files are linked together,
    all objects are converted to the machine's native-endian mode for
    processing.  It's easier and faster.

    All static data laid out in an object file must specify an object type
    or have a layout prototype so that its endian can be properly converted.
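A type or prototype is mandatory because byte-swapping must follow field boundaries.  This minimal sketch assumes a hypothetical prototype format: a list of field widths with no padding between fields.  That is not the actual Rune prototype record layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout prototype: the width in bytes of each field, in
 * order, with no padding between fields. */
typedef struct {
    const uint8_t *widths;
    size_t nfields;
} LayoutProto;

static void swap_bytes(uint8_t *p, size_t n)
{
    for (size_t i = 0; i < n / 2; i++) {
        uint8_t t = p[i];
        p[i] = p[n - 1 - i];
        p[n - 1 - i] = t;
    }
}

/* Convert one laid-out object between big and little endian, field by
 * field.  Without the prototype, the 4 bytes of a uint32_t and four
 * adjacent chars would be indistinguishable. */
static void convert_object(uint8_t *data, const LayoutProto *proto)
{
    size_t off = 0;

    for (size_t f = 0; f < proto->nfields; f++) {
        uint8_t w = proto->widths[f];
        assert(w == 1 || w == 2 || w == 4 || w == 8);
        swap_bytes(data + off, w);
        off += w;
    }
}
```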

    Instruction codes are also laid out in the selected endian mode.  32-bit
    instructions are organized as two 16-bit words, each word in the selected
    endian mode but organized with the MSB word first and the LSB word second
    (which is required for proper instruction decoding, since the instruction
    width is calculated from the first word).
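Decoding the two-word layout is straightforward.  The sketch below is a hypothetical illustration.  It deliberately does not compute the instruction width from the first word, because that encoding is not described in this document.

```c
#include <assert.h>
#include <stdint.h>

enum endian { RUNE_LITTLE, RUNE_BIG };

/* Read one 16-bit word stored in the selected endian mode. */
static uint16_t read_word(const uint8_t *p, enum endian e)
{
    assert(e == RUNE_LITTLE || e == RUNE_BIG);
    return e == RUNE_BIG ? (uint16_t)((p[0] << 8) | p[1])
                         : (uint16_t)((p[1] << 8) | p[0]);
}

/* A 32-bit instruction is two 16-bit words, MSB word first.  The first
 * word can be examined on its own, before the second word is read. */
static uint32_t read_insn32(const uint8_t *p, enum endian e)
{
    uint32_t hi = read_word(p, e);       /* MSB word comes first */
    uint32_t lo = read_word(p + 2, e);   /* LSB word comes second */
    return (hi << 16) | lo;
}

static void write_insn32(uint8_t *p, uint32_t insn, enum endian e)
{
    uint16_t w[2] = { (uint16_t)(insn >> 16), (uint16_t)insn };

    for (int i = 0; i < 2; i++) {
        if (e == RUNE_BIG) {
            p[2 * i] = (uint8_t)(w[i] >> 8);
            p[2 * i + 1] = (uint8_t)w[i];
        } else {
            p[2 * i] = (uint8_t)w[i];
            p[2 * i + 1] = (uint8_t)(w[i] >> 8);
        }
    }
}
```

In little endian mode, 0x12345678 is therefore stored as the bytes 34 12 78 56, not 78 56 34 12.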

    Instruction extension words are also laid out in the selected endian mode.
    Since instructions must be 16-bit aligned, 16-bit extension words work as
    you would expect.  However, because 32 and 64-bit extension words do not
    have to be naturally aligned, their layout is subject to a masking rule
    to avoid having to make any shifts during reconstruction.

    Let's take a 64-bit extension word which is misaligned by 16 bits.  In
    16-bit chunks the layout is:

    -XXXX---

    Here each character is one 16-bit chunk, '-' is unrelated data and X is
    part of the extension.  In order to reconstruct the 64-bit extension, the
    two naturally aligned 64-bit words covering it (-XXX and X---) are
    masked and combined:

    (0XXX | X000) -> endian conversion (if needed) -> result
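One reading of this masking rule is that a misaligned extension is stored pre-rotated.  Each 16-bit chunk would then already sit in the slot it occupies after an aligned load, so reconstruction needs only a mask and an OR.  The sketch below assumes exactly that reading.  The placement rule in `put_ext64` and the names `put_ext64` and `get_ext64` are assumptions rather than a documented RAS encoding.  `memcpy` stands in for the aligned loads a real decoder would use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed encoding rule: 16-bit slot k (0..3, memory order) of the 64-bit
 * value is stored at the stream position p with p % 4 == k, so a
 * misaligned extension is stored pre-rotated within its aligned words. */
static void put_ext64(uint16_t *stream, size_t pos, uint64_t value)
{
    uint16_t slot[4];

    memcpy(slot, &value, sizeof(slot));
    for (size_t i = 0; i < 4; i++)
        stream[pos + i] = slot[(pos + i) % 4];
}

/* Reconstruct using two aligned 64-bit loads, a mask and an OR: no shifts.
 * For pos % 4 == 1 this is exactly (0XXX | X000).  Endian conversion of
 * the result, if needed, is left out of this sketch. */
static uint64_t get_ext64(const uint16_t *stream, size_t pos)
{
    size_t base = pos & ~(size_t)3;   /* first chunk of the aligned word */
    size_t off = pos - base;          /* misalignment in 16-bit chunks */
    uint64_t lo, hi, mask;
    uint16_t m[4];

    memcpy(&lo, stream + base, sizeof(lo));
    if (off == 0)
        return lo;                    /* already aligned */
    memcpy(&hi, stream + base + 4, sizeof(hi));
    for (size_t k = 0; k < 4; k++)
        m[k] = (k >= off) ? 0xFFFF : 0;   /* keep slots off..3 of 'lo' */
    memcpy(&mask, m, sizeof(mask));
    return (lo & mask) | (hi & ~mask);
}
```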

    Object modules must explicitly delineate code and data for endian
    conversion to work properly:

    code        RO code
    const       RO data
    data        RW data pre-initialized
    bss         RW data zero-fill

    All procedures and library vectors must have correct relocation and
    prototyping records.

    All laid-out data in const and data sections must have correct relocation
    and prototyping records for object width.

                            Making Translation Easier

    The Rune object model is designed to make translation and later-stage
    optimizations trivial.  This is why the assembler enforces a fully
    independent register space, does not allow read-before-first-load
    (through deterministic analysis), implements the negative/positive frame
    spaces with special restrictions on the negative space, requires
    correct alignment for accesses, and does not allow condition codes to
    persist beyond the next instruction.

                            Negative Frame Space Domains

    This deserves a bit more explanation.  When using the %ap, %fp,
    and %db-relative negative frame space, you have to follow the same rules
    as you would for using registers.  That is, only discrete loads and
    stores are allowed, a store must be deterministically present before you
    can load the same location, you can't mix operation sizes, and there is
    no legal effective address.

    You can repurpose the negative frame space however you like, as long as
    it is deterministic.  That is, you could use -4(%fp), -3(%fp), -2(%fp),
    -1(%fp) to hold four 8-bit objects and then repurpose -4(%fp) to hold
    one 32-bit object once those four 8-bit objects are no longer needed.

    You can't store as one width and load as a different width.  The Rune
    assembler must be able to validate that the rules are followed, which
    it does through simple graph-based analysis.  Rune explicitly does NOT
    try to validate potentially complex interactions or perform constant
    arithmetic to determine what code paths might actually be followed.
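These rules can be checked mechanically.  The sketch below handles only a single straight-line path, which is a simplification of the graph-based analysis RAS performs.  At control-flow joins a real checker would also have to combine the per-path state.  The trace format and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical straight-line trace of negative-frame accesses. */
typedef struct {
    bool is_store;
    int32_t off;     /* negative offset, e.g. -4 for -4(%fp) */
    int32_t width;   /* access size in bytes */
} Access;

#define SPACE 64     /* tracks offsets -SPACE .. -1 */

/* For each byte, remember which store (offset, width) last covered it. */
typedef struct { int32_t off, width; } Owner;

/* Returns true if every load reads back exactly an earlier store of the
 * same offset and width, with no later store partially overlapping it. */
static bool check_trace(const Access *a, size_t n)
{
    Owner own[SPACE] = {{0, 0}};   /* width 0: nothing stored yet */

    for (size_t i = 0; i < n; i++) {
        int32_t off = a[i].off, w = a[i].width;

        if (w <= 0 || off < -SPACE || off + w > 0)
            return false;                      /* outside tracked space */
        for (int32_t b = off; b < off + w; b++) {
            Owner *o = &own[b + SPACE];
            if (a[i].is_store) {
                o->off = off;                  /* repurposing is allowed */
                o->width = w;
            } else if (o->off != off || o->width != w) {
                return false;   /* load before store, or width mismatch */
            }
        }
    }
    return true;
}
```

The repurposing example above passes.  Four 8-bit stores at -4..-1 are followed by a single 32-bit store at -4 that is then loaded back.  A 32-bit store followed by an 8-bit load at -4 is rejected.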

    To make things easier, you can declare simple .cache or .lcache objects
    and allow the Rune assembler to handle the overloading for you.  .cache
    labels are converted to valid negative offsets and are not exported.

    Rune itself requires that the negative frame space not exceed 32KB.
    Since the Rune assembler and various intermediate translation stages
    might need to allocate their own storage in the negative frame space,
    we restrict language output to the (-16384 to -1) range.  This
    restriction also serves to cap memory use by the various Rune stages
    when analyzing a program, guaranteeing that it remains within reason.

                                    File Formats

    Rune is meant to be an all-in-one environment.  Rune file formats are not
    ELF, and intentionally so.  All Rune files except source files are
    organized into a hunk file.  The top level of the file always contains a
    single hunk, and any extraneous data past the end of the hunk is ignored
    (allowing the whole file to be aligned if desired).

    class RuneHunk {
            uint32_t    type;           /* hunk type & recursion */
            uint32_t    flags;          /* type-specific flags */
            uint32_t    size;           /* hunk topology size */
            uint32_t    suboffset;      /* &type relative to sub-topology */
    };

    class MappedRuneHunk {
            RuneHunk    head;
            uint64_t    data_offset;    /* mapped data offset in file or 0 */
            uint64_t    data_size;      /* mapped size */
    };

    #define HUNK_TYPE_RECURSE   0x80000000
    #define HUNK_TYPE_MAPPED    0x40000000

    A hunk contains three components, all optional:

    (1) Embedded meta-data built into the hunk structure.
    (2) A memory-mapped component reference (typically 4KB aligned).
    (3) Embedded recursive hunk topology.

    The embedded meta-data occupies hunk-relative offset 0 through
    the suboffset.  If the hunk has a memory-mapped component, it is
    flagged in the type (it is part of the type id) and specified in
    fields just after the basic header.  The offset is relative to the
    base file or, if recursing through a higher-level memory-mapped
    component, relative to that higher-level memory-mapped component.

    The hunk can be flagged recursive.  If a hunk is recursive, the next
    level down is embedded as a series of hunks from (suboffset) relative
    to the hunk head to (size) relative to the hunk end.

    A memory-mapped component can represent a recursive hunk structure.
    In this situation the memory-mapped component acts like a file and
    contains a single HUNK (which might itself be recursive), with
    extraneous data ignored.  The offsets for any memory-mapped
    sub-components are relative to the top-level HUNK.
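A walker for the recursive topology is short.  This sketch rests on four assumptions.  First, `size` is the total byte length of a hunk measured from its head.  Second, child hunks are packed back-to-back from `suboffset` up to `size`.  Third, headers are in native endian.  Fourth, memory-mapped components are not followed.  `count_hunks` is a hypothetical name, and the C `RuneHunk` typedef merely mirrors the header fields listed above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HUNK_TYPE_RECURSE 0x80000000u
#define HUNK_TYPE_MAPPED  0x40000000u   /* not followed in this sketch */

typedef struct {
    uint32_t type;       /* hunk type & recursion */
    uint32_t flags;      /* type-specific flags */
    uint32_t size;       /* assumed: total bytes from head to hunk end */
    uint32_t suboffset;  /* offset from head to the sub-topology */
} RuneHunk;

/* Count every hunk in the topology starting at buf.  Bytes past the
 * top-level hunk's size are ignored.  Returns -1 if malformed. */
static long count_hunks(const uint8_t *buf, size_t len)
{
    RuneHunk h;

    if (len < sizeof(h))
        return -1;
    memcpy(&h, buf, sizeof(h));          /* native endian, per the text */
    if (h.size < sizeof(h) || h.size > len)
        return -1;

    long n = 1;
    if (h.type & HUNK_TYPE_RECURSE) {
        size_t off = h.suboffset;
        if (off < sizeof(h) || off > h.size)
            return -1;
        while (off < h.size) {           /* children packed back-to-back */
            uint32_t child_size;
            long c = count_hunks(buf + off, h.size - off);
            if (c < 0)
                return -1;
            memcpy(&child_size, buf + off + offsetof(RuneHunk, size),
                   sizeof(child_size));
            n += c;
            off += child_size;           /* >= 16, checked by recursion */
        }
    }
    return n;
}
```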

    This document does not describe the precise hunking format, but it
    roughly follows the abstractions the Rune language itself needs to
    operate.