                            RUNE README
                           14 April 2016
                           Matthew Dillon

    This archive is the Rune language suite and tool chain, written by
    Matthew Dillon.  It is distributed under the Rune copyright, which
    can be found in the COPYRIGHT file.  This is basically an open-source
    copyright but with some restrictions that disallow project forking
    for a period of time, and a few other things.  It will eventually
    sunset into a 2-clause BSD copyright and I may change the dates to
    sunset sooner, but for the moment I need to keep relatively tight
    control over the project.

    Rune is a powerful object-oriented language with built-in
    high-performance threading, automatic object locking, bounds
    checking, and many other powerful features.

    The current distribution implements the Rune interpreter and the
    Rune code generator.  The interpreter should be easy to port to
    Linux (it's DragonFly/BSD-specific at the moment).  The code
    generator is x86-64 (64-bit Intel) only for now.  We will also be
    soliciting help writing an ARM generator or making the LLVM backend
    work (a lot more difficult than it might seem; I immediately hit
    regressions in LLVM that result in 10-20 minute LLVM compilation
    times for simple programs).

    Rune uses the standard BSD build system.

    This README is meant only as a 'getting started' overview.  More
    extensive documentation is in docs/ and, frankly, at least until we
    get people on board to help write documentation, understanding the
    language mostly means going through the examples in classes/* and
    tests/*.

    libruntime  - Support library for rune utilities and runtime for
                  rune binaries.
    librune     - Rune Language parser and resolver.
    libgen      - Rune Language Interpreter and Code Generator.
    ext_x11     - Basic X11 interface (dynamically loaded at parse-time
                  and run-time), demonstrative of how to add core APIs
                  to the language.
    rune        - RUNE language front-end utility.
    ras         - RAS Rune Assembler utility.
    classes     - Rune Language - base classes shipped with the
                  language (gadgets gfx stdio sys etc)
    tests       - Rune Language - Test programs.
    docs        - Documentation (mostly HTML).

REPOSITORY

    You can obtain the repository with the following GIT command:

        git clone git://git.dragonflybsd.org/rune.git

    If you are a DragonFly committer with permission to commit to the
    master repo, we recommend these additional steps after creating the
    initial clone:

        # configure reasonable push defaults and set your username and
        # email address for pushes.
        #
        git config --global push.default simple
        git config --global --edit

        # Then edit .git/config and change the repository to the master
        # repo for pushes by changing the url line:
        #
        vi .git/config
        (edit the file, change the url line)
            url = ssh://crater.dragonflybsd.org/repository/git/rune.git

BUILDING

    On a DragonFly system the repo can be built as a user or as root.

    as root - (recommended) This will install Rune in /usr/local/rune/
              plus two symlinks in /usr/local/bin for 'ras' and 'rune'.

    as user - (not recommended) This will install Rune in ~/.rune/ plus
              two symlinks in ~/bin/ for 'ras' and 'rune'.  Be sure that
              you have a ~/bin for your user and that your $PATH
              includes it.

    You can also build with or without an independent object hierarchy.
    We recommend building with one (i.e. use make obj).  Make sure the
    source tree is clean before creating the object hierarchy.  That way
    the source tree will remain clean.

    The suggested build sequence:

        # WARNING! Be sure not to mix builds without the object hierarchy
        #          with builds with the object hierarchy.  Your source
        #          tree should remain clean when building with the object
        #          hierarchy so make sure there aren't any junk object or
        #          other files in it.
        #
        make clean cleandepend
        make obj
        make depend
        make -j 8
        make all install

TESTING

    Interpreter.  From the rune repo:

        tests/hello.d

    Compiler (x86 64-bit only for now) examples.  You can generate
    intermediate Rune assembly by specifying a '.m' suffix, or x86
    assembly with a '.s' suffix.
    Otherwise rune will do everything needed to create a binary.
    Examples:

        rune tests/hello.d -o x
        ./x

        # Generates intermediate RUNE assembly file only.  This stops
        # after the rune build stage and does not run 'ras'.
        #
        rune tests/hello.d -o x.m

        # Generates machine assembly file only.  This stops after
        # 'ras' is run and does not run the platform assembler or linker.
        #
        rune tests/hello.d -o x.s

        # Generates an actual binary.  This runs all stages, through ras,
        # and into the platform assembler and linker to generate a binary.
        # The binary requires the rune infrastructure in ~/.rune or
        # /usr/local/rune (depending on how you built rune) to operate.
        #
        rune tests/hello.d -o x -g

    X11 basic tests w/threads.  This will create a window with 900 simple
    input requestors, each with a blinking cursor (which is a thread, so
    900+1 Rune threads and typically ~4-8 pthreads).

    Under the interpreter:

        tests/gfxinput2.d

    And under the compiler:

        rune tests/gfxinput2.d -o x
        ./x

Interpretation

    You can interpret a rune script by making the script executable and
    giving it a script startup line as its first line:

        #!/usr/local/bin/rune -x

    See the examples in tests/*.d.  The interpreter can run just about
    everything, but threading is still a bit primitive and it will have
    problems with the few things in Rune which don't use the event
    handler and block in real system calls.

    Be sure to use the -x option above to tell rune to stop argument
    processing after the first filename, so additional arguments you
    supply when you run it get fed into your rune program verbatim.

    The interpreter optimizes itself on the fly and is very fast,
    probably near the top-end of performance for non-JIT interpreters.
    Nominal code will run only 4-8x slower and tight loops only 25x
    slower in the interpreter.  For example, on a 3.4GHz Haswell the
    interpreter can execute a tight loop

        for (i = 0; i < 1000000000L; ++i);

    with an overhead of around 7ns per loop, which is significantly
    better than most other interpreted languages.
Code Generation

    Currently the only operational code generator is the 64-bit x86
    backend in the rune assembler.  This is the default code generation
    path which you get when you specify an output file on the rune
    command line.

    The llvm backend is currently non-operational.  I was able to get it
    to work previously but it's a real bitch to interface to and I seem
    to be hitting degenerate conditions which cause it to twiddle its
    thumbs for a very long time on simple programs (sometimes upwards of
    15 minutes!).  So I've had to set it aside for now.

    Rune generates pseudo-assembly output that it feeds into RAS (the
    Rune assembler), which then optimizes it and converts it to x86.
    That is fed into the normal compiler backend ('as', then 'cc' in
    linker mode).  RAS makes use of a number of special ELF features
    such as section grouping and weak stubs.  My original intent was to
    develop an entire toolchain but it simply became too much work, so
    Rune generates assembly.

    Currently Rune cannot really generate independent object modules for
    Rune source files.  It really needs to generate a single output file
    which gets fed through the assembler, compiler, and linker.
    Ultimately the intent is to be able to generate an object module at
    each library demarcation.

    --

    Rune's pseudo-assembly implements a nice orthogonal instruction set
    along with some meta-instructions for locking and ref-counting to
    interface with the runtime.  Rune generates the assembly (you can
    see it if you output to a *.m file).  The assembly implements an
    infinite register set and RAS is capable of optimizing stack-based
    memory objects into registers when given appropriate hints, which
    RUNE generates.

    RAS implements a fairly sophisticated register optimizer.  It
    certainly can't compete with GCC but it's fast and it works pretty
    well.  It converts the pseudo-assembly into a basic-block model,
    adding JMPs as needed (it does not require RUNE to generate basic
    blocks as output).
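
    As a rough illustration of what converting a linear instruction
    stream into a basic-block model involves, here is a minimal sketch
    in C.  It is not RAS's actual code or algorithm: the opcode set,
    the struct layouts, and the split_blocks() function are all
    hypothetical names invented for this example.  A block ends at any
    branch or return, or just before a label, and a block that would
    otherwise fall through into its successor is flagged as needing an
    explicit JMP.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified opcode set; NOT the real RAS instruction set. */
enum op { OP_OTHER, OP_LABEL, OP_JMP, OP_BCC, OP_RET };

struct insn {
    enum op op;
    int     target;     /* label id for OP_LABEL/OP_JMP/OP_BCC, else unused */
};

struct block {
    size_t first;       /* index of the first instruction in the block */
    size_t last;        /* index of the last instruction in the block */
    bool   needs_jmp;   /* block falls through; an explicit JMP must be added */
};

/* Branches and returns always terminate a basic block. */
static bool ends_block(enum op op)
{
    return op == OP_JMP || op == OP_BCC || op == OP_RET;
}

/*
 * Split a linear instruction stream into basic blocks.  A block is closed
 * after a branch/return, before a label, or at the end of the stream.
 * Returns the number of blocks written to out[] (never more than max).
 */
size_t split_blocks(const struct insn *code, size_t n,
                    struct block *out, size_t max)
{
    size_t nblocks = 0;
    size_t start = 0;

    assert(code != NULL && out != NULL);

    for (size_t i = 0; i < n; ++i) {
        bool last_insn = (i + 1 == n);
        bool next_is_leader = !last_insn && code[i + 1].op == OP_LABEL;

        if (!ends_block(code[i].op) && !next_is_leader && !last_insn)
            continue;
        if (nblocks == max)
            break;

        out[nblocks].first = start;
        out[nblocks].last = i;
        /*
         * Anything other than an unconditional JMP or a RET can continue
         * into the following block (including the not-taken side of a
         * conditional branch), so it needs an explicit JMP once blocks
         * are free to be reordered.
         */
        out[nblocks].needs_jmp = !last_insn &&
            code[i].op != OP_JMP && code[i].op != OP_RET;
        ++nblocks;
        start = i + 1;
    }
    return nblocks;
}
```

    Under these assumptions, a six-instruction stream of the form
    "other, conditional branch, other, label, other, return" splits
    into three blocks, the first two of which are flagged as needing
    an added JMP.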

    RAS implements graph-based object life calculations for cacheable
    stack-based memory objects and virtual registers and can do some
    instruction pruning based on that.  It also implements
    extra-register spills around calls if it thinks they are needed.
    It implements conditional optimizations and is capable of both
    reversing AND inverting conditional tests to produce relatively
    optimal code flow.

    RAS does not implement constant expression collapses; that's
    actually something the RUNE frontend does during the compilation
    process, so RAS does not have (and does not need) any sort of
    complex expression handler.

    RAS implements a number of simple instruction optimizations,
    particularly when zeroing small blocks with BZERO, using the
    instruction extension as an alignment hint.

    RUNE and RAS implement 32, 64, and 128-bit floats but do not
    implement any floating point vectorization, and in fact I might
    never implement vectorization as a native feature.  What I may do
    instead is implement core types for matrices and vectors, plus the
    basic operations on them (there aren't actually too many), that are
    vectorized.  That is way in the future though.

    The Rune call model is *NOT* heavily optimized yet.  It uses a
    memory vector for both call arguments AND return values which is
    not yet registerized.  Also the mandatory locking is not heavily
    optimized yet, so there is a lot of overhead associated with object
    locks and refs.  Despite that, Rune binaries should run quite fast.

    --

    Rune's run-time threading model is very sophisticated.  Basically it
    uses sigjmp_buf contexts (sigsetjmp()/siglongjmp()) without signal
    masking, which is pretty much the ultimate in terms of switching
    speed and compactness (a switch spills roughly 14 registers and
    completes without having to make any system calls).  Rune uses N:M
    threading so the number of pthreads it creates is relatively
    limited, making Rune threads EXTREMELY light-weight.  Since Rune is
    a threading-centric language this winds up being very important.

    The tests/gfxinput2.d test program demonstrates light-weight
    threading.
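
    To make the "without signal masking" part concrete, here is a
    minimal C sketch of the POSIX calls involved.  It is not taken from
    the Rune runtime: sched_ctx, yielded_value, worker(), and
    run_worker_once() are hypothetical names, and the sketch only jumps
    back up a single stack.  A real N:M scheduler would also give every
    thread its own stack and saved context, which is not shown here.
    The point is the second argument to sigsetjmp(): passing 0 tells it
    not to save the signal mask, so siglongjmp() does not restore it.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>

/* Hypothetical saved "scheduler" context; not a Rune runtime symbol. */
static sigjmp_buf sched_ctx;

/* Value handed back from the worker to the scheduler side. */
static volatile int yielded_value;

/* Hypothetical worker: records a value, then jumps back to sched_ctx. */
static void worker(int value)
{
    yielded_value = value;
    siglongjmp(sched_ctx, 1);   /* does not return */
}

/*
 * Save the current context WITHOUT the signal mask (savemask == 0),
 * run the worker, and resume here when it jumps back.
 */
int run_worker_once(int value)
{
    if (sigsetjmp(sched_ctx, 0) == 0) {
        /* First pass: context saved, hand control to the worker. */
        worker(value);
    }
    /* Second pass: we arrive here via siglongjmp() from the worker. */
    return yielded_value;
}
```

    Calling run_worker_once(42) returns 42 after one round trip through
    the saved context.  With a nonzero savemask the same calls would
    also save and restore the signal mask on every switch, which is the
    extra work the paragraph above says Rune avoids.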

Documentation

    The documentation will improve as the project progresses.  For the
    moment you will need to learn by example, primarily by looking at
    the various tests in tests/*.d and the class hierarchy in classes/*.
    The docs/grammer.html and docs/overview.html files are fairly
    complete.