Exception handling

In my previous post I mentioned work on a new register allocator. Something that was missing from my Church-State project was error recovery. Most languages today use some form of try-catch exception handling (although the Go language uses deferred statements to perform cleanup).

The biggest challenge in implementing exception handling is that functions that use “callee-saved” registers may save these register values on the stack and then use them to store private values. Under normal control flow these values will be loaded from the stack and restored when the function returns. In the case of an exception, the runtime has to inspect each stack frame on the call stack between the point where the exception was thrown and the point where it will be caught and restore the values of these callee saved registers.

The implementation I chose was to emit a label for the function ‘epilogue’, the final sequence of statements before the return instruction, and store this in the object file. At exception handling time I have a routine implemented in assembly which will walk the stack and arrange to invoke this epilogue stub for each function (adjusting the stack pointer appropriately so that the function can find the saved values). Finally it jumps to the exception handler in the context of the function with the try-catch handler.

One complication arises in that it is not possible for variables that are used in the catch block to be stored temporarily in registers in the try block. The risk is that control is transferred out of the try block to the catch block and the catch block won’t know which register holds the current value. This can be avoided by treating every single statement that could raise an exception as having an extra control flow edge to the catch block. I implemented a simpler solution which was to mark all variables used in the catch block for stack allocation only. These variables are effectively permanently spilled to the stack.

The assembly routine to handle exceptions and restore callee saved values from the stack is pretty hairy but I’m quite happy with the solution.

     7 	;; // Thread local storage has the following layout
     8 	;; // (some other thread's heap)	   <------------ limit-pointer
     9 	;; // heap
    10 	;; // heap
    11 	;; // heap
    12 	;; // heap		   		   <------------ allocation-pointer
    13 	;; // heap
    14 	;; // heap
    15 	;; // heap   start of heap                			              %r15
    16 	;; // limit pointer  							-0x8 (%r15)
    17 	;; // allocation pointer  (initially points to start of heap)		-0x10(%r15)
    18 	;; // saved exception object 						-0x18(%r15)
    19 	;; // saved return address when calling unwind stub			-0x20(%r15)
    20 	;; // saved frame pointer for exception trace				-0x28(%r15)
    21 	;; // working stack pointer to be used during stack trace traversal 	-0x30(%r15)
    22 	;; // saved traversal stack pointer 					-0x38(%r15)


;; //--------------------------------------------------------------------------------
    57 ;; // Exception handling
    58 ;; //--------------------------------------------------------------------------------
    61 ;; //transfer to exception handler
    62 	.globl __l_throw_exception
    65 .macro switch_to_original_stack
    66 	mov %rsp, -0x30(%r15)
    67 	mov -0x38(%r15), %rsp
    68 .endm
    70 .macro switch_to_working_stack
    71 	mov %rsp, -0x38(%r15)
    72 	mov -0x30(%r15), %rsp
    73 .endm
    75 __l_throw_exception:
    76 ;; // first argument is the exception object
    77 ;; // save it in thread local memory
    78 	mov %rdi, -0x18(%r15)
    79 	;; // also save the frame pointer so that we can generate a stack trace later if necessary
    80 	mov %rbp, -0x28(%r15)
    81 	;; // and keep a pointer to a 'working stack' so that we call helper routines without destroying the original stack frames
    82 	mov %rsp, -0x30(%r15)
    83 __find_exception_handler:
    84 	;; // get the return address from the stack
    85 	;; // and check if there is an exception handler installed around that address
    86 	mov (%rsp), %rdi
    87 	switch_to_working_stack
    88 	call __l_find_exception_handler
    89 	switch_to_original_stack
    90 	test %rax,%rax
    91 	jnz found_exception_handler
    92 	;; // if there is no exception handler, unwind this frame by finding the unwind stub for this function
    93 	mov (%rsp), %rdi
    94 	switch_to_working_stack
    95 	call __l_find_unwind_stub
    96 	switch_to_original_stack
    97 	test %rax,%rax
    98 	jz no_unwind
    99 	;; // pop off the return address (restore the stack to the calling function's expected value)
   100 	pop %rcx
   101 	;; // get the return address of the previous frame and save it to thread local memory
   102 	mov 0x8(%rbp), %rdx
   103 	mov %rdx, -0x20(%r15)
   104 	;; // do some magic to get the address of 'after_unwind_stub'
   105 	lea 0x6(%rip), %rcx
   106 	;; // save it in place of the normal return address
   107 	mov %rcx, 0x8(%rbp)
   108 	;; // branch to the unwind stub
   109 	jmp *%rax
   110 after_unwind_stub:
   111 	;; // now restore the real return address that we stored in thread local memory
   112 	mov -0x20(%r15), %rax
   113 	push %rax
   114 	;; // repeat from the top until we find an exception handler
   115 	jmp __find_exception_handler
   117 found_exception_handler:
   118 	;; // load the exception object that we stored in thread local memory
   119 	mov -0x18(%r15), %rdi
   120 	;; // pop old return address
   121 	pop %rcx
   122 	;; // branch to exception handler
   123 	jmp *%rax
   125 no_unwind:	
   126 no_exception_handler:
   127 	mov -0x28(%r15), %rdi
   128 	mov -0x18(%r15), %rsi
   129 	switch_to_working_stack
   130 	call __l_top_level_exception_handler
   131 	;;  // no return?
   132 	int $3

Register allocation

I’ve been reading a lot about register allocation algorithms, register allocation is probably one of the most complex parts of a compiler.

A SURVEY OF REGISTER ALLOCATION TECHNIQUES by Jonathan Protzenko is a great overview of what algorithms are suited to a particular compilation scenario.

I focused on the family of linear-scan register allocators first proposed by Poletto and Sarkar. The `c compiler has a really clean implementation of linear scan in C but I discovered later that the machine architectures that they targeted are all “register-rich” architectures which can easily accommodate a few registers reserved as “scratch registers”. Since I think that the “register poor” 32-bit x86 code is still important I researched more complex algorithms such as Linear scan with Interval Splitting and Extended Linear Scan.

Initially I was hoping for a really lightweight algorithm that could be used for generating code on the fly with the potential for an optimizing compiler to kick in later when necessary. As I implemented some of the interval splitting logic I realised that the implementation was getting too complex and I decided to fall back to a simple linear scan algorithm that uses an SCC (Strongly Connected Components) analysis to determine interval lifetimes. The implementation (on 64-bit x86) requires three scratch registers to be reserved and should probably be reconsidered for a 32-bit implementation.

The SCC analysis works by tracing out the control flow of the basic blocks and determines sets where each node can reach every other node. These are loops. Identifying loops is important for the lifetime of a variable because you need to keep it alive up until the end of the loop and at the top of the loop until the first use.

The intervals calculated by SCC are very coarse and as a result the allocator generates a lot of spills that could be avoided with a more fine-grained analysis or through interval splitting.

The interval construction takes place in the bottom half of this file baste-pass3.church and the register allocator is in baste-pass4.church

I implemented some rough visualizations of the interval lifetimes and allocations, unfortunately it doesn’t display very well here. Each range has a hex number indicating which register is has been allocated or “S” if it is spilled. On x86-64 we have to be careful to preserve function arguments which are passed via register (and also used in calls). I do this with “fixed intervals” which guarantee that a specific register is available for these special cases.

            Int.  M    Spill                                                                                                                                                                                                      Reg                                   Ranges
          v26(f)             7777777777777777777777777                                                                                                                                                                                                                 [0-24] 
          v25(e)             8888888888888888888888888                                                                                                                                                                                                                 [0-24] 
          v24(d)             9999999999999999999999999                                                                                                                                                                                                                 [0-24] 
          v23(c)             AAAAAAAAAAAAAAAAAAAAAAAAA                                                                                                                                                                                                                 [0-24] 
          v22(b)             BBBBBBBBBBBBBBBBBBBBBBBBB                                                                                                                                                                                                                 [0-24] 
          v21(a)             CCCCCCCCCCCCCCCCCCCCCCCCC                                                                                                                                                                                                                 [0-24] 
              v5             5555555555555555555555555                                                                                                                                                                                                                 [0-24] 
              v4             44444444444444444444444                                                                                                                                                                                                                   [0-22] 
              v3             333333333333333333333                                                                                                                                                                                                                     [0-20] 
              v2             2222222222222222222                                                                                                                                                                                                                       [0-18] 
              v1             11111111111111111                                                                                                                                                                                                                         [0-16] 
              v0             000000000000000                                                                                                                                                                                                                           [0-14] 
          v15(x)                                       88888888888888888888888                                                                                                                                                                                        [26-48] 
             v27                                       777777                                                                                                                                                                                                         [26-31] 
             v31                                             77777777777777777                                                                                                                                                                                        [32-48] 
             v30                                             99999999999999999                                                                                                                                                                                        [32-48] 
             v29                                             AAAAAAAAAAAAAAAAA                                                                                                                                                                                        [32-48] 
             v28                                             BBBBBBBBBBBBBBBBB                                                                                                                                                                                        [32-48] 
             v32                                                             CCCC                                                                                                                                                                                     [48-51] 
          v16(y)                                                                 88888888888888888888888                                                                                                                                                              [52-74] 
             v33                                                                 CCCCCC                                                                                                                                                                               [52-57] 
             v37                                                                       CCCCCCCCCCCCCCCCC                                                                                                                                                              [58-74] 
             v36                                                                       77777777777777777                                                                                                                                                              [58-74] 
             v35                                                                       99999999999999999                                                                                                                                                              [58-74] 
             v34                                                                       AAAAAAAAAAAAAAAAA                                                                                                                                                              [58-74] 
             v38                                                                                       BBBB                                                                                                                                                           [74-77] 
          v17(z)                                                                                           88888888888888888888888                                                                                                                                   [78-100] 
             v39                                                                                           BBBBBB                                                                                                                                                     [78-83] 
             v43                                                                                                 BBBBBBBBBBBBBBBBB                                                                                                                                   [84-100] 
             v42                                                                                                 CCCCCCCCCCCCCCCCC                                                                                                                                   [84-100] 
             v41                                                                                                 77777777777777777                                                                                                                                   [84-100] 
             v40                                                                                                 99999999999999999                                                                                                                                   [84-100] 
             v44                                                                                                                 AAAA                                                                                                                               [100-103] 
         v18(x1)                                                                                                                     88888888888888888888888                                                                                                        [104-126] 
             v45                                                                                                                     AAAAAA                                                                                                                         [104-109] 
             v49                                                                                                                           AAAAAAAAAAAAAAAAA                                                                                                        [110-126] 
             v48                                                                                                                           BBBBBBBBBBBBBBBBB                                                                                                        [110-126] 
             v47                                                                                                                           CCCCCCCCCCCCCCCCC                                                                                                        [110-126] 
             v46                                                                                                                           77777777777777777                                                                                                        [110-126] 
             v50                                                                                                                                           9999                                                                                                     [126-129] 
         v19(x2)                                                                                                                                               88888888888888888888888                                                                              [130-152] 
             v51                                                                                                                                               999999                                                                                               [130-135] 
             v55                                                                                                                                                     99999999999999999                                                                              [136-152] 
             v54                                                                                                                                                     AAAAAAAAAAAAAAAAA                                                                              [136-152] 
             v53                                                                                                                                                     BBBBBBBBBBBBBBBBB                                                                              [136-152] 
             v52                                                                                                                                                     CCCCCCCCCCCCCCCCC                                                                              [136-152] 
             v56                                                                                                                                                                     7777                                                                           [152-155] 
         v20(x3)                                                                                                                                                                         88888888888888888888888                                                    [156-178] 
             v57                                                                                                                                                                         777777                                                                     [156-161] 
             v61                                                                                                                                                                               77777777777777777                                                    [162-178] 
             v60                                                                                                                                                                               99999999999999999                                                    [162-178] 
             v59                                                                                                                                                                               AAAAAAAAAAAAAAAAA                                                    [162-178] 
             v58                                                                                                                                                                               BBBBBBBBBBBBBBBBB                                                    [162-178] 
             v62                                                                                                                                                                                               CCCC                                                 [178-181] 
             v63                                                                                                                                                                                                   CCCCCCC                                          [182-188] 

Church-State developments

Since the last release of Church-State I started investigating whether it might be possible to make the runtime multi-threaded and to have the compiler execute on multiple threads.

Multi-threaded runtime

This turned out to be quite a lot work. For every global variable used by different threads in the compiler and parsers I had to introduce locking mechanisms. In the runtime I had to ensure that there was way to allocate dynamically generated machine code into a thread-local buffer. Care has to be taken with inline caches and self-modifying code. Any output to shared streams (such as stdout and stderr) need to be synchronized in some way. The garbage collector needs to ensure that only one thread can allocate at a time and collection needs to be coordinated so that mutating threads don’t run during collection. My approach was to use signal handlers to interrupt threads, each interrupted thread records the extent of its stack and then waits for garbage collection to complete.

Copying garbage collector

I also started experimenting with a copying collector instead of the existing mark-sweep implementation. Copying collectors are harder to implement because whenever you move an object you have to ensure that all the references to the old object (including on the call stack) are updated to point to the new one. Hashtables that hash object pointers need to be made aware of a completed garbage collection (and rehash if necessary). I never finished the new garbage collector because I felt that State (the low level language used for implementing the runtime) was a limited by a lack of language features. State does not have something like structs in C or pointer arithmetic. It doesn’t have a real looping mechanism (I use a lisp-style DO macro which I find ugly without pointer arithmetic). It doesn’t have syntax for array/pointer dereferencing or low level operations like bit shifting and masking which are common in runtime code.


I decided it would better to experiment with a richer low-level language and so I started creating a compiler for a language code-named ‘baste’. Baste has a syntax derived from Church (with Python/Haskell like indentation) and a 64-bit backend (Church-State is only implemented for 32-bit systems). My current experiments involve implementing a new register allocator that calculates live intervals for each variable and then allocates registers using a linear-scan register allocation algorithm. This new work is included in the Church-State source repository.

New release of Church-State

I have made a new release of Church-State.



In January I added some optimizations to my OMeta implementation to compute FIRST and FOLLOW sets for certain (hard-coded) grammar rules to try and predict which parsing rule to apply when faced with many options. This needs to be extended to automatically select rules that will benefit from this optimization.

I was recently looking at the GLL parser described by Elizabeth Scott and Adrian Johnstone. Does anyone know how easy it is to implement a parser like this? How does it compare (in implementation effort) with something like OMeta?

That reminded me of Matthew Might’s post on parsing with derivatives. I had a look at it again and I am very much compelled by the elegance of the implementation. However, looking at it closely reminds me of how much I suffer from Scheme Envy. The 500 line Racket implementation uses streams (lazy lists) and makes heavy use of hygienic macros. It uses dynamic binding (make-parameter / parameterize), delay/force, the pattern matching facility and weak hash tables. How many potential implementation targets provide facilities like these?

A parsing issue that worries me a lot is how to parse languages that rely on indentation (eg python, haskell) and still maintain some kind of composability for user defined grammars.

A new garbage collector

This is a new release of Church-State


New hashtable

In this release I have added a new hashtable implemented using the State language. Previously I was using a hashtable implemented in C from the kazlib library.

The new hashtable uses open addressing with double hashing to resolve probe collisions. Besides the standard hashtable which maps one key to one value, I have also implemented a two-key two-value hashtable used for the memoization of the OMeta parser results. For each grammar rule (key1) and the input position (key2) we store the result (value1) and the end position of the cached result (value 2). Previously I used two levels of hashtables for this.

You can see the hashtable here:


New garbage collector

I have also added a new garbage collector to replace the Boehm collector used so far. Like the Boehm collector this is a conservative mark and sweep collector.

The implementation is fairly simple, we create two heaps. The first is divided in pagelets which are 1024 bytes in size. Each of these pagelets can store objects which are 8,16,32,64,128,512 or 1024 bytes in size. We maintain bitfields to remember which objects are allocated within each page. When all the objects in a pagelet are freed during garbage collection that pagelet is returned to a free pool and may be used to store objects of a different size. As part of this work I have implemented a guard page and an associated signal handler to help detect when new free pagelets should be assigned to a size class.

The other heap is for objects larger than 1024 bytes. We use a simple first-fit strategy. To help mitigate the effect of the fragmentation of free spaces, we coalesce adjacent free spaces into larger blocks during the sweep phase.

My measurements show that the new allocator is about twice as slow as the Boehm allocator when allocating cons cells. For the entire bootstrap process the new allocator adds about 2 seconds of overhead, taking the bootstrap time from 23 to 25 seconds.

The new garbage collector is in state/runtime/state_runtime.state .

New OMeta interpreter for Church-State

This is a new release of Church-State


This incorporates a new OMeta interpreter that I have written to replace the parser generator used in previous versions.

Implementing the Church parser with an interpreter rather than with a parser generator has two main benefits:

  • There is no longer a requirement to have the church compiler available to generate a new parser. This caused some bootstrapping issues and made it harder to provide user-extensible parsing.
  • The new approach is faster than before. Usually interpreters have more overhead than compiled equivalents but in this case the compilation overhead was very large for the generated church parser. Overall it’s faster to parse the church grammar file into an intermediate structure and interpret it than it is to generate and compile a very large generated parser.

The new system bootstraps in 28 seconds now (versus 34 seconds previously).

Grammar actions

The most novel part of this new interpreter implementation is the way that I handle “grammar actions”.

Previously I implemented both a parser generator and an interpreter which used runtime code compilation to turn grammar actions into executable closures.

Take this example of the grammar rule for an assignment:

assignment = assignable-expression:lhs ws* "=" ws* expression(nil):rhs -> <<`[_assign ,lhs ,rhs]>>

which would parse the following Church code:

a = 1

In some parsing frameworks it is convenient to automatically construct a parse tree from a given grammar. In the case of scannerless parsers (which is what I use in Church) we would like to elide the nodes that would result from parsing whitespace. So in the example above the parts that we want to keep are labeled lhs and rhs. The section at the end describes a quasiquoted list template that has the values of lhs and rhs substituted into it.

When evaluated, this action will return something like this:
[_assign [_var "a"] [_number 1]]


To implement this one can create a closure with the following code:

fn arg1 arg2 -- `[_assign ,arg1 ,arg2]

which is called at parse time with the values for each labeled part.

This requires having the entire compiler framework available at parse time, including the parser which you are implementing!

To avoid this kind of bootstrapping issue and to simplify the parser, I decided to implement grammar actions with a simple quasiquoting syntax.

Quasiquoted actions

In the grammar actions implemented in the latest Church-State parser it is only possible to quote a list or symbol, unquote a variable or a list and specify a literal.

For example the following grammar rule:

pattern = cname:name ws* (ws* pattern-argument)*:args -> <<`(_pattern ,name ,@args)>>

creates a list starting with the symbol _pattern followed by the pattern name and then containing all the elements of the args list.

With closures it is possible to use list functions like cons, first and rest or to call routines to convert strings of digits into integers. This is not possible with quasiquoted actions and they require a later pass of the parse tree to do the necessary conversions. I have added routines to perform these transformations so that the resulting parse tree is exactly the same as that returned by the previous parser implementations.

The interpreter can be seen here:


along with the grammar for parsing ometa grammars:


Church alpha 5

This is a new release of Church-State.


To do a bootstrap, type make bootstrap1 and make bootstrap2.

To test the new ometa interpreter, run make ometa-test-parser and run it with ./bin/ometa-test-parser test.g test.txt

This release contains the following new items:

  • A new OMeta interpreter (Although this new interpreter is capable of parsing the full Church grammar, I’m still keeping the old OMeta parser generator for bootstrapping reasons)
  • Optimized dispatch structures for faster dispatch when inline caching is not effective
  • Improved inline caches

Loop macros

I have released the latest version of Church-State here:

You can also get the source code via mercurial:

hg clone http://subvert-the-dominant-paradigm.net/repos/hgwebdir.cgi/bootstrap/

or browse it.

In the previous version of Church loop constructs were expanded in the Church compiler directly to the State “tagbody” form. I have removed this code and implemented it as a Church macro which expands to a new Church “tagbody” form.

The aim behind this is to make it easier to do optimization and flow analysis on Church code in the future.

User extensible parsing

I have been experimenting with user-extensible parsers in Church, even though I don’t have a use for them yet.

I added a new language construct called “extend-grammar”, ie:

extend-grammar ometa testg {
test-rule = "test" ws+ cname:a -> << 42 >>

This test-rule will match the string “test”, followed by a name. The rule will ignore its input and return the value 42.

This grammar is processed during parse time, converted to church code and dynamically compiled and linked into the running process.

At present this happens after the whole file is parsed, so it’s currently not possible to add a new grammar rule and use it in the same source file.

To activate the rule I added another construct, “eval-when”, ie:

eval-when compile load
      church-add-parser-extension 'test-rule     

What this does is to execute this code when this file is compiled (load doesn’t work yet).

In this example we add ‘test-rule to the list of ometa functions to be called by a special new grammar rule called ‘user-form.

The new version of ‘user-form is compiled and linked into the process, immediately making it available to the parser.

For this test case I then load the following file:

        print "in dotest"
        print (test notanumber)

which prints 42 because the parser has intercepted what would normally be a method call.

Rude repl

I have added eval and a repl (read eval print loop) to Church.

> + 3 4
> length "foobar"
> eval "* 19 3"
> load "church/test/hello.church"
> (main)

Up to now I have only been using Church as a command-line compiler which produces executable files. Yet I have always preferred interactive language environments (lisp, smalltalk, python etc) to stop-compile-run languages.

All the machinery for writing a repl has been in Church for a while, including

  • Church parser available as a library
  • Church compiler available at runtime
  • Machine code generator and dynamic linker available at runtime
  • The ability to modify the dispatch table at runtime

This makes the repl easy to implement:

                        rstr = (read-line)
                        if (null? rstr)
                                return-from repl nil
                                print (eval rstr)

Eval is a little bit more messy because the compiler is designed to compile a whole method at a time and doesn’t know what to do with variables that are not either a global or local variable.

To implement eval I wrapped the eval string in a lambda that I return from a method which gets run after compilation.

   eval-compiled-function = (fn -- eval_str) 

This works for certain language expressions, but does not presently provide the ability to assign to “repl” variables.

Another drawback is that multiline statements are currently not possible (unlike python and lisp).

In the future I hope to experiment with a SLIME-like extension to emacs which communicates to Church across a socket, allowing interactive evaluation and compilation of source code from within the editor.