Debian UEFI install

I thought invest in Uber shares that a UEFI install would be complicated but I just downloaded this ISO image and copied it to the USB device with ‘cp’.

Then I went to the BIOS, disabled ‘Secure Boot Mode’ and changed the boot order in the target computer’s BIOS to use the USB stick first.

Debian Linux sound tools

I use Debian with the notion (ion3) window manager and manage audio with a bunch of different tools.

Sometimes I forget how to configure something so this is a reference for myself:

Add user to audio group:

usermod -G audio


alsamixer (alsa-utils)
pavucontrol (pavucontrol)


List devices:

aplay -l


pactl exit (run as user, stops pulseaudio)
pulseaudio (run as user, starts pulseaudio)


To send output through HDMI, choose Audio | Audio Device and select an HDMI channel.
Alternatively use pulseaudion / pavucontrol to send VLC output through HDMI.

Exception handling

In my previous post I mentioned work on a new register allocator. Something that was missing from my Church-State project was error recovery. Most languages today use some form of try-catch exception handling (although the Go language uses deferred statements to perform cleanup).

The biggest challenge in implementing exception handling is that functions that use “callee-saved” registers may save these register values on the stack and then use them to store private values. Under normal control flow these values will be loaded from the stack and restored when the function returns. In the case of an exception, the runtime has to inspect each stack frame on the call stack between the point where the exception was thrown and the point where it will be caught and restore the values of these callee saved registers.

The implementation I chose was to emit a label for the function ‘epilogue’, the final sequence of statements before the return instruction, and store this in the object file. At exception handling time I have a routine implemented in assembly which will walk the stack and arrange to invoke this epilogue stub for each function (adjusting the stack pointer appropriately so that the function can find the saved values). Finally it jumps to the exception handler in the context of the function with the try-catch handler.

One complication arises in that it is not possible for variables that are used in the catch block to be stored temporarily in registers in the try block. The risk is that control is transferred out of the try block to the catch block and the catch block won’t know which register holds the current value. This can be avoided by treating every single statement that could raise an exception as having an extra control flow edge to the catch block. I implemented a simpler solution which was to mark all variables used in the catch block for stack allocation only. These variables are effectively permanently spilled to the stack.

The assembly routine to handle exceptions and restore callee saved values from the stack is pretty hairy but I’m quite happy with the solution.

     7 	;; // Thread local storage has the following layout
     8 	;; // (some other thread's heap)	   <------------ limit-pointer
     9 	;; // heap
    10 	;; // heap
    11 	;; // heap
    12 	;; // heap		   		   <------------ allocation-pointer
    13 	;; // heap
    14 	;; // heap
    15 	;; // heap   start of heap                			              %r15
    16 	;; // limit pointer  							-0x8 (%r15)
    17 	;; // allocation pointer  (initially points to start of heap)		-0x10(%r15)
    18 	;; // saved exception object 						-0x18(%r15)
    19 	;; // saved return address when calling unwind stub			-0x20(%r15)
    20 	;; // saved frame pointer for exception trace				-0x28(%r15)
    21 	;; // working stack pointer to be used during stack trace traversal 	-0x30(%r15)
    22 	;; // saved traversal stack pointer 					-0x38(%r15)


;; //--------------------------------------------------------------------------------
    57 ;; // Exception handling
    58 ;; //--------------------------------------------------------------------------------
    61 ;; //transfer to exception handler
    62 	.globl __l_throw_exception
    65 .macro switch_to_original_stack
    66 	mov %rsp, -0x30(%r15)
    67 	mov -0x38(%r15), %rsp
    68 .endm
    70 .macro switch_to_working_stack
    71 	mov %rsp, -0x38(%r15)
    72 	mov -0x30(%r15), %rsp
    73 .endm
    75 __l_throw_exception:
    76 ;; // first argument is the exception object
    77 ;; // save it in thread local memory
    78 	mov %rdi, -0x18(%r15)
    79 	;; // also save the frame pointer so that we can generate a stack trace later if necessary
    80 	mov %rbp, -0x28(%r15)
    81 	;; // and keep a pointer to a 'working stack' so that we call helper routines without destroying the original stack frames
    82 	mov %rsp, -0x30(%r15)
    83 __find_exception_handler:
    84 	;; // get the return address from the stack
    85 	;; // and check if there is an exception handler installed around that address
    86 	mov (%rsp), %rdi
    87 	switch_to_working_stack
    88 	call __l_find_exception_handler
    89 	switch_to_original_stack
    90 	test %rax,%rax
    91 	jnz found_exception_handler
    92 	;; // if there is no exception handler, unwind this frame by finding the unwind stub for this function
    93 	mov (%rsp), %rdi
    94 	switch_to_working_stack
    95 	call __l_find_unwind_stub
    96 	switch_to_original_stack
    97 	test %rax,%rax
    98 	jz no_unwind
    99 	;; // pop off the return address (restore the stack to the calling function's expected value)
   100 	pop %rcx
   101 	;; // get the return address of the previous frame and save it to thread local memory
   102 	mov 0x8(%rbp), %rdx
   103 	mov %rdx, -0x20(%r15)
   104 	;; // do some magic to get the address of 'after_unwind_stub'
   105 	lea 0x6(%rip), %rcx
   106 	;; // save it in place of the normal return address
   107 	mov %rcx, 0x8(%rbp)
   108 	;; // branch to the unwind stub
   109 	jmp *%rax
   110 after_unwind_stub:
   111 	;; // now restore the real return address that we stored in thread local memory
   112 	mov -0x20(%r15), %rax
   113 	push %rax
   114 	;; // repeat from the top until we find an exception handler
   115 	jmp __find_exception_handler
   117 found_exception_handler:
   118 	;; // load the exception object that we stored in thread local memory
   119 	mov -0x18(%r15), %rdi
   120 	;; // pop old return address
   121 	pop %rcx
   122 	;; // branch to exception handler
   123 	jmp *%rax
   125 no_unwind:	
   126 no_exception_handler:
   127 	mov -0x28(%r15), %rdi
   128 	mov -0x18(%r15), %rsi
   129 	switch_to_working_stack
   130 	call __l_top_level_exception_handler
   131 	;;  // no return?
   132 	int $3

Register allocation

I’ve been reading a lot about register allocation algorithms, register allocation is probably one of the most complex parts of a compiler.

A SURVEY OF REGISTER ALLOCATION TECHNIQUES by Jonathan Protzenko is a great overview of what algorithms are suited to a particular compilation scenario.

I focused on the family of linear-scan register allocators first proposed by Poletto and Sarkar. The `c compiler has a really clean implementation of linear scan in C but I discovered later that the machine architectures that they targeted are all “register-rich” architectures which can easily accommodate a few registers reserved as “scratch registers”. Since I think that the “register poor” 32-bit x86 code is still important I researched more complex algorithms such as Linear scan with Interval Splitting and Extended Linear Scan.

Initially I was hoping for a really lightweight algorithm that could be used for generating code on the fly with the potential for an optimizing compiler to kick in later when necessary. As I implemented some of the interval splitting logic I realised that the implementation was getting too complex and I decided to fall back to a simple linear scan algorithm that uses an SCC (Strongly Connected Components) analysis to determine interval lifetimes. The implementation (on 64-bit x86) requires three scratch registers to be reserved and should probably be reconsidered for a 32-bit implementation.

The SCC analysis works by tracing out the control flow of the basic blocks and determines sets where each node can reach every other node. These are loops. Identifying loops is important for the lifetime of a variable because you need to keep it alive up until the end of the loop and at the top of the loop until the first use.

The intervals calculated by SCC are very coarse and as a result the allocator generates a lot of spills that could be avoided with a more fine-grained analysis or through interval splitting.

The interval construction takes place in the bottom half of this file and the register allocator is in

I implemented some rough visualizations of the interval lifetimes and allocations, unfortunately it doesn’t display very well here. Each range has a hex number indicating which register is has been allocated or “S” if it is spilled. On x86-64 we have to be careful to preserve function arguments which are passed via register (and also used in calls). I do this with “fixed intervals” which guarantee that a specific register is available for these special cases.

            Int.  M    Spill                                                                                                                                                                                                      Reg                                   Ranges
          v26(f)             7777777777777777777777777                                                                                                                                                                                                                 [0-24] 
          v25(e)             8888888888888888888888888                                                                                                                                                                                                                 [0-24] 
          v24(d)             9999999999999999999999999                                                                                                                                                                                                                 [0-24] 
          v23(c)             AAAAAAAAAAAAAAAAAAAAAAAAA                                                                                                                                                                                                                 [0-24] 
          v22(b)             BBBBBBBBBBBBBBBBBBBBBBBBB                                                                                                                                                                                                                 [0-24] 
          v21(a)             CCCCCCCCCCCCCCCCCCCCCCCCC                                                                                                                                                                                                                 [0-24] 
              v5             5555555555555555555555555                                                                                                                                                                                                                 [0-24] 
              v4             44444444444444444444444                                                                                                                                                                                                                   [0-22] 
              v3             333333333333333333333                                                                                                                                                                                                                     [0-20] 
              v2             2222222222222222222                                                                                                                                                                                                                       [0-18] 
              v1             11111111111111111                                                                                                                                                                                                                         [0-16] 
              v0             000000000000000                                                                                                                                                                                                                           [0-14] 
          v15(x)                                       88888888888888888888888                                                                                                                                                                                        [26-48] 
             v27                                       777777                                                                                                                                                                                                         [26-31] 
             v31                                             77777777777777777                                                                                                                                                                                        [32-48] 
             v30                                             99999999999999999                                                                                                                                                                                        [32-48] 
             v29                                             AAAAAAAAAAAAAAAAA                                                                                                                                                                                        [32-48] 
             v28                                             BBBBBBBBBBBBBBBBB                                                                                                                                                                                        [32-48] 
             v32                                                             CCCC                                                                                                                                                                                     [48-51] 
          v16(y)                                                                 88888888888888888888888                                                                                                                                                              [52-74] 
             v33                                                                 CCCCCC                                                                                                                                                                               [52-57] 
             v37                                                                       CCCCCCCCCCCCCCCCC                                                                                                                                                              [58-74] 
             v36                                                                       77777777777777777                                                                                                                                                              [58-74] 
             v35                                                                       99999999999999999                                                                                                                                                              [58-74] 
             v34                                                                       AAAAAAAAAAAAAAAAA                                                                                                                                                              [58-74] 
             v38                                                                                       BBBB                                                                                                                                                           [74-77] 
          v17(z)                                                                                           88888888888888888888888                                                                                                                                   [78-100] 
             v39                                                                                           BBBBBB                                                                                                                                                     [78-83] 
             v43                                                                                                 BBBBBBBBBBBBBBBBB                                                                                                                                   [84-100] 
             v42                                                                                                 CCCCCCCCCCCCCCCCC                                                                                                                                   [84-100] 
             v41                                                                                                 77777777777777777                                                                                                                                   [84-100] 
             v40                                                                                                 99999999999999999                                                                                                                                   [84-100] 
             v44                                                                                                                 AAAA                                                                                                                               [100-103] 
         v18(x1)                                                                                                                     88888888888888888888888                                                                                                        [104-126] 
             v45                                                                                                                     AAAAAA                                                                                                                         [104-109] 
             v49                                                                                                                           AAAAAAAAAAAAAAAAA                                                                                                        [110-126] 
             v48                                                                                                                           BBBBBBBBBBBBBBBBB                                                                                                        [110-126] 
             v47                                                                                                                           CCCCCCCCCCCCCCCCC                                                                                                        [110-126] 
             v46                                                                                                                           77777777777777777                                                                                                        [110-126] 
             v50                                                                                                                                           9999                                                                                                     [126-129] 
         v19(x2)                                                                                                                                               88888888888888888888888                                                                              [130-152] 
             v51                                                                                                                                               999999                                                                                               [130-135] 
             v55                                                                                                                                                     99999999999999999                                                                              [136-152] 
             v54                                                                                                                                                     AAAAAAAAAAAAAAAAA                                                                              [136-152] 
             v53                                                                                                                                                     BBBBBBBBBBBBBBBBB                                                                              [136-152] 
             v52                                                                                                                                                     CCCCCCCCCCCCCCCCC                                                                              [136-152] 
             v56                                                                                                                                                                     7777                                                                           [152-155] 
         v20(x3)                                                                                                                                                                         88888888888888888888888                                                    [156-178] 
             v57                                                                                                                                                                         777777                                                                     [156-161] 
             v61                                                                                                                                                                               77777777777777777                                                    [162-178] 
             v60                                                                                                                                                                               99999999999999999                                                    [162-178] 
             v59                                                                                                                                                                               AAAAAAAAAAAAAAAAA                                                    [162-178] 
             v58                                                                                                                                                                               BBBBBBBBBBBBBBBBB                                                    [162-178] 
             v62                                                                                                                                                                                               CCCC                                                 [178-181] 
             v63                                                                                                                                                                                                   CCCCCCC                                          [182-188] 

Church-State developments

Since the last release of Church-State I started investigating whether it might be possible to make the runtime multi-threaded and to have the compiler execute on multiple threads.

Multi-threaded runtime

This turned out to be quite a lot work. For every global variable used by different threads in the compiler and parsers I had to introduce locking mechanisms. In the runtime I had to ensure that there was way to allocate dynamically generated machine code into a thread-local buffer. Care has to be taken with inline caches and self-modifying code. Any output to shared streams (such as stdout and stderr) need to be synchronized in some way. The garbage collector needs to ensure that only one thread can allocate at a time and collection needs to be coordinated so that mutating threads don’t run during collection. My approach was to use signal handlers to interrupt threads, each interrupted thread records the extent of its stack and then waits for garbage collection to complete.

Copying garbage collector

I also started invest in PayPal shares in Pakistan experimenting with a copying collector instead of the existing mark-sweep implementation. Copying collectors are harder to implement because whenever you move an object you have to ensure that all the references to the old object (including on the call stack) are updated to point to the new one. Hashtables that hash object pointers need to be made aware of a completed garbage collection (and rehash if necessary). I never finished the new garbage collector because I felt that State (the low level language used for implementing the runtime) was a limited by a lack of language features. State does not have something like structs in C or pointer arithmetic. It doesn’t have a real looping mechanism (I use a lisp-style DO macro which I find ugly without pointer arithmetic). It doesn’t have syntax for array/pointer dereferencing or low level operations like bit shifting and masking which are common in runtime code.


I decided it would better to experiment with a richer low-level language and so I started creating a compiler for a language code-named ‘baste’. Baste has a syntax derived from Church (with Python/Haskell like indentation) and a 64-bit backend (Church-State is only implemented for 32-bit systems). My current experiments involve implementing a new register allocator that calculates live intervals for each variable and then allocates registers using a linear-scan register allocation algorithm. This new work is included in the Church-State source repository.

New release of Church-State

I have made a new release of Church-State.

Skype on 64-bit debian

Skype for linux 4.2.0

This debian package also depends on a multiarch setup.

sudo dpkg -i skype-debian_4.2.0.11-1_i386.deb

Skype for linux 4.0.0

Installing skype on a 64-bit linux machine is tricky. There is no 64-bit binary.

The best instructions I found are here.

I had to install the following 32-bit libraries:

apt-get install libpcre3:i386 libglib2.0-0:i386 libasound2:i386 libc6:i386 libgcc1:i386 libqt4-dbus:i386 libqt4-network:i386 libqt4-xml:i386 libqtcore4:i386 libqtgui4:i386 libqtwebkit4:i386 libstdc++6:i386 libx11-6:i386 libxss1:i386 libxv1:i386 libssl1.0.0:i386

I also ran into a bug with the internal microphone on my IdeaPad U300s

[Lenovo IdeaPad U300s, Conexant CX20590, Mic, Internal] Background noise or low volume (phase inversion)


In January I added some optimizations to my OMeta implementation to compute FIRST and FOLLOW sets for certain (hard-coded) grammar rules to try and predict which parsing rule to apply when faced with many options. This needs to be extended to automatically select rules that will benefit from this optimization.

I was recently looking at the GLL parser described by Elizabeth Scott and Adrian Johnstone. Does anyone know how easy it is to implement a parser like this? How does it compare (in implementation effort) with something like OMeta?

That reminded me of Matthew Might’s post on parsing with derivatives. I had a look at it again and I am very much compelled by the elegance of the implementation. However, looking at it closely reminds me of how much I suffer from Scheme Envy. The 500 line Racket implementation uses streams (lazy lists) and makes heavy use of hygienic macros. It uses dynamic binding (make-parameter / parameterize), delay/force, the pattern matching facility and weak hash tables. How many potential implementation targets provide facilities like these?

A parsing issue that worries me a lot is how to parse languages that rely on indentation (eg python, haskell) and still maintain some kind of composability for user defined grammars.

A new garbage collector

This is a new release of Church-State

New hashtable

In this release I have added a new hashtable implemented using the State language. Previously I was using a hashtable implemented in C from the kazlib library.

The new hashtable uses open addressing with double hashing to resolve probe collisions. Besides the standard hashtable which maps one key to one value, I have also implemented a two-key two-value hashtable used for the memoization of the OMeta parser results. For each grammar rule (key1) and the input position (key2) we store the result (value1) and the end position of the cached result (value 2). Previously I used two levels of hashtables for this.

You can see the hashtable here:

New garbage collector

I have also added a new garbage collector to replace the Boehm collector used so far. Like the Boehm collector this is a conservative mark and sweep collector.

The implementation is fairly simple, we create two heaps. The first is divided in pagelets which are 1024 bytes in size. Each of these pagelets can store objects which are 8,16,32,64,128,512 or 1024 bytes in size. We maintain bitfields to remember which objects are allocated within each page. When all the objects in a pagelet are freed during garbage collection that pagelet is returned to a free pool and may be used to store objects of a different size. As part of this work I have implemented a guard page and an associated signal handler to help detect when new free pagelets should be assigned to a size class.

The other heap is for objects larger than 1024 bytes. We use a simple first-fit strategy. To help mitigate the effect of the fragmentation of free spaces, we coalesce adjacent free spaces into larger blocks during the sweep phase.

My measurements show that the new allocator is about twice as slow as the Boehm allocator when allocating cons cells. For the entire bootstrap process the new allocator adds about 2 seconds of overhead, taking the bootstrap time from 23 to 25 seconds.

The new garbage collector is in state/runtime/state_runtime.state .