Archive for October, 2008

Improving dispatch performance in Church

Friday, October 31st, 2008

Over the last week I have been working on performance optimizations for Church code. My basic test case is to run the OMeta parser (implemented in Church) on the “base.church” file in the Church runtime library. This file implements most of the Church runtime code. Parsing this file with the Common Lisp implementation of OMeta takes 0.5 seconds on my laptop. Initial runs of the Church code took well over a minute.

The first major change was to cache symbol lookups. Previously I was storing all symbol objects in an a-list (a linked list) and every time a symbol was used the table would be searched linearly using string comparisons. To avoid this lookup I modified the State compiler to take a (load-constant-value <raw-value>) form. When the state compiler compiles this form it creates a cell in the data segment from which to load a constant value. It also generates some initialization code which will call out to the runtime to initialize the constant.

By collecting all these initialization sequences and running them on startup we can intern all the symbols used by the program before it starts running. Each symbol reference thereafter is a simple memory load.

Using these cached symbols makes class lookup a lot easier too. Previously I stored all classes in a table keyed by symbol and searched it linearly when looking up classes. In the new implementation I store class objects in a “class” slot associated with the symbol representing the class’s name.

Similarly, the dispatch table was stored as a global list and searched first by symbol (selector) and then compared according to argument types. I moved the dispatch rules to the symbol objects, making the search much shorter.

These changes brought the instruction execution count down to about 40 billion instructions for Church (versus about 650 million for Lisp).

Further optimizations involved removing all cons’ing from the dynamic dispatch routines, inlining as many of those calls as possible to avoid function call overhead and rewriting recursive routines as tagbody loops.

I also reordered the tests in the “class-of” code to check the most common cases first.

All these optimizations brought the time down to about 4.5 seconds and the execution count to 13 billion. This is still about a factor of 10 slower than the lisp implementation. Eventually I hope to bring it to about a factor of 2, possible future optimizations are inline-caches for method lookup and optimized cons and closure allocation. Eventually I might also look at more sophisticated approaches, such as type inference, an optimizing compiler pass and runtime profiling.

To profile the code I tried two tools, Intel’s VTune and the callgrind module of valgrind. VTune was quite disappointing, besides having to paste serial numbers into an obscure installation utility (which failed the first few times) the download was 500 megabytes. After installation the sampling driver failed to work but I managed to run the call tracing module.

valgrind provides similar information to VTune, but the kcachegrind visualizer is much better, the call graph is very easy to work with and is also possible to see hot loops at the assembly level.

Bootstrapping Church

Wednesday, October 8th, 2008

As part of my work so far on the Church/State language, I have implemented a parser generator, two compilers, a machine code generator and a library to write ELF files. All of these were implemented in Common Lisp. In addition to this I have implemented runtime code for State (100 LOC) and for Church (about 2000 LOC).

The long term goal for this project is to bootstrap, achieved by rewriting all the components mentioned above in Church itself such that the system can eventually compile itself without relying on an external system such as Common Lisp or a C compiler.

So far I have ported OMeta (with the new JS style syntax) to Church and written a Church grammar that produces Church expressions as actions. I have also completed the basic framework of the code generator (enough to assemble code that adds two constant integers together).

Here is an outline of the remaining work:

  • Port Church compiler – the tricky party here is implementing macros. Currently I use ‘eval’ to compile macro bodies in lisp, to do this in church requires dynamic compilation of church code too.
  • Port State compiler – This includes writing a sexp parser (a lisp reader). Hopefully this will be fairly easy to do with the current OMeta implementation. I will probably also have to improve the quasiquote and splice operators in Church to implement this. Currently State macros are implemented in Lisp, I think the best approach will be to try rewrite them in State
  • Port the rest of code generator – Should be a fairly straightforward matter of implementing the remaining AST instructions and machine instructions
  • Port the ELF library – Since I used the binary-types library in the Lisp implementation, I’m not sure what approach will be necessary to do this in Church. I might end up implementing some kind of MOP system which allows special metaclasses and special slot attributes which can be used to control the binary output of class instances

Along the way I’m sure I will have to add more runtime functionality, and will probably also have to create an operator similar to destructuring-bind for deconstructing lists.

I hope to be able to avoid adding dynamically scoped variables, multiple value return and a “rest” parameter, all of which were used in the Lisp implementation.