ICFP ’09

I had the pleasure of participating in the ICFP contest again this year. I think I joined a Haskell team in 2002 (which is funny because I’m not proficient in Haskell at all!), but since then I’ve only been part of a Lisp team in 2007. That time I failed to implement a ropes-like library quickly enough to give us any momentum.

So this year it was great to try again; our team ended up being just myself and another Lisp hacker from New Zealand. Since I’m in South Africa there was quite a time zone difference between us, and we both stayed up until the early hours during the last two days.

As always it didn’t take long to write up the initial code for the interpreter, but we spent several hours tracking down bugs, parsing the inputs and figuring out how to approach the problem.

My partner then took over: he wrote all the physics and modeling code while I tried to provide moral support and started hacking on a visualizer.

Initially we used a Lisp library called cgn, which was OK for the initial runs but proved too slow later on. cgn writes out data points to a text file (we had to patch it to write double-floats correctly) and then feeds this to gnuplot.

The writing and reading of these ASCII files was too slow for the kinds of scenarios we were modeling (tens of thousands of data points), so I started over, trimming the dataset to every 100th data point and writing the data files in binary format.
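
For anyone curious, here is a rough sketch (not our actual contest code) of writing decimated point data as raw little-endian float64 pairs that gnuplot’s binary format can read; it assumes the ieee-floats library is available via Quicklisp.

;; A rough sketch (not our actual contest code): write every 100th
;; (x . y) point as raw little-endian IEEE-754 doubles, which matches
;; gnuplot's format="%float64%float64".  Assumes the ieee-floats
;; library is available (e.g. via Quicklisp).
(ql:quickload :ieee-floats)

(defun write-float64-le (value stream)
  "Write VALUE to the byte STREAM as a little-endian 64-bit float."
  (let ((bits (ieee-floats:encode-float64 (coerce value 'double-float))))
    (dotimes (i 8)
      (write-byte (ldb (byte 8 (* 8 i)) bits) stream))))

(defun dump-points (points path &key (stride 100))
  "Write every STRIDE-th (x . y) cons in POINTS to PATH as binary data."
  (with-open-file (out path :direction :output
                            :element-type '(unsigned-byte 8)
                            :if-exists :supersede)
    (loop for point in points
          for i from 0
          when (zerop (mod i stride))
            do (write-float64-le (car point) out)
               (write-float64-le (cdr point) out))))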

Since I had never really used gnuplot before I spent several hours reading documentation and poking around until I was able to generate scripts like the following to render our data:


set key bottom right
plot [-100000000:100000000] 'plots/ROCKET.dat' binary record=11951X11951 format="%float64%float64" title 'plots/ROCKET', 'plots/TARGET.dat' binary record=11951X11951 format="%float64%float64" title 'plots/TARGET'
pause -1

I was quite pleased with the results, but there are obviously better ways to display the trace if you have the time and create the right tools.

Satellite visualization

(Note the start and end labels are swapped in this picture)

In the end we never really got a solution for the 4th task, but we still managed to score about 2000 points.

OMeta2 (OMeta in Common Lisp)

Since my last post about my implementation of OMeta in Common Lisp I have updated it to use a newer syntax. This implementation is called “ometa2” for lack of a better name.

I have been using it to generate the Church parser for my Church and State project.

The previous version passed some arguments on the “OMeta stack”; in the new version I rewrote it to avoid this overhead. This meant creating an “ometa-input-stream” and a “list-input-stream”, which allow reading objects from a list as if it were a stream.
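
To give a feel for the idea (the names below are illustrative, not ometa2’s actual API), a list-backed input stream is essentially a cursor over a list of objects, with mark/rewind support for backtracking:

;; Illustrative sketch only -- these are not ometa2's actual class or
;; function names.  A list-backed input stream is just a cursor over a
;; list of objects, with mark/rewind so the parser can backtrack.
(defclass list-input-stream ()
  ((items :initarg :items :accessor items)
   (pos   :initform 0     :accessor pos)))

(defun stream-peek (stream)
  "Return the next object without consuming it, or NIL at end of input."
  (nth (pos stream) (items stream)))

(defun stream-next (stream)
  "Consume and return the next object, or NIL at end of input."
  (prog1 (stream-peek stream)
    (incf (pos stream))))

(defun stream-mark (stream)
  "Return a position that can later be passed to STREAM-REWIND."
  (pos stream))

(defun stream-rewind (stream mark)
  "Reset the stream to a previously saved position."
  (setf (pos stream) mark))

;; (let ((s (make-instance 'list-input-stream :items '(a b c))))
;;   (list (stream-next s) (stream-peek s)))   ; => (A B)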

The new version also uses a hash-table for memoization, which improves performance.
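
The memoization is essentially packrat-style: cache the result of applying a rule at a given input position. A minimal sketch of the idea, again with made-up names rather than ometa2’s internals:

;; Illustrative sketch of the memoization idea (made-up names, not
;; ometa2's internals): cache the outcome of applying a rule at a given
;; input position so it is never recomputed.
(defvar *memo* (make-hash-table :test #'equal))

(defun apply-rule-memoized (rule position compute)
  "Return the cached result of RULE at POSITION, calling COMPUTE on a miss."
  (let ((key (cons rule position)))
    (multiple-value-bind (cached hit) (gethash key *memo*)
      (if hit
          cached
          (setf (gethash key *memo*) (funcall compute))))))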

There are still some performance concerns, however: the generated parsers contain too many function/lambda applications, and some kind of optimization to reduce these would make the parsers much faster. For now the performance is sufficient for my work and reasonable considering the simple implementation.

You can get the code here:

http://subvert-the-dominant-paradigm.net/repos/hgwebdir.cgi/ometa2

Bootstrapping Church

As part of my work so far on the Church/State language, I have implemented a parser generator, two compilers, a machine code generator and a library to write ELF files. All of these were implemented in Common Lisp. In addition to this I have implemented runtime code for State (100 LOC) and for Church (about 2000 LOC).

The long-term goal for this project is to bootstrap: rewrite all of the components mentioned above in Church itself, so that the system can eventually compile itself without relying on an external system such as Common Lisp or a C compiler.

So far I have ported OMeta (with the new JS style syntax) to Church and written a Church grammar that produces Church expressions as actions. I have also completed the basic framework of the code generator (enough to assemble code that adds two constant integers together).

Here is an outline of the remaining work:

  • Port Church compiler – the tricky part here is implementing macros. Currently I use ‘eval’ to compile macro bodies in Lisp; doing this in Church requires dynamic compilation of Church code too.
  • Port State compiler – This includes writing a sexp parser (a Lisp reader). Hopefully this will be fairly easy to do with the current OMeta implementation. I will probably also have to improve the quasiquote and splice operators in Church to implement this. Currently State macros are implemented in Lisp; I think the best approach will be to try to rewrite them in State.
  • Port the rest of the code generator – Should be a fairly straightforward matter of implementing the remaining AST instructions and machine instructions.
  • Port the ELF library – Since I used the binary-types library in the Lisp implementation, I’m not sure what approach will be necessary in Church. I might end up implementing some kind of MOP system which allows special metaclasses and special slot attributes that can be used to control the binary output of class instances.

Along the way I’m sure I will have to add more runtime functionality, and will probably also have to create an operator similar to destructuring-bind for deconstructing lists.
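
For illustration only, this is the kind of list-destructuring operator I have in mind, sketched in Common Lisp with a hypothetical name and restricted to flat patterns; the Church version would of course be written in Church.

;; Illustrative only: a hypothetical flat-list destructuring operator,
;; sketched in Common Lisp.  The Church version would be written in
;; Church and would need to handle nesting and rest parameters.
(defmacro destructure ((&rest vars) list-form &body body)
  (let ((lst (gensym "LIST")))
    `(let ((,lst ,list-form))
       (let ,(loop for var in vars
                   for i from 0
                   collect `(,var (nth ,i ,lst)))
         ,@body))))

;; (destructure (a b c) '(1 2 3) (+ a b c))   ; => 6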

I hope to be able to avoid adding dynamically scoped variables, multiple value return and a “rest” parameter, all of which were used in the Lisp implementation.

Church and State (part 2)

While State is a low level language that compiles directly to machine code, Church has features common to many high level languages (and some uncommon ones):

Church

  • A block-indented syntax (similar to Python or Haskell)
  • Macros
    
    syntax new [(&rest args) (let ((class-name (second (first args)))
                                   (key-args (loop for (_key k v) in (cdr args) collect `(:symbol ,k) collect v)))
                                `(:let (((:var "o") (:inline state::state-make-object (:symbol ,class-name) ,@key-args)))
                                   (:begin
                                    (:apply "init" (:var "o"))
                                    (:var "o"))))]
    
  • A class system
  • Dynamic dispatch (dispatch depends on all the arguments of a function, also known as multiple-dispatch)
  • Pattern matching syntax like Erlang or Prolog (yet to be implemented)
  • A high level ‘loop’ control structure
    
    bar list1 list2
            loop
                    for x in list1
                    for y in list2
                    when x
                    do (print x)
                    when y
                    do (print y)
                    collect y
    
    
    

At the moment I have a fairly primitive implementation of dynamic dispatch: I use a cons list to store lists of patterns and bodies, sorted first by symbol and then by argument types.
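
To make that concrete, here is an illustrative Common Lisp sketch of the lookup idea; the real representation keeps the entries sorted by symbol and argument types, whereas this sketch just searches a flat list.

;; Illustrative Common Lisp sketch of the lookup idea.  The real
;; representation keeps entries sorted by symbol and argument types;
;; here a flat list with a linear search conveys the same behaviour.
(defvar *methods* '())   ; entries of the form (name (type ...) function)

(defun add-method* (name types function)
  (push (list name types function) *methods*))

(defun dispatch (name &rest args)
  "Call the first entry whose name matches and whose types match ARGS."
  (let ((entry (find-if (lambda (e)
                          (and (eq (first e) name)
                               (= (length (second e)) (length args))
                               (every #'typep args (second e))))
                        *methods*)))
    (if entry
        (apply (third entry) args)
        (error "No applicable method for ~A" name))))

;; (add-method* 'area '(number number) (lambda (w h) (* w h)))
;; (dispatch 'area 3 4)   ; => 12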

Symbols are also interned into a cons list; later I will have to find a way to include them directly in compiled code.

Loops are expanded into State’s tagbody forms.
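
As a rough illustration (in Common Lisp, whose tagbody State’s form mirrors), a simple collecting loop could lower to something like the following; this is not the compiler’s actual expansion.

;; Rough illustration, not the compiler's actual expansion: how a simple
;; "for x in list collect (1+ x)" loop could lower onto a tagbody.
(defun collect-demo (list)
  (let ((items list)
        (result '())
        (x nil))
    (tagbody
     next
       (when (null items) (go done))
       (setf x (car items)
             items (cdr items))
       (push (1+ x) result)        ; the loop body: collect (1+ x)
       (go next)
     done)
    (nreverse result)))

;; (collect-demo '(1 2 3))   ; => (2 3 4)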

The parser is implemented in my Common Lisp version of OMeta, which I plan to port to Church later.

My future work is focussed on writing the Church and State compilers in Church itself, thus bootstrapping the language.

Church and State (part 1)

After porting jolt-burg I started developing a higher level language on top of it. My codenames for this high-level and low-level combination are Church and State.

State

My implementation of State has been progressing steadily to the point where I have the following features:

  • lisp-like sexp syntax (I use the lisp reader from the host environment to read the State source files)
  • functions and global variables
  • Macros
    
    (syntax end? (x) `(if (null? ,x)
                          1
                          (if (pair? ,x)
                              0
                              (abort "bad list encountered in end?"))))
    
  • partial lexical closures (also known as ‘downward funargs’)
    (define |main| (lambda ()
                     (let ((a 1)
                           (b 2))
                       ((lambda (arg)
                          (+ arg a)) b))))
    
    
  • basic machine operations such as +, -, *, bitwise and, bitwise or, and bit shifts
  • if forms
    
    (if a
        1
        2)
    
    
  • ‘cond’, ‘and’ and ‘or’ forms
  • the tagbody control structure
    (tagbody
       start
       (setup-vars) 
       next
       (process-next)
       (if (end)
           (go finished)
           (go next))
       finished
       (cleanup))
    
    
  • primitive structures
  • non-local-return
    
    (define foo (lambda ()
                     (block a
                       (block b
                         (return-from foo 1)))
                     (not-executed)))
    
    

All of the above has been implemented with a 32-bit x86 backend. The compiler generates relocatable code which is written to ELF objects and linked with the unix linker.

jolt-burg

In April I posted to the fonc mailing list about my port of Ian Piumarta’s “jolt-burg”.

This is a combination of a simple lisp-like language (“jolt”) and a compiler from a tree of “instruction” objects to native machine code.

My implementation was in Common Lisp and my compiler targeted relocatable ELF object files.

From my post:

Currently I have a compiler that can read sexp input, eg:

(define |main| (lambda ()
        ((call-c |dlsym| 0 "puts") "hello, dynamic world!")))

and then compile it down to a tree of blocks and instruction objects (see instruction.lisp or Instruction.st) 

BLOCK(--- NIL ---)                                                                                                                                                         
| | 0(--- VOID BLOCK --- 369                                                                                                                                               
| | | | | 0(+++ REG CALLI4 ---                                                                                                                                             
| | | | | | 0(+++ REG CNSTP4 --- reloc ---)                                                                                                                                
| | | | | | 0(+++ REG CNSTI4 --- 0)                                                                                                                                        
| | | | | | 0(+++ REG CNSTP4 --- reloc ---))                                                                                                                               
| | | | | 0(+++ REG CNSTP4 --- reloc ---))                                                                                                                                 
| | 0(--- VOID BLOCK --- 369                                                                                                                                               
| | | | | 0(+++ REG CALLI4 EBX                                                                                                                                             
| | | | | | 0(+++ REG CNSTP4 EBX reloc ---)                                                                                                                                
| | | | | | 0(+++ REG CNSTI4 EAX 0)                                                                                                                                        
| | | | | | 0(+++ REG CNSTP4 ECX reloc ---))                                                                                                                               
| | | | | 0(+++ REG CNSTP4 EAX reloc ---))                       

This tree is processed by the burg compiler to emit x86 machine code. (in the diagram we see registers being assigned to the tree nodes)

The last stage is when I collect the machine code generated by the burg compiler and combine it with data and relocation entries to generate an ELF object file.

I can then pass this to the unix linker (ld) to produce an executable.

ld --dynamic-linker=/lib/ld-linux.so.2 state/start.o state/hello.state.o -ldl  -lc -o myhello

Here you can see the disassembled elf object file with relocation entries for the "dlsym" symbol and the "puts" and "hello world" strings.

objdump -d state/hello.state.o -r

Disassembly of section .text:

00000000 <main>:
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   53                      push   %ebx
   4:   83 ec 14                sub    $0x14,%esp
   7:   bb 00 00 00 00          mov    $0x0,%ebx
                        8: R_386_32     dlsym
   c:   b8 00 00 00 00          mov    $0x0,%eax
  11:   b9 00 00 00 00          mov    $0x0,%ecx
                        12: R_386_32    _data_286
  16:   89 4c 24 04             mov    %ecx,0x4(%esp)
  1a:   89 04 24                mov    %eax,(%esp)
  1d:   ff d3                   call   *%ebx
  1f:   89 c3                   mov    %eax,%ebx
  21:   b8 00 00 00 00          mov    $0x0,%eax
                        22: R_386_32    _data_287
  26:   89 04 24                mov    %eax,(%esp)
  29:   ff d3                   call   *%ebx
  2b:   89 c0                   mov    %eax,%eax
  2d:   89 c0                   mov    %eax,%eax
  2f:   83 c4 14                add    $0x14,%esp
  32:   5b                      pop    %ebx
  33:   5d                      pop    %ebp
  34:   c3                      ret

You can view the code at:

http://subvert-the-dominant-paradigm.net/repos/hgwebdir.cgi/bootstrap/

(see the burg directory)

(click files to see the files, or run ‘hg clone <url>’ to get a local copy)

OMeta in Common Lisp

OMeta is a parsing engine by Ian Piumarta and Alessandro Warth. It combines PEG rules with a syntax for naming matched components and executable actions that can refer to the named parts. A simple OMeta rule looks like this:

ostring ::=     $' (~$' <ochar>)*:s $' => [(coerce  s 'string)]

Here a string is matched: it starts with a single quote, is followed by any number of characters that are not a single quote, and ends with a single quote.

The action for this rule turns the list of characters matched into a lisp string.
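
The coercion in the action body is ordinary Lisp, for example:

(coerce '(#\a #\b #\c) 'string)   ; => "abc"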

I created an OMeta implementation in Common Lisp by bootstrapping from the Squeak implementation of OMeta.

To do this I first modified the OMeta parser to be able to read Lisp forms in the action bodies. Next I modified the OMeta compiler to produce Lisp forms instead of Smalltalk code.

Using these generated forms in combination with hand-coded primitive rules (ometa-prim.lisp) I was able to use two new grammars ometa-parser.g and ometa-compiler.g to fully bootstrap the system.

My code is available from this mercurial repository:


hg clone http://subvert-the-dominant-paradigm.net/repos/hgwebdir.cgi/ometa/

To run it:


(load "load.lisp")

Then you can parse a grammar file into its AST:


(parser-grammar-file "grammar.g")

To create an executable parser, first declare a subclass of ometa:


(defclass dnaparser (ometa) ())

Next write the production rules in a grammar file:


base ::=
$A | $T | $G | $C
dsequence ::=
<base>*:s => [ (coerce s 'string) ]

and generate the parser from the grammar:


(generate-parser 'dnaparser "example.g" "example.lisp")

This reads the grammar file “example.g” and produces lisp defmethods for the class ‘dnaparser’. These lisp forms are written to “example.lisp”.

After loading the parser, you can run the productions like this:


CL-USER> (let ((*ometa-class-name* 'dnaparser))
(run-production 'dsequence (coerce "GGCCGGGC" 'list)))
"GGCCGGGC"

As an alternative to generate-parser, you can use install-grammar to load the rules without generating an intermediate file.

The squeak implementation of OMeta supports several extensions that I have not implemented:

– Memoization of previous parse results
– Support for left-recursive rules
– Ability to apply a rule in the super class
– Support for rules that call out to a “foreign” parser during the parse
– An optimizer to remove redundant AND and OR forms produced by the parser

The Farmer, Fox, Chicken and Grain problem (in lisp)

The same problem in lisp. Here you can see how much work has to be done to mimic sets.


(defun safe-p (list)
  (or (member 'farmer list)
      (and (not (and (member 'fox list)
                     (member 'chicken list)))
           (not (and (member 'chicken list)
                     (member 'grain list))))))

(defun permute-state (state)
  (let (result)
    (destructuring-bind (near far) state
      (if (member 'farmer near)
          (let ((others (set-difference near '(farmer))))
            (push (list 'farmer (list others
                                      (cons 'farmer far))) result)
            (loop for item in others
                  do (push (list item (list  (remove item others) (cons item (cons 'farmer far)))) result))
            result)
          ; we just reverse the sides                                                                                                    
          (loop for (move (near far)) in (permute-state (reverse state))
                collect (list move (list far near)))))))

(defvar *history* nil)
(defun solve (initial-state goal-state moves)
  (if (and (null (set-difference (first initial-state) (first goal-state)))
           (null (set-difference (second initial-state) (second goal-state))))
      (progn
        (format t "Found solution~%")
        (loop for move in (reverse moves)
              and direction = 'forward then (if (eq direction 'forward)
                                                'back
                                                'forward)
              do (if (eq move 'farmer)
                     (format t "Farmer crosses ~A~%" direction)
                     (format t "Farmer crosses ~A with ~A~%" direction move)))))
  (loop for (move new-state) in (permute-state initial-state)
        do
        (if (and (not (member (sort  (copy-list (first new-state)) #'string-lessp) *history* :test #'equal))
                 (safe-p (first new-state))
                 (safe-p (second new-state)))
            (progn
              (push (sort (copy-list (first new-state)) #'string-lessp) *history*)
              (solve new-state goal-state (cons move moves))))))

and the output:

CL-USER: (progn (setf *history* nil) (solve '((farmer fox chicken grain) ()) '(() (farmer fox chicken grain)) nil))
Found solution                                                                                                                           
Farmer crosses FORWARD with CHICKEN                                                                                                      
Farmer crosses BACK                                                                                                                      
Farmer crosses FORWARD with GRAIN                                                                                                        
Farmer crosses BACK with CHICKEN                                                                                                         
Farmer crosses FORWARD with FOX                                                                                                          
Farmer crosses BACK                                                                                                                      
Farmer crosses FORWARD with CHICKEN                                                                                                      
NIL
CL-USER:

metapeg

Inspired by the metacircular parser written by DV Schorre and by Ian Piumarta’s related work in his pepsi/coke project, I set out to write a parser generator that can generate itself.

I bootstrapped the parser by using cl-peg to generate lisp code that could eventually generate itself.

I have made several versions of metapeg available; the first two are simple Lisp and Scheme implementations. The latter two are more complex and allow the parser to walk back up the parse tree during the parse to examine the text matched by previous parse nodes. This functionality allows easy implementation of a @tag construct used to implement the indentation rules in languages like Haskell, Python and YAML (as suggested here).

As an example I wrote a simple parser for yaml sequences. Here is some sample input:


-
 - 'foo'
 - 'bar'
 - 'yah'
 -
  - 'a'
  - 'b'
  - 'c'

This is the grammar:


program <- seq-element+ 

inset <- @inset ws+
ws <- [ \\t]
nl <- [\\n]
ws_or_nl <- ws/nl

seq-element <- "-" ws* string nl { (third data) } / nested-sequence
nested-sequence <- "-" ws* nl inset seq-element (@inset seq-element)* { 
(cons (fifth data) (zip-second (sixth data))) }

string <- "'" [ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz]+ "'" { 
(char-list-to-string (second data)) }

and the output:

CL-USER> (value (parse "../examples/nested_sequence.yaml" "yaml.lisp"))
(("FOO" "BAR" "YAH" ("A" "B" "C")))

The parsers generated by metapeg do not implement memoization of the parse results, possibly making them unsuitable for large grammars or large inputs.