Prolog in Church

November 24th, 2009

As a step towards implementing an experimental type inferencing or type checking system I have built a prolog interpreter in Church. The interpreter is based on the example in Peter Norvig’s book, “Paradigms of Artificial Intelligence Programming”.

As a first step I wrote the grammar for parsing prolog programs. Here is a sample ‘parent’ program describing parent and grandparent relationships between Debian derivatives:


parent(ubuntu,kubuntu).
parent(ubuntu,edubuntu).

parent(debian, ubuntu).

grandparent(X,Z) :- parent(X,Y), parent(Y,Z).

and here is the OMeta grammar for parsing prolog programs:


ometa prolog <: ometa {

comment = "/*" (~cnewline anything)* cnewline -> << '_comment >>,

ws = $  | $	,
wsnl = ws | cnewline,

prolog-top-levels = (comment | prolog-rule)*,

prolog-rule = wsnl* prolog-clause:rhead ws* (":-" ws* prolog-or:body -> << body >>)* ws* $. wsnl*  -> <<`[rule ,rhead ,body]>>,

prolog-or = listof("prolog-impl", ";"):e  wsnl* -> << `[or ,e] >>,
prolog-impl = wsnl* prolog-clause:a wsnl* "->" wsnl* prolog-or:b wsnl* -> << `[implies ,a ,b] >> | prolog-and,
prolog-and = wsnl* listof("prolog-expr", ","):e wsnl* -> << `[and ,e] >>,

prolog-clause = wsnl* prolog-name:name wsnl* prolog-arg-list:args wsnl* -> <<`[,name | ,args]>>,
prolog-arg-list = $( listof("prolog-expr", ","):args wsnl* $) wsnl* -> << args >>,

prolog-expr = wsnl* (
	         $[ wsnl* $] wsnl* -> << `[]>> |
	      	 $[ wsnl* listof("prolog-expr", ","):list-head ws* ($| wsnl* prolog-expr)*:list-tail wsnl* $] wsnl* -> << (append! list-head (if list-tail (first list-tail) nil)) >> |
	      	 $( wsnl* prolog-or:e wsnl* $) wsnl* -> << e >> |
		 prolog-clause:c wsnl* -> << c >> |
                 prolog-number |
                 prolog-variable:v wsnl* -> << v >>),

prolog-number = "-"*:sign digit+:d -> << (convert-number sign d)>>,

prolog-variable = (letter | digit | $_)+:l -> << (intern (coerce l 'string)) >>,

prolog-name = (letter | digit | $_)+:l -> << (intern (coerce l 'string)) >>  }

The core of prolog lies in the unification and backtracking algorithms.

Unification will try to match its two inputs or else bind a variable to the corresponding value:


unify x y bindings
	cond
		(eq? bindings prolog-fail) prolog-fail
		(eq? x y) bindings
		(eq? x '_) bindings
		(eq? y '_) bindings
		(variable-symbol? x) (unify-variable x y bindings)
		(variable-symbol? y) (unify-variable y x bindings)
		(and (cons? x) (cons? y)) (unify (rest x) (rest y) (unify (first x) (first y) bindings))
		true prolog-fail

unify-variable var x bindings
	cond
		(get-binding var bindings) (unify (lookup var bindings) x bindings)
		(and (variable-symbol? x) (get-binding x bindings)) (unify var (lookup x bindings) bindings)
		(and occurs-check? (occurs-check var x bindings)) prolog-fail
		true (extend-bindings var x bindings)

Backtracking is achieved by allowing each clause to provide multiple solutions and trying all of these possibilities:


prove-all pi:prolog-interpreter goals bindings
	cond
		(eq? bindings prolog-fail) nil
		(null? goals) (list bindings)
		true
			mapcan (fn goal1-solution
				prove-all pi (rest goals) goal1-solution
) (prove pi (first goals) bindings)

As a test I used this map coloring program from a prolog tutorial:


member(X,[X|_]).
member(X,[_|List]) :- member(X,List).

adjacent(X,Y,Map) :-  member([X,Y],Map) ; member([Y,X],Map). 

find_regions([],R,R).
find_regions([[X,Y]|S], R,A) :-
 (member(X,R) ->
  (member(Y,R) -> find_regions(S,R,A)     ; find_regions(S,[Y|R],A))  ;
  (member(Y,R) -> find_regions(S,[X|R],A) ; find_regions(S,[X,Y|R],A) )). 

color(Map,Colors,Coloring) :-
        find_regions(Map,[],Regions),
        color_all(Regions,Colors,Coloring),
        not(conflict(Map,Coloring)).
color_all([R|Rs],Colors,[[R,C]|A]) :-
        member(C,Colors),
        color_all(Rs,Colors,A).
color_all([],_,[]). 

conflict(Map,Coloring) :- member([R1,C],Coloring),
        member([R2,C],Coloring),
        adjacent(R1,R2,Map). 

map1([[1,2],[1,3],[1,4],[1,5],[2,3],[2,4],[3,4],[4,5]]).

which yields the following colourings:


"query"
[[map1 M] [color M [red green blue yellow] Coloring]]

M = [[1 2] [1 3] [1 4] [1 5] [2 3] [2 4] [3 4] [4 5]]
Coloring = [[5 red] [4 green] [3 red] [1 blue] [2 yellow]]
;
M = [[1 2] [1 3] [1 4] [1 5] [2 3] [2 4] [3 4] [4 5]]
Coloring = [[5 red] [4 green] [3 red] [1 yellow] [2 blue]]
...

You can browse the source files for the prolog interpreter here.

jonesforth64

July 20th, 2009

I have ported jonesforth to 64-bit x86 code.

jonesforth is a tutorial-style implementation of forth which explains in detail how the compiler and runtime is implemented. Porting the code to a slightly different assembly language helped me to think carefully about what each primitive does and about how they are used in the runtime code.

As noted in the jonesforth comments, the original advantage of using direct-threaded code on a 16-bit machine is that calling each word can be encoded in two bytes instead of three. That’s a savings of 33%. On 32-bit x86, it’s four bytes versus five, saving 20%. In my 64-bit implementation I chose to extend the 4-byte addresses to 8-byte words. This actually results in wasting space rather than saving it because on x86-64 calls and branches are usually encoded with a 5-byte instruction using relative displacement.

The port was fairly straightforward, I mostly just replaced 32-bit registers (eax, esp, esi etc) with the 64-bit equivalents (rax, rsp, rsi etc) and changed every reference to the word-size from 4 to 8.

The biggest difference is that syscalls use different registers on 64-bit linux and these registers can be clobbered during the call.

You can get the code from the mercurial repository:

hg clone http://subvert-the-dominant-paradigm.net/repos/jonesforth64/

To compile it:


gcc -m64 -nostdlib -static -Wl,-Ttext,0 -Wl,--build-id=none -o jonesforth64 jonesforth64.S

To run it:


cat jonesforth64.f - | ./jonesforth64
JONESFORTH VERSION 45
6393 CELLS REMAINING
OK
1 2 3 4
.S
4 3 2 1
ROT
.S
3 2 4 1

I’ve tested most of the code in the .f file, but I haven’t yet implemented C strings, file-io or the built-in assembler.

I’ve tried to keep the comments intact, but haven’t updated them to reflect different word sizes or registers etc.

Hullabaloo in the Guava Orchard

July 1st, 2009

Hullabaloo in the Guava Orchard

This book and “The Inheritance of Loss” make Kiran Desai one of my favourite authors.

ICFP ‘09

June 30th, 2009

I had the pleasure of participating in the ICFP contest again this year. I think I joined a haskell team in 2002 (which is funny because I’m not proficient in haskell at all!). But since then I’ve only been part of a Lisp team in 2007. That time I failed to implement a ropes-like library in enough time to get us enough momentum.

So this year it was great to try again, our team ended up being just myself and another Lisp hacker from New Zealand. Since I’m in South Africa there was quite a time zone difference between us and we both stayed up till early hours during the last two days.

As always it didn’t take long to write up the initial code for the interpreter, but we spent several hours tracking down bugs, parsing the inputs and figuring out how to approach the problem.

My partner then took over, he wrote all the physics and modeling code while I tried to provide moral support and starting hacking a visualizer.

Initially we used a lisp library called cgn, which was ok for the initial runs but proved too slow later on. cgn writes out data points to a text file (and we had to patch it to write double-floats correctly) and then feeds this to gnuplot.

The writing and reading of these ascii files was too slow for the kinds of scenarios we were modeling (tens of thousands of data points), so I started over, trimming the dataset by only using every 100th data point and writing data files in binary format.

Since I had never really used gnuplot before I spent several hours reading documentation and poking around until I was able to generate scripts like the following to render our data:


set key bottom right
plot [-100000000:100000000] ‘plots/ROCKET.dat’ binary record=11951X11951 format=”%float64%float64″ title ‘plots/ROCKET’, ‘plots/TARGET.dat’ binary record=11951X11951 format=”%float64%float64″ title ‘plots/TARGET’
pause -1

I was quite pleased with the results, but there are obviously better ways to display the trace if you have the time and create the right tools.

Satellite visualization

(Note the start and end labels are swapped in this picture)

In the end we never really got a solution for the 4th task, but managed to score about 2000 points with our solutions.

Church alpha 3

March 20th, 2009

This release fixes some bugs:

http://subvert-the-dominant-paradigm.net/~jewel/church/church-alpha-3.tar.gz

  • When I first wrote some of the church compiler passes and the sexp parser I used global variables to store state specific to that pass. I knew that only one thread would be running and that these passes didn’t have to be reentrant. Later I added functionality to compile macros on the fly and to compile dispatch matchers. When I did this I tried to rewrite all the modules that used global state, but I missed the sexp parser and this caused some strange bugs later on.
  • When allocating memory for dynamically allocated code, I was storing the address of the memory block in a fixnum. This worked fine until malloc started returning addresses from higher memory which used the most significant two bits in a word. These would get shifted out when tagging a fixnum and yield an invalid address. To work around this I now box the address by storing it in an array

Church alpha 2

March 11th, 2009

I have made a new release incorporating the performance work described in previous posts.

http://subvert-the-dominant-paradigm.net/~jewel/church/church-alpha-2.tar.gz

Compiling dispatch matchers at runtime

March 11th, 2009

After implementing inline slot caches I tried various approaches for caching method lookup. At first I thought I could store a global hash table which was keyed by combining the hash values of the types of the arguments passed to a function. Then by comparing the arguments types with the types stored in the table entries I could find the correct method to dispatch to. This turned out to be slower than my current implementation which simply walks the cons lists that describe each method’s type pattern and calling “type?” to test whether the argument is a subtype of the target type.

My eventual solution was to turn this lookup process into straight-line code by dynamically creating machine code to execute these tests.

Consider the following Church code:


length s:string
   ......

length n:nil
   0

length l:cons
   ......                                                                                                                                                 

length a:array
   ......


Here we check the type of the argument to distinguish between nil, strings, cons-lists and arrays. For nil and cons types we can generate an efficient check against the object tag, for strings and arrays we call out to “type?”. The following state code is generated for these tests:


 (DEFINE DISPATCH-MATCHER31
  (LAMBDA (ARG1 ARG2 ARG3 ARG4 ARG5 ARG6 ARG7)
    (LET ((ARG-COUNT (LOAD-ARGUMENT-COUNT)))
      (IF (= ARG-COUNT 1)
          (BEGIN
           (IF
            (AND (= (BAND ARG1 LOWTAG_BITS) TAG_REF)
                 (CHURCH-IF (TYPE? ARG1 137075987) 1 0))
            (RETURN (CALL-C 134993347 1 (LOAD-CLOSURE-POINTER) ARG1)))
           (IF
            (AND (= (BAND ARG1 LOWTAG_BITS) TAG_REF)
                 (CHURCH-IF (TYPE? ARG1 137075507) 1 0))
            (RETURN (CALL-C 134968518 1 (LOAD-CLOSURE-POINTER) ARG1)))
           (IF (= (BAND ARG1 LOWTAG_BITS) TAG_CONS)
               (RETURN (CALL-C 134968094 1 (LOAD-CLOSURE-POINTER) ARG1)))
           (IF (= ARG1 TAG_NIL)
               (RETURN (CALL-C 134968046 1 (LOAD-CLOSURE-POINTER) ARG1))))))
    (CHURCH-DISPATCH-MATCHER-FAILED 1))))

The first two tests check for the TAG_REF tag which is used for normal objects. If the tag matches we call “type?” with the literal address representing the class of the target type. If the test matches we can call the associated method directly.

The other two tests only compare the tag, making it quite efficient for primitive types like cons, fixnums, true and nil.

After generating this code the calling function is patched to jump to this matcher method directly.

Together with the inline slot caches these modifications speed the system up by about a factor of two.

Future work involves determining more precisely when to compile these dispatchers (at the moment I do it after a symbol has been dispatched through 5000 times) and optimizing the generated tests. (Even in this example there is redundant check for TAG_REF). It is probably also possible to be more efficient when overlapping types are involved.

Inline slot caches

March 11th, 2009

As part of my performance work I have implemented inline slot caches for slot reads and writes in Church. These are implemented by patching the program code at runtime with assembly code that checks the type of the target object and loads the correct offset for slot access.

In this example we see the code that prepares a call to ‘church-fixup-initial-slot-access’. The first two arguments on the stack (%esp) and 0×4(%esp) are the argument-count and closure pointer used in the State calling convention. The next two arguments 0×8(%esp) and 0xc(%esp) are the object being accessed and the symbol representing the name of the slot to be accessed.


0x080ce733 mov    %ebx,0xc(%esp)
0x080ce737 mov    %edx,0x8(%esp)
0x080ce73b mov    %ecx,0x4(%esp)
0x080ce73f mov    %eax,(%esp)
0x080ce742 call   0x80ab3b4 
0x080ce747 mov    %eax,%eax
0x080ce749 nop
0x080ce74a nop
0x080ce74b nop
0x080ce74c nop    


The fixup routine gets the type of the object and examines all the parent classes to determine the correct offset for accessing the slot in this object. It then generates x86 machine code and patches the calling function. At the moment I do this by directly emitting a byte sequence for each instruction, this is quite crude and error-prone but manageable when such a small amount of code is being generated.


              (write-byte! patch-start #x90)
              (write-byte! patch-start #x90)
              (write-byte! patch-start #x90)
              (write-byte! patch-start #x90)
              (write-byte! patch-start #x90)

              (write-word! patch-start #x08244c8b)
              (write-byte! patch-start #x81)
              (write-byte! patch-start #x39)
              (write-word! patch-start obj-type)
                                        ;je
              (write-byte! patch-start #x74)
              (write-byte! patch-start #x7)
...

First the old call is overwritten with nops and then we emit some comparison and jump instructions. The final output looks like this:


0x080ce733 mov    %ebx,0xc(%esp)
0x080ce737 mov    %edx,0x8(%esp)
0x080ce73b mov    %ecx,0x4(%esp)
0x080ce73f mov    %eax,(%esp)
0x080ce742 nop
0x080ce743 nop
0x080ce744 nop
0x080ce745 nop
0x080ce746 nop
0x080ce747 mov    0x8(%esp),%ecx
0x080ce74b cmpl   $0x8302993,(%ecx)
0x080ce751 je     0x80ce75a 
0x080ce753 call   0x80ab8ba 
0x080ce758 jmp    0x80ce760 
0x080ce75a mov    0x4(%ecx),%eax
0x080ce760 mov    %eax,%eax
0x080ce762 nop
0x080ce763 nop
0x080ce764 nop
0x080ce765 nop

The untagged object pointer is moved into %ecx and the first word (which points to the class wrapper for this object) is compared with the literal address of the class wrapper seen the first time. If it is the same, we simply load the slot at the precomputed offset (0×4) and store it in %eax. If not we jump to a runtime function which does a conventional (but much slower) lookup.

Macros and dynamic compilation

February 17th, 2009

I have released the next version of my Church-State system at:

http://subvert-the-dominant-paradigm.net/~jewel/church/church-alpha-1.tar.gz.

This release adds proper Church macros and dynamic compilation to the system. In this example, a macro for moving an immediate value to a register


macro MOVLir imm reg
        <<
                _Or_L 0xb8 (_r4 \reg) \imm

>>

you can see quoted code wrapped in angular brackets and backslashes used to unquote values that must be inserted into the template.

The macro syntax is not final (like most of Church’s syntax) and it should look cleaner as it matures.

To handle macros like these the system is now capable of taking Church source code parsed out of the macro body, compiling it to low-level State code and then compiling and linking the resulting machine code into the running image.

This involves resolving low level symbols (I use dlsym) and modifying the Church dispatch tables and macro rewriters to use the new macros. I also have to run an initialization function which sets up the “code vector” with constant values (interned symbols etc).

Now that I have dynamic compilation working, it should be fairly easy to add a REPL to the system.

Performance

As part of this release I have also disabled a lot of optimizations that I had worked on before. These include inline caches for method dispatch and slot access. The reason I have disabled these optimizations is that they cost too much in terms of space compared to the benefit they provide in improved speed.

I’m now pursuing a new approach which uses “class wrappers” marked with random seeds. The idea is that these seeds can be used to hash into lookup tables which memoize the effective method for dispatch. I hope to be able to incorporate these ideas plus others from contemporary implementations (of javascript vms etc) to make the system substantially faster.

Church release

February 6th, 2009

I’m proud to have reached the stage where my Church-State system can compile itself (ie the compiler is bootstrapped).

I have made the first alpha release available at:

http://subvert-the-dominant-paradigm.net/~jewel/church/church-alpha-0.tar.gz

To try it out you’ll need a 32-bit x86 linux system with “ld” installed. (Usually ld will be installed if you’ve installed something like gcc).

There are two simple test files mentioned in the README and there are also instructions for bootstrapping the system.

One thing missing from the release is a compiler that compiles the output from the OMeta parser generator to Church files. That means it’s not possible to change the grammars just yet.

Another incomplete feature is that Church and State macros are hard-coded into the compiler. If you look at church-pass1.church and state-pass1.church you’ll see the various hard-coded macros (some of which are quite complex). To be able to include these macros in the source files where they are used I need to be able to dynamically compile and load church code. I’ve completed the first step of this process, see state-dynamic.church and church-test-dynamic-alloc.church for working code that can compile a church file down to native code, allocate memory for it and link it into the running image.

Once I have Church macros working, I plan to rewrite a lot of assembler-i386.church to use macros instead of functions for emitting machine instructions. I think that this will dramatically improve compilation times. While preparing for this release I did a lot of work on performance, even removing array bounds checking and some other safety checks to make it faster. Currently the system bootstraps in 90 seconds on my laptop, but my goal is to be 2 or 3 times as fast.