This is a new release of Church-State
This incorporates a new OMeta interpreter that I have written to replace the parser generator used in previous versions.
Implementing the Church parser with an interpreter rather than with a parser generator has two main benefits:
- There is no longer a requirement to have the church compiler available to generate a new parser. This caused some bootstrapping issues and made it harder to provide user-extensible parsing.
- The new approach is faster than before. Usually interpreters have more overhead than compiled equivalents but in this case the compilation overhead was very large for the generated church parser. Overall it’s faster to parse the church grammar file into an intermediate structure and interpret it than it is to generate and compile a very large generated parser.
The new system bootstraps in 28 seconds now (versus 34 seconds previously).
The most novel part of this new interpreter implementation is the way that I handle “grammar actions”.
Previously I implemented both a parser generator and an interpreter which used runtime code compilation to turn grammar actions into executable closures.
Take this example of the grammar rule for an assignment:
assignment = assignable-expression:lhs ws* "=" ws* expression(nil):rhs -> <<`[_assign ,lhs ,rhs]>>
which would parse the following Church code:
a = 1
In some parsing frameworks it is convenient to automatically construct a parse tree from a given grammar. In the case of scannerless parsers (which is what I use in Church) we would like to elide the nodes that would result from parsing whitespace. So in the example above the parts that we want to keep are labeled lhs and rhs. The section at the end describes a quasiquoted list template that has the values of lhs and rhs substituted into it.
When evaluated, this action will return something like this:
[_assign [_var "a"] [_number 1]]
To implement this one can create a closure with the following code:
fn arg1 arg2 -- `[_assign ,arg1 ,arg2]
which is called at parse time with the values for each labeled part.
This requires having the entire compiler framework available at parse time, including the parser which you are implementing!
To avoid this kind of bootstrapping issue and to simplify the parser, I decided to implement grammar actions with a simple quasiquoting syntax.
In the grammar actions implemented in the latest Church-State parser it is only possible to quote a list or symbol, unquote a variable or a list and specify a literal.
For example the following grammar rule:
pattern = cname:name ws* (ws* pattern-argument)*:args -> <<`(_pattern ,name ,@args)>>
creates a list starting with the symbol _pattern followed by the pattern name and then containing all the elements of the args list.
With closures it is possible to use list functions like cons, first and rest or to call routines to convert strings of digits into integers. This is not possible with quasiquoted actions and they require a later pass of the parse tree to do the necessary conversions. I have added routines to perform these transformations so that the resulting parse tree is exactly the same as that returned by the previous parser implementations.
The interpreter can be seen here:
along with the grammar for parsing ometa grammars: