Inspired by the metacircular parser written by D. V. Schorre, and by Ian Piumarta’s related work in his pepsi/coke project, I wrote a parser generator that can generate itself.

I bootstrapped the parser by using cl-peg to generate Lisp code that could eventually generate itself.

I have made several versions of metapeg available. The first two are simple Lisp and Scheme implementations. The latter two are more complex and allow the parser to walk back up the parse tree during the parse to examine the text matched by previous parse nodes. This functionality allows easy implementation of a @tag construct, which is used to implement the indentation rules in languages like Haskell, Python and YAML (as suggested here).
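To make the idea of examining previously matched text concrete, here is a minimal sketch in Python rather than Lisp. It is not metapeg's implementation, only my reading of the idea: one function captures the whitespace that opens a block, and a second function succeeds later only if the input repeats that exact text. The names `match_ws`, `match_same` and `parse_block` are hypothetical.

```python
def match_ws(text, pos):
    """Match one or more spaces/tabs at pos; return (matched_text, new_pos) or None."""
    end = pos
    while end < len(text) and text[end] in " \t":
        end += 1
    return (text[pos:end], end) if end > pos else None


def match_same(text, pos, previous):
    """Back-reference: succeed only if the text at pos repeats `previous` exactly."""
    if text.startswith(previous, pos):
        return previous, pos + len(previous)
    return None


def parse_block(text, start=0):
    """Collect "-" items that share the indentation of the first line.

    Returns (items, pos), where pos is the start of the first line
    that does not belong to the block.
    """
    first = match_ws(text, start)  # plays the role of a rule like `inset <- ws+`
    if first is None:
        return [], start
    indent, pos = first
    if not text.startswith("-", pos):
        return [], start
    items = []
    while True:
        end = text.find("\n", pos)
        if end == -1:
            end = len(text)
        items.append(text[pos:end])
        pos = min(end + 1, len(text))
        # Plays the role of `@inset`: the same indentation must repeat,
        # and an item must follow it directly (deeper indentation fails).
        again = match_same(text, pos, indent)
        if again is None or not text.startswith("-", again[1]):
            return items, pos
        pos = again[1]
```

For example, `parse_block("  - a\n  - b\n - c\n")` collects the first two items and stops at the third line, whose indentation differs from the captured one.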

As an example, I wrote a simple parser for YAML sequences. Here is some sample input:

 - 'foo'
 - 'bar'
 - 'yah'
  - 'a'
  - 'b'
  - 'c'

This is the grammar:

program <- seq-element+ 

inset <- @inset ws+
ws <- [ \\t]
nl <- [\\n]
ws_or_nl <- ws/nl

seq-element <- "-" ws* string nl { (third data) } / nested-sequence
nested-sequence <- "-" ws* nl inset seq-element (@inset seq-element)* { 
(cons (fifth data) (zip-second (sixth data))) }

string <- "'" [ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz]+ "'" { 
(char-list-to-string (second data)) }

and the output:

CL-USER> (value (parse "../examples/nested_sequence.yaml" "yaml.lisp"))
(("FOO" "BAR" "YAH" ("A" "B" "C")))

The parsers generated by metapeg do not memoize parse results. When a rule backtracks, the same input may be parsed again from the same position, which may make the parsers unsuitable for large grammars or large inputs.
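The cost of skipping memoization can be sketched with a toy recognizer, written in Python purely for illustration and unrelated to the code metapeg generates. The grammar, the `Parser` class and the `count_calls` helper are all hypothetical. Each rule application goes through `apply`, which optionally caches results keyed by rule name and input position, in the style of a packrat parser.

```python
class Parser:
    def __init__(self, text, memoize):
        self.text = text
        self.memo = {} if memoize else None
        self.calls = 0  # number of rule bodies actually executed

    def apply(self, rule, pos):
        """Run rule at pos, consulting the (rule name, pos) cache when enabled."""
        key = (rule.__name__, pos)
        if self.memo is not None and key in self.memo:
            return self.memo[key]
        self.calls += 1
        result = rule(pos)
        if self.memo is not None:
            self.memo[key] = result
        return result

    def lit(self, s, pos):
        """Match literal s at pos; return the new position or None."""
        if pos is not None and self.text.startswith(s, pos):
            return pos + len(s)
        return None

    def expr(self, pos):
        # expr <- term "+" expr / term "-" expr / term
        for op in "+-":
            p = self.lit(op, self.apply(self.term, pos))
            if p is not None:
                p = self.apply(self.expr, p)
                if p is not None:
                    return p
        return self.apply(self.term, pos)

    def term(self, pos):
        # term <- "(" expr ")" / "x"
        p = self.lit("(", pos)
        if p is not None:
            p = self.lit(")", self.apply(self.expr, p))
            if p is not None:
                return p
        return self.lit("x", pos)


def count_calls(text, memoize):
    """Return (end position, number of rule bodies executed)."""
    parser = Parser(text, memoize)
    end = parser.apply(parser.expr, 0)
    return end, parser.calls


if __name__ == "__main__":
    text = "(" * 6 + "x" + ")" * 6
    print("without memoization:", count_calls(text, False))
    print("with memoization:   ", count_calls(text, True))
```

In this grammar each alternative of `expr` starts with `term`, so a failed alternative re-parses the same `term` from the same position. Without the cache, the number of rule executions grows exponentially with the nesting depth. With it, each rule runs at most once per position.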