You are looking at historical revision 30703 of this page. It may differ significantly from its current revision.

comparse

Comparse is a library of parser combinators similar to Haskell's Parsec. Its implementation is based on Drew Crampsie's smug for Common Lisp but slightly deviates in naming and provides a number of additional convenience combinators.

Caveat

As of now the API documentation is still incomplete, notably missing the main entry point procedure parse. Also, the Examples section is yet to be written.

Introduction

Comparse provides a library of parsers which can be applied to inputs in order to produce parse results. It also provides parser combinators to create new parsers. In Comparse inputs are represented by a parser-input type. It can hold a sequence of any kinds of items (often characters). For example:

(use comparse)

(define some-input
  (->parser-input "hello world"))

A parser then is a function which takes an input as its argument and, in case the parse succeeds, returns a parse result (which can be a value of any kind) and the remainder of the input (which is a parser-input again). In case the parse fails it returns #f.

A very basic parser provided by Comparse is the item parser which consumes one item from the given input and returns it as its parse result. If the input is empty, item fails. We can apply it to the input we defined above using the parse procedure

(parse item some-input)

Which gives us these results:

#\h
#<parser-input #\e #\l #\l #\o #\space #\w #\o #\r #\l #\d ...>

There is another basic parser called fail which always fails, no matter what's passed to it. These primitive parsers are not very useful on their own, of course. In order to do more interesting things Comparse provides a selection of parser combinators. Just like regular function combinators (such as compose or complement), parser combinators are functions which take one or more parsers as their arguments and combine them into a new parser. One of the basic parser combinators in Comparse is bind. It takes a parser and a procedure as its arguments and returns a parser which first applies the passed parser and, if it succeeds, calls the procedure with its parse result. The procedure then must return another parser which will be applied to the remainder of the parser that was passed to bind. Okay, that sounds pretty dazzling, so an example is in order:

(define double-item
  (bind item (lambda (x)
               (lambda (remainder)
                 (cons (list x x) remainder)))))

(parse double-item some-input)

Which gives us:

(#\h #\h)
#<parser-input #\e #\l #\l #\o #\space #\w #\o #\r #\l #\d ...>

So the procedure we pass to bind returns a parser which constructs a two-item list from the parse result and returns it with the untouched remainder. As returning a result from a parser without touching the remainder is needed quite frequently, Comparse provides the result parser combinator which does just that. Using it we can write double-item like this:

(define double-item
  (bind item (lambda (x)
               (result (list x x)))))

Comparse provides many convenient and more high-level parser combinators than the ones presented in this introdution. See the Examples section on how to use them to create parsers for practical tasks. Also note that you would usually not manually convert raw inputs (like strings or input ports) into a parser-input object like we did above. Instead you can pass the raw input object directly to the parse procedure which will automatically turn it into a parser-input for you.

A note on variadic combinators

Some parser combinators accept one or more parsers as their arguments. As a convenience when just one argument is passed and it is a list instead of a parser, that list is treated as the argument list. In other words those combinators implicitly apply themselves to the given (single) list argument.

API

[procedure] (result value)

Returns a parser which always succeeds, leaves the input untouched and returns value.

[procedure] (fail input)

A parser which always fails.

[procedure] (item input)

A parser which consumes and returns the first item of input, i.e. it returns that item and the tail of input. Fails if input is empty.

[procedure] (bind parser proc)

Returns a parser which applies parser and calls proc with its parse result if it succeeds. proc must return a parser which will be applied to the remainder returned by parser.

[procedure] (satisfies condition . args)

Returns a parser which consumes the next item of its input and applies condition to it like this: (apply condition item args). If that call returns #f, the parser fails. Otherwise it succeeds and returns the item as its result.

[procedure] (is x)

Returns a parser which succeeds when the next item of its input is eq? to x.

[procedure] (in collection . items)

Returns a parser which succeeds when the next item of the given input is contained in collection. Collection can be a SRFI 14 char-set or a list which is checked for membership with memq. Given more than one argument, membership of the next item is checked for with memq in (cons collection items).

[procedure] (sequence parser . parsers)

Returns a parser which applies the given parsers from left to right and returns the list of their results when all succeed. See also the note on variadic combinators.

[procedure] (char-seq str)

Returns a parser which consumes the chars of the given string str in the order contained therein, i.e. it parses a literal string. On success the parser returns str.

[procedure] (any-of parser . parsers)

Returns a parser which applies the given parsers from left to right and returns the result of the first one that succeeds. See also the note on variadic combinators.

[procedure] (all-of parser . parsers)

Returns a parser which applies the given parsers from left to right and returns the result of the last one if all succeed. See also the note on variadic combinators.

[procedure] (none-of parser . parsers)

Returns a parser which applies the given parsers from left to right and returns #t as its result if none of them succeed while it doesn't consume any items from the input. See also the note on variadic combinators.

[procedure] (none-of* parser [parsers ...] but)

Returns a parser which applies the given parsers from left to right. If none of them succeed, it applies the parser but and returns its result.

[procedure] (preceded-by parser . parsers)

Returns a parser which applies the given parsers from left to right to the input or the remainder of the preceding parser, respectively, and returns the result of the last one. See also the note on variadic combinators.

[procedure] (followed-by parser following-parser . following-parsers)

Returns a parser which applies parser if the given following-parser[s] succeed on its remainder from left to right. See also the note on variadic combinators which applies to the following-parser[s] argument(s) in this case.

[procedure] (enclosed-by open content close)

Returns a parser which applies the given parsers open, content, and close in that order to the input or the remainder of the preceding parser, respectively, and returns the result of content with the remainder of close.

Examples

Yet to be written.

About this egg

Source

The source code is hosted at Bitbucket. Feel free to fork it and send pull requests there.

Author

Moritz Heidkamp

Version history

0.1.0
Add parser-input wrapper type so as not to leak lazy-seq implementation detail. Add end-of-input parser. Improve performance of as-string and repeated combinators.
0.0.5
Turn latch into compile-time dependency
0.0.4
Fix potential apply limit exceedance in as-string. Extend repeated to parse exactly n repetitions.
0.0.3
Correct dependencies
0.0.2
Add memoization support
0.0.1
Initial release

License

 Copyright (c) 2013, Moritz Heidkamp
 All rights reserved.
 
 Redistribution and use in source and binary forms, with or without
 modification, are permitted provided that the following conditions are
 met:
 
 Redistributions of source code must retain the above copyright
 notice, this list of conditions and the following disclaimer.
 
 Redistributions in binary form must reproduce the above copyright
 notice, this list of conditions and the following disclaimer in the
 documentation and/or other materials provided with the distribution.
 
 Neither the name of the author nor the names of its contributors may
 be used to endorse or promote products derived from this software
 without specific prior written permission.
 
 THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
 FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
 COPYRIGHT HOLDERS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
 INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
 (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
 SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
 STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
 ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
 OF THE POSSIBILITY OF SUCH DAMAGE.