Prcc (Parser/Regex Combinator library for Chicken Scheme)

Introduction

Prcc is a PEG-like parser combinator library inspired by the Ruby gem rsec.

Each combinator is a procedure that accepts an opaque "context" object and returns an object representing its match, or #f if it does not match.
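This contract can be illustrated with a small, self-contained Scheme sketch. It is a hypothetical toy model, not prcc's implementation: it assumes the "context" is just the input string plus a position, and that a successful parser returns a pair (RESULT . NEXT-POSITION). The names toy-char and toy-parse-string are made up for this illustration.

```scheme
;; Hypothetical toy model, not prcc's API: a parser is a procedure of an
;; input string S and a position I.  It returns (RESULT . NEXT-POSITION)
;; on success and #f on failure.

;; Parser that matches the character C and returns it as a string.
(define (toy-char c)
  (lambda (s i)
    (and (< i (string-length s))
         (char=? (string-ref s i) c)
         (cons (string c) (+ i 1)))))

;; Run PARSER from position 0 and keep only its result (or #f).
;; Trailing input is left unconsumed, as in the documented parse-string.
(define (toy-parse-string s parser)
  (let ((r (parser s 0)))
    (and r (car r))))

(toy-parse-string "ab" (toy-char #\a)) ; => "a"
(toy-parse-string "b" (toy-char #\a))  ; => #f
```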

Combinators

[procedure] (char CHAR)

Generate a parser that matches the character CHAR and returns it as a string.

[procedure] (<c> CHAR)

Alias of char.

[procedure] (seq PARSER ...)

Sequence parser: each subparser must match, and their results are returned in a list.
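A sequence combinator threads the position from one subparser to the next and fails as soon as any subparser fails. The sketch below is a hypothetical toy model, not prcc's code: a parser is assumed to be a procedure of an input string and a position that returns (RESULT . NEXT-POSITION) or #f, and toy-char and toy-seq are illustrative names.

```scheme
;; Toy character parser: (RESULT . NEXT-POSITION) on success, #f on failure.
(define (toy-char c)
  (lambda (s i)
    (and (< i (string-length s))
         (char=? (string-ref s i) c)
         (cons (string c) (+ i 1)))))

;; Run each parser in order, feeding each the position the previous one
;; stopped at, and collect their results in a list.
(define (toy-seq . parsers)
  (lambda (s i)
    (let loop ((ps parsers) (i i) (acc '()))
      (if (null? ps)
          (cons (reverse acc) i)
          (let ((r ((car ps) s i)))
            (and r                       ; any failure fails the whole sequence
                 (loop (cdr ps) (cdr r) (cons (car r) acc))))))))

((toy-seq (toy-char #\a) (toy-char #\b)) "abc" 0) ; => (("a" "b") . 2)
((toy-seq (toy-char #\a) (toy-char #\b)) "acb" 0) ; => #f
```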

[procedure] (<and> PARSER ...)

Alias of seq.

[procedure] (sel PARSER ...)

Ordered choice (branch) parser: tries each PARSER in turn and returns the result of the first one that matches.
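Ordered choice can be sketched in a few lines. This is a hypothetical toy model, not prcc's implementation: a parser is assumed to be a procedure of an input string and a position returning (RESULT . NEXT-POSITION) or #f, and toy-char and toy-sel are illustrative names.

```scheme
;; Toy character parser: (RESULT . NEXT-POSITION) on success, #f on failure.
(define (toy-char c)
  (lambda (s i)
    (and (< i (string-length s))
         (char=? (string-ref s i) c)
         (cons (string c) (+ i 1)))))

;; Ordered choice: try each parser at the same position and return the
;; first success; fail only if every alternative fails.
(define (toy-sel . parsers)
  (lambda (s i)
    (let loop ((ps parsers))
      (and (pair? ps)
           (or ((car ps) s i)
               (loop (cdr ps)))))))

((toy-sel (toy-char #\a) (toy-char #\b)) "b" 0) ; => ("b" . 1)
((toy-sel (toy-char #\a) (toy-char #\b)) "c" 0) ; => #f
```

Because the choice is ordered, later alternatives are never tried once an earlier one succeeds, which is the usual PEG behaviour.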

[procedure] (<or> PARSER ...)

Alias of sel.

[procedure] (one? PARSER)

Optional parser: PARSER may match zero times or once. Returns the empty string if PARSER doesn't match.
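The optional combinator never fails: when PARSER does not match, it succeeds with "" without advancing. A minimal sketch, assuming a hypothetical toy model (not prcc's code) in which a parser is a procedure of an input string and a position returning (RESULT . NEXT-POSITION) or #f; toy-char and toy-optional are illustrative names.

```scheme
;; Toy character parser: (RESULT . NEXT-POSITION) on success, #f on failure.
(define (toy-char c)
  (lambda (s i)
    (and (< i (string-length s))
         (char=? (string-ref s i) c)
         (cons (string c) (+ i 1)))))

;; Optional: if P fails, succeed anyway with "" and stay at the same position.
(define (toy-optional p)
  (lambda (s i)
    (or (p s i)
        (cons "" i))))

((toy-optional (toy-char #\a)) "ab" 0) ; => ("a" . 1)
((toy-optional (toy-char #\a)) "b" 0)  ; => ("" . 0)
```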

[procedure] (<?> PARSER)

Alias of one?.

[procedure] (rep PARSER)

Match PARSER zero or more times. Returns a list of the results of PARSER, one item per match found.

Example:

(parse-string "aabba" (rep (sel (char #\a) (char #\b))))
=> ("a" "a" "b" "b" "a")

[procedure] (<*> PARSER)

Alias of rep.

[procedure] (rep+ PARSER)

Match PARSER one or more times.
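The difference between rep and rep+ is only whether zero matches count as success. The sketch below is a hypothetical toy model, not prcc's code: a parser is assumed to be a procedure of an input string and a position returning (RESULT . NEXT-POSITION) or #f. The names toy-char, toy-rep and toy-rep+ are illustrative, and the "must advance" check is an extra safeguard of this sketch, not something the documentation specifies.

```scheme
;; Toy character parser: (RESULT . NEXT-POSITION) on success, #f on failure.
(define (toy-char c)
  (lambda (s i)
    (and (< i (string-length s))
         (char=? (string-ref s i) c)
         (cons (string c) (+ i 1)))))

;; Zero or more: apply P until it fails.  Always succeeds.  Stopping when P
;; does not advance avoids looping forever on empty matches.
(define (toy-rep p)
  (lambda (s i)
    (let loop ((i i) (acc '()))
      (let ((r (p s i)))
        (if (and r (> (cdr r) i))
            (loop (cdr r) (cons (car r) acc))
            (cons (reverse acc) i))))))

;; One or more: like toy-rep, but fail when P matched zero times.
(define (toy-rep+ p)
  (lambda (s i)
    (let ((r ((toy-rep p) s i)))
      (and (pair? (car r)) r))))

((toy-rep+ (toy-char #\a)) "aab" 0) ; => (("a" "a") . 2)
((toy-rep+ (toy-char #\a)) "b" 0)   ; => #f
```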

[procedure] (<+> PARSER)

Alias of rep+.

[procedure] (pred PARSER0 PARSER1)

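Positive lookahead: PARSER0 must match, and then PARSER1 must match at the position that follows, without consuming its input. The result of PARSER0 is returned.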

Example:

(parse-string "a" (pred (char #\a) (eof)))
=> "a"
;; If we had used (seq), we would get '("a" "")

;; This also allows us to ensure this is the entire string:
(parse-string "ab" (pred (char #\a) (eof)))
=> #f

;; Without the lookahead, it will simply consume as much as possible:
(parse-string "ab" (char #\a))
=> "a"

[procedure] (<&> PARSER0 PARSER1)

Alias of pred.

[procedure] (pred! PARSER0 PARSER1)

Negative lookahead: PARSER0 must match, and then PARSER1 must not match at the position that follows.
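Both lookahead forms can be sketched side by side. This is a hypothetical toy model, not prcc's implementation: a parser is assumed to be a procedure of an input string and a position returning (RESULT . NEXT-POSITION) or #f, and toy-char, toy-eof, toy-pred and toy-pred! are illustrative names. The key point is that PARSER1 is only tested; the position returned is the one PARSER0 reached.

```scheme
;; Toy character parser: (RESULT . NEXT-POSITION) on success, #f on failure.
(define (toy-char c)
  (lambda (s i)
    (and (< i (string-length s))
         (char=? (string-ref s i) c)
         (cons (string c) (+ i 1)))))

;; End of input: succeeds with "" only when no characters remain.
(define (toy-eof)
  (lambda (s i)
    (and (= i (string-length s))
         (cons "" i))))

;; Positive lookahead: P0 must match and P1 must match after it.
;; P1's input is not consumed; P0's result and position are returned.
(define (toy-pred p0 p1)
  (lambda (s i)
    (let ((r (p0 s i)))
      (and r (p1 s (cdr r)) r))))

;; Negative lookahead: P0 must match and P1 must NOT match after it.
(define (toy-pred! p0 p1)
  (lambda (s i)
    (let ((r (p0 s i)))
      (and r (not (p1 s (cdr r))) r))))

((toy-pred (toy-char #\a) (toy-eof)) "a" 0)            ; => ("a" . 1)
((toy-pred! (toy-char #\a) (toy-char #\b)) "ab" 0)     ; => #f
```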

[procedure] (<&!> PARSER0 PARSER1)

Alias of pred!.

[procedure] (eof)

Matches the end of the input.

[procedure] (act PARSER [SUCC-PROC] [FAIL-PROC])

Act on the result of PARSER: SUCC-PROC is called when PARSER succeeds, and FAIL-PROC when it fails.

This allows you to add semantic actions to the parser.

Note: Be sure not to return #f from SUCC-PROC, because that value will be filtered out.

Example:

(define a-or-b (sel (char #\a) (char #\b)))
(parse-string "aabba" (rep (act a-or-b (lambda (x) (if (string=? "a" x) 'yes 'no)))))
=> (yes yes no no yes)

[procedure] (<@> PARSER [SUCC-PROC] [FAIL-PROC])

Alias of act.

[procedure] (neg PARSER)

Invert PARSER: treat a failure of PARSER as a successful match.

[procedure] (<^> PARSER)

Alias of neg.

[procedure] (regexp-parser STRING [CHUNK-SIZE])

Generate a parser that matches the regular expression given by STRING.

[procedure] (<r> STRING [CHUNK-SIZE])

Alias of regexp-parser.

[syntax] (lazy PARSER)

Defer the binding of parser. This is useful for mutually recursive parsers, as PARSER can be defined after the use of the lazy parser.

Example:

;; Without "lazy" around bar, this would give an error that
;; bar is not yet defined.
(define foo (sel (char #\x) (lazy bar)))
(define bar (char #\y))

[procedure] (cached PARSER)

Cache the results of PARSER (packrat parsing).
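Packrat parsing memoizes each parser's result per input position, so backtracking never re-runs the same parser at the same place. The sketch below is a hypothetical toy model, not prcc's implementation: a parser is assumed to be a procedure of an input string and a position returning (RESULT . NEXT-POSITION) or #f, and toy-cached, counted-a and text are illustrative names.

```scheme
;; Toy packrat cache: P's results (including failures, #f) are remembered
;; per position for the current input string, so P runs at most once per
;; position of that string.
(define (toy-cached p)
  (let ((input #f) (memo '()))          ; memo is an alist: position -> result
    (lambda (s i)
      (unless (eq? s input)             ; different input string: reset table
        (set! input s)
        (set! memo '()))
      (let ((hit (assv i memo)))
        (if hit
            (cdr hit)
            (let ((r (p s i)))
              (set! memo (cons (cons i r) memo))
              r))))))

;; A parser for #\a that counts how often it is actually invoked.
(define calls 0)
(define (counted-a s i)
  (set! calls (+ calls 1))
  (and (< i (string-length s))
       (char=? (string-ref s i) #\a)
       (cons "a" (+ i 1))))

(define cached-a (toy-cached counted-a))
(define text "aa")
(cached-a text 0) ; => ("a" . 1), runs counted-a
(cached-a text 0) ; => ("a" . 1), answered from the table
calls             ; => 1
```

Keying the table on the eq? identity of the input string is a simplification for this sketch.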

Helpers

[procedure] (str STRING)

Generate a parser that matches the literal STRING.

[procedure] (<s> STRING)

Alias of str.

[procedure] (one-of STRING)

Generate a parser that matches any single character contained in STRING.
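A character-class parser only needs a membership test on the next character. A minimal sketch, assuming a hypothetical toy model (not prcc's code) in which a parser is a procedure of an input string and a position returning (RESULT . NEXT-POSITION) or #f; toy-one-of is an illustrative name.

```scheme
;; Match any single character that occurs in CHARS, returning it as a string.
(define (toy-one-of chars)
  (let ((allowed (string->list chars)))
    (lambda (s i)
      (and (< i (string-length s))
           (memv (string-ref s i) allowed)   ; membership test on the next char
           (cons (string (string-ref s i)) (+ i 1))))))

((toy-one-of "+-") "-1" 0) ; => ("-" . 1)
((toy-one-of "+-") "1" 0)  ; => #f
```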

[procedure] (join+ PARSER0 PARSER1)

Repeat PARSER0 one or more times, interspersed by PARSER1.

Example:

;; Parse an array of "a" or "b" identifiers:
;; This can be done more elegantly with rep+_
(define ident (sel (char #\a) (char #\b)))

(parse-string
   "[a,b,b,a]"
   (even (ind (seq (char #\[) (join+ ident (char #\,)) (char #\])) 1)))

=> ("a" "b" "b" "a")

[procedure] (join+_ PARSER0 PARSER1 [skip: PARSER2])

Repeat PARSER0 one or more times, interspersed by PARSER1, while skipping PARSER2. By default, PARSER2 is the whitespace parser (<s*>).

[procedure] (ind SEQ-PARSER INDEX)

Return the element of SEQ-PARSER's output list at position INDEX (counting from zero).

Example:

(parse-string "xy" (ind (seq (char #\x) (char #\y)) 1))
=> "y"

[procedure] (<#> SEQ-PARSER INDEX)

Alias of ind.

[procedure] (<w>)

A single word character (any uppercase or lowercase letter, digit or underscore, i.e. the same as (<r> "\\w")).

[procedure] (<w*>)

Zero or more word characters.

[procedure] (<w+>)

One or more word characters.

[procedure] (<space>)

One whitespace character (space, tab or newline).

[procedure] (<s*>)

Zero or more whitespace characters.

[procedure] (<s+>)

One or more whitespace characters.

[procedure] (rep_ PARSER0 [skip: PARSER1])

Repeat PARSER0 zero or more times, skipping PARSER1 between repetitions. By default, PARSER1 is the whitespace parser (<s*>).
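The skipping behaviour can be sketched as follows. This is a hypothetical toy model, not prcc's code: a parser is assumed to be a procedure of an input string and a position returning (RESULT . NEXT-POSITION) or #f, and toy-rep_ takes the skip parser as a plain second argument instead of the skip: keyword. The documentation does not say exactly where PARSER1 is applied; this sketch assumes it runs only between repetitions and that its output is discarded.

```scheme
;; Toy character parser: (RESULT . NEXT-POSITION) on success, #f on failure.
(define (toy-char c)
  (lambda (s i)
    (and (< i (string-length s))
         (char=? (string-ref s i) c)
         (cons (string c) (+ i 1)))))

;; Zero or more whitespace characters (playing the role of <s*>).
(define (toy-spaces)
  (lambda (s i)
    (let loop ((j i))
      (if (and (< j (string-length s))
               (char-whitespace? (string-ref s j)))
          (loop (+ j 1))
          (cons (substring s i j) j)))))

;; Zero or more matches of P with SKIP run between matches.  Input skipped
;; after the last match is not consumed.
(define (toy-rep_ p skip)
  (lambda (s i)
    (let loop ((i i) (acc '()))
      (let* ((k (and (pair? acc) (skip s i)))  ; no skipping before the first match
             (j (if k (cdr k) i))
             (r (p s j)))
        (if (and r (> (cdr r) i))              ; stop on failure or no progress
            (loop (cdr r) (cons (car r) acc))
            (cons (reverse acc) i))))))

((toy-rep_ (toy-char #\a) (toy-spaces)) "a a  a b" 0) ; => (("a" "a" "a") . 6)
```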

[procedure] (<*_> PARSER0 [skip: PARSER1])

Alias of rep_.

[procedure] (rep+_ PARSER0 [skip: PARSER1])

Repeat PARSER0 one or more times, skipping PARSER1 between repetitions. By default, PARSER1 is the whitespace parser (<s*>).

Example:

;; Parse an array of "a" or "b" identifiers:
(define ident (sel (char #\a) (char #\b)))

(parse-string
   "[a,b,b,a]"
   (ind (seq (char #\[) (rep+_ ident skip: (char #\,)) (char #\])) 1))
=> ("a" "b" "b" "a")

[procedure] (<+_> PARSER0 [skip: PARSER1])

Alias of rep+_.

[procedure] (seq_ PARSER ... [skip: PARSER1])

Like seq, but PARSER1 is skipped between the subparsers. By default, PARSER1 is the whitespace parser (<s*>).

[procedure] (and_ PARSER ... [skip: PARSER1])

Alias of seq_.

[procedure] (even SEQ-PARSER)

Generate a parser which returns the elements at even-numbered positions of sequence parser output, collected in a list.

Note: This starts counting at zero!

Example:

(parse-string "abcde" (even (seq (char #\a) (char #\b) (char #\c) (char #\d) (char #\e))))
=> ("a" "c" "e")

[procedure] (odd SEQ-PARSER)

Generate a parser which returns the elements at odd-numbered positions of sequence parser output, collected in a list.

Note: This starts counting at zero!

Example:

(parse-string "abcde" (odd (seq (char #\a) (char #\b) (char #\c) (char #\d) (char #\e))))
=> ("b" "d")

[procedure] (parse-file FILENAME PARSER [CACHE])

Parse a file with PARSER. By default, results are not cached (CACHE is #f).

[procedure] (parse-string STRING PARSER [CACHE])

Parse a string with PARSER. By default, results are not cached (CACHE is #f).

[syntax] (parse-port PORT PARSER [CACHE])

Parse from PORT with PARSER. By default, results are not cached (CACHE is #f).

Example

(use prcc)

(define parser
  (<and>
    (<@> (<s> "hello")
      (lambda (o) "hello "))
    (<s> "world")
    (eof)))

(display (parse-string "helloworld" parser))
(newline)

More information

PEG wiki page

Packrat Parsing and Parsing Expression Grammars

Author

Wei Hu

License

 Copyright (C) 2012, Wei Hu
 All rights reserved.
 
 Redistribution and use in source and binary forms, with or without
 modification, are permitted provided that the following conditions are met:
 
 Redistributions of source code must retain the above copyright notice, this
 list of conditions and the following disclaimer.
 Redistributions in binary form must reproduce the above copyright notice,
 this list of conditions and the following disclaimer in the documentation
 and/or other materials provided with the distribution.
 Neither the name of the author nor the names of its contributors may be
 used to endorse or promote products derived from this software without
 specific prior written permission.
 
 THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
 AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDERS OR CONTRIBUTORS BE
 LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
 CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
 SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
 INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
 CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
 ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
 POSSIBILITY OF SUCH DAMAGE.

Version History

0.1
initial release