You are looking at historical revision 19066 of this page. It may differ significantly from its current revision.
abnf
Description
abnf is a collection of combinators that help construct parsers for Augmented Backus-Naur Form (ABNF) grammars (RFC 4234).
Library Procedures
The combinator procedures in this library are based on the interface provided by the lexgen library.
<CoreABNF> typeclass
The procedures of this library are provided as fields of the <CoreABNF> typeclass. Please see the typeclass library for information on type classes.
The <CoreABNF> class is intended to provide abstraction over different kinds of input sequences, such as character lists, strings, or streams. The following example illustrates the creation of an instance of <CoreABNF> specialized for character lists. This code is also provided as the abnf-charlist egg, which is fully compatible with abnf prior to version 3.0.
(require-extension typeclass input-classes abnf)

;; Character lists as input: emptiness test, head, tail
(define char-list-<Input> (make-<Input> null? car cdr))
(define char-list-<Token> (Input->Token char-list-<Input>))
(define char-list-<CharLex> (Token->CharLex char-list-<Token>))
(define char-list-<CoreABNF>
  (Token.CharLex->CoreABNF char-list-<Token> char-list-<CharLex>))

;; Bring the <CoreABNF> fields (concatenation, alternatives, ...) into scope
(import-instance (<CoreABNF> char-list-<CoreABNF>) )
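The idea behind the three arguments to make-<Input> can be shown without the typeclass machinery. The following sketch is written in plain Scheme and does not use the typeclass or input-classes eggs. It models an input class as three procedures: an emptiness test, a head accessor and a tail accessor. The names make-input, input->list, char-list-input and string-input are hypothetical. The same generic input->list drains both a character list and a string paired with an index.

```scheme
;; Hypothetical model of an input class: three procedures bundled in a list,
;; mirroring (make-<Input> null? car cdr). Not the input-classes API.
(define (make-input empty? head tail) (list empty? head tail))
(define (input-empty-proc in) (car in))
(define (input-head-proc in) (cadr in))
(define (input-tail-proc in) (caddr in))

;; Generic code that uses only the three procedures, never the sequence type.
(define (input->list in seq)
  (let loop ((s seq) (acc '()))
    (if ((input-empty-proc in) s)
        (reverse acc)
        (loop ((input-tail-proc in) s)
              (cons ((input-head-proc in) s) acc)))))

;; Instance for character lists, analogous to char-list-<Input>.
(define char-list-input (make-input null? car cdr))

;; Instance for strings: a position is a pair (STRING . INDEX).
(define string-input
  (make-input (lambda (p) (>= (cdr p) (string-length (car p))))
              (lambda (p) (string-ref (car p) (cdr p)))
              (lambda (p) (cons (car p) (+ (cdr p) 1)))))

(input->list char-list-input (string->list "abc")) ; => (#\a #\b #\c)
(input->list string-input (cons "abc" 0))          ; => (#\a #\b #\c)
```

Only the three procedures differ between the two instances. That separation is what lets one set of combinators serve several kinds of input.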
Terminal values and core rules
The following procedures are provided as fields in the <CoreABNF> typeclass:
[procedure] (char CHAR) => MATCHER
Procedure char builds a pattern matcher function that matches a single character.
[procedure] (lit STRING) => MATCHER
lit matches a literal string (case-insensitive).
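The following self-contained Scheme sketch illustrates what a matcher does. It assumes a simplified representation chosen for illustration, which is not necessarily the one lexgen uses:

- A stream is a pair of consumed and remaining characters.
- A matcher maps a list of streams to a list of streams.
- An empty result list means the match failed.

The names run, consumed->string, match-pred, match-char and match-lit are all hypothetical.

```scheme
;; Assumed model (illustration only): a stream is (CONSUMED . REMAINING),
;; CONSUMED kept in reverse order; a matcher maps a stream list to a
;; stream list, and the empty list means failure.

;; Start matcher M on a single stream built from string STR.
(define (run m str)
  (m (list (cons '() (string->list str)))))

;; The text consumed by stream S, in input order.
(define (consumed->string s)
  (list->string (reverse (car s))))

;; Advance each stream whose next character satisfies PRED; drop the others.
(define (match-pred pred)
  (lambda (streams)
    (let loop ((ss streams) (acc '()))
      (if (null? ss)
          (reverse acc)
          (let ((consumed (caar ss))
                (rest (cdar ss)))
            (loop (cdr ss)
                  (if (and (pair? rest) (pred (car rest)))
                      (cons (cons (cons (car rest) consumed) (cdr rest))
                            acc)
                      acc)))))))

;; Hypothetical counterpart of (char CHAR): match exactly C.
(define (match-char c)
  (match-pred (lambda (x) (char=? x c))))

;; Hypothetical counterpart of (lit STRING): match S ignoring case,
;; one character at a time.
(define (match-lit s)
  (lambda (streams)
    (let loop ((cs (string->list s)) (ss streams))
      (if (null? cs)
          ss
          (loop (cdr cs)
                ((match-pred (lambda (x) (char-ci=? x (car cs)))) ss))))))

(map consumed->string (run (match-lit "GET") "get /")) ; => ("get")
(run (match-char #\a) "b")                             ; => ()
```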
The following primitive parsers match the core rules described in RFC 4234, Appendix B.1.
[procedure] (alpha STREAM-LIST) => STREAM-LIST
Matches any letter of the alphabet (ALPHA in RFC 4234: A-Z and a-z).
[procedure] (binary STREAM-LIST) => STREAM-LIST
Matches [0..1].
[procedure] (decimal STREAM-LIST) => STREAM-LIST
Matches [0..9].
[procedure] (hexadecimal STREAM-LIST) => STREAM-LIST
Matches [0..9] and [A..F,a..f].
[procedure] (ascii-char STREAM-LIST) => STREAM-LIST
Matches any 7-bit US-ASCII character except NUL (ASCII value 0).
[procedure] (cr STREAM-LIST) => STREAM-LIST
Matches the carriage return character.
[procedure] (lf STREAM-LIST) => STREAM-LIST
Matches the line feed character.
[procedure] (crlf STREAM-LIST) => STREAM-LIST
Matches the Internet newline, a carriage return followed by a line feed.
[procedure] (ctl STREAM-LIST) => STREAM-LIST
Matches any US-ASCII control character, that is, any character with a decimal value in the range [0..31,127].
[procedure] (dquote STREAM-LIST) => STREAM-LIST
Matches the double quote character.
[procedure] (htab STREAM-LIST) => STREAM-LIST
Matches the horizontal tab character.
[procedure] (lwsp STREAM-LIST) => STREAM-LIST
Matches linear white space, that is, any number of consecutive wsp, optionally followed by a crlf and at least one more wsp.
[procedure] (sp STREAM-LIST) => STREAM-LIST
Matches the space character.
[procedure] (vspace STREAM-LIST) => STREAM-LIST
Matches any printable ASCII character, that is, any character in the decimal range [33..126].
[procedure] (wsp STREAM-LIST) => STREAM-LIST
Matches space or horizontal tab.
[procedure] (quoted-pair STREAM-LIST) => STREAM-LIST
Matches a quoted pair. Any character except CR and LF may be quoted.
[procedure] (quoted-string STREAM-LIST) => STREAM-LIST
Matches a quoted string. The backslash and double quote characters must be escaped inside a quoted string; CR and LF are not allowed at all.
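This sketch uses the same simplified stream-list model, which is an assumption and not the library's internal representation. Under that model, most core rules reduce to character predicates plus sequencing. The code re-implements a few of them from the character ranges given in RFC 4234. The definitions are illustrative stand-ins with the same names, not the abnf source. The helpers run, match-pred, match-seq and code-in? are hypothetical.

```scheme
;; Assumed model: a stream is (CONSUMED . REMAINING), CONSUMED reversed.
(define (run m str)
  (m (list (cons '() (string->list str)))))

;; Advance each stream whose next character satisfies PRED.
(define (match-pred pred)
  (lambda (streams)
    (let loop ((ss streams) (acc '()))
      (if (null? ss)
          (reverse acc)
          (let ((consumed (caar ss))
                (rest (cdar ss)))
            (loop (cdr ss)
                  (if (and (pair? rest) (pred (car rest)))
                      (cons (cons (cons (car rest) consumed) (cdr rest))
                            acc)
                      acc)))))))

;; Feed the streams through each matcher in turn (sequencing).
(define (match-seq . ms)
  (lambda (streams)
    (let loop ((ms ms) (ss streams))
      (if (or (null? ms) (null? ss))
          ss
          (loop (cdr ms) ((car ms) ss))))))

;; True when the code point of C lies in [LO, HI].
(define (code-in? c lo hi)
  (let ((n (char->integer c)))
    (and (<= lo n) (<= n hi))))

;; Illustrative stand-ins for some core rules (ranges from RFC 4234).
(define alpha
  (match-pred (lambda (c) (or (code-in? c 65 90) (code-in? c 97 122)))))
(define decimal (match-pred (lambda (c) (code-in? c 48 57))))
(define hexadecimal
  (match-pred (lambda (c) (or (code-in? c 48 57)
                              (code-in? c 65 70)
                              (code-in? c 97 102)))))
(define ctl
  (match-pred (lambda (c) (or (code-in? c 0 31) (code-in? c 127 127)))))
(define vspace (match-pred (lambda (c) (code-in? c 33 126))))
(define wsp (match-pred (lambda (c) (memv (char->integer c) '(9 32)))))
(define cr (match-pred (lambda (c) (code-in? c 13 13))))
(define lf (match-pred (lambda (c) (code-in? c 10 10))))
(define crlf (match-seq cr lf))
```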
The following additional procedures are provided for convenience:
[procedure] (set CHAR-SET) => MATCHER
Matches any character from an SRFI-14 character set.
[procedure] (set-from-string STRING) => MATCHER
Matches any character from a set defined as a string.
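In the simplified stream-list model (an assumption for illustration), set-from-string amounts to a membership test on the next character. The name match-set-from-string is hypothetical. With SRFI-14 available, the predicate for a char-set version would instead be (lambda (c) (char-set-contains? cs c)). That variant is left out here so that the block needs no imports.

```scheme
;; Assumed model: a stream is (CONSUMED . REMAINING), CONSUMED reversed.
(define (run m str)
  (m (list (cons '() (string->list str)))))

;; Advance each stream whose next character satisfies PRED.
(define (match-pred pred)
  (lambda (streams)
    (let loop ((ss streams) (acc '()))
      (if (null? ss)
          (reverse acc)
          (let ((consumed (caar ss))
                (rest (cdar ss)))
            (loop (cdr ss)
                  (if (and (pair? rest) (pred (car rest)))
                      (cons (cons (cons (car rest) consumed) (cdr rest))
                            acc)
                      acc)))))))

;; Hypothetical counterpart of set-from-string: succeed when the next
;; character occurs anywhere in S.
(define (match-set-from-string s)
  (let ((members (string->list s)))
    (match-pred (lambda (c) (memv c members)))))

;; Example: the sign of a timezone offset such as "+0200".
(define sign (match-set-from-string "+-"))
```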
Operators
[procedure] (concatenation MATCHER-LIST) => MATCHER
concatenation matches an ordered list of rules. (RFC 4234, Section 3.1)
[procedure] (alternatives MATCHER-LIST) => MATCHER
alternatives matches any one of the given list of rules. (RFC 4234, Section 3.2)
[procedure] (range C1 C2) => MATCHER
range matches any character between C1 and C2, inclusive. (RFC 4234, Section 3.4)
[procedure] (variable-repetition MIN MAX MATCHER) => MATCHER
variable-repetition matches at least MIN and at most MAX consecutive elements that match the given rule. (RFC 4234, Section 3.6)
[procedure] (repetition MATCHER) => MATCHER
repetition matches zero or more consecutive elements that match the given rule.
[procedure] (repetition1 MATCHER) => MATCHER
repetition1 matches one or more consecutive elements that match the given rule.
[procedure] (repetition-n N MATCHER) => MATCHER
repetition-n matches exactly N consecutive occurrences of the given rule. (RFC 4234, Section 3.7)
[procedure] (optional-sequence MATCHER) => MATCHER
optional-sequence matches the given rule zero times or once. (RFC 4234, Section 3.8)
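The operators above can be modeled in the same simplified stream-list representation, which is an assumption and not the egg's implementation. This sketch uses a "list of successes" design:

- alternatives keeps the result of every alternative that succeeds.
- Repetition returns every admissible count, longest first.

The real abnf egg may commit to a single match instead. All match-* names are hypothetical. repetition, repetition1, repetition-n and optional-sequence are all derived from one bounded-repetition combinator.

```scheme
;; Assumed model: a stream is (CONSUMED . REMAINING), CONSUMED reversed.
(define (run m str) (m (list (cons '() (string->list str)))))
(define (consumed->string s) (list->string (reverse (car s))))

(define (match-pred pred)
  (lambda (streams)
    (let loop ((ss streams) (acc '()))
      (if (null? ss)
          (reverse acc)
          (let ((consumed (caar ss))
                (rest (cdar ss)))
            (loop (cdr ss)
                  (if (and (pair? rest) (pred (car rest)))
                      (cons (cons (cons (car rest) consumed) (cdr rest))
                            acc)
                      acc)))))))
(define (match-char c) (match-pred (lambda (x) (char=? x c))))

;; concatenation: feed the streams through each matcher in order.
(define (match-seq . ms)
  (lambda (streams)
    (let loop ((ms ms) (ss streams))
      (if (or (null? ms) (null? ss))
          ss
          (loop (cdr ms) ((car ms) ss))))))

;; alternatives: collect the results of every alternative.
(define (match-alt . ms)
  (lambda (streams)
    (apply append (map (lambda (m) (m streams)) ms))))

;; range: any character from C1 to C2, inclusive.
(define (match-range c1 c2)
  (match-pred (lambda (c) (and (char<=? c1 c) (char<=? c c2)))))

;; variable-repetition: LO to HI occurrences of M (HI = #f means no upper
;; bound), longest first. Assumes M consumes input whenever it succeeds.
(define (match-repeat lo hi m)
  (lambda (streams)
    (let loop ((n 0) (ss streams) (acc '()))
      (let ((acc (if (>= n lo) (append ss acc) acc)))
        (if (or (null? ss) (and hi (= n hi)))
            acc
            (loop (+ n 1) (m ss) acc))))))

(define (match-star m) (match-repeat 0 #f m)) ; like repetition
(define (match-plus m) (match-repeat 1 #f m)) ; like repetition1
(define (match-n n m) (match-repeat n n m))   ; like repetition-n
(define (match-opt m) (match-repeat 0 1 m))   ; like optional-sequence

(define digit (match-range #\0 #\9))
(map consumed->string (run (match-repeat 1 2 digit) "123")) ; => ("12" "1")
```

Returning every count means a caller, or an enclosing concatenation, can backtrack to a shorter repetition when the longer one leaves input that the next rule cannot match.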
[procedure] (pass) => MATCHER
This matcher returns without consuming any input.
[procedure] (bind F P) => MATCHER
Given a rule P and function F, returns a matcher that first applies P to the input stream, then applies F to the returned list of consumed tokens, and returns the result and the remainder of the input stream.
[procedure] (drop-consumed P) => MATCHER
Given a rule P, returns a matcher that always returns an empty list of consumed tokens when P succeeds.
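The behavior of pass, bind and drop-consumed can be illustrated in the same simplified stream-list model, which is an assumption for illustration only. In this sketch, the bind stand-in runs P from an empty consumed list so that it can see exactly which tokens P consumed. It then pushes F's result onto the outer consumed list as a single item. The drop-consumed stand-in runs P the same way but pushes nothing. The names match-pass, match-bind, match-drop, two-digits and clock-time are hypothetical.

```scheme
;; Assumed model: a stream is (CONSUMED . REMAINING), CONSUMED reversed.
(define (run m str) (m (list (cons '() (string->list str)))))

(define (match-pred pred)
  (lambda (streams)
    (let loop ((ss streams) (acc '()))
      (if (null? ss)
          (reverse acc)
          (let ((consumed (caar ss))
                (rest (cdar ss)))
            (loop (cdr ss)
                  (if (and (pair? rest) (pred (car rest)))
                      (cons (cons (cons (car rest) consumed) (cdr rest))
                            acc)
                      acc)))))))
(define (match-char c) (match-pred (lambda (x) (char=? x c))))
(define (match-seq . ms)
  (lambda (streams)
    (let loop ((ms ms) (ss streams))
      (if (or (null? ms) (null? ss))
          ss
          (loop (cdr ms) ((car ms) ss))))))

;; pass: succeed on every stream without consuming anything.
(define match-pass (lambda (streams) streams))

;; bind: apply F to P's own tokens (in input order) and record F's
;; result as one consumed item.
(define (match-bind f p)
  (lambda (streams)
    (apply append
           (map (lambda (s)
                  (map (lambda (r)
                         (cons (cons (f (reverse (car r))) (car s))
                               (cdr r)))
                       (p (list (cons '() (cdr s))))))
                streams))))

;; drop-consumed: succeed like P but record none of its tokens.
(define (match-drop p)
  (lambda (streams)
    (apply append
           (map (lambda (s)
                  (map (lambda (r) (cons (car s) (cdr r)))
                       (p (list (cons '() (cdr s))))))
                streams))))

;; Example: "20:35" parses to the numbers 20 and 35; the colon is dropped.
(define digit (match-pred char-numeric?))
(define two-digits
  (match-bind (lambda (cs) (string->number (list->string cs)))
              (match-seq digit digit)))
(define clock-time
  (match-seq two-digits (match-drop (match-char #\:)) two-digits))
```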
Abbreviated syntax
abnf supports the following abbreviations for commonly used combinators:
- :: for concatenation
- :| for alternatives
- :? for optional-sequence
- :! for drop-consumed
- :s for lit
- :c for char
- :* for repetition
- :+ for repetition1
Examples
The following parser libraries have been implemented with abnf, in order of complexity:
Parsing date and time
(require-extension typeclass input-classes abnf)

(define char-list-<Input> (make-<Input> null? car cdr))
(define char-list-<Token> (Input->Token char-list-<Input>))
(define char-list-<CharLex> (Token->CharLex char-list-<Token>))
(define char-list-<CoreABNF>
  (Token.CharLex->CoreABNF char-list-<Token> char-list-<CharLex>))

(import-instance (<CoreABNF> char-list-<CoreABNF>) )

;; fws (folding white space) and cfws (comments or folding white space),
;; from RFC 5322 Section 3.2.2, are not core rules and are assumed to be
;; defined elsewhere.

(define (between-fws p)
  (concatenation
   (drop-consumed (optional-sequence fws))
   p
   (drop-consumed (optional-sequence fws))))

;; Date and Time Specification from RFC 5322 (Internet Message Format)
;; The following abnf parser combinators parse a date and time
;; specification of the form
;;
;;   Thu, 19 Dec 2002 20:35:46 +0200
;;
;; where the weekday specification is optional.

;; Match the abbreviated weekday names
(define day-name
  (alternatives (lit "Mon") (lit "Tue") (lit "Wed") (lit "Thu")
                (lit "Fri") (lit "Sat") (lit "Sun")))

;; Match a day-name, optionally wrapped in folding whitespace
(define day-of-week (between-fws day-name))

;; Match a four digit decimal number
(define year (between-fws (repetition-n 4 decimal)))

;; Match the abbreviated month names
(define month-name
  (alternatives (lit "Jan") (lit "Feb") (lit "Mar") (lit "Apr")
                (lit "May") (lit "Jun") (lit "Jul") (lit "Aug")
                (lit "Sep") (lit "Oct") (lit "Nov") (lit "Dec")))

;; Match a month-name, optionally wrapped in folding whitespace
(define month (between-fws month-name))

;; Match a one or two digit day number, optionally preceded by
;; and always followed by folding whitespace
(define day
  (concatenation
   (drop-consumed (optional-sequence fws))
   (variable-repetition 1 2 decimal)
   (drop-consumed fws)))

;; Match a date of the form "19 Dec 2002"
(define date (concatenation day month year))

;; Match a two-digit number
(define hour (repetition-n 2 decimal))
(define minute (repetition-n 2 decimal))
(define isecond (repetition-n 2 decimal))

;; Match a time-of-day specification of hh:mm or hh:mm:ss.
(define time-of-day
  (concatenation
   hour
   (drop-consumed (char #\:))
   minute
   (optional-sequence
    (concatenation (drop-consumed (char #\:)) isecond))))

;; Match a timezone specification of the form +hhmm or -hhmm
(define zone
  (concatenation
   (drop-consumed fws)
   (alternatives (char #\-) (char #\+))
   hour
   minute))

;; Match a time-of-day specification followed by a zone.
(define itime (concatenation time-of-day zone))

(define date-time
  (concatenation
   (optional-sequence
    (concatenation day-of-week (drop-consumed (char #\,))))
   date
   itime
   (drop-consumed (optional-sequence cfws))))
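The time-of-day rule from the example can be traced in a self-contained model. The sketch below uses a simplified stream-list representation, which is an assumption for illustration and not the egg's API. It rebuilds hour ":" minute [":" second] with hypothetical stand-ins for concatenation, optional-sequence, bind and drop-consumed. Because the optional seconds produce both a longer and a shorter parse, the hypothetical helper complete-parses keeps only the streams that consumed the whole input.

```scheme
;; Assumed model: a stream is (CONSUMED . REMAINING), CONSUMED reversed.
(define (run m str) (m (list (cons '() (string->list str)))))

(define (match-pred pred)
  (lambda (streams)
    (let loop ((ss streams) (acc '()))
      (if (null? ss)
          (reverse acc)
          (let ((consumed (caar ss))
                (rest (cdar ss)))
            (loop (cdr ss)
                  (if (and (pair? rest) (pred (car rest)))
                      (cons (cons (cons (car rest) consumed) (cdr rest))
                            acc)
                      acc)))))))
(define (match-char c) (match-pred (lambda (x) (char=? x c))))
(define (match-seq . ms)
  (lambda (streams)
    (let loop ((ms ms) (ss streams))
      (if (or (null? ms) (null? ss))
          ss
          (loop (cdr ms) ((car ms) ss))))))

;; optional-sequence stand-in: results with M first, then without it.
(define (match-opt m)
  (lambda (streams) (append (m streams) streams)))

;; bind and drop-consumed stand-ins (see the model's assumptions above).
(define (match-bind f p)
  (lambda (streams)
    (apply append
           (map (lambda (s)
                  (map (lambda (r)
                         (cons (cons (f (reverse (car r))) (car s))
                               (cdr r)))
                       (p (list (cons '() (cdr s))))))
                streams))))
(define (match-drop p)
  (lambda (streams)
    (apply append
           (map (lambda (s)
                  (map (lambda (r) (cons (car s) (cdr r)))
                       (p (list (cons '() (cdr s))))))
                streams))))

;; hh:mm[:ss], with each two-digit field converted to a number.
(define digit (match-pred char-numeric?))
(define two-digits
  (match-bind (lambda (cs) (string->number (list->string cs)))
              (match-seq digit digit)))
(define colon (match-drop (match-char #\:)))
(define time-of-day
  (match-seq two-digits colon two-digits
             (match-opt (match-seq colon two-digits))))

;; Keep the values of streams that consumed all of the input.
(define (complete-parses streams)
  (let loop ((ss streams) (acc '()))
    (cond ((null? ss) (reverse acc))
          ((null? (cdar ss)) (loop (cdr ss) (cons (reverse (caar ss)) acc)))
          (else (loop (cdr ss) acc)))))

(complete-parses (run time-of-day "20:35:46")) ; => ((20 35 46))
(complete-parses (run time-of-day "20:35"))    ; => ((20 35))
```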
Requires
Version History
- 3.0 Implemented typeclass interface
- 2.9 Bug fix in consumed-objects (reported by Peter Bex)
- 2.7 Added abbreviated syntax (suggested by Moritz Heidkamp)
- 2.6 Bug fixes in consumer procedures
- 2.5 Removed procedure memo
- 2.4 Moved the definition of bind and drop to lexgen
- 2.2 Added pass combinator
- 2.1 Added procedure variable-repetition
- 2.0 Updated to match the interface of lexgen 2.0
- 1.3 Fix in drop
- 1.2 Added procedures bind, drop, consume and collect
- 1.1 Added procedures set and set-from-string
- 1.0 Initial release
License
Copyright 2009-2010 Ivan Raikov and the Okinawa Institute of Science and Technology. This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. A full copy of the GPL license can be found at <http://www.gnu.org/licenses/>.