abnf
Description
abnf is a collection of combinators that help construct parsers for Augmented Backus-Naur Form (ABNF) grammars (RFC 4234).
Library Procedures
The combinator procedures in this library are based on the interface provided by the lexgen library.
Terminal values
[procedure] (char CHAR) => MATCHER
char builds a pattern matcher function that matches a single character.
[procedure] (lit STRING) => MATCHER
lit matches a literal string (case-insensitive).
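To make the idea of a terminal matcher concrete, here is a minimal sketch in a deliberately simplified model. It does not use the abnf or lexgen interface: in this sketch a matcher is assumed to take a list of characters and return either #f or a pair of the consumed characters and the remaining input. The names sketch-char and sketch-lit are hypothetical and are not part of the library.

```scheme
;; Simplified model (NOT the abnf/lexgen interface): a matcher takes a
;; list of characters and returns #f on failure, or a pair
;; (consumed . rest) on success.

;; sketch-char: match exactly the character c.
(define (sketch-char c)
  (lambda (input)
    (and (pair? input)
         (char=? (car input) c)
         (cons (list c) (cdr input)))))

;; sketch-lit: match the string s, ignoring case.
(define (sketch-lit s)
  (lambda (input)
    (let loop ((pattern (string->list s)) (rest input) (consumed '()))
      (cond ((null? pattern) (cons (reverse consumed) rest))
            ((and (pair? rest) (char-ci=? (car pattern) (car rest)))
             (loop (cdr pattern) (cdr rest) (cons (car rest) consumed)))
            (else #f)))))

;; Example usage
((sketch-char #\a) (string->list "abc"))     ; consumes a, leaves b c
((sketch-lit "mon") (string->list "Monday")) ; consumes M o n, leaves d a y
((sketch-lit "mon") (string->list "Tue"))    ; fails with #f
```

Note that the sketch reports the characters as they appeared in the input, so a case-insensitive match on "mon" against "Monday" yields the characters M, o, n.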
Operators
[procedure] (concatenation MATCHER-LIST) => MATCHER
concatenation matches an ordered list of rules. (RFC 4234, Section 3.1)
[procedure] (alternatives MATCHER-LIST) => MATCHER
alternatives matches any one of the given list of rules. (RFC 4234, Section 3.2)
[procedure] (range C1 C2) => MATCHER
range matches a range of characters. (RFC 4234, Section 3.4)
[procedure] (variable-repetition MIN MAX MATCHER) => MATCHER
variable-repetition matches between MIN and MAX consecutive elements that match the given rule. (RFC 4234, Section 3.6)
[procedure] (repetition MATCHER) => MATCHER
repetition matches zero or more consecutive elements that match the given rule.
[procedure] (repetition1 MATCHER) => MATCHER
repetition1 matches one or more consecutive elements that match the given rule.
[procedure] (repetition-n N MATCHER) => MATCHER
repetition-n matches exactly N consecutive occurrences of the given rule. (RFC 4234, Section 3.7)
[procedure] (optional-sequence MATCHER) => MATCHER
optional-sequence matches the given rule if it is present and succeeds without consuming input otherwise. (RFC 4234, Section 3.8)
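The operators above can be sketched in a simplified model to show how they relate to one another. This is not the abnf or lexgen implementation: the sketch assumes a matcher takes a list of characters and returns #f or a pair of consumed characters and remaining input, it is greedy and never backtracks, and it assumes a repeated matcher consumes at least one character when it succeeds. All sketch- names are hypothetical.

```scheme
;; Simplified model (NOT the abnf/lexgen interface): a matcher takes a
;; list of characters and returns #f or a pair (consumed . rest).

(define (sketch-char c)
  (lambda (input)
    (and (pair? input) (char=? (car input) c)
         (cons (list c) (cdr input)))))

;; Run the matchers in order; fail as soon as one of them fails.
(define (sketch-concatenation . matchers)
  (lambda (input)
    (let loop ((ms matchers) (rest input) (consumed '()))
      (if (null? ms)
          (cons consumed rest)
          (let ((r ((car ms) rest)))
            (and r (loop (cdr ms) (cdr r) (append consumed (car r)))))))))

;; Return the result of the first matcher that succeeds.
(define (sketch-alternatives . matchers)
  (lambda (input)
    (let loop ((ms matchers))
      (and (pair? ms)
           (or ((car ms) input) (loop (cdr ms)))))))

;; Greedily match between lo and hi occurrences of m.
;; hi may be #f, meaning "no upper limit".
(define (sketch-variable-repetition lo hi m)
  (lambda (input)
    (let loop ((n 0) (rest input) (consumed '()))
      (let ((r (and (or (not hi) (< n hi)) (m rest))))
        (cond ((and r (pair? (car r)))      ; progress was made: continue
               (loop (+ n 1) (cdr r) (append consumed (car r))))
              ((>= n lo) (cons consumed rest))
              (else #f))))))

;; The remaining operators are special cases of variable repetition.
(define (sketch-repetition m)        (sketch-variable-repetition 0 #f m))
(define (sketch-repetition1 m)       (sketch-variable-repetition 1 #f m))
(define (sketch-repetition-n n m)    (sketch-variable-repetition n n m))
(define (sketch-optional-sequence m) (sketch-variable-repetition 0 1 m))

;; Example usage
(define ab (sketch-concatenation (sketch-char #\a) (sketch-char #\b)))
(define a-or-b (sketch-alternatives (sketch-char #\a) (sketch-char #\b)))
((sketch-repetition-n 2 a-or-b) (string->list "abba")) ; consumes a b, leaves b a
```

Expressing repetition, repetition1, repetition-n and optional-sequence through one bounded loop is a design choice of this sketch; whether the library does the same is not stated here.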
Core rules
The following primitive parsers match the core rules described in RFC 4234, Appendix B.1 (Section 6.1 of the earlier RFC 2234).
[procedure] (alpha STREAM-LIST) => STREAM-LIST
Matches any character of the alphabet, [A..Z,a..z].
[procedure] (binary STREAM-LIST) => STREAM-LIST
Matches [0..1].
[procedure] (decimal STREAM-LIST) => STREAM-LIST
Matches [0..9].
[procedure] (hexadecimal STREAM-LIST) => STREAM-LIST
Matches [0..9] and [A..F,a..f].
[procedure] (char STREAM-LIST) => STREAM-LIST
Matches any 7-bit US-ASCII character except for NUL (ASCII value 0).
[procedure] (cr STREAM-LIST) => STREAM-LIST
Matches the carriage return character.
[procedure] (lf STREAM-LIST) => STREAM-LIST
Matches the line feed character.
[procedure] (crlf STREAM-LIST) => STREAM-LIST
Matches the Internet newline (CR followed by LF).
[procedure] (ctl STREAM-LIST) => STREAM-LIST
Matches any US-ASCII control character. That is, any character with a decimal value in the range of [0..31,127].
[procedure] (dquote STREAM-LIST) => STREAM-LIST
Matches the double quote character.
[procedure] (htab STREAM-LIST) => STREAM-LIST
Matches the tab character.
[procedure] (lwsp STREAM-LIST) => STREAM-LIST
Matches linear white-space. That is, any number of consecutive wsp, optionally followed by a crlf and (at least) one more wsp.
[procedure] (sp STREAM-LIST) => STREAM-LIST
Matches the space character.
[procedure] (vspace STREAM-LIST) => STREAM-LIST
Matches any printable ASCII character. That is, any character in the decimal range of [33..126].
[procedure] (wsp STREAM-LIST) => STREAM-LIST
Matches space or tab.
[procedure] (quoted-pair STREAM-LIST) => STREAM-LIST
Matches a quoted pair. Any characters (excluding CR and LF) may be quoted.
[procedure] (quoted-string STREAM-LIST) => STREAM-LIST
Matches a quoted string. The backslash and double quote characters must be escaped inside a quoted string; CR and LF are not allowed at all.
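Several of the core rules are plain character classes, which can be written down as predicates over character codes. The following is a minimal sketch under stated assumptions: the helper names are hypothetical, char->integer is assumed to return ASCII codes for ASCII characters, and sketch-satisfy uses a simplified matcher model (a list of characters in, #f or a pair of consumed characters and remaining input out) rather than the STREAM-LIST interface of the library.

```scheme
;; Hypothetical helpers; only the character ranges come from the text.

;; True when the character code of c lies in [lo, hi].
(define (code-in-range? c lo hi)
  (let ((n (char->integer c)))
    (and (<= lo n) (<= n hi))))

(define (sketch-alpha? c)                       ; A-Z, a-z
  (or (code-in-range? c 65 90) (code-in-range? c 97 122)))
(define (sketch-decimal? c)                     ; 0-9
  (code-in-range? c 48 57))
(define (sketch-hexadecimal? c)
  (or (sketch-decimal? c)
      (code-in-range? c 65 70)                  ; A-F
      (code-in-range? c 97 102)))               ; a-f
(define (sketch-ctl? c)                         ; [0..31,127]
  (or (code-in-range? c 0 31) (code-in-range? c 127 127)))
(define (sketch-printable? c)                   ; [33..126]
  (code-in-range? c 33 126))
(define (sketch-wsp? c)                         ; space (32) or tab (9)
  (or (code-in-range? c 32 32) (code-in-range? c 9 9)))

;; Turn a predicate into a matcher in the simplified model:
;; returns #f or a pair (consumed . rest).
(define (sketch-satisfy pred)
  (lambda (input)
    (and (pair? input)
         (pred (car input))
         (cons (list (car input)) (cdr input)))))

;; Example usage
((sketch-satisfy sketch-hexadecimal?) (string->list "F0")) ; consumes F, leaves 0
```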
Additional convenience procedures and parser combinators
[procedure] (pass) => MATCHER
This matcher returns without consuming any input.
[procedure] (set CHAR-SET) => MATCHER
Matches any character from an SRFI-14 character set.
[procedure] (set-from-string STRING) => MATCHER
Matches any character from a set defined as a string.
[procedure] (bind F P) => MATCHER
Given a rule P and function F, returns a matcher that first applies P to the input stream, then applies F to the returned list of consumed tokens, and returns the result and the remainder of the input stream.
[procedure] (drop-consumed P) => MATCHER
Given a rule P, returns a matcher that always returns an empty list of consumed tokens when P succeeds.
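The behaviour described for pass, bind and drop-consumed can be illustrated in a simplified model. This sketch is not the abnf or lexgen implementation: a matcher here is assumed to take a list of characters and return #f or a pair of consumed tokens and remaining input, and every sketch- name is hypothetical.

```scheme
;; Simplified model (NOT the abnf/lexgen interface): a matcher takes a
;; list of characters and returns #f or a pair (consumed . rest).

;; sketch-pass: always succeed without consuming input.
(define (sketch-pass)
  (lambda (input) (cons '() input)))

;; sketch-bind: when p succeeds, replace its consumed tokens by (f tokens).
(define (sketch-bind f p)
  (lambda (input)
    (let ((r (p input)))
      (and r (cons (f (car r)) (cdr r))))))

;; sketch-drop-consumed: succeed like p, but report no consumed tokens.
(define (sketch-drop-consumed p)
  (sketch-bind (lambda (consumed) '()) p))

;; A small matcher to experiment with: one or more decimal digits.
(define (sketch-digits input)
  (let loop ((rest input) (consumed '()))
    (if (and (pair? rest)
             (char<=? #\0 (car rest))
             (char<=? (car rest) #\9))
        (loop (cdr rest) (cons (car rest) consumed))
        (and (pair? consumed) (cons (reverse consumed) rest)))))

;; Convert the consumed digit characters into a one-element list
;; holding the corresponding number.
(define sketch-number
  (sketch-bind (lambda (cs) (list (string->number (list->string cs))))
               sketch-digits))

;; Example usage
(sketch-number (string->list "2002 "))                        ; token list holds 2002
((sketch-drop-consumed sketch-digits) (string->list "19 Dec")) ; no tokens, input advanced
```

Defining sketch-drop-consumed in terms of sketch-bind is a choice made for this sketch; it shows that dropping tokens is a special case of transforming them.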
Examples
The following parser libraries have been implemented with abnf, in order of complexity:
Parsing date and time
(use abnf)

;; NOTE: fws (folding whitespace) and cfws (comments and/or folding
;; whitespace), both from RFC 5322, are used below but are not defined
;; in this excerpt.

;; Match p, dropping optional folding whitespace on either side
(define (between-fws p)
  (abnf:concatenation
   (abnf:drop-consumed (abnf:optional-sequence fws))
   p
   (abnf:drop-consumed (abnf:optional-sequence fws))))

;; Date and Time Specification from RFC 5322 (Internet Message Format)

;; The following abnf parser combinators parse a date and time
;; specification of the form
;;
;;   Thu, 19 Dec 2002 20:35:46 +0200
;;
;; where the weekday specification is optional.

;; Match the abbreviated weekday names
(define day-name
  (abnf:alternatives
   (abnf:lit "Mon")
   (abnf:lit "Tue")
   (abnf:lit "Wed")
   (abnf:lit "Thu")
   (abnf:lit "Fri")
   (abnf:lit "Sat")
   (abnf:lit "Sun")))

;; Match a day-name, optionally wrapped in folding whitespace
(define day-of-week (between-fws day-name))

;; Match a four digit decimal number
(define year (between-fws (abnf:repetition-n 4 abnf:decimal)))

;; Match the abbreviated month names
(define month-name
  (abnf:alternatives
   (abnf:lit "Jan")
   (abnf:lit "Feb")
   (abnf:lit "Mar")
   (abnf:lit "Apr")
   (abnf:lit "May")
   (abnf:lit "Jun")
   (abnf:lit "Jul")
   (abnf:lit "Aug")
   (abnf:lit "Sep")
   (abnf:lit "Oct")
   (abnf:lit "Nov")
   (abnf:lit "Dec")))

;; Match a month-name, optionally wrapped in folding whitespace
(define month (between-fws month-name))

;; Match a one or two digit number
(define day
  (abnf:concatenation
   (abnf:drop-consumed (abnf:optional-sequence fws))
   (abnf:alternatives
    (abnf:variable-repetition 1 2 abnf:decimal)
    (abnf:drop-consumed fws))))

;; Match a date of the form day month year, e.g. 19 Dec 2002
(define date (abnf:concatenation day month year))

;; Match a two-digit number
(define hour (abnf:repetition-n 2 abnf:decimal))
(define minute (abnf:repetition-n 2 abnf:decimal))
(define isecond (abnf:repetition-n 2 abnf:decimal))

;; Match a time-of-day specification of hh:mm or hh:mm:ss
(define time-of-day
  (abnf:concatenation
   hour
   (abnf:drop-consumed (abnf:char #\:))
   minute
   (abnf:optional-sequence
    (abnf:concatenation
     (abnf:drop-consumed (abnf:char #\:))
     isecond))))

;; Match a timezone specification of the form +hhmm or -hhmm
(define zone
  (abnf:concatenation
   (abnf:drop-consumed fws)
   (abnf:alternatives (abnf:char #\-) (abnf:char #\+))
   hour
   minute))

;; Match a time-of-day specification followed by a zone
(define itime (abnf:concatenation time-of-day zone))

;; Match an optional weekday and comma, then the date and the time
(define date-time
  (abnf:concatenation
   (abnf:optional-sequence
    (abnf:concatenation
     day-of-week
     (abnf:drop-consumed (abnf:char #\,))))
   date
   itime
   (abnf:drop-consumed (abnf:optional-sequence cfws))))
Requires
Version History
- 2.6 Bug fixes in consumer procedures
- 2.5 Removed procedure memo
- 2.4 Moved the definition of bind and drop to lexgen
- 2.2 Added pass combinator
- 2.1 Added procedure variable-repetition
- 2.0 Updated to match the interface of lexgen 2.0
- 1.3 Fix in drop
- 1.2 Added procedures bind drop consume collect
- 1.1 Added procedures set and set-from-string
- 1.0 Initial release
License
Based on the Haskell Rfc2234 module by Peter Simons.
Copyright 2009 Ivan Raikov. All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

- Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
- Neither the name of the author nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDERS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.