abnf

Description

abnf is a collection of combinators that help construct parsers for Augmented Backus-Naur Form (ABNF) grammars (RFC 4234).

Library Procedures

The combinator procedures in this library are based on the interface provided by the lexgen library.

<CoreABNF> typeclass

The procedures of this library are provided as fields of the <CoreABNF> typeclass. Please see the typeclass library for information on type classes.

The <CoreABNF> class is intended to provide abstraction over different kinds of input sequences, e.g. character lists, strings, streams, etc. The following example illustrates the creation of an instance of <CoreABNF> specialized for character lists. This code is also provided as the abnf-charlist egg, which is fully compatible with abnf prior to version 3.0.

(require-extension typeclass input-classes abnf)

(define char-list-<Input>
  (make-<Input> null? car cdr))

(define char-list-<Token>
  (Input->Token char-list-<Input>))

(define char-list-<CharLex>
  (Token->CharLex char-list-<Token>))

(define char-list-<CoreABNF>
  (Token.CharLex->CoreABNF char-list-<Token>
                           char-list-<CharLex>))

(import-instance (<CoreABNF> char-list-<CoreABNF>))

Terminal values and core rules

The following procedures are provided as fields in the <CoreABNF> typeclass:

[procedure] (char CHAR) => MATCHER

Procedure char builds a pattern matcher function that matches a single character.

[procedure] (lit STRING) => MATCHER

lit matches a literal string (case-insensitive).
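As a minimal sketch of how these two constructors might be used, the definitions below assume that the char-list-<CoreABNF> instance from the setup code above has been imported with import-instance. The names at-sign and mailto-scheme are made up for illustration.

```scheme
;; Assumes the <CoreABNF> instance from the setup code above is in scope.

;; A matcher for exactly one character, "@".
(define at-sign (char #\@))

;; A matcher for the literal "mailto". Since lit is described as
;; case-insensitive, it should also accept "MAILTO", "MailTo", etc.
(define mailto-scheme (lit "mailto"))
```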

The following primitive parsers match the rules described in RFC 4234, Section 6.1.

[procedure] (alpha STREAM-LIST) => STREAM-LIST

Matches any character of the alphabet.

[procedure] (binary STREAM-LIST) => STREAM-LIST

Matches [0..1].

[procedure] (decimal STREAM-LIST) => STREAM-LIST

Matches [0..9].

[procedure] (hexadecimal STREAM-LIST) => STREAM-LIST

Matches [0..9] and [A..F,a..f].

[procedure] (ascii-char STREAM-LIST) => STREAM-LIST

Matches any 7-bit US-ASCII character except for NUL (ASCII value 0).

[procedure] (cr STREAM-LIST) => STREAM-LIST

Matches the carriage return character.

[procedure] (lf STREAM-LIST) => STREAM-LIST

Matches the line feed character.

[procedure] (crlf STREAM-LIST) => STREAM-LIST

Matches the Internet newline.

[procedure] (ctl STREAM-LIST) => STREAM-LIST

Matches any US-ASCII control character. That is, any character with a decimal value in the range of [0..31,127].

[procedure] (dquote STREAM-LIST) => STREAM-LIST

Matches the double quote character.

[procedure] (htab STREAM-LIST) => STREAM-LIST

Matches the tab character.

[procedure] (lwsp STREAM-LIST) => STREAM-LIST

Matches linear white-space. That is, any number of consecutive wsp, optionally followed by a crlf and (at least) one more wsp.

[procedure] (sp STREAM-LIST) => STREAM-LIST

Matches the space character.

[procedure] (vspace STREAM-LIST) => STREAM-LIST

Matches any printable ASCII character. That is, any character in the decimal range of [33..126].

[procedure] (wsp STREAM-LIST) => STREAM-LIST

Matches space or tab.

[procedure] (quoted-pair STREAM-LIST) => STREAM-LIST

Matches a quoted pair. Any characters (excluding CR and LF) may be quoted.

[procedure] (quoted-string STREAM-LIST) => STREAM-LIST

Matches a quoted string. The backslash and double quote characters must be escaped inside a quoted string; CR and LF are not allowed at all.

The following additional procedures are provided for convenience:

[procedure] (set CHAR-SET) => MATCHER

Matches any character from an SRFI-14 character set.

[procedure] (set-from-string STRING) => MATCHER

Matches any character from a set defined as a string.
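The two convenience procedures can express the same matcher in different ways. The sketch below assumes the <CoreABNF> instance from the setup code above is in scope and that SRFI-14 has been loaded; the names sign, sign-from-string, and punctuation-char are illustrative only.

```scheme
;; Assumes the <CoreABNF> instance from the setup code above is in scope.
(require-extension srfi-14)

;; Either sign character, built from an explicit SRFI-14 character set.
(define sign (set (char-set #\+ #\-)))

;; The same set of characters, spelled as a string.
(define sign-from-string (set-from-string "+-"))

;; One of the character sets predefined by SRFI-14.
(define punctuation-char (set char-set:punctuation))
```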

Operators

[procedure] (concatenation MATCHER-LIST) => MATCHER

concatenation matches an ordered list of rules. (RFC 4234, Section 3.1)

[procedure] (alternatives MATCHER-LIST) => MATCHER

alternatives matches any one of the given list of rules. (RFC 4234, Section 3.2)

[procedure] (range C1 C2) => MATCHER

range matches a range of characters. (RFC 4234, Section 3.4)

[procedure] (variable-repetition MIN MAX MATCHER) => MATCHER

variable-repetition matches between MIN and MAX consecutive elements that match the given rule. (RFC 4234, Section 3.6)

[procedure] (repetition MATCHER) => MATCHER

repetition matches zero or more consecutive elements that match the given rule.

[procedure] (repetition1 MATCHER) => MATCHER

repetition1 matches one or more consecutive elements that match the given rule.

[procedure] (repetition-n N MATCHER) => MATCHER

repetition-n matches exactly N consecutive occurrences of the given rule. (RFC 4234, Section 3.7)

[procedure] (optional-sequence MATCHER) => MATCHER

optional-sequence matches zero or one occurrence of the given rule. (RFC 4234, Section 3.8)
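To illustrate how the RFC 4234 operators compose, here is a small sketch that transcribes a made-up grammar into combinators. It assumes the <CoreABNF> instance from the setup code above is in scope, that range accepts two characters, and that concatenation and alternatives are called with the rules as separate arguments, as in the date and time example on this page. The rule names are hypothetical.

```scheme
;; Assumes the <CoreABNF> instance from the setup code above is in scope.

;; lower = %x61-7A
(define lower (range #\a #\z))   ; assumption: range takes two characters

;; ident = lower *(lower / DIGIT)
(define ident
  (concatenation
   lower
   (repetition (alternatives lower decimal))))

;; version = 1*3DIGIT "." 1*3DIGIT
(define version
  (concatenation
   (variable-repetition 1 3 decimal)
   (char #\.)
   (variable-repetition 1 3 decimal)))

;; tagged = ident [ "-" version ]
(define tagged
  (concatenation
   ident
   (optional-sequence (concatenation (char #\-) version))))
```

Each ABNF rule maps to one definition, so the combinator code can be checked line by line against the grammar it implements.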

[procedure] (pass) => MATCHER

This matcher returns without consuming any input.

[procedure] (bind F P) => MATCHER

Given a rule P and function F, returns a matcher that first applies P to the input stream, then applies F to the returned list of consumed tokens, and returns the result and the remainder of the input stream.

[procedure] (drop-consumed P) => MATCHER

Given a rule P, returns a matcher that always returns an empty list of consumed tokens when P succeeds.
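The following sketch shows one way drop-consumed and bind might be used. It assumes the <CoreABNF> instance from the setup code above is in scope. It also assumes that the function passed to bind receives a list of tokens and must return a list of tokens; the order in which consumed tokens arrive is not specified on this page, so the helper below does not depend on it. The names equals-sign, tag-digits, and tagged-number are hypothetical.

```scheme
;; Assumes the <CoreABNF> instance from the setup code above is in scope.

;; Match "=" but leave it out of the consumed tokens, so that only
;; the text around it is kept.
(define equals-sign (drop-consumed (char #\=)))

;; Hypothetical helper: replace the consumed digit characters with a
;; single tagged entry. Assumption: F takes a list and returns a list.
(define (tag-digits consumed)
  (list (cons 'digits consumed)))

;; Match one or more digits, then pass the consumed tokens to tag-digits.
(define tagged-number (bind tag-digits (repetition1 decimal)))
```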

Abbreviated syntax

abnf supports the following abbreviations for commonly used combinators:

::  concatenation
:|  alternatives
:?  optional-sequence
:!  drop-consumed
:s  lit
:c  char
:*  repetition
:+  repetition1
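As a sketch of how the abbreviations shorten a rule, the definitions below write the same matcher twice, once with the full names and once with the abbreviated ones. This assumes the <CoreABNF> instance from the setup code above is in scope and that the abbreviations are available alongside the full names; the rule names are illustrative.

```scheme
;; Assumes the <CoreABNF> instance from the setup code above is in scope.

(define hour   (repetition-n 2 decimal))
(define minute (repetition-n 2 decimal))

;; Long form: hh:mm, with the colon matched but not kept.
(define hh-mm
  (concatenation hour (drop-consumed (char #\:)) minute))

;; The same rule written with the abbreviations.
(define hh-mm-short
  (:: hour (:! (:c #\:)) minute))

;; One or more hexadecimal digits, optionally preceded by the literal "0x".
(define hex-number
  (:: (:? (:s "0x")) (:+ hexadecimal)))
```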

Examples

The following parser libraries have been implemented with abnf, in order of complexity:

Parsing date and time


(require-extension typeclass input-classes abnf)

(define char-list-<Input>
  (make-<Input> null? car cdr))

(define char-list-<Token>
  (Input->Token char-list-<Input>))

(define char-list-<CharLex>
  (Token->CharLex char-list-<Token>))

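(define char-list-<CoreABNF>
  (Token.CharLex->CoreABNF char-list-<Token>
                           char-list-<CharLex>))

;; Bring the <CoreABNF> procedures (concatenation, lit, decimal, ...)
;; into scope, as in the setup code shown earlier.
(import-instance (<CoreABNF> char-list-<CoreABNF>))

;; Note: fws and cfws (folding white space and comments, RFC 5322,
;; Section 3.2.2) are not among the procedures documented on this page;
;; this example assumes they are defined elsewhere.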


(define (between-fws p)
  (concatenation
   (drop-consumed (optional-sequence fws)) p 
   (drop-consumed (optional-sequence fws))))

;; Date and Time Specification from RFC 5322 (Internet Message Format)

;; The following abnf parser combinators parse a date and time
;; specification of the form
;;
;;   Thu, 19 Dec 2002 20:35:46 +0200
;;
;; where the weekday specification is optional.
			     
;; Match the abbreviated weekday names

(define day-name 
  (alternatives
   (lit "Mon")
   (lit "Tue")
   (lit "Wed")
   (lit "Thu")
   (lit "Fri")
   (lit "Sat")
   (lit "Sun")))

;; Match a day-name, optionally wrapped in folding whitespace

(define day-of-week (between-fws day-name))


;; Match a four digit decimal number

(define year (between-fws (repetition-n 4 decimal)))

;; Match the abbreviated month names

(define month-name (alternatives
		    (lit "Jan")
		    (lit "Feb")
		    (lit "Mar")
		    (lit "Apr")
		    (lit "May")
		    (lit "Jun")
		    (lit "Jul")
		    (lit "Aug")
		    (lit "Sep")
		    (lit "Oct")
		    (lit "Nov")
		    (lit "Dec")))

;; Match a month-name, optionally wrapped in folding whitespace

(define month (between-fws month-name))


;; Match a one or two digit day of the month, followed by folding
;; whitespace (RFC 5322, Section 3.3: day = [FWS] 1*2DIGIT FWS)

(define day (concatenation
             (drop-consumed (optional-sequence fws))
             (variable-repetition 1 2 decimal)
             (drop-consumed fws)))

;; Match a date of the form "day month year", e.g. 19 Dec 2002
(define date (concatenation day month year))

;; Match a two-digit number 

(define hour      (repetition-n 2 decimal))
(define minute    (repetition-n 2 decimal))
(define isecond   (repetition-n 2 decimal))

;; Match a time-of-day specification of hh:mm or hh:mm:ss.

(define time-of-day (concatenation
                     hour (drop-consumed (char #\:))
                     minute (optional-sequence
                             (concatenation (drop-consumed (char #\:))
                                            isecond))))

;; Match a timezone specification of the form
;; +hhmm or -hhmm 

(define zone (concatenation 
	      (drop-consumed fws)
	      (alternatives (char #\-) (char #\+))
	      hour minute))

;; Match a time-of-day specification followed by a zone.

(define itime (concatenation time-of-day zone))

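;; Match a complete date and time specification: an optional day of
;; the week followed by a comma, then the date, the time with zone,
;; and optional trailing comments or folding whitespace.

(define date-time (concatenation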
		   (optional-sequence
		    (concatenation
		     day-of-week
		     (drop-consumed (char #\,))))
		   date
		   itime
		   (drop-consumed (optional-sequence cfws))))

Requires

Version History

License

 Copyright 2009-2010 Ivan Raikov and the Okinawa Institute of Science and Technology.
 This program is free software: you can redistribute it and/or
 modify it under the terms of the GNU General Public License as
 published by the Free Software Foundation, either version 3 of the
 License, or (at your option) any later version.
 This program is distributed in the hope that it will be useful, but
 WITHOUT ANY WARRANTY; without even the implied warranty of
 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
 General Public License for more details.
 A full copy of the GPL license can be found at
 <http://www.gnu.org/licenses/>.