Outdated egg!

This is an egg for CHICKEN 4, the unsupported old release. You're almost certainly looking for the CHICKEN 5 version of this egg, if it exists.

If it does not exist, there may be equivalent functionality provided by another egg; have a look at the egg index. Otherwise, please consider porting this egg to the current version of CHICKEN.

  1. Outdated egg!
  2. abnf
    1. Description
    2. Library Procedures
      1. <CoreABNF> typeclass
      2. Terminal values and core rules
      3. Operators
      4. Abbreviated syntax
    3. Examples
      1. Parsing date and time
    4. Requires
    5. Version History
    6. License

abnf

Description

abnf is a collection of combinators that help construct parsers for Augmented Backus-Naur Form (ABNF) grammars (RFC 4234).

Library Procedures

The combinator procedures in this library are based on the interface provided by the lexgen library.

<CoreABNF> typeclass

The procedures of this library are provided as fields of the <CoreABNF> typeclass. Please see the typeclass library for information on type classes.

The <CoreABNF> class is intended to provide abstraction over different kinds of input sequences, e.g. character lists, strings, streams, etc. The following example illustrates the creation of an instance of <CoreABNF> specialized for character lists. This code is also provided as the abnf-charlist egg, which is fully compatible with abnf prior to version 3.0.

(require-extension typeclass input-classes abnf)

(define char-list-<Input>
  (make-<Input> null? car cdr))

(define char-list-<Token>
  (Input->Token char-list-<Input>))

(define char-list-<CharLex>
  (Token->CharLex char-list-<Token>))

(define char-list-<CoreABNF>
  (CharLex->CoreABNF char-list-<CharLex>))

(import-instance (<CoreABNF> char-list-<CoreABNF>))
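A minimal sketch of using the instance, assuming CHICKEN 4 with the typeclass, input-classes, abnf and lexgen eggs installed. The nested call composes the same four constructors as the definitions above. `word`, `word-then-digits` and `on-error` are hypothetical names. After `import-instance`, fields such as `alpha` are ordinary matcher procedures, and lexgen's `lex` runs a matcher over a string, as the date-and-time example at the end of this page does.

```scheme
(require-extension typeclass input-classes abnf lexgen)

;; Same instance as above, with the four constructors composed inline.
(define char-list-<CoreABNF>
  (CharLex->CoreABNF
   (Token->CharLex
    (Input->Token (make-<Input> null? car cdr)))))

(import-instance (<CoreABNF> char-list-<CoreABNF>))

;; Hypothetical error handler: lex calls it when the pattern fails.
(define (on-error s)
  (print "lexical error on stream: " s)
  '(error))

;; One or more letters, optionally followed by one or more digits.
(define word (repetition1 alpha))
(define word-then-digits (concatenation word (repetition1 decimal)))

(print (lex word on-error "abc123"))
(print (lex word-then-digits on-error "abc123"))
```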
		 

Terminal values and core rules

The following procedures are provided as fields in the <CoreABNF> typeclass:

[procedure] (char CHAR) => MATCHER

Procedure char builds a pattern matcher function that matches a single character.

[procedure] (lit STRING) => MATCHER

lit builds a matcher for the literal string STRING. As with quoted strings in ABNF (RFC 4234, Section 2.3), the match is case-insensitive.
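For illustration, two hypothetical rules built only from terminal values. The instance setup from the example above is repeated so the block stands alone; `alternatives` is described under Operators below.

```scheme
(require-extension typeclass input-classes abnf)

;; Character-list instance, composed from the constructors shown above.
(define char-list-<CoreABNF>
  (CharLex->CoreABNF
   (Token->CharLex
    (Input->Token (make-<Input> null? car cdr)))))

(import-instance (<CoreABNF> char-list-<CoreABNF>))

;; method = "GET" / "HEAD"
;; lit is case-insensitive, so "get" and "head" match as well.
(define method (alternatives (lit "GET") (lit "HEAD")))

;; separator = ":"
(define separator (char #\:))
```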

The following primitive parsers match the core rules described in RFC 4234, Appendix B.1.

[procedure] (alpha STREAM-LIST) => STREAM-LIST

Matches any character of the alphabet.

[procedure] (binary STREAM-LIST) => STREAM-LIST

Matches [0..1].

[procedure] (decimal STREAM-LIST) => STREAM-LIST

Matches [0..9].

[procedure] (hexadecimal STREAM-LIST) => STREAM-LIST

Matches [0..9] and [A..F,a..f].

[procedure] (ascii-char STREAM-LIST) => STREAM-LIST

Matches any 7-bit US-ASCII character except for NUL (ASCII value 0).

[procedure] (cr STREAM-LIST) => STREAM-LIST

Matches the carriage return character.

[procedure] (lf STREAM-LIST) => STREAM-LIST

Matches the line feed character.

[procedure] (crlf STREAM-LIST) => STREAM-LIST

Matches the Internet newline, that is, a carriage return followed by a line feed (CR LF).

[procedure] (ctl STREAM-LIST) => STREAM-LIST

Matches any US-ASCII control character. That is, any character with a decimal value in the range of [0..31,127].

[procedure] (dquote STREAM-LIST) => STREAM-LIST

Matches the double quote character.

[procedure] (htab STREAM-LIST) => STREAM-LIST

Matches the tab character.

[procedure] (lwsp STREAM-LIST) => STREAM-LIST

Matches linear white-space. That is, any number of consecutive wsp, optionally followed by a crlf and (at least) one more wsp.

[procedure] (sp STREAM-LIST) => STREAM-LIST

Matches the space character.

[procedure] (vspace STREAM-LIST) => STREAM-LIST

Matches any printable ASCII character. That is, any character in the decimal range of [33..126].

[procedure] (wsp STREAM-LIST) => STREAM-LIST

Matches space or tab.

[procedure] (quoted-pair STREAM-LIST) => STREAM-LIST

Matches a quoted pair. Any character except CR and LF may be quoted.

[procedure] (quoted-string STREAM-LIST) => STREAM-LIST

Matches a quoted string. The backslash and double quote characters must be escaped inside a quoted string; CR and LF are not allowed at all.
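Core rules are themselves matchers, so they can be passed directly to the operators, as the date-and-time example does with `decimal` and `wsp`. A sketch of two hypothetical rules written in that style, with the corresponding ABNF in comments:

```scheme
(require-extension typeclass input-classes abnf)

;; Character-list instance, composed from the constructors shown above.
(define char-list-<CoreABNF>
  (CharLex->CoreABNF
   (Token->CharLex
    (Input->Token (make-<Input> null? car cdr)))))

(import-instance (<CoreABNF> char-list-<CoreABNF>))

;; header-name = ALPHA *(ALPHA / DIGIT / "-")
(define header-name
  (concatenation alpha
                 (repetition (alternatives alpha decimal (char #\-)))))

;; header-line = header-name ":" *WSP quoted-string CRLF
(define header-line
  (concatenation header-name (char #\:) (repetition wsp)
                 quoted-string crlf))
```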

The following additional procedures are provided for convenience:

[procedure] (set CHAR-SET) => MATCHER

Matches any character from an SRFI-14 character set.

[procedure] (set-from-string STRING) => MATCHER

Matches any character from a set defined as a string.
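A sketch comparing three ways to match a single hexadecimal digit, plus a sign character defined from a string. It assumes SRFI-14's `char-set:hex-digit`, loaded from CHICKEN 4's srfi-14 unit; the rule names are hypothetical.

```scheme
(require-extension srfi-14 typeclass input-classes abnf)

;; Character-list instance, composed from the constructors shown above.
(define char-list-<CoreABNF>
  (CharLex->CoreABNF
   (Token->CharLex
    (Input->Token (make-<Input> null? car cdr)))))

(import-instance (<CoreABNF> char-list-<CoreABNF>))

;; One hexadecimal digit: from an SRFI-14 char-set, from a string
;; of allowed characters, and from the core rule.
(define hex-from-char-set (set char-set:hex-digit))
(define hex-from-string   (set-from-string "0123456789abcdefABCDEF"))
(define hex-core-rule     hexadecimal)

;; sign = "+" / "-"
(define sign (set-from-string "+-"))
```

The `sign` rule is a compact alternative to spelling out `(alternatives (char #\-) (char #\+))`, the form used by the zone rule in the date-and-time example.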

Operators

[procedure] (concatenation MATCHER-LIST) => MATCHER

concatenation matches an ordered list of rules. (RFC 4234, Section 3.1)

[procedure] (alternatives MATCHER-LIST) => MATCHER

alternatives matches any one of the given list of rules. (RFC 4234, Section 3.2)

[procedure] (range C1 C2) => MATCHER

range matches any single character in the range from C1 to C2. (RFC 4234, Section 3.4)

[procedure] (variable-repetition MIN MAX MATCHER) => MATCHER

variable-repetition matches at least MIN and at most MAX consecutive elements that match the given rule. (RFC 4234, Section 3.6)
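As a sketch, the ABNF repetition `1*3DIGIT` maps onto `variable-repetition` with MIN 1 and MAX 3. The hypothetical `dotted-quad` rule also uses `repetition-n`, described below, and checks only the shape of an IPv4-style address.

```scheme
(require-extension typeclass input-classes abnf)

;; Character-list instance, composed from the constructors shown above.
(define char-list-<CoreABNF>
  (CharLex->CoreABNF
   (Token->CharLex
    (Input->Token (make-<Input> null? car cdr)))))

(import-instance (<CoreABNF> char-list-<CoreABNF>))

;; dec-digits = 1*3DIGIT
(define dec-digits (variable-repetition 1 3 decimal))

;; dotted-quad = dec-digits 3("." dec-digits)
;; Shape only: numeric values above 255 are not rejected.
(define dotted-quad
  (concatenation dec-digits
                 (repetition-n 3 (concatenation (char #\.) dec-digits))))
```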

[procedure] (repetition MATCHER) => MATCHER

repetition matches zero or more consecutive elements that match the given rule.

[procedure] (repetition1 MATCHER) => MATCHER

repetition1 matches one or more consecutive elements that match the given rule.

[procedure] (repetition-n N MATCHER) => MATCHER

repetition-n matches exactly N consecutive occurrences of the given rule. (RFC 4234, Section 3.7)

[procedure] (optional-sequence MATCHER) => MATCHER

optional-sequence matches zero or one occurrence of the given rule, that is, the rule is optional. (RFC 4234, Section 3.8)
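The remaining repetition operators translate ABNF one-to-one. A hypothetical version-number rule and a padding rule, with their ABNF forms in comments:

```scheme
(require-extension typeclass input-classes abnf)

;; Character-list instance, composed from the constructors shown above.
(define char-list-<CoreABNF>
  (CharLex->CoreABNF
   (Token->CharLex
    (Input->Token (make-<Input> null? car cdr)))))

(import-instance (<CoreABNF> char-list-<CoreABNF>))

;; digits  = 1*DIGIT
;; version = digits "." digits ["." digits]
(define digits (repetition1 decimal))
(define version
  (concatenation digits (char #\.) digits
                 (optional-sequence (concatenation (char #\.) digits))))

;; padding = *SP  (zero or more spaces)
(define padding (repetition sp))
```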

[procedure] (pass) => MATCHER

This matcher returns without consuming any input.

[procedure] (bind F P) => MATCHER

Given a rule P and function F, returns a matcher that first applies P to the input stream, then applies F to the returned list of consumed tokens, and returns the result and the remainder of the input stream.

Note: this combinator will signal failure if the input stream is empty.

[procedure] (bind* F P) => MATCHER

The same as bind, but will signal success if the input stream is empty.

[procedure] (drop-consumed P) => MATCHER

Given a rule P, returns a matcher that always returns an empty list of consumed tokens when P succeeds.
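A sketch of `bind` and `drop-consumed`. The tagging function is hypothetical and relies only on what the descriptions above state: F receives the list of consumed tokens, and its result takes their place. The tokens are wrapped without reordering, since this page does not specify the order in which they arrive.

```scheme
(require-extension typeclass input-classes abnf)

;; Character-list instance, composed from the constructors shown above.
(define char-list-<CoreABNF>
  (CharLex->CoreABNF
   (Token->CharLex
    (Input->Token (make-<Input> null? car cdr)))))

(import-instance (<CoreABNF> char-list-<CoreABNF>))

;; Hypothetical: replace the consumed digit tokens with one tagged value.
;; The tokens are wrapped as-is; no particular order is assumed.
(define tagged-number
  (bind (lambda (consumed) (list (cons 'number consumed)))
        (repetition1 decimal)))

;; assignment = 1*ALPHA "=" 1*ALPHA
;; The "=" must be present, but drop-consumed keeps it out of the output.
(define assignment
  (concatenation (repetition1 alpha)
                 (drop-consumed (char #\=))
                 (repetition1 alpha)))
```

If the rule must also succeed when the input stream is empty, `bind*` can be used in place of `bind`, per its description above.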

Abbreviated syntax

abnf supports the following abbreviations for commonly used combinators:

::  concatenation
:?  optional-sequence
:!  drop-consumed
:s  lit
:c  char
:*  repetition
:+  repetition1

Examples

The following parser libraries have been implemented with abnf, in order of complexity:

Parsing date and time


(require-extension typeclass input-classes abnf)

(define char-list-<Input>
  (make-<Input> null? car cdr))

(define char-list-<Token>
  (Input->Token char-list-<Input>))

(define char-list-<CharLex>
  (Token->CharLex char-list-<Token>))

(define char-list-<CoreABNF>
  (CharLex->CoreABNF char-list-<CharLex>))

(import-instance (<Token> char-list-<Token> char-list/)
		 (<CharLex> char-list-<CharLex> char-list/)
                 (<CoreABNF> char-list-<CoreABNF> char-list/)
                 )

(define fws
  (concatenation
   (optional-sequence 
    (concatenation
     (repetition char-list/wsp)
     (drop-consumed 
      (alternatives char-list/crlf char-list/lf char-list/cr))))
   (repetition1 char-list/wsp)))


(define (between-fws p)
  (concatenation
   (drop-consumed (optional-sequence fws)) p 
   (drop-consumed (optional-sequence fws))))

;; Date and Time Specification from RFC 5322 (Internet Message Format)

;; The following abnf parser combinators parse a date and time
;; specification of the form
;;
;;   Thu, 19 Dec 2002 20:35:46 +0200
;;
;; where the weekday specification is optional.

;; Match the abbreviated weekday names

(define day-name 
  (alternatives
   (char-list/lit "Mon")
   (char-list/lit "Tue")
   (char-list/lit "Wed")
   (char-list/lit "Thu")
   (char-list/lit "Fri")
   (char-list/lit "Sat")
   (char-list/lit "Sun")))

;; Match a day-name, optionally wrapped in folding whitespace

(define day-of-week (between-fws day-name))


;; Match a four digit decimal number

(define year (between-fws (repetition-n 4 char-list/decimal)))

;; Match the abbreviated month names

(define month-name (alternatives
		    (char-list/lit "Jan")
		    (char-list/lit "Feb")
		    (char-list/lit "Mar")
		    (char-list/lit "Apr")
		    (char-list/lit "May")
		    (char-list/lit "Jun")
		    (char-list/lit "Jul")
		    (char-list/lit "Aug")
		    (char-list/lit "Sep")
		    (char-list/lit "Oct")
		    (char-list/lit "Nov")
		    (char-list/lit "Dec")))

;; Match a month-name, optionally wrapped in folding whitespace

(define month (between-fws month-name))


;; Match a one- or two-digit day number followed by folding whitespace
;; (RFC 5322: day = ([FWS] 1*2DIGIT FWS))

(define day (concatenation
	     (drop-consumed (optional-sequence fws))
	     (variable-repetition 1 2 char-list/decimal)
	     (drop-consumed fws)))

;; Match a date of the form "dd Mon yyyy", e.g. 19 Dec 2002
(define date (concatenation day month year))

;; Match a two-digit number 

(define hour      (repetition-n 2 char-list/decimal))
(define minute    (repetition-n 2 char-list/decimal))
(define isecond   (repetition-n 2 char-list/decimal))

;; Match a time-of-day specification of hh:mm or hh:mm:ss.

(define time-of-day (concatenation
		     hour (drop-consumed (char-list/char #\:))
		     minute (optional-sequence 
			     (concatenation (drop-consumed (char-list/char #\:))
 					 isecond))))

;; Match a timezone specification of the form
;; +hhmm or -hhmm 

(define zone (concatenation 
	      (drop-consumed fws)
	      (alternatives (char-list/char #\-) (char-list/char #\+))
	      hour minute))

;; Match a time-of-day specification followed by a zone.

(define itime (concatenation time-of-day zone))

(define date-time (concatenation
		   (optional-sequence
		    (concatenation
		     day-of-week
		     (drop-consumed (char-list/char #\,))))
		   date
		   itime
		   (drop-consumed (optional-sequence fws))))

;; Error handler passed to lex; called when the input does not match.
(define (err s)
  (print "lexical error on stream: " s)
  '(error))

(require-extension lexgen)
(print (lex date-time err "Thu, 19 Dec 2002 20:35:46 +0200"))

Requires

Version History

License

 Copyright 2009-2015 Ivan Raikov
 This program is free software: you can redistribute it and/or
 modify it under the terms of the GNU General Public License as
 published by the Free Software Foundation, either version 3 of the
 License, or (at your option) any later version.
 This program is distributed in the hope that it will be useful, but
 WITHOUT ANY WARRANTY; without even the implied warranty of
 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
 General Public License for more details.
 A full copy of the GPL license can be found at
 <http://www.gnu.org/licenses/>.