You are looking at historical revision 16531 of this page. It may differ significantly from its current revision.
lexgen is a lexer generator comprised in its core of only four small procedures. The programmer combines these procedures into regular expression pattern matchers.
A pattern matcher procedure takes a list of streams, and returns a new list of streams advanced by every combination allowed by the pattern matcher function. A stream is defined as a list that contains a list of characters consumed by the pattern matcher, and a list of characters not yet consumed.
Note that the number of streams returned by a pattern matcher typically won't match the number of streams passed in. If the pattern doesn't match at all, the empty list is returned.
Every combinator procedure in this library returns a procedure that takes in a continuation procedure and a list of streams as arguments.
Basic procedures[procedure] (tok TOKEN PROC) => MATCHER
Procedure tok builds a pattern matcher function that, for each stream given, applies a procedure to the given token TOKEN and an input character. If the procedure returns a true value, that value is prepended to the list of consumed elements, and the input character is removed from the list of input elements.[procedure] (seq MATCHER1 MATCHER2) => MATCHER
seq builds a matcher that matches a sequence of patterns.[procedure] (bar MATCHER1 MATCHER2) => MATCHER
bar matches either of two patterns. It's analogous to patterns separated by | in traditional regular expressions.[procedure] (star MATCHER) => MATCHER
star is an implementation of the Kleene closure. It is analogous to * in traditional regular expressions.
These procedures are built from the previous four and are provided for convenience.[procedure] (try PROC) => PROC
Converts a binary predicate procedure to a binary procedure that returns its right argument when the predicate is true, and false otherwise.[procedure] (char CHAR) => MATCHER
Matches a single character.[procedure] (lst MATCHER-LIST) => MATCHER
Constructs a matcher for the sequence of matchers in MATCHER-LIST.[procedure] (pos MATCHER) => MATCHER
Positive closure. Analogous to +.[procedure] (opt MATCHER) => MATCHER
Optional pattern. Analogous to ?.[procedure] (set CHAR-SET) => MATCHER
Matches any of a SRFI-14 set of characters.[procedure] (range CHAR CHAR) => MATCHER
Matches a range of characters. Analogous to character class .[procedure] (lit STRING) => MATCHER
Matches a literal string s.[procedure] (bind F P) => MATCHER
Given a rule P and function F, returns a matcher that first applies P to the input stream, then applies F to the returned list of consumed tokens, and returns the result and the remainder of the input stream.[procedure] (rebind F G P) => MATCHER
Given a rule P and procedures F and G, returns a matcher that first applies F to the input stream, then applies P to the resulting stream, then applies G to the resulting list of consumed elements and returns the result along with the remainder of the input stream.[procedure] (drop P) => MATCHER
Given a rule P, returns a matcher that always returns an empty list of consumed tokens when P succeeds.
Lexer procedures[procedure] (longest STREAM-LIST) => STREAM
Takes the resulting streams produced by the application of a pattern on a stream (or streams) and selects the longest match if one exists. If STREAM-LIST is empty, it returns #F.[procedure] (lex MATCHER ERROR STRING) => CHAR-LIST
lex takes a pattern and a string, turns the string into a list of streams (containing one stream), applies the pattern, and returns the longest match. Argument ERROR is a single-argument procedure called when the pattern does not match anything.
A stream is a list that can take one of two forms:
1) A list of two elements: the first element is a list of elements consumed by the pattern matcher; the second element is a list of elements not yet consumed. E.g., the list
((#\a) (#\b #\c #\d #\e))
represents a stream that contains the consumed character a, and the unconsumed characters b c d e.
2) A list of three elements: the first two elements are as before, but the third element is a procedure that is applied to the tail of the unconsumed list, in order to obtain the next character. E.g., the list:
((#\a) (#\b <port>) <procedure (lambda (in) (list (read-char in) in))>
represents a stream that contains the consumed character a, the unconsumed character b, and an input port to read subsequent characters from; and a procedure that reads one character from the input port, and returns it along with the modified port. Note that the use of side-effecting structures such as ports will lead to erroneous results with backtracking parsers.[procedure] (stream-unfold INIT F G)
A helper procedure to transform streams from one format to another. Procedure F must be a procedure of two arguments, a state and a list of unconsumed characters. Procedure G is applied to an unconsumed element, and is expected to return the original element representation, before F was applied to the unconsumed stream.
(use srfi-1 lexgen) ;; A pattern to match floating point numbers. ;; "-"?(([0-9]+(\\.[0-9]+)?)|(\\.[0-9]+))([eE][+-]?[0-9]+)? (define numpat (let* ((digit (range #\0 #\9)) (digits (pos digit)) (fraction (seq (char #\.) digits)) (significand (bar (seq digits (opt fraction)) fraction)) (exp (seq (set "eE") (seq (opt (set "+-")) digits))) (sign (opt (char #\-)))) (seq sign (seq significand (opt exp))))) (define (err s) (print "lexical error on stream: " s) (list)) (lex numpat err "-123.45e-6") ;; Tokens with position information (define-record-type postok (make-postok pos token) postok? (pos postok-pos ) (token postok-token ) ) (define-record-printer (postok x out) (fprintf out "#<token ~A: ~A>" (postok-pos x) (postok-token x))) (define pos? pair?) (define pos-row car) (define pos-col cdr) (define make-pos cons) ;; Converts an input stream to a stream with position information: (define (stream-pos begtok) (and (postok? begtok) (stream-unfold begtok (lambda (begtok strm) (if (null? strm) '() (let* ((pos0 (postok-pos begtok)) (pos1 (let ((row0 (pos-row pos0)) (col0 (pos-col pos0))) (case (car strm) ((#\newline) (make-pos (+ 1 row0) 1)) ((#\return) (make-pos row0 1)) (else (make-pos row0 (+ 1 col0)))))) (res (cons (make-postok pos1 (car strm)) (cdr strm)))) res))) postok-token))) (define begpos (make-pos 1 0)) (define (getpos p) (let ((f (lambda (in) (and (pair? in) (pair? (cdr in)) (postok-pos (cadr in))))) (g (lambda (i s) (list (make-postok i (car s)))))) (rebind f g p))) (define pos-numpat-stream (list ((stream-pos (make-postok begpos 'start)) `(() ,(string->list "-123.45e-6"))))) (define pbnumpat (let* ((digit (range #\0 #\9)) (digits (star digit)) (fraction (seq (char #\.) digits)) (significand (bar (seq digits (opt fraction)) fraction)) (exp (seq (set "eE") (seq (opt (set "+-")) digits))) (sign (opt (char #\-)) ) (pat (seq (getpos (bind make-sign sign)) (seq (getpos (bind make-significand (longest significand))) (getpos (bind make-exp (longest (opt exp)))))))) pat)) (define (pos-num-parser s) (car (lex pbnumpat err s)))
- 3.3 Bug fixes in stream comparison
- 3.2 Improved input stream comparison procedures
- 3.1 Added rebind combinator and stream-unfold procedure
- 3.0 Added an extension mechanism for input streams of different types (to be elaborated and documented in subsequent versions).
- 2.6 Added bind and drop combinators
- 2.5 The seq combinator checks whether the first parser in the sequence has failed
- 2.4 Added (require-library srfi-1); using lset<= instead of equal? in star
- 2.3 Bug fix in procedure range; added procedure cps-table
- 2.2 Bug fix in procedure star
- 2.1 Added procedure lst
- 2.0 Core procedures rewritten in continuation-passing style
- 1.5 Using (require-extension srfi-1)
- 1.4 Ported to Chicken 4
- 1.2 Added procedures try and tok (supersedes pred)
- 1.0 Initial release
Based on the SML lexer generator by Thant Tessman.
Copyright 2009 Ivan Raikov and the Okinawa Institute of Science and Technology. This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. A full copy of the GPL license can be found at <http://www.gnu.org/licenses/>.