Outdated egg!
This is an egg for CHICKEN 3, the unsupported old release. You're almost certainly looking for the CHICKEN 4 version of this egg, if it exists.
If it does not exist, there may be equivalent functionality provided by another egg; have a look at the egg index. Otherwise, please consider porting this egg to the current version of CHICKEN.
lexgen
Description
lexgen is a lexer generator comprised in its core of only four small procedures. The programmer combines these procedures into regular expression pattern matchers.
A pattern matcher procedure takes a list of streams, and returns a new list of streams advanced by every combination allowed by the pattern matcher function. A stream is defined as a list that contains a list of characters consumed by the pattern matcher, and a list of characters not yet consumed.
Note that the number of streams returned by a pattern matcher typically won't match the number of streams passed in. If the pattern doesn't match at all, the empty list is returned.
Library Procedures
Every combinator procedure in this library returns a procedure that takes in a list of streams as an argument.
Basic procedures
[procedure] (tok TOKEN PROC) => MATCHERProcedure tok builds a pattern matcher function that, for each stream given, applies a procedure to the given token TOKEN and an input character. If the procedure returns a true value, that value is prepended to the list of consumed elements, and the input character is removed from the list of input elements.
[procedure] (seq MATCHER-LIST) => MATCHERseq builds a matcher that matches a sequence of patterns.
[procedure] (bar MATCHER-LIST) => MATCHERbar matches any of a list of patterns. It's analogous to a series of patterns separated by | in traditional regular expressions.
[procedure] (star MATCHER) => MATCHERstar is an implementation of the Kleene closure. It is analogous to * in traditional regular expressions.
Convenience procedures
These procedures are built from the previous four and are provided for convenience.
[procedure] (try PROC) => PROCConverts a binary predicate procedure to a binary procedure that returns its right argument when the predicate is true, and false otherwise.
[procedure] (char CHAR) => MATCHERMatches a single character.
[procedure] (pos MATCHER) => MATCHERPositive closure. Analogous to +.
[procedure] (opt MATCHER) => MATCHEROptional pattern. Analogous to ?.
[procedure] (set CHAR-SET) => MATCHERMatches any of a SRFI-14 set of characters.
[procedure] (range CHAR CHAR) => MATCHERMatches a range of characters. Analogous to character class [].
[procedure] (lit STRING) => MATCHERMatches a literal string s.
Lexer procedures
[procedure] (longest STREAM-LIST) => STREAMTakes the resulting streams produced by the application of a pattern on a stream (or streams) and selects the longest match if one exists. If STREAM-LIST is empty, it returns #F.
[procedure] (lex MATCHER STRING) => CHAR-LISTlex takes a pattern and a string, turns the string into a list of streams (containing one stream), applies the pattern, and returns the longest match.
Examples
(define a-pat (tok #\a (try char=?))) (define b-pat (tok #\b (try char=?))) (define a-then-b-pat (seq (list a-pat b-pat))) (define a-or-b-pat (seq (list a-pat b-pat))) (define a-star-pat (star a-pat)) (define abc-stream (list `(() ,(string->list "abc"))))
(print (a-pat abc-stream)) (print (b-pat abc-stream)) (print (a-then-b-pat abc-stream)) (print (a-or-b-pat abc-stream)) (print (a-star-pat abc-stream))
;; A pattern to match floating point numbers. ;; "-"?(([0-9]+(\\.[0-9]+)?)|(\\.[0-9]+))([eE][+-]?[0-9]+)?
(define numpat (let* ((digit (range #\0 #\9)) (digits (pos digit)) (fraction (seq `(,(char #\.) ,digits))) (significand (bar `(,(seq `(,digits ,(opt fraction))) ,fraction))) (exp (seq `(,(set "eE") ,(opt (set "+-")) ,digits))) (sign (opt (char #\-)) )) (seq `(,sign ,(seq `(,significand ,(opt exp))))))) (print (lex numpat "3.45e-6"))
Requires
Version History
- 1.2 Added procedures try and tok (supersedes pred)
- 1.0 Initial release
License
Based on the SML lexer generator by Thant Tessman.
Copyright 2009 Ivan Raikov. All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. Neither the name of the author nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDERS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.