You are looking at historical revision 22092 of this page. It may differ significantly from its current revision.

lalr

LALR(1) parser generator written in Scheme.

Usage

To generate the parser:

(require-extension lalr)

;; this will only work in the interpreter
(lalr-parser
  grammar definition ... )

To use the parser:


(include "parser.grm.scm") ;; file generated by lalr-parser macro

(parser lexer parse-error) 
;; parser is the function defined in the grammar file
;; lexer is a user-defined character-reading function
;; parse-error is a user-defined error function

Documentation

lalr implements an efficient algorithm for computing lookahead sets, which is the same as the one used in Bison (GNU yacc):

Efficient Computation of LALR(1) Look-Ahead Set, F. DeRemer and T. Pennello, TOPLAS, vol. 4, no. 4, october 1982.

The official documentation for lalr can be found here: http://code.google.com/p/lalr-scm/. The following is taken from chapters 4 and 6 in the official documentation.

Defining a parser

The file lalr.scm declares a macro called lalr-parser:

(lalr-parser [options] tokens rules ...)

To use this macro, you must first load lalr.scm in the Scheme interpreter via the include special form. The macro will not work in the compiler.

This macro, when given appropriate arguments, generates an LALR(1) parser. The macro accepts at least two arguments. The first is a list of symbols which represent the terminal symbols of the grammar. The remaining arguments are the grammar production rules.

The grammar format

The grammar is specified by first giving the list of terminals and the list of non-terminal definitions. Each non-terminal definition is a list where the first element is the non-terminal and the other elements are the right-hand sides (lists of grammar symbols). In addition to this, each right-hand side can be followed by a semantic action.

For example, consider the following (yacc) grammar for a very simple expression language:

   e : e '+' t
     | e '-' t
     | t
     ;
   t : t '*' f
     : t '/' f
     | f
     ;
   f : ID
     ;

The same grammar, written for the scheme parser generator, would look like this (with semantic actions):

(define expr-parser
  (lalr-parser
   ; Terminal symbols
   (ID + - * /)
   ; Productions
   (e (e + t)    : (+ $1 $3)
      (e - t)    : (- $1 $3)
      (t)        : $1)
   (t (t * f)    : (* $1 $3)
      (t / f)    : (/ $1 $3)
      (f)        : $1)
   (f (ID)       : $1)))

In semantic actions, the symbol $n refers to the synthesized attribute value of the nth symbol in the production. The value associated with the non-terminal on the left is the result of evaluating the semantic action (it defaults to #f).

Operator precedence and associativity

The above grammar implicitly handles operator precedences. It is also possible to explicitly assign precedences and associativity to terminal symbols and productions a la Yacc. Here is a modified (and augmented) version of the grammar:

(define expr-parser
 (lalr-parser
  ; Terminal symbols
  (ID
   (left: + -)
   (left: * /)
   (nonassoc: uminus))
  (e (e + e)              : (+ $1 $3)
     (e - e)              : (- $1 $3)
     (e * e)              : (* $1 $3)
     (e / e)              : (/ $1 $3)
     (- e (prec: uminus)) : (- $2)
     (ID)                 : $1)))

The left: directive is used to specify a set of left-associative operators of the same precedence level, the right: directive for right-associative operators, and nonassoc: for operators that are not associative. Note the use of the (apparently) useless terminal uminus. It is only defined in order to assign to the penultimate rule a precedence level higher than that of * and /. The prec: directive can only appear as the last element of a rule. Finally, note that precedence levels are incremented from left to right, i.e. the precedence level of + and - is less than the precedence level of * and / since the formers appear first in the list of terminal symbols (token definitions).

Options

The following options are available.

Error recovery

lalr implements a very simple error recovery strategy. A production can be of the form

  (rulename
      ...
      (error TERMINAL) : action-code)

There can be several such productions for a single rulename.This will cause the parser to skip all the tokens produced by the lexer that are different than the given TERMINAL. For a C-like language, one can synchronize on semicolons and closing curly brackets by writing error rules like these:

   (stmt
      (expression SEMICOLON) : ...
      (LBRACKET stmt RBRACKET) : ...
      (error SEMICOLON)
      (error RBRACKET))

Conflicts in the grammar are handled in a conventional way. In the absence of precedence directives, Shift/Reduce conflicts are resolved by shifting, and Reduce/Reduce conflicts are resolved by choosing the rule listed first in the grammar definition.

Using the parser

The parser generated by the lalr-parser macro is a function that takes two parameters. The first parameter is a lexical analyzer while the second is an error procedure. The lexical analyzer is zero-argument function (a thunk) invoked each time the parser needs to look-ahead in the token stream. A token is usually a pair whose car is the symbol corresponding to the token (the same symbol as used in the grammar definition). The cdr of the pair is the semantic value associated with the token. For example, a string token would have the car set to 'string while the cdr is set to the string value "hello". The token may also be a plain symbol, such as 'NEWLINE, if it has no need of a semantic value.

Once the end of file is encountered, the lexical analyzer must always return the symbol'*eoi* each time it is invoked.

The error procedure must be a function that accepts at least two parameters, a message and additional arguments to be printed.

Examples

 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 ;;;
 ;;; FILE: calc.grm
 ;;;
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 ;;;
 ;;; Parser for simple calculator in Scheme.
 ;;;
 ;;; To use, run the command: 
 ;;; 
 ;;; csi < calc.grm
 ;;;
 ;;; This will create file `calc.grm.scm'. Then, to compile the
 ;;; calculator program, run: 
 ;;;
 ;;; csc calc.scm
 ;;;
 ;;
 ;;
 ;; @created   "Tue Jan  6 12:47:23 2004"
 ;; @modified  "Mon Oct 25 11:07:24 2004"
 ;; @author    "Dominique Boucher"
 ;; @copyright "Dominique Boucher"
 ;;
 ;; Simple arithmetic calculator. 
 ;; 
 ;;   This program illustrates the use of the lalr-scm parser generator
 ;; for Scheme. It is NOT robust, since calling a function with 
 ;; the wrong number of arguments may generate an error that will
 ;; cause the calculator to crash.


  ;;;
  ;;;;   The LALR(1) parser
  ;;;

  (require-extension lalr)

  (define calc-parser
    (lalr-parser

     ;; --- Options 
     ;; output a parser, called calc-parser, in a separate file - calc.grm.scm, 
     (output:    calc-parser "calc.grm.scm")
     ;; output the LALR table to calc.out
     (out-table: "calc.grm.out")

     ;; --- token definitions
     (ID NUM NEWLINE  LPAREN RPAREN
         (nonassoc: =)
         (left: - +)
         (left: * /)
         (left: COMMA)
         (nonassoc: uminus))

     (lines    (lines line) : (display-result $2)
               (line)       : (display-result $1))


     ;; --- rules
     (line     (NEWLINE)               : (void)
  	     (assign NEWLINE)        : $1
               (expr   NEWLINE)        : $1
               (error  NEWLINE)        : #f)

     (assign   (ID = expr)             : (add-binding $1 $3))

     (expr     (NUM)                   : $1
               (ID)                    : (get-binding $1)
  	     (ID LPAREN RPAREN)      : (invoke-proc $1 (list))
  	     (ID LPAREN args RPAREN) : (invoke-proc $1 $3)
  	     (expr + expr)           : (+ $1 $3)
               (expr - expr)           : (- $1 $3)
               (expr * expr)           : (* $1 $3)
               (expr / expr)           : (/ $1 $3)
               (- expr (prec: uminus)) : (- $2)
  	     (LPAREN expr RPAREN)    : $2)

     (args     (expr)                  : (list $1)
               (args COMMA expr)       : (cons $3 $1))))
 	
  ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
  ;;;
  ;;; END FILE: calc.grm
  ;;;
  ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

  ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
  ;;;
  ;;; FILE: calc.scm
  ;;;
  ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

  ;;;
  ;;;; Simple calculator in Scheme
  ;;;
  ;;
  ;; @created   "Tue Jan  6 12:47:23 2004"
  ;; @modified  "Mon Oct 25 11:07:24 2004"
  ;; @author    "Dominique Boucher"
  ;; @copyright "Dominique Boucher"
  ;;
  ;; Simple arithmetic calculator. 
  ;; 
  ;;   This program illustrates the use of the lalr-scm parser generator
  ;; for Scheme. It is NOT robust, since calling a function with 
  ;; the wrong number of arguments may generate an error that will
  ;; cause the calculator to crash.

  ;; parser

  (include "calc.grm.scm")


  ;;;
  ;;;;   The lexer
  ;;;


  (define (make-lexer errorp)
    (lambda ()
      (letrec ((skip-spaces
                (lambda ()
                  (let loop ((c (peek-char)))
                    (if (and (not (eof-object? c))
                             (or (char=? c #\space) (char=? c #\tab)))
                        (begin
                          (read-char)
                          (loop (peek-char)))))))
               (read-number
                (lambda (l)
                  (let ((c (peek-char)))
                    (if (char-numeric? c)
                        (read-number (cons (read-char) l))
                        (string->number (apply string (reverse l)))))))
               (read-id
                (lambda (l)
                  (let ((c (peek-char)))
                    (if (char-alphabetic? c)
                        (read-id (cons (read-char) l))
                        (string->symbol (apply string (reverse l))))))))

        ;; -- skip spaces
        (skip-spaces)
        ;; -- read the next token
        (let loop ((c (read-char)))
          (cond
           ((eof-object? c)      '*eoi*)
           ((char=? c #\newline) 'NEWLINE)
           ((char=? c #\+)       '+)
           ((char=? c #\-)       '-)
           ((char=? c #\*)       '*)
           ((char=? c #\/)       '/)
           ((char=? c #\=)       '=)
           ((char=? c #\,)       'COMMA)
           ((char=? c #\()       'LPAREN)
           ((char=? c #\))       'RPAREN)
           ((char-numeric? c)    (cons 'NUM (read-number (list c))))
           ((char-alphabetic? c) (cons 'ID  (read-id (list c))))
           (else
            (errorp "PARSE ERROR : illegal character: " c)
            (skip-spaces)
            (loop (read-char))))))))


  (define (read-line)
    (let loop ((c (read-char)))
      (if (and (not (eof-object? c))
               (not (char=? c #\newline)))
          (loop (read-char)))))


  ;;;
  ;;;;   Environment management
  ;;;


  (define *env* (list (cons '$$ 0)))


  (define (init-bindings)
    (set-cdr! *env* '())
    (add-binding 'cos cos)
    (add-binding 'sin sin)
    (add-binding 'tan tan)
    (add-binding 'expt expt)
    (add-binding 'sqrt sqrt))


  (define (add-binding var val)
    (set! *env* (cons (cons var val) *env*))
    val)


  (define (get-binding var)
    (let ((p (assq var *env*)))
      (if p
          (cdr p)
          0)))


  (define (invoke-proc proc-name args)
    (let ((proc (get-binding proc-name)))
      (if (procedure? proc)
          (apply proc args)
          (begin
            (display "ERROR: invalid procedure:")
            (display proc-name)
            (newline)
            0))))


  ;; value display
  (define (display-result v)
    (if v
        (begin
          (display "==> ")
          (display v)
          (newline))))

  ;;;
  ;;;;   The main program
  ;;;


  (define calc
    (lambda ()
      (call-with-current-continuation
       (lambda (k)
         (display "********************************") (newline)
         (display "*  Mini calculator in Scheme   *") (newline)
         (display "*                              *") (newline)
         (display "* Enter expressions followed   *") (newline)
         (display "* by [RETURN] or 'quit()' to   *") (newline)
         (display "* exit.                        *") (newline)
         (display "********************************") (newline)
         (init-bindings)
         (add-binding 'quit (lambda () (k #t)))
         (letrec ((errorp
                   (lambda args
                     (for-each display args) (newline)))
                  (start
                   (lambda ()
                     (calc-parser (make-lexer errorp) errorp))))
           (start))))))

  (calc)
  ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
  ;;;
  ;;; END FILE: calc.scm
  ;;;
  ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

About this egg

Author

Dominique Boucher

Version history

2.3.2
Documentation converted to wiki format
2.3.1
Bug fix in lalr-driver to clear the input variable every time a parser is called
2.3.0
Updated to upstream version 2.3.0; functionality split in two modules
2.1.2
Error-handler is now called with complete token
2.1.1
Fixed missing syntax attribute in setup script

License

This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License as
published by the Free Software Foundation; either version 2 of the
License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA
02110-1301 USA

A full copy of the GPL license can be found on Debian systems in 
/usr/share/common-licenses/GPL-2