lalr
LALR(1) parser generator written in Scheme.
Usage
To generate the parser:
(import lalr) ;; this will only work in the interpreter (lalr-parser grammar definition ... )
To use the parser:
(include "parser.grm.scm") ;; file generated by lalr-parser macro (parser lexer parse-error) ;; parser is the function defined in the grammar file ;; lexer is a user-defined character-reading function ;; parse-error is a user-defined error function
Documentation
lalr implements an efficient algorithm for computing lookahead sets, which is the same as the one used in Bison (GNU yacc):
Efficient Computation of LALR(1) Look-Ahead Set, F. DeRemer and T. Pennello, TOPLAS, vol. 4, no. 4, october 1982.
The official documentation for lalr can be found here: https://github.com/schemeway/lalr-scm. The following is taken from chapters 4 and 6 in the official documentation.
Defining a parser
The file lalr.scm declares a macro called lalr-parser:
(lalr-parser [options] tokens rules ...)
To use this macro, you must first load lalr.scm in the Scheme interpreter via the include special form. The macro will not work in the compiler.
This macro, when given appropriate arguments, generates an LALR(1) parser. The macro accepts at least two arguments. The first is a list of symbols which represent the terminal symbols of the grammar. The remaining arguments are the grammar production rules.
The grammar format
The grammar is specified by first giving the list of terminals and the list of non-terminal definitions. Each non-terminal definition is a list where the first element is the non-terminal and the other elements are the right-hand sides (lists of grammar symbols). In addition to this, each right-hand side can be followed by a semantic action.
For example, consider the following (yacc) grammar for a very simple expression language:
e : e '+' t | e '-' t | t ; t : t '*' f : t '/' f | f ; f : ID ;
The same grammar, written for the scheme parser generator, would look like this (with semantic actions):
(define expr-parser (lalr-parser ; Terminal symbols (ID + - * /) ; Productions (e (e + t) : (+ $1 $3) (e - t) : (- $1 $3) (t) : $1) (t (t * f) : (* $1 $3) (t / f) : (/ $1 $3) (f) : $1) (f (ID) : $1)))
In semantic actions, the symbol $n refers to the synthesized attribute value of the nth symbol in the production. The value associated with the non-terminal on the left is the result of evaluating the semantic action (it defaults to #f).
Operator precedence and associativity
The above grammar implicitly handles operator precedences. It is also possible to explicitly assign precedences and associativity to terminal symbols and productions a la Yacc. Here is a modified (and augmented) version of the grammar:
(define expr-parser (lalr-parser ; Terminal symbols (ID (left: + -) (left: * /) (nonassoc: uminus)) (e (e + e) : (+ $1 $3) (e - e) : (- $1 $3) (e * e) : (* $1 $3) (e / e) : (/ $1 $3) (- e (prec: uminus)) : (- $2) (ID) : $1)))
The left: directive is used to specify a set of left-associative operators of the same precedence level, the right: directive for right-associative operators, and nonassoc: for operators that are not associative. Note the use of the (apparently) useless terminal uminus. It is only defined in order to assign to the penultimate rule a precedence level higher than that of * and /. The prec: directive can only appear as the last element of a rule. Finally, note that precedence levels are incremented from left to right, i.e. the precedence level of + and - is less than the precedence level of * and / since the formers appear first in the list of terminal symbols (token definitions).
Options
The following options are available.
- (output: name filename) - copies the parser to the given file. The parser is given the name name.
- (out-tables: filename) - outputs the parsing tables in filename in a more readable format.
- (expect: n) - don't warn about conflicts if there are n or less conflicts.
- (driver: glr) - generates a GLR parser instead of a regular LALR parser.
Error recovery
lalr implements a very simple error recovery strategy. A production can be of the form
(rulename ... (error TERMINAL) : action-code)
There can be several such productions for a single rulename.This will cause the parser to skip all the tokens produced by the lexer that are different than the given TERMINAL. For a C-like language, one can synchronize on semicolons and closing curly brackets by writing error rules like these:
(stmt (expression SEMICOLON) : ... (LBRACKET stmt RBRACKET) : ... (error SEMICOLON) (error RBRACKET))
Conflicts in the grammar are handled in a conventional way. In the absence of precedence directives, Shift/Reduce conflicts are resolved by shifting, and Reduce/Reduce conflicts are resolved by choosing the rule listed first in the grammar definition.
Using the parser
The parser generated by the lalr-parser macro is a function that takes two parameters. The first parameter is a lexical analyzer while the second is an error procedure. The lexical analyzer is zero-argument function (a thunk) invoked each time the parser needs to look-ahead in the token stream. A token is usually a pair whose car is the symbol corresponding to the token (the same symbol as used in the grammar definition). The cdr of the pair is the semantic value associated with the token. For example, a string token would have the car set to 'string while the cdr is set to the string value "hello". The token may also be a plain symbol, such as 'NEWLINE, if it has no need of a semantic value.
Once the end of file is encountered, the lexical analyzer must always return the symbol'*eoi* each time it is invoked.
The error procedure must be a function that accepts at least two parameters, a message and additional arguments to be printed.
Examples
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;;; ;;; FILE: calc.grm ;;; ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;;; ;;; Parser for simple calculator in Scheme. ;;; ;;; To use, run the command: ;;; ;;; csi < calc.grm ;;; ;;; This will create file `calc.grm.scm'. Then, to compile the ;;; calculator program, run: ;;; ;;; csc calc.scm ;;;
(import lalr) (define calc-parser (lalr-parser ;; --- Options ;; output a parser, called calc-parser, in a separate file - calc.yy.scm, (output: calc-parser "calc.yy.scm") ;; output the LALR table to calc.out (out-table: "calc.out") ;; there should be no conflict (expect: 5) ;; --- token definitions (ID NUM = LPAREN RPAREN NEWLINE COMMA (left: + -) (left: * /) (nonassoc: uminus)) (lines (lines line) : (display-result $2) (line) : (display-result $1))
;; --- rules (line (assign NEWLINE) : $1 (expr NEWLINE) : $1 (NEWLINE) : #f (error NEWLINE) : #f) (assign (ID = expr) : (add-binding $1 $3))
(expr (expr + expr) : (+ $1 $3) (expr - expr) : (- $1 $3) (expr * expr) : (* $1 $3) (expr / expr) : (/ $1 $3) (- expr (prec: uminus)) : (- $2) (ID) : (get-binding $1) (ID LPAREN args RPAREN) : (invoke-proc $1 $3) (NUM) : $1 (LPAREN expr RPAREN) : $2)
(args () : '() (expr arg-rest) : (cons $1 $2))
(arg-rest (COMMA expr arg-rest) : (cons $2 $3) () : '()))) ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;;; ;;; END FILE: calc.grm ;;; ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;;; ;;; FILE: calc.scm ;;; ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
(import (chicken port)) (import lalr-driver)
;; parser
(include "calc.yy.scm") ;;; ;;;; The lexer ;;; (define (force-output) #f) (define (port-line port) (let-values (((line _) (port-position port))) line)) (define (port-column port) (let-values (((_ column) (port-position port))) column))
(define (make-lexer errorp) (lambda () (letrec ((skip-spaces (lambda () (let loop ((c (peek-char))) (if (and (not (eof-object? c)) (or (char=? c #\space) (char=? c #\tab))) (begin (read-char) (loop (peek-char))))))) (read-number (lambda (l) (let ((c (peek-char))) (if (char-numeric? c) (read-number (cons (read-char) l)) (string->number (apply string (reverse l))))))) (read-id (lambda (l) (let ((c (peek-char))) (if (char-alphabetic? c) (read-id (cons (read-char) l)) (string->symbol (apply string (reverse l))))))))
;; -- skip spaces (skip-spaces) ;; -- read the next token (let loop () (let* ((location (make-source-location "*stdin*" (port-line (current-input-port)) (port-column (current-input-port)) -1 -1)) (c (read-char))) (cond ((eof-object? c) '*eoi*) ((char=? c #\newline) (make-lexical-token 'NEWLINE location #f)) ((char=? c #\+) (make-lexical-token '+ location #f)) ((char=? c #\-) (make-lexical-token '- location #f)) ((char=? c #\*) (make-lexical-token '* location #f)) ((char=? c #\/) (make-lexical-token '/ location #f)) ((char=? c #\=) (make-lexical-token '= location #f)) ((char=? c #\,) (make-lexical-token 'COMMA location #f)) ((char=? c #\() (make-lexical-token 'LPAREN location #f)) ((char=? c #\)) (make-lexical-token 'RPAREN location #f)) ((char-numeric? c) (make-lexical-token 'NUM location (read-number (list c)))) ((char-alphabetic? c) (make-lexical-token 'ID location (read-id (list c)))) (else (errorp "PARSE ERROR : illegal character: " c) (skip-spaces) (loop))))))))
;;; ;;;; Environment management ;;; (define *env* (list (cons '$$ 0))) (define (init-bindings) (set-cdr! *env* '()) (add-binding 'cos cos) (add-binding 'sin sin) (add-binding 'tan tan) (add-binding 'expt expt) (add-binding 'sqrt sqrt)) (define (add-binding var val) (set! *env* (cons (cons var val) *env*)) val) (define (get-binding var) (let ((p (assq var *env*))) (if p (cdr p) 0))) (define (invoke-proc proc-name args) (let ((proc (get-binding proc-name))) (if (procedure? proc) (apply proc args) (begin (display "ERROR: invalid procedure:") (display proc-name) (newline) 0))))
;;; ;;;; The main program ;;; (define (display-result v) (if v (begin (display v) (newline))) (display-prompt)) (define (display-prompt) (display "[calculator]> ") (force-output)) (define calc (lambda () (call-with-current-continuation (lambda (k) (display "********************************") (newline) (display "* Mini calculator in Scheme *") (newline) (display "* *") (newline) (display "* Enter expressions followed *") (newline) (display "* by [RETURN] or 'quit()' to *") (newline) (display "* exit. *") (newline) (display "********************************") (newline) (init-bindings) (add-binding 'quit (lambda () (k #t))) (letrec ((errorp (lambda (message . args) (display message) (if (and (pair? args) (lexical-token? (car args))) (let ((token (car args))) (display (or (lexical-token-value token) (lexical-token-category token))) (let ((source (lexical-token-source token))) (if (source-location? source) (let ((line (source-location-line source)) (column (source-location-column source))) (if (and (number? line) (number? column)) (begin (display " (at line ") (display line) (display ", column ") (display (+ 1 column)) (display ")"))))))) (for-each display args)) (newline))) (start (lambda () (calc-parser (make-lexer errorp) errorp)))) (display-prompt) (start)))))) (calc) ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;;; ;;; END FILE: calc.scm ;;; ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
About this egg
Author
Dominique Boucher
Repository
This egg is hosted on the CHICKEN Subversion repository:
https://anonymous@code.call-cc.org/svn/chicken-eggs/release/5/lalr
If you want to check out the source code repository of this egg and you are not familiar with Subversion, see this page.
Version history
- 2.5.0
- Updated to upstream version 2.5.0
- 2.4.3
- Using (require-extension extras) in lalr.scm [thanks to Matt Gushee]
- 2.4.1
- Updated to upstream version 2.4.1; license updated to LGPL-3
- 2.3.2
- Documentation converted to wiki format
- 2.3.1
- Bug fix in lalr-driver to clear the input variable every time a parser is called
- 2.3.0
- Updated to upstream version 2.3.0; functionality split in two modules
- 2.1.2
- Error-handler is now called with complete token
- 2.1.1
- Fixed missing syntax attribute in setup script
License
This program is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. A full copy of the Lesser GPL license can be found at <http://www.gnu.org/licenses/>.