genturfa'i (historical revision 21596) - The CHICKEN Scheme wiki

You are looking at historical revision 21596 of this page. It may differ significantly from its current revision.

genturfa'i

.i lo la .ckim. ke pe'a jajgau ratcu ke'e genturfa'i

Description

genturfa'i is a Scheme packrat parser.

Packrat parsing is a memoizing, backtracking, recursive descent parsing technique that runs in time and space linear to the size of the input text O(n).

Packrat parsers require more memory than traditional LL(k) parsers, but are more expressive. Unlike traditional LL(k) parsers, which have a separate tokenization step, Packrat parsers combine lexical analysis and grammar parsing into a single step, using the same language for both.

The word genturfa'i is a Lojban lujvo, consisting of the gismu gerna, stura, and facki. It translates to "grammar structure discover."

Authors

.alyn.post.. Inspired by the packrat parser authored by Tony Garnock-Jones.

Requirements

Requires srfi-1 (list library), srfi-6 (string ports), srfi-9 (records), srfi-13 (string libraries), srfi-28 (basic string formatting), srfi-37 (args-fold), srfi-39 (parameter objects) srfi-61 (more general cond clause), regex (regular expressions), and utf8.

The filter procedure from srfi-1 is used.

srfi-6 (with Chicken Scheme's extensions) is used for reading parse input from a port. It can be removed if you do not need to parse input from a port.

srfi-9 is used to construct records with no more than two elements. These calls could be replaced by cons if you wish to remove this dependency.

srfi-13 is required only to return the remaining characters in the input buffer after a successful parse. If your grammar manually checks for the end of the input, this dependency could be removed.

srfi-28 is used to format the module version number and for debugging support. It could be excluded with minor modification to the code.

srfi-37 is used to process command-line arguments.

srfi-39 is used to process command-line arguments and to dynamically configure the parser.

srfi-61 is used in the memoization code and regular expression token matching code.

regex is used to match input for certain terminal rules.

utf8 is used for utf8 character support.

srfi-1, srfi-6, srfi-9, srfi-13, srfi-28, srfi-39, and srfi-61 are built into Chicken Scheme.

The srfi-37, regex and utf8 eggs must be installed.

genturfa'i uses DSSSL style lambda lists, which are not standard R5RS Scheme. DSSSL style lambda lists are build into Chicken.

Documentation

Usage

With genturfa'i, A parser is expressed using a parsing expression grammar, or PEG. This language describes the grammar that genturfa'i is parsing. It is also possible to construct a grammar directly in Scheme.

This module implements two methods for writing parsers. The recommended method is to write your grammar in PEG and compile the grammar to Scheme code.

'This method of writing parsers has not yet been written.'

Your grammar can also be written directly in Scheme, using the primitive grammar construction operators provided by genturfahi. As this is the only mode that currently exists, it will be covered in this documentation.

A Note About Lojban

Lojban is a carefully constructed spoken language with an unambiguous grammar.

I've written this parser with the intention of parsing Lojban's PEG grammar., as expressed in the main grammar, morphology, and morphology header files.

Just like the name of this module, the routines in this module are named with Lojban words. If you are unfamiliar with Lojban, a translation of the words that appear in the API is provided below.

English translation of Lojban words

This table provides an english glass for each Lojban word that appears in the API documentation. This table is not intended to have any truth value outside of this document. For formal English glosses of these words, please consult the Lojban to English Translation Dictionary.

Lojban word	English gloss
cfari	begin
cmene, cme	name
fanmo	end
genturfa'i	parse
jalge	result
javni	rule
je	logical and
jonai	logical exclusive or
lerfu	character
mapti	match
midju	middle
morji	memoize
namapti	nonmatch
naselci	nonterminal
nilcla	length
nunjavni	rule generator
nunvalsi	token generator
pabalvi	next
porsi, poi	ordered sequence
rodacmene	names
rodajavni	rules
rodajalga	results
rodanunjavni	rules generator
rodanunjalga	results generator
rodavalsi	tokens
secuxna	option
sezvati, zva	location
samselpla	code
sumti	function argument
tolmohi	clear memoization
valsi, val	token

Writing A Parser

'This section describes features that are not yet implemented.'

This section will eventually document how to write a parser in PEG. As that code is not yet written, please refer to the API documents.

Configuration

'This section describes features that are not yet implemented.'

Debugging

'Debugging is not yet implemented.'

Debugging is used to see what rules are being called as the parser tries to parse the input.

;;
;; enable debug output
;;
(secuxna debug #t)

Profiling

'Profiling is not yet implemented.'

Profiling is used to determine how much time is being spent in each grammar rule.

;;
;; enable profiling
;;
(secuxna profile #t)

Memoization

'This option is not yet implemented.'

By default, memoization is enabled to ensure that the parser runs in linear space and time. For some grammars, memoization does not provide any performance benefit, while the extra cache lookup slows the parser down.

Memoization can be turned completely off:

;;
;; turn memoization completely off.
;;
(secuxna memoize #f)

Or it can be turned off for specific non-terminal rules:

;;
;; turn memoization off for non-terminal rule
;; 'javnipa' and 'javnire'.
;;
(secuxna memoize ((javnipa #f) (javnire #f)))

Changing the sentinel character

'This option is not yet implemented.'

A sentinel character is placed at the end of the parse input to indicate the end of input. This character cannot be a valid terminal character in the grammar. Placing a sentinal character at the end of the parse input is an optimization that allows the terminal matching rules to avoid checking for the end of the parse input when they try to match the parse input. As these rules are called frequently, removing this check can speed up parsing.

This is also the character that is returned when the end of input is explicitely matched.

By default, #\nul is used as a sentinal character in the parse input to indicate an end of input. If this character appears in your grammar, you'll need to change the sentinal character to something else:

;;
;; change the sentinal character used to
;; detect the end of the parse input.
;;
(secuxna sentinal #\nul) ; default

Changing the empty string

'This option is not yet implemented.'

The empty string is a parse rule matching zero characters. By default, the token generated by this rule is the zero length string "". To change the token produced by the empty string:

;;
;; change the token returned when we
;; match against the empty string.
;;
(secuxna empty-string "") ; default

Changing non-matching parse result

'This option is not yet implemented.'

When parse input does not match the grammar, a token is returned indicated non-match. By default, the non-matching token is #f. To change the token returned when the parse input does not match the grammar:

;;
;; change the token returned when the parse
;; input does not match the grammar.
;;
(secuxna nonmatch-token #f) ; default

API Documentation

This is the low-level interface for writing parsers directly in Scheme. This code is used by the compiler to generate your parser. It is not designed to be used directly.

As this interface is written specifically to support the PEG parser generator, it is expressive but not concise. A more concise syntax could be written on top of this API. I do not plan on developing this syntax, but if you would like to write parsers directly in Scheme, please feel free to add support for a more concise syntax. See the History section of this document for a starting idea.

This API defines the following components:

lerfu-porsi: The parse input with an associated position.
nunjavni: A generator that creates grammar rules.
javni: A grammar rule that creates a token from the parse input and advances the input position.
mapti: A match continuation for a grammar rule.
namapti: A non-match continuation for a grammar rule.
nunvalsi: A generator that returns parse results.
javni-valsi: The token generated by a grammar rule.
samselpla: Code that modifies the syntax tree generated by the parser.
genturfahi: The parser, containing all of the grammar rules.

The parser runs as follows:

A lerfu-porsi encapsulates the input to parse. It tracks the current position of the input based on the parse.

A nunjavni expects zero or more javni and generates a new javni. The parser is constructed by linking together javni through nunjavni generators.

A javni expects a lerfu-porsi, mapti, and namapti. mapti and namapti are continuations representing the next rule in the grammar. The set of all javni describe the grammar to parse.

Both mapti and namapti expect a lerfu-porsi (possibly advanced by the previous javni). mapti also expects a nunvalsi, the generator for the parse tree. mapti and namapti are used for backtracking and general control flow during the parse.

A nunvalsi generates either a javni-valsi or a list of javni-valsi. javni-valsi are not immediately generated, as future backtracking could invalidate a currently matched result. nunvalsi generate the parse tree resulting from the input when called.

A javni-valsi is a name and associated token. The token can be a single character or the entire syntax tree. names must be symbols, and are used to associate a piece of the abstract syntax tree with a variable name when executing samselpla. The parser converts the input in lerfu-porsi to javni-valsi.

samselpa is code embedded in the grammar. It is executed after a successful parse, modifying the syntax tree before it is returned from the parser. Often it is easier to work with the syntax tree "in place" rather than working on the entire syntax tree at once. samselpla is the mechanism for working with the syntax tree "in place."

genturfahi expects a lerfu-porsi and javni containing the starting grammar rule. It provides the terminal mapti and namapti continuations that generate the parse results by generating a javni-valsi from the nunvalsi and returning the parsed syntax tree.

Details on each component are below.

lerfu-porsi

[record] lerfu-porsi

The string to parse and the current parse position of that string. This structure is used by the parser to store the input to the parser.

It contains the following members:

zva: An integer which is the current index of poi.
poi: The string which will be parsed.

[procedure] (make-lerfu-porsi zva poi) => lerfu-porsi

zva: An integer describing the current position of the parse input.
poi: The parse input, as a specially formatted string. (See make-lerfu-porsi-string and make-lerfu-porsi-port.)

Create a lerfu-porsi from the integer index zva and the parse input poi. poi is specially formatted, and must come from calling make-lerfu-porsi-string or make-lerfu-porsi-port. It has an appended character acting as a "sentinel" for purposes of detecting the end of the input buffer.

zva must be less than or equal to the string-length of poi.

[procedure] (make-lerfu-porsi-pabalvi-lerfu porsi) => lerfu-porsi

porsi: the parse position to advance.

Create a lerfu-porsi from an existing lerfu-porsi by advancing the index zva by a single character.

It is an error if porsi is not of type lerfu-porsi.

Behavior is undefined if this procedure is called when lerfu-porsi-fanmo? returns #t for this porsi.

[procedure] (make-lerfu-porsi-pabalvi-valsi porsi nilcla) => lerfu-porsi

porsi: the parse position to advance.
nilcla: The number of characters the position will be advanced by.

Create a lerfu-porsi from an existing lerfu-porsi by advancing the index zva by the length of lefpoi.

It is an error if porsi is not of type lerfu-porsi.

Behavior is undefined if this procedure is called with a lefpoi whose length is greater than the number of characters in remaining in the parse input.

[procedure] (lerfu-porsi? porsi) => boolean

porsi: the object to test.

Return #t if porsi is of type lerfu-porsi and #f otherwise.

[procedure] (lerfu-porsi-zva porsi) => zva

porsi: A lerfu-porsi.

Return the zva member from porsi.

It is an error if porsi is not of type lerfu-porsi.

[procedure] (lerfu-porsi-poi porsi) => poi

porsi: A lerfu-porsi.

Return the poi member from porsi.

It is an error if porsi is not of type lerfu-porsi.

[procedure] (lerfu-porsi->string porsi) => string

porsi: A lerfu-porsi.

Return A string representation of porsi. This string representation is for debug use only, and cannot be converted back to a lerfu-porsi.

It is an error if porsi is not of type lerfu-porsi.

[procedure] (make-lerfu-porsi-string string) => porsi

string: The input string to parse.

Return a lerfu-porsi with string as the parse input. string is copied into the lerfu-porsi.

[procedure] (make-lerfu-porsi-port port) => porsi

port: The input port to parse.

Return a lerfu-porsi with port as the parse input. port is read into a string before being copied into the lerfu-porsi, as the parser may need to backtrack to any previous location in the string.

[procedure] (lerfu-porsi-string porsi) => string

porsi: A lerfu-porsi.

Return the characters that have not been parsed from porsi. This routine copies the characters from the buffer into a new string.

If all of the characters were parsed from lerfu-porsi, return "".

[procedure] (lerfu-porsi-lerfu porsi) => character

porsi: A lerfu-porsi.

Return the character at the current parse position zva in the parse input poi.

It is an error if porsi is not of type lerfu-porsi.

[procedure] (lerfu-porsi-fanmo? porsi) => boolean

porsi: A lerfu-porsi.

Return #t if this porsi has no more input to parse, #f otherwise.

It is an error if porsi is not of type lerfu-porsi.

javni-valsi

[record] javni-valsi

A javni-valsi is a token with an (optionally) associated name generated from a javni, indicating a match of that javni.

It contains the following members:

cme: The name associated with this token, or #f if there is no associated name.
val: The token generated from matching a rule.

[procedure] (make-javni-valsi cme val) => javni-valsi

cme: The name associated with this token, or #f if there is no associated name.
val: The token generated from matching a rule.

Create a javni-valsi from the name cme and the token val. If this javni-valsi has no associated cmene, cme can be #f.

val may be an object of any type, and is the current abstract syntax tree generated from this parse, as modified by samselpla executed after the parse succeeds.

[procedure] (javni-valsi? valsi) => boolean

valsi: the object to test.

Return #t if valsi is of type javni-valsi and #f otherwise.

[procedure] (javni-valsi-cme valsi) => cmene

valsi: A javni-valsi.

Return the cmene member from valsi.

It is an error if valsi is not of type javni-valsi.

[procedure] (javni-valsi-val valsi) => val

valsi: A javni-valsi.

Return the val member from valsi.

It is an error if valsi is not of type javni-valsi.

[procedure] (javni-nunvalsi-val nunvalsi) => val / (val ...)

nunvalsi: A javni-valsi generator.

Generate the javni-valsi from nunvalsi. If the result is a single javni-valsi, return the val member of that result. Otherwise, treat the result as a tree containing javnli-valsi and recursively retrieve the val member from each javni-valsi node in the tree.

It is an error if valsi is not (lists of) of type javni-valsi.

[procedure] (javni-rodavalsi-val rodavalsi) => val / (val ...)

rodavalsi: A javni-valsi.

This helper routine for javni-nunvalsi-val returns the val member of rodavalsi if rodavalsi is a javni-valsi, calling itself recursively every time rodavalsi is a list.

It is an error if valsi is not of type javni-valsi.

[procedure] (javni-valsi->string valsi) => string

valsi: A javni-valsi.

Return A string representation of valsi. This string representation is for debug use only, and cannot be converted back to a javni-valsi.

It is an error if valsi is not of type javni-valsi.

nunvalsi

A routine that generates a valsi.

[procedure] (venunjmina-nunvalsi cmene) => nunvalsi

cmene: The name to associated with the nunvalsi

If cmene is not #f, return a procedure that, when called, will produce a javni-valsi with a cme of cmene and a val resulting from calling the passed in nunvalsi.

This procedure is used by the ordered choice and optional javni to generate new nunvalsi when those rules have an associated cmene.

[procedure] (vejmina-nunvalsi cmene nunvalsi) => nunvalsi

cmene The name to associate with the newly generated nunvalsi.
nunvalsi: A javni-valsi generator.

Create a new javni-valsi with a cme of cmene and a val from the val of generating a valsi from nunvalsi.

This routine is is generated by venunjmina-nunvalsi when the caller provides a cmene.

[procedure] (vejmina-nunvalsi-nacmene nunvalsi) => nunvalsi

nunvalsi: A javni-valsi generator.

Return nunvalsi. This routine is a no-op generated by venunjmina-nunvalsi when the caller has not provided a cmene.

[procedure] (venunjmina-rodanunvalsi cmene) => nunvalsi

cmene: The name to associated with the nunvalsi

If cmene is not #f, return a procedure that, when called, will produce a javni-valsi with a cme of cmene and a val resulting from calling the passed in nunvalsi.

This procedure is used by the zero or more and sequence javni to generate new nunvalsi when those rules have an associated cmene.

[procedure] (vejmina-rodanunvalsi cmene . rodanunvalsi) => nunvalsi

cmene The name to associate with the newly generated nunvalsi.
rodanunvalsi A list of nunvalsi.

Create a new javni-valsi with a cme of cmene and a val being a list of the val from the valsi generated from each nunvalsi in rodanunvalsi.

This routine is is generated by venunjmina-rodanunvalsi when the caller provides a cmene.

[procedure] (vejmina-rodanunvalsi-nacmene . rodanunvalsi) => nunvalsi

rodanunvalsi A list of nunvalsi.

Create a new javni-valsi with a cme of #f and a val being a list of the val from the valsi generated from each nunvalsi in rodanunvalsi.

This routine is generated by venunjmina-rodanunvalsi when the caller does not provide a cmene.

nunjavni

A routine that generates a javni.

[procedure] (nunjavni-lerfu lerfu #!key cmene) => javni

cmene: The optional name for the javni-valsi generated by successfully matching lerfu.
lerfu: The character to match.

Generate a javni that matches a single lerfu.

nunjavni-lerfu corresponds to a nonterminal symbol in PEG syntax.

[procedure] (nunjavni-. #!key cmene) => javni

cmene: The optional name for the javni-valsi generated by successfully matching the next character.

Generate a javni that matches the next character in the parse input, regardless of what that character is. Call mapti with the matched character. If there are no characters remaining in the parse input, call namapti.

[procedure] (nunjavni-e #!key cmene) => javni

cmene: The optional name for the javni-valsi generated by this rule.

Generate a javni that matches the empty string.

nunjavni-fanmo corresponds to the empty string expression in PEG syntax.

[procedure] (nunjavni-fanmo #!key cmene) => javni

cmene: The optional name for the javni-valsi generated by successfully matching the end of the lerfu-porsi.

Generate a javni that matches the end of the lerfu-porsi.

nunjavni-fanmo corresponds to a nonterminal symbol in PEG syntax.

[procedure] (nunjavni-valsi valsi #!key cmene) => javni

cmene: The optional name for the javni-valsi generated by successfully matching cmene.
valsi: The string to try to match.

Generate a javni that matches the parse input against valsi.

nunjavni-valsi matches a literal string, and will be faster than nunjavni-re or a composition of individual nunjavni-lerfu when the input to match can be expressed as a literal string.

[procedure] (nunjavni-re pattern #!key cmene) => javni

cmene: The optional name for the javni-valsi generated by successfully matching pattern.
pattern: the regular expression to try to match.

Generate a javni that matches the next character(s) in the parse input against the regular expression pattern. If pattern matches the next character(s) in the parse input, call mapti with the matched characters. Otherwise continue with namapti.

The character #\^ is prepended to pattern before compiling the regular expression.

[procedure] (nunjavni-* sumti-javni #!key cmene) => javni

cmene: The optional name for the javni-valsi generated by successfully matching sumti-javni.
sumti-javni: The rule to match zero-or-more times.

Generate a javni that runs the passed in sumti-javni until it generates a namapti. The javni returned by nunjavni-* never calls its namapti continuation, always calling mapti.

nunjavni-* corresponds to the zero-or-more expression in PEG syntax.

[procedure] (nunjavni-+ sumti-javni #!key cmene) => javni

cmene: The optional name for the javni-valsi generated by this rule.
sumti-javni: The rule to match one-or-more times.

Generate a javni that runs the passed in sumti-javni until it generates a namapti. The javni returned by nunjavni-+ calls mapti if the passed in sumti-javni matches one or more times, and calls namapti if sumti-javni does not match.

nunjavni-+ corresponds to the one-or-more expression in PEG syntax.

[procedure] (nunjavni-? sumti-javni #!key cmene) => javni

cmene: The optional name for the javni-valsi generated by this rule.
sumti-javni: The rule to optionally match.

Generate a javni that runs the passed in sumti-javni. The javni returned by nunjavni-? never calls its namapti continuation, always calling mapti.

nunjavni-? corresponds to the optional expression in PEG syntax.

[procedure] (nunjavni-& sumti-javni) => javni

sumti-javni: The rule to match.

Generate a javni that runs the passed in sumti-javni. The javni returned by nunjavni-& calls mapti or namapti according to the result from sumti-javni, but it does not advances the lerfu-porsi when mapti is called.

nunjavni-* corresponds to the and-predicate expression in PEG syntax.

[procedure] (nunjavni-! sumti-javni) => javni

sumti-javni: The rule to match.

Generate a javni that runs the passed in sumti-javni. The javni returned by nunjavni-& calls mapti if the sumti-javni calls namapti, and it calls mapti if the sumti-javni calls namapti.

nunjavni-! corresponds to the not-predicate expression in PEG syntax.

[procedure] (nunjavni-je #!rest rodajavni #!key cmene) => javni

cmene: The optional name for the javni-valsi generated by successfully matching all rodajavni.
rodajavni: one or more rules to match in sequence.

Generate a javni that runs each javni in the passed in rodajavni in order. The javni returned by nunjavni-je calls mapti if every javni in rodajavni matches, and namapti if any javnri in rodajavni does not match.

nunjavni-je corresponds to the sequence expression in PEG syntax.

[procedure] (nunjavni-jonai #!rest rodajavni #!key cmene) => javni

cmene: The optional name for the javni-valsi generated by successfully matching any rodajavni.
rodajavni: one or more rules to match in sequence.

Generate a javni that runs each javni in the passed in rodajavni in order. The javni returned by nunjavni-jonai calls mapti with the result from the first javni in rodajavni that matches, ignoring all remaining javni in rodajavni. nunjavni-je calls namapti if all the javni in rodajavni do not match.

nunjavni-je corresponds to the ordered-choice expression in PEG syntax.

[procedure] (nunjavni-morji sumti-javni) => javni

sumti-javni: the rule to match.

Generate a javni that runs the passed in sumti-javni and stores the result. If the javni returned by nunjavni-morji is called again with the same lerfu-porsi, the stored mapti or namapti is called instead of calling sumti-javni again.

nunjavni-je corresponds to memoization in packrat parsing.

Unlink other nunjavni, this rule generator does not accept a cmene key.

[syntax] (nunjavni-naselci sumti-javni) => javni

sumti-javni: the rule defined in the environment being captured.

nunjavni-nasleci encloses sumti-javni in a lambda, which captures the environment in which sumti-javni is defined.

This macro is used in the letrec used to define the non-terminals in a grammar to allow mutual recursion.

Because this is a macro, if you need to associated a cmene with this sumti-javni, use nunjavni-cmene.

[procedure] (nunjavni-samselpla samselpla sumti-javni #!key cmene) => javni

cmene: The optional name for the javni-valsi generated by successfully matching sumti-javni and executing samselpla.
samselpla: the code to execute if the generated javni calls mapti.
sumti-javni: The rule generating javni-valsi for samselpla.

Generate a javni that runs the passed in sumti-javni and, if the sumti-javni calls mapti, calls samselpla by passing any tokens in the jalga with associated cmene as DSSL-style #!key parameters.

If sumti-javni does not match, samselpla is not called. There is no way to execute code for a namapti, as there is only a single namapti called during a parse. Code that should be executed when a parse fails can be run after the parse completes; see genturfahi.

[procedure] (nunjavni-cmene sumti-javni #!key cmene) => javni

cmene: The optional name for the javni-valsi generated by successfully matching sumti-javni.
sumti-javni: the rule to match.

Generate a javni that runs the passed in sumti-javni and, if the sumti-javni matches, associates cmene with the javni-valsi associated with the mapti.

If sumti-javni does not match, call namapti without associating cmene with the result.

javni

A routine that, given a lerfu-porsi, creates a javni-valsi indicating match or nonmatch of that javni for that lerfu-porsi. This javni-valsi is then forwarded to the next javni by means of calling mapti for a match or namapti for a non-match.

A javni is provided the following parameters:

[procedure] (javni lerfu-porsi mapti namapti) => jalge

lerfu-porsi: the current parse position.
mapti: continuation for rule match.
namapti: continuation for rule non-match.

Note that jalge is not returned until the entire parse succeeds, and may not return at all on failure, if another path succeeds in the parse.

mapti .ijo namapti

A mapti is a continuation called by a javni when that javni matches the current lerfu-porsi.

A namapti is a continuation called by a javni when that javni does not match the current {lerfu-porsi}.

mapti and namapti have the signatures:

[procedure] (mapti lerfu-porsi nunvalsi) => jalge
[procedure] (namapti lerfu-porsi) => jalge

lerfu-porsi: The current parse position.
nunvalsi: A generated returning the current parse result.

genturfahi accepts the top-level parse rule and returns the result to the caller, rather than calling any additional mapti or namapti.

genturfahi

[procedure] (genturfahi-tolmohi) => '()

Clear the memoization cache. This is done automatically after each parse.

[procedure] (genturfahi javni) => rodavalsi

javni: The starting rule of the grammar.

Parse the javni, which is assumed to be the initial rule of the grammar. Return rodavalsi, the result of the parse.

If the parse succeed, rodavalsi is the abstract syntax tree as modified by any samselpla embedded in the grammar. If the parse failed, #f is returned. In both cases, the lerfu-porsi representing the input remaining after the parse is returned as a second value.

[procedure] (genturfahi* javni) => (rodavalsi lefpoi)

javni: The starting rule of the grammar.

genturfahi* calls genturfahi and converts the multiple arguments returned by genturfahi to a list and returns them.

This routine is used in the test framework, and might be useful elsewhere.

[constant] genturfahi-version-major

A number representing the major version of this module.

[constant] genturfahi-version-minor

A number representing the minor version of this module.

[constant] genturfahi-version-patch

A number representing the patch version of this module.

[constant] genturfahi-version

A string representing the version number of this module.

Examples

These examples show how to use the genturfahi API to construct a parser.

Operators

;;;; lerfu.scm

(use genturfahi)
(use test)
;;;
;;; terminal:
;;;
;;; lerfu <- \#a
;;;
(define (lerfu)
  (set! genturfahi-lerfu
        (genturfahi* (nunjavni-lerfu #\a)))

  ; match the only character this parser matches.
  ;
  (test '(#\a "") (genturfahi-lerfu "a"))

  ; There is no rule to match any other character,
  ; so they don't match.
  ;
  (test '(#f "b") (genturfahi-lerfu "b"))
  (test '(#f "c") (genturfahi-lerfu "c"))

  ; there is no rule for the emtpy
  ; string, no match
  ;
  (test '(#f "") (genturfahi-lerfu ""))

  ; there is no rule for the end-of-file
  ; These match, but they don't parse the
  ; whole buffer!
  ;
  ; this rule won't even match two #\a,
  ; only one.
  ;
  (test '(#\a "a") (genturfahi-lerfu "aa"))
  (test '(#\a "b") (genturfahi-lerfu "ab"))
  (test '(#\a "c") (genturfahi-lerfu "ac"))

  ; later characters that would match
  ; don't if there is no rule to match
  ; earlier characters.
  ;
  (test '(#f "da") (genturfahi-lerfu "da"))
  0)

(test-group "lerfu"
  (lerfu))

;;;; optional.scm

(use genturfahi)
(use test)

;;;
;;; optional: e?
;;;
;;; optional <- \#a?
;;;
(define (optional?)
  (set! genturfahi-optional
        (genturfahi* (nunjavni-? (nunjavni-lerfu #\a))))

  ; match the only character this parser matches.
  ;
  (test '(#\a "") (genturfahi-optional "a"))

  ; There is no rule to match any other character,
  ; but since this rule is optional, the empty string
  ; is returned.
  ;
  (test '("" "b") (genturfahi-optional "b"))
  (test '("" "c") (genturfahi-optional "c"))

  ; there is no rule for the emtpy string, but the
  ; optional rule matches anyway.
  ;
  (test '("" "") (genturfahi-optional ""))

  ; there is no rule for the end-of-file
  ; These match, but they don't parse the
  ; whole buffer!
  ;
  ; this rule won't even match two #\a,
  ; only zero or one.
  ;
  (test '(#\a "a") (genturfahi-optional "aa"))
  (test '(#\a "b") (genturfahi-optional "ab"))
  (test '(#\a "c") (genturfahi-optional "ac"))

  ; later characters that would match don't if there
  ; is no rule to match earlier characters.  We match
  ; zero #\a at the beginning of the parser input and
  ; return.
  ;
  (test '("" "da") (genturfahi-optional "da"))
  0)

(test-group "optional?"
  (optional?))

;;;; zero-or-more.scm

(use genturfahi)
(use test)

;;;
;;; zero-or-more: e*
;;;
;;; zero-or-more <- \#a*
;;;
(define (zero-or-more)
  (set! genturfahi-zero-or-more
        (genturfahi* (nunjavni-* (nunjavni-lerfu #\a))))

  ; match the only character this parser matches.
  ; notice that it returns a list of characters,
  ; even if it only matches a single character.
  ;
  (test '((#\a) "") (genturfahi-zero-or-more "a"))

  ; though the rule matches even when the character isn't present
  ; the empty list indicates no characters matched.
  ;
  (test '(() "") (genturfahi-zero-or-more ""))

  ; We match as many #\a as we can.
  ;
  (test '((#\a #\a) "") (genturfahi-zero-or-more "aa"))
  (test '((#\a #\a #\a) "") (genturfahi-zero-or-more "aaa"))

  ; There is no rule to match any other character,
  ; but since this rule matches zero-or-more, it
  ; succeeds in matching zero #\a.
  ;
  (test '(() "b") (genturfahi-zero-or-more "b"))
  (test '(() "c") (genturfahi-zero-or-more "c"))

  ; As well, it will match all #\a before ending
  ; the parse.  These match, but they don't parse
  ; the whole buffer!
  ;
  (test '((#\a) "b") (genturfahi-zero-or-more "ab"))
  (test '((#\a) "c") (genturfahi-zero-or-more "ac"))

  (test '((#\a #\a) "b") (genturfahi-zero-or-more "aab"))
  (test '((#\a #\a) "c") (genturfahi-zero-or-more "aac"))

  (test '((#\a #\a #\a) "b") (genturfahi-zero-or-more "aaab"))
  (test '((#\a #\a #\a) "c") (genturfahi-zero-or-more "aaac"))

  ; later characters that would match don't if there is no rule
  ; to match earlier characters.  This rule matches the empty
  ; list, but can't access the #\a later in the parse input.
  ;
  (test '(() "da") (genturfahi-zero-or-more "da"))
  (test '(() "daa") (genturfahi-zero-or-more "daa"))
  (test '(() "daaa") (genturfahi-zero-or-more "daaa"))
  0)

(test-group "zero-or-more"
  (zero-or-more))

;;;; one-or-more.scm

(use genturfahi)
(use test)

;;;
;;; one-or-more: e+
;;;
;;; one-or-more <- \#a+
;;;
(define (one-or-more)
  (set! genturfahi-one-or-more
        (genturfahi* (nunjavni-+ (nunjavni-lerfu #\a))))

  ; match the only character this parser matches.
  ; notice that it returns a list of characters,
  ; even if it only matches a single character.
  ;
  (test '((#\a) "") (genturfahi-one-or-more "a"))

  ; We match as many #\a as we can.
  ;
  (test '((#\a #\a) "") (genturfahi-one-or-more "aa"))
  (test '((#\a #\a #\a) "") (genturfahi-one-or-more "aaa"))

  ; but we don't match zero #\a.
  ;
  (test '(#f "") (genturfahi-one-or-more ""))
  (test '(#f "b") (genturfahi-one-or-more "b"))
  (test '(#f "c") (genturfahi-one-or-more "c"))

  ; As well, it will match all #\a before ending
  ; the parse.  These match, but they don't parse
  ; the whole buffer!
  ;
  (test '((#\a) "b") (genturfahi-one-or-more "ab"))
  (test '((#\a) "c") (genturfahi-one-or-more "ac"))

  (test '((#\a #\a) "b") (genturfahi-one-or-more "aab"))
  (test '((#\a #\a) "c") (genturfahi-one-or-more "aac"))

  (test '((#\a #\a #\a) "b") (genturfahi-one-or-more "aaab"))
  (test '((#\a #\a #\a) "c") (genturfahi-one-or-more "aaac"))

  ; later characters that would match don't if there is no rule
  ; to match earlier characters.  These rules don't match, as
  ; the input does not begin with #\a.
  ;
  (test '(#f "da") (genturfahi-one-or-more "da"))
  (test '(#f "daa") (genturfahi-one-or-more "daa"))
  (test '(#f "daaa") (genturfahi-one-or-more "daaa"))
  0)

(test-group "one-or-more"
  (one-or-more))

;;;; je.scm

(use genturfahi)
(use test)

;;;
;;; ordered sequence: e_1 e_2
;;;
;;; je <- \#a \#b \#c
;;;
(define (je)
  (set! genturfahi-je
        (genturfahi* (nunjavni-je (nunjavni-lerfu #\a)
                                  (nunjavni-lerfu #\b)
                                  (nunjavni-lerfu #\c))))

  ; matches each lerfu in sequence 
  ;
  (test '((#\a #\b #\c) "") (genturfahi-je "abc"))

  ; matches the first rule (#\a), but not the second or
  ; third (#\b and #\c), so this match fails.
  ;
  (test '(#f "a") (genturfahi-je "a"))
  (test '(#f "ac") (genturfahi-je "ac"))

  ; matches the first two rules (#\a and #\b), but not
  ; the third one.
  ;
  (test '(#f "ab") (genturfahi-je "ab"))
  (test '(#f "abd") (genturfahi-je "abd"))

  ; "abc" is the only correct sequence.  Every other
  ; combination of #\a, #\b, and #\c does not match.
  ;
  (test '(#f "acb") (genturfahi-je "acb"))
  (test '(#f "bac") (genturfahi-je "bac"))
  (test '(#f "bca") (genturfahi-je "bca"))
  (test '(#f "cab") (genturfahi-je "cab"))
  (test '(#f "cba") (genturfahi-je "cba"))

  ; there is no rule for #\d, no match
  ;
  (test '(#f "d") (genturfahi-je "d"))

  ; there is no rule for the emtpy
  ; string, no match
  ;
  (test '(#f "") (genturfahi-je ""))

  ; there is no rule for the end-of-file
  ; These match, but they don't parse the
  ; whole buffer!
  ;
  (test '((#\a #\b #\c) "a") (genturfahi-je "abca"))
  (test '((#\a #\b #\c) "b") (genturfahi-je "abcb"))
  (test '((#\a #\b #\c) "c") (genturfahi-je "abcc"))

  ; later characters that would match
  ; don't if there is no rule to match
  ; earlier characters.
  ;
  (test '(#f "dabc") (genturfahi-je "dabc"))
  0)

(test-group "je"
  (je))

;;;; jonai.scm

(use genturfahi)
(use test)

;;;
;;; ordered-choice: e_1 / e_2
;;;
;;; jonai <- \#a / \#b / \#c
;;;
(define (jonai)
  (set! genturfahi-jonai
        (genturfahi* (nunjavni-jonai (nunjavni-lerfu #\a)
                                     (nunjavni-lerfu #\b)
                                     (nunjavni-lerfu #\c))))

  ; matches (nunjavni-jonai (nunjavni-lerfu #\a) ...)
  ;
  (test '(#\a "") (genturfahi-jonai "a"))

  ; matches (nunjavni-jonai ... (nunjavni-lerfu #\b) ...)
  ;
  (test '(#\b "") (genturfahi-jonai "b"))

  ; matches (nunjavni-jonai ... (nunjavni-lerfu #\c))
  ;
  (test '(#\c "") (genturfahi-jonai "c"))

  ; there is no rule for #\d, no match
  ;
  (test '(#f "d") (genturfahi-jonai "d"))

  ; there is no rule for the emtpy
  ; string, no match
  ;
  (test '(#f "") (genturfahi-jonai ""))

  ; there is no rule for the end-of-file
  ; These match, but they don't parse the
  ; whole buffer!
  ;
  (test '(#\a "a") (genturfahi-jonai "aa"))
  (test '(#\a "b") (genturfahi-jonai "ab"))
  (test '(#\a "c") (genturfahi-jonai "ac"))

  ; later characters that would match
  ; don't if there is no rule to match
  ; earlier characters.
  ;
  ; none of these match.
  ;
  (test '(#f "da") (genturfahi-jonai "da"))
  (test '(#f "db") (genturfahi-jonai "db"))
  (test '(#f "dc") (genturfahi-jonai "dc"))
  0)

(test-group "jonai"
  (jonai))

;;;; and-predicate.scm

(use genturfahi)
(use test)

;;;
;;; and-predicate: &e
;;;
;;; and-predicate <- &\#a
;;;
(define (and-predicate)
  (set! genturfahi-and-predicate
        (genturfahi* (nunjavni-& (nunjavni-lerfu #\a))))

  ; the and-predicate matches as normal, but does not advance
  ; the input.
  ;
  (test '(#\a "a") (genturfahi-and-predicate "a"))

  ; It behaves like all other rules when there is no match,
  ; indicating no match.
  ;
  (test '(#f "b") (genturfahi-and-predicate "b"))
  (test '(#f "c") (genturfahi-and-predicate "c"))

  ; there is no rule for the emtpy
  ; string, no match
  ;
  (test '(#f "") (genturfahi-and-predicate ""))

  ; there is no rule for the end-of-file
  ; These match, but they don't parse the
  ; whole buffer!
  ;
  ; this rule won't even match two #\a,
  ; only one.
  ;
  (test '(#\a "aa") (genturfahi-and-predicate "aa"))
  (test '(#\a "ab") (genturfahi-and-predicate "ab"))
  (test '(#\a "ac") (genturfahi-and-predicate "ac"))

  ; later characters that would match
  ; don't if there is no rule to match
  ; earlier characters.
  ;
  (test '(#f "da") (genturfahi-and-predicate "da"))
  0)

(test-group "and-predicate"
  (and-predicate))

;;;; not-predicate.scm

(use genturfahi)
(use test)

;;;
;;; not-predicate: !e
;;;
;;; not-predicate <- !\#a
;;;
(define (not-predicate)
  (set! genturfahi-not-predicate
        (genturfahi* (nunjavni-! (nunjavni-lerfu #\a))))

  ; this not-predicate matches everything *but* \#a.
  ; not-predicate never advances the input.
  ;
  ; on match, we return the empty string.
  ;
  (test '(#f "a") (genturfahi-not-predicate "a"))
  (test '("" "b") (genturfahi-not-predicate "b"))
  (test '("" "c") (genturfahi-not-predicate "c"))

  ; we match the empty string (because the empty string
  ; isn't #\a.)
  ;
  (test '("" "") (genturfahi-not-predicate ""))

  ; there is no rule for the end-of-file
  ; These don't parse the whole buffer!
  ;
  (test '(#f "aa") (genturfahi-not-predicate "aa"))
  (test '(#f "ab") (genturfahi-not-predicate "ab"))
  (test '(#f "ac") (genturfahi-not-predicate "ac"))
  (test '("" "ba") (genturfahi-not-predicate "ba"))
  (test '("" "bb") (genturfahi-not-predicate "bb"))
  (test '("" "bc") (genturfahi-not-predicate "bc"))
  (test '("" "ca") (genturfahi-not-predicate "ca"))
  (test '("" "cb") (genturfahi-not-predicate "cb"))
  (test '("" "cc") (genturfahi-not-predicate "cc"))
  0)

(test-group "not-predicate"
  (not-predicate))

;;;; empty-string.scm

(use genturfahi)
(use test)

;;;
;;; empty-string:
;;;
;;; empty-string <- ""
;;;
(define (empty-string)
  (set! genturfahi-empty-string (genturfahi* (nunjavni-e)))

  ; the empty string is zero characters long, so
  ; it always matches.
  ;
  (test '("" "") (genturfahi-empty-string ""))
  (test '("" "a") (genturfahi-empty-string "a"))
  (test '("" "b") (genturfahi-empty-string "b"))
  (test '("" "c") (genturfahi-empty-string "c"))
  (test '("" "abc") (genturfahi-empty-string "abc"))
  0)

(test-group "empty-string"
  (empty-string))

;;;; end-of-input.scm

(use genturfahi)
(use test)

;;;
;;; end-of-input:
;;;
;;; end-of-input <- [eof]
;;;
(define (end-of-input)
  (set! genturfahi-eof (genturfahi* (nunjavni-fanmo)))

  ; The end of file only matches when we're at the end of
  ; the parse input.
  ;
  (test '(#\nul "") (genturfahi-eof ""))

  ; it doesn't match if here are any characters left.
  ;
  (test '(#f "a") (genturfahi-eof "a"))
  (test '(#f "b") (genturfahi-eof "b"))
  (test '(#f "c") (genturfahi-eof "c"))

  ; even if those remaining characters are the sentinel. 
  ;
  (test `(#f ,(make-string 1 #\nul)) (genturfahi-eof (make-string 1 #\nul)))
  0)

(test-group "end-of-input"
  (end-of-input))

;; jonai-naselci.scm: nonterminal symbols

(use genturfahi)
(use test)

;;;
;;; ordered choice with non-terminals
;;; this test is identical to the ordered choice file, though the
;;; grammar differs.
;;;
;;; jonai <- a / b / c
;;; a     <- #\a
;;; b     <- #\a
;;; c     <- #\a
;;;
(define (jonai-naselci)
  (set! genturfahi-jonai-naselci
    (letrec ((gerna (nunjavni-morji (nunjavni-jonai (nunjavni-naselci a)
                                                    (nunjavni-naselci b)
                                                    (nunjavni-naselci c))))
             (a (nunjavni-morji (nunjavni-lerfu #\a)))
             (b (nunjavni-morji (nunjavni-lerfu #\b)))
             (c (nunjavni-morji (nunjavni-lerfu #\c))))
      (genturfahi* gerna)))

  ; matches (nunjavni-jonai-naselci (nunjavni-lerfu #\a) ...)
  ;
  (test '(#\a "") (genturfahi-jonai-naselci "a"))

  ; matches (nunjavni-jonai-naselci ... (nunjavni-lerfu #\b) ...)
  ;
  (test '(#\b "") (genturfahi-jonai-naselci "b"))

  ; matches (nunjavni-jonai-naselci ... (nunjavni-lerfu #\c))
  ;
  (test '(#\c "") (genturfahi-jonai-naselci "c"))

  ; there is no rule for #\d, no match
  ;
  (test '(#f "d") (genturfahi-jonai-naselci "d"))

  ; there is no rule for the emtpy
  ; string, no match
  ;
  (test '(#f "") (genturfahi-jonai-naselci ""))

  ; there is no rule for the end-of-file
  ; These match, but they don't parse the
  ; whole buffer!
  ;
  (test '(#\a "a") (genturfahi-jonai-naselci "aa"))
  (test '(#\a "b") (genturfahi-jonai-naselci "ab"))
  (test '(#\a "c") (genturfahi-jonai-naselci "ac"))

  ; later characters that would match
  ; don't if there is no rule to match
  ; earlier characters.
  ;
  ; none of these match.
  ;
  (test '(#f "da") (genturfahi-jonai-naselci "da"))
  (test '(#f "db") (genturfahi-jonai-naselci "db"))
  (test '(#f "dc") (genturfahi-jonai-naselci "dc"))
  0)

(test-group "naselci"
  (jonai-naselci))

Executing Code

;; simple (no-op) example

(use genturfahi)

;; XXX: I haven't written this example yet!

;; samselpla.scm

(use genturfahi)
(use test)

;;;
;;; execute code on syntax tree.
;;;
;;; gerna <- a b c
;;; a     <- #\a
;;; b     <- #\b
;;; c     <- #\c
;;;
(define (samselpla)
  (set! genturfahi-samselpla
    (letrec ((gerna (nunjavni-morji
               ; concatenate the strings
               (nunjavni-samselpla
                 (lambda (#!key a b c) (string-append a b c))
                 (nunjavni-je
                   (nunjavni-cmene (nunjavni-naselci a) cmene: 'a:)
                   (nunjavni-cmene (nunjavni-naselci b) cmene: 'b:)
                   (nunjavni-cmene (nunjavni-naselci c) cmene: 'c:)))))
             (a (nunjavni-morji
               ; convert each character to a string.
               (nunjavni-samselpla
                 (lambda (#!key lerfu) (make-string 1 lerfu))
                 (nunjavni-lerfu #\a cmene: 'lerfu:))))
             (b (nunjavni-morji
               (nunjavni-samselpla
                 (lambda (#!key lerfu) (make-string 1 lerfu))
                 (nunjavni-lerfu #\b cmene: 'lerfu:))))
             (c (nunjavni-morji
               (nunjavni-samselpla
                 (lambda (#!key lerfu) (make-string 1 lerfu))
                 (nunjavni-lerfu #\c cmene: 'lerfu:)))))
      (genturfahi* gerna)))

  ; The code runs only after a a syntax tree is successfully
  ; generated.
  ;
  (test '("abc" "") (genturfahi-samselpla "abc"))

  ; It doesn't run at all if we fail to parse.
  ;
  (test '(#f "d") (genturfahi-samselpla "d"))

  ; Even if the parse partially matches before failing.
  ;
  (test '(#f "a") (genturfahi-samselpla "a"))
  (test '(#f "ab") (genturfahi-samselpla "ab"))

  ; But if we don't parse all of the input, that
  ; is a valid parse.
  ;
  (test '("abc" "a") (genturfahi-samselpla "abca"))
  (test '("abc" "b") (genturfahi-samselpla "abcb"))
  (test '("abc" "c") (genturfahi-samselpla "abcc"))
  0)

(test-group "samselpla"
  (samselpla))

Note that a tag should be placed outside of a nunjavni-jonai. If a tag is placed on a javni that is passed to a nunjavni-jonai, that same tag must be placed on every javni that is passed to that nunjavni-jonai.

The following example is undefined:

;; undefined interaction between nunjavni-jonai nunjavni-cmene

(use genturfahi)

;; XXX: This example isn't written yet!

And should instead be written as:

;; defined interaction between nunjavni-jonai nunjavni-cmene

(use genturfahi)

;; XXX: This example isn't written yet!

A Full Example

Here is a complete example for using this API to construct a parser. It is the same grammar presented in Chicken Gazette #5, which documents the packrat parser egg.

;; mex.scm

(use genturfahi)
(use test)

;;;
;;; This example appeared in chicken gazette #5 as the example for
;;; the packrat parser module:
;;;
;;;  http://gazette.call-cc.org/issues/5.html
;;;
;;; It is here ported to genturfa'i.
;;;
(define (mex)
  (let ((mex
    (letrec
      ((expr   (nunjavni-morji
                 (nunjavni-jonai
                   (nunjavni-samselpla
                     (lambda (#!key a b) (+ a b))
                     (nunjavni-je
                       (nunjavni-cmene (nunjavni-naselci mulexp) cmene: 'a:)
                       (nunjavni-lerfu #\+)
                       (nunjavni-cmene (nunjavni-naselci mulexp) cmene: 'b:)))
                   (nunjavni-naselci mulexp))))
       (mulexp (nunjavni-morji
                 (nunjavni-jonai
                   (nunjavni-samselpla
                     (lambda (#!key a b) (* a b))
                     (nunjavni-je
                       (nunjavni-cmene (nunjavni-naselci simple) cmene: 'a:)
                       (nunjavni-lerfu #\*)
                       (nunjavni-cmene (nunjavni-naselci simple) cmene: 'b:)))
                   (nunjavni-naselci simple))))
       (simple (nunjavni-morji
                 (nunjavni-jonai
                   (nunjavni-naselci num)
                   (nunjavni-samselpla
                     (lambda (#!key a) a)
                     (nunjavni-je
                       (nunjavni-lerfu #\()
                       (nunjavni-cmene (nunjavni-naselci expr) cmene: 'a:)
                       (nunjavni-lerfu #\)))))))
       (num    (nunjavni-morji
                 (nunjavni-samselpla
                   (lambda (#!key x) (string->number (list->string x)))
                   (nunjavni-+ (nunjavni-naselci digit) cmene: 'x:))))
       (digit  (nunjavni-morji
                 (nunjavni-jonai
                   (nunjavni-lerfu #\0)
                   (nunjavni-lerfu #\1)
                   (nunjavni-lerfu #\2)
                   (nunjavni-lerfu #\3)
                   (nunjavni-lerfu #\4)
                   (nunjavni-lerfu #\5)
                   (nunjavni-lerfu #\6)
                   (nunjavni-lerfu #\7)
                   (nunjavni-lerfu #\8)
                   (nunjavni-lerfu #\9)))))
      (genturfahi expr))))

    (test 2  (mex "2"))
    (test 22 (mex "22"))
    (test 4  (mex "2*2"))
    (test 4  (mex "2+2"))
    (test 16 (mex "2+2*7"))
    (test 28 (mex "(2+2)*7"))
    (test 42 (mex "3*4+5*6")))
    0)

(test-group "mex"
  (mex))

Source Code

The source code for this egg can be viewed by version:

trunk

A copy of this project is also hosted on github:

genturfahi

Salmonella Report

The Salmonella Report shows the result from the current nightly build.

genturfa'i

History

Brian Ford coined the modern terms PEGs and packrat parsing, which was the subject of his Master's Thesis. Much of their formal theory existed earlier, the technique having been discovered by Alexander Birman in 1970. More recent work on the topic has all been done by others. His Master's Thesis is an accessible introduction to the topic of packrat parsing.

Tony Garnock-Jones wrote a packrat parser for scheme in 2005. He later packaged this work with a json parser for mzscheme and published it. Tony has not updated his parser since 2006.

Tony's packrat parser was ported to Chicken Scheme by Christian Kellermann and Felix Winkelmann in 2009. In 2010, it was featured in Chicken Gazette 5.

Alan Post began using the packrat parser after it was introduced in Chicken Gazette #5 and encountered problems translating the Lojban grammar. Some of these problems were anticipated by Tony Garnock-Jones:

"...whereas an extensible parser would require some kind of reified representation of the composite parser, along the lines of

`((expr   (or (seq (num + num) ,(lambda (a _ b) (+ a b)))
              (seq (mulexp)    ,(lambda (a) a))))
  (mulexp (or (seq (num * num) ,(lambda (a _ b) (* a b))))))

which could then be interpreted, in order to parse some input, or stepped through, in order to debug a parser or parser extension, or used in various diagnostic printouts to help pinpoint errors in parsed text or parsing extension grammars."

genturfahi was written to address these shortcomings in Tony's packrat parser, and to continue the work of creating a full PEG parser in Scheme, capably of parsing a Lojban's grammar.

Wish List

Implement regular expression terminal for matching more than a single character out of the parse input.
Implement a PEG parser so grammar can be written in PEG format rather than being specified directly in Scheme.
Implement a configuration component for modifying the configuration of the parser.
Implement a debug option so the parse can be traced.
Implement a profiling option so grammar hotspots can be identified.

License

Copyright (c) 2010 ".alyn.post." <alyn.post@lodockikumazvati.org>
 
Permission to use, copy, modify, and/or distribute this software for any
purpose with or without fee is hereby granted, provided that the above
copyright notice and this permission notice appear in all copies.

THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.

Version History

0.0.1: Initial release.

genturfa'i

Description

Authors

Requirements

Documentation

Usage

A Note About Lojban

English translation of Lojban words

Writing A Parser

Configuration

Debugging

Profiling

Memoization

Changing the sentinel character

Changing the empty string

Changing non-matching parse result

API Documentation

lerfu-porsi

javni-valsi

nunvalsi

nunjavni

javni

mapti .ijo namapti

genturfahi

Examples

Operators

Executing Code

A Full Example

Source Code

Salmonella Report

History

Wish List

See Also

License

Version History