Editing page: Outdated egg! - The CHICKEN Scheme wiki

You can edit this page using wiki syntax for markup.

Article contents:

== Outdated egg!

This is an egg for CHICKEN 4, the unsupported old release.  You're almost certainly looking for [[/eggref/5/genturfahi|the CHICKEN 5 version of this egg]], if it exists.

If it does not exist, there may be equivalent functionality provided by another egg; have a look at the [[https://wiki.call-cc.org/chicken-projects/egg-index-5.html|egg index]]. Otherwise, please consider porting this egg to the current version of CHICKEN.

[[tags:egg]]

== genturfa'i

.i lo la .ckim. ke pe'a jajgau ratcu ke'e genturfa'i

[[toc:]]

=== Description

genturfa'i is a Scheme packrat parser.

Packrat parsing is a [[http://en.wikipedia.org/wiki/Memoization|memoizing]],
[[http://en.wikipedia.org/wiki/Backtracking|backtracking]],
[[http://en.wikipedia.org/wiki/Recursive_descent_parser|recursive
descent]] parsing technique that runs in
[[http://en.wikipedia.org/wiki/Big_O_notation|time and space linear to
the size of the input text]] O(n).

Packrat parsers require more memory than traditional LL(k) parsers,
but are more expressive.  They permit full lookahead and
backtracking.  Unlike traditional LL(k) parsers, which have a separate
[[http://en.wikipedia.org/wiki/Tokenization|tokenization]] step,
Packrat parsers combine lexical analysis and grammar parsing into a
single step, using the same language for both.

The word genturfa'i is a [[http://lojban.org/|Lojban]] lujvo, consisting
of the gismu gerna, stura, and facki.  It translates to  "grammar structure
discover," or "parser."

=== Authors

[[/users/alyn.post|.alyn.post.]].  Inspired by the [[packrat]] parser
authored by Tony Garnock-Jones.

=== Requirements

Requires srfi-1 (list library), srfi-6 (string ports), srfi-9 (records),
srfi-13 (string libraries), srfi-14 (character-set library), srfi-28
(basic string formatting), srfi-37 (args-fold), srfi-39 (parameter
objects), and srfi-69 (basic hash tables).

As well as the srfi above, genturfa'i requires the [[matchable]] and
[[sandbox]] eggs.

Several list-handling procedures from srfi-1 are used.

srfi-6 (with Chicken Scheme's extensions) is used for reading parse
input from a port.  It can be removed if you do not need to parse input
from a port.

srfi-9 is used to construct records with no more than two elements.
These calls could be replaced by {{cons}} if you wish to remove this
dependency.

srfi-13 is required only to return the remaining characters in the
input buffer after a successful parse.  If your grammar manually
checks for the end of the input, this dependency could be removed.

srfi-28 is used to format the module version number and for debugging
support.  It could be excluded with minor modification to the code.

srfi-37 is used to process command-line arguments.

srfi-39 is used to process command-line arguments and to dynamically
configure the parser.

srfi-69 is used in the memoization code.

matchable is used by the compiler and runtime system to generate a
parser.

srfi-1, srfi-6, srfi-9, srfi-13, srfi-28, and srfi-39 are built into
Chicken Scheme.

The srfi-37, srfi-69, matchable and sandbox eggs must be installed.

genturfa'i uses DSSSL style lambda lists, which are not standard
R5RS Scheme.  DSSSL style lambda lists are build into Chicken.

=== Documentation

==== Usage

With genturfa'i, A parser is expressed using a parsing expression
grammar, or PEG.  This language describes the grammar that
genturfa'i is parsing.  It is also possible to construct a grammar
directly in Scheme.

This module implements two methods for writing parsers.  The
recommended method is to write your grammar in  
[[http://en.wikipedia.org/wiki/Parsing_expression_grammar|PEG]]
and compile the grammar to Scheme code.

Your grammar can also be written directly in Scheme, using the
primitive grammar construction operators provided by {{genturfahi}}.
These same operators are used by the compiler to generate scheme
code, so there is no reason to use this interface directly.

==== A Note About Lojban

[[http://lojban.org/|Lojban]] is a carefully constructed spoken language
with an ''unambiguous grammar''.

I've written this parser with the intention of parsing
[[http://www.lojban.org/tiki/PEG|Lojban's PEG grammar]]
as expressed in the
[[http://bugs.call-cc.org/browser/release/4/jbogenturfahi/trunk/gerna.peg|grammar definition]] and
[[http://bugs.call-cc.org/browser/release/4/jbogenturfahi/trunk/rafske.peg|morphology]].

Just like the name of this module, the routines in this module are
named with Lojban words.  If you are unfamiliar with Lojban, a
translation of the words that appear in the API is provided below.

You do not need to understand Lojban in order to use genturfa'i.
The compiler translates your PEG grammar to Scheme using an
identifier you choose.

==== English translation of Lojban words

This table provides an english glass for each Lojban word that
appears in the API documentation.  This table is not intended to
have any truth value outside of this document.  For formal
English glosses of these words, please consult the
[[http://jbovlaste.lojban.org/|Lojban to English Translation Dictionary]].

<table>
<tr><th>Lojban word</th><th>English gloss</th></tr>
<tr><td>cfari</td><td>begin</td></tr>
<tr><td>cmene, cme</td><td>name</td></tr>
<tr><td>fanmo</td><td>end</td></tr>
<tr><td>genturfa'i</td><td>parse</td></tr>
<tr><td>jalge</td><td>result</td></tr>
<tr><td>javni</td><td>rule</td></tr>
<tr><td>je</td><td>logical and</td></tr>
<tr><td>jonai</td><td>logical exclusive or</td></tr>
<tr><td>lerfu</td><td>character</td></tr>
<tr><td>mapti</td><td>match</td></tr>
<tr><td>midju</td><td>middle</td></tr>
<tr><td>morji</td><td>memoize</td></tr>
<tr><td>namapti</td><td>nonmatch</td></tr>
<tr><td>naselci</td><td>nonterminal</td></tr>
<tr><td>nilcla</td><td>length</td></tr>
<tr><td>nunjavni</td><td>rule generator</td></tr>
<tr><td>nunvalsi</td><td>token generator</td></tr>
<tr><td>pabalvi</td><td>next</td></tr>
<tr><td>porsi, poi</td><td>ordered sequence</td></tr>
<tr><td>rodacmene</td><td>names</td></tr>
<tr><td>rodajavni</td><td>rules</td></tr>
<tr><td>rodajalga</td><td>results</td></tr>
<tr><td>rodanunjavni</td><td>rules generator</td></tr>
<tr><td>rodanunjalga</td><td>results generator</td></tr>
<tr><td>rodavalsi</td><td>tokens</td></tr>
<tr><td>secuxna</td><td>option</td></tr>
<tr><td>sezvati, zva</td><td>location</td></tr>
<tr><td>samselpla</td><td>code</td></tr>
<tr><td>sumti</td><td>function argument</td></tr>
<tr><td>tolmohi</td><td>clear memoization</td></tr>
<tr><td>valsi, val</td><td>token</td></tr>
</table>

==== Writing A Parser

Parsing is the process of converting an input stream into a symbolic
expression.  A language's grammar describes the way in which an
input stream is unambiguously translated into a symbolic expression.

genturfa'i uses a language called PEG to describe a grammar.  PEG is
particularly expressive, permitting full lookahead and backtracking.
It can parse grammars that other grammar languages (like EBNF) cannot,
at the expense of requiring more memory to parse the same size input.

==== Configuration

Configuration options may appear at the top of your PEG file.  They
must be specified before any productions.

===== Grammar Name

By default, the grammar described by your PEG file is compiled to
scheme and bound to a variable named "gerna".  To change the name
of this variable:

<enscript highlight=scheme>
{(define-name "variable-name")}
</enscript>

If you have more than one start production (see below) you must
define a variable name for each production:

<enscript highlight=scheme>
{(define-name '("variable-name-0" "variable-name-1"))}
</enscript>

===== Start Production

By default, the first production in the PEG file will be used to
begin parsing.  If another production should be used instead, it
must be specified:

<enscript highlight=scheme>
{(start-production "production-name")}
</enscript>

The top-level variable name described in {{define-name}} name will be
bound to the production.  All remaining productions will not be available
outside of the parser.  If more that one production should be publically
available, they may be specified as a list:

<enscript highlight=scheme>
{(start-production '("production-name-0" "production-name-1"))}
</enscript>

Each start production is bound the the variable name in the equivalent
position in {{define-name}}.

===== Debugging

''''Debugging is not yet tested.''''

Debugging is used to see what rules are being called as the parser
tries to parse the input.

<enscript highlight=scheme>
;;
;; enable debug output
;;
{(debug #t)}
</enscript>

When debugging is enabled, a trace of the parse is written to the
file "grammar-debug.scm".  This file is a regular Scheme file, and
can be loaded and traversed like any other symbolic expression.

To change the filename used to write debug information:

<enscript highlight=scheme>
;;
;; change the filename used to write debug output.
;;
{(debug-file "grammar-debug.scm")}
</enscript>

===== Profiling

''''Profiling is not yet tested.''''

Profiling is used to determine how much time is being spent in each
grammar rule.  Profiling information is collect in two ways.  By
production and by operator.  A production is the name of a series
of rules (the left hand side of an '<-' expression), whereas
operators are things like '*', '+', and '?'.

<enscript highlight=scheme>
;;
;; enable profiling
;;
{(profile #t)}
</enscript>

When profiling is enabled, the time spent in each routine is written
to the file "grammar-profile.scm".  This file is a regular Scheme
file, and can be loaded and traversed like any other symbolic
expression.

To change the filename used to write profile information:

<enscript highlight=scheme>
;;
;; change the filename used to write profile output.
;;
{(profile-file "grammar-profile.scm")}
</enscript>

===== Memoization

By default, memoization is enabled to ensure that the parser runs in
linear space and time.  For some grammars, memoization does not
provide any performance benefit, while the extra cache lookup
slows the parser down.

Memoization can be turned completely off:

<enscript highlight=scheme>
;;
;; turn memoization completely off.
;;
{(secuxna no-memoize #t)}
</enscript>

Or it can be turned off for specific non-terminal rules:

<enscript highlight=scheme>
;;
;; turn memoization off for non-terminal rule
;; 'rule-0' and 'rule-1'.  This interface also
;; allows a single rule to be expressed as a
;; string.
;;
{(secuxna memoize '("rule-0" "rule-1"))}
</enscript>

Simple rules, those which require less time to check than the time
to look up the result of the check, automatically turn memoization
off.

===== Changing the sentinel character

A sentinel character is placed at the end of the parse input to
indicate the end of input.  This character cannot be a valid
terminal character in the grammar.  Placing a sentinal character at
the end of the parse input is an optimization that allows the terminal
matching rules to avoid checking for the end of the parse input
when they try to match the parse input.  As these terminal rules run
frequently, removing this check can speed up parsing.

This is also the character that is returned when the end of input is
explicitely matched.

By default, {{#\nul}} is used as a sentinal character in the parse
input to indicate an end of input.  If this character appears in
your grammar, you'll need to change the sentinal character to
something else:

<enscript highlight=scheme>
;;
;; change the sentinal character used to
;; detect the end of the parse input.
;;
{(sentinal #\nul)} ; default
</enscript>

===== Changing the '?' operator's default token

The '?' operator is used to optionally match a rule.  By default,
when the rule is not matched, '?' returns the empty string {{""}}.

To change the token returned when '?' does not match it's rule:

<enscript highlight=scheme>
;;
;; change the token returned when the '?'
;; operator does not match it's rule.
;;
{(?-default "")} ; default
</enscript>

The special value {{ignore}} can be passed to this routine, which
indicates that in the event of the '?' operator not matching it's
rule, the parse tree should not be modified:

<enscript highlight=scheme>
;;
;; Don't modify the parse tree when a '?'
;; operator does not match it's rule.
;;
{(?-default 'ignore)}
</enscript>

===== Changing the '*' operator's default token

The '*' operator is used to match a rule zero or more times.  By
default, when the rule is matched zero times, '*' returns the empty
list {{'()}}.

To change the token returned when '*' matches it's rule zero times:

<enscript highlight=scheme>
;;
;; change the token returned when the '*'
;; operator matches it's rule zero times.
;;
{(*-default '())} ; default
</enscript>

The special value {{ignore}} can be passed to this routine, which
indicates that in the event of the '*' operator matching it's rule
zero times, the parse tree should not be modified:

<enscript highlight=scheme>
;;
;; Don't modify the parse tree when a '*'
;; operator matches it's rule zero times.
;;
{(*-default 'ignore)}
</enscript>

===== Changing the empty string

The empty string is a parse rule matching zero characters.  The
{{""}} rule returns the empty string.  By default, the token generated
by this rule is the zero length string {{""}}.  To change the token
produced by the empty string:

<enscript highlight=scheme>
;;
;; change the token returned when we
;; match against the empty string.
;;
{(empty-string "")} ; default
</enscript>

===== Changing the empty list

The empty list is a parse rule matching zero characters.  The
{{()}} rule returns the empty list.  By default, the token generated
by this rule is the nul list {{()}}.  To change the token produced by
the empty list:

<enscript highlight=scheme>
;;
;; change the token returned when we
;; match against the empty list.
;;
{(empty-list '())} ; default
</enscript>

===== Changing non-matching parse result

When parse input does not match the grammar, a token is returned
indicated non-match.  By default, the non-matching token is {{#f}}.
To change the token returned when the parse input does not match the
grammar:

<enscript highlight=scheme>
;;
;; change the token returned when the parse
;; input does not match the grammar.
;;
{(nonmatch-token #f)} ; default
</enscript>

===== Defining all productions as top-level productions

For sufficiently large grammars, this variable should be set to
reduce the size of the generated file:

<enscript highlight=scheme>
;;
;; By default, this is #f.
;;
{(define-toplevel #t)}
</enscript>

Unless Chicken will not compile your grammar, you may ignore this
option.

=== API Documentation

This is the low-level interface for writing parsers directly in
Scheme.  This code is used by the compiler to generate your parser.
It is not designed to be used directly.

As this interface is written specifically to support the PEG parser
generator, it is expressive but not concise.  A more concise syntax
could be written on top of this API.  I do not plan on developing this
syntax, but  if you would like to write parsers directly in Scheme,
please feel free to add support for a more concise syntax.  See the
History section of this document for a starting idea.

This API defines the following components:

; lerfu-porsi : The parse input with an associated position.
; nunjavni :    A generator that creates grammar rules.
; javni :       A grammar rule that creates a token from the parse input and advances the input position.
; mapti :       A match continuation for a grammar rule.
; namapti :     A non-match continuation for a grammar rule.
; nunvalsi:     A generator that returns parse results.
; javni-valsi : The token generated by a grammar rule.
; samselpla :   Code that modifies the syntax tree generated by the parser.
; genturfahi :  The parser, containing all of the grammar rules.

The parser runs as follows:

A {{lerfu-porsi}} encapsulates the input to parse.  It tracks the current
position of the input based on the parse.

A {{nunjavni}} expects zero or more {{javni}} and generates a new {{javni}}.
The parser is constructed by linking together {{javni}} through
{{nunjavni}} generators.

A {{javni}} expects a {{lerfu-porsi}}, {{mapti}}, and {{namapti}}.
{{mapti}} and {{namapti}} are continuations representing the next
rule in the grammar.  The set of all {{javni}} describe the grammar to
parse.

Both {{mapti}} and {{namapti}} expect a {{lerfu-porsi}} (possibly
advanced by the previous {{javni}}).  {{mapti}} also expects a
{{nunvalsi}}, the generator for the parse tree.  {{mapti}} and
{{namapti}} are used for backtracking and general control flow during
the parse.

A {{nunvalsi}} generates either a {{javni-valsi}} or a list of
{{javni-valsi}}.  {{javni-valsi}} are not immediately generated, as
future backtracking could invalidate a currently matched result.
{{nunvalsi}} generate the parse tree resulting from the input when
called.

A {{javni-valsi}} is a name and associated token.  The token can be a
single character or the entire syntax tree.  names must be symbols,
and are used to associate a piece of the abstract syntax tree with
a variable name when executing {{samselpla}}.  The parser converts
the input in {{lerfu-porsi}} to {{javni-valsi}}.

{{samselpa}} is code embedded in the grammar.  It is executed after
a successful parse, modifying the syntax tree before it is returned
from the parser.  Often it is easier to work with the syntax tree
"in place" rather than working on the entire syntax tree at once.
{{samselpla}} is the mechanism for working with the syntax tree "in
place."

{{genturfahi}} expects a {{lerfu-porsi}} and {{javni}} containing
the starting grammar rule.  It provides the terminal {{mapti}}
and {{namapti}} continuations that generate the parse results by
generating a {{javni-valsi}} from the {{nunvalsi}} and returning
the parsed syntax tree.

Details on each component are below.

==== lerfu-porsi

<record>lerfu-porsi</record>

The string to parse and the current parse position of that string.
This structure is used by the parser to store the input to the
parser.

It contains the following members:

; {{zva}} : An integer which is the current index of {{poi}}.
; {{poi}} : The string which will be parsed.

<procedure>(make-lerfu-porsi zva poi) => lerfu-porsi</procedure>

; {{zva}} : An integer describing the current position of the parse input.
; {{poi}} : The parse input, as a specially formatted string.  (See {{make-lerfu-porsi-string}} and {{make-lerfu-porsi-port}}.)

Create a {{lerfu-porsi}} from the integer index {{zva}} and the parse
input {{poi}}.  {{poi}} is specially formatted, and must come from
calling {{make-lerfu-porsi-string}} or {{make-lerfu-porsi-port}}.  It
has an appended character acting as a "sentinel" for purposes of detecting
the end of the input buffer.

{{zva}} must be less than or equal to the {{string-length}} of {{poi}}.

<procedure>(make-lerfu-porsi-pabalvi-lerfu porsi) => lerfu-porsi</procedure>

; {{porsi}} : the parse position to advance.

Create a {{lerfu-porsi}} from an existing {{lerfu-porsi}} by
advancing the index {{zva}} by a single character.

It is an error if {{porsi}} is not of type {{lerfu-porsi}}.

Behavior is undefined if this procedure is called when
{{lerfu-porsi-fanmo?}} returns {{#t}} for this {{porsi}}.

<procedure>(make-lerfu-porsi-pabalvi-valsi porsi nilcla) => lerfu-porsi</procedure>

; {{porsi}} : the parse position to advance.
; {{nilcla}} : The number of characters the position will be advanced by.

Create a {{lerfu-porsi}} from an existing {{lerfu-porsi}} by
advancing the index {{zva}} by the length of {{lefpoi}}.

It is an error if {{porsi}} is not of type {{lerfu-porsi}}.

Behavior is undefined if this procedure is called with a
{{lefpoi}} whose length is greater than the number of characters in
remaining in the parse input.

<procedure>(lerfu-porsi? porsi) => boolean</procedure>

; {{porsi}} : the object to test.

Return {{#t}} if {{porsi}} is of type {{lerfu-porsi}} and {{#f}}
otherwise.

<procedure>(lerfu-porsi-zva porsi) => zva</procedure>

; {{porsi}} : A {{lerfu-porsi}}.

Return the {{zva}} member from {{porsi}}.

It is an error if {{porsi}} is not of type {{lerfu-porsi}}.

<procedure>(lerfu-porsi-poi porsi) => poi</procedure>

; {{porsi}} : A {{lerfu-porsi}}.

Return the {{poi}} member from {{porsi}}.

It is an error if {{porsi}} is not of type {{lerfu-porsi}}.

<procedure>(lerfu-porsi->string porsi) => string</procedure>

; {{porsi}} : A {{lerfu-porsi}}.

Return A string representation of {{porsi}}.  This string
representation is for debug use only, and cannot be converted back
to a {{lerfu-porsi}}.

It is an error if {{porsi}} is not of type {{lerfu-porsi}}.

<procedure>(make-lerfu-porsi-string string) => porsi</procedure>

; {{string}} : The input string to parse.

Return a {{lerfu-porsi}} with {{string}} as the parse input.
{{string}} is copied into the {{lerfu-porsi}}.

<procedure>(make-lerfu-porsi-port port) => porsi</procedure>

; {{port}} : The input port to parse.

Return a {{lerfu-porsi}} with {{port}} as the parse input.
{{port}} is read into a string before being copied into
the {{lerfu-porsi}}, as the parser may need to backtrack
to any previous location in the string.

<procedure>(lerfu-porsi-string porsi) => string</procedure>

; {{porsi}} : A {{lerfu-porsi}}.

Return the characters that have not been parsed from {{porsi}}.
This routine copies the characters from the buffer into a
new string.

If all of the characters were parsed from {{lerfu-porsi}},
return {{""}}.

<procedure>(lerfu-porsi-lerfu porsi) => character</procedure>

; {{porsi}} : A {{lerfu-porsi}}.

Return the character at the current parse position {{zva}} in the
parse input {{poi}}.

It is an error if {{porsi}} is not of type {{lerfu-porsi}}.

<procedure>(lerfu-porsi-fanmo? porsi) => boolean</procedure>

; {{porsi}} : A {{lerfu-porsi}}.

Return {{#t}} if this {{porsi}} has no more input to parse, {{#f}}
otherwise.

It is an error if {{porsi}} is not of type {{lerfu-porsi}}.

==== javni-valsi

<record>javni-valsi</record>

A {{javni-valsi}} is a token with an (optionally) associated name generated
from a {{javni}}, indicating a match of that {{javni}}.

It contains the following members:

; {{cme}} : The name associated with this token, or {{#f}} if there is no associated name.
; {{val}} : The token generated from matching a rule.

<procedure>(make-javni-valsi cme val) => javni-valsi</procedure>

; {{cme}} : The name associated with this token, or {{#f}} if there is no associated name.
; {{val}} : The token generated from matching a rule.

Create a {{javni-valsi}} from the name {{cme}} and the token
{{val}}.  If this {{javni-valsi}} has no associated {{cmene}},
{{cme}} can be {{#f}}.

{{val}} may be an object of any type, and is the current abstract syntax
tree generated from this parse, as modified by {{samselpla}}
executed after the parse succeeds.

<procedure>(javni-valsi? valsi) => boolean</procedure>

; {{valsi}} : the object to test.

Return {{#t}} if {{valsi}} is of type {{javni-valsi}} and {{#f}}
otherwise.

<procedure>(javni-valsi-cme valsi) => cmene</procedure>

; {{valsi}} : A {{javni-valsi}}.

Return the {{cmene}} member from {{valsi}}.

It is an error if {{valsi}} is not of type {{javni-valsi}}.

<procedure>(javni-valsi-val valsi) => val</procedure>

; {{valsi}} : A {{javni-valsi}}.

Return the {{val}} member from {{valsi}}.

It is an error if {{valsi}} is not of type {{javni-valsi}}.

<procedure>(javni-nunvalsi-val nunvalsi) => val / (val ...)</procedure>

; {{nunvalsi}} : A {{javni-valsi}} generator.

Generate the {{javni-valsi}} from {{nunvalsi}}.  If the
result is a single {{javni-valsi}}, return the {{val}} member
of that result.  Otherwise, treat the result as a tree 
containing {{javnli-valsi}} and recursively retrieve the {{val}}
member from each {{javni-valsi}} node in the tree.

It is an error if {{valsi}} is not (lists of) of type {{javni-valsi}}.

<procedure>(javni-rodavalsi-val rodavalsi) => val / (val ...)</procedure>

; {{rodavalsi}} : A {{javni-valsi}}.

This helper routine for {{javni-nunvalsi-val}} returns the {{val}}
member of {{rodavalsi}} if {{rodavalsi}} is a {{javni-valsi}},
calling itself recursively every time {{rodavalsi}} is a list.

It is an error if {{valsi}} is not of type {{javni-valsi}}.

<procedure>(javni-valsi->string valsi) => string</procedure>

; {{valsi}} : A {{javni-valsi}}.

Return A string representation of {{valsi}}.  This string
representation is for debug use only, and cannot be converted back
to a {{javni-valsi}}.

It is an error if {{valsi}} is not of type {{javni-valsi}}.

==== nunvalsi

A routine that generates a {{valsi}}.

<procedure>(venunjmina-nunvalsi cmene) => nunvalsi</procedure>

; {{cmene}} : The name to associated with the {{nunvalsi}}

If {{cmene}} is not {{#f}}, return a procedure that, when called,
will produce a {{javni-valsi}} with a {{cme}} of {{cmene}} and a
{{val}} resulting from calling the passed in {{nunvalsi}}.

This procedure is used by the ordered choice and optional {{javni}}
to generate new {{nunvalsi}} when those rules have an associated
cmene.

<procedure>(vejmina-nunvalsi cmene nunvalsi) => nunvalsi</procedure>

; {{cmene}} : The name to associate with the newly generated {{nunvalsi}}.
; {{nunvalsi}} : A {{javni-valsi}} generator.

Create a new {{javni-valsi}} with a {{cme}} of {{cmene}} and a
{{val}} from the {{val}} of generating a {{valsi}} from {{nunvalsi}}.

This routine is is generated by {{venunjmina-nunvalsi}} when the
caller provides a {{cmene}}.

<procedure>(vejmina-nunvalsi-nacmene nunvalsi) => nunvalsi</procedure>

; {{nunvalsi}} : A {{javni-valsi}} generator.

Return {{nunvalsi}}.  This routine is a no-op generated by
{{venunjmina-nunvalsi}} when the caller has not provided a
{{cmene}}.

<procedure>(venunjmina-rodanunvalsi cmene) => nunvalsi</procedure>

; {{cmene}} : The name to associated with the {{nunvalsi}}

If {{cmene}} is not {{#f}}, return a procedure that, when called,
will produce a {{javni-valsi}} with a {{cme}} of {{cmene}} and a
{{val}} resulting from calling the passed in {{nunvalsi}}.

This procedure is used by the zero or more and sequence {{javni}}
to generate new {{nunvalsi}} when those rules have an associated
cmene.

<procedure>(vejmina-rodanunvalsi cmene . rodanunvalsi) => nunvalsi</procedure>

; {{cmene}} : The name to associate with the newly generated {{nunvalsi}}.
; {{rodanunvalsi}} : A list of {{nunvalsi}}.

Create a new {{javni-valsi}} with a {{cme}} of {{cmene}} and a
{{val}} being a list of the {{val}} from the {{valsi}} generated
from each {{nunvalsi}} in {{rodanunvalsi}}.

This routine is is generated by {{venunjmina-rodanunvalsi}} when the
caller provides a {{cmene}}.

<procedure>(vejmina-rodanunvalsi-nacmene . rodanunvalsi) => nunvalsi</procedure>

; {{rodanunvalsi}} : A list of {{nunvalsi}}.

Create a new {{javni-valsi}} with a {{cme}} of {{#f}} and a {{val}}
being a list of the {{val}} from the {{valsi}} generated from each
{{nunvalsi}} in {{rodanunvalsi}}.

This routine is generated by {{venunjmina-rodanunvalsi}} when the
caller does not provide a {{cmene}}.

==== nunjavni

A routine that generates a {{javni}}.

<procedure>(nunjavni-lerfu lerfu #!key cmene) => javni</procedure>

; {{cmene}} : The optional name for the {{javni-valsi}} generated by successfully matching {{lerfu}}.
; {{lerfu}} : The character to match.

Generate a {{javni}} that matches a single {{lerfu}}.

{{nunjavni-lerfu}} corresponds to a nonterminal symbol in PEG
syntax.

<procedure>(nunjavni-. #!key cmene) => javni</procedure>

; {{cmene}} : The optional name for the {{javni-valsi}} generated by successfully matching the next character.

Generate a {{javni}} that matches the next character in the parse
input, regardless of what that character is.  Call {{mapti}} with
the matched character.  If there are no characters remaining in the
parse input, call {{namapti}}.

<procedure>(nunjavni-e #!key cmene) => javni</procedure>

; {{cmene}} : The optional name for the {{javni-valsi}} generated by this rule.

Generate a {{javni}} that matches the empty string.

{{nunjavni-fanmo}} corresponds to the empty string expression in PEG
syntax.

<procedure>(nunjavni-fanmo #!key cmene) => javni</procedure>

; {{cmene}} : The optional name for the {{javni-valsi}} generated by successfully matching the end of the {{lerfu-porsi}}.

Generate a {{javni}} that matches the end of the
{{lerfu-porsi}}.

{{nunjavni-fanmo}} corresponds to a nonterminal symbol in PEG
syntax.

<procedure>(nunjavni-valsi valsi #!key cmene) => javni</procedure>

; {{cmene}} : The optional name for the {{javni-valsi}} generated by successfully matching {{cmene}}.
; {{valsi}} : The string to try to match.

Generate a {{javni}} that matches the parse input against {{valsi}}.

{{nunjavni-valsi}} matches a literal string, and will be faster than
{{nunjavni-re}} or a composition of individual {{nunjavni-lerfu}}
when the input to match can be expressed as a literal string.

<procedure>(nunjavni-re pattern #!key cmene) => javni</procedure>

; {{cmene}} : The optional name for the {{javni-valsi}} generated by successfully matching {{pattern}}.
; {{pattern}} : the regular expression to try to match.

Generate a {{javni}} that matches the next character(s) in the parse
input against the regular expression {{pattern}}.  If {{pattern}} matches
the next character(s) in the parse input, call {{mapti}} with the
matched characters.  Otherwise continue with {{namapti}}.

The character {{#\^}} is prepended to {{pattern}} before compiling
the regular expression.

<procedure>(nunjavni-* sumti-javni #!key cmene) => javni</procedure>

; {{cmene}} : The optional name for the {{javni-valsi}} generated by successfully matching {{sumti-javni}}.
; {{sumti-javni}} : The rule to match zero-or-more times.

Generate a {{javni}} that runs the passed in {{sumti-javni}} until
it generates a {{namapti}}.  The {{javni}} returned by {{nunjavni-*}}
never calls its {{namapti}} continuation, always calling {{mapti}}.

{{nunjavni-*}} corresponds to the zero-or-more expression in PEG
syntax.

<procedure>(nunjavni-+ sumti-javni #!key cmene) => javni</procedure>

; {{cmene}} : The optional name for the {{javni-valsi}} generated by this rule.
; {{sumti-javni}} : The rule to match one-or-more times.

Generate a {{javni}} that runs the passed in {{sumti-javni}} until
it generates a {{namapti}}.  The {{javni}} returned by {{nunjavni-+}}
calls {{mapti}} if the passed in {{sumti-javni}} matches one or more
times, and calls {{namapti}} if {{sumti-javni}} does not match.

{{nunjavni-+}} corresponds to the one-or-more expression in PEG
syntax.

<procedure>(nunjavni-? sumti-javni #!key cmene) => javni</procedure>

; {{cmene}} : The optional name for the {{javni-valsi}} generated by this rule.
; {{sumti-javni}} : The rule to optionally match.

Generate a {{javni}} that runs the passed in {{sumti-javni}}.
The {{javni}} returned by {{nunjavni-?}} never calls its {{namapti}}
continuation, always calling {{mapti}}.

{{nunjavni-?}} corresponds to the optional expression in PEG
syntax.

<procedure>(nunjavni-& sumti-javni) => javni</procedure>

; {{sumti-javni}} : The rule to match.

Generate a {{javni}} that runs the passed in {{sumti-javni}}.
The {{javni}} returned by {{nunjavni-&}} calls {{mapti}} or
{{namapti}} according to the result from {{sumti-javni}}, but
it does not advances the {{lerfu-porsi}} when {{mapti}} is called.

{{nunjavni-*}} corresponds to the and-predicate expression in PEG
syntax.

<procedure>(nunjavni-! sumti-javni) => javni</procedure>

; {{sumti-javni}} : The rule to match.

Generate a {{javni}} that runs the passed in {{sumti-javni}}.
The {{javni}} returned by {{nunjavni-&}} calls {{mapti}} if
the {{sumti-javni}} calls {{namapti}}, and it calls {{mapti}} if
the {{sumti-javni}} calls {{namapti}}.

{{nunjavni-!}} corresponds to the not-predicate expression in PEG
syntax.

<procedure>(nunjavni-je #!rest rodajavni #!key cmene) => javni</procedure>

; {{cmene}} : The optional name for the {{javni-valsi}} generated by successfully matching all {{rodajavni}}.
; {{rodajavni}} : one or more rules to match in sequence.

Generate a {{javni}} that runs each {{javni}} in the passed in
{{rodajavni}} in order.  The {{javni}} returned by {{nunjavni-je}} calls
{{mapti}} if every {{javni}} in {{rodajavni}} matches, and {{namapti}}
if any {{javnri}} in {{rodajavni}} does not match.

{{nunjavni-je}} corresponds to the sequence expression in PEG syntax.

<procedure>(nunjavni-jonai #!rest rodajavni #!key cmene) => javni</procedure>

; {{cmene}} : The optional name for the {{javni-valsi}} generated by successfully matching any {{rodajavni}}.
; {{rodajavni}} : one or more rules to match in sequence.

Generate a {{javni}} that runs each {{javni}} in the passed in
{{rodajavni}} in order.  The {{javni}} returned by {{nunjavni-jonai}}
calls {{mapti}} with the result from the first {{javni}} in {{rodajavni}}
that matches, ignoring all remaining {{javni}} in {{rodajavni}}.
{{nunjavni-je}} calls {{namapti}} if all the {{javni}} in {{rodajavni}}
do not match.

{{nunjavni-je}} corresponds to the ordered-choice expression in PEG
syntax.

<procedure>(nunjavni-morji sumti-javni) => javni</procedure>

; {{sumti-javni}} : the rule to match.

Generate a {{javni}} that runs the passed in {{sumti-javni}} and
stores the result.  If the {{javni}} returned by {{nunjavni-morji}}
is called again with the same {{lerfu-porsi}}, the stored {{mapti}}
or {{namapti}} is called instead of calling {{sumti-javni}} again.

{{nunjavni-je}} corresponds to memoization in packrat parsing.

Unlink other nunjavni, this rule generator does not accept a {{cmene}}
key.

<macro>(nunjavni-naselci sumti-javni #!key cmene: cmene) => javni</macro>

; {{sumti-javni}} : the rule defined in the environment being captured.
; {{cmene}} : The optional name for the {{javni-valsi}} generated by successfully matching {{sumti-javni}}.

{{nunjavni-nasleci}} encloses {{sumti-javni}} in a lambda, which
captures the environment in which {{sumti-javni}} is defined.

This macro is used in the letrec used to define the non-terminals in a
grammar to allow mutual recursion of non-terminals.

<procedure>(nunjavni-samselpla samselpla sumti-javni #!key cmene) => javni</procedure>

; {{cmene}} : The optional name for the {{javni-valsi}} generated by successfully matching {{sumti-javni}} and executing {{samselpla}}.
; {{samselpla}} :   the code to execute if the generated {{javni}} calls {{mapti}}.
; {{sumti-javni}} : The rule generating {{javni-valsi}} for {{samselpla}}.

Generate a {{javni}} that runs the passed in {{sumti-javni}} and,
if the {{sumti-javni}} calls {{mapti}}, calls {{samselpla}} by
passing any tokens in the {{jalga}} with associated {{cmene}} as
DSSL-style #!key parameters.

If {{sumti-javni}} does not match, {{samselpla}} is not called.
There is no way to execute code for a {{namapti}}, as there is
only a single {{namapti}} called during a parse.  Code that should
be executed when a parse fails can be run after the parse
completes; see {{genturfahi}}.

<procedure>(nunjavni-cmene sumti-javni #!key cmene) => javni</procedure>

; {{cmene}} : The optional name for the {{javni-valsi}} generated by successfully matching {{sumti-javni}}.
; {{sumti-javni}} : the rule to match.

Generate a {{javni}} that runs the passed in {{sumti-javni}} and,
if the {{sumti-javni}} matches, associates {{cmene}} with the
{{javni-valsi}} associated with the {{mapti}}.

If {{sumti-javni}} does not match, call {{namapti}} without
associating {{cmene}} with the result.

All terminal {{javni}} accept a cmene: argument, which should be
used in preference to this function.  This function is required when
matching a {{javni-naselci}}, as we can't tell at compile-time that
we will associate a {{cmene}} with that {{javni}}.  Instead, the
binding waits until runtime where it is performed by this function.

==== javni

A routine that, given a {{lerfu-porsi}}, creates a {{javni-valsi}}
indicating match or nonmatch of that {{javni}} for that {{lerfu-porsi}}.
This {{javni-valsi}} is then forwarded to the next {{javni}} by
means of calling {{mapti}} for a match or {{namapti}} for a non-match.

A {{javni}} is provided the following parameters:

<procedure>(javni lerfu-porsi mapti namapti) => jalge</procedure>

; {{lerfu-porsi}} : the current parse position.
; {{mapti}}       : continuation for rule match.
; {{namapti}}     : continuation for rule non-match.

Note that {{jalge}} is not returned until the entire parse succeeds,
and may not return at all on failure, if another path succeeds in
the parse.

==== mapti .ijo namapti

A {{mapti}} is a continuation called by a {{javni}} when that {{javni}}
matches the current {{lerfu-porsi}}.

A {{namapti}} is a continuation called by a {{javni}} when that {{javni}}
does not match the current {lerfu-porsi}.

{{mapti}} and {{namapti}} have the signatures:

<procedure>(mapti lerfu-porsi nunvalsi) => jalge</procedure>
<procedure>(namapti lerfu-porsi) => jalge</procedure>

; {{lerfu-porsi}} : The current parse position.
; {{nunvalsi}} : A generated returning the current parse result.

{{genturfahi}} accepts the top-level parse rule and returns the
result to the caller, rather than calling any additional {{mapti}}
or {{namapti}}.

==== genturfahi

<procedure>(genturfahi-tolmohi) => '()</procedure>

Clear the memoization cache.  This is done automatically after each
parse.

<procedure>(genturfahi javni) => rodavalsi</procedure>

; {{javni}} : The starting rule of the grammar.

Parse the {{javni}}, which is assumed to be the initial rule of the
grammar.  Return {{rodavalsi}}, the result of the parse.

If the parse succeed, {{rodavalsi}} is the abstract syntax tree as
modified by any {{samselpla}} embedded in the grammar.  If the parse
failed, {{#f}} is returned.  In both cases, the {{lerfu-porsi}}
representing the input remaining after the parse is returned as a second
value.

<procedure>(genturfahi* javni) => (rodavalsi lefpoi)</procedure>

; {{javni}} : The starting rule of the grammar.

{{genturfahi*}} calls {{genturfahi}} and converts the multiple
arguments returned by {{genturfahi}} to a list and returns them.

This routine is used in the test framework, and might be useful
elsewhere.

<procedure>(genturfahi-peg port) => genturfahi</procedure>

; {{port}} : The port from which to read the PEG grammar.

{{genturfahi-peg}} parses a PEG grammar from {{port}} and returns
scheme code that will generate a parser for that grammar.

Note that the expression returned from this procedure is designed to
be written to a file and loaded later.  It must be evaluated with
eval if it is going to be used immediately.

<procedure>(genturfahi-peg* port) => genturfahi</procedure>

; {{port}} : The port from which to read the PEG grammar.

{{genturfahi-peg*}} calls {{genturfahi-peg}} and converts the
multiple arguments returned by {{genturfahi-peg}} to a list and
returns them.

This routine is used in the test framework, and might be useful
elsewhere.

<parameter>genturfahi-status</parameter>

The exit status of the parser.  If the parse input does not match
the parser, {{genturfahi-status}} is 1.  Otherwise,
{{genturfahi-status}} is zero.

The value of {{genturfahi-status}} is passed to {{exit}} once the
parse completes.

<constant>genturfahi-version-major</constant>

A number representing the major version of this module.

<constant>genturfahi-version-minor</constant>

A number representing the minor version of this module.

<constant>genturfahi-version-patch</constant>

A number representing the patch version of this module.

<constant>genturfahi-version</constant>

A string representing the version number of this module.

==== Examples

These examples show how to use the {{genturfahi}} API to construct a
parser.

===== Operators

<enscript language=scheme>
;;;; lerfu.scm

(use genturfahi)
(use test)
;;;
;;; terminal:
;;;
;;; lerfu <- \#a
;;;
(define (lerfu)
  (set! genturfahi-lerfu
        (genturfahi* (nunjavni-lerfu #\a)))

; match the only character this parser matches.
  ;
  (test '(#\a "") (genturfahi-lerfu "a"))

; There is no rule to match any other character,
  ; so they don't match.
  ;
  (test '(#f "b") (genturfahi-lerfu "b"))
  (test '(#f "c") (genturfahi-lerfu "c"))

; there is no rule for the emtpy
  ; string, no match
  ;
  (test '(#f "") (genturfahi-lerfu ""))

; there is no rule for the end-of-file
  ; These match, but they don't parse the
  ; whole buffer!
  ;
  ; this rule won't even match two #\a,
  ; only one.
  ;
  (test '(#\a "a") (genturfahi-lerfu "aa"))
  (test '(#\a "b") (genturfahi-lerfu "ab"))
  (test '(#\a "c") (genturfahi-lerfu "ac"))

; later characters that would match
  ; don't if there is no rule to match
  ; earlier characters.
  ;
  (test '(#f "da") (genturfahi-lerfu "da"))
  0)

(test-group "lerfu"
  (lerfu))
</enscript>

<enscript language=scheme>
;;;; optional.scm

(use genturfahi)
(use test)

;;;
;;; optional: e?
;;;
;;; optional <- \#a?
;;;
(define (optional?)
  (set! genturfahi-optional
        (genturfahi* (nunjavni-? (nunjavni-lerfu #\a))))

; match the only character this parser matches.
  ;
  (test '(#\a "") (genturfahi-optional "a"))

; There is no rule to match any other character,
  ; but since this rule is optional, the empty string
  ; is returned.
  ;
  (test '("" "b") (genturfahi-optional "b"))
  (test '("" "c") (genturfahi-optional "c"))

; there is no rule for the emtpy string, but the
  ; optional rule matches anyway.
  ;
  (test '("" "") (genturfahi-optional ""))

; there is no rule for the end-of-file
  ; These match, but they don't parse the
  ; whole buffer!
  ;
  ; this rule won't even match two #\a,
  ; only zero or one.
  ;
  (test '(#\a "a") (genturfahi-optional "aa"))
  (test '(#\a "b") (genturfahi-optional "ab"))
  (test '(#\a "c") (genturfahi-optional "ac"))

; later characters that would match don't if there
  ; is no rule to match earlier characters.  We match
  ; zero #\a at the beginning of the parser input and
  ; return.
  ;
  (test '("" "da") (genturfahi-optional "da"))
  0)

(test-group "optional?"
  (optional?))
</enscript>

<enscript language=scheme>
;;;; zero-or-more.scm

(use genturfahi)
(use test)

;;;
;;; zero-or-more: e*
;;;
;;; zero-or-more <- \#a*
;;;
(define (zero-or-more)
  (set! genturfahi-zero-or-more
        (genturfahi* (nunjavni-* (nunjavni-lerfu #\a))))

; match the only character this parser matches.
  ; notice that it returns a list of characters,
  ; even if it only matches a single character.
  ;
  (test '((#\a) "") (genturfahi-zero-or-more "a"))

; though the rule matches even when the character isn't present
  ; the empty list indicates no characters matched.
  ;
  (test '(() "") (genturfahi-zero-or-more ""))

; We match as many #\a as we can.
  ;
  (test '((#\a #\a) "") (genturfahi-zero-or-more "aa"))
  (test '((#\a #\a #\a) "") (genturfahi-zero-or-more "aaa"))

; There is no rule to match any other character,
  ; but since this rule matches zero-or-more, it
  ; succeeds in matching zero #\a.
  ;
  (test '(() "b") (genturfahi-zero-or-more "b"))
  (test '(() "c") (genturfahi-zero-or-more "c"))

; As well, it will match all #\a before ending
  ; the parse.  These match, but they don't parse
  ; the whole buffer!
  ;
  (test '((#\a) "b") (genturfahi-zero-or-more "ab"))
  (test '((#\a) "c") (genturfahi-zero-or-more "ac"))

(test '((#\a #\a) "b") (genturfahi-zero-or-more "aab"))
  (test '((#\a #\a) "c") (genturfahi-zero-or-more "aac"))

(test '((#\a #\a #\a) "b") (genturfahi-zero-or-more "aaab"))
  (test '((#\a #\a #\a) "c") (genturfahi-zero-or-more "aaac"))

; later characters that would match don't if there is no rule
  ; to match earlier characters.  This rule matches the empty
  ; list, but can't access the #\a later in the parse input.
  ;
  (test '(() "da") (genturfahi-zero-or-more "da"))
  (test '(() "daa") (genturfahi-zero-or-more "daa"))
  (test '(() "daaa") (genturfahi-zero-or-more "daaa"))
  0)

(test-group "zero-or-more"
  (zero-or-more))
</enscript>

<enscript language=scheme>
;;;; one-or-more.scm

(use genturfahi)
(use test)

;;;
;;; one-or-more: e+
;;;
;;; one-or-more <- \#a+
;;;
(define (one-or-more)
  (set! genturfahi-one-or-more
        (genturfahi* (nunjavni-+ (nunjavni-lerfu #\a))))

; match the only character this parser matches.
  ; notice that it returns a list of characters,
  ; even if it only matches a single character.
  ;
  (test '((#\a) "") (genturfahi-one-or-more "a"))

; We match as many #\a as we can.
  ;
  (test '((#\a #\a) "") (genturfahi-one-or-more "aa"))
  (test '((#\a #\a #\a) "") (genturfahi-one-or-more "aaa"))

; but we don't match zero #\a.
  ;
  (test '(#f "") (genturfahi-one-or-more ""))
  (test '(#f "b") (genturfahi-one-or-more "b"))
  (test '(#f "c") (genturfahi-one-or-more "c"))

; As well, it will match all #\a before ending
  ; the parse.  These match, but they don't parse
  ; the whole buffer!
  ;
  (test '((#\a) "b") (genturfahi-one-or-more "ab"))
  (test '((#\a) "c") (genturfahi-one-or-more "ac"))

(test '((#\a #\a) "b") (genturfahi-one-or-more "aab"))
  (test '((#\a #\a) "c") (genturfahi-one-or-more "aac"))

(test '((#\a #\a #\a) "b") (genturfahi-one-or-more "aaab"))
  (test '((#\a #\a #\a) "c") (genturfahi-one-or-more "aaac"))

; later characters that would match don't if there is no rule
  ; to match earlier characters.  These rules don't match, as
  ; the input does not begin with #\a.
  ;
  (test '(#f "da") (genturfahi-one-or-more "da"))
  (test '(#f "daa") (genturfahi-one-or-more "daa"))
  (test '(#f "daaa") (genturfahi-one-or-more "daaa"))
  0)

(test-group "one-or-more"
  (one-or-more))
</enscript>

<enscript language=scheme>
;;;; je.scm

(use genturfahi)
(use test)

;;;
;;; ordered sequence: e_1 e_2
;;;
;;; je <- \#a \#b \#c
;;;
(define (je)
  (set! genturfahi-je
        (genturfahi* (nunjavni-je (nunjavni-lerfu #\a)
                                  (nunjavni-lerfu #\b)
                                  (nunjavni-lerfu #\c))))

; matches each lerfu in sequence 
  ;
  (test '((#\a #\b #\c) "") (genturfahi-je "abc"))

; matches the first rule (#\a), but not the second or
  ; third (#\b and #\c), so this match fails.
  ;
  (test '(#f "a") (genturfahi-je "a"))
  (test '(#f "ac") (genturfahi-je "ac"))

; matches the first two rules (#\a and #\b), but not
  ; the third one.
  ;
  (test '(#f "ab") (genturfahi-je "ab"))
  (test '(#f "abd") (genturfahi-je "abd"))

; "abc" is the only correct sequence.  Every other
  ; combination of #\a, #\b, and #\c does not match.
  ;
  (test '(#f "acb") (genturfahi-je "acb"))
  (test '(#f "bac") (genturfahi-je "bac"))
  (test '(#f "bca") (genturfahi-je "bca"))
  (test '(#f "cab") (genturfahi-je "cab"))
  (test '(#f "cba") (genturfahi-je "cba"))

; there is no rule for #\d, no match
  ;
  (test '(#f "d") (genturfahi-je "d"))

; there is no rule for the emtpy
  ; string, no match
  ;
  (test '(#f "") (genturfahi-je ""))

; there is no rule for the end-of-file
  ; These match, but they don't parse the
  ; whole buffer!
  ;
  (test '((#\a #\b #\c) "a") (genturfahi-je "abca"))
  (test '((#\a #\b #\c) "b") (genturfahi-je "abcb"))
  (test '((#\a #\b #\c) "c") (genturfahi-je "abcc"))

; later characters that would match
  ; don't if there is no rule to match
  ; earlier characters.
  ;
  (test '(#f "dabc") (genturfahi-je "dabc"))
  0)

(test-group "je"
  (je))
</enscript>

<enscript language=scheme>
;;;; jonai.scm

(use genturfahi)
(use test)

;;;
;;; ordered-choice: e_1 / e_2
;;;
;;; jonai <- \#a / \#b / \#c
;;;
(define (jonai)
  (set! genturfahi-jonai
        (genturfahi* (nunjavni-jonai (nunjavni-lerfu #\a)
                                     (nunjavni-lerfu #\b)
                                     (nunjavni-lerfu #\c))))

; matches (nunjavni-jonai (nunjavni-lerfu #\a) ...)
  ;
  (test '(#\a "") (genturfahi-jonai "a"))

; matches (nunjavni-jonai ... (nunjavni-lerfu #\b) ...)
  ;
  (test '(#\b "") (genturfahi-jonai "b"))

; matches (nunjavni-jonai ... (nunjavni-lerfu #\c))
  ;
  (test '(#\c "") (genturfahi-jonai "c"))

; there is no rule for #\d, no match
  ;
  (test '(#f "d") (genturfahi-jonai "d"))

; there is no rule for the emtpy
  ; string, no match
  ;
  (test '(#f "") (genturfahi-jonai ""))

; there is no rule for the end-of-file
  ; These match, but they don't parse the
  ; whole buffer!
  ;
  (test '(#\a "a") (genturfahi-jonai "aa"))
  (test '(#\a "b") (genturfahi-jonai "ab"))
  (test '(#\a "c") (genturfahi-jonai "ac"))

; later characters that would match
  ; don't if there is no rule to match
  ; earlier characters.
  ;
  ; none of these match.
  ;
  (test '(#f "da") (genturfahi-jonai "da"))
  (test '(#f "db") (genturfahi-jonai "db"))
  (test '(#f "dc") (genturfahi-jonai "dc"))
  0)

(test-group "jonai"
  (jonai))
</enscript>

<enscript language=scheme>
;;;; and-predicate.scm

(use genturfahi)
(use test)

;;;
;;; and-predicate: &e
;;;
;;; and-predicate <- &\#a
;;;
(define (and-predicate)
  (set! genturfahi-and-predicate
        (genturfahi* (nunjavni-& (nunjavni-lerfu #\a))))

; the and-predicate matches as normal, but does not advance
  ; the input.
  ;
  (test '(#\a "a") (genturfahi-and-predicate "a"))

; It behaves like all other rules when there is no match,
  ; indicating no match.
  ;
  (test '(#f "b") (genturfahi-and-predicate "b"))
  (test '(#f "c") (genturfahi-and-predicate "c"))

; there is no rule for the emtpy
  ; string, no match
  ;
  (test '(#f "") (genturfahi-and-predicate ""))

; there is no rule for the end-of-file
  ; These match, but they don't parse the
  ; whole buffer!
  ;
  ; this rule won't even match two #\a,
  ; only one.
  ;
  (test '(#\a "aa") (genturfahi-and-predicate "aa"))
  (test '(#\a "ab") (genturfahi-and-predicate "ab"))
  (test '(#\a "ac") (genturfahi-and-predicate "ac"))

; later characters that would match
  ; don't if there is no rule to match
  ; earlier characters.
  ;
  (test '(#f "da") (genturfahi-and-predicate "da"))
  0)

(test-group "and-predicate"
  (and-predicate))
</enscript>

<enscript language=scheme>
;;;; not-predicate.scm

(use genturfahi)
(use test)

;;;
;;; not-predicate: !e
;;;
;;; not-predicate <- !\#a
;;;
(define (not-predicate)
  (set! genturfahi-not-predicate
        (genturfahi* (nunjavni-! (nunjavni-lerfu #\a))))

; this not-predicate matches everything *but* \#a.
  ; not-predicate never advances the input.
  ;
  ; on match, we return the empty string.
  ;
  (test '(#f "a") (genturfahi-not-predicate "a"))
  (test '("" "b") (genturfahi-not-predicate "b"))
  (test '("" "c") (genturfahi-not-predicate "c"))

; we match the empty string (because the empty string
  ; isn't #\a.)
  ;
  (test '("" "") (genturfahi-not-predicate ""))

; there is no rule for the end-of-file
  ; These don't parse the whole buffer!
  ;
  (test '(#f "aa") (genturfahi-not-predicate "aa"))
  (test '(#f "ab") (genturfahi-not-predicate "ab"))
  (test '(#f "ac") (genturfahi-not-predicate "ac"))
  (test '("" "ba") (genturfahi-not-predicate "ba"))
  (test '("" "bb") (genturfahi-not-predicate "bb"))
  (test '("" "bc") (genturfahi-not-predicate "bc"))
  (test '("" "ca") (genturfahi-not-predicate "ca"))
  (test '("" "cb") (genturfahi-not-predicate "cb"))
  (test '("" "cc") (genturfahi-not-predicate "cc"))
  0)

(test-group "not-predicate"
  (not-predicate))
</enscript>

<enscript language=scheme>
;;;; empty-string.scm

(use genturfahi)
(use test)

;;;
;;; empty-string:
;;;
;;; empty-string <- ""
;;;
(define (empty-string)
  (set! genturfahi-empty-string (genturfahi* (nunjavni-e)))

; the empty string is zero characters long, so
  ; it always matches.
  ;
  (test '("" "") (genturfahi-empty-string ""))
  (test '("" "a") (genturfahi-empty-string "a"))
  (test '("" "b") (genturfahi-empty-string "b"))
  (test '("" "c") (genturfahi-empty-string "c"))
  (test '("" "abc") (genturfahi-empty-string "abc"))
  0)

(test-group "empty-string"
  (empty-string))
</enscript>

<enscript language=scheme>
;;;; end-of-input.scm

(use genturfahi)
(use test)

;;;
;;; end-of-input:
;;;
;;; end-of-input <- [eof]
;;;
(define (end-of-input)
  (set! genturfahi-eof (genturfahi* (nunjavni-fanmo)))

; The end of file only matches when we're at the end of
  ; the parse input.
  ;
  (test '(#\nul "") (genturfahi-eof ""))

; it doesn't match if here are any characters left.
  ;
  (test '(#f "a") (genturfahi-eof "a"))
  (test '(#f "b") (genturfahi-eof "b"))
  (test '(#f "c") (genturfahi-eof "c"))

; even if those remaining characters are the sentinel. 
  ;
  (test `(#f ,(make-string 1 #\nul)) (genturfahi-eof (make-string 1 #\nul)))
  0)

(test-group "end-of-input"
  (end-of-input))
</enscript>

<enscript language=scheme>
;; jonai-naselci.scm: nonterminal symbols

(use genturfahi)
(use test)

;;;
;;; ordered choice with non-terminals
;;; this test is identical to the ordered choice file, though the
;;; grammar differs.
;;;
;;; jonai <- a / b / c
;;; a     <- #\a
;;; b     <- #\a
;;; c     <- #\a
;;;
(define (jonai-naselci)
  (set! genturfahi-jonai-naselci
    (letrec ((gerna (nunjavni-morji (nunjavni-jonai (nunjavni-naselci a)
                                                    (nunjavni-naselci b)
                                                    (nunjavni-naselci c))))
             (a (nunjavni-morji (nunjavni-lerfu #\a)))
             (b (nunjavni-morji (nunjavni-lerfu #\b)))
             (c (nunjavni-morji (nunjavni-lerfu #\c))))
      (genturfahi* gerna)))

; matches (nunjavni-jonai-naselci (nunjavni-lerfu #\a) ...)
  ;
  (test '(#\a "") (genturfahi-jonai-naselci "a"))

; matches (nunjavni-jonai-naselci ... (nunjavni-lerfu #\b) ...)
  ;
  (test '(#\b "") (genturfahi-jonai-naselci "b"))

; matches (nunjavni-jonai-naselci ... (nunjavni-lerfu #\c))
  ;
  (test '(#\c "") (genturfahi-jonai-naselci "c"))

; there is no rule for #\d, no match
  ;
  (test '(#f "d") (genturfahi-jonai-naselci "d"))

; there is no rule for the emtpy
  ; string, no match
  ;
  (test '(#f "") (genturfahi-jonai-naselci ""))

; there is no rule for the end-of-file
  ; These match, but they don't parse the
  ; whole buffer!
  ;
  (test '(#\a "a") (genturfahi-jonai-naselci "aa"))
  (test '(#\a "b") (genturfahi-jonai-naselci "ab"))
  (test '(#\a "c") (genturfahi-jonai-naselci "ac"))

; later characters that would match
  ; don't if there is no rule to match
  ; earlier characters.
  ;
  ; none of these match.
  ;
  (test '(#f "da") (genturfahi-jonai-naselci "da"))
  (test '(#f "db") (genturfahi-jonai-naselci "db"))
  (test '(#f "dc") (genturfahi-jonai-naselci "dc"))
  0)

(test-group "naselci"
  (jonai-naselci))
</enscript>

===== Executing Code

<enscript language=scheme>
;; simple (no-op) example

(use genturfahi)

;; XXX: I haven't written this example yet!
</enscript>

<enscript language=scheme>
;; samselpla.scm

(use genturfahi)
(use test)

;;;
;;; execute code on syntax tree.
;;;
;;; gerna <- a b c
;;; a     <- #\a
;;; b     <- #\b
;;; c     <- #\c
;;;
(define (samselpla)
  (set! genturfahi-samselpla
    (letrec ((gerna (nunjavni-morji
               ; concatenate the strings
               (nunjavni-samselpla
                 (lambda (#!key a b c) (string-append a b c))
                 (nunjavni-je
                   (nunjavni-cmene (nunjavni-naselci a) cmene: 'a:)
                   (nunjavni-cmene (nunjavni-naselci b) cmene: 'b:)
                   (nunjavni-cmene (nunjavni-naselci c) cmene: 'c:)))))
             (a (nunjavni-morji
               ; convert each character to a string.
               (nunjavni-samselpla
                 (lambda (#!key lerfu) (make-string 1 lerfu))
                 (nunjavni-lerfu #\a cmene: 'lerfu:))))
             (b (nunjavni-morji
               (nunjavni-samselpla
                 (lambda (#!key lerfu) (make-string 1 lerfu))
                 (nunjavni-lerfu #\b cmene: 'lerfu:))))
             (c (nunjavni-morji
               (nunjavni-samselpla
                 (lambda (#!key lerfu) (make-string 1 lerfu))
                 (nunjavni-lerfu #\c cmene: 'lerfu:)))))
      (genturfahi* gerna)))

; The code runs only after a a syntax tree is successfully
  ; generated.
  ;
  (test '("abc" "") (genturfahi-samselpla "abc"))

; It doesn't run at all if we fail to parse.
  ;
  (test '(#f "d") (genturfahi-samselpla "d"))

; Even if the parse partially matches before failing.
  ;
  (test '(#f "a") (genturfahi-samselpla "a"))
  (test '(#f "ab") (genturfahi-samselpla "ab"))

; But if we don't parse all of the input, that
  ; is a valid parse.
  ;
  (test '("abc" "a") (genturfahi-samselpla "abca"))
  (test '("abc" "b") (genturfahi-samselpla "abcb"))
  (test '("abc" "c") (genturfahi-samselpla "abcc"))
  0)

(test-group "samselpla"
  (samselpla))
</enscript>

Note that a tag should be placed outside of a {{nunjavni-jonai}}.
If a tag is placed on a {{javni}} that is passed to a {{nunjavni-jonai}},
that same tag must be placed on every {{javni}} that is passed to that
{{nunjavni-jonai}}.

The following example is undefined:

<enscript language=scheme>
;; undefined interaction between nunjavni-jonai nunjavni-cmene

(use genturfahi)

;; XXX: This example isn't written yet!
</enscript>

And should instead be written as:

<enscript language=scheme>
;; defined interaction between nunjavni-jonai nunjavni-cmene

(use genturfahi)

;; XXX: This example isn't written yet!
</enscript>

===== A Full Example

Here is a complete example for using this API to construct a parser.
It is the same grammar presented in
[[http://gazette.call-cc.org/issues/5.html|Chicken Gazette #5]],
which documents the {{packrat}} parser egg.

<enscript language=scheme>
;; mex.scm

(use genturfahi)
(use test)

;;;
;;; This example appeared in chicken gazette #5 as the example for
;;; the packrat parser module:
;;;
;;;  http://gazette.call-cc.org/issues/5.html
;;;
;;; It is here ported to genturfa'i.
;;;
(define (mex)
  (let ((mex
    (letrec
      ((expr   (nunjavni-morji
                 (nunjavni-jonai
                   (nunjavni-samselpla
                     (lambda (#!key a b) (+ a b))
                     (nunjavni-je
                       (nunjavni-cmene (nunjavni-naselci mulexp) cmene: 'a:)
                       (nunjavni-lerfu #\+)
                       (nunjavni-cmene (nunjavni-naselci mulexp) cmene: 'b:)))
                   (nunjavni-naselci mulexp))))
       (mulexp (nunjavni-morji
                 (nunjavni-jonai
                   (nunjavni-samselpla
                     (lambda (#!key a b) (* a b))
                     (nunjavni-je
                       (nunjavni-cmene (nunjavni-naselci simple) cmene: 'a:)
                       (nunjavni-lerfu #\*)
                       (nunjavni-cmene (nunjavni-naselci simple) cmene: 'b:)))
                   (nunjavni-naselci simple))))
       (simple (nunjavni-morji
                 (nunjavni-jonai
                   (nunjavni-naselci num)
                   (nunjavni-samselpla
                     (lambda (#!key a) a)
                     (nunjavni-je
                       (nunjavni-lerfu #\()
                       (nunjavni-cmene (nunjavni-naselci expr) cmene: 'a:)
                       (nunjavni-lerfu #\)))))))
       (num    (nunjavni-morji
                 (nunjavni-samselpla
                   (lambda (#!key x) (string->number (list->string x)))
                   (nunjavni-+ (nunjavni-naselci digit) cmene: 'x:))))
       (digit  (nunjavni-morji
                 (nunjavni-jonai
                   (nunjavni-lerfu #\0)
                   (nunjavni-lerfu #\1)
                   (nunjavni-lerfu #\2)
                   (nunjavni-lerfu #\3)
                   (nunjavni-lerfu #\4)
                   (nunjavni-lerfu #\5)
                   (nunjavni-lerfu #\6)
                   (nunjavni-lerfu #\7)
                   (nunjavni-lerfu #\8)
                   (nunjavni-lerfu #\9)))))
      (genturfahi expr))))

(test 2  (mex "2"))
    (test 22 (mex "22"))
    (test 4  (mex "2*2"))
    (test 4  (mex "2+2"))
    (test 16 (mex "2+2*7"))
    (test 28 (mex "(2+2)*7"))
    (test 42 (mex "3*4+5*6")))
    0)

(test-group "mex"
  (mex))
</enscript>

=== Source Code

The source code for this egg can be viewed by version:

* [[http://bugs.call-cc.org/browser/release/4/genturfahi/trunk|trunk]]
* [[http://bugs.call-cc.org/browser/release/4/genturfahi/tags/1.0|1.0]]

A copy of this project is also hosted on [[http://github.com|github]]:

* [[http://github.com/alanpost/genturfahi|genturfahi]]

=== Salmonella Report

The Salmonella Report shows the result from the current nightly
build.

* [[http://tests.call-cc.org/current/salmonella-report/genturfahi.html|build log]]
* [[http://tests.call-cc.org/current/salmonella-report/tests/genturfahi.html|test output]]
* [[http://tests.call-cc.org/current/salmonella-report/dep-graphs/genturfahi.png|dependency graph]]

=== History

[[http://pdos.csail.mit.edu/~baford/packrat/|Brian Ford]] coined the
modern terms PEGs and packrat parsing, which was the subject of his
Master's Thesis.  Much of their formal theory existed earlier, the
technique having been discovered by Alexander Birman in 1970.  More
recent work on the topic has all been done by others.  His Master's
Thesis is an accessible introduction to the topic of packrat
parsing.

Tony Garnock-Jones wrote a
[[http://www.lshift.net/blog/2005/08/11/extensible-parsing-systems|packrat parser for scheme]]
in 2005.  He later packaged this work with a json parser for mzscheme
[[http://www.lshift.net/blog/2005/08/22/json-for-mzscheme-and-a-portable-packrat-parsing-combinator-library|and published it]].  Tony has not updated
his parser since 2006.

Tony's packrat parser was ported to Chicken Scheme by
[[users/christian-kellermann|Christian Kellermann]] and
[[users/felix-winkelmann|Felix Winkelmann]] in 2009.  In 2010, it
was featured in
[[http://gazette.call-cc.org/issues/5.html|Chicken Gazette 5]].

Alan Post began using the {{packrat}} parser after it was introduced
in Chicken Gazette #5 and encountered problems translating the
Lojban grammar.  Some of these problems were anticipated by Tony
Garnock-Jones:

"...whereas an extensible parser would require some kind of reified
representation of the composite parser, along the lines of

<enscript language=scheme>
`((expr   (or (seq (num + num) ,(lambda (a _ b) (+ a b)))
              (seq (mulexp)    ,(lambda (a) a))))
  (mulexp (or (seq (num * num) ,(lambda (a _ b) (* a b))))))
</enscript>

which could then be interpreted, in order to parse some input, or
stepped through, in order to debug a parser or parser extension,
or used in various diagnostic printouts to help pinpoint errors in
parsed text or parsing extension grammars."

{{genturfahi}} was written to address these shortcomings in Tony's
{{packrat}} parser, creating a pull PEG parser in Scheme capable of
parsing Lojban's grammar.  As of 2011, this goal has been
accomplished.  I'm not aware of a PEG parser for Scheme more
feature-complete or as high-performance as genturfa'i.

=== Wish List

* Create an error handler so I know how much of the parse succeeded
  before failure.

=== See Also

There are other PEG packrat parsers available:

* The [[packrat]] parser by Tony Garnock-Jones.
* Jon Rafkind has written a packrat parser generator called
  [[https://github.com/kazzmir/Pegs/|Pegs]].
* John Leuner has written [[http://www.cliki.net/cl-peg|cl-peg]],
  a Common Lisp library for PEG parser generators.

=== License

Copyright (c) 2010 ".alyn.post." <alyn.post@lodockikumazvati.org>
  
 Permission to use, copy, modify, and/or distribute this software for any
 purpose with or without fee is hereby granted, provided that the above
 copyright notice and this permission notice appear in all copies.
 
 THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
 WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
 MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
 ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
 WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
 ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
 OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.

=== Version History

; 1.0.0 : Version 1.0 released 7 Sep 2011.

Description of your changes:

I would like to authenticate

Authentication

Username:Password:

Spam control

What do you get when you subtract 15 from 21?