Nitrate (historical revision 45288) - The CHICKEN Scheme wiki

You are looking at historical revision 45288 of this page. It may differ significantly from its current revision.

Nitrate

Combinator based pattern matching

Nitrate
Version History
1. License

Nitrate is a combinator (procedure) based backtracking pattern matcher. Being procedure based, as opposed to syntax based, the matcher is extensible with any predicate or record type and its basic components are composable and can represent recursive structure.

This could also be called monadic pattern matching or a parser-combinator library for Scheme values.

Nitrate has been tested to work on Chibi Scheme in addition to CHICKEN 5.

Why a Procedure-Based Pattern Matcher?

Scheme has no standard pattern matcher. A commonly used one is the “Wright” pattern matcher, with a notable syntax-rules implementation using CPS macros by Alex Shinn. Such patterns matchers are not extensible. Some proposals for high-end, extensible pattern matchers include SRFI 257 and SRFI 262.

These matchers are all at the syntax level and are very difficult to implement and extend as a result. A static-typed language like Standard ML or Haskell requires a matcher as a syntactic construct because it has no way to discriminate against sum types. But Scheme is a latently typed language, so it does not have this issue. Higher-order procedures should be able to do the same thing that pattern matching syntax can, with the added benefit of extensibility.

This procedure-based pattern matcher is very composable. For instance, the pattern or~ could be defined as:

(define or~
  (case-lambda
    (() _~)
    ((x~) x~)
    ((x~ . spats) (if~ x~ _~ (apply or~ spats)))))

This code almost looks like how one would write or in a call-by-name version of Scheme. Since procedures are composed before matching, the code that makes up the patterns passed to or~ are not run until then.

The composition is done over a finite list (the argument list), hence the recursion is done at a meta-level to the matching. But with this system, one can also write recursive patterns at the matching level. Consider the following possible implementation of member~:

(define-recpat (member~ pat~)
  (or~ (cons~ pat~ _~) (cons~ _~ (member~ pat~))))

The macro define-recpat uses an eta-expansion trick to allow for a composition of matching procedures to be recursive.

A procedure-based pattern matcher would have a uniform syntax for matching any structured data. This may make the basic case of lists and vectors less ergonomic than a specialized solution, but this could encourage people to use abstract patterns instead of concrete ones.

One thing that a procedure-based pattern matcher can't do that a syntactic pattern matcher can is to bind values to identifiers, because identifiers are a syntactic construct. This has its upsides, though: a binding can appear zero or more times, and the client code can extract values as desired.

Design of the Matcher

The matcher is a depth-first, left-to-right traversal. Many R7RS-small systems are not going to take much opportunity to optimize anyways by re-arranging the evaulation order, so by fixing an evaluation order the matcher can express things such as cuts.

The matcher threads an immutable map through the match. Its immutability makes it easy to backtrack. The map is returned at the end of the match. The matcher procedures either return the map, or #f.

The matcher backtracks by maintaining a stack of continuations captured by call/cc. This means that failures jump, they do not flow up. This matcher only uses escape continuations by default, so even for implementations with an inefficient call/cc this implementation can still work with something like call/ec or call/1cc.

The default matcher is call/cc safe. That means that a matcher using only the procedures here can be entered and exited multiple times. This could be used, for instance, to match one branch, do some processing, and cut back to try another match if something goes wrong in the client code. Future Directions

One idea is to use identifiers (R6RS) instead of symbols. This would make the patterns easier to read and help with possible scope and hygiene issues. A more radical change would be to use SRFI 247 style syntactic monads.

Another possibility is to do a version of SRFI 262 lowered to the procedure level. SRFI 262 is designed to be compilable efficiently, which suggests a route for optimizing compilers: given a pattern (i.e. in a procedure), lift it such that it only needs to be evaluated once, evaluate derived matching procedures into matching primitives (maybe record types), optimize the primitives, and then output a matching procedure with compiler hints to aggresively inline.

Author

Peter McGoron

Repository

https://codeberg.org/phm/nitrate/

Requirements

Examples

Match a list with specified atoms:

(define (one-two-three-four lst)
  (match lst dict
    ((list~ 1 2 3 4) #t)
    (_~ #f)))
(one-two-three-four '(1 2 3 4)) ; => #t
(one-two-three-four '#(1 2 3 4)) ; => #f

Match a list, mapping values to symbols:

(define (simple lst)
  (match lst dict
    ((list~ 1 (b~ 'second) 3) (car (dict-ref (dto) dict 'second)))
    (_~ #f)))
(simple '(1 100 3)) ; => 100
(simple '(1 100 5)) ; => #f

Match as many numbers at the head of a list as possible:

(define (leading lst)
  (match lst dict
    ((not~ (?~ pair?)) (values '() '()))
    ((list*~ (and~ (?~ number?) (b~ 'n)) (b~ 'rest))
     (values (dict-ref/default (dto) dict 'n '())
             (car (dict-ref (dto) dict 'rest))))))

(leading '(1 2 3 4)) ; => (values '(1 2 3 4) '())
(leading '(1 2 3 x y 5)) ;=> (values '(1 2 3) '(x y 5))
(leading '(x 1 2)) ; => (values '() '(x 1 2))
(leading 'abcd) ; => (values '() '())

Partition a list using a predicate:

(define (partition predicate? lst)
  (match lst dict
    ((list*~ (or~ (and~ (?~ predicate?) (b~ 'hit))
                  (b~ 'miss))
             '())
     (values (dict-ref/default (dto) dict 'hit '())
             (dict-ref/default (dto) dict 'miss '())))))
(partition positive? '(1 2 3 -5 -6 7 8 -9))
; => (values '(1 2 3 7 8) '(-5 -6 -9))

API

TODO: port the API documentation to the CHICKEN wiki format. The full API is documented on a separate website.

Version History

1.0.0: Initial Release

License

Permission to use, copy, modify, and/or distribute this software for any purpose with or without fee is hereby granted.

THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.