1. Pattern matching
    1. Pattern Matching Expressions
    2. Patterns
    3. Match Failure
    4. Record Structures Pattern
    5. Code Generation

Pattern matching

(This description has been taken mostly from Andrew Wright's PostScript document.)

Pattern matching allows complicated control decisions based on the structure of data to be expressed concisely. Pattern matching is found in several modern languages, notably Standard ML, Haskell and Miranda. The syntactic extensions described here internally use the match library unit.

Note: this pattern matching package is not compatible with hygienic macro-expanders like the syntax-case extension (available separately).

The basic form of pattern matching expression is:

(match exp [pat body] ...)

where exp is an expression, pat is a pattern, and body is one or more expressions (like the body of a lambda-expression). The match form matches its first subexpression against a sequence of patterns, and branches to the body corresponding to the first pattern successfully matched. For example, the following code defines the usual map function:

(define map
  (lambda (f l)
    (match l
      [() '()]
      [(x . y) (cons (f x) (map f y))])))

The first pattern () matches the empty list. The second pattern (x . y) matches a pair, binding x to the first component of the pair and y to the second component of the pair.
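Patterns nest, so a single clause can destructure several levels at once. The following sketch (the name swap-all is hypothetical) matches each element of a list against a two-element list pattern while also binding the rest of the list:

```scheme
;; Swap the two components of every two-element list in l.
;; The pattern ((a b) . rest) destructures the first element
;; and binds the remainder of the list in one step.
(define (swap-all l)
  (match l
    [() '()]
    [((a b) . rest) (cons (list b a) (swap-all rest))]))

(swap-all '((1 2) (3 4)))  ; => ((2 1) (4 3))
```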

Pattern Matching Expressions

The complete syntax of the pattern matching expressions follows:

exp ::= (match exp clause ...)
     |  (match-lambda clause ...)
     |  (match-lambda* clause ...)
     |  (match-let ([pat exp] ...) body)
     |  (match-let* ([pat exp] ...) body)
     |  (match-letrec ([pat exp] ...) body)
     |  (match-let var ([pat exp] ...) body)
     |  (match-define pat exp)
clause ::= [pat body]
        |  [pat (=> identifier) body]
pat ::= identifier           matches anything, and binds identifier as a variable
     |  _                    anything
     |  ()                   itself (the empty list)
     |  #t                   itself
     |  #f                   itself
     |  string               an `equal?' string
     |  number               an `equal?' number
     |  character            an `equal?' character
     |  's-expression        an `equal?' s-expression
     |  (pat-1 ... pat-n)    a proper list of n elements
     |  (pat-1 ... pat-n . pat-n+1)  
                             a list of n or more elements
     |  (pat-1 ... pat-n pat-n+1 ..k)  
                              a proper list of n+k or more elements
     |  #(pat-1 ... pat-n)   a vector of n elements
     |  #(pat-1 ... pat-n pat-n+1 ..k)  
                             a vector of n+k or more elements
     |  ($ struct pat-1 ... pat-n)  
                             a structure
     |  (= field pat)        a field of a structure
     |  (and pat-1 ... pat-n)  
                             if all of pat-1 through pat-n match
     |  (or pat-1 ... pat-n) 
                             if any of pat-1 through pat-n match
     |  (not pat-1 ... pat-n)
                             if none of pat-1 through pat-n match
     |  (? predicate pat-1 ... pat-n)  
                             if predicate true and pat-1 through pat-n all match
     |  (set! identifier)    anything, and binds identifier as a setter
     |  (get! identifier)    anything, and binds identifier as a getter
     |  `qp                  a quasipattern
qp ::= ()                    itself (the empty list)
    |  #t                    itself
    |  #f                    itself
    |  string                an `equal?' string
    |  number                an `equal?' number
    |  character             an `equal?' character
    |  symbol                an `equal?' symbol
    |  (qp-1 ... qp-n)       a proper list of n elements
    |  (qp-1 ... qp-n . qp-n+1)  
                             a list of n or more elements
    |  (qp-1 ... qp-n qp-n+1 ..k)  
                             a proper list of n+k or more elements
    |  #(qp-1 ... qp-n)      a vector of n elements
    |  #(qp-1 ... qp-n qp-n+1 ..k)  
                             a vector of n+k or more elements
    |  ,pat                  a pattern
    |  ,@pat                 a pattern, spliced

The notation ..k denotes a keyword consisting of three consecutive dots (i.e., ...), or two dots and a non-negative integer (e.g., ..1, ..2), or three consecutive underscores (i.e., ___), or two underscores and a non-negative integer (e.g., __1, __2). The keywords ..k and __k are equivalent. The keywords ..., ___, ..0, and __0 are equivalent.

The individual patterns are described in the Patterns section below.

The match-lambda and match-lambda* forms are convenient combinations of match and lambda, and can be explained as follows:

(match-lambda [pat body] ...)   =  (lambda (x) (match x [pat body] ...))
(match-lambda* [pat body] ...)  =  (lambda x (match x [pat body] ...))

where x is a unique variable. The match-lambda form is convenient when defining a single-argument function that immediately destructures its argument. The match-lambda* form constructs a function that accepts any number of arguments; the patterns of match-lambda* should be lists.
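A short sketch of both forms; the names pair-sum and arity-name are hypothetical. With match-lambda the single argument is matched, while with match-lambda* the whole argument list is matched, so its patterns are list patterns:

```scheme
;; match-lambda: a one-argument function that destructures its argument.
(define pair-sum
  (match-lambda
    [(a . b) (+ a b)]))

;; match-lambda*: each pattern is matched against the list of all arguments.
(define arity-name
  (match-lambda*
    [() 'none]
    [(_) 'one]
    [(_ _ . _) 'two-or-more]))

(pair-sum '(3 . 4))     ; => 7
(arity-name)            ; => none
(arity-name 'a)         ; => one
(arity-name 'a 'b 'c)   ; => two-or-more
```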

The match-let, match-let*, match-letrec, and match-define forms generalize Scheme's let, let*, letrec, and define expressions to allow patterns in the binding position rather than just variables. For example, the following expression:

(match-let ([(x y z) (list 1 2 3)]) body ...)

binds x to 1, y to 2, and z to 3 in the body. These forms are convenient for destructuring the result of a function that returns multiple values as a list or vector. As usual for letrec and define, pattern variables bound by match-letrec and match-define should not be used in computing the bound value.
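The sketch below exercises match-let with a vector pattern, match-let* with a later binding that refers to earlier pattern variables, and a top-level match-define. The names sum3, swap-and-pair, q and r are hypothetical:

```scheme
;; match-let destructures a three-element vector.
(define (sum3 v)
  (match-let ([#(a b c) v])
    (+ a b c)))

;; match-let* binds sequentially, so the second expression may use a and b.
(define (swap-and-pair lst)
  (match-let* ([(a b) lst]
               [(c d) (list b a)])
    (list a b c d)))

;; match-define binds several top-level variables from one value.
(match-define (q r) (list (quotient 17 5) (remainder 17 5)))

(sum3 (vector 1 2 3))     ; => 6
(swap-and-pair '(1 2))    ; => (1 2 2 1)
(list q r)                ; => (3 2)
```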

The match, match-lambda, and match-lambda* forms allow the optional syntax (=> identifier) between the pattern and the body of a clause. When the pattern match for such a clause succeeds, the identifier is bound to a `failure procedure' of zero arguments within the body. If this procedure is invoked, it jumps back to the pattern matching expression, and resumes the matching process as if the pattern had failed to match. The body must not mutate the object being matched, otherwise unpredictable behavior may result.
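As an illustration, the failure procedure lets a clause reject a value after inspecting it in the body. In this sketch (the names describe and next are hypothetical), odd integers pass the first pattern, but calling next resumes matching with the following clauses:

```scheme
(define (describe x)
  (match x
    [(? integer? n) (=> next)
     ;; Accept only even integers here; for odd ones, resume matching
     ;; with the following clauses as if this pattern had failed.
     (if (even? n) 'even-integer (next))]
    [(? number?) 'other-number]
    [_ 'not-a-number]))

(describe 4)     ; => even-integer
(describe 3)     ; => other-number
(describe 'foo)  ; => not-a-number
```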

Patterns

identifier: (excluding the reserved names ?, $, =, _, and, or, not, set!, get!, ..., ___, and ..k and __k for non-negative integers k) matches anything, and binds a variable of this name to the matching value in the body.

_: matches anything, without binding any variables.

(), #t, #f, string, number, character, 's-expression: These constant patterns match themselves, i.e., the corresponding value must be equal? to the pattern.

(pat-1 ... pat-n): matches a proper list of n elements that match pat-1 through pat-n.

(pat-1 ... pat-n . pat-n+1): matches a (possibly improper) list of at least n elements that ends in something matching pat-n+1.

(pat-1 ... pat-n pat-n+1 ...): matches a proper list of n or more elements, where each element of the tail matches pat-n+1. Each pattern variable in pat-n+1 is bound to a list of the matching values. For example, the expression:

(match '(let ([x 1][y 2]) z)
  [('let ((binding values) ...) exp)  body])

binds binding to the list '(x y), values to the list '(1 2), and exp to 'z in the body of the match expression. For the special case where pat-n+1 is a pattern variable, the list bound to that variable may share structure with the matched value.
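A runnable variant of the example above, returning the collected names and values instead of evaluating a placeholder body (the name let-bindings is hypothetical):

```scheme
;; Collect the variable names and initial values of a let form.
;; 'let matches the symbol let; (name val) ... matches the binding list.
(define (let-bindings expr)
  (match expr
    [('let ((name val) ...) body) (list name val)]))

(let-bindings '(let ((x 1) (y 2)) z))  ; => ((x y) (1 2))
```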

(pat-1 ... pat-n pat-n+1 ___): This pattern means the same thing as the previous pattern.

(pat-1 ... pat-n pat-n+1 ..k): This pattern is similar to the previous pattern, but the tail must be at least k elements long. The pattern keywords ..0 and ... are equivalent.

(pat-1 ... pat-n pat-n+1 __k): This pattern means the same thing as the previous pattern.
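For instance, with n = 1 and k = 1 the pattern (head tail ..1) requires at least two elements. A minimal sketch (the name split-head is hypothetical):

```scheme
;; (head tail ..1): one fixed element followed by at least one more,
;; so the list must have at least 1 + 1 = 2 elements.
;; tail is bound to the list of the remaining elements.
(define (split-head l)
  (match l
    [(head tail ..1) (list head tail)]
    [_ 'too-short]))

(split-head '(1 2 3))  ; => (1 (2 3))
(split-head '(1))      ; => too-short
```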

#(pat-1 ... pat-n): matches a vector of length n, whose elements match pat-1 through pat-n.

#(pat-1 ... pat-n pat-n+1 ...): matches a vector of length n or more, where each element beyond n matches pat-n+1.

#(pat-1 ... pat-n pat-n+1 ..k): matches a vector of length n+k or more, where each element beyond n matches pat-n+1.
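A sketch of a vector pattern with a repeated tail. It assumes that, as with list patterns, the pattern variable in the repeated position is bound to a list of the matching elements (the name vector-rest is hypothetical):

```scheme
;; first matches element 0; rest collects every element after it.
(define (vector-rest v)
  (match v
    [#(first rest ...) (list first rest)]))

(vector-rest (vector 1 2 3))  ; => (1 (2 3))
```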

($ struct pat-1 ... pat-n): matches a structure declared with define-record or define-record-type.

(= field pat): is intended for selecting a field from a structure. field may be any expression; it is applied to the value being matched, and the result of this application is matched against pat.
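Since field may be any one-argument procedure, (= ...) also works on non-structure values. This hypothetical string-size combines it with a ? pattern so that string-length is only applied to strings:

```scheme
;; (= string-length n) matches the result of (string-length s) against n.
(define (string-size s)
  (match s
    [(? string? (= string-length 0)) 'empty]
    [(? string? (= string-length n)) (list 'length n)]
    [_ 'not-a-string]))

(string-size "")     ; => empty
(string-size "abc")  ; => (length 3)
(string-size 42)     ; => not-a-string
```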

(and pat-1 ... pat-n): matches if all of the subpatterns match. At least one subpattern must be present. This pattern is often used as (and x pat) to bind x to the entire value that matches pat (cf. as-patterns in ML or Haskell).

(or pat-1 ... pat-n): matches if any of the subpatterns match. At least one subpattern must be present. All subpatterns must bind the same set of pattern variables.

(not pat-1 ... pat-n): matches if none of the subpatterns match. At least one subpattern must be present. The subpatterns may not bind any pattern variables.

(? predicate pat-1 ... pat-n): In this pattern, predicate must be an expression evaluating to a single-argument function. This pattern matches if predicate applied to the corresponding value is true, and the subpatterns pat-1 ... pat-n all match. The predicate should not have side effects, as the code generated by the pattern matcher may invoke predicates repeatedly in any order. The predicate expression is evaluated in the same scope as the match expression, i.e., free variables in predicate are not bound by pattern variables.
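The and, or, not and ? patterns compose freely. In this sketch (the name classify is hypothetical), the or clause binds no variables in any branch, the not clause excludes a constant, and the and clause binds the whole value as an as-pattern:

```scheme
(define (classify x)
  (match x
    [(or 'yes 'y #t) 'affirmative]
    [(and (? number?) (not 0)) 'nonzero-number]
    [(and whole (_ . _)) (list 'pair whole)]
    [_ 'other]))

(classify 'y)      ; => affirmative
(classify 5)       ; => nonzero-number
(classify 0)       ; => other
(classify '(1 2))  ; => (pair (1 2))
```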

(set! identifier): matches anything, and binds identifier to a procedure of one argument that mutates the corresponding field of the matching value. This pattern must be nested within a pair, vector, box, or structure pattern. For example, the expression:

(define x (list 1 (list 2 3)))
(match x [(_ (_ (set! setit)))  (setit 4)])

mutates the cadadr of x to 4, so that x is '(1 (2 4)).

(get! identifier): matches anything, and binds identifier to a procedure of zero arguments that accesses the corresponding field of the matching value. This pattern is the complement to set!. As with set!, this pattern must be nested within a pair, vector, box, or structure pattern.
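A sketch pairing the two patterns on the same field (the names cell, read-it and write-it are hypothetical). It assumes, as described above, that the getter reads the field each time it is called, so it observes a later update made through the setter:

```scheme
(define cell (list 'old))

;; ((get! g)) and ((set! s)) are nested in a one-element list pattern,
;; so both refer to the car of cell.
(define read-it  (match cell [((get! g)) g]))
(define write-it (match cell [((set! s)) s]))

(write-it 'new)
(read-it)   ; => new
cell        ; => (new)
```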

Quasipatterns: Quasiquote introduces a quasipattern, in which identifiers are considered to be symbolic constants. Like Scheme's quasiquote for data, unquote (,) and unquote-splicing (,@) escape back to normal patterns.
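Quasipatterns are convenient for matching tagged expressions. In this small evaluator (the name eval-arith is hypothetical), add and mul are symbolic constants, while ,a and ,b escape back to pattern variables:

```scheme
(define (eval-arith e)
  (match e
    [(? number? n) n]
    [`(add ,a ,b) (+ (eval-arith a) (eval-arith b))]
    [`(mul ,a ,b) (* (eval-arith a) (eval-arith b))]))

(eval-arith '(add 1 (mul 2 3)))  ; => 7
```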

Match Failure

If no clause matches the value, the default action is to invoke the procedure (match-error-procedure) with the value that did not match. The default definition of (match-error-procedure) calls error with an appropriate message:

#;1> (match 1 (2 2))

Failed match:
Error: no matching clause for : 1

For most situations, this behavior is adequate, but it can be changed by altering the value of the parameter match-error-control:

[procedure] match-error-control
(match-error-control [MODE])

Selects a mode that specifies how match... macro forms are to be expanded. With no argument this procedure returns the current mode. A single argument specifies the new mode that decides what should happen if no match-clause applies. The following modes are supported:

#:error Signal an error. This is the default.
#:match Signal an error and output the offending form.
#:fail Omits pair? tests when the consequence is to fail in car or cdr rather than to signal an error.
#:unspecified Non-matching expressions will either fail in car or cdr or return an unspecified value. This mode applies to files compiled with the unsafe option or declaration.

When an error is signalled, the raised exception will be of kind (exn match).
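Because the condition has kind (exn match), a failed match can be caught selectively. This sketch assumes CHICKEN's condition-case form and the default error mode; the name safe-second is hypothetical:

```scheme
;; Return the second element of l, or a marker symbol if l is too short.
(define (safe-second l)
  (condition-case
      (match l [(_ x . _) x])
    ;; Only conditions of kind exn and match are handled here.
    ((exn match) 'no-second-element)))

(safe-second '(1 2 3))  ; => 2
(safe-second '(1))      ; => no-second-element
```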

[procedure] match-error-procedure
(match-error-procedure [PROCEDURE])

Sets or returns the procedure called upon a match error. The procedure takes one argument, the value which failed to match. When the error control mode is #:match, it also receives a second argument, the source form of the match expression.
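One possible use is to turn match failures into an ordinary return value. The helper call-with-match-fallback below is hypothetical; it assumes the installed procedure is looked up when a match fails, and it escapes through a continuation instead of relying on the error procedure's return value. The optional source argument is accepted but ignored:

```scheme
;; Run thunk; if a match inside it fails, return (no-match <value>).
(define (call-with-match-fallback thunk)
  (let ((saved (match-error-procedure)))
    (call-with-current-continuation
     (lambda (k)
       (match-error-procedure
        (lambda (val . source)
          (match-error-procedure saved)   ; restore before escaping
          (k (list 'no-match val))))
       (let ((result (thunk)))
         (match-error-procedure saved)    ; restore on normal return
         result)))))

(call-with-match-fallback (lambda () (match 1 (2 'two))))  ; => (no-match 1)
(call-with-match-fallback (lambda () (match 2 (2 'two))))  ; => two
```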

Record Structures Pattern

The $ pattern handles native record structures and SRFI-9 records transparently. Currently it is required that SRFI-9 record predicates are named exactly like the record type name, followed by a ? (question mark) character.
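A sketch with an SRFI-9 record whose predicate follows the naming rule above (type point, predicate point?). The names point and manhattan are hypothetical; the subpatterns are assumed to match the fields in declaration order, though the result here does not depend on that order:

```scheme
(define-record-type point
  (make-point x y)
  point?
  (x point-x)
  (y point-y))

;; ($ point x y) checks point? and binds the two fields.
(define (manhattan p)
  (match p
    [($ point x y) (+ (abs x) (abs y))]))

(manhattan (make-point 3 -4))  ; => 7
```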

Code Generation

Pattern matching macros are compiled into if-expressions that decompose the value being matched with standard Scheme procedures, and test the components with standard predicates. Rebinding or lexically shadowing the names of any of these procedures will change the semantics of the match macros. The names that should not be rebound or shadowed are:

null? pair? number? string? symbol? boolean? char? procedure? vector? list?
equal?
car cdr cadr cdddr ...
vector-length vector-ref
reverse length call/cc

Additionally, the code generated to match a structure pattern like ($ Foo pat-1 ... pat-n) refers to the name Foo?. This name also should not be shadowed.
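To see why these names matter, here is a hand-written approximation (not the actual generated code) of the kind of if-expression the map example from the beginning of this chapter might expand into. The name map-expanded is hypothetical; note that it depends only on null?, pair?, car and cdr, so shadowing any of them would change its meaning:

```scheme
;; Approximate expansion of
;;   (match l [() '()] [(x . y) (cons (f x) (map f y))])
(define (map-expanded f l)
  (if (null? l)
      '()                                   ; clause 1: ()
      (if (pair? l)
          (let ((x (car l))                 ; clause 2: (x . y)
                (y (cdr l)))
            (cons (f x) (map-expanded f y)))
          ;; No clause matched; the real code calls the match error procedure.
          (error "no matching clause for" l))))

(map-expanded (lambda (n) (* n n)) '(1 2 3))  ; => (1 4 9)
```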
