Loopy-loop

LOOP is a generalized iteration form supporting extensible iterator macros, keyword updates, and full recursion. The idea is to create a loop as simple and close to natural Scheme as possible, while still being extensible.

  1. Loopy-loop
    1. Introduction
    2. Bindings and scope
    3. Iterators
      1. in-list
      2. in-lists
      3. in-string / in-string-reverse
      4. in-vector / in-vector-reverse
      5. in-port / in-file
      6. in-range / in-range-reverse
      7. in-random
      8. in-random-element
      9. in-permutations
      10. in-combinations
      11. in-cartesian-product
      12. in-hash-table
      13. collecting
    4. Implicit matching
    5. Extending
    6. Further reading
    7. Requirements
    8. License
    9. History

Introduction

In its most basic usage, LOOP can be used as a drop-in replacement for named let (assuming the let name isn't passed as a first class value). So, for example, the following definitions

  (define (fold kons knil ls)
    (let lp ((ls ls) (knil knil))
      (if (pair? ls)
        (lp (cdr ls) (kons (car ls) knil))
        knil)))

and

  (define (fold kons knil ls)
    (loop lp ((ls ls) (knil knil))
      (if (pair? ls)
        (lp (cdr ls) (kons (car ls) knil))
        knil)))

are equivalent. We further allow automatic stepping of variables, as in the DO macro:

  (define (fold kons knil ls)
    (loop lp ((ls ls (cdr ls)) (knil knil (kons (car ls) knil)))
      (if (pair? ls)
        (lp)
        knil)))

The parameters have default steps, so we don't need to pass them explicitly anymore (though we can still do so if we wanted to override the defaults).

In addition, we provide extensible iterators to automatically handle the logic of stepping, fetching, and checking termination on sequence types. To use an iterator, specify one or more variable names followed by `<-' followed by the iterator and any parameters:

 (x <- in-foo bar baz qux)

To iterate over a list, use the IN-LIST macro:

 (x <- in-list ls)

This will bind X to the successive elements of LS in the body of the loop.

Now, when iterating automatically, the loop will also terminate automatically if it encounters the end of its input. In such a case you may want to specify a return value. You can do this by putting

 => <expr>

right at the start of the loop body. So our example now becomes:

  (define (fold kons knil ls)
    (loop lp ((x <- in-list ls) (knil knil (kons x knil)))
        => knil
      (lp)))

Note we can still call, or not call, the loop itself in the body according to whatever logic we want, and re-enter it possibly multiple times. However, in this common case where the entire body is reduced to just calling the loop again, we can omit it by using an anonymous loop:

  (define (fold kons knil ls)
    (loop ((x <- in-list ls) (knil knil (kons x knil)))
       => knil))

No flexibility is lost over named let, yet we've gained the convenience of iterators. If you wanted to change the above to work on vectors, all you would need to do is change the iterator:

 (x <- in-vector vec)

and it works as expected.

Bindings and scope

Iterator macros may introduce variables in three different lexical scopes:

Loop variables
Analogous to the variables in a named let, these are initialized once and updated on each iteration through the loop. In the example above, KNIL is a loop variable (as are all named let and DO-style variables).
Body variables
Bound in the body, these are usually derived directly from loop variables. They can't be overridden (see below) and are not available in the final expression. In the example above, X is a body variable.
Final variables
Bound once in the return expression, these are sometimes used for some final computation such as reversing a consed up list.

Within each of these three lexical scopes, all variables are updated in parallel, and none are ever mutated (unless the programmer does so manually). This referential transparency is important to achieve full non-tail recursion and re-entrancy.

In many cases the loop variables will be implicit and unnamed. For instance, IN-LIST uses a loop variable to cdr down the list of pairs, binding X to the successive cars. However, in such cases the iterator usually lets you explicitly name the loop variable if you want access to it.

Loop variables may be manually overridden on a recursive call. You can either use the original positional arguments, or specify individual values by name with the <- syntax, punning the initial binding. Thus in

  (loop lp ((x ls <- in-list ls)) ...)

the recursive calls

  (lp)
  (lp (cdr ls))
  (lp ls <- (cdr ls))

are all the same. Note that we are binding the loop variable LS, not X which is considered to be always derived from the loop variable. Note also that there is no need to recurse on CDR - we could use CDDR, or a completely unrelated list, or '() to force an early termination.

The following example flattens a tree into a list, using minimal conses and stack. This serves as an example of naming implicit loop variables, binding loop variables, and non-tail recursion.

  (define (flatten ls)
    (reverse
     (loop lp ((x ls <- in-list ls) (res '()))
         => res
       (if (pair? x)
           (lp res <- (lp ls <- x))
           (lp res <- (cons x res))))))

The scope of the final expression will include all the final variables, as well as all the last instances of all the loop variables, at least one of which will correspond to a true termination condition (you could manually check the others to see if the sequence lengths were uneven). The body variables are not bound, however the loop itself, if named, is available so that you can restart the loop with all new initial values if you want.

Iterators

in-list

syntax: (<element> [<pair>] <- in-list <list> [<cdr> [<null?>]])

Iterates over the successive elements of a list.

  ;;; Simple loop
  > (loop ((x <- in-list '(a b c))) (write x) (newline))
  a
  b
  c

  ;;; Reverse a list destructively.
  (define (reverse! list)
    (loop ((elt pair <- in-list list)
           (tail '() pair))
        => tail
      (set-cdr! pair tail)))

  ;;; Test for circularity
  (define (cddr* ls) ; CL's cddr
    (if (pair? (cdr ls)) (cddr ls) '()))

  (define (circular-list? ls)
    (and (pair? ls)
         (loop race ((tortoise <- in-list ls)
                     (hare <- in-list (cdr ls) cddr*))
             => #f
           (or (eq? hare tortoise) (race)))))

in-lists

syntax: (<elements> [<pairs>] <- in-lists <lol> [<cdr> [<null?>]])

Iterate over a list of lists. <elements> is bound to the heads of each of the lists in <lol>. The CDR and NULL? options can be specified as in IN-LIST.

  (define (any pred . lol)
    (loop lp ((elts <- in-lists lol))
        => #f
      (or (apply pred elts) (lp))))

in-string / in-string-reverse

syntax: (<element> [<index>] <- in-string <str> [<start> [<end> [<step>]]])
syntax: (<element> [<index>] <- in-string-reverse <str> [<start> [<end> [<step>]]])

Iterate over the characters of a string. Proceeds from <start>, inclusive, to <end>, exclusive. By default <start> is 0 and <end> is the string length, thus iterating over every character.

You can specify a step other than the default 1, for example 2 to iterate over every other character.

The reverse version steps from one less than the end, continuing until you step below the start. Thus with the same <start> and <end> and a <step> of 1 (or any divisor of the difference), the two forms will iterate over the same characters but in the reverse order.

Note this works correctly with the utf8 egg, but is not optimal in such cases because the use of numeric indexes is slow.

in-vector / in-vector-reverse

syntax: (<element> [<index>] <- in-vector <vec> [<start> [<end> [<step>]]])
syntax: (<element> [<index>] <- in-vector-reverse <vec> [<start> [<end> [<step>]]])

Analogues of the string iterators, but for vectors. Note also all of the SRFI-4 uniform vectors can iterated over as in-u8vector, etc.

in-port / in-file

syntax: (<datum> <- in-port [<port> [<reader> [<eof?>]]])
syntax: (<datum> <- in-file <path> [<reader> [<eof?>]])

Iterate over data read from a port, defaulting to (CURRENT-INPUT-PORT) for IN-PORT, and a port opened by (OPEN-INPUT-FILE <path>) for IN-FILE. The reader defaults to READ-CHAR?, and the termination test defaults to EOF-OBJECT?.

The stateful nature of ports means that these are not referentially transparent, and you can't save a loop iteration to go back to later. In particular, IN-FILE will close its port on the first termination, causing an error if you attempt to re-enter the same loop again.

  (define (read-mime-headers port)
    (loop lp ((line <- in-port port read-line)
              (res '() (cons line res)))
        => (reverse res) ; eof case
      (if (string-null? line)
        (reverse res)
        (lp))))

  ;; alternate version with a custom termination test
  (define (read-mime-headers port)
    (loop lp ((line <- in-port port read-line
                       (disjoin eof-object? string-null?))
              (res <- collecting line))
        => res))

  (define (file->sexp-list path)
    (loop ((x <- in-file path read) (ls <- collecting x)) => x))

in-range / in-range-reverse

syntax: (<number> <- in-range [[<from>] <to> [<step>]])
syntax: (<number> <- in-range-reverse [[<from>] <to> [<step>]])

Step through the real numbers beginning with <from> (default 0), until they would be greater than (less then in the -reverse case) or equal to <to> (thus <to> is never included). <step> defaults to 1.

Two arguments indicate <from> and <to>, so provide the default <from> of 0 if you're only interested in <to> and <step>.

These macros are subject to change in the near future.

in-random

syntax: (<number> <- in-random [<range> [<low>]])

With no arguments, <number> is bound to a random inexact number uniformly distributed over 0.0 and 1.0, inclusive, on each iteration.

With a single argument, <number> is bound to a random integer uniformly distributed over 0..<range>-1, inclusive.

With two arguments, <number> is bound to a random integer uniformly distributed over <low>..<low>+<range>-1, inclusive.

These are conceptually infinite sequences, and themselves never cause the loop to terminate.

in-random-element

syntax: (<element> <- in-random-element <vector-or-list>)

On each iteration, <element> is bound a random object uniformly chosen from the elements of the <vector-or-list> source.

Elements may be repeated, so this is a conceptually infinite sequence.

in-permutations

syntax: (<perm> <- in-permutations <list> [<n>])

With one argument, <perm> is bound to the successive permutations of the elements of <list> in lexicographic order. No assumptions about the elements are made - if <list> is a multi-set, duplicate permutations will arise.

This is very fast and mutation free. It uses only O(k) space, where k is the number of elements in <list>. Beware that the number of permutations of n elements is n!, which grows extremely fast.

  > (loop ((p <- in-permutations '(a b c) 2)) (write p) (newline))
  (a b)
  (a c)
  (b a)
  (b c)
  (c a)
  (c b)

in-combinations

syntax: (<comb> <- in-combinations <list> <n>)

Similar to IN-PERMUTATIONS, but iterates over all combinations of <n> elements from <list> (i.e. order doesn't matter).

  > (loop ((c <- in-combinations '(a b c) 2)) (write c) (newline))
  (a b)
  (a c)
  (b c)

Using permutations and combinations can be a convenient way to build very extensive (albeit brute-force) test suites, among other things.

in-cartesian-product

syntax: (<list> <- in-cartesian-product <list-of-lists>)

Iterates over the full cartesian product (all joins) of <list-of-lists>, lexicographically (the rightmost list changes first).

  > (loop ((x <- in-cartesian-product '((a b) (c d e)))) (write x) (newline))
  (a c)
  (a d)
  (a e)
  (b c)
  (b d)
  (b e)

in-hash-table

syntax: (<key> <value> <- in-hash-table <table>)

Iterate over the <key> and <value> pairs of a hash-table <table>. The current <key> being iterated over may be deleted from the table or have its value in the table changed safely.

The result is unspecified if you add or remove other values to the table while it is being iterated over. If you want to capture a safe snapshot of the table first, you can convert it to an alist and iterate over those values.

  (define (hash-table-purge! pred table)
    (loop ((k v <- in-table table))
      (if (pred k v)
        (hash-table-delete! table k))))

collecting

syntax: (<list> [<rev>] <- collecting <expr> [<cons> [<finalize> [<init>]]])

The only of the standard iterators that introduces a final variable. <list> is bound only in the => final clause. By default,

By specifying all of <cons>, <finalize> and <init> you could collect into any data structure.

The optional <rev> is a loop variable representing the intermediate consed results. You may override this manually to include or exclude values, or even reset the collected results mid-loop.

This is really just syntactic sugar over an accumulated list to save you the trouble of reversing manually at the end.

Implicit matching

For any body variable (as described above, the ones derived from iterators, e.g. the elements in a list), instead of a simple name you can use any sexp, and it will be matched against the result as in Common-Lisp's destructuring-bind, except using the matchable syntax. So for example, to iterate nicely over the pairs in an alist, you just do

  (loop (((k . v) <- in-list alist))
    (print "key: " k " value: " v))

This costs nothing if you don't use it, and is fast even if you do.

Extending

Adding your own iterators is easy. When a loop includes a binding such as

  (left ... <- in-iterator right ...)

then the iterator itself is called as a macro in the following form:

  (in-iterator ((left ...) (right ...)) next . rest)

where next and rest are the continuation. The continuation expects to be passed the appropriate information to insert in the loop, in the following form:

  (next ((temp-var value) ...)     ; one-time bindings outside the loop
        ((loop-var init step) ...) ; do-style description of loop variables
        (done? ...)                ; termination tests
        ((body-var value) ...)     ; body variables
        ((final-var value) ...)    ; final result bindings in => clause
        . rest)

Note that any or all of the terms may be empty lists - the iterator doesn't have to do anything.

As an example, consider the following simplified implementation of IN-LIST:

  (define-syntax in-list
    (syntax-rules ()
      ((in-list ((elt) (init-list)) next . rest)
       ;; pass the info to next
       (next ()                              ; no outer let bindings
             ((ls init-list (cdr ls)))       ; loop variable, init, step
             ((null? ls))                    ; termination tests
             ((elt (car ls)))                ; body variables and values
             ()                              ; no final result bindings
             . rest))))

This implementation of IN-LIST causes the code

  (loop ((x <- in-list '(1 2 3)))
    (print x))

to expand to

  (let ((ls '(1 2 3)))
    (if (null? ls)
      (if #f #f) ; unspecified value returned by default
      (let ((x (car ls)))
        (print x)
        (lp (cdr ls)))))

Note the outer let bindings are empty because we don't have anything to remember - the loop just proceeds by cdr'ing down the LS loop variable. In an interator such as IN-VECTOR, where you repeatedly VECTOR-REF the same vector, you'd want to bind the vector once so that it's not evaluated multiple times.

The final result bindings are also usually empty. Currently it's only used by COLLECTING to reverse the list that has been accumulated so far.

Further reading

See the article with message-id <1157562097.001179.11470@i42g2000cwa.googlegroups.com> posted on comp.lang.scheme in September of 2006 for the original version and a brief history of Lisp iteration constructs.

Requirements

matchable

License

public domain

History

0.5
srfi-4 uniform vectors and in-cartesian-product
0.4
fixing bug in IN-FILE
0.3
minor changes and cleanup
0.2
implicit matching
0.1
initial release