foof-loop: A Simple, Extensible Scheme Looping Facility
This is the reference manual for foof-loop version 8. This file was written by Taylor R. Campbell and is placed in the Public Domain. All warranties are disclaimed.
The most recently released version of this file is permanently stored at http://mumble.net/~campbell/scheme/foof-loop.txt.
- foof-loop: A Simple, Extensible Scheme Looping Facility
- Loop Structure
- Built-in Iterators
- Writing Custom Iterators
- Porting to New Schemes
- Design Notes and Rationale
Looping is one of the most fundamental concepts of computer programs, and it is not a simple concept that can be described succinctly once and for all programs; therefore, it is in our interest to have some way to write loops clearly. Although there are far too many kinds of loops -- perhaps an infinite number -- for any one framework to give a name to each conceivable way to express them, we should like to have a cohesive framework for expressing a large subset of the common kinds of loops in a flexible but simple way.
Basic Looping: Named LET versus DO versus LOOP
Let us begin with a simple example: a loop that counts the number of elements satisfying a certain predicate in a sequence. What sort of sequence? We shall try lists first. In standard Scheme, this might be written with named LET or DO; let us choose DO, simply because it is slightly more concise for this form of loop:
(define (count-matching-items list predicate) (do ((list list (cdr list)) (count 0 (if (predicate (car list)) (+ count 1) count))) ((null-list? list) count)))
This is fairly straightforward, but if we wish to substitute a vector for a list, we must alter the structure of the loop so that it steps through the indices of the vector, rather than through the list's structure. That is, we must write in precise detail the *means* of looping:
(define (count-matching-items vector predicate) (do ((index 0 (+ index 1)) (count 0 (if (predicate (vector-ref vector index)) (+ count 1) count))) ((= index (vector-length vector)) count)))
Even worse, if we wish to count the matching items read from an input port, we must restructure the whole loop using named LET, because we need to perform an action when the loop is entered *before* the termination condition:
(define (count-matching-items reader input-port predicate) (let continue ((count 0)) (let ((item (reader input-port))) (if (eof-object? item) count (continue (if (predicate item) (+ count 1) count))))))
We should like a notation that allows the programmer to specify *what* iteration, not *how* to iterate. We want to give the programmer the tools to request iteration over each element in a list, or each element in a vector, or each item read from an input port. LOOP lets us do so:
(define (count-matching-items list predicate) (loop ((for item (in-list list)) (with count 0 (if (predicate item) (+ count 1) count))) => count))
How do we read this mountain of syntactic sugar? The first part is LOOP, which is the name of the macro that invokes the whole construct. The second part is a list of iteration clauses, each of which acts in parallel through the whole iteration; that is, there are no nested loops here. (FOR ITEM (IN-LIST LIST)) asks for ITEM to be bound to each element of the list LIST; (WITH COUNT 0 <update>) asks COUNT to be a variable initialized to 0 and updated at each iteration to the value of <update>. Finally, the return value of the loop, once any of the iterators can go no further (in this case, once we have run through all elements of LIST), is specified with the `=>' token followed by the expression to return -- here, the number of matching items we counted.
We desired a way to write *what* iteration, and not *how*, and LOOP lets us do so. If what we want is to iterate over each element of a vector, we need simply substitute IN-VECTOR for IN-LIST. If what we want is to iterate for each item read from an input port, we need simply substitute IN-PORT.
(This code still must specify information about what type of sequence we are iterating through -- IN-VECTOR vs IN-LIST &c. --, and we might wish to elide that as well and substitute IN-COLLECTION or IN-SEQUENCE, or even IN-ASSOCIATION for alists & hash tables &c., but we do not provide any built-in mechanism for doing so. The author has, however, designed a generic collection iteration protocol, based on Dylan's, for which there are such LOOP iterators as IN-COLLECTION &c. They fall outside the scope of this introduction, however.)
Explicit Control of Iteration
An astute reader might wonder why we need a `=>' token to indicate the final value returned by the loop; a differently astute reader might wonder how LOOP can possibly be as general as DO or named LET without any program body inside the loop. We can, in fact, write a program body, and distinguishing the final expression from the body is precisely the reason for the `=>' token. Indeed, not only can we write an arbitrary program body as in DO, but we can even give a name to the loop and specify where to continue to the next iteration as we please. For example, if we wished to *find* a matching item, rather than to count all of them, we could write a procedure to do so thus:
(define (find-matching-item list if-absent predicate) (loop continue ((for item (in-list list))) => (if-absent) (if (predicate item) item (continue))))
In this example, the loop continues only if ITEM fails to satisfy PREDICATE. Only if we have run through every element of LIST, and found none satisfying PREDICATE, shall we evaluate the final expression and call IF-ABSENT. Again, we can always substitute IN-VECTOR, IN-PORT, or any other iterator as we please, for IN-LIST here.
In fact, not only can we give a name to the loop and invoke it explicitly to continue iteration, but we are not limited to calling it in tail positions in the loop body. We could, for example, write a (single-list) MAP procedure thus:
(define (map procedure list) (loop recur ((for element (in-list list))) => '() (cons (procedure element) (recur))))
If there are no more elements in LIST, then the loop yields the empty list; otherwise, it recursively processes the remaining elements of the list, and constructs a pair of the mapped ELEMENT and all the mapped remaining elements.
Finally, we need not supply a final expression at all, in which case what the loop returns by default is unspecified. For example, to write each element of a list LIST followed by a newline, we could use the following program:
(loop ((for element (in-list list))) (write element) (newline))
Updating Loop Variables
A third sort of astute reader may wonder still how LOOP can be as expressive and flexible as named LET, which allows for the updates of loop variables (which we introduced in LOOP using WITH clauses) when the loop procedure is called. In fact, LOOP is a drop-in replacement for named LET, provided that we use the loop procedure only in operator positions, because it is not actually a procedure but a macro, for reasons that will become apparent shortly. The following procedure's meaning is invariant whether we write LOOP or LET:
(define (reverse-map procedure list) (let continue ((list list) (result '())) (if (null-list? list) result (continue (cdr list) (cons (procedure (car list) result))))))
As loop clauses, (<variable> <initializer>) and (WITH <variable> <initializer>) are both equivalent to named LET bindings, and we can update the variables by passing new values to the named loop. As with named LET, the update is purely functional; that is, it works by rebinding, *not* by assignment.
But one of the primary gripes one hears with named LET is the distance between the list of loop variables and the expressions by which they are updated. The matter is not helped by the only association between the two: position. In LOOP, on the other hand, we may choose whichever of positional and named update we please. For example, if we wish to partition a list by a predicate by adding an element to one of two resultant lists, and leaving the other untouched, we might write this (following SRFI 1's specification of the procedure):
(define (partition predicate list) (loop continue ((for element (in-list list)) (with satisfied '()) (with unsatisfied '())) => (values (reverse satisfied) (reverse unsatisfied)) (if (predicate element) (continue (=> satisfied (cons element satisfied))) (continue (=> unsatisfied (cons element unsatisfied))))))
Iterator-Supplied Loop Variables
Sometimes it is not enough just to step through each element of a list or a vector; we may need to find where in the sequence we are, or move to another position. Some iterators, therefore, can expose the underlying loop variables, specific to each iterator. IN-LIST exposes the pair whose car is the current element of the list. It also ensures that modifying the cdr of that pair is safe and does not affect the iteration, so we can reverse a list destructively like so:
(define (reverse! list) (loop ((for element pair (in-list list)) (with tail '() pair)) => tail (set-cdr! pair tail)))
We step through the list, letting TAIL be () initially and then the last pair we encountered; for each pair we encounter, we set its cdr to be the current tail. We happen not to use ELEMENT here, although we could easily turn this into a REVERSE-MAP! procedure like so:
(define (reverse-map! procedure list) (loop ((for element pair (in-list list)) (with tail '() pair)) => tail (set-car! pair (procedure element)) (set-cdr! pair tail)))
Often, in processing certain sorts of lists, we find nested sublists which we want to coalesce into the containing list; Scheme's BEGIN works like this. We can flatten as we go along by updating the loop variable of IN-LIST explicitly, rather than simply letting it to be implicitly updated to the cdr of the current pair:
(define (flatten-begins list) (loop continue ((for element pair (in-list list)) (with subforms '())) => (reverse subforms) (if (and (pair? element) (eq? 'BEGIN (car element))) (continue (=> pair (append (cdr element) (cdr pair)))) (continue (=> subforms (cons element subforms))))))
In the case of vector iterators, the underlying loop variable is the index into the vector. For instance, to shuffle the contents of a vector by the Fisher-Yates method, given a procedure (RANDOM-INTEGER <exclusive upper bound>), we might write this:
(define (shuffle-vector! vector) (loop ((for element i (in-vector vector))) (let ((j (random-integer (+ i 1)))) (vector-set! vector i (vector-ref vector j)) (vector-set! vector j element))))
There are two iterators supplied for iterating over numbers: UP-FROM and DOWN-FROM. (These violate the convention of the prefix of `IN-' on iterator names, but after long and arduous discussion of the matter, the author and his minions concluded that these are better than any of the alternatives that were considered.) For example, to construct a list of a given length that contains the result of applying a given procedure to each respective index in the list, we could write this:
(define (list-tabulate length procedure) (loop ((for index (up-from 0 (to length))) (with list '() (cons (procedure index) list))) => (reverse list)))
NOTE: The keyword TO is part of the syntax of UP-FROM which specifies an upper bound for the iteration. It is *not* a separate procedure or special operator.
We can step by a specified increment as well by adding an on argument (BY <increment>); for instance, to produce a list of the even integers less than N, we might write this:
(loop ((for integer (up-from 0 (to N) (by 2))) (with evens '() (cons integer evens))) => (reverse evens))
We can also omit TO in order to get an unbounded iteration, which presumably we intend to terminate by some other means. For example, to find the length of a list (which we assume to be finite and non- circular), we might write the following:
(define (unsafe-length list) (loop ((for element (in-list list)) (for length (up-from 0))) => length))
(FOR <variable> (UP-FROM <start> [(TO <end>)] [(BY <step>)])) iterates with <variable> initially bound to <start> and then updated at each iteration to <step> added to its current value, until its value has reached or exceeded <end>. DOWN-FROM works slightly differently, however, and illustrates an important point in the design of LOOP, which I shall give its own paragraph to emphasize its significance.
Lower bounds are inclusive. Upper bounds are exclusive.
This means that, while UP-FROM includes the start and excludes the end, DOWN-FROM, which has the same syntax, excludes the start and includes the end, if it is reached. (It may be the case that <start> - <end> is not an even multiple of <step>, in which case <end> will be excluded anyway.) This bears repeating.
Lower bounds are inclusive. Upper bounds are exclusive.
Whether we are iterating forward or backward, over ranges of numbers or elements of vectors (with IN-VECTOR-REVERSE, which is just like IN-VECTOR except that it iterates over the reverse sequence of elements), the lower bound of the iteration is inclusive and the upper bound of the iteration is exclusive.
This has an important consequence for the nature of the loop variable in DOWN-FROM (or IN-VECTOR-REVERSE, or any other iteration of numbers from positive infinity toward negative infinity): the decrement is subtracted from the loop variable's value on *entry* to the loop. This means that if the variable is initialized to 0, its first value in the loop body will be -1, not 0; and if it is updated to N, the next value that it will have in the loop body (if it satisfies no termination condition) will be N - 1.
Let me say it one more time, just to be sure that the reader understands.
Lower bounds are inclusive. Upper bounds are exclusive.
The reason for this design decision is that reverse iteration is extremely counter-intuitive, and any inconsistency in convention is sure to be an eternal source of off-by-one errors. This is not to say that off-by-one errors are impossible, but that they are much easier to accidentally stumble upon in reverse iteration. The use of inclusive lower bounds and exclusive upper bounds is already very much established in Scheme, and it has some convenient properties for many programs.
Where one must use any other sort of bounds, because such bounds are the exception rather than the rule, programmers should draw attention to the fact, because this is so easy to mix up inadvertently. If possible, use inclusive lower bounds and exclusive upper bounds, but if it is truly necessary to use any other class of bounds, they should be expressed explicitly using WITH, LET, and UNTIL clauses. For example, to loop with inclusive bounds on both ends, write
(loop (... (with i start (+ i step)) (until (> i stop)) ...) ...).
At this point, the only iterators we have seen are those that step through sequences. Every time we have accumulated, say, a list of values as the result of a loop, we have always had to write it with a manual loop variable, a manual CONS, and a manual REVERSE at the end. We can simplify this by using accumulating iterators such as LISTING. If we use LISTING, (single-list) MAP becomes:
(define (map procedure list) (loop ((for element (in-list list)) (for result (listing (procedure element)))) => result))
Accumulators, as the reader has probably noticed, are not named according to the `IN-' convention for iterators. Instead, they are named with an `-ING' suffix. Furthermore, most of them follow the same basic structure:
(FOR <accumulated-result> (<accumulating> [(INITIAL <initial value>)] <datum or conditions>))
<Initial value> is something to seed the accumulation with. In LISTING, for example, it is the tail of the list after all the accumulated elements.
There are several ways to express the accumulation of a datum. The simplest is with an expression whose value is always accumulated; we showed that above in the most recent definition of MAP. We may also accumulate based on the value of a condition; for example, to filter from a list only those items that satisfy a predicate, we might write the following:
(define (filter predicate list) (loop ((for element (in-list list)) (for result (listing element (if (predicate element))))) => result))
Sometimes the value of the condition is also needed to compute the datum to accumulate, so we may also express the accumulation with a notation much like COND's `=>' clause, and accumulate a list of all non-false values yielded by a procedure applied to each element of a list:
(define (filter-map procedure list) (loop ((for element (in-list list)) (for result (listing (procedure list) => (lambda (x) x)))) => result))
Finally, the `=>' clause may be preceded by a test, to support multiple return values and to allow the passage of #F to the procedure following the `=>', as in SRFI 61's extended COND. An example of this is left to the imagination of the reader.
Along with LISTING, there are three other similar list-related accumulators: LISTING-REVERSE, APPENDING, and APPENDING-REVERSE. For elements elt_0, elt_1, ..., elt_n-1, elt_n, LISTING yields
(LIST elt_0 elt_1 ... elt_n-1 elt_n);
(LIST elt_n elt_n-1 ... elt_1 elt_0);
(APPEND elt_0 elt_1 ... elt_n-1 elt_n);
and APPENDING-REVERSE yields
(APPEND (REVERSE elt_n) (REVERSE elt_n-1) ... (REVERSE elt_1) (REVERSE elt_0)),
although it can be implemented considerably more efficiently than (APPENDING (REVERSE ...)).
There are also two destructive list-related accumulators: LISTING! and LISTING-INTO!. While LISTING must build up a list in reverse and then reverse it at the end, which requires twice as many cons cells as the result uses, LISTING! builds up a list forward by setting the cdr of each previous pair constructed, and requires only one intermediate cons cell, the initial one. LISTING-INTO! uses no intermediate cons cells because it is seeded with an initial pair to set the cdr of:
(let ((x (cons 'initial '()))) (loop ((for i (up-from 0 (to 10))) (for result (listing-into! x (* i i) (if (even? i)))))) x) ;Value: (INITIAL 0 4 16 36 64)
LISTING! and LISTING-INTO! are dangerous, however, because they are non-reentrant -- that is, it is unsafe to use them where non-local control transfers to reenter the loop body may occur, or where the loop may be continued at multiple different points.
Other LOOP Clauses: Local Bindings, Explicit Termination Conditions
So far we have seen FOR and WITH clauses in LOOP expressions -- FOR to dispatch to some specialized iterator, and WITH to express manual loop variables. These are the main clauses of interest, but there are four others: WHILE, UNTIL, LET, and LET-VALUES.
WHILE and UNTIL allow the programmer to specify termination conditions for the loop following which the final expression will be evaluated. Although it is possible to explicitly continue the loop in one arm of a conditional, this requires some boilerplate to invoke the final expression as well in the other arm of the conditional. For example, this procedure reads lines from an input port until an empty line, and returns a list of the non-empty lines, assuming a READ-LINE procedure:
(define (read-non-empty-lines input-port) (loop ((for line (in-port input-port read-line)) (until (string-empty? line)) (for lines (listing line))) => lines))
Iteration will terminate either when READ-LINE returns an EOF object, tested implicitly by IN-PORT, or when the line read is an empty string, tested explicitly by the user. In either case, LISTING will bind LINES to the list of lines in the final expression, which we return there.
(LET <variable> <expression>) binds <variable> to <expression>, after it has been determined that none of the FOR iterators have run out, and after all of their variables have been bound. <Variable> is visible inside the loop body. (LET-VALUES <bvl> <producer>) is similar, but it binds multiple values, rather than a single value; (LET <variable> <expression>) is equivalent to (LET-VALUES (<variable>) <expression>).
All variables bound by the iterators are visible in the right-hand sides of LET & LET-VALUES clauses, and all those variables as well as all variables on the left-hand sides of LET & LET-VALUES clauses are visible in the user termination conditions -- those supplied by WHILE and UNTIL -- and in the loop body.
This concludes the introduction to foof-loop. The rest of this document contains a reference manual for foof-loop as well as some further material.
LOOP forms have the following structure:
(LOOP [<loop-name>] (<loop-clause> ...) [=> <final-expression>] [<body>])
<loop-clause> may have one of the following forms:
(<variable> <initializer> [<update>]) (WITH <variable> <initializer> [<update>]) (FOR <variable> ... (<iterator> <argument> ...)) (LET <variable> <expression>) (WHILE <condition>) (UNTIL <condition>)
A loop operates by stepping through each iterator in parallel until one of them runs out; that is, when one of its termination conditions holds. When this happens, if the <final-expression> is supplied, it is evaluated in an environment with the final values of all the loop variables bound, as well as any final bindings supplied by the iterators.
If <loop-name> is not supplied, the loop implicitly proceeds at the end of the loop body, which may be empty. If <loop-name> is supplied, it is bound to a macro that the loop body must call to explicitly proceed the loop. This macro may be called arbitrarily many times, in tail or non-tail positions; it is appropriate for recursive loops. (Note, though, that some iterators are non-reentrant, which makes multiple calls unsafe. These iterators are discouraged, however, and by convention are marked with a suffix of `!'; examples of non-reentrant iterators include LISTING! and LISTING-INTO!.)
The macro <loop-name> accepts arguments by which to update the loop variables. Arguments may be passed by position, or by name, or both. Positional arguments come first; they correspond with the first loop variables in the same positions specified in WITH clauses. Arguments passed by name to update the named loop variables are written with the form (=> <variable> <update>).
All arguments are optional; positional arguments supply values for the first arguments that are supplied. All loop variables have default update values, if they are not explicitly updated when the macro <loop-name> is invokes: those created by (WITH <variable> <initializer> <update>) clauses are by default updated to the value of <update>, those created by (WITH <variable> <initializer>) remain the same at the next iteration, and those created by iterators are updated by means that the iterator supplies implicitly.
(loop continue ((with a 0) (with b '() ;; Default update for B: (cons a b)) (for c d (in-list '(i j k p q r)))) (write (list a b c d)) (newline) (continue (+ a 1) ;First position, for the first (WITH A 0). (=> d (cddr d)))) ;D is from (FOR C D (IN-LIST ...)). ;; Output: ;(0 () i (i j k p q r)) ;(1 (0) k (k p q r)) ;(2 (1 0) q (q r))
In the loop, there are several classes of variable bindings that may arise:
- /Loop variables/ are variables that are initialized at the beginning of the loop and updated at every iteration non- destructively -- that is, by rebinding through procedure call, not by assignment. Loop variables are bound in the user termination conditions, the user variable expressions, the loop body, and the final expression.
- /Entry variables/ are variables whose values are computed when each iteration of the loop is entered, before the termination conditions have been tested. They cannot be updated like loop variables; each is always initialized on loop entry to the value of the same expression. This means that the expression need not be duplicated at the beginning of the loop and when the loop is continued. Entry variables are bound in the user termination conditions, the user variable expressions, the loop body, and the final expression.
- /Body variables/ are variables whose values are computed at each iteration of the loop only if iteration has not been terminated from iterator termination conditions. Body variables are bound in the user termination conditions, the user variable expressions, and the loop body.
- /User variables/ are variables whose values are computed after all iterator termination conditions have failed, and after the body variables have been bound, but before the user termination conditions have been tested. User variables are introduced by LET and LET-VALUES clauses in LOOP. User variables are bound only in the user termination conditions and the loop body.
- /Final variables/ are variables that are computed once at the end of the loop, once any termination condition has been met. Final variables are bound only in the final expression.
The loop iteration is controlled by the loop clauses:
(WITH <variable> <initializer> [<update>])
Introduces a loop variable named <variable>, initialized to the value of <initializer>. On each iteration of the loop, the variable may be explicitly updated; if it is not explicitly updated, but there was an <update> expression supplied, then the variable is updated to the value of that expression in an environment where all the loop variables, entry variables, body variables, and user variables are bound; otherwise, <variable> remains unchanged at the next iteration of the loop.
(<variable> <initializer> [<update>])
Identical to (WITH <variable> <initializer> [<update>]). This alternative form is provided so that the LOOP syntax is compatible with named LET, and very similar to DO. For instance,
(let next ((x 0)) (if (< x 10) (begin (write x) (next (+ x 1))))), (loop next ((x 0)) (if (< x 10) (begin (write x) (next (+ x 1))))), (do ((x 0 (+ x 1))) ((>= x 10)) (write x)), and (loop ((x 0 (+ x 1)) (until (>= x 10))) (write x))
are all equivalent.
The WITH syntax is preferred; it has greater visual distinction, and it it is easier for pretty-printers to indent better with less effort.
(FOR <variable> ... (<iterator> <argument> ...))
General iteration form. Although <iterator> can do anything with the <variable>s and <argument>s, by covention the <variable>s are variables to be bound in the loop body or the final expression, and the <argument>s are expressions or directives controlling the iteration.
No guarantees are made preserving the ordering of the FOR clauses; it is an error to rely on any properties of the ordering.
(LET <variable> <expression>) (LET-VALUES <bvl> <expression>)
These introduce user variable bindings. LET binds <variable> to the single value of <expression> inside the body bindings and before the user termination conditions and loop body. LET-VALUES binds all the variables in the bound variable list <bvl> to the respective values in the multiple values returned by <expression>. (LET <variable> <expression>) is equivalent to the clause (LET-VALUES (<variable>) <expression>).
(WHILE <condition>) (UNTIL <condition>)
User-supplied termination conditions. The termination conditions are evaluated in an environment where all of the loop variables, entry variables, body variables, and user variables are bound. All of the iterator-supplied termination conditions are guaranteed to have failed.
(LAZY-LOOP <loop-name> (<loop-clause> ...)
=> <final-expression> <body>)
Lazily evaluated variant of LOOP, dependent upon SRFI 45 (Primitives for Expressing Iterative Lazy Algoritms). This is equivalent to a LOOP form wrapped in a LAZY form with all calls to the <loop-name> also wrapped in LAZY forms. (See the SRFI 45 for details on LAZY.) The loop's body and final expression both must yield promises in the sense of DELAY or LAZY.
;;; Assume SRFI-40-style streams (see SRFI 40, A Library of ;;; Streams, for details, particularly on even versus odd streams). ;;; Also assume an IN-STREAM iterator. (define (stream-filter predicate stream) (lazy-loop filter ((for element (in-stream stream))) => stream-nil (if (predicate element) (stream-cons element (filter)) (filter)))) ;;; The above definition of STREAM-FILTER is equivalent to the ;;; following: (define (stream-filter predicate stream) (lazy (loop filter ((for element (in-stream stream))) => stream-nil (if (predicate element) (stream-cons element (lazy (filter))) (lazy (filter))))))
The laziness annotations are added around the whole loop and calls to the loop, rather than the loop body, in order to delay any outer bindings of the loop (for example, all expressions supplied as arguments to loop iterators) and any loop variable updates when recursively invoking the loop.
When a LOOP form is fully processed and expanded, the result resembles this:
(LET-VALUES ((<outer-bvl> <outer-producer>) ...) (DEFINE (LOOP-PROCEDURE <loop-variable> ...) (LET-VALUES ((<entry-bvl> <entry-producer>) ...) (LET ((FINISH (LAMBDA () (LET-VALUES ((<final-bvl> <final-producer>)) <final-expression>)))) (IF (OR <termination-condition> ...) (FINISH) (LET-VALUES ((<body-bvl> <body-producer>) ...) (LET-VALUES ((<user-bvl> <user-producer>) ...) (IF (OR <user-termination-condition> ...) (FINISH) <loop-body>))))))) (LOOP-PRODEDURE <loop-initializer> ...))
Each <*-bvl> is a bound variable list supplied by an iterator or by the user. Each <*-producer> is an expression yielding multiple values to be distributed to the corresponding bound variable list. The <user-bvl>s are from user-supplied LET and LET-VALUES clauses in loops; the <user-termination-conditions> are from user-supplied WHILE and UNTIL clauses in loops.
LOOP-PROCEDURE is a name invisible to all of the forms involved in the loop except for the macro to which the loop's name is bound in the loop body. This macro processes its positional and named arguments to form a call to LOOP-PROCEDURE by supplying default update expressions according to WITH clauses or what the iterators provide.
One of the primary goals of LOOP is to generate efficient code. It should be possible to write floating-point matrix multiplication routines as fast with LOOP as by hand, and perhaps faster because the details of the matrix operations can be swept under the rug of a matrix iterator. There may be some questions, however, of the efficiency of the code that LOOP generates.
Use of Assignment
Many other loop macros, such as Common Lisp LOOP, SRFI 42 Eager Comprehensions, and others, update loop variables by assignment, rather than rebinding. Even hand-written loops using named LET or DO are written with assignment, because it is more convenient with large numbers of loop variables which would otherwise have to be enumerated repeatedly for every recursive call to the loop procedure.
This is a death-knell for efficient code, especially in many highly optimizing Scheme compilers, because most Scheme compilers generate heap-allocated cells to store mutable variables. Compilers' lives are simplified by assuming that all variables are immutable; they are free to move references to the variables about as they please. The analysis necessary to transform a program with SET! into a program without SET! but with explicit, heap-allocated cells is trivial, and allows compilers to make the assumption that variables are immutable; subsequently eliding those cells anywhere but in closure references is considerably harder, and not worth the effort for idiomatic Scheme code.
It may be the case that a pure tree interpreter can execute assignment- ridden code more efficiently than procedure calls with large numbers of arguments, because of the overhead of recursively processing each of those arguments. It is also the case that tree interpreters are slow anyway.
Using assignment to update loop variables is a losing proposition in the end. LOOP never introduces assignments to mutable variables; all loop variables are updated by recursive procedure call, i.e. rebinding. Custom iterators are discouraged from doing this as well, although nothing prevents it.
Local Closure for FINISH
Because the construction of a closure for FINISH is costly in many Scheme implementations which have extremely naive compilers, the current implementation of LOOP creates the closure only if necessary to preserve scoping; it will integrate the body of FINISH if and only if there are no user termination conditions, or there are no iterator- supplied termination conditions, iterator-supplied body bindings, and user-supplied bindings around the body.
However, programmers who are worried about the closure for FINISH would most likely be better served by a Scheme implementation that goes to the slighest effort necessary to recognize the local closure and fold the unconditional jumps into the code for the conditional.
Multiple Return Values
Opponents of multiple return values may object to the pervasion of LET-VALUES throughout the expansion and the inefficiency thereof in Scheme implementations that do not subscribe earnestly to the doctrine of multiple return values. LET-VALUES is the only way to allow iterators to bind multiple variables that may have complex inter- dependencies without the indirection and cost of intermediate data structures.
There is no cost on Scheme systems with efficient support for multiple return values. There will be a cost in Scheme systems with costly multiple return values, whether LOOP uses multiple return values, and incurs the cost implicitly, or whether LOOP generates code to construct intermediate data structures. But on Scheme systems with efficient multiple return value support, LOOP incurs no cost. Essentially, by using multiple return values, LOOP defers the question of the cost to the Scheme implementation, rather than incurring cost *wherever* it is used by generating code that *always* constructs intermediate data structures.
Furthermore, for most cases, there is only a single return value anyway. LET-VALUES itself need not incur whatever overhead of multiple return values there is simply by recognizing single-value cases. Unfortunately, the reference implementation of SRFI 11 does not do this, and will generate CALL-WITH-VALUES invocations for every binding clause in the LET-VALUES. Here is an implementation that avoids this:
Index Variables and Generic Arithmetic
Many Scheme implementations provide specialized arithmetic for limited integers fixed to a width that fits inside machine registers; using this arithmetic is much faster than generic arithmetic, where safety and overflow are not issues. These limited integers, often called `fixnums', are typically guaranteed to be the domain of vector indices.
LOOP generates generic arithmetic for index variables, in iterators such as UP-FROM and IN-VECTOR. There are a number of reasons for this:
- There is no portable specialized arithmetic for LOOP to use.
- LOOP is not meant only for code inside inner loops that must be as fast as possible; programs written with LOOP should not fail obscurely because they passed some iterator a non-fixnum value where a fixnum was expected.
- The set of built-in iterators would be complicated by support for fixnum arithmetic. Do we add, for example, UP-FROM-FIXNUM, UP-FROM-UNSIGNED-FIXNUM, and so on? Do we have IN-VECTOR, IN-VECTOR/UNSAFE, and so on?
- Programs that use fixnum arithmetic are harder to understand, because with fixnum arithmetic comes a plethora of subtle implications on the program. For example, the usual formula for finding the average of two integers -- (QUOTIENT (+ A B) 2) -- is no longer valid under fixnum arithmetic, because the sum may overflow the bounds of fixnums. A program that once used Scheme vectors but has been converted to use sparse vectors, which allow arbitrary integer indices, may no longer be able to use fixnum arithmetic.
It is up to the compiler to specialize the generic arithmetic where appropriate, such as the indices inside IN-VECTOR, and to insert pre- loop type and range checks where appropriate. Programmers who use compilers incapable of this should consider finding better compilers, or improving the ones they use.
Common Lisp, by comparison, has no specialized arithmetic either; rather, it has programmer-supplied declarations regarding the types of variables' values. This is even more specific than specialized arithmetic, and does not require the introduction of N new arithmetic routines for every kind of specialized arithmetic, so it composes with macros well. This would be a better way to go about improving the efficiency of arithmetic, and it would require minimal changes to LOOP -- at most, support for iterators to supply declarations in the loop.
Iterating across Lists
(FOR <element> [<pair>] (IN-LIST <list> [<successor>]))
Iterates for each successive pair in <list>, binding the variable <pair>, if supplied, to that pair, and the variable <element> to the car of that pair. The successor of the pair is taken by applying <successor> to the current pair. The iteration runs out when the successor is no longer a pair.
<Successor>, if supplied, must be a unary procedure. The default value of <successor> is CDR. <List> and <successor> are evaluated once before the loop begins.
<Pair> is a loop variable; its value in the final expression is unspecified.
<Element> is a body variable.
The next pair is taken before the loop body, so the loop body may reliably apply SET-CDR!, or its equivalent for the given successor procedure, to the pair, without altering the iteration. Note, though, that <successor> will be called irrespective of whether <pair> is updated explicitly.
(loop ((for a (in-list '(a b c))) (for b (in-list '(p q r)))) (write (list a b)) (newline)) ;; Output: ;(a p) ;(b q) ;(c r) (loop ((for x (in-list '(1 2 3))) (with y 0 (+ y (* x 2)))) => y) ;Value: 12
;;; PAIR-FOLD from SRFI 1.
(define (pair-fold kons knil list) (loop ((for elt pair (in-list list)) (with knil knil (kons pair knil))) => knil))
(FOR <elements> [<pairs>] (IN-LISTS <lists> [<tail>]))
Iterates for each successive pair in each parallel element of <lists>, binding the variable <pairs>, if supplied, to be a list of those pairs, and the variable <elements> to a list of the cars of those pairs, prepended to the value of <tail> evaluated at each iteration. If no <tail> is supplied, its default value is the empty list. The successor of each pair is taken by CDR. The iteration runs out when any successor is no longer a pair.
<Lists> must be a non-empty list; it is an error if otherwise. It is evaluated once before the loop begins.
<Pairs> is a loop variable; its value in the final expression is unspecified.
<Elements> is a body variable.
The cdrs are taken before the loop body, so the loop body may reliably apply SET-CDR! to any of the pairs without altering the iteration.
;;; Transpose a matrix represented as a nested list.
(loop ((for columns (in-lists '((a b c) (d e f)))) (with rows '() (cons columns rows))) => rows) ;Value: ((c f) (b e) (a d)) (define (every? predicate list . lists) (loop proceed ((for elts (in-lists (cons list lists)))) (and (apply predicate elts) (proceed)))) (define (any predicate list . lists) (loop proceed ((for elts (in-lists (cons list lists)))) (or (apply predicate elts) (proceed))))
;;; SRFI 1's signature for FOLD. Notice that KNIL is the *last* ;;; argument to the FOLD procedure, so we have to tack it on to the ;;; end of the argument list.
(define (fold kons knil list . lists) (loop ((with knil knil (apply kons arguments)) (for arguments (in-lists (cons list lists) (cons knil '())))) => knil))
Iterating across Vectors and Strings
(FOR <element> [<index>] (IN-VECTOR <vector> [<low> [<high>]])) (FOR <element> [<index>] (IN-STRING <string> [<low> [<high>]])) (FOR <element> [<index>] (IN-VECTOR-REVERSE <vector> [<high> [<low>]])) (FOR <element> [<index>] (IN-STRING-REVERSE <string> [<high> [<low>]]))
These iterate for each successive index in the vector or string, respectively, binding the variable <index>, if supplied, to that index, and the variable <element> to the element at that index.
IN-VECTOR and IN-STRING run from <low>, inclusive, to <high>, exclusive; IN-VECTOR-REVERSE and IN-STRING-REVERSE run from <high>, exclusive, to <low>, inclusive.
<Vector> or <string> must be a vector or a string, respectively. <Low> and <high>, if supplied, must be exact, non-negative integers, and valid bounds for <string> or <vector>. The default values of <low> and <high> are 0 and the length of the vector or string, respectively. <Vector>, <string>, <low>, and <high> are all evaluated once before the loop begins.
<Index> is a loop variable; its value in the final expression is the value of the last bound of iteration (the value of <high> for IN-VECTOR and IN-STRING; the value of <low> for IN-VECTOR-REVERSE and IN-STRING-REVERSE). With the reverse iterators, IN-VECTOR-REVERSE and IN-STRING-REVERSE, explicitly updating <index> when continuing the loop will cause the next value seen by the loop body to be one *less* than what the index was updated to. That is, on loop entry, <index> is assumed to be an exclusive upper bound, and so it is tested for termination and then decremented to find the actual index.
<Element> is a body variable.
Because the lower bound is inclusive and the upper bound is exclusive, for IN-VECTOR and IN-STRING, the first value of <index> is <low>, and the value of <index> in the final expression is <high>; while for IN-VECTOR-REVERSE and IN-STRING-REVERSE, the first value of <index> in the loop body is one less than <high>, and the value of <index> in the final expression is <low>.
(loop ((for a i (in-vector '#(foo bar baz))) (for b j (in-string-reverse "abcdefghi" 6 3))) => (list i j) (write (list a i b j)) (newline)) ;Value: (3 3)
;; Output: ;(foo 0 f 5) ;(bar 1 e 4) ;(baz 2 d 3)
;;; Find the first index of an element satisfying a predicate.
(define (vector-index vector predicate) (loop proceed ((for elt index (in-vector vector))) (if (predicate elt) index (proceed))))
;;; Copy characters from one string to another.
(define (string-copy! target tstart source sstart send) (loop ((for char (in-string source sstart send)) (with index tstart (+ index 1))) (string-set! target index char))) (loop proceed ((for v i (in-vector '#(a b c d e f g h i j k l m n o p q r s t u v w x y z)))) (write (list v i)) (newline) (proceed (=> i (+ 1 (* i 2))))) ;; Output: ;(a 0) ;(b 1) ;(d 3) ;(h 7) ;(p 15)
Iterating through Input
(FOR <datum> (IN-PORT <input-port> [<reader> [<eof?>]]))
Iterates for successive data read from <input-port>, binding the variable <datum> to the datum read by applying the value of <reader>. Iteration terminates when the datum read satisfies the predicate <eof?>.
<Input-port> must be an input port. <Reader> and <eof?>, if supplied, must be unary procedures. The default value of <reader> is READ-CHAR, and the default value of <eof?> is EOF-OBJECT?. <Input-port>, <reader>, and <eof?> are all evaluated once before the loop begins.
<Datum> is an entry variable.
(define (read-line input-port) (let ((initial (peek-char input-port))) (if (eof-object? initial) initial (loop ((for char (in-port input-port)) (until (char=? char #\newline)) (with chars '() (cons char chars))) => (list->string (reverse chars)))))) (define (read-all input-port) (loop ((for datum (in-port input-port read)) (with data '() (cons datum data))) => (reverse data)))
(FOR <datum> (IN-FILE <pathname> [<reader> [<eof?>]]))
Opens an input port from the file named by <pathname>, iterates as with IN-PORT from the newly opened input port, and closes the input port after iteration terminates.
<Datum> is an entry variable.
(define (read-lines-from-file pathname) (loop ((for line (in-file pathname read-line)) (with lines '() (cons line lines))) => (reverse lines))) (define (file-length pathname) (loop ((for char (in-file pathname)) (with count 0 (+ count 1))) => count))
Iterating over Numbers
(FOR <number> (UP-FROM <low> [(TO <high>)] [(BY <step>)])) (FOR <number> (DOWN-FROM <high> [(TO <low>)] [(BY <step>)]))
Iterates for each successive number starting at <low> and incrementing by <step> at every iteration, for UP-FROM; or starting at <high> and decrementing by <step> at every iteration, for DOWN-FROM. <Number> is bound to the current number. If the TO parameter is supplied, iteration terminates when the number exceeds the given bound.
<Low>, <high>, and <step> must all be exact numbers. If a TO parameter is supplied, they must furthermore all be real numbers. They are all evaluated once before the loop begins.
<Number> is a loop variable, *and*, in DOWN-FROM, an entry variable. If it is updated explicitly, its value at the next iteration in the loop's body will be one <step> *less* than what it was updated to explicitly. Because of this, although it is initialized to <high>, it will never have a value of <high> (unless it is explicitly updated to the value of (+ <high> <step>)).
(define (iota count start step) (loop ((for n (up-from 0 (to count))) (for result (listing (+ start (* n step))))) => result))
;;; Note that the following is incorrect, if the arguments to IOTA ;;; may be inexact numbers:
(define (iota count start step) (loop ((for n (up-from start (to (+ start (* count step))) (by step))) (for result (listing n))) => result)) (define (sieve n) (let ((table (make-bit-string (- n 2) #t))) (define (prime? k) (bit-string-ref table (- k 2))) (define (not-prime! k) (bit-string-clear! table (- k 2))) (define (purge-multiples i) (loop ((for j (up-from (* i i) (to n) (by i)))) (not-prime! j))) (loop proceed ((for i (up-from 2 (to n))) (with primes '())) => (reverse primes) (if (prime? i) (begin (purge-multiples i) (proceed (=> primes (cons i primes)))) (proceed))))) (define (vector-quick-sort! elt< vector start end) (loop sort ((start start) (end end)) (if (< 1 (- end start)) (let ((pivot (select-pivot vector start end))) (loop continue ((i start) (j end)) (let ((i (loop scan ((for i (up-from i))) (if (elt< (vector-ref vector i) pivot) (scan) i))) (j (loop scan ((for j (down-from j))) (if (elt< pivot (vector-ref vector j)) (scan) j)))) (if (< i j) (begin (vector-exchange! vector i j) (continue (+ i 1) j)) (begin (sort (=> end i)) (sort (=> start (+ j 1)))))))))))
Accumulating iterators all follow the same basic forms: accumulate each value of an expression into a result, with an optional condition for the accumulation. Accumulators differ according to the way that they accumulate, whether the intermediate result of accumulation is available in the body of the loop or whether only the fully accumulated result is available in the final expression, and whether the accumulator accepts an optional initial value argument.
The syntax of accumulators is as follows:
(FOR <result> (<accumulating> <datum>))
<Datum> is evaluated at each iteration in an environment with all loop, entry, and body variables bound from the current iteration.
- Conditional -- evaluate and accumulate <datum> if and only if <condition> is not false:
(FOR <result> (<accumulating> <datum> (IF <condition>)))
<Datum> and <condition> are evaluated at each iteration in an environment with all loop, entry, and body variables bound from the current iteration.
- Conditional -- if <condition> is not false, apply the value of <mapper> to the value of <condition> to yield the value to accumulate:
(FOR <result> (<accumulating> <condition> => <mapper>))
<Condition> and <mapper> are evaluated at each iteration in an environment with all loop, entry, and body variables bound from the current iteration. <Mapper> is evaluated only if <condition> is not false.
- Generalized conditional, as in SRFI 61-style `=>' clauses in COND -- evaluate <generator>, and if <tester> applied to all the values returned by <generator> returns true, accumulate the value returned by <mapper> when applied to all the values returned by <generator>:
(FOR <result> (<accumulating> <generator> <tester> => <mapper>))
<Generator>, <tester>, and <mapper> are evaluated at each iteration in an environment with all loop, entry, and body variables bound from the current iteration. <Mapper> is evaluated only if <test> yields true.
Some accumulators also have initial values, supplied with an optional (INITIAL <initial-value>) form before any of the other arguments to the accumulator.
Accumulators that are allowed to operate by destructively modifying internal state are marked with `!' suffixes. These accumulators are /non-reentrant/, which means that it is unsafe to write loops that may be recursively run more than once, and that non-local re-entry may destroy values returned from the loop.
(FOR <result> (LISTING [(INITIAL <tail>)] ...)) (FOR <result> (LISTING-REVERSE [(INITIAL <tail>)] ...))
These each produce lists of the accumulated data with a final cdr of <tail>. LISTING yields a list in the order that the data were produced; LISTING-REVERSE yields a list in the reverse order of that.
<Tail>, if supplied, is evaluated once before the beginning of the loop. The default value of <tail> is the empty list.
For LISTING, <result> is a final variable.
For LISTING-REVERSE, <result> is a loop variable; its value in the final expression is the accumulated list in reverse.
(define (list-tabulate length procedure) (loop ((for i (up-from 0 (to length))) (for list (listing (procedure i)))) => list)) (define (take list count) (loop ((for i (up-from 0 (to count))) (for elt (in-list list)) (for prefix (listing elt))) => prefix)) (define (append-reverse list tail) (loop ((for elt (in-list list)) (for result (listing-reverse (initial tail) list))) => result)) (define (unzip5 list) (loop ((for component (in-list list)) (for result1 (listing (first component))) (for result2 (listing (second component))) (for result3 (listing (third component))) (for result4 (listing (fourth component))) (for result5 (listing (fifth component)))) => (values result1 result2 result3 result4 result5)))
(FOR <result> (APPENDING [(INITIAL <tail>)] ...)) (FOR <result> (APPENDING-REVERSE [(INITIAL <tail>)] ...))
These append the accumulated data into a list with a final cdr of <tail>. APPENDING maintains the order of the lists and their elements; APPENDING-REVERSE reverses the elements of the lists. The accumulated data must all be proper lists.
<Tail>, if supplied, is evaluated once in the environment outside the loop. The default value of <tail> is the empty list.
For APPENDING, <result> is a final variable.
For APPENDING-REVERSE, <result> is a loop variable; its value in the final expression is the reverse-concatenated list.
(define (concatenate lists) (loop ((for list (in-list lists)) (for result (appending list))) => list))
(FOR <result> (LISTING! [(INITIAL <tail>)] ...)) (FOR <result> (LISTING-INTO! <pair> [(INITIAL <tail>)] ...))
LISTING! builds up a list with a final cdr of <tail>. LISTING-INTO! builds up a list with a final cdr of <tail> and destructively stores it in the cdr of <pair>. Both use as little intermediate storage as possible, which may involve using destructive operations, and as a consequence are non-reentrant.
<Tail>, if supplied, is evaluated once in the environment outside the loop. The default value of <tail> is the empty list.
<Result> is a final variable.
Accumulating Numerical Results
(FOR <result> (SUMMING [(INITIAL <initial-sum>)] ...)) (FOR <result> (MULTIPLYING [(INITIAL <initial-product>)] ...))
Accumulates sums or products by adding the accumulated data to <initial-sum>, or multiplying the accumulated data by <initial-product>.
<Initial-sum> and <initial-product>, if supplied, must be numbers, and are evaluated once before the loop begins. The default value of <initial-sum> is 0, and the default value of <initial-product> is 1.
<Result> is a loop variable.
(define (count predicate list) (loop ((for elt (in-list list)) (for count (summing 1 (if (predicate elt))))) => count))
(FOR <result> (MINIMIZING [(INITIAL <initial-minimum>)] ...)) (FOR <result> (MAXIMIZING [(INITIAL <initial-maximum>)] ...))
Finds the minimum or maximum datum accumulated. If an initial value expression is provided, it is evaluated once before the beginning of the loop, all of the data involved must be real numbers, and <result> will be a real number. If no initial value is provided, the data must be either real numbers or false, and <result> will be either a real number or false.
<Result> is a loop variable.
Writing Custom Iterators
(NOTE: This section documents the implementation of iterators for Taylor R. Campbell's current version of foof-loop. It does *not* document the version originally written by Alex Shinn (after whose nickname, `foof', the system is named), which has a somewhat less flexible interface for custom iterators.)
One of the main goals of foof-loop is extensibility. To accomplish this goal, writing custom iterators is a simple matter of specifying the syntax of the iterator, all of the loop variables and other variables bound by the loop, and the loop's termination conditions. In order to implement foof-loop in portable SYNTAX-RULES, iterators are macros in continuation-passing style.
When LOOP encounters a clause of the form
(FOR <variable> ... (<iterator> . <arguments>)),
it invokes <iterator> as (<iterator> (<variable> ...) <arguments> <continuation> . <environment>), where the environment is simply information that LOOP needs in order to continue processing the other loop clauses, and where the continuation is a macro that returns control back to LOOP given a description of all the variables bound by and termination conditions tested for the iterator at hand.
The continuation should be invoked as follows:
(continuation ((<outer-bvl> <outer-producer>) ...) ;; Outer variables to be bound once outside the whole loop. ;; Each <outer-producer> is evaluated once in the enclosing ;; environment of the LOOP form. ((<loop-variable> <initializer> <update>) ...) ;; Loop variables. Each <initializer> is evaluated in an ;; environment where the outer variables are bound. Each <update> ;; is evaluated in an environment where the outer variables, loop ;; variables, entry variables, and body variables are all bound. ((<entry-bvl> <entry-producer>) ...) ;; Entry variables, to be bound each time the loop is entered. ;; Each <entry-producer> is evaluated in an environment where the ;; outer variables and loop variables are bound. (<termination-condition> ...) ;; Conditions that will terminate the loop. These are all evaluated ;; in an environment where the outer variables, loop variables, and ;; entry variables are bound. The first condition, of any loop ;; clause, that holds true will terminate the iteration, short- ;; circuiting the rest of the termination conditions. ((<body-bvl> <body-producer>) ...) ;; Variables to be bound around the body of the loop, if the loop ;; does not terminate. ((<final-bvl> <final-producer>) ...) ;; Variables to be bound around the final expression of the loop, if ;; the loop does terminate. . environment).
The <*-bvl>s are bound variable lists, and the variables in them are bound to the respective values of the corresponding <*-producer> expressions, as with LET-VALUES. The bindings are made in parallel, also as with LET-VALUES. To make interdependent bindings, use multiple return values in a single binding clause.
The ordering of the arguments to the continuation is intended to reflect the ordering of control through the loop: first the outer bindings are made, then the loop variables are initialized, then the entry bindings are made, then the termination conditions are tested, and finally either the body bindings or the final bindings are made, depending on the termination conditions.
For example, here is a simplified definition of IN-LIST, omitting support for a user-supplied loop variable, a user-supplied list successor procedure, and the guarantee that SET-CDR! may be safely applied to the pair without altering the iteration. (For the sake of concision, we use the name `next' for the continuation and `rest' for the environment.) In this simplified IN-LIST, we expect one variable in FOR and one argument after IN-LIST -- i.e., (FOR <element-variable> (IN-LIST <list-expression>)) --; hence, the initial two arguments to the actual macro are `(element-variable)' and `(list-expression)'.
(define-syntax in-list (syntax-rules () ((IN-LIST (element-variable) (list-expression) next . rest) (next (((LIST) list-expression)) ;Outer bindings ((PAIR LIST (CDR PAIR))) ;Loop variables () ;Entry bindings ((NOT (PAIR? PAIR))) ;Termination conditions (((element-variable) ;Body bindings (CAR PAIR))) () ;Final bindings . rest))))
Reporting Syntax Errors
The above definition of IN-LIST will work, but, if used incorrectly, it would yield a strange, incomprehensible syntax error to the casual user of foof-loop, involving a great deal of state internal to the LOOP macro. This is not what users like to see; rather, users like to see useful, comprehensible error messages that are directly related to what they wrote. Therefore, foof-loop provides two conveniences for reporting syntax errors: SYNTACTIC-ERROR and LOOP-CLAUSE-ERROR. The former is a more general way to report errors in macros:
(SYNTACTIC-ERROR <message> <form> ...)
will cause an error to be reported when this form is expanded, ideally involving the given message and subforms. Many loop iterators are implemented in terms of internal utilities handling the general patterns of several specialized cases, such as IN-VECTOR and IN-STRING, which generate very similar code. In order to report specific syntax errors that correspond with what was actually the input to the LOOP macro, these iterators pass error context descriptors to one another. Error context is of the form
(<name> (<variable> ...) <arguments> <message>),
where <name> is the name of the iterator, the <variable>s are the forms passed in the FOR clause before the iterator, <arguments> is the tail of the iterator subform inside the FOR clause, and <message> is a message explaining the error (which cannot be constructed from <name> due to liminations in SYNTAX-RULES). The macro LOOP-CLAUSE-ERROR invokes SYNTACTIC-ERROR with a FOR form that corresponds with the original FOR form seen by LOOP. That is,
(LOOP-CLAUSE-ERROR (<name> (<variable> ...) <arguments> <message>))
(SYNTACTIC-ERROR <message> (FOR <variable> ... (<name> . <arguments>))).
Using LOOP-CLAUSE-ERROR for more robust syntax error notification, we can write the simplified IN-LIST by adding a fall-through syntax rule that matches any possible invocation of the iterator from LOOP:
(define-syntax in-list (syntax-rules () ((IN-LIST (element-variable) (list-expression) next . rest) (next (((LIST) list-expression)) ;Outer bindings ((PAIR LIST (CDR PAIR))) ;Loop variables () ;Entry bindings ((NOT (PAIR? PAIR))) ;Termination conditions (((element-variable) ;Body bindings (CAR PAIR))) () ;Final bindings . rest)) ((IN-LIST variables arguments next . rest) ;; Ignore NEXT and REST; we only report the syntax error. (LOOP-CLAUSE-ERROR (IN-LIST variables arguments "Malformed IN-LIST form in LOOP:")))))
With the appropriate definition of SYNTACTIC-ERROR shown below, if we misuse IN-LIST, we get a helpful error message in most Schemes, along these lines:
(loop ((for x y z (in-list))) (body x y z))
Syntax error: Malformed IN-LIST clause in LOOP: (for x y z (in-list))
Pitfalls of Reporting Syntax Errors
It is important to remember the principle behind passing error context to any macros in terms of which iterators are implemented: to present coherent syntax errors reporting the original iterator form. In an ideal world, this would always be easy, but because we are limited to te tools of SYNTAX-RULES and SYNTAX-RULES alone, this is not as easy as we should like. There are cases where it is convenient, for example, to supply default values for optional parameters by recursively calling the iterator macro. This means, though, that if there is a syntax error for some other reason -- for example, if some of the arguments are named, or there are optional variables *and* optional arguments --, which goes undetected until some default values have been supplied, the wrong form will be reported to the user.
To avoid this, it is better to rewrite iterators into invocations of internal macros that take error context as an extra parameter. This precludes writing one iterator in terms of another one, however, which is an unfortunate consequence of the way that foof-loop is designed, which is itself an unfortunate consequence of the tools available. We should like to be able to have a macro debugger that can examine the expansion history of the forms leading up to a syntax error, but this is not a reliably portable concept; few Schemes even record the expansion history (only one that I know of, MIT Scheme).
Unfortunately, at the moment, foof-loop does not provide facilities for anything more than reporting syntax errors based on SYNTAX-RULES patterns alone. For example, there is no way to verify that a variable to be bound is, in fact, a variable; if the user writes anything other than a variable, there will probably be a very strange error much later on involving invalid definitions with strange lambda lists. It is not difficult to conceive of the interface to such checks; however, they are considerably more trouble to implement than they would be worth in program development at the moment.
Porting to New Schemes
The implementation of foof-loop is divided into three files, one of which is optional but useful:
- foof-loop.scm implements the main LOOP macro and the built-in iterators.
- syn-param.scm implements a general macro for producing local macros that accept named parameters with the form `(=> <name> <value>)'.
- let-values.scm is optional, but implements a portable LET-VALUES form (see SRFI 11) that gracefully handles single-value cases in order to avoid superfluous uses of CALL-WITH-VALUES. This definition of LET-VALUES is more likely to work in Scheme systems that have incorrect implementations of multiple return values, such as MIT Scheme, although problems may still arise with (VALUES X), for example.
Ideally, these files should be loaded into their own modules, or a single module for foof-loop, if your Scheme system supports a module system or any form of namespace separation. If this is not the case, you may wish to prefix some of the identifiers in the files that name internal operations in order to avoid namespace clashes.
The implementation of foof-loop is strictly limited to R5RS Scheme except for one element, which is a utility to signal syntax errors during macro expansion. (SYNTACTIC-ERROR message irritants ...) is a *macro* that should signal an error, ideally involving the message and irritants supplied. A legal, though not ideal, definition with R5RS is:
(define-syntax syntactic-error (syntax-rules ()))
Here is a better definition, for Scheme48, if we open the SIGNALS and NODES structures in the for-syntax environment (but note that this is a sequence of interactive commands, hence the prompt, and *not* code to store in a file to load):
> ,for-syntax ,open signals nodes > (define-syntax syntactic-error (lambda (form rename compare) (apply syntax-error (map schemify (cdr form)))) ()) ;No auxiliary names
Note that if your Scheme does not support SYNTAX-RULES, it is not R5RS- compliant, and it cannot load foof-loop. Also note that foof-loop uses some very complicated patterns of SYNTAX-RULES, so expansion of LOOP forms may be slow and taxing on your Scheme system's resources. Unfortunately, there is no other way to do this portably. Finally, note that Scheme systems that preserve the case of symbols in source code are not R5RS-compliant and cannot load foof-loop. If your Scheme system by default preserves case, but has an option for folding case, you must set that option first.
Design Notes and Rationale
For a comparison of (an older rendition of) foof-loop with other contemporary looping macros, see the Usenet article with message-id <firstname.lastname@example.org> posted on comp.lang.scheme in September of 2006. Also, for extended comparative examples, see http://mumble.net/~campbell/scheme/loop-comparison.scm. These design notes pertain more specifically to choices made for Taylor R. Campbell's variant of the macro.
Always, in the built-in iterators' bounding intervals -- for iteration over ranges of numbers, over segments of vectors, &c. --, the lower bound is inclusive and the upper bound is exclusive. It doesn't matter whether one starts from the upper bound and goes to the lower bound or vice versa: the lower bound is inclusive and the upper bound is exclusive.
My experience is that any inconsistency in this regard is a damning path to confusing doom of program clarity. Off-by-one errors are still possible, of course, especially if the looping macros are mixed with other systems that do not follow this strict convention. However, between R5RS and foof-loop, it is *much* easier to do the right thing from the start, without having to meticulously adjust any upper bounds, especially for backward iteration that starts from the upper bound. If you find that you are performing such meticulous adjustment, ask yourself whether you are doing the right thing, and if the answer is `yes', I'd like to hear about it.
To reiterate, since this is very important:
The lower bound is always inclusive. The upper bound is always exclusive. It does not matter which bound you start from.
If you write any of your own custom iterators that involve any sort of bounding intervals, *PLEASE* adhere to these rules unless you have an extremely good reason not to, which I'd like to hear about if you do.