SRFI 214: Flexvectors

Repositories

Canonical SRFI-214 repository: https://github.com/scheme-requests-for-implementation/srfi-214
SRFI-214 packaged as a CHICKEN egg: https://github.com/nmeum/srfi-214 (the egg comes from this repository)

This SRFI is currently in final status. Here is an explanation of each status that a SRFI can hold. To provide input on this SRFI, please send email to srfi+minus+214+at+srfi+dotschemers+dot+org. To subscribe to the list, follow these instructions. You can access previous messages via the mailing list archive.

Received: 2020-10-06
Draft #1 published: 2020-10-07
Draft #2 published: 2021-01-13
Draft #3 published: 2021-03-11
Finalized: 2021-03-18

Abstract

A flexvector, also known as a dynamic array or an arraylist, is a mutable vector-like data structure with an adjustable size. Flexvectors allow fast random access and fast insertion/removal at the end. This SRFI defines a suite of operations on flexvectors, modeled after SRFI 133's vector operations.

Rationale

Unlike the default vector data type of many other languages, Scheme vectors have a fixed length. This makes vectors unusable as mutable stacks or queues, and is the reason that SRFI 133 lacks common collection operations like filter. In fact, no Scheme standard defines a mutable data type that is suitable for this very common purpose, analogous to a Java ArrayList or to the default list data structure in JavaScript or Python. SRFI 117 defines the commonly-used "tconc" mutable queue, but it is a linked list. And SRFI 134 defines a deque data type, but that data type is immutable. Neither data structure has the (often useful) properties of being a mutable, contiguous, random-access sequence. This SRFI does not define a comparator or any sorting procedures, in order to ensure that it has no dependencies. These may be defined in future SRFIs.

Terminology

What this SRFI calls a flexvector is better known as a dynamic array. This data structure has a wide variety of names in different languages:

JavaScript and Ruby call it an array
Python calls it a list
Java calls it an ArrayList (and, before that, it was called a Vector)

Scheme literature already uses the terms list (for cons lists), vector (for fixed-length vectors), and array (for fixed-length numeric arrays), so a new term is needed. Because array in Scheme refers to a numeric array, the term dynamic array is not ideal. Dynamic vector is a possibility, but dynamic-vector would be an unwieldy prefix due to its length. The newly-minted term flexvector communicates this data structure's flexible size and its relationship to Scheme vectors, while being only as long as an existing Scheme data type name (bytevector).

Procedure inclusion and naming

This SRFI is primarily modeled on SRFI 133. It includes flexvector equivalents of all SRFI 133 procedures, most with the same names and argument order. There are three notable exceptions:

flexvector-unfold mimics the API of SRFI 1's unfold, not SRFI 133's vector-unfold. vector-unfold is limited by the necessity of a fixed vector length, while flexvector-unfold may generate a flexvector of any length, and so the unfold API is more useful.
There is no flexvector equivalent of vector-unfold! because flexvectors use the list version of unfold, which has no unfold! equivalent with a similar API.
The flexvector equivalent of vector= is flexvector=?. It is conventional for Scheme equality predicates to end in =? (e.g., symbol=?, string=?), and most data structure SRFIs follow this convention (see SRFI 113, 125, 146). This SRFI follows established convention, even when it does not match SRFI 133's procedure names.

Additionally, this SRFI includes deque-like operations that reference, add to, and remove from both ends of a flexvector. The naming convention for these operations is taken from SRFI 134, which uses the terms front and back. Front refers to the start of the flexvector (index 0), while back refers to the end (index (- (flexvector-length x) 1)).

Specification

Flexvectors have the same random-access performance guarantees as ordinary vectors. In particular, if a given Scheme implements vectors with contiguous memory locations and O(1) random access and mutation, flexvectors must also have these performance characteristics. Additionally, appending to the back of a flexvector has the same (amortized) performance as setting an existing location in the same flexvector. In this section, the following notation is used to specify parameters and examples:

Parameters named fv are always flexvectors.
x ... indicates that zero or more of the parameter x are allowed.
[x] indicates that the parameter x is optional. Multiple optional parameters may be written [x [y]].
;=> separates an example expression from its expected result.
λ is a shorthand alias for lambda.

Additionally, examples include literal flexvector values written in pseudo-lexical syntax: #<flexvector a b c> is a flexvector of length 3 containing the symbol values a, b, and c. This syntax is only used for example purposes. This SRFI does not define the #<flexvector ...> syntax for actual use.

Constructors

[procedure] (make-flexvector size [fill])

Creates and returns a flexvector of size size. If fill is specified, all of the elements of the vector are initialized to fill. Otherwise, their contents are indeterminate.

[procedure] (flexvector x ...)

Creates and returns a flexvector whose elements are x ....

[procedure] (flexvector-unfold p f g initial-seed ...)

The fundamental flexvector constructor. flexvector-unfold is modeled on SRFI 1 unfold instead of SRFI 133 vector-unfold because flexvectors are not limited to a predetermined length. For each step, flexvector-unfold evaluates p on the seed value(s) to determine whether it should stop unfolding. If p returns #f, it then evaluates f on the seed value(s) to produce the next element, then evaluates g on the seed value(s) to produce the seed value(s) for the next step. The recursion can be described with this algorithm: This is guaranteed to build a flexvector in O(n) if flexvector-add-back! is O(1). flexvector-unfold-right is a variant that constructs a flexvector right-to-left, and uses flexvector-add-front! instead, which may be slower than O(n).

[procedure] (flexvector-copy fv [start [end]])

Allocates a new flexvector whose length is (- end start) and fills it with elements from fv, taking elements from fv starting at index start and stopping at index end. start defaults to 0 and end defaults to the value of (flexvector-length fv). flexvector-reverse-copy is the same, but copies the elements in reverse order from fv. Both start and end are clamped to the range [0, (flexvector-length fv)). It is an error if end is less than start. flexvector-copy shares the performance characteristics of vector-copy — in particular, if a given Scheme's vector-copy uses a fast memcpy operation instead of an element-by-element loop, flexvector-copy should also use this operation.

[procedure] (flexvector-append fv ...)

Returns a newly allocated flexvector that contains all elements in order from the subsequent locations in fv ....

[procedure] (flexvector-concatenate list-of-flexvectors)

Equivalent to (apply flexvector-append list-of-flexvectors), but may be implemented more efficiently.

[procedure] (flexvector-append-subvectors [fv start end] ...)

Returns a vector that contains every element of each fv from start to end in the specified order. This procedure is a generalization of flexvector-append.

Predicates

[procedure] (flexvector? x)

Disjoint type predicate for flexvectors: this returns #t if x is a flexvector, and #f otherwise.

[procedure] (flexvector-empty? fv)

Returns #t if fv is empty (i.e., its length is 0), and #f if not.

[procedure] (flexvector=? elt=? fv ...)

Flexvector structural equality predicate, generalized across user-specified element equality predicates. Flexvectors a and b are considered equal by flexvector=? iff their lengths are the same and, for each index i less than (flexvector-length a), (elt=? (flexvector-ref a i) (flexvector-ref b i)) is true. elt=? is always applied to two arguments. flexvector=? returns #t if it is passed zero or one fv arguments. The execution order of comparisons is intentionally left unspecified.

Selectors

[procedure] (flexvector-ref fv i)

Flexvector element dereferencing: returns the value at location i in fv. Indexing is zero-based. It is an error if i is outside the range [0, (flexvector-length fv)). flexvector-ref has the same computational complexity as vector-ref. In most Schemes, it will be O(1).

[procedure] (flexvector-front fv)

Returns the first element in fv. It is an error if fv is empty. Alias for (flexvector-ref fv 0).

[procedure] (flexvector-back fv)

Returns the last element in fv. It is an error if fv is empty. Alias for (flexvector-ref fv (- (flexvector-length fv) 1)).

[procedure] (flexvector-length fv)

Returns the length of fv, which is the number of elements contained in fv. flexvector-length has the same computational complexity as vector-length. In most Schemes, it will be O(1).

Mutators

[procedure] (flexvector-add! fv i x ...)

Inserts the elements x ... into fv at the location i, preserving their order and shifting all elements after i backward to make room. This increases the length of fv by the number of elements inserted. It is an error if i is outside the range [0, (flexvector-length fv)]. flexvector-add! returns fv after mutating it.

[procedure] (flexvector-add-front! fv x ...)

Inserts the elements x ... into the front or back of fv, preserving their order. This increases the length of fv by the number of elements inserted. flexvector-add-back! of one element has the same computational complexity as vector-set!, amortized. In most Schemes, this will be amortized O(1). These procedures return fv after mutating it.

[procedure] (flexvector-add-all! fv i xs)

Inserts the elements of the list xs into fv at location i. Equivalent to (apply flexvector-add! fv i xs). Returns fv after mutating it.

[procedure] (flexvector-append! fv1 fv2 ...)

Inserts the elements of the flexvectors fv2 ... at the end of the flexvector fv1, in order. Returns fv1 after mutating it.

[procedure] (flexvector-remove! fv i)

Removes and returns the element at i in fv, then shifts all subsequent elements forward, reducing the length of fv by 1. It is an error if i is outside the range [0, (flexvector-length fv)).

[procedure] (flexvector-remove-front! fv)

Removes and returns the first element from fv, then shifts all other elements forward. flexvector-remove-back! instead removes the last element, without moving any other elements, and has the same performance guarantees as flexvector-add-back!. It is an error if fv is empty.

[procedure] (flexvector-remove-range! fv start [end])

Removes all elements from fv between start and end, shifting all elements after end forward by (- end start). If end is not present, it defaults to (flexvector-length fv). Both start and end are clamped to the range [0, (flexvector-length fv)). It is an error if end is less than start. flexvector-remove-range! returns fv after mutating it.

[procedure] (flexvector-clear! fv)

Removes all items from fv, setting its length to 0. Returns fv after mutating it.

[procedure] (flexvector-set! fv i x)

Assigns the value of x to the location i in fv. It is an error if i is outside the range [0, (flexvector-length fv)]. If i is equal to (flexvector-length fv), x is appended after the last element in fv; this is equivalent to flexvector-add-back!. Returns the previous value at location i in fv, or an unspecified value if i is equal to (flexvector-length fv). flexvector-set! has the same computational complexity as vector-set!. In most Schemes, it will be O(1).

[procedure] (flexvector-swap! fv i j)

Swaps or exchanges the values of the locations in fv at indexes i and j. It is an error if either i or j is outside the range [0, (flexvector-length fv)). Returns fv after mutating it.

[procedure] (flexvector-fill! fv fill [start [end]])

Assigns the value of every location in fv between start, which defaults to 0 and end, which defaults to the length of fv, to fill. Returns fv after mutating it. Both start and end are clamped to the range [0, (flexvector-length fv)]. It is an error if end is less than start.

[procedure] (flexvector-reverse! fv)

Destructively reverses fv in-place. Returns fv after mutating it.

[procedure] (flexvector-copy! to at from [start [end]])

Copies the elements of flexvector from between start and end to flexvector to, starting at at. The order in which elements are copied is unspecified, except that if the source and destination overlap, copying takes place as if the source is first copied into a temporary vector and then into the destination. This can be achieved without allocating storage by making sure to copy in the correct direction in such circumstances. flexvector-reverse-copy! is the same, but copies elements in reverse order. start and end default to 0 and (flexvector-length from) if not present. Both start and end are clamped to the range [0, (flexvector-length from)]. It is an error if end is less than start. Unlike vector-copy!, flexvector-copy! may copy elements past the end of to, which will increase the length of to. flexvector-copy! shares the performance characteristics of vector-copy! — in particular, if a given Scheme's vector-copy! uses a fast memcpy operation instead of an element-by-element loop, flexvector-copy! should also use this operation. Both procedures return to after mutating it.

Iteration

[procedure] (flexvector-fold kons knil fv1 fv2 ...)

The fundamental flexvector iterator. kons is iterated over each value in all of the vectors, stopping at the end of the shortest; kons is applied as (kons state (flexvector-ref fv1 i) (flexvector-ref fv2 i) ...) where state is the current state value—the current state value begins with knil, and becomes whatever kons returned on the previous iteration—and i is the current index. The iteration of flexvector-fold is strictly left-to-right. The iteration of flexvector-fold-right is strictly right-to-left.

[procedure] (flexvector-map f fv1 fv2 ...)

Constructs a new flexvector of the shortest size of the flexvector arguments. Each element at index i of the new flexvector is mapped from the old flexvectors by (f (flexvector-ref fv1 i) (flexvector-ref fv2 i) ...). The dynamic order of application of f is unspecified. flexvector-map/index is a variant that passes i as the first argument to f for each element.

[procedure] (flexvector-map! f fv1 fv2 ...)

Similar to flexvector-map, but rather than mapping the new elements into a new flexvector, the new mapped elements are destructively inserted into fv1. Again, the dynamic order of application of f is unspecified, so it is dangerous for f to apply either flexvector-ref or flexvector-set! to fv1 in f.

[procedure] (flexvector-append-map f fv1 fv2 ...)

Constructs a new flexvector by appending the results of each call to f on the elements of the flexvectors fv1, fv2, etc., in order. Each call is of the form (f (flexvector-ref fv1 i) (flexvector-ref fv2 i) ...). Iteration stops when the end of the shortest flexvector argument is reached. The dynamic order of application of f is unspecified. f must return a flexvector. It is an error if f returns anything else. flexvector-append-map/index is a variant that passes the index i as the first argument to f for each element.

[procedure] (flexvector-filter pred? fv)

Constructs a new flexvector consisting of only the elements of fv for which pred? returns a non-#f value. flexvector-filter/index passes the index of each element as the first argument to pred?, and the element itself as the second argument.

[procedure] (flexvector-filter! pred? fv)

Similar to flexvector-filter, but destructively updates fv by removing all elements for which pred? returns #f. flexvector-filter/index! passes the index of each element as the first argument to pred?, and the element itself as the second argument.

[procedure] (flexvector-for-each f fv1 fv2 ...)

Simple flexvector iterator: applies f to the corresponding list of parallel elements from fv1 fv2 ... in the range [0, length), where length is the length of the smallest flexvector argument passed. In contrast with flexvector-map, f is reliably applied in left-to-right order, starting at index 0, in the flexvectors. flexvector-for-each/index is a variant that passes the index as the first argument to f for each element.

[procedure] (flexvector-count pred? fv1 fv2 ...)

Counts the number of parallel elements in the flexvectors that satisfy pred?, which is applied, for each index i in the range [0, length) where length is the length of the smallest flexvector argument, to each parallel element in the flexvectors, in order.

[procedure] (flexvector-cumulate f knil fv)

Returns a newly-allocated flexvector new with the same length as fv. Each element i of new is set to the result of (f (flexvector-ref new (- i 1)) (flexvector-ref fv i)), except that, for the first call on f, the first argument is knil. The new flexvector is returned.

Searching

[procedure] (flexvector-index pred? fv1 fv2 ...)

Finds and returns the index of the first elements in fv1 fv2 ... that satisfy pred?. If no matching element is found by the end of the shortest flexvector, #f is returned. flexvector-index-right is similar, but returns the index of the last elements that satisfy pred?, and requires all flexvector arguments to have the same length. Given n arguments fv1 fv2..., pred? should be a function that takes n arguments and returns a single value, interpreted as a boolean.

[procedure] (flexvector-skip pred? fv1 fv2 ...)

Finds and returns the index of the first elements in fv1 fv2 ... that do not satisfy pred?. If all the values in the flexvectors satisfy pred? until the end of the shortest flexvector, this returns #f. flexvector-skip-right is similar, but returns the index of the last elements that do not satisfy pred?, and requires all flexvector arguments to have the same length. Given n arguments fv1 fv2..., pred? should be a function that takes n arguments and returns a single value, interpreted as a boolean.

[procedure] (flexvector-binary-search fv value cmp [start [end]])

Similar to flexvector-index and flexvector-index-right, but, instead of searching left-to-right or right-to-left, this performs a binary search. If there is more than one element of fv that matches value in the sense of cmp, flexvector-binary-search may return the index of any of them. The search is performed on only the indexes of fv between start, which defaults to 0, and end, which defaults to the length of fv. Both start and end are clamped to the range [0, (flexvector-length fv)]. It is an error if end is less than start. cmp should be a procedure of two arguments that returns either a negative integer, which indicates that its first argument is less than its second; zero, which indicates that they are equal; or a positive integer, which indicates that the first argument is greater than the second argument. An example cmp might be:

[procedure] (flexvector-any pred? fv1 fv2 ...)

Finds the first set of elements in parallel from fv1 fv2 ... for which pred? returns a true value. If such a parallel set of elements exists, flexvector-any returns the value that pred? returned for that set of elements. The iteration is strictly left-to-right.

[procedure] (flexvector-every pred? fv1 fv2 ...)

If, for every index i between 0 and the length of the shortest flexvector argument, the set of elements (flexvector-ref fv1 i) (flexvector-ref fv2 i) ... satisfies pred?, flexvector-every returns the value that pred? returned for the last set of elements, at the last index of the shortest flexvector. The iteration is strictly left-to-right.

[procedure] (flexvector-partition pred? fv)

Returns two values: a flexvector containing all elements of fv that satisfy pred?, and a flexvector containing all elements of fv that do not satisfy pred?. Elements remain in their original order.

Conversion

[procedure] (flexvector->vector fv [start [end]])

Creates a vector containing the elements in fv between start, which defaults to 0, and end, which defaults to the length of fv. Both start and end are clamped to the range [0, (flexvector-length fv)). It is an error if end is less than start.

[procedure] (vector->flexvector vec [start [end]])

Creates a flexvector containing the elements in vec between start, which defaults to 0, and end, which defaults to the length of vec. Both start and end are clamped to the range [0, (vector-length vec)). It is an error if end is less than start.

[procedure] (flexvector->list fv [start [end]])

Creates a list containing the elements in fv between start, which defaults to 0, and end, which defaults to the length of fv. reverse-flexvector->list is similar, but creates a list with elements in reverse order of fv. Both start and end are clamped to the range [0, (flexvector-length fv)). It is an error if end is less than start.

[procedure] (list->flexvector proper-list)

Creates a flexvector of elements from proper-list. reverse-list->flexvector is similar, but creates a flexvector with elements in reverse order of proper-list.

[procedure] (flexvector->string fv [start [end]])

Creates a string containing the elements in fv between start, which defaults to 0, and end, which defaults to the length of fv. It is an error if the elements are not characters. Both start and end are clamped to the range [0, (flexvector-length fv)). It is an error if end is less than start.

[procedure] (string->flexvector string [start [end]])

Creates a flexvector containing the elements in string between start, which defaults to 0, and end, which defaults to the length of string. Both start and end are clamped to the range [0, (string-length string)). It is an error if end is less than start.

[procedure] (flexvector->generator fv)

Returns a SRFI 158 generator that emits the elements of the flexvector fv, in order. If fv is mutated before the generator is exhausted, the generator's remaining return values are undefined.

[procedure] (generator->flexvector gen)

Creates a flexvector containing all elements produced by the SRFI 158 generator gen.

Fix flexvector->list implementation by supporting the optional start and end parameter

1.0.0

Initial release of SRFI-214 as a CHICKEN egg

Copyright

 Copyright (C) Adam Nelson (2020-2021).
 
 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
 in the Software without restriction, including without limitation the rights
 to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 copies of the Software, and to permit persons to whom the Software is
 furnished to do so, subject to the following conditions:
 
 The above copyright notice and this permission notice (including the next
 paragraph) shall be included in all copies or substantial portions of the
 Software.
 
 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 SOFTWARE.