Generalized Arrays and Storage Classes

Provides a multidimensional array data type + associated reader and writer syntax. Likewise provides a storage class API, which backs the item storage for array types.

  1. Generalized Arrays and Storage Classes
  2. Overview
    1. Why not SRFI-122?
    2. This is incomplete compared to the ArraysCowan API!
  3. Installation
  4. Usage
  5. Terminology
  6. General Requirements
  7. API Description
    1. Storage Classes
      1. Constructors
      2. Predicates
      3. Accessors
      4. Invokers
      5. Standard storage classes
    2. Generalized Arrays
      1. Constructors
      2. Predicates
      3. Metadata Accessors
      4. Accessors
      5. Mutators
      6. Whole Array
      7. Transformations
      8. Copying and Conversion
      9. Input / Output
  8. Repository
  9. Version History
  10. License

Overview

This egg is an evolution and implementation of the Arrays Cowan API (see the relevant ArraysCowan.md file). I call it an evolution of the ArraysCowan API because through the course of implementing this API, some interfaces have changed to be simpler, and I have used CHICKEN-specific language extensions (such as #!optional, #!rest, etc) to make the code simpler where necessary. This egg is not specifically written to be compatible across many Scheme implementations, however can likely be made so with minimal effort. I am not opposed to any patches that could contribute towards R7RS compliance.

Why not SRFI-122?

One question that might arise given that this is a CHICKEN specific extension is "Why did you choose to not use SRFI-122?" The purpose of the SRFI process is to provide a useful set of functionality for common data-types and language extensions. The SRFI process does this by providing a reference implementation and documentation describing a standard set of functionality such as in the case of SRFI-122. A natural expectation might be that as long as a standard API and reference implementation is available, it is preferable to use that rather than fragment the community across different, non-portable extensions. In principle, I agree, however in this scenario I have specific objections to SRFI-122 that made it difficult to adopt for my intended use case.

My primary objection to SRFI-122 is with regards to the API. Firstly, there is a schism between what are intervals, and what are arrays. Both data types are provided by SRFI-122, and they represent different concepts. This schism can make it somewhat difficult to predict performance for array vs. interval operations. In addition to the schism caused by keeping intervals as a separate data type, the array objects themselves are lazy, in that they do not evaluate any transformations done to them until an array-ref or some equivalent is called on the array cell. There are times when this can be preferable (i.e. performing many operations on a single array, and only having to loop through it once), but in large I have avoided doing this here as the performance of accessing a single element or many elements is indeterminable in the general case. I do not disparage the SRFI-122 authors for their approach, however while implementing this egg I have made specific decisions that will make it easier to interoperate this array API with BLAS and LAPACK optimizations in the future, without requiring special interfaces and trying to reduce copies as much as can be made feasible.

On a slightly less relevant note, I found the documentation for SRFI-122 to be very math-heavy, and for beginners may be somewhat inscrutable. I likewise find working with the interfaces to be burdensome compared to the ArraysCowan API, which is in some ways a lot closer to Numpy or APL, for users from those languages. As an aside, if the SRFI committee wants to eventually turn the ArraysCowan API into an SRFI, they are welcome to leverage the work I have done here to come up with a separate SRFI.

This is incomplete compared to the ArraysCowan API!

This egg does not implement array-inner-product or array-outer-product. While I am open to any patch that might contribute such functionality, I chose to avoid implementing these here initially because I expect that many users would prefer BLAS-optimized array/matrix products, and I did not want to spend forever to implement and release this library. If you notice anything obvious that seems like it is missing, please contact me and I'd be happy to discuss if it can be added.

Installation

   $ chicken-install -s generalized-arrays

This installs both the generalized-arrays module and storage-classes module. This egg depends on the following eggs:

   check-errors
   matchable
   numbers
   srfi-133
   typed-records

Note that numbers is only included as a dependency for complex number support, but otherwise these eggs should not affect regular usage.

Usage

   (use generalized-arrays storage-classes)

Note that while you may choose to use the storage class API on its own, it may be very difficult to use the arrays API without defining any storage classes.

Terminology

NOTE: Much of this is copied verbatim from the ArraysCowan description, because this egg originated as an implementation of that API.

An array is an object of a disjoint type, a container with elements arranged according to a rectilinear coordinate system. An array can have any number of dimensions or axes, including zero; this number is called the rank of the array. Arrays of rank zero contain exactly one element. Note that "rank" is a Fortran, Common Lisp, and APL term that has nothing to do with matrix ranks in the sense of linear algebra.

Each axis has an extent represented by two exact integers, the first representing the smallest possible coordinate for that axis, and the second representing the largest possible coordinate plus one. Extent is also used, by a mild abuse of language, for the difference between the two values. The smallest coordinates are collected into a Scheme vector known as the lower bound of the array; the largest coordinates are collected into another Scheme vector known as the upper bound of the array. An index of the array is any Scheme vector of exact integers which has the same number of elements as the array's rank, and whose values lie between the lower bound (inclusive) and the upper bound (exclusive) of the corresponding axis. Note that in this implementation, the lower bound of an array will always be zero for every axis.

An array can be a general array, meaning each element can be any Scheme object, or it can be a specialized array, meaning that each element can only belong to a given restricted type. This is accomplished by separating array objects from the underlying storage objects, which can be Scheme vectors or numeric vectors or other objects. Any object which can map a non-negative exact integer into an appropriate value can serve as a storage object by writing a storage class for it. Note that it is an error if the extent of any axis is non-positive.

In order to map an array's index (a Scheme vector of exact integers) into a storage index (a single exact integer), each array maintains another associated vector of exact integers, the stride, as well as a single additional exact integer, the offset. Multiplying each element of the stride by the corresponding element of the array index, summing the results, and adding the offset produces the corresponding storage index. Therefore, the offset is the storage index of the element whose array index consists of all zeros.

Procedures that accept an array object and return a new array sharing the same storage object but with different upper bound, lower bound, stride, and/or offset are known as array transformations, and this egg provides a number of them. The egg also provides other procedures for operating on arrays, all of which have the property that they are meaningful no matter what the elements of the array are. So array-map can be used to sum two arrays, since that is done element-wise over the + operation; but there is no operation provided for ordinary matrix multiplication, because it depends on the array elements being solely numeric, which any general two arrays may not be.

In the same way that the names start and end are applied to optional numerical indexes that default to the smallest element of a sequence (list, vector, string, or whatever) and the largest element plus one, in this egg they default to the lower bound and the upper bound of an array.

In certain procedures, the elements of an array are processed in lexicographic order, also known as row-major order. This means the order in which the highest-numbered axis changes most rapidly, and the other axes change only when the following axis returns to its lowest value.

NOTE: Arrays with high rank (>4) may have very poor performance compared to 1-4D arrays, because of some rank-specific algorithms used. I would venture to say that not many will use 5D+ arrays however, which has been considered a trade-off in this implementation.

General Requirements

Where a procedure that is passed to any procedure defined in this SRFI receives an index as an argument, it is an error for that procedure to mutate the index.

NOTE: that array objects are immutable, but their storage objects are usually mutable. It is possible to create arrays that are prohibited from mutating their storage objects.

The procedures of this egg that are not transformations may return arrays whose stride is implementation-dependent (so that the order of elements in the storage object may be either row-major or column-major), but the offset is always 0.


API Description

Storage Classes

A storage class is a group of storage objects with the same behavior. A storage object maps a non-negative exact integer index into a storage location. There are standard storage classes, but it is also possible for programmers to create their own storage classes. Each storage class allows creating a storage object of a given size, accessing a location by its index, and mutating a location by its index to a new value. Note that the procedures used to do this need not be the standard procedures such as make-vector, vector-ref, and vector-set!; they may be more efficient equivalents. Storage classes allow arrays and similar objects to be polymorphic in the type of storage they use.

Constructors

[procedure] (make-storage-class short-id constructor accessor mutator sizer predicate default-fill)

Returns a storage class with the specified short-id, constructor, accessor, mutator, sizer, predicate, and default fill argument. An example might be as follows:

   (define vector-storage-class (make-storage-class 'v
                                                    make-vector
                                                    vector-ref
                                                    vector-set!
                                                    vector-length
                                                    vector?
                                                    (void)))

Predicates

[procedure] (storage-class? object)

Returns #t iff the object passed into the procedure is a storage class. Otherwise, returns #f.

Accessors

[procedure] (storage-class-short-id storage-class)

Returns the symbol representing the storage-class' short ID (e.g. u32vector-storage-class has a short ID of u32).

[procedure] (storage-class-constructor storage-class)

Returns the constructor of storage-class. This procedure returns a storage object belonging to the storage class, and can be called with one or two arguments: the first is an exact non-negative integer specifying the size of the object. If objects of the class do not have a fixed size, the size must be specified as #f. The second is a value to fill all the elements with. If the second argument is omitted, the elements will have arbitrary contents.

[procedure] (storage-class-accessor storage-class)

Returns the accessor of storage-class as a procedure. This procedure takes two arguments, a storage object and an exact non-negative integer, and returns the value of the element indexed by the integer. It is an error if the index is greater than or equal to the size.

[procedure] (storage-class-mutator storage-class)

Returns the mutator of storage-class as a procedure. This procedure takes three arguments, a storage object, an exact non-negative integer, and a value. It mutates the element of the object specified by the index to be the value. It is an error if the index is greater than or equal to the size, or if the object is not capable of storing the value.

[procedure] (storage-class-sizer storage-class)

Returns the sizer of storage-class as a procedure. This procedure takes one argument, a storage object. It returns the size of the object specified when the object was created. This may be an exact non-negative integer.

[procedure] (storage-class-predicate storage-class)

Returns the predicate for testing a storage class' storage object. This procedure takes one argument, a storage object. It evaluates #t if a storage object is of the type described by the storage class, and #f otherwise.

[procedure] (storage-class-default-fill storage-class)

Returns the default fill value of a storage-class as a value. This procedure takes one argument a storage object. It returns the default fill value for the storage object when it was created.

Invokers

[procedure] (make-storage-object storage-class n [fill])

Returns a newly allocated storage object with class storage-class and length n, filled with value fill, if specified. If fill is not specified, then the default fill value for the storage object is used.

[procedure] (storage-object-ref storage-class storage-object n)

Returns the nth element of storage-object as seen through the lens of storage-class. It is an error if n is not less than the size of storage-object.

[procedure] (storage-object-set! storage-class storage-object n value)

Mutates the nth element of storage-object as seen through the lens of storage-class so that its value is value. It is an error if n is not less than the size of storage-object.

[procedure] (storage-object-length storage-class storage-object)

Returns the size of storage-object as seen through the lens of storage-class.

Standard storage classes

[constant] vector-storage-class
[constant] u8vector-storage-class
[constant] s8vector-storage-class
[constant] u16vector-storage-class
[constant] s16vector-storage-class
[constant] u32vector-storage-class
[constant] s32vector-storage-class
[constant] f32vector-storage-class
[constant] f64vector-storage-class
[constant] c64vector-storage-class
[constant] c128vector-storage-class

Standard set of storage classes provided for convenience. Note that because CHICKEN does not provide u64 and s64 vectors in SRFI-4, these are not provided here. Further, 64 and 128-bit complex numbers are implemented as f32 and f64 vectors respectively. This is done for BLAS and LAPACK compatibility, as this is how they would need to be passed in for BLAS compatibility. Consideration has been taken so that these details are not leaked, however when using array-storage-object this is exposed.

Generalized Arrays

Constructors

[procedure] (make-array storage-class shape [fill])

Returns a newly allocated mutable array (with a newly allocated storage object of the specified storage-class) that has the specified bounds. The values of the elements default to the default fill value of the storage class if the fill argument is omitted.

[procedure] (array-tabulate storage-class proc shape [(mutable? #t)])

Returns a newly allocated array (with a newly allocated storage object of the specified storage-class) that has the specified bounds. The values of the elements are computed by calling proc on every possible index of the array in lexicographic order. If mutable? is true, the array can mutate its storage object.

[procedure] (array-broadcast array obj)

Returns a newly allocated array whose bounds and storage class are the same as array and all of whose elements are obj.

Predicates

[procedure] (array? object)

Returns #t iff the object passed in is an array. Otherwise returns #f.

[procedure] (array-mutable? array)

Returns #t if array can mutate its storage object, and #f otherwise.

Metadata Accessors

[procedure] (array-storage-class array)

Returns the storage class with which array was created.

[procedure] (array-storage-object array)

Returns the storage object underlying this array. Note that this can be #f in the case of storage classes without actual storage.

[procedure] (array-rank array)

Return the rank (number of dimensions) of array.

[procedure] (array-shape array)

Returns a rank-sized vector where each element is the extent of each of the axes, respectively.

[procedure] (array-stride array)

Returns the stride of the array.

[procedure] (array-offset array)

Returns the offset of the array.

[procedure] (array-index->storage-index array index)

Converts the index of array to the equivalent storage index.

Accessors

[procedure] (array-ref array index)

Returns the value of the element of array specified by index. Note that this is different from the array-ref of most Lisp systems, which specifies the index as a sequence of arguments.

[procedure] (array-recursive-ref array index #!rest indices)

Applies array-ref to the array using the first index. If there are more arguments, apply array-ref to the result using the next index. Repeat until there are no more indexes, returning the last result. It is an error if any intermediate result is not an array. (APL enlist.)

[procedure] (array-for-each proc array)

Iterates over the elements of array in lexicographic order, and calling proc on each element. Each invocation of proc receives the value of the element at that index. The value returned by proc is discarded.

[procedure] (array-for-each-index proc array)

Iterates over the indexes of array in lexicographic order, and calling proc on each index. The value returned by proc is discarded.

Mutators

[procedure] (array-set! array index value)

Sets the value of the element of array specified by index to value. Note that this is different from the array-set! of most Lisp systems, which specifies the index as a sequence of arguments.

[procedure] (array-tabulate! proc array)

Iterates over the indexes of array in lexicographical order starting at the index start and ending at the index end (start inclusive, end exclusive), and calling proc on each index. Whatever proc returns becomes the value of the array at the index.

Whole Array

[procedure] (array-map storage-class proc array #!rest arrays)

Returns a newly allocated array with the same bounds as the arrays; it is an error if they do not all have the same bounds. For each valid index value, proc is invoked, passing each corresponding element of the arrays. Whatever proc returns becomes the value of the storage element corresponding to that index in the result array. The order of invocations of proc is not specified.

[procedure] (array-map! proc array #!rest arrays)

It is an error if the arrays do not all have the same bounds. For each valid index value, proc is invoked, passing each corresponding element of the arrays. Whatever proc returns becomes the value of the storage element corresponding to that index in the first specified array. The order of invocations of proc is not specified. Returns an undefined value.

[procedure] (array-fold proc seed array #!rest arrays)

Returns a newly allocated array with the same bounds as the arrays; it is an error if they do not all have the same bounds. For each valid index value, proc is invoked in lexicographic order, passing each corresponding element of the arrays and a seed, whose initial value is seed. Proc returns two values, the value of the storage element corresponding to that index in the result array, and the new value of the seed.

[procedure] (array-reduce proc seed array axis)

Returns a newly allocated array whose rank is one less than the rank of array, by combining all the groups of elements of length n (where the default is the extent of axis) along axis using proc, a two-argument procedure. The order and number of invocations of proc is unspecified. If there is only one such element, it is unchanged. (APL reduce.)

[procedure] (array-cumulate proc seed array axis)

Returns a newly allocated array whose bounds are the same as the bounds of array. Each element along axis is constructed by reducing (as if by array-reduce) successive prefixes of the elements of array along that axis. (APL scan.)

[procedure] (array-count pred array)

Returns an exact integer containing the number of elements in array that satisfy pred.

[procedure] (array-index pred array)

Returns the index of the first element of array in lexicographic order that satisfies pred, or #f if no index is found.

[procedure] (array-find pred array)

Returns the first element of array in lexicographic order that satisfies pred, or #f if no element is found.

[procedure] (array-compress array booleans axis)

Returns an array with the same bounds as array except possibly along the axis dimension. The array is sliced along axis and the elements of booleans (a vector of boolean values) are used to decide which slices to incorporate into the result: if the corresponding boolean is #t, the slice is incorporated, otherwise not. (APL compress.)

[procedure] (array-expand array booleans nil axis)

Returns an array with the same bounds as array except possibly along the axis dimension. Array is sliced along axis and the elements of booleans (a vector of boolean values) are used to decide where, if anywhere, nil is to be interpolated: if the corresponding boolean is #t, nil is interpolated, otherwise the next slice is incorporated. It is an error if the size of booleans is not to the extent of the axis dimension in the result. It is also an error if nil does not have the same bounds as a slice. (APL expand.)

[procedure] (array-rearrange array order axis)

Returns an array with the same bounds as array. Array is sliced along the axis dimension, and the slices are reassembled in the order given by vector. The slice whose number appears in the first element of vector appears first in the result, and so on. It is an error if vector is not a vector of exact integers. (Generalized version of APL rotate.)

Transformations

[procedure] (array-transform proc shape array)

The procedure proc specifies an affine function that returns an index of array when given an index of the returned array. The array does not retain a dependency on proc. (SRFI 25 share-array.)

[procedure] (array-reshape array shape)

Returns an array with the specified bounds whose elements in lexicographic order are the same as the elements of array in lexicographic order. It is an error if there are too many or too few elements. (APL reshape.)

[procedure] (array-diagonal array)

Returns a one-dimensional array which contains the diagonal elements of array (that is, the elements whose indices are all the same integer).

[procedure] (array-reverse array axis)

Returns an array with the same bounds as array, but whose elements on the specified axis are reversed. (APL reverse.)

[procedure] (array-transpose array)

Returns an array whose axes appear in the reverse order of the axes of array. This implies that the upper and lower bound are the reverse of the bounds of array. (APL monadic transpose.)

[procedure] (array-rearrange-axes array order)

Returns an array whose axes are an arbitrary permutation of the axes of array. Vec specifies how to do the permutation: the axis whose number appears in the first element of vector appears as the first axis of the result, and so on. (APL dyadic transpose with integer-valued vector.)

[procedure] (array-slice array start end [step])

Returns a smaller array with the same rank as array whose elements begin at the "lower left" corner specified by start and end at the "upper right" corner specified by end. The optional step parameter describes how many elements to move along each axis (default a vector of rank length of all ones). Unlike array-copy, the result shares its storage object with array. (APL take and drop.)

[procedure] (array-squeeze array [axes])

Returns an array with the ranks specified by the elements of vector removed from array. It is an error if the extents of the specified ranks are not equal to 1. (NumPy squeeze)

[procedure] (array-unsqueeze array rank)

Returns an array whose rank is one greater than the rank of array. This is accomplished by inserting a new axis numbered 'axis' whose extent is (0:1). (NumPy expand.)

Copying and Conversion

These procedures return arrays which do not share their storage objects with any existing arrays.

When these procedures return arrays, both the array and the underlying storage object are newly allocated

[procedure] (array-copy array [(mutable? #t)])

Returns an array containing the elements of array. The stride and offset are implementation-defined. The resulting array is mutable if mutable? is true. The returned array has the same storage class as array.

[procedure] (array-copy! from to)

Copies the elements of array from the "to" argument to the "from" argument starting at element "at". It is an error if there are not enough elements in "to" to make this possible. The storage objects of from and to need not belong to the same storage classes.

[procedure] (array-append axis array #!rest arrays)

Returns a newly allocated array consisting of the arrays concatenated along axis. It is an error unless the storage classes of the arrays are the same, and the result has the same storage class. It is also an error unless the bounds of all the arrays are the same, with the possible exception of axis. The axisth element of the lower bound of the result is 0; the corresponding element of the upper bound is the sum of the extents of the arrays.

[procedure] (array-repeat array axis repeat)

Append repeat copies of array along axis axis, as if by array-append. (Variant of NumPy repeat.)

[procedure] (array-reclassify array storage-class)

Returns an array with the same bounds and elements as array, but whose storage class is specified by storage-class.

[procedure] (array->nested-vector array)
[procedure] (array->nested-list array)

Returns a newly allocated vector/list whose elements are also newly constructed vectors/lists, and so on as far down as necessary to cover every axis of the array. Bottom-level Scheme vectors/lists contain the elements of array. Thus, if array has rank 1, the result is a vector/list; if the array has rank 2, the result is a vector/list whose elements are vectors/lists, and so on. As a special case, if array has rank 0, the sole element is returned.

[procedure] (nested-vector->array nested-vector storage-class rank)
[procedure] (nested-list->array nested-list storage-class rank)

Returns a newly allocated array of the storage-class with rank rank whose elements are initialized from the vector nested-vector or list nested-list. It is an error if this argument is not rectangular. As a special case, if rank is 0, the sole element is nested-vector or nested-list, which in that case need not be a Scheme vector/list.

Input / Output

The external representation of an array consists of a # character, followed by the letter a, followed by a sequence of digits indicating the rank of the array, followed by a coded representation of the storage class, all with no intervening whitespace. This prefix is followed, after optional whitespace, by the representation of a nested list produced as if by array->nested-list. The prefix is interpreted as lower-case.

Examples:

   #1;> #a2v((1 0 0) (0 1 0) (0 0 1))
   #a2v((1 0 0) (0 1 0) (0 0 1))
   #2;> (make-vector u32vector-storage-class #(2 2) 5)
   #a2u32((5 5) (5 5))
[procedure] (array-read [(input-port (current-input-port))])

Reads the external representation of an array from input-port (the current input port if input-port is not specified) and returns the corresponding array. If the coded representation of the storage class is not recognized, vector-storage-class is used; this permits the addition of new coded storage classes in a backward compatible way.

[procedure] (array-write array [(output-port (current-output-port))])

Writes the external representation of array from output-port (the current output port if output-port is not specified) and returns an unspecified value.

Repository

Bitbucket Repository

Version History

0.0.3
Initial release of generalized-arrays and storage-class modules.

License

   ;;;; Copyright (c) 2017, Jeremy Steward
   ;;;; All rights reserved.
   ;;;;
   ;;;; Redistribution and use in source and binary forms, with or without
   ;;;; modification, are permitted provided that the following conditions are met:
   ;;;;
   ;;;; 1. Redistributions of source code must retain the above copyright notice,
   ;;;; this list of conditions and the following disclaimer.
   ;;;;
   ;;;; 2. Redistributions in binary form must reproduce the above copyright notice,
   ;;;; this list of conditions and the following disclaimer in the documentation
   ;;;; and/or other materials provided with the distribution.
   ;;;;
   ;;;; 3. Neither the name of the copyright holder nor the names of its
   ;;;; contributors may be used to endorse or promote products derived from this
   ;;;; software without specific prior written permission.
   ;;;;
   ;;;; THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
   ;;;; AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
   ;;;; IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
   ;;;; ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
   ;;;; LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
   ;;;; CONSEQUENTIAL DAMAGES  INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
   ;;;; SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
   ;;;; INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
   ;;;; CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
   ;;;; ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
   ;;;; POSSIBILITY OF SUCH DAMAGE.