SRFI-135: Immutable Texts
This SRFI specifies a new data type of immutable texts. The operations of this new data type include analogues for all of the non-mutating operations on strings specified by the R7RS and most of those specified by SRFI 130, but the immutability of texts and uniformity of character-based indexing simplify the specification of those operations while avoiding several inefficiencies associated with the mutability of Scheme's strings.
This egg provides the UTF-8 version of the SRFI 135 sample implementation; all procedures provided are fully Unicode-aware.
This egg provides some extensions to SRFI 135, including I/O procedures for texts. To use these extensions, import (srfi 135 extensions).
Author
William D. Clinger
SRFI Description
This page includes excerpts from the SRFI document, but is primarily intended to document the forms exported by the egg. For a full description of this SRFI, see the full SRFI document.
Conceptual model
Immutable texts are like strings except they can't be mutated.
Immutability makes it easier to use space-efficient representations such as UTF-8 and UTF-16 without incurring the cost of scanning from the beginning when character indexes are used (as with string-ref).
When mutation is not needed, immutable texts are likely to be more efficient than strings with respect to space or time. In some implementations, immutable texts may be more efficient than strings with respect to both space and time.
Subtypes
This SRFI defines two new types:
- text is a type consisting of the immutable texts for which text? returns true.
- textual is a union type consisting of the texts and strings for which textual? returns true.
The subtypes of the new textual type include the new text type and Scheme's traditional string type, which consists of the values for which string? returns true. The string type includes both mutable strings and the (conceptually) immutable strings that are the values of string literals and calls to symbol->string.
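The relationship between the three types can be checked directly with the predicates documented below. This is a minimal sketch that assumes the egg is installed and importable as (srfi 135); the names t and s are only illustrative.

```scheme
(import (srfi 135))

(define t (string->text "abc"))  ; an immutable text
(define s (make-string 3 #\a))   ; an ordinary mutable string

(text? t)      ; true: t is a text
(string? t)    ; false: texts and strings are disjoint
(textual? t)   ; true: every text is a textual
(text? s)      ; false: a string is never a text
(textual? s)   ; true: every string is a textual
(textual? 42)  ; false: neither a text nor a string
```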
Notation
In the following procedure specifications:
- A text argument is an immutable text.
- A textual argument is an immutable text or a string.
- A char argument is a character.
- An idx argument is an exact non-negative integer specifying a valid character index into a text or string. The valid character indexes of a text or string textual of length n are the exact integers idx satisfying 0 ≤ idx < n.
- A k argument or result is a position: an exact non-negative integer that is either a valid character index for one of the textual arguments or is the length of a textual argument.
- start and end arguments are positions specifying a half-open interval of indexes for a subtext or substring. When omitted, start defaults to 0 and end to the length of the corresponding textual argument. It is an error unless 0 ≤ start ≤ end ≤ (textual-length textual).
- A len or nchars argument is an exact non-negative integer specifying some number of characters, usually the length of a text or string.
- A pred argument is a unary character predicate, taking a character as its one argument and returning a value that will be interpreted as true or false. Unless noted otherwise, as with textual-every and textual-any, all predicates passed to procedures specified in this SRFI may be called in any order and any number of times. It is an error if pred has side effects or does not behave functionally (returning the same result whenever it is called with the same character); the implementation does not detect those errors.
- An obj argument may be any value at all.
It is an error to pass values that violate the specification above.
Arguments given in square brackets are optional. Unless otherwise noted in the text describing the procedure, any prefix of these optional arguments may be supplied, from zero arguments to the full list. When a procedure returns multiple values, this is shown by listing the return values in square brackets, as well.
Procedures
Predicates
[procedure] (text? obj) → boolean
Is obj an immutable text? In particular, (text? obj) returns false if (string? obj) returns true, which implies string? returns false if text? returns true.
[procedure] (textual? obj) → boolean
Returns true if and only if obj is an immutable text or a string.
[procedure] (textual-null? textual) → boolean
Returns true if and only if textual is the empty text or the empty string.
[procedure] (textual-every pred textual [start end]) → value
[procedure] (textual-any pred textual [start end]) → value
Checks to see if every/any character in textual satisfies pred, proceeding from left (index start) to right (index end). These procedures are short-circuiting: if pred returns false, textual-every does not call pred on subsequent characters; if pred returns true, textual-any does not call pred on subsequent characters. Both procedures are "witness-generating":
- If textual-every is given an empty interval (with start = end), it returns #t.
- If textual-every returns true for a non-empty interval (with start < end), the returned true value is the one returned by the final call to the predicate on (text-ref (textual-copy textual) (- end 1)).
- If textual-any returns true, the returned true value is the one returned by the predicate.
Note: The names of these procedures do not end with a question mark. This indicates a general value is returned instead of a simple boolean (#t or #f).
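The witness-generating behaviour is easiest to see with a predicate that returns something more informative than #t. The sketch below assumes the egg is importable as (srfi 135); digit-or-false is a hypothetical helper introduced only for this illustration.

```scheme
(import (srfi 135))

;; Hypothetical helper: returns the character itself when it is a digit,
;; and #f otherwise, so the "witness" is the matching character.
(define (digit-or-false c)
  (and (char-numeric? c) c))

(textual-any digit-or-false "ab7c9")      ; ⇒ #\7, the first witness found
(textual-any digit-or-false "abc")        ; ⇒ #f, no character satisfies it
(textual-every digit-or-false "2024")     ; ⇒ #\4, value of the final call
(textual-every digit-or-false "20x4")     ; ⇒ #f, stops at #\x
(textual-every digit-or-false "abc" 1 1)  ; ⇒ #t, the interval is empty
```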
Constructors
[procedure] (make-text len char) → text
Returns a text of the given length filled with the given character.
[procedure] (text char ...) → text
Returns a text consisting of the given characters.
[procedure] (text-tabulate proc len) → text
proc is a procedure that accepts an exact integer as its argument and returns a character. Constructs a text of size len by calling proc on each value from 0 (inclusive) to len (exclusive) to produce the corresponding element of the text. The order in which proc is called on those indexes is not specified.
[procedure] (text-unfold stop? mapper successor seed [base make-final]) → text
This is a fundamental constructor for texts.
- successor is used to generate a series of "seed" values from the initial seed: seed, (successor seed), (successor^2 seed), (successor^3 seed), ...
- stop? tells us when to stop — when it returns true when applied to one of these seed values.
- mapper maps each seed value to the corresponding character(s) in the result text, which are assembled into that text in left-to-right order. It is an error for mapper to return anything other than a character, string, or text.
- base is the optional initial/leftmost portion of the constructed text, which defaults to the empty text (text). It is an error if base is anything other than a character, string, or text.
- make-final is applied to the terminal seed value (on which stop? returns true) to produce the final/rightmost portion of the constructed text. It defaults to (lambda (x) (text)). It is an error for make-final to return anything other than a character, string, or text.
text-unfold is a fairly powerful text constructor. You can use it to convert a list to a text, read a port into a text, reverse a text, copy a text, and so forth. Examples:
(port->text p) = (text-unfold eof-object? values
                              (lambda (x) (read-char p))
                              (read-char p))
(list->text lis) = (text-unfold null? car cdr lis)
(text-tabulate f size) = (text-unfold (lambda (i) (= i size)) f add1 0)
;; To map f over a list lis, producing a text:
(text-unfold null? (compose f car) cdr lis)
[procedure] (text-unfold-right stop? mapper successor seed [base make-final]) → text
This is a fundamental constructor for texts. It is the same as text-unfold except the results of mapper are assembled into the text in right-to-left order, base is the optional rightmost portion of the constructed text, and make-final produces the leftmost portion of the constructed text.
(text-unfold-right (lambda (n) (< n (char->integer #\A)))
                   (lambda (n) (char-downcase (integer->char n)))
                   (lambda (n) (- n 1))
                   (char->integer #\Z)
                   #\space
                   (lambda (n) " The English alphabet: "))
⇒ « The English alphabet: abcdefghijklmnopqrstuvwxyz »
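As a further illustration of right-to-left assembly, the sketch below builds the decimal digits of a non-negative integer, least significant digit first. It assumes the egg is importable as (srfi 135); nonnegative-integer->text is a hypothetical name, not part of the SRFI or the egg.

```scheme
(import (srfi 135))

;; Hypothetical helper: decimal digits of a non-negative exact integer.
;; Each seed contributes its last digit; because text-unfold-right
;; assembles results right to left, the digits come out in reading order.
(define (nonnegative-integer->text n)
  (text-unfold-right zero?                                ; stop at 0
                     (lambda (k)                          ; last digit of k
                       (integer->char (+ (char->integer #\0)
                                         (remainder k 10))))
                     (lambda (k) (quotient k 10))         ; drop last digit
                     n))

(textual->string (nonnegative-integer->text 2024))  ; ⇒ "2024"
```

With this definition a seed of 0 stops immediately and yields the empty text, so a caller that wants «0» for zero would have to special-case it.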
Conversion
[procedure] (textual->text textual) → text
When given a text, textual->text just returns that text. When given a string, textual->text returns the result of calling string->text on that string. Signals an error when its argument is neither string nor text.
[procedure] (textual->string textual [start end]) → string
[procedure] (textual->vector textual [start end]) → char-vector
[procedure] (textual->list textual [start end]) → char-list
textual->string, textual->vector, and textual->list return a newly allocated (unless empty) mutable string, vector, or list of the characters that make up the given subtext or substring.
[procedure] (string->text string [start end]) → text
[procedure] (vector->text char-vector [start end]) → text
[procedure] (list->text char-list [start end]) → text
These procedures return a text containing the characters of the given substring, subvector, or sublist. The behavior of the text will not be affected by subsequent mutation of the given string, vector, or list.
[procedure] (reverse-list->text char-list) → text
An efficient implementation of (compose list->text reverse):
(reverse-list->text '(#\a #\B #\c)) ⇒ «cBa»
This is a common idiom in the epilogue of text-processing loops that accumulate their result using a list in reverse order. (See also textual-concatenate-reverse for the "chunked" variant.)
[procedure] (textual->utf8 textual [start end]) → bytevector
[procedure] (textual->utf16 textual [start end]) → bytevector
[procedure] (textual->utf16be textual [start end]) → bytevector
[procedure] (textual->utf16le textual [start end]) → bytevector
These procedures return a newly allocated (unless empty) bytevector containing a UTF-8 or UTF-16 encoding of the given subtext or substring.
The bytevectors returned by textual->utf8, textual->utf16be, and textual->utf16le do not contain a byte-order mark (BOM). textual->utf16be returns a big-endian encoding, while textual->utf16le returns a little-endian encoding.
The bytevectors returned by textual->utf16 begin with a BOM that declares an implementation-dependent endianness, and the bytevector elements following that BOM encode the given subtext or substring using that endianness.
[procedure] (utf8->text bytevector [start end]) → text
[procedure] (utf16->text bytevector [start end]) → text
[procedure] (utf16be->text bytevector [start end]) → text
[procedure] (utf16le->text bytevector [start end]) → text
These procedures interpret their bytevector argument as a UTF-8 or UTF-16 encoding of a sequence of characters, and return a text containing that sequence.
The bytevector subrange given to utf16->text may begin with a byte order mark (BOM); if so, that BOM determines whether the rest of the subrange is to be interpreted as big-endian or little-endian; in either case, the BOM will not become a character in the returned text. If the subrange does not begin with a BOM, it is decoded using the same implementation-dependent endianness used by textual->utf16.
The utf16be->text and utf16le->text procedures interpret their inputs as big-endian or little-endian, respectively. If a BOM is present, it is treated as a normal character and will become part of the result.
It is an error if the bytevector subrange given to utf8->text contains invalid UTF-8 byte sequences. For the other three procedures, it is an error if start or end are odd, or if the bytevector subrange contains invalid UTF-16 byte sequences.
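The encoding and decoding procedures are intended to round-trip. The following sketch, which assumes the egg is importable as (srfi 135) and uses an ASCII-only sample for simplicity, checks that property and shows that the BOM added by textual->utf16 does not end up in the decoded text.

```scheme
(import (srfi 135))

(define original (string->text "hello"))

;; UTF-8 round trip: encode, then decode.
(textual=? original (utf8->text (textual->utf8 original)))        ; ⇒ #t

;; textual->utf16 prepends a BOM; utf16->text consumes it again,
;; so the decoded text still has five characters.
(text-length (utf16->text (textual->utf16 original)))             ; ⇒ 5

;; Explicit endianness, no BOM on either side.
(textual=? original (utf16le->text (textual->utf16le original)))  ; ⇒ #t
```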
Selection
[procedure] (text-length text) → len
Returns the number of characters within the given text.
[procedure] (text-ref text idx) → char
Returns character text[idx], using 0-origin indexing.
[procedure] (textual-length textual) → len
[procedure] (textual-ref textual idx) → char
textual-length returns the number of characters in textual, and textual-ref returns the character at character index idx, using 0-origin indexing. These procedures are the generalizations of text-length and text-ref to accept strings as well as texts. If textual is a text, they must execute in O(1) time, but there is no such requirement if textual is a string.
[procedure] (subtext text start end) → text
[procedure] (subtextual textual start end) → text
These procedures return a text containing the characters of text or textual beginning with index start (inclusive) and ending with index end (exclusive).
If textual is a string, then that string does not share any storage with the result, so subsequent mutation of that string will not affect the text returned by subtextual. When the first argument is a text, as is required by subtext, the implementation returns a result that shares storage with that text. These procedures just return their first argument when that argument is a text, start is 0, and end is the length of that text.
[procedure] (textual-copy textual [start end]) → text
Returns a text containing the characters of textual beginning with index start (inclusive) and ending with index end (exclusive).
Unlike subtext and subtextual, the result of textual-copy never shares substructures that would retain characters or sequences of characters that are substructures of its first argument or previously allocated objects.
If textual-copy returns an empty text, that empty text may be eq? or eqv? to the text returned by (text). If the text returned by textual-copy is non-empty, then it is not eqv? to any previously extant object.
[procedure] (textual-take textual nchars) → text
[procedure] (textual-drop textual nchars) → text
[procedure] (textual-take-right textual nchars) → text
[procedure] (textual-drop-right textual nchars) → text
textual-take returns a text containing the first nchars of textual; textual-drop returns a text containing all but the first nchars of textual. textual-take-right returns a text containing the last nchars of textual; textual-drop-right returns a text containing all but the last nchars of textual.
If textual is a string, then that string does not share any storage with the result, so subsequent mutation of that string will not affect the text returned by these procedures. If textual is a text, the result shares storage with that text.
(textual-take "Pete Szilagyi" 6) ⇒ «Pete S»
(textual-drop "Pete Szilagyi" 6) ⇒ «zilagyi»
(textual-take-right "Beta rules" 5) ⇒ «rules»
(textual-drop-right "Beta rules" 5) ⇒ «Beta »
;; It is an error to take or drop more characters than are
;; in the text:
(textual-take "foo" 37) ⇒ error
[procedure] (textual-pad textual len [char start end]) → text
[procedure] (textual-pad-right textual len [char start end]) → text
Returns a text of length len comprised of the characters drawn from the given subrange of textual, padded on the left (right) by as many occurrences of the character char as needed. If textual has more than len chars, it is truncated on the left (right) to length len. char defaults to #\space.
If textual is a string, then that string does not share any storage with the result, so subsequent mutation of that string will not affect the text returned by these procedures. If textual is a text, the result shares storage with that text whenever sharing would be space-efficient.
(textual-pad "325" 5) ⇒ «  325»
(textual-pad "71325" 5) ⇒ «71325»
(textual-pad "8871325" 5) ⇒ «71325»
[procedure] (textual-trim textual [pred start end]) → text
[procedure] (textual-trim-right textual [pred start end]) → text
[procedure] (textual-trim-both textual [pred start end]) → text
Returns a text obtained from the given subrange of textual by skipping over all characters on the left / on the right / on both sides that satisfy the second argument pred: pred defaults to char-whitespace?.
If textual is a string, then that string does not share any storage with the result, so subsequent mutation of that string will not affect the text returned by these procedures. If textual is a text, the result shares storage with that text whenever sharing would be space-efficient.
(textual-trim-both " The outlook wasn't brilliant, \n\r") ⇒ «The outlook wasn't brilliant,»
Replacement
[procedure] (textual-replace textual1 textual2 start1 end1 [start2 end2]) → text
Returns
(textual-append (subtextual textual1 0 start1)
                (subtextual textual2 start2 end2)
                (subtextual textual1 end1 (textual-length textual1)))
That is, the segment of characters in textual1 from start1 to end1 is replaced by the segment of characters in textual2 from start2 to end2. If start1 = end1, this simply splices the characters drawn from textual2 into textual1 at that position.
Examples:
(textual-replace "The TCL programmer endured daily ridicule."
                 "another miserable perl drone" 4 7 8 22)
⇒ «The miserable perl programmer endured daily ridicule.»
(textual-replace "It's easy to code it up in Scheme." "lots of fun" 5 9)
⇒ «It's lots of fun to code it up in Scheme.»
(define (textual-insert s i t) (textual-replace s t i i))
(textual-insert "It's easy to code it up in Scheme." 5 "really ")
⇒ «It's really easy to code it up in Scheme.»
(define (textual-set s i c) (textual-replace s (text c) i (+ i 1)))
(textual-set "Text-ref runs in O(n) time." 19 #\1)
⇒ «Text-ref runs in O(1) time.»
Comparison
[procedure] (textual=? textual1 textual2 textual3 ...) → boolean
Returns #t if all the texts have the same length and contain exactly the same characters in the same positions; otherwise returns #f.
[procedure] (textual<? textual1 textual2 textual3 ...) → boolean
[procedure] (textual>? textual1 textual2 textual3 ...) → boolean
[procedure] (textual<=? textual1 textual2 textual3 ...) → boolean
[procedure] (textual>=? textual1 textual2 textual3 ...) → boolean
These procedures compare their arguments lexicographically and return #t if they are (respectively): monotonically increasing, monotonically decreasing, monotonically non-decreasing, or monotonically non-increasing.
These comparison predicates are transitive.
[procedure] (textual-ci=? textual1 textual2 textual3 ...) → boolean
Returns #t if, after calling textual-foldcase on each of the arguments, all of the case-folded texts would have the same length and contain the same characters in the same positions; otherwise returns #f.
[procedure] (textual-ci<? textual1 textual2 textual3 ...) → boolean
[procedure] (textual-ci>? textual1 textual2 textual3 ...) → boolean
[procedure] (textual-ci<=? textual1 textual2 textual3 ...) → boolean
[procedure] (textual-ci>=? textual1 textual2 textual3 ...) → boolean
These procedures behave as though they had called textual-foldcase on their arguments before applying the corresponding procedures without "-ci".
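Because the comparison procedures accept any textual, strings and texts can be mixed freely in one call. A short sketch, assuming the egg is importable as (srfi 135):

```scheme
(import (srfi 135))

;; A string and a text with the same characters compare equal.
(textual=? "abc" (text #\a #\b #\c))                       ; ⇒ #t

;; With more than two arguments, the whole sequence must be ordered.
(textual<? "apple" (string->text "banana") "cherry")       ; ⇒ #t
(textual<? "apple" "cherry" "banana")                      ; ⇒ #f
(textual<=? "a" "a" "b")                                   ; ⇒ #t

;; The -ci variants compare after case folding.
(textual-ci=? "Scheme" "SCHEME" (string->text "scheme"))   ; ⇒ #t
```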
Prefixes & suffixes
[procedure] (textual-prefix-length textual1 textual2 [start1 end1 start2 end2]) → integer
[procedure] (textual-suffix-length textual1 textual2 [start1 end1 start2 end2]) → integer
Return the length of the longest common prefix/suffix of textual1 and textual2. For prefixes, this is equivalent to their "mismatch index" (relative to the start indexes).
The optional start/end indexes restrict the comparison to the indicated subtexts of textual1 and textual2.
[procedure] (textual-prefix? textual1 textual2 [start1 end1 start2 end2]) → boolean
[procedure] (textual-suffix? textual1 textual2 [start1 end1 start2 end2]) → boolean
Is textual1 a prefix/suffix of textual2?
The optional start/end indexes restrict the comparison to the indicated subtexts of textual1 and textual2.
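A few sample calls may make the prefix and suffix procedures concrete. This sketch assumes the egg is importable as (srfi 135); the sample words are arbitrary.

```scheme
(import (srfi 135))

;; "prefix" and "preface" share the four characters "pref".
(textual-prefix-length "prefix" "preface")     ; ⇒ 4

;; "testing" and "running" share the three characters "ing".
(textual-suffix-length "testing" "running")    ; ⇒ 3

(textual-prefix? "pre" "prefix")               ; ⇒ #t
(textual-suffix? "fix" "prefix")               ; ⇒ #t
(textual-prefix? "fix" "prefix")               ; ⇒ #f

;; start1/end1 select the subtext "pre" of the first argument.
(textual-prefix-length "xxpre" "preface" 2 5)  ; ⇒ 3
```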
Searching
[procedure] (textual-index textual pred [start end]) → idx-or-false
[procedure] (textual-index-right textual pred [start end]) → idx-or-false
[procedure] (textual-skip textual pred [start end]) → idx-or-false
[procedure] (textual-skip-right textual pred [start end]) → idx-or-false
textual-index searches through the given subtext or substring from the left, returning the index of the leftmost character satisfying the predicate pred. textual-index-right searches from the right, returning the index of the rightmost character satisfying the predicate pred. If no match is found, these procedures return #f.
The start and end arguments specify the beginning and end of the search; the valid indexes relevant to the search include start but exclude end. Beware of "fencepost" errors: when searching right-to-left, the first index considered is (- end 1), whereas when searching left-to-right, the first index considered is start. That is, the start/end indexes describe the same half-open interval [start,end) in these procedures that they do in all other procedures specified by this SRFI.
The skip functions are similar, but use the complement of the criterion: they search for the first char that doesn't satisfy pred. To skip over initial whitespace, for example, say
(subtextual text
            (or (textual-skip text char-whitespace?)
                (textual-length text))
            (textual-length text))
These functions can be trivially composed with textual-take and textual-drop to produce take-while, drop-while, span, and break procedures without loss of efficiency.
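One possible way to write two of those compositions is sketched below, assuming the egg is importable as (srfi 135). The names textual-take-while and textual-drop-while are hypothetical; they are not exported by the SRFI or by this egg.

```scheme
(import (srfi 135))

;; Index of the first character that fails pred, or the length of
;; textual when every character satisfies pred.
(define (first-mismatch textual pred)
  (or (textual-skip textual pred)
      (textual-length textual)))

;; Hypothetical helper: longest leading run of characters satisfying pred.
(define (textual-take-while textual pred)
  (textual-take textual (first-mismatch textual pred)))

;; Hypothetical helper: everything after that leading run.
(define (textual-drop-while textual pred)
  (textual-drop textual (first-mismatch textual pred)))

(textual->string (textual-take-while "123abc" char-numeric?))  ; ⇒ "123"
(textual->string (textual-drop-while "123abc" char-numeric?))  ; ⇒ "abc"
```

The fallback to textual-length matters because textual-skip returns #f when every character satisfies pred.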
[procedure] (textual-contains textual1 textual2 [start1 end1 start2 end2]) → idx-or-false
[procedure] (textual-contains-right textual1 textual2 [start1 end1 start2 end2]) → idx-or-false
Does the subtext of textual1 specified by start1 and end1 contain the sequence of characters given by the subtext of textual2 specified by start2 and end2?
Returns #f if there is no match. If start2 = end2, textual-contains returns start1 but textual-contains-right returns end1. Otherwise returns the index in textual1 for the first character of the first/last match; that index lies within the half-open interval [start1,end1), and the match lies entirely within the [start1,end1) range of textual1.
;; Searches "a geek"
(textual-contains "eek -- what a geek." "ee" 12 18) ⇒ 15
Note: The names of these procedures do not end with a question mark. This indicates a useful value is returned when there is a match.
Case conversion
[procedure] (textual-upcase textual) → text
[procedure] (textual-downcase textual) → text
[procedure] (textual-foldcase textual) → text
[procedure] (textual-titlecase textual) → text
These procedures return the text obtained by applying Unicode's full uppercasing, lowercasing, case-folding, or title-casing algorithms to their argument. In some cases, the length of the result may be different from the length of the argument. Note that language-sensitive mappings and foldings are not used.
Concatenation
[procedure] (textual-append textual ...) → text
Returns a text whose sequence of characters is the concatenation of the sequences of characters in the given arguments.
[procedure] (textual-concatenate textual-list) → text
Concatenates the elements of textual-list together into a single text.
If any elements of textual-list are strings, then those strings do not share any storage with the result, so subsequent mutation of those strings will not affect the text returned by this procedure. The result shares storage with the texts in the list if that sharing would be space-efficient.
[procedure] (textual-concatenate-reverse textual-list [final-textual end]) → text
With no optional arguments, calling this procedure is equivalent to
(textual-concatenate (reverse textual-list))
If the optional argument final-textual is specified, it is effectively consed onto the beginning of textual-list before performing the list-reverse and textual-concatenate operations.
If the optional argument end is given, only the characters up to but not including end in final-textual are added to the result, thus producing
(textual-concatenate
  (reverse (cons (subtextual final-textual 0 end)
                 textual-list)))
For example:
(textual-concatenate-reverse '(" must be" "Hello, I") " going.XXXX" 7)
⇒ «Hello, I must be going.»
[procedure] (textual-join textual-list [delimiter grammar]) → text
This procedure is a simple unparser; it pastes texts together using the delimiter text.
textual-list is a list of texts and/or strings. delimiter is a text or a string. The grammar argument is a symbol that determines how the delimiter is used, and defaults to infix. It is an error for grammar to be any symbol other than these four:
- infix means an infix or separator grammar: insert the delimiter between list elements. An empty list will produce an empty text.
- strict-infix means the same as infix if the textual-list is non-empty, but will signal an error if given an empty list. (This avoids an ambiguity shown in the examples below.)
- suffix means a suffix or terminator grammar: insert the delimiter after every list element.
- prefix means a prefix grammar: insert the delimiter before every list element.
The delimiter is the text used to delimit elements; it defaults to a single space " ".
(textual-join '("foo" "bar" "baz")) ⇒ «foo bar baz»
(textual-join '("foo" "bar" "baz") "") ⇒ «foobarbaz»
(textual-join '("foo" "bar" "baz") «:») ⇒ «foo:bar:baz»
(textual-join '("foo" "bar" "baz") ":" 'suffix) ⇒ «foo:bar:baz:»
;; Infix grammar is ambiguous wrt empty list vs. empty text:
(textual-join '() ":") ⇒ «»
(textual-join '("") ":") ⇒ «»
;; Suffix and prefix grammars are not:
(textual-join '() ":" 'suffix) ⇒ «»
(textual-join '("") ":" 'suffix) ⇒ «:»
Fold, map & friends
[procedure] (textual-fold kons knil textual [start end]) → value
[procedure] (textual-fold-right kons knil textual [start end]) → value
These are the fundamental iterators for texts.
The textual-fold procedure maps the kons procedure across the given text or string from left to right:
(... (kons textual[2] (kons textual[1] (kons textual[0] knil))))
In other words, textual-fold obeys the (tail) recursion
(textual-fold kons knil textual start end) = (textual-fold kons (kons textual[start] knil) textual start+1 end)
The textual-fold-right procedure maps kons across the given text or string from right to left:
(kons textual[0]
      (... (kons textual[end-3]
                 (kons textual[end-2]
                       (kons textual[end-1] knil)))))
obeying the (tail) recursion
(textual-fold-right kons knil textual start end) = (textual-fold-right kons (kons textual[end-1] knil) textual start end-1)
Examples:
;; Convert a text or string to a list of chars.
(textual-fold-right cons '() textual)
;; Count the number of lower-case characters in a text or string.
(textual-fold (lambda (c count)
                (if (char-lower-case? c)
                    (+ count 1)
                    count))
              0
              textual)
The textual-fold-right combinator is sometimes called a "catamorphism."
[procedure] (textual-map proc textual1 textual2 ...) → text
It is an error if proc does not accept as many arguments as the number of textual arguments passed to textual-map, does not accept characters as arguments, or returns a value that is not a character, string, or text.
The textual-map procedure applies proc element-wise to the characters of the textual arguments, converts each value returned by proc to a text, and returns the concatenation of those texts. If more than one textual argument is given and not all have the same length, then textual-map terminates when the shortest textual argument runs out. The dynamic order in which proc is called on the characters of the textual arguments is unspecified, as is the dynamic order in which the coercions are performed. If any strings returned by proc are mutated after they have been returned and before the call to textual-map has returned, then textual-map returns a text with unspecified contents; the textual-map procedure itself does not mutate those strings.
Example:
(textual-map (lambda (c0 c1 c2)
               (case c0
                 ((#\1) c1)
                 ((#\2) (string c2))
                 ((#\-) (text #\- c1))))
             (string->text "1222-1111-2222")
             (string->text "Hi There!")
             (string->text "Dear John"))
⇒ «Hear-here!»
[procedure] (textual-for-each proc textual1 textual2 ...) → unspecified
It is an error if proc does not accept as many arguments as the number of textual arguments passed to textual-for-each or does not accept characters as arguments.
The textual-for-each procedure applies proc element-wise to the characters of the textual arguments, going from left to right. If more than one textual argument is given and not all have the same length, then textual-for-each terminates when the shortest textual argument runs out.
[procedure] (textual-map-index proc textual [start end]) → text
Calls proc on each valid index of the specified subtext or substring, converts the results of those calls into texts, and returns the concatenation of those texts. It is an error for proc to return anything other than a character, string, or text. The dynamic order in which proc is called on the indexes is unspecified, as is the dynamic order in which the coercions are performed. If any strings returned by proc are mutated after they have been returned and before the call to textual-map-index has returned, then textual-map-index returns a text with unspecified contents; the textual-map-index procedure itself does not mutate those strings.
[procedure] (textual-for-each-index proc textual [start end]) → unspecified
Calls proc on each valid index of the specified subtext or substring, in increasing order, discarding the results of those calls. This is simply a safe and correct way to loop over a subtext or substring.
Example:
(let ((txt (string->text "abcde"))
      (v '()))
  (textual-for-each-index
    (lambda (cur)
      (set! v (cons (char->integer (text-ref txt cur)) v)))
    txt)
  v)
⇒ (101 100 99 98 97)
[procedure] (textual-count textual pred [start end]) → integer
Returns a count of the number of characters in the specified subtext of textual that satisfy the given predicate.
[procedure] (textual-filter pred textual [start end]) → text
[procedure] (textual-remove pred textual [start end]) → text
Filter the given subtext of textual, retaining only those characters that satisfy / do not satisfy pred.
If textual is a string, then that string does not share any storage with the result, so subsequent mutation of that string will not affect the text returned by these procedures. If textual is a text, the result shares storage with that text whenever sharing would be space-efficient.
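The sketch below exercises the three procedures on small inputs, assuming the egg is importable as (srfi 135). Note the argument order: textual-count takes the textual first, while textual-filter and textual-remove take the predicate first.

```scheme
(import (srfi 135))

;; Hypothetical predicate used only for this illustration.
(define (is-a? c) (char=? c #\a))

(textual-count "banana" is-a?)                             ; ⇒ 3
(textual-count "banana" is-a? 0 3)                         ; ⇒ 1, only "ban"

(textual->string (textual-filter char-alphabetic? "a1b2c3"))  ; ⇒ "abc"
(textual->string (textual-remove char-alphabetic? "a1b2c3"))  ; ⇒ "123"
```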
Replication & splitting
[procedure] (textual-replicate textual from to [start end]) → text
This is an "extended subtext" procedure that implements replicated copying of a subtext or substring.
textual is a text or string; start and end are optional arguments that specify a subtext of textual, defaulting to 0 and the length of textual. This subtext is conceptually replicated both up and down the index space, in both the positive and negative directions. For example, if textual is "abcdefg", start is 3, and end is 6, then we have the conceptual bidirectionally-infinite text
...  d  e  f  d  e  f  d  e  f  d  e  f  d  e  f  d  e  f  d ...
... -9 -8 -7 -6 -5 -4 -3 -2 -1  0 +1 +2 +3 +4 +5 +6 +7 +8 +9 ...
textual-replicate returns the subtext of this text beginning at index from, and ending at to. It is an error if from is greater than to.
You can use textual-replicate to perform a variety of tasks:
- To rotate a text left: (textual-replicate "abcdef" 2 8) ⇒ «cdefab»
- To rotate a text right: (textual-replicate "abcdef" -2 4) ⇒ «efabcd»
- To replicate a text: (textual-replicate "abc" 0 7) ⇒ «abcabca»
Note that
- The from/to arguments give a half-open range containing the characters from index from up to, but not including, index to.
- The from/to indexes are not expressed in the index space of textual. They refer instead to the replicated index space of the subtext defined by textual, start, and end.
It is an error if start = end, unless from = to, which is allowed as a special case.
[procedure] (textual-split textual delimiter [grammar limit start end]) → list
Returns a list of texts representing the words contained in the subtext of textual from start (inclusive) to end (exclusive). The delimiter is a text or string to be used as the word separator. This will often be a single character, but multiple characters are allowed for use cases such as splitting on "\r\n". The returned list will have one more item than the number of non-overlapping occurrences of the delimiter in the text. If delimiter is empty, then each element of the returned list is a text containing a single character.
The grammar is a symbol with the same meaning as in the textual-join procedure. If it is infix, which is the default, processing is done as described above, except an empty textual produces the empty list; if grammar is strict-infix, then an empty textual signals an error. The values prefix and suffix cause a leading/trailing empty text in the result to be suppressed.
If limit is a non-negative exact integer, at most that many splits occur, and the remainder of textual is returned as the final element of the list (so the result will have at most limit+1 elements). If limit is not specified or is #f, then as many splits as possible are made. It is an error if limit is any other value.
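The delimiter, grammar, and limit arguments can be illustrated with a short sketch. It assumes CHICKEN 5 with the core module importable as (srfi 135); split->strings is a hypothetical helper, not part of the egg, that converts the resulting texts to strings so they are easy to print and compare.

```scheme
(import scheme (chicken base) (srfi 135))

;; Hypothetical helper: split, then convert each text to a string.
(define (split->strings . args)
  (map textual->string (apply textual-split args)))

;; Default infix grammar with a single-character delimiter.
(define fields (split->strings "a,b,c" ","))

;; A multi-character delimiter.
(define crlf-lines (split->strings "x\r\ny" "\r\n"))

;; suffix grammar: the trailing empty text is suppressed.
(define trailing (split->strings "a,b," "," 'suffix))

;; limit = 1: one split, then the remainder as the final element.
(define limited (split->strings "a,b,c" "," 'infix 1))
```

Following the description above, fields should be ("a" "b" "c"), crlf-lines ("x" "y"), trailing ("a" "b"), and limited ("a" "b,c").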
To split on a regular expression re, use SRFI 115's regexp-split procedure:
(map string->text (regexp-split re (textual->string txt)))
Extensions
The following procedures are extensions to SRFI 135. To use them, import (srfi 135 extensions).
Generators and accumulators
The following procedures provide SRFI 158 generators and accumulators for immutable texts. See SRFI 158 for more information on generators and accumulators.
[procedure] (textual->generator textual [[start] end]) → procedure
Returns a generator that produces the elements (codepoints) of textual in order. If the optional start or end arguments are provided, then only the elements of a subtext(ual) of textual will be produced.
Example:
(let ((gen (textual->generator (text #\a #\b #\c #\d) 2)))
  (generator->list gen))
⇒ (#\c #\d)
[procedure] (generator->text gen [max]) → text
Returns a text of the characters produced by gen, in order. If max is provided and is an exact natural number, then at most max values will be read from gen. Otherwise, reading continues until gen returns EOF.
Example:
(let ((gen (list->generator '(#\a #\b #\c #\d))))
(generator->text gen))
⇒ «abcd»
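The optional arguments of both procedures can be sketched as follows. This assumes CHICKEN 5 with this egg and the srfi-158 egg installed, importable as (srfi 135), (srfi 135 extensions), and (srfi 158); the variable names are illustrative only.

```scheme
(import scheme (chicken base)
        (srfi 135) (srfi 135 extensions) (srfi 158))

;; Generate only the characters at indexes 1 and 2 of «abcd».
(define middle
  (generator->list (textual->generator (text #\a #\b #\c #\d) 1 3)))

;; Read at most two characters from a four-character generator.
(define first-two
  (generator->text (list->generator '(#\a #\b #\c #\d)) 2))
```

Given the descriptions above, middle should be (#\b #\c) and first-two should be «ab».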
[procedure] (text-accumulator) → procedure
Returns an accumulator that, when invoked on a character, appends that character to an (initially empty) internal text and returns an unspecified value. Invoking the accumulator on an eof-object instead returns the accumulated text.
Example:
(let ((acc (text-accumulator)))
(acc #\a)
(acc #\b)
(acc #\c)
(acc #!eof))
⇒ «abc»
I/O
[procedure] (text-read-line [port]) → text-or-eof
Analogous to R7RS read-line. Reads a newline-delimited line from port and returns it as a text, without the terminating newline. port defaults to the value of (current-input-port). If an EOF is encountered before a newline but after some input has been read, then a text containing this input is returned. If an EOF is encountered before any input has been read, then an EOF object is returned.
CRLF and CRCR…LF line terminators are not yet supported.
[procedure] (read-text max [port]) → text-or-eof
Analogous to R7RS read-string. Reads at most max characters from port (which defaults to the current input port) and returns the result as a text. If an EOF is encountered before any input has been read, then an EOF object is returned.
[procedure] (text-read-lines [port [max]]) → list[text]
Analogous to read-lines from (chicken io). Reads lines (as with text-read-line) from port and returns the result as a list. port defaults to the current input port. If max is supplied, then at most max lines are read; otherwise, reading continues until an EOF is encountered. If an EOF is encountered immediately, the empty list is returned.
[procedure] (write-textual textual [port [start [end]]]) → void
Analogous to R7RS write-string. Writes textual to port, which defaults to the current output port. If start or end are provided, then only a subtext(ual) of textual is written.
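The I/O procedures can be tried against ordinary string ports, as in the sketch below. It assumes CHICKEN 5 with the modules importable as (srfi 135) and (srfi 135 extensions); the port contents and variable names are made up for illustration.

```scheme
(import scheme (chicken base) (srfi 135) (srfi 135 extensions))

(define in (open-input-string "alpha\nbeta\ngamma"))

;; Read one line; the terminating newline is not included.
(define first-line (text-read-line in))

;; Read the next two characters as a text.
(define chunk (read-text 2 in))

;; Read whatever lines remain, up to EOF.
(define remaining (text-read-lines in))

;; Write only the subtext from index 1 (inclusive) to 3 (exclusive).
(define out (open-output-string))
(write-textual (text #\a #\b #\c #\d) out 1 3)
(define written (get-output-string out))
```

Going by the descriptions above, first-line should be «alpha», chunk «be», remaining the list («ta» «gamma»), and written the string "bc".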
Text ports
Text ports are analogous to R7RS string ports.
Text ports are currently implemented as wrapped string ports. A direct implementation is a major TODO for this library.
The following procedures are extensions to SRFI 135.
[procedure] (open-input-textual textual) → input-port
Returns a textual input port which delivers characters from textual.
Example:
(let ((p (open-input-textual (text #\a #\b #\c #\newline))))
  (read-line p))
⇒ "abc"
[procedure] (open-output-text) → output-port
Returns a textual output port that accumulates characters for retrieval by get-output-text.
[procedure] (get-output-text port) → text
Returns a text of the characters written to port so far, in the order in which they were output. It is an error if port was not created by open-output-text.
Example:
(let ((p (open-output-text)))
(display "hello " p)
(display "dave" p)
(write-char #\newline p)
(get-output-text p))
⇒ «hello dave\n»
About This Egg
Dependencies
The following eggs are required:
To run the included tests, the test and srfi-158 eggs are also required.
Type and bound checks
Type declarations are provided for checking compiled code using this library. Extensive runtime checking is also performed on the types and bounds of arguments. Since these checks are implemented using assert, they can be disabled by compiling with the -unsafe option.
Maintainer
Wolfgang Corcoran-Mathe <wcm at sigwinch dot xyzzy without the zy>
Repository
Version History
- 0.1 (2020-11-12): Initial release.
- 0.2 (2021-08-31): Add types and bound checks, many small improvements.
- 0.3 (2021-09-06): Add generators, accumulator, and I/O procedures. Rewrite tests to use test.
- 1.0.0 (2022-09-27): Reorganize library. Move extensions to their own module. Improve type and bounds checks, and follow CHICKEN's condition protocol.
License
Copyright (C) William D Clinger (2016). All Rights Reserved.
Copyright (C) Wolfgang Corcoran-Mathe (2022)
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.