== SRFI-135: Immutable Texts

This SRFI specifies a new data type of immutable texts. The
operations of this new data type include analogues for all of the
non-mutating operations on strings specified by the R7RS and most of
those specified by SRFI 130, but the immutability of texts and
uniformity of character-based indexing simplify the specification of
those operations while avoiding several inefficiencies associated with
the mutability of Scheme's strings.

This egg provides the UTF-8 version of the SRFI 135 sample
implementation; all procedures provided are fully Unicode-aware.

It also provides some extensions to SRFI 135, including I/O procedures
for texts. To use these extensions, import {{(srfi 135 extensions)}}.

[[toc:]]

== Author

William D. Clinger

== SRFI Description

This page includes excerpts from the SRFI document, but is primarily
intended to document the forms exported by the egg. For a complete
description of this SRFI, see the
[[https://srfi.schemers.org/srfi-135/srfi-135.html|SRFI document]].

== Conceptual model

Immutable texts are like strings except they can't be mutated.

Immutability makes it easier to use space-efficient representations
such as UTF-8 and UTF-16 without incurring the cost of scanning from
the beginning when character indexes are used (as with {{string-ref}}).

When mutation is not needed, immutable texts are likely to be more
efficient than strings with respect to space or time. In some
implementations, immutable texts may be more efficient than strings
with respect to both space and time.

== Subtypes

This SRFI defines two new types:

* ''text'' is a type consisting of the immutable texts for which {{text?}} returns true.

* ''textual'' is a union type consisting of the texts and strings for which {{textual?}} returns true.

The subtypes of the new ''textual'' type include the new ''text'' type and
Scheme's traditional ''string'' type, which consists of the values for
which {{string?}} returns true. The string type includes both mutable
strings and the (conceptually) immutable strings that are the values
of string literals and calls to {{symbol->string}}.

== Notation

In the following procedure specifications:

* A ''text'' argument is an immutable text.

* A ''textual'' argument is an immutable text or a string.

* A ''char'' argument is a character.

* An ''idx'' argument is an exact non-negative integer specifying a valid character index into a text or string. The valid character indexes of a text or string ''textual'' of length ''n'' are the exact integers ''idx'' satisfying 0 ≤ ''idx'' < ''n''.

* A ''k'' argument or result is a ''position'': an exact non-negative integer that is either a valid character index for one of the textual arguments or is the length of a textual argument.

* ''start'' and ''end'' arguments are positions specifying a half-open interval of indexes for a subtext or substring. When omitted, ''start'' defaults to 0 and ''end'' to the length of the corresponding ''textual'' argument. It is an error unless 0 ≤ ''start'' ≤ ''end'' ≤ {{(textual-length textual)}}.

* A ''len'' or ''nchars'' argument is an exact non-negative integer specifying some number of characters, usually the length of a text or string.

* A ''pred'' argument is a unary character predicate, taking a character as its one argument and returning a value that will be interpreted as true or false. Unless noted otherwise, as with {{textual-every}} and {{textual-any}}, all predicates passed to procedures specified in this SRFI may be called in any order and any number of times. It is an error if ''pred'' has side effects or does not behave functionally (returning the same result whenever it is called with the same character); the implementation does not detect those errors.

* An ''obj'' argument may be any value at all.

It is an error to pass values that violate the specification above.

Arguments given in square brackets are optional. Unless otherwise
noted in the text describing the procedure, any prefix of these
optional arguments may be supplied, from zero arguments to the full
list. When a procedure returns multiple values, this is shown by
listing the return values in square brackets, as well.

== Procedures

=== Predicates

<procedure>(text? obj) → boolean</procedure>

Is ''obj'' an immutable text? In particular, {{(text?}} ''obj''{{)}}
returns false if {{(string?}} ''obj''{{)}} returns true, which
implies {{string?}} returns false if {{text?}} returns true.

<procedure>(textual? obj) → boolean</procedure>

Returns true if and only if ''obj'' is an immutable text or a string.

<procedure>(textual-null? textual) → boolean</procedure>

Returns true if and only if ''textual'' is the empty text or the
empty string.
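To make the distinction concrete, here is a small sketch assuming the egg's main module is imported as {{(srfi 135)}}; results are shown as comments in this page's {{⇒}} notation.

<enscript highlight="scheme">
(import (srfi 135))      ; assumes the egg's module is importable under this name

(text? (text #\a #\b))   ; ⇒ #t
(text? "ab")             ; ⇒ #f: a string is never a text
(textual? "ab")          ; ⇒ #t
(textual? #\a)           ; ⇒ #f: a character is neither a text nor a string
(textual-null? (text))   ; ⇒ #t: the empty text
(textual-null? "")       ; ⇒ #t: the empty string
</enscript>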

<procedure>(textual-every pred textual [start end]) → value</procedure>
<procedure>(textual-any pred textual [start end]) → value</procedure>

Checks to see if every/any character in ''textual'' satisfies
''pred'', proceeding from left (index ''start'') to right (index
''end''). These procedures are short-circuiting: if ''pred'' returns
false, {{textual-every}} does not call ''pred'' on subsequent characters;
if ''pred'' returns true, {{textual-any}} does not call ''pred'' on
subsequent characters. Both procedures are "witness-generating":

* If {{textual-every}} is given an empty interval (with ''start'' = ''end''), it returns {{#t}}.

* If {{textual-every}} returns true for a non-empty interval (with ''start'' < ''end''), the returned true value is the one returned by the final call to the predicate on {{(textual-ref}} ''textual'' {{(-}} ''end'' {{1))}}.

* If {{textual-any}} returns true, the returned true value is the one returned by the predicate.

Note: The names of these procedures do not end with a
question mark. This indicates a general value is returned
instead of a simple boolean ({{#t}} or {{#f}}).
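A sketch of the witness values, assuming {{(import (srfi 135))}}. {{digit-char-value}} is a hypothetical helper defined only for this example; it returns a digit's numeric value instead of {{#t}}, so the returned witness is visible.

<enscript highlight="scheme">
(import (srfi 135))   ; assumes the egg's module is importable under this name

;; Hypothetical helper (not part of SRFI 135): the numeric value of an
;; ASCII decimal digit, or #f for any other character.
(define (digit-char-value c)
  (and (char<=? #\0 c)
       (char<=? c #\9)
       (- (char->integer c) (char->integer #\0))))

;; textual-any returns the first true value produced by pred.
(textual-any digit-char-value "ab7c9")      ; ⇒ 7
;; textual-every returns pred's value on the last character.
(textual-every digit-char-value "123")      ; ⇒ 3
;; It stops at the first false result.
(textual-every digit-char-value "1x3")      ; ⇒ #f
;; An empty interval yields #t.
(textual-every digit-char-value "abc" 1 1)  ; ⇒ #t
</enscript>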

=== Constructors

<procedure>(make-text len char) → text</procedure>

Returns a text of the given length filled with the given
character.

<procedure>(text char ...) → text</procedure>

Returns a text consisting of the given characters.
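Both constructors in use, as a sketch assuming the egg is imported as {{(srfi 135)}}:

<enscript highlight="scheme">
(import (srfi 135))      ; assumes the egg's module is importable under this name

(make-text 3 #\x)        ; ⇒ «xxx»
(text #\S #\R #\F #\I)   ; ⇒ «SRFI»
(text)                   ; ⇒ «», the empty text
</enscript>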

<procedure>(text-tabulate proc len) → text</procedure>

''proc'' is a procedure that accepts an exact integer as its
argument and returns a character. Constructs a text of
size ''len'' by calling ''proc'' on each value from 0 (inclusive)
to ''len'' (exclusive) to produce the corresponding element
of the text. The order in which ''proc'' is called on those
indexes is not specified.
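For example, assuming {{(import (srfi 135))}}, the following builds the first five lowercase letters. The procedure depends only on its index argument, which matters because the call order is unspecified.

<enscript highlight="scheme">
(import (srfi 135))   ; assumes the egg's module is importable under this name

;; Character i of the result is the i-th lowercase ASCII letter.
;; nth-letter is pure, so any calling order gives the same text.
(define (nth-letter i)
  (integer->char (+ i (char->integer #\a))))

(text-tabulate nth-letter 5)   ; ⇒ «abcde»
</enscript>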

<procedure>(text-unfold stop? mapper successor seed [base make-final]) → text</procedure>

This is a fundamental constructor for texts.

* ''successor'' is used to generate a series of "seed" values from the initial seed: ''seed'', (''successor seed''), (''successor''^2 ''seed''), (''successor''^3 ''seed''), ...

* ''stop?'' determines when to stop: generation ends at the first seed value on which ''stop?'' returns true.

* ''mapper'' maps each seed value to the corresponding character(s) in the result text, which are assembled into that text in left-to-right order. It is an error for ''mapper'' to return anything other than a character, string, or text.

* ''base'' is the optional initial/leftmost portion of the constructed text, which defaults to the empty text {{(text)}}. It is an error if ''base'' is anything other than a character, string, or text.

* ''make-final'' is applied to the terminal seed value (on which ''stop?'' returns true) to produce the final/rightmost portion of the constructed text. It defaults to {{(lambda (x) (text))}}. It is an error for ''make-final'' to return anything other than a character, string, or text.

{{text-unfold}} is a fairly powerful text constructor. You
can use it to convert a list to a text, read a port into
a text, reverse a text, copy a text, and so forth.
Examples:

<enscript highlight="scheme">
(port->text p) = (text-unfold eof-object?
                              values
                              (lambda (x) (read-char p))
                              (read-char p))

(list->text lis) = (text-unfold null? car cdr lis)

(text-tabulate f size) = (text-unfold (lambda (i) (= i size)) f add1 0)

;; To map f over a list lis, producing a text:
(text-unfold null? (compose f car) cdr lis)
</enscript>

<procedure>(text-unfold-right stop? mapper successor seed [base make-final]) → text</procedure>

This is a fundamental constructor for texts. It is the
same as {{text-unfold}} except the results of ''mapper'' are
assembled into the text in right-to-left order, ''base'' is
the optional rightmost portion of the constructed text,
and ''make-final'' produces the leftmost portion of the
constructed text.

<enscript highlight="scheme">
(text-unfold-right (lambda (n) (< n (char->integer #\A)))
                   (lambda (n) (char-downcase (integer->char n)))
                   (lambda (n) (- n 1))
                   (char->integer #\Z)
                   #\space
                   (lambda (n) " The English alphabet: "))
  ⇒ « The English alphabet: abcdefghijklmnopqrstuvwxyz »
</enscript>

=== Conversion

<procedure>(textual->text textual) → text</procedure>

When given a text, {{textual->text}} just returns that text.
When given a string, {{textual->text}} returns the result of
calling {{string->text}} on that string. Signals an error
when its argument is neither string nor text.

<procedure>(textual->string textual [start end]) → string</procedure>
<procedure>(textual->vector textual [start end]) → char-vector</procedure>
<procedure>(textual->list textual [start end]) → char-list</procedure>

{{textual->string}}, {{textual->vector}}, and {{textual->list}}
return a newly allocated (unless empty) mutable string,
vector, or list of the characters that make up the given
subtext or substring.

<procedure>(string->text string [start end]) → text</procedure>
<procedure>(vector->text char-vector [start end]) → text</procedure>
<procedure>(list->text char-list [start end]) → text</procedure>

These procedures return a text containing the characters
of the given substring, subvector, or sublist. The
behavior of the text will not be affected by subsequent
mutation of the given string, vector, or list.
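A sketch of the conversion procedures, assuming the egg is imported as {{(srfi 135)}}. Because {{string->text}} copies its input, mutating the source string afterwards leaves the text unchanged.

<enscript highlight="scheme">
(import (srfi 135))              ; assumes the egg's module is importable under this name

(define src (string-copy "abc")) ; a fresh, mutable string
(define txt (string->text src))
(string-set! src 0 #\X)          ; mutate the source after converting
txt                              ; ⇒ «abc»: the text is unaffected

(textual->list txt)              ; ⇒ (#\a #\b #\c)
(textual->string txt 1)          ; ⇒ "bc", a newly allocated string
(list->text '(#\x #\y #\z) 1 3)  ; ⇒ «yz»
</enscript>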

<procedure>(reverse-list->text char-list) → text</procedure>

An efficient implementation of {{(compose list->text reverse)}}:

<enscript highlight="scheme">
(reverse-list->text '(#\a #\B #\c)) ⇒ «cBa»
</enscript>

This is a common idiom in the epilogue of text-processing
loops that accumulate their result using a list in
reverse order. (See also {{textual-concatenate-reverse}} for
the "chunked" variant.)

<procedure>(textual->utf8 textual [start end]) → bytevector</procedure>
<procedure>(textual->utf16 textual [start end]) → bytevector</procedure>
<procedure>(textual->utf16be textual [start end]) → bytevector</procedure>
<procedure>(textual->utf16le textual [start end]) → bytevector</procedure>

These procedures return a newly allocated (unless empty)
bytevector containing a UTF-8 or UTF-16 encoding of the
given subtext or substring.

The bytevectors returned by {{textual->utf8}},
{{textual->utf16be}}, and {{textual->utf16le}} do not contain a
byte-order mark (BOM). {{textual->utf16be}} returns a
big-endian encoding, while {{textual->utf16le}} returns a
little-endian encoding.

The bytevectors returned by {{textual->utf16}} begin with a
BOM that declares an implementation-dependent endianness,
and the bytevector elements following that BOM encode the
given subtext or substring using that endianness.
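A sketch of the encoders, assuming {{(import (srfi 135))}}. U+03BB (Greek small letter lambda) encodes as the two UTF-8 bytes 206 187. Since the byte order chosen by {{textual->utf16}} is implementation-dependent, the example only round-trips its output instead of showing the bytes.

<enscript highlight="scheme">
(import (srfi 135))   ; assumes the egg's module is importable under this name

;; "A" followed by U+03BB, built from characters.
(define a-lambda (text #\A (integer->char #x3BB)))

(textual->utf8 a-lambda)   ; ⇒ #u8(65 206 187)
(textual->utf16be "A")     ; ⇒ #u8(0 65)
(textual->utf16le "A")     ; ⇒ #u8(65 0)

;; textual->utf16 writes a BOM and utf16->text consumes it, so the
;; round trip works whichever byte order the implementation picked.
(utf16->text (textual->utf16 "hi"))   ; ⇒ «hi»
</enscript>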

<procedure>(utf8->text bytevector [start end]) → text</procedure>
<procedure>(utf16->text bytevector [start end]) → text</procedure>
<procedure>(utf16be->text bytevector [start end]) → text</procedure>
<procedure>(utf16le->text bytevector [start end]) → text</procedure>

These procedures interpret their bytevector argument as a
UTF-8 or UTF-16 encoding of a sequence of characters, and
return a text containing that sequence.

The bytevector subrange given to {{utf16->text}} may begin
with a byte order mark (BOM); if so, that BOM determines
whether the rest of the subrange is to be interpreted as
big-endian or little-endian; in either case, the BOM will
not become a character in the returned text. If the
subrange does not begin with a BOM, it is decoded using
the same implementation-dependent endianness used by
{{textual->utf16}}.

The {{utf16be->text}} and {{utf16le->text}} procedures interpret
their inputs as big-endian or little-endian,
respectively. If a BOM is present, it is treated as a
normal character and will become part of the result.

It is an error if the bytevector subrange given to
{{utf8->text}} contains invalid UTF-8 byte sequences. For the
other three procedures, it is an error if ''start'' or ''end''
is odd, or if the bytevector subrange contains invalid
UTF-16 byte sequences.
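Decoding the same bytes with and without BOM handling, as a sketch assuming {{(import (srfi 135))}}:

<enscript highlight="scheme">
(import (srfi 135))   ; assumes the egg's module is importable under this name

;; A big-endian BOM (254 255 = #xFE #xFF) followed by "Hi" in UTF-16BE.
(define be-bytes #u8(254 255 0 72 0 105))

(utf16->text be-bytes)                  ; ⇒ «Hi»: the BOM picks the byte order and is dropped
(text-length (utf16be->text be-bytes))  ; ⇒ 3: here the BOM is decoded as U+FEFF
(text-ref (utf16be->text be-bytes) 0)   ; ⇒ the character U+FEFF
(utf8->text #u8(72 105))                ; ⇒ «Hi»
</enscript>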

=== Selection

<procedure>(text-length text) → len</procedure>

Returns the number of characters within the given text.

<procedure>(text-ref text idx) → char</procedure>

Returns character {{text[idx]}}, using 0-origin indexing.

<procedure>(textual-length textual) → len</procedure>
<procedure>(textual-ref textual idx) → char</procedure>

{{textual-length}} returns the number of characters in
''textual'', and {{textual-ref}} returns the character at
character index ''idx'', using 0-origin indexing. These
procedures are the generalizations of {{text-length}} and
{{text-ref}} to accept strings as well as texts. If ''textual''
is a text, they must execute in O(1) time, but there is
no such requirement if ''textual'' is a string.
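Illustrative calls, assuming the egg is imported as {{(srfi 135)}}:

<enscript highlight="scheme">
(import (srfi 135))        ; assumes the egg's module is importable under this name

(define word (string->text "Scheme"))
(text-length word)         ; ⇒ 6
(text-ref word 0)          ; ⇒ #\S
(textual-length "Scheme")  ; ⇒ 6: strings are accepted as well
(textual-ref "Scheme" 5)   ; ⇒ #\e
</enscript>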

<procedure>(subtext text start end) → text</procedure>
<procedure>(subtextual textual start end) → text</procedure>

These procedures return a text containing the characters
of ''text'' or ''textual'' beginning with index ''start'' (inclusive)
and ending with index ''end'' (exclusive).

If ''textual'' is a string, then that string does not share any
storage with the result, so subsequent mutation of that
string will not affect the text returned by {{subtextual}}.
When the first argument is a text, as is required by
{{subtext}}, the implementation returns a result that shares
storage with that text. These procedures just return their
first argument when that argument is a text, ''start'' is 0, and
''end'' is the length of that text.
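A sketch contrasting the two, assuming {{(import (srfi 135))}}. The last line relies on the rule above that a full-range {{subtext}} of a text returns its argument.

<enscript highlight="scheme">
(import (srfi 135))                 ; assumes the egg's module is importable under this name

(define word (string->text "immutable"))
(subtext word 2 6)                  ; ⇒ «muta»
(subtextual "immutable" 2 6)        ; ⇒ «muta»
(text? (subtextual "immutable" 2 6)); ⇒ #t: a text, though the input is a string
(eq? word (subtext word 0 (text-length word)))  ; ⇒ #t: full range returns the argument
</enscript>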

<procedure>(textual-copy textual [start end]) → text</procedure>

Returns a text containing the characters of ''textual''
beginning with index ''start'' (inclusive) and ending with
index ''end'' (exclusive).

Unlike {{subtext}} and {{subtextual}}, the result of {{textual-copy}}
never shares substructures that would retain characters
or sequences of characters that are substructures of its
first argument or previously allocated objects.

If {{textual-copy}} returns an empty text, that empty text
may be {{eq?}} or {{eqv?}} to the text returned by {{(text)}}.
If the text returned by {{textual-copy}} is non-empty, then it
is not {{eqv?}} to any previously extant object.
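A short sketch, assuming {{(import (srfi 135))}}; unlike a full-range {{subtext}}, a non-empty copy is never the original object.

<enscript highlight="scheme">
(import (srfi 135))                          ; assumes the egg's module is importable under this name

(define original (string->text "abc"))
(textual-copy original 1)                    ; ⇒ «bc»
(eqv? original (textual-copy original))      ; ⇒ #f: a non-empty copy is a new object
(textual-null? (textual-copy original 2 2))  ; ⇒ #t: an empty range gives the empty text
</enscript>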

<procedure>(textual-take textual nchars) → text</procedure>
<procedure>(textual-drop textual nchars) → text</procedure>
<procedure>(textual-take-right textual nchars) → text</procedure>
<procedure>(textual-drop-right textual nchars) → text</procedure>

{{textual-take}} returns a text containing the first ''nchars''
of ''textual''; {{textual-drop}} returns a text containing all
but the first ''nchars'' of ''textual''. {{textual-take-right}}
returns a text containing the last ''nchars'' of ''textual'';
{{textual-drop-right}} returns a text containing all but the
last ''nchars'' of ''textual''.

If ''textual'' is a string, then that string does not share
any storage with the result, so subsequent mutation of
that string will not affect the text returned by these
procedures. If ''textual'' is a text, the result shares storage with
that text.

<enscript highlight="scheme">
(textual-take "Pete Szilagyi" 6) ⇒ «Pete S»
(textual-drop "Pete Szilagyi" 6) ⇒ «zilagyi»

(textual-take-right "Beta rules" 5) ⇒ «rules»
(textual-drop-right "Beta rules" 5) ⇒ «Beta »

;; It is an error to take or drop more characters than are
;; in the text:
(textual-take "foo" 37) ⇒ error
</enscript>

<procedure>(textual-pad textual len [char start end]) → text</procedure>
<procedure>(textual-pad-right textual len [char start end]) → text</procedure>

Returns a text of length ''len'' composed of the characters
drawn from the given subrange of ''textual'', padded on the
left (right) by as many occurrences of the character ''char''
as needed. If ''textual'' has more than ''len'' chars, it is
truncated on the left (right) to length ''len''. ''char''
defaults to {{#\space}}.

<enscript highlight="scheme">
(textual-pad "325" 5) ⇒ «  325»
(textual-pad "71325" 5) ⇒ «71325»
(textual-pad "8871325" 5) ⇒ «71325»
</enscript>
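More padding cases, sketched under the assumption that the egg is imported as {{(srfi 135)}}; the last call pads a subrange.

<enscript highlight="scheme">
(import (srfi 135))                   ; assumes the egg's module is importable under this name

(textual-pad-right "325" 5)           ; ⇒ «325  »
(textual-pad "325" 5 #\0)             ; ⇒ «00325»
(textual-pad-right "8871325" 5)       ; ⇒ «88713»: truncated on the right
(textual-pad "abcdef" 3 #\space 1 5)  ; ⇒ «cde»: subrange «bcde», truncated on the left
</enscript>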

<procedure>(textual-trim textual [pred start end]) → text</procedure>
<procedure>(textual-trim-right textual [pred start end]) → text</procedure>
<procedure>(textual-trim-both textual [pred start end]) → text</procedure>

Returns a text obtained from the given subrange of
''textual'' by skipping over all characters on the left / on
the right / on both sides that satisfy the second
argument ''pred'': ''pred'' defaults to {{char-whitespace?}}.

<enscript highlight="scheme">
(textual-trim-both "  The outlook wasn't brilliant,  \n\r") ⇒ «The outlook wasn't brilliant,»
</enscript>
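Trimming one side at a time, and with a custom predicate (a sketch assuming {{(import (srfi 135))}}):

<enscript highlight="scheme">
(import (srfi 135))               ; assumes the egg's module is importable under this name

(textual-trim "  hello  ")        ; ⇒ «hello  »
(textual-trim-right "  hello  ")  ; ⇒ «  hello»
;; Strip leading and trailing zeros only.
(textual-trim-both "007x00" (lambda (c) (char=? c #\0)))  ; ⇒ «7x»
</enscript>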

=== Replacement

<procedure>(textual-replace textual1 textual2 start1 end1 [start2 end2]) → text</procedure>

Returns

<enscript highlight="scheme">
(textual-append (subtextual textual1 0 start1)
                (subtextual textual2 start2 end2)
                (subtextual textual1 end1 (textual-length textual1)))
</enscript>

That is, the segment of characters in ''textual1'' from
''start1'' to ''end1'' is replaced by the segment of characters
in ''textual2'' from ''start2'' to ''end2''. If ''start1'' = ''end1'',
this simply splices the characters drawn from ''textual2'' into
''textual1'' at that position.

Examples:

<enscript highlight="scheme">
(textual-replace "The TCL programmer endured daily ridicule."
                 "another miserable perl drone"
                 4
                 7
                 8
                 22)
⇒ «The miserable perl programmer endured daily ridicule.»

(textual-replace "It's easy to code it up in Scheme."
                 "lots of fun"
                 5
                 9)
⇒ «It's lots of fun to code it up in Scheme.»

(define (textual-insert s i t) (textual-replace s t i i))

(textual-insert "It's easy to code it up in Scheme." 5 "really ")
⇒ «It's really easy to code it up in Scheme.»

(define (textual-set s i c) (textual-replace s (text c) i (+ i 1)))

(textual-set "Text-ref runs in O(n) time." 19 #\1)
⇒ «Text-ref runs in O(1) time.»
</enscript>

=== Comparison

<procedure>(textual=? textual1 textual2 textual3 ...) → boolean</procedure>

Returns {{#t}} if all the arguments have the same length and
contain exactly the same characters in the same
positions; otherwise returns {{#f}}.

<procedure>(textual<?  textual1 textual2 textual3 ...) → boolean</procedure>
<procedure>(textual>?  textual1 textual2 textual3 ...) → boolean</procedure>
<procedure>(textual<=? textual1 textual2 textual3 ...) → boolean</procedure>
<procedure>(textual>=? textual1 textual2 textual3 ...) → boolean</procedure>

These procedures compare their arguments lexicographically and
return {{#t}} if they are (respectively): monotonically increasing,
monotonically decreasing, monotonically non-decreasing, or
monotonically non-increasing.

These comparison predicates are transitive.
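A few comparisons, as a sketch assuming {{(import (srfi 135))}}. Strings and texts may be mixed freely.

<enscript highlight="scheme">
(import (srfi 135))                     ; assumes the egg's module is importable under this name

(textual=? "abc" (string->text "abc"))  ; ⇒ #t: compared by content
(textual<? "apple" "banana" "cherry")   ; ⇒ #t
(textual<? "abc" "abd" "abd")           ; ⇒ #f: not strictly increasing
(textual<=? "abc" "abd" "abd")          ; ⇒ #t
(textual<? "ab" "abc")                  ; ⇒ #t: a proper prefix sorts first
</enscript>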

<procedure>(textual-ci=? textual1 textual2 textual3 ...) → boolean</procedure>

Returns {{#t}} if, after calling {{textual-foldcase}} on each of
the arguments, all of the case-folded texts would have
the same length and contain the same characters in the
same positions; otherwise returns {{#f}}.

<procedure>(textual-ci<?  textual1 textual2 textual3 ...) → boolean</procedure>
<procedure>(textual-ci>?  textual1 textual2 textual3 ...) → boolean</procedure>
<procedure>(textual-ci<=? textual1 textual2 textual3 ...) → boolean</procedure>
<procedure>(textual-ci>=? textual1 textual2 textual3 ...) → boolean</procedure>

These procedures behave as though they had called
{{textual-foldcase}} on their arguments before applying the
corresponding procedures without "{{-ci}}".
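Case-insensitive comparison can change the order, because uppercase ASCII letters precede lowercase ones (a sketch assuming {{(import (srfi 135))}}):

<enscript highlight="scheme">
(import (srfi 135))   ; assumes the egg's module is importable under this name

(textual-ci=? "Scheme" "SCHEME" (string->text "scheme"))  ; ⇒ #t
(textual<? "apple" "Banana")     ; ⇒ #f: #\B precedes #\a
(textual-ci<? "apple" "Banana")  ; ⇒ #t: compares as "apple" and "banana"
</enscript>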

=== Prefixes & suffixes

<procedure>(textual-prefix-length textual1 textual2 [start1 end1 start2 end2]) → integer</procedure>
<procedure>(textual-suffix-length textual1 textual2 [start1 end1 start2 end2]) → integer</procedure>

Return the length of the longest common prefix/suffix of
''textual1'' and ''textual2''. For prefixes, this is equivalent
to their "mismatch index" (relative to the start
indexes).

The optional ''start''/''end'' indexes restrict the comparison to
the indicated subtexts of ''textual1'' and ''textual2''.

<procedure>(textual-prefix? textual1 textual2 [start1 end1 start2 end2]) → boolean</procedure>
<procedure>(textual-suffix? textual1 textual2 [start1 end1 start2 end2]) → boolean</procedure>

Is ''textual1'' a prefix/suffix of ''textual2''?

The optional start/end indexes restrict the comparison to
the indicated subtexts of ''textual1'' and ''textual2''.
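For example, assuming the egg is imported as {{(srfi 135)}}:

<enscript highlight="scheme">
(import (srfi 135))                             ; assumes the egg's module is importable under this name

(textual-prefix-length "programmer" "program")  ; ⇒ 7
(textual-suffix-length "walking" "talking")     ; ⇒ 6: common suffix «alking»
(textual-prefix? "pro" "program")               ; ⇒ #t
(textual-prefix? "program" "pro")               ; ⇒ #f: argument order matters
(textual-suffix? "ing" "walking")               ; ⇒ #t
</enscript>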

=== Searching

<procedure>(textual-index textual pred [start end]) → idx-or-false</procedure>
<procedure>(textual-index-right textual pred [start end]) → idx-or-false</procedure>
<procedure>(textual-skip textual pred [start end]) → idx-or-false</procedure>
<procedure>(textual-skip-right textual pred [start end]) → idx-or-false</procedure>

{{textual-index}} searches through the given subtext or
substring from the left, returning the index of the
leftmost character satisfying the predicate ''pred''.
{{textual-index-right}} searches from the right, returning
the index of the rightmost character satisfying the
predicate ''pred''. If no match is found, these procedures
return {{#f}}.

The ''start'' and ''end'' arguments specify the beginning and end
of the search; the valid indexes relevant to the search
include ''start'' but exclude ''end''. Beware of "fencepost"
errors: when searching right-to-left, the first index
considered is {{(-}} ''end'' {{1)}}, whereas when searching
left-to-right, the first index considered is ''start''. That
is, the start/end indexes describe the same half-open
interval [''start'',''end'') in these procedures that they do in
all other procedures specified by this SRFI.

The skip functions are similar, but use the complement of
the criterion: they search for the first char that
doesn't satisfy ''pred''. To skip over initial whitespace,
for example, say

<enscript highlight="scheme">
(subtextual text
            (or (textual-skip text char-whitespace?)
                (textual-length text))
            (textual-length text))
</enscript>

These functions can be trivially composed with
{{textual-take}} and {{textual-drop}} to produce take-while,
drop-while, span, and break procedures without loss of
efficiency.
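The composition described above might look like this. It is a sketch assuming {{(import (srfi 135))}}; {{split-point}}, {{textual-take-while}}, {{textual-drop-while}}, and {{textual-span}} are hypothetical helpers, not procedures exported by the egg.

<enscript highlight="scheme">
(import (srfi 135))   ; assumes the egg's module is importable under this name

;; Hypothetical helpers built on textual-skip; not part of SRFI 135.
;; textual-skip returns the index of the first character that fails
;; pred, or #f when every character passes.
(define (split-point textual pred)
  (or (textual-skip textual pred) (textual-length textual)))

(define (textual-take-while textual pred)
  (textual-take textual (split-point textual pred)))

(define (textual-drop-while textual pred)
  (textual-drop textual (split-point textual pred)))

;; span returns both pieces as two values.
(define (textual-span textual pred)
  (let ((k (split-point textual pred)))
    (values (textual-take textual k) (textual-drop textual k))))

(textual-take-while "123abc" char-numeric?)  ; ⇒ «123»
(textual-drop-while "123abc" char-numeric?)  ; ⇒ «abc»
(textual-take-while "123" char-numeric?)     ; ⇒ «123»
</enscript>

A break procedure follows the same pattern with the predicate negated.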

<procedure>(textual-contains textual1 textual2 [start1 end1 start2 end2]) → idx-or-false</procedure>
<procedure>(textual-contains-right textual1 textual2 [start1 end1 start2 end2]) → idx-or-false</procedure>

Does the subtext of ''textual1'' specified by ''start1'' and ''end1''
contain the sequence of characters given by the subtext
of ''textual2'' specified by ''start2'' and ''end2''?

Returns {{#f}} if there is no match. If ''start2'' = ''end2'',
{{textual-contains}} returns ''start1'' but
{{textual-contains-right}} returns ''end1''. Otherwise returns
the index in ''textual1'' for the first character of the
first/last match; that index lies within the half-open
interval [''start1'',''end1''), and the match lies entirely
within the [''start1'',''end1'') range of ''textual1''.

<enscript highlight="scheme">
;; Searches "a geek"
(textual-contains "eek -- what a geek." "ee" 12 18) ⇒ 15
</enscript>

Note: The names of these procedures do not end with a
question mark. This indicates a useful value is returned
when there is a match.
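First versus last match, plus the empty-pattern cases, in a sketch assuming {{(import (srfi 135))}}:

<enscript highlight="scheme">
(import (srfi 135))                      ; assumes the egg's module is importable under this name

(textual-contains "banana" "ana")        ; ⇒ 1: first match
(textual-contains-right "banana" "ana")  ; ⇒ 3: last match
(textual-contains "banana" "xyz")        ; ⇒ #f
(textual-contains "banana" "")           ; ⇒ 0: start1
(textual-contains-right "banana" "")     ; ⇒ 6: end1
</enscript>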

=== Case conversion

<procedure>(textual-upcase textual) → text</procedure>
<procedure>(textual-downcase textual) → text</procedure>
<procedure>(textual-foldcase textual) → text</procedure>
<procedure>(textual-titlecase textual) → text</procedure>

These procedures return the text obtained by applying
Unicode's full uppercasing, lowercasing, case-folding, or
title-casing algorithms to their argument. In some cases,
the length of the result may be different from the length
of the argument. Note that language-sensitive mappings
and foldings are not used.
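ASCII-only examples, as a sketch assuming {{(import (srfi 135))}}; the result is a text even when the argument is a string.

<enscript highlight="scheme">
(import (srfi 135))                  ; assumes the egg's module is importable under this name

(textual-upcase "Hello, World")      ; ⇒ «HELLO, WORLD»
(textual-downcase "Hello, World")    ; ⇒ «hello, world»
(textual-foldcase "Hello, World")    ; ⇒ «hello, world»
(text? (textual-upcase "abc"))       ; ⇒ #t
</enscript>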

=== Concatenation

<procedure>(textual-append textual ...) → text</procedure>

Returns a text whose sequence of characters is the
concatenation of the sequences of characters in the given
arguments.

<procedure>(textual-concatenate textual-list) → text</procedure>

Concatenates the elements of ''textual-list'' together into a
single text.

If any elements of ''textual-list'' are strings, then those
strings do not share any storage with the result, so
subsequent mutation of those strings will not affect the
text returned by this procedure. The result shares storage with
the texts in the list if that sharing would be space-efficient.
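A sketch assuming {{(import (srfi 135))}}, showing that later mutation of a string element does not reach the concatenated result:

<enscript highlight="scheme">
(import (srfi 135))          ; assumes the egg's module is importable under this name

(define piece (string-copy "Hello"))   ; mutable string element
(define greeting
  (textual-concatenate (list piece ", " (string->text "world"))))
(string-set! piece 0 #\J)    ; mutate the string after concatenating
greeting                     ; ⇒ «Hello, world»
(textual-concatenate '())    ; ⇒ «»
</enscript>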

<procedure>(textual-concatenate-reverse textual-list [final-textual end]) → text</procedure>

With no optional arguments, calling this procedure is
equivalent to
<enscript highlight="scheme">
(textual-concatenate (reverse textual-list))
</enscript>
If the optional argument ''final-textual'' is specified, it
is effectively consed onto the beginning of ''textual-list''
before performing the list-reverse and
{{textual-concatenate}} operations.

If the optional argument ''end'' is given, only the
characters up to but not including ''end'' in ''final-textual''
are added to the result, thus producing
<enscript highlight="scheme">
(textual-concatenate
 (reverse (cons (subtextual final-textual 0 end)
                textual-list)))
</enscript>
For example:
<enscript highlight="scheme">
(textual-concatenate-reverse '(" must be" "Hello, I") " going.XXXX" 7)
 ⇒ «Hello, I must be going.»
</enscript>

<procedure>(textual-join textual-list [delimiter grammar]) → text</procedure>

This procedure is a simple unparser; it pastes texts
together using the delimiter text.

''textual-list'' is a list of texts and/or strings. ''delimiter''
is a text or a string. The ''grammar'' argument is a symbol
that determines how the delimiter is used, and defaults
to {{infix}}. It is an error for ''grammar'' to be any symbol
other than these four:

* {{infix}} means an infix or separator grammar: insert the delimiter between list elements. An empty list will produce an empty text.

* {{strict-infix}} means the same as {{infix}} if the textual-list is non-empty, but will signal an error if given an empty list. (This avoids an ambiguity shown in the examples below.)

* {{suffix}} means a suffix or terminator grammar: insert the delimiter after every list element.

* {{prefix}} means a prefix grammar: insert the delimiter before every list element.

The delimiter is the text used to delimit elements; it
defaults to a single space, {{" "}}.

<enscript highlight="scheme">
(textual-join '("foo" "bar" "baz")) ⇒ «foo bar baz»
(textual-join '("foo" "bar" "baz") "") ⇒ «foobarbaz»
(textual-join '("foo" "bar" "baz") ":") ⇒ «foo:bar:baz»
(textual-join '("foo" "bar" "baz") ":" 'suffix) ⇒ «foo:bar:baz:»

;; Infix grammar is ambiguous wrt empty list vs. empty text:
(textual-join '()   ":") ⇒ «»
(textual-join '("") ":") ⇒ «»

;; Suffix and prefix grammars are not:
(textual-join '()   ":" 'suffix) ⇒ «»
(textual-join '("") ":" 'suffix) ⇒ «:»
</enscript>
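The prefix and strict-infix grammars in use, as a sketch assuming {{(import (srfi 135))}}:

<enscript highlight="scheme">
(import (srfi 135))   ; assumes the egg's module is importable under this name

(textual-join '("usr" "local" "bin") "/" 'prefix)  ; ⇒ «/usr/local/bin»
(textual-join '("a" "b") ", " 'strict-infix)       ; ⇒ «a, b»
;; (textual-join '() ", " 'strict-infix) would signal an error.
</enscript>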

=== Fold, map & friends

<procedure>(textual-fold kons knil textual [start end]) → value</procedure>
<procedure>(textual-fold-right kons knil textual [start end]) → value</procedure>

These are the fundamental iterators for texts.

The {{textual-fold}} procedure maps the ''kons'' procedure across
the given text or string from left to right:

<enscript highlight="scheme">
(... (kons textual[2] (kons textual[1] (kons textual[0] knil))))
</enscript>

In other words, {{textual-fold}} obeys the (tail) recursion

<enscript highlight="scheme">
(textual-fold kons knil textual start end) = (textual-fold kons (kons textual[start] knil) textual start+1 end)
</enscript>

The {{textual-fold-right}} procedure maps ''kons'' across the
given text or string from right to left:

<enscript highlight="scheme">
(kons textual[0]
      (... (kons textual[end-3]
                 (kons textual[end-2]
                       (kons textual[end-1] knil)))))
</enscript>

obeying the (tail) recursion

<enscript highlight="scheme">
(textual-fold-right kons knil textual start end) = (textual-fold-right kons (kons textual[end-1] knil) textual start end-1)
</enscript>

Examples:

<enscript highlight="scheme">
;; Convert a text or string to a list of chars.
(textual-fold-right cons '() textual)

;; Count the number of lower-case characters in a text or string.
(textual-fold (lambda (c count)
                (if (char-lower-case? c) (+ count 1) count))
              0
              textual)
</enscript>

The {{textual-fold-right}} combinator is sometimes called a
"catamorphism."

<procedure>(textual-map proc textual1 textual2 ...) → text</procedure>

It is an error if ''proc'' does not accept as many arguments
as the number of ''textual'' arguments passed to {{textual-map}},
does not accept characters as arguments, or returns a
value that is not a character, string, or text.

The {{textual-map}} procedure applies ''proc'' element-wise to
the characters of the ''textual'' arguments, converts each
value returned by ''proc'' to a text, and returns the
concatenation of those texts. If more than one ''textual''
argument is given and not all have the same length, then
{{textual-map}} terminates when the shortest ''textual'' argument
runs out. The dynamic order in which ''proc'' is called on
the characters of the ''textual'' arguments is unspecified,
as is the dynamic order in which the coercions are
performed. If any strings returned by ''proc'' are mutated
after they have been returned and before the call to
{{textual-map}} has returned, then {{textual-map}} returns a text
with unspecified contents; the {{textual-map}} procedure
itself does not mutate those strings.

Example:

<enscript highlight="scheme">
(textual-map (lambda (c0 c1 c2)
               (case c0
                 ((#\1) c1)
                 ((#\2) (string c2))
                 ((#\-) (text #\- c1))))
             (string->text "1222-1111-2222")
             (string->text "Hi There!")
             (string->text "Dear John"))
  ⇒ «Hear-here!»
</enscript>

<procedure>(textual-for-each proc textual1 textual2 ...) → unspecified</procedure>

It is an error if ''proc'' does not accept as many arguments
as the number of ''textual'' arguments passed to {{textual-for-each}}
or does not accept characters as arguments.

The {{textual-for-each}} procedure applies ''proc'' element-wise
to the characters of the ''textual'' arguments, going from
left to right. If more than one ''textual'' argument is given
and not all have the same length, then {{textual-for-each}}
terminates when the shortest ''textual'' argument runs out.
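With arguments of unequal length, iteration stops at the shorter one (a sketch assuming {{(import (srfi 135))}}):

<enscript highlight="scheme">
(import (srfi 135))   ; assumes the egg's module is importable under this name

(define seen '())
(textual-for-each (lambda (a b) (set! seen (cons (list a b) seen)))
                  "abc"
                  "12")   ; the shorter argument limits this to two calls
(reverse seen)            ; ⇒ ((#\a #\1) (#\b #\2))
</enscript>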

<procedure>(textual-map-index proc textual [start end]) → text</procedure>

Calls ''proc'' on each valid index of the specified subtext
or substring, converts the results of those calls into
texts, and returns the concatenation of those texts. It
is an error for ''proc'' to return anything other than a
character, string, or text. The dynamic order in which
''proc'' is called on the indexes is unspecified, as is the
dynamic order in which the coercions are performed. If
any strings returned by ''proc'' are mutated after they have
been returned and before the call to {{textual-map-index}}
has returned, then {{textual-map-index}} returns a text with
unspecified contents; the {{textual-map-index}} procedure
itself does not mutate those strings.
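A sketch assuming {{(import (srfi 135))}}; here ''proc'' returns strings, which are concatenated in index order regardless of the order in which the calls happen.

<enscript highlight="scheme">
(import (srfi 135))   ; assumes the egg's module is importable under this name

(define src "abc")
;; Each index yields a freshly made two-character string.
(textual-map-index (lambda (i) (make-string 2 (textual-ref src i))) src)
;; ⇒ «aabbcc»
(textual-map-index (lambda (i) (textual-ref src i)) src 1)  ; ⇒ «bc»
</enscript>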

<procedure>(textual-for-each-index proc textual [start end]) → unspecified</procedure>

Calls ''proc'' on each valid index of the specified subtext
or substring, in increasing order, discarding the results
of those calls. This is simply a safe and correct way to
loop over a subtext or substring.

Example:

<enscript highlight="scheme">
(let ((txt (string->text "abcde"))
      (v '()))
  (textual-for-each-index
   (lambda (cur) (set! v (cons (char->integer (text-ref txt cur)) v)))
   txt)
  v) ⇒ (101 100 99 98 97)
</enscript>

<procedure>(textual-count textual pred [start end]) → integer</procedure>

Returns the number of characters in the specified subtext
of ''textual'' that satisfy ''pred''.
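A short sketch, assuming the egg's {{(srfi 135)}} module; the variable names are illustrative. Note the argument order: ''textual'' comes before ''pred'' here, unlike {{textual-filter}} and {{textual-remove}} below.

<enscript highlight="scheme">
(import (srfi 135))

(define txt (string->text "Hello, World"))

;; The textual comes first, then the predicate.
(define n-upper (textual-count txt char-upper-case?))        ; ⇒ 2
;; Starting at index 1 skips the leading #\H.
(define n-upper-tail (textual-count txt char-upper-case? 1)) ; ⇒ 1
;; A plain string is accepted as well as a text.
(define n-digits (textual-count "a1b2c3" char-numeric?))     ; ⇒ 3
</enscript>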

<procedure>(textual-filter pred textual [start end]) → text</procedure>
<procedure>(textual-remove pred textual [start end]) → text</procedure>

Filter the given subtext of ''textual'': {{textual-filter}} keeps
only the characters that satisfy ''pred'', and {{textual-remove}}
keeps only those that do not.
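A minimal sketch, assuming the egg's {{(srfi 135)}} module; the variable names are illustrative. Here ''pred'' comes first, and the optional ''start'' limits which part of ''textual'' is examined.

<enscript highlight="scheme">
(import (srfi 135))

;; The predicate comes first, then the textual.
(define digits  (textual-filter char-numeric? "a1b2c3"))       ; ⇒ «123»
(define letters (textual-remove char-numeric? "a1b2c3"))       ; ⇒ «abc»
;; Only the subtext from index 2, "b2c3", is examined.
(define tail-digits (textual-filter char-numeric? "a1b2c3" 2)) ; ⇒ «23»
</enscript>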

=== Replication & splitting

<procedure>(textual-replicate textual from to [start end]) → text</procedure>

This is an "extended subtext" procedure that implements
replicated copying of a subtext or substring.

''textual'' is a text or string; ''start'' and ''end'' are optional
arguments that specify a subtext of ''textual'', defaulting
to 0 and the length of ''textual''. This subtext is
conceptually replicated both up and down the index space,
in both the positive and negative directions. For
example, if ''textual'' is {{"abcdefg"}}, ''start'' is 3, and ''end''
is 6, then we have the conceptual bidirectionally-infinite
text

 ...  d  e  f  d  e  f  d  e  f  d  e  f  d  e  f  d  e  f  d ...
      -9 -8 -7 -6 -5 -4 -3 -2 -1  0 +1 +2 +3 +4 +5 +6 +7 +8 +9

{{textual-replicate}} returns the subtext of this text
beginning at index ''from'', and ending at ''to''. It is an error
if ''from'' is greater than ''to''.

You can use {{textual-replicate}} to perform a variety of
tasks:

* To rotate a text left: {{(textual-replicate "abcdef" 2 8)}} ⇒ {{«cdefab»}}

* To rotate a text right: {{(textual-replicate "abcdef" -2 4)}} ⇒ {{«efabcd»}}

* To replicate a text: {{(textual-replicate "abc" 0 7)}} ⇒ {{«abcabca»}}

Note that

* The ''from''/''to'' arguments give a half-open range containing the characters from index ''from'' up to, but not including, index ''to''.

* The ''from''/''to'' indexes are not expressed in the index space of ''textual''. They refer instead to the replicated index space of the subtext defined by ''textual'', ''start'', and ''end''.

It is an error if ''start'' = ''end'', unless ''from'' = ''to'', which is
allowed as a special case.
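To tie the diagram above to actual calls, here is a sketch using the same ''textual'' ({{"abcdefg"}}), ''start'' (3) and ''end'' (6), assuming the egg's {{(srfi 135)}} module; the variable names are illustrative.

<enscript highlight="scheme">
(import (srfi 135))

;; The subtext "def" (start 3, end 6) is replicated in both directions.
;; from/to index that replicated space: index 0 is the #\d found at
;; index 3 of "abcdefg".
(define forward  (textual-replicate "abcdefg" 0 7 3 6))   ; ⇒ «defdefd»
;; Negative indexes lie to the left of index 0; index -4 is #\f.
(define backward (textual-replicate "abcdefg" -4 2 3 6))  ; ⇒ «fdefde»
</enscript>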

<procedure>(textual-split textual delimiter [grammar limit start end]) → list</procedure>

Returns a list of texts representing the words contained
in the subtext of ''textual'' from ''start'' (inclusive) to ''end''
(exclusive). The ''delimiter'' is a text or string to be used
as the word separator. This will often be a single
character, but multiple characters are allowed for use
cases such as splitting on {{"\r\n"}}. The returned list will
have one more item than the number of non-overlapping
occurrences of the delimiter in the subtext. If ''delimiter''
is an empty text, then the returned list contains one
single-character text for each character of the subtext.

The ''grammar'' is a symbol with the same meaning as in the
{{textual-join}} procedure. If it is {{infix}}, which is the
default, processing is done as described above, except that an
empty ''textual'' produces the empty list; if ''grammar'' is
{{strict-infix}}, then an empty ''textual'' signals an error. The
value {{prefix}} suppresses a leading empty text in the result,
and {{suffix}} suppresses a trailing one.

If ''limit'' is a non-negative exact integer, at most that
many splits occur, and the remainder of ''textual'' is
returned as the final element of the list (so the result
will have at most ''limit''+1 elements). If ''limit'' is not
specified or is {{#f}}, then as many splits as possible are
made. It is an error if ''limit'' is any other value.
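A small sketch of the default {{infix}} grammar, assuming the egg's {{(srfi 135)}} module; the variable names are illustrative. Results are converted back to strings only to make them easy to compare.

<enscript highlight="scheme">
(import (srfi 135))

;; Default (infix) grammar: one more field than delimiters.
(define fields (textual-split "2021-08-31" "-"))
;; (map textual->string fields) ⇒ ("2021" "08" "31")

;; A multi-character delimiter, such as CRLF.
(define lines (textual-split "one\r\ntwo\r\nthree" "\r\n"))
;; (map textual->string lines) ⇒ ("one" "two" "three")

;; Under infix, an empty textual yields the empty list.
(define none (textual-split "" ","))   ; ⇒ ()
</enscript>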

To split on a regular expression ''re'', use SRFI 115's
{{regexp-split}} procedure:

<enscript highlight="scheme">
(map string->text (regexp-split re (textual->string txt)))
</enscript>

== Extensions

The following procedures are extensions to SRFI 135.  To use them,
import {{(srfi 135 extensions)}}.

=== Generators and accumulators

The following procedures provide [[srfi-158]] generators and
accumulators for immutable texts.  See
[[https://srfi.schemers.org/srfi-158/srfi-158.html|SRFI 158]] for
more information on generators and accumulators.

<procedure>(textual->generator textual [start [end]]) → procedure</procedure>

Returns a generator that produces the characters of ''textual'' in
order. If the optional ''start'' or ''end'' arguments are provided,
then only the characters of the specified subtext (or substring) of
''textual'' are produced.

Example:

<enscript highlight="scheme">
(let ((gen (textual->generator (text #\a #\b #\c #\d) 2)))
  (generator->list gen))
  ⇒ (#\c #\d)
</enscript>

<procedure>(generator->text gen [max]) → text</procedure>

Returns a text of the characters produced by ''gen'', in order. If
''max'' is provided and is an exact natural number, then at most
''max'' values will be read from ''gen''.  Otherwise, reading
continues until ''gen'' returns EOF.

Example:

<enscript highlight="scheme">
(let ((gen (list->generator '(#\a #\b #\c #\d))))
  (generator->text gen))
  ⇒ «abcd»
</enscript>
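The ''max'' argument can be seen in the following sketch, which assumes both {{(srfi 135)}} and {{(srfi 135 extensions)}} are available and uses {{textual->generator}} as the character source; {{head}} and {{tail}} are illustrative names.

<enscript highlight="scheme">
(import (srfi 135) (srfi 135 extensions))

;; A generator over the characters of "abcd".
(define gen (textual->generator "abcd"))

;; At most two values are read, leaving #\c and #\d in the generator.
(define head (generator->text gen 2))   ; ⇒ «ab»
;; Without max, reading continues until the generator is exhausted.
(define tail (generator->text gen))     ; ⇒ «cd»
</enscript>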

<procedure>(text-accumulator) → procedure</procedure>

Returns an accumulator that, when invoked on a character, appends
that character to an (initially empty) internal text and returns
an unspecified value. Invoking the accumulator on an eof-object
instead returns the accumulated text.

Example:

<enscript highlight="scheme">
(let ((acc (text-accumulator)))
  (acc #\a)
  (acc #\b)
  (acc #\c)
  (acc #!eof))
  ⇒ «abc»
</enscript>

=== I/O

<procedure>(text-read-line [port]) → text-or-eof</procedure>

Analogous to R7RS {{read-line}}. Reads a newline-delimited line from
''port'' and returns it as a text, without the terminating newline.
''port'' defaults to the value of {{(current-input-port)}}. If an EOF
is encountered before a newline but after some input has been read,
then a text containing this input is returned. If an EOF is
encountered before any input has been read, then an EOF object is
returned.

CRLF and CRCR…LF line terminators are not yet supported.
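The three cases above (a full line, a final unterminated line, and an immediate EOF) can be sketched as follows, assuming both {{(srfi 135)}} and {{(srfi 135 extensions)}} are available; {{open-input-textual}} supplies the port, and the variable names are illustrative.

<enscript highlight="scheme">
(import (srfi 135) (srfi 135 extensions))

;; open-input-textual accepts a string as well as a text.
(define port (open-input-textual "alpha\nbeta"))

(define line1 (text-read-line port))  ; ⇒ «alpha», newline dropped
(define line2 (text-read-line port))  ; ⇒ «beta», EOF after some input
(define line3 (text-read-line port))  ; ⇒ an EOF object
</enscript>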

Analogous to R7RS {{read-string}}. Reads at most ''max'' characters
from ''port'' (which defaults to the current input port) and returns
the result as a text. If an EOF is encountered before any input has
been read, then an EOF object is returned.

<procedure>(text-read-lines [port [max]]) → list[text]</procedure>

Analogous to {{read-lines}} from {{(chicken io)}}. Reads lines (as
with {{text-read-line}}) from ''port'' and returns the result as
a list. ''port'' defaults to the current input port. If ''max'' is
supplied, then at most ''max'' lines are read; otherwise, reading
continues until an EOF is encountered. If an EOF is encountered
immediately, the empty list is returned.
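A minimal sketch, assuming both {{(srfi 135)}} and {{(srfi 135 extensions)}} are available; the input comes from {{open-input-textual}} and the variable names are illustrative.

<enscript highlight="scheme">
(import (srfi 135) (srfi 135 extensions))

(define all (text-read-lines (open-input-textual "a\nb\nc\n")))
;; (map textual->string all) ⇒ ("a" "b" "c")

;; With max = 2, only the first two lines are read.
(define two (text-read-lines (open-input-textual "a\nb\nc\n") 2))
;; (map textual->string two) ⇒ ("a" "b")

;; An immediate EOF yields the empty list.
(define none (text-read-lines (open-input-textual "")))  ; ⇒ ()
</enscript>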

<procedure>(write-textual textual [port [start [end]]]) → void</procedure>

Analogous to R7RS {{write-string}}. Writes ''textual'' to ''port'',
which defaults to the current output port. If ''start'' or ''end''
are provided, then only the specified subtext (or substring) of
''textual'' is written.
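A sketch of the ''start''/''end'' arguments, assuming both {{(srfi 135)}} and {{(srfi 135 extensions)}} are available; output is collected with {{open-output-text}} and {{get-output-text}}, and {{out}} and {{result}} are illustrative names.

<enscript highlight="scheme">
(import (srfi 135) (srfi 135 extensions))

(define out (open-output-text))
;; start = 7: write from index 7 to the end of the text.
(write-textual (string->text "hello, world") out 7)
;; start = 0, end = 1: write only the first character of a string.
(write-textual "!?" out 0 1)
(define result (get-output-text out))  ; ⇒ «world!»
</enscript>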

==== Text ports

Text ports are analogous to R7RS string ports.

Text ports are currently implemented as wrapped string ports. A
direct implementation is a major TODO for this library.

<procedure>(open-input-textual textual) → input-port</procedure>

Returns a textual input port which delivers characters from
''textual''.

Example:

<enscript highlight="scheme">
(let ((p (open-input-textual (text #\a #\b #\c #\newline))))
  (read-line p))
  ⇒ "abc"
</enscript>

<procedure>(open-output-text) → output-port</procedure>

Returns a textual output port that accumulates characters for
retrieval by {{get-output-text}}.

<procedure>(get-output-text port) → text</procedure>

Returns a text of the characters written to ''port'' so far, in the
order in which they were output. It is an error if ''port'' was not
created by {{open-output-text}}.

Example:

<enscript highlight="scheme">
(let ((p (open-output-text)))
  (display "hello " p)
  (display "dave" p)
  (write-char #\newline p)
  (get-output-text p))
  ⇒ «hello dave\n»
</enscript>

== About This Egg

=== Dependencies

The following eggs are required:

* [[srfi-1]]
* [[srfi-141]]
* [[utf8]]
* [[r7rs]]
* [[typed-records]]

To run the included tests, the [[test]] and [[srfi-158]] eggs are
also required.

=== Type and bound checks

Type declarations are provided for checking compiled code using this
library.  Extensive runtime checking is also performed on the types
and bounds of arguments.  Since these checks are implemented using
{{assert}}, they can be disabled by compiling with the {{-unsafe}}
option.

=== Maintainer

Wolfgang Corcoran-Mathe <wcm at sigwinch dot xyzzy without the zy>

=== Repository

[[https://github.com/Zipheir/srfi-135|GitHub]]

=== Version History

; 0.1 : (2020-11-12) Initial release.
; 0.2 : (2021-08-31) Add types and bound checks, many small improvements.
; 0.3 : (2021-09-06) Add generators, accumulator, and I/O procedures. Rewrite tests to use [[test]].
; 1.0.0 : (2022-09-27) Reorganize library. Move extensions to their own module. Improve type and bounds checks, and follow CHICKEN's condition protocol.

== License

  Copyright (C) William D Clinger (2016). All Rights Reserved.
  Copyright (C) Wolfgang Corcoran-Mathe (2022)
  
  Permission is hereby granted, free of charge, to any person obtaining
  a copy of this software and associated documentation files (the
  "Software"), to deal in the Software without restriction, including
  without limitation the rights to use, copy, modify, merge, publish,
  distribute, sublicense, and/or sell copies of the Software, and to
  permit persons to whom the Software is furnished to do so, subject to
  the following conditions:
  
  The above copyright notice and this permission notice shall be
  included in all copies or substantial portions of the Software.
  
  THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
  EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
  MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
  IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
  CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
  TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
  SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
