SRFI-152: String Library (reduced)

SRFI 152 is a modern, streamlined string library for Scheme that incorporates most of the forms from srfi-13 while being consistent with the R5RS, R6RS, and R7RS-small string procedures.

This egg provides a Unicode-aware implementation of SRFI 152 which imports most forms from utf8.

For a library of similar operations which may be more efficient on longer strings, see srfi-135.

  1. SRFI-152: String Library (reduced)
  2. SRFI Description
    1. Rationale
    2. Notation
    3. Predicates
    4. Constructors
    5. Conversion
    6. Selection
    7. Replacement
    8. Comparison
    9. Prefixes and suffixes
    10. Searching
    11. Concatenation
    12. Fold and map and friends
    13. Replication and splitting
    14. Input-output
    15. Mutation
  3. Implementation
  4. About This Egg
    1. Dependencies
    2. Author
    3. Repository
    4. Maintainer
    5. Copyright
    6. Version history

SRFI Description

This page includes excerpts from the SRFI document, but is primarily intended to document the forms exported by the egg. For a full description of the SRFI, see the SRFI document.

Rationale

This SRFI omits a number of the bells, whistles, and gongs of SRFI 13; the full list is given in the SRFI document.

In addition, this SRFI includes the string-segment and string-split procedures from other sources.

For completeness, string-take-while, string-drop-while, string-take-while-right, and string-drop-while-right are also provided.

Notation

The parameter naming conventions used in the procedure specifications below are defined in the Notation section of the SRFI document.

It is an error to pass values that violate those conventions.

Predicates

[procedure] (string? obj) → boolean

Is obj a string?

[procedure] (string-null? string) → boolean

Is string the empty string?

[procedure] (string-every pred string [start end]) → value
[procedure] (string-any pred string [start end]) → value

Checks to see if every/any character in string satisfies pred, proceeding from left (index start) to right (index end). These procedures are short-circuiting: if pred returns false, string-every does not call pred on subsequent characters; if pred returns true, string-any does not call pred on subsequent characters. Both procedures are "witness-generating": string-any returns the first true value produced by pred (or #f if there is none), and string-every returns the value of its final call to pred (or #t if the range is empty).

Note: The names of these procedures do not end with a question mark. This indicates a general value is returned instead of a simple boolean (#t or #f).
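As a rough illustration of what "witness-generating" means in practice, the sketch below passes a predicate that returns something more useful than #t. It assumes the SRFI 13 convention that string-any returns the first true value produced by pred, and that the egg is installed and importable as (srfi 152). The helper digit-value is hypothetical and not part of the egg.

```scheme
(import scheme (chicken base)
        (only (srfi 152) string-any string-every))

;; Hypothetical helper: the numeric value of an ASCII digit, or #f.
(define (digit-value c)
  (and (char-numeric? c)
       (- (char->integer c) (char->integer #\0))))

;; string-any returns the witness produced by pred, not just #t.
(define first-digit (string-any digit-value "ab7c9"))       ; ⇒ 7

;; string-every stops at the first character for which pred is false.
(define all-digits? (string-every char-numeric? "12x4"))    ; ⇒ #f
```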

Constructors

[procedure] (make-string len char) → string

Returns a string of the given length filled with the given character.

[procedure] (string char …) → string

Returns a string consisting of the given characters.

[procedure] (string-tabulate proc len) → string

proc is a procedure that accepts an exact integer as its argument and returns a character. Constructs a string of size len by calling proc on each value from 0 (inclusive) to len (exclusive) to produce the corresponding element of the string. The order in which proc is called on those indexes is not specified.

Rationale: Although string-unfold is more general, string-tabulate is likely to run faster for the common special case it implements.

[procedure] (string-unfold stop? mapper successor seed [base make-final]) → string

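This is a fundamental constructor for strings. successor generates a series of seed values from the initial seed; stop? tells us when to stop; mapper maps each seed value to the corresponding character or string of the result. base supplies the optional leftmost portion of the constructed string, and make-final is applied to the terminal seed (the one for which stop? returns true) to produce the rightmost portion; both default to nothing.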

string-unfold is a fairly powerful string constructor. You can use it to convert a list to a string, read a port into a string, reverse a string, copy a string, and so forth.

Examples:

(port->string p) = (string-unfold eof-object?
                                  values
                                  (lambda (x) (read-char p))
                                  (read-char p))

(list->string lis) = (string-unfold null? car cdr lis)

(string-tabulate f size) = (string-unfold (lambda (i) (= i size)) f add1 0)

To map f over a list lis, producing a string:

(string-unfold null? (compose f car) cdr lis)

Interested functional programmers may enjoy noting that string-fold-right and string-unfold are in some sense inverses. That is, given operations knull?, kar, kdr, and kons, and a value knil satisfying

(kons (kar x) (kdr x)) = x  and  (knull? knil) = #t

then

(string-fold-right kons knil (string-unfold knull? kar kdr x)) = x

and

(string-unfold knull? kar kdr (string-fold-right kons knil string)) = string.

This combinator pattern is sometimes called an "anamorphism."
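A small sketch of string-unfold driven by a numeric seed may help make the roles of stop?, mapper, and successor concrete. The procedure countdown-string is a hypothetical name chosen for illustration, and the import form assumes the egg is available as (srfi 152).

```scheme
(import scheme (chicken base)
        (only (srfi 152) string-unfold))

;; Hypothetical example: build "54321" from the seed 5.
;; Only meaningful for 0 <= n <= 9, since each seed maps to one digit.
(define (countdown-string n)
  (string-unfold zero?                                  ; stop? : stop at seed 0
                 (lambda (i)                            ; mapper: seed -> digit char
                   (integer->char (+ i (char->integer #\0))))
                 (lambda (i) (- i 1))                   ; successor: next seed
                 n))                                    ; initial seed

(define result (countdown-string 5))   ; ⇒ "54321"
```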

[procedure] (string-unfold-right stop? mapper successor seed [base make-final]) → string

This is a fundamental constructor for strings. It is the same as string-unfold except the results of mapper are assembled into the string in right-to-left order, base is the optional rightmost portion of the constructed string, and make-final produces the leftmost portion of the constructed string. If mapper returns a string, the string is prepended to the constructed string (without reversal).

(string-unfold-right (lambda (n) (< n (char->integer #\A)))
                     (lambda (n) (char-downcase (integer->char n)))
                     (lambda (n) (- n 1))
                     (char->integer #\Z)
                     #\space
                     (lambda (n) " The English alphabet: "))
    ⇒ " The English alphabet: abcdefghijklmnopqrstuvwxyz "

(string-unfold-right null?
                     (lambda (x) (string #\[ (car x) #\]))
                     cdr
                     '(#\a #\b #\c))
    ⇒ "[c][b][a]"

Conversion

[procedure] (string->vector string [start end]) → char-vector
[procedure] (string->list string [start end]) → char-list

These procedures return a newly allocated (unless empty) vector or list of the characters that make up the given substring.

[procedure] (vector->string char-vector [start end]) → string
[procedure] (list->string char-list) → string

These procedures return a string containing the characters of the given (sub)vector or list. The behavior of the string will not be affected by subsequent mutation of the given vector or list.

[procedure] (reverse-list->string char-list) → string

Semantically equivalent to (compose list->string reverse):

(reverse-list->string '(#\a #\B #\c)) ⇒ "cBa"

This is a common idiom in the epilogue of string-processing loops that accumulate their result using a list in reverse order. (See also string-concatenate-reverse for the "chunked" variant.)
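The idiom mentioned above might look like the following sketch: a loop conses characters onto an accumulator (which therefore ends up reversed) and finishes with reverse-list->string. The procedure keep-letters-upcased is a hypothetical example, not part of the egg.

```scheme
(import scheme (chicken base)
        (only (srfi 152) reverse-list->string))

;; Hypothetical example: keep only the letters of s, upcased.
(define (keep-letters-upcased s)
  (let loop ((chars (string->list s))
             (acc '()))                       ; accumulated in reverse order
    (cond ((null? chars)
           (reverse-list->string acc))        ; epilogue: reverse and convert
          ((char-alphabetic? (car chars))
           (loop (cdr chars)
                 (cons (char-upcase (car chars)) acc)))
          (else
           (loop (cdr chars) acc)))))

(define result (keep-letters-upcased "a1b2c3"))   ; ⇒ "ABC"
```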

Selection

[procedure] (string-length string) → len

Returns the number of characters within the given string.

[procedure] (string-ref string idx) → char

Returns character string[idx], using 0-origin indexing.

[procedure] (substring string start end) → string
[procedure] (string-copy string [start end]) → string

These procedures return a string containing the characters of string beginning with index start (inclusive) and ending with index end (exclusive). The only difference is that substring requires all three arguments, whereas string-copy requires only one.

[procedure] (string-take string nchars) → string
[procedure] (string-drop string nchars) → string
[procedure] (string-take-right string nchars) → string
[procedure] (string-drop-right string nchars) → string

string-take returns a string containing the first nchars of string; string-drop returns a string containing all but the first nchars of string. string-take-right returns a string containing the last nchars of string; string-drop-right returns a string containing all but the last nchars of string.

(string-take "Pete Szilagyi" 6) ⇒ "Pete S"
(string-drop "Pete Szilagyi" 6) ⇒ "zilagyi"

(string-take-right "Beta rules" 5) ⇒ "rules"
(string-drop-right "Beta rules" 5) ⇒ "Beta "

It is an error to take or drop more characters than are in the string:

(string-take "foo" 37) ⇒ error

[procedure] (string-pad string len [char start end]) → string
[procedure] (string-pad-right string len [char start end]) → string

Returns a string of length len comprised of the characters drawn from the given subrange of string, padded on the left (right) by as many occurrences of the character char as needed. If string has more than len chars, it is truncated on the left (right) to length len. char defaults to #\space.

(string-pad "325" 5) ⇒ "  325"
(string-pad "71325" 5) ⇒ "71325"
(string-pad "8871325" 5) ⇒ "71325"

[procedure] (string-trim string [pred start end]) → string
[procedure] (string-trim-right string [pred start end]) → string
[procedure] (string-trim-both string [pred start end]) → string

Returns a string obtained from the given subrange of string by skipping over all characters on the left side / on the right side / on both sides that satisfy pred. pred defaults to char-whitespace?.

(string-trim-both "  The outlook wasn't brilliant,  \n\r")
    ⇒ "The outlook wasn't brilliant,"

Replacement

[procedure] (string-replace string₁ string₂ start₁ end₁ [start₂ end₂]) → string

Returns

(string-append (substring string₁ 0 start₁)
               (substring string₂ start₂ end₂)
               (substring string₁ end₁ (string-length string₁)))

That is, the segment of characters in string₁ from start₁ to end₁ is replaced by the segment of characters in string₂ from start₂ to end₂. If start₁ = end₁, this simply splices the characters drawn from string₂ into string₁ at that position.

Examples:

(string-replace "The TCL programmer endured daily ridicule."
                "another miserable perl drone" 4 7 8 22)
    ⇒ "The miserable perl programmer endured daily ridicule."

(string-replace "It's easy to code it up in Scheme." "lots of fun" 5 9)
    ⇒ "It's lots of fun to code it up in Scheme."

(define (string-insert s i t) (string-replace s t i i))

(string-insert "It's easy to code it up in Scheme." 5 "really ")
    ⇒ "It's really easy to code it up in Scheme."

(define (string-set s i c) (string-replace s (string c) i (+ i 1)))

(string-set "String-ref runs in O(n) time." 21 #\1)
    ⇒ "String-ref runs in O(1) time."

Comparison

[procedure] (string=? string₁ string₂ string₃ …) → boolean

Returns #t if all the strings have the same length and contain exactly the same characters in the same positions; otherwise returns #f.

[procedure] (string<? string₁ string₂ string₃ …) → boolean
[procedure] (string>? string₁ string₂ string₃ …) → boolean
[procedure] (string<=? string₁ string₂ string₃ …) → boolean
[procedure] (string>=? string₁ string₂ string₃ …) → boolean

These procedures return #t if their arguments are (respectively): monotonically increasing, monotonically decreasing, monotonically non-decreasing, or monotonically non-increasing.

These comparison predicates are required to be transitive.

In this implementation, these procedures are simply variadic versions of the string comparison procedures from R5RS; that is, they provide lexicographic comparison of strings.

In all cases, a pair of strings must satisfy exactly one of string<?, string=?, and string>?, must satisfy string<=? if and only if they do not satisfy string>?, and must satisfy string>=? if and only if they do not satisfy string<?.

[procedure] (string-ci=? string₁ string₂ string₃ …) → boolean

Returns #t if, after calling string-foldcase on each of the arguments, all of the case-folded strings would have the same length and contain the same characters in the same positions; otherwise returns #f.

[procedure] (string-ci<? string₁ string₂ string₃ …) → boolean
[procedure] (string-ci>? string₁ string₂ string₃ …) → boolean
[procedure] (string-ci<=? string₁ string₂ string₃ …) → boolean
[procedure] (string-ci>=? string₁ string₂ string₃ …) → boolean

These procedures behave as though they had called string-foldcase on their arguments before applying the corresponding procedures without "-ci".

Prefixes and suffixes

[procedure] (string-prefix-length string₁ string₂ [start₁ end₁ start₂ end₂]) → integer
[procedure] (string-suffix-length string₁ string₂ [start₁ end₁ start₂ end₂]) → integer

Return the length of the longest common prefix/suffix of string₁ and string₂. For prefixes, this is equivalent to their "mismatch index" (relative to the start indexes).

The optional start/end indexes restrict the comparison to the indicated substrings of string₁ and string₂.

[procedure] (string-prefix? string₁ string₂ [start₁ end₁ start₂ end₂]) → boolean
[procedure] (string-suffix? string₁ string₂ [start₁ end₁ start₂ end₂]) → boolean

Is string₁ a prefix/suffix of string₂?

The optional start/end indexes restrict the comparison to the indicated substrings of string₁ and string₂.
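The following sketch shows the four prefix/suffix procedures on short ASCII strings. It assumes the egg is importable as (srfi 152); the variable names are only for illustration.

```scheme
(import scheme (chicken base)
        (only (srfi 152)
              string-prefix-length string-suffix-length
              string-prefix? string-suffix?))

;; "prefix-a" and "prefix-b" share the 7 characters "prefix-".
(define common-prefix (string-prefix-length "prefix-a" "prefix-b"))  ; ⇒ 7

;; "running" and "jumping" share the 3-character suffix "ing".
(define common-suffix (string-suffix-length "running" "jumping"))    ; ⇒ 3

;; The candidate prefix/suffix is the first argument.
(define starts? (string-prefix? "pre" "prefix"))                     ; ⇒ #t
(define ends?   (string-suffix? "pre" "prefix"))                     ; ⇒ #f
```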

Searching

[procedure] (string-index string pred [start end]) → idx-or-false
[procedure] (string-index-right string pred [start end]) → idx-or-false
[procedure] (string-skip string pred [start end]) → idx-or-false
[procedure] (string-skip-right string pred [start end]) → idx-or-false

string-index searches through the given substring from the left, returning the index of the leftmost character satisfying the predicate pred. string-index-right searches from the right, returning the index of the rightmost character satisfying the predicate pred. If no match is found, these procedures return #f.

The start and end arguments specify the beginning and end of the search; the valid indexes relevant to the search include start but exclude end. Beware of "fencepost" errors: when searching right-to-left, the first index considered is (- end 1), whereas when searching left-to-right, the first index considered is start. That is, the start / end indexes describe the same half-open interval [start, end) in these procedures that they do in all other procedures specified by this SRFI.

The skip functions are similar, but use the complement of the criterion: they search for the first char that doesn't satisfy pred. To skip over initial whitespace, for example, say

(substring string
           (or (string-skip string char-whitespace?)
               (string-length string))
           (string-length string))

[procedure] (string-contains string₁ string₂ [start₁ end₁ start₂ end₂]) → idx-or-false
[procedure] (string-contains-right string₁ string₂ [start₁ end₁ start₂ end₂]) → idx-or-false

Does the substring of string₁ specified by start₁ and end₁ contain the sequence of characters given by the substring of string₂ specified by start₂ and end₂?

Returns #f if there is no match. If start₂ = end₂, string-contains returns start₁ but string-contains-right returns end₁. Otherwise returns the index in string₁ for the first character of the first/last match; that index lies within the half-open interval [start₁, end₁), and the match lies entirely within the [start₁, end₁) range of string₁.

(string-contains "eek -- what a geek." "ee" 12 18) ; Searches "a geek"
    ⇒ 15

Note: The names of these procedures do not end with a question mark. This indicates a useful value is returned when there is a match.
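Because these search procedures return an index (or #f), their results can be fed straight into the selection procedures. The sketch below uses string-index to cut a string at its first colon and string-contains to locate a substring. The helper split-at-first-colon is hypothetical, and the import assumes the egg is available as (srfi 152).

```scheme
(import scheme (chicken base)
        (only (srfi 152)
              string-index string-contains string-take string-drop))

;; Hypothetical helper: split "key:value" at the first colon.
;; Returns a pair (key . value), or #f when there is no colon.
(define (split-at-first-colon s)
  (let ((i (string-index s (lambda (c) (char=? c #\:)))))
    (and i
         (cons (string-take s i)
               (string-drop s (+ i 1))))))

(define host+port (split-at-first-colon "host:8080"))   ; ⇒ ("host" . "8080")

;; string-contains returns the index where the match begins.
(define where (string-contains "hello world" "o w"))    ; ⇒ 4
```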

[procedure] (string-take-while string pred [start end]) → string
[procedure] (string-take-while-right string pred [start end]) → string

Returns the longest initial prefix/suffix of the substring of string specified by start and end whose elements all satisfy the predicate pred. (Not SRFI 13 procedures.)

[procedure] (string-drop-while string pred [start end]) → string
[procedure] (string-drop-while-right string pred [start end]) → string

Drops the longest initial prefix/suffix of the substring of string specified by start and end whose elements all satisfy the predicate pred, and returns the rest of the string.

These are the same as string-trim and string-trim-right, but with a different order of arguments. (Not SRFI 13 procedures.)

[procedure] (string-span string pred [start end]) → [string string]
[procedure] (string-break string pred [start end]) → [string string]

string-span splits the substring of string specified by start and end into the longest initial prefix whose elements all satisfy pred, and the remaining tail. string-break inverts the sense of the predicate: the tail commences with the first element of the input string that satisfies the predicate. (Not SRFI 13 procedures.)

In other words: string-span finds the initial span of elements satisfying pred, and string-break breaks the string at the first element satisfying pred.

string-span is equivalent to

(values (string-take-while string pred)
        (string-drop-while string pred))
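A short sketch of how the two return values of string-span and string-break can be captured with call-with-values. The variable names are illustrative only, and the import assumes the egg is available as (srfi 152).

```scheme
(import scheme (chicken base)
        (only (srfi 152) string-span string-break))

;; string-span: the leading run that satisfies pred, then the rest.
(define spanned
  (call-with-values
      (lambda () (string-span "123abc" char-numeric?))
    list))                                   ; ⇒ ("123" "abc")

;; string-break: the tail starts at the first char that satisfies pred.
(define broken
  (call-with-values
      (lambda () (string-break "key=value" (lambda (c) (char=? c #\=))))
    list))                                   ; ⇒ ("key" "=value")
```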

Concatenation

[procedure] (string-append string …) → string

Returns a string whose sequence of characters is the concatenation of the sequences of characters in the given arguments.

[procedure] (string-concatenate string-list) → string

Concatenates the elements of string-list together into a single string.

Rationale: Some implementations of Scheme limit the number of arguments that may be passed to an n-ary procedure, so the

(apply string-append string-list)

idiom, which is otherwise equivalent to using this procedure, is not as portable.

[procedure] (string-concatenate-reverse string-list [final-string end]) → string

With no optional arguments, calling this procedure is equivalent to

(string-concatenate (reverse string-list))

If the optional argument final-string is specified, it is effectively consed onto the beginning of string-list before performing the list-reverse and string-concatenate operations.

If the optional argument end is given, only the characters up to but not including end in final-string are added to the result, thus producing

(string-concatenate
  (reverse (cons (substring final-string 0 end)
                 string-list)))

Example:

(string-concatenate-reverse '(" must be" "Hello, I") " going.XXXX" 7)
    ⇒ "Hello, I must be going."

Rationale: This procedure is useful when constructing procedures that accumulate character data into lists of string buffers, then convert the accumulated data into a single string when done. The optional end argument accommodates that use case by allowing the final buffer to be only partially full without having to copy it a second time, as string-take would require.

Note that reversing a string simply reverses the sequence of code points it contains. Caution should be taken if a grapheme cluster is divided between two string arguments.

[procedure] (string-join string-list [delimiter grammar]) → string

This procedure is a simple unparser; it pastes strings together using the delimiter string.

string-list is a list of strings. delimiter is a string. The grammar argument is a symbol that determines how the delimiter is used, and defaults to infix.

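It is an error for grammar to be any symbol other than these four: infix (the delimiter is placed between the elements), strict-infix (the same as infix, except that it is an error for string-list to be empty), prefix (the delimiter is placed before each element), and suffix (the delimiter is placed after each element).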

The delimiter is the string used to delimit elements; it defaults to a single space " ".

Examples

(string-join '("foo" "bar" "baz")) ⇒ "foo bar baz"
(string-join '("foo" "bar" "baz") "") ⇒ "foobarbaz"
(string-join '("foo" "bar" "baz") ":") ⇒ "foo:bar:baz"
(string-join '("foo" "bar" "baz") ":" 'suffix) ⇒ "foo:bar:baz:"

;; Infix grammar is ambiguous wrt empty list vs. empty string:
(string-join '()   ":") ⇒ ""
(string-join '("") ":") ⇒ ""

;; Suffix and prefix grammars are not:
(string-join '()   ":" 'suffix) ⇒ ""
(string-join '("") ":" 'suffix) ⇒ ":"

Fold and map and friends

[procedure] (string-fold kons knil string [start end]) → value
[procedure] (string-fold-right kons knil string [start end]) → value

These are the fundamental iterators for strings.

The string-fold procedure maps the kons procedure across the given string from left to right:

(… (kons string[2] (kons string[1] (kons string[0] knil))))

In other words, string-fold obeys the (tail) recursion

(string-fold kons knil string start end) =
    (string-fold kons (kons string[start] knil) string start+1 end)

The string-fold-right procedure maps kons across the given string from right to left:

(kons string[0]
      (… (kons string[end-3]
               (kons string[end-2]
                     (kons string[end-1]
                           knil)))))

obeying the (tail) recursion

(string-fold-right kons knil string start end) =
    (string-fold-right kons (kons string[end-1] knil) string start end-1)

Examples:

;;; Convert a string to a list of chars.
(string-fold-right cons '() string)

;;; Count the number of lower-case characters in a string.
(string-fold (lambda (c count)
                (if (char-lower-case? c)
                    (+ count 1)
                    count))
              0
              string)

The string-fold-right combinator is sometimes called a "catamorphism."

[procedure] (string-map proc string₁ string₂ …) → string

It is an error if proc does not accept as many arguments as the number of string arguments passed to string-map, does not accept characters as arguments, or returns a value that is not a character or string.

The string-map procedure applies proc element-wise to the characters of the string arguments, converts each value returned by proc to a string, and returns the concatenation of those strings. If more than one string argument is given and not all have the same length, then string-map terminates when the shortest string argument runs out. The dynamic order in which proc is called on the characters of the string arguments is unspecified, as is the dynamic order in which the coercions are performed. If any strings returned by proc are mutated after they have been returned and before the call to string-map has returned, then string-map returns a string with unspecified contents; the string-map procedure itself does not mutate those strings.

Compatibility note: This string-map is the one found in R7RS-small, and not the one from srfi-13 (which takes optional start/end indices rather than additional string arguments).

Example:

(string-map (lambda (c0 c1 c2)
               (case c0
                ((#\1) c1)
                ((#\2) (string c2))
                ((#\-) (string #\- c1))))
            "1222-1111-2222"
            "Hi There!"
            "Dear John")
    ⇒ "Hear-here!"

[procedure] (string-for-each proc string₁ string₂ …) → unspecified

It is an error if proc does not accept as many arguments as the number of string arguments passed to string-for-each or does not accept characters as arguments.

The string-for-each procedure applies proc element-wise to the characters of the string arguments, going from left to right. If more than one string argument is given and not all have the same length, then string-for-each terminates when the shortest string argument runs out.

[procedure] (string-count string pred [start end]) → integer

Returns a count of the number of characters in the specified substring of string that satisfy pred.

[procedure] (string-filter pred string [start end]) → string
[procedure] (string-remove pred string [start end]) → string

Filter the given substring of string, retaining only those characters that satisfy / do not satisfy pred.

Compatibility note: In SRFI 13, string-remove is called string-delete. This is inconsistent with SRFI 1 and other SRFIs.
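The sketch below contrasts the three procedures on ASCII input. Note the argument order given in the signatures above: string-count takes the string first, while string-filter and string-remove take the predicate first. The import assumes the egg is available as (srfi 152).

```scheme
(import scheme (chicken base)
        (only (srfi 152) string-count string-filter string-remove))

;; How many upper-case letters? (string first, then pred)
(define capitals (string-count "Hello, World" char-upper-case?))   ; ⇒ 2

;; Keep only the letters / drop the letters. (pred first, then string)
(define letters (string-filter char-alphabetic? "a1b2c3"))         ; ⇒ "abc"
(define others  (string-remove char-alphabetic? "a1b2c3"))         ; ⇒ "123"
```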

Replication and splitting

[procedure] (string-replicate string from to [start end]) → string

This is an "extended substring" procedure that implements replicated copying of a substring. This substring is conceptually replicated both up and down the index space, in both the positive and negative directions.

For example, if string is "abcdefg", start is 3, and end is 6, then we have the conceptual bidirectionally-infinite string

…  d  e  f  d  e  f  d  e  f  d  e  f  d  e  f  d  e  f  d …
  -9 -8 -7 -6 -5 -4 -3 -2 -1  0 +1 +2 +3 +4 +5 +6 +7 +8 +9

string-replicate returns the substring of this string beginning at index from, and ending at to.

It is an error if from is greater than to.

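You can use string-replicate to perform a variety of tasks, such as rotating a string left or right or repeating it.

Note that the from and to indexes refer to positions in the conceptual replicated string, not to positions in string itself.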

It is an error if start = end, unless from = to, which is allowed as a special case.

Compatibility note: In SRFI 13, this procedure is called xsubstring.
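As a sketch of the replicated index space described above, the following calls rotate and repeat short strings; the last one reuses the "abcdefg" example with start 3 and end 6. The variable names are illustrative, and the import assumes the egg is available as (srfi 152).

```scheme
(import scheme (chicken base)
        (only (srfi 152) string-replicate))

;; Rotate left by two: indexes 2..7 of the repeated "abcdef".
(define rotated-left  (string-replicate "abcdef" 2 8))      ; ⇒ "cdefab"

;; Rotate right by two: negative indexes reach back into the previous copy.
(define rotated-right (string-replicate "abcdef" -2 4))     ; ⇒ "efabcd"

;; Repeat "abc" out to seven characters.
(define repeated      (string-replicate "abc" 0 7))         ; ⇒ "abcabca"

;; Replicate only the substring "def" (start 3, end 6) of "abcdefg".
(define from-substring (string-replicate "abcdefg" 0 5 3 6)) ; ⇒ "defde"
```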

[procedure] (string-segment string k) → list

Returns a list of strings representing the consecutive substrings of length k. The last string may be shorter than k. (Not a SRFI 13 procedure.)
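For instance, a minimal sketch of chunking a string into pieces of three characters, assuming the egg is importable as (srfi 152):

```scheme
(import scheme (chicken base)
        (only (srfi 152) string-segment))

;; Eight characters in chunks of three: the last chunk is shorter.
(define chunks (string-segment "abcdefgh" 3))   ; ⇒ ("abc" "def" "gh")
```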

[procedure] (string-split string delimiter [grammar limit start end]) → list

Returns a list of strings representing the words contained in the substring of string. The delimiter is a string to be used as the word separator. This will often be a single character, but multiple characters are allowed for use cases such as splitting on "\r\n". The returned list will have one more item than the number of non-overlapping occurrences of the delimiter in the string. If delimiter is an empty string, then the returned list consists of strings that each contain a single character. (Not a SRFI 13 procedure; replaces string-tokenize.)

The grammar is a symbol with the same meaning as in the string-join procedure. If it is infix, which is the default, processing is done as described above, except an empty string produces the empty list; if grammar is strict-infix, then an empty string signals an error. The values prefix and suffix cause a leading/trailing empty string in the result to be suppressed.

If limit is a non-negative exact integer, at most that many splits occur, and the remainder of string is returned as the final element of the list (so the result will have at most limit + 1 elements). If limit is not specified or is #f, then as many splits as possible are made. It is an error if limit is any other value.

To split on a regular expression, use SRFI 115's regexp-split procedure (irregex-split in the irregex module).

Compatibility note: Don't confuse this with the similar string-split procedure from (chicken string).
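A brief sketch of the default (infix) behaviour described above, using a single-character and a multi-character delimiter. Only (srfi 152)'s string-split is imported here, so there is no clash with the (chicken string) procedure of the same name. The variable names are illustrative.

```scheme
(import scheme (chicken base)
        (only (srfi 152) string-split))

;; Two delimiters, so three fields.
(define fields (string-split "a,b,c" ","))                   ; ⇒ ("a" "b" "c")

;; Adjacent delimiters produce an empty field between them.
(define with-gap (string-split "a,,b" ","))                  ; ⇒ ("a" "" "b")

;; A multi-character delimiter.
(define lines (string-split "one\r\ntwo\r\nthree" "\r\n"))   ; ⇒ ("one" "two" "three")
```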

Input-output

[procedure] (read-string k [port]) → string

Reads the next k characters, or as many as are available before the end of file, from the textual input port port into a newly allocated string in left-to-right order and returns the string. If no characters are available before the end of file, an end-of-file object is returned. The default port is the value of (current-input-port).

[procedure] (write-string string [port start end]) → unspecified

Writes the characters of string from index start to index end onto textual output port port. The default port is the value of (current-output-port).

Compatibility note: Don't confuse this with the similar write-string procedure from (chicken io).

Mutation

[procedure] (string-set! string k char) → unspecified

Stores char in element k of string.

[procedure] (string-fill! string fill [start end]) → unspecified

Stores fill (which must be a character) in the elements of string from index start (inclusive) to index end (exclusive).

[procedure] (string-copy! to at from [start end]) → unspecified

Copies the characters of string from between start and end to string to, starting at at. The order in which characters are copied is unspecified, except that if the source and destination overlap, copying takes place as if the source is first copied into a temporary string and then into the destination. This can be achieved without allocating storage by making sure to copy in the correct direction in such circumstances.
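The overlap rule can be seen in a small sketch that copies part of a string onto itself. The result is what the "as if via a temporary string" wording above requires; the buffer name buf is illustrative, and the import assumes the egg is available as (srfi 152).

```scheme
(import scheme (chicken base)
        (only (srfi 152) string-copy!))

;; A fresh, mutable copy (string literals should not be mutated).
(define buf (string-copy "abcdef"))

;; Copy buf[0, 4) = "abcd" into buf starting at index 2.
;; Source and destination overlap; the source is read as if copied first.
(string-copy! buf 2 buf 0 4)

;; buf is now "ababcd"
```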

Implementation

This implementation is based upon the utf8 egg, so procedures exported by (srfi 152) are fully Unicode-aware.

As with the utf8 egg, care must be taken when importing this egg to avoid confusion with the non-Unicode procedures exported by scheme, (chicken string), r7rs, etc.

A sample import section:

(import (except scheme make-string string string-length string-ref
                       string-set! substring string->list list->string
                       string-fill!)
        (except (chicken string) reverse-list->string string-split
                                 substring-index)
        (except (chicken io) read-string write-string)
        (srfi 152))

About This Egg

Dependencies

The utf8 egg is required. The test egg is required to run the included tests.

Author

John Cowan

Ported to Chicken 5 and packaged by Sergey Goldgaber.

Repository

GitHub

Maintainer

Wolfgang Corcoran-Mathe

Contact: <wcm at sigwinch dot xyzzy minus the zy>

Copyright (C) John Cowan (2017).

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Version history

0.1: Packaged for Chicken Scheme 5.2.0.
0.2: Changed maintainer information.
1.0: (2022-01-17) First full-Unicode, utf8-based release.