string-utils

  1. string-utils
  2. Documentation
    1. Memoized String
      1. Usage
      2. make-string+
      3. string+
      4. global-string
    2. String Hexadecimal
      1. Usage
      2. string->hex
      3. hex->string
    3. Hexadecimal Procedures
      1. Usage
      2. str_to_hex
      3. blob_to_hex
      4. u8vec_to_hex
      5. s8vec_to_hex
      6. mem_to_hex
      7. hex_to_str
      8. hex_to_blob
    4. Unicode Utilities
      1. Usage
      2. ascii-codepoint?
      3. char->unicode-string
      4. unicode-string
      5. *unicode-string
      6. unicode-make-string
      7. unicode-surrogate?
      8. unicode-surrogates->codepoint
    5. String Utilities
      1. Usage
      2. string-split-chars
      3. string-unzip
      4. string-zip
      5. string-trim-whitespace-both
      6. list-as-string
      7. number->padded-string
      8. string-fixed-length
      9. string-longest-common-prefix
      10. string-longest-common-suffix
      11. string-longest-prefix
      12. string-longest-suffix
    6. String Interpolation
      1. Usage
      2. Compiler Command-Line
      3. Interpreter Command-Line
    7. String Interpolation Syntax
      1. Usage
      2. set-sharp-string-interpolation-syntax
    8. String Interpolator
      1. Usage
      2. string-interpolate
    9. Rabin Karp String Search
      1. Usage
      2. make-string-search
      3. collect-string-search
  3. Requirements
  4. Author
  5. Version history
  6. License

Documentation

Memoized String

Usage

(import memoized-string)

make-string+

[procedure] (make-string+ COUNT [FILL]) -> string

An interning make-string.

FILL is any valid char, including codepoints outside of the ASCII range, which produce UTF-8 strings.

string+

[procedure] (string+ [CHAR...]) -> string

An interning string.

CHAR is any valid char, including codepoints outside of the ASCII range, which produce UTF-8 strings.

global-string

[procedure] (global-string STR) -> string

Share common string space.
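
As an illustration of what "interning" means here, the following is a minimal sketch in plain CHICKEN Scheme. It is not the egg's implementation; the names interned, intern-string-sketch and make-string+-sketch are hypothetical, and the association-list table is an assumption chosen for brevity.

```scheme
(import scheme (chicken base))

;; Hypothetical intern table: an association list keyed by string contents.
(define interned '())

;; Return the shared copy of STR, registering STR itself on first sight.
(define (intern-string-sketch str)
  (let ((hit (assoc str interned)))          ; assoc compares with equal?
    (if hit
        (cdr hit)
        (begin
          (set! interned (cons (cons str str) interned))
          str))))

;; An interning variant of make-string (sketch only).
(define (make-string+-sketch count fill)
  (intern-string-sketch (make-string count fill)))

;; Two requests for the same contents yield the very same object:
;; (eq? (make-string+-sketch 3 #\a) (make-string+-sketch 3 #\a)) ;=> #t
```

Because equal strings share one object, the result should be treated as immutable; mutating it would affect every holder of that string.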

String Hexadecimal

Usage

(import string-hexadecimal)

string->hex

[procedure] (string->hex STRING [START [END]]) -> string

Returns a hexadecimal representation of STRING. START and END are substring limits.

STRING is treated as a string of bytes, a byte-vector.

hex->string

[procedure] (hex->string STRING [START [END]]) -> string

Returns the binary representation of a hexadecimal STRING. START and END are substring limits.
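
To make the byte-wise conversion concrete, here is a self-contained sketch of both directions in plain Scheme. It is not the egg's code: the -sketch names are hypothetical, lowercase digits are an assumption, and the optional START and END arguments are omitted. It also assumes every character code is below 256, which holds for CHICKEN 5 byte strings.

```scheme
(import scheme (chicken base))

(define hex-digits "0123456789abcdef")   ; lowercase is an assumption

;; Two hexadecimal digits per input byte.
(define (string->hex-sketch str)
  (let* ((n (string-length str))
         (out (make-string (* 2 n))))
    (do ((i 0 (+ i 1)))
        ((= i n) out)
      (let ((b (char->integer (string-ref str i))))
        (string-set! out (* 2 i) (string-ref hex-digits (quotient b 16)))
        (string-set! out (+ (* 2 i) 1) (string-ref hex-digits (remainder b 16)))))))

;; One output byte per pair of hexadecimal digits.
(define (hex->string-sketch hex)
  (let* ((n (quotient (string-length hex) 2))
         (out (make-string n)))
    (do ((i 0 (+ i 1)))
        ((= i n) out)
      (string-set! out i
        (integer->char
          (string->number (substring hex (* 2 i) (+ (* 2 i) 2)) 16))))))

;; (string->hex-sketch "abc")    ;=> "616263"
;; (hex->string-sketch "616263") ;=> "abc"
```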

Hexadecimal Procedures

Usage

(import to-hex)

str_to_hex

[procedure] (str_to_hex OUT IN OFF LEN)

Writes the ASCII hexadecimal representation of IN to OUT.

IN is a nonnull-string.

OFF is the byte offset.

LEN is the length of the bytes at OFF.

OUT is a string of length >= (* LEN 2).

blob_to_hex

[procedure] (blob_to_hex OUT IN OFF LEN)

Like str_to_hex except IN is a nonnull-blob.

u8vec_to_hex

[procedure] (u8vec_to_hex OUT IN OFF LEN)

Like str_to_hex except IN is a nonnull-u8vector.

s8vec_to_hex

[procedure] (s8vec_to_hex OUT IN OFF LEN)

Like str_to_hex except IN is a nonnull-s8vector.

mem_to_hex

[procedure] (mem_to_hex OUT IN OFF LEN)

Like str_to_hex except IN is a nonnull-c-pointer.

hex_to_str

[procedure] (hex_to_str OUT IN OFF LEN)

Reads the ASCII hexadecimal representation of IN to OUT.

IN is a nonnull-string.

OFF is the byte offset.

LEN is the length of the bytes at OFF.

OUT is a string of length >= (/ LEN 2).

hex_to_blob

[procedure] (hex_to_blob OUT IN OFF LEN)

Like hex_to_str except OUT is a blob of size >= (/ LEN 2).

Unicode Utilities

The name of this extension is misleading. Only UTF-8 is currently supported.

For a better treatment of UTF-8 see the utf-8 extension.

Usage

(import unicode-utils)

ascii-codepoint?

[procedure] (ascii-codepoint? CHAR) -> boolean

char->unicode-string

[procedure] (char->unicode-string CHAR) -> string

Returns a string formed from Unicode codepoint CHAR.

Note that the string-length of the result (except under utf-8) may not be equal to 1.

Generates an error should the codepoint be out-of-range.

unicode-string

[procedure] (unicode-string [CHAR...]) -> string

Returns a string formed from Unicode codepoints CHAR...

Note that the string-length of the result (except under utf-8) may not be equal to the number of CHAR arguments.

Generates an error should the codepoint be out-of-range.

*unicode-string

[procedure] (*unicode-string CHARS) -> string

Returns a string formed from Unicode codepoints CHARS, a (list-of char).

unicode-make-string

[procedure] (unicode-make-string COUNT [FILL]) -> string

Returns a string formed from COUNT occurrences of the Unicode codepoint FILL. The FILL default is #\space.

Note that the string-length of the result (except under utf-8) may not be equal to COUNT.

Generates an error should the codepoint be out-of-range.

unicode-surrogate?

[procedure] (unicode-surrogate? NUM) -> boolean

unicode-surrogates->codepoint

[procedure] (unicode-surrogates->codepoint HIGH LOW) -> (or boolean fixnum)

Returns the codepoint for the valid surrogate pair HIGH and LOW. Otherwise returns #f.
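
The arithmetic behind combining a surrogate pair is the standard UTF-16 formula. The sketch below shows it in plain Scheme; surrogates->codepoint-sketch is a hypothetical name, not the egg's procedure, though it follows the documented contract of returning #f for an invalid pair.

```scheme
(import scheme (chicken base))

;; HIGH must lie in D800..DBFF and LOW in DC00..DFFF; otherwise #f.
(define (surrogates->codepoint-sketch high low)
  (and (<= #xD800 high #xDBFF)
       (<= #xDC00 low #xDFFF)
       (+ #x10000
          (* (- high #xD800) #x400)   ; top 10 bits of the offset
          (- low #xDC00))))           ; low 10 bits of the offset

;; (surrogates->codepoint-sketch #xD83D #xDE00) ;=> 128512, i.e. #x1F600
;; (surrogates->codepoint-sketch #x41 #xDE00)   ;=> #f
```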

String Utilities

Usage

(import string-utils)

string-split-chars

[procedure] (string-split-chars STR [DELIMITERS]) -> (list-of string) (list-of char)

Returns a list of substrings of STR & a list of the characters, from DELIMITERS, separating those substrings.

STR
string ; version string.
DELIMITERS
string ; string of version component delimiter characters, default ".,".

(string-split-chars "a.2,c" "$,.")
;=> ("a" "2" "c") (#\. #\,)
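
For readers who want to see the splitting spelled out, this is a minimal sketch in plain Scheme that returns the same two values for the example above. It is not the egg's implementation; string-split-chars-sketch is a hypothetical name, and producing an empty substring between adjacent delimiters is an assumption the source does not confirm.

```scheme
(import scheme (chicken base))

;; Returns two values: the substrings and the delimiter characters found.
(define (string-split-chars-sketch str delims)
  (let ((delim-list (string->list delims)))
    (let loop ((chars (string->list str)) (cur '()) (parts '()) (seps '()))
      (cond
        ((null? chars)
         ;; close the final substring and restore left-to-right order
         (values (reverse (cons (list->string (reverse cur)) parts))
                 (reverse seps)))
        ((memv (car chars) delim-list)
         ;; delimiter: close the current substring, remember the delimiter
         (loop (cdr chars) '()
               (cons (list->string (reverse cur)) parts)
               (cons (car chars) seps)))
        (else
         (loop (cdr chars) (cons (car chars) cur) parts seps))))))

;; (string-split-chars-sketch "a.2,c" "$,.")
;; ;=> ("a" "2" "c") (#\. #\,)
```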

string-unzip

[procedure] (string-unzip STR [DELIMITERS]) -> (list-of string) (list-of string)

Returns a list of substrings of STR & a list of the delimiters, from DELIMITERS, separating those substrings.

STR
string ; version string.
DELIMITERS
string ; string of version component delimiter characters, default ".,".

(string-unzip "a.2,c" "$,.")
;=> ("a" "2" "c") ("." ",")

string-zip

[procedure] (string-zip PARTS PUNCS) -> string

Returns a string formed from the concatenation of the PARTS and the interspersion of the PUNCS.

PARTS
(list-of string) ; version components.
PUNCS
(list-of string) ; version component separators.

(string-zip '("a" "2" "c") '("." ","))
;=> "a.2,c"
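
The interleaving can be sketched in a few lines of plain Scheme. This is only an illustration with a hypothetical name, string-zip-sketch; how the egg treats a PUNCS list that is shorter or longer than needed is not stated in the source, so the handling below (stop inserting separators when they run out, ignore extras) is an assumption.

```scheme
(import scheme (chicken base))

;; Concatenate PARTS, placing the next element of PUNCS between neighbours.
(define (string-zip-sketch parts puncs)
  (let loop ((parts parts) (puncs puncs) (acc '()))
    (cond
      ((null? parts)
       (apply string-append (reverse acc)))
      ((or (null? puncs) (null? (cdr parts)))
       ;; last part, or no separators left: emit the part alone
       (loop (cdr parts) puncs (cons (car parts) acc)))
      (else
       (loop (cdr parts) (cdr puncs)
             (cons (car puncs) (cons (car parts) acc)))))))

;; (string-zip-sketch '("a" "2" "c") '("." ",")) ;=> "a.2,c"
```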

string-trim-whitespace-both

[procedure] (string-trim-whitespace-both S) -> string

Returns the string S with whitespace trimmed from both ends.

list-as-string

[procedure] (list-as-string LS) -> string

Returns the list LS written to a string.

number->padded-string

[procedure] (number->padded-string N WIDTH [PADCHAR [BASE]]) -> string

N
number ; source
WIDTH
fixnum ; field width
PADCHAR
char ; padding character
BASE
fixnum ; number conversion base

string-fixed-length

[procedure] (string-fixed-length S N [pad-char: #\space] [trailing: "..."]) -> string

Returns the string S with the string-length fixed to N.

A shorter string is padded. A longer string is truncated and suffixed with the trailing string.
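
The padding and truncation rule can be sketched as follows. This is not the egg's code, and it rests on two assumptions the source does not state: padding is added on the right, and a truncated result, including the trailing string, has length N. The name string-fixed-length-sketch is hypothetical.

```scheme
(import scheme (chicken base))

(define (string-fixed-length-sketch s n #!key (pad-char #\space) (trailing "..."))
  (let ((len (string-length s)))
    (cond
      ((= len n) s)
      ((< len n)
       ;; assumption: pad on the right
       (string-append s (make-string (- n len) pad-char)))
      (else
       ;; assumption: keep room for TRAILING so the total length is N
       ;; (when N is shorter than TRAILING the result is just TRAILING)
       (let ((keep (max 0 (- n (string-length trailing)))))
         (string-append (substring s 0 keep) trailing))))))

;; (string-fixed-length-sketch "abc" 5)               ;=> "abc  "
;; (string-fixed-length-sketch "abcdefghij" 6)        ;=> "abc..."
;; (string-fixed-length-sketch "abc" 5 pad-char: #\*) ;=> "abc**"
```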

string-longest-common-prefix

[procedure] (string-longest-common-prefix STRINGS) -> string

Returns the longest common prefix of STRINGS.

STRINGS
(list-of string)
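
A longest common prefix can be computed by repeatedly shortening a candidate prefix against each remaining string. The sketch below shows that approach in plain Scheme; the names are hypothetical, it is not the egg's implementation, and returning "" for an empty list is an assumption.

```scheme
(import scheme (chicken base))

;; Number of leading characters that A and B share.
(define (common-prefix-length a b)
  (let ((limit (min (string-length a) (string-length b))))
    (let loop ((i 0))
      (if (and (< i limit) (char=? (string-ref a i) (string-ref b i)))
          (loop (+ i 1))
          i))))

;; Shrink the candidate prefix against every remaining string.
(define (longest-common-prefix-sketch strings)
  (if (null? strings)
      ""                                   ; assumption for the empty list
      (let loop ((prefix (car strings)) (rest (cdr strings)))
        (if (null? rest)
            prefix
            (loop (substring prefix 0 (common-prefix-length prefix (car rest)))
                  (cdr rest))))))

;; (longest-common-prefix-sketch '("interview" "internet" "interval")) ;=> "inter"
```

The suffix variant follows the same idea with the comparison running from the ends of the strings.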

string-longest-common-suffix

[procedure] (string-longest-common-suffix STRINGS) -> string

Returns the longest common suffix of STRINGS.

STRINGS
(list-of string)

string-longest-prefix

[procedure] (string-longest-prefix CANDIDATE OTHERS) -> (or boolean string)

Returns the member of OTHERS with the longest common prefix with CANDIDATE, or #f.

CANDIDATE
string
OTHERS
(list-of string)

string-longest-suffix

[procedure] (string-longest-suffix CANDIDATE OTHERS) -> (or boolean string)

Returns the member of OTHERS with the longest common suffix with CANDIDATE, or #f.

CANDIDATE
string
OTHERS
(list-of string)

String Interpolation

Extends the read-syntax with #"..." where tagged scheme expressions in the string are evaluated at runtime:

#"@ #(+ 1 2)## (#'and #1 #2) = #(and 1 2) trailing #"
;=> "@ 3# (and 1 2) = 2 trailing #"

Similar to the #<# multi-line string.

See Multiline String Constant with Embedded Expressions.

Note: support for the #{<sexpr>} subform has been dropped, so that SRFI 105 can work as expected:

(import (srfi-105 extra))
#"1 + 3 = #{1 + 3}"
;=> "1 + 3 = 4"
#"An \"#{string-append(\"Hello, \" \"World\")}\" example"
;=> "An \"Hello, World\" example"

Usage

(import string-interpolation)

or using UTF8

(import utf8-string-interpolation)

Compiler Command-Line

csc -extend [utf8-]string-interpolation ...

Interpreter Command-Line

csi -require-extension [utf8-]string-interpolation ...

Activates string-interpolation #"..." syntax.

String Interpolation Syntax

Usage

(import string-interpolation-syntax)

set-sharp-string-interpolation-syntax

[procedure] (set-sharp-string-interpolation-syntax PROC)

Extends the read-syntax with #"..." where the "..." is evaluated using (PROC "...").

PROC
#f ; read-syntax is cleared.
PROC
#t ; PROC is identity.
PROC
procedure ; interpolation function.

String Interpolator

Usage

(import string-interpolator)

or using UTF8

(import utf8-string-interpolator)

string-interpolate

[procedure] (string-interpolate STR [eval-tag: EVAL-TAG]) -> list

Performs substitution of embedded Scheme expressions, prefixed with EVAL-TAG. Two consecutive EVAL-TAGs are translated to a single EVAL-TAG. A trailing EVAL-TAG is taken literally.

STR
string.
EVAL-TAG
character, default #\#.
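
The tag rules above (an expression follows a tag, a doubled tag is a literal tag, a trailing tag is literal) can be illustrated with a small parser. This sketch only splits a string into literal segments and unevaluated expressions; it is not the egg's implementation, does not reproduce the form that string-interpolate actually returns, and interpolate-parts-sketch is a hypothetical name.

```scheme
(import scheme (chicken base))

;; Split STR into a list of literal strings and embedded (unevaluated) datums.
(define (interpolate-parts-sketch str #!optional (tag #\#))
  (let ((in (open-input-string str)))
    (let loop ((lit '()) (parts '()))
      ;; push the pending literal characters, if any, onto PARTS
      (define (flush)
        (if (null? lit)
            parts
            (cons (list->string (reverse lit)) parts)))
      (let ((ch (read-char in)))
        (cond
          ((eof-object? ch) (reverse (flush)))
          ((not (char=? ch tag)) (loop (cons ch lit) parts))
          (else
           (let ((nxt (peek-char in)))
             (cond
               ;; trailing tag: taken literally
               ((eof-object? nxt) (loop (cons tag lit) parts))
               ;; doubled tag: one literal tag
               ((char=? nxt tag) (read-char in) (loop (cons tag lit) parts))
               ;; otherwise read one Scheme datum after the tag
               (else
                (let ((datum (read in)))
                  (loop '() (cons datum (flush)))))))))))))

;; (interpolate-parts-sketch "@ #(+ 1 2)## x #")
;; ;=> ("@ " (+ 1 2) "# x #")
```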

Rabin Karp String Search

Usage

(import rabin-karp)

make-string-search

[procedure] (make-string-search STRINGS [COMPARE [HASH]]) -> SEARCHER

STRINGS
(list-of string)
COMPARE
(string string --> boolean)
HASH
(string [BOUNDS []]) ; SRFI-69 hash procedure.
SEARCHER
(string [START [END]]) --> RESULT
RESULT
(or #f (STRING . (START . END))) ; success or failure result

collect-string-search

[procedure] (collect-string-search SEARCHER TARGET) -> (list-of RESULT)

Performs an exhaustive search of the TARGET, returning a list of RESULT.

SEARCHER
from make-string-search
TARGET
string ; search within
RESULT
(or #f (STRING . (START . END))) ; success or failure result
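
For background, the Rabin-Karp technique compares a rolling hash of each window of the target with the hash of the pattern, and confirms a hash match with a real string comparison. The sketch below handles a single pattern in plain Scheme. It is not the egg's implementation: the names, the base and the modulus are illustrative choices, and treating END as exclusive in the result is an assumption.

```scheme
(import scheme (chicken base))

(define base 256)          ; illustrative radix
(define modulus 1000003)   ; illustrative prime modulus

;; Polynomial hash of STR[start, end).
(define (hash-of str start end)
  (let loop ((i start) (h 0))
    (if (= i end)
        h
        (loop (+ i 1)
              (modulo (+ (* h base) (char->integer (string-ref str i)))
                      modulus)))))

;; All matches of PATTERN in TARGET as (PATTERN . (START . END)) entries.
(define (rabin-karp-all-sketch pattern target)
  (let ((m (string-length pattern))
        (n (string-length target)))
    (if (or (= m 0) (< n m))
        '()
        (let ((ph (hash-of pattern 0 m))
              ;; base^(m-1) mod modulus, the weight of the leading character
              (high (let loop ((k 1) (p 1))
                      (if (= k m) p (loop (+ k 1) (modulo (* p base) modulus))))))
          (let loop ((i 0) (h (hash-of target 0 m)) (hits '()))
            (let ((hits (if (and (= h ph)
                                 (string=? pattern (substring target i (+ i m))))
                            (cons (cons pattern (cons i (+ i m))) hits)
                            hits)))
              (if (= (+ i m) n)
                  (reverse hits)
                  ;; roll: drop target[i], append target[i+m]
                  (loop (+ i 1)
                        (modulo (+ (* (- h (* (char->integer (string-ref target i)) high))
                                      base)
                                   (char->integer (string-ref target (+ i m))))
                                modulus)
                        hits))))))))

;; (rabin-karp-all-sketch "ab" "xabyab")
;; ;=> (("ab" 1 . 3) ("ab" 4 . 6))
```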

Requirements

check-errors miscmacros srfi-1 srfi-13 srfi-69 utf8

test test-utils

Author

Kon Lovett

Version history

2.7.4
More fixnum, add default delimiter for string-split-chars/string-unzip.
2.7.3
Add tests, more fixnum, fix signatures.
2.7.2
Fix signatures, new test-runner.
2.7.1
Fix version.
2.7.0
Add rabin-karp module.
2.6.0
Remove #{...} support.
2.5.6
Reflow.
2.5.5
Update test-runner.
2.5.4
UTF8.
2.5.3
Add string-split-chars.
2.5.2
Fix potential buffer overflow in to-hex.
2.5.0
Add string-zip & string-unzip.
2.4.0
Add string-longest-common-prefix/suffix, string-longest-prefix/suffix, number->padded-string, list-as-string, string-trim-whitespace-both.
2.3.2
Deprecate unicode-char->string, fixes for memoized-string & string-utils modules, ascii-codepoint? & unicode-surrogate? are not predicates.
2.3.1
Minor optimization.
2.3.0
Deprecate #{...} support. Add string-interpolator modules.
2.2.0
Fix string-interpolation.
2.1.0
Add utf8-string-interpolation.
2.0.0
C5 release.

License

Copyright (C) 2010-2024 Kon Lovett. All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

 Redistributions of source code must retain the above copyright notice, this list of conditions and the following
   disclaimer.
 Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following
   disclaimer in the documentation and/or other materials provided with the distribution.
 Neither the name of the author nor the names of its contributors may be used to endorse or promote
   products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDERS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.