strse (historical revision 40963) - The CHICKEN Scheme wiki

strse

Strse (rhymes with terse) is a string DSL for Scheme.

(strse "this is freaking awesome"
       "is" "at"
       "at" (only second "was so")
       'word (only third string-upcase)
       "frea" "ve"
       "king" "ry"
       (=> adjective "very") (conc adjective ", " adjective)
       'word (only last "nice"))

⇒ "that was SO very, very nice"

The first argument is the source string, followed by any number of alternating search patterns and replacement expressions.

Replacement expressions

Each replacement expression is a single expression, but it has access to all of Scheme, including begin, let, etc.

Replacement expressions have access to two anaphoric vars. You can get the whole input string (for the current step) with the name it, and you can get the matches and submatches by giving numeric arguments to m. So (m 0) is the whole match, and (m 1) the first submatch etc.

If a replacement expression evaluates to a string, that becomes the replacement text for the match.

If it evaluates to a procedure, it is applied to the matched substring (as a whole, not considering submatches). If that outputs a string, that becomes the replacement text for the match.
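Here is a small sketch of both kinds of replacement. It assumes the strse egg is installed and takes conc from (chicken string). The first call uses a procedure that is applied to each whole match. The second call is an expression that reads numbered submatches through m. The names bracketed and swapped are only illustrative.

```scheme
(import strse (chicken string))

;; A procedure as the replacement: it is applied to each whole match.
(define bracketed
  (strse "strse is terse" "terse" (lambda (s) (conc "[" s "]"))))

;; An expression using m: (m 1) is the first submatch ("key"),
;; (m 2) the second ("value").
(define swapped
  (strse "key=value"
         (: (submatch (+ alpha)) "=" (submatch (+ alpha)))
         (conc (m 2) "=" (m 1))))
```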

If you supply a literal SRE, all named submatches are bound to their names!

(strse "oh my word" (: word " " (=> second word)) second)

⇒ "my word"

(strse "oh my word" (: word " " (=> second word))
       (conc second second second))

⇒ "mymymy word"

What the heck is a "literal SRE"? Normally, irregex SREs are quoted symbols or lists.

(strse "all vampires are named" 'word "dracula")

⇒ "dracula dracula dracula dracula"

But if strse sees a pair that does not start with quote or quasiquote, it'll get access to the named submatches in there (and then add the quote for you). Atoms are not messed with, so you can supply previously bound regexes:

(let ((lucy '(: word space word)))
  (strse "all vampires are named" lucy "dracula"))

⇒ "dracula dracula"

In that case, strse can't see inside the regex, so it can't bind its named submatches to names.
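When the regex is bound ahead of time, you can still reach its submatches by number through m. This is a sketch that assumes irregex numbers named submatches in order, just like unnamed ones. The variable name pair is hypothetical.

```scheme
(import strse (chicken string))

;; pair is bound ahead of time, so strse can't turn a and b into variables,
;; but (m 1) and (m 2) still reach those submatches by position.
(define pair '(: (=> a word) space (=> b word)))

(define flipped
  (strse "all vampires are named" pair (conc (m 2) " " (m 1))))
```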

Replacement operators

Each replacement expression can optionally be wrapped in a single operator.

If you don't, you get a garden-variety replace-all in a single pass.

All the operators also provide an implicit begin.

then

If you just want to execute side-effects on a match without changing the string, wrap them in a then special form.

(strse "hippopotamus"
       "elephant" (then (print "I saw an elephant!"))
       "hippo" (then (print "I saw a hippo!"))
       "tiger" (then (print "I saw a cat!")))

I saw a hippo!
⇒ "hippopotamus"

Another example:

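;; (acc) makes an accumulator: call it with one argument to push that
;; value onto its list, or with no arguments to get the list back.
(define (acc)
  (let ((things '()))
    (lambda thing
      (if (null? thing)
          things
          (set! things (cons (car thing) things))))))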

(define (extract str)
  (define digs (acc))
  (define words (acc))
  (strse str
     (= 3 num) (then (digs (string->number (m 0))))
     (+ alpha) (then (words (m 0))))
  (list (digs) (words)))

(extract "it will get 234 and 123 and 747 but not 1983 or 42 but then again 420")

⇒ ((420 198 747 123 234) ("again" "then" "but" "or" "not" "but" "and" "and" "get" "will" "it"))

recursively

Keep running the same replacement recursively until the pattern no longer matches. This hangs if the replacement keeps producing new matches, but it can be really handy as long as you are careful.

(strse "aaaaaaaah!" "aa" "a")

⇒ "aaaah!"

(strse "aaaaaaaah!" "aa" (recursively "a"))

⇒ "ah!"
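To make the termination caveat concrete, here is a rough model of the recursive behavior. It is written against CHICKEN's built-in irregex instead of strse. It assumes "recursively" means "repeat the replace-all pass while the pattern still matches". The helper replace-until-gone is hypothetical and is not part of strse.

```scheme
(import (chicken irregex))

;; Hypothetical helper, not part of strse: repeat a replace-all pass
;; until the pattern no longer matches. Like recursively, it never
;; returns if each pass keeps creating new matches.
(define (replace-until-gone str pattern replacement)
  (if (irregex-search pattern str)
      (replace-until-gone (irregex-replace/all pattern str replacement)
                          pattern
                          replacement)
      str))

;; "aaaaaaaah!" -> "aaaah!" -> "aah!" -> "ah!", then "aa" no longer matches.
(define screams (replace-until-gone "aaaaaaaah!" "aa" "a"))
```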

truly

Keep going as normal if there is a match, but if there isn't, stop strse and return #f without evaluating any further.

(strse "parrot"
       "a" (truly (print "Found a") "i")
       "e" (truly (print "Found e") "i")
       "o" (truly (print "Found o") "i"))

Found a
⇒ #f
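For contrast, when every truly pattern does match, each one acts like a plain replacement and you get the final string back. This is a sketch based on the description above:

```scheme
(import strse)

;; Both patterns match, so neither truly aborts:
;; "parrot" -> "pirrot" -> "pirret".
(define bird (strse "parrot"
                    "a" (truly "i")
                    "o" (truly "e")))
```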

entire

Replace the entire string, not just the matched part, if there is a match.

(strse "chirp chirp birds"
       "chir" "shee"
       "sheep" (entire "The sentence got woolly"))

⇒ "The sentence got woolly"

only

Replace just one match, even if there are more. You need to supply either a list index (zero-based; negative numbers count from the back, so -1 is the last match and -2 the second-to-last) or a list accessor function such as cadr or last. Numeric indices are bounds-checked; accessor functions aren't.
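The three ways of picking a match can be sketched like this, following the rules above. The string s is only an example input.

```scheme
(import strse)

(define s "one two three four")

;; Zero-based index: 1 is the second match.
(define by-index (strse s 'word (only 1 "2")))
;; Negative index counts from the back: -1 is the last match.
(define from-back (strse s 'word (only -1 "4")))
;; A list accessor picks from the list of matches: cadr is the second one.
(define by-accessor (strse s 'word (only cadr "2")))
```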

strse?

(strse? str reg)

Just returns #t if reg is in str and #f otherwise.

(strse? reg)

Returns a predicate that takes a str argument and checks whether reg is in it.
In other words, it's curried on its second argument: kind of a backwards currying, but often convenient.
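Both forms are sketched below. The one-argument form returns a procedure, so you can pass it straight to map. The name has-digit? is only illustrative.

```scheme
(import strse)

;; Two-argument form: search the string directly.
(define direct (strse? "hippopotamus" "hippo"))

;; One-argument form: get a reusable predicate, handy with map and friends.
(define has-digit? (strse? '(+ num)))
(define hits (map has-digit? '("abc" "a1c" "42")))
```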

Porting from the old version of strse

The old version of strse let you jam extra magic booleans and numbers in there, and that still works, so old code should keep running. Yay cruft in the name of backwards compatibility!

The point of the new version is consistency: the arguments strictly alternate between patterns and replacements.

Here is a Rosetta stone from old to new (old form first, new form second):

(strse s foo bar)
(strse s foo bar)

(strse s foo (then bar))
(strse s foo (then bar))

(strse s foo bar #f)
(strse s foo (truly bar))

(strse s foo bar 0)
(strse s foo (entire bar))

(strse s foo bar 3)
(strse s foo (only third bar))
;; or:
(strse s foo (only 2 bar))

Either kind of argument works: a list accessor function like first, third or last, or a zero-based index (positive, or negative to count from the back). Note that the old index was one-based.

All the new operators have implicit begin:

(strse s foo (begin bar baz) #f)
(strse s foo (truly bar baz))

For a repo:

git clone https://idiomdrottning.org/strse