
strse

Strse (rhymes with terse) is a string DSL for Scheme.

(strse "this is freaking awesome"
       "is" "at"
       "at" "was so" 2
       'word string-upcase 3
       "frea" "ve"
       "king" "ry"
       (=> adjective "very") (conc adjective ", " adjective)
       'word "nice" -1)

⇒ "that was SO very, very nice"

The first argument is the source string, followed by any number of alternating search patterns and replacement expressions. A replacement expression can optionally be followed by an integer or boolean for extra magic.

Search patterns

Strse is nothing but a thin, glory-hogging, unnecessary veneer on top of Alex Shinn's wonderful irregex, so search patterns can be both Unix style regexes and sexp style SREs.

(strse "banana" "na$" "lity")

⇒ "banality"

(strse "banana" 'eol "rama")

⇒ "bananarama"

Replacement expressions

A replacement expression is a single expression (but one that has access to all of Scheme, including begin, let, etc.).

Replacement expressions have access to two anaphoric vars. You can get the whole input string (for the current step) with the name it, and you can get the matches and submatches by giving numeric arguments to m. So (m 0) is the whole match, and (m 1) the first submatch etc.

If a replacement expression evaluates to a string, that becomes the replacement text for the match.

If it evaluates to a procedure, it is applied to the matched substring (as a whole, not considering submatches). If that outputs a string, that becomes the replacement text for the match.
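The rules above can be sketched like this. It is only an illustration, not something taken from the strse sources: it assumes the egg loads with (import strse), that conc comes from (chicken string) and string-upcase from srfi-13 (as on CHICKEN 5), and the sample strings and the names swapped, measured and shouted are made up.

```scheme
(import strse (chicken string) srfi-13)

;; A string replacement built from submatches:
;; (m 1), (m 2) and (m 3) are the three parenthesised groups.
(define swapped
  (strse "2021-03-04"
         "([0-9]+)-([0-9]+)-([0-9]+)"
         (conc (m 3) "/" (m 2) "/" (m 1))))

;; `it` is the whole input string for the current step,
;; so "b" is replaced with the length of "abc".
(define measured
  (strse "abc" "b" (number->string (string-length it))))

;; A replacement that evaluates to a procedure is applied
;; to the matched substring.
(define shouted
  (strse "hello world" "world" string-upcase))
```

Going by the rules described above, swapped should come out as "04/03/2021", measured as "a3c" and shouted as "hello WORLD".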

If you just want to execute side-effects on a match without changing the string, wrap them in a then special form.

(strse "hippopotamus"
       "elephant" (then (print "I saw an elephant!"))
       "hippo" (then (print "I saw a hippo!"))
       "tiger" (then (print "I saw a cat!")))

I saw a hippo!
⇒ "hippopotamus"

Another example:

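;; Make an accumulator: call it with one argument to store a value,
;; or with no arguments to get the list of stored values (newest first).
;; push! is assumed to come from the miscmacros egg.
(define (acc)
  (let ((things '()))
    (lambda thing
      (if (null? thing)
          things
          (push! (car thing) things)))))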

(define (extract str)
  (define digs (acc))
  (define words (acc))
  (strse str
         (= 3 num) (then (digs (string->number (m 0))))
         (+ alpha) (then (words (m 0))))
  (list (digs) (words)))

(extract "it will get 234 and 123 and 747 but not 1983 or 42 but then again 420")

⇒ ((420 198 747 123 234) ("again" "then" "but" "or" "not" "but" "and" "and" "get" "will" "it"))

Extra magic, part one!

If you supply a literal SRE, all named submatches are bound to their names!

(strse "oh my word" (: word " " (=> second word)) second)

⇒ "my word"

(strse "oh my word" (: word " " (=> second word))
       (conc second second second))

⇒ "mymymy word"

What the heck is a "literal SRE"? Normally, irregex SREs are quoted symbols or lists.

(strse "all vampires are named" 'word "dracula")

⇒ "dracula dracula dracula dracula"

But if strse sees a pair that does not start with quote or quasiquote, it'll get access to the named submatches in there (and then add the quote for you). Atoms are not messed with, so you can reuse previously bound regexes.

Isn't this pretty awful? Strse hogs all non-atomic expressions, so you can't easily evaluate to regexes (although the `, trick is a workaround). And its clever name-binding trick only works with literal SREs, so you can't combine it with quasiquoting and pre-baked regexes.

Silver lining: this can be different for each pattern pair in your strse call.
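Here is a small sketch of the three kinds of pattern side by side. It assumes the egg loads with (import strse); the names vowel, all-o, computed and mixed are hypothetical, and the quasiquote-unquote form is only a reading of the workaround mentioned above (a pair starting with quasiquote is left alone, so the unquoted expression gets evaluated normally).

```scheme
(import strse)

;; An atom is left alone, so a regex bound earlier can be reused.
(define vowel "[aeiou]")
(define all-o (strse "banana" vowel "o"))

;; A pair starting with quasiquote is also left alone, so `,expr
;; lets an arbitrary expression compute the pattern ("na$" here).
(define computed
  (strse "banana" `,(string-append "n" "a" "$") "lity"))

;; Different styles can be mixed within one call.
(define mixed
  (strse "oh my word"
         (: word " " (=> second word)) second  ; literal SRE, binds second
         vowel "_"))                           ; atom, evaluated as usual
```

If the rules above hold, all-o should be "bonono", computed should be "banality", and mixed should be "my w_rd" (the first pair turns the string into "my word" before the second pair runs).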

Even more magic

Each pair might optionally be followed by a boolean #t or #f or a number.

If you leave it out, you get your garden variety replace all, one pass.

A #t means keep running the same replacement recursively. This can hang unless your search eventually terminates, but it can be really handy as long as you are careful.

(strse "aaaaaaaah!" "aa" "a")

⇒ "aaaah!"

(strse "aaaaaaaah!" "aa" "a" #t)

⇒ "ah!"

A #f means nothing special if there is a match, but if there isn't, strse stops and returns #f without evaluating any further pairs.

(strse "parrot"
       "a" (begin (print "Found a") "i") #f
       "e" (begin (print "Found e") "i") #f
       "o" (begin (print "Found o") "i") #f)

Found a
⇒ #f

A zero means to replace the entire string, not just the matched part, if there is a match.

(strse "chirp chirp birds"
       "chir" "shee"
       "sheep" "The sentence got woolly" 0)

⇒ "The sentence got woolly"

A positive number means to just replace one match even if there are more. It's one-indexed so 1 is the first match. Negative numbers are the same thing except counting from the right, so -1 is the last match.
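A minimal sketch of the numeric forms, assuming the egg loads with (import strse); the input string and the names first-only, second-only and last-only are made up for illustration.

```scheme
(import strse)

;; 1 replaces only the first match.
(define first-only (strse "a-a-a" "a" "b" 1))

;; 2 replaces only the second match.
(define second-only (strse "a-a-a" "a" "b" 2))

;; -1 counts from the right, so only the last match is replaced.
(define last-only (strse "a-a-a" "a" "b" -1))
```

Following the description above, these should give "b-a-a", "a-b-a" and "a-a-b" respectively.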

Author

Idiomdrottning

Repository

git clone https://idiomdrottning.org/strse

Licence

© 2021 Idiomdrottning.
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer:
This software is provided by Idiomdrottning "as is" and any express or
implied warranties, including, but not limited to, the implied
warranties of merchantability and fitness for a particular purpose are
disclaimed. In no event shall Idiomdrottning be liable for any direct,
indirect, incidental, special, exemplary, or consequential damages
(including, but not limited to, procurement of substitute goods or
services; loss of use, data, or profits; or business interruption)
however caused and on any theory of liability, whether in contract,
strict liability, or tort (including negligence or otherwise) arising
in any way out of the use of this software, even if advised of the
possibility of such damage.