raw

  1. raw
  2. Description
  3. API
  4. Examples
    1. Simple usage
    2. Dealing with the escape character
    3. Multiline strings and smart indentation
      1. Embedding C code
      2. Rules
      3. Disabling smart indentation and ignoring newlines
    4. Substituting expressions
      1. Multiline expressions
    5. Changing the syntax
  5. Caveats
  6. Author
  7. Repository
  8. Requirements
  9. Version history
  10. License

Description

The raw egg is a reader extension for extended string quasi-literals (ESQLs), inspired by SRFI-109. Perfect if the CHICKEN syntax for heredoc strings is not sufficient for your needs.

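Henceforth we will refer to these extended string quasi-literals as raw strings, even though they are not quite raw. Features include a configurable escape character, smart indentation for multiline strings, and substitution of Scheme expressions.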

Here is a minimal example.

(import raw)

(define head
  #&{
  <head>
    <title>a webpage</title>
  </head>&
  &})

(define content #&{<p>This is a website.</p>&})

(define main
  #&{
  <main>
    &[content]
  </main>&
  &})

(define body
  #&{
  <body>
    &![main]
  </body>&
  &})

(define webpage
  #&{
  <!doctype html>
  <html lang="en">
    &![head]
    &![body]
  </html>
  &})

webpage

The output should be the following (note how everything indents nicely).

<!doctype html>
<html lang="en">
  <head>
    <title>a webpage</title>
  </head>
  <body>
    <main>
      <p>This is a website.</p>
    </main>
  </body>
</html>

See the Examples section for a tutorial covering all features.

API

[procedure] (raw-read-syntax PORT)

Parses a raw string from the port PORT. Suitable for use with set-read-syntax! and set-sharp-read-syntax!.

[procedure] (_raw-post-process TOKEN ...)
[parameter] _raw-post-processor-name

Do not redefine, prefix, shadow, or leave out of your imports the procedure _raw-post-process. If you follow this rule, everything will work fine. See the Caveats section for more information on this API.

Examples

This is a tutorial. To run the examples, remember to import the egg first.

(import raw)

Simple usage

By default, raw strings begin with the hash syntax #&. Then follows an opening curly brace and some text. To end the raw string, use the escape character, by default &, followed by a closing curly brace.

; "hello, world"
#&{hello, world&}

Any character is allowed inside the raw string and will be interpreted literally (but see how smart indentation works), except the escape character, which can do funny things (like end the raw string).

; "!@#$%^*()\\\""
#&{!@#$%^*()\"&}

Next we will see how to insert a literal escape character, and why this is almost never necessary.

Dealing with the escape character

You can change the default escape character by including your preference just after the hash syntax.

; "now the escape is the at sign"
#&@{now the escape is the at sign@}

Any escape character is allowed, except the opening brace character. You almost certainly don't want an opening brace to be the escape character anyway. Common choices are backslash, backtick, at sign, etc.

By choosing a suitable escape character you almost never need to insert a literal escape character in your string. But if you want to do so, you can use the escape character followed by &.

; "me@example.com"
#&@{me@&example.com@}

Multiline strings and smart indentation

Embedding C code

Suppose that you want to have some C code as a string in your program. Further suppose you'd like it to be nicely indented, and you don't want to include extraneous whitespace in the string.

With CHICKEN syntax, you would try this.

(define my-strlen
  (foreign-lambda* int ((c-string str)) #<<CCODE
int n = 0;

while(*(str++))
	++n;

C_return(n);
CCODE
))

Even in this small example, you can see how ugly our code looks. With raw strings, you can do this instead.

(define my-strlen
  (foreign-lambda* int ((c-string str)) #&{
    int n = 0;

    while(*(str++))
    	++n;

    C_return(n);&}))

This is completely equivalent to the previous code in terms of whitespace; even the ++n; line will be correctly tab-indented.

Of course, for C the advantage is moot since C ignores most whitespace. But if you are using Python, or are carefully formatting something, smart indentation makes your life easier.

How does it work?

Rules

  1. A newline right after the opening curly brace is not included in the string, but it silently activates smart indentation.
  2. Indentation characters (spaces or tabs) are read and recorded until the first non-indentation character is encountered. The recorded sequence is called the "indentation sequence".
  3. From that point onwards, every newline character must be followed by the indentation sequence or another newline character (otherwise an error will be signaled). The indentation sequence after a newline is ignored.
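
The three rules can be modelled on an ordinary string. The following is only a sketch in plain Scheme, not the egg's reader code: dedent, leading-indent and drop-prefix are hypothetical names, and the input stands for the text between the opening brace and the end of the raw string, with escape sequences ignored.

```scheme
;; Sketch only: models the smart indentation rules on a plain string.
;; None of these procedures are part of the raw egg.

(define (indentation-char? c)
  (or (char=? c #\space) (char=? c #\tab)))

;; Rule 2: collect the leading run of spaces and tabs.
(define (leading-indent chars)
  (let loop ((cs chars) (acc '()))
    (if (and (pair? cs) (indentation-char? (car cs)))
        (loop (cdr cs) (cons (car cs) acc))
        (reverse acc))))

;; Return CHARS without PREFIX, or #f if CHARS does not start with PREFIX.
(define (drop-prefix prefix chars)
  (cond ((null? prefix) chars)
        ((and (pair? chars) (char=? (car prefix) (car chars)))
         (drop-prefix (cdr prefix) (cdr chars)))
        (else #f)))

(define (dedent text)
  (let ((chars (string->list text)))
    (if (or (null? chars) (not (char=? (car chars) #\newline)))
        text ; rule 1: no leading newline, so smart indentation stays off
        (let* ((body (cdr chars)) ; rule 1: the leading newline is dropped
               (indent (leading-indent body)))
          (let loop ((cs (drop-prefix indent body)) (acc '()))
            (cond ((null? cs) (list->string (reverse acc)))
                  ((char=? (car cs) #\newline)
                   (let ((next (cdr cs)))
                     (cond
                      ;; rule 3: a newline may be followed by another newline
                      ((and (pair? next) (char=? (car next) #\newline))
                       (loop next (cons #\newline acc)))
                      ;; rule 3: otherwise the indentation sequence must
                      ;; follow, and it is dropped from the result
                      ((drop-prefix indent next)
                       => (lambda (rest) (loop rest (cons #\newline acc))))
                      (else (error "line does not start with the indentation sequence")))))
                  (else (loop (cdr cs) (cons (car cs) acc)))))))))
```

Applied to the body of the my-strlen example, (dedent "\n    int n = 0;\n\n    while(*(str++))\n    \t++n;\n\n    C_return(n);") removes the four recorded spaces from every line and keeps the tab before ++n;.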

Disabling smart indentation and ignoring newlines

Suppose you want a raw string that begins with a newline character followed by some indentation characters. Smart indentation will give you some trouble.

An escape character followed by a newline character is a no-op. This can be used to turn smart indentation off as follows.

; "\n    hello"
#&{&

    hello&}

We can use this no-op sequence to break long lines too.

; "this is a very looooooooooooooooooooooooooooong line"
#&{
this is a very &
looooooooooooooooooooooooooooong &
line&
&}

Substituting expressions

You can substitute any Scheme expression that evaluates to a string inside raw strings. To do this, we enclose the expression in square brackets, with the opening bracket prefixed by the escape character.

(define greeting "hello")

; "hello, world"
#&{&[greeting], &["world"]&}

You can include S-expressions like this too, but for them a more convenient syntax is available: just prefix them with the escape character.

; "say aaaaaaaaaa"
#&{say &(make-string 10 #\a)&}

This only works if you use parentheses as delimiters for your S-expression.
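
To make the results concrete, the two raw strings above evaluate to the same values as the following plain Scheme, which does not need the reader extension. This only illustrates the resulting strings and says nothing about how the egg builds them internally; greeting-line and shout are hypothetical names.

```scheme
(define greeting "hello")

;; Same value as #&{&[greeting], &["world"]&}
(define greeting-line
  (string-append greeting ", " "world"))

;; Same value as #&{say &(make-string 10 #\a)&}
(define shout
  (string-append "say " (make-string 10 #\a)))
```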

Multiline expressions

If the string your expression evaluates to spans multiple lines, it is still substituted literally.

(define nums "one\ntwo\nthree\n")
#&{numbers: &[nums]&}

This will result in the following.

numbers: one
two
three

But that is not always what you want! Suppose you are writing a program that writes a program (as Schemers often do):

(define danger-comment
  #&{
  DANGER! Under any circumstances do *not* run:

  (delete-directory "/")
  &})

(define delete-directory-definition
  #&{
  ; This is the definition of delete-directory:

  (define delete-directory
    &(delete-directory-body))

  ; &[danger-comment]
  &})

Then delete-directory-definition will get expanded to the following (delete-directory-body omitted).

; This is the definition of delete-directory:

(define delete-directory
  ...)

; DANGER! Under any circumstances do *not* run:

(delete-directory "/")

Oops! How can we fix this?

Specifically, it would sometimes be nice to have the characters between the previous newline and the substitution inserted again after every newline character in the substituted string (a mouthful, I know). Long story short, we can do this by inserting an exclamation mark just after the escape character of the substitution.

(define delete-directory-definition
  #&{
  ; This is the definition of delete-directory:

  (define delete-directory
    &(delete-directory-body))

  ; &![danger-comment]
  &})

This will do the right thing most of the time.

; This is the definition of delete-directory:

(define delete-directory
  ...)

; DANGER! Under any circumstances do *not* run:
; 
; (delete-directory "/")

It is still not perfect (there is an extra space in the second line of the dangerous comment) but it gets the job done. It is especially good for alignment; revisit the example in the Description.
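
The effect of the exclamation mark can be sketched in plain Scheme. This is a hypothetical helper, not the egg's implementation: propagate-prefix repeats a prefix after every newline of the substituted string. It also shows where the extra space comes from, since the prefix "; " is inserted even when the following line is empty.

```scheme
;; Sketch only: propagate-prefix is a hypothetical name, not part of the raw egg.
;; It inserts PREFIX after every newline character in VALUE.
(define (propagate-prefix prefix value)
  (let ((reversed-prefix (reverse (string->list prefix))))
    (let loop ((cs (string->list value)) (acc '()))
      (cond ((null? cs) (list->string (reverse acc)))
            ((char=? (car cs) #\newline)
             ;; ACC is built in reverse, so push the newline first and
             ;; then the reversed prefix on top of it.
             (loop (cdr cs) (append reversed-prefix (cons #\newline acc))))
            (else (loop (cdr cs) (cons (car cs) acc)))))))

;; The text before the substitution ("; ") is already part of the
;; surrounding raw string, so it is prepended once by hand here.
(define commented
  (string-append "; "
                 (propagate-prefix "; " "DANGER!\n\n(delete-directory \"/\")")))
```

Here commented holds three lines, each starting with "; ", including the otherwise empty middle line.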

Changing the syntax

If you want to stop the reader from processing raw strings, you can use the set-sharp-read-syntax! procedure.

(import (chicken read-syntax))

(set-sharp-read-syntax! #\& #f)

; syntax error
#&{hello, world&}

If you want to define another starting sequence for raw strings, you can do that with the raw-read-syntax procedure. For example, though much of this syntax is taken from SRFI-109, SRFI-109 does not require a leading # to start an extended string quasi-literal: you can start them with &. We decided against that because it makes parsing more complicated, since identifiers in Scheme can start with &. However, if you are not using identifiers starting with &, you might prefer the shorter syntax:

(import (chicken read-syntax))

(set-read-syntax! #\& raw-read-syntax)

; "hello, world"
&{hello, world&}

Caveats

To evaluate substituted expressions in your environment, this egg modifies the reader to see a raw string as the procedure _raw-post-process called with suitable tokens. This has to be done: there is no way to evaluate those expressions inside the egg so something must be done in user space. But if this procedure is renamed, not imported, shadowed, prefixed, etc., then the extension will stop working. The built-in reader extensions get around this by calling (undocumented?) low level APIs.

If you still want to rename _raw-post-process, you have to specify the new name by changing the parameter _raw-post-processor-name:

(import (prefix raw r:))

; error: cannot find _raw-post-process
; #&{hello, world&}

(r:_raw-post-processor-name 'r:_raw-post-process)

; "hello, world"
#&{hello, world&}

Using this method you can even register your own post-processor, though this is undocumented and subject to change; refer to the source code for details.

Author

Hernán Ibarra Mejia

Repository

Development happens in SourceHut.

Requirements

None (but testing requires the test egg).

Version history

1.0
Initial release.

License

GPLv3.