SXML-transforms (historical revision 28383)

SXML-transforms

This is the sxml-transforms extension library for Chicken Scheme.

Description

The SXML transformations (to XML, SXML, and HTML) from the SSAX project.

Documentation

This egg provides the SXML transforms available in the SSAX/SXML Sourceforge project. It incorporates one main module, and an auxiliary one:

sxml-transforms

SRV:send-reply

[procedure] (SRV:send-reply . fragments)

Output the FRAGMENTS to the current output port.

The fragments are a list of strings, characters, numbers, thunks, #f, #t -- and other fragments. The function traverses the tree depth-first, writes out strings and characters, executes thunks, and ignores #f and '(). The function returns #t if anything was written at all; otherwise the result is #f. If #t occurs among the fragments, it is not written out but causes the result of SRV:send-reply to be #t.
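The traversal rules above can be checked with a small sketch. It assumes the egg is installed and loaded with `(import sxml-transforms)` (CHICKEN 5 syntax; on CHICKEN 4 the equivalent would be `(use sxml-transforms)`). The helper `capture-reply` is hypothetical and not part of the egg; it only redirects the current output port so that the written text and the return value can be inspected together.

```scheme
(import sxml-transforms) ; CHICKEN 5; on CHICKEN 4: (use sxml-transforms)

;; Hypothetical helper (not part of the egg): call SRV:send-reply with the
;; current output port redirected to a string port, and return a list of
;; (return-value written-text).
(define (capture-reply . fragments)
  (let* ((out (open-output-string))
         (result (parameterize ((current-output-port out))
                   (apply SRV:send-reply fragments))))
    (list result (get-output-string out))))

;; Strings and numbers are written, the thunk is executed,
;; and #f and '() are skipped, even inside a nested list.
(define written
  (capture-reply "<p>" (list "x = " 42 #f '()) (lambda () (display "!")) "</p>"))
;; written => (#t "<p>x = 42!</p>")

;; Nothing is written here, so the result is #f.
(define nothing (capture-reply #f '() (list #f)))
;; nothing => (#f "")
```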

pre-post-order

[procedure] (pre-post-order tree bindings)

(See also pre-post-order*, which is preferred on Chicken.)

Traversal of an SXML tree or a grove: a <Node> or a <Nodelist>

A <Node> and a <Nodelist> are mutually-recursive datatypes that underlie the SXML tree:

    <Node> ::= (name . <Nodelist>) | "text string"

An (ordered) set of nodes is just a list of the constituent nodes:

    <Nodelist> ::= (<Node> ...)

Nodelists and Nodes other than text strings are both lists. A <Nodelist>, however, is either an empty list or a list whose head is not a symbol (an atom in general). A symbol at the head of a node is either an XML name (in which case it is the tag of an XML element) or an administrative name such as '@'. See SXPath.scm and SSAX.scm for more information on SXML.

Pre-Post-order traversal of a tree and creation of a new tree:

pre-post-order:: <tree> x <bindings> -> <new-tree>

where
   <bindings> ::= (<binding> ...)
   <binding> ::= (<trigger-symbol> *preorder* . <handler>) |
                 (<trigger-symbol> *macro* . <handler>) |
                 (<trigger-symbol> <new-bindings> . <handler>) |
                 (<trigger-symbol> . <handler>)
   <trigger-symbol> ::= XMLname | *text* | *default*
   <handler> :: <trigger-symbol> x [<tree>] -> <new-tree>

The pre-post-order function visits the nodes and nodelists pre-post-order (depth-first). For each <Node> of the form (name <Node> ...) it looks up an association with the given 'name' among its <bindings>. If that lookup fails, pre-post-order tries to locate a *default* binding. It is an error if the latter attempt fails as well. Having found a binding, the pre-post-order function first checks to see if the binding is of the form

(<trigger-symbol> *preorder* . <handler>)

If it is, the handler is 'applied' to the current node. Otherwise, the pre-post-order function first calls itself recursively for each child of the current node, with <new-bindings> prepended to the <bindings> in effect. The result of these calls is passed to the <handler> (along with the head of the current <Node>). To be more precise, the handler is _applied_ to the head of the current node and its processed children. The result of the handler, which should also be a <tree>, replaces the current <Node>. If the current <Node> is a text string or other atom, a special binding with a symbol *text* is looked up.

A binding can also be of the form

(<trigger-symbol> *macro* . <handler>)

This is equivalent to *preorder* described above, except that the handler's result is itself processed again with the current stylesheet.
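The three kinds of binding can be compared side by side in a minimal sketch. The element names (`article`, `title`, `verbatim`, `note`, `b`) and the variable names `doc` and `rules` are made up for illustration. It assumes the egg is loaded with `(import sxml-transforms)` (on CHICKEN 4: `(use sxml-transforms)`).

```scheme
(import sxml-transforms) ; CHICKEN 5; on CHICKEN 4: (use sxml-transforms)

;; Illustrative input tree; the element names are invented for this sketch.
(define doc
  '(article (title "Hi") (verbatim (b "raw")) (note "n") (b "cooked")))

(define rules
  `(;; Plain binding: the children are processed first, then the handler
    ;; receives the tag and one argument per processed child.
    (title . ,(lambda (tag . kids) `(h1 ,@kids)))
    (b . ,(lambda (tag . kids) `(strong ,@kids)))
    ;; *preorder*: the handler is applied to the node as-is, so the
    ;; nested (b "raw") is NOT rewritten by the b rule.
    (verbatim *preorder* . ,(lambda (tag . kids) `(pre ,@kids)))
    ;; *macro*: the handler's result is processed again with the same
    ;; rules, so the generated title element becomes an h1.
    (note *macro* . ,(lambda (tag . kids) `(title "Note: " ,@kids)))
    ;; Identity rules for text and for everything else.
    (*text* . ,(lambda (trigger str) str))
    (*default* . ,(lambda (tag . kids) (cons tag kids)))))

(define result (pre-post-order doc rules))
;; result =>
;; (article (h1 "Hi") (pre (b "raw")) (h1 "Note: " "n") (strong "cooked"))
```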

pre-post-order-splice

[procedure] (pre-post-order-splice tree bindings)

pre-post-order-splice is a variant of pre-post-order which always outputs strictly conformant SXML. It unnests lists that do not have a tag as their car until they do.

post-order

[procedure] (post-order tree bindings)

Deprecated. This was a version of pre-post-order that did not accept *macro* or *preorder* directives.

foldts

[procedure] (foldts fdown fup fhere seed tree)

Tree fold operator.

   tree = atom | (node-name tree ...)

   foldts fdown fup fhere seed (Leaf str) = fhere seed str
   foldts fdown fup fhere seed (Nd kids) =
         fup seed $ foldl (foldts fdown fup fhere) (fdown seed) kids

   procedure fhere: seed -> atom -> seed
   procedure fdown: seed -> node -> seed
   procedure fup: parent-seed -> last-kid-seed -> node -> seed

foldts returns the final seed.
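As a minimal sketch of how the three procedures cooperate, the following counts the text strings in a tree. The name `count-strings` is made up for this example, and the egg is assumed to be loaded with `(import sxml-transforms)` (on CHICKEN 4: `(use sxml-transforms)`).

```scheme
(import sxml-transforms) ; CHICKEN 5; on CHICKEN 4: (use sxml-transforms)

;; Hypothetical example procedure: count the string leaves of a tree.
;; The seed is the running count.
(define (count-strings tree)
  (foldts
    ;; fdown: entering a node, hand the current count to its children.
    (lambda (seed node) seed)
    ;; fup: leaving a node, keep the count produced by its last child.
    (lambda (parent-seed last-kid-seed node) last-kid-seed)
    ;; fhere: at an atom, add one if it is a string.
    (lambda (seed atom) (if (string? atom) (+ seed 1) seed))
    0
    tree))

(define n (count-strings '(p "a" (b "c") "d")))
;; n => 3
```

Because fup receives both the parent's seed and the last child's seed, it can also discard what happened inside a subtree by returning parent-seed instead.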

replace-range

[procedure] (replace-range beg-pred end-pred forest)

Traverse a forest depth-first and cut/replace ranges of nodes.

The nodes that define a range don't have to have the same immediate parent, don't have to be on the same level, and the end node of a range doesn't even have to exist. The replace-range procedure removes nodes from the beginning node of the range up to (but not including) the end node of the range. In addition, the beginning node of the range can be replaced by a node or a list of nodes. The range of nodes is cut while traversing the forest depth-first. If all branches of a node are cut, the node is cut as well. The procedure can cut several non-overlapping ranges from a forest.

   replace-range:: BEG-PRED x END-PRED x FOREST -> FOREST

where

   type FOREST = (NODE ...)
   type NODE = Atom | (Name . FOREST) | FOREST

The range of nodes is specified by two predicates, beg-pred and end-pred.

   beg-pred:: NODE -> #f | FOREST
   end-pred:: NODE -> #f | FOREST

The beg-pred predicate decides on the beginning of the range. The node for which the predicate yields non-#f marks the beginning of the range. The non-#f value of the predicate replaces the node. The value can be a list of nodes. The replace-range procedure then traverses the tree and skips all the nodes until end-pred yields non-#f. The value of end-pred replaces the end-range node. The new end node and its siblings will be re-scanned. The predicates are evaluated pre-order. We do not descend into a node that is marked as the beginning of the range.
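A small sketch of the behaviour described above, based on that description rather than on a documented example. The predicate names `begins?` and `ends?` are made up; both return a list of replacement nodes, as the FOREST result type requires. The egg is assumed to be loaded with `(import sxml-transforms)` (on CHICKEN 4: `(use sxml-transforms)`).

```scheme
(import sxml-transforms) ; CHICKEN 5; on CHICKEN 4: (use sxml-transforms)

;; Hypothetical beg-pred: the range starts at the string "b",
;; which is replaced by the single node "B".
(define (begins? node) (and (equal? node "b") (list "B")))

;; Hypothetical end-pred: the range ends at the string "d".
;; Returning (list node) keeps the end node itself.
(define (ends? node) (and (equal? node "d") (list node)))

;; Flat forest: "c" lies inside the range and is cut.
(define flat (replace-range begins? ends? '("a" "b" "c" "d" "e")))
;; flat => ("a" "B" "d" "e")

;; The range may start in one node and end in another.
(define nested (replace-range begins? ends? '((p "a" "b") (p "c" "d"))))
;; nested => ((p "a" "B") (p "d"))
```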

SXML->HTML

[procedure] (SXML->HTML tree)

This procedure is the most generic transformation of SXML into the corresponding HTML document. The SXML tree is traversed post-order (depth-first) and transformed into another tree, which, written in a depth-first fashion, results in an HTML document.

It's basically like pre-post-order with the universal-conversion-rules hardcoded. It also knows about a rule html:begin, which translates the HTML code to old-school uppercase HTML 3 code preceded by a Content-Type header.
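A minimal usage sketch, assuming (as in the upstream SSAX code) that SXML->HTML writes its result to the current output port. The helper `sxml->html-string` is hypothetical and only captures that output in a string. Exact whitespace around tags is not shown here because it depends on the entag variant in use. The egg is assumed to be loaded with `(import sxml-transforms)` (on CHICKEN 4: `(use sxml-transforms)`).

```scheme
(import sxml-transforms) ; CHICKEN 5; on CHICKEN 4: (use sxml-transforms)

;; Hypothetical helper (not part of the egg): collect what SXML->HTML
;; writes to the current output port into a string.
(define (sxml->html-string tree)
  (let ((out (open-output-string)))
    (parameterize ((current-output-port out))
      (SXML->HTML tree))
    (get-output-string out)))

;; The text node contains '<' and '&', which are escaped on output.
(define html (sxml->html-string '(p "1 < 2 & 3 " (strong "done"))))

(display html)
(newline)
```

The text content should appear as `1 &lt; 2 &amp; 3` inside the generated p element.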

entag

[procedure] (entag tag elems)

Create the HTML markup fragments for tags. TAG is the name of the tag (a symbol) and ELEMS is the tree of elements that form the contents of this tag (not recursively processed). This is used in the node handlers for the (pre-)post-order function, to prepare it for output by SRV:send-reply. This is an alias for entag-xhtml (see below, in the section about Chicken-specific modifications).

enattr

[procedure] (enattr attr-key value)

Create the HTML markup fragments for attributes. The ATTR-KEY is the name of the attribute (a symbol) and VALUE is the value it should have. This is used in the node handlers for the (pre-)post-order function, to prepare it for output by SRV:send-reply.

string->goodHTML

[procedure] (string->goodHTML html)

Given a string, check to make sure it does not contain characters such as '<' or '&' that require encoding. Return either the original string, or a list of string fragments with special characters replaced by appropriate character entities.

universal-conversion-rules

[constant] universal-conversion-rules

Bindings for the pre-post-order function, which traverses the SXML tree and converts it to a tree of fragments. It contains rules to call string->goodHTML, enattr and entag on all text, attributes and tags. In normal situations you always append these rules to your own rules, or add a final pre-post-order processing step with just these bindings.

Note that non-string text nodes (such as symbols, numbers and chars) will not be escaped by default. To escape them, you can override the text node handler. For example,

`((*text* . ,(lambda (tag body)
               (string->goodHTML (->string body))))
  . ,universal-conversion-rules)

On Chicken, universal-conversion-rules has been augmented a bit. The following rule has been added:

(& ENTITY-NAME ...)

which quotes character references given by strings ENTITY-NAME .... For example,

(& "ndash" "quot")
 => "&ndash;&quot;"

universal-protected-rules

[constant] universal-protected-rules

A variation of universal-conversion-rules which keeps '<', '>', '&' and similar characters intact (i.e., it skips calling string->goodHTML). The universal-protected-rules are useful when the tree of fragments has to be traversed one more time.

alist-conv-rules

[constant] alist-conv-rules

These rules define the identity transformation. You will usually need to append these rules to all of the bindings you use with pre-post-order, unless you explicitly define your own conversion rules for *default* and *text*.

make-char-quotator

[procedure] (make-char-quotator quot-rules)

Given QUOT-RULES, an assoc list of (char . string) pairs, return a quotation procedure. The returned quotation procedure takes a string and returns either a string or a list of strings. The quotation procedure checks whether its argument string contains any instance of a character that needs to be encoded (quoted). If the argument string is "clean", it is returned unchanged. Otherwise, the quotation procedure returns a list of string fragments. The input string is broken at the places where the special characters occur, and each special character is replaced by its corresponding encoding string.

For example, to make a procedure that quotes special HTML characters, do:

(make-char-quotator
    '((#\< . "&lt;") (#\> . "&gt;") (#\& . "&amp;") (#\" . "&quot;")))
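A short sketch of using such a quotation procedure. The names `quote-html` and `fragments->string` are made up for this example; the latter is a hypothetical helper that flattens the result with SRV:send-reply so it can be compared as one string. The egg is assumed to be loaded with `(import sxml-transforms)` (on CHICKEN 4: `(use sxml-transforms)`).

```scheme
(import sxml-transforms) ; CHICKEN 5; on CHICKEN 4: (use sxml-transforms)

(define quote-html
  (make-char-quotator
    '((#\< . "&lt;") (#\> . "&gt;") (#\& . "&amp;") (#\" . "&quot;"))))

;; Hypothetical helper (not part of the egg): write a string or a list of
;; fragments with SRV:send-reply and return the written text.
(define (fragments->string fragments)
  (let ((out (open-output-string)))
    (parameterize ((current-output-port out))
      (SRV:send-reply fragments))
    (get-output-string out)))

;; A clean string comes back unchanged, as a string.
(define clean (quote-html "plain text"))
;; clean => "plain text"

;; A string with special characters comes back as a list of fragments,
;; which is flattened here for inspection.
(define quoted (fragments->string (quote-html "a < b & c")))
;; quoted => "a &lt; b &amp; c"
```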

Chicken-specific modifications

entag-xhtml

[procedure] (entag-xhtml tag elems)

entag-xhtml closes XHTML tags properly in an HTML compatible way. entag is now an alias for entag-xhtml, so this behaviour is the default.

Newlines before open tags in the rendered HTML output are omitted for inline elements, such as tt and strong. This prevents the introduction of extraneous whitespace.

entag-html

[procedure] (entag-html tag elems)

entag-html is an alias for the original entag.

Starred versions

Because pre-post-order necessarily uses apply to call user handlers, it will fail on SXML trees that contain a node with very many children. Chicken's apply procedure is limited in how many arguments it can pass to a lambda, as described in the manual.

For this reason, the Chicken implementation of sxml-transforms provides alternative "starred" procedures not subject to this limit.

SXML->HTML uses the starred versions internally so it no longer suffers from this problem. This has no impact on callers.

[procedure] (pre-post-order* tree bindings)

Variant of pre-post-order which calls its handlers with only two arguments, TAG (the trigger symbol) and BODY (the subtree body):

(handler (car tree) (cdr tree))

In contrast, pre-post-order calls its handlers with the trigger symbol and with one argument for each element of the body:

(apply handler (car tree) (cdr tree))

On Chicken, pre-post-order is inefficient and will even fail if a node has more children than the native apply parameter limit allows, which may be as low as 126 depending on your system. We recommend using pre-post-order* to avoid these issues.

An example:

 ;; Convert generic lists to ordered HTML lists
 (pre-post-order
   `(list "one" "two" (strong "three!"))
   `((list . ,(lambda (tag . body)
                (cons 'ol (map (lambda (x) (list 'li x)) body))))
     ,@alist-conv-rules))

now becomes:

 ;; Convert generic lists to ordered HTML lists, no limits!
 (pre-post-order*
   `(list "one" "two" (strong "three!"))
   `((list . ,(lambda (tag body)
                (cons 'ol (map (lambda (x) (list 'li x)) body))))
     ,@alist-conv-rules*))

[procedure] (pre-post-order-splice* tree bindings)

Variant of pre-post-order-splice which passes its handlers only two arguments: TAG and BODY. See pre-post-order*.

[constant] universal-conversion-rules*

Variant of universal-conversion-rules to be used with pre-post-order*.

[constant] universal-protected-rules*

Variant of universal-protected-rules to be used with pre-post-order*.

[constant] alist-conv-rules*

Variant of alist-conv-rules to be used with pre-post-order*.
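The starred pieces can be combined into a small rendering pipeline. In this sketch the element name `shout`, the rule list `my-rules` and the helper `render` are made up for illustration. It assumes the egg is loaded with `(import sxml-transforms)` (on CHICKEN 4: `(use sxml-transforms)`).

```scheme
(import sxml-transforms) ; CHICKEN 5; on CHICKEN 4: (use sxml-transforms)

;; A custom rule placed in front of universal-conversion-rules*.
;; With the starred variant the handler takes exactly two arguments:
;; the tag and the list of already processed children.
(define my-rules
  `((shout . ,(lambda (tag body) (list "<em>" body "</em>")))
    ,@universal-conversion-rules*))

;; Hypothetical helper (not part of the egg): transform the tree into
;; fragments and return the text written by SRV:send-reply.
(define (render tree)
  (let ((out (open-output-string)))
    (parameterize ((current-output-port out))
      (SRV:send-reply (pre-post-order* tree my-rules)))
    (get-output-string out)))

(define rendered (render '(shout "a & b")))
;; rendered => "<em>a &amp; b</em>"
```

The text child is escaped by the *text* rule before the shout handler sees it, while the markup strings returned by the handler are written as-is because handler results are not processed again.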

Examples

Oleg's site is the main resource. Be sure to read his examples and the ones in the SSAX repository (also included in the egg). The following papers were of great help:

There's also a more friendly SXML tutorial available.

The initial documentation on this wiki page came straight from the comments in the extremely well-documented source code. It's recommended you read the code if you want to learn more.

1.4.1: Add IMG and a few other HTML4 elements as inline in entag-xhtml, fixing whitespace issues; also add HTML5 inline elements
1.4: Add starred versions (pre-post-order* etc)
1.3: Update to upstream latest version, remove splicing pre-post-order from extra file and rename it to pre-post-order-splice
1.2: Port to hygienic chicken
1.1: Improve inline element whitespace handling; add '&' rule.
1.0: Initial release

License

The sxml-transforms code is in the public domain.