sxml-serializer

Serialize SXML to XML.

Overview

The SXML serializer writes SXML documents to a port, filename or string as XML or HTML. It supports comment nodes, processing instruction nodes, namespace nodes and ampersand entities such as (& 955). As an example, you can parse an XML document with the ssax egg using ssax:xml->sxml and write a valid copy back out with serialize-sxml.
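
The round trip described above can be sketched as follows. This is a minimal illustration, assuming the CHICKEN 5 import form (on CHICKEN 4, `(use ssax sxml-serializer)` would take its place) and that both eggs are installed; the sample XML string and the variable names are hypothetical.

```scheme
;; CHICKEN 5 import form assumed.
(import ssax sxml-serializer)

;; Hypothetical sample input, made up for this sketch.
(define sample-xml "<greeting lang=\"en\">Hello, world</greeting>")

;; Parse the string into SXML. The second argument to ssax:xml->sxml is
;; the namespace prefix assignment list; it is empty here because the
;; sample uses no namespaces.
(define sample-sxml
  (ssax:xml->sxml (open-input-string sample-xml) '()))

;; With no output: keyword, serialize-sxml returns the XML as a string.
(define round-tripped (serialize-sxml sample-sxml))

(display round-tripped)
(newline)
```

The serialized copy may differ from the input in whitespace, since indentation is applied by default.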

Please refer to the serialization section of the SXML tools tutorial for further discussion and examples. The interface differs somewhat from the stock serializer (see Changes from stock), but the serialization results are the same.

Interface

[procedure] (serialize-sxml doc #!key keys)

Serialize the SXML document DOC, an SXML node or nodeset, to XML or HTML. Returns the result string if the serialization was done to a string, or an unspecified value if serializing to file or port.

serialize-sxml accepts the following keyword arguments:

output
[default #f] An output port or filename to write the output to, or #f to write it to a string.
cdata-section-elements
[default '()] A list of SXML element names, as symbols, which will have their contents serialized to CDATA.
indent
[default: two spaces] Indentation to apply to XML elements, as a string or #f. When a string, a newline is written whenever a new tag is opened, and the tag is indented by printing the string x times for indentation level x. When #f, indentation is disabled entirely and no newlines are printed. To print all tags left-aligned, one per line, use the empty string.
method
[default 'xml] Serialization method, 'xml or 'html. When 'html, an end-tag is not output for empty HTML elements; character escaping is not performed for the content of script and style elements; "<" characters in attribute values are not escaped; whitespace is not added inside a formatted element; and boolean attributes are output in minimized form. HTML output is provided for completeness; using sxml-transforms may give better results.
ns-prefixes
[default conventional-ns-prefixes] An alist mapping XML namespace prefixes (symbols) to namespace URIs (strings). It tells the serializer which prefix to emit for each namespace URI. When no prefix is assigned to a namespace URI, the serializer generates a prefix name itself. This URI-to-prefix map is applied after any user shortcuts are expanded to full URIs.
allow-prefix-redeclarations
[default: value of allow-prefix-redeclarations? param] Permit different URIs to map to the same XML prefix. See the section on redeclaration.
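
A few of the keyword arguments above, combined in a short sketch. The documents and variable names are hypothetical, and the CHICKEN 5 import form is assumed. Exact output is not shown because it depends on the indentation and escaping rules described above.

```scheme
;; CHICKEN 5 import form assumed.
(import sxml-serializer)

;; Hypothetical document used only for illustration.
(define doc
  '(*TOP* (page (title "Example")
                (script "if (a < b) run();"))))

;; output: #f (the default) returns the result as a string.
(define as-string (serialize-sxml doc))

;; indent: #f disables indentation and newlines entirely.
(define compact (serialize-sxml doc indent: #f))

;; cdata-section-elements: names the elements whose contents are
;; serialized as CDATA sections.
(define with-cdata
  (serialize-sxml doc cdata-section-elements: '(script) indent: #f))

;; output: may also be a port; the return value is then unspecified.
(serialize-sxml doc output: (current-output-port))
(newline)

;; method: 'html switches to the HTML serialization rules, so an empty
;; element such as br gets no end-tag.
(define as-html
  (serialize-sxml '(*TOP* (p "one" (br) "two")) method: 'html))
```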

[constant] conventional-ns-prefixes

An alist mapping well-known namespace prefixes to URIs. Typically used when augmenting the serializer's existing namespace map:

;; translate namespace URI http://3e8.org/zb to the XML prefix zb:
(serialize-sxml doc ns-prefixes: `((zb . "http://3e8.org/zb")
                                   ,@conventional-ns-prefixes))

Namespaces

The default namespace

XML supports a default namespace, declared with the xmlns attribute, which applies to unprefixed elements as if they carried an implicit prefix. These two elements are equivalent:

<atom:feed xmlns:atom="http://www.w3.org/2005/Atom" />
<feed xmlns="http://www.w3.org/2005/Atom" />

A typical document, in which *NAMESPACES* maps the atom: user shortcut to the namespace URI http://www.w3.org/2005/Atom, and ns-prefixes maps that URI back to the XML prefix atom:, is shown below:

> (serialize-sxml '(*TOP* (@ (*NAMESPACES* (atom "http://www.w3.org/2005/Atom")))
                     (atom:feed))
                  ns-prefixes: '((atom . "http://www.w3.org/2005/Atom")))
<atom:feed xmlns:atom="http://www.w3.org/2005/Atom" />

However, we can cause the serializer to use the default namespace instead. Consider the default namespace to be a special prefix called *default*, which happens to produce unprefixed XML elements. Now modify the association of the namespace URI in ns-prefixes, changing the prefix from atom to *default*:

> (serialize-sxml '(*TOP* (@ (*NAMESPACES* (atom "http://www.w3.org/2005/Atom")))
                     (atom:feed))
                  ns-prefixes: '((*default* . "http://www.w3.org/2005/Atom")))
<feed xmlns="http://www.w3.org/2005/Atom" />

All matching elements will then be mapped to the default namespace.

XML also supports the empty namespace: an unprefixed element in the scope of xmlns="" belongs to no namespace. This is the state in effect when no default namespace has been declared. In SXML, the empty namespace is signified by an element name without a namespace qualifier (in other words, one that does not contain a colon). When such an element is encountered and the default namespace is non-empty, the default namespace is reset to empty:

> (serialize-sxml '(*TOP* (@ (*NAMESPACES* (atom "http://www.w3.org/2005/Atom")))
                     (atom:feed (atom:id) (orphan)))
                  ns-prefixes: '((*default* . "http://www.w3.org/2005/Atom")))

<feed xmlns="http://www.w3.org/2005/Atom">
  <id />
  <orphan xmlns="" />
</feed>

Finally, you can specify multiple default namespace URIs; the default namespace will be redeclared whenever necessary:

> (serialize-sxml '(*TOP* (@ (*NAMESPACES* (atom "http://www.w3.org/2005/Atom")
                                           (xhtml "http://www.w3.org/1999/xhtml")))
                     (atom:feed (atom:entry
                                 (atom:content (@ (type "xhtml"))
                                  (xhtml:div (xhtml:p "I'm invincible!"))))))
                  ns-prefixes: '((*default* . "http://www.w3.org/2005/Atom")
                                 (*default* . "http://www.w3.org/1999/xhtml")))

<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <content type="xhtml">
      <div xmlns="http://www.w3.org/1999/xhtml">
        <p>I'm invincible!</p>
      </div>
    </content>
  </entry>
</feed>

Note that you must have prefix redeclarations enabled (which is the default) for this last example to work properly. If not, the *default* namespace prefix cannot be redeclared and the XHTML elements will be prefixed with xhtml:.

Redeclaring XML prefixes

[parameter] (allow-prefix-redeclarations? #t)

This parameter determines the default value of the allow-prefix-redeclarations: keyword to serialize-sxml.

If #t, allows redeclaration of XML prefixes when a namespace URI maps to a previously declared XML prefix; if #f, a new prefix will be autogenerated so that XML prefixes map one-to-one to URIs. Defaults to #t. The behavior of the stock sxml-tools serializer is to avoid all prefix redeclarations; to retain this behavior, set this value to #f.
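
Because allow-prefix-redeclarations? is a parameter, its value can be changed for a limited extent with parameterize. The sketch below assumes the CHICKEN 5 import form; the document and variable names are hypothetical. It relies on the keyword default being read from the parameter when serialize-sxml is called, as described above.

```scheme
;; CHICKEN 5 import form assumed.
(import sxml-serializer)

;; Hypothetical document and prefix map for illustration.
(define doc '(*TOP* (http://foo:one (http://bar:two))))
(define prefixes '((BAZ . "http://foo") (BAZ . "http://bar")))

;; Inside parameterize, serialize-sxml behaves as if it had been given
;; allow-prefix-redeclarations: #f, so a fresh prefix is generated
;; instead of redeclaring BAZ.
(define strict
  (parameterize ((allow-prefix-redeclarations? #f))
    (serialize-sxml doc ns-prefixes: prefixes)))

;; An explicitly supplied keyword is used instead of the parameter value.
(define relaxed
  (parameterize ((allow-prefix-redeclarations? #f))
    (serialize-sxml doc
                    ns-prefixes: prefixes
                    allow-prefix-redeclarations: #t)))
```

Outside the parameterize forms the parameter returns to its previous value, so other callers are unaffected.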

When enabled, you can provide multiple identical prefixes in ns-prefixes, mapped to different URIs. The serializer will redeclare the prefix with a different namespace URI if it has already been declared in a parent:

> (serialize-sxml '((*TOP* (http://foo:one (http://bar:two))
                           (http://bar:three)))
                  ns-prefixes: '((BAZ . "http://foo") (BAZ . "http://bar")))
<BAZ:one xmlns:BAZ="http://foo">
  <BAZ:two xmlns:BAZ="http://bar" />
</BAZ:one>
<BAZ:three xmlns:BAZ="http://bar" />

Contrast this to the case where redeclarations are disallowed:

> (serialize-sxml '((*TOP* (http://foo:one (http://bar:two))
                           (http://bar:three)))
                  ns-prefixes: '((BAZ . "http://foo") (BAZ . "http://bar"))
                  allow-prefix-redeclarations: #f)
<BAZ:one xmlns:BAZ="http://foo">
  <prfx1:two xmlns:prfx1="http://bar" />   <!-- ooooh -->
</BAZ:one>
<BAZ:three xmlns:BAZ="http://bar" />

Here the nested element http://bar:two had its XML prefix prfx1 auto-generated, because BAZ had already been declared in the enclosing scope. Notice that http://bar:three still becomes BAZ:three though, because BAZ was not declared in that element's enclosing scope.

Prefix redeclaration is especially useful with the default namespace, because you can declare multiple default namespace URIs and switch back and forth between the empty namespace and a default namespace. You can consider the default namespace to be a dedicated prefix called *default* which happens to render into XML without a prefix.

(serialize-sxml
  '(*TOP* (http://foo:one
            (http://bar:two
              (http://bar:three)
              (http://foo:four)
              (five (six (http://foo:seven))))))
  ns-prefixes: '((*default* . "http://foo") (*default* . "http://bar")))
<!-- redeclarations #t -->          <!-- redeclarations #f -->
<one xmlns="http://foo">            <one xmlns="http://foo">
  <two xmlns="http://bar">            <prfx1:two xmlns:prfx1="http://bar">
    <three />                           <prfx1:three />
    <four xmlns="http://foo" />         <four />
    <five xmlns="">                     <five xmlns="">
      <six>                               <six>
        <seven xmlns="http://foo" />        <prfx2:seven xmlns:prfx2="http://foo" />
      </six>                              </six>
    </five>                             </five>
  </two>                              </prfx1:two>
</one>                              </one>

Note that the empty namespace can always be redeclared to be the default namespace, regardless of the value of allow-prefix-redeclarations?. That's because in XML it is illegal to associate the empty namespace with an XML prefix, so we cannot auto-generate a prefix for it.

Changes from stock

(srl:sxml->xml doc)          ==> (serialize-sxml doc)
(srl:sxml->xml-noindent doc) ==> (serialize-sxml doc indent: #f)
(srl:sxml->html doc)         ==> (serialize-sxml doc method: 'html)
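
Code written against the stock names could be adapted with thin wrappers along these lines. This is only a sketch: the compat: names are hypothetical and are not provided by the egg, and only the one-argument forms listed above are covered. The CHICKEN 5 import form is assumed.

```scheme
;; CHICKEN 5 import form assumed.
(import sxml-serializer)

;; Hypothetical wrappers; the compat: names are invented for this sketch.
(define (compat:sxml->xml doc)
  (serialize-sxml doc))

(define (compat:sxml->xml-noindent doc)
  (serialize-sxml doc indent: #f))

(define (compat:sxml->html doc)
  (serialize-sxml doc method: 'html))

;; Usage: each wrapper returns the serialized document as a string.
(define sample (compat:sxml->xml-noindent '(*TOP* (a (b "text")))))
```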

Limitations

About this egg

Author

Dmitry Lizorkin wrote the original serializer code. The Chicken port, and some enhancements, are by Jim Ursetto.

Version history

0.3
Accept chars, nulls and symbols as text nodes; accept lists containing text nodes in head position
0.2
XML prefix redeclaration; default namespace support [Jim Ursetto]
0.1
Initial import from sxml-tools CVS 1.7 @ Fri Nov 7 08:36:28 2008 UTC

License

BSD. The original serializer code by Dmitry Lizorkin is public domain.