sxml-serializer
Serialize SXML to XML.
Overview
The SXML serializer writes SXML documents to a port, filename or string as XML or HTML. It supports comment nodes, processing instruction nodes, namespace nodes and ampersand entities such as (& 955). As an example, you can parse an XML document with the ssax egg using ssax:xml->sxml and write a valid copy back out with serialize-sxml.
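The round trip described above can be sketched as follows. This assumes CHICKEN 5 with the ssax and sxml-serializer eggs installed, and the input document is made up for illustration. It passes indent: #f so that the serializer adds no whitespace text between elements. That should let the reparsed copy compare equal to the original SXML.

```scheme
;; Sketch: parse XML with ssax, then write it back out with serialize-sxml.
;; Assumes CHICKEN 5 with the ssax and sxml-serializer eggs installed.
(import (chicken port) ssax sxml-serializer)

;; Hypothetical input document, for illustration only.
(define input
  "<doc><item id=\"1\">one</item><item id=\"2\">two</item></doc>")

;; Parse from a string; '() means no namespace prefix shortcuts.
(define (parse-xml str)
  (with-input-from-string str
    (lambda () (ssax:xml->sxml (current-input-port) '()))))

(define doc (parse-xml input))

;; No output: keyword, so the XML is returned as a string.
;; indent: #f avoids adding whitespace text nodes between elements.
(define copy (serialize-sxml doc indent: #f))

(display copy)
(newline)
```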
Please refer to the serialization section of the SXML tools tutorial for further discussion and examples. The interface has changed somewhat (see Changes from stock), but it is close enough that the tutorial still applies, and the serialization results are the same.
Interface
[procedure] (serialize-sxml doc #!key keys)
Serialize the SXML document DOC, an SXML node or nodeset, to XML or HTML. Returns the result string if the serialization was done to a string, or an unspecified value if serializing to a file or port.
serialize-sxml accepts the following keyword arguments:
- output
- [default #f] An output port or filename to write the output to, or #f to write it to a string.
- cdata-section-elements
- [default '()] A list of SXML element names, as symbols, which will have their contents serialized to CDATA.
- indent
- [default "  " (two spaces)] Indentation string to apply to XML elements, or #f. When a string, a newline is written whenever a new tag is opened, and the tag is indented by printing the string x times at indentation level x. When #f, indentation is disabled entirely and no newlines are printed. To print all tags left-aligned on separate lines, use the empty string.
- method
- [default 'xml] Serialization method, 'xml or 'html. When 'html, an end-tag is not output for empty HTML elements; character escaping is not performed for the content of script and style elements; "<" characters in attribute values are not escaped; whitespace is not added inside a formatted element; and boolean attributes are output in minimized form. HTML output is provided for completeness; using sxml-transforms may give better results.
- ns-prefixes
- [default conventional-ns-prefixes] An alist mapping XML namespace prefixes (symbols) to namespace URIs, which tells the serializer which prefix to use for each URI. When no prefix is assigned to some namespace URI, the serializer generates an XML prefix by itself. This URI-to-prefix map is applied after any user shortcuts (declared with *NAMESPACES*) are expanded to full URIs.
- allow-prefix-redeclarations
- [default #t, via the allow-prefix-redeclarations? parameter] Permit different URIs to map to the same XML prefix. See Redeclaring XML prefixes.
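The keyword arguments above can be exercised with a small sketch like this one. The document and the file name demo.xml are hypothetical, chosen only for illustration. The calls use only the keywords documented in the list.

```scheme
;; Sketch of the serialize-sxml keyword arguments.
;; The document and the file name "demo.xml" are made up for illustration.
(import sxml-serializer)

(define doc
  '(*TOP*
    (page
     (title "Keyword demo")
     (code "if (a < b && c) { return; }"))))

;; Default: XML returned as a string, indented two spaces per level.
(define indented (serialize-sxml doc))

;; indent: #f writes no newlines or indentation at all.
(define compact (serialize-sxml doc indent: #f))

;; Contents of <code> are written as a CDATA section instead of escaped text.
(define with-cdata (serialize-sxml doc cdata-section-elements: '(code)))

;; HTML method: no end tag is written for the empty <br> element.
(define html
  (serialize-sxml '(html (body (p "line one" (br) "line two")))
                  method: 'html))

;; Writing to a filename or a port returns an unspecified value.
(serialize-sxml doc output: "demo.xml")
(serialize-sxml doc output: (current-output-port))
```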
conventional-ns-prefixes is an alist mapping well-known namespace prefixes to URIs. Because passing ns-prefixes: replaces the default map, it is typically spliced into a custom alist to extend the serializer's existing namespace map rather than replace it:
;; translate namespace URI http://3e8.org/zb to the XML prefix zb:
(serialize-sxml doc
                ns-prefixes: `((zb . "http://3e8.org/zb")
                               ,@conventional-ns-prefixes))
Namespaces
The default namespace
XML supports a default namespace, declared with the xmlns attribute, which applies to all unprefixed elements in its scope. These two elements are equivalent:
<atom:feed xmlns:atom="http://www.w3.org/2005/Atom" />
<feed xmlns="http://www.w3.org/2005/Atom" />
A typical document, in which *NAMESPACES* maps the atom: user shortcut to the namespace URI http://www.w3.org/2005/Atom, and ns-prefixes maps that URI back to the XML prefix atom:, is shown below:
> (serialize-sxml
   '(*TOP* (@ (*NAMESPACES* (atom "http://www.w3.org/2005/Atom")))
     (atom:feed))
   ns-prefixes: '((atom . "http://www.w3.org/2005/Atom")))
<atom:feed xmlns:atom="http://www.w3.org/2005/Atom" />
However, we can cause the serializer to use the default namespace instead. Consider the default namespace to be a special prefix called *default*, which happens to produce unprefixed XML elements. Now modify the association of the namespace URI in ns-prefixes, changing the prefix from atom to *default*:
> (serialize-sxml
   '(*TOP* (@ (*NAMESPACES* (atom "http://www.w3.org/2005/Atom")))
     (atom:feed))
   ns-prefixes: '((*default* . "http://www.w3.org/2005/Atom")))
<feed xmlns="http://www.w3.org/2005/Atom" />
All matching elements will then be mapped to the default namespace.
XML also has the empty namespace, written as xmlns="" on an unprefixed element; it is in effect wherever no default namespace has been declared. In SXML, an element in the empty namespace has an unqualified name (one that contains no colon). When the serializer encounters such an element while the default namespace is non-empty, it resets the default namespace to empty:
> (serialize-sxml
   '(*TOP* (@ (*NAMESPACES* (atom "http://www.w3.org/2005/Atom")))
     (atom:feed (atom:id) (orphan)))
   ns-prefixes: '((*default* . "http://www.w3.org/2005/Atom")))
<feed xmlns="http://www.w3.org/2005/Atom">
  <id />
  <orphan xmlns="" />
</feed>
Finally, you can specify multiple default namespace URIs; the default namespace will be redeclared whenever necessary:
> (serialize-sxml
   '(*TOP* (@ (*NAMESPACES* (atom "http://www.w3.org/2005/Atom")
                            (xhtml "http://www.w3.org/1999/xhtml")))
     (atom:feed
      (atom:entry
       (atom:content (@ (type "xhtml"))
        (xhtml:div
         (xhtml:p "I'm invincible!"))))))
   ns-prefixes: '((*default* . "http://www.w3.org/2005/Atom")
                  (*default* . "http://www.w3.org/1999/xhtml")))
<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <content type="xhtml">
      <div xmlns="http://www.w3.org/1999/xhtml">
        <p>I'm invincible!</p>
      </div>
    </content>
  </entry>
</feed>
Note that you must have prefix redeclarations enabled (which is the default) for this last example to work properly. If not, the *default* namespace prefix cannot be redeclared and the XHTML elements will be prefixed with xhtml:.
Redeclaring XML prefixes
[parameter] (allow-prefix-redeclarations? #t)
This parameter determines the default value of the allow-prefix-redeclarations: keyword to serialize-sxml.
If #t, allows redeclaration of XML prefixes when a namespace URI maps to a previously declared XML prefix; if #f, a new prefix will be autogenerated so that XML prefixes map one-to-one to URIs. Defaults to #t. The behavior of the stock sxml-tools serializer is to avoid all prefix redeclarations; to retain this behavior, set this value to #f.
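The two ways of controlling this behavior can be sketched as follows: the keyword for a single call, or the parameter for a whole dynamic extent. The document and prefixes mirror the BAZ examples in this section. The last binding checks the expectation that the parameter only supplies the keyword's default, so an explicit keyword should still win.

```scheme
;; Sketch: disabling prefix redeclaration per call vs. via the parameter.
;; Document and prefixes mirror the BAZ examples in this section.
(import sxml-serializer)

(define doc '(*TOP* (http://foo:one (http://bar:two))))
(define prefixes '((BAZ . "http://foo") (BAZ . "http://bar")))

;; 1. Per call, with the keyword argument.
(define per-call
  (serialize-sxml doc ns-prefixes: prefixes
                  allow-prefix-redeclarations: #f))

;; 2. For a dynamic extent, by rebinding the parameter that supplies
;;    the keyword's default (the stock sxml-tools behavior).
(define scoped
  (parameterize ((allow-prefix-redeclarations? #f))
    (serialize-sxml doc ns-prefixes: prefixes)))

;; An explicit keyword still overrides the parameter's value.
(define overridden
  (parameterize ((allow-prefix-redeclarations? #f))
    (serialize-sxml doc ns-prefixes: prefixes
                    allow-prefix-redeclarations: #t)))
```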
When enabled, you can provide multiple identical prefixes in ns-prefixes, mapped to different URIs. The serializer will redeclare the prefix with a different namespace URI if it has already been declared in a parent:
> (serialize-sxml
   '((*TOP* (http://foo:one (http://bar:two))
            (http://bar:three)))
   ns-prefixes: '((BAZ . "http://foo") (BAZ . "http://bar")))
<BAZ:one xmlns:BAZ="http://foo">
  <BAZ:two xmlns:BAZ="http://bar" />
</BAZ:one>
<BAZ:three xmlns:BAZ="http://bar" />
Contrast this to the case where redeclarations are disallowed:
> (serialize-sxml
   '((*TOP* (http://foo:one (http://bar:two))
            (http://bar:three)))
   ns-prefixes: '((BAZ . "http://foo") (BAZ . "http://bar"))
   allow-prefix-redeclarations: #f)
<BAZ:one xmlns:BAZ="http://foo">
  <prfx1:two xmlns:prfx1="http://bar" />  <!-- ooooh -->
</BAZ:one>
<BAZ:three xmlns:BAZ="http://bar" />
Here the nested element http://bar:two had its XML prefix prfx1 auto-generated, because BAZ had already been declared in the enclosing scope. Notice that http://bar:three still becomes BAZ:three though, because BAZ was not declared in that element's enclosing scope.
Prefix redeclaration is extremely useful when the default namespace is employed, because you can declare multiple default namespace URIs and switch back and forth between the empty namespace and a default namespace. You can consider the default namespace to be a dedicated prefix called *default* which happens to render into XML without a prefix.
(serialize-sxml
 '(*TOP*
   (http://foo:one
    (http://bar:two
     (http://bar:three)
     (http://foo:four)
     (five (six (http://foo:seven))))))
 ns-prefixes: '((*default* . "http://foo")
                (*default* . "http://bar")))
<!-- redeclarations #t -->
<one xmlns="http://foo">
  <two xmlns="http://bar">
    <three />
    <four xmlns="http://foo" />
    <five xmlns="">
      <six>
        <seven xmlns="http://foo" />
      </six>
    </five>
  </two>
</one>

<!-- redeclarations #f -->
<one xmlns="http://foo">
  <prfx1:two xmlns:prfx1="http://bar">
    <prfx1:three />
    <four />
    <five xmlns="">
      <six>
        <prfx2:seven xmlns:prfx2="http://foo" />
      </six>
    </five>
  </prfx1:two>
</one>
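The two outputs above differ only in the allow-prefix-redeclarations: keyword. A minimal sketch that produces both from the same document and ns-prefixes as the call above; the variable names are illustrative.

```scheme
;; Sketch: the same document and prefixes, serialized with and without
;; prefix redeclaration. Variable names are illustrative only.
(import sxml-serializer)

(define doc
  '(*TOP*
    (http://foo:one
     (http://bar:two
      (http://bar:three)
      (http://foo:four)
      (five (six (http://foo:seven)))))))

(define prefixes
  '((*default* . "http://foo") (*default* . "http://bar")))

;; Uses the parameter's default (#t).
(define with-redeclarations
  (serialize-sxml doc ns-prefixes: prefixes))

;; Forces auto-generated prefixes instead of redeclaring *default*.
(define without-redeclarations
  (serialize-sxml doc ns-prefixes: prefixes
                  allow-prefix-redeclarations: #f))

(display with-redeclarations)
(newline)
(display without-redeclarations)
(newline)
```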
Note that the default namespace can always be redeclared as the empty namespace (xmlns=""), regardless of allow-prefix-redeclarations?, as the five element shows in both outputs above. This is because XML forbids associating the empty namespace with an XML prefix, so no prefix can be auto-generated for it.
Changes from stock
- srl:parameterizable has been renamed to serialize-sxml and it now uses keyword arguments instead of (key . value) pairs.
- XML prefix redeclarations are permitted.
- The default namespace is supported.
- Chars, nulls and symbols are accepted as text nodes, in addition to the strings, numbers and booleans accepted by the stock serializer.
- Lists with a text node in head position are accepted; e.g. (#\c () "foo"). However, a symbol in the head position is always considered to be a tag.
- The optional port-or-filename argument became the output keyword. The ns-prefix-assig option was renamed to ns-prefixes. Otherwise, keywords retain their original names.
- omit-xml-declaration, standalone and version have been dropped. Add a (*PI* xml "version='1.0'") node to your SXML document instead. The old options did not support an encoding attribute and would output a duplicate declaration if your SXML document already contained one.
- srl:conventional-ns-prefixes had its srl: prefix stripped, and it contains many more prefixes.
- srl:sxml->xml and srl:sxml->xml-noindent (and their html variants) were dropped as they provide little benefit over serialize-sxml:
(srl:sxml->xml doc)          ==> (serialize-sxml doc)
(srl:sxml->xml-noindent doc) ==> (serialize-sxml doc indent: #f)
(srl:sxml->html doc)         ==> (serialize-sxml doc method: 'html)
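Several of the changes above can be seen together in one small document: the extra text node types, a list headed by a text node, and an explicit *PI* node in place of the dropped declaration options. This is a sketch with illustrative content only; the exact whitespace of the output is not shown here.

```scheme
;; Sketch of the extended text nodes and the replacement for the dropped
;; XML declaration options. The content is illustrative only.
(import sxml-serializer)

(define doc
  '(*TOP*
    ;; Instead of version:/standalone:, write the declaration yourself.
    (*PI* xml "version='1.0' encoding='utf-8'")
    (note
     "count: " 3              ; numbers: accepted by the stock serializer too
     " " #\x                  ; chars: accepted by this egg
     " " done                 ; symbols not in head position are text
     ()                       ; null: accepted as a text node
     (" [" "nested" "]"))))   ; a list headed by a text node

(define out (serialize-sxml doc))

(display out)
(newline)
```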
Limitations
- There is currently no way to force all namespace prefixes to be declared in the root element; instead, each is declared as locally as possible. This can make output extremely verbose when many sibling elements each declare the same namespace, rather than sharing a single declaration in their parent.
- Redeclaring a prefix declared in a parent is not supported in attributes, only in elements.
About this egg
Author
Dmitry Lizorkin wrote the original serializer code. The Chicken port, and some enhancements, are by Jim Ursetto.
Repository
This egg is hosted on the CHICKEN Subversion repository:
https://anonymous@code.call-cc.org/svn/chicken-eggs/release/5/sxml-serializer
If you want to check out the source code repository of this egg and you are not familiar with Subversion, see this page.
Version history
- 0.5
- Port to CHICKEN 5.
- 0.4
- Allow *default* pseudo-namespace in source doc; bugfixes.
- 0.3
- Accept chars, nulls and symbols as text nodes; accept lists containing text nodes in head position
- 0.2
- XML prefix redeclaration; default namespace support [Jim Ursetto]
- 0.1
- Initial import from sxml-tools CVS 1.7 @ Fri Nov 7 08:36:28 2008 UTC
License
BSD. The original serializer code by Dmitry Lizorkin is public domain.