
ssax

Description

Oleg Kiselyov's XML parser.

Author

Oleg Kiselyov, with some Chicken-specific modifications by Kirill Lisovsky. Minor changes by felix winkelmann to make the code suitable as an extension library.

Requirements

input-parse

Documentation

See the official SSAX homepage for comprehensive documentation.

The following procedure is exported:

[procedure] (ssax:xml->sxml PORT NAMESPACE-PREFIX-ASSIG)

This procedure reads XML data from PORT and returns an SXML representation. NAMESPACE-PREFIX-ASSIG is an alist that maps user prefixes (symbols) to namespaces (URI strings).
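A minimal usage sketch, assuming the egg is loaded with (require-extension ssax) (the loading form depends on the Chicken version) and that the XML is read from a string port. The document string and the variable names are illustrative only.

```scheme
(require-extension ssax)  ; assumption: adjust the loading form to your Chicken version

;; Illustrative input. It uses no namespaces, so the prefix alist is empty.
(define doc
  "<doc><item id=\"1\">first</item><item id=\"2\">second</item></doc>")

;; Parse from a string port; any input port would do.
(define sxml
  (ssax:xml->sxml (open-input-string doc) '()))

;; sxml should now be:
;; (*TOP* (doc (item (@ (id "1")) "first")
;;             (item (@ (id "2")) "second")))
```

For documents that use namespaces, the second argument would be an alist such as '((h . "http://www.w3.org/1999/xhtml")), mapping the user prefix h to that URI.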

The following macros are available:

[syntax] (ssax:make-parser TAG1 PROC1 [TAG2 PROC2 ...])

Creates a custom XML parser, an instance of the XML parsing framework. The result is a SAX parser, a DOM parser, or a specialized parser, depending on the supplied user handlers.

The arguments to ssax:make-parser are tag/procedure pairs, interleaved in the argument list. In other words, TAG1, TAG2, etc. are unquoted(!) symbols that identify the type of the procedure that follows the tag; see below for the list of allowed tags. The macro expands to a procedure representing a parser that accepts two arguments, PORT and SEED. PORT is the port from which to read the XML data, and SEED is the initial value of an accumulator. The accumulator is passed to the first handler procedure, which may extend it and return it; the returned value is then passed to the next handler, and so on, until the final result is obtained in a fold-like fashion.

Given below are tags and signatures of the corresponding procedures. Not all tags have to be specified. If some are omitted, reasonable defaults will apply. SEED always represents the current value of the accumulator that will eventually be returned by the parser.

For the DOCTYPE handler: if INTERNAL-SUBSET? is #t, the port is positioned right after the #\[ that begins the internal DTD subset. The handler must finish reading this subset before it returns (or must call ssax:skip-internal-dtd if it is not interested in reading it).

The port at exit must be at the first symbol after the whole DOCTYPE declaration. The handler-procedure must generate four values:

     ELEMS ENTITIES NAMESPACES SEED

See xml-decl::elems for ELEMS. It may be #f to switch off the validation. NAMESPACES will typically contain user prefixes for selected URI symbols. The default handler-procedure skips the internal subset, if any, and returns (values #f '() '() SEED).

For the handler of an undeclared root element (tag UNDECL-ROOT in the SSAX documentation): ELEM-GI is an UNRES-NAME of the root element. This procedure is called when the XML document being parsed contains no DOCTYPE declaration. Like the DOCTYPE handler above, the handler-procedure must generate four values:

      ELEMS ENTITIES NAMESPACES SEED

The default handler-procedure returns (values #f '() '() SEED).

For the PI tag, the value is a list of PI handlers as accepted by ssax:make-pi-parser. The default value is '().
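The fold-like flow of the seed can be illustrated with a parser that counts elements. This is a sketch, not a reference implementation: the tag names NEW-LEVEL-SEED, FINISH-ELEMENT and CHAR-DATA-HANDLER are taken from the upstream SSAX documentation and are assumed to be accepted unchanged by this egg; count-elements and the sample document are hypothetical names chosen for the example.

```scheme
(require-extension ssax)  ; assumption: adjust the loading form to your Chicken version

;; The seed is a running count of finished elements.
(define count-elements
  (ssax:make-parser
    NEW-LEVEL-SEED                       ; entering an element: keep the count
    (lambda (elem-gi attributes namespaces expected-content seed)
      seed)
    FINISH-ELEMENT                       ; leaving an element: add one
    (lambda (elem-gi attributes namespaces parent-seed seed)
      (+ seed 1))
    CHAR-DATA-HANDLER                    ; character data does not change the count
    (lambda (string1 string2 seed)
      seed)))

;; PORT is a string port, the initial SEED is 0.
(define element-count
  (count-elements (open-input-string "<a><b/><c>text</c></a>") 0))

;; element-count should be 3: one each for b, c and a.
```

Because each handler receives the seed returned by the previous one, the count threads through the whole document without any mutable state.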

[syntax] (ssax:make-pi-parser PI-HANDLERS)

Creates a parser that parses and processes one Processing Instruction (PI). PI-HANDLERS is an alist of (PI-TAG . PI-HANDLER) pairs, where PI-TAG is the name of the processing instruction and PI-HANDLER is a procedure of three arguments: PORT PI-TAG SEED.

The handler should read the rest of the PI from PORT, up to and including the combination "?>" that terminates the PI. The handler should return a new seed.

One of the PI-TAGs may be the symbol *DEFAULT*. The corresponding handler will handle PIs that no other handler will. If the *DEFAULT* PI-TAG is not specified, ssax:make-pi-parser will assume the default handler that skips the body of the PI.
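The following sketch shows one handler for a specific PI and a *DEFAULT* handler. It assumes that the generated parser is called as (parser PORT PI-TAG SEED) with PORT positioned just after the PI target, as in upstream SSAX. Since this page lists only ssax:xml->sxml as an exported procedure, the body is read with read-pi-body, a hypothetical helper defined here rather than a part of the egg. The PI name note and the sample input are likewise illustrative.

```scheme
(require-extension ssax)  ; assumption: adjust the loading form to your Chicken version

;; Hypothetical helper: read characters up to and including "?>",
;; returning everything before the terminator as a string.
(define (read-pi-body port)
  (let loop ((chars '()))
    (let ((c (read-char port)))
      (cond ((eof-object? c)
             (error "unterminated processing instruction"))
            ((and (char=? c #\?) (eqv? (peek-char port) #\>))
             (read-char port)                     ; consume the closing >
             (list->string (reverse chars)))
            (else (loop (cons c chars)))))))

(define parse-pi
  (ssax:make-pi-parser
    ((note . (lambda (port pi-tag seed)           ; record <?note ...?> bodies
               (cons (list pi-tag (read-pi-body port)) seed)))
     (*DEFAULT* . (lambda (port pi-tag seed)      ; consume and ignore other PIs
                    (read-pi-body port)
                    seed)))))

;; The port starts right after the target name "note".
(define pi-port (open-input-string " remember this?><rest/>"))
(define pi-seed (parse-pi pi-port 'note '()))

;; pi-seed should be ((note " remember this")),
;; and pi-port should now be positioned at "<rest/>".
```

Note that the handler list is written literally inside the macro call and is not quoted.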

[syntax] (ssax:make-elem-parser new-level-seed finish-element char-data-handler pi-handlers)

Creates a parser that parses and processes one element, including its character content or child elements. The parser is typically applied to the root element of a document.

The generated parser is a procedure of seven arguments: START-TAG-HEAD PORT ELEMS ENTITIES NAMESPACES PRESERVE-WS? SEED. The arguments of the macro are described below.

new-level-seed

     procedure ELEM-GI ATTRIBUTES NAMESPACES EXPECTED-CONTENT SEED

where ELEM-GI is a RES-NAME of the element about to be processed. This procedure is to generate the seed to be passed to handlers that process the content of the element.

finish-element

     procedure ELEM-GI ATTRIBUTES NAMESPACES PARENT-SEED SEED

This procedure is called when parsing of ELEM-GI is finished. SEED is the result from the last content parser (or from new-level-seed if the element has empty content). PARENT-SEED is the same seed as was passed to new-level-seed. The procedure is to generate a seed that will be the result of the element parser.

char-data-handler

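A string handler: a procedure STRING1 STRING2 SEED that returns a new seed. It handles a chunk of character data consisting of STRING1 followed by STRING2.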

pi-handlers

See ssax:make-pi-parser.
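As a rough sketch of how the four arguments fit together, the element parser below builds a nested list for a single element. It assumes, following upstream SSAX, that the port is positioned just after the name in the start tag, that passing #f for ELEMS switches off validation, and that empty lists are acceptable for ENTITIES and NAMESPACES. The name parse-elem and the input are hypothetical.

```scheme
(require-extension ssax)  ; assumption: adjust the loading form to your Chicken version

(define parse-elem
  (ssax:make-elem-parser
    ;; new-level-seed: start an empty content list for this element
    (lambda (elem-gi attributes namespaces expected-content seed)
      '())
    ;; finish-element: wrap the collected content and add it to the parent seed
    (lambda (elem-gi attributes namespaces parent-seed seed)
      (cons (cons elem-gi (reverse seed)) parent-seed))
    ;; char-data-handler: collect non-empty strings in reverse order
    (lambda (string1 string2 seed)
      (let ((seed (if (string=? string1 "") seed (cons string1 seed))))
        (if (string=? string2 "") seed (cons string2 seed))))
    ;; pi-handlers: none, written as an unquoted empty list
    ()))

;; The start tag "<doc" has already been consumed up to the name,
;; so the port begins at ">".
(define elem-result
  (parse-elem 'doc (open-input-string ">hi</doc>") #f '() '() #f '()))

;; elem-result should be ((doc "hi")).
```

In normal use the parser produced by ssax:make-parser takes care of reading the start tag and calling the element parser; invoking it by hand as above is mainly useful for experimentation.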

Unicode compatibility

ssax:xml->sxml will convert numeric entities to UTF-8 byte sequences. It does not depend on the utf8 egg for this.
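A small sketch for inspecting this behaviour, assuming a Chicken build in which strings are sequences of bytes. The variable names are illustrative.

```scheme
(require-extension ssax)  ; assumption: adjust the loading form to your Chicken version

;; &#233; is the numeric character reference for U+00E9.
(define entity-sxml
  (ssax:xml->sxml (open-input-string "<a>&#233;</a>") '()))

;; Extract the text content of <a> from (*TOP* (a "...")).
(define entity-text (cadr (cadr entity-sxml)))

;; List the character codes of the resulting string. With byte strings,
;; the UTF-8 encoding of U+00E9 would appear as the two bytes 195 and 169.
(define entity-bytes
  (map char->integer (string->list entity-text)))
```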

Otherwise, UTF-8 operation is not well tested.

Changelog

License

Public Domain