expat
Description
An interface to James Clark's Expat XML parser.
Author
felix winkelmann; ported to Chicken 4 by Shawn Rutledge
Requirements
Documentation
Expat is a stream-oriented parser. You register callback (or handler) functions with the parser and then start feeding it the document. As the parser recognizes parts of the document, it calls the appropriate handler for that part (if you've registered one). The document is fed to the parser in pieces, so you can start parsing before you have the whole document. This also allows you to parse huge documents that won't fit into memory.
If you want to parse an entire document into memory or if you need more bells and whistles, you should take a look at Oleg Kiselyov's SSAX parser.
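The piecewise feeding described above can be sketched as follows, using the procedures documented in the rest of this page. This is only an illustration: the chunk list is a made-up input, and the module is loaded with (use expat) as in the examples further down, which may differ on newer CHICKEN versions.

```scheme
(use expat)  ; module loading as in the examples in this document

(define p (expat:make-parser))
(define tags '())  ; start tags seen so far, most recent first

(expat:set-start-handler! p
  (lambda (tag attrs)
    (set! tags (cons tag tags))))

;; Hypothetical input: one small document cut at arbitrary points,
;; as it might arrive from a port or socket.
(define chunks '("<doc><ite" "m/><item>te" "xt</item></doc>"))

;; Feed the pieces in order; only the last one is marked final.
(let loop ((cs chunks))
  (unless (null? cs)
    (expat:parse p (car cs) final: (null? (cdr cs)))
    (loop (cdr cs))))

(expat:destroy-parser p)
(print (reverse tags))
```

Since final defaults to #t, every piece except the last has to be passed with final: #f, otherwise the parser treats the first chunk as a complete document.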
expat:make-parser
[procedure] (expat:make-parser #!key (encoding #f) (namespaces #f) (namespace-separator #\:))
Creates a parser object with the specified attributes. encoding should be a string designating the encoding of the document and should be one of the following:
- UTF-8
- UTF-16
- ISO-8859-1
- US-ASCII
If no encoding or #f is given, then the encoding specified in the document is used. Note that the strings passed to the handlers are always UTF-8 encoded.
If namespaces is true, then namespace declarations are properly recognized and tags belonging to a namespace will be prefixed with the namespace string and the character given in namespace-separator.
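Because namespace URIs usually contain colons themselves, a handler that wants the namespace and the local name separately has to split the tag at the last separator character. The helper below is a hypothetical sketch (split-expanded-name is not part of this extension) and assumes that tags are delivered in the NAMESPACE + SEPARATOR + LOCALNAME form described above.

```scheme
;; Hypothetical helper, not part of the expat extension.
;; Splits NAME at the last occurrence of SEP and returns (NAMESPACE . LOCAL),
;; or (#f . NAME) if NAME contains no separator.
(define (split-expanded-name name sep)
  (let loop ((i (- (string-length name) 1)))
    (cond ((< i 0) (cons #f name))
          ((char=? (string-ref name i) sep)
           (cons (substring name 0 i)
                 (substring name (+ i 1) (string-length name))))
          (else (loop (- i 1))))))

(print (split-expanded-name "http://www.yo.com:this" #\:))
(print (split-expanded-name "plain" #\:))
```

Passing a separator that cannot occur in a URI, for example namespace-separator: #\|, makes the split unambiguous even when searching from the left.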
expat:make-external-entity-parser
[procedure] (expat:make-external-entity-parser PARSER CONTEXT #!key (encoding #f))
Creates a parser to recursively process external entities.
expat:destroy-parser
[procedure] (expat:destroy-parser PARSER)
Releases the memory resources associated with PARSER.
expat:parse
[procedure] (expat:parse PARSER STRING #!key length (final #t) (external-entities #f))
Parses a piece of an XML document given in STRING. If length is given, it specifies the number of bytes to parse; it defaults to (string-length STRING). If final is true, then the string is the last piece of the document.
Returns #t on success, or signals an exception of the kinds (exn expat). external-entities controls whether parsing of external entities is enabled and can be any of the symbols never, always or unless-standalone; #f and #t are synonyms for never and always, respectively.
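A parse error can be caught with CHICKEN's condition-case form by matching the (exn expat) kinds mentioned above. The following is a minimal sketch: well-formed? is a hypothetical helper, and it is an assumption that the condition carries a message property under the exn kind, which is why a default value is supplied.

```scheme
(use expat)  ; module loading as in the examples in this document

;; Hypothetical helper: returns #t if STR parses as a complete document,
;; and #f if the parser signals an (exn expat) condition.
(define (well-formed? str)
  (let* ((p (expat:make-parser))
         (ok (condition-case (expat:parse p str)
               (e (exn expat)
                  ;; The message property is assumed; "unknown" is the fallback.
                  (print "parse error: "
                         (get-condition-property e 'exn 'message "unknown"))
                  #f))))
    ;; Release the parser whether or not parsing succeeded.
    (expat:destroy-parser p)
    ok))

(print (well-formed? "<a><b/></a>"))
(print (well-formed? "<a><b></a>"))
```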
expat:set-start-handler!
[procedure] (expat:set-start-handler! PARSER PROCEDURE)
Sets the handler to process start (and empty) tags. PROCEDURE will be called with two arguments: the tag (a string) and a list of pairs, where each pair is of the form (ATTRIBUTENAME . ATTRIBUTEVALUE) (both strings).
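Since the attribute list is an association list keyed by strings, a single attribute can be looked up with assoc. The sketch below collects the href attributes of a elements; the element and attribute names and the sample input are made up for illustration.

```scheme
(use expat)  ; module loading as in the examples in this document

(define links '())  ; collected href values, most recent first
(define p (expat:make-parser))

(expat:set-start-handler! p
  (lambda (tag attrs)
    ;; Attribute names are strings, so compare with assoc (equal?), not assq.
    (let ((href (assoc "href" attrs)))
      (when (and (string=? tag "a") href)
        (set! links (cons (cdr href) links))))))

(expat:parse p "<p><a href='x.html'>x</a><a name='n'/><a href='y.html'/></p>")
(expat:destroy-parser p)
(print (reverse links))
```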
expat:set-end-handler!
[procedure] (expat:set-end-handler! PARSER PROCEDURE)
Sets the handler to process end (and empty) tags. PROCEDURE will be called with one argument: the tag (a string).
expat:set-character-data-handler!
[procedure] (expat:set-character-data-handler! PARSER PROCEDURE)
Sets the handler to process text. PROCEDURE will be called with one argument: a string containing a piece of text. Note that a single block of contiguous text free of markup may still result in a sequence of calls to this handler.
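Because one run of text may arrive in several calls, handlers that need the complete text of an element usually collect the pieces and join them when the end tag is seen. The sketch below shows one way to do this for leaf elements; it resets the buffer on every start tag, so text interleaved with child elements is not handled, and the sample input is made up.

```scheme
(use expat)  ; module loading as in the examples in this document

(define p (expat:make-parser))
(define pieces '())   ; text fragments of the current element, most recent first
(define results '())  ; (TAG . TEXT) pairs, most recent first

(expat:set-start-handler! p
  (lambda (tag attrs) (set! pieces '())))

(expat:set-character-data-handler! p
  (lambda (text) (set! pieces (cons text pieces))))

(expat:set-end-handler! p
  (lambda (tag)
    ;; Join the fragments in document order and start a fresh buffer.
    (set! results
      (cons (cons tag (apply string-append (reverse pieces))) results))
    (set! pieces '())))

(expat:parse p "<r><t>a &amp; b</t><u>x&lt;y</u></r>")
(expat:destroy-parser p)
(print (reverse results))
```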
expat:set-processing-instruction-handler!
[procedure] (expat:set-processing-instruction-handler! PARSER PROCEDURE)
Sets the handler for processing instructions. PROCEDURE will be called with two arguments: target and data (both strings). The target is the first word in the processing instruction. The data is the rest of the characters in it, after skipping all whitespace that follows the initial word.
expat:set-comment-handler!
[procedure] (expat:set-comment-handler! PARSER PROCEDURE)
Sets the handler to process comments. PROCEDURE will be called with one argument: all the text inside the comment delimiters.
expat:set-external-entity-ref-handler!
[procedure] (expat:set-external-entity-ref-handler! PARSER PROCEDURE)
Sets the handler for references to external entities. PROCEDURE will be called with five arguments: the parser, the context, the URI base, the system ID and the public ID. The first argument is an expat:parser record, and the rest are strings. To parse the external entity, create a parser with expat:make-external-entity-parser.
Examples
A silly example:
(use expat)

(define text #<<EOF
<?xml version='1.0'?>
<!-- a comment -->
<?pi1 yepyepyep?>
<yo:this yo='abc' xmlns:yo="http://www.yo.com">
&gt;&lt;Ā
<yo:test>yes, no, !<is/><a/>
</yo:test>some more text
</yo:this>
EOF
)

(define p (expat:make-parser namespaces: #t))

(expat:set-start-handler! p
  (lambda (tag attrs)
    (print "Start: " tag " - " attrs)))
(expat:set-end-handler! p
  (lambda (tag)
    (print "End: " tag)))
(expat:set-character-data-handler! p
  (lambda (text)
    (pp (string->list text))))
(expat:set-processing-instruction-handler! p
  (lambda (target text)
    (print "PI: " target " - " text)))
(expat:set-comment-handler! p
  (lambda (text)
    (print "Comment: " text)))

(expat:parse p text)
(expat:destroy-parser p)
This will output:
Comment: a comment
PI: pi1 - yepyepyep
Start: http://www.yo.com:this - ((yo . abc))
(#\newline)
(#\>)
(#\<)
(#\Ä #\)
(#\newline)
(#\space)
Start: http://www.yo.com:test - ()
(#\y #\e #\s #\, #\space #\n #\o #\, #\space)
(#\!)
Start: is - ()
End: is
Start: a - ()
End: a
(#\newline)
(#\space)
End: http://www.yo.com:test
(#\s #\o #\m #\e #\space #\m #\o #\r #\e #\space #\t #\e #\x #\t)
(#\newline)
End: http://www.yo.com:this
Another example that uses DTDs:
Say we have a file foo.xml:
<?xml version="1.0"?>
<!DOCTYPE foo SYSTEM "foo.dtd">
<foo>
&abcdef;
</foo>
and another one called foo.dtd:
<!ENTITY abcdef "this is a test">
(use utils expat)

(define p (expat:make-parser))

(expat:set-start-handler! p
  (lambda (tag attrs)
    (print "Start: " tag " - " attrs)))
(expat:set-end-handler! p
  (lambda (tag)
    (print "End: " tag)))
(expat:set-character-data-handler! p
  (lambda (text)
    (pp (string->list text))))

;; The handler receives the parser as its first argument, followed by
;; the context, URI base, system ID and public ID.
(expat:set-external-entity-ref-handler! p
  (lambda (parser context base sys pub)
    (print "external: " sys)
    ;; Parse the entity with a sub-parser and return its result.
    (let* ([p2 (expat:make-external-entity-parser parser context)]
           [s (expat:parse p2 (read-all "foo.dtd"))])
      (expat:destroy-parser p2)
      s)))

(expat:parse p (read-all "foo.xml") external-entities: #t)
(expat:destroy-parser p)
Changelog
- 2.0 Ported to Chicken 5, with several backward-incompatible changes
- 1.4 Ported to Chicken 4
- 1.3 Removed use of ___callback
- 1.2 Works with externalized easyffi extension
- 1.1 Added support for parsing external entities; optional arguments to expat:parse are now keyword arguments.
- 1.0 Initial release
License
Copyright (c) 2005, Felix L. Winkelmann
All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
- Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
- Neither the name of the author nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDERS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.