This is version 0.2 of the sxpath extension library for Chicken Scheme.

Sxpath

The sxpath parts of the sxml-tools from the SSAX project at Sourceforge. This includes the DDO (Distinct Document Order) and context-based versions of sxpath, as well as txpath support.

Documentation

This egg provides the sxpath-related tools from the sxml-tools available in the SSAX/SXML Sourceforge project.

It is split into several modules:

sxpath, context-sxpath, ddo-sxpath, txpath, xpath-parser, sxpath-lolevel, context-sxpath-lolevel and ddo-sxpath-lolevel.

The lolevel modules expose the full list of accessors, constructors and predicates used to manually traverse an SXML tree. The higher-level sxpath modules only expose a handful of procedures, which comprise the high-level interface you'd normally need to use.

Much documentation is available at Lisovsky's XML page and the SSAX homepage.

The initial documentation on this wiki page came straight from the comments in the extremely well-documented source code. It's recommended you read the code if you want to learn more.

If you're not familiar with regular XPath, the sxpath documentation may be a bit confusing. Try this quick tutorial to get up to speed with XPath.

sxpath

This is the preferred interface to use. It allows you to query the SXML document tree using an s-expression based language, in which you can also use arbitrary procedures and even "classic" textual XPath (see below for docs on that).

A complete description of how to use this is outside the scope of this egg's documentation. See the introduction to SXPath for that.

[procedure] (sxpath path [ns-binding])

Returns a procedure that accepts an SXML document tree and an optional association list of variables and returns a nodeset (list of nodes) that match the path expression.

The optional ns-binding argument is an alist of namespace bindings. It is used to map abbreviated namespace prefixes to full URI strings but only for textual XPath strings embedded in the path expression.

The optional association list of variables must include all the variables defined by the sxpath expression.

It can be useful to compare the following examples to those for txpath.

(use sxpath)

;; selects all the 'item' elements that have an 'olist' parent
;; (which is not root) and that are in the same document as the context node
((sxpath `(// olist item))
 '(doc (olist (item "1")) (item "2") (nested (olist (item "3")))))
 => ((item "1") (item "3"))


(use sxpath-lolevel data-structures)

;; selects only the nth 'item' element under each 'olist' parent
;; (which is not root) and that is in the same document as the context node
;; The n is parameterized to be the first item
;; The node-pos function comes from sxpath-lolevel and implements txpath position selector [$n]
((sxpath `(// olist ((item ,(lambda (nodeset var-binding) ((node-pos (alist-ref 'n var-binding)) nodeset))))))
 '(doc (olist (item "1") (item "2")) (nested (olist (item "3")))) '((n . 1)))
 => ((item "1") (item "3"))

;; selects the 'chapter' children of the context node that have one or
;; more 'title' children with string-value equal to 'Introduction'
((sxpath '((chapter ((equal? (title "Introduction"))))))
 '(text  (chapter (title "Introduction"))  (chapter "No title for this chapter")  (chapter (title "Conclusion"))))
 => ((chapter (title "Introduction")))

;; (sxpath string-expr) is equivalent to (txpath string-expr)
((sxpath "chapter[title='Introduction']")
 '(text  (chapter (title "Introduction"))  (chapter "No title for this chapter")  (chapter (title "Conclusion"))))
 => ((chapter (title "Introduction")))
[procedure] (if-sxpath path)

Like sxpath, only returns #f instead of the empty list if nothing matches (so it does not always return a nodeset).

[procedure] (car-sxpath path)

Like sxpath, but instead of a nodeset it returns the first node found. If no node was found, it returns the empty list.

[procedure] (if-car-sxpath path)

Like car-sxpath, only returns #f instead of the empty list if nothing matches.
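
The four variants differ only in what they return for a match and for a miss. The following is a minimal sketch of that difference, based on the descriptions above; the document `doc` and the non-matching name `ulist` are made up for illustration.

```scheme
(use sxpath)

;; Illustrative document: one 'item' under 'olist', one directly under 'doc'.
(define doc '(doc (olist (item "1")) (item "2")))

;; sxpath always yields a nodeset; it is empty when nothing matches.
((sxpath '(olist item)) doc)         ; => ((item "1"))
((sxpath '(ulist item)) doc)         ; => ()

;; if-sxpath yields #f instead of the empty nodeset.
((if-sxpath '(ulist item)) doc)      ; => #f

;; car-sxpath yields the first matching node instead of a nodeset.
((car-sxpath '(olist item)) doc)     ; => (item "1")
((car-sxpath '(ulist item)) doc)     ; => ()

;; if-car-sxpath yields the first node, or #f when there is none.
((if-car-sxpath '(ulist item)) doc)  ; => #f
```

The `if-` variants are convenient inside `and`, `or` and `cond` clauses, where an empty list would otherwise count as true.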

[procedure] (sxml:id-alist node . lpaths)

Builds an index as a list of (ID_value . element) pairs for the given node. lpaths are location paths for attributes of type ID (i.e., sxpath expressions that tell it how to find the ID attribute).

Note: location paths must be of the form (expr '@ attrib-name).

See also sxml:lookup below, in sxpath-lolevel, which can use this index.


;; Obtain ID values for a regular XHTML DOM
(sxml:id-alist
 '(div (h1 (@ (id "info"))
           "A story")
       (p (@ (id "story-body"))
	  "Once upon a time")
       (a (@ (id "back") (href "../index.xml"))
	  "click here to go back"))
 '(* @ id))
 => (("info" h1 (@ (id "info")) "A story")
     ("story-body" p (@ (id "story-body")) "Once upon a time")
     ("back" a (@ (id "back") (href "../index.xml")) "click here to go back"))

;; In an alternate reality, where links are uniquely identified
;; by their href, we would use this
(sxml:id-alist
 '(div (h1 (@ (id "info"))
	   "A story")
       (p (@ (id "story-body"))
	  "Once upon a time")
       (a (@ (id "back") (href "../index.xml"))
	  "click here to go back"))
 '(h1 @ foo) '(a @ href))
 => (("../index.xml" . (a (@ (id "back")
                             (href "../index.xml"))
                          "click here to go back")))

txpath

This section documents the txpath interface. This interface is mostly useful for programs that deal exclusively with "legacy" textual XPath queries.

Primary interface

The following procedures are the main interface one would use in practice. There are also more low-level procedures (see next section), which one could use to build txpath extensions.

[procedure] (sxml:xpath string . ns-binding)
[procedure] (txpath string . ns-binding)
[procedure] (sxml:xpath+root string . ns-binding)
[procedure] (sxml:xpath+root+vars string . ns-binding)

Returns a procedure that accepts an SXML document tree and an optional association list of variable bindings and returns a nodeset (list of nodes) that match the XPath expression string.

The optional ns-binding argument is an alist of namespace bindings. It is used to map abbreviated namespace prefixes to full URI strings.

(txpath x) is equivalent to (sxpath x) whenever x is a string. The txpath, sxml:xpath+root and sxml:xpath+root+vars procedures are currently all aliases for sxml:xpath, which exist for backwards compatibility reasons.

It's useful to compare the following examples to the above examples for sxpath.

(use txpath)

;; selects all the 'item' elements that have an 'olist' parent
;; (which is not root) and that are in the same document as the context node
((txpath "//olist/item")
 '(doc (olist (item "1")) (item "2") (nested (olist (item "3")))))
 => ((item "1") (item "3"))

;; Same example as above, but now with a namespace prefix of 'x',
;; which is bound to the namespace "bar" in the ns-binding parameter.
((txpath "//x:olist/item" '((x . "bar")))
 '(doc (bar:olist (item "1")) (item "2") (nested (olist (item "3")))))
 => ((item "1"))


(use sxpath sxpath-lolevel data-structures)

;; selects only the nth 'item' element under each 'olist' parent
;; (which is not root) and that is in the same document as the context node
;; The n is parameterized to be the first item
((txpath "//olist/item[$n]")
 '(doc (olist (item "1") (item "2")) (nested (olist (item "3")))) '((n . 1)))
 => ((item "1") (item "3"))

;; selects the 'chapter' children of the context node that have one or
;; more 'title' children with string-value equal to 'Introduction'
((txpath "chapter[title='Introduction']")
 '(text  (chapter (title "Introduction"))  (chapter "No title for this chapter")  (chapter (title "Conclusion"))))
 => ((chapter (title "Introduction")))
[procedure] (sxml:xpath+index string . ns-binding)

This procedure returns the result of sxml:xpath consed onto #t. If the sxml:xpath would return #f, this returns #f instead.

It is provided solely for backwards compatibility.

[procedure] (sxml:xpointer string . ns-binding)
[procedure] (sxml:xpointer+root+vars string . ns-binding)

Returns a procedure that accepts an SXML document tree and returns a nodeset (list of nodes) that match the XPointer expression string.

The optional ns-binding argument is an alist of namespace bindings. It is used to map abbreviated namespace prefixes to full URI strings.

Currently, only the XPointer xmlns() and xpointer() schemes are implemented; the element() scheme is not.

;; selects all the 'item' elements that have an 'olist' parent
;; (which is not root) and that are in the same document as the context node.
;; Equivalent to (txpath "//olist/item").
((sxml:xpointer "xpointer(//olist/item)")
 '(doc (olist (item "1")) (item "2") (nested (olist (item "3")))))
 => ((item "1") (item "3"))

;; An example with a namespace prefix, now using the XPointer xmlns()
;; function instead of the ns-binding parameter. An xmlns() part always has
;; a full namespace name on its right-hand side, never a bound shortcut.
((sxml:xpointer "xmlns(x=bar)xpointer(//x:olist/item)")
 '(doc (bar:olist (item "1")) (item "2") (nested (olist (item "3")))))
 => ((item "1"))
[procedure] (sxml:xpointer+index string . ns-binding)

This procedure returns the result of sxml:xpointer consed onto #t. If the sxml:xpointer would return #f, this returns #f instead.

It is provided solely for backwards compatibility.

[procedure] (sxml:xpath-expr string . ns-binding)

Returns a procedure that accepts an SXML node and returns #t if the node matches the string expression. This is an expression of type Expr, which is whatever you can put in a predicate (between square brackets after a node name).

The optional ns-binding argument is an alist of namespace bindings. It is used to map abbreviated namespace prefixes to full URI strings.

;; Does the node have a class attribute with "content" as value?
((sxml:xpath-expr "@class=\"content\"")
 '(div (@ (class "content")) (p "Lorem ipsum")))
 => #t

;; Does the node have a paragraph with string value of "Lorem ipsum"?
((sxml:xpath-expr "p=\"Lorem ipsum\"")
 '(div (@ (class "content")) (p "Lorem ipsum")))
 => #t

;; Does the node have a "p" child node with string value of "Blah"?
((sxml:xpath-expr "p=\"Blah\"")
 '(div (@ (class "content")) (p "Lorem ipsum")))
 => #f

XPath function library

The procedures documented in this section can be used to implement a custom XPath traverser. Unlike the sxpath low-level procedures, they are not split off into a separate library: they live in the same file as the high-level procedures, so splitting them off would not reduce the library size. When importing the txpath module you can simply leave these procedures out.

These procedures implement the core XPath functions, as described in The XPath specification, section 4.

All of the following procedures return procedures that accept 4 arguments, which together make up (part of) the XPath context:

 (lambda (nodeset root-node context var-binding) ...)

The nodeset argument is the nodeset (a list of nodes) that is currently under consideration. The root-node argument is a nodeset containing only one element: the root node of the document. The context argument is a list of two numbers: the position and the size of the context. The var-binding argument is an alist of XPath variable bindings.

The arguments to each of these core procedures, if any, are all procedures of the same type as they return. For example, sxml:core-local-name accepts an optional procedure which accepts a nodeset, a root-node, a context, a var-binding and returns a nodeset. Of this nodeset, the local part of the name of the first node (if any) is returned. The values for each of these arguments are just those passed to sxml:core-local-name.

Node set functions

[procedure] (sxml:core-last)
[procedure] (sxml:core-position)
[procedure] (sxml:core-count node-set)
[procedure] (sxml:core-id object)
[procedure] (sxml:core-local-name [node-set])
[procedure] (sxml:core-namespace-uri [node-set])
[procedure] (sxml:core-name [node-set])

String functions

[procedure] (sxml:core-string [object])
[procedure] (sxml:core-concat [string ...])
[procedure] (sxml:core-starts-with string prefix)
[procedure] (sxml:core-contains string substring)
[procedure] (sxml:core-substring-before string separator)
[procedure] (sxml:core-substring-after string separator)
[procedure] (sxml:core-substring string numeric-offset [length])
[procedure] (sxml:core-string-length [string])
[procedure] (sxml:core-normalize-space [string])
[procedure] (sxml:core-translate string from to)

Boolean functions

[procedure] (sxml:core-boolean object)
[procedure] (sxml:core-not boolean)
[procedure] (sxml:core-true)
[procedure] (sxml:core-false)
[procedure] (sxml:core-lang lang-code)

Number functions

[procedure] (sxml:core-number [object])
[procedure] (sxml:core-sum node-set)
[procedure] (sxml:core-floor number)
[procedure] (sxml:core-ceiling number)
[procedure] (sxml:core-round number)

Parameter list

[constant] sxml:classic-params

This is a very long list of parameters containing parser and traversal information for the textual XPath parser engine. This corresponds to the "function library" mentioned in the introduction of the XPath spec. You will have to read the source code for details on how exactly to use it.

sxpath-lolevel

This section documents the low-level sxpath interface. It includes mostly-generic list and SXML operators. This is equivalent to the "low-level sxpath interface" described at the introduction to SXPath.

These utilities are useful when you want to query SXML document trees, but full sxpath would be overkill. Most of these procedures are faster than their sxpath equivalent, because they are very specific. But this also means they are very low-level, so you should use them only if you know what you're doing.

Predicates

[procedure] (sxml:empty-element? obj)

Predicate which returns #t if the given element obj is empty. Empty elements have no nested elements, text nodes, PIs, comments or entities, but may contain attributes or a namespace-id. It is the SXML counterpart of the XML empty-element.

[procedure] (sxml:shallow-normalized? obj)

Returns #t if the given obj is a shallow-normalized SXML element. The element itself has to be normalized but its nested elements are not tested.

[procedure] (sxml:normalized? obj)

Returns #t if the given obj is a normalized SXML element. The element itself and all its nested elements have to be normalized.

[procedure] (sxml:shallow-minimized? obj)

Returns #t if the given obj is a shallow-minimized SXML element. The element itself has to be minimized but its nested elements are not tested.

[procedure] (sxml:minimized? obj)

Returns #t if the given obj is a minimized SXML element. The element itself and all its nested elements have to be minimized.

Accessors

These procedures obtain information about nodes, or their direct children. They don't traverse subtrees.

Normalization-independent accessors

These accessors can be used on arbitrary, non-normalized SXML trees. Because of this, they are generally slower than the normalization-dependent variants listed in the next section.

[procedure] (sxml:name node)

Returns the name of the given SXML node. It is introduced for the sake of encapsulation.

[procedure] (sxml:element-name obj)

A checked version of sxml:name, which returns #f if the given obj is not an SXML element. Otherwise returns its name.

[procedure] (sxml:node-name obj)

Safe version of sxml:name, which returns #f if the given obj is not an SXML node. Otherwise returns its name.

The difference between this and sxml:element-name is that a node can be one of @, @@, *PI*, *COMMENT* or *ENTITY* while an element must be a real element (any symbol not in that set is considered to be an element).

[procedure] (sxml:ncname node)

Like sxml:name, except returns only the local part of the name (called an "NCName" in the XML namespaces spec).

The node's name is interpreted as a "Qualified Name": a colon-separated name of which the last component is considered to be the local part. If the name contains no colons, the name itself is returned.

Important: Please note that while an SXML name is a symbol, this function returns a string.

[procedure] (sxml:name->ns-id sxml-name)

Given a node name, return the namespace part of the name (called a namespace-id). If the name contains no colons, returns #f. See sxml:ncname for more info.

Important: Please note that while an SXML name is a symbol, this function returns a string.
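
As a small sketch of how the name accessors relate to each other, the following applies them to a made-up element with a qualified name. The element `part` is purely illustrative; the results follow from the descriptions above (a symbol for sxml:name, strings for the other two).

```scheme
(use sxpath-lolevel)

;; Illustrative element whose name carries the namespace-id "c".
(define part '(c:part (@ (id "p1")) "wheel"))

(sxml:name part)            ; => c:part   (a symbol)
(sxml:ncname part)          ; => "part"   (a string: the local part)
(sxml:name->ns-id 'c:part)  ; => "c"      (a string: the namespace-id)
(sxml:name->ns-id 'part)    ; => #f       (no colon, so no namespace-id)
```

Note that sxml:ncname takes a node while sxml:name->ns-id takes the name itself, so the two are typically combined as `(sxml:name->ns-id (sxml:name node))`.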

[procedure] (sxml:content obj)

Retrieve the contents of an SXML element or nodeset. Any non-element nodes (attributes, processing instructions, etc) are discarded, while the elements and text nodes are returned as a list of strings and nested elements in document order. This list is empty if obj is an empty element or empty list.

The inner elements are unmodified so they still contain attributes, but also comments or other non-element nodes.

(sxml:content
  '(div (@ (class "content"))
        (*COMMENT* "main contents start here")
         "The document moved "
	 (a (@ (href "/other.xml")) "here")))
 => ("The document moved " (a (@ (href "/other.xml")) "here"))
[procedure] (sxml:text node)

Returns a string which combines all the character data from text node children of the given SXML element or "" if there are no text node children. Note that it does not include text from descendant nodes, only direct children.

(sxml:text
  '(div (@ (class "content"))
        (*COMMENT* "main contents start here")
         "The document moved "
	 (a (@ (href "/other.xml")) "here")))
 => "The document moved "

Normalization-dependent accessors

"Universal" accessors are less effective but may be used for non-normalized SXML. These safe accessors are named with suffix '-u' for "universal".

"Fast" accessors are optimized for normalized SXML data. They are not applicable to arbitrary non-normalized SXML data. Their names have no specific suffixes.

[procedure] (sxml:content-raw obj)

Returns all the content of normalized SXML element except attr-list and aux-list. Thus it includes PI, COMMENT and ENTITY nodes as well as TEXT and ELEMENT nodes returned by sxml:content. Returns a list of nodes in document order or empty list if obj is an empty element or an empty list.

This function is faster than sxml:content.

[procedure] (sxml:attr-list-u obj)

Returns the list of attributes for given element or nodeset. Analog of ((sxpath '(@ *)) obj). Empty list is returned if there is no list of attributes.

[procedure] (sxml:aux-list obj)
[procedure] (sxml:aux-list-u obj)

Returns the list of auxiliary nodes for given element or nodeset. Analog of ((sxpath '(@@ *)) obj). Empty list is returned if a list of auxiliary nodes is absent.

[procedure] (sxml:aux-node obj aux-name)

Returns the first aux-node named aux-name in the SXML element obj, or #f if such a node is absent.

NOTE: it returns just the first node found even if multiple nodes are present, so it's mostly intended for nodes with unique names. Use sxml:aux-nodes if you want all of them.

[procedure] (sxml:aux-nodes obj aux-name)
  

Returns a list of the aux-nodes named aux-name in the SXML element obj, or '() if there are none.

[procedure] (sxml:attr obj attr-name)

Returns the value of the attribute with name attr-name in the given SXML element obj, or #f if no such attribute exists.

[procedure] (sxml:attr-from-list attr-list name)

Returns the value of the attribute called name in the given list of attributes attr-list, or #f if no such attribute exists. The list of attributes can be obtained from an element using the sxml:attr-list procedure.

[procedure] (sxml:num-attr obj attr-name)

Returns the value of the numerical attribute with name attr-name in the given SXML element obj, or #f if no such attribute exists. This value is converted from a string to a number.
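
The attribute accessors can be sketched as follows, using a made-up, normalized `img` element (the attribute list is the first thing after the element name). The element and its attribute values are illustrative only.

```scheme
(use sxpath-lolevel)

;; Illustrative normalized element with two attributes.
(define img '(img (@ (src "logo.png") (width "40"))))

(sxml:attr img 'src)        ; => "logo.png"
(sxml:attr img 'alt)        ; => #f   (no such attribute)
(sxml:num-attr img 'width)  ; => 40   (a number, not the string "40")
```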

[procedure] (sxml:attr-u obj attr-name)

Accessor for the attribute attr-name of the given SXML element obj, which may also be an attribute list or a nodeset (usually the content of an SXML element).

[procedure] (sxml:ns-list obj)

Returns the list of namespaces for given element. Analog of ((sxpath '(@@ *NAMESPACES* *)) obj). The empty list is returned if there are no namespaces.

[procedure] (sxml:ns-id->nodes obj namespace-id)

Returns a list of namespace information lists that match the given namespace-id in SXML element obj. Analog of ((sxpath '(@@ *NAMESPACES* namespace-id)) obj). The empty list is returned if there is no namespace with the given namespace-id.

(sxml:ns-id->nodes
  '(c:part (@) (@@ (*NAMESPACES* (c "http://www.cars.com/xml")))) 'c)
 => ((c "http://www.cars.com/xml"))
[procedure] (sxml:ns-id->uri obj namespace-id)

Returns the URI for the (first) namespace matching the given namespace-id, or #f if no namespace matches the given namespace-id.

(sxml:ns-id->uri
  '(c:part (@) (@@ (*NAMESPACES* (c "http://www.cars.com/xml")))) 'c)
 => "http://www.cars.com/xml"
[procedure] (sxml:ns-uri->nodes obj uri)

Returns a list of namespace information lists that match the given uri in SXML element obj.

(sxml:ns-uri->nodes
  '(c:part (@) (@@ (*NAMESPACES* (c "http://www.cars.com/xml")
                                 (d "http://www.cars.com/xml"))))
  "http://www.cars.com/xml")
 => ((c "http://www.cars.com/xml") (d "http://www.cars.com/xml"))
[procedure] (sxml:ns-uri->id obj uri)

Returns the namespace id for the (first) namespace matching the given uri, or #f if no namespace matches the given uri.

(sxml:ns-uri->id
  '(c:part (@) (@@ (*NAMESPACES* (c "http://www.cars.com/xml")
                                 (d "http://www.cars.com/xml"))))
  "http://www.cars.com/xml")
 => c
[procedure] (sxml:ns-id ns-list)

Given a namespace information list ns-list, returns the namespace ID.

[procedure] (sxml:ns-uri ns-list)

Given a namespace information list ns-list, returns the namespace URI.

[procedure] (sxml:ns-prefix ns-list)

Given a namespace information list ns-list, returns the namespace prefix if it is present in the list. If it's not present, returns the namespace ID.

Data modification procedures

Constructors and mutators for normalized SXML data

Important: These functions are optimized for normalized SXML data. They are not applicable to arbitrary non-normalized SXML data.

Most of the functions are provided in two variants:

  1. Side-effect intended functions for linear update of given elements. Their names end with an exclamation mark.
  2. Pure functions without side-effects which return modified elements.
[procedure] (sxml:change-content! obj new-content)
[procedure] (sxml:change-content obj new-content)

Change the content of given SXML element obj to new-content. If new-content is an empty list then the obj is transformed to an empty element. The resulting SXML element is normalized.

[procedure] (sxml:change-attrlist obj new-attrlist)
[procedure] (sxml:change-attrlist! obj new-attrlist)

Change the attribute list of the given SXML element obj to new-attrlist.

[procedure] (sxml:change-name obj new-name)
[procedure] (sxml:change-name! obj new-name)

Change the name of the given SXML element obj to new-name.

[procedure] (sxml:add-attr obj attr)
[procedure] (sxml:add-attr! obj attr)

Returns the given SXML element obj with the attribute attr added to the attribute list, or #f if the attribute already exists.

[procedure] (sxml:change-attr obj attr)
[procedure] (sxml:change-attr! obj attr)

Returns SXML element obj with changed value of attribute attr, or #f if there is no attribute with the given name.

attr is a list like it would occur as a member of an attribute list: (attr-name attr-value).

  
[procedure] (sxml:set-attr obj attr)
[procedure] (sxml:set-attr! obj attr)

Returns SXML element obj with changed value of attribute attr. If there is no such attribute the new one is added.

attr is a list like it would occur as a member of an attribute list: (attr-name attr-value).
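
The pure variants described above can be sketched as follows. The element `el` is made up for illustration, and only the side-effect-free procedures are used, so the original element stays untouched. The results of sxml:set-attr are read back through sxml:attr rather than printed, because this sketch makes no assumption about the order of the attributes in the returned element.

```scheme
(use sxpath-lolevel)

;; Illustrative normalized element.
(define el '(div (@ (id "x")) "hi"))

;; Pure variant: returns a new element and leaves 'el' as it was.
(define renamed (sxml:change-name el 'span))
renamed                       ; => (span (@ (id "x")) "hi")
el                            ; => (div (@ (id "x")) "hi")

;; sxml:add-attr refuses to touch an attribute that already exists.
(sxml:add-attr el '(id "y"))  ; => #f

;; sxml:set-attr changes an existing attribute or adds a missing one.
(sxml:attr (sxml:set-attr el '(id "y")) 'id)        ; => "y"
(sxml:attr (sxml:set-attr el '(class "c")) 'class)  ; => "c"
```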

[procedure] (sxml:add-aux obj aux-node)
[procedure] (sxml:add-aux! obj aux-node)

Returns SXML element obj with an auxiliary node aux-node added.

[procedure] (sxml:squeeze obj)
[procedure] (sxml:squeeze! obj)

Returns a minimized and normalized SXML element obj with empty lists of attributes and aux-lists eliminated, in obj and all its descendants.

  
[procedure] (sxml:clean obj)

Returns a minimized and normalized SXML element obj with empty lists of attributes and all aux-lists eliminated, in obj and all its descendants.

[procedure] (select-first-kid test-pred?)

Given a node, return the first child that satisfies the test-pred?. Given a nodeset, traverse the set until a node is found whose first child matches the predicate. Returns #f if there is no such child to be found.

[procedure] (sxml:node-parent rootnode)

Returns a function of one argument - an SXML element - which returns its parent node using *PARENT* pointer in the aux-list. '*TOP-PTR* may be used as a pointer to root node. It returns an empty list when applied to the root node.

[procedure] (sxml:add-parents obj [top-ptr])

Returns the SXML element obj annotated with *PARENT* pointers for obj and all its descendants. If obj is not the root node (a node with a name of *TOP*), you must pass in the parent pointer for obj as top-ptr.

Warning: This procedure mutates its obj argument.

[procedure] (sxml:lookup id index)

Lookup an element using its ID. index should be an alist of (id . element).
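
A minimal sketch of a lookup follows. The index here is written out by hand in the (id . element) shape described above, so the example does not depend on how the index was produced; the element `story` is illustrative.

```scheme
(use sxpath-lolevel)

;; Illustrative element and a hand-built index of (id . element) pairs.
(define story '(p (@ (id "story-body")) "Once upon a time"))
(define index (list (cons "story-body" story)))

(sxml:lookup "story-body" index)
;; => (p (@ (id "story-body")) "Once upon a time")
```

In practice the index would normally come from a procedure such as sxml:id-alist rather than being written by hand.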

Markup generation

XML
[procedure] (sxml:attr->xml attr)

Returns a list containing tokens that when joined together form the attribute's XML output.

Warning: This procedure assumes that the attribute's values have already been escaped (ie, sxml:string->xml has been called on the strings inside it).

(sxml:attr->xml '(href "http://example.com"))
 => (" " "href" "='" "http://example.com" "'")
[procedure] (sxml:string->xml string)

Escape the string so it can be used anywhere in XML output. This converts the <, >, ', " and & characters to their respective entities.

[procedure] (sxml:sxml->xml tree)

Convert the tree of SXML nodes to a nested list of XML fragments. These fragments can be output by flattening the list and concatenating the strings inside it.
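
The flattening and concatenating step can be sketched with a small helper. `fragments->string` is a hypothetical name and is not part of the egg; the sketch assumes that the fragment list contains nothing but strings and (possibly nested) lists, and it signals an error for anything else.

```scheme
(use sxpath-lolevel)

;; Hypothetical helper (not part of the egg): walks a nested list of
;; fragments depth-first and concatenates every string it finds.
(define (fragments->string frags)
  (cond ((string? frags) frags)
        ((null? frags) "")
        ((pair? frags)
         (string-append (fragments->string (car frags))
                        (fragments->string (cdr frags))))
        (else (error "unexpected fragment" frags))))

;; On a hand-written nested fragment list:
(fragments->string '("<" "p" (">" ("hi")) "</" "p" ">"))  ; => "<p>hi</p>"

;; On the output of sxml:sxml->xml (exact nesting not shown here):
(display (fragments->string (sxml:sxml->xml '(p "Once upon a time"))))
(newline)
```

For large documents it may be preferable to walk the fragment list and write each string to the output port directly, which avoids building one big intermediate string.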

HTML

[procedure] (sxml:attr->html attr)

Returns a list containing tokens that when joined together form the attribute's HTML output. The difference with the XML variant is that this encodes empty attribute values to attributes with no value (think selected in option elements, or checked in checkboxes).

Warning: This procedure assumes that the attribute's values have already been escaped (ie, sxml:string->html has been called on the strings inside it).

[procedure] (sxml:string->html string)

Escape the string so it can be used anywhere in HTML output. This converts the <, >, " and & characters to their respective entities.

[procedure] (sxml:non-terminated-html-tag? tag)

Is the named tag one that is "self-closing" (ie, does not need to be terminated) in HTML 4.0?

[procedure] (sxml:sxml->html tree)

Convert the tree of SXML nodes to a nested list of HTML fragments. These fragments can be output by flattening the list and concatenating the strings inside it.

Procedures from sxpathlib

Basic converters and applicators

A converter is a function

 type Converter = Node|Nodelist -> Nodelist

A converter can also play a role of a predicate: in that case, if a converter, applied to a node or a nodelist, yields a non-empty nodelist, the converter-predicate is deemed satisfied. Throughout this file a nil nodelist is equivalent to #f in denoting a failure.

[procedure] (nodeset? obj)

Returns #t if obj is a nodelist.

[procedure] (as-nodeset obj)

If obj is a nodelist, returns it as is; otherwise wraps it in a list.

Node test

The following functions implement 'Node test's as defined in Sec. 2.3 of the XPath document. A node test is one of the components of a location step. It is also a converter-predicate in SXPath.

[procedure] (sxml:element? obj)

Predicate which returns #t if obj is an SXML element, otherwise #f.

[procedure] (ntype-names?? crit)

Takes a list of acceptable node names as a criterion and returns a function, which, when applied to a node, will return #t if the node name is present in criterion list and #f otherwise.

  ntype-names?? :: ListOfNames -> Node -> Boolean
[procedure] (ntype?? crit)

Takes a type criterion and returns a function, which, when applied to a node, will tell if the node satisfies the test.

 ntype?? :: Crit -> Node -> Boolean

The criterion crit is one of the following symbols:

@
    tests if the Node is an attributes-list
*
    tests if the Node is an Element
*text*
    tests if the Node is a text node
*data*
    tests if the Node is a data node (text, number, boolean, etc., but not pair)
*PI*
    tests if the Node is a processing instructions node
*COMMENT*
    tests if the Node is a comment node
*ENTITY*
    tests if the Node is an entity node
*any*
    #t for any type of Node
other symbol
    tests if the Node has the right name given by the symbol

((ntype?? 'div) '(div (@ (class "greeting")) "hi"))
 => #t

((ntype?? 'div) '(span (@ (class "greeting")) "hi"))
 => #f

((ntype?? '*) '(span (@ (class "greeting")) "hi"))
 => #t
  
[procedure] (ntype-namespace-id?? ns-id)

This function takes a namespace-id, and returns a predicate Node -> Boolean, which is #t for nodes with the given namespace id. ns-id is a string. (ntype-namespace-id?? #f) will be #t for nodes with non-qualified names.

[procedure] (sxml:complement pred)

This function takes a predicate and returns it complemented, that is if the given predicate yields #f or '() the complemented one yields the given node and vice versa.

[procedure] (node-eq? other)

Returns a predicate procedure that, given a node, returns #t if the node is the exact same as other.

[procedure] (node-equal? other)

Returns a predicate procedure that, given a node, returns #t if the node has the same contents as other.

[procedure] (node-pos n)

Returns a procedure that, given a nodelist, returns a new nodelist containing only the nth element, counting from 1. If n is negative, it returns a nodelist with the nth element counting from the right. If no such node exists, returns the empty list. n may not equal zero.

Examples:

((node-pos 1) '((div "hi") (span "hello") (em "really, hi!")))
 => ((div "hi"))

((node-pos 6) '((div "hi") (span "hello") (em "really, hi!")))
 => ()

((node-pos -1) '((div "hi") (span "hello") (em "is this thing on?")))
 => ((em "is this thing on?"))
[procedure] (sxml:filter pred?)

Returns a procedure that accepts a nodelist or a node (which will be converted to a one-element nodelist) and returns only those nodes for which the predicate pred? does not return #f or '().

((sxml:filter (ntype?? 'div)) '((div "hi") (span "hello") (div "still here?")))
 => ((div "hi") (div "still here?"))
[procedure] (take-until pred?)
[procedure] (take-after pred?)

Returns a procedure that accepts a node or a nodelist.

The take-until variant returns everything before the first node for which the predicate pred? returns anything but #f or '(). In other words, it returns the longest prefix for which the predicate returns #f or '().

The take-after variant returns everything after the first node for which the predicate pred? returns anything besides #f or '().

((take-until (ntype?? 'span)) '((div "hi") (span "hello") (span "there") (div "still here?")))
 => ((div "hi"))

((take-after (ntype?? 'span)) '((div "hi") (span "hello") (span "there") (div "still here?")))
 => ((span "there") (div "still here?"))
[procedure] (map-union proc list)

Applies proc to each element of the nodelist list and returns the list of results. If proc returns a nodelist, it is spliced into the result (essentially returning a flattened nodelist).
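
The splicing behaviour is easiest to see next to a plain map. This is a minimal sketch using sxml:content as the mapped procedure; the nodelist `items` is made up for illustration.

```scheme
(use sxpath-lolevel)

;; Illustrative nodelist of two elements.
(define items '((li "Ardbeg") (li "Glenfarclas" "Springbank")))

;; A plain map keeps one result nodelist per input node.
(map sxml:content items)
;; => (("Ardbeg") ("Glenfarclas" "Springbank"))

;; map-union splices those nodelists into a single flat nodelist.
(map-union sxml:content items)
;; => ("Ardbeg" "Glenfarclas" "Springbank")
```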

[procedure] (node-reverse node-or-nodelist)

Accepts a nodelist and reverses the order of the nodes in it. If a single node is passed to this procedure, it returns a nodelist containing just that node (the order of the node's children is not changed).

Converter combinators

Combinators are higher-order functions that transmogrify a converter or glue a sequence of converters into a single, non-trivial converter. The goal is to arrive at converters that correspond to XPath location paths.

From a different point of view, a combinator is a fixed, named pattern of applying converters. Given below is a complete set of such patterns that together implement the XPath location path specification. As it turns out, all these combinators can be built from a small number of basic blocks: regular functional composition, map-union and filter applicators, and the nodelist union.

[procedure] (select-kids pred?)

Returns a procedure that accepts a node and returns a nodelist of the node's children that satisfy pred? (i.e., pred? returns anything but #f or '()).
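As a rough example, assuming select-kids and ntype?? come from the sxpath-lolevel module, the following selects only the li children of a node. The sample tree is made up for illustration.

```scheme
;; Assumption: the sxpath egg is installed and sxpath-lolevel exports
;; select-kids and ntype??.
(require-extension sxpath-lolevel)

(define whiskies
  '(ul (@ (class "whiskies"))
       (li "Ardbeg")
       (p "a stray paragraph")
       (li "Springbank")))

;; The attribute list and the p element do not satisfy the predicate.
(define items ((select-kids (ntype?? 'li)) whiskies))
;; => ((li "Ardbeg") (li "Springbank"))
```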

[procedure] (node-self pred?)

Similar to select-kids but applies to the node itself rather than to its children. The resulting nodelist will contain either one component (the node), or will be empty (if the node failed the predicate).
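Both outcomes can be seen in this small sketch, which assumes node-self and ntype?? are exported by sxpath-lolevel:

```scheme
;; Assumption: the sxpath egg is installed and sxpath-lolevel exports
;; node-self and ntype??.
(require-extension sxpath-lolevel)

;; The node passes the predicate, so it is returned in a nodelist.
(define kept ((node-self (ntype?? 'div)) '(div "hi")))
;; => ((div "hi"))

;; The node fails the predicate, so the result is empty.
(define dropped ((node-self (ntype?? 'span)) '(div "hi")))
;; => ()
```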

[procedure] (node-join . selectors)

Returns a procedure that accepts a nodelist or a node, and returns a nodelist with all the selectors applied to every node in sequence. The selectors must function as converter combinators, i.e., they must accept a node and output a nodelist.

((node-join
  (select-kids (ntype?? 'li))
  sxml:content)
 '((ul (@ (class "whiskies"))
       (li "Ardbeg")
       (li "Glenfarclas")
       (li "Springbank"))))
 => ("Ardbeg" "Glenfarclas" "Springbank")
[procedure] (node-reduce . converters)

A regular functional composition of converters.

From a different point of view,

 ((apply node-reduce converters) nodelist)

is equivalent to

 (fold (lambda (converter nodes) (converter nodes)) nodelist converters)

i.e., folding, or reducing, a list of converters with the nodelist as a seed.
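To make the composition concrete, here is a minimal sketch that compares node-reduce with applying the same converters by hand. It assumes the procedures are exported by sxpath-lolevel; doc, ul-kids and li-kids are illustrative names.

```scheme
;; Assumption: the sxpath egg is installed and sxpath-lolevel exports
;; node-reduce, select-kids and ntype??.
(require-extension sxpath-lolevel)

(define doc '(div (ul (li "a") (li "b")) (p "c")))
(define ul-kids (select-kids (ntype?? 'ul)))
(define li-kids (select-kids (ntype?? 'li)))

;; The converters are applied left to right, each to the previous result.
(define reduced ((node-reduce ul-kids li-kids) doc))
;; => ((li "a") (li "b"))

;; The same pipeline written out by hand.
(define by-hand (li-kids (ul-kids doc)))
```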

[procedure] (node-or . converters)

This combinator applies all converters to a given node and produces the union of their results. This combinator corresponds to a union, "|" operation for XPath location paths.
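A small sketch of the union, assuming node-or, select-kids and ntype?? are exported by sxpath-lolevel. In this example the results appear grouped per converter, in the order the converters were given, rather than in document order.

```scheme
;; Assumption: the sxpath egg is installed and sxpath-lolevel exports
;; node-or, select-kids and ntype??.
(require-extension sxpath-lolevel)

(define para '(p (b "bold") (em "emph") (i "italic")))

;; Roughly the XPath location path "em | b", evaluated against para.
(define picked
  ((node-or (select-kids (ntype?? 'em))
            (select-kids (ntype?? 'b)))
   para))
;; => ((em "emph") (b "bold"))
```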

[procedure] (node-closure test-pred?)

Returns a procedure that selects all descendants of a node that satisfy the converter-predicate test-pred?. This combinator is similar to select-kids, but it also applies to grandchildren and deeper descendants.
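The following sketch, which assumes node-closure and ntype?? are exported by sxpath-lolevel, shows that matches are collected level by level: the shallow p is found before the more deeply nested one, even though it comes later in the document.

```scheme
;; Assumption: the sxpath egg is installed and sxpath-lolevel exports
;; node-closure and ntype??.
(require-extension sxpath-lolevel)

(define doc '(div (section (p "deep")) (p "shallow")))

;; Children of div are examined first, then grandchildren, and so on.
(define all-ps ((node-closure (ntype?? 'p)) doc))
;; => ((p "shallow") (p "deep"))
```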

[procedure] (node-trace title)

Returns a procedure that accepts a node or a nodelist, which it pretty-prints to the current output port, preceded by title. It returns the node or the nodelist unchanged. This is a useful debugging aid, since it doesn't really do anything besides print its argument and pass it on.

[procedure] (sxml:node? obj)

Returns #t if the given obj is an SXML node, #f otherwise. A node is anything except an attribute list or an auxiliary list.

[procedure] (sxml:attr-list node)

Returns the list of attributes for a given SXML node. The empty list is returned if the given node is not an element, or if it has no list of attributes.

This differs from sxml:attr-list-u in that this procedure accepts any SXML node while sxml:attr-list-u only accepts nodelists or elements. This means that sxml:attr-list-u will throw an error if you pass it a text node (a string), while sxml:attr-list will not.
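For illustration, assuming sxml:attr-list is exported by sxpath-lolevel, the three cases described above look like this:

```scheme
;; Assumption: the sxpath egg is installed and sxpath-lolevel exports
;; sxml:attr-list.
(require-extension sxpath-lolevel)

;; An element with an attribute list.
(define attrs (sxml:attr-list '(div (@ (id "nav") (class "menu")) "hi")))
;; => ((id "nav") (class "menu"))

;; An element without attributes.
(define no-attrs (sxml:attr-list '(div "hi")))
;; => ()

;; A text node is accepted too and simply has no attributes.
(define text-attrs (sxml:attr-list "just text"))
;; => ()
```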

[procedure] (sxml:attribute test-pred?)

Like sxml:filter, but considers the attributes instead of the nodes. Returns a nodelist of attributes that match test-pred?.

((sxml:attribute (ntype?? 'id))
 '((div (@ (id "navigation")) "navigation here")
   (div (@ (class "pullquote")) "random stuff")
   (div (@ (id "main-content")) "lorem ipsum ...")))
 => ((id "navigation") (id "main-content"))
[procedure] (sxml:child test-pred?)

This procedure is similar to select-kids, but it returns an empty child-list for PI, Comment and Entity nodes.

[procedure] (sxml:parent test-pred?)

Returns a procedure that accepts a root-node, and returns another procedure. This second procedure accepts a nodeset (or a node) and returns the immediate parents of the nodes in the set, but only those parents that match the predicate.

The root-node does not have to be the root node of the whole SXML tree -- it may be a root node of a branch of interest.

This procedure can be used with any SXML node.
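The two-step application can be sketched as follows, assuming sxml:parent, select-kids and ntype?? are exported by sxpath-lolevel. Because SXML nodes carry no parent pointers, the node being looked up must be the very same object that occurs in the tree under root-node, which is why p-node is extracted from doc here rather than written out again.

```scheme
;; Assumption: the sxpath egg is installed and sxpath-lolevel exports
;; sxml:parent, select-kids and ntype??.
(require-extension sxpath-lolevel)

(define doc '(div (@ (id "main")) (p "one") (span "two")))

;; Take the p element out of the tree so it is the identical object.
(define p-node (car ((select-kids (ntype?? 'p)) doc)))

;; First supply the predicate, then the root node, then the node itself.
(define parents (((sxml:parent (ntype?? 'div)) doc) p-node))
;; => a one-element nodelist containing doc

;; The parent exists but fails the predicate, so nothing is returned.
(define no-parents (((sxml:parent (ntype?? 'ul)) doc) p-node))
;; => ()
```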

Useful shortcuts

[procedure] (node-parent rootnode)

(node-parent rootnode) yields a converter that returns the parent of the node it is applied to. If applied to a nodelist, it returns the list of parents of the nodes in the nodelist.

This is equivalent to ((sxml:parent (ntype?? '*any*)) rootnode).

[procedure] (sxml:child-nodes node)

Returns all the child nodes of the given node.

This is equivalent to ((sxml:child sxml:node?) node).

[procedure] (sxml:child-elements node)

Returns all the child elements of the given node (i.e., excluding any text nodes).

This is equivalent to ((select-kids sxml:element?) node).
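The difference between the two child shortcuts can be seen in this sketch, which assumes both are exported by sxpath-lolevel:

```scheme
;; Assumption: the sxpath egg is installed and sxpath-lolevel exports
;; sxml:child-nodes and sxml:child-elements.
(require-extension sxpath-lolevel)

(define para '(p (@ (class "intro")) "some text " (em "emphasised")))

;; Text nodes and elements are children; the attribute list is not.
(define nodes (sxml:child-nodes para))
;; => ("some text " (em "emphasised"))

;; Only elements are kept.
(define elements (sxml:child-elements para))
;; => ((em "emphasised"))
```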

Procedures from sxpath-ext

SXML counterparts to W3C XPath Core Functions Library

[procedure] (sxml:string object)

The counterpart to XPath 'string' function (section 4.2 XPath 1.0 Rec.). Converts a given object to a string.

Notes:

  1. When converting a nodeset, document order is not preserved
  2. number->string returns the result in a form which is slightly different from XPath Rec. specification
[procedure] (sxml:boolean object)

The counterpart to XPath 'boolean' function (section 4.3 XPath Rec.). Converts its argument to a boolean.

[procedure] (sxml:number object)

The counterpart to XPath 'number' function (section 4.4 XPath Rec.). Converts its argument to a number.

Notes:

  1. The argument is not optional (yet?)
  2. string->number conversion is not IEEE 754 round-to-nearest
  3. NaN is represented as 0
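A few sample conversions, assuming sxml:string, sxml:boolean and sxml:number are exported by sxpath-lolevel. The last one reflects the note above about NaN being represented as 0.

```scheme
;; Assumption: the sxpath egg is installed and sxpath-lolevel exports
;; sxml:string, sxml:boolean and sxml:number.
(require-extension sxpath-lolevel)

(define s1 (sxml:string 42))         ; => "42"
(define s2 (sxml:string #t))         ; => "true"

(define b1 (sxml:boolean ""))        ; empty string     => #f
(define b2 (sxml:boolean "text"))    ; non-empty string => #t
(define b3 (sxml:boolean '()))       ; empty nodeset    => #f

(define n1 (sxml:number "12"))       ; => 12
(define n2 (sxml:number "abc"))      ; not a number     => 0
```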
[procedure] (sxml:string-value node)

Returns the string value of a given node in accordance with XPath Rec. 5.1 - 5.7.
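For an element, the string value is the concatenation of the text found in its descendants. A minimal sketch, assuming sxml:string-value is exported by sxpath-lolevel:

```scheme
;; Assumption: the sxpath egg is installed and sxpath-lolevel exports
;; sxml:string-value.
(require-extension sxpath-lolevel)

;; Attribute values do not contribute to the string value of the element.
(define sv
  (sxml:string-value '(p (@ (class "greeting")) "Hello, " (em "world") "!")))
;; => "Hello, world!"
```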

[procedure] (sxml:id id-index)

Returns a procedure that accepts a nodeset and returns a nodeset containing the elements in the id-index that match the string-values of each entry of the nodeset. See XPath Rec. 4.1.

The id-index is an alist with unique IDs as key, and elements as values:

 id-index = ( (id-value . element) (id-value . element) ... )

Comparators for XPath objects

[procedure] (sxml:list-head list n)

Returns the n first members of list. Mostly equivalent to SRFI-1's take procedure, except it returns the list if n is larger than the length of said list, instead of throwing an error.
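The difference from take only shows up when n exceeds the length of the list, as in this sketch (assuming sxml:list-head is exported by sxpath-lolevel):

```scheme
;; Assumption: the sxpath egg is installed and sxpath-lolevel exports
;; sxml:list-head.
(require-extension sxpath-lolevel)

(define first-two (sxml:list-head '(a b c) 2))
;; => (a b)

;; Asking for more members than there are returns the whole list.
(define too-many (sxml:list-head '(a b c) 10))
;; => (a b c)
```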

[procedure] (sxml:merge-sort less-than? list)

Returns the sorted list, the smallest member first.

 less-than? ::= (lambda (obj1 obj2) ...)

less-than? returns #t if obj1 < obj2 with respect to the given ordering.

[procedure] (sxml:equality-cmp bool=? number=? string=?)

A helper for XPath equality operations: = , !=. The bool=?, number=? and string=? arguments are comparison operations for booleans, numbers and strings respectively.

Returns a procedure that accepts two objects, inspects their types and applies the matching comparison predicate to them. Type coercion takes place according to the rules described in the XPath 1.0 spec, section 3.4 ("Booleans").

[procedure] (sxml:equal? obj1 obj2)
[procedure] (sxml:not-equal? obj1 obj2)

Equality procedures built with sxml:equality-cmp. sxml:equal? uses the default comparison operators eq?, = and string=?, while sxml:not-equal? uses the negations of these operators.
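The effect of the coercion rules can be sketched with a few non-nodeset values, assuming both procedures are exported by sxpath-lolevel:

```scheme
;; Assumption: the sxpath egg is installed and sxpath-lolevel exports
;; sxml:equal? and sxml:not-equal?.
(require-extension sxpath-lolevel)

;; A number and a string are compared as numbers.
(define e1 (sxml:equal? 1 "1"))            ; => #t

;; A boolean and a string are compared as booleans;
;; a non-empty string converts to true.
(define e2 (sxml:equal? #t "non-empty"))   ; => #t

;; Two strings are compared as strings.
(define e3 (sxml:equal? "a" "b"))          ; => #f
(define e4 (sxml:not-equal? "a" "b"))      ; => #t
```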

[procedure] (sxml:relational-cmp op)

A helper for XPath relational operations: <, >, <=, >= for two XPath objects. op is one of these operators.

Returns a procedure that accepts two objects and returns the value of the procedure applied to these objects, converted according to the coercion rules described in the XPath 1.0 spec, section 3.4 ("Booleans").
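As a sketch, assuming sxml:relational-cmp is exported by sxpath-lolevel, the following builds a less-than comparison. The name xpath<? is made up for this example. Note that strings are converted to numbers before comparing, so the comparison is numeric rather than lexical.

```scheme
;; Assumption: the sxpath egg is installed and sxpath-lolevel exports
;; sxml:relational-cmp.
(require-extension sxpath-lolevel)

;; xpath<? is a hypothetical name for the resulting comparison procedure.
(define xpath<? (sxml:relational-cmp <))

(define r1 (xpath<? 1 "2"))     ; 1 < 2   => #t
(define r2 (xpath<? "10" "9"))  ; 10 < 9  => #f
```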

XPath axes

[procedure] (sxml:ancestor test-pred?)

Like sxml:parent, except it returns all the ancestors that match test-pred?, not just the immediate parent.

[procedure] (sxml:ancestor-or-self test-pred?)

Like sxml:ancestor, except also allows the node itself to match the predicate.

[procedure] (sxml:descendant test-pred?)

Like node-closure, except the resulting nodeset is in depth-first order instead of breadth-first.

[procedure] (sxml:descendant-or-self test-pred?)

Like sxml:descendant, except also allows the node itself to match the predicate.
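The ordering difference with node-closure can be illustrated as follows, assuming sxml:descendant, node-closure and ntype?? are exported by sxpath-lolevel:

```scheme
;; Assumption: the sxpath egg is installed and sxpath-lolevel exports
;; sxml:descendant, node-closure and ntype??.
(require-extension sxpath-lolevel)

(define doc '(div (section (p "deep")) (p "shallow")))

;; Depth-first: matches appear in document order.
(define in-doc-order ((sxml:descendant (ntype?? 'p)) doc))
;; => ((p "deep") (p "shallow"))

;; Breadth-first: the shallower match comes first.
(define by-level ((node-closure (ntype?? 'p)) doc))
;; => ((p "shallow") (p "deep"))
```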

[procedure] (sxml:following test-pred?)

Returns a procedure that accepts a root node and returns a new procedure that accepts a node and returns all nodes following this node in the document source matching the predicate.

[procedure] (sxml:following-sibling test-pred?)

Like sxml:following, except only siblings (nodes at the same level under the same parent) are returned.
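A minimal sketch of the sibling axis, assuming sxml:following-sibling, select-kids and ntype?? are exported by sxpath-lolevel. As with sxml:parent, the node has to be the identical object found in the tree under the root node, so it is taken from doc.

```scheme
;; Assumption: the sxpath egg is installed and sxpath-lolevel exports
;; sxml:following-sibling, select-kids and ntype??.
(require-extension sxpath-lolevel)

(define doc '(ul (li "a") (li "b") (li "c")))

;; The first li element, taken from the tree itself.
(define first-li (car ((select-kids (ntype?? 'li)) doc)))

;; Predicate first, then the root node, then the context node.
(define later (((sxml:following-sibling (ntype?? 'li)) doc) first-li))
;; => ((li "b") (li "c"))
```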

[procedure] (sxml:preceding test-pred?)

Returns a procedure that accepts a root node and returns a new procedure that accepts a node and returns all nodes preceding this node in the document source matching the predicate.

[procedure] (sxml:preceding-sibling test-pred?)

Like sxml:preceding, except only siblings (nodes at the same level under the same parent) are returned.

[procedure] (sxml:namespace test-pred?)

Returns a procedure that accepts a nodeset and returns the namespace lists of the nodes matching test-pred?.

Examples

The SXML tutorial, though incomplete at the time of writing, contains a large section about sxpath and how to use it. This is your best bet for understanding it, aside from this eggdoc.

About this egg

Author

Oleg Kiselyov, Kirill Lisovsky, Dmitry Lizorkin.

Version history

0.2
Add modules for DDO and contextual versions of sxpath. Split up xpath-parser low-level stuff into its own module.
0.1.3
Fix bug in normalize-space() and possibly other xpath primitives reported by Felix.
0.1.2
Fix problem with attribute selectors reported by Daishi Kato.
0.1.1
Use string-concatenate instead of (apply string-append ...)
0.1
Split up the old sxml-tools egg into sxpath

License

The sxml-tools are in the public domain.