uri-generic
Description
The uri-generic library contains procedures for parsing and manipulation of Uniform Resource Identifiers (RFC 3986). It is intended to conform more closely to the RFC, and uses combinator parsing and character classes rather than regular expressions.
This library should be considered to be a basis for creating scheme-specific URI parser libraries. This library only parses the generic components from a URI. Any specific library can further parse subcomponents. For this reason, encoding and decoding of percent-encoded characters is not done automatically. This should be handled by specific URI scheme implementations.
For a more practical library which deals with "common" URI schemes like http, ftp, file and such, see the uri-common egg, which is such a specific implementation.
Library Procedures
Constructors and predicates
As specified in section 2.3 of RFC 3986, URI constructors automatically decode percent-encoded octets in the range of unreserved characters. This means that the following holds true:
(equal? (uri-reference "http://example.com/foo-bar")
        (uri-reference "http://example.com/foo%2Dbar"))
=> #t
[procedure] (uri-reference STRING) => URI
A URI reference is either a URI or a relative reference (RFC 3986, Section 4.1). If the given string's prefix does not match the syntax of a scheme followed by a colon separator, then the given string is parsed as a relative reference. If STRING is neither a URI nor a relative reference, uri-reference returns #f.
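Because uri-reference returns #f instead of raising an error, callers generally need to check its result. The following is a minimal sketch of one way to do that; the helper name parse-reference-or-error is hypothetical and not part of uri-generic.

```scheme
(import uri-generic)

;; Hypothetical helper (not part of uri-generic): parse STR as a URI
;; reference and signal an error if uri-reference returns #f.
(define (parse-reference-or-error str)
  (or (uri-reference str)
      (error "not a valid URI reference" str)))

;; Both a full URI and a relative reference are accepted.
(define full (parse-reference-or-error "http://example.com/foo"))
(define rel  (parse-reference-or-error "../qux"))
```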
[procedure] (uri-reference? URI) => BOOL
Is the given object a URI reference? All objects created by uri-generic constructors are URI references; they are either URIs or relative references. The constructors below are just stricter versions of uri-reference. They all create URI references.
[procedure] (absolute-uri STRING) => URI
Parses the given string as an absolute URI, in which no fragments are allowed. If no URI scheme is found, or a fragment is detected, this raises an error.
Absolute URIs are defined by RFC 3986 as non-relative URI references without a fragment (RFC 3986, Section 4.2). Absolute URIs can be used as a base URI to resolve a relative-ref against, using uri-relative-to (see below).
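Since absolute-uri raises an error rather than returning #f, code that handles untrusted input may want to trap the condition. A minimal sketch, assuming the standard (chicken condition) module is available; the helper name maybe-absolute-uri is hypothetical.

```scheme
(import uri-generic (chicken condition))

;; Hypothetical helper: return the parsed absolute URI, or #f when
;; absolute-uri raises an error (no scheme, or a fragment is present).
(define (maybe-absolute-uri str)
  (handle-exceptions exn #f
    (absolute-uri str)))

;; A URI with a scheme and no fragment is accepted.
(define base (maybe-absolute-uri "http://example.com/foo/"))

;; A fragment makes absolute-uri raise, so the helper yields #f.
(define bad (maybe-absolute-uri "http://example.com/foo/#frag"))
```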
[procedure] (make-uri #!key authority scheme path query fragment host port username password) => URI
Constructs a URI from the given components.
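As an illustration, a URI might be assembled from its parts as sketched below. The path representation used here (a list whose first element is the symbol / for an absolute path) is an assumption that this page does not spell out; compare it with what uri-path returns for a parsed URI before relying on it.

```scheme
(import uri-generic)

;; Build a URI from individual components.
;; ASSUMPTION: an absolute path is a list starting with the symbol /
;; followed by the path segments as strings.
(define u
  (make-uri scheme: 'http
            host: "example.com"
            port: 8080
            path: '(/ "foo" "bar")
            query: "q=1"))

(print (uri->string u))
```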
[procedure] (absolute-uri? URI) => BOOL
Is the given object an absolute URI?
[procedure] (uri? URI) => BOOL
Is the given object a URI? URIs are all URI references that include a scheme part. The other type of URI references are relative references.
[procedure] (relative-ref? URI) => BOOL
Is the given object a relative reference? Relative references are defined by RFC 3986 as URI references which are not URIs; they contain no URI scheme and can be resolved against an absolute URI to obtain a complete URI using uri-relative-to.
[procedure] (uri-path-absolute? URI) => BOOL
Is the URI's path component an absolute path?
[procedure] (uri-path-relative? URI) => BOOL
Is the URI's path component a relative path?
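The predicates above can be used to classify a parsed reference. A short sketch with two sample inputs chosen for illustration:

```scheme
(import uri-generic)

(define full (uri-reference "http://example.com/foo#frag"))
(define rel  (uri-reference "../qux"))

(uri? full)               ; => #t, it has a scheme
(absolute-uri? full)      ; => #f, it carries a fragment
(relative-ref? rel)       ; => #t, it has no scheme
(uri-path-absolute? full) ; => #t, the path is "/foo"
(uri-path-relative? rel)  ; => #t, the path is "../qux"
```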
Attribute accessors
[procedure] (uri-authority URI) => URI-AUTH
[procedure] (uri-scheme URI) => SYMBOL
[procedure] (uri-path URI) => LIST
[procedure] (uri-query URI) => STRING
[procedure] (uri-fragment URI) => STRING
[procedure] (uri-host URI) => STRING
[procedure] (uri-port URI) => INTEGER
[procedure] (uri-username URI) => STRING
[procedure] (uri-password URI) => STRING
[procedure] (authority? URI-AUTH) => BOOL
[procedure] (authority-host URI-AUTH) => STRING
[procedure] (authority-port URI-AUTH) => INTEGER
[procedure] (authority-username URI-AUTH) => STRING
[procedure] (authority-password URI-AUTH) => STRING
If a component is not defined in the given URI, then the corresponding accessor returns #f, except for uri-path, which will always return a (possibly empty) list.
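A brief sketch of the accessors applied to a sample URI, including one component that is absent and therefore yields #f:

```scheme
(import uri-generic)

(define u (uri-reference "http://alice@example.com:8080/foo/bar?q=1"))

(uri-scheme u)    ; => http (a symbol)
(uri-host u)      ; => "example.com"
(uri-port u)      ; => 8080
(uri-username u)  ; => "alice"
(uri-query u)     ; => "q=1"
(uri-fragment u)  ; => #f, this URI has no fragment

;; The same host is reachable through the authority object.
(authority-host (uri-authority u))  ; => "example.com"
```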
[procedure] (update-uri URI #!key authority scheme path query fragment host port username password) => URI
[procedure] (update-authority URI-AUTH #!key host port username password) => URI-AUTH
Update the specified keys in the URI or URI-AUTH object in a functional way (i.e., a new copy with the modifications is created and the original object is left unchanged).
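The functional nature of the update can be seen by comparing the original object with the returned one. A minimal sketch with illustrative values:

```scheme
(import uri-generic)

(define original (uri-reference "http://example.com/foo"))

;; update-uri returns a fresh object; ORIGINAL is left untouched.
(define updated (update-uri original port: 8080 fragment: "top"))

(uri-port original)     ; => #f
(uri-port updated)      ; => 8080
(uri-fragment updated)  ; => "top"
```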
String and List Representations
[procedure] (uri->string URI [USERINFO]) => STRING
Reconstructs the given URI into a string. The optional USERINFO argument is a procedure of two arguments, (lambda (USERNAME PASSWORD) ...), which returns the string used for the userinfo part of the URI. If it is not given, the userinfo is represented as the username followed by ":******".
[procedure] (uri->list URI USERINFO) => LIST
Returns a list of the form (SCHEME SPECIFIC FRAGMENT); SPECIFIC is of the form (AUTHORITY PATH QUERY).
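The effect of the USERINFO procedure can be sketched as follows. The credentials are made up, and the guard against a #f password is a defensive assumption for URIs that carry a username only.

```scheme
(import uri-generic)

(define u (uri-reference "http://alice:secret@example.com/"))

;; Default: the password is masked in the output.
(print (uri->string u))

;; Custom mapping that keeps the password. PASS is assumed to be #f
;; when the URI has a username but no password, so guard against that.
(print (uri->string u
                    (lambda (user pass)
                      (if pass
                          (string-append user ":" pass)
                          user))))
```

Exposing the password this way is only appropriate when the resulting string is not logged or displayed.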
Reference Resolution
[procedure] (uri-relative-to URI URI) => URI
Resolve the first URI as a reference relative to the second URI, returning a new URI (RFC 3986, Section 5.2.2).
[procedure] (uri-relative-from URI URI) => URI
Constructs a new, possibly relative, URI which represents the location of the first URI with respect to the second URI.
(import uri-generic)

(uri->string (uri-relative-to (uri-reference "../qux")
                              (uri-reference "http://example.com/foo/bar/")))
=> "http://example.com/foo/qux"

(uri->string (uri-relative-from (uri-reference "http://example.com/foo/qux")
                                (uri-reference "http://example.com/foo/bar/")))
=> "../qux"
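A typical use of uri-relative-to is resolving several references against one base. The sketch below uses the base URI and three of the reference examples from RFC 3986, Section 5.4.1; the helper name resolve is hypothetical.

```scheme
(import uri-generic)

;; Base URI from the examples in RFC 3986, Section 5.4.
(define base (absolute-uri "http://a/b/c/d;p?q"))

;; Hypothetical helper: resolve the reference string REF against BASE
;; and return the result as a string.
(define (resolve ref)
  (uri->string (uri-relative-to (uri-reference ref) base)))

;; RFC 3986, Section 5.4.1 lists these targets for the inputs below:
;;   "g"    resolves to "http://a/b/c/g"
;;   "../g" resolves to "http://a/b/g"
;;   "?y"   resolves to "http://a/b/c/d;p?y"
(for-each (lambda (ref) (print ref " -> " (resolve ref)))
          '("g" "../g" "?y"))
```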
String encoding and decoding
[procedure] (uri-encode-string STRING [CHAR-SET]) => STRING
Returns the percent-encoded form of the given string. The optional char-set argument controls which characters should be encoded. It defaults to the complement of char-set:uri-unreserved. This is always safe, but often overly careful; it is allowed to leave certain characters unquoted depending on the context.
[procedure] (uri-decode-string STRING [CHAR-SET]) => STRING
Returns the decoded form of the given string. The optional char-set argument controls which characters should be decoded. It defaults to char-set:full.
This will raise a condition when the string contains percent-encoded bytes which are invalid when interpreted in UTF-8 encoding.
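The effect of the optional char-set argument can be sketched with a round trip. This assumes the srfi-14 egg is available for constructing character sets; the sample string is illustrative.

```scheme
(import uri-generic srfi-14)

(define raw "foo bar/baz")

;; Default: everything outside char-set:uri-unreserved is encoded,
;; so both the space and the slash are percent-encoded.
(define strict (uri-encode-string raw))

;; Encode only the space and leave "/" as-is, e.g. inside a path.
(define lenient (uri-encode-string raw (char-set #\space)))

;; Decoding with the default char-set restores the original string.
(equal? raw (uri-decode-string strict))   ; => #t
(equal? raw (uri-decode-string lenient))  ; => #t
```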
Normalization
[procedure] (uri-normalize-case URI) => URI
URI case normalization (RFC 3986, Section 6.2.2.1).
[procedure] (uri-normalize-path-segments URI) => URI
URI path segment normalization (RFC 3986, Section 6.2.2.3).
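Both normalizations return a new URI, so they can be chained. A minimal sketch with a deliberately untidy input; the exact output string is not shown here because it was not verified against the library.

```scheme
(import uri-generic)

(define messy (uri-reference "HTTP://Example.COM/a/./b/../c"))

;; Apply both normalizations in sequence; each returns a new URI.
(define clean
  (uri-normalize-path-segments (uri-normalize-case messy)))

;; Per RFC 3986, Sections 6.2.2.1 and 6.2.2.3, the scheme and host are
;; lowercased and the "." and ".." segments are removed.
(print (uri->string clean))
```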
Character sets
As a convenience for sub-parsers or other special-purpose URI handling code, there are a couple of character sets exported by uri-generic.
[constant] char-set:gen-delims
Generic delimiters.
gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@"
[constant] char-set:sub-delims
Sub-delimiters.
sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
[constant] char-set:uri-reserved
The union of gen-delims and sub-delims; all reserved URI characters.
reserved = gen-delims / sub-delims
[constant] char-set:uri-unreserved
All unreserved characters that are allowed in a URI.
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
Note that this is _not_ the complement of char-set:uri-reserved! There are several characters (even printable, noncontrol characters) which are not allowed at all in a URI.
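The point about the two sets not being complements can be checked directly. A small sketch, assuming the srfi-14 egg provides char-set-contains?:

```scheme
(import uri-generic srfi-14)

(char-set-contains? char-set:uri-unreserved #\~)  ; => #t
(char-set-contains? char-set:uri-reserved #\/)    ; => #t, a gen-delim
(char-set-contains? char-set:uri-reserved #\=)    ; => #t, a sub-delim

;; A space is neither reserved nor unreserved: it may not appear
;; literally in a URI at all.
(char-set-contains? char-set:uri-reserved #\space)    ; => #f
(char-set-contains? char-set:uri-unreserved #\space)  ; => #f
```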
Requires
Repository
This egg is hosted on the CHICKEN Subversion repository:
https://anonymous@code.call-cc.org/svn/chicken-eggs/release/6/uri-generic
If you want to check out the source code repository of this egg and you are not familiar with Subversion, see this page.
Version History
- 4.0 Initial version for CHICKEN 6
- 3.3 fix to uri-normalize-case to handle URIs without scheme or host
- 3.2 uri-relative-from now will always attempt to construct a relative path when possible. This makes it easier to construct URLs that still work behind proxies and other types of path manglers (thanks to Jim Ursetto).
- 3.1 Fixed parsing of IPv6, which was completely broken (#1530, thanks to Vasilij Schneidermann). Dropped unnecessary dependency on srfi-13.
- 3.0 Ported to CHICKEN 5.
- 2.43 Fixed handling of UTF-8 characters when percent-encoding/decoding (thanks to Adrien Ramos).
- 2.42 Improved performance.
- 2.41 Make code more portable by avoiding keyword arguments (thanks to Seth Alves).
- 2.39 Get rid of a compiler warning due to broken ipv4 address handling (thanks to Mario Goulart).
- 2.38 Fixed a bug that caused an error to be thrown when host contained percent-encoded characters (thanks to Roel van der Hoorn).
- 2.37 Fixed bug in make-uri when passed no path, added basic tests for make-uri.
- 2.36 Added procedure make-uri
- 2.35 Added some extra checks so we do not try to parse URIs containing invalid (non-hexnum) percent-encoding. Add code to preserve empty path segments during parsing and when performing relative reference resolution.
- 2.34 Fix two bugs that show up in very rare cases (possibly never in practice). One caused issues when creating relative paths from two URIs where one URI had a path that was a prefix of the other, the other caused issues when a relative URI's path containing ".." as last component was resolved.
- 2.33 Path component for empty absolute path directly followed by query is now represented the same as empty path without query.
- 2.32 Empty absolute path directly followed by query is now properly recognised as a URI reference.
- 2.31 Return #f in constructors if unconsumed input remains after parsing
- 2.3 Add predicates uri-path-relative? and uri-path-absolute?
- 2.2 Improvements to uri->string.
- 2.1 Add new predicates for URIs, absolute URIs and relative references. Fix absolute-uri so it raises a condition when passing in a non-absolute uri string, instead of returning a string with the error. Also throw an error if a fragment is detected in the string.
- 2.0 Export char-sets, add char-set arg to uri-encode/uri-decode, do not decode query args as x-www-form-urlencoded, change path representation. Lots of bugfixes.
- 1.12 Fix relative path normalization when original path ends in a slash, remove consecutive slashes from paths in URIs
- 1.11 Added accessors for the authority components, functional update procedures. Fixed case-normalization.
- 1.10 Fixed edge case in uri-relative-to with empty path in base uri, fixed uri->string for URIs with query args, fixed uri->string to not add an extraneous slash after authority in case of empty path.
- 1.9 Fixed bug in uri-encode-string with reserved characters, added tests for decoding and encoding [Peter Bex]
- 1.8 Added uri-encode-string and uri-decode-string. URI constructors now perform automatic normalization of percent-encoded unreserved characters. [suggested by Peter Bex]
- 1.6 Added error message about missing scheme in absolute-uri.
- trunk Small bugfix in absolute-uri. [Peter Bex]
- 1.5 Bug fixes in uri->string and absolute-uri. [reported by Peter Bex]
- 1.3 Ported to Hygienic Chicken and the test egg [Peter Bex]
- 1.2 Now using defstruct instead of define-record [suggested by Peter Bex]
- 1.1 Added utf8 compatibility
- 1.0 Initial Release
License
Based on the Haskell URI library by Graham Klyne <gk@ninebynine.org>.
Copyright 2008-2024 Ivan Raikov, Peter Bex, Seth Alves. All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

- Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
- Neither the name of the author nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDERS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.