uri-common
Description
The uri-common library provides simple and easy-to-use parsing and manipulation procedures for URIs using common schemes.
These "common schemes" all have the following rules:
- An empty path after the hostname is considered to be identical to the root path.
- All components are fully URI-decoded (so they contain no percent-encoded characters).
- The query argument will be in application/x-www-form-urlencoded form.
- The port is automatically determined if it is omitted and the URI scheme is known.
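The rules above can be observed with a short sketch. The host example.com and the variable names are placeholders chosen for illustration; only the uri-common procedures themselves come from this library.

```scheme
(import uri-common)

;; Hypothetical example URIs; example.com is only a placeholder host.
(define bare (uri-reference "http://example.com"))
(define root (uri-reference "http://example.com/"))

;; Rule 1: an empty path after the hostname equals the root path.
(define same-path? (equal? (uri-path bare) (uri-path root)))

;; Rule 4: no port is written, but http is a known scheme.
(define implied-port (uri-port bare))

;; Rules 2 and 3: components come back decoded, and the query is an alist.
(define decoded (uri-reference "http://example.com/a%20b?q=x%26y"))
(define decoded-path (uri-path decoded))
(define decoded-query (uri-query decoded))
```

Here same-path? should be true, implied-port should be the default HTTP port, and decoded-query should contain the literal ampersand in its value rather than "%26".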
Library Procedures
This library replaces most of the procedures in uri-generic. If you need to work with URIs on the uri-generic level or need to work with both uri-generic and uri-common URI objects, you will have to import and prefix or rename procedures.
Constructors and predicates
These constructors fully decode their arguments, so afterwards it is impossible to distinguish between encoded delimiters and unencoded delimiters. This makes uri-common objects decoding endpoints; no further decoding on the URI level is possible (of course, applications are free to decode further information inside the URI). If the original URI is still needed for some reason, the object can be converted to a uri-generic object. However, updating a URI component causes this component's original encoding to be lost, so be careful!
[procedure] (uri-reference STRING) => URI
A URI reference is either a URI or a relative reference (RFC 3986, Section 4.1). If the given string's prefix does not match the syntax of a scheme followed by a colon separator, then the given string is parsed as a relative reference.
[procedure] (absolute-uri STRING) => URI
Parses the given string as an absolute URI, in which no fragments are allowed. If no URI scheme is found, or a fragment is detected, this raises an error.
Absolute URIs are defined by RFC 3986 as non-relative URI references without a fragment (RFC 3986, Section 4.2). Absolute URIs can be used as a base URI to resolve a relative-ref against, using uri-relative-to (see below).
[procedure] (make-uri #!key scheme path query fragment host port username password) => URI
Constructs a URI from the given components.
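As a rough illustration of how the three constructors differ, the following sketch wraps absolute-uri in a small helper. The helper try-absolute-uri and the example strings are hypothetical and not part of uri-common; handle-exceptions comes from CHICKEN's (chicken condition) module.

```scheme
(import uri-common (chicken condition))

;; Returns the parsed URI, or #f if absolute-uri signals an error.
;; try-absolute-uri is a hypothetical helper, not part of uri-common.
(define (try-absolute-uri str)
  (handle-exceptions exn #f (absolute-uri str)))

(define ok            (try-absolute-uri "http://example.com/docs"))
(define with-fragment (try-absolute-uri "http://example.com/docs#intro"))
(define no-scheme     (try-absolute-uri "docs/intro"))

;; uri-reference is the most lenient constructor; a string without a
;; scheme is parsed as a relative reference.
(define ref (uri-reference "docs/intro"))

;; make-uri builds a URI from already-decoded components.
(define built
  (make-uri scheme: 'http host: "example.com" path: '(/ "docs")))
```

Only ok should hold a URI object; with-fragment and no-scheme should both be #f because absolute-uri rejects those strings.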
Accessors
[procedure] (uri-scheme uri-common) => symbol
[procedure] (uri-path uri-common) => list
[procedure] (uri-query uri-common) => alist
[procedure] (uri-fragment uri-common) => string
[procedure] (uri-host uri-common) => string
[procedure] (uri-port uri-common) => integer
[procedure] (uri-username uri-common) => string
[procedure] (uri-password uri-common) => string
Accessors for URI-common objects.
If a component is not defined in the given URI-common, then the corresponding accessor returns #f, except for uri-query and uri-path, which both always return a (possibly empty) list.
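A minimal sketch of the accessors follows, assuming a made-up URI; the variable names are illustrative only.

```scheme
(import uri-common)

(define u (uri-reference "https://example.com/docs/intro?lang=en#top"))

(define the-scheme   (uri-scheme u))    ; a symbol
(define the-host     (uri-host u))      ; a string
(define the-fragment (uri-fragment u))  ; a string
(define the-query    (uri-query u))     ; an alist

;; A relative reference without scheme, fragment or query.
(define r (uri-reference "docs/intro"))
(define missing-scheme   (uri-scheme r))    ; #f, the component is undefined
(define missing-fragment (uri-fragment r))  ; #f, the component is undefined
(define empty-query      (uri-query r))     ; the empty list, never #f
```

Because uri-query and uri-path always return lists, code that iterates over them does not need a separate check for #f.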
Updater
[procedure] (update-uri URI-common #!key scheme path query fragment host port username password) => URI-common
Update the specified keys in the URI-common object in a functional way (i.e., it creates a new copy with the modifications).
Tip: if you want to create a URI with only a few components set to dynamic values extracted from elsewhere, you can generally create an empty URI and update its constituent parts:
(uri->string (update-uri (uri-reference "")
                         path: '("example" "greeting")
                         query: '((hi . "there"))))
=> "example/greeting?hi=there"
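The functional nature of update-uri can be checked with a small sketch. The URI and the query value are invented for illustration.

```scheme
(import uri-common)

(define base (uri-reference "http://example.com/search"))

;; update-uri returns a new object; base itself is left untouched.
(define with-query
  (update-uri base query: '((q . "chicken scheme"))))

(define base-query (uri-query base))
(define new-query  (uri-query with-query))
```

After this, base-query should still be the empty list while new-query holds the alist that was passed in.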
Predicates
There are several predicates to check whether objects are URI references (the most general type of a URI-like object), or more specific types of URIs like absolute URIs or relative references. The classification tree of URI-like objects looks a bit like this:
                uri-reference                        Anything defined by the RFC fits this
                 /         \
               uri       relative-ref                Scheme (uri) or no scheme (relative-ref)?
               /           /       \
    absolute-uri   path-relative   path-absolute     No URI fragment (absolute-uri)? | path starts with a slash (path-absolute) or not (path-relative)?

[procedure] (uri-reference? URI) => BOOL
Is the given object a URI reference? All objects created by URI-common constructors are URI references; they are either URIs or relative references. The other constructors are just stricter variants of uri-reference. They all create URI references.
[procedure] (absolute-uri? URI) => BOOL
Is the given object an absolute URI?
[procedure] (uri? URI) => BOOL
Is the given object a URI? URIs are all URI references that include a scheme part. The other type of URI references are relative references.
[procedure] (relative-ref? URI) => BOOL
Is the given object a relative reference? Relative references are defined by RFC 3986 as URI references which are not URIs; they contain no URI scheme and can be resolved against an absolute URI to obtain a complete URI using uri-relative-to.
[procedure] (uri-path-absolute? URI) => BOOL
Is the URI's path component an absolute path?
[procedure] (uri-path-relative? URI) => BOOL
Is the URI's path component a relative path?
[procedure] (uri-default-port? URI) => BOOL
Is the URI's port the default port for the URI's scheme?
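The classification can be explored with a few sample references. This is only a sketch; the URIs and variable names are made up for illustration.

```scheme
(import uri-common)

(define full     (uri-reference "http://example.com/a/b#frag"))
(define no-frag  (uri-reference "http://example.com/a/b"))
(define rel      (uri-reference "../c"))
(define abs-path (uri-reference "/a/b"))
(define odd-port (uri-reference "http://example.com:8080/"))

;; Collect (uri? absolute-uri? relative-ref?) for one reference.
(define (classify u)
  (list (uri? u) (absolute-uri? u) (relative-ref? u)))

(define full-class    (classify full))     ; has a scheme and a fragment
(define no-frag-class (classify no-frag))  ; has a scheme, no fragment
(define rel-class     (classify rel))      ; no scheme
```

Note that abs-path is still a relative reference (it has no scheme) even though its path is absolute; the two distinctions are independent of each other.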
Reference Resolution
[procedure] (uri-relative-to URI URI) => URI
Resolve the first URI as a reference relative to the second URI, returning a new URI (RFC 3986, Section 5.2.2).
[procedure] (uri-relative-from URI URI) => URI
Constructs a new, possibly relative, URI which represents the location of the first URI with respect to the second URI.
(import uri-common)

(uri->string (uri-relative-to (uri-reference "../qux")
                              (uri-reference "http://example.com/foo/bar/")))
=> "http://example.com/foo/qux"

(uri->string (uri-relative-from (uri-reference "http://example.com/foo/qux")
                                (uri-reference "http://example.com/foo/bar/")))
=> "../qux"
Query encoding and decoding
[parameter] (form-urlencoded-separator [char-set/char/string])
[procedure] (form-urlencode alist #!key (separator (form-urlencoded-separator))) => string
[procedure] (form-urldecode string #!key (separator (form-urlencoded-separator))) => alist
Encode or decode an alist using the encoding corresponding to the form-urlencoded media type, using the given separator character(s).
The alist contains key/value pairs corresponding to the values in the final urlencoded string. If a value is #f, the key will be omitted from the string. If it is #t the key will be present without a value. In all other cases, the value is converted to a string and urlencoded. The keys are always converted to a string and urlencoded.
When encoding, if separator is a string, the first character will be used as the separator in the resulting querystring. If it is a char-set, it will be converted to a string and its first character will be taken. In either case, all of these characters are encoded if they occur inside the key/value pairs.
When decoding, any character in the set (or string) will be seen as a separator.
The separator defaults to the string ";&". This means that either semicolons or ampersands are allowed as separators when decoding a URI string, but semicolons are used when generating strings.
If you would like to use a different separator, you should parameterize all calls to procedures that return a uri-common object.
Decoding will raise a condition when the string contains percent-encoded bytes which are invalid when interpreted in UTF-8 encoding.
Examples:
(form-urlencode '(("lemon" . "ade") (sucks . #f) (rocks . #t) (number . 42)))
=> "lemon=ade;rocks;number=42"

(form-urldecode "lemon=ade;rocks;number=42")
=> ((lemon . "ade") (rocks . #t) (number . "42"))
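Choosing a different separator, as described above, might look like the following sketch. It shows both the per-call keyword argument and the parameter; the alist is invented for illustration.

```scheme
(import uri-common)

(define params '((a . "1") (b . "2")))

;; Per call, through the keyword argument.
(define amp-keyword (form-urlencode params separator: "&"))

;; For a whole dynamic extent, through the parameter.
(define amp-parameter
  (parameterize ((form-urlencoded-separator "&"))
    (form-urlencode params)))

;; Decoding with the default ";&" accepts both separator characters.
(define mixed (form-urldecode "a=1&b=2;c=3"))
```

Both encodings should produce an ampersand-separated string, and mixed should contain three pairs regardless of which separator preceded each one.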
String encoding and decoding
Besides encoding and decoding whole query strings or alists at once, you can also encode and decode individual strings. These procedures are more generic, but also more low-level.
[procedure] (uri-encode-string STRING [CHAR-SET]) => STRING
Returns the percent-encoded form of the given string. The optional char-set argument controls which characters should be encoded. It defaults to the complement of char-set:uri-unreserved. This is always safe, but often overly careful; it is allowed to leave certain characters unquoted depending on the context.
[procedure] (uri-decode-string STRING [CHAR-SET]) => STRING
Returns the decoded form of the given string. The optional char-set argument controls which characters should be decoded. It defaults to char-set:full.
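The effect of the optional char-set argument can be sketched as follows. The input strings and the keep-slash set are assumptions made for this example; char-set-delete and char-set-complement come from SRFI 14.

```scheme
(import uri-common srfi-14)

;; Default: everything outside the unreserved set is percent-encoded.
(define encoded (uri-encode-string "hello world/again"))
(define decoded (uri-decode-string "hello%20world"))

;; Encode everything the default would, except the slash.
(define keep-slash
  (char-set-delete (char-set-complement char-set:uri-unreserved) #\/))
(define partly-encoded (uri-encode-string "hello world/again" keep-slash))
```

With keep-slash the space is still encoded but the slash is left alone, which is the kind of context-dependent relaxation the description mentions.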
Normalization
[procedure] (uri-normalize-case URI) => URI
URI case normalization (RFC 3986 section 6.2.2.1)
[procedure] (uri-normalize-path-segments URI) => URI
URI path segment normalization (RFC 3986 section 6.2.2.3)
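As a small sketch of path segment normalization, assuming a made-up URI with dot segments in its path:

```scheme
(import uri-common)

(define messy (uri-reference "http://example.com/a/b/../c/./d"))

;; Removes the "." and ".." segments as described in RFC 3986.
(define tidy (uri-normalize-path-segments messy))
(define tidy-string (uri->string tidy))
```

Following the dot-segment removal rules of the RFC, tidy-string should read "http://example.com/a/c/d".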
uri-generic, string and list representation
[procedure] (uri->uri-generic uri-common) => uri-generic
[procedure] (uri-generic->uri uri-generic) => uri-common
To convert between uri-generic and uri-common objects, use these procedures. As stated above, this will allow you to retrieve the original encoding of the URI components, but once you update a component from the uri-common side, the original encoding is no longer available (the updated value replaces the original value).
This will raise a condition when the string contains percent-encoded bytes which are invalid when interpreted in UTF-8 encoding, because these procedures attempt to further decode all percent-encoded characters in these components. The uri-generic egg keeps any special characters percent-encoded because there might be further encoding happening in e.g. the path or query.
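Working with both libraries at once requires importing one of them with a prefix. The sketch below assumes the prefix generic: (any prefix would do) and a made-up URI whose path contains an encoded slash.

```scheme
(import uri-common (prefix uri-generic generic:))

(define u (uri-reference "http://example.com/a%2Fb"))
(define g (uri->uri-generic u))

;; Decoded view from uri-common: the encoded slash is now a plain slash
;; inside a single path segment.
(define common-path (uri-path u))

;; uri-generic view, which keeps the original percent-encoding.
(define generic-path (generic:uri-path g))

;; Converting back yields a uri-common object again.
(define round-trip (uri-generic->uri g))
```

The two path lists are expected to differ only in whether the segment still carries its percent-encoding.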
[procedure] (uri->string uri-common [userinfo]) => string
Reconstructs the given URI into a string; uses a supplied function LAMBDA USERNAME PASSWORD -> STRING to map the userinfo part of the URI. If not given, it represents the userinfo as the username followed by ":******".
[procedure] (uri->list URI USERINFO) => LIST
Returns a list of the form (SCHEME SPECIFIC FRAGMENT); SPECIFIC is of the form (AUTHORITY PATH QUERY).
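The userinfo argument of uri->string can be illustrated with a sketch like this one. The credentials are invented, and the lambda shown is just one possible mapping.

```scheme
(import uri-common)

(define u (uri-reference "http://alice:secret@example.com/"))

;; Default: the password is masked in the output.
(define masked (uri->string u))

;; Explicit userinfo procedure that keeps the password visible.
(define revealed
  (uri->string u
               (lambda (username password)
                 (string-append username ":" password))))
```

Masking by default makes it harder to leak credentials into logs by accident; pass an explicit procedure only where the full URI is really needed.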
Character sets
As a convenience for further sub-parsers or other special-purpose URI handling code like separately URI-encoding strings, there are a couple of character sets exported by uri-common.
[constant] char-set:gen-delims
Generic delimiters.
gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@"
[constant] char-set:sub-delims
Sub-delimiters.
sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
[constant] char-set:uri-reserved
The union of gen-delims and sub-delims; all reserved URI characters.
reserved = gen-delims / sub-delims
[constant] char-set:uri-unreserved
All unreserved characters that are allowed in a URI.
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
Note that this is _not_ the complement of char-set:uri-reserved! There are several characters (even printable, noncontrol characters) which are not allowed at all in a URI.
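The note above can be verified directly with SRFI 14's char-set-contains?. This is a minimal sketch; the variable names are illustrative.

```scheme
(import uri-common srfi-14)

(define tilde-unreserved?
  (char-set-contains? char-set:uri-unreserved #\~))
(define question-reserved?
  (char-set-contains? char-set:uri-reserved #\?))

;; A space is a printable character that belongs to neither set.
(define space-reserved?
  (char-set-contains? char-set:uri-reserved #\space))
(define space-unreserved?
  (char-set-contains? char-set:uri-unreserved #\space))
```

The space character falls outside both sets, which is why the unreserved set cannot be obtained by complementing the reserved one.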
Requires
Repository
This egg is hosted on the CHICKEN Subversion repository:
https://anonymous@code.call-cc.org/svn/chicken-eggs/release/6/uri-common
If you want to check out the source code repository of this egg and you are not familiar with Subversion, see this page.
Version History
- 3.0 Port to CHICKEN 6
- 2.0 Port to CHICKEN 5
- 1.4 Do not reset the port when switching schemes, but keep it around so when it's the default for a scheme, it won't be printed, and when switching to another scheme it will be printed again if it's not the default port for this scheme. This makes the interface more composable and less surprising (reported by Kristian Lein-Mathisen).
- 1.3 Added make-uri constructor.
- 1.2 re-exported uri-encode-string, uri-decode-string and the various charsets from uri-generic. Remove bogus charset encoding rules for fragments (fall back to normal uri encoding)
- 1.1 Fixed x-www-form-urlencoded encoding so it encodes even characters that do not strictly need to be encoded according to the URI spec, but do according to the x-www-form-urlencoded spec.
- 1.0 Fix a bug that caused empty lists to be treated differently from lists containing only false values in form-urlencode
- 0.10 Fix urlencoded-separator first char selection in form-urlencode
- 0.9 Automatically convert non-strings to strings in creating queries
- 0.8 Actually export form-urlencoded-separator
- 0.7 Fix silly bug in the predicates from 0.6 (it helps to test first...)
- 0.6 Add predicates uri-path-relative? and uri-path-absolute?
- 0.5 Add uri-default-port? predicate procedure
- 0.4 Add uri->list conversion procedure
- 0.3 Fix dependency info (requires at least uri-generic 2.1)
- 0.2 Add predicates for URIs, absolute URIs and relative references, matching the ones in uri-generic.
- 0.1 Initial Release
License
Copyright 2008-2024 Peter Bex
All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
- Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
- Neither the name of the author nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDERS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.