Intarweb
Description
Intarweb is an advanced HTTP library. It parses all headers into more useful Scheme values. It serves as a low-level basis for HTTP servers or clients. For a more high-level server library that is based on intarweb, see spiffy. For a high-level client library based on intarweb, see http-client.
You would rarely need to build an application directly on top of raw Intarweb, but when using one of the above-mentioned libraries you often need to interact with intarweb's API.
Author
Repository
https://code.more-magic.net/intarweb
Requirements
Documentation
The intarweb egg is designed to be used in a variety of situations. For this reason, it does not try to be a full HTTP client or server. If you need that kind of functionality, see eggs like spiffy or http-client.
Requests
[procedure] (make-request #!key uri port (method 'GET) (major 1) (minor 1) (headers (headers '())))
Create a request object (a defstruct-type record). The client will generally write requests, while the server will read them.
The URI defines the entity to retrieve on the server, which should be a uri-common-type URI object. The PORT is the Scheme I/O port where the request is written to or read from. The METHOD is a symbol that defines the HTTP method to use (case sensitive). MAJOR and MINOR identify the major and minor version of HTTP to use. Currently, 0.9, 1.0 and 1.1 are supported. HTTP/0.9 support is disabled by default for security reasons: HTTP/0.9 has nothing to identify it as HTTP, so it cannot be distinguished from a completely different service. This can cause HTML interpretation of, say, SMTP or FTP responses, which might contain attacker-supplied data.
HEADERS must be a headers object. See below for more information about headers.
[procedure] (update-request old-request #!key uri port method major minor headers)
Like make-request, except this takes an old-request object as a template for values which are missing from the parameter list, thereby providing a way to do a purely functional update of that object.
[procedure] (request? obj)
Predicate which returns #t when obj is a request object as returned by make-request, #f otherwise.
[procedure] (request-uri REQUEST) => URI
[procedure] (request-port REQUEST) => PORT
[procedure] (request-method REQUEST) => SYMBOL
[procedure] (request-major REQUEST) => NUMBER
[procedure] (request-minor REQUEST) => NUMBER
[procedure] (request-headers REQUEST) => HEADERS
An existing request can be picked apart with these accessors. The uri may be #f in case of "*" (in an options request, for instance).
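As a minimal sketch of the constructor, functional update and accessors described above, the following assumes CHICKEN 5 import syntax and an installed uri-common egg; the URL and the variable names are made up for illustration.

```scheme
(import intarweb uri-common)

;; Build a request; keys that are not supplied keep their defaults
;; (HTTP/1.1, method GET).
(define req
  (make-request uri: (uri-reference "http://example.com/index.html")
                headers: (headers '((accept text/html)))))

;; Purely functional update: req itself is left untouched.
(define head-req (update-request req method: 'HEAD))

(request-method req)                  ; => GET
(request-method head-req)             ; => HEAD
(request-minor head-req)              ; => 1
(header-values 'accept (request-headers head-req)) ; => (text/html)
(uri->string (request-uri head-req))  ; the URI, rendered back to a string
```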
[procedure] (write-request REQUEST) => REQUEST
Write a request line with headers to the server. If the request type has any body data, this should be written to the request's port. Beware that this port can be modified by write-request, so be sure to write to the port of the request object that write-request returns!
You'll need to remember to call finish-request-body after you're done with the request if you have request data to write.
[procedure] (finish-request-body REQUEST) => REQUEST
Finalize the request body. You must call this after writing a request body (but don't call it if there's no request body). This is required for chunked requests to have a proper trailer, otherwise the server may keep waiting for more data. In a pipeline of a keep-alive request it may cause other issues as well, if you don't call this.
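The write sequence described above (write the request, write the body to the returned request's port, then finalize) might look like the sketch below. It writes into a string port instead of a socket so the result can be inspected; the URL, body and variable names are assumptions, and CHICKEN 5 module names are assumed.

```scheme
(import intarweb uri-common (chicken port))

(define body "name=intarweb")

;; Capture everything that would go over the wire in a string.
(define wire-text
  (call-with-output-string
    (lambda (out)
      (let ((req (write-request
                  (make-request
                   uri: (uri-reference "http://example.com/submit")
                   port: out
                   method: 'POST
                   headers: (headers
                             `((host ("example.com" . #f))
                               (content-length ,(string-length body))))))))
        ;; Use the port of the request returned by write-request,
        ;; since write-request may have replaced it.
        (display body (request-port req))
        (finish-request-body req)))))
```

Printing wire-text shows the request line, the headers and the body in the order they were written.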
[procedure] (read-request PORT) => REQUEST
Reads a request object from the given input-port. An optional request body can be read from the request-port after calling this procedure.
If an end of file is returned before anything can be read (when the connection was closed by the remote end before it sent the request), #f is returned. In case of an invalid HTTP request line, an exception of type (exn http unknown-protocol-line) is raised.
NOTE: Currently, CONNECT-type requests which have an "authority" (hostname) between the method and the protocol version are not recognised and will result in an unrecognised protocol line error. This is a limitation of intarweb.
Requests with an asterisk (like OPTIONS * HTTP/1.1) will cause #f to appear in the request object's uri slot.
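To illustrate the server side, here is a small sketch that parses a request from a string port rather than a network connection. The request text and names are invented for the example; CHICKEN 5 import syntax is assumed.

```scheme
(import intarweb uri-common)

(define req
  (read-request
   (open-input-string
    "GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n")))

(request-method req)                       ; => GET
(uri-path (request-uri req))               ; => (/ "index.html")
(header-value 'host (request-headers req)) ; => ("example.com" . #f)

;; An empty input means the peer closed the connection before
;; sending anything, in which case #f is returned.
(read-request (open-input-string ""))      ; => #f
```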
[parameter] (request-parsers [LIST])
Requests are parsed using parse procedures, which can be customized by overriding this parameter.
LIST is a list of procedures which accept a request line string and produce a request object, or #f if the request is not of the type handled by that procedure.
The predefined request parsers are:
- http-0.9-request-parser
- http-1.x-request-parser
[procedure] (http-1.x-request-parser STRING) => REQUEST
Predefined request parsers for use with request-parsers.
[parameter] (request-unparsers [LIST])
Requests are written using unparse procedures, which can be customized by overriding this parameter.
LIST is a list of procedures which accept a request object, write to the request's output port and return the new, possibly updated request object. If the request object is not unparsed by this handler, it returns #f.
The predefined request unparsers are:
- http-0.9-request-unparser
- http-1.0-request-unparser
- http-1.x-request-unparser
[procedure] (http-1.0-request-unparser REQUEST) => REQUEST
[procedure] (http-1.x-request-unparser REQUEST) => REQUEST
Predefined request unparsers for use with request-unparsers. They return the request, and as a side effect they write the request to the request object's port.
Responses
[procedure] (make-response #!key port status (code 200) (reason "OK") (major 1) (minor 1) (headers (headers '())))
Create a response, a defstruct-type record. A server will usually write a response with write-response; a client will read it with read-response.
You can set the response status by supplying either a status symbol, or a code and/or reason. If you supply a status symbol, the code will be set to match that symbol and the reason is set to the default reason belonging to that code, as provided by the HTTP standard(s). The allowed symbols are generally just Schemified versions of the default reason.
Only the code and reason are actual fields in the object; the status is a virtual field.
See http-status-codes for a list of all known default statuses.
[procedure] (update-response old-response #!key port status code reason major minor headers)
Like make-response, except this takes an old-response object as a template for values which are missing from the parameter list, thereby providing a way to do a purely functional update of that object.
[procedure] (response? obj)
Predicate which returns #t when obj is a response object as returned by make-response, #f otherwise.
[procedure] (response-port RESPONSE) => PORT
[procedure] (response-code RESPONSE) => NUMBER
[procedure] (response-reason RESPONSE) => STRING
[procedure] (response-class RESPONSE-OR-CODE) => NUMBER
[procedure] (response-major RESPONSE) => NUMBER
[procedure] (response-minor RESPONSE) => NUMBER
[procedure] (response-headers RESPONSE) => HEADERS
[procedure] (response-status RESPONSE-OR-CODE) => SYMBOL
An existing response can be picked apart using these accessors.
The PORT, MAJOR, MINOR and HEADERS are the same as for requests. CODE and REASON are an integer status code and the short message that belongs to it, as defined in the spec (examples include: 200 OK, 301 Moved Permanently, etc). CLASS is the major class of the response code (100, 200, 300, 400 or 500). response-class can be called either on a response object or directly on a response code number.
response-status attempts to fetch the symbolic status from the response object based on its response code. If no matching symbol can be found in the http-status-codes parameter, an exception is thrown. response-status can also be called on a response object or directly on a response code number.
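A short sketch of how the status-related accessors relate to each other, assuming CHICKEN 5 import syntax; the variable name is illustrative only.

```scheme
(import intarweb)

;; Supplying a status symbol fills in both the code and the reason.
(define res (make-response status: 'not-found))

(response-code res)    ; => 404
(response-reason res)  ; => "Not Found"
(response-class res)   ; => 400
(response-status res)  ; => not-found

;; response-class and response-status also accept a bare code.
(response-class 503)   ; => 500
(response-status 200)  ; => ok
```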
[procedure] (response-port-set! RESPONSE PORT)
[procedure] (response-code-set! RESPONSE NUMBER)
[procedure] (response-reason-set! RESPONSE STRING)
[procedure] (response-major-set! RESPONSE NUMBER)
[procedure] (response-minor-set! RESPONSE NUMBER)
[procedure] (response-headers-set! RESPONSE HEADERS)
[procedure] (response-status-set! RESPONSE SYMBOL)
These procedures mutate an existing response object and set the corresponding slot. response-status-set! will attempt to look up the code and reason in http-status-codes and set both slots. If the symbolic status is unknown, an exception is thrown.
[parameter] (http-status-codes [ALIST])
This is an alist mapping symbolic status indicators to HTTP codes and reason strings.
These can be used to make your code a bit more expressive and to reduce duplication of hardcoded strings; instead of using a numeric "magic number" HTTP code plus the same human-readable string everywhere the same code occurs, you can instead use a descriptive symbol.
The default value of this mapping is as follows:
((continue . (100 . "Continue"))
(switching-protocols . (101 . "Switching Protocols"))
(processing . (102 . "Processing"))
(ok . (200 . "OK"))
(created . (201 . "Created"))
(accepted . (202 . "Accepted"))
(non-authoritative-information . (203 . "Non-Authoritative Information"))
(no-content . (204 . "No Content"))
(reset-content . (205 . "Reset Content"))
(partial-content . (206 . "Partial Content"))
(multi-status . (207 . "Multi-Status"))
(already-reported . (208 . "Already Reported"))
(im-used . (226 . "IM Used"))
(multiple-choices . (300 . "Multiple Choices"))
(moved-permanently . (301 . "Moved Permanently"))
(found . (302 . "Found"))
(see-other . (303 . "See Other"))
(not-modified . (304 . "Not Modified"))
(use-proxy . (305 . "Use Proxy"))
(temporary-redirect . (307 . "Temporary Redirect"))
(bad-request . (400 . "Bad Request"))
(unauthorized . (401 . "Unauthorized"))
(payment-required . (402 . "Payment Required"))
(forbidden . (403 . "Forbidden"))
(not-found . (404 . "Not Found"))
(method-not-allowed . (405 . "Method Not Allowed"))
(not-acceptable . (406 . "Not Acceptable"))
(proxy-authentication-required . (407 . "Proxy Authentication Required"))
(request-time-out . (408 . "Request Time-out"))
(conflict . (409 . "Conflict"))
(gone . (410 . "Gone"))
(length-required . (411 . "Length Required"))
(precondition-failed . (412 . "Precondition Failed"))
(request-entity-too-large . (413 . "Request Entity Too Large"))
(request-uri-too-large . (414 . "Request-URI Too Large"))
(unsupported-media-type . (415 . "Unsupported Media Type"))
(requested-range-not-satisfiable . (416 . "Requested Range Not Satisfiable"))
(expectation-failed . (417 . "Expectation Failed"))
(unprocessable-entity . (422 . "Unprocessable Entity"))
(locked . (423 . "Locked"))
(failed-dependency . (424 . "Failed Dependency"))
(upgrade-required . (426 . "Upgrade Required"))
(precondition-required . (428 . "Precondition Required"))
(too-many-requests . (429 . "Too Many Requests"))
(request-header-fields-too-large . (431 . "Request Header Fields Too Large"))
(internal-server-error . (500 . "Internal Server Error"))
(not-implemented . (501 . "Not Implemented"))
(bad-gateway . (502 . "Bad Gateway"))
(service-unavailable . (503 . "Service Unavailable"))
(gateway-time-out . (504 . "Gateway Time-out"))
(http-version-not-supported . (505 . "HTTP Version Not Supported"))
(insufficient-storage . (507 . "Insufficient Storage"))
(loop-detected . (508 . "Loop Detected"))
(not-extended . (510 . "Not Extended"))
(network-authentication-required . (511 . "Network Authentication Required")))
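Because http-status-codes is a parameter, an application can extend the mapping for a dynamic extent. The sketch below adds a status symbol that is not in the default list; the symbol name im-a-teapot is an assumption made up for this example, and CHICKEN 5 import syntax is assumed.

```scheme
(import intarweb)

(define teapot
  (parameterize ((http-status-codes
                  ;; Prepend one extra entry to the default alist.
                  (cons '(im-a-teapot . (418 . "I'm a teapot"))
                        (http-status-codes))))
    (make-response status: 'im-a-teapot)))

(response-code teapot)   ; => 418
(response-reason teapot) ; => "I'm a teapot"
```

Only the code and reason are stored in the object, so the response remains usable outside the parameterize form, although response-status can no longer map 418 back to a symbol there.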
[procedure] (write-response RESPONSE) => RESPONSE
Write the response object RESPONSE to the response-port.
If there is a response body, this must be written to the response-port after sending the response headers. You'll need to remember to call finish-response-body after you're done with the response if you have response data to write.
[procedure] (finish-response-body RESPONSE) => RESPONSE
Finalize the response body. You must call this after writing a response body (but don't call it if there's no response body). This is required for chunked responses to have a proper trailer, otherwise the client may keep waiting for more data. In a pipeline of a keep-alive request it may cause other issues as well, if you don't call this.
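The following sketch writes a complete response (status line, headers, body) into a string port so the output can be inspected without a network connection. The body text and variable names are assumptions; CHICKEN 5 module names are assumed.

```scheme
(import intarweb (chicken port))

(define body "Hello, world!")

(define wire-text
  (call-with-output-string
    (lambda (out)
      (let ((res (write-response
                  (make-response
                   port: out
                   status: 'ok
                   headers: (headers
                             `((content-type text/plain)
                               (content-length ,(string-length body))))))))
        ;; The body goes to the port of the response that
        ;; write-response returned, followed by the finalizer.
        (display body (response-port res))
        (finish-response-body res)))))
```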
[procedure] (read-response PORT) => RESPONSE
Reads a response object from the port. An optional response body can be read from the response-port after calling this procedure.
If an end of file is returned before anything can be read (when the connection was closed by the remote end before it sent the response), #f is returned. In case of an invalid HTTP response line, an exception of type (exn http unknown-protocol-line) is raised.
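A client-side counterpart, sketched with a string port standing in for the server connection. The response text is invented for the example and CHICKEN 5 import syntax is assumed.

```scheme
(import intarweb)

(define res
  (read-response
   (open-input-string
    "HTTP/1.1 404 Not Found\r\nContent-Type: text/plain\r\nContent-Length: 0\r\n\r\n")))

(response-code res)                                ; => 404
(response-reason res)                              ; => "Not Found"
(response-status res)                              ; => not-found
(header-value 'content-type (response-headers res)) ; => text/plain
```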
[parameter] (response-parsers [LIST])
Responses are parsed using parse procedures, which can be customized by overriding this parameter.
LIST is a list of procedures which accept a response line string and produce a response object, or #f if the response is not of the type handled by that procedure.
The predefined response parsers are:
- http-0.9-response-parser
- http-1.0-response-parser
- http-1.x-response-parser
[procedure] (http-1.0-response-parser STRING) => RESPONSE
[procedure] (http-1.x-response-parser STRING) => RESPONSE
Predefined response parsers for use with response-parsers.
[parameter] (response-unparsers [LIST])
Responses are written using unparse procedures, which can be customized by overriding this parameter.
LIST is a list of procedures which accept a response object and write to the response's output port and return the new, possibly updated response object. If the response object is not unparsed by this handler, it returns #f.
The predefined response unparsers are the following:
- http-0.9-response-unparser
- http-1.0-response-unparser
- http-1.x-response-unparser
[procedure] (http-1.0-response-unparser RESPONSE) => RESPONSE
[procedure] (http-1.x-response-unparser RESPONSE) => RESPONSE
Predefined response unparsers for use with response-unparsers.
Headers
[procedure] (headers ALIST [HEADERS]) => HEADERS
This creates a header object based on an input list.
Requests and responses contain HTTP headers wrapped in a special header-object to ensure they are properly normalized.
The input list has header names (symbols) as keys, and lists of values as values:
(headers `((host ("example.com" . 8080))
           (accept #(text/html ((q . 0.5)))
                   #(text/xml ((q . 0.1))))
           (authorization #("Bearer: token" raw)))
         old-headers)
This adds the named headers to the existing headers in old-headers. The host header is a pair of hostname/port. The accept header is a list of allowed mime-type symbols. The authorization header is a list with a raw value.
As can be seen here, optional parameters or "attributes" can be added to a header value by wrapping the value in a vector of length 2. The first entry in the vector is the header value, the second is an alist of attribute name/value pairs, or the symbol raw, in which case the header value will be kept as-is when writing the response.
[procedure] (headers->list HEADERS) => ALIST
This converts a header object back to a list. See headers for details.
[procedure] (header-values NAME HEADERS) => LIST
Obtain the value of header NAME in the HEADERS object.
The NAME of the header is a symbol; this procedure will return all the values of the header (for example, the Accept header will have several values that indicate the set of acceptable mime-types).
[procedure] (header-value NAME HEADERS [DEFAULT]) => value
If you know in advance that a header has only one value, you can use header-value instead of header-values. This will return the first value in the list, or the provided default if there is no value for that header.
[procedure] (header-params NAME HEADERS) => ALIST
This will return all the params for a given header, assuming there is only one header. An empty list is returned if the header does not exist.
If the header is a "raw" one, the symbol raw is returned.
[procedure] (header-param PARAM NAME HEADERS [DEFAULT]) => value
This will return a specific parameter for the header, or DEFAULT if the parameter isn't present or the header does not exist. This also assumes there's only one header.
If the header is a raw one, DEFAULT will be returned.
[procedure] (header-contents NAME HEADERS) => LIST
[procedure] (get-value VECTOR) => value
[procedure] (get-params VECTOR) => ALIST
[procedure] (get-param PARAM VECTOR [DEFAULT]) => value
Procedures such as header-values are just shortcuts; these are the underlying procedures to query the raw contents of a header.
Header contents are lists of 2-element vectors; the first value containing the value for the header and the second value containing an alist with "parameters" for that header value. Parameters are attribute/value pairs that define further specialization of a header's value. For example, the accept header consists of a list of mime-types, which optionally can have a quality parameter that defines the preference for that mime-type. All parameter names are downcased symbols, just like header names.
Here are a few examples of how to retrieve info from headers:
;; This would be returned by a server and retrieved via (response-headers r):
(define example-headers
  (headers '((accept #(text/html ((q . 0.1))) #(text/xml ((q . 0.5))) text/plain)
             (allow HEAD GET)
             (content-type #(text/html ((charset . utf-8))))
             (max-forwards 2))))

;;; Basic procedures
(define c (header-contents 'accept example-headers))
c ; => (#(text/html ((q . 0.1))) #(text/xml ((q . 0.5))) #(text/plain ()))
(get-value (car c)) ; => text/html
(get-params (car c)) ; => ((q . 0.1))
(get-param 'q (car c)) ; => 0.1

;;; Simplified helpers
(header-values 'accept example-headers) ; => (text/html text/xml text/plain)
(header-values 'max-forwards example-headers) ; => (2)
(header-values 'nonexistent-header example-headers) ; => ()
;; This assumes there's only one value (returns the first)
(header-value 'max-forwards example-headers) ; => 2
(header-value 'nonexistent-header example-headers) ; => #f
(header-value 'nonexistent-header example-headers 'not-here) ; => not-here
;; Tricky:
(header-value 'accept example-headers) ; => text/html
;; This is tricky: this just returns the first, which is not the preferred
(header-params 'accept example-headers) ; => ((q . 0.1))
;; Quick access
(header-param 'charset 'content-type example-headers) ; => utf-8
Header types
The headers all have their own different types. Here follows a list of headers with their value types:
Header name | Value type | Example value |
---|---|---|
accept | List of mime-types (symbols), with optional q attribute indicating "quality" (preference level) | (text/html #(text/xml ((q . 0.1)))) |
accept-charset | List of charset-names (symbols), with optional q attribute | (utf-8 #(iso-8859-5 ((q . 0.1)))) |
accept-encoding | List of encoding-names (symbols), with optional q attribute | (gzip #(identity ((q . 0)))) |
accept-language | List of language-names (symbols), with optional q attribute | (en-gb #(nl ((q . 0.5)))) |
accept-ranges | List of range types acceptable (symbols). The spec only defines bytes and none. | (bytes) |
age | Age in seconds (number) | (3600) |
allow | List of methods that are allowed (symbols). | (GET POST PUT DELETE) |
authorization | Authorization information. This consists of a symbol identifying the authentication scheme, with scheme-specific attributes. basic is handled specially, as if it were a regular symbol with two attributes; username and password. | (#(basic ((username . "foo") (password . "bar"))) #(digest ((qop . auth) (username . "Mufasa") (nc . 1)))) |
cache-control | An alist of key/value pairs. If no value is applicable, it is #t | ((public . #t) (max-stale . 10) (no-cache . (age set-cookie))) |
connection | A list of connection options (symbols) | (close) |
content-disposition | A symbol indicating the disposition | (#(inline ((filename . "test.pdf")))) |
content-encoding | A list of encodings (symbols) applied to the entity-body. | (deflate gzip) |
content-language | The natural language(s) of the "intended audience" (symbols) | (de nl en-gb) |
content-length | The number of bytes (an exact number) in the entity-body | (10) |
content-location | A location that the content can be retrieved from (a uri-common object) | (<#uri-common# ...>) |
content-md5 | The MD5 checksum (a string) of the entity-body | ("12345ABCDEF") |
content-range | Content range (list with start- and endpoint and total) of the entity-body, if partially sent. Uses #f to encode a *. | ((25 120 1234) (25 #f 1234) (1 100 #f)) |
content-type | The mime type of the entity-body (a symbol) | (#(text/html ((charset . iso-8859-1)))) |
date | A timestamp (10-element vector, see string->time) at which the message originated. Important: Note that you will always need to supply (an empty list of) attributes, because otherwise it is ambiguous whether it's a vector with attribs or a bare timestamp. | (#(#(42 23 15 20 6 108 0 309 #f 0) ())) |
etag | An entity-tag (pair, car being either the symbol weak or strong, cdr being a string) that uniquely identifies the resource contents. | ((strong . "foo123")) |
expect | Expectations of the server's behaviour (alist of symbol-string pairs), possibly with parameters. | (#(((100-continue . #t)) ())) |
expires | Expiry timestamp (10-element vector, see string->time) for the entity. Also see the note for date | (#(#(42 23 15 20 6 108 0 309 #f 0) ())) |
from | The e-mail address (a string) of the human user who controls the client | ("info@example.com") |
host | The host to use (for virtual hosting). This is a pair of hostname and port. The port will be #f if the port should be the default one for the requested service. | (("example.com" . 8080)) |
if-match | Either '* (a wildcard symbol) or a list of entity-tags (pair, weak/strong symbol and unique entity identifier string). | ((strong . "foo123") (strong . "bar123")) |
if-modified-since | Timestamp (10-element vector, see string->time) which indicates since when the entity must have been modified. | (#(#(42 23 15 20 6 108 0 309 #f 0) ())) |
if-none-match | Either '* (a wildcard symbol) or a list of entity-tags (pair, weak/strong symbol and unique entity identifier string). | ((strong . "foo123") (strong . "bar123")) |
if-range | The range to request, if the entity was unchanged | TODO |
if-unmodified-since | A timestamp (10-element vector, see string->time) since which the entity must not have been modified | (#(#(42 23 15 20 6 108 0 309 #f 0) ())) |
last-modified | A timestamp (10-element vector, see string->time) when the entity was last modified | (#(#(42 23 15 20 6 108 0 309 #f 0) ())) |
location | A location (a URI object) to which to redirect | (<#uri-object ...>) |
max-forwards | The maximum number of proxies that can forward a request | (2) |
pragma | An alist of symbols containing implementation-specific directives. | ((no-cache . #t) (my-extension . my-value)) |
proxy-authenticate | Proxy authentication request. Equivalent to www-authenticate, for proxies. | (#(basic ((realm . "foo")) ) #(digest ((realm . "foo") (domain . (<#uri object> <#uri object>)) (qop . (auth auth-int)) (nonce . "012345abc")))) |
proxy-authorization | The answer to a proxy-authentication request. Equivalent to authorization, for proxies. | (#(basic ((username . "foo") (password . "bar"))) #(digest ((qop . auth) (username . "Mufasa") (nc . 1)))) |
range | The range of bytes (a pair of start and end) to request from the server. Uses #f when that end of the range is missing (i.e., in an open-ended range like -100 or 25-). | ((25 120) (25 #f)) |
referer | The referring URL (uri-common object) that linked to this one. | (<#uri-object ...>) |
retry-after | Timestamp (10-element vector, see string->time) after which to retry the request if unavailable now. | (#(#(42 23 15 20 6 108 0 309 #f 0) ())) |
server | List of products the server uses (list of 3-tuple lists of strings; product name, product version, comment. Version and/or comment may be #f). Note that this is a single header, with a list inside it! | ((("Apache" "2.2.9" "Unix") ("mod_ssl" "2.2.9" #f) ("OpenSSL" "0.9.8e" #f) ("DAV" "2" #f) ("mod_fastcgi" "2.4.2" #f) ("mod_apreq2-20051231" "2.6.0" #f))) |
te | Allowed transfer-encodings (symbols, with optional q attribute) for the response | (deflate #(gzip ((q . 0.2)))) |
trailer | Names of header fields (symbols) available in the trailer/after body | (range etag) |
transfer-encoding | The encodings (symbols) used in the body | (chunked) |
upgrade | Product names to which must be upgraded. (pairs of strings. The car is always a product name and the cdr is either a version or #f.) Note that these strings must be compared either case-insensitively or case-sensitively depending on the RFC you're implementing... | (("TLS" . "1.0") ("upgrade" . #f)) |
user-agent | List of products the user agent uses (list of 3-tuple lists of strings; product name, product version, comment. Version and/or comment may be #f). Note that this is a single header, with a list inside it! | ((("Mozilla" "5.0" "X11; U; NetBSD amd64; en-US; rv:1.9.0.3") ("Gecko" "2008110501" #f) ("Minefield" "3.0.3" #f))) |
vary | The names of headers that define variation in the resource body, to determine cachability (symbols) | (range etag) |
via | The intermediate hops through which the message is forwarded (strings) | TODO |
warning | Warning code for special status | TODO |
www-authenticate | If unauthorized, a challenge to authenticate (symbol, with attributes) | (#(basic ((realm . "foo"))) #(digest ((realm . "foo") (domain . (<#uri object> <#uri object>)) (qop . (auth auth-int)) (nonce . "012345abc")))) |
set-cookie | Cookies to set (name/value pair (both strings), with attributes) | (#(("foo" . "bar") ((max-age . 10) (port . '(80 8080))))) |
strict-transport-security | An alist of key/value pairs. Attributes are ignored. | ((max-age . 10) (includesubdomains . #t)) |
cookie | Cookies that were set (name/value string pair, with attributes) | (#(("foo" . "bar") ((version . 1) (path . #(uri path: (/ ""))) (domain . "foo.com")))) |
x-forwarded-for | The chain of IP addresses which each intermediate proxy will add to. Plain strings representing the IP-address or "unknown" when the proxy couldn't determine the client address or the option is disabled. Never accept this value without question; it can easily be spoofed! | ("192.168.1.2" "unknown" "123.456.789.012" "some-made-up-value-by-an-attacker") |
Any unrecognised headers are assumed to be raw multi-headers, and the entire header lines are put unparsed into a list, one entry per line.
Header parsers and unparsers
[parameter] (header-parsers [ALIST])
[parameter] (header-unparsers [ALIST])
The parsers and unparsers used to read and write header values can be customized with these parameters.
Each alist is keyed by header name (a symbol); the value of each entry is the (un)parser procedure for that header.
A header parser accepts the contents of the header (a string, without the leading header name and colon) and returns a list of vectors which represents the values of the header. For headers that are supposed to only have a single value, the last value in the list will be stored as the value (as determined by single-headers).
A header unparser accepts one argument: the header's contents (a vector). It should return a list of strings, each of which represents one line's worth of header contents (without the header name). For each entry, a header line will automatically be printed with the header name preceding it.
The parser driver will call update-header-contents! with the parser's result.
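As a sketch of registering a custom header parser, the example below handles a made-up X-Count header whose contents are a single number. The header name and the parser procedure are hypothetical, and CHICKEN 5 module names are assumed.

```scheme
(import intarweb (chicken string))

;; Hypothetical parser: takes the header contents as a string and
;; returns a list of 2-element vectors (value plus parameter alist).
(define (x-count-parser contents)
  ;; string-split drops surrounding whitespace before conversion.
  (list (vector (string->number (car (string-split contents " \t")))
                '())))

(define req
  (parameterize ((header-parsers
                  (cons (cons 'x-count x-count-parser)
                        (header-parsers))))
    (read-request
     (open-input-string "GET / HTTP/1.1\r\nX-Count: 42\r\n\r\n"))))

(header-value 'x-count (request-headers req)) ; => 42
```

Without the extra entry, the header would be treated as an unrecognised raw header and its contents kept as an unparsed string.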
[parameter] (header-parse-error-handler [HANDLER])
When there is an error parsing a given header, this parameter's procedure will be invoked.
HANDLER is a procedure accepting four values: the header name, the header contents, the current headers and the exception object. The procedure must return the new headers. Defaults to a procedure that simply returns the current headers. When an error occurs while parsing the header line itself (for example when a colon is missing between the header name and contents), the error will not be caught.
In such a case, servers should return a 400 Bad Request error and clients should error out. The reason that headers with malformed contents are ignored by default is that several servers and clients send header content values that are slightly off, even though the rest of the request is OK. In the interest of the "robustness principle", it's best to simply ignore these headers with "bad" content values.
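A handler can also record which headers failed to parse before dropping them. The sketch below is only an illustration: the handler name and the failed-headers list are made up, and whether a particular malformed value actually reaches the handler depends on how that header's parser treats it.

```scheme
(import intarweb)

;; Names of headers whose contents could not be parsed.
(define failed-headers '())

;; Takes the four documented arguments and must return the headers
;; to continue with; here the offending header is simply skipped.
(define (recording-handler name contents headers exn)
  (set! failed-headers (cons name failed-headers))
  headers)

(define req
  (parameterize ((header-parse-error-handler recording-handler))
    (read-request
     (open-input-string
      "GET / HTTP/1.1\r\nHost: example.com\r\nContent-Length: oops\r\n\r\n"))))
```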
[procedure] (replace-header-contents NAME CONTENTS HEADERS) => HEADERS
[procedure] (replace-header-contents! NAME CONTENTS HEADERS) => HEADERS
[procedure] (update-header-contents NAME CONTENTS HEADERS) => HEADERS
[procedure] (update-header-contents! NAME CONTENTS HEADERS) => HEADERS
The replace procedures replace any existing contents of the named header with new ones, the update procedures add these contents to the existing header. The procedures with a name ending in bang are linear update variants of the ones without the bang. The header contents have to be normalized to be a 2-element vector, with the first element being the actual value and the second element being an alist (possibly empty) of parameters/attributes for that value.
The update procedures append the value to the existing header if it is a multi-header, and act as a simple replace in the case of a single-header.
If you attempt to add a header that's not a proper 2-element vector, or if you try to combine raw and non-raw header values, this will raise an exception of type (exn http header-value).
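The difference between updating and replacing can be sketched as follows, using the non-destructive variants. CONTENTS is passed as a list of normalized 2-element vectors; the variable names are illustrative and CHICKEN 5 import syntax is assumed.

```scheme
(import intarweb)

(define h1 (headers '((accept text/html))))

;; accept may occur multiple times, so update appends to it.
(define h2
  (update-header-contents 'accept (list (vector 'text/xml '())) h1))

;; replace discards whatever was there before.
(define h3
  (replace-header-contents 'accept (list (vector 'text/plain '())) h2))

(header-values 'accept h1) ; => (text/html)
(header-values 'accept h2) ; => (text/html text/xml)
(header-values 'accept h3) ; => (text/plain)
```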
[parameter] (single-headers [LIST])
Whether a header is allowed once or multiple times in a request or response is determined by this parameter.
The value is a list of symbols that define header-names which are allowed to occur only once in a request/response.
[procedure] (http-name->symbol STRING) => SYMBOL
[procedure] (symbol->http-name SYMBOL) => STRING
These procedures convert strings containing the name of a header or attribute (parameter name) to symbols representing the same. The symbols are completely downcased. When converting this symbol back to a string, the initial letters of all the words in the header name or attribute are capitalized.
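For instance, the round trip described above can be sketched like this (CHICKEN 5 import syntax assumed):

```scheme
(import intarweb)

(http-name->symbol "Content-Type") ; => content-type
(symbol->http-name 'content-type)  ; => "Content-Type"
```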
[procedure] (remove-header name headers) => headers
[procedure] (remove-header! name headers) => headers
These two procedures remove all headers with the given name.
Header subparsers and subunparsers
Some headers are modular themselves. This means they need some way to extend them. This is done through subparsers and subunparsers.
[parameter] (authorization-param-subparsers [ALIST])
This is an alist of subtypes for the authorization header parser. A subparser of this kind accepts the string containing the header and an integer position in the string. It should parse from that position onwards, and return the parsed contents as an alist of header parameters. Usually, these are actually pseudo-parameters; they don't necessarily have to appear in parameter syntax in the header. The unparser should be configured to expect the same parameters and combine them back into a string, though.
This parameter defaults to:
`((basic . ,basic-auth-param-subparser)
  (digest . ,digest-auth-param-subparser))
[procedure] (basic-auth-param-subparser STR POS)
Parses STR at POS by extracting the username and password components from a base64-encoded string. These are returned in its first value as an alist with keys username and password. Its second return value is the position after which the next header value may begin.
[procedure] (digest-auth-param-subparser STR POS)
Parses STR at POS by reading the various components from a parameter list. These are returned in its first return value as an alist with keys nc, uri, qop and algorithm. Its second return value is the position after which the next header value may begin.
[parameter] (authorization-param-subunparsers [ALIST])
This is an alist of subtypes for the authorization header unparser. An unparser of this kind accepts an alist containing the parameters that it needs to unparse and should return a string containing the raw unparsed parameters only.
This parameter defaults to:
`((basic . ,basic-auth-param-subunparser)
  (digest . ,digest-auth-param-subunparser))
[procedure] (basic-auth-param-subunparser PARAMS)
This unparses the PARAMS alist into a base64-encoded string for basic authentication. It expects username and password parameters.
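The basic subparser and subunparser are meant to be inverses of each other. The sketch below round-trips a made-up username and password through both, assuming the subparser accepts the bare base64 string when started at position 0.

```scheme
(import (chicken base) intarweb)

;; Unparse illustrative credentials into the base64-encoded form.
(define encoded
  (basic-auth-param-subunparser '((username . "alice") (password . "secret"))))

;; Parse them back; the subparser returns two values (params and next position).
(define decoded
  (call-with-values
      (lambda () (basic-auth-param-subparser encoded 0))
    (lambda (params next-pos) params)))

(print encoded)
(print (alist-ref 'username decoded))
(print (alist-ref 'password decoded))
```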
[procedure] (digest-auth-param-subunparser PARAMS)This unparses the PARAMS alist into a string for digest authentication. It expects username, uri, realm, nonce, cnonce, qop, nc, response, opaque and algorithm parameters. The response parameter should be pre-encoded in the way digest auth expects (this is not done here because the MD5 sum of the contents may be required, which is not available to the parsers).
TODO: This will probably change in the future; the md5 can be passed and all the hard stuff can be done in intarweb.
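To illustrate the extension mechanism, here is a rough sketch that registers a subparser and subunparser for an additional authorization subtype. The `bearer` key, both procedure names, and the `token` pseudo-parameter are hypothetical and not part of intarweb; the sketch also assumes subtypes are looked up by downcased symbol, like `basic` and `digest`.

```scheme
(import (chicken base) intarweb)

;; Hypothetical subparser: treats everything from POS to the end of STR
;; as a single opaque token. Returns the parameter alist and the position
;; after the parsed data, as the subparser contract requires.
(define (bearer-auth-param-subparser str pos)
  (let ((end (string-length str)))
    (values `((token . ,(substring str pos end))) end)))

;; Hypothetical subunparser: turns the same pseudo-parameter back into
;; the raw string.
(define (bearer-auth-param-subunparser params)
  (alist-ref 'token params))

;; Register both by prepending to the current alists.
(authorization-param-subparsers
 (cons `(bearer . ,bearer-auth-param-subparser)
       (authorization-param-subparsers)))
(authorization-param-subunparsers
 (cons `(bearer . ,bearer-auth-param-subunparser)
       (authorization-param-subunparsers)))
```

Prepending keeps the default `basic` and `digest` entries in place. Use `parameterize` instead if the extension should only apply within a limited dynamic extent.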
Other procedures and parameters
[parameter] (http-line-limit [length])The maximum length of any line that's read by intarweb as part of the request/response cycle. This includes the request and response lines as well as the headers. If this is exceeded, an exception of type (exn http line-limit-exceeded) is raised.
You can set this to #f to disable this check. However, this will open up a resource consumption vulnerability (attackers can cause your application to blow up by letting it use all available memory).
Defaults to 4096.
[parameter] (http-header-limit [count])The maximum number of headers that may be sent as part of a request or response. If this is exceeded, an exception of type (exn http header-limit-exceeded) is raised.
You can set this to #f to disable this check. However, this will open up a resource consumption vulnerability (attackers can cause your application to blow up by letting it use all available memory).
Defaults to 64.
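Both limits are ordinary parameters, so they can be adjusted with `parameterize` and the resulting exceptions caught with `condition-case`. The following sketch reads a made-up request from a string port under two different line limits; `try-read` and `raw-request` are names invented for this example, and `read-request` is assumed to be intarweb's request reader.

```scheme
(import (chicken base) (chicken condition) intarweb)

;; A made-up request, used only for this sketch.
(define raw-request
  "GET /some/fairly/long/path HTTP/1.1\r\nHost: example.com\r\n\r\n")

;; Hypothetical helper: try to read the request under the given line limit
;; and report whether the limit was exceeded.
(define (try-read line-limit)
  (parameterize ((http-line-limit line-limit))
    (condition-case
        (begin (read-request (open-input-string raw-request)) 'ok)
      ((exn http line-limit-exceeded) 'line-too-long))))

(print (try-read 4096)) ; the request line fits within the limit
(print (try-read 10))   ; the request line is longer than 10 characters
```

The same pattern should apply to `http-header-limit`, with `(exn http header-limit-exceeded)` as the condition kinds to match.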
[procedure] (keep-alive? request-or-response)Returns #t when the given request or response object belongs to a connection that should be kept alive, #f if not. Remember that both parties must agree on whether the connection is to be kept alive or not; HTTP/1.1 defaults to keep alive unless a Connection: close header is sent, HTTP/1.0 defaults to closing the connection, unless a Connection: Keep-Alive header is sent.
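The defaults described above can be sketched with four hand-built request objects. The variable names are invented for this example, and the `headers` constructor is assumed from intarweb's header API.

```scheme
(import (chicken base) intarweb)

;; HTTP/1.1 without a Connection header: keep-alive by default.
(define req-11-default (make-request major: 1 minor: 1))
;; HTTP/1.1 with "Connection: close".
(define req-11-close
  (make-request major: 1 minor: 1
                headers: (headers '((connection close)))))
;; HTTP/1.0 without a Connection header: closes by default.
(define req-10-default (make-request major: 1 minor: 0))
;; HTTP/1.0 with "Connection: Keep-Alive".
(define req-10-keep-alive
  (make-request major: 1 minor: 0
                headers: (headers '((connection keep-alive)))))

(for-each (lambda (req) (print (and (keep-alive? req) #t)))
          (list req-11-default req-11-close
                req-10-default req-10-keep-alive))
```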
[parameter] (request-has-message-body? [predicate])This parameter holds a predicate which accepts a request object and returns #t when the request will have a message body. By default in HTTP/1.1, this is the case for all requests that have a content-length or transfer-encoding header. In HTTP/1.0, it is assumed to be true unless there's a Connection: Keep-Alive without a Content-Length header.
The parameter is useful for servers to determine whether to read a request body or not.
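Because the predicate is stored in a parameter, it has to be fetched first and then applied to the request. A small sketch, with `body-expected?` being a hypothetical wrapper and the requests built by hand:

```scheme
(import (chicken base) intarweb)

;; Hypothetical wrapper: fetch the current predicate and apply it.
(define (body-expected? req)
  ((request-has-message-body?) req))

;; HTTP/1.1 request without any body-related headers.
(define get-req (make-request method: 'GET))
;; HTTP/1.1 request announcing a 42-byte body.
(define post-req
  (make-request method: 'POST
                headers: (headers '((content-length 42)))))

(print (and (body-expected? get-req) #t))
(print (and (body-expected? post-req) #t))
```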
[procedure] (read-urlencoded-request-data request [max-length])Convenience procedure to read URLencoded request data (regular POST data; not multipart data!) from the given request object. It will return an alist, as would be returned by form-urldecode from the uri-common egg.
You must check yourself whether the request will actually carry request data. You can optionally use request-has-message-body? for this, but it is probably advisable to check the request type anyway.
This will read at most max-length bytes. If not specified, max-length defaults to the current value of http-urlencoded-request-data-limit. If this maximum is exceeded, an exception of type (exn http urlencoded-request-data-limit-exceeded) is raised.
[parameter] (http-urlencoded-request-data-limit [length])Set the default limit for request body data. Defaults to 4194304 (4MB).
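A minimal sketch of reading form data, using a hand-built request whose port is a string port holding a made-up body. In a real server the request would come from the request reader instead; the field names here are purely illustrative.

```scheme
(import (chicken base) intarweb)

;; Illustrative URL-encoded body.
(define body "name=Alice&lang=scheme")

;; Hand-built POST request reading from a string port.
(define req
  (make-request
   method: 'POST
   port: (open-input-string body)
   headers: (headers `((content-type application/x-www-form-urlencoded)
                       (content-length ,(string-length body))))))

;; Only attempt to read when the request type and headers suggest a body.
(define form-data
  (if (and (eq? 'POST (request-method req))
           ((request-has-message-body?) req))
      (read-urlencoded-request-data req)
      '()))

(print (alist-ref 'name form-data))
(print (alist-ref 'lang form-data))
```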
[parameter] (response-has-message-body-for-request? [predicate])This parameter holds a predicate which accepts two arguments: a response object and a request object. It returns #t when the response will have a message body for the given request. By default in HTTP/1.1, this is not the case for responses with a response-code of 204 and 304 or in the 1xx class, nor for HEAD requests. All other responses will have a message body.
The parameter is useful for clients to decide whether a message body will follow. Otherwise, trying to read one will probably result in an error or, in the case of HTTP pipelining, a synchronisation problem.
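The default behaviour described above can be sketched as follows. `body-follows?` is a hypothetical wrapper, and the sketch assumes intarweb's `make-response` accepts `code:` and `reason:` keywords.

```scheme
(import (chicken base) intarweb)

;; Hypothetical wrapper: fetch the current predicate and apply it.
(define (body-follows? resp req)
  ((response-has-message-body-for-request?) resp req))

(define get-req  (make-request method: 'GET))
(define head-req (make-request method: 'HEAD))

(define ok-resp         (make-response code: 200 reason: "OK"))
(define no-content-resp (make-response code: 204 reason: "No Content"))

(print (and (body-follows? ok-resp get-req) #t))         ; body expected
(print (and (body-follows? ok-resp head-req) #t))        ; HEAD: no body
(print (and (body-follows? no-content-resp get-req) #t)) ; 204: no body
```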
[procedure] (safe? request-or-method)Returns #t when the given request object or symbol (method) is a safe method. A method is defined to be safe when a request of this method will have no side-effects on the server. In practice this means that you can send this request from anywhere at any time and cause no damage.
Important: Quite a lot of software does not abide by these rules! This is not necessarily a reason to treat all methods as unsafe, however. In the words of the standard, "the user did not request the side-effects, so therefore cannot be held accountable for them". If a safe method produces side-effects, that is the fault of the server-side script's developer, who should fix the code.
[parameter] (safe-methods [symbols])A list of methods which are to be considered safe. Defaults to '(GET HEAD OPTIONS TRACE).
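A short sketch of querying method safety, both with a bare method symbol and with a request object, and of narrowing the list temporarily with `parameterize`:

```scheme
(import (chicken base) intarweb)

(print (and (safe? 'GET) #t))
(print (and (safe? 'POST) #t))
(print (and (safe? (make-request method: 'HEAD)) #t))

;; Within this dynamic extent only GET and HEAD count as safe.
(parameterize ((safe-methods '(GET HEAD)))
  (print (and (safe? 'OPTIONS) #t)))
```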
[procedure] (idempotent? request-or-method)Returns #t when the given request object or symbol (method) is an idempotent method. A method is defined to be idempotent when a series of identical requests of this method in succession causes the exact same side-effect as just one such request. In practice this means that you can safely retry such a request when an error occurs, for example.
Important: Just as with the safe methods, there is no guarantee that methods that should be idempotent really are idempotent in any given web application. Furthermore, a sequence of requests which each are individually idempotent is not necessarily idempotent as a whole. This means that you cannot replay requests starting anywhere in the chain. To be on the safe side, only retry the last request in the chain.
[parameter] (idempotent-methods [symbols])A list of methods which are to be considered idempotent. Defaults to '(GET HEAD PUT DELETE OPTIONS TRACE).
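One way a client might use this is to retry a failed request only when its method is idempotent. The helper below is a hypothetical sketch, not part of intarweb: `send-with-retry` and its `send` argument (a caller-supplied procedure that performs the request) are invented names.

```scheme
(import (chicken base) (chicken condition) intarweb)

;; Unique marker used to signal "try again" out of the handler.
(define retry-marker (list 'retry))

;; Hypothetical helper: call (SEND REQ), retrying on any exception up to
;; MAX-ATTEMPTS times, but only when REQ's method is idempotent.
;; Otherwise the exception is re-signalled.
(define (send-with-retry req send max-attempts)
  (let loop ((attempt 1))
    (let ((result
           (condition-case (send req)
             (e () (if (and (idempotent? req) (< attempt max-attempts))
                       retry-marker
                       (signal e))))))
      (if (eq? result retry-marker)
          (loop (add1 attempt))
          result))))
```

In line with the caveat above, a helper like this should only be applied to the last request in a chain.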
[procedure] (etag=? a b)Do the etag values a and b strongly match? That is, their car and cdr must be equal, and neither can have a car of weak (both must be strong).
[procedure] (etag=-weakly? a b)Do the etag values a and b weakly match? That is, their car and cdr must be equal. A car of weak is allowed.
[procedure] (etag-matches? etag matches)Does the etag strongly match any of the etags in the list matches? matches is a plain list of etag values, but it can also contain the special symbol *, which matches any etag.
[procedure] (etag-matches-weakly? etag matches)Does the etag weakly match any of the etags in the list matches? matches is a plain list of etag values, but it can also contain the special symbol *, which matches any etag.
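The four etag procedures can be sketched with a few made-up etag values. As described above, an etag value is a pair whose car is `strong` or `weak` and whose cdr is the tag string.

```scheme
(import (chicken base) intarweb)

;; Illustrative etag values.
(define strong-abc '(strong . "abc"))
(define weak-abc   '(weak . "abc"))
(define strong-xyz '(strong . "xyz"))

;; Strong comparison: both must be strong and equal.
(print (and (etag=? strong-abc '(strong . "abc")) #t))
(print (and (etag=? weak-abc '(weak . "abc")) #t))

;; Weak comparison: equal car and cdr suffice, weak is allowed.
(print (and (etag=-weakly? weak-abc '(weak . "abc")) #t))

;; Matching against a list, as found in If-Match style headers.
(print (and (etag-matches? strong-abc (list strong-xyz '*)) #t))
(print (and (etag-matches? strong-abc (list strong-xyz)) #t))
(print (and (etag-matches-weakly? weak-abc (list strong-xyz weak-abc)) #t))
```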
Changelog
- 3.0 - Port to CHICKEN 6
- 2.1.0 - Add proper parser for range headers and use the existing range parser to the content-range header for which it was intended. Add some tests for these. Thanks to "Druid of Luhn".
- 2.0.3 - Fix incorrect use of string-downcase!'s return value (fixes #1826)
- 2.0.2 - Export rfc1123-time->string for usage by intarweb consumers.
- 2.0.1 - Fix tests with CHICKEN 5.0.2 and higher.
- 2.0 - Port to CHICKEN 5
- 1.7 - Improve handling of errors while unparsing by showing which header failed, with its value (and the original exception message). Add support for raw encoding (thanks to Kooda for pointing out the necessity of this).
- 1.6.1 - Default request-has-message-body? returns #t for HTTP/1.0 non-keepalive connections.
- 1.6 - Various performance improvements, and support for chunked requests (rather than just responses).
- 1.5 - Fix a bug in serializing multipart/form-data: the field name should not be treated as a path name and stripped of its directory. Thanks to Ryan Senior for reporting this bug and testing the fix.
- 1.4 - Fix a bug in parsing of comments containing quoted strings (reported by Peter Danenberg), and fix exceptions raised by parsing of "Via" headers (reported by Evan Hanson).
- 1.3 - Change the defaults for http-line-limit and http-header-limit to be more accommodating for large header lines (but allow fewer lines, to keep the same total size). Thanks to Andy Bennett and Roel van der Hoorn for pointing this out. Add full list of HTTP status codes (Thanks to Roel van der Hoorn). Interpret the spec more strictly: Refuse to parse invalid "request" lines with URI references that aren't absolute paths, absolute URIs or asterisks. Thanks to Roel van der Hoorn once again.
- 1.2 - Fix request-has-message-body? predicate (thanks to Brian St. Pierre). Improve performance of read-string! on chunked ports. Return #f instead of raising "unknown protocol" when no request/response can be read at all. Make reading of header lines more consistent across different port types (and with master CHICKEN). Fix upgrade header parsing.
- 1.1 - Add HSTS support. Improve robustness of finish-request-body and finish-response-body somewhat.
- 1.0 - Disable HTTP/0.9 support for security reasons. Write request and response initial line in one burst, to prevent problems with network output. Add finish-request-body and finish-response-body procedures. Fix edge case in reading of chunked data when combined with peek-char (reported by "sz0ka" on IRC)
- 0.8 - Treat the charset attribute for Content-Type header as case-insensitive token for consistency with Accept-Charset header. Remove dependency on the regex egg and improve correctness of a few parsers. Add request-has-message-body? and response-has-message-body-for-request? procedures. Add parser for Content-Disposition header and improve unparser by adding date support (Thanks to Evan Hanson). Implement line length and header count limit checking. Add read-urlencoded-request-data with built-in limit check.
- 0.7 - Add trivial x-forwarded-for "parser". Add easier overriding of authorization headers through parameter instead of having to rewrite the entire parser. Add content-disposition unparser to accommodate the fact that filenames must always be quoted. Add http-status-codes parameter and status: key to update-response and make-response procedures, as well as response-status and response-status-set! procedures.
- 0.6 - Change path parameters on cookies to be uri-common objects
- 0.5 - Add regex requirement to make it work with Chicken 4.6.2+
- 0.4 - Don't unparse "attributes" (aka "params") by titlecasing their names. Don't default Host header's port to 80, but use #f
- 0.3 - Add rfc1123 to default unparser list for the 'date' header. Add etag procedures and if-match unparser. Change procedure signature for header unparsers
- 0.2 - Make cookie header parsers/unparsers preserve cookie name case. Change header unparse procedure semantics slightly.
- 0.1 - Initial version
License
Copyright (c) 2008-2024, Peter Bex
All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
- Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
- Neither the name of the author nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDERS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.