http-client
Description
Http-client is a high-level HTTP client library.
Author
Requirements
Requires the intarweb, sendfile and md5 extensions.
The openssl extension is optional as of 0.7; if it's not installed you'll get an error when trying to access an HTTPS URI.
Documentation
Main request procedures
[procedure] (call-with-response request writer reader)
This is the core http-client procedure, but it is also pretty low-level. It is only necessary to use this when you want the most control over the request/response cycle. Otherwise, you should use with-input-from-request, call-with-input-request or call-with-input-request*.
request is the request object that contains information about the request to perform. reader is a procedure that receives the response object and should read the entire response body (any leftover data will cause errors on subsequent requests with keepalive connections). writer is a procedure that receives the request object and should write the request body.
The writer should be prepared to be called several times; if the response is a redirect or some other status that indicates the server wants the client to perform a new request, the writer should be ready to write a request body for this new request. In case digest authentication with message integrity checking is used, writer is always invoked at least twice, once to determine the message digest of the request body and once to actually write the request body.
Returns three values: The result of the call to reader (or #f if there is no message body in the response), the request-uri of the last request and the response object. The request-uri is useful because this is to be used as the base uri of the document. This can differ from the initial request in the presence of redirects.
If there is no response body to read (as determined by intarweb's response-has-message-body-for-request?), the reader procedure is not invoked at all.
If successive requests cause more than max-redirect-depth redirect responses to occur, a condition of type (exn http redirect-depth-exceeded) is raised.
If the request's URI or the URI of a used proxy is of an unsupported type, a condition of type (exn http unsupported-uri-scheme) is raised (this can of course also occur when the initial URI is correct, but the server redirects to a URI with an unsupported scheme).
When the request requires authentication of an unsupported type, a condition of type (exn http unknown-authtype) is raised.
[procedure] (call-with-input-request uri-or-request writer reader)
This procedure is a convenience wrapper around call-with-response.
It is much less strict - uri-or-request can be an intarweb request object, but also a uri-common object or even a string with the URI in it, in which case a request object will be automatically constructed around the URI, using the GET method when writer is #f or the POST method when writer is not #f.
writer can be either #f (in which case nothing is written and the GET method chosen), a string containing the raw data to send, an alist, or a procedure that accepts a port and writes the request data to it. If you supply a procedure, do not forget to set the content-length header! In the other cases, whenever possible, the length is calculated and the header automatically set for you.
If you supplied an alist, the content-type header is automatically set to application/x-www-form-urlencoded unless there's an alist entry whose value is a list starting with the keyword file:, in which case multipart/form-data is used. See the examples for with-input-from-request below. If the data cannot be form-encoded, a condition of type (exn http formdata-error) is raised.
reader is either #f or a procedure which accepts a port and reads out the data. If there is data left in the port when the reader returns (or #f was supplied), this will be automatically discarded to avoid problems.
Returns three values: The result of the call to reader (or #f if there is no message body in the response), the request-uri of the last request and the response object. If the response code is not in the 200 class, it will raise a condition of type (exn http client-error), (exn http server-error) or (exn http unexpected-server-response), depending on the response code. This includes 404 not found (which is a client-error).
If there is no response body to read (as determined by intarweb's response-has-message-body-for-request?), the reader procedure is not invoked at all.
When posting multipart form data, the value of a file entry is a list of keyword-value pairs. The following keywords are recognised:
- file:
- This indicates the file to read from. Can be either a string or a port. This must be specified, everything else is optional.
- filename:
- This indicates the filename to pass on to the server. If not specified or #f, the file:'s string (or port-name in case of a port) will be used.
- headers:
- Additional headers to send for this entry (an intarweb headers-object).
If the URI argument is not a valid URI, a condition of type (exn http client-error bad-uri) will be raised.
If the writer is a list it is taken to be form-data, but if the encoding fails, a condition of type (exn http client-error form-data-error) will be raised.
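The conditions described above can be told apart with condition-case. The following is only a minimal sketch, assuming CHICKEN 4 (hence use); fetch-body and the symbols it returns are hypothetical names chosen for illustration and are not part of http-client.

```scheme
(use http-client extras)

;; Hypothetical helper: fetch URI-OR-REQUEST with GET and return the body as
;; a string, or a symbol naming the kind of failure.
(define (fetch-body uri-or-request)
  (condition-case
      (receive (body final-uri response)
          (call-with-input-request uri-or-request #f
                                   (lambda (port) (read-string #f port)))
        ;; BODY is #f when the response carried no message body
        body)
    ((exn http client-error) 'client-error)            ; e.g. 404
    ((exn http server-error) 'server-error)
    ((exn http unexpected-server-response) 'unexpected-response)
    ((exn http redirect-depth-exceeded) 'too-many-redirects)))
```

A call such as (fetch-body "http://localhost/echo-service") then yields either the response text or one of the symbols, so the caller can dispatch on the result without installing its own exception handler. Any other condition (for example a network error) is not caught by this sketch and propagates as usual.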
[procedure] (call-with-input-request* uri-or-request writer reader)
As call-with-input-request, except reader is passed two arguments: the input port and the complete intarweb response object (useful for when you want to inspect headers or other aspects of the response).
Please note that the port is not the same as the response-port from the response object: the port is delimited so that you can read until EOF. The response-port is the original underlying, unbounded port. If you do want to read from it, you must make sure to read no more than what's in the Content-Length header, if present. If the header is not present, it will either be a chunked port (which is implicitly delimited by intarweb) or the port will be closed by the remote end after it is consumed, so you can read until EOF in that case.
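As an illustration of the two-argument reader, the sketch below collects the status code and content type together with the body. It assumes CHICKEN 4 and intarweb's response-code, response-headers and header-value accessors; fetch-with-metadata is a hypothetical name.

```scheme
(use http-client intarweb extras)

;; Hypothetical helper: return a list of the response code, the content-type
;; header value (or #f when absent) and the body of a GET request.
(define (fetch-with-metadata uri-or-request)
  (receive (result final-uri response)
      (call-with-input-request*
       uri-or-request #f
       (lambda (port response)
         ;; PORT is the delimited port, so reading until EOF is safe here
         (list (response-code response)
               (header-value 'content-type (response-headers response) #f)
               (read-string #f port))))
    ;; RESULT is #f when there was no response body and the reader never ran
    result))
```

Reading from the delimited port rather than from the response-port keeps the connection reusable, which is the reason the reader receives the port as a separate argument.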
[procedure] (with-input-from-request uri-or-request writer-thunk reader-thunk)
Same as call-with-input-request, except when you pass a procedure as reader-thunk or writer-thunk it has to be a thunk (lambda of no arguments) instead of a procedure of one argument. The writer thunk is executed with the current output port bound to the request port, and the reader thunk with the current input port bound to the response port.
You can still pass #f for both or an alist or string for writer-thunk.
Examples
(use http-client)

;; Start with a simple GET request:
(with-input-from-request "http://wiki.call-cc.org/" #f read-string)
 => ;; [the chicken wiki page HTML contents]

;; Perform a POST of the key "test" with value "value" to an echo service:
(with-input-from-request "http://localhost/echo-service"
                         '((test . "value")) read-string)
 => "You posted: test=value"

;; Performing a PUT request (a less commonly used method) requires
;; constructing your request object manually:
(use intarweb uri-common) ; Required for "make-request" and "uri-reference"
(with-input-from-request
  (make-request method: 'PUT
                uri: (uri-reference "http://example.com/blabla"))
  (lambda () (print "Page contents"))
  read-string)

;; Performing a JSON PUT request furthermore requires you to
;; pass custom headers:
(let* ((uri (uri-reference "http://www.example.com/some/document"))
       (req (make-request method: 'PUT
                          uri: uri
                          headers: (headers '((content-type application/json))))))
  (with-input-from-request req "Contents of the document" read-string))

;; Finally, an example where we need to send an "attachment" (file)
;; We post a file to the echo-service from the first example.
;; This results in a multi-part POST request, for which we set
;; custom headers on the file (but not the main request)
(with-input-from-request "http://localhost/echo-service"
                         '((test . "value")
                           (test-file file: "/tmp/myfile"
                                      filename: "hello.txt"
                                      headers: ((content-type text/plain))))
                         read-string)
 => "You posted: test=value and a file named \"hello.txt\""
Request handling parameters
[parameter] (max-retry-attempts [number])
When a request fails because of an I/O or network problem (or simply because the remote end closed a persistent connection while we were doing something else), the library will try to establish a new connection and perform the request again. This parameter controls how many times this is allowed to be done. If #f, it will never give up.
Defaults to 1.
[parameter] (retry-request? [predicate])
This procedure is invoked when a retry should take place, to determine if it should take place at all. It should be a procedure accepting a request object and returning #f or a true value. If the value is true, the new request will be sent. Otherwise, the error that caused the retry attempt will be re-raised.
Defaults to idempotent?, from intarweb. This is because non-idempotent requests cannot be safely retried when it is unknown whether the previous request reached the server or not.
[parameter] (max-redirect-depth [number])
The maximum number of allowed redirects, or #f if there is no limit. Currently there's no automatic redirect loop detection algorithm implemented. If zero, no redirects will be followed at all.
Defaults to 5.
When the redirect limit is reached, call-with-response raises a condition of type (exn http redirect-depth-exceeded).
[parameter] (client-software [software-spec])
This is the names, versions and comments of the software packages that the client is using, for use in the user-agent header which is automatically added to each request.
Defaults to (("Chicken Scheme HTTP-client" VERSION #f)), where VERSION is the version of this egg.
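Since these are ordinary parameters, they can be changed for the duration of a single call with parameterize. The sketch below is an assumption-laden example for CHICKEN 4: get-or-head?, fetch-cautiously and the "example-fetcher" product name are hypothetical, and the particular limits are arbitrary.

```scheme
(use http-client intarweb extras)

;; Hypothetical predicate: only retry GET and HEAD requests.
(define (get-or-head? request)
  (and (memq (request-method request) '(GET HEAD)) #t))

;; Hypothetical helper: fetch URI with a stricter request-handling policy
;; that applies only while this call is running.
(define (fetch-cautiously uri)
  (parameterize ((max-retry-attempts 3)
                 (retry-request? get-or-head?)
                 (max-redirect-depth 2)
                 ;; one (name version comment) entry, like the default value
                 (client-software '(("example-fetcher" "0.1" #f))))
    (with-input-from-request uri #f read-string)))
```

Outside of fetch-cautiously the parameters keep their previous values, so other code using http-client in the same program is not affected.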
Connection management
This egg tries to re-use connections that are marked as keep-alive, to avoid unnecessary overhead in establishing new connections when making multiple requests to the same server. This is handled through a pool of idle connections from which the request procedures take the oldest active connection.
[parameter] (max-idle-connections [count])
This controls the maximum allowed idle connections at any given time. When a connection would be returned to the pool, the connection will be discarded instead, if the maximum is exceeded.
This value should always be well below the maximum number of available file descriptors for your operating system.
Defaults to 32.
[procedure] (close-connection! uri)
Close the connection to the server associated with the URI.
[procedure] (close-idle-connections!)
Close all remaining idle connections. Note that connections that are currently in use will still be returned to the connection pool after their requests finish!
[procedure] (close-all-connections!)
Deprecated alias for close-idle-connections!.
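A batch job that talks to a few servers and then goes quiet might cap the pool and empty it when done. This is a sketch only, assuming CHICKEN 4; fetch-all is a hypothetical helper and the limit of 4 is arbitrary.

```scheme
(use http-client extras)

;; Hypothetical helper: fetch every URI in URIS and return the bodies, then
;; release whatever connections are still sitting idle in the pool.
(define (fetch-all uris)
  (parameterize ((max-idle-connections 4))
    (let ((bodies
           (map (lambda (uri)
                  ;; keep only the first of the three returned values
                  (receive (body . rest)
                      (with-input-from-request uri #f read-string)
                    body))
                uris)))
      (close-idle-connections!)
      bodies)))
```

Requests to the same server inside fetch-all can share a keep-alive connection; the final close-idle-connections! call only affects connections that are no longer in use.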
Setting up custom server connections
[procedure] (default-server-connector uri proxy)
The default value of the server-connector parameter. This procedure creates a connection to the remote end for the given uri (a uri-common object) and returns two values: an input port and an output port.
If proxy is not #f but a uri-common object, it will connect to that instead.
This connector supports plain http connections, and https if the openssl egg can be loaded (which it attempts to do on the fly).
[parameter] (server-connector [connector])
This parameter holds a procedure which is invoked to establish a connection for a URI.
The procedure should accept two uri-common objects as arguments: the first indicates the URI for which the connection is to be made and the second indicates the proxy through which the connection should be made, or #f if a direct connection should be made to the first URI's host and port. It should return two values: an input port and an output port corresponding to the connection.
This can be used for nonstandard or complex connections, like for example connecting to UNIX domain sockets or for supplying SSL/TLS client certificates.
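The simplest custom connector is one that wraps the default. The sketch below, assuming CHICKEN 4, logs every connection attempt and then delegates; logging-server-connector is a hypothetical name, and the log format is arbitrary.

```scheme
(use http-client uri-common extras)

;; Hypothetical connector: report each connection attempt on standard error,
;; then let http-client's default connector do the actual work.
(define (logging-server-connector uri proxy)
  (fprintf (current-error-port) "connecting to ~A~%"
           (uri->string (or proxy uri)))
  ;; Tail call, so both returned ports are passed straight through
  (default-server-connector uri proxy))

;; Register it for all subsequent requests
(server-connector logging-server-connector)
```

Because a connector only has to return an input port and an output port, the same shape works for any transport that can produce such a pair.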
SSL client certificate authentication example
This is how you would make a connection to an HTTPS server while supplying a client certificate. Many thanks to Ryan Senior for the initial code.
(use http-client uri-common openssl)

(define (make-ssl-context/client-cert ca-cert-path cert-path key-path)
  (let ((ssl-ctx (ssl-make-client-context 'tls)))
    ;; Set up so the server's certificate can and will be verified
    (ssl-load-suggested-certificate-authorities! ssl-ctx ca-cert-path)
    (ssl-load-verify-root-certificates! ssl-ctx ca-cert-path)
    (ssl-set-verify! ssl-ctx #t)

    ;; Now load the client certificate
    (ssl-load-certificate-chain! ssl-ctx cert-path)
    (ssl-load-private-key! ssl-ctx key-path)

    ;; Return the object we created
    ssl-ctx))

;; This creates server connectors associated with an SSL context
(define (make-ssl-server-connector/context ssl-ctx)
  (lambda (uri proxy)
    (let ((remote-end (or proxy uri)))
      (if (eq? 'https (uri-scheme remote-end))
          ;; Only use ssl-connect for HTTPS connections
          (ssl-connect (uri-host remote-end)
                       (uri-port remote-end)
                       ssl-ctx)
          ;; Use http-client's default otherwise
          (default-server-connector uri proxy)))))

;; Now, make a context and matching connector, and register it
(let ((ssl-ctx (make-ssl-context/client-cert
                "/etc/ssl/certs/ca.crt"
                "/etc/ssl/certs/my-client-cert.crt"
                "/etc/ssl/private/my-client-cert.key")))
  (server-connector (make-ssl-server-connector/context ssl-ctx)))
Now, all requests made with any of the http-client procedures would authenticate with a server using the configured client certificate.
Cookie management
http-client's cookie management is supposed to be as automatic and DWIMmy as possible. This means it will write any cookie as instructed by a server and all stored cookies are automatically sent back to the server upon a new request.
However, in some cases you may want to take control of how cookies are stored.
The API described here should be considered unstable and it may change dramatically when someone comes up with a better way to handle cookies.
[procedure] (get-cookies-for-uri uri)
Fetch a list of all cookies which ought to be sent to the given URI. Cookies are vectors of two elements: a name/value pair and an alist of attributes. In other words, these are the exact same values you can put in a cookie header.
[procedure] (store-cookie! cookie-info set-cookie)
Store a cookie in the cookiejar corresponding to the Set-Cookie header given by set-cookie. This overwrites any cookie that is equal to this cookie, as defined by RFC 2965, section 3.3.3. Practically, this means that when the cookie's name, domain and path are equal to an existing one, it will be overwritten by the new one. These attributes are taken from the cookie-info alist and expected to be there.
Generally, attributes should be taken from set-cookie, but if missing they ought to be taken from the request URI that responded with the set-cookie.
(store-cookie! `((path . ,(make-uri path: '(/ "")))
                 (domain . "some.host.com")
                 (secure . #t))
               `#(("COOKIE_NAME" . "cookie-value")
                  ((path . ,(make-uri path: '(/ ""))))))
[procedure] (delete-cookie! cookie-name cookie-info)
Removes any cookie from the cookiejar that is equal to the given cookie (again, in the sense of RFC 2965, section 3.3.3). The cookie-name must match and the path and domain values for the cookie-info alist must match.
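To inspect what is in the cookiejar, the vectors returned by get-cookies-for-uri can be taken apart with vector-ref. The sketch below assumes CHICKEN 4 and that cookie names and values are strings, as in the store-cookie! call above; cookie->string and cookie-strings-for are hypothetical helpers.

```scheme
(use http-client uri-common)

;; Hypothetical helper: render one cookie vector as "NAME=VALUE".
;; Element 0 is the name/value pair; element 1 (unused here) is the
;; attribute alist.
(define (cookie->string cookie)
  (let ((name+value (vector-ref cookie 0)))
    (string-append (car name+value) "=" (cdr name+value))))

;; Hypothetical helper: list the cookies that would be sent to URI-STRING.
(define (cookie-strings-for uri-string)
  (map cookie->string
       (get-cookies-for-uri (uri-reference uri-string))))
```

For instance, (cookie-strings-for "http://some.host.com/") lists whatever the jar currently holds for that host, which can help when debugging a login flow.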
Authentication support
When a 401 Unauthorized response is received, in most interactive clients, the user is normally asked to authenticate. To support this type of interaction, http-client offers the following parameter:
[parameter] (determine-username/password [HANDLER])
The procedure in this parameter is called whenever the remote host requests authentication via a 401 Unauthorized response.
The HANDLER is a procedure of two arguments; the URI for the resource currently being requested and the realm (a string) which wants credentials. The procedure should return two string values: the username and the password to use for authentication.
The default value is a procedure which extracts the username and password components from the URI.
For proxy authentication support, see determine-proxy-username/password in the next section.
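A non-interactive program might answer the handler from a fixed table instead of prompting. The following sketch assumes CHICKEN 4; the credentials table, the realm names and make-realm-handler are all hypothetical and only illustrate the two-argument, two-value protocol described above.

```scheme
(use http-client)

;; Hypothetical credentials table, keyed by authentication realm.
(define credentials
  '(("Staging area" . ("alice" . "s3cret"))
    ("Admin"        . ("root"  . "hunter2"))))

;; Build a handler that returns username and password for a known realm as
;; two values, and defers to FALLBACK for any other realm.
(define (make-realm-handler fallback)
  (lambda (uri realm)
    (let ((entry (assoc realm credentials)))
      (if entry
          (values (cadr entry) (cddr entry))
          (fallback uri realm)))))

;; Install it, keeping the previous handler (by default the one that reads
;; credentials from the URI) as the fallback.
(determine-username/password
 (make-realm-handler (determine-username/password)))
```

Chaining to the previously installed handler means URIs that embed a username and password continue to work for realms missing from the table.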
[parameter] (http-authenticators [AUTHENTICATORS])
This parameter allows for pluggable authentication schemes. AUTHENTICATORS is an alist mapping authentication scheme name to a procedure of 7 arguments:
(lambda (response response-header new-request request-header uri realm writer) ...)
Here, response is the response object, response-header is the name of the response header which required authentication - a symbol which is either www-authenticate or proxy-authenticate.
new-request is the request that will be sent next, to be populated with additional headers by the authenticator procedure, and request-header is the name of the request header which is expected to be provided and supplied with extra details by the authenticator - also a symbol, which is either authorization or proxy-authorization.
uri is the URI which was requested when the authorization was demanded (in case of www-authenticate, the protected resource) and realm is the authentication realm (a string).
Finally writer is the writer procedure passed by the user or fabricated by call-with-input-request based on the user's form arguments. It's always a procedure accepting a request object. This is only needed when full-request authentication is desired, to obtain a request body.
Proxy support
http-client has support for sending requests through proxy servers.
[parameter] (determine-proxy [HANDLER])
Whenever a request is sent, the library invokes the procedure stored in this parameter to determine through what proxy to send the request, if any.
The HANDLER procedure receives one argument, the URI about to be requested, and returns either a uri-common absolute URI object representing the proxy or #f if no proxy should be used.
The URI's path and query, if present, are ignored; only the scheme and authority (host, port, username, password) are used.
The default value of this parameter is determine-proxy-from-environment.
(determine-proxy
 (lambda (url)
   (uri-reference "http://127.0.0.1:8888/")))
If you just want to disable proxy support, you can do:
(determine-proxy (constantly #f)) ; From unit data-structures
[procedure] (determine-proxy-from-environment URI)
This procedure implements the common behaviour of HTTP software under UNIX:
- First it checks if the requested URI's host (or an asterisk) is listed in the NO_PROXY environment variable (if suffixed with a port number, the port is also compared). If a match is found, no proxy is used.
- Then it will check if the $(protocol)_proxy or the $(PROTOCOL)_PROXY variable (in that order) are set. If so, that's used. protocol here actually means "scheme", so the URI's scheme is used, suffixed with _proxy. This means http_proxy is used for HTTP requests and https_proxy is used for HTTPS requests, but see the next point.
- If the scheme is http and the environment variable REQUEST_METHOD is present, CGI_HTTP_PROXY is used instead of HTTP_PROXY to prevent a "httpoxy" attack. This makes the assumption that REQUEST_METHOD is set because the library is being used in a CGI script.
- If there's still no match, it looks for all_proxy or ALL_PROXY, in that order. If one of these environment variables is set, that value is used as a fallback proxy.
- Finally, if none of these checks resulted in a proxy URI, no proxy will be used.
Some UNIX software expects plain hostnames or hostname port combinations separated by colons, but (currently) this library expects full URIs, like most modern UNIX programs.
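A custom determine-proxy handler can add its own rules and still fall back on the environment lookup described above. The sketch below assumes CHICKEN 4 and srfi-13's string-suffix?; the ".internal.example" suffix and internal-host? are hypothetical.

```scheme
(use http-client uri-common srfi-13)

;; Hypothetical rule: hosts under this suffix are always reached directly.
(define internal-suffix ".internal.example")

(define (internal-host? uri)
  (let ((host (uri-host uri)))
    (and host (string-suffix? internal-suffix host))))

(determine-proxy
 (lambda (uri)
   (if (internal-host? uri)
       #f                                      ; no proxy: connect directly
       (determine-proxy-from-environment uri))))
```

This keeps the usual http_proxy and NO_PROXY behaviour for everything else, while the internal rule lives in the program rather than in each user's environment.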
[parameter] (determine-proxy-username/password [HANDLER])
The procedure in this parameter is called whenever the proxy requests authentication via a 407 Proxy Authentication Required response. This basically works the same as authentication against an origin server.
The HANDLER is a procedure of two arguments; the URI for the proxy currently being used and the realm (a string) which wants credentials. The procedure should return two string values: the username and the password to use for authentication.
The default value is a procedure which extracts the username and password components from the proxy's URI.
Changelog
- 0.11 Add max-idle-connections to avoid FD exhaustion (thanks to Alaric for pointing out this issue). Add type and value check for uri argument (thanks to Lemonman for pointing this out). Fix multipart sending of port-based files. Add basic test suite. Fix 303 redirect switch to GET method. Use chunked encoding when using a custom writer procedure and there's no content-length header.
- 0.10 Do not read HTTP_PROXY if REQUEST_METHOD is present (running in a CGI script), to prevent "httpoxy" attack (CVE-2016-6287).
- 0.9 Add support for custom connector procedures. Thanks to Ryan Senior for suggesting support for https client certificates, which this makes possible.
- 0.8 Fix bug in multipart/form-data file uploads with non-file components in the form data causing a crash. Thanks to Ryan Senior for reporting the bug and testing the fix.
- 0.7.2 Add call-with-input-request*. Thanks to Mario Goulart for suggesting this.
- 0.7.1 Fix delimited port handling of peek-char which caused mysterious openssl errors. Thanks to Mario Goulart for a reproducible test case.
- 0.7 Reduce CPU usage by implementing custom read-string! and read-line procedures in make-delimited-input-port. Improved error reporting (show URI as string, and always include it in error messages). Gracefully handle premature disconnection by retrying (as per RFC2616, 8.2.4). Make openssl an optional dependency to make it easier to install on Windows.
- 0.6.1 Work around a bug in read-string! in CHICKEN core which caused random errors.
- 0.6 Provide a proper condition when encountering unsupported URI schemes (thanks to Christian Kellermann). Fix response body reading in error situations (thanks to Andy Bennett). Update request writer to use new finish-request-body from intarweb 1.0.
- 0.5.1 Restore compatibility with message-digest and string-utils egg.
- 0.5 Improve detection of dropped connections (prevents unnecessary "connection reset" exceptions from propagating into the program). Simplify interface by switching to POST when a writer is given to with-input-from-request and call-with-input-request. Add support for multipart forms (file upload). Fix error in case of missing username when authorization was required (introduced by version 0.4.2). Put loop call in tail position (thanks to Felix). Automatically discard remaining data on the input port, if any, to avoid problems on subsequent requests. Add rudimentary support for parameterizable authentication schemes.
- 0.4.2 Allow missing passwords in URIs for authentication
- 0.4.1 Fix connection status check so when the remote end closed the connection we don't try to read from it anymore (thanks to Daishi Kato and Thomas Hintz)
- 0.4 Fix redirection code on 303, and off-by-1 mistake in redirects count (thanks to Moritz Heidkamp). Add arguments to exn objects (thanks to Christian Kellermann). Also accept an empty alist for POST data. Fix URI path comparisons in cookies (thanks to Daishi Kato)
- 0.3 Fixed handling of missing Path parameters in set-cookie headers. Reported by Hugo Arregui. Improve set-cookie handling by only passing Path and Domain when matching Set-Cookie header included those parameters.
- 0.2 Added proxy support and many many bugfixes
- 0.1 Initial version
License
Copyright (c) 2008-2017, Peter Bex
Parts copyright (c) 2000-2004, Felix L. Winkelmann
All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

- Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
- Neither the name of the author nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDERS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.