[[tags: egg]]

== http-client

[[toc:]]

=== Description

Http-client is a high-level HTTP client library.

=== Author

[[/users/peter-bex|Peter Bex]]

=== Requirements

Requires the [[intarweb]], [[sendfile]] and [[md5]] extensions.

The [[openssl]] extension is optional as of 0.7; if it's not installed
you'll get an error when trying to access an HTTPS URI.

=== Documentation

==== Main request procedures

<procedure>(call-with-response request writer reader)</procedure>

This is the core http-client procedure, but it is also pretty
low-level.  It is only necessary to use this when you want the most
control over the request/response cycle.  Otherwise, you should use
{{with-input-from-request}}, {{call-with-input-request}} or
{{call-with-input-request*}}.

{{request}} is the request object that contains information about the
request to perform.  {{reader}} is a procedure that receives the
response object and should read the ''entire'' response body (any
leftover data will cause errors on subsequent requests with keepalive
connections).  {{writer}} is a procedure that receives the request
object and should write the request body.

The {{writer}} should be prepared to be called several times; if the
response is a redirect or some other status that indicates the server
wants the client to perform a new request, the writer should be ready
to write a request body for this new request.  In case digest
authentication with message integrity checking is used, {{writer}} is
always invoked at least twice: once to determine the message digest of
the request body and once to actually write the request body.

Returns three values: the result of the call to {{reader}} (or {{#f}}
if there is no message body in the response), the request-uri of
the last request and the response object.  The request-uri is useful
because it should be used as the base URI of the document; it can
differ from the URI of the initial request in the presence of redirects.

If there is no response body to read (as determined by intarweb's
{{response-has-message-body-for-request?}}), the {{reader}} procedure
is not invoked at all.

If successive requests cause more than {{max-redirect-depth}} redirect
responses to occur, a condition of type
{{(exn http redirect-depth-exceeded)}} is raised.

If the request's URI or the URI of a used proxy is of an unsupported
type, a condition of type {{(exn http unsupported-uri-scheme)}} is
raised (this can of course also occur when the initial URI is correct,
but the server redirects to a URI with an unsupported scheme).

When the request requires authentication of an unsupported type, a
condition of type {{(exn http unknown-authtype)}} is raised.

<procedure>(call-with-input-request uri-or-request writer reader)</procedure>

This procedure is a convenience wrapper around {{call-with-response}}.

It is much less strict: {{uri-or-request}} can be an [[intarweb]]
request object, but also a uri-common object or even a string with
the URI in it, in which case a request object will be automatically
constructed around the URI, using the {{GET}} method when {{writer}}
is {{#f}} or the {{POST}} method when {{writer}} is not {{#f}}.

{{writer}} can be either {{#f}} (in which case nothing is written and
the {{GET}} method chosen), a string containing the raw data to send,
an alist, or a procedure that accepts a port and writes the
request data to it.  If you supply a procedure, do not forget to set
the {{content-length}} header!  In the other cases, whenever possible,
the length is calculated and the header automatically set for you.

If you supplied an alist, the {{content-type}} header is automatically
set to {{application/x-www-form-urlencoded}} unless there's an alist
entry whose value is a list starting with the keyword {{file:}}, in
which case {{multipart/form-data}} is used.  See the examples for
{{with-input-from-request}} below.  If the data cannot be form-encoded,
a condition of type {{(exn http formdata-error)}} is raised.

{{reader}} is either {{#f}} or a procedure which accepts a port and
reads out the data.  If there is data left in the port when the reader
returns (or {{#f}} was supplied), this will be automatically discarded
to avoid problems.

Returns three values: the result of the call to {{reader}} (or {{#f}}
if there is no message body in the response), the request-uri of the
last request and the response object.  If the response code is not in
the 200 class, it will raise a condition of type
{{(exn http client-error)}}, {{(exn http server-error)}} or
{{(exn http unexpected-server-response)}}, depending on the response
code.  This includes {{404 not found}} (which is a {{client-error}}).

If there is no response body to read (as determined by intarweb's
{{response-has-message-body-for-request?}}), the {{reader}} procedure
is not invoked at all.

When posting multipart form data, the value of a file entry is a list
of keyword-value pairs.  The following keywords are recognised:

; {{file:}} : This indicates the file to read from.  Can be either a string or a port. This ''must'' be specified, everything else is optional.
; {{filename:}} : This indicates the filename to pass on to the server.  If not specified or {{#f}}, the {{file:}}'s string (or port-name in case of a port) will be used.
; {{headers:}} : Additional headers to send for this entry (an [[intarweb]] headers-object).

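As a rough sketch of how the three return values and the error
conditions of {{call-with-input-request}} fit together: the URL is a
placeholder, and the reader wraps {{read-string}} in a lambda because
it is passed the port as its argument.  It is assumed here that the
returned request-uri is a uri-common object, so that {{uri->string}}
can be applied to it.

<enscript highlight="scheme">
(use http-client uri-common extras)

(condition-case
    (receive (body uri response)
        (call-with-input-request "http://example.com/some/page" #f
                                 (lambda (port) (read-string #f port)))
      ;; uri is the URI of the last request (after any redirects)
      (print "Fetched " (uri->string uri))
      body)
  ((exn http client-error)
   ;; Raised for responses such as 404 not found
   (print "The server reported a client error")
   #f)
  ((exn http server-error)
   (print "The server reported a server error")
   #f))
</enscript>
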
<procedure>(call-with-input-request* uri-or-request writer reader)</procedure>

As {{call-with-input-request}}, except {{reader}} is passed two
arguments: the input port and the complete intarweb response object
(useful for when you want to inspect headers or other aspects of the
response).

Please note that the port is '''not''' the same as the
{{response-port}} from the response object: the port is delimited so
that you can read until {{EOF}}.  The {{response-port}} is the
original underlying, unbounded port.  If you do want to read from it,
you must make sure to read no more than what's in the
{{Content-Length}} header, if present.  If the header is not present,
it will either be a chunked port (which is implicitly delimited by
intarweb) or the port will be closed by the remote end after it is
consumed, so you can read until EOF in that case.

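A minimal sketch of a two-argument reader for
{{call-with-input-request*}}, assuming the intarweb accessors
{{response-code}}, {{response-headers}} and {{header-value}}; the URL
is a placeholder.

<enscript highlight="scheme">
(use http-client intarweb extras)

(call-with-input-request*
 "http://example.com/some/page" #f
 (lambda (port response)
   ;; Inspect the status code and a header before reading the body
   (print "Status: " (response-code response))
   (print "Content-Type: "
          (header-value 'content-type (response-headers response)))
   ;; The port passed to the reader is delimited, so reading until
   ;; EOF is safe here
   (read-string #f port)))
</enscript>
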
<procedure>(with-input-from-request uri-or-request writer-thunk reader-thunk)</procedure>

Same as {{call-with-input-request}}, except when you pass a procedure
as {{reader-thunk}} or {{writer-thunk}} it has to be a thunk (lambda
of no arguments) instead of a procedure of one argument.  The writer
thunk is executed with the current output port bound to the request
port, and the reader thunk with the current input port bound to the
response port.

You can still pass {{#f}} for both or an alist or string for
{{writer-thunk}}.

===== Examples

<enscript highlight="scheme">
(use http-client)

;; Start with a simple GET request:
(with-input-from-request "http://wiki.call-cc.org/" #f read-string)
 => ;; [the chicken wiki page HTML contents]

;; Perform a POST of the key "test" with value "value" to an echo service:
(with-input-from-request "http://localhost/echo-service"
                         '((test . "value")) read-string)
 => "You posted: test=value"

;; Performing a PUT request (a less commonly used method) requires
;; constructing your request object manually:

(use intarweb uri-common)  ; Required for "make-request" and "uri-reference"

(with-input-from-request
  (make-request method: 'PUT
                uri: (uri-reference "http://example.com/blabla"))
  (lambda () (print "Page contents"))
  read-string)

;; Performing a JSON PUT request furthermore requires you to
;; pass custom headers:
(let* ((uri (uri-reference "http://www.example.com/some/document"))
       (req (make-request method: 'PUT
                          uri: uri
                          headers: (headers '((content-type application/json))))))
  (with-input-from-request req "Contents of the document" read-string))

;; Finally, an example where we need to send an "attachment" (file)
;; We post a file to the echo-service from the first example.
;; This results in a multi-part POST request, for which we set
;; custom headers on the file (but not the main request)
(with-input-from-request "http://localhost/echo-service"
                         '((test . "value")
                           (test-file file: "/tmp/myfile" filename: "hello.txt"
                                      headers: ((content-type text/plain))))
                         read-string)
 => "You posted: test=value and a file named \"hello.txt\""
</enscript>


==== Request handling parameters

<parameter>(max-retry-attempts [number])</parameter>

When a request fails because of an I/O or network problem (or simply
because the remote end closed a persistent connection while we were
doing something else), the library will try to establish a new
connection and perform the request again.  This parameter controls how
many times this is allowed to be done.  If {{#f}}, it will never give up.

Defaults to 1.

<parameter>(retry-request? [predicate])</parameter>

This procedure is invoked when a retry should take place, to determine
if it should take place at all.  It should be a procedure accepting a
request object and returning {{#f}} or a true value.  If the value is
true, the new request will be sent.  Otherwise, the error that caused
the retry attempt will be re-raised.

Defaults to {{idempotent?}}, from [[intarweb]].  This is because
non-idempotent requests cannot be safely retried when it is unknown
whether the previous request reached the server or not.

<parameter>(max-redirect-depth [number])</parameter>

The maximum number of allowed redirects, or {{#f}} if there is no
limit.  Currently there's no automatic redirect loop detection
algorithm implemented.  If zero, no redirects will be followed at all.

Defaults to 5.

When the redirect limit is reached, {{call-with-response}} raises a
condition of type {{(exn http redirect-depth-exceeded)}}.

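These parameters can be overridden for a limited scope with
{{parameterize}}.  The following is only a sketch: the URL and the
chosen limits are arbitrary, and the custom {{retry-request?}}
predicate (retry only {{GET}} requests) is an illustration rather than
a recommendation.

<enscript highlight="scheme">
(use http-client intarweb extras)

(parameterize ((max-retry-attempts 3)
               (max-redirect-depth 2)
               ;; Hypothetical policy: only retry GET requests
               (retry-request? (lambda (request)
                                 (eq? 'GET (request-method request)))))
  (condition-case
      (with-input-from-request "http://example.com/" #f read-string)
    ((exn http redirect-depth-exceeded)
     (print "Too many redirects, giving up")
     #f)))
</enscript>
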
<parameter>(client-software [software-spec])</parameter>

A list of the names, versions and comments of the software packages
that the client is using, for use in the {{user-agent}} header, which
is automatically added to each request.

Defaults to {{(("Chicken Scheme HTTP-client" VERSION #f))}}, where
{{VERSION}} is the version of this egg.


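For instance, an application could put its own product in front of
the default entry.  The name {{"my-crawler"}} and version {{"1.2"}}
are made up for this sketch, and each entry is assumed to follow the
same name/version/comment layout as the default value.

<enscript highlight="scheme">
(use http-client)

;; Prepend our own (hypothetical) product to the existing list;
;; #f means "no comment" for this entry
(client-software
 (cons '("my-crawler" "1.2" #f)
       (client-software)))
</enscript>
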
==== Connection management

<procedure>(close-connection! uri)</procedure>

Close the connection to the server associated with the URI.

<procedure>(close-all-connections!)</procedure>

Close all connections to all servers.

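A small sketch of explicit connection cleanup, assuming that
{{close-connection!}} accepts a uri-common object; the URL is a
placeholder.

<enscript highlight="scheme">
(use http-client uri-common extras)

(define page-uri (uri-reference "http://example.com/"))

;; Perform a request; the connection may be kept alive afterwards
(with-input-from-request page-uri #f read-string)

;; Close only the connection belonging to this server...
(close-connection! page-uri)

;; ...or close everything, for example just before the program exits
(close-all-connections!)
</enscript>
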
==== Setting up custom server connections

<procedure>(default-server-connector uri proxy)</procedure>

The default value of the {{server-connector}} parameter.  This
procedure creates a connection to the remote end for the given {{uri}}
(a [[uri-common]] object) and returns two values: an input port and
an output port.

If {{proxy}} is not {{#f}} but a [[uri-common]] object, it will
connect to that, instead.

This connector supports plain {{http}} connections, and {{https}} if
the {{openssl}} egg can be loaded (which it attempts to do on the
fly).

<parameter>(server-connector [connector])</parameter>

This parameter holds a procedure which is invoked to establish a
connection for a URI.

The procedure should accept two uri-common objects as arguments: the
first indicates the URI for which the connection is to be made and the
second indicates the proxy through which the connection should be
made, or {{#f}} if a direct connection should be made to the first
URI's host and port.

This can be used for nonstandard or complex connections, like for
example connecting to UNIX domain sockets or for supplying SSL/TLS
client certificates.

===== SSL client certificate authentication example

This is how you would make a connection to an HTTPS server while
supplying a client certificate.  Many thanks to Ryan Senior for the
initial code.

<enscript highlight="scheme">
(use http-client uri-common openssl)

(define (make-ssl-context/client-cert ca-cert-path cert-path key-path)
  (let ((ssl-ctx (ssl-make-client-context 'tls)))

    ;; Set up so the server's certificate can and will be verified
    (ssl-load-suggested-certificate-authorities! ssl-ctx ca-cert-path)
    (ssl-load-verify-root-certificates! ssl-ctx ca-cert-path)
    (ssl-set-verify! ssl-ctx #t)

    ;; Now load the client certificate
    (ssl-load-certificate-chain! ssl-ctx cert-path)
    (ssl-load-private-key! ssl-ctx key-path)

    ;; Return the object we created
    ssl-ctx))

;; This creates server connectors associated with an SSL context
(define (make-ssl-server-connector/context ssl-ctx)
  (lambda (uri proxy)
    (let ((remote-end (or proxy uri)))
      (if (eq? 'https (uri-scheme remote-end))
          ;; Only use ssl-connect for HTTPS connections
          (ssl-connect (uri-host remote-end)
                       (uri-port remote-end)
                       ssl-ctx)
          ;; Use http-client's default otherwise
          (default-server-connector uri proxy)))))

;; Now, make a context and matching connector, and register it
(let ((ssl-ctx (make-ssl-context/client-cert
                 "/etc/ssl/certs/ca.crt"
                 "/etc/ssl/certs/my-client-cert.crt"
                 "/etc/ssl/private/my-client-cert.key")))
  (server-connector (make-ssl-server-connector/context ssl-ctx)))
</enscript>

Now, all requests made with any of the http-client procedures would
authenticate with a server using the configured client certificate.

==== Cookie management

http-client's cookie management is supposed to be as automatic and
DWIMmy as possible.  This means it will write any cookie as instructed
by a server and all stored cookies are automatically sent back to the
server upon a new request.

However, in some cases you may want to take control of how cookies are
stored.

The API described here should be considered unstable and it may change
dramatically when someone comes up with a better way to handle cookies.

<procedure>(get-cookies-for-uri uri)</procedure>

Fetch a list of all cookies which ought to be sent to the given URI.
Cookies are vectors of two elements: a name/value pair and an alist of
attributes.  In other words, these are the exact same values you can
put in a {{cookie}} header.

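A sketch of walking over the cookies returned by
{{get-cookies-for-uri}}, relying only on the vector layout described
above; the host name is a placeholder.

<enscript highlight="scheme">
(use http-client uri-common)

(for-each
 (lambda (cookie)
   ;; Element 0 is the name/value pair, element 1 the attribute alist
   (let ((name/value (vector-ref cookie 0))
         (attributes (vector-ref cookie 1)))
     (print (car name/value) " = " (cdr name/value))
     (print "  attributes: " attributes)))
 (get-cookies-for-uri (uri-reference "http://some.host.com/")))
</enscript>
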
<procedure>(store-cookie! cookie-info set-cookie)</procedure>

Store a cookie in the cookiejar corresponding to the Set-Cookie header
given by {{set-cookie}}.  This overwrites any cookie that is equal to
this cookie, as defined by RFC 2965, section 3.3.3.  Practically, this
means that when the cookie's name, domain and path are equal to an
existing one, it will be overwritten by the new one.  These attributes
are taken from the {{cookie-info}} alist and expected to be there.

Generally, attributes should be taken from {{set-cookie}}, but if
missing they ought to be taken from the request URI that responded
with the {{set-cookie}}.

<enscript highlight="scheme">
(store-cookie! `((path . ,(make-uri path: '(/ "")))
                 (domain . "some.host.com")
                 (secure . #t))
               `#(("COOKIE_NAME" . "cookie-value")
                  ((path . ,(make-uri path: '(/ ""))))))
</enscript>

<procedure>(delete-cookie! cookie-name cookie-info)</procedure>

Removes any cookie from the cookiejar that is equal to the given
cookie (again, in the sense of RFC 2965, section 3.3.3).
The {{cookie-name}} must match and the {{path}} and {{domain}} values for
the {{cookie-info}} alist must match.

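Mirroring the {{store-cookie!}} example, a cookie stored with that
name, path and domain could presumably be removed again like this
(a sketch; the values are the same placeholders as above):

<enscript highlight="scheme">
(use http-client uri-common)

;; Name, path and domain must match the stored cookie
(delete-cookie! "COOKIE_NAME"
                `((path . ,(make-uri path: '(/ "")))
                  (domain . "some.host.com")))
</enscript>
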
==== Authentication support

When a 401 Unauthorized response is received, in most interactive
clients, the user is normally asked to authenticate.  To support this
type of interaction, http-client offers the following parameter:

<parameter>(determine-username/password [HANDLER])</parameter>

The procedure in this parameter is called whenever the remote
host requests authentication via a 401 Unauthorized response.

The {{HANDLER}} is a procedure of two arguments; the URI for the
resource currently being requested and the realm (a string) which
wants credentials.  The procedure should return two string values:
the username and the password to use for authentication.

The default value is a procedure which extracts the username and
password components from the URI.

For proxy authentication support, see {{determine-proxy-username/password}}
in the next section.

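A sketch of a custom handler for {{determine-username/password}}
that supplies fixed credentials for one realm and otherwise falls back
to the previously installed handler.  The realm name and the
credentials are invented for illustration.

<enscript highlight="scheme">
(use http-client)

;; Remember the current handler so we can delegate to it
(let ((previous-handler (determine-username/password)))
  (determine-username/password
   (lambda (uri realm)
     (if (string=? realm "Staff area")      ; hypothetical realm
         (values "alice" "s3cret")          ; hypothetical credentials
         (previous-handler uri realm)))))
</enscript>
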
<parameter>(http-authenticators [AUTHENTICATORS])</parameter>

This parameter allows for pluggable authentication schemes.
{{AUTHENTICATORS}} is an alist mapping authentication scheme name
to a procedure of 7 arguments:

{{(lambda (response response-header new-request request-header uri realm writer) ...)}}

Here, {{response}} is the response object, {{response-header}} is the
name of the response header which required authentication - a symbol
which is either {{www-authenticate}} or {{proxy-authenticate}}.

{{new-request}} is the request that will be sent next, to be populated
with additional headers by the authenticator procedure, and
{{request-header}} is the name of the request header which is expected
to be provided and supplied with extra details by the authenticator -
also a symbol, which is either {{authorization}} or
{{proxy-authorization}}.

{{uri}} is the URI which was requested when the authorization was
demanded (in case of {{www-authenticate}}, the protected resource) and
{{realm}} is the authentication realm (a string).

Finally {{writer}} is the writer procedure passed by the user or
fabricated by {{call-with-input-request}} based on the user's form
arguments.  It's always a procedure accepting a request object.
This is only needed when full-request authentication is desired, to
obtain a request body.

==== Proxy support

http-client has support for sending requests through proxy servers.

<parameter>(determine-proxy [HANDLER])</parameter>

Whenever a request is sent, the library invokes the procedure stored
in this parameter to determine through what proxy to send the request,
if any.

The {{HANDLER}} procedure receives one argument, the URI about to be
requested, and returns either a uri-common absolute URI object
representing the proxy or {{#f}} if no proxy should be used.

The URI's path and query, if present, are ignored; only the scheme
and authority (host, port, username, password) are used.

The default value of this parameter is {{determine-proxy-from-environment}}.

<enscript highlight="scheme">
(determine-proxy
 (lambda (url)
   (uri-reference "http://127.0.0.1:8888/")))
</enscript>


If you just want to disable proxy support, you can do:

<enscript highlight="scheme">
(determine-proxy (constantly #f))   ; From unit data-structures
</enscript>

<procedure>(determine-proxy-from-environment URI)</procedure>

This procedure implements the common behaviour of HTTP software under
UNIX:

* First it checks if the requested URI's host (or an asterisk) is listed in the {{NO_PROXY}} environment variable (if suffixed with a port number, the port is also compared).  If a match is found, no proxy is used.
* Then it will check if the {{$(protocol)_proxy}} or the {{$(PROTOCOL)_PROXY}} variable (in that order) is set.  If so, that's used.  {{protocol}} here actually means "scheme", so the URI's scheme is used, suffixed with {{_proxy}}. This means {{http_proxy}} is used for HTTP requests and {{https_proxy}} is used for HTTPS requests, but see the next point.
* If the scheme is {{http}} and the environment variable {{REQUEST_METHOD}} is present, {{CGI_HTTP_PROXY}} is used instead of {{HTTP_PROXY}} to prevent a "[[https://httpoxy.org|httpoxy]]" attack.  This makes the assumption that {{REQUEST_METHOD}} is set because the library is being used in a CGI script.
* If there's still no match, it looks for {{all_proxy}} or {{ALL_PROXY}}, in that order. If one of these environment variables is set, that value is used as a fallback proxy.
* Finally, if none of these checks resulted in a proxy URI, no proxy will be used.

Some UNIX software expects plain hostnames or hostname port
combinations separated by colons, but (currently) this library expects
full URIs, like most modern UNIX programs.

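The lookup rules for {{determine-proxy-from-environment}} can be
tried out from a program by setting the environment variables with
{{setenv}} from the posix unit.  This is only a sketch: the proxy
address and host names are placeholders, and the comments describe
what the rules listed above suggest should happen, not verified
output.

<enscript highlight="scheme">
(use http-client uri-common posix)

;; Full URIs are expected, not bare host:port pairs
(setenv "http_proxy" "http://127.0.0.1:8888/")
(setenv "NO_PROXY" "localhost")

;; Going by the rules above, this should yield the proxy URI
(determine-proxy-from-environment
 (uri-reference "http://example.com/"))

;; The host is listed in NO_PROXY, so no proxy (#f) should be chosen
(determine-proxy-from-environment
 (uri-reference "http://localhost/"))
</enscript>
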
<parameter>(determine-proxy-username/password [HANDLER])</parameter>

The procedure in this parameter is called whenever the proxy requests
authentication via a 407 Proxy Authentication Required response. This
basically works the same as authentication against an origin server.

The {{HANDLER}} is a procedure of two arguments; the URI for the
''proxy'' currently being used and the realm (a string) which wants
credentials.  The procedure should return two string values: the
username and the password to use for authentication.

The default value is a procedure which extracts the username and
password components from the proxy's URI.


=== Changelog

* 0.10 Do not read {{HTTP_PROXY}} if {{REQUEST_METHOD}} is present (running in a CGI script), to prevent "[[https://httpoxy.org|httpoxy]]" attack (CVE-2016-6287).
* 0.9 Add support for custom connector procedures.  Thanks to Ryan Senior for suggesting support for https client certificates, which this makes possible.
* 0.8 Fix bug in multipart/form-data file uploads with non-file components in the form data causing a crash.  Thanks to Ryan Senior for reporting the bug and testing the fix.
* 0.7.2 Add {{call-with-input-request*}}. Thanks to [[/users/mario-domenech-goulart|Mario Goulart]] for suggesting this.
* 0.7.1 Fix delimited port handling of {{peek-char}} which caused mysterious openssl errors.  Thanks to [[/users/mario-domenech-goulart|Mario Goulart]] for a reproducible test case.
* 0.7 Reduce CPU usage by implementing custom {{read-string!}} and {{read-line}} procedures in {{make-delimited-input-port}}. Improved error reporting (show URI as string, and always include it in error messages). Gracefully handle premature disconnection by retrying (as per RFC2616, 8.2.4).  Make openssl an optional dependency to make it easier to install on Windows.
* 0.6.1 Work around a bug in {{read-string!}} in CHICKEN core which caused random errors.
* 0.6 Provide a proper condition when encountering unsupported URI schemes (thanks to [[/users/christian-kellermann|Christian Kellermann]]).  Fix response body reading in error situations (thanks to [[/users/andyjpb|Andy Bennett]]).  Update request writer to use new {{finish-request-body}} from intarweb 1.0.
* 0.5.1 Restore compatibility with message-digest and string-utils egg.
* 0.5 Improve detection of dropped connections (prevents unnecessary "connection reset" exceptions from propagating into the program). Simplify interface by switching to {{POST}} when a {{writer}} is given to {{with-input-from-request}} and {{call-with-input-request}}.  Add support for multipart forms (file upload). Fix error in case of missing username when authorization was required (introduced by version 0.4.2). Put loop call in tail position (thanks to [[/users/felix-winkelmann|Felix]]). Automatically discard remaining data on the input port, if any, to avoid problems on subsequent requests. Add rudimentary support for parameterizable authentication schemes.
* 0.4.2 Allow missing passwords in URIs for authentication
* 0.4.1 Fix connection status check so when the remote end closed the connection we don't try to read from it anymore (thanks to Daishi Kato and Thomas Hintz)
* 0.4 Fix redirection code on 303, and off-by-1 mistake in redirects count (thanks to Moritz Heidkamp). Add arguments to exn objects (thanks to Christian Kellermann). Also accept an empty alist for POST data. Fix URI path comparisons in cookies (thanks to Daishi Kato)
* 0.3 Fixed handling of missing Path parameters in set-cookie headers. Reported by Hugo Arregui. Improve set-cookie handling by only passing Path and Domain when matching Set-Cookie header included those parameters.
* 0.2 Added proxy support and many many bugfixes
* 0.1 Initial version

=== License

  Copyright (c) 2008-2016, Peter Bex
  Parts copyright (c) 2000-2004, Felix L. Winkelmann
  All rights reserved.
 
  Redistribution and use in source and binary forms, with or without
  modification, are permitted provided that the following conditions are
  met:
 
  Redistributions of source code must retain the above copyright
  notice, this list of conditions and the following disclaimer.
 
  Redistributions in binary form must reproduce the above copyright
  notice, this list of conditions and the following disclaimer in the
  documentation and/or other materials provided with the distribution.
 
  Neither the name of the author nor the names of its contributors may
  be used to endorse or promote products derived from this software
  without specific prior written permission.
 
  THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
  "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
  LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
  FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
  COPYRIGHT HOLDERS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
  INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
  (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
  SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
  HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
  STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
  ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
  OF THE POSSIBILITY OF SUCH DAMAGE.