source: project/wiki/eggref/4/intarweb @ 12524

Last change on this file since 12524 was 12524, checked in by sjamaan, 12 years ago

Document product parser output

[[tags: egg]]

== Intarweb

[[toc:]]

=== Description

Intarweb is an advanced HTTP library.  It parses all headers into more
useful Scheme values.

=== Author

[[Peter Bex]]

=== Requirements

Requires the [[defstruct]], [[base64]] and [[uri-generic]] extensions.

=== Documentation

The intarweb egg is designed to be usable in a variety of
situations. For this reason, it does not try to be a full HTTP client
or server. If you need that kind of functionality, see eggs like
[[spiffy]].

=== Requests

A request object (a [[defstruct]]-type record) can be created using
the following procedure:

<procedure>(make-request #!key uri port (method 'GET) (major 1) (minor 1) (headers (make-headers '())))</procedure>

An existing request can be picked apart using the following procedures:

<procedure>(request-uri REQUEST) => URI</procedure>
<procedure>(request-port REQUEST) => PORT</procedure>
<procedure>(request-method REQUEST) => SYMBOL</procedure>
<procedure>(request-major REQUEST) => NUMBER</procedure>
<procedure>(request-minor REQUEST) => NUMBER</procedure>
<procedure>(request-headers REQUEST) => HEADERS</procedure>

The URI defines the entity to retrieve on the server, and should be
a [[uri-generic]]-type URI object. The port is the port where the
request is written to or read from.  The method is a symbol that
defines the HTTP method to use (case sensitive). The major and minor
fields identify the major and minor version of HTTP to use. Currently,
0.9, 1.0 and 1.1 are supported (but be careful with 0.9: it has some
odd consequences and is not widely supported). The headers must be a
headers object, which is described below.
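As a rough illustration, the following sketch builds a request and reads its fields back. It assumes CHICKEN 4 with the intarweb and uri-generic eggs installed, and it uses uri-generic's {{uri-reference}} to produce the URI object. No port is attached because nothing is written.

```scheme
(use intarweb uri-generic)

;; Build a GET request for http://example.com/index.html.  The port is
;; left out because this request is only inspected, never written.
(define req
  (make-request uri: (uri-reference "http://example.com/index.html")
                method: 'GET
                headers: (headers '((host ("example.com" . 80))))))

;; Read the fields back with the accessors.  major and minor were not
;; given, so they hold the defaults from make-request's signature.
(print (request-method req))
(print (request-major req) "." (request-minor req))
(print (uri->string (request-uri req)))
```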

The client will generally write requests, while the server will read them.
To write a request, use the following procedure:

<procedure>(write-request REQUEST) => REQUEST</procedure>

This writes a request line with headers to the server.  If the request
type has any body data, that data should be written to the request's
port. Beware that write-request may modify this port, so be sure to
write to the port of the request object that write-request returns!
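The following is a minimal sketch of writing a request that has a body. It assumes CHICKEN 4 with intarweb and uri-generic installed. A string port stands in for the network connection so the output can be inspected, and the exact header formatting it produces may differ between intarweb versions. As advised above, the body is written to the port of the request returned by {{write-request}}.

```scheme
(use intarweb uri-generic)

;; An in-memory port stands in for the connection to the server.
(define out (open-output-string))
(define body "q=scheme")

;; POST /search with a Host header and the length of the body.
(define req
  (write-request
   (make-request uri: (uri-reference "/search")
                 method: 'POST
                 port: out
                 headers: (headers `((host ("example.com" . 80))
                                     (content-length ,(string-length body)))))))

;; Write the body to the port of the request that write-request
;; returned, not to the original port variable.
(display body (request-port req))

;; Show the raw request that was produced.
(display (get-output-string out))
```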

<procedure>(read-request PORT) => REQUEST</procedure>

Reads a request object from the given input port.  An optional request
body can be read from the request's port ({{request-port}}) after
calling this procedure.
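For example, a server could parse a request that has already arrived. The sketch below assumes CHICKEN 4 with intarweb installed. In place of a real client connection, it passes {{read-request}} a string port that contains a complete HTTP/1.1 request head.

```scheme
(use intarweb)

;; A string port stands in for the client connection.  HTTP lines end
;; in CRLF, and an empty line terminates the headers.
(define in
  (open-input-string
   "GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n"))

(define req (read-request in))

;; The request line has been split into method and version fields.
(print (request-method req))
(print (request-major req) "." (request-minor req))
;; A request body, if any, would now be read from (request-port req).
```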

Requests are parsed using parse procedures, which can be customized
by overriding this parameter:

<parameter>(request-parsers [LIST])</parameter>

The list consists of procedures that accept a request line string and
produce a request object from it, or {{#f}} if the request is not of
the type handled by that procedure.
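The text above does not spell out how the list is consulted. Presumably the parsers are tried in order until one returns something other than {{#f}}. The following plain-Scheme model shows that protocol under this assumption. {{try-parsers}}, {{http-1.x-parser}} and {{http-0.9-parser}} are hypothetical names, not part of intarweb, and the model parsers return simple lists rather than real request objects.

```scheme
;; Split STR at every space character, e.g. "GET / HTTP/1.1" becomes
;; ("GET" "/" "HTTP/1.1").
(define (split-spaces str)
  (let loop ((chars (string->list str)) (word '()) (words '()))
    (cond ((null? chars)
           (reverse (cons (list->string (reverse word)) words)))
          ((char=? (car chars) #\space)
           (loop (cdr chars) '() (cons (list->string (reverse word)) words)))
          (else
           (loop (cdr chars) (cons (car chars) word) words)))))

;; Hypothetical parser for "METHOD URI HTTP/1.x" lines.
(define (http-1.x-parser line)
  (let ((parts (split-spaces line)))
    (and (= (length parts) 3)
         (member (caddr parts) '("HTTP/1.0" "HTTP/1.1"))
         (list (string->symbol (car parts)) (cadr parts) (caddr parts)))))

;; Hypothetical parser for HTTP/0.9 lines, which carry no version.
(define (http-0.9-parser line)
  (let ((parts (split-spaces line)))
    (and (= (length parts) 2)
         (list (string->symbol (car parts)) (cadr parts) "HTTP/0.9"))))

;; Try each parser in turn: the first non-#f result wins, and #f means
;; that no parser recognised the line.
(define (try-parsers parsers line)
  (cond ((null? parsers) #f)
        (((car parsers) line))
        (else (try-parsers (cdr parsers) line))))

(define parsers (list http-1.x-parser http-0.9-parser))
(write (try-parsers parsers "GET /index.html HTTP/1.1"))
(newline)   ; the line above writes (GET "/index.html" "HTTP/1.1")
(write (try-parsers parsers "GET /old-page"))
(newline)   ; the line above writes (GET "/old-page" "HTTP/0.9")
```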

Requests are written using unparse procedures, which can be
customized by overriding this parameter:

<parameter>(request-unparsers [LIST])</parameter>

The list consists of procedures that accept a request object, write it
to the request's output port and return the new, possibly updated,
request object. If a procedure does not handle the given request
object, it returns {{#f}}.

=== Responses

A response is also a [[defstruct]]-type record, much like a request:

<procedure>(make-response #!key port (code 200) (reason "OK") (major 1) (minor 1) (headers (make-headers '())))</procedure>

An existing response can be picked apart using the following procedures:

<procedure>(response-port RESPONSE) => PORT</procedure>
<procedure>(response-code RESPONSE) => NUMBER</procedure>
<procedure>(response-reason RESPONSE) => STRING</procedure>
<procedure>(response-major RESPONSE) => NUMBER</procedure>
<procedure>(response-minor RESPONSE) => NUMBER</procedure>
<procedure>(response-headers RESPONSE) => HEADERS</procedure>

The port, major, minor and headers are the same as for requests. The
code and reason are an integer status code and the short message that
belongs to it, as defined in the HTTP specification (for example, 200
OK or 301 Moved Permanently).

A server will usually write a response; a client will read it.
To write a response, use the following procedure:

<procedure>(write-response RESPONSE) => RESPONSE</procedure>

If there is a response body, it must be written to the response's port
after sending the response headers.
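The sketch below writes a server-side response with a small body. It assumes CHICKEN 4 with intarweb installed and relies on the default code and reason (200 and "OK") from make-response's signature. A string port again replaces the connection, and the precise output format may vary between intarweb versions. The sketch writes the body to the port of the response that {{write-response}} returns. This mirrors the advice given for requests and is an assumption for responses.

```scheme
(use intarweb)

;; An in-memory port stands in for the connection to the client.
(define out (open-output-string))
(define body "Hello, world!\n")

;; code and reason are left at their defaults, 200 and "OK".
(define resp
  (write-response
   (make-response port: out
                  headers: (headers `((content-type text/plain)
                                      (content-length ,(string-length body)))))))

;; The body follows the headers on the response's port.
(display body (response-port resp))

(display (get-output-string out))
```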

<procedure>(read-response PORT) => RESPONSE</procedure>

Reads a response object from the port. An optional response body can
be read from the response's port ({{response-port}}) after calling
this procedure.
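On the client side, reading a response works much the same way. This sketch again assumes CHICKEN 4 with intarweb installed. A string port that holds a complete response head replaces the server connection.

```scheme
(use intarweb)

;; A string port stands in for the server connection.  The body is
;; empty, as announced by the Content-Length header.
(define in
  (open-input-string
   "HTTP/1.1 404 Not Found\r\nContent-Length: 0\r\n\r\n"))

(define resp (read-response in))

;; The status line is split into a numeric code and a reason string.
(print (response-code resp) " " (response-reason resp))
(print "HTTP/" (response-major resp) "." (response-minor resp))
```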

Responses are parsed using parse procedures, which can be customized
by overriding this parameter:

<parameter>(response-parsers [LIST])</parameter>

The list consists of procedures that accept a response line string and
produce a response object from it, or {{#f}} if the response is not of
the type handled by that procedure.

Responses are written using unparse procedures, which can be
customized by overriding this parameter:

<parameter>(response-unparsers [LIST])</parameter>

The list consists of procedures that accept a response object, write
it to the response's output port and return the new, possibly updated,
response object. If a procedure does not handle the given response
object, it returns {{#f}}.

=== Headers

Requests and responses contain HTTP headers wrapped in a special
header object to ensure they are properly normalized.

<procedure>(headers ALIST [HEADERS]) => HEADERS</procedure>

This creates headers based on an input list. This list has the
header name as a symbol key, and a list of values as its value:

<example>
<expr>
(headers `((host ("example.com" . 8080))
           (accept #(text/html ((q . 0.5)))
                   #(text/xml ((q . 0.1)))))
          old-headers)
</expr>
</example>

This adds the named headers to the existing headers in
{{old-headers}}. The host header is either a string with the hostname
or a pair of hostname/port. The accept header is a list of allowed
mime-type symbols. As can be seen here, optional parameters or
"attributes" can be added to a header value by wrapping the value in a
vector of length 2. The first entry in the vector is the header value,
the second is an alist of attribute name/value pairs.
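A value can be given either bare or wrapped in such a vector, so it can be convenient to normalize both forms into one shape. The helpers below are a hypothetical plain-Scheme sketch and are not part of intarweb. They turn either form into a 2-element {{#(VALUE ATTRIBUTES)}} vector.

```scheme
;; Hypothetical helpers, not part of intarweb.
;; The value itself: the first vector slot, or the bare value.
(define (bare-value v)
  (if (vector? v) (vector-ref v 0) v))

;; The attribute alist: the second vector slot, or () for a bare value.
(define (value-attributes v)
  (if (vector? v) (vector-ref v 1) '()))

;; Turn either form into a 2-element #(VALUE ATTRIBUTES) vector.
(define (normalize-value v)
  (vector (bare-value v) (value-attributes v)))

;; An accept value mixing a bare and a wrapped entry.
(define accept-values '(text/html #(text/xml ((q . 0.1)))))
(write (map normalize-value accept-values))
(newline)
;; writes (#(text/html ()) #(text/xml ((q . 0.1))))
```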

Each header has its own value type.  The following table lists the
headers with their value types:

<table>
<tr><th>Header name</th><th>Value type</th><th>Example value</th></tr>
<tr>
<td>{{accept}}</td>
<td>List of mime-types (symbols), with optional {{q}} attribute
indicating "quality" (preference level)</td>
<td>{{(text/html #(text/xml ((q . 0.1))))}}</td>
</tr>
<tr>
<td>{{accept-charset}}</td>
<td>List of charset-names (symbols), with optional {{q}} attribute</td>
<td>{{(utf-8 #(iso-8859-5 ((q . 0.1))))}}</td>
</tr>
<tr>
<td>{{accept-encoding}}</td>
<td>List of encoding-names (symbols), with optional {{q}} attribute</td>
<td>{{(gzip #(identity ((q . 0))))}}</td>
</tr>
<tr>
<td>{{accept-language}}</td>
<td>List of language-names (symbols), with optional {{q}} attribute</td>
<td>{{(en-gb #(nl ((q . 0.5))))}}</td>
</tr>
<tr>
<td>{{accept-ranges}}</td>
<td>List of acceptable range types (symbols). The spec only defines
{{bytes}} and {{none}}.</td>
<td>{{(bytes)}}</td>
</tr>
<tr>
<td>{{age}}</td>
<td>Age in seconds (number)</td>
<td>{{(3600)}}</td>
</tr>
<tr>
<td>{{allow}}</td>
<td>List of methods that are allowed (symbols).</td>
<td>{{(GET POST PUT DELETE)}}</td>
</tr>
<tr>
<td>{{authorization}}</td>
<td>Authorization information. This consists of a symbol identifying the
authentication scheme, with scheme-specific attributes.</td>
<td>{{(digest #((username . "foo")))}}</td>
</tr>
<tr>
<td>{{cache-control}}</td>
<td>An alist of key/value pairs. If no value is applicable, it is {{#t}}</td>
<td>{{((public . #t) (max-stale . 10) (no-cache . (max-age set-cookie)))}}</td>
</tr>
<tr>
<td>{{connection}}</td>
<td>A list of connection options (symbols)</td>
<td>{{(close)}}</td>
</tr>
<tr>
<td>{{content-encoding}}</td>
<td>A list of encodings (symbols) applied to the entity-body.</td>
<td>{{(deflate gzip)}}</td>
</tr>
<tr>
<td>{{content-language}}</td>
<td>The natural language(s) of the "intended audience" (symbols)</td>
<td>{{(de nl en-gb)}}</td>
</tr>
<tr>
<td>{{content-length}}</td>
<td>The number of bytes (an exact number) in the entity-body</td>
<td>{{(10)}}</td>
</tr>
<tr>
<td>{{content-location}}</td>
<td>A location that the content can be retrieved from (a uri-generic object)</td>
<td>{{(<#uri-generic# ...>)}}</td>
</tr>
<tr>
<td>{{content-md5}}</td>
<td>The MD5 checksum (a string) of the entity-body</td>
<td>{{("12345ABCDEF")}}</td>
</tr>
<tr>
<td>{{content-range}}</td>
<td>Content range (pair with start- and endpoint) of the entity-body, if partially sent</td>
<td>{{((25 . 120))}}</td>
</tr>
<tr>
<td>{{content-type}}</td>
<td>The mime type of the entity-body (a symbol)</td>
<td>{{(text/html)}}</td>
</tr>
<tr>
<td>{{date}}</td>
<td>The date at which the message originated</td>
<td>TODO</td>
</tr>
<tr>
<td>{{etag}}</td>
<td>An entity-tag (pair, car being either the symbol weak or strong, cdr being a symbol) that uniquely identifies the resource contents.</td>
<td>{{((strong . foo123))}}</td>
</tr>
<tr>
<td>{{expect}}</td>
<td>Expectations of the server's behaviour (alist of symbol-string pairs), possibly with parameters.</td>
<td>{{(#(((100-continue . #t)) ()))}}</td>
</tr>
<tr>
<td>{{expires}}</td>
<td>Expiry timestamp for the entity</td>
<td>TODO</td>
</tr>
<tr>
<td>{{from}}</td>
<td>The e-mail address (a string) of the human user who controls the client</td>
<td>{{("info@example.com")}}</td>
</tr>
<tr>
<td>{{host}}</td>
<td>The host to use (for virtual hosting). This is a pair of hostname and port</td>
<td>{{(("example.com" . 80))}}</td>
</tr>
<tr>
<td>{{if-match}}</td>
<td>Entity-tags (pair, weak/strong symbol and unique entity identifier symbol) which must match.</td>
<td>{{((strong . foo123) (strong . bar123))}}</td>
</tr>
<tr>
<td>{{if-modified-since}}</td>
<td>Timestamp since which the entity must have been modified.</td>
<td>TODO</td>
</tr>
<tr>
<td>{{if-none-match}}</td>
<td>Entity-tags (pair, weak/strong symbol and unique entity identifier symbol) which must not match.</td>
<td>{{((strong . foo123) (strong . bar123))}}</td>
</tr>
<tr>
<td>{{if-range}}</td>
<td>The range to request, if the entity was unchanged</td>
<td>TODO</td>
</tr>
<tr>
<td>{{if-unmodified-since}}</td>
<td>A timestamp since which the entity must not have been modified</td>
<td>TODO</td>
</tr>
<tr>
<td>{{last-modified}}</td>
<td>A timestamp when the entity was last modified</td>
<td>TODO</td>
</tr>
<tr>
<td>{{location}}</td>
<td>A location (a URI object) to which to redirect</td>
<td>{{(<#uri-object ...>)}}</td>
</tr>
<tr>
<td>{{max-forwards}}</td>
<td>The maximum number of proxies that can forward a request</td>
<td>{{(2)}}</td>
</tr>
<tr>
<td>{{pragma}}</td>
<td>An alist of symbols containing implementation-specific directives.</td>
<td>{{((no-cache . #t) (my-extension . my-value))}}</td>
</tr>
<tr>
<td>{{proxy-authenticate}}</td>
<td>Proxy authentication options (authentication scheme symbol, with parameters)</td>
<td>{{(digest #((username . "foo")))}}</td>
</tr>
<tr>
<td>{{proxy-authorization}}</td>
<td>Same as the above, only request-side instead of response-side</td>
<td>{{(digest #((username . "foo")))}}</td>
</tr>
<tr>
<td>{{range}}</td>
<td>The range of bytes (a pair of start and end) to request from the server.</td>
<td>{{((25 . 120))}}</td>
</tr>
<tr>
<td>{{referer}}</td>
<td>The referring URL (uri-generic object) that linked to this one.</td>
<td>{{(<#uri-object ...>)}}</td>
</tr>
<tr>
<td>{{retry-after}}</td>
<td>Timestamp after which to retry the request if unavailable now.</td>
<td>TODO</td>
</tr>
<tr>
<td>{{server}}</td>
<td>List of products the server uses (list of 3-tuple lists of strings; product name, product version, comment. Version and/or comment may be {{#f}})</td>
<td>{{(("Apache" "2.2.9" "Unix") ("mod_ssl" "2.2.9" #f) ("OpenSSL" "0.9.8e" #f) ("DAV" "2" #f) ("mod_fastcgi" "2.4.2" #f) ("mod_apreq2-20051231" "2.6.0" #f))}}</td>
</tr>
<tr>
<td>{{te}}</td>
<td>Allowed transfer-encodings (symbols, with optional {{q}} attribute) for the response</td>
<td>{{(deflate #(gzip ((q . 0.2))))}}</td>
</tr>
<tr>
<td>{{trailer}}</td>
<td>Names of header fields (symbols) available in the trailer/after body</td>
<td>{{(range etag)}}</td>
</tr>
<tr>
<td>{{transfer-encoding}}</td>
<td>The encodings (symbols) used in the body</td>
<td>{{(chunked)}}</td>
</tr>
<tr>
<td>{{upgrade}}</td>
<td>Product names (strings) to upgrade to</td>
<td>TODO</td>
</tr>
<tr>
<td>{{user-agent}}</td>
<td>List of products the user agent uses (list of 3-tuple lists of strings; product name, product version, comment. Version and/or comment may be {{#f}})</td>
<td>{{(("Mozilla" "5.0" "X11; U; NetBSD amd64; en-US; rv:1.9.0.3") ("Gecko" "2008110501" #f) ("Minefield" "3.0.3" #f))}}</td>
</tr>
<tr>
<td>{{vary}}</td>
<td>The names of headers (symbols) that define variation in the resource body, to determine cacheability</td>
<td>{{(range etag)}}</td>
</tr>
<tr>
<td>{{via}}</td>
<td>The intermediate hops through which the message is forwarded (strings)</td>
<td>TODO</td>
</tr>
<tr>
<td>{{warning}}</td>
<td>Warning code for special status</td>
<td>TODO</td>
</tr>
<tr>
<td>{{www-authenticate}}</td>
<td>If unauthorized, a challenge to authenticate (symbol, with attributes)</td>
<td>{{(digest #((username . "foo")))}}</td>
</tr>
<tr>
<td>{{set-cookie}}</td>
<td>Cookies to set (name/value string pair, with attributes)</td>
<td>{{(#(("foo" . "bar") ((max-age . 10))))}}</td>
</tr>
<tr>
<td>{{cookie}}</td>
<td>Cookies that were set (name/value string pair, with attributes)</td>
<td>{{(#(("foo" . "bar") (($path . "/"))))}}</td>
</tr>
</table>
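Several of these headers carry a {{q}} attribute, which can be used to pick the most preferred entry. The following plain-Scheme sketch is hypothetical and not part of intarweb. It follows the HTTP/1.1 rules that an entry without {{q}} has quality 1 and that quality 0 means "not acceptable".

```scheme
;; Hypothetical helpers, not part of intarweb.
;; The value of an entry, whether bare or wrapped in #(VALUE ATTRIBUTES).
(define (entry-value e)
  (if (vector? e) (vector-ref e 0) e))

;; The q attribute of an entry, defaulting to 1 when absent.
(define (entry-quality e)
  (let ((q (and (vector? e) (assq 'q (vector-ref e 1)))))
    (if q (cdr q) 1)))

;; Return the value of the first entry with the highest quality above 0,
;; or #f if no entry is acceptable.
(define (preferred entries)
  (let loop ((rest entries) (best #f) (best-q 0))
    (cond ((null? rest)
           (and best (entry-value best)))
          ((> (entry-quality (car rest)) best-q)
           (loop (cdr rest) (car rest) (entry-quality (car rest))))
          (else
           (loop (cdr rest) best best-q)))))

;; Example values taken from the table above.
(write (preferred '(text/html #(text/xml ((q . 0.1))))))   ; writes text/html
(newline)
(write (preferred '(gzip #(identity ((q . 0))))))          ; writes gzip
(newline)
```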

Any unrecognised headers are assumed to be multi-headers, and the
entire header lines are put unparsed into a list, one entry per line.

==== Header-parsers

The parsers used to read and write header values can be customized
with the following parameters:

<parameter>(header-parsers [ALIST])</parameter>
<parameter>(header-unparsers [ALIST])</parameter>

These alists use the header name (a symbol) as key and a procedure as
value. A parser procedure accepts three arguments: the name of the
header (a symbol), the contents of the header (a string, without the
leading header name and colon) and the preceding headers. It should
merge the new header with the preceding headers and return the
resulting headers.

Header parsers should call these procedures to add headers:

<procedure>(replace-header-contents NAME CONTENTS HEADERS) => HEADERS</procedure>
<procedure>(replace-header-contents! NAME CONTENTS HEADERS) => HEADERS</procedure>
<procedure>(update-header-contents NAME CONTENTS HEADERS) => HEADERS</procedure>
<procedure>(update-header-contents! NAME CONTENTS HEADERS) => HEADERS</procedure>

The {{replace}} procedures replace any existing contents of the named
header with new ones; the {{update}} procedures add these contents to
the existing header. The procedures whose names end in a bang are
linear-update variants of the ones without the bang. The header
contents have to be normalized to a 2-element vector, with the
first element being the actual value and the second element being an
alist (possibly empty) of parameters/attributes for that value.

The update procedures append the value to the existing header if it is
a multi-header, and act as a simple replace in the case of a
single-header.

Whether a header is allowed once or multiple times in a request or
response is determined by this parameter:

<parameter>(single-headers [LIST])</parameter>

The value is a list of symbols naming the headers that are allowed to
occur only once in a request/response.
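Here is a plain-Scheme model of the replace and update semantics described above, which makes the difference between them concrete. It is a hypothetical sketch with several simplifications. It models headers as an alist from header name to a list of {{#(VALUE ATTRIBUTES)}} vectors, which is not necessarily intarweb's internal representation. {{model-single-headers}} stands in for the {{single-headers}} parameter, and its contents are chosen only for illustration. The linear-update (bang) variants are not modeled.

```scheme
;; Hypothetical model, not intarweb's implementation.
;; Stand-in for the single-headers parameter.
(define model-single-headers '(content-length content-type host))

;; Remove every entry for NAME from the alist HEADERS.
(define (remove-header name headers)
  (cond ((null? headers) '())
        ((eq? (caar headers) name) (remove-header name (cdr headers)))
        (else (cons (car headers) (remove-header name (cdr headers))))))

;; Replace: drop whatever NAME had and store CONTENTS instead.
(define (model-replace name contents headers)
  (cons (cons name contents) (remove-header name headers)))

;; Update: append CONTENTS for multi-headers, replace for single-headers.
(define (model-update name contents headers)
  (let ((old (assq name headers)))
    (if (or (not old) (memq name model-single-headers))
        (model-replace name contents headers)
        (model-replace name (append (cdr old) contents) headers))))

;; accept is a multi-header here, so the second value is appended...
(define h1
  (model-update 'accept (list (vector 'text/xml '((q . 0.1))))
                (model-update 'accept (list (vector 'text/html '())) '())))
;; ...while content-length is a single-header, so 20 replaces 10.
(define h2
  (model-update 'content-length (list (vector 20 '()))
                (model-update 'content-length (list (vector 10 '())) '())))

(write h1) (newline)
(write h2) (newline)
```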

=== Changelog

* 0.1 Initial version

=== License

  Copyright (c) 2008, Peter Bex
  All rights reserved.
 
  Redistribution and use in source and binary forms, with or without
  modification, are permitted provided that the following conditions are
  met:
 
  Redistributions of source code must retain the above copyright
  notice, this list of conditions and the following disclaimer.
 
  Redistributions in binary form must reproduce the above copyright
  notice, this list of conditions and the following disclaimer in the
  documentation and/or other materials provided with the distribution.
 
  Neither the name of the author nor the names of its contributors may
  be used to endorse or promote products derived from this software
  without specific prior written permission.
 
  THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
  "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
  LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
  FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
  COPYRIGHT HOLDERS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
  INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
  (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
  SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
  HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
  STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
  ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
  OF THE POSSIBILITY OF SUCH DAMAGE.