source: project/wiki/eggref/4/uri-common @ 25128

Last change on this file since 25128 was 25128, checked in by sjamaan, 10 years ago

Fix weird sentence (add missing word "low-level", not "level")

[[tags: eggs]]
[[toc:]]

== uri-common

=== Description

The {{uri-common}} library provides simple and easy-to-use parsing
and manipulation procedures for URIs using common schemes.

These "common schemes" all follow these rules:

* An empty path after the hostname is considered to be identical to the root path.
* All components are to be fully URI-decoded (so no percent-encoded characters remain in them).
* The query argument will be in
   [[http://www.w3.org/TR/xforms/#structure-model-submission|application/x-www-form-urlencoded]] form.
* The port is automatically determined if it is omitted and the URI scheme is known.
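As a rough illustration of these rules, the sketch below parses a made-up URI ({{example.com}} is only a placeholder) and inspects it with the accessors documented further down. The values in the comments are what the rules above imply.

<enscript highlight="scheme">
(use uri-common)

;; Hypothetical URI: the path and query contain percent-encoded characters,
;; and no port is given.
(define u (uri-reference "http://example.com/some%20dir?q=a%26b;lang=en"))

(uri-path u)   ; => (/ "some dir")               fully decoded
(uri-query u)  ; => ((q . "a&b") (lang . "en"))  decoded form-urlencoded alist
(uri-port u)   ; => 80                           derived from the http scheme

;; Per the first rule, an empty path after the host should be
;; indistinguishable from the root path.
(equal? (uri-path (uri-reference "http://example.com"))
        (uri-path (uri-reference "http://example.com/")))
</enscript>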

=== Library Procedures

This library replaces most of the procedures in [[uri-generic]]. If
you need to work with URIs on the uri-generic level or need to work
with both uri-generic and uri-common URI objects, you will have to
import and prefix or rename procedures.
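One possible way to do this is sketched below. It assumes CHICKEN 4 import syntax; the prefix {{generic:}} is an arbitrary choice, not something either library mandates.

<enscript highlight="scheme">
(use uri-common)

;; Load uri-generic and import its bindings under a prefix so they
;; do not clash with the identically named uri-common procedures.
(require-library uri-generic)
(import (prefix uri-generic generic:))

(define u (uri-reference "http://example.com/a%2Fb"))

(uri-path u)                             ; uri-common view: fully decoded
(generic:uri-path (uri->uri-generic u))  ; uri-generic view of the same URI
</enscript>

The unprefixed names keep referring to uri-common, so only code that really needs the uri-generic level has to use the prefixed names.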

==== Constructors and predicates

These constructors fully decode their arguments, so afterwards it is
impossible to distinguish between encoded delimiters and unencoded
delimiters.  This makes uri-common objects decoding endpoints; no
further decoding on the URI level is possible (of course, applications
are free to decode further information inside the URI).  If the
original URI is still needed for some reason, it can be converted to a
uri-generic object.  However, updating a URI component causes this
component's original encoding to be lost, so be careful!

<procedure>(uri-reference STRING) => URI</procedure>

A URI reference is either a URI or a relative reference (RFC 3986,
Section 4.1).  If the given string's prefix does not match the syntax
of a scheme followed by a colon separator, then the given string is
parsed as a relative reference.

<procedure>(absolute-uri STRING) => URI</procedure>

Parses the given string as an absolute URI, in which no fragments are
allowed.  If no URI scheme is found, or a fragment is detected, this
raises an error.

Absolute URIs are defined by RFC 3986 as non-relative URI references
without a fragment (RFC 3986, Section 4.2).  Absolute URIs can be used
as a base URI to resolve a relative-ref against, using
{{uri-relative-to}} (see below).
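The difference between the two constructors can be sketched as follows. The URIs are placeholders, and the symbol {{rejected}} is just a marker chosen for this example.

<enscript highlight="scheme">
(use uri-common)

;; uri-reference accepts both URIs and relative references.
(define doc-uri (uri-reference "http://example.com/docs#intro"))
(define rel-ref (uri-reference "docs/intro"))

(uri-scheme doc-uri)  ; => http
(uri-scheme rel-ref)  ; => #f  (no scheme, so a relative reference)

;; absolute-uri insists on a scheme and on the absence of a fragment.
(define base (absolute-uri "http://example.com/docs/"))

;; This string has a fragment, so according to the description above
;; an error is raised; here it is caught and mapped to a marker symbol.
(handle-exceptions exn
  'rejected
  (absolute-uri "http://example.com/docs#intro"))
</enscript>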

==== Accessors

<procedure>(uri-scheme uri-common) => symbol</procedure><br>
<procedure>(uri-path uri-common) => list</procedure><br>
<procedure>(uri-query uri-common) => alist</procedure><br>
<procedure>(uri-fragment uri-common) => string</procedure><br>
<procedure>(uri-host uri-common) => string</procedure><br>
<procedure>(uri-port uri-common) => integer</procedure><br>
<procedure>(uri-username uri-common) => string</procedure><br>
<procedure>(uri-password uri-common) => string</procedure><br>

Accessors for {{URI-common}} objects.

If a component is not defined in the given URI-common, then the
corresponding accessor returns {{#f}}.
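A short sketch of the accessors applied to a made-up URI that has every component set, followed by a relative reference in which most components are missing:

<enscript highlight="scheme">
(use uri-common)

;; Hypothetical URI with all components present.
(define u (uri-reference "https://alice:secret@example.org:8443/a/b?x=1#top"))

(uri-scheme u)    ; => https
(uri-username u)  ; => "alice"
(uri-password u)  ; => "secret"
(uri-host u)      ; => "example.org"
(uri-port u)      ; => 8443
(uri-path u)      ; => (/ "a" "b")
(uri-query u)     ; => ((x . "1"))
(uri-fragment u)  ; => "top"

;; A relative reference has no scheme, host or fragment here.
(define r (uri-reference "/just/a/path"))

(uri-scheme r)    ; => #f
(uri-host r)      ; => #f
(uri-fragment r)  ; => #f
</enscript>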

==== Updater

<procedure>(update-uri URI-common #!key scheme path query fragment host port username password) => URI-common</procedure>

Update the specified keys in the URI-common object in a functional way
(i.e., it creates a new copy with the modifications).

Tip: if you want to create a URI with only a few components set to dynamic values extracted from elsewhere, you can create an empty URI and update its constituent parts, like this:

<enscript highlight="scheme">
(uri->string (update-uri (uri-reference "") path: '("example" "greeting") query: '((hi . "there"))))
 => "example/greeting?hi=there"
</enscript>
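Because the update is functional, the original object can still be used afterwards. A minimal sketch with placeholder URIs:

<enscript highlight="scheme">
(use uri-common)

(define base (uri-reference "http://example.com/v1/users"))

;; Derive a new URI; BASE itself is not modified.
(define next (update-uri base
                         path: '(/ "v2" "users")
                         query: '((page . 2))))

(uri-path base)     ; => (/ "v1" "users")  still the old path
(uri-path next)     ; => (/ "v2" "users")
(uri->string next)  ; => "http://example.com/v2/users?page=2"
</enscript>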

==== Predicates

There are several predicates to check whether objects are URI references (the most general type of a URI-like object), or more specific types of URIs like absolute URIs or relative references. The classification tree of URI-like objects looks a bit like this:

                uri-reference                         Anything defined by the RFC fits this
                /           \
             uri             relative-ref             Scheme (uri) or no scheme (relative-ref)?
             /               /        \
      absolute-uri    path-relative   path-absolute   No URI fragment (absolute-uri)? | path starts with a slash (path-absolute) or not (path-relative)?


<procedure>(uri-reference? URI) => BOOL</procedure>

Is the given object a URI reference?  '''All objects created by
URI-common constructors are URI references'''; they are either URIs
or relative references.  The other constructors are just more strictly
checking versions of {{uri-reference}}.  They all create
URI references.

<procedure>(absolute-uri? URI) => BOOL</procedure>

Is the given object an absolute URI?

<procedure>(uri? URI) => BOOL</procedure>

Is the given object a URI?  URIs are all URI references that include
a scheme part.  The other type of URI references are relative
references.

<procedure>(relative-ref? URI) => BOOL</procedure>

Is the given object a relative reference?  Relative references are
defined by RFC 3986 as URI references which are not URIs; they contain
no URI scheme and can be resolved against an absolute URI to obtain
a complete URI using {{uri-relative-to}}.

<procedure>(uri-path-absolute? URI) => BOOL</procedure>

Is the {{URI}}'s path component an absolute path?

<procedure>(uri-path-relative? URI) => BOOL</procedure>

Is the {{URI}}'s path component a relative path?

<procedure>(uri-default-port? URI) => BOOL</procedure>

Is the {{URI}}'s port the default port for the {{URI}}'s scheme?
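The classification tree can be walked through with a few placeholder URIs. The variable names are only chosen for this sketch.

<enscript highlight="scheme">
(use uri-common)

(define plain     (uri-reference "http://example.com/a"))
(define with-frag (uri-reference "http://example.com/a#x"))
(define rel       (uri-reference "b/c"))

;; Everything produced by the constructors is a URI reference.
(uri-reference? rel)        ; => #t

;; A scheme makes it a URI; the absence of a fragment makes it absolute.
(uri? plain)                ; => #t
(absolute-uri? plain)       ; => #t
(uri? with-frag)            ; => #t
(absolute-uri? with-frag)   ; => #f

;; No scheme makes it a relative reference.
(relative-ref? rel)         ; => #t
(uri-path-relative? rel)    ; => #t
(uri-path-absolute? rel)    ; => #f
(uri-path-absolute? plain)  ; => #t

;; Port checks against the scheme's default.
(uri-default-port? plain)                                     ; => #t
(uri-default-port? (uri-reference "http://example.com:8080/")) ; => #f
</enscript>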

==== Reference Resolution

<procedure>(uri-relative-to URI URI) => URI</procedure>

Resolve the first URI as a reference relative to the second URI,
returning a new URI (RFC 3986, Section 5.2.2).

<procedure>(uri-relative-from URI URI) => URI</procedure>

Constructs a new, possibly relative, URI which represents the location
of the first URI with respect to the second URI.

<enscript highlight="scheme">
(use uri-common)

(uri->string (uri-relative-to (uri-reference "../qux") (uri-reference "http://example.com/foo/bar/")))
 => "http://example.com/foo/qux"

(uri->string (uri-relative-from (uri-reference "http://example.com/foo/qux") (uri-reference "http://example.com/foo/bar/")))
 => "../qux"
</enscript>

==== Query encoding and decoding

<parameter>(form-urlencoded-separator [char-set/char/string])</parameter><br>
<procedure>(form-urlencode alist #!key (separator (form-urlencoded-separator))) => string</procedure><br>
<procedure>(form-urldecode string #!key (separator (form-urlencoded-separator))) => alist</procedure><br>

Encode or decode an alist using the encoding corresponding to the
[[http://www.w3.org/TR/html401/interact/forms.html#h-17.13.4.1|form-urlencoded]]
media type, using the given separator character(s).

The alist contains key/value pairs corresponding to the values in the
final urlencoded string.  If a value is {{#f}}, the key will be
'''omitted''' from the string.  If it is {{#t}} the key will be
present without a value. In all other cases, the value is converted to
a string and urlencoded.  The keys are always converted to a string
and urlencoded.

When encoding, if {{separator}} is a string, the first character will
be used as the separator in the resulting querystring.  If it is a
char-set, it will be converted to a string and its first character
will be taken.  In either case, all of these characters are encoded if
they occur inside the key/value pairs.

When decoding, any character in the set (or string) will be seen as
a separator.

The separator defaults to the string {{";&"}}.  This means that
either semicolons or ampersands are allowed as separators when decoding
a URI string, but semicolons are used when generating strings.

If you would like to use a different separator, you should parameterize
''all'' calls to procedures that return a uri-common object.

Examples:

<enscript highlight="scheme">
(form-urlencode '(("lemon" . "ade") (sucks . #f) (rocks . #t) (number . 42)))
=> "lemon=ade;rocks;number=42"

(form-urldecode "lemon=ade;rocks;number=42")
=> ((lemon . "ade") (rocks . #t) (number . "42"))
</enscript>
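The separator handling described above might be used as in the following sketch. The keys, values and URI are made up for illustration.

<enscript highlight="scheme">
(use uri-common)

;; Pass the separator explicitly for a single call.
(form-urlencode '((a . 1) (b . 2)) separator: "&")
;; => "a=1&b=2"

;; With the default separator ";&", both characters split pairs when decoding.
(form-urldecode "a=1&b=2;c=3")
;; => ((a . "1") (b . "2") (c . "3"))

;; To generate ampersand-separated query strings in whole URIs, parameterize
;; the code that constructs, updates and serializes the uri-common object.
(parameterize ((form-urlencoded-separator "&"))
  (uri->string
   (update-uri (uri-reference "http://example.com/search")
               query: '((q . "tea") (page . 2)))))
</enscript>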

==== String encoding and decoding

Besides encoding and decoding whole query strings or alists at a time,
you can also encode and decode individual strings.  This is a little
more generic, but also more low-level.

<procedure>(uri-encode-string STRING [CHAR-SET]) => STRING</procedure>

Returns the percent-encoded form of the given string.  The optional
char-set argument controls which characters should be encoded.
It defaults to the complement of {{char-set:uri-unreserved}}. This is
always safe, but often overly careful; it is allowed to leave certain
characters unquoted depending on the context.

<procedure>(uri-decode-string STRING [CHAR-SET]) => STRING</procedure>

Returns the decoded form of the given string.  The optional char-set
argument controls which characters should be decoded.  It defaults to
{{char-set:full}}.
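A small sketch of both procedures, with and without the optional char-set argument. It assumes the {{srfi-14}} unit is loaded for the {{char-set}} constructor; the sample strings are arbitrary.

<enscript highlight="scheme">
(use uri-common srfi-14)

;; Default: everything outside the unreserved set is encoded.
(define encoded (uri-encode-string "a b/c"))

;; Decoding with the default char-set restores the original string.
(uri-decode-string encoded)      ; => "a b/c"
(uri-decode-string "a%20b%2Fc")  ; => "a b/c"

;; Encode only spaces and leave the slash alone.
(uri-encode-string "a b/c" (char-set #\space))

;; Decode only escapes that stand for a space; "%2F" is left encoded.
(uri-decode-string "a%20b%2Fc" (char-set #\space))
</enscript>

Restricting the char-set is useful when a delimiter such as {{/}} must keep its meaning in the context where the string ends up.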

==== Normalization

<procedure>(uri-normalize-case URI) => URI</procedure>

URI case normalization (RFC 3986, Section 6.2.2.1)

<procedure>(uri-normalize-path-segments URI) => URI</procedure>

URI path segment normalization (RFC 3986, Section 6.2.2.3)
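Both procedures return a new URI, so they can be chained. The sketch below uses a made-up URI with mixed case and dot segments, which are the things these two RFC sections deal with; the results are deliberately not spelled out here.

<enscript highlight="scheme">
(use uri-common)

;; Hypothetical, deliberately messy URI.
(define messy (uri-reference "HTTP://Example.COM/a/./b/../c"))

;; Apply each normalization on its own ...
(uri->string (uri-normalize-case messy))
(uri->string (uri-normalize-path-segments messy))

;; ... or combine them before comparing or storing URIs.
(define (normalize u)
  (uri-normalize-path-segments (uri-normalize-case u)))

(uri->string (normalize messy))
</enscript>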

==== uri-generic, string and list representation

<procedure>(uri->uri-generic uri-common) => uri-generic</procedure><br>
<procedure>(uri-generic->uri uri-generic) => uri-common</procedure>

To convert between uri-generic and uri-common objects, use these
procedures.  As stated above, this will allow you to retrieve the
original encoding of the URI components, but once you update a
component from the uri-common side, the original encoding is no longer
available (the updated value replaces the original value).

<procedure>(uri->string uri-common [userinfo]) => string</procedure>

Reconstructs the given URI into a string; uses a supplied function
{{LAMBDA USERNAME PASSWORD -> STRING}} to map the userinfo part of the
URI.  If not given, it represents the userinfo as the username followed
by {{":******"}}.
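The optional userinfo argument might be used as follows. The credentials and host are placeholders, and the commented results follow from the description above.

<enscript highlight="scheme">
(use uri-common)

(define u (uri-reference "http://alice:secret@example.com/"))

;; Default mapping: the password is masked.
(uri->string u)
;; => "http://alice:******@example.com/"

;; Custom mapping that reveals the password, e.g. when the string is
;; handed to a client that has to authenticate.
(uri->string u (lambda (username password)
                 (string-append username ":" password)))
;; => "http://alice:secret@example.com/"
</enscript>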

<procedure>(uri->list URI USERINFO) => LIST</procedure>

Returns a list of the form {{(SCHEME SPECIFIC FRAGMENT)}};
{{SPECIFIC}} is of the form {{(AUTHORITY PATH QUERY)}}.

==== Character sets

As a convenience for further sub-parsers or other special-purpose URI
handling code like separately URI-encoding strings, there are a couple
of character sets exported by uri-common.

<constant>char-set:gen-delims</constant>

Generic delimiters.
  gen-delims  =  ":" / "/" / "?" / "#" / "[" / "]" / "@"

<constant>char-set:sub-delims</constant>

Sub-delimiters.
  sub-delims  =  "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="

<constant>char-set:uri-reserved</constant>

The union of {{gen-delims}} and {{sub-delims}}; all reserved URI characters.
  reserved    =  gen-delims / sub-delims

<constant>char-set:uri-unreserved</constant>

All unreserved characters that are allowed in a URI.
  unreserved  =  ALPHA / DIGIT / "-" / "." / "_" / "~"

Note that this is ''not'' the complement of {{char-set:uri-reserved}}!
There are several characters (even printable, noncontrol characters)
which are not allowed at all in a URI.
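The sets can be queried with the usual SRFI 14 procedures. The following sketch, which assumes the {{srfi-14}} unit is loaded, also shows why the reserved and unreserved sets are not complements of each other:

<enscript highlight="scheme">
(use uri-common srfi-14)

(char-set-contains? char-set:gen-delims #\?)      ; => #t
(char-set-contains? char-set:sub-delims #\&)      ; => #t
(char-set-contains? char-set:uri-reserved #\?)    ; => #t
(char-set-contains? char-set:uri-unreserved #\~)  ; => #t

;; A space is neither reserved nor unreserved: it is simply not
;; allowed to appear literally in a URI.
(char-set-contains? char-set:uri-reserved #\space)    ; => #f
(char-set-contains? char-set:uri-unreserved #\space)  ; => #f
</enscript>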

=== Requires

* [[uri-generic]]
* [[matchable]]
* [[defstruct]]

=== Version History

* 1.2 re-exported {{uri-encode-string}}, {{uri-decode-string}} and the various charsets from uri-generic. Remove bogus charset encoding rules for fragments (fall back to normal uri encoding)
* 1.1 Fixed x-www-form-urlencoded encoding so it encodes even characters that do not strictly need to be encoded according to the URI spec, but do according to the x-www-form-urlencoded spec.
* 1.0 Fix a bug that caused empty lists to be treated differently from lists containing only false values in form-urlencode
* 0.10 Fix urlencoded-separator first char selection in form-urlencode
* 0.9 Automatically convert non-strings to strings in creating queries
* 0.8 Actually export form-urlencoded-separator
* 0.7 Fix silly bug in the predicates from 0.6 (it helps to test first...)
* 0.6 Add predicates uri-path-relative? and uri-path-absolute?
* 0.5 Add {{uri-default-port?}} predicate procedure
* 0.4 Add {{uri->list}} conversion procedure
* 0.3 Fix dependency info (requires at least uri-generic 2.1)
* 0.2 Add predicates for URIs, absolute URIs and relative references, matching the ones in uri-generic.
* 0.1 Initial Release

=== License

  Copyright 2008-2010 Peter Bex
  All rights reserved.
 
  Redistribution and use in source and binary forms, with or without
  modification, are permitted provided that the following conditions are
  met:
 
  Redistributions of source code must retain the above copyright
  notice, this list of conditions and the following disclaimer.
 
  Redistributions in binary form must reproduce the above copyright
  notice, this list of conditions and the following disclaimer in the
  documentation and/or other materials provided with the distribution.
 
  Neither the name of the author nor the names of its contributors may
  be used to endorse or promote products derived from this software
  without specific prior written permission.
 
  THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
  "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
  LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
  FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
  COPYRIGHT HOLDERS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
  INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
  (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
  SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
  HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
  STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
  ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
  OF THE POSSIBILITY OF SUCH DAMAGE.