[[tags: eggs]]
[[toc:]]

== uri-common

=== Description

The {{uri-common}} library provides simple and easy-to-use parsing
and manipulation procedures for URIs using common schemes.

These "common schemes" all have the following rules:

* An empty path after the hostname is considered to be identical to the root path.
* All components are to be fully URI-decoded (so they contain no percent-encoded characters).
* The query argument will be in [[http://www.w3.org/TR/xforms/#structure-model-submission|application/x-www-form-urlencoded]] form.
* The port is automatically determined if it is omitted and the URI scheme is known.
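
Here is a small sketch of how these rules show up in practice. It assumes CHICKEN 4 with the uri-common egg installed, and the URIs are made-up examples. The exact printed form of the path lists may differ; the comments only say which rule each part exercises.

<enscript highlight="scheme">
(use uri-common)

;; Rule 1: no path after the host is treated the same as the root path "/".
(define bare (uri-reference "http://example.com"))
(define rooted (uri-reference "http://example.com/"))
(print (equal? (uri-path bare) (uri-path rooted)))

;; Rules 2 and 3: path segments and query values come back decoded,
;; and the query is parsed from x-www-form-urlencoded form into an alist.
(define encoded (uri-reference "http://example.com/a%20b?q=x%20y"))
(write (uri-path encoded)) (newline)
(write (uri-query encoded)) (newline)

;; Rule 4: no explicit port, but the scheme is known, so a port is filled in.
(print (uri-port rooted))
</enscript>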

=== Library Procedures

This library replaces most of the procedures in [[uri-generic]]. If
you need to work with URIs on the uri-generic level or need to work
with both uri-generic and uri-common URI objects, you will have to
import and prefix or rename procedures.

==== Constructors and predicates

These constructors fully decode their arguments, so afterwards it is
impossible to distinguish between encoded delimiters and unencoded
delimiters.  This makes uri-common objects decoding endpoints; no
further decoding on the URI level is possible (of course, applications
are free to decode further information inside the URI).  If, for some
reason, the original URI is still needed, it can be converted to a
uri-generic.  However, updating a URI component causes this
component's original encoding to be lost, so be careful!

<procedure>(uri-reference STRING) => URI</procedure>

A URI reference is either a URI or a relative reference (RFC 3986,
Section 4.1).  If the given string's prefix does not match the syntax
of a scheme followed by a colon separator, then the given string is
parsed as a relative reference.

<procedure>(absolute-uri STRING) => URI</procedure>

Parses the given string as an absolute URI, in which no fragments are
allowed.  If no URI scheme is found, or a fragment is detected, this
raises an error.

Absolute URIs are defined by RFC 3986 as non-relative URI references
without a fragment (RFC 3986, Section 4.2).  Absolute URIs can be used
as a base URI to resolve a relative-ref against, using
{{uri-relative-to}} (see below).
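
Because {{absolute-uri}} signals an error instead of returning {{#f}}, a caller that only wants to test a string may want to catch that error. The following sketch assumes CHICKEN 4. {{try-absolute-uri}} is a hypothetical helper that is not part of uri-common. It uses CHICKEN's {{handle-exceptions}} form to turn any error into {{#f}}, and the last line uses the parsed URI as a base for {{uri-relative-to}}.

<enscript highlight="scheme">
(use uri-common)

;; Hypothetical helper (not exported by uri-common): returns the parsed
;; absolute URI, or #f if absolute-uri raised an error for any reason.
(define (try-absolute-uri str)
  (handle-exceptions exn #f (absolute-uri str)))

(define ok   (try-absolute-uri "http://example.com/foo"))     ; scheme, no fragment
(define frag (try-absolute-uri "http://example.com/foo#top")) ; has a fragment
(define rel  (try-absolute-uri "foo/bar"))                    ; has no scheme

(print (uri->string ok))
(print (list (not frag) (not rel)))

;; An absolute URI serves as the base for resolving a relative reference.
(print (uri->string (uri-relative-to (uri-reference "baz") ok)))
</enscript>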

<procedure>(make-uri #!key authority scheme path query fragment host port username password) => URI</procedure>

Constructs a URI from the given components.
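
Here is a sketch of building a URI from its parts with {{make-uri}}. It assumes CHICKEN 4 and the path representation used by uri-generic, where an absolute path is a list whose first element is the symbol {{/}}. The host, path and query values are made up.

<enscript highlight="scheme">
(use uri-common)

;; Components are given in decoded form, like everything else on the
;; uri-common level; the path is assumed to be a list headed by '/.
(define u
  (make-uri scheme: 'http
            host: "example.com"
            path: '(/ "docs" "index.html")
            query: '((lang . "en"))))

(print (uri->string u))
</enscript>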

==== Accessors

<procedure>(uri-scheme uri-common) => symbol</procedure><br>
<procedure>(uri-path uri-common) => list</procedure><br>
<procedure>(uri-query uri-common) => alist</procedure><br>
<procedure>(uri-fragment uri-common) => string</procedure><br>
<procedure>(uri-host uri-common) => string</procedure><br>
<procedure>(uri-port uri-common) => integer</procedure><br>
<procedure>(uri-username uri-common) => string</procedure><br>
<procedure>(uri-password uri-common) => string</procedure><br>

Accessors for {{URI-common}} objects.

If a component is not defined in the given URI-common, then the
corresponding accessor returns {{#f}}.
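
Parsing a URI that has every component filled in makes it easy to see what each accessor returns. This sketch assumes CHICKEN 4, and the credentials and host are placeholders. The last lines show the {{#f}} case for a relative reference that has no host or fragment.

<enscript highlight="scheme">
(use uri-common)

(define u (uri-reference "http://alice:secret@example.com:8080/a/b?x=1#top"))

(print (uri-scheme u))    ; a symbol
(print (uri-host u))
(print (uri-port u))      ; an integer
(print (uri-username u) " / " (uri-password u))
(write (uri-query u)) (newline)
(print (uri-fragment u))

;; A relative reference has no host or fragment, so these accessors return #f.
(define rel (uri-reference "a/b"))
(print (uri-host rel) " " (uri-fragment rel))
</enscript>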

==== Updater

<procedure>(update-uri URI-common #!key scheme path query fragment host port username password) => URI-common</procedure>

Update the specified keys in the URI-common object in a functional way
(i.e., it creates a new copy with the modifications).

Here's a nice tip: if you want to create a URI with only a few components set to dynamic values extracted from elsewhere, you can generally create an empty URI and update its constituent parts.

You can do that like this:

<enscript highlight="scheme">
(use uri-common)

(uri->string (update-uri (uri-reference "") path: '("example" "greeting") query: '((hi . "there"))))
 => "example/greeting?hi=there"
</enscript>
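
The same procedure also works on a URI that already has components. Only the keys you pass change, and the original object stays as it was. This sketch assumes CHICKEN 4, with made-up hosts and query values.

<enscript highlight="scheme">
(use uri-common)

(define original (uri-reference "http://example.com/search?q=scheme"))

;; update-uri returns a fresh object; ORIGINAL is not modified.
(define moved (update-uri original host: "example.org" query: '((q . "chicken"))))

(print (uri->string original))
(print (uri->string moved))
</enscript>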

==== Predicates

There are several predicates to check whether objects are URI references (the most general type of a URI-like object), or more specific types of URIs like absolute URIs or relative references. The classification tree of URI-like objects looks a bit like this:

                uri-reference                         Anything defined by the RFC fits this
                /           \
             uri             relative-ref             Scheme (uri) or no scheme (relative-ref)?
             /               /        \
      absolute-uri    path-relative   path-absolute   No URI fragment (absolute-uri)? | path starts with a slash (path-absolute) or not (path-relative)?

<procedure>(uri-reference? URI) => BOOL</procedure>

Is the given object a URI reference?  '''All objects created by
URI-common constructors are URI references'''; they are either URIs
or relative references.  The {{absolute-uri}} constructor is just a
stricter checking version of {{uri-reference}}; it, too, creates
URI references.

<procedure>(absolute-uri? URI) => BOOL</procedure>

Is the given object an absolute URI?

<procedure>(uri? URI) => BOOL</procedure>

Is the given object a URI?  URIs are all URI references that include
a scheme part.  The other type of URI reference is the relative
reference.

<procedure>(relative-ref? URI) => BOOL</procedure>

Is the given object a relative reference?  Relative references are
defined by RFC 3986 as URI references which are not URIs; they contain
no URI scheme and can be resolved against an absolute URI to obtain
a complete URI using {{uri-relative-to}}.

<procedure>(uri-path-absolute? URI) => BOOL</procedure>

Is the {{URI}}'s path component an absolute path?

<procedure>(uri-path-relative? URI) => BOOL</procedure>

Is the {{URI}}'s path component a relative path?

<procedure>(uri-default-port? URI) => BOOL</procedure>

Is the {{URI}}'s port the default port for the {{URI}}'s scheme?
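
To see how the classification tree maps onto the predicates, here is a sketch that checks a handful of made-up references. It assumes CHICKEN 4. Each comment names the branch of the tree the value should fall into. The port check assumes that 80 is the known default for {{http}}.

<enscript highlight="scheme">
(use uri-common)

(define with-fragment (uri-reference "http://example.com/a#top")) ; uri, but not absolute-uri
(define absolute      (uri-reference "http://example.com/a"))     ; absolute-uri
(define path-rel      (uri-reference "../a"))                     ; relative-ref, path-relative
(define path-abs      (uri-reference "/a"))                       ; relative-ref, path-absolute
(define other-port    (uri-reference "http://example.com:8080/")) ; non-default port

(print (map uri? (list with-fragment absolute path-rel path-abs)))
(print (map absolute-uri? (list with-fragment absolute)))
(print (uri-path-relative? path-rel) " " (uri-path-absolute? path-abs))
(print (uri-default-port? absolute) " " (uri-default-port? other-port))
</enscript>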

==== Reference Resolution

<procedure>(uri-relative-to URI URI) => URI</procedure>

Resolve the first URI as a reference relative to the second URI,
returning a new URI (RFC 3986, Section 5.2.2).

<procedure>(uri-relative-from URI URI) => URI</procedure>

Constructs a new, possibly relative, URI which represents the location
of the first URI with respect to the second URI.

<enscript highlight="scheme">
(use uri-common)

(uri->string (uri-relative-to (uri-reference "../qux") (uri-reference "http://example.com/foo/bar/")))
 => "http://example.com/foo/qux"

(uri->string (uri-relative-from (uri-reference "http://example.com/foo/qux") (uri-reference "http://example.com/foo/bar/")))
 => "../qux"
</enscript>
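
By the descriptions above, the two procedures undo each other. Resolving the result of {{uri-relative-from}} against the same base should lead back to the original target. Here is a sketch of that round trip, assuming CHICKEN 4 and reusing the URIs from the example above.

<enscript highlight="scheme">
(use uri-common)

(define base   (uri-reference "http://example.com/foo/bar/"))
(define target (uri-reference "http://example.com/foo/qux"))

;; Express TARGET relative to BASE, then resolve it against BASE again.
(define rel  (uri-relative-from target base))
(define back (uri-relative-to rel base))

(print (uri->string rel))
(print (uri->string back))
(print (string=? (uri->string back) (uri->string target)))
</enscript>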

==== Query encoding and decoding

<parameter>(form-urlencoded-separator [char-set/char/string])</parameter><br>
<procedure>(form-urlencode alist #!key (separator (form-urlencoded-separator))) => string</procedure><br>
<procedure>(form-urldecode string #!key (separator (form-urlencoded-separator))) => alist</procedure><br>

Encode or decode an alist using the encoding corresponding to the
[[http://www.w3.org/TR/html401/interact/forms.html#h-17.13.4.1|form-urlencoded]]
media type, using the given separator character(s).

The alist contains key/value pairs corresponding to the values in the
final urlencoded string.  If a value is {{#f}}, the key will be
'''omitted''' from the string.  If it is {{#t}}, the key will be
present without a value.  In all other cases, the value is converted to
a string and urlencoded.  The keys are always converted to a string
and urlencoded.

When encoding, if {{separator}} is a string, its first character will
be used as the separator in the resulting query string.  If it is a
char-set, it will be converted to a string and its first character
will be taken.  In either case, all of these characters are encoded if
they occur inside the key/value pairs.

When decoding, any character in the set (or string) will be seen as
a separator.

The separator defaults to the string {{";&"}}.  This means that
either semicolons or ampersands are allowed as separators when decoding
a URI string, but semicolons are used when generating strings.

If you would like to use a different separator, you should parameterize
''all'' calls to procedures that return a uri-common object.
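
Here is a sketch of the ways to choose a separator, assuming CHICKEN 4. It passes the {{separator:}} keyword to a single {{form-urlencode}} call, and it parameterizes {{form-urlencoded-separator}} around the code that builds a uri-common object, as recommended above. The query data is made up.

<enscript highlight="scheme">
(use uri-common)

(define data '((a . "1") (b . "2")))

;; Default separator string ";&": its first character, ";", is used when encoding.
(define default-style (form-urlencode data))

;; Per-call override through the keyword argument.
(define keyword-style (form-urlencode data separator: "&"))

;; Parameterize the separator around the call that creates the URI object.
(define u
  (parameterize ((form-urlencoded-separator "&"))
    (update-uri (uri-reference "http://example.com/") query: data)))

(print default-style)
(print keyword-style)
(print (uri->string u))
</enscript>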

Examples:

<enscript highlight="scheme">
(use uri-common)

(form-urlencode '(("lemon" . "ade") (sucks . #f) (rocks . #t) (number . 42)))
=> "lemon=ade;rocks;number=42"

(form-urldecode "lemon=ade;rocks;number=42")
=> ((lemon . "ade") (rocks . #t) (number . "42"))
</enscript>

==== String encoding and decoding

At a lower level than encoding and decoding whole query strings or
alists, and more generically, you can also encode and decode individual
strings.

<procedure>(uri-encode-string STRING [CHAR-SET]) => STRING</procedure>

Returns the percent-encoded form of the given string.  The optional
char-set argument controls which characters should be encoded.
It defaults to the complement of {{char-set:uri-unreserved}}.  This is
always safe, but often overly careful; depending on the context, it is
allowed to leave certain characters unencoded.

<procedure>(uri-decode-string STRING [CHAR-SET]) => STRING</procedure>

Returns the decoded form of the given string.  The optional char-set
argument controls which characters should be decoded.  It defaults to
{{char-set:full}}.
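
Here is a sketch of the optional {{CHAR-SET}} argument. It assumes CHICKEN 4 and SRFI 14, which provides the {{char-set}} constructor. Restricting the set to the space character leaves the slash alone, whereas the default set encodes it too. Decoding with the default {{char-set:full}} reverses either form.

<enscript highlight="scheme">
(use uri-common srfi-14)

;; Default: everything outside char-set:uri-unreserved is encoded,
;; including the space and the slash.
(define everything (uri-encode-string "a b/c"))

;; Only encode spaces; the slash is left as-is.
(define spaces-only (uri-encode-string "a b/c" (char-set #\space)))

(print everything)
(print spaces-only)

;; Decoding (default char-set:full) turns either form back into the original.
(print (uri-decode-string everything))
(print (uri-decode-string spaces-only))
</enscript>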

==== Normalization

<procedure>(uri-normalize-case URI) => URI</procedure>

URI case normalization (RFC 3986, Section 6.2.2.1).

<procedure>(uri-normalize-path-segments URI) => URI</procedure>

URI path segment normalization (RFC 3986, Section 6.2.2.3).
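
The two normalizations can be combined. This sketch assumes CHICKEN 4. The mixed-case host and the dot segments are made up so that each procedure has something to do. The path check assumes uri-generic's representation of absolute paths as a list headed by the symbol {{/}}.

<enscript highlight="scheme">
(use uri-common)

(define messy (uri-reference "HTTP://Example.COM/a/./b/../c"))

;; Normalize the case (scheme and host), then remove the "." and ".." segments.
(define tidy (uri-normalize-path-segments (uri-normalize-case messy)))

(print (uri->string tidy))
</enscript>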

==== uri-generic, string and list representation

<procedure>(uri->uri-generic uri-common) => uri-generic</procedure><br>
<procedure>(uri-generic->uri uri-generic) => uri-common</procedure>

To convert between uri-generic and uri-common objects, use these
procedures.  As stated above, this will allow you to retrieve the
original encoding of the URI components, but once you update a
component from the uri-common side, the original encoding is no longer
available (the updated value replaces the original value).
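
This sketch shows why the conversion matters, assuming CHICKEN 4. An encoded slash inside a path segment cannot be told apart from a literal one on the uri-common side, while {{uri->uri-generic}} still gives access to the original encoding. The URI is a made-up example. Because uri-generic's accessors share names with uri-common's, the sketch imports uri-generic with the prefix {{generic:}} (an arbitrary choice), using CHICKEN 4's import-specifier syntax.

<enscript highlight="scheme">
(use uri-common)
(use (prefix uri-generic generic:))   ; avoid clashing with uri-common's names

(define u (uri-reference "http://example.com/a%2Fb"))

;; On the uri-common side the segment is fully decoded to "a/b".
(write (uri-path u)) (newline)

;; The uri-generic view keeps the original, still-encoded segment.
(define g (uri->uri-generic u))
(write (generic:uri-path g)) (newline)

;; Converting back yields a uri-common object again.
(define u2 (uri-generic->uri g))
(write (uri-path u2)) (newline)
</enscript>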

<procedure>(uri->string uri-common [userinfo]) => string</procedure>

Reconstructs the given URI into a string; uses a supplied function
{{LAMBDA USERNAME PASSWORD -> STRING}} to map the userinfo part of the
URI.  If not given, it represents the userinfo as the username followed
by {{":******"}}.
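
Here is a sketch of the {{userinfo}} argument, assuming CHICKEN 4. The credentials are placeholders. Without the argument, the password is masked as described above. The two lambdas show how to keep the password or leave it out.

<enscript highlight="scheme">
(use uri-common)

(define u (uri-reference "http://alice:secret@example.com/"))

;; Default: the password is replaced by asterisks.
(print (uri->string u))

;; Keep the full userinfo, e.g. when the string is handed to an HTTP client.
(print (uri->string u (lambda (user pass) (string-append user ":" pass))))

;; Show only the username.
(print (uri->string u (lambda (user pass) user)))
</enscript>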

<procedure>(uri->list URI USERINFO) => LIST</procedure>

Returns a list of the form {{(SCHEME SPECIFIC FRAGMENT)}};
{{SPECIFIC}} is of the form {{(AUTHORITY PATH QUERY)}}.

==== Character sets

As a convenience for further sub-parsers or other special-purpose URI
handling code like separately URI-encoding strings, there are a couple
of character sets exported by uri-common.

<constant>char-set:gen-delims</constant>

Generic delimiters.
  gen-delims  =  ":" / "/" / "?" / "#" / "[" / "]" / "@"

<constant>char-set:sub-delims</constant>

Sub-delimiters.
  sub-delims  =  "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="

<constant>char-set:uri-reserved</constant>

The union of {{gen-delims}} and {{sub-delims}}; all reserved URI characters.
  reserved    =  gen-delims / sub-delims

<constant>char-set:uri-unreserved</constant>

All unreserved characters that are allowed in a URI.
  unreserved  =  ALPHA / DIGIT / "-" / "." / "_" / "~"

Note that this is ''not'' the complement of {{char-set:uri-reserved}}!
There are several characters (even printable, noncontrol characters)
which are not allowed at all in a URI.
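
A quick way to check the last point is to look for a character that is in neither set. This sketch assumes CHICKEN 4 and SRFI 14's {{char-set-contains?}}. The space character is just one example of a character that is neither reserved nor unreserved.

<enscript highlight="scheme">
(use uri-common srfi-14)

;; Print the character followed by whether it is in gen-delims,
;; sub-delims, uri-reserved and uri-unreserved, in that order.
(define (describe c)
  (print (list c
               (char-set-contains? char-set:gen-delims c)
               (char-set-contains? char-set:sub-delims c)
               (char-set-contains? char-set:uri-reserved c)
               (char-set-contains? char-set:uri-unreserved c))))

(describe #\/)      ; a generic delimiter
(describe #\;)      ; a sub-delimiter
(describe #\~)      ; unreserved
(describe #\space)  ; neither reserved nor unreserved
</enscript>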

=== Requires

* [[uri-generic]]
* [[matchable]]
* [[defstruct]]

=== Version History

* 1.3 Added {{make-uri}} constructor.
* 1.2 Re-exported {{uri-encode-string}}, {{uri-decode-string}} and the various charsets from uri-generic. Remove bogus charset encoding rules for fragments (fall back to normal URI encoding)
* 1.1 Fixed x-www-form-urlencoded encoding so it encodes even characters that do not strictly need to be encoded according to the URI spec, but do according to the x-www-form-urlencoded spec.
* 1.0 Fix a bug that caused empty lists to be treated differently from lists containing only false values in form-urlencode
* 0.10 Fix urlencoded-separator first char selection in form-urlencode
* 0.9 Automatically convert non-strings to strings in creating queries
* 0.8 Actually export form-urlencoded-separator
* 0.7 Fix silly bug in the predicates from 0.6 (it helps to test first...)
* 0.6 Add predicates uri-path-relative? and uri-path-absolute?
* 0.5 Add {{uri-default-port?}} predicate procedure
* 0.4 Add {{uri->list}} conversion procedure
* 0.3 Fix dependency info (requires at least uri-generic 2.1)
* 0.2 Add predicates for URIs, absolute URIs and relative references, matching the ones in uri-generic.
* 0.1 Initial Release

=== License

  Copyright 2008-2012 Peter Bex
  All rights reserved.
 
  Redistribution and use in source and binary forms, with or without
  modification, are permitted provided that the following conditions are
  met:
 
  Redistributions of source code must retain the above copyright
  notice, this list of conditions and the following disclaimer.
 
  Redistributions in binary form must reproduce the above copyright
  notice, this list of conditions and the following disclaimer in the
  documentation and/or other materials provided with the distribution.
 
  Neither the name of the author nor the names of its contributors may
  be used to endorse or promote products derived from this software
  without specific prior written permission.
 
  THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
  "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
  LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
  FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
  COPYRIGHT HOLDERS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
  INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
  (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
  SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
  HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
  STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
  ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
  OF THE POSSIBILITY OF SUCH DAMAGE.