source: project/wiki/eggref/5/uri-common @ 35628

Last change on this file since 35628 was 35628, checked in by sjamaan, 17 months ago

Add uri-common to eggref/5

[[tags: eggs]]
[[toc:]]

== uri-common

=== Description

The {{uri-common}} library provides simple and easy-to-use parsing
and manipulation procedures for URIs using common schemes.

These "common schemes" all follow these rules:

* An empty path after the hostname is considered identical to the root path.
* All components are fully URI-decoded (so they contain no percent-encoded characters).
* The query argument is in [[http://www.w3.org/TR/xforms/#structure-model-submission|application/x-www-form-urlencoded]] form.
* The port is determined automatically if it is omitted and the URI scheme is known.
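
As a rough illustration of these rules, the following sketch parses an HTTP URI that has no explicit port and contains percent-encoded characters. The example URI is made up, and the comments restate what the rules above imply.

<enscript highlight="scheme">
(import uri-common)

;; Hypothetical example URI: no explicit port, percent-encoded path and query.
(define u (uri-reference "http://example.com/some%20dir?q=a%26b"))

(uri-port u)   ; the default port for the http scheme, since none was given
(uri-path u)   ; path segments are fully decoded ("some dir")
(uri-query u)  ; decoded alist; the value stored under q is "a&b"
</enscript>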

=== Library Procedures

This library replaces most of the procedures in [[uri-generic]]. If
you need to work with URIs on the uri-generic level, or with both
uri-generic and uri-common URI objects, you will have to import the
libraries with a prefix or rename the conflicting procedures.
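
A minimal sketch of such an import, assuming CHICKEN 5 import syntax. The {{generic:}} prefix is an arbitrary choice made for this example, not something the library prescribes.

<enscript highlight="scheme">
;; Import uri-common unprefixed and uri-generic with a "generic:" prefix.
(import uri-common
        (prefix uri-generic generic:))

(define common-uri  (uri-reference "http://example.com/a%20b"))
(define generic-uri (generic:uri-reference "http://example.com/a%20b"))

(uri-reference? common-uri)           ; uri-common predicate
(generic:uri-reference? generic-uri)  ; uri-generic predicate
</enscript>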

==== Constructors and predicates

These constructors fully decode their arguments, so afterwards it is
impossible to distinguish between encoded delimiters and unencoded
delimiters.  This makes uri-common objects decoding endpoints; no
further decoding on the URI level is possible (of course, applications
are free to decode further information inside the URI).  If the
original URI is still needed for some reason, the object can be
converted to a uri-generic object.  However, updating a URI component
causes that component's original encoding to be lost, so be careful!

<procedure>(uri-reference STRING) => URI</procedure>

A URI reference is either a URI or a relative reference (RFC 3986,
Section 4.1).  If the given string's prefix does not match the syntax
of a scheme followed by a colon separator, then the given string is
parsed as a relative reference.
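
For instance, the same constructor can yield either kind of URI reference. The input strings below are made up for illustration.

<enscript highlight="scheme">
(import uri-common)

(define with-scheme    (uri-reference "http://example.com/foo"))
(define without-scheme (uri-reference "foo/bar?x=1"))

(uri? with-scheme)              ; => #t
(relative-ref? with-scheme)     ; => #f
(relative-ref? without-scheme)  ; => #t
(uri-scheme without-scheme)     ; => #f
</enscript>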

<procedure>(absolute-uri STRING) => URI</procedure>

Parses the given string as an absolute URI, in which no fragments are
allowed.  If no URI scheme is found, or a fragment is detected, this
raises an error.

Absolute URIs are defined by RFC 3986 as non-relative URI references
without a fragment (RFC 3986, Section 4.3).  Absolute URIs can be used
as a base URI to resolve a relative-ref against, using
{{uri-relative-to}} (see below).
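
Because {{absolute-uri}} signals an error on invalid input, callers that handle untrusted strings may want to guard the call. The sketch below assumes CHICKEN 5's {{(chicken condition)}} module; {{try-absolute-uri}} is a hypothetical helper, not part of uri-common.

<enscript highlight="scheme">
(import uri-common (chicken condition))

;; Hypothetical helper: returns the parsed URI, or #f if parsing signals an error.
(define (try-absolute-uri str)
  (handle-exceptions exn #f (absolute-uri str)))

(try-absolute-uri "http://example.com/foo")       ; a URI object
(try-absolute-uri "http://example.com/foo#frag")  ; #f, as the fragment is rejected
(try-absolute-uri "/foo/bar")                     ; #f, as there is no scheme
</enscript>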

<procedure>(make-uri #!key scheme path query fragment host port username password) => URI</procedure>

Constructs a URI from the given components.
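
A small sketch of building a URI from parts. All component values are made up, and the leading {{/}} symbol used to mark an absolute path follows the uri-generic list representation, which is an assumption here rather than something stated in this section.

<enscript highlight="scheme">
(import uri-common)

;; Component values are purely illustrative.
(define u
  (make-uri scheme: 'http
            host: "example.com"
            path: '(/ "search")       ; assumed: '/ marks an absolute path
            query: '((q . "chicken"))))

(uri-scheme u)   ; => http
(uri-host u)     ; => "example.com"
(uri->string u)  ; serializes the components back into one string
</enscript>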

==== Accessors

<procedure>(uri-scheme uri-common) => symbol</procedure><br>
<procedure>(uri-path uri-common) => list</procedure><br>
<procedure>(uri-query uri-common) => alist</procedure><br>
<procedure>(uri-fragment uri-common) => string</procedure><br>
<procedure>(uri-host uri-common) => string</procedure><br>
<procedure>(uri-port uri-common) => integer</procedure><br>
<procedure>(uri-username uri-common) => string</procedure><br>
<procedure>(uri-password uri-common) => string</procedure><br>

Accessors for {{URI-common}} objects.

If a component is not defined in the given URI-common object, then the
corresponding accessor returns {{#f}}, except for {{uri-query}} and
{{uri-path}}, which both ''always'' return a (possibly empty) list.
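
The following sketch shows the accessors on a URI that lacks several components. The URI itself is an arbitrary example.

<enscript highlight="scheme">
(import uri-common)

(define u (uri-reference "http://example.com/docs"))

(uri-host u)          ; => "example.com"
(uri-fragment u)      ; => #f  (no fragment present)
(uri-username u)      ; => #f  (no userinfo present)
(uri-query u)         ; => ()  (always a list, here empty)
(list? (uri-path u))  ; => #t
</enscript>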

==== Updater

<procedure>(update-uri URI-common #!key scheme path query fragment host port username password) => URI-common</procedure>

Updates the specified keys in the URI-common object in a functional way
(i.e., it creates a new copy with the modifications).

Tip: if you want to create a URI with only a few components set to dynamic values extracted from elsewhere, you can generally create an empty URI and update its constituent parts, like this:

<enscript highlight="scheme">
(import uri-common)

(uri->string (update-uri (uri-reference "") path: '("example" "greeting") query: '((hi . "there"))))
 => "example/greeting?hi=there"
</enscript>
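
Since the update is functional, the original object is left untouched. A short sketch with made-up host names:

<enscript highlight="scheme">
(import uri-common)

(define base    (uri-reference "http://example.com/a"))
(define changed (update-uri base host: "example.org"))

(uri-host base)     ; => "example.com" (the original is unchanged)
(uri-host changed)  ; => "example.org"
</enscript>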

==== Predicates

There are several predicates to check whether objects are URI references (the most general type of URI-like object) or more specific types, such as absolute URIs or relative references. The classification tree of URI-like objects looks roughly like this:

                uri-reference                         Anything defined by the RFC fits this
                /           \
             uri             relative-ref             Scheme (uri) or no scheme (relative-ref)?
             /               /        \
      absolute-uri    path-relative   path-absolute   No URI fragment (absolute-uri)? | Path starts with a slash (path-absolute) or not (path-relative)?


<procedure>(uri-reference? URI) => BOOL</procedure>

Is the given object a URI reference?  '''All objects created by
URI-common constructors are URI references'''; they are either URIs
or relative references.  The other constructors are just more strictly
checking versions of {{uri-reference}}.  They all create
URI references.

<procedure>(absolute-uri? URI) => BOOL</procedure>

Is the given object an absolute URI?

<procedure>(uri? URI) => BOOL</procedure>

Is the given object a URI?  URIs are all URI references that include
a scheme part.  The other type of URI reference is the relative
reference.

<procedure>(relative-ref? URI) => BOOL</procedure>

Is the given object a relative reference?  Relative references are
defined by RFC 3986 as URI references which are not URIs; they contain
no URI scheme and can be resolved against an absolute URI to obtain
a complete URI using {{uri-relative-to}}.

<procedure>(uri-path-absolute? URI) => BOOL</procedure>

Is the {{URI}}'s path component an absolute path?

<procedure>(uri-path-relative? URI) => BOOL</procedure>

Is the {{URI}}'s path component a relative path?

<procedure>(uri-default-port? URI) => BOOL</procedure>

Is the {{URI}}'s port the default port for the {{URI}}'s scheme?
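
A brief sketch applying these predicates to a few made-up URI references; the variable names are arbitrary.

<enscript highlight="scheme">
(import uri-common)

(define plain-uri (uri-reference "http://example.com:8080/foo"))
(define frag-uri  (uri-reference "http://example.com/foo#top"))
(define rel-ref   (uri-reference "../bar"))

(absolute-uri? plain-uri)       ; => #t
(absolute-uri? frag-uri)        ; => #f (it has a fragment)
(uri? frag-uri)                 ; => #t (it still has a scheme)
(uri-path-absolute? plain-uri)  ; => #t
(uri-path-relative? rel-ref)    ; => #t
(uri-default-port? plain-uri)   ; => #f (8080 is not the default for http)
</enscript>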

==== Reference Resolution

<procedure>(uri-relative-to URI URI) => URI</procedure>

Resolve the first URI as a reference relative to the second URI,
returning a new URI (RFC 3986, Section 5.2.2).

<procedure>(uri-relative-from URI URI) => URI</procedure>

Constructs a new, possibly relative, URI which represents the location
of the first URI with respect to the second URI.

<enscript highlight="scheme">
(import uri-common)

(uri->string (uri-relative-to (uri-reference "../qux") (uri-reference "http://example.com/foo/bar/")))
 => "http://example.com/foo/qux"

(uri->string (uri-relative-from (uri-reference "http://example.com/foo/qux") (uri-reference "http://example.com/foo/bar/")))
 => "../qux"
</enscript>

==== Query encoding and decoding

<parameter>(form-urlencoded-separator [char-set/char/string])</parameter><br>
<procedure>(form-urlencode alist #!key (separator (form-urlencoded-separator))) => string</procedure><br>
<procedure>(form-urldecode string #!key (separator (form-urlencoded-separator))) => alist</procedure><br>

Encode or decode an alist using the encoding corresponding to the
[[http://www.w3.org/TR/html401/interact/forms.html#h-17.13.4.1|form-urlencoded]]
media type, using the given separator character(s).

The alist contains key/value pairs corresponding to the values in the
final urlencoded string.  If a value is {{#f}}, the key will be
'''omitted''' from the string.  If it is {{#t}}, the key will be
present without a value.  In all other cases, the value is converted to
a string and urlencoded.  The keys are always converted to a string
and urlencoded.

When encoding, if {{separator}} is a string, its first character will
be used as the separator in the resulting query string.  If it is a
char-set, it will be converted to a string and its first character
will be taken.  In either case, all of the separator characters are
encoded if they occur inside the key/value pairs.

When decoding, any character in the set (or string) will be seen as
a separator.

The separator defaults to the string {{";&"}}.  This means that
either semicolons or ampersands are allowed as separators when decoding
a URI string, but semicolons are used when generating strings.

If you would like to use a different separator, you should parameterize
''all'' calls to procedures that return a uri-common object.

Examples:

<enscript highlight="scheme">
(form-urlencode '(("lemon" . "ade") (sucks . #f) (rocks . #t) (number . 42)))
 => "lemon=ade;rocks;number=42"

(form-urldecode "lemon=ade;rocks;number=42")
 => ((lemon . "ade") (rocks . #t) (number . "42"))
</enscript>
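
To use a different separator, either pass the {{separator}} keyword to a single call or parameterize {{form-urlencoded-separator}} around the code that parses and serializes URIs. The following is a sketch with made-up data.

<enscript highlight="scheme">
(import uri-common)

;; Override the separator for a single call via the keyword argument.
(form-urlencode '((a . "1") (b . "2")) separator: "&")
 ;; => "a=1&b=2"

;; Override it for a dynamic extent, so that queries built and
;; serialized inside the body use "&" as the separator.
(define with-ampersands
  (parameterize ((form-urlencoded-separator "&"))
    (uri->string
     (update-uri (uri-reference "http://example.com/")
                 query: '((a . "1") (b . "2"))))))
</enscript>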

==== String encoding and decoding

For more generic but lower-level work than encoding or decoding whole
query strings or alists at once, you can also encode and decode
individual strings.

<procedure>(uri-encode-string STRING [CHAR-SET]) => STRING</procedure>

Returns the percent-encoded form of the given string.  The optional
char-set argument controls which characters should be encoded.
It defaults to the complement of {{char-set:uri-unreserved}}. This is
always safe, but often overly careful; it is allowed to leave certain
characters unquoted depending on the context.

<procedure>(uri-decode-string STRING [CHAR-SET]) => STRING</procedure>

Returns the decoded form of the given string.  The optional char-set
argument controls which characters should be decoded.  It defaults to
{{char-set:full}}.
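
A short sketch of both procedures. The input strings are arbitrary, and the narrower char-set in the second call is just an illustration built with {{char-set}} from [[srfi-14]].

<enscript highlight="scheme">
(import uri-common srfi-14)

;; Default: everything outside char-set:uri-unreserved is encoded.
(uri-encode-string "a b&c")
 ;; => "a%20b%26c"

;; Encode only spaces, leaving the ampersand alone.
(uri-encode-string "a b&c" (char-set #\space))
 ;; => "a%20b&c"

(uri-decode-string "a%20b%26c")
 ;; => "a b&c"
</enscript>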


==== Normalization

<procedure>(uri-normalize-case URI) => URI</procedure>

Performs URI case normalization (RFC 3986, Section 6.2.2.1).

<procedure>(uri-normalize-path-segments URI) => URI</procedure>

Performs URI path segment normalization (RFC 3986, Section 6.2.2.3).
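
As a sketch, both normalizations can be applied to made-up URIs like this. The comments describe what the cited RFC sections prescribe.

<enscript highlight="scheme">
(import uri-common)

(define messy (uri-reference "http://example.com/a/b/../c/./d"))
(define shouty (uri-reference "HTTP://EXAMPLE.COM/Foo"))

;; Path segment normalization removes the "." and ".." segments,
;; leaving the path /a/c/d.
(uri->string (uri-normalize-path-segments messy))

;; Case normalization lowercases the scheme and host; the path is case-sensitive
;; and is left as it is.
(uri->string (uri-normalize-case shouty))
</enscript>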

==== uri-generic, string and list representation

<procedure>(uri->uri-generic uri-common) => uri-generic</procedure><br>
<procedure>(uri-generic->uri uri-generic) => uri-common</procedure>

To convert between uri-generic and uri-common objects, use these
procedures.  As stated above, this will allow you to retrieve the
original encoding of the URI components, but once you update a
component from the uri-common side, the original encoding is no longer
available (the updated value replaces the original value).
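
The sketch below shows one way to look at the original encoding, assuming uri-generic is imported with an arbitrary {{generic:}} prefix. The URI contains an encoded slash, which uri-common decodes into a single path segment.

<enscript highlight="scheme">
(import uri-common
        (prefix uri-generic generic:))

(define u (uri-reference "http://example.com/a%2Fb"))

;; Decoded view: "a/b" is one segment, so the encoded slash can no
;; longer be told apart from a literal one.
(uri-path u)

;; uri-generic view: the segment as it was originally encoded.
(generic:uri-path (uri->uri-generic u))
</enscript>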

<procedure>(uri->string uri-common [userinfo]) => string</procedure>

Reconstructs the given URI into a string.  The optional {{userinfo}}
argument is a procedure {{LAMBDA USERNAME PASSWORD -> STRING}} that maps
the userinfo part of the URI to a string.  If it is not given, the
userinfo is represented as the username followed by {{":******"}}.
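
A sketch of both variants, using made-up credentials. The lambda shown here simply joins username and password with a colon; any other mapping would do.

<enscript highlight="scheme">
(import uri-common)

(define u (uri-reference "http://joe:secret@example.com/"))

;; Default: the password is masked in the output.
(define masked (uri->string u))

;; Custom mapping: include the password verbatim.
(define unmasked
  (uri->string u (lambda (username password)
                   (string-append username ":" password))))
</enscript>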

<procedure>(uri->list URI USERINFO) => LIST</procedure>

Returns a list of the form {{(SCHEME SPECIFIC FRAGMENT)}};
{{SPECIFIC}} is of the form {{(AUTHORITY PATH QUERY)}}.

==== Character sets

As a convenience for further sub-parsers or other special-purpose URI
handling code, such as separately URI-encoding strings, uri-common
exports a few character sets.

<constant>char-set:gen-delims</constant>

Generic delimiters.
  gen-delims  =  ":" / "/" / "?" / "#" / "[" / "]" / "@"

<constant>char-set:sub-delims</constant>

Sub-delimiters.
  sub-delims  =  "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="

<constant>char-set:uri-reserved</constant>

The union of {{gen-delims}} and {{sub-delims}}; all reserved URI characters.
  reserved    =  gen-delims / sub-delims

<constant>char-set:uri-unreserved</constant>

All unreserved characters that are allowed in a URI.
  unreserved  =  ALPHA / DIGIT / "-" / "." / "_" / "~"

Note that this is ''not'' the complement of {{char-set:uri-reserved}}!
There are several characters (even printable, non-control characters)
which are not allowed at all in a URI.
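
These sets can be queried with the usual [[srfi-14]] procedures. The following sketch uses a space as an example of a character that belongs to neither set.

<enscript highlight="scheme">
(import uri-common srfi-14)

(char-set-contains? char-set:uri-unreserved #\~)      ; => #t
(char-set-contains? char-set:uri-reserved #\?)        ; => #t

;; A space is in neither set, so the two sets are not complements.
(char-set-contains? char-set:uri-unreserved #\space)  ; => #f
(char-set-contains? char-set:uri-reserved #\space)    ; => #f
</enscript>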


=== Requires

* [[srfi-1]]
* [[srfi-13]]
* [[srfi-14]]
* [[defstruct]]
* [[matchable]]
* [[uri-generic]]

=== Version History

* 2.0 Port to CHICKEN 5
* 1.4 Do not reset the port when switching schemes, but keep it around so when it's the default for a scheme, it won't be printed, and when switching to another scheme it will be printed again if it's not the default port for this scheme. This makes the interface more composable and less surprising (reported by Kristian Lein-Mathisen).
* 1.3 Added {{make-uri}} constructor.
* 1.2 re-exported {{uri-encode-string}}, {{uri-decode-string}} and the various charsets from uri-generic. Remove bogus charset encoding rules for fragments (fall back to normal uri encoding)
* 1.1 Fixed x-www-form-urlencoded encoding so it encodes even characters that do not strictly need to be encoded according to the URI spec, but do according to the x-www-form-urlencoded spec.
* 1.0 Fix a bug that caused empty lists to be treated differently from lists containing only false values in form-urlencode
* 0.10 Fix urlencoded-separator first char selection in form-urlencode
* 0.9 Automatically convert non-strings to strings in creating queries
* 0.8 Actually export form-urlencoded-separator
* 0.7 Fix silly bug in the predicates from 0.6 (it helps to test first...)
* 0.6 Add predicates uri-path-relative? and uri-path-absolute?
* 0.5 Add {{uri-default-port?}} predicate procedure
* 0.4 Add {{uri->list}} conversion procedure
* 0.3 Fix dependency info (requires at least uri-generic 2.1)
* 0.2 Add predicates for URIs, absolute URIs and relative references, matching the ones in uri-generic.
* 0.1 Initial Release

=== License

  Copyright 2008-2018 Peter Bex
  All rights reserved.
 
  Redistribution and use in source and binary forms, with or without
  modification, are permitted provided that the following conditions are
  met:
 
  Redistributions of source code must retain the above copyright
  notice, this list of conditions and the following disclaimer.
 
  Redistributions in binary form must reproduce the above copyright
  notice, this list of conditions and the following disclaimer in the
  documentation and/or other materials provided with the distribution.
 
  Neither the name of the author nor the names of its contributors may
  be used to endorse or promote products derived from this software
  without specific prior written permission.
 
  THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
  "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
  LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
  FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
  COPYRIGHT HOLDERS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
  INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
  (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
  SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
  HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
  STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
  ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
  OF THE POSSIBILITY OF SUCH DAMAGE.