source: project/wiki/eggref/4/uri-generic @ 25290

Last change on this file since 25290 was 25290, checked in by Ivan Raikov, 10 years ago

[[tags: eggs]]
[[toc:]]

== uri-generic

=== Description

The {{uri-generic}} library contains procedures for parsing and
manipulating Uniform Resource Identifiers
([[http://tools.ietf.org/html/rfc3986|RFC 3986]]). It is intended to
conform closely to the RFC, and uses combinator parsing and
character classes rather than regular expressions.

This library should be considered a ''basis'' for creating
scheme-specific URI parser libraries: it only parses the generic
components of a URI, and scheme-specific libraries can parse their
subcomponents further. For this reason, percent-encoded characters are
not encoded or decoded automatically (apart from the unreserved
characters described below); this should be handled by the
implementations of specific URI schemes.

=== Library Procedures

==== Constructors and predicates

As specified in section 2.3 of RFC 3986, URI constructors
automatically decode percent-encoded octets that correspond to
unreserved characters. This means that the following holds:

 (equal? (uri-reference "http://example.com/foo-bar")
         (uri-reference "http://example.com/foo%2Dbar"))  => #t
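A short sketch of this rule, assuming CHICKEN 4 with the uri-generic egg installed. It relies on the list representation of paths returned by {{uri-path}}, in which an absolute path starts with the symbol {{/}}; the variable names are only illustrative.

<enscript highlight="scheme">
(use uri-generic)

;; "%2D" encodes "-", an unreserved character, so the constructor
;; decodes it while parsing.
(define dash-uri (uri-reference "http://example.com/foo%2Dbar"))
(print (uri->string dash-uri))   ; http://example.com/foo-bar

;; "%2F" encodes "/", a reserved character, so it is not decoded:
;; the path keeps a single segment instead of being split in two.
(define slash-uri (uri-reference "http://example.com/a%2Fb"))
(print (uri-path slash-uri))     ; (/ a%2Fb)
</enscript>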

<procedure>(uri-reference STRING) => URI</procedure>

A URI reference is either a URI or a relative reference (RFC 3986,
Section 4.1).  If the given string's prefix does not match the syntax
of a scheme followed by a colon separator, then the given string is
parsed as a relative reference.  If the string cannot be parsed
completely, {{#f}} is returned.

<procedure>(uri-reference? URI) => BOOL</procedure>

Is the given object a URI reference?  '''All objects created by
URI-generic constructors are URI references'''; they are either URIs
or relative references.  The constructors below are stricter versions
of {{uri-reference}} that perform additional checks, but they also
create URI references.

<procedure>(absolute-uri STRING) => URI</procedure>

Parses the given string as an absolute URI, in which no fragment is
allowed.  If no URI scheme is found, or a fragment is detected, an
error is raised.

Absolute URIs are defined by RFC 3986 as non-relative URI references
without a fragment (RFC 3986, Section 4.3).  An absolute URI can be
used as the base URI against which a relative reference is resolved
with {{uri-relative-to}} (see below).
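Because {{absolute-uri}} signals an error rather than returning {{#f}}, code that accepts untrusted input may want to catch it. The following sketch does so with CHICKEN's {{handle-exceptions}}; {{try-absolute-uri}} is a hypothetical helper, not part of the egg.

<enscript highlight="scheme">
(use uri-generic)

;; Hypothetical helper: returns the parsed absolute URI, or #f if
;; absolute-uri signals an error (missing scheme, fragment present).
(define (try-absolute-uri str)
  (handle-exceptions exn #f (absolute-uri str)))

(define base      (try-absolute-uri "http://example.com/docs/"))       ; a URI
(define with-frag (try-absolute-uri "http://example.com/docs/#intro")) ; #f
(define no-scheme (try-absolute-uri "/docs/"))                         ; #f
</enscript>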

<procedure>(make-uri #!key authority scheme path query fragment host port username password) => URI</procedure>

Constructs a URI from the given components.
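A sketch of building a URI from its parts, assuming CHICKEN 4 with the egg installed. The path is given in the list form returned by {{uri-path}}, with the symbol {{/}} marking an absolute path; the host and query values are made up for the example.

<enscript highlight="scheme">
(use uri-generic)

;; Components are passed as keyword arguments; the path uses the
;; list form returned by uri-path.
(define u
  (make-uri scheme: 'http
            host: "example.com"
            path: '(/ "docs" "index.html")
            query: "lang=en"))

(print (uri-scheme u))  ; http
(print (uri-host u))    ; example.com
(print (uri-path u))    ; (/ docs index.html)
(print (uri-query u))   ; lang=en
</enscript>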

<procedure>(absolute-uri? URI) => BOOL</procedure>

Is the given object an absolute URI, i.e., a URI with a scheme and
without a fragment?

<procedure>(uri? URI) => BOOL</procedure>

Is the given object a URI?  URIs are the URI references that include
a scheme part; all other URI references are relative references.

<procedure>(relative-ref? URI) => BOOL</procedure>

Is the given object a relative reference?  Relative references are
defined by RFC 3986 as URI references that are not URIs; they contain
no URI scheme and can be resolved against an absolute URI with
{{uri-relative-to}} to obtain a complete URI.
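The predicates can be summarised with a few assertions. This is a sketch assuming CHICKEN 4; the example strings are arbitrary, and the checks only test for truth or falsity rather than comparing results with {{#t}}.

<enscript highlight="scheme">
(use uri-generic)

(define full      (uri-reference "http://example.com/a/b"))
(define with-frag (uri-reference "http://example.com/a/b#top"))
(define relative  (uri-reference "../c"))

;; Everything the constructors return is a URI reference.
(assert (and (uri-reference? full) (uri-reference? relative)))

;; A scheme makes a URI; without one, the object is a relative reference.
(assert (and (uri? full) (not (relative-ref? full))))
(assert (and (relative-ref? relative) (not (uri? relative))))

;; An absolute URI additionally has no fragment.
(assert (absolute-uri? full))
(assert (not (absolute-uri? with-frag)))
</enscript>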

<procedure>(uri-path-absolute? URI) => BOOL</procedure>

Is the {{URI}}'s path component an absolute path?

<procedure>(uri-path-relative? URI) => BOOL</procedure>

Is the {{URI}}'s path component a relative path?
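A sketch of the difference between the two path predicates, assuming CHICKEN 4. Note that a relative reference can still have an absolute path.

<enscript highlight="scheme">
(use uri-generic)

;; A path that starts with "/" is absolute, even in a relative reference.
(define abs-path (uri-reference "/docs/intro"))
;; A path without a leading "/" is relative.
(define rel-path (uri-reference "docs/intro"))

(print (if (uri-path-absolute? abs-path) "absolute" "relative"))  ; absolute
(print (if (uri-path-absolute? rel-path) "absolute" "relative"))  ; relative
</enscript>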

==== Attribute accessors

<procedure>(uri-authority URI) => URI-AUTH</procedure><br>
<procedure>(uri-scheme URI) => SYMBOL</procedure><br>
<procedure>(uri-path URI) => LIST</procedure><br>
<procedure>(uri-query URI) => STRING</procedure><br>
<procedure>(uri-fragment URI) => STRING</procedure><br>
<procedure>(uri-host URI) => STRING</procedure><br>
<procedure>(uri-port URI) => INTEGER</procedure><br>
<procedure>(uri-username URI) => STRING</procedure><br>
<procedure>(uri-password URI) => STRING</procedure><br>
<procedure>(authority? URI-AUTH) => BOOL</procedure><br>
<procedure>(authority-host URI-AUTH) => STRING</procedure><br>
<procedure>(authority-port URI-AUTH) => INTEGER</procedure><br>
<procedure>(authority-username URI-AUTH) => STRING</procedure><br>
<procedure>(authority-password URI-AUTH) => STRING</procedure><br>

If a component is not defined in the given URI, then the corresponding
accessor returns {{#f}}.  The path is returned as a list of segment
strings; an absolute path is represented by a list that starts with
the symbol {{/}}.
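A sketch showing the accessors on a fully populated URI, assuming CHICKEN 4. The commented values are what {{print}} displays, so strings appear without quotes.

<enscript highlight="scheme">
(use uri-generic)

(define u (uri-reference "http://user:pw@example.com:8080/a/b?x=1#top"))

(print (uri-scheme u))    ; http  (a symbol)
(print (uri-host u))      ; example.com
(print (uri-port u))      ; 8080  (an integer)
(print (uri-username u))  ; user
(print (uri-password u))  ; pw
(print (uri-path u))      ; (/ a b)
(print (uri-query u))     ; x=1
(print (uri-fragment u))  ; top

;; The same information is available through the authority object.
(print (authority-host (uri-authority u)))  ; example.com

;; A component that is absent from the string is #f.
(print (uri-fragment (uri-reference "http://example.com/")))  ; #f
</enscript>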

<procedure>(update-uri URI #!key authority scheme path query fragment host port username password) => URI</procedure><br>
<procedure>(update-authority URI-AUTH #!key host port username password) => URI-AUTH</procedure><br>

Update the specified keys in the URI or URI-AUTH object in a
functional way, i.e., return a new copy with the modifications and
leave the original object unchanged.
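The following sketch, assuming CHICKEN 4, shows that the update procedures leave their argument untouched and copy every component that is not mentioned.

<enscript highlight="scheme">
(use uri-generic)

(define original (uri-reference "http://example.com:8080/a?x=1"))

;; update-uri returns a new URI with the given components replaced.
(define moved (update-uri original host: "example.org" port: 9090))

(print (uri-host moved))     ; example.org
(print (uri-port moved))     ; 9090
(print (uri-query moved))    ; x=1  (copied from the original)
(print (uri-host original))  ; example.com  (the original is unchanged)

;; update-authority does the same for an authority object.
(define auth (update-authority (uri-authority original) port: 443))
(print (authority-port auth))  ; 443
</enscript>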

==== String and List Representations

<procedure>(uri->string URI [USERINFO]) => STRING</procedure>

Reconstructs a string from the given URI.  The optional {{USERINFO}}
argument is a procedure {{(lambda (USERNAME PASSWORD) ...)}} that
returns the string used to represent the userinfo part of the URI.  If
it is not given, the userinfo is represented as the username followed
by {{":******"}}.
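A sketch of the {{USERINFO}} argument, assuming CHICKEN 4 with the egg installed; {{substring-index}} comes from the core {{data-structures}} unit. The custom procedure below, which simply drops the password, is only an illustration. Since the exact text around the userinfo part (such as the {{@}} separator) is produced by the egg, only the round trip of a URI without userinfo is given with its exact output.

<enscript highlight="scheme">
(use uri-generic data-structures)

(define with-password (uri-reference "http://alice:secret@example.com/a"))

;; Default USERINFO: the password is replaced by asterisks.
(define masked (uri->string with-password))

;; Custom USERINFO: this illustrative procedure keeps only the username.
(define user-only (uri->string with-password (lambda (user pass) user)))

;; A URI without userinfo converts back to the string it came from.
(define plain (uri->string (uri-reference "http://example.com/a/b?x=1#top")))
(print plain)  ; http://example.com/a/b?x=1#top
</enscript>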

<procedure>(uri->list URI USERINFO) => LIST</procedure>

Returns a list of the form {{(SCHEME SPECIFIC FRAGMENT)}};
{{SPECIFIC}} is of the form {{(AUTHORITY PATH QUERY)}}.  {{USERINFO}}
has the same meaning as in {{uri->string}}.
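A sketch of {{uri->list}}, assuming CHICKEN 4 and that {{USERINFO}} is a procedure of the username and password, as for {{uri->string}}. The example only checks the overall shape of the result.

<enscript highlight="scheme">
(use uri-generic)

;; USERINFO is assumed to work as for uri->string; this URI has no
;; userinfo, so the procedure is not needed for the output.
(define parts
  (uri->list (uri-reference "http://example.com/a?x=1#top")
             (lambda (user pass) user)))

;; Expected shape: (SCHEME (AUTHORITY PATH QUERY) FRAGMENT)
(print parts)
</enscript>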

==== Reference Resolution

<procedure>(uri-relative-to URI BASE-URI) => URI</procedure>

Resolves {{URI}} as a reference relative to {{BASE-URI}}, returning a
new URI (RFC 3986, Section 5.2.2).

<procedure>(uri-relative-from URI BASE-URI) => URI</procedure>

Constructs a new, possibly relative, URI that represents the location
of {{URI}} with respect to {{BASE-URI}}.

<enscript highlight="scheme">
(use uri-generic)

(uri->string (uri-relative-to (uri-reference "../qux")
                              (uri-reference "http://example.com/foo/bar/")))
;; => "http://example.com/foo/qux"

(uri->string (uri-relative-from (uri-reference "http://example.com/foo/qux")
                                (uri-reference "http://example.com/foo/bar/")))
;; => "../qux"
</enscript>
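Further examples of {{uri-relative-to}}, assuming CHICKEN 4, using a base URI adapted from the reference resolution examples in RFC 3986, Section 5.4 (the {{;p}} parameter of the RFC's base path is omitted). The expected strings follow the RFC's resolution algorithm; {{resolve}} is a hypothetical helper defined only for this sketch.

<enscript highlight="scheme">
(use uri-generic)

(define base (uri-reference "http://a/b/c/d?q"))

;; Hypothetical helper: resolve a reference string against base.
(define (resolve ref)
  (uri->string (uri-relative-to (uri-reference ref) base)))

(print (resolve "g"))     ; http://a/b/c/g
(print (resolve "../g"))  ; http://a/b/g
(print (resolve "/g"))    ; http://a/g
(print (resolve "?y"))    ; http://a/b/c/d?y
(print (resolve "#s"))    ; http://a/b/c/d?q#s
</enscript>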

==== String encoding and decoding

<procedure>(uri-encode-string STRING [CHAR-SET]) => STRING</procedure>

Returns the percent-encoded form of the given string.  The optional
{{CHAR-SET}} argument specifies which characters are encoded; it
defaults to the complement of {{char-set:uri-unreserved}}.  This
default is always safe but often overly cautious: depending on the
context, RFC 3986 allows certain reserved characters to be left
unencoded.
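A sketch of the default and a custom {{CHAR-SET}}, assuming CHICKEN 4; the {{char-set}} constructor comes from SRFI 14.

<enscript highlight="scheme">
(use uri-generic srfi-14)

;; Default: everything outside char-set:uri-unreserved is encoded.
(print (uri-encode-string "a b&c"))                     ; a%20b%26c

;; Custom char-set: only the space is encoded, "&" is kept.
(print (uri-encode-string "a b&c" (char-set #\space)))  ; a%20b&c
</enscript>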

<procedure>(uri-decode-string STRING [CHAR-SET]) => STRING</procedure>

Returns the decoded form of the given string.  The optional
{{CHAR-SET}} argument specifies which characters are decoded; it
defaults to {{char-set:full}}, i.e., every percent-encoded character
is decoded.
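The decoding counterpart, again assuming CHICKEN 4 with SRFI 14 for {{char-set}}. Restricting the char-set leaves the other percent-encoded octets as they are.

<enscript highlight="scheme">
(use uri-generic srfi-14)

;; Default: every percent-encoded octet is decoded.
(print (uri-decode-string "a%20b%26c"))                     ; a b&c

;; Custom char-set: only octets that decode to a space are decoded.
(print (uri-decode-string "a%20b%26c" (char-set #\space)))  ; a b%26c
</enscript>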


==== Normalization

<procedure>(uri-normalize-case URI) => URI</procedure>

Applies case normalization as defined in RFC 3986, Section 6.2.2.1,
which lowercases the scheme and host and uppercases the hexadecimal
digits of percent-encoded octets.
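A sketch, assuming CHICKEN 4: the host is lowercased, while the path, which RFC 3986 treats as case-sensitive, is left alone.

<enscript highlight="scheme">
(use uri-generic)

(define u (uri-normalize-case (uri-reference "http://Example.COM/Docs")))

(print (uri-host u))  ; example.com
(print (uri-path u))  ; (/ Docs)
</enscript>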

<procedure>(uri-normalize-path-segments URI) => URI</procedure>

Applies path segment normalization as defined in RFC 3986, Section
6.2.2.3, which removes the dot-segments {{.}} and {{..}} from the path.
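A sketch of dot-segment removal, assuming CHICKEN 4; the expected result follows the {{remove_dot_segments}} algorithm of RFC 3986, Section 5.2.4.

<enscript highlight="scheme">
(use uri-generic)

(define u (uri-normalize-path-segments
           (uri-reference "http://example.com/a/./b/../c")))

(print (uri->string u))  ; http://example.com/a/c
</enscript>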


==== Character sets

As a convenience for sub-parsers and other special-purpose URI
handling code, uri-generic exports the following character sets.

<constant>char-set:gen-delims</constant>

Generic delimiters.
  gen-delims  =  ":" / "/" / "?" / "#" / "[" / "]" / "@"

<constant>char-set:sub-delims</constant>

Sub-delimiters.
  sub-delims  =  "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="

<constant>char-set:uri-reserved</constant>

The union of {{gen-delims}} and {{sub-delims}}; all reserved URI characters.
  reserved    =  gen-delims / sub-delims

<constant>char-set:uri-unreserved</constant>

All unreserved characters that are allowed in a URI.
  unreserved  =  ALPHA / DIGIT / "-" / "." / "_" / "~"

Note that this is ''not'' the complement of {{char-set:uri-reserved}}!
There are several characters (even printable, non-control characters
such as {{"}}, {{<}} and {{>}}) that are not allowed in a URI at all,
except in percent-encoded form.
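A few membership checks that illustrate the definitions above, assuming CHICKEN 4 with SRFI 14 for {{char-set-contains?}}.

<enscript highlight="scheme">
(use uri-generic srfi-14)

(print (char-set-contains? char-set:uri-unreserved #\~))  ; #t
(print (char-set-contains? char-set:gen-delims #\/))      ; #t
(print (char-set-contains? char-set:sub-delims #\&))      ; #t

;; "<" is neither reserved nor unreserved; it can only appear
;; in a URI in percent-encoded form.
(print (char-set-contains? char-set:uri-reserved #\<))    ; #f
(print (char-set-contains? char-set:uri-unreserved #\<))  ; #f
</enscript>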


=== Requires

* [[matchable]]
* [[defstruct]]

=== Version History

* 2.36 Added procedure make-uri
* 2.35 Added some extra checks so we do not try to parse URIs containing invalid (non-hexadecimal) percent-encoding.  Add code to preserve empty path segments during parsing and when performing relative reference resolution.
* 2.34 Fix two bugs that show up in very rare cases (possibly never in practice). One caused issues when creating relative paths from two URIs where one URI had a path that was a prefix of the other; the other caused issues when resolving a relative URI whose path had ".." as its last component.
* 2.33 Path component for empty absolute path directly followed by query is now represented the same as empty path without query.
* 2.32 Empty absolute path directly followed by query is now properly recognised as a URI reference.
* 2.31 Return {{#f}} in constructors if unconsumed input remains after parsing
* 2.3 Add predicates uri-path-relative? and uri-path-absolute?
* 2.2 Improvements to uri->string.
* 2.1 Add new predicates for URIs, absolute URIs and relative references. Fix absolute-uri so it raises a condition when passed a non-absolute URI string, instead of returning a string with the error. Also throw an error if a fragment is detected in the string.
* 2.0 Export char-sets, add char-set arg to uri-encode/uri-decode,
       do not decode query args as x-www-form-urlencoded, change path
       representation.  Lots of bugfixes.
* 1.12 Fix relative path normalization when original path ends in a slash, remove consecutive slashes from paths in URIs
* 1.11 Added accessors for the authority components, functional update procedures. Fixed case-normalization.
* 1.10 Fixed edge case in {{uri-relative-to}} with empty path in base uri,
       fixed {{uri->string}} for URIs with query args, fixed {{uri->string}}
       to not add an extraneous slash after authority in case of empty path.
* 1.9 Fixed bug in uri-encode-string with reserved characters, added
      tests for decoding and encoding [Peter Bex]
* 1.8 Added uri-encode-string and uri-decode-string.
      URI constructors now perform automatic normalization
      of percent-encoded unreserved characters. [suggested by Peter Bex]
* 1.6 Added error message about missing scheme in absolute-uri.
* trunk Small bugfix in absolute-uri. [Peter Bex]
* 1.5 Bug fixes in uri->string and absolute-uri. [reported by Peter Bex]
* 1.3 Ported to Hygienic Chicken and the [[test]] egg [Peter Bex]
* 1.2 Now using defstruct instead of define-record [suggested by Peter Bex]
* 1.1 Added utf8 compatibility
* 1.0 Initial Release

=== License

Based on the
[[http://www.ninebynine.org/Software/ReadMe-URI-Haskell.txt|Haskell
URI library]] by Graham Klyne <gk@ninebynine.org>.


  Copyright 2008-2011 Ivan Raikov, Peter Bex.
  All rights reserved.
 
  Redistribution and use in source and binary forms, with or without
  modification, are permitted provided that the following conditions are
  met:
 
  Redistributions of source code must retain the above copyright
  notice, this list of conditions and the following disclaimer.
 
  Redistributions in binary form must reproduce the above copyright
  notice, this list of conditions and the following disclaimer in the
  documentation and/or other materials provided with the distribution.
 
  Neither the name of the author nor the names of its contributors may
  be used to endorse or promote products derived from this software
  without specific prior written permission.
 
  THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
  "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
  LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
  FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
  COPYRIGHT HOLDERS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
  INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
  (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
  SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
  HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
  STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
  ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
  OF THE POSSIBILITY OF SUCH DAMAGE.