source: project/wiki/eggref/4/uri-generic @ 13231

Last change on this file since 13231 was 13231, checked in by sjamaan, 11 years ago

Document latest changes, move documentation around a bit so it's clearer (IMHO)

1[[tags: eggs]]
2[[toc:]]
3
4== uri-generic
5
6=== Description
7
8The {{uri-generic}} library contains procedures for parsing and
9manipulation of Uniform Resource Identifiers
10([[http://tools.ietf.org/html/rfc3986|RFC 3986]]). It is intended to
11conform more closely to the RFC, and uses combinator parsing and
12character classes rather than regular expressions.
13
14This library should be considered to be a ''basis'' for creating
15scheme-specific URI parser libraries. This library only parses
16the generic components from a URI.  Any specific library can
17further parse subcomponents. For this reason, encoding and decoding
18of percent-encoded characters are not done automatically, except for
19unreserved characters (see below); scheme-specific libraries handle the rest.
20
21=== Library Procedures
22
23==== Constructors and predicates
24
25As specified in section 2.3 of RFC 3986, URI constructors
26automatically decode percent-encoded octets in the range of unreserved
27characters. This means that the following holds true:
28
29 (equal? (uri-reference "http://example.com/foo-bar")
30         (uri-reference "http://example.com/foo%2Dbar"))  => #t
31
32<procedure>(uri-reference STRING) => URI</procedure>
33
34A URI reference is either a URI or a relative reference (RFC 3986,
35Section 4.1).  If the given string's prefix does not match the syntax
36of a scheme followed by a colon separator, then the given string is
37parsed as a relative reference.
38
39<procedure>(uri-reference? URI) => BOOL</procedure>
40
41Is the given object a URI reference?  '''All objects created by
42uri-generic constructors are URI references'''; they are either URIs
43or relative references.  The constructors below are just stricter
44versions of {{uri-reference}}; they all create
45URI references.
46
47<procedure>(absolute-uri STRING) => URI</procedure>
48
49Parses the given string as an absolute URI, in which no fragments are
50allowed.  If no URI scheme is found, or a fragment is detected, this
51raises an error.
52
53Absolute URIs are defined by RFC 3986 as non-relative URI references
54without a fragment (RFC 3986, Section 4.2).  Absolute URIs can be used
55as a base URI to resolve a relative-ref against, using
56{{uri-relative-to}} (see below).
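
A minimal sketch of the error behaviour described above, assuming the
condition-raising version of {{absolute-uri}}. It wraps the call in
CHICKEN's {{handle-exceptions}} so that a rejected string yields {{#f}}.
The helper name {{try-absolute-uri}} is hypothetical and not part of
uri-generic.

 (use uri-generic)
 ;; Hypothetical helper: the parsed URI, or #f if absolute-uri raises.
 (define (try-absolute-uri str)
   (handle-exceptions exn #f (absolute-uri str)))
 (try-absolute-uri "http://example.com/foo")       ; a URI object
 (try-absolute-uri "http://example.com/foo#frag")  ; #f: fragment present
 (try-absolute-uri "/foo/bar")                     ; #f: no scheme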
57
58<procedure>(absolute-uri? URI) => BOOL</procedure>
59
60Is the given object an absolute URI?
61
62<procedure>(uri? URI) => BOOL</procedure>
63
64Is the given object a URI?  URIs are all URI references that include
65a scheme part.  The other kind of URI reference is the relative
66reference.
67
68<procedure>(relative-ref? URI) => BOOL</procedure>
69
70Is the given object a relative reference?  Relative references are
71defined by RFC 3986 as URI references which are not URIs; they contain
72no URI scheme and can be resolved against an absolute URI to obtain
73a complete URI using {{uri-relative-to}}.
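
The predicates can be compared side by side on one URI and one relative
reference. This is only an illustrative sketch; the comments show the
results one would expect from the definitions above, and the variable
names {{full}} and {{rel}} are arbitrary.

 (use uri-generic)
 (define full (uri-reference "http://example.com/foo#frag"))
 (define rel  (uri-reference "../bar"))
 (uri-reference? full)   ; #t: every constructor result is a URI reference
 (uri? full)             ; #t: it has a scheme
 (absolute-uri? full)    ; #f: it has a fragment
 (uri? rel)              ; #f: no scheme
 (relative-ref? rel)     ; #t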
74
75==== Attribute accessors
76
77* <procedure>(uri-authority URI) => URI-AUTH</procedure>
78* <procedure>(uri-scheme URI) => SYMBOL</procedure>
79* <procedure>(uri-path URI) => LIST</procedure>
80* <procedure>(uri-query URI) => STRING</procedure>
81* <procedure>(uri-fragment URI) => STRING</procedure>
82* <procedure>(uri-host URI) => STRING</procedure>
83* <procedure>(uri-port URI) => INTEGER</procedure>
84* <procedure>(uri-username URI) => STRING</procedure>
85* <procedure>(uri-password URI) => STRING</procedure>
86* <procedure>(authority? URI-AUTH) => BOOL</procedure>
87* <procedure>(authority-host URI-AUTH) => STRING</procedure>
88* <procedure>(authority-port URI-AUTH) => INTEGER</procedure>
89* <procedure>(authority-username URI-AUTH) => STRING</procedure>
90* <procedure>(authority-password URI-AUTH) => STRING</procedure>
91
92If a component is not defined in the given URI, then the corresponding
93accessor returns {{#f}}.
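
As a short sketch, the accessors can be applied to a URI that defines
most components. The example URI is made up. The result of {{uri-path}}
is left unspecified here, because its list representation is not
described in this section.

 (use uri-generic)
 (define u (uri-reference "http://alice:secret@example.com:8080/a/b?x=1#top"))
 (uri-scheme u)      ; the symbol http
 (uri-host u)        ; "example.com"
 (uri-port u)        ; 8080
 (uri-username u)    ; "alice"
 (uri-password u)    ; "secret"
 (uri-path u)        ; a list describing the path /a/b
 (authority-host (uri-authority u))                ; same as (uri-host u)
 (uri-port (uri-reference "http://example.com/"))  ; #f: no port given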
94
95* <procedure>(update-uri URI #!key authority scheme path query fragment host port username password) => URI</procedure>
96* <procedure>(update-authority URI-AUTH #!key host port username password) => URI-AUTH</procedure>
97
98Update the specified keys in the URI or URI-AUTH object in a
99functional way (i.e., a new copy is created with the modifications).
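
A small sketch of a functional update follows. The scheme is passed as
a symbol because {{uri-scheme}} returns one; the variable names are
arbitrary.

 (use uri-generic)
 (define u  (uri-reference "http://example.com/foo"))
 (define u2 (update-uri u scheme: 'https port: 8443))
 (uri-scheme u2)   ; https
 (uri-port u2)     ; 8443
 (uri-host u2)     ; "example.com", carried over unchanged
 (uri-scheme u)    ; http: the original object is not modified
 (uri-port u)      ; #f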
100
101==== String and List Representations
102
103<procedure>(uri->string URI [USERINFO]) => STRING</procedure>
104
105Reconstructs the given URI into a string.  The optional {{USERINFO}}
106argument is a procedure {{LAMBDA USERNAME PASSWORD -> STRING}} that maps
107the userinfo part of the URI to its string representation.
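
The following sketch passes two different userinfo procedures to
{{uri->string}}. The example URI is made up, and the exact output
strings are deliberately not shown. Since accessors return {{#f}} for
undefined components, a more robust procedure would probably also have
to cope with a missing password.

 (use uri-generic)
 (define u (uri-reference "http://alice:secret@example.com/"))
 ;; Include the password verbatim.
 (uri->string u (lambda (username password)
                  (string-append username ":" password)))
 ;; Show only the user name, dropping the password.
 (uri->string u (lambda (username password) username))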
108
109<procedure>(uri->list URI USERINFO) => LIST</procedure>
110
111Returns a list of the form {{(SCHEME SPECIFIC FRAGMENT)}};
112{{SPECIFIC}} is of the form {{(AUTHORITY PATH QUERY)}}.
113
114==== Reference Resolution
115
116<procedure>(uri-relative-to URI URI) => URI</procedure>
117
118Resolve the first URI as a reference relative to the second URI,
119returning a new URI (RFC 3986, Section 5.2.2).
120
121<procedure>(uri-relative-from URI URI) => URI</procedure>
122
123Constructs a new, possibly relative, URI which represents the location
124of the first URI with respect to the second URI.
125
126<examples>
127<example>
128<init>(use uri-generic)</init>
129<expr>(uri->string (uri-relative-to (uri-reference "../qux") (uri-reference "http://example.com/foo/bar/")))</expr>
130<result>"http://example.com/foo/qux"</result>
131</example>
132<example>
133<init>(use uri-generic)</init>
134<expr>(uri->string (uri-relative-from (uri-reference "http://example.com/foo/qux") (uri-reference "http://example.com/foo/bar/")))</expr>
135<result>"../qux"</result>
136</example>
137</examples>
138
139==== String encoding and decoding
140
141<procedure>(uri-encode-string STRING [CHAR-SET]) => STRING</procedure>
142
143Returns the percent-encoded form of the given string.  The optional
144char-set argument controls which characters should be encoded.
145It defaults to the complement of {{char-set:uri-unreserved}}. This is
146always safe, but often overly cautious; depending on the context,
147certain other characters may be left unencoded.
148
149<procedure>(uri-decode-string STRING [CHAR-SET]) => STRING</procedure>
150
151Returns the decoded form of the given string.  The optional char-set
152argument controls which characters should be decoded.  It defaults to
153{{char-set:full}}.
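
A brief sketch of both procedures follows, first with the default
character sets and then with a custom set built with SRFI-14's
{{char-set}}. The input strings are made up for illustration.

 (use uri-generic srfi-14)
 (uri-encode-string "a b&c")                     ; "a%20b%26c"
 (uri-encode-string "a b&c" (char-set #\space))  ; "a%20b&c": only spaces encoded
 (uri-decode-string "a%20b%26c")                 ; "a b&c"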
154
155
156==== Normalization
157
158<procedure>(uri-normalize-case URI) => URI</procedure>
159
160Performs URI case normalization (RFC 3986, Section 6.2.2.1).
161
162<procedure>(uri-normalize-path-segments URI) => URI</procedure>
163
164Performs URI path segment normalization (RFC 3986, Section 6.2.2.3).
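
As an illustrative sketch, both normalizations can be chained. RFC 3986
treats the scheme and host as case-insensitive and normalizes them to
lowercase, and its dot-segment removal turns {{/a/./b/../c}} into
{{/a/c}}. One would therefore expect a result along the lines of
{{http://example.com/a/c}}, but the printed output is not asserted here.

 (use uri-generic)
 (define u (uri-reference "HTTP://EXAMPLE.com/a/./b/../c"))
 ;; Apply both normalizations and print the resulting URI string.
 (define normalized (uri-normalize-path-segments (uri-normalize-case u)))
 (print (uri->string normalized))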
165
166
167==== Character sets
168
169As a convenience for sub-parsers or other special-purpose URI handling
170code, there are a couple of character sets exported by uri-generic.
171
172<constant>char-set:gen-delims</constant>
173
174Generic delimiters.
175  gen-delims  =  ":" / "/" / "?" / "#" / "[" / "]" / "@"
176
177<constant>char-set:sub-delims</constant>
178
179Sub-delimiters.
180  sub-delims  =  "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
181
182<constant>char-set:uri-reserved</constant>
183
184The union of {{gen-delims}} and {{sub-delims}}; all reserved URI characters.
185  reserved    =  gen-delims / sub-delims
186
187<constant>char-set:uri-unreserved</constant>
188
189All unreserved characters that are allowed in a URI.
190  unreserved  =  ALPHA / DIGIT / "-" / "." / "_" / "~"
191
192Note that this is '''not''' the complement of {{char-set:uri-reserved}}!
193There are several characters (even printable, non-control characters)
194which are not allowed at all in a URI.
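
The exported sets are ordinary SRFI-14 character sets, so they can be
queried and combined with the usual procedures. The sketch below checks
a few memberships that follow from the grammar above, then builds a
custom set for {{uri-encode-string}} that leaves {{/}} alone. The name
{{keep}} is arbitrary.

 (use uri-generic srfi-14)
 (char-set-contains? char-set:uri-unreserved #\~)      ; #t
 (char-set-contains? char-set:uri-reserved #\?)        ; #t
 (char-set-contains? char-set:uri-reserved #\space)    ; #f
 (char-set-contains? char-set:uri-unreserved #\space)  ; #f: in neither set
 ;; Encode everything except unreserved characters and "/".
 (define keep (char-set-adjoin char-set:uri-unreserved #\/))
 (uri-encode-string "a b/c" (char-set-complement keep)) ; "a%20b/c"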
195
196
197=== Requires
198
199* [[matchable]]
200* [[defstruct]]
201
202=== Version History
203
204* trunk Add new predicates for URIs, absolute URIs and relative references. Fix absolute-uri so it raises a condition when passing in a non-absolute uri string, instead of returning a string with the error. Also throw an error if a fragment is detected in the string.
205* 2.0 Export char-sets, add char-set arg to uri-encode/uri-decode,
206       do not decode query args as x-www-form-urlencoded, change path
207       representation.  Lots of bugfixes.
208* 1.12 Fix relative path normalization when original path ends in a slash, remove consecutive slashes from paths in URIs
209* 1.11 Added accessors for the authority components, functional update procedures. Fixed case-normalization.
210* 1.10 Fixed edge case in {{uri-relative-to}} with empty path in base uri,
211       fixed {{uri->string}} for URIs with query args, fixed {{uri->string}}
212       to not add an extraneous slash after authority in case of empty path.
213* 1.9 Fixed bug in uri-encode-string with reserved characters, added
214      tests for decoding and encoding [Peter Bex]
215* 1.8 Added uri-encode-string and uri-decode-string.
216      URI constructors now perform automatic normalization
217      of percent-encoded unreserved characters. [suggested by Peter Bex]
218* 1.6 Added error message about missing scheme in absolute-uri.
219* trunk Small bugfix in absolute-uri. [Peter Bex]
220* 1.5 Bug fixes in uri->string and absolute-uri. [reported by Peter Bex]
221* 1.3 Ported to Hygienic Chicken and the [[test]] egg [Peter Bex]
222* 1.2 Now using defstruct instead of define-record [suggested by Peter Bex]
223* 1.1 Added utf8 compatibility
224* 1.0 Initial Release
225
226=== License
227
228Based on the
229[[http://www.ninebynine.org/Software/ReadMe-URI-Haskell.txt|Haskell
230URI library]] by Graham Klyne <gk@ninebynine.org>.
231
232
233  Copyright 2008-2009 Ivan Raikov, Peter Bex.
234  All rights reserved.
235 
236  Redistribution and use in source and binary forms, with or without
237  modification, are permitted provided that the following conditions are
238  met:
239 
240  Redistributions of source code must retain the above copyright
241  notice, this list of conditions and the following disclaimer.
242 
243  Redistributions in binary form must reproduce the above copyright
244  notice, this list of conditions and the following disclaimer in the
245  documentation and/or other materials provided with the distribution.
246 
247  Neither the name of the author nor the names of its contributors may
248  be used to endorse or promote products derived from this software
249  without specific prior written permission.
250 
251  THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
252  "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
253  LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
254  FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
255  COPYRIGHT HOLDERS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
256  INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
257  (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
258  SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
259  HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
260  STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
261  ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
262  OF THE POSSIBILITY OF SUCH DAMAGE.