
Last change on this file since 13218 was 13218, checked in by sjamaan, 11 years ago

Merge uri-generic documentation changes from release 4

[[tags: eggs]]
[[toc:]]

== uri-generic

=== Description

The {{uri-generic}} library contains procedures for parsing and
manipulating Uniform Resource Identifiers
([[http://tools.ietf.org/html/rfc3986|RFC 3986]]). It aims to conform
closely to the RFC, and uses combinator parsing and character classes
rather than regular expressions.

This library is intended as a ''basis'' for scheme-specific URI parser
libraries. It only parses the generic components of a URI; a
scheme-specific library can parse these further into subcomponents.
For this reason, percent-encoding and decoding are not performed
automatically, apart from the unreserved characters described under
Constructors below. Anything else is left to the implementations of
specific URI schemes.

=== Library Procedures

==== Constructors

As specified in section 2.3 of RFC 3986, URI constructors
automatically decode percent-encoded octets that correspond to
unreserved characters. This means that the following holds:

 (equal? (uri-reference "http://example.com/foo-bar")
         (uri-reference "http://example.com/foo%2Dbar"))  => #t

<procedure>(uri-reference STRING) => URI</procedure>

A URI reference is either a URI or a relative reference (RFC 3986,
Section 4.1). If the prefix of the given string does not match the
syntax of a scheme followed by a colon separator, the string is parsed
as a relative reference.

<procedure>(absolute-uri STRING) => URI</procedure>

Parses the given string as an absolute URI, which may not contain a
fragment (RFC 3986, Section 4.3). If the string is not an absolute
URI, an error condition is signaled.
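A minimal sketch of the two constructors, assuming CHICKEN 4 with this egg installed (hence {{(use uri-generic)}}, as in the examples under Reference Resolution). The last part assumes a release that includes the trunk change from the version history, where {{absolute-uri}} signals a condition for non-absolute input; older releases returned an error string instead. {{handle-exceptions}} is CHICKEN's standard form for catching such a condition.

```scheme
;; Assumes CHICKEN 4 and the uri-generic egg.
(use uri-generic)

;; Starts with "scheme:", so it is parsed as a full URI.
(define full (uri-reference "http://example.com/a/b?x=1#top"))

;; No scheme prefix, so it is parsed as a relative reference.
(define rel (uri-reference "../c?y=2"))

(print (uri-scheme full))   ; http
(print (uri-scheme rel))    ; #f, a relative reference has no scheme

;; absolute-uri accepts an absolute URI...
(define base (absolute-uri "http://example.com/a/b?x=1"))
(print (uri? base))         ; #t

;; ...but (assuming the trunk fix) signals a condition for a relative
;; string; handle-exceptions turns that condition into 'rejected.
(define result
  (handle-exceptions exn
    'rejected
    (absolute-uri "../c")))
(print result)              ; rejected
```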

==== Predicates and Accessors

* <procedure>(uri-authority URI) => URI-AUTH</procedure>
* <procedure>(uri-scheme URI) => SYMBOL</procedure>
* <procedure>(uri-path URI) => LIST</procedure>
* <procedure>(uri-query URI) => STRING</procedure>
* <procedure>(uri-fragment URI) => STRING</procedure>
* <procedure>(uri-host URI) => STRING</procedure>
* <procedure>(uri-port URI) => INTEGER</procedure>
* <procedure>(uri-username URI) => STRING</procedure>
* <procedure>(uri-password URI) => STRING</procedure>
* <procedure>(authority? URI-AUTH) => BOOL</procedure>
* <procedure>(authority-host URI-AUTH) => STRING</procedure>
* <procedure>(authority-port URI-AUTH) => INTEGER</procedure>
* <procedure>(authority-username URI-AUTH) => STRING</procedure>
* <procedure>(authority-password URI-AUTH) => STRING</procedure>

If a component is not defined in the given URI, the corresponding
accessor returns {{#f}}.
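The accessors applied to a URI with every generic component present, again assuming CHICKEN 4 with the egg installed. The URI string is only an illustration; the comments give the values implied by the return types listed above, including {{#f}} for components that are absent.

```scheme
;; Assumes CHICKEN 4 and the uri-generic egg.
(use uri-generic)

;; Illustrative URI with userinfo, port, path, query and fragment.
(define u
  (uri-reference
   "http://alice:secret@example.com:8080/docs/index.html?lang=en#intro"))

(print (uri-scheme u))    ; http (a symbol)
(print (uri-host u))      ; example.com
(print (uri-port u))      ; 8080 (an integer)
(print (uri-username u))  ; alice
(print (uri-password u))  ; secret
(print (uri-query u))     ; lang=en
(print (uri-fragment u))  ; intro

;; The same host, read through the authority object.
(define auth (uri-authority u))
(print (authority? auth))      ; #t
(print (authority-host auth))  ; example.com

;; Components missing from the URI come back as #f.
(define bare (uri-reference "http://example.com/"))
(print (uri-port bare))      ; #f
(print (uri-query bare))     ; #f
(print (uri-fragment bare))  ; #f
```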

* <procedure>(update-uri URI #!key authority scheme path query fragment host port username password) => URI</procedure>
* <procedure>(update-authority URI-AUTH #!key host port username password) => URI-AUTH</procedure>

Return a copy of the URI or URI-AUTH object with the specified keys
updated. These procedures are functional: the original object is left
unchanged.
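A sketch of the functional update procedures under the same CHICKEN 4 assumption. One detail is not spelled out above and is assumed here: that the {{scheme:}} keyword takes a symbol, mirroring what {{uri-scheme}} returns. The original objects are printed afterwards to show that they are unchanged.

```scheme
;; Assumes CHICKEN 4 and the uri-generic egg.
(use uri-generic)

(define u (uri-reference "http://example.com/search?q=old"))

;; update-uri returns a new URI. The scheme is passed as a symbol
;; (an assumption, matching the type returned by uri-scheme).
(define u2 (update-uri u scheme: 'https port: 8443 query: "q=new"))

(print (uri-scheme u2) " " (uri-port u2) " " (uri-query u2))  ; https 8443 q=new
(print (uri-scheme u) " " (uri-port u) " " (uri-query u))     ; http #f q=old

;; update-authority works the same way on an authority object.
(define a2 (update-authority (uri-authority u) host: "example.org"))
(print (authority-host a2))                 ; example.org
(print (authority-host (uri-authority u)))  ; example.com
```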

* <procedure>(uri? OBJ) => BOOL</procedure>

Is the given object a URI object created by {{uri-generic}}?

* <procedure>(relative-ref? URI) => BOOL</procedure>

Is the given object a relative reference? RFC 3986 defines relative
references as URI references that are not URIs: they contain no URI
scheme, and they can be resolved against an absolute URI with
{{uri-relative-to}} to obtain a complete URI.

* <procedure>(absolute-uri? URI) => BOOL</procedure>

Is the given object an absolute URI? RFC 3986 defines an absolute URI
as a URI reference that is not relative and has no fragment. An
absolute URI can serve as the base URI against which a relative
reference is resolved with {{uri-relative-to}}.
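The three predicates applied to a few references, followed by the pattern the text above suggests: check that a base is absolute before resolving a relative reference against it. This assumes CHICKEN 4; the expected resolution result follows the merge rules of RFC 3986, Section 5.2.

```scheme
;; Assumes CHICKEN 4 and the uri-generic egg.
(use uri-generic)

(define base (uri-reference "http://example.com/a/b"))
(define rel  (uri-reference "c/d"))
(define frag (uri-reference "http://example.com/a#sec"))

(print (uri? base))           ; #t
(print (uri? "http://x/"))    ; #f, a plain string is not a URI object
(print (relative-ref? rel))   ; #t, no scheme
(print (relative-ref? base))  ; #f
(print (absolute-uri? base))  ; #t
(print (absolute-uri? frag))  ; #f, it has a fragment
(print (absolute-uri? rel))   ; #f, it is relative

;; Only resolve against a base that is an absolute URI.
(when (absolute-uri? base)
  (print (uri->string (uri-relative-to rel base))))  ; http://example.com/a/c/d
```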

==== String and List Representations

<procedure>(uri->string URI [USERINFO]) => STRING</procedure>

Reconstructs a string from the given URI. The optional USERINFO
argument is a procedure of the form {{LAMBDA USERNAME PASSWORD -> STRING}}
that determines how the userinfo part of the URI is rendered.

<procedure>(uri->list URI USERINFO) => LIST</procedure>

Returns a list of the form {{(SCHEME SPECIFIC FRAGMENT)}}, where
{{SPECIFIC}} is of the form {{(AUTHORITY PATH QUERY)}}.
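A sketch of {{uri->string}} called without the USERINFO argument, as the Reference Resolution examples also do, assuming CHICKEN 4. It also shows the constructor behaviour described under Constructors: the percent-encoded unreserved character {{%2D}} is decoded when the URI is built, so it does not reappear in the output.

```scheme
;; Assumes CHICKEN 4 and the uri-generic egg.
(use uri-generic)

;; A URI without userinfo survives a parse/print round trip.
(define s "http://example.com/a/b?x=1#frag")
(print (uri->string (uri-reference s)))  ; http://example.com/a/b?x=1#frag

;; %2D encodes "-", an unreserved character, so the constructor
;; decodes it and uri->string prints the plain character.
(define dashed (uri->string (uri-reference "http://example.com/foo%2Dbar")))
(print dashed)                           ; http://example.com/foo-bar
```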

==== Reference Resolution

<procedure>(uri-relative-to URI URI) => URI</procedure>

Resolves the first URI as a reference relative to the second URI,
returning a new URI (RFC 3986, Section 5.2.2).

<procedure>(uri-relative-from URI URI) => URI</procedure>

Constructs a new, possibly relative, URI which represents the location
of the first URI with respect to the second URI.

<examples>
<example>
<init>(use uri-generic)</init>
<expr>(uri->string (uri-relative-to (uri-reference "../qux") (uri-reference "http://example.com/foo/bar/")))</expr>
<result>"http://example.com/foo/qux"</result>
</example>
<example>
<init>(use uri-generic)</init>
<expr>(uri->string (uri-relative-from (uri-reference "http://example.com/foo/qux") (uri-reference "http://example.com/foo/bar/")))</expr>
<result>"../qux"</result>
</example>
</examples>
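RFC 3986, Section 5.4.1 lists "normal examples" of reference resolution against the base {{http://a/b/c/d;p?q}}. The sketch below checks a selection of them with {{uri-relative-to}}, assuming CHICKEN 4; the expected strings are copied from the RFC, and any mismatch is printed rather than hidden.

```scheme
;; Assumes CHICKEN 4 and the uri-generic egg.
(use uri-generic)

;; Base URI from RFC 3986, Section 5.4.
(define base (uri-reference "http://a/b/c/d;p?q"))

;; (reference . expected result) pairs taken from Section 5.4.1.
(define cases
  '(("g"       . "http://a/b/c/g")
    ("./g"     . "http://a/b/c/g")
    ("g/"      . "http://a/b/c/g/")
    ("/g"      . "http://a/g")
    ("?y"      . "http://a/b/c/d;p?y")
    ("#s"      . "http://a/b/c/d;p?q#s")
    ("../g"    . "http://a/b/g")
    ("../../g" . "http://a/g")))

;; Resolve each reference and compare with the RFC's expectation.
(for-each
 (lambda (c)
   (let ((got (uri->string (uri-relative-to (uri-reference (car c)) base))))
     (print (car c) " => " got
            (if (string=? got (cdr c)) "" "  (differs from RFC)"))))
 cases)
```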

==== String encoding and decoding

<procedure>(uri-encode-string STRING [CHAR-SET]) => STRING</procedure>

Returns the percent-encoded form of the given string. The optional
CHAR-SET argument specifies which characters are encoded; it defaults
to the complement of {{char-set:uri-unreserved}}. This default is
always safe but often overly cautious, since depending on the context
RFC 3986 allows certain characters to be left unencoded.

<procedure>(uri-decode-string STRING [CHAR-SET]) => STRING</procedure>

Returns the decoded form of the given string. The optional CHAR-SET
argument specifies which characters are decoded; it defaults to
{{char-set:full}}, so every percent-encoded octet is decoded.
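Encoding and decoding with and without the optional CHAR-SET argument, assuming CHICKEN 4. The custom sets are built with SRFI-14 ({{char-set}}, {{char-set-complement}}), which CHICKEN 4 provides as the {{srfi-14}} unit. The inputs avoid characters whose hexadecimal code contains letters, so the expected output does not depend on whether hex digits are emitted in upper or lower case.

```scheme
;; Assumes CHICKEN 4, the uri-generic egg and the srfi-14 unit.
(use uri-generic srfi-14)

;; Default: everything outside char-set:uri-unreserved is encoded.
(define enc (uri-encode-string "a b&c"))
(print enc)                       ; a%20b%26c
(print (uri-decode-string enc))   ; a b&c

;; Encode only the characters in the given set (here, just the space).
(define only-space (uri-encode-string "a b&c" (char-set #\space)))
(print only-space)                ; a%20b&c

;; Decode everything except "/", e.g. to keep an encoded slash inside
;; a single path segment. The "%2F" is left untouched.
(define keep-slash
  (uri-decode-string "a%20b%2Fc" (char-set-complement (char-set #\/))))
(print keep-slash)                ; a b%2Fc
```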
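
==== Normalization

<procedure>(uri-normalize-case URI) => URI</procedure>

URI case normalization (RFC 3986, Section 6.2.2.1). As defined there,
this lowercases the scheme and host and uppercases the hexadecimal
digits of percent-encoded octets.

<procedure>(uri-normalize-path-segments URI) => URI</procedure>

URI path segment normalization, which removes the dot segments {{.}}
and {{..}} from the path (RFC 3986, Section 6.2.2.3).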
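The two normalization steps applied one after the other, assuming CHICKEN 4. The expected values follow the RFC sections cited above: the scheme and host are lowercased, and the {{.}} and {{..}} segments are removed from the path.

```scheme
;; Assumes CHICKEN 4 and the uri-generic egg.
(use uri-generic)

(define u (uri-reference "HTTP://Example.COM/a/./b/../c"))

;; Case normalization (RFC 3986, Section 6.2.2.1).
(define u1 (uri-normalize-case u))
(print (uri-scheme u1) " " (uri-host u1))  ; http example.com

;; Path segment normalization (RFC 3986, Section 6.2.2.3):
;; "/a/./b/../c" becomes "/a/c".
(define u2 (uri-normalize-path-segments u1))
(print (uri->string u2))                   ; http://example.com/a/c
```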

==== Character sets

As a convenience for sub-parsers and other special-purpose URI handling
code, {{uri-generic}} exports the following character sets.

<constant>char-set:gen-delims</constant>

Generic delimiters.
  gen-delims  =  ":" / "/" / "?" / "#" / "[" / "]" / "@"

<constant>char-set:sub-delims</constant>

Sub-delimiters.
  sub-delims  =  "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="

<constant>char-set:uri-reserved</constant>

The union of {{gen-delims}} and {{sub-delims}}; all reserved URI characters.
  reserved    =  gen-delims / sub-delims

<constant>char-set:uri-unreserved</constant>

All unreserved characters that are allowed in a URI.
  unreserved  =  ALPHA / DIGIT / "-" / "." / "_" / "~"

Note that this is ''not'' the complement of {{char-set:uri-reserved}}!
Several characters, including printable non-control characters such as
{{<}}, {{>}} and {{"}}, are not allowed in a URI at all.
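Some ways a sub-parser might use the exported sets, assuming CHICKEN 4 with the {{srfi-13}} and {{srfi-14}} units. {{unreserved-only?}} is a hypothetical helper, not part of the egg; SRFI-13's {{string-every}} accepts a char-set as its predicate. The membership checks illustrate the note above: {{<}} belongs to neither set.

```scheme
;; Assumes CHICKEN 4, the uri-generic egg and the srfi-13/srfi-14 units.
(use uri-generic srfi-13 srfi-14)

;; Hypothetical helper (not part of uri-generic): #t when every
;; character of s is unreserved, i.e. needs no percent-encoding.
(define (unreserved-only? s)
  (string-every char-set:uri-unreserved s))

(print (unreserved-only? "file_name-1.txt"))  ; #t
(print (unreserved-only? "a b"))              ; #f

;; "<" is neither reserved nor unreserved: it may not appear in a URI.
(print (char-set-contains? char-set:uri-reserved #\<))    ; #f
(print (char-set-contains? char-set:uri-unreserved #\<))  ; #f

;; Percent-encode only the generic delimiters ("@" and "#" here).
(define gen-only (uri-encode-string "user@host#1" char-set:gen-delims))
(print gen-only)                              ; user%40host%231
```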

=== Requires

* [[matchable]]
* [[defstruct]]

=== Version History

* trunk Add new predicates for absoluteness/relativeness. Fix absolute-uri so it raises a condition when passing in a non-absolute uri string, instead of returning a string with the error.
* 2.0 Export char-sets, add char-set arg to uri-encode/uri-decode,
       do not decode query args as x-www-form-urlencoded, change path
       representation.  Lots of bugfixes.
* 1.12 Fix relative path normalization when original path ends in a slash, remove consecutive slashes from paths in URIs
* 1.11 Added accessors for the authority components, functional update procedures. Fixed case-normalization.
* 1.10 Fixed edge case in {{uri-relative-to}} with empty path in base uri,
       fixed {{uri->string}} for URIs with query args, fixed {{uri->string}}
       to not add an extraneous slash after authority in case of empty path.
* 1.9 Fixed bug in uri-encode-string with reserved characters, added
      tests for decoding and encoding [Peter Bex]
* 1.8 Added uri-encode-string and uri-decode-string.
      URI constructors now perform automatic normalization
      of percent-encoded unreserved characters. [suggested by Peter Bex]
* 1.6 Added error message about missing scheme in absolute-uri.
* trunk Small bugfix in absolute-uri. [Peter Bex]
* 1.5 Bug fixes in uri->string and absolute-uri. [reported by Peter Bex]
* 1.3 Ported to Hygienic Chicken and the [[test]] egg [Peter Bex]
* 1.2 Now using defstruct instead of define-record [suggested by Peter Bex]
* 1.1 Added utf8 compatibility
* 1.0 Initial Release

=== License

Based on the
[[http://www.ninebynine.org/Software/ReadMe-URI-Haskell.txt|Haskell
URI library]] by Graham Klyne <gk@ninebynine.org>.

  Copyright 2008-2009 Ivan Raikov, Peter Bex.
  All rights reserved.
 
  Redistribution and use in source and binary forms, with or without
  modification, are permitted provided that the following conditions are
  met:
 
  Redistributions of source code must retain the above copyright
  notice, this list of conditions and the following disclaimer.
 
  Redistributions in binary form must reproduce the above copyright
  notice, this list of conditions and the following disclaimer in the
  documentation and/or other materials provided with the distribution.
 
  Neither the name of the author nor the names of its contributors may
  be used to endorse or promote products derived from this software
  without specific prior written permission.
 
  THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
  "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
  LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
  FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
  COPYRIGHT HOLDERS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
  INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
  (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
  SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
  HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
  STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
  ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
  OF THE POSSIBILITY OF SUCH DAMAGE.