Last change on this file since 36592 was 36592, checked in by sjamaan, 3 months ago

uri-generic: Link to uri-common so people know not to use this directly.

[[tags: eggs]]
[[toc:]]

== uri-generic

=== Description

The {{uri-generic}} library contains procedures for parsing and
manipulating Uniform Resource Identifiers
([[http://tools.ietf.org/html/rfc3986|RFC 3986]]).  It is intended to
conform closely to the RFC, and uses combinator parsing and
character classes rather than regular expressions.

This library should be considered a ''basis'' for creating
scheme-specific URI parser libraries.  It only parses the generic
components of a URI; a scheme-specific library can then parse the
subcomponents further.  For this reason, percent-encoded characters
are not encoded or decoded automatically (apart from the
normalization of unreserved characters performed by the constructors);
this should be handled by scheme-specific URI implementations.

For a more practical library which deals with "common" URI schemes
like {{http}}, {{ftp}}, {{file}} and such, see the [[uri-common]] egg,
which is such a specific implementation.

=== Library Procedures

==== Constructors and predicates

As specified in Section 2.3 of RFC 3986, URI constructors
automatically decode percent-encoded octets in the range of unreserved
characters.  This means that the following holds true:

 (equal? (uri-reference "http://example.com/foo-bar")
         (uri-reference "http://example.com/foo%2Dbar"))  => #t

<procedure>(uri-reference STRING) => URI</procedure>

A URI reference is either a URI or a relative reference (RFC 3986,
Section 4.1).  If the given string's prefix does not match the syntax
of a scheme followed by a colon separator, then the given string is
parsed as a relative reference.  If {{STRING}} is neither a URI nor a
relative reference, {{uri-reference}} returns {{#f}}.

<procedure>(uri-reference? URI) => BOOL</procedure>

Is the given object a URI reference?  '''All objects created by
uri-generic constructors are URI references'''; they are either URIs
or relative references.  The constructors below are just stricter
versions of {{uri-reference}}.  They all create URI references.

<procedure>(absolute-uri STRING) => URI</procedure>

Parses the given string as an absolute URI, in which no fragments are
allowed.  If no URI scheme is found, or a fragment is detected, this
raises an error.

Absolute URIs are defined by RFC 3986 as non-relative URI references
without a fragment (RFC 3986, Section 4.3).  Absolute URIs can be used
as a base URI to resolve a relative-ref against, using
{{uri-relative-to}} (see below).
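
As a rough illustration of the error behaviour described above, the
following sketch wraps {{absolute-uri}} so that a failed parse yields
{{#f}} instead of an error.  The helper name {{try-absolute-uri}} is
made up for this example and is not part of the egg.

<enscript highlight="scheme">
(use uri-generic)

;; Hypothetical helper: return the parsed absolute URI, or #f if
;; absolute-uri signals an error.
(define (try-absolute-uri str)
  (handle-exceptions exn #f (absolute-uri str)))

(absolute-uri? (try-absolute-uri "http://example.com/foo"))  ; => #t
(try-absolute-uri "http://example.com/foo#frag")             ; => #f (fragment present)
(try-absolute-uri "/foo/bar")                                ; => #f (no scheme)
</enscript>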

<procedure>(make-uri #!key authority scheme path query fragment host port username password) => URI</procedure>

Constructs a URI from the given components.

<procedure>(absolute-uri? URI) => BOOL</procedure>

Is the given object an absolute URI?

<procedure>(uri? URI) => BOOL</procedure>

Is the given object a URI?  URIs are all URI references that include
a scheme part.  The other kind of URI reference is the relative
reference.

<procedure>(relative-ref? URI) => BOOL</procedure>

Is the given object a relative reference?  Relative references are
defined by RFC 3986 as URI references which are not URIs; they contain
no URI scheme and can be resolved against an absolute URI to obtain
a complete URI using {{uri-relative-to}}.
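
A small sketch of how these predicates classify parsed references.
The example URIs are arbitrary, and the results in the comments follow
from the definitions given above.

<enscript highlight="scheme">
(use uri-generic)

(define full (uri-reference "http://example.com/foo#frag"))
(define rel  (uri-reference "../foo"))

(uri-reference? full)  ; => #t
(uri? full)            ; => #t, it has a scheme
(absolute-uri? full)   ; => #f, it has a fragment
(relative-ref? full)   ; => #f

(uri-reference? rel)   ; => #t
(uri? rel)             ; => #f, it has no scheme
(relative-ref? rel)    ; => #t
</enscript>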

<procedure>(uri-path-absolute? URI) => BOOL</procedure>

Is the {{URI}}'s path component an absolute path?

<procedure>(uri-path-relative? URI) => BOOL</procedure>

Is the {{URI}}'s path component a relative path?
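
Note that these two predicates look only at the path component, not at
whether the reference as a whole is relative.  The following sketch
(with arbitrary example strings) shows the difference:

<enscript highlight="scheme">
(use uri-generic)

;; A relative reference can still have an absolute path.
(define r (uri-reference "/foo/bar"))
(relative-ref? r)       ; => #t
(uri-path-absolute? r)  ; => #t

(uri-path-relative? (uri-reference "foo/bar"))                 ; => #t
(uri-path-absolute? (uri-reference "http://example.com/foo"))  ; => #t
(uri-path-relative? (uri-reference "http://example.com/foo"))  ; => #f
</enscript>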

<procedure>(uri-ipv6-host? URI) => BOOL</procedure>

Is the {{URI}}'s host component an IPv6 literal address?

<procedure>(authority-ipv6-host? URI-AUTH) => BOOL</procedure>

Is the {{URI-AUTH}}'s host component an IPv6 literal address?  You can
get a {{URI-AUTH}} object from a URI using the {{uri-authority}}
attribute accessor, see below.

==== Attribute accessors

<procedure>(uri-authority URI) => URI-AUTH</procedure><br>
<procedure>(uri-scheme URI) => SYMBOL</procedure><br>
<procedure>(uri-path URI) => LIST</procedure><br>
<procedure>(uri-query URI) => STRING</procedure><br>
<procedure>(uri-fragment URI) => STRING</procedure><br>
<procedure>(uri-host URI) => STRING</procedure><br>
<procedure>(uri-port URI) => INTEGER</procedure><br>
<procedure>(uri-username URI) => STRING</procedure><br>
<procedure>(uri-password URI) => STRING</procedure><br>
<procedure>(authority? URI-AUTH) => BOOL</procedure><br>
<procedure>(authority-host URI-AUTH) => STRING</procedure><br>
<procedure>(authority-port URI-AUTH) => INTEGER</procedure><br>
<procedure>(authority-username URI-AUTH) => STRING</procedure><br>
<procedure>(authority-password URI-AUTH) => STRING</procedure><br>

If a component is not defined in the given URI, then the corresponding
accessor returns {{#f}}, except for {{uri-path}}, which will always return
a (possibly empty) list.
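
The accessors can be tried out on a URI that has every component set.
This is only a sketch with a made-up URI.  The exact shape of the list
returned by {{uri-path}} is not spelled out here, so no result is shown
for it.

<enscript highlight="scheme">
(use uri-generic)

(define u (uri-reference "http://user:secret@example.com:8080/a/b?q=1#top"))

(uri-scheme u)    ; => http (a symbol)
(uri-username u)  ; => "user"
(uri-password u)  ; => "secret"
(uri-host u)      ; => "example.com"
(uri-port u)      ; => 8080
(uri-query u)     ; => "q=1"
(uri-fragment u)  ; => "top"
(uri-path u)      ; a list representing the path /a/b

;; The authority-* accessors work on the URI-AUTH object.
(authority-host (uri-authority u))  ; => "example.com"

;; Undefined components yield #f.
(uri-fragment (uri-reference "http://example.com/"))  ; => #f
</enscript>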

<procedure>(update-uri URI #!key authority scheme path query fragment host port username password) => URI</procedure><br>
<procedure>(update-authority URI-AUTH #!key host port username password) => URI-AUTH</procedure><br>

Update the specified keys in the URI or URI-AUTH object in a
functional way (i.e., it creates a new copy with the modifications).
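
A minimal sketch of a functional update, using an arbitrary example
URI: the original object is left untouched and a modified copy is
returned.

<enscript highlight="scheme">
(use uri-generic)

(define orig    (uri-reference "http://example.com/foo"))
(define updated (update-uri orig scheme: 'https fragment: "top"))

(uri->string updated)  ; => "https://example.com/foo#top"
(uri->string orig)     ; => "http://example.com/foo" (unchanged)
</enscript>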

==== String and List Representations

<procedure>(uri->string URI [USERINFO]) => STRING</procedure>

Reconstructs the given URI into a string.  The optional {{USERINFO}}
argument is a procedure {{LAMBDA USERNAME PASSWORD -> STRING}} that
maps the userinfo part of the URI to a string.  If it is not given,
the userinfo is represented as the username followed by {{":******"}}.
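
The effect of the {{USERINFO}} argument can be sketched as follows.
The URI and the mapping procedure are examples only; the second call
deliberately exposes the password, which is usually not what you want
in logs.

<enscript highlight="scheme">
(use uri-generic)

(define u (uri-reference "http://user:secret@example.com/"))

;; Default: the password is masked.
(uri->string u)
;; => "http://user:******@example.com/"

;; Custom mapping that includes the password verbatim.
(uri->string u (lambda (username password)
                 (string-append username ":" password)))
;; => "http://user:secret@example.com/"
</enscript>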

<procedure>(uri->list URI USERINFO) => LIST</procedure>

Returns a list of the form {{(SCHEME SPECIFIC FRAGMENT)}};
{{SPECIFIC}} is of the form {{(AUTHORITY PATH QUERY)}}.

==== Reference Resolution

<procedure>(uri-relative-to URI URI) => URI</procedure>

Resolve the first URI as a reference relative to the second URI,
returning a new URI (RFC 3986, Section 5.2.2).

<procedure>(uri-relative-from URI URI) => URI</procedure>

Constructs a new, possibly relative, URI which represents the location
of the first URI with respect to the second URI.

<enscript highlight="scheme">
(use uri-generic)

;; Resolve a relative reference against a base URI.
(uri->string (uri-relative-to (uri-reference "../qux") (uri-reference "http://example.com/foo/bar/")))
;; => "http://example.com/foo/qux"

;; The inverse: express the first URI relative to the second.
(uri->string (uri-relative-from (uri-reference "http://example.com/foo/qux") (uri-reference "http://example.com/foo/bar/")))
;; => "../qux"
</enscript>

==== String encoding and decoding

<procedure>(uri-encode-string STRING [CHAR-SET]) => STRING</procedure>

Returns the percent-encoded form of the given string.  The optional
char-set argument controls which characters should be encoded.
It defaults to the complement of {{char-set:uri-unreserved}}.  This is
always safe, but often overly cautious; depending on the context,
certain characters may be left unencoded.

<procedure>(uri-decode-string STRING [CHAR-SET]) => STRING</procedure>

Returns the decoded form of the given string.  The optional char-set
argument controls which characters should be decoded.  It defaults to
{{char-set:full}}.
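
A brief sketch of both procedures with arbitrary input strings.  The
last call passes a custom SRFI 14 char-set so that only spaces are
encoded.

<enscript highlight="scheme">
(use uri-generic srfi-14)

;; By default, everything outside char-set:uri-unreserved is encoded.
(uri-encode-string "a&b c")      ; => "a%26b%20c"
(uri-decode-string "a%26b%20c")  ; => "a&b c"

;; Only encode spaces, leaving "&" alone.
(uri-encode-string "a&b c" (char-set #\space))  ; => "a&b%20c"
</enscript>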


==== Normalization

<procedure>(uri-normalize-case URI) => URI</procedure>

Returns a URI with case normalization applied (RFC 3986, Section
6.2.2.1).

<procedure>(uri-normalize-path-segments URI) => URI</procedure>

Returns a URI with path segment normalization applied (RFC 3986,
Section 6.2.2.3).
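
The following sketch shows what these normalizations should produce
according to the RFC sections cited above.  The URIs are arbitrary,
and the results in the comments are what the RFC rules imply rather
than verified output from a particular version of this egg.

<enscript highlight="scheme">
(use uri-generic)

;; Path segment normalization removes "." and ".." segments.
(uri->string
 (uri-normalize-path-segments
  (uri-reference "http://example.com/a/b/../c/./d")))
;; => "http://example.com/a/c/d"

;; Case normalization lowercases the scheme.
(uri-scheme (uri-normalize-case (uri-reference "HTTP://example.com/")))
;; => http
</enscript>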


==== Character sets

As a convenience for sub-parsers or other special-purpose URI handling
code, {{uri-generic}} exports several character sets.

<constant>char-set:gen-delims</constant>

Generic delimiters.
  gen-delims  =  ":" / "/" / "?" / "#" / "[" / "]" / "@"

<constant>char-set:sub-delims</constant>

Sub-delimiters.
  sub-delims  =  "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="

<constant>char-set:uri-reserved</constant>

The union of {{gen-delims}} and {{sub-delims}}; all reserved URI characters.
  reserved    =  gen-delims / sub-delims

<constant>char-set:uri-unreserved</constant>

All unreserved characters that are allowed in a URI.
  unreserved  =  ALPHA / DIGIT / "-" / "." / "_" / "~"

Note that this is '''not''' the complement of {{char-set:uri-reserved}}!
There are several characters (even printable, noncontrol characters)
which are not allowed at all in a URI.
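
These are ordinary SRFI 14 character sets, so they can be queried with
procedures such as {{char-set-contains?}}.  The sketch below also
illustrates the point about complements, using the space character as
an example of a character that is in neither set.

<enscript highlight="scheme">
(use uri-generic srfi-14)

(char-set-contains? char-set:uri-unreserved #\~)  ; => #t
(char-set-contains? char-set:gen-delims #\/)      ; => #t
(char-set-contains? char-set:uri-reserved #\/)    ; => #t

;; A space is neither reserved nor unreserved.
(char-set-contains? char-set:uri-reserved #\space)    ; => #f
(char-set-contains? char-set:uri-unreserved #\space)  ; => #f
</enscript>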


=== Requires

* [[matchable]]

=== Version History

* 2.45 Don't break hard when setting host to {{#f}} using {{update-uri}} (reported via IRC by "off_world").
* 2.44 Fixed parsing of IPv6, which was completely broken (#1530, thanks to Vasilij Schneidermann).
* 2.43 Fixed handling of UTF-8 characters when percent-encoding/decoding (thanks to Adrien Ramos).
* 2.42 Improved performance.
* 2.41 Make code more portable by avoiding keyword arguments (thanks to Seth Alves).
* 2.39 Get rid of a compiler warning due to broken ipv4 address handling (thanks to Mario Goulart).
* 2.38 Fixed a bug that caused an error to be thrown when host contained percent-encoded characters (thanks to Roel van der Hoorn).
* 2.37 Fixed bug in make-uri when passed no path, added basic tests for make-uri.
* 2.36 Added procedure make-uri
* 2.35 Added some extra checks so we do not try to parse URIs containing invalid (non-hexnum) percent-encoding.  Add code to preserve empty path segments during parsing and when performing relative reference resolution.
* 2.34 Fix two bugs that show up in very rare cases (possibly never in practice). One caused issues when creating relative paths from two URIs where one URI had a path that was a prefix of the other, the other caused issues when a relative URI's path containing ".." as last component was resolved.
* 2.33 Path component for empty absolute path directly followed by query is now represented the same as empty path without query.
* 2.32 Empty absolute path directly followed by query is now properly recognised as a URI reference.
* 2.31 Return {{#f}} in constructors if unconsumed input remains after parsing
* 2.3 Add predicates uri-path-relative? and uri-path-absolute?
* 2.2 Improvements to uri->string.
* 2.1 Add new predicates for URIs, absolute URIs and relative references. Fix absolute-uri so it raises a condition when passing in a non-absolute URI string, instead of returning a string with the error. Also throw an error if a fragment is detected in the string.
* 2.0 Export char-sets, add char-set arg to uri-encode/uri-decode,
       do not decode query args as x-www-form-urlencoded, change path
       representation.  Lots of bugfixes.
* 1.12 Fix relative path normalization when original path ends in a slash, remove consecutive slashes from paths in URIs
* 1.11 Added accessors for the authority components, functional update procedures. Fixed case-normalization.
* 1.10 Fixed edge case in {{uri-relative-to}} with empty path in base URI,
       fixed {{uri->string}} for URIs with query args, fixed {{uri->string}}
       to not add an extraneous slash after authority in case of empty path.
* 1.9 Fixed bug in uri-encode-string with reserved characters, added
      tests for decoding and encoding [Peter Bex]
* 1.8 Added uri-encode-string and uri-decode-string.
      URI constructors now perform automatic normalization
      of percent-encoded unreserved characters. [suggested by Peter Bex]
* 1.6 Added error message about missing scheme in absolute-uri.
* trunk Small bugfix in absolute-uri. [Peter Bex]
* 1.5 Bug fixes in uri->string and absolute-uri. [reported by Peter Bex]
* 1.3 Ported to Hygienic Chicken and the [[test]] egg [Peter Bex]
* 1.2 Now using defstruct instead of define-record [suggested by Peter Bex]
* 1.1 Added utf8 compatibility
* 1.0 Initial Release

=== License

Based on the
[[http://www.ninebynine.org/Software/ReadMe-URI-Haskell.txt|Haskell
URI library]] by Graham Klyne <gk@ninebynine.org>.

  Copyright 2008-2018 Ivan Raikov, Peter Bex.
  All rights reserved.
 
  Redistribution and use in source and binary forms, with or without
  modification, are permitted provided that the following conditions are
  met:
 
  Redistributions of source code must retain the above copyright
  notice, this list of conditions and the following disclaimer.
 
  Redistributions in binary form must reproduce the above copyright
  notice, this list of conditions and the following disclaimer in the
  documentation and/or other materials provided with the distribution.
 
  Neither the name of the author nor the names of its contributors may
  be used to endorse or promote products derived from this software
  without specific prior written permission.
 
  THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
  "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
  LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
  FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
  COPYRIGHT HOLDERS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
  INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
  (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
  SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
  HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
  STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
  ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
  OF THE POSSIBILITY OF SUCH DAMAGE.