source: project/wiki/eggref/4/uri-generic @ 33267

Last change on this file since 33267 was 33267, checked in by sjamaan, 4 years ago

Update uri-generic changelog to mention 2.43

[[tags: eggs]]
[[toc:]]

== uri-generic

=== Description

The {{uri-generic}} library contains procedures for parsing and
manipulating Uniform Resource Identifiers
([[http://tools.ietf.org/html/rfc3986|RFC 3986]]). It is intended to
conform closely to the RFC, and uses combinator parsing and
character classes rather than regular expressions.

This library should be considered a ''basis'' for creating
scheme-specific URI parser libraries. It parses only
the generic components of a URI.  A scheme-specific library can
further parse the subcomponents. For this reason, percent-encoded
characters are not encoded or decoded automatically (apart from the
normalization of unreserved characters described under the constructors).
This should be handled by specific URI scheme implementations.

=== Library Procedures

==== Constructors and predicates

As specified in section 2.3 of RFC 3986, URI constructors
automatically decode percent-encoded octets in the range of unreserved
characters. This means that the following holds true:

 (equal? (uri-reference "http://example.com/foo-bar")
         (uri-reference "http://example.com/foo%2Dbar"))  => #t

<procedure>(uri-reference STRING) => URI</procedure>

A URI reference is either a URI or a relative reference (RFC 3986,
Section 4.1).  If the given string's prefix does not match the syntax
of a scheme followed by a colon separator, then the given string is
parsed as a relative reference. If STRING is neither a URI nor a
relative reference, {{uri-reference}} returns {{#f}}.
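
Because {{uri-reference}} returns {{#f}} instead of raising an error, callers usually want to check the result. The following is a minimal sketch; the helper {{parse-uri-reference}} is hypothetical and not part of uri-generic.

<enscript highlight="scheme">
(use uri-generic)

;; Hypothetical helper (not part of uri-generic): parse STR, or signal
;; an error instead of silently passing #f along.
(define (parse-uri-reference str)
  (or (uri-reference str)
      (error "not a valid URI reference" str)))

(define full (parse-uri-reference "http://example.com/foo?q=1"))
(define rel  (parse-uri-reference "../foo"))

(uri-scheme full) ; => http
(uri-scheme rel)  ; => #f, no scheme, so this is a relative reference
</enscript>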

<procedure>(uri-reference? URI) => BOOL</procedure>

Is the given object a URI reference?  '''All objects created by
uri-generic constructors are URI references'''; they are either URIs
or relative references.  The constructors below are just stricter
versions of {{uri-reference}}.  They all create
URI references.

<procedure>(absolute-uri STRING) => URI</procedure>

Parses the given string as an absolute URI, in which no fragments are
allowed.  If no URI scheme is found, or a fragment is detected, this
raises an error.

Absolute URIs are defined by RFC 3986 as non-relative URI references
without a fragment (RFC 3986, Section 4.3).  Absolute URIs can be used
as a base URI to resolve a relative-ref against, using
{{uri-relative-to}} (see below).
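
Since {{absolute-uri}} raises an error rather than returning {{#f}}, a caller that prefers a false value can wrap it. This sketch assumes the error is an ordinary {{exn}} condition; the helper {{maybe-absolute-uri}} is hypothetical.

<enscript highlight="scheme">
(use uri-generic)

;; Hypothetical helper: return an absolute URI, or #f when STR has no
;; scheme or carries a fragment (absolute-uri raises an error then).
(define (maybe-absolute-uri str)
  (condition-case (absolute-uri str)
    ((exn) #f)))

(absolute-uri? (maybe-absolute-uri "http://example.com/foo")) ; => #t
(maybe-absolute-uri "http://example.com/foo#bar")             ; => #f
(maybe-absolute-uri "/foo")                                   ; => #f
</enscript>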

<procedure>(make-uri #!key authority scheme path query fragment host port username password) => URI</procedure>

Constructs a URI from the given components.
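
A small sketch of building a URI from components. It assumes the keyword values use the same types the accessors return (a symbol for the scheme, an integer for the port).

<enscript highlight="scheme">
(use uri-generic)

(define u (make-uri scheme: 'http host: "example.com" port: 8080))

(uri-scheme u) ; => http
(uri-host u)   ; => "example.com"
(uri-port u)   ; => 8080
</enscript>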

<procedure>(absolute-uri? URI) => BOOL</procedure>

Is the given object an absolute URI?

<procedure>(uri? URI) => BOOL</procedure>

Is the given object a URI?  URIs are all URI references that include
a scheme part.  The other kind of URI reference is the relative
reference.

<procedure>(relative-ref? URI) => BOOL</procedure>

Is the given object a relative reference?  Relative references are
defined by RFC 3986 as URI references which are not URIs; they contain
no URI scheme and can be resolved against an absolute URI to obtain
a complete URI using {{uri-relative-to}}.

<procedure>(uri-path-absolute? URI) => BOOL</procedure>

Is the {{URI}}'s path component an absolute path?

<procedure>(uri-path-relative? URI) => BOOL</procedure>

Is the {{URI}}'s path component a relative path?
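
The predicates can be illustrated with two references, one carrying a scheme and a fragment, the other carrying neither. This is only a sketch of how the definitions above play out.

<enscript highlight="scheme">
(use uri-generic)

(define with-frag (uri-reference "http://example.com/foo#bar"))
(define rel       (uri-reference "../foo"))

(uri? with-frag)               ; => #t, it has a scheme
(absolute-uri? with-frag)      ; => #f, absolute URIs carry no fragment
(relative-ref? rel)            ; => #t, no scheme
(uri-path-relative? rel)       ; => #t
(uri-path-absolute? with-frag) ; => #t, the path "/foo" starts at the root
</enscript>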

==== Attribute accessors

<procedure>(uri-authority URI) => URI-AUTH</procedure><br>
<procedure>(uri-scheme URI) => SYMBOL</procedure><br>
<procedure>(uri-path URI) => LIST</procedure><br>
<procedure>(uri-query URI) => STRING</procedure><br>
<procedure>(uri-fragment URI) => STRING</procedure><br>
<procedure>(uri-host URI) => STRING</procedure><br>
<procedure>(uri-port URI) => INTEGER</procedure><br>
<procedure>(uri-username URI) => STRING</procedure><br>
<procedure>(uri-password URI) => STRING</procedure><br>
<procedure>(authority? URI-AUTH) => BOOL</procedure><br>
<procedure>(authority-host URI-AUTH) => STRING</procedure><br>
<procedure>(authority-port URI-AUTH) => INTEGER</procedure><br>
<procedure>(authority-username URI-AUTH) => STRING</procedure><br>
<procedure>(authority-password URI-AUTH) => STRING</procedure><br>

If a component is not defined in the given URI, then the corresponding
accessor returns {{#f}}, except for {{uri-path}}, which will always return
a (possibly empty) list.
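
As an illustration of the accessors, the sketch below takes one URI apart. The exact shape of the path list is not shown, since only its being a list is stated above.

<enscript highlight="scheme">
(use uri-generic)

(define u (uri-reference "http://example.com:8080/foo/bar?q=1#top"))

(uri-scheme u)       ; => http
(uri-host u)         ; => "example.com"
(uri-port u)         ; => 8080
(list? (uri-path u)) ; => #t

;; Components missing from the string come back as #f.
(define bare (uri-reference "http://example.com/"))
(uri-port bare)  ; => #f
(uri-query bare) ; => #f

;; The authority accessors read the same data from the authority object.
(authority-host (uri-authority u)) ; => "example.com"
</enscript>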

<procedure>(update-uri URI #!key authority scheme path query fragment host port username password) => URI</procedure><br>
<procedure>(update-authority URI-AUTH #!key host port username password) => URI-AUTH</procedure><br>

Update the specified keys in the URI or URI-AUTH object in a
functional way (i.e., it creates a new copy with the modifications).
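
A short sketch of a functional update: the original object is left untouched and a modified copy is returned.

<enscript highlight="scheme">
(use uri-generic)

(define original (uri-reference "http://example.com/foo"))
(define secure   (update-uri original scheme: 'https port: 8443))

(uri->string secure)   ; => "https://example.com:8443/foo"
(uri->string original) ; => "http://example.com/foo", left untouched
</enscript>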

==== String and List Representations

<procedure>(uri->string URI [USERINFO]) => STRING</procedure>

Reconstructs the given URI into a string.  The optional {{USERINFO}}
argument is a function {{LAMBDA USERNAME PASSWORD -> STRING}} used to
render the userinfo part of the URI.  If it is not given, the userinfo
is represented as the username followed by {{":******"}}.
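
The effect of the userinfo argument can be sketched as follows; the two lambdas are merely examples of possible mappings.

<enscript highlight="scheme">
(use uri-generic)

(define u (uri-reference "http://alice:secret@example.com/"))

;; Default: the password is masked.
(uri->string u) ; => "http://alice:******@example.com/"

;; Show the password in full.
(uri->string u (lambda (user pass) (string-append user ":" pass)))
;; => "http://alice:secret@example.com/"

;; Drop the password entirely.
(uri->string u (lambda (user pass) user))
;; => "http://alice@example.com/"
</enscript>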

<procedure>(uri->list URI USERINFO) => LIST</procedure>

Returns a list of the form {{(SCHEME SPECIFIC FRAGMENT)}};
{{SPECIFIC}} is of the form {{(AUTHORITY PATH QUERY)}}.

==== Reference Resolution

<procedure>(uri-relative-to URI URI) => URI</procedure>

Resolve the first URI as a reference relative to the second URI,
returning a new URI (RFC 3986, Section 5.2.2).

<procedure>(uri-relative-from URI URI) => URI</procedure>

Constructs a new, possibly relative, URI which represents the location
of the first URI with respect to the second URI.

<enscript highlight="scheme">
(use uri-generic)

;; Resolve the reference "../qux" against a base URI.
(uri->string
 (uri-relative-to (uri-reference "../qux")
                  (uri-reference "http://example.com/foo/bar/")))
;; => "http://example.com/foo/qux"

;; Express the first URI relative to the second.
(uri->string
 (uri-relative-from (uri-reference "http://example.com/foo/qux")
                    (uri-reference "http://example.com/foo/bar/")))
;; => "../qux"
</enscript>

==== String encoding and decoding

<procedure>(uri-encode-string STRING [CHAR-SET]) => STRING</procedure>

Returns the percent-encoded form of the given string.  The optional
char-set argument controls which characters should be encoded.
It defaults to the complement of {{char-set:uri-unreserved}}. This
default is always safe, but often stricter than necessary: depending
on the context, certain characters may be left unencoded.

<procedure>(uri-decode-string STRING [CHAR-SET]) => STRING</procedure>

Returns the decoded form of the given string.  The optional char-set
argument controls which characters should be decoded.  It defaults to
{{char-set:full}}.
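
A minimal sketch of both procedures. The explicit char-set in the last call is only an example, built with {{char-set}} from srfi-14.

<enscript highlight="scheme">
(use uri-generic srfi-14)

(uri-encode-string "hello world")   ; => "hello%20world"
(uri-decode-string "hello%20world") ; => "hello world"

;; Encode only spaces, leaving "/" alone (useful inside a path).
(uri-encode-string "a/b c" (char-set #\space)) ; => "a/b%20c"
</enscript>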
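

==== Normalization

<procedure>(uri-normalize-case URI) => URI</procedure>

Performs URI case normalization (RFC 3986, Section 6.2.2.1): the
scheme and host are lowercased, and the hexadecimal digits of
percent-encodings are uppercased.

<procedure>(uri-normalize-path-segments URI) => URI</procedure>

Performs URI path segment normalization (RFC 3986, Section 6.2.2.3):
the dot segments {{.}} and {{..}} are removed from the path.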
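
The two normalizations might be used as sketched below; the expected strings follow the normalization rules of RFC 3986.

<enscript highlight="scheme">
(use uri-generic)

;; Scheme and host are lowercased; the case of the path is preserved.
(uri->string
 (uri-normalize-case (uri-reference "HTTP://Example.COM/Foo")))
;; => "http://example.com/Foo"

;; "b/.." cancels out and "." is dropped.
(uri->string
 (uri-normalize-path-segments
  (uri-reference "http://example.com/a/b/../c/./d")))
;; => "http://example.com/a/c/d"
</enscript>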


==== Character sets

As a convenience for sub-parsers or other special-purpose URI handling
code, uri-generic exports several character sets.

<constant>char-set:gen-delims</constant>

Generic delimiters.
  gen-delims  =  ":" / "/" / "?" / "#" / "[" / "]" / "@"

<constant>char-set:sub-delims</constant>

Sub-delimiters.
  sub-delims  =  "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="

<constant>char-set:uri-reserved</constant>

The union of {{gen-delims}} and {{sub-delims}}; all reserved URI characters.
  reserved    =  gen-delims / sub-delims

<constant>char-set:uri-unreserved</constant>

All unreserved characters that are allowed in a URI.
  unreserved  =  ALPHA / DIGIT / "-" / "." / "_" / "~"

Note that this is ''not'' the complement of {{char-set:uri-reserved}}!
There are several characters (even printable, noncontrol characters)
which are not allowed at all in a URI.
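
The exported sets are ordinary srfi-14 character sets, so they can be queried with {{char-set-contains?}}. The sketch below also checks the point made above: a space belongs to neither the reserved nor the unreserved set.

<enscript highlight="scheme">
(use uri-generic srfi-14)

(char-set-contains? char-set:uri-unreserved #\~) ; => #t
(char-set-contains? char-set:uri-reserved #\/)   ; => #t
(char-set-contains? char-set:gen-delims #\=)     ; => #f, "=" is a sub-delim
(char-set-contains? char-set:sub-delims #\=)     ; => #t

;; A space is neither reserved nor unreserved.
(define reserved-or-unreserved
  (char-set-union char-set:uri-reserved char-set:uri-unreserved))
(char-set-contains? reserved-or-unreserved #\space) ; => #f
</enscript>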


=== Requires

* [[matchable]]

=== Version History

* 2.43 Fixed handling of UTF-8 characters when percent-encoding/decoding (thanks to Adrien Ramos).
* 2.42 Improved performance.
* 2.41 Make code more portable by avoiding keyword arguments (thanks to Seth Alves).
* 2.39 Get rid of a compiler warning due to broken ipv4 address handling (thanks to Mario Goulart).
* 2.38 Fixed a bug that caused an error to be thrown when host contained percent-encoded characters (thanks to Roel van der Hoorn).
* 2.37 Fixed bug in make-uri when passed no path, added basic tests for make-uri.
* 2.36 Added procedure make-uri.
* 2.35 Added some extra checks so we do not try to parse URIs containing invalid (non-hexnum) percent-encoding.  Add code to preserve empty path segments during parsing and when performing relative reference resolution.
* 2.34 Fix two bugs that show up in very rare cases (possibly never in practice). One caused issues when creating relative paths from two URIs where one URI had a path that was a prefix of the other; the other caused issues when a relative URI's path containing ".." as last component was resolved.
* 2.33 Path component for empty absolute path directly followed by query is now represented the same as empty path without query.
* 2.32 Empty absolute path directly followed by query is now properly recognised as a URI reference.
* 2.31 Return {{#f}} in constructors if unconsumed input remains after parsing.
* 2.3 Add predicates uri-path-relative? and uri-path-absolute?
* 2.2 Improvements to uri->string.
* 2.1 Add new predicates for URIs, absolute URIs and relative references. Fix absolute-uri so it raises a condition when passing in a non-absolute uri string, instead of returning a string with the error. Also throw an error if a fragment is detected in the string.
* 2.0 Export char-sets, add char-set arg to uri-encode/uri-decode,
       do not decode query args as x-www-form-urlencoded, change path
       representation.  Lots of bugfixes.
* 1.12 Fix relative path normalization when original path ends in a slash, remove consecutive slashes from paths in URIs.
* 1.11 Added accessors for the authority components, functional update procedures. Fixed case-normalization.
* 1.10 Fixed edge case in {{uri-relative-to}} with empty path in base uri,
       fixed {{uri->string}} for URIs with query args, fixed {{uri->string}}
       to not add an extraneous slash after authority in case of empty path.
* 1.9 Fixed bug in uri-encode-string with reserved characters, added
      tests for decoding and encoding [Peter Bex]
* 1.8 Added uri-encode-string and uri-decode-string.
      URI constructors now perform automatic normalization
      of percent-encoded unreserved characters. [suggested by Peter Bex]
* 1.6 Added error message about missing scheme in absolute-uri.
* trunk Small bugfix in absolute-uri. [Peter Bex]
* 1.5 Bug fixes in uri->string and absolute-uri. [reported by Peter Bex]
* 1.3 Ported to Hygienic Chicken and the [[test]] egg [Peter Bex]
* 1.2 Now using defstruct instead of define-record [suggested by Peter Bex]
* 1.1 Added utf8 compatibility
* 1.0 Initial Release

=== License

Based on the
[[http://www.ninebynine.org/Software/ReadMe-URI-Haskell.txt|Haskell URI library]]
by Graham Klyne <gk@ninebynine.org>.

  Copyright 2008-2016 Ivan Raikov, Peter Bex.
  All rights reserved.
 
  Redistribution and use in source and binary forms, with or without
  modification, are permitted provided that the following conditions are
  met:
 
  Redistributions of source code must retain the above copyright
  notice, this list of conditions and the following disclaimer.
 
  Redistributions in binary form must reproduce the above copyright
  notice, this list of conditions and the following disclaimer in the
  documentation and/or other materials provided with the distribution.
 
  Neither the name of the author nor the names of its contributors may
  be used to endorse or promote products derived from this software
  without specific prior written permission.
 
  THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
  "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
  LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
  FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
  COPYRIGHT HOLDERS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
  INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
  (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
  SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
  HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
  STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
  ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
  OF THE POSSIBILITY OF SUCH DAMAGE.