[[tags: eggs literals regex regex-literals]]

[[toc:]]

== Introduction

A reader extension providing precompiled regular expression literals of the
form <code>#/[a-z0-9]+/i</code> and <code>#r{^/path/(to)/file$}</code>.


== Examples

=== Using regular expression literals in the interpreter

Loading {{regex-literals}} also loads the {{regex}} unit and allows
convenient use of regular expression literals as follows:

<enscript highlight=scheme>#;1> (use regex-literals)

#;2> #/[A-Za-z0-9]+/
#<regexp>

#;3> ,x #/^[a-z0-9]+$/i
(regexp "^[a-z0-9]+$" #t #f #f)

#;4> (string-match #/^(\d{2}):(\d{2})(..)/ "11:59pm")
("11:59pm" "11" "59" "pm")

#;5> (string-split-fields #/[^\s]+/ "the quick brown fox jumps over the lazy dog")
("the" "quick" "brown" "fox" "jumps" "over" "the" "lazy" "dog")

#;6> (string-split-fields #r{[^/]+} "/path/to/file")
("path" "to" "file")

#;7> (string-substitute #/(\w+)\s+(\w+)/u "\\2, \\1" "John Smith")
"Smith, John"

</enscript>


=== Using regular expression literals with the compiler

Passing the {{-X regex-literals}} command-line option to {{csc}} lets you
use regular expression literals in your egg or compiled program without
making the {{regex-literals}} egg a runtime dependency.

(See the [[php-s11n]] egg for an example of building with {{regex-literals}}.)
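
As a minimal sketch of this workflow, assume a hypothetical source file
{{clock.scm}} built with {{csc -X regex-literals clock.scm}}. The file name and
the {{parse-clock}} procedure are illustrative and not part of the egg. The
sketch also assumes that the compiled program still needs the {{regex}} unit at
run time, since the {{,x}} expansion shown in the interpreter example suggests
that a literal is read as a call to {{regexp}}.

<enscript highlight=scheme>
;; clock.scm (hypothetical example file)
;; Build with: csc -X regex-literals clock.scm
(use regex)  ; assumption: the regex unit is still required at run time

;; Returns the captured groups of a time such as "11:59pm" as a list of
;; strings, or #f when STR does not match.
(define (parse-clock str)
  (let ((m (string-match #/^(\d{2}):(\d{2})(..)/ str)))
    (and m (cdr m))))

(write (parse-clock "11:59pm"))
(newline)
</enscript>

Going by the interpreter transcript above, the match on {{"11:59pm"}} should
yield the groups {{"11"}}, {{"59"}} and {{"pm"}}.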


== Authors

[[Arto Bendiken]], [[http://3e8.org/zb|Zbigniew]]


== Requires

* [[Unit regex|regex]]


== Reader extensions

This egg installs a reader extension for {{#\/}} that reads a regular
expression literal as described below in {{read-regex-literal}}, and another
reader extension for {{#\r}} that works similarly but supports a generalized
delimiter syntax as described in {{read-regex-literal/general}}.

Note that there are some caveats to using reader extensions when compiling;
for more details, refer to the relevant
[[faq#Why%20does%20{{define-reader-ctor}}%20not%20work%20in%20my%20compiled%20program?|FAQ entry]].


== Input and output

=== read-regex-literal

[procedure] (read-regex-literal [PORT])

Reads a regular expression literal of the form {{#/.../}} from {{PORT}},
which defaults to the value of {{(current-input-port)}}. The literal is
converted to a precompiled regular expression object using the {{regexp}}
procedure provided by the [[Unit regex|regex]] unit.

Regular expression literals may include options that modify the way the
pattern matches strings. Each option is a single character, and one or more
of them may be placed immediately after the terminator:

* {{#/.../i}} PCRE_CASELESS: case-insensitive mode; letters in the pattern
match both upper-case and lower-case letters.
* {{#/.../x}} PCRE_EXTENDED: extended mode; whitespace and comments in the
pattern are ignored, so a complex regular expression can be laid out with
spaces, newlines, and comments to make it more readable.
* {{#/.../u}} PCRE_UTF8: UTF-8 mode; both the pattern and the subject string
are treated as UTF-8 rather than as single-byte characters.
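
The options can be tried out in {{csi}} along the following lines. This is a
sketch only: the sample patterns and strings are made up for illustration, and
it assumes that several option characters may be combined, as the description
above suggests.

<enscript highlight=scheme>
(use regex-literals)  ; in csi; when compiling, pass -X regex-literals instead

;; i: letter case is ignored
(define greeting (string-match #/hello/i "HELLO"))

;; x: unescaped whitespace inside the pattern is ignored
(define quantity (string-match #/(\d+) \s* ([a-z]+)/x "42 apples"))

;; assumption: option characters can be combined
(define both (string-match #/ hello /ix "HeLLo"))

(for-each (lambda (m) (write m) (newline))
          (list greeting quantity both))
</enscript>

The {{,x}} transcript in the Examples section shows {{#/^[a-z0-9]+$/i}} reading
as {{(regexp "^[a-z0-9]+$" #t #f #f)}}, so each option presumably switches on
the corresponding flag argument of {{regexp}}.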


=== read-regex-literal/general

[procedure] (read-regex-literal/general [PORT])

Reads a regular expression literal of the form {{#r(...)}} from {{PORT}},
which defaults to the value of {{(current-input-port)}}. Otherwise this
works like {{read-regex-literal}}, but it supports a generalized delimiter
syntax as follows:

* Matching delimiter pairs: {{#r{...}}}, {{#r(...)}}, {{#r[...]}} and
{{#r<...>}}
* Any arbitrary character used as both the opening and the closing
delimiter: {{#r!...!}}, {{#r|...|}}, {{#r@...@}}, and so forth.
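
A short sketch of the generalized delimiters follows; the sample strings are
illustrative. The first form is taken from the Examples section, and the others
assume that the alternative delimiters behave the same way.

<enscript highlight=scheme>
(use regex-literals)  ; in csi; when compiling, pass -X regex-literals instead

;; Paired delimiters: convenient when the pattern contains slashes.
(define a (string-split-fields #r{[^/]+} "/path/to/file"))

;; A single arbitrary character as both opening and closing delimiter.
(define b (string-split-fields #r![^/]+! "/path/to/file"))
(define c (string-match #r@^(\d{2}):(\d{2})$@ "11:59"))

(for-each (lambda (x) (write x) (newline)) (list a b c))
</enscript>

Picking a delimiter that does not occur in the pattern, as with the path
patterns here, sidesteps the question of how a delimiter inside the pattern
would have to be written.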


== License

Copyright (c) 2006-2007 Arto Bendiken.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to
deal in the Software without restriction, including without limitation the
rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
sell copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
IN THE SOFTWARE.


== Version history

;1.0.2 : Support for generalized {{#r(...)}} delimiters (by [[http://3e8.org/zb|Zbigniew]])
;1.0.1 : Added support for the {{#/.../i}}, {{#/.../x}} and {{#/.../u}} options.
;1.0.0 : Initial release of the {{regex-literals}} egg.