[[tags: egg]]
[[toc:]]

== regex

=== Introduction

This extension provides the regular expression API that used to be
available in CHICKEN releases before version ''4.6.2''. It is a thin
wrapper around the functionality provided by [[/man/4/Unit irregex|irregex]]
and is mostly intended to keep old code working.

=== Usage

<enscript highlight=scheme>
(require-extension regex)
</enscript>

=== Documentation

==== grep

<procedure>(grep REGEX LIST [ACCESSOR])</procedure>

Returns all items of {{LIST}} that match the regular expression
{{REGEX}}. This procedure could be defined as follows:

<enscript highlight=scheme>
(define (grep regex lst)
  (filter (lambda (x) (string-search regex x)) lst) )
</enscript>

{{ACCESSOR}} is an optional accessor-procedure applied to each
element before doing the match. It should take a single argument
and return a string that will then be used in the regular expression
matching. {{ACCESSOR}} defaults to the identity function.
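
As a small illustration of {{ACCESSOR}}, the following sketch filters an
association list by matching on the key of each pair. The list
{{inventory}} and its contents are made up for this example.

<enscript highlight=scheme>
(require-extension regex)

;; Hypothetical data: pairs of (name . count).
(define inventory '(("apple" . 3) ("banana" . 5) ("avocado" . 1)))

;; car extracts the string to match against; the whole pair is returned.
(grep "^a" inventory car)
;; => (("apple" . 3) ("avocado" . 1))
</enscript>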


==== glob->regexp

<procedure>(glob->regexp PATTERN [SRE?])</procedure>

Converts the file-pattern {{PATTERN}} into a regular expression.

<enscript highlight=scheme>
(glob->regexp "foo.*")
;; => a regular expression object equivalent to the pattern foo\..*
</enscript>

{{PATTERN}} should follow "glob" syntax. Allowed wildcards are

 *
 [C...]
 [C1-C2]
 [-C...]
 ?

{{glob->regexp}} returns a regular expression object if the optional
argument {{SRE?}} is false or not given, otherwise the SRE of the
computed regular expression is returned.
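
A possible use, sketched here under the assumption that the file names
below exist only for illustration, is to compile a glob once and test
names against it with {{string-match}}. The helper {{matching-names}}
is not part of this egg.

<enscript highlight=scheme>
(require-extension regex)

;; Hypothetical file names, used only for illustration.
(define files '("main.scm" "README" "util.scm" "notes.txt"))

;; Compile the glob once, then reuse the regular expression object.
(define scm-file (glob->regexp "*.scm"))

;; Helper (not part of the egg): keep the names that RX matches completely.
(define (matching-names rx names)
  (cond ((null? names) '())
        ((string-match rx (car names))
         (cons (car names) (matching-names rx (cdr names))))
        (else (matching-names rx (cdr names)))))

(matching-names scm-file files)
;; => ("main.scm" "util.scm")
</enscript>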


==== regexp

<procedure>(regexp STRING [IGNORECASE [IGNORESPACE [UTF8]]])</procedure>

Returns a precompiled regular expression object for {{STRING}}.
The optional arguments {{IGNORECASE}}, {{IGNORESPACE}} and {{UTF8}}
specify whether the regular expression should be matched with case- or whitespace-differences
ignored, or whether the string should be treated as containing UTF-8 encoded
characters, respectively.

Note that code that uses regular expressions heavily should always
use them in precompiled form, which is likely to be much faster than
passing strings to any of the regular-expression routines described
below.
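
The following sketch shows the precompiled form in use, with
{{IGNORECASE}} set. The name {{hello-rx}} and the sample strings are
chosen only for this example.

<enscript highlight=scheme>
(require-extension regex)

;; Compile once, with IGNORECASE set to #t ...
(define hello-rx (regexp "hello" #t))

;; ... and reuse the compiled object for every search.
(string-search hello-rx "Say HELLO")      ; => ("HELLO")
(string-search hello-rx "hello, world")   ; => ("hello")
(string-search hello-rx "goodbye")        ; => #f
</enscript>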


==== regexp?

<procedure>(regexp? X)</procedure>

Returns {{#t}} if {{X}} is a precompiled regular expression,
or {{#f}} otherwise.
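
For example, a pattern given as a plain string is not a precompiled
regular expression, even though most procedures of this egg accept it:

<enscript highlight=scheme>
(require-extension regex)

(regexp? (regexp "[a-z]+"))   ; => #t
(regexp? "[a-z]+")            ; => #f, a plain string is not precompiled
</enscript>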


==== string-match
==== string-match-positions

<procedure>(string-match REGEXP STRING)</procedure><br>
<procedure>(string-match-positions REGEXP STRING)</procedure>

Matches the regular expression in {{REGEXP}} (a string or a precompiled
regular expression) against
{{STRING}} and returns either {{#f}} if the match failed,
or a list of matching groups, where the first element is the complete
match. For each matching group the
result list contains either: {{#f}} for a non-matching but optional
group; a list of the start and end position of the match in {{STRING}}
(in the case of {{string-match-positions}}); or the matching
substring (in the case of {{string-match}}). Note that the entire string
must match. For searching a pattern inside a string, see {{string-search}} below.
Note also that {{string-match}} is implemented by calling
{{string-search}} with the regular expression wrapped in {{^ ... $}}.
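
A short sketch of both procedures on the same input may make the two
result formats easier to compare. The pattern and strings are made up
for this example.

<enscript highlight=scheme>
(require-extension regex)

;; Two parenthesized groups separated by a dash.
(define rx "([0-9]+)-([0-9]+)")

(string-match rx "12-34")             ; => ("12-34" "12" "34")
(string-match-positions rx "12-34")   ; => ((0 5) (0 2) (3 5))

;; The whole string has to match, so surrounding text makes the match fail.
(string-match rx "id 12-34")          ; => #f
</enscript>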
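

==== string-search
==== string-search-positions

<procedure>(string-search REGEXP STRING [START [RANGE]])</procedure><br>
<procedure>(string-search-positions REGEXP STRING [START [RANGE]])</procedure>

Searches for the first match of the regular expression in
{{REGEXP}} (a string or a precompiled regular expression) within
{{STRING}}. The search starts at index {{START}} (default 0) and
can be limited to {{RANGE}} characters. The result has the same format
as that of {{string-match}} or {{string-match-positions}}, respectively,
or is {{#f}} if no match is found.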
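
The effect of the optional arguments can be sketched as follows; the
sample string {{s}} is made up for this example, and the positions are
indices into the whole of {{s}}.

<enscript highlight=scheme>
(require-extension regex)

(define s "abc 123 def 456")

(string-search "[0-9]+" s)               ; => ("123")
(string-search-positions "[0-9]+" s)     ; => ((4 7))

;; Start at index 7, just past the first number.
(string-search "[0-9]+" s 7)             ; => ("456")
(string-search-positions "[0-9]+" s 7)   ; => ((12 15))

;; Look only at the first 4 characters, "abc ", which contain no digit.
(string-search "[0-9]+" s 0 4)           ; => #f
</enscript>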


==== string-split-fields

<procedure>(string-split-fields REGEXP STRING [MODE [START]])</procedure>

Splits {{STRING}} into a list of fields according to {{MODE}},
where {{MODE}} can be the keyword {{#:infix}} ({{REGEXP}}
matches field separator), the keyword {{#:suffix}} ({{REGEXP}}
matches field terminator) or {{#t}} ({{REGEXP}} matches field),
which is the default.

<enscript highlight=scheme>
(define s "this is a string 1, 2, 3,")

(string-split-fields "[^ ]+" s)

  => ("this" "is" "a" "string" "1," "2," "3,")

(string-split-fields " " s #:infix)

  => ("this" "is" "a" "string" "1," "2," "3,")

(string-split-fields "," s #:suffix)

  => ("this is a string 1" " 2" " 3")
</enscript>


==== string-substitute

<procedure>(string-substitute REGEXP SUBST STRING [MODE])</procedure>

Searches substrings in {{STRING}} that match {{REGEXP}}
and substitutes them with the string {{SUBST}}. The substitution
can contain references to subexpressions in
{{REGEXP}} with the {{\NUM}} notation, where {{NUM}}
refers to the NUMth parenthesized expression. The optional argument
{{MODE}} defaults to 1 and specifies the number of the match to
be substituted. Any non-numeric index specifies that all matches are to
be substituted.

<enscript highlight=scheme>
(string-substitute "([0-9]+) (eggs|chicks)" "\\2 (\\1)" "99 eggs or 99 chicks" 2)
=> "99 eggs or chicks (99)"
</enscript>

Note that a regular expression that matches an empty string will
signal an error.
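
To compare the possible values of {{MODE}}, here is a minimal sketch
that applies the same substitution with the default, with a numeric
index, and with a non-numeric value. The sample string {{s}} is made up
for this example.

<enscript highlight=scheme>
(require-extension regex)

(define s "1 and 22 and 333")

(string-substitute "[0-9]+" "N" s)      ; => "N and 22 and 333"  (first match)
(string-substitute "[0-9]+" "N" s 3)    ; => "1 and 22 and N"    (third match)
(string-substitute "[0-9]+" "N" s #t)   ; => "N and N and N"     (all matches)
</enscript>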


==== string-substitute*

<procedure>(string-substitute* STRING SMAP [MODE])</procedure>

Substitutes elements of {{STRING}} with {{string-substitute}} according to {{SMAP}}.
{{SMAP}} should be an association-list where each element of the list
is a pair of the form {{(MATCH . REPLACEMENT)}}. Every occurrence of
the regular expression {{MATCH}} in {{STRING}} will be replaced by the string
{{REPLACEMENT}}.

<enscript highlight=scheme>
(string-substitute* "<h1>Hello, world!</h1>" '(("<[/A-Za-z0-9]+>" . "")))

=>  "Hello, world!"
</enscript>
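
{{SMAP}} may hold several pairs. A minimal sketch with two made-up
replacements:

<enscript highlight=scheme>
(require-extension regex)

;; Each pair is (MATCH . REPLACEMENT); both are applied to the input.
(string-substitute* "cat and dog, dog and cat"
                    '(("cat" . "bird") ("dog" . "fish")))
;; => "bird and fish, fish and bird"
</enscript>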


==== regexp-escape

<procedure>(regexp-escape STRING)</procedure>

Escapes all special characters in {{STRING}} with {{\}}, so that the string can be embedded
into a regular expression.

<enscript highlight=scheme>
(regexp-escape "^[0-9]+:.*$")
=>  "\\^\\[0-9\\]\\+:\\.\\*\\$"
</enscript>
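
A typical use, sketched here with a made-up input, is searching for
text that should be taken literally even though it contains special
characters:

<enscript highlight=scheme>
(require-extension regex)

(regexp-escape "1+1")                                  ; => "1\\+1"

;; Escaped: "+" is matched literally.
(string-search (regexp-escape "1+1") "what is 1+1?")   ; => ("1+1")

;; Unescaped: "1+1" means one or more "1" followed by "1", which is absent.
(string-search "1+1" "what is 1+1?")                   ; => #f
</enscript>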


=== Author

[[felix winkelmann]]

=== License

Copyright (c) 2010, Felix L. Winkelmann
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
1. Redistributions of source code must retain the above copyright
   notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
   notice, this list of conditions and the following disclaimer in the
   documentation and/or other materials provided with the distribution.
3. The name of the authors may not be used to endorse or promote products
   derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE AUTHORS ``AS IS'' AND ANY EXPRESS OR
IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY DIRECT, INDIRECT,
INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

=== Version History

; 0.1 : initial release