Last change on this file since 13049 was 13049, checked in by Ivan Raikov, 11 years ago

Copied updated regex documentation.

[[tags: manual]]
[[toc:]]

== Unit regex

This library unit provides support for regular expressions. The
regular expression package used is {{irregex}}, written by Alex
Shinn. See [[http://synthcode.com/scheme/irregex/]] for information
about the particular regexp flavor and extensions provided by this
library.

To test that {{irregex}} support has been built into Chicken properly,
try:

<enscript highlight=scheme>
(require 'regex)
(feature? 'irregex) => #t
</enscript>


=== grep

 [procedure] (grep REGEX LIST)

Returns all items of {{LIST}} that match the regular expression
{{REGEX}}. This procedure could be defined as follows:

<enscript highlight=scheme>
(define (grep regex lst)
  (filter (lambda (x) (string-search regex x)) lst) )
</enscript>

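As a small usage sketch: the list contents below are made up for
illustration, and loading the unit with {{(require 'regex)}} is assumed to
work as in the test snippet at the top of this page. Because the definition
above is based on {{string-search}}, an item is kept if the pattern matches
anywhere inside it.

<enscript highlight=scheme>
(require 'regex)

;; Keep only the strings that contain at least one digit.
(grep "[0-9]" '("alpha" "beta2" "gamma" "d3lta"))
; => ("beta2" "d3lta")
</enscript>
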

=== glob->regexp

 [procedure] (glob->regexp PATTERN)

Converts the file-pattern {{PATTERN}} into a regular expression.

<enscript highlight=scheme>
(glob->regexp "foo.*")
=> "foo\..*"
</enscript>

{{PATTERN}} should follow "glob" syntax. Allowed wildcards are

 *
 [C...]
 [C1-C2]
 [-C...]
 ?


=== glob?

 [procedure] (glob? STRING)

Returns {{#t}} if {{STRING}} contains any "glob" wildcards, or {{#f}}
otherwise.

A string without any "glob" wildcards does not meet the criteria,
even though it is technically a valid "glob" file-pattern.

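A brief illustration of the distinction described above. The file names are
hypothetical, and the unit is assumed to be loaded with {{(require 'regex)}}
as at the top of this page.

<enscript highlight=scheme>
(require 'regex)

;; Contains the wildcard "*", so it counts as a glob.
(glob? "*.scm")     ; => #t

;; A plain file name is a valid pattern, but has no wildcards.
(glob? "foo.scm")   ; => #f
</enscript>
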

=== regexp

 [procedure] (regexp STRING [IGNORECASE [IGNORESPACE [UTF8]]])

Returns a precompiled regular expression object for {{STRING}}.
The optional arguments {{IGNORECASE}}, {{IGNORESPACE}} and {{UTF8}}
specify whether the regular expression should be matched with case or
whitespace differences ignored, or whether the string should be treated
as containing UTF-8 encoded characters, respectively.

=== regexp?

 [procedure] (regexp? X)

Returns {{#t}} if {{X}} is a precompiled regular expression,
or {{#f}} otherwise.

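The following sketch shows {{regexp}} and {{regexp?}} together. The name
{{rx}} and the sample strings are made up for illustration; a true value is
passed for {{IGNORECASE}}, and the result of the search assumes that
{{string-search}} (documented below) returns the matched substring exactly
as it appears in the input.

<enscript highlight=scheme>
(require 'regex)

;; Compile the pattern once, with IGNORECASE set, and reuse the object.
(define rx (regexp "hello" #t))

(regexp? rx)        ; => #t
(regexp? "hello")   ; => #f, a plain string is not precompiled

;; The case-insensitive pattern matches the upper-case text.
(string-search rx "Say HELLO")
; => ("HELLO")
</enscript>
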
=== string-match
=== string-match-positions

 [procedure] (string-match REGEXP STRING)
 [procedure] (string-match-positions REGEXP STRING)

Matches the regular expression in {{REGEXP}} (a string or a
precompiled regular expression) against {{STRING}} and returns
{{#f}} if the match fails, or a list of matching groups whose
first element is the complete match. For each group, the result
list contains one of the following: {{#f}} for an optional group
that did not match; a list of the start and end positions of the
match in {{STRING}} (for {{string-match-positions}}); or the
matching substring (for {{string-match}}). Note that the entire
string must match. To search for a pattern inside a string, see
{{string-search}} below. Note also that {{string-match}} is
implemented by calling {{string-search}} with the regular expression
wrapped in {{^ ... $}}. When invoked with a precompiled regular
expression (one created with {{regexp}}), {{string-match}} is
identical to {{string-search}}.

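A short sketch of the behaviour described above, using string patterns. The
sample inputs are invented for illustration, and the unit is assumed to be
loaded with {{(require 'regex)}} as at the top of this page.

<enscript highlight=scheme>
(require 'regex)

;; The whole string matches: complete match first, then each group.
(string-match "([0-9]+)-([0-9]+)" "12-34")
; => ("12-34" "12" "34")

;; The same match, reported as (START END) index pairs.
(string-match-positions "([0-9]+)-([0-9]+)" "12-34")
; => ((0 5) (0 2) (3 5))

;; The pattern covers only part of the string, so the match fails.
(string-match "[0-9]+" "abc 123")
; => #f
</enscript>
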

=== string-search
=== string-search-positions

 [procedure] (string-search REGEXP STRING [START [RANGE]])
 [procedure] (string-search-positions REGEXP STRING [START [RANGE]])

Searches for the first match of the regular expression in
{{REGEXP}} within {{STRING}}. The search begins at index {{START}},
if given, and can be limited to {{RANGE}} characters.

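For comparison with {{string-match}}, here is a sketch of searching inside
a longer string. The inputs are made up, and the last call assumes that
{{START}} is a zero-based index into {{STRING}}.

<enscript highlight=scheme>
(require 'regex)

;; The pattern may match anywhere in the string.
(string-search "[0-9]+" "abc 123 def")
; => ("123")

(string-search-positions "[0-9]+" "abc 123 def")
; => ((4 7))

;; Starting at index 2 skips the leading "12".
(string-search "[0-9]+" "12 abc 345" 2)
; => ("345")
</enscript>
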

=== string-split-fields

 [procedure] (string-split-fields REGEXP STRING [MODE [START]])

Splits {{STRING}} into a list of fields according to {{MODE}},
where {{MODE}} can be the keyword {{#:infix}} ({{REGEXP}}
matches field separator), the keyword {{#:suffix}} ({{REGEXP}}
matches field terminator) or {{#t}} ({{REGEXP}} matches field),
which is the default.

<enscript highlight=scheme>
(define s "this is a string 1, 2, 3,")

(string-split-fields "[^ ]+" s)

  => ("this" "is" "a" "string" "1," "2," "3,")

(string-split-fields " " s #:infix)

  => ("this" "is" "a" "string" "1," "2," "3,")

(string-split-fields "," s #:suffix)

  => ("this is a string 1" " 2" " 3")
</enscript>


=== string-substitute

 [procedure] (string-substitute REGEXP SUBST STRING [MODE])

Searches {{STRING}} for substrings that match {{REGEXP}}
and replaces them with the string {{SUBST}}. The substitution
can contain references to subexpressions in
{{REGEXP}} with the {{\NUM}} notation, where {{NUM}}
refers to the NUMth parenthesized expression. The optional argument
{{MODE}} defaults to 1 and specifies the number of the match to
be substituted. Any non-numeric value specifies that all matches are to
be substituted.

<enscript highlight=scheme>
(string-substitute "([0-9]+) (eggs|chicks)"
                   "\\2 (\\1)" "99 eggs or 99 chicks" 2)
=> "99 eggs or chicks (99)"
</enscript>

Note that a regular expression that matches an empty string will
signal an error.

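To contrast the default {{MODE}} with a non-numeric one, consider the
following sketch. The input string and the replacement text {{"N"}} are
invented for illustration, and {{#t}} is used here as one possible
non-numeric value.

<enscript highlight=scheme>
(require 'regex)

;; Default MODE (1): only the first match is replaced.
(string-substitute "[0-9]+" "N" "1 and 22 and 333")
; => "N and 22 and 333"

;; Non-numeric MODE: every match is replaced.
(string-substitute "[0-9]+" "N" "1 and 22 and 333" #t)
; => "N and N and N"
</enscript>
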

=== string-substitute*

 [procedure] (string-substitute* STRING SMAP [MODE])

Substitutes elements of {{STRING}} with {{string-substitute}} according to {{SMAP}}.
{{SMAP}} should be an association list where each element of the list
is a pair of the form {{(MATCH . REPLACEMENT)}}. Every occurrence of
the regular expression {{MATCH}} in {{STRING}} will be replaced by the string
{{REPLACEMENT}}.

<enscript highlight=scheme>
(string-substitute* "<h1>Hello, world!</h1>"
                    '(("<[/A-Za-z0-9]+>" . "")))

=>  "Hello, world!"
</enscript>


=== regexp-escape

 [procedure] (regexp-escape STRING)

Escapes all special characters in {{STRING}} with {{\}}, so that the string can be embedded
into a regular expression.

<enscript highlight=scheme>
(regexp-escape "^[0-9]+:.*$")
=>  "\\^\\[0-9\\]\\+:\\.\\*\\$"
</enscript>


Previous: [[Unit match]]

Next: [[Unit srfi-18]]