source: project/chicken/trunk/manual/Unit regex @ 13740

Last change on this file since 13740 was 13740, checked in by Kon Lovett, 11 years ago

Ordered by srfi #. Added srfi-69.

[[tags: manual]]
[[toc:]]

== Unit regex

This library unit provides support for regular expressions. The regular
expression package used is {{irregex}},
written by Alex Shinn. Irregex supports most Perl extensions and is
written entirely in Scheme.

This library unit exposes two APIs: the one listed below and the
original irregex API. To use the latter, import from the {{irregex}} module.


=== grep

 [procedure] (grep REGEX LIST)

Returns all items of {{LIST}} that match the regular expression
{{REGEX}}.  This procedure could be defined as follows:

<enscript highlight=scheme>
(define (grep regex lst)
  (filter (lambda (x) (string-search regex x)) lst) )
</enscript>

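A short usage sketch for {{grep}}. The list and pattern are made up for illustration, and {{(use regex)}} assumes the unit is loaded the usual CHICKEN 4 way:

<enscript highlight=scheme>
(use regex)  ; assumption: CHICKEN 4 style loading of this unit

;; Keep only the strings in which the pattern is found.
;; Because grep searches (it does not anchor), "^a" is needed
;; to restrict matches to the start of each string.
(grep "^a" '("apple" "banana" "avocado"))
; => ("apple" "avocado")
</enscript>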

=== glob->regexp

 [procedure] (glob->regexp PATTERN)

Converts the file-pattern {{PATTERN}} into a regular expression.

<enscript highlight=scheme>
(glob->regexp "foo.*")
=> "foo\..*"
</enscript>

{{PATTERN}} should follow "glob" syntax. Allowed wildcards are

 *
 [C...]
 [C1-C2]
 [-C...]
 ?


=== glob?

 [procedure] (glob? STRING)

Tests whether {{STRING}} contains any "glob" wildcards.

A string without any "glob" wildcards does not qualify,
even though it is technically a valid "glob" file-pattern.

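The distinction can be seen in a small sketch. The file names are hypothetical, and {{(use regex)}} assumes CHICKEN 4 style loading:

<enscript highlight=scheme>
(use regex)  ; assumption: CHICKEN 4 style loading of this unit

;; Strings containing wildcards qualify.
(glob? "*.scm")          ; => #t
(glob? "file[0-9].txt")  ; => #t

;; A plain file name is a valid pattern but contains no wildcard.
(glob? "main.scm")       ; => #f
</enscript>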

=== regexp

 [procedure] (regexp STRING [IGNORECASE [IGNORESPACE [UTF8]]])

Returns a precompiled regular expression object for {{STRING}}.
The optional arguments {{IGNORECASE}}, {{IGNORESPACE}} and {{UTF8}}
specify whether the regular expression should be matched with case or whitespace differences
ignored, or whether the string should be treated as containing UTF-8 encoded
characters, respectively.

Code that uses regular expressions heavily should always
use them in precompiled form, which is likely to be much faster than
passing strings to any of the regular-expression routines described
below.

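A minimal sketch of precompiling once and reusing the result. The names {{date-rx}} and {{hello-rx}} and the sample strings are illustrative only, and the results shown assume that {{string-search}} returns the same kind of group list as {{string-match}}:

<enscript highlight=scheme>
(use regex)  ; assumption: CHICKEN 4 style loading of this unit

;; Compile the pattern once, outside any loop that uses it.
(define date-rx (regexp "[0-9]+-[0-9]+-[0-9]+"))

(regexp? date-rx)                                 ; => #t
(string-search date-rx "released on 2009-02-14")  ; => ("2009-02-14")

;; Passing #t as IGNORECASE makes the match case-insensitive.
(define hello-rx (regexp "hello" #t))
(string-search hello-rx "Say HELLO")              ; => ("HELLO")
</enscript>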

=== regexp?

 [procedure] (regexp? X)

Returns {{#t}} if {{X}} is a precompiled regular expression,
or {{#f}} otherwise.


=== string-match
=== string-match-positions

 [procedure] (string-match REGEXP STRING [START])
 [procedure] (string-match-positions REGEXP STRING [START])

Matches the regular expression {{REGEXP}} (a string or a precompiled
regular expression) against {{STRING}}. Returns {{#f}} if the match
fails, or otherwise a list of matching groups whose first element is the
complete match. If the optional argument {{START}} is supplied, it specifies
the starting position in {{STRING}}.  For each matching group the
result list contains one of: {{#f}} for a non-matching but optional
group; a list of the start and end positions of the match in {{STRING}}
(in the case of {{string-match-positions}}); or the matching
substring (in the case of {{string-match}}).

Note that the match must cover the whole string. To search for a pattern
inside a string, see {{string-search}} below.
Note also that {{string-match}} is implemented by calling
{{string-search}} with the regular expression wrapped in {{^ ... $}}.
If invoked with a precompiled regular expression argument (created with
{{regexp}}), {{string-match}} is identical to {{string-search}}.

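The result shapes described above can be sketched as follows. The patterns and input strings are invented for illustration, string patterns are used throughout, and {{(use regex)}} assumes CHICKEN 4 style loading:

<enscript highlight=scheme>
(use regex)  ; assumption: CHICKEN 4 style loading of this unit

;; First element is the complete match, followed by one entry per group.
(string-match "([0-9]+)-([0-9]+)" "10-20")
; => ("10-20" "10" "20")

;; Same groups, reported as (START END) positions in the string.
(string-match-positions "([0-9]+)-([0-9]+)" "10-20")
; => ((0 5) (0 2) (3 5))

;; An optional group that did not take part in the match yields #f.
(string-match "(a)?b" "b")
; => ("b" #f)

;; The whole string must match, so a partial match fails.
(string-match "[0-9]+" "abc 123")
; => #f
</enscript>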

=== string-search
=== string-search-positions

 [procedure] (string-search REGEXP STRING [START [RANGE]])
 [procedure] (string-search-positions REGEXP STRING [START [RANGE]])

Searches for the first match of the regular expression
{{REGEXP}} in {{STRING}}. The search can be limited to
{{RANGE}} characters.

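A brief sketch of searching inside a string. It assumes the results take the same shape as those of {{string-match}} and {{string-match-positions}}, and that {{START}} is the index at which the search begins; the sample text is made up:

<enscript highlight=scheme>
(use regex)  ; assumption: CHICKEN 4 style loading of this unit

(define text "abc 123 def 456")

;; Only the first match is reported.
(string-search "[0-9]+" text)            ; => ("123")
(string-search-positions "[0-9]+" text)  ; => ((4 7))

;; Starting the search at index 7 skips the first number.
(string-search "[0-9]+" text 7)          ; => ("456")

;; No match anywhere in the string.
(string-search "[0-9]+" "no digits")     ; => #f
</enscript>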

=== string-split-fields

 [procedure] (string-split-fields REGEXP STRING [MODE [START]])

Splits {{STRING}} into a list of fields according to {{MODE}},
where {{MODE}} can be the keyword {{#:infix}} ({{REGEXP}}
matches field separator), the keyword {{#:suffix}} ({{REGEXP}}
matches field terminator) or {{#t}} ({{REGEXP}} matches field),
which is the default.

<enscript highlight=scheme>
(define s "this is a string 1, 2, 3,")

(string-split-fields "[^ ]+" s)

  => ("this" "is" "a" "string" "1," "2," "3,")

(string-split-fields " " s #:infix)

  => ("this" "is" "a" "string" "1," "2," "3,")

(string-split-fields "," s #:suffix)

  => ("this is a string 1" " 2" " 3")
</enscript>


=== string-substitute

 [procedure] (string-substitute REGEXP SUBST STRING [MODE])

Searches substrings in {{STRING}} that match {{REGEXP}}
and substitutes them with the string {{SUBST}}. The substitution
can contain references to subexpressions in
{{REGEXP}} with the {{\NUM}} notation, where {{NUM}}
refers to the NUMth parenthesized expression. The optional argument
{{MODE}} defaults to 1 and specifies the number of the match to
be substituted. Any non-numeric index specifies that all matches are to
be substituted.

<enscript highlight=scheme>
(string-substitute "([0-9]+) (eggs|chicks)" "\\2 (\\1)" "99 eggs or 99 chicks" 2)
=> "99 eggs or chicks (99)"
</enscript>

Note that a regular expression that matches an empty string will
signal an error.


=== string-substitute*

 [procedure] (string-substitute* STRING SMAP [MODE])

Substitutes elements of {{STRING}} with {{string-substitute}} according to {{SMAP}}.
{{SMAP}} should be an association list where each element of the list
is a pair of the form {{(MATCH . REPLACEMENT)}}. Every occurrence of
the regular expression {{MATCH}} in {{STRING}} will be replaced by the string
{{REPLACEMENT}}.

<enscript highlight=scheme>
(string-substitute* "<h1>Hello, world!</h1>" '(("<[/A-Za-z0-9]+>" . "")))

=>  "Hello, world!"
</enscript>


=== regexp-escape

 [procedure] (regexp-escape STRING)

Escapes all special characters in {{STRING}} with {{\}}, so that the string can be embedded
into a regular expression.

<enscript highlight=scheme>
(regexp-escape "^[0-9]+:.*$")
=>  "\\^\\[0-9\\]\\+:\\.\\*\\$"
</enscript>

---
Previous: [[Unit extras]]

Next: [[Unit srfi-1]]