Last change on this file since 13009 was 13009, checked in by Ivan Raikov, 11 years ago

Updates to the regex unit documentation to reflect irregex transition.

[[tags: manual]]
[[toc:]]

== Unit regex

This library unit provides support for regular expressions. The
regular expression package used is {{irregex}} written by Alex
Shinn. See [[http://synthcode.com/scheme/irregex/]] for information
about the particular regexp flavor and extensions provided by this
library.

To test that {{irregex}} support has been built into Chicken properly,
try:

<enscript highlight=scheme>
(require 'regex)
(feature? 'irregex) => #t
</enscript>


=== grep

 [procedure] (grep REGEX LIST)

Returns all items of {{LIST}} that match the regular expression
{{REGEX}}.  This procedure could be defined as follows:

<enscript highlight=scheme>
(define (grep regex lst)
  (filter (lambda (x) (string-search regex x)) lst) )
</enscript>
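
As a brief usage illustration (a sketch that assumes a CHICKEN 4 session where the unit is loaded with {{(use regex)}}; the list of strings is made up for the example):

<enscript highlight=scheme>
(use regex)   ; assumption: CHICKEN 4 style loading of the unit

;; Keep only the strings that contain a match for "^a",
;; i.e. those that begin with the letter a.
(grep "^a" '("apple" "banana" "avocado"))
; => ("apple" "avocado")
</enscript>

Because {{grep}} is built on {{string-search}}, an item is kept when the pattern matches anywhere in it, so an explicit anchor such as {{^}} is needed to test the start of the string.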


=== glob->regexp

 [procedure] (glob->regexp PATTERN)

Converts the file-pattern {{PATTERN}} into a regular expression.

<enscript highlight=scheme>
(glob->regexp "foo.*")
=> "foo\..*"
</enscript>

{{PATTERN}} should follow "glob" syntax. Allowed wildcards are

 *
 [C...]
 [C1-C2]
 [-C...]
 ?


=== glob?

 [procedure] (glob? STRING)

Tests whether {{STRING}} contains any "glob" wildcards.

A string without any "glob" wildcards does not satisfy this predicate,
even though it is technically a valid "glob" file-pattern.
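
A small sketch of the distinction, assuming the unit has been loaded with {{(use regex)}} in CHICKEN 4 (the file names are hypothetical):

<enscript highlight=scheme>
(use regex)   ; assumption: CHICKEN 4 style loading of the unit

(glob? "*.scm")    ; contains the wildcard *
; => #t

(glob? "README")   ; a valid file-pattern, but without wildcards
; => #f
</enscript>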


=== regexp

 [procedure] (regexp STRING [IGNORECASE [IGNORESPACE [UTF8]]])

Returns a precompiled regular expression object for {{STRING}}.
The optional arguments {{IGNORECASE}}, {{IGNORESPACE}} and {{UTF8}}
specify, respectively, whether the regular expression should be matched
with case differences ignored, with whitespace differences ignored, or
with the string treated as containing UTF-8 encoded characters.
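
For instance, a case-insensitive expression might be precompiled once and then reused. This is only a sketch: it assumes the unit is loaded with {{(use regex)}} in CHICKEN 4, and the name {{rx}} and the sample text are invented for the example.

<enscript highlight=scheme>
(use regex)   ; assumption: CHICKEN 4 style loading of the unit

;; Passing a true value as the second argument (IGNORECASE)
;; requests case-insensitive matching.
(define rx (regexp "hello" #t))

(string-search rx "Say HELLO")
; => ("HELLO")
</enscript>

Precompiling is mainly worthwhile when the same expression is applied to many strings.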


=== regexp*

 [procedure] (regexp* STRING [OPTIONS [TABLES]])

Returns a precompiled regular expression object for {{STRING}}. The optional
argument {{OPTIONS}} must be a list of option symbols. The optional argument
{{TABLES}} must be a character definitions table (not defined here).


Option Symbols:

; caseless : Character case insensitive match
; multiline : Equivalent to Perl's /m option
; dotall : Equivalent to Perl's /s option
; extended : Ignore whitespace
; anchored : Anchor pattern match
; dollar-endonly : `$' metacharacter in the pattern matches only at the end of the subject string
; extra : Currently of very little use
; notbol : First character of the string is not the beginning of a line
; noteol : End of the string is not the end of a line
; ungreedy : Inverts the "greediness" of the quantifiers so that they are not greedy by default
; notempty : The empty string is not considered to be a valid match
; utf8 : UTF-8 encoded characters
; no-auto-capture : Disables the use of numbered capturing parentheses
; no-utf8-check : Skip valid UTF-8 sequence check
; auto-callout : Automatically inserts callout items (not defined here)
; partial : Partial match ok
; firstline : An unanchored pattern is required to match before or at the first newline
; dupnames : Names used to identify capturing subpatterns need not be unique
; newline-cr : Newline definition is `\r'
; newline-lf : Newline definition is `\n'
; newline-crlf : Newline definition is `\r\n'
; newline-anycrlf : Newline definition is any of `\r', `\n', or `\r\n'
; newline-any : Newline definition is any Unicode newline sequence
; bsr-anycrlf : `\R' escape sequence matches only CR, LF, or CRLF
; bsr-unicode : `\R' escape sequence matches any Unicode newline sequence

; dfa-shortest : Currently unused
; dfa-restart : Currently unused


=== regexp?

 [procedure] (regexp? X)

Returns {{#t}} if {{X}} is a precompiled regular expression,
or {{#f}} otherwise.
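
A minimal sketch, assuming the unit is loaded with {{(use regex)}} in CHICKEN 4; note that a pattern given as a plain string is not a precompiled regular expression:

<enscript highlight=scheme>
(use regex)   ; assumption: CHICKEN 4 style loading of the unit

(regexp? (regexp "a+"))   ; a precompiled object
; => #t

(regexp? "a+")            ; just a string
; => #f
</enscript>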

=== string-match
=== string-match-positions

 [procedure] (string-match REGEXP STRING [START])
 [procedure] (string-match-positions REGEXP STRING [START])

Matches the regular expression {{REGEXP}} (a string or a precompiled
regular expression) against
{{STRING}} and returns {{#f}} if the match failed,
or a list of matching groups, where the first element is the complete
match. If the optional argument {{START}} is supplied, it specifies
the starting position in {{STRING}}.  For each matching group the
result list contains either: {{#f}} for a non-matching but optional
group; a list of the start and end positions of the match in {{STRING}}
(in the case of {{string-match-positions}}); or the matching
substring (in the case of {{string-match}}). Note that the whole string
must match. To search for a pattern inside a string, see {{string-search}} below.
Note also that {{string-match}} is implemented by calling
{{string-search}} with the regular expression wrapped in {{^ ... $}}.
If invoked with a precompiled regular expression argument (by using
{{regexp}}), {{string-match}} is identical to {{string-search}}.
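
The shape of the results can be sketched as follows, assuming the unit is loaded with {{(use regex)}} in CHICKEN 4. The patterns are passed as strings, and the sample inputs are made up for the example.

<enscript highlight=scheme>
(use regex)   ; assumption: CHICKEN 4 style loading of the unit

;; Complete match first, then one entry per parenthesized group.
(string-match "([0-9]+)-([0-9]+)" "10-20")
; => ("10-20" "10" "20")

;; Same match, reported as (START END) positions in the string.
(string-match-positions "([0-9]+)-([0-9]+)" "10-20")
; => ((0 5) (0 2) (3 5))

;; An optional group that does not take part in the match yields #f.
(string-match "(a)(b)?" "a")
; => ("a" "a" #f)

;; The whole string has to match, so extra leading text makes it fail.
(string-match "([0-9]+)-([0-9]+)" "range 10-20")
; => #f
</enscript>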


=== string-search
=== string-search-positions

 [procedure] (string-search REGEXP STRING [START [RANGE]])
 [procedure] (string-search-positions REGEXP STRING [START [RANGE]])

Searches for the first match of the regular expression
{{REGEXP}} in {{STRING}}. The search can be limited to
{{RANGE}} characters.
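
A short sketch of searching inside a larger string, assuming the unit is loaded with {{(use regex)}} in CHICKEN 4 (the sample string is invented; the result lists have the same layout as those of {{string-match}} and {{string-match-positions}}):

<enscript highlight=scheme>
(use regex)   ; assumption: CHICKEN 4 style loading of the unit

(define s "abc 123 def 456")

;; The first run of digits anywhere in the string.
(string-search "[0-9]+" s)
; => ("123")

;; The same match as (START END) positions.
(string-search-positions "[0-9]+" s)
; => ((4 7))

;; Begin the search at position 7, past the first number.
(string-search "[0-9]+" s 7)
; => ("456")
</enscript>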


=== string-split-fields

 [procedure] (string-split-fields REGEXP STRING [MODE [START]])

Splits {{STRING}} into a list of fields according to {{MODE}},
where {{MODE}} can be the keyword {{#:infix}} ({{REGEXP}}
matches field separator), the keyword {{#:suffix}} ({{REGEXP}}
matches field terminator) or {{#t}} ({{REGEXP}} matches field),
which is the default.

<enscript highlight=scheme>
(define s "this is a string 1, 2, 3,")

(string-split-fields "[^ ]+" s)

  => ("this" "is" "a" "string" "1," "2," "3,")

(string-split-fields " " s #:infix)

  => ("this" "is" "a" "string" "1," "2," "3,")

(string-split-fields "," s #:suffix)

  => ("this is a string 1" " 2" " 3")
</enscript>


=== string-substitute

 [procedure] (string-substitute REGEXP SUBST STRING [MODE])

Searches substrings in {{STRING}} that match {{REGEXP}}
and substitutes them with the string {{SUBST}}. The substitution
can contain references to subexpressions in
{{REGEXP}} with the {{\NUM}} notation, where {{NUM}}
refers to the NUMth parenthesized expression. The optional argument
{{MODE}} defaults to 1 and specifies the number of the match to
be substituted. Any non-numeric index specifies that all matches are to
be substituted.

<enscript highlight=scheme>
(string-substitute "([0-9]+) (eggs|chicks)"
                   "\\2 (\\1)" "99 eggs or 99 chicks" 2)
=> "99 eggs or chicks (99)"
</enscript>

Note that a regular expression that matches an empty string will
signal an error.
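
The effect of {{MODE}} can be sketched with a simpler pattern. This assumes the unit is loaded with {{(use regex)}} in CHICKEN 4, and the input string is made up for the example.

<enscript highlight=scheme>
(use regex)   ; assumption: CHICKEN 4 style loading of the unit

(define s "1 and 22 and 333")

;; Default MODE of 1: only the first match is replaced.
(string-substitute "[0-9]+" "N" s)
; => "N and 22 and 333"

;; MODE 2: only the second match is replaced.
(string-substitute "[0-9]+" "N" s 2)
; => "1 and N and 333"

;; A non-numeric MODE such as #t: every match is replaced.
(string-substitute "[0-9]+" "N" s #t)
; => "N and N and N"
</enscript>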


=== string-substitute*

 [procedure] (string-substitute* STRING SMAP [MODE])

Substitutes elements of {{STRING}} with {{string-substitute}} according to {{SMAP}}.
{{SMAP}} should be an association-list where each element of the list
is a pair of the form {{(MATCH . REPLACEMENT)}}. Every occurrence of
the regular expression {{MATCH}} in {{STRING}} will be replaced by the string
{{REPLACEMENT}}.

<enscript highlight=scheme>
(string-substitute* "<h1>Hello, world!</h1>"
                    '(("<[/A-Za-z0-9]+>" . "")))

=>  "Hello, world!"
</enscript>


=== regexp-escape

 [procedure] (regexp-escape STRING)

Escapes all special characters in {{STRING}} with {{\}}, so that the string can be embedded
into a regular expression.

<enscript highlight=scheme>
(regexp-escape "^[0-9]+:.*$")
=>  "\\^\\[0-9\\]\\+:\\.\\*\\$"
</enscript>


Previous: [[Unit match]]

Next: [[Unit srfi-18]]