Last change on this file since 9381 was 9381, checked in by Ivan Raikov, 12 years ago

Merged trunk into prerelease

[[tags: manual]]
[[toc:]]

== Unit regex

This library unit provides support for regular expressions. The regular
expression package used is {{PCRE}} (''Perl Compatible Regular Expressions'')
written by Philip Hazel. See [[http://www.pcre.org]] for information about
the particular regexp flavor and extensions provided by this library.

To test that PCRE support has been built into Chicken properly, try:

<enscript highlight=scheme>
(require 'regex)
(test-feature? 'pcre) => #t
</enscript>


=== grep

 [procedure] (grep REGEX LIST)

Returns all items of {{LIST}} that match the regular expression
{{REGEX}}.  This procedure could be defined as follows:

<enscript highlight=scheme>
(define (grep regex lst)
  (filter (lambda (x) (string-search regex x)) lst) )
</enscript>
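
A short usage sketch for {{grep}}, following the definition given above. The list of names is made up for illustration.

<enscript highlight=scheme>
(use regex)

;; Sample data (hypothetical).
(define fruits '("apple" "banana" "avocado" "cherry"))

;; Keep only the items in which the pattern finds a match.
(define a-fruits (grep "^a" fruits))
;; a-fruits => ("apple" "avocado")
</enscript>

Because the definition relies on {{string-search}}, an item is kept as soon as the pattern matches anywhere inside it, which is why the example anchors the pattern with {{^}}.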


=== glob->regexp

 [procedure] (glob->regexp PATTERN)

Converts the file-pattern {{PATTERN}} into a regular expression.

<enscript highlight=scheme>
(glob->regexp "foo.*")
=> "foo\..*"
</enscript>

{{PATTERN}} should follow "glob" syntax. Allowed wildcards are

 *
 [C...]
 [C1-C2]
 [-C...]
 ?
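
As a sketch of how the converted expression might be used, the following filters a list of file names against a glob. The helper name {{glob-filter}} and the sample names are made up for illustration, and {{filter}} is assumed to come from SRFI 1.

<enscript highlight=scheme>
(use regex srfi-1)

;; Hypothetical helper: keep the names that match the glob PATTERN.
(define (glob-filter pattern names)
  (let ((rx (glob->regexp pattern)))   ; convert once, reuse for every name
    (filter (lambda (name) (string-match rx name)) names)))

(define scheme-files
  (glob-filter "*.scm" '("main.scm" "README" "util.scm")))
</enscript>

Converting the glob once outside the {{filter}} call avoids repeating the conversion for every name.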


=== glob?

 [procedure] (glob? STRING)

Returns true if {{STRING}} contains any "glob" wildcards, and {{#f}} otherwise.

A string without any "glob" wildcards does not meet this criterion,
even though it is technically a valid "glob" file-pattern.
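
A minimal illustration of the distinction drawn above, with made-up file names:

<enscript highlight=scheme>
(use regex)

;; Contains the * wildcard, so it counts as a glob.
(define wild (glob? "*.txt"))

;; A valid file-pattern, but it has no wildcards, so the result is false.
(define plain (glob? "notes.txt"))
</enscript>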


=== regex-chardef-table?

 [procedure] (regex-chardef-table? OBJECT)

Returns {{#t}} if the {{OBJECT}} is a {{character definitions table}}, and
{{#f}} otherwise.


=== regex-chardef-table

 [procedure] (regex-chardef-table)

Returns a new {{character definitions table}}.


=== regexp

 [procedure] (regexp STRING [IGNORECASE [IGNORESPACE [UTF8]]])

Returns a precompiled regular expression object for {{STRING}}.
The optional arguments {{IGNORECASE}}, {{IGNORESPACE}} and {{UTF8}}
specify, respectively, whether case differences are ignored when matching,
whether whitespace differences are ignored, and whether the string is
treated as containing UTF-8 encoded characters.
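
The following sketch compiles a pattern once with {{IGNORECASE}} set and reuses the object for several searches. The variable names and sample strings are illustrative only, and the result of a failed search is assumed to be {{#f}}, as it is for {{string-match}}.

<enscript highlight=scheme>
(use regex)

;; Compile once, passing #t for IGNORECASE.
(define hello-rx (regexp "hello" #t))

;; The precompiled object can be passed wherever a REGEXP is accepted.
(define hit  (string-search hello-rx "Say HELLO to everyone"))
(define miss (string-search hello-rx "goodbye"))
;; hit is expected to be ("HELLO"), miss is expected to be #f
</enscript>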


=== regexp*

 [procedure] (regexp* STRING [OPTIONS [CHARDEFS-TABLE]])

Returns a precompiled regular expression object for {{STRING}}. The optional
argument {{OPTIONS}} must be a list of option symbols. The optional argument
{{CHARDEFS-TABLE}} must be a character definitions table.


==== Option Symbols:

; caseless : Character case insensitive match
; multiline : Equivalent to Perl's /m option
; dotall : Equivalent to Perl's /s option
; extended : Ignore whitespace in the pattern (equivalent to Perl's /x option)
; anchored : Anchor pattern match
; dollar-endonly : `$' metacharacter in the pattern matches only at the end of the subject string
; extra : Currently of very little use
; notbol : First character of the string is not the beginning of a line
; noteol : End of the string is not the end of a line
; ungreedy : Inverts the "greediness" of the quantifiers so that they are not greedy by default
; notempty : The empty string is not considered to be a valid match
; utf8 : UTF-8 encoded characters
; no-auto-capture : Disables the use of numbered capturing parentheses
; no-utf8-check : Skip valid UTF-8 sequence check
; auto-callout : Automatically inserts callout items (not defined here)
; partial : Partial match ok
; firstline : An unanchored pattern is required to match before or at the first newline
; dupnames : Names used to identify capturing subpatterns need not be unique
; newline-cr : Newline definition is `\r'
; newline-lf : Newline definition is `\n'
; newline-crlf : Newline definition is `\r\n'
; newline-anycrlf : Newline definition is any of `\r', `\n', or `\r\n'
; newline-any : Newline definition is any Unicode newline sequence
; bsr-anycrlf : `\R' escape sequence matches only CR, LF, or CRLF
; bsr-unicode : `\R' escape sequence matches any Unicode newline sequence

; dfa-shortest : Currently unused
; dfa-restart : Currently unused
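
A hedged sketch of passing option symbols to {{regexp*}}. The pattern and the subject string are invented for illustration, and the exact match result is deliberately not spelled out, since it depends on how the options are applied.

<enscript highlight=scheme>
(use regex)

;; OPTIONS is a list of symbols taken from the table above.
(define total-rx
  (regexp* "^total: ([0-9]+)$" '(caseless multiline)))

;; With caseless and multiline in effect, the second line of this
;; subject is the intended target of the search.
(define total-match
  (string-search total-rx "Items: 3\nTOTAL: 42\n"))
</enscript>

Collecting the options in a list keeps the call readable when several are combined, compared with the positional flags taken by {{regexp}}.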


=== regexp?

 [procedure] (regexp? X)

Returns {{#t}} if {{X}} is a precompiled regular expression,
or {{#f}} otherwise.


=== regexp-optimize

 [procedure] (regexp-optimize RX)

Performs the available optimizations for the precompiled regular expression {{RX}}.
Returns {{#t}} when an optimization was performed, and {{#f}} otherwise.
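
A small sketch that combines the two procedures: the predicate separates precompiled objects from pattern strings, and the optimization is requested before the expression is reused. All names are illustrative.

<enscript highlight=scheme>
(use regex)

(define digits-rx (regexp "[0-9]+"))

(define compiled?       (regexp? digits-rx))  ; #t, a precompiled object
(define string-compiled? (regexp? "[0-9]+"))  ; #f, a plain pattern string

;; Ask for optimization; the boolean result only reports whether
;; anything was done, so it is safe to ignore.
(regexp-optimize digits-rx)

(define digits-hit (string-search digits-rx "room 101"))
</enscript>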


=== string-match
=== string-match-positions

 [procedure] (string-match REGEXP STRING [START])
 [procedure] (string-match-positions REGEXP STRING [START])

Matches the regular expression in {{REGEXP}} (a string or a precompiled
regular expression) with {{STRING}} and returns either {{#f}} if the
match failed, or a list of matching groups, where the first element is
the complete match. If the optional argument {{START}} is supplied, it
specifies the starting position in {{STRING}}.  For each matching group
the result-list contains either: {{#f}} for a non-matching but optional
group; a list of start- and end-position of the match in {{STRING}}
(in the case of {{string-match-positions}}); or the matching
substring (in the case of {{string-match}}). Note that the whole string
must match the pattern. For searching a pattern inside a string, see
{{string-search}} below.
Note also that {{string-match}} is implemented by calling
{{string-search}} with the regular expression wrapped in {{^ ... $}}.
If invoked with a precompiled regular expression argument (by using
{{regexp}}), {{string-match}} is identical to {{string-search}}.
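
The result shapes described above can be illustrated with a pattern given as a string, so that the {{^ ... $}} wrapping applies. The pattern and the sample address are made up for this sketch.

<enscript highlight=scheme>
(use regex)

(define address-pattern "([a-z]+)@([a-z.]+)")

;; Complete match first, then one entry per parenthesized group.
(define groups
  (string-match address-pattern "user@example.org"))
;; groups => ("user@example.org" "user" "example.org")

;; Same groups, reported as (START END) positions.
(define positions
  (string-match-positions address-pattern "user@example.org"))
;; positions => ((0 16) (0 4) (5 16))

;; Only a prefix matches here, so the whole-string match fails.
(define partial (string-match "[a-z]+" "abc123"))
;; partial => #f
</enscript>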


=== string-search
=== string-search-positions

 [procedure] (string-search REGEXP STRING [START [RANGE]])
 [procedure] (string-search-positions REGEXP STRING [START [RANGE]])

Searches {{STRING}} for the first match of the regular expression in
{{REGEXP}}. If the optional argument {{START}} is supplied, it specifies
the starting position in {{STRING}}. The search can be limited to
{{RANGE}} characters.
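
A brief sketch of searching inside a string. The result shapes are assumed to mirror those documented for {{string-match}} and {{string-match-positions}}; the sample strings are illustrative.

<enscript highlight=scheme>
(use regex)

;; The pattern may match anywhere in the subject.
(define found (string-search "[0-9]+" "abc123def"))
;; found => ("123")

(define where (string-search-positions "[0-9]+" "abc123def"))
;; where => ((3 6))

(define none (string-search "[0-9]+" "no digits here"))
;; none => #f
</enscript>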


=== string-split-fields

 [procedure] (string-split-fields REGEXP STRING [MODE [START]])

Splits {{STRING}} into a list of fields according to {{MODE}},
where {{MODE}} can be the keyword {{#:infix}} ({{REGEXP}}
matches field separator), the keyword {{#:suffix}} ({{REGEXP}}
matches field terminator) or {{#t}} ({{REGEXP}} matches field),
which is the default.

<enscript highlight=scheme>
(define s "this is a string 1, 2, 3,")

(string-split-fields "[^ ]+" s)

  => ("this" "is" "a" "string" "1," "2," "3,")

(string-split-fields " " s #:infix)

  => ("this" "is" "a" "string" "1," "2," "3,")

(string-split-fields "," s #:suffix)

  => ("this is a string 1" " 2" " 3")
</enscript>


=== string-substitute

 [procedure] (string-substitute REGEXP SUBST STRING [MODE])

Searches for substrings in {{STRING}} that match {{REGEXP}}
and substitutes them with the string {{SUBST}}. The substitution
can contain references to subexpressions in
{{REGEXP}} with the {{\NUM}} notation, where {{NUM}}
refers to the NUMth parenthesized expression. The optional argument
{{MODE}} defaults to 1 and specifies the number of the match to
be substituted. Any non-numeric index specifies that all matches are to
be substituted.

<enscript highlight=scheme>
(string-substitute "([0-9]+) (eggs|chicks)"
                   "\\2 (\\1)" "99 eggs or 99 chicks" 2)
=> "99 eggs or chicks (99)"
</enscript>

Note that a regular expression that matches an empty string will
signal an error.
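
To contrast the default {{MODE}} with a non-numeric one, the sketch below replaces first one match and then all of them. The sample text is invented, and {{#t}} is used here simply as one possible non-numeric value.

<enscript highlight=scheme>
(use regex)

(define text "1 and 22 and 333")

;; MODE defaults to 1: only the first match is replaced.
(define first-only (string-substitute "[0-9]+" "N" text))
;; first-only => "N and 22 and 333"

;; A non-numeric MODE replaces every match.
(define all (string-substitute "[0-9]+" "N" text #t))
;; all => "N and N and N"
</enscript>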


=== string-substitute*

 [procedure] (string-substitute* STRING SMAP [MODE])

Substitutes elements of {{STRING}} with {{string-substitute}} according to {{SMAP}}.
{{SMAP}} should be an association-list where each element of the list
is a pair of the form {{(MATCH . REPLACEMENT)}}. Every occurrence of
the regular expression {{MATCH}} in {{STRING}} will be replaced by the string
{{REPLACEMENT}}.

<enscript highlight=scheme>
(string-substitute* "<h1>Hello, world!</h1>"
                    '(("<[/A-Za-z0-9]+>" . "")))

=>  "Hello, world!"
</enscript>
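
An association list with more than one pair might look as follows. The words and their replacements are arbitrary examples.

<enscript highlight=scheme>
(use regex)

;; Each pair is (MATCH . REPLACEMENT); MATCH is a regular expression.
(define swaps '(("black" . "white")
                ("cat"   . "dog")))

(define swapped (string-substitute* "the black cat" swaps))
;; swapped => "the white dog"
</enscript>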


=== regexp-escape

 [procedure] (regexp-escape STRING)

Escapes all special characters in {{STRING}} with {{\}}, so that the string can be embedded
into a regular expression.

<enscript highlight=scheme>
(regexp-escape "^[0-9]+:.*$")
=>  "\\^\\[0-9\\]\\+:\\.\\*\\$"
</enscript>
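
One typical use, sketched here with made-up strings, is searching for text that happens to contain regular expression metacharacters:

<enscript highlight=scheme>
(use regex)

;; Unescaped, "1+1" means "one or more 1s followed by a 1",
;; which does not occur in the subject.
(define as-regex (string-search "1+1" "what is 1+1?"))
;; as-regex => #f

;; Escaped, the + is matched literally.
(define as-literal
  (string-search (regexp-escape "1+1") "what is 1+1?"))
;; as-literal => ("1+1")
</enscript>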


=== make-anchored-pattern

 [procedure] (make-anchored-pattern REGEXP [WITHOUT-BOL [WITHOUT-EOL]])

Makes an anchored pattern from {{REGEXP}} (a string or a precompiled regular
expression) and returns the updated pattern. When {{WITHOUT-BOL}} is {{#t}} the
beginning-of-line anchor is not added. When {{WITHOUT-EOL}} is {{#t}} the
end-of-line anchor is not added.

The {{WITHOUT-BOL}} and {{WITHOUT-EOL}} arguments are ignored for a precompiled regular
expression.
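
A sketch of the effect of anchoring, using a string pattern so that the optional arguments apply. The exact text of the returned pattern is not assumed here; only its matching behavior is shown, with invented subject strings.

<enscript highlight=scheme>
(use regex)

;; Anchored at both ends: the whole subject has to consist of digits.
(define all-digits (make-anchored-pattern "[0-9]+"))
(define whole   (string-search all-digits "12345"))   ; matches
(define partial (string-search all-digits "abc123"))  ; #f

;; WITHOUT-BOL is #t: only the end-of-line anchor is added,
;; so a run of digits at the end of the subject is enough.
(define digits-at-end (make-anchored-pattern "[0-9]+" #t))
(define suffix (string-search digits-at-end "abc123"))
</enscript>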


Previous: [[Unit match]]

Next: [[Unit srfi-18]]