Last change on this file since 13142 was 13142, checked in by Jim Ursetto, 11 years ago

Changes applied for zbigniew (71.201.84.72) through svnwiki:

admonishment to use OR instead of vertical pipe in SRE syntax

[[tags: manual]]
[[toc:]]

== Unit regex

This library unit provides support for regular expressions. The
regular expression package used is {{irregex}}, written by Alex
Shinn. See [[http://synthcode.com/scheme/irregex/]] for information
about the particular regexp flavor and extensions provided by this
library.

To test that {{irregex}} support has been built into Chicken properly,
try:

<enscript highlight=scheme>
(require 'regex)
(feature? 'irregex) => #t
</enscript>

Note on SRE syntax: instead of {{(| <sre> ...)}}, use {{(or <sre> ...)}}. The vertical pipe character {{|}} is handled specially by the Chicken reader and cannot be used on its own as a symbol.

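As a minimal sketch of the SRE note, the following passes an alternation written with {{or}} to {{string-search}}. It assumes that the procedures of this unit accept an SRE wherever they accept a regexp string, as {{irregex}} itself does; the sample strings are made up for illustration.

<enscript highlight=scheme>
(require 'regex)

;; Alternation in SRE syntax: write `or', not `|'.
;; Assumption: string-search accepts an SRE in place of a regexp string.
(string-search '(or "cat" "dog") "hot dog")
</enscript>
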
=== grep

 [procedure] (grep REGEX LIST)

Returns all items of {{LIST}} that match the regular expression
{{REGEX}}. This procedure could be defined as follows:

<enscript highlight=scheme>
(define (grep regex lst)
  (filter (lambda (x) (string-search regex x)) lst) )
</enscript>

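For illustration, here is a call on a made-up list of strings. The result follows from the definition of {{grep}} given above, since {{string-search}} finds the pattern anywhere inside an item.

<enscript highlight=scheme>
(require 'regex)

;; Keep only the items that contain a digit.
(grep "[0-9]" '("alpha" "beta2" "gamma" "d3lta"))
; => ("beta2" "d3lta")
</enscript>
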
=== glob->regexp

 [procedure] (glob->regexp PATTERN)

Converts the file-pattern {{PATTERN}} into a regular expression.

<enscript highlight=scheme>
(glob->regexp "foo.*")
=> "foo\..*"
</enscript>

{{PATTERN}} should follow "glob" syntax. Allowed wildcards are:

 *
 [C...]
 [C1-C2]
 [-C...]
 ?

=== glob?

 [procedure] (glob? STRING)

Returns true if {{STRING}} contains any "glob" wildcards, and {{#f}} otherwise.

A string without any "glob" wildcards yields {{#f}},
even though it is technically a valid "glob" file-pattern.

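A short sketch of the distinction drawn by {{glob?}}, using made-up file names:

<enscript highlight=scheme>
(require 'regex)

(glob? "*.scm")     ; => #t, contains the wildcard `*'
(glob? "main.scm")  ; => #f, no wildcards, although a valid file-pattern
</enscript>
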
=== regexp

 [procedure] (regexp STRING [IGNORECASE [IGNORESPACE [UTF8]]])

Returns a precompiled regular expression object for {{STRING}}.
The optional arguments {{IGNORECASE}}, {{IGNORESPACE}} and {{UTF8}}
specify, respectively, whether the regular expression should be matched
with case differences ignored, whether it should be matched with
whitespace differences ignored, and whether the string should be
treated as containing UTF-8 encoded characters.

=== regexp?

 [procedure] (regexp? X)

Returns {{#t}} if {{X}} is a precompiled regular expression,
or {{#f}} otherwise.

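The procedures {{regexp}} and {{regexp?}} can be sketched together as follows. The pattern and subject string are made up; with {{IGNORECASE}} set, the final search is expected to succeed despite the difference in case.

<enscript highlight=scheme>
(require 'regex)

;; Precompile once, with IGNORECASE set to #t.
(define rx (regexp "hello" #t))

(regexp? rx)       ; => #t
(regexp? "hello")  ; => #f, a plain string is not precompiled

;; The precompiled object can be passed wherever a REGEXP is expected.
(string-search rx "Say HELLO")
</enscript>
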
=== string-match
=== string-match-positions

 [procedure] (string-match REGEXP STRING)
 [procedure] (string-match-positions REGEXP STRING)

Matches the regular expression in {{REGEXP}} (a string or a
precompiled regular expression) against {{STRING}} and returns either
{{#f}} if the match failed, or a list of matching groups, where the
first element is the complete match. For each matching group the
result list contains either: {{#f}} for a non-matching but optional
group; a list of the start and end positions of the match in {{STRING}}
(in the case of {{string-match-positions}}); or the matching substring
(in the case of {{string-match}}). Note that the whole of {{STRING}}
must match. To search for a pattern inside a string, see
{{string-search}} below. Note
also that {{string-match}} is implemented by calling {{string-search}}
with the regular expression wrapped in {{^ ... $}}. If invoked with a
precompiled regular expression argument (by using {{regexp}}),
{{string-match}} is identical to {{string-search}}.

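The result shapes of {{string-match}} and {{string-match-positions}} can be sketched as follows; the pattern and strings are made up for illustration.

<enscript highlight=scheme>
(require 'regex)

;; The whole string matches: complete match first, then each group.
(string-match "([a-z]+)([0-9]+)" "abc123")
; => ("abc123" "abc" "123")

;; The same match, reported as (START END) positions.
(string-match-positions "([a-z]+)([0-9]+)" "abc123")
; => ((0 6) (0 3) (3 6))

;; The pattern covers only part of the string, so the match fails.
(string-match "[0-9]+" "abc123")
; => #f
</enscript>
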
=== string-search
=== string-search-positions

 [procedure] (string-search REGEXP STRING [START [RANGE]])
 [procedure] (string-search-positions REGEXP STRING [START [RANGE]])

Searches for the first match of the regular expression in
{{REGEXP}} within {{STRING}}. The search begins at index {{START}}
(default 0) and can be limited to {{RANGE}} characters. The result
has the same form as that of {{string-match}} and
{{string-match-positions}}, respectively.

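A minimal sketch of searching inside a string, on a made-up subject string. It assumes that {{START}} is a zero-based index into {{STRING}} at which the search begins.

<enscript highlight=scheme>
(require 'regex)

(define s "abc 123 def 456")

;; First match anywhere in the string.
(string-search "[0-9]+" s)
; => ("123")

;; The same match as (START END) positions.
(string-search-positions "[0-9]+" s)
; => ((4 7))

;; Assumption: START = 7 skips past the first number.
(string-search "[0-9]+" s 7)
; => ("456")
</enscript>
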
=== string-split-fields

 [procedure] (string-split-fields REGEXP STRING [MODE [START]])

Splits {{STRING}} into a list of fields according to {{MODE}},
where {{MODE}} can be the keyword {{#:infix}} ({{REGEXP}}
matches field separator), the keyword {{#:suffix}} ({{REGEXP}}
matches field terminator) or {{#t}} ({{REGEXP}} matches field),
which is the default.

<enscript highlight=scheme>
(define s "this is a string 1, 2, 3,")

(string-split-fields "[^ ]+" s)

  => ("this" "is" "a" "string" "1," "2," "3,")

(string-split-fields " " s #:infix)

  => ("this" "is" "a" "string" "1," "2," "3,")

(string-split-fields "," s #:suffix)

  => ("this is a string 1" " 2" " 3")
</enscript>

=== string-substitute

 [procedure] (string-substitute REGEXP SUBST STRING [MODE])

Searches for substrings in {{STRING}} that match {{REGEXP}}
and substitutes them with the string {{SUBST}}. The substitution
can contain references to subexpressions in
{{REGEXP}} with the {{\NUM}} notation, where {{NUM}}
refers to the NUMth parenthesized expression. The optional argument
{{MODE}} defaults to 1 and specifies the number of the match to
be substituted. Any non-numeric index specifies that all matches are to
be substituted.

<enscript highlight=scheme>
(string-substitute "([0-9]+) (eggs|chicks)"
                   "\\2 (\\1)" "99 eggs or 99 chicks" 2)
=> "99 eggs or chicks (99)"
</enscript>

Note that a regular expression that matches an empty string will
signal an error.

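The effect of {{MODE}} can be sketched on a made-up string: with the default of 1 only the first match is replaced, while a non-numeric value such as {{#t}} replaces every match.

<enscript highlight=scheme>
(require 'regex)

;; Default MODE of 1: only the first match is replaced.
(string-substitute "[0-9]+" "N" "1 fish, 22 fish")
; => "N fish, 22 fish"

;; Non-numeric MODE: every match is replaced.
(string-substitute "[0-9]+" "N" "1 fish, 22 fish" #t)
; => "N fish, N fish"
</enscript>
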
=== string-substitute*

 [procedure] (string-substitute* STRING SMAP [MODE])

Substitutes elements of {{STRING}} with {{string-substitute}} according to {{SMAP}}.
{{SMAP}} should be an association list where each element of the list
is a pair of the form {{(MATCH . REPLACEMENT)}}. Every occurrence of
the regular expression {{MATCH}} in {{STRING}} will be replaced by the string
{{REPLACEMENT}}.

<enscript highlight=scheme>
(string-substitute* "<h1>Hello, world!</h1>"
                    '(("<[/A-Za-z0-9]+>" . "")))

=>  "Hello, world!"
</enscript>

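An {{SMAP}} may hold several pairs. As a small sketch on a made-up string, the following replaces two different characters in one call:

<enscript highlight=scheme>
(require 'regex)

;; Each pair maps a regular expression to its replacement string.
(string-substitute* "1 < 2 and 3 > 2"
                    '(("<" . "&lt;")
                      (">" . "&gt;")))
; => "1 &lt; 2 and 3 &gt; 2"
</enscript>
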
=== regexp-escape

 [procedure] (regexp-escape STRING)

Escapes all special characters in {{STRING}} with {{\}}, so that the string can be embedded
into a regular expression.

<enscript highlight=scheme>
(regexp-escape "^[0-9]+:.*$")
=>  "\\^\\[0-9\\]\\+:\\.\\*\\$"
</enscript>

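A typical use of {{regexp-escape}} is to search for literal text that happens to contain special characters. The following sketch, on a made-up string, contrasts an unescaped and an escaped pattern:

<enscript highlight=scheme>
(require 'regex)

;; Unescaped, "." matches any character, so "a.c" also finds "abc".
(string-search "a.c" "abc a.c")
; => ("abc")

;; Escaped, only the literal text "a.c" matches.
(string-search (regexp-escape "a.c") "abc a.c")
; => ("a.c")
</enscript>
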
Previous: [[Unit match]]

Next: [[Unit srfi-18]]