Last change on this file since 2905 was 2905, checked in by felix winkelmann, 14 years ago

manual updates

[[tags: manual]]
[[toc:]]

== Unit regex

This library unit provides support for regular expressions. The regular
expression package used is {{PCRE}} (''Perl Compatible Regular Expressions'')
written by Philip Hazel. See [[http://www.pcre.org]] for information about
the particular regexp flavor and extensions provided by this library.


=== grep

 [procedure] (grep REGEX LIST)

Returns all items of {{LIST}} that match the regular expression
{{REGEX}}.  This procedure could be defined as follows:

<enscript highlight=scheme>
(define (grep regex lst)
  (filter (lambda (x) (string-search regex x)) lst) )
</enscript>
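
A short usage sketch, assuming the unit has been loaded with {{(require-extension regex)}}; the word list is made up for illustration. Since the definition above relies on {{string-search}}, an item is kept whenever the pattern matches anywhere inside it:

<enscript highlight=scheme>
(require-extension regex)

;; Hypothetical sample data.
(define words '("apple" "banana" "cherry" "avocado"))

;; Keep the items that start with "a".
(grep "^a" words)
;; => ("apple" "avocado")

;; An unanchored pattern may match anywhere in the item.
(grep "an" words)
;; => ("banana")
</enscript>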


=== glob->regexp

 [procedure] (glob->regexp PATTERN)

Converts the file-pattern {{PATTERN}} into a regular expression.

<enscript highlight=scheme>
(glob->regexp "foo.*")
=> "foo\\..*"
</enscript>
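
The converted pattern can be handed to the matching procedures of this unit. The following is a sketch only: the file names are invented, {{filter}} is assumed to come from {{srfi-1}}, and {{string-match}} is used so that the whole file name has to fit the pattern:

<enscript highlight=scheme>
(require-extension regex srfi-1)

;; Hypothetical file names.
(define files '("main.scm" "notes.txt" "main.scm.bak"))

;; Convert the glob once and reuse the result.
(define scheme-file (glob->regexp "*.scm"))

;; string-match requires the complete name to match,
;; so "main.scm.bak" is rejected.
(filter (lambda (f) (string-match scheme-file f)) files)
;; => ("main.scm")
</enscript>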


=== regexp

 [procedure] (regexp STRING [IGNORECASE [IGNORESPACE [UTF8]]])

Returns a precompiled regular expression object for {{STRING}}.
The optional arguments {{IGNORECASE}}, {{IGNORESPACE}} and {{UTF8}}
specify, respectively, whether the regular expression should be matched
with case differences ignored, whether it should be matched with
whitespace differences ignored, and whether the string should be treated
as containing UTF-8 encoded characters.


=== regexp?

 [procedure] (regexp? X)

Returns {{#t}} if {{X}} is a precompiled regular expression,
or {{#f}} otherwise.
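
A minimal sketch of precompiling a pattern with {{regexp}} and testing the result with {{regexp?}}. The name {{hello-rx}} is illustrative, and the result shown for {{string-search}} assumes it uses the same list-of-groups format described for {{string-match}}:

<enscript highlight=scheme>
(require-extension regex)

;; Compile once, passing #t for IGNORECASE.
(define hello-rx (regexp "hello" #t))

(regexp? hello-rx)   ; => #t
(regexp? "hello")    ; => #f, a plain string is not precompiled

;; The precompiled object can be used wherever REGEXP is expected.
(string-search hello-rx "Say HELLO")
;; => ("HELLO")
</enscript>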


=== string-match
=== string-match-positions

 [procedure] (string-match REGEXP STRING [START])
 [procedure] (string-match-positions REGEXP STRING [START])

Matches the regular expression in {{REGEXP}} (a string or a precompiled
regular expression) against {{STRING}} and returns either {{#f}} if the
match failed, or a list of matching groups, where the first element is the
complete match. If the optional argument {{START}} is supplied, it specifies
the starting position in {{STRING}}.  For each matching group the
result list contains either: {{#f}} for a non-matching but optional
group; a list of the start and end position of the match in {{STRING}}
(in the case of {{string-match-positions}}); or the matching
substring (in the case of {{string-match}}). Note that the whole string
must match. To search for a pattern inside a string, see
{{string-search}} below.
Note also that {{string-match}} is implemented by calling
{{string-search}} with the regular expression wrapped in {{^ ... $}}.
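
The difference between the two procedures is easiest to see on one input. The patterns and strings below are illustrative only, and the end positions are assumed to be exclusive:

<enscript highlight=scheme>
(require-extension regex)

;; Substrings: the complete match first, then each group.
(string-match "([a-z]+)-([0-9]+)" "abc-123")
;; => ("abc-123" "abc" "123")

;; The same match, reported as (START END) pairs.
(string-match-positions "([a-z]+)-([0-9]+)" "abc-123")
;; => ((0 7) (0 3) (4 7))

;; An optional group that takes no part in the match yields #f.
(string-match "(a)?b" "b")
;; => ("b" #f)

;; The pattern has to cover the whole string.
(string-match "[0-9]+" "abc-123")
;; => #f
</enscript>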


=== string-search
=== string-search-positions

 [procedure] (string-search REGEXP STRING [START [RANGE]])
 [procedure] (string-search-positions REGEXP STRING [START [RANGE]])

Searches for the first match of the regular expression {{REGEXP}}
in {{STRING}}. If the optional argument {{START}} is supplied, the
search begins at that position. The search can be limited to
{{RANGE}} characters.
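
A brief sketch of searching inside a string. It assumes the results have the same shape as those of {{string-match}} and {{string-match-positions}}; the sample text is invented:

<enscript highlight=scheme>
(require-extension regex)

;; The first match anywhere in the string.
(string-search "[0-9]+" "abc 123 def 456")
;; => ("123")

;; The same match as a (START END) pair.
(string-search-positions "[0-9]+" "abc 123 def 456")
;; => ((4 7))

;; With START, the search begins later in the string.
(string-search "[0-9]+" "abc 123 def 456" 7)
;; => ("456")
</enscript>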


=== string-split-fields

 [procedure] (string-split-fields REGEXP STRING [MODE [START]])

Splits {{STRING}} into a list of fields according to {{MODE}},
where {{MODE}} can be the keyword {{#:infix}} ({{REGEXP}}
matches field separator), the keyword {{#:suffix}} ({{REGEXP}}
matches field terminator) or {{#t}} ({{REGEXP}} matches field),
which is the default.

<enscript highlight=scheme>
(define s "this is a string 1, 2, 3,")

(string-split-fields "[^ ]+" s)

  => ("this" "is" "a" "string" "1," "2," "3,")

(string-split-fields " " s #:infix)

  => ("this" "is" "a" "string" "1," "2," "3,")

(string-split-fields "," s #:suffix)

  => ("this is a string 1" " 2" " 3")
</enscript>


=== string-substitute

 [procedure] (string-substitute REGEXP SUBST STRING [MODE])

Searches for substrings in {{STRING}} that match {{REGEXP}}
and substitutes them with the string {{SUBST}}. The substitution
can contain references to subexpressions in
{{REGEXP}} with the {{\NUM}} notation, where {{NUM}}
refers to the NUMth parenthesized expression. The optional argument
{{MODE}} defaults to 1 and specifies the number of the match to
be substituted. Any non-numeric index specifies that all matches are to
be substituted.

<enscript highlight=scheme>
(string-substitute "([0-9]+) (eggs|chicks)"
                   "\\2 (\\1)" "99 eggs or 99 chicks" 2)
=> "99 eggs or chicks (99)"
</enscript>
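
The effect of {{MODE}} can be sketched with a simpler pattern. The strings are illustrative, and {{#t}} stands in for "any non-numeric index":

<enscript highlight=scheme>
(require-extension regex)

;; Default MODE of 1: only the first match is replaced.
(string-substitute "[0-9]+" "N" "1 and 22 and 333")
;; => "N and 22 and 333"

;; A non-numeric MODE replaces every match.
(string-substitute "[0-9]+" "N" "1 and 22 and 333" #t)
;; => "N and N and N"

;; \1 and \2 refer to the parenthesized groups; the backslash
;; is doubled inside a Scheme string literal.
(string-substitute "([a-z]+) ([a-z]+)" "\\2 \\1" "hello world")
;; => "world hello"
</enscript>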


=== string-substitute*

 [procedure] (string-substitute* STRING SMAP [MODE])

Substitutes elements of {{STRING}} with {{string-substitute}} according to {{SMAP}}.
{{SMAP}} should be an association list where each element of the list
is a pair of the form {{(MATCH . REPLACEMENT)}}. Every occurrence of
the regular expression {{MATCH}} in {{STRING}} will be replaced by the string
{{REPLACEMENT}}.

<enscript highlight=scheme>
(string-substitute* "<h1>Hello, world!</h1>"
                    '(("<[/A-Za-z0-9]+>" . "")))

=>  "Hello, world!"
</enscript>
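
An {{SMAP}} with several pairs might look like the following sketch, where the input string and the replacements are made up for illustration:

<enscript highlight=scheme>
(require-extension regex)

;; Each pair is (MATCH . REPLACEMENT); every occurrence is replaced.
(string-substitute* "the black cat saw a black cat"
                    '(("black" . "white")
                      ("cat"   . "dog")))
;; => "the white dog saw a white dog"
</enscript>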


=== regexp-escape

 [procedure] (regexp-escape STRING)

Escapes all special characters in {{STRING}} with {{\}}, so that the string can be embedded
into a regular expression.

<enscript highlight=scheme>
(regexp-escape "^[0-9]+:.*$")
=>  "\\^\\[0-9\\]\\+:\\.\\*\\$"
</enscript>
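
Escaping matters when literal text is used as a pattern. In this sketch {{user-input}} is a hypothetical value, and the results assume {{string-search}} returns a list of matching groups or {{#f}}:

<enscript highlight=scheme>
(require-extension regex)

;; Hypothetical text that should be found literally.
(define user-input "1+1")

;; Unescaped, "+" acts as a quantifier, so the pattern looks for
;; two or more consecutive "1" characters and finds nothing.
(string-search user-input "what is 1+1?")
;; => #f

;; Escaped, the "+" is matched as an ordinary character.
(string-search (regexp-escape user-input) "what is 1+1?")
;; => ("1+1")
</enscript>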


Previous: [[Unit match]]

Next: [[Unit srfi-18]]