Last change on this file since 1713 was 1713, checked in by felix winkelmann, 15 years ago

wiki updates, added missing chicken files

File size: 5.4 KB
Line 
[[tags: manual]]
[[toc:]]

== Unit regex

This library unit provides support for regular expressions. The flavor depends on
the particular installation platform:

* On UNIX systems that have PCRE (the Perl Compatible Regular Expression package) installed, PCRE is used.
* If PCRE is not available, and the C library provides regular expressions, these are used instead.
* On Windows (or if PCRE and libc regexes are not available), Dorai Sitaram's portable {{pregexp}} library is used.


=== grep

 [procedure] (grep REGEX LIST)

Returns all items of {{LIST}} that match the regular expression
{{REGEX}}.  This procedure could be defined as follows:

<enscript highlight=scheme>
(define (grep regex lst)
  (filter (lambda (x) (string-search regex x)) lst) )
</enscript>

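As a usage sketch for the definition above: the loading form {{(require-extension regex)}} and the sample list {{fruits}} are illustrative assumptions, not part of the original text.

<enscript highlight=scheme>
(require-extension regex)   ; assumption: how the unit is loaded in your setup

;; Hypothetical sample data.
(define fruits '("apple" "banana" "avocado"))

;; Keep only the strings that contain a match for the pattern.
(define a-fruits (grep "^a" fruits))
;; a-fruits => ("apple" "avocado")

;; grep searches inside each item, so an unanchored pattern
;; also matches in the middle of a string.
(define an-fruits (grep "an" fruits))
;; an-fruits => ("banana")
</enscript>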

=== glob->regexp

 [procedure] (glob->regexp PATTERN)

Converts the file-pattern {{PATTERN}} into a regular expression.

<enscript highlight=scheme>
(glob->regexp "foo.*")
=> "foo\\..*"
</enscript>


=== regexp

 [procedure] (regexp STRING [IGNORECASE [IGNORESPACE [UTF8]]])

Returns a precompiled regular expression object for {{STRING}}.
The optional arguments {{IGNORECASE}}, {{IGNORESPACE}} and {{UTF8}}
specify whether the regular expression should be matched with case or whitespace differences
ignored, or whether the string should be treated as containing UTF-8 encoded
characters, respectively.

Notes:

* regex doesn't support {{(?: )}} non-capturing groups. Currently this means that with UTF-8 matching, each individual "." match returns extra submatches.
* pregexp doesn't allow a {{#}} comment without a trailing newline.


=== regexp?

 [procedure] (regexp? X)

Returns {{#t}} if {{X}} is a precompiled regular expression,
or {{#f}} otherwise.

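A minimal sketch of how {{regexp}} and {{regexp?}} fit together. The loading form and the name {{rx}} are assumptions made for illustration.

<enscript highlight=scheme>
(require-extension regex)   ; assumption: how the unit is loaded in your setup

;; Compile the pattern once, with IGNORECASE set to #t.
(define rx (regexp "hello" #t))

(regexp? rx)        ; => #t
(regexp? "hello")   ; => #f, a plain string is not a precompiled regexp

;; The compiled object can be passed wherever a REGEXP argument is expected.
(define found (string-search rx "Say HELLO"))
;; found => ("HELLO")
</enscript>

Precompiling is mainly useful when the same pattern is applied many times, since a string pattern has to be compiled again on each call.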

=== string-match
=== string-match-positions

 [procedure] (string-match REGEXP STRING [START])
 [procedure] (string-match-positions REGEXP STRING [START])

Matches the regular expression in {{REGEXP}} (a string or a precompiled
regular expression) with
{{STRING}} and returns either {{#f}} if the match failed,
or a list of matching groups, where the first element is the complete
match. If the optional argument {{START}} is supplied, it specifies
the starting position in {{STRING}}.  For each matching group the
result-list contains either: {{#f}} for a non-matching but optional
group; a list of start- and end-position of the match in {{STRING}}
(in the case of {{string-match-positions}}); or the matching
substring (in the case of {{string-match}}). Note that the whole of
{{STRING}} must match. To search for a pattern inside a string, use
{{string-search}}.
Note also that {{string-match}} is implemented by calling
{{string-search}} with the regular expression wrapped in {{^ ... $}}.

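The result shapes described above can be sketched as follows. The loading form, the sample pattern and the variable names are assumptions chosen for illustration.

<enscript highlight=scheme>
(require-extension regex)   ; assumption: how the unit is loaded in your setup

;; Hypothetical pattern with two groups: digits, a dash, lowercase letters.
(define pat "([0-9]+)-([a-z]+)")

;; Complete match first, then one entry per group.
(define m (string-match pat "42-abc"))
;; m => ("42-abc" "42" "abc")

;; Same match, reported as (start end) positions.
(define p (string-match-positions pat "42-abc"))
;; p => ((0 6) (0 2) (3 6))

;; The whole string has to match, so a pattern covering only a prefix fails.
(define partial (string-match "[0-9]+" "42-abc"))
;; partial => #f

;; An optional group that did not take part in the match yields #f.
(define opt (string-match "(a)?(b)" "b"))
;; opt => ("b" #f "b")
</enscript>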

=== string-search
=== string-search-positions

 [procedure] (string-search REGEXP STRING [START [RANGE]])
 [procedure] (string-search-positions REGEXP STRING [START [RANGE]])

Searches for the first match of the regular expression in
{{REGEXP}} within {{STRING}}. The search can be limited to
{{RANGE}} characters.

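A short sketch contrasting searching with whole-string matching. The loading form and the sample text are assumptions, and {{RANGE}} is left out here.

<enscript highlight=scheme>
(require-extension regex)   ; assumption: how the unit is loaded in your setup

;; Hypothetical sample text.
(define text "10 apples, 20 pears")

;; The first match anywhere in the string is returned.
(define first-num (string-search "[0-9]+" text))
;; first-num => ("10")

;; The same match as a (start end) position pair.
(define first-pos (string-search-positions "[0-9]+" text))
;; first-pos => ((0 2))

;; With START the search begins further into the string.
(define next-num (string-search "[0-9]+" text 2))
;; next-num => ("20")

;; string-match needs the whole string to match, so it fails here.
(define whole (string-match "[0-9]+" text))
;; whole => #f
</enscript>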

=== string-split-fields

 [procedure] (string-split-fields REGEXP STRING [MODE [START]])

Splits {{STRING}} into a list of fields according to {{MODE}},
where {{MODE}} can be the keyword {{#:infix}} ({{REGEXP}}
matches field separator), the keyword {{#:suffix}} ({{REGEXP}}
matches field terminator) or {{#t}} ({{REGEXP}} matches field),
which is the default.

<enscript highlight=scheme>
(define s "this is a string 1, 2, 3,")

(string-split-fields "[^ ]+" s)

  => ("this" "is" "a" "string" "1," "2," "3,")

(string-split-fields " " s #:infix)

  => ("this" "is" "a" "string" "1," "2," "3,")

(string-split-fields "," s #:suffix)

  => ("this is a string 1" " 2" " 3")
</enscript>


=== string-substitute

 [procedure] (string-substitute REGEXP SUBST STRING [MODE])

Searches substrings in {{STRING}} that match {{REGEXP}}
and substitutes them with the string {{SUBST}}. The substitution
can contain references to subexpressions in
{{REGEXP}} with the {{\NUM}} notation, where {{NUM}}
refers to the NUMth parenthesized expression. The optional argument
{{MODE}} defaults to 1 and specifies the number of the match to
be substituted. Any non-numeric index specifies that all matches are to
be substituted.

<enscript highlight=scheme>
(string-substitute "([0-9]+) (eggs|chicks)"
                   "\\2 (\\1)" "99 eggs or 99 chicks" 2)
=> "99 eggs or chicks (99)"
</enscript>


=== string-substitute*

 [procedure] (string-substitute* STRING SMAP [MODE])

Substitutes elements of {{STRING}} with {{string-substitute}} according to {{SMAP}}.
{{SMAP}} should be an association-list where each element of the list
is a pair of the form {{(MATCH . REPLACEMENT)}}. Every occurrence of
the regular expression {{MATCH}} in {{STRING}} will be replaced by the string
{{REPLACEMENT}}.

<enscript highlight=scheme>
(string-substitute* "<h1>Hello, world!</h1>"
                    '(("<[/A-Za-z0-9]+>" . "")))

=>  "Hello, world!"
</enscript>


=== regexp-escape

 [procedure] (regexp-escape STRING)

Escapes all special characters in {{STRING}} with {{\}}, so that the string can be embedded
into a regular expression.

<enscript highlight=scheme>
(regexp-escape "^[0-9]+:.*$")
=>  "\\^\\[0-9\\]\\+:\\.\\*\\$"
</enscript>


Platform-specific notes:

* Due to a bug in the {{pregexp}} library, character classes enclosed in {{[ ... ]}} may not begin with a hyphen ({{-}}). A workaround is either to precede the hyphen with a backslash or to use the range {{---}}.

Previous: [[Unit match]]

Next: [[Unit srfi-18]]