Changeset 13851 in project
- Timestamp: 03/20/09 21:47:37
- Location: chicken/trunk
- Files: 1 added, 3 edited
chicken/trunk/manual/The User's Manual (r13710 → r13851)

Line 3 changed from
 [[image:http://www.call-with-current-continuation.org/chicken.png]]
to
 [[image:http://www.call-with-current-continuation.org/chicken4.png]]
The surrounding lines ([[tags:manual]] above, == The CHICKEN User's Manual below) are unchanged.
chicken/trunk/manual/Unit regex (r13740 → r13851)

Unchanged context:

This library unit exposes two APIs: the one listed below and the original irregex API. To use the latter, import from the {{irregex}} module.

Added after that paragraph:

Regular expressions may be either POSIX-style strings (with most PCRE extensions) or an SCSH-style SRE. There is no {{(rx ...)}} syntax; just use normal Scheme lists, with quasiquote if you like.

(unchanged lines omitted)

Unchanged context: the closing </enscript> of the last example block.

Added after it, before the page footer:

=== Extended SRE Syntax

The following table summarizes the SRE syntax, with detailed explanations following.

 ;; basic patterns
 <string>                          ; literal string
 (seq <sre> ...)                   ; sequence
 (: <sre> ...)                     ; same as seq
 (or <sre> ...)                    ; alternation

 ;; optional/multiple patterns
 (? <sre> ...)                     ; 0 or 1 matches
 (* <sre> ...)                     ; 0 or more matches
 (+ <sre> ...)                     ; 1 or more matches
 (= <n> <sre> ...)                 ; exactly <n> matches
 (>= <n> <sre> ...)                ; <n> or more matches
 (** <from> <to> <sre> ...)        ; <from> to <to> matches
 (?? <sre> ...)                    ; non-greedy 0 or 1 matches
 (*? <sre> ...)                    ; non-greedy Kleene star
 (**? <from> <to> <sre> ...)       ; non-greedy range

 ;; submatch patterns
 (submatch <sre> ...)              ; numbered submatch
 (submatch-named <name> <sre> ...) ; named submatch
 (=> <name> <sre> ...)             ; same as submatch-named
 (backref <n-or-name>)             ; match a previous submatch

 ;; toggling case-sensitivity
 (w/case <sre> ...)                ; enclosed <sre>s are case-sensitive
 (w/nocase <sre> ...)              ; enclosed <sre>s are case-insensitive

 ;; character sets
 <char>                            ; singleton char set
 (<string>)                        ; set of chars
 (or <cset-sre> ...)               ; set union
 (~ <cset-sre> ...)                ; set complement (i.e. [^...])
 (- <cset-sre> ...)                ; set difference
 (& <cset-sre> ...)                ; set intersection
 (/ <range-spec> ...)              ; pairs of chars as ranges

 ;; named character sets
 any
 nonl
 ascii
 lower-case     lower
 upper-case     upper
 alphabetic     alpha
 numeric        num
 alphanumeric   alphanum  alnum
 punctuation    punct
 graphic        graph
 whitespace     white     space
 printing       print
 control        cntrl
 hex-digit      xdigit

 ;; assertions and conditionals
 bos eos                           ; beginning/end of string
 bol eol                           ; beginning/end of line
 bow eow                           ; beginning/end of word
 nwb                               ; non-word-boundary
 (look-ahead <sre> ...)            ; zero-width look-ahead assertion
 (look-behind <sre> ...)           ; zero-width look-behind assertion
 (neg-look-ahead <sre> ...)        ; zero-width negative look-ahead assertion
 (neg-look-behind <sre> ...)       ; zero-width negative look-behind assertion
 (atomic <sre> ...)                ; for (?>...) independent patterns
 (if <test> <pass> [<fail>])       ; conditional patterns
 commit                            ; don't backtrack beyond this (i.e. cut)

 ;; backwards compatibility
 (posix-string <string>)           ; embed a POSIX string literal

==== Basic SRE Patterns

The simplest SRE is a literal string, which matches that string exactly.

 (string-search "needle" "hayneedlehay") => <match>

By default the match is case-sensitive, though you can control this either with compile-time flags or with local overrides:

 (string-search "needle" "haynEEdlehay") => #f
 (string-search (irregex "needle" 'i) "haynEEdlehay") => <match>
 (string-search '(w/nocase "needle") "haynEEdlehay") => <match>

You can use {{w/case}} to switch back to case-sensitivity inside a {{w/nocase}}:

 (string-search '(w/nocase "SMALL" (w/case "BIG")) "smallBIGsmall") => <match>
 (string-search '(w/nocase "small" (w/case "big")) "smallBIGsmall") => #f

Of course, literal strings by themselves aren't very interesting regular expressions, so we want to be able to compose them. The most basic way to do this is with the {{seq}} operator (or its abbreviation {{:}}), which matches one or more patterns consecutively:

 (string-search '(: "one" space "two" space "three") "one two three") => <match>

As you may have noticed above, the {{w/case}} and {{w/nocase}} operators allowed multiple SREs in a sequence; other operators that take any number of arguments (e.g. the repetition operators below) allow such implicit sequences as well.

To match any one of a set of patterns, use the {{or}} alternation operator:

 (string-search '(or "eeney" "meeney" "miney") "meeney") => <match>
 (string-search '(or "eeney" "meeney" "miney") "moe") => #f

==== SRE Repetition Patterns

There are also several ways to control the number of times a pattern is matched. The simplest of these is {{?}}, which just optionally matches the pattern:

 (string-search '(: "match" (? "es") "!") "matches!") => <match>
 (string-search '(: "match" (? "es") "!") "match!") => <match>
 (string-search '(: "match" (? "es") "!") "matche!") => #f

To optionally match any number of times, use {{*}}, the Kleene star:

 (string-search '(: "<" (* (~ #\>)) ">") "<html>") => <match>
 (string-search '(: "<" (* (~ #\>)) ">") "<>") => <match>
 (string-search '(: "<" (* (~ #\>)) ">") "<html") => #f

Often you want to match any number of times but require at least one match; for that, use {{+}}:

 (string-search '(: "<" (+ (~ #\>)) ">") "<html>") => <match>
 (string-search '(: "<" (+ (~ #\>)) ">") "<a>") => <match>
 (string-search '(: "<" (+ (~ #\>)) ">") "<>") => #f

More generally, to match at least a given number of times, use {{>=}}:

 (string-search '(: "<" (>= 3 (~ #\>)) ">") "<table>") => <match>
 (string-search '(: "<" (>= 3 (~ #\>)) ">") "<pre>") => <match>
 (string-search '(: "<" (>= 3 (~ #\>)) ">") "<tr>") => #f

To match a specific number of times exactly, use {{=}}:

 (string-search '(: "<" (= 4 (~ #\>)) ">") "<html>") => <match>
 (string-search '(: "<" (= 4 (~ #\>)) ">") "<table>") => #f

And finally, the most general form is {{**}}, which specifies a range of times to match. All of the earlier forms are special cases of this.

 (string-search '(: (= 3 (** 1 3 numeric) ".") (** 1 3 numeric)) "192.168.1.10") => <match>
 (string-search '(: (= 3 (** 1 3 numeric) ".") (** 1 3 numeric)) "192.0168.1.10") => #f

There are also so-called "non-greedy" variants of these repetition operators, by convention suffixed with an additional {{?}}. Because the normal repetition patterns can match any number of repetitions within the allotted range, the non-greedy operators match a string if and only if the normal versions do. However, once you take into account where each submatch begins and ends (which always matters with {{string-search}}, since the endpoints of the match itself are significant), using a non-greedy repetition can change the result.

So, whereas {{?}} can be thought to mean "match or don't match," {{??}} means "don't match or match." {{*}} typically consumes as much as possible, but {{*?}} tries first to match zero times, and only consumes one more repetition at a time if that fails. If you have a greedy operator followed by a non-greedy operator in the same pattern, they can produce surprising results as they compete to make the match longer or shorter. If this seems confusing, that's because it is. Non-greedy repetitions are defined only in terms of the specific backtracking algorithm used to implement them, which for compatibility purposes always means the Perl algorithm. Thus, using these patterns forces IrRegex to use a backtracking engine, and you can't rely on efficient execution.

==== SRE Character Sets

Perhaps more common than matching specific strings is matching any of a set of characters. You can use the {{or}} alternation pattern on a list of single-character strings to simulate a character set, but this is too clumsy for everyday use, so SRE syntax allows a number of shortcuts.

A single character matches that character literally, a trivial character class. More conveniently, a list holding a single string refers to the character set composed of every character in that string.

 (string-match '(* #\-) "---") => <match>
 (string-match '(* #\-) "-_-") => #f
 (string-match '(* ("aeiou")) "oui") => <match>
 (string-match '(* ("aeiou")) "ouais") => #f

Ranges are introduced with the {{/}} operator. Any strings or characters in the {{/}} are flattened and then taken in pairs to represent the start and end points, inclusive, of character ranges.

 (string-match '(* (/ "AZ09")) "R2D2") => <match>
 (string-match '(* (/ "AZ09")) "C-3PO") => #f

In addition, a number of set algebra operations are provided. {{or}}, of course, has the same meaning, but when all the options are character sets it can be thought of as the set union operator. This is further extended by the {{&}} set intersection, {{-}} set difference, and {{~}} set complement operators.

 (string-match '(* (& (/ "az") (~ ("aeiou")))) "xyzzy") => <match>
 (string-match '(* (& (/ "az") (~ ("aeiou")))) "vowels") => #f
 (string-match '(* (- (/ "az") ("aeiou"))) "xyzzy") => <match>
 (string-match '(* (- (/ "az") ("aeiou"))) "vowels") => #f

==== SRE Assertion Patterns

There are a number of times it can be useful to assert something about the area around a pattern without explicitly making it part of the pattern. The most common cases are anchoring some pattern to the beginning or end of a word, a line, or even the whole string. For example, to match on the end of a word:

 (string-search '(: "foo" eow) "foo") => <match>
 (string-search '(: "foo" eow) "foo!") => <match>
 (string-search '(: "foo" eow) "foof") => #f

The {{bow}}, {{bol}}, {{eol}}, {{bos}} and {{eos}} assertions work similarly. {{nwb}} asserts that you are not at a word boundary; substituting it for {{eow}} in the examples above would reverse all three results.

There is no {{wb}}, since you tend to know from context whether it would be the beginning or end of a word, but if you need it you can always use {{(or bow eow)}}.

Somewhat more generally, Perl introduced positive and negative look-ahead and look-behind patterns. Perl look-behind patterns are limited to a fixed length, but the IrRegex versions have no such limit.

 (string-search '(: "regular" (look-ahead " expression")) "regular expression") => <match>

The most general case, of course, would be an {{and}} pattern to complement the {{or}} pattern: all the patterns must match or the whole pattern fails. This may be provided in a future release, although it, like look-ahead and look-behind assertions, is unlikely to be compiled efficiently.

Unchanged context (page footer):

 ---
 Previous: [[Unit extras]]
chicken/trunk/scripts/makedist.scm (r13816 → r13851)

Removed old lines 50-51, the step that copied tgz to site when full? was true:

 (when full?
   (run (cp ,tgz site)) )

Unchanged context around the removal:

 (warning "files missing" missing) ) )
 (run (tar cfz ,(conc distname ".tar.gz") ,distname))
 (run (rm -fr ,distname)) ) )