source: project/wiki/eggref/5/agrep @ 38028

Last change on this file since 38028 was 38028, checked in by Ivan Raikov, 9 months ago

agrep documentation for C5

File size: 4.4 KB
Line 
1[[tags: egg]]
2
3== agrep
4
5[[toc:]]
6
7=== Description
8
9This library implements the Wu-Manber algorithm for string searching
10with errors, popularized by the {{agrep}} Unix command and the
11{{glimpse}} file indexing tool.  It was developed as part of a search
12engine for a largish MP3 collection; the "with error" searching comes
13handy for those who can't spell Liszt or Shostakovitch.
14
15Given a search pattern and a string, this algorithm determines whether
16the string contains a substring that matches the pattern up to a
17parameterizable number N of errors.  An error is either a substitution
18(replace a character of the string with another character), a deletion
19(remove a character) or an insertion (add a character to the string).
20In more scientific terms, the number of errors is the Levenshtein edit
21distance between the pattern and the matched substring.
22
23The search patterns are roughly those of the Unix shell, including
24one-character wildcard (?), character classes ([0-9]) and
25multi-character wildcard (*).  In addition, conjunction (&) and
26alternative (|) are supported.  General regular expressions are not
27supported, however.
28
29The algorithm is described in S. Wu and U. Manber, {{Fast Text
30Searching With Errors}}, tech. rep. TR 91-11, University of Arizona,
311991. 
32
33
34=== Author
35
36Xavier Leroy, ported to Chicken by
37[[/users/ivan raikov|Ivan Raikov]].
38
39=== Library Procedures
40
41<procedure>(make-pattern STRING [TRANSL]) => PATTERN</procedure>
42
43Compiles a search pattern.  The syntax for patterns is similar to that of the Unix shell. 
44The following constructs are recognized:
45
46; {{?}}:  match any single character
47; {{*}}:  match any sequence of characters
48; {{[..]}}:  character set: ranges are denoted with -, as in {{[a-z]}}; an initial {{^}}, as in {{[^0-9]}}, complements the set
49; {{&}}:  conjunction (e.g. {{sweet&sour}})
50; {{|}}:  alternative (e.g. {{high|low}})
51; {{(..)}}: grouping
52; {{\}}:  escape special characters; the special characters are {{\?*[]&|()}}.
53
54The optional argument {{TRANSL}} is a character translation table.
55
56<procedure>(string->pattern STRING [TRANSL]) => PATTERN</procedure>
57
58Returns a pattern that matches exactly the given string  and nothing else.
59
60<procedure>(string-match PAT STRING [NUMERRS] [WHOLEWORD]) => BOOL</procedure>
61
62Tests whether the string {{STRING}} matches the compiled pattern
63{{PAT}}.  The optional keyword parameter {{NUMERRS}} is the number of
64errors permitted.  One error corresponds to a substitution, an
65insertion or a deletion of a character.  {{NUMERRS}} default to 0
66(exact match).  The optional keyword parameter {{WHOLEWORD}} is true
67if the pattern must match a whole word, false if it can match inside a
68word.  It defaults to false (match inside words).
69
70<procedure>(substring-match PAT STRING POS LEN [NUMERRS] [WHOLEWORD] )</procedure>
71
72Same as {{string-match}}, but restricts the match to the substring of
73the given string starting at character number {{POS}} and extending
74{{LEN}} characters.
75
76<procedure>(errors-substring-match PAT STRING POS LEN [NUMERRS] [WHOLEWORD])</procedure>
77
78Same as {{substring-match}}, but returns the smallest number of errors
79such that the substring matches the pattern.  That is, it returns 0
80if the substring matches exactly, 1 if the substring matches with
81one error, etc.  Returns -1 if the substring does not match the
82pattern with at most {{NUMERRS}} errors.
83
84
85=== Requires
86
87* [[datatype]]
88
89=== Version History
90
91* 1.6 Ported to CHICKEN 5
92* 1.4 Using long instead of int to account for 64-bit systems; removed dependency on easyffi
93* 1.3 Ensure that *-match procedures return a boolean value (thanks to mario)
94* 1.2 Bug fixes in pattern (thanks to felix)
95* 1.1 Added definition of type ulong in skeleton.h
96* 1.0 Initial Release
97
98=== License
99
100{{agrep}} was originally written by Xavier Leroy and ported to Chicken
101by [[/users/ivan raikov|Ivan Raikov]].
102
103  Copyright 2009-2019 Ivan Raikov.
104 
105  This program is free software: you can redistribute it and/or modify
106  it under the terms of the GNU General Public License as published by
107  the Free Software Foundation, either version 3 of the License, or
108  (at your option) any later version.
109 
110  This program is distributed in the hope that it will be useful, but
111  WITHOUT ANY WARRANTY; without even the implied warranty of
112  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
113  General Public License for more details.
114 
115  A full copy of the GPL license can be found at
116  <http://www.gnu.org/licenses/>.
Note: See TracBrowser for help on using the repository browser.