source: project/wiki/eggref/4/lexgen @ 14809

Last change on this file since 14809 was 14809, checked in by Ivan Raikov, 10 years ago

added an entry for lexgen release 1.5

[[tags: eggs]]
[[toc:]]

== lexgen

=== Description

{{lexgen}} is a lexer generator whose core consists of only four
small procedures. The programmer combines these procedures into
regular expression pattern matchers.

A pattern matcher procedure takes a list of streams and returns a
new list of streams advanced by every combination allowed by the
pattern matcher. A stream is a two-element list: a list of the
characters consumed by the pattern matcher, and a list of the
characters not yet consumed.

Note that the number of streams returned by a pattern matcher
typically won't match the number of streams passed in. If the pattern
doesn't match at all, the empty list is returned.

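To make the stream representation concrete, the following sketch writes out by hand the values a matcher for the character {{a}} might receive and return. The variable names are illustrative only, and the values follow from the definition above rather than from output captured from the library. Because matched elements are prepended to the consumed list, the most recently consumed character comes first.

  ;; One stream: nothing consumed yet, "abc" still to be read.
  (define initial-streams
    (list (list '() (string->list "abc"))))

  ;; After #\a is consumed, it moves to the consumed list.
  (define advanced-streams
    (list (list (list #\a) (list #\b #\c))))

  ;; A pattern that does not match yields the empty list of streams.
  (define no-match '())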

=== Library Procedures

Every combinator procedure in this library returns a procedure that
takes a list of streams as its argument.

==== Basic procedures

<procedure>(tok TOKEN PROC) => MATCHER</procedure>

Procedure {{tok}} builds a pattern matcher that, for each stream
given, applies the procedure {{PROC}} to the token {{TOKEN}} and an
input character. If the procedure returns a true value, that value is
prepended to the list of consumed elements, and the input character is
removed from the list of input elements.

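The behaviour described for {{tok}} can be sketched as follows. This is a minimal illustration, not the egg's source: {{my-tok}} is a hypothetical name, and the sketch assumes that a stream whose procedure returns false, or which has no input left, is simply dropped from the result.

  (require-extension srfi-1)   ; filter-map

  ;; my-tok is a hypothetical stand-in for tok.
  (define (my-tok token proc)
    (lambda (streams)
      (filter-map
       (lambda (stream)
         (let ((consumed (car stream))
               (input    (cadr stream)))
           (and (pair? input)
                (let ((result (proc token (car input))))
                  ;; On success, prepend the result and drop the character.
                  (and result
                       (list (cons result consumed) (cdr input)))))))
       streams)))

  (define a-matcher
    (my-tok #\a (lambda (t c) (and (char=? t c) c))))

  (a-matcher (list (list '() (string->list "abc"))))
  ;; => (((#\a) (#\b #\c)))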

<procedure>(seq MATCHER-LIST) => MATCHER</procedure>

{{seq}} builds a matcher that matches a sequence of patterns.

<procedure>(bar MATCHER-LIST) => MATCHER</procedure>

{{bar}} matches any of a list of patterns. It's analogous to a series
of patterns separated by {{|}} in traditional regular expressions.

<procedure>(star MATCHER) => MATCHER</procedure>

{{star}} is an implementation of the Kleene closure. It is analogous
to {{*}} in traditional regular expressions.

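One way to picture {{seq}}, {{bar}} and {{star}} is as plain functions over stream lists. The sketch below is an assumption about how such combinators could be written, not the library's implementation; {{one-char}}, {{my-seq}}, {{my-bar}} and {{my-star}} are hypothetical names.

  (require-extension srfi-1)   ; append-map, filter-map

  ;; Hypothetical single-character matcher used only to drive the sketch.
  (define (one-char ch)
    (lambda (streams)
      (filter-map
       (lambda (s)
         (and (pair? (cadr s))
              (char=? ch (car (cadr s)))
              (list (cons ch (car s)) (cdr (cadr s)))))
       streams)))

  ;; Sequence: thread the stream list through each matcher in turn.
  (define (my-seq matchers)
    (lambda (streams)
      (let loop ((ms matchers) (ss streams))
        (if (null? ms) ss (loop (cdr ms) ((car ms) ss))))))

  ;; Alternation: collect the results of every alternative.
  (define (my-bar matchers)
    (lambda (streams)
      (append-map (lambda (m) (m streams)) matchers)))

  ;; Kleene closure: the input itself (zero repetitions) plus every
  ;; further repetition, until the matcher stops producing streams.
  (define (my-star matcher)
    (lambda (streams)
      (let loop ((ss streams) (acc '()))
        (if (null? ss)
            acc
            (loop (matcher ss) (append acc ss))))))

  (define aab (list (list '() (string->list "aab"))))

  ((my-seq (list (one-char #\a) (one-char #\a))) aab)
  ;; => (((#\a #\a) (#\b)))
  ((my-bar (list (one-char #\a) (one-char #\b))) aab)
  ;; => (((#\a) (#\a #\b)))
  ((my-star (one-char #\a)) aab)
  ;; => ((() (#\a #\a #\b)) ((#\a) (#\a #\b)) ((#\a #\a) (#\b)))

As written, {{my-star}} does not terminate if its matcher can succeed without consuming any input.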

==== Convenience procedures

These procedures are built from the previous four and are provided
for convenience.

<procedure>(try PROC) => PROC</procedure>

Converts a binary predicate procedure to a binary procedure that
returns its right argument when the predicate is true, and false
otherwise.

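The conversion performed by {{try}} fits in a few lines. The sketch below uses the hypothetical name {{my-try}} and only illustrates the description above; the egg's own definition may differ.

  ;; my-try is a hypothetical stand-in for try.
  (define (my-try pred)
    (lambda (left right)
      (and (pred left right) right)))

  ((my-try char=?) #\a #\a)   ; => #\a
  ((my-try char=?) #\a #\b)   ; => #f
  ((my-try char<?) #\a #\b)   ; => #\b

Returning the right argument is what makes the result usable with {{tok}}: the token is passed on the left and the input character on the right, so a successful match yields the character to be recorded as consumed.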

<procedure>(char CHAR) => MATCHER</procedure>

Matches a single character.

<procedure>(pos MATCHER) => MATCHER</procedure>

Positive closure. Analogous to {{+}}.

<procedure>(opt MATCHER) => MATCHER</procedure>

Optional pattern. Analogous to {{?}}.

<procedure>(set CHAR-SET) => MATCHER</procedure>

Matches any of a SRFI-14 set of characters.

<procedure>(range CHAR CHAR) => MATCHER</procedure>

Matches a range of characters. Analogous to character class {{[]}}.

<procedure>(lit STRING) => MATCHER</procedure>

Matches the literal string {{STRING}}.

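As an illustration of how convenience procedures can be assembled from the basic ones, the following sketch rebuilds a few of them using only {{tok}}, {{seq}}, {{bar}}, {{star}} and {{try}} as documented above. The {{my-}} names are hypothetical, and these definitions are a guess at one plausible construction rather than the egg's actual source.

  (require-extension lexgen)

  ;; Single character: compare the token with the input character.
  (define (my-char c) (tok c (try char=?)))

  ;; Positive closure: one occurrence followed by zero or more.
  (define (my-pos m) (seq (list m (star m))))

  ;; A matcher that consumes nothing and passes every stream through.
  (define (my-pass streams) streams)

  ;; Optional pattern: either the pattern or nothing at all.
  (define (my-opt m) (bar (list m my-pass)))

  ;; Literal string: the sequence of its characters.
  (define (my-lit str) (seq (map my-char (string->list str))))

  (print (lex (my-pos (my-char #\a)) "aaab"))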

==== Lexer procedures

<procedure>(longest STREAM-LIST) => STREAM</procedure>

Takes the streams produced by applying a pattern to a stream (or
streams) and selects the longest match if one exists. If
{{STREAM-LIST}} is empty, it returns {{#f}}.

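The selection step can be sketched as below. {{my-longest}} is a hypothetical name, and the sketch assumes that "longest" means the stream with the most consumed elements, with ties going to the earliest stream in the list.

  ;; my-longest is a hypothetical stand-in for longest.
  (define (my-longest streams)
    (if (null? streams)
        #f
        (let loop ((best (car streams)) (rest (cdr streams)))
          (cond ((null? rest) best)
                ((> (length (car (car rest))) (length (car best)))
                 (loop (car rest) (cdr rest)))
                (else (loop best (cdr rest)))))))

  (my-longest (list (list '() (string->list "aab"))
                    (list (list #\a) (string->list "ab"))
                    (list (list #\a #\a) (string->list "b"))))
  ;; => ((#\a #\a) (#\b))

  (my-longest '())
  ;; => #f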

<procedure>(lex MATCHER STRING) => CHAR-LIST</procedure>

{{lex}} takes a pattern and a string, turns the string into a list of
streams (containing one stream), applies the pattern, and returns the
longest match.

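The three steps described for {{lex}} can be written out with the procedures documented above. {{my-lex}} is a hypothetical name. Whether the real {{lex}} post-processes the winning stream (for example, by reversing the consumed characters) is not stated here, so the sketch returns whatever {{longest}} selects.

  (require-extension lexgen)

  ;; my-lex is a hypothetical stand-in for lex.
  (define (my-lex matcher str)
    ;; Step 1: wrap the string in a list containing one stream.
    (let ((streams (list (list '() (string->list str)))))
      ;; Steps 2 and 3: apply the pattern, then pick the longest match.
      (longest (matcher streams))))

  (print (my-lex (lit "ab") "abc"))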

=== Examples

  (require-extension lexgen)

  (define a-pat (tok #\a (try char=?)))
  (define b-pat (tok #\b (try char=?)))
  (define a-then-b-pat (seq (list a-pat b-pat)))
  (define a-or-b-pat (bar (list a-pat b-pat)))
  (define a-star-pat (star a-pat))
 
  ;; A list containing one stream: nothing consumed, "abc" remaining.
  (define abc-stream (list `(() ,(string->list "abc"))))

  (print (a-pat abc-stream))
  (print (b-pat abc-stream))
  (print (a-then-b-pat abc-stream))
  (print (a-or-b-pat abc-stream))
  (print (a-star-pat abc-stream))

  ;; A pattern to match floating point numbers.
  ;; "-"?(([0-9]+(\\.[0-9]+)?)|(\\.[0-9]+))([eE][+-]?[0-9]+)?

  (define numpat
    (let* ((digit        (range #\0 #\9))
           (digits       (pos digit))
           (fraction     (seq `(,(char #\.) ,digits)))
           (significand  (bar `(,(seq `(,digits ,(opt fraction))) ,fraction)))
           (exp          (seq `(,(set "eE") ,(opt (set "+-")) ,digits)))
           (sign         (opt (char #\-))))
      (seq `(,sign ,(seq `(,significand ,(opt exp)))))))
  (print (lex numpat "3.45e-6"))

=== Requires

* [[matchable]]

=== Version History

* 1.5 Using (require-extension srfi-1)
* 1.4 Ported to Chicken 4
* 1.2 Added procedures try and tok (supersedes pred)
* 1.0 Initial release

=== License

Based on the [[http://www.standarddeviance.com/projects/combinators/combinators.html|SML lexer generator by Thant Tessman]].

  Copyright 2009 Ivan Raikov.
  All rights reserved.
 
  Redistribution and use in source and binary forms, with or without
  modification, are permitted provided that the following conditions are
  met:
 
  Redistributions of source code must retain the above copyright
  notice, this list of conditions and the following disclaimer.
 
  Redistributions in binary form must reproduce the above copyright
  notice, this list of conditions and the following disclaimer in the
  documentation and/or other materials provided with the distribution.
 
  Neither the name of the author nor the names of its contributors may
  be used to endorse or promote products derived from this software
  without specific prior written permission.
 
  THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
  "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
  LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
  FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
  COPYRIGHT HOLDERS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
  INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
  (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
  SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
  HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
  STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
  ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
  OF THE POSSIBILITY OF SUCH DAMAGE.