source: project/wiki/lexgen @ 13261

Last change on this file since 13261 was 13261, checked in by Ivan Raikov, 11 years ago

Added lexgen documentation.

File size: 5.4 KB
Line 
1[[tags: eggs]]
2[[toc:]]
3
4== lexgen
5
6=== Description
7
8{{lexgen}} is a lexer generator comprised in its core of only four
9small procedures. The programmer combines these procedures into
10regular expression pattern matchers.
11
12A pattern matcher procedure takes a list of streams, and returns a
13new list of streams advanced by every combination allowed by the
14pattern matcher function. A stream is defined as a list that contains
15a list of characters consumed by the pattern matcher, and a list of
16characters not yet consumed.
17
18Note that the number of streams returned by a pattern matcher
19typically won't match the number of streams passed in. If the pattern
20doesn't match at all, the empty list is returned.
21
22
23=== Library Procedures
24
25Every combinator procedure in this library returns a procedure that
26takes in a list of streams as an argument.
27
28==== Basic procedures
29
30<procedure>(pred TOKEN PROC) => MATCHER</procedure>
31
32Procedure {{pred}} builds a pattern matcher function that, for each
33stream given, applies a predicate to the given token {{TOKEN}} and an
34input character. If the predicate returns true, the character is
35prepended to the list of the consumed characters in the corresponding
36stream.
37
38<procedure>(seq MATCHER-LIST) => MATCHER</procedure>
39
40{{seq}} builds a matcher that matches a sequence of patterns.
41
42<procedure>(bar MATCHER-LIST) => MATCHER</procedure>
43
44{{bar}} matches any of a list of patterns. It's analogous to a series
45of patterns separated by {{|}} in traditional regular expressions.
46
47<procedure>(star MATCHER) => MATCHER</procedure>
48
49{{star}} is an implementation of the Kleene closure. It is analogous
50to {{*}} in traditional regular expressions.
51
52
53==== Convenience procedures
54
55These procedures are built from the previous four and are provided
56for convenience.
57
58<procedure>(pos MATCHER) => MATCHER</procedure>
59
60Positive closure. Analogous to {{+}}.
61
62<procedure>(opt MATCHER) => MATCHER</procedure>
63
64Optional pattern. Analogous to {{'?'}}.
65
66<procedure>(char CHAR) => MATCHER</procedure>
67
68Matches a single character.
69
70<procedure>(set CHAR-SET) => MATCHER</procedure>
71
72Matches any of a SRFI-14 set of characters.
73
74<procedure>(range CHAR CHAR) => MATCHER</procedure>
75
76Matches a range of characters. Analogous to character class {{[]}}.
77
78<procedure>(lit STRING) => MATCHER</procedure>
79
80Matches a literal string {{s}}.
81
82
83==== Lexer procedures
84
85<procedure>(longest STREAM-LIST) => STREAM</procedure>
86
87Takes the resulting streams produced by the application of a pattern
88on a stream (or streams) and selects the longest match if one
89exists. If {{STREAM-LIST}} is empty, it returns {{#F}}.
90
91
92<procedure>(lex MATCHER STRING) => CHAR-LIST</procedure>
93
94{{lex}} takes a pattern and a string, turns the string into a list of
95streams (containing one stream), applies the pattern, and returns the
96longest match.
97
98=== Examples
99
100  (define a-pat (pred #\a char=?))
101  (define b-pat (pred #\b char=?))
102  (define a-then-b-pat (seq (list a-pat b-pat)))
103  (define a-or-b-pat (seq (list a-pat b-pat)))
104  (define a-star-pat (star a-pat))
105 
106  (define abc-stream (list `(() ,(string->list "abc"))))
107
108  (print (a-pat abc-stream))
109  (print (b-pat abc-stream))
110  (print (a-then-b-pat abc-stream))
111  (print (a-or-b-pat abc-stream))
112  (print (a-star-pat abc-stream))
113
114  ;; A pattern to match floating point numbers.
115  ;; "-"?(([0-9]+(\\.[0-9]+)?)|(\\.[0-9]+))([eE][+-]?[0-9]+)?
116
117  (define numpat
118    (let* ((digit        (range #\0 #\9))
119           (digits       (pos digit))
120           (fraction     (seq `(,(char #\.) ,digits)))
121           (significand  (bar `(,(seq `(,digits ,(opt fraction))) ,fraction)))
122           (exp          (seq `(,(set "eE") ,(opt (set "+-")) ,digits)))
123           (sign         (opt (char #\-)) ))     
124    (seq `(,sign ,(seq `(,significand ,(opt exp)))))))
125
126  (print (lex numpat "3.45e-6"))
127
128
129=== Requires
130
131* [[matchable]]
132
133=== Version History
134
135* 1.0 Initial release
136
137=== License
138
139Based on the [[http://www.standarddeviance.com/projects/combinators/combinators.html|SML lexer generator by Thant Tessman]].
140
141  Copyright 2009 Ivan Raikov.
142  All rights reserved.
143 
144  Redistribution and use in source and binary forms, with or without
145  modification, are permitted provided that the following conditions are
146  met:
147 
148  Redistributions of source code must retain the above copyright
149  notice, this list of conditions and the following disclaimer.
150 
151  Redistributions in binary form must reproduce the above copyright
152  notice, this list of conditions and the following disclaimer in the
153  documentation and/or other materials provided with the distribution.
154 
155  Neither the name of the author nor the names of its contributors may
156  be used to endorse or promote products derived from this software
157  without specific prior written permission.
158 
159  THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
160  "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
161  LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
162  FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
163  COPYRIGHT HOLDERS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
164  INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
165  (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
166  SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
167  HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
168  STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
169  ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
170  OF THE POSSIBILITY OF SUCH DAMAGE.
Note: See TracBrowser for help on using the repository browser.