source: project/wiki/eggref/4/comparse @ 30713

Last change on this file since 30713 was 30713, checked in by Moritz Heidkamp, 7 years ago

comparse: Update link to smug, update copyright year, add some clarifications to introduction

File size: 10.7 KB
Line 
1[[tags: egg]]
2
3== comparse
4
5Comparse is a library of
6[[http://en.wikipedia.org/wiki/Parser_combinator|parser combinators]]
7similar to [[http://www.haskell.org/haskellwiki/Parsec|Haskell's
8Parsec]]. Its implementation is based on
9[[https://github.com/drewc/smug|Drew Crampsie's smug for Common Lisp]]
10but slightly deviates in naming and provides a number of additional
11convenience combinators.
12
13
14[[toc:]]
15
16=== Caveat
17
18As of now the API documentation is still incomplete, notably missing
19the main entry point procedure {{parse}}. Also, the
20[[#examples|Examples section]] is yet to be written.
21
22
23=== Introduction
24
25Comparse provides a library of ''parsers'' which can be applied to
26''inputs'' in order to produce ''parse results''. It also provides
27''parser combinators'' to create new parsers. In Comparse inputs are
28represented by a {{parser-input}} type. It can hold a sequence of any
29kinds of ''items'' (often characters). For example, we can define an
30input sequence of characters like this:
31
32<enscript language="scheme">
33(use comparse)
34
35(define some-input
36  (->parser-input "hello world"))
37</enscript>
38
39A ''parser'' then is a function which takes an ''input'' as its
40argument and, in case the parse succeeds, returns a ''parse result''
41(which can be a value of any kind) and the ''remainder'' of the input
42(which is a {{parser-input}} again). In case the parse fails it
43returns {{#f}}.
44
45A very basic parser provided by Comparse is the {{item}} parser which
46consumes one item from the given input and returns it as its parse
47result. If the input is empty, {{item}} fails. We can apply it to the
48input we defined above using the {{parse}} procedure:
49
50<enscript language="scheme">
51(parse item some-input)
52</enscript>
53
54Which gives us these results:
55
56<enscript language="scheme">
57#\h
58#<parser-input #\e #\l #\l #\o #\space #\w #\o #\r #\l #\d ...>
59</enscript>
60
61There is another basic parser called {{fail}} which always fails, no
62matter what's passed to it. These primitive parsers are not very
63useful on their own, of course. In order to do more interesting things
64Comparse provides a selection of ''parser combinators''. Just like
65regular function combinators (such as {{compose}} or {{complement}}),
66parser combinators are functions which take one or more parsers as
67their arguments and combine them into a new parser. One of the basic
68parser combinators in Comparse is {{bind}}. It takes a parser and a
69procedure as its arguments and returns a parser which first applies
70the passed parser and, if it succeeds, calls the procedure with its
71parse result. The procedure then must return another parser which will
72be applied to the remainder of the parser that was passed to
73{{bind}}. Okay, that sounds pretty dazzling, so an example is in
74order:
75
76<enscript language="scheme">
77(define double-item
78  (bind item (lambda (x)
79               (lambda (remainder)
80                 ;; Note that unlike the `parse' procedure, the
81                 ;; internal parser API does not expect multiple
82                 ;; return values but rather a pair of values on
83                 ;; successful parse.
84                 (cons (list x x) remainder)))))
85
86(parse double-item some-input)
87</enscript>
88
89Which gives us:
90
91<enscript language="scheme">
92(#\h #\h)
93#<parser-input #\e #\l #\l #\o #\space #\w #\o #\r #\l #\d ...>
94</enscript>
95
96So the procedure we pass to {{bind}} returns a parser which constructs
97a two-item list from the parse result and returns it with the
98untouched remainder. As returning a result from a parser without
99touching the remainder is needed quite frequently, Comparse provides
100the {{result}} parser combinator which does just that. Using it we can
101write {{double-item}} more conveniently like this:
102
103
104<enscript language="scheme">
105(define double-item
106  (bind item (lambda (x)
107               (result (list x x)))))
108</enscript>
109
110Comparse provides many convenient and more high-level parser
111combinators than the ones presented in this introdution. See the
112[[#examples|Examples section]] on how to use them to create parsers
113for practical tasks. Also note that you would usually not manually
114convert raw inputs (like strings or input ports) into a
115{{parser-input}} object like we did above. Instead you can pass the
116raw input object directly to the {{parse}} procedure which will
117automatically turn it into a {{parser-input}} for you.
118
119
120=== A note on variadic combinators
121
122Some parser combinators accept one or more parsers as their
123arguments. As a convenience when just one argument is passed and it is
124a list instead of a parser, that list is treated as the argument
125list. In other words those combinators implicitly {{apply}} themselves
126to the given (single) list argument.
127
128
129=== API
130
131<procedure>(result value)</procedure>
132
133Returns a parser which always succeeds, leaves the input untouched and
134returns {{value}}.
135
136
137<procedure>(fail input)</procedure>
138
139A parser which always fails.
140
141
142<procedure>(item input)</procedure>
143
144A parser which consumes and returns the first item of {{input}},
145i.e. it returns that item and the tail of {{input}}. Fails if
146{{input}} is empty.
147
148
149<procedure>(bind parser proc)</procedure>
150
151Returns a parser which applies {{parser}} and calls {{proc}} with its
152parse result if it succeeds. {{proc}} must return a parser which will
153be applied to the remainder returned by {{parser}}.
154
155
156<procedure>(satisfies condition . args)</procedure>
157
158Returns a parser which consumes the next {{item}} of its input and
159applies {{condition}} to it like this: {{(apply condition item
160args)}}. If that call returns {{#f}}, the parser fails. Otherwise it
161succeeds and returns the item as its result.
162
163
164<procedure>(is x)</procedure>
165
166Returns a parser which succeeds when the next {{item}} of its input is
167{{eq?}} to {{x}}.
168
169
170<procedure>(in collection . items)</procedure>
171
172Returns a parser which succeeds when the next item of the given input
173is contained in {{collection}}. Collection can be a
174[[http://srfi.schemers.org/srfi-14/srfi-14.html|SRFI 14 char-set]] or
175a list which is checked for membership with {{memq}}. Given more than
176one argument, membership of the next item is checked for with {{memq}}
177in {{(cons collection items)}}.
178
179
180<procedure>(sequence parser . parsers)</procedure>
181
182Returns a parser which applies the given parsers from left to right
183and returns the list of their results when all succeed. See also the
184[[#a-note-on-variadic-combinators|note on variadic combinators]].
185
186
187<procedure>(char-seq str)</procedure>
188
189Returns a parser which consumes the chars of the given string {{str}}
190in the order contained therein, i.e. it parses a literal string. On
191success the parser returns {{str}}.
192
193
194<procedure>(any-of parser . parsers)</procedure>
195
196Returns a parser which applies the given parsers from left to right
197and returns the result of the first one that succeeds. See also the
198[[#a-note-on-variadic-combinators|note on variadic combinators]].
199
200
201<procedure>(all-of parser . parsers)</procedure>
202
203Returns a parser which applies the given parsers from left to right
204and returns the result of the last one if all succeed. See also the
205[[#a-note-on-variadic-combinators|note on variadic combinators]].
206
207
208<procedure>(none-of parser . parsers)</procedure>
209
210Returns a parser which applies the given parsers from left to right
211and returns {{#t}} as its result if none of them succeed while it
212doesn't consume any items from the input. See also the
213[[#a-note-on-variadic-combinators|note on variadic combinators]].
214
215
216<procedure>(none-of* parser [parsers ...] but)</procedure>
217
218Returns a parser which applies the given parsers from left to
219right. If none of them succeed, it applies the parser {{but}} and
220returns its result.
221
222
223<procedure>(preceded-by parser . parsers)</procedure>
224
225Returns a parser which applies the given parsers from left to right to
226the input or the remainder of the preceding parser, respectively, and
227returns the result of the last one. See also the
228[[#a-note-on-variadic-combinators|note on variadic combinators]].
229
230
231<procedure>(followed-by parser following-parser . following-parsers)</procedure>
232
233Returns a parser which applies {{parser}} if the given
234{{following-parser[s]}} succeed on its remainder from left to
235right. See also the [[#a-note-on-variadic-combinators|note on variadic
236combinators]] which applies to the {{following-parser[s]}} argument(s)
237in this case.
238
239
240<procedure>(enclosed-by open content close)</procedure>
241
242Returns a parser which applies the given parsers {{open}},
243{{content}}, and {{close}} in that order to the input or the remainder
244of the preceding parser, respectively, and returns the result of
245{{content}} with the remainder of {{close}}.
246
247
248=== Examples
249
250Yet to be written.
251
252
253=== About this egg
254
255==== Source
256
257The [[https://bitbucket.org/DerGuteMoritz/comparse|source code is
258hosted at Bitbucket]]. Feel free to fork it and send pull requests
259there.
260
261==== Author
262
263[[/users/moritz-heidkamp|Moritz Heidkamp]]
264
265==== Version history
266
267; 0.1.0 : Add {{parser-input}} wrapper type so as not to leak {{lazy-seq}} implementation detail. Add {{end-of-input}} parser. Improve performance of {{as-string}} and {{repeated}} combinators.
268; 0.0.5 : Turn {{latch}} into compile-time dependency
269; 0.0.4 : Fix potential apply limit exceedance in {{as-string}}. Extend {{repeated}} to parse exactly {{n}} repetitions.
270; 0.0.3 : Correct dependencies
271; 0.0.2 : Add memoization support
272; 0.0.1 : Initial release
273
274==== License
275
276  Copyright (c) 2013-2014, Moritz Heidkamp
277  All rights reserved.
278 
279  Redistribution and use in source and binary forms, with or without
280  modification, are permitted provided that the following conditions are
281  met:
282 
283  Redistributions of source code must retain the above copyright
284  notice, this list of conditions and the following disclaimer.
285 
286  Redistributions in binary form must reproduce the above copyright
287  notice, this list of conditions and the following disclaimer in the
288  documentation and/or other materials provided with the distribution.
289 
290  Neither the name of the author nor the names of its contributors may
291  be used to endorse or promote products derived from this software
292  without specific prior written permission.
293 
294  THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
295  "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
296  LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
297  FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
298  COPYRIGHT HOLDERS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
299  INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
300  (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
301  SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
302  HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
303  STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
304  ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
305  OF THE POSSIBILITY OF SUCH DAMAGE.
306
Note: See TracBrowser for help on using the repository browser.