source: project/wiki/eggref/3/loopy-loop @ 13621

Last change on this file since 13621 was 13621, checked in by sjamaan, 11 years ago

Move old chicken 3 eggs over to eggref/3

File size: 16.9 KB
Line 
1[[tags: egg]]
2
3LOOP is a generalized iteration form supporting extensible iterator
4macros, keyword updates, and full recursion.  The idea is to create a
5loop as simple and close to natural Scheme as possible, while still
6being extensible.
7
8[[toc:]]
9
10=== Introduction
11
12In its most basic usage, LOOP can be used as a drop-in replacement for
13named let (assuming the let name isn't passed as a first class value).
14So, for example, the following definitions
15
16<enscript highlight=scheme>
17  (define (fold kons knil ls)
18    (let lp ((ls ls) (knil knil))
19      (if (pair? ls)
20        (lp (cdr ls) (kons (car ls) knil))
21        knil)))
22</enscript>
23
24and
25
26<enscript highlight=scheme>
27  (define (fold kons knil ls)
28    (loop lp ((ls ls) (knil knil))
29      (if (pair? ls)
30        (lp (cdr ls) (kons (car ls) knil))
31        knil)))
32</enscript>
33
34are equivalent.  We further allow automatic stepping of variables, as
35in the DO macro:
36
37<enscript highlight=scheme>
38  (define (fold kons knil ls)
39    (loop lp ((ls ls (cdr ls)) (knil knil (kons (car ls) knil)))
40      (if (pair? ls)
41        (lp)
42        knil)))
43</enscript>
44
45The parameters have default steps, so we don't need to pass them
46explicitly anymore (though we can still do so if we wanted to override
47the defaults).
48
49In addition, we provide extensible iterators to automatically handle
50the logic of stepping, fetching, and checking termination on sequence
51types.  To use an iterator, specify one or more variable names
52followed by `<-' followed by the iterator and any parameters:
53
54  (x <- in-foo bar baz qux)
55
56To iterate over a list, use the IN-LIST macro:
57
58  (x <- in-list ls)
59
60This will bind X to the successive elements of LS in the body of the
61loop.
62
63Now, when iterating automatically, the loop will also terminate
64automatically if it encounters the end of its input.  In such a case
65you may want to specify a return value.  You can do this by putting
66
67  => <expr>
68
69right at the start of the loop body.  So our example now becomes:
70
71<enscript highlight=scheme>
72  (define (fold kons knil ls)
73    (loop lp ((x <- in-list ls) (knil knil (kons x knil)))
74        => knil
75      (lp)))
76</enscript>
77
78Note we can still call, or not call, the loop itself in the body
79according to whatever logic we want, and re-enter it possibly multiple
80times.  However, in this common case where the entire body is reduced
81to just calling the loop again, we can omit it by using an anonymous
82loop:
83
84<enscript highlight=scheme>
85  (define (fold kons knil ls)
86    (loop ((x <- in-list ls) (knil knil (kons x knil)))
87       => knil))
88</enscript>
89
90No flexibility is lost over named let, yet we've gained the
91convenience of iterators.  If you wanted to change the above to work
92on vectors, all you would need to do is change the iterator:
93
94  (x <- in-vector vec)
95
96and it works as expected.
97
98=== Bindings and scope
99
100Iterator macros may introduce variables in three different lexical
101scopes:
102
103; Loop variables : Analogous to the variables in a named let, these are initialized once and updated on each iteration through the loop.  In the example above, KNIL is a loop variable (as are all named let and DO-style variables).
104
105; Body variables : Bound in the body, these are usually derived directly from loop variables.  They can't be overridden (see below) and are not available in the final expression.  In the example above, X is a body variable.
106
107; Final variables : Bound once in the return expression, these are sometimes used for some final computation such as reversing a consed up list.
108
109Within each of these three lexical scopes, all variables are updated
110in parallel, and none are ever mutated (unless the programmer does so
111manually).  This referential transparency is important to achieve full
112non-tail recursion and re-entrancy.
113
114In many cases the loop variables will be implicit and unnamed.  For
115instance, IN-LIST uses a loop variable to cdr down the list of pairs,
116binding X to the successive cars.  However, in such cases the iterator
117usually lets you explicitly name the loop variable if you want access
118to it.
119
120Loop variables may be manually overridden on a recursive call.  You can
121either use the original positional arguments, or specify individual
122values by name with the <- syntax, punning the initial binding.  Thus
123in
124
125   (loop lp ((x ls <- in-list ls)) ...)
126
127the recursive calls
128
129   (lp)
130   (lp (cdr ls))
131   (lp ls <- (cdr ls))
132
133are all the same.  Note that we are binding the loop variable LS, not
134X which is considered to be always derived from the loop variable.
135Note also that there is no need to recurse on CDR - we could use CDDR,
136or a completely unrelated list, or '() to force an early termination.
137
138The following example flattens a tree into a list, using minimal
139conses and stack.  This serves as an example of naming implicit loop
140variables, binding loop variables, and non-tail recursion.
141
142<enscript highlight=scheme>
143  (define (flatten ls)
144    (reverse
145     (loop lp ((x ls <- in-list ls) (res '()))
146         => res
147       (if (pair? x)
148           (lp res <- (lp ls <- x))
149           (lp res <- (cons x res))))))
150</enscript>
151
152The scope of the final expression will include all the final
153variables, as well as all the last instances of all the loop
154variables, at least one of which will correspond to a true termination
155condition (you could manually check the others to see if the sequence
156lengths were uneven).  The body variables are not bound, however the
157loop itself, if named, is available so that you can restart the loop
158with all new initial values if you want.
159
160=== Iterators
161
162==== in-list
163
164 '''syntax:''' (<element> [<pair>] <- in-list <list> [<cdr> [<null?>]])
165
166Iterates over the successive elements of a list.
167
168<enscript highlight=scheme>
169  ;;; Simple loop
170  > (loop ((x <- in-list '(a b c))) (write x) (newline))
171  a
172  b
173  c
174
175  ;;; Reverse a list destructively.
176  (define (reverse! list)
177    (loop ((elt pair <- in-list list)
178           (tail '() pair))
179        => tail
180      (set-cdr! pair tail)))
181
182  ;;; Test for circularity
183  (define (cddr* ls) ; CL's cddr
184    (if (pair? (cdr ls)) (cddr ls) '()))
185
186  (define (circular-list? ls)
187    (and (pair? ls)
188         (loop race ((tortoise <- in-list ls)
189                     (hare <- in-list (cdr ls) cddr*))
190             => #f
191           (or (eq? hare tortoise) (race)))))
192</enscript>
193
194==== in-lists
195
196 '''syntax:''' (<elements> [<pairs>] <- in-lists <lol> [<cdr> [<null?>]])
197
198Iterate over a list of lists.  <elements> is bound to the heads of
199each of the lists in <lol>.  The CDR and NULL? options can be
200specified as in IN-LIST.
201
202<enscript highlight=scheme>
203  (define (any pred . lol)
204    (loop lp ((elts <- in-lists lol))
205        => #f
206      (or (apply pred elts) (lp))))
207</enscript>
208
209==== in-string / in-string-reverse
210
211 '''syntax:''' (<element> [<index>] <- in-string <str> [<start> [<end> [<step>]]])
212 '''syntax:''' (<element> [<index>] <- in-string-reverse <str> [<start> [<end> [<step>]]])
213
214Iterate over the characters of a string.  Proceeds from <start>,
215inclusive, to <end>, exclusive.  By default <start> is 0 and <end> is
216the string length, thus iterating over every character.
217
218You can specify a step other than the default 1, for example 2 to
219iterate over every other character.
220
221The reverse version steps from one less than the end, continuing until
222you step below the start.  Thus with the same <start> and <end> and a
223<step> of 1 (or any divisor of the difference), the two forms will
224iterate over the same characters but in the reverse order.
225
226Note this works correctly with the utf8 egg, but is not optimal in
227such cases because the use of numeric indexes is slow.
228
229==== in-vector / in-vector-reverse
230
231 '''syntax:''' (<element> [<index>] <- in-vector <vec> [<start> [<end> [<step>]]])
232 '''syntax:''' (<element> [<index>] <- in-vector-reverse <vec> [<start> [<end> [<step>]]])
233
234Analogues of the string iterators, but for vectors.  Note also all of the
235SRFI-4 uniform vectors can iterated over as in-u8vector, etc.
236
237==== in-port / in-file
238
239 '''syntax:''' (<datum> <- in-port [<port> [<reader> [<eof?>]]])
240 '''syntax:''' (<datum> <- in-file <path> [<reader> [<eof?>]])
241
242Iterate over data read from a port, defaulting to (CURRENT-INPUT-PORT)
243for IN-PORT, and a port opened by (OPEN-INPUT-FILE <path>) for
244IN-FILE.  The reader defaults to READ-CHAR?, and the termination test
245defaults to EOF-OBJECT?.
246
247The stateful nature of ports means that these are not referentially
248transparent, and you can't save a loop iteration to go back to later.
249In particular, IN-FILE will close its port on the first termination,
250causing an error if you attempt to re-enter the same loop again.
251
252<enscript highlight=scheme>
253  (define (read-mime-headers port)
254    (loop lp ((line <- in-port port read-line)
255              (res '() (cons line res)))
256        => (reverse res) ; eof case
257      (if (string-null? line)
258        (reverse res)
259        (lp))))
260
261  ;; alternate version with a custom termination test
262  (define (read-mime-headers port)
263    (loop lp ((line <- in-port port read-line
264                       (disjoin eof-object? string-null?))
265              (res <- collecting line))
266        => res))
267
268  (define (file->sexp-list path)
269    (loop ((x <- in-file path read) (ls <- collecting x)) => x))
270</enscript>
271
272==== in-range / in-range-reverse
273
274 '''syntax:''' (<number> <- in-range [[<from>] <to> [<step>]])
275 '''syntax:''' (<number> <- in-range-reverse [[<from>] <to> [<step>]])
276
277Step through the real numbers beginning with <from> (default 0), until
278they would be greater than (less then in the -reverse case) or equal
279to <to> (thus <to> is never included).  <step> defaults to 1.
280
281Two arguments indicate <from> and <to>, so provide the default <from>
282of 0 if you're only interested in <to> and <step>.
283
284These macros are subject to change in the near future.
285
286==== in-random
287
288 '''syntax:''' (<number> <- in-random [<range> [<low>]])
289
290With no arguments, <number> is bound to a random inexact number
291uniformly distributed over 0.0 and 1.0, inclusive, on each iteration.
292
293With a single argument, <number> is bound to a random integer
294uniformly distributed over 0..<range>-1, inclusive.
295
296With two arguments, <number> is bound to a random integer uniformly
297distributed over <low>..<low>+<range>-1, inclusive.
298
299These are conceptually infinite sequences, and themselves never cause
300the loop to terminate.
301
302==== in-random-element
303
304 '''syntax:''' (<element> <- in-random-element <vector-or-list>)
305
306On each iteration, <element> is bound a random object uniformly
307chosen from the elements of the <vector-or-list> source.
308
309Elements may be repeated, so this is a conceptually infinite sequence.
310
311==== in-permutations
312
313 '''syntax:''' (<perm> <- in-permutations <list> [<n>])
314
315With one argument, <perm> is bound to the successive permutations of
316the elements of <list> in lexicographic order.  No assumptions about
317the elements are made - if <list> is a multi-set, duplicate
318permutations will arise.
319
320This is very fast and mutation free.  It uses only O(k) space, where k
321is the number of elements in <list>.  Beware that the number of
322permutations of n elements is n!, which grows extremely fast.
323
324<enscript highlight=scheme>
325  > (loop ((p <- in-permutations '(a b c) 2)) (write p) (newline))
326  (a b)
327  (a c)
328  (b a)
329  (b c)
330  (c a)
331  (c b)
332</enscript>
333
334==== in-combinations
335
336 '''syntax:''' (<comb> <- in-combinations <list> <n>)
337
338Similar to IN-PERMUTATIONS, but iterates over all combinations of <n>
339elements from <list> (i.e. order doesn't matter).
340
341<enscript highlight=scheme>
342  > (loop ((c <- in-combinations '(a b c) 2)) (write c) (newline))
343  (a b)
344  (a c)
345  (b c)
346</enscript>
347
348Using permutations and combinations can be a convenient way to build
349very extensive (albeit brute-force) test suites, among other things.
350
351==== in-cartesian-product
352
353 '''syntax:''' (<list> <- in-cartesian-product <list-of-lists>)
354
355Iterates over the full cartesian product (all joins) of
356<list-of-lists>, lexicographically (the rightmost list changes
357first).
358
359<enscript highlight=scheme>
360  > (loop ((x <- in-cartesian-product '((a b) (c d e)))) (write x) (newline))
361  (a c)
362  (a d)
363  (a e)
364  (b c)
365  (b d)
366  (b e)
367</enscript>
368
369==== in-hash-table
370
371 '''syntax:''' (<key> <value> <- in-hash-table <table>)
372
373Iterate over the <key> and <value> pairs of a hash-table <table>.  The
374current <key> being iterated over may be deleted from the table or
375have its value in the table changed safely.
376
377The result is unspecified if you add or remove other values to the
378table while it is being iterated over.  If you want to capture a safe
379snapshot of the table first, you can convert it to an alist and
380iterate over those values.
381
382<enscript highlight=scheme>
383  (define (hash-table-purge! pred table)
384    (loop ((k v <- in-table table))
385      (if (pred k v)
386        (hash-table-delete! table k))))
387</enscript>
388
389==== collecting
390
391 '''syntax:''' (<list> [<rev>] <- collecting <expr> [<cons> [<finalize> [<init>]]])
392
393The only of the standard iterators that introduces a final variable.
394<list> is bound only in the => final clause.  By default,
395
396* a <cons> of APPEND-REVERSE will append all the <expr>'s into a list.
397
398* a <finalize> of REVERSE-LIST->VECTOR will collect a vector, and
399IDENTITY will collect a reversed list.
400
401By specifying all of <cons>, <finalize> and <init> you could collect
402into any data structure.
403
404The optional <rev> is a loop variable representing the intermediate
405consed results.  You may override this manually to include or exclude
406values, or even reset the collected results mid-loop.
407
408This is really just syntactic sugar over an accumulated list to save
409you the trouble of reversing manually at the end.
410
411=== Implicit matching
412
413For any body variable (as described above, the ones derived from iterators,
414e.g. the elements in a list), instead of a simple name you can use any sexp,
415and it will be matched against the result as in Common-Lisp's
416destructuring-bind, except using the [[matchable]] syntax (described in
417[[http://chicken.wiki.br/Pattern%20matching|Pattern Matching]].  So for example, to iterate nicely over the pairs
418in an alist, you just do
419
420<enscript highlight=scheme>
421  (loop (((k . v) <- in-list alist))
422    (print "key: " k " value: " v))
423</enscript>
424
425This costs nothing if you don't use it, and is fast even if you do.
426
427=== Extending
428
429Adding your own iterators is easy.  When a loop includes a binding such as
430
431<enscript language=scheme>
432  (left ... <- in-iterator right ...)
433</enscript>
434
435then the iterator itself is called as a macro in the following
436form:
437
438<enscript language=scheme>
439  (in-iterator ((left ...) (right ...)) next . rest)
440</enscript>
441
442where next and rest are the continuation.  The continuation
443expects to be passed the appropriate information to insert in the
444loop, in the following form:
445
446<enscript language=scheme>
447  (next ((temp-var value) ...)     ; one-time bindings outside the loop
448        ((loop-var init step) ...) ; do-style description of loop variables
449        (done? ...)                ; termination tests
450        ((body-var value) ...)     ; body variables
451        ((final-var value) ...)    ; final result bindings in => clause
452        . rest)
453</enscript>
454
455Note that any or all of the terms may be empty lists - the
456iterator doesn't have to do anything.
457
458As an example, consider the following simplified implementation
459of IN-LIST:
460
461<enscript language=scheme>
462  (define-syntax in-list
463    (syntax-rules ()
464      ((in-list ((elt) (init-list)) next . rest)
465       ;; pass the info to next
466       (next ()                              ; no outer let bindings
467             ((ls init-list (cdr ls)))       ; loop variable, init, step
468             ((null? ls))                    ; termination tests
469             ((elt (car ls)))                ; body variables and values
470             ()                              ; no final result bindings
471             . rest))))
472</enscript>
473
474This implementation of IN-LIST causes the code
475
476<enscript language=scheme>
477  (loop ((x <- in-list '(1 2 3)))
478    (print x))
479</enscript>
480
481to expand to
482
483<enscript language=scheme>
484  (let ((ls '(1 2 3)))
485    (if (null? ls)
486      (if #f #f) ; unspecified value returned by default
487      (let ((x (car ls)))
488        (print x)
489        (lp (cdr ls)))))
490</enscript>
491
492Note the outer let bindings are empty because we don't have
493anything to remember - the loop just proceeds by cdr'ing down the
494LS loop variable.  In an interator such as IN-VECTOR, where you
495repeatedly VECTOR-REF the same vector, you'd want to bind the
496vector once so that it's not evaluated multiple times.
497
498The final result bindings are also usually empty.  Currently it's
499only used by COLLECTING to reverse the list that has been
500accumulated so far.
501
502=== Further reading
503
504See the article with message-id
505[[http://groups.google.com/group/comp.lang.scheme/msg/60dcac5ea812398|<1157562097.001179.11470@i42g2000cwa.googlegroups.com>]] posted on
506comp.lang.scheme in September of 2006 for the original version and a
507brief history of Lisp iteration constructs.
508
509=== Requirements
510
511[[http://www.call-with-current-continuation.org/eggs/syntax-case.html|syntax-case]]
512or [[syntactic-closures]]
513
514[[matchable]]
515
516
517=== License
518
519public domain
520
521=== History
522
523; 0.5 : srfi-4 uniform vectors and in-cartesian-product
524; 0.4 : fixing bug in IN-FILE
525; 0.3 : minor changes and cleanup
526; 0.2 : implicit matching
527; 0.1 : initial release
Note: See TracBrowser for help on using the repository browser.