Changeset 20539 in project


Ignore:
Timestamp:
09/27/10 15:00:24 (9 years ago)
Author:
Alex Shinn
Message:

cleanup

File:
1 edited

Legend:

Unmodified
Added
Removed
  • gazette/src/issues/5.wiki

    r20538 r20539  
    11((title . "Issue 5")
    22 (authors "Alex Shinn")
    3  (date . 1285489159))
     3 (date . 1285592401))
    44
    55== 0. Introduction
     
    3232by the CL defsystem macro.  {{system}} lets you define groups of files
    3333and their dependencies which can be loaded or compiled, and even
    34 re-loaded or compiled keeping track of modified files.  Use it for
    35 rapid development in the repl (or in the near future as a {{make}}
    36 alternative in your .setup files).
     34re-loaded or compiled keeping track of modified files.  Use it for a
     35{{make}} alternative in your .setup files, or for rapid development in
     36the repl!
    3737
    3838== 2. The Core - Bleeding Edge Development
     
    4646
    4747A serious compiler bug related to inlining was fixed (found with much
    48 help by Sven Hartrumpf), and several other bugs reported by Kon Lovett
    49 were fixed.
     48help by [[http://wiki.call-cc.org/users/sven-hartrumpf|Sven Hartrumpf]]),
     49and several other bugs reported by
     50[[http://wiki.call-cc.org/users/kon-lovett|Kon Lovett]] were fixed.
    5051
    5152A new foreign type `pointer-vectors' (vectors of native unboxed
     
    6162egg (should be fully backwards compatible, as long as "(use regex)
    6263(import irregex)" idiom is used; dependencies on regex unit not in egg
    63 repo, though), upgraded irregex version, many upstream bugfixes and
     64repo, though), upgraded irregex version, many upstream bug-fixes and
    6465optimizations, with many thanks to
    6566[[http://wiki.call-cc.org/users/peter-bex|Peter Bex]]
     
    111112A packrat parser is a parser for Parsing Expression Grammars (PEGs),
    112113which is essentially a recursive decent parser with backtracking plus
    113 memoization to promise a linear time parse.  They look similar to
    114 Context Free Grammars (CFGs), but are unambiguous by virtue of being
    115 ordered - the leftmost matching rule is always chosen.  PEGs also
    116 allow {{and}} and {{not}} rules, which allow them to parse languages
    117 that can't be described by CFGs.
     114memoization.  What does all that mean?  Well, a recursive decent
     115parser just looks ahead a fixed number of tokens and dispatches
     116accordingly.  Scheme's own s-expressions are a simple example of this,
     117since a datum can be determined by just the first few characters.
     118Since you just determine the rule once in advance and then recurse,
     119this lends itself to very simple parsers, and indeed {{read}} is very
     120easy to implement in Scheme.
     121
     122Adding backtracking removes the restriction on fixing the rule in
     123advance, allowing more complex grammars resembling those of typical
     124parser generators.  With backtracking we usually call the library a
     125parser combinator, and there are already a number of examples
     126available including Taylor Campbell's
     127[[http://mumble.net/~campbell/darcs/parscheme/|parscheme]].  The
     128disadvantage of a naive backtracking approach is that it may require
     129exponential time to parse.  A packrat parser just adds memoization,
     130which guarantees a linear time parse.
     131
     132The rules for PEGs look similar to Context Free Grammars (CFGs), but
     133are unambiguous by virtue of being ordered - the leftmost matching
     134rule is always chosen.  PEGs also allow {{and}} and {{not}} rules,
     135which allow them to parse languages that can't be described by CFGs.
    118136
    119137The
    120138[[http://bugs.call-cc.org/export/20226/release/4/packrat/doc/packrat.pdf|documentation for packrat]]
    121 is not too user-friendly, and among other things doesn't include any
     139is not too user-friendly - among other things doesn't include any
    122140examples of parsing from actual text.  This is because {{packrat}} is
    123141written to work over abstract streams of tokens, not just text, but
     
    142160                    (set! pos (update-parse-position pos x))
    143161                    (values old-pos (cons x x)))))))))
    144 
     162</enscript>
     163
     164You can just use this as-is without worrying about the details for
     165now.
     166
     167<enscript highlight="scheme">
    145168  ;; Now define a character-oriented version of the expression parser:
    146169  (define expr-parser
     
    155178             ((a <- '#\6) a) ((a <- '#\7) a) ((a <- '#\8) a)
    156179             ((a <- '#\9) a))))
    157 
    158   ;; ... and a utility function to parse from strings or ports,
     180</enscript>
     181
     182Here we describe our grammar as returning an {{expr}} (the initial
     183rule).  Following this is a list of definitions for the non-terminals
     184{{expr}}, {{mulexp}}, {{simple}}, {{num}}, {{digits}} and {{digit}},
     185which may recursively refer to each other.
     186
     187Each definition has one or more clauses which contain a sequence and
     188an action.  The action is normal Scheme code.  The sequence is a list
     189of other non-terminals referred to by name, and non-terminals referred
     190to by quoting them.  You can also bind a value to be bound in the
     191action by writing  {{<id> <- <pattern>}}.
     192
     193So the simplest (though longest) rule is for {{digit}} which simply
     194matches any digit character, binds is, and returns it.  These are
     195collected by the {{digits}} rule which conses a list of one or more
     196digits.  Notice that {{digits}} first tries to bind more than one
     197digit by recursively matching itself, defaulting to just a single
     198digit only when this fails - the rules are tried in left-to-right
     199order, only backtracking on failure.  Finally, {{num}} takes the
     200result of {{digits}} and constructs a number from it.
     201
     202The other rules are from the original documentation, and are fairly
     203straightforward parser rules.  We've chosen to compute the values
     204directly, using {{(+ a b)}} and {{(* a b)}}, but could just have
     205easily returned the results as s-expressions, e.g. with {{`(+ ,a ,b)}}
     206giving the user an AST to work with.
     207
     208Now to put together the parser and generator, let's write a simple
     209utility function:
     210
     211<enscript highlight="scheme">
     212  ;; Utility function to parse from strings or ports,
    159213  ;; and return the semantic results on success or #f on failure:
    160214  (define (expr-parse x)
     
    173227</enscript>
    174228
    175 As you can see, writing the grammar is very natural.  We've chosen to
    176 compute the values directly in the actions, but could also return them
    177 as s-expressions, e.g. using {{`(+ ,a ,b)}} to get an AST to work
    178 with.
     229We pass a dummy filename and a port to the generator, and pass the
     230result of this to {{base-generator->results}}, then call the parser on
     231it.  The parser results are available in
     232{{parse-result-semantic-value}}.
    179233
    180234The egg is still rather primitive, and is missing some useful features
    181235such as charsets (which would simplify our "digit" rule greatly), and
    182236arbitrary predicates, but is definitely something that can be built
    183 on.  For similar efforts you can search for Scheme "parser
    184 combinators," which often refer to the same thing without the
    185 memoization.  A parser combinator project worth checking out is Taylor
    186 Campbell's [[http://mumble.net/~campbell/darcs/parscheme/|parscheme]].
     237on.  It would be especially nice if rules could be composed - maybe
     238someone can add that!
    187239
    188240== 5. About the Chicken Gazette
Note: See TracChangeset for help on using the changeset viewer.