source: project/release/4/uri-generic/trunk/alternatives/README @ 20514

Last change on this file since 20514 was 20514, checked in by sjamaan, 10 years ago

uri-generic: update alternatives code to match the current main code. All tests succeed for irregex and packrat. abnf is broken, will fix later. Also updated the benchmark results (look at the improvments in irregex and drool :P)

File size: 5.0 KB
Line 
1Here are some alternative implementations of uri-generic, using
2various parsing libraries.  They all passed the full testsuite at the
3time they were written.
4
5The original uri-generic implementation (based on matchable) is a bit
6hard to read, but it is by far the most efficient. For readability I
7prefer the irregex one, based on composable Scheme Regular
8Expressions.  It is about average in performance compared to the other
9implementations and it will hopefully improve when irregex is optimized.
10
11Here are some notes I made after benchmarking and implementing each version:
12(benchmarks on an iBook G4, using Chicken 4.6.2 with irregex 0.8.2)
13
14The +/- behind the LOC indicates that some parts are implemented using
15irregexes.  If those parts would be converted to get rid of the irregex
16dependency, the lines of code would change (generally increase)
17
18==============================================================================
19
20Current main implementation: based on matchable [[1058 LOC]]
21
22#;1> ,t (let loop ((i 0)) (unless (> i 10000) (uri-reference "http://call-with-current-continuation.org/foo/bar?mooh#qux") (loop (add1 i))))
237.055s CPU time, 0.747s GC time (major), 1320215 mutations, 85/16096 GCs (major/minor)
24
25NOTES:
26
27Implementation is a bit hard to read & understand even if you know how
28it works. It is also the largest implementation (in lines of code).
29But it's faaaaaast! Turns out that converting a string to a list of chars
30once and then using an optimizing macro to match the list against expected
31chars is a very efficient way to do parsing :)
32
33------------------------------------------------------------------------------
34
35Implementation based on abnf [[790+/- LOC]]
36
37#;1> ,t (let loop ((i 0)) (unless (> i 10000) (uri-reference "http://call-with-current-continuation.org/foo/bar?mooh#qux") (loop (add1 i))))
3826.32s CPU time, 5.039s GC time (major), 570140 mutations, 411/67768 GCs (major/minor)
39
40NOTES:
41
42It always generates all possible matches.  For example, an abnf for
43"a*" would generate ("a" "aa" "aaa") when passed "aaa" as input.
44I think this is the major reason it's so slow.
45
46------------------------------------------------------------------------------
47
48irregex-based implementation [[695 LOC]]
49
50#;1> ,t (let loop ((i 0)) (unless (> i 10000) (uri-reference "http://call-with-current-continuation.org/foo/bar?mooh#qux") (loop (add1 i))))
517.659s CPU time, 0.76s GC time (major), 600143 mutations, 61/15838 GCs (major/minor)
52
53NOTES:
54
55Very concise but still very readable syntax when using named matches.
56I especially like that built-in regexes like hex-digit and such are all
57quoted, whereas custom bits are unquoted.  It's an immediate indication
58that the definition is to be found elsewhere in the code (AFAIK, this
59is not extendable so you can't make libraries of regexes in the
60same way; at that point everything of the library becomes unquoted too).
61
62The 0.8.1 version of irregex which shipped with Chickens up till 4.6.1
63was extremely slow (this benchmark took over 50 seconds to run), but
64with irregex 0.8.2, the irregex implementation performs similarly to
65the matchable implementation.
66
67------------------------------------------------------------------------------
68
69packrat-based implementation [[842+/- LOC]]
70
71#;1> ,t (let loop ((i 0)) (unless (> i 10000) (uri-reference "http://call-with-current-continuation.org/foo/bar?mooh#qux") (loop (add1 i))))
7253.339s CPU time, 15.325s GC time (major), 3100393 mutations, 1547/340859 GCs (major/minor)
73
74NOTES:
75
76It is extremely annoying that the macro requires you to provide a body
77expression.  There is no automatic way to let it just resolve to
78whatever the lower nonterminals resolve to, nor to combine code from
79a sequence of expressions so you have to spell it out every time.
80
81In general, the macro just feels clumsy and you have to keep the
82specification next to your code all the time because it feels so
83unnatural (requiring, for my feeling, superfluous sets of parentheses).
84
85The macro does not allow escaping to Scheme quickly to produce a
86dynamically generated parser in-line.  You can define procedures
87outside of the macro which can be called as "nonterminals" but you
88can't, for example, parameterize a parser on-the-fly.
89
90The macro is much more like plain old BNF or normal yacc.  This means
91that for simple things like optional arguments you need to make a
92separate rule containing both the argument and an empty sequence.
93Quite annoying, after you've worked with any of the above parser libs
94where that's no problem.  Perhaps I just misunderstood how to use
95the (/ ...) rule, but I couldn't get it to work.
96
97Possibly things are better when you just use the combinators instead,
98not using the macro. The advantage of the macro is that the resulting
99ruleset is quite readable.
100
101Furthermore, this parser has the same disadvantage that abnf has, it
102generates all possible parses and hence has quite a bit of overhead.
103
104=========================================================================
105
106Other notes that people interested in Scheme parsing might want to read
107(by zbigniew): http://3e8.org/zb/eggs/lexing.txt
Note: See TracBrowser for help on using the repository browser.