source: project/release/4/uri-generic/trunk/alternatives/README @ 30968

Last change on this file since 30968 was 30968, checked in by sjamaan, 6 years ago

Update README notes for URI-generic parser alternatives to latest versions of everything

File size: 5.8 KB
Line 
1Here are some alternative implementations of uri-generic, using
2various parsing libraries.  They all passed the full testsuite at the
3time they were written.  Occasionally they are updated to make them
4work again with the latest testsuite.
5
6The original uri-generic implementation (based on matchable) is a bit
7hard to read, but it is by far the most efficient. For readability I
8prefer the irregex one, based on composable Scheme Regular
9Expressions.  It is about average in performance compared to the other
10implementations and it will hopefully improve when irregex is optimized.
11
12Here are some notes I made after benchmarking and implementing each version:
13(benchmarks on a Lenovo X230, Chicken 4.9.1, which ships w/ irregex 0.9.2)
14
15The benchmark can be done like this:
16
17(load "uri-generic.irregex.so") ;; Use the compiled version!
18(load "../uri-generic.import.scm")
19(import uri-generic)
20(time (let loop ((i 0)) (unless (> i 10000) (uri-reference "http://call-with-current-continuation.org/foo/bar?mooh#qux") (loop (add1 i)))))
21
22You can compile with:
23
24for i in uri-generic.*.scm; do echo Compiling $i; csc -s -O3 $i; done
25
26Test with csi -s test.scm after changing it for the tested implementation.
27
28The +/- behind the LOC indicates that some parts are implemented using
29irregexes.  If those parts would be converted to get rid of the irregex
30dependency, the lines of code would change (generally increase)
31
32==============================================================================
33
34Current main implementation: based on matchable [[1260 LOC]]
35
360.252s CPU time, 0.028s GC time (major), 1320149/4900 mutations (total/tracked), 15/2139 GCs (major/minor)
37
38NOTES:
39
40Implementation is a bit hard to read & understand even if you know how
41it works. It is also the largest implementation (in lines of code).
42But it's faaaaaast! Turns out that converting a string to a list of chars
43once and then using an optimizing macro to match the list against expected
44chars is a very efficient way to do parsing :)
45
46------------------------------------------------------------------------------
47
48Implementation based on abnf [[943 LOC]]
49
500.552s CPU time, 0.032s GC time (major), 1730190/3654 mutations (total/tracked), 13/4274 GCs (major/minor)
51
52NOTES:
53
54abnf has undergone a few major API changes, which meant this
55alternative implementation required a massive overhaul.  I'd be
56hesitant to rely on a parsing egg which changes so radically.  On the
57other hand, it may have stabilised by now.
58
59------------------------------------------------------------------------------
60
61Implementation based on comparse [[892 LOC]]
62
631.28s CPU time, 0.14s GC time (major), 480090/49717 mutations (total/tracked), 62/9842 GCs (major/minor)
64
65NOTES:
66
67Reasonably easy to use and the resulting parser is pretty readable,
68but not very fast.  I presume it's because it keeps a lot of state
69around for backtracking.
70
71------------------------------------------------------------------------------
72
73irregex-based implementation [[858 LOC]]
74
750.344s CPU time, 0.032s GC time (major), 600077/2532 mutations (total/tracked), 19/2711 GCs (major/minor)
76
77NOTES:
78
79Very concise but still very readable syntax when using named matches.
80I especially like that built-in regexes like hex-digit and such are all
81quoted, whereas custom bits are unquoted.  It's an immediate indication
82that the definition is to be found elsewhere in the code (AFAIK, this
83is not extendable so you can't make libraries of regexes in the
84same way; at that point everything of the library becomes unquoted too).
85
86The 0.8.1 version of irregex which shipped with Chickens up till 4.6.1
87was extremely slow (this benchmark took over 50 seconds to run), but
88with irregex 0.8.2, the irregex implementation performs similarly to
89the matchable implementation (but it's still slightly slower).
90
91------------------------------------------------------------------------------
92
93packrat-based implementation [[997 LOC]]
94
952.44s CPU time, 0.964s GC time (major), 3110328/498768 mutations (total/tracked), 556/9723 GCs (major/minor)
96
97NOTES:
98
99In compiled mode it's the slowest of the bunch, and it's extremely
100clumsy.  Do not use, avoid at all costs, etc.  Why?  Many reasons:
101
102It is extremely annoying that the macro requires you to provide a body
103expression.  There is no automatic way to let it just resolve to
104whatever the lower nonterminals resolve to, nor to combine code from
105a sequence of expressions so you have to spell it out every time.
106
107In general, the macro just feels clumsy and you have to keep the
108specification next to your code all the time because it feels so
109unnatural (requiring, for my feeling, superfluous sets of parentheses).
110
111The macro does not allow escaping to Scheme quickly to produce a
112dynamically generated parser in-line.  You can define procedures
113outside of the macro which can be called as "nonterminals" but you
114can't, for example, parameterize a parser on-the-fly.
115
116The macro is much more like plain old BNF or normal yacc.  This means
117that for simple things like optional arguments you need to make a
118separate rule containing both the argument and an empty sequence.
119Quite annoying, after you've worked with any of the above parser libs
120where that's no problem.  Perhaps I just misunderstood how to use
121the (/ ...) rule, but I couldn't get it to work.
122
123Debugging is pretty terrible, due to the definitions being wrapped in
124one large macro.  Because everything is wrapped up in one macro, you
125can't easily reuse subparsers, unless you rip them out and put them in
126their own macro.  This makes it completely uncomposable.
127
128Possibly things are better when you just use the combinators instead,
129not using the macro. The advantage of the macro is that the resulting
130ruleset is quite readable.
131
132=========================================================================
133
134Other notes that people interested in Scheme parsing might want to read
135(by zbigniew): http://3e8.org/zb/eggs/lexing.txt
Note: See TracBrowser for help on using the repository browser.