Changeset 30968 in project


Ignore:
Timestamp:
06/05/14 22:16:33 (6 years ago)
Author:
sjamaan
Message:

Update README notes for URI-generic parser alternatives to latest versions of everything

File:
1 edited

Legend:

Unmodified
Added
Removed
  • release/4/uri-generic/trunk/alternatives/README

    r20514 r30968  
    11Here are some alternative implementations of uri-generic, using
    22various parsing libraries.  They all passed the full testsuite at the
    3 time they were written.
     3time they were written.  Occasionally they are updated to make them
     4work again with the latest testsuite.
    45
    56The original uri-generic implementation (based on matchable) is a bit
     
    1011
    1112Here are some notes I made after benchmarking and implementing each version:
    12 (benchmarks on an iBook G4, using Chicken 4.6.2 with irregex 0.8.2)
     13(benchmarks on a Lenovo X230, Chicken 4.9.1, which ships w/ irregex 0.9.2)
     14
     15The benchmark can be done like this:
     16
     17(load "uri-generic.irregex.so") ;; Use the compiled version!
     18(load "../uri-generic.import.scm")
     19(import uri-generic)
     20(time (let loop ((i 0)) (unless (> i 10000) (uri-reference "http://call-with-current-continuation.org/foo/bar?mooh#qux") (loop (add1 i)))))
     21
     22You can compile with:
     23
     24for i in uri-generic.*.scm; do echo Compiling $i; csc -s -O3 $i; done
     25
     26Test with csi -s test.scm after changing it for the tested implementation.
    1327
    1428The +/- behind the LOC indicates that some parts are implemented using
     
    1832==============================================================================
    1933
    20 Current main implementation: based on matchable [[1058 LOC]]
     34Current main implementation: based on matchable [[1260 LOC]]
    2135
    22 #;1> ,t (let loop ((i 0)) (unless (> i 10000) (uri-reference "http://call-with-current-continuation.org/foo/bar?mooh#qux") (loop (add1 i))))
    23 7.055s CPU time, 0.747s GC time (major), 1320215 mutations, 85/16096 GCs (major/minor)
     360.252s CPU time, 0.028s GC time (major), 1320149/4900 mutations (total/tracked), 15/2139 GCs (major/minor)
    2437
    2538NOTES:
     
    3346------------------------------------------------------------------------------
    3447
    35 Implementation based on abnf [[790+/- LOC]]
     48Implementation based on abnf [[943 LOC]]
    3649
    37 #;1> ,t (let loop ((i 0)) (unless (> i 10000) (uri-reference "http://call-with-current-continuation.org/foo/bar?mooh#qux") (loop (add1 i))))
    38 26.32s CPU time, 5.039s GC time (major), 570140 mutations, 411/67768 GCs (major/minor)
     500.552s CPU time, 0.032s GC time (major), 1730190/3654 mutations (total/tracked), 13/4274 GCs (major/minor)
    3951
    4052NOTES:
    4153
    42 It always generates all possible matches.  For example, an abnf for
    43 "a*" would generate ("a" "aa" "aaa") when passed "aaa" as input.
    44 I think this is the major reason it's so slow.
     54abnf has undergone a few major API changes, which meant this
     55alternative implementation required a massive overhaul.  I'd be
     56hesitant to rely on a parsing egg which changes so radically.  On the
     57other hand, it may have stabilised by now.
    4558
    4659------------------------------------------------------------------------------
    4760
    48 irregex-based implementation [[695 LOC]]
     61Implementation based on comparse [[892 LOC]]
    4962
    50 #;1> ,t (let loop ((i 0)) (unless (> i 10000) (uri-reference "http://call-with-current-continuation.org/foo/bar?mooh#qux") (loop (add1 i))))
    51 7.659s CPU time, 0.76s GC time (major), 600143 mutations, 61/15838 GCs (major/minor)
     631.28s CPU time, 0.14s GC time (major), 480090/49717 mutations (total/tracked), 62/9842 GCs (major/minor)
     64
     65NOTES:
     66
     67Reasonably easy to use and the resulting parser is pretty readable,
     68but not very fast.  I presume it's because it keeps a lot of state
     69around for backtracking.
     70
     71------------------------------------------------------------------------------
     72
     73irregex-based implementation [[858 LOC]]
     74
     750.344s CPU time, 0.032s GC time (major), 600077/2532 mutations (total/tracked), 19/2711 GCs (major/minor)
    5276
    5377NOTES:
     
    6387was extremely slow (this benchmark took over 50 seconds to run), but
    6488with irregex 0.8.2, the irregex implementation performs similarly to
    65 the matchable implementation.
     89the matchable implementation (but it's still slightly slower).
    6690
    6791------------------------------------------------------------------------------
    6892
    69 packrat-based implementation [[842+/- LOC]]
     93packrat-based implementation [[997 LOC]]
    7094
    71 #;1> ,t (let loop ((i 0)) (unless (> i 10000) (uri-reference "http://call-with-current-continuation.org/foo/bar?mooh#qux") (loop (add1 i))))
    72 53.339s CPU time, 15.325s GC time (major), 3100393 mutations, 1547/340859 GCs (major/minor)
     952.44s CPU time, 0.964s GC time (major), 3110328/498768 mutations (total/tracked), 556/9723 GCs (major/minor)
    7396
    7497NOTES:
     98
     99In compiled mode it's the slowest of the bunch, and it's extremely
     100clumsy.  Do not use, avoid at all costs, etc.  Why?  Many reasons:
    75101
    76102It is extremely annoying that the macro requires you to provide a body
     
    95121the (/ ...) rule, but I couldn't get it to work.
    96122
     123Debugging is pretty terrible, due to the definitions being wrapped in
     124one large macro.  Because everything is wrapped up in one macro, you
     125can't easily reuse subparsers, unless you rip them out and put them in
     126their own macro.  This makes it completely uncomposable.
     127
    97128Possibly things are better when you just use the combinators instead,
    98129not using the macro. The advantage of the macro is that the resulting
    99130ruleset is quite readable.
    100 
    101 Furthermore, this parser has the same disadvantage that abnf has, it
    102 generates all possible parses and hence has quite a bit of overhead.
    103131
    104132=========================================================================
Note: See TracChangeset for help on using the changeset viewer.