Last change on this file since 14838 was 14838, checked in by sjamaan, 11 years ago

Add some docs on ssax. Will improve it later

[[tags: egg]]

== ssax

[[toc:]]

=== Description

Oleg Kiselyov's XML parser.

=== Author

Oleg Kiselyov, with some Chicken-specific modifications by Kirill
Lisovsky.  Minor changes by [[/users/felix winkelmann|felix winkelmann]] to make the code
suitable as an extension library.

=== Requirements

None

=== Documentation

See the official [[http://ssax.sourceforge.net|SSAX homepage]] for
comprehensive documentation.

The following procedure is exported:

<procedure>(ssax:xml->sxml PORT NAMESPACE-PREFIX-ASSIG)</procedure>

This procedure reads XML data from {{PORT}} and returns an SXML
representation. {{NAMESPACE-PREFIX-ASSIG}} is an alist that maps user
prefixes (symbols) to namespaces (URI strings).
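
As a minimal sketch of typical usage, assuming Chicken 4 (where {{(use ssax)}} loads the egg) and a string port standing in for a file, the following parses two small documents. The input strings and the {{h}} prefix are made up for illustration.

```scheme
(use ssax)  ; assumption: Chicken 4 egg loading syntax

;; Parse a document that uses no namespaces; the prefix alist is empty.
(define doc
  (ssax:xml->sxml (open-input-string "<doc><p>hello</p></doc>") '()))
;; doc => (*TOP* (doc (p "hello")))

;; Map the hypothetical user prefix h to the XHTML namespace URI, so that
;; element names in that namespace are reported with the h: prefix.
(define xhtml
  (ssax:xml->sxml
   (open-input-string
    "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body/></html>")
   '((h . "http://www.w3.org/1999/xhtml"))))
```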

The following macros are available:

<macro>(ssax:make-parser TAG1 PROC1 [TAG2 PROC2 ...])</macro>

Create a custom XML parser, an instance of the XML parsing framework.
This will be a SAX, a DOM or a specialized parser depending on the
supplied user-handlers.

The arguments to {{ssax:make-parser}} are tag/procedure pairs,
interleaved in the argument list.  In other words, {{TAG1}}, {{TAG2}}
etc. are '''unquoted'''(!) symbols that identify the type of procedure
that follows the tag; see below for the list of allowed tags.  The
macro expands to a procedure that represents a parser and
accepts two arguments, {{PORT}} and {{SEED}}.  {{PORT}} is the port
from which to read the XML data and {{SEED}} is the initial value of
an accumulator.  The accumulator is passed to the first handler that
is invoked, which returns a new value (for example, the old value with
something appended).  That value is passed on to the next handler, and
so on, in a {{FOLD}}-like fashion.  The value returned by the last
handler is the result of the parser.

Given below are the tags and the signatures of the corresponding procedures.
Not all tags have to be specified.  If some are omitted, reasonable
defaults will apply. {{SEED}} always represents the current value of
the accumulator that will eventually be returned by the parser.

Tag: {{DOCTYPE}}
Handler-procedure: {{PORT DOCNAME SYSTEMID INTERNAL-SUBSET? SEED}}

If {{INTERNAL-SUBSET?}} is {{#t}}, the current position in the port
is right after the {{#\[}} that begins the internal DTD subset.
The handler must finish reading this subset before it returns (or must call
{{ssax:skip-internal-dtd}} if it is not interested in reading it).

The port at exit must be at the first symbol after the whole
DOCTYPE declaration.
The handler-procedure must generate four values:
      ELEMS ENTITIES NAMESPACES SEED
See {{xml-decl::elems}} for {{ELEMS}}. It may be {{#f}} to switch
off the validation.
{{NAMESPACES}} will typically contain user prefixes for selected
URI symbols.
The default handler-procedure skips the internal subset, if any,
and returns {{(values #f '() '() SEED)}}.
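
The contract described above can be sketched as follows. This is an illustration under assumptions, not the egg's own default handler: {{doctype-info}} is a hypothetical helper, the egg is assumed to be loaded with Chicken 4's {{(use ssax)}}, and the three element-level handlers are supplied explicitly (taking {{STRING1 STRING2 SEED}} for character data, as in upstream SSAX) rather than relying on defaults.

```scheme
(use ssax)  ; assumption: Chicken 4 egg loading syntax

;; doctype-info is a hypothetical helper: it returns a list holding one
;; (DOCNAME SYSTEMID) entry if the document has a DOCTYPE declaration.
(define (doctype-info port)
  ((ssax:make-parser
    DOCTYPE
    (lambda (port docname systemid internal-subset? seed)
      ;; Consume the internal subset, if any, so that the port is left
      ;; just past the whole DOCTYPE declaration.
      (if internal-subset? (ssax:skip-internal-dtd port))
      ;; ELEMS is #f (no validation); no entities or namespaces declared.
      (values #f '() '() (cons (list docname systemid) seed)))
    ;; The element-level handlers pass the seed through unchanged.
    NEW-LEVEL-SEED
    (lambda (elem-gi attributes namespaces expected-content seed) seed)
    FINISH-ELEMENT
    (lambda (elem-gi attributes namespaces parent-seed seed) seed)
    CHAR-DATA-HANDLER
    (lambda (string1 string2 seed) seed))
   port '()))

(doctype-info
 (open-input-string "<!DOCTYPE doc SYSTEM \"doc.dtd\"><doc/>"))
```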

Tag: {{UNDECL-ROOT}}
Handler-procedure: {{ELEM-GI SEED}}

{{ELEM-GI}} is the {{UNRES-NAME}} of the root element. This procedure
is called when the XML document being parsed contains ''no'' {{DOCTYPE}}
declaration.
The handler-procedure, like the {{DOCTYPE}} handler-procedure above,
must generate four values:
      ELEMS ENTITIES NAMESPACES SEED
The default handler-procedure returns {{(values #f '() '() SEED)}}.

Tag: {{NEW-LEVEL-SEED}}
Handler-procedure: see {{ssax:make-elem-parser}}, {{new-level-seed}}

Tag: {{FINISH-ELEMENT}}
Handler-procedure: see {{ssax:make-elem-parser}}, {{finish-elem}}

Tag: {{CHAR-DATA-HANDLER}}
Handler-procedure: see {{ssax:make-elem-parser}}, {{char-data-handler}}

Tag: {{PI}}
Handler-procedure: see {{ssax:make-pi-parser}}
The default value is {{'()}}.
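
To make the {{FOLD}}-like threading of the seed concrete, here is a small sketch of a parser that counts elements. {{count-elements}} is a hypothetical name, Chicken 4's {{(use ssax)}} is assumed, and the character data handler is assumed to take {{STRING1 STRING2 SEED}} as in upstream SSAX.

```scheme
(use ssax)  ; assumption: Chicken 4 egg loading syntax

;; count-elements is a hypothetical helper; the seed is a running count.
(define (count-elements port)
  ((ssax:make-parser
    NEW-LEVEL-SEED
    ;; Called when an element starts: count it.
    (lambda (elem-gi attributes namespaces expected-content seed)
      (+ seed 1))
    FINISH-ELEMENT
    ;; Called when an element ends: keep the count reached inside it.
    (lambda (elem-gi attributes namespaces parent-seed seed)
      seed)
    CHAR-DATA-HANDLER
    ;; Character data does not affect the count.
    (lambda (string1 string2 seed)
      seed))
   port 0))

(count-elements (open-input-string "<a><b/><c/></a>"))
;; => 3
```

The tags are written unquoted and in upper case, exactly as listed above; only the handler procedures are ordinary expressions.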

<macro>(ssax:make-pi-parser PI-HANDLERS)</macro>

Create a parser to parse and process one Processing Instruction (PI)
element.  {{PI-HANDLERS}} is an alist of {{(PI-TAG . PI-HANDLER)}} pairs, where
{{PI-TAG}} is the name of the processing instruction and
{{PI-HANDLER}} is a procedure {{PORT PI-TAG SEED}}.

The handler should read the rest of the PI from {{PORT}}, up to and
including the combination "{{?>}}" that terminates the PI. The handler
should return a new seed.

One of the {{PI-TAG}}s may be the symbol {{*DEFAULT*}}. The
corresponding handler will handle PIs that no other handler will. If
the {{*DEFAULT*}} {{PI-TAG}} is not specified, {{ssax:make-pi-parser}}
will use a default handler that skips the body of the PI.
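
The sketch below shows one way such a handler alist might look when passed through the {{PI}} tag of {{ssax:make-parser}}. It rests on several assumptions: {{collect-pis}} is a hypothetical helper, the egg is loaded with Chicken 4's {{(use ssax)}}, the alist is written unquoted because it is a macro argument, and {{ssax:read-pi-body-as-string}} (a procedure of upstream SSAX) is assumed to be exported by the egg.

```scheme
(use ssax)  ; assumption: Chicken 4 egg loading syntax

;; collect-pis is a hypothetical helper: it returns a list of
;; (PI-TAG . BODY) pairs for the PIs found in the document.
(define (collect-pis port)
  (reverse
   ((ssax:make-parser
     ;; Element-level handlers pass the seed through unchanged.
     NEW-LEVEL-SEED
     (lambda (elem-gi attributes namespaces expected-content seed) seed)
     FINISH-ELEMENT
     (lambda (elem-gi attributes namespaces parent-seed seed) seed)
     CHAR-DATA-HANDLER
     (lambda (string1 string2 seed) seed)
     PI
     ;; A single *DEFAULT* entry handles every PI. The handler must read
     ;; up to and including the closing "?>"; here that is left to
     ;; ssax:read-pi-body-as-string (assumed to be exported).
     ((*DEFAULT* . (lambda (port pi-tag seed)
                     (cons (cons pi-tag (ssax:read-pi-body-as-string port))
                           seed)))))
    port '())))

(collect-pis (open-input-string "<doc><?target some data?></doc>"))
```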

<macro>(ssax:make-elem-parser new-level-seed finish-elem char-data-handler pi-handlers)</macro>

Create a parser to parse and process one element, including its
character content or child elements. The parser is typically
applied to the root element of a document.

The generated parser is a procedure
{{START-TAG-HEAD PORT ELEMS ENTITIES NAMESPACES PRESERVE-WS? SEED}}

{{new-level-seed}}
      procedure ELEM-GI ATTRIBUTES NAMESPACES EXPECTED-CONTENT SEED
where {{ELEM-GI}} is the {{RES-NAME}} of the element about to be
processed.  This procedure generates the seed to be passed to the
handlers that process the content of the element.

{{finish-elem}}
      procedure ELEM-GI ATTRIBUTES NAMESPACES PARENT-SEED SEED
This procedure is called when parsing of {{ELEM-GI}} is finished.
{{SEED}} is the result from the last content parser (or from
{{new-level-seed}} if the element has empty content).
{{PARENT-SEED}} is the same seed that was passed to {{new-level-seed}}.
The procedure generates the seed that will be the result of the
element parser.

{{char-data-handler}}
      procedure STRING1 STRING2 SEED
A string handler, as defined by upstream SSAX.  It is called with
fragments of character data and returns a new seed.

{{pi-handlers}}
See {{ssax:make-pi-parser}}.
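
The interplay of {{SEED}} and {{PARENT-SEED}} is easiest to see in a handler set that builds a tree. The sketch below supplies the handlers through {{ssax:make-parser}} (under the {{NEW-LEVEL-SEED}}, {{FINISH-ELEMENT}} and {{CHAR-DATA-HANDLER}} tags) rather than calling {{ssax:make-elem-parser}} directly. {{element-outline}} is a hypothetical name, Chicken 4's {{(use ssax)}} is assumed, the character data handler is assumed to take {{STRING1 STRING2 SEED}} as in upstream SSAX, and the input is assumed to use no namespaces, so each {{ELEM-GI}} is a plain symbol.

```scheme
(use ssax)  ; assumption: Chicken 4 egg loading syntax

;; element-outline is a hypothetical helper: it returns the nesting of
;; element names as a tree of the form (NAME CHILD ...), ignoring text.
(define (element-outline port)
  (car
   ((ssax:make-parser
     NEW-LEVEL-SEED
     ;; Start each element with a fresh, empty list of children.
     (lambda (elem-gi attributes namespaces expected-content seed)
       '())
     FINISH-ELEMENT
     ;; seed holds this element's children in reverse order;
     ;; parent-seed holds the siblings collected before this element.
     (lambda (elem-gi attributes namespaces parent-seed seed)
       (cons (cons elem-gi (reverse seed)) parent-seed))
     CHAR-DATA-HANDLER
     ;; Text is not part of the outline.
     (lambda (string1 string2 seed)
       seed))
    port '())))

(element-outline (open-input-string "<a><b/><c><d/></c></a>"))
;; => (a (b) (c (d)))
```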

=== Unicode compatibility

{{ssax:xml->sxml}} will convert numeric entities to UTF-8 byte sequences.  It does not depend on the [[utf8]] egg for this.

Otherwise, UTF-8 operation is not well tested.

=== Changelog

* 5.0.0 Port to Chicken 4; fresh import of the clean upstream CVS tree (which now has downcased names)
* 4.9.8 Convert numeric entities > 255 to UTF-8 [Jim Ursetto]
* 4.9.7 Using ##sys#read/peek-char instead of read/peek-char [Daishi Kato]
* 4.9.6 parser-error now raises a condition [Daishi Kato]
* 4.9.5 Fixed bug in error-reporting function
* 4.9.4 Replaced {{(apply string-append ...)}} calls with {{string-concatenate}}
* 4.9.3 Adapted to new setup scheme. Fixed a reentrancy-bug [Thanks to Bruce Butterfield]
* 4.9.2 {{SSAX:warn}} adds newline [Thanks to Sunnan]
* 4.9.1 Fixed exports for case-sensitivity.

=== License

Public Domain