source: project/wiki/eggref/4/ssax @ 14839

Last change on this file since 14839 was 14839, checked in by sjamaan, 11 years ago

Fix some markup. This is probably the last I'll do about these docs for a while, until I'm actually going to use ssax

[[tags: egg]]

== ssax

[[toc:]]

=== Description

Oleg Kiselyov's XML parser.

=== Author

Oleg Kiselyov, with some Chicken-specific modifications by Kirill
Lisovsky.  Minor changes by [[/users/felix winkelmann|felix winkelmann]] to make the code
suitable as an extension library.

=== Requirements

[[input-parse]]

=== Documentation

See the official [[http://ssax.sourceforge.net|SSAX homepage]] for
comprehensive documentation.
25
26The following procedure is exported:
27
28<procedure>(ssax:xml->sxml PORT NAMESPACE-PREFIX-ASSIG)</procedure>
29
30This procedure reads XML data from {{PORT}} and returns an SXML
31representation. {{NAMESPACE-PREFIX-ASSIG}} is an alist that maps user
32prefixes (symbols) to namespaces (URI strings).
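
A minimal sketch of typical use under Chicken 4, parsing a document held in a string. The helper name {{xml-string->sxml}} is hypothetical (it is not part of the egg), and {{call-with-input-string}} is assumed to come from the {{ports}} unit.

```scheme
(use ssax ports)

;; Hypothetical helper (not part of the egg): parse XML held in a string.
(define (xml-string->sxml str)
  (call-with-input-string str
    (lambda (port)
      ;; '() means that no user namespace prefixes are assigned.
      (ssax:xml->sxml port '()))))

(define doc
  (xml-string->sxml "<greeting lang=\"en\">Hello</greeting>"))

;; doc should now be (*TOP* (greeting (@ (lang "en")) "Hello"))
```

The result is wrapped in a {{*TOP*}} node, and attributes appear in an {{@}} list directly after the element name.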

The following macros are available:

<macro>(ssax:make-parser TAG1 PROC1 [TAG2 PROC2 ...])</macro>

Create a custom XML parser, an instance of the XML parsing framework.
This will be a SAX, a DOM or a specialized parser depending on the
supplied user-handlers.

The arguments to {{ssax:make-parser}} are tag/procedure pairs,
interleaved in the argument list.  In other words, {{TAG1}}, {{TAG2}}
etc. are '''unquoted'''(!) symbols that identify the type of procedure
that follows the tag; see below for the list of allowed tags.  The
output of this macro is a procedure that represents a parser which
accepts two arguments, {{PORT}} and {{SEED}}.  {{PORT}} is the port
from which to read the XML data and {{SEED}} is the initial value of
an accumulator that will be passed into the first procedure, where it
can be appended to and returned.  Then this value will be passed on to
the next procedure and so on to eventually obtain a result, in a
{{FOLD}}-like fashion.

Given below are tags and signatures of the corresponding procedures.
Not all tags have to be specified.  If some are omitted, reasonable
defaults will apply. {{SEED}} always represents the current value of
the accumulator that will eventually be returned by the parser.

* Tag: {{DOCTYPE}}
* Handler-procedure: {{PORT DOCNAME SYSTEMID INTERNAL-SUBSET? SEED}}

If {{INTERNAL-SUBSET?}} is {{#t}}, the current position in the port
is right after the {{#\[}} that begins the internal DTD subset.
The handler must finish reading this subset before it returns (or must call
{{ssax:skip-internal-dtd}} if it is not interested in reading it).

The port at exit must be at the first symbol after the whole
DOCTYPE declaration.
The handler-procedure must generate four values:
      ELEMS ENTITIES NAMESPACES SEED
See {{xml-decl::elems}} for {{ELEMS}}. It may be {{#f}} to switch
off validation.
{{NAMESPACES}} will typically contain user prefixes for selected
URI symbols.
The default handler-procedure skips the internal subset, if any,
and returns {{(values #f '() '() SEED)}}.

* Tag: {{UNDECL-ROOT}}
* Handler-procedure: {{ELEM-GI SEED}}

{{ELEM-GI}} is an {{UNRES-NAME}} of the root element. This procedure
is called when the XML document being parsed contains ''no'' {{DOCTYPE}}
declaration.
The handler-procedure, like the {{DOCTYPE}} handler-procedure above,
must generate four values:
       ELEMS ENTITIES NAMESPACES SEED
The default handler-procedure returns {{(values #f '() '() SEED)}}.

* Tag: {{NEW-LEVEL-SEED}}
* Handler-procedure: see {{ssax:make-elem-parser}}, new-level-seed

* Tag: {{FINISH-ELEMENT}}
* Handler-procedure: see {{ssax:make-elem-parser}}, finish-elem

* Tag: {{CHAR-DATA-HANDLER}}
* Handler-procedure: see {{ssax:make-elem-parser}}, char-data-handler

* Tag: {{PI}}
* Handler-procedure: see {{ssax:make-pi-parser}}

The default value for {{PI}} is {{'()}}.
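
The fold-like flow of {{SEED}} described above can be sketched with a parser that merely counts elements. This is an illustration, not code from the egg: the name {{count-elements}} is hypothetical, the three element handlers are all supplied explicitly rather than relying on defaults, and the {{STRING1 STRING2 SEED}} signature of the character data handler is taken from the upstream SSAX documentation.

```scheme
(use ssax ports)

;; Hypothetical example: count the elements in a document.
;; The seed is simply the number of elements finished so far.
(define (count-elements port)
  ((ssax:make-parser
    NEW-LEVEL-SEED
    ;; Entering an element: hand the running count down to its content.
    (lambda (elem-gi attributes namespaces expected-content seed)
      seed)
    FINISH-ELEMENT
    ;; Leaving an element: SEED already includes all children, add this one.
    (lambda (elem-gi attributes namespaces parent-seed seed)
      (+ seed 1))
    CHAR-DATA-HANDLER
    ;; Character data does not affect the count.
    (lambda (string1 string2 seed)
      seed))
   port 0))

(define element-count
  (call-with-input-string "<a><b/><c>text</c></a>" count-elements))
```

For the sample document, {{element-count}} should be 3. Note that the tags are written unquoted, exactly as in the macro signature.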

<macro>(ssax:make-pi-parser PI-HANDLERS)</macro>

Create a parser to parse and process one Processing Instruction (PI)
element.  {{PI-HANDLERS}} is an alist of {{(PI-TAG . PI-HANDLER)}} pairs, where
{{PI-TAG}} is the name of the processing instruction and
{{PI-HANDLER}} is a procedure {{PORT PI-TAG SEED}}.

The handler should read the rest of the PI from {{PORT}}, up to and
including the combination "{{?>}}" that terminates the PI. The handler
should return a new seed.

One of the {{PI-TAG}}s may be the symbol {{*DEFAULT*}}. The
corresponding handler will handle PIs that no other handler will. If
the {{*DEFAULT*}} {{PI-TAG}} is not specified, {{ssax:make-pi-parser}}
will assume the default handler that skips the body of the PI.
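
A sketch of a {{*DEFAULT*}} PI handler, passed through the {{PI}} tag of {{ssax:make-parser}}, that collects every processing instruction it sees. Two assumptions apply: {{collect-pis}} is a hypothetical name, and {{ssax:read-pi-body-as-string}} is a procedure from upstream SSAX that is assumed here to be accessible from the egg.

```scheme
(use ssax ports)

;; Hypothetical example: return an alist of (PI-TAG . BODY) pairs,
;; in document order, ignoring elements and character data.
(define (collect-pis port)
  (reverse
   ((ssax:make-parser
     NEW-LEVEL-SEED
     (lambda (elem-gi attributes namespaces expected-content seed) seed)
     FINISH-ELEMENT
     (lambda (elem-gi attributes namespaces parent-seed seed) seed)
     CHAR-DATA-HANDLER
     (lambda (string1 string2 seed) seed)
     PI
     ;; The alist is written literally; *DEFAULT* catches every PI.
     ((*DEFAULT* .
       (lambda (port pi-tag seed)
         ;; Assumed upstream helper: reads the PI body up to and
         ;; including the closing "?>" and returns it as a string.
         (cons (cons pi-tag (ssax:read-pi-body-as-string port))
               seed)))))
    port '())))

(define pis
  (call-with-input-string "<?target some data?><doc/>" collect-pis))
```

Each entry of {{pis}} pairs the PI name (a symbol) with its body; the handler satisfies the contract above because it consumes the PI through the terminating "{{?>}}" and returns a new seed.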

<macro>(ssax:make-elem-parser new-level-seed finish-elem char-data-handler pi-handlers)</macro>

Create a parser to parse and process one element, including its
character content or children elements. The parser is typically
applied to the root element of a document.

The generated parser is a procedure with the signature
{{START-TAG-HEAD PORT ELEMS ENTITIES NAMESPACES PRESERVE-WS? SEED}}

{{new-level-seed}}

      procedure ELEM-GI ATTRIBUTES NAMESPACES EXPECTED-CONTENT SEED
where {{ELEM-GI}} is a {{RES-NAME}} of the element about to be
processed.  This procedure is to generate the seed to be passed to
handlers that process the content of the element.

{{finish-elem}}

      procedure ELEM-GI ATTRIBUTES NAMESPACES PARENT-SEED SEED
This procedure is called when parsing of {{ELEM-GI}} is finished.  The
{{SEED}} is the result from the last content parser (or from
{{new-level-seed}} if the element has empty content).
{{PARENT-SEED}} is the same seed as was passed to {{new-level-seed}}.
The procedure is to generate a seed that will be the result of the
element parser.

{{char-data-handler}}

A string handler.

{{pi-handlers}}

See {{ssax:make-pi-parser}}.
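
The interplay of {{PARENT-SEED}} and {{SEED}} is easiest to see in a handler set that builds a tree. The following sketch supplies the handlers through {{ssax:make-parser}} and ignores attributes; {{xml->tree}} is a hypothetical name, and the {{STRING1 STRING2 SEED}} signature of the string handler is taken from the upstream SSAX documentation.

```scheme
(use ssax ports)

;; Hypothetical example: build nested lists of the form (NAME CHILD ...).
(define (xml->tree port)
  ((ssax:make-parser
    NEW-LEVEL-SEED
    ;; Start every element with an empty list of children.
    (lambda (elem-gi attributes namespaces expected-content seed)
      '())
    FINISH-ELEMENT
    ;; SEED holds this element's children in reverse order;
    ;; PARENT-SEED holds the siblings collected before this element.
    (lambda (elem-gi attributes namespaces parent-seed seed)
      (cons (cons elem-gi (reverse seed)) parent-seed))
    CHAR-DATA-HANDLER
    ;; Push the string fragments onto the current list of children.
    (lambda (string1 string2 seed)
      (if (string=? string2 "")
          (cons string1 seed)
          (cons string2 (cons string1 seed)))))
   port '()))

(define tree
  (call-with-input-string "<a><b>hi</b><c/></a>" xml->tree))
```

For the sample document, {{tree}} should be {{((a (b "hi") (c)))}}: the outer list is the top-level seed, to which the finished root element was added.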

=== Unicode compatibility

{{ssax:xml->sxml}} will convert numeric entities to UTF-8 byte sequences.  It does not depend on the [[utf8]] egg for this.

Otherwise, UTF-8 operation is not well tested.
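
A small sketch of the numeric entity conversion, assuming plain Chicken 4 strings (byte strings, without the [[utf8]] egg). The character reference {{&#955;}} (U+03BB) is expected to come out as the two-byte UTF-8 sequence {{CE BB}}.

```scheme
(use ssax ports)

(define sxml
  (call-with-input-string "<g>&#955;</g>"
    (lambda (port) (ssax:xml->sxml port '()))))

;; The text content of the root element: (*TOP* (g TEXT)).
(define text (cadr (cadr sxml)))

;; The expected UTF-8 encoding of U+03BB, built byte by byte.
(define expected
  (string (integer->char #xCE) (integer->char #xBB)))
```

Under these assumptions {{text}} and {{expected}} should be equal, and {{(string-length text)}} should be 2 because the length is counted in bytes.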

=== Changelog

* 5.0.0 Port to Chicken 4; fresh import of the clean upstream CVS tree (which now has downcased names)
* 4.9.8 Convert numeric entities > 255 to UTF-8 [Jim Ursetto]
* 4.9.7 Using ##sys#read/peek-char instead of read/peek-char [Daishi Kato]
* 4.9.6 parser-error now raises a condition [Daishi Kato]
* 4.9.5 Fixed bug in error-reporting function
* 4.9.4 Replaced {{(apply string-append ...)}} calls with {{string-concatenate}}
* 4.9.3 Adapted to new setup scheme. Fixed a reentrancy-bug [Thanks to Bruce Butterfield]
* 4.9.2 {{SSAX:warn}} adds newline [Thanks to Sunnan]
* 4.9.1 Fixed exports for case-sensitivity.

=== License

Public Domain