Opened 14 years ago
Closed 13 years ago
#325 closed enhancement (fixed)
ssax and Unicode entities
Reported by: | Ivan Raikov | Owned by: | Jim Ursetto |
---|---|---|---|
Priority: | major | Milestone: | |
Component: | extensions | Version: | 4.5.x |
Keywords: | ssax unicode entity | Cc: | |
Estimated difficulty: |
Description
The SSAX library includes a file, html-entity-codes.scm, that defines a table of Unicode entity codes. This table could be used to replace ssax:predefined-parsed-entities, so that XHTML files that include Unicode entity codes can be parsed with SSAX. However, the Chicken sxml-transforms library does not export these definitions.
The code that constructs the table of Unicode entity codes assumes a Unicode-aware integer->char, so it should not be included in sxml-transforms by default. Perhaps we need a separate egg, something like utf8-sxml-transforms, that uses the utf8 library?
Change History (6)
comment:1 Changed 14 years ago by
Milestone: | 4.6.0 |
---|
comment:2 Changed 14 years ago by
Summary: | sxml-transforms and Unicode entities → ssax and Unicode entities |
---|
I think that's right. ssax is about parsing, sxml-transforms is about transformation of already-parsed sxml.
I'll change the title to match. I don't know who maintains ssax, but I don't.
comment:3 follow-up: 4 Changed 14 years ago by
Keywords: | ssax added; sxml-transforms removed |
---|---|
Owner: | changed from sjamaan to Jim Ursetto |
Status: | new → assigned |
No one really maintains it, but I can make this change.
Ivan, do you just want me to export html-entity-unicode-chars from ssax, after fixing it to handle utf8? I don't know how to use the low-level SSAX parser, so if you could give me a short test case, that would be appreciated.
comment:4 Changed 14 years ago by
Yes, that would be fine. I also have not used the low level parser, but I will look at it and see if I can make it work. Any XML file that includes a Unicode entity (e.g. hellip) would be a good test case.
Replying to zbigniew:
No one really maintains it, but I can make this change.
Ivan, do you just want me to export html-entity-unicode-chars from ssax, after fixing it to handle utf8? I don't know how to use the low-level SSAX parser, so if you could give me a short test case, that would be appreciated.
comment:6 Changed 13 years ago by
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
Added html-entity-unicode-chars in version 5.0.5. I have absolutely no idea how to use it (you can't pass it to ssax:xml->sxml), but at least it has the right value!
Ivan,
integer->char is already Unicode-aware.
What is not Unicode-aware is converting this character to a string using (string c) or (make-string 1 c). However, you can use the system procedure ##sys#char->utf8-string:
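A minimal sketch of such a call, assuming a CHICKEN 4.x interpreter where core strings are byte sequences. ##sys#char->utf8-string is an internal, undocumented procedure, so its behaviour may differ between releases; the helper name entity-code->string is hypothetical and used only for illustration.

```scheme
;; Sketch only: relies on the internal procedure ##sys#char->utf8-string
;; named in this ticket. entity-code->string is a hypothetical helper.
(define (entity-code->string code)
  ;; integer->char accepts the full code point; the ##sys# call then
  ;; produces the UTF-8 encoded string for that character.
  (##sys#char->utf8-string (integer->char code)))

;; U+2026 HORIZONTAL ELLIPSIS, the code point behind the hellip entity.
(define hellip (entity-code->string #x2026))

;; Show the individual bytes of the resulting string.
(write (map char->integer (string->list hellip)))
(newline)
```

U+2026 is E2 80 A6 in UTF-8, so on a byte-string CHICKEN the list written above should contain those three byte values, and no utf8 egg is loaded at any point.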
It is trivial to modify html-entity-unicode-chars to call this instead of core make-string, which will not require the utf8 egg. In fact I did a similar thing in the 'ssax' egg so that numeric entities are parsed into utf8 sequences without requiring the utf8 egg.
Speaking of which, shouldn't this be filed against the ssax egg and not against sxml-transforms?