Opened 8 years ago
Closed 22 months ago
#1374 closed defect (wontfix)
`display' issue with UTF-8
Reported by: | Mario Domenech Goulart | Owned by: | |
---|---|---|---|
Priority: | major | Milestone: | 5.4 |
Component: | core libraries | Version: | 4.12.0 |
Keywords: | display, ##sys#print, utf8 | Cc: | |
Estimated difficulty: | hard |
Description
I received a bug report against awful (https://github.com/mario-goulart/awful/issues/5), but the issue seems to be related to CHICKEN.
Here's a smaller test case to illustrate the problem:
    $ cat test.scm
    (cond-expand
      (chicken (use utf8))
      (else #f))

    (let ((chars (string->list "出")))
      (display "<html><head><meta charset=\"utf-8\"/></head>")
      (display chars)
      (display "<br>")
      (display "(")
      (display (car chars))
      (display ")")
      (display "</html>"))
To see the problem:
    $ csi -s test.scm > chicken-out.html
    $ firefox chicken-out.html

It seems that `display` is messing up when printing the list containing the UTF-8 char.
Gauche does the right thing:
    $ gosh test.scm > gauche-out.html
    $ firefox gauche-out.html
The two output files differ, of course:
    $ cmp gauche-out.html chicken-out.html
    gauche-out.html chicken-out.html differ: byte 44, line 1
Change History (5)
comment:2 Changed 7 years ago by
Estimated difficulty: | → hard |
---|
comment:3 Changed 3 years ago by
Milestone: | someday → 5.4 |
---|
The problem seems to be that the utf8 egg redefines `display` as a procedure which special-cases characters, but hands off displaying of nested structures to the built-in `display`, which then messes up.
The built-in `display` uses `outchr`, which calls the port's `write-char` procedure, which for regular file-based ports is defined as `C_display_char`, which uses `C_fputc`.
Perhaps one of these should be changed analogously to `##sys#char->utf8-string` so that we're not calling `putc` directly on wide characters? I think it might be good to do that at the lowest level possible (i.e., either `C_display_char` or `write-char`).
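To make the suspected failure mode concrete, here is a minimal C sketch of what happens when a wide code point goes straight to a `putc`-style call, next to a version that encodes first. The helper names `low_byte`, `utf8_encode` and `put_utf8` are hypothetical and are not part of the CHICKEN runtime; this is only an illustration of the idea, not the actual `C_display_char` code. It relies on the standard C behaviour that `fputc` converts its `int` argument to `unsigned char`. The `cmp` output in the description fits this picture: the markup before the list is 42 bytes, the opening parenthesis is byte 43, so byte 44 is where the first byte of the character lands.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: the byte a bare fputc-style call would emit for a
   code point. fputc converts its argument to unsigned char, so only the
   low 8 bits survive. */
unsigned char low_byte(unsigned long cp)
{
    return (unsigned char)cp;
}

/* Hypothetical helper: encode one code point as UTF-8 into buf (which must
   hold at least 4 bytes). Returns the number of bytes produced, or 0 if cp
   is a surrogate or lies above U+10FFFF. */
size_t utf8_encode(unsigned long cp, unsigned char *buf)
{
    if (cp < 0x80) {
        buf[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        buf[0] = (unsigned char)(0xC0 | (cp >> 6));
        buf[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0; /* surrogates are not valid scalar values */
        buf[0] = (unsigned char)(0xE0 | (cp >> 12));
        buf[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        buf[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        buf[0] = (unsigned char)(0xF0 | (cp >> 18));
        buf[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        buf[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        buf[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Write cp to out as UTF-8, one fputc call per encoded byte. Returns the
   number of bytes written, or 0 on an invalid code point or write error. */
size_t put_utf8(unsigned long cp, FILE *out)
{
    unsigned char buf[4];
    size_t n, i;

    assert(out != NULL);
    n = utf8_encode(cp, buf);
    for (i = 0; i < n; i++) {
        if (fputc(buf[i], out) == EOF)
            return 0;
    }
    return n;
}
```

For the character in the test case (U+51FA), `low_byte` yields the single byte 0xFA, while `utf8_encode` yields the three bytes E5 87 BA that a UTF-8 reader expects.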
comment:4 Changed 3 years ago by
Hmm, on second thought, that would break writing of raw bytes or latin1. Perhaps this is better solved by the utf8 egg overloading the port with a custom port that calls the underlying port's `write-char` in the described way, and then handing that off to the built-in `display`?
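The layering concern can be sketched in C as follows, under loudly stated assumptions: `struct byte_sink`, `sink_put_byte` and `sink_put_char` are hypothetical stand-ins for a port, its raw `write-char`, and an encoding wrapper on top of it. None of them exist in CHICKEN or the utf8 egg. The point is only that the lowest layer stays byte-transparent, so raw bytes and latin1 data pass through unchanged, while the encoding happens in a separate layer that callers opt into.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a port: a small in-memory byte buffer. */
struct byte_sink {
    unsigned char data[16];
    size_t len;
};

/* Lowest layer: stores exactly the byte it is given, like a raw
   write-char on a binary or latin1 stream. */
void sink_put_byte(struct byte_sink *s, unsigned char b)
{
    assert(s->len < sizeof s->data);
    s->data[s->len++] = b;
}

/* Wrapping layer: treats its argument as a Unicode code point, encodes it
   as UTF-8, and hands each resulting byte to sink_put_byte. Assumes cp is
   a valid scalar value (no surrogate or range checks in this sketch). */
void sink_put_char(struct byte_sink *s, unsigned long cp)
{
    if (cp < 0x80) {
        sink_put_byte(s, (unsigned char)cp);
    } else if (cp < 0x800) {
        sink_put_byte(s, (unsigned char)(0xC0 | (cp >> 6)));
        sink_put_byte(s, (unsigned char)(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {
        sink_put_byte(s, (unsigned char)(0xE0 | (cp >> 12)));
        sink_put_byte(s, (unsigned char)(0x80 | ((cp >> 6) & 0x3F)));
        sink_put_byte(s, (unsigned char)(0x80 | (cp & 0x3F)));
    } else {
        sink_put_byte(s, (unsigned char)(0xF0 | (cp >> 18)));
        sink_put_byte(s, (unsigned char)(0x80 | ((cp >> 12) & 0x3F)));
        sink_put_byte(s, (unsigned char)(0x80 | ((cp >> 6) & 0x3F)));
        sink_put_byte(s, (unsigned char)(0x80 | (cp & 0x3F)));
    }
}
```

With this split, writing the value 0xE9 through `sink_put_byte` leaves the single byte E9 in the buffer (what a latin1 consumer expects), whereas writing it through `sink_put_char` produces the two bytes C3 A9. Encoding unconditionally at the lowest level would force the second behaviour on every caller, which is the breakage described above.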
comment:5 Changed 22 months ago by
Resolution: | → wontfix |
---|---|
Status: | new → closed |
This will be addressed by a fully unicode-aware string representation in the next major release.
That’s just because CHICKEN strings are byte strings, not utf-8 strings.
(use utf8) at the top of the file should solve the issue here.