Opened 8 years ago
Closed 3 years ago
#1374 closed defect (wontfix)
`display' issue with UTF-8
| Reported by: | Mario Domenech Goulart | Owned by: | |
|---|---|---|---|
| Priority: | major | Milestone: | 5.4 | 
| Component: | core libraries | Version: | 4.12.0 | 
| Keywords: | display, ##sys#print, utf8 | Cc: | |
| Estimated difficulty: | hard | 
Description
I received a bug report against awful (https://github.com/mario-goulart/awful/issues/5), but the issue seems to be related to CHICKEN.
Here's a smaller test case to illustrate the problem:
$ cat test.scm
(cond-expand
  (chicken (use utf8))
  (else #f))
(let ((chars (string->list "出")))
  (display "<html><head><meta charset=\"utf-8\"/></head>")
  (display chars)
  (display "<br>")
  (display "(")
  (display (car chars))
  (display ")")
  (display "</html>"))
To see the problem:
$ csi -s test.scm > chicken-out.html
$ firefox chicken-out.html
It seems that display mangles the output when printing the list containing the UTF-8 character.
Gauche does the right thing:
$ gosh test.scm > gauche-out.html
$ firefox gauche-out.html
The two output files differ, of course:
$ cmp gauche-out.html chicken-out.html
gauche-out.html chicken-out.html differ: byte 44, line 1
Change History (5)
comment:2 Changed 8 years ago by
| Estimated difficulty: | → hard | 
|---|
comment:3 Changed 4 years ago by
| Milestone: | someday → 5.4 | 
|---|
The problem seems to be that the utf8 egg redefines display as a procedure which special-cases characters but hands off the displaying of nested structures to the built-in display, which then messes up.
The built-in display uses outchr, which calls the port's write-char procedure. For regular file-based ports, that procedure is defined as C_display_char, which uses C_fputc.
Perhaps one of these should be changed analogously to ##sys#char->utf8-string so that we're not calling putc directly on wide characters? I think it might be good to do that at the lowest level possible (i.e., either C_display_char or write-char).
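To illustrate the idea of encoding before writing, here is a minimal C sketch. It is not CHICKEN's actual runtime code: utf8_encode and put_utf8_char are hypothetical names introduced for this example, and the sketch assumes the character arrives as a Unicode code point. Since fputc converts its argument to unsigned char, passing a code point above 0xFF to it directly would write only one truncated byte, whereas the sketch writes each UTF-8 byte in turn.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not a CHICKEN API): encode one Unicode code point
   as UTF-8 into out[], which must have room for 4 bytes.
   Returns the number of bytes written, or 0 for an invalid code point. */
size_t utf8_encode(unsigned long cp, unsigned char out[4])
{
    assert(out != NULL);
    if (cp < 0x80) {                      /* 1 byte: plain ASCII */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* 2 bytes */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                   /* 3 bytes */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                     /* surrogates are not characters */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                 /* 4 bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                             /* beyond the Unicode range */
}

/* Hypothetical putc-style writer: emits every UTF-8 byte of the code point
   instead of a single byte. Returns 0 on success, EOF on failure. */
int put_utf8_char(unsigned long cp, FILE *fp)
{
    unsigned char buf[4];
    size_t n = utf8_encode(cp, buf);
    if (n == 0)
        return EOF;
    return fwrite(buf, 1, n, fp) == n ? 0 : EOF;
}
```

For the character from the test case, U+51FA, utf8_encode produces the three bytes E5 87 BA.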
comment:4 Changed 4 years ago by
Hmm, on second thought, that would break writing of raw bytes or Latin-1. Perhaps this is better solved by having the utf8 egg wrap the port in a custom port that calls the underlying port's write-char in the described way, and then hand that custom port off to the built-in display?
comment:5 Changed 3 years ago by
| Resolution: | → wontfix | 
|---|---|
| Status: | new → closed | 
This will be addressed by a fully unicode-aware string representation in the next major release.


That’s just because CHICKEN strings are byte strings, not UTF-8 strings.
(use utf8) at the top of the file should solve the issue here.