#146 closed defect (worksforme)
several char-sets seem to be broken
Reported by: | Moritz Heidkamp | Owned by: | |
---|---|---|---|
Priority: | major | Milestone: | |
Component: | core libraries | Version: | 4.3.0 |
Keywords: | srfi-14 char-set | Cc: | |
Estimated difficulty: |
Description
char-set:letter, for example, contains many non-displayable characters (missing glyphs), as can be seen by evaluating:
(char-set->string char-set:letter)
which yields
"�����������������������������������zyxwvutsrqponmlkjihgfedcbaZYXWVUTSRQPONMLKJIHGFEDCBA"
This is also true for char-set:lower-case, char-set:upper-case, char-set:printing, char-set:symbol and char-set:punctuation -- maybe even more char-sets are affected, but those are the ones I've checked ;)
Attachments (1)
Change History (10)
comment:1 Changed 15 years ago by
comment:2 Changed 15 years ago by
Resolution: | → worksforme |
---|---|
Status: | new → closed |
These are non-ascii latin1 characters. Depending on your terminal they will show up as garbage or as accented letters.
comment:3 follow-ups: 4 5 Changed 15 years ago by
Resolution: | worksforme |
---|---|
Status: | closed → reopened |
This is very odd as my terminal is very well able to display characters like é. I attach the file created by
(with-output-to-file "/tmp/char-set-problem" (cut print (char-set->string char-set:letter)))
This is on Arch Linux 2.6.32-ARCH #1 SMP PREEMPT Fri Dec 4 14:59:45 UTC 2009 i686 Intel(R) Core(TM)2 Duo CPU T7500 @ 2.20GHz GenuineIntel GNU/Linux with LANG=en_US.utf8 (if that helps).
Also:
CHICKEN
(c)2008-2009 The Chicken Team
(c)2000-2007 Felix L. Winkelmann
Version 4.3.0
linux-unix-gnu-x86 [ manyargs dload ptables ]
compiled 2009-12-15 on gut (Linux)
Changed 15 years ago by
Attachment: | char-set-problem added |
---|
comment:4 Changed 15 years ago by
Replying to syn:
This is very odd as my terminal is very well able to display characters like é. I attach the file created by
(with-output-to-file "/tmp/char-set-problem" (cut print (char-set->string char-set:letter)))
This is on Arch Linux 2.6.32-ARCH #1 SMP PREEMPT Fri Dec 4 14:59:45 UTC 2009 i686 Intel(R) Core(TM)2 Duo CPU T7500 @ 2.20GHz GenuineIntel GNU/Linux with LANG=en_US.utf8 (if that helps).
Thanks for the file. The hexdump is:
00000000: fffe fdfc fbfa f9f8 f6f5 f4f3 f2f1 f0ef ................
00000010: eeed eceb eae9 e8e7 e6e5 e4e3 e2e1 e0df ................
00000020: dedd dcdb dad9 d8d6 d5d4 d3d2 d1d0 cfce ................
00000030: cdcc cbca c9c8 c7c6 c5c4 c3c2 c1c0 bab5 ................
00000040: aa7a 7978 7776 7574 7372 7170 6f6e 6d6c .zyxwvutsrqponml
00000050: 6b6a 6968 6766 6564 6362 615a 5958 5756 kjihgfedcbaZYXWV
00000060: 5554 5352 5150 4f4e 4d4c 4b4a 4948 4746 UTSRQPONMLKJIHGF
00000070: 4544 4342 410a EDCBA.
The 0xff at the start is a valid latin1 character (ÿ, "y umlaut").
comment:5 Changed 15 years ago by
Replying to syn:
This is on Arch Linux 2.6.32-ARCH #1 SMP PREEMPT Fri Dec 4 14:59:45 UTC 2009 i686 Intel(R) Core(TM)2 Duo CPU T7500 @ 2.20GHz GenuineIntel GNU/Linux with LANG=en_US.utf8 (if that helps).
The terminal expects UTF8-encoded input, but the characters in the string are not UTF8 encoded. I think that is the root of the problem.
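One way to check this without relying on the terminal is to look at the code points instead of printing the characters. A minimal sketch, assuming CHICKEN 4.x (where srfi-1 and srfi-14 are core units loaded with use); non-ascii-letter-codes is a hypothetical helper name, not part of any library:

```scheme
;; CHICKEN 4.x sketch; srfi-1 and srfi-14 are assumed to be core units here.
(use srfi-1 srfi-14)

;; Code points of the members of char-set:letter that lie outside US-ASCII.
;; `non-ascii-letter-codes` is a hypothetical helper name.
(define (non-ascii-letter-codes)
  (filter (lambda (n) (> n 127))
          (map char->integer (char-set->list char-set:letter))))

;; Print each code point in hex, one per line (order is unspecified).
(for-each (lambda (n) (print (number->string n 16)))
          (non-ascii-letter-codes))
```

On the latin1 setup discussed in this ticket, the values should correspond to the non-ASCII bytes in the hexdump of comment:4 (aa, b5, ba, c0 and up), each written as a single byte. A UTF8 terminal expects two bytes for each of these characters, which would explain the replacement glyphs.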
comment:6 follow-up: 7 Changed 15 years ago by
Alright, so Chicken uses latin1 by default? That would explain it! I have been pointed to the utf8 egg meanwhile. (use utf8-srfi-14) redefines char-set:letter, which now no longer contains extended latin characters:
#;2> (char-set->string char-set:letter)
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
Is this intended?
However, thanks for the help!
comment:7 Changed 15 years ago by
Replying to syn:
Alright, so Chicken uses latin1 by default? That would explain it! I have been pointed to the utf8 egg meanwhile.
(use utf8-srfi-14)
redefines char-set:letter, which now no longer contains extended latin characters:
#;2> (char-set->string char-set:letter)
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
Is this intended?
I'm not an expert in these things, but I assume this is locale-dependent (though I don't assume the utf8 egg uses locale information). The set of latin1 letters contains more elements than the set of US-ASCII letters. The Right Thing would be to take locale information into account, but that is an awful mess and needs an expert.
comment:8 Changed 15 years ago by
Resolution: | → worksforme |
---|---|
Status: | reopened → closed |
What platform is this on? Are you sure this is not caused by your terminal font settings? Your examples work as expected for me under Chicken 4.3.0 and Debian Linux.