Opened 12 years ago

Closed 12 years ago

Last modified 12 years ago

#146 closed defect (worksforme)

several char-sets seem to be broken

Reported by: Moritz Heidkamp
Owned by:
Priority: major
Milestone:
Component: core libraries
Version: 4.3.0
Keywords: srfi-14 char-set
Cc:
Estimated difficulty:

Description

char-set:letter, for example, contains many non-displayable characters (missing glyphs), as can be seen by evaluating:

  (char-set->string char-set:letter)

which yields

  "�����������������������������������zyxwvutsrqponmlkjihgfedcbaZYXWVUTSRQPONMLKJIHGFEDCBA"

This is also true for char-set:lower-case, char-set:upper-case, char-set:printing, char-set:symbol and char-set:punctuation -- maybe even more char-sets are affected, but those are the ones I've checked ;)

Attachments (1)

char-set-problem (118 bytes) - added by Moritz Heidkamp 12 years ago.


Change History (10)

comment:1 Changed 12 years ago by Ivan Raikov

What platform is this on? Are you sure this is not caused by your terminal font settings? Your examples work as expected for me under Chicken 4.3.0 and Debian Linux.

comment:2 Changed 12 years ago by felix winkelmann

Resolution: worksforme
Status: new → closed

These are non-ascii latin1 characters. Depending on your terminal they will show up as garbage or as accented letters.

See http://htmlhelp.com/reference/charset/iso224-255.html

comment:3 Changed 12 years ago by Moritz Heidkamp

Resolution: worksforme
Status: closed → reopened

This is very odd as my terminal is very well able to display characters like é. I attach the file created by

(with-output-to-file "/tmp/char-set-problem" (cut print (char-set->string char-set:letter)))

This is on Arch Linux 2.6.32-ARCH #1 SMP PREEMPT Fri Dec 4 14:59:45 UTC 2009 i686 Intel(R) Core(TM)2 Duo CPU T7500 @ 2.20GHz GenuineIntel GNU/Linux with LANG=en_US.utf8 (if that helps).

Also:

CHICKEN
(c)2008-2009 The Chicken Team
(c)2000-2007 Felix L. Winkelmann
Version 4.3.0
linux-unix-gnu-x86 [ manyargs dload ptables ]
compiled 2009-12-15 on gut (Linux)

Changed 12 years ago by Moritz Heidkamp

Attachment: char-set-problem added

comment:4 in reply to:  3 Changed 12 years ago by felix winkelmann

Replying to syn:

> This is very odd as my terminal is very well able to display characters like é. I attach the file created by
>
> (with-output-to-file "/tmp/char-set-problem" (cut print (char-set->string char-set:letter)))
>
> This is on Arch Linux 2.6.32-ARCH #1 SMP PREEMPT Fri Dec 4 14:59:45 UTC 2009 i686 Intel(R) Core(TM)2 Duo CPU T7500 @ 2.20GHz GenuineIntel GNU/Linux with LANG=en_US.utf8 (if that helps).

Thanks for the file. The hexdump is:

00000000: fffe fdfc fbfa f9f8 f6f5 f4f3 f2f1 f0ef ................
00000010: eeed eceb eae9 e8e7 e6e5 e4e3 e2e1 e0df ................
00000020: dedd dcdb dad9 d8d6 d5d4 d3d2 d1d0 cfce ................
00000030: cdcc cbca c9c8 c7c6 c5c4 c3c2 c1c0 bab5 ................
00000040: aa7a 7978 7776 7574 7372 7170 6f6e 6d6c .zyxwvutsrqponml
00000050: 6b6a 6968 6766 6564 6362 615a 5958 5756 kjihgfedcbaZYXWV
00000060: 5554 5352 5150 4f4e 4d4c 4b4a 4948 4746 UTSRQPONMLKJIHGF
00000070: 4544 4342 410a EDCBA.

The 0xff at the start is a valid latin1 character (ÿ, "y umlaut").
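The same byte values can also be inspected from the REPL without a hexdump. The following is a minimal sketch, assuming CHICKEN 4 (where `use` loads the srfi-1 and srfi-14 units and each character in a string is a single byte); `non-ascii-code-points` is a hypothetical helper name, not part of SRFI-14:

```scheme
;; Assumes CHICKEN 4; on CHICKEN 5 the units would be imported differently.
(use srfi-1 srfi-14)

;; Hypothetical helper: return the code points of all members of the
;; char-set CS that lie outside the 7-bit ASCII range.
(define (non-ascii-code-points cs)
  (filter (lambda (n) (> n 127))
          (map char->integer (char-set->list cs))))

;; Example call (the order of the result is not specified by SRFI-14):
;; (non-ascii-code-points char-set:letter)
```

Any value this returns for char-set:letter corresponds to one of the bytes above 0x7f in the hexdump.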

comment:5 in reply to:  3 Changed 12 years ago by felix winkelmann

Replying to syn:

> This is on Arch Linux 2.6.32-ARCH #1 SMP PREEMPT Fri Dec 4 14:59:45 UTC 2009 i686 Intel(R) Core(TM)2 Duo CPU T7500 @ 2.20GHz GenuineIntel GNU/Linux with LANG=en_US.utf8 (if that helps).

The terminal expects UTF-8 encoding, but the characters in the string are not UTF-8 encoded. I think that is the root of the problem.
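To illustrate the mismatch: in UTF-8, every Latin-1 character above 0x7f is written as two bytes, so a lone byte such as 0xe9 or 0xff is not a valid sequence and a UTF-8 terminal cannot render it. The following is only a sketch of that re-encoding, assuming CHICKEN 4 byte strings; `latin1->utf8` is a hypothetical helper, not something provided by CHICKEN or the utf8 egg:

```scheme
;; Hypothetical helper: re-encode a Latin-1 byte string as UTF-8.
;; Assumes each character in STR holds one byte in the range 0..255.
(define (latin1->utf8 str)
  (let loop ((chars (string->list str)) (acc '()))
    (if (null? chars)
        (list->string (reverse acc))
        (let ((n (char->integer (car chars))))
          (loop (cdr chars)
                (if (< n 128)
                    ;; ASCII bytes are identical in Latin-1 and UTF-8.
                    (cons (car chars) acc)
                    ;; 0x80..0xff become a lead byte (110000xx) followed by
                    ;; a continuation byte (10xxxxxx). ACC is built in
                    ;; reverse, so the lead byte is consed on first.
                    (cons (integer->char
                           (bitwise-ior #x80 (bitwise-and n #x3F)))
                          (cons (integer->char
                                 (bitwise-ior #xC0 (arithmetic-shift n -6)))
                                acc))))))))
```

With this helper, the byte 0xe9 (é in Latin-1) becomes the two bytes 0xc3 0xa9, which is what a terminal running under LANG=en_US.utf8 expects to receive.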

comment:6 Changed 12 years ago by Moritz Heidkamp

Alright, so Chicken uses latin1 by default? That would explain it! I have been pointed to the utf8 egg meanwhile. (use utf8-srfi-14) redefines char-set:letter which now no longer contains extended latin characters:

#;2> (char-set->string char-set:letter)
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

Is this intended?

However, thanks for the help!

comment:7 in reply to:  6 Changed 12 years ago by felix winkelmann

Replying to syn:

> Alright, so Chicken uses latin1 by default? That would explain it! I have been pointed to the utf8 egg meanwhile. (use utf8-srfi-14) redefines char-set:letter which now no longer contains extended latin characters:
>
> #;2> (char-set->string char-set:letter)
> "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
>
> Is this intended?

I'm not an expert in these things, but I assume this is locale-dependent (though I don't assume the utf8 egg uses locale information). The Latin-1 letter set contains more elements than the US-ASCII one. The Right Thing would be to take locale information into account, but that is an awful mess and needs an expert.

comment:8 Changed 12 years ago by felix winkelmann

Resolution: worksforme
Status: reopened → closed

comment:9 Changed 12 years ago by (none)

Milestone: 4.3.0

Milestone 4.3.0 deleted
