Opened 5 days ago
#1851 new defect
utf8 egg: Missing char sets and outdated tables
Reported by: | Zipheir | Owned by: | |
---|---|---|---|
Priority: | minor | Milestone: | someday |
Component: | unknown | Version: | 5.4.0 |
Keywords: | unicode | Cc: | |
Estimated difficulty: | | | |
Description
The unicode-char-sets module of the utf8 egg is missing several character sets. In particular, there is no set for characters with the Numeric property (making it impossible to implement a Unicode-aware 'char-numeric?' in CHICKEN), nor is there a set for any of the punctuation properties. The utf8-srfi-14 module does include char-set:digit and char-set:punctuation, but these are throwaway ASCII-only implementations (in a file that begins with "Unicode capable char-sets", no less!). Proper Unicode versions of the missing sets should be added to unicode-char-sets.
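To make the gap concrete, here is a minimal sketch of the difference between an ASCII-only digit test and a Unicode-aware one. It is written in Python rather than CHICKEN, purely as an illustration, and relies on Python's standard `unicodedata` module. The function names are hypothetical, and reading "Numeric" as general category Nd and "punctuation" as the P* general categories is an assumption about what the egg's sets would cover, not something the egg currently defines.

```python
import unicodedata


def ascii_digit(ch: str) -> bool:
    # ASCII-only test, comparable to what the ticket says char-set:digit does.
    return "0" <= ch <= "9"


def unicode_decimal_digit(ch: str) -> bool:
    # Assumption: "numeric" is taken to mean general category Nd (Decimal_Number).
    return unicodedata.category(ch) == "Nd"


def unicode_punctuation(ch: str) -> bool:
    # Assumption: "punctuation" is taken to mean any P* general category
    # (Pc, Pd, Ps, Pe, Pi, Pf, Po).
    return unicodedata.category(ch).startswith("P")


if __name__ == "__main__":
    # ASCII digit, ARABIC-INDIC DIGIT THREE, DEVANAGARI DIGIT ONE,
    # ASCII '!', INVERTED QUESTION MARK, IDEOGRAPHIC FULL STOP.
    samples = ["7", "\u0663", "\u0967", "!", "\u00bf", "\u3002"]
    for ch in samples:
        print(
            f"U+{ord(ch):04X}",
            unicodedata.category(ch),
            ascii_digit(ch),
            unicode_decimal_digit(ch),
            unicode_punctuation(ch),
        )
```

A Unicode-aware 'char-numeric?' would need a set behaving like `unicode_decimal_digit` here: the non-ASCII digits in the sample list pass that test but fail the ASCII-only one.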
Furthermore, the sets that unicode-char-sets does provide seem to be built on data that is extremely out of date. The header comment in unicode-char-sets.scm claims the tables were generated in 2007, so characters added to Unicode since then are presumably missing from them.