#345 closed defect (wontfix)
utf8 regexp bug with underscores
Reported by: | Jim Ursetto | Owned by: | sjamaan |
---|---|---|---|
Priority: | minor | Milestone: | 4.9.0 |
Component: | core libraries | Version: | 4.5.x |
Keywords: | | Cc: | |
Estimated difficulty: | | | |
Description
Only in utf8 mode, regexp seems to have a bug with negated charsets containing underscores, in both POSIX REs and SREs. I'm not sure, but I think any charset with one underscore and one or more other chars is affected.
Irregex has the same problem but only when called with option 'utf8.
I'm using chicken-experimental 4.5.8 without irregex 0.8, so perhaps this bug is fixed in irregex 0.8? I haven't checked. I know this occurs back to Chicken 4.5.0 at least.
    #;> (use utf8)
    #;> (regexp "[^_]")
    #<regexp>
    #;> (regexp "[^a_b]")
    Error: (cddr) bad argument type: ()
    #;> (regexp "[^a_]")
    Error: (cddr) bad argument type: ()
    #;> (regexp "[^_a]")
    Error: (cddr) bad argument type: ()
    #;> (regexp '(~ #\_ #\a))
    Error: (cddr) bad argument type: ()
    #;> (irregex '(~ #\_ #\a))
    #(*irregex-tag* ...)
    #;> (irregex '(~ #\_ #\a) 'utf8)
    Error: (cddr) bad argument type: ()
Attachments (1)
Change History (8)
comment:1 Changed 14 years ago by
Owner: | set to sjamaan |
---|---|
Status: | new → assigned |
comment:2 Changed 14 years ago by
Milestone: | 4.6.0 → 4.7.0 |
---|---|
comment:3 follow-up: 4 Changed 14 years ago by
Changed 14 years ago by
Attachment: | irregex-utf8-nonranges.patch added |
---|---|
comment:4 Changed 14 years ago by
Replying to sjamaan:
> It does not crap out in the new version of irregex because the cset API has changed in such a way that it will never include bare characters in charsets, which means this bug should not get triggered (the assumption is still there, but it is now correct).
> Here's a quick hack that fixes it for Chicken 4.6.0, if it should still get in.
Thanks for the patch, but I don't think this will make it into the 4.6.0 release.
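For readers who have not looked at the charset code, the kind of failure described in the quoted comment (code that assumes a charset holds only ranges and then meets a bare character) can be sketched in portable Scheme. This is only an illustration under stated assumptions: the list-of-items representation and the names `cset-size/ranges-only` and `normalize-cset` are hypothetical, and this is not irregex's actual data structure, nor the code path that raised the `cddr` error, nor the attached patch.

```scheme
;; HYPOTHETICAL charset representation (not irregex's real one):
;; a list whose items are either a bare character or a (lo . hi) pair.

;; A consumer that assumes every item is a (lo . hi) range.
;; It counts how many characters the charset covers.
(define (cset-size/ranges-only cset)
  (let loop ((items cset) (n 0))
    (if (null? items)
        n
        (let ((range (car items)))
          (loop (cdr items)
                (+ n 1 (- (char->integer (cdr range))
                          (char->integer (car range)))))))))

;; Normalizing first makes the assumption true:
;; each bare character c becomes the one-element range (c . c).
(define (normalize-cset cset)
  (map (lambda (item)
         (if (char? item) (cons item item) item))
       cset))

;; A charset mixing a bare character with a range.
(define mixed (list #\_ (cons #\a #\c)))

(cset-size/ranges-only (normalize-cset mixed)) ; => 4 (one for #\_, three for a-c)

;; Without normalization, (cset-size/ranges-only mixed) applies car/cdr
;; to the bare character #\_ and signals a "bad argument type" error.
```

Either approach removes the mismatch: normalize on the way in, as in this sketch, or guarantee at construction time that bare characters never appear, which is what the quoted comment says the new cset API does.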
comment:5 Changed 14 years ago by
Resolution: | → wontfix |
---|---|
Status: | assigned → closed |
Then I'll close it wontfix, since 4.7.0 should include the new irregex. Please reopen if you disagree.