Opened 11 years ago

Closed 11 years ago

Last modified 9 years ago

#345 closed defect (wontfix)

utf8 regexp bug with underscores

Reported by: Jim Ursetto
Owned by: sjamaan
Priority: minor
Milestone: 4.9.0
Component: core libraries
Version: 4.5.x
Keywords:
Cc:
Estimated difficulty:

Description

In utf8 mode only, regexp seems to have a bug with negated charsets that contain an underscore, whether they are written as POSIX REs or as SREs. I'm not sure, but I think any charset with one underscore and one or more other chars is affected.

Irregex has the same problem but only when called with option 'utf8.

I'm using chicken-experimental 4.5.8 without irregex 0.8, so perhaps this bug is fixed in irregex 0.8? I haven't checked. I know this occurs at least as far back as Chicken 4.5.0.

#;> (use utf8)
#;> (regexp "[^_]")
#<regexp>
#;> (regexp "[^a_b]")
Error: (cddr) bad argument type: ()
#;> (regexp "[^a_]")
Error: (cddr) bad argument type: ()
#;> (regexp "[^_a]")
Error: (cddr) bad argument type: ()
#;> (regexp '(~ #\_ #\a))
Error: (cddr) bad argument type: ()

#;> (irregex '(~ #\_ #\a))
#(*irregex-tag* ...)
#;> (irregex '(~ #\_ #\a) 'utf8)
Error: (cddr) bad argument type: ()

Attachments (1)

irregex-utf8-nonranges.patch (608 bytes) - added by sjamaan 11 years ago.


Change History (8)

comment:1 Changed 11 years ago by felix winkelmann

Owner: set to sjamaan
Status: new → assigned

comment:2 Changed 11 years ago by felix winkelmann

Milestone: 4.6.0 → 4.7.0

comment:3 Changed 11 years ago by sjamaan

It does not crap out in the new version of irregex because the cset API has changed in such a way that it will never include bare characters in charsets, which means this bug should not get triggered (the assumption is still there, but it is now correct).

Here's a quick hack that fixes it for Chicken 4.6.0, if it should still get in.
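
To make the "bare characters in charsets" point concrete, here is a minimal sketch of the general idea. It is a hypothetical illustration only: the representation (a list whose items are either a bare character or a `(lo . hi)` pair) and the names `normalize-cset` and `cset-contains?` are assumptions made for this example, and it is neither irregex's actual code nor the attached patch. A walker that assumes every item is a range breaks on a bare character, while normalising bare characters into one-character ranges makes that assumption hold.

```scheme
;; Hypothetical sketch; NOT irregex's real cset representation or API.
;; Assumption: a charset body is a list of items, each either a bare
;; character or a (lo . hi) pair of characters.

;; Turn every bare character c into the one-character range (c . c),
;; so that later code can rely on seeing ranges only.
(define (normalize-cset items)
  (map (lambda (item)
         (if (char? item)
             (cons item item)
             item))
       items))

;; Membership test that assumes ranges only. Calling it on a list that
;; still contains a bare character would apply car to a non-pair.
(define (cset-contains? ranges ch)
  (let loop ((rs ranges))
    (cond ((null? rs) #f)
          ((and (char<=? (car (car rs)) ch)
                (char<=? ch (cdr (car rs))))
           #t)
          (else (loop (cdr rs))))))

(cset-contains? (normalize-cset (list #\_ #\a)) #\_) ; => #t
```

With the normalising step in front, the "ranges only" assumption in the walker stays in place but is always satisfied, which is the shape of the fix described above.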

Changed 11 years ago by sjamaan

comment:4 in reply to:  3 Changed 11 years ago by felix winkelmann

Replying to sjamaan:

> It does not crap out in the new version of irregex because the cset API has changed in such a way that it will never include bare characters in charsets, which means this bug should not get triggered (the assumption is still there, but it is now correct).
>
> Here's a quick hack that fixes it for Chicken 4.6.0, if it should still get in.

Thanks for the patch, but I don't think this will make it into the 4.6.0 release.

comment:5 Changed 11 years ago by sjamaan

Resolution: wontfix
Status: assigned → closed

Then I'll close it wontfix, since 4.7.0 should include the new irregex. Please reopen if you disagree.

comment:6 Changed 11 years ago by felix winkelmann

Milestone: 4.7.0 → 4.8.0

Milestone 4.7.0 deleted

comment:7 Changed 9 years ago by felix winkelmann

Milestone: 4.8.0 → 4.9.0

Milestone 4.8.0 deleted
