Custom Query (1630 matches)

Results (37 - 39 of 1630)

Ticket Resolution Summary Owner Reporter
#345 wontfix utf8 regexp bug with underscores sjamaan Jim Ursetto
Description

Only in utf8 mode, regexp seems to have a bug with negated charsets containing underscores, in both POSIX REs and SREs. I'm not sure, but I think any negated charset with one underscore and one or more other chars is affected.

Irregex has the same problem but only when called with option 'utf8.

I'm using chicken-experimental 4.5.8 without irregex 0.8, so perhaps this bug is fixed in irregex 0.8? I haven't checked. I know this occurs back to Chicken 4.5.0 at least.

#;> (use utf8)
#;> (regexp "[^_]")
#<regexp>
#;> (regexp "[^a_b]")
Error: (cddr) bad argument type: ()
#;> (regexp "[^a_]")
Error: (cddr) bad argument type: ()
#;> (regexp "[^_a]")
Error: (cddr) bad argument type: ()
#;> (regexp '(~ #\_ #\a))
Error: (cddr) bad argument type: ()

#;> (irregex '(~ #\_ #\a))
#(*irregex-tag* ...)
#;> (irregex '(~ #\_ #\a) 'utf8)
Error: (cddr) bad argument type: ()

#1182 fixed utf8 egg silently accepts invalid byte sequences Alex Shinn Moritz Heidkamp
Description

I noticed that some procedures of the utf8 egg silently accept invalid byte sequences. This might have some safety implications, e.g. consider this case (the procedures used are the core versions, procedures from the utf8 egg are prefixed with utf8- in the following code snippets):

(define evil-quote
  (list->string (map integer->char '(#b11000000 #b10100111))))

This is an invalid (overlong) UTF-8 encoding of the ' character. Now a program could perform a check like this to make sure a user-supplied string doesn't contain any quotes:

(unless (utf8-string-contains evil-quote "'") ...)

And then go ahead and write it character by character like this:

(utf8-string-for-each display evil-quote)

Which would produce the actual ' character. The same is true for any other procedure that produces characters from strings, e.g. string-ref, string->list, etc.

Any other invalid byte sequence (such as stray continuation bytes) is also silently accepted.

I'm not entirely sure what would be the wisest way to handle this. We could have these procedures signal an error or just mention this behavior in the documentation so that people know to perform validation on untrusted inputs.
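
To make the "perform validation on untrusted inputs" option concrete, here is a minimal sketch of a well-formedness check in portable Scheme. The names valid-utf8-bytes?, take-continuations and string-bytes are hypothetical helpers written for this illustration; they are not part of the utf8 egg. The check works on a list of byte values and rejects overlong forms, surrogates, stray continuation bytes and truncated sequences. The string-bytes helper assumes that the core string->list yields one character per byte, as the evil-quote example above does.

```scheme
;; Sketch only: none of these procedures exist in the utf8 egg.

;; Consume n continuation bytes from the list bs. The first one must lie
;; in lo..hi, which is how overlong forms and surrogates are excluded;
;; the remaining ones must lie in #x80..#xBF.
;; Returns the rest of the list, or #f if the sequence is malformed.
(define (take-continuations bs n lo hi)
  (cond ((= n 0) bs)
        ((null? bs) #f)                       ; truncated sequence
        ((and (>= (car bs) lo) (<= (car bs) hi))
         (take-continuations (cdr bs) (- n 1) #x80 #xBF))
        (else #f)))

;; #t if the list of integers (0-255) is well-formed UTF-8, else #f.
(define (valid-utf8-bytes? bs)
  (if (null? bs)
      #t
      (let* ((b (car bs))
             (rest
              (cond ((<= b #x7F) (cdr bs))                    ; ASCII
                    ((and (>= b #xC2) (<= b #xDF))            ; 2 bytes (C0, C1 are overlong)
                     (take-continuations (cdr bs) 1 #x80 #xBF))
                    ((= b #xE0)                               ; 3 bytes, no overlong forms
                     (take-continuations (cdr bs) 2 #xA0 #xBF))
                    ((= b #xED)                               ; 3 bytes, no surrogates
                     (take-continuations (cdr bs) 2 #x80 #x9F))
                    ((and (>= b #xE1) (<= b #xEF))            ; other 3-byte leads
                     (take-continuations (cdr bs) 2 #x80 #xBF))
                    ((= b #xF0)                               ; 4 bytes, no overlong forms
                     (take-continuations (cdr bs) 3 #x90 #xBF))
                    ((and (>= b #xF1) (<= b #xF3))
                     (take-continuations (cdr bs) 3 #x80 #xBF))
                    ((= b #xF4)                               ; 4 bytes, at most U+10FFFF
                     (take-continuations (cdr bs) 3 #x80 #x8F))
                    (else #f))))                              ; stray continuation or invalid lead
        (and rest (valid-utf8-bytes? rest)))))

;; Assumption: core strings are byte strings, one char per byte.
(define (string-bytes s)
  (map char->integer (string->list s)))
```

With these definitions, (valid-utf8-bytes? '(#b11000000 #b10100111)) returns #f for the overlong quote shown above, while (valid-utf8-bytes? '(#x27)) returns #t for the plain ' character. A program could run such a check once on untrusted input before handing the string to the utf8- procedures.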

#480 fixed utf8 egg noop->void in experimental branch. Alan Post
Description

The utf8 egg will not compile in the experimental branch of Chicken, because it uses the symbol noop. noop is deprecated and should be replaced with void.

Here is the build log against experimental showing the compilation failure:

http://pestilenz.org/~ckeen/salmonella-report/2011-01-14/utf8.html
