id	summary	reporter	owner	description	type	status	priority	milestone	component	version	resolution	keywords	cc	difficulty
1182	utf8 egg silently accepts invalid byte sequences	Moritz Heidkamp	Alex Shinn	"I noticed that some procedures of the `utf8` egg silently accept invalid byte sequences. This might have some safety implications, e.g. consider this case (the procedures used are the core versions, procedures from the `utf8` egg are prefixed with `utf8-` in the following code snippets):

{{{
(define evil-quote
  (list->string (map integer->char '(#b11000000 #b10100111))))
}}}

This is an invalid (overlong) UTF-8 encoding of the `'` character. Now a program could perform a check like this to make sure a user supplied string doesn't contain any quotes:

{{{
(unless (utf8-string-contains evil-quote ""'"") ...)
}}}

And then go ahead and write it character by character like this:

{{{
(utf8-string-for-each display evil-quote)
}}}

Which would produce the actual `'` character. The same is true for any other procedure that produces characters from strings, e.g. `string-ref`, `string->list`, etc.

Any other invalid byte sequence (such as stray continuation bytes) is also silently accepted.

I'm not entirely sure what would be the wisest way to handle this. We could have these procedures signal an error or just mention this behavior in the documentation so that people know to perform validation on untrusted inputs."	enhancement	closed	major	someday	extensions	4.9.x	fixed	utf8