id summary reporter owner description type status priority milestone component version resolution keywords cc difficulty
1182 utf8 egg silently accepts invalid byte sequences Moritz Heidkamp Alex Shinn "I noticed that some procedures of the `utf8` egg silently accept invalid byte sequences. This might have some safety implications. Consider this case (the procedures used are the core versions; procedures from the `utf8` egg are prefixed with `utf8-` in the following code snippets):

{{{
(define evil-quote
  (list->string (map integer->char '(#b11000000 #b10100111))))
}}}

This is an invalid (overlong) UTF-8 encoding of the `'` character. Now a program could perform a check like this to make sure a user-supplied string doesn't contain any quotes:

{{{
(unless (utf8-string-contains evil-quote ""'"") ...)
}}}

And then go ahead and write it character by character like this:

{{{
(utf8-string-for-each display evil-quote)
}}}

This would produce the actual `'` character, even though the check above found no quote. The same is true for any other procedure that produces characters from strings, e.g. `string-ref`, `string->list`, etc. Any other invalid byte sequence (such as stray continuation bytes) is also silently accepted.

I'm not entirely sure what would be the wisest way to handle this. We could have these procedures signal an error, or just mention this behavior in the documentation so that people know to perform validation on untrusted inputs." enhancement closed major someday extensions 4.9.x fixed utf8
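
The validation of untrusted input that the description suggests could look roughly like the following. This is only a sketch under stated assumptions: `valid-utf8-bytes?` is a hypothetical helper, not a procedure from the `utf8` egg, and it works on a list of byte values (integers 0..255) rather than on a string. It follows the well-formed byte ranges of RFC 3629, so overlong forms, encoded surrogates, stray continuation bytes and truncated sequences are all rejected.

{{{
;; Hypothetical helper, NOT part of the utf8 egg.
;; Returns #t if BYTES (a list of integers 0..255) is well-formed UTF-8.
(define (valid-utf8-bytes? bytes)
  (define (in-range? b lo hi) (and (>= b lo) (<= b hi)))
  ;; Consume N continuation bytes from REST. The first one must lie in
  ;; [LO, HI]; narrowing that range is what excludes overlong forms,
  ;; surrogates and code points above U+10FFFF. Returns the remaining
  ;; bytes (possibly the empty list, which is a true value) or #f.
  (define (tail-ok rest n lo hi)
    (cond ((= n 0) rest)
          ((null? rest) #f)                      ; truncated sequence
          ((in-range? (car rest) lo hi)
           (tail-ok (cdr rest) (- n 1) #x80 #xBF))
          (else #f)))
  (let loop ((bs bytes))
    (if (null? bs)
        #t
        (let* ((b (car bs))
               (rest (cdr bs))
               (next
                (cond ((<= b #x7F) rest)         ; ASCII
                      ((in-range? b #xC2 #xDF) (tail-ok rest 1 #x80 #xBF))
                      ((= b #xE0)              (tail-ok rest 2 #xA0 #xBF))
                      ((in-range? b #xE1 #xEC) (tail-ok rest 2 #x80 #xBF))
                      ((= b #xED)              (tail-ok rest 2 #x80 #x9F))
                      ((in-range? b #xEE #xEF) (tail-ok rest 2 #x80 #xBF))
                      ((= b #xF0)              (tail-ok rest 3 #x90 #xBF))
                      ((in-range? b #xF1 #xF3) (tail-ok rest 3 #x80 #xBF))
                      ((= b #xF4)              (tail-ok rest 3 #x80 #x8F))
                      ;; #xC0, #xC1, #xF5..#xFF and stray continuation bytes
                      (else #f))))
          (and next (loop next))))))

(valid-utf8-bytes? '(#b11000000 #b10100111)) ; overlong quote from the report => #f
(valid-utf8-bytes? '(#xC3 #xA9))             ; U+00E9 => #t
}}}

Assuming, as in the report, that the core string procedures operate on raw bytes, the byte list for a string `s` could be obtained with `(map char->integer (string->list s))` before calling the check.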