id,summary,reporter,owner,description,type,status,priority,milestone,component,version,resolution,keywords,cc,difficulty
1805,`html->sxml` with escaped quotes breaks text into multiple nodes,Jeremy Steward,Alex Shinn,"There's some weirdness with escaped quotes in text when using `html->sxml`. Perhaps a short example would be sufficient to explain the problem I'm encountering:

{{{
(html->sxml ""<p>foo&apos;bar&quot;baz</p>"")
;=> (*TOP* (p ""foo"" ""'"" ""bar"" ""\"""" ""baz""))
}}}

As a counter-example, I'll use the [https://wiki.call-cc.org/eggref/5/ssax ssax egg]:

{{{
;; assuming `ssax:xml->sxml` from the ssax egg as the reader procedure
(call-with-input-string ""<p>foo&apos;bar&quot;baz</p>""
  (lambda (port) (ssax:xml->sxml port '())))
;=> (*TOP* (p ""foo'bar\""baz""))
}}}

I guess fundamentally it's a question of whether there should be one text node or not. I would argue that in this particular case, it should be a single node.

I have been using html-parser to try and scrape some web pages, and this is extremely unexpected! Especially so if one uses `txpath` / `sxpath` on the final result, as `//p/text()` queries will return several strings instead of one. You would have to call `(apply string-append ((txpath ""//p/text()"") sxml))` to get the full contents of the text.

Is there a rationale for this, or is it some kind of limitation of the parser? I know that tags may also contain sub-tags in HTML, but I'm not sure a new node should be made if a tag's contents are not HTML tags themselves.",defect,assigned,minor,someday,extensions,5.3.0,,,,