﻿id	summary	reporter	owner	description	type	status	priority	milestone	component	version	resolution	keywords	cc	difficulty
1805	`html->sxml` with escaped quotes breaks text into multiple nodes	Jeremy Steward	Alex Shinn	"There's some weirdness with escaping quotes in text when using `html->sxml`. Perhaps a short example would be sufficient to explain the problem I'm encountering:

{{{
(html->sxml ""<p>foo&apos;bar&quot;baz</p>"") ;=> (*TOP* (p ""foo"" ""'"" ""bar"" ""\"""" ""baz""))
}}}

As a counter-example, I'll use the [https://wiki.call-cc.org/eggref/5/ssax ssax egg]:

{{{
(call-with-input-string ""<p>foo&apos;bar&quot;baz</p>"" (lambda (port) (ssax:xml->sxml port '()))) ;=> (*TOP* (p ""foo'bar\""baz""))
}}}

Fundamentally, the question is whether the text should be one node or several. I would argue that in this particular case it should be a single node. I have been using html-parser to scrape some web pages, and this behaviour was very unexpected, especially when running `txpath` / `sxpath` over the result: a `//p/text()` query returns several string fragments instead of one string. To get the full text you have to write `(apply string-append ((txpath ""//p/text()"") sxml))`.
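
As a stopgap, the fragments could be merged after parsing. The following is only a minimal sketch: `merge-adjacent-text` is a hypothetical helper, not part of html-parser, and it assumes the tree is made of proper lists, symbols and strings.

{{{
;; Hypothetical helper (not part of html-parser): walk an SXML tree and
;; join every run of adjacent string children into a single string.
(define (merge-adjacent-text node)
  (if (not (pair? node))
      node                                  ; symbols and strings pass through
      (let loop ((in node) (out '()))
        (cond ((null? in) (reverse out))
              ;; current and previous items are both strings: join them
              ((and (string? (car in)) (pair? out) (string? (car out)))
               (loop (cdr in)
                     (cons (string-append (car out) (car in)) (cdr out))))
              ;; anything else: recurse into it and keep it as its own item
              (else
               (loop (cdr in)
                     (cons (merge-adjacent-text (car in)) out)))))))

(merge-adjacent-text (html->sxml ""<p>foo&apos;bar&quot;baz</p>""))
}}}

Given the `html->sxml` output shown above, this should produce the same single text node as the ssax result, but it is a workaround on the caller's side rather than a fix in the parser.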

Is there a rationale for this, or is that some kind of limitation of the parser? I know that tags may also contain sub-tags in HTML, but I'm not sure a new node should be made if a tag's contents are not HTML tags themselves."	defect	assigned	minor	someday	extensions	5.3.0				
