Opened 4 years ago

Last modified 4 months ago

#1077 new defect

Symbols containing newlines don't get quoted by write

Reported by: sjamaan
Owned by: sjamaan
Priority: major
Milestone: 5.1
Component: core libraries
Version: 4.8.x
Keywords:
Cc:
Estimated difficulty: insane

Description

(write '|\n|)

This should print |\n| but just prints a newline, breaking read/write invariance.

Reported by zenspider on IRC.

Attachments (1)

0001-Do-not-use-a-private-namespace-for-the-csi-program.patch (3.1 KB) - added by sjamaan 4 years ago.
Remove private namespace for csi


Change History (10)

comment:1 Changed 4 years ago by evhan

I think this is a consequence of the way qualified symbols are encoded,
and affects all symbols whose first byte is less than 32 (please someone
correct me if any of the following is wrong).

Any symbol whose name has a leading byte under 32 is considered
qualified, with that byte specifying the length of the namespace part of
the ensuing string. Obviously, this is invalid for '|\n|, so when
it's handled as a qualified symbol in ##sys#print after satisfying
##sys#qualified-symbol? (library.scm:3357),
##sys#symbol->qualified-string detects this invalid length, falls
back to simply returning the symbol's string value without any
qualification, and we get a lone newline printed out as the result.

You can see what would happen were 10 a valid length by extending the
symbol, e.g. '|\naaaaaaaaaaa| => ##aaaaaaaaaa#a.
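
For anyone following along, here is a rough model of the behaviour described above, written in Python rather than Scheme so it can be run standalone. It is only a sketch of my reading of the analysis, not the actual library.scm code: the names `is_qualified`, `qualified_string` and `model_write` are made up, the limit of 31 is assumed from "under 32", the exact validity condition is a guess, and the \x00 case is deliberately left out.

```python
# Toy model of the printing path described in comment:1.
# All names here are hypothetical; this is not CHICKEN's implementation.

NAMESPACE_MAX_ID_LEN = 31  # assumed from "leading byte under 32"


def is_qualified(name: bytes) -> bool:
    """Model of the qualified-symbol check: only the first byte is inspected."""
    return len(name) > 0 and 0 < name[0] <= NAMESPACE_MAX_ID_LEN


def qualified_string(name: bytes) -> str:
    """Model of the qualified-string conversion, including its fallback."""
    n = name[0]  # claimed length of the namespace part
    if n + 1 > len(name):
        # Invalid length: fall back to the raw symbol name, unquoted.
        return name.decode("latin-1")
    namespace = name[1:1 + n].decode("latin-1")
    rest = name[1 + n:].decode("latin-1")
    return "##" + namespace + "#" + rest


def model_write(name: bytes) -> str:
    """What the printer would emit for a symbol with this name."""
    if is_qualified(name):
        return qualified_string(name)
    # Ordinary symbols would go through the normal |...| quoting logic,
    # which this sketch does not model.
    return name.decode("latin-1")


print(repr(model_write(b"\n")))             # a lone newline, no bars
print(model_write(b"\n" + b"a" * 11))       # ##aaaaaaaaaa#a
```

The first call takes the fallback branch (10 is not a valid length for a one-byte name), which is the bug reported here; the second shows the same leading byte being accepted once the name is long enough.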

All that said, I'm not really sure what to do about this. We could make
##sys#qualified-symbol? check whether its argument has a valid
namespace length so its behavior at least matches that of split
(library.scm:1184) and the procedures defined over it, but that leaves
things dependent on the length of the symbol (e.g. the difference
between '|\n| and '|\naaaaaaaaaaa| above), so it isn't a
great option. We could lower namespace-max-id-len so that symbols
can begin with the more commonly used values under 32 (\n,
\t, etc.), but even if we set it quite low we'd
still have problems with e.g. '|\x03|, and longer namespaces like
##compiler# might have to change, so that's not really an
option either. We could... I don't know. Hopefully I'm missing a really
obvious fix.
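
To make the trade-offs of those two options concrete, here is a small Python sketch (again a model, not the real predicate). `is_qualified_strict` is a hypothetical name, the validity rule "length byte must be smaller than the name length" is an assumption about what a "valid namespace length" would mean, and the limit of 3 is just an example value.

```python
# Hypothetical stricter predicate: the first byte must be within the limit
# AND must fit inside the name. Not CHICKEN's actual code.

def is_qualified_strict(name: bytes, max_id_len: int = 31) -> bool:
    return (
        len(name) > 0
        and 0 < name[0] <= max_id_len
        and name[0] < len(name)  # assumed meaning of "valid namespace length"
    )


# Option 1: validating the length makes the result depend on symbol length.
print(is_qualified_strict(b"\n"))                 # False: would be quoted normally
print(is_qualified_strict(b"\n" + b"a" * 11))     # True: still treated as qualified

# Option 2: lowering the limit (3 is an arbitrary example) frees \n and \t,
# but names starting with a smaller byte are still caught.
print(is_qualified_strict(b"\n" + b"a" * 11, 3))  # False
print(is_qualified_strict(b"\x03abcd", 3))        # True
```

Neither variant gives a predicate that depends only on whether the symbol was really created as a qualified one, which is the underlying problem.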

Thoughts?

comment:2 Changed 4 years ago by sjamaan

Thanks for your thorough analysis, Evan. I'm not sure what you mean by "longer namespaces like
##compiler# might have to change", though. Could you elaborate?

I was thinking we should just prefix them with ##compiler#. Do you expect problems with that?

comment:3 Changed 4 years ago by evhan

I only meant that if we dropped namespace-max-id-len to 3, for example, the ##compiler# namespace would be too long ("compiler" length 8) and identifiers starting with it would fail to be recognized as qualified symbols (when read by r-ext-symbol). I did try that out of curiosity and things exploded, though I didn't keep looking to see how badly.
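
A quick way to picture this, as a Python sketch under the assumption that a qualified symbol is stored as a length byte, then the namespace, then the identifier (`encode_qualified`, `recognised` and the identifier "foo" are hypothetical, and the real check happens in the reader, which is not modelled here):

```python
# Toy model of why a limit of 3 breaks ##compiler#.
# Assumed layout: <length byte><namespace><identifier>.

def encode_qualified(namespace: str, name: str) -> bytes:
    """Build the assumed internal name of ##namespace#name."""
    return bytes([len(namespace)]) + namespace.encode("ascii") + name.encode("ascii")


def recognised(sym: bytes, max_id_len: int) -> bool:
    """Would the leading byte be accepted as a namespace length?"""
    return len(sym) > 0 and 0 < sym[0] <= max_id_len


sym = encode_qualified("compiler", "foo")  # "foo" is a made-up identifier
print(sym[0])                # 8, the length of "compiler"
print(recognised(sym, 31))   # True with the assumed current limit
print(recognised(sym, 3))    # False once the limit is lowered to 3
```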

Changed 4 years ago by sjamaan

Remove private namespace for csi

comment:4 Changed 4 years ago by sjamaan

This first patch is an easy one, but it makes removing the private namespace from the compiler itself easier and more self-contained. It removes the private namespace from the "csi" program, which is compiled separately, so we can use the regular (declare (hide ...)) to hide any private variables that user code is not supposed to see.

comment:5 Changed 4 years ago by sjamaan

  • Milestone changed from 4.9.0 to 4.10.0

Let's postpone to 4.10.0; it's not a blocker

comment:6 Changed 2 years ago by sjamaan

  • Milestone changed from 4.10.0 to 5.0

This is closely related to #1131, which we'll fix somewhere in CHICKEN 5.

comment:7 Changed 15 months ago by sjamaan

  • Estimated difficulty set to insane

comment:8 Changed 8 months ago by sjamaan

  • Milestone changed from 5.0 to 5.1

We're making headway with this by properly modularising the core system, but this won't get finished for 5.0 (maybe not even 5.1, but one can dream).

comment:9 Changed 4 months ago by sjamaan

Looks like keywords also fall somewhere in here: any symbol that starts with \x00 gets written as a keyword.
