Opened 13 years ago
Closed 4 years ago
#768 closed defect (fixed)
s11n: reconstruction of hash-tables incorrect on de-serialization
Reported by: | felix winkelmann | Owned by: | |
---|---|---|---|
Priority: | critical | Milestone: | |
Component: | extensions | Version: | 4.7.x |
Keywords: | s11n | Cc: | |
Estimated difficulty: | hard |
Description
As pointed out by sjamaan, the reconstruction of hash-tables on deserialization is incorrect and creates a hash-table record that is missing several slots.
Change History (11)
comment:1 Changed 8 years ago by
Estimated difficulty: | → hard |
---|
comment:2 Changed 6 years ago by
comment:4 Changed 4 years ago by
(Note to self as much as anything else)
The fact that the order is sometimes different isn't weird, running the same script without the round trip through the file exhibits that behavior as well. The problem is the missing slot.
comment:5 Changed 4 years ago by
I've also captured a file that exhibited the (foo . bar) (ht . #<hash-table (1)>))
behavior, but looking inside it was exactly the same as the other order.
In other words: ht foo-bar vs foo-bar ht has the same serialization, the order-difference arises when creating the hash-table during deserialization. Which is fine.
Whereas the serializations that only exhibit ht does have foo-bar stored in them, but it is as back-references to the inner hash-table. These files can't be deserialized properly no matter how often I try. Whether this is a bug in serialization or deserialization is TBD.
comment:6 Changed 4 years ago by
Hash-tables are stored as alists preceded by a special tag (which is implemented as a specific byte, decimal 15 referred to as hash-table-tag).
comment:7 Changed 4 years ago by
OK, so the "inner refence" thing was a complete red herring because even when the hash tables use different symbols, the same problem can happen. Good to know! Investigation continues.
comment:8 Changed 4 years ago by
I have fixed the bug: https://ellen.idiomdrottning.org/s11n
I've changed deserialize
and chicken-dump
while I've left serialize
alone.
Here is what was wrong:
srfi-69 hash-tables in chicken 5 have eleven slots, more than I guess previous versions must've had because the deserializer stops fetching after just fetching the test & the hash function. Leading the *next* fetcher to fetch things that were actually emitted as part of the hash-table. It was not just hash-tables in hash-tables that was borked, it was hash-tables in lists too.
Slot 5 is the min-load value but the deserializer believes that it is about to read the cdr of the linked-list that the hash-table is inside. So it sets the cdr to 0.5 or whatever, terminating the list prematurely, losing the other values that was in the list.
(list 'blue 'jeans ht1 'white 't-shirt)
was also borked, it'd lose the white t-shirt because it thought cdr was 0.5 instead of the remaining links to the list.
comment:9 Changed 4 years ago by
About backwards compatibility: deserialize and chicken-dump in my version now expect eleven slots whereas they used to expect five. "So why not change the tag to new-hash-table-tag
or similar?"
The problem is that serialize itself was (and still is) implemented as "emit all the slots of hash-table", not just the first five. So for many many years, serialize has emitted eleven-slot hash-tables already while deserialize/chicken-dump has tried to snarf only five. Backwards compatibility was *already* broken.
comment:11 Changed 4 years ago by
Resolution: | → fixed |
---|---|
Status: | new → closed |
I've applied the patch and tagged the egg as 0.9.12 - many thanks!
Here's a simple example to illustrate the problem:
Maybe I was not creative enough, but I could only reproduce this behavior with nested hash-tables.