Opened 8 years ago
Last modified 7 years ago
#1293 new defect
eq?-hash tables calculcate different value for objects after they're mutated and lose locatives after GC
Reported by: | sjamaan | Owned by: | |
---|---|---|---|
Priority: | critical | Milestone: | someday |
Component: | core libraries | Version: | 4.10.x |
Keywords: | srfi-69, hash tables, locatives | Cc: | |
Estimated difficulty: | hard |
Description (last modified by )
If you mutate a pair, it's still eq?
to itself, but the hash-by-identity
/ eq?-hash
doesn't calculate the same hash anymore, which means the hash table cannot find the key, unless the randomization factor happens to be such that it finds the same bucket, of course, but that's highly unlikely for simple cases like the one below:
#;1> (use srfi-69) ; loading /home/sjamaan/chickens/4.11.0/lib/chicken/8/srfi-69.import.so ... ; loading library srfi-69 ... #;2> (define x (cons 1 2)) #;3> (hash-by-identity x) 380849329 #;4> (set-car! x 3) #;5> (hash-by-identity x) 380718257 #;6> (= #3 #5) #f #;7>
Originally pointed out by John Croisant:
;; Even though the filler slots are not used, if you remove any of ;; them, the hash of the instance will change after GC, causing the ;; hash table lookup to fail. (use srfi-69 lolevel) (define table (make-hash-table test: eq? hash: eq?-hash)) (define-record-type box (make-box contents) box? ;; Remove or comment out any of the next three lines: (filler1 box-filler1) (filler2 box-filler2) (filler3 box-filler3) (contents box-contents)) (define my-box (make-box (make-locative "foo"))) (hash-table-set! table my-box #t) (printf "before gc, hash table contains ~S? ~S~N" my-box (hash-table-exists? table my-box)) (gc) (printf "after gc, hash table contains ~S? ~S~N" my-box (hash-table-exists? table my-box)) (printf "hash table as alist: ~S~N" (hash-table->alist table))
Attachments (1)
Change History (12)
comment:2 Changed 8 years ago by
It looks like the real problem is that eq?
hash tables don't work as you'd expect: If you mutate an object, it's still eq?
to its previous incarnation (if there were such a thing), but the hash function calculates a different hash:
#;1> (use srfi-69) ; loading /home/sjamaan/chickens/4.11.0/lib/chicken/8/srfi-69.import.so ... ; loading library srfi-69 ... #;2> (define x (cons 1 2)) #;3> (hash-by-identity x) 380849329 #;4> (set-car! x 3) #;5> (hash-by-identity x) 380718257 #;6> (= #3 #5) #f #;7>
This is completely broken, and it seems that the reference implementation has the exact same bug in hash-by-identity
.
Racket and Guile seem to do this correctly, even mutation of a pair doesn't result in a new hash value.
I have no idea how to fix this, as of yet...
Changed 8 years ago by
Attachment: | srfi-69-testcases.patch added |
---|
A patch for the srfi-69 test suite to detect these problems
comment:3 Changed 8 years ago by
Priority: | major → critical |
---|---|
Summary: | Locatives get moved around causing hash tables to lose them → eq?-hash tables calculcate different value for objects after they're mutated and lose locatives after GC |
comment:4 Changed 8 years ago by
Description: | modified (diff) |
---|
comment:5 Changed 8 years ago by
Fixing locatives for equal?
hash tables is relatively easy: just special case it and use locative->object
to retrieve the object and recur on it, instead of taking the raw pointer in the "special" slot.
comment:6 Changed 8 years ago by
Actually, disregard that last comment: if an object is eq?
, it must also be equal?
, so the problem with mutation of the keys exists just as much for equal?-hash
as for eq?-hash
.
comment:7 Changed 8 years ago by
One (very) ugly way to "fix" this is to change hash-by-identity
: It can simply hash the object's pointer, and record it in a side table along with the current pointer. This means that every object that ever was hashed has an entry in that table.
Whenever the object is moved, update the current pointer. Then, if the object is looked up again we can obtain the originally assigned hash value. On major GC, walk the table and remove all the keys that no longer exist.
It's ugly and stupid and takes up a lot of unnecessary memory. I really don't like it, but we could do this in a generic/reusable way and also use it in core's symbol table, for example.
comment:8 Changed 8 years ago by
Never mind, that won't work because then you get O(n)
behaviour over all items ever hashed (across all tables!), which is exactly what you don't want when choosing a hash table!
comment:9 Changed 8 years ago by
Estimated difficulty: | → insane |
---|
comment:10 Changed 8 years ago by
Milestone: | 4.12.0 → someday |
---|
comment:11 Changed 7 years ago by
Estimated difficulty: | insane → hard |
---|
The Right Thing here is to use a pointer as the eq-hash, and then modify the major GC (which has a conspectus of all of memory) so that it rehashes all the eq-hash-tables at once. This is hard, but not insanely hard.
Alternatively, simply declare that anyone who mutates a key deserves to lose, even in eq-hash-tables.
Context from IRC:
<jacius> I am having the weirdest bug. If I erase an unused slot from the source code of a record type, it greatly changes the behavior of my app.
<jacius> No code is referencing the slot, and I can rename the slot with no effect. But if I delete it from the code things break.
<jacius> Also if I change the order of the slots, things break.
(later)
<jacius> Okay, I think I have narrowed down my weird bug. I think it's related to hashing behavior for records and locatives.
<jacius> I think that if a record holds a locative in any of its first three slots, the record's hash will change after GC.
<jacius> But not if the locative is in the 4th or later slots.
<jacius> I think this also applies recursively for records that hold records that hold locatives.
<jacius> So, I think my app was breaking because I was using a record as a key in a hash table, but the record's hash was changing after GC. Whee.
<evhan> jacius: Wow, that's a pretty nice bug.
<evhan> "Good work." :)
<jacius> That's my theory, anyway. Making a minimal test case to see if I am right.
<jacius> I am surprised that the result of eq?-hash depends on the record's contents
<evhan> I think eq?-hash should just use the object's address.
<jacius> That's what I would think too
<jacius> Oh
<jacius> So maybe the record itself is moving during GC, so its address changes?
<jacius> No, that can't be it
<jacius> Because it only changes if one of the first three slots holds a locative