Opened 12 years ago

Last modified 12 years ago

#905 closed defect

Unreliable behavior of hash-table-copy with symbols as keys (regression wrt 4.7.4) — at Version 9

Reported by: Ivan Raikov Owned by: sjamaan
Priority: critical Milestone: 4.9.0
Component: core libraries Version: 4.8.x
Keywords: Cc:
Estimated difficulty:

Description (last modified by Ivan Raikov)

The nemo program relies heavily on performing transformations over hash tables with symbols as keys. It works fine with Chicken 4.7.4, but unfortunately there seems to have been a regression in 4.8.0rc1, which results in hash-table-exists? and hash-table-ref failing to find any of the existing keys in the table after hash-table-copy has been performed.
The following code reproduces the issue:

(import scheme chicken)
(require-extension srfi-69 srfi-1)

(define t (make-hash-table hash: symbol-hash))

(hash-table-set! t 'k1 1)

(define prefix (gensym "test"))
	
  
(for-each
 (lambda (k v)
   ;;    (printf "hash-table-set! ~A ~A~%" k v)
   (hash-table-set! t k v))
 (list-tabulate 10 (lambda (i) (string->symbol (string-append (symbol->string prefix) ":" (symbol->string (gensym "k"))))))
 (list-tabulate 10 (lambda (i) i))
 )

(print (hash-table->alist t))
;; hash-table-ref works as expected
(print (hash-table-ref t 'k1))
(print (hash-table-ref t 'test17:k29))

(define t2 (hash-table-copy t))

(print (hash-table->alist t2))

;; hash-table-ref fails
(print (hash-table-ref t2 'k1))
(print (hash-table-ref t2 'test17:k29))

Perhaps the hash tables stop working after some garbage collections have been performed. This issue is not entirely new; nemo previously used the environments egg, which was also exhibiting similar unreliable behavior, so I had resorted to using strings instead of symbols as keys. Any advice on how to debug this would be appreciated.

Change History (9)

comment:1 Changed 12 years ago by Ivan Raikov

Description: modified (diff)

comment:2 Changed 12 years ago by Ivan Raikov

Description: modified (diff)

comment:3 Changed 12 years ago by sjamaan

Hi Ivan,

Perhaps I'm missing something, but you're using gensyms here. Those are uninterned, which means that any gensym'ed symbol is not {{eq?}} to a user-inserted symbol with the same printed representation. See this slightly modified version of your code:

(use srfi-1 srfi-69)

(define t (make-hash-table hash: symbol-hash))

(hash-table-set! t 'k1 1)
(hash-table-ref t 'k1)

(for-each
  (lambda (k v)
    (printf "hash-table-set! ~A ~A~%" k v)
    (hash-table-set! t k v)
    (when (= v 9) ;; modification starts here
      (printf "Referencing ~A => ~A~%" k (hash-table-ref t k))
      (printf "Referencing ~A => ~A~%" 'k15 (hash-table-ref t 'k15))))
  (list-tabulate 10 (lambda (i) (gensym 'k)))
  (list-tabulate 10 (lambda (i) i)))

(print (hash-table->alist t))
(print (hash-table-ref t 'k15))

For me, this prints:

hash-table-set! k24 0
hash-table-set! k23 1
hash-table-set! k22 2
hash-table-set! k21 3
hash-table-set! k20 4
hash-table-set! k19 5
hash-table-set! k18 6
hash-table-set! k17 7
hash-table-set! k16 8
hash-table-set! k15 9
Referencing k15 => 9

Error: (hash-table-ref) hash-table does not contain key
k15
#<hash-table (11)>

This is exactly as expected. Are you using gensyms in your actual program?

comment:4 Changed 12 years ago by sjamaan

(I also get the same output with 4.7.0, by the way. If 4.7.4 behaved differently, it must've had some weird bug)

comment:5 Changed 12 years ago by felix winkelmann

Owner: set to sjamaan
Status: newassigned

comment:6 Changed 12 years ago by felix winkelmann

Ivan's example can't work: the default test function is equal?, but the keys in this case are gensyms and so never match, unless the exactly same gensym is given to hash-table-ref.

comment:7 in reply to:  3 ; Changed 12 years ago by Ivan Raikov

Hi Peter,

I must admit that I am ignorant of what "uninterned symbols" are, and neither the Chicken documentation nor the R5RS document explain this very well. R5RS has an exceptionally vague paragraph about "some implementations" that support or use uninterned symbols, and while the Chicken manual does list gensym under the section "Generating uninterned symbols", it does not specify what an uninterned symbol is. If indeed eq? does not hold for these symbols, I think such an important caveat needs to be featured prominently in the documentation. Would you be willing to add this to the manual?

And yes, unfortunately my example was a bit too simplified and does fail on 4.7.4 as well. However, I have just done a number of clean compilations of nemo with Chicken 4.7.4 and 4.8.0rc2 and can again confirm that 4.8.0rc2 hash tables with symbols as keys fail to find any of their elements after the tables are populated and several lookups are performed.

The symbols in nemo can be created as constants (e.g. '* '+ '/ for the basic arithmetic operators), by the expression parser (which uses string->symbol) or by the scope resolution machinery, which uses constructs of the form (string->symbol (string-append component-name ":" (symbol->string var-id)) where component-name is produced with gensym and var-id is produced with string->symbol. Hash table look ups can fail on any of these symbols, it seems to largely depend on the machine where the program is run. So this is indeed a tricky bug and any advice on how to track it down would be appreciated.

Replying to sjamaan:

Hi Ivan,

Perhaps I'm missing something, but you're using gensyms here. Those are uninterned, which means that any gensym'ed symbol is not {{eq?}} to a user-inserted symbol with the same printed representation. See this slightly modified version of your code:

(use srfi-1 srfi-69)

(define t (make-hash-table hash: symbol-hash))

(hash-table-set! t 'k1 1)
(hash-table-ref t 'k1)

(for-each
  (lambda (k v)
    (printf "hash-table-set! ~A ~A~%" k v)
    (hash-table-set! t k v)
    (when (= v 9) ;; modification starts here
      (printf "Referencing ~A => ~A~%" k (hash-table-ref t k))
      (printf "Referencing ~A => ~A~%" 'k15 (hash-table-ref t 'k15))))
  (list-tabulate 10 (lambda (i) (gensym 'k)))
  (list-tabulate 10 (lambda (i) i)))

(print (hash-table->alist t))
(print (hash-table-ref t 'k15))

For me, this prints:

hash-table-set! k24 0
hash-table-set! k23 1
hash-table-set! k22 2
hash-table-set! k21 3
hash-table-set! k20 4
hash-table-set! k19 5
hash-table-set! k18 6
hash-table-set! k17 7
hash-table-set! k16 8
hash-table-set! k15 9
Referencing k15 => 9

Error: (hash-table-ref) hash-table does not contain key
k15
#<hash-table (11)>

This is exactly as expected. Are you using gensyms in your actual program?

comment:8 in reply to:  7 Changed 12 years ago by felix winkelmann

I must admit that I am ignorant of what "uninterned symbols" are, and neither the Chicken documentation nor the R5RS document explain this very well. R5RS has an exceptionally vague paragraph about "some implementations" that support or use uninterned symbols, and while the Chicken manual does list gensym under the section "Generating uninterned symbols", it does not specify what an uninterned symbol is. If indeed eq? does not hold for these symbols, I think such an important caveat needs to be featured prominently in the documentation. Would you be willing to add this to the manual?

That makes sense. You can create a ticket for that, if you like.

The symbols in nemo can be created as constants (e.g. '* '+ '/ for the basic arithmetic operators), by the expression parser (which uses string->symbol) or by the scope resolution machinery, which uses constructs of the form (string->symbol (string-append component-name ":" (symbol->string var-id)) where component-name is produced with gensym and var-id is produced with string->symbol. Hash table look ups can fail on any of these symbols, it seems to largely depend on the machine where the program is run. So this is indeed a tricky bug and any advice on how to track it down would be appreciated.

Is "component-name" a symbol or a symbol converted to a string in the code above (the one that uses string-append)?

Is it possible that there is a problem with re-hashing (the process of enlarging the internal hash-table bucket-sequence if the capacity is exhausted)? Ivan: can you try creating your hash-tables with min-load/max-load settings that are large enough to avoid any rehashing and check whether this makes a difference?

Is there an example use of nemo that can be used to reproduce the problem?

comment:9 Changed 12 years ago by Ivan Raikov

Description: modified (diff)
Summary: Unreliable behavior of hash tables with symbols as keys (regression wrt 4.7.4)Unreliable behavior of hash-table-copy with symbols as keys (regression wrt 4.7.4)
Note: See TracTickets for help on using tickets.