Opened 10 years ago

Closed 8 years ago

#1173 closed defect (fixed)

Symbols held in GC-roots are sometimes collected in "symbol-gc" mode

Reported by: felix winkelmann Owned by:
Priority: major Milestone: 5.0
Component: core libraries Version: 4.9.x
Keywords: Cc:
Estimated difficulty: hard

Description

(Initially reported by sjamaan)

If a symbol is stored in a GC-root, then it may be collected when symbol-GC is enabled (with the -:w runtime option). Here a simple test case:

(foreign-declare
 "static void *root;
  static void foo(C_word x)
  {
    root = CHICKEN_new_gc_root();
    CHICKEN_gc_root_set(root, x);
  }
  static C_word bar() { return CHICKEN_gc_root_ref(root); }")

(define foo (foreign-lambda void "foo" scheme-object))
(define bar (foreign-lambda scheme-object "bar"))

(gc #t)
(foo (string->symbol "FOO"))
(gc #t)
(print (eq? (string->symbol "FOO") (bar)))

Without -:w, this will print #t, with -:w it will print #f (sometimes, not always. of course)

Attachments (4)

working-symbol-gc.patch (14.6 KB) - added by sjamaan 8 years ago.
A working, but slow fix
okay-working-symbol-gc.patch (16.8 KB) - added by sjamaan 8 years ago.
Faster working symbol GC
fast-working-symbol-gc.patch (17.5 KB) - added by sjamaan 8 years ago.
Working symbol GC that's as fast as the original
simpler-fix.patch (1.5 KB) - added by sjamaan 8 years ago.
A noninvasive patch that works with the current weak table code

Download all attachments as: .zip

Change History (19)

comment:1 Changed 9 years ago by sjamaan

Summary: Symbols help in GC-roots are sometimes collected in "symbol-gc" modeSymbols held in GC-roots are sometimes collected in "symbol-gc" mode

comment:2 Changed 9 years ago by sjamaan

Milestone: someday5.0

I'd like to fix this for CHICKEN 5 (symbol GC should be the default, IMO)

comment:3 Changed 9 years ago by sjamaan

I think this has nothing to do with GC roots in particular. The symbol GC is throwing away symbols too eagerly:

$ csi -e '(gc #t) (use posix)' # all good
$ csi -:w -e '(gc #f) (use posix)' # still good
$ csi -:w -e '(gc #t) (use posix)' # what the hell

Error: (require) cannot load extension: posix

        Call history:

        <syntax>          (gc #t)
        <eval>    (gc #t)
        <syntax>          (use posix)
        <syntax>          (##core#require-extension (posix) #t)
        <syntax>          (##core#begin (##core#begin (##sys#require (quote posix)) (import posix)) (##core#undefined))
        <syntax>          (##core#begin (##sys#require (quote posix)) (import posix))
        <syntax>          (##sys#require (quote posix))
        <syntax>          (quote posix)
        <syntax>          (##core#quote posix)
        <syntax>          (import posix)
        <syntax>          (##core#undefined)
        <syntax>          (##core#undefined)
        <eval>    (##sys#require (quote posix)) <--

comment:4 Changed 9 years ago by LemonBoy

After some brief experimentation I think that the root cause is the fact that the symbol-gc greedily wipes all the symbols from the table, including the static ones that hold the names for the exported symbols.
Back to this specific case, ##sys#load-extension is failing because an eq? comparison doesn't hold anymore (I think it was one of the memq, sadly I didn't write any note) and end up printing the error above.

As a possible fix I could devise two possible scenarios:

  • Make the static symbols have a ∞ reference count (so that now the counter has 4 possible states: 0, 1, 1+, ∞)
  • Avoid static symbols when evicting the symbols, you can recognize that a symbol's static if the slot-1 string resides outside the chicken heap.

I did implement the latter since it was easier and fixed this particular crash, I didn't test it much though.

comment:5 Changed 9 years ago by sjamaan

It's true that eq? would fail, but that will only happen if the symbol is re-interned while there's still some reference to it elsewhere. This can be fixed by skipping symbols that have a global binding or a plist attached to them, I think.

comment:6 Changed 9 years ago by sjamaan

Special-casing static symbols might be helpful to avoid allocating memory for symbol strings that are already malloced, though. But it wouldn't be an actual fix for the current situation.

comment:7 Changed 9 years ago by sjamaan

BTW, statically allocated objects can be detected via C_permanentp.

comment:8 Changed 8 years ago by sjamaan

Grr, this program no longer fails under CHICKEN 5. We didn't actually fix the bug AFAIK, so this is a coincidence.

Changed 8 years ago by sjamaan

Attachment: working-symbol-gc.patch added

A working, but slow fix

comment:9 Changed 8 years ago by sjamaan

The reason the original version doesn't work seems to be because the weak table's displacement code might run out of space, and simply not insert the symbol in the table.

The fixed version dispenses with the weak table, and instead changes buckets to weakly refer to their symbol (by using C_SPECIALBLOCK_BIT). After a GC, it scans the symbol table, and updates the weak references if their targets are forwarded. If the result isn't in the new heap or permanent, it can be discarded.

This implementation is rather slow, because I had to move the scanning of the symbol table to be done even for minor GCs, because symbols can be purely stack-allocated and unreferenced by anything except the symbol table, yet still need to be preserved: this happens when the symbol gets a global definition or a property list.

We can speed this up by changing the code for set! and putprop to "touch" the symbol table, so these symbols are put on the mutation stack, for example. Alternatively, we could make a separate stack of "symbols to persist", but that's a bit ugly.

If we add this "touch symbol" code, it probably also needs to remove the C_SPECIALBLOCK_BIT on the bucket containing the symbol. That way, the symbol is preserved on the next major GC even if there's nothing referencing it. Then, the original symbol table scanning code in the GC can be restored again, which means the performance should be the same as it originally was.

Last edited 8 years ago by sjamaan (previous) (diff)

comment:10 Changed 8 years ago by sjamaan

A nice additional advantage of this new approach is that it no longer needs the bucket type to be distinct from regular pairs. I think we can get rid of this type tag; probably only in CHICKEN 5, because it's, strictly speaking, a backwards incompatibility.

Changed 8 years ago by sjamaan

Faster working symbol GC

comment:11 Changed 8 years ago by sjamaan

This version applies the ideas I mentioned, and it results in exactly the same performance as with regular CHICKEN when not collecting symbols.

Unfortunately, collecting weak symbols is very slow. The knucleotide benchmark is five times slower than with the old CHICKEN, when passing -:w. Of course, this can be explained due to the fact that the old CHICKEN only reclaimed a few hundred symbols in major GCs on that benchmark while CHICKEN with patch reclaims several hundred thousand symbols.

comment:12 Changed 8 years ago by sjamaan

hm, if I change rescan in C_reclaim like this, it's very fast, because it skips symbol GC like the original code did:

if(n > 0 && (h & C_BYTEBLOCK_BIT) == 0) {
  if(h & C_SPECIALBLOCK_BIT) {
    /* Minor GC needs to be fast; always mark weakly held buckets */
    if (gc_mode != GC_MINOR || h != C_WEAK_BUCKET_TAG) {
      --n;
      ++p;
    }
  }

  while(n--) mark(p++);
}

I'm not sure how much I like this: the loop is supposed to be as fast as possible, and adding another (highly specific) check in there seems a bit ugly.

WEAK_BUCKET_TAG is defined as (C_BUCKET_TYPE | C_SPECIALBLOCK_BIT | (C_SIZEOF_BUCKET - 1)) of course. We could still define it as (C_PAIR_TYPE | C_SPECIALBLOCK_BIT | (C_SIZEOF_PAIR - 1)) to get rid of the bucket type.

It would be nice if we can make this behave more generically. Then we can make locatives work exactly like this, as well, and theoretically we could add other weak types too.

Changed 8 years ago by sjamaan

Working symbol GC that's as fast as the original

comment:13 Changed 8 years ago by sjamaan

OK, my original analysis about the hash table was (of course) completely wrong.

It looks like the problem with the original implementation is that we may encounter both the symbol, its bucket container and another item that refers to the symbol during minor GC. These are all copied to the fromspace. Nothing happens to the weak table because it's a minor GC. We scan over the bucket, and update the forwarding pointer to the symbol, so it points to the symbol in the fromspace.

Then, a major GC kicks in because fromspace is full.

We first copy the bucket to tospace. This enters it into the weak table with counter 0, using as item pointer the symbol's location in fromspace. Then, the symbol itself is copied and the counter is set to 1. Then, we encounter the other item that refers to the symbol. However, it still refers to the stack location of the symbol, which holds a forwarding pointer. The forwarding pointer gets updated, but the symbol's counter doesn't get incremented.

I think a less invasive patch might be possible if we check the weak table for an entry for the forwarded location while chasing forwarding pointers, to handle this case.

Changed 8 years ago by sjamaan

Attachment: simpler-fix.patch added

A noninvasive patch that works with the current weak table code

comment:14 Changed 8 years ago by sjamaan

Estimated difficulty: hard

Patch sent to the mailinglist

comment:15 Changed 8 years ago by sjamaan

Resolution: fixed
Status: newclosed

The bug itself is now fixed by 2150ad5 / b6c9d81.

Will polish the other patch later and send it, but it's an improvement, not a bugfix.

Note: See TracTickets for help on using tickets.