Opened 3 years ago

Closed 15 months ago

#1173 closed defect (fixed)

Symbols held in GC-roots are sometimes collected in "symbol-gc" mode

Reported by: felix Owned by:
Priority: major Milestone: 5.0
Component: core libraries Version: 4.9.x
Keywords: Cc:
Estimated difficulty: hard

Description

(Initially reported by sjamaan)

If a symbol is stored in a GC-root, then it may be collected when symbol-GC is enabled (with the -:w runtime option). Here a simple test case:

(foreign-declare
 "static void *root;
  static void foo(C_word x)
  {
    root = CHICKEN_new_gc_root();
    CHICKEN_gc_root_set(root, x);
  }
  static C_word bar() { return CHICKEN_gc_root_ref(root); }")

(define foo (foreign-lambda void "foo" scheme-object))
(define bar (foreign-lambda scheme-object "bar"))

(gc #t)
(foo (string->symbol "FOO"))
(gc #t)
(print (eq? (string->symbol "FOO") (bar)))

Without -:w, this will print #t, with -:w it will print #f (sometimes, not always. of course)

Attachments (4)

working-symbol-gc.patch (14.6 KB) - added by sjamaan 15 months ago.
A working, but slow fix
okay-working-symbol-gc.patch (16.8 KB) - added by sjamaan 15 months ago.
Faster working symbol GC
fast-working-symbol-gc.patch (17.5 KB) - added by sjamaan 15 months ago.
Working symbol GC that's as fast as the original
simpler-fix.patch (1.5 KB) - added by sjamaan 15 months ago.
A noninvasive patch that works with the current weak table code

Download all attachments as: .zip

Change History (19)

comment:1 Changed 2 years ago by sjamaan

  • Summary changed from Symbols help in GC-roots are sometimes collected in "symbol-gc" mode to Symbols held in GC-roots are sometimes collected in "symbol-gc" mode

comment:2 Changed 2 years ago by sjamaan

  • Milestone changed from someday to 5.0

I'd like to fix this for CHICKEN 5 (symbol GC should be the default, IMO)

comment:3 Changed 22 months ago by sjamaan

I think this has nothing to do with GC roots in particular. The symbol GC is throwing away symbols too eagerly:

$ csi -e '(gc #t) (use posix)' # all good
$ csi -:w -e '(gc #f) (use posix)' # still good
$ csi -:w -e '(gc #t) (use posix)' # what the hell

Error: (require) cannot load extension: posix

        Call history:

        <syntax>          (gc #t)
        <eval>    (gc #t)
        <syntax>          (use posix)
        <syntax>          (##core#require-extension (posix) #t)
        <syntax>          (##core#begin (##core#begin (##sys#require (quote posix)) (import posix)) (##core#undefined))
        <syntax>          (##core#begin (##sys#require (quote posix)) (import posix))
        <syntax>          (##sys#require (quote posix))
        <syntax>          (quote posix)
        <syntax>          (##core#quote posix)
        <syntax>          (import posix)
        <syntax>          (##core#undefined)
        <syntax>          (##core#undefined)
        <eval>    (##sys#require (quote posix)) <--

comment:4 Changed 17 months ago by LemonBoy

After some brief experimentation I think that the root cause is the fact that the symbol-gc greedily wipes all the symbols from the table, including the static ones that hold the names for the exported symbols.
Back to this specific case, ##sys#load-extension is failing because an eq? comparison doesn't hold anymore (I think it was one of the memq, sadly I didn't write any note) and end up printing the error above.

As a possible fix I could devise two possible scenarios:

  • Make the static symbols have a ∞ reference count (so that now the counter has 4 possible states: 0, 1, 1+, ∞)
  • Avoid static symbols when evicting the symbols, you can recognize that a symbol's static if the slot-1 string resides outside the chicken heap.

I did implement the latter since it was easier and fixed this particular crash, I didn't test it much though.

comment:5 Changed 17 months ago by sjamaan

It's true that eq? would fail, but that will only happen if the symbol is re-interned while there's still some reference to it elsewhere. This can be fixed by skipping symbols that have a global binding or a plist attached to them, I think.

comment:6 Changed 17 months ago by sjamaan

Special-casing static symbols might be helpful to avoid allocating memory for symbol strings that are already malloced, though. But it wouldn't be an actual fix for the current situation.

comment:7 Changed 17 months ago by sjamaan

BTW, statically allocated objects can be detected via C_permanentp.

comment:8 Changed 15 months ago by sjamaan

Grr, this program no longer fails under CHICKEN 5. We didn't actually fix the bug AFAIK, so this is a coincidence.

Changed 15 months ago by sjamaan

A working, but slow fix

comment:9 Changed 15 months ago by sjamaan

The reason the original version doesn't work seems to be because the weak table's displacement code might run out of space, and simply not insert the symbol in the table.

The fixed version dispenses with the weak table, and instead changes buckets to weakly refer to their symbol (by using C_SPECIALBLOCK_BIT). After a GC, it scans the symbol table, and updates the weak references if their targets are forwarded. If the result isn't in the new heap or permanent, it can be discarded.

This implementation is rather slow, because I had to move the scanning of the symbol table to be done even for minor GCs, because symbols can be purely stack-allocated and unreferenced by anything except the symbol table, yet still need to be preserved: this happens when the symbol gets a global definition or a property list.

We can speed this up by changing the code for set! and putprop to "touch" the symbol table, so these symbols are put on the mutation stack, for example. Alternatively, we could make a separate stack of "symbols to persist", but that's a bit ugly.

If we add this "touch symbol" code, it probably also needs to remove the C_SPECIALBLOCK_BIT on the bucket containing the symbol. That way, the symbol is preserved on the next major GC even if there's nothing referencing it. Then, the original symbol table scanning code in the GC can be restored again, which means the performance should be the same as it originally was.

Last edited 15 months ago by sjamaan (previous) (diff)

comment:10 Changed 15 months ago by sjamaan

A nice additional advantage of this new approach is that it no longer needs the bucket type to be distinct from regular pairs. I think we can get rid of this type tag; probably only in CHICKEN 5, because it's, strictly speaking, a backwards incompatibility.

Changed 15 months ago by sjamaan

Faster working symbol GC

comment:11 Changed 15 months ago by sjamaan

This version applies the ideas I mentioned, and it results in exactly the same performance as with regular CHICKEN when not collecting symbols.

Unfortunately, collecting weak symbols is very slow. The knucleotide benchmark is five times slower than with the old CHICKEN, when passing -:w. Of course, this can be explained due to the fact that the old CHICKEN only reclaimed a few hundred symbols in major GCs on that benchmark while CHICKEN with patch reclaims several hundred thousand symbols.

comment:12 Changed 15 months ago by sjamaan

hm, if I change rescan in C_reclaim like this, it's very fast, because it skips symbol GC like the original code did:

if(n > 0 && (h & C_BYTEBLOCK_BIT) == 0) {
  if(h & C_SPECIALBLOCK_BIT) {
    /* Minor GC needs to be fast; always mark weakly held buckets */
    if (gc_mode != GC_MINOR || h != C_WEAK_BUCKET_TAG) {
      --n;
      ++p;
    }
  }

  while(n--) mark(p++);
}

I'm not sure how much I like this: the loop is supposed to be as fast as possible, and adding another (highly specific) check in there seems a bit ugly.

WEAK_BUCKET_TAG is defined as (C_BUCKET_TYPE | C_SPECIALBLOCK_BIT | (C_SIZEOF_BUCKET - 1)) of course. We could still define it as (C_PAIR_TYPE | C_SPECIALBLOCK_BIT | (C_SIZEOF_PAIR - 1)) to get rid of the bucket type.

It would be nice if we can make this behave more generically. Then we can make locatives work exactly like this, as well, and theoretically we could add other weak types too.

Changed 15 months ago by sjamaan

Working symbol GC that's as fast as the original

comment:13 Changed 15 months ago by sjamaan

OK, my original analysis about the hash table was (of course) completely wrong.

It looks like the problem with the original implementation is that we may encounter both the symbol, its bucket container and another item that refers to the symbol during minor GC. These are all copied to the fromspace. Nothing happens to the weak table because it's a minor GC. We scan over the bucket, and update the forwarding pointer to the symbol, so it points to the symbol in the fromspace.

Then, a major GC kicks in because fromspace is full.

We first copy the bucket to tospace. This enters it into the weak table with counter 0, using as item pointer the symbol's location in fromspace. Then, the symbol itself is copied and the counter is set to 1. Then, we encounter the other item that refers to the symbol. However, it still refers to the stack location of the symbol, which holds a forwarding pointer. The forwarding pointer gets updated, but the symbol's counter doesn't get incremented.

I think a less invasive patch might be possible if we check the weak table for an entry for the forwarded location while chasing forwarding pointers, to handle this case.

Changed 15 months ago by sjamaan

A noninvasive patch that works with the current weak table code

comment:14 Changed 15 months ago by sjamaan

  • Estimated difficulty set to hard

Patch sent to the mailinglist

comment:15 Changed 15 months ago by sjamaan

  • Resolution set to fixed
  • Status changed from new to closed

The bug itself is now fixed by 2150ad5 / b6c9d81.

Will polish the other patch later and send it, but it's an improvement, not a bugfix.

Note: See TracTickets for help on using tickets.