Opened 10 years ago

Closed 9 years ago

#285 closed defect (wontfix)

termite hangs and doesn't return values when it should

Reported by: Jeronimo Pellegrini
Owned by: Christian Kellermann
Priority: major
Milestone:
Component: extensions
Version: 4.5.x
Keywords: termite egg
Cc:
Estimated difficulty:

Description

Hello,

The termite egg seems to have a problem: processes don't
appear to react to messages immediately, and "?" seems to
hang and wait forever.

Below I compare the termite egg on Chicken from git/master (0cc55fe1d91af124b64644ea7fd9a82cd4712e52) with Gambit C 4.6.0 running the termite code from googlecode/svn.

In Gambit:

> (define pid (spawn (lambda () (write (?)))))
> (! pid "hello world")
> "hello world"

In Chicken:

#;2> (define pid (spawn (lambda () (write (?)))))
#;3> (! pid "hello world")

Seems like the message got stuck in some buffer...
Something to do with current-output-port, perhaps?

Sometimes I can get an answer:

#;4> (! pid 'a)
"hello world"
#;5>

But often I don't:

(...)
#;19> (! pid "aaaaaaaaaaaaaaaaaaaaaaaaaaa")
#;20> (! pid "aaaaaaaaaaaaaaaaaaaaaaaaaaa")
#;21> (! pid "aaaaaaaaaaaaaaaaaaaaaaaaaaa")
#;22> (! pid "aaaaaaaaaaaaaaaaaaaaaaaaaaa")
#;23> (! pid "aaaaaaaaaaaaaaaaaaaaaaaaaaa")
#;24> (! pid "aaaaaaaaaaaaaaaaaaaaaaaaaaa")
#;25> (! pid "aaaaaaaaaaaaaaaaaaaaaaaaaaa")
#;26> (! pid "aaaaaaaaaaaaaaaaaaaaaaaaaaa")
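
As a cross-check that leaves termite out of the picture, the same send-then-receive handoff can be tried with bare SRFI-18 threads from a script instead of the REPL. This is only a sketch under assumptions: `box-put!` and `box-take!` are made-up stand-ins for "!" and "?", not termite or mailbox egg code, and the 5 second limit is arbitrary.

```scheme
;; Chicken 4.x syntax, as in the ticket; CHICKEN 5 would use (import srfi-18).
(use srfi-18)

(define box-mutex (make-mutex))
(define box-filled (make-condition-variable))
(define box '())                        ; empty list = no message yet

;; Hypothetical stand-in for "!": store one message and wake the receiver.
(define (box-put! msg)
  (mutex-lock! box-mutex)
  (set! box (list msg))
  (condition-variable-signal! box-filled)
  (mutex-unlock! box-mutex))

;; Hypothetical stand-in for "?": block until a message is present.
(define (box-take!)
  (mutex-lock! box-mutex)
  (if (null? box)
      (begin
        ;; Unlock and sleep on the condition variable, then re-check.
        (mutex-unlock! box-mutex box-filled)
        (box-take!))
      (let ((msg (car box)))
        (set! box '())
        (mutex-unlock! box-mutex)
        msg)))

(define receiver (thread-start! (make-thread box-take!)))
(box-put! "hello world")

;; Wait at most 5 seconds; 'timed-out comes back if the receiver never woke up.
(define outcome (thread-join! receiver 5 'timed-out))
(write outcome)
(newline)
```

If this writes the message while the termite version still stalls, the threading layer itself is probably fine and the problem is more likely in how termite or the mailbox egg blocks and wakes the receiver.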

In Gambit, if I set a timeout for receiving a message and
it actually times out, an exception is raised:

(? 1)
*** ERROR IN (console)@4.1 -- mailbox receive timed out
(thread-receive 0)
1>

In Chicken it gets blocked forever:

#;2> (? 1)

Hitting Ctrl-C gives me a stack trace:

        Call history:

        termite.scm:103: log-crash
        termite.scm:88: call-with-output-string
        termite.scm:81: continuation-capture
        termite.scm:94: termite-exception?
        termite.scm:103: log-crash
        termite.scm:88: call-with-output-string
        termite.scm:81: continuation-capture
        print-call-chain                        <--

Defining a value to be returned when "?" doesn't get
anything after n seconds also doesn't work:

In Gambit, it returns the value:

> (? 1 'ok)
ok
>

In Chicken, it doesn't hang, but no value is returned:

#;2> (? 1 'ok)
#;3>
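
To make the behaviour I expect from `(? timeout default)` concrete, here is a minimal sketch of a receive with a timeout and a default value, written with plain SRFI-18 primitives. It is not the code of termite or of the mailbox egg; `make-toy-mailbox`, `toy-send!` and `toy-receive` are hypothetical names used only for illustration.

```scheme
;; Chicken 4.x syntax, as in the ticket; CHICKEN 5 would use (import srfi-18).
(use srfi-18)

;; A toy mailbox: slot 0 holds pending messages (oldest first),
;; slot 1 a mutex, slot 2 a condition variable.
(define (make-toy-mailbox)
  (vector '() (make-mutex) (make-condition-variable)))

;; Append a message and wake one waiting receiver.
(define (toy-send! mb msg)
  (let ((mutex (vector-ref mb 1)))
    (mutex-lock! mutex)
    (vector-set! mb 0 (append (vector-ref mb 0) (list msg)))
    (condition-variable-signal! (vector-ref mb 2))
    (mutex-unlock! mutex)))

;; Return the oldest message, or DEFAULT if none arrives in time.
(define (toy-receive mb timeout-seconds default)
  (let ((mutex (vector-ref mb 1))
        ;; Absolute deadline, so re-checking does not extend the wait.
        (deadline (seconds->time
                   (+ (time->seconds (current-time)) timeout-seconds))))
    (let loop ()
      (mutex-lock! mutex)
      (let ((queue (vector-ref mb 0)))
        (cond ((pair? queue)
               (vector-set! mb 0 (cdr queue))
               (mutex-unlock! mutex)
               (car queue))
              ;; With a condition variable and a timeout, mutex-unlock!
              ;; returns #f when the timeout is reached and #t otherwise.
              ((mutex-unlock! mutex (vector-ref mb 2) deadline)
               (loop))
              (else default))))))

(define mb (make-toy-mailbox))
(write (toy-receive mb 0.1 'ok))      ; nothing queued: the default comes back
(newline)
(toy-send! mb "hello world")
(write (toy-receive mb 0.1 'ok))      ; a message is waiting: it is returned
(newline)
```

The point of the sketch is only the return path: when the deadline passes, the caller gets the default value back, which is what Gambit does in the session above.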

Also, "on" seems to hang. On Gambit, I can do this:

Start two gsi interpreters with termite loaded. These
are gambit-A and gambit-B:

gambit-A> (node-init (make-node "127.0.0.1" 3000))
ok


gambit-B> (node-init (make-node "127.0.0.1" 3001))
ok
gambit-B> (define A (make-node "127.0.0.1" 3000))
gambit-B> (on A (lambda () (print 'x)))
gambit-B>

B doesn't hang, and "x" is written on A's REPL.

On Chicken,

A runs:

 (use termite)
 (define a (make-node "127.0.0.1" 10000))
 (node-init a)

On B:

#;2> (node-init (make-node "127.0.0.1" 10001))
ok
#;3> (define A (make-node "127.0.0.1" 10000))
#;4> (on A (lambda () (print 'x)))

Nothing is printed on A's REPL, and B hangs.
Again, hitting Ctrl-C shows this stack trace:

        Call history:

        termite.scm:94: termite-exception?
        termite.scm:103: log-crash
        termite.scm:88: call-with-output-string
        termite.scm:81: continuation-capture
        termite.scm:94: termite-exception?
        termite.scm:103: log-crash
        termite.scm:88: call-with-output-string
        print-call-chain                        <--

It doesn't seem to actually have to do with I/O, because this also
makes B hang:

#;4> (on A (lambda () (set! x 20) (values)))

I hope this is just some small issue in the way termite interacts with Chicken's threading code...

Thanks,
J.

Change History (4)

comment:1 Changed 10 years ago by Jeronimo Pellegrini

Priority: major → minor

comment:2 Changed 10 years ago by Christian Kellermann

Owner: set to Christian Kellermann
Priority: minor → major
Status: new → accepted

Thanks for this excellent bug report!

The problem here is my misunderstanding of the inner workings of the mailbox egg and its implications for threads. It seems that a message only "pops" up when another one is put into the mailbox.

I will have a look. Maybe Kon can explain some things. There also seems to be a problem when you rewind a mailbox for the primordial thread: this may cause a deadlock exception. I am not sure yet how to deal with this situation properly either.

The node code has not been well tested, as you can see. If you want to help improve the situation, I would be glad of any help!

comment:3 Changed 10 years ago by Jeronimo Pellegrini

I am not familiar with the mailbox and mailbox-threads eggs and the internals of Termite, so
I'm not sure how I could help. What could I do to help? More testing maybe?

comment:4 Changed 9 years ago by Christian Kellermann

Resolution: wontfix
Status: accepted → closed

I am sorry, but after a year I still have no idea how to fix this, so I will mark the termite egg as obsolete and declare it a failure. A cleaner implementation closer to Chicken Scheme's idioms might have more success.

Thanks for bearing with me here.
