Opened 12 years ago
Closed 11 years ago
#989 closed defect (fixed)
High CPU usage when calling signal handler multiple times
Reported by: | Mario Domenech Goulart | Owned by: | |
---|---|---|---|
Priority: | critical | Milestone: | 4.9.0 |
Component: | core libraries | Version: | 4.8.x |
Keywords: | signal handling | Cc: | |
Estimated difficulty: |
Description
Here's an example to demonstrate the aforementioned behavior:
;; Press C-c multiple times in an interval < 20s
(use posix)

(set-signal-handler! signal/int
  (lambda (signal)
    (print "caught signal" signal)
    (sleep 20)))

(let loop ()
  (sleep 1)
  (loop))
I cannot reproduce that behavior on 4.7.0. Peter mentioned on IRC that b7995839c0b481280bdeda117eb68bc0e78a40bf triggered it.
Attachments (1)
Change History (12)
comment:1 Changed 12 years ago by
comment:2 Changed 12 years ago by
Ripping out the queueing up of pending events (so that interrupt_reason is overwritten like it used to be) doesn't help either.
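For readers unfamiliar with the two strategies mentioned above, here is a minimal C sketch of the difference between overwriting a single interrupt reason and queueing pending signals. This is not the CHICKEN runtime code: the names `record_overwrite`, `record_queued`, `take_pending`, `last_reason` and `PENDING_MAX` are all hypothetical, and a real runtime would also need to block signals while draining the queue.

```c
#include <signal.h>

#define PENDING_MAX 8  /* hypothetical backlog size, not taken from the runtime */

static volatile sig_atomic_t last_reason = 0;           /* single-slot variant */
static volatile sig_atomic_t pending[PENDING_MAX];      /* queued variant */
static volatile sig_atomic_t pending_count = 0;

/* Overwriting variant: only the most recent signal number survives. */
static void record_overwrite(int signum)
{
    last_reason = signum;
}

/* Queueing variant: keep a bounded backlog and drop signals once it is full. */
static void record_queued(int signum)
{
    if (pending_count < PENDING_MAX) {
        pending[pending_count] = signum;
        pending_count = pending_count + 1;
    }
}

/* Remove and return the oldest queued signal, or 0 if the queue is empty.
   Assumption: the caller has blocked signals, so no handler runs mid-shift. */
static int take_pending(void)
{
    int first, i;

    if (pending_count == 0)
        return 0;
    first = pending[0];
    for (i = 1; i < pending_count; i++)
        pending[i - 1] = pending[i];
    pending_count = pending_count - 1;
    return first;
}
```

With the overwriting variant, two signals arriving before the next check leave only the second one visible; with the queued variant, both are later returned by `take_pending` in arrival order.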
comment:3 Changed 12 years ago by
C_stack_probe(&a) keeps returning false in f_19521 (the procedure marked with /* loop in interrupt-hook in k19511 in k19465 in k17778 in k17771 in k17767 in k17763 in k17759 in k21506 in k21500 in k17281 in k17278 in k17276 in k17274 in k17272 in k17270 in k17267 in k17265 in k17263 in k17261 in k17259 in ... */), even after the C_reclaim has "returned" and the procedure is re-invoked via its trampoline.
Changed 12 years ago by
Attachment: | signal-handler-hack.patch added |
---|
a hack that makes the symptoms go away
comment:4 Changed 12 years ago by
Now that I try to think why that patch works, I cannot come up with an explanation.
But something seems to go wrong when multiple signal-handlers are running at the same time.
comment:5 Changed 11 years ago by
Extremely interesting: I cannot reproduce the problem when I add a single print statement after the (sleep 20) in the signal handler:
;; Press C-c multiple times in an interval < 20s
(use posix)

(set-signal-handler! signal/int
  (lambda (signal)
    (print "caught signal" signal)
    (sleep 20)
    (print "done")))

(let loop ()
  (sleep 1)
  (loop))
I can't explain this yet.
comment:6 Changed 11 years ago by
Still unclear, but the continuation of sleep seems to be doing something very strange. If we explicitly capture the continuation (which should be identical to the implicit continuation) and pass it to the procedure when we invoke it, it works. If we use the continuation supplied by the foreign-lambda wrapper, it breaks.
However, if sleep(x) is replaced by a constant integer, it works with either continuation.
(use posix)

;; BROKEN:
(define do-sleep
  (foreign-primitive ((scheme-object k) (int x))
    "printf(\"%d\\n\", sleep(x)); C_values(2, C_SCHEME_UNDEFINED, C_k);"))

;; OKAY:
(define do-sleep
  (foreign-primitive ((scheme-object k) (int x))
    "printf(\"%d\\n\", sleep(x)); C_values(2, C_SCHEME_UNDEFINED, k);"))

(set-signal-handler! signal/int
  (lambda (signal)
    (print "caught signal" signal)
    (call/cc (lambda (k) (do-sleep k 10)))))

(let loop ()
  (sleep 1)
  (loop))
It gets weirder and weirder!
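As background for the sleep(x) versus constant comparison: POSIX sleep() returns early when a caught signal interrupts it, and its return value is then the unslept time rather than 0. The standalone C sketch below shows only that behaviour; it does not claim to explain this bug. The helper name `interrupted_sleep_remaining`, the use of SIGUSR1 and the 100 ms delay are assumptions chosen for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

static volatile sig_atomic_t got_signal = 0;

/* Minimal handler: just note that the signal arrived. */
static void on_usr1(int signum)
{
    (void)signum;
    got_signal = 1;
}

/* Sleep for `seconds` while a child process sends SIGUSR1 after about 100 ms.
   Returns what sleep() reported (the unslept seconds), or -1 on setup failure. */
static int interrupted_sleep_remaining(unsigned int seconds)
{
    struct sigaction sa;
    struct timespec delay;
    unsigned int remaining;
    pid_t parent, child;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    parent = getpid();
    child = fork();
    if (child < 0)
        return -1;
    if (child == 0) {
        /* Child: wait briefly so the parent is already inside sleep(). */
        delay.tv_sec = 0;
        delay.tv_nsec = 100L * 1000L * 1000L;
        nanosleep(&delay, NULL);
        kill(parent, SIGUSR1);
        _exit(0);
    }

    remaining = sleep(seconds);  /* returns early once the handler has run */
    waitpid(child, NULL, 0);
    return (int)remaining;
}
```

Calling `interrupted_sleep_remaining(3)` should come back after roughly 100 ms with a nonzero result; how the remaining time is rounded differs between C libraries.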
comment:7 Changed 11 years ago by
This could be related to #1058. If either is fixed, the other should be re-tested.
comment:8 Changed 11 years ago by
I think #877 is due to the same underlying problem. There seems to be something strange going on when handling interrupts. I think the bug goes away when messing with the continuation because we're now allowing interrupts to get handled the "regular" way, through the GC, rather than explicitly by C_pending_interrupt.
I still don't quite grok it, but this seems to be the case. One fix is to not call C_pending_interrupt, but that would kill the ability to save a backlog of interrupts (effectively undoing b7995839c0b481280bdeda117eb68bc0e78a40bf).
comment:11 Changed 11 years ago by
Resolution: | → fixed |
---|---|
Status: | new → closed |
Oddly enough, if I throw out the HAVE_SIGACTION from Makefile.bsd, make spotless and rebuild, it still fails the same way (on master). It seems to be related to the reworking of the signal handling itself.
gdb seems to point to an infinite loop in the GC, but that's rather difficult to understand given the setjmp/longjmp stuff. ktrace shows that it keeps setting and resetting a signal mask.