Opened 10 years ago

Closed 10 years ago

Last modified 9 years ago

#668 closed defect (fixed)

Issue with 49b92ba0bfb7ecfceb49a45f64e9943ea83a6d94

Reported by: Mario Domenech Goulart Owned by:
Priority: critical Milestone: 4.9.0
Component: unknown Version: 4.7.x
Keywords: 49b92ba0bfb7ecfceb49a45f64e9943ea83a6d94 Cc: sjamaan
Estimated difficulty:

Description (last modified by Mario Domenech Goulart)

I have a quite complex awful web app that behaves weirdly when run by chicken from git commit 49b92ba0bfb7ecfceb49a45f64e9943ea83a6d94. The same web app works fine with chicken code prior to 49b92ba0bfb7ecfceb49a45f64e9943ea83a6d94 and behaves weirdly with commits more recent than it.

By "behaves weirdly" I mean the application can be started, but requests take a long time to be finished and keep the CPU very busy. The requested data is eventually delivered to the client but takes a long time and demands a lot of CPU work.

Unfortunately I don't have a simple test case to reproduce the problem yet. I'm just filing this ticket now in the hope somebody can see something in 49b92ba0bfb7ecfceb49a45f64e9943ea83a6d94 that could be a problem.

Attachments (1)

scheduler-debug.log.bz2 (245.0 KB) - added by Mario Domenech Goulart 10 years ago.
Debugging output from the scheduler

Download all attachments as: .zip

Change History (13)

comment:1 Changed 10 years ago by Mario Domenech Goulart

Description: modified (diff)

comment:2 Changed 10 years ago by felix winkelmann

I have absolutely no idea. The diff looks fine to me.

comment:3 Changed 10 years ago by Mario Domenech Goulart

Ok, I could make a simple code to illustrate the problem. Christian said he could also reproduce the issue using a recent chicken-core. Here is the test code:

$ cat spiffy.scm 
(use spiffy)
(root-path (current-directory))
(start-server)

$ cat index.html 
<html><head><link rel='stylesheet' href='0.css' type='text/css'><link rel='stylesheet' href='1.css' type='text/css'></head><body>foo</body></html>

$ csi -s spiffy.scm

If I access http://localhost:8080, the CPU usage gets very high for ~60s, when the server eventually sends the response.

This behavior can be observed no matter the CSS files exist or not.

If I remove the link to one of the CSS files, it works as expected (i.e., it doesn't hang for 60s eating CPU).

Most part of the strace output goes like:

gettimeofday({1314118760, 902292}, NULL) = 0
gettimeofday({1314118760, 902344}, NULL) = 0
gettimeofday({1314118760, 902400}, NULL) = 0
select(1024, [4 5], [], NULL, {60, 0})  = 1 (in [4], left {59, 999997})
gettimeofday({1314118760, 902705}, NULL) = 0
gettimeofday({1314118760, 902754}, NULL) = 0
select(1024, [4 5], [], NULL, {60, 0})  = 1 (in [4], left {59, 999998})
gettimeofday({1314118760, 903024}, NULL) = 0
gettimeofday({1314118760, 903072}, NULL) = 0
select(1024, [4 5], [], NULL, {59, 999000}) = 1 (in [4], left {59, 998998})
gettimeofday({1314118760, 903340}, NULL) = 0
gettimeofday({1314118760, 903387}, NULL) = 0
select(1024, [4 5], [], NULL, {59, 999000}) = 1 (in [4], left {59, 998998})
gettimeofday({1314118760, 903657}, NULL) = 0
gettimeofday({1314118760, 903705}, NULL) = 0

...

select(1024, [4 5], [], NULL, {0, 2000}) = 1 (in [4], left {0, 1999})
gettimeofday({1314118820, 900945}, NULL) = 0
gettimeofday({1314118820, 900977}, NULL) = 0
select(1024, [4 5], [], NULL, {0, 2000}) = 1 (in [4], left {0, 1999})
gettimeofday({1314118820, 901158}, NULL) = 0
gettimeofday({1314118820, 901190}, NULL) = 0
select(1024, [4 5], [], NULL, {0, 1000}) = 1 (in [4], left {0, 999})
gettimeofday({1314118820, 901370}, NULL) = 0
gettimeofday({1314118820, 901402}, NULL) = 0
select(1024, [4 5], [], NULL, {0, 1000}) = 1 (in [4], left {0, 999})
gettimeofday({1314118820, 901582}, NULL) = 0
gettimeofday({1314118820, 901614}, NULL) = 0
select(1024, [4 5], [], NULL, {0, 1000}) = 1 (in [4], left {0, 999})
gettimeofday({1314118820, 901794}, NULL) = 0
gettimeofday({1314118820, 901827}, NULL) = 0
select(1024, [4 5], [], NULL, {0, 1000}) = 1 (in [4], left {0, 999})
gettimeofday({1314118820, 902007}, NULL) = 0
select(1024, [4], [], NULL, {0, 0})     = 1 (in [4], left {0, 0})
shutdown(5, 0 /* receive */)            = 0
shutdown(5, 1 /* send */)               = 0
close(5)                                = 0
select(5, [4], NULL, NULL, {0, 0})      = 1 (in [4], left {0, 0})
accept(4, 0, NULL)                      = 5

Changed 10 years ago by Mario Domenech Goulart

Attachment: scheduler-debug.log.bz2 added

Debugging output from the scheduler

comment:4 Changed 10 years ago by Mario Domenech Goulart

I've attached a file containing debug messages from the scheduler. I used the chicken-core code with the HEAD set to 49b92ba0bfb7ecfceb49a45f64e9943ea83a6d94.

The data in the debug log represents one successful response followed by a response that took 60s eating CPU (using the example code showed above). I've used firefox 3.5.16 to request data from the server (Debian Linux x86).

comment:5 Changed 10 years ago by felix winkelmann

Cc: sjamaan added

Can the problem be nailed down to this particular commit? If you revert it in the current HEAD, does it work correctly?

comment:6 Changed 10 years ago by Mario Domenech Goulart

I'm rechecking commits around 49b92ba0bfb7ecfceb49a45f64e9943ea83a6d94 to observe the behavior of the test code. I got to 49b92ba0bfb7ecfceb49a45f64e9943ea83a6d94 via git bisect. It seems that now I'm observing a slightly different behavior with commits previous to 49b92ba0bfb7ecfceb49a45f64e9943ea83a6d94, but still not the behavior I'd expect.

The HEAD code from yesterday shows the same behavior as 49b92ba0bfb7ecfceb49a45f64e9943ea83a6d94's (hangs and high CPU usage).

I could not build the bootstrap compiler for one commit (5defd64d28e29060e2b4fbcf9418d6a4e41539c8 -- the one right before 49b92ba0bfb7ecfceb49a45f64e9943ea83a6d94). I used 4.7.2 to build chicken, but then I got an error while building the regex egg, IIRC.

I tried 98259ae99912d3c7883cb32bfa2e51252ec02e4e and it initially seems to work, but it hangs sometimes (less frequently than 49b92ba0bfb7ecfceb49a45f64e9943ea83a6d94 and doesn't consume a lot CPU, but more frequently than 4.7.0). The same for 8f1cabaabadf89361ce3b594903cbd2c43c3deec.

45c040f0a352c6f4b4431788aef1d0357c6ede6f (the one right after 49b92ba0bfb7ecfceb49a45f64e9943ea83a6d94) hangs consuming CPU.

To request data from the server, I just reload the page on Firefox.

So, summarizing (commits ordered according to git log order):

  • 45c040f0a352c6f4b4431788aef1d0357c6ede6f: Hangs frequently and consumes CPU
  • 49b92ba0bfb7ecfceb49a45f64e9943ea83a6d94: Hangs frequently and consumes CPU
  • 5defd64d28e29060e2b4fbcf9418d6a4e41539c8: could not test (build errors)
  • 98259ae99912d3c7883cb32bfa2e51252ec02e4e: Hangs sometimes, but doesn't consume CPU
  • 8f1cabaabadf89361ce3b594903cbd2c43c3deec: Hangs sometimes, but doesn't consume CPU

comment:7 Changed 10 years ago by Mario Domenech Goulart

If I revert 49b92ba0bfb7ecfceb49a45f64e9943ea83a6d94 I get the same behavior as 98259ae99912d3c7883cb32bfa2e51252ec02e4e's and 8f1cabaabadf89361ce3b594903cbd2c43c3deec's (hang sometimes, but don't consume CPU).

comment:8 Changed 10 years ago by felix winkelmann

Priority: majorcritical

comment:9 Changed 10 years ago by felix winkelmann

Owner: felix winkelmann deleted
Status: newassigned

comment:10 Changed 10 years ago by felix winkelmann

Resolution: fixed
Status: assignedclosed

The fix is so trivial, it hurts:

diff --git a/scheduler.scm b/scheduler.scm
index 08c8a5c..02a8fad 100644
--- a/scheduler.scm
+++ b/scheduler.scm
@@ -436,7 +436,7 @@ EOF
                            [inf (##core#inline "C_fd_test_input" fd)]
                            [outf (##core#inline "C_fd_test_output" fd)])
                       (dbg "fd " fd " state: input=" inf ", output=" outf)
-                      (if (or inf outf errf)
+                      (if (or inf outf)
                           (let loop2 ((threads (cdr a)) (keep '()))
                             (if (null? threads)
                                 (if (null? keep)

Since unbound variable checks are disabled in this code, the reference would result in thh test being always true.

I will apply this directly to master, there is not much reason to review it.

comment:11 Changed 10 years ago by Mario Domenech Goulart

Fantastic, Felix! Thanks for fixing it.

comment:12 Changed 9 years ago by felix winkelmann

Milestone: 4.8.04.9.0

Milestone 4.8.0 deleted

Note: See TracTickets for help on using tickets.