#668 closed defect (fixed)
Issue with 49b92ba0bfb7ecfceb49a45f64e9943ea83a6d94
Reported by: | Mario Domenech Goulart | Owned by: | |
---|---|---|---|
Priority: | critical | Milestone: | 4.9.0 |
Component: | unknown | Version: | 4.7.x |
Keywords: | 49b92ba0bfb7ecfceb49a45f64e9943ea83a6d94 | Cc: | sjamaan |
Estimated difficulty: |
Description (last modified by )
I have a quite complex awful web app that behaves weirdly when run by chicken from git commit 49b92ba0bfb7ecfceb49a45f64e9943ea83a6d94. The same web app works fine with chicken code prior to 49b92ba0bfb7ecfceb49a45f64e9943ea83a6d94 and behaves weirdly with commits more recent than it.
By "behaves weirdly" I mean the application can be started, but requests take a long time to be finished and keep the CPU very busy. The requested data is eventually delivered to the client but takes a long time and demands a lot of CPU work.
Unfortunately I don't have a simple test case to reproduce the problem yet. I'm just filing this ticket now in the hope somebody can see something in 49b92ba0bfb7ecfceb49a45f64e9943ea83a6d94 that could be a problem.
Attachments (1)
Change History (13)
comment:1 Changed 13 years ago by
Description: | modified (diff) |
---|
comment:2 Changed 13 years ago by
comment:3 Changed 13 years ago by
Ok, I could make a simple code to illustrate the problem. Christian said he could also reproduce the issue using a recent chicken-core. Here is the test code:
$ cat spiffy.scm (use spiffy) (root-path (current-directory)) (start-server) $ cat index.html <html><head><link rel='stylesheet' href='0.css' type='text/css'><link rel='stylesheet' href='1.css' type='text/css'></head><body>foo</body></html> $ csi -s spiffy.scm
If I access http://localhost:8080, the CPU usage gets very high for ~60s, when the server eventually sends the response.
This behavior can be observed no matter the CSS files exist or not.
If I remove the link to one of the CSS files, it works as expected (i.e., it doesn't hang for 60s eating CPU).
Most part of the strace output goes like:
gettimeofday({1314118760, 902292}, NULL) = 0 gettimeofday({1314118760, 902344}, NULL) = 0 gettimeofday({1314118760, 902400}, NULL) = 0 select(1024, [4 5], [], NULL, {60, 0}) = 1 (in [4], left {59, 999997}) gettimeofday({1314118760, 902705}, NULL) = 0 gettimeofday({1314118760, 902754}, NULL) = 0 select(1024, [4 5], [], NULL, {60, 0}) = 1 (in [4], left {59, 999998}) gettimeofday({1314118760, 903024}, NULL) = 0 gettimeofday({1314118760, 903072}, NULL) = 0 select(1024, [4 5], [], NULL, {59, 999000}) = 1 (in [4], left {59, 998998}) gettimeofday({1314118760, 903340}, NULL) = 0 gettimeofday({1314118760, 903387}, NULL) = 0 select(1024, [4 5], [], NULL, {59, 999000}) = 1 (in [4], left {59, 998998}) gettimeofday({1314118760, 903657}, NULL) = 0 gettimeofday({1314118760, 903705}, NULL) = 0 ... select(1024, [4 5], [], NULL, {0, 2000}) = 1 (in [4], left {0, 1999}) gettimeofday({1314118820, 900945}, NULL) = 0 gettimeofday({1314118820, 900977}, NULL) = 0 select(1024, [4 5], [], NULL, {0, 2000}) = 1 (in [4], left {0, 1999}) gettimeofday({1314118820, 901158}, NULL) = 0 gettimeofday({1314118820, 901190}, NULL) = 0 select(1024, [4 5], [], NULL, {0, 1000}) = 1 (in [4], left {0, 999}) gettimeofday({1314118820, 901370}, NULL) = 0 gettimeofday({1314118820, 901402}, NULL) = 0 select(1024, [4 5], [], NULL, {0, 1000}) = 1 (in [4], left {0, 999}) gettimeofday({1314118820, 901582}, NULL) = 0 gettimeofday({1314118820, 901614}, NULL) = 0 select(1024, [4 5], [], NULL, {0, 1000}) = 1 (in [4], left {0, 999}) gettimeofday({1314118820, 901794}, NULL) = 0 gettimeofday({1314118820, 901827}, NULL) = 0 select(1024, [4 5], [], NULL, {0, 1000}) = 1 (in [4], left {0, 999}) gettimeofday({1314118820, 902007}, NULL) = 0 select(1024, [4], [], NULL, {0, 0}) = 1 (in [4], left {0, 0}) shutdown(5, 0 /* receive */) = 0 shutdown(5, 1 /* send */) = 0 close(5) = 0 select(5, [4], NULL, NULL, {0, 0}) = 1 (in [4], left {0, 0}) accept(4, 0, NULL) = 5
Changed 13 years ago by
Attachment: | scheduler-debug.log.bz2 added |
---|
Debugging output from the scheduler
comment:4 Changed 13 years ago by
I've attached a file containing debug messages from the scheduler. I used the chicken-core code with the HEAD set to 49b92ba0bfb7ecfceb49a45f64e9943ea83a6d94.
The data in the debug log represents one successful response followed by a response that took 60s eating CPU (using the example code showed above). I've used firefox 3.5.16 to request data from the server (Debian Linux x86).
comment:5 Changed 13 years ago by
Cc: | sjamaan added |
---|
Can the problem be nailed down to this particular commit? If you revert it in the current HEAD, does it work correctly?
comment:6 Changed 13 years ago by
I'm rechecking commits around 49b92ba0bfb7ecfceb49a45f64e9943ea83a6d94 to observe the behavior of the test code. I got to 49b92ba0bfb7ecfceb49a45f64e9943ea83a6d94 via git bisect. It seems that now I'm observing a slightly different behavior with commits previous to 49b92ba0bfb7ecfceb49a45f64e9943ea83a6d94, but still not the behavior I'd expect.
The HEAD code from yesterday shows the same behavior as 49b92ba0bfb7ecfceb49a45f64e9943ea83a6d94's (hangs and high CPU usage).
I could not build the bootstrap compiler for one commit (5defd64d28e29060e2b4fbcf9418d6a4e41539c8 -- the one right before 49b92ba0bfb7ecfceb49a45f64e9943ea83a6d94). I used 4.7.2 to build chicken, but then I got an error while building the regex egg, IIRC.
I tried 98259ae99912d3c7883cb32bfa2e51252ec02e4e and it initially seems to work, but it hangs sometimes (less frequently than 49b92ba0bfb7ecfceb49a45f64e9943ea83a6d94 and doesn't consume a lot CPU, but more frequently than 4.7.0). The same for 8f1cabaabadf89361ce3b594903cbd2c43c3deec.
45c040f0a352c6f4b4431788aef1d0357c6ede6f (the one right after 49b92ba0bfb7ecfceb49a45f64e9943ea83a6d94) hangs consuming CPU.
To request data from the server, I just reload the page on Firefox.
So, summarizing (commits ordered according to git log order):
- 45c040f0a352c6f4b4431788aef1d0357c6ede6f: Hangs frequently and consumes CPU
- 49b92ba0bfb7ecfceb49a45f64e9943ea83a6d94: Hangs frequently and consumes CPU
- 5defd64d28e29060e2b4fbcf9418d6a4e41539c8: could not test (build errors)
- 98259ae99912d3c7883cb32bfa2e51252ec02e4e: Hangs sometimes, but doesn't consume CPU
- 8f1cabaabadf89361ce3b594903cbd2c43c3deec: Hangs sometimes, but doesn't consume CPU
comment:7 Changed 13 years ago by
If I revert 49b92ba0bfb7ecfceb49a45f64e9943ea83a6d94 I get the same behavior as 98259ae99912d3c7883cb32bfa2e51252ec02e4e's and 8f1cabaabadf89361ce3b594903cbd2c43c3deec's (hang sometimes, but don't consume CPU).
comment:8 Changed 13 years ago by
Priority: | major → critical |
---|
comment:9 Changed 13 years ago by
Owner: | felix winkelmann deleted |
---|---|
Status: | new → assigned |
comment:10 Changed 13 years ago by
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
The fix is so trivial, it hurts:
diff --git a/scheduler.scm b/scheduler.scm index 08c8a5c..02a8fad 100644 --- a/scheduler.scm +++ b/scheduler.scm @@ -436,7 +436,7 @@ EOF [inf (##core#inline "C_fd_test_input" fd)] [outf (##core#inline "C_fd_test_output" fd)]) (dbg "fd " fd " state: input=" inf ", output=" outf) - (if (or inf outf errf) + (if (or inf outf) (let loop2 ((threads (cdr a)) (keep '())) (if (null? threads) (if (null? keep)
Since unbound variable checks are disabled in this code, the reference would result in thh test being always true.
I will apply this directly to master, there is not much reason to review it.
I have absolutely no idea. The diff looks fine to me.