Opened 11 years ago
Closed 11 years ago
#1045 closed defect (fixed)
[panic] out of memory - heap full while resizing - execution terminated (awful-picman)
Reported by: | Mario Domenech Goulart | Owned by: | |
---|---|---|---|
Priority: | critical | Milestone: | 4.9.0 |
Component: | unknown | Version: | 4.8.x |
Keywords: | awful-picman, out of memory error, heap full | Cc: | andyjpb@… |
Estimated difficulty: |
Description (last modified by )
I frequently get [panic] out of memory - heap full while resizing - execution terminated
errors when running awful-picman (a pictures manager).
I don't have a simple test case, and I can't deterministically reproduce the problem, but it's not very difficult to trigger it.
Here are the steps to install awful-picman, run it and set up a test case:
    git clone https://github.com/mario-goulart/awful-picman.git
    cd awful-picman
    git checkout 45a0a1e5f7245b0ea17b163746362aca87e77d95
    chicken-install
    mkdir -p test/pics
    cd test/pics
    wget http://parenteses.org/mario/misc/DSC00065.jpg
    for i in `seq 100`; do cp DSC00065.jpg $i.jpg; done
    cd ..
    awful-picman --init
(--init is only necessary for the first run.)
Then use your browser to request http://localhost:8080/folders/pics. Wait until thumbnail generation has finished, then keep reloading the page that shows the pictures. After a couple of requests awful-picman will crash with [panic] out of memory - heap full while resizing - execution terminated.
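The reload step can be scripted instead of refreshing the browser by hand. This is a minimal sketch, assuming curl is installed and the server is listening on localhost:8080 as in the steps above. The function name `reload_page` and the `FETCH` override are made up for illustration and are not part of awful-picman. curl fetches only the page itself, not the thumbnails a browser would also request, so it may stress the server less than a real reload.

```shell
#!/bin/sh
# Hypothetical helper: request URL repeatedly and stop at the first failure.
# FETCH can be overridden (e.g. for testing); the default discards the body.
FETCH=${FETCH:-"curl -s -f -o /dev/null"}

reload_page() {
    url=$1
    count=${2:-50}
    i=1
    while [ "$i" -le "$count" ]; do
        # A failed request usually means the server crashed or is unreachable.
        if ! $FETCH "$url"; then
            echo "request $i failed"
            return 1
        fi
        i=$((i + 1))
    done
    echo "completed $count requests"
}

# Usage (assumes awful-picman is running as described above):
#   reload_page http://localhost:8080/folders/pics 100
```

If the loop reports a failed request, the terminal running awful-picman should show whether the panic message was printed.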
Sometimes it crashes when the heap is growing, sometimes when the heap is shrinking. The call trace is usually different among crashes.
I can reproduce this problem with CHICKENs 4.7.0, 4.8.0, 4.8.0.3 and the 4.8.2 dev-snapshot tarball (Linux x86-64).
It crashes more frequently with 4.7.0 (usually on the first request).
Attachments (1)
Change History (16)
comment:1 Changed 11 years ago by
Milestone: | someday → 4.9.0 |
---|
comment:2 Changed 11 years ago by
I have a reproduction case for this using the Ugarit unit-test suite, too, if it helps. Basically, Ugarit runs a child process to manage the actual storage operations and talks to it via s-expressions over standard input and output. The test suites run fine unless I run the same sequence of operations as the other tests but with an SHA hash rather than a Tiger one. That difference lies in the parent process; the child can observe it only as different hash strings being sent (with a different length, I note). In that case the child gives the same panic error message, but only after it has been sent the command to terminate itself cleanly. More details available upon request!
Ugarit also suffers a problem (in the parent process this time) where some srfi-4 function complains that what it's being passed isn't a u8vector, but printing it out exactly before the srfi-4 call shows that it is, which also suggests that memory is being corrupted somehow. Maybe related. C-Keen is able to reproduce this, but I've not seen it myself.
However, I do feel it is quite likely that Chicken has at least one underlying subtle memory corruption problem here!
Previous mailing-list discussions on this and similar mysteries that look like memory corruption (I don't have the message ID, sorry) resulted in the idea of Felix adding optional logging hooks to the lowest-level runtime operations relating to garbage collection, if somebody else would write tools to process the logs, checking various complex invariants to see if an inconsistency can be found.
comment:3 Changed 11 years ago by
Description: | modified (diff) |
---|
comment:4 Changed 11 years ago by
While debugging YET ANOTHER problem (the one on Alaric's server causing a corrupted henrietta cache), I stumbled upon a weird error which may provide the clue for this one:
I used some code to fetch an HTTP resource in a loop, and because I wanted to cause it to behave badly, I tried random things like interrupting it with ^Z and resuming it. During one of these attempts, after typing "fg", I got the following unexplained error:
    181
    182
    183
    184
    185
    186
    187
    ^Z
    zsh: suspended csi -s test.scm
    sjamaan@yves% fg
    [1] + continued csi -s test.scm
    188
    Error: Unknown protocol: "1"

    Call history:

    http-client.scm:164: close-input-port
    http-client.scm:165: close-output-port
    http-client.scm:166: connections
    http-client.scm:166: hash-table-delete!
    http-client.scm:121: open-output-string
    http-client.scm:121: uri-common#uri-host
    http-client.scm:121: write
    write-char/port
    http-client.scm:121: uri-common#uri-port
    http-client.scm:121: write
    http-client.scm:121: get-output-string
    http-client.scm:117: uri-common#uri-port
    http-client.scm:117: uri-common#uri-port
    http-client.scm:118: uri-common#uri-host
    http-client.scm:118: uri-common#uri-host
    http-client.scm:566: raise <--
The program itself doesn't contain bugs related to the URI construction (the URI is a hardcoded string constant), and is as follows:
    (use http-client extras)

    (do ((i 0 (add1 i)))
        ((> i 1000))
      (with-output-to-file (->string i)
        (lambda ()
          (fprintf (current-error-port) "~A~%" i) ; Show progress
          (pp (with-input-from-request "http://code.call-cc.org/release-info?egg=netstring" #f read-file)))))
This leads me to believe that possibly the error may somehow be related to signal handling. Earlier, I had noticed that both Ugarit and Awful-picman spawn children, so they'll be receiving SIGCHLD a lot.
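To chase the signal-handling hypothesis without typing ^Z and fg by hand, the suspend/resume cycle could be scripted. This is a rough sketch, assuming a POSIX shell; `suspend_resume` is a made-up helper name, and test.scm refers to the program shown above. It sends SIGTSTP (the signal ^Z generates) followed by SIGCONT. If SIGTSTP has no effect in a given environment, SIGSTOP could be tried instead. Nothing in this ticket establishes that the sketch reproduces the error.

```shell
#!/bin/sh
# Hypothetical helper: repeatedly suspend and resume process PID,
# mimicking ^Z followed by fg. Stops early if the process has exited.
suspend_resume() {
    pid=$1
    rounds=${2:-10}
    n=0
    while [ "$n" -lt "$rounds" ] && kill -0 "$pid" 2>/dev/null; do
        kill -TSTP "$pid" 2>/dev/null || break   # same signal as ^Z
        sleep 1
        kill -CONT "$pid" 2>/dev/null || break   # resume, as fg would
        sleep 1
        n=$((n + 1))
    done
    echo "$n"   # number of completed suspend/resume cycles
}

# Usage:
#   csi -s test.scm &
#   suspend_resume $! 20
#   wait
```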
comment:6 Changed 11 years ago by
awful-picman only forks when generating thumbnails. The crashes I observe are not during thumbnails generation.
comment:7 Changed 11 years ago by
I did a few more tests, and awful-picman may be receiving lots of SIGPIPEs if you keep refreshing fast enough. I'm not sure that is the cause, though (if it isn't handling the signal, the signal may not be interfering with the code).
comment:8 Changed 11 years ago by
Maybe these two errors are related:
- http://tests.call-cc.org/master/linux/x86/2013/08/24/salmonella-report/test/channel.htmlz
- http://tests.call-cc.org/master/linux/x86/2013/08/24/salmonella-report/test/dummy-user.htmlz
Maybe they can be useful to produce smaller and more deterministic test cases.
comment:9 Changed 11 years ago by
Priority: | major → critical |
---|
And another test program that triggers an error which is probably the same root cause:
    (use http-client)
    (with-input-from-request "http://www.netbsd.org/" #f void)
This will either hang or segfault or print "heap full while resizing". I'm raising the priority to critical, considering so many different programs will cause this issue.
comment:11 Changed 11 years ago by
Cc: | andyjpb@… added |
---|
comment:12 Changed 11 years ago by
Status update: The panics from comments 8 and 9 have been fixed in 4.8.0.5.
I have a feeling the Ugarit bug has also been fixed, but haven't been able to verify. The problem in the looping http-client program is not reproducible so easily, so I'm unsure about that one.
comment:13 Changed 11 years ago by
I can also reproduce the out of memory errors with CHICKEN and eggs built with clang 3.0 (Linux x86-64).
Changed 11 years ago by
Attachment: | awful-picman.log added |
---|
awful-picman crash log (with chicken 4.8.2 dev snapshot)
comment:14 Changed 11 years ago by
As suggested by Peter, I'm attaching a log (awful-picman.log) I obtained by running "awful-picman -:D", for future reference. CHICKEN 4.8.2.
comment:15 Changed 11 years ago by
Resolution: | → fixed |
---|---|
Status: | new → closed |
As discovered by Felix, it turned out to be an issue with sql-de-lite, which has been fixed by Jim in sql-de-lite 0.6.2.
I think it's very important we fix this before 4.9.0 is out
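For anyone checking whether an installation already has the fixed egg, a version comparison along these lines may help. This is only a sketch: `version_at_least` is a made-up helper, it relies on `sort -V` from GNU coreutils, and the way the usage comment parses `chicken-status` output is an assumption about that tool's output format.

```shell
#!/bin/sh
# Hypothetical helper: succeed if version $1 is >= version $2.
# Relies on `sort -V` (version sort), available in GNU coreutils.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Usage (the sed pattern is an assumption about chicken-status output):
#   installed=$(chicken-status sql-de-lite | sed -n 's/.*version: *//p')
#   version_at_least "$installed" 0.6.2 || chicken-install sql-de-lite
```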