Opened 8 years ago

Closed 8 years ago

#1045 closed defect (fixed)

[panic] out of memory - heap full while resizing - execution terminated (awful-picman)

Reported by: Mario Domenech Goulart
Owned by:
Priority: critical
Milestone: 4.9.0
Component: unknown
Version: 4.8.x
Keywords: awful-picman, out of memory error, heap full
Cc: andyjpb@…
Estimated difficulty:

Description (last modified by Mario Domenech Goulart)

I frequently get [panic] out of memory - heap full while resizing - execution terminated errors when running awful-picman (a pictures manager).

I don't have a simple test case, and I can't deterministically reproduce the problem, but it's not very difficult to trigger it.

Here are the steps to install awful-picman, run it and set up a test case:

git clone https://github.com/mario-goulart/awful-picman.git
cd awful-picman
git checkout 45a0a1e5f7245b0ea17b163746362aca87e77d95
chicken-install
mkdir -p test/pics
cd test/pics
wget http://parenteses.org/mario/misc/DSC00065.jpg
for i in `seq 100`; do cp DSC00065.jpg $i.jpg; done
cd ..
awful-picman --init

(--init is only necessary for the first run.)

Then use your browser to request http://localhost:8080/folders/pics . Wait until thumbnail generation is finished, then keep reloading the page that shows the pictures. After a couple of requests awful-picman will crash with [panic] out of memory - heap full while resizing - execution terminated.
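
The reload step could also be scripted instead of done by hand in a browser. The following is only a minimal sketch, not part of the original report: it assumes curl is installed, and the function name reload_page, the FETCH variable and the request count are hypothetical. Note that a browser also fetches the thumbnails referenced by the page, which this loop does not do, so it may not trigger the crash as readily.

```shell
#!/bin/sh
# Hypothetical helper: request the same URL repeatedly and stop at the
# first failed request (for example, once the server has crashed).
# FETCH is an assumption that defaults to curl; override it to use another client.
FETCH="${FETCH:-curl -s -f -o /dev/null}"

reload_page() {
    url=$1
    count=$2
    i=1
    while [ "$i" -le "$count" ]; do
        # $FETCH is intentionally unquoted so its options are split into words.
        if ! $FETCH "$url"; then
            echo "request $i failed"
            return 1
        fi
        i=$((i + 1))
    done
    echo "all $count requests succeeded"
}

# Example: reload_page http://localhost:8080/folders/pics 50
```

Stopping at the first failed request gives a rough idea of how many reloads it took before the server went away.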

Sometimes it crashes when the heap is growing, sometimes when the heap is shrinking. The call trace is usually different among crashes.

I can reproduce this problem with CHICKENs 4.7.0, 4.8.0, 4.8.0.3 and the 4.8.2 dev-snapshot tarball (Linux x86-64).

It crashes more frequently with 4.7.0 (usually on the first request).

Attachments (1)

awful-picman.log (22.4 KB) - added by Mario Domenech Goulart 8 years ago.
awful-picman crash log (with chicken 4.8.2 dev snapshot)


Change History (16)

comment:1 Changed 8 years ago by sjamaan

Milestone: someday → 4.9.0

I think it's very important we fix this before 4.9.0 is out

comment:2 Changed 8 years ago by Alaric Snell-Pym

I also have a reproduction case for this using the Ugarit unit-test suite, if it helps. Basically, Ugarit runs a child process to manage the actual storage operations and talks to it via s-expressions over standard input and output. The test suites run fine unless I run the same sequence of operations as the other tests but with an SHA hash rather than a Tiger one. This is a difference in the parent process, observable to the child process purely as different hash strings being sent to it (with a different length, I note). In that case the child gives the same panic error message, but only after it has been sent the command to terminate itself cleanly. More details available upon request!

Ugarit also suffers from a problem (in the parent process this time) where some srfi-4 function complains that what it is being passed isn't a u8vector, but printing the value just before the srfi-4 call shows that it is, which also suggests that memory is being corrupted somehow. Maybe related. C-Keen is able to reproduce this, but I've not seen it myself.

However, I do feel it is quite likely that Chicken has at least one underlying subtle memory corruption problem here!

Previous mailing-list discussions on this and similar mysteries that look like memory corruption (I don't have the message ID, sorry) resulted in the idea of Felix adding optional logging hooks to the lowest-level runtime operations relating to garbage collection, if somebody else would write tools to process the logs, checking various complex invariants to see if an inconsistency can be found.

comment:3 Changed 8 years ago by Mario Domenech Goulart

Description: modified (diff)

comment:4 Changed 8 years ago by sjamaan

While debugging YET ANOTHER problem (the one on Alaric's server causing a corrupted henrietta cache), I stumbled upon a weird error which may provide the clue for this one:

I used some code to fetch an HTTP resource in a loop, and because I wanted to cause it to behave badly, I tried random things like suspending it with ^Z and resuming it. During one of these attempts, after typing "fg", I got the following unexplained error:

181
182
183
184
185
186
187
^Z
zsh: suspended  csi -s test.scm
sjamaan@yves% fg
[1]  + continued  csi -s test.scm
188
Error: Unknown protocol: "1"

        Call history:

        http-client.scm:164: close-input-port     
        http-client.scm:165: close-output-port    
        http-client.scm:166: connections          
        http-client.scm:166: hash-table-delete!   
        http-client.scm:121: open-output-string   
        http-client.scm:121: uri-common#uri-host          
        http-client.scm:121: write        
        write-char/port   
        http-client.scm:121: uri-common#uri-port          
        http-client.scm:121: write        
        http-client.scm:121: get-output-string    
        http-client.scm:117: uri-common#uri-port          
        http-client.scm:117: uri-common#uri-port          
        http-client.scm:118: uri-common#uri-host          
        http-client.scm:118: uri-common#uri-host          
        http-client.scm:566: raise              <--

The program itself doesn't contain bugs related to the URI construction (the URI is a hardcoded string constant), and is as follows:

(use http-client extras)

(do ((i 0 (add1 i)))
    ((> i 1000))
  (with-output-to-file (->string i)
    (lambda ()
      (fprintf (current-error-port) "~A~%" i) ; Show progress
      (pp (with-input-from-request
          "http://code.call-cc.org/release-info?egg=netstring"
          #f
          read-file)))))

This leads me to believe that the error may be related to signal handling. Earlier, I had noticed that both Ugarit and awful-picman spawn children, so they'll be receiving SIGCHLD a lot.

comment:5 Changed 8 years ago by sjamaan

BTW: I ran the test program with Chicken 4.8.0.3

comment:6 Changed 8 years ago by Mario Domenech Goulart

awful-picman only forks when generating thumbnails. The crashes I observe are not during thumbnails generation.

comment:7 Changed 8 years ago by sjamaan

I did a few more tests, and awful-picman may be receiving lots of SIGPIPEs if you keep refreshing fast enough. Not sure if that's the case though (if it isn't handling the signal, it may not be interfering with the code).

comment:8 Changed 8 years ago by Mario Domenech Goulart

comment:9 Changed 8 years ago by sjamaan

Priority: major → critical

Here is another test program that triggers an error that probably has the same root cause:

(use http-client)
(with-input-from-request "http://www.netbsd.org/" #f void)

This will either hang or segfault or print "heap full while resizing". I'm raising the priority to critical, considering so many different programs will cause this issue.

comment:10 Changed 8 years ago by sjamaan

Forgot to click the preview button again :S

comment:11 Changed 8 years ago by andyjpb

Cc: andyjpb@… added

comment:12 Changed 8 years ago by sjamaan

Status update: The panics from comment 8 and 9 have been fixed in 4.8.0.5.

I have a feeling the Ugarit bug has also been fixed, but haven't been able to verify. The problem in the looping http-client program is not reproducible so easily, so I'm unsure about that one.

comment:13 Changed 8 years ago by Mario Domenech Goulart

I can also reproduce the out of memory errors with CHICKEN and eggs built with clang 3.0 (Linux x86-64).

Changed 8 years ago by Mario Domenech Goulart

Attachment: awful-picman.log added

awful-picman crash log (with chicken 4.8.2 dev snapshot)

comment:14 Changed 8 years ago by Mario Domenech Goulart

As suggested by Peter, I'm attaching a log (awful-picman.log) I obtained by running "awful-picman -:D", for future reference. CHICKEN 4.8.2.

comment:15 Changed 8 years ago by Mario Domenech Goulart

Resolution: fixed
Status: new → closed

As discovered by Felix, it turned out to be an issue with sql-de-lite, which has been fixed by Jim in sql-de-lite 0.6.2
