Opened 7 years ago

Closed 6 years ago

#892 closed defect (fixed)

Segfault on insanely long lists

Reported by: sjamaan
Owned by:
Priority: critical
Milestone: 4.9.0
Component: unknown
Version: 4.7.x
Keywords: segfaults are fun
Cc:
Estimated difficulty:

Description

This silly invocation fails, hard (after eating up a lot of memory):

csi -e "(use srfi-1) (let lp ((l (iota 10000000))) (lp (reverse l)))"

It gives a segfault when it's printing something like this:

Error: (reverse) bad argument type - not a proper list: (0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 . 70093846267453)

Other errors I've seen are:

Error: (reverse) bad argument type - not a proper list: (0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 . #<invalid forwarded object>)

and (but this is when I tried it with 4.7.0):

Error: (reverse) bad argument type - not a proper list: (0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 #(##sys#fnord ##sys#fnord ##sys#fnord ##sys#fnord ##sys#fnord 143

Change History (6)

comment:1 Changed 7 years ago by felix winkelmann

Milestone: 4.8.0 → 4.9.0

comment:2 Changed 7 years ago by felix winkelmann

Priority: major → critical

comment:3 Changed 6 years ago by sjamaan

Additional data: I verified (clumsily, by watching top(1) output while the program ran) that the crash happens at the instant the maximum heap size is exceeded. After increasing the maximum heap size from 2G to 4G (via -:hm4G), the crash happened when the new maximum was exceeded (which takes less than twice the time it takes to fill a 2G heap!).

So something goes terribly wrong when the heap is full. Also, why isn't the garbage left behind by the loop cleaned up?

comment:4 Changed 6 years ago by sjamaan

With a 6G heap the program seems to run stably (and consumes all memory). It really seems to need this much memory!

A rough calculation seems to indicate that 10,000,000 fixnums require about 77MiB on a 64-bit machine. With the additional cons cells for the list that's three times as much, about 229MiB. With "reverse" running, this will need twice as much again, let's say it needs 500MiB to keep the two lists around. When compiled, the same program does indeed take up as much (well, 1G but that's probably due to the buddy algorithm resizing the heap to double its size every time it runs the GC when more memory is needed).
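The arithmetic behind that estimate can be reproduced with a few lines. This is only a back-of-the-envelope sketch: the 8-byte word size and the "three times as much" factor for the cons cells are taken from the estimate above, and the variable names are made up for illustration.

```python
# Back-of-the-envelope estimate; the word size and the factor of three for
# the cons cells are assumptions taken from the comment above, not measurements.
WORD_BYTES = 8            # one machine word on a 64-bit system
N = 10_000_000            # list length from the reproduction command
MIB = 1024 * 1024

fixnum_mib = N * WORD_BYTES / MIB   # one word per element
list_mib = 3 * fixnum_mib           # "three times as much" with the cons cells
two_lists_mib = 2 * list_mib        # input list plus the reversed copy

print(f"fixnums only: {fixnum_mib:.1f} MiB")
print(f"one list:     {list_mib:.1f} MiB")
print(f"two lists:    {two_lists_mib:.1f} MiB")
```

This comes out at roughly 76 MiB, 229 MiB and 458 MiB, which matches the "about 500MiB" figure for keeping both lists alive.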

Having an interpreted program take about 4 times as much is a little strange (and possibly a candidate for improvement), but the real bug is that it crashes hard when running out of memory.

comment:5 Changed 6 years ago by sjamaan

This turns out to be caused by a GC that resizes the heap to the maximum heap size, with the copied data still fitting in the new heap. Then, when the heap needs to be "resized" again, it is already at the maximum size, so C_rereclaim2 simply returns instead of finishing its job. This causes massive breakage.

The proper fix seems to be to keep going if the heap can't be resized and let the normal "heap full while resizing" error handling get triggered. This is the same path that is taken when the heap gets resized up to the maximum size and the data doesn't fit. Patch(es) sent to chicken-hackers.
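The difference between the two behaviours can be illustrated with a toy model. This is a sketch under loud assumptions: ToyHeap, rereclaim_buggy, rereclaim_fixed and the sizes used are hypothetical and do not mirror the actual C_rereclaim2 code in the runtime; the model only shows why returning early at the maximum size leaves the caller in a bad state, while falling through reaches the regular error.

```python
# Toy model only: all names here are hypothetical and do not mirror the
# CHICKEN runtime. Sizes are abstract units, not bytes.

class HeapFull(Exception):
    """Stands in for the regular "heap full while resizing" error."""


class ToyHeap:
    def __init__(self, size, max_size):
        self.size = size          # current capacity
        self.max_size = max_size  # hard upper limit (cf. -:hm)
        self.live = 0             # amount of data currently stored
        self.consistent = True    # False once a request was silently dropped


def _resize_and_copy(heap, needed):
    # Grow (at most up to the limit) and fail loudly if the data does not fit.
    new_size = min(max(heap.size * 2, heap.live + needed), heap.max_size)
    if heap.live + needed > new_size:
        raise HeapFull("heap full while resizing")
    heap.size = new_size
    heap.live += needed


def rereclaim_buggy(heap, needed):
    if heap.size >= heap.max_size:
        # Early return: the caller assumes room was made, but nothing happened.
        heap.consistent = False
        return
    _resize_and_copy(heap, needed)


def rereclaim_fixed(heap, needed):
    # Always take the regular path, so the usual error handling can trigger.
    _resize_and_copy(heap, needed)


def run(rereclaim):
    heap = ToyHeap(size=4, max_size=8)
    rereclaim(heap, 6)        # first resize: grows to the maximum, data fits
    try:
        rereclaim(heap, 4)    # second resize: already at the maximum
    except HeapFull:
        return "clean error"
    return "consistent" if heap.consistent else "silent corruption"


print(run(rereclaim_buggy))
print(run(rereclaim_fixed))
```

In this model the early-return variant ends in "silent corruption", while the fall-through variant ends in "clean error", which corresponds to the out-of-memory error one would expect instead of a segfault.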

comment:6 Changed 6 years ago by Moritz Heidkamp

Resolution: fixed
Status: new → closed

Fixed in 8bb3b4c73399cbf79311d7222b7ba2668861e32c
