Opened 7 years ago

Closed 6 years ago

#910 closed defect (worksforme)

Failure with many arguments on amd64

Reported by: sjamaan Owned by: felix winkelmann
Priority: critical Milestone: 4.9.0
Component: core libraries Version: 4.8.x
Keywords: Cc: Christian Kellermann, Ivan Raikov
Estimated difficulty:

Description (last modified by sjamaan)

The following simple program, when compiled crashes on AMD64:

(define (make-me-a-list . args) args)

(print (make-me-a-list 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150))

The errors vary from bus errors to illegal instruction to segmentation fault. There's no C_do_apply call to be found in the code, so perhaps it's "just" a gcc bug.

If you think this is bad enough to warrant a fix before 4.8.0 is out, feel free to reset the milestone.

Attachments (1)

t.scm (28.3 KB) - added by Christian Kellermann 7 years ago.
My script to crash it…

Download all attachments as: .zip

Change History (11)

comment:1 Changed 7 years ago by sjamaan

This was on NetBSD/amd64 5.1, with GCC 4.1.3 20080704 (the one shipped with NetBSD 5.1.2) and both Chicken 4.7.0 and today's master, 4.7.5 (rev e061038)

It didn't crash on Linux/i386, and Christian tested on Linux/amd64 where it also didn't crash, which means it's likely a NetBSD-only bug or a bug in this particular GCC.

comment:2 Changed 7 years ago by sjamaan

Description: modified (diff)

comment:3 Changed 7 years ago by felix winkelmann

Cc: Christian Kellermann added
Priority: majorcritical

Can someone run this with valgrind on a 64-bit box?

Changed 7 years ago by Christian Kellermann

Attachment: t.scm added

My script to crash it...

comment:4 Changed 7 years ago by Christian Kellermann

My impressions so far:

It does crash, but I need a whole load of numbers to get it that far, see the attached script.
When I run it with valgrind, it does *not* crash however. Also valgrind does not complain about leakage:

==20622== 
==20622== HEAP SUMMARY:
==20622==     in use at exit: 1,192,896 bytes in 1,696 blocks
==20622==   total heap usage: 1,705 allocs, 9 frees, 1,219,644 bytes allocated
==20622== 
==20622== LEAK SUMMARY:
==20622==    definitely lost: 0 bytes in 0 blocks
==20622==    indirectly lost: 0 bytes in 0 blocks
==20622==      possibly lost: 0 bytes in 0 blocks
==20622==    still reachable: 1,192,896 bytes in 1,696 blocks
==20622==         suppressed: 0 bytes in 0 blocks
==20622== Rerun with --leak-check=full to see details of leaked memory
==20622== 

Looks good to me...

This has been chicken Version 4.7.5 (rev e061038)
linux-unix-gnu-x86-64 [ 64bit manyargs dload ptables ]

comment:5 Changed 7 years ago by felix winkelmann

Cc: Ivan Raikov added
Owner: set to felix winkelmann
Status: newassigned

I think this is simply a temporary-stack overflow (the stack that holds the arguments, used in C_do_apply). I can't remember right now if this is checked, but will try to come up with a patch that adds some checks (just for testing).

We should probably try to figure this out before we do a release.

comment:6 Changed 7 years ago by sjamaan

I'm not sure it's C_do_apply that's the problem; there is no call to it in the generated code. It contains a C_proc152 function datatype, which when called apparently causes a stack overflow (at least when single-stepping through it with gdb it crashes upon doing that call).

It looks like C_stack_probe isn't checking the stack size correctly. However, modifying the compiled program manually by adding some more to the C_stack_probe value doesn't seem to help fix it.

comment:7 Changed 7 years ago by felix winkelmann

No, C_do_apply is not involved. But C_restore_rest would overflow the temporary stack. But the crash happens earlier - already when doing the call (or directly when entering the procedure). I get (like you) all sorts of weird segfaults and illegal instruction errors. This looks the call is already exceeding the allowed parameter limit (AFAIK, the C standard just allows a handful, passing a very large number of arguments to a C function may result in undefined behaviour), particularly because it calls a vararg function. This seems simply to be a limitation of the x86-64 ABI.

I see no fix besides introducing a static limit, in other words: give a warning or error when compiling a call with more than (say) 1024 arguments.

I also think this is not relevant to 4.8.0, as there isn't much that can be done about it.

comment:8 in reply to:  7 Changed 7 years ago by sjamaan

Replying to felix:

This looks the call is already exceeding the allowed parameter limit (AFAIK, the C standard just allows a handful, passing a very large number of arguments to a C function may result in undefined behaviour), particularly because it calls a vararg function. This seems simply to be a limitation of the x86-64 ABI.

Hm, this is interesting. It looks like only 127 arguments are guaranteed by the C spec. However, a simple test program using the same GCC compiler gave me no error. Also, I think the C compiler should give a warning or error when trying to compile a call containing more arguments than the compiler (or its target platform) can handle. But then again, it's C....

I see no fix besides introducing a static limit, in other words: give a warning or error when compiling a call with more than (say) 1024 arguments.

I was thinking about possibly just compiling this to the equivalent of {{(apply make-me-a-list args)}}, so using do_apply instead of a direct call. This might get around any C compiler limits (but of course not ABI limitations).

However, I'm not convinced that's really the problem (since the test program just worked...). Limiting the number of arguments to Scheme procedures doesn't sound very appealing either.

I also think this is not relevant to 4.8.0, as there isn't much that can be done about it.

I agree, let's just get on with this release.

comment:9 Changed 6 years ago by sjamaan

I had another look at it and the 6000 arguments in C-Keen's example broke sometimes for me. I've been having varying results. Sometimes the program just works, sometimes not. For example, at optimization levels -O3 and higher for CHICKEN it works, and with lower optimization levels it will crash, but I can get it to work if I compile the resulting C file with higher optimization levels for GCC.

If the program segfaults normally, sometimes it doesn't segfault when run under gdb. It's also not the stack probe, because I can get it to force a GC by manipulating the generated C code, and it will still break. Perhaps Christian can have another look w/ Valgrind based on these findings? Perhaps compile the generated C manually with no GCC optimizations?

comment:10 Changed 6 years ago by sjamaan

Resolution: worksforme
Status: assignedclosed

After diving in deeper, all signs point towards a GCC bug. When the function is entered, the assembly prologue generated by GCC jumps to an address which gets %al added to it (?). According to the ABI documentation, %al holds the number of saved SSE registers. However, the value of %al is completely out of whack and has largish negative values (it should be a number between 0 and 8).

As Christian and Mario did not run into any crashes with GCC 4.7, I tried upgrading as well (I was running 4.5). With GCC 4.8, everything worked just fine!

Finally, Christian and Mario did report running into crashes with larger argument counts, but that's almost definitely due to the temporary stack overflowing, like Felix theorised above. I think adding an extra check there won't hurt, but at least this ticket itself is bogus.

Note: See TracTickets for help on using tickets.