Bug #10626
Closed
BUS error from nesting lambdas and calls to methods defined with define_method
Description
I get a BUS error from executing the following Ruby program: https://gist.github.com/jaroslawr/8579678d7c68a49208f0
I am on Gentoo Linux with Ruby 2.1.5, and have also tried Ruby 2.1.4, 2.1.3, and so on down to 2.1.0. My colleagues … The problem seems to be rapid consumption of stack space: it goes away when the stack size limit is raised with ulimit -s. For the real-world context behind this, see the corresponding Rails issue I opened:
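Since raising the limit with ulimit -s makes the crash go away, it can help to confirm which limit a given Ruby process actually runs under. The sketch below is not part of the original report. It uses Process.getrlimit, which returns sizes in bytes, whereas ulimit -s in the shell shows kilobytes.

```ruby
# Read the soft and hard stack-size limits of the current process.
# These are the limits `ulimit -s` shows (the shell prints them in KiB).
soft, hard = Process.getrlimit(:STACK)

# Format a limit in KiB, treating RLIM_INFINITY as "unlimited".
def describe_limit(value)
  return "unlimited" if value == Process::RLIM_INFINITY
  "#{value / 1024} KiB"
end

puts "stack soft limit: #{describe_limit(soft)}"
puts "stack hard limit: #{describe_limit(hard)}"
```

Running it once with the default limit and again after `ulimit -s` is raised should show the change.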
Updated by jaroslawr (Jarosław Rzeszótko) almost 10 years ago
By the way, even stranger things happen if you replace the simple call to test with something like:
t1 = Thread.new { test }
t2 = Thread.new { test }
t1.join
t2.join
I then get the following error:
test.rb:6: [BUG] object allocation during garbage collection phase
For me, though, this part is a purely theoretical exercise.
Updated by nobu (Nobuyoshi Nakada) almost 10 years ago
- Is duplicate of Bug #10460: Segfault instead of stack level too deep added
Updated by jaroslawr (Jarosław Rzeszótko) almost 10 years ago
I have seen #10460 and would not say this is an obvious duplicate. Here the problem is not only that you get a crash instead of a "stack level too deep" error. The C stack also seems to grow abnormally quickly when you nest lambdas and method calls, compared with nesting plain method calls. If I run Ruby under GDB, the number of stack frames is not really that large: in other situations Ruby can easily handle stacks 3-4 times as deep in terms of the raw number of call frames, at both the Ruby and the C level.
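One way to make the "plain method calls versus lambdas plus define_method" comparison concrete is to count how many levels each pattern reaches before Ruby raises SystemStackError. This is a hedged sketch, not the reporter's test case. The class LambdaNester and the helper names are hypothetical. The counts depend on the Ruby version, the build and the stack limits, so no particular numbers are claimed. On a healthy Ruby both patterns should end in a clean SystemStackError. The report is about Ruby 2.1 crashing instead.

```ruby
# Counter shared by both recursion patterns.
$depth = 0

# Pattern 1: plain recursive method calls.
def plain_recursion
  $depth += 1
  plain_recursion
end

# Pattern 2: a define_method method that re-enters itself through a lambda.
class LambdaNester
  define_method(:nest) do
    $depth += 1
    -> { nest }.call
  end
end

# Run the block until the stack is exhausted and report how deep it got.
def max_depth
  $depth = 0
  yield
rescue SystemStackError
  $depth
end

plain  = max_depth { plain_recursion }
nested = max_depth { LambdaNester.new.nest }
puts "plain method calls:      #{plain}"
puts "define_method + lambda:  #{nested}"
```

A much smaller count for the second pattern would be consistent with each level consuming more stack, which is what the comment above suspects.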
Also, the symptoms (error message, backtrace, etc.) are different from what people report in #10460.
Updated by jaroslawr (Jarosław Rzeszótko) almost 10 years ago
Here is a stripped-down, easier-to-understand test case:
https://gist.github.com/anonymous/a2a784c9f37b1fc6b753
Basically, the bigger M is, the smaller N needs to be to trigger the crash. On my computer, nesting just 100 lambdas is enough to trigger a crash if a lot of memory is allocated at the same time.
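The gist itself is not reproduced here. The following is a hypothetical sketch of the general shape described above: a chain of nested lambdas where every level allocates before calling inward. Mapping `depth` to N and `allocations` to M is an assumption. The Array#map plus rand allocation mirrors the frames visible in the backtrace later in this thread (rb_ary_collect yielding to a block that calls rand), but it is not claimed to be the gist's code.

```ruby
# Hypothetical reconstruction: `depth` stands in for N (nesting level),
# `allocations` for M (allocation per level). Both mappings are assumptions.
def nested_lambdas(depth, allocations)
  innermost = -> { :done }
  # Wrap the innermost lambda `depth` times. Each wrapper allocates an Array
  # via Array#map + rand (one heap Float per element on builds without
  # flonums, such as 32-bit ones) and then calls the next lambda inward.
  chain = (1..depth).reduce(innermost) do |inner, _|
    -> do
      Array.new(allocations) { |i| i }.map { rand }
      inner.call
    end
  end
  chain.call
end

puts nested_lambdas(100, 1_000)
```

On an unaffected Ruby this simply returns :done. Raising `allocations` while keeping `depth` fixed is the experiment the comment describes.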
Updated by jaroslawr (Jarosław Rzeszótko) almost 10 years ago
Some more findings: you can run the above test case under gdb like this:
# gdb ruby
(gdb) set disable-randomization off
(gdb) run test.rb
The test program does not crash with randomization disabled in gdb, nor does it crash when run under valgrind. Where the program crashes varies from run to run; sometimes it does not crash at all. At the assembly level, it always crashes on this call:
call 0xb75169a0 <__x86.get_pc_thunk.bx>
which basically does this (http://gcc.gnu.org/ml/gcc-help/2010-12/msg00131.html):
movl (%esp), %e##reg;
And indeed, info registers shows, for example:
esp 0xbfc08fe0 0xbfc08fe0
And then:
(gdb) x 0xbfc08fe0
0xbfc08fe0: Cannot access memory at address 0xbfc08fe0
So the stack pointer is somehow broken. In this case the start of the stack is:
(gdb) proc stat
...
Start of stack: 0xbfc3cd90
Doing the math, the stack in total occupies:
(0xbfc3cd90 - 0xbfc08fe0) bytes / 1024 = 212400 / 1024 = about 207 kbytes
That is far below the default ulimit -s of 8192 kilobytes.
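On x86 the stack grows downward and addresses count bytes, so the distance between the start of the stack and the faulting %esp is already a byte count and needs no multiplication by a word size. A quick check of that arithmetic, using only the two addresses from the gdb session and the 8192 KiB default limit mentioned in the thread:

```ruby
# Values copied from the gdb session in this thread.
stack_start = 0xbfc3cd90 # "Start of stack" from (gdb) proc stat
esp         = 0xbfc08fe0 # stack pointer from info registers

# Addresses count bytes, and the stack grows downward from stack_start,
# so the difference is the number of bytes in use.
used_bytes = stack_start - esp
used_kib   = used_bytes / 1024.0
limit_kib  = 8192 # default `ulimit -s`, in KiB

puts "stack in use: #{used_bytes} bytes (about #{used_kib.round} KiB)"
puts "share of the limit: #{(used_kib / limit_kib * 100).round(1)}%"
```

This prints 212400 bytes (about 207 KiB), roughly 2.5% of the limit, which supports the point that the process is nowhere near the configured stack size when it faults.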
Updated by jaroslawr (Jarosław Rzeszótko) almost 10 years ago
... way lower than the default ulimit -s of 8192 kilobytes. Wish this bugtracker supported editing ^^
Updated by jaroslawr (Jarosław Rzeszótko) almost 10 years ago
Given that this is a BUS error, here is perhaps a particularly interesting backtrace you can get if you are "lucky":
#0 0xaf9e3187 in _int_memalign (av=av@entry=0xafb00420 <main_arena>, alignment=alignment@entry=16384, bytes=bytes@entry=16364) at malloc.c:4359
#1 0xaf9e42e1 in _mid_memalign (alignment=alignment@entry=16384, bytes=bytes@entry=16364, address=0xafcd6acc <heap_assign_page+188>) at malloc.c:3095
#2 0xaf9e5d6d in __posix_memalign (memptr=memptr@entry=0xbfacd0c0, alignment=alignment@entry=16384, size=size@entry=16364) at malloc.c:4980
#3 0xafcd6acc in aligned_malloc (size=16364, alignment=16384) at gc.c:5909
#4 heap_page_allocate (objspace=0xb1381e90) at gc.c:1035
#5 heap_page_create (objspace=0xb1381e90) at gc.c:1121
#6 heap_assign_page (objspace=0xb1381e90, heap=0xb1381e98) at gc.c:1143
#7 0xafcdafdf in heap_increment (heap=0xb1381e98, objspace=0xb1381e90) at gc.c:1191
#8 heap_prepare_freepage (heap=0xb1381e98, objspace=0xb1381e90) at gc.c:1212
#9 heap_get_freeobj_from_next_freepage (heap=0xb1381e98, objspace=0xb1381e90) at gc.c:1237
#10 heap_get_freeobj (heap=0xb1381e98, objspace=0xb1381e90) at gc.c:1259
#11 newobj_of (klass=klass@entry=2973259080, flags=flags@entry=36, v1=v1@entry=0, v2=v2@entry=0, v3=v3@entry=0) at gc.c:1303
#12 0xafcdb127 in rb_newobj_of (klass=2973259080, flags=flags@entry=36) at gc.c:1356
#13 0xafd1f813 in rb_float_new_in_heap (d=0.61034828073270286) at numeric.c:639
#14 0xafd5fbd0 in rb_float_new_inline (d=<optimized out>) at internal.h:591
#15 rb_f_rand (argc=0, argv=0xaf6efa70, obj=2973266420) at random.c:1212
#16 0xafe1da7e in call_cfunc_m1 (func=0xafd5fa80 <rb_f_rand>, recv=2973266420, argc=0, argv=0xaf6efa70) at vm_insnhelper.c:1317
#17 0xafe24491 in vm_call_cfunc_with_frame (th=0xb1381be0, reg_cfp=0xaf76c678, ci=0xb15bf7b0) at vm_insnhelper.c:1489
#18 0xafe2c5e7 in vm_exec_core (th=0xb1381be0, initial=initial@entry=0) at insns.def:1028
#19 0xafe328f7 in vm_exec (th=th@entry=0xb1381be0) at vm.c:1398
#20 0xafe25ab5 in invoke_block_from_c (th=<optimized out>, block=<optimized out>, self=2973266420, argc=argc@entry=1, argv=argv@entry=0xbfacd870,
blockptr=blockptr@entry=0x0, cref=cref@entry=0x0, defined_class=2973268920) at vm.c:817
#21 0xafe3b238 in vm_yield (argv=<optimized out>, argc=<optimized out>, th=<optimized out>) at vm.c:856
#22 rb_yield_0 (argv=<optimized out>, argc=<optimized out>) at vm_eval.c:938
#23 rb_yield (val=697) at vm_eval.c:948
#24 0xafc5c8db in rb_ary_collect (ary=2796886860) at array.c:2677
#25 0xafe1da9e in call_cfunc_0 (func=0xafc5c880 <rb_ary_collect>, recv=2796886860, argc=0, argv=0xaf6efa5c) at vm_insnhelper.c:1323
#26 0xafe24491 in vm_call_cfunc_with_frame (th=0xb1381be0, reg_cfp=0xaf76c6c8, ci=0xb15bfcf8) at vm_insnhelper.c:1489
#27 0xafe2cd8b in vm_exec_core (th=0xb1381be0, initial=initial@entry=0) at insns.def:999
#28 0xafe328f7 in vm_exec (th=th@entry=0xb1381be0) at vm.c:1398
#29 0xafe25ab5 in invoke_block_from_c (th=th@entry=0xb1381be0, block=block@entry=0xb15c9b68, self=2973266420, argc=0, argv=0xaf6efa48, blockptr=0x0, cref=cref@entry=0x0,
defined_class=2973268920) at vm.c:817
#30 0xafe26964 in vm_invoke_proc (th=0xb1381be0, proc=0xb15c9b68, self=2973266420, defined_class=2973268920, argc=0, argv=0xaf6efa48, blockptr=0x0) at vm.c:881
#31 0xafe26a1a in rb_vm_invoke_proc (th=<optimized out>, proc=<optimized out>, proc@entry=0xb15c9b68, argc=argc@entry=0, argv=argv@entry=0xaf6efa48, blockptr=0x0)
at vm.c:900
#32 0xafcc0dcd in proc_call (argc=0, argv=0xaf6efa48, procval=2975625740) at proc.c:713
#33 0xafe1da7e in call_cfunc_m1 (func=0xafcc0d70 <proc_call>, recv=2975625740, argc=0, argv=0xaf6efa48) at vm_insnhelper.c:1317
#34 0xafe24491 in vm_call_cfunc_with_frame (th=0xb1381be0, reg_cfp=0xaf76c718, ci=0xb15bfd38) at vm_insnhelper.c:1489
#35 0xafe2c5e7 in vm_exec_core (th=0xb1381be0, initial=initial@entry=0) at insns.def:1028
As in every other case, here too the stack pointer (%esp) points to an invalid address, for what it is worth.
Updated by jaroslawr (Jarosław Rzeszótko) almost 10 years ago
An even simpler test case for what appears to be the same problem:
https://gist.github.com/anonymous/a86f5eb0198acc10ae1e
This really is not just an unhandled stack overflow: if you decrease the number of allocations, the program runs just fine.
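Because the crash is intermittent and kills the interpreter, one practical way to test the "fewer allocations, no crash" observation is to run each variant in a child process and check whether it died from a signal. The harness below is a sketch under assumptions. The `repro` template is hypothetical and is not the code from the gists. Ruby's own [BUG] handler usually ends the process with abort(), so the parent may see SIGABRT rather than SIGBUS. The harness therefore only asks whether the child was killed by any signal.

```ruby
require "rbconfig"

# Run a Ruby snippet in a fresh interpreter so a crash cannot take down this
# process. Returns true if the child was terminated by a signal.
def crashes?(script)
  pid = Process.spawn(RbConfig.ruby, "-e", script,
                      out: File::NULL, err: File::NULL)
  _pid, status = Process.wait2(pid)
  status.signaled?
end

# Hypothetical repro template: `depth` nested lambdas, each allocating
# `allocations` values before calling inward. The shape is an assumption.
def repro(depth, allocations)
  <<~RUBY
    chain = (1..#{depth}).reduce(-> { :done }) do |inner, _|
      -> { Array.new(#{allocations}) { |i| i }.map { rand }; inner.call }
    end
    chain.call
  RUBY
end

# Keep the nesting depth fixed and vary only the allocation count.
[100, 1_000, 10_000].each do |allocations|
  puts "depth=100 allocations=#{allocations} crashed=#{crashes?(repro(100, allocations))}"
end
```

Because the failure varies from run to run, each combination would need several runs before concluding anything.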
Updated by jaroslawr (Jarosław Rzeszótko) almost 10 years ago
Could someone rename this issue to something that better reflects the actual problem? It seems to be a fairly general memory allocation bug that can cause many different code patterns to crash. I have also reproduced the same issue on Ruby 2.2.0-rc1.
Sorry for the large amount of somewhat disorganized writing. I have spent a huge amount of time debugging this issue, starting from a complex Rails app, and would very much like to find out what is at the bottom of it; the investigation is still ongoing.
Updated by jeremyevans0 (Jeremy Evans) over 5 years ago
- Status changed from Open to Feedback
I tried the last two gists with many Ruby versions (1.9-2.7) and could not produce a crash. These were compiled with clang 7 on OpenBSD. It's possible with a different compiler and compiler options the results would be different. Can you get your examples to crash with a currently supported version of Ruby?
Updated by jeremyevans0 (Jeremy Evans) about 5 years ago
- Status changed from Feedback to Closed