Bug #18464 (Closed)
RUBY_INTERNAL_EVENT_NEWOBJ tracepoint causes an interpreter crash when combined with Ractors
Description
When a Ractor is created whilst a tracepoint for RUBY_INTERNAL_EVENT_NEWOBJ is active (registered with rb_tracepoint_new/rb_tracepoint_enable), the interpreter crashes with a null pointer dereference and the following backtrace:
[BUG] Segmentation fault at 0x0000000000000000
ruby 3.1.0p0 (2021-12-25 revision fb4df44d16) [x86_64-darwin20]
...
-- C level backtrace information -------------------------------------------
/Users/ktsanaktsidis/Code/zendesk/ruby/ruby(rb_print_backtrace+0xf) [0x10a15fadd] vm_dump.c:759
/Users/ktsanaktsidis/Code/zendesk/ruby/ruby(rb_vm_bugreport) vm_dump.c:1045
/Users/ktsanaktsidis/Code/zendesk/ruby/ruby(rb_vm_bugreport) (null):0
/Users/ktsanaktsidis/Code/zendesk/ruby/ruby(bug_report_end+0x0) [0x109f96b81] error.c:820
/Users/ktsanaktsidis/Code/zendesk/ruby/ruby(rb_bug_for_fatal_signal) error.c:820
/Users/ktsanaktsidis/Code/zendesk/ruby/ruby(sigsegv+0x52) [0x10a0be3a2] signal.c:964
/usr/lib/system/libsystem_platform.dylib(_sigtramp+0x1d) [0x7fff20934d7d]
/Users/ktsanaktsidis/Code/zendesk/ruby/ruby(gc_event_hook_body+0x4) [0x109fb9d21] gc.c:2214
/Users/ktsanaktsidis/Code/zendesk/ruby/ruby(newobj_slowpath) gc.c:2486
/Users/ktsanaktsidis/Code/zendesk/ruby/ruby(newobj_slowpath_wb_unprotected) gc.c:2507
/Users/ktsanaktsidis/Code/zendesk/ruby/ruby(newobj_fill+0x0) [0x109fac92e] gc.c:2543
/Users/ktsanaktsidis/Code/zendesk/ruby/ruby(newobj_of0) gc.c:2553
/Users/ktsanaktsidis/Code/zendesk/ruby/ruby(newobj_of) gc.c:2552
/Users/ktsanaktsidis/Code/zendesk/ruby/ruby(rb_wb_unprotected_newobj_of) gc.c:2567
/Users/ktsanaktsidis/Code/zendesk/ruby/ruby(io_alloc+0x12) [0x109fd341c] io.c:1047
/Users/ktsanaktsidis/Code/zendesk/ruby/ruby(prep_io) io.c:8483
/Users/ktsanaktsidis/Code/zendesk/ruby/ruby(prep_stdio) io.c:8514
/Users/ktsanaktsidis/Code/zendesk/ruby/ruby(rb_io_prep_stdin) io.c:8532
/Users/ktsanaktsidis/Code/zendesk/ruby/ruby(thread_start_func_2+0xf7) [0x10a1058a7] thread.c:802
/Users/ktsanaktsidis/Code/zendesk/ruby/ruby(rb_native_cond_initialize+0x0) [0x10a1055fb] ./thread_pthread.c:1047
/Users/ktsanaktsidis/Code/zendesk/ruby/ruby(register_cached_thread_and_wait) ./thread_pthread.c:1099
/Users/ktsanaktsidis/Code/zendesk/ruby/ruby(thread_start_func_1) ./thread_pthread.c:1054
/usr/lib/system/libsystem_pthread.dylib(_pthread_start+0xe0) [0x7fff208ef8fc]
(full output is attached).
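This seems to be because the new Ractor sets up its stdio objects (rb_io_prep_stdin et al.), which in turn allocate Ruby objects, before rb_ec_initialize_vm_stack is called to set up the initial stack frame. The event hook fired by those allocations then looks at the execution context's current control frame, which does not exist yet, hence the access at address 0x0.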
I've attached a patch that works around this by not firing GC event hooks when there is no control frame on the execution context. The patch also includes a test that reproduces the issue using the objspace extension; creating a Ractor within an ObjectSpace.trace_object_allocations block is enough to trigger the crash. The patch seems to fix things, but if you folks prefer, I can also try swapping around the order of prep_stdio and rb_ec_initialize_vm_stack.
Updated by nobu (Nobuyoshi Nakada) almost 3 years ago
- Status changed from Open to Assigned
- Assignee set to ko1 (Koichi Sasada)
Updated by kjtsanaktsidis (KJ Tsanaktsidis) over 2 years ago
I just checked, and this is still an issue with 3.2.0-preview1. Is there any feedback on the patch I posted? Is there any other way you would suggest going about a solution? Thanks!
Updated by kjtsanaktsidis (KJ Tsanaktsidis) over 2 years ago
I opened a PR with this patch. I'm happy to try fixing it a different way, but this at least stops the crash. https://github.com/ruby/ruby/pull/5990
Updated by ivoanjo (Ivo Anjo) about 2 years ago
If it helps, here's a Linux-based backtrace:
-- C level backtrace information -------------------------------------------
/usr/local/lib/libruby.so.3.1(rb_print_backtrace+0x11) [0x7f75e6678aa8] vm_dump.c:759
/usr/local/lib/libruby.so.3.1(rb_vm_bugreport) vm_dump.c:1045
/usr/local/lib/libruby.so.3.1(rb_bug_for_fatal_signal+0xf0) [0x7f75e6477750] error.c:821
/usr/local/lib/libruby.so.3.1(sigsegv+0x49) [0x7f75e65ced19] signal.c:964
/lib/x86_64-linux-gnu/libpthread.so.0(__restore_rt+0x0) [0x7f75e636e140]
/usr/local/lib/libruby.so.3.1(gc_event_hook_body+0x1b) [0x7f75e649010b] gc.c:2217
/usr/local/lib/libruby.so.3.1(gc_enter+0x1f) [0x7f75e64a495f] gc.c:9194
/usr/local/lib/libruby.so.3.1(gc_enter) gc.c:9165
/usr/local/lib/libruby.so.3.1(gc_sweep_continue) gc.c:5743
/usr/local/lib/libruby.so.3.1(heap_prepare) gc.c:2193
/usr/local/lib/libruby.so.3.1(heap_next_freepage) gc.c:2388
/usr/local/lib/libruby.so.3.1(ractor_cache_slots) gc.c:2424
/usr/local/lib/libruby.so.3.1(newobj_slowpath) gc.c:2484
/usr/local/lib/libruby.so.3.1(newobj_slowpath_wb_unprotected) gc.c:2510
/usr/local/lib/libruby.so.3.1(newobj_fill+0x0) [0x7f75e64a4cf9] gc.c:2546
/usr/local/lib/libruby.so.3.1(newobj_of) gc.c:2556
/usr/local/lib/libruby.so.3.1(rb_wb_unprotected_newobj_of) gc.c:2570
/usr/local/lib/libruby.so.3.1(io_alloc+0x5) [0x7f75e64ccbca] io.c:1047
/usr/local/lib/libruby.so.3.1(prep_io) io.c:8479
/usr/local/lib/libruby.so.3.1(prep_stdio) io.c:8510
/usr/local/lib/libruby.so.3.1(rb_io_prep_stdin) io.c:8528
/usr/local/lib/libruby.so.3.1(thread_start_func_2+0x165) [0x7f75e6619965] thread.c:802
/usr/local/lib/libruby.so.3.1(register_cached_thread_and_wait+0x0) [0x7f75e661a6f9] thread_pthread.c:1047
/usr/local/lib/libruby.so.3.1(thread_start_func_1) thread_pthread.c:1054
/lib/x86_64-linux-gnu/libpthread.so.0(0x8ea7) [0x7f75e6362ea7]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f) [0x7f75e607fdef]
Updated by ivoanjo (Ivo Anjo) about 2 years ago
Interestingly, my crash happened on RUBY_INTERNAL_EVENT_GC_ENTER (you can see my stack includes an attempt to garbage collect), but I believe the fix would work for this situation as well.
Updated by ivoanjo (Ivo Anjo) over 1 year ago
The PR to fix this has been merged ( https://github.com/ruby/ruby/pull/5990 ).
Would it be possible for the fix to be backported to 3.0/3.1/3.2? There are a few features in the ddtrace gem that can trigger this crash and that we've had to disable for these Rubies.
Updated by byroot (Jean Boussier) over 1 year ago
- Backport changed from 2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN to 2.6: DONTNEED, 2.7: DONTNEED, 3.0: REQUIRED, 3.1: REQUIRED, 3.2: REQUIRED
The fix was merged as 7bd7aee02e303de27d2cddfc5ef47e612d6782cb
Updated by byroot (Jean Boussier) over 1 year ago
- Status changed from Assigned to Closed
Updated by nagachika (Tomoyuki Chikanaga) over 1 year ago
- Backport changed from 2.6: DONTNEED, 2.7: DONTNEED, 3.0: REQUIRED, 3.1: REQUIRED, 3.2: REQUIRED to 2.6: DONTNEED, 2.7: DONTNEED, 3.0: REQUIRED, 3.1: DONE, 3.2: REQUIRED
ruby_3_1 bdbe6053853c11ffe9b8737eb4da50ed84c9dbd6 merged revision(s) 7bd7aee02e303de27d2cddfc5ef47e612d6782cb.
Updated by ivoanjo (Ivo Anjo) over 1 year ago
Thank you @nagachika (Tomoyuki Chikanaga)! :) :) :)
Updated by nagachika (Tomoyuki Chikanaga) over 1 year ago
- Backport changed from 2.6: DONTNEED, 2.7: DONTNEED, 3.0: REQUIRED, 3.1: DONE, 3.2: REQUIRED to 2.6: DONTNEED, 2.7: DONTNEED, 3.0: REQUIRED, 3.1: DONE, 3.2: DONE
ruby_3_2 b422c3523c419b88c6da23a4022ae8864f411b84 merged revision(s) 7bd7aee02e303de27d2cddfc5ef47e612d6782cb.
Updated by ivoanjo (Ivo Anjo) over 1 year ago
Thanks again @nagachika (Tomoyuki Chikanaga)!
Can I bother you with a backport to 3.0 as well? I know that one is getting "long in the tooth" in terms of support, but having it fixed would mean this crash could not happen on any of the Ruby releases that support Ractors (3.0/3.1/3.2/...), which would make our usage of tracepoints in the ddtrace gem simpler :)
Updated by jeremyevans0 (Jeremy Evans) over 1 year ago
ivoanjo (Ivo Anjo) wrote in #note-12:
Can I bother you with a backport to 3.0 as well?
Ruby 3.0 is in security maintenance mode, and this does not appear to be a security issue: https://www.ruby-lang.org/en/downloads/branches/
Updated by ivoanjo (Ivo Anjo) over 1 year ago
Aaaahh. It's a shame, but I can understand 😓 . Thanks for the clarification :)