Bug #18464: RUBY_INTERNAL_EVENT_NEWOBJ tracepoint causes an interpreter crash when combined with Ractors

Added by kjtsanaktsidis (KJ Tsanaktsidis) over 3 years ago. Updated about 2 years ago.

Status: Closed
Assignee: ko1 (Koichi Sasada)
Target version:
ruby -v: ruby 3.1.0p0 (2021-12-25 revision fb4df44d16) [x86_64-darwin20]
Backport: 2.6: DONTNEED, 2.7: DONTNEED, 3.0: REQUIRED, 3.1: DONE, 3.2: DONE

[ruby-core:107005]

Description

When a Ractor is created whilst a tracepoint for RUBY_INTERNAL_EVENT_NEWOBJ is active (registered with rb_tracepoint_new/rb_tracepoint_enable), the interpreter crashes with a null pointer dereference and the following backtrace:

[BUG] Segmentation fault at 0x0000000000000000
ruby 3.1.0p0 (2021-12-25 revision fb4df44d16) [x86_64-darwin20]

...

-- C level backtrace information -------------------------------------------
/Users/ktsanaktsidis/Code/zendesk/ruby/ruby(rb_print_backtrace+0xf) [0x10a15fadd] vm_dump.c:759
/Users/ktsanaktsidis/Code/zendesk/ruby/ruby(rb_vm_bugreport) vm_dump.c:1045
/Users/ktsanaktsidis/Code/zendesk/ruby/ruby(rb_vm_bugreport) (null):0
/Users/ktsanaktsidis/Code/zendesk/ruby/ruby(bug_report_end+0x0) [0x109f96b81] error.c:820
/Users/ktsanaktsidis/Code/zendesk/ruby/ruby(rb_bug_for_fatal_signal) error.c:820
/Users/ktsanaktsidis/Code/zendesk/ruby/ruby(sigsegv+0x52) [0x10a0be3a2] signal.c:964
/usr/lib/system/libsystem_platform.dylib(_sigtramp+0x1d) [0x7fff20934d7d]
/Users/ktsanaktsidis/Code/zendesk/ruby/ruby(gc_event_hook_body+0x4) [0x109fb9d21] gc.c:2214
/Users/ktsanaktsidis/Code/zendesk/ruby/ruby(newobj_slowpath) gc.c:2486
/Users/ktsanaktsidis/Code/zendesk/ruby/ruby(newobj_slowpath_wb_unprotected) gc.c:2507
/Users/ktsanaktsidis/Code/zendesk/ruby/ruby(newobj_fill+0x0) [0x109fac92e] gc.c:2543
/Users/ktsanaktsidis/Code/zendesk/ruby/ruby(newobj_of0) gc.c:2553
/Users/ktsanaktsidis/Code/zendesk/ruby/ruby(newobj_of) gc.c:2552
/Users/ktsanaktsidis/Code/zendesk/ruby/ruby(rb_wb_unprotected_newobj_of) gc.c:2567
/Users/ktsanaktsidis/Code/zendesk/ruby/ruby(io_alloc+0x12) [0x109fd341c] io.c:1047
/Users/ktsanaktsidis/Code/zendesk/ruby/ruby(prep_io) io.c:8483
/Users/ktsanaktsidis/Code/zendesk/ruby/ruby(prep_stdio) io.c:8514
/Users/ktsanaktsidis/Code/zendesk/ruby/ruby(rb_io_prep_stdin) io.c:8532
/Users/ktsanaktsidis/Code/zendesk/ruby/ruby(thread_start_func_2+0xf7) [0x10a1058a7] thread.c:802
/Users/ktsanaktsidis/Code/zendesk/ruby/ruby(rb_native_cond_initialize+0x0) [0x10a1055fb] ./thread_pthread.c:1047
/Users/ktsanaktsidis/Code/zendesk/ruby/ruby(register_cached_thread_and_wait) ./thread_pthread.c:1099
/Users/ktsanaktsidis/Code/zendesk/ruby/ruby(thread_start_func_1) ./thread_pthread.c:1054
/usr/lib/system/libsystem_pthread.dylib(_pthread_start+0xe0) [0x7fff208ef8fc]

(full output is attached).

This seems to be because the new Ractor sets up its stdio objects (rb_io_prep_stdin et al.), and so allocates Ruby objects, before rb_ec_initialize_vm_stack is called to set up the initial stack frame. The NEWOBJ hook then appears to run against an execution context that has no control frame yet.

I've attached a patch which works around this by not firing GC event hooks if there is no control frame on the execution context. The patch also includes a test which reproduces the issue using the objspace extension; creating a Ractor within an ObjectSpace.trace_object_allocations block is enough to trigger the crash. The patch seems to fix things, but if you folk prefer I can also try swapping around the order of prep_stdio and rb_ec_initialize_vm_stack.
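
For anyone who wants to check a particular Ruby build, here is a minimal sketch of the reproduction described above: a Ractor created inside an ObjectSpace.trace_object_allocations block. It is not the test from the attached patch. The script runs in a child interpreter so that a segfault does not take down the calling process. The constant REPRO_SCRIPT and the method ractor_under_allocation_tracing_ok? are names made up for this example.

```ruby
require "rbconfig"

# Script for the child interpreter: create a Ractor while allocation tracing
# (and therefore a NEWOBJ hook) is active, then wait for it to finish.
REPRO_SCRIPT = <<~'RUBY'
  require "objspace"
  ObjectSpace.trace_object_allocations do
    r = Ractor.new { :done }
    # Newer Rubies provide Ractor#value; older ones use Ractor#take.
    r.respond_to?(:value) ? r.value : r.take
  end
RUBY

# Hypothetical helper: true if the child exited cleanly, false if it crashed,
# exited non-zero, or could not be started.
def ractor_under_allocation_tracing_ok?(ruby: RbConfig.ruby)
  system(ruby, "-W0", "-e", REPRO_SCRIPT, out: File::NULL, err: File::NULL) == true
end

puts ractor_under_allocation_tracing_ok? ? "no crash" : "child interpreter failed"
```

A false result only says that the child did not exit cleanly. It will also be false on Rubies without Ractor support, so it is a hint rather than proof of this specific bug.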

Files

0001-Fix-interpreter-crash-caused-by-RUBY_INTERNAL_EVENT_.patch (1.91 KB)		kjtsanaktsidis (KJ Tsanaktsidis), 01/08/2022 04:34 AM
crash.log (26.1 KB)		kjtsanaktsidis (KJ Tsanaktsidis), 01/08/2022 04:35 AM
ruby_2022-01-08-151326_8927-ktsanaktsidis.crash (18.8 KB)		kjtsanaktsidis (KJ Tsanaktsidis), 01/08/2022 04:37 AM

#1 [ruby-core:107007]

Updated by nobu (Nobuyoshi Nakada) over 3 years ago

Status changed from Open to Assigned
Assignee set to ko1 (Koichi Sasada)

#2 [ruby-core:108523]

Updated by kjtsanaktsidis (KJ Tsanaktsidis) about 3 years ago

Just checked, this is still an issue with 3.2.0-preview1. Is there any feedback on the patch I posted? Any other way you would suggest going about a solution? Thanks!

#3 [ruby-core:108806]

Updated by kjtsanaktsidis (KJ Tsanaktsidis) about 3 years ago

I opened a PR with this patch. Happy to try fixing it a different way but this at least stops the crash. https://github.com/ruby/ruby/pull/5990

#4 [ruby-core:110667]

Updated by ivoanjo (Ivo Anjo) over 2 years ago

If it helps, here's a Linux-based backtrace:

-- C level backtrace information -------------------------------------------
/usr/local/lib/libruby.so.3.1(rb_print_backtrace+0x11) [0x7f75e6678aa8] vm_dump.c:759
/usr/local/lib/libruby.so.3.1(rb_vm_bugreport) vm_dump.c:1045
/usr/local/lib/libruby.so.3.1(rb_bug_for_fatal_signal+0xf0) [0x7f75e6477750] error.c:821
/usr/local/lib/libruby.so.3.1(sigsegv+0x49) [0x7f75e65ced19] signal.c:964
/lib/x86_64-linux-gnu/libpthread.so.0(__restore_rt+0x0) [0x7f75e636e140]
/usr/local/lib/libruby.so.3.1(gc_event_hook_body+0x1b) [0x7f75e649010b] gc.c:2217
/usr/local/lib/libruby.so.3.1(gc_enter+0x1f) [0x7f75e64a495f] gc.c:9194
/usr/local/lib/libruby.so.3.1(gc_enter) gc.c:9165
/usr/local/lib/libruby.so.3.1(gc_sweep_continue) gc.c:5743
/usr/local/lib/libruby.so.3.1(heap_prepare) gc.c:2193
/usr/local/lib/libruby.so.3.1(heap_next_freepage) gc.c:2388
/usr/local/lib/libruby.so.3.1(ractor_cache_slots) gc.c:2424
/usr/local/lib/libruby.so.3.1(newobj_slowpath) gc.c:2484
/usr/local/lib/libruby.so.3.1(newobj_slowpath_wb_unprotected) gc.c:2510
/usr/local/lib/libruby.so.3.1(newobj_fill+0x0) [0x7f75e64a4cf9] gc.c:2546
/usr/local/lib/libruby.so.3.1(newobj_of) gc.c:2556
/usr/local/lib/libruby.so.3.1(rb_wb_unprotected_newobj_of) gc.c:2570
/usr/local/lib/libruby.so.3.1(io_alloc+0x5) [0x7f75e64ccbca] io.c:1047
/usr/local/lib/libruby.so.3.1(prep_io) io.c:8479
/usr/local/lib/libruby.so.3.1(prep_stdio) io.c:8510
/usr/local/lib/libruby.so.3.1(rb_io_prep_stdin) io.c:8528
/usr/local/lib/libruby.so.3.1(thread_start_func_2+0x165) [0x7f75e6619965] thread.c:802
/usr/local/lib/libruby.so.3.1(register_cached_thread_and_wait+0x0) [0x7f75e661a6f9] thread_pthread.c:1047
/usr/local/lib/libruby.so.3.1(thread_start_func_1) thread_pthread.c:1054
/lib/x86_64-linux-gnu/libpthread.so.0(0x8ea7) [0x7f75e6362ea7]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f) [0x7f75e607fdef]

#5 [ruby-core:110704]

Updated by ivoanjo (Ivo Anjo) over 2 years ago

Interestingly, my crash happened on RUBY_INTERNAL_EVENT_GC_ENTER (you can see my stack includes an attempt to garbage collect) but I believe the fix would work for this situation as well.
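
To illustrate why a single guard could plausibly cover both events, here is a toy Ruby model. It is not interpreter code: Frame, ExecutionContext and HookRunner are hypothetical stand-ins. It assumes, as both backtraces suggest, that NEWOBJ and GC_ENTER hooks go through one shared hook body, and that the workaround skips the hook when the execution context has no control frame.

```ruby
# Toy model only; these classes are hypothetical, not Ruby's internals.
Frame = Struct.new(:pc)
ExecutionContext = Struct.new(:cfp)

class HookRunner
  attr_reader :fired

  def initialize
    @fired = []
  end

  # Shared body for every internal event in this model.
  def fire(ec, event)
    # Guard: a context with no control frame yet cannot report a pc,
    # so skip the hook instead of dereferencing nil.
    return if ec.cfp.nil?

    @fired << [event, ec.cfp.pc]
  end
end

runner = HookRunner.new
starting = ExecutionContext.new(nil)  # like a Ractor thread before its stack exists

runner.fire(starting, :newobj)    # skipped
runner.fire(starting, :gc_enter)  # skipped by the same guard

starting.cfp = Frame.new(0)       # initial frame is now set up
runner.fire(starting, :newobj)    # delivered

p runner.fired
```

In this model the guard sits in the shared body rather than in any one event's call site, which is why the event type does not matter.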

#6 [ruby-core:112840]

Updated by ivoanjo (Ivo Anjo) over 2 years ago

The PR to fix this has been merged ( https://github.com/ruby/ruby/pull/5990 ).

Would it be possible for the fix to be backported to 3.0/3.1/3.2? There are a few features in the ddtrace gem that can trigger this crash and that we've had to disable for these Rubies.

#7 [ruby-core:112841]

Updated by byroot (Jean Boussier) over 2 years ago

Backport changed from 2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN to 2.6: DONTNEED, 2.7: DONTNEED, 3.0: REQUIRED, 3.1: REQUIRED, 3.2: REQUIRED

The fix was merged as 7bd7aee02e303de27d2cddfc5ef47e612d6782cb

Updated by byroot (Jean Boussier) over 2 years ago

Status changed from Assigned to Closed

#9 [ruby-core:113014]

Updated by nagachika (Tomoyuki Chikanaga) over 2 years ago

Backport changed from 2.6: DONTNEED, 2.7: DONTNEED, 3.0: REQUIRED, 3.1: REQUIRED, 3.2: REQUIRED to 2.6: DONTNEED, 2.7: DONTNEED, 3.0: REQUIRED, 3.1: DONE, 3.2: REQUIRED

ruby_3_1 bdbe6053853c11ffe9b8737eb4da50ed84c9dbd6 merged revision(s) 7bd7aee02e303de27d2cddfc5ef47e612d6782cb.

#10 [ruby-core:113021]

Updated by ivoanjo (Ivo Anjo) over 2 years ago

Thank you @nagachika (Tomoyuki Chikanaga)! :) :) :)

#11 [ruby-core:114021]

Updated by nagachika (Tomoyuki Chikanaga) about 2 years ago

Backport changed from 2.6: DONTNEED, 2.7: DONTNEED, 3.0: REQUIRED, 3.1: DONE, 3.2: REQUIRED to 2.6: DONTNEED, 2.7: DONTNEED, 3.0: REQUIRED, 3.1: DONE, 3.2: DONE

ruby_3_2 b422c3523c419b88c6da23a4022ae8864f411b84 merged revision(s) 7bd7aee02e303de27d2cddfc5ef47e612d6782cb.

#12 [ruby-core:114026]

Updated by ivoanjo (Ivo Anjo) about 2 years ago

Thanks again @nagachika (Tomoyuki Chikanaga)!

Can I bother you with a backport to 3.0 as well? I know that one is getting "long in the tooth" in terms of support, but having it fixed would mean this crash would not happen on any of the Ruby releases which support Ractors (3.0/3.1/3.2/...) which would make our usage of tracepoints in the ddtrace gem simpler :)

#13 [ruby-core:114027]