Bug #20017
closed3.3.0dev `rb_thread_profile_frames` crashes when `RUBY_MN_THREADS=1`
Description
I discovered this while running our internal CI with MaNy enabled, our application crash when trying to profile with StackProf (with postponed jobs disabled):
[BUG] Segmentation fault at 0x0000000000000010
ruby 3.3.0dev (2023-11-19T03:01:05Z shopify 9aee12cc28) +MN [x86_64-linux]
-- C level backtrace information -------------------------------------------
/usr/local/ruby/bin/ruby(rb_print_backtrace+0x14) [0x55c8f16333e1] /tmp/ruby-build/ruby-3.3.0-9aee12cc28cbca40306784e54e38558688caa9f7/vm_dump.c:812
/usr/local/ruby/bin/ruby(rb_vm_bugreport) /tmp/ruby-build/ruby-3.3.0-9aee12cc28cbca40306784e54e38558688caa9f7/vm_dump.c:1143
/usr/local/ruby/bin/ruby(rb_bug_for_fatal_signal+0xfc) [0x55c8f17e559c] /tmp/ruby-build/ruby-3.3.0-9aee12cc28cbca40306784e54e38558688caa9f7/error.c:1065
/usr/local/ruby/bin/ruby(sigsegv+0x4d) [0x55c8f158091d] /tmp/ruby-build/ruby-3.3.0-9aee12cc28cbca40306784e54e38558688caa9f7/signal.c:920
/lib/x86_64-linux-gnu/libc.so.6(0x7ff3db333520) [0x7ff3db333520]
/usr/local/ruby/bin/ruby(thread_profile_frames+0x10) [0x55c8f162efd0] /tmp/ruby-build/ruby-3.3.0-9aee12cc28cbca40306784e54e38558688caa9f7/vm_backtrace.c:1587
/tmp/bundle/ruby/3.3.0+0/gems/stackprof-0.2.25/lib/stackprof/stackprof.so(stackprof_buffer_sample+0x28) [0x7ff3b7d2435c] /tmp/bundle/ruby/3.3.0+0/gems/stackprof-0.2.25/ext/stackprof/stackprof.c:622
/tmp/bundle/ruby/3.3.0+0/gems/stackprof-0.2.25/lib/stackprof/stackprof.so(stackprof_buffer_sample) /tmp/bundle/ruby/3.3.0+0/gems/stackprof-0.2.25/ext/stackprof/stackprof.c:604
/tmp/bundle/ruby/3.3.0+0/gems/stackprof-0.2.25/lib/stackprof/stackprof.so(stackprof_buffer_sample) (null):0
/tmp/bundle/ruby/3.3.0+0/gems/stackprof-0.2.25/lib/stackprof/stackprof.so(stackprof_signal_handler+0x5) [0x7ff3b7d24545] /tmp/bundle/ruby/3.3.0+0/gems/stackprof-0.2.25/ext/stackprof/stackprof.c:767
/tmp/bundle/ruby/3.3.0+0/gems/stackprof-0.2.25/lib/stackprof/stackprof.so(stackprof_signal_handler) /tmp/bundle/ruby/3.3.0+0/gems/stackprof-0.2.25/ext/stackprof/stackprof.c:722
/lib/x86_64-linux-gnu/libc.so.6(0x7ff3db333520) [0x7ff3db333520]
/lib/x86_64-linux-gnu/libc.so.6(pthread_cond_wait+0x24a) [0x7ff3db384a7a]
/usr/local/ruby/bin/ruby(rb_native_cond_wait+0xb) [0x55c8f15c8cbb] /tmp/ruby-build/ruby-3.3.0-9aee12cc28cbca40306784e54e38558688caa9f7/thread_pthread.c:214
/usr/local/ruby/bin/ruby(ractor_sched_deq) /tmp/ruby-build/ruby-3.3.0-9aee12cc28cbca40306784e54e38558688caa9f7/thread_pthread.c:1230
/usr/local/ruby/bin/ruby(nt_start) /tmp/ruby-build/ruby-3.3.0-9aee12cc28cbca40306784e54e38558688caa9f7/thread_pthread.c:2209
For additional context, as far as I know rb_thread_profile_frames
isn't officially async signal safe, but it happens to be since 3.0 for the interpreter, and 3.3 for YJIT, so StackProf tries not to use postponed jobs for profiling when it can as it leads to much more accurate profiling results.
Updated by jhawthorn (John Hawthorn) 11 months ago
I opened a PR with a proposal to fix this https://github.com/ruby/ruby/pull/9310
The issue is that under M:N threads the current thread's EC can become NULL (when running on a shared native thread if there are no more threads to schedule). This PR fixes the issue by returning 0 in that case.
Updated by jhawthorn (John Hawthorn) 11 months ago
- Status changed from Open to Closed
Applied in changeset git|ffa5f16273f46c97bfca56e4549b0b38b9322d63.
Make rb_profile_frames return 0 for NULL ec
When using M:N threads, EC is set to NULL in the shared native thread
when nothing is scheduled. This previously caused a segfault when we try
to examine the EC.
Returning 0 instead means we may miss profiling information, but a
profiler relying on this isn't thread aware anyways, and observing that
"nothing" is running is probably correct.
Fixes [Bug #20017]
Co-authored-by: Dustin Brown dbrown9@gmail.com