Bug #19865
openSegfault when calling user signal handlers during VM shutdown
Description
Howdy 👋! I work for Datadog on the ddtrace
gem . I've found this issue while investigating a customer crash report.
Background¶
The original issue was found in a production app. A number of things need to be in play to cause it.
The ruby-odbc
gem provides a way of accessing databases through the ODBC API. It wraps a database connection as a Data object, with a free function that, prior to freeing the native resources, disconnects from the database if the connection was still active.
Because disconnecting from the database is a blocking operation, the gem (reasonably, in my opinion), releases the global VM lock before disconnecting.
The trigger for the crash is:
- The app in question used puma, and puma installs a
Signal.trap('TERM')
- The database object was still connected when the app started to shut down
- A VM shutdown starts...
- Half-way through shutdown, the VM received a SIGTERM signal, and queued it for processing
- The VM calls the free function on all objects
- The ruby-odbc gem sees there's an active database connection, and tries to release the GVL to call the blocking disconnect
- Before releasing the GVL, the VM checks for pending interruptions
- The VM tries to run the Ruby-level signal handler method half-way through VM shutdown, when you can no longer run Ruby code
- Segfault
How to reproduce (Ruby version & script)¶
I was able to reproduce on a minimal example on Ruby 3.2.2 (ruby 3.2.2 (2023-03-30 revision e51014f9c0) [x86_64-linux]
) and recent master (ruby 3.3.0dev (2023-08-17T07:30:01Z master d26b015e83) [x86_64-linux]
).
I've put the test-case up on github as well https://github.com/DataDog/signal-bug-testcase, but here's the important bits:
signal-bug-testcase.rb
:
require 'signal_bug_testcase'
Signal.trap("TERM") { puts "Hello, world" }
FOO = SignalBugTestcase.new
signal_bug_testcase.c
:
#include <ruby.h>
#include <ruby/thread.h>
#include <signal.h>
#include <unistd.h>
typedef struct { int dummy; } BugTestcase;
void *test_nogvl(void *unused) {
fprintf(stderr, "GVL released!\n");
return NULL;
}
static void bug_testcase_free(void* ptr) {
fprintf(stderr, "Free getting called! Sending signal...\n");
kill(getpid(), SIGTERM);
fprintf(stderr, "SIGTERM signal queued, trying to release GVL...\n");
rb_thread_call_without_gvl(test_nogvl, NULL, NULL, NULL);
fprintf(stderr, "After releasing GVL!\n");
free(ptr);
}
const rb_data_type_t bug_testcase_data_type = {
.wrap_struct_name = "SignalBugTestcase",
.function = { NULL, bug_testcase_free, NULL },
.flags = RUBY_TYPED_FREE_IMMEDIATELY
};
VALUE bug_testcase_alloc(VALUE klass) {
BugTestcase* obj = calloc(1, sizeof(BugTestcase));
return TypedData_Make_Struct(klass, BugTestcase, &bug_testcase_data_type, obj);
}
void Init_signal_bug_testcase(void) {
VALUE cBugTestcase = rb_define_class("SignalBugTestcase", rb_cObject);
rb_define_alloc_func(cBugTestcase, bug_testcase_alloc);
}
Expectation and result¶
No segfault happens.
Interestingly, on Ruby 2.7, the VM exits half-way through but doesn't always segfault, but running it a few times always triggers the issue. On 3.2+ it crashes every time for me.
I suspect the right thing here is to no longer accept/try to run any Ruby-level signal handlers after VM shutdown starts.
Here's what I see with this test-case:
$ bundle exec ruby lib/signal-bug-testcase.rb
Free getting called! Sending signal...
SIGTERM signal queued, trying to release GVL...
lib/signal-bug-testcase.rb:3: [BUG] Segmentation fault at 0x0000000000000007
ruby 3.2.2 (2023-03-30 revision e51014f9c0) [x86_64-linux]
-- Control frame information -----------------------------------------------
c:0003 p:---- s:0011 e:000010 CFUNC :puts
c:0002 p:0005 s:0006 e:000005 BLOCK lib/signal-bug-testcase.rb:3 [FINISH]
c:0001 p:0000 s:0003 E:0001a0 DUMMY [FINISH]
-- Ruby level backtrace information ----------------------------------------
lib/signal-bug-testcase.rb:3:in `block in <main>'
lib/signal-bug-testcase.rb:3:in `puts'
-- Machine register context ------------------------------------------------
RIP: 0x000070aa64cedbe7 RBP: 0x000070aa648e8fd0 RSP: 0x00007ffc5c057608
RAX: 0x0000000000004171 RBX: 0x00007ffc5c057630 RCX: 0x0000000000000001
RDX: 0x00007ffc5c057630 RDI: 0x0000000000000007 RSI: 0x0000000000004171
R8: 0x000000000000021b R9: 0x0000000000000000 R10: 0x000070aa63eff048
R11: 0x0000000000000000 R12: 0x000070aa648e8fd0 R13: 0x0000000000004171
R14: 0x0000000000000007 R15: 0x000070aa648e8ff0 EFL: 0x0000000000010202
-- C level backtrace information -------------------------------------------
ruby-3.2.2/lib/libruby.so.3.2(rb_print_backtrace+0xd) [0x70aa64d5bb5f] ruby-3.2.2/vm_dump.c:785
ruby-3.2.2/lib/libruby.so.3.2(rb_vm_bugreport) ruby-3.2.2/vm_dump.c:1080
ruby-3.2.2/lib/libruby.so.3.2(rb_bug_for_fatal_signal+0xf4) [0x70aa64b52164] ruby-3.2.2/error.c:813
ruby-3.2.2/lib/libruby.so.3.2(sigsegv+0x4d) [0x70aa64cab0fd] ruby-3.2.2/signal.c:964
/lib/x86_64-linux-gnu/libc.so.6(0x70aa64642520) [0x70aa64642520]
ruby-3.2.2/lib/libruby.so.3.2(hash_table_index+0x0) [0x70aa64cedbe7] ruby-3.2.2/symbol.h:72
ruby-3.2.2/lib/libruby.so.3.2(rb_id_table_lookup) ruby-3.2.2/id_table.c:230
ruby-3.2.2/lib/libruby.so.3.2(cached_callable_method_entry+0x24) [0x70aa64d356bb] ruby-3.2.2/vm_method.c:1295
ruby-3.2.2/lib/libruby.so.3.2(callable_method_entry_or_negative) ruby-3.2.2/vm_method.c:1365
ruby-3.2.2/lib/libruby.so.3.2(callable_method_entry) ruby-3.2.2/vm_method.c:1402
ruby-3.2.2/lib/libruby.so.3.2(rb_callable_method_entry) ruby-3.2.2/vm_method.c:1409
ruby-3.2.2/lib/libruby.so.3.2(gccct_method_search_slowpath+0x38) [0x70aa64d36258] ruby-3.2.2/vm_eval.c:434
ruby-3.2.2/lib/libruby.so.3.2(rb_call0+0x267) [0x70aa64d4e877] ruby-3.2.2/vm_eval.c:483
ruby-3.2.2/lib/libruby.so.3.2(rb_call+0x32) [0x70aa64d4f406] ruby-3.2.2/vm_eval.c:877
ruby-3.2.2/lib/libruby.so.3.2(rb_funcallv_kw) ruby-3.2.2/vm_eval.c:1074
ruby-3.2.2/lib/libruby.so.3.2(vm_call_cfunc_with_frame+0x127) [0x70aa64d30277] ruby-3.2.2/vm_insnhelper.c:3268
ruby-3.2.2/lib/libruby.so.3.2(vm_sendish+0x97) [0x70aa64d407a4] ruby-3.2.2/vm_insnhelper.c:5080
ruby-3.2.2/lib/libruby.so.3.2(vm_exec_core) ruby-3.2.2/insns.def:820
ruby-3.2.2/lib/libruby.so.3.2(rb_vm_exec+0xd3) [0x70aa64d460d3] ruby-3.2.2/vm.c:2374
ruby-3.2.2/lib/libruby.so.3.2(rb_vm_invoke_proc+0x5f) [0x70aa64d4bfcf] ruby-3.2.2/vm.c:1603
ruby-3.2.2/lib/libruby.so.3.2(vm_call0_body+0x5df) [0x70aa64d4c5cf] ruby-3.2.2/vm_eval.c:274
ruby-3.2.2/lib/libruby.so.3.2(vm_call0_cc+0x77) [0x70aa64d4e7e7] ruby-3.2.2/vm_eval.c:87
ruby-3.2.2/lib/libruby.so.3.2(rb_call0) ruby-3.2.2/vm_eval.c:551
ruby-3.2.2/lib/libruby.so.3.2(rb_call+0x32) [0x70aa64d4f406] ruby-3.2.2/vm_eval.c:877
ruby-3.2.2/lib/libruby.so.3.2(rb_funcallv_kw) ruby-3.2.2/vm_eval.c:1074
ruby-3.2.2/lib/libruby.so.3.2(rb_eval_cmd_kw+0x142) [0x70aa64d4f562] ruby-3.2.2/vm_eval.c:1923
ruby-3.2.2/lib/libruby.so.3.2(signal_exec+0xf6) [0x70aa64caae16] ruby-3.2.2/signal.c:1064
ruby-3.2.2/lib/libruby.so.3.2(rb_threadptr_execute_interrupts+0x36b) [0x70aa64cf7078] ruby-3.2.2/thread.c:2334
ruby-3.2.2/lib/libruby.so.3.2(rb_threadptr_execute_interrupts) ruby-3.2.2/thread.c:2291
ruby-3.2.2/lib/libruby.so.3.2(rb_vm_check_ints+0xb) [0x70aa64cf7ac5] ruby-3.2.2/vm_core.h:1994
ruby-3.2.2/lib/libruby.so.3.2(rb_vm_check_ints) ruby-3.2.2/vm_core.h:1985
ruby-3.2.2/lib/libruby.so.3.2(unblock_function_set) ruby-3.2.2/thread.c:320
ruby-3.2.2/lib/libruby.so.3.2(blocking_region_begin) ruby-3.2.2/thread.c:1485
ruby-3.2.2/lib/libruby.so.3.2(rb_nogvl+0xbf) [0x70aa64cf90cf] ruby-3.2.2/thread.c:1548
signal-bug-testcase-2/lib/signal_bug_testcase.so(fprintf+0x0) [0x70aa6518f299] ../../../../ext/signal_bug_testcase/signal_bug_testcase.c:17
signal-bug-testcase-2/lib/signal_bug_testcase.so(bug_testcase_free) ../../../../ext/signal_bug_testcase/signal_bug_testcase.c:18
ruby-3.2.2/lib/libruby.so.3.2(run_final+0xf) [0x70aa64b73172] ruby-3.2.2/gc.c:4388
ruby-3.2.2/lib/libruby.so.3.2(finalize_list) ruby-3.2.2/gc.c:4407
ruby-3.2.2/lib/libruby.so.3.2(finalize_deferred_heap_pages) ruby-3.2.2/gc.c:4436
ruby-3.2.2/lib/libruby.so.3.2(rb_objspace_call_finalizer+0x350) [0x70aa64b80d70] ruby-3.2.2/gc.c:4573
ruby-3.2.2/lib/libruby.so.3.2(rb_ec_finalize+0x2a) [0x70aa64b5d6d1] ruby-3.2.2/eval.c:168
ruby-3.2.2/lib/libruby.so.3.2(rb_ec_cleanup) ruby-3.2.2/eval.c:262
ruby-3.2.2/lib/libruby.so.3.2(ruby_run_node+0x9d) [0x70aa64b5d91d] ruby-3.2.2/eval.c:330
ruby-3.2.2/bin/ruby(rb_main+0x21) [0x5d5d1295f187] ./main.c:38
ruby-3.2.2/bin/ruby(main) ./main.c:57
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_call_main+0x80) [0x70aa64629d90] ../sysdeps/nptl/libc_start_call_main.h:58
/lib/x86_64-linux-gnu/libc.so.6(call_init+0x0) [0x70aa64629e40] ../csu/libc-start.c:392
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main_impl) ../csu/libc-start.c:379
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main) (null):0
[0x5d5d1295f1d5]
Files