Bug #21633
Open
A `rb_thread_call_without_gvl` loop can cause the fiber scheduler to ignore signals.
Description
The gRPC gem calls `rb_thread_call_without_gvl` in a loop, and doesn't exit when interrupts are delivered if `Thread.handle_interrupt(::SignalException => :never)` is used by the scheduler to create a safe point for asynchronous signal handling.
While this may not be considered a bug in any particular part of the system, the combination of these behaviours creates a situation where gRPC can hang for a long time and ignore SIGINT / SIGTERM.
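To illustrate what a safe point means here, the following is a minimal sketch rather than the scheduler's actual code. It assumes a hypothetical `SimulatedInterrupt` class delivered with `Thread#raise` in place of a real signal, so the behaviour is deterministic. The method name `run_with_safe_point` and the queue handshake are also assumptions made for the example. The interrupt arrives while it is masked with `:never`, and is only raised once the thread opens an `:immediate` window.

```ruby
# Hypothetical stand-in for SignalException, delivered via Thread#raise.
class SimulatedInterrupt < StandardError; end

def run_with_safe_point
  log = []
  ready = Queue.new
  go = Queue.new

  worker = Thread.new do
    # Defer delivery of SimulatedInterrupt for the whole block.
    Thread.handle_interrupt(SimulatedInterrupt => :never) do
      begin
        log << :critical_start
        ready << true
        go.pop # the interrupt arrives during this wait, but stays pending
        log << :critical_end
        # Safe point: briefly allow pending interrupts to be raised.
        Thread.handle_interrupt(SimulatedInterrupt => :immediate) { Thread.pass }
        log << :not_reached
      rescue SimulatedInterrupt
        log << :interrupted_at_safe_point
      end
    end
  end

  ready.pop                       # wait until the worker is inside the masked block
  worker.raise(SimulatedInterrupt)
  go << true                      # let the worker continue to its safe point
  worker.join
  log
end

p run_with_safe_point
```

If the worker never reached the inner `:immediate` block, the interrupt would stay pending for as long as the outer block runs, which is the situation described in this report.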
gRPC Failure Analysis
From `src/ruby/ext/grpc/rb_completion_queue.c`:
static void unblock_func(void* param) {
  next_call_stack* const next_call = (next_call_stack*)param;
  next_call->interrupted = 1; // ← SIGINT causes this flag to be set
}

grpc_event rb_completion_queue_pluck(grpc_completion_queue* queue, void* tag,
                                     gpr_timespec deadline,
                                     const char* reason) {
  // ...
  do {
    next_call.interrupted = 0; // ← Reset flag
    rb_thread_call_without_gvl(grpc_rb_completion_queue_pluck_no_gil,
                               (void*)&next_call, unblock_func,
                               (void*)&next_call);
    if (next_call.event.type != GRPC_QUEUE_TIMEOUT) break;
  } while (next_call.interrupted); // ← The problem! If interrupted, LOOP AGAIN!
  return next_call.event;
}
The loop explicitly retries after interruption, making SIGINT/SIGTERM ineffective.
This might be considered the expected behaviour if `Thread.handle_interrupt` is used. However, the goal of `Thread.handle_interrupt` in the fiber scheduler is to create a safe point for signal handling, not to block signals completely. Since this loop never yields back to the scheduler, no safe point is ever reached, and the loop will continue indefinitely.
As `rb_thread_call_without_gvl` invokes `vm_check_ints_blocking`, one solution is to yield to the scheduler when there are pending interrupts. This gives the scheduler a chance to handle the incoming SIGINT / SIGTERM signals at the safe point.
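The idea can be sketched in plain Ruby. This is only an analogue of the behaviour, not the C change in the pull request. It assumes a hypothetical `SimulatedInterrupt` class raised with `Thread#raise` instead of a real signal, a short `sleep` standing in for the blocking call that times out, and a `max_attempts` cap so the example terminates. The method names are also made up for the example. With `yield_on_pending: false` the loop mirrors the retry behaviour above. With `true` it checks `Thread.pending_interrupt?` and opens a safe point.

```ruby
# Hypothetical stand-in for SignalException, delivered via Thread#raise.
class SimulatedInterrupt < StandardError; end

# Retry loop running with interrupts masked, similar in spirit to the
# do/while loop around rb_thread_call_without_gvl.
def retry_loop(ready, yield_on_pending:, max_attempts: 20)
  attempts = 0
  Thread.handle_interrupt(SimulatedInterrupt => :never) do
    ready << true
    while attempts < max_attempts
      attempts += 1
      sleep 0.005 # stands in for a blocking call that returns on timeout
      if yield_on_pending && Thread.pending_interrupt?
        # Safe point: let the pending interrupt be raised here.
        Thread.handle_interrupt(SimulatedInterrupt => :immediate) { Thread.pass }
      end
    end
  end
  [:exhausted, attempts]
rescue SimulatedInterrupt
  [:interrupted, attempts]
end

def run_retry_loop(yield_on_pending:)
  ready = Queue.new
  worker = Thread.new { retry_loop(ready, yield_on_pending: yield_on_pending) }
  ready.pop                        # worker is now inside the masked block
  worker.raise(SimulatedInterrupt) # simulated SIGINT
  worker.value
end

p run_retry_loop(yield_on_pending: true)
p run_retry_loop(yield_on_pending: false)
```

In the yielding variant the interrupt should be raised after the first few attempts. Without the check, it is only delivered once the masked block ends, after every attempt has run, which corresponds to the hang seen with gRPC.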
For a full reproduction of the issue using gRPC: https://github.com/samuel-williams-shopify/grpc-interrupt
For the proposed fix: https://github.com/ruby/ruby/pull/14700