Bug #21633

open

A `rb_thread_call_without_gvl` loop can cause the fiber scheduler to ignore signals.

Added by ioquatix (Samuel Williams) about 3 hours ago.

Status:
Open
Target version:
-
[ruby-core:<unknown>]

Description

The gRPC gem calls rb_thread_call_without_gvl in a loop. That loop does not exit when an interrupt is delivered if the scheduler uses Thread.handle_interrupt(::SignalException => :never) to create a safe point for asynchronous signal handling.

While this may not be considered a bug in any particular part of the system, the combination of these behaviours creates a situation where gRPC can hang for a long time and ignore SIGINT / SIGTERM.

gRPC Failure Analysis

From src/ruby/ext/grpc/rb_completion_queue.c:

static void unblock_func(void* param) {
  next_call_stack* const next_call = (next_call_stack*)param;
  next_call->interrupted = 1;  // ← SIGINT causes this flag to be set
}

grpc_event rb_completion_queue_pluck(grpc_completion_queue* queue, void* tag,
                                     gpr_timespec deadline,
                                     const char* reason) {
  // ...
  do {
    next_call.interrupted = 0;  // ← Reset flag
    
    rb_thread_call_without_gvl(grpc_rb_completion_queue_pluck_no_gil,
                               (void*)&next_call, unblock_func,
                               (void*)&next_call);
    
    if (next_call.event.type != GRPC_QUEUE_TIMEOUT) break;
  } while (next_call.interrupted);  // ← The problem! If interrupted, LOOP AGAIN!
  
  return next_call.event;
}

The loop explicitly retries after interruption, making SIGINT/SIGTERM ineffective.

This might be considered the expected behaviour if Thread.handle_interrupt is used. However, the goal of Thread.handle_interrupt in the fiber scheduler is to create a safe point for signal handling, not to block signals completely. Since this loop never yields back to the scheduler, that safe point is never reached, and the loop can continue indefinitely.
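To make the "safe point" idea concrete, here is a minimal Ruby sketch. It is not taken from any scheduler implementation: the Interrupted class and the deferred_then_safe_point helper are hypothetical, and Thread#raise stands in for a real signal so the example is deterministic. It shows an interrupt being queued while :never is in effect and delivered only when an :immediate block is entered.

```ruby
# Hypothetical stand-in for SignalException, so the sketch can be driven
# with Thread#raise instead of a real signal.
class Interrupted < StandardError; end

# Hypothetical helper: runs a worker that defers Interrupted, then opens
# one explicit safe point. Returns the worker's log, in order.
def deferred_then_safe_point
  log = []
  ready = Queue.new
  go = Queue.new

  worker = Thread.new do
    Thread.handle_interrupt(Interrupted => :never) do
      begin
        ready << true  # tell the main thread we are inside the deferred region
        go.pop         # blocking wait; the interrupt arrives here but stays queued
        log << :deferred_work_done

        # Safe point: a pending interrupt may be raised here.
        Thread.handle_interrupt(Interrupted => :immediate) { Thread.pass }
        log << :no_interrupt_seen
      rescue Interrupted
        log << :interrupted_at_safe_point
      end
    end
  end

  ready.pop                  # wait until the worker is inside the :never block
  worker.raise(Interrupted)  # queued rather than delivered, because of :never
  go << true                 # let the worker continue to its safe point
  worker.join
  log
end

p deferred_then_safe_point
```

If the worker never reached the :immediate block, as in the retry loop above, the queued interrupt would simply stay pending.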

As rb_thread_call_without_gvl invokes vm_check_ints_blocking, one solution is to yield to the scheduler in the case that there are pending interrupts. This gives the scheduler a chance to handle the incoming SIGINT / SIGTERM signals at the safe point.
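The idea can be illustrated at the Ruby level, with the caveat that this is only an analogy and not the implementation in the linked pull request, which changes the VM itself. The retry_with_safe_point helper and the Interrupted class are hypothetical, and the yielded block stands in for a blocking call that keeps timing out. After each attempt the loop checks Thread.pending_interrupt? and, if something is queued, opens a safe point so it can be delivered.

```ruby
# Hypothetical stand-in for SignalException, driven by Thread#raise below.
class Interrupted < StandardError; end

# Hypothetical retry loop: calls the block up to max_attempts times with
# interrupts deferred, but opens a safe point whenever one is pending.
# Returns [:interrupted, n] or [:exhausted, n], where n is the attempt count.
def retry_with_safe_point(max_attempts)
  attempts = 0
  Thread.handle_interrupt(Interrupted => :never) do
    begin
      while attempts < max_attempts
        attempts += 1
        yield attempts  # stands in for the blocking call that timed out

        # Only pay for a safe point when an interrupt is actually queued.
        if Thread.pending_interrupt?
          Thread.handle_interrupt(Interrupted => :immediate) { Thread.pass }
        end
      end
      [:exhausted, attempts]
    rescue Interrupted
      [:interrupted, attempts]
    end
  end
end

# Usage: interrupt the worker while it is blocked in its second attempt.
ready = Queue.new
go = Queue.new
worker = Thread.new do
  retry_with_safe_point(5) do |n|
    if n == 2
      ready << true
      go.pop
    end
  end
end
ready.pop
worker.raise(Interrupted)
go << true
p worker.value
```

Without the pending_interrupt? check, the same loop would run all five attempts before the interrupt could be observed, which mirrors the hang described above.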

For a full reproduction of the issue using gRPC: https://github.com/samuel-williams-shopify/grpc-interrupt

For the proposed fix: https://github.com/ruby/ruby/pull/14700
