Bug #20181

Process.wait(-1) doesn't report exited child processes if WAITPID_USE_SIGCHLD is enabled

Added by stanhu (Stan Hu) 4 months ago. Updated 4 months ago.

Status: Closed
Assignee: -
Target version: -
ruby -v: ruby 3.2.2 (2023-03-30 revision e51014f9c0) [aarch64-linux]
[ruby-core:116183]

Description

From Ruby 2.6 to 3.2, Process.wait(-1) doesn't return promptly when a child exits if a spawned, detached process is still running; it returns only once the detached process exits. The following script exits immediately with Ruby 3.3 but hangs for 10 minutes (the duration of the sleep) in Ruby 2.6 through 3.2:

#!/usr/bin/env ruby

Process.spawn({}, "sh -c 'sleep 600'").tap do |pid|
  puts "detaching PID #{pid}"
  Process.detach(pid)
end

forked_pid = fork do
  loop { sleep 1 }
end

child_waiter = Thread.new do
  puts "Waiting for child process to die..."

  # Waiting on the specific PID works:
  # puts Process.wait2(forked_pid)

  # In Ruby 2.6 through 3.2, this doesn't return until the spawned process exits
  pid, status = Process.wait2(-1)
  puts "Exited PID: #{pid}, status: #{status}"
end

process_killer = Thread.new do
  puts "Killing #{forked_pid}"
  system("kill #{forked_pid}")
end

child_waiter.join
process_killer.join

In Ruby 3.2, we see:

detaching PID 8
Waiting for child process to die...
Killing 11
<process hangs here>

In Ruby 3.3, this exits immediately:

detaching PID 9
Waiting for child process to die...
Killing 11
Exited PID: 11, status: pid 11 SIGTERM (signal 15)

However, if I change Process.wait2(-1) to Process.wait2(forked_pid) (the commented-out line in the script), Ruby 3.2 works fine.
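A minimal sketch of that workaround, assuming a Unix-like platform with `fork` and a `sleep` executable on the PATH. It mirrors the report by keeping a detached `sleep 600` running, then reaps the forked child by its exact PID instead of -1. The helper name `wait_for_killed_child` is hypothetical, and it sends the signal with `Process.kill` rather than shelling out to `kill` as the original script does.

```ruby
# Hypothetical helper: fork a child, terminate it, and reap it by its exact PID
# while a detached process is still running in the background.
def wait_for_killed_child
  # Mirror the report: a long-running process whose status is reaped
  # by the thread that Process.detach starts.
  detached_pid = Process.spawn("sleep", "600")
  Process.detach(detached_pid)

  forked_pid = fork { loop { sleep 1 } }
  Process.kill("TERM", forked_pid)

  # Waiting on the specific PID (not -1) is the variant reported to work on 3.2.
  Process.wait2(forked_pid) # returns [pid, Process::Status]
ensure
  # Stop the detached process; its Process.detach thread reaps it.
  begin
    Process.kill("TERM", detached_pid) if detached_pid
  rescue Errno::ESRCH
    # Already gone; nothing to clean up.
  end
end

pid, status = wait_for_killed_child
puts "Exited PID: #{pid}, status: #{status}"
```

This only sidesteps the problem for callers that know which child they are waiting for; code that genuinely needs "any child" semantics, such as a process supervisor, cannot use it directly.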

I've validated that this problem goes away if I disable WAITPID_USE_SIGCHLD:

diff --git a/vm_core.h b/vm_core.h
index 1cc0659700..0e7d1643fe 100644
--- a/vm_core.h
+++ b/vm_core.h
@@ -126,7 +126,7 @@
 #endif
 
 /* define to 0 to test old code path */
-#define WAITPID_USE_SIGCHLD (RUBY_SIGCHLD || SIGCHLD_LOSSY)
+#define WAITPID_USE_SIGCHLD 0
 
 #if defined(SIGSEGV) && defined(HAVE_SIGALTSTACK) && defined(SA_SIGINFO) && !defined(__NetBSD__)
 #  define USE_SIGALTSTACK

This was first reported in the Puma issue tracker (https://github.com/puma/puma/issues/3313). Another contributor had previously documented long-standing issues with Process.wait: https://github.com/dentarg/gists/tree/master/gists/ruby-bug-15499#ruby--puma-bug

In Ruby 2.6, https://github.com/ruby/ruby/commit/054a412d540e7ed2de63d68da753f585ea6616c3 introduced a SIGCHLD-based mechanism for blocking wait calls in rb_waitpid, and this change may be the source of the bug. Ruby 2.5 doesn't appear to have this problem.

In Ruby 3.3, this SIGCHLD implementation was dropped in https://github.com/ruby/ruby/pull/7476 and https://github.com/ruby/ruby/pull/7527, so Ruby 3.3 no longer appears affected.
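Code that must run on several Ruby versions could gate a `Process.wait(-1)` code path on the affected range. This is a minimal sketch, assuming the range is exactly the one described in this report (2.6 up to, but not including, 3.3) and that no patch release within that range received a separate fix, which the report does not establish. The method name `wait_any_affected?` is hypothetical.

```ruby
# Gem::Version ships with RubyGems, which Ruby loads by default; the explicit
# require keeps the sketch self-contained when gems are disabled.
require "rubygems"

# Hypothetical helper: true when +version+ falls in the range this report
# describes as affected (2.6 up to, but not including, 3.3).
def wait_any_affected?(version = RUBY_VERSION)
  v = Gem::Version.new(version)
  v >= Gem::Version.new("2.6") && v < Gem::Version.new("3.3")
end

puts wait_any_affected?("3.2.2") # prints true
puts wait_any_affected?("3.3.0") # prints false
puts wait_any_affected?("2.5.9") # prints false
```

Called with no argument, it checks the running interpreter. It looks only at the version number: per the vm_core.h diff, WAITPID_USE_SIGCHLD also depends on RUBY_SIGCHLD/SIGCHLD_LOSSY, so platforms without SIGCHLD support may not take that code path at all.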


Related issues (1 closed):

Related to Ruby master - Bug #19837: Concurrent calls to Process.waitpid2 misbehave on Ruby 3.1 & 3.2 (Closed)