Bug #20181

closed

Process.wait(-1) doesn't report exited child processes if WAITPID_USE_SIGCHLD is enabled

Added by stanhu (Stan Hu) 4 months ago. Updated 3 months ago.

Status:
Closed
Assignee:
-
Target version:
-
ruby -v:
ruby 3.2.2 (2023-03-30 revision e51014f9c0) [aarch64-linux]
[ruby-core:116183]

Description

From Ruby 2.6 to 3.2, Process.wait(-1) doesn't return in a timely manner if a spawned, detached process is still running. The following script exits immediately with Ruby 3.3, but hangs for 10 minutes (the length of the sleep) in Ruby 2.6 to 3.2:

#!/usr/bin/env ruby

Process.spawn({}, "sh -c 'sleep 600'").tap do |pid|
  puts "detaching PID #{pid}"
  Process.detach(pid)
end

forked_pid = fork do
  loop { sleep 1 }
end

child_waiter = Thread.new do
  puts "Waiting for child process to die..."

  # This works
  # puts Process.wait2(forked_pid)

  # The spawned process has to exit before this returns in Ruby 3.1 and 3.2
  pid, status = Process.wait2(-1)
  puts "Exited PID: #{pid}, status: #{status}"
end

process_killer = Thread.new do
  puts "Killing #{forked_pid}"
  system("kill #{forked_pid}")
end

child_waiter.join
process_killer.join

In Ruby 3.2, we see:

detaching PID 8
Waiting for child process to die...
Killing 11
<process hangs here>

In Ruby 3.3, this exits immediately:

detaching PID 9
Waiting for child process to die...
Killing 11
Exited PID: 11, status: pid 11 SIGTERM (signal 15)

However, if I switch the Process.wait2(-1) call to Process.wait2(forked_pid), Ruby 3.2 works fine.
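
A minimal sketch of that workaround for callers that track their child PIDs: rather than blocking on -1, poll each known PID with Process::WNOHANG. The helper name wait_any_of and the interval parameter are hypothetical, not part of Ruby or Puma, and the PIDs passed in are assumed not to have been handed to Process.detach (otherwise another thread may reap them first and the call raises Errno::ECHILD).

```ruby
# Hypothetical helper: wait for the first of several known children to exit,
# without calling Process.wait2(-1).
def wait_any_of(pids, interval: 0.05)
  loop do
    pids.each do |pid|
      # With WNOHANG, wait2 returns nil when the child has not exited yet.
      reaped, status = Process.wait2(pid, Process::WNOHANG)
      return [reaped, status] if reaped
    end
    sleep interval
  end
end

# Usage: fork a short-lived child and reap it by PID.
child = fork { exit 3 }
pid, status = wait_any_of([child])
puts "Exited PID: #{pid}, exit status: #{status.exitstatus}"
```

Polling trades a small delay (at most one interval) for not depending on how wait(-1) is implemented in the running interpreter.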

I've validated that this problem goes away if I disable WAITPID_USE_SIGCHLD:

diff --git a/vm_core.h b/vm_core.h
index 1cc0659700..0e7d1643fe 100644
--- a/vm_core.h
+++ b/vm_core.h
@@ -126,7 +126,7 @@
 #endif
 
 /* define to 0 to test old code path */
-#define WAITPID_USE_SIGCHLD (RUBY_SIGCHLD || SIGCHLD_LOSSY)
+#define WAITPID_USE_SIGCHLD 0
 
 #if defined(SIGSEGV) && defined(HAVE_SIGALTSTACK) && defined(SA_SIGINFO) && !defined(__NetBSD__)
 #  define USE_SIGALTSTACK

This was first reported in the Puma issue tracker (https://github.com/puma/puma/issues/3313), and another contributor documented long-standing issues with Process.wait in the past: https://github.com/dentarg/gists/tree/master/gists/ruby-bug-15499#ruby--puma-bug

In Ruby 2.6, https://github.com/ruby/ruby/commit/054a412d540e7ed2de63d68da753f585ea6616c3 introduced a mechanism for rb_waitpid that uses SIGCHLD for blocking wait calls, and this might have introduced this bug. Ruby 2.5 doesn't appear to have this problem.

In Ruby 3.3, this SIGCHLD implementation was dropped in https://github.com/ruby/ruby/pull/7476 and https://github.com/ruby/ruby/pull/7527, so Ruby 3.3 no longer appears affected.
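
As a rough illustration of the version range described above, the following sketch flags interpreters from 2.6 up to (but not including) 3.3. The constant and method names are hypothetical, and the range is taken from this report alone; it does not account for fixes backported to maintenance branches.

```ruby
require "rubygems"

# Assumption: boundaries are taken from this report (2.6 through 3.2).
FIRST_AFFECTED   = Gem::Version.new("2.6")
FIRST_UNAFFECTED = Gem::Version.new("3.3")

# Hypothetical check: true if the given version falls in the reported range.
def possibly_affected?(ruby_version = RUBY_VERSION)
  v = Gem::Version.new(ruby_version)
  v >= FIRST_AFFECTED && v < FIRST_UNAFFECTED
end

puts "Ruby #{RUBY_VERSION} possibly affected: #{possibly_affected?}"
```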


Related issues 1 (0 open, 1 closed)

Related to Ruby master - Bug #19837: Concurrent calls to Process.waitpid2 misbehave on Ruby 3.1 & 3.2 (Closed)

Updated by kjtsanaktsidis (KJ Tsanaktsidis) 4 months ago

Actually I think this is a duplicate of https://bugs.ruby-lang.org/issues/19837. Does this describe your issue?

The fix for this was backported into the Ruby 3.2 and 3.1 branches, but I don't think a release of either 3.2 or 3.1 has been performed since then. Does the problem go away if you compile Ruby from the ruby_3_2 branch directly?

Updated by stanhu (Stan Hu) 4 months ago

Yes, thanks, this definitely looks like the same issue. Thanks for filing that issue and getting the patches merged.

I tested ruby_3_2, and it appears that the patch fixes the problem. I thought it wasn't working initially, but I may have been using the wrong Ruby interpreter.

Updated by byroot (Jean Boussier) 3 months ago

  • Related to Bug #19837: Concurrent calls to Process.waitpid2 misbehave on Ruby 3.1 & 3.2 added

Updated by byroot (Jean Boussier) 3 months ago

  • Status changed from Open to Closed