Feature #19322
Support spawning "private" child processes
Description
Background
The traditional Unix process APIs (fork etc.) are poorly isolated. If a library spawns a child process, this is not transparent to the program using the library. Any signal handler for SIGCHLD in the program will be called when the spawned process exits, and even worse, if the parent calls Process.waitpid2(-1), it will consume the returned status code, stealing it from the library!
Unfortunately, the practice of responding to SIGCHLD by calling waitpid2(-1) in a loop is a pretty common unixism. For example, Unicorn does this. In short, there is no reliable way for a gem to spawn a child process in a way that can't (unintentionally) be interfered with by other parts of the program.
Problem statement
Consider the following program.
# Imagine this part of the program is in some top-level application event loop
# or something - similar to how Unicorn works. It detects child processes exiting
# and takes some action (possibly restarting a crashed worker, for example).
Signal.trap(:CHLD) do
  loop do
    begin
      pid, status = Process.waitpid2(-1)
      puts "Signal handler reaped #{pid} #{status.inspect}"
    rescue Errno::ECHILD
      puts "Signal handler reaped nothing"
      break
    end
  end
end

# Imagine that _this_ part of the program is buried deep in some gem. It knows
# nothing about the application SIGCHLD handling, and quite possibly the application
# author might not even know this gem spawns a child process to do its work!
require 'open3'
loop do
  o, status = Open3.capture2("/bin/sh", "-c", "echo 'ohaithar'")
  puts "ran command, got #{o.chomp} #{status.inspect}"
end
In current versions of Ruby, some loop iterations will function correctly and print something like this. The gem gets the Process::Status object from its command and can know if, e.g., it exited abnormally.
ran command, got ohaithar #<Process::Status: pid 1153687 exit 0>
Signal handler reaped nothing
However, other iterations of the loop print this. The signal handler runs and calls Process.waitpid2(-1) before the code in open3 can do so. Then, the gem code does not get a Process::Status object! This is also potentially bad for the application; it reaped a child process it didn't even know existed, and it might cause some surprising bugs if the application author didn't know this was a possibility.
Signal handler reaped 1153596 #<Process::Status: pid 1153596 exit 0>
Signal handler reaped nothing
ran command, got ohaithar nil
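The interleaving above depends on timing. As a minimal sketch, the same steal can be forced deterministically by reaping with -1 before the "library" waits on its own pid. The method name demonstrate_steal is made up for illustration, and the sketch assumes a Unix-like system with /bin/sh and no other outstanding child processes.

```ruby
# Deterministic version of the race: the "application" reaps with -1
# before the "library" gets to wait on the pid it spawned.
def demonstrate_steal
  # Library side: spawn a short-lived child and remember its pid.
  library_pid = Process.spawn("/bin/sh", "-c", "exit 3")

  # Application side: a blocking wait on "any child" consumes the status.
  reaped_pid, reaped_status = Process.waitpid2(-1)

  # Library side again: the child is already reaped, so this wait fails.
  library_result =
    begin
      Process.waitpid2(library_pid)
    rescue Errno::ECHILD
      :echild
    end

  { spawned: library_pid, reaped: reaped_pid,
    exit: reaped_status.exitstatus, library: library_result }
end

p demonstrate_steal
```

The application ends up holding the exit status, while the library only learns that its child no longer exists.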
We would like a family of APIs which allow a gem to spawn a child process and guarantee that the gem can wait on it. Some concurrent call to Process.waitpid2(-1) (or even Process.waitpid2($some_lucky_guess_for_the_pid)) should not steal the status out from underneath the code which created the process. Ideally, we should even suppress the SIGCHLD signal to avoid the application signal handler needlessly waking up.
Proposed Ruby-level APIs
I propose we create the following new methods in Ruby.
Process.spawn_private
Process.fork_private
These methods behave identically to their non-_private versions in all respects, except instead of returning a pid, they return an object of type Process::PrivateHandle.
Process::PrivateHandle would have the following methods:
- pid() - returns the pid for the created process.
- wait() - blocks the caller until the created process has exited, and returns a Process::Status object. If the handle has already had #wait called on it, it immediately returns the same Process::Status object as was returned then. This is unlike Process.waitpid and friends, which would raise an ECHILD in this case (or, in the face of pid wraparound, potentially wait on some other totally unrelated child process with the same pid).
- wait_nonblock() - if the created process has exited, behaves like #wait; otherwise, it returns a Process::Status object for which #exited? returns false.
- kill(...) - if the created process has not been reaped via a call to #wait, performs identically to Process.kill ..., pid. Otherwise, if the process has been reaped, raises Errno::ESRCH immediately without issuing a system call. This ensures that, if pids wrap around, the wrong process is not signaled by mistake.
A call to Process.wait, Process.waitpid, or Process.waitpid2 will never return a Process::Status for a process started with a _private method, even if that call is made with the pid of the child process. The only way to reap a private child process is through Process::PrivateHandle.
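To make the intended #wait and #kill semantics concrete, here is a rough emulation on top of today's Process API. The class name EmulatedPrivateHandle and its constructor are hypothetical and are not part of the proposal. The sketch does NOT provide the isolation requested above, since a concurrent Process.waitpid2(-1) could still steal the status. wait_nonblock is left out because current Ruby has no public way to build a "not yet exited" Process::Status.

```ruby
# Hypothetical emulation of the proposed handle semantics using the
# existing Process API. Illustrates caching and the ESRCH rule only;
# it offers no protection against Process.waitpid2(-1) elsewhere.
class EmulatedPrivateHandle
  attr_reader :pid

  def initialize(*command)
    @pid = Process.spawn(*command)
    @status = nil
  end

  # Blocks until the child exits. Later calls return the very same
  # Process::Status object without making another wait call.
  def wait
    @status ||= Process.waitpid2(@pid).last
  end

  # Signals the child only while it has not been reaped through this
  # handle; afterwards the pid may have been reused, so refuse.
  def kill(signal)
    raise Errno::ESRCH if @status
    Process.kill(signal, @pid)
  end
end

handle = EmulatedPrivateHandle.new("/bin/sh", "-c", "exit 7")
first = handle.wait
second = handle.wait
puts "exit status: #{first.exitstatus}"
puts "same object on second wait: #{first.equal?(second)}"
```

Caching the status inside the handle is what lets a second #wait succeed where a second Process.waitpid would raise ECHILD.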
The implementation of IO.popen, Kernel#system, Kernel#popen, backticks, and the Open3 module would be changed to use this private process mechanism internally, although they do not return pids so they do not need to have their interfaces changed. (Note, though: I don't believe Kernel#system suffers from the same problem as the open3 example above, because it does not yield the GVL nor check interrupts in between spawning the child and waiting on it.)
Implementation strategy
I believe this can be implemented, in broad strokes, with an approach like this:
- Keep a global table mapping pids -> handles for processes created with fork_private or spawn_private.
- When a child process is waited on, consult the handle table. If there is a handle registered, and the wait call was made without the handle, do NOT return the reaped status. Instead, save the status against the handle, and repeat the call to waitpid.
- If the wait call was made with the handle, we can return the reaped status directly.
- Once a handle has had the child status saved against it, it is removed from the table.
- A subsequent call to wait on that pid via the handle will look up the saved information and return it without making a system call.
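The bookkeeping in the list above can be sketched as a toy model in plain Ruby. No real processes are involved: pids and statuses are ordinary values, and the names HandleTable, register and reaped are hypothetical, not taken from the POC or from process.c.

```ruby
# Toy model of the pid -> handle table. All names are hypothetical.
class HandleTable
  Handle = Struct.new(:pid, :saved_status)

  def initialize
    @by_pid = {}
  end

  # Called when a private child is created; returns the new handle.
  def register(pid)
    @by_pid[pid] = Handle.new(pid, nil)
  end

  # Called with each (pid, status) pair the OS reports as reaped.
  # Returns the pair if the caller may see it, or nil if the status was
  # diverted to a handle and the caller should repeat its wait.
  def reaped(pid, status, via_handle: false)
    handle = @by_pid.delete(pid)          # saved once, then dropped from the table
    return [pid, status] if handle.nil?   # ordinary, non-private child
    handle.saved_status = status          # kept for later waits on the handle
    via_handle ? [pid, status] : nil
  end
end

table = HandleTable.new
handle = table.register(100)
p table.reaped(200, :exit_0)   # ordinary child: visible to any waiter
p table.reaped(100, :exit_1)   # private child reaped by waitpid(-1): hidden
p handle.saved_status          # the handle still gets its status
```

Removing the entry as soon as the status is saved keeps the table from matching an unrelated process that later reuses the same pid.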
In fact, most of the infrastructure to do this correctly is already in place - it was added by @k0kubun (Takashi Kokubun) and @normalperson (Eric Wong) four years ago - https://bugs.ruby-lang.org/issues/14867. MJIT had a similar problem to the one described in this issue; it needs to fork a C compiler, but if the application performs a Process.waitpid2(-1), it could wind up reaping the gcc process out from underneath mjit. This code has changed considerably over the course of last year, but my understanding is that mjit still uses this infrastructure to protect its child process from becoming visible to Ruby code.
In any case, the way waitpid currently works is that:
- Ruby actually does all calls to waitpid as WNOHANG (i.e. nonblocking) internally.
- If a call to waitpid finds no children, it blocks the thread, representing the state in a structure of type struct waitpid_state.
- Ruby also keeps a list of all waitpid_state's that are currently being waited for, vm->waiting_pids and vm->waiting_grps.
- These structures are protected with a specific mutex, vm->waitpid_lock.
- Ruby internally uses the SIGCHLD signal to reap the dead children, and then finds a waiting call to waitpid (via the two lists) to actually dispatch the reaped status to.
- If some caller is waiting for a specific pid, that always takes priority over some other caller that's waiting for a pid-group (e.g. -1).
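The priority rule in the last bullet can be illustrated with a small model. This is only a sketch: Waiter and pick_waiter are made-up names, the two lists are collapsed into a single array, and "pid-group" is simplified to the -1 ("any child") case.

```ruby
# Toy model of dispatching a reaped status to a waiting caller.
# Names are hypothetical and do not mirror the structures in process.c.
Waiter = Struct.new(:name, :target) # target: a specific pid, or -1 for any child

def pick_waiter(waiters, reaped_pid)
  # A waiter for the exact pid wins over one waiting for any child.
  waiters.find { |w| w.target == reaped_pid } ||
    waiters.find { |w| w.target == -1 }
end

waiters = [Waiter.new("application", -1), Waiter.new("mjit", 42)]
puts pick_waiter(waiters, 42).name
puts pick_waiter(waiters, 7).name
```

Even though the application's wildcard waiter is registered first, the status for pid 42 goes to the waiter that asked for pid 42 specifically.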
mjit's child process is protected, because:
- When mjit forks, it uses a method rb_mjit_fork to do so.
- That calls the actual fork implementation whilst still holding vm->waitpid_lock.
- Before yielding the lock, it inserts an entry in vm->waiting_pids saying that mjit is waiting for the just-created child.
- Since direct waits for pids always take precedence over pid-groups, this ensures that mjit will always reap its own children.
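The ordering that matters here, registering the waiter before the lock is released, can be sketched in Ruby as follows. WAITPID_LOCK, WAITING_PIDS and spawn_protected are hypothetical stand-ins for vm->waitpid_lock, vm->waiting_pids and rb_mjit_fork; the real code is C inside the VM and is not reproduced here.

```ruby
# Sketch of "create the child and register the waiter under one lock".
# The constants and the method are illustrative stand-ins only.
WAITPID_LOCK = Mutex.new
WAITING_PIDS = {}

def spawn_protected(owner, *command)
  WAITPID_LOCK.synchronize do
    pid = Process.spawn(*command)
    # Registered before the lock is released, so no reaper that takes
    # the same lock can observe the child without also seeing its owner.
    WAITING_PIDS[pid] = owner
    pid
  end
end

pid = spawn_protected(:helper, "/bin/sh", "-c", "exit 0")
puts "registered owner: #{WAITING_PIDS[pid].inspect}"
Process.wait(pid) # reap the child so the sketch leaves nothing behind
```

If the registration happened after the lock was released, a SIGCHLD arriving in between could be dispatched to a wildcard waiter first.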
I believe this mechanism can be extended and generalised to power the proposed API, and mjit could itself use that rather than having mjit-specific handling in process.c.
POC implementation
I sketched out a very rough POC to see if what I said above would be possible, and I think it is:
https://github.com/ruby/ruby/commit/6009c564b16862001535f2b561f1a12f6e7e0c57
The following script behaves how I expect with this patch:
pid, h = Process.spawn_private "/bin/sh", "-c", "sleep 1; exit 69"
puts "pid -> #{pid}"
puts "h -> #{h}"
sleep 2
# The child has exited by now, but waitpid2(-1) must not see it: should raise ECHILD.
begin
  Process.waitpid2(-1)
rescue => e
  puts "waitpid err -> #{e}"
end
wpid, status = h.wait
puts "wpid -> #{wpid}"
puts "status -> #{status.inspect}"
ktsanaktsidis@lima-linux1 ruby % ./tool/runruby.rb -- ./tst1.rb
pid -> 1154105
h -> #<Process::PrivateHandle:0x0000ffff94014098>
waitpid err -> No child processes
wpid -> 1154105
status -> #<Process::Status: pid 1154105 exit 69>
The child process can be waited on with the handle, and the call to waitpid2(-1) finds nothing.
Previous idea: OS-specific handles
My first version of this proposal involved a similar API, but powering it with platform-specific concepts available on Linux, Windows, and FreeBSD which offer richer control than just pids & the wait syscall. In particular, I had believed that we could use the clone syscall in Linux to create a child process which:
- Could be referred to by a unique file descriptor (a pidfd) which would be guaranteed never to be re-used (unlike a pid),
- Would not generate a signal when it exited (i.e. no SIGCHLD),
- Could not be waited on by an unsuspecting call to waitpid (except if a special flag __WCLONE was passed).
Unfortunately, when I tried to implement this, I ran into a pretty serious snag. It is possible to create such a process - BUT, when the process exec's, it goes back to "raise-SIGCHLD-on-exit" and "allow-waiting-without-__WCLONE" modes. I guess this functionality in the clone syscall is really designed to power threads in Linux, rather than being a general-purpose "hidden process" API.
So, I don't think we should use pidfds in this proposal.
Motivation
My use-case for this is that I’m working on a perf-based profiling tool for Ruby. To get around some Linux capability issues, I want my profiler gem (or CRuby patch, whatever it winds up being!) to fork a privileged helper binary to do some eBPF twiddling. But, if you’re profiling e.g. a Unicorn master process, the result of that binary exiting might be caught by Unicorn itself, rather than my (gem | interpreter feature).
In my case, I'm so deep in Linux-specific stuff that just calling clone(2) from my extension is probably fine, but I had enough of a look at this process management stuff that I thought it would be worth asking whether this might be useful to other, more normal, gems.