Feature #19322

open

Support spawning "private" child processes

Added by kjtsanaktsidis (KJ Tsanaktsidis) almost 2 years ago. Updated almost 2 years ago.

Status:
Open
Assignee:
-
Target version:
-
[ruby-core:111712]

Description

Background

The traditional Unix process APIs (fork etc) are poorly isolated. If a library spawns a child process, this is not transparent to the program using the library. Any signal handler for SIGCHLD in the program will be called when the spawned process exits, and even worse, if the parent calls Process.waitpid2(-1), it will consume the returned status code, stealing it from the library!

Unfortunately, the practice of responding to SIGCHLD by calling waitpid2(-1) in a loop is a pretty common unixism. For example, Unicorn does it here. In short, there is no reliable way for a gem to spawn a child process in a way that can’t (unintentionally) be interfered with by other parts of the program.

Problem statement

Consider the following program.

# Imagine this part of the program is in some top-level application event loop
# or something - similar to how Unicorn works. It detects child processes exiting
# and takes some action (possibly restarting a crashed worker, for example).
Signal.trap(:CHLD) do
  loop do
    begin
      pid, status = Process.waitpid2(-1)
      puts "Signal handler reaped #{pid} #{status.inspect}"
    rescue Errno::ECHILD
      puts "Signal handler reaped nothing"
      break
    end
  end
end

# Imagine that _this_ part of the program is buried deep in some gem. It knows
# nothing about the application SIGCHLD handling, and quite possibly the application
# author might not even know this gem spawns a child process to do its work!
require 'open3'
loop do
  o, status = Open3.capture2("/bin/sh", "-c", "echo 'ohaithar'")
  puts "ran command, got #{o.chomp} #{status.inspect}"
end

In current versions of Ruby, some loop iterations will function correctly, and print something like this. The gem gets the Process::Status object from its command and can know if e.g. it exited abnormally.

ran command, got ohaithar #<Process::Status: pid 1153687 exit 0>
Signal handler reaped nothing

However, other iterations of the loop print this. The signal handler runs and calls Process.waitpid2(-1) before the code in open3 can do so. Then, the gem code does not get a Process::Status object! This is also potentially bad for the application; it reaped a child process it didn't even know existed, and it might cause some surprising bugs if the application author didn't know this was a possibility.

Signal handler reaped 1153596 #<Process::Status: pid 1153596 exit 0>
Signal handler reaped nothing
ran command, got ohaithar nil
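
The interleaving above depends on timing, but the same outcome can be reproduced deterministically by issuing the catch-all wait before the code that spawned the child gets a chance to wait on it. The following is a minimal sketch and not part of the original report; it assumes the script has no other child processes, and it uses RbConfig.ruby only as a convenient command to run.

```ruby
require 'rbconfig'

# "Gem" side: spawn a child that exits with status 7.
pid = Process.spawn(RbConfig.ruby, "-e", "exit 7")

# "Application" side: a catch-all wait, as a SIGCHLD handler would do.
# Assumes this is the only child of the current process.
stolen_pid, stolen_status = Process.waitpid2(-1)

# "Gem" side again: the status is gone, so waiting on the pid now fails.
gem_result =
  begin
    Process.waitpid2(pid)
  rescue Errno::ECHILD
    :echild
  end

puts "application reaped #{stolen_pid} (exit #{stolen_status.exitstatus})"
puts "gem got #{gem_result.inspect}"
```

Here the application sees the exit status that was meant for the gem, and the gem is left with ECHILD and no status at all.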

We would like a family of APIs which allow a gem to spawn a child process and guarantee that the gem can wait on it. Some concurrent call to Process.waitpid2(-1) (or even Process.waitpid2($some_lucky_guess_for_the_pid)) should not steal the status out from underneath the code which created the process. Ideally, we should even suppress the SIGCHLD signal to avoid the application signal handler needlessly waking up.

Proposed Ruby-level APIs

I propose we create the following new methods in Ruby.

  • Process.spawn_private
  • Process.fork_private

These methods behave identically to their non-_private versions in all respects, except that instead of returning a pid, they return an object of type Process::PrivateHandle.

Process::PrivateHandle would have the following methods:

  • pid() - returns the pid for the created process
  • wait() - blocks the caller until the created process has exited, and returns a Process::Status object. If #wait has already been called on the handle, it immediately returns the same Process::Status object that the earlier call returned. This is unlike Process.waitpid and friends, which would raise ECHILD in this case (or, in the face of pid wraparound, potentially wait on some other totally unrelated child process with the same pid).
  • wait_nonblock() - if the created process has exited, behaves like #wait; otherwise, it returns a Process::Status object for which #exited? returns false.
  • kill(...) - if the created process has not been reaped via a call to #wait, performs identically to Process.kill ..., pid. Otherwise, if the process has been reaped, raises Errno::ESRCH immediately without issuing a system call. This ensures that, if pids wrap around, the wrong process is not signaled by mistake.
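
To make the caching and ESRCH rules concrete, here is a rough pure-Ruby approximation. PrivateHandleSketch is a hypothetical name used only for illustration; it is built on the ordinary Process.spawn and Process.waitpid2, so it cannot provide the isolation this proposal is asking for, and it is not thread-safe. It also returns nil from wait_nonblock while the child is still running, where the proposal would return a Process::Status whose #exited? is false.

```ruby
require 'rbconfig'

# Hypothetical stand-in for the proposed Process::PrivateHandle.
# It only models the caching and ESRCH rules; a Process.waitpid2(-1)
# elsewhere in the program can still reap the child first.
class PrivateHandleSketch
  attr_reader :pid

  def initialize(pid)
    @pid = pid
    @status = nil
  end

  # Blocks until the child exits; later calls return the cached status.
  def wait
    @status ||= Process.waitpid2(@pid).last
  end

  # Returns the status if the child has exited, otherwise nil.
  def wait_nonblock
    return @status if @status
    result = Process.waitpid2(@pid, Process::WNOHANG)
    @status = result.last if result
    @status
  end

  # Refuses to signal once the child has been reaped, so a recycled pid
  # can never be signaled by mistake.
  def kill(signal)
    raise Errno::ESRCH if @status
    Process.kill(signal, @pid)
  end
end

handle = PrivateHandleSketch.new(Process.spawn(RbConfig.ruby, "-e", "exit 3"))
first  = handle.wait
second = handle.wait # answered from the cache, no second system call
puts first.exitstatus
puts first.equal?(second)
```

After the first #wait, calling handle.kill(:TERM) raises Errno::ESRCH without touching the operating system.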

A call to Process.wait, Process.waitpid, or Process.waitpid2 will never return a Process::Status for a process started with a _private method, even if that call is made with the pid of the child process. The only way to reap a private child process is through Process::PrivateHandle.

The implementation of IO.popen, Kernel#system, Kernel#popen, backticks, and the Open3 module would be changed to use this private process mechanism internally, although they do not return pids so they do not need to have their interfaces changed. (Note, though, that I don't believe Kernel#system suffers from the same problem as the open3 example above, because it neither yields the GVL nor checks interrupts between spawning the child and waiting on it.)

Implementation strategy

I believe this can be implemented, in broad strokes, with an approach like this:

  • Keep a global table mapping pids -> handles for processes created with fork_private or spawn_private.
  • When a child process is waited on, consult the handle table. If there is a handle registered, and the wait call was made without the handle, do NOT return the reaped status. Instead, save the status against the handle, and repeat the call to waitpid.
  • If the wait call was made with the handle, we can return the reaped status directly to the caller.
  • Once a handle has had the child status saved against it, it is removed from the table.
  • A subsequent call to wait on the handle will look up the saved information and return it without making a system call.
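
The bookkeeping in this list can be modelled in a few lines of Ruby. This is a toy simulation only, with no real processes involved: Handle, HandleTable, route and generic_wait are hypothetical names, and the "exit events" are plain arrays standing in for what the operating system would report.

```ruby
# Toy model of the handle table; statuses are arbitrary placeholder values.
Handle = Struct.new(:pid, :saved_status)

class HandleTable
  def initialize
    @by_pid = {}
  end

  # Called by the (imaginary) spawn_private / fork_private.
  def register(pid)
    @by_pid[pid] = Handle.new(pid, nil)
  end

  # A child with +pid+ was reaped with +status+. Returns the status if an
  # ordinary waiter may see it. If the pid is private, the status is saved
  # on its handle, the handle leaves the table, and nil is returned.
  def route(pid, status)
    handle = @by_pid.delete(pid)
    return status if handle.nil?
    handle.saved_status = status
    nil
  end
end

# Simulated Process.waitpid2(-1): keeps consuming exit events until it
# finds one that does not belong to a private handle.
def generic_wait(table, events)
  until events.empty?
    pid, status = events.shift
    visible = table.route(pid, status)
    return [pid, visible] if visible
  end
  raise Errno::ECHILD
end

table   = HandleTable.new
private_handle = table.register(10)
events  = [[10, :status_of_private_child], [11, :status_of_public_child]]

result = generic_wait(table, events)
puts result.inspect
puts private_handle.saved_status.inspect
```

The generic wait skips pid 10, stores its status on the handle, and returns the status for pid 11 instead. A later wait on the handle would simply read saved_status.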

In fact, most of the infrastructure to do this correctly is already in place - it was added by @k0kubun (Takashi Kokubun) and @normalperson (Eric Wong) four years ago - https://bugs.ruby-lang.org/issues/14867. MJIT had a similar problem to the one described in this issue; it needs to fork a C compiler, but if the application performs a Process.waitpid2(-1), it could wind up reaping the gcc process out from underneath mjit. This code has changed considerably over the course of last year, but my understanding is that mjit still uses this infrastructure to protect its Ruby child-process from becoming visible to Ruby code.

In any case, waitpid currently works like this:

  • Ruby actually does all calls to waitpid as WNOHANG (i.e. nonblocking) internally.
  • If a call to waitpid finds no children, it blocks the thread, representing the state in a structure of type struct waitpid_state.
  • Ruby also keeps a list of all waitpid_state's that are currently being waited for, vm->waiting_pids and vm->waiting_grps.
  • These structures are protected with a specific mutex, vm->waitpid_lock.
  • Ruby internally uses the SIGCHLD signal to reap the dead children, and then finds a waiting call to waitpid (via the two lists) to actually dispatch the reaped status to.
  • If some caller is waiting for a specific pid, that always takes priority over some other caller that's waiting for a pid-group (e.g. -1).

mjit's child process is protected, because:

  • When mjit forks, it uses a method rb_mjit_fork to do so.
  • That calls the actual fork implementation whilst still holding vm->waitpid_lock
  • Before yielding the lock, it inserts an entry in vm->waiting_pids saying that mjit is waiting for the just-created child.
  • Since direct waits for pids always take precedence over pid-groups, this ensures that mjit will always reap its own children.
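
The priority rule that mjit relies on can be illustrated with a small simulation. This is a sketch of the idea only, not the code in process.c: Waiter and WaitQueue are hypothetical names, and the two arrays merely mirror the roles of vm->waiting_pids and vm->waiting_grps described above.

```ruby
# Toy model: which waiter receives the status of a reaped child?
Waiter = Struct.new(:name, :pid) # pid == -1 stands for "any child"

class WaitQueue
  def initialize
    @waiting_pids = [] # waiters for one specific pid
    @waiting_grps = [] # waiters for a group such as -1
  end

  def add(waiter)
    (waiter.pid > 0 ? @waiting_pids : @waiting_grps) << waiter
    waiter
  end

  # Returns (and removes) the waiter that should get the status of
  # +reaped_pid+. A waiter for that exact pid always wins over a group
  # waiter; nil means nobody is waiting.
  def dispatch(reaped_pid)
    idx = @waiting_pids.index { |w| w.pid == reaped_pid }
    return @waiting_pids.delete_at(idx) if idx
    @waiting_grps.shift
  end
end

queue = WaitQueue.new
queue.add(Waiter.new("application waitpid2(-1)", -1))
queue.add(Waiter.new("mjit", 42)) # registered while the lock is still held

winner_for_42 = queue.dispatch(42)
winner_for_43 = queue.dispatch(43)
puts winner_for_42.name
puts winner_for_43.name
```

Even though the application's catch-all waiter was registered first, the status for pid 42 goes to the waiter that asked for pid 42 specifically, while the unrelated pid 43 falls through to the catch-all.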

I believe this mechanism can be extended and generalised to power the proposed API, and mjit could itself use that rather than having mjit-specific handling in process.c.

POC implementation

I sketched out a very rough POC to see if what I said above would be possible, and I think it is:

https://github.com/ruby/ruby/commit/6009c564b16862001535f2b561f1a12f6e7e0c57

The following script behaves how I expect with this patch:

pid, h = Process.spawn_private "/bin/sh", "-c", "sleep 1; exit 4"
puts "pid -> #{pid}"
puts "h -> #{h}"

# The child has exited by now, but it is private, so this should raise ECHILD.
sleep 2
begin
  Process.waitpid2(-1)
rescue => e
  puts "waitpid err -> #{e}"
end
wpid, status = h.wait
puts "wpid -> #{wpid}"
puts "status -> #{status.inspect}"

ktsanaktsidis@lima-linux1 ruby % ./tool/runruby.rb -- ./tst1.rb
pid -> 1154105
h -> #<Process::PrivateHandle:0x0000ffff94014098>
waitpid err -> No child processes
wpid -> 1154105
status -> #<Process::Status: pid 1154105 exit 4>

The child process can be waited on with the handle, and the call to waitpid2(-1) finds nothing.

Previous idea: OS-specific handles

My first version of this proposal involved a similar API, but powering it with platform-specific concepts available on Linux, Windows, and FreeBSD which offer richer control than just pids & the wait syscall. In particular, I had believed that we could use the clone syscall in Linux to create a child process which:

  • Could be referred to by a unique file descriptor (a pidfd) which would be guaranteed never to be re-used (unlike a pid).
  • Would not generate a signal when it exited (i.e. no SIGCHLD).
  • Could not be waited on by an unsuspecting call to waitpid (unless the special flag __WCLONE was passed).

Unfortunately, when I tried to implement this, I ran into a pretty serious snag. It is possible to create such a process - BUT, when the process exec's, it goes back to "raise-SIGCHLD-on-exit" and "allow-waiting-without-__WCLONE" modes. I guess this functionality in the clone syscall is really designed to power threads in Linux, rather than being a general-purpose "hidden process" API.

So, I don't think we should use pidfds in this proposal.

Motivation

My use-case for this is that I’m working on a perf-based profiling tool for Ruby. To get around some Linux capability issues, I want my profiler gem (or CRuby patch, whatever it winds up being!) to fork a privileged helper binary to do some eBPF twiddling. But, if you’re profiling e.g. a Unicorn master process, the result of that binary exiting might be caught by Unicorn itself, rather than my (gem | interpreter feature).

In my case, I'm so deep in Linux-specific stuff that just calling clone(2) from my extension is probably fine, but I had enough of a look at this process management stuff that I thought it would be worth asking whether this might be useful to other, more normal, gems.
