Project

General

Profile

Feature #19322

Updated by kjtsanaktsidis (KJ Tsanaktsidis) almost 2 years ago

## Background 

 The traditional Unix process APIs (`fork` etc) are poorly isolated. If a library spawns a child process, this is not transparent to the program using the library. Any signal handler for `SIGCHLD` in the program will be called when the spawned process exits, and even worse, if the parent calls `Process.waitpid2(-1)`, it will consume the returned status code, stealing it from the library! 

 Unfortunately, the practice of responding to `SIGCHLD` by calling `waitpid2(-1)` in a loop is a pretty common unixism. For example, Unicorn does it [here](https://yhbt.net/unicorn.git/tree/lib/unicorn/http_server.rb#n401). In short, there is no reliable way for a gem to spawn a child process in a way that can’t (unintentionally) be interfered with by other parts of the program. 

 ## Problem statement Existing solutions in OS’s 

 Consider Several operating systems provide an improved API for spawning child processes which are fully isolated; that is, they do not generate `SIGCHLD` signals in the following program. program, and are invisible to calls to `waitpid(2)` 

 ```ruby 
 # Imagine this part * On Linux, such invisible processes can be made by calling `clone(2)` with a zero value in the low byte of `flags`. If the program CLONE_PIDFD flag is in some top-level application event loop 
 # or something - similar to how Unicorn works. It detects child processes exiting 
 # and takes some action (possibly restarting also provided, then a crashed worker, for example). 
 Signal.trap(:CHLD) do 
   loop do 
     begin 
       pid, status = Process.waitpid2 -1 
       puts "Signal handler reaped #{pid} #{status.inspect}" 
     rescue Errno::ECHILD 
       puts "Signal handler reaped nothing" 
       break 
     end 
   end 
 end 

 # Imagine that _this_ part of file descriptor representing the program process is buried deep in some gem. It knows 
 # nothing about the application SIGCHLD handling, and quite possibly the application 
 # author might not even know also returned; this gem spawns a child process can be used to do its work! 
 require 'open3' 
 loop do 
   o, status = Open3.capture2("/bin/sh", "-c", "echo 'hello'") 
   puts "ran command, got #{o.chomp} #{status.inspect}" 
 end 
 ``` 

 In current versions of Ruby, _some_ loop iterations will function correctly, wait for and print something like this. The gem gets signal the `Process::Status` object from its command and can know if e.g. it exited abnormally. 

 ``` process in a race-free way. 
 ran command, got ohaithar #<Process::Status: pid 1153687 exit 0> 
 Signal handler reaped nothing 
 ``` 

 However, other iterations of * On FreeBSD, the loop print this. The `pdfork(2)` syscall makes a process that does not signal handler runs SIGCHLD and is ignored by `waitpid(2)` calls `Process.waitpid2(-1)` before the code in open3 can that do so. Then, not explicitly specify the gem code does not get a `Process::Status` object! This pid (i.e. it is ignored when -1 is passed). It also potentially bad for the application; it reaped returns a child process it didn't even know existed, and it might cause some surprising bugs if file descriptor representing the application author didn't know this was a possibility. process. 

 ``` 
 Signal handler reaped 1153596 #<Process::Status: pid 1153596 exit 0> 
 Signal handler reaped nothing 
 ran command, got ohaithar nil 
 ``` 

 We would like a family Both of these APIs which allow center around the idea of a gem to spawn process file descriptor. Rather than managing a child process and guarantees that using the gem old process-global wait/signal mechanisms, they return a file descriptor representing the process. Such a file descriptor can uniquely identify the spawned process, be used to wait on it. Some concurrent call to `Process.waitpid2(-1)` (or even `Process.waitpid2($some_lucky_guess_for_the_pid)`) should not steal the status out from underneath process and get the code which created the process. Ideally, we should status, send signals, and even suppress participate in `poll(2)`. They also protect against pid-reuse race conditions; after a process has terminated and been reaped, the `SIGCHLD` signal pidfd becomes invalid, and can’t randomly begin to avoid the application signal handler needlessly waking up. 


 refer to a different process. 

 ## Proposed Ruby-level APIs. Ruby APIs 

 I propose think we create the following should make a new methods in Ruby. 

 * `Process.spawn_private` 
 * `Process.fork_private` 

 These methods behave identically to their non-_private versions in API `Process.spawn_handle`, which accepts all respect, except instead of returning the same parameters as `Process.spawn`. However, it does _not_ return a pid, they return an object of pid like `Process.spawn`, but rather a new type `Process::PrivateHandle`. `Process::Handle`. 

 `Process::PrivateHandle` `Process::Handle` would have identify a single spawned process, using a durable OS-supplied handle not subject to re-use risks (e.g. a pidfd). It would provide the following methods: 

 * `pid()` `#pid` - returns get the pid for that the created process handle is for. 
 * `wait()` `#send_signal(signal)` - blocks send a signal to the caller until the created wrapped process has exited, and returns (where "signal" is a `Process::Status` object. If symbol, string, or number with the handle has _already_ had `#wait` called on it, it returns the same `Process::Status` object meaning as was returned then immediately. This is unlike `Process.waitpid` and friends, which would raise an ECHILD in this case (or, in the face of pid wraparound, potentially wait on some other totally unrelated child process with the same pid). `Process.kill`. 
 * `wait_nonblock()` `#wait` - if blocks waiting for the created process has exited, behaves like `#wait`; otherwise, it program to exit, and then returns a `Process::Status` object for which `#exited?` returns false. representing e.g. the exit code. Like calling `waitpid`. 
 * `kill(...)` `#wait_nonblock` - if the created process has not been reaped via Returns a call to `#wait`, performs identically to `Process.kill ..., pid`. Otherwise, if `Process::Status` object for the process _has_ been reaped, raises `Errno::ESRCH` immediately without issuing a system call. This ensures that, if pids wrap around, that child process. If the wrong process is child has not signaled by mistake. 

 A call to `Process.wait`, `Process.waitpid`, or `Process.waitpid2` exited, it will _never_ return be a `Process::Status` status object for a process started with a `_private` method, even if that call which `#exited?` is made with the pid of the child process. The _only_ way to reap a private child process is through `Process::PrivateHandle`. false. Does not block. Like calling `waitpid(WNOHANG)`. 

 The implementation of `IO.popen`, `Kernel#system`, `Kernel#popen`, backticks, and Finally, the `Open3` module family of methods would be changed extended to use this private process mechanism internally, although they do not return pids so they do not need accept `handle:` as an additional keyword argument. When set to have their interfaces changed. (note though - I don't believe `Kernel#system` suffers from true, `Process.spawn_handle` will be used to start the same problem as the `open3` example above, because it does not yield the GVL nor check interrupts child, and `Process::Handle` objects will be returned in between spawning the child place of pids. 

 Modifying backticks, `Kernel#system` and waiting on it) other process-creating methods which don't return pids to use `spawn_handle` internally would also be possible, but out of scope for an initial implementation of this ticket. 

 ## Implementation strategy OS compatibility 

 I believe For this can API to be implemented, in broad strokes, with an approach like this: 

 * Keep a global table mapping pids -> handles for processes created with `fork_private` or `spawn_private`. 
 * When a child process is waited on, consult useful to gem authors, it has to be widely available on the handle table. If there is a handle registered, systems that they and their users care about. As discussed, the wait call was made without the handle, do NOT return the reaped status. Instead, save the status against the handle, `clone(2)` syscall and repeat the call `CLONE_PIDFD` flag can be used on Linux 5.2+ to `waitpid`. 
 * If implement `Process::Handle`. FreeBSD has `pdfork(2)` since v9. 

 I haven’t investigated Windows _deeply_, but I think Windows doesn’t really have the wait call _was_ made with the handle, we can return the  
 * Once notion of process-global `waitpid` or `SIGCHLD` anyway. The `CreateProcess` function returns a handle has had `PROCESS_INFORMATION` struct, which returns a `HANDLE` for the child status saved against it, it is removed from the table. 
 * A subsequent call process, which seems analogous to wait on that pi the handle will look up the saved information and return it without making a system call. process FD. 

 In fact, most of the infrastructure to do However this correctly is already in place - it was added by @k0kubun and @normalperson four years ago - https://bugs.ruby-lang.org/issues/14867. MJIT had does leave a similar problem to the one described in large chunk of operating systems which don’t have this issue; it needs to fork a C compiler, but if functionality built-in. Off the application performs a `Process.waitpid2(-1)`, it could wind up reaping the gcc process out from underneath mjit. This code has changed considerably over the course top of last year, but my understanding is that mjit still uses this infrastructure to protect its Ruby child-process from becoming visible to Ruby code. head: 

 In any case, the way waitpid works _currently_, is that... 

 * Ruby actually does all calls to `waitpid` as `WNOHANG` (i.e. nonblocking) internally. 
 * If a call to `waitpid` finds no children, it blocks the thread, representing the state in a structure of type `struct waitpid_state`. 
 * Ruby also keeps a list of all `waitpid_state`'s that are currently being waited for, `vm->waiting_pids` MacOS, NetBSD, and `vm->waiting_grps`. 
 * These structures are protected with a specific mutex, `vm->waitpid_lock`. 
 * Ruby internally uses OpenBSD have nothing. I stared pretty hard at the SIGCHLD signal to reap the dead children, Darwin XNU source and then couldn’t find a waiting call race-free way to `waitpid` (via the two lists) convince it not to actually dispatch the reaped status to.  
 * If some caller is waiting `SIGCHLD` for a specific pid, that _always_ takes priority over some other caller that's waiting for a pid-group (e.g. `-1`). 

 mjit's child particular process is protected, because: 

 * When mjit forks, or stop it uses a method `rb_mjit_fork` to do so. from being reaped by process-wide `wait4` calls. 
 * That calls Linux < 5.2 is in some probably-pretty-widely-deployed-still distros - it’s the actual `fork` implementation _whilst still holding_ `vm->waitpid_lock` 
 * Before yielding the lock, it inserts an entry release kernel in `vm->waiting_pids` saying that mjit is waiting Ubuntu 18.04 for the just-created child. 
 * Since direct waits for pids always take precedence over pid-groups, this ensures that mjit will always reap its own children. example. 

 I believe this mechanism can be extended and generalised to power have two ideas for how the proposed API, and mjit semantics of `Process::Handle` could itself use be emulated on such systems. However I recognise that rather than having mjit-specific handling in `process.c`. 

 ## POC implementation 

 I sketched out a _very_ rough POC to see they aren’t amazing so if what anybody has some better ideas I said above would be possible, and I think it is: dearly love to hear them. 

 https://github.com/ruby/ruby/commit/6009c564b16862001535f2b561f1a12f6e7e0c57 ### Long-lived proxy 

 The following script behaves how I expect first time `Process.spawn_handle` is used, we would fork/exec a long-lived “fork-helper” program. This could be a separate helper binary we compile with this patch: 

 ```ruby 
 pid, h = Process.spawn_private "/bin/sh", "-c", "sleep 1; exit 69" 
 puts "pid -> #{pid}" 
 puts "h -> #{h}" 

 # should ESRCH. 
 sleep 2 
 begin 
     Process.waitpid2 -1 
 rescue => e 
     puts "waitpid err -> #{e}" 
 end 
 wpid, status = h.wait 
 puts "wpid -> #{wpid}" 
 puts "status -> #{status.inspect}" 
 ``` 

 ``` 
 ktsanaktsidis@lima-linux1 the build system, or perhaps just a re-invocation of the ruby % ./tool/runruby.rb -- ./tst1.rb 
 pid -> 1154105 
 h -> #<Process::PrivateHandle:0x0000ffff94014098> 
 waitpid err -> No child processes 
 wpid -> 1154105 
 status -> #<Process::Status: pid 1154105 exit 4> 
 ``` 

 The child process can interpreter with something like `ruby -e “Process._fork_helper”`. There would be waited on with a unix socketpair shared between the handle, and parent process & the call to `waitpid2(-1)` finds nothing. helper. 

 ## Previous idea: OS-specific handles 

 My first version Instead of this proposal involved actually forking when we’re calling `Process.spawn_handle`, we would instead send a similar API, but powering it with platform-specific concepts available message on Linux, Windows, and FreeBSD which offer richer control than just pids this socket asking the helper to, _itself_, fork & exec the `wait` syscall. In particular, I had believed that we could use specified program. Any file descriptors etc needed in the `clone` syscall in Linux to create a child process which: 

 * Could could also be referred to by a unique file descriptor (a pidfd) which sent over this socket. All of the `Process::Handle` methods would be guaranteed proxies which called through to the helper binary. 

 This way, the ruby process is never to be re-used (unlike actually the parent of the spawned child, so we would never get any SIGCHLD etc from it. The fork-helper program might generate a pid), 
 * Would not SIGCHLD, but it should persist until the ruby process exits; we would only generate a SIGCHLD signal when if it exited (i.e. no SIGCHLD). 
 * Could not be waited on by crashed abnormally. 

 ### Forward misdirected waits 

 With this approach, `Process.spawn_handle` would just `fork(2)`/`exec(2)` or `posix_spawn(2)` processes like normal. We would however keep a table of pids -> `Process::Handle` instances. 

 When Ruby’s C-level SIGCHLD handler is invoked, we would inspect that table and see if the pid has an unspecting to associated `Process::Handle`. If so, we would skip calling any registered Ruby SIGCHLD handler; instead, we would call `waitpid` (except if a special flag `__WCLONE` ourselves, update the status info on the handle object, and unblock anybody waiting on `Process::Handle#wait`. 

 Likewise, in the C-side implementation of `Process.waitpid2` etc, we would check the returned pid from the syscall against the handle table. If it matched, we would perform the same work as passed). in the SIGCHLD case, and then re-start the original call to `Process.waitpid2`. 

 Unfortunately, when I tried to implement this, I ran into a pretty serious snag. It is possible to create such a This approach keeps the process - BUT, when tree correct and involves less silly proxying, but it won’t hide the process exec's, it goes _back_ from any callers to "raise-SIGCHLD-on-exit" and "allow-waiting-without-__WCLONE" modes. I guess this functionality in the clone syscall is really designed to power threads raw `waitpid` library functions in Linux, rather than being C extensions. Doing that seems like a general-purpose "hidden process" API. 

 So, I don't think we should use pidfds in this proposal. silly idea anyway though, so maybe that’s OK? 

 ## Motivation 

 My use-case for this is that I’m working on a perf-based profiling tool for Ruby. To get around some Linux capability issues, I want my profiler gem (or CRuby patch, whatever it winds up being!) to fork a privileged helper binary to do some eBPF twiddling. But, if you’re profiling e.g. a Unicorn master process, the result of that binary exiting might be caught by Unicorn itself, rather than my (gem | interpreter feature). 

 In my case, I'm so deep in linux specific stuff that just calling `clone(2)` from my extension is probably fine, but I had enough of a look at this process management stuff I thought it would be worth asking the question if this might be useful to other, more normal, gems. 

Back