Feature #20590

Updated by byroot (Jean Boussier) 5 months ago

NB: Opening this as a feature because I don't have any clear bug report or repro script, but in a way this is a bug. 

 ### Context 

 For better or for worse, fork(2) remains the primary provider of parallelism in Ruby programs, and will likely continue to be for the foreseeable future. 

 Even though it's frowned upon in many circles, and a lot of literature will simply state that only async-signal safe APIs are safe to use after `fork()`, 
 in practice, most APIs work well as long as you are careful not to fork while another thread is holding a pthread mutex, and the general advice is simply to not start any thread before calling `fork(2)`. 

 This became much harder (if not impossible) to ensure in Ruby 3.3 following [Feature #19965]. Every call to `Addrinfo.getaddrinfo` now starts a native thread that will call `getaddrinfo(3)`. 
 And unless I'm misreading the code, that thread may even be left running if the call is interrupted. This is particularly problematic because, at least in the `glibc` implementation, `getaddrinfo(3)` 
 acquires a mutex. So if a fork happens while this mutex is held, the child inherits it in a locked state with no thread left to release it, and any call to `getaddrinfo(3)` in the child will deadlock. 
 This is a fairly well-known fork-safety problem ([just one example](https://emptysqua.re/blog/getaddrinfo-deadlock/)). 

 I don't have a reproducer to demonstrate this bug, but I have heard several reports of processes deadlocking after fork from people upgrading to Ruby 3.3, which seem related to or could be explained by this issue. 

 ### Proposal 

 I think we could reduce the impact of this problem by locking around `fork(2)` and `getaddrinfo(3)` with a read-write lock. 

 `Process.fork` would acquire the lock in write mode, and `getaddrinfo` would acquire it in read mode. 

 The obvious downside, of course, is that an interrupted `getaddrinfo` call may take a very long time to time out and release the lock, 
 delaying the `Process.fork` call for a while. That's far from ideal, but I don't have any better idea. 
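
 To make the locking scheme above concrete, here is a minimal Ruby-level sketch. It is only an illustration of the idea, not the code in the linked pull request: `RWLock`, `FORK_LOCK`, `getaddrinfo_with_lock` and `fork_with_lock` are hypothetical names, and Ruby's standard library has no read-write lock, so one is built here from `Mutex` and `ConditionVariable`. It also assumes every lookup and every fork goes through these wrappers. A Ruby-level lock cannot cover a native resolver thread that keeps running after its caller was interrupted, so the real fix would have to live inside the interpreter.

```ruby
require "socket"

# Hypothetical minimal read-write lock (not part of Ruby's standard library).
class RWLock
  def initialize
    @mutex = Mutex.new
    @cond = ConditionVariable.new
    @readers = 0
    @writer = false
  end

  # Any number of readers may hold the lock together, but never with a writer.
  def with_read_lock
    @mutex.synchronize do
      @cond.wait(@mutex) while @writer
      @readers += 1
    end
    begin
      yield
    ensure
      @mutex.synchronize do
        @readers -= 1
        @cond.broadcast if @readers.zero?
      end
    end
  end

  # A writer waits until no reader and no other writer holds the lock.
  # Note: this simple version can starve the writer if readers keep arriving.
  def with_write_lock
    @mutex.synchronize do
      @cond.wait(@mutex) while @writer || @readers > 0
      @writer = true
    end
    begin
      yield
    ensure
      @mutex.synchronize do
        @writer = false
        @cond.broadcast
      end
    end
  end
end

FORK_LOCK = RWLock.new # hypothetical process-wide lock

# Read mode: lookups may overlap each other, but not a fork.
def getaddrinfo_with_lock(host, service)
  FORK_LOCK.with_read_lock { Addrinfo.getaddrinfo(host, service) }
end

# Write mode: fork only once no lookup is in flight.
def fork_with_lock
  pid = FORK_LOCK.with_write_lock { Process.fork }
  return pid if pid # parent: the write lock is already released

  # Child: its copy of the lock was released by the ensure block above,
  # so lookups made inside the given block cannot wait on it forever.
  status = 1
  begin
    yield
    status = 0
  ensure
    exit!(status)
  end
end
```

 With these helpers, `fork_with_lock { ... }` blocks until every in-flight `getaddrinfo_with_lock` call has returned, which is exactly the delay described above as the downside. The fork is done without a block inside the write lock so that the child releases its copy of the lock before running any user code.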

 I implemented a proof of concept at: https://github.com/ruby/ruby/pull/10864 

 cc @mame @ko1
