Bug #19395
closed
Process forking within non-main Ractor hits rb_bug()
Description
def test_fork_in_ractor
  r2 = Ractor.new do
    pid = fork do
      exit Ractor.count
    end
    pid
  end
  pid = r2.take
  puts "Process #{Process.pid} waiting for #{pid}"
  _pid, status = Process.waitpid2(pid) # stuck forever
  if status.exitstatus != 1
    raise "status is #{status.exitstatus}"
  end
end
test_fork_in_ractor()
$ top # shows CPU usage is high for child process
Updated by luke-gru (Luke Gruber) almost 2 years ago
- Subject changed from Process forking within non-main Ractor creates child stuck in busy loop to Process forking within non-main Ractor causes segv
Sorry, my changes in my dev branch were causing some odd behavior. It just crashes on 3.2.0.
Updated by nobu (Nobuyoshi Nakada) almost 2 years ago
luke-gru (Luke Gruber) wrote in #note-1:
It just crashes on 3.2.0.
I can't reproduce the SEGV on macOS 13.1.
What platform are you using?
Updated by nobu (Nobuyoshi Nakada) almost 2 years ago
- Status changed from Open to Feedback
Updated by luke-gru (Luke Gruber) almost 2 years ago
- ruby -v set to 3.2.0
Ubuntu 22.04 x86-64
Linux 5.15.0-58-generic
libpthread.so.0 (libc6,x86-64, OS ABI: Linux 3.2.0)
The issue seems to be calling rb_native_mutex_destroy on a locked mutex in ractor_free.
Relevant part of the backtrace:
/home/lukeg/.rbenv/versions/3.2.0/lib/libruby.so.3.2(die+0x0) [0x7fc1374d0e5f] /tmp/ruby-build.20230103230257.28392.jU9iPR/ruby-3.2.0/error.c:798
/home/lukeg/.rbenv/versions/3.2.0/lib/libruby.so.3.2(rb_bug) /tmp/ruby-build.20230103230257.28392.jU9iPR/ruby-3.2.0/error.c:800
/home/lukeg/.rbenv/versions/3.2.0/lib/libruby.so.3.2(rb_bug_errno+0x43) [0x7fc137579223] /tmp/ruby-build.20230103230257.28392.jU9iPR/ruby-3.2.0/error.c:829
/home/lukeg/.rbenv/versions/3.2.0/lib/libruby.so.3.2(rb_native_mutex_destroy+0x24) [0x7fc137719a24] /tmp/ruby-build.20230103230257.28392.jU9iPR/ruby-3.2.0/thread_pthread.c:603
/home/lukeg/.rbenv/versions/3.2.0/lib/libruby.so.3.2(ractor_free+0x11) [0x7fc137679991] /tmp/ruby-build.20230103230257.28392.jU9iPR/ruby-3.2.0/ractor.c:235
/home/lukeg/.rbenv/versions/3.2.0/lib/libruby.so.3.2(run_final+0xf)
If instead you change exit 0 to exec "date", it doesn't crash. Maybe the atfork hooks need to be changed to acquire locks in parent, unlock in child.
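For illustration, the "acquire in parent, unlock in child" idea can be sketched with plain pthread_atfork outside of Ruby. This is only a sketch under assumptions: demo_lock and the handler names are hypothetical stand-ins, not identifiers from ruby's source, and Ruby's own fork hooks may be structured differently.

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical lock standing in for a per-ractor lock (not a name from ruby's source). */
static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;

/* prepare handler: runs in the forking thread before fork(), so the lock
 * is held by a thread that will also exist in the child. */
static void before_fork(void)       { pthread_mutex_lock(&demo_lock); }
/* parent handler: runs in the parent after fork() returns. */
static void after_fork_parent(void) { pthread_mutex_unlock(&demo_lock); }
/* child handler: runs in the child after fork() returns. */
static void after_fork_child(void)  { pthread_mutex_unlock(&demo_lock); }

/* Forks once with the hooks installed. Returns the child's exit status:
 * 0 if the child could destroy the (now unlocked) mutex, 1 if destroy
 * failed, -1 on any other error. */
int fork_with_atfork_hooks(void)
{
    static int registered = 0;
    pid_t pid;
    int status;

    if (!registered) {
        if (pthread_atfork(before_fork, after_fork_parent, after_fork_child) != 0)
            return -1;
        registered = 1;
    }

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: the lock was released by after_fork_child, so it is in a
         * known state and pthread_mutex_destroy is expected to succeed. */
        _exit(pthread_mutex_destroy(&demo_lock) == 0 ? 0 : 1);
    }

    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because the prepare handler takes the lock in the forking thread, no other thread can be holding it at the moment of fork, which is what leaves it safe to unlock (and later destroy) in the child.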
Updated by luke-gru (Luke Gruber) almost 2 years ago
- Subject changed from Process forking within non-main Ractor causes segv to Process forking within non-main Ractor hits rb_bug()
Updated by luke-gru (Luke Gruber) almost 2 years ago
This fixes it:
https://github.com/luke-gru/ruby/commit/16d8e7575570c6b2d24505e3685d6f0147375286
The issue is that when there are multiple ractors and you call fork, the other ractor(s) in the child process that aren't the new main ractor need to be GC'd, and their mutexes could be in a weird state (for example, still locked by a thread that no longer exists in the child). So either skip destroying them or reinitialize them in the child process. Re-init works on my machine but I don't know if it works across platforms.
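The re-init approach can be sketched with plain pthreads, independent of the linked commit. Everything here is an assumption for illustration: other_lock, holder, and the pipe-based handshake are hypothetical, not taken from ruby's source. As the backtrace above suggests, rb_native_mutex_destroy reports a failed destroy through rb_bug_errno, so the sketch reinitializes the mutex in the child before destroying it. Whether reinitializing a mutex that was locked at fork time is portable is exactly the open question; POSIX does not promise it.

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical lock owned by "another ractor's" thread at fork time. */
static pthread_mutex_t other_lock = PTHREAD_MUTEX_INITIALIZER;
static int ready_pipe[2];    /* holder -> main: "I hold the lock" */
static int release_pipe[2];  /* main -> holder: EOF means "let go" */

/* Thread that holds other_lock across the fork; it does not exist in the child. */
static void *holder(void *arg)
{
    char c = 1;
    ssize_t n;
    (void)arg;
    pthread_mutex_lock(&other_lock);
    n = write(ready_pipe[1], &c, 1);  /* tell main the lock is held */
    close(ready_pipe[1]);             /* main sees EOF if the write failed */
    if (n == 1 && read(release_pipe[0], &c, 1) < 0) {
        /* ignored: we only needed to block until main was done */
    }
    pthread_mutex_unlock(&other_lock);
    return NULL;
}

/* Forks while another thread holds other_lock. The child reinitializes the
 * mutex and then destroys it. Returns the child's exit status: 0 on success,
 * 1 if destroy failed, 2 if re-init failed, -1 on any other error. */
int fork_and_reinit_in_child(void)
{
    pthread_t th;
    pid_t pid;
    int status = 0, waited = 0;
    char c = 0;

    if (pipe(ready_pipe) != 0 || pipe(release_pipe) != 0)
        return -1;
    if (pthread_create(&th, NULL, holder, NULL) != 0)
        return -1;
    if (read(ready_pipe[0], &c, 1) != 1)
        return -1;

    pid = fork();
    if (pid == 0) {
        /* Child: the holder thread is gone but other_lock still looks locked.
         * Reset it to a fresh state before tearing it down. */
        if (pthread_mutex_init(&other_lock, NULL) != 0)
            _exit(2);
        _exit(pthread_mutex_destroy(&other_lock) == 0 ? 0 : 1);
    }
    if (pid > 0)
        waited = (waitpid(pid, &status, 0) == pid);

    close(release_pipe[1]);  /* EOF wakes the holder so it can unlock */
    pthread_join(th, NULL);
    close(ready_pipe[0]);
    close(release_pipe[0]);

    if (!waited)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The other option mentioned above, skipping destruction in the child, would simply omit both calls in the child branch and leave the mutex's memory to be reclaimed with the process.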
Updated by luke-gru (Luke Gruber) 9 months ago
I can no longer reproduce this issue. I probably had some changes in my tree that were causing it. Sorry! Please close.
Updated by byroot (Jean Boussier) 9 months ago
- Status changed from Feedback to Closed
Updated by jhawthorn (John Hawthorn) 3 months ago
- Related to Bug #20670: fork deadlocks in child process due to timer thread added