Bug #19395
closed
Process forking within non-main Ractor hits rb_bug()
Added by luke-gru (Luke Gruber) almost 2 years ago.
Updated 9 months ago.
Description
def test_fork_in_ractor
  r2 = Ractor.new do
    pid = fork do
      exit Ractor.count
    end
    pid
  end
  pid = r2.take
  puts "Process #{Process.pid} waiting for #{pid}"
  _pid, status = Process.waitpid2(pid) # stuck forever
  if status.exitstatus != 1
    raise "status is #{status.exitstatus}"
  end
end
test_fork_in_ractor()
$ top # shows CPU usage is high for child process
- Subject changed from Process forking within non-main Ractor creates child stuck in busy loop to Process forking within non-main Ractor causes segv
Sorry, my changes in my dev branch were causing some odd behavior. It just crashes on 3.2.0.
luke-gru (Luke Gruber) wrote in #note-1:
It just crashes on 3.2.0.
I can't reproduce the SEGV on macOS 13.1.
What platform are you using?
- Status changed from Open to Feedback
Ubuntu 22.04 x86-64
Linux 5.15.0-58-generic
libpthread.so.0 (libc6,x86-64, OS ABI: Linux 3.2.0)
The issue seems to be calling rb_native_mutex_destroy on a locked mutex in ractor_free.
Relevant part of the backtrace:
/home/lukeg/.rbenv/versions/3.2.0/lib/libruby.so.3.2(die+0x0) [0x7fc1374d0e5f] /tmp/ruby-build.20230103230257.28392.jU9iPR/ruby-3.2.0/error.c:798
/home/lukeg/.rbenv/versions/3.2.0/lib/libruby.so.3.2(rb_bug) /tmp/ruby-build.20230103230257.28392.jU9iPR/ruby-3.2.0/error.c:800
/home/lukeg/.rbenv/versions/3.2.0/lib/libruby.so.3.2(rb_bug_errno+0x43) [0x7fc137579223] /tmp/ruby-build.20230103230257.28392.jU9iPR/ruby-3.2.0/error.c:829
/home/lukeg/.rbenv/versions/3.2.0/lib/libruby.so.3.2(rb_native_mutex_destroy+0x24) [0x7fc137719a24] /tmp/ruby-build.20230103230257.28392.jU9iPR/ruby-3.2.0/thread_pthread.c:603
/home/lukeg/.rbenv/versions/3.2.0/lib/libruby.so.3.2(ractor_free+0x11) [0x7fc137679991] /tmp/ruby-build.20230103230257.28392.jU9iPR/ruby-3.2.0/ractor.c:235
/home/lukeg/.rbenv/versions/3.2.0/lib/libruby.so.3.2(run_final+0xf)
If instead you change exit 0 to exec "date", it doesn't crash. Maybe the atfork hooks need to be changed to acquire locks in parent, unlock in child.
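The "acquire locks in parent, unlock in child" idea can be sketched at the Ruby level. This is only an analogy for what atfork-style hooks would do inside the VM, not the VM code itself. STATE_LOCK and fork_with_lock_held are hypothetical names introduced for illustration; the sketch relies on the forking thread still owning a Ruby Mutex in the child, so it can release it there.

```ruby
# Hypothetical stand-in for a lock that protects state shared across fork.
STATE_LOCK = Mutex.new

# Hypothetical helper: hold STATE_LOCK across fork so the child never
# inherits it in a state that some other thread left behind.
def fork_with_lock_held
  STATE_LOCK.lock                # "prepare" step: acquire in the parent
  begin
    pid = Process.fork           # nil in the child, the child's pid in the parent
  ensure
    STATE_LOCK.unlock            # runs on both sides; the forking thread owns the lock
  end
  return pid if pid              # parent: hand back the child's pid

  exit!(yield)                   # child: run the block and exit with its result
end

# The child reports 0 if it sees the lock released, 1 otherwise.
pid = fork_with_lock_held { STATE_LOCK.locked? ? 1 : 0 }
_pid, status = Process.waitpid2(pid)
puts "child exit status: #{status.exitstatus}"
```

The child uses exit! so that no at_exit handlers inherited from the parent run a second time.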
- Subject changed from Process forking within non-main Ractor causes segv to Process forking within non-main Ractor hits rb_bug()
This fixes it:
https://github.com/luke-gru/ruby/commit/16d8e7575570c6b2d24505e3685d6f0147375286
The issue is that when there are multiple ractors and you call fork, the ractors in the child process other than the new main ractor need to be GC'd, and their mutexes could be in an inconsistent state. The fix is to either skip destroying those mutexes or reinitialize them in the child process. Re-init works on my machine, but I don't know if it works across platforms.
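The re-init option can be sketched in Ruby as follows. This is not the code from the linked commit; it only shows the shape of "give the child a fresh lock instead of trusting the inherited one". Registry, REGISTRY and ReinitLocksInChild are hypothetical names, and the sketch assumes Ruby 3.1 or later, where Process._fork is the documented hook that Kernel#fork and Process.fork go through.

```ruby
# Hypothetical object whose lock must be usable in a forked child.
class Registry
  attr_reader :lock

  def initialize
    @lock = Mutex.new
  end

  # Replace the lock with a fresh one rather than reusing inherited state.
  def reinit_after_fork
    @lock = Mutex.new
  end
end

REGISTRY = Registry.new

# Hook fork via Process._fork (assumes Ruby 3.1+).
module ReinitLocksInChild
  def _fork
    pid = super
    REGISTRY.reinit_after_fork if pid == 0   # _fork returns 0 in the child
    pid
  end
end
Process.singleton_class.prepend(ReinitLocksInChild)

# The child should see a new, unlocked Mutex; the parent keeps its own.
parent_lock = REGISTRY.lock
pid = fork do
  fresh = !REGISTRY.lock.equal?(parent_lock)
  exit!(fresh && REGISTRY.lock.try_lock ? 0 : 1)
end
_pid, status = Process.waitpid2(pid)
puts "child exit status: #{status.exitstatus}"
```

Re-initializing avoids having to reason about who held the lock at fork time, at the cost of discarding whatever state the old lock carried.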
I can no longer reproduce this issue. I probably had some changes in my tree that were causing it. Sorry! Please close.
- Status changed from Feedback to Closed
- Related to Bug #20670: fork deadlocks in child process due to timer thread added