Bug #1525

Deadlock in Ruby 1.9's VM caused by ConditionVariable.wait and fork?

Added by hongli (Hongli Lai) over 11 years ago. Updated over 9 years ago.

Target version:
ruby -v: ruby 1.9.1p129 (2009-05-12 revision 23412) [i386-darwin9.6.0]


The following code seems to cause a VM-wide deadlock on 1.9:

require 'thread'

lock = Mutex.new
cond = ConditionVariable.new
t = Thread.new do
  lock.synchronize do
    cond.wait(lock)
  end
end

pid = fork do
  # Child
  STDOUT.write "This is the child process.\n"
  STDOUT.write "Child process exiting.\n"
end
STDOUT.write("Child PID = #{pid}\n")

The expected output is:

Child PID = xxxx
This is the child process.
Child process exiting.

After the exit message, Ruby should exit.

Instead, Ruby 1.9 gives:

Child PID = 15493
This is the child process.
(process hangs here)

Ruby 1.8 does not suffer from this problem.

Upon debugging Ruby, I've found that Ruby is stuck in blocking_region_end(), at the following line:


blocking_region_end() was called as part of rb_write_internal(), right after writing "This is the child process\n" to stdout. This problem only occurs if there's a background thread that's waiting on a ConditionVariable. If you remove the thread then the deadlock does not occur.
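
A minimal sketch for checking this without leaving a hung process behind, assuming a Ruby with Kernel#fork and the standard timeout library. The helper name child_finishes_within? is made up for this example and is not part of Ruby. It runs the block in a forked child and kills the child if it has not exited within the limit, so the hang described above would show up as "child hung" rather than a stuck terminal.

```ruby
require 'timeout'

# Hypothetical helper (not part of Ruby): run the block in a forked child and
# report whether the child exited within `limit` seconds.
def child_finishes_within?(limit)
  STDOUT.flush # avoid duplicating buffered parent output in the child
  pid = fork do
    yield
    STDOUT.flush
    exit!(0) # skip at_exit handlers inherited from the parent
  end
  Timeout.timeout(limit) { Process.wait(pid) }
  true
rescue Timeout::Error
  Process.kill('KILL', pid) # the child is presumed stuck; kill and reap it
  Process.wait(pid)
  false
end

lock = Mutex.new
cond = ConditionVariable.new
# Background thread parked on the condition variable, as in the report.
waiter = Thread.new { lock.synchronize { cond.wait(lock) } }
sleep 0.1 # give the thread time to start waiting

finished = child_finishes_within?(5) do
  STDOUT.write "This is the child process.\n"
  STDOUT.write "Child process exiting.\n"
end
STDOUT.write(finished ? "child exited\n" : "child hung\n")
waiter.kill
```

The 5-second limit is arbitrary; any value comfortably longer than two writes to stdout would do.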


Files: vm_deadlock_fix.diff (466 Bytes), added by hongli (Hongli Lai), 11/17/2009 09:58 PM

Related issues

Related to Ruby master - Bug #2025: problem with pthread handling on non NPTL platform (Closed, mame (Yusuke Endoh), 08/31/2009)

Updated by hongli (Hongli Lai) over 11 years ago

It appears that this bug is OS X-specific. On Ubuntu 8.04 it behaves correctly: ruby 1.9.1p129 (2009-05-12 revision 23412) [x86_64-linux]


Updated by yugui (Yuki Sonoda) over 11 years ago

  • Assignee set to ko1 (Koichi Sasada)
  • Target version set to 1.9.1




Updated by vanjab (Vanja Bucic) over 11 years ago

Just to chime in on this issue.

It is affecting our company as well. We run our software on Apple machines in a server environment.
Our application is a multithreaded daemon process that accesses a MySQL database pretty often. Some of these threads fork off a task that may take some time to complete. It worked well prior to Ruby 1.9. Since then we have tried to find a workaround, but to no avail. Any attempt to use fork in our application results in a deadlock about 8 times out of 10.
It deadlocks in the weirdest places, like 'puts' or mysql.query (which we know sets a global lock internally, but should be thread safe).

Any ideas and attempts to resolve this ASAP are welcome.


Updated by vanjab (Vanja Bucic) over 11 years ago

To add my test case:
# ------------------------
require 'thread'

$stderr.puts RUBY_VERSION

$pid = 0
$t1 = Thread.new do
$pid = fork {
$stderr.puts "thread 1 exiting"

$stdout.puts "ok, thread with fork spawned"
$stdout.puts "la la la la"


$stderr.puts "never done"

# ------ outputs ----------
ok, thread with fork spawned
la la la la



Updated by vanjab (Vanja Bucic) over 11 years ago

In Reply to:
-- IMHO, from the perspective of the user, although it is hard, try not to use
-- any non-async-signal-safe functions in a forked child process before any
-- exec functions are called.

-- - Tetsu

Just so I can understand the logic, could you rewrite my test case above so that it does not deadlock?
It is not clear to me which of the functions I used in the test case are non-async-signal-safe.
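
For illustration, a minimal sketch of what following that advice could look like, assuming the forked task can run as a separate program. This is not a rewrite supplied by anyone in this thread and has not been checked against the affected Ruby versions. It uses Process.spawn so that the child execs immediately and does not execute any of the parent's Ruby code. The one-line child script is a placeholder for the real task.

```ruby
require 'rbconfig'

# The child is started with spawn, so it execs right away instead of running
# Ruby code (puts, mysql calls, ...) inside a copy of the threaded parent.
worker = Thread.new do
  pid = Process.spawn(RbConfig.ruby, '-e', '$stderr.puts "task running"')
  _, status = Process.wait2(pid) # reap the child and fetch its status
  $stderr.puts "thread 1 exiting"
  status.exitstatus
end

$stdout.puts "ok, thread with spawn started"
exit_code = worker.value # joins the thread and returns the child's exit status
$stderr.puts "done, child exit status #{exit_code}"
```

The trade-off is that the child no longer shares the parent's loaded code and state, so anything it needs has to be passed through arguments, the environment, or a pipe.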



Updated by matz (Yukihiro Matsumoto) over 11 years ago


In message "Re: [ruby-core:24565] Re: [Bug #1525] Deadlock in Ruby 1.9's VM caused by ConditionVariable.wait and fork?"
on Sun, 26 Jul 2009 22:11:41 +0900, Hongli Lai writes:

|In any case, not being able to create threads or doing anything
|complicated in child processes is a serious limitation. This makes
|forking-without-exec in Ruby 1.9 as good as useless. Even
|forking-with-exec is dangerous now. For example, suppose that the child
|process creates a command string to pass to exec(), and creating this
|command string involves malloc()ing memory. Even this isn't safe anymore.
|I think Kernel#fork should be made safe as much as possible.

I know what you mean. But we cannot override the underlying platform
behavior (i.e., it is impossible). If it were possible, we would be glad to adopt it.

# But in Vanja's case, it might be possible to support it by adjusting the
# timing of launching the internal worker thread. I am not sure yet.




Updated by normalperson (Eric Wong) over 11 years ago

Looking at trunk, there doesn't seem to be any accounting of mutexes to pass to
handlers for pthread_atfork; so the child process will just inherit the mutexes
in an unknown state.

It should be possible to fix the problem by keeping track of all mutexes as
they're created/initialized and registering pthread_atfork handlers
to ensure all mutexes are unlocked when the child starts running.

I'm pretty sure forking in the presence of threads in the parent will always
require a GVL, but I don't think it's too big of an issue otherwise.
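
The bookkeeping described above can be sketched at the Ruby level. This is only an analogy for the prepare/parent/child handlers of pthread_atfork, not the VM change being proposed, and the TrackedLocks module and its methods are hypothetical names invented for this example. Every tracked mutex is taken before forking, so no other thread can hold one at the moment of the fork, and both sides release them afterwards.

```ruby
# Hypothetical module: none of these names exist in Ruby or its VM.
module TrackedLocks
  @locks = []
  @guard = Mutex.new

  # Create a mutex and remember it so fork can account for it later.
  def self.new_mutex
    mutex = Mutex.new
    @guard.synchronize { @locks << mutex }
    mutex
  end

  # Fork with every tracked mutex held by the forking thread, then release
  # them on both sides, mirroring the prepare/parent/child atfork handlers.
  def self.fork
    held = @guard.synchronize { @locks.dup }
    held.each(&:lock)                  # prepare: wait out any current holders
    Process.fork do
      held.reverse_each(&:unlock)      # child: start with everything unlocked
      yield
    end
  ensure
    # parent: release whatever this thread still owns
    held.reverse_each { |m| m.unlock if m.owned? } if held
  end
end

lock = TrackedLocks.new_mutex
holder = Thread.new { lock.synchronize { sleep 0.2 } }
sleep 0.05 # let the holder take the lock first

pid = TrackedLocks.fork do
  got_it = lock.try_lock
  exit!(got_it ? 0 : 1)
end
_, status = Process.wait2(pid)
holder.join
puts "child could take the lock: #{status.success?}"
```

The sketch has obvious limits: the forking thread must not already hold a tracked mutex, and it only covers locks created through the registry, which is the objection about mutexes hidden inside the platform's own libraries.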



Updated by normalperson (Eric Wong) over 11 years ago

"none <" wrote:

> Eric Wong wrote:
>
> > It should be possible to fix the problem by keeping track of all mutexes as
> > they're created/initialized and registering pthread_atfork handlers
> > to ensure all mutexes are unlocked when the child starts running.
>
> In fact, it is impossible to track all mutexes because the usage of
> mutexes really depends on the underlying implementation.
> For example, the deadlock in this issue doesn't happen on Linux systems,
> nor on FreeBSD 7.2, but happens on FreeBSD 6.4.

Yes, it's not easy; but I think we can start making a best effort and
wait for OSes to catch up. This lets us start paving the way towards
reducing the reliance on the GVL:

The big system-side offenders are stdio, malloc and resolver...

  1. Ruby 1.9 already removed most of stdio dependencies.

  2. malloc still happens under a GVL, but I think replacing it with a
    Ruby-aware memory allocator that's better integrated with the GC and
    thread management would be a good thing anyways.

  3. Maybe look at c-ares or even resolv.rb since they'd play nicer
    with timeouts anyways... (not too sure on this one).

There's probably a few other things, but I think those are the main
ones that server applications (the ones most likely to use threads+fork)
will care about...

Eric Wong



Updated by hongli (Hongli Lai) about 11 years ago

The attached patch fixes the problem. Before forking there might be an arbitrary number of threads waiting on the lock, causing it to enter an undefined state after forking, which in turn causes a deadlock on some platforms. This patch reinitializes the global interpreter lock right after forking, which should be safe because all threads are gone right after forking.

I tried this before and it didn't work, but I suspect that that was caused by bug #2371. Now that #2371 has been fixed it would seem that this patch works.
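
The idea of reinitializing a lock in the child can be illustrated at the Ruby level. This is an analogy only: the attached patch changes the VM's C code and is not reproduced here, and PidAwareLock is a hypothetical class invented for this sketch. The lock notices that the process ID has changed and swaps in a fresh mutex, which relies on the same observation as the patch, namely that only the forking thread exists right after fork.

```ruby
# Hypothetical class, for illustration only; not part of Ruby.
class PidAwareLock
  def initialize
    @owner_pid = Process.pid
    @mutex = Mutex.new
  end

  def synchronize(&block)
    reinitialize_if_forked
    @mutex.synchronize(&block)
  end

  private

  # Right after fork only the forking thread exists in the child, so swapping
  # the mutex is safe as long as this runs before the child starts new threads.
  def reinitialize_if_forked
    return if @owner_pid == Process.pid
    @mutex = Mutex.new
    @owner_pid = Process.pid
  end
end

lock = PidAwareLock.new
holder = Thread.new { lock.synchronize { sleep 0.3 } }
sleep 0.05 # let the holder take the lock before forking

pid = fork do
  lock.synchronize { } # uses a fresh mutex, so it does not wait on the holder
  exit!(0)
end
_, status = Process.wait2(pid)
holder.join
puts "child exit status: #{status.exitstatus}"
```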


Updated by vanjab (Vanja Bucic) about 11 years ago

Very good news, thanks. Where do I fetch the latest sources that include your patch so that I can test with our use case?


Updated by hongli (Hongli Lai) about 11 years ago

The patch is to be applied on top of Ruby's SVN sources.


Updated by nobu (Nobuyoshi Nakada) about 11 years ago

  • Status changed from Open to Closed
  • % Done changed from 0 to 100

This issue was solved with changeset r25844.
Hongli, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.



Updated by daniel (Daniel Cavanagh) about 11 years ago

On 25/11/2009, at 5:57 PM, Tanaka Akira wrote:

In article,
Hongli Lai writes:

% ./ruby -e 'fork { puts }'
-e:1: [BUG] native_mutex_unlock return non-zero: 1
ruby 1.9.2dev (2009-11-19 trunk 25848) [x86_64-freebsd6.4]

This is what I get on FreeBSD 7.1-RELEASE:

FreeBSD 8.0-RELEASE behaves similarly to FreeBSD 6.4.

% uname -mrsv
FreeBSD 8.0-RELEASE FreeBSD 8.0-RELEASE #0: Sat Nov 21 15:48:17 UTC 2009 i386
% ./ruby -e 'fork { puts }'
-e:1: [BUG] pthread_mutex_unlock: Operation not permitted (EPERM)
ruby 1.9.2dev (2009-11-25 trunk 25911) [i386-freebsd8.0]

-- control frame ----------
c:0009 p:---- s:0020 b:0020 l:000019 d:000019 CFUNC :write
c:0008 p:---- s:0018 b:0018 l:000017 d:000017 CFUNC :puts
c:0007 p:---- s:0016 b:0016 l:000015 d:000015 CFUNC :puts
c:0006 p:0009 s:0013 b:0013 l:0010a4 d:000012 BLOCK -e:1
c:0005 p:---- s:0011 b:0011 l:000010 d:000010 FINISH
c:0004 p:---- s:0009 b:0009 l:000008 d:000008 CFUNC :fork
c:0003 p:0009 s:0006 b:0006 l:0010a4 d:000004 EVAL -e:1
c:0002 p:---- s:0004 b:0004 l:000003 d:000003 FINISH

c:0001 p:0000 s:0002 b:0002 l:0010a4 d:0010a4 TOP

-e:1:in `<main>'
-e:1:in `block in <main>'
-e:1:in `puts'

Exactly the same thing happens on NetBSD 5.0.1, if that helps.

