Bug #6634
Deadlock with join and ConditionVariable (closed)
Description
I'm getting a fatal deadlock in one of my gems, a simple threadpool implementation.
The library works on both Rubinius and JRuby, so I suspect it's a bug.
The gem is here: https://github.com/meh/ruby-threadpool
The example that crashes is attached.
Basically, it raises a fatal deadlock error if you join a thread and then call ConditionVariable#wait. I'm not 100% sure whether the bug is in ConditionVariable or somewhere else; all I know is that it happens in that situation, and that the same code works on Rubinius and JRuby.
Updated by Anonymous over 12 years ago
On Sat, Jun 23, 2012 at 11:49:14PM +0900, meh. (meh. I don't care) wrote:
Issue #6634 has been reported by meh. (meh. I don't care).
Bug #6634: Deadlock with join and ConditionVariable
https://bugs.ruby-lang.org/issues/6634
Author: meh. (meh. I don't care)
Status: Open
Priority: Normal
Assignee:
Category: core
Target version:
ruby -v: ruby 1.9.3p194 (2012-04-20 revision 35410) [x86_64-linux]
I can't seem to reproduce this error:
http://www.youtube.com/watch?v=8J_eBXZ7ud4
Can you reduce the error to a self-contained example that reliably fails?
--
Aaron Patterson
http://tenderlovemaking.com/
Updated by meh. (meh. I don't care) over 12 years ago
- File reduced.rb added
It always happens, on Arch Linux x86_64.
ruby reduced.rb
reduced.rb:13:in `join': deadlock detected (fatal)
from reduced.rb:13:in `<main>'
Updated by kosaki (Motohiro KOSAKI) over 12 years ago
- Status changed from Open to Assigned
- Assignee set to kosaki (Motohiro KOSAKI)
Updated by kosaki (Motohiro KOSAKI) over 12 years ago
- Status changed from Assigned to Feedback
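require 'thread'
mutex = Mutex.new
cond = ConditionVariable.new
thread = Thread.new {
  mutex.synchronize {
    cond.wait(mutex) # nothing ever signals or broadcasts cond
  }
}
thread.join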
This is a true deadlock. The thread.join above can never return: nothing ever signals cond, so the thread stays blocked in cond.wait forever.
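To make the distinction concrete, here is a minimal sketch (not code from the report) that adds one assumption: a `ready` flag the waiter checks. The worker stays blocked until some thread signals, so a timed `Thread#join(0.1)`, which returns nil while the thread is still running, comes back empty-handed; once the main thread sets the flag and signals, the ordinary join completes.

```ruby
require 'thread'

mutex = Mutex.new
cond  = ConditionVariable.new
ready = false # illustrative predicate; guards against lost or spurious wakeups

worker = Thread.new do
  mutex.synchronize do
    cond.wait(mutex) until ready # blocks until signalled AND ready is true
  end
  :done
end

# A timed join does not sleep forever, so (as far as I know) MRI's deadlock
# check does not fire; it simply returns nil because nobody has signalled yet.
first_try = worker.join(0.1)

# Unlike the snippet above, a live thread (main) eventually signals here.
mutex.synchronize do
  ready = true
  cond.signal
end

worker.join
puts first_try.inspect # => nil
puts worker.value      # => done
```

The only difference from the snippet above is that some live thread eventually signals; with no signaller at all, the plain join has nothing to wait for.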
Can you please elaborate your point?
Updated by meh. (meh. I don't care) over 12 years ago
Then I can't come up with a reduced test case. All I know is that my gem triggers a fatal deadlock error when there is actually no deadlock.
It works on both JRuby and Rubinius.
Updated by kosaki (Motohiro KOSAKI) over 12 years ago
Unfortunately, we don't have ESP. "The library works both in Rubinius and JRuby, so I guess it's a bug." doesn't give me any hint, sorry.
Updated by meh. (meh. I don't care) over 12 years ago
The library is just ~250 lines.
The issue is that Ruby thinks it's deadlocking when in fact another thread is going to shut down the thread pool (which broadcasts on the condition variable, so it isn't actually a deadlock).
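The pattern described here can be sketched with a toy pool. `TinyPool` and its methods are hypothetical, written only for illustration; they are not the threadpool gem's code or API. Workers wait on a condition variable, the main thread joins them, and a separate thread later calls `shutdown`, which broadcasts. Because a live thread can still wake the waiters, this is not a real deadlock; as far as I understand MRI's detector, it should only report one when every live thread is blocked with no timeout.

```ruby
require 'thread'

# Hypothetical toy pool for illustration only; NOT the threadpool gem's API.
class TinyPool
  def initialize(size)
    @mutex    = Mutex.new
    @cond     = ConditionVariable.new
    @jobs     = []
    @shutdown = false
    @workers  = Array.new(size) { Thread.new { work } }
  end

  # Queue a job and wake one idle worker.
  def process(&job)
    @mutex.synchronize do
      @jobs << job
      @cond.signal
    end
  end

  # Called from another thread: flag shutdown and wake every waiting worker.
  def shutdown
    @mutex.synchronize do
      @shutdown = true
      @cond.broadcast
    end
  end

  # Block until all workers have exited (after shutdown drains the queue).
  def join
    @workers.each(&:join)
  end

  private

  def work
    loop do
      job = @mutex.synchronize do
        # Re-check the predicate after every wakeup.
        @cond.wait(@mutex) while @jobs.empty? && !@shutdown
        return if @jobs.empty? # shutdown requested and nothing left to do
        @jobs.shift
      end
      job.call
    end
  end
end

pool    = TinyPool.new(2)
results = Queue.new
5.times { |i| pool.process { results << i * i } }

# Shutdown comes from a separate, still-live thread. While it sleeps with a
# timeout, not every thread is blocked forever, so joining the workers is fine.
stopper = Thread.new { sleep 0.05; pool.shutdown }
pool.join
stopper.join

collected = Array.new(results.size) { results.pop }.sort
p collected # => [0, 1, 4, 9, 16]
```

Workers only exit once the queue is empty and shutdown has been requested, so every job queued before `shutdown` still runs.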
Updated by kosaki (Motohiro KOSAKI) about 12 years ago
- Assignee deleted (kosaki (Motohiro KOSAKI))
Updated by mame (Yusuke Endoh) about 12 years ago
- File lol2.rb added
- Status changed from Feedback to Assigned
- Assignee set to kosaki (Motohiro KOSAKI)
- Target version set to 2.0.0
I succeeded in reproducing the issue by adding set_trace_func to lol.rb, redirecting the output to a file, and repeating the invocation until the error occurred.
It looks like a very timing-sensitive issue.
$ gem install threadpool
$ ./ruby -v
ruby 2.0.0dev (2012-11-05 trunk 37474) [x86_64-linux]
$ ruby -e 'loop { system("./ruby lol2.rb > t") || break }'
/home/mame/work/local/lib/ruby/gems/2.0.0/gems/threadpool-0.1.2/lib/threadpool.rb:183:in `join': No live threads left. Deadlock?
from /home/mame/work/local/lib/ruby/gems/2.0.0/gems/threadpool-0.1.2/lib/threadpool.rb:183:in `join'
from lol.rb:9:in `<main>'
$ ruby -e 'loop { system("./ruby lol2.rb > t") || break }'
<internal:prelude>:8:in `lock': deadlock; recursive locking (ThreadError)
from <internal:prelude>:8:in `synchronize'
from /home/mame/work/local/lib/ruby/2.0.0/thread.rb:69:in `wait'
from /home/mame/work/local/lib/ruby/gems/2.0.0/gems/threadpool-0.1.2/lib/threadpool.rb:234:in `block (3 levels) in spawn_thread'
from <internal:prelude>:10:in `synchronize'
from /home/mame/work/local/lib/ruby/gems/2.0.0/gems/threadpool-0.1.2/lib/threadpool.rb:222:in `block (2 levels) in spawn_thread'
from /home/mame/work/local/lib/ruby/gems/2.0.0/gems/threadpool-0.1.2/lib/threadpool.rb:249:in `loop'
from /home/mame/work/local/lib/ruby/gems/2.0.0/gems/threadpool-0.1.2/lib/threadpool.rb:249:in `block in spawn_thread'
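For reference, this ThreadError is what MRI raises when a thread calls `Mutex#lock` on a mutex it already owns. In the trace above, the lock is taken by a `synchronize` inside `ConditionVariable#wait` (thread.rb:69), which appears to be ConditionVariable's own internal mutex rather than one user code ever holds; that is why it points at the core rather than at the gem. A minimal standalone sketch of the condition itself, unrelated to the threadpool gem:

```ruby
require 'thread'

m = Mutex.new

error =
  begin
    m.synchronize do
      m.lock # second lock by the thread that already owns m
    end
    nil
  rescue ThreadError => e
    e
  end

puts error.class   # => ThreadError
puts error.message # MRI's message contains "recursive locking"
puts m.locked?     # => false (synchronize's ensure released the outer lock)
```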
I reviewed the source of the threadpool gem, but I could not find any problem.
Strictly speaking, it may attempt to call an undefined method named "reason", but that is clearly irrelevant.
Kosaki-san, could you try to reproduce it? The core behavior does look strange to me (too subtle to explain in English, sorry), but I failed to find the bug.
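This is tedious to explain, so I'll continue in Japanese.
Reproducibility is poor (about once in 100 runs in my environment?), and I'm not handy with gdb, so I fought it with printf debugging, and I do get the feeling that the core is behaving suspiciously.
It should have locked the mutex inside the CV, yet somehow the mutex inside threadpool seems to be locked instead... or maybe not.
I sense a big timing bug here (though it might be a GC issue). If it reproduces in your environment, Kosaki-san, I'd count that as a win, so could you give it a try?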
--
Yusuke Endoh mame@tsg.ne.jp
Updated by kosaki (Motohiro KOSAKI) about 12 years ago
- Assignee changed from kosaki (Motohiro KOSAKI) to ko1 (Koichi Sasada)
Updated by kosaki (Motohiro KOSAKI) about 12 years ago
Hi mame-san,
ko1 found that the second case (i.e. the one below) is a regression he introduced in October. He told me he plans to fix it soon.
$ ruby -e 'loop { system("./ruby lol2.rb > t") || break }'
<internal:prelude>:8:in `lock': deadlock; recursive locking (ThreadError)
from <internal:prelude>:8:in `synchronize'
from /home/mame/work/local/lib/ruby/2.0.0/thread.rb:69:in `wait'
And I couldn't reproduce this issue at commit r37074 (Oct 3), so I think we haven't reproduced the original issue yet.
Updated by ko1 (Koichi Sasada) about 12 years ago
- Assignee changed from ko1 (Koichi Sasada) to mame (Yusuke Endoh)
This second problem may be fixed by r37647.
mame-san, could you check it?
Updated by mame (Yusuke Endoh) about 12 years ago
- Status changed from Assigned to Feedback
Worked. Thank you!
So, can anyone reproduce the original problem? Meh, can you still reproduce it?
--
Yusuke Endoh mame@tsg.ne.jp
Updated by mame (Yusuke Endoh) almost 12 years ago
- Status changed from Feedback to Rejected
Marking this as rejected due to lack of feedback from the submitter.
--
Yusuke Endoh mame@tsg.ne.jp
Updated by we4tech (nhm tanveeer hossain khan) over 11 years ago
Hi there,
I've faced a similar problem with ruby 2.0.0p0 (2013-02-24 revision 39474) [x86_64-darwin12.1.0] (installed with rvm).
Please check out my attached code. Let me know if I can help further, or if I'm doing something dumb :)
Updated by nikkoara (L Nicoara) over 10 years ago
Hey, I have the same problem. I took the test case you posted, reduced it further, and fiddled with the number of threads, etc. See attached. It crashed reliably for me, always right after launching it.
If we are using Ruby threads the wrong way, please let us know. If not, could you please take another look at this issue and possibly reactivate it?
Thanks.
Updated by nikkoara (L Nicoara) over 10 years ago
For the record, the test case is malformed. Bummer. I think the one I based it on (from khan) is malformed as well. My apologies if you spent time on it.
Updated by kosaki (Motohiro KOSAKI) over 10 years ago
On Sat, May 3, 2014 at 8:45 AM, nikkoara@hates.ms wrote:
Issue #6634 has been updated by L Nicoara.
For the record, the test case is malformed. Bummer. I think the one I based it on (from khan) is malformed as well. My apologies if you spent time on it.
NP :)