Bug #4266
closedTimeouts in threads cause "ThreadError: deadlock; recursive locking"
Description
=begin
Run the attached file (or this pastie http://pastie.org/1448542) a few times and you'll eventually get:
ThreadError: deadlock; recursive locking:
internal:prelude:8:in lock' <internal:prelude>:8:in
synchronize'
bin/deadlock_test.rb:86:in block (4 levels) in <main>' /Users/cjbottaro/.rvm/rubies/ruby-1.9.2-p136/lib/ruby/1.9.1/timeout.rb:57:in
timeout'
bin/deadlock_test.rb:85:in block (3 levels) in <main>' bin/deadlock_test.rb:83:in
times'
bin/deadlock_test.rb:83:in `block (2 levels) in '
I've had the script run successfully over 5 times in a row before getting the errors, so if it doesn't happen the first few times... keep trying.
The problem doesn't happen in 1.8.7 or Jruby, but does happen in 1.9.1.
=end
Files
Updated by headius (Charles Nutter) almost 14 years ago
=begin
FYI, JRuby does not use timeout.rb anymore (as of JRuby 1.3ish I think). We use a native implementation based on Java's timed, thread-pooling executor in java.util.concurrent. The original timeout has many race conditions that are very difficult to fix.
=end
Updated by headius (Charles Nutter) almost 14 years ago
=begin
A question was raised about whether #to should understand context, as in knowing a top-level object being coerced to YAML should include the --- header. The answer seems to come from Marshal.
Marshal.dump(object) knows how to produce the marshal header, linking, and so on. Marshal is the master of this format. But Marshal defers to the objects themselves for actual content to go into that marshaled output, calling _dump or marshal_dump on each object in turn. With the #to protocol, Marshal.dump could be implemented as:
def Marshal.dump(obj)
emit_header
obj.to(Marshal)
end
In the same way that marshal_dump is used today.
Another concern raised is whether #to should be expected on all objects all the time, rather than just always calling MyClass.coerce(obj). The answer is simple: you want individual source types to control how they coerce to a target type, rather than expecting the target type to know about all possible sources. The default protocol where #to calls type.coerce would simply be a default behavior.
=end
Updated by headius (Charles Nutter) almost 14 years ago
=begin
Sorry about that, wrong bug. Too many bug windows open.
=end
Updated by headius (Charles Nutter) almost 14 years ago
=begin
Sorry about that, wrong bug. Too many bug windows open.
=end
Updated by kosaki (Motohiro KOSAKI) almost 14 years ago
=begin
This test program also makes ruby crash on my envronment (trunk + linux x86_64).
Related [Bug #3880]?
internal:prelude:8: [BUG] Segmentation fault
ruby 1.9.3dev (2011-01-09 trunk 30500) [x86_64-linux]
-- Control frame information -----------------------------------------------
c:0011 p:---- s:0036 b:0036 l:000035 d:000035 CFUNC :exception
c:0010 p:---- s:0034 b:0034 l:000033 d:000033 CFUNC :lock
c:0009 p:0011 s:0031 b:0031 l:000030 d:000030 METHOD internal:prelude:8
c:0008 p:0011 s:0028 b:0028 l:000bb8 d:000027 BLOCK deadlock_test.rb:21
c:0007 p:0109 s:0026 b:0026 l:001ad0 d:001ad0 METHOD /usr/local/lib/ruby/1.9.1/timeout.rb:57
c:0006 p:0019 s:0014 b:0014 l:000bb8 d:002580 BLOCK deadlock_test.rb:20
c:0005 p:---- s:0011 b:0011 l:000010 d:000010 FINISH
c:0004 p:---- s:0009 b:0009 l:000008 d:000008 CFUNC :times
c:0003 p:0010 s:0006 b:0006 l:000bb8 d:0012c8 BLOCK deadlock_test.rb:18
c:0002 p:---- s:0004 b:0004 l:000003 d:000003 FINISH
c:0001 p:---- s:0002 b:0002 l:000001 d:000001 TOP
-- Ruby level backtrace information ----------------------------------------
deadlock_test.rb:18:in block (2 levels) in <main>' deadlock_test.rb:18:in
times'
deadlock_test.rb:20:in block (3 levels) in <main>' /usr/local/lib/ruby/1.9.1/timeout.rb:57:in
timeout'
deadlock_test.rb:21:in block (4 levels) in <main>' <internal:prelude>:8:in
synchronize'
internal:prelude:8:in lock' <internal:prelude>:8:in
exception'
-- C level backtrace information -------------------------------------------
./ruby() [0x51ff25] vm_dump.c:797
./ruby() [0x564296] error.c:249
./ruby(rb_bug+0xb1) [0x564431] error.c:266
./ruby() [0x4aec70] signal.c:624
/lib64/libpthread.so.0() [0x382140f440]
./ruby(st_lookup+0xe) [0x4b63ce] st.c:326
./ruby() [0x50d3af] vm_method.c:402
./ruby(rb_funcall2+0x2e) [0x51ccee] vm_eval.c:227
./ruby(rb_class_new_instance+0x30) [0x44f490] object.c:1570
./ruby() [0x516e33] vm_eval.c:79
./ruby() [0x517304] vm_eval.c:290
./ruby() [0x417691] eval.c:564
./ruby(rb_exc_raise+0x26) [0x417886] eval.c:473
./ruby() [0x52487a] thread.c:1301
./ruby(rb_mutex_lock+0x44a) [0x527cfa] thread.c:3266
./ruby() [0x50f0b7] vm_insnhelper.c:403
./ruby() [0x510a36] insns.def:1010
./ruby() [0x5157eb] vm.c:1150
./ruby(rb_yield+0x66) [0x51e116] vm.c:591
./ruby() [0x4469d1] numeric.c:3217
./ruby() [0x50f0b7] vm_insnhelper.c:403
./ruby() [0x510a36] insns.def:1010
./ruby() [0x5157eb] vm.c:1150
./ruby() [0x5186df] vm.c:607
./ruby() [0x5272d1] thread.c:453
./ruby() [0x5273e0] thread_pthread.c:522
/lib64/libpthread.so.0() [0x3821407761]
/lib64/libc.so.6(clone+0x6d) [0x3820ce151d]
=end
Updated by kosaki (Motohiro KOSAKI) almost 14 years ago
=begin
Hm, the crash issue seems different and unrelated one from this. becuase ruby_1_9_2 can reproduce
recursive deadlock issue, but can't reproduce crash.
=end
Updated by kosaki (Motohiro KOSAKI) almost 14 years ago
=begin
Old days, C# had similar issue. and they changed Monitor.enter(= our Mutex.lock) semantics.
http://www.bluebytesoftware.com/blog/2007/01/30/MonitorEnterThreadAbortsAndOrphanedLocks.aspx
So, Can we take similar way? The attached patch kill a race in mutex.synchronize. At least,
my ruby_1_9_2 branch don't reproduce the issue anymore.
Thanks.
=end
Updated by kosaki (Motohiro KOSAKI) almost 14 years ago
- Assignee set to ko1 (Koichi Sasada)
=begin
And, I think asynchronous exception should be blocked and delayed while running ensure block.
Otherwise Mutex.unlock can be bypassed.
ko1, What do you think?
=end
Updated by kosaki (Motohiro KOSAKI) almost 14 years ago
=begin
That said, our Mutex.synchronize() method has two unsafe point.
def synchronize
self.lock
// (1)
begin
yield
ensure
// (2)
self.unlock rescue nil
end
end
If the mutex got thread#raise at (1), it doesn't call unlock because
it haven't entered -begin- block. And if got at (2), it also doesn't call
unlock because we are no longer in -begin- block.
My patch fixed only (1).
Thanks.
=end
Updated by kosaki (Motohiro KOSAKI) almost 14 years ago
- File mutex-synchronize-use-c-implementation.patch mutex-synchronize-use-c-implementation.patch added
=begin
Alternative one is here. This one fixes both (1) and (2). But it only fixes Mutex#synchronize. That said, timeout{} still might not call ruby level ensure block.
=end
Updated by kosaki (Motohiro KOSAKI) almost 14 years ago
=begin
Hi,
I plan to commit mutex-synchronize-use-c-implementation.patch at this weekend. So if anyone have objection, please let me know soon.
=end
Updated by shyouhei (Shyouhei Urabe) over 13 years ago
- Status changed from Open to Assigned
Updated by normalperson (Eric Wong) over 13 years ago
=begin
I have a short-term fix for the related issue in Bug #4289
http://redmine.ruby-lang.org/attachments/1572/0001-timeout.rb-avoid-introducing-new-class-for-every-tim.patch
It works both with and without the mutex-synchronize-use-c-implementation.patch
posted here.
=end
Updated by briangug (Brian Gugliemetti) over 13 years ago
=begin
This issue also applies to MonitorMixin (monitor.rb) as it uses the same lock/begin yield/ensure block.
=end
Updated by nahi (Hiroshi Nakamura) over 13 years ago
- Target version changed from 1.9.2 to 1.9.3
Updated by ko1 (Koichi Sasada) over 13 years ago
- Target version changed from 1.9.3 to 2.0.0
Let's talk it after 1.9.3 release.
Updated by jtara (Jon Tara) over 13 years ago
Brian Gugliemetti wrote:
File monitor-synchronize-use-c-implementation.patch added
There is a parameter mis-match between the declaration and definition of rb_mon_synchronize. Compilation fails on at least OSX.
Updated by MingVonsalis (Ming Vonsalis) over 12 years ago
jtara (Jon Tara) wrote:
Brian Gugliemetti wrote:
File monitor-synchronize-use-c-implementation.patch added
There is a parameter mis-match between the declaration and definition of rb_mon_synchronize. Compilation fails on at least OSX.
Updated by trapni (Christian Parpart) about 12 years ago
We just ran into this very ugly bug in production, too. We would really appreciate a fix in 1.9.4, a pure bugfix release of the 1.9 branch since this is a major bug. :)
Updated by ko1 (Koichi Sasada) about 12 years ago
- Assignee changed from ko1 (Koichi Sasada) to kosaki (Motohiro KOSAKI)
I passed this ticket to thread specialist.
Updated by kosaki (Motohiro KOSAKI) about 12 years ago
Brian,
Can you please provide us a reproducer for you monitor.rb issue? And can you please make a new ticket for your issue to prevent cross line discussion? I plan to close this ticket soon and I also hope to don't forget your issue.
Thanks.
Updated by kosaki (Motohiro KOSAKI) about 12 years ago
- Status changed from Assigned to Closed
- % Done changed from 0 to 100
This issue was solved with changeset r37724.
Christopher, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.
- prelude.rb: Moved Mutex#synchronize to
- thread.c (rb_mutex_synchronize_m): here. [Bug #4266]