Project

General

Profile

Actions

Bug #4266

closed

Timeouts in threads cause "ThreadError: deadlock; recursive locking"

Added by cjbottaro (Christopher Bottaro) almost 14 years ago. Updated about 12 years ago.

Status:
Closed
Target version:
ruby -v:
ruby 1.9.2p136 (2010-12-25 revision 30365) [x86_64-darwin10.6.0]
Backport:
[ruby-core:34362]

Description

=begin
Run the attached file (or this pastie http://pastie.org/1448542) a few times and you'll eventually get:

ThreadError: deadlock; recursive locking:
internal:prelude:8:in lock' <internal:prelude>:8:in synchronize'
bin/deadlock_test.rb:86:in block (4 levels) in <main>' /Users/cjbottaro/.rvm/rubies/ruby-1.9.2-p136/lib/ruby/1.9.1/timeout.rb:57:in timeout'
bin/deadlock_test.rb:85:in block (3 levels) in <main>' bin/deadlock_test.rb:83:in times'
bin/deadlock_test.rb:83:in `block (2 levels) in '

I've had the script run successfully over 5 times in a row before getting the errors, so if it doesn't happen the first few times... keep trying.

The problem doesn't happen in 1.8.7 or Jruby, but does happen in 1.9.1.
=end


Files

deadlock_test.rb (554 Bytes) deadlock_test.rb cjbottaro (Christopher Bottaro), 01/12/2011 01:59 AM
diffdiff (1.38 KB) diffdiff Add Mutex.own? method kosaki (Motohiro KOSAKI), 01/12/2011 02:45 PM
mutex-synchronize-use-c-implementation.patch (1.28 KB) mutex-synchronize-use-c-implementation.patch kosaki (Motohiro KOSAKI), 01/12/2011 04:45 PM
monitor-synchronize-use-c-implementation.patch (4.6 KB) monitor-synchronize-use-c-implementation.patch Fix for MonitorMixin/monitor synchronize using similar approach as mutex synchronize patch briangug (Brian Gugliemetti), 05/05/2011 09:01 AM

Related issues 2 (0 open2 closed)

Related to Ruby master - Bug #4285: Ruby don't have asynchrounous exception safe syntax and It should have.Closedko1 (Koichi Sasada)01/17/2011Actions
Related to Ruby master - Bug #4289: Timeouts in threads cause SEGVClosedakr (Akira Tanaka)01/18/2011Actions
Actions #1

Updated by headius (Charles Nutter) almost 14 years ago

=begin
FYI, JRuby does not use timeout.rb anymore (as of JRuby 1.3ish I think). We use a native implementation based on Java's timed, thread-pooling executor in java.util.concurrent. The original timeout has many race conditions that are very difficult to fix.
=end

Actions #2

Updated by headius (Charles Nutter) almost 14 years ago

=begin
A question was raised about whether #to should understand context, as in knowing a top-level object being coerced to YAML should include the --- header. The answer seems to come from Marshal.

Marshal.dump(object) knows how to produce the marshal header, linking, and so on. Marshal is the master of this format. But Marshal defers to the objects themselves for actual content to go into that marshaled output, calling _dump or marshal_dump on each object in turn. With the #to protocol, Marshal.dump could be implemented as:

def Marshal.dump(obj)
emit_header
obj.to(Marshal)
end

In the same way that marshal_dump is used today.

Another concern raised is whether #to should be expected on all objects all the time, rather than just always calling MyClass.coerce(obj). The answer is simple: you want individual source types to control how they coerce to a target type, rather than expecting the target type to know about all possible sources. The default protocol where #to calls type.coerce would simply be a default behavior.
=end

Actions #3

Updated by headius (Charles Nutter) almost 14 years ago

=begin
Sorry about that, wrong bug. Too many bug windows open.
=end

Actions #4

Updated by headius (Charles Nutter) almost 14 years ago

=begin
Sorry about that, wrong bug. Too many bug windows open.
=end

Actions #5

Updated by kosaki (Motohiro KOSAKI) almost 14 years ago

=begin
This test program also makes ruby crash on my envronment (trunk + linux x86_64).

Related [Bug #3880]?

internal:prelude:8: [BUG] Segmentation fault
ruby 1.9.3dev (2011-01-09 trunk 30500) [x86_64-linux]

-- Control frame information -----------------------------------------------
c:0011 p:---- s:0036 b:0036 l:000035 d:000035 CFUNC :exception
c:0010 p:---- s:0034 b:0034 l:000033 d:000033 CFUNC :lock
c:0009 p:0011 s:0031 b:0031 l:000030 d:000030 METHOD internal:prelude:8
c:0008 p:0011 s:0028 b:0028 l:000bb8 d:000027 BLOCK deadlock_test.rb:21
c:0007 p:0109 s:0026 b:0026 l:001ad0 d:001ad0 METHOD /usr/local/lib/ruby/1.9.1/timeout.rb:57
c:0006 p:0019 s:0014 b:0014 l:000bb8 d:002580 BLOCK deadlock_test.rb:20
c:0005 p:---- s:0011 b:0011 l:000010 d:000010 FINISH
c:0004 p:---- s:0009 b:0009 l:000008 d:000008 CFUNC :times
c:0003 p:0010 s:0006 b:0006 l:000bb8 d:0012c8 BLOCK deadlock_test.rb:18
c:0002 p:---- s:0004 b:0004 l:000003 d:000003 FINISH
c:0001 p:---- s:0002 b:0002 l:000001 d:000001 TOP

-- Ruby level backtrace information ----------------------------------------
deadlock_test.rb:18:in block (2 levels) in <main>' deadlock_test.rb:18:in times'
deadlock_test.rb:20:in block (3 levels) in <main>' /usr/local/lib/ruby/1.9.1/timeout.rb:57:in timeout'
deadlock_test.rb:21:in block (4 levels) in <main>' <internal:prelude>:8:in synchronize'
internal:prelude:8:in lock' <internal:prelude>:8:in exception'

-- C level backtrace information -------------------------------------------
./ruby() [0x51ff25] vm_dump.c:797
./ruby() [0x564296] error.c:249
./ruby(rb_bug+0xb1) [0x564431] error.c:266
./ruby() [0x4aec70] signal.c:624
/lib64/libpthread.so.0() [0x382140f440]
./ruby(st_lookup+0xe) [0x4b63ce] st.c:326
./ruby() [0x50d3af] vm_method.c:402
./ruby(rb_funcall2+0x2e) [0x51ccee] vm_eval.c:227
./ruby(rb_class_new_instance+0x30) [0x44f490] object.c:1570
./ruby() [0x516e33] vm_eval.c:79
./ruby() [0x517304] vm_eval.c:290
./ruby() [0x417691] eval.c:564
./ruby(rb_exc_raise+0x26) [0x417886] eval.c:473
./ruby() [0x52487a] thread.c:1301
./ruby(rb_mutex_lock+0x44a) [0x527cfa] thread.c:3266
./ruby() [0x50f0b7] vm_insnhelper.c:403
./ruby() [0x510a36] insns.def:1010
./ruby() [0x5157eb] vm.c:1150
./ruby(rb_yield+0x66) [0x51e116] vm.c:591
./ruby() [0x4469d1] numeric.c:3217
./ruby() [0x50f0b7] vm_insnhelper.c:403
./ruby() [0x510a36] insns.def:1010
./ruby() [0x5157eb] vm.c:1150
./ruby() [0x5186df] vm.c:607
./ruby() [0x5272d1] thread.c:453
./ruby() [0x5273e0] thread_pthread.c:522
/lib64/libpthread.so.0() [0x3821407761]
/lib64/libc.so.6(clone+0x6d) [0x3820ce151d]

=end

Actions #6

Updated by kosaki (Motohiro KOSAKI) almost 14 years ago

=begin
Hm, the crash issue seems different and unrelated one from this. becuase ruby_1_9_2 can reproduce
recursive deadlock issue, but can't reproduce crash.

=end

Actions #7

Updated by kosaki (Motohiro KOSAKI) almost 14 years ago

=begin
Old days, C# had similar issue. and they changed Monitor.enter(= our Mutex.lock) semantics.

http://www.bluebytesoftware.com/blog/2007/01/30/MonitorEnterThreadAbortsAndOrphanedLocks.aspx

So, Can we take similar way? The attached patch kill a race in mutex.synchronize. At least,
my ruby_1_9_2 branch don't reproduce the issue anymore.

Thanks.
=end

Actions #8

Updated by kosaki (Motohiro KOSAKI) almost 14 years ago

  • Assignee set to ko1 (Koichi Sasada)

=begin
And, I think asynchronous exception should be blocked and delayed while running ensure block.
Otherwise Mutex.unlock can be bypassed.

ko1, What do you think?
=end

Actions #9

Updated by kosaki (Motohiro KOSAKI) almost 14 years ago

=begin
That said, our Mutex.synchronize() method has two unsafe point.


def synchronize
self.lock
// (1)
begin
yield
ensure
// (2)
self.unlock rescue nil
end
end

If the mutex got thread#raise at (1), it doesn't call unlock because
it haven't entered -begin- block. And if got at (2), it also doesn't call
unlock because we are no longer in -begin- block.

My patch fixed only (1).

Thanks.
=end

Actions #10

Updated by kosaki (Motohiro KOSAKI) almost 14 years ago

=begin
Alternative one is here. This one fixes both (1) and (2). But it only fixes Mutex#synchronize. That said, timeout{} still might not call ruby level ensure block.

=end

Actions #11

Updated by kosaki (Motohiro KOSAKI) almost 14 years ago

=begin
Hi,

I plan to commit mutex-synchronize-use-c-implementation.patch at this weekend. So if anyone have objection, please let me know soon.

=end

Actions #12

Updated by shyouhei (Shyouhei Urabe) over 13 years ago

  • Status changed from Open to Assigned

Updated by normalperson (Eric Wong) over 13 years ago

=begin
I have a short-term fix for the related issue in Bug #4289
http://redmine.ruby-lang.org/attachments/1572/0001-timeout.rb-avoid-introducing-new-class-for-every-tim.patch

It works both with and without the mutex-synchronize-use-c-implementation.patch
posted here.

=end

Actions #14

Updated by briangug (Brian Gugliemetti) over 13 years ago

=begin
This issue also applies to MonitorMixin (monitor.rb) as it uses the same lock/begin yield/ensure block.
=end

Updated by nahi (Hiroshi Nakamura) over 13 years ago

  • Target version changed from 1.9.2 to 1.9.3

Updated by ko1 (Koichi Sasada) over 13 years ago

  • Target version changed from 1.9.3 to 2.0.0

Let's talk it after 1.9.3 release.

Updated by jtara (Jon Tara) over 13 years ago

Brian Gugliemetti wrote:

File monitor-synchronize-use-c-implementation.patch added

There is a parameter mis-match between the declaration and definition of rb_mon_synchronize. Compilation fails on at least OSX.

Updated by MingVonsalis (Ming Vonsalis) over 12 years ago

jtara (Jon Tara) wrote:

Brian Gugliemetti wrote:

File monitor-synchronize-use-c-implementation.patch added

There is a parameter mis-match between the declaration and definition of rb_mon_synchronize. Compilation fails on at least OSX.

http://originsofinuitart.posterous.com/

Updated by trapni (Christian Parpart) about 12 years ago

We just ran into this very ugly bug in production, too. We would really appreciate a fix in 1.9.4, a pure bugfix release of the 1.9 branch since this is a major bug. :)

Updated by ko1 (Koichi Sasada) about 12 years ago

  • Assignee changed from ko1 (Koichi Sasada) to kosaki (Motohiro KOSAKI)

I passed this ticket to thread specialist.

Updated by kosaki (Motohiro KOSAKI) about 12 years ago

Brian,

Can you please provide us a reproducer for you monitor.rb issue? And can you please make a new ticket for your issue to prevent cross line discussion? I plan to close this ticket soon and I also hope to don't forget your issue.

Thanks.

Actions #23

Updated by kosaki (Motohiro KOSAKI) about 12 years ago

  • Status changed from Assigned to Closed
  • % Done changed from 0 to 100

This issue was solved with changeset r37724.
Christopher, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.


  • prelude.rb: Moved Mutex#synchronize to
  • thread.c (rb_mutex_synchronize_m): here. [Bug #4266]
Actions

Also available in: Atom PDF

Like0
Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0