Bug #10892: Deadlock in autoload

Added by Eregon (Benoit Daloze) over 10 years ago. Updated over 6 years ago.

Status: Closed
Assignee: -
Target version: -
ruby -v: ruby 2.3.0dev (2015-02-23 trunk 49693) [x86_64-linux]
[ruby-core:68255]

Description

Updating to recent RubySpec seems to show a bug under concurrent autoload.
I attach the extracted logic to reproduce.

For me, the script ends, in most cases, with:

autoload_bug.rb:105:in `value': No live threads left. Deadlock? (fatal)
	from autoload_bug.rb:105:in `map'
	from autoload_bug.rb:105:in `<main>'

Or:

autoload_bug.rb:95:in `const_get': uninitialized constant Mod1 (NameError)
	from autoload_bug.rb:95:in `block (3 levels) in <main>'
	from autoload_bug.rb:86:in `each'
	from autoload_bug.rb:86:in `block (2 levels) in <main>'

Both seem to be incorrect behavior.
All versions from 2.0 onward seem affected; 1.9.3 also seems to behave incorrectly, but in a different way.
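
For comparison, the behavior one would expect from concurrent autoload can be sketched as follows. This is only a minimal illustration, not the attached autoload_bug.rb: the constant ConcurrentAutoloadDemo, the file name, the global counter, and the helper reference_concurrently are all hypothetical names invented for this sketch. Several threads reference the same autoloaded constant at once; each of them should get the same module back, and the file should be loaded only once.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper: reference a constant from several threads at once and
# collect what each thread saw.
def reference_concurrently(const_name, thread_count)
  Array.new(thread_count) { Thread.new { Object.const_get(const_name) } }.map(&:value)
end

dir  = Dir.mktmpdir
path = File.join(dir, 'concurrent_autoload_demo.rb')

# The sleep widens the window in which the other threads hit the autoload
# while the first one is still loading the file.
File.write(path, <<~RUBY)
  $concurrent_autoload_demo_loads = $concurrent_autoload_demo_loads.to_i + 1
  sleep 0.1
  module ConcurrentAutoloadDemo; end
RUBY

Object.autoload(:ConcurrentAutoloadDemo, path)
seen = reference_concurrently(:ConcurrentAutoloadDemo, 4)

puts "distinct results: #{seen.uniq.size}, loads: #{$concurrent_autoload_demo_loads}"
FileUtils.remove_entry(dir)
```

The two failures reported above (the fatal deadlock and the NameError) are the cases where this expectation is not met.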

Could someone confirm this is a bug?
Is it likely to be fixed?


Files

autoload_bug.rb (2.18 KB), Eregon (Benoit Daloze), 02/23/2015 12:34 PM
0001-load.c-unlock-the-new-shield.patch (1005 Bytes), thedarkone (Vit Z), 07/31/2015 04:53 AM

Related issues 1 (0 open, 1 closed)

Related to Ruby - Bug #7530: Concurrent loads fail with mutex errors (Closed, Glass_saga (Masaki Matsushita))

Updated by thedarkone (Vit Z) about 10 years ago #1 [ruby-core:70197]

That broken rubyspec was written by me. The problem lies with repeatedly autoloading the same .rb file. Since this should normally be impossible, the spec manually deletes the loaded path from $LOADED_FEATURES and then re-declares the autoload. This is currently broken on MRI.
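
The trick described here can be shown in isolation with a single-threaded sketch. It is not the spec itself: AutoloadDemoFoo, AutoloadDemoBar and the file name are hypothetical names chosen for illustration. The same .rb file is autoloaded twice, for two different constants, by removing its entry from $LOADED_FEATURES in between.

```ruby
require 'tmpdir'
require 'fileutils'

dir  = Dir.mktmpdir
path = File.join(dir, 'autoload_demo_target.rb')

# First autoload: the file defines AutoloadDemoFoo.
File.write(path, "module AutoloadDemoFoo; end\n")
Object.autoload(:AutoloadDemoFoo, path)
Object.const_get(:AutoloadDemoFoo)   # first reference triggers the load

# Make Ruby forget that the file was loaded. Matching on the file name covers
# both the expanded path and a resolved real path, whichever was recorded.
$LOADED_FEATURES.delete_if { |feature| feature.end_with?('autoload_demo_target.rb') }

# Second autoload of the same path: the file now defines AutoloadDemoBar.
File.write(path, "module AutoloadDemoBar; end\n")
Object.autoload(:AutoloadDemoBar, path)
Object.const_get(:AutoloadDemoBar)   # the file is loaded a second time

FileUtils.remove_entry(dir)
```

With only one thread involved this is expected to work; the failure discussed in this issue appears once a second thread references the constants while the loads are in progress.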

Here's a much smaller repro script:

def with_autoload_file(const_name, file_name = 'foo.rb')
  mangled_file_name = file_name.sub(/\.rb\Z/, '____temp____autoload.rb') # avoid accidentally overwriting any files
  File.write(mangled_file_name, "sleep 1; module #{const_name}; end")
  autoload const_name, File.expand_path(mangled_file_name.sub(/\.rb\Z/, ''))
  $LOADED_FEATURES.delete(File.expand_path(mangled_file_name)) if $LOADED_FEATURES.include?(File.expand_path(mangled_file_name))
  yield
ensure
  File.delete(mangled_file_name)
end

foo_ready = bar_waiting = bar_ready = false
t = Thread.new do
  Thread.pass until foo_ready
  Foo
  bar_waiting = true
  Thread.pass until bar_ready
  Bar
end

with_autoload_file('Foo') do
  foo_ready = true
  Foo
end

Thread.pass until bar_waiting

with_autoload_file('Bar') do
  bar_ready = true
  Bar
end

t.join

Running this results in an "uninitialized constant Bar" exception from the non-main thread.

If the last block is rearranged like this:

with_autoload_file('Bar') do
  Bar
  bar_ready = true
end

the script deadlocks (main thread deadlocks, while secondary thread t busy spins in Thread.pass until bar_ready).

If the last autoload block uses a different .rb file, everything works fine:

with_autoload_file('Bar', 'bar.rb') do
  Bar
  bar_ready = true
end

I think I've tracked the issue to an incorrectly locked load_lock's thread_shield. When rb_thread_shield_wait() returns Qfalse, the failed thread creates a new thread_shield via rb_thread_shield_new(). However, rb_thread_shield_new() automatically locks the newly created shield, and because that branch does not return a successful ftptr, the newly installed shield is never unlocked.
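
The locking pattern described above can be modeled in plain Ruby. This is a deliberately simplified sketch and not MRI's load.c: a Mutex stands in for the thread_shield, a Hash stands in for the table of features being loaded, and install_shield and other_thread_can_acquire? are hypothetical helpers. It only illustrates why a lock that is created in the locked state and installed on a failing branch has to be released explicitly.

```ruby
# Model of a shield that is created already locked by its creator, as the
# paragraph above says rb_thread_shield_new() does. (Assumption: a Mutex is a
# close enough stand-in for this illustration.)
def install_shield(table, feature, unlock_on_failure:)
  shield = Mutex.new
  shield.lock
  table[feature] = shield

  # Model the branch that does not hand a successful handle back to the
  # caller, so nobody will unlock the shield later on its behalf.
  load_succeeded = false
  shield.unlock if unlock_on_failure && !load_succeeded
  shield
end

# Ask a second thread whether it can take the shield without blocking.
def other_thread_can_acquire?(shield)
  Thread.new do
    acquired = shield.try_lock
    shield.unlock if acquired
    acquired
  end.value
end

table = {}
stuck = install_shield(table, 'foo.rb', unlock_on_failure: false)
fixed = install_shield(table, 'bar.rb', unlock_on_failure: true)

puts "leaked shield available to others: #{other_thread_can_acquire?(stuck)}"
puts "released shield available to others: #{other_thread_can_acquire?(fixed)}"

stuck.unlock # release the leaked lock so the model exits cleanly
```

In the model, a thread that blocks on the leaked shield would wait for as long as its owner stays alive, which is the shape of the hang in the rearranged script.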

The attached patch seems to fix the issue for me.

Updated by nobu (Nobuyoshi Nakada) about 10 years ago #2

  • Related to Bug #7530: Concurrent loads fail with mutex errors added

Updated by Eregon (Benoit Daloze) about 10 years ago #3 [ruby-core:71062]

Could someone review the patch and apply it or find an alternative fix?

Updated by normalperson (Eric Wong) about 10 years ago #4 [ruby-core:71090]

wrote:

> Could someone review the patch and apply it or find an alternative fix?

Fwiw, I mentioned in [ruby-core:70359] that I tried it for [Bug #11384]
without success, but Redmine + list integration was broken at the
time.

Updated by Eregon (Benoit Daloze) about 10 years ago #5 [ruby-core:71092]

On Wed, Oct 14, 2015 at 9:56 PM, Eric Wong wrote:

> Fwiw, I mentioned in [ruby-core:70359] that I tried it for [Bug #11384]
> without success, but Redmine + list integration was broken at the
> time.

Ah indeed I missed that, thanks.
Did you try for this issue in particular?

About #11384, I guess we need another fix then :/

Updated by normalperson (Eric Wong) about 10 years ago #6 [ruby-core:71095]

Benoit Daloze wrote:

> On Wed, Oct 14, 2015 at 9:56 PM, Eric Wong wrote:
>
> > Fwiw, I mentioned in [ruby-core:70359] that I tried it for [Bug #11384]
> > without success, but Redmine + list integration was broken at the
> > time.
>
> Ah indeed I missed that, thanks.
> Did you try for this issue in particular?

Ah, yes, the repro script in [ruby-core:70197] does get fixed on my
machine with the patch. I don't understand this code enough to
know if it breaks anything else, or if #11384 is a different bug
or a different manifestation of the same bug.

Updated by eugeneius (Eugene Kenny) over 7 years ago #7 [ruby-core:86932]

The simpler repro script runs successfully from 2.3.0 onwards, and git bisect between 2.2.0 and 2.3.0 shows that r59221 (from #11384) fixed it.

Updated by jeremyevans0 (Jeremy Evans) over 6 years ago #8

  • Status changed from Open to Closed