Bug #10892

Deadlock in autoload

Added by Eregon (Benoit Daloze) over 4 years ago. Updated 5 months ago.

Status: Closed
Priority: Normal
Assignee: -
Target version: -
ruby -v: ruby 2.3.0dev (2015-02-23 trunk 49693) [x86_64-linux]
[ruby-core:68255]

Description

Updating to a recent RubySpec appears to expose a bug in concurrent autoload.
I have attached the extracted logic as a reproduction script.

For me, the script ends, in most cases, with:

autoload_bug.rb:105:in `value': No live threads left. Deadlock? (fatal)
    from autoload_bug.rb:105:in `map'
    from autoload_bug.rb:105:in `<main>'

Or:

autoload_bug.rb:95:in `const_get': uninitialized constant Mod1 (NameError)
    from autoload_bug.rb:95:in `block (3 levels) in <main>'
    from autoload_bug.rb:86:in `each'
    from autoload_bug.rb:86:in `block (2 levels) in <main>'

Both seem to be incorrect behavior.
All versions from 2.0 onward seem affected; 1.9.3 also seems to behave incorrectly, but in a different way.

Could someone confirm this is a bug?
Is it likely to be fixed?


Files

autoload_bug.rb (2.18 KB), Eregon (Benoit Daloze), 02/23/2015 12:34 PM
0001-load.c-unlock-the-new-shield.patch (1005 Bytes), thedarkone (Vit Z), 07/31/2015 04:53 AM

Related issues

Related to Ruby master - Bug #7530: Concurrent loads fail with mutex errors (Closed)

History

Updated by thedarkone (Vit Z) over 4 years ago

I wrote that broken rubyspec. The problem lies in repeatedly autoloading the same .rb file. Since this should normally be impossible, the spec manually deletes the loaded path from $LOADED_FEATURES and then re-declares the autoload. This is currently broken on MRI.

Here's a much smaller repro script:

# Writes a temporary file that defines const_name, registers it as an autoload,
# and makes sure the file is not already marked as loaded.
def with_autoload_file(const_name, file_name = 'foo.rb')
  mangled_file_name = file_name.sub(/\.rb\Z/, '____temp____autoload.rb') # avoid accidentally overwriting any files
  full_path = File.expand_path(mangled_file_name)
  File.write(mangled_file_name, "sleep 1; module #{const_name}; end") # the sleep keeps the load in progress for a while
  autoload const_name, full_path.sub(/\.rb\Z/, '')
  $LOADED_FEATURES.delete(full_path) # no-op if absent; lets the same file be loaded again
  yield
ensure
  File.delete(mangled_file_name)
end

foo_ready = bar_waiting = bar_ready = false
t = Thread.new do
  Thread.pass until foo_ready
  Foo
  bar_waiting = true
  Thread.pass until bar_ready
  Bar
end

with_autoload_file('Foo') do
  foo_ready = true
  Foo
end

Thread.pass until bar_waiting

with_autoload_file('Bar') do
  bar_ready = true
  Bar
end

t.join

Running this results in an "uninitialized constant Bar" exception from the non-main thread.

If the last block is rearranged like this:

with_autoload_file('Bar') do
  Bar
  bar_ready = true
end

the script deadlocks: the main thread blocks, while the secondary thread t busy-spins in Thread.pass until bar_ready.

If the last autoload block uses a different .rb file, everything works fine:

with_autoload_file('Bar', 'bar.rb') do
  Bar
  bar_ready = true
end

I think I've tracked the issue down to an incorrectly locked thread_shield in load_lock. When rb_thread_shield_wait() returns Qfalse, the failed thread creates a new thread_shield via rb_thread_shield_new(). However, rb_thread_shield_new() automatically locks the newly created shield, and because that branch does not return a successful ftptr, the newly installed shield is never unlocked.

The attached patch seems to fix the issue for me.
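
To make the suspected lock leak easier to picture, here is a toy Ruby model of the pattern described above. It is only a sketch: ToyShield, ToyLoadTable, retry_after_failed_wait and leaked_lock? are hypothetical names invented for illustration, and nothing here reproduces MRI's load.c or the attached patch. It only assumes what the comment states, namely that a freshly created shield starts out locked and that the failing branch installs it without returning success.

```ruby
# Toy model only: ToyShield and ToyLoadTable are hypothetical names,
# not MRI's load.c or its thread_shield implementation.
class ToyShield
  def initialize
    @locked = true # created already locked, as described for rb_thread_shield_new()
  end

  def locked?
    @locked
  end

  def release
    @locked = false
  end
end

class ToyLoadTable
  def initialize(unlock_on_failure:)
    @shields = {}
    @unlock_on_failure = unlock_on_failure
  end

  # Models the branch taken after waiting on the old shield failed:
  # a fresh shield is installed, but no successful result is returned.
  def retry_after_failed_wait(feature)
    shield = ToyShield.new
    @shields[feature] = shield
    shield.release if @unlock_on_failure # the idea suggested by the patch title
    nil
  end

  # A later loader of the same feature would block on a shield that is still locked.
  def would_block?(feature)
    shield = @shields[feature]
    !shield.nil? && shield.locked?
  end
end

# Returns true when the installed shield is left locked after the failing branch.
def leaked_lock?(unlock_on_failure)
  table = ToyLoadTable.new(unlock_on_failure: unlock_on_failure)
  table.retry_after_failed_wait('foo')
  table.would_block?('foo')
end

puts "without unlock: #{leaked_lock?(false)}"
puts "with unlock:    #{leaked_lock?(true)}"
```

In this model the shield stays locked unless the failing branch releases it, which is the kind of state that would leave a later loader of the same file waiting indefinitely.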

Updated by nobu (Nobuyoshi Nakada) over 4 years ago

  • Related to Bug #7530: Concurrent loads fail with mutex errors added

Updated by Eregon (Benoit Daloze) about 4 years ago

Could someone review the patch and apply it or find an alternative fix?

Updated by normalperson (Eric Wong) about 4 years ago

eregontp@gmail.com wrote:

Could someone review the patch and apply it or find an alternative fix?

Fwiw, I mentioned in [ruby-core:70359] that I tried it for [Bug #11384]
without success, but Redmine + list integration was broken at the
time.

Updated by Eregon (Benoit Daloze) about 4 years ago

On Wed, Oct 14, 2015 at 9:56 PM, Eric Wong normalperson@yhbt.net wrote:

Fwiw, I mentioned in [ruby-core:70359] that I tried it for [Bug #11384]
without success, but Redmine + list integration was broken at the
time.

Ah indeed I missed that, thanks.
Did you try for this issue in particular?

About #11384, I guess we need another fix then :/

Updated by normalperson (Eric Wong) about 4 years ago

Benoit Daloze eregontp@gmail.com wrote:

On Wed, Oct 14, 2015 at 9:56 PM, Eric Wong normalperson@yhbt.net wrote:

Fwiw, I mentioned in [ruby-core:70359] that I tried it for [Bug #11384]
without success, but Redmine + list integration was broken at the
time.

Ah indeed I missed that, thanks.
Did you try for this issue in particular?

Ah, yes, the repro script in [ruby-core:70197] does get fixed on my
machine with the patch. I don't understand this code enough to
know if it breaks anything else, or if #11384 is a different bug
or a different manifestation of the same bug.

Updated by eugeneius (Eugene Kenny) over 1 year ago

The simpler repro script runs successfully from 2.3.0 onwards, and git bisect between 2.2.0 and 2.3.0 shows that r59221 (from #11384) fixed it.

Updated by jeremyevans0 (Jeremy Evans) 5 months ago

  • Status changed from Open to Closed
