Project

General

Profile

Actions

Bug #18663

closed

Autoload doesn't work with fiber context switch.

Added by ioquatix (Samuel Williams) 3 months ago. Updated about 2 months ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Target version:
-
[ruby-core:108075]

Description

As discussed most recently here: https://github.com/ruby/debug/issues/580

The following program appears to work:

#!/usr/bin/env ruby

require 'tempfile'

Tempfile.create(['foo', '.rb']) do |file|
  file.write(<<~RUBY)
    #
    $stderr.puts 1; q = Queue.new
    $stderr.puts 2; t = Thread.new{q.pop}
    $stderr.puts 3; q << :sig
    $stderr.puts 4; t.join

    sleep 1

    class C
    end
  RUBY
  file.close

  autoload :C, file.path

  Thread.new do
    threads = 3.times.map do |i|
      Thread.new do
        $stderr.puts "LOADING C" 
        $stderr.puts C
      end
    end

    threads.each(&:join)
  end.join
end

This one doesn't:

#!/usr/bin/env ruby

require 'tempfile'
require_relative 'lib/async'

Tempfile.create(['foo', '.rb']) do |file|
  file.write(<<~RUBY)
    #
    $stderr.puts 1; q = Queue.new
    $stderr.puts 2; t = Thread.new{q.pop}
    $stderr.puts 3; q << :sig
    $stderr.puts 4; t.join

    class C
    end
  RUBY
  file.close

  autoload :C, file.path

  Async do |task|
    3.times do |i|
      task.async do
        $stderr.puts "LOADING C" 
        $stderr.puts C
      end
    end
  end.wait
end

Semantically, they should be very similar. It feels like someone is checking the current thread rather than the current fiber or there is a poor implementation of locking somewhere, however I don't actually know for sure yet, investigation is required.


Related issues 1 (0 open1 closed)

Has duplicate Ruby master - Misc #18662: Fiber scheduling and Module#autoloadClosedActions

Updated by ioquatix (Samuel Williams) 3 months ago

Pure fiber repro.

#!/usr/bin/env ruby

require 'tempfile'

Tempfile.create(['foo', '.rb']) do |file|
  file.write(<<~RUBY)
    Fiber.yield

    class C
    end
  RUBY
  file.close

  autoload :C, file.path

  3.times do |i|
    fiber = Fiber.new do
      $stderr.puts "#{i} LOADING C" 
      $stderr.puts C
    rescue NameError
      $stderr.puts "NameError: #{i}"
      raise
    end

    fiber.resume
  end
end

Updated by ioquatix (Samuel Williams) 3 months ago

/* always on stack, no need to mark */
struct autoload_state {
    struct autoload_const *ac;
    VALUE result;
    VALUE thread;
    struct list_head waitq;
};

I'm suspicious of VALUE thread.

Updated by ioquatix (Samuel Williams) 3 months ago

Autoload uses a spin lock:

static VALUE
autoload_sleep(VALUE arg)
{
    struct autoload_state *state = (struct autoload_state *)arg;

    /*
     * autoload_reset in other thread will resume us and remove us
     * from the waitq list
     */
    do {
	rb_thread_sleep_deadly();
    } while (state->thread != Qfalse);

    return Qfalse;
}

It basically spins on state->thread.

I hesitate to call it poorly implemented, but it doesn’t look great. I don’t fully understand the implementation yet.

Updated by ioquatix (Samuel Williams) 3 months ago

@fxn (Xavier Noria) If I understand correctly, autoload is mostly a feature of a development environment, so if we made this a little bit slower in order to improve accuracy (i.e. using a proper mutex), it wouldn't be huge loss right? Ruby's mutex isn't that slow but I realise if we took this route, it would introduce some overhead.

Updated by fxn (Xavier Noria) 3 months ago

@ioquatix (Samuel Williams) hmmm, let me explain.

The feature in development for a web application is reloading. Ordinary gems using Zeitwerk may autoload in any project they are used, and when you eager load Rails in production you may need to autoload top-level constants like superclasses or mixins while eager loading, for example.

However, as the say goes, I can be fast if I can be wrong :). I'd go with that mutex, probably not a big deal, and fibers are key for the future of Ruby, so better be compatible in my view.

Updated by ioquatix (Samuel Williams) 3 months ago

I've tracked down the root of this bug, being that it's not yielding to the fiber scheduler and implements it's own condition variable like semantics. I'll propose a fix.

Actions #7

Updated by Eregon (Benoit Daloze) 3 months ago

  • Has duplicate Misc #18662: Fiber scheduling and Module#autoload added

Updated by ioquatix (Samuel Williams) about 2 months ago

https://github.com/ruby/ruby/pull/5788 fixes this issue.

I've confirmed that my PR fixes the given examples here.

There is a tiny bit of extra overhead; using a mutex has an object allocation, mutex lock and unlock, etc.

A light weight concurrency construct rb_condition or rb_fiber_condition that behaves like a "pessimistic mutex", i.e. avoids the allocation until it's actually needed might be a better solution but let's get it correct first, and we can optimise it later.

Just a few notes:

struct rb_condition {
  ... waitq;
}

VALUE rb_condition_wait(struct rb_condition *condition, VALUE timeout);
rb_condition_signal(struct rb_condition *condition, VALUE result); // wake up all waiters
Actions #10

Updated by nagachika (Tomoyuki Chikanaga) about 2 months ago

  • Backport changed from 2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN to 2.7: DONTNEED, 3.0: REQUIRED, 3.1: REQUIRED

Updated by ioquatix (Samuel Williams) about 2 months ago

  • Status changed from Open to Closed

I've merged the fix.

Actions

Also available in: Atom PDF