Bug #921 [ruby-core:20797]

autoload is not thread-safe

Added by Charles Nutter 193 days ago. Updated 151 days ago.

Status :Open Start :12/23/2008
Priority :Normal Due date :
Assigned to :Nobuyoshi Nakada % Done :

0%

Category :-
Target version :-
ruby -v :

-


Description

Currently autoload is not safe to use in a multi-threaded application. To put it more bluntly, it's broken.

The current logic for autoload is as follows:

1. A special object is inserted into the target constant table, used as a marker for autoloading
2. When that constant is looked up, the marker is found and triggers autoloading
3. The marker is first removed, so the constant now appears to be undefined if retrieved concurrently
4. The associated autoload resource is required, and presumably redefines the constant in question
5. The constant lookup, upon completion of autoload, looks up the constant again and either returns its new value or proceeds with normal constant resolution

The problem arises when two or more threads try to access the constant. Because autoload is stateful and unsynchronized, the second thread may encounter the constant table in any number of states:

1. It may see the autoload has not yet fired, if the first thread has encountered the marker but not yet removed it. It would then proceed along the same autoload path, requiring the same file a second time.
2. It may not find an autoload marker, and assume the constant does not exist.
3. It may see the eventual constant the autoload was intended to define.

Of these combinations, (3) is obviously the desired behavior. (1) can only happen on native-threaded implementations that do not have a global interpreter lock, since it requires concurrency during autoload's internal logic. (2) can happen on any implementation, since while the required file is processing the original autoload constant appears to be undefined.

I have only come up with two solutions:

* When the autoload marker is encountered, it is replaced (under lock) with an "autoload in progress" marker. All subsequent threads will then see this marker and wait for the autoloading process to complete. the mechanics of this are a little tricky, but it would guarantee concurrent autoloads would only load the target file once and would always return the intended value to concurrent readers.
* A single autoload mutex, forcing all autoloads to happen in serial.

There is a potential for deadlock in the first solution, unfortunately, since two threads autoloading two constants with circular autoloaded constant dependencies would ultimately deadlock, each waiting for the other to complete. Because of this, a single autoload mutex for all autoloads may be the only safe solution.

History

12/23/2008 03:05 AM - Dave Thomas

On Dec 22, 2008, at 11:10 AM, Charles Nutter wrote:

> * When the autoload marker is encountered, it is replaced (under  
> lock) with an "autoload in progress" marker. All subsequent threads  
> will then see this marker and wait for the autoloading process to  
> complete. the mechanics of this are a little tricky, but it would  
> guarantee concurrent autoloads would only load the target file once  
> and would always return the intended value to concurrent readers.
> * A single autoload mutex, forcing all autoloads to happen in serial.
>
> There is a potential for deadlock in the first solution,  
> unfortunately, since two threads autoloading two constants with  
> circular autoloaded constant dependencies would ultimately deadlock,  
> each waiting for the other to complete. Because of this, a single  
> autoload mutex for all autoloads may be the only safe solution.

Why not use the same lock that's used by require? The two are clearly  
related, and if there's a require in progress, you'd want to suspend  
autoload until it has finished, and the result of the require might  
affect the behavior of the autoload.


Dave

12/23/2008 04:52 AM - Charles Nutter

Dave Thomas wrote:
> Why not use the same lock that's used by require? The two are clearly 
> related, and if there's a require in progress, you'd want to suspend 
> autoload until it has finished, and the result of the require might 
> affect the behavior of the autoload.

It probably could use the same lock; the trickier bit is that in between 
encountering the autoload and requiring the file there's already state 
changes happening. So I think we still need to ensure the autoload 
marker is mutated under lock as well.

To be honest, I'm not sure autoload is really even possible to make safe 
in the presence of threads...the design of it may be inherently 
incompatible. But I'm willing to talk through some possibilities and see 
what we can come up with.

12/23/2008 11:02 PM - Dave Thomas

On Dec 22, 2008, at 1:44 PM, Charles Oliver Nutter wrote:

> It probably could use the same lock; the trickier bit is that in  
> between encountering the autoload and requiring the file there's  
> already state changes happening. So I think we still need to ensure  
> the autoload marker is mutated under lock as well.
>
> To be honest, I'm not sure autoload is really even possible to make  
> safe in the presence of threads...the design of it may be inherently  
> incompatible. But I'm willing to talk through some possibilities and  
> see what we can come up with.

Yeah--you'd need to make require transactional somehow.

Stupid question: does anyone actually use autoload?

Dave

12/24/2008 04:34 AM - Eric Hodel

On Dec 23, 2008, at 05:53 AM, Dave Thomas wrote:
> On Dec 22, 2008, at 1:44 PM, Charles Oliver Nutter wrote:
>> It probably could use the same lock; the trickier bit is that in  
>> between encountering the autoload and requiring the file there's  
>> already state changes happening. So I think we still need to ensure  
>> the autoload marker is mutated under lock as well.
>>
>> To be honest, I'm not sure autoload is really even possible to make  
>> safe in the presence of threads...the design of it may be  
>> inherently incompatible. But I'm willing to talk through some  
>> possibilities and see what we can come up with.
>
> Yeah--you'd need to make require transactional somehow.
>
> Stupid question: does anyone actually use autoload?

RubyGems now does, but the code wasn't introduced by me.

12/24/2008 06:27 AM - Roger Pack

> Currently autoload is not safe to use in a multi-threaded application.

My preference would be for it at most one thread to be "in an
autoload" at a time, so similar to require, no chance for confusion.
But then again that's just me :)

-r-

12/24/2008 07:18 AM - Austin Ziegler

On Tue, Dec 23, 2008 at 3:51 PM, Christian Neukirchen
<chneukirchen@gmail.com> wrote:
> Dave Thomas <dave@pragprog.com> writes:
>> On Dec 22, 2008, at 1:44 PM, Charles Oliver Nutter wrote:
>>> It probably could use the same lock; the trickier bit is that in
>>> between encountering the autoload and requiring the file there's
>>> already state changes happening. So I think we still need to ensure
>>> the autoload marker is mutated under lock as well.
>>>
>>> To be honest, I'm not sure autoload is really even possible to make
>>> safe in the presence of threads...the design of it may be inherently
>>> incompatible. But I'm willing to talk through some possibilities and
>>> see what we can come up with.
>>
>> Yeah--you'd need to make require transactional somehow.
>>
>> Stupid question: does anyone actually use autoload?
>
> Rack does.  Merb does as well.

So does RbGCCXML, a random library that I'm using (I see no point as
to why it's doing so, to be honest).

-austin
-- 
Austin Ziegler * halostatue@gmail.com * http://www.halostatue.ca/
               * austin@halostatue.ca * http://www.halostatue.ca/feed/
               * austin@zieglers.ca

12/24/2008 07:25 AM - Charles Nutter

Dave Thomas wrote:
>> To be honest, I'm not sure autoload is really even possible to make 
>> safe in the presence of threads...the design of it may be inherently 
>> incompatible. But I'm willing to talk through some possibilities and 
>> see what we can come up with.
> 
> Yeah--you'd need to make require transactional somehow.

Yeah, that's essentially it (though it's more specific to autoload). 
You'd need to lock the constant under autoload in such a way that other 
threads would block until autoloading completed. And unless it were a 
single global lock for all autoload logic, there's a strong potential 
for deadlock if two threads encounter opposing constants at the same time.

> Stupid question: does anyone actually use autoload?

They do, but they probably shouldn't. It's broken.

12/24/2008 08:13 AM - Dave Thomas

On Dec 23, 2008, at 4:16 PM, Charles Oliver Nutter wrote:

>> Yeah--you'd need to make require transactional somehow.
>
> Yeah, that's essentially it (though it's more specific to autoload).  
> You'd need to lock the constant under autoload in such a way that  
> other threads would block until autoloading completed. And unless it  
> were a single global lock for all autoload logic, there's a strong  
> potential for deadlock if two threads encounter opposing constants  
> at the same time.

Would it be sufficient to add field to each constant to say "loading  
in progress in thread n"

Have the lock we've been talking about in autoload and require.

During require, set the current thread id into every newly defined  
constant, keeping a list of the ones that are set. Maintain a stack of  
the current requires. (These will all be in the same thread because of  
the mutex).

When the last require finishes, go though all constants in the list  
and unset the thread id in them. Finally, signal a condition variable.

Constant lookup then works like this: If the constant isn't defined,  
claim the require mutex and then check for autoloading stuff.

If the constant is defined, check to see if the thread id is set in  
the constant. If it is, wait on the condition variable. That way, you  
won't pick up partially initialized constants that are being  
autoloaded or required by other threads.

And, I know, this sounds ridiculously complex. But the only  
alternative I see is locking the entire interpreter during requires.



Dave

12/24/2008 11:51 AM - Nobuyoshi Nakada

Hi,

At Tue, 23 Dec 2008 02:10:34 +0900,
Charles Nutter wrote in [ruby-core:20797]:
> The current logic for autoload is as follows:

This description is old.  Already autoload sees the same lock
used by require, as Dave Thomas mentioned in [ruby-core:20801].

So I guess the autoload specific issue has gone, and we could
focus on the issue of recursive/circular require, i.e.,
[ruby-core:20794].

-- 
Nobu Nakada

12/24/2008 03:05 PM - Charles Nutter

Nobuyoshi Nakada wrote:
> Hi,
> 
> At Tue, 23 Dec 2008 02:10:34 +0900,
> Charles Nutter wrote in [ruby-core:20797]:
>> The current logic for autoload is as follows:
> 
> This description is old.  Already autoload sees the same lock
> used by require, as Dave Thomas mentioned in [ruby-core:20801].
> 
> So I guess the autoload specific issue has gone, and we could
> focus on the issue of recursive/circular require, i.e.,
> [ruby-core:20794].

So if two threads lookup the same autoload constant at the same time, is 
the second thread guaranteed to wait until the first thread has finished 
requiring?

I was not aware this had been fixed or decided.

12/25/2008 04:56 AM - Nobuyoshi Nakada

Hi,

At Wed, 24 Dec 2008 14:56:37 +0900,
Charles Oliver Nutter wrote in [ruby-core:20853]:
> > So I guess the autoload specific issue has gone, and we could
> > focus on the issue of recursive/circular require, i.e.,
> > [ruby-core:20794].
> 
> So if two threads lookup the same autoload constant at the same time, is 
> the second thread guaranteed to wait until the first thread has finished 
> requiring?

It's same as two threads requiring without autoloading.

-- 
Nobu Nakada

12/25/2008 08:33 AM - Charles Nutter

Nobuyoshi Nakada wrote:
> It's same as two threads requiring without autoloading.

I'm not sure this is safe enough. If the first thread has already set 
the constant, the second thread would not know it is an autoload and 
would return its value immediately. The resource the constant points at 
may not be done loading.

my_file.rb:

module XXX
   class Foo
     # HERE
     def bar; ... end
   end
end

my_autoload.rb:

module XXX
   autoload :Foo, "my_file.rb"
end

# Thread 1
Thread.new { XXX::Foo.new.bar }

# Thread 2
Thread.new { XXX::Foo.new.bar }

Thread 1 tries to look up XXX::Foo and starts requiring my_file.rb. It 
reaches # HERE above and the XXX::Foo constant now points at the Foo 
class. Now thread 2 looks up XXX::Foo, and since it is not an autoload 
marker it gets the incomplete Foo class. Thread 2 blows up when it tries 
to call bar because bar is not yet defined.

12/27/2008 03:37 PM - Stephen Bannasch

At 10:53 PM +0900 12/23/08, Dave Thomas wrote:
> Stupid question: does anyone actually use autoload?

Since the 2.2.2 release the rails trunk codebase has been doing a mass replacement of require with autoload.

02/03/2009 10:30 AM - Shyouhei Urabe

  • Assigned to set to Nobuyoshi Nakada
  • ruby -v set to -

Also available in: Atom PDF