Project

General

Profile

Bug #12689

Thread isolation of $~ and $_

Added by headius (Charles Nutter) almost 3 years ago. Updated over 1 year ago.

Status:
Open
Priority:
Normal
Assignee:
-
Target version:
-
[ruby-core:76976]

Description

We are debating what is correct behavior now, and what should be correct behavior in the future, for the thread-visibility of the special variables %~ and $_

We have several examples from https://github.com/jruby/jruby/issues/3031 that seem to exhibit conflicting behavior...or at least the behavior is unexpected in many cases.

$ ruby23 -e 'p = proc { p $~; "foo" =~ /foo/ }; Thread.new {p.call}.join; Thread.new{p.call}.join'
nil
nil

$ ruby23 -e 'def foo; proc { p $~; "foo" =~ /foo/ }; end; p = foo; Thread.new {p.call}.join; Thread.new{p.call}.join'
nil
#<MatchData "foo">

$ ruby23 -e 'p = proc { p $~; "foo" =~ /foo/ }; def foo(p); Thread.new {p.call}.join; Thread.new{p.call}.join; end; foo(p)'
nil
#<MatchData "foo">

$ ruby23 -e 'class Foo; P = proc { p $~; "foo" =~ /foo/ }; def foo; Thread.new {P.call}.join; Thread.new{P.call}.join; end; end; Foo.new.foo'
nil
#<MatchData "foo">

$ ruby23 -e 'def foo; p = proc { p $~; "foo" =~ /foo/ }; Thread.new {p.call}.join; Thread.new{p.call}.join; end; foo'
nil
nil

$ ruby23 -e 'def foo; p = proc { p $~; "foo" =~ /foo/ }; bar(p); end; def bar(p); Thread.new {p.call}.join; Thread.new{p.call}.join; end; foo'
nil
#<MatchData "foo">

These cases exhibit some oddities in whether $~ (and presumably $_) are shared across threads.

The immediate thought is that they should be both frame and thread-local...but ko1 points out that such a change would break cases like this:

def foo
  /foo/ =~ 'foo'
  Proc.new{
    p $~
  }
end

Thread.new{
  foo.call
}.join

So there's a clear conflict here. Users sometimes expect the $~ value to be shared across threads (at least for read, as in ko1's example) and sometimes do not want it shared at all (as in the case of https://github.com/jruby/jruby/issues/3031

Now we discuss.

History

Updated by headius (Charles Nutter) almost 3 years ago

To clarify the one-liners' behavior: when the thread's top-level frame is the same as a proc's frame that it calls, it will see thread-local values. When the proc's frame is not the top-level frame for the thread, the memory location for $~ will be shared across all threads.

Updated by Eregon (Benoit Daloze) almost 3 years ago

Maybe $~ is always set in the surrounding method frame, but never in a block frame?
There is still a lot of weird cases to explain though.

Updated by darix (Marcus Rückert) almost 3 years ago

I wonder, if moving away from those special $ variables to explicit match objects wouldn't be a possible solution to this.

Updated by headius (Charles Nutter) almost 3 years ago

Marcus Rückert wrote:

I wonder, if moving away from those special $ variables to explicit match objects wouldn't be a possible solution to this.

If you always use the returned MatchData then you can avoid these problems. This only affects consumers of the implicit $~ variable.

Unfortunately, that also includes some core methods that access the $_ variable, so there's possibility of steppping on threading even if you never use the implicit variables in your code.

Updated by darix (Marcus Rückert) almost 3 years ago

That's why i would deprecate the $ variables and make people use match objects all the time.

I mean the stdlib even has code that reads

matchdata = $~

That feels just wrong.

Maybe 2.4 could start issue warnings about using $ variables and 3.0 removes them?

Updated by naruse (Yui NARUSE) almost 3 years ago

Below example shows 2nd thread overwrites 1st thread's regexp match result.

% ruby -e 'P = proc {|s| p [s, $~]; sleep 1; /foo.*/=~s; sleep 1; p [s,$~] }; def foo; Thread.new{P.call("foobar")}; sleep 0.2; Thread.new{P.call("foo")}; end; foo;sleep 5'
["foobar", nil]
["foo", nil]
["foobar", #<MatchData "foo">]
["foo", #<MatchData "foo">]

This example doesn't happen above phenomena different from above one.

% ruby -e 'P = proc {|s| p [s, $~]; sleep 1; /foo.*/=~s; sleep 1; p [s,$~] }; Thread.new{P.call("foobar")}; sleep 0.2; Thread.new{P.call("foo")}; sleep 5'
["foobar", nil]
["foo", nil]
["foobar", #<MatchData "foobar">]
["foo", #<MatchData "foo">]

Updated by headius (Charles Nutter) over 1 year ago

We've had another report in JRuby about this behavior. In this case, two threads doing String#split step on each others backrefs because they share a backref frame: https://github.com/jruby/jruby/issues/4868

This case can't even be avoided. Even if you don't use $~ there are threading issues. These may come into play for MRI in either split or other methods that consume backref and lastline, but they'll certainly be a problem for all parallel-threaded implementations that wish to be compatible.

Updated by Eregon (Benoit Daloze) over 1 year ago

FWIW, TruffleRuby always stores the MatchData $? in a thread-local storage per frame.
It seems to work fine so far and seems to cause no real-world incompatibilities.

Updated by ko1 (Koichi Sasada) over 1 year ago

Eregon (Benoit Daloze) wrote:

FWIW, TruffleRuby always stores the MatchData $? in a thread-local storage per frame.
It seems to work fine so far and seems to cause no real-world incompatibilities.

Each frame has a map (thread -> MachData)?

Updated by Eregon (Benoit Daloze) over 1 year ago

ko1 (Koichi Sasada) wrote:

Each frame has a map (thread -> MatchData)?

Conceptually yes, but it is allocated lazily and it specializes for being accessed by a single thread.
A Java ThreadLocal is used in the general case when more than one thread stores a MatchData in a frame.
https://github.com/graalvm/truffleruby/blob/vm-enterprise-0.29/src/main/java/org/truffleruby/language/threadlocal/ThreadAndFrameLocalStorage.java

Also available in: Atom PDF