Bug #12689
Thread isolation of $~ and $_ (status: open)
Description
We are debating what the correct behavior is now, and what it should be in the future, regarding the thread visibility of the special variables $~ and $_.
We have several examples from https://github.com/jruby/jruby/issues/3031 that seem to exhibit conflicting behavior, or at least behavior that is unexpected in many cases.
$ ruby23 -e 'p = proc { p $~; "foo" =~ /foo/ }; Thread.new {p.call}.join; Thread.new{p.call}.join'
nil
nil
$ ruby23 -e 'def foo; proc { p $~; "foo" =~ /foo/ }; end; p = foo; Thread.new {p.call}.join; Thread.new{p.call}.join'
nil
#<MatchData "foo">
$ ruby23 -e 'p = proc { p $~; "foo" =~ /foo/ }; def foo(p); Thread.new {p.call}.join; Thread.new{p.call}.join; end; foo(p)'
nil
#<MatchData "foo">
$ ruby23 -e 'class Foo; P = proc { p $~; "foo" =~ /foo/ }; def foo; Thread.new {P.call}.join; Thread.new{P.call}.join; end; end; Foo.new.foo'
nil
#<MatchData "foo">
$ ruby23 -e 'def foo; p = proc { p $~; "foo" =~ /foo/ }; Thread.new {p.call}.join; Thread.new{p.call}.join; end; foo'
nil
nil
$ ruby23 -e 'def foo; p = proc { p $~; "foo" =~ /foo/ }; bar(p); end; def bar(p); Thread.new {p.call}.join; Thread.new{p.call}.join; end; foo'
nil
#<MatchData "foo">
These cases exhibit some oddities in whether $~ (and presumably $_) are shared across threads.
The immediate thought is that they should be both frame-local and thread-local, but ko1 points out that such a change would break cases like this:
def foo
  /foo/ =~ 'foo'
  Proc.new{
    p $~
  }
end
Thread.new{
  foo.call
}.join
So there's a clear conflict here. Users sometimes expect the $~ value to be shared across threads (at least for reading, as in ko1's example) and sometimes do not want it shared at all (as in the case of https://github.com/jruby/jruby/issues/3031).
Let's discuss.
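For comparison, the use case in ko1's example (a proc created in a method that reads that method's match result from another thread) can be written without relying on $~ at all. The MatchData is captured in an ordinary local variable that the proc closes over. This is a minimal sketch; `foo_explicit` is a hypothetical name used only for illustration.

```ruby
# Capture the match result in a local variable instead of relying on $~.
# A closure's captured locals are visible to whichever thread calls the
# proc, so reading them from another thread is well defined.
def foo_explicit
  md = /foo/.match('foo') # Regexp#match returns the MatchData (or nil)
  Proc.new { md }         # the proc closes over md, not over the frame's $~
end

result = Thread.new { foo_explicit.call }.value # Thread#value joins and returns
p result # => #<MatchData "foo">
```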
Updated by headius (Charles Nutter) over 8 years ago
To clarify the one-liners' behavior: when the thread's top-level frame is the same frame the called proc belongs to, the proc sees thread-local values. When the proc's frame is not the thread's top-level frame, the memory location for $~ is shared across all threads.
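To make that rule concrete, here is a toy Ruby model of the lookup as described in this comment. It is not MRI's implementation. `Frame`, `ToyThread`, `backref_owner`, `toy_match` and `toy_read` are hypothetical names. In the model, each frame owns one $~ slot, and each thread owns one extra slot that is used only when the proc's frame is that thread's top-level frame.

```ruby
# Toy model of the $~ lookup rule described above (not MRI's actual code).
Frame = Struct.new(:name, :backref)          # one $~ slot per frame
ToyThread = Struct.new(:top_frame, :backref) # plus one thread-local slot

# Decide which object holds $~ for a proc whose frame is `frame`
# when it runs on `thread`.
def backref_owner(frame, thread)
  frame.equal?(thread.top_frame) ? thread : frame
end

def toy_match(frame, thread, value)
  backref_owner(frame, thread).backref = value
end

def toy_read(frame, thread)
  backref_owner(frame, thread).backref
end

# Case 1: the proc's frame is each thread's top-level frame.
top = Frame.new(:top, nil)
t1 = ToyThread.new(top, nil)
t2 = ToyThread.new(top, nil)
toy_match(top, t1, :md1)
p toy_read(top, t2) # => nil (each thread uses its own slot)

# Case 2: the proc's frame is a method frame, not the threads' top frame.
meth = Frame.new(:foo, nil)
toy_match(meth, t1, :md1)
p toy_read(meth, t2) # => :md1 (the slot lives in the frame and is shared)
```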
Updated by Eregon (Benoit Daloze) over 8 years ago
Maybe $~ is always set in the surrounding method frame, but never in a block frame?
There are still a lot of weird cases to explain, though.
Updated by darix (Marcus Rückert) over 8 years ago
I wonder whether moving away from those special $ variables to explicit match objects would be a possible solution to this.
Updated by headius (Charles Nutter) over 8 years ago
Marcus Rückert wrote:
I wonder, if moving away from those special $ variables to explicit match objects wouldn't be a possible solution to this.
If you always use the returned MatchData, you can avoid these problems; they only affect consumers of the implicit $~ variable.
Unfortunately, that also includes some core methods that access the $_ variable, so there is a possibility of threads stepping on each other even if you never use the implicit variables in your code.
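The "always use the returned MatchData" approach can be sketched as follows. Each thread keeps its result in a block-local variable and never reads $~. `Regexp#match` may still set $~ as a side effect, but nothing here depends on it.

```ruby
# Each thread keeps its own MatchData in a block-local variable, so its
# result cannot be overwritten by another thread's match, whatever the
# interpreter does with the frame's $~.
inputs = %w[foobar foo foobaz]

threads = inputs.map do |s|
  Thread.new do
    md = /foo.*/.match(s) # use the return value, never read $~
    [s, md && md[0]]
  end
end

results = threads.map(&:value)
p results # => [["foobar", "foobar"], ["foo", "foo"], ["foobaz", "foobaz"]]
```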
Updated by darix (Marcus Rückert) over 8 years ago
That's why I would deprecate the $ variables and make people use match objects all the time.
The stdlib even has code that reads
matchdata = $~
That just feels wrong.
Maybe 2.4 could start issuing warnings about using $ variables, and 3.0 could remove them?
Updated by naruse (Yui NARUSE) over 8 years ago
The example below shows the second thread overwriting the first thread's regexp match result.
% ruby -e 'P = proc {|s| p [s, $~]; sleep 1; /foo.*/=~s; sleep 1; p [s,$~] }; def foo; Thread.new{P.call("foobar")}; sleep 0.2; Thread.new{P.call("foo")}; end; foo;sleep 5'
["foobar", nil]
["foo", nil]
["foobar", #<MatchData "foo">]
["foo", #<MatchData "foo">]
The following example, which starts the threads at the top level instead of inside a method, does not exhibit this phenomenon.
% ruby -e 'P = proc {|s| p [s, $~]; sleep 1; /foo.*/=~s; sleep 1; p [s,$~] }; Thread.new{P.call("foobar")}; sleep 0.2; Thread.new{P.call("foo")}; sleep 5'
["foobar", nil]
["foo", nil]
["foobar", #<MatchData "foobar">]
["foo", #<MatchData "foo">]
Updated by headius (Charles Nutter) about 7 years ago
We've had another report in JRuby about this behavior. In this case, two threads doing String#split step on each other's backrefs because they share a backref frame: https://github.com/jruby/jruby/issues/4868
This case can't even be avoided: even if you don't use $~, there are threading issues. These may come into play for MRI in split or other methods that consume backref and lastline, but they will certainly be a problem for all parallel-threaded implementations that wish to be compatible.
Updated by Eregon (Benoit Daloze) about 7 years ago
FWIW, TruffleRuby always stores the MatchData ($~) in thread-local storage per frame.
It seems to work fine so far and seems to cause no real-world incompatibilities.
Updated by ko1 (Koichi Sasada) about 7 years ago
Eregon (Benoit Daloze) wrote:
FWIW, TruffleRuby always stores the MatchData ($~) in thread-local storage per frame.
It seems to work fine so far and seems to cause no real-world incompatibilities.
Each frame has a map (thread -> MatchData)?
Updated by Eregon (Benoit Daloze) about 7 years ago
ko1 (Koichi Sasada) wrote:
Each frame has a map (thread -> MatchData)?
Conceptually, yes, but it is allocated lazily and is specialized for access by a single thread.
A Java ThreadLocal is used in the general case when more than one thread stores a MatchData in a frame.
https://github.com/graalvm/truffleruby/blob/vm-enterprise-0.29/src/main/java/org/truffleruby/language/threadlocal/ThreadAndFrameLocalStorage.java
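The design described here can be modeled in Ruby roughly as shown below. It has one slot per frame, allocated lazily, with a path for a single owning thread that falls back to per-thread storage once a second thread writes. This is a hypothetical `ThreadAndFrameLocal` class written for illustration. It is not TruffleRuby's Java code, and it uses a Mutex-guarded Hash keyed by Thread in place of a Java ThreadLocal.

```ruby
# Hypothetical Ruby model of one frame's $~ slot that is also thread-local.
# For simplicity every access takes the lock; a real VM would keep the
# single-owner path lock-free. Entries for dead threads are never removed.
class ThreadAndFrameLocal
  def initialize
    @lock = Mutex.new
    @owner = nil  # the first thread that stored a value
    @value = nil  # that thread's value
    @others = nil # Thread => value for any other thread, created lazily
  end

  def get
    current = Thread.current
    @lock.synchronize do
      if current.equal?(@owner)
        @value
      elsif @others
        @others[current]
      end
    end
  end

  def set(value)
    current = Thread.current
    @lock.synchronize do
      if @owner.nil? || current.equal?(@owner)
        @owner = current
        @value = value
      else
        (@others ||= {})[current] = value
      end
    end
  end
end

slot = ThreadAndFrameLocal.new # stands in for one frame's $~
threads = [:a, :b].map do |tag|
  Thread.new do
    slot.set(tag)
    sleep 0.01 # let the other thread write in between
    slot.get
  end
end
seen = threads.map(&:value)
p seen # => [:a, :b]
```

Each thread reads back only its own value, and the main thread, which never wrote, reads nil.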
Updated by jeremyevans0 (Jeremy Evans) over 5 years ago
- Related to Bug #8444: Regexp vars $~ and friends are not thread local added
Updated by headius (Charles Nutter) over 3 years ago
Waking this up a bit...
The original issue that prompted this bug report has now been FIXED in JRuby 9.2.17.0 by making String#split never read backref from the frame-local storage:
https://github.com/jruby/jruby/pull/6644
Further improvements will come in 9.3 with the following PR, which eliminates ALL core method reads of backref (none of them used its contents anyway, and only read it to reuse it):
https://github.com/jruby/jruby/pull/6647
With these changes, all concurrency issues surrounding $~ within core methods are resolved. Users that opt into using $~ via the variable or methods like last_match will still have to take care that the value is not being updated across threads, but such updates will not interfere with any $~-related methods in JRuby 9.3.
Updated by headius (Charles Nutter) over 3 years ago
Also note this experimental PR that eliminates the update of $~ from String#split, since no specs and no tests check that behavior and it seems unexpected and unpredictable (it updates to the last match during the split loop).
https://github.com/jruby/jruby/pull/6646
And a bug I just filed to eliminate backref updating from start_with?, which should be a fast boolean check and not create a MatchData or update backref:
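For reference, Ruby already has a regexp check of this kind. Regexp#match? (added in Ruby 2.4) returns true or false and is documented not to update $~. Regexp#match returns a MatchData and sets $~ in the calling frame. The sketch below compares the two. `boolean_check` and `match_check` are hypothetical helper names, and each runs in its own method frame so that $~ starts out nil.

```ruby
# Regexp#match? returns true or false and does not update $~.
def boolean_check(str)
  found = /\Afoo/.match?(str)
  [found, $~] # $~ is still nil in this method's frame
end

# Regexp#match returns a MatchData (or nil) and also sets $~ in this frame.
def match_check(str)
  md = /\Afoo/.match(str)
  [md && md[0], $~ && $~[0]]
end

p boolean_check("foobar") # => [true, nil]
p match_check("foobar")   # => ["foo", "foo"]
```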
Updated by jeremyevans0 (Jeremy Evans) about 1 month ago
- Related to Bug #20807: String#gsub fails when called from string subclass with a block passed added