Project

General

Profile

Actions

Feature #21033

open

Allow lambdas that don't access `self` to be Ractor shareable

Added by tenderlovemaking (Aaron Patterson) 4 days ago. Updated 3 days ago.

Status:
Open
Assignee:
-
Target version:
-
[ruby-core:120643]

Description

Hi,

I would like to allow lambdas that don't access self to be eligible for Ractor shareability regardless of the shareability status of self.

Consider the following code:

class Foo
  def make_lambda
    x = 123
    lambda { x }
  end
end

Ractor.make_shareable(Foo.new.make_lambda)

With Ruby 3.4.X, this will raise an exception. The reason is because self, which is an unfrozen instance of Foo, is not shareable. However, we can see from the code that the lambda doesn't access self. I would like to make lambdas such as the ones above eligible for shareability, and I've submitted a patch here.

I think we can detect access to self by scanning the instructions in the lambda. Any references to putself, getinstancevariable, or setinstancevariable will result in using the default behavior (checking the frozen status of self).

Considerations

What about eval?

I think that eval is not a problem because calling eval has an implicit reference to self:

$ ./miniruby --dump=insns -e 'lambda { eval("123") }'
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,22)>
0000 putself                                                          (   1)[Li]
0001 send                                   <calldata!mid:lambda, argc:0, FCALL>, block in <main>
0004 leave

== disasm: #<ISeq:block in <main>@-e:1 (1,7)-(1,22)>
0000 putself                                                          (   1)[LiBc]
0001 putchilledstring                       "123"
0003 opt_send_without_block                 <calldata!mid:eval, argc:1, FCALL|ARGS_SIMPLE>
0005 leave                                  [Br]

If we try to call eval inside the lambda, there will be an implicit putself instruction added which means we will fall back to the old behavior.

What about binding?

If you call binding from inside the lambda there will be a putself instruction so we fall back to the old behavior. This is the same as the eval case.

What about binding via a local?

If you assign binding to a local, shareability will fail because the lambda references an unshareable local:

class Foo
  def make_lambda
    x = binding
    lambda { x }
  end
end

b = Foo.new.make_lambda
# exception because local `x` is not shareable
Ractor.make_shareable(b)

What about accessing binding via the proc itself?

The lambda can references itself via a local and access binding, but again this will fail isolation when locals are scanned:

class Foo
  def make_lambda
    x = lambda { x.binding.eval("self") }
  end
end

b = Foo.new.make_lambda
# exception because local `x` is not shareable
Ractor.make_shareable(b)

I think I've covered all cases where self can possibly escape. I would appreciate any feedback.

Again, here is the patch.

Thanks.


Files


Related issues 3 (1 open2 closed)

Related to Ruby master - Bug #21039: Ractor.make_shareable breaks block semantics (seeing updated captured variables) of existing blocksOpenko1 (Koichi Sasada)Actions
Related to Ruby master - Bug #18243: Ractor.make_shareable does not freeze the receiver of a Proc but allows accessing ivars of itClosedActions
Related to Ruby master - Feature #18276: `Proc#bind_call(obj)` same as `obj.instance_exec(..., &proc_obj)`RejectedActions

Updated by luke-gru (Luke Gruber) 4 days ago

Hi, cool patch. I think the state of self could escape through a defined?(@ivar) still.

Updated by tenderlovemaking (Aaron Patterson) 4 days ago

luke-gru (Luke Gruber) wrote in #note-1:

Hi, cool patch. I think the state of self could escape through a defined?(@ivar) still.

Good catch. Thank you, I fixed it.

Updated by luke-gru (Luke Gruber) 4 days ago · Edited

I'm a bit worried in the general case about eval. What if eval is called through Kernel.eval? I think what you would have to do is put a flag on the proc object that says 'check eval during runtime' in this case and do a runtime check when evaling, see if you're inside the proc when in a ractor that didn't create the proc.

Updated by tenderlovemaking (Aaron Patterson) 4 days ago

luke-gru (Luke Gruber) wrote in #note-4:

I'm a bit worried in the general case about eval. What if eval is called through Kernel.eval?

FWIU, self inside eval will refer to whatever the receiver was. So Kernel.eval("self") will return Kernel:

$ ./miniruby -e'p Kernel.eval("self")'
Kernel

I think self can only escape if it's self.eval (or eval).

Updated by tenderlovemaking (Aaron Patterson) 4 days ago

luke-gru (Luke Gruber) wrote in #note-4:

I'm a bit worried in the general case about eval. What if eval is called through Kernel.eval? I think what you would have to do is put a flag on the proc object that says 'check eval during runtime' in this case and do a runtime check when evaling, see if you're inside the proc when in a ractor that didn't create the proc.

Sorry, I misunderstood what you were saying. I think even if eval is called via Kernel.eval, self will not escape:

class Foo
  def make_lambda
    lambda { Kernel.eval("eval('self')") }
  end
end

p Foo.new.make_lambda.call # => Kernel

Updated by luke-gru (Luke Gruber) 4 days ago · Edited

Sorry, I should have tested what I thought could happen.

I thought this could happen:

class Foo
  def make_block
    l = self
    proc { Kernel.eval('binding.local_variable_get("l")') }
  end
end

f = Foo.new
bl = f.make_block
p bl.call
Ractor.make_shareable(bl)
p bl.call

r = Ractor.new(bl) do |b|
 obj = b.call
 p obj
end
r.take

But it doesn't work, after the Ractor.make_shareable call, bl.call gives a NameError. I'm not sure why, but it looks fine. It might be nice
to add a test for this to make sure the binding is wiped, though.

Updated by tenderlovemaking (Aaron Patterson) 4 days ago

luke-gru (Luke Gruber) wrote in #note-7:

But it doesn't work, after the Ractor.make_shareable call, bl.call gives a NameError. I'm not sure why, but it looks fine. It might be nice
to add a test for this to make sure the binding is wiped, though.

I see, that is very odd. I'll update the patch with a test for this case. I changed your example to this:

class Foo
  def make_block
    l = self
    proc { Kernel.eval('[binding.eval("self"), binding.local_variable_get("l")]') }
  end
end

f = Foo.new
f.freeze
bl = f.make_block
p bl.call
Ractor.make_shareable(bl)
p bl.call

I would have expected both cases to raise a NameError. I'm not sure why self is Kernel, but binding seems to have an entry for l? This seems like a bug in the non-Ractor shareable case.

Updated by Eregon (Benoit Daloze) 4 days ago

$ ruby -e 'p lambda { Kernel.binding.receiver }.call'             
main
$ ruby --dump=insns -e 'p lambda { Kernel.binding.receiver }.call'
...
== disasm: #<ISeq:block in <main>@-e:1 (1,9)-(1,36)>
0000 opt_getconstant_path                   <ic:0 Kernel>             (   1)[LiBc]
0002 opt_send_without_block                 <calldata!mid:binding, argc:0, ARGS_SIMPLE>
0004 opt_send_without_block                 <calldata!mid:receiver, argc:0, ARGS_SIMPLE>
0006 leave                                  [Br]

I think it's quite brittle.
Things like https://github.com/ko1/debug_inspector would also break it.
And when it breaks it's a potential segfault with Ractors.

I think something explicit like the existing nil.instance_exec { lambda { ... } } is better.
Or maybe some special syntax for -> {} lambdas to explicitly not capture self like -> self=nil {} or something.
Related: https://bugs.ruby-lang.org/issues/18243#note-5

Updated by Eregon (Benoit Daloze) 4 days ago

This also breaks that analysis:

$ ruby -e 'p lambda { ->{}.binding.receiver }.call' 
main
$ ruby --dump=insns -e 'p lambda { ->{}.binding.receiver }.call'
...

== disasm: #<ISeq:block in <main>@-e:1 (1,9)-(1,34)>
0000 putspecialobject                       1                         (   1)[LiBc]
0002 send                                   <calldata!mid:lambda, argc:0, FCALL>, block (2 levels) in <main>
0005 opt_send_without_block                 <calldata!mid:binding, argc:0, ARGS_SIMPLE>
0007 opt_send_without_block                 <calldata!mid:receiver, argc:0, ARGS_SIMPLE>
0009 leave                                  [Br]

== disasm: #<ISeq:block (2 levels) in <main>@-e:1 (1,13)-(1,15)>
0000 putnil                                                           (   1)[Bc]
0001 leave                                  [Br]

Updated by tenderlovemaking (Aaron Patterson) 4 days ago

Good catches, thank you. I knew there must be something I was missing, but wasn't sure what.

Eregon (Benoit Daloze) wrote in #note-9:

I think it's quite brittle.

I agree, but it was the simplest patch.

Things like https://github.com/ko1/debug_inspector would also break it.
And when it breaks it's a potential segfault with Ractors.

I think something explicit like the existing nil.instance_exec { lambda { ... } } is better.
Or maybe some special syntax for -> {} lambdas to explicitly not capture self like -> self=nil {} or something.

I really don't like the idea of new syntax. Adding syntax makes it much harder to port existing applications to use Ractors.

I'm specifically looking at adding Ractor support to Rails at the moment, and there are blocks that are shared between threads and would need to be shared between Ractors. New syntax / idioms would mean we have to change all lambda creation sites.

Maybe we could add a compilation pass that does the escape analysis? Alternatively, maybe calling Ractor.make_shareable() on a proc would set self on the proc to nil?

Updated by Eregon (Benoit Daloze) 4 days ago · Edited

tenderlovemaking (Aaron Patterson) wrote in #note-11:

Adding syntax makes it much harder to port existing applications to use Ractors.
I'm specifically looking at adding Ractor support to Rails at the moment, and there are blocks that are shared between threads and would need to be shared between Ractors.

I think porting any non-trivial code to use Ractors is a huge effort (I would think most gems out there do not work when used inside a Ractor, which seems a huge compatibility problem for that programming model).
Ractor seems fundamentally incompatible with much of how Ruby code works, a lot of it being mutation, sharing state, configuration, just having state in multiple places, etc and having access to everything everywhere.
It's like forcing the actor model when there are so many other concurrency models and the actor one is only good for some (small) subset of it.

So my impression is the issue of procs/lambdas being not shareable is just one of the many hurdles (but maybe I'm wrong).

New syntax / idioms would mean we have to change all lambda creation sites.

I understand why new syntax would be very annoying.
But one could define def isolated(&) = nil.instance_exec(&) or so and then use isolated { lambda { ... } }.
Or maybe a new argument/kwarg to #lambda then or some new method (not sure how well that would work on older versions though).
Or a new magic comment?
I think it makes some sense to change all blocks meant to be passed to a Ractor, because changing self to nil not where the block is declared would lead to lots of confusion (NoMethodError on nil much later, looking at the code self can't possibly be nil), it breaks the intent of the person who wrote that block, and also to anyone reading that code.

It seems nice if we could reliably auto-detect blocks which don't rely on self, indeed, but I think that's basically impossible with things like debug_inspector and Proc#binding and eval.
I think it applies to a pretty small set of blocks, as I would guess most blocks rely on self, e.g. to call an instance method (or even just to call a Kernel instance method).
Also it doesn't seem nice if a program would behave differently by adding a p(value) in some block to debug (which adds a putself) and that would cause a Ractor::IsolationError or so.

Updated by Eregon (Benoit Daloze) 4 days ago · Edited

isolated { lambda { ... } }

Maybe this could be shortened to isolated { ... } or so, if that method is declared in core as basically an alias of lambda but always isolating, like Ractor.new's block.
(on Ruby implementations without Ractor and older Ruby versions it would just create a regular lambda)

Updated by Eregon (Benoit Daloze) 3 days ago · Edited

I played with Ractor.make_shareable and it already mutates a Proc in place, breaking the basic Ruby semantics of re-assigning variables captured by blocks:

a = 1

l = nil.instance_exec {
  -> {
    a
  }
}

Ractor.make_shareable(l)
a = Object.new
p l.call # => with make_shareable above: 1, without: #<Object:0x00007f357d828020>

r = Ractor.new {
  p Ractor.receive.call # => 1, ignoring that `a = Object.new` assignment above, i.e. breaking Ruby block semantics
}

r.send(l)
r.take

Given semantics are already incompatible when using Ractor.make_shareable, maybe that should change the receiver to nil too?
It doesn't feel right (IMO it's broken semantics) but since it's already incompatible with normal Ruby semantics, might as well make it useful?
Is changing self worse than preventing to see captured variables updates? (I don't know)
WDYT @ko1 (Koichi Sasada)?

Given that snippet behavior I think Ractor.make_shareable(Proc) should at least return a new Proc to not mutate the argument that way, then the Ruby block semantics would at least be preserved for usages using that Proc directly.

I think a proper solution would be a way to declare an isolated block (to be precise, using nil as self; and either allowing no captured variables like Ractor.new's block or only allowing shareable captured variables and taking a snapshot of them i.e. not seeing re-assignments), either with a new method or new syntax (which could then do the captured variables checks at parse time for the no captures variant).
Then it's fair for those isolated blocks to have different semantics, vs mutating existing Ruby blocks to have different semantics if Ractor.make_shareable has ever been called on them.

Updated by tenderlovemaking (Aaron Patterson) 3 days ago

Eregon (Benoit Daloze) wrote in #note-14:

Given semantics are already incompatible when using Ractor.make_shareable, maybe that should change the receiver to nil too?

We could set it to a singleton object that doesn't respond to anything. Instead of just raising NoMethodError on nil, it would be possible for us to intercept the method call and inform the user they're attempting to touch a non-shareable self.

It doesn't feel right (IMO it's broken semantics) but since it's already incompatible with normal Ruby semantics, might as well make it useful?
Is changing self worse than preventing to see captured variables updates? (I don't know)
WDYT @ko1 (Koichi Sasada)?

Given that snippet behavior I think Ractor.make_shareable(Proc) should at least return a new Proc to not mutate the argument that way, then the Ruby block semantics would at least be preserved for usages using that Proc directly.

Until this ticket, I was operating under the assumption it worked this way 😅. I agree with you.

I think a proper solution would be a way to declare an isolated block (to be precise, using nil as self; and either allowing no captured variables like Ractor.new's block or only allowing shareable captured variables and taking a snapshot of them i.e. not seeing re-assignments), either with a new method or new syntax (which could then do the captured variables checks at parse time for the no captures variant).
Then it's fair for those isolated blocks to have different semantics, vs mutating existing Ruby blocks to have different semantics if Ractor.make_shareable has ever been called on them.

I would like that, but as I mentioned, I don't want to change all call sites (if possible). At least new code could take advantage of such a syntax.

Actions #16

Updated by Eregon (Benoit Daloze) 3 days ago

  • Related to Bug #21039: Ractor.make_shareable breaks block semantics (seeing updated captured variables) of existing blocks added
Actions #17

Updated by Eregon (Benoit Daloze) 3 days ago

  • Related to Bug #18243: Ractor.make_shareable does not freeze the receiver of a Proc but allows accessing ivars of it added
Actions #18

Updated by Eregon (Benoit Daloze) 3 days ago

  • Related to Feature #18276: `Proc#bind_call(obj)` same as `obj.instance_exec(..., &proc_obj)` added

Updated by Eregon (Benoit Daloze) 3 days ago

tenderlovemaking (Aaron Patterson) wrote in #note-15:

Until this ticket, I was operating under the assumption it worked this way 😅. I agree with you.

Glad we agree on this :)
I extracted that to https://bugs.ruby-lang.org/issues/21039.

I would like that, but as I mentioned, I don't want to change all call sites (if possible). At least new code could take advantage of such a syntax.

I think it's not possible (in general at least) to not change call sites without breaking the existing's block intention and semantics.
If a block doesn't use self and doesn't use captured variables (or they are never written to) then yes it should be mostly transparent (except when e.g. adding p for debugging), but that seems too difficult to detect and would only cover some of the blocks which need to be passed to Ractors.

I see it a bit like marking C extensions as Ractor-safe, by default it's unsafe for good reasons (it would likely segfault otherwise or expose broken behavior), so we need to review the code and mark as safe where feasible or needed.
For blocks, that marking would be some kind of change in the code where it's declared, because that's where it can be understood to have different semantics than normal Ruby blocks.
I wrote some more ideas about ways to mark such blocks in #21039 (sorry for spreading the discussion).

I also linked this issue to previous tickets discussing basically the same thing.

Actions

Also available in: Atom PDF

Like0
Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0