Project

General

Profile

Actions

Feature #17100

closed

Ractor: a proposal for a new concurrent abstraction without thread-safety issues

Added by ko1 (Koichi Sasada) over 3 years ago. Updated over 3 years ago.

Status:
Closed
Target version:
-
[ruby-core:99449]

Description

Ractor: a proposal for a new concurrent abstraction without thread-safety issues

Abstract

This ticket proposes a new concurrent abstraction named "Ractor," Ruby's Actor-like feature (not an exact Actor-model).

Ractor achieves the following goals:

  • Parallel execution in a Ruby interpreter process
  • Avoidance of thread-safety issues (especially race issues) by limiting object sharing
  • Communication via copying and moving

I have been working on this proposal for a few years. The project name has been "Guild," but I renamed it to Ractor following Matz' preference.

Resources:

Current implementation is not complete (contains many bugs) but it passes the current CI. I propose to merge it soon and try the API, and to continue working on the implementation on master branch.

Background

MRI doesn't provide an in-process parallel computation feature because parallel "Threads" have many issues:

  • Ruby programmers need to consider about Thread-safety more.
  • Interpreter developers need to consider about Thread-safety more.
  • Interpreter will slow down in single thread execution because of fine-grain synchronization without clever optimizations.

The reason for these issues is "shared-everything" thread model.

Proposal

To overcome the issues on multiple-threads, Ractor abstraction is proposed. This proposal consists of two layers: memory model and communication model.

Basics:

  • Introduce "Ractor" as a new concurrent entity.
  • Ractors run in parallel.

Memory model:

  • Separate "shareable" objects and "un-shareable" objects among ractors running in parallel.
    • Shareable objects:
      • Immutable objects (frozen objects only refer to shareable objects)
      • Class/module objects
      • Special shareable objects (Ractor objects, and so on)
    • Un-shareable objects:
      • Other objects
  • Most objects are "un-shareable," which means we Ruby programmers and interpreter developers don't need to care about thread-safety in most cases.
  • We only concentrate on synchronizing "shareable" objects.
  • Compared with completely separated memory model (like MVM proposal), programming will be easier.
  • This model is similar to Racket's Place abstraction.

Communication model:

  • Actor-like (not the same) message passing using Ractor#send(obj) and Ractor.recv
  • Pull-type communication using Ractor.yield(obj) and Ractor#take
  • Support for multiple waiting using Ractor.select(...)

Actor-like model is the origin of the name of our proposal "Ractor" (Ruby's actor). However, currently, it is not an Actor model because we can't select the message (using pattern match as in Erlang, Elixir, ...). This means that we can't have multiple communication channels. Instead of adopting an incomplete actor model, this proposal provides yield/take pair to handle multiple channels. We discuss this topic later.

I strongly believe the memory model is promising. However, I'm not sure if the communication model is the best. This is why I introduced "experimental" warning.

Proposed specification: https://github.com/ko1/ruby/blob/ractor_parallel/doc/ractor.md

Implementation

https://github.com/ruby/ruby/pull/3365
All GH actions pass.

I describe the implementation briefly.

rb_ractor_t

Without Ractor, the VM-Thread-Fiber hierarchy is like this:

  • The VM rb_vm_t manages running threads (rb_thread_t).
  • A thread (rb_thread_t) points to a running fiber (rb_fiber_t).

With Ractor, we introduce a new layer rb_ractor_t:

  • The VM rb_vm_t manages running ractors (rb_ractor_t).
  • A Ractor manages running threads (rb_thread_t).
  • A thread (rb_thread_t) points to a running fiber (rb_fiber_t).

rb_ractor_t has a GVL to manage threads (only one among a Ractor's threads can run).

Ractor implementation is located in ractor.h, ractor.c and ractor.rb.

VM-wide lock

VM-wide lock is introduced to protect VM global resources such as object space. It should allow recursive lock, so the implementation is a monitor. We shall call it VM-wide monitor. For now, RB_VM_LOCK_ENTER() and RB_VM_LOCK_LEAVE() are provided to acquire/release the lock.

Note that it is different from the (current) GVL. A GVL is acquired anytime you run a Ruby thread. VM-wide lock is acquired only when accessing VM-wide resources.

On single ractor mode (all Ruby scripts except my tests)

Object management and GC

  • (1) All ractors share the object space.
  • (2) Each GC event will stop all ractors, and a ractor GC works under barrier synchronization.
    • Barrier at gc_enter()
    • marking, (lazy) sweeping, ...
  • (3) Because all of the object space are shared by ractors, object creation is protected by VM-wide lock.

(2) and (3) have huge impact on performance. The plan is:

  • For (2), introduce (semi-)separated object space. It would require a long time and Ruby 3.0 can't employ this technique.
  • For (3), introduce free slot cache for every ractor; then most creations can be done without synchronization. It will be employed soon.

Experimental warning

Currently, Ractor implementation and specification are not stable. So upon its first usage, Ractor.new will show a warning:

warning: Ractor is experimental, and the behavior may change in future versions of Ruby! Also there are many implementation issues.

Discussion

Actor-based and channel-based

I think there are two message passing approaches: Actor-based (as in Erlang, ...) and channel-based (as in Go, ...).

With channel-based approach, it is easy to manipulate multiple channels because it manages them explicitly. Actor-based approach manipulates multiple channels with message pattern. The receiver can ignore unexpected structured messages or put them on hold and can handle them after the behavior has changed (role of actor has changed).

Ractor has send/recv like Actor-model, but there is no pattern matching feature. This is because we can't introduce new syntax, and I can't design a good API.

With channel-based approach, it is easy to design the API (for example, do ch = Ractor::Channel.new and share the ch that ractors can provide). However, I can't design a good API to handle exceptions among Ractors.

Regarding error handling, we propose a hybrid model using send/recv, yield/take pairs. Ractor#take can receive the source ractor's exception (like Thread#join). On Actor approach, we can detect when the destination Ractor is not working (killed) upon Ractor#send(obj). A receiver ractor (waiting for Ractor.recv) cannot detect the sender's trouble, but maybe the priority is not high. Ractor#take also detects the sender's (Ractor.yield(obj)) error, so the error can be propagated.

To handle multiple communication channels on Ractor, instead of using multiple channels, we use pipe ractors.

# worker-pool (receive by send)

main # pipe.send(obj)
-> pipe # Ractor.yield Ractor.recv
  ->
    worker1 # Ractor.yield(some_task pipe.take))
    worker2 # Ractor.yield(some_task pipe.take))
    worker3 # Ractor.yield(some_task pipe.take))
-> main # Ractor.select(worker1, worker2, worker3)

# if worker* causes an error, main can detect the error.

pipe ractors may look like channels. However, we don't need to introduce new classes with this technique (the implementation can omit Ractor creation for pipe ractors).

Maybe there are other possibilities. For example, if we can propagate the errors with channels, we can also consider a channel-model (we need to change the Ractor name :p then).

Name of Ractor (and Guild)

When I proposed Guild in 2016, I regarded "move" message-passing (see specification) to be characteristic of it, and I called this feature "moving membership." This is why the name "Guild" was chosen. However Matz pointed out that this move semantics is not used frequently, and he asked me to change the name. Also, someone had already been using the class name "Guild."

"Ractor" is short and is not an existing class; this is why I choose "Ractor."

I understand people may confuse it with "Reactor."

TODO

There are many remaining tasks.

Protection

Many VM-wide (process-wide) resources are not protected correctly, so using Ractor on a complicated program can cause critical bugs ([BUG]). Most global resource are managed by global variables, so we need to check them correctly.

C-methods

Currently, C-methods (methods written in C and defined with rb_define_method()) run in parallel. It means that thread-unsafe code can run in parallel. To solve this issue, I plan the following:

(1) Introduce thread-unsafe label for methods

It is impossible to make all C-methods thread-safe, especially for C-methods in third party C-extensions. To protect them, label these (possibly) thread-unsafe C-methods as "thread-unsafe."

When "unsafe"-labeled C methods are invoked, they acquire a VM-wide lock. This VM-wide lock should check for recursiveness (so this lock should be a monitor) and escaping (exceptions). Currently, VM-wide lock doesn't check for escaping, but that should be implemented soon.

(2) Built-in C-methods

I'll fix most of the builtin C-methods (String, Array, ...) so that they will become thread-safe. If it is not easy, I'll use thread-unsafe label.

Copying and moving

Currently, Marshal protocol makes deep copy on message communication. However, Marshal protocol doesn't support some objects like Ractor objects, so we need to modify them.

Only a few types are supported for moving, so we need to write more.

"GVL" naming

Currently, the source code contains the name "GVL" for Ractor local locks. Maybe they should be renamed.

Performance

To introduce fine-grained lock, performance tuning is needed.

Bug fixes

many many ....

Conclusion

This ticket proposes a new concurrent abstraction "Ractor." I think Ruby 3 can ship with Ractor under "experimental" status.


Related issues 2 (1 open1 closed)

Related to Ruby master - Feature #17145: Ractor-aware `Object#deep_freeze`RejectedActions
Related to Ruby master - Feature #17159: extend `define_method` for RactorOpenActions
Actions #1

Updated by ko1 (Koichi Sasada) over 3 years ago

  • Description updated (diff)

Updated by Eregon (Benoit Daloze) over 3 years ago

Looking forward to this.
I have a few questions.

  • Reactor#recv/Channel#recv => Reactor#receive/Channel#receive. I don't think libc-like abbreviations should be used. Let's use actual words.
  • About synchronization for Modules, specifically, for instance variables, constants and class variables. Should it be consistent? I think it would be better. For instance module @ivars/constants/@@cvars can be accessed from any Reactor if it's a shareable value, and only main Reactor if non-shareable. This seems the most compatible while still safe, but it might be surprising that the access is allowed depending on the value.
  • I think we need to introduce #deep_freeze to make it easy to create deeply-immutable objects conveniently and efficiently.
  • On a similar note, #deep_copy seems useful as a more optimized way than Marshal#dump+load.
  • I think it is worth noting that send/yield(obj, move: true) still involves some shallow copying.
  • ObjectSpace should be per Reactor, right? It seems important for guarantees that ObjectSpace.each_object only iterates objects in the current Reactor (otherwise all low-level data races can happen again).
  • C extensions: should C extensions be marked as default thread-safe, or thread-unsafe? It seems considered thread-safe by default in the proposal (sounds optimistic, but also should help to identify those which are not thread-safe). I'm looking forward to have C extensions executing in parallel. Currently TruffleRuby defaults to using a global monitor for C extensions (it's a CLI flag) to guarantee the same semantics as CRuby (IIRC we found that the openssl C ext and a few others manipulate their internal state in a thread-unsafe manner).
  • Previously it was mentioned C extensions would only be allowed for the main/initial Reactor. I'm happy to hear this is no longer the case. However, isn't there a risk that C extensions might pass Ruby objects without copying or transfer to other Reactor via e.g. C global variables? (and that could cause data races/segfault). I guess we have to trust C extensions to not do that? Probably we need to introduce easy ways to have reactor-local variables (similar to thread-local variables), so C global variables can be replaced with reactor-local variables.
  • I'll fix most of builtin C-methods (String, Array, ...) thread-safe. You mean just about synchronizing access to global variables, right? I guess in your model String/Array/Hash operations will not have any synchronization but rely on Reactor isolation, right?
  • Does Ractor::RemoteError points to original error? How to avoid that exceptions "leak" objects from inside the Reactor? Maybe copy/move the exception? (The reactor can still be alive if it rescues e.g. Ractor::MovedError)
  • About global variables, what about $VERBOSE and $DEBUG which are implicitly accessed? Should it still be allowed to read them in other reactors? Should they become Reactor-local? Should global variables be allowed to be read from any Reactor if they contain a shareable value?

Updated by Eregon (Benoit Daloze) over 3 years ago

Note: I updated my questions on Redmine based on rereading https://github.com/ko1/ruby/blob/ractor_parallel/doc/ractor.md

Updated by ibylich (Ilya Bylich) over 3 years ago

First of all, thanks a lot for your work. This is a huge (and incredibly valuable) addition to Ruby.

I have your branch built locally and I played a lot with it during last few days. I've got a few questions:

  1. Multiple ractors use a single shared objectspace, right? If so, is it ok that objects can be shared using ObjectSpace? I'm talking about something like this:
require 'objspace'

A = []


1.upto(10) do |i|
  Ractor.new(A.object_id, i) do |a_object_id, i|
    a = ObjectSpace._id2ref(a_object_id)

    p a

    a << i
  end
end

sleep 1

p A

Currently it throws an error array modified during dump (RuntimeError) depending on the timing (i.e. sometimes it works). Also, global values in C can be used to store a reference to some VALUE that can be later used by other ractors. Is it possible (and is it necessary at all) to add some kind of protection against it?

  1. Also, I've noticed that marshalling is used to pass objects to ractors. Because of that, it's impossible to pass objects like Mutex, ConditionVariable and I suppose many other objects that encapsulate them (like Concurrent::Array from concurrent-ruby). Are there any plans to make them share-able between multiple ractors? These classes are developed to be used in parallel threads, but of course VM can't know about all of them.

  2. It's impossible to use require in non-main ractors (because it mutates many global values like $LOADED_FEATURES). Does it mean that initialization part of the client code always has to be performed at startup? What about autoload?

  3. Are there any plans to add something like Ractor.wait(ractor)/Ractor.wait_all (similar to Thread.join/Process.waitpid)?

Updated by ko1 (Koichi Sasada) over 3 years ago

Thank you for your question.

Eregon (Benoit Daloze) wrote in #note-2:

  • Reactor#recv/Channel#recv => Reactor#receive/Channel#receive. I don't think libc-like abbreviations should be used. Let's use actual words.

I want to ask Matz about naming. I don't care to rename it (send/recv is same character count so it is happy to write, though :p).

  • About synchronization for Modules, specifically, for instance variables, constants and class variables. Should it be consistent? I think it would be better. For instance module @ivars/constants/@@cvars can be accessed from any Reactor if it's a shareable value, and only main Reactor if non-shareable. This seems the most compatible while still safe, but it might be surprising that the access is allowed depending on the value.

I understand the concerns. Current design is fixed version by Matz.
Making it consistent can be an option.
BTW, there is some inconsistent by variable types:

p @ivar
p @@cvar #=> t.rb:2:in `<main>': uninitialized class variable @@cvar in Object (NameError)

so consistency is not best priority I guess.

  • I think we need to introduce #deep_freeze to make it easy to create deeply-immutable objects conveniently and efficiently.

Yes. We need to consider about new syntax or method.
But not essential. I think Ruby 3.1 is okay if it is not determined.

  • On a similar note, #deep_copy seems useful as a more optimized way than Marshal#dump+load.

Maybe it is different topic, but I agree it will help.

  • I think it is worth noting that send/yield(obj, move: true) still involves some shallow copying.

I agree. ractor.md will cover it.

  • ObjectSpace should be per Reactor, right? It seems important for guarantees that ObjectSpace.each_object only iterates objects in the current Reactor (otherwise all low-level data races can happen again).

This is very very difficult problem I ignored.
Now there is no way to recognize which objects belongs to which Ractors.

Simply each_objects method is allowed on single Ractor mode (no Ractor.new) is one option I guess.

  • C extensions: should C extensions be marked as default thread-safe, or thread-unsafe? It seems considered thread-safe by default in the proposal (sounds optimistic, but also should help to identify those which are not thread-safe). I'm looking forward to have C extensions executing in parallel. Currently TruffleRuby defaults to using a global monitor for C extensions (it's a CLI flag) to guarantee the same semantics as CRuby (IIRC we found that the openssl C ext and a few others manipulate their internal state in a thread-unsafe manner).

Default unsafe same as TruffleRuby, and now this feature is not supported on PR.

  • Previously it was mentioned C extensions would only be allowed for the main/initial Reactor. I'm happy to hear this is no longer the case. However, isn't there a risk that C extensions might pass Ruby objects without copying or transfer to other Reactor via e.g. C global variables? (and that could cause data races/segfault). I guess we have to trust C extensions to not do that? Probably we need to introduce easy ways to have reactor-local variables (similar to thread-local variables), so C global variables can be replaced with reactor-local variables.

I guess we have to trust C extensions to not do that?

I believe people. "Default unsafe" means I don't think it is easy to make safe.

Probably we need to introduce easy ways to have reactor-local variables (similar to thread-local variables), so C global variables can be replaced with reactor-local variables.

Agreed.

  • I'll fix most of builtin C-methods (String, Array, ...) thread-safe. You mean just about synchronizing access to global variables, right? I guess in your model String/Array/Hash operations will not have any synchronization but rely on Reactor isolation, right?

As you said, most of String/Array operations are already thread-safe. However, for example fstring table is not thread-safe. Encoding table is also not thread-safe, and so on. The challenge is how to find such global resource accesses.

  • Does Ractor::RemoteError points to original error? How to avoid that exceptions "leak" objects from inside the Reactor? Maybe copy/move the exception? (The reactor can still be alive if it rescues e.g. Ractor::MovedError)

Copied one.
One exceptional (sorry confusing) case is the exception which terminate the source ractor.
It can be an optimization but not implemented.

  • About global variables, what about $VERBOSE and $DEBUG which are implicitly accessed? Should it still be allowed to read them in other reactors? Should they become Reactor-local? Should global variables be allowed to be read from any Reactor if they contain a shareable value?

Good question. Also $0, $:, $stdin, $stdout, ... are needed to consider. Now no idea. Should be Rator local? Some global variables are scope local ($1, $2, $~, ...), so I don't think such irregular scope is an issue.

Thanks,
Koichi

Updated by ko1 (Koichi Sasada) over 3 years ago

Thank you for trying Ractor.

ibylich (Ilya Bylich) wrote in #note-4:

  1. Multiple ractors use a single shared objectspace, right? If so, is it ok that objects can be shared using ObjectSpace? I'm talking about something like this:

Should not be allowed.
We need to consider how to change the ObjectSpace semantics.

Currently it throws an error array modified during dump (RuntimeError) depending on the timing (i.e. sometimes it works). Also, global values in C can be used to store a reference to some VALUE that can be later used by other ractors. Is it possible (and is it necessary at all) to add some kind of protection against it?

You are right and such C-extensions should be thread-unsafe.

In a ticket, I mentioned "thread-unsafe" C-methods, but I need to mention "Ractor-supported" C-methods and other than it, they should not allowed to run on Ractors except the main ractor.

  1. Also, I've noticed that marshalling is used to pass objects to ractors. Because of that, it's impossible to pass objects like Mutex, ConditionVariable and I suppose many other objects that encapsulate them (like Concurrent::Array from concurrent-ruby). Are there any plans to make them share-able between multiple ractors? These classes are developed to be used in parallel threads, but of course VM can't know about all of them.

I believe synchronization primitive between Ractors should be message passing (because it is easy to debug), so I don't think support Mutex between Ractors.
If you mean only sending Mutex and it only affect between threads in a Ractor, it is possible.

  1. It's impossible to use require in non-main ractors (because it mutates many global values like $LOADED_FEATURES). Does it mean that initialization part of the client code always has to be performed at startup? What about autoload?

Great question, and it is still open question.
Of course it is safe to 'require` from main.

  1. Are there any plans to add something like Ractor.wait(ractor)/Ractor.wait_all (similar to Thread.join/Process.waitpid)?

Ractor#take(r) and Ractor.select(r1, r2, ...) can emulate them.

Updated by shyouhei (Shyouhei Urabe) over 3 years ago

Eregon (Benoit Daloze) wrote in #note-2:

  • I think we need to introduce #deep_freeze to make it easy to create deeply-immutable objects conveniently and efficiently.
  • On a similar note, #deep_copy seems useful as a more optimized way than Marshal#dump+load.

Do we need both? It seems an "optimised deep copy that returns everything frozen" should just suffice for Ractor's needs.

Updated by Eregon (Benoit Daloze) over 3 years ago

ko1 (Koichi Sasada) wrote in #note-5:

BTW, there is some inconsistent by variable types:

I think that does not matter much, it's about consistency between what "state" of modules/classes can be accessed.
I do not mind much about class variables because I think they should be deprecated/removed anyway (out of topic).

Instance variables on modules should probably be consistent with constants.
Instances variables of objects inside the Ractor can also be accessed, so it seems weird if @ivars on a shareable value like module cannot be accessed.
I wish @ivar access would not depend on the type or any extra condition (well, except frozen), but I guess that's not possible with Ractors.

BTW, is the ownership of modules tracked?
E.g., what about a Module.new{} created in a Ractor, there I'd guess all access is allowed?
Or is it assumed it might be shared with other ractors, and so it's restricted by type (module & class)?

Yes. We need to consider about new syntax or method.
But not essential. I think Ruby 3.1 is okay if it is not determined.

Yes, does not need to be at the same time but IMHO would make sense.
I think regarding the name, deep_freeze is the obvious choice.

This is very very difficult problem I ignored.
Now there is no way to recognize which objects belongs to which Ractors.

One possibility is how TruffleRuby implements ObjectSpace.each_object: to use the mark phase of a GC and start from the Ractor's roots.
This might require a precise GC/marking phase though, as some integer in memory shouldn't be interpreted as a valid object from another Ractor.

I guess this should be easier once each Ractor has its own heap. Then all those objects are OK + global shareable.

Simply each_objects method is allowed on single Ractor mode (no Ractor.new) is one option I guess.

Seems rather restrictive but certainly better than exposing all objects which is unsafe.

Default unsafe same as TruffleRuby, and now this feature is not supported on PR.

I'm a bit confused with your reply here.
I guess you mean C extensions should be considered by default thread-unsafe, and they need some way to mark thread/parallel/ractor-safe?
But current PR doesn't restrict anything yet?

The explanation above seems to mean "assume safe and allow using C exts in parallel by default:

Now C-methods (methods written in C and defined with rb_define_method()) are run in parallel. It means thread-unsafe code can run in parallel.
To solve this issue, I plan the following: [...] Introduce thread-unsafe label

So unlabeled functions would be considered Ractor/parallel-safe?

Or do you mean we should add a "ractor/parallel/thread-safe label", so unlabeled/existing functions are considered unsafe and not run in parallel?

I believe people. "Default unsafe" means I don't think it is easy to make safe.

Right, default considered as unsafe + a way to mark as safe and documentation to detail what are conditions to be safe sounds good.

As you said, most of String/Array operations are already thread-safe. However, for example fstring table is not thread-safe. Encoding table is also not thread-safe, and so on. The challenge is how to find such global resource accesses.

For TruffleRuby, when removing the GIL, I remember going through every field of the "global context" and make sure it's synchronized properly.
Probably this is a good way to find what to synchronize on MRI too.
There are also more tricky cases, like state related to bytecode, inline caches, module state like ivar offset table, etc, which are not directly in the "global context".

Good question. Also $0, $:, $stdin, $stdout, ... are needed to consider. Now no idea. Should be Rator local? Some global variables are scope local ($1, $2, $~, ...), so I don't think such irregular scope is an issue.

I think user (not special) global variables should be Ractor-local. That seems the most compatible and useful.

shyouhei (Shyouhei Urabe) wrote in #note-7:

Do we need both? It seems an "optimised deep copy that returns everything frozen" should just suffice for Ractor's needs.

For Ractor I think either deep_copy (would be used internally for non-deep-frozen messages) or deep_freeze is useful (to pass without copy), rarely together for the same objects.

I think they are both generally useful, even without Ractors.
Also, [1, Object.new, { a: "foo" }].deep_freeze needs no copy.
Many libraries already implement their own deep_copy (often via Marshal.load/dump, but that seems suboptimal).

P.S.: Sorry, I noticed I wrote "Reactor" instead of "Ractor" many times in the previous message, I was too tired and probably need to get used to the name.

Actions #9

Updated by sawa (Tsuyoshi Sawada) over 3 years ago

  • Subject changed from Ractor: a proposal for new concurrent abstraction without thread-safety issues to Ractor: a proposal for a new concurrent abstraction without thread-safety issues
  • Description updated (diff)
Actions #10

Updated by sawa (Tsuyoshi Sawada) over 3 years ago

  • Description updated (diff)

Updated by mame (Yusuke Endoh) over 3 years ago

Question: What is "immutable"?

I've tried the following code, but the implementation raises a NameError "can not access non-sharable objects in constant A by non-main Ractors".

A = {}

A.freeze
Hash.freeze
Object.freeze
self.freeze

Ractor.new do
  Object::A
end.take

Is this a bug or not-yet-implemented? Or do I need to freeze something other? Or is my expectation of "immutable" different?

Updated by mame (Yusuke Endoh) over 3 years ago

I've directly heard "not-yet-implemented" from @ko1 (Koichi Sasada). Sorry for noise.

Actions #13

Updated by jeremyevans0 (Jeremy Evans) over 3 years ago

  • Tracker changed from Bug to Feature
  • Backport deleted (2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN)

Updated by matz (Yukihiro Matsumoto) over 3 years ago

OK, I accept the Ractor concept. Go ahead and merge.

We have left some naming issues. My current ideas are:

  • I vote for recv mostly because of the past familiality to UNIX naming convention
  • I also want recvfrom a la UDP communication. The name recv has an intimacy with it.
  • I have no objection for having full spelled receive alias to recv
  • pipe may be renamed to channel or stream in the future. Need more discussion.

Matz.

Updated by mame (Yusuke Endoh) over 3 years ago

I basically like Ractors, but I'd like to discuss how to share constants.

Currently, a Ractor cannot read any constant if it has an unsharable object.

C = [[1, 2], [3, 4]]
Ractor.new do
  p C #=> can not access non-sharable objects in constant Object::C by non-main Ractor. (NameError)
end.take

It is possible to read the constant only if its value is deeply frozen.

C = [[1, 2].freeze, [3, 4].freeze].freeze
Ractor.new do
  p C #=> [1, 2, 3]
end.take

This design is safe, but too conservative, I think.
In fact, I managed to run optcarrot on parallel ractors, and I needed to write a bunch of code to deeply freeze many constants that have arrays and hashes.

https://github.com/mame/optcarrot/blob/9f7209b7264b32579b8d99922ed5fa93cad7567c/bin/optcarrot-bench-parallel-on-ractor#L11-L46

A new API to deeply freeze an object, say Object#deep_freeze, will make it less annoying, but I think there are two problems:

  • Practically, many existing libraries will not work as is in Ractor. I know that some people write .freeze to all constant values (mainly because rubocop suggests), but we need to replace it with .deep_freeze or something.
  • IMO, adding .deep_freeze to almost constant definitions spoils Ruby's conciseness and flexibility. I don't want to write them, and don't want even to see them.

Here is my counterproposal: When a Ractor attempts to read a constant that has an unsharable object, the object should be "deeply copied" and cached.

C = {[1, 2], [3, 4]]

Ractor.new do
  C #=> Marshal.load(Marshal.dump([[1, 2], [3, 4]]))
  C #=> The same object as above
  C[0] = "foo" # This modification should be Ractor-local, not observable by other Ractors
  p C #=> ["foo", [3, 4]]
end.take

p C #=> [[1, 2], [3, 4]] # not modified

There are pros and cons:

Pros:

  • Many existing libraries will work as is, I think. (At least, optcarrot will do.)
  • We don't have to write annoying .freeze.

Cons:

  • It is less effective due to object copy. (It is avoidable by adding .freeze if it is really important.)
  • If a constant is used as a mutable state holder, it may be unsafe.

The last point may be a bit important:

Counters = [0, 0, 0]

Ractor.new do
  # This may result in some inconsistent data that is against the programmer's intention
  p Counters #=> [3, 3, 2]
end

loop do
  Counters[0] += 1
  Counters[1] += 1
  Counters[2] += 1
end

The current behavior is indeed safe because reading Counters in a Ractor will raise an exception. I admit that this property is good.
However, this kind of constant usage should be relatively rare, IMO. I believe that almost all constants will not be updated, so I'm unsure if it is reasonable to prohibit reading any unsharable constants in a Ractor.

Updated by Eregon (Benoit Daloze) over 3 years ago

matz (Yukihiro Matsumoto) wrote in #note-14:

We have left some naming issues. My current ideas are:

  • I vote for recv mostly because of the past familiality to UNIX naming convention

I think many people can agree the C socket API has terrible naming, I think we should not blindly follow those mistakes for new non-Socket APIs.
Also, receiving an actor message is a completely different level of abstraction than a low-level socket recv(2) (for starters it does not involve IO).
Also recv(2) is read(2) with extra flags, which does not apply at all here (and so recv(2) is rarely used and not well known).

I think the only advantage I can see of recv is that it's 4 letters like send.
We are designing Ruby APIs, not C APIs here, we should focus on making it nice for Ruby programmers, not for C programmers.

Erlang names it receive, and that's one of the most used actor languages.

BTW I think send is needlessly conflicting with Kernel#send and will cause confusion.
(yes Socket#send exists but almost nobody uses it)
How about Ractor#<< instead?

  • I also want recvfrom a la UDP communication. The name recv has an intimacy with it.

What does recvfrom do? If it's "receive from" then it would be message = Ractor.recvfrom(specific_actor)? (of course not, but it illustrates the unclear naming)

Let's properly name things. So here I'd suggest message, sender = Ractor.receive_with_sender.
So much clearer and so much more readable.
So much more Ruby-like (IMHO).

  • I have no objection for having full spelled receive alias to recv

Do we want actor code to look like C kernel hackers code, or like proper and nice Ruby code?
IMHO recv should not exist for all the reasons mentioned above.
That said if we can't agree this is at least a way forward.

Let's ask the community to see what they think: https://twitter.com/eregontp/status/1299284528578596864
If I'm alone to have these concerns, then feel free to ignore me.
But I don't think that's the case.

Updated by Eregon (Benoit Daloze) over 3 years ago

mame (Yusuke Endoh) wrote in #note-15:

Here is my counterproposal: When a Ractor attempts to read a constant that has an unsharable object, the object should be "deeply copied" and cached.

Reading a constant to me is always expected to be fast and have minimal side-effects (except autoload on first access).
Doing a potentially large copy seems very far from that and counter-intuitive to me.

If a constant is used as a mutable state holder, it may be unsafe.

I'm worried this will lead to hard-to-debug bugs.

IMHO constants are global state, and just like modules & classes they should not be automatically copied with Ractors.
If they happen to contain a shareable object, which seems fairly frequent then it already works nicely.
If not I think an explicit deep_freeze is a good way.

I think deep_freeze is the way to go, and that 20 deep_freeze calls is not so bad for a non-trivial codebase like OptCarrot.
Maybe CONSTANT = ... should automatically deep_freeze
Would need a migration path though, probably with a magic comment. And also a way to keep it mutable when intended.
I'm not sure it's worth it.
Personally I regularly mutate the contents of constants, latest example.
I'd think almost all CONSTANT = [] or CONSTANT = {} are meant to be mutated, and I guess that's not a rare pattern.

Updated by Eregon (Benoit Daloze) over 3 years ago

@ko1 (Koichi Sasada) Can you review these potential performance overheads of adding Ractors? (and so potential regressions for code not using Ractors)

From https://github.com/ko1/ruby/blob/ractor_parallel/doc/ractor.md#language-changes-to-isolate-unshareable-objects-between-ractors

I'm thinking notably reading constants will be less efficient, as it will need to check if Ractor.current == main Ractor (then OK to read anything), otherwise check if the value is shareable (by reading a flag?).
That's more expensive than the current semantics where e.g. TruffleRuby folds Ruby constants to a constant in compiled code (even with Threads). TruffleRuby could still fold while there is only the main Ractor, but then creating a new Ractor would cause probably unexpected performance degradation by adding the extra check for if current == main.
An idea to fix that performance lost is to still fold (with no checks) if the value is shareable, because then it's always OK to read it.

Writing constants will need to check if main Ractor, but writing constants is slow path so that seems fine.

Every instance variable read/write will need to check if the receiver is shareable when there are Ractors. If it is, it will need to additionally check if current == main.
That seems expensive, can we check e.g. on OptCarrot how much it costs?

Same for class variables (but hopefully people don't use it much).

For global variables, we still need to define the semantics.

Some synchronization for VM data structures like the per-class ivar table, but that's mostly slow path so it should be fine.
And that synchronization already exists on Ruby impls without a GIL.
What about method tables? I guess synchronizing might slow down uncached method calls.
JRuby/TruffleRuby already synchronize the method table of course.

Anything else I missed?

Ractor.current uses __thread and fs/gs registers on CRuby, right?
On JVM there is unfortunately no easy way to do that (except by using Thread.currentThread() and subclassing java.lang.Thread to add a field, but that doesn't work for the main Java thread). And Java ThreadLocal is rather slow. Anyway, something to figure out on JVM.
Another way is to pass the current Ractor as a hidden argument through all calls, but that can be an overhead for calls.

shareable = "immutable or Module or Ractor" currently. How about making Ractor immutable so it's simply "immutable or Module"? Any reason to keep Ractor instances non-frozen?
That's just a semantics simplification, because either way all these would have the shareable flag set.

Updated by Eregon (Benoit Daloze) over 3 years ago

Re naming of receiving a message, here is the poll I made:
https://twitter.com/eregontp/status/1299284528578596864

126 votes, 83.3% Ractor.receive, 16.7% Ractor.recv.
I think the result can hardly be clearer.

I see it as naming it Ractor.recv is just hack.

Yes, please you full words. I never got why so many languages and ecosystem artificially and randomly shorten verbs and nouns. Mostly for no good reason.
Yeah and Ruby usually doesn't participate in this so it's very strange to me..

Even Erlang names it receive, and Ractor is clearly inspired by Erlang.

Ractor.recv feels like a random shortcut for no good reason.
The reason of "same numbers of letters as send" seems a weak argument:

  • r.send(message) and message = Ractor.recv are unlikely to ever align vertically, so visually in code it gives basically nothing
  • There is Ractor.yield(obj) and Ractor#take() and we did not try to rename yield to some 4-letters abbreviation.
  • The similarity with recv(2) is very thin, in fact I'd go as far as saying it has nothing to do with recv(2) (Ractor don't support networking currently and I see no plan in this proposal for it, and even if there was it would be an irrelevant implementation detail if it used recv(2) internally).

@matz (Yukihiro Matsumoto) Based on those arguments and the ones in my previous reply, could you agree Ractor.receive is the better alternative to Ractor.recv?

Regarding sending a message, quite a few were concerned of the conflict with Kernel#send (as you can see on the poll).
So I made another poll about sending a message:
https://twitter.com/eregontp/status/1299316723158585348

Also, send is about calling a method. But unlike some other actor models, Ractor#send does not call a method on the receiving actor.
It is not a message send ala Smalltalk (= method call). So that seems confusing.

OTOH send is the word used in "send a message" so it has pros too.

The results there are much less clear (1/3 #send, 1/3 #<<, 1/3 other).
Some suggestions from the tweets:

ractor.send(message)
ractor << message
ractor.transmit(message)
ractor.post(message)
ractor.put(message)
ractor.pass(message)
ractor.call(message)
ractor.emit(message)
ractor.dispatch(message)
ractor.push(message)
ractor.cast(message)
ractor.deliver(message)
ractor.enqueue(message)

@ko1 (Koichi Sasada), @matz (Yukihiro Matsumoto) Which one(s) do you like in this list?

Regarding BasicSocket#send, some see it as a mistake and I can agree.
It just didn't seem worth the conflict since BasicSocket#send is so rarely used.
And using __send__ instead of send in Ruby code feels ugly, so adding another send just makes it more likely to get the conflict and bring more confusion.

Updated by joanbm (Joan Blackmoore) over 3 years ago

@Eregon (Benoit Daloze) (Benoit Daloze) https://bugs.ruby-lang.org/issues/17100#note-19

Re naming of receiving a message, here is the poll I made:
https://twitter.com/eregontp/status/1299284528578596864
126 votes, 83.3% Ractor.receive, 16.7% Ractor.recv.
I think the result can hardly be clearer.

Regarding sending a message, quite a few were concerned of the conflict with Kernel#send (as you can see on the poll).
So I made another poll about sending a message:
https://twitter.com/eregontp/status/1299316723158585348
The results there are much less clear (1/3 #send, 1/3 #<<, 1/3 other).

Can you just cease your manipulative behaviour, based on faulty generalization resoning in this case ? It can only harm, makes no good to anybody.

Regarding BasicSocket#send, some see it as a mistake and I can agree.
It just didn't seem worth the conflict since BasicSocket#send is so rarely used.
And using send instead of send in Ruby code feels ugly, so adding another send just makes it more likely to get the conflict and bring more confusion.

I don't find it a "mistake" but a pragmatic and conscious decision, favour notoriously known naming while not breaking object system rules.

Actions #21

Updated by ko1 (Koichi Sasada) over 3 years ago

  • Status changed from Open to Closed

Applied in changeset git|79df14c04b452411b9d17e26a398e491bca1a811.


Introduce Ractor mechanism for parallel execution

This commit introduces Ractor mechanism to run Ruby program in
parallel. See doc/ractor.md for more details about Ractor.
See ticket [Feature #17100] to see the implementation details
and discussions.

[Feature #17100]

This commit does not complete the implementation. You can find
many bugs on using Ractor. Also the specification will be changed
so that this feature is experimental. You will see a warning when
you make the first Ractor with Ractor.new.

I hope this feature can help programmers from thread-safety issues.

Updated by Eregon (Benoit Daloze) over 3 years ago

joanbm (Joan Blackmoore) wrote in #note-20:

I'll disregard your offensive comment, I tried to summarize the poll as good as I can.
This is my interpretation of the poll, and everyone can interpret as they want.
Of course it's a poll and so it did not ask everybody, which is not possible anyway.

Updated by Eregon (Benoit Daloze) over 3 years ago

@mame (Yusuke Endoh) Regarding OptCarrot, I think having deep_freeze would be a great help.
But I wonder, maybe Object.deep_freeze would work to freeze all constants very succintly?
Some constants expect to be mutable, so I guess it doesn't work that well on non-small programs.

And OptCarrot.deep_freeze would probably freeze Object too since it's accessible via OptCarrot.class.superclass.
Maybe we could have some control there to clarify what should be frozen (e.g., all modules & constants in the namespace, but not above)?

Maybe it is a good idea to freeze all constants & modules with Ractor.
Here is a problematic case on master sharing mutable state via define_method:

$ ruby -ve 'ary=[]; define_method(:add) { |e| ary << e }; add(1); Ractor.new { add(2) }; Ractor.new { add(3) }; sleep 1; p ary'
ruby 3.0.0dev (2020-09-03T12:11:06Z master b52513e2a1) [x86_64-linux]
<internal:ractor>:38: warning: Ractor is experimental, and the behavior may change in future versions of Ruby! Also there are many implementation issues.
[1, 2, 3]

$ ruby -ve 'ary=[]; define_method(:add) { |e| ary << e }; 1000.times { |i| Ractor.new(i) { |i| add(i) } }; sleep 1; p ary'
crashes

I'm not sure what's a good way to solve that one.

Updated by marcandre (Marc-Andre Lafortune) over 3 years ago

I find Ractor very promising 🎉

The impact is severily limited without support for user-based immutable types. Is that planned for 3.0?

I'm curious what is the technical difficulty. I mean, can't existing builtin freeze check for all instance variables and if all are immutable or shareable, then it also sets the immutable flag for the current object?

Hopefully we will get deep_freeze. I created #17145

We should probably have Ractor.shareable?(obj) # => true or false, right?

Bike-shed level:

I also dislike recv. That abbreviation is not too familiar and it doesn't seem to be worthwhile to save 4 keystrokes a few times. I would expect the number of places where we call receive to be quite limited.

I dislike even more send. In Ruby, we send messages, not values.

I think transmit would be better than send, but why not use the same pair of terms for push and pull modes?

# push:
r.yield(obj)
Ractor.receive # in `r`
# pull:
Ractor.yield(obj) # in `r`
r.receive

Ractor.select: How about reversing the result order (from [ractor, message] to [message, ractor])? I can imagine more cases were we don't care about which ractor we got the result from than the reverse. result, = Ractor.select(*ractor_farm) is easier to write.

Actions #25

Updated by Eregon (Benoit Daloze) over 3 years ago

Updated by ko1 (Koichi Sasada) over 3 years ago

Sorry I missed your question.

Eregon (Benoit Daloze) wrote in #note-8:

Instance variables on modules should probably be consistent with constants.
Instances variables of objects inside the Ractor can also be accessed, so it seems weird if @ivars on a shareable value like module cannot be accessed.
I wish @ivar access would not depend on the type or any extra condition (well, except frozen), but I guess that's not possible with Ractors.

I think we can expect "Constants" are defined at once at the beginning of code.
But ivars are defined at runtime anytime and it can lead race-conditions.
I like current restriction. But I also agree the current limitation is too hard to migrate from the current programs.

Ideally (for me) classes/modules should use shareable mutable objects to change the state of them like that:

class C
  STATE = Ractor::ShareableHash.new
  def update
    STATE.transaction do
      STATE[:cnt] += 1
    end
  end
end

BTW, is the ownership of modules tracked?
E.g., what about a Module.new{} created in a Ractor, there I'd guess all access is allowed?

Yes.

This is very very difficult problem I ignored.
Now there is no way to recognize which objects belongs to which Ractors.

One possibility is how TruffleRuby implements ObjectSpace.each_object: to use the mark phase of a GC and start from the Ractor's roots.
This might require a precise GC/marking phase though, as some integer in memory shouldn't be interpreted as a valid object from another Ractor.

I guess this should be easier once each Ractor has its own heap. Then all those objects are OK + global shareable.

Yes. It is future work (per-ractor object space).

Default unsafe same as TruffleRuby, and now this feature is not supported on PR.

I'm a bit confused with your reply here.
I guess you mean C extensions should be considered by default thread-unsafe, and they need some way to mark thread/parallel/ractor-safe?
But current PR doesn't restrict anything yet?

You are correct.

The explanation above seems to mean "assume safe and allow using C exts in parallel by default:

Now C-methods (methods written in C and defined with rb_define_method()) are run in parallel. It means thread-unsafe code can run in parallel.
To solve this issue, I plan the following: [...] Introduce thread-unsafe label

So unlabeled functions would be considered Ractor/parallel-safe?

Or do you mean we should add a "ractor/parallel/thread-safe label", so unlabeled/existing functions are considered unsafe and not run in parallel?

Yes.

As you said, most of String/Array operations are already thread-safe. However, for example fstring table is not thread-safe. Encoding table is also not thread-safe, and so on. The challenge is how to find such global resource accesses.

For TruffleRuby, when removing the GIL, I remember going through every field of the "global context" and make sure it's synchronized properly.
Probably this is a good way to find what to synchronize on MRI too.
There are also more tricky cases, like state related to bytecode, inline caches, module state like ivar offset table, etc, which are not directly in the "global context".

Exactly. I'll use your technique.

Good question. Also $0, $:, $stdin, $stdout, ... are needed to consider. Now no idea. Should be Rator local? Some global variables are scope local ($1, $2, $~, ...), so I don't think such irregular scope is an issue.

I think user (not special) global variables should be Ractor-local. That seems the most compatible and useful.

BTW now $stdin/out/err is ractor-local, different IO objects (but share same fds).

Updated by ko1 (Koichi Sasada) over 3 years ago

Eregon (Benoit Daloze) wrote in #note-18:

@ko1 (Koichi Sasada) Can you review these potential performance overheads of adding Ractors? (and so potential regressions for code not using Ractors)

From https://github.com/ko1/ruby/blob/ractor_parallel/doc/ractor.md#language-changes-to-isolate-unshareable-objects-between-ractors

I'm thinking notably reading constants will be less efficient, as it will need to check if Ractor.current == main Ractor (then OK to read anything), otherwise check if the value is shareable (by reading a flag?).

Yes. I didn't think it is huge cost.

BUT

CONST = 1
N = 10_000_000

require 'benchmark'
Benchmark.bm{|x|
  x.report{
    N.times{
      v = CONST; v = CONST; v = CONST; v = CONST; v = CONST;
      v = CONST; v = CONST; v = CONST; v = CONST; v = CONST;
    }
  }
}

__END__

                    user     system      total        real
before Ractor   0.613114   0.002960   0.616074 (  0.616090)
after Ractor    0.968084   0.000000   0.968084 (  0.968097)

:p

That's more expensive than the current semantics where e.g. TruffleRuby folds Ruby constants to a constant in compiled code (even with Threads). TruffleRuby could still fold while there is only the main Ractor, but then creating a new Ractor would cause probably unexpected performance degradation by adding the extra check for if current == main.
An idea to fix that performance lost is to still fold (with no checks) if the value is shareable, because then it's always OK to read it.

I agree with this hack.

Writing constants will need to check if main Ractor, but writing constants is slow path so that seems fine.

Every instance variable read/write will need to check if the receiver is shareable when there are Ractors. If it is, it will need to additionally check if current == main.
That seems expensive, can we check e.g. on OptCarrot how much it costs?

On optcarrot, I don't see no perf regression. However current implementation lacks correct ivar checking. For example inline cache is not checked.

Same for class variables (but hopefully people don't use it much).

For global variables, we still need to define the semantics.

Some synchronization for VM data structures like the per-class ivar table, but that's mostly slow path so it should be fine.
And that synchronization already exists on Ruby impls without a GIL.
What about method tables? I guess synchronizing might slow down uncached method calls.
JRuby/TruffleRuby already synchronize the method table of course.

Yes. it should be protected, and now it is not protected enough (TODO).
At method lookup, it acquires VM-wide global lock (1 lock).

Anything else I missed?

Ractor.current uses __thread and fs/gs registers on CRuby, right?

Using pthread_getspecific() on pthread (and it accesses segment registers).
There is no problem to use __thread.

On JVM there is unfortunately no easy way to do that (except by using Thread.currentThread() and subclassing java.lang.Thread to add a field, but that doesn't work for the main Java thread). And Java ThreadLocal is rather slow. Anyway, something to figure out on JVM.
Another way is to pass the current Ractor as a hidden argument through all calls, but that can be an overhead for calls.

shareable = "immutable or Module or Ractor" currently. How about making Ractor immutable so it's simply "immutable or Module"? Any reason to keep Ractor instances non-frozen?
That's just a semantics simplification, because either way all these would have the shareable flag set.

I feel Ractor#send mutates the Ractor's state and frozen Ractor shouldn't accept any messages.

Updated by ko1 (Koichi Sasada) over 3 years ago

I feel Ractor#send mutates the Ractor's state and frozen Ractor shouldn't accept any messages.

However, I don't against to freeze Ractors.

q = Queue.new
q.freeze
q << 1

is working (I'm not sure it is intentional).

How about making Ractor immutable so it's simply "immutable or Module"?
(quote again)

The reason why I categorized "(1) immutable (2) modules (3) specials (includes Ractor)" is, I want to add shareable objects with explicit synchronization to (3), like I mentioned at https://bugs.ruby-lang.org/issues/17100#note-26

Updated by ko1 (Koichi Sasada) over 3 years ago

I don't care about the name recv and receive.

Eregon (Benoit Daloze) wrote in #note-19:

@ko1 (Koichi Sasada), @matz (Yukihiro Matsumoto) Which one(s) do you like in this list?

I believe Ractor#send is best name because it sends an message.
It is conflicts with Ruby's Object#send, but it should not be a reason to use appropriate naming.

Also Ractor#<< is already introduced. I recommend you to use it.

BTW, Matz asked me that he want to prepare the "recvfrom" which will return a message with source Ractor.
It means we need to add source ractor information. Usually it is not an issue, but if pipe-like Ractor should care about it.


# recv -> yield adapter type pipe
pipe1 = Ractor.new do
  msg, src = Ractor.recvfrom
  Ractor.yield msg from: src
end

# recv -> send type pipe
pipe2 = Ractor.new (to) do
  msg, src = Ractor.recvfrom
  to.send msg from: src
end

I think if recv is not acceptable, recvfrom should have better name. Any good idea?

Actions #30

Updated by ko1 (Koichi Sasada) over 3 years ago

Eregon (Benoit Daloze) wrote in #note-23:

Here is a problematic case on master sharing mutable state via define_method:

Exactly. define_method is also the big issue...
The Proc object referred from defined method is not isolated, so we need to prohibit such method calling from other Ractors.

define_method(:double) do |a| a * 2 end should be allowed, so we may prepare the way to specify they are isolated.

define_method(:double, &Proc.new{|a| a*2}.isolate)

or shorter syntax.

(I forget to propose Proc#isolate)

Updated by ko1 (Koichi Sasada) over 3 years ago

marcandre (Marc-Andre Lafortune) wrote in #note-24:

I find Ractor very promising 🎉

Thanks, I hope so.

The impact is severily limited without support for user-based immutable types. Is that planned for 3.0?

I'm curious what is the technical difficulty. I mean, can't existing builtin freeze check for all instance variables and if all are immutable or shareable, then it also sets the immutable flag for the current object?

I introduced shareable flag into objects (FL_SHAREABLE in C level).
Once the interpreter found it is immutable, this flag will be set and no need to check twice.

Hopefully we will get deep_freeze. I created #17145

We should probably have Ractor.shareable?(obj) # => true or false, right?

It is easy to implement, but not sure how to use them. debugging purpose?
(I don't against to introduce it)

Ractor.select: How about reversing the result order (from [ractor, message] to [message, ractor])? I can imagine more cases were we don't care about which ractor we got the result from than the reverse. result, = Ractor.select(*ractor_farm) is easier to write.

Good idea.

Updated by Eregon (Benoit Daloze) over 3 years ago

ko1 (Koichi Sasada) wrote in #note-28:

Thanks for all your replies!

q = Queue.new
q.freeze
q << 1

is working (I'm not sure it is intentional).

Surprising to me, I think unintended, I filed #17146.

I believe Ractor#send is best name because it sends an message.

"Sending a message" is unfortunately ambiguous, if we consider a "message send" = method call as in Smalltalk.
It might be confusing to name it send as then people might expect "call a method behavior" like ractor.send(:foo, 1, 2) would call foo(1, 2) on ractor asynchronously.
Example APIs that are somewhat similar:
https://github.com/collectiveidea/delayed_job#queuing-jobs
https://github.com/mperham/sidekiq/wiki/Delayed-extensions

Also Ractor#<< is already introduced. I recommend you to use it.

Good to know.

I think if recv is not acceptable, recvfrom should have better name. Any good idea?

Indeed, I think recvfrom is even more problematic.
I suggest message, sender = Ractor.receive_with_sender (I mentioned above)

Updated by MaxLap (Maxime Lapointe) over 3 years ago

Every method call in Ruby is a message that is sent. That's why Object#send(name,...) exists.

Sure, you are sending a message to Ractor, which is a message. But so is sending a packed from a Socket, using any connection, sending a query to a database. At this point, send is basically a reserved method and while __send__, all those other system did not overwrite send, I don't see why Ractor should. It just makes things inconsistent.

So please, anything but send.

my_ractor.accept(value)
my_ractor.send_value(value)

Ractor.wait_value

Actions #34

Updated by Eregon (Benoit Daloze) over 3 years ago

Updated by marcandre (Marc-Andre Lafortune) over 3 years ago

We now have receive, very good.

What about send? Should we use yield? pass? call?

Updated by Eregon (Benoit Daloze) over 3 years ago

marcandre (Marc-Andre Lafortune) wrote in #note-35:

What about send? Should we use yield? pass? call?

Currently there is the alias Ractor#<<, maybe we should use that consistently instead of send in the docs, tests, etc?
(so make send the alias, or maybe even remove Ractor#send since it seems prone to conflicts?
I could never get an agreement to remove Ractor#recv, so probably best to establish the name used in docs and keep send as an alias first)

Personally, I think (full list in https://bugs.ruby-lang.org/issues/17100#note-19 ):
yield would be confusing with Ractor.yield.
call sounds synchronous to me.
pass seems a bit strange to me given there is Thread.pass, but not really against.

<< is what I prefer.
post sounds good to me.

Updated by ko1 (Koichi Sasada) over 3 years ago

Eregon (Benoit Daloze) wrote in #note-36:

Currently there is the alias Ractor#<<, maybe we should use that consistently instead of send in the docs, tests, etc?

I believe send is the best name, so I disagree about it.

Updated by marcandre (Marc-Andre Lafortune) over 3 years ago

Eregon (Benoit Daloze) wrote in #note-36:

Currently there is the alias Ractor#<<, maybe we should use that consistently instead of send in the docs, tests, etc?
(so make send the alias, or maybe even remove Ractor#send since it seems prone to conflicts?

send accepts move: true, so ractor.<<(data, move: true) looks very odd.

pass seems a bit strange to me given there is Thread.pass, but not really against.
post sounds good to me.

I like post. Anything but send.

Updated by marcandre (Marc-Andre Lafortune) over 3 years ago

ko1 (Koichi Sasada) wrote in #note-37:

I believe send is the best name, so I disagree about it.

I understand that. Please consider that many people find this a really bad name though. What would be the second best name? What does Matz think about this?

Updated by Eregon (Benoit Daloze) over 3 years ago

marcandre (Marc-Andre Lafortune) wrote in #note-38:

send accepts move: true, so ractor.<<(data, move: true) looks very odd.

Good point.
Maybe a good way to avoid/workaround that issue would be to have Ractor#move(obj) (or a different name)?
But Ractor.yield(obj, move: true) seems fine, so it's not so nice.

I like << in part because it's so clear it takes a single argument, unlike send, and it corresponds to Queue#<<, which has similar semantics.
But all alternatives also make it pretty clear they take a single argument (even though not enforced syntactically), so I agree with "Anything but send".

Updated by duerst (Martin Dürst) over 3 years ago

ko1 (Koichi Sasada) wrote in #note-37:

I believe send is the best name, so I disagree about it.

Send would be the best name, except that that we are not in a vacuum. It is bad practice to have one and the same name for two fundamentally different and colliding functionalities.

Updated by nobu (Nobuyoshi Nakada) over 3 years ago

duerst (Martin Dürst) wrote in #note-41:

Send would be the best name, except that that we are not in a vacuum. It is bad practice to have one and the same name for two fundamentally different and colliding functionalities.

Agreed.
Kernel#send should be deprecated because of BasicObject#__send__.

Updated by Dan0042 (Daniel DeLorme) over 3 years ago

Trying to recap the ideas...
0) Use send; Kernel#send should be deprecated because of BasicObject#__send__ (amazing nobu!)

  1. Use a different name entirely; accept, yield, pass, call, post, queue, deliver
  2. Use a suffix, e.g. send_value and receive_value
  3. Use different methods depending on move semantics, e.g. send_move, send_copy, send_freeze

Updated by Eregon (Benoit Daloze) over 3 years ago

nobu (Nobuyoshi Nakada) wrote in #note-42:

Kernel#send should be deprecated because of BasicObject#__send__.

I think that's not reasonable.
There seem to be ~424649 usages of .send in gems, vs only ~28004 of .__send__ (https://twitter.com/eregontp/status/1320652814935298050).
It seems most .send usages mean Kernel#send.
And it seems many people know .send, and much fewer .__send__ (or they dislike it).
Deprecating would be a painful change for all these usages.
I think it's fair to say it's not worth it.

So I'm of the opposite opinion, BasicSocket#send should be deprecated (it's used so rarely anyway).
BasicSocket#send is stdlib though, Ractor#send would be core, that would be a much bigger issue.
Since Kernel#send exists and is widely used, adding Ractor#send would violate the Liskov substitution principle (completely incompatible & unrelated behavior).
In practice I think it would cause a lot of confusion, and probably break some code too.
Let's just use another name.

Quite some discussion here too: https://twitter.com/_ko1/status/1320354888686006274

Updated by ko1 (Koichi Sasada) over 3 years ago

Ractor#send naming issue

Recent weeks I'm thinking about the name of Ractor#send.
Matz and I discussed possibilities long time (this discussions and considerations stop my developments), and we agree to remain the name Ractor#send as an experimental feature.

Matz confirmed things about Kernel#send

  • BasicSocket#send and UDPSocket#send will not be deprecated. UNIX interface should be respected. (This is Matz's decision)

  • We can use Kernel#send only if we know the receiver doesn't override the Kernel#send method.

  • If you don't know the receiver's class, you need to use __send__ method instead of Kernel#send to dispatch the method.

    • Socket objects has send method as described above.
    • A rails app created by rails new has 3 classes which has send other than Kernel#send.
      • UDPSocket#send(*)
      • BasicSocket#send(*)
      • Concurrent::Agent#send(*args, &action) (it is similar to Ractor)
  • Matz agreed that __send__ is ugly (*2), so he is thinking about new method dispatch operator which can replace with __send__.

  • Overriding Object#send violates Liskov's substitution principle. However, sometimes Ruby violates this principle.

    • send has already overridden.
    • initialize is not compatible with Object#initialize.
    • there are some load methods.

(*1) This is an oversimplification, you can use send if you know the receivers.
(*2) MY OPINION (not a fact): BTW I don't think __send__ is ugly (because I'm a C programmer?). It is clear that something special (meta-programming) and it increases the readability for me.

Considerations

Ruby prefers duck-typing and it is friendly that same method name should behave same functionality (maybe people saying "Liskov's substitution principle" want to argue about it). If #write method only reads something, it is surprising. I believe the problem of the name "send" is this one issue.

If a reader finds the code r1.send(message) and it takes some seconds to understand that it is either Ractor#send or Kernel#send, it's a problem to consider. For example, I found that racotr.send(obj) in method_missing is a bit confusing. https://gist.github.com/ko1/cee945e8a9fa5ef4e23ce96d3980e11c#file-activeobject-rb-L18

I believe there is no confusion with Ractor#send and Kernel#send in most of cases if the receiver is clear. So I estimated the confusion problem is limited in practice.

We discussed alternative names but we can not find the best name to replace the "send".

  • send_message(obj): Matz doesn't like it because in OOP the name "message" means method selector (name) with parameters.
  • deliver(obj): Aaron pointed out that when deliver method finished, the received should be finished.
  • post(obj): I can't accept this method name because it is strongly connected with mailing system.
    (and other names)

In "Actor" context, "send" is the most clear word to represent the behavior because of historical reasons. Many papers and language references use this word.

I thought that naming it as other name (like post) and after introducing new method dispatch operator, rename (or add an alias) it to send is one idea. However, it is impossible because people can use Kernel#send for ractor objects before renaming it.

Finally, we can not find enough advantage of solving limited naming confusion problem by not being able to use appropriate name.

Fortunately, "Ractor" and Ractor APIs are experimental features (they shows mandatory experimental warning), so we can change the name "send" if it has critical issues.

("A library doesn't work well because it doesn't use __send__" is not a problem because it does not support all objects without Ractor#send)

So let's try Ractor#send on Ruby 3.0 as experimental name.

Updated by Eregon (Benoit Daloze) over 3 years ago

Thank you for the detailed explanation, that makes sense to me.
I am sure it will result in some confusion, but hopefully it will be rare in practice.

ko1 (Koichi Sasada) wrote in #note-45:

  • Matz agreed that __send__ is ugly (*2), so he is thinking about new method dispatch operator which can replace with __send__.

Yes, I think that would be good.
A new module with singleton methods might be an elegant way to make some basic methods which must not be redefined available.
That module could just be frozen to avoid redefinitions (the constant could still be re-assigned, but we could explicitly raise/warn against it).

I find it interesting that so far we only needed a non-redefined variant for send (__send__) and object_id (__id__), and not other Kernel/Object/BasicObject methods.
And in practice it seems very rare to override object_id, so it seems it's really just send being overridden for unrelated semantics.
I see it as the Liskov's substitution principle being mostly respected.

BTW, any initialize method is a superset of/is compatible with BasicObject#initialize, since that does nothing.

post(obj): I can't accept this method name because it is strongly connected with mailing system. (and other names)

Isn't it common actor terminology to receive a message in a mailbox? (e.g. https://erlang.org/doc/reference_manual/processes.html)
So I think the connection to mailing has been there for a while.

Updated by matz (Yukihiro Matsumoto) over 3 years ago

send is a term traditionally used in the actor context. We'd like to experiment how it works well (or bad) for Rubu3.0.

Matz.

Actions

Also available in: Atom PDF

Like0
Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0