Feature #17100
closedRactor: a proposal for a new concurrent abstraction without thread-safety issues
Description
Ractor: a proposal for a new concurrent abstraction without thread-safety issues¶
Abstract¶
This ticket proposes a new concurrent abstraction named "Ractor," Ruby's Actor-like feature (not an exact Actor-model).
Ractor achieves the following goals:
- Parallel execution in a Ruby interpreter process
- Avoidance of thread-safety issues (especially race issues) by limiting object sharing
- Communication via copying and moving
I have been working on this proposal for a few years. The project name has been "Guild," but I renamed it to Ractor following Matz' preference.
Resources:
- Proposed specification: https://github.com/ko1/ruby/blob/ractor_parallel/doc/ractor.md
- My talk:
- (latest, but written in Japanese) http://atdot.net/~ko1/activities/2020_ruby3summit.pdf
- (old, API is changed) http://atdot.net/~ko1/activities/2018_rubykaigi2018.pdf
- (old, API is changed) http://atdot.net/~ko1/activities/2018_rubyconf2018.pdf
Current implementation is not complete (contains many bugs) but it passes the current CI. I propose to merge it soon and try the API, and to continue working on the implementation on master branch.
Background¶
MRI doesn't provide an in-process parallel computation feature because parallel "Threads" have many issues:
- Ruby programmers need to consider about Thread-safety more.
- Interpreter developers need to consider about Thread-safety more.
- Interpreter will slow down in single thread execution because of fine-grain synchronization without clever optimizations.
The reason for these issues is "shared-everything" thread model.
Proposal¶
To overcome the issues on multiple-threads, Ractor abstraction is proposed. This proposal consists of two layers: memory model and communication model.
Basics:
- Introduce "Ractor" as a new concurrent entity.
- Ractors run in parallel.
Memory model:
- Separate "shareable" objects and "un-shareable" objects among ractors running in parallel.
- Shareable objects:
- Immutable objects (frozen objects only refer to shareable objects)
- Class/module objects
- Special shareable objects (Ractor objects, and so on)
- Un-shareable objects:
- Other objects
- Shareable objects:
- Most objects are "un-shareable," which means we Ruby programmers and interpreter developers don't need to care about thread-safety in most cases.
- We only concentrate on synchronizing "shareable" objects.
- Compared with completely separated memory model (like MVM proposal), programming will be easier.
- This model is similar to Racket's
Place
abstraction.
Communication model:
- Actor-like (not the same) message passing using
Ractor#send(obj)
andRactor.recv
- Pull-type communication using
Ractor.yield(obj)
andRactor#take
- Support for multiple waiting using
Ractor.select(...)
Actor-like model is the origin of the name of our proposal "Ractor" (Ruby's actor). However, currently, it is not an Actor model because we can't select the message (using pattern match as in Erlang, Elixir, ...). This means that we can't have multiple communication channels. Instead of adopting an incomplete actor model, this proposal provides yield
/take
pair to handle multiple channels. We discuss this topic later.
I strongly believe the memory model is promising. However, I'm not sure if the communication model is the best. This is why I introduced "experimental" warning.
Proposed specification: https://github.com/ko1/ruby/blob/ractor_parallel/doc/ractor.md
Implementation¶
https://github.com/ruby/ruby/pull/3365
All GH actions pass.
I describe the implementation briefly.
rb_ractor_t
¶
Without Ractor, the VM-Thread-Fiber hierarchy is like this:
- The VM
rb_vm_t
manages running threads (rb_thread_t
). - A thread (
rb_thread_t
) points to a running fiber (rb_fiber_t
).
With Ractor, we introduce a new layer rb_ractor_t
:
- The VM
rb_vm_t
manages running ractors (rb_ractor_t
). - A Ractor manages running threads (
rb_thread_t
). - A thread (
rb_thread_t
) points to a running fiber (rb_fiber_t
).
rb_ractor_t
has a GVL to manage threads (only one among a Ractor's threads can run).
Ractor implementation is located in ractor.h
, ractor.c
and ractor.rb
.
VM-wide lock¶
VM-wide lock is introduced to protect VM global resources such as object space. It should allow recursive lock, so the implementation is a monitor. We shall call it VM-wide monitor. For now, RB_VM_LOCK_ENTER()
and RB_VM_LOCK_LEAVE()
are provided to acquire/release the lock.
Note that it is different from the (current) GVL. A GVL is acquired anytime you run a Ruby thread. VM-wide lock is acquired only when accessing VM-wide resources.
On single ractor mode (all Ruby scripts except my tests)
Object management and GC¶
- (1) All ractors share the object space.
- (2) Each GC event will stop all ractors, and a ractor GC works under barrier synchronization.
- Barrier at
gc_enter()
- marking, (lazy) sweeping, ...
- Barrier at
- (3) Because all of the object space are shared by ractors, object creation is protected by VM-wide lock.
(2) and (3) have huge impact on performance. The plan is:
- For (2), introduce (semi-)separated object space. It would require a long time and Ruby 3.0 can't employ this technique.
- For (3), introduce free slot cache for every ractor; then most creations can be done without synchronization. It will be employed soon.
Experimental warning¶
Currently, Ractor implementation and specification are not stable. So upon its first usage, Ractor.new
will show a warning:
warning: Ractor is experimental, and the behavior may change in future versions of Ruby! Also there are many implementation issues.
Discussion¶
Actor-based and channel-based¶
I think there are two message passing approaches: Actor-based (as in Erlang, ...) and channel-based (as in Go, ...).
With channel-based approach, it is easy to manipulate multiple channels because it manages them explicitly. Actor-based approach manipulates multiple channels with message pattern. The receiver can ignore unexpected structured messages or put them on hold and can handle them after the behavior has changed (role of actor has changed).
Ractor has send/recv
like Actor-model, but there is no pattern matching feature. This is because we can't introduce new syntax, and I can't design a good API.
With channel-based approach, it is easy to design the API (for example, do ch = Ractor::Channel.new
and share the ch
that ractors can provide). However, I can't design a good API to handle exceptions among Ractors.
Regarding error handling, we propose a hybrid model using send/recv
, yield/take
pairs. Ractor#take
can receive the source ractor's exception (like Thread#join
). On Actor approach, we can detect when the destination Ractor is not working (killed) upon Ractor#send(obj)
. A receiver ractor (waiting for Ractor.recv
) cannot detect the sender's trouble, but maybe the priority is not high. Ractor#take
also detects the sender's (Ractor.yield(obj)
) error, so the error can be propagated.
To handle multiple communication channels on Ractor, instead of using multiple channels, we use pipe ractors.
# worker-pool (receive by send)
main # pipe.send(obj)
-> pipe # Ractor.yield Ractor.recv
->
worker1 # Ractor.yield(some_task pipe.take))
worker2 # Ractor.yield(some_task pipe.take))
worker3 # Ractor.yield(some_task pipe.take))
-> main # Ractor.select(worker1, worker2, worker3)
# if worker* causes an error, main can detect the error.
pipe ractors may look like channels. However, we don't need to introduce new classes with this technique (the implementation can omit Ractor creation for pipe ractors).
Maybe there are other possibilities. For example, if we can propagate the errors with channels, we can also consider a channel-model (we need to change the Ractor name :p then).
Name of Ractor (and Guild)¶
When I proposed Guild in 2016, I regarded "move" message-passing (see specification) to be characteristic of it, and I called this feature "moving membership." This is why the name "Guild" was chosen. However Matz pointed out that this move semantics is not used frequently, and he asked me to change the name. Also, someone had already been using the class name "Guild."
"Ractor" is short and is not an existing class; this is why I choose "Ractor."
I understand people may confuse it with "Reactor."
TODO¶
There are many remaining tasks.
Protection¶
Many VM-wide (process-wide) resources are not protected correctly, so using Ractor on a complicated program can cause critical bugs ([BUG]
). Most global resource are managed by global variables, so we need to check them correctly.
C-methods¶
Currently, C-methods (methods written in C and defined with rb_define_method()
) run in parallel. It means that thread-unsafe code can run in parallel. To solve this issue, I plan the following:
(1) Introduce thread-unsafe label for methods
It is impossible to make all C-methods thread-safe, especially for C-methods in third party C-extensions. To protect them, label these (possibly) thread-unsafe C-methods as "thread-unsafe."
When "unsafe"-labeled C methods are invoked, they acquire a VM-wide lock. This VM-wide lock should check for recursiveness (so this lock should be a monitor) and escaping (exceptions). Currently, VM-wide lock doesn't check for escaping, but that should be implemented soon.
(2) Built-in C-methods
I'll fix most of the builtin C-methods (String, Array, ...) so that they will become thread-safe. If it is not easy, I'll use thread-unsafe label.
Copying and moving¶
Currently, Marshal protocol makes deep copy on message communication. However, Marshal protocol doesn't support some objects like Ractor
objects, so we need to modify them.
Only a few types are supported for moving, so we need to write more.
"GVL" naming¶
Currently, the source code contains the name "GVL" for Ractor local locks. Maybe they should be renamed.
Performance¶
To introduce fine-grained lock, performance tuning is needed.
Bug fixes¶
many many ....
Conclusion¶
This ticket proposes a new concurrent abstraction "Ractor." I think Ruby 3 can ship with Ractor under "experimental" status.
Updated by Eregon (Benoit Daloze) over 4 years ago
Looking forward to this.
I have a few questions.
- Reactor#recv/Channel#recv => Reactor#receive/Channel#receive. I don't think libc-like abbreviations should be used. Let's use actual words.
- About synchronization for Modules, specifically, for instance variables, constants and class variables. Should it be consistent? I think it would be better. For instance module @ivars/constants/@@cvars can be accessed from any Reactor if it's a shareable value, and only main Reactor if non-shareable. This seems the most compatible while still safe, but it might be surprising that the access is allowed depending on the value.
- I think we need to introduce
#deep_freeze
to make it easy to create deeply-immutable objects conveniently and efficiently. - On a similar note,
#deep_copy
seems useful as a more optimized way than Marshal#dump+load. - I think it is worth noting that
send/yield(obj, move: true)
still involves some shallow copying. - ObjectSpace should be per Reactor, right? It seems important for guarantees that
ObjectSpace.each_object
only iterates objects in the current Reactor (otherwise all low-level data races can happen again). - C extensions: should C extensions be marked as default thread-safe, or thread-unsafe? It seems considered thread-safe by default in the proposal (sounds optimistic, but also should help to identify those which are not thread-safe). I'm looking forward to have C extensions executing in parallel. Currently TruffleRuby defaults to using a global monitor for C extensions (it's a CLI flag) to guarantee the same semantics as CRuby (IIRC we found that the
openssl
C ext and a few others manipulate their internal state in a thread-unsafe manner). - Previously it was mentioned C extensions would only be allowed for the main/initial Reactor. I'm happy to hear this is no longer the case. However, isn't there a risk that C extensions might pass Ruby objects without copying or transfer to other Reactor via e.g. C global variables? (and that could cause data races/segfault). I guess we have to trust C extensions to not do that? Probably we need to introduce easy ways to have reactor-local variables (similar to thread-local variables), so C global variables can be replaced with reactor-local variables.
-
I'll fix most of builtin C-methods (String, Array, ...) thread-safe.
You mean just about synchronizing access to global variables, right? I guess in your model String/Array/Hash operations will not have any synchronization but rely on Reactor isolation, right? - Does
Ractor::RemoteError
points to original error? How to avoid that exceptions "leak" objects from inside the Reactor? Maybe copy/move the exception? (The reactor can still be alive if it rescues e.g.Ractor::MovedError
) - About global variables, what about
$VERBOSE
and$DEBUG
which are implicitly accessed? Should it still be allowed to read them in other reactors? Should they become Reactor-local? Should global variables be allowed to be read from any Reactor if they contain a shareable value?
Updated by Eregon (Benoit Daloze) over 4 years ago
Note: I updated my questions on Redmine based on rereading https://github.com/ko1/ruby/blob/ractor_parallel/doc/ractor.md
Updated by ibylich (Ilya Bylich) over 4 years ago
First of all, thanks a lot for your work. This is a huge (and incredibly valuable) addition to Ruby.
I have your branch built locally and I played a lot with it during last few days. I've got a few questions:
- Multiple ractors use a single shared objectspace, right? If so, is it ok that objects can be shared using ObjectSpace? I'm talking about something like this:
require 'objspace'
A = []
1.upto(10) do |i|
Ractor.new(A.object_id, i) do |a_object_id, i|
a = ObjectSpace._id2ref(a_object_id)
p a
a << i
end
end
sleep 1
p A
Currently it throws an error array modified during dump (RuntimeError)
depending on the timing (i.e. sometimes it works). Also, global values in C can be used to store a reference to some VALUE
that can be later used by other ractors. Is it possible (and is it necessary at all) to add some kind of protection against it?
-
Also, I've noticed that marshalling is used to pass objects to ractors. Because of that, it's impossible to pass objects like
Mutex
,ConditionVariable
and I suppose many other objects that encapsulate them (likeConcurrent::Array
fromconcurrent-ruby
). Are there any plans to make them share-able between multiple ractors? These classes are developed to be used in parallel threads, but of course VM can't know about all of them. -
It's impossible to use
require
in non-main ractors (because it mutates many global values like$LOADED_FEATURES
). Does it mean that initialization part of the client code always has to be performed at startup? What aboutautoload
? -
Are there any plans to add something like
Ractor.wait(ractor)
/Ractor.wait_all
(similar toThread.join
/Process.waitpid
)?
Updated by ko1 (Koichi Sasada) over 4 years ago
Thank you for your question.
Eregon (Benoit Daloze) wrote in #note-2:
- Reactor#recv/Channel#recv => Reactor#receive/Channel#receive. I don't think libc-like abbreviations should be used. Let's use actual words.
I want to ask Matz about naming. I don't care to rename it (send/recv is same character count so it is happy to write, though :p).
- About synchronization for Modules, specifically, for instance variables, constants and class variables. Should it be consistent? I think it would be better. For instance module @ivars/constants/@@cvars can be accessed from any Reactor if it's a shareable value, and only main Reactor if non-shareable. This seems the most compatible while still safe, but it might be surprising that the access is allowed depending on the value.
I understand the concerns. Current design is fixed version by Matz.
Making it consistent can be an option.
BTW, there is some inconsistent by variable types:
p @ivar
p @@cvar #=> t.rb:2:in `<main>': uninitialized class variable @@cvar in Object (NameError)
so consistency is not best priority I guess.
- I think we need to introduce
#deep_freeze
to make it easy to create deeply-immutable objects conveniently and efficiently.
Yes. We need to consider about new syntax or method.
But not essential. I think Ruby 3.1 is okay if it is not determined.
- On a similar note,
#deep_copy
seems useful as a more optimized way than Marshal#dump+load.
Maybe it is different topic, but I agree it will help.
- I think it is worth noting that
send/yield(obj, move: true)
still involves some shallow copying.
I agree. ractor.md will cover it.
- ObjectSpace should be per Reactor, right? It seems important for guarantees that
ObjectSpace.each_object
only iterates objects in the current Reactor (otherwise all low-level data races can happen again).
This is very very difficult problem I ignored.
Now there is no way to recognize which objects belongs to which Ractors.
Simply each_objects
method is allowed on single Ractor mode (no Ractor.new
) is one option I guess.
- C extensions: should C extensions be marked as default thread-safe, or thread-unsafe? It seems considered thread-safe by default in the proposal (sounds optimistic, but also should help to identify those which are not thread-safe). I'm looking forward to have C extensions executing in parallel. Currently TruffleRuby defaults to using a global monitor for C extensions (it's a CLI flag) to guarantee the same semantics as CRuby (IIRC we found that the
openssl
C ext and a few others manipulate their internal state in a thread-unsafe manner).
Default unsafe same as TruffleRuby, and now this feature is not supported on PR.
- Previously it was mentioned C extensions would only be allowed for the main/initial Reactor. I'm happy to hear this is no longer the case. However, isn't there a risk that C extensions might pass Ruby objects without copying or transfer to other Reactor via e.g. C global variables? (and that could cause data races/segfault). I guess we have to trust C extensions to not do that? Probably we need to introduce easy ways to have reactor-local variables (similar to thread-local variables), so C global variables can be replaced with reactor-local variables.
I guess we have to trust C extensions to not do that?
I believe people. "Default unsafe" means I don't think it is easy to make safe.
Probably we need to introduce easy ways to have reactor-local variables (similar to thread-local variables), so C global variables can be replaced with reactor-local variables.
Agreed.
I'll fix most of builtin C-methods (String, Array, ...) thread-safe.
You mean just about synchronizing access to global variables, right? I guess in your model String/Array/Hash operations will not have any synchronization but rely on Reactor isolation, right?
As you said, most of String/Array operations are already thread-safe. However, for example fstring table is not thread-safe. Encoding table is also not thread-safe, and so on. The challenge is how to find such global resource accesses.
- Does
Ractor::RemoteError
points to original error? How to avoid that exceptions "leak" objects from inside the Reactor? Maybe copy/move the exception? (The reactor can still be alive if it rescues e.g.Ractor::MovedError
)
Copied one.
One exceptional (sorry confusing) case is the exception which terminate the source ractor.
It can be an optimization but not implemented.
- About global variables, what about
$VERBOSE
and$DEBUG
which are implicitly accessed? Should it still be allowed to read them in other reactors? Should they become Reactor-local? Should global variables be allowed to be read from any Reactor if they contain a shareable value?
Good question. Also $0
, $:
, $stdin
, $stdout
, ... are needed to consider. Now no idea. Should be Rator local? Some global variables are scope local ($1
, $2
, $~
, ...), so I don't think such irregular scope is an issue.
Thanks,
Koichi
Updated by ko1 (Koichi Sasada) over 4 years ago
Thank you for trying Ractor.
ibylich (Ilya Bylich) wrote in #note-4:
- Multiple ractors use a single shared objectspace, right? If so, is it ok that objects can be shared using ObjectSpace? I'm talking about something like this:
Should not be allowed.
We need to consider how to change the ObjectSpace semantics.
Currently it throws an error
array modified during dump (RuntimeError)
depending on the timing (i.e. sometimes it works). Also, global values in C can be used to store a reference to someVALUE
that can be later used by other ractors. Is it possible (and is it necessary at all) to add some kind of protection against it?
You are right and such C-extensions should be thread-unsafe.
In a ticket, I mentioned "thread-unsafe" C-methods, but I need to mention "Ractor-supported" C-methods and other than it, they should not allowed to run on Ractors except the main ractor.
- Also, I've noticed that marshalling is used to pass objects to ractors. Because of that, it's impossible to pass objects like
Mutex
,ConditionVariable
and I suppose many other objects that encapsulate them (likeConcurrent::Array
fromconcurrent-ruby
). Are there any plans to make them share-able between multiple ractors? These classes are developed to be used in parallel threads, but of course VM can't know about all of them.
I believe synchronization primitive between Ractors should be message passing (because it is easy to debug), so I don't think support Mutex between Ractors.
If you mean only sending Mutex and it only affect between threads in a Ractor, it is possible.
- It's impossible to use
require
in non-main ractors (because it mutates many global values like$LOADED_FEATURES
). Does it mean that initialization part of the client code always has to be performed at startup? What aboutautoload
?
Great question, and it is still open question.
Of course it is safe to 'require` from main.
- Are there any plans to add something like
Ractor.wait(ractor)
/Ractor.wait_all
(similar toThread.join
/Process.waitpid
)?
Ractor#take(r)
and Ractor.select(r1, r2, ...)
can emulate them.
Updated by shyouhei (Shyouhei Urabe) over 4 years ago
Eregon (Benoit Daloze) wrote in #note-2:
- I think we need to introduce
#deep_freeze
to make it easy to create deeply-immutable objects conveniently and efficiently.- On a similar note,
#deep_copy
seems useful as a more optimized way than Marshal#dump+load.
Do we need both? It seems an "optimised deep copy that returns everything frozen" should just suffice for Ractor's needs.
Updated by Eregon (Benoit Daloze) over 4 years ago
ko1 (Koichi Sasada) wrote in #note-5:
BTW, there is some inconsistent by variable types:
I think that does not matter much, it's about consistency between what "state" of modules/classes can be accessed.
I do not mind much about class variables because I think they should be deprecated/removed anyway (out of topic).
Instance variables on modules should probably be consistent with constants.
Instances variables of objects inside the Ractor can also be accessed, so it seems weird if @ivars on a shareable value like module cannot be accessed.
I wish @ivar access would not depend on the type or any extra condition (well, except frozen), but I guess that's not possible with Ractors.
BTW, is the ownership of modules tracked?
E.g., what about a Module.new{} created in a Ractor, there I'd guess all access is allowed?
Or is it assumed it might be shared with other ractors, and so it's restricted by type (module & class)?
Yes. We need to consider about new syntax or method.
But not essential. I think Ruby 3.1 is okay if it is not determined.
Yes, does not need to be at the same time but IMHO would make sense.
I think regarding the name, deep_freeze
is the obvious choice.
This is very very difficult problem I ignored.
Now there is no way to recognize which objects belongs to which Ractors.
One possibility is how TruffleRuby implements ObjectSpace.each_object: to use the mark phase of a GC and start from the Ractor's roots.
This might require a precise GC/marking phase though, as some integer in memory shouldn't be interpreted as a valid object from another Ractor.
I guess this should be easier once each Ractor has its own heap. Then all those objects are OK + global shareable.
Simply
each_objects
method is allowed on single Ractor mode (noRactor.new
) is one option I guess.
Seems rather restrictive but certainly better than exposing all objects which is unsafe.
Default unsafe same as TruffleRuby, and now this feature is not supported on PR.
I'm a bit confused with your reply here.
I guess you mean C extensions should be considered by default thread-unsafe, and they need some way to mark thread/parallel/ractor-safe?
But current PR doesn't restrict anything yet?
The explanation above seems to mean "assume safe and allow using C exts in parallel by default:
Now C-methods (methods written in C and defined with rb_define_method()) are run in parallel. It means thread-unsafe code can run in parallel.
To solve this issue, I plan the following: [...] Introduce thread-unsafe label
So unlabeled functions would be considered Ractor/parallel-safe?
Or do you mean we should add a "ractor/parallel/thread-safe label", so unlabeled/existing functions are considered unsafe and not run in parallel?
I believe people. "Default unsafe" means I don't think it is easy to make safe.
Right, default considered as unsafe + a way to mark as safe and documentation to detail what are conditions to be safe sounds good.
As you said, most of String/Array operations are already thread-safe. However, for example fstring table is not thread-safe. Encoding table is also not thread-safe, and so on. The challenge is how to find such global resource accesses.
For TruffleRuby, when removing the GIL, I remember going through every field of the "global context" and make sure it's synchronized properly.
Probably this is a good way to find what to synchronize on MRI too.
There are also more tricky cases, like state related to bytecode, inline caches, module state like ivar offset table, etc, which are not directly in the "global context".
Good question. Also
$0
,$:
,$stdin
,$stdout
, ... are needed to consider. Now no idea. Should be Rator local? Some global variables are scope local ($1
,$2
,$~
, ...), so I don't think such irregular scope is an issue.
I think user (not special) global variables should be Ractor-local. That seems the most compatible and useful.
shyouhei (Shyouhei Urabe) wrote in #note-7:
Do we need both? It seems an "optimised deep copy that returns everything frozen" should just suffice for Ractor's needs.
For Ractor I think either deep_copy
(would be used internally for non-deep-frozen messages) or deep_freeze
is useful (to pass without copy), rarely together for the same objects.
I think they are both generally useful, even without Ractors.
Also, [1, Object.new, { a: "foo" }].deep_freeze
needs no copy.
Many libraries already implement their own deep_copy
(often via Marshal.load/dump, but that seems suboptimal).
P.S.: Sorry, I noticed I wrote "Reactor" instead of "Ractor" many times in the previous message, I was too tired and probably need to get used to the name.
Updated by sawa (Tsuyoshi Sawada) over 4 years ago
- Subject changed from Ractor: a proposal for new concurrent abstraction without thread-safety issues to Ractor: a proposal for a new concurrent abstraction without thread-safety issues
- Description updated (diff)
Updated by mame (Yusuke Endoh) about 4 years ago
Question: What is "immutable"?
I've tried the following code, but the implementation raises a NameError "can not access non-sharable objects in constant A by non-main Ractors".
A = {}
A.freeze
Hash.freeze
Object.freeze
self.freeze
Ractor.new do
Object::A
end.take
Is this a bug or not-yet-implemented? Or do I need to freeze something other? Or is my expectation of "immutable" different?
Updated by mame (Yusuke Endoh) about 4 years ago
I've directly heard "not-yet-implemented" from @ko1 (Koichi Sasada). Sorry for noise.
Updated by jeremyevans0 (Jeremy Evans) about 4 years ago
- Tracker changed from Bug to Feature
- Backport deleted (
2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN)
Updated by matz (Yukihiro Matsumoto) about 4 years ago
OK, I accept the Ractor concept. Go ahead and merge.
We have left some naming issues. My current ideas are:
- I vote for
recv
mostly because of the past familiality to UNIX naming convention - I also want
recvfrom
a la UDP communication. The namerecv
has an intimacy with it. - I have no objection for having full spelled
receive
alias torecv
-
pipe
may be renamed tochannel
orstream
in the future. Need more discussion.
Matz.
Updated by mame (Yusuke Endoh) about 4 years ago
I basically like Ractors, but I'd like to discuss how to share constants.
Currently, a Ractor cannot read any constant if it has an unsharable object.
C = [[1, 2], [3, 4]]
Ractor.new do
p C #=> can not access non-sharable objects in constant Object::C by non-main Ractor. (NameError)
end.take
It is possible to read the constant only if its value is deeply frozen.
C = [[1, 2].freeze, [3, 4].freeze].freeze
Ractor.new do
p C #=> [1, 2, 3]
end.take
This design is safe, but too conservative, I think.
In fact, I managed to run optcarrot on parallel ractors, and I needed to write a bunch of code to deeply freeze many constants that have arrays and hashes.
A new API to deeply freeze an object, say Object#deep_freeze
, will make it less annoying, but I think there are two problems:
- Practically, many existing libraries will not work as is in Ractor. I know that some people write
.freeze
to all constant values (mainly because rubocop suggests), but we need to replace it with.deep_freeze
or something. - IMO, adding
.deep_freeze
to almost constant definitions spoils Ruby's conciseness and flexibility. I don't want to write them, and don't want even to see them.
Here is my counterproposal: When a Ractor attempts to read a constant that has an unsharable object, the object should be "deeply copied" and cached.
C = {[1, 2], [3, 4]]
Ractor.new do
C #=> Marshal.load(Marshal.dump([[1, 2], [3, 4]]))
C #=> The same object as above
C[0] = "foo" # This modification should be Ractor-local, not observable by other Ractors
p C #=> ["foo", [3, 4]]
end.take
p C #=> [[1, 2], [3, 4]] # not modified
There are pros and cons:
Pros:
- Many existing libraries will work as is, I think. (At least, optcarrot will do.)
- We don't have to write annoying
.freeze
.
Cons:
- It is less effective due to object copy. (It is avoidable by adding
.freeze
if it is really important.) - If a constant is used as a mutable state holder, it may be unsafe.
The last point may be a bit important:
Counters = [0, 0, 0]
Ractor.new do
# This may result in some inconsistent data that is against the programmer's intention
p Counters #=> [3, 3, 2]
end
loop do
Counters[0] += 1
Counters[1] += 1
Counters[2] += 1
end
The current behavior is indeed safe because reading Counters
in a Ractor will raise an exception. I admit that this property is good.
However, this kind of constant usage should be relatively rare, IMO. I believe that almost all constants will not be updated, so I'm unsure if it is reasonable to prohibit reading any unsharable constants in a Ractor.
Updated by Eregon (Benoit Daloze) about 4 years ago
matz (Yukihiro Matsumoto) wrote in #note-14:
We have left some naming issues. My current ideas are:
- I vote for
recv
mostly because of the past familiality to UNIX naming convention
I think many people can agree the C socket API has terrible naming, I think we should not blindly follow those mistakes for new non-Socket APIs.
Also, receiving an actor message is a completely different level of abstraction than a low-level socket recv(2)
(for starters it does not involve IO).
Also recv(2)
is read(2)
with extra flags, which does not apply at all here (and so recv(2)
is rarely used and not well known).
I think the only advantage I can see of recv
is that it's 4 letters like send
.
We are designing Ruby APIs, not C APIs here, we should focus on making it nice for Ruby programmers, not for C programmers.
Erlang names it receive
, and that's one of the most used actor languages.
BTW I think send
is needlessly conflicting with Kernel#send
and will cause confusion.
(yes Socket#send
exists but almost nobody uses it)
How about Ractor#<<
instead?
- I also want
recvfrom
a la UDP communication. The namerecv
has an intimacy with it.
What does recvfrom
do? If it's "receive from" then it would be message = Ractor.recvfrom(specific_actor)
? (of course not, but it illustrates the unclear naming)
Let's properly name things. So here I'd suggest message, sender = Ractor.receive_with_sender
.
So much clearer and so much more readable.
So much more Ruby-like (IMHO).
- I have no objection for having full spelled
receive
alias torecv
Do we want actor code to look like C kernel hackers code, or like proper and nice Ruby code?
IMHO recv
should not exist for all the reasons mentioned above.
That said if we can't agree this is at least a way forward.
Let's ask the community to see what they think: https://twitter.com/eregontp/status/1299284528578596864
If I'm alone to have these concerns, then feel free to ignore me.
But I don't think that's the case.
Updated by Eregon (Benoit Daloze) about 4 years ago
mame (Yusuke Endoh) wrote in #note-15:
Here is my counterproposal: When a Ractor attempts to read a constant that has an unsharable object, the object should be "deeply copied" and cached.
Reading a constant to me is always expected to be fast and have minimal side-effects (except autoload on first access).
Doing a potentially large copy seems very far from that and counter-intuitive to me.
If a constant is used as a mutable state holder, it may be unsafe.
I'm worried this will lead to hard-to-debug bugs.
IMHO constants are global state, and just like modules & classes they should not be automatically copied with Ractors.
If they happen to contain a shareable object, which seems fairly frequent then it already works nicely.
If not I think an explicit deep_freeze
is a good way.
I think deep_freeze
is the way to go, and that 20 deep_freeze
calls is not so bad for a non-trivial codebase like OptCarrot.
Maybe CONSTANT = ...
should automatically deep_freeze
Would need a migration path though, probably with a magic comment. And also a way to keep it mutable when intended.
I'm not sure it's worth it.
Personally I regularly mutate the contents of constants, latest example.
I'd think almost all CONSTANT = []
or CONSTANT = {}
are meant to be mutated, and I guess that's not a rare pattern.
Updated by Eregon (Benoit Daloze) about 4 years ago
@ko1 (Koichi Sasada) Can you review these potential performance overheads of adding Ractors? (and so potential regressions for code not using Ractors)
I'm thinking notably reading constants will be less efficient, as it will need to check if Ractor.current == main Ractor (then OK to read anything), otherwise check if the value is shareable (by reading a flag?).
That's more expensive than the current semantics where e.g. TruffleRuby folds Ruby constants to a constant in compiled code (even with Threads). TruffleRuby could still fold while there is only the main Ractor, but then creating a new Ractor would cause probably unexpected performance degradation by adding the extra check for if current == main
.
An idea to fix that performance lost is to still fold (with no checks) if the value is shareable, because then it's always OK to read it.
Writing constants will need to check if main Ractor, but writing constants is slow path so that seems fine.
Every instance variable read/write will need to check if the receiver is shareable when there are Ractors. If it is, it will need to additionally check if current == main.
That seems expensive, can we check e.g. on OptCarrot how much it costs?
Same for class variables (but hopefully people don't use it much).
For global variables, we still need to define the semantics.
Some synchronization for VM data structures like the per-class ivar table, but that's mostly slow path so it should be fine.
And that synchronization already exists on Ruby impls without a GIL.
What about method tables? I guess synchronizing might slow down uncached method calls.
JRuby/TruffleRuby already synchronize the method table of course.
Anything else I missed?
Ractor.current
uses __thread
and fs/gs registers on CRuby, right?
On JVM there is unfortunately no easy way to do that (except by using Thread.currentThread() and subclassing java.lang.Thread to add a field, but that doesn't work for the main Java thread). And Java ThreadLocal is rather slow. Anyway, something to figure out on JVM.
Another way is to pass the current Ractor as a hidden argument through all calls, but that can be an overhead for calls.
shareable = "immutable or Module or Ractor" currently. How about making Ractor immutable so it's simply "immutable or Module"? Any reason to keep Ractor instances non-frozen?
That's just a semantics simplification, because either way all these would have the shareable flag set.
Updated by Eregon (Benoit Daloze) about 4 years ago
Re naming of receiving a message, here is the poll I made:
https://twitter.com/eregontp/status/1299284528578596864
126 votes, 83.3% Ractor.receive, 16.7% Ractor.recv.
I think the result can hardly be clearer.
I see it as naming it Ractor.recv
is just hack.
Yes, please you full words. I never got why so many languages and ecosystem artificially and randomly shorten verbs and nouns. Mostly for no good reason.
Yeah and Ruby usually doesn't participate in this so it's very strange to me..
Even Erlang names it receive
, and Ractor is clearly inspired by Erlang.
Ractor.recv
feels like a random shortcut for no good reason.
The reason of "same numbers of letters as send" seems a weak argument:
-
r.send(message)
andmessage = Ractor.recv
are unlikely to ever align vertically, so visually in code it gives basically nothing - There is
Ractor.yield(obj)
andRactor#take()
and we did not try to renameyield
to some 4-letters abbreviation. - The similarity with recv(2) is very thin, in fact I'd go as far as saying it has nothing to do with recv(2) (Ractor don't support networking currently and I see no plan in this proposal for it, and even if there was it would be an irrelevant implementation detail if it used
recv(2)
internally).
@matz (Yukihiro Matsumoto) Based on those arguments and the ones in my previous reply, could you agree Ractor.receive
is the better alternative to Ractor.recv
?
Regarding sending a message, quite a few were concerned of the conflict with Kernel#send (as you can see on the poll).
So I made another poll about sending a message:
https://twitter.com/eregontp/status/1299316723158585348
Also, send
is about calling a method. But unlike some other actor models, Ractor#send
does not call a method on the receiving actor.
It is not a message send ala Smalltalk (= method call). So that seems confusing.
OTOH send
is the word used in "send a message" so it has pros too.
The results there are much less clear (1/3 #send, 1/3 #<<, 1/3 other).
Some suggestions from the tweets:
ractor.send(message)
ractor << message
ractor.transmit(message)
ractor.post(message)
ractor.put(message)
ractor.pass(message)
ractor.call(message)
ractor.emit(message)
ractor.dispatch(message)
ractor.push(message)
ractor.cast(message)
ractor.deliver(message)
ractor.enqueue(message)
@ko1 (Koichi Sasada), @matz (Yukihiro Matsumoto) Which one(s) do you like in this list?
Regarding BasicSocket#send, some see it as a mistake and I can agree.
It just didn't seem worth the conflict since BasicSocket#send is so rarely used.
And using __send__
instead of send
in Ruby code feels ugly, so adding another send
just makes it more likely to get the conflict and bring more confusion.
Updated by joanbm (Joan Blackmoore) about 4 years ago
@Eregon (Benoit Daloze) (Benoit Daloze) https://bugs.ruby-lang.org/issues/17100#note-19
Re naming of receiving a message, here is the poll I made:
https://twitter.com/eregontp/status/1299284528578596864
126 votes, 83.3% Ractor.receive, 16.7% Ractor.recv.
I think the result can hardly be clearer.
…
Regarding sending a message, quite a few were concerned of the conflict with Kernel#send (as you can see on the poll).
So I made another poll about sending a message:
https://twitter.com/eregontp/status/1299316723158585348
The results there are much less clear (1/3 #send, 1/3 #<<, 1/3 other).
Can you just cease your manipulative behaviour, based on faulty generalization resoning in this case ? It can only harm, makes no good to anybody.
Regarding BasicSocket#send, some see it as a mistake and I can agree.
It just didn't seem worth the conflict since BasicSocket#send is so rarely used.
And using send instead of send in Ruby code feels ugly, so adding another send just makes it more likely to get the conflict and bring more confusion.
I don't find it a "mistake" but a pragmatic and conscious decision, favour notoriously known naming while not breaking object system rules.
Updated by ko1 (Koichi Sasada) about 4 years ago
- Status changed from Open to Closed
Applied in changeset git|79df14c04b452411b9d17e26a398e491bca1a811.
Introduce Ractor mechanism for parallel execution
This commit introduces Ractor mechanism to run Ruby program in
parallel. See doc/ractor.md for more details about Ractor.
See ticket [Feature #17100] to see the implementation details
and discussions.
[Feature #17100]
This commit does not complete the implementation. You can find
many bugs on using Ractor. Also the specification will be changed
so that this feature is experimental. You will see a warning when
you make the first Ractor with Ractor.new
.
I hope this feature can help programmers from thread-safety issues.
Updated by Eregon (Benoit Daloze) about 4 years ago
joanbm (Joan Blackmoore) wrote in #note-20:
I'll disregard your offensive comment, I tried to summarize the poll as good as I can.
This is my interpretation of the poll, and everyone can interpret as they want.
Of course it's a poll and so it did not ask everybody, which is not possible anyway.
Updated by Eregon (Benoit Daloze) about 4 years ago
@mame (Yusuke Endoh) Regarding OptCarrot, I think having deep_freeze
would be a great help.
But I wonder, maybe Object.deep_freeze
would work to freeze all constants very succintly?
Some constants expect to be mutable, so I guess it doesn't work that well on non-small programs.
And OptCarrot.deep_freeze
would probably freeze Object
too since it's accessible via OptCarrot.class.superclass
.
Maybe we could have some control there to clarify what should be frozen (e.g., all modules & constants in the namespace, but not above)?
Maybe it is a good idea to freeze all constants & modules with Ractor.
Here is a problematic case on master
sharing mutable state via define_method
:
$ ruby -ve 'ary=[]; define_method(:add) { |e| ary << e }; add(1); Ractor.new { add(2) }; Ractor.new { add(3) }; sleep 1; p ary'
ruby 3.0.0dev (2020-09-03T12:11:06Z master b52513e2a1) [x86_64-linux]
<internal:ractor>:38: warning: Ractor is experimental, and the behavior may change in future versions of Ruby! Also there are many implementation issues.
[1, 2, 3]
$ ruby -ve 'ary=[]; define_method(:add) { |e| ary << e }; 1000.times { |i| Ractor.new(i) { |i| add(i) } }; sleep 1; p ary'
crashes
I'm not sure what's a good way to solve that one.
Updated by marcandre (Marc-Andre Lafortune) about 4 years ago
I find Ractor
very promising 🎉
The impact is severily limited without support for user-based immutable types. Is that planned for 3.0?
I'm curious what is the technical difficulty. I mean, can't existing builtin freeze
check for all instance variables and if all are immutable or shareable, then it also sets the immutable flag for the current object?
Hopefully we will get deep_freeze
. I created #17145
We should probably have Ractor.shareable?(obj) # => true or false
, right?
Bike-shed level:
I also dislike recv
. That abbreviation is not too familiar and it doesn't seem to be worthwhile to save 4 keystrokes a few times. I would expect the number of places where we call receive
to be quite limited.
I dislike even more send
. In Ruby, we send
messages, not values.
I think transmit
would be better than send
, but why not use the same pair of terms for push and pull modes?
# push:
r.yield(obj)
Ractor.receive # in `r`
# pull:
Ractor.yield(obj) # in `r`
r.receive
Ractor.select
: How about reversing the result order (from [ractor, message]
to [message, ractor]
)? I can imagine more cases were we don't care about which ractor we got the result from than the reverse. result, = Ractor.select(*ractor_farm)
is easier to write.
Updated by Eregon (Benoit Daloze) about 4 years ago
- Related to Feature #17145: Ractor-aware `Object#deep_freeze` added
Updated by ko1 (Koichi Sasada) about 4 years ago
Sorry I missed your question.
Eregon (Benoit Daloze) wrote in #note-8:
Instance variables on modules should probably be consistent with constants.
Instances variables of objects inside the Ractor can also be accessed, so it seems weird if @ivars on a shareable value like module cannot be accessed.
I wish @ivar access would not depend on the type or any extra condition (well, except frozen), but I guess that's not possible with Ractors.
I think we can expect "Constants" are defined at once at the beginning of code.
But ivars are defined at runtime anytime and it can lead race-conditions.
I like current restriction. But I also agree the current limitation is too hard to migrate from the current programs.
Ideally (for me) classes/modules should use shareable mutable objects to change the state of them like that:
class C
STATE = Ractor::ShareableHash.new
def update
STATE.transaction do
STATE[:cnt] += 1
end
end
end
BTW, is the ownership of modules tracked?
E.g., what about a Module.new{} created in a Ractor, there I'd guess all access is allowed?
Yes.
This is very very difficult problem I ignored.
Now there is no way to recognize which objects belongs to which Ractors.One possibility is how TruffleRuby implements ObjectSpace.each_object: to use the mark phase of a GC and start from the Ractor's roots.
This might require a precise GC/marking phase though, as some integer in memory shouldn't be interpreted as a valid object from another Ractor.I guess this should be easier once each Ractor has its own heap. Then all those objects are OK + global shareable.
Yes. It is future work (per-ractor object space).
Default unsafe same as TruffleRuby, and now this feature is not supported on PR.
I'm a bit confused with your reply here.
I guess you mean C extensions should be considered by default thread-unsafe, and they need some way to mark thread/parallel/ractor-safe?
But current PR doesn't restrict anything yet?
You are correct.
The explanation above seems to mean "assume safe and allow using C exts in parallel by default:
Now C-methods (methods written in C and defined with rb_define_method()) are run in parallel. It means thread-unsafe code can run in parallel.
To solve this issue, I plan the following: [...] Introduce thread-unsafe labelSo unlabeled functions would be considered Ractor/parallel-safe?
Or do you mean we should add a "ractor/parallel/thread-safe label", so unlabeled/existing functions are considered unsafe and not run in parallel?
Yes.
As you said, most of String/Array operations are already thread-safe. However, for example fstring table is not thread-safe. Encoding table is also not thread-safe, and so on. The challenge is how to find such global resource accesses.
For TruffleRuby, when removing the GIL, I remember going through every field of the "global context" and make sure it's synchronized properly.
Probably this is a good way to find what to synchronize on MRI too.
There are also more tricky cases, like state related to bytecode, inline caches, module state like ivar offset table, etc, which are not directly in the "global context".
Exactly. I'll use your technique.
Good question. Also
$0
,$:
,$stdin
,$stdout
, ... are needed to consider. Now no idea. Should be Rator local? Some global variables are scope local ($1
,$2
,$~
, ...), so I don't think such irregular scope is an issue.I think user (not special) global variables should be Ractor-local. That seems the most compatible and useful.
BTW now $stdin/out/err
is ractor-local, different IO objects (but share same fds).
Updated by ko1 (Koichi Sasada) about 4 years ago
Eregon (Benoit Daloze) wrote in #note-18:
@ko1 (Koichi Sasada) Can you review these potential performance overheads of adding Ractors? (and so potential regressions for code not using Ractors)
I'm thinking notably reading constants will be less efficient, as it will need to check if Ractor.current == main Ractor (then OK to read anything), otherwise check if the value is shareable (by reading a flag?).
Yes. I didn't think it is huge cost.
BUT
CONST = 1
N = 10_000_000
require 'benchmark'
Benchmark.bm{|x|
x.report{
N.times{
v = CONST; v = CONST; v = CONST; v = CONST; v = CONST;
v = CONST; v = CONST; v = CONST; v = CONST; v = CONST;
}
}
}
__END__
user system total real
before Ractor 0.613114 0.002960 0.616074 ( 0.616090)
after Ractor 0.968084 0.000000 0.968084 ( 0.968097)
:p
That's more expensive than the current semantics where e.g. TruffleRuby folds Ruby constants to a constant in compiled code (even with Threads). TruffleRuby could still fold while there is only the main Ractor, but then creating a new Ractor would cause probably unexpected performance degradation by adding the extra check for
if current == main
.
An idea to fix that performance lost is to still fold (with no checks) if the value is shareable, because then it's always OK to read it.
I agree with this hack.
Writing constants will need to check if main Ractor, but writing constants is slow path so that seems fine.
Every instance variable read/write will need to check if the receiver is shareable when there are Ractors. If it is, it will need to additionally check if current == main.
That seems expensive, can we check e.g. on OptCarrot how much it costs?
On optcarrot, I don't see no perf regression. However current implementation lacks correct ivar checking. For example inline cache is not checked.
Same for class variables (but hopefully people don't use it much).
For global variables, we still need to define the semantics.
Some synchronization for VM data structures like the per-class ivar table, but that's mostly slow path so it should be fine.
And that synchronization already exists on Ruby impls without a GIL.
What about method tables? I guess synchronizing might slow down uncached method calls.
JRuby/TruffleRuby already synchronize the method table of course.
Yes. it should be protected, and now it is not protected enough (TODO).
At method lookup, it acquires VM-wide global lock (1 lock).
Anything else I missed?
Ractor.current
uses__thread
and fs/gs registers on CRuby, right?
Using pthread_getspecific()
on pthread (and it accesses segment registers).
There is no problem to use __thread
.
On JVM there is unfortunately no easy way to do that (except by using Thread.currentThread() and subclassing java.lang.Thread to add a field, but that doesn't work for the main Java thread). And Java ThreadLocal is rather slow. Anyway, something to figure out on JVM.
Another way is to pass the current Ractor as a hidden argument through all calls, but that can be an overhead for calls.shareable = "immutable or Module or Ractor" currently. How about making Ractor immutable so it's simply "immutable or Module"? Any reason to keep Ractor instances non-frozen?
That's just a semantics simplification, because either way all these would have the shareable flag set.
I feel Ractor#send
mutates the Ractor's state and frozen Ractor shouldn't accept any messages.
Updated by ko1 (Koichi Sasada) about 4 years ago
I feel Ractor#send mutates the Ractor's state and frozen Ractor shouldn't accept any messages.
However, I don't against to freeze Ractors.
q = Queue.new
q.freeze
q << 1
is working (I'm not sure it is intentional).
How about making Ractor immutable so it's simply "immutable or Module"?
(quote again)
The reason why I categorized "(1) immutable (2) modules (3) specials (includes Ractor)" is, I want to add shareable objects with explicit synchronization to (3), like I mentioned at https://bugs.ruby-lang.org/issues/17100#note-26
Updated by ko1 (Koichi Sasada) about 4 years ago
I don't care about the name recv
and receive
.
Eregon (Benoit Daloze) wrote in #note-19:
@ko1 (Koichi Sasada), @matz (Yukihiro Matsumoto) Which one(s) do you like in this list?
I believe Ractor#send
is best name because it sends an message.
It is conflicts with Ruby's Object#send
, but it should not be a reason to use appropriate naming.
Also Ractor#<<
is already introduced. I recommend you to use it.
BTW, Matz asked me that he want to prepare the "recvfrom" which will return a message with source Ractor.
It means we need to add source ractor information. Usually it is not an issue, but if pipe-like Ractor should care about it.
# recv -> yield adapter type pipe
pipe1 = Ractor.new do
msg, src = Ractor.recvfrom
Ractor.yield msg from: src
end
# recv -> send type pipe
pipe2 = Ractor.new (to) do
msg, src = Ractor.recvfrom
to.send msg from: src
end
I think if recv
is not acceptable, recvfrom
should have better name. Any good idea?
Updated by ko1 (Koichi Sasada) about 4 years ago
Eregon (Benoit Daloze) wrote in #note-23:
Here is a problematic case on
master
sharing mutable state viadefine_method
:
Exactly. define_method
is also the big issue...
The Proc object referred from defined method is not isolated, so we need to prohibit such method calling from other Ractors.
define_method(:double) do |a| a * 2 end
should be allowed, so we may prepare the way to specify they are isolated.
define_method(:double, &Proc.new{|a| a*2}.isolate)
or shorter syntax.
(I forget to propose Proc#isolate
)
Updated by ko1 (Koichi Sasada) about 4 years ago
marcandre (Marc-Andre Lafortune) wrote in #note-24:
I find
Ractor
very promising 🎉
Thanks, I hope so.
The impact is severily limited without support for user-based immutable types. Is that planned for 3.0?
I'm curious what is the technical difficulty. I mean, can't existing builtin
freeze
check for all instance variables and if all are immutable or shareable, then it also sets the immutable flag for the current object?
I introduced shareable
flag into objects (FL_SHAREABLE
in C level).
Once the interpreter found it is immutable, this flag will be set and no need to check twice.
Hopefully we will get
deep_freeze
. I created #17145We should probably have
Ractor.shareable?(obj) # => true or false
, right?
It is easy to implement, but not sure how to use them. debugging purpose?
(I don't against to introduce it)
Ractor.select
: How about reversing the result order (from[ractor, message]
to[message, ractor]
)? I can imagine more cases were we don't care about which ractor we got the result from than the reverse.result, = Ractor.select(*ractor_farm)
is easier to write.
Good idea.
Updated by Eregon (Benoit Daloze) about 4 years ago
ko1 (Koichi Sasada) wrote in #note-28:
Thanks for all your replies!
q = Queue.new q.freeze q << 1
is working (I'm not sure it is intentional).
Surprising to me, I think unintended, I filed #17146.
I believe
Ractor#send
is best name because it sends an message.
"Sending a message" is unfortunately ambiguous, if we consider a "message send" = method call as in Smalltalk.
It might be confusing to name it send
as then people might expect "call a method behavior" like ractor.send(:foo, 1, 2)
would call foo(1, 2)
on ractor
asynchronously.
Example APIs that are somewhat similar:
https://github.com/collectiveidea/delayed_job#queuing-jobs
https://github.com/mperham/sidekiq/wiki/Delayed-extensions
Also
Ractor#<<
is already introduced. I recommend you to use it.
Good to know.
I think if
recv
is not acceptable,recvfrom
should have better name. Any good idea?
Indeed, I think recvfrom
is even more problematic.
I suggest message, sender = Ractor.receive_with_sender
(I mentioned above)
Updated by MaxLap (Maxime Lapointe) about 4 years ago
Every method call in Ruby is a message that is sent. That's why Object#send(name,...)
exists.
Sure, you are sending a message to Ractor, which is a message. But so is sending a packed from a Socket, using any connection, sending a query to a database. At this point, send is basically a reserved method and while __send__
, all those other system did not overwrite send
, I don't see why Ractor should. It just makes things inconsistent.
So please, anything but send
.
my_ractor.accept(value)
my_ractor.send_value(value)
Ractor.wait_value
Updated by Eregon (Benoit Daloze) about 4 years ago
- Related to Feature #17159: extend `define_method` for Ractor added
Updated by marcandre (Marc-Andre Lafortune) about 4 years ago
We now have receive
, very good.
What about send
? Should we use yield
? pass
? call
?
Updated by Eregon (Benoit Daloze) about 4 years ago
marcandre (Marc-Andre Lafortune) wrote in #note-35:
What about
send
? Should we useyield
?pass
?call
?
Currently there is the alias Ractor#<<
, maybe we should use that consistently instead of send
in the docs, tests, etc?
(so make send
the alias, or maybe even remove Ractor#send
since it seems prone to conflicts?
I could never get an agreement to remove Ractor#recv, so probably best to establish the name used in docs and keep send
as an alias first)
Personally, I think (full list in https://bugs.ruby-lang.org/issues/17100#note-19 ):
yield
would be confusing with Ractor.yield
.
call
sounds synchronous to me.
pass
seems a bit strange to me given there is Thread.pass
, but not really against.
<<
is what I prefer.
post
sounds good to me.
Updated by ko1 (Koichi Sasada) about 4 years ago
Eregon (Benoit Daloze) wrote in #note-36:
Currently there is the alias
Ractor#<<
, maybe we should use that consistently instead ofsend
in the docs, tests, etc?
I believe send
is the best name, so I disagree about it.
Updated by marcandre (Marc-Andre Lafortune) about 4 years ago
Eregon (Benoit Daloze) wrote in #note-36:
Currently there is the alias
Ractor#<<
, maybe we should use that consistently instead ofsend
in the docs, tests, etc?
(so makesend
the alias, or maybe even removeRactor#send
since it seems prone to conflicts?
send
accepts move: true
, so ractor.<<(data, move: true)
looks very odd.
pass
seems a bit strange to me given there isThread.pass
, but not really against.
post
sounds good to me.
I like post
. Anything but send
.
Updated by marcandre (Marc-Andre Lafortune) about 4 years ago
ko1 (Koichi Sasada) wrote in #note-37:
I believe
send
is the best name, so I disagree about it.
I understand that. Please consider that many people find this a really bad name though. What would be the second best name? What does Matz think about this?
Updated by Eregon (Benoit Daloze) about 4 years ago
marcandre (Marc-Andre Lafortune) wrote in #note-38:
send
acceptsmove: true
, soractor.<<(data, move: true)
looks very odd.
Good point.
Maybe a good way to avoid/workaround that issue would be to have Ractor#move(obj)
(or a different name)?
But Ractor.yield(obj, move: true)
seems fine, so it's not so nice.
I like <<
in part because it's so clear it takes a single argument, unlike send
, and it corresponds to Queue#<<
, which has similar semantics.
But all alternatives also make it pretty clear they take a single argument (even though not enforced syntactically), so I agree with "Anything but send".
Updated by duerst (Martin Dürst) about 4 years ago
ko1 (Koichi Sasada) wrote in #note-37:
I believe
send
is the best name, so I disagree about it.
Send would be the best name, except that that we are not in a vacuum. It is bad practice to have one and the same name for two fundamentally different and colliding functionalities.
Updated by nobu (Nobuyoshi Nakada) about 4 years ago
duerst (Martin Dürst) wrote in #note-41:
Send would be the best name, except that that we are not in a vacuum. It is bad practice to have one and the same name for two fundamentally different and colliding functionalities.
Agreed.
Kernel#send
should be deprecated because of BasicObject#__send__
.
Updated by Dan0042 (Daniel DeLorme) about 4 years ago
Trying to recap the ideas...
0) Use send
; Kernel#send
should be deprecated because of BasicObject#__send__
(amazing nobu!)
- Use a different name entirely;
accept
,yield
,pass
,call
,post
,queue
,deliver
- Use a suffix, e.g.
send_value
andreceive_value
- Use different methods depending on move semantics, e.g.
send_move
,send_copy
,send_freeze
Updated by Eregon (Benoit Daloze) about 4 years ago
nobu (Nobuyoshi Nakada) wrote in #note-42:
Kernel#send
should be deprecated because ofBasicObject#__send__
.
I think that's not reasonable.
There seem to be ~424649 usages of .send
in gems, vs only ~28004 of .__send__
(https://twitter.com/eregontp/status/1320652814935298050).
It seems most .send
usages mean Kernel#send
.
And it seems many people know .send
, and much fewer .__send__
(or they dislike it).
Deprecating would be a painful change for all these usages.
I think it's fair to say it's not worth it.
So I'm of the opposite opinion, BasicSocket#send should be deprecated (it's used so rarely anyway).
BasicSocket#send is stdlib though, Ractor#send would be core, that would be a much bigger issue.
Since Kernel#send
exists and is widely used, adding Ractor#send would violate the Liskov substitution principle (completely incompatible & unrelated behavior).
In practice I think it would cause a lot of confusion, and probably break some code too.
Let's just use another name.
Quite some discussion here too: https://twitter.com/_ko1/status/1320354888686006274
Updated by ko1 (Koichi Sasada) almost 4 years ago
Ractor#send
naming issue
Recent weeks I'm thinking about the name of Ractor#send
.
Matz and I discussed possibilities long time (this discussions and considerations stop my developments), and we agree to remain the name Ractor#send
as an experimental feature.
Matz confirmed things about Kernel#send
-
BasicSocket#send
andUDPSocket#send
will not be deprecated. UNIX interface should be respected. (This is Matz's decision) -
We can use
Kernel#send
only if we know the receiver doesn't override theKernel#send
method.- The document had implies the possibility of overriding of "send": "You can use
__send__
if the namesend
clashes with an existing method in obj." https://github.com/ruby/ruby/blob/ruby_2_7/vm_eval.c#L1171 - Current document points out clearer: "
__send__
is safer thansend
when obj has the same method name likeSocket
." https://github.com/ruby/ruby/blob/master/vm_eval.c#L1092 - Japanese document write it explicitly: "There is also an alias
__send__
in case__send__
is redefined, and libraries should use it." translated from 「send
が再定義された場合に備えて別名__send__
も 用意されており、ライブラリではこちらを使うべきです」https://docs.ruby-lang.org/ja/1.8.7/method/Object/i/send.html (*1)
- The document had implies the possibility of overriding of "send": "You can use
-
If you don't know the receiver's class, you need to use
__send__
method instead ofKernel#send
to dispatch the method.-
Socket
objects hassend
method as described above. - A rails app created by
rails new
has 3 classes which hassend
other thanKernel#send
.UDPSocket#send(*)
BasicSocket#send(*)
-
Concurrent::Agent#send(*args, &action)
(it is similar to Ractor)
-
-
Matz agreed that
__send__
is ugly (*2), so he is thinking about new method dispatch operator which can replace with__send__
. -
Overriding
Object#send
violates Liskov's substitution principle. However, sometimes Ruby violates this principle.-
send
has already overridden. -
initialize
is not compatible withObject#initialize
. - there are some
load
methods.
-
(*1)
This is an oversimplification, you can use send
if you know the receivers.
(*2)
MY OPINION (not a fact): BTW I don't think __send__
is ugly (because I'm a C programmer?). It is clear that something special (meta-programming) and it increases the readability for me.
Considerations¶
Ruby prefers duck-typing and it is friendly that same method name should behave same functionality (maybe people saying "Liskov's substitution principle" want to argue about it). If #write
method only reads something, it is surprising. I believe the problem of the name "send" is this one issue.
If a reader finds the code r1.send(message)
and it takes some seconds to understand that it is either Ractor#send
or Kernel#send
, it's a problem to consider. For example, I found that racotr.send(obj)
in method_missing
is a bit confusing. https://gist.github.com/ko1/cee945e8a9fa5ef4e23ce96d3980e11c#file-activeobject-rb-L18
I believe there is no confusion with Ractor#send
and Kernel#send
in most of cases if the receiver is clear. So I estimated the confusion problem is limited in practice.
We discussed alternative names but we can not find the best name to replace the "send".
-
send_message(obj)
: Matz doesn't like it because in OOP the name "message" means method selector (name) with parameters. -
deliver(obj)
: Aaron pointed out that whendeliver
method finished, the received should be finished. -
post(obj)
: I can't accept this method name because it is strongly connected with mailing system.
(and other names)
In "Actor" context, "send" is the most clear word to represent the behavior because of historical reasons. Many papers and language references use this word.
I thought that naming it as other name (like post
) and after introducing new method dispatch operator, rename (or add an alias) it to send
is one idea. However, it is impossible because people can use Kernel#send
for ractor objects before renaming it.
Finally, we can not find enough advantage of solving limited naming confusion problem by not being able to use appropriate name.
Fortunately, "Ractor" and Ractor APIs are experimental features (they shows mandatory experimental warning), so we can change the name "send" if it has critical issues.
("A library doesn't work well because it doesn't use __send__
" is not a problem because it does not support all objects without Ractor#send
)
So let's try Ractor#send
on Ruby 3.0 as experimental name.
Updated by Eregon (Benoit Daloze) almost 4 years ago
Thank you for the detailed explanation, that makes sense to me.
I am sure it will result in some confusion, but hopefully it will be rare in practice.
ko1 (Koichi Sasada) wrote in #note-45:
- Matz agreed that
__send__
is ugly (*2), so he is thinking about new method dispatch operator which can replace with__send__
.
Yes, I think that would be good.
A new module with singleton methods might be an elegant way to make some basic methods which must not be redefined available.
That module could just be frozen to avoid redefinitions (the constant could still be re-assigned, but we could explicitly raise/warn against it).
I find it interesting that so far we only needed a non-redefined variant for send
(__send__
) and object_id
(__id__
), and not other Kernel/Object/BasicObject methods.
And in practice it seems very rare to override object_id
, so it seems it's really just send
being overridden for unrelated semantics.
I see it as the Liskov's substitution principle being mostly respected.
BTW, any initialize
method is a superset of/is compatible with BasicObject#initialize
, since that does nothing.
post(obj)
: I can't accept this method name because it is strongly connected with mailing system. (and other names)
Isn't it common actor terminology to receive a message in a mailbox? (e.g. https://erlang.org/doc/reference_manual/processes.html)
So I think the connection to mailing has been there for a while.
Updated by matz (Yukihiro Matsumoto) almost 4 years ago
send
is a term traditionally used in the actor context. We'd like to experiment how it works well (or bad) for Rubu3.0.
Matz.