Bug #19288
Open
Ractor JSON parsing significantly slower than linear parsing
Description
A simple benchmark:
require 'json'
require 'benchmark'

CONCURRENT = 5
RACTORS = true
ELEMENTS = 100_000

data = CONCURRENT.times.map do
  ELEMENTS.times.map do
    {
      rand => rand,
      rand => rand,
      rand => rand,
      rand => rand
    }.to_json
  end
end

ractors = CONCURRENT.times.map do
  Ractor.new do
    Ractor.receive.each { JSON.parse(_1) }
  end
end

result = Benchmark.measure do
  if RACTORS
    CONCURRENT.times do |i|
      ractors[i].send(data[i], move: false)
    end
    ractors.each(&:take)
  else
    # Linear without any threads
    data.each do |piece|
      piece.each { JSON.parse(_1) }
    end
  end
end

puts result
This gives the following results on my 8-core machine:
# without ractors:
2.731748 0.003993 2.735741 ( 2.736349)
# with ractors
12.580452 5.089802 17.670254 ( 5.209755)
I would expect Ractors not to be two times slower on CPU-intensive work.
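For reading the numbers above: Benchmark.measure prints user CPU time, system CPU time, their total, and wall-clock (real) time in parentheses. The following is a minimal sketch that labels those columns. The workload is an arbitrary stand-in, not the JSON benchmark from this report.

```ruby
require 'benchmark'

# Arbitrary stand-in workload; not the JSON benchmark from this report.
result = Benchmark.measure { 200_000.times { |i| Math.sqrt(i) } }

# Benchmark::CAPTION is the header line that matches the default output format.
print Benchmark::CAPTION
puts result

# The same numbers are available as fields on the Benchmark::Tms object.
cpu_total = result.utime + result.stime + result.cutime + result.cstime
puts "user=#{result.utime} system=#{result.stime} total=#{cpu_total} real=#{result.real}"
```

Read this way, the Ractor run above spent about 17.7 s of CPU time (12.6 s user plus 5.1 s system) to finish in 5.2 s of wall-clock time, while the linear run spent about 2.7 s on both.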
Updated by Eregon (Benoit Daloze) about 2 years ago
It would be fairer to call Ractor.make_shareable(data) first.
But even with that Ractor is slower:
no Ractor:
2.748311 0.003002 2.751313 ( 2.763541)
Ractor
9.939530 5.816431 15.755961 ( 4.289792)
This high system time seems strange.
Probably lock contention for allocations?
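As background for the make_shareable suggestion: a shareable object is passed to another Ractor by reference, whereas a non-shareable one is deep-copied when sent with move: false. A small sketch of what the call changes, using a made-up stand-in for the benchmark's data array:

```ruby
# Small stand-in for the benchmark's `data` array of JSON strings.
data = [['{"a":1}', '{"b":2}'], ['{"c":3}']]

before = Ractor.shareable?(data)  # mutable arrays are not shareable
Ractor.make_shareable(data)       # deep-freezes the array and everything in it
after = Ractor.shareable?(data)

puts "shareable before: #{before}, after: #{after}"
puts "outer frozen: #{data.frozen?}, inner string frozen: #{data[0][0].frozen?}"
```

Because the call deep-freezes its argument, it only suits data that no longer needs to be mutated, which is the case for the pre-generated JSON strings here.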
Updated by Eregon (Benoit Daloze) about 2 years ago
Also that script creates Ractors even in "linear" mode.
With the fixed script below:
2.040496 0.002988 2.043484 ( 2.048731)
i.e. it's also quite a bit slower if any Ractor is created.
Script:
require 'json'
require 'benchmark'

CONCURRENT = 5
RACTORS = ARGV.first == "ractor"
ELEMENTS = 100_000

data = CONCURRENT.times.map do
  ELEMENTS.times.map do
    {
      rand => rand,
      rand => rand,
      rand => rand,
      rand => rand
    }.to_json
  end
end

if RACTORS
  Ractor.make_shareable(data)
  ractors = CONCURRENT.times.map do
    Ractor.new do
      Ractor.receive.each { JSON.parse(_1) }
    end
  end
end

result = Benchmark.measure do
  if RACTORS
    CONCURRENT.times do |i|
      ractors[i].send(data[i], move: false)
    end
    ractors.each(&:take)
  else
    # Linear without any threads
    data.each do |piece|
      piece.each { JSON.parse(_1) }
    end
  end
end

puts result
Updated by luke-gru (Luke Gruber) about 2 years ago
I just took a look at this, and it looks like the culprit is the C dtoa function that's called in the JSON parser, specifically a helper function, Balloc. It uses a lock for some reason (shrug).
Edit: It looks like in Ruby's missing/dtoa.c, the lock function is a no-op. If that version of dtoa.c is used in your Ruby, then the lock isn't the cause. My Ruby is using missing/dtoa.c, and running the perf tool with this script points to Balloc being the main issue. Something funny is going on in that Balloc function. I think it's the malloc() calls that take the malloc arena lock, and that the lock contention is there, but that's just a guess.
Updated by maciej.mensfeld (Maciej Mensfeld) about 2 years ago
I find this issue important and if mitigated, it would allow me to release production-grade functionalities that would benefit users of the Ruby language.
I run an OSS project called Karafka (https://github.com/karafka/karafka) that allows for processing Kafka messages using multiple threads in parallel. For non-IO-bound cases, users whose use cases I know spend the majority of their time (> 80%) on data deserialization. JSON is by far the most popular format, and it is conveniently supported natively by Ruby. While providing true parallelism around the whole processing may not be easy due to a ton of synchronization around the whole process, the atomicity of message deserialization makes it an ideal case for using Ractors.
- Data can be sent there, and results can be transferred without interdependencies.
- Each message is atomic; hence their deserialization can run in parallel.
- All message deserialization requests can be sent to a generic queue from which Ractors could consume.
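A minimal sketch of the pattern in the list above, under stated assumptions: it is not Karafka code, the payloads are made up, and for brevity it splits the messages into one batch per worker instead of using a shared queue. The ractor_result helper is hypothetical and only papers over the difference between Ractor#take and the Ractor#value method of newer Ruby releases.

```ruby
require 'json'

# Hypothetical helper: newer Ruby releases replaced Ractor#take with Ractor#value.
def ractor_result(ractor)
  ractor.respond_to?(:value) ? ractor.value : ractor.take
end

WORKERS = 2

# Made-up raw payloads standing in for Kafka messages.
payloads = Array.new(8) { |i| { "id" => i, "value" => i * 1.5 }.to_json }

# One deep-frozen batch per worker, so batches are shared instead of copied.
batch_size = (payloads.size / WORKERS.to_f).ceil
batches = payloads.each_slice(batch_size).map { |batch| Ractor.make_shareable(batch) }

# Each worker parses its own batch independently and returns the results.
workers = batches.map do |batch|
  Ractor.new(batch) do |messages|
    messages.map { |raw| JSON.parse(raw) }
  end
end

# Collect results in batch order, which preserves the original message order.
parsed = workers.flat_map { |worker| ractor_result(worker) }
puts "parsed #{parsed.size} messages, first: #{parsed.first.inspect}"
```

Whether this is actually faster than parsing in a single Ractor is exactly what this issue is about, so the sketch shows the shape of the use case and not a performance win.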
I am not an expert in the Ruby code, but if there is anything I could help with to move this forward, please just ping me.
Updated by luke-gru (Luke Gruber) about 2 years ago
I've notified the flori/json people (https://github.com/flori/json/issues/511)
So to update everyone, the dtoa function is called during JSON generation, not parsing. As this script does both, it's hard to measure using perf tools. You have to run the generation part of the script alone and look at the perf report, then compare it against running both the generation and the parsing (with and without Ractors).
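One way to keep the two phases apart, sketched here without Ractors and with a smaller element count than the original benchmark, is to time generation and parsing separately. For a perf report, each phase would still need to be run as its own script, as described above.

```ruby
require 'json'
require 'benchmark'

ELEMENTS = 10_000 # smaller than the original benchmark, to keep the sketch quick

# Phase 1: generation only. Per the comment above, this is where dtoa is called.
docs = nil
generate_time = Benchmark.realtime do
  docs = Array.new(ELEMENTS) { { rand => rand }.to_json }
end

# Phase 2: parsing only, on the documents generated in phase 1.
parse_time = Benchmark.realtime do
  docs.each { |doc| JSON.parse(doc) }
end

puts format("generate: %.3fs  parse: %.3fs", generate_time, parse_time)
```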
Updated by luke-gru (Luke Gruber) almost 2 years ago
Here's a simple reproduction showing that the problem is not send/receive:
require 'json'

RACTORS = ARGV.first == "ractor"
J = { rand => rand }.to_json
Ractor.make_shareable(J)

if RACTORS
  rs = []
  # 10 Ractors x 100_000 parses each
  10.times do
    rs << Ractor.new do
      i = 0
      while i < 100_000
        JSON.parse(J)
        i += 1
      end
    end
  end
  rs.each(&:take)
else
  # The same 1_000_000 parses in the main Ractor
  1_000_000.times do
    JSON.parse(J)
  end
end
The ractor example should take less time, but it doesn't.
Updated by Eregon (Benoit Daloze) almost 2 years ago
maciej.mensfeld (Maciej Mensfeld) wrote in #note-4:
I find this issue important and if mitigated, it would allow me to release production-grade functionalities that would benefit users of the Ruby language.
Note that Ractor is far from production-ready.
It has many issues as can be found on this bug tracker and when using it and as the warning says (Also there are many implementation issues.).
Also the fact that the main Ruby test suites don't run any Ractor test in the same process also seems an indication of instability.
And then of course there is the issue that Ractor is incompatible with most gems/code out there.
While JSON loading might work, any non-trivial processing after using a gem is unlikely to work well.
Other Rubies have solved this in a much more efficient, usable and reliable way, by having no GVL.
Updated by maciej.mensfeld (Maciej Mensfeld) almost 2 years ago
Note that Ractor is far from production-ready.
I am well aware. I am just providing a justification, and since my case seems to fit the limited scope of this functionality, I wanted to draw attention to it.
While JSON loading might work, any non-trivial processing after using a gem is unlikely to work well.
This is exactly why I want a limited functionality that would at least allow me to parallelize the processing.
Other Rubies have solved this in a much more efficient, usable and reliable way, by having no GVL.
I am also aware of this :)
Updated by luke-gru (Luke Gruber) almost 2 years ago
It has many issues as can be found on this bug tracker and when using it and as the warning says (Also there are many implementation issues.).
I think the implementation issues are solvable but the bigger picture issue of adoption is of course up in the air. IMO if we are allowed to have an API, for example, of Ractor.disable_isolation_checks! { ... } for use around thread-safe code, that would be a big win in my book.
Also about the test-suite, I do want to add in-process ractor tests. I hope the ruby core team isn't against it.
Updated by maciej.mensfeld (Maciej Mensfeld) almost 2 years ago
I think the implementation issues are solvable but the bigger picture issue of adoption is of course up in the air.
The first step to adoption is to have a case for it that could be used. I believe the case I presented is viable and should be considered.
Updated by luke-gru (Luke Gruber) almost 2 years ago
This PR I made to JSON repository is related: https://github.com/flori/json/pull/512
Updated by duerst (Martin Dürst) almost 2 years ago
Eregon (Benoit Daloze) wrote in #note-7:
And then of course there is the issue that Ractor is incompatible with most gems/code out there.
While JSON loading might work, any non-trivial processing after using a gem is unlikely to work well.
Other Rubies have solved this in a much more efficient, usable and reliable way, by having no GVL.
But don't other Rubies rely on the programmer to know how to program with threads? That's only usable if you're used to programming with threads and avoid the related issues. The idea (where the implementation and many gems may still have to catch up) behind Ractor is that thread-related issues such as data races can be avoided at the level of the programming model.
Updated by maciej.mensfeld (Maciej Mensfeld) almost 2 years ago
And then of course there is the issue that Ractor is incompatible with most gems/code out there.
While JSON loading might work, any non-trivial processing after using a gem is unlikely to work well.
We need to start somewhere. Even if trivial/isolated cases work, if they work well, they can act as the first milestone to usage of this API for commercial benefits and I am willing to take the risk ;)
Updated by Eregon (Benoit Daloze) almost 2 years ago
duerst (Martin Dürst) wrote in #note-12:
But don't other Rubies rely on the programmer to know how to program with threads? That's only usable if you're used to programming with threads and avoid the related issues. The idea (where the implementation and many gems may still have to catch up) behind Ractor is that thread-related issues such as data races can be avoided at the level of the programming model.
We're getting a bit off-topic, but I believe not necessarily. And the GIL doesn't prevent most Ruby-level threading issues, so in that respect it's almost the same on CRuby.
For example, I would think many Ruby on Rails devs don't know threading well, and they don't need to, even though webservers like Puma use threads.
Deep knowledge of multithreading is needed e.g. when creating concurrent data structures, but using them OTOH doesn't require much.
I would think for most programmers, using threads is much easier and more intuitive than having the big limitations of Ractor which prevent sharing any state, especially in an imperative and stateful language like Ruby where almost everything is mutable.
IMO Ractors are way more difficult to use than threads. They can also have some sorts of race conditions due to message order, so it's not that much safer either. And it's a lot less efficient for any communication between ractors vs threads (Ractor copy or move both need an object graph walk).
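To illustrate the copy semantics mentioned above, here is a small sketch showing that a mutable object handed to a Ractor is deep-copied, so changes made inside the Ractor are not visible to the sender. The ractor_result helper is hypothetical and only bridges Ractor#take and the Ractor#value method of newer Ruby releases.

```ruby
# Hypothetical helper: newer Ruby releases replaced Ractor#take with Ractor#value.
def ractor_result(ractor)
  ractor.respond_to?(:value) ? ractor.value : ractor.take
end

original = { "list" => [1, 2, 3] }

# `original` is mutable, so the new Ractor receives a deep copy of it.
worker = Ractor.new(original) do |copy|
  copy["list"] << 4
  copy
end
returned = ractor_result(worker)

puts "main Ractor still sees: #{original["list"].inspect}"
puts "worker returned:        #{returned["list"].inspect}"
```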
Updated by maciej.mensfeld (Maciej Mensfeld) over 1 year ago
I want to revisit our discussion about leveraging Ruby Ractors for parallel JSON parsing. It appears there hasn't been much activity on this thread for a long time.
I found it pertinent to mention that during the recent RubyKaigi conference, Koichi Sasada highlighted the need for real-life/commercial use cases to showcase Ractors' potential. To that end, I wanted to point out that I do have a practical, commercial scenario. Karafka handles parsing of thousands or more JSON payloads in parallel. Having Ractor support in such a context could substantially enhance performance, providing a tangible benefit to the end users.
Given this real-life use case, are there any updates or plans to continue work on allowing Ractors to operate faster in the scenario I presented? It would indeed be invaluable for many users working with Kafka in Ruby. While the end-user processing of data will still have to happen in a single Ractor, parsing seems like a great example where an immutable raw payload can be shipped to independent Ractors and frozen deserialized payloads can be shipped back.