Bug #10511
closedfix for #9940 causes dramatic performance regression
Description
Originally reported against debase gem (https://github.com/denofevil/debase/issues/16)
After fix for #9940 running anything under debugger became about 4 times slower.
I've verified that we I revert the fix performance is restored.
To reproduce you could get https://github.com/railstutorial/sample_app_rails_4 and run rspec rake task under debugger (I've used RubyMine for this because debase is used as
debugger's backend there)
Performance with 2.1.2 it takes about 20 seconds on my laptop, with 2.1.3 - almost 80 seconds :(
The same problem is observable with 2.1.4 and 2.1.5 too.
Updated by ko1 (Koichi Sasada) about 10 years ago
Thank you for your report.
Could you make small benchmark program to debug it?
Updated by denofevil (Dennis Ushakov) about 10 years ago
I was not able to achieve 4x on a simple benchmark, but the difference is quite big:
require 'benchmark'
N = 10000
RECURSION = 1000
def recurse(i)
recurse(i - 1) if i > 0
end
TracePoint.new(:call, :return, :c_call, :c_return, :line){|tp|
tp.lineno
}.enable
puts Benchmark.measure { N.times { recurse(RECURSION) } }
Results are:
2.1.2 - 14.000000 0.070000 14.070000 ( 14.640882) 2.1.5 - 16.980000 0.030000 17.010000 ( 17.294063)
Results without tracepoint:
2.1.2 - 0.810000 0.000000 0.810000 ( 0.890800) 2.1.5 - 0.830000 0.000000 0.830000 ( 0.855288)
(almost no difference between versions)
Updated by os97673 (Oleg Sukhodolsky) about 10 years ago
I've tried the simple benchmark with 2.1.3 I've built myself and results are:
9.420000 0.010000 9.430000 ( 9.449823)
Than I've removed for #9940, rebuilt 2.1.3 and results are
8.200000 0.030000 8.230000 ( 8.265794)
Updated by ko1 (Koichi Sasada) about 10 years ago
ruby 2.2.0dev (2014-11-17 trunk 48465) [i386-mswin32_110]
21.466000 0.000000 21.466000 ( 21.572857)
ruby 2.0.0p402 (2014-02-11 revision 44911) [i386-mswin32_110]
13.634000 0.000000 13.634000 ( 13.737744)
I try on my laptop. hmm...
Updated by ko1 (Koichi Sasada) about 10 years ago
- Assignee set to ko1 (Koichi Sasada)
- Target version set to 2.2.0
Updated by ko1 (Koichi Sasada) about 10 years ago
- Status changed from Open to Closed
- % Done changed from 0 to 100
Applied in changeset r48596.
- vm_core.h: add rb_thread_t::local_storage_recursive_hash
to speed up Thread#[:recursive_key] access.
[Bug #10511] - thread.c (threadptr_local_aref): add fast path for
:recursive_data. - thread.c (threadptr_recursive_hash, threadptr_recursive_hash_set):
add special accessor for recursive hash. - cont.c: store/restore local_storage_recursive_hash.
- vm.c: init and mark local_storage_recursive_hash.
- vm_trace.c (rb_threadptr_exec_event_hooks_orig): clear and restore
local_storage_recursive_hash directly.
Updated by ko1 (Koichi Sasada) about 10 years ago
- Status changed from Closed to Feedback
could you try r48596?
Updated by os97673 (Oleg Sukhodolsky) about 10 years ago
Koichi Sasada wrote:
could you try r48596?
Here are results I've got.
$ ruby -v
ruby 2.2.0dev (2014-11-28 trunk 48629) [x86_64-darwin14]
$ ruby /tmp/bm.rb
6.660000 0.080000 6.740000 ( 7.153794)
$ ruby /tmp/bm.rb
6.710000 0.080000 6.790000 ( 7.119988)
$ rvm ruby-2.1.5 do ruby /tmp/bm.rb
10.230000 0.110000 10.340000 ( 10.906019)
$ rvm ruby-2.1.3 do ruby /tmp/bm.rb
10.130000 0.100000 10.230000 ( 10.585034)
$ rvm ruby-2.1.2 do ruby /tmp/bm.rb
8.210000 0.080000 8.290000 ( 8.507865)
Unfortunately I was unable to try the rails app I've used because sqlite can not be built with this version :(
Will ask others to try their tests.
Updated by ko1 (Koichi Sasada) about 10 years ago
Thank you for try. Can I close it?
Should we wait for your rails test?
Updated by os97673 (Oleg Sukhodolsky) about 10 years ago
Koichi Sasada wrote:
Thank you for try. Can I close it?
Should we wait for your rails test?
I think the ticket can be closed. Rails app still shows some performance degradation, but it is unclear if it is ruby or ruby-debug-ide.
I will investigate this and (in case it is ruby) file a new ticket linked to this one.
Updated by os97673 (Oleg Sukhodolsky) about 10 years ago
I think I've found one more source of performance degradation:
in debase we keep contexts for threads in hash and on every event get them from there. And it looks like "hash[Thread.current]" became significantly slower in TracePoint's hendler's context.
Here is a simple test which shows the problem:
require 'benchmark'
N = 10000
RECURSION = 1000
def recurse(i)
recurse(i - 1) if i > 0
end
theHash = Hash.new
TracePoint.new(:call, :return, :c_call, :c_return, :line){|tp|
tp.lineno
theHash[Thread.current]
}.enable
puts Benchmark.measure { N.times { recurse(RECURSION) } }
in my case it shows:
$ rvm ruby-2.1.2 do ruby /tmp/bm.rb
44.080000 0.330000 44.410000 ( 45.207792)
$ rvm ruby-head do ruby /tmp/bm.rb
72.170000 2.040000 74.210000 ( 76.259147)
Note: I've tried to replace "theHash[Thread.current]" with either "theHash[1]" or "Thread.current.hash" but such tests do not show any performance degradation, it looks like it is a combination of hash access and thread causes the problem.
Updated by normalperson (Eric Wong) about 10 years ago
os97673@gmail.com wrote:
I think I've found one more source of performance degradation:
I cannot reproduce the regression you see on Linux systems.
In fact, I see improvements:
x86_64-linux (Xeon E3-1230 v3):
2.1.5: 44.440000 0.000000 44.440000 ( 44.482789)
r48706: 32.410000 0.000000 32.410000 ( 32.437961)
i686-linux (Xen VM):
2.1.5: 80.300000 0.030000 80.330000 ( 80.337652)
r48706: 58.050000 0.000000 58.050000 ( 58.822390)
Updated by os97673 (Oleg Sukhodolsky) about 10 years ago
Eric Wong wrote:
os97673@gmail.com wrote:
I think I've found one more source of performance degradation:
I cannot reproduce the regression you see on Linux systems.
In fact, I see improvements:x86_64-linux (Xeon E3-1230 v3):
2.1.5: 44.440000 0.000000 44.440000 ( 44.482789)
r48706: 32.410000 0.000000 32.410000 ( 32.437961)i686-linux (Xen VM):
2.1.5: 80.300000 0.030000 80.330000 ( 80.337652)
r48706: 58.050000 0.000000 58.050000 ( 58.822390)
you need to compare performance of r48706 and 2.1.2 since the problem was introduced in 2.1.3.
r48706 is definitely better than 2.1.5, but 2.1.2 is still much faster.
Updated by normalperson (Eric Wong) about 10 years ago
Oops, OK. I was suspicious of [Feature #9325] hurting performance
(for cases where a hash function needs tuning).
Updated by os97673 (Oleg Sukhodolsky) about 10 years ago
Updated by normalperson (Eric Wong) about 10 years ago
Oops, [Feature #9425] (power-of-two hash sizes :)
Updated by os97673 (Oleg Sukhodolsky) about 10 years ago
Eric Wong wrote:
Oops, [Feature #9425] (power-of-two hash sizes :)
If the problem is in implementation of hash function then I'd expect that simple Thread.current.hash instead of theHash[Thread.current] would show a performance degradation, but (as far as I can see) it is not.
If the cause of the problem is in Hash implementation than I'd expect that theHash[1] would show performance degradation too, but it is not.
IMHO it is something more subtle and it looks like it is somehow relates to #9940.
Updated by ko1 (Koichi Sasada) about 10 years ago
I found strange behaviour.
Using Hash.new, it is slow as you say:
Hash.new
ruby 2.2.0dev (2014-12-02 trunk 48684) [i386-mswin32_110]
5.944000 0.000000 5.944000 ( 5.979824)
ruby 2.0.0p402 (2014-02-11 revision 44911) [i386-mswin32_110]
2.137000 0.000000 2.137000 ( 2.152773)
but using Hash literal ({}), it is not slow:
{}
ruby 2.2.0dev (2014-12-02 trunk 48684) [i386-mswin32_110]
1.357000 0.000000 1.357000 ( 1.352916)
ruby 2.0.0p402 (2014-02-11 revision 44911) [i386-mswin32_110]
1.607000 0.000000 1.607000 ( 1.606704)
What difference?
Updated by ko1 (Koichi Sasada) about 10 years ago
I understood.
First hash (Hash.new) contains empty st table.
Second hash ({}) does not have st table (NULL).
So the performance is different.
Making an entry for second hash like that:
theHash = {}
theHash[Thread.current] = nil
also shows slow down:
ruby 2.2.0dev (2014-12-02 trunk 48684) [i386-mswin32_110]
6.084000 0.000000 6.084000 ( 6.107212)
Updated by os97673 (Oleg Sukhodolsky) about 10 years ago
Perhaps there is some VM optimization? As for the test: I've just mimic the code we have in debase native code, I doubt we can use {} there :(
Updated by ko1 (Koichi Sasada) about 10 years ago
Ok, I understand the reason.
For each hash access, new Hash object are created each time.
You can check this behavior by that:
p GC.stat[:total_allocated_objects]
puts Benchmark.measure { N.times { recurse(RECURSION) } }
p GC.stat[:total_allocated_objects]
result:
53968
5.132000 0.000000 5.132000 ( 5.128507)
6062275
This is because clear recursive hash every time. Let me think about it. Maybe recycling such recursive hash can solve.
Updated by os97673 (Oleg Sukhodolsky) about 10 years ago
Koichi Sasada wrote:
Ok, I understand the reason.
For each hash access, new Hash object are created each time.
do you mean that it is your code who creates new hash every time?
Updated by ko1 (Koichi Sasada) about 10 years ago
- Status changed from Feedback to Closed
Applied in changeset r48765.
- vm_core.h: introduce new field
rb_thread_t::local_storage_recursive_hash_for_trace to store
recursive hash to avoid creating new recursive (nested) hashes
for each trace events.
[Bug #10511] - vm_trace.c (rb_threadptr_exec_event_hooks_orig): use it.
- cont.c: catch up this fix.
- vm.c (rb_thread_mark): ditto.
Updated by ko1 (Koichi Sasada) about 10 years ago
- Status changed from Closed to Feedback
could you try r48765?
Updated by os97673 (Oleg Sukhodolsky) about 10 years ago
Koichi Sasada wrote:
could you try r48765?
I've tried and it works ok for me (both simple benchmark and rails app + RubyMine debugger)
Asked our users to test too. Will let you know if they will find any problems.
BTW thank you for your work.
Updated by os97673 (Oleg Sukhodolsky) about 10 years ago
Oleg Sukhodolsky wrote:
Koichi Sasada wrote:
could you try r48765?
I've tried and it works ok for me (both simple benchmark and rails app + RubyMine debugger)
Asked our users to test too. Will let you know if they will find any problems.
users have confirmed that the problem is fixed.
Updated by ko1 (Koichi Sasada) about 10 years ago
- Status changed from Feedback to Closed
Thank you for confirmation!
Updated by nagachika (Tomoyuki Chikanaga) about 10 years ago
- Backport changed from 2.0.0: UNKNOWN, 2.1: UNKNOWN to 2.0.0: DONTNEED, 2.1: REQUIRED
see #10597