Bug #17373
closedRuby 3.0 is slower at Discourse bench than Ruby 2.7
Description
We have a continuous effort to keep https://rubybench.org/ up to date. Unfortunately we have lapsed a bit on updates so I decided to run Discourse bench by hand.
On my local system I am noticing the timing are much slower on Discourse bench than they were on Ruby 2.7:
98c9120cc00ba691b4abcc13a49a30fa54638535
---
categories:
50: 34
75: 35
90: 36
99: 40
home:
50: 47
75: 50
90: 56
99: 92
topic:
50: 53
75: 55
90: 58
99: 92
categories_admin:
50: 33
75: 34
90: 38
99: 68
home_admin:
50: 45
75: 47
90: 51
99: 95
topic_admin:
50: 51
75: 52
90: 54
99: 59
timings:
load_rails: 1449
ruby-version: 3.0.0-p-1
rss_kb: 293808
pss_kb: 284244
2.7.2
---
categories:
50: 29
75: 30
90: 33
99: 43
home:
50: 42
75: 43
90: 47
99: 73
topic:
50: 47
75: 47
90: 52
99: 54
categories_admin:
50: 30
75: 31
90: 36
99: 56
home_admin:
50: 41
75: 43
90: 48
99: 71
topic_admin:
50: 47
75: 48
90: 52
99: 55
timings:
load_rails: 1339
ruby-version: 2.7.2-p137
rss_kb: 296836
pss_kb: 287310
2.6.6
---
categories:
50: 30
75: 30
90: 31
99: 51
home:
50: 40
75: 42
90: 60
99: 79
topic:
50: 47
75: 48
90: 53
99: 71
categories_admin:
50: 30
75: 31
90: 35
99: 50
home_admin:
50: 41
75: 43
90: 62
99: 100
topic_admin:
50: 48
75: 49
90: 65
99: 93
timings:
load_rails: 1464
ruby-version: 2.6.6-p146
rss_kb: 332268
pss_kb: 322865
Concretely this means a typical "homepage" view is taking 47 milliseconds now, when in the past it was taking 42 or even 40 ms.
There may be some issues with the bench, you can see it here:
https://github.com/discourse/discourse/tree/ruby-3
The script to run the bench is ruby script/bench.rb
I re-tested against a cleanly built Ruby 2.7 direct from the Git branch to ensure compilation was not at fault.
This is pretty concerning, Discourse bench usually tracks Rails performance pretty accurately.
Files
Updated by k0kubun (Takashi Kokubun) about 4 years ago
Could you, or anybody reading this ticket, use the benchmark to find revisions which seem to have contributed to the performance decrease? Detecting which change was bad has been the most time-consuming part even while we have a good benchmark. If we know what revision should be fixed, at the very least we'd have an option to revert it.
I'm aware of the performance trend and I've actually spent some time for looking at graphs like https://rubybench.org/ruby/ruby/commits?result_type=Optcarrot%20Lan_Master.nes&display_count=2000, but when we benchmark a suspicious revision and its previous one found from the graph locally at the same time, results are often flipped and we've failed to find problematic commits that way.
Updated by tenderlovemaking (Aaron Patterson) about 4 years ago
- File aaron — perf _home_aaron_git_railsbench — ssh whiteclaw.local — 175×54 2020-12-07 12-41-12.png aaron — perf _home_aaron_git_railsbench — ssh whiteclaw.local — 175×54 2020-12-07 12-41-12.png added
- File aaron — perf _home_aaron_git_railsbench — ssh whiteclaw.local — 175×54 2020-12-07 12-41-58.png aaron — perf _home_aaron_git_railsbench — ssh whiteclaw.local — 175×54 2020-12-07 12-41-58.png added
- File aaron — perf _home_aaron_git_railsbench — ssh whiteclaw.local — 299×65 2020-12-07 13-19-20.png aaron — perf _home_aaron_git_railsbench — ssh whiteclaw.local — 299×65 2020-12-07 13-19-20.png added
Here is what I've found so far:
I've been testing with RailsBench. I ran this command:
RAILS_ENV=production INTERVAL=100 WARMUP=1 BENCHMARK=10000 perf record -F 99 --call-graph dwarf (which ruby) bin/bench
Then I used this command to find where time is spent:
perf report -g graph --no-children
Here is the report on Ruby 2.7:
Here is the report on Ruby 3.0 (master):
rb_id_table_lookup
is a new line in the Ruby 3.0 report. Looking at the callers, it looks like these are coming from the slow path method lookup:
b9007b6c54 touched these methods, so I ran the benchmarks on that commit:
aaron@whiteclaw ~/g/railsbench (ruby-3.0)> RAILS_ENV=production INTERVAL=100 WARMUP=1 BENCHMARK=10000 ruby -v bin/bench
ruby 2.8.0dev (2020-02-22T00:58:59Z pz-mvh-updated-mas.. b9007b6c54) [x86_64-linux]
/home/aaron/.gem/ruby/2.7.3/gems/mail-2.7.1/lib/mail/parsers/address_lists_parser.rb:32454: warning: statement not reached
/home/aaron/.gem/ruby/2.7.3/gems/mail-2.7.1/lib/mail/parsers/address_lists_parser.rb:32615: warning: statement not reached
/home/aaron/.gem/ruby/2.7.3/gems/mail-2.7.1/lib/mail/parsers/address_lists_parser.rb:32651: warning: statement not reached
/home/aaron/.gem/ruby/2.7.3/gems/mail-2.7.1/lib/mail/parsers/address_lists_parser.rb:32740: warning: statement not reached
/home/aaron/.gem/ruby/2.7.3/gems/mail-2.7.1/lib/mail/parsers/address_lists_parser.rb:32756: warning: statement not reached
/home/aaron/.gem/ruby/2.7.3/gems/mail-2.7.1/lib/mail/parsers/address_lists_parser.rb:32822: warning: statement not reached
/home/aaron/.gem/ruby/2.7.3/gems/mail-2.7.1/lib/mail/parsers/address_lists_parser.rb:32863: warning: statement not reached
/home/aaron/.gem/ruby/2.7.3/gems/mail-2.7.1/lib/mail/parsers/address_lists_parser.rb:32888: warning: statement not reached
/home/aaron/.gem/ruby/2.7.3/gems/mail-2.7.1/lib/mail/parsers/address_lists_parser.rb:31984: warning: assigned but unused variable - testEof
/home/aaron/.gem/ruby/2.7.3/gems/sassc-rails-2.1.2/lib/sassc/rails/functions.rb:7: warning: method redefined; discarding old asset_data_url
/home/aaron/.gem/ruby/2.7.3/gems/sprockets-4.0.1/lib/sprockets/sass_processor.rb:274: warning: previous definition of asset_data_url was here
/home/aaron/.gem/ruby/2.7.3/gems/sassc-rails-2.1.2/lib/sassc/rails/compressor.rb:7: warning: method redefined; discarding old initialize
/home/aaron/.gem/ruby/2.7.3/gems/sprockets-4.0.1/lib/sprockets/sass_compressor.rb:39: warning: previous definition of initialize was here
/home/aaron/.gem/ruby/2.7.3/gems/sassc-rails-2.1.2/lib/sassc/rails/compressor.rb:17: warning: method redefined; discarding old call
/home/aaron/.gem/ruby/2.7.3/gems/sprockets-4.0.1/lib/sprockets/sass_compressor.rb:49: warning: previous definition of call was here
Warming up...
Warmup: 1 requests
Benchmark: 10000 requests
Request per second: 993.5 [#/s] (mean)
Percentage of the requests served within a certain time (ms)
50% 0.98
66% 0.99
75% 0.99
80% 1.00
90% 1.01
95% 1.03
98% 1.60
99% 2.14
100% 10.14
Mean request per second is 993.5.
If I run with the previous commit f2286925f0 here are the results:
aaron@whiteclaw ~/g/railsbench (ruby-3.0)> RAILS_ENV=production INTERVAL=100 WARMUP=1 BENCHMARK=10000 ruby -v bin/bench
ruby 2.8.0dev (2020-02-22T00:58:59Z pz-mvh-updated-mas.. f2286925f0) [x86_64-linux]
/home/aaron/.gem/ruby/2.7.3/gems/mail-2.7.1/lib/mail/parsers/address_lists_parser.rb:32454: warning: statement not reached
/home/aaron/.gem/ruby/2.7.3/gems/mail-2.7.1/lib/mail/parsers/address_lists_parser.rb:32615: warning: statement not reached
/home/aaron/.gem/ruby/2.7.3/gems/mail-2.7.1/lib/mail/parsers/address_lists_parser.rb:32651: warning: statement not reached
/home/aaron/.gem/ruby/2.7.3/gems/mail-2.7.1/lib/mail/parsers/address_lists_parser.rb:32740: warning: statement not reached
/home/aaron/.gem/ruby/2.7.3/gems/mail-2.7.1/lib/mail/parsers/address_lists_parser.rb:32756: warning: statement not reached
/home/aaron/.gem/ruby/2.7.3/gems/mail-2.7.1/lib/mail/parsers/address_lists_parser.rb:32822: warning: statement not reached
/home/aaron/.gem/ruby/2.7.3/gems/mail-2.7.1/lib/mail/parsers/address_lists_parser.rb:32863: warning: statement not reached
/home/aaron/.gem/ruby/2.7.3/gems/mail-2.7.1/lib/mail/parsers/address_lists_parser.rb:32888: warning: statement not reached
/home/aaron/.gem/ruby/2.7.3/gems/mail-2.7.1/lib/mail/parsers/address_lists_parser.rb:31984: warning: assigned but unused variable - testEof
/home/aaron/.gem/ruby/2.7.3/gems/sassc-rails-2.1.2/lib/sassc/rails/functions.rb:7: warning: method redefined; discarding old asset_data_url
/home/aaron/.gem/ruby/2.7.3/gems/sprockets-4.0.1/lib/sprockets/sass_processor.rb:274: warning: previous definition of asset_data_url was here
/home/aaron/.gem/ruby/2.7.3/gems/sassc-rails-2.1.2/lib/sassc/rails/compressor.rb:7: warning: method redefined; discarding old initialize
/home/aaron/.gem/ruby/2.7.3/gems/sprockets-4.0.1/lib/sprockets/sass_compressor.rb:39: warning: previous definition of initialize was here
/home/aaron/.gem/ruby/2.7.3/gems/sassc-rails-2.1.2/lib/sassc/rails/compressor.rb:17: warning: method redefined; discarding old call
/home/aaron/.gem/ruby/2.7.3/gems/sprockets-4.0.1/lib/sprockets/sass_compressor.rb:49: warning: previous definition of call was here
Warming up...
Warmup: 1 requests
Benchmark: 10000 requests
Request per second: 1164.9 [#/s] (mean)
Percentage of the requests served within a certain time (ms)
50% 0.85
66% 0.85
75% 0.85
80% 0.86
90% 0.86
95% 0.87
98% 0.89
99% 1.76
100% 3.44
Mean RPS is 1164.
I think b9007b6c54 introduced the performance problem, but I'm not sure why. The caller lookup_method_table
didn't change, it just got moved from here to here, and rb_id_table_lookup
is exactly the same too.
My current guess is that we're getting cache misses more frequently for some reason. I will look in to that now, but I wanted to share what I've found so far.
Updated by tenderlovemaking (Aaron Patterson) about 4 years ago
I backported the mc_search
debug counter. On Ruby master, search_method
is called 2,480,052 times, but on f2286925f08406bc857f7b03ad6779a5d61443ae it's only called 501,903 times. I think this is the source of our performance degradation, and it seems related to cache misses.
Updated by Eregon (Benoit Daloze) about 4 years ago
- Related to Feature #16614: New method cache mechanism for Guild added
Updated by tenderlovemaking (Aaron Patterson) about 4 years ago
It looks like the global method cache was removed.
On f2286925f0 the debug counters looked like this:
[RUBY_DEBUG_COUNTER] mc_inline_hit 54,445,333
[RUBY_DEBUG_COUNTER] mc_inline_miss 1,800,036
[RUBY_DEBUG_COUNTER] mc_global_hit 14,489,677
[RUBY_DEBUG_COUNTER] mc_global_miss 790,656
mc_global_hit
is the number of times the global method cache was hit. I think for Rails, the problem is with respond_to. It looks like Object#respond_to?
used to use the global method cache:
Here is a benchmark that demonstrates the impact of removing the global method cache:
require "benchmark/ips"
class Base
def bar; end
end
class Foo
30.times do |i|
class_eval "module M#{i}; end; include M#{i}"
end
end
foo = Foo.new
Benchmark.ips do |x|
x.report("respond to") { foo.respond_to?(:bar) }
end
On master:
aaron@whiteclaw ~/git> ruby/ruby -v test.rb
ruby 3.0.0dev (2020-12-07T17:10:03Z master 146b3695c0) [x86_64-linux]
Warming up --------------------------------------
respond to 585.997k i/100ms
Calculating -------------------------------------
respond to 5.869M (± 0.9%) i/s - 29.886M in 5.092708s
On f2286925f0:
aaron@whiteclaw ~/git> ruby/ruby -v test.rb
ruby 2.8.0dev (2020-02-22T00:58:59Z before-imc f2286925f0) [x86_64-linux]
Warming up --------------------------------------
respond to 1.686M i/100ms
Calculating -------------------------------------
respond to 16.888M (± 0.6%) i/s - 85.971M in 5.090695s
f2286925f0 is about 3x faster than master.
It seems that the global method cache was important for Rails benchmarks. I guess we can't have a global method cache because of Ractors? Would it be possible to make a per-Ractor "global" cache?
Updated by k0kubun (Takashi Kokubun) about 4 years ago
https://github.com/ruby/ruby/pull/2583 was an attempt to reduce method cache misses on Discourse, and IIRC https://github.com/ruby/ruby/pull/2888 removed it. It might be related at least for some extent.
Updated by sam.saffron (Sam Saffron) about 4 years ago
I made this tiny script, which you can use to test (in the branch in the first post)
#!/usr/bin/env ruby
# frozen_string_literal: true
ENV['RAILS_ENV'] = 'profile'
require File.expand_path("../../config/environment", __FILE__)
def make_request
req = {
"REQUEST_METHOD" => "GET",
"SCRIPT_NAME" => "",
"PATH_INFO" => "/",
"QUERY_STRING" => "",
"SERVER_NAME" => "127.0.0.1",
"SERVER_PORT" => "80",
"REQUEST_PATH" => "/",
"REMOTE_ADDR" => "127.0.0.1",
"HTTP_VERSION" => "HTTP/1.0",
"rack.input" => StringIO.new(""),
"rack.version" => [1, 2],
"rack.url_scheme" => "http"
}
Rails.application.call(req)
end
10.times do
make_request
end
3.times do
GC.start(full_mark: true, immediate_sweep: true)
end
start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
RubyVM.reset_debug_counters
make_request
RubyVM.show_debug_counters
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
puts "Duration was #{elapsed}"
gets
Results for 2.7 are:
[RUBY_DEBUG_COUNTER] 417420 method call
[RUBY_DEBUG_COUNTER] mc_inline_hit 407,308
[RUBY_DEBUG_COUNTER] mc_inline_miss 17,502
[RUBY_DEBUG_COUNTER] mc_global_hit 86,554
[RUBY_DEBUG_COUNTER] mc_global_miss 207
[RUBY_DEBUG_COUNTER] mc_global_state_miss 2,762
[RUBY_DEBUG_COUNTER] mc_class_serial_miss 14,740
[RUBY_DEBUG_COUNTER] mc_cme_complement 10,191
[RUBY_DEBUG_COUNTER] mc_cme_complement_hit 10,191
[RUBY_DEBUG_COUNTER] mc_search_super 2,770
[RUBY_DEBUG_COUNTER] mc_miss_by_nome 0
[RUBY_DEBUG_COUNTER] mc_miss_by_distinct 10,348
[RUBY_DEBUG_COUNTER] mc_miss_by_refine 0
[RUBY_DEBUG_COUNTER] mc_miss_by_visi 218
[RUBY_DEBUG_COUNTER] mc_miss_spurious 6,936
[RUBY_DEBUG_COUNTER] mc_miss_reuse_call 2,245
[RUBY_DEBUG_COUNTER] ccf_general 16,704
[RUBY_DEBUG_COUNTER] ccf_iseq_setup 43,129
[RUBY_DEBUG_COUNTER] ccf_iseq_setup_0start 1,055
[RUBY_DEBUG_COUNTER] ccf_iseq_setup_tailcall_0start 0
[RUBY_DEBUG_COUNTER] ccf_iseq_fix 134,778
[RUBY_DEBUG_COUNTER] ccf_iseq_opt 14,039
[RUBY_DEBUG_COUNTER] ccf_iseq_kw1 124
[RUBY_DEBUG_COUNTER] ccf_iseq_kw2 195
[RUBY_DEBUG_COUNTER] ccf_cfunc 139,250
[RUBY_DEBUG_COUNTER] ccf_ivar 51,208
[RUBY_DEBUG_COUNTER] ccf_attrset 698
[RUBY_DEBUG_COUNTER] ccf_method_missing 524
[RUBY_DEBUG_COUNTER] ccf_zsuper 0
[RUBY_DEBUG_COUNTER] ccf_bmethod 18,238
[RUBY_DEBUG_COUNTER] ccf_opt_send 8,025
[RUBY_DEBUG_COUNTER] ccf_opt_call 857
[RUBY_DEBUG_COUNTER] ccf_opt_block_call 0
[RUBY_DEBUG_COUNTER] ccf_super_method 13,155
[RUBY_DEBUG_COUNTER] frame_push 392,112
[RUBY_DEBUG_COUNTER] frame_push_method 193,320
[RUBY_DEBUG_COUNTER] frame_push_block 35,436
[RUBY_DEBUG_COUNTER] frame_push_class 0
[RUBY_DEBUG_COUNTER] frame_push_top 0
[RUBY_DEBUG_COUNTER] frame_push_cfunc 159,590
[RUBY_DEBUG_COUNTER] frame_push_ifunc 3,764
[RUBY_DEBUG_COUNTER] frame_push_eval 0
[RUBY_DEBUG_COUNTER] frame_push_rescue 2
[RUBY_DEBUG_COUNTER] frame_push_dummy 0
[RUBY_DEBUG_COUNTER] frame_R2R 204,735
[RUBY_DEBUG_COUNTER] frame_R2C 148,739
[RUBY_DEBUG_COUNTER] frame_C2C 14,615
[RUBY_DEBUG_COUNTER] frame_C2R 24,023
[RUBY_DEBUG_COUNTER] ivar_get_ic_hit 105,173
[RUBY_DEBUG_COUNTER] ivar_get_ic_miss 20,240
[RUBY_DEBUG_COUNTER] ivar_get_ic_miss_serial 28,221
[RUBY_DEBUG_COUNTER] ivar_get_ic_miss_unset 11,963
[RUBY_DEBUG_COUNTER] ivar_get_ic_miss_noobject 0
[RUBY_DEBUG_COUNTER] ivar_set_ic_hit 10,802
[RUBY_DEBUG_COUNTER] ivar_set_ic_miss 13,473
[RUBY_DEBUG_COUNTER] ivar_set_ic_miss_serial 8,262
[RUBY_DEBUG_COUNTER] ivar_set_ic_miss_unset 4
[RUBY_DEBUG_COUNTER] ivar_set_ic_miss_oorange 3,924
[RUBY_DEBUG_COUNTER] ivar_set_ic_miss_noobject 1,283
[RUBY_DEBUG_COUNTER] ivar_get_base 21,572
[RUBY_DEBUG_COUNTER] ivar_set_base 13,494
[RUBY_DEBUG_COUNTER] lvar_get 355,050
[RUBY_DEBUG_COUNTER] lvar_get_dynamic 40,160
[RUBY_DEBUG_COUNTER] lvar_set 78,325
[RUBY_DEBUG_COUNTER] lvar_set_dynamic 1,252
[RUBY_DEBUG_COUNTER] lvar_set_slowpath 101
[RUBY_DEBUG_COUNTER] gc_count 0
[RUBY_DEBUG_COUNTER] gc_minor_newobj 0
[RUBY_DEBUG_COUNTER] gc_minor_malloc 0
[RUBY_DEBUG_COUNTER] gc_minor_method 0
[RUBY_DEBUG_COUNTER] gc_minor_capi 0
[RUBY_DEBUG_COUNTER] gc_minor_stress 0
[RUBY_DEBUG_COUNTER] gc_major_nofree 0
[RUBY_DEBUG_COUNTER] gc_major_oldgen 0
[RUBY_DEBUG_COUNTER] gc_major_shady 0
[RUBY_DEBUG_COUNTER] gc_major_force 0
[RUBY_DEBUG_COUNTER] gc_major_oldmalloc 0
[RUBY_DEBUG_COUNTER] gc_isptr_trial 0
[RUBY_DEBUG_COUNTER] gc_isptr_range 0
[RUBY_DEBUG_COUNTER] gc_isptr_align 0
[RUBY_DEBUG_COUNTER] gc_isptr_maybe 0
[RUBY_DEBUG_COUNTER] obj_newobj 58,178
[RUBY_DEBUG_COUNTER] obj_newobj_slowpath 1,517
[RUBY_DEBUG_COUNTER] obj_newobj_wb_unprotected 877
[RUBY_DEBUG_COUNTER] obj_free 0
[RUBY_DEBUG_COUNTER] obj_promote 0
[RUBY_DEBUG_COUNTER] obj_wb_unprotect 0
[RUBY_DEBUG_COUNTER] obj_obj_embed 0
[RUBY_DEBUG_COUNTER] obj_obj_transient 0
[RUBY_DEBUG_COUNTER] obj_obj_ptr 0
[RUBY_DEBUG_COUNTER] obj_str_ptr 0
[RUBY_DEBUG_COUNTER] obj_str_embed 0
[RUBY_DEBUG_COUNTER] obj_str_shared 0
[RUBY_DEBUG_COUNTER] obj_str_nofree 0
[RUBY_DEBUG_COUNTER] obj_str_fstr 0
[RUBY_DEBUG_COUNTER] obj_ary_embed 0
[RUBY_DEBUG_COUNTER] obj_ary_transient 0
[RUBY_DEBUG_COUNTER] obj_ary_ptr 1
[RUBY_DEBUG_COUNTER] obj_ary_extracapa 0
[RUBY_DEBUG_COUNTER] obj_ary_shared_create 81
[RUBY_DEBUG_COUNTER] obj_ary_shared 0
[RUBY_DEBUG_COUNTER] obj_ary_shared_root_occupied 0
[RUBY_DEBUG_COUNTER] obj_hash_empty 0
[RUBY_DEBUG_COUNTER] obj_hash_1 0
[RUBY_DEBUG_COUNTER] obj_hash_2 0
[RUBY_DEBUG_COUNTER] obj_hash_3 0
[RUBY_DEBUG_COUNTER] obj_hash_4 0
[RUBY_DEBUG_COUNTER] obj_hash_5_8 0
[RUBY_DEBUG_COUNTER] obj_hash_g8 0
[RUBY_DEBUG_COUNTER] obj_hash_null 0
[RUBY_DEBUG_COUNTER] obj_hash_ar 0
[RUBY_DEBUG_COUNTER] obj_hash_st 0
[RUBY_DEBUG_COUNTER] obj_hash_transient 0
[RUBY_DEBUG_COUNTER] obj_hash_force_convert 0
[RUBY_DEBUG_COUNTER] obj_struct_embed 0
[RUBY_DEBUG_COUNTER] obj_struct_transient 0
[RUBY_DEBUG_COUNTER] obj_struct_ptr 0
[RUBY_DEBUG_COUNTER] obj_data_empty 0
[RUBY_DEBUG_COUNTER] obj_data_xfree 0
[RUBY_DEBUG_COUNTER] obj_data_imm_free 0
[RUBY_DEBUG_COUNTER] obj_data_zombie 0
[RUBY_DEBUG_COUNTER] obj_match_under4 0
[RUBY_DEBUG_COUNTER] obj_match_ge4 0
[RUBY_DEBUG_COUNTER] obj_match_ge8 0
[RUBY_DEBUG_COUNTER] obj_match_ptr 0
[RUBY_DEBUG_COUNTER] obj_iclass_ptr 0
[RUBY_DEBUG_COUNTER] obj_class_ptr 0
[RUBY_DEBUG_COUNTER] obj_module_ptr 0
[RUBY_DEBUG_COUNTER] obj_bignum_ptr 0
[RUBY_DEBUG_COUNTER] obj_bignum_embed 0
[RUBY_DEBUG_COUNTER] obj_float 0
[RUBY_DEBUG_COUNTER] obj_complex 0
[RUBY_DEBUG_COUNTER] obj_rational 0
[RUBY_DEBUG_COUNTER] obj_regexp_ptr 0
[RUBY_DEBUG_COUNTER] obj_file_ptr 0
[RUBY_DEBUG_COUNTER] obj_symbol 0
[RUBY_DEBUG_COUNTER] obj_imemo_ment 0
[RUBY_DEBUG_COUNTER] obj_imemo_iseq 0
[RUBY_DEBUG_COUNTER] obj_imemo_env 0
[RUBY_DEBUG_COUNTER] obj_imemo_tmpbuf 0
[RUBY_DEBUG_COUNTER] obj_imemo_ast 0
[RUBY_DEBUG_COUNTER] obj_imemo_cref 0
[RUBY_DEBUG_COUNTER] obj_imemo_svar 0
[RUBY_DEBUG_COUNTER] obj_imemo_throw_data 0
[RUBY_DEBUG_COUNTER] obj_imemo_ifunc 0
[RUBY_DEBUG_COUNTER] obj_imemo_memo 0
[RUBY_DEBUG_COUNTER] obj_imemo_parser_strterm 0
[RUBY_DEBUG_COUNTER] artable_hint_hit 23,024
[RUBY_DEBUG_COUNTER] artable_hint_miss 174
[RUBY_DEBUG_COUNTER] artable_hint_notfound 27,875
[RUBY_DEBUG_COUNTER] heap_xmalloc 7,642
[RUBY_DEBUG_COUNTER] heap_xrealloc 153
[RUBY_DEBUG_COUNTER] heap_xfree 539
[RUBY_DEBUG_COUNTER] theap_alloc 9,295
[RUBY_DEBUG_COUNTER] theap_alloc_fail 0
[RUBY_DEBUG_COUNTER] theap_evacuate 0
[RUBY_DEBUG_COUNTER] mjit_exec 0
[RUBY_DEBUG_COUNTER] mjit_exec_not_added 0
[RUBY_DEBUG_COUNTER] mjit_exec_not_added_add_iseq 0
[RUBY_DEBUG_COUNTER] mjit_exec_not_ready 0
[RUBY_DEBUG_COUNTER] mjit_exec_not_compiled 0
[RUBY_DEBUG_COUNTER] mjit_exec_call_func 0
[RUBY_DEBUG_COUNTER] mjit_frame_VM2VM 0
[RUBY_DEBUG_COUNTER] mjit_frame_VM2JT 0
[RUBY_DEBUG_COUNTER] mjit_frame_JT2JT 0
[RUBY_DEBUG_COUNTER] mjit_frame_JT2VM 0
[RUBY_DEBUG_COUNTER] mjit_cancel 0
[RUBY_DEBUG_COUNTER] mjit_cancel_ivar_inline 0
[RUBY_DEBUG_COUNTER] mjit_cancel_send_inline 0
[RUBY_DEBUG_COUNTER] mjit_cancel_opt_insn 0
[RUBY_DEBUG_COUNTER] mjit_cancel_invalidate_all 0
[RUBY_DEBUG_COUNTER] mjit_length_unit_queue 0
[RUBY_DEBUG_COUNTER] mjit_length_active_units 0
[RUBY_DEBUG_COUNTER] mjit_length_compact_units 0
[RUBY_DEBUG_COUNTER] mjit_length_stale_units 0
[RUBY_DEBUG_COUNTER] mjit_compile_failures 0
Results for 3.0 are:
[RUBY_DEBUG_COUNTER] 417140 show_debug_counters
[RUBY_DEBUG_COUNTER] mc_inline_hit 383,937
[RUBY_DEBUG_COUNTER] mc_inline_miss_klass 37,470
[RUBY_DEBUG_COUNTER] mc_inline_miss_invalidated 0
[RUBY_DEBUG_COUNTER] mc_cme_complement 166
[RUBY_DEBUG_COUNTER] mc_cme_complement_hit 166
[RUBY_DEBUG_COUNTER] mc_search 7,248
[RUBY_DEBUG_COUNTER] mc_search_notfound 4,254
[RUBY_DEBUG_COUNTER] mc_search_super 107,104
[RUBY_DEBUG_COUNTER] obj_respond_to 613
[RUBY_DEBUG_COUNTER] ci_packed 13,043
[RUBY_DEBUG_COUNTER] ci_kw 0
[RUBY_DEBUG_COUNTER] ci_nokw 0
[RUBY_DEBUG_COUNTER] ci_runtime 13,043
[RUBY_DEBUG_COUNTER] cc_new 228
[RUBY_DEBUG_COUNTER] cc_temp 0
[RUBY_DEBUG_COUNTER] cc_found_ccs 36,718
[RUBY_DEBUG_COUNTER] cc_ent_invalidate 0
[RUBY_DEBUG_COUNTER] cc_cme_invalidate 0
[RUBY_DEBUG_COUNTER] cc_invalidate_leaf 0
[RUBY_DEBUG_COUNTER] cc_invalidate_leaf_ccs 0
[RUBY_DEBUG_COUNTER] cc_invalidate_leaf_callable 0
[RUBY_DEBUG_COUNTER] cc_invalidate_tree 0
[RUBY_DEBUG_COUNTER] cc_invalidate_tree_cme 0
[RUBY_DEBUG_COUNTER] cc_invalidate_tree_callable 0
[RUBY_DEBUG_COUNTER] ccs_free 0
[RUBY_DEBUG_COUNTER] ccs_maxlen 2
[RUBY_DEBUG_COUNTER] ccs_found 57,960
[RUBY_DEBUG_COUNTER] iseq_num 0
[RUBY_DEBUG_COUNTER] iseq_cd_num 0
[RUBY_DEBUG_COUNTER] ccf_general 11,291
[RUBY_DEBUG_COUNTER] ccf_iseq_setup 36,053
[RUBY_DEBUG_COUNTER] ccf_iseq_setup_0start 1,068
[RUBY_DEBUG_COUNTER] ccf_iseq_setup_tailcall_0start 0
[RUBY_DEBUG_COUNTER] ccf_iseq_fix 154,419
[RUBY_DEBUG_COUNTER] ccf_iseq_opt 16,076
[RUBY_DEBUG_COUNTER] ccf_iseq_kw1 124
[RUBY_DEBUG_COUNTER] ccf_iseq_kw2 231
[RUBY_DEBUG_COUNTER] ccf_cfunc 1,878
[RUBY_DEBUG_COUNTER] ccf_cfunc_with_frame 123,495
[RUBY_DEBUG_COUNTER] ccf_ivar 51,189
[RUBY_DEBUG_COUNTER] ccf_attrset 698
[RUBY_DEBUG_COUNTER] ccf_method_missing 524
[RUBY_DEBUG_COUNTER] ccf_zsuper 0
[RUBY_DEBUG_COUNTER] ccf_bmethod 18,238
[RUBY_DEBUG_COUNTER] ccf_opt_send 8,025
[RUBY_DEBUG_COUNTER] ccf_opt_call 857
[RUBY_DEBUG_COUNTER] ccf_opt_block_call 0
[RUBY_DEBUG_COUNTER] ccf_super_method 32
[RUBY_DEBUG_COUNTER] frame_push 390,819
[RUBY_DEBUG_COUNTER] frame_push_method 207,971
[RUBY_DEBUG_COUNTER] frame_push_block 35,437
[RUBY_DEBUG_COUNTER] frame_push_class 0
[RUBY_DEBUG_COUNTER] frame_push_top 0
[RUBY_DEBUG_COUNTER] frame_push_cfunc 143,645
[RUBY_DEBUG_COUNTER] frame_push_ifunc 3,764
[RUBY_DEBUG_COUNTER] frame_push_eval 0
[RUBY_DEBUG_COUNTER] frame_push_rescue 2
[RUBY_DEBUG_COUNTER] frame_push_dummy 0
[RUBY_DEBUG_COUNTER] frame_R2R 219,554
[RUBY_DEBUG_COUNTER] frame_R2C 132,840
[RUBY_DEBUG_COUNTER] frame_C2C 14,569
[RUBY_DEBUG_COUNTER] frame_C2R 23,856
[RUBY_DEBUG_COUNTER] ivar_get_ic_hit 106,326
[RUBY_DEBUG_COUNTER] ivar_get_ic_miss 20,239
[RUBY_DEBUG_COUNTER] ivar_get_ic_miss_serial 27,800
[RUBY_DEBUG_COUNTER] ivar_get_ic_miss_unset 10,739
[RUBY_DEBUG_COUNTER] ivar_get_ic_miss_noobject 0
[RUBY_DEBUG_COUNTER] ivar_set_ic_hit 17,837
[RUBY_DEBUG_COUNTER] ivar_set_ic_miss 1,283
[RUBY_DEBUG_COUNTER] ivar_set_ic_miss_serial 5,148
[RUBY_DEBUG_COUNTER] ivar_set_ic_miss_unset 7
[RUBY_DEBUG_COUNTER] ivar_set_ic_miss_iv_hit 5,155
[RUBY_DEBUG_COUNTER] ivar_set_ic_miss_noobject 1,283
[RUBY_DEBUG_COUNTER] ivar_get_base 21,570
[RUBY_DEBUG_COUNTER] ivar_set_base 1,304
[RUBY_DEBUG_COUNTER] lvar_get 354,236
[RUBY_DEBUG_COUNTER] lvar_get_dynamic 40,161
[RUBY_DEBUG_COUNTER] lvar_set 78,173
[RUBY_DEBUG_COUNTER] lvar_set_dynamic 1,252
[RUBY_DEBUG_COUNTER] lvar_set_slowpath 101
[RUBY_DEBUG_COUNTER] gc_count 0
[RUBY_DEBUG_COUNTER] gc_minor_newobj 0
[RUBY_DEBUG_COUNTER] gc_minor_malloc 0
[RUBY_DEBUG_COUNTER] gc_minor_method 0
[RUBY_DEBUG_COUNTER] gc_minor_capi 0
[RUBY_DEBUG_COUNTER] gc_minor_stress 0
[RUBY_DEBUG_COUNTER] gc_major_nofree 0
[RUBY_DEBUG_COUNTER] gc_major_oldgen 0
[RUBY_DEBUG_COUNTER] gc_major_shady 0
[RUBY_DEBUG_COUNTER] gc_major_force 0
[RUBY_DEBUG_COUNTER] gc_major_oldmalloc 0
[RUBY_DEBUG_COUNTER] gc_isptr_trial 0
[RUBY_DEBUG_COUNTER] gc_isptr_range 0
[RUBY_DEBUG_COUNTER] gc_isptr_align 0
[RUBY_DEBUG_COUNTER] gc_isptr_maybe 0
[RUBY_DEBUG_COUNTER] obj_newobj 57,746
[RUBY_DEBUG_COUNTER] obj_newobj_slowpath 873
[RUBY_DEBUG_COUNTER] obj_newobj_wb_unprotected 733
[RUBY_DEBUG_COUNTER] obj_free 0
[RUBY_DEBUG_COUNTER] obj_promote 0
[RUBY_DEBUG_COUNTER] obj_wb_unprotect 0
[RUBY_DEBUG_COUNTER] obj_obj_embed 0
[RUBY_DEBUG_COUNTER] obj_obj_transient 0
[RUBY_DEBUG_COUNTER] obj_obj_ptr 0
[RUBY_DEBUG_COUNTER] obj_str_ptr 0
[RUBY_DEBUG_COUNTER] obj_str_embed 0
[RUBY_DEBUG_COUNTER] obj_str_shared 0
[RUBY_DEBUG_COUNTER] obj_str_nofree 0
[RUBY_DEBUG_COUNTER] obj_str_fstr 0
[RUBY_DEBUG_COUNTER] obj_ary_embed 0
[RUBY_DEBUG_COUNTER] obj_ary_transient 0
[RUBY_DEBUG_COUNTER] obj_ary_ptr 1
[RUBY_DEBUG_COUNTER] obj_ary_extracapa 0
[RUBY_DEBUG_COUNTER] obj_ary_shared_create 81
[RUBY_DEBUG_COUNTER] obj_ary_shared 0
[RUBY_DEBUG_COUNTER] obj_ary_shared_root_occupied 0
[RUBY_DEBUG_COUNTER] obj_hash_empty 0
[RUBY_DEBUG_COUNTER] obj_hash_1 0
[RUBY_DEBUG_COUNTER] obj_hash_2 0
[RUBY_DEBUG_COUNTER] obj_hash_3 0
[RUBY_DEBUG_COUNTER] obj_hash_4 0
[RUBY_DEBUG_COUNTER] obj_hash_5_8 0
[RUBY_DEBUG_COUNTER] obj_hash_g8 0
[RUBY_DEBUG_COUNTER] obj_hash_null 0
[RUBY_DEBUG_COUNTER] obj_hash_ar 0
[RUBY_DEBUG_COUNTER] obj_hash_st 0
[RUBY_DEBUG_COUNTER] obj_hash_transient 0
[RUBY_DEBUG_COUNTER] obj_hash_force_convert 0
[RUBY_DEBUG_COUNTER] obj_struct_embed 0
[RUBY_DEBUG_COUNTER] obj_struct_transient 0
[RUBY_DEBUG_COUNTER] obj_struct_ptr 0
[RUBY_DEBUG_COUNTER] obj_data_empty 0
[RUBY_DEBUG_COUNTER] obj_data_xfree 0
[RUBY_DEBUG_COUNTER] obj_data_imm_free 0
[RUBY_DEBUG_COUNTER] obj_data_zombie 0
[RUBY_DEBUG_COUNTER] obj_match_under4 0
[RUBY_DEBUG_COUNTER] obj_match_ge4 0
[RUBY_DEBUG_COUNTER] obj_match_ge8 0
[RUBY_DEBUG_COUNTER] obj_match_ptr 0
[RUBY_DEBUG_COUNTER] obj_iclass_ptr 0
[RUBY_DEBUG_COUNTER] obj_class_ptr 0
[RUBY_DEBUG_COUNTER] obj_module_ptr 0
[RUBY_DEBUG_COUNTER] obj_bignum_ptr 0
[RUBY_DEBUG_COUNTER] obj_bignum_embed 0
[RUBY_DEBUG_COUNTER] obj_float 0
[RUBY_DEBUG_COUNTER] obj_complex 0
[RUBY_DEBUG_COUNTER] obj_rational 0
[RUBY_DEBUG_COUNTER] obj_regexp_ptr 0
[RUBY_DEBUG_COUNTER] obj_file_ptr 0
[RUBY_DEBUG_COUNTER] obj_symbol 0
[RUBY_DEBUG_COUNTER] obj_imemo_ment 0
[RUBY_DEBUG_COUNTER] obj_imemo_iseq 0
[RUBY_DEBUG_COUNTER] obj_imemo_env 0
[RUBY_DEBUG_COUNTER] obj_imemo_tmpbuf 0
[RUBY_DEBUG_COUNTER] obj_imemo_ast 0
[RUBY_DEBUG_COUNTER] obj_imemo_cref 0
[RUBY_DEBUG_COUNTER] obj_imemo_svar 0
[RUBY_DEBUG_COUNTER] obj_imemo_throw_data 0
[RUBY_DEBUG_COUNTER] obj_imemo_ifunc 0
[RUBY_DEBUG_COUNTER] obj_imemo_memo 0
[RUBY_DEBUG_COUNTER] obj_imemo_parser_strterm 0
[RUBY_DEBUG_COUNTER] obj_imemo_callinfo 0
[RUBY_DEBUG_COUNTER] obj_imemo_callcache 0
[RUBY_DEBUG_COUNTER] artable_hint_hit 22,575
[RUBY_DEBUG_COUNTER] artable_hint_miss 70
[RUBY_DEBUG_COUNTER] artable_hint_notfound 27,631
[RUBY_DEBUG_COUNTER] heap_xmalloc 6,942
[RUBY_DEBUG_COUNTER] heap_xrealloc 987
[RUBY_DEBUG_COUNTER] heap_xfree 423
[RUBY_DEBUG_COUNTER] theap_alloc 7,715
[RUBY_DEBUG_COUNTER] theap_alloc_fail 0
[RUBY_DEBUG_COUNTER] theap_evacuate 0
[RUBY_DEBUG_COUNTER] mjit_exec 0
[RUBY_DEBUG_COUNTER] mjit_exec_not_added 0
[RUBY_DEBUG_COUNTER] mjit_exec_not_ready 0
[RUBY_DEBUG_COUNTER] mjit_exec_not_compiled 0
[RUBY_DEBUG_COUNTER] mjit_exec_call_func 0
[RUBY_DEBUG_COUNTER] mjit_add_iseq_to_process 0
[RUBY_DEBUG_COUNTER] mjit_unload_units 0
[RUBY_DEBUG_COUNTER] mjit_frame_VM2VM 0
[RUBY_DEBUG_COUNTER] mjit_frame_VM2JT 0
[RUBY_DEBUG_COUNTER] mjit_frame_JT2JT 0
[RUBY_DEBUG_COUNTER] mjit_frame_JT2VM 0
[RUBY_DEBUG_COUNTER] mjit_cancel 0
[RUBY_DEBUG_COUNTER] mjit_cancel_ivar_inline 0
[RUBY_DEBUG_COUNTER] mjit_cancel_exivar_inline 0
[RUBY_DEBUG_COUNTER] mjit_cancel_send_inline 0
[RUBY_DEBUG_COUNTER] mjit_cancel_opt_insn 0
[RUBY_DEBUG_COUNTER] mjit_cancel_invalidate_all 0
[RUBY_DEBUG_COUNTER] mjit_cancel_leave 0
[RUBY_DEBUG_COUNTER] mjit_length_unit_queue 0
[RUBY_DEBUG_COUNTER] mjit_length_active_units 0
[RUBY_DEBUG_COUNTER] mjit_length_compact_units 0
[RUBY_DEBUG_COUNTER] mjit_length_stale_units 0
[RUBY_DEBUG_COUNTER] mjit_compile_failures 0
I patched Ruby to also count obj_respond_to calls.
This is our "profile mode" home page, it takes about 40ms or so and is very typical of Rails performance.
A single page will issue 613 obj_respond_to calls, I guess this is not properly cached.
You can run it by running Discourse bench on 2.7 (which will prep the DB) then execute the script.
Updated by marcandre (Marc-Andre Lafortune) about 4 years ago
- Related to Bug #17383: 3.0 recursion memory|speed issues added
Updated by tenderlovemaking (Aaron Patterson) about 4 years ago
@xrxr and I made a patch that implements a negative lookup cache. We also changed respond_to?
to use the per class method cache.
This PR has the "respond_to" patch: https://github.com/ruby/ruby/pull/3873
This PR has the negative cache in addition to the "respond_to" patch: https://github.com/ruby/ruby/pull/3872
Neither of these patches seemed to have a large impact on RailsBench. Maybe they will help Discourse, but I'm not sure.
I do want to talk about the problems with the negative cache patch though. First, I think some of the invalidation is broken (and that's why the CI is red), but that's not really the biggest issue.
The pCMC is effectively a weak reference cache. In our strategy, we just insert a NULL in the cache for that method name. The fact that there is a cache entry, but the entry is NULL allows us to avoid lookups for the negative case.
The problem though: If we cache negative values (in this case it caches a NULL
), how long should those negative values live? Right now, they get GC'd extremely quickly. For example, here is a perf
report before adding a negative cache:
You can see rb_id_table_lookup
is pretty high in the report. Now compare to a perf report after adding the negative cache:
Now the number 2 entry is cc_table_free_i
. Now we're spending time freeing entries in the pCMC because the negative cache entries aren't real objects. I guess the question is: how long should these negative entries live? They can't live forever.
Numbers for RailsBench were approximately the same with this patch presumably because of the extra GC time.
Anyway, I don't have good answers right now, but I wanted to share progress. Also, I'm sorry these screenshots look so big in the webui, I don't know how to make them smaller 😅
Updated by ioquatix (Samuel Williams) about 4 years ago
In my early benchmarks:
Async¶
ruby 2.7.1p83 (2020-03-31 revision a0c7c23c9c) [x86_64-darwin19]¶
Wrapper#wait_readable
114.385k (± 1.0%) i/s - 573.250k in 5.012051s
Reactor#register 137.206k (± 1.0%) i/s - 693.906k in 5.057946s
ruby 3.0.0p0 (2020-12-25 revision 95aff21468) [x86_64-darwin20]¶
Wrapper#wait_readable
110.108k (± 1.4%) i/s - 552.034k in 5.014519s
Reactor#register 131.360k (± 1.7%) i/s - 665.193k in 5.065527s
Async::HTTP¶
ruby 2.7.1p83 (2020-03-31 revision a0c7c23c9c) [x86_64-darwin19]¶
Running 2s test @ http://127.0.0.1:9294/
8 threads and 8 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 228.46us 86.09us 3.82ms 83.40%
Req/Sec 4.32k 1.07k 5.80k 50.60%
72205 requests in 2.10s, 3.44MB read
Requests/sec: 34380.26
Transfer/sec: 1.64MB
ruby 3.0.0p0 (2020-12-25 revision 95aff21468) [x86_64-darwin20]¶
Running 2s test @ http://127.0.0.1:9294/
8 threads and 8 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 252.91us 107.48us 1.15ms 69.00%
Req/Sec 3.90k 1.46k 5.87k 39.88%
65106 requests in 2.10s, 3.10MB read
Requests/sec: 31001.91
Transfer/sec: 1.48MB
Updated by sam.saffron (Sam Saffron) about 4 years ago
Good news, @ko1 (Koichi Sasada) and @tenderlove managed to recover a fair bit of performance leading to release.
3.0 remains slower in our bench than 2.7 but we did improve over the first post.
That said, the picture as of release is:
Ruby 2.6 -> Ruby 2.7 -> Ruby 3.0
categories page / median: 26 -> 27 -> 28 (slow down 1ms from 2.7 ... 2ms from 2.6)
home page: 50 -> 51 -> 53
topic page: 44 -> 46 -> 49
cat admin: 26 -> 27 -> 28
home admin: 50 -> 53 -> 53
topic admin: 44 -> 45 -> 48
Interestingly when I test this all from the same process some of this slowdown is no longer observed, so some of the slowdown could be due to IO changes or GC.
Updated by jeremyevans0 (Jeremy Evans) over 1 year ago
- Status changed from Open to Closed
It looks like the slowdown for refined methods was fixed by @ko1 (Koichi Sasada) in cfd7729ce7a31c8b6ec5dd0e99c67b2932de4732. Even without that, I'm guessing with yjit, Ruby 3.2 is now faster than Ruby 2.6/2.7. If that is not the case, please reopen with details.