Bug #21201
openPerformance regression when defining methods inside `refine` blocks
Description
After the change introduced in PR #10037, a significant performance regression has been observed when defining methods within refine blocks.
The PR correctly fixes a bug where callcache
invalidation was missing when methods are defined inside a refinement. To fix this, it now invokes rb_clear_all_refinement_method_cache()
at the time of method definition within refine
. However, this function performs a full object space traversal via rb_objspace_each_objects
, making it extremely expensive.
In our team's Rails application, refinements are used heavily. During application code reload (triggered in development mode), rb_clear_all_refinement_method_cache()
is called 1425 times, 1142 of which originate via rb_clear_method_cache()
. As a result, the code reload time has increased from 5 seconds to 15 seconds after the patch. Since reloading occurs every time the code changes, this degrades development experience severely.
Reproduction script:¶
require "bundler/inline"
gemfile(true) do
source "https://rubygems.org"
gem "benchmark-ips"
end
mod = Module.new
klass = Class.new
Benchmark.ips do |x|
x.report(RUBY_VERSION) do
mod.send(:refine, klass) do
def call_1 = nil
def call_2 = nil
def call_3 = nil
end
end
x.save! "/tmp/performance_regression_refine.bench"
x.compare!
end
Benchmark results:¶
Comparison:
3.2.7: 1500316.7 i/s
3.3.7: 158.0 i/s - 9496.46x slower
3.5.0: 124.6 i/s - 12041.43x slower
3.4.2: 119.5 i/s - 12553.50x slower
Related Slack discussion: https://ruby-jp.slack.com/archives/CLWSHA76V/p1741932018811539
I totally understand the necessity and value of the past PR that fixed the callcache bugs. That said, I was wondering if there might be any ideas to make it faster, or to handle the invalidation more efficiently.
Updated by byroot (Jean Boussier) 4 days ago
I was wondering if there might be any ideas to make it faster, or to handle the invalidation more efficiently.
The solution is likely to keep the list (set) of all existing call caches that belong to a refinement, as to save the object space traversal.
But yes, this performance issue is why at work we disallowed usage of refinement using rubocop.
Updated by tenderlovemaking (Aaron Patterson) 4 days ago
byroot (Jean Boussier) wrote in #note-1:
I was wondering if there might be any ideas to make it faster, or to handle the invalidation more efficiently.
The solution is likely to keep the list (set) of all existing call caches that belong to a refinement, as to save the object space traversal.
But yes, this performance issue is why at work we disallowed usage of refinement using rubocop.
I guess if refined methods are rare enough, maybe it's not worthwhile to allocate "refinement inline cache"? In other words, refined methods are always an IC miss?
Updated by byroot (Jean Boussier) 4 days ago
refined methods are always an IC miss?
I like that idea.
Updated by byroot (Jean Boussier) 4 days ago
The downside however is that I think we'd need to lock the VM to lookup methods? So it doesn't help with making Ractor lock less.
Updated by jhawthorn (John Hawthorn) 4 days ago
ยท Edited
tenderlovemaking (Aaron Patterson) wrote in #note-2:
I guess if refined methods are rare enough, maybe it's not worthwhile to allocate "refinement inline cache"? In other words, refined methods are always an IC miss?
I believe that was the case before Ruby 3.3 https://github.com/ruby/ruby/commit/cfd7729ce7a31c8b6ec5dd0e99c67b2932de4732