Bug #20710
closed
Reducing Hash allocation introduces large performance degradation (probably related to VWA)
Description
I found a surprising performance degradation while developing RBS.
In short, I removed unnecessary Hash allocations in RBS, and that change made the execution time about 2x slower.
Variable Width Allocation (VWA) for Hash probably causes this degradation. I'd be happy if we could mitigate the impact by updating the memory management strategy.
Reproduce
You can reproduce this problem on a PR in pocke/rbs repository.
https://github.com/pocke/rbs/pull/2
This PR dedups empty Hash objects.
- git clone and checkout
- bundle install
- bundle exec rake compile (for the C extension)
- bundle exec ruby benchmark/benchmark_new_env.rb
The "before" commit is https://github.com/pocke/rbs/commit/2c356c060286429cfdb034f88a74a6f94420fd21.
The "after" commit is https://github.com/pocke/rbs/commit/bfb2c367c7d3b7f93720392252d3a3980d7bf335.
The benchmark results are the following:
# Before
$ bundle exec ruby benchmark/benchmark_new_env.rb
(snip)
new_env 6.426 (±15.6%) i/s - 64.000 in 10.125442s
new_rails_env 0.968 (± 0.0%) i/s - 10.000 in 10.355738s
# After
$ bundle exec ruby benchmark/benchmark_new_env.rb
(snip)
new_env 4.371 (±22.9%) i/s - 43.000 in 10.150192s
new_rails_env 0.360 (± 0.0%) i/s - 4.000 in 11.313158s
The IPS decreased by a factor of 1.47 for the new_env case (parsing a small RBS env) and by a factor of 2.69 for the new_rails_env case (parsing a large RBS env).
Investigation
GC.stat
GC.stat indicates that the number of minor GCs increases (minor_gc_count: 112 before, 242 after), and total GC time rises from 541 to 1551.
# In the RBS repository
require_relative './benchmark/utils'
tmpdir = prepare_collection!
new_rails_env(tmpdir)
pp GC.stat
# before
{:count=>126,
:time=>541,
:marking_time=>496,
:sweeping_time=>45,
:heap_allocated_pages=>702,
:heap_sorted_length=>984,
:heap_allocatable_pages=>282,
:heap_available_slots=>793270,
:heap_live_slots=>787407,
:heap_free_slots=>5863,
:heap_final_slots=>0,
:heap_marked_slots=>757744,
:heap_eden_pages=>702,
:heap_tomb_pages=>0,
:total_allocated_pages=>702,
:total_freed_pages=>0,
:total_allocated_objects=>2220605,
:total_freed_objects=>1433198,
:malloc_increase_bytes=>5872,
:malloc_increase_bytes_limit=>16777216,
:minor_gc_count=>112,
:major_gc_count=>14,
:compact_count=>0,
:read_barrier_faults=>0,
:total_moved_objects=>0,
:remembered_wb_unprotected_objects=>0,
:remembered_wb_unprotected_objects_limit=>4779,
:old_objects=>615704,
:old_objects_limit=>955872,
:oldmalloc_increase_bytes=>210912,
:oldmalloc_increase_bytes_limit=>16777216}
# after
{:count=>255,
:time=>1551,
:marking_time=>1496,
:sweeping_time=>55,
:heap_allocated_pages=>570,
:heap_sorted_length=>1038,
:heap_allocatable_pages=>468,
:heap_available_slots=>735520,
:heap_live_slots=>731712,
:heap_free_slots=>3808,
:heap_final_slots=>0,
:heap_marked_slots=>728727,
:heap_eden_pages=>570,
:heap_tomb_pages=>0,
:total_allocated_pages=>570,
:total_freed_pages=>0,
:total_allocated_objects=>2183278,
:total_freed_objects=>1451566,
:malloc_increase_bytes=>1200,
:malloc_increase_bytes_limit=>16777216,
:minor_gc_count=>242,
:major_gc_count=>13,
:compact_count=>0,
:read_barrier_faults=>0,
:total_moved_objects=>0,
:remembered_wb_unprotected_objects=>0,
:remembered_wb_unprotected_objects_limit=>5915,
:old_objects=>600594,
:old_objects_limit=>1183070,
:oldmalloc_increase_bytes=>8128,
:oldmalloc_increase_bytes_limit=>16777216}
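To compare only the counters that matter here, the GC.stat snapshot can be reduced to a delta around the workload. The following is a minimal sketch, not part of RBS: gc_stat_delta and GC_KEYS are hypothetical names, and the block allocating empty Hashes is only a stand-in for new_rails_env(tmpdir).

```ruby
# Hypothetical helper (not part of RBS): reports how much selected
# GC.stat counters changed while the given block ran.
GC_KEYS = %i[count minor_gc_count major_gc_count total_allocated_objects].freeze

def gc_stat_delta(keys = GC_KEYS)
  before = GC.stat.slice(*keys)
  yield
  after = GC.stat.slice(*keys)
  # Subtract the "before" snapshot from the "after" snapshot per key.
  keys.to_h { |key| [key, after[key] - before[key]] }
end

# Stand-in workload; in the RBS repository this would be new_rails_env(tmpdir).
delta = gc_stat_delta { Array.new(100_000) { {} } }
pp delta
```

Running this on the "before" and "after" commits should make the difference in minor_gc_count visible without reading the full GC.stat dump.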
Warming up Hashes
The following patch, which creates unnecessary Hash objects before the benchmark, improves the execution time.
diff --git a/benchmark/benchmark_new_env.rb b/benchmark/benchmark_new_env.rb
index 6dd2b73f..a8da61c6 100644
--- a/benchmark/benchmark_new_env.rb
+++ b/benchmark/benchmark_new_env.rb
@@ -4,6 +4,8 @@ require 'benchmark/ips'
tmpdir = prepare_collection!
+(0..30_000_000).map { {} }
+
Benchmark.ips do |x|
x.time = 10
The results are the following:
# Before
Calculating -------------------------------------
new_env 10.354 (± 9.7%) i/s - 103.000 in 10.013834s
new_rails_env 1.661 (± 0.0%) i/s - 17.000 in 10.282490s
# After
Calculating -------------------------------------
new_env 10.771 (± 9.3%) i/s - 107.000 in 10.010446s
new_rails_env 1.584 (± 0.0%) i/s - 16.000 in 10.178984s
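One way to see what the warm-up does to the heap is to watch heap_available_slots while a batch of empty Hashes is kept alive. This is a rough sketch under assumptions: warm_up_hashes is a hypothetical helper, and the count is far smaller than the 30_000_000 used in the patch so that it runs quickly.

```ruby
# Hypothetical helper: allocates `count` empty Hashes that are all live
# at the same time, and reports heap_available_slots before and after.
def warm_up_hashes(count)
  before = GC.stat(:heap_available_slots)
  hashes = Array.new(count) { {} } # every Hash stays referenced until we return
  after = GC.stat(:heap_available_slots)
  { before: before, after: after, allocated: hashes.size }
end

result = warm_up_hashes(200_000)
pp result
```

If the heap grows here, later allocations can reuse those slots, which would be consistent with the warm-up hiding the difference between the two commits.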
RUBY_GC_HEAP_FREE_SLOTS_MIN_RATIO
The RUBY_GC_HEAP_FREE_SLOTS_MIN_RATIO env var also mitigates the performance impact.
In this example, I set RUBY_GC_HEAP_FREE_SLOTS_MIN_RATIO=0.6 (default: 0.20).
# Before
Calculating -------------------------------------
new_env 10.271 (± 9.7%) i/s - 102.000 in 10.087191s
new_rails_env 1.529 (± 0.0%) i/s - 16.000 in 10.538043s
# After
$ env RUBY_GC_HEAP_FREE_SLOTS_MIN_RATIO=0.6 bundle exec ruby benchmark/benchmark_new_env.rb
Calculating -------------------------------------
new_env 11.003 (± 9.1%) i/s - 110.000 in 10.068428s
new_rails_env 1.347 (± 0.0%) i/s - 14.000 in 11.117665s
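The effect of the env var on GC frequency can also be checked outside the benchmark by running a small workload in a child Ruby process. This is only a sketch: minor_gcs_with_ratio and WORKLOAD are hypothetical, and the workload (keeping many empty Hashes alive) is an assumption that merely imitates Hash-heavy allocation, not the RBS benchmark itself.

```ruby
require 'rbconfig'
require 'open3'

# Assumed stand-in workload: keep many empty Hashes alive, then print
# the number of minor GCs that happened in the child process.
WORKLOAD = 'a = Array.new(200_000) { {} }; print GC.stat(:minor_gc_count)'

# Hypothetical helper: runs WORKLOAD in a child Ruby, optionally with
# RUBY_GC_HEAP_FREE_SLOTS_MIN_RATIO set, and returns its minor GC count.
def minor_gcs_with_ratio(ratio = nil)
  env = ratio ? { 'RUBY_GC_HEAP_FREE_SLOTS_MIN_RATIO' => ratio.to_s } : {}
  out, status = Open3.capture2(env, RbConfig.ruby, '-e', WORKLOAD)
  raise 'child Ruby failed' unless status.success?
  Integer(out)
end

default_count = minor_gcs_with_ratio
tuned_count = minor_gcs_with_ratio(0.6)
puts "default: #{default_count} minor GCs, ratio 0.6: #{tuned_count} minor GCs"
```

The env var has to be set before the interpreter starts, which is why the sketch spawns a child process instead of changing ENV in the running one.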
Additional Information
- I applied the same change to Array, but it does not cause this problem.
- I guess the cause is the difference in size pools. An empty Array uses 40 bytes, like an ordinary Ruby object, but an empty Hash uses 160 bytes.
- The size pool for 160-byte objects holds fewer objects than the one for 40-byte objects, so its performance is more sensitive to a reduction in allocations.
- I tried it on Ruby 3.2. This change does not degrade the execution time on Ruby 3.2.
- VWA for Hash was introduced in Ruby 3.3. https://github.com/ruby/ruby/blob/73c39a5f93d3ad4514a06158e2bb7622496372b9/doc/NEWS/NEWS-3.3.0.md#gc--memory-management
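The sizes mentioned above can be checked on a given Ruby with ObjectSpace.memsize_of, and the size pools themselves can be inspected with GC.stat_heap where it is available. The sketch below uses hypothetical helper names (empty_object_sizes, size_pool_summary); the reported numbers depend on the Ruby version and platform, so no particular output is assumed.

```ruby
require 'objspace'

# Reported memory size of an empty Array and an empty Hash on this Ruby.
def empty_object_sizes
  { array: ObjectSpace.memsize_of([]), hash: ObjectSpace.memsize_of({}) }
end

# Slot size and eden slot count per size pool. GC.stat_heap does not exist
# on older Rubies, in which case an empty Hash is returned.
def size_pool_summary
  return {} unless GC.respond_to?(:stat_heap)

  GC.stat_heap.transform_values do |stats|
    { slot_size: stats[:slot_size], eden_slots: stats[:heap_eden_slots] }
  end
end

sizes = empty_object_sizes
pools = size_pool_summary
pp sizes
pp pools
```

Comparing eden_slots between the pool that holds empty Hashes and the pool that holds empty Arrays, on both commits, would show whether the Hash pool really is the smaller one.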
Acknowledgement
@mame (Yusuke Endoh), @ko1 (Koichi Sasada), and @soutaro (Soutaro Matsumoto) helped with the investigation. I would like to thank them.