Project

General

Profile

Actions

Bug #20710

closed

Reducing Hash allocation introduces large performance degradation (probably related to VWA)

Added by pocke (Masataka Kuwabara) 2 months ago. Updated about 2 months ago.

Status:
Closed
Assignee:
-
Target version:
-
ruby -v:
ruby 3.3.4 (2024-07-09 revision be1089c8ec) [arm64-darwin21]
[ruby-core:119000]

Description

I found a surprising performance degradation while developing RBS.
In short, I tried to remove unnecessary Hash allocations for RBS. Then, it made the execution time 2x slower.

VWA for Hash probably causes this degradation. I'd be happy if we could mitigate the impact by updating the memory management strategy.

Reproduce

You can reproduce this problem on a PR in pocke/rbs repository.
https://github.com/pocke/rbs/pull/2
This PR dedups empty Hash objects.

  1. git clone and checkout
  2. bundle install
  3. bundle exec rake compile for C-ext
  4. bundle ruby benchmark/benchmark_new_env.rb

The "before" commit is https://github.com/pocke/rbs/commit/2c356c060286429cfdb034f88a74a6f94420fd21.
The "after" commit is https://github.com/pocke/rbs/commit/bfb2c367c7d3b7f93720392252d3a3980d7bf335.

The benchmark results are the following:

# Before
$ bundle exec ruby benchmark/benchmark_new_env.rb
(snip)
             new_env      6.426 (±15.6%) i/s -     64.000 in  10.125442s
       new_rails_env      0.968 (± 0.0%) i/s -     10.000 in  10.355738s

# After
$ bundle exec ruby benchmark/benchmark_new_env.rb
(snip)
             new_env      4.371 (±22.9%) i/s -     43.000 in  10.150192s
       new_rails_env      0.360 (± 0.0%) i/s -      4.000 in  11.313158s

The IPS decreased 1.47x for new_env case (parsing small RBS env), and 2.69x for new_rails_env (parsing large RBS env).

Investigation

GC.stat

GC.stat indicates the number of minor GCs increases.

# In the RBS repository
require_relative './benchmark/utils'

tmpdir = prepare_collection!
new_rails_env(tmpdir)
pp GC.stat
# before
{:count=>126,
 :time=>541,
 :marking_time=>496,
 :sweeping_time=>45,
 :heap_allocated_pages=>702,
 :heap_sorted_length=>984,
 :heap_allocatable_pages=>282,
 :heap_available_slots=>793270,
 :heap_live_slots=>787407,
 :heap_free_slots=>5863,
 :heap_final_slots=>0,
 :heap_marked_slots=>757744,
 :heap_eden_pages=>702,
 :heap_tomb_pages=>0,
 :total_allocated_pages=>702,
 :total_freed_pages=>0,
 :total_allocated_objects=>2220605,
 :total_freed_objects=>1433198,
 :malloc_increase_bytes=>5872,
 :malloc_increase_bytes_limit=>16777216,
 :minor_gc_count=>112,
 :major_gc_count=>14,
 :compact_count=>0,
 :read_barrier_faults=>0,
 :total_moved_objects=>0,
 :remembered_wb_unprotected_objects=>0,
 :remembered_wb_unprotected_objects_limit=>4779,
 :old_objects=>615704,
 :old_objects_limit=>955872,
 :oldmalloc_increase_bytes=>210912,
 :oldmalloc_increase_bytes_limit=>16777216}

# after
{:count=>255,
 :time=>1551,
 :marking_time=>1496,
 :sweeping_time=>55,
 :heap_allocated_pages=>570,
 :heap_sorted_length=>1038,
 :heap_allocatable_pages=>468,
 :heap_available_slots=>735520,
 :heap_live_slots=>731712,
 :heap_free_slots=>3808,
 :heap_final_slots=>0,
 :heap_marked_slots=>728727,
 :heap_eden_pages=>570,
 :heap_tomb_pages=>0,
 :total_allocated_pages=>570,
 :total_freed_pages=>0,
 :total_allocated_objects=>2183278,
 :total_freed_objects=>1451566,
 :malloc_increase_bytes=>1200,
 :malloc_increase_bytes_limit=>16777216,
 :minor_gc_count=>242,
 :major_gc_count=>13,
 :compact_count=>0,
 :read_barrier_faults=>0,
 :total_moved_objects=>0,
 :remembered_wb_unprotected_objects=>0,
 :remembered_wb_unprotected_objects_limit=>5915,
 :old_objects=>600594,
 :old_objects_limit=>1183070,
 :oldmalloc_increase_bytes=>8128,
 :oldmalloc_increase_bytes_limit=>16777216}

Warming up Hashes

The following patch, which creates unnecessary Hash objects before the benchmark, improves the execution time.

diff --git a/benchmark/benchmark_new_env.rb b/benchmark/benchmark_new_env.rb
index 6dd2b73f..a8da61c6 100644
--- a/benchmark/benchmark_new_env.rb
+++ b/benchmark/benchmark_new_env.rb
@@ -4,6 +4,8 @@ require 'benchmark/ips'
 
 tmpdir = prepare_collection!
 
+(0..30_000_000).map { {} }
+
 Benchmark.ips do |x|
   x.time = 10

The results are the following:

# Before
Calculating -------------------------------------
             new_env     10.354 (± 9.7%) i/s -    103.000 in  10.013834s
       new_rails_env      1.661 (± 0.0%) i/s -     17.000 in  10.282490s

# After
Calculating -------------------------------------
             new_env     10.771 (± 9.3%) i/s -    107.000 in  10.010446s
       new_rails_env      1.584 (± 0.0%) i/s -     16.000 in  10.178984s

RUBY_GC_HEAP_FREE_SLOTS_MIN_RATIO

The RUBY_GC_HEAP_FREE_SLOTS_MIN_RATIO env var also mitigates the performance impact.
In this example, I set RUBY_GC_HEAP_FREE_SLOTS_MIN_RATIO=0.6 (default: 0.20)

# Before
Calculating -------------------------------------
             new_env     10.271 (± 9.7%) i/s -    102.000 in  10.087191s
       new_rails_env      1.529 (± 0.0%) i/s -     16.000 in  10.538043s

# After
$ env RUBY_GC_HEAP_FREE_SLOTS_MIN_RATIO=0.6 bundle exec ruby benchmark/benchmark_new_env.rb
Calculating -------------------------------------
             new_env     11.003 (± 9.1%) i/s -    110.000 in  10.068428s
       new_rails_env      1.347 (± 0.0%) i/s -     14.000 in  11.117665s

Additional Information

  • I applied the same change to Array. But it does not cause this problem.
    • I guess the cause is the difference of the Size Pool. An empty Array uses 40 bytes like the ordinal Ruby object, but an empty Hash uses 160 bytes.
    • The Size Pool for 160 bytes objects has fewer objects than the 40 bytes one. So, reducing allocation affects the performance sensitively.
  • I tried it on Ruby 3.2. This change on Ruby 3.2 does not degrade the execution time.

Acknowledgement

@mame (Yusuke Endoh), @ko1 (Koichi Sasada), and @soutaro (Soutaro Matsumoto) helped the investigation. I would like to thank them.

Actions

Also available in: Atom PDF

Like0
Like0Like0Like0Like0Like0Like0