Feature #17610
closed[PATCH] Reduce RubyVM::InstructionSequence.load_from_binary allocations
Description
Pull Request: https://github.com/ruby/ruby/pull/4119
Context¶
While profiling our application allocations, I noticed load_from_binary
would allocate a string for each method:
305.68 kB 7642 "initialize"
7454 bootsnap-1.5.1/lib/bootsnap/compile_cache/iseq.rb:19
This lead me to experiment with the following repro script:
# frozen_string_literal: true
require 'objspace'
ObjectSpace.trace_object_allocations_start
preload = [:some_func, :some_sym, :Foo, :extend, "Foo"]
class Foo;end
class Bar;end
binary = RubyVM::InstructionSequence.compile(<<~RUBY).to_binary
class Foo
def initialize
end
end
class Bar
def initialize
end
end
RUBY
4.times { GC.start }
GC.disable
gen = GC.count
RubyVM::InstructionSequence.load_from_binary(binary)
RubyVM::InstructionSequence.load_from_binary(binary)
puts ObjectSpace.dump_all(output: :string, since: gen).lines.grep(/"type":"STRING"/)
On 3.0.0p0 it allocates 12 strings:
{"address":"0x7f90e7027a40", "type":"STRING", "class":"0x7f90e78be870", "embedded":true, "bytesize":11, "value":"<class:Bar>", "file":"/tmp/load.rb", "line":27, "method":"load_from_binary", "generation":13, "memsize":40, "flags":{"wb_protected":true}}
{"address":"0x7f90e7027ab8", "type":"STRING", "class":"0x7f90e78be870", "embedded":true, "bytesize":3, "value":"Bar", "encoding":"US-ASCII", "file":"/tmp/load.rb", "line":27, "method":"load_from_binary", "generation":13, "memsize":40, "flags":{"wb_protected":true}}
{"address":"0x7f90e7027ae0", "type":"STRING", "class":"0x7f90e78be870", "embedded":true, "bytesize":11, "value":"<class:Foo>", "file":"/tmp/load.rb", "line":27, "method":"load_from_binary", "generation":13, "memsize":40, "flags":{"wb_protected":true}}
{"address":"0x7f90e7027b08", "type":"STRING", "class":"0x7f90e78be870", "embedded":true, "bytesize":10, "value":"<compiled>", "encoding":"US-ASCII", "file":"/tmp/load.rb", "line":27, "method":"load_from_binary", "generation":13, "memsize":40, "flags":{"wb_protected":true}}
{"address":"0x7f90e7027b58", "type":"STRING", "class":"0x7f90e78be870", "embedded":true, "bytesize":10, "value":"initialize", "encoding":"US-ASCII", "file":"/tmp/load.rb", "line":27, "method":"load_from_binary", "generation":13, "memsize":40, "flags":{"wb_protected":true}}
{"address":"0x7f90e7027ba8", "type":"STRING", "class":"0x7f90e78be870", "embedded":true, "bytesize":3, "value":"Foo", "encoding":"US-ASCII", "file":"/tmp/load.rb", "line":27, "method":"load_from_binary", "generation":13, "memsize":40, "flags":{"wb_protected":true}}
{"address":"0x7f90e7027cc0", "type":"STRING", "class":"0x7f90e78be870", "embedded":true, "bytesize":11, "value":"<class:Bar>", "file":"/tmp/load.rb", "line":26, "method":"load_from_binary", "generation":13, "memsize":40, "flags":{"wb_protected":true}}
{"address":"0x7f90e7027d38", "type":"STRING", "class":"0x7f90e78be870", "embedded":true, "bytesize":3, "value":"Bar", "encoding":"US-ASCII", "file":"/tmp/load.rb", "line":26, "method":"load_from_binary", "generation":13, "memsize":40, "flags":{"wb_protected":true}}
{"address":"0x7f90e7027d60", "type":"STRING", "class":"0x7f90e78be870", "embedded":true, "bytesize":11, "value":"<class:Foo>", "file":"/tmp/load.rb", "line":26, "method":"load_from_binary", "generation":13, "memsize":40, "flags":{"wb_protected":true}}
{"address":"0x7f90e7027d88", "type":"STRING", "class":"0x7f90e78be870", "embedded":true, "bytesize":10, "value":"<compiled>", "encoding":"US-ASCII", "file":"/tmp/load.rb", "line":26, "method":"load_from_binary", "generation":13, "memsize":40, "flags":{"wb_protected":true}}
{"address":"0x7f90e7027dd8", "type":"STRING", "class":"0x7f90e78be870", "embedded":true, "bytesize":10, "value":"initialize", "encoding":"US-ASCII", "file":"/tmp/load.rb", "line":26, "method":"load_from_binary", "generation":13, "memsize":40, "flags":{"wb_protected":true}}
{"address":"0x7f90e7027e28", "type":"STRING", "class":"0x7f90e78be870", "embedded":true, "bytesize":3, "value":"Foo", "encoding":"US-ASCII", "file":"/tmp/load.rb", "line":26, "method":"load_from_binary", "generation":13, "memsize":40, "flags":{"wb_protected":true}}
For a good part, these allocated strings are immediately passed to rb_str_intern
to get a symbol, and since constant and method names are very likely to be referenced elsewhere in the application source, most of the time the symbol already exist and could be looked up without first allocating a String.
This patch¶
By using rb_intern3
and rb_enc_interned_str
, we can lookup existing symbols and fstrings without allocating anything.
The same repro script with this patch only allocate a single string:
{"address":"0x7f88ec8dfde8", "type":"STRING", "class":"0x7f88ec8be800", "frozen":true, "embedded":true, "fstring":true, "bytesize":10, "value":"<compiled>", "encoding":"US-ASCII", "file":"/tmp/load.rb", "line":25, "method":"load_from_binary", "generation":4, "memsize":40, "flags":{"wb_protected":true}}
Performance¶
I added a small micro benchmark which show minor gains:
compare-ruby: ruby 3.1.0dev (2021-01-21T19:19:44Z master 32b7dcfb56) [x86_64-darwin19]
built-ruby: ruby 3.1.0dev (2021-01-25T09:58:02Z iseq-load-symbol 6b0e2c1580) [x86_64-darwin19]
# Iteration per second (i/s)
| |compare-ruby|built-ruby|
|:--------------|-----------:|---------:|
|symbol | 447.846k| 489.421k|
| | -| 1.09x|
|define_method | 113.035k| 117.016k|
| | -| 1.04x|
|all | 61.421k| 64.382k|
| | -| 1.05x|
In a more real world scenario, this patch reduce load_from_binary
number of allocations by 65% ( and number of allocations by 7.7%
during our entire application's boot process) (~1.7M
allocations avoided):
Before:
allocated memory by location
-----------------------------------
429.56 MB bootsnap-1.5.1/lib/bootsnap/compile_cache/iseq.rb:19
allocated objects by location
-----------------------------------
2748272 bootsnap-1.5.1/lib/bootsnap/compile_cache/iseq.rb:19
After:
allocated memory by location
-----------------------------------
346.06 MB bootsnap-1.5.1/lib/bootsnap/compile_cache/iseq.rb:19
allocated objects by location
-----------------------------------
960451 bootsnap-1.5.1/lib/bootsnap/compile_cache/iseq.rb:19
And WALL profiling show much less time spent in load_from_binary
:
3.0.0-p0:
==================================
Mode: wall(1000)
Samples: 57745 (13.29% miss rate)
GC: 16369 (28.35%)
==================================
TOTAL (pct) SAMPLES (pct) FRAME
6500 (11.3%) 6497 (11.3%) RubyVM::InstructionSequence.load_from_binary
3.0.0-p0 + this patch:
==================================
Mode: wall(1000)
Samples: 46137 (13.32% miss rate)
GC: 14094 (30.55%)
==================================
TOTAL (pct) SAMPLES (pct) FRAME
3333 (7.2%) 3331 (7.2%) RubyVM::InstructionSequence.load_from_binary
I tried measuring the same thing on popular open source applications like Redmine or Discourse, unfortunately I couldn't make them work on Ruby 3.0 yet.