Project

General

Profile

Actions

Feature #17610

closed

[PATCH] Reduce RubyVM::InstructionSequence.load_from_binary allocations

Added by byroot (Jean Boussier) almost 4 years ago. Updated over 3 years ago.

Status:
Closed
Assignee:
-
Target version:
-
[ruby-core:102397]

Description

Pull Request: https://github.com/ruby/ruby/pull/4119

Context

While profiling our application allocations, I noticed load_from_binary would allocate a string for each method:

 305.68 kB    7642  "initialize"
              7454  bootsnap-1.5.1/lib/bootsnap/compile_cache/iseq.rb:19

This lead me to experiment with the following repro script:

# frozen_string_literal: true

require 'objspace'
ObjectSpace.trace_object_allocations_start
preload = [:some_func, :some_sym, :Foo, :extend, "Foo"]

class Foo;end
class Bar;end

binary = RubyVM::InstructionSequence.compile(<<~RUBY).to_binary
  class Foo
    def initialize
    end
  end

  class Bar
    def initialize
    end
  end
RUBY

4.times { GC.start }
GC.disable
gen = GC.count
RubyVM::InstructionSequence.load_from_binary(binary)
RubyVM::InstructionSequence.load_from_binary(binary)

puts ObjectSpace.dump_all(output: :string, since: gen).lines.grep(/"type":"STRING"/)

On 3.0.0p0 it allocates 12 strings:

{"address":"0x7f90e7027a40", "type":"STRING", "class":"0x7f90e78be870", "embedded":true, "bytesize":11, "value":"<class:Bar>", "file":"/tmp/load.rb", "line":27, "method":"load_from_binary", "generation":13, "memsize":40, "flags":{"wb_protected":true}}
{"address":"0x7f90e7027ab8", "type":"STRING", "class":"0x7f90e78be870", "embedded":true, "bytesize":3, "value":"Bar", "encoding":"US-ASCII", "file":"/tmp/load.rb", "line":27, "method":"load_from_binary", "generation":13, "memsize":40, "flags":{"wb_protected":true}}
{"address":"0x7f90e7027ae0", "type":"STRING", "class":"0x7f90e78be870", "embedded":true, "bytesize":11, "value":"<class:Foo>", "file":"/tmp/load.rb", "line":27, "method":"load_from_binary", "generation":13, "memsize":40, "flags":{"wb_protected":true}}
{"address":"0x7f90e7027b08", "type":"STRING", "class":"0x7f90e78be870", "embedded":true, "bytesize":10, "value":"<compiled>", "encoding":"US-ASCII", "file":"/tmp/load.rb", "line":27, "method":"load_from_binary", "generation":13, "memsize":40, "flags":{"wb_protected":true}}
{"address":"0x7f90e7027b58", "type":"STRING", "class":"0x7f90e78be870", "embedded":true, "bytesize":10, "value":"initialize", "encoding":"US-ASCII", "file":"/tmp/load.rb", "line":27, "method":"load_from_binary", "generation":13, "memsize":40, "flags":{"wb_protected":true}}
{"address":"0x7f90e7027ba8", "type":"STRING", "class":"0x7f90e78be870", "embedded":true, "bytesize":3, "value":"Foo", "encoding":"US-ASCII", "file":"/tmp/load.rb", "line":27, "method":"load_from_binary", "generation":13, "memsize":40, "flags":{"wb_protected":true}}
{"address":"0x7f90e7027cc0", "type":"STRING", "class":"0x7f90e78be870", "embedded":true, "bytesize":11, "value":"<class:Bar>", "file":"/tmp/load.rb", "line":26, "method":"load_from_binary", "generation":13, "memsize":40, "flags":{"wb_protected":true}}
{"address":"0x7f90e7027d38", "type":"STRING", "class":"0x7f90e78be870", "embedded":true, "bytesize":3, "value":"Bar", "encoding":"US-ASCII", "file":"/tmp/load.rb", "line":26, "method":"load_from_binary", "generation":13, "memsize":40, "flags":{"wb_protected":true}}
{"address":"0x7f90e7027d60", "type":"STRING", "class":"0x7f90e78be870", "embedded":true, "bytesize":11, "value":"<class:Foo>", "file":"/tmp/load.rb", "line":26, "method":"load_from_binary", "generation":13, "memsize":40, "flags":{"wb_protected":true}}
{"address":"0x7f90e7027d88", "type":"STRING", "class":"0x7f90e78be870", "embedded":true, "bytesize":10, "value":"<compiled>", "encoding":"US-ASCII", "file":"/tmp/load.rb", "line":26, "method":"load_from_binary", "generation":13, "memsize":40, "flags":{"wb_protected":true}}
{"address":"0x7f90e7027dd8", "type":"STRING", "class":"0x7f90e78be870", "embedded":true, "bytesize":10, "value":"initialize", "encoding":"US-ASCII", "file":"/tmp/load.rb", "line":26, "method":"load_from_binary", "generation":13, "memsize":40, "flags":{"wb_protected":true}}
{"address":"0x7f90e7027e28", "type":"STRING", "class":"0x7f90e78be870", "embedded":true, "bytesize":3, "value":"Foo", "encoding":"US-ASCII", "file":"/tmp/load.rb", "line":26, "method":"load_from_binary", "generation":13, "memsize":40, "flags":{"wb_protected":true}}

For a good part, these allocated strings are immediately passed to rb_str_intern to get a symbol, and since constant and method names are very likely to be referenced elsewhere in the application source, most of the time the symbol already exist and could be looked up without first allocating a String.

This patch

By using rb_intern3 and rb_enc_interned_str, we can lookup existing symbols and fstrings without allocating anything.

The same repro script with this patch only allocate a single string:

{"address":"0x7f88ec8dfde8", "type":"STRING", "class":"0x7f88ec8be800", "frozen":true, "embedded":true, "fstring":true, "bytesize":10, "value":"<compiled>", "encoding":"US-ASCII", "file":"/tmp/load.rb", "line":25, "method":"load_from_binary", "generation":4, "memsize":40, "flags":{"wb_protected":true}}

Performance

I added a small micro benchmark which show minor gains:

compare-ruby: ruby 3.1.0dev (2021-01-21T19:19:44Z master 32b7dcfb56) [x86_64-darwin19]
built-ruby: ruby 3.1.0dev (2021-01-25T09:58:02Z iseq-load-symbol 6b0e2c1580) [x86_64-darwin19]
# Iteration per second (i/s)

|               |compare-ruby|built-ruby|
|:--------------|-----------:|---------:|
|symbol         |    447.846k|  489.421k|
|               |           -|     1.09x|
|define_method  |    113.035k|  117.016k|
|               |           -|     1.04x|
|all            |     61.421k|   64.382k|
|               |           -|     1.05x|

In a more real world scenario, this patch reduce load_from_binary number of allocations by 65% ( and number of allocations by 7.7% during our entire application's boot process) (~1.7M allocations avoided):

Before:

allocated memory by location
-----------------------------------
 429.56 MB  bootsnap-1.5.1/lib/bootsnap/compile_cache/iseq.rb:19

allocated objects by location
-----------------------------------
  2748272  bootsnap-1.5.1/lib/bootsnap/compile_cache/iseq.rb:19

After:

allocated memory by location
-----------------------------------
 346.06 MB  bootsnap-1.5.1/lib/bootsnap/compile_cache/iseq.rb:19

allocated objects by location
-----------------------------------
    960451  bootsnap-1.5.1/lib/bootsnap/compile_cache/iseq.rb:19

And WALL profiling show much less time spent in load_from_binary:

3.0.0-p0:

==================================
  Mode: wall(1000)
  Samples: 57745 (13.29% miss rate)
  GC: 16369 (28.35%)
==================================
     TOTAL    (pct)     SAMPLES    (pct)     FRAME
      6500  (11.3%)        6497  (11.3%)     RubyVM::InstructionSequence.load_from_binary

3.0.0-p0 + this patch:

==================================
  Mode: wall(1000)
  Samples: 46137 (13.32% miss rate)
  GC: 14094 (30.55%)
==================================
     TOTAL    (pct)     SAMPLES    (pct)     FRAME
      3333   (7.2%)        3331   (7.2%)     RubyVM::InstructionSequence.load_from_binary

I tried measuring the same thing on popular open source applications like Redmine or Discourse, unfortunately I couldn't make them work on Ruby 3.0 yet.

Actions

Also available in: Atom PDF

Like0
Like0Like0