Feature #21254
openInlining Class#new
Description
We would like to propose inlining YARV bytecode for speeding up object allocations, specifically inlining the Class#new
method. In order to support inlining this method, we would like to introduce a new YARV instruction opt_new
. This instruction will allocate an object if the default allocator is not overridden, otherwise it will jump to a “slow path” for calling a method.
Class#new
especially benefits from inlining for two reasons:
- Calling
initialize
directly means we don't need to allocate a temporary hash for keyword arguments - We are able to use an inline cache when calling the
initialize
method
The patch can be found here, but please find implementation details below.
Implementation Details¶
This patch modifies the compiler to emit special instructions when it sees a callsite that uses “new”. Before this patch, calling Object.new
would result in bytecode like this:
ruby --dump=insns -e'Object.new'
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,10)>
0000 opt_getconstant_path <ic:0 Object> ( 1)[Li]
0002 opt_send_without_block <calldata!mid:new, argc:0, ARGS_SIMPLE>
0004 leave
With this patch, the bytecode looks like this:
./ruby --dump=insns -e'Object.new'
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,10)>
0000 opt_getconstant_path <ic:0 Object> ( 1)[Li]
0002 putnil
0003 swap
0004 opt_new <calldata!mid:new, argc:0, ARGS_SIMPLE>, 11
0007 opt_send_without_block <calldata!mid:initialize, argc:0, FCALL|ARGS_SIMPLE>
0009 jump 14
0011 opt_send_without_block <calldata!mid:new, argc:0, ARGS_SIMPLE>
0013 swap
0014 pop
0015 leave
The new opt_new
instruction checks whether or not the new
implementation is the default “allocator” implementation. If it is the default allocator, then the instruction will allocate the object and call initialize passing parameters to initialize but not to new
. If the method is not the default allocator implementation, it will jump to the normal method dispatch instructions.
Performance Improvements
This patch improves performance of all allocations that use the normal “new” method for allocation. Here are two examples (all of these benchmarks compare Ruby 3.4.2 against Ruby master with inlining patch):
A simple Object.new
in a hot loop improves by about 24%:
hyperfine "ruby --disable-gems -e'i = 0; while i < 10_000_000; Object.new; i += 1; end'" "./ruby --disable-gems -e'i = 0; while i < 10_000_000; Object.new; i += 1; end'"
Benchmark 1: ruby --disable-gems -e'i = 0; while i < 10_000_000; Object.new; i += 1; end'
Time (mean ± σ): 436.6 ms ± 3.3 ms [User: 432.3 ms, System: 3.8 ms]
Range (min … max): 430.5 ms … 442.6 ms 10 runs
Benchmark 2: ./ruby --disable-gems -e'i = 0; while i < 10_000_000; Object.new; i += 1; end'
Time (mean ± σ): 351.1 ms ± 3.6 ms [User: 347.4 ms, System: 3.3 ms]
Range (min … max): 343.9 ms … 357.4 ms 10 runs
Summary
./ruby --disable-gems -e'i = 0; while i < 10_000_000; Object.new; i += 1; end' ran
1.24 ± 0.02 times faster than ruby --disable-gems -e'i = 0; while i < 10_000_000; Object.new; i += 1; end'
Using a single keyword argument is improved by about 72%:
> hyperfine "ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'" "./ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'"
Benchmark 1: ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'
Time (mean ± σ): 1.082 s ± 0.007 s [User: 1.074 s, System: 0.008 s]
Range (min … max): 1.071 s … 1.091 s 10 runs
Benchmark 2: ./ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'
Time (mean ± σ): 627.6 ms ± 4.8 ms [User: 622.6 ms, System: 4.5 ms]
Range (min … max): 622.1 ms … 637.2 ms 10 runs
Summary
./ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end' ran
1.72 ± 0.02 times faster than ruby --disable-gems -e'i = 0; while i < 10_000_000; Hash.new(capacity: 0); i += 1; end'
The performance increase depends on the number and type of parameters passed to initialize
. For example, an initialize
method that takes 3 parameters can see a speed improvement of ~3x:
aaron@tc-lan-adapter ~/g/ruby (inline-new)> cat test.rb
class Foo
def initialize a:, b:, c:
end
end
i = 0
while i < 10_000_000
Foo.new(a: 1, b: 2, c: 3)
Foo.new(a: 1, b: 2, c: 3)
Foo.new(a: 1, b: 2, c: 3)
i += 1
end
aaron@tc-lan-adapter ~/g/ruby (inline-new)> hyperfine "ruby --disable-gems test.rb" "./ruby --disable-gems test.rb"
Benchmark 1: ruby --disable-gems test.rb
Time (mean ± σ): 3.700 s ± 0.033 s [User: 3.681 s, System: 0.018 s]
Range (min … max): 3.636 s … 3.751 s 10 runs
Benchmark 2: ./ruby --disable-gems test.rb
Time (mean ± σ): 1.182 s ± 0.013 s [User: 1.173 s, System: 0.008 s]
Range (min … max): 1.165 s … 1.203 s 10 runs
Summary
./ruby --disable-gems test.rb ran
3.13 ± 0.04 times faster than ruby --disable-gems test.rb
One factor in the performance increase for keyword arguments is that inlining is able to eliminate the hash allocation when calling “through” the C implementation of Class#new
:
aaron@tc-lan-adapter ~/g/ruby (inline-new)> cat test.rb
class Foo
def initialize a:, b:, c:
end
end
def allocs
x = GC.stat(:total_allocated_objects)
yield
GC.stat(:total_allocated_objects) - x
end
def test; allocs { Foo.new(a: 1, b: 2, c: 3) }; end
test
p test
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ruby -v test.rb
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [arm64-darwin24]
2
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ./ruby -v test.rb
ruby 3.5.0dev (2025-04-03T13:03:19Z inline-new 567c54208c) +PRISM [arm64-darwin24]
1
Memory Increase¶
Of course this patch is not “free”. Inlining the method call adds extra YARV instructions. We estimate this patch increases new
call sites by about 122 bytes:
aaron@tc-lan-adapter ~/g/ruby (inline-new)> cat test.rb
require "objspace"
class Foo
def initialize
end
end
def test
Foo.new
end
puts ObjectSpace.memsize_of(RubyVM::InstructionSequence.of(method(:test)))
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ruby -v test.rb
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [arm64-darwin24]
544
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ./ruby -v test.rb
ruby 3.5.0dev (2025-04-03T13:03:19Z inline-new 567c54208c) +PRISM [arm64-darwin24]
656
We’ve tested this in Shopify’s monolith, comparing Ruby 3.4.2 and Ruby 3.5+inlining, and it seems to increase total ISEQ memesize by about 3.8mb (roughly 0.5% increase in ISEQ size):
irb(main):001> 737191972 - 733354388
=> 3837584
However, Ruby 3.5 has more overall ISEQ objects than Ruby 3.4.2:
aaron@Aarons-MacBook-Pro ~/Downloads> wc -l sizes-inline.txt
789545 sizes-inline.txt
aaron@Aarons-MacBook-Pro ~/Downloads> wc -l sizes-3.4.txt
789479 sizes-3.4.txt
We see total heap size as reported by memsize
to only increase by about 1MB:
irb(main):001> 3981075617 - 3979926505
=> 1149112
Changes to caller
This patch changes caller
reporting in the initialize
method:
aaron@tc-lan-adapter ~/g/ruby (inline-new)> cat test.rb
require "objspace"
class Foo
def initialize
puts caller
end
end
def test
Foo.new
end
test
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ruby -v test.rb
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [arm64-darwin24]
test.rb:10:in 'Class#new'
test.rb:10:in 'Object#test'
test.rb:13:in '<main>'
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ./ruby -v test.rb
ruby 3.5.0dev (2025-04-03T13:03:19Z inline-new 567c54208c) +PRISM [arm64-darwin24]
test.rb:10:in 'Object#test'
test.rb:13:in '<main>'
As you can see in the above output, the Class#new
frame is eliminated. I'm not sure if anyone really cares about this frame. We've tested this patch in Shopify's CI, and didn't find any code that depends on this callstack. However, this patch did require changes to ERB for emitting warnings.
That said, eliminating the frame also has the side-effect of making some of our allocation tracing tools a little more useful:
aaron@tc-lan-adapter ~/g/ruby (inline-new)> cat test.rb
require "objspace"
class Foo
def test
Object.new
end
end
ObjectSpace.trace_object_allocations do
obj = Foo.new.test
puts ObjectSpace.allocation_class_path(obj)
puts ObjectSpace.allocation_method_id(obj)
end
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ruby -v test.rb
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [arm64-darwin24]
Class
new
aaron@tc-lan-adapter ~/g/ruby (inline-new)> ./ruby -v test.rb
ruby 3.5.0dev (2025-04-07T19:40:59Z inline-new 2cf0efa18e) +PRISM [arm64-darwin24]
Foo
test
Before inlining, ObjectSpace
would report the allocation class path and method id as Class#new
which isn't very helpful. With the inlining patch, we can see that the object is allocated in Foo#test
.
Summary¶
I think the overall memory increase is modest, and the change to caller
is acceptable especially given the performance increase this patch provides.