Misc #19693
closedData initialization is significantly slower than Struct
Description
Maybe there is potential to make it as fast as Struct?
require 'benchmark/ips'
S = Struct.new(:a, :b, :c, :d, :e, :f, :g, :h, :i, :j)
D = Data.define(:a, :b, :c, :d, :e, :f, :g, :h, :i, :j)
Benchmark.ips do |x|
x.report('Struct') { S.new(1, 2, 3, 4, 5, 6, 7, 8, 9, 10) }
x.report('Data') { D.new(1, 2, 3, 4, 5, 6, 7, 8, 9, 10) }
x.compare!
end; 1
# => [...]
# => Struct: 6916530.4 i/s
# => Data: 1507259.5 i/s - 4.59x slower
Updated by nobu (Nobuyoshi Nakada) 11 months ago
Data.new
creates a Hash and initializes via the Hash.
The following line results in similar performance:
x.report('keywords') { S.new(a: 1, b: 2, c: 3, d: 4, e: 5, f: 6, g: 7, h: 8, i: 9, j: 10) }
Warming up --------------------------------------
Struct 273.727k i/100ms
keywords 75.597k i/100ms
Data 75.350k i/100ms
Calculating -------------------------------------
Struct 5.234M (± 0.7%) i/s - 26.278M in 5.020698s
keywords 867.555k (± 4.9%) i/s - 4.385M in 5.067422s
Data 862.600k (± 5.1%) i/s - 4.370M in 5.082609s
Comparison:
Struct: 5234123.3 i/s
keywords: 867554.8 i/s - 6.03x slower
Data: 862599.6 i/s - 6.07x slower
Updated by byroot (Jean Boussier) 11 months ago
- Status changed from Open to Closed
Should we close? I don't see anything actionable here.
Updated by janosch-x (Janosch Müller) 11 months ago
@byroot (Jean Boussier) To me it looks as if the handling of positional args in rb_data_s_new
/ rb_data_initialize_m
could perhaps be patched so as not to create an intermediate hash and thus greatly increase performance for this case. I might be missing a gotcha and am unsure whether its worth it, though.
Updated by byroot (Jean Boussier) 11 months ago
@janosch-x (Janosch Müller) based on the discussion in #19278, I don't think it's possible as Data
specifically designed so that initialize
always receive keyword arguments.
So given the spec (which we can't change) I don't see how it could be made performant, but maybe someone has an idea?
Updated by Eregon (Benoit Daloze) 11 months ago
Defining a new
singleton method on the Data subclass, in Ruby and with explicit keyword arguments should be able to then use the literal kwargs optimization, but that means using eval
or so and then needing to validate all argument names are valid local variable names (might already be the case, I don't know). Or a way to define a method in C with the same kwargs optimization, but that is currently not possible AFAIK.
A more general optimization of keyword arguments could also work for this, like https://github.com/oracle/truffleruby/issues/2388 is an example of that, but it's not fully implemented yet.
Another way is splitting new
at every call site (for Data subclasses at least) + escape analysis + a Hash representation that can take e.g. 10 pairs and still escape analyze fine, then it should be the same as positional arguments if everything inlines (should be, it's not big methods). I'll try this on TruffleRuby when Data is implemented there.
Updated by nobu (Nobuyoshi Nakada) 11 months ago
This is a kind of cheat, but...
https://github.com/nobu/ruby/tree/data-optimize
compare-ruby: ruby 3.3.0dev (2023-05-26T17:07:47Z master c6e4337a99) [x86_64-darwin22]
built-ruby: ruby 3.3.0dev (2023-05-27T10:07:26Z data-optimize 81baba152b) [x86_64-darwin22]
last_commit=[Misc #19693] Optimize Data.new
warming up....
Iteration per second (i/s)¶
compare-ruby | built-ruby | |
---|---|---|
Struct-list | 7.194M | 7.169M |
1.00x | - | |
Struct-kwd | 933.437k | 925.462k |
1.01x | - | |
Data-list | 921.171k | 9.232M |
- | 10.02x | |
Data-kwd | 944.031k | 946.684k |
- | 1.00x |
Updated by byroot (Jean Boussier) 11 months ago
This is a kind of cheat
Hum, if I'm reading this correctly, you check is initialize
was redefined, and if it wasn't you bypass the initializer entirely?
I suppose it wouldn't hurt.
Updated by janosch-x (Janosch Müller) 11 months ago
I suppose it wouldn't hurt.
I think its beautiful 😁
Another option could be to bypass the initializer by default and only patch it into a Data subclass when initialize
is redefined (as detected via method_added
or so).
Defining a
new
singleton method on the Data subclass, in Ruby and with explicit keyword arguments should be able to then use the literal kwargs optimization
@Eregon (Benoit Daloze) Wouldn't that require exposing some kind of alternate, non-freezing initializer to Ruby, or some setter methods? This seems to go a bit against the immutable spirit of the class. Also, wouldn't defining new
in Ruby slow down initialization with positional/list arguments? Maybe the solution would rather be to auto-define a keyword_initialize
method or so in Ruby and let new
, still in C, delegate to that method when given kwargs?
BTW using literal kwargs in Ruby also makes struct initialization with keywords 3 times faster:
require 'benchmark/ips'
S1 = Struct.new(:a, :b, :c, :d, :e, :f, :g, :h, :i, :j)
S2 = Struct.new(:a, :b, :c, :d, :e, :f, :g, :h, :i, :j)
class << S2
alias orig_new new
def new(a: nil, b: nil, c: nil, d: nil, e: nil, f: nil, g: nil, h: nil, i: nil, j: nil)
orig_new(a, b, c, d, e, f, g, h, i, j)
end
end
Benchmark.ips do |x|
x.report('ckw') { S1.new(a: 1, b: 2, c: 3, d: 4, e: 5, f: 6, g: 7, h: 8, i: 9, j: 10) }
x.report('rubykw') { S2.new(a: 1, b: 2, c: 3, d: 4, e: 5, f: 6, g: 7, h: 8, i: 9, j: 10) }
x.compare!
end; 1
# rubykw: 4510892.9 i/s
# ckw: 1377423.3 i/s - 3.27x slower
Updated by Eregon (Benoit Daloze) 11 months ago
janosch-x (Janosch Müller) wrote in #note-8:
BTW using literal kwargs in Ruby also makes struct initialization with keywords 3 times faster:
Exactly, passing literal kwargs from Ruby to Ruby is optimized, passing kwargs from/to C is not, it's slow and creates a Hash every time (currently on CRuby at least).
Updated by janosch-x (Janosch Müller) 11 months ago
@Eregon (Benoit Daloze) Just in case you are interested, I tried implementing your suggestion:
Defining a
new
singleton method on the Data subclass, in Ruby and with explicit keyword arguments should be able to then use the literal kwargs optimization
Unfortunately it doesn't seem to be possible in a way that is performant and spec-compliant.
Struct and Data support both list and keyword arguments in new
now, so an override in Ruby needs to accept both types of arguments, pick one type, and check it for completeness. I tried to get around checking by using the anonymous kwrest splat (**
) and passing that on to another generated method with explicit kwargs, but **
brings the performance back to the original level. I guess it also (pointlessly?) creates a Hash.
Updated by byroot (Jean Boussier) 11 months ago
Have you tried:
def initialize(_a = nil, _b = nil, a: _a, b: _b)
But that too I'm not sure is spec compliant, as it would allow to call Foo.new("a", a: "also_a")
Updated by janosch-x (Janosch Müller) 11 months ago
nice idea! i've tried it. it still doubles (instead of triples) kw init speed, at the cost of halving speed when initializing with list args:
orig list: 7136048.0 i/s
patch list: 3115468.7 i/s - 2.29x slower
patch kw: 2956280.5 i/s - 2.41x slower
orig kw: 1621000.3 i/s - 4.40x slower
I'm not sure is spec compliant, as it would allow to call Foo.new("a", a: "also_a")
i think this one test, for structs with keyword_init: true
, will fail: https://github.com/ruby/ruby/blame/4589056/test/ruby/test_struct.rb#L116
however, this gives me the idea that a simple Ruby keyword initializer (def self.new(a: nil, b: nil); list_new(a, b) end
) could be auto-defined, and speed tripled, if and only if keyword_init: true
is given. this should be spec compliant. structs with keyword_init: true
already raise an arity error when given list args.
then again, i assume the idea is to rather fade out the keyword_init
option, which doesn't even exist for Data
?