Feature #19236
closed
Allow to create hashes with a specific capacity from Ruby
Added by byroot (Jean Boussier) almost 3 years ago. Updated over 1 year ago.
Description
Followup on [Feature #18683] which added a C-API for this purpose.
Various protocol parsers, such as Redis RESP3 or msgpack, have to create hashes, and they know the size in advance.
For efficiency, it would be preferable if they could directly allocate a Hash of the necessary size, so that large hashes wouldn't cause many re-allocations and re-hashes.
String and Array both already offer similar APIs:
String.new(capacity: XXX)
Array.new(XX) / rb_ary_new_capa(long)
However there's no such public API for Hashes in Ruby land.
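As a small illustration of the two existing APIs mentioned above: the sketch below only uses String.new(capacity:) and Array.new(size), and the variable names are made up for the example. Note that Array.new(size) fills the array with nil rather than only reserving space.

```ruby
# String: capacity is a keyword argument; the string itself starts empty.
buffer = String.new(capacity: 1024)
buffer << "hello"

# Array: the positional size argument allocates AND fills the array with nil,
# so it is not a pure capacity hint.
slots = Array.new(4)

puts buffer.bytesize # bytes actually stored, not the reserved capacity
puts slots.length
```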
Proposal
I think Hash should have a way to create a new hash with a capacity parameter.
The logical signature of Hash.new(capacity: 1000) was deemed too incompatible in [Feature #18683].
@Eregon (Benoit Daloze) proposed to add Hash.create(capacity: 1000).
Updated by byroot (Jean Boussier) almost 3 years ago
#1
- Related to Feature #18683: Allow to create hashes with a specific capacity. added
Updated by janosch-x (Janosch Müller) almost 3 years ago
#2
[ruby-core:111425]
maybe the genie is out of the bottle already, but it would be nice to have a uniform API for creating objects with a given capacity, e.g.
Array.with_capacity(100) # => []
Hash.with_capacity(100) # => {}
IO::Buffer.with_capacity(100) # => #<IO::Buffer>
String.with_capacity(100) # => ''
# more?
for Array and IO::Buffer, ::with_capacity would essentially be an alias for ::new. for String, the capacity kwarg could be deprecated to limit the number of APIs.
Updated by mame (Yusuke Endoh) almost 3 years ago
#3
[ruby-core:111919]
Discussed at the dev meeting.
@matz (Yukihiro Matsumoto) said that Hash.create(capacity: 4096) is acceptable (unless it conflicts with any major gems). However, several participants including @ko1 (Koichi Sasada) were a little cautious about introducing the new terminology "create" into Ruby core, and matz understood that.
@matsuda (Akira Matsuda) and @mame (Yusuke Endoh) prefer Hash.new(capacity: 4096). This is a bit incompatible, but we searched gem-codesearch with the query '\bHash\.new\(\w+: ' and found fewer than 20 results (manually excluding Foo::Bar::Hash.new(...), which is perhaps different from ::Hash). Moreover, some of the results seemed to misunderstand Hash.new(foo: 1) as { foo: 1 }. (Most of the examples are in RSpec code; maybe because let(:option) { { foo: 1 } } looks bad, people inadvertently rewrote it as let(:option) { Hash.new(foo: 1) }.)
Therefore, how about deprecating giving the keyword to Hash.new and then introducing Hash.new(capacity: 4096)? @matz (Yukihiro Matsumoto) said this is also acceptable if the incompatibility is not a big problem.
(Off-topic: Array.new(capacity: 4096) is not yet available; I wonder if people want Hash.new(capacity: 4096) more than Array?)
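The misunderstanding described above can be sketched as follows. To stay runnable on any Ruby version, the example passes the default value with explicit braces; on Ruby 3.2 and earlier the brace-less keyword form Hash.new(foo: 1) was treated the same way. Variable names are illustrative only.

```ruby
literal      = { foo: 1 }           # a hash with one entry
with_default = Hash.new({ foo: 1 }) # an EMPTY hash whose default value is { foo: 1 }

puts literal.size
puts with_default.size
puts with_default.key?(:foo)               # the default value is not an entry
puts with_default[:anything] == { foo: 1 } # any missing key returns the default
```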
Updated by byroot (Jean Boussier) almost 3 years ago
#4
[ruby-core:111931]
Well, Hash.new(capacity: 4096) was definitely my first pick, so this is great news IMO.
how about deprecating giving the keyword to Hash.new and then introducing Hash.new(capacity: 4096)?
What would be the timeline?
Deprecate in 3.3 and break in 3.4?
Off-topic: Array.new(capacity: 4096) is not yet available; I wonder if people want Hash.new(capacity: 4096) more than Array?
I think it's in part because Array.new(4096) while not exactly the same, already somewhat works. I'd be happy to add Array.new(capacity: 4096) though, but it has a similar backward compatibility concern doesn't it?
Updated by mame (Yusuke Endoh) almost 3 years ago
#5
[ruby-core:111983]
byroot (Jean Boussier) wrote in #note-4:
What would be the timeline?
Deprecate in 3.3 and break in 3.4?
That would be the fastest way.
Off-topic: Array.new(capacity: 4096) is not yet available; I wonder if people want Hash.new(capacity: 4096) more than Array?
I think it's in part because Array.new(4096) while not exactly the same, already somewhat works. I'd be happy to add Array.new(capacity: 4096) though, but it has a similar backward compatibility concern doesn't it?
Fortunately, it raises an error: Array.new(capacity: 4096) #=> no implicit conversion of Hash into Integer (TypeError). So I don't see a big problem with changing this. Anyway, I think we need a separate ticket if we introduce it.
Updated by Eregon (Benoit Daloze) almost 3 years ago
#6
[ruby-core:112008]
If we use Hash.new(capacity: 4096) to set the capacity, then Hash.new({ capacity: 4096 }) should keep the semantics of: { capacity: 4096 } is the default value of the new Hash.
I think it's a little bit hacky/unclear/source of confusion, but still I'm not against it.
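A minimal sketch of the distinction drawn here, assuming the capacity keyword is only available on Ruby 3.4 and later (the version check and the constant name are assumptions made for this example, so that it also runs on older Rubies):

```ruby
require "rubygems" # for Gem::Version; normally loaded already

# Assumption: Hash.new(capacity:) exists only on Ruby 3.4+.
KEYWORD_CAPACITY = Gem::Version.new(RUBY_VERSION) >= Gem::Version.new("3.4")

# Explicit braces: a positional argument, i.e. the default value.
positional = Hash.new({ capacity: 4096 })

# No braces: a keyword argument, i.e. a capacity hint (skipped on older Rubies).
keyword = KEYWORD_CAPACITY ? Hash.new(capacity: 4096) : nil

puts positional[:missing] == { capacity: 4096 }
puts keyword[:missing].nil? if keyword
```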
Updated by Dan0042 (Daniel DeLorme) over 2 years ago
#7
[ruby-core:113408]
Previously, a capacity reader/writer was suggested by @byroot (Jean Boussier) in #18683#note-2.
I would like to see this idea considered more seriously because
- It doesn't need to change anything to the initialize arguments of Array/Hash/String, which are already quite complex enough
- The same API can be used for any class; it's nicely consistent and easy to remember
- It's more versatile, as it can be used more than once after object creation, ex:
buffer = String.new #this example is with String, but the same could apply to Hash/Array
while line = gets
#increase buffer capacity by chunks of 10k
buffer.capacity += 10000 if buffer.capacity < buffer.bytesize + line.bytesize
buffer << line
end
buffer.capacity = 0 #trim buffer to minimal size (aka "right-size")
buffer.capacity == buffer.bytesize #=> true
Updated by ianks (Ian Ker-Seymer) over 2 years ago
#8
[ruby-core:113412]
I worry that new Rubyists might be confused with the Hash.new(capacity: n) semantics.
For example, Hash[capacity: 5] can look very similar to Hash.new(capacity: 5). It wouldn’t be unreasonable to assume they are the same thing… But you’d be in for an unexpected surprise.
To me Hash.with_capacity clearly communicates what’s happening. Anyone can understand it at first glance.
Updated by byroot (Jean Boussier) over 2 years ago
#9
[ruby-core:113413]
For example, Hash[capacity: 5] can look very similar to Hash.new(capacity: 5).
That seems like a very handwavy argument to me. I really don't see how the two could possibly be confused.
Updated by Dan0042 (Daniel DeLorme) over 2 years ago
#10
[ruby-core:113434]
ianks (Ian Ker-Seymer) wrote in #note-8:
To me Hash.with_capacity clearly communicates what’s happening. Anyone can understand it at first glance.
Hash.with_capacity is not composable. What should you do if you want a default value/proc AND a capacity?
h = Hash.with_capacity(100)
h.default = default_value #this?? a bit ugly imho
Hash#with_capacity would be better, then you could do Hash.new(default_value).with_capacity(400) similar to compare_by_identity usage.
But at that point it's imho better to have Hash.new(default_value).tap{ _1.capacity = 400 }
Or the best: Hash.new(default_value).tap{ .capacity = 400 } ;-)
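For comparison, the keyword form that this ticket eventually settled on composes with a default value or a default proc. The sketch below assumes Ruby 3.4+ for the capacity keyword and falls back to a plain Hash.new on older versions; the constant and variable names are invented for the example.

```ruby
require "rubygems" # for Gem::Version; normally loaded already

# Assumption: the capacity keyword only exists on Ruby 3.4 and later.
HAS_CAPACITY_KW = Gem::Version.new(RUBY_VERSION) >= Gem::Version.new("3.4")

# Default value plus capacity.
counts = HAS_CAPACITY_KW ? Hash.new(0, capacity: 400) : Hash.new(0)
%w[a b a c a].each { |word| counts[word] += 1 }

# Default proc plus capacity.
groups =
  if HAS_CAPACITY_KW
    Hash.new(capacity: 400) { |hash, key| hash[key] = [] }
  else
    Hash.new { |hash, key| hash[key] = [] }
  end
groups[:even] << 2
groups[:odd] << 1

p counts
p groups
```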
Updated by byroot (Jean Boussier) over 2 years ago
#11
[ruby-core:113528]
This was discussed in the last dev meeting. The conclusion was:
In 3.3 it throws error all keyword arguments to Hash.new. Then Ruby 3.4 allows that Hash.new will accept capacity keyword argument.
Updated by byroot (Jean Boussier) over 2 years ago
#12
[ruby-core:113598]
Correction:
In 3.3 it throws error all keyword arguments to Hash.new.
Was a misunderstanding.
What was actually agreed was a deprecation warning, I modified the pull request accordingly.
Updated by Anonymous over 2 years ago
#13
- Status changed from Open to Closed
Applied in changeset git|31ac8efca8ecb574e1e7b7c32cce54cb1b97f19a.
Hash.new: print a deprecation warning when receiving keyword arguments (#7828)
[Feature #19236]
In Ruby 3.3, Hash.new shall print a deprecation warning if keyword arguments
are passed instead of treating them as an implicit positional Hash.
This will allow to safely introduce a capacity keyword argument in 3.4
Co-authored-by: Jean Boussier <byroot@ruby-lang.org>
Updated by byroot (Jean Boussier) over 2 years ago
#14
[ruby-core:113600]
- Status changed from Closed to Open
- Target version deleted (3.3)
Reopening as the merged commit is the Ruby 3.3 part.
I'll implement the 3.4 next year.
Updated by byroot (Jean Boussier) over 1 year ago
#15
[ruby-core:117322]
Implemented Hash.new(capacity:) in https://github.com/ruby/ruby/pull/10357
Updated by shan (Shannon Skipper) over 1 year ago
#16
[ruby-core:118468]
I'm really looking forward to this feature being available via a Ruby interface. ❤️
Updated by byroot (Jean Boussier) over 1 year ago
#17
- Status changed from Open to Closed
Applied in changeset git|9594db0cf28d7bc10bfc46142239191a11f1dbbe.
Implement Hash.new(capacity:)
[Feature #19236]
When building a large hash, pre-allocating it with enough
capacity can save many re-hashes and significantly improve
performance.
/opt/rubies/3.3.0/bin/ruby --disable=gems -rrubygems -I./benchmark/lib ./benchmark/benchmark-driver/exe/benchmark-driver \
--executables="compare-ruby::../miniruby-master -I.ext/common --disable-gem" \
--executables="built-ruby::./miniruby --disable-gem" \
--output=markdown --output-compare -v $(find ./benchmark -maxdepth 1 -name 'hash_new' -o -name '*hash_new*.yml' -o -name '*hash_new*.rb' | sort)
compare-ruby: ruby 3.4.0dev (2024-03-25T11:48:11Z master f53209f023) +YJIT dev [arm64-darwin23]
last_commit=[ruby/irb] Cache RDoc::RI::Driver.new (https://github.com/ruby/irb/pull/911)
built-ruby: ruby 3.4.0dev (2024-03-25T15:29:40Z hash-new-rb 77652b08a2) +YJIT dev [arm64-darwin23]
warming up...
| |compare-ruby|built-ruby|
|:-------------------|-----------:|---------:|
|new | 7.614M| 5.976M|
| | 1.27x| -|
|new_with_capa_1k | 13.931k| 15.698k|
| | -| 1.13x|
|new_with_capa_100k | 124.746| 148.283|
| | -| 1.19x|
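A rough usage sketch for the parser scenario from the description, not taken from the changeset itself: build_map is a hypothetical helper that receives the announced number of pairs and a flat key/value array, and it assumes Hash.new(capacity:) is only available on Ruby 3.4+ (older versions fall back to an ordinary hash).

```ruby
require "rubygems" # for Gem::Version; normally loaded already

# Hypothetical helper: build a Hash from a flat [k1, v1, k2, v2, ...] array
# when the number of pairs is known before the entries are read.
def build_map(pair_count, flat)
  map =
    if Gem::Version.new(RUBY_VERSION) >= Gem::Version.new("3.4")
      Hash.new(capacity: pair_count) # pre-size to avoid repeated re-hashing
    else
      {} # assumption: no capacity hint available on older Rubies
    end
  flat.each_slice(2) { |key, value| map[key] = value }
  map
end

decoded = build_map(3, ["name", "ruby", "major", 3, "minor", 4])
p decoded
```

The capacity is only a sizing hint: the resulting hash behaves like any other and can still grow past the requested size.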