Project

General

Profile

Actions

Bug #20608

closed

Hash#find always allocates each iterated pair

Added by ivoanjo (Ivo Anjo) 5 months ago. Updated 5 months ago.

Status:
Closed
Target version:
-
ruby -v:
ruby 3.4.0preview1 (2024-05-16 master 9d69619623) [x86_64-linux]
[ruby-core:118446]

Description

Hey there!

Recently I ran into this sharp edge in Hash#find:

puts RUBY_DESCRIPTION

def allocated_now = GC.stat(:total_allocated_objects)

dummy_hash = 10_000.times.each_with_index.to_h

before_allocs = allocated_now
dummy_hash.any? { |k, v| }
puts "Allocated #{allocated_now - before_allocs} for #{File.read(__FILE__).lines[__LINE__-2]}"

before_allocs = allocated_now
dummy_hash.each { |k, v| }
puts "Allocated #{allocated_now - before_allocs} for #{File.read(__FILE__).lines[__LINE__-2]}"

before_allocs = allocated_now
dummy_hash.find { |k, v| }
puts "Allocated #{allocated_now - before_allocs} for #{File.read(__FILE__).lines[__LINE__-2]}"

before_allocs = allocated_now
dummy_hash.any? { |k| }
puts "Allocated #{allocated_now - before_allocs} for #{File.read(__FILE__).lines[__LINE__-2]}"

Result:

ruby 3.4.0preview1 (2024-05-16 master 9d69619623) [x86_64-linux]
Allocated 0 for dummy_hash.any? { |k, v| }
Allocated 0 for dummy_hash.each { |k, v| }
Allocated 10002 for dummy_hash.find { |k, v| }
Allocated 10000 for dummy_hash.any? { |k| }

That is, while Hash#any?, Hash#each, etc avoid doing any allocations during iteration, Hash#find does not hit the rb_block_pair_yield_optimizable => each_pair_i_fast fast path, and so is massively costly compared to the others.

This is very surprising, as I'd expect a find to have a comparable cost to any? (and I ended up redoing some code to avoid find).

Also while experimenting a bit, it was surprising to me that the allocation optimization only kicks in when |k, v| are declared, and thus .any? { |k| } is also more expensive.

Updated by mame (Yusuke Endoh) 5 months ago

As you've probably noticed from reading the source, there is no Hash#find method. Enumerable#find is. It cannot know that #each yields a two-element array, so it is not easy to optimize it.

We could define Hash#find method as a faster special version of Enumerable#find. I personally don't like the idea of manually copying many Enumerable methods into Hash...

I was rather surprised that there is a Hash#any?. Even though there is no Hash#all?. If the consistency is an issue, a simple fix to this "bug" would be to remove Hash#any?.

Updated by ivoanjo (Ivo Anjo) 5 months ago

I personally don't like the idea of manually copying many Enumerable methods into Hash...

It is indeed annoying. On the good side, they could be implemented with Ruby code -- e.g. implementing find using any?. This would even open up the way YJIT to even optimize some of these methods (?).

If the consistency is an issue, a simple fix to this "bug" would be to remove Hash#any?.

Please no 😅! Unless I'm mistaken, Hash#any? is the only method in Hash that allows the iteration to be stopped mid-way (without having to raise or throw). If that was removed, then it would be even harder to iterate Hash to find something + stop when it's found without allocating extra memory.

Updated by mame (Yusuke Endoh) 5 months ago

  • Assignee set to ko1 (Koichi Sasada)
  • Status changed from Open to Assigned

I have prototyped a patch that delays the array allocation of multiple arguments for Enumerable#find, #any? etc.

https://github.com/ruby/ruby/pull/11110

$ ./miniruby bench.rb
ruby 3.4.0dev (2024-07-05T17:35:30Z lazy-alloc-of-bloc.. 264a1d8810) [x86_64-linux]
Allocated 1 for dummy_hash.any? { |k, v| }
Allocated 1 for dummy_hash.each { |k, v| }
Allocated 5 for dummy_hash.find { |k, v| }
Allocated 10000 for dummy_hash.any? { |k| }

Enumerable#find, #all?, etc. no longer allocate an Array as long as the predicate block returns falsy. However, there is a cost of calling #each method, so the performance would be slightly less than direct implementation like Hash#any?.

This patch introduces a flag for ifunc. I'm not sure if this issue is worth the complexity. @ko1 (Koichi Sasada) What do you think?

Updated by mame (Yusuke Endoh) 5 months ago

  • Status changed from Assigned to Closed

Updated by ivoanjo (Ivo Anjo) 5 months ago

Amazing, thank you!

Actions

Also available in: Atom PDF

Like0
Like0Like0Like1Like0Like0