Bug #20301: `Set#add?` does two hash look-ups

Added by AMomchilov (Alexander Momchilov) over 2 years ago. Updated over 2 years ago.

Status: Open
Assignee:
Target version:
ruby -v:
Backport: 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN

[ruby-core:116941]

Description

A common usage of Sets is to keep track of seen objects, and do something different whenever an object is seen for the first time, e.g.:

SEEN_VALUES = Set.new
	
def receive_value(value)
	if SEEN_VALUES.add?(value)
		puts "Saw #{value} for the first time."
	else
		puts "Already seen #{value}, ignoring."
	end
end

receive_value(1) # Saw 1 for the first time.
receive_value(2) # Saw 2 for the first time.
receive_value(3) # Saw 3 for the first time.
receive_value(1) # Already seen 1, ignoring.

Readers might reasonably assume that add? looks up into the set only once, but it actually does two separate look-ups! (source)

class Set
  def add?(o)
    # 1. `include?(o)` looks up into `@hash`
    # 2. if the value isn't there, `add(o)` does a second look-up into `@hash`
    add(o) unless include?(o)
  end
end

This gets especially expensive if the values are large hashes, arrays, or objects whose #hash is expensive to compute.
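To make the cost visible, here is a minimal sketch that counts `#hash` calls. `CountingKey` and `TwoLookupSet` are hypothetical names introduced for illustration; `TwoLookupSet` only mirrors the `add(o) unless include?(o)` shape quoted above on top of a plain Hash and is not the actual Set source. Exact counts may differ between Ruby versions and table sizes, so the sketch prints them instead of assuming a figure.

```ruby
# Hypothetical key type that counts how often its #hash is computed.
class CountingKey
  attr_reader :value, :hash_calls

  def initialize(value)
    @value = value
    @hash_calls = 0
  end

  def hash
    @hash_calls += 1
    @value.hash
  end

  def eql?(other)
    other.is_a?(CountingKey) && @value.eql?(other.value)
  end
end

# Hypothetical stand-in that mirrors the check-then-insert shape of add?.
class TwoLookupSet
  def initialize
    @hash = {}
  end

  def include?(o)
    @hash.key?(o)
  end

  def add(o)
    @hash[o] = true
    self
  end

  def add?(o)
    add(o) unless include?(o)
  end
end

seen = TwoLookupSet.new
seen.add?(CountingKey.new(:warm_up)) # make the table non-empty first

fresh = CountingKey.new([1, 2, 3])
first_result = seen.add?(fresh)      # new element: include? and then add
calls_when_new = fresh.hash_calls

second_result = seen.add?(fresh)     # existing element: include? only
calls_when_existing = fresh.hash_calls - calls_when_new

puts "#hash calls when new: #{calls_when_new}"
puts "#hash calls when already present: #{calls_when_existing}"
```

Swapping the array inside `CountingKey` for a larger structure shows how the extra call scales with the cost of `#hash`.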

We could optimize this if it were possible to set a value in a hash and retrieve the value that was already there in a single operation. I propose adding Hash#exchange_value to do exactly that. If that existed, we could re-implement #add? as:

class Set
  def add?(o)
    # Only requires a single look-up into `@hash`!
    self unless @hash.exchange_value(o, true)
  end
end

Here's a proof-of-concept implementation: https://github.com/ruby/ruby/pull/10093
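The contract of the proposed method can be sketched in plain Ruby. The free-standing helper `exchange_value` and the class `ExchangeSet` below are hypothetical and are not the implementation from the PR. The helper deliberately uses two look-ups, since it only illustrates the semantics (store the new value, return whatever was there before, or nil if the key was absent); the single look-up benefit would have to come from a native implementation.

```ruby
# Illustrative only: stores new_value under key and returns the previous
# value (nil if the key was absent). This pure-Ruby version still performs
# two look-ups; it shows the contract, not the optimization.
def exchange_value(hash, key, new_value)
  old_value = hash[key]
  hash[key] = new_value
  old_value
end

# Hypothetical set-like wrapper using the helper in the shape proposed above.
class ExchangeSet
  def initialize
    @hash = {}
  end

  def add?(o)
    self unless exchange_value(@hash, o, true)
  end

  def size
    @hash.size
  end
end

seen = ExchangeSet.new
first = seen.add?(:a)   # :a is new, so the set itself is returned
second = seen.add?(:a)  # :a is already present, so nil is returned
p first.equal?(seen)    # prints true
p second                # prints nil
```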

Theory¶

How much of a benefit this has depends on 2 factors:

1. How much #hash is called, which depends on how many new objects are added to the set.
   - If every object is new, then #hash used to be called twice on every #add?.
     - This is where this improvement makes the biggest (2x!) change.
   - If every object has already been seen, then #hash was never being called twice before anyway, so there would be no improvement.
     - It's important to not regress in this case, because many use cases of sets don't deal with many distinct objects, but just need to do quick checks against an existing set.
   - Every other case lies somewhere in between those two, depending on the % of objects which are new.
2. How slow #hash is to compute for the key.
   - If the hash is slow to compute, this change will make a bigger improvement.
   - If the hash value is fast to compute, then it won't matter as much. Even if we called it half as much, it's a minority of the total time, so it won't have much net impact.
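The first factor can be put into numbers with a small model. This is a simplification that counts only `#hash` calls and not wall-clock time, and the function names are hypothetical: with two look-ups, each `add?` costs one `#hash` call for `include?` plus one more when the object is new; with a single look-up it always costs one.

```ruby
# Expected #hash calls per add? under the two-look-up implementation:
# one for include?, plus one more for add when the object is new.
def hash_calls_two_lookups(fraction_new)
  1.0 + fraction_new
end

# With a single look-up, #hash is computed once regardless.
def hash_calls_single_lookup(_fraction_new)
  1.0
end

# Fraction of #hash calls saved by switching to a single look-up.
def hash_calls_saved(fraction_new)
  before = hash_calls_two_lookups(fraction_new)
  after = hash_calls_single_lookup(fraction_new)
  (before - after) / before
end

[0.0, 0.5, 1.0].each do |fraction_new|
  saved = hash_calls_saved(fraction_new)
  puts format("%3d%% new objects -> %.1f%% fewer #hash calls", fraction_new * 100, saved * 100)
end
```

Under this model the all-new case halves the number of `#hash` calls, and the all-preexisting case saves nothing, matching the two extremes described above.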

Benchmark summary¶

	All objects are new	All objects are preexisting
objects with slow `#hash`	100.0%	~0.0%
objects with fast `#hash`	24.5%	4.6%

As we see, this change makes a huge improvement in the cases where it helps, and crucially, doesn't slow down the cases where it can't.

For the complete benchmark source code and results, see the PR: https://github.com/ruby/ruby/pull/10093


Updated by AMomchilov (Alexander Momchilov) over 2 years ago
#1

Description updated (diff)

Updated by AMomchilov (Alexander Momchilov) over 2 years ago
#2

Description updated (diff)

Updated by Dan0042 (Daniel DeLorme) over 2 years ago · Edited
#3 [ruby-core:116952]

Now I understand why you proposed #20300 Hash#update_value

However I'd like to suggest an alternative approach for your consideration:

def add?(k)
  added = false
  @hash.add(k) { added = true } # call block only if k not in @hash; return existing or added value
  self if added
end

This is likely to be a bit less efficient than your approach. However, Hash#add is a method I've been missing from Ruby for a long time, and I would find it infinitely more useful than Hash#update_value.
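Hash#add does not exist in Ruby today; it is only being suggested here. Its described behaviour (call the block only when the key is absent, return the existing or added value) can be approximated with Hash#fetch and a block. The helper `add_if_absent` and the class `BlockSet` are hypothetical names for this sketch, and the pure-Ruby version still does a fetch followed by a store, so it illustrates the interface rather than any performance property.

```ruby
# Approximates the suggested behaviour: yield only when key is absent, store
# the block's result, and return the existing or newly added value.
def add_if_absent(hash, key)
  hash.fetch(key) { hash[key] = yield }
end

# Hypothetical set-like wrapper following the add? shape from this comment.
class BlockSet
  def initialize
    @hash = {}
  end

  def add?(k)
    added = false
    add_if_absent(@hash, k) { added = true }
    self if added
  end
end

counts = {}
add_if_absent(counts, :a) { 1 } # key absent: block runs, 1 is stored
add_if_absent(counts, :a) { 2 } # key present: block is skipped, 1 is kept
p counts[:a]                    # prints 1
```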

Updated by Eregon (Benoit Daloze) over 2 years ago
#4 [ruby-core:116955]

@Dan0042 (Daniel DeLorme) That's related to #17342 then, and also known as compute_if_absent in concurrent-ruby.

Updated by Eregon (Benoit Daloze) over 2 years ago
#5

Related to Feature #20300: Hash: set value and get pre-existing value in one call added

Updated by AMomchilov (Alexander Momchilov) over 2 years ago
#6 [ruby-core:116959]

I don't mind it @Dan0042 (Daniel DeLorme), but that's a secondary issue IMO. The block call defeats the benefit of this optimization. It'll even slow down the case where you're looking up pre-existing objects (that's currently net-even perf after these changes), and that's a big no-no.

Updated by AMomchilov (Alexander Momchilov) over 2 years ago
#7

Description updated (diff)

Updated by shyouhei (Shyouhei Urabe) over 2 years ago
#8 [ruby-core:117138]

Why not:

def add?(o)
  n = size
  add(o)
  m = size
  return n == m ? self : nil
end

This implementation involves only one hash lookup.

Updated by nobu (Nobuyoshi Nakada) over 2 years ago
#9 [ruby-core:117141]

shyouhei (Shyouhei Urabe) wrote in #note-8:

  return n == m ? self : nil

The return value is inverse.

Updated by shyouhei (Shyouhei Urabe) over 2 years ago
#10 [ruby-core:117142]

nobu (Nobuyoshi Nakada) wrote in #note-9:

shyouhei (Shyouhei Urabe) wrote in #note-8:
  return n == m ? self : nil
The return value is inverse.

My bad. Thank you for the correction.
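For reference, the size-based idea from #note-8 with the return value flipped as pointed out in #note-9 can be sketched as follows. `SizeCheckedSet` is a hypothetical subclass used only to try the idea out; it is not a patch to Set itself.

```ruby
require "set"

# Hypothetical subclass used only to try out the size-based idea.
class SizeCheckedSet < Set
  def add?(o)
    n = size
    add(o)
    m = size
    n == m ? nil : self # nil when nothing was added, self when it was
  end
end

seen = SizeCheckedSet.new
first = seen.add?(:a)   # size grows from 0 to 1, so the set is returned
second = seen.add?(:a)  # size stays at 1, so nil is returned
p first.equal?(seen)    # prints true
p second                # prints nil
```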

Updated by Eregon (Benoit Daloze) over 2 years ago
#11 [ruby-core:117177]

That implementation using size is not thread-safe, even on CRuby AFAIK.
For example, if T2 calls add? with a new element while T1 calls add? with an existing element.
If T1 is just before m = size when T2 executes add(o), then both threads return "element added" but T1 did not add an element (incorrect result).

The original code with two lookups does not have that race condition.
However it can have the race condition that two threads adding the same new element both return "element added".
Hash#exchange_value would fix that.
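The first interleaving described above can be replayed deterministically in a single thread. This sketch does not use real threads; it simply performs the steps of the size-based add? in the order the race would produce, with comments marking which thread (T1 or T2) each step belongs to.

```ruby
require "set"

set = Set[:existing]

# T1 starts add?(:existing) and records the size.
n = set.size

# T2 runs add?(:new) to completion in between.
set.add(:new)

# T1 resumes: its add is a no-op, but the size has changed anyway.
set.add(:existing)
m = set.size

# T1 compares sizes and concludes that it added an element.
t1_reports_added = (n != m)
puts "T1 reports added: #{t1_reports_added}" # prints true, although T1 added nothing
```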

Updated by shyouhei (Shyouhei Urabe) over 2 years ago
#12 [ruby-core:117190]

Yes. add(o) unless include?(o) is already not thread-safe. My implementation just doesn't try to improve that.

Updated by AMomchilov (Alexander Momchilov) over 2 years ago
#13 [ruby-core:117192]

shyouhei (Shyouhei Urabe) wrote in #note-8:

Why not:

Because I didn't think of that :)

I would be okay with it, but I think the thread safety issue is also worth solving. The implementation I'm proposing solves both the performance and safety problems.
