Bug #16850
closedObject#hash doesn't behave as documented
Description
From Ruby 2.7 Object class documentation:
The hash value is used along with eql? by the Hash class to determine if two objects reference the same hash key.
From this I expect that overwritting the Object#hash
method changes the Hashing algorithm. But this is only the case for some objets. Consider the following code:
class Object
def hash
i_dont_exist
end
end
class Ana
end
hash = {}
hash[3] = true
hash[3] = false
puts 'Integers are not using the new hash method'
hash[Ana.new] = true
puts 'this will not be executed because the class Ana uses the new hash method'
The output of executing this code is:
Integers are not using the new hash method
Traceback (most recent call last):
1: from a.rb:13:in `<main>'
a.rb:3:in `hash': undefined local variable or method `i_dont_exist' for #<Ana:0x000055626f0b7a30> (NameError)
This proves that Integer hash method is not used to determine if two objects reference the same hash key, as said in the documentation. Check the any_hash
method for the other classes affected.
I think that either the behavior should be modified (aiming for a consistency along different classes and following the documentation) or the documentation updated.
Updated by jeremyevans0 (Jeremy Evans) over 4 years ago
I consider this an implementation detail. It never makes sense to override the hash calculation for the cases where Ruby uses the built-in calculation instead of calling #hash, so I don't think Ruby should change behavior to increase consistency. However, we could explicitly document the behavior. Something like:
diff --git a/hash.c b/hash.c
index e75a431c09..b7d320f7c2 100644
--- a/hash.c
+++ b/hash.c
@@ -315,6 +315,9 @@ objid_hash(VALUE obj)
* implementations of Ruby. If you need a stable identifier across Ruby
* invocations and implementations you will need to generate one with a custom
* method.
+ *
+ * Certain core classes such as Integer use built-in hash calculations and
+ * do not call the #hash method when used as a hash key.
*--
* \private
*++
Thoughts on that?
Updated by ana06 (Ana Maria Martinez Gomez) over 4 years ago
That looks good to me :+1: I may add String as well (such as Integer and String).
Updated by matz (Yukihiro Matsumoto) over 4 years ago
+1 The updates LGTM.
Matz.
Updated by Dan0042 (Daniel DeLorme) over 4 years ago
Would it be feasible to raise some sort of error when trying to redefine Integer#hash ?
Not just for this case, but I feel there were other cases where ruby bypasses the method call to use the built-in implementation directly. Can't recall exactly atm.
Making certain methods un-overridable could be a useful signal that "this is not going to work as you expect".
Updated by jeremyevans (Jeremy Evans) over 4 years ago
- Status changed from Open to Closed
Applied in changeset git|de29a022acb93691dfc50db852cb04f763565072.
Document that #hash is not called for certain core classes [ci skip]
Fixes [Bug #16850]
Updated by jeremyevans0 (Jeremy Evans) over 4 years ago
Dan0042 (Daniel DeLorme) wrote in #note-4:
Would it be feasible to raise some sort of error when trying to redefine Integer#hash ?
Not just for this case, but I feel there were other cases where ruby bypasses the method call to use the built-in implementation directly. Can't recall exactly atm.
Making certain methods un-overridable could be a useful signal that "this is not going to work as you expect".
It's probably feasible, but I don't think it is a good idea. You can do it in pure Ruby if you want:
module HashOverrideRaise
def method_added(method)
raise "don't override hash" if method == :hash
super
end
def include(*)
super
raise "don't override hash" unless instance_method(:hash).owner == Kernel
end
end
Integer.extend HashOverrideRaise
However, this doesn't handle the case in Ana's example, because you are defining Object#hash, and not Integer#hash.
The cases where Ruby bypasses method calls to use built-in code are different than this case. Ruby will default to use the faster built-in code, but if you define a method, Ruby will recognize it and start calling the method. This approach is usually referred to as deoptimization.
In this case, I don't think it makes sense to change Ruby's behavior. There is no reason to override #hash for the cases where Ruby uses a built-in implementation. So the fact that Ruby does not call #hash in all cases should not be a problem. If you override #eql?, Ruby will still respect that and treat objects differently:
h = {}
s = ''
def s.eql?(_) false end
h[s] = 1
h[s] = 2
h
# => {""=>1, ""=>2}
Updated by Dan0042 (Daniel DeLorme) over 4 years ago
Thanks for the insight into bypass vs deoptimization.
Of course if you override #hash and #eql? to return true for previously false cases then it wouldn't work, but anyway I don't see much use or reason for doing this on Integer or String. The added documentation is plenty good enough.