Misc #20509

Document importance of #to_ary and #to_hash for Array#== and Hash#==

Added by gettalong (Thomas Leitner) 6 months ago. Updated 4 months ago.

Status:
Open
Assignee:
-
[ruby-core:118012]

Description

Both Array#== and Hash#== provide special behaviour when the other argument is not an Array/Hash but defines the special #to_ary/#to_hash method. Those methods are never called; they are only checked for existence. If they exist, other#== is called so that the other argument can decide whether the two objects are equal.
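A minimal sketch of the behaviour described above. The class `ArrayLike` and its `eq_calls` counter are hypothetical names introduced only for this illustration. Its `#to_ary` raises on purpose, so the comparison can only succeed if Array#== checks for the method without invoking it.

```ruby
# Hypothetical wrapper class, used only to observe what Array#== does.
class ArrayLike
  attr_reader :eq_calls

  def initialize(*items)
    @items = items
    @eq_calls = 0
  end

  # Only the existence of this method matters to Array#==.
  # Raising here shows that it is not invoked during the comparison.
  def to_ary
    raise "to_ary should not be called by Array#=="
  end

  # Array#== hands the decision over to this method.
  def ==(other)
    @eq_calls += 1
    other.is_a?(Array) && other == @items
  end
end

wrapped = ArrayLike.new(1, 2, 3)

result_match    = ([1, 2, 3] == wrapped)    # decided by ArrayLike#==
result_mismatch = ([1, 2] == wrapped)       # also decided by ArrayLike#==
result_plain    = ([1, 2, 3] == Object.new) # no #to_ary, so no delegation

puts result_match, result_mismatch, result_plain, wrapped.eq_calls
```

Neither comparison raises, and the counter shows that ArrayLike#== ran once per comparison.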

I think this is worth mentioning in the documentation for Array#== and Hash#==.

[Background: In my PDF library HexaPDF I have defined two classes, PDFArray and Dictionary, which act like Array and Hash but provide special PDF-specific behaviour. For PDFArray I defined the #to_ary method, but for Dictionary only the #to_h method. I have come across a bug where comparing Arrays with PDFArrays works as it should but comparing Hashes with Dictionaries doesn't, due to the absence of #to_hash (it seems I removed Dictionary#to_hash in 2017 due to problems with automatic destructuring when passing a Dictionary as an argument; from what I see that should no longer be a problem, so I will just add it back).]
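The asymmetry described in the background can be reproduced with a small sketch. The classes `DictWithToH` and `DictWithToHash` are hypothetical stand-ins and are not HexaPDF's actual classes. The only difference between them is whether `#to_hash` is defined.

```ruby
# Hypothetical Hash-like wrapper that defines #to_h but not #to_hash.
class DictWithToH
  def initialize(data)
    @data = data
  end

  def to_h
    @data.dup
  end

  # Compares against anything that can be turned into a Hash via #to_h.
  def ==(other)
    other.respond_to?(:to_h) && @data == other.to_h
  end
end

# Same class, with #to_hash added to opt in to the implicit-conversion protocol.
class DictWithToHash < DictWithToH
  def to_hash
    @data.dup
  end
end

data = { name: "page", count: 2 }
only_to_h    = DictWithToH.new(data)
with_to_hash = DictWithToHash.new(data)

dict_eq_hash         = (only_to_h == data)    # uses DictWithToH#== directly
hash_eq_to_h_only    = (data == only_to_h)    # Hash#== finds no #to_hash
hash_eq_with_to_hash = (data == with_to_hash) # Hash#== delegates to the wrapper's ==

puts dict_eq_hash, hash_eq_to_h_only, hash_eq_with_to_hash
```

With only `#to_h`, equality holds in one direction but not the other; adding `#to_hash` makes `Hash#==` ask the wrapper.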

Updated by matz (Yukihiro Matsumoto) 4 months ago

It is intentional behavior. Usually, having to_ary means the object must behave as an array. If the object a is an array and the object b responds to to_ary, I expect the same result from b == a and a == b.to_ary. We use the former to reduce unnecessary object allocation.

Same for to_hash respectively.

Maybe we should document this expectation clearly in the reference.

Matz.

Updated by Dan0042 (Daniel DeLorme) 4 months ago · Edited

> Maybe we should document this expectation clearly in the reference.

Definitely, because this is all new and surprising to me even with 20+ years of ruby experience.

o = Object.new
def o.to_ary
  [1, 2, 3]
end
[1, 2, 3] == o        #=> false
[1, 2, 3] == o.to_ary #=> true
# I expected these two expressions to be equivalent

So when we define #to_ary we should also define #==.
It's the first time I hear about this.
It's also rather inconvenient, and imho not worth the benefit of "reduce unnecessary object allocation", but that's another story.

Updated by mame (Yusuke Endoh) 4 months ago

As you may know, but for the record, let me add some background.

The rationale for to_ary is a compromise between duck typing and efficiency.
Following the principles of duck typing, a method that accepts an Array should access its argument via methods such as #size, #[], #each, etc. However, this is too inefficient for builtin methods that are implemented in C. So, C methods access their arguments in a way that depends on the implementation details of Array, such as RARRAY_LEN. As a result, a C method cannot accept a non-Array object that merely behaves like an Array.

So, the protocol of to_ary was introduced: an object that behaves as an Array implements #to_ary; C methods that accept an Array will attempt to convert their argument using #to_ary if it is not an Array. This allows both duck typing and efficiency.
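As a rough illustration of that protocol, the sketch below passes a non-Array object to Array#+, a builtin that expects an Array argument. The class `Triple` and its `to_ary_calls` counter are hypothetical names used only here. In contrast to Array#==, the conversion method is actually invoked in this case.

```ruby
# Hypothetical Array-like object that opts in to implicit conversion.
class Triple
  attr_reader :to_ary_calls

  def initialize(a, b, c)
    @values = [a, b, c]
    @to_ary_calls = 0
  end

  # Called by builtins that need a real Array.
  def to_ary
    @to_ary_calls += 1
    @values.dup
  end
end

triple = Triple.new(1, 2, 3)

# Array#+ converts its argument through #to_ary before concatenating.
sum = [0] + triple

# An object without #to_ary is rejected with a TypeError.
error = begin
  [0] + Object.new
  nil
rescue TypeError => e
  e
end

p sum
p error.class
```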

This means that an object that implements #to_ary is supposed to have its other methods also behave in an Array-compatible way to a reasonable extent.

I have heard this several times from matz, but am not sure if it is documented. At least it wasn't in doc/implicit_conversion.rdoc. I think this should be added.

In the case of this ticket, this to_ary protocol is applied in a different way. Array#==(other) delegates to other == self, because the fact that other implements #to_ary implies that other's == must behave as Array#==. This is because calling to_ary is likely to generate an array, but other's == may be implemented to compare arrays without generating an array.
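A sketch of what such an allocation-free #== might look like, under the assumption that the object can answer #size and #[] without storing an Array. The class `Countdown` and its `materialized` counter are hypothetical and exist only for this example.

```ruby
# Hypothetical lazy sequence from, from-1, ..., 1 that stores no Array.
class Countdown
  attr_reader :materialized

  def initialize(from)
    @from = from
    @materialized = 0
  end

  def size
    @from
  end

  def [](index)
    @from - index
  end

  # Builds a real Array; counted so we can see whether it ran.
  def to_ary
    @materialized += 1
    Array.new(size) { |i| self[i] }
  end

  # Element-wise comparison that never builds an intermediate Array.
  def ==(other)
    return false unless other.respond_to?(:size) && other.respond_to?(:[])
    return false unless size == other.size

    size.times.all? { |i| self[i] == other[i] }
  end
end

countdown = Countdown.new(3)

forward  = ([3, 2, 1] == countdown) # Array#== delegates to Countdown#==
backward = (countdown == [3, 2, 1]) # Countdown#== called directly
differs  = ([3, 2, 2] == countdown)
calls_before = countdown.materialized

via_to_ary  = ([3, 2, 1] == countdown.to_ary) # explicit conversion allocates
calls_after = countdown.materialized

puts forward, backward, differs, via_to_ary, calls_before, calls_after
```

Both forms of the comparison agree, as expected earlier in this thread, but only the explicit `to_ary` version materializes an Array.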
