Bug #10984
closedHash#contain? to check whether hash contains other hash
Description
Comparing hashes seems like a common practice but there currently isn't a method to ask a
hash instance whether it includes another hash instance.
The most intuitive method to reach for would be Hash#include?
but it is in fact an alias to Hash#has_key?
What I'm looking for can be achieved with:
class Hash
def contain?(other)
self.merge(other) == self
end
end
Here's a simple demo of #contain?
in use:
{ a: true, b: false }.contain?({ a: true})
# => true
{ a: true, b: false }.contain?({ b: false})
# => true
{ a: true, b: false }.contain?({ a: false})
# => false
{ a: true, b: false }.contain?({ c: true})
# => false
One important note is that this method is not checking for nested hash matches.
This may need to be addressed when the parameters include a nested hash perhaps.
Thanks to Terence Lee's help, nobu created a patch for this feature last year.
I've only modified the name of the method from his original patch and attached it to this issue.
Files
Updated by sferik (Erik Michaels-Ober) almost 10 years ago
I agree that #include?
is a more consistent and appropriate name for this method. I would prefer to see that change introduced in Ruby 3.0 than settle for this suboptimal change in Ruby 2.
Updated by olivierlacan (Olivier Lacan) almost 10 years ago
Erik Michaels-Ober wrote:
I agree that
#include?
is a more consistent and appropriate name for this method. I would prefer to see that change introduced in Ruby 3.0 than settle for this suboptimal change in Ruby 2.
From the many conversations I've had with folks after publishing the original proposal for this it seemed that my expectations for Hash#include?
's behavior didn't match many (if not most) people's expectations so it seemed unlikely that such a breaking change would ever be favored.
I don't think a brand new method would be such a bad idea in 2.3 because it would allow for a smoother upgrade path. People could start using Hash#contain?
when 2.3 ships and then the internal implementation of Hash#include?
could be swapped in 3.0 if that is deemed a good idea later on.
Updated by shevegen (Robert A. Heiler) almost 10 years ago
I have no particular pro or contra opinion. I merely wish to think that #include? is a better name because Array also has include? and
String also has include?. While a Hash#include? may not be exactly the same as the other two #includes?, I think the name would be
ok. #contain? feels a bit weird, I mean you could use Array#contain? too (on the other hand, you could use an alias just as .map
vs. .collect; I myself use solely .map, other people can use .collect, things are fine) ok so bottom line from me, I think
Hash#include? is a perfectly suitable name for it too as Erik wrote. Perhaps for Ruby 3.0
Updated by olivierlacan (Olivier Lacan) over 9 years ago
I didn't notice the notes from DevelopersMeeting20150514Japan before so I'll link to my original musings about Hash#include?
which later evolved into this feature/patch: Proposal for a better Ruby Hash#include?
I'm aware this post needs revision and more concrete examples and use cases. I'll try to bring that together soon.
Updated by olivierlacan (Olivier Lacan) about 9 years ago
- Assignee set to akr (Akira Tanaka)
Responding to feedback from Akira Tanaka and Nobuyoshi Nakada at DevelopersMeeting20150514Japan
akr: “contain” is too general. “subhash”?
You mean something like this?
{ a: 1, b: 2 }.subhash?({ b: 2 })
Semantically, this feels strange to me. It doesn't seem obvious at all which hash we're checking for a subhash on and I would expect a lot of confusion with a method name like this. Compare to:
{ a: 1, b: 2 }.contains?({ b: 2 })
I believe contains is semantically far more self-evident.
It also seems odd to introduce a sub<class>?
method name for this since I'm not aware of any similar method names for classes that would have similar behavior.
n0kada: “contain?” seems similiar to “include?”
It is. Sadly, I've been told repeatedly that it's a bad idea to try to change the behavior of include?
. I would prefer replacing the existing include?
but I will settle for contains?
for now because the meaning of "contain" focuses on what's inside the object under observation and is far more commonly used than "comprise":
contain |kənˈtān|
verb [ with obj. ]
1. have or hold (someone or something) within: coffee cans that once contained a full pound of coffee.
- be made up of (a number of things); consist of: borscht can contain mainly beets or a number of vegetables.
- (of a number) be divisible by (a factor) without a remainder.
akr: do we really use? we need concrete examples.
Yes, RSpec has an ad-hoc implementation of this feature in its include
matcher: https://github.com/rspec/rspec-expectations/blob/bb731e29f7800f5cef736cf8850293276a0d3f90/lib/rspec/matchers/built_in/include.rb#L94-L97
RSpec has been downloaded 29 Million times on RubyGems. I think this is a legitimate use case. This would simplify not only RSpec's internal code for Hash matchers, but any existing application who depends on this code, for a relatively minimal impact on the core Hash codebase (see provided patch).
I expanded on my original proposal (since then changed from Hash#include? to Hash#contains?) here: http://olivierlacan.com/posts/proposal-for-a-better-ruby-hash-include/
Updated by funny_falcon (Yura Sokolov) about 9 years ago
What if
{b: 1} === {a: 2, b: 1}
then
h = {a: 2, b: 1}
case h
when {b: 1}
puts "got it"
end
😁😃😄😈
Updated by matz (Yukihiro Matsumoto) about 9 years ago
Hash#contain?
has slight ambiguity problem. I'd vote for adding >=
, along with <=
.
Matz.
Updated by ko1 (Koichi Sasada) about 9 years ago
Discussion: https://docs.google.com/document/d/1D0Eo5N7NE_unIySOKG9lVj_eyXf66BQPM4PKp7NvMyQ/pub
Feel free to continue discussion on this ticket.
Updated by olivierlacan (Olivier Lacan) about 9 years ago
Yukihiro Matsumoto wrote:
Hash#contain?
has slight ambiguity problem. I'd vote for adding>=
, along with<=
.Matz.
Thanks for considering this feature, Matz. :-)
If I understand correctly, the following examples would be correct?
{ a: 1, b: 2 } >= { b: 2 }
=> true
{ b: 2 } <= { a: 1, b: 2 }
=> true
# also
{ b: 2 } >= { b: 2 }
=> true
{ a: 1 } >= { b: 2 }
=> false
{ b: 2 } <= { b: 2 }
=> true
{ b: 2 } <= { a: 1 }
=> false
I think the versatility of this alone would make it even more useful than the one direction method I suggested.
Updated by nobu (Nobuyoshi Nakada) about 9 years ago
If we'll introduce Hash#<=
and Hash#>=
, then Hash#<
and Hash#>
too?
Hash
will include Comparable
with Hash#<=>
?
Updated by akr (Akira Tanaka) about 9 years ago
Nobuyoshi Nakada wrote:
If we'll introduce
Hash#<=
andHash#>=
, thenHash#<
andHash#>
too?
Maybe.
Usefulness of Hash#<
and Hash#>
is not discussed well, though.
Hash
will includeComparable
withHash#<=>
?
No. It is clearly stated by matz.
% ruby -e '
class Hash
def <=(other)
self.merge(other) == other
end
def >=(other)
self.merge(other) == self
end
def <(other)
self <= other && self != other
end
def >(other)
self >= other && self != other
end
end
hs = [{a:1, b:2}, {a:1, b:2, c:3}]
ops = %w[<= >= < >]
ops.each {|op|
hs.each {|h1|
hs.each {|h2|
puts "#{h1} #{op} #{h2} = #{h1.send(op, h2)}"
}
}
}
'
{:a=>1, :b=>2} <= {:a=>1, :b=>2} = true
{:a=>1, :b=>2} <= {:a=>1, :b=>2, :c=>3} = true
{:a=>1, :b=>2, :c=>3} <= {:a=>1, :b=>2} = false
{:a=>1, :b=>2, :c=>3} <= {:a=>1, :b=>2, :c=>3} = true
{:a=>1, :b=>2} >= {:a=>1, :b=>2} = true
{:a=>1, :b=>2} >= {:a=>1, :b=>2, :c=>3} = false
{:a=>1, :b=>2, :c=>3} >= {:a=>1, :b=>2} = true
{:a=>1, :b=>2, :c=>3} >= {:a=>1, :b=>2, :c=>3} = true
{:a=>1, :b=>2} < {:a=>1, :b=>2} = false
{:a=>1, :b=>2} < {:a=>1, :b=>2, :c=>3} = true
{:a=>1, :b=>2, :c=>3} < {:a=>1, :b=>2} = false
{:a=>1, :b=>2, :c=>3} < {:a=>1, :b=>2, :c=>3} = false
{:a=>1, :b=>2} > {:a=>1, :b=>2} = false
{:a=>1, :b=>2} > {:a=>1, :b=>2, :c=>3} = false
{:a=>1, :b=>2, :c=>3} > {:a=>1, :b=>2} = true
{:a=>1, :b=>2, :c=>3} > {:a=>1, :b=>2, :c=>3} = false
Updated by nobu (Nobuyoshi Nakada) about 9 years ago
- Status changed from Open to Closed
Applied in changeset r52518.
hash.c: compare methods
- hash.c (rb_hash_{le,lt,ge,gt}): new methods, Hash#<=, Hash#<,
Hash#>=, Hash#>, to test if all elements of a hash are also
included in another hash, and vice versa.
[ruby-core:68561] [Feature #10984]
Updated by akr (Akira Tanaka) about 9 years ago
Akira Tanaka wrote:
% ruby -e ' class Hash def <=(other) self.merge(other) == other end def >=(other) self.merge(other) == self end def <(other) self <= other && self != other end def >(other) self >= other && self != other end end hs = [{a:1, b:2}, {a:1, b:2, c:3}] ops = %w[<= >= < >] ops.each {|op| hs.each {|h1| hs.each {|h2| puts "#{h1} #{op} #{h2} = #{h1.send(op, h2)}" } } } ' {:a=>1, :b=>2} <= {:a=>1, :b=>2} = true {:a=>1, :b=>2} <= {:a=>1, :b=>2, :c=>3} = true {:a=>1, :b=>2, :c=>3} <= {:a=>1, :b=>2} = false {:a=>1, :b=>2, :c=>3} <= {:a=>1, :b=>2, :c=>3} = true {:a=>1, :b=>2} >= {:a=>1, :b=>2} = true {:a=>1, :b=>2} >= {:a=>1, :b=>2, :c=>3} = false {:a=>1, :b=>2, :c=>3} >= {:a=>1, :b=>2} = true {:a=>1, :b=>2, :c=>3} >= {:a=>1, :b=>2, :c=>3} = true {:a=>1, :b=>2} < {:a=>1, :b=>2} = false {:a=>1, :b=>2} < {:a=>1, :b=>2, :c=>3} = true {:a=>1, :b=>2, :c=>3} < {:a=>1, :b=>2} = false {:a=>1, :b=>2, :c=>3} < {:a=>1, :b=>2, :c=>3} = false {:a=>1, :b=>2} > {:a=>1, :b=>2} = false {:a=>1, :b=>2} > {:a=>1, :b=>2, :c=>3} = false {:a=>1, :b=>2, :c=>3} > {:a=>1, :b=>2} = true {:a=>1, :b=>2, :c=>3} > {:a=>1, :b=>2, :c=>3} = false
For the record, this sample implementation was wrong.
It should consider that two hashs may have different values for same key.
% ruby -e '
class Hash
def <=(other)
self.merge(other) {|k,v1,v2| v1 } == other
end
def >=(other)
self.merge(other) {|k,v1,v2| v2 } == self
end
def <(other)
self <= other && self != other
end
def >(other)
self >= other && self != other
end
end
hs = [{a:1}, {a:2}]
ops = %w[<= >= < >]
ops.each {|op|
hs.each {|h1|
hs.each {|h2|
puts "#{h1} #{op} #{h2} = #{h1.send(op, h2)}"
}
}
}
'
{:a=>1} <= {:a=>1} = true
{:a=>1} <= {:a=>2} = false
{:a=>2} <= {:a=>1} = false
{:a=>2} <= {:a=>2} = true
{:a=>1} >= {:a=>1} = true
{:a=>1} >= {:a=>2} = false
{:a=>2} >= {:a=>1} = false
{:a=>2} >= {:a=>2} = true
{:a=>1} < {:a=>1} = false
{:a=>1} < {:a=>2} = false
{:a=>2} < {:a=>1} = false
{:a=>2} < {:a=>2} = false
{:a=>1} > {:a=>1} = false
{:a=>1} > {:a=>2} = false
{:a=>2} > {:a=>1} = false
{:a=>2} > {:a=>2} = false
Note that nobu's implementation already committed has no problem.
Updated by prijutme4ty (Ilya Vorontsov) about 9 years ago
Hello everyone.
I urge to remove Hash comparison methods and to stick to methods like #contain
. Or at least to return nil
instead of false
for comparison of non-comparable hashes. Underlying reasons are strictly mathematical but have far-reaching consequences.
Usually we deal with linearly ordered sets or totally ordered (like usual numbers or string are) i.e. such sets that either a <= b
or b <= a
for every two elements a
and b
of a set.
Comparison can be generalized for posets or partially ordered sets. They don't require that any two elements are comparable. Set of hashes is a typical example of a partially ordered set (see "Partial ordered set" or "Hasse diagram" in wikipedia).
One must not implement a <= b
for unrelated elements because if such comparison returns any certain result either true or false - then its negation would be counterintuitive. I'm not a proponent of current ruby approach of Class#<=>
because ordinary intuition based on everyday use of totally ordered sets suggest that this code would be correct which is definitely false:
if String <= Fixnum
puts 'String is a Fixnum subclass'
else
puts 'Fixnum is a String subclass'
end
But at least String <=> Fixnum
is neither true or false but nil which allow us to distinguish such situations. nil
result is properly handled by Comparable
methods like #sort
. Thus [String, Fixnum].sort
will raise.
So why one can sort this array and which result does one expect?:
[{}, {a:1,b:2}, {c:3}, {a:1}, {b:2}].sort
That's why, I insist, comparison of non-comparable hashes at least must return nil
. As a more strict approach one can raise exception when try to compare hashes but it makes the main use-case impractical. But I can't see why one want to deal with such a controversial methods when #contain
and #included_by
will be enough for this not-so-often task.
As an example of why implementing #<=>
for posets is not a good idea, lets consider this typical hand-written qsort implementation.
def qsort(arr)
return arr if arr.size <= 1
pivot = arr[arr.length / 2]
left = arr.select{|el| el < pivot }
right = arr.select{|el| el > pivot }
central = arr.select{|el| el == pivot }
qsort(left) + central + qsort(right)
end
Okay. Now lets run and see how this "obvious" algorithm loses values.
qsort( [{}, {a:1,b:2}, {c:3}, {a:1}, {b:2}] )
# => [{}, {:c=>3}]
Surely, sorting is already implemented, but this problem persist in every place where one suggest that a < b
, a == b
and a > b
are the only possible alternatives - thus in almost every if-else pair.
I ask a community think one more time about consequences of such a decision.
Ilya
Updated by olivierlacan (Olivier Lacan) about 9 years ago
Ilya Vorontsov wrote:
So why one can sort this array and which result does one expect?:
[{}, {a:1,b:2}, {c:3}, {a:1}, {b:2}].sort
This is the actual result with 2.3.0-preview1:
RUBY_VERSION
=> "2.3.0"
[{}, {a:1,b:2}, {c:3}, {a:1}, {b:2}].sort
ArgumentError: comparison of Hash with Hash failed
Maybe I'm reading you too quickly or I'm too tired but did you note that <=>
is not implemented on Hash? Matz made that clear and Akira Tanaka clarified it in this response.
Updated by prijutme4ty (Ilya Vorontsov) about 9 years ago
I've missed absence of <=>
first. Yes, you are right. And it can slightly reduce damage.
But anyway it doesn't resolve issues with misinterpretation of comparison negation. My qsort function is just the one problem which lie on surface. There are lots if (a < b) ...; else ...
constructions in lots of codebases.
I'm sure that such behaviour will lead to tons of subtle bugs.
Also It's highly probable that some libraries will be subjected to malicious inputs which will cause infinite recursion bombs or other threats. Here I've slightly changed qsort code using #reject
instead of #select
. Not a very natural code but it's still correct:
def qsort(arr)
return arr if arr.size <= 1
pivot = arr[arr.length / 2]
left = arr.reject{|el| el >= pivot }
right = arr.reject{|el| el <= pivot }
central = arr.select{|el| el == pivot }
qsort(left) + central + qsort(right)
end
This code will not reduce size as in my previous example but contrariwise will expand array size and break asymptotical estimations of CPU time.
qsort( [{}, {a:1,b:2}, {c:3}, {a:1}, {b:2}] )
# => [{}, {:b=>2}, {:a=>1}, {:b=>2}, {:a=>1, :b=>2}, {:c=>3}, {:b=>2}, {:a=>1}, {:b=>2}, {:a=>1, :b=>2}]
Look at expansion rate.
qsort( 10.times.map{|i| {i => i} } ).size # => 1023
qsort( 20.times.map{|i| {i => i} } ).size # => 1048575
qsort( 25.times.map{|i| {i => i} } ).size # => 33554431
The last array of 25 single-element hashes is sorted in a minute or so. It can easily hang server.
Ok, I specially wrote code subjected to this attack. But in a large codebase of ruby community there would be places which have similar problems (which are not problems for totally ordered sets like numbers). It can be hard to find and exploit it, but if you did - you have broken the server. Just imagine that some code which earlies just raised an exception due to bad input now ruins your application.
Updated by prijutme4ty (Ilya Vorontsov) about 9 years ago
- Tracker changed from Feature to Bug
Akira Tanaka wrote:
% ruby -e ' class Hash def <=(other) self.merge(other) == other end def >=(other) self.merge(other) == self end def <(other) self <= other && self != other end def >(other) self >= other && self != other end end
This implies that
{a: 1, b: 2} <= {a: 1, b: 3}
{a: 1, b: 3} <= {a: 1, b: 2}
Is that expected? This breaks the main use-case in testing method results.
Updated by nobu (Nobuyoshi Nakada) about 9 years ago
It's an easy code to show the concept.
Both return false
.
Updated by cvss (Kirill Vechera) about 9 years ago
I second Ilya's opinion regarding partially ordered sets. But propose to implement the comparision similiar to classes - Hash
and Class
both satisfy reflexive, antisymmetric, and transitive relations. So like the Class
, Hash
can implement <=>
, returning nil
on noncomparable argument.
Take a look on classes:
class A; end
class B; end
class BB < B; end
How they are comparing:
> A < B
=> nil
> B < BB
=> false
> BB < B
=> true
> A <=> B
=> nil
> B <=> BB
=> 1
So the similiar approach could be implemented for hashes, with inversed signigication for the <
and >
, to realize this behavior:
> {a: 1} < {b: 2}
=> nil
> {a: 1} < {a: 1, b: 2}
=> true
> {a: 1; b: 2} < {a: 1}
=> false
> {a: 1} <=> {b: 2}
=> nil
> {a: 1} <=> {a: 1, b: 2}
=> -1
Regarding qsort - partially ordered sets imply topological sorting, not kind of sorting algorithms for strictly ordered sets. So qsort should not work on array of hashes.