Feature #16039
Array#contains? to check if one array contains another array
Description
I would like to propose Array#contains?, which checks whether one array is contained in another array.
Implementation can be as follows:
class Array
  # True when every element of other also appears in self.
  def contains?(other)
    (other - self).empty?
  end
end
Some test cases:
[1, 2, 3].contains?([2, 3]) => true
[1, 2, 3].contains?([]) => true
[1, 2, 3].contains?([1, 2, 3, 4]) => false
[].contains?([]) => true
[].contains?([1, 2, 3, 4]) => false
Updated by cha1tanya (Prathamesh Sonpatki) over 5 years ago
- Description updated (diff)
Updated by Eregon (Benoit Daloze) over 5 years ago
contains? sounds like it would check if an element is part of the Array, i.e., like include?.
So I think the name is problematic.
superset? would be a better name, and that's defined on Set.
So I think in this case it's better to simply use a Set, like Set[1,2,3].superset?(Set[2,3]).
FWIW, Array#- already uses a Hash or Set internally, so an efficient implementation has to use a Set for such functionality anyway.
I don't think it's valuable to hide this behavior by having a method on Array; I think it's better to be explicit and use Set directly in this case.
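As a minimal sketch of the explicit Set-based check suggested here (the variable names are illustrative and not from the thread):

```ruby
require "set"

# Illustrative data; any arrays of hashable values would do.
haystack = [1, 2, 3]
needles  = [2, 3]

# Convert both sides explicitly, then ask the Set for the superset check.
result = haystack.to_set.superset?(needles.to_set)
puts result # prints "true"
```

The conversion is visible at the call site, which is the point being argued: the reader can see that a hash-backed structure is built for the check.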
Updated by cha1tanya (Prathamesh Sonpatki) over 5 years ago
Agreed that superset? is a better name. Here is the actual use case:
Project.pluck(:id).contains?([1,2,3])
Where Project.pluck returns an array of integers.
To use set, I have to convert the array to a set.
Project.pluck(:id).to_set.superset?(Set[1,2,3])
I wanted to avoid creating Set objects just for the purpose of this check, so my motivation was to have such a method on Array.
Updated by shevegen (Robert A. Heiler) over 5 years ago
I think the use case is ok - you want to find out whether an Array, or
an Array-like object, is contained in another object (container in a
container in a container ...).
My biggest problem with this is that #contains? is similar in meaning
to #include?, even though they do slightly different things. (By the way,
I think for consistency it would have to be #contain? rather than
#contains?, similar to #include? rather than #includes?.)
I am not sure if superset is a better name; to me it conveys a different
meaning than #contains?; but one advantage that superset may have is
that it would not conflict with e.g. #include?, whereas I feel that
#contains? would do so more.
cha1tanya (Prathamesh Sonpatki) wrote:
I wanted to avoid creating Set objects just for the purpose of this
check so my motivation was to have such a method on Array.
#contains? would be simpler to read in your example indeed :) - but
I think it could lead to Ruby users wondering when to use #include?
and when to use e.g. #contain?, and I am not sure this would be
good. We already have people wondering when to use strings and when
to use symbols in a Hash. Keeping things simple(r) would be good,
IMO. ;)
Updated by Eregon (Benoit Daloze) over 5 years ago
cha1tanya (Prathamesh Sonpatki) wrote:
I wanted to avoid creating Set objects just for the purpose of this check so my motivation was to have such a method on Array.
Array#- and any efficient (O(n+m) and not O(n*m), n the size of the LHS, m the size of the RHS) implementation of a superset check needs to use some kind of Hash internally.
So you might save a Set allocation, but internally it has to allocate a Hash anyway, so I don't think there is much of a difference, performance-wise.
I would recommend defining a helper method like you did above if you use this frequently in your code base.
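The complexity argument above can be illustrated with a rough sketch. Both method names are hypothetical helpers written for this example, not existing Ruby APIs, and the explicit Hash is only a stand-in for whatever structure a real implementation would use internally.

```ruby
# Naive check: each lookup scans the whole array, so roughly O(n*m).
def naive_superset?(haystack, needles)
  needles.all? { |item| haystack.include?(item) }
end

# Hash-backed check: build a lookup table once (O(n)),
# then do m constant-time lookups, so roughly O(n+m).
def hashed_superset?(haystack, needles)
  seen = {}
  haystack.each { |item| seen[item] = true }
  needles.all? { |item| seen.key?(item) }
end

puts naive_superset?([1, 2, 3], [2, 3])        # prints "true"
puts hashed_superset?([1, 2, 3], [1, 2, 3, 4]) # prints "false"
```

Either way a Hash gets allocated in the efficient version, which is why avoiding the Set allocation is unlikely to change much.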
Updated by ahvetm (Erik Madsen) over 5 years ago
I think this is a great proposal in terms of having one of those nice, useful methods easily available directly on the class you're interacting with, similar to Array#last, which can be quite verbose in other languages.
I would propose the name #include_all? or something similar to make it obvious that you're comparing it with another array.
Updated by sawa (Tsuyoshi Sawada) over 5 years ago
I am not a fan of this feature, but by analogy with Range, cover? may be a better name.
Updated by Dan0042 (Daniel DeLorme) about 5 years ago
There's some similarity with #15198, to the point that I can re-use my suggestion from there:
It might make sense to use ary1.to_set.superset?(ary2). That way it makes explicit the fact that ary1 must be converted to a set. But Set#superset? would have to support any Enumerable.
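As a sketch of what an Enumerable-accepting superset check could look like, here is a standalone helper. The name superset_of_enum? is hypothetical and chosen for this example; it is not an existing Set method, and the thread only says that Set#superset? would need to change to support this.

```ruby
require "set"

# Hypothetical helper: true when every element yielded by enum is in set.
# Works with any Enumerable (Array, Range, Set, ...).
def superset_of_enum?(set, enum)
  enum.all? { |item| set.include?(item) }
end

ary1 = [1, 2, 3]
puts superset_of_enum?(ary1.to_set, [2, 3]) # prints "true"
puts superset_of_enum?(ary1.to_set, 2..4)   # prints "false"
```

Only the receiver is converted to a Set here; the argument is iterated as-is.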
Updated by JustJosh (Joshua Stowers) about 5 years ago
#cover?
I do not think we should use the name cover? because the types of arguments accepted by Range#cover? would be incompatible with this use case.
For example:
(1..3).cover?(2) # true
But if Array's implementation worked similarly, we would have the following issue:
[1, 2, 3].cover?(2) # true by design of Range#cover?
[1, 2, 3].cover?([2]) # true because all values in argument are also in self
[1, [2], 3].cover?([2]) # ?
#superset?
It is worth noting that the unique nature of sets would affect the expected behavior of this method:
[1, 2, 3].contains?([1, 2, 2]) # false because self contains only one 2
[1, 2, 3].superset?([1, 2, 2]) # true because duplicates are ignored
In my opinion, the unambiguous behavior of superset? is preferable.
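To make the difference concrete, here is a small sketch of the two readings. Both method names are hypothetical and exist only for this example. Note that the implementation from the original description, (other - self).empty?, already ignores duplicates, so it follows the superset? reading. The counting variant assumes Enumerable#tally, which requires Ruby 2.7 or later.

```ruby
# Set-like reading: duplicates in other are ignored.
def set_like_superset?(ary, other)
  (other - ary).empty?
end

# Multiset reading: ary must hold at least as many copies of each value.
def multiset_contains?(ary, other)
  counts = ary.tally
  other.tally.all? { |value, needed| counts.fetch(value, 0) >= needed }
end

puts set_like_superset?([1, 2, 3], [1, 2, 2]) # prints "true"
puts multiset_contains?([1, 2, 3], [1, 2, 2]) # prints "false"
```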
Array/Set
Although I personally like array.superset?() more than array.to_set.superset?(), I think Set would benefit from more compatibility with Enumerable. So I agree with @Dan0042.
I recommend that we update Set#superset?, proper_superset?, subset?, and proper_subset? to accept any Enumerable.
Updated by sawa (Tsuyoshi Sawada) about 5 years ago
JustJosh (Joshua Stowers) wrote:
I do not think we should use the name cover? because the types of arguments accepted by Range#cover? would be incompatible with this use case.
For example:
(1..3).cover?(2) # true
But if Array's implementation worked similarly, we would have the following issue:
[1, 2, 3].cover?(2) # true by design of Range#cover?
[1, 2, 3].cover?([2]) # true because all values in argument are also in self
[1, [2], 3].cover?([2]) # ?
When the argument is an array, it should be understood as the usual case; i.e., it should be interpreted as the $\subset$ relation. Otherwise, it should be considered as the abbreviated form; in such case, it should be interpreted as the $\in$ relation. So [1, [2], 3].cover?([2]) should be unambiguously false. To achieve the interpretation that leads to the true output, you need to write:
[1, [2], 3].cover?([[2]])
That is exactly analogous to how Range#cover? works, and there hasn't been a problem there.
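The dispatch rule described here could be sketched as follows. array_cover? is a hypothetical standalone method written for this example, not an existing or proposed Ruby API, and it reuses the (other - self).empty? check from the original description for the array case.

```ruby
# Hypothetical sketch: an Array argument means "subset of ary",
# any other argument means "element of ary" (the abbreviated form).
def array_cover?(ary, arg)
  if arg.is_a?(Array)
    (arg - ary).empty?
  else
    ary.include?(arg)
  end
end

puts array_cover?([1, 2, 3], 2)         # prints "true"
puts array_cover?([1, 2, 3], [2])       # prints "true"
puts array_cover?([1, [2], 3], [2])     # prints "false"
puts array_cover?([1, [2], 3], [[2]])   # prints "true"
```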
Updated by JustJosh (Joshua Stowers) about 5 years ago
@sawa - I 100% agree with what you are saying. I did a poor job expressing my concerns.
I am nervous that singling out array arguments in the abbreviated form could result in confusion and misuse. This is not a problem for Range#cover?, because a range cannot be composed of other ranges.
Since Set#superset? does not have an abbreviated form, there is less opportunity for misuse.
Updated by sawa (Tsuyoshi Sawada) about 5 years ago
JustJosh (Joshua Stowers) wrote:
@sawa - I 100% agree with what you are saying. I did a poor job expressing my concerns.
I am nervous that singling out array arguments in the abbreviated form could result in confusion and misuse. This is not a problem for Range#cover?, because a range cannot be composed of other ranges.
Since Set#superset? does not have an abbreviated form, there is less opportunity for misuse.
I understand your point. If that is a concern, then we can simply not allow the abbreviated form for Array#cover?; raise an argument error when the argument is not an array, which is probably what the original proposal in this thread assumed. This somewhat weakly breaks the analogy with Range#cover?, but it should not be a big deal.
(And still, there is an alternative view to not worry about that too much, and just allow the abbreviated form.)
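A strict variant that rejects the abbreviated form might look like the sketch below. The method name and the error message are assumptions made for this example; the thread does not specify either.

```ruby
# Hypothetical strict check: only Array arguments are accepted.
def strict_array_cover?(ary, other)
  unless other.is_a?(Array)
    raise ArgumentError, "expected an Array, got #{other.class}"
  end
  (other - ary).empty?
end

puts strict_array_cover?([1, 2, 3], [2, 3]) # prints "true"

begin
  strict_array_cover?([1, 2, 3], 2)
rescue ArgumentError => e
  puts "rejected: #{e.message}"
end
```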