Project

General

Profile

Feature #16039

Array#contains? to check if one array contains another array

Added by cha1tanya (Prathamesh Sonpatki) 10 months ago. Updated 6 months ago.

Status:
Open
Priority:
Normal
Assignee:
-
Target version:
-
[ruby-core:94115]

Description

I woud like to propose Array#contains? which will check if the one array is part of another array.
Implementation can be as follows:

def contains?(other)
  (other - self).empty?
end

Some test cases:

[1, 2, 3].contains?([2, 3]) => true
[1, 2, 3].contains?([]) => true
[1, 2, 3].contains?([1, 2, 3, 4]) => false
[].contains?([]) => true
[].contains?([1, 2, 3, 4]) => false
#1

Updated by cha1tanya (Prathamesh Sonpatki) 10 months ago

  • Description updated (diff)

Updated by Eregon (Benoit Daloze) 10 months ago

contains? sounds like it would check if an element is part of the Array, i.e., like include?.
So I think the name is problematic.

superset? would be a better name, and that's defined on Set.
So I think in this case it's better to simply use a Set like Set[1,2,3].superset?(Set[2,3]).

FWIW, Array#- already uses a Hash or Set internally, so an efficient implementation has to use a Set for such functionality anyway.
I don't it's valuable to hide this behavior by having a method on Array, I think it's better to be explicit and use Set directly in this case.

Updated by cha1tanya (Prathamesh Sonpatki) 10 months ago

Agree that superset is better name. Here is the actual use case:

Project.pluck(:id).contains?([1,2,3])

Where Project.pluck returns an array of integers.

To use set, I have to convert the array to a set.

Project.pluck(:id).to_set.superset?(Set[1,2,3])

I wanted to avoid creating Set objects just for the purpose of this check so my motivation was to have such method on Array.

Updated by shevegen (Robert A. Heiler) 10 months ago

I think the use case is ok - you want to find out whether an Array or
an Array-like object, is contained in another object (container in a
container in a container ...).

My biggest problem with this is that #contains? is similar in meaning
to #include?, even though they do slightly different things. (By the way
I think for consistency, it would have to be #contain? rather than
#contains?, similar to #include? rather than #includes?).

I am not sure if superset is a better name; to me it conveys a different
meaning than #contains?; but one advantage that superset may have is
that it would not conflict with e. g. #include?, whereas I feel that
#contains? would do so more.

I wanted to avoid creating Set objects just for the purpose of this
check so my motivation was to have such method on Array.

#contains? would be simpler to read in your example indeed :) - but
I think it could lead to ruby users wondering when to use #include?
and when to use e. g. #contain?, and I am not sure this would be
good. We already have people wondering when to use strings and when
to use symbols in a Hash. Keeping things simple(r) would be good,
IMO. ;)

Updated by Eregon (Benoit Daloze) 10 months ago

cha1tanya (Prathamesh Sonpatki) wrote:

I wanted to avoid creating Set objects just for the purpose of this check so my motivation was to have such method on Array.

Array#- and any efficient (O(n+m) and not O(n*m), n the size of the LHS, m the size of the RHS) implementation of a superset check needs to use some kind of Hash internally.
So you might save a Set allocation, but internally it has to allocate a Hash anyway, so I don't think there is much of a difference, performance-wise.

I would recommend defining a helper method like you did above if you use this frequently in your code base.

Updated by ahvetm (Erik Madsen) 10 months ago

I think this is a great proposal in terms of having one of those nice, useful methods easily available directly on the class you're interacting with, similar to Array#last which can quite verbose in other languages.

I would propose the name #include_all? or something similar to make it obvious that you're comparing it with another array.

Updated by sawa (Tsuyoshi Sawada) 10 months ago

I am not a fan of this feature, but by analogy from Range, cover? may be a better name.

Updated by Dan0042 (Daniel DeLorme) 6 months ago

There's some similarity with #15198, to the point that I can re-use my suggestion from there:

It might make sense to use ary1.to_set.superset?(ary2). That way it makes explicit the fact that ary1 must be converted to a set. But Set#superset? would have to support any Enumerable.

Updated by JustJosh (Joshua Stowers) 6 months ago

#cover?

I do not think we should use the name cover? because the types of arguments accepted by Range#cover? would be incompatible with this use case.

For example:

(1..3).cover?(2) # true

But if Array's implementation worked similarly, we would have the following issue:

[1, 2, 3].cover?(2) # true by design of Range#cover?
[1, 2, 3].cover?([2]) # true because all values in argument are also in self
[1, [2], 3].cover?([2]) # ?

#superset?

It is worth noting that the unique nature of sets would affect the expected behavior of this method:

[1, 2, 3].contains?([1, 2, 2]) # false because self contains only one 2
[1, 2, 3].superset?([1, 2, 2]) # true because duplicates are ignored 

In my opinion, the unambiguous behavior of superset? is preferable.

Array/Set

Although I personally like array.superset?() more than array.to_set.superset?(), I think Set would benefit from more compatibility with Enumerable. So I agree with @Dan0042.

I recommend that we update Set#superset?, proper_superset?, subset?, and proper_subset? to accept any Enumerable.

Updated by sawa (Tsuyoshi Sawada) 6 months ago

JustJosh (Joshua Stowers) wrote:

I do not think we should use the name cover? because the types of arguments accepted by Range#cover? would be incompatible with this use case.

For example:

(1..3).cover?(2) # true

But if Array's implementation worked similarly, we would have the following issue:

[1, 2, 3].cover?(2) # true by design of Range#cover?
[1, 2, 3].cover?([2]) # true because all values in argument are also in self
[1, [2], 3].cover?([2]) # ?

When the argument is an array, it should be understood as the usual case; i.e., it should be interpreted as the $\subset$ relation. Otherwise, it should be considered as the abbreviated form; in such case, it should be interpreted as the $\in$ relation. So

[1, [2], 3].cover?([2])

should be unambiguously false. To achieve the interpretation that leads to the true output, you need to write:

[1, [2], 3].cover?([[2]])

That is exactly analogous to how Range#cover? works, and there hasn't been a problem there.

Updated by JustJosh (Joshua Stowers) 6 months ago

sawa (Tsuyoshi Sawada) - I 100% agree with what you are saying. I did a poor job expressing my concerns.

I am nervous that singling out array arguments in the abbreviated form could result in confusion and misuse. This is not a problem for Range#cover?, because a range cannot be composed of other ranges.

Since Set#superset? does not have an abbreviated form, there is less opportunity for misuse.

Updated by sawa (Tsuyoshi Sawada) 6 months ago

JustJosh (Joshua Stowers) wrote:

sawa (Tsuyoshi Sawada) - I 100% agree with what you are saying. I did a poor job expressing my concerns.

I am nervous that singling out array arguments in the abbreviated form could result in confusion and misuse. This is not a problem for Range#cover?, because a range cannot be composed of other ranges.

Since Set#superset? does not have an abbreviated form, there is less opportunity for misuse.

I understand your point. If that is a concern, then we can simply not allow the abbreviated form for Array#cover?; raise an argument error when the argument is not an array, which is probably what the original proposal in this thread assumed. This somewhat weakly brakes the analogy from Range#cover?, but it should not be a big deal.

(And still, there is an alternative view to not worry about that too much, and just allow the abbreviated form.)

Also available in: Atom PDF