Feature #16990
Sets: operators compatibility with Array
Description
Currently, set <operator> array works fine:
Set[1] + [2] # => Set[1, 2]
Nothing works in the reverse order:
[1] + Set[2] # => no implicit conversion of Set into Array
# should be:
[1] + Set[2] # => [1, 2]
set-like operators
Note that the situation is particularly frustrating for &, | and -.
If someone wants to do ary - set, one has to write ary - set.to_a, which will internally do a to_set, so what actually happens is set.to_a.to_set! (This assumes ary is larger than SMALL_ARRAY_LEN == 16; otherwise the operation still runs in O(ary * set) instead of O(ary).)
The same holds for & and |; see the order issue as to why this cannot (officially) be done any other way.
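The round trip described above can be sidestepped today by testing membership against the set directly. This is a minimal sketch; `array_minus_set` is a hypothetical helper name, not part of Ruby, and the sample values are made up for illustration:

```ruby
require "set"

# Hypothetical helper (not part of Ruby): removes from ary every element
# that is in set, without first building a temporary array from the set.
def array_minus_set(ary, set)
  ary.reject { |element| set.include?(element) }
end

ary = [1, 2, 2, 3, 4]
set = Set[2, 4]

via_to_a   = ary - set.to_a              # the workaround from the description
via_reject = array_minus_set(ary, set)   # membership test against the set

p via_to_a    # [1, 3]
p via_reject  # [1, 3]
```

Like Array#-, the helper keeps duplicates that are not in the set and preserves the array's order.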
Reminder:
ary & ary.reverse # => ary
Set[*ary] & Set[*ary.reverse] # => Set[*ary.reverse], officially order is indeterminate
Updated by marcandre (Marc-Andre Lafortune) over 4 years ago
- Related to Feature #16989: Sets: need ♥️ added
Updated by Eregon (Benoit Daloze) about 4 years ago
Isn't [1].to_set + Set[2] a good workaround here?
Updated by marcandre (Marc-Andre Lafortune) about 4 years ago
Eregon (Benoit Daloze) wrote in #note-2:
Isn't [1].to_set + Set[2] a good workaround here?
Did you mean only for +, or for all operators? Take - for example: in ary.to_set - set, the to_set is wasteful, and I don't want to write it in the first place 😆
An important question is: should the result be a Set or an Array? In most cases, if there is interoperability, it won't matter that much. I think that array <op> set should return an array. In my understanding, array & set should be a great way to say "I have this array, and I want to keep only those elements that match the set". It should just work.
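Until array & set works directly, the "keep only the elements that match the set" reading can be approximated with a filter. A sketch under assumptions: `array_and_set` is a hypothetical helper, and the `uniq` call is there only to mirror the de-duplication that Array#& performs:

```ruby
require "set"

# Hypothetical helper (not part of Ruby): keeps the elements of ary that
# belong to set, in array order. uniq mirrors Array#&, which also removes
# duplicates from its result.
def array_and_set(ary, set)
  ary.select { |element| set.include?(element) }.uniq
end

allowed = Set[:a, :c]
list    = [:c, :a, :b, :a]

kept        = array_and_set(list, allowed)
via_array_and = list & allowed.to_a   # what has to be written today

p kept           # [:c, :a]
p via_array_and  # [:c, :a]
```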
I'll repeat a use case: a bunch of constants in RuboCop that were arrays. Changing them to Sets would break a bunch of code that does my_list_of_stuff + OtherClass::STUFF, for example.
Updated by knu (Akinori MUSHA) about 4 years ago
We can probably define Set#to_ary if it's OK, and Array#+ will be able to deal with a set. Let us think about the downsides...
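To see what defining to_ary would buy, here is a small experiment. It is only a sketch: `ArrayConvertibleSet` is a hypothetical subclass used so that the real Set class stays untouched, and it says nothing about how Set#to_ary would actually be implemented:

```ruby
require "set"

# Hypothetical subclass, for illustration only.
class ArrayConvertibleSet < Set
  # to_ary is the implicit-conversion hook that Array operators call
  # on a non-Array argument.
  def to_ary
    to_a
  end
end

set = ArrayConvertibleSet[2, 3]

sum  = [1] + set          # no TypeError any more
diff = [1, 2, 3] - set

array = [1]
array += set              # the variable still holds an Array

p sum    # [1, 2, 3]
p diff   # [1]
p array  # [1, 2, 3]
```

Note that to_ary is also consulted in other places (multiple assignment and Array#flatten, for example), which is one kind of downside to think about.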
Updated by knu (Akinori MUSHA) about 4 years ago
As for the result type, I think Array operators should return arrays. Otherwise array += set would turn the variable array into a Set, and that would be a surprise.
Updated by Eregon (Benoit Daloze) about 4 years ago
Because Array and Set are fundamentally different, I think ensuring both operands have the same type, explicitly or internally, is completely reasonable.
I don't expect Array "set" operations to magically know about the Set representation.
Having something that is both fast for include? and + means it needs to keep both a set-like representation and an array-like representation, which is a memory trade-off that FastArray in your PRs makes.
my_list_of_stuff + OtherClass::STUFF
Which is what type?
set + array already does set + array.to_set and returns a Set (dedup'd elements).
array + set is not so well defined.
Does it do array + set.to_a with duplicated elements? And then that makes the return type inconsistent with set + array.
Doing array.to_set + set implicitly doesn't seem nice either, as @knu (Akinori MUSHA) said.
I think a few practical examples from RuboCop would help to figure out what makes the most sense.
Maybe having a specialized abstraction like FastArray is what makes most sense for RuboCop if non-set operations are used frequently.
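The candidate readings of array + set discussed in this comment can be compared side by side with operations that already work today. The values are made up for illustration:

```ruby
require "set"

set   = Set[2, 3]
array = [1, 2]

# Already supported: returns a Set, so the shared element 2 appears once.
set_plus_array = set + array

# Reading 1: concatenate, keeping duplicates, and return an Array.
array_plus_to_a = array + set.to_a

# Reading 2: convert the array first and return a Set.
array_to_set_plus_set = array.to_set + set

p set_plus_array.class         # Set
p array_plus_to_a              # [1, 2, 2, 3]
p array_to_set_plus_set.class  # Set
```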
Updated by marcandre (Marc-Andre Lafortune) about 4 years ago
knu (Akinori MUSHA) wrote in #note-4:
We can probably define Set#to_ary if it's OK, and Array#+ will be able to deal with a set. Let us think about the downsides...
While this may be a good thing, and would at least make them interoperable, it is still quite inefficient... For example, Array#- would call to_ary, which would create a temporary array from the hash, only to create a temporary hash/set...
As for the result type, I think Array operators should return arrays. Otherwise array += set would turn the variable array to a Set and that would be a surprise.
Indeed. I'm glad we agree 👍
Updated by marcandre (Marc-Andre Lafortune) about 4 years ago
Eregon (Benoit Daloze) wrote in #note-6:
I don't expect Array "set" operations to magically know about the Set representation.
I would like to expect it 😆
array + set is not so well defined.
Does it do array + set.to_a
Yes
And then that makes the return type inconsistent with set + array.
Yes. Note that it is already inconsistent (Set vs raising an error).
Updated by mame (Yusuke Endoh) about 4 years ago
I expect that ary + set return a Set, not an Array, unless it raises an exception.
Otherwise array += set would turn the variable array to a Set and that would be a surprise.
It is a surprise if ary + set returns a collection object that is ordered and that has multiple instances in its elements.
To me, ary += set looks like int_val += float. It is not a surprise to me that it changes the type of int_val.
Updated by knu (Akinori MUSHA) about 4 years ago
mame (Yusuke Endoh) wrote in #note-9:
I expect that ary + set return a Set, not an Array, unless it raises an exception.
Otherwise array += set would turn the variable array to a Set and that would be a surprise.
It is a surprise if ary + set returns a collection object that is ordered and that has multiple instances in its elements.
I will use array | something if I mean to deduplicate the result, so array + something to me is the way to explicitly say I want to simply concatenate two lists (should array + set be defined).
To me, ary += set looks like int_val += float. It is not a surprise to me that it changes the type of int_val.
The coercion protocol in Numeric classes works like that so as not to lose precision. In that sense, I think Array is to Set as Float is to Integer, because a set can be converted to an array without losing information but not the other way around, so you could argue that set + array and array + set should both return an array, just as int + float and float + int both result in a float.
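The numeric side of that analogy is easy to check in plain Ruby, with no assumptions beyond core Integer and Float:

```ruby
int_plus_float = 1 + 2.5
float_plus_int = 2.5 + 1

# Integer#coerce converts both operands to the wider type (Float here).
coerced = 1.coerce(2.5)

p int_plus_float  # 3.5
p float_plus_int  # 3.5
p coerced         # [2.5, 1.0]
```

Whichever operand comes first, the result is the type that can represent both values.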
Updated by Student (Nathan Zook) over 3 years ago
mame (Yusuke Endoh) wrote in #note-9:
I expect that ary + set return a Set, not an Array, unless it raises an exception.
Otherwise array += set would turn the variable array to a Set and that would be a surprise.
It is a surprise if ary + set returns a collection object that is ordered and that has multiple instances in its elements.
To me, ary += set looks like int_val += float. It is not a surprise to me that it changes the type of int_val.
I found implicit conversions of values in K&R to be an abomination BEFORE I became aware of the many, many bugs & security issues that came from them.
In the world of objects, the result of a + b is expected to have either the same class as a, or the class of the nearest common ancestor of the classes of a & b. Of course, in the Ruby type system, this is Object. Philosophically, it is pretty clear that Array is a descendant class of Set. Arrays add ordering, and thereby multiplicity, and otherwise behave as Sets. HOWEVER--at some point it is important, even critical, to bow to the culture. The culture of programming (outside of LISP?) is that Arrays are the thing that "everyone" works with "all the time", while Sets are reserved for specialists or special situations. As a mathematician, I am loath to use Arrays when I mean Sets, but then again, I never use ':' as a string quote character, either...
I find the exception that "everyone" does with int_val += float "unsurprising" because "everyone" has been taught it since K&R. It is horrible practice, and needs to be avoided. For the case of [1,2] += {1,2}, it is far from clear to me that the operation is sufficiently well-defined in the industry to be able to choose a solution. No matter what is chosen, the results are going to surprise a lot of folks. Throwing an exception says "we're not going to assume we can read your mind on this one."
My solution would be to define Array.add([1,2], {1,2}) and Set.add([1,2], {1,2}). Internally, the obnoxious conversions back & forth can be avoided. Externally, it's really clear what will come back.
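A rough sketch of what such explicit functions could look like in plain Ruby. Everything here is hypothetical: `ExplicitAdd`, `array_add` and `set_add` are illustrative names standing in for the proposed Array.add and Set.add, which do not exist in Ruby:

```ruby
require "set"

# Hypothetical module; neither method exists in Ruby.
module ExplicitAdd
  module_function

  # Always returns an Array: plain concatenation, duplicates kept.
  # Each operand is iterated once, with no intermediate to_a or to_set.
  def array_add(left, right)
    result = []
    left.each  { |element| result << element }
    right.each { |element| result << element }
    result
  end

  # Always returns a Set: union of both operands.
  def set_add(left, right)
    Set.new(left).merge(right)
  end
end

as_array = ExplicitAdd.array_add([1, 2], Set[1, 2])
as_set   = ExplicitAdd.set_add([1, 2], Set[1, 2])

p as_array        # [1, 2, 1, 2]
p as_set.class    # Set
```

The return type is determined by the method name rather than by operand order, which is the point of the proposal.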