Feature #14476
openAdding same_all? for checking whether all items in an Array are same
Description
In this issue, I propose to introduce same_all?
instance method of Array
class. This new method checks whether all items in the receiver are the same.
Today, I needed to write a code to judge whether all items in an Array
are the same. I wanted to make the following expression, which I've written first, more efficient.
ary.all? {|e| e.foo == ary[0].foo }
I considered about the following simpler case, too.
ary.all? {|e| e == ary[0] }
As I discussed with some CRuby committers and my colleagues at Speee, Inc., I found that both cases can be written more efficiently as follows:
# for the 1st case
ary.empty? || ary[0].foo.yield_self {|e0| ary[1..-1].all? {|e| e.foo == e0 } }
# for the 2nd case
ary.empty? || ary[0..-2] == ary[1..-1]
However, it is not easy to understand the intent of either expression, which is to check whether all items are the same.
I want to give this feature a clear name to make code readable. And I think it should be provided as a core feature because it can be made more efficient by being implemented in C language.
The benchmark script is: https://gist.github.com/mrkn/26a0fcfc431a45fe809fbbef95aceaf5
I used it to find the efficient expressions. The example result of this benchmark on my MacBook Pro is here.
$ ruby -v bench.rb
ruby 2.6.0dev (2018-02-14 trunk 62402) [x86_64-darwin16]
----------------------------------------
Benchmark case: shuffle
user system total real
all?-method-1 0.001525 0.000088 0.001613 ( 0.001632)
all?-method-2 0.000489 0.000031 0.000520 ( 0.000565)
all?-method-3 0.000444 0.000039 0.000483 ( 0.000482)
all?-item 0.000325 0.000078 0.000403 ( 0.000402)
opt-method 0.655959 0.033814 0.689773 ( 0.708515)
opt-item 0.000316 0.000001 0.000317 ( 0.000317)
----------------------------------------
Benchmark case: tail0
user system total real
all?-method-1 9.412810 0.231126 9.643936 ( 9.681118)
all?-method-2 5.375075 0.137908 5.512983 ( 5.754550)
all?-method-3 5.226132 0.167640 5.393772 ( 5.507031)
all?-item 0.873700 0.007545 0.881245 ( 0.917210)
opt-method 5.319648 0.172547 5.492195 ( 5.633140)
opt-item 0.174349 0.001974 0.176323 ( 0.183002)
----------------------------------------
Benchmark case: head0
user system total real
all?-method-1 0.002421 0.000068 0.002489 ( 0.002489)
all?-method-2 0.002169 0.000213 0.002382 ( 0.002382)
all?-method-3 0.001624 0.000026 0.001650 ( 0.001651)
all?-item 0.000623 0.000001 0.000624 ( 0.000624)
opt-method 4.779120 0.146312 4.925432 ( 4.951167)
opt-item 0.000629 0.000001 0.000630 ( 0.000629)
----------------------------------------
Benchmark case: all1
user system total real
all?-method-1 9.379650 0.255865 9.635515 ( 9.683078)
all?-method-2 4.950280 0.150659 5.100939 ( 5.140174)
all?-method-3 4.857898 0.129125 4.987023 ( 5.003142)
all?-item 0.694113 0.001295 0.695408 ( 0.702370)
opt-method 5.032373 0.121708 5.154081 ( 5.189599)
opt-item 0.170540 0.002069 0.172609 ( 0.180343)
Updated by dsferreira (Daniel Ferreira) almost 7 years ago
Usually I use the following code to achieve the same purpose:
ary.uniq.size < 2
I didn't test but if you need to use that complex code I expect this example to be less performant.
Instead of Array#same_all?
why not Array#uniq?
Updated by Eregon (Benoit Daloze) almost 7 years ago
Why would it be more efficient in C?
A sufficiently smart JIT (or of course it can be done manually) could turn
ary.all? {|e| e.foo == ary[0].foo }
into (assuming foo has no side effects):
e0 = ary[0].foo
ary.all? {|e| e.foo == e0 }
and then the only performance difference is the extra comparison of ary[0].foo with itself,
which could be avoided by:
e0 = ary[0].foo
(1...ary.size).all? {|i| ary[i].foo == e0 }
Updated by dsferreira (Daniel Ferreira) almost 7 years ago
I believe we could even do:
ary.uniq?(&:foo)
Updated by mrkn (Kenta Murata) almost 7 years ago
uniq
scans all elements, whereas all?
and ==
don't.
And uniq
allocates a new array, so uniq
is always slower than all?-method-3
and opt-item
in all cases of my benchmark.
Eregon, if we write it in C, we can avoid using all?
. It reduces the number of block calls.
Updated by dsferreira (Daniel Ferreira) almost 7 years ago
mrkn (Kenta Murata) wrote:
uniq scans all elements, whereas all? and == don't.
And uniq allocates a new array, so uniq is always slower than all?-method-3 and opt-item in all cases of my benchmark.
That makes sense.
I suppose that Array#uniq?
could work as Array#all?
returning once false
?
Updated by mrkn (Kenta Murata) almost 7 years ago
I suppose that Array#uniq? could work as Array#all? returning once false?
We don't have Array#uniq?.
Do you suggest adding such a new method?
Updated by dsferreira (Daniel Ferreira) almost 7 years ago
mrkn (Kenta Murata) wrote:
We don't have Array#uniq?.
Do you suggest adding such a new method?
That is what I meant with:
"Instead of Array#same_all?
why not Array#uniq?
"
Array#uniq?
makes sense to me and maps well with the logic of ary.uniq.size < 2
that I mentioned earlier.
If we return earlier from the method at the first false
we will have what you are looking for isn't that right, or am I missing something?
I also think it is a method that will simplify some code and people will use it.
I will use it for sure.
Developing it in C is even better because it will guarantee the best performance results.
Updated by duerst (Martin Dürst) almost 7 years ago
Four points.
First, re. uniq?:
mrkn (Kenta Murata) wrote:
We don't have Array#uniq?.
Do you suggest adding such a new method?
I remember having suggested a method named uniq?
. Unfortunately, I didn't find the issue (yet). But the meaning was different. The method checked whether all elements were unique, not whether all elements were the same. In other words, it was:
def uniq?
uniq.length == length
end
In an efficient implementation, it would return false
as soon as it detected two elements that were the same, and true
otherwise.
Second, about naming:
same_all?
doesn't sound good to me. The two words 'same' and 'all' don't appear in this order in English. What about all_same?
, or better even, all_equal?
?
Third, about need:
It's clear that when you implement something in C, it should get faster. If that were the only criterion for accepting new methods into Ruby, we'd have many more methods. The way you describe it, my understanding is that you only just today had a need for such functionality, not before. I think there should be some indication that this is needed regularly in order to accept it.
Fourth, about alternative ways to write it. What about
ary.each_cons(2).all? { |a, b| a == b }
Updated by nobu (Nobuyoshi Nakada) almost 7 years ago
Array#uniq?
sounds like that all elements are unique, ary.uniq.size == ary.size
.
Updated by dsferreira (Daniel Ferreira) almost 7 years ago
nobu (Nobuyoshi Nakada) wrote:
Array#uniq?
sounds like that all elements are unique,ary.uniq.size == ary.size
.
I agree it makes more sense for that situation.
Updated by shevegen (Robert A. Heiler) almost 7 years ago
I agree with Martin. Perhaps alternatives such as all_same?
or
even better, all_equal?
could be explored/used.
Nobu wrote:
Array#uniq?
sounds like that all elements are unique,ary.uniq.size == ary.size
.
Agreed.
Perhaps someone can ask matz in the dev meeting about the names. :)
Updated by matz (Yukihiro Matsumoto) over 6 years ago
- Status changed from Assigned to Rejected
Rejected. Unfortunately, the incompatibility this proposal would bring is too big.
Besides that, we have performance concern too.
Matz.
Updated by matz (Yukihiro Matsumoto) over 6 years ago
- Status changed from Rejected to Assigned
- Assignee changed from matz (Yukihiro Matsumoto) to mrkn (Kenta Murata)
Ah, sorry. Posted to the wrong proposal.
Regarding this issue, we have the naming issue. I agree to add the functionality.
Matz.
Updated by matz (Yukihiro Matsumoto) over 6 years ago
I am not satisfied with any of the candidates.
- same_all? - weird word order
- all_same? - word order is OK, but there's an ambiguity that "same" means equality or identical.
- all_equal? - the name suggests comparison is done by
equal?
- uniform? - I like this best, but can uniform mean equality?
Matz.
Updated by duerst (Martin Dürst) over 6 years ago
matz (Yukihiro Matsumoto) wrote:
- uniform? - I like this best, but can uniform mean equality?
When seeing uniform?, one thing I think about is that it means "Are all elements of the same type?".
Example:
[1, 2, 3].uniform? % => true
[:a, :b, :c].uniform? % => true
[1, :b, 'c'].uniform? % => true
Updated by duerst (Martin Dürst) over 6 years ago
duerst (Martin Dürst) wrote:
Example:
[1, 2, 3].uniform? % => true
[:a, :b, :c].uniform? % => true
[1, :b, 'c'].uniform? % => true
Sorry, this should have been:
[1, 2, 3].uniform? % => true
[:a, :b, :c].uniform? % => true
[1, :b, 'c'].uniform? % => false
Updated by fatkodima (Dima Fatko) about 4 years ago
How about all_equivalent?
or all_duplicates?
?
Updated by marcandre (Marc-Andre Lafortune) about 4 years ago
I hope this new feature never makes it.
enum.each_cons(2).all? { _1 == _2 }
is simply superior and more versatile.
Moreover, I'm not convinced that the frequency of this use warrants a new feature, especially that good alternatives exist and all have similar performance.
I wish we could write enum.each_cons(2).all?(&:==)
, but I'll add that to issues with yielding arity that nobody seems interested in discussing.