Feature #16428
Closed
Add Array#uniq?, Enumerable#uniq?
Description
I propose Array#uniq?.
I often need to check if an array has duplicate elements.
This method returns true if no duplicates are found in self, otherwise returns false.
If a block is given, it will use the return value of the block for comparison.
This is equivalent to array.uniq.size == array.size, but faster.
% ~/tmp/r/bin/ruby -rbenchmark/ips -e 'a = Array.new(100) { rand(1000) }; Benchmark.ips { |x| x.report("uniq") { a.uniq.size == a.size }; x.report("uniq?") { a.uniq? } }'
Warming up --------------------------------------
uniq 25.765k i/100ms
uniq? 76.544k i/100ms
Calculating -------------------------------------
uniq 278.144k (± 4.1%) i/s - 1.391M in 5.010858s
uniq? 981.868k (± 5.1%) i/s - 4.975M in 5.081611s
I think the name uniq? is natural because Array already has uniq.
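The semantics described above can be approximated in plain Ruby. This is only a sketch: uniq_sketch? is a hypothetical helper name used for illustration, and it is not the code from the patch. It stops at the first duplicate, which is where the speed advantage over building the full uniq array comes from.

```ruby
require "set"

# Hypothetical helper approximating the proposed uniq?; not the actual patch.
# Returns true if no duplicates are found, otherwise false.
# If a block is given, its return value is used for comparison.
def uniq_sketch?(enum)
  seen = Set.new
  enum.each do |item|
    key = block_given? ? yield(item) : item
    # Set#add? returns nil when the key is already in the set,
    # so we can return as soon as the first duplicate appears.
    return false unless seen.add?(key)
  end
  true
end

puts uniq_sketch?([1, 2, 3])                       # true
puts uniq_sketch?([1, 2, 1])                       # false
puts uniq_sketch?(%w[a B A]) { |s| s.downcase }    # false
```

Like Array#uniq, Set compares elements with hash and eql?, so the sketch should agree with array.uniq.size == array.size for ordinary values.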
Updated by shevegen (Robert A. Heiler) almost 5 years ago
I often need to check if an array has duplicate elements.
Makes sense to me; I have had situations where I needed this
too in the past (including situations for non-unique entries
in an Array), so I agree on the general use case opportunities
in this regard.
Updated by duerst (Martin Dürst) almost 5 years ago
I seem to remember that many years ago I made the same proposal and Nobu created a patch, but unfortunately I can't find any trace of it anymore on this tracker or in my mail.
Anyway, I support this proposal. It's definitely a useful feature, and it's clearly faster than doing it indirectly via #uniq.
Updated by kyanagi (Kouhei Yanagita) almost 5 years ago
- Subject changed from Add Array#uniq? to Add Array#uniq?, Enumerable#uniq?
Following a suggestion of Enumerable#uniq?, I also added Enumerable#uniq? to my patch.
Array#uniq? is left because it is faster than Enumerable#uniq?.
Updated by matz (Yukihiro Matsumoto) over 4 years ago
- Status changed from Open to Feedback
You said, "I often need to check if an array has duplicate elements". But we cannot think of a real-world use case.
Could you elaborate on how to use the proposed #uniq? and its benefit?
Matz.
Updated by kyanagi (Kouhei Yanagita) over 4 years ago
I was developing mobile games, and I met these situations:
A card deck can't have duplicate characters.
i.e. deck.cards.map(&:character_id).uniq.size == deck.cards.size
-> deck.cards.map(&:character_id).uniq?
or deck.cards.uniq?(&:character_id)
When players compose items, each of them should be different.
i.e. materials.map(&:item_id).uniq.size == materials.size
-> materials.map(&:item_id).uniq?
or materials.uniq?(&:item_id)
Another situation:
I developed a registration form for relay runners.
A request body is like this:
# Missing sections are allowed. You can send them later.
[
{ section: 1, name: 'aaa' },
{ section: 3, name: 'bbb' },
{ section: 5, name: 'ccc' },
]
In this case, duplication of section is not allowed.
runners.map(&:section).uniq.size == runners.size
-> runners.map(&:section).uniq?
or runners.uniq?(&:section)
I think uniq? is easier to write and read than x.uniq.size == x.size for expression of no duplication. It's even faster.
This check is also found in Ruby's repository (bundler):
https://github.com/ruby/ruby/blob/master/spec/bundler/support/matchers.rb#L84
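For reference, the relay-runner check can be written today with existing methods. The sketch below assumes the request body has been parsed into an array of hashes with symbol keys, as in the example above; the helper names are hypothetical. Since the elements are hashes, the key is read with r[:section] rather than &:section.

```ruby
# Hypothetical helper: true when no two runners share a :section.
def sections_unique?(runners)
  sections = runners.map { |r| r[:section] }
  sections.uniq.size == sections.size
end

# Same check using Array#uniq with a block, which compares by the block's value.
def sections_unique_by_block?(runners)
  runners.uniq { |r| r[:section] }.size == runners.size
end

runners = [
  { section: 1, name: 'aaa' },
  { section: 3, name: 'bbb' },
  { section: 5, name: 'ccc' },
]

puts sections_unique?(runners)           # true
puts sections_unique_by_block?(runners)  # true
```

Both forms build an intermediate array before comparing sizes, which is the overhead the proposed uniq? would avoid.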
Updated by shyouhei (Shyouhei Urabe) over 4 years ago
kyanagi (Kouhei Yanagita) wrote in #note-5:
I was developing mobile games, and I met these situations:
A card deck can't have duplicate characters.
i.e. deck.cards.map(&:character_id).uniq.size == deck.cards.size
-> deck.cards.map(&:character_id).uniq?
or deck.cards.uniq?(&:character_id)
So you just want to test? Why doesn't deck.cards.map(...).uniq!'s return value work?
When players compose items, each of them should be different.
i.e. materials.map(&:item_id).uniq.size == materials.size
-> materials.map(&:item_id).uniq?
or materials.uniq?(&:item_id)
So you just want to test? Don't you want to show the duplicated materials to the players? Does uniq? help then?
Another situation:
I developed a registration form for relay runners.
A request body is like this:
# Missing sections are allowed. You can send them later.
[
{ section: 1, name: 'aaa' },
{ section: 3, name: 'bbb' },
{ section: 5, name: 'ccc' },
]
In this case, duplication of section is not allowed.
runners.map(&:section).uniq.size == runners.size
-> runners.map(&:section).uniq?
or runners.uniq?(&:section)
So you just want to test? Don't you want to render an error message about which section is duplicated? Does uniq? help then?
I think uniq? is easier to write and read than x.uniq.size == x.size for expression of no duplication. It's even faster.
My main question is: it isn't faster when you render error messages. How do you use it?
This check is also found in Ruby's repository (bundler):
https://github.com/ruby/ruby/blob/master/spec/bundler/support/matchers.rb#L84
Honestly, I don't understand what this matcher is trying to achieve.
Updated by kyanagi (Kouhei Yanagita) over 4 years ago
In my cases, I (server side) only had to check for duplication because the client also has validations.
Legitimate users can't send a request with duplicates, so a detailed error message was not required.
(If needed, I could investigate the logged request.)
uniq!'s return value is also usable, but I think uniq? is more fitting.
(I'd like to check for duplication, not to get a unique array.)
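To make the uniq! alternative concrete: Array#uniq! returns nil when it removed nothing, so its return value can serve as the test. The helper name below is hypothetical, and the dup is an assumption added so the caller's array is left unmodified.

```ruby
# Hypothetical helper built on Array#uniq!, which returns nil
# when no elements were removed (i.e. there were no duplicates).
def no_duplicates_via_uniq_bang?(values)
  # uniq! is destructive, so work on a copy of the caller's array.
  values.dup.uniq!.nil?
end

puts no_duplicates_via_uniq_bang?([1, 2, 3])  # true
puts no_duplicates_via_uniq_bang?([1, 2, 2])  # false
```

This works, but the reader has to know the nil convention of uniq!, which is the readability point being made above.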
Updated by keithrbennett (Keith Bennett) over 3 years ago
I was just going to post this suggestion, but saw that it was already here.
uniq? could be helpful, for example, where you are loading objects from an external source (e.g. from JSON or YAML), and you need to verify that the objects' ids are unique. objects.map(&:id).uniq? is much more expressive, clear, and concise than the lower-level, longer form that might be something like this:
ids = objects.map(&:id)
ids.size == ids.uniq.size
Also, it's consistent with the style of existing methods like empty?, one?, etc.
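A small sketch of the JSON case mentioned above, using only existing methods. The helper name and the "id" key are assumptions for illustration; parsed JSON objects are plain hashes, so the id is read with obj["id"] rather than &:id.

```ruby
require "json"

# Hypothetical helper: parse a JSON array of objects and check
# that their "id" values are all distinct.
def ids_unique?(json_text)
  ids = JSON.parse(json_text).map { |obj| obj["id"] }
  ids.size == ids.uniq.size
end

puts ids_unique?('[{"id": 1}, {"id": 2}, {"id": 3}]')  # true
puts ids_unique?('[{"id": 1}, {"id": 2}, {"id": 1}]')  # false
```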
Updated by gotoken (Kentaro Goto) about 3 years ago
Recently I read a similar discussion elsewhere. They pointed out that:
- in most cases we have something to do with each duplicate element if any duplicate is detected, e.g., reporting all duplicate elements in an error message
- uniq? looks slightly odd because we don't have sort? or clear?
(uniq etymology: Perl function uniq. Originally the Version 3 Unix command uniq.)
These points make sense to me, but sometimes, for back-of-the-envelope calculations, I just want to write code that checks the array for duplicate elements, for example, to check from the irb console whether a particular CSV column meets a unique constraint, as in the example Keith gave.
So instead, I suggest a set of three methods:
- #repeated returns a new Array containing repeated elements. This may be what we need.
- #repeated? returns true if there is a repeated element. This may be faster than !array.repeated.empty? because it can return true immediately when a repetition is detected.
- #no_repeated? returns the negation of #repeated?. This is what we want intuitively, and it is functionally identical to Kouhei's uniq?.
Here I chose the word repeated instead of duplicate so as not to confuse it with the meaning of dup.
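The three suggested methods could be sketched as plain helper functions like this. These are hypothetical illustrations of the suggestion, not existing Ruby methods; the names follow the post, and Enumerable#tally requires Ruby 2.7 or later.

```ruby
require "set"

# Hypothetical helper: returns a new Array of the elements that occur more than once.
def repeated(array)
  # Enumerable#tally (Ruby 2.7+) builds a hash of element => occurrence count.
  array.tally.select { |_, count| count > 1 }.keys
end

# Hypothetical helper: true if any element is repeated.
def repeated?(array)
  seen = Set.new
  # Set#add? returns nil for an element already seen, so any? stops
  # at the first repetition instead of scanning the whole array.
  array.any? { |item| !seen.add?(item) }
end

# Hypothetical helper: negation of repeated?, equivalent to the proposed uniq?.
def no_repeated?(array)
  !repeated?(array)
end

p repeated([1, 2, 2, 3, 1])   # [1, 2]
p repeated?([1, 2, 2, 3, 1])  # true
p no_repeated?([1, 2, 3])     # true
```

The split shows the trade-off discussed in the thread: repeated must visit every element to report all duplicates, while repeated? and no_repeated? can stop early.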