Actions
Bug #16237
closedSet#union performance issue
Status:
Closed
Assignee:
-
Target version:
-
ruby -v:
ruby 2.6.2p47 (2019-03-13 revision 67232) [x86_64-linux]
Backport:
Description
I've discovered a very large difference in performances between Set#union and the use of a hash to compute this union.
The issue can be shown with the code below.
Here are the benchmark results:
user system total real
hash 0.562500 0.078125 0.640625 ( 0.639528)
uniq 54.375000 0.312500 54.687500 ( 54.772007)
set 37.312500 0.125000 37.437500 ( 37.474751)
mix1 43.718750 0.000000 43.718750 ( 43.890027)
mix2 37.109375 0.000000 37.109375 ( 37.190839)
The code example:
require 'benchmark'
require 'set'
REPEAT = 1_000
$DISTINCT_COUNT = 100_000_000
def random_array(size=1_000)
Array.new(size) { Random.rand(1..$DISTINCT_COUNT) }
end
def uniq
ret = []
REPEAT.times do
ret += random_array
ret.uniq!
end
ret
end
def set
ret = Set.new
REPEAT.times do
ret += Set.new(random_array)
end
ret
end
def mix1
ret = Set.new
REPEAT.times do
ret += random_array
end
ret
end
def mix2
ret = Set.new
REPEAT.times do
ret += random_array.uniq
end
ret
end
def hash
ret = {}
REPEAT.times do
random_array.each { |v| ret[v] = nil }
end
ret.keys
end
Benchmark.bm do |x|
%i(hash uniq set mix1 mix2).each do |i|
x.report(i, &method(i))
end
end
Updated by mame (Yusuke Endoh) about 5 years ago
- Status changed from Open to Closed
This is a spec. The reason is because Set#union
duplicates self and then merges the argument, which makes your "set" implementation O( n^2 ). You may want to use Set#merge
instead of Set#union
in this case.
def set2
ret = Set.new
REPEAT.times do
ret.merge(Set.new(random_array))
end
ret
end
user system total real
hash 0.404887 0.011980 0.416867 ( 0.416911)
set2 0.468687 0.004019 0.472706 ( 0.473242)
Actions
Like0
Like0