Project

General

Profile

Actions

Bug #16237

closed

Set#union performance issue

Added by lionelperrin (Lionel Perrin) about 5 years ago. Updated about 5 years ago.

Status:
Closed
Assignee:
-
Target version:
-
ruby -v:
ruby 2.6.2p47 (2019-03-13 revision 67232) [x86_64-linux]
[ruby-core:95221]

Description

I've discovered a very large difference in performances between Set#union and the use of a hash to compute this union.
The issue can be shown with the code below.

Here are the benchmark results:

       user     system      total        real
hash  0.562500   0.078125   0.640625 (  0.639528)
uniq 54.375000   0.312500  54.687500 ( 54.772007)
set 37.312500   0.125000  37.437500 ( 37.474751)
mix1 43.718750   0.000000  43.718750 ( 43.890027)
mix2 37.109375   0.000000  37.109375 ( 37.190839)

The code example:

require 'benchmark'
require 'set'

REPEAT = 1_000

$DISTINCT_COUNT = 100_000_000

def random_array(size=1_000)
  Array.new(size) { Random.rand(1..$DISTINCT_COUNT) }
end

def uniq
  ret = []
  REPEAT.times do
    ret += random_array
    ret.uniq!
  end
  ret
end

def set
  ret = Set.new
  REPEAT.times do
    ret += Set.new(random_array)
  end
  ret
end

def mix1
  ret = Set.new
  REPEAT.times do
    ret += random_array
  end
  ret
end

def mix2
  ret = Set.new
  REPEAT.times do
    ret += random_array.uniq
  end
  ret
end

def hash
  ret = {}
  REPEAT.times do
    random_array.each { |v| ret[v] = nil }
  end
  ret.keys
end

Benchmark.bm do |x|
  %i(hash uniq set mix1 mix2).each do |i|
    x.report(i, &method(i))
  end
end

Actions

Also available in: Atom PDF

Like0
Like0