Project

General

Profile

Actions

Feature #16993

open

Sets: from hash keys using Hash#key_set

Added by marcandre (Marc-Andre Lafortune) almost 5 years ago. Updated 14 days ago.

Status:
Open
Assignee:
-
Target version:
-
[ruby-core:98968]

Description

To create a set from hash keys currently implies a temporary array for all keys, rehashing all those keys and rebuilding a hash. Instead, the hash could be copied and its values set to true.

h = {a: 1}
# Now:
Set.new(h.keys) # => Set[:a]
# After
h.key_set # => Set[:a], efficiently.

Related issues 1 (1 open0 closed)

Related to Ruby - Feature #16989: Sets: need ♥️Assignedknu (Akinori MUSHA)Actions
Actions #1

Updated by marcandre (Marc-Andre Lafortune) almost 5 years ago

Updated by sawa (Tsuyoshi Sawada) almost 5 years ago

Would

Set.new(h.each_key)

not work?

Updated by marcandre (Marc-Andre Lafortune) almost 5 years ago

sawa (Tsuyoshi Sawada) wrote in #note-2:

Would

Set.new(h.each_key)

not work?

It will definitely work, but it will be the same as Set.new(h.keys) (a bit slower because enumerator is a bit slow). It still iterates on the keys, and add them to the new hash.

Here's a comparison:

# frozen_string_literal: true
require 'set'
require 'benchmark/ips'

class Hash
  def key_set
    s = Set.new
    s.instance_variable_set(:@hash, transform_values { true })
    s
  end
end

size = 50
h = (1..size).to_h { |i| ['x' * i, nil] }

Benchmark.ips do |x|
  x.report("key_set")          { h.key_set }
  x.report("keys")             { Set.new(h.keys)  }
  x.report("new(each_key)")    { Set.new(h.each_key)  }
  x.report("each_key{}")       { h2 = {}; h.each_key {|k| h2[k] = true} ; h2  }

The last example builds a Hash, not a Set, but it is to show that you can not be quicker than that if you rehash the keys.
I get these results:

Calculating -------------------------------------
             key_set    244.549k (± 7.4%) i/s -      1.219M in   5.017876s
                keys     82.661k (± 2.3%) i/s -    417.400k in   5.052408s
       new(each_key)     75.002k (± 5.0%) i/s -    377.400k in   5.045102s
          each_key{}    114.757k (± 3.8%) i/s -    582.000k in   5.079700s

If you increase the size, the ratio will be bigger.
A builtin keyset would be even faster, since it would avoid calling the block { true }; yielding is not super fast in Ruby.

Updated by nobu (Nobuyoshi Nakada) 2 months ago

This kind of extensions should be done in the corresponding library side, set.rb.

Updated by Dan0042 (Daniel DeLorme) 2 months ago

I like the concept but I'm going to bikeshed the naming: by itself "key_set" is unclear if you don't already know what it does. It sounds like the purpose is to set a key (like instance_variable_set). I'd prefer something a bit more descriptive like "keys_to_set"

Updated by greggzst (Grzegorz Jakubiak) 2 months ago

nobu (Nobuyoshi Nakada) wrote in #note-5:

This kind of extensions should be done in the corresponding library side, set.rb.

Thanks for pointing that out. Here's the PR https://github.com/ruby/set/pull/40

Dan0042 (Daniel DeLorme) wrote in #note-6:

I like the concept but I'm going to bikeshed the naming: by itself "key_set" is unclear if you don't already know what it does. It sounds like the purpose is to set a key (like instance_variable_set). I'd prefer something a bit more descriptive like "keys_to_set"

Good point I addressed that

Updated by mame (Yusuke Endoh) about 1 month ago

This was discussed at the dev meeting. @matz (Yukihiro Matsumoto) wanted to hear more about the use cases for this method first.

Also, since Set is currently in a halfway state where it is not fully built-in, a built-in Hash#key_set might require a special hack like Enumerable#to_set.

https://github.com/ruby/ruby/blob/7c88cbb4a6c486348c44be24941f17ef8be6b329/prelude.rb#L34

@akr (Akira Tanaka) pointed out the possible consistency issue of Hash#compare_by_identity. Since Hash#transform_values keeps this compare_by_identity flag, the proposed patch will assign a Hash with compare_by_identity flag to @hash in Set. We will need to review it carefully to make sure it does not break the implicit assumptions of set.rb.

Updated by mame (Yusuke Endoh) 15 days ago

Given the convention of deriving methods such as key_set from keys, where the former returns a set instead of an array, one might expect a corresponding derivation like Kernel#instance_variable_set for Kernel#instance_variables. Unfortunately, this would conflict with an existing method name.

Updated by Dan0042 (Daniel DeLorme) 14 days ago

mame (Yusuke Endoh) wrote in #note-9:

Given the convention of deriving methods such as key_set from keys, where the former returns a set instead of an array, one might expect a corresponding derivation like Kernel#instance_variable_set for Kernel#instance_variables. Unfortunately, this would conflict with an existing method name.

In https://github.com/ruby/set/pull/40 the name was changed to #keys_to_set, so this comment doesn't apply anymore.

Then again I wish all optimizations like this could be implemented at the VM level, so that hash.keys.to_set would automatically be efficient rather than having to explicitly use the efficient version #keys_to_set.

Actions

Also available in: Atom PDF

Like0
Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0