Feature #16993
openSets: from hash keys using Hash#key_set
Description
To create a set from hash keys currently implies a temporary array for all keys, rehashing all those keys and rebuilding a hash. Instead, the hash could be copied and its values set to true.
h = {a: 1}
# Now:
Set.new(h.keys) # => Set[:a]
# After
h.key_set # => Set[:a], efficiently.
  
        
          
          Updated by marcandre (Marc-Andre Lafortune) over 5 years ago
          
          
        
        
      
      - Related to Feature #16989: Sets: need ♥️ added
 
        
          
          Updated by sawa (Tsuyoshi Sawada) over 5 years ago
          
          
        
        
      
      Would
Set.new(h.each_key)
not work?
        
          
          Updated by marcandre (Marc-Andre Lafortune) over 5 years ago
          
          
        
        
      
      sawa (Tsuyoshi Sawada) wrote in #note-2:
Would
Set.new(h.each_key)not work?
It will definitely work, but it will be the same as Set.new(h.keys) (a bit slower because enumerator is a bit slow). It still iterates on the keys, and add them to the new hash.
Here's a comparison:
# frozen_string_literal: true
require 'set'
require 'benchmark/ips'
class Hash
  def key_set
    s = Set.new
    s.instance_variable_set(:@hash, transform_values { true })
    s
  end
end
size = 50
h = (1..size).to_h { |i| ['x' * i, nil] }
Benchmark.ips do |x|
  x.report("key_set")          { h.key_set }
  x.report("keys")             { Set.new(h.keys)  }
  x.report("new(each_key)")    { Set.new(h.each_key)  }
  x.report("each_key{}")       { h2 = {}; h.each_key {|k| h2[k] = true} ; h2  }
The last example builds a Hash, not a Set, but it is to show that you can not be quicker than that if you rehash the keys.
I get these results:
Calculating -------------------------------------
             key_set    244.549k (± 7.4%) i/s -      1.219M in   5.017876s
                keys     82.661k (± 2.3%) i/s -    417.400k in   5.052408s
       new(each_key)     75.002k (± 5.0%) i/s -    377.400k in   5.045102s
          each_key{}    114.757k (± 3.8%) i/s -    582.000k in   5.079700s
If you increase the size, the ratio will be bigger.
A builtin keyset would be even faster, since it would avoid calling the block { true }; yielding is not super fast in Ruby.
        
          
          Updated by greggzst (Grzegorz Jakubiak) 9 months ago
          
          
        
        
      
      I opened a PR to add this functionality https://github.com/ruby/ruby/pull/12750
        
          
          Updated by nobu (Nobuyoshi Nakada) 9 months ago
          
          
        
        
      
      This kind of extensions should be done in the corresponding library side, set.rb.
        
          
          Updated by Dan0042 (Daniel DeLorme) 9 months ago
          
          
        
        
      
      I like the concept but I'm going to bikeshed the naming: by itself "key_set" is unclear if you don't already know what it does. It sounds like the purpose is to set a key (like instance_variable_set). I'd prefer something a bit more descriptive like "keys_to_set"
        
          
          Updated by greggzst (Grzegorz Jakubiak) 9 months ago
          
          
        
        
      
      nobu (Nobuyoshi Nakada) wrote in #note-5:
This kind of extensions should be done in the corresponding library side, set.rb.
Thanks for pointing that out. Here's the PR https://github.com/ruby/set/pull/40
Dan0042 (Daniel DeLorme) wrote in #note-6:
I like the concept but I'm going to bikeshed the naming: by itself "key_set" is unclear if you don't already know what it does. It sounds like the purpose is to set a key (like instance_variable_set). I'd prefer something a bit more descriptive like "keys_to_set"
Good point I addressed that
        
          
          Updated by mame (Yusuke Endoh) 8 months ago
          
          
        
        
      
      This was discussed at the dev meeting. @matz (Yukihiro Matsumoto) wanted to hear more about the use cases for this method first.
Also, since Set is currently in a halfway state where it is not fully built-in, a built-in Hash#key_set might require a special hack like Enumerable#to_set.
https://github.com/ruby/ruby/blob/7c88cbb4a6c486348c44be24941f17ef8be6b329/prelude.rb#L34
@akr (Akira Tanaka) pointed out the possible consistency issue of Hash#compare_by_identity. Since Hash#transform_values keeps this compare_by_identity flag, the proposed patch will assign a Hash with compare_by_identity flag to @hash in Set. We will need to review it carefully to make sure it does not break the implicit assumptions of set.rb.
        
          
          Updated by mame (Yusuke Endoh) 7 months ago
          
          
        
        
      
      Given the convention of deriving methods such as key_set from keys, where the former returns a set instead of an array, one might expect a corresponding derivation like Kernel#instance_variable_set for Kernel#instance_variables. Unfortunately, this would conflict with an existing method name.
        
          
          Updated by Dan0042 (Daniel DeLorme) 7 months ago
          
          
        
        
      
      mame (Yusuke Endoh) wrote in #note-9:
Given the convention of deriving methods such as
key_setfromkeys, where the former returns a set instead of an array, one might expect a corresponding derivation likeKernel#instance_variable_setforKernel#instance_variables. Unfortunately, this would conflict with an existing method name.
In https://github.com/ruby/set/pull/40 the name was changed to #keys_to_set, so this comment doesn't apply anymore.
Then again I wish all optimizations like this could be implemented at the VM level, so that hash.keys.to_set would automatically be efficient rather than having to explicitly use the efficient version #keys_to_set.