Project

General

Profile

Actions

Feature #8478

open

The hash returned by Enumerable#group_by should have an empty array for its default value

Added by phiggins (Pete Higgins) almost 11 years ago. Updated almost 11 years ago.

Status:
Open
Target version:
-
[ruby-core:55260]

Description

Without this patch, nil checks might need to be done on the return value of Enumerable#group_by:

$ cat test_group_by.rb
a = [1, 2, 3, "a", "b"]
g = a.group_by {|o| o.class }

puts "Fixnums: #{g[Fixnum].size}"
puts "Strings: #{g[String].size}"
puts "Arrays: #{g[Array].size}"

$ ruby test_group_by.rb
Fixnums: 3
Strings: 2
test_group_by.rb:6:in <main>': undefined method size' for nil:NilClass (NoMethodError)

This patch adds a default value of an empty array to the hash returned by Enumerable#group_by, so the script above will work:

$ ./ruby -I.:lib test_group_by.rb
Fixnums: 3
Strings: 2
Arrays: 0


Files

group_by_default_empty_array.diff (1.81 KB) group_by_default_empty_array.diff phiggins (Pete Higgins), 06/03/2013 04:37 AM

Updated by marcandre (Marc-Andre Lafortune) almost 11 years ago

I understand the idea, but there are problems with this.

First of, it's a really bad idea to set the default of a hash to a mutable object:

a = [1, 2, 3, "a", "b"]
g = a.group_by {|o| o.class }
g[Array] << [:foo]
g[Hash] # => [[:foo]], you expected []

Assuming now that your request was instead to change the default proc to {|hash, key| hash[key] = []}, which would not cause the problem above, you still have the problem of incompatibility.

Updated by Anonymous almost 11 years ago

@marcandre (Marc-Andre Lafortune): What do you mean by incompatibility? That the behavior is not the same as before, or something deeper?

Updated by marcandre (Marc-Andre Lafortune) almost 11 years ago

  • Assignee set to matz (Yukihiro Matsumoto)

Right, that the behavior is changing and that this could break existing code. It might have been an interesting idea at the time group_by was introduced, but I doubt that Matz will feel the change is compelling enough to break compatibility today.

Updated by duerst (Martin Dürst) almost 11 years ago

I don't think I agree with the proposer. The example looks good, but what about something like:

a = [1, 2, 3, "a", "b"]
g = a.group_by {|o| o.class }

puts "Fixnums: #{g[Fixnum].size}"
puts "Strings: #{g[String].size}"
puts "Zeroes: #{g[0].size}"

In other words, it might be desirable to return an empty array for a group key that is part of the collection that we group over (classes in the example), but it'd be better to cause an error for group keys that are outside of the collection (i.e., not classes in the example).

That would mean that the default proc should (for this example) be something like
{|hash, key| hash[key] = [] if key.class==Class }
Of course, that cannot be part of Ruby, unless maybe as a third argument of some form to group_by.

Updated by funny_falcon (Yura Sokolov) almost 11 years ago

There are always different ways to work with hash:

counters = Hash.new{|h,k| 0}
list.each{|e| counters[e]+=1}
some_thing_count = counters[some_thing]

compared to

counters = {}
list.each{|e| counters[e] = (counters[e]||0) + 1}
raise "THERE IS NO SOME-THING" unless counters[some_thing]

So that, group_by should not assume user's needs

So, one could assign default proc if he really wants:

a = [1, 2, 3, "a", "b"]
g = a.group_by {|o| o.class }
g.default_proc = proc{|hash, key| hash[key] = [] if key.class==Class }

puts "Fixnums: #{g[Fixnum].size}"
puts "Strings: #{g[String].size}"
puts "Arrays: #{g[Array].size}"
if g[0]
  puts "Zeroes: #{g[0].size}"
else
  raise "NO ZEROES!!!!"
end

Updated by phiggins (Pete Higgins) almost 11 years ago

marcandre (Marc-Andre Lafortune) wrote:

First of, it's a really bad idea to set the default of a hash to a mutable object:

Yes, I didn't consider the problem of returning the same array for every key, and that would have to be changed if this were to be seriously considered.

Assuming now that your request was instead to change the default proc to {|hash, key| hash[key] = []}, which would not cause the problem above, you still have the problem of incompatibility.

It seems like some appropriate milestone like 2.1 might be an acceptable place to make such an incompatible change.

Updated by phiggins (Pete Higgins) almost 11 years ago

duerst (Martin Dürst) wrote:

I don't think I agree with the proposer. The example looks good, but what about something like:

a = [1, 2, 3, "a", "b"]
g = a.group_by {|o| o.class }

puts "Fixnums: #{g[Fixnum].size}"
puts "Strings: #{g[String].size}"
puts "Zeroes: #{g[0].size}"

In other words, it might be desirable to return an empty array for a group key that is part of the collection that we group over (classes in the example), but it'd be better to cause an error for group keys that are outside of the collection (i.e., not classes in the example).

That would mean that the default proc should (for this example) be something like
{|hash, key| hash[key] = [] if key.class==Class }
Of course, that cannot be part of Ruby, unless maybe as a third argument of some form to group_by.

I certainly didn't intend for the default value to have some idea of the type of comparison done by the call to group_by, that seems to go well outside the scope of group_by and just general ruby semantics. It seems counter intuitive to me that while the point of the method is to return a hash of arrays, the return value would produce either an array with some values in it or nil. It returning a hash that produced arrays with some values in them or an empty arrays means you're only dealing with one type of thing, an array.

Actions

Also available in: Atom PDF

Like0
Like0Like0Like0Like0Like0Like0Like0