Feature #8478
openThe hash returned by Enumerable#group_by should have an empty array for its default value
Description
Without this patch, nil checks might need to be done on the return value of Enumerable#group_by:
$ cat test_group_by.rb
a = [1, 2, 3, "a", "b"]
g = a.group_by {|o| o.class }
puts "Fixnums: #{g[Fixnum].size}"
puts "Strings: #{g[String].size}"
puts "Arrays: #{g[Array].size}"
$ ruby test_group_by.rb
Fixnums: 3
Strings: 2
test_group_by.rb:6:in <main>': undefined method
size' for nil:NilClass (NoMethodError)
This patch adds a default value of an empty array to the hash returned by Enumerable#group_by, so the script above will work:
$ ./ruby -I.:lib test_group_by.rb
Fixnums: 3
Strings: 2
Arrays: 0
Files
Updated by marcandre (Marc-Andre Lafortune) over 11 years ago
I understand the idea, but there are problems with this.
First of, it's a really bad idea to set the default of a hash to a mutable object:
a = [1, 2, 3, "a", "b"]
g = a.group_by {|o| o.class }
g[Array] << [:foo]
g[Hash] # => [[:foo]], you expected []
Assuming now that your request was instead to change the default proc to {|hash, key| hash[key] = []}, which would not cause the problem above, you still have the problem of incompatibility.
Updated by Anonymous over 11 years ago
@marcandre (Marc-Andre Lafortune): What do you mean by incompatibility? That the behavior is not the same as before, or something deeper?
Updated by marcandre (Marc-Andre Lafortune) over 11 years ago
- Assignee set to matz (Yukihiro Matsumoto)
Right, that the behavior is changing and that this could break existing code. It might have been an interesting idea at the time group_by
was introduced, but I doubt that Matz will feel the change is compelling enough to break compatibility today.
Updated by duerst (Martin Dürst) over 11 years ago
I don't think I agree with the proposer. The example looks good, but what about something like:
a = [1, 2, 3, "a", "b"]
g = a.group_by {|o| o.class }
puts "Fixnums: #{g[Fixnum].size}"
puts "Strings: #{g[String].size}"
puts "Zeroes: #{g[0].size}"
In other words, it might be desirable to return an empty array for a group key that is part of the collection that we group over (classes in the example), but it'd be better to cause an error for group keys that are outside of the collection (i.e., not classes in the example).
That would mean that the default proc should (for this example) be something like
{|hash, key| hash[key] = [] if key.class==Class }
Of course, that cannot be part of Ruby, unless maybe as a third argument of some form to group_by.
Updated by funny_falcon (Yura Sokolov) over 11 years ago
There are always different ways to work with hash:
counters = Hash.new{|h,k| 0}
list.each{|e| counters[e]+=1}
some_thing_count = counters[some_thing]
compared to
counters = {}
list.each{|e| counters[e] = (counters[e]||0) + 1}
raise "THERE IS NO SOME-THING" unless counters[some_thing]
So that, group_by should not assume user's needs
So, one could assign default proc if he really wants:
a = [1, 2, 3, "a", "b"]
g = a.group_by {|o| o.class }
g.default_proc = proc{|hash, key| hash[key] = [] if key.class==Class }
puts "Fixnums: #{g[Fixnum].size}"
puts "Strings: #{g[String].size}"
puts "Arrays: #{g[Array].size}"
if g[0]
puts "Zeroes: #{g[0].size}"
else
raise "NO ZEROES!!!!"
end
Updated by phiggins (Pete Higgins) over 11 years ago
marcandre (Marc-Andre Lafortune) wrote:
First of, it's a really bad idea to set the default of a hash to a mutable object:
Yes, I didn't consider the problem of returning the same array for every key, and that would have to be changed if this were to be seriously considered.
Assuming now that your request was instead to change the default proc to {|hash, key| hash[key] = []}, which would not cause the problem above, you still have the problem of incompatibility.
It seems like some appropriate milestone like 2.1 might be an acceptable place to make such an incompatible change.
Updated by phiggins (Pete Higgins) over 11 years ago
duerst (Martin Dürst) wrote:
I don't think I agree with the proposer. The example looks good, but what about something like:
a = [1, 2, 3, "a", "b"]
g = a.group_by {|o| o.class }puts "Fixnums: #{g[Fixnum].size}"
puts "Strings: #{g[String].size}"
puts "Zeroes: #{g[0].size}"In other words, it might be desirable to return an empty array for a group key that is part of the collection that we group over (classes in the example), but it'd be better to cause an error for group keys that are outside of the collection (i.e., not classes in the example).
That would mean that the default proc should (for this example) be something like
{|hash, key| hash[key] = [] if key.class==Class }
Of course, that cannot be part of Ruby, unless maybe as a third argument of some form to group_by.
I certainly didn't intend for the default value to have some idea of the type of comparison done by the call to group_by, that seems to go well outside the scope of group_by and just general ruby semantics. It seems counter intuitive to me that while the point of the method is to return a hash of arrays, the return value would produce either an array with some values in it or nil. It returning a hash that produced arrays with some values in them or an empty arrays means you're only dealing with one type of thing, an array.
Updated by hsbt (Hiroshi SHIBATA) 8 months ago
- Status changed from Open to Assigned