Feature #19324

open

Enumerator.product => Enumerable#product

Added by zverok (Victor Shepelev) almost 2 years ago. Updated 7 months ago.

Status:
Assigned
Target version:
-
[ruby-core:111739]

Description

I know it might be too late after introducing a feature and releasing a version, but I find Enumerator.product quite confusing, and can't find any justification in #18685.

Problem 1: It is Array#product but Enumerator.product

[1, 2].product([4, 5])
# => [[1, 4], [1, 5], [2, 4], [2, 5]]

# Usually, when we add methods to Enumerable/Enumerator that
# Array already had, the API is symmetric, say...
[1, nil, 2, 3].compact #=> [1, 2, 3]
[1, nil, 2, 3].lazy.compact.first(2) #=> [1, 2]

# But not in this case:
[1, 2].lazy.product([4, 5]).first(2)
# undefined method `product' for #<Enumerator::Lazy: [1, 2]> (NoMethodError)
# Because you "just" need to change it to:
Enumerator.product([1, 2].lazy, [4, 5]).first(2)
# => [[1, 4], [1, 5]]

No other method was "promoted" from Array this way

And in general, I believe core methods tend to belong to the first object in the expression and not be free module methods, Elixir style.
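
As a rough illustration of what is being asked for, a receiver-style lazy product can be emulated today with a small helper on top of Enumerator.product (Ruby 3.2+). The name lazy_product is hypothetical and not part of Ruby; this is only a sketch of the desired behaviour, not a proposed implementation.

```ruby
# Hypothetical helper (not a core method): treats `first` as the "receiver"
# and delegates to Enumerator.product, which exists since Ruby 3.2.
def lazy_product(first, *others)
  Enumerator.product(first, *others).lazy
end

# Combinations are generated on demand, so the first enumerable may even
# be infinite; the remaining ones are re-iterated for every outer element.
pairs = lazy_product((1..Float::INFINITY).lazy, [4, 5]).first(3)
# pairs is [[1, 4], [1, 5], [2, 4]]
```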

Problem 2: It is one letter different from Enumerator.produce

I understand I might be biased here (as the person who proposed produce), and that method is not as popular (yet?) as I hoped. Still, these are two methods that do completely different things and differ by one letter, both being somewhat vague verbs, so it is easy to confuse them unless you have done a lot of math and "product" is firmly associated with the set product in your head.
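
For reference, a minimal side-by-side of the two similarly named methods, assuming Ruby 3.2 or later (Enumerator.produce exists since 2.7, Enumerator.product since 3.2):

```ruby
# Enumerator.produce: an infinite sequence where each element is computed
# from the previous one, starting from the given initial value.
powers = Enumerator.produce(1) { |x| x * 2 }.take(4)
# powers is [1, 2, 4, 8]

# Enumerator.product: the Cartesian product of the given enumerables.
pairs = Enumerator.product([1, 2], [3, 4]).to_a
# pairs is [[1, 3], [1, 4], [2, 3], [2, 4]]
```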

I believe that EITHER of the two problems would be concerning enough, but the combination of them seems to be a strong enough argument to make the change?.. (Maybe with graceful deprecation of the module method in the next version, but, considering that Ruby 3.2 has only just been released, maybe vice versa: fix the problem in the next minor release?..)


Related issues 2 (1 open, 1 closed)

Related to Ruby master - Feature #6499: Array::zip (Rejected, matz (Yukihiro Matsumoto), 05/26/2012)
Related to Ruby master - Feature #8970: Array.zip and Array.product (Open)

Updated by sawa (Tsuyoshi Sawada) almost 2 years ago

[T]wo methods [...] do completely different things and differ by one letter, both being somewhat vague verbs

produce is (most of the time) a verb or (sometimes) a noun. product is no doubt a noun.

Updated by duerst (Martin Dürst) almost 2 years ago

For product, one important issue not yet mentioned here is whether it's calculated on e.g. two or three separate arrays, where it makes sense to make the first of the arrays the receiver, or whether it's calculated on an array of arrays, where the first of the arrays doesn't have any special significance.

In the second case, writing e.g.
[0, 1].product([0, 1], [0, 1]) feels rather inappropriate. But this case in my experience is actually rather more frequent than having a first array that is in some way special. Even more, if the array of arrays is already a variable, it gets rather involved:

array_of_arrays = [[0, 1]] * 3
# =>  [[0, 1], [0, 1], [0, 1]]
array_of_arrays.first.product(*array_of_arrays.drop(1))
# =>
# [[0, 0, 0],
#  [0, 0, 1],
#  [0, 1, 0],
#  [0, 1, 1],
#  [1, 0, 0],
#  [1, 0, 1],
#  [1, 1, 0],
#  [1, 1, 1]]

Enumerator.product makes that more straightforward: Enumerator.product(*array_of_arrays). (In my purely personal opinion, the need for the * is still a nuisance, but only a small one.)
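
A quick check that the two spellings agree, assuming Ruby 3.2+ for Enumerator.product (the variable names receiver_style and module_style are only for illustration):

```ruby
array_of_arrays = [[0, 1]] * 3

# Receiver style: the first array is singled out, the rest are splatted.
receiver_style = array_of_arrays.first.product(*array_of_arrays.drop(1))

# Module style: all arrays are passed uniformly; to_a forces the enumerator.
module_style = Enumerator.product(*array_of_arrays).to_a

same = receiver_style == module_style
# same is true: both list the same 8 triples in the same order
```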

Having these two invocations distinguished by whether the method is an instance method or a class method to some extent makes sense, although being able to use the instance method directly on an array of arrays might also make some sense. Having the instance method on Array and the class method on Enumerator seems something that could be improved without any backwards compatibility problems.

That still leaves the problem that I really would like to write array_of_arrays.product, in particular in a method chain. That will be difficult to get to because of backward compatibility problems.

Updated by mame (Yusuke Endoh) almost 2 years ago

As @duerst (Martin Dürst) mentioned, it is sometimes noted that Array#product and Array#zip are ugly and inconvenient due to their asymmetry.

https://bugs.ruby-lang.org/issues/8970
https://bugs.ruby-lang.org/issues/6499

I don't think we have to inherit the asymmetry when introducing a new Enumerator.product.

As for the name issue, I believe Enumerator.product is definitely the best name. The term is widely used for this operation in math and computer science, specifically set theory. If we must change either, we should rethink produce. I personally don't think it is worth changing, though.

Updated by zverok (Victor Shepelev) almost 2 years ago

First, I just looked through codebases at my disposal, and it seems that for product there are at least as many examples where the first argument has a special meaning as those where it doesn't. Examples are like:

user_names.product(['']) 
# => [['name1', ''], ['name2', ''], ...]
# this is the format some API requires, and .product is the simplest way to produce it

possible_methods.product(possible_arguments) 
#=> list of test cases to check the stability of methods for all known arguments

users.product(%i[original thumbnail]) #=> then produce big and small avatar for all of them

Or, in ruby/lib, there are exactly 3 entries, of which two have distinct arguments:

$ grep -F "product(" lib -r
lib/did_you_mean/tree_spell_checker.rb:      states = plausibles[0].product(*plausibles[1..-1])
lib/bundler/spec_set.rb:      handled = ["bundler"].product(platforms).map {|k| [k, true] }.to_h
lib/bundler/spec_set.rb:      deps = dependencies.product(platforms)

Rubocop (both entries have distinct arguments):

$ grep -F "product(" {lib,spec} -r
lib/rubocop/config_obsoletion.rb:          cops.product(parameters).map do |cop, parameter|
spec/rubocop/cop/style/self_assignment_spec.rb:  %i[+ - * ** / | & || &&].product(['x', '@x', '@@x']).each do |op, var|

But second and more important, the same argument (it is a method that operates on homogeneous parameters, so why should it belong to the first parameter?..) can be applied to a lot of methods. Like the aforementioned Array#zip, or Hash#merge (there are a lot of cases when we are just merging several homogeneous hashes), or, well, even a + b!

But multi-parameter methods consistently belong to their first parameters throughout Ruby, save for several methods in Math maybe (which, honestly, always felt like a compromise or legacy).

So, the question is: is there a design plan to change many of the methods (Array.product in addition to Enumerable.product, Array.zip, Hash.merge, I dunno, maybe even Array.concat?..) to the Module.method protocol? Or, would Enumerable.product be the only sore exclusion from what is consistently done throughout Ruby?..

Updated by sawa (Tsuyoshi Sawada) almost 2 years ago

zverok (Victor Shepelev) wrote in #note-4:

[I]s there a design plan to change many of the methods (Array.product in addition to Enumerable.product, Array.zip, Hash.merge, I dunno, maybe even Array.concat?..) to the Module.method protocol? Or, would Enumerable.product be the only sore exclusion from what is consistently done throughout Ruby?

I believe quite many people are in favor of Array.product and/or Array.zip. I wish the implementation/existence of Enumerator.product will work as pressure towards implementing these methods as well. However, I do not think many other methods like those you mention would need to be handled similarly for the sake of consistency; (i) Array#product and Array#zip are different from (ii) Hash#merge and Array#concat. The (ii) methods with multiple arguments are nothing more than a repetition of binary operations (and furthermore, they are associative):

[1, 2, 3].concat([4, 5, 6], [7, 8, 9]) ==
[1, 2, 3].concat([4, 5, 6]).concat([7, 8, 9]) ==
[1, 2, 3].concat([4, 5, 6].concat([7, 8, 9]))
# => [1, 2, 3, 4, 5, 6, 7, 8, 9]

{a: 1}.merge({b: 2}, {c: 3}) ==
{a: 1}.merge({b: 2}).merge({c: 3}) ==
{a: 1}.merge({b: 2}.merge({c: 3}))
# => {:a=>1, :b=>2, :c=>3}

whereas the (i) methods are not:

[1, 2].product([3, 4], [5, 6])           # => [[1, 3, 5], [1, 3, 6], [1, 4, 5], [1, 4, 6], [2, 3, 5], [2, 3, 6], [2, 4, 5], [2, 4, 6]]
[1, 2].product([3, 4]).product([5, 6])   # => [[[1, 3], 5], [[1, 3], 6], [[1, 4], 5], [[1, 4], 6], [[2, 3], 5], [[2, 3], 6], [[2, 4], 5], [[2, 4], 6]]
[1, 2].product(([3, 4]).product([5, 6])) # => [[1, [3, 5]], [1, [3, 6]], [1, [4, 5]], [1, [4, 6]], [2, [3, 5]], [2, [3, 6]], [2, [4, 5]], [2, [4, 6]]]

[1, 2, 3].zip([4, 5, 6], [7, 8, 9])     # => [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
[1, 2, 3].zip([4, 5, 6]).zip([7, 8, 9]) # => [[[1, 4], 7], [[2, 5], 8], [[3, 6], 9]]
[1, 2, 3].zip([4, 5, 6].zip([7, 8, 9])) # => [[1, [4, 7]], [2, [5, 8]], [3, [6, 9]]]

Thus, while allowing multiple arguments with (ii) methods is just a convenience feature, the (i) methods with multiple arguments have a firm position. They have stronger motivation to exist as an independent concept, and accordingly, stronger motivation for their receiver and the arguments to be treated symmetrically.
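
The same distinction can be phrased in terms of folding, as a small sketch: for the (ii) methods, reducing a list with the binary operation matches the multi-argument call, while for the (i) methods it does not. Array#+ stands in for Array#concat here only because concat mutates its receiver.

```ruby
arrays = [[1, 2], [3, 4], [5, 6]]
hashes = [{a: 1}, {b: 2}, {c: 3}]

# (ii) Folding the binary operation equals the multi-argument call.
concat_same = arrays.reduce(:+) == [1, 2].concat([3, 4], [5, 6])
merge_same  = hashes.reduce(:merge) == {a: 1}.merge({b: 2}, {c: 3})
# concat_same and merge_same are both true

# (i) Folding product or zip nests the results instead.
folded_product = arrays.reduce(:product)
direct_product = arrays.first.product(*arrays.drop(1))
# folded_product.first is [[1, 3], 5], direct_product.first is [1, 3, 5]

folded_zip = arrays.reduce(:zip)
direct_zip = arrays.first.zip(*arrays.drop(1))
# folded_zip is [[[1, 3], 5], [[2, 4], 6]], direct_zip is [[1, 3, 5], [2, 4, 6]]
```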

Updated by duerst (Martin Dürst) almost 2 years ago

Updated by ttilberg (Tim Tilberg) almost 2 years ago

I believe quite many people are in favor of Array.product and/or Array.zip.

I personally find this surprising! It goes against what I find to be some of Ruby’s best design philosophies compared with Python:

Instance method versions are more discoverable. “What can I do with this object?” was how I learned to be fluent in Ruby. I’ve always appreciated the consistency of “everything I do is a method on this object”. Notably, compared to Python, you aren’t faced with many critical top-level functions that you must learn in addition to the object’s obvious API. This also affects chainability, which I’ve always appreciated in Ruby: with a module-level function, rather than continuing your chain as you go, you have to work backwards to wrap the function around it.

Finally, contrary to my own point, I do truly appreciate the desire for a class/module level function, for exactly the reasons outlined above. It certainly works nicely for unpacking arrays as in the example above. Additionally, it’s likely easier to use with more functional paradigms.

Is it not possible to allow for both styles?

I am personally a huge fan of the instance method versions of all above (zip, merge, etc). But I also appreciate the class/function styles too.

Updated by zverok (Victor Shepelev) almost 2 years ago

I believe quite many people are in favor of Array.product and/or Array.zip. I wish the implementation/existence of Enumerator.product will work as a pressure towards implementing these methods as well.

I can empathize with that, but there are two important things to consider:

  1. Even if standalone product is a thing, Enumerable#product still needs to be defined: as my examples show, "first argument is special" is frequent; and inability to just convert ary.product(others) to ary.lazy.product(others) is irritating
  2. If the plan is to have a few "standalone methods", I believe it should be discussed as soon as possible to settle on an acceptable approach. We have a small window of opportunity (while Enumerator.product has only just been introduced and changing it would not create many backward compatibility issues) to settle on the proper design. In particular, questions to consider:
  • Would standalone methods really be a thing (even if in distant future), or would it always be just one sore exception?
  • If it would be a thing, what would be the proper place for it? Because, actually, even for a standalone method I don't believe Enumerator is the most appropriate place. If it were in Array, it would probably make sense to have a pair of Array#zip and Array.zip; probably the same is true for abstract enumerables: Enumerable#product and Enumerable.product.
  • Other currently existing Enumerator class methods (.new and .produce) are focused on "creating a new enumerator (from some arguments/algorithms)", while Enumerator.product is focused on "apply this operation to enumerables" (producing... whatever it will produce, the user isn't focused on "enumerator" concept here)

Updated by duerst (Martin Dürst) almost 2 years ago

Thinking about this a bit more, I guess both the "first argument is special" (A) and the "all arguments are the same" (B) have use cases. There's probably a third one, which is "all arguments are already in an array" (C). (B) and (C) are only one splat/[] away from each other; (A) is clearly a bit farther off from the other two, see array_of_arrays.first.product(*array_of_arrays.drop(1)) above.

Making a distinction by using the difference between a method with a receiver and a class method is one way to accommodate these use cases, but I think it would be more Ruby-like if receivers were used in both cases and the distinction were made by method name. So as an example, we could have: a.zip(b, c) and [a,b,c].multi_zip. multi_zip here is only one possible name; there may be better ones. This solution would allow method chaining, which is important for good Ruby style.

Updated by zverok (Victor Shepelev) almost 2 years ago

Just to add a few points:

  1. I don't believe "zip with all arguments the same" has really any significant usage: at least, in codebases I looked in (Ruby stdlib, Rails, Rubocop, Sidekiq, my work apps) I can't find any. BTW, the trivial case of "have array of arrays, and want to zip each row" is just Array#transpose
  2. Off the top of my head, I can't think of more methods like product and zip (that uniformly process an array of arguments, and wouldn't be the same as arguments.reduce(:method), like Hash#merge is), which, considering (1), seems to leave product as a rare special case, not a useful trend to start;
  3. In modern Ruby, chainable product is trivially implemented: arys.then { |first, *rest| first.product(*rest) }. If, by chance, anonymous params for blocks would become a thing, it would even be arys.then { |first, *| first.product(*) }. It might still look worse than one specialized method, though.
  4. Oh, and returning to zip, its first argument is special, because it defines the size of the output:
%w[a b c].zip([1, 2], %i[x])
# => [["a", 1, :x], ["b", 2, nil], ["c", nil, nil]]
[1, 2].zip(%w[a b c], %i[x])
# => [[1, "a", :x], [2, "b", nil]]
%i[x].zip(%w[a b c], [1, 2])
# => [[:x, "a", 1]]

...so, considering the first argument as an "owner" of the operation has its semantic meaning.
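
Points 1 and 3 above can be checked with a short sketch (the variable names are illustrative only). Note that the equivalence with Array#transpose holds for rows of equal length; for ragged rows, transpose raises IndexError while zip pads with nil.

```ruby
# Point 1: zipping the rows of an array of arrays is Array#transpose.
rows = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
zipped = rows.first.zip(*rows.drop(1))
same = zipped == rows.transpose
# same is true for rows of equal length

# Point 3: a chainable product via Object#then (Ruby 2.6+); the block
# parameters destructure the array into its first element and the rest.
arys = [[0, 1], [2, 3]]
chained = arys.then { |first, *rest| first.product(*rest) }
# chained is [[0, 2], [0, 3], [1, 2], [1, 3]]
```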

Updated by knu (Akinori MUSHA) over 1 year ago

Array#product has been there for years and is used a lot. What are the steps to introduce Enumerable#product without bringing confusion?

Updated by zverok (Victor Shepelev) over 1 year ago

@knu (Akinori MUSHA) Sorry, I might be missing some context.

What's the problem with just straightforwardly introducing Enumerable#product (which would be redefined in Array), the same as it is with many other Enumerable methods?..

Updated by mame (Yusuke Endoh) 12 months ago

Discussed at the dev meeting. @knu (Akinori MUSHA) will respond, but I'll give you just an overview of the discussion.

  • We do not remove Enumerator.product.
  • We may add Enumerable#product that returns an Array (not an Enumerator)
    • because returning a non-Array is incompatible with existing Array#product.
    • also because Enumerable methods usually return an Array.
  • If so, we will introduce Enumerator::Lazy#product too.
    • You will be able to use .lazy to get an Enumerator of product.
[1, 2].product([3, 4])              #=> [[1, 3], [1, 4], [2, 3], [2, 4]]
[1, 2].to_enum.product([3, 4])      #=> [[1, 3], [1, 4], [2, 3], [2, 4]]
[1, 2].lazy.product([3, 4])         #=> #<Enumerator::Lazy ... >
[1, 2].lazy.product([3, 4]).eager   #=> #<Enumerator ... >
[1, 2].lazy.product([3, 4]).force   #=> [[1, 3], [1, 4], [2, 3], [2, 4]]
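
A minimal sketch of how the behaviour outlined above could be emulated on Ruby 3.2+, purely for illustration. These definitions are not part of Ruby and are not the planned implementation; they simply delegate to the existing Enumerator.product, and reopening core modules like this is only reasonable in an experiment.

```ruby
# Illustration only: NOT in Ruby. Array keeps its own built-in Array#product,
# so only non-Array enumerables pick up this definition.
module Enumerable
  def product(*others)
    Enumerator.product(self, *others).to_a   # eager: returns an Array
  end
end

class Enumerator::Lazy
  def product(*others)
    Enumerator.product(self, *others).lazy   # lazy: stays an Enumerator::Lazy
  end
end

eager  = [1, 2].to_enum.product([3, 4])  # an Array of pairs
lazy   = [1, 2].lazy.product([3, 4])     # an Enumerator::Lazy
forced = lazy.force                      # the same pairs as `eager`
```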

Updated by knu (Akinori MUSHA) 11 months ago

We agreed that it's okay to add Enumerable#product that returns an array (of arrays) and the lazy counterpart Enumerator::Lazy#product.
Do you already have an implementation? Do you want me to do it?

Updated by knu (Akinori MUSHA) 11 months ago

  • Assignee set to knu (Akinori MUSHA)
  • Target version set to 3.3

Updated by zverok (Victor Shepelev) 11 months ago

@knu (Akinori MUSHA) Sorry for the late reply.

I don't have an implementation unfortunately. If it is possible for you to do it, it would be awesome.
Otherwise, I'll try to do it myself (unless it is too late).

Updated by mame (Yusuke Endoh) 11 months ago

  • Target version deleted (3.3)

Updated by knu (Akinori MUSHA) 11 months ago

I couldn't take the time to work on this, sorry.

One thing I noticed while I was thinking of this was that Array#product only takes arrays and is optimized for iterating over arrays using index access instead of calling each.
When we "promote" it to Enumerable#product that takes any enumerables, we'll probably need to incorporate the optimization into Enumerator::Product to avoid any performance degradation, where it should check if each is redefined.
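
One way to picture the check described above, as a Ruby-level sketch only. The names plain_array? and product_sketch are made up, and the real optimization would presumably live in C inside Enumerator::Product; this merely shows the dispatch idea of using the Array fast path when every argument is a plain Array whose each has not been redefined.

```ruby
# Hypothetical helper: true only for real Arrays whose #each is still
# Array's own implementation (not overridden in a subclass or singleton).
def plain_array?(obj)
  obj.instance_of?(Array) && obj.method(:each).owner == Array
end

# Hypothetical dispatcher: use Array#product (index-based, optimized) when
# safe, otherwise fall back to the generic Enumerator.product (Ruby 3.2+).
def product_sketch(first, *others)
  if [first, *others].all? { |e| plain_array?(e) }
    first.product(*others)                    # fast path
  else
    Enumerator.product(first, *others).to_a   # generic path
  end
end

from_arrays = product_sketch([1, 2], [3, 4])
from_range  = product_sketch(1..2, [3, 4])
# both are [[1, 3], [1, 4], [2, 3], [2, 4]]
```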

Updated by zverok (Victor Shepelev) 11 months ago

I couldn't take the time to work on this, sorry.

Yeah, same :(

It seems this has to wait till 3.4 (especially considering the need to think about optimizations), I'll work on this in January, hopefully.

Sorry, I had a peculiar year, so the changes I proposed at the beginning of it weren't in my focus during most of it.

Updated by hsbt (Hiroshi SHIBATA) 7 months ago

  • Status changed from Open to Assigned