Bug #21780
Open
Change the default size of Enumerator.produce back to infinity
Description
In #21701 a new argument size: was introduced, and its default value is nil (unknown).
While I support the new argument, I'd argue that the default should be Float::INFINITY.
Reasoning: By design, Enumerator.produce is infinite (there is no internal condition to stop iteration), and the simplest, most straightforward usages of the method would produce definitely infinite iterators, which the user can then limit with take, take_while, or similar methods.
Producing an enumerator that stops by itself requires explicitly raising StopIteration, which I expect to be a (slightly) advanced technique, and those who use it might be more inclined to provide additional arguments to clarify the semantics.
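As a minimal sketch of the two styles described above (the variable names are illustrative only, and the snippet deliberately avoids the new size: argument so it runs on older Rubies too):

```ruby
# An endless enumerator: nothing inside the block ever stops it.
naturals = Enumerator.produce(1, &:succ)

# The caller limits it from the outside.
first_three = naturals.take(3)                     # => [1, 2, 3]
below_four  = naturals.take_while { |n| n < 4 }    # => [1, 2, 3]

# A self-terminating enumerator: the block has to raise StopIteration.
bounded = Enumerator.produce(1) do |n|
  raise StopIteration if n >= 5
  n + 1
end

bounded_values = bounded.to_a                      # => [1, 2, 3, 4, 5]
```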
While Enumerator#size is hardly frequently used now (other than in #to_set, which started the discussion), it might be in the future, and I believe it is better to stick with more user-friendly defaults.
Now:
# very trivial enumerator, but if you want it to have "proper" size, you need
# to not forget to use an elaborate argument and type additional 21 characters
Enumerator.produce(1, size: Float::INFINITY, &:succ)
# already non-trivial enumerator, which is hardly frequently used, but the
# current defaults correspond to its semantics:
Enumerator.produce(Date.today) {
  raise StopIteration if it.tuesday? && it.day.odd?
  it + 1
}
With my proposal:
# trivial, most widespread case:
Enumerator.produce(1, &:succ).size #=> Infinity
# non-trivial case, with the enumerator designer clarifying their
# intention that "we are sure it stops somewhere":
Enumerator.produce(Date.today, size: nil) {
  raise StopIteration if it.tuesday? && it.day.odd?
  it + 1
}
Updated by Eregon (Benoit Daloze) 2 days ago
· Edited
I disagree on this one, as written on https://bugs.ruby-lang.org/issues/21701#note-3
I think Enumerator#size should only be non-nil when it is known to be the exact size.
In this case it is not known if it is infinite, so returning Float::INFINITY for the size is "wrong".
One use case I know of for Enumerator#size is to do a progress bar while iterating the Enumerator.
That can only work reliably if the non-nil size is the exact size.
Returning Float::INFINITY when it is not would be misleading, though of course returning nil won't give the actual size, which might simply be not known.
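A rough sketch of the progress-bar use case, assuming a hypothetical helper named progress_labels (not part of Ruby). It shows a total only when the size is an exact Integer and falls back to "?" otherwise:

```ruby
# Hypothetical helper: builds "current/total" labels while iterating.
# The total is only trusted when Enumerator#size returns an Integer.
def progress_labels(enum)
  total = enum.size
  enum.each_with_index.map do |_item, index|
    total.is_a?(Integer) ? "#{index + 1}/#{total}" : "#{index + 1}/?"
  end
end

sized   = [10, 20, 30].each                        # size is 3
unsized = Enumerator.new { |y| y << :a << :b }     # size is nil

sized_labels   = progress_labels(sized)            # => ["1/3", "2/3", "3/3"]
unsized_labels = progress_labels(unsized)          # => ["1/?", "2/?"]
```

If the size were reported as a number that is not the real one, the labels would be misleading, which is the concern raised here.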
BTW, your examples use Enumerator.new but the text seems to be about Enumerator.produce.
I think either way it applies to both the same way though.
Updated by Eregon (Benoit Daloze) 2 days ago
Actually for Enumerator.new, it's trivial to not be infinite and does not even need StopIteration, e.g.:
Enumerator.new { |y| y << 1 }.count # => 1
So I guess you meant to use Enumerator.produce instead in your examples above.
Updated by zverok (Victor Shepelev) 2 days ago
· Edited
- Description updated (diff)
I think Enumerator#size should only be non-nil when it is known to be the exact size.
In this case it is not known if it is infinite, so returning Float::INFINITY for the size is "wrong".
I would argue that it is known to be infinite. That's how produce works: it loops infinitely unless explicitly stopped by an exception; there is no way to stop it other than an exceptional one (while this might seem to be a dumb pun, I actually think that we have a useful distinction here).
So I would argue that the default expectation of the user is to "not think about it and trust Ruby to do the sane thing", and the sane thing is "produce is infinite unless you raise that specific exception" (even break wouldn't work... which is kinda unpleasant, but a discussion for another time).
In a rare situation when they'd question the behavior, there is a clearly documented way to adjust it.
BTW, your examples use Enumerator.new but the text seems to be about Enumerator.produce.
Yes, thank you, fixed. The title said what I meant but the code was broken, sorry.
Updated by zverok (Victor Shepelev) 1 day ago
There are, by the way, other effects of the current default that are, even if minor, still annoying:
Enumerator.produce(1, &:succ).lazy.take(6).size
# Ruby 3.4: => 6 -- which is correct and useful
# Ruby 4.0: => nil -- which is ... less useful
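The same effect can be sketched without Enumerator.produce by using Enumerator.new, whose optional size argument has long been available. The two sources below are illustrative stand-ins for "declared infinite" and "size unknown":

```ruby
# Stand-in for an enumerator that declares itself infinite.
declared_infinite = Enumerator.new(Float::INFINITY) do |y|
  n = 0
  loop { y << (n += 1) }
end

# Stand-in for an enumerator whose size is unknown (nil).
unknown_size = Enumerator.new do |y|
  n = 0
  loop { y << (n += 1) }
end

infinite_take = declared_infinite.lazy.take(6)
unknown_take  = unknown_size.lazy.take(6)

infinite_take.size   # => 6
unknown_take.size    # => nil

# The elements are the same either way; only the reported size differs.
infinite_take.to_a   # => [1, 2, 3, 4, 5, 6]
unknown_take.to_a    # => [1, 2, 3, 4, 5, 6]
```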
Updated by mame (Yusuke Endoh) about 6 hours ago
- Related to Feature #21701: Enumerator.produce accepts an optional `size` keyword argument added
Updated by knu (Akinori MUSHA) about 5 hours ago
The argument that Enumerator.produce is infinite by nature is certainly valid. However, the change that made Enumerator#to_set refuse to operate when the size returns infinity introduced a compatibility issue: it breaks existing code that relies on being able to call #to_set on an Enumerator.produce enumerator that the programmer knows is finite.
Redefining Enumerator::Produce#to_set to ignore the size is one way, but that would be awkward and incorrect in the long term. This decision was made after balancing backward compatibility against what the default size of produce() should be.
Updated by zverok (Victor Shepelev) about 3 hours ago
However, the change that made Enumerator#to_set refuse to operate when the size returns infinity introduced a compatibility issue
TBH, I don't see the compatibility argument applied with any consistency here.
Let's imagine several cases:
1. Somebody relies on constructing elaborate Enumerator.produce-based enumerators that throw StopIteration to terminate (instead of using simpler techniques), and then applies #to_set to them. In this feature, we are keeping compatibility for them.
2. Somebody uses Enumerator.produce alongside other types of enumerators. In some branch of their code, they do raise "Can't do this operation" if enum.size == Float::INFINITY. The compatibility is broken for them.
3. Somebody relies on Enumerator.produce { ... }.take(5) to have non-nil size, throwing it around as a duck-typed array. The compatibility is broken for them.
4. (Just to expand the scope of possible compatibility studies) Somebody might've suddenly had code like this, and it is now also broken:
Enumerator.produce(size: 5) { it.merge(size: it[:size] + 1) }.take(8) #=> [{size: 5}, {size: 6}, {size: 7}, {size: 8}, {size: 9}, {size: 10}, {size: 11}, {size: 12}]
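To make the guard from the case about mixing Enumerator.produce with other enumerators concrete, here is a sketch with a hypothetical materialize helper. Enumerator.new stands in for the different size defaults, so nothing here depends on the new size: argument:

```ruby
# Hypothetical guard: refuse to materialize enumerators that report an infinite size.
def materialize(enum)
  raise ArgumentError, "Can't do this operation" if enum.size == Float::INFINITY
  enum.to_a
end

finite  = Enumerator.new(3) { |y| y << 1 << 2 << 3 }
endless = Enumerator.new(Float::INFINITY) { |y| loop { y << 1 } }
unknown = Enumerator.new { |y| loop { y << 1 } }   # size is nil

finite_values = materialize(finite)                # => [1, 2, 3]

rejected = begin
  materialize(endless)
  false
rescue ArgumentError
  true
end

# With a nil size the guard no longer fires, so materialize(unknown)
# would never return. It is intentionally not called here.
guard_skipped = unknown.size != Float::INFINITY
```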
Intuitively, I would say that (2) is the most basic case that shouldn't be broken; (3) is (weak) evidence of the same; while (1) and (4) are both "collateral damage" that should be accepted (because if we treat compatibility with any real rigor, no changes should be made at all, "any change breaks somebody's usecase").
Is there any study that I am not aware of that says that (1) is the widespread case and breaking it will outrage a huge part of the community, while breaking 2-3 (as well as the general semantics of the method) is negligible?
Or maybe there is some evidence/discussion that it is authors of elaborate enumerators who wouldn't understand a very small (and well-explained semantically) change to fix the incompatibility, while those who would expect this enumerator to be infinite should just swallow it and add size: Float::INFINITY maybe in a dozen places in their code?
What am I missing here?