Bug #21780: Change the default size of Enumerator.produce back to infinity

Added by zverok (Victor Shepelev) 2 days ago. Updated about 1 hour ago.

Status: Open
Assignee: -
Target version: -
[ruby-core:124190]

Description

In #21701 a new argument size: was introduced, and its default value is nil (unknown).

While I support the new argument, I'd argue that the default should be Float::INFINITY.

Reasoning: By design, Enumerator.produce is infinite (there is no internal condition to stop iteration), and the simplest, most straightforward usages of the method produce definitely infinite iterators, which the user can then limit with take, take_while, or similar methods.

To produce the enumerator that will stop by itself requires explicit raising of StopIteration, which I expect to be a (slightly) advanced technique, and those who use it might be more inclined to provide additional arguments to clarify the semantics.
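The two usage styles described above can be sketched as follows. This is a minimal illustration written for this thread, not code from the proposal; the variable names are made up, and explicit block parameters are used instead of `it` so that it also runs on Ruby versions older than 3.4.

```ruby
# Endless by default: the caller bounds the sequence from outside.
naturals   = Enumerator.produce(1, &:succ)
first_five = naturals.take(5)
below_ten  = naturals.take_while { |n| n < 10 }

# Self-terminating: the block itself raises StopIteration to end the sequence.
countdown = Enumerator.produce(3) { |n| n <= 1 ? raise(StopIteration) : n - 1 }
countdown_values = countdown.to_a

p first_five        # [1, 2, 3, 4, 5]
p below_ten         # [1, 2, 3, 4, 5, 6, 7, 8, 9]
p countdown_values  # [3, 2, 1]
```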

While Enumerator#size is rarely used now (other than in #to_set, which started the discussion), it might be used more in the future, and I believe it is better to stick with more user-friendly defaults.

Now:

# very trivial enumerator, but if you want it to have "proper" size, you need 
# to not forget to use an elaborate argument and type additional 21 characters
Enumerator.produce(1, size: Float::INFINITY, &:succ)

# already non-trivial enumerator, which is hardly frequently used, but the 
# current defaults correspond to its semantics:
Enumerator.produce(Date.today) {
  raise StopIteration if it.tuesday? && it.day.odd?
  it + 1
}

With my proposal:

# trivial, most widespread case:
Enumerator.produce(1, &:succ).size #=> Infinity

# non-trivial case, with the enumerator designer clarifying their
# intention that "we are sure it stops somewhere":
Enumerator.produce(Date.today, size: nil) {
  raise StopIteration if it.tuesday? && it.day.odd?
  it + 1
}

Related issues 2 (0 open, 2 closed)

Related to Ruby - Feature #21701: Enumerator.produce accepts an optional `size` keyword argument (Closed, knu (Akinori MUSHA))
Related to Ruby - Bug #21654: Set#new calls extra methods compared to previous versions (Closed)

Updated by Eregon (Benoit Daloze) 2 days ago · Edited Actions #1 [ruby-core:124194]

I disagree on this one, as written on https://bugs.ruby-lang.org/issues/21701#note-3

I think Enumerator#size should only be non-nil when it is known to be the exact size.
In this case it is not known if it is infinite, so returning Float::INFINITY for the size is "wrong".

One use case I know of for Enumerator#size is to do a progress bar while iterating the Enumerator.
That can only work reliably if the non-nil size is the exact size.
Returning Float::INFINITY when it is not would be misleading, though of course returning nil won't give the actual size, which might simply be not known.
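A rough sketch of the progress-bar use case, to show why only an exact size is useful there. `each_with_progress` is a hypothetical helper invented for this illustration; it assumes that both nil and Float::INFINITY should be displayed as "unknown total".

```ruby
# Hypothetical helper: yields each element together with a progress label.
def each_with_progress(enum)
  total = enum.size
  # nil (unknown) and Float::INFINITY both fail this check.
  exact = total.is_a?(Integer)
  enum.each_with_index do |item, index|
    label = exact ? "#{index + 1}/#{total}" : "#{index + 1}/?"
    yield item, label
  end
end

# An Array-backed enumerator knows its exact size.
labels = []
each_with_progress([10, 20, 30].each) { |_item, label| labels << label }

# Enumerator.new without a size argument reports nil.
unknown = Enumerator.new { |y| y << :a << :b }
unknown_labels = []
each_with_progress(unknown) { |_item, label| unknown_labels << label }

p labels          # ["1/3", "2/3", "3/3"]
p unknown_labels  # ["1/?", "2/?"]
```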

BTW, your examples use Enumerator.new but the text seems to be about Enumerator.produce.
I think either way it applies to both the same way though.

Updated by Eregon (Benoit Daloze) 2 days ago Actions #2 [ruby-core:124195]

Actually for Enumerator.new, it's trivial to not be infinite and does not even need StopIteration, e.g.:

Enumerator.new { |y| y << 1 }.count # => 1

So I guess you meant to use Enumerator.produce instead in your examples above.

Updated by zverok (Victor Shepelev) 2 days ago · Edited Actions #3 [ruby-core:124196]

  • Description updated (diff)

I think Enumerator#size should only be non-nil when it is known to be the exact size.
In this case it is not known if it is infinite, so returning Float::INFINITY for the size is "wrong".

I would argue that it is known to be infinite. That's how produce works: it loops infinitely unless explicitly stopped by an exception. There is no way to stop it other than an exceptional one (while this might seem to be a dumb pun, I actually think that we have a useful distinction here).

So I would argue that the default expectation of the user is to "not think about it and trust Ruby to do the sane thing", and the sane thing is "produce is infinite unless you raise that specific exception" (even break wouldn't work... which is kinda unpleasant, but a discussion for another time).

In a rare situation when they'd question the behavior, there is a clearly documented way to adjust it.

BTW, your examples use Enumerator.new but the text seems to be about Enumerator.produce.

Yes, thank you, fixed. The title said what I meant but the code was broken, sorry.

Updated by zverok (Victor Shepelev) 2 days ago Actions #4 [ruby-core:124199]

There are, by the way, other effects of the current default that are, even if minor, still annoying:

Enumerator.produce(1, &:succ).lazy.take(6).size
# Ruby 3.4: => 6    -- which is correct and useful
# Ruby 4.0: => nil  -- which is ... less useful
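The effect can be reproduced independently of the Ruby version by declaring the size explicitly on Enumerator.new, which has long accepted a size argument. A small sketch; `counting_enum` is a helper made up for this illustration, and it stands in for Enumerator.produce only in the sense that it yields 1, 2, 3, ... forever.

```ruby
# Hypothetical helper: an endless counter whose *declared* size is a parameter.
def counting_enum(size)
  Enumerator.new(size) { |y| n = 0; loop { y << (n += 1) } }
end

# Lazy#take can report its own size when the source size is infinite...
size_when_infinite = counting_enum(Float::INFINITY).lazy.take(6).size
# ...but has to give up when the source size is unknown (nil).
size_when_unknown = counting_enum(nil).lazy.take(6).size

p size_when_infinite  # 6
p size_when_unknown   # nil
```

The elements produced are identical in both cases; only the size bookkeeping differs.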

Updated by mame (Yusuke Endoh) about 13 hours ago Actions #5

  • Related to Feature #21701: Enumerator.produce accepts an optional `size` keyword argument added

Updated by knu (Akinori MUSHA) about 12 hours ago Actions #6 [ruby-core:124223]

The argument that Enumerator.produce is infinite by nature is certainly valid. However, the change that made Enumerator#to_set refuse to operate when the size returns infinity introduced a compatibility issue: it breaks existing code that relies on being able to call #to_set on an Enumerator.produce enumerator that the programmer knows is finite.

Redefining Enumerator::Produce#to_set to ignore the size is one way, but that would be awkward and incorrect in the long term. This decision was made after balancing backward compatibility against what the default size of produce() should be.

Updated by zverok (Victor Shepelev) about 11 hours ago Actions #7 [ruby-core:124224]

However, the change that made Enumerator#to_set refuse to operate when the size returns infinity introduced a compatibility issue

TBH, I don't see the compatibility argument applied with any consistency here.

Let's imagine several cases:

  1. Somebody relies on constructing elaborate Enumerator.produce-based enumerators that throw StopIteration to terminate (instead of using simpler techniques), and then applies #to_set to them. In this feature, we are keeping compatibility for them.

  2. Somebody uses Enumerator.produce alongside other types of enumerators. In some branch of their code, they do raise "Can't do this operation" if enum.size == Float::INFINITY. The compatibility is broken for them.

  3. Somebody relies on Enumerator.produce { ... }.take(5) to have non-nil size, throwing it around as a duck-typed array. The compatibility is broken for them.

  4. (Just to expand the scope of possible compatibility studies) Somebody might've suddenly had code like this, and it is now also broken:

    Enumerator.produce(size: 5) { it.merge(size: it[:size] + 1) }.take(8)
    #=> [{size: 5}, {size: 6}, {size: 7}, {size: 8}, {size: 9}, {size: 10}, {size: 11}, {size: 12}]
    

Intuitively, I would say that (2) is the most basic case that shouldn't be broken; (3) is (weak) evidence for the same; while (1) and (4) are both "collateral damage" that should be accepted (because if we treat compatibility with any real rigor, no changes should be made at all, "any change breaks somebody's usecase").

Is there any study that I am not aware of that says that (1) is the widespread case and breaking it will outrage a huge part of the community, while breaking 2-3 (as well as the general semantics of the method) is negligible?

Or maybe there is some evidence/discussion that it is authors of elaborate enumerators who wouldn't understand a very small (and well-explained semantically) change to fix the incompatibility, while those who would expect this enumerator to be infinite should just swallow it and add size: Float::INFINITY maybe in a dozen places in their code?

What am I missing here?

Updated by knu (Akinori MUSHA) about 6 hours ago · Edited Actions #8 [ruby-core:124236]

zverok (Victor Shepelev) wrote in #note-7:

TBH, I don't see the compatibility argument applied with any consistency here.

Let's imagine several cases:

  1. Somebody relies on constructing elaborate Enumerator.produce-based enumerators that throw StopIteration to terminate (instead of using simpler techniques), and then applies #to_set to them. In this feature, we are keeping compatibility for them.

This is where you are mistaken.

  • Ruby 3.4.7

    % ruby -ve 'e=Enumerator.produce(1) {it>=3 ? raise(StopIteration) : it+1}; p e.to_set'
    ruby 3.4.7 (2025-10-08 revision 7a5688e2a2) +PRISM [arm64-darwin25]
    #<Set: {1, 2, 3}>
    
  • Ruby master

    % ruby -ve 'e=Enumerator.produce(1) {it>=3 ? raise(StopIteration) : it+1}; p e.to_set'
    ruby 4.0.0dev (2025-12-16T10:52:45Z master 2b1a9afbfb) +PRISM [arm64-darwin25]
    Set[1, 2, 3]
    

    (compatible; ignoring the string representation difference)

  • Ruby master with 79a6ec74831cc47d022b86dfabe3c774eaaf91ca reverted

    % ruby -ve 'e=Enumerator.produce(1) {it>=3 ? raise(StopIteration) : it+1}; p e.to_set'
    ruby 4.0.0dev (2025-12-16T10:52:45Z master 2b1a9afbfb) +PRISM [arm64-darwin25]
    -e:1:in 'Enumerator#to_set': cannot convert an infinite enumerator to a set (ArgumentError)
            from -e:1:in '<main>'
    

    (compatibility broken)

Updated by knu (Akinori MUSHA) about 5 hours ago Actions #9 [ruby-core:124237]

I may have misread your comment. Let me review again.

Updated by zverok (Victor Shepelev) about 5 hours ago Actions #10 [ruby-core:124238]

This is where you are mistaken.

No, that's exactly what I've meant:

# case 1:

e = Enumerator.produce(1, size: Float::INFINITY) {it>=3 ? raise(StopIteration) : it+1}
p e.to_set
# ruby 3.4: #<Set: {1, 2, 3}>
# master: Set[1, 2, 3]

# master with my proposed fix (size: Infinity by default):
#  cannot convert an infinite enumerator to a set (ArgumentError) -- compatibility broken
# fix for the code authors:
e = Enumerator.produce(1, size: nil) {it>=3 ? raise(StopIteration) : it+1} # -- clear change, easy to explain
# Another possible fix, "come to think about it, I don't need StopIteration!"
e = Enumerator.produce(1) { it + 1 }.take_while { it <= 3 }.to_set #=> Set[1, 2, 3]

# case 2:

e = Enumerator.produce(1, &:succ)
if e.size == Float::INFINITY
  puts "Early stop processing!" # in reality, probably raise/early return
else
  puts "Continue processing"
end

# ruby 3.4: "Early stop processing"
# master: "Continue processing" -- compatibility broken
# fix for the code authors:
Enumerator.produce(1, size: Float::INFINITY, &:succ) # -- ugly change to trivial code
# master with my proposed fix (size: Infinity by default): "Early stop processing" -- compatibility OK

# case 3:

e = Enumerator.produce(1, &:succ).lazy.take(6)
p e.size

# ruby 3.4: 6
# master: nil -- compatibility broken
# master with my proposed fix: 6 -- compatibility OK

Why do we consider that case (1) of those is the one where we preserve compatibility, while two other should be broken, and not vice versa?

Updated by Eregon (Benoit Daloze) about 5 hours ago · Edited Actions #11 [ruby-core:124241]

These are good points about compatibility; I agree that losing the size on .take is not good.

IMO trying to detect "infinite loop" (in https://bugs.ruby-lang.org/issues/21654) is kinda pointless, it's the halting problem, for the vast majority of cases we cannot know.
Enumerators which have a .size => Float::INFINITY may or may not be infinite, they might use break (and yet not adjust .size), throw an exception (can't be fully predicted and always possible), etc.

So I think a good fix would be to keep Enumerator.produce.size as Float::INFINITY, and stop assuming that enum.size.infinite? means infinite loop to iterate since it's not guaranteed (similar thinking as in this comment).
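As an illustration of not trusting the declared size, a caller that worries about endless input can bound the iteration itself instead of inspecting #size. `bounded_to_set` and its `limit:` keyword are hypothetical names made up for this sketch; nothing like it is proposed in this ticket.

```ruby
require "set"

# Hypothetical helper: converts to a Set, but gives up after `limit` elements
# instead of consulting enum.size, which may be nil, exact, or "wrong".
def bounded_to_set(enum, limit: 1_000)
  items = enum.first(limit + 1)
  raise ArgumentError, "more than #{limit} elements" if items.size > limit
  items.to_set
end

# Declared infinite, yet it terminates after three elements.
declared_infinite = Enumerator.new(Float::INFINITY) { |y| y << 1 << 2 << 2 }
result = bounded_to_set(declared_infinite)

p result
```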

Updated by Eregon (Benoit Daloze) about 5 hours ago Actions #12

  • Related to Bug #21654: Set#new calls extra methods compared to previous versions added

Updated by knu (Akinori MUSHA) about 4 hours ago · Edited Actions #13 [ruby-core:124243]

zverok (Victor Shepelev) wrote in #note-7:

  2. Somebody uses Enumerator.produce alongside other types of enumerators. In some branch of their code, they do raise "Can't do this operation" if enum.size == Float::INFINITY. The compatibility is broken for them.

I consider that kind of usage as a safeguard that usually works in development rather than something that should be relied upon in production code. However, the same goes for the change in Enumerator#to_set, so we might want to consider reverting it in the first place.

  3. Somebody relies on Enumerator.produce { ... }.take(5) to have non-nil size, throwing it around as a duck-typed array. The compatibility is broken for them.

Enumerator.produce { ... }.take(5) returns an array. Did you mean Enumerator.produce { ... }.lazy.take(5)? If so, then yes, the compatibility is broken for them.
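For reference, the difference between the two calls can be checked directly. A minimal sketch with made-up variable names:

```ruby
# Eager take consumes the first five elements immediately and returns an Array.
eager = Enumerator.produce(1, &:succ).take(5)

# Lazy take returns another lazy enumerator; nothing is consumed until forced.
lazy = Enumerator.produce(1, &:succ).lazy.take(5)

p eager.class  # Array
p lazy.class   # Enumerator::Lazy
p lazy.to_a    # [1, 2, 3, 4, 5]
```

Only the lazy form has a #size that depends on the size reported by the source enumerator.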

  4. (Just to expand the scope of possible compatibility studies) Somebody might've suddenly had code like this, and it is now also broken:
    Enumerator.produce(size: 5) { it.merge(size: it[:size] + 1) }.take(8)
    #=> [{size: 5}, {size: 6}, {size: 7}, {size: 8}, {size: 9}, {size: 10}, {size: 11}, {size: 12}]
    

This should have never been allowed. It was my mistake I did not prohibit it.

Intuitively, I would say that (2) is the most basic case that shouldn't be broken; (3) is (weak) evidence for the same; while (1) and (4) are both "collateral damage" that should be accepted (because if we treat compatibility with any real rigor, no changes should be made at all, "any change breaks somebody's usecase").

I think (3) is more important than (2), and I'm not absolutely sure about how (1) and (3) compare in importance. If the concern about (1) could be eliminated by reverting Enumerator#to_set, we would be able to save both.

Is there any study that I am not aware of that says that (1) is the widespread case and breaking it will outrage a huge part of the community, while breaking 2-3 (as well as the general semantics of the method) is negligible?

Or maybe there is some evidence/discussion that it is authors of elaborate enumerators who wouldn't understand a very small (and well-explained semantically) change to fix the incompatibility, while those who would expect this enumerator to be infinite should just swallow it and add size: Float::INFINITY maybe in a dozen places in their code?

I don't have any data about how widespread each of these use cases is, or even how Enumerator.produce is used in production code. I just know Enumerator.produce is still immature and would love to hear from more users about how they use it.

Updated by knu (Akinori MUSHA) about 4 hours ago Actions #14 [ruby-core:124244]

I'm leaning toward doing these:

  • Removing the Enumerator#to_set override that refuses to work against an infinite enumerator as a safeguard
  • Reverting the default size of Enumerator.produce from nil to infinity

This way we can guarantee 100% backward compatibility at the expense of some safety against infinite enumerators with #to_set (and potentially #to_a in the future).

What do you think?

Updated by zverok (Victor Shepelev) about 4 hours ago Actions #15 [ruby-core:124245]

knu (Akinori MUSHA) wrote in #note-14:

I'm leaning toward doing these:

  • Removing the Enumerator#to_set override that refuses to work against an infinite enumerator as a safeguard
  • Reverting the default size of Enumerator.produce from nil to infinity

This way we can guarantee 100% backward compatibility at the expense of some safety against infinite enumerators with #to_set (and potentially #to_a in the future).

What do you think?

IMO, this seems like a great compromise!

Updated by mame (Yusuke Endoh) about 3 hours ago Actions #16 [ruby-core:124248]

I couldn't find any cases where Enumerator#size returns Float::INFINITY for a finite-length Enumerator, except Enumerator.produce and when explicitly creating a fake-sized enumerator with Enumerator.new(Float::INFINITY). Am I missing something?

raise StopIteration is indeed tricky, but even so, if it could be finite, I think it is correct for Enumerator.produce's size to return nil (the size is unknown).

What problem do you have if the size is nil?

Updated by knu (Akinori MUSHA) about 3 hours ago Actions #17 [ruby-core:124250]

mame (Yusuke Endoh) wrote in #note-16:

What problem do you have if the size is nil?

Among others, Enumerator.produce(1) { it+1 }.lazy.take(5).size now returns nil, where it previously returned 5.

https://github.com/ruby/ruby/blob/6b35f074bd83794007d4c7b773a289bddef0dbdf/enumerator.c#L2433

Updated by mame (Yusuke Endoh) about 3 hours ago Actions #18 [ruby-core:124251]

With Ruby 3.4:

Enumerator.produce(1) { raise StopIteration }.lazy.take(5).size      #=> 5
Enumerator.produce(1) { raise StopIteration }.lazy.take(5).to_a.size #=> 1

I believe it is fair to call this behavior a bug.

Updated by knu (Akinori MUSHA) about 3 hours ago Actions #19 [ruby-core:124252]

The point of this discussion is that fixing the "bug" broke compatibility, and we are comparing the impacts. That "bug" can or should be properly fixed by changing the code to Enumerator.produce(1, size: nil) { raise StopIteration }.lazy.take(5) after 4.0 is out.

Updated by mame (Yusuke Endoh) about 2 hours ago Actions #20 [ruby-core:124253]

Yes, all bug fixes could be incompatibilities. What I am interested in is whether this incompatibility is actually causing a real problem, and if so, what kinds of applications or libraries were affected.

Updated by zverok (Victor Shepelev) about 2 hours ago Actions #21 [ruby-core:124254]

@mame (Yusuke Endoh) My thinking (already outlined above) goes this way:

  • while Enumerator#size, I believe, is not used extensively, it might be in the future (this very change of the Enumerator.produce call sequence brings some attention to it)
  • Enumerator::Lazy#take respecting the infinite size is a (weak) argument that Enumerator#size matters at least sometimes, and it is good to have it aligned with the user's intuitions
  • Enumerator.produce being an infinite enumerator corresponds to its design. Yes, it can be broken out of by an exception, but its more general behavior is "loop infinitely". So, basically:
    • if the developer relies on the default behavior (which produces an infinite sequence), they have the default size (infinity)
    • if they consciously adjust behavior, by throwing exceptions, it might be OK for them (if they care at all), to adjust size to nil/"unknown beforehand" or some known value.

The only reason the default for size: was chosen to be nil is that Enumerator.produce {...}.to_set was "broken". And, honestly, I don't think it is a good reason to muddy the semantics. @knu's proposal to just stop checking #size in Enumerator#to_set seems to resolve this.

Are there clear advantages to keep it nil now, if the enumerator is infinite by design?

TL;DR: I don't think many devs care, but for those who do, Infinity is more reasonable for this enumerator.

Updated by mame (Yusuke Endoh) about 2 hours ago Actions #22 [ruby-core:124255]

I understand zverok's feeling. In fact, I thought Enumerator.produce always returned an infinite Enumerator. I'm surprised it can be stopped by StopIteration. However, since that functionality actually exists, I think #size has no choice but to return nil.

while Enumerator#size, I believe, is not used extensively

I agree. And, that's precisely why I believe making #size return nil won't cause any real-world incompatibility issues.

Updated by zverok (Victor Shepelev) about 1 hour ago Actions #23 [ruby-core:124256]

However, since that functionality actually exists, I think #size has no choice but to return nil.

Respectfully, I disagree. I think it is much easier and more useful to explain it along the lines of...

It is infinite by implementation, but you can break it with exception (which is, well, for exceptional situations); if you care about size reporting, here is how you can adjust it (the new size: parameter introduced).

Yes, if you didn't pass it AND broke out of the enumerator, the #size "lies", but as the parameter exists, it is no different from the "lie" in this situation:

e = Enumerator.new(5) { |y| 2.times { y << it } }
e.size #=> 5
e.to_a.size #=> 2

If the user cares, they will appreciate "you have control, the defaults are sane for the default usage". I agree there might not be many who care, but the feature exists, and it exposes some semantics, and it might become more used in the future.

The discussion is mostly theoretical, I agree. But I believe that if Ruby 4.0 draws some attention to the sizes of Enumerator.produce enumerators (by introducing the new parameter), it is better to use the most useful defaults. And I do believe that Infinity is the most useful.
