Enumerable#size

Closed
Normal
[ruby-core:45805]

Description

Now that it has been made clear that Enumerable#count never calls #size and that we have Enumerable#lazy, let me propose again an API for a lazy way to get the size of an Enumerable: Enumerable#size.

call-seq:
  enum.size # => nil, Integer or Float::INFINITY

Returns the number of elements that will be yielded, without going through
the iteration (i.e. lazy), or +nil+ if it can't be calculated lazily.

  perm = (1..100).to_a.permutation(4)
  perm.size # => 94109400
  perm.each_cons(2).size # => 94109399
  loop.size # => Float::INFINITY
  [42].drop_while.size # => nil

About 66 core methods returning enumerators would have a lazy size, like each_slice, permutation or lazy.take.

A few would have size return nil:
Array#{r}index, {take|drop}_while
Enumerable#find{_index}, {take|drop}_while
IO: all methods
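
As a rough illustration of the split between sized and unsized enumerators, the sketch below compares a few of the methods named above. It assumes a Ruby version in which this feature is available; the variable names are only for illustration.

```ruby
numbers = (1..10).to_a

# Enumerators whose length follows from the receiver and the arguments alone.
slices  = numbers.each_slice(3).size   # => 4  (10 elements in groups of 3)
windows = numbers.each_cons(3).size    # => 8  (10 - 3 + 1)
pairs   = numbers.combination(2).size  # => 45 (10 choose 2)

# Enumerators whose length depends on what the block returns,
# so it cannot be known without iterating.
prefix = numbers.take_while.size       # => nil
lookup = numbers.find_index.size       # => nil
```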

Sized enumerators can also be created naturally by providing a block to to_enum/enum_for or a lambda to Enumerator.new.

Example for to_enum:

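class Integer
  def composition
    # Without a block, return an enumerator whose size is computed lazily.
    return to_enum(:composition) { 1 << (self - 1) } unless block_given?
    yield [] if zero?
    downto(1) do |i|
      (self - i).composition do |comp|
        yield [i, *comp]
      end
    end
  end
end

4.composition.to_a
# => [[4], [3, 1], [2, 2], [2, 1, 1], [1, 3], [1, 2, 1], [1, 1, 2], [1, 1, 1, 1]]
42.composition.size # => 2199023255552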

Example for Enumerator.new:

def lazy_product(*enums)
  sizer = ->{
    enums.inject(1) do |product, e|
      break if (size = e.size).nil?
      product * size
    end
  }
  Enumerator.new(sizer) do |yielder|
    # ... generate combinations
  end
end

lazy_product(1..4, (1..3).each_cons(2)).size # => 8
lazy_product(1..4, (1..3).cycle).size # => Float::INFINITY

• enumerator.c (enumerator_initialize): Warn when using deprecated form [Feature #6636]

• enumerator: New method #size; constructor accepts size
[Feature #6636]

• include/ruby/intern.h: RETURN_SIZED_ENUMERATOR for support of
sized enumerators

• enumerator.c (obj_to_enum): Have #to_enum accept a block [Feature #6636]

• enumerator.c: Support #size for enumerators created from enumerators [Feature #6636]

• array.c: Support for Enumerator#size in trivial cases: each, each_index, reverse_each, sort_by, collect, collect!, select, select!, keep_if, reject, reject!, delete_if [Feature #6636]

• array.c (rb_ary_permutation): Support for Array#permutation.size [Feature #6636]

• array.c (rb_ary_combination): Support for Array#combination.size [Feature #6636]

• array.c (rb_ary_repeated_permutation): Support for repeated_permutation.size [Feature #6636]

• array.c (rb_ary_repeated_combination): Support for repeated_combination.size [Feature #6636]

• array.c (rb_ary_cycle): Support for Array#cycle.size [Feature #6636]

• vm_eval.c (rb_f_loop): Support for loop.size [Feature #6636]

• enum.c: Support for enumerators created by Enumerable with forwarding: find_all, reject, ... [Feature #6636]

• enum.c (enum_each_slice): Support for Enumerable#each_slice.size [Feature #6636]

• enum.c (enum_each_cons): Support for Enumerable#each_cons.size [Feature #6636]

• enum.c (enum_cycle): Support for Enumerable#cycle.size [Feature #6636]

• hash.c: Support for enumerators created by Hash: delete_if, reject!, ... [Feature #6636]

• hash.c: Support for enumerators created by ENV: each, each_value, ... [Feature #6636]

• struct.c: Support for Struct's enumerators #size [Feature #6636]

• numeric.c: Extract ruby_float_step_size [Feature #6636]

• numeric.c (num_step): Support for Numeric#step.size [Feature #6636]

• range.c: Support for Range#size and Range#each.size [Feature #6636]

• range.c: Support for range.step.size [Feature #6636]

• numeric.c (int_upto, int_downto): Support for Integer#{down|up}to.size [Feature #6636]

• numeric.c (int_dotimes): Support for Integer#times.size [Feature #6636]

• string.c: Support for String#{each_byte,each_char,each_codepoint}.size [Feature #6636]

• enumerator.c: Support for lazy.size [Feature #6636]

• enumerator.c: Support for lazy.{map|flat_map|...}.size [Feature #6636]

• enumerator.c: Support for lazy.take.size [Feature #6636]

• enumerator.c: Add support for lazy.drop.size [Feature #6636]

• enumerator.c: Support for lazy.cycle.size [Feature #6636]

• NEWS: Update for lazy size evaluation [Feature #6636]

History

Updated by marcandre (Marc-Andre Lafortune) over 7 years ago

Attaching one-minute slide

Updated by mame (Yusuke Endoh) over 7 years ago

• Status changed from Open to Assigned

Yusuke Endoh mame@tsg.ne.jp

Updated by naruse (Yui NARUSE) over 7 years ago

diff --git a/enumerator.c b/enumerator.c
index f01ddd5..8e3ae9a 100644
--- a/enumerator.c
+++ b/enumerator.c
@@ -942,6 +942,32 @@ enumerator_inspect(VALUE obj)
 }

 /*
+ * call-seq:
+ *
+ * Returns the receiver of this enumerator.
+ */
+
+static VALUE
+enumerator_receiver(VALUE obj)
+{
+    struct enumerator *e;
+    VALUE eobj;
+
+    TypedData_Get_Struct(obj, struct enumerator, &enumerator_data_type, e);
+    if (!e || e->obj == Qundef) {
+        return Qnil;
+    }
+
+    if (NIL_P(eobj)) {
+        eobj = e->obj;
+    }
+
+    return eobj;
+}
+
+/*
  * Yielder
  */
 static void
@@ -1748,6 +1774,7 @@ InitVM_Enumerator(void)
     rb_define_method(rb_cEnumerator, "feed", enumerator_feed, 1);
     rb_define_method(rb_cEnumerator, "rewind", enumerator_rewind, 0);
     rb_define_method(rb_cEnumerator, "inspect", enumerator_inspect, 0);

     /* Lazy */
     rb_cLazy = rb_define_class_under(rb_cEnumerator, "Lazy", rb_cEnumerator);

irb(main):007:0> e="abcde".enum_for(:each_byte)
=> #<Enumerator: "abcde":each_byte>
=> nil
irb(main):010:0> e.size
=> 5

Updated by marcandre (Marc-Andre Lafortune) over 7 years ago

Hi,

On Sat, Jul 21, 2012 at 2:43 AM, naruse (Yui NARUSE) naruse@airemix.jp wrote:

I agree receiver could be helpful. One would also need the method and the arguments, as in my request #3714.

Still, it doesn't really address the issue.

If someone wants to write a library to output the progression, for example, it is still not possible for a general enumerable/enumerator.

The proposal is so that:

• it is standard so anyone can depend on it and also create their own enumerables/enumerators
• it can help in some calculations
• it can help to have generic progression reports
• etc.

Thanks
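
To make the progress-report use case concrete, here is a minimal sketch of a generic reporter built only on `size`. The helper names `progress_label` and `each_with_progress` and the `out:` parameter are hypothetical, not part of the proposal; the sketch assumes the enumerable passed in responds to `size` with an Integer, `Float::INFINITY`, or `nil`.

```ruby
# Hypothetical helper: format a progress line for the three possible
# kinds of size (unknown, infinite, finite).
def progress_label(done, total)
  case total
  when nil             then "#{done} done (total unknown)"
  when Float::INFINITY then "#{done} done (endless)"
  else format("%d/%d (%d%%)", done, total, total.zero? ? 100 : done * 100 / total)
  end
end

# Hypothetical helper: yield each element, then report progress to `out`.
def each_with_progress(enum, out: $stderr)
  total = enum.size # computed lazily, without iterating
  enum.each_with_index do |item, index|
    yield item
    out.puts progress_label(index + 1, total)
  end
end
```

With something like `each_with_progress([10, 20, 30, 40].each_slice(2)) { |pair| ... }`, the reporter can print a percentage because `each_slice` knows its size; for an enumerator whose size is `nil` it falls back to a plain count.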

Updated by mame (Yusuke Endoh) over 7 years ago

• Status changed from Assigned to Feedback

Marc-Andre Lafortune,

We discussed your slide at the developer meeting (7/21).

Matz was positive to the spec of return value: Integer,
Float::INFINITY, and nil.
However, we couldn't understand what API is proposed for creating
an Enumerator with size.

points:

• Enumerator.new(size) is not acceptable because of compatibility:

p Enumerator.new([1,2,3]).take(2) #=> [1, 2]

• We cannot determine the size of enumerator when creating it:

a = [1]
e = a.permutation
a << 2
p e.to_a #=> [[1, 2], [2, 1]]

So, the API may need to receive a code fragment that calculates
size, such as a Proc.

Yusuke Endoh mame@tsg.ne.jp
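
The mutation concern above can be checked directly. This sketch assumes a Ruby version where the feature has landed and the size is recomputed on every call rather than cached when the enumerator is created.

```ruby
a = [1]
e = a.permutation

size_before = e.size # => 1 (1! permutations of a one-element array)
a << 2
size_after = e.size  # => 2 (2! permutations, reflecting the mutation)

all = e.to_a         # => [[1, 2], [2, 1]]
```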

Updated by marcandre (Marc-Andre Lafortune) over 7 years ago

• Status changed from Feedback to Open

Hi,

mame (Yusuke Endoh) wrote:

Matz was positive to the spec of return value: Integer,
Float::INFINITY, and nil.

:-)

However, we couldn't understand what API is proposed for creating
an Enumerator with size.

points:

• Enumerator.new(size) is not acceptable because of compatibility:

p Enumerator.new([1,2,3]).take(2) #=> [1, 2]

Agreed.
I am proposing Enumerator.new(size_lambda){ block }, i.e. only if a block is given, then the first argument can be a lambda/proc that can lazily compute the size.

The old syntax of Enumerator.new without a block does not change meaning.

• We cannot determine the size of enumerator when creating it:

a = [1]
e = a.permutation
a << 2
p e.to_a #=> [[1, 2], [2, 1]]

So, the API may need to receive a code fragment that calculates
size, such as a Proc.

Agreed.
This is why I propose that to_enum accepts a block that can calculate the size, and Enumerator.new with a block can accept a lambda/proc for the same.

Marc-André
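
A small sketch of the `Enumerator.new(size_lambda){ block }` form, assuming the constructor accepts a callable as its first argument when a block is given. The names `source` and `doubled` are illustrative only.

```ruby
source = [1, 2, 3]

# The lambda is only called when #size is requested, so it sees later changes.
doubled = Enumerator.new(-> { source.size * 2 }) do |yielder|
  source.each { |x| yielder << x << x } # yield every element twice
end

first_size = doubled.size  # => 6
source << 4
second_size = doubled.size # => 8
elements = doubled.to_a    # => [1, 1, 2, 2, 3, 3, 4, 4]
```

Because the size is a piece of code rather than a number, it stays correct even when the underlying collection changes between creation and use.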

Updated by mame (Yusuke Endoh) over 7 years ago

Hello Marc-Andre

2012/7/24, marcandre (Marc-Andre Lafortune) ruby-core@marc-andre.ca:

• Enumerator.new(size) is not acceptable because of compatibility:

p Enumerator.new([1,2,3]).take(2) #=> [1, 2]

Agreed.
I am proposing Enumerator.new(size_lambda){ block }, i.e. only if a block
is given, then the first argument can be a lambda/proc that can lazily
compute the size.

This is just my guess, but matz will not like such a method whose
meaning of its argument varies depending on whether block is given
or not.

The old syntax of Enumerator.new without a block does not change meaning.

Is it okay that there is no way to specify size in this case?

• We cannot determine the size of enumerator when creating it:

a = [1]
e = a.permutation
a << 2
p e.to_a #=> [[1, 2], [2, 1]]

So, the API may need to receive a code fragment that calculates
size, such as a Proc.

Agreed.
This is why I propose that to_enum accepts a block that can calculate the
size, and Enumerator.new with a block can accept a lambda/proc for the
same.

What argument(s) will the lambda/proc receive?

--
Yusuke Endoh mame@tsg.ne.jp

Updated by marcandre (Marc-Andre Lafortune) over 7 years ago

Hi,

mame (Yusuke Endoh) wrote:

I am proposing Enumerator.new(size_lambda){ block }, i.e. only if a block
is given, then the first argument can be a lambda/proc that can lazily
compute the size.

This is just my guess, but matz will not like such a method whose
meaning of its argument varies depending on whether block is given
or not.

I understand the concern.

It could still be acceptable here because the other form is already documented as 'discouraged'. Maybe we should deprecate it?

Other possibility would be to add a different creator, e.g. Enumerator.sized(size_lambda){|yielder| ... }.

The old syntax of Enumerator.new without a block does not change meaning.

Is it okay that there is no way to specify size in this case?

This old syntax is already discouraged and to_enum/enum_for should be used instead.

This is why I propose that to_enum accepts a block that can calculate the
size, and Enumerator.new with a block can accept a lambda/proc for the
same.

What argument(s) will the lambda/proc receive?

We could consider passing the receiver and/or any arguments passed to to_enum, but I would propose to keep it simple and pass no arguments.

Marc-André
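
To illustrate why a size block without parameters can be enough: the block is a closure, so it can already see the receiver's state and the method's arguments. The `Playlist` class and `each_repeated` method below are hypothetical, and the sketch assumes `to_enum` accepts a size block as proposed.

```ruby
# Hypothetical class used only to illustrate a parameterless size block.
class Playlist
  def initialize(tracks)
    @tracks = tracks
  end

  # Yields every track, `times` times over.
  def each_repeated(times)
    # The block closes over @tracks and times; it declares no parameters.
    return to_enum(:each_repeated, times) { @tracks.size * times } unless block_given?
    times.times { @tracks.each { |track| yield track } }
  end
end

list = Playlist.new(%w[a b])
repeated = list.each_repeated(3)
repeated_size = repeated.size  # => 6
repeated_items = repeated.to_a # => ["a", "b", "a", "b", "a", "b"]
```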

• Assignee changed from matz (Yukihiro Matsumoto) to mame (Yusuke Endoh)

Updated by mame (Yusuke Endoh) about 7 years ago

• Status changed from Open to Assigned
• Assignee changed from mame (Yusuke Endoh) to matz (Yukihiro Matsumoto)
• Target version changed from 2.0.0 to 2.6

Updated by marcandre (Marc-Andre Lafortune) about 7 years ago

Hi.

My understanding was this new feature would make it into Ruby 2.0. Did I misunderstand?

The implementation can be seen here: https://github.com/marcandre/ruby/compare/marcandre:trunk...marcandre:enum_size

Although the combined diff (https://github.com/marcandre/ruby/compare/marcandre:trunk...marcandre:enum_size.diff) is pretty big, there are really just two interesting commits. The second adds #size and extends the constructor. The third extends to_enum to accept a block. See https://github.com/marcandre/ruby/commit/add_enumerator_size and https://github.com/marcandre/ruby/commit/sized_to_enum

The first commit only warns on using the deprecated form with no block Enumerator.new(obj, *args), but compatibility is maintained.

The remaining commits add support for the different ways of creating enumerators that can lazily evaluate their size. A few remain to be implemented, in particular the lazy ones.

Updated by marcandre (Marc-Andre Lafortune) about 7 years ago

I added support for lazy enumerators.
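
For reference, a short sketch of how sizes propagate through lazy chains, assuming a Ruby version that includes this support. Operations that keep or trim the element count by a known amount keep a size; filtering cannot.

```ruby
lazy = (1..10).lazy

mapped   = lazy.map { |x| x * 2 }.size # => 10  (one output per input)
taken    = lazy.take(3).size           # => 3
dropped  = lazy.drop(3).size           # => 7
filtered = lazy.select(&:even?).size   # => nil (depends on the block)
```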

Updated by matz (Yukihiro Matsumoto) about 7 years ago

After skimming your modifies, I feel they are decent.
Sorry for being late to check.

Matz.

Updated by marcandre (Marc-Andre Lafortune) about 7 years ago

• Assignee changed from matz (Yukihiro Matsumoto) to marcandre (Marc-Andre Lafortune)
• Target version changed from 2.6 to 2.0.0

Hi,

matz (Yukihiro Matsumoto) wrote:

After skimming your modifies, I feel they are decent.

Thanks for looking at them. Very happy to read this :-)

Marc-André

Updated by marcandre (Marc-Andre Lafortune) about 7 years ago

• Status changed from Assigned to Closed
• % Done changed from 0 to 100

This issue was solved with changeset r37495.
Marc-Andre, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.

• enumerator.c (enumerator_initialize): Warn when using deprecated form [Feature #6636]
