Feature #6808
closedImplicit index for enumerations
Description
=begin
One of the less lovely things about Ruby's otherwise elegant enumerables is the lack of ubiquitous access to the current index. Because of this, we end up with a bevy of extra methods that are little more than counter parts and compensation for other enumerable methods to gain access to the index. Examples include, #each_with_index, #each_index and (in many extension libraries) #collect_with_index. It is all rather wasteful, inelegant, and limiting. Heaven forbid we need a #select_with_index, or some other uncommon case.
No doubt this has had some discussion long in the past, but I would like revisit and offer a bnew more concrete proposal...
Thanks to Enumerator, we can now at least do:
[:a,:b,:c].each_with_index.map{ |e, i| [i, e] }
That's great, but it has obvious shortcomings. It's long winded and it has the overhead of an Enumerator object. Ideally we would want to do this instead:
[:a,:b,:c].map{ |e| [$i, e] }
Where $i is the implicit index. Now a global variable is surely the simplest solution. But, I can understand that some might object to the use of a global variable, despite the fact that this approach is common with regexp matches like $1, $2, etc. In that case, we could designate a new keyword. Lets call it index
.
[:a,:b,:c].map{ |e| [index, e] }
We might suffer a conflict here however if someone has already used "index" as a block argument. In that case we would need Ruby to allow it to be overridden, in the same sense that one can define a public method called class
, even though class
is a keyword in other contexts.
If this were all that we gained then I say it is a victory, but I'd like to consider also that we go a step further, and instead of having just "index", we have an iterative object. After all Ruby is an OOPL. In this case, the keyword would be it
and we could do:
[:a,:b,:c].map{ |e| [it.index, e] }
The nice thing about it
is that it can have a few other useful methods to improve readability of code, such as it.first?
and it.last?
(if size is known for the enumerable). I think this is awesome solution that grants the most readability and flexibility to the language.
Of course, having an iteration object might bring up concerns about performance, since it will add overhead to create a new iteration instance with every pass. This can be addressed by having the object be mutable, so all that needs to change is the index in the same object. A minor downside here, an it
can't be stored by reference between passes (e.g. prev_it = it
), but knowing this, #dup could be used if that was really necessary. If that isn't good enough to curb performance concerns, I would suggest a means of indicating the it
object be made available. We don't want to drag Enumerator into this so map.it{...}
is not the solution, but perhaps Ruby could recognize ;it
at the end of block arguments?
[:a,:b,:c].map{ |e; it| [it.index, e] }
Maybe that syntax can't work, but surely something along these lines could. Personally, I doubt the overhead of mutable it
is too much, but just in case.
To summarize, I propose an implicit mutable iteration object called it
that allows access to the enumerations index, plus convenience methods for querying the index. Or, if that is considered too much, then at least an implicit index, either as a global variable or a special keyword. Any of these choices would be a marked improvement, allowing us to avoid the endless proliferation of _with_index
methods.
=end
Updated by trans (Thomas Sawyer) over 12 years ago
Ixnay on the global variable. It was just pointed out to me that it would be problem for nested iterations. Keyword could still work though b/c it would be block local.
Updated by cjheath (Clifford Heath) over 12 years ago
The inconsistency between operations over Arrays and Hashes isn't always avoidable, but consider:
h = Hash[1, 2, 3, 4, 5, 6]
a = [1, 2, 3, 4, 5, 6]
h[3]
=> 4
h.detect{|k, v| k == 3 }
#=> [3, 4]
a[3]
#=> 3
a.detect{|v| v == 3 }
#=> 3
Surely the block passed to Array#detect should receive the index, and
the result of a Hash#detect should be a value, not a [key, value] pair?
I don't expect this to be changed, but perhaps it might inform proposed changes.
Clifford Heath.
Updated by drbrain (Eric Hodel) over 12 years ago
What about the overhead for an infinite enumerator that does not use the implicit index? Especially after a few days of CPU time for a frequently-used enumerator?
Updated by trans (Thomas Sawyer) over 12 years ago
@drbrain (Eric Hodel) It's a fair point. But I imagine the index already exists under the hood. Probably the question is, can it be made accessible with little overhead?
Perhaps it
could even be partially lazy, so full overhead doesn't even come into play unless used.
Updated by drbrain (Eric Hodel) over 12 years ago
There is no such index internal to Enumerator to make accessible.
each_with_index already exists to do this. The use of each_with_index is preferable to an implicit, lazy iteration count. The former is intention-revealing while the latter is not.
For this new feature a user may ask, "Where does this local variable come from? I didn't assign to it!" or, "Why was my 'it' (or 'index') variable overwritten with a number?" each_with_index does not have either of these problems.
Updated by trans (Thomas Sawyer) over 12 years ago
There is no such index internal to Enumerator to make accessible.
It not about Enumerator, its about #each itself. And there certainly is and index with #each method itself. e.g. https://github.com/ruby/ruby/blob/trunk/array.c#L1549
#each_with_index doesn't cut it at all, as it doesn't address all the other possibilities, #map_with_index, #select_with_index, etc., which is the point.
I really don't see the problem with "I didn't assign that". If its spec its expected, and we already have this sort of thing with regexp globals.
Updated by Anonymous over 12 years ago
On Jul 30, 2012, at 6:59 PM, trans (Thomas Sawyer) transfire@gmail.com wrote:
Issue #6808 has been updated by trans (Thomas Sawyer).
There is no such index internal to Enumerator to make accessible.
It not about Enumerator, its about #each itself. And there certainly is and index with #each method itself. e.g. https://github.com/ruby/ruby/blob/trunk/array.c#L1549
But that's only each for arrays. I doubt an each method for, say, linked lists would bother to implement an internal index.
#each_with_index doesn't cut it at all, as it doesn't address all the other possibilities, #map_with_index, #select_with_index, etc., which is the point.
I'm not clear why the #with_index method isn't adequate for your needs.
[1,2,3].map.with_index { |item, index| [item, index] }
--
-- Jim Weirich
-- jim.weirich@gmail.com
Updated by trans (Thomas Sawyer) over 12 years ago
=begin
I'm not clear why the #with_index method isn't adequate for your needs.
Yes, functionally your are right. I was just thinking that the overall overhead would be less if enumerator wasn't used. I ran some benchmarks: https://gist.github.com/3213779
EACH user system total real¶
each 3.840000 0.010000 3.850000 ( 3.843590)
enumerator each 5.130000 0.020000 5.150000 ( 5.156704)
each_with_index 5.650000 0.020000 5.670000 ( 5.662425)
each and manual index 5.190000 0.010000 5.200000 ( 5.206394)
enumerator each.with_index 6.500000 0.020000 6.520000 ( 6.519111)
enumerator each and manual index 6.480000 0.020000 6.500000 ( 6.501582)
MAP user system total real¶
map 5.210000 0.020000 5.230000 ( 5.230273)
enumerator map 9.450000 0.040000 9.490000 ( 9.491262)
map and manual index 6.600000 0.020000 6.620000 ( 6.629977)
enumerator map.with_index 8.270000 0.030000 8.300000 ( 8.291350)
enumerator map and manual index 11.210000 0.050000 11.260000 ( 11.256445)
Notice the results of the manual index without the enumerator --and that's in Ruby, not C code.
=end
Updated by drbrain (Eric Hodel) over 12 years ago
trans (Thomas Sawyer) wrote:
I'm not clear why the #with_index method isn't adequate for your needs.
Yes, functionally your are right. I was just thinking that the overall overhead would be less if enumerator wasn't used. I ran some benchmarks: https://gist.github.com/3213779
[…]
Notice the results of the manual index without the enumerator --and that's in Ruby, not C code.
Are the times of these benchmarks dominated by object creation or iteration? What happens if you run a small number of trials across a large array? (n = 26, a = (0...1000000).to_a)
No matter which method is faster, what happens to this code:
index = 10
offsets.each do |e|
index = e if condition e
break if index > 30
end
Does index equal 10 on the first execution of the block? Does it equal 0?
Updated by trans (Thomas Sawyer) over 12 years ago
=begin
Are the times of these benchmarks dominated by object creation or iteration? What happens if you run a small number of trials across a large array? (n = 26, a = (0...1000000).to_a)
You are correct that the difference would be less for large arrays and few iterations.
EACH user system total real¶
each 3.610000 0.050000 3.660000 ( 3.671551)
enumerator each 3.610000 0.030000 3.640000 ( 3.642515)
each_with_index 4.920000 0.020000 4.940000 ( 4.972732)
each and manual index 4.930000 0.010000 4.940000 ( 4.950868)
enumerator each.with_index 4.950000 0.000000 4.950000 ( 4.982888)
enumerator each and manual index 4.900000 0.000000 4.900000 ( 4.911986)
MAP user system total real¶
map 4.230000 0.080000 4.310000 ( 4.324616)
enumerator map 6.060000 0.090000 6.150000 ( 6.176046)
map and manual index 5.540000 0.070000 5.610000 ( 5.633096)
enumerator map.with_index 5.510000 0.060000 5.570000 ( 5.568634)
enumerator map and manual index 7.090000 0.200000 7.290000 ( 7.287555)
But the difference looks less pronounced in this case, and on average I think programs tend to create and iterate over more small arrays, then they do large ones.
No matter which method is faster, what happens to this code:
...
Does index equal 10 on the first execution of the block? Does it equal 0?
That's a fair question. I think to preserve backward compatibility, this code would have to behave just as you present it. In other words, the implicit index has been overridden by assigning it as a local variable. Which is why originally a global seemed the right choice. But can a global behave block local?
In any case, I think I will withdraw this request. Having to worry about local override or managing global that behaves block local will probably dry up any performance gain. And in retrospect I think the whole it
idea, while good on it's face, doesn't really solve the issues it is intended to well.
I'm glad to have had the chance to discuss this and flush it out though, as it has been sitting in the back of my mind for a while.
=end
Updated by drbrain (Eric Hodel) over 12 years ago
- Status changed from Open to Closed
trans (Thomas Sawyer) wrote:
No matter which method is faster, what happens to this code:
...
Does index equal 10 on the first execution of the block? Does it equal 0?That's a fair question. I think to preserve backward compatibility, this code would have to behave just as you present it. In other words, the implicit index has been overridden by assigning it as a local variable. Which is why originally a global seemed the right choice. But can a global behave block local?
$~ and friends behave this way (thread and method local).
In any case, I think I will withdraw this request. Having to worry about local override or managing global that behaves block local will probably dry up any performance gain. And in retrospect I think the whole
it
idea, while good on it's face, doesn't really solve the issues it is intended to well.
OK.
Updated by trans (Thomas Sawyer) over 12 years ago
$~ and friends behave this way (thread and method local).
Hmmm... In that case, maybe it is worth trying, to see what the actual performance change would be. I'm willing to do it, and I basically know enough to work with a global variable in the C code. But how to handle block local behavior?
Updated by drbrain (Eric Hodel) over 12 years ago
- Status changed from Closed to Feedback
See rb_define_virtual_variable() and vm_svar_get()
Updated by yhara (Yutaka HARA) about 12 years ago
- Target version changed from 2.0.0 to 2.6