Feature #6261
Enumerable#emap and Enumerable#egrep
Description
I was inspired by Ruby 1.9.x's Enumerable#chunk and #slice_before, which both take a block and return an enumerator. I wish to introduce two new methods into the Enumerable core, which can be implemented in Ruby like this:
module Enumerable
  def emap # return an enumerator
    raise ArgumentError, 'no block given' unless block_given?
    Enumerator.new do |yielder|
      self.each do |elem|
        mapped = yield elem
        yielder << mapped
      end
    end
  end

  def egrep
    raise ArgumentError, 'no block given' unless block_given?
    Enumerator.new do |yielder|
      self.each do |elem|
        allowed = yield elem
        yielder << elem if allowed
      end
    end
  end
end
#emap + #to_a is just like #map / #collect, and #egrep + #to_a is just like #select. Why do I think it's necessary to introduce these methods? Because #collect and #select are sometimes not efficient. Here's a contrived example:
lines = File.foreach('a_very_large_file')
  .egrep {|line| line.length < 10 }
  .emap {|line| line.chomp!; line }
  .each_slice(3)
  .emap {|lines| lines.join(';').downcase }
  .take_while {|line| line.length > 20 }
The code above means: read each line of 'a_very_large_file', keep only the lines whose length is less than 10, chomp each kept line, join them in groups of three, and stop as soon as a joined line is 20 characters or shorter.
If you replace #egrep with #select and #emap with #collect, you must iterate over every line of 'a_very_large_file' and create a temporary array three times. That is inefficient here, because the #take_while means 'I do not want to check all lines'.
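The difference can be made visible with a small sketch. It restates the two proposed methods in compact form so that it runs on its own, and it swaps the file for an in-memory range; the range, the counters, and the variable names are illustrative assumptions, not part of the proposal.

```ruby
module Enumerable
  # Compact restatement of the proposed methods, repeated here only so
  # that this sketch is self-contained.
  def emap
    raise ArgumentError, 'no block given' unless block_given?
    Enumerator.new do |yielder|
      each { |elem| yielder << yield(elem) }
    end
  end

  def egrep
    raise ArgumentError, 'no block given' unless block_given?
    Enumerator.new do |yielder|
      each { |elem| yielder << elem if yield(elem) }
    end
  end
end

numbers = (1..1_000) # stands in for the lines of a large file

# Enumerator-based chain: elements are pulled one at a time, so the
# chain stops as soon as take_while rejects a value.
lazy_seen = 0
lazy_result = numbers
  .egrep { |n| lazy_seen += 1; n.even? }
  .emap { |n| n * 10 }
  .take_while { |n| n < 50 }

# Eager chain: select and map each walk the whole input and build a
# full intermediate array before take_while runs.
eager_seen = 0
eager_result = numbers
  .select { |n| eager_seen += 1; n.even? }
  .map { |n| n * 10 }
  .take_while { |n| n < 50 }

puts "enumerator chain inspected #{lazy_seen} elements"
puts "eager chain inspected #{eager_seen} elements"
```

Both chains should produce the same result; only the number of elements inspected differs.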
If you want to avoid #select and #collect altogether, you can write it like this:
File.foreach('a_very_large_file') do |line|
  # blah blah to achieve the same goal
end
I'm afraid it's hard to make the code clear at a glance.
So you may see #egrep and #emap are very useful.
Another example, I want to make a class FreqDist, which records the frequency distribution of a population of samples.
class FreqDist
  def initialize(samples)
    @sample_dict = Hash.new(0)
    samples.each {|sample| @sample_dict[sample] += 1 }
  end
end
I want to use FreqDist to store the frequency distribution of a list of words, but there is a case problem: 'When' and 'when' should not be regarded as two different samples. I can do it like this:
fd = FreqDist.new(words.emap {|w| w.downcase })
This passes an enumerator instead of an array as the argument, so the words are iterated once and no temporary array is created.
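As a rough illustration, the sketch below feeds FreqDist from a hand-built Enumerator that stands in for the proposed words.emap call. The count reader and the sample word list are hypothetical additions made only so the result can be inspected.

```ruby
class FreqDist
  def initialize(samples)
    @sample_dict = Hash.new(0)
    # Only #each is required, so any Enumerable or Enumerator works here.
    samples.each { |sample| @sample_dict[sample] += 1 }
  end

  # Hypothetical reader, not part of the original class.
  def count(sample)
    @sample_dict[sample]
  end
end

words = %w[When the cat sleeps when The dog barks]

# Downcases each word on demand; this stands in for the proposed
# words.emap { |w| w.downcase } and builds no intermediate array.
downcased = Enumerator.new do |yielder|
  words.each { |w| yielder << w.downcase }
end

fd = FreqDist.new(downcased)
puts fd.count('when')
puts fd.count('the')
```

Because FreqDist only calls #each on its argument, it does not need to know whether it was given an array or an enumerator.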
Well, in my opinion, such #emap and #egrep are very powerful. Although I can implement them in Ruby and put them in a custom gem, I think it's better to introduce them into the core Enumerable module.
Please consider the suggestion. Thank you!
Updated by Eregon (Benoit Daloze) over 12 years ago
Hello,
This should already be possible with the recent Enumerator::Lazy (in trunk): just add a .lazy after the File.foreach and use the usual select, map, ...:
lines = File.foreach('a_very_large_file').lazy
  .select {|line| line.length < 10 }
  .map {|line| line.chomp!; line }
  .each_slice(3)
  .map {|lines| lines.join(';').downcase }
  .take_while {|line| line.length > 20 }
The same goes for the second example: words.lazy.map(&:downcase).
Be aware that it's not always faster (although it likely takes less memory); this is a trade-off.
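A minimal sketch of the lazy behaviour, assuming Ruby 2.0 or later where Enumerable#lazy is available. The infinite range and the counter are illustrative stand-ins for the large file; note that a lazy chain yields another lazy enumerator, so it has to be forced, here with to_a.

```ruby
seen = 0

# The source is infinite, so an eager select would never return.
# With .lazy each element flows through the whole chain on demand.
result = (1..Float::INFINITY).lazy
  .select { |n| seen += 1; n.even? }
  .map { |n| n * 10 }
  .take_while { |n| n < 50 }
  .to_a # forces evaluation of the lazy chain

p result
puts "inspected #{seen} elements"
```

The chain stops at the first value that take_while rejects, so only a handful of elements are ever inspected.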
Updated by matz (Yukihiro Matsumoto) over 12 years ago
- Status changed from Open to Rejected
use Enumerable#lazy.
Matz.