Feature #6261

closed

Enumerable#emap and Enumerable#egrep

Added by yimutang (Joey Zhou) about 10 years ago. Updated about 10 years ago.

Status: Rejected
Priority: Normal
Assignee: -
Target version: -
[ruby-core:44147]

Description

I was inspired by Ruby 1.9.x's Enumerable#chunk and #slice_before, which both take a block and return an enumerator. I would like to introduce two new methods into the core Enumerable module, which can be implemented in Ruby like this:

module Enumerable

  def emap # return an enumerator
    raise ArgumentError, 'no block given' unless block_given?

    Enumerator.new do |yielder|
      self.each do |elem|
        mapped = yield elem
        yielder << mapped
      end
    end
  end

  def egrep # return an enumerator
    raise ArgumentError, 'no block given' unless block_given?

    Enumerator.new do |yielder|
      self.each do |elem|
        allowed = yield elem
        yielder << elem if allowed
      end
    end
  end

end
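As a rough illustration of how these two methods would behave, the sketch below repeats the proposed definitions in compact form (so that it runs on its own) and applies them to an unbounded range. The variable names are only illustrative and are not part of the proposal.

```ruby
module Enumerable
  # Compact restatement of the proposed methods, repeated here only so
  # that this snippet is self-contained.
  def emap
    raise ArgumentError, 'no block given' unless block_given?
    Enumerator.new { |yielder| each { |elem| yielder << yield(elem) } }
  end

  def egrep
    raise ArgumentError, 'no block given' unless block_given?
    Enumerator.new { |yielder| each { |elem| yielder << elem if yield(elem) } }
  end
end

# Nothing is computed until elements are requested, so an unbounded
# source is fine as long as only a finite prefix is consumed.
squares_of_evens = (1..Float::INFINITY).egrep(&:even?).emap { |n| n * n }
first_three = squares_of_evens.first(3)

# On a finite collection, emap followed by to_a matches map.
same_as_map = [1, 2, 3].emap { |n| n + 1 }.to_a == [1, 2, 3].map { |n| n + 1 }
```

Calling #map or #select directly on the same unbounded range would never return, because both try to build the complete array first.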

#emap + #to_a is just like #map / #collect, and #egrep + #to_a is just like #select. Why do I think it's necessary to introduce these methods? Because #collect and #select are sometimes inefficient. Here's a contrived example:

lines = File.foreach('a_very_large_file')
  .egrep {|line| line.length < 10 }
  .emap {|line| line.chomp!; line }
  .each_slice(3)
  .emap {|lines| lines.join(';').downcase }
  .take_while {|line| line.length > 20 }

The above code means: take each line from 'a_very_large_file', keep only the lines whose length is less than 10, chomp each kept line, join them in groups of 3, and finally stop at the first joined line whose length is not greater than 20.

If you replace #egrep with #select and #emap with #collect, you must iterate over every line of 'a_very_large_file' and create a temporary array three times! That is inefficient in this situation, because the #take_while means 'I do not want to check all lines'.

If you want to avoid #select and #collect, you can write it like this:

File.foreach('a_very_large_file') do |line|
  # ... code that achieves the same goal ...
end

I'm afraid it's hard to make the code clear at a glance.
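To make the readability concern concrete, here is one possible hand-written version of the pipeline. It is only a sketch: the helper name `short_line_groups` is hypothetical, it reads from any object that responds to `each_line` (a StringIO stands in for the large file), and it uses the non-mutating `chomp`.

```ruby
require 'stringio'

# Hypothetical helper: a hand-written equivalent of the chained pipeline.
def short_line_groups(io)
  results = []
  group = []
  stopped = false

  io.each_line do |line|
    next unless line.length < 10         # the egrep / select step
    group << line.chomp                  # the first emap / map step
    next if group.size < 3               # the each_slice(3) step

    joined = group.join(';').downcase    # the second emap / map step
    group = []
    unless joined.length > 20            # the take_while step
      stopped = true
      break
    end
    results << joined
  end

  # each_slice also yields a final, shorter group when input runs out.
  unless stopped || group.empty?
    joined = group.join(';').downcase
    results << joined if joined.length > 20
  end

  results
end

text = "ABCDEFGH\nthis line is far too long\nIJKLMNOP\nQRSTUVWX\n" \
       "ab\ncd\nef\nYYYYYYYY\nZZZZZZZZ\nWWWWWWWW\n"
groups = short_line_groups(StringIO.new(text))
```

The loop stops reading early just like the chained version, but the five pipeline stages are now interleaved with bookkeeping for the current group and the stop condition.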

So you can see that #egrep and #emap are very useful.

Another example: I want to write a class, FreqDist, that records the frequency distribution of a population of samples.

class FreqDist

  def initialize(samples)
    @sample_dict = Hash.new(0)
    samples.each {|sample| @sample_dict[sample] += 1 }
  end

end

I want to use FreqDist to store the frequency distribution of a list of words, but there is a case problem: 'When' and 'when' should not be regarded as two different samples. I can do it like this:

fd = FreqDist.new(words.emap {|w| w.downcase })

This passes an enumerator instead of an array as the argument, so the words are iterated only once and no temporary array is created.

Well, in my opinion, such #emap and #egrep are very powerful. Although I can implement them in Ruby and put them in a custom gem, I think it's better to introduce them into the core Enumerable module.

Please consider the suggestion. Thank you!

Updated by Eregon (Benoit Daloze) about 10 years ago

Hello,

This should already be possible with the recent Enumerator::Lazy (in trunk): just add a .lazy after the File.foreach and use the usual select, map, ...:

lines = File.foreach('a_very_large_file').lazy
  .select {|line| line.length < 10 }
  .map {|line| line.chomp!; line }
  .each_slice(3)
  .map {|lines| lines.join(';').downcase }
  .take_while {|line| line.length > 20 }

The same goes for the second example: words.lazy.map(&:downcase).
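A minimal sketch of that second example with a lazy enumerator might look like the following. The `count` reader is a hypothetical addition, included only so the stored frequencies can be inspected. The word list is made up for illustration.

```ruby
class FreqDist
  def initialize(samples)
    @sample_dict = Hash.new(0)
    samples.each { |sample| @sample_dict[sample] += 1 }
  end

  # Hypothetical reader, not part of the original class.
  def count(sample)
    @sample_dict[sample]
  end
end

words = %w[When when WHEN then]

# The lazy enumerator downcases each word as FreqDist iterates over it,
# so no intermediate array of downcased words is built.
fd = FreqDist.new(words.lazy.map(&:downcase))
when_count = fd.count('when')
then_count = fd.count('then')
```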

Be aware that it's not always faster (although it likely uses less memory); this is a trade-off.
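One side of that trade-off can be observed by counting how many source elements each style pulls. The sketch below uses a made-up numeric source with a counter instead of a file. It makes no claim about timing, only about how much of the input is consumed.

```ruby
consumed = 0

# Illustrative source: yields 1..1000 and counts every element handed out.
source = Enumerator.new do |yielder|
  1.upto(1000) do |i|
    consumed += 1
    yielder << i
  end
end

# Eager chain: select and map each walk their whole input and build an array.
eager_result = source.select(&:even?).map { |i| i * 10 }.take_while { |i| i < 100 }
eager_consumed = consumed

consumed = 0

# Lazy chain: elements flow through one at a time, and take_while ends
# the iteration as soon as its block returns false.
lazy_result = source.lazy.select(&:even?).map { |i| i * 10 }.take_while { |i| i < 100 }.to_a
lazy_consumed = consumed
```

Both chains produce the same result, but the eager one pulls all 1000 elements while the lazy one stops after 10. When the whole input is needed anyway, the lazy chain only adds per-element overhead.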

Updated by matz (Yukihiro Matsumoto) about 10 years ago

  • Status changed from Open to Rejected

Use Enumerable#lazy.

Matz.
