Feature #18262
Enumerator::Lazy#partition
Status: Open
Assignee: -
Target version: -
Description
(Part of my set of proposals about making .lazy
more useful/popular.)
Currently:
file = File.open('very-large-file.txt')
lines_with_errors, lines_without_errors = file.lazy.partition { _1.start_with?('E:') }
lines_with_errors.class
# => Array; the whole file has been read by this point
This might not be very practical performance-wise or memory-wise.
I am thinking that returning a pair of lazy enumerators instead might be a good addition to Enumerator::Lazy.
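For illustration, here is a minimal self-contained demonstration of the current eager behavior (using a Range instead of a file, so it runs anywhere): partition on a lazy enumerator forces the whole source and returns two plain Arrays.

```ruby
# Current behavior: Enumerator::Lazy#partition is eager.
# Even on a lazy enumerator, it consumes the entire source
# and returns two plain Arrays.
processed = []
odd, even = (1..6).lazy
                  .map { |i| processed << i; i }
                  .partition(&:odd?)

p odd.class       # => Array
p processed.size  # => 6 -- the whole source was consumed
```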
Naive prototype:
class Enumerator::Lazy
  def partition(&block)
    buffer1 = []
    buffer2 = []
    source = self
    [
      Enumerator.new { |y|
        loop do
          if buffer1.empty?
            begin
              item = source.next
              if block.call(item)
                y.yield(item)
              else
                buffer2.push(item)
              end
            rescue StopIteration
              break
            end
          else
            y.yield buffer1.shift
          end
        end
      }.lazy,
      Enumerator.new { |y|
        loop do
          if buffer2.empty?
            begin
              item = source.next
              if !block.call(item)
                y.yield(item)
              else
                buffer1.push(item)
              end
            rescue StopIteration
              break
            end
          else
            y.yield buffer2.shift
          end
        end
      }.lazy
    ]
  end
end
Testing it:
Enumerator.produce(1) { |i| puts "processing #{i}"; i + 1 }.lazy
  .take(30)
  .partition(&:odd?)
  .then { |odd, even|
    p odd.first(3), even.first(3)
  }
# Prints:
# processing 1
# processing 2
# processing 3
# processing 4
# processing 5
# [1, 3, 5]
# [2, 4, 6]
As you can see from the "processing" log, it only fetched as many entries as the produced enumerators actually required.
The drawback, as my prototype implementation shows, is the need for internal "buffering" (I don't think a lazy partition can be implemented without it), but it still might be worth a shot?
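For comparison, a buffer-free alternative (not part of the proposal, just a sketch of the trade-off space) is to build two independent lazy chains with select/reject. This avoids unbounded buffering entirely, but the predicate runs twice per element and the source must be re-enumerable, which a half-read File is not:

```ruby
# Buffer-free alternative sketch (hypothetical helper, not proposed API):
# two independent lazy enumerators over the same source.
# Trade-offs: the predicate is evaluated twice per element, and the
# source must support being enumerated from the start more than once.
def lazy_partition_twice(enum, &pred)
  [enum.lazy.select(&pred), enum.lazy.reject(&pred)]
end

odd, even = lazy_partition_twice(1..Float::INFINITY, &:odd?)
p odd.first(3)   # => [1, 3, 5]
p even.first(3)  # => [2, 4, 6]
```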