Feature #16987

Enumerator::Lazy vs Array methods

Added by zverok (Victor Shepelev) 3 months ago. Updated 2 months ago.


Enumerable methods are designed to be eager by default: each method call within a chain is executed immediately over the whole collection. Sometimes that is impractical (e.g. given an array of 2 million strings, drop the comments, split each line into fields, and find the first ten whose second field is equal to some value). So one needs to either do everything in a single each block, or use Enumerable#lazy. There are three problems with the latter:

  1. It is much less known,
  2. It is said to be almost always slower than non-lazy, and is therefore not recommended,
  3. It lacks some methods that are often necessary in processing large data chunks.
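For illustration, here is a minimal sketch of the scenario above written with Enumerable#lazy. The sample data, the `sample_line` helper, and the `'ok'` value are made up for the example; the `splits` counter is only there to show that the chain stops as soon as ten matches are found.

```ruby
# Hypothetical sample data: every third line is a comment,
# and the second field alternates between 'ok' and 'fail'.
def sample_line(i)
  return "# comment #{i}" if (i % 3).zero?

  "#{i},#{i.even? ? 'ok' : 'fail'},payload#{i}"
end

lines = Array.new(200_000) { |i| sample_line(i) }

splits = 0 # counts how many lines were actually split

first_ten = lines.lazy
                 .reject { |line| line.start_with?('#') } # drop comments
                 .map { |line| splits += 1; line.split(',') } # split into fields
                 .select { |fields| fields[1] == 'ok' } # second field matches
                 .first(10) # forces evaluation, stops after ten matches

puts first_ten.size
puts "lines split: #{splits} of #{lines.size}"
```

Without `.lazy`, the same chain would build two full intermediate arrays and split every non-comment line before `first(10)` is reached.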

I want to discuss (3) here. Enumerator::Lazy would benefit from, but currently lacks, methods such as #flatten, #product, and #compact. These are all methods of Array, not Enumerable. In fact,

  1. They probably should belong to Enumerable (none of them requires anything besides #each to function),
  2. They are definitely useful for lazily processing large sequences.
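As a rough sketch of what lazy versions could look like, the three methods can be approximated today with the existing Enumerator::Lazy API. The `LazyExtras` module and its method names are hypothetical, not a proposed implementation; `flatten_once` only flattens one level, and `product` assumes the right-hand collection is finite.

```ruby
# Hypothetical helpers built only on existing Enumerator::Lazy methods.
module LazyExtras
  module_function

  # Lazy counterpart of Array#compact: skip nil elements.
  def compact(enum)
    enum.lazy.reject(&:nil?)
  end

  # Lazy counterpart of Array#flatten(1): flat_map unpacks arrays
  # one level deep and passes other values through unchanged.
  def flatten_once(enum)
    enum.lazy.flat_map { |item| item }
  end

  # Lazy counterpart of Array#product for two collections.
  # The left side may be endless; the right side must be finite.
  def product(left, right)
    left.lazy.flat_map { |l| right.map { |r| [l, r] } }
  end
end

# All three work on sources that never end, because nothing is
# evaluated until first(n) asks for elements.
compacted = LazyExtras.compact([1, nil, 2, nil, 3].cycle).first(4)
flattened = LazyExtras.flatten_once([[1, 2], 3, [4, [5]]]).to_a
pairs     = LazyExtras.product(1..Float::INFINITY, %i[a b]).first(3)

p compacted
p flattened
p pairs
```

Each helper relies only on iterating the source, which is the point made above: nothing here needs Array-specific behaviour.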
