Feature #15549

Enumerable#to_reader (or anything enumerable, Enumerator, lazy enums, enum_for results)

Added by chucke (Tiago Cardoso) 9 months ago. Updated 9 months ago.

Status: Open
Priority: Normal
Assignee: -
Target version: -
[ruby-core:91181]

Description

This is a feature proposal for something I've had to implement before multiple times.

Many IO-related APIs rely on an unspoken (because Ruby doesn't have official interfaces) reader/writer protocol: the arguments you pass to certain methods must implement either "#read(nsize, buffer)" or "#write(data)". An example is "IO.copy_stream".
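To make the protocol concrete, here is a minimal sketch of a non-IO object that "IO.copy_stream" accepts as a source simply because it responds to "#read(length, outbuf)". The class name ChunkSource is hypothetical and exists only for illustration; the sketch assumes "#read" is always called with an Integer length.

```ruby
require "stringio"

# Hypothetical duck-typed source: not an IO, but it responds to #read.
class ChunkSource
  def initialize(data)
    @data = data.b # binary copy, so slicing below works on bytes
  end

  # Mirrors IO#read semantics: returns up to `length` bytes, or nil at EOF.
  # IO.copy_stream passes a buffer, so it is filled in place when given.
  def read(length, outbuf = nil)
    return nil if @data.empty?

    chunk = @data.slice!(0, length)
    outbuf ? outbuf.replace(chunk) : chunk
  end
end

sink = StringIO.new # any object responding to #write works as a destination
copied = IO.copy_stream(ChunkSource.new("hello world"), sink)
```

Here "copied" holds the number of bytes transferred and "sink.string" holds the copied data.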

Several times in the past I started implementing a data generator using "#each" for a specific format (CSV, JSON...), to be lazy and memory-conservative, but ended up rewriting it because I couldn't stream from an enumerable into a socket or file handle directly.

Lately I've been adopting the pattern of "injecting" a "#read" method to these objects, so that I can indeed use these APIs to my benefit. Sadly, I have to reimplement this in every project. This is the gist:

https://gist.github.com/HoneyryderChuck/625c7b873a00a18d12b1a08695551510
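For readers who cannot open the link, the general idea can be sketched as follows. This is not the code from the gist, only an independent, minimal illustration: the EnumReader class and its internals are hypothetical, it assumes every yielded element is a string or responds to "#to_s", and it only handles "#read" calls with an Integer length.

```ruby
require "stringio"

# Hypothetical adapter (not the gist's implementation): exposes #read on top of #each.
class EnumReader
  def initialize(enum)
    @enum = enum.to_enum   # external iterator over the object's #each
    @pending = String.new  # binary buffer of bytes pulled but not yet returned
    @done = false
  end

  # Returns up to `length` bytes, or nil once the enumerable is exhausted.
  def read(length, outbuf = nil)
    fill(length)
    if @pending.empty?
      outbuf&.clear
      return nil
    end
    chunk = @pending.slice!(0, length) # binary string, so this slices bytes
    outbuf ? outbuf.replace(chunk) : chunk
  end

  private

  # Pulls elements only until enough bytes are buffered, keeping memory bounded.
  def fill(length)
    until @done || @pending.bytesize >= length
      @pending << @enum.next.to_s.b
    end
  rescue StopIteration
    @done = true
  end
end

rows = Enumerator.new { |y| 3.times { |i| y << "row#{i}\n" } }
sink = StringIO.new
IO.copy_stream(EnumReader.new(rows), sink)
```

The adapter pulls elements one at a time with "Enumerator#next", so at most roughly one read's worth of data plus one element is held in memory at any point.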

I think such an API would be very beneficial to the common user. In most projects I've worked on, writing data to a tempfile, an S3 bucket, or an FTP server is very common, and I've lost count of the implementations that build the whole payload in memory and only then write it to the handle, which gives the impression that Ruby consumes a lot of memory.

Now, I also understand that this is only beneficial for a particular kind of enum (those which yield strings or "to_s"-ables). But since there's a precedent in "#sum", maybe I can make a case.

This is an example that works if you load the gist code:

enum = %w(a) * 65536
puts "size: #{enum.size}"
reader = enum.to_reader
IO.copy_stream(reader, $stderr)

History

Updated by shevegen (Robert A. Heiler) 9 months ago

I do not have any major pro or con opinion on the functionality as such; I think the
name #to_reader may not be ideal, though. People may confuse it with
attr_reader, or they may think of .to_s, .to_i and so forth. Perhaps
other names are possible, ideally ones that also have to do with a stream. (I
have no good alternative example, though.)

Updated by chucke (Tiago Cardoso) 9 months ago

I'm open to other names (#to_readable_stream perhaps?). What matters is acknowledging the validity of this use case, as there are some constraints depending on where it is used.

For instance, responding to #read is sufficient for writing to disk. For writing to an S3 bucket, the SDK requires that the object also respond to #bytesize. That's beyond the scope of this change, though.
