Feature #6047

closed

read_all: Grow buffer exponentially in generic case

Added by MartinBosslet (Martin Bosslet) about 12 years ago. Updated over 1 year ago.

Status:
Closed
Assignee:
-
Target version:
-
[ruby-core:42748]

Description

In the general case, read_all grows its buffer linearly, by just the amount that is currently read from the underlying source. This results in a linear number of reallocs. It might be beneficial to grow the buffer exponentially instead, multiplying its size by a constant factor (e.g. 1.5 or 2), which results in only a logarithmic number of reallocs.
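
To make the linear-versus-logarithmic claim concrete, here is a small counting sketch. It is not taken from io.c: the helper names `growths_linear` and `growths_doubling` are hypothetical, and a growth factor of 2 is assumed. Each function counts how many times a buffer has to be enlarged before it can hold `total` bytes.

```c
#include <stddef.h>

/* Hypothetical helper: number of growth steps needed to hold `total` bytes
 * when the buffer starts at `chunk` bytes and grows by `chunk` each time.
 * `chunk` must be non-zero. */
static size_t growths_linear(size_t total, size_t chunk)
{
    size_t cap = chunk, n = 0;
    while (cap < total) {
        cap += chunk;   /* fixed-size increment: O(total / chunk) steps */
        n++;
    }
    return n;
}

/* Hypothetical helper: same question, but the capacity doubles each time.
 * `initial` must be non-zero. */
static size_t growths_doubling(size_t total, size_t initial)
{
    size_t cap = initial, n = 0;
    while (cap < total) {
        cap *= 2;       /* geometric increment: O(log(total / initial)) steps */
        n++;
    }
    return n;
}
```

For 1 MiB of data and a 1024-byte starting size, the fixed-increment version needs 1023 growth steps while the doubling version needs 10.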

I will provide a patch and benchmarks, but I'm already opening this issue so I won't forget.

See also https://bugs.ruby-lang.org/issues/5353 for more details.

Updated by ko1 (Koichi Sasada) over 11 years ago

ping. status?
Do you need help or comments?

Updated by MartinBosslet (Martin Bosslet) over 11 years ago

ko1 (Koichi Sasada) wrote:

ping. status?
Do you need help or comments?

Thanks for your help. To be honest, I haven't tried so far. Can we leave it at the 2.0.0 target for now? If I run into problems, I'll ask here!

Updated by normalperson (Eric Wong) over 11 years ago

Martin Bosslet wrote:

In the general case, read_all grows its buffer linearly by just the
amount that is currently read from the underlying source. This results
in a linear number of reallocs. It might turn out beneficial if the
buffer were grown exponentially by multiplying with a constant factor
(e.g. 1.5 or 2), thus resulting in only a logarithmic number of
reallocs.

I think growing the buffer exponentially makes sense.

I would enforce a hard limit (probably <= 8 MB) for each growth,
to:

  1. discourage read_all() for large files; it's very wasteful and
    usually hurts performance

  2. prevent memory exhaustion for edge cases (especially on 32-bit)
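
One way to read the capped-growth suggestion is sketched below. This is an illustration only: `next_capacity`, `MAX_GROWTH`, and `INITIAL_CAPACITY` are hypothetical names, and the 8 MB figure is taken from the "<= 8 MB" remark above rather than from any actual implementation. The buffer doubles while it is small, then grows by at most the cap per step.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed values for illustration; not taken from io.c. */
#define MAX_GROWTH       ((size_t)8 * 1024 * 1024)  /* cap on each growth step */
#define INITIAL_CAPACITY ((size_t)1024)             /* size used for an empty buffer */

/* Hypothetical helper: return the next buffer capacity. The buffer doubles
 * until the increment would exceed MAX_GROWTH, after which it grows by
 * exactly MAX_GROWTH per step. Saturates at SIZE_MAX instead of overflowing. */
static size_t next_capacity(size_t cap)
{
    size_t growth = cap < MAX_GROWTH ? cap : MAX_GROWTH;

    if (growth == 0)
        growth = INITIAL_CAPACITY;
    if (cap > SIZE_MAX - growth)
        return SIZE_MAX;            /* caller should treat this as "too large" */
    return cap + growth;
}
```

With these assumed values, a 1024-byte buffer grows to 2048 bytes, while a 16 MB buffer grows to 24 MB rather than 32 MB.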

Updated by mame (Yusuke Endoh) over 11 years ago

  • Target version changed from 2.0.0 to 2.6

My experience also shows that it is useless to open a ticket for a reminder to myself :-)

I'm setting it to next minor tentatively, but if it is really just a performance improvement (i.e., it affects no external modules), you can commit it to 2.0.0 before code freeze.

--
Yusuke Endoh

Updated by zzak (zzak _) over 8 years ago

  • Assignee changed from MartinBosslet (Martin Bosslet) to 7150

Updated by hsbt (Hiroshi SHIBATA) over 1 year ago

  • Status changed from Assigned to Open

Updated by byroot (Jean Boussier) over 1 year ago

I just tried my hand at this one: https://github.com/ruby/ruby/pull/6829

I think such a change would make sense. Not that IO#read without a size is common, but we might as well do something sensible.

Updated by Anonymous over 1 year ago

  • Status changed from Open to Closed

Applied in changeset git|7390eb43fe1bfb069af80ba8f73f7dc4999df0fd.


io.c (read_all): grow the buffer exponentially when size is unknown

[Feature #6047]

Currently it's grown by BUFSIZ (1024) on every iteration, which is a bit wasteful.
Instead we can double the capacity whenever there is less than BUFSIZ capacity
left.
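
The strategy described in the commit message can be sketched in plain C roughly as follows. This is not the code from io.c: `slurp` is a hypothetical helper that reads from a stdio `FILE *`, and `CHUNK` is an assumed constant standing in for the 1024 mentioned above (the C library's own `BUFSIZ` may have a different value).

```c
#include <stdio.h>
#include <stdlib.h>

#define CHUNK 1024  /* assumed threshold, mirroring the 1024 in the commit message */

/* Hypothetical helper: read everything from `fp` into a malloc'ed buffer,
 * doubling the capacity whenever fewer than CHUNK bytes of space remain.
 * On success returns the buffer (caller frees it) and stores the length in
 * *out_len; returns NULL on allocation failure or read error. */
static char *slurp(FILE *fp, size_t *out_len)
{
    size_t cap = CHUNK, len = 0;
    char *buf = malloc(cap);

    if (!buf)
        return NULL;

    for (;;) {
        if (cap - len < CHUNK) {
            char *tmp;

            if (cap > (size_t)-1 / 2) {     /* doubling would overflow */
                free(buf);
                return NULL;
            }
            tmp = realloc(buf, cap * 2);
            if (!tmp) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }

        size_t n = fread(buf + len, 1, cap - len, fp);
        len += n;
        if (n == 0)                         /* EOF or error */
            break;
    }

    if (ferror(fp)) {
        free(buf);
        return NULL;
    }
    *out_len = len;
    return buf;
}
```

Because each read is offered all of the remaining capacity, the number of `realloc` calls grows with the logarithm of the input size rather than with the number of reads.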
