Bug #20424

closed

Zlib::GzipReader always double allocates strings when passed outbuf, significantly increasing memory usage

Added by martinemde (Martin Emde) 7 months ago. Updated 6 months ago.

Status:
Closed
Assignee:
-
Target version:
-
ruby -v:
ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [arm64-darwin23]
[ruby-core:117497]

Description

While trying to improve memory performance during rubygems installs, we previously found a bug in eof?. Further investigation of memory usage while fixing that bug found wasteful string allocation in readpartial and read.

In Zlib, when reading with readpartial or read, a new string is always created for the bytes read from the buffer.

The current approach allocates a string whether or not an outbuf is passed.

# vastly simplified pseudo implementation
def readpartial(len, dst = nil)
  if buffer.empty?
    buffer = gzipfile.readpartial(len, dst) # adds inflated bytes into dst if passed
  end
  dst = allocate_new_string(len) # always makes a new string for the destination
  dst << buffer.read(len) # copies from the buffer into the destination
end

The result is that readpartial always allocates at least double the bytes necessary.

Samuel Giddins submitted, and I have tested and reviewed, a pull request (zlib#61) that resolves the issue. It vastly improves memory usage and speeds up GzipReader by avoiding wasted memcpy and rb_str_new calls.

This PR also adds an outbuf argument to GzipReader#read for improved memory management, very similar to IO#read.
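For context, here is a minimal sketch of the calling pattern that should benefit from the fix: one reusable buffer passed as outbuf to Zlib::GzipReader#readpartial. The helper name read_all_with_outbuf and the chunk_size parameter are made up for illustration. The sketch uses only readpartial, since an outbuf argument on read is assumed to be available only in zlib versions that include the PR.

```ruby
require "zlib"
require "stringio"

# Hypothetical helper (not part of zlib): inflates a gzip stream chunk by
# chunk, reusing a single String as the outbuf for every readpartial call.
def read_all_with_outbuf(io, chunk_size: 4096)
  gz = Zlib::GzipReader.new(io)
  chunk = String.new(capacity: chunk_size) # reused across iterations
  result = String.new                      # binary accumulator

  begin
    loop do
      # readpartial replaces the contents of chunk with the bytes it read
      gz.readpartial(chunk_size, chunk)
      result << chunk
    end
  rescue EOFError
    # readpartial raises EOFError once the stream is exhausted
  end

  result
ensure
  gz&.close
end

payload = "hello gzip " * 1000
inflated = read_all_with_outbuf(StringIO.new(Zlib.gzip(payload)))
puts inflated.bytesize == payload.bytesize
```

A real caller would typically write each chunk to a file or digest instead of accumulating it, which is where avoiding a second per-call allocation matters most.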

We appreciate your attention to this performance improvement. We believe it will further improve the performance of rubygems gem installs.
