Bug #20424
Closed
Zlib::GzipReader always double allocates strings when passed outbuf, significantly increasing memory usage
Description
While trying to reduce memory usage during the install of rubygems, we previously found a bug in eof?. Further investigation into memory usage while fixing that bug found wasteful string allocation in readpartial and read.
In Zlib, when reading with readpartial or read, a new string is always created for the bytes read from the buffer.
The current approach allocates this string whether or not an outbuf is passed.
# vastly simplified pseudo implementation
def readpartial(len, dst = nil)
  if buffer.empty?
    buffer = gzipfile.readpartial(len, dst) # adds inflated bytes into dst if passed
  end
  dst = allocate_new_string(len) # make a new string for the destination, even if dst was passed
  dst << buffer.read(len)        # read from the buffer into the destination
end
The result is that readpartial always allocates at least double the bytes necessary.
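For context, the calling pattern that suffers most is a loop that passes one reusable buffer to readpartial, expecting to avoid a fresh string per chunk. The following is a minimal sketch of such a caller, not code taken from rubygems; the helper name inflate_with_reused_buffer and the default chunk size are illustrative assumptions.

```ruby
require "zlib"
require "stringio"

# Hypothetical helper (not from rubygems or zlib): inflate a gzip string
# chunk by chunk, handing the same outbuf to every readpartial call.
def inflate_with_reused_buffer(gzipped, chunk_size = 16 * 1024)
  out = String.new(encoding: Encoding::BINARY)
  buf = String.new(capacity: chunk_size) # reused across iterations
  gz = Zlib::GzipReader.new(StringIO.new(gzipped))
  begin
    loop do
      gz.readpartial(chunk_size, buf) # replaces the contents of buf
      out << buf
    end
  rescue EOFError
    # readpartial raises EOFError once the stream is exhausted
  ensure
    gz.close
  end
  out
end

data = "hello gzip " * 10_000
restored = inflate_with_reused_buffer(Zlib.gzip(data))
puts restored.bytesize == data.bytesize
```

The caller only ever holds one chunk-sized buffer, so any extra string created inside readpartial for each chunk is pure overhead.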
Samuel Giddins submitted, and I have tested and reviewed, a pull request, zlib#61, that resolves the issue. It substantially reduces memory usage and speeds up Zlib::GzipReader by avoiding unnecessary memcpy and rb_str_new calls.
This PR also adds an outbuf argument to GzipReader#read for improved memory management, very similar to IO#read.
We appreciate your attention to this performance improvement. We believe it will further improve the performance of rubygems gem installs.