Bug #20585
Size of memory allocated by String.new(:capacity) is different from the specified value
Status: Closed
Description
IMHO, if :capacity is specified in String.new, capa should be set to that value.
In fact, Ruby 3.2 seems to allocate the size as specified.
% cat string_capacity.rb
# Only Ruby 3.2 and 3.3 are supported: the RString layout read below is version-specific.
unless /\A3\.[23]\./ =~ RUBY_VERSION
  raise NotImplementedError, 'Not Supported Ruby Version'
end

require 'inline' # RubyInline gem

class String
  def super_inspect
    self.class.superclass.instance_method(:inspect).bind(self).call
  end

  inline do |builder|
    builder.include '<stdio.h>'
    builder.add_compile_flags '-Wall'
    # Defines String#capacity: capa for heap strings, or :EMBED / :SHARED when capa does not apply.
    builder.c_raw <<~CODE
      VALUE capacity(int argc, VALUE *argv, VALUE self) {
        struct RString *rstring = RSTRING(self);
        if (! (RBASIC(self)->flags & RSTRING_NOEMBED)) {
          /* Embedded: the bytes live inside the object slot, no separate buffer. */
          return rb_to_symbol(rb_str_new_cstr("EMBED"));
        } else {
          if (RBASIC(self)->flags & ELTS_SHARED) {
            /* Shared: aux holds the shared string, not a capacity. */
            return rb_to_symbol(rb_str_new_cstr("SHARED"));
          } else {
            return LONG2NUM(rstring->as.heap.aux.capa);
          }
        }
        return Qnil; /* NOTREACHED */
      }
    CODE
  end
end
% irb -I. -rstring_capacity
irb(main):001:0> [RUBY_PLATFORM, RUBY_VERSION]
=> ["x86_64-freebsd14.0", "3.2.4"]
irb(main):002:0> String.new('', capacity: 1024).capacity
=> 1024
irb(main):003:0> String.new('*'*1024, capacity: 1024).capacity
=> 1024
irb(main):004:0>
This is what I expect.
However, Ruby 3.3 seems to behave differently.
% irb -I. -rstring_capacity
irb(main):001> [RUBY_PLATFORM, RUBY_VERSION]
=> ["x86_64-freebsd14.0", "3.3.2"]
irb(main):002> String.new('', capacity: 1024).capacity
=> 1023
irb(main):003> String.new('*'*1024, capacity: 1024).capacity
=> 2047
irb(main):004>
- If only :capacity is specified, capa is one byte smaller than requested.
- If an initial string is given together with a :capacity equal to its bytesize, roughly twice the requested size is allocated.
Is this intentional?
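For readers without the RubyInline gem, a rougher cross-check is possible with the standard objspace library: ObjectSpace.memsize_of returns an approximate memory footprint for an object. It includes object-slot overhead and its exact accounting differs between Ruby versions, so it does not report capa itself; treat the numbers as a sanity check only (the doubled buffer in the second case is still easy to spot). The helper name below is hypothetical.

```ruby
require 'objspace'

# Hypothetical helper: approximate footprint of a String built with an explicit capacity.
# memsize_of counts the object slot plus, for heap strings, the buffer,
# so the result is larger than capa and varies across Ruby versions.
def approx_footprint(initial, capacity)
  ObjectSpace.memsize_of(String.new(initial, capacity: capacity))
end

empty     = approx_footprint('', 1024)
prefilled = approx_footprint('*' * 1024, 1024)

puts "capacity only:      #{empty} bytes"
puts "initial + capacity: #{prefilled} bytes"
```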
Updated by byroot (Jean Boussier) 5 months ago
Most of this comes from: https://github.com/ruby/ruby/pull/8825
Long story short, capacity is a bit confusing: since Ruby strings are null-terminated, at least one extra byte is always needed, so it's debatable whether the terminating byte should be counted in the capacity.
I see how, when using String.new(capacity:), the goal is to avoid reallocation, so if you precomputed the final string size, subtracting the terminator might defeat the purpose. The other side of the coin, though, is that if you use sizes like 4096 hoping to fit a specific size in memory, the extra terminator byte makes it not behave as you'd hoped.
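To make the trade-off concrete, here is a toy model of the two ways of accounting for the terminator. It is a hypothetical sketch, not MRI code: the method names are made up, and the default termlen of 1 assumes a single-byte-terminated encoding such as UTF-8 (wider encodings such as UTF-16 need a longer terminator).

```ruby
# Hypothetical model of String.new(capacity:) accounting; not MRI source.

# Terminator added on top of the requested capacity (the Ruby 3.2 behavior).
def terminator_extra(capacity, termlen: 1)
  { usable: capacity, allocated: capacity + termlen }
end

# Terminator taken out of the requested capacity (Ruby 3.3 as reported).
def terminator_included(capacity, termlen: 1)
  { usable: capacity - termlen, allocated: capacity }
end

extra    = terminator_extra(4096)    # 4096 usable bytes, 4097 allocated
included = terminator_included(4096) # 4095 usable bytes, 4096 allocated
```

The first policy never reallocates a precomputed 4096-byte string but overshoots a 4096-byte block by one byte; the second fits the block exactly but leaves room for only 4095 bytes of content.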
> If the initial string and its bytesize are specified, about twice the size is allocated.
I need to dig more to answer this one.
Updated by byroot (Jean Boussier) 5 months ago
- Backport changed from 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN to 3.1: DONTNEED, 3.2: DONTNEED, 3.3: REQUIRED
> If the initial string and its bytesize are specified, about twice the size is allocated.
Alright, this was just fallout from the other change. The smaller buffer caused the string to grow when the original string was copied into it, hence the doubling.
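The toy model below reproduces the numbers from the report. It is a sketch, not MRI's implementation: the method name is hypothetical, and the growth rule 2 * capa + termlen is an assumption chosen because it yields the observed 2047.

```ruby
# Hypothetical model of String.new(initial, capacity:); not MRI source.
# subtract_terminator: true models Ruby 3.3.2, false models Ruby 3.2.
def capa_after_init(initial_bytesize, capacity, subtract_terminator:, termlen: 1)
  capa = subtract_terminator ? capacity - termlen : capacity
  # Assumed growth rule: while the copied bytes don't fit, roughly double.
  capa = 2 * capa + termlen while initial_bytesize > capa
  capa
end

capa_after_init(0, 1024, subtract_terminator: true)     # 1023, as in the 3.3.2 transcript
capa_after_init(1024, 1024, subtract_terminator: true)  # 2047: 1024 bytes overflow 1023
capa_after_init(1024, 1024, subtract_terminator: false) # 1024, as in the 3.2.4 transcript
```

In this model the doubling disappears as soon as the terminator is no longer subtracted, because the copied 1024 bytes fit the buffer on the first try.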
I opened: https://github.com/ruby/ruby/pull/11018
Updated by byroot (Jean Boussier) 5 months ago
- Status changed from Open to Closed
Applied in changeset git|83f57ca3d225ce06abbc5eef6aec37de4fa36d58.
String.new(capacity:) doesn't subtract termlen
[Bug #20585]
This was changed in 36a06efdd9f0604093dccbaf96d4e2cb17874dc8 because String.new(1024) would end up allocating 1025 bytes, but the problem with this change is that the caller may be trying to right-size a String.
So instead, we should just better document the behavior of the capacity: keyword.
Updated by Dan0042 (Daniel DeLorme) 5 months ago
What about allocating capacity+1 unless capacity is a power of two?
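Read literally, this proposal can be sketched as below. The method name is hypothetical, and the power-of-two test uses the common n & (n - 1) bit trick.

```ruby
# Sketch of the proposal: reserve one extra terminator byte
# unless the requested capacity is already a power of two.
# Hypothetical helper; not part of Ruby.
def proposed_allocation(capacity)
  power_of_two = capacity.positive? && (capacity & (capacity - 1)).zero?
  power_of_two ? capacity : capacity + 1
end

proposed_allocation(4096) # 4096: the terminator comes out of the 4096 bytes
proposed_allocation(1000) # 1001: the terminator is extra
```

Under this rule a power-of-two request stays aligned to its block size, but only capacity - 1 bytes remain usable, so a caller who precomputed an exact power-of-two length would still see one reallocation.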
Updated by k0kubun (Takashi Kokubun) 4 months ago
- Backport changed from 3.1: DONTNEED, 3.2: DONTNEED, 3.3: REQUIRED to 3.1: DONTNEED, 3.2: DONTNEED, 3.3: DONE
ruby_3_3 d1ffd5ecfa62a049b7c508f30b6912a890de1b32.