Feature #12025

closed

Reduce minimum string buffer size from 128 to 127

Added by jeremyevans0 (Jeremy Evans) about 8 years ago. Updated almost 8 years ago.

Status: Closed
Assignee: -
Target version: -
[ruby-core:73493]

Description

This changes the minimum buffer size for string buffers from 128 to
127. The underlying C buffer is always 1 more than the ruby buffer,
so this changes the actual amount of memory used for the minimum
string buffer from 129 to 128. This makes it much easier on the
malloc implementation, as evidenced by the following code (note that
time -l is used here, but Linux systems may need time -v).

$ cat bench_mem.rb
i = ARGV.first.to_i
Array.new(1000000){" " * i}
$ /usr/bin/time -l ruby bench_mem.rb 128
        3.10 real         2.19 user         0.46 sys
    289080  maximum resident set size
     72673  minor page faults
        13  block output operations
        29  voluntary context switches
$ /usr/bin/time -l ruby bench_mem.rb 127
        2.64 real         2.09 user         0.27 sys
    162720  maximum resident set size
     40966  minor page faults
         2  block output operations
         4  voluntary context switches
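
The one-byte difference matters most when the allocator rounds requests up to a size class. A minimal sketch of that arithmetic, assuming a hypothetical allocator that rounds every request up to the next power of two (real allocators differ, and `pow2_size_class` is an illustrative helper, not part of Ruby or any malloc):

```ruby
# Hypothetical model: every request is rounded up to the next power of two.
# This is an assumption for illustration, not a description of a real malloc.
def pow2_size_class(bytes)
  size = 1
  size *= 2 while size < bytes
  size
end

terminator = 1 # the C buffer holds one byte more than the Ruby capacity

[128, 127].each do |capa|
  request = capa + terminator
  puts "capacity #{capa}: requests #{request} bytes, occupies #{pow2_size_class(request)}"
end
# capacity 128: requests 129 bytes, occupies 256
# capacity 127: requests 128 bytes, occupies 128
```

Under that assumption a 129-byte request occupies twice the space of a 128-byte one, which is in line with the drop in resident set size measured above.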

To try to ensure a power-of-2 growth, when a ruby string capacity
needs to be increased, after doubling the capacity, add one. This
ensures the ruby capacity will be odd, which means actual amount
of memory used will be even, which is probably better than the
current case of the ruby capacity being even and the actual amount
of memory used being odd.
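
The growth rule described above can be sketched as follows. This is a simplified model in Ruby, not the C code from string.c, and `grow` is a hypothetical helper name:

```ruby
# Sketch of the proposed rule: double the capacity, then add one,
# until the capacity can hold `total` bytes.
def grow(capa, total)
  capa = 2 * capa + 1 while total > capa
  capa
end

capa = 127
4.times do
  puts "capacity #{capa} -> #{capa + 1} bytes allocated"
  capa = grow(capa, capa + 1)
end
# capacity 127 -> 128 bytes allocated
# capacity 255 -> 256 bytes allocated
# capacity 511 -> 512 bytes allocated
# capacity 1023 -> 1024 bytes allocated
```

Starting from 127, each step keeps the capacity plus the terminator byte on a power of two.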

A very similar patch was proposed 4 years ago in feature #5875. It
ended up being rejected, because no performance increase was shown.
One reason for that is that ruby does not use STR_BUF_MIN_SIZE
unless rb_str_buf_new is called, and that previously did not have
a ruby API, only a C API, so unless you were using a C extension
that called it, there would be no performance increase.

With the recently proposed feature #12024, String.buffer is added,
which is a ruby API for creating string buffers. Using
String.buffer(100) wastes much less memory with this patch, as the
malloc implementation can more easily deal with the power-of-2
sized memory usage. As measured above, memory usage is 44% less,
and performance is 17% better.
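
The two percentages can be reproduced from the benchmark output above. The sketch below assumes the 17% figure is the ratio of the two wall-clock times; expressed as time saved, the same numbers give roughly 15%.

```ruby
rss_128, rss_127 = 289_080, 162_720 # maximum resident set size from time -l
real_128, real_127 = 3.10, 2.64     # wall-clock seconds

memory_saved = (rss_128 - rss_127) * 100.0 / rss_128
speedup = (real_128 / real_127 - 1) * 100.0

puts format("memory: %.1f%% less", memory_saved) # memory: 43.7% less
puts format("speed: %.1f%% faster", speedup)     # speed: 17.4% faster
```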



Updated by normalperson (Eric Wong) about 8 years ago

Jeremy Evans wrote:

String.buffer(100) wastes much less memory with this patch, as the
malloc implementation can more easily deal with the power-of-2
sized memory usage. As measured above, memory usage is 44% less,
and performance is 17% better.

None of jemalloc, dlmalloc 2.8.6, or glibc (dlmalloc 2.7.x-based) are
power-of-2 allocators. I was not able to measure a difference with any
of those.

But yeah, I could see this being an improvement for power-of-2 malloc
implementations out there (OpenBSD?). I don't think your change
would be harmful for jemalloc/dlmalloc/glibc users, either.

Updated by naruse (Yui NARUSE) about 8 years ago

  • Description updated (diff)

Updated by naruse (Yui NARUSE) about 8 years ago

With glibc, as Eric says, it won't save much memory.
But with jemalloc, where the next allocation size after 128 is 192, it saves some memory.
Darwin also saves 6%.

memo:
https://www.facebook.com/notes/facebook-engineering/scalable-memory-allocation-using-jemalloc/480222803919/
https://sploitfun.wordpress.com/2015/02/10/understanding-glibc-malloc/
http://www.slideshare.net/kosaki55tea/glibc-malloc in Japanese
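
To illustrate the jemalloc remark above: a minimal sketch of a size-class lookup. Only the step from 128 to 192 is taken from the comment; the other entries in the table and the `size_class` helper are assumptions for illustration, not jemalloc's actual table or API.

```ruby
# Assumed excerpt of small size classes. Only the 128 -> 192 step comes
# from the comment above; the other values are illustrative.
def size_class(bytes, classes = [64, 128, 192, 256])
  classes.find { |c| c >= bytes }
end

[129, 128].each do |request|
  used = size_class(request)
  puts "request #{request}: class #{used}, #{used - request} bytes unused"
end
# request 129: class 192, 63 bytes unused
# request 128: class 128, 0 bytes unused
```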

Updated by jeremyevans0 (Jeremy Evans) almost 8 years ago

It's been over 5 months since this issue was created. Since it looks like both Eric and Yui agree that this won't cause harm to any systems and can benefit some systems, could it be accepted?


Updated by Anonymous almost 8 years ago

  • Status changed from Open to Closed

Applied in changeset r55686.


string.c: reduce malloc overhead for default buffer size

  • string.c (STR_BUF_MIN_SIZE): reduce from 128 to 127
    [ruby-core:76371] [Feature #12025]
  • string.c (rb_str_buf_new): adjust for above reduction

Updated by normalperson (Eric Wong) almost 8 years ago

Jeremy Evans wrote:

It's been over 5 months since this issue was created. Since it looks
like both Eric and Yui agree that this won't cause harm to any systems
and can benefit some systems, could it be accepted?

Thanks for the reminder, committed as r55686

/me is forgetfulperson :x

Updated by ngoto (Naohisa Goto) almost 8 years ago

Due to the "+ 1" in string.c:2593 inserted in r55686, integer overflow may occur if capa == LONG_MAX / 2.

	    while (total > capa) {
		if (capa > LONG_MAX / 2) {
		    capa = (total + 4095) / 4096 * 4096;
		    break;
		}
		capa = 2 * capa + 1;
	    }

I also think that termlen should be used, instead of "+ 1".
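
One way to read the boundary case, as a sketch only: the loop is modeled in Ruby, whose integers do not overflow, so `long_max` is a stand-in that assumes a 64-bit signed long, and `grow_r55686` is a hypothetical name for the loop quoted above.

```ruby
# Ruby model of the quoted loop. long_max assumes a 64-bit signed long.
def grow_r55686(capa, total, long_max = 2**63 - 1)
  while total > capa
    if capa > long_max / 2
      capa = (total + 4095) / 4096 * 4096
      break
    end
    capa = 2 * capa + 1
  end
  capa
end

long_max = 2**63 - 1
capa = long_max / 2                 # not > LONG_MAX / 2, so the guard is skipped
grown = grow_r55686(capa, capa + 1)

puts grown == long_max              # true: capacity lands exactly on LONG_MAX
puts grown + 1 > long_max           # true: capacity plus terminator exceeds LONG_MAX
```

In this model the doubled capacity leaves no room for the terminator byte, so adding it in C would exceed what a long can hold.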

Updated by ngoto (Naohisa Goto) almost 8 years ago

In r55692, integer overflow is fixed, and termlen is used.

Updated by normalperson (Eric Wong) almost 8 years ago

Naohisa Goto wrote:

In r55692, integer overflow is fixed, and termlen is used.

Good catch, thank you
