Feature #17002

Extend heap pages to exactly 16KiB

Added by tenderlovemaking (Aaron Patterson) about 1 month ago. Updated about 1 month ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Target version:
-
[ruby-core:99000]

Description

Hi,

I would like to extend heap pages to be exactly 16KiB. Currently, struct heap_page_body is 16KiB - (sizeof(size_t) * 5).
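To make the size difference concrete, here is a minimal sketch of the arithmetic. The names `HEAP_PAGE_TARGET`, `current_body_size`, and `unused_bytes` are made up for this example and are not taken from gc.c.

```c
#include <stddef.h>

/* Hypothetical name for this example: the 16KiB target size. */
#define HEAP_PAGE_TARGET ((size_t)16 * 1024)

/* Current size of struct heap_page_body as described above:
 * 16KiB minus five size_t-sized words. */
size_t current_body_size(void)
{
    return HEAP_PAGE_TARGET - sizeof(size_t) * 5;
}

/* Bytes between the end of the current body and the 16KiB boundary. */
size_t unused_bytes(void)
{
    return HEAP_PAGE_TARGET - current_body_size();
}
```

On a platform where `size_t` is 8 bytes, `unused_bytes()` is 40 and `current_body_size()` is 16344.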

Before I list my reasons for this change, there are two important facts to note. First, OS pages are 4KiB on the platforms I tested (macOS, Ubuntu, Windows). Second, when the GC allocates pages, it first allocates struct heap_page_body, immediately followed by struct heap_page:

https://github.com/ruby/ruby/blob/289a28e68f30e879760fd000833b512d506a0805/gc.c#L1756-L1767

I want to make this change for a few reasons:

  1. I would like struct heap_page_body to be a multiple of OS pages so that we can use mprotect on it (I want to implement read barriers on heap pages with mprotect, so this is my selfish reason)
  2. Some allocators (specifically glibc) will put struct heap_page on the same OS page as struct heap_page_body. struct heap_page is frequently modified, so that OS page (including Ruby objects) will be copied. Extending struct heap_page_body to 16KiB can help prevent CoW faults. (see Note 1)
  3. Allocating 16KiB can reduce overall memory consumption. Some allocators (specifically jemalloc) will round requested chunks to bin sizes. jemalloc has a 16KiB bin size, so our request for 16KiB - (sizeof(size_t) * 5) is rounded up to 16KiB anyway, and (sizeof(size_t) * 5) is wasted. (sizeof(size_t) * 5) is enough room to fit one more Ruby object, so if we use that space for one more object, then we don't need to allocate as many pages, and memory usage can actually decrease.
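As a rough illustration of reason 1, the sketch below shows why a body that is a whole multiple of the OS page size matters for mprotect: the call works on whole OS pages, so a 16KiB region can be made read-only and writable again without touching neighbouring data. This is not the code from the patch. `BODY_SIZE` and `protect_roundtrip` are hypothetical names, and the region comes from an anonymous `mmap` purely to keep the example self-contained (the GC allocates its pages differently).

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical constant for this example: a body of exactly 16KiB. */
#define BODY_SIZE ((size_t)16 * 1024)

/* Map a 16KiB region, make it read-only, then writable again.
 * Returns 0 on success and -1 on any failure. */
int protect_roundtrip(void)
{
    long os_page = sysconf(_SC_PAGESIZE);

    /* mprotect works on whole OS pages, so the body must be a multiple. */
    if (os_page <= 0 || BODY_SIZE % (size_t)os_page != 0) return -1;

    void *body = mmap(NULL, BODY_SIZE, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (body == MAP_FAILED) return -1;

    /* Read-only: a write here would now fault. */
    int rc = mprotect(body, BODY_SIZE, PROT_READ);

    /* Restore write access and prove it by writing one byte. */
    if (rc == 0) rc = mprotect(body, BODY_SIZE, PROT_READ | PROT_WRITE);
    if (rc == 0) ((char *)body)[0] = 1;

    munmap(body, BODY_SIZE);
    return rc;
}
```

With the current size of 16KiB - (sizeof(size_t) * 5), the divisibility check at the top would fail on a 4KiB-page system, which is the problem this proposal removes.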

My hypothesis is that this patch will either not change overall memory usage, or decrease overall memory usage. But in either case it will allow us to use mprotect, and improve CoW.
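The "same or lower" reasoning can be sketched as ceiling division: if each page holds one more object, the number of pages needed for a given number of objects can only stay the same or go down. `pages_needed` is a hypothetical helper written for this example, and the slot counts in the note after it are illustrative assumptions, not values read from gc.c.

```c
#include <stddef.h>

/* Number of pages needed to hold `objects` slots when each page holds
 * `slots_per_page` slots (ceiling division). Hypothetical helper. */
size_t pages_needed(size_t objects, size_t slots_per_page)
{
    return (objects + slots_per_page - 1) / slots_per_page;
}
```

For example, assuming 40-byte slots and ignoring any per-page header, a 16344-byte body holds 408 slots and a 16384-byte body holds 409. For one million objects, `pages_needed(1000000, 408)` is 2451 and `pages_needed(1000000, 409)` is 2445.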

Tests

I tested this patch on an Ubuntu machine with jemalloc and glibc. Here is my system information:

Linux version:

aaron@whiteclaw ~> uname -a
Linux whiteclaw 5.4.0-37-generic #41-Ubuntu SMP Wed Jun 3 18:57:02 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
aaron@whiteclaw ~> lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 20.04 LTS
Release:    20.04
Codename:   focal

GLIBC version:

aaron@whiteclaw ~> ldd --version
ldd (Ubuntu GLIBC 2.31-0ubuntu9) 2.31
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Written by Roland McGrath and Ulrich Drepper.

jemalloc version:

aaron@whiteclaw ~/git> apt list --installed | grep jemalloc

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

libjemalloc-dev/focal,now 5.2.1-1ubuntu1 amd64 [installed]
libjemalloc2/focal,now 5.2.1-1ubuntu1 amd64 [installed,automatic]

To test memory usage, I used this tool: https://github.com/bpowers/mstat

mstat is a sampling profiler that will report memory usage over time. I generated RDoc for Ruby and took samples while documentation was generated. Here is the Ruby command I used:

./ruby --disable-gems "./libexec/rdoc" --root "." --encoding=UTF-8 --all --ri --op ".ext/rdoc" --page-dir "./doc" --no-force-update  "."

I made 50 samples for each allocator and branch like this:

for x in (seq 50)
  sudo rm -rf .ext/rdoc; sudo ../src/mstat/mstat -o glibc-branch_$x.tsv ./ruby --disable-gems "./libexec/rdoc" --root "." --encoding=UTF-8 --all --ri --op ".ext/rdoc" --page-dir "./doc" --no-force-update  "."
end

In other words I made 200 samples total (50 jemalloc + master, 50 jemalloc + branch, 50 glibc + master, 50 glibc + branch).

glibc

Here is a comparison of glibc over time (lower is better):

[Figure: glibc changes]

From this graph it looks like glibc is mostly the same, but sometimes lower. It looks like there are some outlier samples that go higher. I made a box plot to compare maximum RSS:

[Figure: glibc max boxplot]

The box plot shows the max RSS is usually lower with some outliers that are higher.

jemalloc

Here is a comparison of jemalloc over time (lower is better):

[Figure: jemalloc over time]

According to this graph jemalloc is usually lower. I made another box plot to compare maximum RSS on jemalloc:

[Figure: jemalloc max RSS]

The box plot shows that max RSS is typically lower on jemalloc.

CoW Performance

I didn't find a good way to measure CoW performance, but I don't see how this patch could degrade it.

Summary

I would like to merge this patch because there are a few good points (ability to use mprotect, memory savings, possible CoW improvements), and I can't find any downsides.

Thanks John Hawthorn for helping me get the math right on the "end pointer" part.

Note 1: I was able to prove that struct heap_page will exist on the same OS page as struct heap_page_body here: https://github.com/ruby/ruby/pull/3253/commits/33390d15e7a6f803823efcb41205167c8b126fbb


Files

0001-Expand-heap-pages-to-be-exactly-16kb.patch (4.51 KB) tenderlovemaking (Aaron Patterson), 06/30/2020 10:06 PM

Updated by headius (Charles Nutter) about 1 month ago

I chatted with tenderlovemaking (Aaron Patterson) a bit about this.

I did some research as well, and my reading about this "extra" metadata from malloc seems to indicate that you really should not make assumptions about it being in a specific place, or a specific size, or on a specific page. It is a malloc-internal detail, and presumably a good implementation would not dirty a whole new page just to do this bookkeeping. Let malloc do malloc.

It seems clear from this issue that at least one of these assumptions (where the metadata goes) is not always correct. To me, that's as bad as never being correct.

tenderlovemaking (Aaron Patterson) had already discovered this when I made the same suggestion based on my independent research, and we both came to the same conclusion. I think this is the right change to make.

Updated by ko1 (Koichi Sasada) about 1 month ago

Thank you for your survey. Seems fine!

Updated by tenderlovemaking (Aaron Patterson) about 1 month ago

  • Status changed from Open to Closed

Thanks, I committed it.
