Extend heap pages to exactly 16KiB
I would like to extend heap pages to be exactly 16KiB. Currently, struct heap_page_body is 16KiB - (sizeof(size_t) * 5).
Before I list the reasons for this change, I want to point out two important facts. First, OS pages are 4KiB on the platforms I tested (macOS, Ubuntu, Windows). Second, when the GC allocates pages, it first allocates struct heap_page_body, immediately followed by struct heap_page.
I want to make this change for a few reasons:
- I would like struct heap_page_body to be a multiple of OS pages so that we can use mprotect on it (I want to implement read barriers on heap pages with mprotect, so this is my selfish reason).
- Some allocators (specifically glibc) will put struct heap_page on the same OS page as struct heap_page_body. Since struct heap_page is frequently modified, that OS page (including the Ruby objects on it) will be copied in a forked child. Extending struct heap_page_body to 16KiB can help prevent these CoW faults (see Note 1).
- Allocating 16KiB can reduce overall memory consumption. Some allocators (specifically jemalloc) round requested chunks up to bin sizes. jemalloc has a 16KiB bin size, so our request for 16KiB - (sizeof(size_t) * 5) is rounded up to 16KiB anyway, and (sizeof(size_t) * 5) bytes are wasted. (sizeof(size_t) * 5) is enough room to fit one more Ruby object (40 bytes on 64-bit platforms, the size of one RVALUE). If we use that space for one more object, we don't need to allocate as many pages, and memory usage can actually decrease.
My hypothesis is that this patch will either leave overall memory usage unchanged or decrease it. In either case, it will allow us to use mprotect and improve CoW.
I tested this patch on an Ubuntu machine with jemalloc and glibc. Here is my system information:
aaron@whiteclaw ~> uname -a
Linux whiteclaw 5.4.0-37-generic #41-Ubuntu SMP Wed Jun 3 18:57:02 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
aaron@whiteclaw ~> lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 20.04 LTS
Release: 20.04
Codename: focal

aaron@whiteclaw ~> ldd --version
ldd (Ubuntu GLIBC 2.31-0ubuntu9) 2.31
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Written by Roland McGrath and Ulrich Drepper.

aaron@whiteclaw ~/git> apt list --installed | grep jemalloc
WARNING: apt does not have a stable CLI interface. Use with caution in scripts.
libjemalloc-dev/focal,now 5.2.1-1ubuntu1 amd64 [installed]
libjemalloc2/focal,now 5.2.1-1ubuntu1 amd64 [installed,automatic]
To test memory usage, I used this tool: https://github.com/bpowers/mstat
mstat is a sampling profiler that will report memory usage over time. I generated RDoc for Ruby and took samples while documentation was generated. Here is the Ruby command I used:
./ruby --disable-gems "./libexec/rdoc" --root "." --encoding=UTF-8 --all --ri --op ".ext/rdoc" --page-dir "./doc" --no-force-update "."
I made 50 samples for each allocator and branch like this:
for x in (seq 50)
    sudo rm -rf .ext/rdoc
    sudo ../src/mstat/mstat -o glibc-branch_$x.tsv ./ruby --disable-gems "./libexec/rdoc" --root "." --encoding=UTF-8 --all --ri --op ".ext/rdoc" --page-dir "./doc" --no-force-update "."
end
In other words, I made 200 samples in total (50 jemalloc + master, 50 jemalloc + branch, 50 glibc + master, 50 glibc + branch).
Here is a comparison of glibc over time (lower is better):
From this graph, memory usage with glibc looks mostly the same on the branch as on master, and sometimes lower, though a few outlier samples go higher. I made a box plot to compare maximum RSS:
The box plot shows the max RSS is usually lower with some outliers that are higher.
Here is a comparison of jemalloc over time (lower is better):
According to this graph, memory usage with jemalloc is usually lower on the branch. I made another box plot to compare maximum RSS on jemalloc:
The box plot shows that max RSS is typically lower on jemalloc.
I didn't find a good way to measure CoW performance, but I don't see how this patch could degrade it.
I would like to merge this patch because there are a few good points (ability to use mprotect, memory savings, possible CoW improvements), and I can't find any downsides.
Thanks to John Hawthorn for helping me get the math right on the "end pointer" part.
Note 1: I was able to prove that struct heap_page will exist on the same OS page as struct heap_page_body here: https://github.com/ruby/ruby/pull/3253/commits/33390d15e7a6f803823efcb41205167c8b126fbb
Updated by headius (Charles Nutter) about 1 month ago
I chatted with tenderlovemaking (Aaron Patterson) a bit about this.
I did some research as well, and my reading of this "extra" metadata from malloc seems to indicate that you really should not make assumptions about it being in a specific place, being a specific size, or being on a specific page. It is a malloc-internal detail, and presumably a good implementation would not dirty a whole new page just to do this bookkeeping. Let malloc do malloc.
It seems clear from this issue that at least one of these assumptions (where the metadata goes) is not always correct. To me, that's as bad as never being correct.
tenderlovemaking (Aaron Patterson) had already discovered this when I made the same suggestion based on my independent research, and we both came to the same conclusion. I think this is the right change to make.