Misc #19052
Open
Increased memory usage (RSS) for Ruby when compiled by gcc
Description
Hello! We have seen a large increase in memory usage, as measured by resident set size (RSS), in our Ruby 3.1.2 programs when testing on Ubuntu Jammy (22.04). After further testing, we traced the problem back to Ubuntu Eoan (19.10), the earliest release on which we see it. We first noticed the increase in a gem called eventmachine v1.2.7, a concurrency library that uses threads internally. That led us to try threads directly, and we came up with a simple Ruby program that reproduces the memory usage issue by spawning multiple threads through Thread.new.
We are unsure whether the problem extends beyond just Thread.new; it was the easiest problem to spot within our applications.
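For readers without the attachments, a minimal sketch of this kind of reproduction follows. It is not the attached program: the helper names `rss_kb` and `measure_thread_rss` are made up for illustration, and it assumes a Linux system where `/proc/self/status` reports a `VmRSS:` line in kB.

```ruby
# Hypothetical helper: read this process's resident set size in kB.
# Assumes Linux, where /proc/self/status contains a "VmRSS:" line.
def rss_kb
  line = File.foreach("/proc/self/status").find { |l| l.start_with?("VmRSS:") }
  line.split[1].to_i
end

# Hypothetical helper: spawn `count` idle threads with Thread.new and
# report RSS before and after, in kB.
def measure_thread_rss(count)
  before = rss_kb
  # Each thread sleeps indefinitely so it is still alive at the second reading.
  threads = Array.new(count) { Thread.new { sleep } }
  sleep 0.5 # give the threads a moment to start
  after = rss_kb
  { before: before, after: after, diff: after - before }
ensure
  # Clean up the idle threads whether or not the measurement succeeded.
  threads&.each(&:kill)
  threads&.each(&:join)
end

result = measure_thread_rss(10)
puts "RSS before spawning threads: #{result[:before]} kB"
puts "RSS after spawning threads:  #{result[:after]} kB"
puts "Diff:                        #{result[:diff]} kB"
```

Running the same script under each Ruby build should show whether the per-thread RSS growth differs between them.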
We are building Ruby from source using ruby-install, though we have also seen the problem in versions installed via apt.
We are seeking to understand the underlying cause of this memory bloat, and what we can do to prevent it.
In general, the total virtual memory allocated is similar across configurations; only RSS differs.
On Ubuntu Disco (19.04) and prior, we see a relatively small increase in RSS upon spawning the threads, on the order of 20 kilobytes per thread. On Ubuntu Eoan and later, RSS increases by roughly 1 megabyte per thread.
When Ruby is compiled using clang instead of gcc on Ubuntu Eoan and later, the memory usage is comparable to older versions of Ubuntu.
We do not believe it is related to the gcc version alone, as we tried compiling Ruby using gcc 9 on Ubuntu Disco and did not encounter the memory problem. On the theory that it may be a memory fragmentation issue, we tried both setting the MALLOC_ARENA_MAX environment variable at runtime and compiling Ruby against jemalloc. Neither of these reduced RSS usage.
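As an illustration of the MALLOC_ARENA_MAX experiment, the sketch below runs the same thread-spawning snippet in a child Ruby process with and without the variable set. The helper name `run_with_env`, the inline child script, and the value `2` are assumptions chosen for the example; they are not taken from our attached test setup. It assumes Linux with `/proc/self/status` available.

```ruby
require "open3"
require "rbconfig"

# Hypothetical helper: run a Ruby snippet in a child process using the same
# interpreter as the parent, with extra environment variables applied.
def run_with_env(env, ruby_code)
  stdout, status = Open3.capture2(env, RbConfig.ruby, "-e", ruby_code)
  raise "child exited with status #{status.exitstatus}" unless status.success?
  stdout
end

# Child script: measure RSS (kB) before and after spawning 10 idle threads.
CHILD_SCRIPT = <<~'RUBY'
  rss = -> { File.foreach("/proc/self/status").find { |l| l.start_with?("VmRSS:") }.split[1].to_i }
  before = rss.call
  threads = Array.new(10) { Thread.new { sleep } }
  sleep 0.5 # give the threads a moment to start
  puts "MALLOC_ARENA_MAX=#{ENV['MALLOC_ARENA_MAX'].inspect} diff=#{rss.call - before} kB"
RUBY

# Baseline run, then a run with the arena limit set (the value 2 is arbitrary).
puts run_with_env({}, CHILD_SCRIPT)
puts run_with_env({ "MALLOC_ARENA_MAX" => "2" }, CHILD_SCRIPT)
```

Setting the variable in the child's environment rather than inside the script matters, because glibc reads it when the process starts.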
Here are the memory differences when simply calling Thread.new 10 times:
| Configuration | RSS before spawning threads | RSS after spawning threads | Diff | 
|---|---|---|---|
| Disco + gcc 8 | 13596 kB | 13800 kB | 204 kB | 
| Disco + gcc 9 | 13880 kB | 14088 kB | 208 kB | 
| Eoan + gcc 9 | 22268 kB | 32672 kB | 10404 kB | 
| Eoan + clang | 13976 kB | 14180 kB | 204 kB | 
| Jammy + gcc 11 | 22208 kB | 32608 kB | 10400 kB | 
| Jammy + clang | 13668 kB | 13872 kB | 204 kB | 
| Jammy + jemalloc | 24212 kB | 35216 kB | 11004 kB | 
| Jammy + apt | 21308 kB | 31712 kB | 10404 kB | 
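The per-thread figures quoted earlier follow from dividing each Diff value by the 10 threads spawned. A short sketch of that arithmetic, using only the numbers in the table (the helper name `per_thread_kb` is illustrative):

```ruby
THREADS_SPAWNED = 10

# Diff column from the table above, in kB.
DIFF_KB = {
  "Disco + gcc 8"    => 204,
  "Disco + gcc 9"    => 208,
  "Eoan + gcc 9"     => 10_404,
  "Eoan + clang"     => 204,
  "Jammy + gcc 11"   => 10_400,
  "Jammy + clang"    => 204,
  "Jammy + jemalloc" => 11_004,
  "Jammy + apt"      => 10_404
}.freeze

# Hypothetical helper: average RSS growth per thread, in kB.
def per_thread_kb(diff_kb, threads = THREADS_SPAWNED)
  diff_kb.to_f / threads
end

DIFF_KB.each do |config, diff|
  puts format("%-18s %8.1f kB per thread", config, per_thread_kb(diff))
end
```

The affected configurations come out at a little over 1000 kB per thread, against about 20 kB per thread for the others.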
We have attached the program and a series of Dockerfiles that replicate the test environments. While the reproduction uses Docker, we have seen the same behavior on actual VMs. To run a particular Docker config, use ./run.sh <DIRECTORY>, e.g. ./run.sh jammy-clang.
Does anyone understand why resident set size (RSS) is higher when Ruby is compiled with gcc rather than clang?
Is there additional configuration (flags, etc.) that could be passed to gcc to get memory usage on Jammy equivalent to what we see with Disco or clang?