Feature #20860 (Open)
Merge Optional Experimental Feature MMTk into Ruby
Description
GitHub PR: https://github.com/ruby/ruby/pull/11979
Summary
In this ticket, we're proposing upstreaming the current MMTk implementation into the ruby/mmtk repository. This repository will be mirrored into ruby/ruby, adding the files gc/mmtk.c and gc/mmtk.h and the Rust implementation in the gc/mmtk directory.
The current MMTk implementation uses the GC API and implements the NoGC and mark-sweep algorithms.
The current implementation is, in many cases, slower than Ruby's default GC, but we have concrete steps to improve performance, which are discussed in the Next Steps section.
Background
In [Feature #20351] we introduced a mechanism to plug an external garbage collector into Ruby using a dynamic shared library, and in [Feature #20470] we introduced an API for third-party garbage collectors to plug into Ruby. Using this API, we were able to demonstrate that we can plug NoGC (a GC that allocates but never collects) and a modified version of Ruby's garbage collector into Ruby.
For the past few months, we've been working on plugging MMTk into Ruby using this API.
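To make the NoGC baseline concrete: a collector that "allocates but never collects" can be modeled as a bump-pointer allocator over a fixed region that fails once the region is used up. The Python sketch below is a hypothetical toy (the class NoGCHeap and its 8-byte alignment are assumptions for illustration), not MMTk's actual NoGC plan.

```python
class NoGCHeap:
    """Hypothetical toy NoGC heap: bump-pointer allocation, nothing is ever freed."""

    def __init__(self, capacity: int, alignment: int = 8) -> None:
        self.capacity = capacity
        self.alignment = alignment
        self.cursor = 0  # offset of the next free byte

    def alloc(self, size: int) -> int:
        """Return the offset of a new object of `size` bytes."""
        # Round the request up to the next alignment boundary.
        aligned = (size + self.alignment - 1) // self.alignment * self.alignment
        if self.cursor + aligned > self.capacity:
            # No collection ever happens, so running out of space is fatal.
            raise MemoryError("NoGC heap exhausted")
        addr = self.cursor
        self.cursor += aligned
        return addr


heap = NoGCHeap(capacity=64)
print(heap.alloc(40))  # -> 0
print(heap.alloc(10))  # -> 40 (rounded up to 16 bytes)
```

Because memory is never reclaimed, a NoGC run measures pure allocation cost, which makes it a useful lower bound when evaluating a real collector.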
What's MMTk?
MMTk is a framework that provides a wide variety of garbage collector implementations. Once a language integrates with its API, the language can use a wide variety of GC implementations, from basic algorithms such as mark-sweep (similar to Ruby's current GC) to more complex algorithms such as Immix and its variants.
Kunshan Wang is a researcher at the Australian National University who has been working on implementing Ruby with MMTk. His work is available as a fork of Ruby at mmtk/ruby, with Rust bindings at mmtk/mmtk-ruby.
Implementation
Overview
We've taken Kunshan's implementation and rewritten it using the GC API. Compared to Kunshan's original implementation, this makes the changes minimally invasive, as it removes the need for MMTk-specific code inside Ruby. Instead, all of the MMTk code lives inside gc/mmtk.c, gc/mmtk.h, and the Rust code inside the gc/mmtk directory.
However, compared to Kunshan's implementation, we only support a subset of features and have inferior performance. Most notably, we currently only support NoGC (a GC that only allocates but never collects) and mark-sweep (which is similar to Ruby's current GC). We do not support the more advanced copying GCs that Kunshan's implementation supports, such as Immix.
Using This Feature
To use this feature, follow these steps:
- Configure Ruby with --with-shared-gc=. to place the GC libraries in the current directory (you can also specify a different directory). You should see "with shared GC: yes" in the configuration summary.
- Build MMTk and the Rust binding by running cargo build or cargo build --release in the gc/mmtk directory to build the debug or release version, respectively. This generates the gc/mmtk/target/debug/libmmtk_ruby.a or gc/mmtk/target/release/libmmtk_ruby.a file for MMTk and the Rust binding.
- Run make shared-gc SHARED_GC=mmtk to build the GC library. This generates the librubygc.mmtk.so (on Linux) or librubygc.mmtk.dylib (on macOS) file in the directory that you specified with --with-shared-gc.
- Run Ruby with the RUBY_GC_LIBRARY=mmtk environment variable to use MMTk. On debug builds of MMTk, you should see logging output such as "Initialized MMTk with MarkSweep". You can turn this output off by setting the RUST_LOG environment variable to an empty value (RUST_LOG=).
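The steps above can be scripted. The following Python sketch only assembles the commands, artifact paths, and environment described in the list; it does not run anything. The function names are hypothetical, and it assumes commands are run from the Ruby source root (a fresh git checkout may need ./autogen.sh before ./configure) and that sys.platform == "darwin" means macOS.

```python
import os
import sys


def configure_command(shared_gc_dir: str = ".") -> list[str]:
    """Step 1: configure Ruby so it can load GC libraries from shared_gc_dir."""
    # Assumption: run from the Ruby source root.
    return ["./configure", f"--with-shared-gc={shared_gc_dir}"]


def cargo_build_command(release: bool) -> list[str]:
    """Step 2: build MMTk and the Rust binding; run inside gc/mmtk."""
    return ["cargo", "build", "--release"] if release else ["cargo", "build"]


def rust_static_lib(release: bool) -> str:
    """Static library produced by step 2, relative to the Ruby source root."""
    profile = "release" if release else "debug"
    return os.path.join("gc", "mmtk", "target", profile, "libmmtk_ruby.a")


def make_shared_gc_command(name: str = "mmtk") -> list[str]:
    """Step 3: build the GC shared library."""
    return ["make", "shared-gc", f"SHARED_GC={name}"]


def shared_gc_library(shared_gc_dir: str, name: str = "mmtk", platform: str = sys.platform) -> str:
    """Output of step 3: librubygc.<name>.dylib on macOS, .so on Linux."""
    ext = "dylib" if platform == "darwin" else "so"
    return os.path.join(shared_gc_dir, f"librubygc.{name}.{ext}")


def mmtk_env(base: dict[str, str], quiet: bool = False) -> dict[str, str]:
    """Step 4: environment for running Ruby with MMTk."""
    env = dict(base, RUBY_GC_LIBRARY="mmtk")
    if quiet:
        env["RUST_LOG"] = ""  # an empty value turns off debug-build logging
    return env
```

Each command could then be passed to subprocess.run(..., check=True) with cwd set to the source root or gc/mmtk as appropriate, and Ruby launched with env=mmtk_env(dict(os.environ)).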
You can also customize MMTk at runtime with the following environment variables:
- MMTK_PLAN=<NoGC|MarkSweep>: Configures the GC algorithm used by MMTk. Defaults to MarkSweep.
- MMTK_HEAP_MODE=<fixed|dynamic>: Configures the type of MMTk heap. fixed is a fixed-size heap; dynamic is a dynamically sized heap that grows and shrinks based on heuristics from the MemBalancer algorithm. Defaults to dynamic.
- MMTK_HEAP_MIN=<size>: Configures the lower bound on MMTk's heap memory usage. Only valid when MMTK_HEAP_MODE=dynamic. size is in bytes, but you can also append KiB, MiB, or GiB for larger sizes. Defaults to 1MiB.
- MMTK_HEAP_MAX=<size>: Configures the upper bound on MMTk's heap memory usage. Once this limit is reached and no objects can be garbage collected, the process crashes with an out-of-memory error. size is in bytes, but you can also append KiB, MiB, or GiB for larger sizes. Defaults to 80% of your system RAM.
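As an illustration of how these variables and their defaults fit together, here is a Python sketch that resolves them from an environment mapping. It is not the binding's actual parser: the names parse_size, GcConfig, and resolve_config are hypothetical, KiB/MiB/GiB are assumed to be binary units (1 KiB = 1024 bytes), and the system RAM size is passed in rather than detected.

```python
from dataclasses import dataclass

# Binary multipliers for the size suffixes (assumption: 1 KiB = 1024 bytes).
_SUFFIXES = {"GiB": 1 << 30, "MiB": 1 << 20, "KiB": 1 << 10}


def parse_size(text: str) -> int:
    """Parse '<n>', '<n>KiB', '<n>MiB' or '<n>GiB' into a byte count."""
    text = text.strip()
    for suffix, multiplier in _SUFFIXES.items():
        if text.endswith(suffix):
            return int(text[: -len(suffix)]) * multiplier
    return int(text)


@dataclass
class GcConfig:
    plan: str
    heap_mode: str
    heap_min: int
    heap_max: int


def resolve_config(env: dict[str, str], system_ram: int) -> GcConfig:
    """Apply the documented defaults to whatever MMTK_* variables are set."""
    plan = env.get("MMTK_PLAN", "MarkSweep")
    if plan not in ("NoGC", "MarkSweep"):
        raise ValueError(f"unsupported MMTK_PLAN: {plan}")
    heap_mode = env.get("MMTK_HEAP_MODE", "dynamic")
    if heap_mode not in ("fixed", "dynamic"):
        raise ValueError(f"unsupported MMTK_HEAP_MODE: {heap_mode}")
    # MMTK_HEAP_MIN only applies to the dynamic heap; this sketch does not enforce that.
    heap_min = parse_size(env.get("MMTK_HEAP_MIN", "1MiB"))
    if "MMTK_HEAP_MAX" in env:
        heap_max = parse_size(env["MMTK_HEAP_MAX"])
    else:
        heap_max = system_ram * 8 // 10  # default: 80% of system RAM
    return GcConfig(plan, heap_mode, heap_min, heap_max)
```

For example, resolve_config({"MMTK_PLAN": "NoGC", "MMTK_HEAP_MAX": "512MiB"}, system_ram=16 << 30) selects NoGC on a dynamic heap capped at 512 MiB.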
Code Organization
The code is organized into two parts: the C binding and the Rust binding.
The C binding lives in gc/mmtk.c. It implements the GC API that Ruby communicates with and performs Ruby-level operations such as stopping and starting Ractors before and after a GC, marking objects, and freeing objects.
The Rust binding lives in the gc/mmtk directory. It calls the APIs provided by mmtk-core to allocate objects, and implements the traits (including callbacks) that mmtk-core requires for stopping and resuming Ractors, scanning roots, and scanning object fields.
mmtk-core is a dependency of the Rust binding and contains language-agnostic implementations of various garbage collectors.
At compile time, the Rust binding is statically linked with the C binding to form a shared object that Ruby can load dynamically.
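To illustrate this division of labor, here is a deliberately tiny Python model: a language-agnostic core that owns allocation and runs a non-moving mark-sweep, and a binding that supplies the language-specific callbacks (stopping and resuming mutators, enumerating roots, listing an object's fields, freeing an object). All class and method names are hypothetical and do not correspond to mmtk-core's actual traits or to Ruby's GC API.

```python
class ToyBinding:
    """Hypothetical language-specific side: knows roots and object fields."""

    def __init__(self) -> None:
        self.fields: dict[int, list[int]] = {}  # object -> objects it references
        self.roots: list[int] = []              # e.g. stack slots and globals
        self.mutators_running = True

    def stop_mutators(self) -> None:
        self.mutators_running = False  # Ruby would pause all Ractors here

    def resume_mutators(self) -> None:
        self.mutators_running = True

    def scan_roots(self) -> list[int]:
        return list(self.roots)

    def scan_object(self, obj: int) -> list[int]:
        return self.fields.get(obj, [])

    def free_object(self, obj: int) -> None:
        self.fields.pop(obj, None)  # Ruby would also run the object's free function


class ToyCore:
    """Hypothetical language-agnostic side: owns allocation, runs mark-sweep."""

    def __init__(self) -> None:
        self.allocated: set[int] = set()
        self.next_id = 0

    def alloc(self) -> int:
        obj = self.next_id
        self.next_id += 1
        self.allocated.add(obj)
        return obj

    def collect(self, binding: ToyBinding) -> set[int]:
        binding.stop_mutators()
        marked: set[int] = set()
        worklist = binding.scan_roots()
        while worklist:  # mark: trace everything reachable from the roots
            obj = worklist.pop()
            if obj not in marked:
                marked.add(obj)
                worklist.extend(binding.scan_object(obj))
        freed = self.allocated - marked
        for obj in freed:  # sweep: unmarked objects are garbage
            binding.free_object(obj)
        self.allocated -= freed
        binding.resume_mutators()
        return freed


core, vm = ToyCore(), ToyBinding()
a, b, c = core.alloc(), core.alloc(), core.alloc()
vm.fields[a] = [b]  # a references b; c is unreferenced
vm.roots.append(a)
print(sorted(core.collect(vm)))  # -> [2]
```

The core never inspects object layout directly; everything Ruby-specific flows through the binding, which is what lets the same core run different algorithms for different languages.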
Why We Are Proposing to Upstream This Feature
At this point, we have fully functional implementations of the NoGC and mark-sweep algorithms. While we still have a long way to go to improve performance and implement more advanced algorithms (discussed in the Next Steps section), we would like to upstream this work to improve collaboration with the Ruby core team and the Ruby community.
This proposal is for an experimental feature that will not be enabled by default. Users who want to try out this feature will have to compile Ruby with the shared GC feature enabled, compile the Rust bindings, and compile the MMTk shared GC.
Additionally, we do not anticipate ever replacing Ruby's default GC with MMTk; instead, we offer it as an alternative implementation. Ruby's default GC will always be the default because of its lack of external dependencies, versatility, and ease of use. As such, similar to YJIT, we will not be introducing a dependency on Rust for normal builds.
Benchmarks and Analysis
We ran yjit-bench (commit 1b298fa) on an Ubuntu 24.04 machine with an Intel Core Ultra 7 155H. Here are the benchmark results:
-------------- ----------- ---------- --------- --------- ---------- --------- ------------ -----------
bench          master (ms) stddev (%) RSS (MiB) mmtk (ms) stddev (%) RSS (MiB) mmtk 1st itr master/mmtk
activerecord         451.4        0.1      60.8    4667.3        0.9      69.6         0.10        0.10
chunky-png          1151.1        0.3      41.4    1416.1        0.4      29.4         0.83        0.81
erubi-rails         1940.7        0.2     105.6   15892.9        3.4     326.3         0.14        0.12
hexapdf             3570.4        1.1     119.2    7552.2        3.6     159.7         0.37        0.47
liquid-c              87.7        0.7      26.1     247.8        2.3      33.8         0.35        0.35
liquid-compile        86.1        3.1      27.0     192.5       14.2      34.6         0.31        0.45
liquid-render        217.5        0.3      25.8     428.7        1.3      34.4         0.50        0.51
lobsters            1662.3        1.1     268.1    2455.5        2.6     282.3         0.27        0.68
mail                 194.2        0.6      48.5     585.8        2.3      58.7         0.37        0.33
psych-load          2976.9        0.0      24.8   12646.1        1.7      33.6         0.23        0.24
railsbench          4124.3        0.3      96.9   20606.6        0.5     171.5         0.22        0.20
rubocop              245.3        1.6      81.7     474.7        7.1      92.9         0.27        0.52
ruby-lsp             234.5        0.3      61.5    1337.8        2.0      62.7         0.20        0.18
sequel                98.3        0.6      30.0     376.7        1.7      44.4         0.26        0.26
-------------- ----------- ---------- --------- --------- ---------- --------- ------------ -----------
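In both ratio columns, values below 1.0 mean MMTk is slower: master/mmtk is master (ms) divided by mmtk (ms), and mmtk 1st itr is the same ratio for the first iteration only. The performance geometric mean is 0.28, so MMTk is roughly 3.6x slower than the default GC.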
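Since the summary figure is a geometric mean of per-benchmark ratios, here is a short Python sketch that recomputes it from the rounded table values with statistics.geometric_mean. On these rows, the master/mmtk column works out to about 0.31 and the mmtk 1st itr column to about 0.28, so the quoted 0.28 appears to correspond to the first-iteration column or to a different aggregation in yjit-bench. Treat this as a back-of-the-envelope check, not a replacement for yjit-bench's own summary.

```python
from statistics import geometric_mean

# (benchmark, master ms, mmtk ms, "mmtk 1st itr" ratio), copied from the table above.
RESULTS = [
    ("activerecord", 451.4, 4667.3, 0.10),
    ("chunky-png", 1151.1, 1416.1, 0.83),
    ("erubi-rails", 1940.7, 15892.9, 0.14),
    ("hexapdf", 3570.4, 7552.2, 0.37),
    ("liquid-c", 87.7, 247.8, 0.35),
    ("liquid-compile", 86.1, 192.5, 0.31),
    ("liquid-render", 217.5, 428.7, 0.50),
    ("lobsters", 1662.3, 2455.5, 0.27),
    ("mail", 194.2, 585.8, 0.37),
    ("psych-load", 2976.9, 12646.1, 0.23),
    ("railsbench", 4124.3, 20606.6, 0.22),
    ("rubocop", 245.3, 474.7, 0.27),
    ("ruby-lsp", 234.5, 1337.8, 0.20),
    ("sequel", 98.3, 376.7, 0.26),
]

# Per-benchmark speed ratio; values below 1.0 mean MMTk is slower.
steady = geometric_mean([master / mmtk for _, master, mmtk, _ in RESULTS])
first_itr = geometric_mean([ratio for *_, ratio in RESULTS])

print(f"master/mmtk geometric mean:  {steady:.2f} (MMTk about {1 / steady:.1f}x slower)")
print(f"mmtk 1st itr geometric mean: {first_itr:.2f}")
```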
We analyzed the railsbench benchmark, and we found clear reasons why it's slower:
- The default GC runs 823 times during the benchmark, while MMTk runs 10111 times, which is over 12x as many.
- The default GC performs only 11 major collections; the rest (over 800) are minor collections. Since the MMTk implementation is not generational, every MMTk collection is a major collection.
- As a result, the default GC spends 3397ms in GC, while MMTk spends 141386ms (over 41x as much time).
- A profile (see the attached screenshot) shows that most of the time only a single worker thread is performing work. Since parallelism wasn't a priority in this phase of the project, there are significant opportunities for improvement there.
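The ratios in this list follow directly from the raw railsbench counts. The short Python calculation below reproduces them and also derives the average time per collection, which is not stated above; it suggests each MMTk collection costs roughly 3.4x as much as an average default-GC collection, consistent with every MMTk collection being a full-heap collection while most default-GC runs are minor.

```python
# railsbench GC statistics quoted above.
default_gcs, mmtk_gcs = 823, 10111
default_major = 11
default_gc_ms, mmtk_gc_ms = 3397, 141386

default_minor = default_gcs - default_major      # 812 minor collections
gc_count_ratio = mmtk_gcs / default_gcs          # ~12.3x more collections
gc_time_ratio = mmtk_gc_ms / default_gc_ms       # ~41.6x more time in GC
default_ms_per_gc = default_gc_ms / default_gcs  # ~4.1 ms per collection
mmtk_ms_per_gc = mmtk_gc_ms / mmtk_gcs           # ~14.0 ms per collection

print(f"minor GCs (default): {default_minor}")
print(f"collections: {gc_count_ratio:.1f}x, time in GC: {gc_time_ratio:.1f}x")
print(f"average cost per GC: {default_ms_per_gc:.1f} ms vs {mmtk_ms_per_gc:.1f} ms "
      f"({mmtk_ms_per_gc / default_ms_per_gc:.1f}x)")
```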
These are some of the performance bottlenecks that we have identified, and we outline concrete steps for addressing them in the Next Steps section.
Next Steps
Our current roadmap looks like the following:
- Support a faster non-moving collector such as non-moving Immix.
- Improve parallelism in the GC cycle so that it is faster.
- Implement copying GC such as Immix.
- Implement generational garbage collectors for better performance.
- Improve Ruby's data structures (such as object shapes, arrays, and strings) to take advantage of MMTk's ability to allocate objects of dynamic and larger sizes.