Feature #19729
Updated by eightbitraptor (Matt V-H) over 1 year ago
## GitHub PR

[Store object age in a bitmap #7938](https://github.com/ruby/ruby/pull/7938)

## Abstract

Ruby currently uses 2 bits of the flags in every object to track how many GC events the object has survived. Objects are created at age 0 and can currently grow to age 3, at which point they are considered "old".

Similar to the work carried out for [bitmap marking in Ruby 2.0](https://bugs.ruby-lang.org/issues/5839), which moved the mark bit from the flags to a bitmap attached to a heap page, this PR moves the age bits out of the object flags and stores them in a bitmap.

## Description

This PR creates a new bitmap, `age_bits`, on each heap page. The size of the bitmap is controlled by `HEAP_PAGE_BITMAP_LIMIT`, which roughly indicates how many objects need to be considered by that bitmap, and `RVALUE_AGE_BITS_SIZE`, which varies how many bits we use per object to store the age.

We also introduce the functions `RVALUE_AGE_SET`, `RVALUE_AGE_GET`, `RVALUE_AGE_RESET` and `RVALUE_AGE_INC` to manipulate the age of an object pointed to by a VALUE.

## Impact

**Benefits:**

* Improved CoW performance, because the GC now mutates the object header fewer times, and for fewer objects, as objects age.
* Allow configuration of the age at which objects are considered old. Because the number of bits used is configurable, we can support arbitrary numbers of GC events before an object is considered old. We can use this to change major/minor GC timings for workloads with unusually high/low object churn.
* Free a flag in each object that can be repurposed. Object flags are a precious resource; we should prefer to store data outside the flags where possible.
* Remove GC-related concerns from the object structure. This is important for initiatives like a generic GC interface and MMTk, in order to keep GC-related code as isolated as possible.

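
To make the configurability point concrete, here is a minimal sketch of a per-page age bitmap with a configurable number of bits per slot. It is not the code from the PR: `AGE_BITS_PER_SLOT`, `SLOTS_PER_PAGE`, `struct page`, `age_get`, `age_set` and `age_inc` are hypothetical names chosen for illustration, and the slot count is an assumed value.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constants for illustration; not the values used in gc.c. */
#define AGE_BITS_PER_SLOT 2                       /* bits of age stored per slot */
#define AGE_MASK ((1u << AGE_BITS_PER_SLOT) - 1)  /* 0b11 when using 2 bits */
#define OLD_AGE AGE_MASK                          /* highest representable age */
#define SLOTS_PER_PAGE 1024                       /* assumed slots per heap page */

typedef uintptr_t bits_t;
#define BITS_PER_WORD (sizeof(bits_t) * CHAR_BIT)
#define SLOTS_PER_WORD (BITS_PER_WORD / AGE_BITS_PER_SLOT)
#define AGE_WORDS ((SLOTS_PER_PAGE + SLOTS_PER_WORD - 1) / SLOTS_PER_WORD)

/* Stand-in for a heap page that carries only the age bitmap. */
struct page {
    bits_t age_bits[AGE_WORDS];
};

/* Read the age stored for a slot. */
static unsigned age_get(const struct page *p, size_t slot)
{
    assert(slot < SLOTS_PER_PAGE);
    size_t word = slot / SLOTS_PER_WORD;
    size_t shift = (slot % SLOTS_PER_WORD) * AGE_BITS_PER_SLOT;
    return (unsigned)((p->age_bits[word] >> shift) & AGE_MASK);
}

/* Overwrite the age stored for a slot, leaving neighbouring slots untouched. */
static void age_set(struct page *p, size_t slot, unsigned age)
{
    assert(slot < SLOTS_PER_PAGE);
    size_t word = slot / SLOTS_PER_WORD;
    size_t shift = (slot % SLOTS_PER_WORD) * AGE_BITS_PER_SLOT;
    p->age_bits[word] &= ~((bits_t)AGE_MASK << shift);
    p->age_bits[word] |= (bits_t)(age & AGE_MASK) << shift;
}

/* Age a slot by one GC survival, saturating at OLD_AGE; returns the new age. */
static unsigned age_inc(struct page *p, size_t slot)
{
    unsigned age = age_get(p, slot);
    if (age < OLD_AGE) {
        age_set(p, slot, ++age);
    }
    return age;
}
```

In this sketch, raising `AGE_BITS_PER_SLOT` from 2 to 3 raises the saturation age from 3 to 7 at the cost of a larger bitmap per page, which is the trade-off the configurability bullet describes.
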

**Concerns:**

* Slightly increased RSS, because we now allocate an extra 2 bits per heap page slot to create the `age_bits` bitmap on VM bootup. On my machine `sizeof(struct heap_page)` has increased from 1312 to 1728 bytes.

## Benchmarking

Railsbench and Optcarrot were benchmarked using `yjit-bench`. The results show a small memory increase but comparable performance.

```
master: ruby 3.3.0dev (2023-06-08T11:22:43Z master 3fe09eba9d) [x86_64-linux]
mvh-rvalue-age-bitmap: ruby 3.3.0dev (2023-06-13T06:59:40Z master c74f42a4fb) [x86_64-linux]

----------  -----------  ----------  ---------  --------------------------  ----------  ---------  -----------------------------  ----------------------------
bench       master (ms)  stddev (%)  RSS (MiB)  mvh-rvalue-age-bitmap (ms)  stddev (%)  RSS (MiB)  mvh-rvalue-age-bitmap 1st itr  master/mvh-rvalue-age-bitmap
railsbench  2154.3       0.5         101.4      2124.9                      0.5         101.5      1.01                           1.01
optcarrot   5372.2       0.6         55.0       5282.3                      0.6         55.1       1.02                           1.02
----------  -----------  ----------  ---------  --------------------------  ----------  ---------  -----------------------------  ----------------------------
Legend:
- mvh-rvalue-age-bitmap 1st itr: ratio of master/mvh-rvalue-age-bitmap time for the first benchmarking iteration.
- master/mvh-rvalue-age-bitmap: ratio of master/mvh-rvalue-age-bitmap time. Higher is better for mvh-rvalue-age-bitmap. Above 1 represents a speedup.
```

## Notes

**FL_PROMOTED**

We still use one of the two age bits. The original `FL_PROMOTED0` has been renamed to `FL_PROMOTED` and repurposed to indicate an object's old/young status.

We need this because correctly tracking references from old to young objects relies on a write barrier that is triggered whenever a field on an object is written to. Because this code is a very hot path, it needs to be fast. Looking up the heap page and then calculating the old/young status from the age bits would slow down this part of the code too much.
Instead, we set `FL_PROMOTED` whenever the object crosses the threshold into the old generation, and unset it if that object is ever demoted back into the young generation. This way the write barrier can quickly tell whether the object is old and whether to add it to the remembered set.

**rb_age_reset**

I expose a function from gc.c to reset the object age. This is only used from one place: `Init_VM` in `vm.c`.

We create a hidden class during boot for the VM's "Frozen Core". This is created using `rb_class_new` as a T_CLASS, then it has its class path set, and then the flags are overwritten, forcing the type to T_ICLASS.

Classes are allocated with age 2, and `rb_set_class_path` allocates, which may trigger GC. If this happens, the class will immediately become old and have its `FL_PROMOTED` bit set. That bit is then immediately wiped when the flags are forced to T_ICLASS, leaving the `FL_PROMOTED` bit and the object age out of sync.

I don't know why this code is using `rb_class_new` rather than `rb_include_class_new`, but I will follow up this PR with an investigation.
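
The relationship between the promoted flag and the externally stored age, and the way a wholesale flag overwrite can break it, can be illustrated with a small sketch. Everything here is a hypothetical stand-in rather than the code in gc.c or vm.c: the `demo_` names, the bit position of `DEMO_FL_PROMOTED`, and the single `age` byte that takes the place of the per-page bitmap entry are all assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* All names below are hypothetical stand-ins, not identifiers from gc.c. */
#define DEMO_FL_PROMOTED (1u << 5)  /* assumed bit position of the promoted flag */
#define DEMO_OLD_AGE     3u         /* age at which an object counts as old */

struct demo_obj {
    uint32_t flags;  /* header word: type bits plus the promoted flag */
    uint8_t  age;    /* stands in for the entry in the per-page age bitmap */
};

/* The flag and the age agree when "flag set" is equivalent to "age is old". */
static bool demo_in_sync(const struct demo_obj *o)
{
    return ((o->flags & DEMO_FL_PROMOTED) != 0) == (o->age == DEMO_OLD_AGE);
}

/* Age the object; set the header flag only when it crosses into the old gen. */
static void demo_age_inc(struct demo_obj *o)
{
    assert(o->age <= DEMO_OLD_AGE);
    if (o->age < DEMO_OLD_AGE && ++o->age == DEMO_OLD_AGE) {
        o->flags |= DEMO_FL_PROMOTED;
    }
}

/* Reset both pieces of state together so they cannot drift apart. */
static void demo_age_reset(struct demo_obj *o)
{
    o->age = 0;
    o->flags &= ~DEMO_FL_PROMOTED;
}

/* Write-barrier fast path: reads only the header word, never the bitmap. */
static bool demo_is_old(const struct demo_obj *o)
{
    return (o->flags & DEMO_FL_PROMOTED) != 0;
}
```

In this model, an object at age 2 that survives one more GC gets the flag set. Assigning a fresh value to `flags` afterwards clears the flag while `age` stays at 3, so `demo_in_sync` returns false until `demo_age_reset` is called. That mirrors the Frozen Core situation described above.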