Bug #21214
VmRSS consumption increase in Ruby 3.4.2 vs Ruby 3.3.6
Description
Hello,
After updating Ruby from 3.3.6 to 3.4.2, our batch-style application (not based on Rails) exceeds its memory limit.
Below is an example script that runs on both versions and demonstrates that 'ObjectSpace.memsize_of_all' does not vary significantly, while the OS-reported 'VmRSS' increases significantly.
Do you have any information on what might have caused this increase or any lead to reduce this peak?
Here are the results on Linux 5.15.167.4-microsoft-standard-WSL2:
with Ruby 3.3.6:
ruby 3.3.6 (2024-11-05 revision 75015d4c1f) [x86_64-linux]
On start on 3.3.6
- OS VmRSS: 20616 kB
- ObjectSpace.memsize_of_all: 1.7MB
On first full workload: 0
- OS VmRSS: 559212 kB
- ObjectSpace.memsize_of_all: 327.86MB
....
After workload
- OS VmRSS: 711776 kB
- ObjectSpace.memsize_of_all: 327.86MB
After data released
- OS VmRSS: 616364 kB
- ObjectSpace.memsize_of_all: 1.71MB
and with Ruby 3.4.2:
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [x86_64-linux]
On start on 3.4.2
- OS VmRSS: 13076 kB
- ObjectSpace.memsize_of_all: 1.7MB
On first full workload: 0
- OS VmRSS: 674324 kB
- ObjectSpace.memsize_of_all: 353.6MB
....
After workload
- OS VmRSS: 1000628 kB
- ObjectSpace.memsize_of_all: 327.85MB
After data released
- OS VmRSS: 843636 kB
- ObjectSpace.memsize_of_all: 1.7MB
and the associated script:
require 'objspace'
BYTES_TO_MB = 1024 * 1024
$stdout.sync = true
srand(1)
# Declare supporting code
def print_info(context)
  puts context
  GC.start
  os_mem_metric = File.readlines("/proc/#{Process.pid}/status").find { |line| line.start_with?('VmRSS:') }
  puts "- OS #{os_mem_metric}"
  puts "- ObjectSpace.memsize_of_all: #{(ObjectSpace.memsize_of_all.to_f / BYTES_TO_MB).round(2)}MB"
  puts ''
end
def random_string = Array.new(10) { rand(99) }.join
class A
  def initialize
    @a = random_string
    @b = rand(1000000000000)
  end
end
# Main
print_info "On start on #{RUBY_VERSION}"
objects = Array.new(1_000_000) { A.new }
hashes = Array.new(250_000) { { a: rand(100_000), b: rand(100_000), c: random_string } }
arrays = Array.new(250_000) { [rand(100_000), rand(100_000), random_string] }
keep_if = ->(index) { index.even? }
0.upto(3) do |i_loop|
  objects = objects.map.with_index { |obj, index| keep_if.call(index) ? obj : A.new }
  hashes = hashes.map.with_index { |obj, index| keep_if.call(index) ? obj : { a: rand(10_000), b: rand(10_000), c: random_string } }
  arrays = arrays.map.with_index { |obj, index| keep_if.call(index) ? obj : [rand(10_000), rand(10_000), random_string] }
  print_info " On first full workload: #{i_loop}" if i_loop.zero?
  keep_if = ->(index) { index.odd? } if i_loop == 1
  keep_if = ->(index) { index % 5 == 0 } if i_loop == 2
  keep_if = ->(index) { index.even? } if i_loop == 3
  print '.'
end
puts ''
print_info 'After workload'
objects.clear
hashes.clear
arrays.clear
print_info 'After data released'
Regards
Updated by mood_vuadensl (LOIC VUADENS) 1 day ago
- Description updated
Added random strings to the objects created during the loop.
Updated by byroot (Jean Boussier) 1 day ago
ObjectSpace.memsize_of_all being mostly stable suggests the difference is likely in the GC releasing the memory less eagerly, or having trouble releasing it because it's more fragmented.
Regarding "any lead to reduce this peak?": there are a number of GC parameters you can tweak to make it more or less aggressive in reducing memory usage. Self-plug, but this post is the most up-to-date write-up on GC tuning. Its goals are quite contrary to yours, but it explains what the key settings do, so it could be helpful to you.
You can also try calling GC.compact in between workloads to reduce fragmentation.
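For example, a minimal sketch of what that could look like inside the reproduction loop (report_heap is a placeholder helper, the workload body is elided, and the GC.stat keys used are the standard ones):

# Minimal sketch: compact between workload iterations and watch how many heap
# pages stay allocated relative to live slots (a rough fragmentation signal).
def report_heap(label)
  stat = GC.stat
  puts "#{label}: pages=#{stat[:heap_allocated_pages]} live_slots=#{stat[:heap_live_slots]} free_slots=#{stat[:heap_free_slots]}"
end

0.upto(3) do |i_loop|
  # ... one workload iteration from the original script (objects/hashes/arrays churn) ...
  report_heap("before compact #{i_loop}")
  GC.compact # moves surviving objects together so emptied pages can be released
  report_heap("after compact #{i_loop}")
end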
Updated by peterzhu2118 (Peter Zhu) about 24 hours ago
It looks like there is an issue with strings. I simplified the script to:
require 'objspace'
BYTES_TO_MB = 1024 * 1024
$stdout.sync = true
srand(1)
# Declare supporting code
def print_info(context)
  puts context
  GC.start
  puts "- OS #{`ps -o rss= -p #{$$}`}"
  puts "- ObjectSpace.memsize_of_all: #{(ObjectSpace.memsize_of_all.to_f / BYTES_TO_MB).round(2)}MB"
  puts ''
end
def random_string = "#{rand(99)}#{rand(99)}#{rand(99)}#{rand(99)}#{rand(99)}#{rand(99)}#{rand(99)}#{rand(99)}#{rand(99)}#{rand(99)}"
# Main
print_info "On start on #{RUBY_VERSION}"
strings = Array.new(1_000_000) { random_string }
print_info "After creating strings"
3.4:
On start on 3.4.2
- OS 13248
- ObjectSpace.memsize_of_all: 1.63MB
After creating strings
- OS 170832
- ObjectSpace.memsize_of_all: 85.55MB
3.3:
On start on 3.3.6
- OS 12944
- ObjectSpace.memsize_of_all: 1.64MB
After creating strings
- OS 109344
- ObjectSpace.memsize_of_all: 85.56MB
Updated by byroot (Jean Boussier) about 23 hours ago
It looks like there is an issue with strings
Looking at GC.count and GC.stat_heap:
3.3.4
gc_count: 1179
0 => { :total_allocated_pages=>13, :total_freed_pages=>0}
1 => { :total_allocated_pages=>1226, :total_freed_pages=>0,}
master
gc_count: 74
0 => { total_allocated_pages: 934, }
1 => { total_allocated_pages: 1227, }
So the difference seems to come from the 40B slots created by rand(99).to_s; they don't seem to trigger the GC as eagerly as before.
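For reference, a rough sketch of how the numbers above can be gathered (the keys shown are a subset of what GC.stat_heap returns on Ruby 3.2+):

# Rough sketch: allocate many short strings, then dump per-size-pool GC stats.
# The reference to `strings` is kept so the strings stay live when we inspect.
strings = Array.new(1_000_000) { Array.new(10) { rand(99) }.join }

puts "gc_count: #{GC.count}"
GC.stat_heap.each do |pool, stats|
  puts "#{pool} => slot_size: #{stats[:slot_size]}, total_allocated_pages: #{stats[:total_allocated_pages]}, total_freed_pages: #{stats[:total_freed_pages]}"
end
puts "strings kept alive: #{strings.size}"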
Updated by mood_vuadensl (LOIC VUADENS) about 13 hours ago
byroot (Jean Boussier) wrote in #note-2:
ObjectSpace.memsize_of_all being mostly stable suggests the difference is likely in the GC releasing the memory less eagerly, or having trouble releasing it because it's more fragmented.
Regarding "any lead to reduce this peak?": there are a number of GC parameters you can tweak to make it more or less aggressive in reducing memory usage. Self-plug, but this post is the most up-to-date write-up on GC tuning. Its goals are quite contrary to yours, but it explains what the key settings do, so it could be helpful to you.
You can also try calling GC.compact in between workloads to reduce fragmentation.
We explored this approach, and on this script we can roughly reach the same memory state with something like:
export RUBY_GC_HEAP_GROWTH_MAX_SLOTS=150000
export RUBY_GC_HEAP_FREE_SLOTS_GOAL_RATIO=0.0
Unfortunately, on our application, which runs for longer periods of time, this is not enough (even when using GC.compact).
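For reference, a hypothetical way to scope these settings to a single run instead of exporting them globally (repro.rb is a placeholder name for the script from the description):

# Hypothetical wrapper: apply the GC tuning only to the child process running
# the reproduction script.
require 'rbconfig'

gc_env = {
  'RUBY_GC_HEAP_GROWTH_MAX_SLOTS'      => '150000',
  'RUBY_GC_HEAP_FREE_SLOTS_GOAL_RATIO' => '0.0'
}
system(gc_env, RbConfig.ruby, 'repro.rb') # repro.rb is a placeholder filename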