Bug #21214


VmRSS consumption increase in Ruby 3.4.2 vs Ruby 3.3.6

Added by mood_vuadensl (LOIC VUADENS) 2 days ago. Updated about 8 hours ago.

Status:
Open
Assignee:
-
Target version:
-
ruby -v:
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [x86_64-linux]
[ruby-core:121519]

Description

Hello,

After updating Ruby from 3.3.6 to 3.4.2, our batch-style application (not Rails-based) exceeds its memory limit.
Below is an example script that runs on both versions and shows that ObjectSpace.memsize_of_all does not vary significantly, while the OS-reported VmRSS increases significantly.

Do you have any information on what might have caused this increase or any lead to reduce this peak?

Here are the results on Linux 5.15.167.4-microsoft-standard-WSL2:
with Ruby 3.3.6:

ruby 3.3.6 (2024-11-05 revision 75015d4c1f) [x86_64-linux]
On start on 3.3.6
- OS VmRSS:        20616 kB
- ObjectSpace.memsize_of_all: 1.7MB

 On first full workload: 0
- OS VmRSS:       559212 kB
- ObjectSpace.memsize_of_all: 327.86MB

....
After workload
- OS VmRSS:       711776 kB
- ObjectSpace.memsize_of_all: 327.86MB

After data released
- OS VmRSS:       616364 kB
- ObjectSpace.memsize_of_all: 1.71MB

and 3.4.2:

ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [x86_64-linux]
On start on 3.4.2
- OS VmRSS:        13076 kB
- ObjectSpace.memsize_of_all: 1.7MB

 On first full workload: 0
- OS VmRSS:       674324 kB
- ObjectSpace.memsize_of_all: 353.6MB

....
After workload
- OS VmRSS:      1000628 kB
- ObjectSpace.memsize_of_all: 327.85MB

After data released
- OS VmRSS:       843636 kB
- ObjectSpace.memsize_of_all: 1.7MB

and the associated script:

require 'objspace'

BYTES_TO_MB = 1024 * 1024

$stdout.sync = true
srand(1)

# Declare supporting code
def print_info(context)
  puts context
  GC.start

  os_mem_metric = File.readlines("/proc/#{Process.pid}/status").find { |line| line.start_with?('VmRSS:') }
  puts "- OS #{os_mem_metric}"
  puts "- ObjectSpace.memsize_of_all: #{(ObjectSpace.memsize_of_all.to_f/BYTES_TO_MB).round(2)}MB"
  puts ''
end

def random_string = Array.new(10) { rand(99) }.join

class A
  def initialize
    @a = random_string
    @b = rand(1000000000000)
  end
end


# Main
print_info "On start on #{RUBY_VERSION}"

objects = Array.new(1_000_000) { A.new }
hashes =  Array.new(250_000) { { a: rand(100_000), b: rand(100_000), c: random_string } }
arrays =  Array.new(250_000) { [rand(100_000), rand(100_000), random_string] }

keep_if = ->(index) { index.even? }
0.upto(3) do |i_loop|
  objects = objects.map.with_index { |obj, index| keep_if.call(index) ? obj : A.new }
  hashes = hashes.map.with_index { |obj, index| keep_if.call(index) ? obj : { a: rand(10_000), b: rand(10_000), c: random_string } }
  arrays = arrays.map.with_index { |obj, index| keep_if.call(index) ? obj : [rand(10_000), rand(10_000), random_string] }

  print_info " On first full workload: #{i_loop}" if i_loop.zero?

  keep_if = ->(index) { index.odd? } if i_loop == 1
  keep_if = ->(index) { index%5 == 0 } if i_loop == 2
  keep_if = ->(index) { index.even? } if i_loop == 3
  print '.'
end
puts ''

print_info 'After workload'

objects.clear
hashes.clear
arrays.clear

print_info 'After data released'

Regards

Updated by mood_vuadensl (LOIC VUADENS) 2 days ago

  • Description updated (diff)

Added random strings to the objects created during the loop.

Updated by byroot (Jean Boussier) 2 days ago

ObjectSpace.memsize_of_all being mostly stable suggests the difference likely comes from the GC releasing memory less eagerly, or having trouble releasing it because the heap is more fragmented.

any lead to reduce this peak?

There are a number of GC parameters you can tweak to make it more or less aggressive in reducing memory usage. Self-plug, but this post is the most up-to-date writeup on GC tuning. Its goals are quite contrary to yours, but it explains what the key settings do, so it could be helpful to you.

You can also try calling GC.compact in between workloads to reduce fragmentation.
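For illustration, here is a minimal sketch of that suggestion; the batch structure and `run_batch` helper are hypothetical, not taken from the reporter's application:

```ruby
# Hypothetical batch loop: compact the heap between workloads so that
# sparsely used pages can be consolidated and unused ones returned to
# the OS. GC.compact is available on CRuby since 2.7.
def run_batch(batch)
  batch.map { |n| n.to_s * 2 } # stand-in for real per-batch work
end

batches = Array.new(2) { Array.new(10_000) { rand(100) } }

results = batches.map do |batch|
  out = run_batch(batch)
  GC.compact # returns a Hash of compaction statistics
  out
end

puts results.sum(&:size)
```

Whether this helps depends on the workload: compaction moves objects to defragment pages, but it adds pause time between batches.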

Updated by peterzhu2118 (Peter Zhu) 2 days ago

It looks like there is an issue with strings. I simplified the script to:

require 'objspace'

BYTES_TO_MB = 1024 * 1024

$stdout.sync = true
srand(1)

# Declare supporting code
def print_info(context)
  puts context
  GC.start

  puts "- OS #{`ps -o rss= -p #{$$}`}"
  puts "- ObjectSpace.memsize_of_all: #{(ObjectSpace.memsize_of_all.to_f/BYTES_TO_MB).round(2)}MB"
  puts ''
end

def random_string = "#{rand(99)}#{rand(99)}#{rand(99)}#{rand(99)}#{rand(99)}#{rand(99)}#{rand(99)}#{rand(99)}#{rand(99)}#{rand(99)}"

# Main
print_info "On start on #{RUBY_VERSION}"

strings =  Array.new(1_000_000) { random_string }

print_info "After creating strings"

3.4:

On start on 3.4.2
- OS  13248
- ObjectSpace.memsize_of_all: 1.63MB

After creating strings
- OS 170832
- ObjectSpace.memsize_of_all: 85.55MB

3.3:

On start on 3.3.6
- OS  12944
- ObjectSpace.memsize_of_all: 1.64MB

After creating strings
- OS 109344
- ObjectSpace.memsize_of_all: 85.56MB

Updated by byroot (Jean Boussier) 1 day ago

It looks like there is an issue with strings

Looking at GC.count and GC.stat_heap

3.3.4

gc_count: 1179
0 => { :total_allocated_pages=>13, :total_freed_pages=>0 }
1 => { :total_allocated_pages=>1226, :total_freed_pages=>0 }

master

gc_count: 74
0 => { total_allocated_pages: 934 }
1 => { total_allocated_pages: 1227 }

So the difference seems to come from the 40-byte slots created by rand(99).to_s; they don't seem to trigger the GC as eagerly as before.
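For reference, a small sketch of how numbers like the above can be gathered (GC.stat_heap is available since Ruby 3.2; on 64-bit CRuby, pool 0 holds the smallest slots, where short embedded strings land):

```ruby
# Count GC runs triggered while allocating many short strings, then dump
# per-size-pool page counts from GC.stat_heap. Short strings such as
# rand(99).to_s fit in the smallest slot size (pool 0).
before = GC.count
strings = Array.new(100_000) { rand(99).to_s }
gc_runs = GC.count - before

puts "GC runs during allocation: #{gc_runs}"
GC.stat_heap.each do |pool, stats|
  puts "pool #{pool}: slot_size=#{stats[:slot_size]} " \
       "total_allocated_pages=#{stats[:total_allocated_pages]}"
end
```

Comparing the pool-0 page counts and GC run counts across Ruby versions is what exposes the less eager collection described above.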

Updated by mood_vuadensl (LOIC VUADENS) 1 day ago

byroot (Jean Boussier) wrote in #note-2:

ObjectSpace.memsize_of_all being mostly stable suggests the difference likely comes from the GC releasing memory less eagerly, or having trouble releasing it because the heap is more fragmented.

any lead to reduce this peak?

There are a number of GC parameters you can tweak to make it more or less aggressive in reducing memory usage. Self-plug, but this post is the most up-to-date writeup on GC tuning. Its goals are quite contrary to yours, but it explains what the key settings do, so it could be helpful to you.

You can also try calling GC.compact in between workloads to reduce fragmentation.

We explored this approach, and with this script we can roughly reach the same memory state with something like:

export RUBY_GC_HEAP_GROWTH_MAX_SLOTS=150000
export RUBY_GC_HEAP_FREE_SLOTS_GOAL_RATIO=0.0

Unfortunately, in our application, which runs for longer periods of time, this is not enough (even when using GC.compact).

Updated by peterzhu2118 (Peter Zhu) about 16 hours ago

  • Backport changed from 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN to 3.1: DONTNEED, 3.2: DONTNEED, 3.3: DONTNEED, 3.4: REQUIRED

Updated by mood_vuadensl (LOIC VUADENS) about 8 hours ago

byroot (Jean Boussier) wrote in #note-2:

ObjectSpace.memsize_of_all being mostly stable suggests the difference likely comes from the GC releasing memory less eagerly, or having trouble releasing it because the heap is more fragmented.
[...]
peterzhu2118 (Peter Zhu) wrote in #note-6:
I have a fix here: https://github.com/ruby/ruby/pull/13061

Thanks to both of you for your time!

Looking forward to the fix being available.
Thanks again.
