Bug #21214

VmRSS consumption increase in Ruby 3.4.2 vs Ruby 3.3.6

Added by mood_vuadensl (LOIC VUADENS) 1 day ago. Updated about 13 hours ago.

Status:
Open
Assignee:
-
Target version:
-
ruby -v:
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [x86_64-linux]
[ruby-core:121519]

Description

Hello,

After updating Ruby from 3.3.6 to 3.4.2, our batch-style application (not based on Rails) exceeds its memory limit.
Below is an example script that runs on both versions and shows that 'ObjectSpace.memsize_of_all' does not vary significantly between them, while the OS 'VmRSS' increases significantly.

Do you have any information on what might have caused this increase, or any leads on reducing this peak?

Here are the results on Linux 5.15.167.4-microsoft-standard-WSL2:
with Ruby 3.3.6:

ruby 3.3.6 (2024-11-05 revision 75015d4c1f) [x86_64-linux]
On start on 3.3.6
- OS VmRSS:        20616 kB
- ObjectSpace.memsize_of_all: 1.7MB

 On first full workload: 0
- OS VmRSS:       559212 kB
- ObjectSpace.memsize_of_all: 327.86MB

....
After workload
- OS VmRSS:       711776 kB
- ObjectSpace.memsize_of_all: 327.86MB

After data released
- OS VmRSS:       616364 kB
- ObjectSpace.memsize_of_all: 1.71MB

and with Ruby 3.4.2:

ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [x86_64-linux]
On start on 3.4.2
- OS VmRSS:        13076 kB
- ObjectSpace.memsize_of_all: 1.7MB

 On first full workload: 0
- OS VmRSS:       674324 kB
- ObjectSpace.memsize_of_all: 353.6MB

....
After workload
- OS VmRSS:      1000628 kB
- ObjectSpace.memsize_of_all: 327.85MB

After data released
- OS VmRSS:       843636 kB
- ObjectSpace.memsize_of_all: 1.7MB

and the associated script:

require 'objspace'

BYTES_TO_MB = 1024 * 1024

$stdout.sync = true
srand(1)

# Declare supporting code
def print_info(context)
  puts context
  GC.start

  os_mem_metric = File.readlines("/proc/#{Process.pid}/status").find { |line| line.start_with?('VmRSS:') }
  puts "- OS #{os_mem_metric}"
  puts "- ObjectSpace.memsize_of_all: #{(ObjectSpace.memsize_of_all.to_f/BYTES_TO_MB).round(2)}MB"
  puts ''
end

def random_string = Array.new(10) { rand(99) }.join

class A
  def initialize
    @a = random_string
    @b = rand(1000000000000)
  end
end


# Main
print_info "On start on #{RUBY_VERSION}"

objects = Array.new(1_000_000) { A.new }
hashes =  Array.new(250_000) { { a: rand(100_000), b: rand(100_000), c: random_string } }
arrays =  Array.new(250_000) { [rand(100_000), rand(100_000), random_string] }

keep_if = ->(index) { index.even? }
0.upto(3) do |i_loop|
  objects = objects.map.with_index { |obj, index| keep_if.call(index) ? obj : A.new }
  hashes = hashes.map.with_index { |obj, index| keep_if.call(index) ? obj : { a: rand(10_000), b: rand(10_000), c: random_string } }
  arrays = arrays.map.with_index { |obj, index| keep_if.call(index) ? obj : [rand(10_000), rand(10_000), random_string] }

  print_info " On first full workload: #{i_loop}" if i_loop.zero?

  keep_if = ->(index) { index.odd? } if i_loop == 1
  keep_if = ->(index) { index%5 == 0 } if i_loop == 2
  keep_if = ->(index) { index.even? } if i_loop == 3
  print '.'
end
puts ''

print_info 'After workload'

objects.clear
hashes.clear
arrays.clear

print_info 'After data released'

Regards

Updated by mood_vuadensl (LOIC VUADENS) 1 day ago

  • Description updated (diff)

Add random strings to the objects created during the loop

Updated by byroot (Jean Boussier) 1 day ago

ObjectSpace.memsize_of_all being mostly stable suggests the difference is likely the GC releasing the memory less eagerly, or having trouble releasing it because it's more fragmented.

any leads on reducing this peak?

There are a number of GC parameters you can tweak to make it more or less aggressive in reducing memory usage. Self-plug, but this post is the most up-to-date writeup on GC tuning. Its goals are quite contrary to yours, but it explains what the key settings do, so it could be helpful to you.

You can also try calling GC.compact in between workloads to reduce fragmentation.
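
For example, something along these lines (a minimal sketch; the allocation here is just a stand-in for one unit of your batch workload):

2.times do
  data = Array.new(500_000) { rand.to_s } # stand-in for one batch workload phase
  data.clear
  GC.compact # full GC plus compaction: moves live objects onto fewer heap pages
end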

Updated by peterzhu2118 (Peter Zhu) about 24 hours ago

It looks like there is an issue with strings. I simplified the script to:

require 'objspace'

BYTES_TO_MB = 1024 * 1024

$stdout.sync = true
srand(1)

# Declare supporting code
def print_info(context)
  puts context
  GC.start

  puts "- OS #{`ps -o rss= -p #{$$}`}"
  puts "- ObjectSpace.memsize_of_all: #{(ObjectSpace.memsize_of_all.to_f/BYTES_TO_MB).round(2)}MB"
  puts ''
end

def random_string = "#{rand(99)}#{rand(99)}#{rand(99)}#{rand(99)}#{rand(99)}#{rand(99)}#{rand(99)}#{rand(99)}#{rand(99)}#{rand(99)}"

# Main
print_info "On start on #{RUBY_VERSION}"

strings =  Array.new(1_000_000) { random_string }

print_info "After creating strings"

3.4:

On start on 3.4.2
- OS  13248
- ObjectSpace.memsize_of_all: 1.63MB

After creating strings
- OS 170832
- ObjectSpace.memsize_of_all: 85.55MB

3.3:

On start on 3.3.6
- OS  12944
- ObjectSpace.memsize_of_all: 1.64MB

After creating strings
- OS 109344
- ObjectSpace.memsize_of_all: 85.56MB

Updated by byroot (Jean Boussier) about 23 hours ago

It looks like there is an issue with strings

Looking at GC.count and GC.stat_heap after the strings are created:
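
Roughly with a snippet like this appended to the simplified script (a reconstruction, not necessarily the exact code used):

GC.start
puts "gc_count: #{GC.count}"
GC.stat_heap.each do |pool, stats|
  # One entry per slot size pool; pool 0 holds the smallest (40-byte) slots.
  puts "#{pool} => { total_allocated_pages: #{stats[:total_allocated_pages]}, total_freed_pages: #{stats[:total_freed_pages]} }"
end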

3.3.4

gc_count: 1179
0 => {:total_allocated_pages=>13, :total_freed_pages=>0}
1 => {:total_allocated_pages=>1226, :total_freed_pages=>0}

master

gc_count: 74
0 => {total_allocated_pages: 934}
1 => {total_allocated_pages: 1227}

So the difference seems to come from the 40B slots created by rand(99).to_s; they don't seem to trigger the GC as eagerly as before.
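
For reference, these strings are short enough to be embedded in their slots, so each one lands in the smallest size pool; a quick check (assuming a 64-bit build):

require 'objspace'

# A short string such as "42" is embedded, so its reported size
# is just the slot size of the smallest pool.
ObjectSpace.memsize_of(rand(99).to_s) # => 40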

Updated by mood_vuadensl (LOIC VUADENS) about 13 hours ago

byroot (Jean Boussier) wrote in #note-2:

ObjectSpace.memsize_of_all being mostly stable suggests the difference is likely the GC releasing the memory less eagerly, or having trouble releasing it because it's more fragmented.

any leads on reducing this peak?

There are a number of GC parameters you can tweak to make it more or less aggressive in reducing memory usage. Self-plug, but this post is the most up-to-date writeup on GC tuning. Its goals are quite contrary to yours, but it explains what the key settings do, so it could be helpful to you.

You can also try calling GC.compact in between workloads to reduce fragmentation.

We explored this route, and on this script we can roughly reach the same memory state with something like:

export RUBY_GC_HEAP_GROWTH_MAX_SLOTS=150000
export RUBY_GC_HEAP_FREE_SLOTS_GOAL_RATIO=0.0
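
(As we understand these settings, RUBY_GC_HEAP_GROWTH_MAX_SLOTS caps how many new slots the GC may add each time it grows the heap, and RUBY_GC_HEAP_FREE_SLOTS_GOAL_RATIO=0.0 stops it from growing the heap just to maintain a ratio of free slots after a GC run.)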

Unfortunately, on our application, which runs for longer periods of time, this is not enough (even when calling GC.compact).
