Bug #19969
closed
Regression of memory usage with Ruby 3.1
Added by hsbt (Hiroshi SHIBATA) about 1 year ago.
Updated about 1 year ago.
Description
Our company, ANDPAD, Inc., encountered increased memory usage after upgrading our Rails application from Ruby 3.0 to 3.2. The increase is about 20%.
My colleague found the root cause and this reproduction code:
$ ruby -v -rset -e 's1 = Set.new(10000.times); s2 = Set.new(9999.times); Array.new(10000) { s1 - s2 - [0] }; puts `ps -o rss= -p #{$$}`.to_i'
ruby 3.0.6p216 (2023-06-29 revision bdfe1958a8) +JIT [arm64-darwin22]
248096
$ ruby -v -rset -e 's1 = Set.new(10000.times); s2 = Set.new(9999.times); Array.new(10000) { s1 - s2 - [0] }; puts `ps -o rss= -p #{$$}`.to_i'
ruby 3.2.2 (2023-07-05 revision 2f603bc4d7) +YJIT [arm64-darwin22]
2949280
Should we revert #16996 for Ruby 3.1 and later? I'm not sure this increased memory usage is justified by the performance improvement.
- Backport changed from 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN to 3.0: DONTNEED, 3.1: REQUIRED, 3.2: REQUIRED
Right, @nobu's approach seems much better than reintroducing that weird behavior for .dup.
Ideally we wouldn't rehash (that is, call key.hash again for every key), but instead just shrink the internal data structure (and the same when growing it).
So apparently some applications were relying on Set#dup/Hash#dup to behave like C++'s shrink_to_fit. Ruby does not have such a method and it feels quite low-level, so it seems better to resize the internal data structure when removing elements/entries and going below some threshold.
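For illustration, the pattern such applications may have relied on looks roughly like this (a sketch; on Ruby 3.0 and earlier the final dup also rebuilt a compact internal table, while on 3.1/3.2 the copy keeps the original table size):
require 'set'
s = Set.new(10000.times)
s.subtract(9999.times)  # one element left, but the backing table stays large
s = s.dup               # relied-on side effect on <= 3.0: a compact copy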
As a note, this repro code is very "lucky" in that it triggers a dup right after removing 99.99% of the elements. I suppose it's written that way to make the effect very clear, though. Without the - [0], the same problem occurs on 3.0:
$ ruby -v -rset -e 's1 = Set.new(10000.times); s2 = Set.new(9999.times); a=Array.new(10000) { s1 - s2 }; GC.start; puts `ps -o rss= -p #{$$}`.to_i'
ruby 3.0.6p216 (2023-03-30 revision 23a532679b) [x86_64-linux]
3015808
$ ruby -v -rset -e 's1 = Set.new(10000.times); s2 = Set.new(9999.times); a=Array.new(10000) { s1 - s2 - [0] }; GC.start; puts `ps -o rss= -p #{$$}`.to_i'
ruby 3.0.6p216 (2023-03-30 revision 23a532679b) [x86_64-linux]
74552
If a Set is kept alive a long time, one way to ensure it uses the minimum amount of space is Set#reset, at the cost of extra time to reset/rehash (which notably calls #hash for every key). It's a time vs. memory trade-off, which can be worth it for big long-lived sets:
$ ruby -v -rset -e 's1 = Set.new(10000.times); s2 = Set.new(9999.times); a=Array.new(10000) { s=s1 - s2 - [0]; s.reset; s }; GC.start; puts `ps -o rss= -p #{$$}`.to_i'
ruby 3.2.2 (2023-03-30 revision e51014f9c0) [x86_64-linux]
62992
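Spelled out more readably, the same workaround as in the one-liner above (a sketch using only the reproduction's own values):
require 'set'
s1 = Set.new(10000.times)
s2 = Set.new(9999.times)
result = s1 - s2 - [0]  # tiny set, but its table is still sized for ~10000 entries
result.reset            # rehashes every remaining key and rebuilds a right-sized table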
Automatic shrinking (PR at https://github.com/ruby/ruby/pull/8748) should help the worst cases like this repro, so that seems good anyway.
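Conceptually, automatic shrinking is just the inverse of the usual growth rule: when deletions push occupancy far below capacity, rebuild a smaller table. A toy Ruby sketch of that idea (not the actual st.c code from the PR; class name and thresholds are made up):
class ShrinkingTable
  MIN_BINS = 16
  def initialize
    @bins = MIN_BINS
    @entries = {}
  end
  def insert(key, value)
    @entries[key] = value
    @bins *= 2 if @entries.size > @bins * 0.75   # grow as usual
  end
  def delete(key)
    @entries.delete(key)
    # shrink so long-lived tables don't keep memory sized for their historical peak
    @bins /= 2 while @bins > MIN_BINS && @entries.size < @bins * 0.25
  end
end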
- Status changed from Open to Closed
- Backport changed from 3.0: DONTNEED, 3.1: REQUIRED, 3.2: REQUIRED to 3.0: DONTNEED, 3.1: REQUIRED, 3.2: DONE
ruby_3_2 1cc38d5a2f84733e1c2e42548639e2891fe61e69 merged revision(s) 9eac9d71786a8dbec520d0541a91149f01adf8ea.
Thanks nobu and nagachika.
I confirmed that this regression is resolved with the ruby_3_2 branch.
# Before
$ ruby -v -rset -e 's1 = Set.new(10000.times); s2 = Set.new(9999.times); Array.new(10000) { s1 - s2 - [0] }; puts `ps -o rss= -p #{$$}`.to_i'
ruby 3.2.2 (2023-07-05 revision 2f603bc4d7) +YJIT [arm64-darwin23]
4564304
# After
$ ruby -v -rset -e 's1 = Set.new(10000.times); s2 = Set.new(9999.times); Array.new(10000) { s1 - s2 - [0] }; puts `ps -o rss= -p #{$$}`.to_i'
ruby 3.2.2 (2023-11-19 revision d9f4f321c6) +YJIT [arm64-darwin23]
40864
- Backport changed from 3.0: DONTNEED, 3.1: REQUIRED, 3.2: DONE to 3.0: DONTNEED, 3.1: DONE, 3.2: DONE
ruby_3_1 1cae5e7ceaca7304108fdec35d4858a9e4ff7fe0 merged revision(s) 9eac9d71786a8dbec520d0541a91149f01adf8ea.