Bug #20485

closed

Simple use of Fiber makes GC leak objects with singleton method

Added by skhrshin (Shintaro Sakahara) 6 months ago. Updated 6 months ago.

Status:
Closed
Assignee:
-
Target version:
-
ruby -v:
ruby 3.2.4 (2024-04-23 revision af471c0e01) [x86_64-linux]
[ruby-core:117838]

Description

I found a possible memory leak which occurs only when several conditions are met.

The code to reproduce the problem is below:

class Work
  def add_method
    singleton_class.define_method(:f) {}
  end
end

1.times { Fiber.new {}.resume }

work = Work.new
work.add_method
work = nil
GC.start

num_objs = ObjectSpace.each_object.select { |o| o.is_a?(Work) rescue false }.size
unless num_objs.zero?
  raise "NG"
end

Expected result: The script exits normally.
Actual result: RuntimeError "NG" is raised.

If I change 1.times { Fiber.new {}.resume } to just Fiber.new {}.resume or remove work.add_method, GC works as expected.
Is there a problem with the way this code uses Fiber, or is this a bug in Ruby?

I also tested ruby 3.3.1 (2024-04-23 revision c56cd86388) [x86_64-linux], and the result was slightly different. The code above didn't reproduce the problem, but after changing 1.times to Mutex.new.synchronize, it did.


Related issues: 1 (0 open, 1 closed)

Related to Ruby master - Bug #19436: Call Cache for singleton methods can lead to "memory leaks" (Closed, ko1 (Koichi Sasada))

Updated by skhrshin (Shintaro Sakahara) 6 months ago

  • Subject changed from Simple use of Mutex and Fiber makes GC leak objects with singleton method to Simple use of Fiber makes GC leak objects with singleton method
  • Description updated (diff)

Update: Using Mutex was not necessary.

Updated by skhrshin (Shintaro Sakahara) 6 months ago

  • Description updated (diff)

Updated by skhrshin (Shintaro Sakahara) 6 months ago

  • Description updated (diff)

Update: To reproduce this issue with Ruby 3.3.1, Mutex is necessary.

Updated by skhrshin (Shintaro Sakahara) 6 months ago

Changing 1.times to [1].each also reproduces the problem on Ruby 3.3.1.

Updated by skhrshin (Shintaro Sakahara) 6 months ago

  • ruby -v changed from ruby 3.2.3 (2024-01-18 revision 52bb2ac0a6) [x86_64-linux] to ruby 3.2.4 (2024-04-23 revision af471c0e01) [x86_64-linux]

I confirmed that all three versions (1.times, [1].each and Mutex.new.synchronize) reproduce the problem on Ruby 3.2.4.

Updated by byroot (Jean Boussier) 6 months ago

  • Status changed from Open to Closed

Looks like a duplicate of https://bugs.ruby-lang.org/issues/19436, fixed in Ruby 3.3 but can't really be backported.

Updated by byroot (Jean Boussier) 6 months ago

  • Related to Bug #19436: Call Cache for singleton methods can lead to "memory leaks" added

Updated by skhrshin (Shintaro Sakahara) 6 months ago

Do you mean this is fixed in trunk? Or are you saying this shouldn't happen on Ruby 3.3.1? If the latter, that is not correct: as I wrote, the [1].each and Mutex.new.synchronize versions reproduce the problem on Ruby 3.3.1.
I would like you to reopen this issue. Should I update ruby -v to 3.3.1 (2024-04-23 revision c56cd86388) [x86_64-linux] here, or should I create a new issue?

Updated by byroot (Jean Boussier) 6 months ago

I closed it because I tried your repro script with ruby 3.3.1 (2024-04-23 revision c56cd86388) [arm64-darwin23], both with 1.times and with Mutex.new.synchronize, and it doesn't fail.

Also, your description really fits [Bug #19436], which is why I considered it a duplicate.

If you say you can reproduce it on 3.3.1, I'll re-open, but then I have no explanation why it doesn't reproduce on my machine.

Updated by byroot (Jean Boussier) 6 months ago

  • Status changed from Closed to Open

Updated by byroot (Jean Boussier) 6 months ago

To be honest, I also tried 3.2.2 and 3.1.4, each with [1].each, 1.times and Mutex.new.synchronize, and none of them reproduced.

So I'm starting to wonder if it isn't simply that, for some reason, one object consistently ends up on the stack in your environment.

Updated by byroot (Jean Boussier) 6 months ago

@skhrshin (Shintaro Sakahara) if you can reproduce consistently, it would be helpful to provide a heap dump like this (use a service like GitHub Gist, because the output might be big):

require 'objspace'

class Work
  def add_method
    singleton_class.define_method(:f) {}
  end
end

Mutex.new.synchronize { Fiber.new {}.resume }

work = Work.new
work.add_method
puts ObjectSpace.dump(work)

work = nil
GC.start

num_objs = ObjectSpace.each_object(Work).count
unless num_objs.zero?
  puts '-' * 40
  puts ObjectSpace.dump_all(output: :stdout)
  raise "NG"
end

That would allow us to trace back what's preventing the object from being garbage collected.
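
As a rough illustration of that tracing step, here is a minimal sketch that scans a heap dump (one JSON object per line, as written by ObjectSpace.dump_all) and lists the entries whose "references" array contains a given address. The referrers helper and the addresses in SAMPLE_DUMP are hypothetical and exist only for this example; the "ROOT(...)" label assumes root entries carry a "root" key and no "address".

```ruby
require 'json'

# Hypothetical helper: returns the address (or a root label) of every dump
# entry whose "references" array contains +target+.
def referrers(dump, target)
  dump.each_line.filter_map do |line|
    line = line.strip
    next if line.empty?

    entry = JSON.parse(line)
    next unless entry.fetch("references", []).include?(target)

    # Assumption: root entries have a "root" name instead of an address.
    entry["address"] || "ROOT(#{entry["root"]})"
  end
end

# Made-up, heavily shortened dump lines (not taken from a real dump).
SAMPLE_DUMP = <<~DUMP
  {"type":"ROOT", "root":"vm", "references":["0x1000"]}
  {"address":"0x1000", "type":"CLASS", "name":"Work", "references":["0x2000"]}
  {"address":"0x2000", "type":"CLASS", "singleton":true, "references":["0x3000", "0x1000"]}
  {"address":"0x3000", "type":"OBJECT", "class":"0x2000"}
DUMP

p referrers(SAMPLE_DUMP, "0x3000")
p referrers(SAMPLE_DUMP, "0x1000")
```

Applying the same lookup repeatedly to each referrer walks back toward the roots. If the walk never reaches a ROOT entry, nothing recorded in the dump is keeping the object alive.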

Updated by skhrshin (Shintaro Sakahara) 6 months ago

I asked my co-workers to try this script and some of them gave me their results. The following table includes my results.

Environment | # of people | Reproducibility
ruby 3.3.1 (2024-04-23 revision c56cd86388) [x86_64-linux] on Ubuntu/WSL2 | 1 | Probably 100%
ruby 3.3.1 (2024-04-23 revision c56cd86388) [x86_64-linux] on Ubuntu/virtualbox | 2 | Very high but less than 100%
ruby 3.3.1 (2024-04-23 revision c56cd86388) [x86_64-linux] on Docker Desktop/Windows | 1 | High but less than 100%
ruby 3.3.1 (2024-04-23 revision c56cd86388) [x86_64-linux] on Ubuntu/Hyper-V | 1 | Low (about 10%)
ruby 3.3.1 (2024-04-23 revision c56cd86388) +YJIT [arm64-darwin23] | 1 | 0%
ruby 3.3.1 (2024-04-23 revision c56cd86388) [x86_64-darwin22] | 1 | 0%

The person who tried it on ruby 3.3.1 (2024-04-23 revision c56cd86388) +YJIT [arm64-darwin23] also gave me the results on several Ruby versions. He said it was reproducible on 3.2.4, but not on 3.2.2.

I created a dump log by putting ObjectSpace.dump_all(output: :stdout) before raise "NG" and uploaded it to GitHub. The log doesn't contain the ObjectSpace.dump(work) output you suggested, because the script no longer reproduces the problem when anything such as ObjectSpace.dump(work), puts 0 or sleep 1 is placed between work.add_method and work = nil.

https://gist.github.com/skhrshin/f639e387578db8faf431adfb7ac06631#file-bugs-ruby-lang-org_issues_20485_dump_all-log

As far as I could tell, no OBJECT entry in the dump prevents work from being GCed. The address of work appears to be 0x7f2b552d0bb8. I don't know what the IMEMO entries are. I would appreciate it if you could help me.

Updated by byroot (Jean Boussier) 6 months ago

Alright, looking at your dump:

{"address":"0x7f2b570455b8", "type":"CLASS", "shape_id":2, "slot_size":160, "class":"0x7f2b57045518", "variation_count":0, "superclass":"0x7f2b5707fd30", "name":"Work", "references":["0x7f2b5707fd30", "0x7f2b552d0d98", "0x7f2b704fea00", "0x7f2b552d0b90", "0x7f2b552d0d98", "0x7f2b552d0b68", "0x7f2b552d0b40", "0x7f2b552d0b18", "0x7f2b552d41f0"], "memsize":488, "flags":{"wb_protected":true, "old":true, "uncollectible":true, "marked":true}}

This is the Work class.

{"address":"0x7f2b57045478", "type":"CLASS", "shape_id":2, "slot_size":160, "class":"0x7f2b57045518", "variation_count":0, "superclass":"0x7f2b570455b8", "real_class_name":"Work", "singleton":true, "references":["0x7f2b552d0bb8", "0x7f2b570455b8", "0x7f2b552d0a50", "0x7f2b704fe910", "0x7f2b552d0a28"], "memsize":384, "flags":{"wb_protected":true, "old":true, "uncollectible":true, "marked":true}}

This is the Work instance singleton class ("superclass":"0x7f2b570455b8").

{"address":"0x7f2b552d0bb8", "type":"OBJECT", "shape_id":5, "slot_size":40, "class":"0x7f2b57045478", "embedded":true, "ivars":0, "memsize":40, "flags":{"wb_protected":true}}

This is the Work instance ("class":"0x7f2b57045478").

Using harb I can see it's referenced by the Proc and the singleton class:

harb> print 0x7f2b552d0bb8
    0x7f2b552d0bb8: "OBJECT"
             class: (null)
           memsize: 40
  retained memsize: 40
   referenced from: [
                      0x7f2b552d0a78 (DATA: proc)
                      0x7f2b57045478 (CLASS: (null))
                    ]

Which is expected.

However, following both references, there is no path to the root. So my understanding is simply that one of these references is left on the C stack. Since Ruby's GC scans the stack conservatively, it cannot know for sure whether this is a true reference, so it doesn't collect the object.

To further prove that this isn't a leak, you could loop in your reproduction script. I suspect the "leaked" objects count will remain at one.
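
A minimal sketch of such a looping check, assuming the [1].each variant of the reproduction script; the ITERATIONS constant and the final message are illustrative additions. If the retention comes from a stale stack reference rather than a true leak, the survivor count should stay small however many iterations run, though the exact number depends on the environment.

```ruby
class Work
  def add_method
    singleton_class.define_method(:f) {}
  end
end

ITERATIONS = 100 # illustrative value, not from the original report

ITERATIONS.times do
  # Same sequence as the reproduction script, repeated.
  [1].each { Fiber.new {}.resume }

  work = Work.new
  work.add_method
  work = nil
end

GC.start

# A true leak would grow with ITERATIONS; a stale stack reference would not.
survivors = ObjectSpace.each_object(Work).count
puts "Work instances still alive after #{ITERATIONS} iterations: #{survivors}"
```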

Updated by skhrshin (Shintaro Sakahara) 6 months ago

I tried looping with 100.times in a more complex case, which I had created while investigating a test suite whose memory usage keeps growing until it is killed due to OOM. As you said, only one "leaked" object remained. So I conclude that this behavior is not a problem. I apologize for wasting your time. Thank you for the great help.

Updated by byroot (Jean Boussier) 6 months ago

  • Status changed from Open to Closed

No worries. This aspect of Ruby GC often confuses people.
