Bug #20485
Closed
Simple use of Fiber makes GC leak objects with singleton method
Description
I found a possible memory leak which occurs only when several conditions are met.
The code to reproduce the problem is below:
class Work
  def add_method
    singleton_class.define_method(:f) {}
  end
end
1.times { Fiber.new {}.resume }
work = Work.new
work.add_method
work = nil
GC.start
num_objs = ObjectSpace.each_object.select { |o| o.is_a?(Work) rescue false }.size
unless num_objs.zero?
  raise "NG"
end
Expected result: The script exits normally.
Actual result: RuntimeError "NG" is raised.
If I change 1.times { Fiber.new {}.resume } to just Fiber.new {}.resume, or remove work.add_method, GC works as expected.
Is there a problem with the way Fiber is used in this code, or is this a bug in Ruby?
I also tested ruby 3.3.1 (2024-04-23 revision c56cd86388) [x86_64-linux], and the result was slightly different. The code above didn't reproduce the problem, but it did once I changed 1.times to Mutex.new.synchronize.
Updated by skhrshin (Shintaro Sakahara) 6 months ago
- Subject changed from Simple use of Mutex and Fiber makes GC leak objects with singleton method to Simple use of Fiber makes GC leak objects with singleton method
- Description updated (diff)
Update: Using Mutex was not necessary.
Updated by skhrshin (Shintaro Sakahara) 6 months ago
- Description updated (diff)
Update: To reproduce this issue with Ruby 3.3.1, Mutex is necessary.
Updated by skhrshin (Shintaro Sakahara) 6 months ago
Changing 1.times to [1].each also reproduces the problem on Ruby 3.3.1.
Updated by skhrshin (Shintaro Sakahara) 6 months ago
- ruby -v changed from ruby 3.2.3 (2024-01-18 revision 52bb2ac0a6) [x86_64-linux] to ruby 3.2.4 (2024-04-23 revision af471c0e01) [x86_64-linux]
I confirmed that the 1.times, [1].each and Mutex.new.synchronize versions all reproduce the problem on Ruby 3.2.4.
Updated by byroot (Jean Boussier) 6 months ago
- Status changed from Open to Closed
Looks like a duplicate of https://bugs.ruby-lang.org/issues/19436, fixed in Ruby 3.3 but can't really be backported.
Updated by byroot (Jean Boussier) 6 months ago
- Related to Bug #19436: Call Cache for singleton methods can lead to "memory leaks" added
Updated by skhrshin (Shintaro Sakahara) 6 months ago
Do you mean this is fixed in trunk? Or are you saying this shouldn't happen on Ruby 3.3.1? If the latter, that is not correct: as I wrote, the [1].each and Mutex.new.synchronize versions reproduce the problem on Ruby 3.3.1.
I would like you to reopen this issue. Should I update ruby -v to 3.3.1 (2024-04-23 revision c56cd86388) [x86_64-linux] here, or should I create a new issue?
Updated by byroot (Jean Boussier) 6 months ago
I closed it because I tried your repro script with ruby 3.3.1 (2024-04-23 revision c56cd86388) [arm64-darwin23], both with 1.times and Mutex.new.synchronize, and it doesn't fail.
Also, your description really fits [Bug #19436], which is why I considered it a duplicate.
If you say you can reproduce it on 3.3.1, I'll re-open, but then I have no explanation why it doesn't reproduce on my machine.
Updated by byroot (Jean Boussier) 6 months ago
- Status changed from Closed to Open
Updated by byroot (Jean Boussier) 6 months ago
To be honest, I also tried 3.2.2 and 3.1.4, each with [1].each, 1.times and Mutex.new.synchronize, and none of them reproduced.
So I'm starting to wonder if it isn't simply that, for some reason, one object consistently ends up on the stack in your environment.
Updated by byroot (Jean Boussier) 6 months ago
@skhrshin (Shintaro Sakahara) if you can reproduce consistently, it would be helpful to provide a heap dump like this (use some service like GitHub Gist, because the output might be big):
require 'objspace'
class Work
  def add_method
    singleton_class.define_method(:f) {}
  end
end
Mutex.new.synchronize { Fiber.new {}.resume }
work = Work.new
work.add_method
puts ObjectSpace.dump(work)
work = nil
GC.start
num_objs = ObjectSpace.each_object(Work).count
unless num_objs.zero?
  puts '-' * 40
  puts ObjectSpace.dump_all(output: :stdout)
  raise "NG"
end
That would allow us to trace back what's preventing the object from being garbage collected.
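Since the full dump can be large, it may be more convenient to write it to a file and upload that instead of copying from the terminal. A minimal sketch, assuming a throwaway file in the system temp directory (the file name is hypothetical, and any writable path would do):

```ruby
require 'objspace'
require 'tmpdir'

# Hypothetical output location; pick any writable path.
path = File.join(Dir.tmpdir, "heap-#{Process.pid}.jsonl")

# dump_all accepts an IO object for output: and writes one JSON record per line.
File.open(path, "w") do |file|
  ObjectSpace.dump_all(output: file)
end

puts "wrote #{File.size(path)} bytes to #{path}"
```

In the repro script, this would replace the ObjectSpace.dump_all(output: :stdout) call just before raise "NG".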
Updated by skhrshin (Shintaro Sakahara) 6 months ago
I asked my co-workers to try this script and some of them gave me their results. The following table includes my results.
Environment | # of people | Reproducibility |
---|---|---|
ruby 3.3.1 (2024-04-23 revision c56cd86388) [x86_64-linux] on Ubuntu/WSL2 | 1 | Probably 100% |
ruby 3.3.1 (2024-04-23 revision c56cd86388) [x86_64-linux] on Ubuntu/virtualbox | 2 | Very high but less than 100% |
ruby 3.3.1 (2024-04-23 revision c56cd86388) [x86_64-linux] on Docker Desktop/Windows | 1 | High but less than 100% |
ruby 3.3.1 (2024-04-23 revision c56cd86388) [x86_64-linux] on Ubuntu/Hyper-V | 1 | Low (about 10%) |
ruby 3.3.1 (2024-04-23 revision c56cd86388) +YJIT [arm64-darwin23] | 1 | 0% |
ruby 3.3.1 (2024-04-23 revision c56cd86388) [x86_64-darwin22] | 1 | 0% |
The person who tried it on ruby 3.3.1 (2024-04-23 revision c56cd86388) +YJIT [arm64-darwin23] also gave me results for several Ruby versions. He said it was reproducible on 3.2.4, but not on 3.2.2.
I created a dump log by putting ObjectSpace.dump_all(output: :stdout) before raise "NG" and uploaded it to GitHub. This log doesn't contain the ObjectSpace.dump(work) output you suggested, because putting something like ObjectSpace.dump(work), puts 0 or sleep 1 between work.add_method and work = nil makes the script stop reproducing the problem.
As far as I investigated, I couldn't find any OBJECT that prevented work from being GCed. The address of work looks to be 0x7f2b552d0bb8. I don't have any knowledge about what IMEMO is. I would appreciate it if you could help me.
Updated by byroot (Jean Boussier) 6 months ago
Alright, looking at your dump:
{"address":"0x7f2b570455b8", "type":"CLASS", "shape_id":2, "slot_size":160, "class":"0x7f2b57045518", "variation_count":0, "superclass":"0x7f2b5707fd30", "name":"Work", "references":["0x7f2b5707fd30", "0x7f2b552d0d98", "0x7f2b704fea00", "0x7f2b552d0b90", "0x7f2b552d0d98", "0x7f2b552d0b68", "0x7f2b552d0b40", "0x7f2b552d0b18", "0x7f2b552d41f0"], "memsize":488, "flags":{"wb_protected":true, "old":true, "uncollectible":true, "marked":true}}
This is the Work class.
{"address":"0x7f2b57045478", "type":"CLASS", "shape_id":2, "slot_size":160, "class":"0x7f2b57045518", "variation_count":0, "superclass":"0x7f2b570455b8", "real_class_name":"Work", "singleton":true, "references":["0x7f2b552d0bb8", "0x7f2b570455b8", "0x7f2b552d0a50", "0x7f2b704fe910", "0x7f2b552d0a28"], "memsize":384, "flags":{"wb_protected":true, "old":true, "uncollectible":true, "marked":true}}
This is the singleton class of the Work instance ("superclass":"0x7f2b570455b8").
{"address":"0x7f2b552d0bb8", "type":"OBJECT", "shape_id":5, "slot_size":40, "class":"0x7f2b57045478", "embedded":true, "ivars":0, "memsize":40, "flags":{"wb_protected":true}}
This is the Work instance ("class":"0x7f2b57045478").
Using harb I can see it's referenced by the Proc and the singleton class:
harb> print 0x7f2b552d0bb8
0x7f2b552d0bb8: "OBJECT"
class: (null)
memsize: 40
retained memsize: 40
referenced from: [
0x7f2b552d0a78 (DATA: proc)
0x7f2b57045478 (CLASS: (null))
]
Which is expected.
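For readers without harb, the same "referenced from" lookup can be approximated by inverting the "references" arrays in the dump. The sketch below is not how harb works internally; it is only an illustration. It uses the three records quoted above, trimmed to the fields it needs, and referrers_by_address is a hypothetical helper name. The Proc record is not quoted in this thread, so only the singleton class shows up as a referrer here.

```ruby
require 'json'

# Records quoted in this thread, trimmed to "address", "type" and "references".
DUMP = <<~JSONL
  {"address":"0x7f2b570455b8","type":"CLASS","references":["0x7f2b5707fd30","0x7f2b552d0d98","0x7f2b704fea00","0x7f2b552d0b90","0x7f2b552d0d98","0x7f2b552d0b68","0x7f2b552d0b40","0x7f2b552d0b18","0x7f2b552d41f0"]}
  {"address":"0x7f2b57045478","type":"CLASS","references":["0x7f2b552d0bb8","0x7f2b570455b8","0x7f2b552d0a50","0x7f2b704fe910","0x7f2b552d0a28"]}
  {"address":"0x7f2b552d0bb8","type":"OBJECT"}
JSONL

# Hypothetical helper: maps each address to the records that list it
# under "references" (a reverse-reference index).
def referrers_by_address(jsonl)
  index = Hash.new { |hash, key| hash[key] = [] }
  jsonl.each_line do |line|
    record = JSON.parse(line)
    # Records without a "references" key contribute nothing.
    Array(record["references"]).each { |target| index[target] << record }
  end
  index
end

index = referrers_by_address(DUMP)
index["0x7f2b552d0bb8"].each do |record|
  puts "#{record['address']} (#{record['type']})"
end
# prints: 0x7f2b57045478 (CLASS)
```

Repeating the lookup on each referrer's own address walks the reference chain upward; if that walk never reaches a ROOT record, nothing in the heap itself is keeping the object alive.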
However, following both references, there is no path to the root. So my understanding is simply that one of these references is left on the C stack, and since Ruby's GC is conservative, it cannot know for sure if this is a true reference or not, so it doesn't collect the object.
To further prove that this isn't a leak, you could run your reproduction in a loop. I suspect the "leaked" objects count will remain at one.
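A minimal sketch of such a loop, assuming the same Work class as in the report. make_work and ITERATIONS are hypothetical names introduced here. The surviving count depends on the platform and on what happens to sit on the machine stack, so no particular number is guaranteed.

```ruby
class Work
  def add_method
    singleton_class.define_method(:f) {}
  end
end

# Hypothetical helper: creates one instance in its own method frame,
# so no local variable in the caller refers to it afterwards.
def make_work
  Work.new.add_method
  nil
end

ITERATIONS = 100

1.times { Fiber.new {}.resume }
ITERATIONS.times { make_work }
GC.start

survivors = ObjectSpace.each_object(Work).count
puts "#{survivors} of #{ITERATIONS} Work instances still alive after GC"
```

A count that stays at zero or one regardless of ITERATIONS points to a stale stack reference; a count that grows with ITERATIONS would point to a real leak.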
Updated by skhrshin (Shintaro Sakahara) 6 months ago
I tried looping with 100.times in a more complex case, which I had created while investigating a test suite whose memory usage keeps growing until it is killed due to OOM, and as you said, only one "leaked" object remained. So I conclude that this behavior is not a problem. I apologize for wasting your time. Thank you for the great help.
Updated by byroot (Jean Boussier) 6 months ago
- Status changed from Open to Closed
No worries. This aspect of Ruby GC often confuses people.