Feature #18559

Updated by byroot (Jean Boussier) over 2 years ago

Marking this as a feature, because I think it should be improved but can hardly be considered a bug. 

 ### Repro 

 Consider the following script: 

 ```ruby 
 # /tmp/allocation-source.rb 
 require 'objspace' 
 require 'tmpdir' 

 source = File.join(Dir.tmpdir, "foo.rb") 
 File.write(source, <<~RUBY) 
   # frozen_string_literal: true 
   class Foo 
     def plop 
       "fizz" 
     end 
   end 
 RUBY 

 ObjectSpace.trace_object_allocations_start 

 GC.start 
 gen = GC.count 
 require(source) 
 ObjectSpace.dump_all(output: $stdout, since: gen) 
 ``` 

 ### Expected behavior 

 I'd expect the `ObjectSpace.dump_all` output to attribute all new objects, including `T_IMEMO` and other internal types, to `foo.rb`.

 ### Actual behavior 

 Most of them are instead attributed to the source file that called `Kernel.require`. Output when run with `--disable-gems`:

 ``` 
 {"address":"0x11acaec78", "type":"CLASS", "class":"0x11acaebb0", "superclass":"0x10fa4a848", "name":"Foo", "references":["0x10fa4a848", "0x11acaea98", "0x11acaf790"], "file":"/var/folders/vy/srfpq1vn6hv5r6bzkvcw13y80000gn/T/foo.rb", "line":2, "generation":1, "memsize":544, "flags":{"wb_protected":true}} 
 {"address":"0x11acaeca0", "type":"IMEMO", "class":"0x8", "imemo_type":"cref", "references":["0x10fa4a848"], "file":"/tmp/allocation-source.rb", "line":19, "method":"require", "generation":1, "memsize":40, "flags":{"wb_protected":true}} 
 {"address":"0x11acaecc8", "type":"STRING", "class":"0x10fa42418", "frozen":true, "embedded":true, "fstring":true, "bytesize":4, "value":"fizz", "encoding":"UTF-8", "file":"/tmp/allocation-source.rb", "line":19, "method":"require", "generation":1, "memsize":40, "flags":{"wb_protected":true}} 
 {"address":"0x11acaecf0", "type":"ARRAY", "class":"0x10fa28f68", "frozen":true, "length":2, "embedded":true, "references":["0x11acaff88", "0x11acaf240"], "file":"/tmp/allocation-source.rb", "line":19, "method":"require", "generation":1, "memsize":40, "flags":{"wb_protected":true}} 
 {"address":"0x11acaed18", "type":"IMEMO", "imemo_type":"iseq", "references":["0x11acaecc8", "0x11acaf600", "0x11acaf600", "0x11acaecf0"], "file":"/tmp/allocation-source.rb", "line":19, "method":"require", "generation":1, "memsize":416, "flags":{"wb_protected":true}} 
 {"address":"0x11acaf1a0", "type":"ARRAY", "class":"0x10fa28f68", "frozen":true, "length":2, "embedded":true, "references":["0x11acaff88", "0x11acaf240"], "file":"/tmp/allocation-source.rb", "line":19, "method":"require", "generation":1, "memsize":40, "flags":{"wb_protected":true}} 
 {"address":"0x11acaf1c8", "type":"IMEMO", "imemo_type":"iseq", "references":["0x11acaed18", "0x11acaf1f0", "0x11acaf1f0", "0x11acaf1a0", "0x11acaf290"], "file":"/tmp/allocation-source.rb", "line":19, "method":"require", "generation":1, "memsize":456, "flags":{"wb_protected":true}} 
 {"address":"0x11acaf1f0", "type":"STRING", "class":"0x10fa42418", "frozen":true, "embedded":true, "fstring":true, "bytesize":11, "value":"<class:Foo>", "file":"/tmp/allocation-source.rb", "line":19, "method":"require", "generation":1, "memsize":40, "flags":{"wb_protected":true}} 
 {"address":"0x11acaf218", "type":"ARRAY", "class":"0x10fa28f68", "frozen":true, "length":2, "embedded":true, "references":["0x11acaff88", "0x11acaf240"], "file":"/tmp/allocation-source.rb", "line":19, "method":"require", "generation":1, "memsize":40, "flags":{"wb_protected":true}} 
 {"address":"0x11acaf240", "type":"STRING", "class":"0x10fa42418", "frozen":true, "fstring":true, "bytesize":63, "value":"/private/var/folders/vy/srfpq1vn6hv5r6bzkvcw13y80000gn/T/foo.rb", "encoding":"UTF-8", "file":"/tmp/allocation-source.rb", "line":19, "method":"require", "generation":1, "memsize":104, "flags":{"wb_protected":true}} 
 .... 

 ``` 

 ### Why is it a problem? 

 This behavior makes it impossible to properly analyze which parts of an application use the most memory. For instance, when using `heap-profiler` on an app that uses `Bootsnap`, all objects created as a result of loading source files are attributed to bootsnap:

 ``` 
 retained memory by gem 
 ----------------------------------- 
  351.64 MB    bootsnap-1.10.2 
 ``` 

 If this behaved as I expect, `heap-profiler` would be able to report how much each gem contributes to the app's RAM usage.
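 To make the attribution problem concrete, here is a minimal sketch of the kind of grouping a profiler performs on the `ObjectSpace.dump_all` output. The helper name `memsize_by_file` and the trimmed sample records are assumptions for illustration only; this is not how `heap-profiler` is actually implemented.

```ruby
require 'json'

# Hypothetical helper (not part of heap-profiler): sums "memsize" per "file"
# over the JSON lines produced by ObjectSpace.dump_all.
def memsize_by_file(lines)
  totals = Hash.new(0)
  lines.each do |line|
    line = line.strip
    next if line.empty?

    record = JSON.parse(line)
    # Records without allocation info (e.g. objects allocated before
    # tracing started) have no "file" key.
    totals[record.fetch("file", "(unknown)")] += record.fetch("memsize", 0)
  end
  # Largest consumers first.
  totals.sort_by { |_file, bytes| -bytes }.to_h
end

# Trimmed-down records modeled on the dump above (illustrative values).
sample = [
  '{"type":"CLASS", "file":"/tmp/foo.rb", "line":2, "memsize":544}',
  '{"type":"IMEMO", "file":"/tmp/allocation-source.rb", "line":19, "memsize":416}',
  '{"type":"STRING", "file":"/tmp/allocation-source.rb", "line":19, "memsize":40}',
]

totals = memsize_by_file(sample)
totals.each { |file, bytes| puts "#{bytes}\t#{file}" }
```

 Because the grouping key is the recorded `file`, everything attributed to the caller of `require` ends up in the caller's bucket, which is why a single loader such as bootsnap dominates the report.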

 ### Possible solution 

 I think `ObjectSpace` should have an API to override `get_trace_arg() / EC->trace_arg`, in the context of allocation tracing, so that `Kernel.require` and `RubyVM::InstructionSequence.load_from_binary` could set it to the source file they're loading. 

 ### Additional use cases? 

 A very similar issue exists with objects created by static data parsers such as `YAML`, `JSON` etc. All the objects they create while parsing are attributed to the parser itself.

 So it would be very useful if there was a Ruby API so that we could do something like this:

 ```ruby 
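 require 'yaml'

 module YAMLAllocationTracing
   def load_file(path, ...)
     # ObjectSpace.set_allocation_source is the API proposed here; it does not exist yet.
     ObjectSpace.set_allocation_source(file: path, line: 1, class_path: :YAML, method_id: :load_file) do
       super
     end
   end
 end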
 YAML.singleton_class.prepend(YAMLAllocationTracing) 
 ```  
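 For comparison, the current attribution of parser-created objects can be inspected with the existing `ObjectSpace.allocation_sourcefile` and `ObjectSpace.allocation_sourceline` methods. The sketch below is only a way to observe the problem, not a fix; the helper name `allocation_site` is an assumption for illustration.

```ruby
require 'objspace'
require 'yaml'

# Hypothetical helper: returns "file:line" as recorded by the allocation
# tracer, or nil if the object was not traced.
def allocation_site(obj)
  file = ObjectSpace.allocation_sourcefile(obj)
  line = ObjectSpace.allocation_sourceline(obj)
  file ? "#{file}:#{line}" : nil
end

hash_site = nil
value_site = nil

ObjectSpace.trace_object_allocations do
  config = YAML.safe_load("name: fizz\n")
  # Query while tracing is still active.
  hash_site = allocation_site(config)
  value_site = allocation_site(config["name"])
end

puts hash_site
puts value_site
```

 On a stock Ruby, both sites should point into the YAML (Psych) library sources rather than at the script or at the data being parsed, which is the attribution the proposed API would let callers override.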
