Project

General

Profile

ActionsLike1

Bug #19436

closed

Call Cache for singleton methods can lead to "memory leaks"

Added by byroot (Jean Boussier) about 2 years ago. Updated over 1 year ago.


Description

Using "memory leaks" with quotes, because strictly speaking the memory isn't leaked, but it can nonetheless lead to large memory overheads.

Minimal Reproduction

module Foo
  def bar
  end
end

def call_bar(obj)
  # Here the call cache we'll keep a ref on the method_entry
  # which then keep a ref on the singleton_class, making that
  # instance immortal until the method is called again with
  # another instance.

  # The reference chain is IMEMO(callcache) -> IMEMO(ment) -> ICLASS -> CLASS(singleton) -> OBJECT
  obj.bar
end

obj = Object.new
obj.extend(Foo)

call_bar(obj)

id = obj.object_id
obj = nil

4.times { GC.start }
p ObjectSpace._id2ref(id)

Explanation

Call caches keep a strong reference onto the "callable method entry" (CME), which itself keeps a strong reference on the called object class
and in the cache of a singleton class, it keeps a strong reference onto the attached_object (instance).

This means that any call site that calls a singleton method, will effectively keep a strong reference onto the last receiver.
If the method is frequently called it's not too bad, but if it's infrequently called, it's effectively a (bounded) memory leak.
And if the attached_object is big, the wasted memory can be very substantial.

Practical Implications

Once relative common API impacted by this is Rails' extending API. This API allow to extend a "query result set" with a module.
These query results set can sometimes be very big, especially since they keep references to the instantiated ActiveRecord::Base instances etc.

Possible Solutions

Only keep a weak reference to the CME

The fairly "obvious" solution is to keep a weak reference to the CME, that's what I explored in https://github.com/ruby/ruby/pull/7272, and it seems to work.
However in debug mode It does fail on an assertion during compaction, but it's isn't quite clear to me what the impact is.

Additionally, something that makes me think this would be the right solution, is that call caches already try to avoid marking the class:

# vm_callinfo.h:275

struct rb_callcache {
    const VALUE flags;

    /* inline cache: key */
    const VALUE klass; // should not mark it because klass can not be free'd
                       // because of this marking. When klass is collected,
                       // cc will be cleared (cc->klass = 0) at vm_ccs_free().

So it appears that the class being also marked through the CME is some kind of oversight?

Don't cache based on some heuristics

If the above isn't possible or too complicated, an alternative would be to not cache CMEs found in singleton classes, except if it's the the singleton class of a Class or Module.

It would make repeated calls to such methods slower, but the assumption is that it's unlikely that these CME would live very long.

Make Class#attached_object a weak reference

Alternatively we could make the attached_object a weak reference, which would drastically limit the amount of memory that may be leaked in such scenario.

The downside is that Class#attached_object was very recently exposed in Ruby 3.2.0, so it means changing its semantic a bit.

cc @peterzhu2118 (Peter Zhu) @ko1 (Koichi Sasada)


Related issues 2 (0 open2 closed)

Related to Ruby - Feature #19783: Weak References in the GCClosedActions
Related to Ruby - Bug #20485: Simple use of Fiber makes GC leak objects with singleton methodClosedActions

Added by Ruby (Elena Mangano) over 1 year ago

Revision c330037c

cc->cme should not be marked.

cc is callcache.

cc->klass (klass) should not be marked because if the klass is
free'ed, the cc->klass will be cleared by vm_cc_invalidate().

cc->cme (cme) should not be marked because if cc is invalidated
when cme is free'ed.

  • klass marks cme if klass uses cme.
  • caller classe's ccs->cme marks cc->cme.
  • if cc is invalidated (klass doesn't refer the cc),
    cc is invalidated by vm_cc_invalidate() and cc->cme is
    not be accessed.
  • On the multi-Ractors, cme will be collected with global GC
    so that it is safe if GC is not interleaving while accessing
    cc and cme.

fix [Bug #19436]

10_000.times{|i|
  # p i if (i%1_000) == 0

  str = "x" * 1_000_000
  def str.foo = nil
  eval "def call#{i}(s) = s.foo"
  send "call#{i}", str
}

Without this patch:

real    1m5.639s
user    0m6.637s
sys     0m58.292s

and with this patch:

real    0m2.045s
user    0m1.627s
sys     0m0.164s
ActionsLike1

Also available in: Atom PDF