Feature #21722
Open
Expose rb_gc_mark_weak API for use in extensions
Description
In https://bugs.ruby-lang.org/issues/21710 it came up that:
- On top of _id2ref being deprecated in Ruby 4.0, it's a bad idea to be using object_id from the NEWOBJ tracepoint
- rb_gc_mark_weak, which would be the alternative for an extension that needs weak-reference-like behavior, is not available to extensions
So I've opened this ticket to request exposing rb_gc_mark_weak so it can be used by extensions.
The Datadog Ruby profiler is currently using object_id and _id2ref to implement its "heap profiling" -- that is, we have a NEWOBJ tracepoint, and every so often (i.e. not for every object) we select an object and track its lifetime by keeping its id and periodically checking whether it's still alive.
We're using this approach instead of:
- The FREEOBJ event => Reduced overhead, as we don't need to be called for every object (+ not needing to deal with corner cases of when FREEOBJ may not be called for an object)
- WeakMap => WeakMap APIs are Ruby-level and need the GVL, and thus make it hard to use from low-level tracepoints and to avoid overhead by doing profiler work with the GVL released.
For our purposes, it would be OK if this API is not "official" -- e.g. if it's one of those that gets exposed as a public symbol but not documented and no promises made for future Ruby releases.
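To make the id-based approach described above concrete, here is a minimal Ruby-level sketch. It is only an illustration under stated assumptions: the class name `SampledObjectTracker` and its methods are hypothetical, and the real profiler does this work in C from a NEWOBJ tracepoint rather than in Ruby code.

```ruby
# Hypothetical sketch of sampled, id-based lifetime tracking.
# Not the profiler's actual code: names and structure are illustrative only.
class SampledObjectTracker
  def initialize(sample_every:)
    @sample_every = sample_every
    @seen = 0
    @tracked_ids = []
  end

  # Called for each observed allocation; only every Nth object is tracked.
  # Returns true if the object was selected for tracking.
  def observe(obj)
    @seen += 1
    return false unless (@seen % @sample_every).zero?

    # Keep only the id, so tracking does not keep the object alive.
    @tracked_ids << obj.object_id
    true
  end

  # Forgets ids whose objects are gone and returns the ids still alive.
  def live_ids
    @tracked_ids.select! { |id| alive?(id) }
    @tracked_ids.dup
  end

  private

  # ObjectSpace._id2ref raises RangeError when the id no longer
  # refers to a live object.
  def alive?(id)
    ObjectSpace._id2ref(id)
    true
  rescue RangeError
    false
  end
end

tracker = SampledObjectTracker.new(sample_every: 2)
first = Object.new
second = Object.new
tracker.observe(first)   # skipped by sampling
tracker.observe(second)  # tracked by id
tracker.live_ids         # contains second.object_id while second is alive
```

The liveness check is exactly the part that depends on _id2ref, which is why its deprecation leaves this pattern without a replacement unless a weak-reference API is exposed to extensions.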
Updated by peterzhu2118 (Peter Zhu) 25 minutes ago
Hi, author of rb_gc_mark_weak here.
I think it would be good to have such an API available. However, I don't think the current API is it, because it is very tricky to use with incremental marking. Since incremental marking splits marking into several steps interleaved with Ruby code execution, the state of an object can change after it has been marked. And because rb_gc_mark_weak operates on pointers, the underlying memory of those pointers may have been freed or realloc'ed in the meantime. This is why rb_gc_remove_weak exists, and also why the ST table in WeakMap/WeakKeyMap has keys and values that point to malloc'ed memory containing the actual keys and values.
I have proposed #21084 (implemented in this PR), which I think may be an easier-to-use API.
Updated by ivoanjo (Ivo Anjo) 12 minutes ago
Thanks for the hint, Peter! I had not spotted https://bugs.ruby-lang.org/issues/21084 :)
Based on your notes it looks like exposing a rb_gc_declare_weak_references is a much better choice 👍