Feature #21853
Closed
Make Embedded TypedData a public API
Description
As part of Ruby 3.3, we added a private RUBY_TYPED_EMBEDDABLE flag to the TypedData API to allow TypedData to use variable width allocation.
Technically, we inadvertently exposed that flag in public headers, so third-party extensions can already make use of it. However, it isn't considered public API because it isn't documented, so relying on it would be a poor decision.
This API has both memory and speed benefits, as it avoids some malloc/free churn, reduces pointer chasing, etc.
For instance, when we converted Time to be embedded, it improved allocation performance by 30% and also reduced memory usage by 20%: https://github.com/ruby/ruby/commit/aa6642de630cfc10063154d84e45a7bff30e9103
I believe numerous third-party native extensions could benefit from it (I would certainly make use of it in ruby/json).
Now that we have used it internally for several years, I'd like to work on making it a public API for Ruby 4.1.
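To make the "malloc/free churn" and "pointer chasing" points concrete, here is a minimal plain-C model. It is not Ruby's actual object layout and does not use the Ruby C API: `boxed_obj`, `embedded_obj`, `payload` and `counted_malloc` are hypothetical names invented for illustration, and a single `malloc` stands in for a GC-managed object slot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical payload standing in for an extension's struct. */
struct payload { long sec; long nsec; };

/* Non-embedded model: the object only points at a separately allocated payload. */
struct boxed_obj {
    unsigned long flags;
    struct payload *data;   /* second allocation, one extra pointer hop */
};

/* Embedded model: the payload lives in the same allocation as the header. */
struct embedded_obj {
    unsigned long flags;
    struct payload data;
};

static int alloc_count = 0;

/* malloc wrapper that counts successful allocations. */
static void *counted_malloc(size_t size)
{
    assert(size > 0);
    void *p = malloc(size);
    if (p != NULL) alloc_count++;
    return p;
}

struct boxed_obj *boxed_new(void)
{
    struct boxed_obj *obj = counted_malloc(sizeof *obj);
    if (obj == NULL) return NULL;
    obj->flags = 0;
    obj->data = counted_malloc(sizeof *obj->data);
    if (obj->data == NULL) { free(obj); return NULL; }
    obj->data->sec = 0;
    obj->data->nsec = 0;
    return obj;
}

struct embedded_obj *embedded_new(void)
{
    struct embedded_obj *obj = counted_malloc(sizeof *obj);
    if (obj == NULL) return NULL;
    obj->flags = 0;
    obj->data.sec = 0;
    obj->data.nsec = 0;
    return obj;
}

/* Number of allocations needed to create one non-embedded object. */
int allocations_for_boxed(void)
{
    int before = alloc_count;
    struct boxed_obj *obj = boxed_new();
    int used = alloc_count - before;
    if (obj != NULL) { free(obj->data); free(obj); }
    return used;
}

/* Number of allocations needed to create one embedded object. */
int allocations_for_embedded(void)
{
    int before = alloc_count;
    struct embedded_obj *obj = embedded_new();
    int used = alloc_count - before;
    free(obj);
    return used;
}
```

In this model the non-embedded object costs two allocations (and two frees) and every field access goes through `obj->data`, while the embedded one costs a single allocation and reads its fields directly.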
Updated by Eregon (Benoit Daloze) 2 months ago
I'm thinking about this in the context of TruffleRuby, where RTypedData never moves (it's allocated via system calloc()).
I think the best approach there would be to ignore this new flag entirely, so the public API should be designed in a way that it can be implemented as if the data were not embedded.
Related: https://github.com/truffleruby/truffleruby/issues/4130
So on TruffleRuby I think we could always use the same allocation for the RTypedData + data struct, when using TypedData_Make_Struct(), effectively the same as embedded TypedData but never moving.
But not when using TypedData_Wrap_Struct() since that uses an existing data pointer.
Updated by byroot (Jean Boussier) 2 months ago
> So on TruffleRuby I think we could always use the same allocation for the RTypedData + data struct, when using TypedData_Make_Struct(), effectively the same as embedded TypedData but never moving.
I don't think so, because you still need to support DATA_PTR(obj) = ptr, which isn't allowed for embedded typed datas.
Updated by Eregon (Benoit Daloze) 2 months ago
· Edited
Good point! How do embedded typed datas handle this, do they raise an exception in such a case?
Seems tricky given the DATA_PTR(obj) API returning a pointer.
I'd actually love if we had a separate API for changing the data pointer as a macro or function (e.g. RTYPEDDATA_SET_DATA(obj, new_data_pointer) to follow RTYPEDDATA_GET_DATA), so we know better when it can be changed.
Currently in TruffleRuby we have to work around this: after every native call that accesses a T_DATA, we have to check whether the data pointer has changed :/
Of course we wouldn't be able to remove DATA_PTR() yet, but we could maybe deprecate it and/or at some point make it return a const pointer or so to prevent writes.
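A rough plain-C model of the difference between polling for a changed data pointer and having a dedicated setter. This is not TruffleRuby's code and not the Ruby C API: `model_data_obj`, `call_native_and_poll`, `model_set_data` and the two `native_*` functions are hypothetical names used only to illustrate the idea.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a T_DATA object as seen by the runtime. */
struct model_data_obj {
    void *data;        /* stands in for the data pointer */
    int   sync_count;  /* how many times the runtime noticed a change */
};

typedef void (*native_fn)(struct model_data_obj *obj);

/* Polling: snapshot the pointer before the native call and compare afterwards.
 * Returns 1 if the pointer changed, 0 otherwise. */
int call_native_and_poll(struct model_data_obj *obj, native_fn fn)
{
    void *before = obj->data;
    fn(obj);
    if (obj->data != before) {
        obj->sync_count++;
        return 1;
    }
    return 0;
}

/* Setter: the runtime learns about the change at the moment of the write,
 * so no check is needed after every native call. */
void model_set_data(struct model_data_obj *obj, void *new_data)
{
    obj->data = new_data;
    obj->sync_count++;
}

static int replacement;

/* A native function that writes the pointer directly. */
void native_that_swaps_pointer(struct model_data_obj *obj)
{
    obj->data = &replacement;
}

/* A native function that only reads the pointer. */
void native_that_only_reads(struct model_data_obj *obj)
{
    (void)obj->data;
}
```

With direct writes, the check has to run after every call, including calls like `native_that_only_reads` that never change anything. With a setter, only the writes themselves cost something.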
Updated by byroot (Jean Boussier) 2 months ago
> How do embedded typed datas handle this, do they raise an exception in such a case?
Unfortunately not. It ends up with data corruption.
> I'd actually love if we had a separate API for changing the data pointer as a macro or function
Makes sense.
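To illustrate both points, here is a small plain-C model of why an unchecked pointer write can corrupt embedded data, and how a checked setter could refuse it. It assumes, purely for illustration, that the embedded payload occupies the storage where the data pointer would otherwise live. None of this is the Ruby C API: `model_obj`, `model_get_data`, `model_unchecked_set` and `model_checked_set` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical payload of an extension. */
struct counter { uintptr_t hits; uintptr_t misses; };

/* Model object: the pointer slot and the inline payload share storage
 * (an assumption made for this sketch). */
struct model_obj {
    int embedded;                    /* 1 when the payload is stored inline */
    union {
        void *ptr;                   /* non-embedded: external payload */
        struct counter inline_data;  /* embedded: the payload itself */
    } as;
};

/* Read accessor: computes the address for embedded objects. */
void *model_get_data(struct model_obj *obj)
{
    return obj->embedded ? (void *)&obj->as.inline_data : obj->as.ptr;
}

/* Unchecked write, modeling a direct assignment to the data pointer. */
void model_unchecked_set(struct model_obj *obj, void *ptr)
{
    obj->as.ptr = ptr;
}

/* Checked setter: returns 0 on success, -1 when the object is embedded. */
int model_checked_set(struct model_obj *obj, void *ptr)
{
    if (obj->embedded) return -1;
    obj->as.ptr = ptr;
    return 0;
}

/* Returns 1 if an unchecked write clobbered the embedded payload. */
int unchecked_write_corrupts(void)
{
    static struct counter other = { 0, 0 };
    struct model_obj obj;
    memset(&obj, 0, sizeof obj);
    obj.embedded = 1;
    obj.as.inline_data.hits = 0;
    obj.as.inline_data.misses = 7;
    model_unchecked_set(&obj, &other);      /* overwrites the first word */
    struct counter *c = model_get_data(&obj);
    return c->hits != 0;                    /* hits now holds pointer bits */
}
```

In this model the unchecked write silently turns `hits` into the bits of an address, whereas the checked setter can report the misuse instead.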
Updated by Eregon (Benoit Daloze) 19 days ago
· Edited
One tricky aspect of RUBY_TYPED_EMBEDDABLE is that if the struct contains a pointer into itself, that pointer becomes invalid when the object is moved.
Is there a way to handle that correctly and update such pointers? (EDIT: it seems not, looking at this).
If the struct is ever passed to a native library I would consider it extremely dangerous to use RUBY_TYPED_EMBEDDABLE.
Overall it sounds quite error-prone, also considering there is no safeguard to avoid writing to DATA_PTR, so I'm not sure it's appropriate to expose this to user extensions.
EDIT: and it also needs RB_GC_GUARD() calls to avoid the object moving or being freed while on the stack as shown in this commit. Those are notoriously easy to forget.
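The interior-pointer concern can be reproduced in plain C without the Ruby API. In this sketch `model_move` is a hypothetical stand-in for the GC moving an object (just a `memcpy` to a new slot), and `parser_bad` / `parser_ok` are made-up structs. It does not model the RB_GC_GUARD issue.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Keeps a raw pointer into its own buffer: breaks when the struct moves. */
struct parser_bad {
    char  buf[16];
    char *cursor;
};

/* Keeps an offset into its own buffer: survives a move. */
struct parser_ok {
    char   buf[16];
    size_t cursor_off;
};

/* Model of the object being moved: a byte-for-byte copy into a new slot. */
void model_move(void *dst, const void *src, size_t size)
{
    memcpy(dst, src, size);
}

/* Returns 1 if the copied pointer still targets the old slot. */
int interior_pointer_is_stale_after_move(void)
{
    struct parser_bad from, to;
    memset(&from, 0, sizeof from);
    from.cursor = from.buf + 3;
    model_move(&to, &from, sizeof from);
    return to.cursor == from.buf + 3 && to.cursor != to.buf + 3;
}

/* Returns 1 if the offset still addresses the right byte after the move. */
int offset_survives_move(void)
{
    struct parser_ok from, to;
    memset(&from, 0, sizeof from);
    strcpy(from.buf, "abcdef");
    from.cursor_off = 3;
    model_move(&to, &from, sizeof from);
    return to.buf[to.cursor_off] == 'd';
}
```

Storing offsets instead of pointers only helps for pointers the extension owns. If a native library has kept the struct's address, nothing in this sketch can fix that up.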
Updated by Eregon (Benoit Daloze) 19 days ago
Eregon (Benoit Daloze) wrote in #note-5:
> also considering there is no safeguard to avoid writing to DATA_PTR
One idea to address this (but not the other 2 concerns) would be to raise on DATA_PTR() for RUBY_TYPED_EMBEDDABLE, and only allow RTYPEDDATA_GET_DATA().
Updated by byroot (Jean Boussier) 12 days ago
- Status changed from Open to Closed
Applied in changeset git|305b563ec974b554b2c00d2724c62a4abe99acc7.
Expose and document RUBY_TYPED_EMBEDDABLE
[Feature #21853]
Updated by Eregon (Benoit Daloze) 11 days ago
Were the various safety concerns mentioned above (https://bugs.ruby-lang.org/issues/21853#note-5, https://bugs.ruby-lang.org/issues/21853#note-6) even discussed in the meeting?
It seems not according to the log: https://github.com/ruby/dev-meeting-log/blob/master/2026/DevMeeting-2026-03-17.md#feature-21853-make-embedded-typeddata-a-public-api-byroot
So we are giving a new API to users that can easily cause data corruption (due to writing to DATA_PTR), out-of-bounds access (due to compaction) and subtle GC-related bugs (due to lack of RB_GC_GUARD) and probably has no checks currently against any of those, only docs?
Seems a recipe for nasty segfaults to me.
cc @ko1 (Koichi Sasada) @matz (Yukihiro Matsumoto)
At least we should check the parts we can, I'm adding some comments on the PR: https://github.com/ruby/ruby/pull/16455
Updated by larskanis (Lars Kanis) 9 days ago
I found the original PR which added RUBY_TYPED_EMBEDDABLE very readable and expressive: https://github.com/ruby/ruby/pull/7440/changes
It allowed me to understand and use the feature in swig and fxruby without any docs.
With the latest changes in https://github.com/ruby/ruby/pull/16455 , https://github.com/ruby/ruby/pull/16509 , https://github.com/ruby/ruby/pull/16518 I think it is easy and clear how to use it safely.
Updated by Eregon (Benoit Daloze) 8 days ago
· Edited
With the follow-up PRs (linked by Lars above) for improving the docs and adding a (ruby-debug-only since it's RUBY_ASSERT) check for RTYPEDDATA_DATA, I'm OK with this.
It's still a bit dangerous and people using it need to be careful, notably about RB_GC_GUARD and not storing pointers to or into the struct anywhere, but there is no way around that to provide this optimization.