Feature #21353
Add shape_id to RBasic under 32 bit
Status: closed
Description
Currently, on 64-bit systems, the shape_id of every type is stored inside the RBasic.flags field and is 32 bits long.
However, on 32-bit systems such as i686 and WASM, the situation is much more complicated.
For T_OBJECT, T_CLASS and T_MODULE, the shape_id is stored as part of the "user flags" in FL_USER4-19, and for all other types it is stored alongside the instance variables in the generic_fields_tbl, which means a hash lookup is required to access it.
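To make the T_OBJECT / T_CLASS / T_MODULE case concrete, here is a minimal sketch of packing a 16-bit shape_id into the FL_USER4-19 range of a 32-bit flags word. The sketch_ names are hypothetical and are not CRuby's actual helpers; it assumes FL_USHIFT is 12, which puts FL_USER4 at bit 16 and FL_USER19 at bit 31.

```c
#include <stdint.h>

/* Assumption: FL_USHIFT is 12, so FL_USER4 is bit 16 and FL_USER19 is bit 31. */
#define SKETCH_FL_USHIFT   12
#define SKETCH_SHAPE_SHIFT (SKETCH_FL_USHIFT + 4) /* position of FL_USER4 */
#define SKETCH_SHAPE_MASK  ((uint32_t)0xFFFF << SKETCH_SHAPE_SHIFT)

/* Hypothetical stand-in for a 32-bit VALUE holding RBasic.flags. */
typedef uint32_t sketch_value32;

/* Read the 16-bit shape_id out of the upper half of the flags word. */
static inline uint16_t
sketch_get_shape_id(sketch_value32 flags)
{
    return (uint16_t)((flags & SKETCH_SHAPE_MASK) >> SKETCH_SHAPE_SHIFT);
}

/* Return flags with the shape_id bits replaced; the other bits are untouched. */
static inline sketch_value32
sketch_set_shape_id(sketch_value32 flags, uint16_t shape_id)
{
    return (flags & ~SKETCH_SHAPE_MASK) |
           ((sketch_value32)shape_id << SKETCH_SHAPE_SHIFT);
}
```

For every other type there is no such inline slot on 32-bit, so the equivalent of sketch_get_shape_id has to go through the generic_fields_tbl lookup instead.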
This situation makes a lot of routines noticeably more complicated, with numerous code paths taken only by 32-bit systems, because the code needs a lot of contortions to avoid doing two hash lookups per ivar access.
You can look for SHAPE_IN_BASIC_FLAGS to get an idea of the added complexity.
In addition, it forces us to duplicate some bits of information. For instance, RUBY_FL_FREEZE is redundant with the shape_id. The shape already records that the object is frozen, and the only reason RUBY_FL_FREEZE hasn't been eliminated is that on 32-bit systems the shape_id isn't stored inline for some objects.
Similarly, RUBY_FL_EXIVAR is redundant with the shape_id, because to know whether an object has ivars, you can simply check whether shape_id == 0.
Reclaiming these bits would be very useful for Ractors, as we'd need two bits in objects to be able to implement lightweight locks.
Yet another complication is that on 32-bit systems, the shape_id is only 16 bits long.
I plan to use the upper bits of the shape_id to store metadata, such as the frozen and too_complex status, which would allow testing for these without chasing a pointer: https://github.com/ruby/ruby/pull/13289.
But this currently can't be done on 32-bit systems, both because accessing the shape_id might require a hash lookup, and because it's only 16 bits long, so every single bit used for tagging severely restricts the maximum number of shapes.
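As a rough illustration of that trade-off, the sketch below computes how many distinct shapes remain once some upper bits are reserved for tags, and shows one hypothetical way a 32-bit shape_id could carry frozen and too_complex bits. The bit positions and sketch_ names are assumptions made for this example only, not the layout used in the pull request.

```c
#include <stdint.h>

/* Number of distinct shape ids left when `tag_bits` of a `width`-bit
 * shape_id are reserved for metadata. Requires tag_bits <= width <= 32. */
static inline uint64_t
sketch_max_shapes(unsigned width, unsigned tag_bits)
{
    return (uint64_t)1 << (width - tag_bits);
}

/* Hypothetical layout for a 32-bit shape_id: the two top bits are tags. */
#define SKETCH_ID_FROZEN      ((uint32_t)1 << 31)
#define SKETCH_ID_TOO_COMPLEX ((uint32_t)1 << 30)
#define SKETCH_ID_INDEX_MASK  (SKETCH_ID_TOO_COMPLEX - 1)

/* Test the frozen tag directly on the id, without dereferencing a shape. */
static inline int
sketch_frozen_p(uint32_t shape_id)
{
    return (shape_id & SKETCH_ID_FROZEN) != 0;
}

/* Strip the tag bits to recover the index of the shape itself. */
static inline uint32_t
sketch_shape_index(uint32_t shape_id)
{
    return shape_id & SKETCH_ID_INDEX_MASK;
}
```

With a 16-bit id, sketch_max_shapes(16, 2) leaves only 16384 shapes, whereas sketch_max_shapes(32, 2) still leaves over a billion.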
Proposal
To simplify all this, we propose that on 32-bit systems, we add a VALUE shape_id field to RBasic:
struct RBasic {
    VALUE flags;
    const VALUE klass;
#if RBASIC_SHAPE_ID_FIELD
    /* Only present on 32-bit systems. */
    VALUE shape_id;
#endif
};
This ensures that on 32-bit systems, every object has its shape_id at the same predictable offset, and that it is 32 bits long.
As you can see in the pull request, it simplifies the code quite significantly: https://github.com/ruby/ruby/pull/13341,
and there's more cleanup that can be done.
The obvious downside is that on 32-bit systems, objects would grow from 20B to 24B.
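The size change can be checked with a small layout sketch. It uses uint32_t as a stand-in for a 32-bit VALUE, and it assumes that the smallest object slot is the RBasic header followed by three VALUE-sized words; the struct names are hypothetical and exist only for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for VALUE on a 32-bit system. */
typedef uint32_t sketch_value32;

/* Header as it is today on 32-bit: two words. */
struct sketch_rbasic_old {
    sketch_value32 flags;
    sketch_value32 klass;
};

/* Header with the proposed extra field: three words. */
struct sketch_rbasic_new {
    sketch_value32 flags;
    sketch_value32 klass;
    sketch_value32 shape_id;
};

/* Assumption: the smallest slot is the header plus three embedded words. */
struct sketch_slot_old {
    struct sketch_rbasic_old basic;
    sketch_value32 as[3];
};

struct sketch_slot_new {
    struct sketch_rbasic_new basic;
    sketch_value32 as[3];
};
```

Under these assumptions, sizeof(struct sketch_slot_old) is 20 and sizeof(struct sketch_slot_new) is 24, with shape_id always at offsetof(struct sketch_rbasic_new, shape_id) == 8 regardless of the object's type.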
Pull Request
You can find the proposed patch at: https://github.com/ruby/ruby/pull/13341
cc @tenderlovemaking (Aaron Patterson) and @jhawthorn (John Hawthorn)
Also FYI @katei (Yuta Saito) because, as the maintainer of WASM, I assume this impacts you the most.