Misc #16258
closed
[PATCH] Combine call info and cache to speed up method invocation
Description
Proposed change: https://github.com/ruby/ruby/pull/2564
To perform a regular method call, the VM needs two structs, `rb_call_info`
and `rb_call_cache`. At the moment, we allocate these two structures in
separate buffers. In the worst case, the CPU needs to read 4 cache lines to
complete a method call. Putting the two structures together reduces the
maximum number of cache line reads to 2.
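The cache line arithmetic above can be sketched in C. This is a minimal illustration, not the VM's actual definitions: the struct names `call_info`, `call_cache`, and `call_data`, their fields, and the 64-byte cache line size are all assumptions chosen for the example.

```c
#include <stddef.h>

#define CACHE_LINE 64 /* assumed cache line size in bytes */

/* Hypothetical stand-ins; field names and sizes are illustrative only. */
struct call_info  { unsigned long mid; unsigned int flag; int argc; };
struct call_cache { unsigned long serial[2]; void *me; void *call; };

/* Combined layout: one allocation holds both structs back to back. */
struct call_data { struct call_cache cc; struct call_info ci; };

/* Number of cache lines covered by `size` bytes starting at byte offset `off`. */
static size_t lines_touched(size_t off, size_t size)
{
    return (off + size - 1) / CACHE_LINE - off / CACHE_LINE + 1;
}

/* Worst case over every possible starting offset within a cache line. */
static size_t worst_case_lines(size_t size)
{
    size_t worst = 0;
    for (size_t off = 0; off < CACHE_LINE; off++) {
        size_t n = lines_touched(off, size);
        if (n > worst) worst = n;
    }
    return worst;
}
```

With these illustrative layouts, each separately allocated struct can straddle a cache line boundary, so `worst_case_lines(sizeof(struct call_info)) + worst_case_lines(sizeof(struct call_cache))` comes to 4, while `worst_case_lines(sizeof(struct call_data))` comes to 2 because the combined struct still fits within one cache line's worth of bytes.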
Combining the structures also saves 8 bytes per call site as the current
layout uses separate pointers for the call info and the call cache. This
change saves about 2 MiB on Discourse.
The Optcarrot benchmark receives a performance improvement from this patch. I
collected the following results using `make install` binaries compiled with
`-DRUBY_NDEBUG`, with a sample size of 50 for each category:
| | master-a5245c | after patch | speed-up ratio |
|---|---|---|---|
| plain | 42.39 | 50.17 | 18.35% |
| jit | 71.72 | 72.73 | 1.41% |
These are median FPS from the benchmark output. For raw benchmark results and
basic stats, see
https://gist.github.com/XrXr/ce5cb7cf2c3c4d29e58c919fa5c86b33. I took these
results with an i7-8750H CPU @ 2.20GHz on a 2018 MacBook Pro. I also ran the
benchmark with an AMD 2400G running Arch Linux and observed a 3% improvement
without the jit.
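The speed-up ratio column is consistent with the relative change in FPS between the two builds. A small sketch of that calculation follows; the function name `speedup_percent` is hypothetical, and the formula is inferred from the table's numbers rather than stated in the description.

```c
/* Percentage speed-up of `after` relative to `before`, both in FPS.
 * Assumed formula: (after / before - 1) * 100. */
static double speedup_percent(double before, double after)
{
    return (after / before - 1.0) * 100.0;
}
```

For example, `speedup_percent(42.39, 50.17)` is about 18.35 and `speedup_percent(71.72, 72.73)` is about 1.41, which matches the plain and jit rows.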
Complications
- A new instruction attribute `comptime_sp_inc` is introduced to calculate the
  SP increase at compile time without using call caches. At compile time, a
  `TS_CALLDATA` operand points to a call info struct, but at runtime, the
  same operand points to a call data struct. Instructions that explicitly
  define `sp_inc` also need to define `comptime_sp_inc`.
- MJIT code for copying call cache becomes slightly more complicated.
- This changes the bytecode format, which might break existing tools.
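The first complication can be sketched as follows: the same operand is read as a call info struct at compile time and as a call data struct at runtime, so the SP increase needs one accessor per phase. Everything here is an assumption made for illustration, including the struct layouts, the `CALL_ARGS_BLOCKARG` flag value, the function names, and the stack-effect formula for a send-like instruction. It is not the patch's actual code.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the VM structures; fields are illustrative. */
#define CALL_ARGS_BLOCKARG 0x02u /* assumed flag: a block argument is passed */

struct call_info  { unsigned int flag; int orig_argc; };
struct call_cache { void *method_entry; };
struct call_data  { struct call_cache cc; struct call_info ci; };

/* Stack effect of a send-like instruction: pops the receiver, the arguments,
 * and an optional block argument, then pushes one return value. */
static int sp_inc_of_send(const struct call_info *ci)
{
    const int argb = (ci->flag & CALL_ARGS_BLOCKARG) ? 1 : 0;
    const int recv = 1;
    const int retn = 1;
    return retn - recv - ci->orig_argc - argb;
}

/* Compile time: the operand still points to a bare call info struct. */
static int comptime_sp_inc_send(const void *operand)
{
    return sp_inc_of_send((const struct call_info *)operand);
}

/* Runtime: the same operand now points to the combined call data struct,
 * so the call info has to be reached through it. */
static int runtime_sp_inc_send(const void *operand)
{
    const struct call_data *cd = operand;
    return sp_inc_of_send(&cd->ci);
}
```

Neither accessor reads the call cache, which is the property the compile-time attribute needs, since no cache exists yet when the instruction sequence is being built.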
I think this patch offers a good general performance boost for a manageable amount
of code change.