Bug #18886
closedStruct aref and aset don't trigger any tracepoints.
Description
Given the following program, thing.name
and thing.shape
don't trigger c_call
trace points (or any trace points actually).
pp RUBY_VERSION
trace_point = TracePoint.new(:line, :call, :c_call, :a_call) do |trace|
puts trace.event
if trace.event == :call
# Ruby doesn't always mark call-sites in sub-expressions, so we use this approach to compute a call site and mark it:
if location = caller_locations(2, 1).first and path = location.path
puts "> #{path}:#{location.lineno}:#{trace.event}"
end
end
if path = trace.path
puts "= #{path}:#{trace.lineno}:#{trace.event}"
end
end
trace_point.enable
# This will trigger call trace points
class Thing
def name
:cat
end
def shape
:square
end
end
thing = Thing.new
# Thing = Struct.new(:name, :shape)
# thing = Thing.new(:cat, :rectangle)
[
name: thing.name,
shape: thing.shape,
]
Current HEAD¶
= ../test.rb:30:line:
= ../test.rb:30:c_call:new
= ../test.rb:30:c_call:inherited
= ../test.rb:30:c_call:singleton_method_added
= ../test.rb:30:c_call:singleton_method_added
= ../test.rb:30:c_call:singleton_method_added
= ../test.rb:30:c_call:singleton_method_added
= ../test.rb:30:c_call:singleton_method_added
= ../test.rb:30:c_call:method_added
= ../test.rb:30:c_call:method_added
= ../test.rb:30:c_call:method_added
= ../test.rb:30:c_call:method_added
= ../test.rb:30:c_call:const_added
= ../test.rb:31:line:
= ../test.rb:31:c_call:new
= ../test.rb:31:c_call:initialize
= ../test.rb:34:line:
Proposed PR¶
= ../test.rb:30:line:
= ../test.rb:30:c_call:new
= ../test.rb:30:c_call:inherited
= ../test.rb:30:c_call:singleton_method_added
= ../test.rb:30:c_call:singleton_method_added
= ../test.rb:30:c_call:singleton_method_added
= ../test.rb:30:c_call:singleton_method_added
= ../test.rb:30:c_call:singleton_method_added
= ../test.rb:30:c_call:method_added
= ../test.rb:30:c_call:method_added
= ../test.rb:30:c_call:method_added
= ../test.rb:30:c_call:method_added
= ../test.rb:30:c_call:const_added
= ../test.rb:31:line:
= ../test.rb:31:c_call:new
= ../test.rb:31:c_call:initialize
= ../test.rb:34:line:
= ../test.rb:34:c_call:name
= ../test.rb:35:c_call:shape
The reason is the internal implementation of struct doesn't have trace point instrumentation in vm_call_opt_struct_aset
or vm_call_opt_struct_aref
.
Proposed fix: https://github.com/ruby/ruby/pull/6071 but this would need a review, maybe @jeremyevans0 (Jeremy Evans) and @ko1 (Koichi Sasada) can help.
Updated by ioquatix (Samuel Williams) over 2 years ago
- Description updated (diff)
Updated by ioquatix (Samuel Williams) over 2 years ago
Here is my performance comparison:
Firstly, with no changes (should be identical, shows some variance).
> make benchmark ITEM=vm_struct COMPARE_RUBY=(which ruby)
/home/samuel/.rubies/ruby-head/bin/ruby --disable=gems -rrubygems -I../benchmark/lib ../benchmark/benchmark-driver/exe/benchmark-driver \
--executables="compare-ruby::/home/samuel/.rubies/ruby-head/bin/ruby -I.ext/common --disable-gem" \
--executables="built-ruby::./miniruby -I../lib -I. -I.ext/common ../tool/runruby.rb --extout=.ext -- --disable-gems --disable-gem" \
--output=markdown --output-compare -v $(find ../benchmark -maxdepth 1 -name 'vm_struct' -o -name '*vm_struct*.yml' -o -name '*vm_struct*.rb' | sort)
compare-ruby: ruby 3.2.0dev (2022-06-29T00:53:14Z master 3051d4bc2a) [x86_64-linux]
built-ruby: ruby 3.2.0dev (2022-06-29T02:51:15Z struct-tracepoints 12e97af1d2) [x86_64-linux]
# Iteration per second (i/s)
| |compare-ruby|built-ruby|
|:----------------------|-----------:|---------:|
|vm_struct_big_aref_hi | 149.515M| 147.470M|
| | 1.01x| -|
|vm_struct_big_aref_lo | 148.916M| 147.628M|
| | 1.01x| -|
|vm_struct_big_aset | 7.006| 7.080|
| | -| 1.01x|
|vm_struct_big_href_hi | 27.652M| 27.382M|
| | 1.01x| -|
|vm_struct_big_href_lo | 27.547M| 27.725M|
| | -| 1.01x|
|vm_struct_big_hset | 3.049| 3.094|
| | -| 1.01x|
|vm_struct_small_aref | 144.031M| 136.167M|
| | 1.06x| -|
|vm_struct_small_aset | 7.032| 7.063|
| | -| 1.00x|
|vm_struct_small_href | 30.694M| 30.432M|
| | 1.01x| -|
|vm_struct_small_hset | 28.590M| 28.083M|
| | 1.02x| -|
Now with my PR:
> make benchmark ITEM=vm_struct COMPARE_RUBY=(which ruby)
/home/samuel/.rubies/ruby-head/bin/ruby --disable=gems -rrubygems -I../benchmark/lib ../benchmark/benchmark-driver/exe/benchmark-driver \
--executables="compare-ruby::/home/samuel/.rubies/ruby-head/bin/ruby -I.ext/common --disable-gem" \
--executables="built-ruby::./miniruby -I../lib -I. -I.ext/common ../tool/runruby.rb --extout=.ext -- --disable-gems --disable-gem" \
--output=markdown --output-compare -v $(find ../benchmark -maxdepth 1 -name 'vm_struct' -o -name '*vm_struct*.yml' -o -name '*vm_struct*.rb' | sort)
compare-ruby: ruby 3.2.0dev (2022-06-29T00:53:14Z master 3051d4bc2a) [x86_64-linux]
built-ruby: ruby 3.2.0dev (2022-06-29T02:51:15Z struct-tracepoints 12e97af1d2) [x86_64-linux]
# Iteration per second (i/s)
| |compare-ruby|built-ruby|
|:----------------------|-----------:|---------:|
|vm_struct_big_aref_hi | 147.665M| 115.510M|
| | 1.28x| -|
|vm_struct_big_aref_lo | 146.666M| 116.298M|
| | 1.26x| -|
|vm_struct_big_aset | 7.098| 6.801|
| | 1.04x| -|
|vm_struct_big_href_hi | 27.608M| 27.957M|
| | -| 1.01x|
|vm_struct_big_href_lo | 27.521M| 27.937M|
| | -| 1.02x|
|vm_struct_big_hset | 3.092| 3.113|
| | -| 1.01x|
|vm_struct_small_aref | 147.494M| 113.729M|
| | 1.30x| -|
|vm_struct_small_aset | 6.898| 6.840|
| | 1.01x| -|
|vm_struct_small_href | 31.004M| 31.547M|
| | -| 1.02x|
|vm_struct_small_hset | 28.433M| 28.527M|
| | -| 1.00x|
With changes suggested by @ko1 (Koichi Sasada):
> make benchmark ITEM=vm_struct COMPARE_RUBY=(which ruby)
/home/samuel/.rubies/ruby-head/bin/ruby --disable=gems -rrubygems -I../benchmark/lib ../benchmark/benchmark-driver/exe/benchmark-driver \
--executables="compare-ruby::/home/samuel/.rubies/ruby-head/bin/ruby -I.ext/common --disable-gem" \
--executables="built-ruby::./miniruby -I../lib -I. -I.ext/common ../tool/runruby.rb --extout=.ext -- --disable-gems --disable-gem" \
--output=markdown --output-compare -v $(find ../benchmark -maxdepth 1 -name 'vm_struct' -o -name '*vm_struct*.yml' -o -name '*vm_struct*.rb' | sort)
compare-ruby: ruby 3.2.0dev (2022-06-29T00:53:14Z master 3051d4bc2a) [x86_64-linux]
built-ruby: ruby 3.2.0dev (2022-06-29T02:51:15Z struct-tracepoints 12e97af1d2) [x86_64-linux]
# Iteration per second (i/s)
| |compare-ruby|built-ruby|
|:----------------------|-----------:|---------:|
|vm_struct_big_aref_hi | 146.836M| 115.141M|
| | 1.28x| -|
|vm_struct_big_aref_lo | 147.929M| 115.918M|
| | 1.28x| -|
|vm_struct_big_aset | 7.079| 6.802|
| | 1.04x| -|
|vm_struct_big_href_hi | 27.495M| 27.933M|
| | -| 1.02x|
|vm_struct_big_href_lo | 27.396M| 27.584M|
| | -| 1.01x|
|vm_struct_big_hset | 3.128| 3.095|
| | 1.01x| -|
|vm_struct_small_aref | 142.123M| 113.700M|
| | 1.25x| -|
|vm_struct_small_aset | 6.995| 6.755|
| | 1.04x| -|
|vm_struct_small_href | 30.561M| 31.424M|
| | -| 1.03x|
|vm_struct_small_hset | 28.353M| 28.650M|
| | -| 1.01x|
It seems like 25-30% performance impact, but this is still a very fast operation. Is there any way we can make this faster? Could we have tracing and non-tracing op-codes which call different functions? e.g. vm_call_opt_struct_aref
and vm_call_opt_struct_aref_trace
for example.
Updated by ioquatix (Samuel Williams) over 2 years ago
With some minor changes the best performance I could get is around 20% cost:
| |compare-ruby|built-ruby|
|:----------------------|-----------:|---------:|
|vm_struct_big_aref_hi | 149.051M| 121.849M|
| | 1.22x| -|
|vm_struct_big_aref_lo | 147.353M| 121.167M|
| | 1.22x| -|
|vm_struct_big_aset | 7.107| 6.469|
| | 1.10x| -|
|vm_struct_big_href_hi | 27.081M| 28.269M|
| | -| 1.04x|
|vm_struct_big_href_lo | 27.642M| 27.630M|
| | 1.00x| -|
|vm_struct_big_hset | 3.106| 3.150|
| | -| 1.01x|
|vm_struct_small_aref | 139.761M| 113.250M|
| | 1.23x| -|
|vm_struct_small_aset | 7.086| 6.775|
| | 1.05x| -|
|vm_struct_small_href | 30.977M| 31.479M|
| | -| 1.02x|
|vm_struct_small_hset | 28.565M| 28.554M|
| | 1.00x| -|
Updated by Eregon (Benoit Daloze) over 2 years ago
- Related to Bug #16383: TracePoint does not report calls to attribute reader methods added
Updated by ioquatix (Samuel Williams) over 2 years ago
Rather than using branch for tracepoints, using a nop/jmp
instruction which is set or cleared might be a lower impact option.
Updated by jeremyevans0 (Jeremy Evans) about 1 year ago
I submitted a pull request to fix this: https://github.com/ruby/ruby/pull/8782
Updated by mame (Yusuke Endoh) about 1 year ago
@jeremyevans0 (Jeremy Evans) Could you measure the overhead?
Updated by jeremyevans0 (Jeremy Evans) about 1 year ago
mame (Yusuke Endoh) wrote in #note-7:
@jeremyevans0 (Jeremy Evans) Could you measure the overhead?
Using the benchmark in the pull request, the pull request seems reliably ~5% faster in my testing for the member reader method, but mixed for other cases (sometimes up to 3% faster/slower depending on run). I don't have any explanation as to why it would be faster, so I'm going to assume that the difference is still within the margin of error.
I don't think this will affect performance unless TracePoint is used, because the same method caching is done in both cases.
Updated by ko1 (Koichi Sasada) about 1 year ago
Please merge it.
I don't think this will affect performance unless TracePoint is used, because the same method caching is done in both cases.
Showing benchmarking result will help IMO to confirm there is no penalty.
Updated by jeremyevans (Jeremy Evans) about 1 year ago
- Status changed from Open to Closed
Applied in changeset git|3081c83169c55ef7eead6222e49248e09232c22c.
Support tracing of struct member accessor methods
This follows the same approach used for attr_reader/attr_writer in
2d98593bf54a37397c6e4886ccc7e3654c2eaf85, skipping the checking for
tracing after the first call using the call cache, and clearing the
call cache when tracing is turned on/off.
Fixes [Bug #18886]