Feature #19554
openInvalid memory access detected by Valgrind when using Fibers
Description
This issue was originally reported here: https://github.com/Shopify/ruby_memcheck/issues/14
Running the script shown below using valgrind --trace-children=yes --num-callers=500 ruby test.rb
outputs a lot of invalid memory access errors. I've shown a few sample errors below.
I am able to reproduce this issue on the master branch (commit 1e9a218ade), 3.2, 3.1, 3.0.
require "bundler/inline"
gemfile do
source "https://rubygems.org"
gem "graphql"
end
module Example
class FooType < GraphQL::Schema::Object
field :id, ID, null: false
end
class FooSource < GraphQL::Dataloader::Source
def fetch(ids)
ids
end
end
class QueryType < GraphQL::Schema::Object
field :foo, Example::FooType do
argument :foo_id, GraphQL::Types::ID, required: false, loads: Example::FooType
end
def foo(foo: nil); end
end
class Schema < GraphQL::Schema
query Example::QueryType
use GraphQL::Dataloader
def self.object_from_id(id, ctx)
ctx.dataloader.with(Example::FooSource).request(id)
end
end
end
Example::Schema.execute(<<-GRAPHQL)
{
foo(fooId: "Other") {
id
}
}
GRAPHQL
==203957== Use of uninitialised value of size 8
==203957== at 0x3453FD: vm_exec_core (vm.inc:4411)
==203957== by 0x357EFB: rb_vm_exec (vm.c:2366)
==203957== by 0x354E44: invoke_block (vm.c:1384)
==203957== by 0x355759: invoke_iseq_block_from_c (vm.c:1440)
==203957== by 0x355759: invoke_block_from_c_proc (vm.c:1538)
==203957== by 0x355759: vm_invoke_proc (vm.c:1568)
==203957== by 0x355DF4: rb_vm_invoke_proc (vm.c:1589)
==203957== by 0x48F695: rb_fiber_start (cont.c:2513)
==203957== by 0x48CCF8: fiber_entry (cont.c:831)
==203957==
==203957== Invalid write of size 8
==203957== at 0x48C407: fiber_pool_stack_reset (cont.c:325)
==203957== by 0x48C4E9: fiber_pool_vacancy_reset (cont.c:364)
==203957== by 0x48CBB0: fiber_pool_stack_release (cont.c:752)
==203957== by 0x48CECF: fiber_stack_release (cont.c:874)
==203957== by 0x48FC9F: fiber_switch (cont.c:2726)
==203957== by 0x4901F9: fiber_resume_kw (cont.c:2906)
==203957== by 0x490235: rb_fiber_resume_kw (cont.c:2912)
==203957== by 0x4903B7: rb_fiber_m_resume (cont.c:2973)
==203957== by 0x3337D6: ractor_safe_call_cfunc_m1 (vm_insnhelper.c:3166)
==203957== by 0x33440A: vm_call_cfunc_with_frame_ (vm_insnhelper.c:3357)
==203957== by 0x3345E1: vm_call_cfunc_with_frame (vm_insnhelper.c:3385)
==203957== by 0x3398E5: vm_sendish (vm_insnhelper.c:5225)
==203957== by 0x341203: vm_exec_core (insns.def:835)
==203957== by 0x357EFB: rb_vm_exec (vm.c:2366)
==203957== by 0x354E44: invoke_block (vm.c:1384)
==203957== by 0x355759: invoke_iseq_block_from_c (vm.c:1440)
==203957== by 0x355759: invoke_block_from_c_proc (vm.c:1538)
==203957== by 0x355759: vm_invoke_proc (vm.c:1568)
==203957== by 0x355DF4: rb_vm_invoke_proc (vm.c:1589)
==203957== by 0x48F695: rb_fiber_start (cont.c:2513)
==203957== by 0x48CCF8: fiber_entry (cont.c:831)
==203957== Address 0x9bad008 is in a rw- anonymous segment
Updated by ioquatix (Samuel Williams) almost 2 years ago
Supporting Valgrind is a good idea. However, I actually prefer ASAN. But both are good tools.
I recently added ASAN support for fibers/coroutines: https://github.com/ioquatix/ruby/commit/901525b1079ac02da0122a76d8e4c3546a7f80f6
I also proposed to add it to the build matrix: https://github.com/ruby/ruby/pull/5939
You will probably find that without Valgrind knowing about the stack swapping, it won't work, you'll need to inform it when the stack is swapped.
I don't really want to maintain two different implementations of "stack swapping instrumentation". I suppose we could abstract it out, however, it's fairly non-trivial/invasive.
Therefore, it might be better to use ASAN over Valgrind to keep the code simpler. Or, if we decided Valgrind was better, we should remove ASAN, for example.
==203957== Invalid write of size 8
==203957== at 0x48C407: fiber_pool_stack_reset (cont.c:325)
==203957== by 0x48C4E9: fiber_pool_vacancy_reset (cont.c:364)
==203957== by 0x48CBB0: fiber_pool_stack_release (cont.c:752)
==203957== by 0x48CECF: fiber_stack_release (cont.c:874)
Regarding this specific write, this isn't an invalid write. It's written into a memory mapped area. I wonder why it's being flagged. "Address 0x9bad008 is in a rw- anonymous segment" is a valid area IMHO. The fiber's stack, if it's unused, is used for free list management. If you can tell me how we should inform Valgrind that this is a valid area to write into, I can adjust the code (similar to poison/unpoison memory).
Updated by peterzhu2118 (Peter Zhu) almost 2 years ago
Thanks for the reply @ioquatix (Samuel Williams)!
I haven't looked too much into ASAN, I'll look into it further. My main concern with ASAN is the need for Ruby to be compiled with it turned on and that additional code may need to be added into native extensions to support it. Both of these are barriers to entry compared to Valgrind.
In the Valgrind manual, it looks like there's a VALGRIND_MAKE_MEM_DEFINED
function. Maybe that can be used to mark the regions as valid.
Updated by ioquatix (Samuel Williams) almost 2 years ago
I haven't looked too much into ASAN, I'll look into it further. My main concern with ASAN is the need for Ruby to be compiled with it turned on and that additional code may need to be added into native extensions to support it. Both of these are barriers to entry compared to Valgrind.
It's unlikely that Valgrind will work with CRuby out of the box without Valgrind specific code e.g. for marking stacks, poisoning "freed" memory, etc. I am okay to help you try it. Just don't underestimate how complicated it can be to get all the right feedback. e.g. ASAN poison is used for GC to mark regions of the memory that should not be accessed. The same is true for fibers. It already all works.
Updated by peterzhu2118 (Peter Zhu) almost 2 years ago
It's unlikely that Valgrind will work with CRuby out of the box without Valgrind specific code e.g. for marking stacks, poisoning "freed" memory, etc.
Valgrind (without the memory leak detection feature, so only detecting invalid memory access) actually works very well with CRuby (excluding Fibers), which is why I opened this ticket.
For most C extensions, they are able to use Valgrind with memory leak detection using ruby_memcheck, which applies heuristics to remove false-positive memory leaks in Ruby.
In fact, issues like #18264 and #18936 were found by Valgrind.
Updated by mdalessio (Mike Dalessio) almost 2 years ago
I'm just here to support Peter's take: Valgrind is a very useful tool for C extension maintainers to find and prevent illegal memory access. It's been an incredibly valuable tool for me over the ~15 years I've been a maintainer, and is a key tool in Nokogiri's CI pipelines.
ASan is also useful! But because it's a build-time option, it's slightly less useful for debugging production builds. If it's not too difficult, I think Ruby should support both tools.
Updated by jeremyevans0 (Jeremy Evans) almost 2 years ago
- Tracker changed from Bug to Feature
- Backport deleted (
2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN)