Bug #20001
Make Ruby work properly with ASAN enabled
Status: closed
Description
This ticket covers some work I want to do to get the ASAN infrastructure in Ruby working again. I don't know if it ever worked well, but if it did, it appears to have bitrotted. Here are a few of its current problems:
Stack size calculations are wrong
Ruby takes the address of a local variable as part of working out the top/bottom of the native stack in native_thread_init_stack. Because ASAN can end up putting some local variables on a "fake stack", this calculation can produce the wrong result and set th->ec->machine.stack_start incorrectly. This then leads stack_check to think that the machine stack has overflowed all the time, which causes programs like the following to fail:
ASAN_OPTIONS=use_sigaltstack=0:detect_leaks=0 ./miniruby -e 'Thread.new { puts "hi" }.join'
#<Thread:0x00007fb5d79f3f28 -e:1 run> terminated with exception (report_on_exception is true):
SystemStackError
-e: stack level too deep (SystemStackError)
Another consequence of stack size detection not working properly is that the machine stack is not properly marked during GC, so things on the stack which should be considered live get prematurely collected.
ASAN provides the __asan_addr_is_in_fake_stack function, which can be used to get the address of a local variable on the real stack; I think Ruby's various stack-detecting macros could then make use of this to make it work.
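As a rough illustration of that idea (not Ruby's actual code), the sketch below maps the address of a local variable back to the real machine stack. The names demo_real_stack_addr and DEMO_ASAN_ENABLED are hypothetical; only __asan_get_current_fake_stack and __asan_addr_is_in_fake_stack come from ASAN's public <sanitizer/asan_interface.h>. Without ASAN the helper simply returns its argument.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical guard: true when built with -fsanitize=address. */
#if defined(__has_feature)
# if __has_feature(address_sanitizer)
#  define DEMO_ASAN_ENABLED 1
# endif
#endif
#if !defined(DEMO_ASAN_ENABLED) && defined(__SANITIZE_ADDRESS__)
# define DEMO_ASAN_ENABLED 1
#endif

#ifdef DEMO_ASAN_ENABLED
# include <sanitizer/asan_interface.h>
#endif

/* Hypothetical helper: if addr lives in an ASAN fake frame, return the
 * address on the real stack that corresponds to that frame; otherwise
 * return addr unchanged. */
static void *
demo_real_stack_addr(void *addr)
{
    assert(addr != NULL);
#ifdef DEMO_ASAN_ENABLED
    void *fake_stack = __asan_get_current_fake_stack();
    if (fake_stack) {
        void *beg = NULL, *end = NULL;
        void *real = __asan_addr_is_in_fake_stack(fake_stack, addr, &beg, &end);
        if (real) return real; /* addr was inside a fake frame */
    }
#endif
    return addr; /* already on the real stack, or not a stack address */
}
```

A stack-start calculation could then pass the address of its local through a helper like this before comparing it with the stack bounds. Note that the result identifies the real frame, not necessarily the exact slot of the variable.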
VALUEs in fake stacks are not marked
Another consequence of ASAN storing local variables in fake stacks is that we don't see them when doing the machine stack mark. Again, the __asan_addr_is_in_fake_stack function can help us here. ASAN leaves a pointer to the fake stack on the real stack in every frame. When marking the machine stack, we can check each word to see if it's a pointer to a fake stack frame, and then use __asan_addr_is_in_fake_stack to get the extents of the fake frame and scan that too.
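The scan described above could look roughly like the following sketch. It is an assumption-laden illustration, not Ruby's GC code: demo_scan_stack, demo_mark_fn, demo_count_word and the DEMO_* macros are hypothetical names, and the callback stands in for whatever "maybe mark this word" routine the collector uses. The scanner is built without instrumentation so that reading redzones does not itself trigger an ASAN report.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical guard: true when built with -fsanitize=address. */
#if defined(__has_feature)
# if __has_feature(address_sanitizer)
#  define DEMO_ASAN_ENABLED 1
# endif
#endif
#if !defined(DEMO_ASAN_ENABLED) && defined(__SANITIZE_ADDRESS__)
# define DEMO_ASAN_ENABLED 1
#endif

#ifdef DEMO_ASAN_ENABLED
# include <sanitizer/asan_interface.h>
# define DEMO_NO_ASAN __attribute__((no_sanitize_address))
#else
# define DEMO_NO_ASAN
#endif

/* Callback invoked for every candidate word (hypothetical signature). */
typedef void (*demo_mark_fn)(uintptr_t word, void *ctx);

/* Example visitor: counts how many words were visited. */
static void
demo_count_word(uintptr_t word, void *ctx)
{
    (void)word;
    (*(size_t *)ctx)++;
}

/* Conservatively visit every word in [lo, hi). Under ASAN, any word that
 * points into a fake frame causes that whole fake frame to be visited too. */
DEMO_NO_ASAN static void
demo_scan_stack(const uintptr_t *lo, const uintptr_t *hi,
                demo_mark_fn mark, void *ctx)
{
    assert(lo <= hi);
#ifdef DEMO_ASAN_ENABLED
    void *fake_stack = __asan_get_current_fake_stack();
#endif
    for (const uintptr_t *p = lo; p < hi; p++) {
        uintptr_t word = *p;
        mark(word, ctx);
#ifdef DEMO_ASAN_ENABLED
        void *beg = NULL, *end = NULL;
        if (fake_stack &&
            __asan_addr_is_in_fake_stack(fake_stack, (void *)word, &beg, &end)) {
            /* beg/end are the extents of the fake frame; scan it as well. */
            for (const uintptr_t *q = (const uintptr_t *)beg;
                 q < (const uintptr_t *)end; q++) {
                mark(*q, ctx);
            }
        }
#endif
    }
}
```

This sketch only consults the current thread's fake stack; marking other threads' stacks would need each thread's fake stack handle, which is left out here.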
This seems to be how V8 does it, for example.
Doesn't work with GCC
Our ASAN implementation doesn't work with GCC, even though GCC supports ASAN. This is because we use the __has_feature(address_sanitizer) macro in sanitizers.h, which is a clang-ism. The equivalent GCCism is __SANITIZE_ADDRESS__ and we should check that too.
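A check that covers both compilers might look something like this sketch. DEMO_ASAN_ENABLED and demo_asan_enabled are hypothetical names chosen for illustration, not what sanitizers.h uses. The nested #if matters because a compiler that lacks __has_feature cannot parse `__has_feature(...)` inside an #if expression.

```c
#include <assert.h>

/* Clang: __has_feature(address_sanitizer). Only evaluated when the
 * compiler actually provides __has_feature. */
#if defined(__has_feature)
# if __has_feature(address_sanitizer)
#  define DEMO_ASAN_ENABLED 1
# endif
#endif

/* GCC: predefined macro when compiling with -fsanitize=address. */
#if !defined(DEMO_ASAN_ENABLED) && defined(__SANITIZE_ADDRESS__)
# define DEMO_ASAN_ENABLED 1
#endif

#ifndef DEMO_ASAN_ENABLED
# define DEMO_ASAN_ENABLED 0
#endif

static_assert(DEMO_ASAN_ENABLED == 0 || DEMO_ASAN_ENABLED == 1,
              "DEMO_ASAN_ENABLED must be a boolean");

/* Runtime-visible view of the compile-time result. */
static int
demo_asan_enabled(void)
{
    return DEMO_ASAN_ENABLED;
}
```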
Plan of attack
At the moment, I can't even run a full build of Ruby, because miniruby crashes during the build process. My plan of attack here is to:
- Address those known problems I've already identified above
- Get make to actually work with ASAN
- Try running the test suite through ASAN, and fix any issues that turn up
- I'm thinking we should add an --enable-asan or --enable-address-sanitizer or some such to our configure script, to make it easy to build Ruby with ASAN without having to poke around with individual CFLAGS/LDFLAGS
- Eventually, it would be great to actually run the tests under ASAN in CI
This is probably a medium term body of work, but I'll try and tackle it in bits.
Also: @HParker (Adam Hess) and @peterzhu2118 (Peter Zhu) - I know you folks have been working on getting Valgrind to work better with Ruby, for leak detection. I think I see my efforts here as complementary to yours, rather than duplicative. The ASAN infrastructure for poisoning/unpoisoning stuff in the GC already exists and is close to working properly, and it really did help me solve a bug yesterday (https://bugs.ruby-lang.org/issues/19994), so it seems useful and should be made to work. Your work on freeing memory on shutdown (https://bugs.ruby-lang.org/issues/19993) should actually help ASAN usefully detect leaks as well. I think ASAN might be better for eventually running CI checks against the full Ruby test suite, since allegedly it's faster. However, if you think solving these issues with ASAN is a waste of time and Valgrind can catch the same bugs already, please chime in!