Bug #15362
Updated by alanwu (Alan Wu) almost 6 years ago
Hello! I have a patch that fixes Bug #14561. It's not a platform-specific issue, but it affects the default build configuration on macOS and is causing segfaults on 2.5.x.

I've put the test for this in a separate patch because I'm not sure we want a 5-second test that only matters for non-default build configs and doesn't catch the problem reliably on Linux. I tested this on both trunk and ruby_2_5, on macOS and on Linux, with various build configs.

Please let me know if anything in my understanding is wrong. I've pasted my commit message below.

----

Fibers save execution contexts, and execution contexts include a native stack pointer. It may happen that a Fiber outlives the native thread it executed on. Consider the following code adapted from Bug #14561:

```ruby
enum = Enumerator.new { |y| y << 1 }

thread = Thread.new { enum.peek } # fiber constructed inside the
                                  # block and saved inside `enum`
thread.join
sleep 5    # thread finishes and thread cache wait time runs out.
           # Native thread exits, possibly freeing its stack.
GC.start   # segfault because GC tries to mark the dangling stack pointer
           # inside `enum`'s fiber
```

The problem is masked by FIBER_USE_COROUTINE and FIBER_USE_NATIVE, as those implementations already do what this commit does. Generally on Linux systems, FIBER_USE_NATIVE is 1 even when one uses `./configure --disable-fiber-coroutine`, since most Linux systems have getcontext() and setcontext(), whose presence turns on FIBER_USE_NATIVE. (Compile with `make DEFS="-DFIBER_USE_NATIVE=0"` to explicitly disable it.)

Furthermore, when both FIBER_USE_COROUTINE and FIBER_USE_NATIVE are off and the GC reads from the stack of a dead native thread, MRI does not segfault on Linux. This is probably because libpthread does not mark the page where the dead stack lives as unreadable. Nevertheless, the use-after-free is visible through Valgrind.

On ruby_2_5, this is an acute problem, since it doesn't have FIBER_USE_COROUTINE. Thread cache is also unavailable on 2.5.x, which triggers this issue more often. (Thread cache gives this bug a grace period, since it makes native threads wait a little before exiting.) The issue is very visible on macOS on 2.5.x, since libpthread there marks the dead stack as unreadable, consistently turning this use-after-free into a segfault.

Fixes Bug #14561

* cont.c: Set saved_ec.machine.stack_end to NULL when switching away from a fiber to keep the GC from marking it. `saved_ec` gets rehydrated with a stack pointer if/when the fiber runs again.
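To make the idea behind the cont.c change easier to follow, here is a toy model in plain Ruby. It is only a sketch: `ToyStack`, `ToyFiber`, `toy_mark`, and `simulate` are hypothetical names invented for illustration and do not correspond to MRI's actual C structures or GC code. It shows a saved stack pointer being cleared on switch-out, so that a later mark step has nothing stale to follow.

```ruby
# Toy model only: these classes are hypothetical and do not mirror MRI internals.

# Stands in for a native thread's machine stack.
class ToyStack
  def initialize
    @freed = false
  end

  # Models the native thread exiting and releasing its stack.
  def free!
    @freed = true
  end

  # Models the GC scanning the stack; reading a freed stack is the bug.
  def read
    raise "use-after-free: stack was released" if @freed
    :ok
  end
end

# Stands in for a fiber and its saved execution context.
class ToyFiber
  attr_reader :saved_stack_end

  def initialize
    @saved_stack_end = nil
  end

  # Running on a thread "rehydrates" the saved stack pointer.
  def switch_in(stack)
    @saved_stack_end = stack
  end

  # Switching away clears the pointer, analogous to setting
  # saved_ec.machine.stack_end to NULL. Pass clear: false to model
  # the behaviour before the patch.
  def switch_out(clear: true)
    @saved_stack_end = nil if clear
  end
end

# Stand-in for the mark step: scan the saved stack only if a pointer is present.
def toy_mark(fiber)
  return :skipped if fiber.saved_stack_end.nil?
  fiber.saved_stack_end.read
end

# Runs the scenario from the report: the fiber runs on a thread, is switched
# away from, the thread's stack goes away, and then the mark step runs.
def simulate(clear:)
  stack = ToyStack.new
  fiber = ToyFiber.new
  fiber.switch_in(stack)
  fiber.switch_out(clear: clear)
  stack.free!
  toy_mark(fiber)
rescue RuntimeError => e
  e.message
end
```

In this model, `simulate(clear: true)` skips the dead stack entirely, while `simulate(clear: false)` reaches the freed stack and reports the use-after-free as an error message rather than crashing.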