Bug #15362

Updated by alanwu (Alan Wu) over 3 years ago

 Hello! I have a patch that fixes Bug #14561. It's not a platform-specific issue, but
 it affects the default build configuration on macOS and is causing segfaults on 2.5.x.
 I've put the test for this in a separate patch because I'm not sure we want
 a 5-second test that only matters for non-default build configs and doesn't reliably catch the problem on Linux.
 I tested this on both trunk and ruby_2_5, on macOS and on Linux, with various build configs.

   

 Please let me know if anything in my understanding is wrong. I've pasted my commit message below. 

 ---- 

 Fibers save execution contexts, and execution contexts include a native
 stack pointer. It may happen that a Fiber outlives the native thread
 it executed on. Consider the following code adapted from Bug #14561:

 ```ruby 
 enum = Enumerator.new { |y| y << 1 } 
 thread = Thread.new { enum.peek }    # fiber constructed inside the 
                                    # block and saved inside `enum` 
 thread.join 
 sleep 5        # thread finishes and thread cache wait time runs out. 
              # Native thread exits, possibly freeing its stack. 
 GC.start       # segfault because GC tries to mark the dangling stack pointer
              # inside `enum`'s fiber 

 ``` 
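
For illustration, here is a crash-free sketch of the same lifetime relationship, using a bare Fiber instead of an Enumerator (the variable names are made up for this example). A fiber is created and suspended on a worker thread, the thread exits, and the fiber object is still reachable afterwards. MRI refuses to resume it from another thread, but the GC still has to mark it, which is where the saved stack pointer matters.

```ruby
fiber = nil

thread = Thread.new do
  fiber = Fiber.new do
    Fiber.yield :first   # suspend here; the fiber's context is saved
    :done
  end
  fiber.resume           # runs the fiber on this thread until Fiber.yield
end

first_value  = thread.value    # joins the thread; the block returned :first
thread_alive = thread.alive?   # false: the Ruby thread has finished

# The suspended fiber outlives its thread. Resuming it from the main
# thread is rejected with FiberError, yet the object stays reachable.
resume_error =
  begin
    fiber.resume
    nil
  rescue FiberError => e
    e
  end

puts "first value: #{first_value.inspect}"
puts "thread alive: #{thread_alive}"
puts "resume from main thread raised: #{resume_error.class}"
```

This only shows that the fiber remains a live, GC-visible object after its thread is gone; it does not attempt to trigger the segfault itself.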

 The problem is masked by FIBER_USE_COROUTINE and FIBER_USE_NATIVE, 
 as those implementations already do what this commit does. 
 Generally on Linux systems, FIBER_USE_NATIVE is 1 even when
 one uses `./configure --disable-fiber-coroutine`, since most
 Linux systems have getcontext() and setcontext(), which
 turn on FIBER_USE_NATIVE. (Compile with `make
 DEFS="-DFIBER_USE_NATIVE=0"` to explicitly disable it.)

 Furthermore, when both FIBER_USE_COROUTINE and FIBER_USE_NATIVE 
 are off, and the GC reads from the stack of a dead native 
 thread, MRI does not segfault on Linux. This is probably due to 
 libpthread not marking the page where the dead stack lives as 
 unreadable. Nevertheless, this use-after-free is visible through 
 Valgrind. 

 On ruby_2_5, this is an acute problem, since it doesn't have FIBER_USE_COROUTINE. 
 Thread cache is also unavailable for 2.5.x, triggering this issue 
 more often. (thread cache gives this bug a grace period since 
 it makes native threads wait a little before exiting) 

 This issue is very visible on macOS on 2.5.x since libpthread marks
 the dead stack as unreadable, consistently turning this use-after-free 
 into a segfault. 

 Fixes Bug #14561 

  * cont.c: Set saved_ec.machine.stack_end to NULL when switching away from a
            fiber to keep the GC from marking it. `saved_ec` gets rehydrated with a
            stack pointer if/when the fiber runs again.
