Bug #17666
closedThread#join hangs when Fiber.set_scheduler is set
Added by arjunmdas (arjun das) over 4 years ago. Updated over 4 years ago.
Description
class MockScheduler
  def block(blocker, timeout = nil)
    byebug
  end
  def close
    byebug
  end
  def fiber(&block)
    byebug
    Fiber.new(blocking: false, &block).tap(&:resume)
  end
  def io_wait(io, events, timeout)
    byebug
  end
  def kernel_sleep(duration=nil)
    byebug
    Fiber.yield
  end
  def process_wait(pid, flags)
    byebug
  end
  def unblock(blocker, fiber)
    byebug
  end
end
  
Fiber.set_scheduler(MockScheduler.new)
t1 = Thread.new do
       p 'before'
       sleep 1
       p 'after'
     end
t1.join
Code hangs at this point.
        
           Updated by arjunmdas (arjun das) over 4 years ago
          
          
        
        
          
            Actions
          
          #1
            [ruby-core:102689]
          Updated by arjunmdas (arjun das) over 4 years ago
          
          
        
        
          
            Actions
          
          #1
            [ruby-core:102689]
        
      
      - Subject changed from Sleep in a thread hangs when Fiber.set_scheduler is set to Thread#join hangs when Fiber.set_scheduler is set
class MockScheduler
  def block(blocker, timeout = nil)
    byebug
  end
  def close
    byebug
  end
  def fiber(&block)
    byebug
    Fiber.new(blocking: false, &block).tap(&:resume)
  end
  def io_wait(io, events, timeout)
    byebug
  end
  def kernel_sleep(duration=nil)
    byebug
    Fiber.yield
  end
  def process_wait(pid, flags)
    byebug
  end
  def unblock(blocker, fiber)
    byebug
  end
end
Fiber.set_scheduler(MockScheduler.new)
t1 = Thread.new do
       p 'before'
       p 'after'
     end
t1.join
Code hangs a this point.
        
           Updated by xtkoba (Tee KOBAYASHI) over 4 years ago
          
          
        
        
          
            Actions
          
          #2
            [ruby-core:102691]
          Updated by xtkoba (Tee KOBAYASHI) over 4 years ago
          
          
        
        
          
            Actions
          
          #2
            [ruby-core:102691]
        
      
      This issue seems already resolved in the latest development version. (I confirmed that it reproduces in 3.0.0p0 release version on x86_64-linux.)
        
           Updated by arjunmdas (arjun das) over 4 years ago
          
          
        
        
          
            Actions
          
          #3
            [ruby-core:102692]
          Updated by arjunmdas (arjun das) over 4 years ago
          
          
        
        
          
            Actions
          
          #3
            [ruby-core:102692]
        
      
      Thanks a lot for confirming.
I will check the latest development version as well. Are you aware of any other tracking bug for this?
        
           Updated by xtkoba (Tee KOBAYASHI) over 4 years ago
          
          
        
        
          
            Actions
          
          #4
            [ruby-core:102693]
          Updated by xtkoba (Tee KOBAYASHI) over 4 years ago
          
          
        
        
          
            Actions
          
          #4
            [ruby-core:102693]
        
      
      This issue seems fixed in 5f69a7f60467fa58c2f998daffab43e118bff36c. I'm not sure if there is a ticket here. There is a related PR in GitHub: https://github.com/ruby/ruby/pull/3945
        
           Updated by arjunmdas (arjun das) over 4 years ago
          
          
        
        
          
            Actions
          
          #5
            [ruby-core:102694]
          Updated by arjunmdas (arjun das) over 4 years ago
          
          
        
        
          
            Actions
          
          #5
            [ruby-core:102694]
        
      
      Thanks again.
Just curious, how did you identify the right PR that fixed this?
        
           Updated by xtkoba (Tee KOBAYASHI) over 4 years ago
          
          
        
        
          
            Actions
          
          #6
            [ruby-core:102696]
          Updated by xtkoba (Tee KOBAYASHI) over 4 years ago
          
          
        
        
          
            Actions
          
          #6
            [ruby-core:102696]
        
      
      user:arjunmdas I just searched the git log with the keyword "fiber". The PR number comes from the commit message.
        
           Updated by jeremyevans0 (Jeremy Evans) over 4 years ago
          
          
        
        
          
            Actions
          
          #7
          Updated by jeremyevans0 (Jeremy Evans) over 4 years ago
          
          
        
        
          
            Actions
          
          #7
        
      
      - Status changed from Open to Closed
        
           Updated by arjunmdas (arjun das) over 4 years ago
          
          
        
        
          
            Actions
          
          #8
            [ruby-core:102734]
          Updated by arjunmdas (arjun das) over 4 years ago
          
          
        
        
          
            Actions
          
          #8
            [ruby-core:102734]
        
      
      xtkoba (Tee KOBAYASHI) - Re-verified in the latest development. It's fixed. Thanks for the update.
.
        
           Updated by ioquatix (Samuel Williams) over 4 years ago
          
          
        
        
          
            Actions
          
          #9
            [ruby-core:104604]
          Updated by ioquatix (Samuel Williams) over 4 years ago
          
          
        
        
          
            Actions
          
          #9
            [ruby-core:104604]
        
      
      I think we should consider isolating this for backport to 3.0.3 if possible.
This three lines:
https://github.com/ruby/ruby/blob/v3_0_2/thread.c#L547-L549
        if (target_thread->scheduler != Qnil) {
            rb_scheduler_unblock(target_thread->scheduler, target_thread->self, rb_fiberptr_self(join_list->fiber));
        } else {
should become:
        if (target_thread->scheduler != Qnil && rb_fiberptr_blocking(join_list->fiber) == 0) {
            rb_scheduler_unblock(target_thread->scheduler, target_thread->self, rb_fiberptr_self(join_list->fiber));
        }
We would need to backport rb_fiberptr_blocking from cont.c too:
unsigned int rb_fiberptr_blocking(struct rb_fiber_struct *fiber)
{
    return fiber->blocking;
}
There are some other usage in thread_sync.c which might be candidate for backport. Check https://github.com/ruby/ruby/blob/master/thread_sync.c for usage of rb_fiberptr_blocking. These are all bugs because it can cause the fiber scheduler to be invoked when it shouldn't causing indefinite hang.
        
           Updated by nagachika (Tomoyuki Chikanaga) over 4 years ago
          
          
        
        
          
            Actions
          
          #10
            [ruby-core:104605]
          Updated by nagachika (Tomoyuki Chikanaga) over 4 years ago
          
          
        
        
          
            Actions
          
          #10
            [ruby-core:104605]
        
      
      - Backport changed from 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: UNKNOWN to 2.7: DONTNEED, 3.0: REQUIRED
I don't have enough time to see details of the request just now. I filled the Backport field to recall myself to this ticket later.
        
           Updated by nagachika (Tomoyuki Chikanaga) over 4 years ago
          
          
        
        
          
            Actions
          
          #11
            [ruby-core:104635]
          Updated by nagachika (Tomoyuki Chikanaga) over 4 years ago
          
          
        
        
          
            Actions
          
          #11
            [ruby-core:104635]
        
      
      Hello Samuel,
I have created a backport patch according to your suggestions.
https://github.com/nagachika/ruby/commit/c2697018d4d8cad7ea80ca6aa57f65d76072053c
Are there anything else should be included?
Would you review my patch please?
        
           Updated by ioquatix (Samuel Williams) over 4 years ago
          
          
        
        
          
            Actions
          
          #12
            [ruby-core:104638]
          Updated by ioquatix (Samuel Williams) over 4 years ago
          
          
        
        
          
            Actions
          
          #12
            [ruby-core:104638]
        
      
      @nagachika (Tomoyuki Chikanaga) thanks so much for your effort here. I have a local test case which can easily fail without this fix, so I'll try your PR to confirm it fixes the issue.
        
           Updated by ioquatix (Samuel Williams) over 4 years ago
          
          
        
        
          
            Actions
          
          #13
            [ruby-core:104639]
          Updated by ioquatix (Samuel Williams) over 4 years ago
          
          
        
        
          
            Actions
          
          #13
            [ruby-core:104639]
        
      
      I have made a PR to your PR.
https://github.com/nagachika/ruby/pull/1
This adds a test case which fails without this backport.
        
           Updated by ioquatix (Samuel Williams) over 4 years ago
          
          
        
        
          
            Actions
          
          #14
            [ruby-core:104640]
          Updated by ioquatix (Samuel Williams) over 4 years ago
          
          
        
        
          
            Actions
          
          #14
            [ruby-core:104640]
        
      
      I also tested it against async main branch which can hang on 3.0.2, but passes on 3.0.2 + your PR. So, I can confirm this fixes the issue that was discussed here.
        
           Updated by nagachika (Tomoyuki Chikanaga) over 4 years ago
          
          
        
        
          
            Actions
          
          #15
            [ruby-core:104660]
          Updated by nagachika (Tomoyuki Chikanaga) over 4 years ago
          
          
        
        
          
            Actions
          
          #15
            [ruby-core:104660]
        
      
      Thank you for your confirmation and adding a test!
I will create a patch for ruby_3_0 branch based on your pull request.
        
           Updated by nagachika (Tomoyuki Chikanaga) over 4 years ago
          
          
        
        
          
            Actions
          
          #16
            [ruby-core:104670]
          Updated by nagachika (Tomoyuki Chikanaga) over 4 years ago
          
          
        
        
          
            Actions
          
          #16
            [ruby-core:104670]
        
      
      - Backport changed from 2.7: DONTNEED, 3.0: REQUIRED to 2.7: DONTNEED, 3.0: DONE
ruby_3_0 95dc88c88869541dd0eccafd14924d78c8d7f427 merged revision(s) 5f69a7f60467fa58c2f998daffab43e118bff36c.