Bug #12298
closedIndeterministic ruby behavior when another thread is killed
Description
#!ruby
require 'securerandom'
class MyThread < ::Thread; end
def delay
  15
end
def run
  loop { work }
rescue Exception => e
  puts "#{Time.now} Exception"
ensure
  puts "#{Time.now} stopping agent"
end
def work
  puts "#{Time.now} start work"
  10_000_000.times { SecureRandom.hex }
  puts "finished work"
rescue StandardError => e
  puts "#{Time.now} Error"
ensure
  puts "#{Time.now} start sleep"
  sleep(delay)
  puts "#{Time.now} finished sleep"
end
t = MyThread.new{ run }
at_exit do
  puts "#{Time.now} killing thread"
  Thread.kill(t)
  puts "#{Time.now} killed thread"
end
sleep(10)
exit
I tried running this script multiple times in ruby 2.1.10, 2.2.4, 2.3.0 and I get inconsistent behavior. Sometimes the main thread does not wait for the second thread.
2016-04-15 11:07:09 +0200 start work
2016-04-15 11:07:19 +0200 killing thread
2016-04-15 11:07:19 +0200 killed thread
2016-04-15 11:07:19 +0200 stopping agent
And sometimes it does.
2016-04-15 11:07:26 +0200 start work
2016-04-15 11:07:36 +0200 killing thread
2016-04-15 11:07:36 +0200 killed thread
2016-04-15 11:07:36 +0200 start sleep
2016-04-15 11:07:51 +0200 finished sleep
2016-04-15 11:07:51 +0200 stopping agent
I appears that under higher CPU usage the 2nd scenario is more likely. In normal conditions the 1st happens more often probably.
I described the whole story in my blogpost
I am not sure which behavior is ruby default (I assume not waiting for other threads) but sometimes apparently ruby does wait for other threads to finish.
Files
        
           Updated by rupert (Robert Pankowecki) over 9 years ago
          Updated by rupert (Robert Pankowecki) over 9 years ago
          
          
        
        
      
      Related Honeybadger bug: https://github.com/honeybadger-io/honeybadger-ruby/issues/186
        
           Updated by drbrain (Eric Hodel) over 9 years ago
          Updated by drbrain (Eric Hodel) over 9 years ago
          
          
        
        
      
      - File a.txt a.txt added
- File b.txt b.txt added
- File 12998.rb 12998.rb added
- Status changed from Open to Rejected
This is not a ruby bug. Thread scheduling is inherently non-deterministic.
Sometimes you'll switch to the work thread before reaching rb_thread_terminate_all which allows the ensure to run, sometimes you won't.
I've attached a reduced test case and logs from ruby built with THREAD_DEBUG=1
Honeybadger should probably not sleep in ensure as a way to reduce thrashing when work finishes (I'm guessing that's what they use the sleep for) and should use a different mechanism instead.
        
           Updated by shevegen (Robert A. Heiler) over 9 years ago
          Updated by shevegen (Robert A. Heiler) over 9 years ago
          
          
        
        
      
      Hmm. Although the report was already rejected, and even if we all may agree that
the honeybadger code was not brilliant, I feel that the overall issue here in
regards to Threads may be useful for more people in the future too.
For instance, without the blog explanation, where else would you find that much
information about ruby code used for "real"? From the official documentation
of Threads?
http://ruby-doc.org/core-2.3.0/Thread.html
The documentation is not bad at all, mind you, but the blog semi-taught me
more than the documentation would.
There may also be small improvements. We have instance methods like:
"See also the instance methods alive? and stop?"
In the code he checked whether the thread was aborting:
Thread.current.status == "aborting" 
This could be simplified if the ruby code would allow
for this check:
Thread.current.aborting?
Or perhaps even
Thread.aborting?
(I do not really know Threads that well that I can suggest an API
that makes sense / is logical.)
Without that block, I would probably have never been able to figure
out that a thread is not just alive or dead but may be in between
the two like the schroedinger cat.
        
           Updated by rupert (Robert Pankowecki) over 9 years ago
          Updated by rupert (Robert Pankowecki) over 9 years ago
          
          
        
        
      
      Eric Hodel wrote:
This is not a ruby bug. Thread scheduling is inherently non-deterministic.
Sometimes you'll switch to the work thread before reaching rb_thread_terminate_all which allows the ensure to run, sometimes you won't.
I get that. But why does Ruby wait for the second thread to finish? Or is it that Ruby always waits for those threads to finish, but the Thread gets un-catchable Exception and only runs the ensure blocks and that's what is happening?
        
           Updated by drbrain (Eric Hodel) over 9 years ago
          Updated by drbrain (Eric Hodel) over 9 years ago
          
          
        
        
      
      In ruby_cleanup:
First we mark the main thread as killed: https://github.com/ruby/ruby/blob/trunk/eval.c#L186
Then we kill all threads: https://github.com/ruby/ruby/blob/trunk/eval.c#L191
Then we start at_exit hooks: (via ruby_finalize_1) https://github.com/ruby/ruby/blob/trunk/eval.c#L222
Sometimes ruby switches to the killed thread right away, so it sleeps in the ensure in your test case. It can't switch back to the main thread because it is marked as killed.
Sometimes ruby doesn't switch to the killed thread so it gets two signals and never sleeps (because the second arrives in the ensure when it tries to sleep).