Bug #20314

Simultaneous Timeout expires may raise an exception after the block

Added by mame (Yusuke Endoh) 8 months ago. Updated 16 days ago.

Status: Open
Assignee: -
Target version: -
[ruby-core:117003]

Description

Launchable reports TestTimeout#test_nested_timeout as a flaky test, and I reproduced it as follows.

require "timeout"

class A < Exception
end

class B < Exception
end

begin
  Timeout.timeout(0.1, A) do
    Timeout.timeout(0.1, B) do
      nil while true
    end
  end
rescue A, B
  p $! #=> #<A: execution expired>

  # Exception B is raised after the above call returns
  #=> test.rb:16:in `p': execution expired (B)

  p :end # not reached
end

This happens because the timer thread calls Thread#raise on the target thread twice in a row, once for each expired Timeout.
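The mechanism can be reproduced without Timeout. The following is a minimal sketch, not Timeout's own code: it uses Thread.handle_interrupt to defer asynchronous exceptions only so that both Thread#raise calls are guaranteed to be queued before either is delivered. The method name double_raise_demo is made up for illustration.

```ruby
class A < Exception; end
class B < Exception; end

# Hypothetical demo: returns the classes of the exceptions the worker saw, in order.
def double_raise_demo
  ready = Queue.new
  go = Queue.new
  worker = Thread.new do
    seen = []
    begin
      begin
        # Defer asynchronous exceptions until both raises have been queued.
        # The real timer thread gives no such guarantee; this only makes the
        # ordering reproducible.
        Thread.handle_interrupt(Exception => :never) do
          ready << true
          sleep 0.01 while go.empty?
        end
      rescue A, B => e
        seen << e.class
        sleep 0.1 # B is still pending and is delivered here, outside this rescue
      end
    rescue A, B => e
      seen << e.class # the second exception lands here
    end
    seen
  end

  ready.pop
  worker.raise(A) # both calls are queued while the worker defers interrupts
  worker.raise(B)
  go << true
  worker.value
end
```

The last element of the returned array is always B: in this sketch it is delivered at the next interrupt check (at the latest when the rescue body blocks in sleep), which lies outside the rescue clause that handled A. Depending on exactly when that check happens, the result is [A, B] or just [B].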

I discussed this with @ko1 (Koichi Sasada), and we came up with three possible solutions.

Solution 1

When multiple nested Timeouts expire simultaneously, raise an exception for the outer-most Timeout and let the inner Timeouts expire without throwing an exception. In the above example, it would only raise A.

The problem with this approach is that if you are rescuing A in the inner block, it may never end:

Timeout.timeout(0.1, A) do
  Timeout.timeout(0.1, B) do
    begin
      sleep
    rescue A
      sleep # A is caught here. The inner Timeout has already expired, so this code may never end.
    end
  end
end

Note that if A and B did not expire at the same time, B would be raised instead. This is a race condition.
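The selection rule of Solution 1 can be modeled as a pure function. This is a hypothetical model, not Ruby's implementation: Frame and solution1_exception are invented names, and each active Timeout.timeout call is reduced to an absolute deadline plus the exception class passed to it.

```ruby
class A < Exception; end
class B < Exception; end

# Hypothetical model: frames[0] is the outermost Timeout.timeout call,
# frames[-1] the innermost.
Frame = Struct.new(:deadline, :exception_class)

# Solution 1: among the frames that have expired by `now`, raise only for the
# outermost one; the inner expired frames end without raising.
def solution1_exception(frames, now)
  frames.find { |f| f.deadline <= now }&.exception_class
end

simultaneous = [Frame.new(0.1, A), Frame.new(0.1, B)]
p solution1_exception(simultaneous, 0.1)   # prints A

# If B's deadline is slightly earlier and the check happens in between,
# only B has expired, so B is raised instead.
staggered = [Frame.new(0.1, A), Frame.new(0.09, B)]
p solution1_exception(staggered, 0.095)    # prints B
```

The two calls differ only in when the check runs relative to the deadlines, which is the race described above.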

Solution 2

When multiple nested Timeouts expire simultaneously, raise an exception for the inner-most Timeout and let the outer Timeouts wait until the inner-most Timeout returns. In the above example, it would raise either A or B, not both.

The problem with this approach is that if you are rescuing B in the inner block, it never ends:

Timeout.timeout(0.1, A) do
  Timeout.timeout(0.1, B) do
    begin
      sleep
    rescue B
      sleep # The outer Timeout waits for the inner one, but the inner Timeout never returns, so this code never ends.
    end
  end
end
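Solution 2's rule can be modeled the same way. Again this is a hypothetical sketch with invented names (Frame, solution2_plan), not Ruby's implementation: it returns the exception to raise now and the outer exceptions that are held back until the innermost block returns.

```ruby
class A < Exception; end
class B < Exception; end

# Hypothetical model: frames[0] is the outermost Timeout.timeout call,
# frames[-1] the innermost.
Frame = Struct.new(:deadline, :exception_class)

# Solution 2: raise only for the innermost expired frame. Outer frames that
# have also expired are held back until the innermost block returns.
def solution2_plan(frames, now)
  expired = frames.select { |f| f.deadline <= now }
  return [nil, []] if expired.empty?
  [expired.last.exception_class, expired[0...-1].map(&:exception_class)]
end

raise_now, held_back = solution2_plan([Frame.new(0.1, A), Frame.new(0.1, B)], 0.1)
p raise_now   # prints B
p held_back   # prints [A]
```

In the first example, B propagates out of both blocks, so the held-back A is never raised. In the rescue B; sleep example, the inner block never returns, so A stays held back forever, which is the hang described above.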

Solution 3

Limit the thread interrupt queue to a length of one. If Thread#raise(A) has already been called on the target thread, a subsequent Thread#raise(B) blocks until the target thread has processed A.

Since Thread#raise calls can then no longer pile up on the target thread, no exception will be raised after the block has ended. The Timeout timer thread would need to be changed to take into account that Thread#raise may block.
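The proposed queueing rule resembles a SizedQueue of capacity one, whose push blocks while the queue is full. The sketch below models only that rule with a real SizedQueue; it is not how the VM stores pending interrupts, and it says nothing about whether the rule actually fixes the bug (which the discussion below questions).

```ruby
class A < Exception; end
class B < Exception; end

# Hypothetical model of a one-slot interrupt queue for a single thread.
interrupts = SizedQueue.new(1)
interrupts.push(A)                 # the first "Thread#raise" is accepted

second_would_block =
  begin
    interrupts.push(B, true)       # non-blocking push raises ThreadError when full
    false
  rescue ThreadError
    true                           # a blocking push would wait here
  end

processed = interrupts.pop         # the target thread handles A...
interrupts.push(B, true)           # ...and only now is there room for B
p [second_would_block, processed]  # prints [true, A]
```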

Updated by Eregon (Benoit Daloze) 8 months ago · Edited

I'm not sure how Solution 3 would work.
Thread#raise would block until what?
Until the exception starts being raised on that thread? I don't think that would fix that snippet.
It does not seem reasonable to wait until the exception has been rescued (or escapes the thread), because that could run arbitrary code via ensure, which could take a long time and block the caller of Thread#raise for that whole time.

I wonder if we should always use Timeout::ExitException to "unwind" until we exit the corresponding block given to Timeout.timeout.
I'm not sure whether that would help with this issue, though.
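One way to read the "unwind until the corresponding block" idea: give each timeout scope its own exception instance and let only the scope that created it stop the unwinding. The sketch below is hypothetical (UnwindSignal and scoped are invented names, and no timer thread is involved); the outer timeout firing is simulated with an explicit raise.

```ruby
# Hypothetical sketch: one exception instance per scope, matched by object
# identity rather than by class.
class UnwindSignal < Exception; end

def scoped(name)
  signal = UnwindSignal.new(name.to_s)
  yield signal
rescue UnwindSignal => e
  raise unless e.equal?(signal) # raised for an enclosing scope: keep unwinding
  "#{name} expired"
end

result = scoped(:outer) do |outer|
  scoped(:inner) do |_inner|
    raise outer # simulate the outer timeout firing inside the inner block
  end
  "not reached"
end
p result # prints "outer expired"
```

Because the inner scope re-raises a signal it does not own, the outer signal reaches the outer block instead of being mistaken for the inner timeout.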

Maybe there are other solutions too?

Between Solutions 1 and 2, Solution 2 seems better, because rescue B; sleep; end inside Timeout.timeout(0.1, B) is clearly broken code.
And raising the innermost Timeout's exception seems more intuitive too (since that happens while executing the innermost Timeout block).
In any case, sleeping in ensure/rescue is already broken code, since it can hang (e.g. with a single Timeout.timeout(0.1, SomeKlass) around it, or without using Timeout at all).
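The single-Timeout hang can be observed directly. In this sketch, SomeKlass and rescue_then_sleep_hangs? are illustrative names; the pattern runs in a separate thread so the caller can detect the hang with a bounded Thread#join and then kill the thread.

```ruby
require "timeout"

class SomeKlass < StandardError; end

# Returns true if the rescue-then-sleep pattern is still running after `wait` seconds.
def rescue_then_sleep_hangs?(wait = 0.5)
  t = Thread.new do
    Timeout.timeout(0.1, SomeKlass) do
      begin
        sleep
      rescue SomeKlass
        sleep # the only Timeout has already fired, so nothing wakes this up
      end
    end
  end
  finished = t.join(wait) # nil if the thread is still running
  t.kill
  t.join
  finished.nil?
end
```

No nesting is involved: once the single timeout's exception is rescued, nothing is left to interrupt the second sleep.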

Updated by jeremyevans0 (Jeremy Evans) 8 months ago

Solution 2 makes the most sense to me. If, inside a Timeout.timeout block, you swallow the exception class that you passed to Timeout.timeout, that indicates to me that you do not want it handled as a timeout.

Updated by mame (Yusuke Endoh) 8 months ago

Indeed, Solution 3 seems to be wrong. I must have misunderstood what ko1 was saying.

Actually, I like neither Solution 1 nor Solution 2. In the code examples, I used exceptions A and B just for clarity; in most real-world cases, both are Timeout::Error. Swallowing Timeout::Error is common practice (to make Timeout.timeout end gracefully), so I don't think it is "broken".

I think this is a design flaw in Timeout's API, but I don't know what we can do about it. I guess we will just have to go with Solution 2 for now?
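A minimal example of the practice mame describes: rescuing Timeout::Error so that a slow operation ends with a fallback value instead of an exception. The method name with_fallback and the return values are illustrative; sleep stands in for real work.

```ruby
require "timeout"

# Give up after `seconds` and fall back gracefully.
def with_fallback(seconds)
  Timeout.timeout(seconds) do
    sleep 0.3 # stands in for slow work
    :fresh
  end
rescue Timeout::Error
  :fallback # the timeout is swallowed on purpose
end

p with_fallback(0.05) # prints :fallback
p with_fallback(2)    # prints :fresh
```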

Updated by mame (Yusuke Endoh) 5 months ago

  • Status changed from Open to Closed

Applied in changeset git|2114d0af1e5790da365584a38ea7ee58670dc11b.


Make test_nested_timeouts less flaky

This test randomly fails due to the bug reported in [Bug #20314], where
the two timeouts are so close together that they can expire at the same time.

As a workaround, this change increases the time difference between
timeouts. This will reduce the probability of simultaneous expirations
and lower flakiness.
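The workaround amounts to keeping the two deadlines far apart. The sketch below applies that idea to the reproduction from the description; the values 10 and 0.1 are illustrative and are not the ones used in the actual changeset. With the inner deadline well before the outer one, only B is raised, and the outer timer is cancelled when the block exits.

```ruby
require "timeout"

class A < Exception; end
class B < Exception; end

def nested_with_gap
  Timeout.timeout(10, A) do     # outer deadline far in the future
    Timeout.timeout(0.1, B) do  # inner deadline expires first
      nil while true
    end
  end
rescue A, B => e
  e.class
end

p nested_with_gap # prints B
```

This lowers the chance of simultaneous expiry but does not remove the underlying race.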

Updated by mame (Yusuke Endoh) 5 months ago

  • Status changed from Closed to Open