Bug #20314
Simultaneous Timeout expirations may raise an exception after the block
Description
Launchable reports `TestTimeout#test_nested_timeout` as a flaky test, and I reproduced it as follows:
```ruby
require "timeout"

class A < Exception
end
class B < Exception
end

begin
  Timeout.timeout(0.1, A) do
    Timeout.timeout(0.1, B) do
      nil while true
    end
  end
rescue A, B
  p $! #=> #<A: execution expired>
  # Exception B is raised after the above call returns
  #=> test.rb:16:in `p': execution expired (B)
  p :end # not reached
end
```
This happens because the timer thread performs two consecutive `Thread#raise` calls on the target thread.
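To show the mechanism without Timeout's timer, here is a minimal sketch that uses only core `Thread#raise`: one thread receives two back-to-back raises. The first exception is rescued as intended, but the second is still pending and is raised in whatever code runs next, which in the repro is the `rescue` clause. The classes `A` and `B` mirror the repro; everything else is illustrative.

```ruby
# Sketch: two back-to-back Thread#raise calls aimed at one thread.
# A and B stand in for the two Timeout exception classes in the repro.
class A < Exception; end
class B < Exception; end

target = Thread.new do
  seen = []
  begin
    begin
      sleep          # parked until the first Thread#raise arrives
    rescue A
      seen << A      # the first exception is handled normally...
      sleep          # ...but B is still pending and gets raised in this clause
    end
  rescue B
    seen << B        # B escapes the `rescue A` clause and is caught here
  end
  seen
end

Thread.pass until target.status == "sleep" # wait until the thread is parked
target.raise(A) # like the outer timer firing...
target.raise(B) # ...immediately followed by the inner one
p target.value
```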
I discussed this with @ko1 (Koichi Sasada), and we came up with three possible solutions.
Solution 1
When multiple nested Timeouts expire simultaneously, raise an exception for the outermost Timeout and let the inner Timeouts expire without raising an exception. In the above example, only A would be raised.
The problem with this approach is that if you are rescuing A in the inner block, the code may never end:
```ruby
Timeout.timeout(0.1, A) do
  Timeout.timeout(0.1, B) do
    begin
      sleep
    rescue A
      sleep # A is rescued, and the inner Timeout has already expired, so this code may never end.
    end
  end
end
```
Note that if A and B did not fire at the same time, B would be raised. Whether this code hangs therefore depends on a race condition.
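As a hypothetical model of the Solution 1 policy (the `ExpiredTimeout` struct and `solution1_exception` method are illustrative names, not part of the timeout library), the choice of which exception to raise can be written as a pure function over the Timeouts that expired in the same tick. The second call also shows the race: if B expires on its own, B is what gets raised.

```ruby
# Hypothetical model of Solution 1, not the timeout library's implementation.
# depth 0 is the outermost Timeout; klass is the exception class it was given.
ExpiredTimeout = Struct.new(:depth, :klass)

class A < Exception; end
class B < Exception; end

# Solution 1: raise only the outermost expired Timeout's exception;
# the inner ones expire silently.
def solution1_exception(expired)
  expired.min_by(&:depth).klass
end

same_tick = [ExpiredTimeout.new(0, A), ExpiredTimeout.new(1, B)]
p solution1_exception(same_tick)                  # => A (B is dropped)
p solution1_exception([ExpiredTimeout.new(1, B)]) # => B (B expired on its own)
```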
Solution 2
When multiple nested Timeouts expire simultaneously, raise an exception for the innermost Timeout and let the outer Timeouts wait until the innermost Timeout returns. In the above example, it would raise either A or B, not both.
The problem with this approach is that if you are rescuing B in the inner block, it never ends:
```ruby
Timeout.timeout(0.1, A) do
  Timeout.timeout(0.1, B) do
    begin
      sleep
    rescue B
      sleep # The outer Timeout waits for the inner Timeout, and the inner Timeout never returns. So this code never ends.
    end
  end
end
```
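A hypothetical model of the Solution 2 policy (the names `ExpiredTimeout` and `solution2_plan` are illustrative, not part of the timeout library): raise the innermost expired Timeout's exception now, and hold the outer ones back until the innermost block has returned. If that block never returns, as with the `rescue B; sleep` example, the deferred entries are never acted on.

```ruby
# Hypothetical model of Solution 2, not the timeout library's implementation.
# depth 0 is the outermost Timeout; klass is the exception class it was given.
ExpiredTimeout = Struct.new(:depth, :klass)

class A < Exception; end
class B < Exception; end

# Returns [exception to raise now, Timeouts deferred until the innermost block returns].
def solution2_plan(expired)
  innermost = expired.max_by(&:depth)
  deferred = expired.reject { |t| t.equal?(innermost) }
  [innermost.klass, deferred]
end

same_tick = [ExpiredTimeout.new(0, A), ExpiredTimeout.new(1, B)]
raise_now, deferred = solution2_plan(same_tick)
p raise_now             # => B
p deferred.map(&:klass) # => [A]
```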
Solution 3
Limit each thread's interrupt queue to a length of one. If `Thread#raise(A)` has already been called on the target thread, a subsequent `Thread#raise(B)` blocks until the target thread processes A.
Since two `Thread#raise` calls can no longer be pending at the same time, no exception would be raised after the end of the block. The timeout timer thread would need to be changed to account for `Thread#raise` possibly blocking.
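A rough model of the blocking behavior Solution 3 describes, assuming a `Thread::SizedQueue` of capacity one can stand in for the target thread's interrupt queue (the VM's real pending-interrupt queue is not exposed this way). A second push cannot proceed until the first entry has been consumed, which is why the timer thread would have to tolerate a blocking `Thread#raise`.

```ruby
# Hypothetical model of Solution 3, not how the Ruby VM works today:
# a Thread::SizedQueue of capacity 1 stands in for the pending-interrupt queue.
class A < Exception; end
class B < Exception; end

interrupts = Thread::SizedQueue.new(1)
interrupts.push(A) # Thread#raise(A) is accepted

blocked = false
begin
  interrupts.push(B, true) # a non-blocking attempt at Thread#raise(B)
rescue ThreadError
  blocked = true # the queue is full, so raise(B) would have to wait
end

# The timer thread is allowed to block until there is room.
timer = Thread.new { interrupts.push(B) }
Thread.pass until timer.status == "sleep"

first = interrupts.pop # the target thread processes A...
timer.join             # ...which lets the blocked raise(B) through
second = interrupts.pop

p blocked, first, second
```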
Updated by Eregon (Benoit Daloze) 10 months ago · Edited
I'm not sure how Solution 3 would work. `Thread#raise` would block until what?
Until the exception started to be raised/thrown on that thread? I don't think that would fix that snippet.
It does not seem reasonable to wait until the exception has been rescued (or escapes the thread), because that could run arbitrary code via `ensure`, which could take a long time (and block the caller of `Thread#raise` for a long time).
I wonder if we should always use `Timeout::ExitException` to "unwind" until we exit the corresponding block given to `Timeout.timeout`.
I'm unsure whether that would help with this issue, though.
Maybe there are other solutions too?
Between Solutions 1 and 2, Solution 2 seems better, because `rescue B; sleep; end` inside `Timeout.timeout(0.1, B)` is clearly broken code.
Raising the innermost Timeout's exception also seems more intuitive (since the thread is executing the innermost Timeout block when it happens).
In any case, `sleep` in `ensure`/`rescue` is broken code, as it can already hang (e.g., with a single `Timeout.timeout(0.1, SomeKlass)` around it, or without using Timeout at all).
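A small sketch of the single-Timeout hang, using `SomeKlass` as a placeholder exception class: once the only Timeout has fired and its exception has been rescued, nothing is left to interrupt the second `sleep`. The sketch runs the code in a worker thread and uses `Thread#join` with a limit so that the script itself does not hang.

```ruby
require "timeout"

# SomeKlass is only a placeholder exception class for this demo.
class SomeKlass < Exception; end

worker = Thread.new do
  Timeout.timeout(0.1, SomeKlass) do
    begin
      sleep
    rescue SomeKlass
      sleep # the only Timeout has already fired, so nothing interrupts this
    end
  end
end

finished = worker.join(1) # Thread#join(limit) returns nil if the thread is still alive
p finished.nil?
worker.kill # clean up the stuck thread
```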
Updated by jeremyevans0 (Jeremy Evans) 10 months ago
Solution 2 makes the most sense to me. If, inside a `Timeout.timeout` block, you swallow the exception class you passed to that `Timeout.timeout` call, that indicates to me that you do not want it handled as a timeout.
Updated by mame (Yusuke Endoh) 10 months ago
Indeed, Solution 3 seems to be wrong. I must have misunderstood what ko1 was saying.
Actually, I don't like either Solution 1 or Solution 2. In the code example, I used exceptions A and B just for clarity; in most real-world cases, both are `Timeout::Error`. Swallowing `Timeout::Error` is normal practice (to make `Timeout.timeout` end gracefully), so I don't think it is "broken".
I think this is a design flaw in Timeout's API, but I don't know what we can do about it. I guess we will just have to go with Solution 2 for now?
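For illustration, a minimal sketch of the practice described above: code inside an outer `Timeout.timeout` block rescues the `Timeout::Error` raised by an inner call (given no exception class) so that the outer block still finishes normally. The durations are arbitrary choices for the sketch.

```ruby
require "timeout"

result = Timeout.timeout(1) do      # outer Timeout, far from expiring
  begin
    Timeout.timeout(0.05) { sleep } # inner Timeout expires
  rescue Timeout::Error
    :inner_gave_up                  # swallowed so the outer block ends gracefully
  end
end

p result # => :inner_gave_up
```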
Updated by mame (Yusuke Endoh) 7 months ago
- Status changed from Open to Closed
Applied in changeset git|2114d0af1e5790da365584a38ea7ee58670dc11b.
Make test_nested_timeouts less flaky
This test randomly fails due to the bug reported in [Bug #20314], where
the two timeouts are too close so that they can expire at the same time.
As a workaround, this change increases the time difference between
timeouts. This will reduce the probability of simultaneous expirations
and lower flakiness.
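A sketch of the idea behind this workaround, not the actual test changed in that commit: when the outer deadline is far from the inner one, the two Timeouts are unlikely to expire in the same timer tick, so only the inner exception is raised. The durations (1 s and 0.05 s) are arbitrary choices for illustration; a wider gap lowers the chance of simultaneous expiry but does not remove the underlying race.

```ruby
require "timeout"

class A < Exception; end
class B < Exception; end

# The outer deadline (1s) is far from the inner one (0.05s), so the two
# Timeouts are very unlikely to expire at the same time.
raised = begin
  Timeout.timeout(1, A) do
    Timeout.timeout(0.05, B) do
      nil while true
    end
  end
rescue A, B => e
  e.class
end

p raised
```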