Bug #19473
opencan't be called from trap context (ThreadError) is too limiting
Description
Simple reproducer:
$ ruby -ve 'm=Mutex.new; trap(:HUP) { m.synchronize { p :OK } }; Process.kill :HUP, Process.pid; sleep 0.1'
ruby 3.2.1 (2023-02-08 revision 31819e82c8) [x86_64-linux]
-e:1:in `synchronize': can't be called from trap context (ThreadError)
from -e:1:in `block in <main>'
from -e:1:in `kill'
from -e:1:in `<main>'
Expected behavior:
$ ruby -ve 'm=Mutex.new; trap(:HUP) { m.synchronize { p :OK } }; Process.kill :HUP, Process.pid; sleep 0.1'
truffleruby 22.3.1, like ruby 3.0.3, GraalVM CE Native [x86_64-linux]
:OK
$ ruby -ve 'm=Mutex.new; trap(:HUP) { m.synchronize { p :OK } }; Process.kill :HUP, Process.pid; sleep 0.1'
jruby 9.4.0.0 (3.1.0) 2022-11-23 95c0ec159f OpenJDK 64-Bit Server VM 17.0.6+10 on 17.0.6+10 +jit [x86_64-linux]
:OK
This exception is highly problematic, for instance it breaks Timeout.timeout
in trap
:
https://github.com/ruby/timeout/issues/17#issuecomment-1142035939
I suppose this behavior is because sometimes it's problematic to lock a Mutex in trap, e.g., if it's already locked by the main thread/fiber.
But that would otherwise already raise deadlock; recursive locking (ThreadError)
, so there is no point to fail early.
And that's just one case, not all, so we should not always raise an exception.
There seems to be no valid reason to prevent all Mutex#synchronize
in trap
.
After all, if the Mutex for instance is only used in trap
, it's well-defined AFAIK.
For instance a given trap handler does not seem executed concurrently:
$ ruby -ve 'trap(:HUP) { puts "in trap\n"+caller.join("\n")+"\n\n"; sleep 0.1 }; pid = Process.pid; Process.wait fork { 20.times { Process.kill :HUP, pid } }; sleep 1'
ruby 3.2.1 (2023-02-08 revision 31819e82c8) [x86_64-linux]
in trap
-e:1:in `wait'
-e:1:in `<main>'
in trap
-e:1:in `wait'
-e:1:in `<main>'
in trap
-e:1:in `wait'
-e:1:in `<main>'
in trap
-e:1:in `wait'
-e:1:in `<main>'
in trap
-e:1:in `wait'
-e:1:in `<main>'
in trap
-e:1:in `wait'
-e:1:in `<main>'
And if the trap handler using the Mutex is never called while the Mutex is held by the main thread/fiber, there is also no problem.
Updated by Eregon (Benoit Daloze) over 1 year ago
Also Monitor
has the same problem on CRuby:
$ ruby -ve 'm=Monitor.new; trap(:HUP) { m.synchronize { p :OK } }; Process.kill :HUP, Process.pid; sleep 0.1'
ruby 3.2.1 (2023-02-08 revision 31819e82c8) [x86_64-linux]
-e:1:in `synchronize': can't be called from trap context (ThreadError)
from -e:1:in `block in <main>'
from -e:1:in `kill'
from -e:1:in `<main>'
And again it works fine on truffleruby and jruby
And since Monitor
is reentrant there is no reason at all (AFAIK) to prevent it in trap
.
Updated by Eregon (Benoit Daloze) over 1 year ago
- ruby -v set to ruby 3.2.1 (2023-02-08 revision 31819e82c8) [x86_64-linux]
Updated by Eregon (Benoit Daloze) over 1 year ago
Another approach would be to do it like e.g. Java which runs signal handlers on a separate thread.
Then there is no potential problem of the signal handler being reentrant to the main thread/fiber.
But that is a much bigger semantic change of course, so I think just removing the can't be called from trap context (ThreadError)
is much more actionable.
Updated by ioquatix (Samuel Williams) over 1 year ago
I was under the impression that very few operations are safe to call from a signal handler. Is that not the case here?
Updated by Eregon (Benoit Daloze) over 1 year ago
ioquatix (Samuel Williams) wrote in #note-5:
I was under the impression that very few operations are safe to call from a signal handler. Is that not the case here?
I think you are thinking about C signal handlers and C async-safety. This is not relevant here, the Ruby signal handler doesn't run inside the C signal handler.
Updated by ko1 (Koichi Sasada) over 1 year ago
The current limitation is introduced to protect user from dead-lock error like that:
m = Mutex.new
trap(:INT){m.synchronizaton{ do_something } }
m.lock; sleep
# C-c here
will cause deadlock and it is hard to predict and coding for it. ( https://github.com/ruby/timeout/issues/17#issuecomment-1461498517 is one example)
Another approach would be to do it like e.g. Java which runs signal handlers on a separate thread.
Yes. There was a discussion to introduce a thread for signal handlers. But it was not introduced maybe because it is too much for many cases.
akr had proposed that if someone want to use such case, how about to make a thread in a trap handler for workaround. I think it is reasonable.
Updated by Eregon (Benoit Daloze) over 1 year ago
Right, like I mentioned above.
But the deadlock might not happen (or even cannot happen for some cases I mentioned above), and then CRuby fails needlessly.
If the deadlock does happen, then we would get this error: deadlock; recursive locking (ThreadError)
. Which means the program doesn't hang and there is a stacktrace and it shows relevant information which is useful to fix it.
So I think it is just a mistake to fail early with can't be called from trap context (ThreadError)
.
In fact CRuby even cheats this rule for IO I saw.
There are fully correct use cases for using a Mutex inside trap
as I mentioned in the description.
Updated by Eregon (Benoit Daloze) over 1 year ago
https://github.com/ruby/timeout/issues/17#issuecomment-1464039853 seems a proof this restriction is wrong, and actually prevents a correct solution to that issue.
Also this restriction makes no sense for Monitor, since that's reentrant by design.
In fact, one could make their own Mutex using Queue, since Queue is allowed in trap handlers to workaround this restriction. It doesn't make any sense to reimplement Mutex in such an inefficient way of course.
Updated by Eregon (Benoit Daloze) over 1 year ago
ko1 (Koichi Sasada) wrote in #note-7:
Yes. There was a discussion to introduce a thread for signal handlers. But it was not introduced maybe because it is too much for many cases.
I think this is ultimately the proper solution to this problem.
It's extremely hard to reason about a signal handler running on top of any line of code of the main thread.
akr had proposed that if someone want to use such case, how about to make a thread in a trap handler for workaround. I think it is reasonable.
I cannot really use that here, or I would need an efficient way to know I'm in a trap handler (I suppose I could catch the can't be called from trap context (ThreadError)
but that's really ugly and likely quite slow).
Because Timeout doesn't define the trap handler, the user does.
And asking every user to workaround like that doesn't seem good, that actually creates more threads than a single signal-handling thread created by Ruby.
Updated by ko1 (Koichi Sasada) over 1 year ago
Eregon (Benoit Daloze) wrote in #note-8:
Right, like I mentioned above.
But the deadlock might not happen (or even cannot happen for some cases I mentioned above), and then CRuby fails needlessly.
If the deadlock does happen, then we would get this error:deadlock; recursive locking (ThreadError)
. Which means the program doesn't hang and there is a stacktrace and it shows relevant information which is useful to fix it.
My understandings are:
(1) It is hard to detect such deadlock risk because such errors occur with with very low frequency.
(2) It is hard to expect that the Mutex users care about signal handlers.
(3) It is hard to modify fixing with signal handlers.
So I think current limitation makes sense.
akr had proposed that if someone want to use such case, how about to make a thread in a trap handler for workaround. I think it is reasonable.
I cannot really use that here, or I would need an efficient way to know I'm in a trap handler (I suppose I could catch the can't be called from trap context (ThreadError) but that's really ugly and likely quite slow).
Because Timeout doesn't define the trap handler, the user does.
And asking every user to workaround like that doesn't seem good, that actually creates more threads than a single signal-handling thread created by Ruby.
akr's idea is not about Timeout library, but for Timeout users in trap handers.
If someone wants to use Timeout (or other complex/thread-safety needed code), making a thread like trap(...){Thread.new{ ... }}
seems feasible workaround.
Updated by Eregon (Benoit Daloze) over 1 year ago
ko1 (Koichi Sasada) wrote in #note-11:
My understandings are:
(1) It is hard to detect such deadlock risk because such errors occur with with very low frequency.
(2) It is hard to expect that the Mutex users care about signal handlers.
(3) It is hard to modify fixing with signal handlers.So I think current limitation makes sense.
No, it doesn't make sense, because it actually prevents the proper fix for TIMEOUT_THREAD_MUTEX
for Timeout:
https://github.com/ruby/timeout/issues/17#issuecomment-1464039853
So the limitation is unsound (it prevents valid and correct usages) and so we must remove it.
Also it doesn't catch try_lock
and Queue#pop
, etc, so it's very rough and inaccurate.
We could make it a $VERBOSE = true
warning maybe, but again it would be false positives and impossible to address, so it seems not good.
Another idea would be being able to mark which Mutex are "trap-safe", maybe with an argument to Mutex#initialize.
Sort of like how IO does it.
ko1 (Koichi Sasada) wrote in #note-11:
akr's idea is not about Timeout library, but for Timeout users in trap handers.
If someone wants to use Timeout (or other complex/thread-safety needed code), making a thread liketrap(...){Thread.new{ ... }}
seems feasible workaround.
But users might use some gem or some library code in trap handler and they might not know whether that uses Timeout or not.
And even if they know, it should just work, without needing any workaround from the user.
My conclusion is doing nothing is unacceptable, this CRuby limitation breaks valid and correct code such as https://github.com/ruby/timeout/issues/17#issuecomment-1464039853.
So either:
- We remove this unsound limitation, same as on other Rubies
- We add an argument to Mutex#initialize to mark it as trap-safe
- We execute trap handlers on a separate thread
Which one do we choose?
I think 3 is the best (much easier to reason about), but of course there is some potential for incompatibility there.
1 or 2 seem easy, so we could do them fast and maybe even backport it.
Updated by ko1 (Koichi Sasada) over 1 year ago
No, it doesn't make sense, because it actually prevents the proper fix for TIMEOUT_THREAD_MUTEX for Timeout:
I think this reason violates (1).
Updated by Eregon (Benoit Daloze) over 1 year ago
ko1 (Koichi Sasada) wrote in #note-13:
I think this reason violates (1).
What do you mean?
AFAIK https://github.com/ruby/timeout/issues/17#issuecomment-1464039853 is correct.
If it's incorrect, please point the mistake in my reasoning.
We can't have understanding (1)
by using solution 1.
(remove the trap+Mutex check), indeed those are exclusive.
IMO it's still worth it because we are always breaking when it's only rarely a problem.
So CRuby is causing hard failures for things that might just work.
Here is a list of your 3 understandings and how the 3 solutions address them, I also add (0):
(0) Fixes soundness, i.e. doesn't fail when the code correctly handles signal handler reentrancy
Solutions:
- fixes (0)
- fixes (0), (1) because no change there, (2) because opt-in
- fixes (0), (1), (2), (3)