Bug #14939
closed[PATCH] cont.c (ec_switch): prevent delayed/missed trap interrupt race
Description
ko1: I noticed this while working on timer-thread elimination/lazy-spawning.
However, it looks like a bug we introduced in 2.5 with `ec'.
Can you check this patch? Thanks.
cont.c (ec_switch): prevent delayed/missed trap interrupt race
timer-thread may set trap interrupt with rb_threadptr_check_signal
at any time independent of GVL. This means timer-thread may set
the trap interrupt flag on the previous execution context; causing
the flag to be unnoticed until a future ec switch (or lost
completely if the ec is done).
Note: I avoid relying on th->interrupt_lock here and use
atomics because we won't be able to rely on it for proposed lazy
timer-thread [Misc #14937].
This regression affects Ruby 2.5 as it was introduced by moving
interrupt_flag to `ec' which is an unstable pointer. Ruby <= 2.4
was unaffected because vm->main_thread->interrupt_flag never
changed.
Files
Updated by ko1 (Koichi Sasada) over 6 years ago
Good point. Seems good. Thank you!
BTW, I forget that why the trap handler is limited to main thread.
trap(:INT) do
p Thread.current
end
Thread.new{
sleep 1
Process.kill :INT, $$
p [:child, Thread.current]
}.join
sleep 10
p [:main, Thread.current]
__END__
[:child, #<Thread:0x00000125801a3028@t.rb:5 run>]
#<Thread:0x00000125801e9c80 run>
# after 10 seconds
[:main, #<Thread:0x00000125801e9c80 run>]
Can we relax this limitation?
Updated by ko1 (Koichi Sasada) over 6 years ago
Oops, sorry forget about sleep
lines.
Updated by normalperson (Eric Wong) over 6 years ago
- Status changed from Open to Closed
Applied in changeset trunk|r64062.
cont.c (ec_switch): prevent delayed/missed trap interrupt race
timer-thread may set trap interrupt with rb_threadptr_check_signal
at any time independent of GVL. This means timer-thread may set
the trap interrupt flag on the previous execution context; causing
the flag to be unnoticed until a future ec switch (or lost
completely if the ec is done).
Note: I avoid relying on th->interrupt_lock here and use
atomics because we won't be able to rely on it for proposed lazy
timer-thread [Misc #14937].
This regression affects Ruby 2.5 as it was introduced by moving
interrupt_flag to `ec' which is an unstable pointer. Ruby <= 2.4
was unaffected because vm->main_thread->interrupt_flag never
changed.
[ruby-core:88119] [Bug #14939]
Updated by normalperson (Eric Wong) over 6 years ago
ko1@atdot.net wrote:
Good point. Seems good. Thank you!
OK, commited as r64062
BTW, I forget that why the trap handler is limited to main thread.
I can think of at least three reasons:
-
Similar to this bug: other threads may exit soon,
main thread is long-lived so interrupt won't get lost -
Typical trap handlers are established at program startup,
before other threads exist, so maybe some are reliant on
Thread.current[] values inside trap (perhaps unknowingly
through libraries). -
Concurrency problem. Mutex isn't usable inside trap,
but there may be instances of reentrancy-safe code which
can't be made thread-safe.
Can we relax this limitation?
I don't think it's necessary; anybody who relies on trap will
likely structure main thread to deal with it (e.g. not block on
slow NFS IO#read :P). We can workaround 1), but I expect
compatibility and safety problems from 2) and 3).
Updated by nagachika (Tomoyuki Chikanaga) over 6 years ago
- Backport changed from 2.3: UNKNOWN, 2.4: UNKNOWN, 2.5: REQUIRED to 2.3: UNKNOWN, 2.4: UNKNOWN, 2.5: DONE
ruby_2_5 r64999 merged revision(s) 64062.