Bug #5343
closedUnexpected blocking behavior when interrupt Socket#accept
Description
In CentOS release 5.6 (Kernel: 2.6.18-238.12.1.el5, glibc 2.5),
the following sample script rarely (about once every 1000) blocks at Thread#join with 1.9.3-head.
require "socket"
require "thread"
queue = Queue.new
th = Thread.start {
s = TCPServer.new(10000)
queue.push(nil)
cli = s.accept
}
queue.pop
th.kill.join
Backtrace:
thread-1:
#0 0x0000003a3500aee9 in pthread_cond_wait@@GLIBC_2.3.2 ()
from /lib64/libpthread.so.0
#1 0x000000000052d81b in native_cond_wait (th=0x19551550, timeout_tv=0x0)
at ../ruby-1.9.3/thread_pthread.c:307
#2 native_sleep (th=0x19551550, timeout_tv=0x0)
at ../ruby-1.9.3/thread_pthread.c:908
#3 0x000000000052f9ab in sleep_forever (th=0x19551550, deadlockable=1)
at ../ruby-1.9.3/thread.c:855
#4 0x000000000052fa4d in thread_join_sleep (arg=140733838958496)
at ../ruby-1.9.3/thread.c:688
#5 0x0000000000417abb in rb_ensure (b_proc=0x52fa00 <thread_join_sleep>,
data1=140733838958496, e_proc=0x5288e0 <remove_from_join_list>,
data2=140733838958496) at ../ruby-1.9.3/eval.c:736
#6 0x000000000052a78e in thread_join (argc=,
argv=, self=)
at ../ruby-1.9.3/thread.c:721
#7 thread_join_m (argc=, argv=,
self=) at ../ruby-1.9.3/thread.c:802
#8 0x0000000000524b0d in vm_call_cfunc (th=0x19551550, cfp=0x2b38cc37df08,
num=, blockptr=,
flag=, id=, me=0x196663c0,
recv=427644560) at ../ruby-1.9.3/vm_insnhelper.c:404
#9 vm_call_method (th=0x19551550, cfp=0x2b38cc37df08,
num=, blockptr=,
flag=, id=, me=0x196663c0,
recv=427644560) at ../ruby-1.9.3/vm_insnhelper.c:530
#10 0x000000000051908d in vm_exec_core (th=0x19551550,
initial=) at ../ruby-1.9.3/insns.def:1015
#11 0x000000000051ed7e in vm_exec (th=0x19551550) at ../ruby-1.9.3/vm.c:1220
#12 0x0000000000525f9f in rb_iseq_eval_main (iseqval=427473840)
at ../ruby-1.9.3/vm.c:1461
#13 0x0000000000414c22 in ruby_exec_internal (n=0x197abbb0)
at ../ruby-1.9.3/eval.c:204
#14 0x00000000004172d4 in ruby_exec_node (n=)
at ../ruby-1.9.3/eval.c:251
#15 ruby_run_node (n=) at ../ruby-1.9.3/eval.c:244
#16 0x0000000000414689 in main (argc=2, argv=0x7fff267aa588)
at ../ruby-1.9.3/main.c:38
thread-2:
#0 0x0000003a344cb696 in poll () from /lib64/libc.so.6
#1 0x00000000005301ba in ppoll (fd=,
events=, tv=0x0) at ../ruby-1.9.3/thread.c:2820
#2 rb_wait_for_single_fd (fd=,
events=, tv=0x0) at ../ruby-1.9.3/thread.c:2849
#3 0x000000000053052c in rb_thread_wait_fd_rw (fd=5)
at ../ruby-1.9.3/thread.c:2686
#4 rb_thread_wait_fd (fd=5) at ../ruby-1.9.3/thread.c:2699
#5 0x00002aaaab0b7b6f in rsock_s_accept (klass=427743720, fd=5,
sockaddr=, len=0x40473a3c)
at ../../../ruby-1.9.3/ext/socket/init.c:499
#6 0x00002aaaab0c3310 in tcp_accept (sock=)
at ../../../ruby-1.9.3/ext/socket/tcpserver.c:55
#7 0x0000000000524b0d in vm_call_cfunc (th=0x197ffe90, cfp=0x2aaaab3d3f08,
num=, blockptr=,
flag=, id=, me=0x1980ba70,
recv=427644480) at ../ruby-1.9.3/vm_insnhelper.c:404
#8 vm_call_method (th=0x197ffe90, cfp=0x2aaaab3d3f08,
num=, blockptr=,
flag=, id=, me=0x1980ba70,
recv=427644480) at ../ruby-1.9.3/vm_insnhelper.c:530
#9 0x000000000051908d in vm_exec_core (th=0x197ffe90,
initial=) at ../ruby-1.9.3/insns.def:1015
#10 0x000000000051ed7e in vm_exec (th=0x197ffe90) at ../ruby-1.9.3/vm.c:1220
#11 0x000000000051fad5 in invoke_block_from_c (th=0x197ffe90,
block=, self=, argc=0,
argv=, blockptr=, cref=0x0)
at ../ruby-1.9.3/vm.c:624
#12 0x000000000052026f in rb_vm_invoke_proc (th=0x197ffe90, proc=0x196d72a0,
self=425420560, argc=0, argv=0x197d56c8, blockptr=0x0)
at ../ruby-1.9.3/vm.c:670
#13 0x000000000052f5c1 in thread_start_func_2 (th=0x197ffe90,
stack_start=) at ../ruby-1.9.3/thread.c:453
#14 0x000000000052f75e in thread_start_func_1 (th_ptr=0x197ffe90)
at ../ruby-1.9.3/thread_pthread.c:656
#15 0x0000003a3500673d in start_thread () from /lib64/libpthread.so.0
#16 0x0000003a344d44bd in clone () from /lib64/libc.so.6
Updated by nagachika (Tomoyuki Chikanaga) about 13 years ago
Hi,
I've found that this issue is not specific problem of TCPServer#accept.
The following script also rarely blocks forever.
require "thread"
queue = Queue.new
r, w = IO.pipe
th = Thread.start {
queue.push(nil)
r.read 1
}
queue.pop
th.kill.join
I guess this is because SIGVTALRM from send from ubf_select() is received by blocking thread after GVL release but before enter poll(2)/ppoll(2).
Updated by kosaki (Motohiro KOSAKI) about 13 years ago
- Status changed from Open to Assigned
- Assignee set to ko1 (Koichi Sasada)
- Priority changed from Normal to 5
Aghh. This is regression since 1.9.3.
Updated by ko1 (Koichi Sasada) about 13 years ago
- ruby -v changed from ruby 1.9.3dev (2011-09-17 revision 33290) [i686-linux] to -
(2011/09/20 2:22), Tomoyuki Chikanaga wrote:
Issue #5343 has been updated by Tomoyuki Chikanaga.
Hi,
I've found that this issue is not specific problem of TCPServer#accept.
The following script also rarely blocks forever.
require "thread"
queue = Queue.new
r, w = IO.pipe
th = Thread.start {
queue.push(nil)
r.read 1
}
queue.pop
th.kill.joinI guess this is because SIGVTALRM from send from ubf_select() is received by blocking thread after GVL release but before enter poll(2)/ppoll(2).
Thanks. I understand. How about it?
Index: thread_pthread.c¶
--- thread_pthread.c (revision 33272)
+++ thread_pthread.c (working copy)
@@ -1013,6 +1013,7 @@
{
rb_thread_t *th = (rb_thread_t *)ptr;
add_signal_thread_list(th);
- rb_thread_wakeup_timer_thread(); /* activate timer thread */
ubf_select_each(th);
}
--
// SASADA Koichi at atdot dot net
Updated by normalperson (Eric Wong) about 13 years ago
SASADA Koichi ko1@atdot.net wrote:
(2011/09/20 2:22), Tomoyuki Chikanaga wrote:
I guess this is because SIGVTALRM from send from ubf_select() is received by blocking thread after GVL release but before enter poll(2)/ppoll(2).
Thanks. I understand. How about it?
I'm not sure issues like this are completely avoidable, but your patch
seems to help avoid it.
I needed ~15K tries to reproduce blocking on the original test case
(no SMP on this Xen VM). So far I'm at 65K loop interations and still
going... I will update if I notice it blocked again.
Updated by nagachika (Tomoyuki Chikanaga) about 13 years ago
Hi, sasada san.
With your patch, over 60000 times trials pass successfully.
Your patch works fine for me too.
Thank you.
Updated by ko1 (Koichi Sasada) about 13 years ago
Thanks Eric and Chikanaga-san.
Could you commit it? Maybe 1.9.3 also needs this patch. But I'm not
sure how to commit frozen 1.9.3 branch.
Regards,
Koichi
(2011/09/20 17:25), Tomoyuki Chikanaga wrote:
Issue #5343 has been updated by Tomoyuki Chikanaga.
Hi, sasada san.
With your patch, over 60000 times trials pass successfully.
Your patch works fine for me too.
Thank you.Bug #5343: Unexpected blocking behavior when interrupt Socket#accept
http://redmine.ruby-lang.org/issues/5343Author: Tomoyuki Chikanaga
Status: Assigned
Priority: High
Assignee: Koichi Sasada
Category: ext
Target version: 1.9.3
ruby -v: -In CentOS release 5.6 (Kernel: 2.6.18-238.12.1.el5, glibc 2.5),
the following sample script rarely (about once every 1000) blocks at Thread#join with 1.9.3-head.require "socket"
require "thread"queue = Queue.new
th = Thread.start {
s = TCPServer.new(10000)
queue.push(nil)
cli = s.accept
}queue.pop
th.kill.joinBacktrace:
thread-1:
#0 0x0000003a3500aee9 in pthread_cond_wait@@GLIBC_2.3.2 ()
from /lib64/libpthread.so.0
#1 0x000000000052d81b in native_cond_wait (th=0x19551550, timeout_tv=0x0)
at ../ruby-1.9.3/thread_pthread.c:307
#2 native_sleep (th=0x19551550, timeout_tv=0x0)
at ../ruby-1.9.3/thread_pthread.c:908
#3 0x000000000052f9ab in sleep_forever (th=0x19551550, deadlockable=1)
at ../ruby-1.9.3/thread.c:855
#4 0x000000000052fa4d in thread_join_sleep (arg=140733838958496)
at ../ruby-1.9.3/thread.c:688
#5 0x0000000000417abb in rb_ensure (b_proc=0x52fa00 <thread_join_sleep>,
data1=140733838958496, e_proc=0x5288e0 <remove_from_join_list>,
data2=140733838958496) at ../ruby-1.9.3/eval.c:736
#6 0x000000000052a78e in thread_join (argc=,
argv=, self=)
at ../ruby-1.9.3/thread.c:721
#7 thread_join_m (argc=, argv=,
self=) at ../ruby-1.9.3/thread.c:802
#8 0x0000000000524b0d in vm_call_cfunc (th=0x19551550, cfp=0x2b38cc37df08,
num=, blockptr=,
flag=, id=, me=0x196663c0,
recv=427644560) at ../ruby-1.9.3/vm_insnhelper.c:404
#9 vm_call_method (th=0x19551550, cfp=0x2b38cc37df08,
num=, blockptr=,
flag=, id=, me=0x196663c0,
recv=427644560) at ../ruby-1.9.3/vm_insnhelper.c:530
#10 0x000000000051908d in vm_exec_core (th=0x19551550,
initial=) at ../ruby-1.9.3/insns.def:1015
#11 0x000000000051ed7e in vm_exec (th=0x19551550) at ../ruby-1.9.3/vm.c:1220
#12 0x0000000000525f9f in rb_iseq_eval_main (iseqval=427473840)
at ../ruby-1.9.3/vm.c:1461
#13 0x0000000000414c22 in ruby_exec_internal (n=0x197abbb0)
at ../ruby-1.9.3/eval.c:204
#14 0x00000000004172d4 in ruby_exec_node (n=)
at ../ruby-1.9.3/eval.c:251
#15 ruby_run_node (n=) at ../ruby-1.9.3/eval.c:244
#16 0x0000000000414689 in main (argc=2, argv=0x7fff267aa588)
at ../ruby-1.9.3/main.c:38thread-2:
#0 0x0000003a344cb696 in poll () from /lib64/libc.so.6
#1 0x00000000005301ba in ppoll (fd=,
events=, tv=0x0) at ../ruby-1.9.3/thread.c:2820
#2 rb_wait_for_single_fd (fd=,
events=, tv=0x0) at ../ruby-1.9.3/thread.c:2849
#3 0x000000000053052c in rb_thread_wait_fd_rw (fd=5)
at ../ruby-1.9.3/thread.c:2686
#4 rb_thread_wait_fd (fd=5) at ../ruby-1.9.3/thread.c:2699
#5 0x00002aaaab0b7b6f in rsock_s_accept (klass=427743720, fd=5,
sockaddr=, len=0x40473a3c)
at ../../../ruby-1.9.3/ext/socket/init.c:499
#6 0x00002aaaab0c3310 in tcp_accept (sock=)
at ../../../ruby-1.9.3/ext/socket/tcpserver.c:55
#7 0x0000000000524b0d in vm_call_cfunc (th=0x197ffe90, cfp=0x2aaaab3d3f08,
num=, blockptr=,
flag=, id=, me=0x1980ba70,
recv=427644480) at ../ruby-1.9.3/vm_insnhelper.c:404
#8 vm_call_method (th=0x197ffe90, cfp=0x2aaaab3d3f08,
num=, blockptr=,
flag=, id=, me=0x1980ba70,
recv=427644480) at ../ruby-1.9.3/vm_insnhelper.c:530
#9 0x000000000051908d in vm_exec_core (th=0x197ffe90,
initial=) at ../ruby-1.9.3/insns.def:1015
#10 0x000000000051ed7e in vm_exec (th=0x197ffe90) at ../ruby-1.9.3/vm.c:1220
#11 0x000000000051fad5 in invoke_block_from_c (th=0x197ffe90,
block=, self=, argc=0,
argv=, blockptr=, cref=0x0)
at ../ruby-1.9.3/vm.c:624
#12 0x000000000052026f in rb_vm_invoke_proc (th=0x197ffe90, proc=0x196d72a0,
self=425420560, argc=0, argv=0x197d56c8, blockptr=0x0)
at ../ruby-1.9.3/vm.c:670
#13 0x000000000052f5c1 in thread_start_func_2 (th=0x197ffe90,
stack_start=) at ../ruby-1.9.3/thread.c:453
#14 0x000000000052f75e in thread_start_func_1 (th_ptr=0x197ffe90)
at ../ruby-1.9.3/thread_pthread.c:656
#15 0x0000003a3500673d in start_thread () from /lib64/libpthread.so.0
#16 0x0000003a344d44bd in clone () from /lib64/libc.so.6
--
// SASADA Koichi at atdot dot net
Updated by nagachika (Tomoyuki Chikanaga) about 13 years ago
Koichi Sasada wrote:
Could you commit it? Maybe 1.9.3 also needs this patch. But I'm not
sure how to commit frozen 1.9.3 branch.
I'd be happy to commit it. I'll check in it to trunk and create a backport request ticket.
BTW, let me summarize the reason why this patch works fine. If I misunderstand, please point out.
When interrupt other thread (cf. by Thread#kill, Thread#raise) in blocking region, unblocking function (ubf) is called to interrupt blocking system call like select/ppoll etc... The default ubf (ubf_select) call pthread_kill() to send SIGVTALRM to interrupted thread.
But There's race condition. If interrupted thread was just released GVL and received SIGVTALRM before entering blocking systemcall, signal was lost.
To cover this race condition, ubf_select() also call add_signal_thread_list() to register "signal thread list". Timer thread periodically send SIGVTALRM to threads in signal thread list. Recently timer thread is suspended when there's no more than one runnable thread. So when interrupt other thread and register to signal thread list, you should activate timer thread to polling-mode.
Updated by nagachika (Tomoyuki Chikanaga) about 13 years ago
- Status changed from Assigned to Closed
- % Done changed from 0 to 100
This issue was solved with changeset r33307.
Tomoyuki, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.
- thread_pthread.c (ubf_select): activate timer thread when interrupt
blocking thread.
A patch created by Koichi Sasada. [ruby-core:39634] [Bug #5343]
to cover race condition, timer thread periodically send SIGVTARLM to
threads in signal thread list. so you should activate timer thread
when interrupt a thread.
Updated by ko1 (Koichi Sasada) about 13 years ago
(2011/09/21 9:38), Tomoyuki Chikanaga wrote:
Could you commit it? Maybe 1.9.3 also needs this patch. But I'm not
sure how to commit frozen 1.9.3 branch.
I'd be happy to commit it. I'll check in it to trunk and create a backport request ticket.
Thanks.
BTW, let me summarize the reason why this patch works fine. If I misunderstand, please point out.
When interrupt other thread (cf. by Thread#kill, Thread#raise) in blocking region, unblocking function (ubf) is called to interrupt blocking system call like select/ppoll etc...
The default ubf (ubf_select)
No. ubf_select is not a default one. ubf_select is only for canceling
select() blocking. If you use it for other process, it may be a bug.
call pthread_kill() to send SIGVTALRM to interrupted thread.
But There's race condition.
If interrupted thread was just released GVL and received SIGVTALRM
before entering blocking systemcall, signal was lost.
To cover this race condition, ubf_select() also call
add_signal_thread_list() to register "signal thread list".
Timer thread periodically send SIGVTALRM to threads in signal thread
list.
Recently timer thread is suspended when there's no more than one
runnable thread. So when interrupt other thread and register to
signal thread list, you should activate timer thread to
polling-mode.
Yes, absolutely.
--
// SASADA Koichi at atdot dot net
Updated by normalperson (Eric Wong) about 13 years ago
Tomoyuki Chikanaga nagachika00@gmail.com wrote:
To cover this race condition, ubf_select() also call
add_signal_thread_list() to register "signal thread list". Timer
thread periodically send SIGVTALRM to threads in signal thread list.
Ah, I missed the second part of that earlier. My concern raised in
[ruby-core:39643] is no more, thank you.