Bug #15320
closedIO.popen with MJIT worker thread may deadlock
Description
In r65785, we added mjit_pause
on before_fork_ruby
. We seem to have introduced a few deadlocks by that. This was discussed in https://bugs.ruby-lang.org/issues/15316, but I filed a separate ticket because this issue is independent from postponed_job race.
Example deadlock:
_pthread_mutex_unlock_usercnt (decr=1, mutex=mutex@entry=0x55a4ee3b7520 <mjit_engine_mutex>) at pthread_mutex_unlock.c:38
>>> Machine level backtrace
Thread 2 (Thread 0x7f9ea8981700 (LWP 20561)):
#0 0x00007f9ec2b4acf6 in __GI_ppoll (fds=fds@entry=0x7f9ea897fb08, nfds=nfds@entry=1, timeout=<optimized out>, sigmask=sigmask@entry=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:39
#1 0x000055a4edfd0f37 in rb_sigwait_sleep (th=th@entry=0x0, sigwait_fd=sigwait_fd@entry=3, rel=rel@entry=0x0) at /home/ko1/ruby/src/trunk-mjit-wait/thread_pthread.c:1969
#2 0x000055a4edf3b0dc in ruby_waitpid_locked (vm=vm@entry=0x55a4ee90b950, pid=pid@entry=20590, status=status@entry=0x7f9ea897fbfc, options=options@entry=0, cond=cond@entry=0x7f9ea897fc00) at /home/ko1/ruby/src/trunk-mjit-wait/process.c:1070
#3 0x000055a4eded2e82 in exec_process (path=<optimized out>, argv=argv@entry=0x7f9ebc0061e0) at /home/ko1/ruby/src/trunk-mjit-wait/mjit_worker.c:611
#4 0x000055a4eded4528 in link_o_to_so (so_file=0x7f9ea897fc70 "/tmp/_ruby_mjit_p27024u127.so", o_files=0x7f9ea897fd20) at /home/ko1/ruby/src/trunk-mjit-wait/mjit_worker.c:824
#5 convert_unit_to_func (unit=unit@entry=0x55a4eeb2f790, cc_entries=<optimized out>, is_entries=<optimized out>) at /home/ko1/ruby/src/trunk-mjit-wait/mjit_worker.c:1090
#6 0x000055a4eded4e52 in mjit_worker () at /home/ko1/ruby/src/trunk-mjit-wait/mjit_worker.c:1217
#7 0x000055a4edfcb15a in mjit_worker (arg=0x55a4eded4a20 <mjit_worker>) at /home/ko1/ruby/src/trunk-mjit-wait/thread_pthread.c:1876
#8 0x00007f9ec3a916db in start_thread (arg=0x7f9ea8981700) at pthread_create.c:463
#9 0x00007f9ec2b5788f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Thread 1 (Thread 0x7f9ec40df740 (LWP 27024)):
#0 __pthread_mutex_unlock_usercnt (decr=1, mutex=mutex@entry=0x55a4ee3b7520 <mjit_engine_mutex>) at pthread_mutex_unlock.c:38
#1 __GI___pthread_mutex_unlock (mutex=mutex@entry=0x55a4ee3b7520 <mjit_engine_mutex>) at pthread_mutex_unlock.c:345
#2 0x000055a4edfcf3e9 in rb_native_mutex_unlock (lock=lock@entry=0x55a4ee3b7520 <mjit_engine_mutex>) at /home/ko1/ruby/src/trunk-mjit-wait/thread_pthread.c:400
#3 0x000055a4eded6659 in CRITICAL_SECTION_FINISH (msg=0x55a4ee0d7226 "in stop_worker", level=3) at /home/ko1/ruby/src/trunk-mjit-wait/mjit_worker.c:412
#4 stop_worker () at /home/ko1/ruby/src/trunk-mjit-wait/mjit.c:689
#5 mjit_pause (wait_p=wait_p@entry=0) at /home/ko1/ruby/src/trunk-mjit-wait/mjit.c:719
#6 0x000055a4edf3e603 in before_fork_ruby () at /home/ko1/ruby/src/trunk-mjit-wait/process.c:1510
#7 rb_fork_ruby (status=status@entry=0x7ffde545cce8) at /home/ko1/ruby/src/trunk-mjit-wait/process.c:4011
#8 0x000055a4edeb1482 in pipe_open (execarg_obj=execarg_obj@entry=8, fmode=3, convconfig=convconfig@entry=0x7ffde545cdf0, modestr=<optimized out>) at /home/ko1/ruby/src/trunk-mjit-wait/io.c:6592
#9 0x000055a4edeb29e2 in rb_io_s_popen (argc=<optimized out>, argv=<optimized out>, klass=94166865734200) at /home/ko1/ruby/src/trunk-mjit-wait/io.c:6868
#10 0x000055a4ee00abc5 in vm_call_cfunc_with_frame (ci=0x55a4eef36ca0, cc=<optimized out>, calling=<optimized out>, reg_cfp=0x7f9ec40dd9b8, ec=0x55a4ee90c2e8) at /home/ko1/ruby/src/trunk-mjit-wait/vm_insnhelper.c:1908
#11 vm_call_cfunc (ec=0x55a4ee90c2e8, reg_cfp=0x7f9ec40dd9b8, calling=<optimized out>, ci=0x55a4eef36ca0, cc=<optimized out>) at /home/ko1/ruby/src/trunk-mjit-wait/vm_insnhelper.c:1924
#12 0x000055a4ee016033 in vm_call_method (ec=0x55a4ee90c2e8, cfp=0x7f9ec40dd9b8, calling=<optimized out>, ci=<optimized out>, cc=<optimized out>) at /home/ko1/ruby/src/trunk-mjit-wait/vm_insnhelper.c:2397
#13 0x000055a4ee01ce93 in vm_exec_core (ec=ec@entry=0x55a4ee90c2e8, initial=initial@entry=0) at ../../src/trunk-mjit-wait/insns.def:751
#14 0x000055a4ee01292c in rb_vm_exec (ec=0x55a4ee90c2e8, mjit_enable_p=1) at /home/ko1/ruby/src/trunk-mjit-wait/vm.c:1825
#15 0x00007f9ea8985491 in ?? ()
#16 0x000055a4ee90c2e8 in ?? ()
#17 0x00007ffde545d280 in ?? ()
#18 0x00007ffd00000000 in ?? ()
#19 0x0000000000000003 in ?? ()
#20 0x0000000000000000 in ?? ()
>>> Dump Ruby level backtrace
th: 0x55a4ee90be60, native_id: 0x7f9ec40df740
-- Control frame information -----------------------------------------------
c:0030 p:---- s:0190 e:000189 CFUNC :popen
c:0029 p:0015 s:0184 e:000183 METHOD /home/ko1/ruby/src/trunk-mjit-wait/test/ruby/test_process.rb:1776
c:0028 p:0059 s:0177 e:000176 METHOD /home/ko1/ruby/src/trunk-mjit-wait/test/lib/test/unit.rb:1181 [FINISH]
c:0027 p:0058 s:0171 e:000170 METHOD /home/ko1/ruby/src/trunk-mjit-wait/test/lib/minitest/unit.rb:1291 [FINISH]
c:0026 p:0015 s:0162 e:000161 METHOD /home/ko1/ruby/src/trunk-mjit-wait/test/lib/test/unit/testcase.rb:18
c:0025 p:0082 s:0157 e:000156 BLOCK /home/ko1/ruby/src/trunk-mjit-wait/test/lib/minitest/unit.rb:952 [FINISH]
c:0024 p:---- s:0150 e:000149 CFUNC :map
c:0023 p:0133 s:0146 e:000145 METHOD /home/ko1/ruby/src/trunk-mjit-wait/test/lib/minitest/unit.rb:945
c:0022 p:0045 s:0134 e:000133 METHOD /home/ko1/ruby/src/trunk-mjit-wait/test/lib/test/unit.rb:1044
c:0021 p:0011 s:0127 e:000125 BLOCK /home/ko1/ruby/src/trunk-mjit-wait/test/lib/test/unit.rb:552 [FINISH]
c:0020 p:---- s:0121 e:000120 CFUNC :each
c:0019 p:0057 s:0117 e:000116 METHOD /home/ko1/ruby/src/trunk-mjit-wait/test/lib/test/unit.rb:550
c:0018 p:0009 s:0110 e:000109 METHOD /home/ko1/ruby/src/trunk-mjit-wait/test/lib/test/unit.rb:587
c:0017 p:0460 s:0103 e:000101 METHOD /home/ko1/ruby/src/trunk-mjit-wait/test/lib/test/unit.rb:512
c:0016 p:0048 s:0089 e:000088 METHOD /home/ko1/ruby/src/trunk-mjit-wait/test/lib/test/unit.rb:548
c:0015 p:0009 s:0082 e:000081 METHOD /home/ko1/ruby/src/trunk-mjit-wait/test/lib/test/unit.rb:587
c:0014 p:0155 s:0075 e:000074 METHOD /home/ko1/ruby/src/trunk-mjit-wait/test/lib/minitest/unit.rb:888
c:0013 p:0018 s:0061 e:000060 METHOD /home/ko1/ruby/src/trunk-mjit-wait/test/lib/test/unit.rb:981
c:0012 p:0006 s:0056 e:000055 METHOD /home/ko1/ruby/src/trunk-mjit-wait/test/lib/minitest/unit.rb:1110
c:0011 p:0007 s:0052 e:000051 BLOCK /home/ko1/ruby/src/trunk-mjit-wait/test/lib/minitest/unit.rb:1097 [FINISH]
c:0010 p:---- s:0048 e:000047 CFUNC :each
c:0009 p:0058 s:0044 e:000043 METHOD /home/ko1/ruby/src/trunk-mjit-wait/test/lib/minitest/unit.rb:1096
c:0008 p:0016 s:0039 e:000038 METHOD /home/ko1/ruby/src/trunk-mjit-wait/test/lib/minitest/unit.rb:1084
c:0007 p:0009 s:0034 e:000033 METHOD /home/ko1/ruby/src/trunk-mjit-wait/test/lib/test/unit.rb:790
c:0006 p:0009 s:0028 e:000027 METHOD /home/ko1/ruby/src/trunk-mjit-wait/test/lib/test/unit.rb:620
c:0005 p:0017 s:0022 e:000021 METHOD /home/ko1/ruby/src/trunk-mjit-wait/test/lib/test/unit.rb:33
c:0004 p:0037 s:0017 e:000016 METHOD /home/ko1/ruby/src/trunk-mjit-wait/test/lib/test/unit.rb:1143
c:0003 p:0011 s:0013 e:000012 METHOD /home/ko1/ruby/src/trunk-mjit-wait/test/lib/test/unit.rb:1147
c:0002 p:0261 s:0008 E:001458 EVAL /home/ko1/ruby/src/trunk-mjit-wait/test/runner.rb:33 [FINISH]
c:0001 p:0000 s:0003 E:001230 (none) [FINISH]
As the backtrace looks similar to https://bugs.ruby-lang.org/issues/14867#note-82, I suspect IO.popen has race with MJIT worker for waiting_pids
handling and RUBY_VM_CHECK_INTS
inside before_fork_ruby
might unexpectedly proceed SIGCHLD handler (for gcc process) while waitpid list is not prepared for it yet.
Updated by k0kubun (Takashi Kokubun) almost 6 years ago
- Related to Bug #15316: rb_postponed_job_register not thread-safe added
Updated by k0kubun (Takashi Kokubun) almost 6 years ago
- Related to Bug #14867: Process.wait can wait for MJIT compiler process added
Updated by k0kubun (Takashi Kokubun) almost 6 years ago
- Status changed from Open to Closed
Applied in changeset trunk|r65807.
process.c: do not run signal handler before fork
to prevent from proceeding one for MJIT while it's not safe yet.
By that situation, MJIT worker could be waiting for compiler process forever
http://ci.rvm.jp/results/trunk-mjit@silicon-docker/1468033
[Bug #15320]
Updated by k0kubun (Takashi Kokubun) almost 6 years ago
- Subject changed from IO.popen with MJIT worker thread may deadlock to Process.daemon in IO.popen block with MJIT worker thread may deadlock
- Status changed from Closed to Open
Updated by k0kubun (Takashi Kokubun) almost 6 years ago
- Subject changed from Process.daemon in IO.popen block with MJIT worker thread may deadlock to IO.popen with MJIT worker thread may deadlock
Updated by k0kubun (Takashi Kokubun) almost 6 years ago
- Status changed from Open to Closed
Applied in changeset trunk|r65817.
process.c: do not try to pause MJIT
while child handler is disabled.
trying to fix [Bug #15320]