Backport #4009

Segfault with combination of threads and condition variables

Added by nanodeath (Max Aller) almost 9 years ago. Updated almost 8 years ago.

Status: Closed
Priority: Normal
[ruby-core:32982]

Description

=begin
When running the attached program, I get a segfault. When changing some of the values inside what I've designated as the "delayed adder" thread (namely, the number of jobs that get added, or the duration that the sleep occurs for), I get "fatal: deadlock detected" -- which is fine. But with the given settings on my laptop, it segfaults routinely.

I've already trimmed down the example as much as I could, but I realize it's a bit long, still. It's apparently important that the "delayed adder" thread exists at all, potentially "putting off" the deadlock detection somehow.
=end
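
The attached script is not reproduced in the ticket text. As a rough sketch of the kind of program described, assuming workers that wait on a monitor condition variable (as the `monitor.rb` `wait_until` frames in comment #2 suggest) and a "delayed adder" thread that sleeps, adds a few jobs, and exits, something like the following could be used. `run_scenario` and its parameters are hypothetical names for illustration, not taken from `proof_of_segfault.rb`.

```ruby
require "monitor"

# Hypothetical sketch, NOT the attached proof_of_segfault.rb.
# Starts workers that wait for jobs, plus a "delayed adder" thread that
# sleeps, adds a few jobs, and then exits without ever stopping the workers.
def run_scenario(worker_count: 3, job_count: 2, delay: 0.05)
  lock = Monitor.new
  jobs_ready = lock.new_cond
  jobs = []
  done = []

  workers = Array.new(worker_count) do
    Thread.new do
      loop do
        lock.synchronize do
          # Blocks until the adder has queued something.
          jobs_ready.wait_until { !jobs.empty? }
          done << jobs.shift
        end
      end
    end
  end

  # The "delayed adder": number of jobs and sleep duration are the knobs
  # the report mentions changing.
  adder = Thread.new do
    sleep delay
    job_count.times do |i|
      lock.synchronize do
        jobs << i
        jobs_ready.signal
      end
    end
  end
  adder.join

  [workers, done]
end

# workers, done = run_scenario
# workers.each(&:join)  # every remaining thread is now waiting
```

Once the adder has finished and the queue is drained, every worker is parked on the condition variable. If the main thread then joins the workers, no thread can make progress, which is the state in which the report expects the deadlock error rather than a segfault.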


Files

proof_of_segfault.rb (955 Bytes) - program that segfaults - nanodeath (Max Aller), 11/01/2010 02:43 AM
proof_of_segfault_output (12.9 KB) - output from stack trace - nanodeath (Max Aller), 11/01/2010 02:43 AM
proof_of_segfault_small.rb (427 Bytes) - nanodeath (Max Aller), 01/18/2011 01:11 PM

Associated revisions

Revision a2ba50d9
Added by nagachika (Tomoyuki Chikanaga) over 8 years ago

  • thread.c (thread_start_func_2): check deadlock condition before release thread stack. fix memory violation when deadlock detected. reported by Max Aller. [Bug #4009] [ruby-core:32982]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@30743 b2dd03c8-39d4-4d8f-98ff-823fe69b080e

Revision 30743
Added by nagachika (Tomoyuki Chikanaga) over 8 years ago

  • thread.c (thread_start_func_2): check deadlock condition before release thread stack. fix memory violation when deadlock detected. reported by Max Aller. [Bug #4009] [ruby-core:32982]

Revision 67db9280
Added by yugui (Yuki Sonoda) over 8 years ago

merges r30743 from trunk into ruby_1_9_2.

    * thread.c (thread_start_func_2): check deadlock condition before
      release thread stack. fix memory violation when deadlock detected.
      reported by Max Aller. [Bug #4009] [ruby-core:32982]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/branches/ruby_1_9_2@31202 b2dd03c8-39d4-4d8f-98ff-823fe69b080e

History

#1

Updated by nanodeath (Max Aller) almost 9 years ago

=begin
Note: this also happens with ruby-1.9.2-p0. The provided program doesn't terminate at all using ruby-1.8.7-p302.
=end

#2

Updated by naruse (Yui NARUSE) almost 9 years ago

=begin
I can't reproduce this.
Can you show a gdb backtrace?

ruby 1.9.3dev (2010-11-09 trunk 29733) [i686-linux]
1277:12: warning: assigned but unused variable - job
/home/naruse/local/ruby/lib/ruby/1.9.1/thread.rb:71:in `sleep': deadlock detected (fatal)
        from /home/naruse/local/ruby/lib/ruby/1.9.1/thread.rb:71:in `wait'
        from /home/naruse/local/ruby/lib/ruby/1.9.1/monitor.rb:100:in `wait'
        from /home/naruse/local/ruby/lib/ruby/1.9.1/monitor.rb:121:in `wait_until'
        from 1277:49:in `block in <main>'
        from /home/naruse/local/ruby/lib/ruby/1.9.1/monitor.rb:201:in `mon_synchronize'
        from 1277:48:in `<main>'

=end

#3

Updated by nagachika (Tomoyuki Chikanaga) almost 9 years ago

=begin
Hi
I can reproduce a similar SEGV under gdb with "ruby 1.9.3dev (2010-11-16 trunk 29789) [i686-linux]".

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread -1222669392 (LWP 14762)]
0x0813df46 in vm_call0 (th=0x9384238, recv=153972280, id=2512, argc=1,
    argv=0xb71f8948, me=0x92e8a68) at vm_eval.c:75
75          vm_push_frame(th, 0, VM_FRAME_MAGIC_CFUNC,
(gdb) where
#0  0x0813df46 in vm_call0 (th=0x9384238, recv=153972280, id=2512, argc=1,
    argv=0xb71f8948, me=0x92e8a68) at vm_eval.c:75
#1  0x0813e711 in check_funcall (recv=153972280, mid=2512, argc=1,
    argv=0xb71f8948) at vm_eval.c:290
#2  0x0813e73e in rb_check_funcall (recv=153972280, mid=2512, argc=1,
    argv=0xb71f8948) at vm_eval.c:296
#3  0x08059e05 in make_exception (argc=2, argv=0xb71f8944, isstr=1)
    at eval.c:552
#4  0x08059ebb in rb_make_exception (argc=2, argv=0xb71f8944) at eval.c:574
#5  0x081484e8 in rb_threadptr_raise (th=0x92a7568, argc=2, argv=0xb71f8944)
    at thread.c:1350
#6  0x0814c31a in rb_check_deadlock (vm=0x92a72e0) at thread.c:4465
#7  0x08147286 in thread_start_func_2 (th=0x9384238, stack_start=0xb71f8a78)
    at thread.c:516
#8  0x08146356 in thread_start_func_1 (th_ptr=0x9384238)
    at thread_pthread.c:361
#9  0x00666dd8 in start_thread () from /lib/tls/libpthread.so.0
#10 0x00f83d1a in clone () from /lib/tls/libc.so.6

th->cfp seems to point to an invalid address:

(gdb) p reg_cfp->sp
Cannot access memory at address 0xb7278fe0
(gdb) p reg_cfp
$9 = (rb_control_frame_t *) 0xb7278fdc
(gdb) p *reg_cfp
Cannot access memory at address 0xb7278fdc
(gdb) p th->cfp
$10 = (rb_control_frame_t *) 0xb7278fdc

=end

#4

Updated by nagachika (Tomoyuki Chikanaga) almost 9 years ago

=begin
I think the following patch fixes this situation.
Max, could you try this patch?

BTW I'm not too confident in this patch. Please review it.

Index: thread.c
===================================================================
--- thread.c (revision 29809)
+++ thread.c (working copy)
@@ -507,13 +507,14 @@
                 join_th = join_th->join_list_next;
             }
 
+            thread_unlock_all_locking_mutexes(th);
+            if (th != main_th) rb_check_deadlock(th->vm);
+
             if (!th->root_fiber) {
                 rb_thread_recycle_stack_release(th->stack);
                 th->stack = 0;
             }
         }
-    thread_unlock_all_locking_mutexes(th);
-    if (th != main_th) rb_check_deadlock(th->vm);
     if (th->vm->main_thread == th) {
         ruby_cleanup(state);
     }

=end
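
The ordering issue can be pictured with a toy model. The backtrace in #3 shows `rb_check_deadlock` being called from `thread_start_func_2` and ending in `vm_push_frame` on a thread whose `th->cfp` is no longer accessible. The patch moves the deadlock check ahead of the stack release. The Ruby sketch below only imitates that ordering. `ToyThread`, `check_deadlock`, `finish_old_order` and `finish_new_order` are made-up names and do not correspond to anything in `thread.c`.

```ruby
# Toy model only: these names are hypothetical and exist purely to show
# why "check, then release" is safe while "release, then check" is not.
class ToyThread
  def initialize
    @stack = []            # stands in for the thread's VM stack
  end

  def release_stack
    @stack = nil           # after this, pushing a frame is invalid
  end

  def push_frame(name)
    raise "stack already released" if @stack.nil?
    @stack.push(name)
  end
end

# Reporting a deadlock has to build an exception, and that needs a frame
# on the stack of the thread doing the check.
def check_deadlock(thread)
  thread.push_frame(:make_exception)
  :deadlock_reported
end

# Order before the patch: release first, then check.
def finish_old_order(thread)
  thread.release_stack
  check_deadlock(thread)
end

# Order after the patch: check first, then release.
def finish_new_order(thread)
  result = check_deadlock(thread)
  thread.release_stack
  result
end
```

In the toy model the old order fails with an ordinary Ruby exception. In the C code the equivalent mistake touches freed memory, which would be consistent with the intermittent SEGV reported here.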

#5

Updated by nagachika (Tomoyuki Chikanaga) over 8 years ago

=begin
Hi,
I can still reproduce this segv on trunk(r30329) and also on 1.9.2-HEAD(r30326).
Please check the previous patch, thanks.
=end

#6

Updated by nanodeath (Max Aller) over 8 years ago

=begin
Hit this on Ruby 1.9.3dev (2011-01-18 trunk 30590) [i686-linux] again. Tried applying the above patch (had to improvise regarding line numbers a little) but didn't seem to have any effect. The first two times I ran my script again it segfaulted as described in the original ticket, but the third time it hung with "*** glibc detected *** ruby: corrupted double-linked list: 0x0973f110 ***" immediately after the "C level backtrace information" header and I had to kill -9 it.
=end

#7

Updated by nanodeath (Max Aller) over 8 years ago

=begin
Good news, I have managed to greatly reduce the failing code, which will hopefully make it easier to figure out. It's attached.
=end

#8

Updated by nagachika (Tomoyuki Chikanaga) over 8 years ago

=begin
Hi,
The reduced sample code makes this quicker to examine. Thank you :)
I can also reproduce the segv in my Linux environment with ruby 1.9.3dev (2011-01-18 trunk 30590) [i686-linux], for both 'proof_of_segfault.rb' and 'proof_of_segfault_small.rb'.
But my previous patch seems to be effective in my environment.
Hmm, there may be other potential problems. I'll check with valgrind later.
=end

#9

Updated by nagachika (Tomoyuki Chikanaga) over 8 years ago

=begin
Hi,
I've checked again with valgrind and got no additional problem reports.
Sorry I can't hel

I noticed that, according to 'proof_of_segfault_output', your ruby appears to be built with the --enable-shared configuration.
If you retried with my patch as shown below, ./ruby could have been dynamically linked against the installed ~/.rvm/rubies/ruby-head/lib/libruby.so.1.9 rather than the freshly built library.

(in building directory. after apply patch)
% make
% ./ruby proof_of_segfault.rb

If so, how about a command line like the one below?

% LD_LIBRARY_PATH= ./ruby proof_of_segfault.rb
=end
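
One way to check the build situation described in #9 from inside Ruby is to look at `RbConfig::CONFIG`. This is a minimal sketch: `libruby_link_info` is a hypothetical helper name. It reports how the interpreter was configured and where the installed shared library would be. It does not show which `libruby` the running process actually loaded.

```ruby
require "rbconfig"

# Hypothetical helper: summarizes the build configuration relevant to
# --enable-shared. It reads configuration values only and does not
# inspect the libraries actually mapped into the process.
def libruby_link_info(config = RbConfig::CONFIG)
  shared = config["ENABLE_SHARED"] == "yes"
  {
    enable_shared: shared,
    libruby_so: shared ? File.join(config["libdir"].to_s, config["LIBRUBY_SO"].to_s) : nil,
    ld_library_path: ENV["LD_LIBRARY_PATH"]
  }
end

info = libruby_link_info
puts "built with --enable-shared: #{info[:enable_shared]}"
puts "installed shared library:   #{info[:libruby_so].inspect}"
puts "LD_LIBRARY_PATH:            #{info[:ld_library_path].inspect}"
```

If `enable_shared` is true, a freshly built `./ruby` may still resolve `libruby` from the installed location, which is the situation the comment above asks about.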

#10

Updated by nanodeath (Max Aller) over 8 years ago

=begin
Tomoyuki, I suspect you were right regarding the linking situation -- so I applied your patch to my .rvm/repos/ruby-head path and ran rvm --static install ruby-head (which, interestingly, does not perform a git pull, so I did that manually; it just copies from repos/ to src/ and configures/builds/installs), and...presto, no more segfault! Get the deadlock detected error instead, which I think is the desired behavior. I even raised/lowered the worker count, still deadlocks.

To be thorough, I also tried running my sample with stock ruby-head, and it did still have the bug, so I think your patch is, at the very least, a functional solution.

Nice work.
=end

#11

Updated by mame (Yusuke Endoh) over 8 years ago

=begin
Hi,

2010/11/17 Tomoyuki Chikanaga redmine@ruby-lang.org:

> BTW I'm not too confident in this patch. Please review it.
>
> Index: thread.c
> ===================================================================
> --- thread.c    (revision 29809)
> +++ thread.c    (working copy)
> @@ -507,13 +507,14 @@
>                  join_th = join_th->join_list_next;
>              }
>
> +            thread_unlock_all_locking_mutexes(th);
> +            if (th != main_th) rb_check_deadlock(th->vm);
> +
>              if (!th->root_fiber) {
>                  rb_thread_recycle_stack_release(th->stack);
>                  th->stack = 0;
>              }
>          }
> -    thread_unlock_all_locking_mutexes(th);
> -    if (th != main_th) rb_check_deadlock(th->vm);
>      if (th->vm->main_thread == th) {
>          ruby_cleanup(state);
>      }

Looks good. Nice work!

--
Yusuke Endoh mame@tsg.ne.jp

=end

#12

Updated by nagachika (Tomoyuki Chikanaga) over 8 years ago

=begin
Hi,

Max san, thank you for checking my patch again. I'm relieved to hear that it works fine.

Endoh san, thank you for reviewing.
I'll check in the patch later if there is no opposition.
=end

#13

Updated by nagachika (Tomoyuki Chikanaga) over 8 years ago

  • Status changed from Open to Closed
  • % Done changed from 0 to 100

=begin
This issue was solved with changeset r30743.
Max, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.


  • thread.c (thread_start_func_2): check deadlock condition before release thread stack. fix memory violation when deadlock detected. reported by Max Aller. [Bug #4009] [ruby-core:32982]
=end

#14

Updated by nagachika (Tomoyuki Chikanaga) over 8 years ago

  • Status changed from Closed to Assigned
  • Assignee set to yugui (Yuki Sonoda)

=begin
Please backport r30743 to 1.9.2
=end

#15

Updated by nagachika (Tomoyuki Chikanaga) almost 8 years ago

  • Status changed from Assigned to Closed

This issue was solved with changeset r30743.
Max, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.


  • thread.c (thread_start_func_2): check deadlock condition before release thread stack. fix memory violation when deadlock detected. reported by Max Aller. [Bug #4009] [ruby-core:32982]
