Backport #4009

closed

Segfault with combination of threads and condition variables

Added by nanodeath (Max Aller) over 13 years ago. Updated over 12 years ago.

Status:
Closed
[ruby-core:32982]

Description

=begin
When I run the attached program, I get a segfault. If I change some of the values inside what I've designated as the "delayed adder" thread (namely, the number of jobs that get added, or how long it sleeps), I get "fatal: deadlock detected" instead, which is fine. But with the given settings it segfaults routinely on my laptop.

I've already trimmed down the example as much as I could, but I realize it's a bit long, still. It's apparently important that the "delayed adder" thread exists at all, potentially "putting off" the deadlock detection somehow.
=end
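
For readers without the attachment, the "fatal: deadlock detected" outcome that the reporter calls the expected behavior can be sketched as follows. This is not the attached proof_of_segfault.rb; the `worker`, `jobs`, and `deadlock_error` names are made up for illustration, and the exact error message differs between Ruby versions.

```ruby
require "thread"

mutex = Mutex.new
cond  = ConditionVariable.new
jobs  = [] # hypothetical job list that nobody ever fills

# Hypothetical worker: waits on the condition variable for a job.
worker = Thread.new do
  mutex.synchronize do
    cond.wait(mutex) while jobs.empty?
  end
end

deadlock_error = nil
begin
  # The main thread waits for the worker, so every thread now sleeps
  # forever and the VM's deadlock check is expected to raise in main.
  worker.join
rescue Exception => e # `fatal` is not a StandardError
  deadlock_error = e
end

worker.kill
puts "main thread got: #{deadlock_error.class}"
```

The bug discussed in this ticket is about the same deadlock check crashing instead of raising this error when it runs while a thread is finishing.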


Files

proof_of_segfault.rb (955 Bytes): program that segfaults. nanodeath (Max Aller), 11/01/2010 02:43 AM
proof_of_segfault_output (12.9 KB): output from stack trace. nanodeath (Max Aller), 11/01/2010 02:43 AM
proof_of_segfault_small.rb (427 Bytes). nanodeath (Max Aller), 01/18/2011 01:11 PM

Actions #1

Updated by nanodeath (Max Aller) over 13 years ago

=begin
Note: this also happens with ruby-1.9.2-p0. The provided program doesn't terminate at all using ruby-1.8.7-p302.
=end

Actions #2

Updated by naruse (Yui NARUSE) over 13 years ago

=begin
I can't reproduce this.
Can you show gdb backtrace?

ruby 1.9.3dev (2010-11-09 trunk 29733) [i686-linux]
1277:12: warning: assigned but unused variable - job
/home/naruse/local/ruby/lib/ruby/1.9.1/thread.rb:71:in `sleep': deadlock detected (fatal)
        from /home/naruse/local/ruby/lib/ruby/1.9.1/thread.rb:71:in `wait'
        from /home/naruse/local/ruby/lib/ruby/1.9.1/monitor.rb:100:in `wait'
        from /home/naruse/local/ruby/lib/ruby/1.9.1/monitor.rb:121:in `wait_until'
        from 1277:49:in `block in <main>'
        from /home/naruse/local/ruby/lib/ruby/1.9.1/monitor.rb:201:in `mon_synchronize'
        from 1277:48:in `<main>'

=end

Actions #3

Updated by nagachika (Tomoyuki Chikanaga) over 13 years ago

=begin
Hi
I can reproduce a similar SEGV under gdb with "ruby 1.9.3dev (2010-11-16 trunk 29789) [i686-linux]":

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread -1222669392 (LWP 14762)]
0x0813df46 in vm_call0 (th=0x9384238, recv=153972280, id=2512, argc=1,
argv=0xb71f8948, me=0x92e8a68) at vm_eval.c:75
75 vm_push_frame(th, 0, VM_FRAME_MAGIC_CFUNC,
(gdb) where
#0 0x0813df46 in vm_call0 (th=0x9384238, recv=153972280, id=2512, argc=1,
argv=0xb71f8948, me=0x92e8a68) at vm_eval.c:75
#1 0x0813e711 in check_funcall (recv=153972280, mid=2512, argc=1,
argv=0xb71f8948) at vm_eval.c:290
#2 0x0813e73e in rb_check_funcall (recv=153972280, mid=2512, argc=1,
argv=0xb71f8948) at vm_eval.c:296
#3 0x08059e05 in make_exception (argc=2, argv=0xb71f8944, isstr=1)
at eval.c:552
#4 0x08059ebb in rb_make_exception (argc=2, argv=0xb71f8944) at eval.c:574
#5 0x081484e8 in rb_threadptr_raise (th=0x92a7568, argc=2, argv=0xb71f8944)
at thread.c:1350
#6 0x0814c31a in rb_check_deadlock (vm=0x92a72e0) at thread.c:4465
#7 0x08147286 in thread_start_func_2 (th=0x9384238, stack_start=0xb71f8a78)
at thread.c:516
#8 0x08146356 in thread_start_func_1 (th_ptr=0x9384238)
at thread_pthread.c:361
#9 0x00666dd8 in start_thread () from /lib/tls/libpthread.so.0
#10 0x00f83d1a in clone () from /lib/tls/libc.so.6

th->cfp seems to point to an invalid address:

(gdb) p reg_cfp->sp
Cannot access memory at address 0xb7278fe0
(gdb) p reg_cfp
$9 = (rb_control_frame_t *) 0xb7278fdc
(gdb) p *reg_cfp
Cannot access memory at address 0xb7278fdc
(gdb) p th->cfp
$10 = (rb_control_frame_t *) 0xb7278fdc

=end

Actions #4

Updated by nagachika (Tomoyuki Chikanaga) over 13 years ago

=begin
I think the following patch fixes this situation.
Max, could you try this patch?

BTW I'm not too confident in this patch. Please review it.

Index: thread.c

--- thread.c (revision 29809)
+++ thread.c (working copy)
@@ -507,13 +507,14 @@
             join_th = join_th->join_list_next;
         }
 
+        thread_unlock_all_locking_mutexes(th);
+        if (th != main_th) rb_check_deadlock(th->vm);
+
         if (!th->root_fiber) {
             rb_thread_recycle_stack_release(th->stack);
             th->stack = 0;
         }
     }
-    thread_unlock_all_locking_mutexes(th);
-    if (th != main_th) rb_check_deadlock(th->vm);
     if (th->vm->main_thread == th) {
         ruby_cleanup(state);
     }

=end

Actions #5

Updated by nagachika (Tomoyuki Chikanaga) over 13 years ago

=begin
Hi,
I can still reproduce this segv on trunk(r30329) and also on 1.9.2-HEAD(r30326).
Please check the previous patch, thanks.
=end

Actions #6

Updated by nanodeath (Max Aller) over 13 years ago

=begin
Hit this again on Ruby 1.9.3dev (2011-01-18 trunk 30590) [i686-linux]. I tried applying the above patch (I had to improvise a little regarding line numbers), but it didn't seem to have any effect. The first two times I ran my script it segfaulted as described in the original ticket, but the third time it hung with "*** glibc detected *** ruby: corrupted double-linked list: 0x0973f110 ***" immediately after the "C level backtrace information" header, and I had to kill -9 it.
=end

Actions #7

Updated by nanodeath (Max Aller) over 13 years ago

=begin
Good news, I have managed to greatly reduce the failing code, which will hopefully make it easier to figure out. It's attached.
=end

Actions #8

Updated by nagachika (Tomoyuki Chikanaga) about 13 years ago

=begin
Hi,
The reduced sample code makes this much quicker to examine. Thank you :)
I can also reproduce the segv in my Linux environment with ruby 1.9.3dev (2011-01-18 trunk 30590) [i686-linux], for both 'proof_of_segfault.rb' and 'proof_of_segfault_small.rb'.
But my previous patch seems to be effective in my environment.
Hmm, there may be other potential problems. I'll check with valgrind later.
=end

Actions #9

Updated by nagachika (Tomoyuki Chikanaga) about 13 years ago

=begin
Hi,
I've checked again with valgrind and get no extra problem report.
Sorry I can't hel

I have noticed that, according to 'proof_of_segfault_output', your ruby seems to be built with the --enable-shared configuration.
If you retried with my patch as below, the new binary could have been dynamically linked against the installed ~/.rvm/rubies/ruby-head/lib/libruby.so.1.9 instead of the patched one:

(in the build directory, after applying the patch)
% make
% ./ruby proof_of_segfault.rb

If so, how about a command like the one below?

% LD_LIBRARY_PATH=<building_directory> ./ruby proof_of_segfault.rb
=end
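
As a rough way to check the linking situation described above, the interpreter's build configuration can be inspected from Ruby itself. This is only a sketch: it reads the standard `RbConfig::CONFIG` keys `ENABLE_SHARED`, `LIBRUBY_SO`, and `libdir`, and it reports what the interpreter was configured with, not which library the dynamic loader actually picked at run time. An uninstalled ./ruby in the build directory may need extra load-path options to find rbconfig.

```ruby
require "rbconfig"

# "yes" when the interpreter was configured with --enable-shared.
enable_shared = RbConfig::CONFIG["ENABLE_SHARED"]
# File name of the shared libruby and the directory it is installed into.
libruby_so = RbConfig::CONFIG["LIBRUBY_SO"]
libdir = RbConfig::CONFIG["libdir"]

puts "ENABLE_SHARED: #{enable_shared.inspect}"
if enable_shared == "yes"
  puts "installed libruby: #{File.join(libdir, libruby_so)}"
  # If this is unset, a freshly built ./ruby may load the installed library.
  puts "LD_LIBRARY_PATH: #{ENV["LD_LIBRARY_PATH"].inspect}"
end
```

If `ENABLE_SHARED` is "yes" and `LD_LIBRARY_PATH` does not point at the build directory, a patched build could still be picking up the unpatched installed library.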

Actions #10

Updated by nanodeath (Max Aller) about 13 years ago

=begin
Tomoyuki, I suspect you were right regarding the linking situation -- so I applied your patch to my .rvm/repos/ruby-head path and ran rvm --static install ruby-head (which, interestingly, does not perform a git pull, so I did that manually; it just copies from repos/ to src/ and configures/builds/installs), and...presto, no more segfault! Get the deadlock detected error instead, which I think is the desired behavior. I even raised/lowered the worker count, still deadlocks.

To be thorough, I also tried running my sample with stock ruby-head, and it did still have the bug, so I think your patch is, at the very least, a functional solution.

Nice work.
=end

Actions #11

Updated by mame (Yusuke Endoh) about 13 years ago

=begin
Hi,

2010/11/17 Tomoyuki Chikanaga :

> BTW I'm not too confident in this patch. Please review it.
>
> Index: thread.c
>
> --- thread.c    (revision 29809)
> +++ thread.c    (working copy)
> @@ -507,13 +507,14 @@
>              join_th = join_th->join_list_next;
>          }
>  
> +        thread_unlock_all_locking_mutexes(th);
> +        if (th != main_th) rb_check_deadlock(th->vm);
> +
>          if (!th->root_fiber) {
>              rb_thread_recycle_stack_release(th->stack);
>              th->stack = 0;
>          }
>      }
> -    thread_unlock_all_locking_mutexes(th);
> -    if (th != main_th) rb_check_deadlock(th->vm);
>      if (th->vm->main_thread == th) {
>          ruby_cleanup(state);
>      }

Looks good. Nice work!

--
Yusuke Endoh

=end

Actions #12

Updated by nagachika (Tomoyuki Chikanaga) about 13 years ago

=begin
Hi,

Max san, thank you for checking my patch again. I'm relieved to hear that it works fine.

Endoh san, thank you for reviewing.
I'll check in the patch later if there is no opposition.
=end

Actions #13

Updated by nagachika (Tomoyuki Chikanaga) about 13 years ago

  • Status changed from Open to Closed
  • % Done changed from 0 to 100

=begin
This issue was solved with changeset r30743.
Max, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.


  • thread.c (thread_start_func_2): check deadlock condition before
    release thread stack. fix memory violation when deadlock detected.
    reported by Max Aller. [Bug #4009] [ruby-core:32982]
=end

Actions #14

Updated by nagachika (Tomoyuki Chikanaga) about 13 years ago

  • Status changed from Closed to Assigned
  • Assignee set to yugui (Yuki Sonoda)

=begin
Please backport r30743 to 1.9.2
=end

Actions #15

Updated by nagachika (Tomoyuki Chikanaga) over 12 years ago

  • Status changed from Assigned to Closed
