Bug #16809

closed

Fiber crashes with --with-coroutine=copy

Added by ncopa (Natanael Copa) almost 4 years ago. Updated almost 3 years ago.

Status:
Closed
Target version:
-
ruby -v:
ruby 2.7.1p83 (2020-03-31 revision a0c7c23c9c) [s390x-linux-musl]
[ruby-core:98026]

Description

./revision.h unchanged                                                                                                  
#190 test_fiber.rb:15:in `<top (required)>':                                                                            
     Fiber.new{                                                                                                         
     }.resume                                                                                                           
     :ok                                                    
  #=> "" (expected "ok")                                                                                                
#192 test_fiber.rb:26:in `<top (required)>':                                                                            
     fibers = 100.times.collect{Fiber.new{Fiber.yield}}                                                                 
     fibers.each(&:resume)                                                                                              
     fibers.each(&:resume)                                                                                              
     :ok                                                                                                                
  #=> "" (expected "ok")     
#193 test_fiber.rb:33:in `<top (required)>':                                                                            
     at_exit { Fiber.new{}.resume }                                                                                     
  #=> killed by SIGFPE (signal 8)                           
#194 test_fiber.rb:37:in `<top (required)>':                                                                            
     Fiber.new(&Object.method(:class_eval)).resume("foo")                                                               
  #=> killed by SIGFPE (signal 8)  [ruby-dev:34128]                                                                     
test_fiber.rb           FAIL 4/5                                                                                        
#934 test_massign.rb:165:in `<top (required)>':                                                                         
     a,s=[],"aaa"                                                                                                       
     300.times { a<<s; s=s.succ }                                                                                       
     eval <<-END__                                                                                                      
     GC.stress=true                                         
     Fiber.new do                                                                                                       
       #{ a.join(",") },*zzz=1                                                                                          
     end.resume                                                                                                         
     END__                                                                                                              
     :ok                                                                                                                
  #=> "" (expected "ok")  [ruby-dev:32581]                                                                              
test_massign.rb         FAIL 1/34                                                                                       
#1391 test_thread.rb:310:in `<top (required)>':                                                                         
     g = enum_for(:local_variables)                         
     loop { g.next }                                                                                                    
  #=> killed by SIGFPE (signal 8)  [ruby-dev:34128]                                                                     
#1392 test_thread.rb:315:in `<top (required)>':                                                                         
     g = enum_for(:block_given?)                                                                                        
     loop { g.next }                                                                                                    
  #=> killed by SIGFPE (signal 8)  [ruby-dev:34128]                                                                     
#1393 test_thread.rb:320:in `<top (required)>':                                                                         
     g = enum_for(:binding)                                                                                             
     loop { g.next }                                                                                                    
  #=> killed by SIGFPE (signal 8)  [ruby-dev:34128]                                                                     
#1394 test_thread.rb:325:in `<top (required)>':                                                                         
     g = "abc".enum_for(:scan, /./)                                                                                     
     loop { g.next }                                                                                                    
  #=> killed by SIGFPE (signal 8)  [ruby-dev:34128]                                                                     
#1395 test_thread.rb:330:in `<top (required)>':                                                                         
     g = Module.enum_for(:new)                              
     loop { g.next }                                                                                                    
  #=> killed by SIGFPE (signal 8)  [ruby-dev:34128]                                                                     
test_thread.rb          FAIL 5/48                                                                                       
                                                                                                                        
Thread count: 10000 (skipping)                              
FAIL 10/1409 tests failed                                                                                               
make: *** [uncommon.mk:751: yes-btest-ruby] Error 1

May be related to this warning:

compiling coroutine/copy/Context.c                          
coroutine/copy/Context.c: In function 'coroutine_restore_stack_padded':                                                 
coroutine/copy/Context.c:87:34: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
   87 |     _longjmp(context->state, 1 | (int)buffer);      
      |             
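
For context on that warning, here is a minimal sketch of what a pointer-to-int cast of a different size does. The helpers `through_int`, `survives_int_cast`, and `address_above_int_range` are hypothetical names invented for this illustration and are not part of coroutine/copy/Context.c. The sketch assumes a compiler that keeps the low bits on an out-of-range conversion, as GCC and Clang do. It only shows the truncation the compiler is warning about; it does not establish that this is the cause of the failures above.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper (not from Context.c): pass a pointer through an int,
 * the same narrowing that the flagged expression `(int)buffer` performs.
 * The out-of-range conversion is implementation-defined; GCC and Clang
 * keep the low bits. */
uintptr_t through_int(void *p)
{
    int narrowed = (int)(uintptr_t)p;
    return (uintptr_t)(unsigned int)narrowed;
}

/* Returns 1 when the pointer value is unchanged by the round trip,
 * 0 when high bits were lost. */
int survives_int_cast(void *p)
{
    return through_int(p) == (uintptr_t)p;
}

/* Hypothetical helper: build an address with one bit set just above the
 * range of int, or return a null pointer when pointers are no wider
 * than int. The address is never dereferenced. */
void *address_above_int_range(void)
{
#if UINTPTR_MAX > UINT_MAX
    return (void *)(((uintptr_t)1 << (sizeof(int) * CHAR_BIT)) | 0x10u);
#else
    return 0;
#endif
}
```

On an LP64 target such as s390x, where int is 32 bits and pointers are 64 bits, `survives_int_cast(address_above_int_range())` returns 0 because only the low 32 bits of the address come back. Casting through `uintptr_t` or `intptr_t` instead of `int` is the usual way to keep the full pointer value.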

Updated by puchuu (Andrew Aladjev) over 3 years ago

I've tested the copy coroutine implementation. Unfortunately, it is currently completely broken: hangs, segfaults, etc.

Updated by jeremyevans0 (Jeremy Evans) over 3 years ago

  • Subject changed from ruby testsuite fails on s390x alpine (musl) with --with-coroutine=copy to Fiber crashes with --with-coroutine=copy
  • Status changed from Open to Assigned
  • Assignee set to ioquatix (Samuel Williams)

OpenBSD/sparc64 (which uses the copy coroutine) is similarly broken with regard to fibers. Even something as simple as ruby27 -e 'Fiber.new{Fiber.yield}.resume' crashes (ruby26 runs this fine). I'm changing the title to be more general, since this does not affect only s390x alpine (musl).

Updated by jeremyevans0 (Jeremy Evans) over 3 years ago

It looks like sometimes the copy coroutine implementation can segfault even on x86_64: https://travis-ci.org/github/ruby/ruby/jobs/729643639

Updated by ioquatix (Samuel Williams) over 3 years ago

This might be a pointer alignment issue or a problem with the alloca elision.

After playing around with godbolt compiler explorer, I think this might be one option:

https://github.com/ruby/ruby/pull/3624

However, I wouldn't be surprised if it doesn't solve the issue.

Updated by jeremyevans0 (Jeremy Evans) over 3 years ago

I tried pull request #3624 on OpenBSD/sparc64 and it still crashed.

I was able to come up with a fix that works on OpenBSD/sparc64, as long as a couple files are compiled without optimization: https://github.com/ruby/ruby/pull/3726

Updated by ioquatix (Samuel Williams) over 3 years ago

I think we found the root cause of this, and it should be addressed by:

https://github.com/ruby/ruby/pull/3624/commits/9de559acc82a28bb0d912ed55cd36cf6f652ea9f

However, @jeremyevans0 (Jeremy Evans) is still testing it.

Updated by ioquatix (Samuel Williams) over 3 years ago

  • Status changed from Assigned to Closed

Updated by ioquatix (Samuel Williams) over 3 years ago

  • Backport changed from 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN to 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: REQUIRED

@jeremyevans0 (Jeremy Evans) can you manage the backport? Or who is responsible?

This commit (and only this commit) should be backported: https://github.com/ruby/ruby/pull/3624/commits/440983fa9e7695d83def190e9701b5a22e076495

Updated by jeremyevans0 (Jeremy Evans) over 3 years ago

  • Backport changed from 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: REQUIRED to 2.5: DONTNEED, 2.6: DONTNEED, 2.7: REQUIRED

ioquatix (Samuel Williams) wrote in #note-8:

@jeremyevans0 (Jeremy Evans) can you manage the backport? Or who is responsible?

The branch maintainer is responsible. For 2.7, that is currently @nagachika (Tomoyuki Chikanaga).

I updated the backport flag to indicate this is only needed by 2.7 and not earlier versions.

Updated by nagachika (Tomoyuki Chikanaga) almost 3 years ago

  • Backport changed from 2.5: DONTNEED, 2.6: DONTNEED, 2.7: REQUIRED to 2.5: DONTNEED, 2.6: DONTNEED, 2.7: DONE

ruby_2_7 d84cc717020be1da7d89b6bda02d1427f9593968 merged revision(s) 15e23312f6abcbf1afc6fbbf7917a57a0637f680.
