Bug #17618
Exceptions in Fiber Scheduler causes a segv
Description
If the fiber scheduler doesn't define an unblock function, Ruby will segv when threads are joined.
Here is an example program:
class Scheduler
  def block blocker, timeout = nil
  end

  def fiber &block
    fiber = Fiber.new blocking: false, &block
    fiber.resume
    fiber
  end
end

Fiber.set_scheduler Scheduler.new

Fiber.schedule do
  Thread.new { }.join
end
The backtrace looks like this:
(lldb) bt
* thread #3, name = 'test.rb:17', stop reason = EXC_BAD_ACCESS (code=1, address=0xb0)
frame #0: 0x00000001000dc49a miniruby`rb_ec_tag_jump(ec=0x0000000100a2ec50, st=RUBY_TAG_RAISE) at eval_intern.h:185:20
frame #1: 0x00000001000dbda7 miniruby`rb_longjmp(ec=0x0000000100a2ec50, tag=6, mesg=0x000000010101b3f8, cause=0x0000000000000008) at eval.c:699:5
frame #2: 0x00000001000dbb9c miniruby`rb_exc_raise(mesg=0x000000010101b3f8) at eval.c:717:5
frame #3: 0x000000010037446c miniruby`raise_method_missing(ec=0x0000000100a2ec50, argc=3, argv=0x000070000e6d39e0, obj=0x000000010101b8d0, last_call_status=MISSING_MISSING) at vm_eval.c:955:2
frame #4: 0x0000000100374288 miniruby`method_missing(ec=0x0000000100a2ec50, obj=0x000000010101b8d0, id=24721, argc=3, argv=0x000070000e6d39e0, call_status=MISSING_NOENTRY, kw_splat=0) at vm_eval.c:1002:5
frame #5: 0x0000000100385fdd miniruby`rb_call0(ec=0x0000000100a2ec50, recv=0x000000010101b8d0, mid=24721, argc=2, argv=0x000070000e6d3be0, call_scope=CALL_FCALL, self=0x0000000000000008) at vm_eval.c:515:20
frame #6: 0x0000000100358a02 miniruby`rb_funcallv_scope(recv=0x000000010101b8d0, mid=24721, argc=2, argv=0x000070000e6d3be0, scope=CALL_FCALL) at vm_eval.c:1021:16
frame #7: 0x0000000100354c71 miniruby`rb_funcallv(recv=0x000000010101b8d0, mid=24721, argc=2, argv=0x000070000e6d3be0) at vm_eval.c:1038:12
frame #8: 0x000000010035921d miniruby`rb_funcall(recv=0x000000010101b8d0, mid=24721, n=2) at vm_eval.c:1109:12
* frame #9: 0x0000000100291d23 miniruby`rb_fiber_scheduler_unblock(scheduler=0x000000010101b8d0, blocker=0x000000010107bd70, fiber=0x000000010101b768) at scheduler.c:142:12
frame #10: 0x00000001002f1445 miniruby`rb_threadptr_join_list_wakeup(thread=0x0000000100a2e9b0) at thread.c:555:13
frame #11: 0x00000001002f0fd5 miniruby`thread_start_func_2(th=0x0000000100a2e9b0, stack_start=0x000070000e7d3f70) at thread.c:891:9
frame #12: 0x00000001002f07b5 miniruby`thread_start_func_1(th_ptr=0x0000000100a2e9b0) at thread_pthread.c:1033:9
frame #13: 0x00007fff2043a950 libsystem_pthread.dylib`_pthread_start + 224
frame #14: 0x00007fff2043647b libsystem_pthread.dylib`thread_start + 15
It seems like the ec is missing a tag:
(lldb) f 0
frame #0: 0x00000001000dc49a miniruby`rb_ec_tag_jump(ec=0x0000000100a2ec50, st=RUBY_TAG_RAISE) at eval_intern.h:185:20
182 static inline void
183 rb_ec_tag_jump(const rb_execution_context_t *ec, enum ruby_tag_type st)
184 {
-> 185 ec->tag->state = st;
186 ruby_longjmp(ec->tag->buf, 1);
187 }
188
(lldb) p ec->tag
(rb_vm_tag *const) $1 = 0x0000000000000000
(lldb)
I tried popping the tag later in thread_start_func_2, but that caused the process to go into an infinite loop.
Updated by alanwu (Alan Wu) almost 4 years ago
Just some observations in case it's useful. Implementing unblock in the scheduler and printing out the current thread shows that unblock runs on a dead thread:
class Scheduler
  def block blocker, timeout = nil
  end

  def unblock a, b
    p Thread.current
  end

  def fiber &block
    fiber = Fiber.new blocking: false, &block
    fiber.resume
    fiber
  end
end

Fiber.set_scheduler Scheduler.new

Fiber.schedule do
  Thread.new { }.join
end
ruby 3.1.0dev (2021-02-09T22:47:36Z master 49d3830f44) [x86_64-darwin19]
#<Thread:0x00007fee4d81b490 test.rb:20 dead>
It doesn't seem right to run Ruby code on a dead thread.
Also, raising any exception in the unblock method will cause a SEGV. For example:
class Scheduler
  def block blocker, timeout = nil
  end

  def unblock a, b
    raise
  end

  def fiber &block
    fiber = Fiber.new blocking: false, &block
    fiber.resume
    fiber
  end
end

Fiber.set_scheduler Scheduler.new

Fiber.schedule do
  Thread.new { }.join
end
Updated by ioquatix (Samuel Williams) almost 4 years ago
My initial reaction is that a scheduler without unblock is broken by design, and it's the dead thread that invokes unblock as part of its tidy-up, which in other cases will wake up other threads. I don't have any strong opinion about it, except that a thread that transitions to dead is then able to notify others that join can proceed.
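To illustrate the point that a scheduler without unblock is incomplete, here is a minimal sketch of a check that could be run before installing a scheduler. The helper name missing_scheduler_hooks, the constant REQUIRED_HOOKS and the class name IncompleteScheduler are hypothetical and not part of Ruby's API. The list of hooks contains only the two methods named in this report and may not be the full interface a given Ruby version expects.

```ruby
# Hypothetical helper, not part of Ruby's API. It only knows about the two
# hooks discussed in this report; a real scheduler may need more.
REQUIRED_HOOKS = %i[block unblock].freeze

# Returns the hooks from REQUIRED_HOOKS that the object does not respond to.
def missing_scheduler_hooks(scheduler)
  REQUIRED_HOOKS.reject { |hook| scheduler.respond_to?(hook) }
end

# Same shape as the scheduler in the report: block is defined, unblock is not.
class IncompleteScheduler
  def block(blocker, timeout = nil)
  end

  def fiber(&block)
    fiber = Fiber.new(blocking: false, &block)
    fiber.resume
    fiber
  end
end

missing = missing_scheduler_hooks(IncompleteScheduler.new)
warn "scheduler is missing: #{missing.join(', ')}" unless missing.empty?
```

A check like this only reports the problem up front. It does not change what happens when unblock itself raises on the finishing thread.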
Updated by ioquatix (Samuel Williams) over 3 years ago
I found the reason for this and I have made a PR which I think addresses this. I'll use this as a test case.
Updated by ioquatix (Samuel Williams) over 3 years ago
Okay, now rather than a SEGV, I get an unlimited number of
undefined method `unblock' for #<Scheduler:0x000000010a1b1fb0> (NoMethodError)
which I think is at least somewhat better. So I'll merge the PR.
Updated by jeremyevans0 (Jeremy Evans) over 3 years ago
- Status changed from Open to Closed