Project

General

Profile

Actions

Bug #4612

closed

Segmentation fault in fiber GC mark cycle

Added by serge_balyuk (Serge Balyuk) over 11 years ago. Updated about 11 years ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Target version:
-
ruby -v:
ruby 1.9.2p188 (2011-03-28 revision 31204) [x86_64-darwin10.7.0]
Backport:
[ruby-core:35891]

Description

=begin
((|Fiber.current|)) can cause segfault on GC cycle when used in threads. Please find attached ruby sample which should help to pinpoint the problem.

The coredump shows the following backtrace:

....
#18
#19 0x000000010004cce5 in mark_locations_array (objspace=0x100838000, x=0x101dc8ab0, n=649) at gc.c:1315
#20 0x000000010004cea6 in gc_mark_locations (objspace=0x100838000, start=0x101dc8ab0, end=0x101dc9f00) at gc.c:1331
#21 0x000000010004f3dc in rb_gc_mark_machine_stack (th=0x1008a1048) at gc.c:2235
#22 0x0000000100177a4f in rb_thread_mark (ptr=0x1008a1048) at vm.c:1683
#23 0x000000010018179a in cont_mark (ptr=0x1008a1000) at cont.c:88
#24 0x0000000100181947 in fiber_mark (ptr=0x1008a1000) at cont.c:168
#25 0x000000010004dbb9 in gc_mark_children (objspace=0x100838000, ptr=4303907200, lev=1) at gc.c:1719
#26 0x000000010004d3fb in gc_mark (objspace=0x100838000, ptr=4303907200, lev=0) at gc.c:1514
#27 0x000000010004d428 in rb_gc_mark (ptr=4303907200) at gc.c:1520
#28 0x000000010017797f in rb_thread_mark (ptr=0x10049fa00) at vm.c:1673
....

It looks like in frame 28 ((|rb_thread_mark|)) is called on terminated thread, which is ok, but then it goes down to fiber's continuation member ((|saved_thread|)) in frame 22. I think saved_thread holds an earlier snapshot (still running) of the same thread that we see in 28 (thread_id are equal), and because its machine stack pointers are stale, ((|rb_gc_mark_machine_stack|)) starts marking inaccessible memory.

One possible quick-fix is to set machine stack pointers to 0 in ((|cont_init()|)), given the original thread will take care of that stuff and free as needed. It cures segfaults, but I wonder if that doesn't break some other code.

I was also wondering why continuation holds a copy of thread struct instead of a pointer to it. It's hard to correctly follow real thread life cycle with ((|saved_thread|)). So can it harm in other cases like the above?

BTW, while following the code around fibers and continuations, I've found another curious thing: ((|fiber_free()|)) calls ((|cont_free(&fib->cont)|)) on its cont member, and ((|cont_free()|)) calls ((|ruby_xfree()|)) on it. Is that ok, given cont was allocated as part of fiber, and don't we need to ((|ruby_xfree|)) the ((|fib|)) itself?

The attached patch is made against ruby_1_9_2 branch; trunk seems to have the segfault behavior too.

=end


Files

test.rb (158 Bytes) test.rb test sample serge_balyuk (Serge Balyuk), 04/26/2011 03:13 AM
fiber-gc-fix.diff (377 Bytes) fiber-gc-fix.diff proposed patch serge_balyuk (Serge Balyuk), 04/26/2011 03:13 AM

Updated by serge_balyuk (Serge Balyuk) about 11 years ago

I'd like to bring attention to the issue again. Can I get some feedback? Maybe I haven't set some important property like target version or category, but unfortunately I can't change those now, given current permissions.

Updated by nagachika (Tomoyuki Chikanaga) about 11 years ago

Hi,

Sorry for late reply, and thank you for your reporting bug :)

I've checked I can reproduce segv and your patch fix the problem.
And make test-all reports no extra error.
I think your patch is reasonable. I'll check in it to trunk.

BTW, while following the code around fibers and continuations,
I've found another curious thing: fiber_free() calls cont_free(&fib->cont)
on its cont member, and cont_free() calls ruby_xfree() on it. Is that ok,
given cont was allocated as part of fiber, and don't we need to ruby_xfree the fib itself?
'cont' is a first member of structure 'rb_fiber_t', and
(void *)(&fib->cont) == (void *)&fib
That's why we don't need (in fact, must not) to call ruby_xfree() to fib itself.

Updated by serge_balyuk (Serge Balyuk) about 11 years ago

Awesome, thank you very much!

And thanks for the explanation on fiber deallocation, now it makes much more sense to me.

Actions #4

Updated by nagachika (Tomoyuki Chikanaga) about 11 years ago

  • Status changed from Open to Closed
  • % Done changed from 0 to 100

This issue was solved with changeset r31577.
Serge, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.


  • cont.c (cont_init): clear macihne_stack_start/end of saved thread to
    prevent mark machine stack of GC'ed Thread. root Fiber is not initialized by
    fiber_init(). based on a patch by Serge Balyuk [ruby-core:35891] fixes #4612
  • test/ruby/test_fiber.rb (test_gc_root_fiber): add test for it.
Actions

Also available in: Atom PDF