Project

General

Profile

Bug #15809

GC.verify_compaction_references - intermittent SEGV's on multiple platforms

Added by MSP-Greg (Greg L) 6 months ago. Updated 5 months ago.

Status:
Closed
Priority:
Normal
Target version:
-
ruby -v:
ruby 2.7.0dev (2019-05-06 trunk c3cf1ef9bb) [x64-mingw32
[ruby-core:92465]

Description

See

https://travis-ci.org/ruby/ruby/jobs/525627187#L2684
https://ci.appveyor.com/project/ruby/ruby/builds/24138134/job/i7e441u7se11w7ey#L3402

Both have an error similar to:

/ruby/test/ruby/test_gc_compact.rb:128: [BUG] ROOT finalizers points to MOVED:
  0x0000000004679c20 -> 0x00000000089367c0 [0     ] T_ARRAY [E ] len: 1 (embed)

ruby-loco has also failed with a similar error. Attached STDERR output.


Files

test_all_err.log (42 KB) test_all_err.log ruby 2.7.0dev (2019-04-27 trunk 87d2a2d) [x64-mingw32] test-all STDERR log MSP-Greg (Greg L), 04/28/2019 04:46 PM
drb-test_drbobject.rb.patch (800 Bytes) drb-test_drbobject.rb.patch MSP-Greg (Greg L), 05/16/2019 09:53 PM

Associated revisions

Revision 136ae558
Added by ko1 (Koichi Sasada) 5 months ago

Do not kick finalizers on rb_gc().

rb_gc() kicks gc_finalize_deferred(), which invokes finalizers.
This means that any Ruby program can be run this point and
it may be thread switching points and so on.

However, it is difficult to think it invokes any Ruby programs.
For example, GC.compact use rb_gc() to implement it, howver,
any Ruby program must not be run on this timing.

For this reason (it is difficult to image it run any Ruby program),
I removed gc_finalize_deferred() line in rb_gc().

This patch solves GC.compact issue.
[Bug #15809] and re-enable GC.compact test.

History

Updated by k0kubun (Takashi Kokubun) 6 months ago

  • Assignee set to tenderlovemaking (Aaron Patterson)
  • Status changed from Open to Assigned

Updated by MSP-Greg (Greg L) 5 months ago

  • ruby -v set to ruby 2.7.0dev (2019-05-06 trunk c3cf1ef9bb) [x64-mingw32

Continuing to see intermittent SEGV's, today:

/ruby/test/ruby/test_gc_compact.rb:114: [BUG] ROOT finalizers points to MOVED: 0x0000000004f37f88 -> 0x0000000009736f50 [0     ] T_ARRAY [E ] len: 1 (embed)

ruby 2.7.0dev (2019-05-06 trunk c3cf1ef9bb) [x64-mingw32]

-- Control frame information -----------------------------------------------
c:0016 p:---- s:0116 e:000115 CFUNC  :verify_compaction_references
c:0015 p:0056 s:0112 e:000111 METHOD /ruby/test/ruby/test_gc_compact.rb:114
c:0014 p:0059 s:0103 e:000102 METHOD /ruby/test/lib/test/unit.rb:1198
c:0013 p:0059 s:0097 e:000096 METHOD /ruby/test/lib/minitest/unit.rb:1327
c:0012 p:0015 s:0088 e:000087 METHOD /ruby/test/lib/test/unit/testcase.rb:18
c:0011 p:0043 s:0083 e:000082 BLOCK  /ruby/test/lib/minitest/unit.rb:964 [FINISH]
c:0010 p:---- s:0075 e:000074 CFUNC  :map

Updated by wanabe (_ wanabe) 5 months ago

Recursive finalizer definition raises a similar error on my environment.

$ ./miniruby -ve 'def foo(c = 10); ObjectSpace.define_finalizer(Object.new) { foo(c - 1) } if c > 0; end; 10.times do foo; GC.verify_compaction_references end'
ruby 2.7.0dev (2019-05-06 trunk c3cf1ef9bb) [x86_64-linux]
-e:1: [BUG] ROOT finalizers points to MOVED: 0x000055eb4bfdfd18 -> 0x000055eb4c041040 [0     ] T_ARRAY [E ] len: 1 (embed)

ruby 2.7.0dev (2019-05-06 trunk c3cf1ef9bb) [x86_64-linux]

-- Control frame information -----------------------------------------------
c:0005 p:---- s:0017 e:000016 CFUNC  :verify_compaction_references
c:0004 p:0016 s:0013 e:000012 BLOCK  -e:1 [FINISH]
c:0003 p:---- s:0010 e:000009 CFUNC  :times
(snip)

Note: this script is inspired by "lib/drb/timeridconv.rb".

Updated by wanabe (_ wanabe) 5 months ago

rb_gc() calls garbage_collect() and gc_finalize_deferred().
Finalizers may create (or recycle) objects, but they can't be pinned because the mark-and-pin phase of garbage_collect() is done.
How about delaying gc_finalize_deferred() phase after gc_update_references()?

Updated by wanabe (_ wanabe) 5 months ago

wanabe (_ wanabe) wrote:

How about delaying gc_finalize_deferred() phase after gc_update_references()?

Like this:

diff --git a/gc.c b/gc.c
index ea4f54ce0e..68d58fcb5e 100644
--- a/gc.c
+++ b/gc.c
@@ -8100,14 +8100,17 @@ static VALUE
 rb_gc_compact(VALUE mod)
 {
     rb_objspace_t *objspace = &rb_objspace;
+    int reason = GPR_FLAG_FULL_MARK | GPR_FLAG_IMMEDIATE_MARK |
+                GPR_FLAG_IMMEDIATE_SWEEP | GPR_FLAG_CAPI;

     /* Ensure objects are pinned */
-    rb_gc();
+    garbage_collect(objspace, reason);

     gc_compact_heap(objspace);

     heap_eden->freelist = NULL;
     gc_update_references(objspace);
+    gc_finalize_deferred(objspace);

     rb_clear_method_cache_by_class(rb_cObject);
     rb_clear_constant_cache();
@@ -8191,14 +8194,17 @@ static VALUE
 gc_verify_compaction_references(VALUE mod)
 {
     rb_objspace_t *objspace = &rb_objspace;
+    int reason = GPR_FLAG_FULL_MARK | GPR_FLAG_IMMEDIATE_MARK |
+                GPR_FLAG_IMMEDIATE_SWEEP | GPR_FLAG_CAPI;

     /* Ensure objects are pinned */
-    rb_gc();
+    garbage_collect(objspace, reason);

     gc_compact_heap(objspace);

     heap_eden->freelist = NULL;
     gc_update_references(objspace);
+    gc_finalize_deferred(objspace);
     gc_check_references_for_moved(mod);

     rb_clear_method_cache_by_class(rb_cObject);

But I think it needs a refactoring because of duplicated code.

Updated by MSP-Greg (Greg L) 5 months ago

Still having intermittent SEGV, today:

/ruby/test/ruby/test_gc_compact.rb:129: [BUG] Object 0x000000000e2c9a80 [0     ] proc (Proc) block in on_gc@/install/lib/ruby/2.7.0/drb/timeridconv.rb:65 points to MOVED: 0x0000000001daec28 -> 0x000000000e2f89c0 [0     ] T_IMEMO env

ruby 2.7.0dev (2019-05-15 trunk 214e2f93aa) [x64-mingw32]

-- Control frame information -----------------------------------------------
c:0016 p:---- s:0114 e:000113 CFUNC  :verify_compaction_references
c:0015 p:0025 s:0109 e:000108 METHOD /ruby/test/ruby/test_gc_compact.rb:129
<clipped>

Updated by wanabe (_ wanabe) 5 months ago

gc_update_references() updates finalizer_table after https://git.ruby-lang.org/ruby.git/commit/?id=3cf767ee.
Therefore, "[BUG] ROOT finalizers points to MOVED" error seems to be suppressed.

I guess the [BUG] of [ruby-core:92665] is caused by T_IMEMO env, not by finalizer_table.
Values of env->env[] are pinned normally, but finalizer can create imemo_env objects and they have unpinned env->env[] as I said in [ruby-core:92615].

diff --git a/gc.c b/gc.c
index b71c501a5b..ec90416672 100644
--- a/gc.c
+++ b/gc.c
@@ -7721,7 +7721,12 @@ gc_ref_update_imemo(rb_objspace_t *objspace, VALUE obj)
         case imemo_env:
             {
                 rb_env_t *env = (rb_env_t *)obj;
+                VALUE *envp = (VALUE*)env->env;
+                unsigned int i;
                 TYPED_UPDATE_IF_MOVED(objspace, rb_iseq_t *, env->iseq);
+                for (i = 0; i < env->env_size; i++) {
+                    UPDATE_IF_MOVED(objspace, envp[i]);
+                }
             }
             break;
             break;

Updated by MSP-Greg (Greg L) 5 months ago

wanabe (_ wanabe)

Thank you for your work on this. Today, I had the SEGV fault again during a ruby-loco build, and have found that running the two following test files will cause a SEGV:

drb/test_drbobject.rb  ruby/test_gc_compact.rb

I don't know if interaction with tests other than drb/test_drbobject.rb will cause a SEGV in ruby/test_gc_compact.rb.

ruby/test_gc_compact.rb seems to have been isolated in the CI here. Up to now, ruby-loco has reliably passed test-all running parallel with retry.

I don't know c, and even if I did, I wouldn't want to jump into GC code. Hence, I can't help with the issue.

It certainly seems incorrect to 'sweep the issue under the rug' via test isolation. I can't think of any code in Ruby that should include a caveat something like "this may SEGV if you use it with an indeterminate set of classes/modules/methods...

I may isolate the drb test in ruby-loco and see what happens...

Updated by MSP-Greg (Greg L) 5 months ago

I added a patch to (hopefully) allow ruby/test_gc_compact.rb to run with the rest of tests, which may help in determining if more tests are incompatible with it.

Decreased the 'time-to-live' of DRb::TimerIdConv objects from the default (600) to 0.2...

#10

Updated by ko1 (Koichi Sasada) 5 months ago

  • Status changed from Assigned to Closed

Applied in changeset git|136ae55892ae120bb94e4ff2d025c745fdaa1091.


Do not kick finalizers on rb_gc().

rb_gc() kicks gc_finalize_deferred(), which invokes finalizers.
This means that any Ruby program can be run this point and
it may be thread switching points and so on.

However, it is difficult to think it invokes any Ruby programs.
For example, GC.compact use rb_gc() to implement it, howver,
any Ruby program must not be run on this timing.

For this reason (it is difficult to image it run any Ruby program),
I removed gc_finalize_deferred() line in rb_gc().

This patch solves GC.compact issue.
[Bug #15809] and re-enable GC.compact test.

Also available in: Atom PDF