Project

General

Profile

Bug #15809

GC.verify_compaction_references - intermittent SEGV's on multiple platforms

Added by MSP-Greg (Greg L) 23 days ago. Updated 5 days ago.

Status:
Assigned
Priority:
Normal
Target version:
-
ruby -v:
ruby 2.7.0dev (2019-05-06 trunk c3cf1ef9bb) [x64-mingw32
[ruby-core:92465]

Description

See

https://travis-ci.org/ruby/ruby/jobs/525627187#L2684
https://ci.appveyor.com/project/ruby/ruby/builds/24138134/job/i7e441u7se11w7ey#L3402

Both have an error similar to:

/ruby/test/ruby/test_gc_compact.rb:128: [BUG] ROOT finalizers points to MOVED:
  0x0000000004679c20 -> 0x00000000089367c0 [0     ] T_ARRAY [E ] len: 1 (embed)

ruby-loco has also failed with a similar error. Attached STDERR output.


Files

test_all_err.log (42 KB) test_all_err.log ruby 2.7.0dev (2019-04-27 trunk 87d2a2d) [x64-mingw32] test-all STDERR log MSP-Greg (Greg L), 04/28/2019 04:46 PM
drb-test_drbobject.rb.patch (800 Bytes) drb-test_drbobject.rb.patch MSP-Greg (Greg L), 05/16/2019 09:53 PM

History

Updated by k0kubun (Takashi Kokubun) 23 days ago

  • Assignee set to tenderlovemaking (Aaron Patterson)
  • Status changed from Open to Assigned

Updated by MSP-Greg (Greg L) 15 days ago

  • ruby -v set to ruby 2.7.0dev (2019-05-06 trunk c3cf1ef9bb) [x64-mingw32

Continuing to see intermittent SEGV's, today:

/ruby/test/ruby/test_gc_compact.rb:114: [BUG] ROOT finalizers points to MOVED: 0x0000000004f37f88 -> 0x0000000009736f50 [0     ] T_ARRAY [E ] len: 1 (embed)

ruby 2.7.0dev (2019-05-06 trunk c3cf1ef9bb) [x64-mingw32]

-- Control frame information -----------------------------------------------
c:0016 p:---- s:0116 e:000115 CFUNC  :verify_compaction_references
c:0015 p:0056 s:0112 e:000111 METHOD /ruby/test/ruby/test_gc_compact.rb:114
c:0014 p:0059 s:0103 e:000102 METHOD /ruby/test/lib/test/unit.rb:1198
c:0013 p:0059 s:0097 e:000096 METHOD /ruby/test/lib/minitest/unit.rb:1327
c:0012 p:0015 s:0088 e:000087 METHOD /ruby/test/lib/test/unit/testcase.rb:18
c:0011 p:0043 s:0083 e:000082 BLOCK  /ruby/test/lib/minitest/unit.rb:964 [FINISH]
c:0010 p:---- s:0075 e:000074 CFUNC  :map

Updated by wanabe (_ wanabe) 15 days ago

Recursive finalizer definition raises a similar error on my environment.

$ ./miniruby -ve 'def foo(c = 10); ObjectSpace.define_finalizer(Object.new) { foo(c - 1) } if c > 0; end; 10.times do foo; GC.verify_compaction_references end'
ruby 2.7.0dev (2019-05-06 trunk c3cf1ef9bb) [x86_64-linux]
-e:1: [BUG] ROOT finalizers points to MOVED: 0x000055eb4bfdfd18 -> 0x000055eb4c041040 [0     ] T_ARRAY [E ] len: 1 (embed)

ruby 2.7.0dev (2019-05-06 trunk c3cf1ef9bb) [x86_64-linux]

-- Control frame information -----------------------------------------------
c:0005 p:---- s:0017 e:000016 CFUNC  :verify_compaction_references
c:0004 p:0016 s:0013 e:000012 BLOCK  -e:1 [FINISH]
c:0003 p:---- s:0010 e:000009 CFUNC  :times
(snip)

Note: this script is inspired by "lib/drb/timeridconv.rb".

Updated by wanabe (_ wanabe) 13 days ago

rb_gc() calls garbage_collect() and gc_finalize_deferred().
Finalizers may create (or recycle) objects, but they can't be pinned because the mark-and-pin phase of garbage_collect() is done.
How about delaying gc_finalize_deferred() phase after gc_update_references()?

Updated by wanabe (_ wanabe) 12 days ago

wanabe (_ wanabe) wrote:

How about delaying gc_finalize_deferred() phase after gc_update_references()?

Like this:

diff --git a/gc.c b/gc.c
index ea4f54ce0e..68d58fcb5e 100644
--- a/gc.c
+++ b/gc.c
@@ -8100,14 +8100,17 @@ static VALUE
 rb_gc_compact(VALUE mod)
 {
     rb_objspace_t *objspace = &rb_objspace;
+    int reason = GPR_FLAG_FULL_MARK | GPR_FLAG_IMMEDIATE_MARK |
+                GPR_FLAG_IMMEDIATE_SWEEP | GPR_FLAG_CAPI;

     /* Ensure objects are pinned */
-    rb_gc();
+    garbage_collect(objspace, reason);

     gc_compact_heap(objspace);

     heap_eden->freelist = NULL;
     gc_update_references(objspace);
+    gc_finalize_deferred(objspace);

     rb_clear_method_cache_by_class(rb_cObject);
     rb_clear_constant_cache();
@@ -8191,14 +8194,17 @@ static VALUE
 gc_verify_compaction_references(VALUE mod)
 {
     rb_objspace_t *objspace = &rb_objspace;
+    int reason = GPR_FLAG_FULL_MARK | GPR_FLAG_IMMEDIATE_MARK |
+                GPR_FLAG_IMMEDIATE_SWEEP | GPR_FLAG_CAPI;

     /* Ensure objects are pinned */
-    rb_gc();
+    garbage_collect(objspace, reason);

     gc_compact_heap(objspace);

     heap_eden->freelist = NULL;
     gc_update_references(objspace);
+    gc_finalize_deferred(objspace);
     gc_check_references_for_moved(mod);

     rb_clear_method_cache_by_class(rb_cObject);

But I think it needs a refactoring because of duplicated code.

Updated by MSP-Greg (Greg L) 6 days ago

Still having intermittent SEGV, today:

/ruby/test/ruby/test_gc_compact.rb:129: [BUG] Object 0x000000000e2c9a80 [0     ] proc (Proc) block in on_gc@/install/lib/ruby/2.7.0/drb/timeridconv.rb:65 points to MOVED: 0x0000000001daec28 -> 0x000000000e2f89c0 [0     ] T_IMEMO env

ruby 2.7.0dev (2019-05-15 trunk 214e2f93aa) [x64-mingw32]

-- Control frame information -----------------------------------------------
c:0016 p:---- s:0114 e:000113 CFUNC  :verify_compaction_references
c:0015 p:0025 s:0109 e:000108 METHOD /ruby/test/ruby/test_gc_compact.rb:129
<clipped>

Updated by wanabe (_ wanabe) 6 days ago

gc_update_references() updates finalizer_table after https://git.ruby-lang.org/ruby.git/commit/?id=3cf767ee.
Therefore, "[BUG] ROOT finalizers points to MOVED" error seems to be suppressed.

I guess the [BUG] of [ruby-core:92665] is caused by T_IMEMO env, not by finalizer_table.
Values of env->env[] are pinned normally, but finalizer can create imemo_env objects and they have unpinned env->env[] as I said in [ruby-core:92615].

diff --git a/gc.c b/gc.c
index b71c501a5b..ec90416672 100644
--- a/gc.c
+++ b/gc.c
@@ -7721,7 +7721,12 @@ gc_ref_update_imemo(rb_objspace_t *objspace, VALUE obj)
         case imemo_env:
             {
                 rb_env_t *env = (rb_env_t *)obj;
+                VALUE *envp = (VALUE*)env->env;
+                unsigned int i;
                 TYPED_UPDATE_IF_MOVED(objspace, rb_iseq_t *, env->iseq);
+                for (i = 0; i < env->env_size; i++) {
+                    UPDATE_IF_MOVED(objspace, envp[i]);
+                }
             }
             break;
             break;

Updated by MSP-Greg (Greg L) 5 days ago

wanabe (_ wanabe)

Thank you for your work on this. Today, I had the SEGV fault again during a ruby-loco build, and have found that running the two following test files will cause a SEGV:

drb/test_drbobject.rb  ruby/test_gc_compact.rb

I don't know if interaction with tests other than drb/test_drbobject.rb will cause a SEGV in ruby/test_gc_compact.rb.

ruby/test_gc_compact.rb seems to have been isolated in the CI here. Up to now, ruby-loco has reliably passed test-all running parallel with retry.

I don't know c, and even if I did, I wouldn't want to jump into GC code. Hence, I can't help with the issue.

It certainly seems incorrect to 'sweep the issue under the rug' via test isolation. I can't think of any code in Ruby that should include a caveat something like "this may SEGV if you use it with an indeterminate set of classes/modules/methods...

I may isolate the drb test in ruby-loco and see what happens...

Updated by MSP-Greg (Greg L) 5 days ago

I added a patch to (hopefully) allow ruby/test_gc_compact.rb to run with the rest of tests, which may help in determining if more tests are incompatible with it.

Decreased the 'time-to-live' of DRb::TimerIdConv objects from the default (600) to 0.2...

Also available in: Atom PDF