Project

General

Profile

Actions

Bug #19248

closed

TestGCCompact#test_moving_objects_between_size_pools test failure

Added by vo.x (Vit Ondruch) about 2 years ago. Updated almost 2 years ago.

Status:
Closed
Assignee:
-
Target version:
-
ruby -v:
ruby 3.2.0dev (2022-12-21 master 6af6857ecf) [x86_64-linux]
[ruby-core:111361]

Description

Testing on Fedora Rawhide with git|6af6857ecf, I observe the following error:

  1) Error:
TestGCCompact#test_moving_objects_between_size_pools:
NoMethodError: undefined method `>=' for nil:NilClass
    /builddir/build/BUILD/ruby-3.2.0-6af6857ecf/test/ruby/test_gc_compact.rb:278:in `<main>'
    /builddir/build/BUILD/ruby-3.2.0-6af6857ecf/test/ruby/test_gc_compact.rb:256:in `test_moving_objects_between_size_pools'
    /builddir/build/BUILD/ruby-3.2.0-6af6857ecf/tool/test/runner.rb:23:in `<top (required)>'
    /builddir/build/BUILD/ruby-3.2.0-6af6857ecf/test/runner.rb:16:in `require_relative'
    /builddir/build/BUILD/ruby-3.2.0-6af6857ecf/test/runner.rb:16:in `<main>'

Testing previously with git|11acb7f7bc, everything was fine. I might just guess that this is related to git|bfc66e07b7e0134dfa2041c311dc56941fe1caf0

Updated by peterzhu2118 (Peter Zhu) about 2 years ago

Thank you for this bug report. Could you run this script and paste the output? It will help me to debug this issue.

require "objspace"

class Foo
  def add_ivars
    10.times do |i|
      instance_variable_set("@foo" + i.to_s, 0)
    end
  end
end

OBJ_COUNT = 500

GC.verify_compaction_references(expand_heap: true, toward: :empty)

ary = OBJ_COUNT.times.map { Foo.new }
ary.each(&:add_ivars)

GC.start
foo = Foo.new
foo.add_ivars

puts "--- BEFORE COMPACT"
puts GC.stat
puts
puts GC.stat_heap
puts
puts ObjectSpace.dump(ary[0])
puts
puts ObjectSpace.dump(foo)
puts

stats = GC.verify_compaction_references(expand_heap: true, toward: :empty)

puts "--- AFTER COMPACT"
puts stats
puts
puts GC.stat
puts
puts GC.stat_heap
puts
puts ObjectSpace.dump(ary[0])
puts
puts ObjectSpace.dump(foo)

Updated by eightbitraptor (Matt V-H) about 2 years ago

vo.x (Vit Ondruch) wrote:

Testing on Fedora Rawhide with git|6af6857ecf, I observe the following error:

Hey. I have been unable to replicate this test failure on both my development environment, which is Fedora 37, or a toolbox container running Rawhide, both on x86_64. In addition to the script output that @peterzhu2118 (Peter Zhu) has requested - could you let us know what architecture this is failing on for you, is it consistent or intermittent?

As an aside, this should be a test failure rather than raising a NoMethodError. I'll look at that Addressd in PR #6978

Updated by vo.x (Vit Ondruch) about 2 years ago

eightbitraptor (Matthew Valentine-House) wrote in #note-2:

could you let us know what architecture this is failing on for you, is it consistent or intermittent?

It was x86_64. And I hit the issue for the first build run.

I did not have enough time to play with the issue, I have just reported the ticket once I have encountered the issue. But I'll try to provide more feedback tomorrow.

Updated by vo.x (Vit Ondruch) about 2 years ago

peterzhu2118 (Peter Zhu) wrote in #note-1:

Thank you for this bug report. Could you run this script and paste the output? It will help me to debug this issue.

$ make -C redhat-linux-build/ test-all TESTS="../test_19248.rb"
make: Entering directory '/builddir/build/BUILD/ruby-3.2.0-6af6857ecf/redhat-linux-build'
--- BEFORE COMPACT
{:count=>8, :time=>19, :heap_allocated_pages=>325, :heap_sorted_length=>611, :heap_allocatable_pages=>99, :heap_available_slots=>275275, :heap_live_slots=>17572, :heap_free_slots=>257703, :heap_final_slots=>0, :heap_marked_slots=>17502, :heap_eden_pages=>325, :heap_tomb_pages=>0, :total_allocated_pages=>512, :total_freed_pages=>187, :total_allocated_objects=>72809, :total_freed_objects=>55237, :malloc_increase_bytes=>62760, :malloc_increase_bytes_limit=>16777216, :minor_gc_count=>3, :major_gc_count=>5, :compact_count=>1, :read_barrier_faults=>0, :total_moved_objects=>15899, :remembered_wb_unprotected_objects=>123, :remembered_wb_unprotected_objects_limit=>246, :old_objects=>16872, :old_objects_limit=>33744, :oldmalloc_increase_bytes=>62760, :oldmalloc_increase_bytes_limit=>16777216}

{0=>{:slot_size=>40, :heap_allocatable_pages=>0, :heap_eden_pages=>85, :heap_eden_slots=>139182, :heap_tomb_pages=>0, :heap_tomb_slots=>0, :total_allocated_pages=>263, :total_freed_pages=>178, :force_major_gc_count=>0}, 1=>{:slot_size=>80, :heap_allocatable_pages=>0, :heap_eden_pages=>122, :heap_eden_slots=>99838, :heap_tomb_pages=>0, :heap_tomb_slots=>0, :total_allocated_pages=>131, :total_freed_pages=>9, :force_major_gc_count=>0}, 2=>{:slot_size=>160, :heap_allocatable_pages=>0, :heap_eden_pages=>68, :heap_eden_slots=>27799, :heap_tomb_pages=>0, :heap_tomb_slots=>0, :total_allocated_pages=>68, :total_freed_pages=>0, :force_major_gc_count=>0}, 3=>{:slot_size=>320, :heap_allocatable_pages=>17, :heap_eden_pages=>33, :heap_eden_slots=>6725, :heap_tomb_pages=>0, :heap_tomb_slots=>0, :total_allocated_pages=>33, :total_freed_pages=>0, :force_major_gc_count=>0}, 4=>{:slot_size=>640, :heap_allocatable_pages=>82, :heap_eden_pages=>17, :heap_eden_slots=>1731, :heap_tomb_pages=>0, :heap_tomb_slots=>0, :total_allocated_pages=>17, :total_freed_pages=>0, :force_major_gc_count=>0}}

{"address":"0x7f6471b0fa68", "type":"OBJECT", "shape_id":134, "slot_size":40, "class":"0x7f646c568178", "ivars":10, "memsize":136, "flags":{"wb_protected":true}}

{"address":"0x7f646ea5fbe8", "type":"OBJECT", "shape_id":144, "slot_size":160, "class":"0x7f646c568178", "ivars":10, "memsize":160, "flags":{"wb_protected":true}}

--- AFTER COMPACT
{:considered=>{:T_OBJECT=>541, :T_CLASS=>444, :T_MODULE=>44, :T_FLOAT=>3, :T_STRING=>7152, :T_REGEXP=>118, :T_ARRAY=>573, :T_HASH=>41, :T_STRUCT=>7, :T_BIGNUM=>30, :T_DATA=>277, :T_MATCH=>1, :T_SYMBOL=>8, :T_IMEMO=>7177, :T_ICLASS=>83}, :moved=>{:T_OBJECT=>541, :T_CLASS=>444, :T_MODULE=>44, :T_FLOAT=>3, :T_STRING=>7152, :T_REGEXP=>118, :T_ARRAY=>573, :T_HASH=>41, :T_STRUCT=>7, :T_BIGNUM=>30, :T_DATA=>277, :T_MATCH=>1, :T_SYMBOL=>2, :T_IMEMO=>7177, :T_ICLASS=>83}, :moved_up=>{}, :moved_down=>{:T_ARRAY=>1}}

{:count=>10, :time=>37, :heap_allocated_pages=>618, :heap_sorted_length=>833, :heap_allocatable_pages=>65, :heap_available_slots=>516160, :heap_live_slots=>17639, :heap_free_slots=>498521, :heap_final_slots=>0, :heap_marked_slots=>17558, :heap_eden_pages=>618, :heap_tomb_pages=>0, :total_allocated_pages=>1005, :total_freed_pages=>387, :total_allocated_objects=>73081, :total_freed_objects=>55442, :malloc_increase_bytes=>3760, :malloc_increase_bytes_limit=>16777216, :minor_gc_count=>3, :major_gc_count=>7, :compact_count=>2, :read_barrier_faults=>0, :total_moved_objects=>32392, :remembered_wb_unprotected_objects=>123, :remembered_wb_unprotected_objects_limit=>246, :old_objects=>17427, :old_objects_limit=>34854, :oldmalloc_increase_bytes=>3760, :oldmalloc_increase_bytes_limit=>16777216}

{0=>{:slot_size=>40, :heap_allocatable_pages=>0, :heap_eden_pages=>160, :heap_eden_slots=>261987, :heap_tomb_pages=>0, :heap_tomb_slots=>0, :total_allocated_pages=>513, :total_freed_pages=>353, :force_major_gc_count=>0}, 1=>{:slot_size=>80, :heap_allocatable_pages=>0, :heap_eden_pages=>222, :heap_eden_slots=>181663, :heap_tomb_pages=>0, :heap_tomb_slots=>0, :total_allocated_pages=>256, :total_freed_pages=>34, :force_major_gc_count=>0}, 2=>{:slot_size=>160, :heap_allocatable_pages=>0, :heap_eden_pages=>136, :heap_eden_slots=>55598, :heap_tomb_pages=>0, :heap_tomb_slots=>0, :total_allocated_pages=>136, :total_freed_pages=>0, :force_major_gc_count=>0}, 3=>{:slot_size=>320, :heap_allocatable_pages=>0, :heap_eden_pages=>66, :heap_eden_slots=>13450, :heap_tomb_pages=>0, :heap_tomb_slots=>0, :total_allocated_pages=>66, :total_freed_pages=>0, :force_major_gc_count=>0}, 4=>{:slot_size=>640, :heap_allocatable_pages=>65, :heap_eden_pages=>34, :heap_eden_slots=>3462, :heap_tomb_pages=>0, :heap_tomb_slots=>0, :total_allocated_pages=>34, :total_freed_pages=>0, :force_major_gc_count=>0}}

{"address":"0x7f646bfaafb0", "type":"OBJECT", "shape_id":134, "slot_size":40, "class":"0x7f646b26ece8", "ivars":10, "memsize":136, "flags":{"wb_protected":true, "old":true, "uncollectible":true, "marked":true}}

{"address":"0x7f646ea5fbe8", "type":"OBJECT", "shape_id":144, "slot_size":160, "class":"0x7f646b26ece8", "ivars":10, "memsize":160, "flags":{"wb_protected":true, "pinned":true}}
Run options: 
  --seed=23777
  "--ruby=./miniruby -I/builddir/build/BUILD/ruby-3.2.0-6af6857ecf/lib -I. -I.ext/common  /builddir/build/BUILD/ruby-3.2.0-6af6857ecf/tool/runruby.rb --extout=.ext  -- --disable-gems"
  --excludes-dir=/builddir/build/BUILD/ruby-3.2.0-6af6857ecf/test/excludes
  --name=!/memory_leak/

# Running tests:

Finished tests in 0.009343s, 0.0000 tests/s, 0.0000 assertions/s.
0 tests, 0 assertions, 0 failures, 0 errors, 0 skips

ruby -v: ruby 3.2.0dev (2022-12-21 master 6af6857ecf) [x86_64-linux]
make: Leaving directory '/builddir/build/BUILD/ruby-3.2.0-6af6857ecf/redhat-linux-build'

eightbitraptor (Matthew Valentine-House) wrote in #note-2:

is it consistent or intermittent?

It seems to be consistent. 5 failures out of 5 attempts:

$ make -C redhat-linux-build/ test-all TESTS="-v -n /TestGCCompact#test_moving_objects_between_size_pools/"
make: Entering directory '/builddir/build/BUILD/ruby-3.2.0-6af6857ecf/redhat-linux-build'
Run options: 
  --seed=40875
  "--ruby=./miniruby -I/builddir/build/BUILD/ruby-3.2.0-6af6857ecf/lib -I. -I.ext/common  /builddir/build/BUILD/ruby-3.2.0-6af6857ecf/tool/runruby.rb --extout=.ext  -- --disable-gems"
  --excludes-dir=/builddir/build/BUILD/ruby-3.2.0-6af6857ecf/test/excludes
  --name=!/memory_leak/
  -v
  -n
  /TestGCCompact#test_moving_objects_between_size_pools/

# Running tests:

[1/0] TestGCCompact#test_moving_objects_between_size_pools = 0.12 s

  1) Error:
TestGCCompact#test_moving_objects_between_size_pools:
NoMethodError: undefined method `>=' for nil:NilClass
    /builddir/build/BUILD/ruby-3.2.0-6af6857ecf/test/ruby/test_gc_compact.rb:278:in `<main>'
    /builddir/build/BUILD/ruby-3.2.0-6af6857ecf/test/ruby/test_gc_compact.rb:256:in `test_moving_objects_between_size_pools'
    /builddir/build/BUILD/ruby-3.2.0-6af6857ecf/tool/test/runner.rb:23:in `<top (required)>'
    /builddir/build/BUILD/ruby-3.2.0-6af6857ecf/test/runner.rb:16:in `require_relative'
    /builddir/build/BUILD/ruby-3.2.0-6af6857ecf/test/runner.rb:16:in `<main>'

Finished tests in 4.621852s, 0.2164 tests/s, 0.6491 assertions/s.
1 tests, 3 assertions, 0 failures, 1 errors, 0 skips

ruby -v: ruby 3.2.0dev (2022-12-21 master 6af6857ecf) [x86_64-linux]
make: *** [uncommon.mk:856: yes-test-all] Error 1
make: Leaving directory '/builddir/build/BUILD/ruby-3.2.0-6af6857ecf/redhat-linux-build'

Updated by peterzhu2118 (Peter Zhu) almost 2 years ago

Thanks for running the script! From the output it looks like the read barrier is not being triggered, so the objects aren't being moved back. It also looks like the objects are being moved but just not upwards. I'm really not sure what's causing the objects to not be moved upwards.

I tried running a Fedora Rawhide docker container (from registry.fedoraproject.org/fedora:rawhide) on my x86_64 Ubuntu 22.04 machine, but I was not able to replicate this bug. I'll try a VM next, but I've got a few other things I need to before Ruby 3.2.0 release (which is in about 2 days), so I can't promise that I'll be able to work on this before then. But I'll certainly get back on this in the new year!

Updated by vo.x (Vit Ondruch) almost 2 years ago

Just FTR, this is result on the official builder (will be garbage collected in several days, sorry):

https://koji.fedoraproject.org/koji/taskinfo?taskID=95624219

You can check the build.log for each of the platforms. And it seems the test is passing on i686/s390x while failing on x86_64/ppc64le/aarch64. If you can't reproduce in container, can it be Kernel related?

Updated by peterzhu2118 (Peter Zhu) almost 2 years ago

Thank you for providing the logs, I took a look at it. I noticed these lines:

+ echo 'Patch #0 (ruby-2.3.0-ruby_version.patch):'
+ /usr/bin/patch --no-backup-if-mismatch -f -p1 --fuzz=0
patching file configure.ac
patching file template/ruby.pc.in
patching file lib/rdoc/ri/paths.rb
patching file tool/rbinstall.rb
patching file lib/rubygems/defaults.rb
patching file test/rubygems/test_gem.rb
patching file configure.ac
Patch #1 (ruby-2.1.0-Prevent-duplicated-paths-when-empty-version-string-i.patch):
+ echo 'Patch #1 (ruby-2.1.0-Prevent-duplicated-paths-when-empty-version-string-i.patch):'
+ /usr/bin/patch --no-backup-if-mismatch -f -p1 --fuzz=0
patching file configure.ac
patching file loadpath.c
patching file tool/mkconfig.rb
Patch #2 (ruby-2.1.0-Enable-configuration-of-archlibdir.patch):
+ echo 'Patch #2 (ruby-2.1.0-Enable-configuration-of-archlibdir.patch):'
+ /usr/bin/patch --no-backup-if-mismatch -f -p1 --fuzz=0
patching file configure.ac
Patch #3 (ruby-2.1.0-always-use-i386.patch):
+ echo 'Patch #3 (ruby-2.1.0-always-use-i386.patch):'
+ /usr/bin/patch --no-backup-if-mismatch -f -p1 --fuzz=0
patching file configure.ac
Patch #4 (ruby-2.1.0-custom-rubygems-location.patch):
+ echo 'Patch #4 (ruby-2.1.0-custom-rubygems-location.patch):'
+ /usr/bin/patch --no-backup-if-mismatch -f -p1 --fuzz=0
patching file configure.ac
patching file loadpath.c
patching file template/verconf.h.tmpl
patching file tool/rbinstall.rb
Patch #6 (ruby-2.7.0-Initialize-ABRT-hook.patch):
+ echo 'Patch #6 (ruby-2.7.0-Initialize-ABRT-hook.patch):'
+ /usr/bin/patch --no-backup-if-mismatch -f -p1 --fuzz=0
patching file abrt.c
patching file common.mk
patching file ruby.c
Patch #7 (ruby-3.1.0-Don-t-query-RubyVM-FrozenCore-for-class-path.patch):
+ echo 'Patch #7 (ruby-3.1.0-Don-t-query-RubyVM-FrozenCore-for-class-path.patch):'
+ /usr/bin/patch --no-backup-if-mismatch -f -p1 --fuzz=0
patching file vm.c
Patch #8 (ruby-2.7.1-Timeout-the-test_bug_reporter_add-witout-raising-err.patch):
+ echo 'Patch #8 (ruby-2.7.1-Timeout-the-test_bug_reporter_add-witout-raising-err.patch):'
+ /usr/bin/patch --no-backup-if-mismatch -f -p1 --fuzz=0
patching file test/-ext-/bug_reporter/test_bug_reporter.rb

It looks like your build system is patching files in Ruby? If so, could you provide these files and/or look into potentially upstreaming it?

In particular, there's a file called always-use-i386. Is it forcing things to be built for i386?

Updated by vo.x (Vit Ondruch) almost 2 years ago

peterzhu2118 (Peter Zhu) wrote in #note-7:

It looks like your build system is patching files in Ruby? If so, could you provide these files and/or look into potentially upstreaming it?

I have certainly tried at times. But let me try to build without them.

In particular, there's a file called always-use-i386. Is it forcing things to be built for i386?

We carry this around for a while:

https://src.fedoraproject.org/rpms/ruby/blob/rawhide/f/ruby-2.1.0-always-use-i386.patch

It should apply to just i386/i686 builds IMO. I have tried to get rid of it, but I am not sure what effect it actually has 😬

Updated by vo.x (Vit Ondruch) almost 2 years ago

vo.x (Vit Ondruch) wrote in #note-8:

peterzhu2118 (Peter Zhu) wrote in #note-7:

It looks like your build system is patching files in Ruby? If so, could you provide these files and/or look into potentially upstreaming it?

I have certainly tried at times. But let me try to build without them.

Good news that it works without patches and with default configuration. Now what causes the issue. Let me try.

Updated by vo.x (Vit Ondruch) almost 2 years ago

Patches makes no difference.

Updated by vo.x (Vit Ondruch) almost 2 years ago

So far, it seems the key are the compiler flags. Going with default, the test case passes, using Fedora flags, the test case fails. These are hopefully the relevant parts of the log:

... snip ...

+ CFLAGS='-O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1  -m64  -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection'
+ export CFLAGS
+ CXXFLAGS='-O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1  -m64  -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection'
+ export CXXFLAGS
+ FFLAGS='-O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1  -m64  -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -I/usr/lib64/gfortran/modules'
+ export FFLAGS
+ FCFLAGS='-O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1  -m64  -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -I/usr/lib64/gfortran/modules'
+ export FCFLAGS
+ VALAFLAGS=-g
+ export VALAFLAGS
+ LDFLAGS='-Wl,-z,relro -Wl,--as-needed  -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1  -Wl,--build-id=sha1 '
+ export LDFLAGS
+ LT_SYS_LIBRARY_PATH=/usr/lib64:
+ export LT_SYS_LIBRARY_PATH
+ CC=gcc
+ export CC
+ CXX=g++
+ export CXX

... snip ...

---
Configuration summary for ruby version 3.2.0

   * Installation prefix: /usr
   * exec prefix:         /usr
   * arch:                x86_64-linux
   * site arch:           ${arch}
   * RUBY_BASE_NAME:      ruby
   * ruby lib prefix:     ${libdir}/${RUBY_BASE_NAME}
   * site libraries path: ${rubylibprefix}/${sitearch}
   * vendor path:         ${rubylibprefix}/vendor_ruby
   * target OS:           linux
   * compiler:            gcc
   * with thread:         pthread
   * with coroutine:      amd64
   * enable shared libs:  no
   * dynamic library ext: so
   * CFLAGS:              ${optflags} ${debugflags} ${warnflags}
   * LDFLAGS:             -L. -Wl,-z,relro -Wl,--as-needed  -Wl,-z,now \
                          -specs=/usr/lib/rpm/redhat/redhat-hardened-ld \
                          -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1  \
                          -Wl,--build-id=sha1 -fstack-protector-strong \
                          -rdynamic -Wl,-export-dynamic
   * DLDFLAGS:            -Wl,-z,relro -Wl,--as-needed  -Wl,-z,now \
                          -specs=/usr/lib/rpm/redhat/redhat-hardened-ld \
                          -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1  \
                          -Wl,--build-id=sha1  \
                          -Wl,--compress-debug-sections=zlib
   * optflags:            -O3 -fno-fast-math
   * debugflags:          -ggdb3
   * warnflags:           -Wall -Wextra -Wdeprecated-declarations \
                          -Wdiv-by-zero -Wduplicated-cond \
                          -Wimplicit-function-declaration -Wimplicit-int \
                          -Wmisleading-indentation -Wpointer-arith \
                          -Wwrite-strings -Wold-style-definition \
                          -Wimplicit-fallthrough=0 -Wmissing-noreturn \
                          -Wno-cast-function-type \
                          -Wno-constant-logical-operand -Wno-long-long \
                          -Wno-missing-field-initializers \
                          -Wno-overlength-strings \
                          -Wno-packed-bitfield-compat \
                          -Wno-parentheses-equality -Wno-self-assign \
                          -Wno-tautological-compare -Wno-unused-parameter \
                          -Wno-unused-value -Wsuggest-attribute=format \
                          -Wsuggest-attribute=noreturn -Wunused-variable \
                          -Wundef
   * strip command:       strip -S -x
   * install doc:         rdoc
   * MJIT support:        yes
   * YJIT support:        yes
   * man page type:       doc

---
~/build/BUILD/ruby-3.2.0-c5eefb7f37
+ popd
+ /usr/bin/make -O -j8 V=1 VERBOSE=1 'COPY=cp -p' -C redhat-linux-build
make: Entering directory '/builddir/build/BUILD/ruby-3.2.0-c5eefb7f37/redhat-linux-build'
	BASERUBY = echo executable host ruby is required.  use --with-baseruby option.; false
	CC = gcc
	LD = ld
	LDSHARED = gcc -shared
	CFLAGS = -O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1  -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -m64
	XCFLAGS = -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -fstack-protector-strong -fno-strict-overflow -fvisibility=hidden -fexcess-precision=standard -DRUBY_EXPORT -fPIE -I. -I.ext/include/x86_64-linux -I/builddir/build/BUILD/ruby-3.2.0-c5eefb7f37/include -I/builddir/build/BUILD/ruby-3.2.0-c5eefb7f37 -I/builddir/build/BUILD/ruby-3.2.0-c5eefb7f37/enc/unicode/15.0.0 
	CPPFLAGS =   
	DLDFLAGS = -Wl,-z,relro -Wl,--as-needed  -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1  -Wl,--build-id=sha1  -Wl,--compress-debug-sections=zlib -fstack-protector-strong -pie  -m64
	SOLIBS = yjit/target/release/libyjit.a -lz -lrt -lrt -lgmp -ldl -lcrypt -lm -lpthread 
	LANG = C
	LC_ALL = 
	LC_CTYPE = 
	MFLAGS = -w -j8 -Otarget --jobserver-auth=4,5
	RUSTC = rustc
	YJIT_RUSTC_ARGS = --crate-name=yjit --crate-type=staticlib --edition=2021 -g -C opt-level=3 -C overflow-checks=on '--out-dir=/builddir/build/BUILD/ruby-3.2.0-c5eefb7f37/redhat-linux-build/yjit/target/release/' /builddir/build/BUILD/ruby-3.2.0-c5eefb7f37/yjit/src/lib.rs
gcc (GCC) 12.2.1 20221121 (Red Hat 12.2.1-4)
Copyright (C) 2022 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

... snip ...

Updated by peterzhu2118 (Peter Zhu) almost 2 years ago

Thank you for checking the patches and testing different configurations. It looks like you're using quite a lot of custom configuration flags, most of which I'm not too familiar with. I'll look into this in the new year and debug the issue.

Updated by vo.x (Vit Ondruch) almost 2 years ago

peterzhu2118 (Peter Zhu) wrote in #note-12:

It looks like you're using quite a lot of custom configuration flags

Right, mostly hardening. Which can certainly influence memory layout and what not.

most of which I'm not too familiar with.

I think that these are documented here:

https://src.fedoraproject.org/rpms/redhat-rpm-config/blob/rawhide/f/buildflags.md

The "Individual compiler flags" section and bellow might be of your interest.

Updated by mtasaka (Mamoru TASAKA) almost 2 years ago

This seems also due to LTO, adding %global _lto_cflags %{nil} to ruby.spec, i.e. removing -flto=auto -ffat-lto-objects from compilation flag seems to make this test pass.

Updated by vo.x (Vit Ondruch) almost 2 years ago

mtasaka (Mamoru Tasaka) wrote in #note-14:

This seems also due to LTO, adding %global _lto_cflags %{nil} to ruby.spec, i.e. removing -flto=auto -ffat-lto-objects from compilation flag seems to make this test pass.

Thx a lot for help with the analysis!

BTW I have asked Fedorians if there is some convenient way to setup the build options Fedora is using 1 and it seems that this could be the way:

$ sudo dnf install redhat-rpm-config

$ $ rpm -E '%set_build_flags'

  CFLAGS="${CFLAGS:--O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1  -m64  -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection}" ; export CFLAGS ; 
  CXXFLAGS="${CXXFLAGS:--O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1  -m64  -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection}" ; export CXXFLAGS ; 
  FFLAGS="${FFLAGS:--O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1  -m64  -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -I/usr/lib64/gfortran/modules}" ; export FFLAGS ; 
  FCFLAGS="${FCFLAGS:--O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1  -m64  -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -I/usr/lib64/gfortran/modules}" ; export FCFLAGS ; 
  VALAFLAGS="${VALAFLAGS:--g}" ; export VALAFLAGS ; 
  LDFLAGS="${LDFLAGS:--Wl,-z,relro -Wl,--as-needed  -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1  -Wl,--build-id=sha1 }" ; export LDFLAGS ; 
  LT_SYS_LIBRARY_PATH="${LT_SYS_LIBRARY_PATH:-/usr/lib64:}" ; export LT_SYS_LIBRARY_PATH ; 
  CC="${CC:-gcc}" ; export CC ; 
  CXX="${CXX:-g++}" ; export CXX


$ eval `rpm -E '%set_build_flags'`

Updated by peterzhu2118 (Peter Zhu) almost 2 years ago

Thanks to @mtasaka (Mamoru TASAKA) and @vo.x (Vit Ondruch) for debugging this further! Using this information, I think this is a bug in GCC. Here's what I did:

  • I can confirm that I can reproduce this bug on Ubuntu 22.04 and GCC 11.3.0 with:

    CFLAGS='-O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fstack-protector-strong -m64  -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection'
    
  • I noticed during debugging the function rb_shape_traverse_from_new_root had seemingly impossible behaviour. I confirm that if I disable optimizations on that function, then test passes as expected.

    #pragma GCC push_options
    #pragma GCC optimize ("O0")
    rb_shape_t *
    rb_shape_traverse_from_new_root(rb_shape_t *initial_shape, rb_shape_t *dest_shape)
    {
      // ...
    }
    #pragma GCC pop_options
    
  • I further isolated it down by setting CFLAGS='-flto=auto', which causes Ruby to segfault.

  • Interestingly, disabling optimizations (CFLAGS='-O0 -flto=auto') causes the test to pass.

As such, I think there's an issue with -flto that causes bugs in Ruby.

Updated by alanwu (Alan Wu) almost 2 years ago

Looking at rb_shape_traverse_from_new_root(), it seems like it's triggering UB
in a way that also failed with LTO in the past. Maybe try this patch to confirm?

diff --git a/shape.c b/shape.c
index 7580003412..8e75f4a270 100644
--- a/shape.c
+++ b/shape.c
@@ -449,6 +449,7 @@ rb_shape_traverse_from_new_root(rb_shape_t *initial_shape, rb_shape_t *dest_shap
 {
     RUBY_ASSERT(initial_shape->type == SHAPE_T_OBJECT);
     rb_shape_t *next_shape = initial_shape;
+    VALUE lookup_result;
 
     if (dest_shape->type != initial_shape->type) {
         next_shape = rb_shape_traverse_from_new_root(initial_shape, rb_shape_get_parent(dest_shape));
@@ -462,7 +463,10 @@ rb_shape_traverse_from_new_root(rb_shape_t *initial_shape, rb_shape_t *dest_shap
         if (!next_shape->edges) {
             return NULL;
         }
-        if (!rb_id_table_lookup(next_shape->edges, dest_shape->edge_name, (VALUE *)&next_shape)) {
+        if (rb_id_table_lookup(next_shape->edges, dest_shape->edge_name, &lookup_result)) {
+            next_shape = (rb_shape_t *)lookup_result;
+        }
+        else {
             return NULL;
         }
         break;

I wrote a post about this class of issue in the past: https://alanwu.space/post/strict-aliasing/

Updated by vo.x (Vit Ondruch) almost 2 years ago

alanwu (Alan Wu) wrote in #note-17:

Maybe try this patch to confirm?

The patch helps to mitigate the issue.

Updated by peterzhu2118 (Peter Zhu) almost 2 years ago

Thank you @alanwu (Alan Wu) for looking into this and @vo.x (Vit Ondruch) for confirming the fix. I've also confirmed that this fix does indeed make the tests pass. I've opened a PR here: https://github.com/ruby/ruby/pull/7067

Updated by peterzhu2118 (Peter Zhu) almost 2 years ago

@vo.x (Vit Ondruch) Do you need this backported to Ruby 3.2 or is it ok to keep it only on the master branch (i.e. it will be part of Ruby 3.3)?

Updated by alanwu (Alan Wu) almost 2 years ago

@vo.x (Vit Ondruch) You might want to consider packaging with -fno-strict-aliasing.
I suspect the perf loss should be minimal if noticeable at all and it would
mitigate these type of strict aliasing violations, yielding an artifact more likely to be correct.
Using LTO increases the exposure to strict aliasing violations. Since these bugs are rather
arcane and time consuming to track down you might consider this to be a good compromise.

Updated by vo.x (Vit Ondruch) almost 2 years ago

peterzhu2118 (Peter Zhu) wrote in #note-20:

@vo.x (Vit Ondruch) Do you need this backported to Ruby 3.2 or is it ok to keep it only on the master branch (i.e. it will be part of Ruby 3.3)?

Unfortunately, I have no idea how likely is to hit this in real applications. I'd leave the decision to you. For the moment, I have disabled the test case in Fedora, so the only advantage from my POV would be to make the Fedora .spec file a bit cleaner, which is never bad thing ;)

alanwu (Alan Wu) wrote in #note-21:

@vo.x (Vit Ondruch) You might want to consider packaging with -fno-strict-aliasing.

Generally, I leave the decision about the compiler options to distribution. From that POV, I don't think using -fno-strict-aliasing is an option. Of course if this was deemed the right compiler option by Ruby upstream, that would be different thing.

Actions #23

Updated by peterzhu2118 (Peter Zhu) almost 2 years ago

  • Status changed from Open to Closed

Applied in changeset git|273dca3aed7989120d57f80c789733d4bc870ffe.


Fix undefined behavior in shape.c

Under strict aliasing, writing to the memory location of a different
type is not allowed and will result in undefined behavior. This was
happening in shape.c due to rb_id_table_lookup writing to the memory
location of VALUE * that was casted from a rb_shape_t **.

This was causing test failures when compiled with LTO.

Fixes [Bug #19248]

Co-Authored-By: Alan Wu

Actions #24

Updated by peterzhu2118 (Peter Zhu) almost 2 years ago

  • Backport changed from 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN to 2.7: DONTNEED, 3.0: DONTNEED, 3.1: DONTNEED, 3.2: REQUIRED

Updated by peterzhu2118 (Peter Zhu) almost 2 years ago

I flagged it for backporting so it will be available on Ruby 3.2.1 :)

Updated by naruse (Yui NARUSE) almost 2 years ago

  • Backport changed from 2.7: DONTNEED, 3.0: DONTNEED, 3.1: DONTNEED, 3.2: REQUIRED to 2.7: DONTNEED, 3.0: DONTNEED, 3.1: DONTNEED, 3.2: DONE

ruby_3_2 08ae7f64dc52c2b61e451d6e79ebdae73d482677 merged revision(s) 273dca3aed7989120d57f80c789733d4bc870ffe.

Actions

Also available in: Atom PDF

Like0
Like0Like0Like1Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like1Like0Like0Like0Like0Like0Like0Like0Like0Like0