Bug #19248
TestGCCompact#test_moving_objects_between_size_pools test failure
Status: Closed
Added by vo.x (Vit Ondruch) about 2 years ago. Updated almost 2 years ago.
Description
Testing on Fedora Rawhide with git|6af6857ecf, I observe the following error:
1) Error:
TestGCCompact#test_moving_objects_between_size_pools:
NoMethodError: undefined method `>=' for nil:NilClass
/builddir/build/BUILD/ruby-3.2.0-6af6857ecf/test/ruby/test_gc_compact.rb:278:in `<main>'
/builddir/build/BUILD/ruby-3.2.0-6af6857ecf/test/ruby/test_gc_compact.rb:256:in `test_moving_objects_between_size_pools'
/builddir/build/BUILD/ruby-3.2.0-6af6857ecf/tool/test/runner.rb:23:in `<top (required)>'
/builddir/build/BUILD/ruby-3.2.0-6af6857ecf/test/runner.rb:16:in `require_relative'
/builddir/build/BUILD/ruby-3.2.0-6af6857ecf/test/runner.rb:16:in `<main>'
When I previously tested with git|11acb7f7bc, everything was fine. I can only guess that this is related to git|bfc66e07b7e0134dfa2041c311dc56941fe1caf0
Updated by peterzhu2118 (Peter Zhu) about 2 years ago
Thank you for this bug report. Could you run this script and paste the output? It will help me to debug this issue.
require "objspace"
class Foo
  def add_ivars
    10.times do |i|
      instance_variable_set("@foo" + i.to_s, 0)
    end
  end
end
OBJ_COUNT = 500
GC.verify_compaction_references(expand_heap: true, toward: :empty)
ary = OBJ_COUNT.times.map { Foo.new }
ary.each(&:add_ivars)
GC.start
foo = Foo.new
foo.add_ivars
puts "--- BEFORE COMPACT"
puts GC.stat
puts
puts GC.stat_heap
puts
puts ObjectSpace.dump(ary[0])
puts
puts ObjectSpace.dump(foo)
puts
stats = GC.verify_compaction_references(expand_heap: true, toward: :empty)
puts "--- AFTER COMPACT"
puts stats
puts
puts GC.stat
puts
puts GC.stat_heap
puts
puts ObjectSpace.dump(ary[0])
puts
puts ObjectSpace.dump(foo)
Updated by eightbitraptor (Matt V-H) about 2 years ago
vo.x (Vit Ondruch) wrote:
Testing on Fedora Rawhide with git|6af6857ecf, I observe the following error:
Hey. I have been unable to replicate this test failure on either my development environment (Fedora 37) or a toolbox container running Rawhide, both on x86_64. In addition to the script output that @peterzhu2118 (Peter Zhu) has requested, could you let us know which architecture this is failing on for you, and whether the failure is consistent or intermittent?
As an aside, this should be a test failure rather than raising a NoMethodError. I'll look at that. (Addressed in PR #6978.)
Updated by vo.x (Vit Ondruch) about 2 years ago
eightbitraptor (Matthew Valentine-House) wrote in #note-2:
could you let us know what architecture this is failing on for you, is it consistent or intermittent?
It was x86_64, and I hit the issue on the first build run.
I did not have enough time to investigate; I just reported the ticket as soon as I encountered the issue. I'll try to provide more feedback tomorrow.
Updated by vo.x (Vit Ondruch) about 2 years ago
peterzhu2118 (Peter Zhu) wrote in #note-1:
Thank you for this bug report. Could you run this script and paste the output? It will help me to debug this issue.
$ make -C redhat-linux-build/ test-all TESTS="../test_19248.rb"
make: Entering directory '/builddir/build/BUILD/ruby-3.2.0-6af6857ecf/redhat-linux-build'
--- BEFORE COMPACT
{:count=>8, :time=>19, :heap_allocated_pages=>325, :heap_sorted_length=>611, :heap_allocatable_pages=>99, :heap_available_slots=>275275, :heap_live_slots=>17572, :heap_free_slots=>257703, :heap_final_slots=>0, :heap_marked_slots=>17502, :heap_eden_pages=>325, :heap_tomb_pages=>0, :total_allocated_pages=>512, :total_freed_pages=>187, :total_allocated_objects=>72809, :total_freed_objects=>55237, :malloc_increase_bytes=>62760, :malloc_increase_bytes_limit=>16777216, :minor_gc_count=>3, :major_gc_count=>5, :compact_count=>1, :read_barrier_faults=>0, :total_moved_objects=>15899, :remembered_wb_unprotected_objects=>123, :remembered_wb_unprotected_objects_limit=>246, :old_objects=>16872, :old_objects_limit=>33744, :oldmalloc_increase_bytes=>62760, :oldmalloc_increase_bytes_limit=>16777216}
{0=>{:slot_size=>40, :heap_allocatable_pages=>0, :heap_eden_pages=>85, :heap_eden_slots=>139182, :heap_tomb_pages=>0, :heap_tomb_slots=>0, :total_allocated_pages=>263, :total_freed_pages=>178, :force_major_gc_count=>0}, 1=>{:slot_size=>80, :heap_allocatable_pages=>0, :heap_eden_pages=>122, :heap_eden_slots=>99838, :heap_tomb_pages=>0, :heap_tomb_slots=>0, :total_allocated_pages=>131, :total_freed_pages=>9, :force_major_gc_count=>0}, 2=>{:slot_size=>160, :heap_allocatable_pages=>0, :heap_eden_pages=>68, :heap_eden_slots=>27799, :heap_tomb_pages=>0, :heap_tomb_slots=>0, :total_allocated_pages=>68, :total_freed_pages=>0, :force_major_gc_count=>0}, 3=>{:slot_size=>320, :heap_allocatable_pages=>17, :heap_eden_pages=>33, :heap_eden_slots=>6725, :heap_tomb_pages=>0, :heap_tomb_slots=>0, :total_allocated_pages=>33, :total_freed_pages=>0, :force_major_gc_count=>0}, 4=>{:slot_size=>640, :heap_allocatable_pages=>82, :heap_eden_pages=>17, :heap_eden_slots=>1731, :heap_tomb_pages=>0, :heap_tomb_slots=>0, :total_allocated_pages=>17, :total_freed_pages=>0, :force_major_gc_count=>0}}
{"address":"0x7f6471b0fa68", "type":"OBJECT", "shape_id":134, "slot_size":40, "class":"0x7f646c568178", "ivars":10, "memsize":136, "flags":{"wb_protected":true}}
{"address":"0x7f646ea5fbe8", "type":"OBJECT", "shape_id":144, "slot_size":160, "class":"0x7f646c568178", "ivars":10, "memsize":160, "flags":{"wb_protected":true}}
--- AFTER COMPACT
{:considered=>{:T_OBJECT=>541, :T_CLASS=>444, :T_MODULE=>44, :T_FLOAT=>3, :T_STRING=>7152, :T_REGEXP=>118, :T_ARRAY=>573, :T_HASH=>41, :T_STRUCT=>7, :T_BIGNUM=>30, :T_DATA=>277, :T_MATCH=>1, :T_SYMBOL=>8, :T_IMEMO=>7177, :T_ICLASS=>83}, :moved=>{:T_OBJECT=>541, :T_CLASS=>444, :T_MODULE=>44, :T_FLOAT=>3, :T_STRING=>7152, :T_REGEXP=>118, :T_ARRAY=>573, :T_HASH=>41, :T_STRUCT=>7, :T_BIGNUM=>30, :T_DATA=>277, :T_MATCH=>1, :T_SYMBOL=>2, :T_IMEMO=>7177, :T_ICLASS=>83}, :moved_up=>{}, :moved_down=>{:T_ARRAY=>1}}
{:count=>10, :time=>37, :heap_allocated_pages=>618, :heap_sorted_length=>833, :heap_allocatable_pages=>65, :heap_available_slots=>516160, :heap_live_slots=>17639, :heap_free_slots=>498521, :heap_final_slots=>0, :heap_marked_slots=>17558, :heap_eden_pages=>618, :heap_tomb_pages=>0, :total_allocated_pages=>1005, :total_freed_pages=>387, :total_allocated_objects=>73081, :total_freed_objects=>55442, :malloc_increase_bytes=>3760, :malloc_increase_bytes_limit=>16777216, :minor_gc_count=>3, :major_gc_count=>7, :compact_count=>2, :read_barrier_faults=>0, :total_moved_objects=>32392, :remembered_wb_unprotected_objects=>123, :remembered_wb_unprotected_objects_limit=>246, :old_objects=>17427, :old_objects_limit=>34854, :oldmalloc_increase_bytes=>3760, :oldmalloc_increase_bytes_limit=>16777216}
{0=>{:slot_size=>40, :heap_allocatable_pages=>0, :heap_eden_pages=>160, :heap_eden_slots=>261987, :heap_tomb_pages=>0, :heap_tomb_slots=>0, :total_allocated_pages=>513, :total_freed_pages=>353, :force_major_gc_count=>0}, 1=>{:slot_size=>80, :heap_allocatable_pages=>0, :heap_eden_pages=>222, :heap_eden_slots=>181663, :heap_tomb_pages=>0, :heap_tomb_slots=>0, :total_allocated_pages=>256, :total_freed_pages=>34, :force_major_gc_count=>0}, 2=>{:slot_size=>160, :heap_allocatable_pages=>0, :heap_eden_pages=>136, :heap_eden_slots=>55598, :heap_tomb_pages=>0, :heap_tomb_slots=>0, :total_allocated_pages=>136, :total_freed_pages=>0, :force_major_gc_count=>0}, 3=>{:slot_size=>320, :heap_allocatable_pages=>0, :heap_eden_pages=>66, :heap_eden_slots=>13450, :heap_tomb_pages=>0, :heap_tomb_slots=>0, :total_allocated_pages=>66, :total_freed_pages=>0, :force_major_gc_count=>0}, 4=>{:slot_size=>640, :heap_allocatable_pages=>65, :heap_eden_pages=>34, :heap_eden_slots=>3462, :heap_tomb_pages=>0, :heap_tomb_slots=>0, :total_allocated_pages=>34, :total_freed_pages=>0, :force_major_gc_count=>0}}
{"address":"0x7f646bfaafb0", "type":"OBJECT", "shape_id":134, "slot_size":40, "class":"0x7f646b26ece8", "ivars":10, "memsize":136, "flags":{"wb_protected":true, "old":true, "uncollectible":true, "marked":true}}
{"address":"0x7f646ea5fbe8", "type":"OBJECT", "shape_id":144, "slot_size":160, "class":"0x7f646b26ece8", "ivars":10, "memsize":160, "flags":{"wb_protected":true, "pinned":true}}
Run options:
--seed=23777
"--ruby=./miniruby -I/builddir/build/BUILD/ruby-3.2.0-6af6857ecf/lib -I. -I.ext/common /builddir/build/BUILD/ruby-3.2.0-6af6857ecf/tool/runruby.rb --extout=.ext -- --disable-gems"
--excludes-dir=/builddir/build/BUILD/ruby-3.2.0-6af6857ecf/test/excludes
--name=!/memory_leak/
# Running tests:
Finished tests in 0.009343s, 0.0000 tests/s, 0.0000 assertions/s.
0 tests, 0 assertions, 0 failures, 0 errors, 0 skips
ruby -v: ruby 3.2.0dev (2022-12-21 master 6af6857ecf) [x86_64-linux]
make: Leaving directory '/builddir/build/BUILD/ruby-3.2.0-6af6857ecf/redhat-linux-build'
eightbitraptor (Matthew Valentine-House) wrote in #note-2:
is it consistent or intermittent?
It seems to be consistent. 5 failures out of 5 attempts:
$ make -C redhat-linux-build/ test-all TESTS="-v -n /TestGCCompact#test_moving_objects_between_size_pools/"
make: Entering directory '/builddir/build/BUILD/ruby-3.2.0-6af6857ecf/redhat-linux-build'
Run options:
--seed=40875
"--ruby=./miniruby -I/builddir/build/BUILD/ruby-3.2.0-6af6857ecf/lib -I. -I.ext/common /builddir/build/BUILD/ruby-3.2.0-6af6857ecf/tool/runruby.rb --extout=.ext -- --disable-gems"
--excludes-dir=/builddir/build/BUILD/ruby-3.2.0-6af6857ecf/test/excludes
--name=!/memory_leak/
-v
-n
/TestGCCompact#test_moving_objects_between_size_pools/
# Running tests:
[1/0] TestGCCompact#test_moving_objects_between_size_pools = 0.12 s
1) Error:
TestGCCompact#test_moving_objects_between_size_pools:
NoMethodError: undefined method `>=' for nil:NilClass
/builddir/build/BUILD/ruby-3.2.0-6af6857ecf/test/ruby/test_gc_compact.rb:278:in `<main>'
/builddir/build/BUILD/ruby-3.2.0-6af6857ecf/test/ruby/test_gc_compact.rb:256:in `test_moving_objects_between_size_pools'
/builddir/build/BUILD/ruby-3.2.0-6af6857ecf/tool/test/runner.rb:23:in `<top (required)>'
/builddir/build/BUILD/ruby-3.2.0-6af6857ecf/test/runner.rb:16:in `require_relative'
/builddir/build/BUILD/ruby-3.2.0-6af6857ecf/test/runner.rb:16:in `<main>'
Finished tests in 4.621852s, 0.2164 tests/s, 0.6491 assertions/s.
1 tests, 3 assertions, 0 failures, 1 errors, 0 skips
ruby -v: ruby 3.2.0dev (2022-12-21 master 6af6857ecf) [x86_64-linux]
make: *** [uncommon.mk:856: yes-test-all] Error 1
make: Leaving directory '/builddir/build/BUILD/ruby-3.2.0-6af6857ecf/redhat-linux-build'
Updated by peterzhu2118 (Peter Zhu) almost 2 years ago
Thanks for running the script! From the output it looks like the read barrier is not being triggered (read_barrier_faults is 0), so the objects aren't being moved back. It also looks like the objects are being moved, just not upwards (:moved_up is empty). I'm really not sure what's preventing the objects from being moved upwards.
I tried running a Fedora Rawhide docker container (from registry.fedoraproject.org/fedora:rawhide) on my x86_64 Ubuntu 22.04 machine, but I was not able to replicate this bug. I'll try a VM next, but I've got a few other things I need to do before the Ruby 3.2.0 release (which is in about 2 days), so I can't promise that I'll be able to work on this before then. But I'll certainly get back to this in the new year!
Updated by vo.x (Vit Ondruch) almost 2 years ago
Just for the record, here is the result from the official builder (it will be garbage-collected in several days, sorry):
https://koji.fedoraproject.org/koji/taskinfo?taskID=95624219
You can check the build.log for each of the platforms. It seems the test passes on i686/s390x while failing on x86_64/ppc64le/aarch64. If you can't reproduce it in a container, could it be kernel-related?
Updated by peterzhu2118 (Peter Zhu) almost 2 years ago
Thank you for providing the logs, I took a look at it. I noticed these lines:
+ echo 'Patch #0 (ruby-2.3.0-ruby_version.patch):'
+ /usr/bin/patch --no-backup-if-mismatch -f -p1 --fuzz=0
patching file configure.ac
patching file template/ruby.pc.in
patching file lib/rdoc/ri/paths.rb
patching file tool/rbinstall.rb
patching file lib/rubygems/defaults.rb
patching file test/rubygems/test_gem.rb
patching file configure.ac
Patch #1 (ruby-2.1.0-Prevent-duplicated-paths-when-empty-version-string-i.patch):
+ echo 'Patch #1 (ruby-2.1.0-Prevent-duplicated-paths-when-empty-version-string-i.patch):'
+ /usr/bin/patch --no-backup-if-mismatch -f -p1 --fuzz=0
patching file configure.ac
patching file loadpath.c
patching file tool/mkconfig.rb
Patch #2 (ruby-2.1.0-Enable-configuration-of-archlibdir.patch):
+ echo 'Patch #2 (ruby-2.1.0-Enable-configuration-of-archlibdir.patch):'
+ /usr/bin/patch --no-backup-if-mismatch -f -p1 --fuzz=0
patching file configure.ac
Patch #3 (ruby-2.1.0-always-use-i386.patch):
+ echo 'Patch #3 (ruby-2.1.0-always-use-i386.patch):'
+ /usr/bin/patch --no-backup-if-mismatch -f -p1 --fuzz=0
patching file configure.ac
Patch #4 (ruby-2.1.0-custom-rubygems-location.patch):
+ echo 'Patch #4 (ruby-2.1.0-custom-rubygems-location.patch):'
+ /usr/bin/patch --no-backup-if-mismatch -f -p1 --fuzz=0
patching file configure.ac
patching file loadpath.c
patching file template/verconf.h.tmpl
patching file tool/rbinstall.rb
Patch #6 (ruby-2.7.0-Initialize-ABRT-hook.patch):
+ echo 'Patch #6 (ruby-2.7.0-Initialize-ABRT-hook.patch):'
+ /usr/bin/patch --no-backup-if-mismatch -f -p1 --fuzz=0
patching file abrt.c
patching file common.mk
patching file ruby.c
Patch #7 (ruby-3.1.0-Don-t-query-RubyVM-FrozenCore-for-class-path.patch):
+ echo 'Patch #7 (ruby-3.1.0-Don-t-query-RubyVM-FrozenCore-for-class-path.patch):'
+ /usr/bin/patch --no-backup-if-mismatch -f -p1 --fuzz=0
patching file vm.c
Patch #8 (ruby-2.7.1-Timeout-the-test_bug_reporter_add-witout-raising-err.patch):
+ echo 'Patch #8 (ruby-2.7.1-Timeout-the-test_bug_reporter_add-witout-raising-err.patch):'
+ /usr/bin/patch --no-backup-if-mismatch -f -p1 --fuzz=0
patching file test/-ext-/bug_reporter/test_bug_reporter.rb
It looks like your build system is patching files in Ruby? If so, could you provide these files and/or look into potentially upstreaming it?
In particular, there's a file called always-use-i386. Is it forcing things to be built for i386?
Updated by vo.x (Vit Ondruch) almost 2 years ago
peterzhu2118 (Peter Zhu) wrote in #note-7:
It looks like your build system is patching files in Ruby? If so, could you provide these files and/or look into potentially upstreaming it?
I have certainly tried at times. But let me try to build without them.
In particular, there's a file called always-use-i386. Is it forcing things to be built for i386?
We have been carrying this around for a while:
https://src.fedoraproject.org/rpms/ruby/blob/rawhide/f/ruby-2.1.0-always-use-i386.patch
IMO it should apply only to i386/i686 builds. I have tried to get rid of it, but I am not sure what effect it actually has 😬
Updated by vo.x (Vit Ondruch) almost 2 years ago
vo.x (Vit Ondruch) wrote in #note-8:
peterzhu2118 (Peter Zhu) wrote in #note-7:
It looks like your build system is patching files in Ruby? If so, could you provide these files and/or look into potentially upstreaming it?
I have certainly tried at times. But let me try to build without them.
The good news is that it works without the patches and with the default configuration. Now to find out what causes the issue; let me try.
Updated by vo.x (Vit Ondruch) almost 2 years ago
The patches make no difference.
Updated by vo.x (Vit Ondruch) almost 2 years ago
So far, it seems the key is the compiler flags. With the default flags, the test case passes; with the Fedora flags, it fails. These are hopefully the relevant parts of the log:
... snip ...
+ CFLAGS='-O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection'
+ export CFLAGS
+ CXXFLAGS='-O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection'
+ export CXXFLAGS
+ FFLAGS='-O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -I/usr/lib64/gfortran/modules'
+ export FFLAGS
+ FCFLAGS='-O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -I/usr/lib64/gfortran/modules'
+ export FCFLAGS
+ VALAFLAGS=-g
+ export VALAFLAGS
+ LDFLAGS='-Wl,-z,relro -Wl,--as-needed -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -Wl,--build-id=sha1 '
+ export LDFLAGS
+ LT_SYS_LIBRARY_PATH=/usr/lib64:
+ export LT_SYS_LIBRARY_PATH
+ CC=gcc
+ export CC
+ CXX=g++
+ export CXX
... snip ...
---
Configuration summary for ruby version 3.2.0
* Installation prefix: /usr
* exec prefix: /usr
* arch: x86_64-linux
* site arch: ${arch}
* RUBY_BASE_NAME: ruby
* ruby lib prefix: ${libdir}/${RUBY_BASE_NAME}
* site libraries path: ${rubylibprefix}/${sitearch}
* vendor path: ${rubylibprefix}/vendor_ruby
* target OS: linux
* compiler: gcc
* with thread: pthread
* with coroutine: amd64
* enable shared libs: no
* dynamic library ext: so
* CFLAGS: ${optflags} ${debugflags} ${warnflags}
* LDFLAGS: -L. -Wl,-z,relro -Wl,--as-needed -Wl,-z,now \
-specs=/usr/lib/rpm/redhat/redhat-hardened-ld \
-specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 \
-Wl,--build-id=sha1 -fstack-protector-strong \
-rdynamic -Wl,-export-dynamic
* DLDFLAGS: -Wl,-z,relro -Wl,--as-needed -Wl,-z,now \
-specs=/usr/lib/rpm/redhat/redhat-hardened-ld \
-specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 \
-Wl,--build-id=sha1 \
-Wl,--compress-debug-sections=zlib
* optflags: -O3 -fno-fast-math
* debugflags: -ggdb3
* warnflags: -Wall -Wextra -Wdeprecated-declarations \
-Wdiv-by-zero -Wduplicated-cond \
-Wimplicit-function-declaration -Wimplicit-int \
-Wmisleading-indentation -Wpointer-arith \
-Wwrite-strings -Wold-style-definition \
-Wimplicit-fallthrough=0 -Wmissing-noreturn \
-Wno-cast-function-type \
-Wno-constant-logical-operand -Wno-long-long \
-Wno-missing-field-initializers \
-Wno-overlength-strings \
-Wno-packed-bitfield-compat \
-Wno-parentheses-equality -Wno-self-assign \
-Wno-tautological-compare -Wno-unused-parameter \
-Wno-unused-value -Wsuggest-attribute=format \
-Wsuggest-attribute=noreturn -Wunused-variable \
-Wundef
* strip command: strip -S -x
* install doc: rdoc
* MJIT support: yes
* YJIT support: yes
* man page type: doc
---
~/build/BUILD/ruby-3.2.0-c5eefb7f37
+ popd
+ /usr/bin/make -O -j8 V=1 VERBOSE=1 'COPY=cp -p' -C redhat-linux-build
make: Entering directory '/builddir/build/BUILD/ruby-3.2.0-c5eefb7f37/redhat-linux-build'
BASERUBY = echo executable host ruby is required. use --with-baseruby option.; false
CC = gcc
LD = ld
LDSHARED = gcc -shared
CFLAGS = -O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -m64
XCFLAGS = -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -fstack-protector-strong -fno-strict-overflow -fvisibility=hidden -fexcess-precision=standard -DRUBY_EXPORT -fPIE -I. -I.ext/include/x86_64-linux -I/builddir/build/BUILD/ruby-3.2.0-c5eefb7f37/include -I/builddir/build/BUILD/ruby-3.2.0-c5eefb7f37 -I/builddir/build/BUILD/ruby-3.2.0-c5eefb7f37/enc/unicode/15.0.0
CPPFLAGS =
DLDFLAGS = -Wl,-z,relro -Wl,--as-needed -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -Wl,--build-id=sha1 -Wl,--compress-debug-sections=zlib -fstack-protector-strong -pie -m64
SOLIBS = yjit/target/release/libyjit.a -lz -lrt -lrt -lgmp -ldl -lcrypt -lm -lpthread
LANG = C
LC_ALL =
LC_CTYPE =
MFLAGS = -w -j8 -Otarget --jobserver-auth=4,5
RUSTC = rustc
YJIT_RUSTC_ARGS = --crate-name=yjit --crate-type=staticlib --edition=2021 -g -C opt-level=3 -C overflow-checks=on '--out-dir=/builddir/build/BUILD/ruby-3.2.0-c5eefb7f37/redhat-linux-build/yjit/target/release/' /builddir/build/BUILD/ruby-3.2.0-c5eefb7f37/yjit/src/lib.rs
gcc (GCC) 12.2.1 20221121 (Red Hat 12.2.1-4)
Copyright (C) 2022 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
... snip ...
Updated by peterzhu2118 (Peter Zhu) almost 2 years ago
Thank you for checking the patches and testing different configurations. It looks like you're using quite a lot of custom configuration flags, most of which I'm not too familiar with. I'll look into this in the new year and debug the issue.
Updated by vo.x (Vit Ondruch) almost 2 years ago
peterzhu2118 (Peter Zhu) wrote in #note-12:
It looks like you're using quite a lot of custom configuration flags
Right, mostly hardening, which can certainly influence memory layout and whatnot.
most of which I'm not too familiar with.
I think that these are documented here:
https://src.fedoraproject.org/rpms/redhat-rpm-config/blob/rawhide/f/buildflags.md
The "Individual compiler flags" section and below might be of interest to you.
Updated by mtasaka (Mamoru TASAKA) almost 2 years ago
This also seems to be due to LTO: adding %global _lto_cflags %{nil} to ruby.spec, i.e. removing -flto=auto -ffat-lto-objects from the compilation flags, seems to make this test pass.
Updated by vo.x (Vit Ondruch) almost 2 years ago
mtasaka (Mamoru Tasaka) wrote in #note-14:
This also seems to be due to LTO: adding %global _lto_cflags %{nil} to ruby.spec, i.e. removing -flto=auto -ffat-lto-objects from the compilation flags, seems to make this test pass.
Thanks a lot for your help with the analysis!
BTW, I have asked Fedorians whether there is a convenient way to set up the build options Fedora uses [1], and it seems this could be the way:
$ sudo dnf install redhat-rpm-config
$ rpm -E '%set_build_flags'
CFLAGS="${CFLAGS:--O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection}" ; export CFLAGS ;
CXXFLAGS="${CXXFLAGS:--O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection}" ; export CXXFLAGS ;
FFLAGS="${FFLAGS:--O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -I/usr/lib64/gfortran/modules}" ; export FFLAGS ;
FCFLAGS="${FCFLAGS:--O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -I/usr/lib64/gfortran/modules}" ; export FCFLAGS ;
VALAFLAGS="${VALAFLAGS:--g}" ; export VALAFLAGS ;
LDFLAGS="${LDFLAGS:--Wl,-z,relro -Wl,--as-needed -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -Wl,--build-id=sha1 }" ; export LDFLAGS ;
LT_SYS_LIBRARY_PATH="${LT_SYS_LIBRARY_PATH:-/usr/lib64:}" ; export LT_SYS_LIBRARY_PATH ;
CC="${CC:-gcc}" ; export CC ;
CXX="${CXX:-g++}" ; export CXX
$ eval `rpm -E '%set_build_flags'`
Updated by peterzhu2118 (Peter Zhu) almost 2 years ago
Thanks to @mtasaka (Mamoru TASAKA) and @vo.x (Vit Ondruch) for debugging this further! Using this information, I think this is a bug in GCC. Here's what I did:
- I can confirm that I can reproduce this bug on Ubuntu 22.04 with GCC 11.3.0 using:
  CFLAGS='-O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fstack-protector-strong -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection'
- While debugging, I noticed that the function rb_shape_traverse_from_new_root had seemingly impossible behaviour. I confirmed that if I disable optimizations on that function, the test passes as expected:
  #pragma GCC push_options
  #pragma GCC optimize ("O0")
  rb_shape_t *
  rb_shape_traverse_from_new_root(rb_shape_t *initial_shape, rb_shape_t *dest_shape)
  {
      // ...
  }
  #pragma GCC pop_options
- I further isolated it by setting CFLAGS='-flto=auto', which causes Ruby to segfault.
- Interestingly, disabling optimizations (CFLAGS='-O0 -flto=auto') causes the test to pass.
As such, I think there's an issue with -flto that causes bugs in Ruby.
Updated by alanwu (Alan Wu) almost 2 years ago
Looking at rb_shape_traverse_from_new_root(), it seems like it's triggering undefined behavior (UB) in a way that has also failed with LTO in the past. Maybe try this patch to confirm?
diff --git a/shape.c b/shape.c
index 7580003412..8e75f4a270 100644
--- a/shape.c
+++ b/shape.c
@@ -449,6 +449,7 @@ rb_shape_traverse_from_new_root(rb_shape_t *initial_shape, rb_shape_t *dest_shap
 {
     RUBY_ASSERT(initial_shape->type == SHAPE_T_OBJECT);
     rb_shape_t *next_shape = initial_shape;
+    VALUE lookup_result;
 
     if (dest_shape->type != initial_shape->type) {
         next_shape = rb_shape_traverse_from_new_root(initial_shape, rb_shape_get_parent(dest_shape));
@@ -462,7 +463,10 @@ rb_shape_traverse_from_new_root(rb_shape_t *initial_shape, rb_shape_t *dest_shap
         if (!next_shape->edges) {
             return NULL;
         }
-        if (!rb_id_table_lookup(next_shape->edges, dest_shape->edge_name, (VALUE *)&next_shape)) {
+        if (rb_id_table_lookup(next_shape->edges, dest_shape->edge_name, &lookup_result)) {
+            next_shape = (rb_shape_t *)lookup_result;
+        }
+        else {
             return NULL;
         }
         break;
I wrote a post about this class of issue in the past: https://alanwu.space/post/strict-aliasing/
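To make the aliasing problem concrete, here is a minimal, self-contained sketch of the pattern the patch changes. It is not Ruby's code: value_t, shape_t, table_lookup and find_shape are hypothetical stand-ins for VALUE, rb_shape_t, rb_id_table_lookup and the lookup inside rb_shape_traverse_from_new_root, assuming (as the diff suggests) that the lookup stores its result through a VALUE * out-parameter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins, NOT Ruby's definitions: value_t plays the
   role of VALUE (an integer wide enough to hold a pointer) and
   shape_t plays the role of rb_shape_t. */
typedef uintptr_t value_t;
typedef struct shape { int id; } shape_t;

/* Hypothetical lookup with the same out-parameter convention as the
   one in the diff: on success it stores the entry through a value_t *
   and returns nonzero; on failure it returns 0 and leaves *out alone. */
static int table_lookup(const value_t *keys, const value_t *vals, size_t n,
                        value_t key, value_t *out)
{
    assert(out != NULL);
    for (size_t i = 0; i < n; i++) {
        if (keys[i] == key) {
            *out = vals[i];
            return 1;
        }
    }
    return 0;
}

/* Well-defined pattern (the shape of the fix): look up into a local
   whose type matches the out-parameter, then convert it to a pointer.
   The buggy pattern passed (value_t *)&shape_ptr instead, so the callee
   wrote an object of type shape_t * through a value_t lvalue, which
   breaks C's effective-type (strict aliasing) rules. An optimizer may
   then assume shape_ptr was never modified by the call. */
static shape_t *find_shape(const value_t *keys, const value_t *vals, size_t n,
                           value_t key)
{
    value_t lookup_result;
    if (table_lookup(keys, vals, n, key, &lookup_result)) {
        return (shape_t *)(void *)lookup_result;
    }
    return NULL;
}
```

The design mirrors the patch: the callee only writes an object whose declared type matches its parameter, and the integer-to-pointer conversion happens afterwards as an ordinary value conversion, so no object is ever accessed through an lvalue of an incompatible type.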
Updated by vo.x (Vit Ondruch) almost 2 years ago
alanwu (Alan Wu) wrote in #note-17:
Maybe try this patch to confirm?
The patch helps to mitigate the issue.
Updated by peterzhu2118 (Peter Zhu) almost 2 years ago
Thank you @alanwu (Alan Wu) for looking into this and @vo.x (Vit Ondruch) for confirming the fix. I've also confirmed that this fix does indeed make the tests pass. I've opened a PR here: https://github.com/ruby/ruby/pull/7067
Updated by peterzhu2118 (Peter Zhu) almost 2 years ago
@vo.x (Vit Ondruch) Do you need this backported to Ruby 3.2 or is it ok to keep it only on the master branch (i.e. it will be part of Ruby 3.3)?
Updated by alanwu (Alan Wu) almost 2 years ago
@vo.x (Vit Ondruch) You might want to consider packaging with -fno-strict-aliasing.
I suspect the performance loss would be minimal, if noticeable at all, and it would
mitigate these types of strict aliasing violations, yielding an artifact more likely to be correct.
Using LTO increases the exposure to strict aliasing violations. Since these bugs are rather
arcane and time-consuming to track down, you might consider this a good compromise.
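As a rough illustration of what -fno-strict-aliasing changes, consider this textbook function (a generic example, not taken from Ruby). Under the C aliasing rules the compiler may assume that an int * and a float * never refer to the same object; GCC enables that assumption (-fstrict-aliasing) at -O2 and above. Presumably this is also why LTO increases exposure: once a callee in another translation unit can be inlined, the optimizer sees both the store and the later reload in one place.

```c
#include <assert.h>

/* With -fstrict-aliasing, the compiler may assume the store through f
   cannot modify *i, so it is allowed to return the constant 1 without
   reloading *i. With -fno-strict-aliasing, GCC must assume the two
   pointers might overlap and reload *i after the float store.
   Passing pointers to the same object is undefined behavior in ISO C;
   only call this with distinct objects unless strict aliasing is off. */
int store_and_reload(int *i, float *f)
{
    *i = 1;
    *f = 2.0f;
    return *i;
}
```

With distinct objects the result is 1 regardless of flags; the flag only matters when a program (like the pre-fix shape.c) accesses one object through two incompatible pointer types.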
Updated by vo.x (Vit Ondruch) almost 2 years ago
peterzhu2118 (Peter Zhu) wrote in #note-20:
@vo.x (Vit Ondruch) Do you need this backported to Ruby 3.2 or is it ok to keep it only on the master branch (i.e. it will be part of Ruby 3.3)?
Unfortunately, I have no idea how likely it is to hit this in real applications, so I'd leave the decision to you. For the moment, I have disabled the test case in Fedora, so from my POV the only advantage would be a slightly cleaner Fedora .spec file, which is never a bad thing ;)
alanwu (Alan Wu) wrote in #note-21:
@vo.x (Vit Ondruch) You might want to consider packaging with -fno-strict-aliasing.
Generally, I leave decisions about compiler options to the distribution. From that POV, I don't think using -fno-strict-aliasing is an option. Of course, if Ruby upstream deemed it the right compiler option, that would be a different thing.
Updated by peterzhu2118 (Peter Zhu) almost 2 years ago
- Status changed from Open to Closed
Applied in changeset git|273dca3aed7989120d57f80c789733d4bc870ffe.
Fix undefined behavior in shape.c
Under strict aliasing, writing to the memory location of a different
type is not allowed and will result in undefined behavior. This was
happening in shape.c due to rb_id_table_lookup writing to the memory
location of VALUE * that was casted from a rb_shape_t **.
This was causing test failures when compiled with LTO.
Fixes [Bug #19248]
Co-Authored-By: Alan Wu <alanwu@ruby-lang.org>
Updated by peterzhu2118 (Peter Zhu) almost 2 years ago
- Backport changed from 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN to 2.7: DONTNEED, 3.0: DONTNEED, 3.1: DONTNEED, 3.2: REQUIRED
Updated by peterzhu2118 (Peter Zhu) almost 2 years ago
I flagged it for backporting so it will be available on Ruby 3.2.1 :)
Updated by naruse (Yui NARUSE) almost 2 years ago
- Backport changed from 2.7: DONTNEED, 3.0: DONTNEED, 3.1: DONTNEED, 3.2: REQUIRED to 2.7: DONTNEED, 3.0: DONTNEED, 3.1: DONTNEED, 3.2: DONE
ruby_3_2 08ae7f64dc52c2b61e451d6e79ebdae73d482677 merged revision(s) 273dca3aed7989120d57f80c789733d4bc870ffe.