Bug #17529
Ractor Segfaults with GC enabled
Status: Closed
Description
I've been benchmarking Ractor on my machine with the following naive prime number generator:
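```ruby
# frozen_string_literal: true

# Naive primality test: n is prime if no integer i with 2 <= i < n divides it.
def prime?(n)
  2.upto(n - 1).none? { |i| n % i == 0 }
end

NUM_WORKERS = ARGV[0].to_i

# Producer: yields an endless sequence of candidates starting at 1,000,000.
producer = Ractor.new do
  i = 1000000
  loop { Ractor.yield i; i += 1 }
end

# Workers: take candidates from the producer and yield [n, prime?(n)].
workers = (1..NUM_WORKERS).map do
  Ractor.new producer do |producer|
    while n = producer.take
      Ractor.yield [n, prime?(n)]
    end
  end
end

# Main ractor: take a result from whichever worker is ready; print primes.
loop do
  _r, ( number, prime ) = Ractor.select(*workers)
  p number if prime
end
```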
The code inevitably segfaults, and the crash appears to be related to the garbage collector. If I add GC.disable to the script, it happily chugs along for several minutes on end without a problem.
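Since the crash goes away with GC disabled, one way to narrow it down is to turn GC off only around a suspect region instead of for the whole run. The helper below is a hypothetical sketch (`with_gc_disabled` is an invented name); it relies only on `GC.disable` returning whether GC was already disabled, and it re-enables GC only if it was the one that turned it off. It is a diagnostic aid, not a fix: GC state is process-wide, and with GC off, memory use keeps growing.

```ruby
# Hypothetical helper: run a block with GC disabled, then restore the
# previous GC state, even if the block raises.
def with_gc_disabled
  already_disabled = GC.disable # true if GC was already off
  yield
ensure
  # Only undo what this helper did; leave GC off if the caller had disabled it.
  GC.enable unless already_disabled
end

# Example: only the work inside the block runs with GC off.
value = with_gc_disabled { (1..10).sum } # => 55
```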
Files
Updated by prajjwal (Prajjwal Singh) almost 4 years ago
- File ractor.crash added
Updated by marcandre (Marc-Andre Lafortune) almost 4 years ago
- Related to Bug #17489: Ractor segfaults added
Updated by marcandre (Marc-Andre Lafortune) almost 4 years ago
Thanks for the report.
This is probably the same bug as #17489.
Updated by ko1 (Koichi Sasada) almost 4 years ago
I couldn't reproduce it. Could you tell me which value of ARGV[0] you used?
BTW, please fill in the "ruby -v:" field with your environment (even if it is in the crash log).
ruby 3.0.0p0 (2020-12-25 revision 95aff21468) [x86_64-linux]
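Note that in the original script `ARGV[0].to_i` silently turns a missing or non-numeric argument into 0, in which case no workers are started and `Ractor.select` has nothing to wait on. A stricter parse makes the worker count explicit; `worker_count` and its default of 4 are hypothetical additions, not part of the report.

```ruby
# Hypothetical helper: parse the worker count strictly instead of using
# String#to_i, which turns "" or "abc" into 0.
def worker_count(argv, default: 4)
  raw = argv.fetch(0, default.to_s)
  n = Integer(raw, 10) # raises ArgumentError for non-numeric input
  raise ArgumentError, "need at least one worker, got #{n}" if n < 1

  n
end

# Usage in the script: NUM_WORKERS = worker_count(ARGV)
```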
Updated by prajjwal (Prajjwal Singh) almost 4 years ago
It crashes for any value of ARGV[0] between 1 and 25 (that I tested).
The fact that it's happening so consistently for me and not for you makes me wonder whether the problem stems from my version of Linux or GCC, or perhaps some other compile-time option.
Here's my GCC version:
```
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-pc-linux-gnu/10.2.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /build/gcc/src/gcc/configure --prefix=/usr --libdir=/usr/lib --libexecdir=/usr/lib --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=https://bugs.archlinux.org/ --enable-languages=c,c++,ada,fortran,go,lto,objc,obj-c++,d --with-isl --with-linker-hash-style=gnu --with-system-zlib --enable-__cxa_atexit --enable-cet=auto --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-default-ssp --enable-gnu-indirect-function --enable-gnu-unique-object --enable-install-libiberty --enable-linker-build-id --enable-lto --enable-multilib --enable-plugin --enable-shared --enable-threads=posix --disable-libssp --disable-libstdcxx-pch --disable-libunwind-exceptions --disable-werror gdc_include_dir=/usr/include/dlang/gdc
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 10.2.0 (GCC)
```
And Linux:
```
Linux Wraith 5.9.14-arch1-1 #1 SMP PREEMPT Sat, 12 Dec 2020 14:37:12 +0000 x86_64 GNU/Linux
```
Ruby configure args:
```
'--prefix=/home/prajjwal/.rbenv/versions/3.0.0' '--enable-shared' 'LDFLAGS=-L/home/prajjwal/.rbenv/versions/3.0.0/lib ' 'CPPFLAGS=-I/home/prajjwal/.rbenv/versions/3.0.0/include '
```
Updated by prajjwal (Prajjwal Singh) almost 4 years ago
- ruby -v changed from 3.0.0 to ruby 3.0.0p0 (2020-12-25 revision 95aff21468) [x86_64-linux]
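The text in that field is what `ruby -v` prints, and Ruby exposes the same string at runtime as `RUBY_DESCRIPTION`, so a reproduction script can log it itself. A small sketch; `ruby_version_report` is a hypothetical helper, while the constants it reads are standard Ruby:

```ruby
# Hypothetical helper gathering the version details requested for the report.
def ruby_version_report
  {
    description: RUBY_DESCRIPTION, # same text as `ruby -v`
    version: RUBY_VERSION,
    patchlevel: RUBY_PATCHLEVEL,
    revision: RUBY_REVISION,
    platform: RUBY_PLATFORM
  }
end

ruby_version_report.each { |key, value| puts "#{key}: #{value}" }
```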
Updated by prajjwal (Prajjwal Singh) almost 4 years ago
I just confirmed that it only segfaults when Ruby is configured with the `--enable-shared` option (which rbenv does by default).
Even more info:
glibc 2.32-5
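Whether a particular Ruby was built with `--enable-shared` can be checked from Ruby itself through `RbConfig`: `RbConfig::CONFIG["ENABLE_SHARED"]` is `"yes"` or `"no"`, and `RbConfig::CONFIG["configure_args"]` holds the configure arguments. A short sketch; the method name `build_report` is hypothetical:

```ruby
require "rbconfig"

# Hypothetical helper: report how the running Ruby was built.
def build_report
  config = RbConfig::CONFIG
  {
    shared_libruby: config["ENABLE_SHARED"] == "yes", # built with --enable-shared?
    configure_args: config["configure_args"],
    cc: config["CC"]
  }
end

report = build_report
puts "libruby shared: #{report[:shared_libruby]}"
puts "configure args: #{report[:configure_args]}"
puts "C compiler: #{report[:cc]}"
```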
Updated by ko1 (Koichi Sasada) almost 4 years ago
- Status changed from Open to Assigned
- Assignee set to ko1 (Koichi Sasada)
Updated by ko1 (Koichi Sasada) almost 4 years ago
Hmm, I still can't reproduce it. Could someone try it and gather more information?
Updated by wanabe (_ wanabe) almost 3 years ago
I confirmed with 3.0.0 that the issue is reproducible.
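According to git bisect, it seems to have been fixed by fff1edf23ba28267bf57097c269f7fa87530e3fa and d0d6227a0da5925acf946a09191f172daf53baf2. With both commits cherry-picked onto ruby_3_0, segv.rb runs without crashing; without them, it segfaults: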
```
$ (git checkout origin/ruby_3_0 && git cherry-pick d0d6227a0da5925acf946a09191f172daf53baf2 fff1edf23ba28267bf57097c269f7fa87530e3fa && make miniruby -j8 ) >/dev/null 2>&1 && ./miniruby -v -W0 segv.rb
ruby 3.0.4p197 (2022-03-13 revision b04eb796e4) [x86_64-linux]
$ (git checkout origin/ruby_3_0 && make miniruby -j8 ) >/dev/null 2>&1 && ./miniruby -v -W0 segv.rb
ruby 3.0.4p197 (2022-03-13 revision f404b21f84) [x86_64-linux]
<internal:ractor>:627: [BUG] Segmentation fault at 0x0000000000000020
ruby 3.0.4p197 (2022-03-13 revision f404b21f84) [x86_64-linux]
-- Control frame information -----------------------------------------------
c:0003 p:0003 s:0015 e:000014 METHOD <internal:ractor>:627
c:0002 p:0019 s:0008 e:000007 BLOCK segv.rb:13 [FINISH]
c:0001 p:---- s:0003 e:000002 (none) [FINISH]
-- Ruby level backtrace information ----------------------------------------
segv.rb:13:in `block (4 levels) in <main>'
<internal:ractor>:627:in `yield'
-- Machine register context ------------------------------------------------
RIP: 0x0000560d0246f8d8 RBP: 0x00007f1c80f28920 RSP: 0x00007f1c80f28800
RAX: 0x0000000000000000 RBX: 0x00007f1c80f28810 RCX: 0x0000000000000000
RDX: 0x0000000000000001 RDI: 0x0000560d04738b98 RSI: 0x0000000000000000
R8: 0x0000560d04738e10 R9: 0x0000000000000000 R10: 0x0000000000000001
R11: 0x0000000000000002 R12: 0x00007f1c80f28820 R13: 0x0000560d04738b70
R14: 0x0000560d047416b8 R15: 0x00007f1c80f28810 EFL: 0x0000000000010246
-- C level backtrace information -------------------------------------------
./miniruby(rb_vm_bugreport+0x4a4) [0x560d02565b34]
./miniruby(rb_bug_for_fatal_signal+0xf4) [0x560d02369a54]
./miniruby(sigsegv+0x4d) [0x560d024bae1d]
[0x7f1c85894520]
./miniruby(ractor_select+0x478) [0x560d0246f8d8]
./miniruby(builtin_inline_class_627+0x3e) [0x560d0247019e]
./miniruby(vm_exec_core+0x32cd) [0x560d0254d31d]
./miniruby(rb_vm_exec+0x1a2) [0x560d0254f8d2]
./miniruby(thread_do_start_proc+0x339) [0x560d025057d9]
./miniruby(thread_start_func_2+0xc84) [0x560d02506554]
./miniruby(thread_start_func_1+0xde) [0x560d0250682e]
[0x7f1c858e6947]
[0x7f1c85976a44]
-- Other runtime information -----------------------------------------------
(snip)
```
I also modified the script as follows to make the crash easier to reproduce:
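```ruby
# Repeat the whole producer/worker setup many times to provoke the crash.
1000.times do |q|
  # Producer: yields `true` 1000 times, then finishes.
  producer = Ractor.new do
    1000.times do |i|
      Ractor.yield true
    end
  end

  # Workers: relay values from the producer until it is closed.
  workers = (1..10).map do
    Ractor.new producer do |producer|
      while n = producer.take
        Ractor.yield nil
      end
    rescue Ractor::ClosedError
    end
  end

  loop do
    _r, prime = Ractor.select(*workers)
  end
end
```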
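The `loop` in this script has no explicit exit. It relies on `Kernel#loop` treating `StopIteration` as a signal to stop, and `Ractor::ClosedError` is a subclass of `StopIteration`, so the loop can end once `Ractor.select` raises it after the workers have closed. The snippet below illustrates that `Kernel#loop` behavior without Ractors; `FakeClosedError` and `drain` are hypothetical stand-ins, not Ractor APIs.

```ruby
# Stand-in for Ractor::ClosedError, which also inherits from StopIteration.
class FakeClosedError < StopIteration; end

# Hypothetical source that "closes" once it runs out of values, like a
# worker Ractor that has finished.
def drain(values)
  queue = values.dup
  received = []
  loop do
    # Kernel#loop rescues StopIteration (and its subclasses) and stops.
    raise FakeClosedError, "closed" if queue.empty?

    received << queue.shift
  end
  received
end

p drain([1, 2, 3]) # prints [1, 2, 3]
```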
Updated by wanabe (_ wanabe) almost 3 years ago
I suspect that the btest failure on the ruby_3_0 branch in the icc-x64 environment may also be fixed by git cherry-pick d0d6227a0da5925acf946a09191f172daf53baf2 fff1edf23ba28267bf57097c269f7fa87530e3fa.
(An example of this failure is http://rubyci.s3.amazonaws.com/icc-x64/ruby-3.0/log/20220321T004434Z.log.html.gz#test.rb)
Updated by nagachika (Tomoyuki Chikanaga) almost 3 years ago
- Backport changed from 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: UNKNOWN to 2.6: DONTNEED, 2.7: DONTNEED, 3.0: REQUIRED
Updated by nagachika (Tomoyuki Chikanaga) almost 3 years ago
- Backport changed from 2.6: DONTNEED, 2.7: DONTNEED, 3.0: REQUIRED to 2.6: DONTNEED, 2.7: DONTNEED, 3.0: DONE
ruby_3_0 a72b7b898c69a116d754d599e8bb061761015255 merged revision(s) d0d6227a0da5925acf946a09191f172daf53baf2,fff1edf23ba28267bf57097c269f7fa87530e3fa.
Updated by nagachika (Tomoyuki Chikanaga) almost 3 years ago
- Status changed from Assigned to Closed
Applied in changeset git|a72b7b898c69a116d754d599e8bb061761015255.
merge revision(s) d0d6227a0da5925acf946a09191f172daf53baf2,fff1edf23ba28267bf57097c269f7fa87530e3fa: [Backport #17529]
alen should be actions number on ractor_select()
alen was number of rs, but it should be actions number
(taking ractors + receiving + yielding).
---
ractor.c | 13 ++++++-------
1 file changed, 6 insertions(+), 7 deletions(-)
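One reading of the first commit message, consistent with the backtrace above (the crash is in `ractor_select` reached from `Ractor.yield`): `alen` counted only the ractors passed in, but a yield adds an action of its own, so for `Ractor.yield`, which passes no ractors, the action buffer would be one slot short. The Ruby model below is a hypothetical simplification of that counting, not the C code in ractor.c; `SelectAction`, `build_actions`, `alen_before_fix`, and `alen_after_fix` are invented names.

```ruby
# Hypothetical, simplified model of how ractor_select() enumerates actions.
SelectAction = Struct.new(:type, :target)

# Each ractor argument is a "take", or a "receive" if it is the current
# ractor; a yield_value adds one more "yield" action.
def build_actions(ractors, current:, yielding:)
  actions = ractors.map do |r|
    SelectAction.new(r == current ? :receive : :take, r)
  end
  actions << SelectAction.new(:yield, current) if yielding
  actions
end

# Buffer size before the fix: only the ractors were counted.
def alen_before_fix(ractors, yielding:)
  ractors.size
end

# Buffer size after the fix: every action is counted.
def alen_after_fix(ractors, yielding:)
  ractors.size + (yielding ? 1 : 0)
end

# Ractor.yield(obj) corresponds to: no ractors, one yield action.
actions = build_actions([], current: :main, yielding: true)
puts "actions=#{actions.size} " \
     "before=#{alen_before_fix([], yielding: true)} " \
     "after=#{alen_after_fix([], yielding: true)}"
# prints "actions=1 before=0 after=1"
```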
fix Ractor.yield(obj, move: true)
Ractor.yield(obj, move: true) and
Ractor.select(..., yield_value: obj, move: true) tried to yield a
value with move semantics, but if the attempt fails, the obj
should not become a moved object.
To keep this rule, `wait_moving` wait status is introduced.
New yield/take process:
(1) If a ractor tried to yield (move:true), make taking ractor's
wait status `wait_moving` and make a moved object by
`ractor_move(obj)` and wakeup taking ractor.
(2) If a ractor tried to take a message from a ractor waiting for
yielding (move:true), wakeup the ractor and wait for (1).
---
bootstraptest/test_ractor.rb | 25 +++++++++++++++
ractor.c | 73 +++++++++++++++++++++++++++++++++++---------
ractor_core.h | 1 +
3 files changed, 84 insertions(+), 15 deletions(-)
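The rule in the second commit is that a failed `move: true` yield must leave the object usable: the object is moved only after a taker has been committed to receive it (`wait_moving`). Below is a hypothetical, single-threaded Ruby model of step (1) and of the failure case; `Movable`, `Taker`, and `try_yield_move` are invented names, and the real logic in ractor.c also involves locking, wakeups, and step (2), none of which is modeled here.

```ruby
# Stand-in for a movable object: after a move, the original handle must
# no longer be used (real Ruby raises Ractor::MovedError in that case).
class Movable
  attr_reader :value

  def initialize(value)
    @value = value
    @moved = false
  end

  def moved?
    @moved
  end

  # Models ractor_move(obj): hand out a fresh handle, retire the original.
  def move!
    raise "already moved" if @moved

    @moved = true
    Movable.new(@value)
  end
end

# A ractor blocked in take.
class Taker
  attr_reader :status, :received

  def initialize
    @status = :waiting_take
    @received = nil
  end

  def reserve_for_move!
    @status = :wait_moving
  end

  def deliver(obj)
    @received = obj
    @status = :running
  end
end

# Step (1): only when a taker is waiting, reserve it (wait_moving), move
# the object, and wake the taker. Without a taker, nothing is moved.
def try_yield_move(obj, taker)
  return false if taker.nil? || taker.status != :waiting_take

  taker.reserve_for_move!
  taker.deliver(obj.move!)
  true
end

box = Movable.new("payload")
p try_yield_move(box, nil)   # prints false: no taker
p box.moved?                 # prints false: box is still usable
taker = Taker.new
p try_yield_move(box, taker) # prints true
p box.moved?                 # prints true
puts taker.received.value    # prints payload
```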