Bug #19234: [3.2.0dev] YJIT code GC can lead to crashes - Ruby - Ruby Issue Tracking System

Added by byroot (Jean Boussier) over 2 years ago. Updated over 2 years ago.

Status:

Closed

Assignee:

k0kubun (Takashi Kokubun)

Target version:

3.2

ruby -v:

ruby 3.2.0dev (2022-12-13T16:07:29Z master a66a69865d) +YJIT [x86_64-linux]

Backport:

2.7: DONTNEED, 3.0: DONTNEED, 3.1: DONTNEED

[ruby-core:111281]

Description

Filing this bug here in case some people may have observed it too and may have more information, and also to keep track of it for the upcoming 3.2.0 release.

After changing some settings on our CI to make sure YJIT's code_gc would trigger, we discovered that it sometimes causes crashes.

The crash can take many different forms (e.g. [BUG] Segmentation fault at 0x00005604a8e78006 or [BUG] Illegal instruction at 0x0000aaaacc0ce4c0), and it happens on both x86 and arm64.

It happens very consistently on our CI, but only after running for 15 to 20 minutes, and we haven't been able to reduce it to a local reproduction script.

When it happens, however, the backtrace isn't really helpful:

-- C level backtrace information -------------------------------------------
/usr/local/ruby/bin/real-ruby(rb_print_backtrace+0x11) [0x5604a8a6df7d] vm_dump.c:770
/usr/local/ruby/bin/real-ruby(rb_vm_bugreport) vm_dump.c:1065
/usr/local/ruby/bin/real-ruby(rb_bug_for_fatal_signal+0xee) [0x5604a8ba927e] error.c:813
/usr/local/ruby/bin/real-ruby(sigsegv+0x4d) [0x5604a89c3ded] signal.c:964
/lib/x86_64-linux-gnu/libpthread.so.0(__restore_rt+0x0) [0x7fb5b4285420]
[0x5604acd31079]

Like regular GC bugs, it is likely that the code GC needs to trigger at a very specific place for the bug to happen. Our attempts at triggering it manually with RubyVM::YJIT.code_gc, or at setting the executable memory very low so that it triggers more often, didn't yield a simpler reproduction.
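For anyone who wants to try the manual approach, here is a minimal sketch of the kind of stress loop described above. It is not a reproduction of the crash (we don't have one). The helper name `stress_yjit_code_gc` and its parameters are made up for illustration; only `RubyVM::YJIT.enabled?` and `RubyVM::YJIT.code_gc` are real APIs, and the sketch skips the code GC calls when YJIT is unavailable or disabled.

```ruby
# Hypothetical stress helper, assuming the goal is simply to compile a lot of
# code and force code GC often. It is not known to reproduce the crash.
def stress_yjit_code_gc(rounds: 20, methods_per_round: 50, calls: 40)
  yjit = defined?(RubyVM::YJIT) ? RubyVM::YJIT : nil
  can_gc = !yjit.nil? &&
           yjit.respond_to?(:enabled?) && yjit.enabled? &&
           yjit.respond_to?(:code_gc)

  holder = Class.new
  obj = holder.new
  gc_runs = 0
  checksum = 0

  rounds.times do |round|
    methods_per_round.times do |i|
      # Define a fresh method so that there is new code to compile.
      name = :"m_#{round}_#{i}"
      holder.class_eval("def #{name}(x); x + #{i}; end")

      # Call it repeatedly so that YJIT has a chance to compile it.
      calls.times { checksum += obj.public_send(name, 1) }
    end

    # Force a code GC between rounds when YJIT is actually running.
    if can_gc
      yjit.code_gc
      gc_runs += 1
    end
  end

  { gc_runs: gc_runs, checksum: checksum }
end

if __FILE__ == $PROGRAM_NAME
  p stress_yjit_code_gc
end
```

Running the script with something like `ruby --yjit --yjit-exec-mem-size=1 script.rb` (the file name is a placeholder) keeps the executable memory small, which should make automatic code GC more likely on builds where it is enabled.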

Both @k0kubun (Takashi Kokubun) and @alanwu (Alan Wu) are investigating it right now.

Updated by byroot (Jean Boussier) over 2 years ago

Backport changed from 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN to 2.7: DONTNEED, 3.0: DONTNEED, 3.1: DONTNEED

Updated by alanwu (Alan Wu) over 2 years ago

Status changed from Open to Closed

Applied in changeset git|5fa608ed79645464bf80fa318d89745159301471.

YJIT: Fix code GC freeing stubs with a trampoline (#6937)

Stubs we generate for invalidation don't necessarily co-locate with the
code that jumps to the stub. Since we rely on co-location to keep stubs
alive as they are in the outlined code block, it used to be possible for
code GC inside branch_stub_hit() to free the stub that's its direct
caller, leading us to return to freed code after.

Stubs used to look like:

mov arg0, branch_ptr
mov arg1, target_idx
mov arg2, ec
call branch_stub_hit
jmp return_reg

Since the call and the jump after the call are the same for all stubs, we
can extract them and use a static trampoline for them. That makes
branch_stub_hit() always return to static code. Stubs now look like:

mov arg0, branch_ptr
mov arg1, target_idx
jmp trampoline

Where the trampoline is:

mov arg2, ec
call branch_stub_hit
jmp return_reg

Code GC can now free stubs without problems since we'll always return
to the trampoline, which we generate once on boot and which lives forever.

This might save a small bit of memory due to factoring out the static
part of stubs, but it's probably minor.

[Bug #19234]

Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>
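
To make the failure mode in the commit message easier to follow, here is a toy model in Ruby. It is only a sketch: `ToyCodeMemory`, `toy_branch_stub_hit` and the two `*_layout_*` helpers are hypothetical names and are not part of YJIT, and the model assumes the simplest possible code GC, one that frees every region not marked permanent. What it shows is where branch_stub_hit() returns to under each stub layout.

```ruby
# Toy model only: these classes and methods are hypothetical, not YJIT APIs.
class ToyCodeMemory
  def initialize
    @regions = {}        # address => { label:, permanent: }
    @next_addr = 0x1000
  end

  # Reserve a region of "machine code" and return its address.
  def alloc(label, permanent: false)
    addr = @next_addr
    @next_addr += 0x10
    @regions[addr] = { label: label, permanent: permanent }
    addr
  end

  # Simplified code GC: every region that is not permanent is freed.
  def code_gc
    @regions.select! { |_addr, region| region[:permanent] }
  end

  def live?(addr)
    @regions.key?(addr)
  end
end

# Stand-in for branch_stub_hit(): code GC runs before it returns.
# Returns true when the return address still points at live code.
def toy_branch_stub_hit(mem, return_addr)
  mem.code_gc
  mem.live?(return_addr)
end

# Old layout: the stub itself contains `call branch_stub_hit`, so the
# return address points back into the stub.
def old_layout_returns_to_live_code?
  mem = ToyCodeMemory.new
  stub = mem.alloc("stub")
  toy_branch_stub_hit(mem, stub)
end

# New layout: the stub only does `jmp trampoline`; the call is made from the
# trampoline, which is generated once and never freed.
def new_layout_returns_to_live_code?
  mem = ToyCodeMemory.new
  trampoline = mem.alloc("trampoline", permanent: true)
  mem.alloc("stub")
  toy_branch_stub_hit(mem, trampoline)
end

puts "old layout returns to live code: #{old_layout_returns_to_live_code?}"
puts "new layout returns to live code: #{new_layout_returns_to_live_code?}"
```

In this model the old layout reports `false` (the return address was freed) and the new layout reports `true`. The real code GC is more selective about what it frees, so this illustrates the ordering problem only and says nothing about when it occurs in practice.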
