Bug #18048

closed

Thread#join can break when fiber scheduler unblock fails or blocks.

Added by ioquatix (Samuel Williams) over 2 years ago. Updated over 2 years ago.

Status:
Closed
Assignee:
-
Target version:
-
[ruby-core:104692]

Description

In addition to https://bugs.ruby-lang.org/issues/17666 we found several more cases that need to be addressed.

Fix potential hang when joining threads.

If the thread termination invokes user code after th->status becomes
THREAD_KILLED, and the user unblock function causes that th->status to
become something else (e.g. THREAD_RUNNING), threads waiting in
thread_join_sleep will hang forever. We move the unblock function call
to before the thread status is updated, and allow threads to join as soon
as th->value becomes defined.
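
The difference between the two join conditions described above can be sketched with a small pure-Ruby model. This is an illustration only, not the C code in the interpreter: ThreadModel, UNDEF, joinable_by_status? and joinable_by_value? are hypothetical names invented for this sketch.

```ruby
# Hypothetical model of the join condition; not the actual interpreter code.
UNDEF = Object.new.freeze                # stands in for "value not yet defined"
ThreadModel = Struct.new(:status, :value)

# Old condition: a joiner only proceeds once the status is :killed.
def joinable_by_status?(th)
  th.status == :killed
end

# New condition: a joiner proceeds as soon as the value is defined.
def joinable_by_value?(th)
  !th.value.equal?(UNDEF)
end

th = ThreadModel.new(:running, UNDEF)
th.value = 42          # the thread body has finished and produced a value
th.status = :killed    # termination marks the thread as killed
th.status = :running   # user unblock code runs afterwards and flips the status

old_result = joinable_by_status?(th)  # the joiner would keep sleeping
new_result = joinable_by_value?(th)   # the joiner can proceed
```

In this model the status-based check never becomes true again once user code changes the status, while the value-based check is unaffected.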

Wake up join list within thread EC context. (#4471)

If rb_fiber_scheduler_unblock raises an exception, it can result in a
segfault if rb_threadptr_join_list_wakeup is not within a valid EC. This
change moves rb_threadptr_join_list_wakeup into the thread's top level EC,
which initially caused an infinite loop because the wakeup is retried when an
exception is raised. We explicitly remove items from the thread's join list to
avoid this situation.
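
A minimal pure-Ruby sketch of why removing items from the join list avoids the retry loop. It models the idea only and is not the implementation in thread.c; JoinListModel and the lambdas standing in for waiters are hypothetical.

```ruby
# Hypothetical model of the join list wakeup; not the actual C implementation.
class JoinListModel
  def initialize(waiters)
    @waiters = waiters.dup
  end

  # Remove each waiter from the list *before* calling its unblock hook.
  # If the hook raises and the caller retries, the failing waiter is gone,
  # so the retry makes progress instead of hitting the same entry forever.
  def wakeup
    until @waiters.empty?
      waiter = @waiters.shift
      waiter.call
    end
  end

  def pending
    @waiters.size
  end
end

calls = []
failing = -> { calls << :failing; raise "unblock failed" }
ok      = -> { calls << :ok }
model   = JoinListModel.new([failing, ok])

attempts = 0
begin
  attempts += 1
  model.wakeup
rescue RuntimeError
  retry if attempts < 10   # bounded retry, only to keep the sketch safe
end
```

If the failing waiter were left on the list instead of being shifted off first, every retry would call it again and raise again, which is the loop the commit message describes.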

These are already fixed on master branch. Here is a PR for backport: https://github.com/ruby/ruby/pull/4686

Updated by nagachika (Tomoyuki Chikanaga) over 2 years ago

Thank you for creating the patch for backport. I see the PR is basically a partial backport of 050a89543952a2c9e7c9bc938f4fdb538f6c9278. I will try to merge it.

Updated by nagachika (Tomoyuki Chikanaga) over 2 years ago

  • Status changed from Open to Closed

Updated by ioquatix (Samuel Williams) over 2 years ago

The PR is 050a89543952a2c9e7c9bc938f4fdb538f6c9278 followed by 13f8521c630a15c87398dee0763e95f59c032a94

Updated by nagachika (Tomoyuki Chikanaga) over 2 years ago

I see that git:2d4f29e77e883c29e35417799f8001b8046cde03 was pushed as a retry of 13f8521c630a15c87398dee0763e95f59c032a94.
I will keep an eye on RubyCI for a while.

Updated by nagachika (Tomoyuki Chikanaga) over 2 years ago

I created a backport patch including 050a89543952a2c9e7c9bc938f4fdb538f6c9278 and 13f8521c630a15c87398dee0763e95f59c032a94 and pushed it to my branch. See https://github.com/ruby/ruby/pull/4686/files.

But on that branch, make btest hangs on bootstraptest/test_ractor.rb.

% make btest
2021-08-14 16:57:56 +0900
Driver is ruby 3.0.3p123 (2021-08-08 revision 3922394c85) [x86_64-darwin19]
Target is ruby 3.0.3p124 (2021-08-14 revision 720d9c0803) [x86_64-darwin19]

test_attr.rb            PASS 2
test_autoload.rb        PASS 8
test_block.rb           PASS 58
test_class.rb           PASS 48
test_env.rb             PASS 2
test_eval.rb            PASS 37
test_exception.rb       PASS 34
test_fiber.rb           PASS 5
test_finalizer.rb       PASS 1
test_flip.rb            PASS 1
test_flow.rb            PASS 62
test_fork.rb            PASS 4
test_gc.rb              PASS 2
test_insns.rb           PASS 383
test_io.rb              PASS 9
test_jump.rb            PASS 29
test_literal.rb         PASS 156
test_literal_suffix.rb  PASS 48
test_load.rb            PASS 2
test_marshal.rb         PASS 1
test_massign.rb         PASS 34
test_method.rb          PASS 223
test_objectspace.rb     PASS 6
test_proc.rb            PASS 37
test_ractor.rb          \
↑ hangs up here

Samuel, would you review my backport candidate branch if you don't mind?

Updated by ioquatix (Samuel Williams) over 2 years ago

I will check it.

Updated by ioquatix (Samuel Williams) over 2 years ago

I rebased my backport PR on ruby_3_0 and could not reproduce the failure. I'll push the updated branch.

Updated by nagachika (Tomoyuki Chikanaga) over 2 years ago

I am sorry, I pointed to the wrong PR.
This is my backport candidate PR:
https://github.com/ruby/ruby/pull/4896

Updated by nagachika (Tomoyuki Chikanaga) over 2 years ago

  • Backport changed from 2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: REQUIRED to 2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: DONE

The changesets 050a89543952a2c9e7c9bc938f4fdb538f6c9278, 57eaa07ba6c1ee958c16d5c451e2dceb2208edf1, edbe0e224c2594b7a7b055f0986cbfd690d754d5 and 2d4f29e77e883c29e35417799f8001b8046cde03 are backported by merging https://github.com/ruby/ruby/pull/4686.

Updated by ioquatix (Samuel Williams) over 2 years ago

Thank you so much!
