Misc #20661

open

Stop retrying tests in `make test-all` command by default

Added by ono-max (Naoto Ono) 5 months ago. Updated 3 months ago.

Status:
Assigned
[ruby-core:118777]

Description

Summary

Currently, when `make test-all` runs tests in parallel, any test that fails is retried once by default. The retry is meant to filter out failures caused by parallel execution itself, which is why the failed tests are re-run serially rather than in parallel.
Source code: https://github.com/ruby/ruby/blob/master/tool/lib/test/unit.rb#L728-L751
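
To illustrate the behaviour described above, here is a minimal sketch. It is not the code in tool/lib/test/unit.rb; `run_suite`, the `retry_failed` keyword, and the hash of lambdas are hypothetical names made up for this example, and threads stand in for the real parallel workers.

```ruby
# Hypothetical sketch of "run in parallel, retry failures serially".
# `tests` maps a test name to a callable that returns true on success.
# Returns the names of the tests that are reported as failed.
def run_suite(tests, retry_failed: true)
  # Phase 1: run every test concurrently (threads stand in for workers).
  results = tests.map { |name, body|
    Thread.new { [name, body.call] }
  }.map(&:value)

  failed = results.reject { |_, ok| ok }.map(&:first)
  return failed unless retry_failed

  # Phase 2: re-run only the failed tests, one at a time.
  # A test that passes here is dropped from the failure list.
  failed.reject { |name| tests[name].call }
end

calls = 0
tests = {
  "stable" => -> { true },
  "flaky"  => -> { (calls += 1) > 1 },  # fails on the first call only
  "broken" => -> { false },
}

p run_suite(tests)                       # => ["broken"]
calls = 0
p run_suite(tests, retry_failed: false)  # => ["flaky", "broken"]
```

With the retry enabled, the "flaky" failure never reaches the report, which is the masking effect this ticket is concerned about.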

However, the retry can hide a real issue such as https://bugs.ruby-lang.org/issues/20314. To avoid missing such "real" problems, it would be better to stop retrying tests in the `make test-all` command by default.

Concerns when stopping the retry of tests in the make test-all command

Here are some concerns about stopping the retry mechanism in the `make test-all` command.

1. There are some flaky tests in the test suite. Are we okay with that?

Yes, all flaky tests are tracked by Launchable, and they're mainly monitored by @mame (Yusuke Endoh). Thus, they'll be fixed in the future.

2. When a test that is not related to my change fails, what should we do?

Just click the "Re-run jobs" button via the GitHub UI.

Updated by kjtsanaktsidis (KJ Tsanaktsidis) 5 months ago

I have no objection to disabling automatic re-runs, but one thing:

Just click the "Re-run jobs" button via the GitHub UI.

Is this available to people who aren't a member of the Ruby organisation on Github? I think it might not be... I certainly remember pushing empty commits to poke CI because of this.

Updated by mame (Yusuke Endoh) 4 months ago

Let me add some background.

After Launchable was introduced (#20254), we discovered that many test failures were being unintentionally masked by the automatic retry. Many were test-side problems, such as timeouts, but some were caused by genuine implementation problems (#20314).

I believe we should stop the automatic retry. I have spent several months or so working on preventing the flaky failures. According to Launchable, I have reduced the number of flaky failures sufficiently, so I asked @ono-max (Naoto Ono) to stop the automatic retry.

Since flaky failures cannot be completely eliminated, the frequency of manual retries will go up a bit. However, I think that the additional frequency is now low enough.

I know that flaky failures can be confusing, especially for first-time contributors. But currently, it is more important to allow skilled committers to observe failures (unless all committers watch Launchable).
My personal wish is that more committers would take ownership of fixing flaky failures, rather than manually retrying because "it shouldn't have anything to do with my changes".

kjtsanaktsidis (KJ Tsanaktsidis) wrote in #note-1:

I have no objection to disabling automatic re-runs, but one thing:

Just click the "Re-run jobs" button via the GitHub UI.

Is this available to people who aren't a member of the Ruby organisation on Github? I think it might not be... I certainly remember pushing empty commits to poke CI because of this.

Yeah, I think it needs to be done that way. But this is not a new problem. Even with the automatic retry, flaky failures have still occurred sometimes.

Updated by hsbt (Hiroshi SHIBATA) 3 months ago

  • Status changed from Open to Assigned