Misc #20013

open

Travis CI status

Added by jaruga (Jun Aruga) 12 months ago. Updated 6 days ago.

Status: Open
Assignee: -
[ruby-core:115438]

Description

I would like to use this ticket to manage our activities for reporting the Travis CI status.

Travis CI provides its own status page, but even when that page shows OK, I actually see infra issues.
https://www.traviscistatus.com/

I will share my activities and report the Travis CI status on this ticket.
The ticket will stay open until we stop using Travis CI.

The easiest way to get a Travis infra issue fixed is to email Travis CI support at support _AT_ travis-ci.com.

You can check the ruby/ruby Travis CI wiki page for details.


Related issues 1 (1 open, 0 closed)

Related to Ruby master - Misc #20320: Using OSU Open Source Lab native ppc64le/s390x CI services trigged on pull-requests (Open)

Updated by jaruga (Jun Aruga) 12 months ago

I am seeing that Travis s390x builds are not starting right now. I am emailing Travis CI customer support to ask them to fix it.

https://app.travis-ci.com/github/ruby/ruby/builds/267381855
https://app.travis-ci.com/github/ruby/ruby/builds/267383404

Updated by jaruga (Jun Aruga) 12 months ago

It seems that s390x builds take a long time to start, but the builds are still running.
https://app.travis-ci.com/github/ruby/ruby/builds

Updated by jaruga (Jun Aruga) 12 months ago

I asked Travis CI support about the s390x build issue yesterday. The support replied that they are investigating the issue now.

Updated by jaruga (Jun Aruga) 12 months ago

I will enable allow_failures for s390x. I am sorry for that.
https://github.com/ruby/ruby/pull/8997

Updated by jaruga (Jun Aruga) 12 months ago

I see that the following infra component is shown in yellow (not green).

https://www.traviscistatus.com/

Pusher Webhooks - Degraded Performance

Updated by jaruga (Jun Aruga) 12 months ago

I will drop s390x temporarily. I guess that there is a maximum number of queued builds in Travis CI, and because s390x builds are stuck in the queue, builds for the other CPU architectures (arm64, arm32, ppc64le) do not even start.
https://github.com/ruby/ruby/pull/9004

Updated by jaruga (Jun Aruga) 12 months ago

I am canceling the s390x builds manually for the running Travis builds.

Updated by jaruga (Jun Aruga) 12 months ago

I can see Travis CI builds are stable except for s390x.
https://app.travis-ci.com/github/ruby/ruby/builds

I am communicating with Travis CI support. It seems they added "IBM Z Builds" under Build Processing on the Travis CI status page, and you can see that the status is Degraded Performance (yellow). It's really helpful! I am asking them to add "Arm builds" / "IBM ppc64le builds" to the page too.
https://www.traviscistatus.com/

Updated by jaruga (Jun Aruga) 12 months ago

I am testing the s390x builds on my forked repository to add it again.
https://github.com/ruby/ruby/pull/9024

Updated by jaruga (Jun Aruga) 12 months ago

I tested the PR to add the s390x on my forked repository, and merged it. Now Travis CI has the s390x pipeline again.

Updated by jaruga (Jun Aruga) 12 months ago

I was told by Travis customer support that their infra team resolved the issue with s390x builds, and the builds should work now.

Updated by jaruga (Jun Aruga) 11 months ago

I am now emailing Travis CI support about the following error message, which is printed only in Arm64 pipelines and does not seem to affect the results of the CI tests.

https://app.travis-ci.com/github/ruby/ruby/jobs/615194806#L6

sudo: unable to resolve host travis-job-ruby-ruby-615194806: Name or service not known

I opened a thread about the issue at the end of October 2023, but I haven't seen a response there.
https://travis-ci.community/t/arm64-sudo-unable-to-resolve-host-name-or-service-not-known/14028

So, I emailed them today, and then I was told that the support has reached out to the Travis infra team. I will let you know here when I have updates.

Updated by jaruga (Jun Aruga) 9 months ago

It seems that Travis s390x builds are either slow, hitting the maximum of 50 minutes (a ruby_3_3-specific issue?),
https://app.travis-ci.com/github/ruby/ruby/builds/268615249

or slow to start.
https://app.travis-ci.com/github/ruby/ruby/builds/268616415

I am contacting Travis CI support.

Updated by jaruga (Jun Aruga) 9 months ago

I will drop the s390x case in Travis CI temporarily. I am not sure whether the issue comes from the infra or from Ruby, but right now a test that fails after 50 minutes is not practical for CI.
https://github.com/ruby/ruby/pull/9758

Updated by jaruga (Jun Aruga) 9 months ago

jaruga (Jun Aruga) wrote in #note-14:

I will drop the s390x case in Travis CI temporarily. I am not sure whether the issue comes from the infra or from Ruby, but right now a test that fails after 50 minutes is not practical for CI.
https://github.com/ruby/ruby/pull/9758

I got a message from Travis CI support "Our Infra team has deployed a fix for the issue you encountered with the s390x Build environment."
Now I am testing the Travis s390x on my forked repository.

Updated by jaruga (Jun Aruga) 9 months ago

Now I am testing the Travis s390x on my forked repository.

I tested. I sent a PR to add the s390x again.
https://github.com/ruby/ruby/pull/9773

Updated by jaruga (Jun Aruga) 9 months ago

jaruga (Jun Aruga) wrote in #note-16:

Now I am testing the Travis s390x on my forked repository.

I tested. I sent a PR to add the s390x again.
https://github.com/ruby/ruby/pull/9773

Merged. The s390x is added on Travis again.

Updated by jaruga (Jun Aruga) 9 months ago

It seems some s390x builds have still not started after 3 hours. I am asking Travis customer support about it.

https://app.travis-ci.com/github/ruby/ruby/builds
https://app.travis-ci.com/github/ruby/ruby/builds/269093276
https://app.travis-ci.com/github/ruby/ruby/builds/269093679

https://www.traviscistatus.com/ - Builds Processing - IBM Z Builds shows operational (green).

Updated by jaruga (Jun Aruga) 8 months ago

We are seeing that one of the s390x builds[1] is very slow, exceeding the maximum timeout of 50 minutes, with make test-all alone taking 1515 seconds (about 25 minutes). So far this is the only such case. This behavior is not normal, because the next build[2] takes 28 minutes 40 seconds in total, with make test-all taking 683 seconds (about 11 minutes). I suspect this may come from a specific slow machine, and I am asking Travis support about this issue.

[1] https://app.travis-ci.com/github/ruby/ruby/jobs/618214295#L2094
[2] https://app.travis-ci.com/github/ruby/ruby/jobs/618215618#L2262
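
The second-to-minute conversions quoted above can be checked with plain shell arithmetic. This is only a small sketch; the helper name to_min_sec is hypothetical and not part of any Travis or Ruby tooling.

```shell
#!/bin/sh
# Hypothetical helper: print a duration given in seconds as minutes and seconds.
to_min_sec() {
  printf '%dm %02ds\n' "$(( $1 / 60 ))" "$(( $1 % 60 ))"
}

to_min_sec 1515   # make test-all in the slow build[1]  -> 25m 15s
to_min_sec 683    # make test-all in the next build[2]  -> 11m 23s
```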

Updated by jaruga (Jun Aruga) 8 months ago

jaruga (Jun Aruga) wrote in #note-19:

We are seeing that one of the s390x builds[1] is very slow, exceeding the maximum timeout of 50 minutes, with make test-all alone taking 1515 seconds (about 25 minutes). So far this is the only such case. This behavior is not normal, because the next build[2] takes 28 minutes 40 seconds in total, with make test-all taking 683 seconds (about 11 minutes). I suspect this may come from a specific slow machine, and I am asking Travis support about this issue.

[1] https://app.travis-ci.com/github/ruby/ruby/jobs/618214295#L2094
[2] https://app.travis-ci.com/github/ruby/ruby/jobs/618215618#L2262

Today I found another s390x build exceeding the maximum timeout of 50 minutes. Interestingly, make test-all took 588 seconds (just under 10 minutes), which is normal.
https://app.travis-ci.com/github/ruby/ruby/jobs/618265449#L2095
But it seems that the build froze in the make test-spec step.
https://app.travis-ci.com/github/ruby/ruby/jobs/618265449#L3079

Updated by jaruga (Jun Aruga) 8 months ago

jaruga (Jun Aruga) wrote in #note-19:

We are seeing that one of the s390x builds[1] is very slow, exceeding the maximum timeout of 50 minutes, with make test-all alone taking 1515 seconds (about 25 minutes). So far this is the only such case. This behavior is not normal, because the next build[2] takes 28 minutes 40 seconds in total, with make test-all taking 683 seconds (about 11 minutes). I suspect this may come from a specific slow machine, and I am asking Travis support about this issue.

[1] https://app.travis-ci.com/github/ruby/ruby/jobs/618214295#L2094
[2] https://app.travis-ci.com/github/ruby/ruby/jobs/618215618#L2262

Travis support told me, in the message below, that Travis's engineers were able to check this issue.

Thanks so much for your patience here.

Our engineers were able to check on this and you should be able to see your builds are now running. Very sorry for the trouble and we will continue to monitor this!

Updated by Eregon (Benoit Daloze) 8 months ago

FYI mspec has a --timeout SECONDS option, which should help identify which spec is hanging/very slow.

Updated by jaruga (Jun Aruga) 8 months ago

Eregon (Benoit Daloze) wrote in #note-22:

FYI mspec has a --timeout SECONDS option, which should help identify which spec is hanging/very slow.

OK. Thanks for the tip!

Updated by jaruga (Jun Aruga) 8 months ago

We have observed unstable Travis ppc64le/s390x pipelines, so I added allow_failures to those pipelines in the PR https://github.com/ruby/ruby/pull/10158.

ppc64le

We have seen the following errors 10 or more times in the last 1 or 2 days.

s390x

The following error happened without any output.

Updated by jaruga (Jun Aruga) 8 months ago

I found the following information on https://www.traviscistatus.com/ . Travis CI is undergoing maintenance during the week of 27/Feb - 5/Mar.

Back-end maintenance 27-Feb to 5-Mar

Update - Build status on GitHub works. Builds triggered from GitLab, BitBucket and Assembla operational. Next updates on Feb-29.
Feb 28, 2024 - 12:42 UTC

Update - Be advised: Build statuses are not passed back to GitHub after build is executed. Triggering builds from GitLab, BitBucket and Assembla not available. We are in progress with maintenance activities.
Feb 28, 2024 - 11:36 UTC

In progress - Scheduled maintenance is currently in progress. We will provide updates as necessary.
Feb 27, 2024 - 08:00 UTC

Update - Reminder: Travis CI will be undergoing a maintenance in a week of 27/Feb - 5/Mar. There may be intermittent service deterioration, particularly on Feb 28th.
Feb 27, 2024 08:00 - Mar 5, 2024 08:00 UTC

Scheduled - Travis CI will be undergoing a maintenance in a week of 27/Feb - 5/Mar. We will do all that we can to not interrupt the service during this period. If you spot erratic or deteriorated service behavior please report back to our support.
Feb 27, 2024 08:00 - Mar 5, 2024 08:00 UTC

Updated by jaruga (Jun Aruga) 8 months ago

  • Related to Misc #20320: Using OSU Open Source Lab native ppc64le/s390x CI services trigged on pull-requests added

Updated by jaruga (Jun Aruga) 8 months ago

jaruga (Jun Aruga) wrote in #note-21:

jaruga (Jun Aruga) wrote in #note-19:

We are seeing that one of the s390x builds[1] is very slow, exceeding the maximum timeout of 50 minutes, with make test-all alone taking 1515 seconds (about 25 minutes). So far this is the only such case. This behavior is not normal, because the next build[2] takes 28 minutes 40 seconds in total, with make test-all taking 683 seconds (about 11 minutes). I suspect this may come from a specific slow machine, and I am asking Travis support about this issue.

For the slow s390x build issue, I received the following reply from Travis support on 1st March 2024.

Our Infra team has resolved the issue you encountered. In case it resurfaces, please reach back and we will gladly help.

Updated by jaruga (Jun Aruga) 8 months ago

I noticed the following announcement about maintenance scheduled for this Wednesday, 6th March. So I plan to add allow_failures to ruby/ruby's arm64 and arm32 cases too before the maintenance. Ideally, I hope Travis will carry out the maintenance without stopping their service.

https://app.travis-ci.com/github/ruby/ruby

Please note: Travis CI is undergoing maintenance. On March 6 , between 08:00-12:00 UTC+0 service may be temporarily unavailable.

Updated by jaruga (Jun Aruga) 8 months ago

jaruga (Jun Aruga) wrote in #note-28:

I noticed the following announcement about maintenance scheduled for this Wednesday, 6th March. So I plan to add allow_failures to ruby/ruby's arm64 and arm32 cases too before the maintenance. ...

I sent the PR for that.
https://github.com/ruby/ruby/pull/10180

Updated by jaruga (Jun Aruga) 8 months ago

jaruga (Jun Aruga) wrote in #note-29:

jaruga (Jun Aruga) wrote in #note-28:

I noticed the following announcement about maintenance scheduled for this Wednesday, 6th March. So I plan to add allow_failures to ruby/ruby's arm64 and arm32 cases too before the maintenance. ...

I sent the PR for that.
https://github.com/ruby/ruby/pull/10180

As it seems that the maintenance is finished, I reverted the commit above.
https://github.com/ruby/ruby/pull/10186

Updated by jaruga (Jun Aruga) 8 months ago · Edited

For your information, I saw the following ppc64le job not starting 10 days ago and contacted Travis support at that time. I am still waiting for the fix, though I haven't found any other failures in the last few days.

https://app.travis-ci.com/github/ruby/ruby/jobs/619005133

No output has been received in the last 10m0s, this potentially indicates a stalled build or something wrong with the build itself.
Check the details on how to adjust your build configuration on: https://docs.travis-ci.com/user/common-build-problems/#build-times-out-because-no-output-was-received
The build has been terminated

Updated by jaruga (Jun Aruga) 5 months ago

I noticed the following message is displayed on our Travis page. I will contact Travis support.

https://app.travis-ci.com/github/ruby/ruby

We are unable to start your build at this time. You exceeded the number of users allowed for your plan. Please review your plan details and follow the steps to resolution.

Updated by jaruga (Jun Aruga) 5 months ago

jaruga (Jun Aruga) wrote in #note-32:

I noticed the following message is displayed on our Travis page. I will contact Travis support.

https://app.travis-ci.com/github/ruby/ruby

We are unable to start your build at this time. You exceeded the number of users allowed for your plan. Please review your plan details and follow the steps to resolution.

Travis support responded quickly and fixed the issue, which removed the message. I can see that Travis builds started running again 2 hours ago.

However, I still see the message on the Travis page below for my fork of ruby/ruby, so I am asking Travis to enable builds for forks of ruby/ruby too. I was previously able to run Travis builds in my fork.
https://app.travis-ci.com/github/junaruga/ruby/

Updated by jaruga (Jun Aruga) 5 months ago

Travis support enabled my forks of ruby/ruby, ruby/zlib and ruby/prism, where Travis was used. It seems that we need to submit a list of GitHub accounts to enable Travis for forks of ruby/* repositories. Maybe they recently changed how Travis is enabled.

I am working on getting the list of GitHub accounts that contributed to ruby/ruby, ruby/zlib and ruby/prism in the last 2 years, so that I can submit it to Travis support to enable Travis for their forks.

Updated by jaruga (Jun Aruga) 4 months ago

I am working on getting the list of GitHub accounts that contributed to ruby/ruby, ruby/zlib and ruby/prism in the last 2 years, so that I can submit it to Travis support to enable Travis for their forks.

I am still working on getting the list of GitHub accounts. For example, 496 people contributed to the ruby/ruby master branch in the last 2 years, and I want to get their GitHub accounts. If you know how to get the list easily, please let me know.

$ git remote get-url origin
https://github.com/ruby/ruby.git

$ git branch | grep ^*
* master

$ git shortlog --summary --numbered --since="2022-06-27" | head
  2238	Nobuyoshi Nakada
  1397	Takashi Kokubun
  1352	Kevin Newton
  1133	Hiroshi SHIBATA
   697	Peter Zhu
   508	git[bot]
   406	David Rodríguez
   302	Alan Wu
   252	yui-knk
   242	Jemma Issroff

$ git shortlog --summary --numbered --since="2022-06-27" | wc -l
496
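
One possible starting point, sketched under assumptions rather than taken from this thread: commits made through GitHub often carry author addresses of the form ID+LOGIN@users.noreply.github.com (or the older LOGIN@users.noreply.github.com), which expose the GitHub login directly. The helper name noreply_logins below is hypothetical, and contributors who commit with a regular email address are not covered by it, so this would likely yield only part of the list.

```shell
#!/bin/sh
# Sketch: derive GitHub logins from commit author emails.
# Only addresses of the form LOGIN@users.noreply.github.com or
# ID+LOGIN@users.noreply.github.com reveal a login; every other
# address is skipped and would need a separate lookup.

# Hypothetical helper: reads one email per line on stdin and
# prints the unique logins, sorted.
noreply_logins() {
  sed -n 's/^\([0-9]*+\)\{0,1\}\([^@+]*\)@users\.noreply\.github\.com$/\2/p' | sort -u
}

# Usage inside a clone of ruby/ruby (same date range as above):
#   git log --since="2022-06-27" --format='%ae' | noreply_logins
```

Comparing the number of logins found this way with the 496 authors reported by git shortlog would show how many accounts still need to be looked up by other means.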

Updated by jaruga (Jun Aruga) 4 months ago

As a note, after the Travis change above, people can still run Travis by sending a pull request to the ruby/* repositories, but right now they cannot run Travis by pushing commits to branches in their own forks of the ruby/* repositories.

Updated by jaruga (Jun Aruga) 2 months ago

I am seeing Travis infra errors. They have been happening since at least 13th August 2024 and are still ongoing. Right now the errors only happen on the ppc64le/s390x cases. I will contact Travis support.

13th August 2024
https://app.travis-ci.com/github/ruby/prism/builds/271850178
https://app.travis-ci.com/github/ruby/prism/builds/271851132

https://app.travis-ci.com/github/ruby/ruby/builds/271996369

Updated by jaruga (Jun Aruga) 2 months ago

Sent a PR to allow failures for ppc64le/s390x on ruby/prism.
https://github.com/ruby/prism/pull/3005

Updated by jaruga (Jun Aruga) 2 months ago

jaruga (Jun Aruga) wrote in #note-37:

I am seeing Travis infra errors. They have been happening since at least 13th August 2024 and are still ongoing. Right now the errors only happen on the ppc64le/s390x cases. I will contact Travis support.

13th August 2024
https://app.travis-ci.com/github/ruby/prism/builds/271850178
https://app.travis-ci.com/github/ruby/prism/builds/271851132

https://app.travis-ci.com/github/ruby/ruby/builds/271996369

I emailed Travis support about this issue.

Updated by jaruga (Jun Aruga) 2 months ago

Right now I am seeing Travis's arm64 (and arm32) jobs take around 7 hours to start. Therefore I will allow failures for those cases to avoid waiting for the jobs.

https://app.travis-ci.com/github/ruby/ruby/builds/272105049
https://app.travis-ci.com/github/ruby/ruby/builds/272105708

This is the PR. Sorry for the inconvenience.
https://github.com/ruby/ruby/pull/11509

Updated by jaruga (Jun Aruga) 2 months ago

None of the jobs in the Travis partner (non-x86_64) pipelines are starting. So, I will drop all of those pipelines in the 2nd commit of the following PR.
https://app.travis-ci.com/github/ruby/ruby/builds/272123937
https://github.com/ruby/ruby/pull/11509

Updated by jaruga (Jun Aruga) 2 months ago

So, I will drop all of those pipelines in the 2nd commit of the following PR.

I dropped Travis as a temporary workaround. The infra issue is still ongoing. Travis customer support replied with the following message.

Our team is actively looking into the reported issue and is working hard to find a solution. We will keep you posted on the progress.

Updated by jaruga (Jun Aruga) about 2 months ago

We are dropping Travis CI for ruby/zlib and ruby/prism too, because the Travis infra issues have been ongoing since my last report.

The Travis CI status page is not accurate.
https://www.traviscistatus.com/ - Past Incidents

I still see jobs that do not start and show empty output.
https://app.travis-ci.com/github/ruby/prism/builds/272244955

Updated by jaruga (Jun Aruga) about 2 months ago

I see a case where the jobs are running and showing output (and failing as expected).
https://app.travis-ci.com/github/ruby/ruby/builds/272246788

Updated by jaruga (Jun Aruga) 11 days ago

I received the following email from Travis support on 19th October.

Could you please try to trigger a build once again when you are ready and let us know if this issue now appears to be resolved?

We will follow up on your reply.

I am starting to test Travis on my forked repository.

Updated by jaruga (Jun Aruga) 10 days ago

We enabled Travis CI again (with allow_failures enabled) in the PR https://github.com/ruby/ruby/pull/11948, which resolves https://bugs.ruby-lang.org/issues/20810.

Updated by jaruga (Jun Aruga) 6 days ago

I sent the following PRs to enable Travis CI for ruby/prism and ruby/zlib again.
