Bug #15490

socket.rb - recurring segmentation faults

Added by matthew.oriordan (Matthew O'Riordan) 7 months ago. Updated about 1 month ago.

Status:
Open
Priority:
Normal
Assignee:
-
Target version:
-
ruby -v:
ruby 2.5.0p0 (2017-12-25 revision 61468) [x86_64-darwin18], ruby 2.6.3p62 (2019-04-16 revision 67580) [x86_64-darwin18], ruby 2.5.0p0 (2017-12-25 revision 61468) [x86_64-linux]
[ruby-core:90833]

Description

With Ruby 2.5.3p105 and now with Ruby 2.6.0 following our recent upgrade, we are sadly still seeing reasonably frequent segmentation faults from Ruby, specifically within socket.rb

Looking in socket.rb, it seems it's related to the address lookup:

Addrinfo.getaddrinfo(nodename, service, family, socktype, protocol, flags).each(&block)

The full segfault report is below, and diagnostic reports are attached. I will gladly help reproduce this if I can, but I have never been able to reproduce it reliably, even though it happens once every few days.
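For context, here is a minimal sketch of the lookup in question, exercised through the public Addrinfo.getaddrinfo API. The resolve helper name and the numeric address are illustrative assumptions, not taken from the report; a numeric address is used so the sketch does not depend on a resolver, which also means it is not expected to trigger the crash.

```ruby
require "socket"

# Hypothetical helper (not from the report): return the IP addresses that
# getaddrinfo yields for a host/port pair, restricted to stream sockets.
def resolve(host, port)
  Addrinfo.getaddrinfo(host, port, nil, :STREAM).map(&:ip_address)
end

# A numeric address avoids any DNS traffic.
addresses = resolve("127.0.0.1", 80)
puts addresses.inspect
```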


Files

ruby_2018-12-31-032126-2_MacBook-Pro.crash (46.8 KB) matthew.oriordan (Matthew O'Riordan), 12/31/2018 03:46 AM
ruby_2018-12-31-032126-3_MacBook-Pro.crash (46.8 KB) matthew.oriordan (Matthew O'Riordan), 12/31/2018 03:46 AM
ruby_2018-12-31-032126-1_MacBook-Pro.crash (46.8 KB) matthew.oriordan (Matthew O'Riordan), 12/31/2018 03:46 AM
ruby_2018-12-31-032125_MacBook-Pro.crash (46.8 KB) matthew.oriordan (Matthew O'Riordan), 12/31/2018 03:46 AM
bug-15490.log (833 KB) nobu (Nobuyoshi Nakada), 12/31/2018 08:47 AM

Related issues

Related to Ruby master - Bug #13646: Segmentation fault with postgresql_adapter in Rails (Open)
Has duplicate Ruby master - Bug #15639: [BUG] Segmentation fault at 0x000000010e82ca3a (Open)

History

Updated by nobu (Nobuyoshi Nakada) 7 months ago

It always happens here. I couldn't find the source of si_destination_compare, but it may be a problem in libsystem_info.dylib.

7   ???                             0x00007fc6cddeaac0 0 + 140491834174144
8   libsystem_trace.dylib           0x00007fff6e31adb4 os_log_type_enabled + 627
9   libsystem_info.dylib            0x00007fff6e23305b si_destination_compare_statistics + 1659
10  libsystem_info.dylib            0x00007fff6e231bf3 si_destination_compare_internal + 707
11  libsystem_info.dylib            0x00007fff6e231762 si_destination_compare + 530
12  libsystem_info.dylib            0x00007fff6e20f95f _gai_addr_sort + 111
13  libsystem_c.dylib               0x00007fff6e1b9a0f _isort + 193
14  libsystem_c.dylib               0x00007fff6e1b993c _qsort + 2159
15  libsystem_info.dylib            0x00007fff6e207135 _gai_sort_list + 789
16  libsystem_info.dylib            0x00007fff6e205b88 si_addrinfo + 2040
17  libsystem_info.dylib            0x00007fff6e205262 _getaddrinfo_internal + 242
18  libsystem_info.dylib            0x00007fff6e20515d getaddrinfo + 61

Updated by matthew.oriordan (Matthew O'Riordan) 7 months ago

Is there something I can do to help with the source of si_destination_compare, and the problem you believe is related to libsystem_info.dylib?

Updated by jessebs (Jesse Bowes) 6 months ago

I have run into a similar issue using Ruby 2.5.1 but unfortunately don't have an easy way to reproduce.

A couple of things that help mitigate it (and may be useful for finding the actual issue):

getaddrinfo is in the backtrace and this is happening around some network code for me. I found that using an IP address instead of hostname makes the issue go away.

Another option I have found is turning off garbage collection (GC.disable) around the code giving problems, which makes it go away as well.
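The two mitigations above could be combined roughly as follows. This is only a sketch under assumptions: the without_gc helper is hypothetical, the address is a placeholder, and other commenters in this thread report that neither mitigation reliably prevents the crash.

```ruby
require "socket"

# Hypothetical helper: run a block with GC disabled, then restore the
# previous GC state. GC.disable returns true if GC was already disabled.
def without_gc
  was_disabled = GC.disable
  yield
ensure
  GC.enable unless was_disabled
end

# Use an IP literal instead of a hostname, and keep GC off during the lookup.
result = without_gc do
  Addrinfo.getaddrinfo("127.0.0.1", 80, nil, :STREAM).first.ip_address
end
puts result
```

Disabling GC for long stretches lets memory grow without bound, so the wrapped block should stay as small as possible.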

Updated by nobu (Nobuyoshi Nakada) 5 months ago

  • Has duplicate Bug #15639: [BUG] Segmentation fault at 0x000000010e82ca3a added

Updated by zormandi (Zoltan Ormandi) 4 months ago

We're seeing this issue as well, on Ruby 2.6.1. For us, it occurs towards the end of a fairly large test suite when running one of our legacy Cucumber tests. When we run only the Cucumber section of our test suite (not the whole thing), the issue does not occur. Also, it does not happen on our CI server, which makes me suspect that this might be an OSX-exclusive problem: we're only seeing it on our MacBooks.

The test that triggers the crash starts up a fake web server using WEBrick to simulate one of our services. It binds to 'http://localhost:42638' but the suggestion of using an IP address instead of a hostname didn't solve the problem for us; it still occurs if we change the binding to 'http://127.0.0.1:42638'.

Let me know if there's any information that could help (other than a reproduction script, which I obviously cannot provide) - it would be great to get rid of this bug.

UPDATE

Unfortunately, I was wrong. The issue does sometimes occur even when only the Cucumber section of our test suite is being executed. Also, turning off the GC didn't help either.

Updated by PikachuEXE (Pikachu Leung) 4 months ago

I might have hit a similar issue with 2.6.2 (it also crashes at os_log_type_enabled + 627):
https://bugs.ruby-lang.org/issues/15623#note-2

See update #2

Updated by matthew.oriordan (Matthew O'Riordan) 2 months ago

  • ruby -v changed from ruby 2.6.0p0 (2018-12-25 revision 66547) [x86_64-darwin18] to ruby 2.6.3p62 (2019-04-16 revision 67580) [x86_64-darwin18]

This issue is still happening with the latest version of Ruby 2.6.3. Happy to provide more logs / run tests if I can help.

Updated by matthew.oriordan (Matthew O'Riordan) 2 months ago

Some background to how I have worked around this for now, which may be useful.
I use the parallel gem https://github.com/grosser/parallel, which can parallelise tasks using threads or processes. Since switching from processes to threads, this issue has gone away. In some code paths of a CLI we use locally, processes are preferable given the isolation from the running code; in this case, however, using threads was not a problem and is arguably also better from a resource perspective.
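As a rough illustration of the thread-based variant, the sketch below uses only stdlib threads as a stand-in so that it runs without the gem; with the parallel gem itself the switch would be between its in_processes and in_threads options. The host list and port are placeholder assumptions, not values from my setup.

```ruby
require "socket"

# Placeholder inputs (assumptions for illustration only).
hosts = ["127.0.0.1", "127.0.0.2"]

# One thread per host; all lookups happen inside the same process,
# so no fork is involved.
threads = hosts.map do |host|
  Thread.new do
    Addrinfo.getaddrinfo(host, 80, nil, :STREAM).map(&:ip_address)
  end
end

# Thread#value joins the thread and returns the block's result.
results = threads.map(&:value)
puts results.inspect
```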

Updated by jeremyevans0 (Jeremy Evans) about 1 month ago

  • Related to Bug #13646: Segmentation fault with postgresql_adapter in Rails added

Updated by mylesgearon (Myles Gearon) about 1 month ago

  • ruby -v changed from ruby 2.6.3p62 (2019-04-16 revision 67580) [x86_64-darwin18] to ruby 2.5.0p0 (2017-12-25 revision 61468) [x86_64-darwin18], ruby 2.6.3p62 (2019-04-16 revision 67580) [x86_64-darwin18], ruby 2.5.0p0 (2017-12-25 revision 61468) [x86_64-linux]

I have been experiencing this issue as well, but only on a computer running OSX 10.14.5. I can't seem to recreate this on linux using Fedora 29 or Ubuntu 18.04.

Switching the OSX machine over to 127.0.0.1 instead of localhost seems to make it crash less often, but I'm still getting the segfault there. The segfault happens on both 2.5.0 and 2.6.3 on OSX.
