Bug #21790
Status: Open
`Socket.getaddrinfo` hangs after `fork()` on macOS 26.1 (Tahoe) for IPv4-only hosts
Description
Ruby's Socket.getaddrinfo hangs indefinitely in forked child processes on macOS 26.1 (Tahoe) when resolving IPv4-only hostnames. This is a regression that does not occur on macOS 15.x (Sequoia) or earlier.
Ruby version:
ruby 3.3.8 (2025-04-09 revision b200bad6cd) [arm64-darwin24]
Also confirmed this affects Ruby 3.2.6 and 3.4.1.
Reproducible script:
require "socket"
require "timeout"

puts "Ruby #{RUBY_VERSION} on #{RUBY_PLATFORM}"

Socket.getaddrinfo("api.segment.io", 443, nil, :STREAM)
puts "Parent: DNS completed"

pid = fork do
  puts "Child: Attempting DNS resolution..."
  begin
    Timeout.timeout(90) do
      Socket.getaddrinfo("api.segment.io", 443, nil, :STREAM)
    end
    puts "Child: SUCCESS"
    exit 0
  rescue Timeout::Error
    puts "Child: FAILED - hung for 90 seconds"
    exit 1
  end
end

Process.wait(pid)
Note: The Timeout.timeout(90) wrapper is included only so that the script can exit during testing. Remove it to observe that the hang continues indefinitely.
Result of reproduce process:
Ruby 3.3.8 on arm64-darwin24
Parent: DNS completed
Child: Attempting DNS resolution...
Child: FAILED - hung for 90 seconds
The child process hangs with one thread consuming 100% CPU.
Expected result: The child process should complete DNS resolution successfully, as it does on macOS 15.x and earlier.
Analysis:
Stack trace shows:
Main thread: Blocked in wait_getaddrinfo → _pthread_cond_wait
DNS thread: Spinning in _gai_nat64_second_pass → nw_path_access_agent_cache → _os_log_preferences_refresh → SIGSEGV
The crash occurs in macOS's NAT64 synthesis code path. Ruby's signal handler catches the SIGSEGV but cannot recover, causing the DNS thread to spin.
Key observations:
- Only affects IPv4-only hosts. Hosts with IPv6 (like google.com) work correctly.
- Using `AF_INET` instead of `AF_UNSPEC` works. `Socket.getaddrinfo("api.segment.io", 443, Socket::AF_INET, :STREAM)` succeeds.
- Python is not affected. Python calls `getaddrinfo()` synchronously without a background thread.
- Parent must do DNS before fork. If the parent has not called getaddrinfo(), the child works correctly.
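The AF_INET observation above could be turned into a small helper along these lines. This is only a sketch: `ipv4_addresses` is a hypothetical name, and it assumes the caller needs IPv4 results only. The demo call uses a numeric address so that it runs without network access.

```ruby
require "socket"

# Hypothetical helper (not part of the original report): ask the resolver
# for IPv4 results only by passing AF_INET instead of the default AF_UNSPEC.
def ipv4_addresses(host, port)
  Addrinfo.getaddrinfo(host, port, Socket::AF_INET, :STREAM)
          .map(&:ip_address)
          .uniq
end

# Numeric literal so the demo needs no DNS; in the scenario from the
# report the host would be "api.segment.io".
puts ipv4_addresses("127.0.0.1", 443).inspect
```

Restricting the address family this way drops IPv6 results entirely, so it is a stopgap rather than a general fix.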
Workaround:
- Use `resolv-replace` to bypass the native DNS resolver: `require "resolv-replace"`
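A related sketch, for code that calls the resolver directly: Ruby's standard `resolv` library can be used explicitly instead of relying on the monkey patches from `resolv-replace`. The helper name `pure_ruby_addresses` is hypothetical, and whether this avoids the hang in every code path has not been verified here. As far as I know, `resolv-replace` patches the TCPSocket/UDPSocket-style entry points rather than `Socket.getaddrinfo` itself, so explicit calls may still be needed.

```ruby
require "resolv"

# Hypothetical helper: resolve a name with Ruby's pure-Ruby resolver
# (hosts file, then DNS) without calling the system getaddrinfo().
def pure_ruby_addresses(host, resolver: Resolv::DefaultResolver)
  resolver.getaddresses(host)
end

# An IP literal is returned as-is, so this demo needs no network access.
# In the scenario from the report the argument would be "api.segment.io".
puts pure_ruby_addresses("127.0.0.1").inspect
```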
Impact:
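This breaks Ruby applications using pre-forking worker models (Resque, Unicorn, Puma, Sidekiq, Passenger) on macOS Tahoe whenever the parent process resolves a hostname before forking and a child then resolves an IPv4-only host.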
Apple Bug Report:
Filed with Apple as Feedback Assistant #FB21364061
Updated by adamoffat (Adam Moffat) 1 day ago
To confirm: macOS Sequoia also did not have this issue.
Updated by adamoffat (Adam Moffat) 1 day ago
I saw that this was added in 3.4.0: https://github.com/ruby/ruby/pull/10864
Seen here: (https://github.com/ruby/ruby/releases/tag/v3_4_0_preview2)
But I also tested this using 3.4.1 and it was still an issue.
Updated by mame (Yusuke Endoh) about 23 hours ago
Thank you for the report.
Since I don't have access to Tahoe, I cannot test this in my own environment. However, I have a few questions to clarify the situation.
The change to perform DNS lookups in a dedicated background thread was introduced in Ruby 3.3.0. You mentioned that this affects Ruby 3.2.6 as well. Are you certain it reproduces on 3.2.6?
If it fails on 3.2.6, the cause might be unrelated to the background thread, as its behavior should be similar to Python's. Would it be possible to provide a stack trace from the 3.2.6 crash?
Though it's just a guess, this might be a bug with getaddrinfo on Tahoe itself, but I could be wrong.
Updated by adamoffat (Adam Moffat) about 15 hours ago
Ah yes, sorry I should have clarified this in my post. I tested this in 3.2.6 but it manifests differently in that version.
When I ran the same reproduction script with Ruby 3.2.6, it did not hang indefinitely. Instead, it crashed immediately with a segmentation fault when the child process attempted DNS resolution.
The crash occurs at the getaddrinfo call in the forked child. The backtrace shows the fault originating in macOS system libraries, specifically in libsystem_trace.dylib at _os_log_preferences_refresh.
This confirms Ruby 3.2.6 is also affected by the same underlying issue - it just manifests as an immediate crash rather than a hang.
I've attached the full crash output for reference.
Updated by mame (Yusuke Endoh) about 12 hours ago
Thank you. This looks like the same issue reported multiple times in the past, but we were previously stuck without a way to investigate.
https://bugs.ruby-lang.org/issues/15490
https://bugs.ruby-lang.org/issues/15794
https://github.com/redis/redis-rb/issues/859
https://github.com/hanami/hanami/issues/993
It is greatly appreciated that the reproduction conditions are now much clearer.
This issue does not affect Python even in a forked child process, right? If Python avoids this error, checking how it calls getaddrinfo might give us a hint for a fix or workaround.
It is difficult for me to debug this without a reproducing environment. Are there any committers or contributors who can reproduce the issue and investigate?
Updated by mame (Yusuke Endoh) about 12 hours ago
- Related to Bug #15490: socket.rb - recurring segmentation faults added
- Related to Bug #15794: Can not start Puma with Rails after bundle install added
Updated by adamoffat (Adam Moffat) about 12 hours ago
- File python_dns_fork_test.py added
Updated by adamoffat (Adam Moffat) about 12 hours ago
- File python_dns_fork_test.py added
- File python_crash_output.txt added
Ah, my earlier Python script had a bug, so my initial Python test incorrectly reported success.
The script used os.WEXITSTATUS() to check the child's exit status, but that value is only meaningful for processes that exit normally. When a process is killed by a signal (here SIGSEGV), it returns 0, giving a false positive.
After fixing the script to check os.WIFSIGNALED(), I was able to confirm the child is killed by signal 11 (SIGSEGV). The crash logs show the identical stack trace to Ruby: _gai_nat64_second_pass → nw_path_access_agent_cache → _os_log_preferences_refresh.
This is an OS-level bug in macOS Tahoe, not language-specific. My apologies.
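For anyone adapting the Ruby reproduction script, the same exit-status pitfall can be avoided by checking whether the child was signaled before reading its exit code. The following is a minimal sketch: `describe_exit` is a hypothetical helper, and the child kills itself with SIGKILL only to simulate a signal death without triggering Ruby's own crash reporter.

```ruby
# Hypothetical helper: report how a child ended, distinguishing a normal
# exit from death by signal (exitstatus is nil in the signaled case).
def describe_exit(status)
  if status.signaled?
    "killed by signal #{status.termsig} (#{Signal.signame(status.termsig)})"
  elsif status.exited?
    "exited with status #{status.exitstatus}"
  else
    "stopped or in an unknown state"
  end
end

# Simulate a child that dies from a signal instead of exiting normally.
pid = fork do
  Process.kill("KILL", Process.pid)
  sleep # never reached in practice; keeps the child alive until the signal lands
end
_, status = Process.wait2(pid)
puts describe_exit(status)
```

In the reproduction script, `Process.wait2(pid)` would replace `Process.wait(pid)` so that a crash in the child is not mistaken for success.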