Bug #19144
closed
Ruby should set AI_V4MAPPED | AI_ADDRCONFIG getaddrinfo flags by default
Description
Currently, DNS lookups made with getaddrinfo from Ruby (i.e. not from the Resolv module) cause both A and AAAA DNS requests to be made, even on systems that don’t have an IPv6 address that could make the AAAA response useful. Normally I wouldn’t care about this, but glibc has a bug (https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/1961697) which can cause a 5-second delay in DNS lookups when both A and AAAA records are queried in parallel. This bug is fixed in glibc upstream but still present in some LTS Linux distros (Ubuntu 18.04 and 20.04 at least), so I think it’s worthwhile to try and work around it in circumstances where the AAAA request is pointless anyway.
The dual A/AAAA lookup happens because whenever Ruby calls getaddrinfo to perform DNS lookups, it always sets hints, and sets hints->ai_flags to zero by default unless flags are specified by the caller (e.g. AI_PASSIVE is set when binding a TCP server socket in TCPServer.new).
This matches the default value of ai_flags specified by POSIX, which is zero. However, glibc behaves differently. When glibc’s getaddrinfo function is called with NULL for the hints parameter, it defaults the ai_flags value to (AI_V4MAPPED | AI_ADDRCONFIG). The manpage (from the Linux man-pages project - https://man7.org/linux/man-pages/man3/getaddrinfo.3.html) claims “this is an improvement on the standard” (although I couldn’t find this mentioned in the glibc manual itself).
Of course, we’re never actually calling getaddrinfo with NULL hints, so we never use these flags on glibc systems (unless they’re explicitly specified by the caller).
My proposal is that we should change Ruby to set these two flags by default, when they’re available, in the following circumstances:
- In all calls made internally to rsock_getaddrinfo as a result of socket functions like TCPSocket.new, UDPSocket.new, etc.
  - EXCEPT when AI_PASSIVE is also set (i.e. when we’re trying to get an address to bind for a listener socket - see below)
- In calls made to rsock_getaddrinfo as a direct result of calling Addrinfo.getaddrinfo from Ruby with nil flags
  - EXCEPT calls to Addrinfo.getaddrinfo where explicit flags are provided
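To make the proposed rules concrete, here is a rough Ruby sketch of the flag selection. This is only an illustration of the policy described above, not the actual C code in the socket extension; the names PROPOSED_DEFAULT_FLAGS, flags_for_internal_lookup and flags_for_explicit_lookup are hypothetical.

```ruby
require "socket"

# Hypothetical constant: the proposed defaults, restricted to the flags
# that this platform actually defines.
PROPOSED_DEFAULT_FLAGS = [:AI_V4MAPPED, :AI_ADDRCONFIG]
  .select { |name| Socket.const_defined?(name) }
  .reduce(0) { |acc, name| acc | Socket.const_get(name) }

# Hypothetical helper: flags for lookups made internally on behalf of
# connection methods such as TCPSocket.new. Listener lookups
# (AI_PASSIVE) are left untouched.
def flags_for_internal_lookup(base_flags)
  return base_flags if (base_flags & Socket::AI_PASSIVE) != 0
  base_flags | PROPOSED_DEFAULT_FLAGS
end

# Hypothetical helper: flags for Addrinfo.getaddrinfo-style calls.
# Defaults apply only when the caller passed nil; explicit flags win.
def flags_for_explicit_lookup(user_flags)
  user_flags.nil? ? PROPOSED_DEFAULT_FLAGS : user_flags
end
```

With this sketch, flags_for_internal_lookup(Socket::AI_PASSIVE) returns its argument unchanged, while flags_for_explicit_lookup(nil) returns whichever of the two flags the platform provides.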
Both of these seem like something you would almost always want to be doing in any outgoing connection scenario:
- AI_V4MAPPED ensures that, if AF_INET6 is explicitly specified as the desired protocol, and there is no AAAA record in DNS, any A record that is present gets converted to an IPv4-mapped IPv6 address so it can be used e.g. with NAT64.
- AI_ADDRCONFIG ensures that, if a machine has no IPv6 address, it doesn’t bother making an AAAA lookup that will return IPv6 addresses that can’t actually be used for anything (and vice versa for IPv4).
The reason why we wouldn’t want to set AI_ADDRCONFIG in circumstances where Ruby currently sets AI_PASSIVE is that loopback addresses are not considered in deciding if a system has an IPv4/IPv6 address. Conceivably, you might want to bind to a ::1 loopback address, and allow other processes on the same machine to connect to that.
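As a rough illustration of that rule, the check below approximates in Ruby what AI_ADDRCONFIG decides: an address family only counts as configured if some non-loopback address of that family exists. The helper name configured_families is hypothetical, and real resolvers may apply extra rules (for example around link-local addresses), so treat this as a sketch rather than the resolver's exact logic.

```ruby
require "socket"

# Hypothetical helper: which address families would count as
# "configured" if loopback addresses are ignored.
def configured_families(addrs = Socket.ip_address_list)
  {
    ipv4: addrs.any? { |a| a.ipv4? && !a.ipv4_loopback? },
    ipv6: addrs.any? { |a| a.ipv6? && !a.ipv6_loopback? },
  }
end
```

On a machine whose only IPv6 address is ::1, this reports ipv6 as not configured, which is why a lookup for a ::1 listener address should not be filtered this way.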
Does changing this default sound reasonable? If so I can prepare a patch. Another option I considered is doing this only when Ruby is built against glibc (so that other system behaviour is most closely matched).
Updated by kjtsanaktsidis (KJ Tsanaktsidis) almost 2 years ago
A gentle poke to see if anybody has some thoughts on this?
Updated by akr (Akira Tanaka) almost 2 years ago
- Status changed from Open to Feedback
I feel AI_ADDRCONFIG is good if the result addresses are used immediately for making a connection.
But getaddrinfo can be used just for getting DNS information.
AI_ADDRCONFIG is not suitable for this situation.
I don't understand why AI_V4MAPPED is useful.
Also, some systems, such as NetBSD, don't seem to have AI_V4MAPPED.
https://man.netbsd.org/NetBSD-9.3/getaddrinfo.3
Using AI_V4MAPPED introduces incompatibility.
Ruby has several methods to invoke getaddrinfo() and connect() internally, such as TCPSocket.new.
How about we specify AI_ADDRCONFIG for getaddrinfo invocations in such methods?
This avoids the problem (useless AAAA query) and
doesn't affect applications that invoke getaddrinfo directly (which might have a problem with AI_ADDRCONFIG).
Updated by kjtsanaktsidis (KJ Tsanaktsidis) over 1 year ago
Thank you for having a look at this!
Ruby has several methods to invoke getaddrinfo() and connect() internally, such as TCPSocket.new.
How about we specify AI_ADDRCONFIG for getaddrinfo invocations in such methods?
I'm OK with doing just this, and not changing direct calls to Addrinfo.getaddrinfo. You're right, it's going to solve 99% of the problems and avoids any potential compatibility issue.
I don't understand why AI_V4MAPPED is useful.
I did a bit more research into this. What I said in the original issue about NAT64 is wrong: IPv4-mapped IPv6 addresses have nothing to do with NAT64.
What this flag does actually is:
- When making a call to getaddrinfo with both AF_INET6 and AI_V4MAPPED,
- If there is no AAAA record for a name,
- And there is an A record for a name,
- Return an "IPv4-mapped IPv6 address", which is an IPv6 address prefixed with ::FFFF and then the four bytes of the IPv4 address at the end, e.g. ::FFFF:1.2.3.4
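For illustration, the mapping itself can be reproduced with Ruby's standard ipaddr library. This only shows the address format; it does not call getaddrinfo, and the helper name to_v4mapped is hypothetical.

```ruby
require "ipaddr"

# Hypothetical helper: build the IPv4-mapped IPv6 form of an IPv4
# address string, i.e. ::FFFF followed by the four IPv4 bytes.
def to_v4mapped(ipv4_string)
  IPAddr.new(ipv4_string).ipv4_mapped
end

mapped = to_v4mapped("1.2.3.4")
mapped.ipv6?         # the result is an IPv6 address
mapped.ipv4_mapped?  # and it lies in the IPv4-mapped range
mapped.native        # converts back to the plain IPv4 address
```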
The point of the IPv4-mapped IPv6 address actually has nothing to do with NAT64. Rather, when calling connect(2) on such an IPv6 address, if the host actually does have an IPv4 address as well, it will make the connection with the IPv4 stack. The purpose of this, it seems, is to allow applications to be written to only handle IPv6, and they'll transparently get IPv4 support for free.
I don't think Ruby actually needs this flag - it defaults to making the request with AF_UNSPEC and can handle getting either IPv4 or IPv6 addresses out of getaddrinfo correctly. In fact, the only way for any of the socket connect methods to pass a specific address family in here is UDPSocket.new(Socket::AF_INET6).connect('hostname', port_number). If this actually made an IPv4 connection because getaddrinfo returned an IPv4-mapped IPv6 address, I think that would be very confusing.
So, I think you're right - we should not set AI_V4MAPPED by default.
Also, some systems, such as NetBSD, don't seem to have AI_V4MAPPED.
I would add feature checks for these flags in socket's extconf.rb, I think.
Thanks again for your feedback. I'll try and send a PR later this week which defaults AI_ADDRCONFIG to on when getaddrinfo is called from inside the socket connection methods (but NOT when called explicitly with Socket.getaddrinfo et al).
Updated by kjtsanaktsidis (KJ Tsanaktsidis) over 1 year ago
OK, I opened https://github.com/ruby/ruby/pull/7295 with those changes. Thanks again!
Updated by kjtsanaktsidis (KJ Tsanaktsidis) over 1 year ago
@akr (Akira Tanaka) could you take a look at my PR when you get a chance? I think I addressed your feedback, please let me know if I have misunderstood!
Updated by jeremyevans0 (Jeremy Evans) 11 months ago
- Status changed from Feedback to Open
Updated by akr (Akira Tanaka) 11 months ago
kjtsanaktsidis (KJ Tsanaktsidis) wrote in #note-5:
@akr (Akira Tanaka) could you take a look at my PR when you get a chance? I think I addressed your feedback, please let me know if I have misunderstood!
It seems fine.
I agree that we remove the test, test_ai_addrconfig, because it is too complicated.
Updated by Anonymous 11 months ago
- Status changed from Open to Closed
Applied in changeset git|d2ba8ea54a4089959afdeecdd963e3c4ff391748.
Set AI_ADDRCONFIG when making getaddrinfo(3) calls for outgoing conns (#7295)
When making an outgoing TCP or UDP connection, set AI_ADDRCONFIG in the
hints we send to getaddrinfo(3) (if supported). This will prompt the
resolver to NOT issue A or AAAA queries if the system does not
actually have an IPv4 or IPv6 address (respectively).
This makes outgoing connections marginally more efficient on
non-dual-stack systems, since we don't have to try connecting to an
address which can't possibly work.
More importantly, however, this works around a race condition present
in some older versions of glibc on aarch64 where it could accidentally
send the two outgoing DNS queries with the same DNS txnid, and get
confused when receiving the responses. This manifests as outgoing
connections sometimes taking 5 seconds (the DNS timeout before retry) to
be made.
Fixes #19144
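For Ruby versions without this change, an application can opt in to similar behaviour itself by passing the flag explicitly to Addrinfo.getaddrinfo. The sketch below is one possible way to do that, not the code from the changeset; addrconfig_flag and connect_with_addrconfig are hypothetical helper names.

```ruby
require "socket"

# AI_ADDRCONFIG where the platform defines it, otherwise no flags.
def addrconfig_flag
  Socket.const_defined?(:AI_ADDRCONFIG) ? Socket::AI_ADDRCONFIG : 0
end

# Hypothetical helper: resolve with AI_ADDRCONFIG, then connect to the
# first returned address that accepts the connection.
def connect_with_addrconfig(host, port)
  addrs = Addrinfo.getaddrinfo(host, port, nil, :STREAM, nil, addrconfig_flag)
  last_error = nil
  addrs.each do |addr|
    begin
      return addr.connect
    rescue SystemCallError => e
      last_error = e # remember the failure and try the next address
    end
  end
  raise last_error || SocketError.new("no addresses for #{host}")
end
```

A caller would use it as socket = connect_with_addrconfig("example.com", 80) and close the socket when done.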