Feature #16476
closedSocket.getaddrinfo cannot be interrupted by Timeout.timeout
Description
It seems like the blocking syscall done by Socket.getaddrinfo
blocks Ruby VM in a way that Timeout.timeout has no effect.
See reproduction steps in getaddrinfo_interrupt.rb (https://gist.github.com/kirs/00c02ef92e0418578135fe0a6cbd3d7d). This affects all modern Ruby versions, including the latest 2.7.0.
Combined with default 10s resolv timeout on many Linux systems, this can have a very noticeable effect on production Ruby apps being not resilient to slow DNS resolutions, and being unable to fail fast even with Timeout.timeout
.
While https://bugs.ruby-lang.org/issues/15553 improves the situation for Addrinfo.getaddrinfo
, Socket.getaddrinfo
is still blocking the VM and Timeout has no effect.
I'd like to discuss what could be done to make that call non-blocking for threads in Ruby VM.
UPD: looking closer, I can see that Socket.getaddrinfo("www.ruby-lang.org", "http")
and Addrinfo.getaddrinfo("www.ruby-lang.org", "http")
call non-interruptible getaddrinfo
, while Addrinfo.getaddrinfo("www.ruby-lang.org", "http", timeout: 10)
calls getaddrinfo_a
, which is interruptible:
# interrupts as expected
Timeout.timeout(1) do
Addrinfo.getaddrinfo("www.ruby-lang.org", "http", timeout: 10)
end
I'd maybe suggest that we try to always use getaddrinfo_a
when it's available, including in Socket.getaddrinfo
. What downsides that would have?
I'd be happy to work on a patch.
Updated by Dan0042 (Daniel DeLorme) about 5 years ago
+1
This has been an issue for a very long time, and it's often been handled by installing an asynchronous DNS resolver gem, but it would be nice if it "just worked". If it's really as simple as using getaddrinfo_a
, that sounds great.
Updated by kirs (Kir Shatrov) about 5 years ago
Dan0042 (Daniel DeLorme) wrote:
+1
This has been an issue for a very long time, and it's often been handled by installing an asynchronous DNS resolver gem, but it would be nice if it "just worked". If it's really as simple as using
getaddrinfo_a
, that sounds great.
Thanks for feedback Daniel!
I've put a PR with the suggested fix: https://github.com/ruby/ruby/pull/2827
Updated by kirs (Kir Shatrov) almost 5 years ago
- File deleted (
getaddrinfo_interrupt.rb)
Updated by mame (Yusuke Endoh) almost 5 years ago
- Tracker changed from Bug to Feature
- Status changed from Open to Assigned
- Assignee set to Glass_saga (Masaki Matsushita)
- ruby -v deleted (
ruby 2.7.0p0 (2019-12-25 revision 647ee6f091) [x86_64-linux]) - Backport deleted (
2.5: UNKNOWN, 2.6: UNKNOWN)
We discussed this issue at the dev-meeting, and it requires @Glass_saga's review.
Note:
- It is uninterruptable under a platform that getaddrinfo_a is unavailable, but this problem is not only this proposal but also
timeout:
option ofAddrinfo.getaddrinfo()
. - Interruptable version can be implemented without getaddrinfo_a: Creating pthread for getaddrinfo function and pthread_cancel when interrupted. Contribution is welcome.
Updated by kirs (Kir Shatrov) over 4 years ago
mame (Yusuke Endoh) wrote in #note-7:
We discussed this issue at the dev-meeting, and it requires @Glass_saga's review.
Note:
- It is uninterruptable under a platform that getaddrinfo_a is unavailable, but this problem is not only this proposal but also
timeout:
option ofAddrinfo.getaddrinfo()
.- Interruptable version can be implemented without getaddrinfo_a: Creating pthread for getaddrinfo function and pthread_cancel when interrupted. Contribution is welcome.
Thanks for feedback!
I've opened https://github.com/ruby/ruby/pull/3171 with the approach you've described. Please let me know if I miss anything.
Updated by Glass_saga (Masaki Matsushita) over 4 years ago
- Status changed from Assigned to Closed
- Target version set to 36
Updated by Glass_saga (Masaki Matsushita) over 4 years ago
- Related to Feature #16381: Accept resolv_timeout in Net::HTTP added
Updated by hsbt (Hiroshi SHIBATA) over 4 years ago
- Target version changed from 36 to 3.0
Updated by ioquatix (Samuel Williams) about 4 years ago
I have some feedback:
- I wish we don't use strange abbreviations like "resolv_timeout" and use correct English like "resolve_timeout".
- I'm not convinced that
getaddrinfo_a
is a good idea, it has a user-space thread pool and the implementation doesn't seem great. - We will introduce asynchronous DNS resolution using a scheduler hook anyway, for Ruby 3: https://github.com/bruno-/ruby/pull/1
Adding timeouts as arguments is not particularly useful either. It's not particularly easy to compose timeouts or use a single timeout for multiple operations, and it makes the underlying implementation more complex.
That being said, I'm not against this PR. I'm just not sure it's worth the effort/complexity.
Updated by jeremyevans0 (Jeremy Evans) about 4 years ago
ioquatix (Samuel Williams) wrote in #note-12:
Adding timeouts as arguments is not particularly useful either. It's not particularly easy to compose timeouts or use a single timeout for multiple operations, and it makes the underlying implementation more complex.
One way around this is to use a deadline instead of a timeout:
# or something based on Process.clock_gettime
deadline = Time.now + 5
Addrinfo.getaddrinfo("www.ruby-lang.org", "http", deadline: deadline)
Addrinfo.getaddrinfo("docs.ruby-lang.org", "http", deadline: deadline)
Addrinfo.getaddrinfo("bugs.ruby-lang.org", "http", deadline: deadline)
I think this approach is much better than a Timeout.timeout
block. For one, Timeout.timeout
requires the use of Thread#raise
and results in less deterministic behavior. Additionally, a deadline option can potentially use a different approach than raising an exception, similar to IO#read_nonblock
.
Updated by naruse (Yui NARUSE) about 4 years ago
- Related to Feature #17134: Add resolv_timeout to TCPSocket added
Updated by naruse (Yui NARUSE) about 4 years ago
- Status changed from Closed to Open
- Target version changed from 3.0 to 3.1
ioquatix (Samuel Williams) wrote in #note-12:
- I'm not convinced that getaddrinfo_a is a good idea, it has a user-space thread pool and the implementation doesn't seem great.
https://github.com/ruby/ruby/commit/2038cc6cab6ceeffef3ec3a765c70ae684f829ed is reverted because of [Bug #17220].
Since getaddrinfo doesn't provide nonblocking version, I think we need our own implementation of getaddrinfo_a.
Adding timeouts as arguments is not particularly useful either. It's not particularly easy to compose timeouts or use a single timeout for multiple operations, and it makes the underlying implementation more complex.
In my experience timeout is important for web applications to return a response when a DNS resolution is too slow.
It's a long requested series of improvements for HTTP client like read_timeout, connect_timeout, and write_timeout.
resolv_timeout is the last piece of that.
Updated by Dan0042 (Daniel DeLorme) about 4 years ago
naruse (Yui NARUSE) wrote in #note-15:
It's a long requested series of improvements for HTTP client like read_timeout, connect_timeout, and write_timeout.
resolv_timeout is the last piece of that.
How common is it to need separate timeouts for all of these? I can easily imagine the resolv_timeout as being part of the connect_timeout (i.e. resolv+connect may not exceed connect_timeout). And at least in my experience I usually want a single "general timeout" for resolv+connect+write+read. For that reason I really like Jeremy's suggestion of a single deadline
argument. You can even have a single deadline for multiple http requests. This is really perfect for most uses cases that I'm familiar with.
Updated by naruse (Yui NARUSE) about 4 years ago
Dan0042 (Daniel DeLorme) wrote in #note-16:
naruse (Yui NARUSE) wrote in #note-15:
It's a long requested series of improvements for HTTP client like read_timeout, connect_timeout, and write_timeout.
resolv_timeout is the last piece of that.How common is it to need separate timeouts for all of these? I can easily imagine the resolv_timeout as being part of the connect_timeout (i.e. resolv+connect may not exceed connect_timeout). And at least in my experience I usually want a single "general timeout" for resolv+connect+write+read. For that reason I really like Jeremy's suggestion of a single
deadline
argument. You can even have a single deadline for multiple http requests. This is really perfect for most uses cases that I'm familiar with.
Both Timeout.timeout
and deadline
is not the essential problem of this topic. The topic this ticket handles is "getaddrinfo is not interruptable nor timeout ready". This is because getaddrinfo(3) doesn't have timeout, async, nor nonblocking API. This ticket intends to provide the underlying implementation of such feature.
Updated by hsbt (Hiroshi SHIBATA) over 2 years ago
- Target version changed from 3.1 to 3.2
Updated by hsbt (Hiroshi SHIBATA) about 2 years ago
- Target version deleted (
3.2)
Updated by hsbt (Hiroshi SHIBATA) about 1 year ago
- Related to Feature #19965: Make the name resolution interruptible added
Updated by mame (Yusuke Endoh) about 1 year ago
- Status changed from Open to Closed
#19965 solved the problem, so I'll close it.
I completely forgot this ticket, but the concept (pthread_create on every call to getaddrinfo(3)) was the same as in #19965. Just for case, #19965 is a bit more elaborate:
- It works as before in environments where pthread is not available.
- It includes the same hack for not only
getaddrinfo(3)
but alsogetnameinfo(3)
. -
pthread_cancel
is not used because until recently there was a bug in glibc wherepthread_cancel
of a pthread that is callinggetaddrinfo(3)
would cause subsequent name resolution to stick: - It attempts to use
pthread_setaffinity_np
to reduce speed degradation.