Project

General

Profile

Actions

Feature #16476

open

Socket.getaddrinfo cannot be interrupted by Timeout.timeout

Added by kirs (Kir Shatrov) over 1 year ago. Updated 9 months ago.

Status:
Open
Priority:
Normal
Target version:
[ruby-core:96642]

Description

It seems like the blocking syscall done by Socket.getaddrinfo blocks Ruby VM in a way that Timeout.timeout has no effect.
See reproduction steps in getaddrinfo_interrupt.rb (https://gist.github.com/kirs/00c02ef92e0418578135fe0a6cbd3d7d). This affects all modern Ruby versions, including the latest 2.7.0.

Combined with default 10s resolv timeout on many Linux systems, this can have a very noticeable effect on production Ruby apps being not resilient to slow DNS resolutions, and being unable to fail fast even with Timeout.timeout.

While https://bugs.ruby-lang.org/issues/15553 improves the situation for Addrinfo.getaddrinfo, Socket.getaddrinfo is still blocking the VM and Timeout has no effect.

I'd like to discuss what could be done to make that call non-blocking for threads in Ruby VM.

UPD: looking closer, I can see that Socket.getaddrinfo("www.ruby-lang.org", "http") and Addrinfo.getaddrinfo("www.ruby-lang.org", "http") call non-interruptible getaddrinfo, while Addrinfo.getaddrinfo("www.ruby-lang.org", "http", timeout: 10) calls getaddrinfo_a, which is interruptible:

# interrupts as expected
Timeout.timeout(1) do
  Addrinfo.getaddrinfo("www.ruby-lang.org", "http", timeout: 10)
end

I'd maybe suggest that we try to always use getaddrinfo_a when it's available, including in Socket.getaddrinfo. What downsides that would have?
I'd be happy to work on a patch.


Related issues

Related to Ruby master - Feature #16381: Accept resolv_timeout in Net::HTTPOpenActions
Related to Ruby master - Feature #17134: Add resolv_timeout to TCPSocketOpenActions
Actions #1

Updated by kirs (Kir Shatrov) over 1 year ago

  • Description updated (diff)
Actions #2

Updated by kirs (Kir Shatrov) over 1 year ago

  • Description updated (diff)

Updated by Dan0042 (Daniel DeLorme) over 1 year ago

+1

This has been an issue for a very long time, and it's often been handled by installing an asynchronous DNS resolver gem, but it would be nice if it "just worked". If it's really as simple as using getaddrinfo_a, that sounds great.

Updated by kirs (Kir Shatrov) over 1 year ago

Dan0042 (Daniel DeLorme) wrote:

+1

This has been an issue for a very long time, and it's often been handled by installing an asynchronous DNS resolver gem, but it would be nice if it "just worked". If it's really as simple as using getaddrinfo_a, that sounds great.

Thanks for feedback Daniel!

I've put a PR with the suggested fix: https://github.com/ruby/ruby/pull/2827

Actions #5

Updated by kirs (Kir Shatrov) over 1 year ago

  • File deleted (getaddrinfo_interrupt.rb)
Actions #6

Updated by kirs (Kir Shatrov) over 1 year ago

  • Description updated (diff)

Updated by mame (Yusuke Endoh) over 1 year ago

  • Backport deleted (2.5: UNKNOWN, 2.6: UNKNOWN)
  • ruby -v deleted (ruby 2.7.0p0 (2019-12-25 revision 647ee6f091) [x86_64-linux])
  • Assignee set to Glass_saga (Masaki Matsushita)
  • Status changed from Open to Assigned
  • Tracker changed from Bug to Feature

We discussed this issue at the dev-meeting, and it requires Glass_saga (Masaki Matsushita)'s review.

Note:

  • It is uninterruptable under a platform that getaddrinfo_a is unavailable, but this problem is not only this proposal but also timeout: option of Addrinfo.getaddrinfo().
  • Interruptable version can be implemented without getaddrinfo_a: Creating pthread for getaddrinfo function and pthread_cancel when interrupted. Contribution is welcome.

Updated by kirs (Kir Shatrov) over 1 year ago

mame (Yusuke Endoh) wrote in #note-7:

We discussed this issue at the dev-meeting, and it requires Glass_saga (Masaki Matsushita)'s review.

Note:

  • It is uninterruptable under a platform that getaddrinfo_a is unavailable, but this problem is not only this proposal but also timeout: option of Addrinfo.getaddrinfo().
  • Interruptable version can be implemented without getaddrinfo_a: Creating pthread for getaddrinfo function and pthread_cancel when interrupted. Contribution is welcome.

Thanks for feedback!
I've opened https://github.com/ruby/ruby/pull/3171 with the approach you've described. Please let me know if I miss anything.

Updated by Glass_saga (Masaki Matsushita) about 1 year ago

  • Target version set to 36
  • Status changed from Assigned to Closed
Actions #10

Updated by Glass_saga (Masaki Matsushita) about 1 year ago

Actions #11

Updated by hsbt (Hiroshi SHIBATA) 12 months ago

  • Target version changed from 36 to 3.0

Updated by ioquatix (Samuel Williams) 10 months ago

I have some feedback:

  • I wish we don't use strange abbreviations like "resolv_timeout" and use correct English like "resolve_timeout".
  • I'm not convinced that getaddrinfo_a is a good idea, it has a user-space thread pool and the implementation doesn't seem great.
  • We will introduce asynchronous DNS resolution using a scheduler hook anyway, for Ruby 3: https://github.com/bruno-/ruby/pull/1

Adding timeouts as arguments is not particularly useful either. It's not particularly easy to compose timeouts or use a single timeout for multiple operations, and it makes the underlying implementation more complex.

That being said, I'm not against this PR. I'm just not sure it's worth the effort/complexity.

Updated by jeremyevans0 (Jeremy Evans) 10 months ago

ioquatix (Samuel Williams) wrote in #note-12:

Adding timeouts as arguments is not particularly useful either. It's not particularly easy to compose timeouts or use a single timeout for multiple operations, and it makes the underlying implementation more complex.

One way around this is to use a deadline instead of a timeout:

# or something based on Process.clock_gettime
deadline = Time.now + 5

Addrinfo.getaddrinfo("www.ruby-lang.org", "http", deadline: deadline)
Addrinfo.getaddrinfo("docs.ruby-lang.org", "http", deadline: deadline)
Addrinfo.getaddrinfo("bugs.ruby-lang.org", "http", deadline: deadline)

I think this approach is much better than a Timeout.timeout block. For one, Timeout.timeout requires the use of Thread#raise and results in less deterministic behavior. Additionally, a deadline option can potentially use a different approach than raising an exception, similar to IO#read_nonblock.

Actions #14

Updated by naruse (Yui NARUSE) 9 months ago

Updated by naruse (Yui NARUSE) 9 months ago

  • Target version changed from 3.0 to 3.1
  • Status changed from Closed to Open

ioquatix (Samuel Williams) wrote in #note-12:

  • I'm not convinced that getaddrinfo_a is a good idea, it has a user-space thread pool and the implementation doesn't seem great.

https://github.com/ruby/ruby/commit/2038cc6cab6ceeffef3ec3a765c70ae684f829ed is reverted because of [Bug #17220].

Since getaddrinfo doesn't provide nonblocking version, I think we need our own implementation of getaddrinfo_a.

Adding timeouts as arguments is not particularly useful either. It's not particularly easy to compose timeouts or use a single timeout for multiple operations, and it makes the underlying implementation more complex.

In my experience timeout is important for web applications to return a response when a DNS resolution is too slow.
It's a long requested series of improvements for HTTP client like read_timeout, connect_timeout, and write_timeout.
resolv_timeout is the last piece of that.

Updated by Dan0042 (Daniel DeLorme) 9 months ago

naruse (Yui NARUSE) wrote in #note-15:

It's a long requested series of improvements for HTTP client like read_timeout, connect_timeout, and write_timeout.
resolv_timeout is the last piece of that.

How common is it to need separate timeouts for all of these? I can easily imagine the resolv_timeout as being part of the connect_timeout (i.e. resolv+connect may not exceed connect_timeout). And at least in my experience I usually want a single "general timeout" for resolv+connect+write+read. For that reason I really like Jeremy's suggestion of a single deadline argument. You can even have a single deadline for multiple http requests. This is really perfect for most uses cases that I'm familiar with.

Updated by naruse (Yui NARUSE) 9 months ago

Dan0042 (Daniel DeLorme) wrote in #note-16:

naruse (Yui NARUSE) wrote in #note-15:

It's a long requested series of improvements for HTTP client like read_timeout, connect_timeout, and write_timeout.
resolv_timeout is the last piece of that.

How common is it to need separate timeouts for all of these? I can easily imagine the resolv_timeout as being part of the connect_timeout (i.e. resolv+connect may not exceed connect_timeout). And at least in my experience I usually want a single "general timeout" for resolv+connect+write+read. For that reason I really like Jeremy's suggestion of a single deadline argument. You can even have a single deadline for multiple http requests. This is really perfect for most uses cases that I'm familiar with.

Both Timeout.timeout and deadline is not the essential problem of this topic. The topic this ticket handles is "getaddrinfo is not interruptable nor timeout ready". This is because getaddrinfo(3) doesn't have timeout, async, nor nonblocking API. This ticket intends to provide the underlying implementation of such feature.

Actions

Also available in: Atom PDF