Project

General

Profile

Actions

Feature #16476

closed

Socket.getaddrinfo cannot be interrupted by Timeout.timeout

Added by kirs (Kir Shatrov) about 4 years ago. Updated 5 months ago.

Status:
Closed
Target version:
-
[ruby-core:96642]

Description

It seems like the blocking syscall done by Socket.getaddrinfo blocks Ruby VM in a way that Timeout.timeout has no effect.
See reproduction steps in getaddrinfo_interrupt.rb (https://gist.github.com/kirs/00c02ef92e0418578135fe0a6cbd3d7d). This affects all modern Ruby versions, including the latest 2.7.0.

Combined with default 10s resolv timeout on many Linux systems, this can have a very noticeable effect on production Ruby apps being not resilient to slow DNS resolutions, and being unable to fail fast even with Timeout.timeout.

While https://bugs.ruby-lang.org/issues/15553 improves the situation for Addrinfo.getaddrinfo, Socket.getaddrinfo is still blocking the VM and Timeout has no effect.

I'd like to discuss what could be done to make that call non-blocking for threads in Ruby VM.

UPD: looking closer, I can see that Socket.getaddrinfo("www.ruby-lang.org", "http") and Addrinfo.getaddrinfo("www.ruby-lang.org", "http") call non-interruptible getaddrinfo, while Addrinfo.getaddrinfo("www.ruby-lang.org", "http", timeout: 10) calls getaddrinfo_a, which is interruptible:

# interrupts as expected
Timeout.timeout(1) do
  Addrinfo.getaddrinfo("www.ruby-lang.org", "http", timeout: 10)
end

I'd maybe suggest that we try to always use getaddrinfo_a when it's available, including in Socket.getaddrinfo. What downsides that would have?
I'd be happy to work on a patch.


Related issues 3 (2 open1 closed)

Related to Ruby master - Feature #16381: Accept resolv_timeout in Net::HTTPOpenActions
Related to Ruby master - Feature #17134: Add resolv_timeout to TCPSocketOpenActions
Related to Ruby master - Feature #19965: Make the name resolution interruptibleClosedmame (Yusuke Endoh)Actions
Actions #1

Updated by kirs (Kir Shatrov) about 4 years ago

  • Description updated (diff)
Actions #2

Updated by kirs (Kir Shatrov) about 4 years ago

  • Description updated (diff)

Updated by Dan0042 (Daniel DeLorme) about 4 years ago

+1

This has been an issue for a very long time, and it's often been handled by installing an asynchronous DNS resolver gem, but it would be nice if it "just worked". If it's really as simple as using getaddrinfo_a, that sounds great.

Updated by kirs (Kir Shatrov) about 4 years ago

Dan0042 (Daniel DeLorme) wrote:

+1

This has been an issue for a very long time, and it's often been handled by installing an asynchronous DNS resolver gem, but it would be nice if it "just worked". If it's really as simple as using getaddrinfo_a, that sounds great.

Thanks for feedback Daniel!

I've put a PR with the suggested fix: https://github.com/ruby/ruby/pull/2827

Actions #5

Updated by kirs (Kir Shatrov) about 4 years ago

  • File deleted (getaddrinfo_interrupt.rb)
Actions #6

Updated by kirs (Kir Shatrov) about 4 years ago

  • Description updated (diff)

Updated by mame (Yusuke Endoh) about 4 years ago

  • Tracker changed from Bug to Feature
  • Status changed from Open to Assigned
  • Assignee set to Glass_saga (Masaki Matsushita)
  • ruby -v deleted (ruby 2.7.0p0 (2019-12-25 revision 647ee6f091) [x86_64-linux])
  • Backport deleted (2.5: UNKNOWN, 2.6: UNKNOWN)

We discussed this issue at the dev-meeting, and it requires @Glass_saga's review.

Note:

  • It is uninterruptable under a platform that getaddrinfo_a is unavailable, but this problem is not only this proposal but also timeout: option of Addrinfo.getaddrinfo().
  • Interruptable version can be implemented without getaddrinfo_a: Creating pthread for getaddrinfo function and pthread_cancel when interrupted. Contribution is welcome.

Updated by kirs (Kir Shatrov) almost 4 years ago

mame (Yusuke Endoh) wrote in #note-7:

We discussed this issue at the dev-meeting, and it requires @Glass_saga's review.

Note:

  • It is uninterruptable under a platform that getaddrinfo_a is unavailable, but this problem is not only this proposal but also timeout: option of Addrinfo.getaddrinfo().
  • Interruptable version can be implemented without getaddrinfo_a: Creating pthread for getaddrinfo function and pthread_cancel when interrupted. Contribution is welcome.

Thanks for feedback!
I've opened https://github.com/ruby/ruby/pull/3171 with the approach you've described. Please let me know if I miss anything.

Updated by Glass_saga (Masaki Matsushita) over 3 years ago

  • Status changed from Assigned to Closed
  • Target version set to 36
Actions #10

Updated by Glass_saga (Masaki Matsushita) over 3 years ago

Actions #11

Updated by hsbt (Hiroshi SHIBATA) over 3 years ago

  • Target version changed from 36 to 3.0

Updated by ioquatix (Samuel Williams) over 3 years ago

I have some feedback:

  • I wish we don't use strange abbreviations like "resolv_timeout" and use correct English like "resolve_timeout".
  • I'm not convinced that getaddrinfo_a is a good idea, it has a user-space thread pool and the implementation doesn't seem great.
  • We will introduce asynchronous DNS resolution using a scheduler hook anyway, for Ruby 3: https://github.com/bruno-/ruby/pull/1

Adding timeouts as arguments is not particularly useful either. It's not particularly easy to compose timeouts or use a single timeout for multiple operations, and it makes the underlying implementation more complex.

That being said, I'm not against this PR. I'm just not sure it's worth the effort/complexity.

Updated by jeremyevans0 (Jeremy Evans) over 3 years ago

ioquatix (Samuel Williams) wrote in #note-12:

Adding timeouts as arguments is not particularly useful either. It's not particularly easy to compose timeouts or use a single timeout for multiple operations, and it makes the underlying implementation more complex.

One way around this is to use a deadline instead of a timeout:

# or something based on Process.clock_gettime
deadline = Time.now + 5

Addrinfo.getaddrinfo("www.ruby-lang.org", "http", deadline: deadline)
Addrinfo.getaddrinfo("docs.ruby-lang.org", "http", deadline: deadline)
Addrinfo.getaddrinfo("bugs.ruby-lang.org", "http", deadline: deadline)

I think this approach is much better than a Timeout.timeout block. For one, Timeout.timeout requires the use of Thread#raise and results in less deterministic behavior. Additionally, a deadline option can potentially use a different approach than raising an exception, similar to IO#read_nonblock.

Actions #14

Updated by naruse (Yui NARUSE) about 3 years ago

Updated by naruse (Yui NARUSE) about 3 years ago

  • Status changed from Closed to Open
  • Target version changed from 3.0 to 3.1

ioquatix (Samuel Williams) wrote in #note-12:

  • I'm not convinced that getaddrinfo_a is a good idea, it has a user-space thread pool and the implementation doesn't seem great.

https://github.com/ruby/ruby/commit/2038cc6cab6ceeffef3ec3a765c70ae684f829ed is reverted because of [Bug #17220].

Since getaddrinfo doesn't provide nonblocking version, I think we need our own implementation of getaddrinfo_a.

Adding timeouts as arguments is not particularly useful either. It's not particularly easy to compose timeouts or use a single timeout for multiple operations, and it makes the underlying implementation more complex.

In my experience timeout is important for web applications to return a response when a DNS resolution is too slow.
It's a long requested series of improvements for HTTP client like read_timeout, connect_timeout, and write_timeout.
resolv_timeout is the last piece of that.

Updated by Dan0042 (Daniel DeLorme) about 3 years ago

naruse (Yui NARUSE) wrote in #note-15:

It's a long requested series of improvements for HTTP client like read_timeout, connect_timeout, and write_timeout.
resolv_timeout is the last piece of that.

How common is it to need separate timeouts for all of these? I can easily imagine the resolv_timeout as being part of the connect_timeout (i.e. resolv+connect may not exceed connect_timeout). And at least in my experience I usually want a single "general timeout" for resolv+connect+write+read. For that reason I really like Jeremy's suggestion of a single deadline argument. You can even have a single deadline for multiple http requests. This is really perfect for most uses cases that I'm familiar with.

Updated by naruse (Yui NARUSE) about 3 years ago

Dan0042 (Daniel DeLorme) wrote in #note-16:

naruse (Yui NARUSE) wrote in #note-15:

It's a long requested series of improvements for HTTP client like read_timeout, connect_timeout, and write_timeout.
resolv_timeout is the last piece of that.

How common is it to need separate timeouts for all of these? I can easily imagine the resolv_timeout as being part of the connect_timeout (i.e. resolv+connect may not exceed connect_timeout). And at least in my experience I usually want a single "general timeout" for resolv+connect+write+read. For that reason I really like Jeremy's suggestion of a single deadline argument. You can even have a single deadline for multiple http requests. This is really perfect for most uses cases that I'm familiar with.

Both Timeout.timeout and deadline is not the essential problem of this topic. The topic this ticket handles is "getaddrinfo is not interruptable nor timeout ready". This is because getaddrinfo(3) doesn't have timeout, async, nor nonblocking API. This ticket intends to provide the underlying implementation of such feature.

Actions #18

Updated by hsbt (Hiroshi SHIBATA) almost 2 years ago

  • Target version changed from 3.1 to 3.2
Actions #19

Updated by hsbt (Hiroshi SHIBATA) over 1 year ago

  • Target version deleted (3.2)
Actions #20

Updated by hsbt (Hiroshi SHIBATA) 5 months ago

  • Related to Feature #19965: Make the name resolution interruptible added

Updated by mame (Yusuke Endoh) 5 months ago

  • Status changed from Open to Closed

#19965 solved the problem, so I'll close it.

I completely forgot this ticket, but the concept (pthread_create on every call to getaddrinfo(3)) was the same as in #19965. Just for case, #19965 is a bit more elaborate:

  • It works as before in environments where pthread is not available.
  • It includes the same hack for not only getaddrinfo(3) but also getnameinfo(3).
  • pthread_cancel is not used because until recently there was a bug in glibc where pthread_cancel of a pthread that is calling getaddrinfo(3) would cause subsequent name resolution to stick:
  • It attempts to use pthread_setaffinity_np to reduce speed degradation.
Actions

Also available in: Atom PDF

Like0
Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0