Bug #14997

Socket connect timeout exceeds the timeout value for

Added by maciej.mensfeld (Maciej Mensfeld) almost 2 years ago. Updated 9 months ago.

Given a domain that resolves to multiple IPs (4 in the following example):

dig a

; <<>> DiG 9.10.3-P4-Ubuntu <<>> a
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 54375
;; flags: qr rd ra; QUERY: 1, ANSWER: 4, AUTHORITY: 0, ADDITIONAL: 0

; IN A

;; ANSWER SECTION:
60 IN A
60 IN A
60 IN A
60 IN A

;; Query time: 4 msec
;; WHEN: Tue Aug 14 13:46:18 UTC 2018
;; MSG SIZE  rcvd: 132

and when connect_timeout is set to a certain value (N), the total time spent on non-responsive endpoints that don't immediately raise an exception can reach N * 4, because each resolved IP gets its own full timeout.

This can disrupt some time-sensitive systems.

We've experienced it with the following setup:

  • A TCP server (EventMachine) runs behind an AWS NLB
  • The TCP server process goes down behind the NLB, but the NLB itself is still responsive
  • Socket connect_timeout is set to 100ms
  • The AWS NLB keeps the connection in a waiting state, hoping that the service behind it will recover (but it doesn't)
  • Ruby times out after 100ms
  • Ruby tries to connect to the next IP from the pool (the AWS NLB again)
  • Because the domain resolves to 4 hosts, the overall timeout is 400ms
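
The arithmetic behind the steps above can be modeled without opening any sockets. This is only an illustrative sketch: `simulate_sequential_connect` is a hypothetical helper, not part of Ruby's socket library, and the addresses are placeholders from the documentation range.

```ruby
# Simulated model of per-address timeouts; no real connections are made.
# `simulate_sequential_connect` is a hypothetical helper for illustration only.
def simulate_sequential_connect(addresses, connect_timeout_ms)
  elapsed_ms = 0
  addresses.each do |_address|
    # Every unresponsive address uses up the full timeout
    # before the next address is tried.
    elapsed_ms += connect_timeout_ms
  end
  elapsed_ms
end

# Placeholder addresses (assumed, not the real NLB IPs).
addresses = %w[192.0.2.1 192.0.2.2 192.0.2.3 192.0.2.4]
puts simulate_sequential_connect(addresses, 100) # prints 400
```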

I'm not sure whether this qualifies as a bug or a feature, but I believe it should definitely be documented, or there should be an option to enforce a "hard" overall limit.

Here's the code actually responsible for this behavior:

Related issues

Related to Ruby master - Feature #15553: Addrinfo.getaddrinfo supports timeout (Closed, Glass_saga (Masaki Matsushita))

Updated by maciej.mensfeld (Maciej Mensfeld) almost 2 years ago

  • Description updated (diff)

Updated by maciej.mensfeld (Maciej Mensfeld) almost 2 years ago

If anyone is willing to confirm that this is indeed unwanted / unexpected behavior, I offer to fix it.

It could be fixed by tracking how much of the time "pool" has already been used and lowering the timeout for each subsequent attempt accordingly. That would guarantee that we never exceed the configured timeout.

I think this is the most elegant solution.
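
A minimal sketch of that idea, assuming a shared deadline that every attempt draws from. `connect_with_budget`, its `clock:` parameter, and the block-based connector are hypothetical names made up for illustration; this is not the actual patch. The demo uses a fake clock in milliseconds so it runs without touching the network.

```ruby
# Sketch only: share one time budget across all connection attempts.
# `connect_with_budget` and the `clock:` parameter are hypothetical.
def connect_with_budget(addresses, total_timeout,
                        clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
  deadline = clock.call + total_timeout
  last_error = nil

  addresses.each do |address|
    remaining = deadline - clock.call
    break if remaining <= 0 # budget used up, stop trying further addresses

    begin
      # The block performs the actual attempt, limited to `remaining`.
      return yield(address, remaining)
    rescue SystemCallError => e
      last_error = e
    end
  end

  raise last_error || Errno::ETIMEDOUT.new("connect budget exhausted")
end

# Demo with a fake clock: budget of 100ms, each attempt fails after
# 40ms or after its remaining allowance, whichever is smaller.
fake_now = 0
seen = []
begin
  connect_with_budget(%w[a b c d], 100, clock: -> { fake_now }) do |_address, remaining|
    seen << remaining
    fake_now += [remaining, 40].min
    raise Errno::ETIMEDOUT
  end
rescue Errno::ETIMEDOUT
  p seen # prints [100, 60, 20]
end
```

With real sockets, the block could be something like `{ |addrinfo, remaining| addrinfo.connect(timeout: remaining) }`, with the addresses coming from `Addrinfo.getaddrinfo` and the default clock giving seconds. One trade-off of this approach is that an early unresponsive address can consume the whole budget, so later addresses may never be tried.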

Updated by tenderlovemaking (Aaron Patterson) about 1 year ago

This really sounds like a bug to me. Please make a patch and I will apply it.


Updated by Glass_saga (Masaki Matsushita) about 1 year ago

  • Related to Feature #15553: Addrinfo.getaddrinfo supports timeout added

Updated by kirs (Kir Shatrov) 9 months ago

tenderlovemaking (Aaron Patterson) wrote:

This really sounds like a bug to me. Please make a patch and I will apply it.

Do you mind taking a look at Based on my testing it's solving the problem.

Together with (already merged), many of us at Shopify would really love to see this fixed in 2.7, as it would improve resiliency and keep Ruby processes from hanging for 10s (the default resolv timeout) when DNS is experiencing issues.
