Bug #17781
closedResolv::DNS RequestID table allocations are never freed, causing DNS lookups to eventually hang
Description
In between ruby 2.7.2 and 2.7.3/3.0.0 releases a bug was introduced into Resolv::DNS
functionality that eventually will break long-running processes that do DNS lookups.
Resolv::DNS.free_request_id
is no longer called once requests are complete. The implication of this is that a maximum of 65536 DNS lookups can be performed against a single DNS server; any further requests will cause ruby to hang in an infinite loop in Resolv::DNS.allocate_request_id
(all allocations are taken, however the code will loop forever trying to find one that is unallocated).
Reproduction of the problem is possible with the following code:
require 'resolv'
65536.times { Resolv::DNS.new.getresource('www.example.net', Resolv::DNS::Resource::IN::A) }
puts "Ran 65536 times"
Resolv::DNS.new.getresource('www.example.net', Resolv::DNS::Resource::IN::A)
puts "Ran 65537 times" # never printed
The following script can be used to evaluate if a given ruby version has this problem:
#!/usr/bin/env ruby
require 'resolv'
puts "ruby version: #{RUBY_VERSION}"
r = Resolv::DNS.new
100.times { r.getresource('www.example.net', Resolv::DNS::Resource::IN::A).address }
puts "RequestID table size: #{Resolv::DNS::RequestID.values.first&.count or "(empty)"}"
A properly behaving version should report an empty RequestID
table. For example:
○ → rvm use ruby-2.7.2; ./test_dns.rb
Using /home/michael/.rvm/gems/ruby-2.7.2
ruby version: 2.7.2
RequestID table size: (empty)
○ → rvm use ruby-2.7.3; ./test_dns.rb
Using /home/michael/.rvm/gems/ruby-2.7.3
ruby version: 2.7.3
RequestID table size: 100
○ → rvm use ruby-3.0.0; ./test_dns.rb
Using /home/michael/.rvm/gems/ruby-3.0.0
ruby version: 3.0.0
RequestID table size: 100
○ → rvm use ruby-3.0.1; ./test_dns.rb
Using /home/michael/.rvm/gems/ruby-3.0.1
ruby version: 3.0.1
RequestID table size: 100
Example output from rbtrace on 2.7.2:
$ rbtrace -p 42 -m 'allocate_request_id(RequestID)' 'free_request_id(RequestID)'
*** attached to process 42
Resolv::DNS.allocate_request_id(RequestID={}) <0.000094>
Resolv::DNS.free_request_id(RequestID={["192.168.1.51", 53]=>{6377=>true}}) <0.000092>
Resolv::DNS.allocate_request_id(RequestID={}) <0.000086>
Resolv::DNS.free_request_id(RequestID={["192.168.1.51", 53]=>{4537=>true}}) <0.000107>
Resolv::DNS.allocate_request_id(RequestID={}) <0.000086>
Resolv::DNS.free_request_id(RequestID={["192.168.1.51", 53]=>{64587=>true}}) <0.000100>
Resolv::DNS.allocate_request_id(RequestID={}) <0.000097>
Resolv::DNS.free_request_id(RequestID={["192.168.1.51", 53]=>{37627=>true}}) <0.000051>
Resolv::DNS.allocate_request_id(RequestID={}) <0.000045>
Resolv::DNS.free_request_id(RequestID={["192.168.1.51", 53]=>{37265=>true}}) <0.000044>
Resolv::DNS.allocate_request_id(RequestID={}) <0.000047>
Resolv::DNS.free_request_id(RequestID={["192.168.1.51", 53]=>{39586=>true}}) <0.000050>
Resolv::DNS.allocate_request_id(RequestID={}) <0.000040>
Resolv::DNS.free_request_id(RequestID={["192.168.1.51", 53]=>{50696=>true}}) <0.000051>
Resolv::DNS.allocate_request_id(RequestID={}) <0.000040>
Resolv::DNS.free_request_id(RequestID={["192.168.1.51", 53]=>{35266=>true}}) <0.000044>
Resolv::DNS.allocate_request_id(RequestID={}) <0.000041>
Resolv::DNS.free_request_id(RequestID={["192.168.1.51", 53]=>{42458=>true}}) <0.000043>
Example output from rbtrace on 2.7.3:
$ rbtrace -p 42 -m 'allocate_request_id(RequestID.values.first.count)' 'free_request_id(RequestID.values.first.count)'
*** attached to process 42
Resolv::DNS.allocate_request_id(RequestID.values.first.count=544) <0.000161>
Resolv::DNS.allocate_request_id(RequestID.values.first.count=545) <0.000162>
Resolv::DNS.allocate_request_id(RequestID.values.first.count=546) <0.000136>
Resolv::DNS.allocate_request_id(RequestID.values.first.count=547) <0.000111>
Resolv::DNS.allocate_request_id(RequestID.values.first.count=548) <0.000166>
Resolv::DNS.allocate_request_id(RequestID.values.first.count=549) <0.000140>
Resolv::DNS.allocate_request_id(RequestID.values.first.count=550) <0.000118>
Resolv::DNS.allocate_request_id(RequestID.values.first.count=551) <0.000149>
Resolv::DNS.allocate_request_id(RequestID.values.first.count=552) <0.000113>
Resolv::DNS.allocate_request_id(RequestID.values.first.count=553) <0.000141>
Resolv::DNS.allocate_request_id(RequestID.values.first.count=554) <0.000180>
Resolv::DNS.allocate_request_id(RequestID.values.first.count=555) <0.000174>
Resolv::DNS.allocate_request_id(RequestID.values.first.count=556) <0.000155>
…
not reproducible on:
- ruby 2.7.2p137 (2020-10-01 revision 5445e04352) [x86_64-linux-musl] (ruby:2.7.2-alpine docker image)
- ruby 2.7.2p137 (2020-10-01 revision 5445e04352) [x86_64-linux]
reproducible on:
- ruby 2.7.3p183 (2021-04-05 revision 6847ee089d) [x86_64-linux-musl] (ruby:2.7.3-alpine docker image)
- ruby 2.7.3p183 (2021-04-05 revision 6847ee089d) [x86_64-linux]
- ruby 3.0.0p0 (2020-12-25 revision 95aff21468) [x86_64-linux]
- ruby 3.0.1p64 (2021-04-05 revision 0fb782ee38) [x86_64-linux]
Updated by nagachika (Tomoyuki Chikanaga) over 3 years ago
- Backport changed from 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: UNKNOWN to 2.5: DONTNEED, 2.6: REQUIRED, 2.7: REQUIRED, 3.0: REQUIRED
If the issue was introduced by 70c3a195f39763dccdf9367d0c9b7e815431a41a, ruby-2.6.7 might be affected too. Since the changeset was backported into ruby_2_6 too. https://bugs.ruby-lang.org/issues/17658#note-5
Updated by jeremyevans0 (Jeremy Evans) over 3 years ago
- Status changed from Open to Assigned
- Assignee set to akr (Akira Tanaka)
This should be fixed by https://github.com/ruby/resolv/pull/9
Updated by jeremyevans0 (Jeremy Evans) over 3 years ago
- Has duplicate Bug #17783: Running sidekiq with ruby 3.0 (possible thread deadlock on resolv request_id allocation and free) added
Updated by bensullivan (Ben Sullivan) over 3 years ago
It looks like we also ran into this problem in production with ruby 2.6.7 as well. Following the instructions given, I was able to recreate the issue on our ruby:2.6.7-slim based docker image.
Updated by sam.saffron (Sam Saffron) over 3 years ago
update... https://github.com/ruby/resolv/pull/9/files got merged.
Updated by mame (Yusuke Endoh) over 3 years ago
- Status changed from Assigned to Closed
Fixed by 9edc162583a4f685332239f6249745ad9b518cbe.
Updated by nagachika (Tomoyuki Chikanaga) over 3 years ago
- Backport changed from 2.5: DONTNEED, 2.6: REQUIRED, 2.7: REQUIRED, 3.0: REQUIRED to 2.5: DONTNEED, 2.6: REQUIRED, 2.7: REQUIRED, 3.0: DONE
ruby_3_0 9c0df2e81c22e6e35f3c5d69a070c2a3cf67e320 merged revision(s) 9edc162583a4f685332239f6249745ad9b518cbe.
Updated by usa (Usaku NAKAMURA) over 3 years ago
- Backport changed from 2.5: DONTNEED, 2.6: REQUIRED, 2.7: REQUIRED, 3.0: DONE to 2.5: DONTNEED, 2.6: REQUIRED, 2.7: DONE, 3.0: DONE
ruby_2_7 87d02eacd26d0b2884016315baf2440d100f177e merged revision(s) 9edc162583a4f685332239f6249745ad9b518cbe.
Updated by usa (Usaku NAKAMURA) over 3 years ago
- Backport changed from 2.5: DONTNEED, 2.6: REQUIRED, 2.7: DONE, 3.0: DONE to 2.5: DONTNEED, 2.6: DONE, 2.7: DONE, 3.0: DONE
ruby_2_6 r67946 merged revision(s) 9edc1625.