Feature #19965

Make the name resolution interruptible

Added by mame (Yusuke Endoh) about 1 year ago. Updated 11 months ago.

Status:
Closed
Target version:
-
[ruby-core:115104]

Description

Problem

Currently, Ruby name resolution is not interruptible.

$ cat /etc/resolv.conf
nameserver 198.51.100.1

$ ./local/bin/ruby -rsocket -e 'Addrinfo.getaddrinfo("www.ruby-lang.org", 80)'
^C^C^C^C

If you set a non-responsive IP as the nameserver, you cannot stop Addrinfo.getaddrinfo by pressing Ctrl+C. Note that Timeout.timeout does not work either.

This is because there is no way to cancel getaddrinfo(3).
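
For reference, this is roughly how a caller would try to bound the lookup. It is only a minimal sketch: `resolve_with_deadline` is a hypothetical helper name, not part of Ruby's API. Without the change proposed in this ticket, the Timeout::Error cannot be delivered while the thread is blocked inside getaddrinfo(3), so the deadline has no effect when the nameserver does not respond.

```ruby
require "socket"
require "timeout"

# Hypothetical helper (an assumption for illustration, not a Ruby API):
# resolve host/port, giving up after `seconds`.
# Returns an array of Addrinfo objects, or nil if the deadline passed.
def resolve_with_deadline(host, port, seconds)
  Timeout.timeout(seconds) do
    Addrinfo.getaddrinfo(host, port)
  end
rescue Timeout::Error
  nil
end

# A numeric address needs no DNS query, so this returns promptly.
addrs = resolve_with_deadline("127.0.0.1", 80, 5)
puts addrs.map(&:ip_address).uniq.inspect if addrs
```

The `nil` return only becomes reachable for a stuck DNS query once getaddrinfo itself can be interrupted.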

Proposal

I wrote a patch to make getaddrinfo(3) work in a separate pthread.

https://github.com/ruby/ruby/pull/8695

Whenever it needs name resolution, it creates a worker pthread, and executes getaddrinfo(3) in it.
The caller thread waits for the worker to complete.
When an interrupt occurs, the caller thread stops waiting and detaches the worker pthread.
The detached worker pthread will exit after getaddrinfo(3) completes (or name resolution times out).
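
The wait-or-abandon pattern described above can be illustrated with plain Ruby threads. This is an analogy only, not the patch itself: the patch does this in C with pthreads around getaddrinfo(3). `run_abandonable` is a hypothetical name, and `sleep` stands in for a slow lookup.

```ruby
require "timeout"

# Hypothetical helper (illustration only): run `work` in a worker thread
# and wait for its result. If the caller is interrupted while waiting,
# it simply stops waiting; the worker keeps running and exits on its own.
def run_abandonable(&work)
  mailbox = Queue.new
  Thread.new do
    begin
      mailbox << [:ok, work.call]
    rescue StandardError => e
      # Pass the failure to the caller instead of dying silently.
      mailbox << [:error, e]
    end
  end
  status, value = mailbox.pop # interruptible wait
  raise value if status == :error
  value
end

begin
  # `sleep 5` plays the role of a lookup against a dead nameserver.
  Timeout.timeout(0.2) { run_abandonable { sleep 5; :late } }
rescue Timeout::Error
  puts "caller was released; the worker is left to finish by itself"
end
```

The caller never blocks inside the slow operation itself, which is why an interrupt can reach it. The cost is one extra thread per call.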

Evaluation

By applying this patch, name resolution is now interruptible.

$ ./local/bin/ruby -rsocket -e 'pp Addrinfo.getaddrinfo("www.ruby-lang.org", 80)'
^C-e:1:in `getaddrinfo': Interrupt
        from -e:1:in `<main>'

As a drawback, name resolution performance will be degraded.

10000.times { Addrinfo.getaddrinfo("www.ruby-lang.org", 80) }
# Before patch: 2.3 sec.
# After patch: 3.0 sec.
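
A measurement like the one above could be taken along these lines. This is a sketch, not the exact script used for the numbers in this ticket: `time_lookups` is a hypothetical helper, and the result depends heavily on the resolver configuration and any caching in front of it.

```ruby
require "socket"

# Hypothetical helper (illustration only): wall-clock seconds spent
# performing `count` lookups of host/port.
def time_lookups(host, port, count)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  count.times { Addrinfo.getaddrinfo(host, port) }
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

# A numeric address avoids the network, so this mostly measures the
# per-call overhead rather than DNS latency.
elapsed = time_lookups("127.0.0.1", 80, 1000)
puts format("%.3f sec.", elapsed)
```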

However, I think that name resolution typically takes only a small fraction of an application's runtime. For example, the difference in URI.open performance is small.

100.times { URI.open("https://www.ruby-lang.org").read }
# Before patch: 3.36 sec.
# After patch: 3.40 sec.

Alternative approaches

I proposed using c-ares to resolve this issue (#19430). However, there was an objection that c-ares does not respect each platform's own name resolution mechanism.

Room for improvement


Related issues 3 (1 open, 2 closed)

Related to Ruby master - Feature #19430: Contribution wanted: DNS lookup by c-ares library (Open)
Related to Ruby master - Feature #16476: Socket.getaddrinfo cannot be interrupted by Timeout.timeout (Closed, Glass_saga (Masaki Matsushita))
Related to Ruby master - Bug #20172: Socket.addrinfo failing randomly (Closed)
Actions #1

Updated by mame (Yusuke Endoh) about 1 year ago

  • Related to Feature #19430: Contribution wanted: DNS lookup by c-ares library added
Actions #2

Updated by mame (Yusuke Endoh) about 1 year ago

  • Status changed from Open to Closed

Applied in changeset git|3dc311bdc8badb680267f5a10e0c467ddd9dfe4c.


Make rb_getaddrinfo interruptible

When pthread_create is available, rb_getaddrinfo creates a pthread and
executes getaddrinfo(3) in it. The caller thread waits for the pthread
to complete, but detaches it if interrupted. This allows name resolution
to be interrupted by Timeout.timeout, etc. even if it takes a long time
(for example, when the DNS server does not respond). [Feature #19965]

Actions #3

Updated by hsbt (Hiroshi SHIBATA) about 1 year ago

  • Related to Feature #16476: Socket.getaddrinfo cannot be interrupted by Timeout.timeout added

Updated by byroot (Jean Boussier) about 1 year ago

@mame (Yusuke Endoh) we just ran into a crash on our ruby-head nightly CI that seems related:

/app/components/platform/essentials/lib/http_host_restriction.rb:50: [BUG] Segmentation fault at 0x00007ff23f795910
ruby 3.3.0dev (2023-11-06T03:01:06Z shopify a763d085e4) +YJIT [x86_64-linux]

-- Control frame information -----------------------------------------------
c:0139 p:---- s:0684 e:000683 CFUNC  :ip_address
c:0138 p:0006 s:0680 e:000679 METHOD /app/components/platform/essentials/lib/http_host_restriction.rb:50
c:0137 p:0024 s:0672 e:000671 METHOD /app/components/platform/essentials/lib/http_host_restriction.rb:86

-- C level backtrace information -------------------------------------------
/usr/local/ruby/bin/ruby(rb_print_backtrace+0x14) [0x557302ccae51] /tmp/ruby-build/ruby-3.3.0-a763d085e446d4a3cb09bd5f6bcaffc30484e804/vm_dump.c:812
/usr/local/ruby/bin/ruby(rb_vm_bugreport) /tmp/ruby-build/ruby-3.3.0-a763d085e446d4a3cb09bd5f6bcaffc30484e804/vm_dump.c:1143
/usr/local/ruby/bin/ruby(rb_bug_for_fatal_signal+0xfc) [0x557302e77a7c] /tmp/ruby-build/ruby-3.3.0-a763d085e446d4a3cb09bd5f6bcaffc30484e804/error.c:1065
/usr/local/ruby/bin/ruby(sigsegv+0x4d) [0x557302c1763d] /tmp/ruby-build/ruby-3.3.0-a763d085e446d4a3cb09bd5f6bcaffc30484e804/signal.c:920
/lib/x86_64-linux-gnu/libc.so.6(0x7ff42d333520) [0x7ff42d333520]
/lib/x86_64-linux-gnu/libc.so.6(pthread_setaffinity_np+0x4) [0x7ff42d38c524]
/usr/local/ruby/lib/ruby/3.3.0+0/x86_64-linux/socket.so(rb_getnameinfo+0xf2) [0x7ff40f1c8f92] /tmp/ruby-build/ruby-3.3.0-a763d085e446d4a3cb09bd5f6bcaffc30484e804/ext/socket/raddrinfo.c:711
/usr/local/ruby/lib/ruby/3.3.0+0/x86_64-linux/socket.so(rb_getnameinfo) (null):0
/usr/local/ruby/lib/ruby/3.3.0+0/x86_64-linux/socket.so(addrinfo_getnameinfo+0x88) [0x7ff40f1c94e8] /tmp/ruby-build/ruby-3.3.0-a763d085e446d4a3cb09bd5f6bcaffc30484e804/ext/socket/raddrinfo.c:2372
/usr/local/ruby/lib/ruby/3.3.0+0/x86_64-linux/socket.so(addrinfo_ip_address+0x59) [0x7ff40f1c95f9] /tmp/ruby-build/ruby-3.3.0-a763d085e446d4a3cb09bd5f6bcaffc30484e804/ext/socket/raddrinfo.c:2430
/usr/local/ruby/bin/ruby(vm_call_cfunc_with_frame_+0x117) [0x557302ca77e7] /tmp/ruby-build/ruby-3.3.0-a763d085e446d4a3cb09bd5f6bcaffc30484e804/vm_insnhelper.c:3503
/usr/local/ruby/bin/ruby(vm_sendish+0x9e) [0x557302cbafc7] /tmp/ruby-build/ruby-3.3.0-a763d085e446d4a3cb09bd5f6bcaffc30484e804/vm_insnhelper.c:5581
/usr/local/ruby/bin/ruby(vm_exec_core) /tmp/ruby-build/ruby-3.3.0-a763d085e446d4a3cb09bd5f6bcaffc30484e804/insns.def:822
/usr/local/ruby/bin/ruby(rb_vm_exec+0x18e) [0x557302cab87e] /tmp/ruby-build/ruby-3.3.0-a763d085e446d4a3cb09bd5f6bcaffc30484e804/vm.c:2472

Let me know if I can provide more information.

Updated by mame (Yusuke Endoh) about 1 year ago

  • Status changed from Closed to Assigned
  • Assignee set to mame (Yusuke Endoh)

@byroot (Jean Boussier) Thanks! Maybe I'm misunderstanding the usage of pthread_setaffinity_np. I'll check it out. If I don't understand it, I'll stop using pthread_setaffinity_np.

Updated by byroot (Jean Boussier) about 1 year ago

Important note: in our environment we fork a lot, so it's not impossible that the cause is that the thread is dead.

Updated by mame (Yusuke Endoh) about 1 year ago

Actually, I saw the same problem with CI on RedHat on s390x.

https://rubyci.s3.amazonaws.com/rhel_zlinux/ruby-master/log/20231025T093302Z.fail.html.gz

-- C level backtrace information -------------------------------------------
unknown address_size:0/home/chkbuild/build/20231025T093302Z/ruby/ruby(rb_print_backtrace+0x10) [0x2aa22b5eb06] vm_dump.c:812
/home/chkbuild/build/20231025T093302Z/ruby/ruby(rb_vm_bugreport) vm_dump.c:1143
/home/chkbuild/build/20231025T093302Z/ruby/ruby(rb_bug_for_fatal_signal+0xc2) [0x2aa22c62da2] error.c:1065
/home/chkbuild/build/20231025T093302Z/ruby/ruby(sigill+0x0) [0x2aa22a9f000] signal.c:920
/home/chkbuild/build/20231025T093302Z/ruby/ruby(sigsegv) (null):0
[0x3fef1782718]
/lib64/libpthread.so.0(pthread_setaffinity_np+0x44) [0x3ff8031103c]
/home/chkbuild/build/20231025T093302Z/ruby/.ext/s390x-linux/socket.so(rb_getnameinfo+0x290) [0x3ff567a3340]

I thought it might be specific to glibc on s390x, so I stopped using pthread_setaffinity_np only on s390x. But if it appears in other environments as well (especially x86_64), I'll have to do something.

Actions #9

Updated by mame (Yusuke Endoh) about 1 year ago

  • Status changed from Assigned to Closed

Applied in changeset git|d0066211f2052bf1444ffeb11544860a12cebff2.


Detach a pthread after pthread_setaffinity_np

After a pthread for getaddrinfo is detached, we cannot predict when the
thread will exit. It would lead to a segfault by setting
pthread_setaffinity to the terminated pthread. I guess this problem
would be more likely to occur in high-load environments.

This change detaches the pthread after pthread_setaffinity is called.
[Feature #19965]

Actions #10

Updated by hsbt (Hiroshi SHIBATA) 11 months ago

  • Related to Bug #20172: Socket.addrinfo failing randomly added