Project

General

Profile

Actions

Bug #7584

closed

Ruby hangs when shutting down an ssl connection in gc finalization

Added by bpot (Bob Potter) about 12 years ago. Updated almost 12 years ago.

Status:
Closed
Target version:
ruby -v:
ruby 1.9.3p327 (2012-11-10) [i686-linux]
Backport:
[ruby-core:50951]

Description

It appears that during garbage collection ruby will sometimes attempt to shutdown an ssl connection that has already been shutdown. At least in certain cases this causes the SSL_shutdown call to block forever.

(gdb) bt
#0 0xb76fa424 in __kernel_vsyscall ()
#1 0xb740d17b in read () from /lib/i386-linux-gnu/tls/i686/nosegneg/libc.so.6
#2 0xb6faabae in ?? () from /lib/i386-linux-gnu/libcrypto.so.1.0.0
#3 0xb6fa8412 in BIO_read () from /lib/i386-linux-gnu/libcrypto.so.1.0.0
#4 0xb70c3157 in ?? () from /lib/i386-linux-gnu/libssl.so.1.0.0
#5 0xb70c42d5 in ?? () from /lib/i386-linux-gnu/libssl.so.1.0.0
#6 0xb70c1371 in ?? () from /lib/i386-linux-gnu/libssl.so.1.0.0
#7 0xb70dc931 in SSL_shutdown () from /lib/i386-linux-gnu/libssl.so.1.0.0
#8 0xb711ba9d in ossl_ssl_shutdown (ssl=0xb21c1f18) at ossl_ssl.c:987
#9 0xb711bb31 in ossl_ssl_free (ssl=0xb21c1f18) at ossl_ssl.c:998
#10 0xb753bba5 in run_final (obj=229474180, objspace=) at gc.c:2932
#11 finalize_list (objspace=0x98805c0, p=0xdad7f84) at gc.c:1946
(gdb) f 9
#9 0xb711bb31 in ossl_ssl_free (ssl=0xb21c1f18) at ossl_ssl.c:998
998 in ossl_ssl.c
(gdb) p (*ssl).shutdown
$8 = 1

I'm not sure exactly what is causing this condition, but it seems like we should be checking if the connection is already shutdown (ssl_get_shutdown()) before trying to shut it down in ossl_ssl_free.


Files

info.txt (8.4 KB) info.txt bpot (Bob Potter), 03/23/2013 10:32 AM

Related issues 1 (0 open1 closed)

Related to Backport200 - Backport #8240: SSLSocket breaks other connections or files on GCClosednagachika (Tomoyuki Chikanaga)04/09/2013Actions

Updated by drbrain (Eric Hodel) about 12 years ago

  • Category set to ext
  • Assignee set to MartinBosslet (Martin Bosslet)
  • Target version set to 2.0.0

Updated by usa (Usaku NAKAMURA) about 12 years ago

  • Status changed from Open to Assigned

Updated by mame (Yusuke Endoh) almost 12 years ago

I'm not sure how significant this is.
Martin, what do you think?

--
Yusuke Endoh

Updated by bpot (Bob Potter) almost 12 years ago

For what it's worth, we were able to work around this by explicitly shutting down ssl connections instead of letting it happen during GC.

Updated by mame (Yusuke Endoh) almost 12 years ago

  • Target version changed from 2.0.0 to 2.1.0

Thank you Bob!

Then I postpone this to 2.1.0.
If there is a good patch available soon, we may backport it to ruby_2_0_0.

--
Yusuke Endoh

Updated by bpot (Bob Potter) almost 12 years ago

=begin
Digging into this some more because it cropped up in another project.

I've had some luck reproducing this issue with a small script: https://gist.github.com/bpot/5225920

Sometimes it reproduces easily, sometimes it doesn't. I've been testing with "ruby 1.9.3p374". I haven't tested to see if it happens with the latest trunk.

I've attached a file with some gdb logs of the state of the SSL struct and lsof output for the process after it hangs. One oddity is that the SSL connection appears to be for FD 11 but lsof doesn't have output for such a file descriptor. You would expect the hanging read to just receive EBADF in this case, right?

I don't know if this is related to the hanging issue but I've noticed that often during GC the underlying TCPSocket object's finalizer will be called before the SSLSocket's finalizer preventing us from closing down SSL connection correctly:

19:28:56.520556 close(13) = 0
19:28:56.520969 write(13, "\25\3\3\0\26\177\n\232\255{9iWfz4\255\240 \276\254!\222\374\10C{", 27) = -1 EBADF (Bad file descriptor)

Should SSLSocket be setting a custom finalizer on the underlying rb_io_t?
=end

Updated by bpot (Bob Potter) almost 12 years ago

It looks like it is a race condition where the TCPSocket finalizer closes a fd, that fd is reused by another thread and then we try to do SSL_shutdown on this new connection and bad things happen.

Would it make sense to have the TCPSocket finalizer be a no-op (when it's used for ssl) and have SSLSocket's free operation take care of shutting everything down? Is there a better way to solve this?

Updated by nobu (Nobuyoshi Nakada) almost 12 years ago

The order of finalizers is indeterminate.

You'd be able to leave FDs open with IO#autoclose.

Updated by MartinBosslet (Martin Bosslet) almost 12 years ago

Bob, could you please check if the patch applied in https://bugs.ruby-lang.org/issues/8240 fixed this? Thanks!

Updated by bpot (Bob Potter) almost 12 years ago

Looks like that fixes it! I was unable to reproduce with "ruby 2.1.0dev (2013-04-15 trunk 40308) [x86_64-linux]".

Updated by MartinBosslet (Martin Bosslet) almost 12 years ago

  • Status changed from Assigned to Closed

Great, thanks for checking! I'll close this, too, then. If anyone still runs into this or a similar issue, please let me know!

Actions

Also available in: Atom PDF

Like0
Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0