Bug #12678
closed
No way to set a timeout for TLS handshake when using Net::SMTP
Description
When establishing a connection to an SMTP server, Net::SMTP doesn't offer a way to specify a timeout for how long the TLS handshake should take.
In our production environment, this means we routinely see hangs under this callstack:
.../lib/ruby/2.1.0/net/smtp.rb:586:in `connect'
.../lib/ruby/2.1.0/net/smtp.rb:586:in `tlsconnect'
.../lib/ruby/2.1.0/net/smtp.rb:563:in `do_start'
.../lib/ruby/2.1.0/net/smtp.rb:520:in `start'
.../shared/bundle/ruby/2.1.0/gems/mail-2.5.4/lib/mail/network/delivery_methods/smtp.rb:112:in `deliver!'
The C-level backtrace here looks like this:
Thread 1 (Thread 0x7f7709587700 (LWP 30870)):
#0 0x00007f77085ea4b7 in ppoll () from /lib64/libc.so.6
#1 0x00007f7709719cd9 in rb_wait_for_single_fd (fd=, events=, tv=0x0) at thread.c:3675
#2 0x00007f7709719e32 in rb_thread_wait_fd_rw (fd=) at thread.c:3514
#3 rb_thread_wait_fd (fd=) at thread.c:3525
#4 0x00007f77095f08bf in rb_io_wait_readable (f=34) at io.c:1094
#5 0x00007f77009aa894 in ossl_start_ssl (self=140149417009120, func=0x7f7700755570 <SSL_connect>, funcname=0x7f77009c3ba7 "SSL_connect", nonblock=) at ossl_ssl.c:1282
#6 0x00007f77096f522a in vm_call_cfunc_with_frame (th=0x7f770a9585b0, reg_cfp=0x7f7709583530, ci=) at vm_insnhelper.c:1510
#7 0x00007f7709708e11 in vm_call_cfunc (th=0x7f770a9585b0, cfp=0x7f7709583530,
You can replicate this behavior by starting a dummy server with 'nc' (netcat), which accepts the TCP connection but never answers the TLS handshake, and then running the attached Ruby script to connect to it.
Steps to reproduce:
- Start a netcat process listening on port 8888 ('nc -l 8888')
- Run the attached net-smtp-connect-timeout.rb
- Wait
Expected results:
The call to smtp.start should eventually time out.
Actual results:
The call to smtp.start hangs forever.
Notes:
Net::HTTP addresses this same issue by calling SSLSocket#connect_nonblock in a loop and waiting on the socket with a timeout between attempts:
https://github.com/ruby/ruby/blob/trunk/lib/net/http.rb#L934-L943
... seems like Net::SMTP should do the same.
We're hitting this issue using Ruby 2.1.7 in production, but I've also verified that it happens on 2.3.1.
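For illustration, here is a minimal sketch of the connect_nonblock approach mentioned in the notes above. It is not the actual Net::HTTP code and not a proposed patch: the helper name tls_connect_with_timeout and its parameters are hypothetical, and it raises a plain Timeout::Error rather than whatever exception class Net::SMTP would want to use.

```ruby
require "socket"
require "openssl"
require "timeout"

# Hypothetical helper (not part of Net::SMTP): runs the TLS handshake on an
# already-connected TCP socket and gives up after `timeout` seconds in total.
def tls_connect_with_timeout(tcp_socket, timeout, context = OpenSSL::SSL::SSLContext.new)
  ssl = OpenSSL::SSL::SSLSocket.new(tcp_socket, context)
  ssl.sync_close = true # closing the SSL socket later also closes tcp_socket
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout

  begin
    # Raises an IO::WaitReadable / IO::WaitWritable error instead of blocking
    # when the handshake cannot make progress yet.
    ssl.connect_nonblock
  rescue IO::WaitReadable, IO::WaitWritable => wait
    remaining = deadline - Process.clock_gettime(Process::CLOCK_MONOTONIC)
    readers = wait.is_a?(IO::WaitReadable) ? [tcp_socket] : nil
    writers = wait.is_a?(IO::WaitWritable) ? [tcp_socket] : nil

    # IO.select returns nil when nothing became ready before the deadline.
    if remaining <= 0 || IO.select(readers, writers, nil, remaining).nil?
      tcp_socket.close
      raise Timeout::Error, "TLS handshake did not finish within #{timeout}s"
    end
    retry # socket is ready, continue the handshake
  end

  ssl
end
```

Against the 'nc -l 8888' listener from the steps to reproduce, calling tls_connect_with_timeout(TCPSocket.new("localhost", 8888), 5) should raise Timeout::Error after roughly five seconds instead of blocking in SSL_connect.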