Bug #19017
openNet::HTTP may block when attempting to reuse a persistent connection
Description
Ruby's Net::HTTP code performs a blocking Net::BufferedIO#eof?
check when attempting to reuse a persistent HTTP connection. See https://github.com/ruby/ruby/blob/6b099328af2ae2d04cbfd06fedc36a19cdecd30d/lib/net/http.rb#L1573. The bug is the check can hang for up to the HTTP read_timeout
, which is 60 seconds by default.
The code calls TCPSocket#wait_readable(0)
to see if the socket is readable before calling the blocking eof?
method. However, it's possible for the socket to be readable with SSL Handshake records and no Application Data. So the call to eof?
will process the SSL Handshake records, but hang since no Application Data is available.
The issue can be triggered if a TLS 1.3 server sends a NewSessionTicket
sometime after Application Data is written. The attached client and server code demonstrate the problem. Note it's important that the client and server be on separate hosts otherwise eof?
will always return immediately.
On the server, copy Server.java
and certs.p12
into a directory, install JDK 17, compile the server and run it:
$ openssl pkcs12 -info -in certs.p12 -noout -passin pass:password
MAC: sha1, Iteration 2048
MAC length: 20, salt length: 8
PKCS7 Encrypted data: pbeWithSHA1And40BitRC2-CBC, Iteration 2048
Certificate bag
PKCS7 Data
Shrouded Keybag: pbeWithSHA1And3-KeyTripleDES-CBC, Iteration 2048
$ sudo apt install -y openjdk-17-jdk openjdk-17-jre
$ javac Server.java
$ java -Djavax.net.debug=ssl,verbose Server
Loaded pkcs12
On the client, copy http.rb
and ca.pem
into a directory, add the IP address for the server as pluto
to /etc/hosts
:
$ file ca.pem
$ sudo vi /etc/hosts
...
192.168.0.10 pluto
...
$ ruby --version
ruby 2.7.6p219 (2022-04-12 revision c9c2245c0a) [x86_64-linux]
$ openssl version
OpenSSL 1.1.1f 31 Mar 2020
Run the client to make the first request:
$ ruby http.rb
opening connection to pluto:8888...
opened
starting SSL for pluto:8888...
The server will handle request_1 and trigger a new session ticket:
javax.net.ssl|DEBUG|10|main|2022-09-22 18:18:23.269 UTC|SSLCipher.java:466|jdk.tls.keyLimits: entry = AES/GCM/NoPadding KeyUpdate 2^37. AES/GCM/NOPADDING:KEYUPDATE = 137438953472
Connected to 37532
Handling request_0
... snip ...
javax.net.ssl|DEBUG|10|main|2022-09-22 18:18:25.310 UTC|SSLCipher.java:2024|KeyLimit write side: algorithm = AES/GCM/NOPADDING:KEYUPDATE
countdown value = 137438953472
javax.net.ssl|DEBUG|10|main|2022-09-22 18:18:25.335 UTC|SSLCipher.java:1870|KeyLimit read side: algorithm = AES/GCM/NOPADDING:KEYUPDATE
countdown value = 137438953472
read body
updated session data
javax.net.ssl|ALL|10|main|2022-09-22 18:18:25.343 UTC|SSLSocketImpl.java:1564|trigger new session ticket
wrote response
Handling request_1
The client will hang when trying to reuse the persistent connection:
OSSL_DEBUG: SSL SESSION new callback added [ossl_ssl.c:963]
SSL established, protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384
<- "POST / HTTP/1.1\r\nAccept-Encoding: identity\r\nConnection: keep-alive\r\nContent-Type: text/plain\r\nAccept: */*\r\nUser-Agent: Ruby\r\nHost: pluto:8888\r\nContent-Length: 0\r\n\r\n"
<- ""
OSSL_DEBUG: SSL SESSION new callback entered [ossl_ssl.c:454]
-> "HTTP/1.1 200 OK\r\n"
-> "Content-Length: 0\r\n"
-> "\r\n"
reading 0 bytes...
-> ""
read 0 bytes
Conn keep-alive
OSSL_DEBUG: SSL SESSION new callback entered [ossl_ssl.c:454]
Pressing Ctrl-C shows the backtrace:
^CTraceback (most recent call last):
11: from http.rb:10:in `<main>'
10: from /home/josh/.rbenv/versions/2.7.6/lib/ruby/2.7.0/net/http.rb:933:in `start'
9: from http.rb:17:in `block in <main>'
8: from /home/josh/.rbenv/versions/2.7.6/lib/ruby/2.7.0/net/http.rb:1294:in `post'
7: from /home/josh/.rbenv/versions/2.7.6/lib/ruby/2.7.0/net/http.rb:1506:in `send_entity'
6: from /home/josh/.rbenv/versions/2.7.6/lib/ruby/2.7.0/net/http.rb:1492:in `request'
5: from /home/josh/.rbenv/versions/2.7.6/lib/ruby/2.7.0/net/http.rb:1518:in `transport_request'
4: from /home/josh/.rbenv/versions/2.7.6/lib/ruby/2.7.0/net/http.rb:1573:in `begin_transport'
3: from /home/josh/.rbenv/versions/2.7.6/lib/ruby/2.7.0/net/protocol.rb:134:in `eof?'
2: from /home/josh/.rbenv/versions/2.7.6/lib/ruby/2.7.0/openssl/buffering.rb:300:in `eof?'
1: from /home/josh/.rbenv/versions/2.7.6/lib/ruby/2.7.0/openssl/buffering.rb:57:in `fill_rbuff'
/home/josh/.rbenv/versions/2.7.6/lib/ruby/2.7.0/openssl/buffering.rb:57:in `sysread': Interrupt
I get the same behavior with latest ruby too:
$ ruby --version
ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [x86_64-linux]
Changing Net::HTTP to the following:
elsif @socket.io.read_nonblock(0, exception: false).nil?
Resolves the issue:
$ ruby http.rb
opening connection to pluto:8888...
opened
starting SSL for pluto:8888...
OSSL_DEBUG: SSL SESSION new callback added [ossl_ssl.c:963]
SSL established, protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384
<- "POST / HTTP/1.1\r\nAccept-Encoding: identity\r\nConnection: keep-alive\r\nContent-Type: text/plain\r\nAccept: */*\r\nUser-Agent: Ruby\r\nHost: pluto:8888\r\nContent-Length: 0\r\n\r\n"
<- ""
OSSL_DEBUG: SSL SESSION new callback entered [ossl_ssl.c:454]
-> "HTTP/1.1 200 OK\r\n"
-> "Content-Length: 0\r\n"
-> "\r\n"
reading 0 bytes...
-> ""
read 0 bytes
Conn keep-alive
<- "POST / HTTP/1.1\r\nAccept-Encoding: identity\r\nConnection: keep-alive\r\nContent-Type: text/plain\r\nAccept: */*\r\nUser-Agent: Ruby\r\nHost: pluto:8888\r\nContent-Length: 0\r\n\r\n"
<- ""
OSSL_DEBUG: SSL SESSION new callback entered [ossl_ssl.c:454]
-> "HTTP/1.1 200 OK\r\n"
-> "Content-Length: 0\r\n"
-> "\r\n"
reading 0 bytes...
-> ""
read 0 bytes
Conn keep-alive
However, based on https://github.com/ruby/ruby/pull/1089#issuecomment-159878003 that change may not be correct. Or it could be that Ruby on Windows doesn't have this issue anymore.
Files
Updated by ioquatix (Samuel Williams) about 2 years ago
I think we should solve this in IO, not in Net::HTTP. Checking the class/type is a strong indication to me that the problem is not being solved at the right level.
Updated by joshc (Josh C) about 2 years ago
Forgot to add the link to the draft PR: https://github.com/ruby/ruby/pull/6423.
I think we should solve this in IO, not in Net::HTTP. Checking the class/type is a strong indication to me that the problem is not being solved at the right level.
Yes, makes sense. Though for context, I added the type check to temporarily work around these two failing tests:
https://github.com/ruby/spec/blob/8d26c0c202d3c098478fe17067a12b803504187e/library/net/http/http/request_spec.rb#L24-L25
https://github.com/ruby/spec/blob/8d26c0c202d3c098478fe17067a12b803504187e/library/net/http/http/request_spec.rb#L71-L72
Those tests fail because the preceding HEAD tests, e.g. https://github.com/ruby/spec/blob/8d26c0c202d3c098478fe17067a12b803504187e/library/net/http/http/request_spec.rb#L21-L22. leave the socket in a bad state, which causes the POST test to fail.
I think there's likely a bug in net/http when processing HEAD requests with persistent connections. I think it's not fully draining the socket, but I need to investigate more. Once that's resolved, then I think I can remove the type check for http vs https. I also wanted to test on Windows to see if https://github.com/ruby/ruby/pull/1089#issuecomment-159878003 is actually still an issue.
Updated by ioquatix (Samuel Williams) about 2 years ago
@joshc (Josh C) thanks for your work investigating this issue and please let me know once you are ready for another code review.