Bug #19468

closed

Ruby 3.2: net/http sets UTF-8 encoding for binary responses

Added by romuloceccon (Rômulo Ceccon) over 1 year ago. Updated over 1 year ago.

Status: Closed
Assignee: -
Target version: -
ruby -v: ruby 3.2.1 (2023-02-22 revision 65ab2c1ef2) [x86_64-linux]
[ruby-core:112630]

Description

On Ruby 3.2, net/http returns binary response bodies tagged as UTF-8 instead of ASCII-8BIT when the host is reached over SSL (non-SSL connections are not affected):

# req.rb
require 'openssl'
require 'net/http'
puts "openssl ext: #{OpenSSL::VERSION}"
puts "openssl lib: #{OpenSSL::OPENSSL_VERSION}"
puts "net-protocol: #{Net::Protocol::VERSION}"
puts "net-http: #{Net::HTTP::VERSION}"
puts Net::HTTP.get(URI(ARGV.first)).encoding

Ruby 3.1 (with updated net-protocol and net-http libs):

$ ruby -v
ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [x86_64-linux]
$ ruby req.rb https://www.gnu.org/software/gzip/manual/gzip.pdf
openssl ext: 3.0.0
openssl lib: OpenSSL 1.1.1n  15 Mar 2022
net-protocol: 0.2.1
net-http: 0.3.2
ASCII-8BIT     # <== CORRECT

Ruby 3.2 (latest git revision):

$ ruby -v
ruby 3.2.1 (2023-02-22 revision 65ab2c1ef2) [x86_64-linux]
$ ruby req.rb https://www.gnu.org/software/gzip/manual/gzip.pdf
openssl ext: 3.1.0
openssl lib: OpenSSL 1.1.1n  15 Mar 2022
net-protocol: 0.2.1
net-http: 0.3.2
UTF-8          # <== WRONG

I've tracked the problem down to the SSL socket call at https://github.com/ruby/ruby/blob/9557c8edf2dcf18fdece066c596a71696b2f2b30/lib/net/protocol.rb#L218.

The string returned has its encoding set to ASCII-8BIT, but #ascii_only? always reports true, even when the string contains non-ASCII bytes. This seems to be a bug, and is the probable cause of the change in behavior in net/http. On Ruby 3.1, concatenating the result of reading the SSL socket to a UTF-8 string produces an ASCII-8BIT string. On Ruby 3.2, the same concatenation produces a UTF-8 string.
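The link between #ascii_only? and the resulting encoding follows from how Ruby appends strings of different encodings: when the receiver is ASCII-only, the result keeps the receiver's encoding if the appended string is also ASCII-only, and takes the appended string's encoding otherwise. A minimal sketch with plain strings (no socket involved; `concat_encoding` is a hypothetical helper name used only for illustration, and the default UTF-8 source encoding is assumed):

```ruby
# Returns the encoding that results from appending +chunk+ to a copy of +prefix+.
# Hypothetical helper for illustration; not part of net/http.
def concat_encoding(prefix, chunk)
  (prefix.dup << chunk).encoding
end

utf8_prefix  = +""                  # empty UTF-8 string
ascii_chunk  = "HTTP/1.1 200 OK".b  # ASCII-8BIT, ASCII-only bytes
binary_chunk = "\xFF\xD8\xFF".b     # ASCII-8BIT with non-ASCII bytes

puts concat_encoding(utf8_prefix, ascii_chunk)   # receiver's encoding is kept
puts concat_encoding(utf8_prefix, binary_chunk)  # chunk's encoding wins
```

If a chunk with non-ASCII bytes is wrongly treated as ASCII-only, it would fall into the first case, which matches the UTF-8 result seen on Ruby 3.2.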

Here's a program demonstrating the behavior of the SSL socket:

# ssltest.rb
require 'openssl'
require 'uri'

url = URI(ARGV.first)
path = url.path
path += '?' + url.query if url.query

req = "GET #{path} HTTP/1.1\r\nHost: #{url.hostname}\r\nAccept: */*\r\n\r\n"

sock = OpenSSL::SSL::SSLSocket.open(url.hostname, url.port || URI::HTTPS::DEFAULT_PORT)
sock.connect
sock.write(req)

sleep(1)

loop do
  sleep(0.1)
  b = ''.b
  r = sock.read_nonblock(1024 * 16, b, exception: false)
  break unless String === r
  p [r.bytesize, r.encoding.to_s, r.ascii_only?]
end

Ruby 3.1:

$ ruby ssltest.rb https://www.gnu.org/software/gzip/manual/gzip.pdf
[475, "ASCII-8BIT", true]
[16384, "ASCII-8BIT", false] # <== always false (except HTTP header): CORRECT
[16384, "ASCII-8BIT", false]
...
[13927, "ASCII-8BIT", false]

Ruby 3.2:

$ ruby ssltest.rb https://www.gnu.org/software/gzip/manual/gzip.pdf
[475, "ASCII-8BIT", true]
[16384, "ASCII-8BIT", true] # <== always true: WRONG
[16384, "ASCII-8BIT", true]
...
[13927, "ASCII-8BIT", true]
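
One way to check that the flag, rather than the data, is wrong is to compare #ascii_only? against a direct scan of the bytes. A minimal sketch; `ascii_only_by_scan?` and `ascii_flag_consistent?` are hypothetical helper names, and adding such a check to ssltest.rb is a suggestion rather than part of the original test:

```ruby
# True when every byte of +str+ is below 0x80, determined by scanning the bytes.
def ascii_only_by_scan?(str)
  str.each_byte.all? { |byte| byte < 0x80 }
end

# True when String#ascii_only? agrees with the byte scan.
def ascii_flag_consistent?(str)
  str.ascii_only? == ascii_only_by_scan?(str)
end

p ascii_only_by_scan?("GET / HTTP/1.1".b)
p ascii_only_by_scan?("\xFF\xD8".b)
p ascii_flag_consistent?("\xFF\xD8".b)
```

Inside the loop of ssltest.rb, `ascii_flag_consistent?(r)` would be expected to return false for the binary chunks on the affected Ruby 3.2 build, assuming the bytes themselves arrive intact.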