Bug #16497 (open)
StringIO#internal_encoding is broken (more severely in 2.7)
Description
To the best of my understanding from the Encoding docs, the following is true:
- external encoding (explicitly specified or taken from Encoding.default_external) specifies how the IO understands input and stores it internally;
- internal encoding (explicitly specified or taken from Encoding.default_internal) specifies how the IO converts what it reads.
Demonstration with regular files:
# prepare data
File.write('test.txt', 'Україна'.encode('KOI8-U'), encoding: 'KOI8-U') #=> 7
def test(io)
  str = io.read
  [io.external_encoding, io.internal_encoding, str, str.encoding]
end
# read it:
test(File.open('test.txt', 'r:KOI8-U'))
# => [#<Encoding:KOI8-U>, nil, "\xF5\xCB\xD2\xC1\xA7\xCE\xC1", #<Encoding:KOI8-U>]
# We can specify internal encoding when opening the file:
test(File.open('test.txt', 'r:KOI8-U:UTF-8'))
# => [#<Encoding:KOI8-U>, #<Encoding:UTF-8>, "Україна", #<Encoding:UTF-8>]
# ...or when it is already opened
test(File.open('test.txt').tap { |f| f.set_encoding('KOI8-U', 'UTF-8') })
# => [#<Encoding:KOI8-U>, #<Encoding:UTF-8>, "Україна", #<Encoding:UTF-8>]
# ...or with Encoding.default_internal
Encoding.default_internal = 'UTF-8'
test(File.open('test.txt', 'r:KOI8-U'))
# => [#<Encoding:KOI8-U>, #<Encoding:UTF-8>, "Україна", #<Encoding:UTF-8>]
But with StringIO, internal encoding can't be set in Ruby 2.6:
require 'stringio'
Encoding.default_internal = nil
str = 'Україна'.encode('KOI8-U')
# Simplest form:
test(StringIO.new(str))
# => [#<Encoding:KOI8-U>, nil, "\xF5\xCB\xD2\xC1\xA7\xCE\xC1", #<Encoding:KOI8-U>]
# Try to set via mode
test(StringIO.new(str, 'r:KOI8-U:UTF-8'))
# => [#<Encoding:KOI8-U>, nil, "\xF5\xCB\xD2\xC1\xA7\xCE\xC1", #<Encoding:KOI8-U>]
# Try to set via set_encoding:
test(StringIO.new(str, 'r:KOI8-U:UTF-8').tap { |f| f.set_encoding('KOI8-U', 'UTF-8') })
# => [#<Encoding:KOI8-U>, nil, "\xF5\xCB\xD2\xC1\xA7\xCE\xC1", #<Encoding:KOI8-U>]
# Try to set via Encoding.default_internal:
Encoding.default_internal = 'UTF-8'
test(StringIO.new(str))
# => [#<Encoding:KOI8-U>, nil, "\xF5\xCB\xD2\xC1\xA7\xCE\xC1", #<Encoding:KOI8-U>]
So, in 2.6, any attempt to set StringIO's internal encoding is simply ignored.
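Since StringIO does not transcode on read, one way to get the result that File produces is to transcode by hand after reading. The following is only a sketch: read_transcoded is a hypothetical helper, not part of StringIO, and it assumes the caller knows the source encoding of the bytes.

```ruby
require 'stringio'

# Hypothetical helper (not part of StringIO): read all remaining bytes,
# label them with the known source encoding, then transcode to the target.
# Labelling explicitly avoids relying on whatever encoding StringIO reports.
def read_transcoded(io, from, to)
  io.read.force_encoding(from).encode(to)
end

koi8 = 'Україна'.encode('KOI8-U')
utf8 = read_transcoded(StringIO.new(koi8), 'KOI8-U', 'UTF-8')

puts utf8.encoding
puts utf8 == 'Україна' # prints "true"
```

Passing the source encoding explicitly means the helper does not depend on StringIO#external_encoding being reported correctly.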
In 2.7, though, matters became much worse:
require 'stringio'
Encoding.default_internal = nil
str = 'Україна'.encode('KOI8-U')
# Behaves same as 2.6
test(StringIO.new(str))
# => [#<Encoding:KOI8-U>, nil, "\xF5\xCB\xD2\xC1\xA7\xCE\xC1", #<Encoding:KOI8-U>]
# Try to set via mode: WEIRD behavior starts
test(StringIO.new(str, 'r:KOI8-U:UTF-8'))
# => [#<Encoding:UTF-8>, nil, "\xF5\xCB\xD2\xC1\xA7\xCE\xC1", #<Encoding:UTF-8>]
# Try to set via set_encoding: still just ignored
test(StringIO.new(str, 'r:KOI8-U:UTF-8').tap { |f| f.set_encoding('KOI8-U', 'UTF-8') })
# => [#<Encoding:KOI8-U>, nil, "\xF5\xCB\xD2\xC1\xA7\xCE\xC1", #<Encoding:KOI8-U>]
# Try to set via Encoding.default_internal: WEIRD behavior again
Encoding.default_internal = 'UTF-8'
test(StringIO.new(str))
# => [#<Encoding:UTF-8>, nil, "\xF5\xCB\xD2\xC1\xA7\xCE\xC1", #<Encoding:UTF-8>]
So 2.7 does not just ignore attempts to set the internal encoding: it erroneously uses the requested internal encoding as the external one, so strings are not transcoded, but their encoding label is forcibly changed.
I believe this is a severe bug (more severe than 2.6's "just ignoring").
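The difference between transcoding a string and merely changing its encoding label can be shown with plain String methods, independent of StringIO. This is a minimal illustration using the same sample word; the variable names are only for this example.

```ruby
koi8 = 'Україна'.encode('KOI8-U')

# Transcoding: the bytes are converted so the text stays the same.
transcoded = koi8.encode('UTF-8')

# Relabeling: the bytes stay the same, only the encoding tag changes.
relabeled = koi8.dup.force_encoding('UTF-8')

puts transcoded == 'Україна'        # prints "true"
puts relabeled.bytes == koi8.bytes  # prints "true"
puts relabeled.valid_encoding?      # prints "false": KOI8-U bytes are not valid UTF-8
```

The relabeled string corresponds to what the 2.7 output above shows: KOI8-U bytes carrying a UTF-8 tag.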
This Reddit thread shows how it breaks existing code:
- the author uses StringIO to work with ASCII-8BIT strings;
- the code runs in a Rails environment (which sets Encoding.default_internal to UTF-8 by default);
- under 2.7, StringIO#read returns ASCII-8BIT content in Strings saying their encoding is UTF-8.
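For code in that situation, a defensive pattern is to relabel whatever StringIO#read returns as binary, so the result does not depend on the Ruby version. This is a sketch only: read_binary is a hypothetical helper, the payload bytes are arbitrary, and setting Encoding.default_internal here merely mimics the Rails default.

```ruby
require 'stringio'

# Hypothetical helper (not part of StringIO): read everything and label it
# as binary, regardless of the encoding StringIO reports for the result.
def read_binary(io)
  io.read.force_encoding(Encoding::BINARY)
end

payload = "\xFF\xD8\xFF\xE0".b # arbitrary binary bytes, tagged ASCII-8BIT

previous = Encoding.default_internal
begin
  Encoding.default_internal = 'UTF-8' # mimic the Rails default
  data = read_binary(StringIO.new(payload))
ensure
  Encoding.default_internal = previous # restore the global setting
end

puts data.encoding == Encoding::BINARY # prints "true"
puts data.bytes == payload.bytes       # prints "true"
```

force_encoding only changes the label, so the bytes read from the StringIO are preserved exactly.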