Bug #18995
IO#set_encoding sometimes sets an IO's internal encoding to the default external encoding
Description
This script demonstrates the behavior:
```ruby
def show(io)
  printf(
    "external encoding: %-25p internal encoding: %-25p\n",
    io.external_encoding,
    io.internal_encoding
  )
end

Encoding.default_external = 'iso-8859-1'
Encoding.default_internal = 'iso-8859-2'

File.open('/dev/null') do |f|
  f.set_encoding('utf-8', nil)
  show(f) # f.internal_encoding is iso-8859-2, as expected

  f.set_encoding('utf-8', 'invalid')
  show(f) # f.internal_encoding is now iso-8859-1!

  Encoding.default_external = 'iso-8859-3'
  Encoding.default_internal = 'iso-8859-4'
  show(f) # f.internal_encoding is now iso-8859-3!
end
```
In the first case, the IO's internal encoding is set to the current value of Encoding.default_internal. In the second case, it is set to Encoding.default_external instead. The third case is more interesting: it shows that the IO's internal encoding actually follows the current value of Encoding.default_external. It was not simply copied when #set_encoding was called; it changes whenever Encoding.default_external changes.
What should the correct behavior be?
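While that question is open, one possible way for calling code to avoid depending on either behavior is to resolve encoding names itself and pass concrete Encoding objects to IO#set_encoding. The sketch below is only an illustration under that assumption, not something proposed in this report: `set_encoding_strict` is a hypothetical helper, and File::NULL stands in for '/dev/null' as a portable null device path.

```ruby
# Hypothetical helper, not part of Ruby's API: resolve encoding names up
# front so IO#set_encoding only ever receives concrete Encoding objects.
def set_encoding_strict(io, ext, int = nil)
  ext_enc = Encoding.find(ext) # raises ArgumentError for an unknown name
  # When no internal encoding is given, take a snapshot of the current
  # Encoding.default_internal (which may be nil).
  int_enc = int.nil? ? Encoding.default_internal : Encoding.find(int)
  io.set_encoding(ext_enc, int_enc)
end

saved = [Encoding.default_external, Encoding.default_internal]
begin
  Encoding.default_external = 'iso-8859-1'
  Encoding.default_internal = 'iso-8859-2'
  File.open(File::NULL) do |f|
    set_encoding_strict(f, 'utf-8')
    begin
      set_encoding_strict(f, 'utf-8', 'invalid')
    rescue ArgumentError => e
      # Encoding.find raised before IO#set_encoding was called,
      # so the IO's encodings are left unchanged.
      puts "rejected: #{e.message}"
    end
    Encoding.default_external = 'iso-8859-3'
    Encoding.default_internal = 'iso-8859-4'
    p f.internal_encoding # still ISO-8859-2: the value was passed explicitly
  end
ensure
  # Restore the process-wide defaults changed above.
  Encoding.default_external, Encoding.default_internal = saved
end
```

Because the helper passes Encoding objects rather than names, an unknown name fails with ArgumentError instead of being accepted, and the internal encoding no longer moves when the process-wide defaults change later. It does not settle which behavior IO#set_encoding itself should have.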
Updated by javanthropus (Jeremy Bopp) about 2 years ago
Can anyone confirm that this is a bug and not a misunderstanding? Fixing it looks like it will require a fair bit of refactoring, and there don't yet appear to be any tests that check IO#internal_encoding and IO#external_encoding for the various argument combinations accepted by IO#set_encoding. I found tests for various ways of opening files and pipes with encoding arguments, and those do check the resulting internal and external encodings of the IO object, but none of them exercise these corner cases.
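As a rough sketch of the kind of coverage described above (not taken from Ruby's test suite), a small probe can apply IO#set_encoding with a given argument list under fixed default encodings and record the resulting external and internal encodings. `encodings_after` is a hypothetical name, and the argument table is illustrative rather than exhaustive. The 'invalid' row simply reports whatever the running Ruby does; on the versions described in this issue, that is the default external encoding.

```ruby
# Hypothetical probe, not taken from Ruby's test suite: apply
# IO#set_encoding(*args) to a read-only IO under fixed default encodings and
# return the resulting [external_encoding, internal_encoding] pair.
def encodings_after(*args, default_external: 'iso-8859-1', default_internal: 'iso-8859-2')
  saved = [Encoding.default_external, Encoding.default_internal]
  Encoding.default_external = default_external
  Encoding.default_internal = default_internal
  File.open(File::NULL) do |f|
    f.set_encoding(*args)
    # Read both values while the chosen defaults are still in effect.
    [f.external_encoding, f.internal_encoding]
  end
ensure
  # Always restore the process-wide defaults.
  Encoding.default_external, Encoding.default_internal = saved
end

# A few argument combinations for IO#set_encoding.
[
  ['utf-8'],
  ['utf-8', nil],
  ['utf-8', 'iso-8859-5'],
  ['utf-8', 'utf-8'],
  ['utf-8', 'invalid'], # the corner case from this report
].each do |args|
  begin
    ext, int = encodings_after(*args)
    printf("%-26s external: %-25p internal: %p\n", args.inspect, ext, int)
  rescue ArgumentError => e
    puts "#{args.inspect} raised #{e.class}: #{e.message}"
  end
end
```

In a real test, each row would become an assertion once the intended result for that argument combination is agreed on.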
Updated by javanthropus (Jeremy Bopp) 6 months ago · Edited
@jeremyevans0 (Jeremy Evans), did you ever take a look at this issue when I referenced it in #18899? The behavior is unchanged in Ruby 3.3.
The script above prints the following:
```
external encoding: #<Encoding:UTF-8> internal encoding: #<Encoding:ISO-8859-2>
external encoding: #<Encoding:UTF-8> internal encoding: #<Encoding:ISO-8859-1>
external encoding: #<Encoding:UTF-8> internal encoding: #<Encoding:ISO-8859-3>
```
I expected it to print this:
```
external encoding: #<Encoding:UTF-8> internal encoding: #<Encoding:ISO-8859-2>
external encoding: #<Encoding:UTF-8> internal encoding: #<Encoding:ISO-8859-2>
external encoding: #<Encoding:UTF-8> internal encoding: #<Encoding:ISO-8859-4>
```