Bug #20412
Status: Open
UTF-8 String encoding behavior differs between 3.2, 3.3 and master
Description
When a String that contains only a \0 byte is mutated by an extension to an invalid UTF-8 sequence, calling .encode('UTF-8') does not consistently raise UndefinedConversionError across Ruby versions. When the string is longer than 1 byte, all versions I've tested correctly raise UndefinedConversionError.
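For reference, the baseline behavior can be shown in plain Ruby with no extension involved. This is only a minimal sketch: it assumes the strings in question are binary (ASCII-8BIT) strings, which the report does not state explicitly, and the helper name raises_undefined_conversion? is hypothetical.

```ruby
# Hypothetical helper (not part of the attached reproducer): returns true if
# encoding to UTF-8 raises Encoding::UndefinedConversionError, false otherwise.
def raises_undefined_conversion?(str)
  str.encode('UTF-8')
  false
rescue Encoding::UndefinedConversionError
  true
end

# Assumption: the affected strings are binary (ASCII-8BIT) strings.
nul  = "\0".b        # a single NUL byte, which converts cleanly
high = "\xFF".b      # a single non-ASCII byte
long = "\xFF\xFE".b  # two non-ASCII bytes

p raises_undefined_conversion?(nul)
p raises_undefined_conversion?(high)
p raises_undefined_conversion?(long)
```

Because these strings are created directly in Ruby rather than mutated in place by an extension, this shows only the expected behavior, not the inconsistency reported here.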
For Ruby 3.2, whether UndefinedConversionError is raised appears to depend on where the string was originally allocated.
For Ruby 3.3, UndefinedConversionError is never raised for the 1-byte string.
For master ad90fdd24c, UndefinedConversionError is always correctly raised.
I haven't been able to find an existing bug report for this, but it seems like there is a fix in master that should be backported to at least 3.2 and 3.3.
I have not tested 3.1.
The attached reproducer depends on rbnacl because it is minimized from a cryptographic project, and I wasn't able to reduce it further.
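The per-string check that produces the OK and FAIL lines in the output can be sketched roughly as follows. This is a guess at the shape of the check, inferred from the output format only; it is not the attached repro.rb, the name check_utf8 is hypothetical, and the sample string is built in plain Ruby instead of coming from rbnacl.

```ruby
# Hypothetical reconstruction of the check, inferred from the printed messages.
# Returns the "OK" message when encoding raises, the "FAIL" message otherwise.
def check_utf8(label, str)
  str.encode('UTF-8')
  "FAIL: #{label} is not valid UTF-8 and did not error during encoding to UTF-8"
rescue Encoding::UndefinedConversionError
  "OK: #{label} is not valid UTF-8"
end

p "RUBY: #{RUBY_VERSION}"

# Assumption: a binary string with one non-ASCII byte stands in for the
# extension-produced ciphertext used by the real reproducer.
p check_utf8('ciphertext_local', "\xFF".b)
```

Since the sample string is never mutated by an extension, this sketch is expected to print the OK line; reproducing the FAIL lines requires the attached reproducer.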
Expected Output
For all versions:
$ ruby repro.rb 1
"RUBY: [version]"
"OK: ciphertext_rbnacl is not valid UTF-8"
"OK: ciphertext_local is not valid UTF-8"
"OK: plaintext_rbnacl is not valid UTF-8"
"OK: plaintext_local is not valid UTF-8"
$ ruby repro.rb 2
"RUBY: [version]"
"OK: ciphertext_rbnacl is not valid UTF-8"
"OK: ciphertext_local is not valid UTF-8"
"OK: plaintext_rbnacl is not valid UTF-8"
"OK: plaintext_local is not valid UTF-8"
Actual Output
Ruby 3.2
$ ASDF_RUBY_VERSION=3.2.3 ruby -v; ASDF_RUBY_VERSION=3.2.3 ruby repro.rb 1
ruby 3.2.3 (2024-01-18 revision 52bb2ac0a6) [x86_64-linux]
"RUBY: 3.2.3"
"OK: ciphertext_rbnacl is not valid UTF-8"
"FAIL: ciphertext_local is not valid UTF-8 and did not error during encoding to UTF-8"
"OK: plaintext_rbnacl is not valid UTF-8"
"FAIL: plaintext_local is not valid UTF-8 and did not error during encoding to UTF-8"
$ ASDF_RUBY_VERSION=3.2.3 ruby -v; ASDF_RUBY_VERSION=3.2.3 ruby repro.rb 2
ruby 3.2.3 (2024-01-18 revision 52bb2ac0a6) [x86_64-linux]
"RUBY: 3.2.3"
"OK: ciphertext_rbnacl is not valid UTF-8"
"OK: ciphertext_local is not valid UTF-8"
"OK: plaintext_rbnacl is not valid UTF-8"
"OK: plaintext_local is not valid UTF-8"
Ruby 3.3
$ ASDF_RUBY_VERSION=3.3.0 ruby -v; ASDF_RUBY_VERSION=3.3.0 ruby repro.rb 1
ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [x86_64-linux]
"RUBY: 3.3.0"
"FAIL: ciphertext_rbnacl is not valid UTF-8 and did not error during encoding to UTF-8"
"FAIL: ciphertext_local is not valid UTF-8 and did not error during encoding to UTF-8"
"FAIL: plaintext_rbnacl is not valid UTF-8 and did not error during encoding to UTF-8"
"FAIL: plaintext_local is not valid UTF-8 and did not error during encoding to UTF-8"
$ ASDF_RUBY_VERSION=3.3.0 ruby -v; ASDF_RUBY_VERSION=3.3.0 ruby repro.rb 2
ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [x86_64-linux]
"RUBY: 3.3.0"
"OK: ciphertext_rbnacl is not valid UTF-8"
"OK: ciphertext_local is not valid UTF-8"
"OK: plaintext_rbnacl is not valid UTF-8"
"OK: plaintext_local is not valid UTF-8"
Ruby Master
$ ASDF_RUBY_VERSION=ruby-dev ruby -v; ASDF_RUBY_VERSION=ruby-dev ruby repro.rb 1
ruby 3.4.0dev (2024-04-06T17:33:16Z master ad90fdd24c) [x86_64-linux]
"RUBY: 3.4.0"
"OK: ciphertext_rbnacl is not valid UTF-8"
"OK: ciphertext_local is not valid UTF-8"
"OK: plaintext_rbnacl is not valid UTF-8"
"OK: plaintext_local is not valid UTF-8"
$ ASDF_RUBY_VERSION=ruby-dev ruby -v; ASDF_RUBY_VERSION=ruby-dev ruby repro.rb 2
ruby 3.4.0dev (2024-04-06T17:33:16Z master ad90fdd24c) [x86_64-linux]
"RUBY: 3.4.0"
"OK: ciphertext_rbnacl is not valid UTF-8"
"OK: ciphertext_local is not valid UTF-8"
"OK: plaintext_rbnacl is not valid UTF-8"
"OK: plaintext_local is not valid UTF-8"
Files