Bug #12431
Strange behavior of String#encode('UTF-8', 'UTF-8', ...) when the encoding of the source string is not UTF-8
Status: Closed
Description
When the dst_encoding and src_encoding arguments of String#encode are the same, it appears to ignore the given source encoding and instead operate on the string's actual encoding. Examples:
"abcdÁ".force_encoding('ASCII').encode('UTF-8', 'UTF-8', invalid: :replace, undef: :replace)
=> "abcd??"
"abcdÁ".force_encoding('ASCII').encode('UTF-8', 'UTF-8', invalid: :replace, undef: :replace, replace: '�')
Encoding::CompatibilityError: incompatible character encodings: US-ASCII and UTF-8
"abcdÁ\xff".encode('ASCII', 'ASCII', invalid: :replace, undef: :replace).force_encoding('UTF-8')
=> "abcdÁ�"
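For comparison, here are the results the report implies should come out. The approach is to tag a copy of the string with the source encoding passed to encode and then scrub it in that encoding with the core String#scrub method. This is a minimal sketch that assumes "scrub in the source encoding" semantics. The variable names are illustrative only.

```ruby
# Assumption: the expected semantics are "re-tag with the given source encoding,
# then replace invalid bytes in that encoding".
s1 = "abcdÁ".dup.force_encoding('ASCII')

# Example 1: source encoding UTF-8; "abcdÁ" is valid UTF-8, so nothing is replaced.
ex1 = s1.dup.force_encoding('UTF-8').scrub
p ex1 # => "abcdÁ"

# Example 2: explicit U+FFFD replacement; the string and the replacement now share
# the UTF-8 encoding, so no Encoding::CompatibilityError can arise.
ex2 = s1.dup.force_encoding('UTF-8').scrub("\uFFFD")
p ex2 # => "abcdÁ"

# Example 3: source encoding US-ASCII; every byte >= 0x80 is invalid and becomes "?".
ex3 = "abcdÁ\xff".dup.force_encoding('ASCII').scrub.force_encoding('UTF-8')
p ex3 # => "abcd???"
```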
Also, without the replacement options, no exception is raised, even though one should be:
"\xff".force_encoding('ASCII').encode('UTF-8', 'UTF-8')
=> "\xFF"
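Ruby's String#encode documentation describes a conversion from an encoding to the same encoding as a no-op. The string is copied unchanged and no exception is raised, even if it contains invalid bytes. A caller who wants the error described here therefore has to check validity in the source encoding explicitly. The sketch below does that. strict_same_encoding is a hypothetical helper name, not a core method.

```ruby
# Hypothetical helper (not part of Ruby core): re-tag a copy of +str+ with +enc+
# and raise if it contains byte sequences that are invalid in that encoding.
def strict_same_encoding(str, enc)
  s = str.dup.force_encoding(enc)
  unless s.valid_encoding?
    raise Encoding::InvalidByteSequenceError,
          "invalid byte sequence in #{s.encoding}"
  end
  s
end

p strict_same_encoding("abc".dup.force_encoding('ASCII'), 'UTF-8') # => "abc"

begin
  strict_same_encoding("\xff".dup.force_encoding('ASCII'), 'UTF-8')
rescue Encoding::InvalidByteSequenceError => e
  puts e.message # prints "invalid byte sequence in UTF-8"
end
```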
I looked briefly at the code, and I think the problem might be in the block in transcode.c where the string is passed to rb_str_scrub without any other encoding information.
What I would expect is for s.dup.force_encoding('X').encode('Y', opts) to behave identically to s.encode('Y', 'X', opts), but that is clearly not the case.
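The expectation above can be written as a small check. It runs both forms on the same input and compares the results, including their encodings. This is a sketch, and same_result? is a hypothetical name. On a Ruby that includes the fix, the cases from this report should pass. On an affected version, they should fail or raise.

```ruby
# Hypothetical checker: form A re-tags the receiver and then encodes;
# form B passes the source encoding to String#encode explicitly.
def same_result?(s, src, dst, **opts)
  a = s.dup.force_encoding(src).encode(dst, **opts)
  b = s.encode(dst, src, **opts)
  a == b && a.encoding == b.encoding
end

opts = { invalid: :replace, undef: :replace }
p same_result?("abcdÁ".dup.force_encoding('ASCII'), 'UTF-8', 'UTF-8', **opts)
p same_result?("abcdÁ".dup.force_encoding('ASCII'), 'UTF-8', 'UTF-8',
               **opts, replace: "\uFFFD")
p same_result?("abcdÁ\xff", 'ASCII', 'ASCII', **opts)
```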
Verified on Ruby 2.1.5, 2.3.0, and 2.3.1.
Updated by nobu (Nobuyoshi Nakada) over 8 years ago
- Status changed from Open to Closed
Applied in changeset r55181.
transcode.c: scrub in the given encoding
- transcode.c (str_transcode0): scrub in the given encoding when
the source encoding is given, not in the encoding of the
receiver. [ruby-core:75732] [Bug #12431]
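The changeset message says only that str_transcode0 now scrubs in the given source encoding rather than in the receiver's encoding. The sketch below is a rough Ruby-level model of that same-encoding path. It is simplified from the commit description and is not a transcription of transcode.c. encode_same is a hypothetical helper. When invalid: :replace is requested, it scrubs a copy tagged with the given encoding. Otherwise it returns the re-tagged copy unchanged, which matches the documented no-op behavior.

```ruby
# Hypothetical model of String#encode(enc, enc, **opts) after r55181:
# scrub in the *given* encoding, not in the receiver's encoding.
def encode_same(str, enc, invalid: nil, replace: nil, **_ignored)
  copy = str.dup.force_encoding(enc)
  return copy unless invalid == :replace # no-op path: copied unchanged, no exception
  replace ? copy.scrub(replace) : copy.scrub
end

s = "abcdÁ".dup.force_encoding('ASCII')
p encode_same(s, 'UTF-8', invalid: :replace, undef: :replace) # => "abcdÁ"
p encode_same("abcdÁ\xff", 'ASCII', invalid: :replace)         # => "abcd???"
```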
Updated by usa (Usaku NAKAMURA) over 8 years ago
- Backport changed from 2.1: UNKNOWN, 2.2: UNKNOWN, 2.3: UNKNOWN to 2.1: WONTFIX, 2.2: REQUIRED, 2.3: REQUIRED
Updated by nagachika (Tomoyuki Chikanaga) about 8 years ago
- Backport changed from 2.1: WONTFIX, 2.2: REQUIRED, 2.3: REQUIRED to 2.1: WONTFIX, 2.2: REQUIRED, 2.3: DONE
ruby_2_3 r55905 merged revision(s) 55181.
Updated by usa (Usaku NAKAMURA) about 8 years ago
- Backport changed from 2.1: WONTFIX, 2.2: REQUIRED, 2.3: DONE to 2.1: WONTFIX, 2.2: DONE, 2.3: DONE
ruby_2_2 r55936 merged revision(s) 55181.
Updated by nobu (Nobuyoshi Nakada) about 7 years ago
- Related to Bug #13874: String#valid_encoding? has side effects added
Updated by duerst (Martin Dürst) almost 6 years ago
- Related to Bug #8123: Transcoding exception when using replace along with universal_newline added