Bug #18679

closed

Encoding::UndefinedConversionError: "\xE2" from ASCII-8BIT to UTF-8

Added by taf2 (Todd Fisher) about 2 years ago. Updated about 2 years ago.

Status: Rejected
Assignee: -
Target version: -
[ruby-core:108176]

Description

We hit this issue only when running Ruby on ARM on Amazon Linux. In some cases, when we puts a string, we receive the error message above. When we run the same data through puts on Intel, we do not receive this error. I am not sure whether this is a Ruby issue or perhaps an iconv issue. What would be the best way to capture more data to help from here?

Updated by taf2 (Todd Fisher) about 2 years ago

I found some additional insight. On Intel we can do

puts File.read("this-file-contains-utf8") # and no crash

On ARM, in some cases, when we do

puts File.read("this-file-contains-utf8") # it crashes with an encoding error ...

Adding encoding: 'UTF-8' to the File.read call does resolve this. Still, in some cases we have found that if we receive bytes, say from an HTTP request, and puts them, it crashes on ARM but not on Intel.
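The two cases described above (reading a file with an explicit encoding, and handling raw bytes such as an HTTP body) can be sketched as follows. This is only an illustration: the temporary file and the `body` string are stand-ins invented here, not the reporter's actual data, and it assumes the bytes really are UTF-8.

```ruby
require "tempfile"

# Write known UTF-8 bytes to a scratch file (stand-in for the reporter's file).
text = nil
Tempfile.create("utf8-sample") do |f|
  f.binmode
  f.write("caf\xC3\xA9\n".b)
  f.flush
  # An explicit encoding makes the result independent of the locale-derived
  # Encoding.default_external.
  text = File.read(f.path, encoding: "UTF-8")
end

# Stand-in for bytes received from an HTTP response: tagged ASCII-8BIT (binary).
body = "caf\xC3\xA9".b

# Relabel the bytes as UTF-8 (no conversion happens), then replace any
# invalid sequences so later output cannot fail on them.
labeled = body.dup.force_encoding(Encoding::UTF_8)
labeled = labeled.scrub("?") unless labeled.valid_encoding?

puts text
puts labeled
```

Relabeling with `force_encoding` is only appropriate when the sender is known to produce UTF-8; otherwise the declared charset of the response should decide the label.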

Updated by duerst (Martin Dürst) about 2 years ago

First, if the error says Encoding::UndefinedConversionError, then I think it's not related to iconv, because iconv only gets used when you explicitly say so. Ruby has its own internal character conversion code.

Second, it's very clear that you get a conversion error when you try to convert "\xE2" from ASCII-8BIT to UTF-8. In ASCII-8BIT, "\xE2" is just a binary byte, without any character defined on it. There's no way to convert that to a character in UTF-8.

The "\xE2" byte may be the start of a UTF-8 byte sequence, somewhere between U+2000 (E2 80 80) and U+2FFF (E2 BF BF). But in that case, there would be no need to convert, only a need to label the encoding correctly. Of course, the "\xE2" byte may also be something else.
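The difference between converting and relabeling can be seen in a small sketch. The sample bytes (E2 82 AC, the UTF-8 encoding of U+20AC) are chosen here purely for illustration.

```ruby
# Three bytes that form one UTF-8 character, but tagged as ASCII-8BIT (binary).
bytes = "\xE2\x82\xAC".b

# Converting fails: a binary byte >= 0x80 has no character assigned to it.
conversion_error = nil
begin
  bytes.encode("UTF-8")
rescue Encoding::UndefinedConversionError => e
  conversion_error = e
end

# Relabeling succeeds: the bytes are untouched, only the encoding tag changes.
relabeled = bytes.dup.force_encoding("UTF-8")

puts conversion_error.class
puts relabeled.valid_encoding?
puts relabeled.length
```

If the bytes were not actually UTF-8, `valid_encoding?` would return false after relabeling, which is a cheap way to check the guess.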

Updated by byroot (Jean Boussier) about 2 years ago

You might want to look at whether Encoding.default_internal and Encoding.default_external match on your two platforms.
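One way to compare the two platforms is to run the same small report on each machine and diff the output. This is a sketch; the `report` hash and its keys are made up for this example, while the `Encoding` calls are standard Ruby.

```ruby
# Collect the encoding defaults and the locale variables they are derived from.
report = {
  "default_external" => Encoding.default_external,
  "default_internal" => Encoding.default_internal,
  "locale encoding"  => Encoding.find("locale"),
  "LANG"             => ENV["LANG"],
  "LC_ALL"           => ENV["LC_ALL"],
  "LC_CTYPE"         => ENV["LC_CTYPE"]
}

# Print one line per setting so the output is easy to diff between machines.
report.each { |name, value| puts "#{name}: #{value.inspect}" }
```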

Updated by Eregon (Benoit Daloze) about 2 years ago

My bet would be the locale is not set properly on the arm machine.
locale probably shows C or POSIX and many things don't work with that.
You probably need export LANG=en_US.UTF-8 or so.

I think CRuby should warn in that case. TruffleRuby already does.
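Until such a warning exists in CRuby, an application could perform a similar check itself at startup. The helper below is hypothetical (its name and message are invented for this sketch) and simply treats any non-UTF-8 external encoding as suspicious, which may be too strict for some deployments.

```ruby
# Hypothetical helper: returns a warning string when the external encoding
# is not UTF-8 (as happens with LANG=C or POSIX), or nil when all is well.
def locale_encoding_warning(external = Encoding.default_external)
  return nil if external == Encoding::UTF_8

  "default_external is #{external.name}, not UTF-8; " \
    "check LANG/LC_ALL (for example: export LANG=en_US.UTF-8)"
end

message = locale_encoding_warning
warn(message) if message
```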

Updated by taf2 (Todd Fisher) about 2 years ago

@byroot (Jean Boussier) thank you! That was it. On Intel:

Encoding.default_internal
=> #<Encoding:UTF-8>

On ARM:

Encoding.default_internal
=> nil

Updated by byroot (Jean Boussier) about 2 years ago

@taf2 (Todd Fisher), in that case it's indeed a $LANG problem.

Updated by duerst (Martin Dürst) about 2 years ago

  • Status changed from Open to Rejected

It seems clear that this isn't a Ruby bug. So I'm closing this issue. But please feel free to continue discussing the solution here if that helps.
