Bug #1098

closed

Unclear encoding error: #<Encoding::UndefinedConversionError: "\xE2\x96\x80" from UTF-8 to ISO-8859-1 in conversion from CP850 to ISO-8859-1>

Added by tomel (Tom Link) about 15 years ago. Updated about 13 years ago.

Status:
Rejected
Target version:
-
ruby -v:
ruby 1.9.1p0 (2009-01-30 revision 21907) [i386-cygwin]
Backport:
[ruby-core:21802]

Description

=begin
The test script below exits with the error: #<Encoding::UndefinedConversionError: "\xE2\x96\x80" from UTF-8 to ISO-8859-1 in conversion from CP850 to ISO-8859-1>

This is weird/unclear/incomprehensible because I fail to see what makes ruby think I'm working with utf8. If this isn't a bug, I would kindly ask to make the error message slightly more intelligible by adding information about what is set to UTF-8, what to ISO-8859-1 etc. The way it is now this message is slightly esoteric.

Test script:

# Encoding: CP850

p Encoding.default_internal, Encoding.default_external # => nil, CP850
s = "weiß"
p s, s.encoding
p s.encode('ISO-8859-1')
=end

Actions #1

Updated by matz (Yukihiro Matsumoto) about 15 years ago

=begin
Hi,

In message "Re: [ruby-core:21802] [Bug #1098] Unclear encoding error: #<Encoding::UndefinedConversionError: "\xE2\x96\x80" from UTF-8 to ISO-8859-1 in conversion from CP850 to ISO-8859-1>"
on Tue, 3 Feb 2009 22:53:34 +0900, Tom Link writes:

|The test script below exits with the error: #<Encoding::UndefinedConversionError: "\xE2\x96\x80" from UTF-8 to ISO-8859-1 in conversion from CP850 to ISO-8859-1>

First, since we haven't implemented direct conversion path from CP850
to ISO-8859-1 (yet), Ruby converts strings via UTF-8, hence the
message. If you have suggestion for better description, we are open.

Second, I couldn't reproduce the problem from your test script. The
conversion process goes from CP850 to UTF-8, then from UTF-8 to
ISO-8859-1. The message says resulting UTF-8 text is "\xE2\x96\x80",
which does not have corresponding character in ISO-8859-1 at all.
We have no more clue to draw any conclusion. There are a lot of
possibilities, from a bug in your script, to a bug in Cygwin, of
course including a bug in the transcoding engine.

						matz.

=end
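
The two-step conversion described above can be observed from the exception itself. The following is a minimal sketch, assuming a Ruby version whose transcoder still routes CP850 to ISO-8859-1 through UTF-8; `describe_conversion_failure` is a hypothetical helper name, and the input byte 0xDF is an assumption chosen to trigger the error.

```ruby
# Hypothetical helper: attempts the conversion and, if a character has no
# counterpart in the target encoding, reports which step of the path failed.
def describe_conversion_failure(bytes, from, to)
  bytes.pack("C*").force_encoding(from).encode(to)
  nil # conversion succeeded, nothing to report
rescue Encoding::UndefinedConversionError => e
  {
    # The step that failed, which may be an intermediate hop rather than
    # the encodings the caller asked for.
    failing_step: [e.source_encoding_name, e.destination_encoding_name],
    # The offending character, expressed in the failing step's source encoding.
    failing_char: e.error_char,
    message: e.message
  }
end

info = describe_conversion_failure([0xDF], "CP850", "ISO-8859-1")
puts "failing step: #{info[:failing_step].join(' -> ')}"
puts format("failing char: U+%04X", info[:failing_char].ord)
```

The exception's source and destination names describe the failing hop, which is why UTF-8 shows up in the message even though the caller never mentioned it.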

Actions #2

Updated by naruse (Yui NARUSE) about 15 years ago

  • Category set to M17N
  • Status changed from Open to Rejected
  • Assignee set to naruse (Yui NARUSE)

=begin
You declared in magic comment as CP850, but your exact script encoding seems ISO-8859-1.

Your character: U+00DF (Latin Small Letter Sharp S) is,
\xDF in ISO-8859-1
\xE1 in CP850

"\xdf".encode("iso-8859-1","CP850")
Encoding::UndefinedConversionError: "\xE2\x96\x80" from UTF-8 to ISO-8859-1 in conversion from CP850 to ISO-8859-1
from (irb):24:in `encode'

=end
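
The byte mix-up described above can be checked directly. This is a small sketch, not part of the original report; `codepoint_for` and `bytes_for` are hypothetical helper names introduced for illustration.

```ruby
# Decode one byte under the given single-byte encoding and return the
# Unicode code point it stands for.
def codepoint_for(byte, encoding)
  [byte].pack("C").force_encoding(encoding).encode("UTF-8").ord
end

# Return the byte sequence that represents a code point in the given encoding.
def bytes_for(codepoint, encoding)
  [codepoint].pack("U").encode(encoding).bytes
end

# The same byte means two different characters depending on the declared encoding.
printf("0xDF read as ISO-8859-1: U+%04X\n", codepoint_for(0xDF, "ISO-8859-1"))
printf("0xDF read as CP850:      U+%04X\n", codepoint_for(0xDF, "CP850"))

# Sharp s (U+00DF) is stored as a different byte in each encoding.
p bytes_for(0x00DF, "ISO-8859-1")
p bytes_for(0x00DF, "CP850")
```

A file saved as ISO-8859-1 but declared as CP850 therefore hands Ruby the byte 0xDF, which CP850 reads as U+2580 rather than as sharp s.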

Actions #3

Updated by tomel (Tom Link) about 15 years ago

=begin

> First, since we haven't implemented direct conversion path from CP850
> to ISO-8859-1 (yet), Ruby converts strings via UTF-8, hence the
> message.  If you have suggestion for better description, we are open.

I'd suggest some duplication of information:

UndefinedConversionError: "..." from UTF-8 to ISO-8859-1 in indirect
conversion from CP850 to UTF-8 to ISO-8859-1

or "in indirect conversion from CP850 to ISO-8859-1 via UTF-8"

> Second, I couldn't reproduce the problem from your test script.

Well, the problem was that the input really wasn't CP850 but latin-1
and that setting LANG to xx_XX.ISO-8859-1 doesn't seem to make ruby
set the external encoding properly -- although I had assumed that
http://redmine.ruby-lang.org/issues/show/956 would make that possible.

=end
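
The wording suggested above ("in indirect conversion from CP850 to ISO-8859-1 via UTF-8") can be approximated in user code today. The sketch below relies on `Encoding::Converter.search_convpath`, which returns the list of steps the transcoder would take; `conversion_route` is a hypothetical helper, and its output format is only one possible phrasing, not what Ruby itself prints.

```ruby
# Hypothetical helper: describes the route the transcoder takes between two
# encodings, naming any intermediate encodings explicitly.
def conversion_route(from, to)
  # Each element of the path is a [source, destination] pair of Encoding objects.
  path = Encoding::Converter.search_convpath(from, to)
  hops = path.map { |_src, dst| dst.name }
  via = hops[0...-1] # every destination except the final one is intermediate

  if via.empty?
    "direct conversion from #{from} to #{to}"
  else
    "indirect conversion from #{from} to #{to} via #{via.join(', ')}"
  end
end

puts conversion_route("CP850", "ISO-8859-1")
puts conversion_route("UTF-8", "ISO-8859-1")
```

The endpoint names are echoed from the arguments rather than read back from the path, so the message uses whatever spelling the caller passed in.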

Actions #4

Updated by tomel (Tom Link) about 15 years ago

=begin

> You declared in magic comment as CP850, but your exact script encoding seems ISO-8859-1.

It wasn't there in the original script. But you're right.

=end

Actions #5

Updated by duerst (Martin Dürst) about 15 years ago

=begin
At 10:38 09/02/04, you wrote:

> Hi,
>
> In message "Re: [ruby-core:21802] [Bug #1098] Unclear encoding error:
> #<Encoding::UndefinedConversionError: "\xE2\x96\x80" from UTF-8 to
> ISO-8859-1 in conversion from CP850 to ISO-8859-1>"
> on Tue, 3 Feb 2009 22:53:34 +0900, Tom Link writes:
>
> |The test script below exits with the error:
> |#<Encoding::UndefinedConversionError: "\xE2\x96\x80" from UTF-8 to
> |ISO-8859-1 in conversion from CP850 to ISO-8859-1>
>
> First, since we haven't implemented direct conversion path from CP850
> to ISO-8859-1 (yet),

Frankly speaking, I don't think we ever will. It's simply unrealistic
to expect Ruby to have N*(N-1) data tables for N encodings. No
transcoding engine I know does that. We can always add direct
conversions between two non-UTF-8 encodings if it turns out to
be really necessary, but I don't see the reason in this case,
and there's definitely no sense to do it just for improving
error messages.

> Ruby converts strings via UTF-8, hence the
> message. If you have suggestion for better description, we are open.

Yes indeed. I think one step is to explain better what happened.

> Second, I couldn't reproduce the problem from your test script. The
> conversion process goes from CP850 to UTF-8, then from UTF-8 to
> ISO-8859-1. The message says resulting UTF-8 text is "\xE2\x96\x80",
> which does not have corresponding character in ISO-8859-1 at all.

Yes, this is character U+2580 (for a handy conversion script,
I use http://people.w3.org/rishida/scripts/uniview/conversion.php),
UPPER HALF BLOCK. It doesn't exist in ISO-8859-1, and therefore
the script produces the above error. It simply says that there
is no defined conversion between UTF-8 and ISO-8859-1 for that
character, which by extension means that there is no defined
conversion from CP850 to ISO-8859-1 for this character.

> We have no more clue to draw any conclusion. There are a lot of
> possibilities, from a bug in your script, to a bug in Cygwin, of
> course including a bug in the transcoding engine.

This part is wrong. The conclusion is very clear. The script,
Cygwin, and the transcoding engine all are okay (at least as
far as this issue is concerned).

Regards, Martin.

#-#-# Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-# http://www.sw.it.aoyama.ac.jp

=end
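
The byte-to-character lookup mentioned above does not need an external tool. The following is a small sketch with hypothetical helper names (`utf8_codepoints`, `representable?`): it decodes the bytes from the error message and checks which encodings can hold the resulting character.

```ruby
# Decode raw bytes as UTF-8 and return the code points they contain.
def utf8_codepoints(bytes)
  bytes.pack("C*").force_encoding("UTF-8").codepoints
end

# True if every character of the string has a counterpart in the target encoding.
def representable?(string, encoding)
  string.encode(encoding)
  true
rescue Encoding::UndefinedConversionError
  false
end

# The three bytes quoted in the error message form a single character.
codepoints = utf8_codepoints([0xE2, 0x96, 0x80])
puts codepoints.map { |cp| format("U+%04X", cp) }.join(" ")

block = codepoints.pack("U*")
puts "in CP850?      #{representable?(block, 'CP850')}"
puts "in ISO-8859-1? #{representable?(block, 'ISO-8859-1')}"
```

The character exists in CP850 but not in ISO-8859-1, so the first hop succeeds and only the second one can fail.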
