Bug #1098

closed

Unclear encoding error: #<Encoding::UndefinedConversionError: "\xE2\x96\x80" from UTF-8 to ISO-8859-1 in conversion from CP850 to ISO-8859-1>

Added by tomel (Tom Link) over 16 years ago. Updated about 14 years ago.

Status: Rejected
Assignee: naruse (Yui NARUSE)
Target version:
ruby -v: ruby 1.9.1p0 (2009-01-30 revision 21907) [i386-cygwin]
Backport:
[ruby-core:21802]

Description

=begin
The test script below exits with the error: #<Encoding::UndefinedConversionError: "\xE2\x96\x80" from UTF-8 to ISO-8859-1 in conversion from CP850 to ISO-8859-1>

This is confusing because I fail to see what makes Ruby think I'm working with UTF-8. If this isn't a bug, I would kindly ask that the error message be made more intelligible by stating which encoding is set to UTF-8, which to ISO-8859-1, and so on. As it stands, the message is slightly esoteric.

Test script:

# Encoding: CP850

p Encoding.default_internal, Encoding.default_external # => nil, CP850
s = "weiß"
p s, s.encoding
p s.encode('ISO-8859-1')
=end

Updated by matz (Yukihiro Matsumoto) over 16 years ago

=begin
Hi,

In message "Re: [ruby-core:21802] [Bug #1098] Unclear encoding error: #<Encoding::UndefinedConversionError: "\xE2\x96\x80" from UTF-8 to ISO-8859-1 in conversion from CP850 to ISO-8859-1>"
on Tue, 3 Feb 2009 22:53:34 +0900, Tom Link redmine@ruby-lang.org writes:

|The test script below exits with the error: #<Encoding::UndefinedConversionError: "\xE2\x96\x80" from UTF-8 to ISO-8859-1 in conversion from CP850 to ISO-8859-1>

First, since we haven't implemented a direct conversion path from CP850
to ISO-8859-1 (yet), Ruby converts strings via UTF-8, hence the
message. If you have a suggestion for a better description, we are open.

Second, I couldn't reproduce the problem from your test script. The
conversion process goes from CP850 to UTF-8, then from UTF-8 to
ISO-8859-1. The message says the resulting UTF-8 text is "\xE2\x96\x80",
which has no corresponding character in ISO-8859-1 at all.
We have no further clues from which to draw a conclusion. There are a lot of
possibilities, from a bug in your script, to a bug in Cygwin, and of
course a bug in the transcoding engine.

						matz.

=end
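
The two-step route described above can be replayed by hand. This is a minimal sketch, assuming a current Ruby where the CP850 transcoder is available; it starts from the single byte 0xDF (the byte quoted later in this thread) and runs each step separately, so it is visible that the first step succeeds and only the second one fails.

```ruby
# Build the single byte 0xDF and label it as CP850.
cp850 = [0xDF].pack("C").force_encoding("CP850")

# Step 1: CP850 -> UTF-8 succeeds, because CP850 0xDF maps to U+2580.
utf8 = cp850.encode("UTF-8")
p utf8.bytes.map { |b| "%02X" % b }  # => ["E2", "96", "80"]
p utf8.ord.to_s(16)                  # => "2580"

# Step 2: UTF-8 -> ISO-8859-1 fails, because U+2580 has no Latin-1 mapping.
begin
  utf8.encode("ISO-8859-1")
rescue Encoding::UndefinedConversionError => e
  # The exception reports the failing step, not the overall conversion.
  p [e.source_encoding_name, e.destination_encoding_name]
end
```

The bytes E2 96 80 printed in step 1 are the same ones that appear in the reported error message.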

Updated by naruse (Yui NARUSE) over 16 years ago

Category set to M17N
Status changed from Open to Rejected
Assignee set to naruse (Yui NARUSE)

=begin
Your magic comment declares CP850, but the script's actual encoding seems to be ISO-8859-1.

Your character, U+00DF (LATIN SMALL LETTER SHARP S), is
\xDF in ISO-8859-1
\xE1 in CP850

Reading the byte \xDF as CP850 reproduces your error:

"\xdf".encode("iso-8859-1","CP850")
Encoding::UndefinedConversionError: "\xE2\x96\x80" from UTF-8 to ISO-8859-1 in conversion from CP850 to ISO-8859-1
from (irb):24:in `encode'

=end
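
The byte values listed above can be checked directly, and the same check shows what happens once the bytes carry the right label. This is a small sketch, assuming the file really contains the Latin-1 byte 0xDF as suggested above; the variable names are only illustrative.

```ruby
sharp_s = "\u00DF" # LATIN SMALL LETTER SHARP S

# The same character is a different byte in each encoding.
latin1_bytes = sharp_s.encode("ISO-8859-1").bytes
cp850_bytes  = sharp_s.encode("CP850").bytes
p latin1_bytes  # => [223]  (0xDF)
p cp850_bytes   # => [225]  (0xE1)

# Labelling the byte 0xDF correctly (as ISO-8859-1) makes conversion work:
# it round-trips to the CP850 byte for the same character.
fixed = [0xDF].pack("C").force_encoding("ISO-8859-1").encode("CP850")
p fixed.bytes   # => [225]
```

In a script, the equivalent fix is to make the magic comment match the encoding the file was actually saved in.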

Updated by tomel (Tom Link) over 16 years ago

=begin

> First, since we haven't implemented a direct conversion path from CP850
> to ISO-8859-1 (yet), Ruby converts strings via UTF-8, hence the
> message. If you have a suggestion for a better description, we are open.

I'd suggest some duplication of information:

UndefinedConversionError: "..." from UTF-8 to ISO-8859-1 in indirect
conversion from CP850 to UTF-8 to ISO-8859-1

or "in indirect conversion from CP850 to ISO-8859-1 via UTF-8"

> Second, I couldn't reproduce the problem from your test script.

Well, the problem was that the input really wasn't CP850 but Latin-1,
and that setting LANG to xx_XX.ISO-8859-1 doesn't seem to make Ruby
set the external encoding properly -- although I had assumed that
http://redmine.ruby-lang.org/issues/show/956 would make that possible.

=end
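
Until the built-in message changes, wording close to the suggestion above can be produced in user code. The sketch below assumes a hypothetical helper, `explain_failed_encode`, which is not part of Ruby; it relies only on the accessors that `Encoding::UndefinedConversionError` already provides (`source_encoding`, `source_encoding_name`, `destination_encoding_name`, `error_char`).

```ruby
# Hypothetical helper (not a Ruby API): returns nil when the conversion
# succeeds, otherwise a message that names the intermediate encoding.
def explain_failed_encode(str, to, from)
  str.encode(to, from)
  nil
rescue Encoding::UndefinedConversionError => e
  hub = e.source_encoding_name
  step = "#{hub} to #{e.destination_encoding_name}"
  # If the failing step does not start at the requested source encoding,
  # the conversion went through an intermediate encoding.
  route =
    if e.source_encoding == Encoding.find(from)
      "conversion from #{from} to #{to}"
    else
      "indirect conversion from #{from} to #{to} via #{hub}"
    end
  "#{e.error_char.dump} from #{step} in #{route}"
end

puts explain_failed_encode([0xDF].pack("C"), "ISO-8859-1", "CP850")
```

The helper only covers undefined conversions; invalid byte sequences raise a different exception class and are left alone here.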

Updated by tomel (Tom Link) over 16 years ago

=begin

> You declared in magic comment as CP850, but your exact script encoding seems ISO-8859-1.

It wasn't there in the original script. But you're right.

=end

Updated by duerst (Martin Dürst) over 16 years ago

=begin
At 10:38 09/02/04, you wrote:

> Hi,
>
> In message "Re: [ruby-core:21802] [Bug #1098] Unclear encoding error:
> #<Encoding::UndefinedConversionError: "\xE2\x96\x80" from UTF-8 to
> ISO-8859-1 in conversion from CP850 to ISO-8859-1>"
> on Tue, 3 Feb 2009 22:53:34 +0900, Tom Link redmine@ruby-lang.org writes:
>
> |The test script below exits with the error:
> |#<Encoding::UndefinedConversionError: "\xE2\x96\x80" from UTF-8 to
> |ISO-8859-1 in conversion from CP850 to ISO-8859-1>
>
> First, since we haven't implemented direct conversion path from CP850
> to ISO-8859-1 (yet),

Frankly speaking, I don't think we ever will. It's simply unrealistic
to expect Ruby to have N*(N-1) data tables for N encodings. No
transcoding engine I know of does that. We can always add direct
conversions between two non-UTF-8 encodings if it turns out to
be really necessary, but I don't see a reason for it in this case,
and there's definitely no sense in doing it just to improve
error messages.

> Ruby converts strings via UTF-8, hence the
> message. If you have suggestion for better description, we are open.

Yes indeed. I think one step is to explain better what happened.

> Second, I couldn't reproduce the problem from your test script. The
> conversion process goes from CP850 to UTF-8, then from UTF-8 to
> ISO-8859-1. The message says resulting UTF-8 text is "\xE2\x96\x80",
> which does not have corresponding character in ISO-8859-1 at all.

Yes, this is character U+2580 (for a handy conversion script,
I use http://people.w3.org/rishida/scripts/uniview/conversion.php),
UPPER HALF BLOCK. It doesn't exist in ISO-8859-1, and therefore
the script produces the above error. It simply says that there
is no defined conversion between UTF-8 and ISO-8859-1 for that
character, which by extension means that there is no defined
conversion from CP850 to ISO-8859-1 for this character.

> We have no more clue to draw any conclusion. There are a lot of
> possibilities, from a bug in your script, to a bug in Cygwin, of
> course including a bug in the transcoding engine.

This part is wrong. The conclusion is very clear. The script,
Cygwin, and the transcoding engine all are okay (at least as
far as this issue is concerned).

Regards, Martin.

#-#-# Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-# http://www.sw.it.aoyama.ac.jp mailto:duerst@it.aoyama.ac.jp

=end
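
The table-count argument and the resulting indirect route can both be made concrete. This is a rough sketch: `Encoding::Converter.search_convpath` is the standard way to ask Ruby which steps a conversion will take, while the value `n = 100` is an arbitrary illustrative number, not a count of Ruby's actual encodings.

```ruby
# Ask Ruby which conversion steps it will use for CP850 -> ISO-8859-1.
path = Encoding::Converter.search_convpath("CP850", "ISO-8859-1")
path.each { |src, dst| puts "#{src} -> #{dst}" }

# Illustrative arithmetic only: n is a made-up number of encodings.
n = 100
direct_tables = n * (n - 1)   # one table per ordered pair of encodings
hub_tables    = 2 * (n - 1)   # to and from a single hub such as UTF-8
puts "direct: #{direct_tables}, via hub: #{hub_tables}"  # direct: 9900, via hub: 198
```

The returned path has two steps with UTF-8 in the middle, which is why the error names UTF-8 even though the script never mentions it.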
