Hello Adam,
On 2009/11/01 10:35, Adam Salter wrote:
> Issue #2313 has been updated by Adam Salter.
> OK I understand now :) I was mixing up the available encoding converters... There is no Encoding::Converter
> from UTF-8 to ASCII-8BIT (or vice versa ;).
No, there should be an Encoding::Converter from UTF-8 to ASCII-8BIT (or
you should be able to create one). The underlying conversion table is
available. For example, the following works:
    puts 'abc'.encode('UTF-8').encode('ASCII-8BIT')
    # prints: abc
The reason this works is that ASCII-8BIT is defined to contain (7-bit)
ASCII. The fact that

    "元気".encode('UTF-8').encode('ASCII-8BIT')

doesn't work is very similar to the fact that e.g.

    "Dürst".encode('UTF-8').encode('shift_jis')

doesn't work: There is no "ü" character in Shift_JIS, and there is no
"元" character in ASCII-8BIT. So the transcoding engine has to give up,
usually with an exception.

This can also be understood by noting that String#encode tries to
preserve character identity. If we just copied arbitrary bytes into an
ASCII-8BIT string, we would still have the same bytes (you can do that
with force_encoding), but the only thing Ruby would know is that these
are bytes; it would have no idea which characters they represent. That's
why, both for removing such information (e.g. with
.force_encoding('ASCII-8BIT')) and for adding such information (e.g.
with .force_encoding('UTF-8')), we use a long and forceful method name
that should give programmers the message "watch out, you need to know by
yourself what you're doing".
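
To make the difference between transcoding and relabeling concrete, here
is a small sketch. The variable names are only for illustration, and I am
assuming Ruby 1.9 or later; the string is written with \u escapes so that
it does not depend on the source file's encoding.

```ruby
# Sketch: String#encode (transcoding) versus String#force_encoding (relabeling).
# All variable names are illustrative only.

ascii = 'abc'.encode('UTF-8')
utf8  = "\u5143\u6C17"   # the two characters of "genki", as a UTF-8 string

# Pure ASCII is representable in ASCII-8BIT, so encode succeeds.
ascii_binary = ascii.encode('ASCII-8BIT')

# The first character has no counterpart in ASCII-8BIT, so encode gives up.
encode_error =
  begin
    utf8.encode('ASCII-8BIT')
    nil
  rescue Encoding::UndefinedConversionError => e
    e.class
  end

# force_encoding keeps the bytes and only changes the encoding label.
relabeled = utf8.dup.force_encoding('ASCII-8BIT')
restored  = relabeled.dup.force_encoding('UTF-8')

puts ascii_binary.encoding == Encoding::ASCII_8BIT   # true
puts encode_error.inspect                            # Encoding::UndefinedConversionError
puts relabeled.bytes.to_a == utf8.bytes.to_a         # true: same bytes
puts utf8.length                                     # 2 characters
puts relabeled.length                                # 6, now counted as bytes
puts restored == utf8                                # true: label put back
```

Note that the round trip through force_encoding only gives back the
original string because we happen to know that the bytes were UTF-8 in
the first place; Ruby itself does not check this for us.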
Regards, Martin.
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp mailto:duerst@it.aoyama.ac.jp