Feature #4145
The result of UTF-16 encoded string concatenation (Closed)
Description
=begin
C:\work>irb
irb(main):001:0> a = 'abc'.encode('UTF-16')
=> "\uFEFFabc"
irb(main):002:0> b = a + a
=> "\uFEFFabc\uFEFFabc"
irb(main):003:0> c = b.encode('UTF-8')
=> "abc\uFEFFabc"
irb(main):004:0> d = b.encode('US-ASCII')
Encoding::UndefinedConversionError: U+FEFF to US-ASCII in conversion from UTF-16 to UTF-8 to US-ASCII
        from (irb):4:in `encode'
        from (irb):4
        from c:/usr/bin/irb.bat:19:in `<main>'
irb(main):005:0> b << b
=> "\uFEFFabc\uFEFFabc\uFEFFabc\uFEFFabc"
irb(main):006:0> b * 3
=> "\uFEFFabc\uFEFFabc\uFEFFabc\uFEFFabc\uFEFFabc\uFEFFabc\uFEFFabc\uFEFFabc\uFEFFabc\uFEFFabc\uFEFFabc\uFEFFabc"
irb(main):007:0>
Although I understand this behaviour, is there any way to get only one \uFEFF?
=end
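For illustration, a small sketch of where the extra marks come from. It assumes a current Ruby (2.x or later), where the `UTF-16` converter writes a big-endian BOM (`FE FF`) at the start of every `String#encode` result; the 2010 build in the transcript may have differed in detail. Because `+`, `<<` and `*` join the operands' bytes without reinterpreting them, every operand's BOM survives, and on conversion only the first one is consumed as a byte order mark (the rest decode as U+FEFF, as in `c` above). The byte arrays below only model `a + a`; they do not call `String#+` on the dummy-encoded strings.

```ruby
# Every String#encode('UTF-16') call writes its own byte order mark,
# so joining two such strings keeps both marks in the bytes.
with_bom    = 'abc'.encode('UTF-16')    # dummy encoding: BOM + big-endian units
without_bom = 'abc'.encode('UTF-16BE')  # explicit byte order: no BOM

p with_bom.bytes              # => [254, 255, 0, 97, 0, 98, 0, 99]
p without_bom.bytes           # => [0, 97, 0, 98, 0, 99]
p with_bom.encoding.dummy?    # => true
p without_bom.encoding.dummy? # => false

# Byte-level model of `a + a` from the session: one BOM per operand.
doubled = with_bom.bytes + with_bom.bytes
p doubled.each_slice(2).count([254, 255])  # => 2
```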
Updated by naruse (Yui NARUSE) about 14 years ago
- Status changed from Open to Assigned
- Assignee set to naruse (Yui NARUSE)
=begin
Strings encoded in UTF-16 don't support concatenation.
Use UTF-16BE or UTF-16LE for processing.
I'm considering emitting a warning when strings in a dummy encoding are concatenated.
=end
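Following the suggestion to process in UTF-16BE or UTF-16LE, here is a minimal sketch that keeps the pieces in UTF-16BE (which carries no BOM) and writes exactly one BOM when the final output is assembled. `utf16_with_single_bom` is a hypothetical helper, not part of Ruby, and `String.new(encoding:)` assumes Ruby 2.3 or later.

```ruby
# Hypothetical helper: concatenate in UTF-16BE (no BOMs) and prepend a
# single U+FEFF only once, when building the output.
def utf16_with_single_bom(*parts)
  out = String.new(encoding: Encoding::UTF_16BE)
  out << "\uFEFF".encode(Encoding::UTF_16BE)             # the one BOM: FE FF
  parts.each { |s| out << s.encode(Encoding::UTF_16BE) } # no per-part BOMs
  out
end

result = utf16_with_single_bom('abc', 'abc', 'abc')
p result.bytes.first(2)                          # => [254, 255]
p result.bytes.each_slice(2).count([254, 255])   # => 1

# Read as UTF-16BE, the mark is just the character U+FEFF.
p result.encode('UTF-8') == "\uFEFF" + 'abc' * 3 # => true

# Relabelled as the dummy UTF-16, the leading FE FF is consumed as a BOM.
relabelled = result.dup.force_encoding(Encoding::UTF_16)
p relabelled.encode('UTF-8') == 'abc' * 3        # => true
```

The design choice is to treat the BOM as a property of the serialized output rather than of each string, which is why the concatenation itself never sees one.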
Updated by duerst (Martin Dürst) about 14 years ago
=begin
We should try to get a better overall idea of what "UTF-16" and so on
are for. I asked some questions at the very end of [ruby-core:33461].
Yui, could you try to answer them? I hope this will help us have a general
discussion of the issues involved.
Regards, Martin.
On 2010/12/10 14:53, Yui NARUSE wrote:
> Issue #4145 has been updated by Yui NARUSE.
> Status changed from Open to Assigned
> Assigned to set to Yui NARUSE
> Strings encoded in UTF-16 don't support concatenation.
> Use UTF-16BE or UTF-16LE for processing.
> I'm considering to warn concatenation of strings encoded in dummy encoding.
> http://redmine.ruby-lang.org/issues/show/4145
--
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp mailto:duerst@it.aoyama.ac.jp
=end
Updated by naruse (Yui NARUSE) about 14 years ago
=begin
(2010/12/10 18:14), "Martin J. Dürst" wrote:
> We should try to get a better overall idea of what "UTF-16" and so on
> are for. I asked some questions at the very end of [ruby-core:33461].
> Yui, can you try to give answers? I hope this will help having a
> general discussion of the issues involved.
The current implementation is what I intended it to be.
> My main questions here are:
> A) Which one of the above is the current Ruby implementation effort
> (the above patch and a few related ones) targeting?
It targets 2b): XML strictly requires a BOM.
This is because the specification (2a) conflicts with real-world practice (2c).
> B) How complete is that implementation (thought to be)?
The current one is complete.
> C) What about other implementation needs?
None, in the current situation.
> D) What can we do to make sure users have at least a chance of
> understanding what "UTF-16" in Ruby is good for?
This is an open problem, which is why I implemented it this way and am watching users' reactions.
--
NARUSE, Yui naruse@airemix.jp
=end
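Regarding 2b): when a string labelled with the dummy `UTF-16` encoding is decoded, the leading BOM selects the byte order and is then dropped, which is how the first mark in the original report disappears. The sketch below assumes Ruby 2.x or later; `decode_utf16` is a hypothetical helper, and the error raised for input without a BOM (`Encoding::InvalidByteSequenceError`) is what I would expect from the converter rather than something stated in this thread.

```ruby
# Hypothetical helper: label raw bytes as the dummy UTF-16 encoding and
# decode them; the BOM decides big- vs little-endian and is not kept.
def decode_utf16(bytes)
  bytes.pack('C*').force_encoding(Encoding::UTF_16).encode(Encoding::UTF_8)
end

p decode_utf16([0xFE, 0xFF, 0x00, 0x61, 0x00, 0x62])  # BE BOM, prints "ab"
p decode_utf16([0xFF, 0xFE, 0x61, 0x00, 0x62, 0x00])  # LE BOM, prints "ab"

begin
  decode_utf16([0x00, 0x61, 0x00, 0x62])              # no BOM at all
rescue Encoding::InvalidByteSequenceError => e
  puts "rejected: #{e.class}"
end
```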
Updated by naruse (Yui NARUSE) about 14 years ago
- Status changed from Assigned to Closed