Feature #4145

closed

The result of UTF-16 encoded string concatenation

Added by phasis68 (Heesob Park) almost 14 years ago. Updated over 13 years ago.

Status:
Closed
Target version:
[ruby-core:33661]

Description

=begin
C:\work>irb
irb(main):001:0> a = 'abc'.encode('UTF-16')
=> "\uFEFFabc"
irb(main):002:0> b = a + a
=> "\uFEFFabc\uFEFFabc"
irb(main):003:0> c = b.encode('UTF-8')
=> "abc\uFEFFabc"
irb(main):004:0> d = b.encode('US-ASCII')
Encoding::UndefinedConversionError: U+FEFF to US-ASCII in conversion from UTF-16 to UTF-8 to US-ASCII
        from (irb):4:in `encode'
        from (irb):4
        from c:/usr/bin/irb.bat:19:in `<main>'
irb(main):005:0> b << b
=> "\uFEFFabc\uFEFFabc\uFEFFabc\uFEFFabc"
irb(main):006:0> b * 3
=> "\uFEFFabc\uFEFFabc\uFEFFabc\uFEFFabc\uFEFFabc\uFEFFabc\uFEFFabc\uFEFFabc\uFEFFabc\uFEFFabc\uFEFFabc\uFEFFabc"
irb(main):007:0>

Although I understand this behaviour, is there any way to generate only one \uFEFF for the concatenated string?
=end

Actions #1

Updated by naruse (Yui NARUSE) almost 14 years ago

  • Status changed from Open to Assigned
  • Assignee set to naruse (Yui NARUSE)

=begin
Strings encoded in UTF-16 don't support concatenation.
Use UTF-16BE or UTF-16LE for processing.

I'm considering adding a warning when strings in a dummy encoding are concatenated.
=end
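
A minimal sketch of the advice above, assuming the goal is a UTF-16 string that carries exactly one BOM. The helper name concat_utf16 is hypothetical and is not part of Ruby or of this thread. It does the concatenation in UTF-16BE, which has no BOM, and converts to UTF-16 only at the end:

```ruby
# UTF-16 is a "dummy" encoding in Ruby; UTF-16BE is not.
p Encoding::UTF_16.dummy?    # true
p Encoding::UTF_16BE.dummy?  # false

# Hypothetical helper (an illustration, not an existing API).
def concat_utf16(*parts)
  # Converting each part to UTF-16BE consumes a leading BOM, if any,
  # so the pieces can be concatenated safely.
  body = parts.inject("".encode("UTF-16BE")) do |acc, s|
    acc + s.encode("UTF-16BE")
  end
  # Converting the finished string to UTF-16 prepends a single BOM.
  body.encode("UTF-16")
end

a = "abc".encode("UTF-16")
b = concat_utf16(a, a)

p b.encoding          # the UTF-16 encoding object
p b.bytes.first(2)    # the BOM bytes
p b.encode("UTF-8")   # the text without any embedded U+FEFF
```

The same idea applies to UTF-16LE; whether a leading BOM is wanted at all depends on what will consume the result.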

Actions #2

Updated by duerst (Martin Dürst) almost 14 years ago

=begin
We should try to get a better overall idea of what "UTF-16" and so on
are for. I asked some questions at the very end of [ruby-core:33461].
Yui, can you try to give answers? I hope this will help having a general
discussion of the issues involved.

Regards, Martin.

On 2010/12/10 14:53, Yui NARUSE wrote:

> Issue #4145 has been updated by Yui NARUSE.
>
> Status changed from Open to Assigned
> Assigned to set to Yui NARUSE
>
> Strings encoded in UTF-16 don't support concatenation.
> Use UTF-16BE or UTF-16LE for processing.
>
> I'm considering to warn concatenation of strings encoded in dummy encoding.
>
> http://redmine.ruby-lang.org/issues/show/4145
>
> http://redmine.ruby-lang.org

--
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp
=end

Actions #4

Updated by naruse (Yui NARUSE) almost 14 years ago

=begin
(2010/12/10 18:14), "Martin J. Dürst" wrote:

> We should try to get a better overall idea of what "UTF-16" and so on
> are for. I asked some questions at the very end of [ruby-core:33461].
> Yui, can you try to give answers? I hope this will help having a
> general discussion of the issues involved.

The current implementation is what I intended it to be.

> My main questions here are:
> A) Which one of the above is the current Ruby implementation effort
> (the above patch and a few related ones) targeting?

It targets 2b) XML strictly requires a BOM,
because the spec (2a) conflicts with real-world usage (2c).

> B) How complete is that implementation (thought to be)?

The current implementation is complete.

> C) What about other implementation needs?

None, in the current situation.

> D) What can we do to make sure users have at least a chance of
> understanding what "UTF-16" in Ruby is good for?

This is an open problem, which is why I implemented it and am watching users' reactions.

--
NARUSE, Yui

=end

Actions #5

Updated by naruse (Yui NARUSE) almost 14 years ago

  • Status changed from Assigned to Closed

