Bug #7964

open

Writing an ASCII-8BIT String to a StringIO created from a UTF-8 String

Added by brixen (Brian Shirai) almost 12 years ago. Updated about 1 year ago.

Status:
Assigned
Target version:
-
ruby -v:
ruby 2.0.0p0 (2013-02-24 revision 39474) [x86_64-darwin10.8.0]
Backport:
[ruby-core:52921]

Description

=begin
In the following script, an ASCII-8BIT String is written to a StringIO created with a UTF-8 String without error. However, a << b or a + b will raise an exception, as will writing an ASCII-8BIT String to a File with UTF-8 external encoding.

  • $ cat file_enc.rb

    # encoding: utf-8

    require 'stringio'

    a = "On a very cold morning, it was -8°F."
    b = a.dup.force_encoding "ascii-8bit"

    io = StringIO.new a
    io.write(b)
    p io.string.encoding

    File.open "data.txt", "w:utf-8" do |f|
      f.write a
      f.write b
    end

  • $ ruby2.0 -v file_enc.rb
    ruby 2.0.0p0 (2013-02-24 revision 39474) [x86_64-darwin10.8.0]
    #<Encoding:UTF-8>
    file_enc.rb:13:in `write': "\xC2" from ASCII-8BIT to UTF-8 (Encoding::UndefinedConversionError)
    from file_enc.rb:13:in `block in <main>'
    from file_enc.rb:11:in `open'
    from file_enc.rb:11:in `<main>'

  • $ ruby1.9.3 -v file_enc.rb
    ruby 1.9.3p327 (2012-11-10 revision 37606) [x86_64-darwin10.8.0]
    #<Encoding:UTF-8>
    file_enc.rb:13:in `write': "\xC2" from ASCII-8BIT to UTF-8 (Encoding::UndefinedConversionError)
    from file_enc.rb:13:in `block in <main>'
    from file_enc.rb:11:in `open'
    from file_enc.rb:11:in `<main>'
    =end
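
The asymmetry described in the report can be checked in isolation with a minimal sketch like the one below. The helper name error_class is hypothetical and exists only for this illustration. The non-ASCII character is written as an escape so the snippet does not depend on a magic comment, and the file-writing half of the original script is left out.

```ruby
require 'stringio'

# UTF-8 String containing one non-ASCII character (the degree sign).
a = "On a very cold morning, it was -8\u00B0F.".dup
# Same bytes, but tagged as ASCII-8BIT (binary).
b = a.dup.force_encoding(Encoding::ASCII_8BIT)

# Hypothetical helper for this sketch: returns the class of the
# EncodingError raised by the block, or nil if nothing was raised.
def error_class
  yield
  nil
rescue EncodingError => e
  e.class
end

append_error = error_class { a.dup << b }  # String#<< with incompatible encodings
concat_error = error_class { a + b }       # String#+ with incompatible encodings

io = StringIO.new(a.dup)
write_error = error_class { io.write(b) }  # StringIO#write with the same operands

p append_error, concat_error, write_error, io.string.encoding
```

Under these assumptions, the two String operations report an encoding error while the StringIO write goes through and the buffer keeps its UTF-8 tag, which is the inconsistency the report is about.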

Updated by nobu (Nobuyoshi Nakada) almost 12 years ago

  • Description updated (diff)

Updated by nobu (Nobuyoshi Nakada) almost 12 years ago

  • Category set to ext
  • Status changed from Open to Assigned
  • Assignee set to nobu (Nobuyoshi Nakada)
  • Target version set to 2.1.0

Currently, StringIO does not support encoding conversion on write, so `io.write(b)' does not raise any exceptions.

Updated by duerst (Martin Dürst) almost 12 years ago

nobu (Nobuyoshi Nakada) wrote:

Currently, StringIO does not support encoding conversion on write, so `io.write(b)' does not raise any exceptions.

Should StringIO support encoding conversion? I think it should, because it should work like IO. However, the question is whether the resulting string should always be BINARY (exactly mirroring what happens with real IO), or whether it should have its own encoding (this might allow collecting substrings in different encodings into a string with a single encoding without any explicit conversions).

I think that somebody should open a feature for this, and of course patches would be welcome.

As an aside, I think it would be easier to implement StringIO in Ruby, or is StringIO performance critical?
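
To make the second option concrete (a StringIO with its own encoding that transcodes on write), here is a rough sketch. TranscodingStringIO is a hypothetical class written for this illustration only; it is not part of the stringio library and is not a proposed patch. It assumes that "conversion on write" means calling String#encode with the buffer's encoding as the target.

```ruby
require 'stringio'

# Hypothetical subclass for illustration: transcodes every written chunk
# to the encoding of the underlying buffer before delegating to StringIO.
class TranscodingStringIO < StringIO
  def write(*chunks)
    target = string.encoding
    super(*chunks.map { |chunk| chunk.to_s.encode(target) })
  end
end

io = TranscodingStringIO.new(String.new("", encoding: "UTF-8"))

io.write("-8")
# A degree sign encoded as ISO-8859-1 (single byte 0xB0).
io.write("\xB0".dup.force_encoding(Encoding::ISO_8859_1))
io.write("F")
result = io.string

# A binary chunk with high bytes has no defined conversion to UTF-8,
# so String#encode raises, much like the File example in the report.
binary_error = begin
  io.write("\xC2\xB0".b)
  nil
rescue Encoding::UndefinedConversionError => e
  e.class
end

p result, result.encoding, binary_error
```

In this sketch, substrings in different encodings end up in one UTF-8 buffer without explicit conversions at the call site, and an ASCII-8BIT chunk with non-ASCII bytes is rejected rather than copied byte for byte.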

Updated by naruse (Yui NARUSE) almost 12 years ago

The examples are not equivalent.
The correct comparison is:

StringIO.open a, "w" do |io|
  io.write(b)
end

File.open "data.txt", "w" do |io|
  io.write b
end

So it won't raise an error even if StringIO supports external/internal encoding.
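
A self-contained version of this comparison might look like the sketch below. It assumes no default internal encoding is set, and it writes to a temporary directory (via Dir.mktmpdir) instead of data.txt in the current directory. Both choices are additions for the sake of a runnable example.

```ruby
require 'stringio'
require 'tmpdir'

a = "On a very cold morning, it was -8\u00B0F.".dup   # UTF-8
b = a.dup.force_encoding(Encoding::ASCII_8BIT)         # same bytes, binary

# StringIO opened in "w" mode, with no encoding given.
string_result = StringIO.open(a.dup, "w") do |io|
  io.write(b)
  io.string
end

# File opened in "w" mode, with no external encoding given.
file_bytes = Dir.mktmpdir do |dir|
  path = File.join(dir, "data.txt")
  File.open(path, "w") { |io| io.write(b) }
  File.binread(path)
end

p string_result.bytes == b.bytes, file_bytes == b
```

With no explicit external encoding on either side, neither write raises, and the bytes of b are stored unchanged in both cases.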

duerst (Martin Dürst) wrote:

nobu (Nobuyoshi Nakada) wrote:

Currently, StringIO does not support encoding conversion on write, so `io.write(b)' does not raise any exceptions.

Should StringIO support encoding conversion? I think it should, because it should work like IO. However, the question is whether the resulting string should always be BINARY (exactly mirroring what happens with real IO), or whether it should have its own encoding (this might allow collecting substrings in different encodings into a string with a single encoding without any explicit conversions).

Agreed.

I think that somebody should open a feature for this, and of course patches would be welcome.

As an aside, I think it would be easier to implement StringIO in Ruby, or is StringIO performance critical?

see https://bugs.ruby-lang.org/issues/5677

Updated by brixen (Brian Shirai) almost 12 years ago

Martin, what do you mean by: "However, the question is whether the resulting string should always be BINARY (exactly mirroring what happens with real IO)..."?

If StringIO is going to fake aliasing #pos across instances that have been #dup'd, it should certainly have the same encoding-related behavior. Cf. http://bugs.ruby-lang.org/issues/7220

Updated by hsbt (Hiroshi SHIBATA) almost 11 years ago

  • Target version changed from 2.1.0 to 2.2.0

Updated by naruse (Yui NARUSE) about 7 years ago

  • Target version deleted (2.2.0)

Updated by madeline-hou (Madeline Hou) about 1 year ago

@naruse (Yui NARUSE)

Like you said, the example comparisons aren't equal. I saw that Feature #5677, which you linked to in your comment, was rejected. Would it be alright to close this issue?
