Bug #4603
closedlib/csv.rb: when the :encoding parameter is not provided, the encoding of CSV data is treated as ASCII-8BIT
Added by nobuoka (yu nobuoka) over 13 years ago. Updated over 12 years ago.
Description
=begin
This issue is involved in three methods, CSV::open, CSV::read and CSV::foreach.
The document of CSV::read says "This method also understands an additional
:encoding parameter that you can use to specify the Encoding of the data
in the file to be read. You must provide this unless your data is in
Encoding::default_external()."
However, when the :encoding parameter is not provided, the encoding of the CSV data
is treated as ASCII-8BIT. Not as Encoding.default_external.
CSV::open and CSV::foreach are also similar.
I think the actual behaviour of these methods doesn't conform to the document of these.
=end
Files
csv_test.rb (814 Bytes) csv_test.rb | nobuoka (yu nobuoka), 04/24/2011 03:33 PM |
Updated by naruse (Yui NARUSE) over 13 years ago
- Status changed from Open to Assigned
- Assignee set to JEG2 (James Gray)
- Target version set to 1.9.2
=begin
=end
Updated by naruse (Yui NARUSE) over 13 years ago
- ruby -v changed from ruby 1.9.2p188 (2011-03-28 revision 31204) [x86_64-linux] to -
=begin
2011/4/25 James Gray james@graysoftinc.com:
On Sun, Apr 24, 2011 at 1:33 AM, yu nobuoka nobuoka@r-definition.com
wrote:The document of CSV::read says "This method also understands an additional
:encoding parameter that you can use to specify the Encoding of the data
in the file to be read. You must provide this unless your data is in
Encoding::default_external()."
However, when the :encoding parameter is not provided, the encoding of the
CSV data
is treated as ASCII-8BIT. Not as Encoding.default_external.
CSV::open and CSV::foreach are also similar.I think the actual behaviour of these methods doesn't conform to the
document of these.It seems this was an intentional change not made by me:
r25362 | naruse | 2009-10-15 22:04:38 -0500 (Thu, 15 Oct 2009) | 2 lines
- lib/csv.rb (CSV#raw_encoding): returns ASCII-8BIT when the io doesn't have encoding.
This seems like a wrong choice. Why would we not want to support the
default encodings? Can someone please explain to me why this was done?
Ah, sorry, that commit message doesn't explain the intention.
It is for IO-like object which doesn't have encoding method, for example Zlib::GzipReader
test_gzip_reader_bug_fix in test/csv/test_features.rb.
Anyway, even if I applied following patch, the problem is still reproduced.
diff --git a/lib/csv.rb b/lib/csv.rb
index 45273f9..ee35ccc 100644
--- a/lib/csv.rb
+++ b/lib/csv.rb
@@ -2296,7 +2296,7 @@ class CSV
elsif @io.respond_to? :encoding
@io.encoding
else
-
default
-
endEncoding.default_internal || Encoding.default_external
end
end
--
NARUSE, Yui naruse@airemix.jp
=end
Updated by naruse (Yui NARUSE) over 13 years ago
- ruby -v changed from - to ruby 1.9.2p188 (2011-03-28 revision 31204) [x86_64-linux]
=begin
=end
Updated by Anonymous over 13 years ago
=begin
On Sun, Apr 24, 2011 at 1:33 AM, yu nobuoka nobuoka@r-definition.comwrote:
The document of CSV::read says "This method also understands an additional
:encoding parameter that you can use to specify the Encoding of the data
in the file to be read. You must provide this unless your data is in
Encoding::default_external()."
However, when the :encoding parameter is not provided, the encoding of the
CSV data
is treated as ASCII-8BIT. Not as Encoding.default_external.
CSV::open and CSV::foreach are also similar.I think the actual behaviour of these methods doesn't conform to the
document of these.
It seems this was an intentional change not made by me:
r25362 | naruse | 2009-10-15 22:04:38 -0500 (Thu, 15 Oct 2009) | 2 lines
- lib/csv.rb (CSV#raw_encoding): returns ASCII-8BIT when the io
doesn't have encoding.
This seems like a wrong choice. Why would we not want to support the
default encodings? Can someone please explain to me why this was done?
James Edward Gray II
=end
Updated by Anonymous over 13 years ago
=begin
On Sun, Apr 24, 2011 at 11:29 PM, NARUSE, Yui naruse@airemix.jp wrote:
Ah, sorry, that commit message doesn't explain the intention.
It is for IO-like object which doesn't have encoding method, for
example Zlib::GzipReader
test_gzip_reader_bug_fix in test/csv/test_features.rb.Anyway, even if I applied following patch, the problem is still reproduced.
diff --git a/lib/csv.rb b/lib/csv.rb
index 45273f9..ee35ccc 100644
--- a/lib/csv.rb
+++ b/lib/csv.rb
@@ -2296,7 +2296,7 @@ class CSV
elsif @io.respond_to? :encoding
@io.encoding
else
default
endEncoding.default_internal || Encoding.default_external
end
end
OK, I see the problem.
The issue is that a mode of "rb" is being used to suppress newline
translation on Windows. However, that's also switching my Encoding to
ASCII-8BIT. Usually I love that feature, but here it's not what I want. Is
there anyway to shut off the translation and not get the encoding change?
James Edward Gray II
=end
Updated by nobu (Nobuyoshi Nakada) over 13 years ago
- ruby -v changed from ruby 1.9.2p188 (2011-03-28 revision 31204) [x86_64-linux] to -
=begin
Hi,
At Mon, 25 Apr 2011 22:57:52 +0900,
James Gray wrote in [ruby-core:35878]:
The issue is that a mode of "rb" is being used to suppress newline
translation on Windows. However, that's also switching my Encoding to
ASCII-8BIT. Usually I love that feature, but here it's not what I want. Is
there anyway to shut off the translation and not get the encoding change?
Now fixed so that universal_newline: false can work. What's
about the following patch?
diff --git i/lib/csv.rb w/lib/csv.rb
index 1aad2f3..d51cb55 100644
--- i/lib/csv.rb
+++ w/lib/csv.rb
@@ -1334,10 +1334,18 @@ class CSV
def self.open(*args)
# find the +options+ Hash
options = if args.last.is_a? Hash then args.pop else Hash.new end
-
default to a binary open mode¶
- args << "rb" if args.size == 1 and !options.key?(:mode)
-
wrap a File opened with the remaining +args+¶
- csv = new(File.open(*args, options), options)
-
wrap a File opened with the remaining +args+ with no newline¶
-
decorator¶
-
file_opts = {universal_newline: false}.merge(options)
-
begin
-
f = File.open(*args, file_opts)
-
rescue ArgumentError => e
-
throw unless /needs binmode/ =~ e.message and args.size == 1
-
args << "rb"
-
file_opts = {encoding: Encoding.default_external}.merge(file_opts)
-
retry
-
end
-
csv = new(f, options)
handle blocks like Ruby's open(), not like the CSV library¶
if block_given?
@@ -1398,11 +1406,8 @@ class CSV
# encoding: "UTF-32BE:UTF-8" would read UTF-32BE data from the file
# but transcode it to UTF-8 before CSV parses it.
#
- def self.read(path, options = Hash.new)
- encoding = options.delete(:encoding)
- mode = "rb"
- mode << ":#{encoding}" if encoding
- open(path, mode, options) { |csv| csv.read }
- def self.read(path, *options)
- open(path, *options) { |csv| csv.read }
end
# Alias for CSV::read().
diff --git i/test/csv/test_encodings.rb w/test/csv/test_encodings.rb
index 3880f3a..54c34f3 100755
--- i/test/csv/test_encodings.rb
+++ w/test/csv/test_encodings.rb
@@ -79,6 +79,21 @@ class TestCSV::Encodings < TestCSV
end
end
- def test_read_with_default_encoding
- data = "abc"
- default_external = Encoding.default_external
- each_encoding do |encoding|
-
File.open(@temp_csv_path, "wb", encoding: encoding) {|f| f << data}
-
begin
-
Encoding.default_external = encoding
-
result = CSV.read(@temp_csv_path)[0][0]
-
ensure
-
Encoding.default_external = default_external
-
end
-
assert_equal(encoding, result.encoding)
- end
- end
- #######################################################################
Stress Test ASCII Compatible and Non-ASCII Compatible Encodings¶
#######################################################################
--
Nobu Nakada
=end
Updated by nobuoka (yu nobuoka) over 13 years ago
=begin
Hi,
Nobuyoshi Nakada wrote:
Now fixed so that universal_newline: false can work. What's
about the following patch?
diff --git i/lib/csv.rb w/lib/csv.rb
index 1aad2f3..d51cb55 100644
--- i/lib/csv.rb
+++ w/lib/csv.rb
@@ -1334,10 +1334,18 @@ class CSV
def self.open(*args)
# find the +options+ Hash
options = if args.last.is_a? Hash then args.pop else Hash.new end
default to a binary open mode¶
- args << "rb" if args.size == 1 and !options.key?(:mode)
wrap a File opened with the remaining +args+¶
- csv = new(File.open(*args, options), options)
wrap a File opened with the remaining +args+ with no newline¶
decorator¶
- file_opts = {universal_newline: false}.merge(options)
- begin
f = File.open(*args, file_opts)
- rescue ArgumentError => e
throw unless /needs binmode/ =~ e.message and args.size == 1
args << "rb"
file_opts = {encoding: Encoding.default_external}.merge(file_opts)
retry
- end
- csv = new(f, options)
In +rescue+ clause, a +throw+ expression is used. Is it correct?
I think a +raise+ expression should be used instead. Or my idea is wrong...?
=end
Updated by Anonymous over 13 years ago
=begin
On Tue, Apr 26, 2011 at 3:13 PM, yu nobuoka nobuoka@r-definition.comwrote:
Issue #4603 has been updated by yu nobuoka.
Hi,
Nobuyoshi Nakada wrote:
Now fixed so that universal_newline: false can work. What's
about the following patch?diff --git i/lib/csv.rb w/lib/csv.rb
index 1aad2f3..d51cb55 100644
--- i/lib/csv.rb
+++ w/lib/csv.rb
@@ -1334,10 +1334,18 @@ class CSV
def self.open(*args)
# find the +options+ Hash
options = if args.last.is_a? Hash then args.pop else Hash.new endEncoding.default_external}.merge(file_opts)
retry
- end
- csv = new(f, options)
In +rescue+ clause, a +throw+ expression is used. Is it correct?
I think a +raise+ expression should be used instead. Or my idea is
wrong...?
Good point. I think raise() is correct.
James Edward Gray II
=end
Updated by nobu (Nobuyoshi Nakada) over 13 years ago
- Status changed from Assigned to Closed
- % Done changed from 0 to 100
=begin
This issue was solved with changeset r31370.
yu, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.
- lib/csv.rb (CSV::open): suppress universal newline decorator.
fixes #4603
=end
Updated by theirishpenguin (Declan McGrath) over 12 years ago
Hi,
I seem to be still getting this problem on Ruby 1.9.2p290 revision 32553. This issue should be fixed in revision 32553, correct? (as 32553 > 31370)
See below for test case.
Regards,
Declan
Test case¶
-
Contents of test.csv are:
á,b
1,2 -
Steps to reproduce issue:
declan@foo:~$ ruby -v
ruby 1.9.2p290 (2011-07-09 revision 32553) [i686-linux]
declan@foo:~$ irb
irb(main):004:0> Encoding::default_internal = 'UTF-8'
=> "UTF-8"
irb(main):007:0> Encoding::default_external = 'UTF-8'
=> "UTF-8"
irb(main):009:0> require 'csv'
=> true
irb(main):010:0> CSV.read('test.csv')
=> [["\xC3\xA1", "b"], ["1", "2"]]
Updated by naruse (Yui NARUSE) over 12 years ago
- Description updated (diff)
theirishpenguin (Declan McGrath) wrote:
I seem to be still getting this problem on Ruby 1.9.2p290 revision 32553.
This issue should be fixed in revision 32553, correct? (as 32553 > 31370)
No, revision numbers are repository global number.
r32553 > r31370 doesn't mean it because they are different branch.
r32553 changes trunk and ruby_1_9_3 branch but doesn't change ruby_1_9_2 branch.
r31370 is not merged to ruby_1_9_2.
Updated by theirishpenguin (Declan McGrath) over 12 years ago
Thanks Naruse!