Bug #4069
closed
String#parse_csv fails to parse "\r" character embedded string
Added by phasis68 (Heesob Park) about 14 years ago.
Updated over 13 years ago.
Description
=begin
C:\work>ruby -rcsv -ve 'p ["aa\rbb"].to_csv.parse_csv'
ruby 1.9.3dev (2010-11-18 trunk 29823) [i386-mswin32_90]
c:/usr/lib/ruby/1.9.1/csv.rb:1914:in `block in shift': Unclosed quoted field on line 1. (CSV::MalformedCSVError)
        from c:/usr/lib/ruby/1.9.1/csv.rb:1831:in `loop'
        from c:/usr/lib/ruby/1.9.1/csv.rb:1831:in `shift'
        from c:/usr/lib/ruby/1.9.1/csv.rb:1390:in `parse_line'
        from c:/usr/lib/ruby/1.9.1/csv.rb:2341:in `parse_csv'
        from -e:1:in `<main>'
=end
=begin
["aa\rbb"].to_csv results in the string "\"aa\rbb\"\n".
When you don't specify a row separator, the Ruby CSV library guesses one by searching the data for the first occurrence of "\r\n", "\n", or "\r".
In the case of "\"aa\rbb\"\n" it encounters the \r inside the quoted field first and assumes that it is your row separator, so the quoted field never appears to be closed. To point it to the correct row separator, you have to supply the option :row_sep => "\n":
$ ruby -rcsv -ve 'p ["aa\rbb"].to_csv.parse_csv(:row_sep => "\n")'
ruby 1.9.3dev (2010-11-19 trunk 29830) [x86_64-linux]
["aa\rbb"]
=end
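The workaround above can be sketched as a small round trip that names the row separator on both the writing and the parsing side. The helper name csv_round_trip is hypothetical and only for illustration. What the auto-detected call does may differ between CSV versions, so the sketch rescues the error and does not assume an outcome.

```ruby
require "csv"

# Hypothetical helper (not part of the CSV library): serialize one row
# and parse it back, using the same explicit row separator both times.
def csv_round_trip(fields, row_sep: "\n")
  line = fields.to_csv(row_sep: row_sep)
  line.parse_csv(row_sep: row_sep)
end

# With an explicit row separator, the embedded "\r" survives the round trip.
p csv_round_trip(["aa\rbb"])

# With auto-detection, the embedded "\r" may be mistaken for the row
# separator. The result depends on the CSV version in use.
begin
  p ["aa\rbb"].to_csv.parse_csv
rescue CSV::MalformedCSVError => e
  puts "auto-detection failed: #{e.message}"
end
```

Passing :row_sep explicitly also skips the read-ahead that auto-discovery performs.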
- Status changed from Open to Rejected
- Assignee set to JEG2 (James Gray)
=begin
Sorry, not sure how I missed this ticket. As Timothy says, this is intended, documented behavior:
# <b><tt>:row_sep</tt></b>:: The String appended to the end of each
# row. This can be set to the special
# <tt>:auto</tt> setting, which requests
# that CSV automatically discover this
# from the data. Auto-discovery reads
# ahead in the data looking for the next
# <tt>"\r\n"</tt>, <tt>"\n"</tt>, or
# <tt>"\r"</tt> sequence. A sequence
# will be selected even if it occurs in
# a quoted field, assuming that you
# would have the same line endings
# there. If none of those sequences is
# found, +data+ is <tt>ARGF</tt>,
# <tt>STDIN</tt>, <tt>STDOUT</tt>, or
# <tt>STDERR</tt>, or the stream is only
# available for output, the default
# <tt>$INPUT_RECORD_SEPARATOR</tt>
# (<tt>$/</tt>) is used. Obviously,
# discovery takes a little time. Set
# manually if speed is important. Also
# note that IO objects should be opened
# in binary mode on Windows if this
# feature will be used as the
# line-ending translation can cause
# problems with resetting the document
# position to where it was before the
# read ahead. This String will be
# transcoded into the data's Encoding
# before parsing.
=end
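As a rough illustration of the documented discovery rule, the following is a simplified model and not the library's actual code. The method name guess_row_sep is hypothetical, and the model ignores the read-ahead, stream, and encoding details the documentation mentions. It only shows why the first line ending wins, even inside a quoted field.

```ruby
# Simplified, hypothetical model of :auto row separator discovery:
# take the first "\r\n", "\n", or "\r" found anywhere in the sample,
# and fall back to a default (here $/) when none is present.
def guess_row_sep(sample, default = $/)
  sample[/\r\n?|\n/] || default
end

p guess_row_sep("\"aa\rbb\"\n")  # the "\r" inside the quoted field is found first
p guess_row_sep("a,b\r\nc,d\r\n")
p guess_row_sep("a,b\nc,d\n")
p guess_row_sep("a,b")
```

Under this model, the string from the report yields "\r" rather than "\n", which matches the explanation given earlier in the ticket.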