String#parse_csv fails to parse a string with an embedded "\r" character
C:\work>ruby -rcsv -ve 'p ["aa\rbb"].to_csv.parse_csv'
ruby 1.9.3dev (2010-11-18 trunk 29823) [i386-mswin32_90]
...:in `block in shift': Unclosed quoted field on line 1. (CSV::MalformedCSVError)
        from c:/usr/lib/ruby/1.9.1/csv.rb:1831:in `loop'
        from ...:in `shift'
        from c:/usr/lib/ruby/1.9.1/csv.rb:1390:in `parse_line'
        from ...:in `parse_csv'
        from -e:1:in `<main>'
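For anyone reproducing this outside a one-liner, here is a minimal script version of the same round trip. It assumes only that the standard csv library is available, and it rescues CSV::MalformedCSVError so the failure is reported instead of aborting; the exact error message may differ between Ruby versions.

```ruby
require "csv"

# The field contains a bare carriage return, so to_csv wraps it in quotes.
line = ["aa\rbb"].to_csv
p line

begin
  # No :row_sep is given, so CSV auto-detects the row separator from the data.
  p line.parse_csv
rescue CSV::MalformedCSVError => e
  # Per the explanation in this thread, the "\r" inside the quoted field is
  # picked as the row separator, so the trailing "\n" does not end the row.
  puts "MalformedCSVError: #{e.message}"
end
```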
Updated by ender672 (Timothy Elliott) over 11 years ago
["aa\rbb"].to_csv returns the string "\"aa\rbb\"\n" (the field is quoted because it contains "\r").
When you don't specify a row separator, Ruby's CSV library guesses one by searching the data for the first occurrence of \r or \n.
For "\"aa\rbb\"\n", it encounters the \r inside the quoted field first and assumes that is your row separator. To point it at the correct row separator, supply the option :row_sep => "\n":
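The guessing rule can be modeled in a few lines. The helper below, guess_row_sep, is hypothetical and is not the library's implementation; it is a simplified sketch that only mirrors the documented rule of taking the first "\r\n", "\n", or "\r" in the data, regardless of whether that sequence sits inside a quoted field, and falling back to $/ otherwise.

```ruby
# Hypothetical helper: a simplified model of :row_sep => :auto detection.
# It is NOT CSV's actual code. The regex alternation prefers "\r\n" at a
# given position, then "\n", then a lone "\r"; quotes are ignored entirely.
def guess_row_sep(data, default = $/)
  match = data.match(/\r\n|\n|\r/)
  match ? match[0] : default
end

p guess_row_sep("\"aa\rbb\"\n")     # "\r"  (found inside the quoted field)
p guess_row_sep("a,b\r\nc,d\r\n")   # "\r\n"
p guess_row_sep("a,b\nc,d\n")       # "\n"
p guess_row_sep("no line ending")   # falls back to $/
```

Because the scan ignores quoting, the embedded "\r" wins, which is exactly the situation described in this ticket.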
$ ruby -rcsv -ve 'p ["aa\rbb"].to_csv.parse_csv(:row_sep => "\n")'
ruby 1.9.3dev (2010-11-19 trunk 29830) [x86_64-linux]
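The same workaround as a script, plus a variation. The second half is an illustrative assumption rather than something proposed in the thread: if you control both sides, passing one explicit separator to CSV.generate_line and CSV.parse_line avoids auto-detection altogether. The ROW_SEP constant and the sample fields are made up for the example.

```ruby
require "csv"

line = ["aa\rbb"].to_csv

# Workaround from the thread: name the real row separator explicitly so the
# "\r" inside the quoted field is treated as data, not as a row break.
row = line.parse_csv(row_sep: "\n")
p row                                   # ["aa\rbb"]

# Illustrative assumption: use one explicit separator for writing and reading.
ROW_SEP = "\r\n"
fields = ["aa\rbb", "plain", "line\nbreak"]
out = CSV.generate_line(fields, row_sep: ROW_SEP)
back = CSV.parse_line(out, row_sep: ROW_SEP)
p back == fields                        # true
```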
Updated by JEG2 (James Gray) over 11 years ago
- Status changed from Open to Rejected
- Assignee set to JEG2 (James Gray)
Sorry, I'm not sure how I missed this ticket. As Timothy says, this is intended and documented behavior:
# <b><tt>:row_sep</tt></b>::  The String appended to the end of each
#                             row. This can be set to the special
#                             <tt>:auto</tt> setting, which requests
#                             that CSV automatically discover this
#                             from the data. Auto-discovery reads
#                             ahead in the data looking for the next
#                             <tt>"\r\n"</tt>, <tt>"\n"</tt>, or
#                             <tt>"\r"</tt> sequence. A sequence
#                             will be selected even if it occurs in
#                             a quoted field, assuming that you
#                             would have the same line endings
#                             there. If none of those sequences is
#                             found, +data+ is <tt>ARGF</tt>,
#                             <tt>STDIN</tt>, <tt>STDOUT</tt>, or
#                             <tt>STDERR</tt>, or the stream is only
#                             available for output, the default
#                             <tt>$INPUT_RECORD_SEPARATOR</tt>
#                             (<tt>$/</tt>) is used. Obviously,
#                             discovery takes a little time. Set
#                             manually if speed is important. Also
#                             note that IO objects should be opened
#                             in binary mode on Windows if this
#                             feature will be used as the
#                             line-ending translation can cause
#                             problems with resetting the document
#                             position to where it was before the
#                             read ahead. This String will be
#                             transcoded into the data's Encoding
#                             before parsing.
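The documentation's note about Windows can be followed by opening the file with "rb", which disables newline translation so the read-ahead can rewind cleanly. The sketch below assumes a temporary file named rows.csv created only for illustration; it leaves :row_sep at its :auto default and then reports which separator was detected.

```ruby
require "csv"
require "tmpdir"

rows, detected = Dir.mktmpdir do |dir|
  path = File.join(dir, "rows.csv")        # hypothetical sample file
  File.binwrite(path, "a,b\r\n1,2\r\n")    # write CRLF endings byte-for-byte

  # "rb" = binary mode: no line-ending translation on Windows, so the
  # auto-detection read-ahead can reset the position correctly.
  File.open(path, "rb") do |io|
    csv = CSV.new(io)                      # :row_sep defaults to :auto
    [csv.read, csv.row_sep]
  end
end

p rows       # [["a", "b"], ["1", "2"]]
p detected
```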