Bug #3252
closed
1.9.2 -> 1.9.1 backport fail: invalid byte sequence in UTF-8 (ArgumentError): "\xA0;" =~ /\./
Description
=begin
Test case file:
# coding=utf-8
# vi: set fileencoding=utf-8 :
puts RUBY_DESCRIPTION
puts "ext: " << Encoding.default_external.to_s # UTF-8
puts "int: " << Encoding.default_internal.to_s
puts "loc: " << Encoding.locale_charmap.to_s # UTF-8
"\xA0;" =~ /./
produces:
$ ~/ruby19/bin/ruby test.rb
ruby 1.9.1p378 (2010-01-10 revision 26273) [i686-linux]
ext: UTF-8
int:
loc: UTF-8
test.rb:10:in `<main>': invalid byte sequence in UTF-8 (ArgumentError)
Synopsis of web research: this appears to be fixed in 1.9.2 (the fix must be in git only; the RC on the FTP server is not fixed), but meanwhile it is breaking things in 1.9.1.
I came across this when parsing an HTML page (a quirks-mode web page encoded as windows-1252) with Nokogiri (via mechanize, iirc)
(matchable with [\302\240] see: http://www.vitarara.org/cms/hpricot_to_nokogiri_day_1)
Related:
relevant history: http://redmine.ruby-lang.org/issues/show/2762
is this the/one patch to backport?: http://groups.google.com/group/rubyonrails-core/browse_thread/thread/5c1718cdbeb1ba17
http://redmine.ruby-lang.org/issues/show/1370 -- a year ago; still no 1.9.2 release nor 1.9.1 backport. boo hoo
not backported: http://redmine.ruby-lang.org/issues/show/1839
old (but good?): http://po-ru.com/diary/fixing-invalid-utf-8-in-ruby-revisited/
worked around: https://rails.lighthouseapp.com/projects/8994/tickets/2628-ruby-19-and-activesupport
Invalid byte sequence in UTF-8 error for anything but ASCII
http://www.redmine.org/boards/2/topics/9842
and so on ... http://www.google.com/search?q=ruby%20%22invalid%20byte%20sequence%20in%20utf-8%22
*** alternate vector of breakage?: ***
from the end of: http://blog.grayproductions.net/articles/ruby_19s_string
massi added 9 months later:
Hi,
I'm trying to render an image from mysql using send_data and I'm getting this error : invalid byte sequence in UTF-8 Here is my code :
def get_photo
  @image_data = Photo.find(params[:id])
  @image = @image_data.binary_data
  @url = @image_data.url
  send_data(@image, :type => 'image/jpeg',
            :filename => "#{params[:id]}.jpg",
            :disposition => 'inline')
end
BTW, I'm using ruby 1.9.1 with rails 2.3.5.
=end
Updated by nobu (Nobuyoshi Nakada) almost 14 years ago
- Status changed from Open to Rejected
=begin
"\xa0;" is an invalid byte sequence in UTF-8.
You might want to write "\u{a0};"?
=end
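The difference nobu points out can be checked directly. The following is a minimal sketch, assuming a UTF-8 source file; `try_match` is a hypothetical helper introduced only for this illustration. It compares the raw byte `\xA0` with the code point escape `\u{a0}`.

```ruby
# encoding: utf-8

# Hypothetical helper: returns the match index, or the exception
# raised while matching, so both outcomes can be inspected.
def try_match(str)
  str =~ /./
rescue ArgumentError => e
  e
end

raw  = "\xA0;"     # a lone byte 0xA0 followed by ";"
nbsp = "\u{a0};"   # U+00A0 NO-BREAK SPACE followed by ";"

puts raw.valid_encoding?    # false: 0xA0 alone is not a UTF-8 sequence
puts nbsp.valid_encoding?   # true
p nbsp.bytes.to_a           # [194, 160, 59], i.e. C2 A0 3B
p try_match(nbsp)           # 0
p try_match(raw).class      # ArgumentError
```

In UTF-8, U+00A0 is encoded as the two bytes C2 A0, which is why the report mentions matching it with [\302\240].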
Updated by naruse (Yui NARUSE) almost 14 years ago
=begin
Test case file:
(snip)
"\xA0;" =~ /./
As nobu said, "\xA0" is an invalid UTF-8 string.
Both 1.9.1 and 1.9.2 raise an error.
Did you confuse it with "\u00A0" or Windows-1252?
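If the bytes really came from a Windows-1252 page, as the report suggests, one possible approach is to label them with that encoding and transcode before matching. This is only a sketch; `cp1252_to_utf8` is a hypothetical helper name, and it assumes the input is in fact Windows-1252.

```ruby
# encoding: utf-8

# Hypothetical helper: treat the given bytes as Windows-1252 and
# transcode them to UTF-8. The input string is left untouched.
def cp1252_to_utf8(bytes)
  bytes.dup.force_encoding("Windows-1252").encode("UTF-8")
end

utf8 = cp1252_to_utf8("\xA0;")
puts utf8.valid_encoding?   # true
puts utf8 == "\u00A0;"      # true: 0xA0 is NO-BREAK SPACE in Windows-1252
p(utf8 =~ /./)              # 0
```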
relevant history: http://redmine.ruby-lang.org/issues/show/2762
This is an independent issue.
http://groups.google.com/group/rubyonrails-core/browse_thread/thread/5c1718cdbeb1ba17?pli=1
This may be related but it has a problem.
"it checks each byte's first 0 bit to determine its validity." breaks Shift_JIS, Windows-31J, CP949 and so on.
http://redmine.ruby-lang.org/issues/show/1370 -- a year ago; still no 1.9.2 release nor 1.9.1 backport. boo hoo
This is mswin-specific.
not backported: http://redmine.ruby-lang.org/issues/show/1839
This was backported in r24457.
old (but good?): http://po-ru.com/diary/fixing-invalid-utf-8-in-ruby-revisited/
This is not good; //IGNORE depends on glibc iconv / GNU libiconv.
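As a portable alternative to iconv's //IGNORE, later Ruby versions provide String#scrub. Note that this method only exists from Ruby 2.1 on, so it was not available on the 1.9.1 and 1.9.2 builds discussed in this thread. A minimal sketch, with `replace_invalid` as a hypothetical wrapper name:

```ruby
# encoding: utf-8

# Hypothetical wrapper around String#scrub (Ruby 2.1 and later):
# replaces each invalid byte sequence with the given replacement.
def replace_invalid(str, replacement = "?")
  str.scrub(replacement)
end

dirty   = "abc\xA0;def"
cleaned = replace_invalid(dirty)
puts cleaned                   # abc?;def
puts cleaned.valid_encoding?   # true
```

Replacing invalid bytes loses information, so this is best kept for input whose real encoding cannot be determined.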
worked around: https://rails.lighthouseapp.com/projects/8994/tickets/2628-ruby-19-and-activesupport
It seems to be a correct fix.
Invalid byte sequence in UTF-8 error for anything but ASCII
http://www.redmine.org/boards/2/topics/9842
Redmine is not ready for 1.9.
from the end of: http://blog.grayproductions.net/articles/ruby_19s_string
A binary string is not a UTF-8 string.
You must mark such strings as ASCII-8BIT.
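Marking binary data as ASCII-8BIT could look like the sketch below. The four bytes stand in for image data such as the JPEG blob in the quoted Rails example, and `as_binary` is a hypothetical helper name.

```ruby
# encoding: utf-8

# Hypothetical helper: return a copy of the data tagged as
# ASCII-8BIT (binary). The bytes themselves are not changed.
def as_binary(data)
  data.dup.force_encoding(Encoding::ASCII_8BIT)
end

blob = as_binary("\xFF\xD8\xFF\xE0")   # first bytes of a typical JPEG file
puts blob.encoding                     # ASCII-8BIT (BINARY on newer Rubies)
puts blob.valid_encoding?              # true: every byte is valid in binary
p(blob =~ /./)                         # 0, no ArgumentError
p blob.bytes.to_a.first(2)             # [255, 216]
```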
=end
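On naruse's objection to the byte-level validity check: whether a byte sequence is valid depends on the encoding it is read in, so a check built around UTF-8's bit layout will misjudge other multibyte encodings. A small sketch, with `valid_as?` as a hypothetical helper name, using the two bytes that encode HIRAGANA LETTER A in Shift_JIS:

```ruby
# encoding: utf-8

# Hypothetical helper: is this byte sequence valid when interpreted
# in the named encoding?
def valid_as?(bytes, encoding_name)
  bytes.dup.force_encoding(encoding_name).valid_encoding?
end

bytes = [0x82, 0xA0].pack("C*")      # both bytes have the high bit set

puts valid_as?(bytes, "Shift_JIS")   # true
puts valid_as?(bytes, "UTF-8")       # false: 0x82 cannot start a UTF-8 sequence

hiragana_a = bytes.dup.force_encoding("Shift_JIS").encode("UTF-8")
puts hiragana_a == "\u3042"          # true
```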