
Bug #3252

closed

1.9.2 -> 1.9.1 backport fail: invalid byte sequence in UTF-8 (ArgumentError): "\xA0;" =~ /\./

Added by volante (M L) almost 14 years ago. Updated almost 13 years ago.

Status: Rejected
Assignee: -
ruby -v: ruby 1.9.1p378 (2010-01-10 revision 26273) [i686-linux]
[ruby-core:30043]

Description

=begin
Test case file:

# coding=utf-8
# vi: set fileencoding=utf-8 :

puts RUBY_DESCRIPTION
puts "ext: " << Encoding.default_external.to_s # UTF-8
puts "int: " << Encoding.default_internal.to_s
puts "loc: " << Encoding.locale_charmap.to_s # UTF-8
"\xA0;" =~ /./

produces:

$ ~/ruby19/bin/ruby test.rb
ruby 1.9.1p378 (2010-01-10 revision 26273) [i686-linux]
ext: UTF-8
int:
loc: UTF-8
test.rb:10:in `<main>': invalid byte sequence in UTF-8 (ArgumentError)

Synopsis of web research: this seems to be fixed in 1.9.2 (it must be in git, since it is not fixed in the RC on FTP), but in the meantime it breaks things in 1.9.1.

I came across this when parsing an HTML &nbsp; (from a quirks-mode web page encoded as Windows-1252) into Nokogiri (via Mechanize, IIRC).

(matchable with [\302\240]; see http://www.vitarara.org/cms/hpricot_to_nokogiri_day_1)
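Why that bracket expression works can be checked with a minimal sketch (Ruby 1.9 or later assumed; the variable names are illustrative): byte 0xA0 is NO-BREAK SPACE in Windows-1252, and after transcoding to UTF-8 the same character is the two bytes 0xC2 0xA0, which are \302\240 in octal.

```ruby
# Byte 0xA0 is NO-BREAK SPACE in Windows-1252; label the raw bytes with that encoding.
raw = "\xA0;".dup.force_encoding("Windows-1252")

# Transcode to UTF-8: U+00A0 becomes the two-byte sequence 0xC2 0xA0.
utf8 = raw.encode("UTF-8")

p utf8 == "\u00A0;"                       # true
p utf8.bytes                              # [194, 160, 59]
p utf8.bytes.map { |b| b.to_s(8) }        # ["302", "240", "73"], i.e. \302\240 then ';'
p utf8 =~ /\u00A0/                        # 0
```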

Related:

relevant history: http://redmine.ruby-lang.org/issues/show/2762

is this the/one patch to backport?: http://groups.google.com/group/rubyonrails-core/browse_thread/thread/5c1718cdbeb1ba17

http://redmine.ruby-lang.org/issues/show/1370 -- a year ago; still no 1.9.2 release nor 1.9.1 backport. boo hoo

not backported: http://redmine.ruby-lang.org/issues/show/1839

old (but good?): http://po-ru.com/diary/fixing-invalid-utf-8-in-ruby-revisited/

worked around: https://rails.lighthouseapp.com/projects/8994/tickets/2628-ruby-19-and-activesupport

Invalid byte sequence in UTF-8 error for anything but ASCII
http://www.redmine.org/boards/2/topics/9842

and so on ... http://www.google.com/search?q=ruby%20%22invalid%20byte%20sequence%20in%20utf-8%22

*** alternate vector of breakage?: ***
from the end of: http://blog.grayproductions.net/articles/ruby_19s_string

massi added 9 months later:

Hi,

I'm trying to render an image from MySQL using send_data, and I'm getting this error: invalid byte sequence in UTF-8. Here is my code:

def get_photo
  @image_data = Photo.find(params[:id])
  @image = @image_data.binary_data
  @url = @image_data.url
  send_data(@image, :type => 'image/jpeg',
                    :filename => "#{params[:id]}.jpg",
                    :disposition => 'inline')
end

BTW, I'm using Ruby 1.9.1 with Rails 2.3.5.

=end


Files

test.rb (244 Bytes): minimal test case to break; added by volante (M L), 05/06/2010 01:20 PM
#1

Updated by nobu (Nobuyoshi Nakada) almost 14 years ago

  • Status changed from Open to Rejected

=begin
"\xa0;" is an invalid byte sequence in UTF-8.
Perhaps you meant to write "\u{a0};"?
=end
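A minimal sketch of the difference nobu points out (Ruby 1.9 or later assumed, with the same magic comment as the test case; variable names are illustrative): the \x escape inserts the single raw byte 0xA0, which is not valid UTF-8, while \u{a0} inserts the character U+00A0, encoded as the two valid bytes 0xC2 0xA0.

```ruby
# encoding: utf-8
bad  = "\xA0;"    # raw byte 0xA0 followed by ';': an invalid UTF-8 sequence
good = "\u{a0};"  # U+00A0 NO-BREAK SPACE (bytes 0xC2 0xA0) followed by ';'

p bad.valid_encoding?   # false
p good.valid_encoding?  # true

begin
  bad =~ /./            # regexp matching refuses a string with broken bytes
rescue ArgumentError => e
  puts e.message        # invalid byte sequence in UTF-8
end

p good =~ /./           # 0
```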

#2

Updated by naruse (Yui NARUSE) almost 14 years ago

=begin

Test case file:
(snip)
"\xA0;" =~ /./

As nobu said, "\xA0" is an invalid UTF-8 string.
Both 1.9.1 and 1.9.2 raise the error.
Are you confusing it with "\u00A0", or with Windows-1252?

relevant history: http://redmine.ruby-lang.org/issues/show/2762

This is an independent issue.

http://groups.google.com/group/rubyonrails-core/browse_thread/thread/5c1718cdbeb1ba17?pli=1

This may be related, but it has a problem:
"it checks each byte's first 0 bit to determine its validity" breaks Shift_JIS, Windows-31J, CP949, and so on.
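The objection can be illustrated with a minimal sketch (Ruby 1.9 or later assumed; the katakana character is just one convenient example): in Shift_JIS the second byte of a double-byte character can fall in the ASCII range, so a per-byte test that treats every byte with a 0 high bit as a standalone ASCII character misreads valid text.

```ruby
# U+30BD KATAKANA LETTER SO is 0x83 0x5C in Shift_JIS; 0x5C on its own is ASCII '\'.
sjis = "\u30BD".encode("Shift_JIS")

p sjis.bytes                        # [131, 92]
p sjis.valid_encoding?              # true: one valid double-byte character
p sjis.bytes.map { |b| b < 0x80 }   # [false, true]: a high-bit test sees a lone ASCII byte
p sjis.length                       # 1
```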

http://redmine.ruby-lang.org/issues/show/1370 -- a year ago; still no 1.9.2 release nor 1.9.1 backport. boo hoo

This is mswin-specific.

not backported: http://redmine.ruby-lang.org/issues/show/1839

This is backported in r24457.

old (but good?): http://po-ru.com/diary/fixing-invalid-utf-8-in-ruby-revisited/

This is not good: //IGNORE depends on glibc iconv / GNU libiconv.
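For comparison, a minimal sketch of dropping invalid bytes with Ruby's own transcoder instead of iconv's //IGNORE suffix. This is a different technique from the linked article, and an assumption about newer Rubies: String#scrub exists only in Ruby 2.1 and later, so it does not help on 1.9.1 itself.

```ruby
# encoding: utf-8
s = "abc\xA0;"  # UTF-8-tagged string containing the invalid byte 0xA0

# Ruby 2.1+: replace invalid byte sequences, here with the empty string.
p s.scrub("")   # "abc;"

# Alternatively, treat the bytes as binary and transcode to UTF-8,
# dropping every byte that has no defined mapping (anything >= 0x80).
p s.dup.force_encoding(Encoding::ASCII_8BIT).encode(
  "UTF-8", invalid: :replace, undef: :replace, replace: "")  # "abc;"
```

The second form also discards bytes that were part of valid multibyte characters, so it only suits data that is expected to be ASCII.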

worked around: https://rails.lighthouseapp.com/projects/8994/tickets/2628-ruby-19-and-activesupport

It seems to be the correct fix.

Invalid byte sequence in UTF-8 error for anything but ASCII
http://www.redmine.org/boards/2/topics/9842

Redmine is not 1.9 ready.

from the end of: http://blog.grayproductions.net/articles/ruby_19s_string

A binary string is not a UTF-8 string.
You must give such strings the ASCII-8BIT encoding.
=end
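A minimal sketch of naruse's last point, using a hypothetical byte string in place of real image data (0xFF 0xD8 is how a JPEG file starts): re-tagging the same bytes as ASCII-8BIT makes the string valid, so regexp matching no longer raises. Whether this is what the send_data example above needs (for instance, how the database adapter tags the column) is not covered by this sketch.

```ruby
# encoding: utf-8
# Hypothetical stand-in for binary data that arrived tagged as UTF-8.
data = "\xFF\xD8\xA0;"
p data.valid_encoding?   # false

# force_encoding only changes the label; the bytes stay the same.
bin = data.dup.force_encoding(Encoding::ASCII_8BIT)
p bin.valid_encoding?    # true: every byte is a valid binary character
p bin =~ /./             # 0
```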
