Bug #3780

RDoc::Parser.binary? broken for some utf8 files longer than 1024 bytes

Added by stepheneb (Stephen Bannasch) about 10 years ago. Updated over 9 years ago.

Status: Closed
Priority: Normal
Target version:
ruby -v: ruby 1.9.2p0 (2010-08-18 revision 29036) [x86_64-darwin10.4.0]
Backport:
[ruby-core:32003]

Description

=begin
When checking whether a file is binary, RDoc reads only the first 1024 bytes of it. If that cut falls in the middle of a multibyte UTF-8 character, the truncated sample has an invalid encoding, and RDoc exits with an error.
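A minimal Ruby sketch of that failure mode, using hypothetical sample data rather than a real source file (it relies on String#byteslice, available from Ruby 1.9.3 on): the string below is well-formed UTF-8, but its 1024th byte is the first half of a two-byte character, so the 1024-byte prefix is not.

```ruby
# Hypothetical sample: 1023 ASCII bytes followed by multibyte characters.
# "\u00e9" (e with acute accent) is two bytes in UTF-8 (0xC3 0xA9), so a
# 1024-byte cut lands between the two bytes of the first one.
text = ("a" * 1023) + ("\u00e9" * 10)

# Take the first 1024 bytes; the result is still tagged as UTF-8.
prefix = text.byteslice(0, 1024)

puts text.valid_encoding?    # prints "true": the whole string is well-formed
puts prefix.valid_encoding?  # prints "false": it ends with a lone 0xC3 lead byte
```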

I found this problem when running rdoc on the Ruby 1.9.2 source.

$ ruby -v
ruby 1.9.2p0 (2010-08-18 revision 29036) [x86_64-darwin10.4.0]
$ rdoc --version
rdoc 2.5.11

A fuller description of the bug, along with a patch containing a failing test, is in this issue on the RubyForge rdoc issue tracker:

http://rubyforge.org/tracker/index.php?func=detail&aid=28525&group_id=627&atid=2472

The same issue appears to exist in the 1_9 source; see: http://github.com/ruby/ruby/blob/trunk/lib/rdoc/parser.rb#L70

It isn't clear to me whether RDoc issues should be filed on RubyForge or here, so I've filed this issue in both places.

The file possible_fix.rb in this gist, http://gist.github.com/561350, shows how I changed RDoc::Parser.binary? locally. However, I don't think it is correct to classify every UTF-8 file that becomes invalid when truncated at 1024 bytes as binary.

The file show_parsing_error.rb in the same gist shows another strategy for working around the invalid encoding, but there are probably better ways to determine whether a file is binary.
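One possible direction, sketched here under stated assumptions: instead of treating an invalid 1024-byte sample as binary, trim an unfinished UTF-8 character from the end of the sample first, and only then apply the binary heuristics. The helpers utf8_prefix and looks_binary? are hypothetical names, not part of RDoc; this is neither the code in the gist nor RDoc's eventual fix. The sketch assumes the files being checked are UTF-8 or plain ASCII (a file in another encoding, such as EUC-JP, would be reported as binary), and it uses a NUL byte as its only binary marker.

```ruby
# Hypothetical helper (not part of RDoc): returns +bytes+ as a valid UTF-8
# string, dropping an unfinished character left at the end by a byte-count
# cut. Returns nil if the bytes are not valid UTF-8 even after trimming.
def utf8_prefix(bytes)
  s = bytes.dup.force_encoding(Encoding::UTF_8)
  return s if s.valid_encoding?

  # A UTF-8 character is at most 4 bytes, so a cut can strand at most 3.
  1.upto([3, s.bytesize].min) do |drop|
    head = s.byteslice(0, s.bytesize - drop)
    tail = s.byteslice(s.bytesize - drop, drop)
    lead = tail.getbyte(0)
    # The dropped tail must begin with a multibyte lead byte...
    next unless lead.between?(0xC2, 0xF4)
    # ...be shorter than the character that lead byte announces...
    needed = lead >= 0xF0 ? 4 : (lead >= 0xE0 ? 3 : 2)
    next unless drop < needed
    # ...and contain only continuation bytes (0b10xxxxxx) after the lead.
    next unless tail.bytes.drop(1).all? { |b| (b & 0xC0) == 0x80 }
    return head if head.valid_encoding?
  end
  nil
end

# Hypothetical binary check: samples the first +limit+ bytes of +path+.
def looks_binary?(path, limit = 1024)
  bytes = File.open(path, "rb") { |f| f.read(limit) }
  return false if bytes.nil?  # empty file: nothing to classify
  text = utf8_prefix(bytes)
  return true if text.nil?    # invalid UTF-8 beyond a split character
  text.include?("\0")         # NUL bytes are a common sign of binary data
end
```

Because the trimming only removes a lead byte plus continuation bytes that are too short to form a complete character, a file whose only problem is the 1024-byte cut is sampled as text, while genuinely malformed bytes earlier in the sample still make utf8_prefix return nil.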
=end