Bug #3780

RDoc::Parser.binary? broken for some utf8 files longer than 1024 bytes

Added by stepheneb (Stephen Bannasch) about 10 years ago. Updated over 9 years ago.

ruby -v:
ruby 1.9.2p0 (2010-08-18 revision 29036) [x86_64-darwin10.4.0]


When checking whether a file is binary, RDoc looks only at the first 1024 bytes of the file. If that cut falls in the middle of a multibyte utf8 character, the truncated string no longer has a valid encoding, and RDoc exits.
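The effect can be reproduced outside RDoc. The sketch below is only an illustration: `first_bytes` is a hypothetical helper, not RDoc's actual code, and the sample string is made up so that a two-byte character straddles byte 1024.

```ruby
# Hypothetical helper for illustration; not RDoc's actual code.
# Returns the first `limit` bytes of `str`, still labelled as UTF-8,
# roughly what a fixed-size read of a UTF-8 file would produce.
def first_bytes(str, limit)
  str.dup.force_encoding(Encoding::BINARY)[0, limit].force_encoding(Encoding::UTF_8)
end

# 1023 ASCII bytes followed by "é" (2 bytes in UTF-8), so byte 1024
# is the first half of a multibyte character.
def sample_text
  ("a" * 1023) + "\u00e9" + " tail"
end

puts sample_text.valid_encoding?                     # the whole string is valid UTF-8
puts first_bytes(sample_text, 1024).valid_encoding?  # the 1024-byte prefix is not
```

Any later operation that requires a valid encoding (a regexp match, for example) can then raise on the truncated string even though the file itself is fine.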

I found this problem when running rdoc on the ruby 1.9.2 source.

$ ruby -v
ruby 1.9.2p0 (2010-08-18 revision 29036) [x86_64-darwin10.4.0]
$ rdoc --version
rdoc 2.5.11

A fuller description of the bug and a patch with a failing test are in the corresponding issue on the RubyForge rdoc issue tracker.

The same issue appears to be in the 1_9 source, see:

I find it confusing to know where to file an RDoc issue, RubyForge or here, so I've created an issue in both places.

This gist (possible_fix.rb) shows how I changed RDoc::Parser.binary? locally, but I don't think it is correct to classify as binary every utf8 file that becomes invalid when truncated at 1024 bytes.

The same gist (show_parsing_error.rb) also shows another strategy for solving the invalid encoding issue, but there are probably better ways to determine whether a file is binary.
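For discussion, here is one possible direction. It is not the code from the gist and not RDoc's implementation. It is a minimal sketch that assumes the only damage done by truncation is an incomplete character at the very end of the chunk. Since a utf8 character is at most 4 bytes long, at most 3 trailing bytes can belong to a cut-off character. The method name `valid_utf8_prefix?` is hypothetical.

```ruby
# Hypothetical helper, not part of RDoc: treats a byte chunk as valid
# UTF-8 if it becomes valid after dropping at most 3 trailing bytes,
# i.e. the possible remains of a character cut off by truncation.
def valid_utf8_prefix?(chunk)
  raw = chunk.dup.force_encoding(Encoding::BINARY)
  max_drop = [3, raw.bytesize].min
  (0..max_drop).any? do |drop|
    raw[0, raw.bytesize - drop].force_encoding(Encoding::UTF_8).valid_encoding?
  end
end
```

This is only a heuristic: it also accepts a chunk that ends in up to 3 bytes of real garbage, and a chunk of 3 bytes or fewer always passes, so it would have to be combined with whatever other binary checks are in place.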
