RDoc::Parser.binary? broken for some utf8 files longer than 1024 bytes
RDoc reads only the first 1024 bytes of a file when checking whether the file is binary. If that cut falls in the middle of a multi-byte UTF-8 character, the truncated string no longer has a valid encoding, and this causes RDoc to exit.
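To illustrate the failure mode in isolation, here is a minimal sketch that does not use RDoc at all. The helper `truncate_bytes` is a hypothetical name of my own, not an RDoc method; it only mimics cutting a UTF-8 string at a fixed byte count.

```ruby
# encoding: utf-8

# Hypothetical helper (not part of RDoc): keep only the first `limit` bytes
# of `str`, then relabel the result with the original encoding.
def truncate_bytes(str, limit)
  enc = str.encoding
  str.dup.force_encoding(Encoding::BINARY)[0, limit].force_encoding(enc)
end

# 1023 ASCII bytes followed by a two-byte character ("\u00e9" is 0xC3 0xA9),
# so byte 1024 is the first half of that character.
sample = ("a" * 1023) + "\u00e9" + " tail"
head   = truncate_bytes(sample, 1024)

puts sample.valid_encoding?  # true
puts head.valid_encoding?    # false: the string ends with a lone 0xC3 byte
puts head.bytesize           # 1024
```

Any operation that requires a valid encoding, such as a regexp match, can then raise on the truncated string.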
I found this problem when running rdoc on the ruby 1.9.2 source.
$ ruby -v
ruby 1.9.2p0 (2010-08-18 revision 29036) [x86_64-darwin10.4.0]
$ rdoc --version
A fuller description of the bug and a patch with a failing test are in the issue I filed on the RubyForge rdoc issue tracker.
The same issue appears to be in the 1_9 source, see: http://github.com/ruby/ruby/blob/trunk/lib/rdoc/parser.rb#L70
It isn't clear to me where RDoc issues should be filed, RubyForge or here, so I've created an issue in both places.
This gist: http://gist.github.com/561350 (possible_fix.rb) shows how I changed RDoc::Parser.binary? locally, but I don't think it is correct to classify as binary every UTF-8 file that becomes invalid when truncated at 1024 bytes.
That same gist (show_parsing_error.rb) also shows another strategy for solving the invalid encoding issue, but there are probably better ways to determine whether a file is binary.
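For discussion, one further option (not what either file in the gist does, and not tested against RDoc itself) would be to drop a partial trailing character from the sample before classifying it. `trim_partial_char` below is a hypothetical helper name; the sketch assumes the only damage is an incomplete character at the very end of the sample.

```ruby
# encoding: utf-8

# Hypothetical helper (not part of RDoc): if `str` is invalid only because its
# last character was cut off, return a copy without the dangling bytes.
# A UTF-8 character is at most 4 bytes, so at most 3 bytes can be left over.
def trim_partial_char(str)
  return str if str.valid_encoding?

  bytes = str.dup.force_encoding(Encoding::BINARY)
  1.upto(3) do |n|
    break if n > bytes.bytesize
    candidate = bytes[0, bytes.bytesize - n].force_encoding(str.encoding)
    return candidate if candidate.valid_encoding?
  end

  str # still invalid: the problem is not just a truncated final character
end
```

A sample that is still invalid after trimming could then be treated as binary, while text files that were merely cut mid-character would go on to the usual checks.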