Bug #1332
Reading file on Windows is 500x slower than with previous Ruby version
Status: Closed
Description
=begin
time = [Time.new]
c = ''
'aaaa'.upto('zzzz') {|e| c << e}
3.times { c << c }
time << Time.new
File.open('out.file','w') { |f| f.write(c) }
time << Time.new
c = File.open('out.file','r') { |f| f.read }
time << Time.new
0.upto(time.size - 2) {|i| p "#{i} #{time[i+1]-time[i]}" }
ruby 1.9.1p0 (2009-01-30 revision 21907) [i386-mswin32]
"0 0.537075"
"1 0.696244"
"2 40.188834"
ruby 1.8.6 (2007-09-24 patchlevel 111) [i386-mswin32]
"0 0.551"
"1 0.133"
"2 0.087"
That is about 5x slower for the write and about 500x slower for the read. The times are the
same if I do:
f = File.new('out.file','r')
c = f.read
f.close
Tried on two machines, Vista SP1 and XP SP3: same results.
Tried with the virus scanner disabled: same results.
Tried on an old Win2K P4 2.4 GHz machine without a virus scanner:
"0 1.0625"
"1 1.09375"
"2 111.171875"
That's 111 seconds to read a 14,623,232-byte file, which is probably read from the cache anyway.
The problem doesn't seem to exist on Linux, although I have only tried Ruby 1.9.0.
by
TheR
=end
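A minimal sketch of the same measurement using Ruby's standard `benchmark` and `tmpdir` libraries. It adds one thing the original script lacks: a binary-mode (`'rb'`) read of the same file, so the cost of text mode can be compared directly, as later comments in this thread do. The helper name `build_payload` and the use of a temporary directory are illustrative choices, not part of the original report; absolute timings will depend on the Ruby version, OS and disk cache.

```ruby
require 'benchmark'
require 'tmpdir'

# Rebuild the payload from the report: 456,976 four-letter strings
# ('aaaa'..'zzzz', 4 bytes each), then doubled three times.
def build_payload
  c = +''
  'aaaa'.upto('zzzz') { |e| c << e }
  3.times { c << c }
  c
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'out.file')
  payload = build_payload

  write_t = Benchmark.realtime { File.open(path, 'w') { |f| f.write(payload) } }
  text_t  = Benchmark.realtime { File.open(path, 'r')  { |f| f.read } } # text mode
  bin_t   = Benchmark.realtime { File.open(path, 'rb') { |f| f.read } } # binary mode

  printf("bytes: %d  write: %.3fs  read 'r': %.3fs  read 'rb': %.3fs\n",
         payload.bytesize, write_t, text_t, bin_t)
end
```

The payload contains no newlines at all, so any gap between the `'r'` and `'rb'` timings comes from the text-mode read path itself rather than from actual line-ending changes.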
Updated by yugui (Yuki Sonoda) over 15 years ago
- Status changed from Open to Assigned
- Assignee set to akr (Akira Tanaka)
- Priority changed from Normal to 3
Updated by rogerdpack (Roger Pack) about 15 years ago
I believe this is related to other issues regarding reading files in non-binary (text) mode being slow in 1.9:
require 'benchmark'
a = File.open('l', 'w'); 10000000.times { a.write "abc\n" }; a.close
Benchmark.measure { a = File.open('l', 'r'); a.readlines; a.close }.real
=> 11.890625
Benchmark.measure { a = File.open('l', 'rb'); a.readlines; a.close }.real
=> 3.59375
I believe that it is doing a string conversion from one encoding ["\r\n"] to another ["\n"].
Perhaps there is a way to speed this up (e.g. by special-casing it somehow)?
-r
refs:
http://www.ruby-forum.com/topic/182691
http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-core/24824
Updated by usa (Usaku NAKAMURA) about 15 years ago
=begin
Hello,
In message "[ruby-core:26505] [Bug #1332] Reading file on Windows is 500x slower then with previous Ruby version"
on Nov.04,2009 04:50:49, redmine@ruby-lang.org wrote:
> I believe that it is doing a string conversion from one encoding ["\r\n"] to another ["\n"].
Right.
> Perhaps there is a way to speed this up? (ex: special case it somehow)?
Currently, we have implemented the newline conversion as a
transcode converter, just like encoding conversion.
But, as we found, the design of transcode is too general to use
for such a simple operation.
We want to find a better mechanism which doesn't deviate
from the current design of IO...
Regards,
U.Nakamura usa@garbagecollect.jp
=end
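To illustrate how small the special case is compared with general transcoding, here is a hypothetical CRLF-to-LF converter sketched in Ruby; it is not Ruby's actual IO implementation. It only has to collapse "\r\n" into "\n". The one subtlety is that a "\r" at the end of a read chunk must be held back, because its "\n" may arrive at the start of the next chunk. The names `each_lf_chunk` and `read_lf` and the 64 KiB default chunk size are assumptions for illustration.

```ruby
require 'stringio'

# Hypothetical sketch: convert "\r\n" to "\n" while reading fixed-size chunks.
def each_lf_chunk(io, chunk_size = 64 * 1024)
  pending_cr = false
  while (chunk = io.read(chunk_size))
    chunk = "\r" + chunk if pending_cr   # re-attach a held-back "\r"
    pending_cr = chunk.end_with?("\r")
    chunk = chunk.chop if pending_cr     # defer a trailing "\r" to the next chunk
    yield chunk.gsub("\r\n", "\n")       # no "\r\n" can straddle a boundary now
  end
  yield "\r" if pending_cr               # a lone "\r" at EOF is kept as-is
end

# Collect the converted output of an IO into one (binary) string.
def read_lf(io, chunk_size = 64 * 1024)
  out = String.new
  each_lf_chunk(io, chunk_size) { |part| out << part }
  out
end
```

For example, `read_lf(StringIO.new("a\r\nb"), 2)` splits the input as `"a\r"` and `"\nb"`, yet still returns `"a\nb"` because the trailing "\r" of the first chunk is carried over.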
Updated by jonforums (Jon Forums) about 15 years ago
=begin
> Currently, we has implemented the newline conversion as a
> transcode converter, just like encoding conversion.
> But the design of transcode is too general to use it such
> a simple operation, as our finding.
> We want to find a better mechanism which doesn't deviate
> from the current design of IO...
Do you think the current transcode design is also the cause of the issue below?
http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-core/24839
Jon
=end
Updated by rogerdpack (Roger Pack) about 15 years ago
A temporary workaround (though not actually binary compatible) appears to be:
Index: ruby.c
===================================================================
--- ruby.c (revision 25830)
+++ ruby.c (working copy)
@@ -1484,6 +1484,7 @@
int fd, mode = O_RDONLY;
#if defined DOSISH || defined __CYGWIN__
{
+ mode |= O_BINARY;
const char *ext = strrchr(fname, '.');
if (ext && STRCASECMP(ext, ".exe") == 0)
mode |= O_BINARY;
This causes all Ruby script files to be loaded in binary mode. The drawback is that if you have a Ruby script that was saved as text (with "\r\n" line endings) and contains string literals that span lines, those strings will have an extra "\r" in them, e.g.:
File.open('stringy.rb', 'wb') { |f| f.write(%Q{a="abc\r\ndef"; puts a.inspect}) }
normal ruby:
C:\>ruby stringy.rb
"abc\ndef"
patched ruby:
C:\>ruby stringy.rb
"abc\r\ndef"
But if your files were saved in binary mode (with "\n" line endings), the output will be the same.
And the slowdown is gone for now.
Hopefully a better fix can be created.
Thanks.
-r
Updated by usa (Usaku NAKAMURA) about 15 years ago
Hello,
In message "[ruby-core:26840] [Bug #1332] Reading file on Windows is 500x slower then with previous Ruby version"
on Nov.21,2009 08:10:45, redmine@ruby-lang.org wrote:
> This causes all ruby script files loaded to be loaded as binary. The drawback is that if you have a ruby script that was saved as ascii and contains strings that wrap lines, those strings will have an extra "\n" in them, ex:
The pseudo-IO DATA treats the script file itself as a data file.
So changing the default mode breaks compatibility for such
scripts.
Regards,
U.Nakamura usa@garbagecollect.jp
Updated by rogerdpack (Roger Pack) almost 15 years ago
It appears that:
- the writes have slowed down, "only" by about 100% (writing in text mode takes twice as long in 1.9 as in 1.8). Not terrible.
- the reads have slowed down by something like 40000% (!)
I think you can "hack a workaround" to avoid the slowdown with reads, like:
c = File.open('out.file','rb') { |f| f.read }
c.gsub!("\r\n", "\n")
But this seems like there might be a bug in there, too.
-rp
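The two-line workaround above can be wrapped in a small helper. This is a sketch under stated assumptions: the name `read_text_via_binary` is hypothetical, and because a binary read returns an ASCII-8BIT string, the helper re-tags the result as UTF-8 by default, which is an assumption about the file's contents rather than something the file records. It only collapses "\r\n" pairs, so it is not guaranteed to match text mode byte for byte in every edge case.

```ruby
require 'tmpdir'

# Read in binary mode (no newline conversion inside IO), then collapse
# "\r\n" pairs in a single gsub! pass.
def read_text_via_binary(path, encoding = Encoding::UTF_8)
  data = File.open(path, 'rb') { |f| f.read } # ASCII-8BIT string
  data.gsub!("\r\n", "\n")
  data.force_encoding(encoding)               # assumed encoding, see above
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'crlf.txt')
  File.open(path, 'wb') { |f| f.write("line1\r\nline2\r\n") }
  p read_text_via_binary(path) # prints "line1\nline2\n"
end
```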
Updated by mame (Yusuke Endoh) over 14 years ago
- Status changed from Assigned to Closed
Hi,
This was fixed at r27340.
The buffer was extended (realloc'ed) linearly, by a fixed amount each time, which resulted
in O(n^2) behavior. Now it is extended using the "double the memory when you run out"
rule, like String. So the problem is solved, I think.
Thanks,
--
Yusuke Endoh mame@tsg.ne.jp
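The effect of the growth policy described above can be sketched with a small cost model. Assuming every realloc copies the buffer's full current contents, growing by a fixed step copies O(n^2) bytes in total, while doubling copies roughly 2n bytes at most. The 8 KiB starting size and step below are hypothetical values chosen for illustration; they are not taken from r27340.

```ruby
# Hypothetical cost model: count the bytes copied while a buffer grows to
# hold `total` bytes, assuming each realloc copies the whole old buffer.
INITIAL = 8192 # assumed starting capacity (illustrative only)
STEP    = 8192 # assumed fixed increment for the linear policy

def bytes_copied(total, capacity = INITIAL)
  copied = 0
  while capacity < total
    copied += capacity         # realloc moves the existing contents
    capacity = yield(capacity) # the growth policy picks the new size
  end
  copied
end

total   = 14_623_232 # size of the file in the original report
linear  = bytes_copied(total) { |cap| cap + STEP } # O(n^2) total copying
doubled = bytes_copied(total) { |cap| cap * 2 }    # O(n) total copying

puts "fixed step: #{linear} bytes copied"
puts "doubling:   #{doubled} bytes copied"
```

With these assumed parameters and the 14,623,232-byte file from the report, the fixed-step policy copies 13,058,088,960 bytes (about 13 GB), while doubling copies 16,769,024 bytes (about 17 MB).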
Updated by rogerdpack (Roger Pack) over 14 years ago
Appears to be much better in trunk.
1.9.1:
"0 0.396039"
"1 0.352035"
"2 43.111311"
1.9.2:
"0 0.369037"
"1 0.513051"
"2 1.626163" # still 10x as slow as 1.8.6, but probably because of a different reason.
Thanks!
-rp
Updated by mame (Yusuke Endoh) over 14 years ago
Hi,
On 2010/4/16, Roger Pack <redmine@ruby-lang.org> wrote:
> 1.9.2:
> "0 0.369037" "1 0.513051" "2 1.626163" # still 10x as slow as 1.8.6, but probably because of a different reason.
Yes, text mode is still 10x to 30x slower than binary mode.
It is reproducible not only on Windows but also on Linux.
Perhaps this symptom has the cause explained
in [ruby-core:26515].
--
Yusuke ENDOH mame@tsg.ne.jp