Bug #1696
Updated by jeremyevans0 (Jeremy Evans) about 5 years ago
=begin I've been using a a bit of code which use open-uri and http urls to download files and have found this is unuseably slow. Example: require 'open-uri' open( <http uri> ) I did some benchmarking and the download of a 157MB file from the local machine was taking over 20 seconds and using 100% where as wget for the same file only took 0.7seconds. Digging some more and profiling the code with RubyProf revealed that during the execution of the download over 11,000 threads where being created. This I've tracked down to the net/protocol module and the Net::BufferIO::rbuf_fill method which is using a timeout block to wrap the @io.sysread(1024) call. This is clearly an extremely bad way to do this, totally crippling the performance. Below are benchmarks and some profiles of this issue and various quick fixes I tested here == 1.8.6 core == user system total real open 15.664062 6.164062 21.828125 ( 22.810859) %self total self wait child calls name 14.04 5.27 3.00 2.27 0.00 109426 <Class::Thread>#start (ruby_runtime:0} 10.97 13.14 2.34 0.00 10.80 109427 <Module::Timeout>#timeout (/usr/local/lib/ruby/1.8/timeout.rb:52} 6.65 2.52 1.42 1.10 0.00 109426 Thread#kill (ruby_runtime:0} 5.78 21.36 1.23 0.00 20.12 1 Net::BufferedIO#read (/usr/local/lib/ruby/1.8/net/protocol.rb:79} 5.27 14.27 1.12 0.00 13.14 109427 Object#timeout (/usr/local/lib/ruby/1.8/timeout.rb:92} 5.12 1.11 1.09 0.02 0.00 109426 IO#sysread (ruby_runtime:0} 4.75 2.61 1.02 0.00 1.59 109426 OpenURI::Buffer#<< (/usr/local/lib/ruby/1.8/open-uri.rb:297} 3.40 3.62 0.73 0.00 2.89 109426 Net::ReadAdapter#call_block (/usr/local/lib/ruby/1.8/net/protocol.rb:380} 2.52 0.54 0.54 0.00 0.00 109435 String#slice! (ruby_runtime:0} 2.30 1.33 0.49 0.00 0.84 109415 #<Class:0x801ac6f30>#<< ((eval):1} 2.08 0.50 0.45 0.00 0.05 109427 Class#initialize (ruby_runtime:0} 2.08 0.98 0.45 0.00 0.54 109435 Net::BufferedIO#rbuf_consume (/usr/local/lib/ruby/1.8/net/protocol.rb:137} 2.01 4.05 0.43 0.00 3.62 109426 Net::ReadAdapter#<< (/usr/local/lib/ruby/1.8/net/protocol.rb:371} 1.94 0.41 0.41 0.00 0.00 109416 IO#write (ruby_runtime:0} 1.72 1.03 0.37 0.00 0.66 109435 Class#new (ruby_runtime:0} 1.50 0.32 0.32 0.00 0.00 328286 String#size (ruby_runtime:0} 1.32 14.55 0.28 0.00 14.27 109426 Net::BufferedIO#rbuf_fill (/usr/local/lib/ruby/1.8/net/protocol.rb:131} 1.21 0.84 0.26 0.00 0.58 109417 Kernel#__send__ (ruby_runtime:0} 1.13 0.24 0.24 0.00 0.00 109429 String#<< (ruby_runtime:0} 0.99 0.30 0.21 0.00 0.09 109428 Fixnum#== (ruby_runtime:0} 0.84 0.18 0.18 0.00 0.00 328289 Fixnum#+ (ruby_runtime:0} 0.77 0.16 0.16 0.00 0.00 109427 <Class::Class>#allocate (ruby_runtime:0} 0.77 0.58 0.16 0.00 0.41 109415 IO#<< (ruby_runtime:0} 0.73 0.28 0.16 0.00 0.12 109475 Hash#[] (ruby_runtime:0} ... == 1.8.6 core + no timeout block == user system total real open 1.757812 0.765625 2.523438 ( 2.873194) %self total self wait child calls name 21.62 2.63 2.63 0.00 0.00 109426 IO#sysread (ruby_runtime:0} 15.72 1.91 1.91 0.00 0.00 109429 String#<< (ruby_runtime:0} 13.98 1.70 1.70 0.00 0.00 109416 IO#write (ruby_runtime:0} 9.56 12.16 1.16 0.00 10.99 1 Net::BufferedIO#read (/usr/local/lib/ruby/1.8/net/protocol.rb:79} 7.83 3.73 0.95 0.00 2.77 109426 OpenURI::Buffer#<< (/usr/local/lib/ruby/1.8/open-uri.rb:297} 5.58 4.68 0.68 0.00 4.00 109426 Net::ReadAdapter#call_block (/usr/local/lib/ruby/1.8/net/protocol.rb:385} 3.66 4.99 0.45 0.00 4.55 109426 Net::BufferedIO#rbuf_fill (/usr/local/lib/ruby/1.8/net/protocol.rb:138} 3.34 2.38 0.41 0.00 1.97 109415 #<Class:0x801ac82b8>#<< ((eval):1} 3.27 0.53 0.40 0.00 0.13 109435 Net::BufferedIO#rbuf_consume (/usr/local/lib/ruby/1.8/net/protocol.rb:142} 2.95 5.04 0.36 0.00 4.68 109426 Net::ReadAdapter#<< (/usr/local/lib/ruby/1.8/net/protocol.rb:376} 2.25 0.27 0.27 0.00 0.00 328289 Fixnum#+ (ruby_runtime:0} 1.54 0.19 0.19 0.00 0.00 328286 String#size (ruby_runtime:0} 1.28 0.27 0.16 0.00 0.12 109475 Hash#[] (ruby_runtime:0} 1.22 0.15 0.15 0.00 0.00 109431 Module#=== (ruby_runtime:0} 1.22 1.85 0.15 0.00 1.70 109415 IO#<< (ruby_runtime:0} 1.09 0.13 0.13 0.00 0.00 109435 String#slice! (ruby_runtime:0} 0.96 1.97 0.12 0.00 1.85 109417 Kernel#__send__ (ruby_runtime:0} 0.96 0.12 0.12 0.00 0.00 109426 String#length (ruby_runtime:0} 0.96 0.12 0.12 0.00 0.00 109435 Hash#default (ruby_runtime:0} 0.71 0.09 0.09 0.00 0.00 109439 Fixnum#< (ruby_runtime:0} ... == 1.8.6 core + no timeout block + 10K sysread == user system total real open 0.570312 1.406250 1.976562 ( 2.521614) %self total self wait child calls name 27.32 0.39 0.39 0.00 0.00 10953 String#slice! (ruby_runtime:0} 18.58 0.27 0.27 0.00 0.00 10943 IO#write (ruby_runtime:0} 12.02 0.17 0.17 0.00 0.00 10944 IO#sysread (ruby_runtime:0} 8.74 1.42 0.12 0.00 1.30 1 Net::BufferedIO#read (/usr/local/lib/ruby/1.8/net/protocol.rb:79} 7.10 0.50 0.10 0.00 0.40 10944 OpenURI::Buffer#<< (/usr/local/lib/ruby/1.8/open-uri.rb:297} 5.46 0.37 0.08 0.00 0.29 10942 #<Class:0x801ac8b28>#<< ((eval):1} 3.28 0.05 0.05 0.00 0.00 32840 String#size (ruby_runtime:0} 2.73 0.55 0.04 0.00 0.52 10944 Net::ReadAdapter#call_block (/usr/local/lib/ruby/1.8/net/protocol.rb:385} 2.73 0.04 0.04 0.00 0.00 32843 Fixnum#+ (ruby_runtime:0} 2.73 0.59 0.04 0.00 0.55 10944 Net::ReadAdapter#<< (/usr/local/lib/ruby/1.8/net/protocol.rb:376} 1.64 0.21 0.02 0.00 0.19 10944 Net::BufferedIO#rbuf_fill (/usr/local/lib/ruby/1.8/net/protocol.rb:138} 1.64 0.41 0.02 0.00 0.39 10953 Net::BufferedIO#rbuf_consume (/usr/local/lib/ruby/1.8/net/protocol.rb:142} 1.09 0.02 0.02 0.00 0.00 10944 String#length (ruby_runtime:0} 1.09 0.02 0.02 0.00 0.00 10947 String#<< (ruby_runtime:0} 1.09 0.02 0.02 0.00 0.00 10953 Hash#default (ruby_runtime:0} 1.09 0.28 0.02 0.00 0.27 10942 IO#<< (ruby_runtime:0} ... == 1.8.6 core + no timeout block + 100K sysread == user system total real open 0.390625 0.484375 0.875000 ( 1.013516) %self total self wait child calls name 45.45 0.43 0.43 0.00 0.00 1098 IO#sysread (ruby_runtime:0} 33.88 0.32 0.32 0.00 0.00 1097 IO#write (ruby_runtime:0} 8.26 0.08 0.08 0.00 0.00 1107 String#slice! (ruby_runtime:0} 1.65 0.45 0.02 0.00 0.44 1098 Net::BufferedIO#rbuf_fill (/usr/local/lib/ruby/1.8/net/protocol.rb:138} 1.65 0.02 0.02 0.00 0.00 3302 String#size (ruby_runtime:0} 1.65 0.35 0.02 0.00 0.34 1098 OpenURI::Buffer#<< (/usr/local/lib/ruby/1.8/open-uri.rb:297} 1.65 0.93 0.02 0.00 0.91 1 Net::BufferedIO#read (/usr/local/lib/ruby/1.8/net/protocol.rb:79} 0.83 0.01 0.01 0.00 0.00 1 URI::HTTP#initialize (/usr/local/lib/ruby/1.8/uri/http.rb:77} 0.83 0.09 0.01 0.00 0.08 1107 Net::BufferedIO#rbuf_consume (/usr/local/lib/ruby/1.8/net/protocol.rb:142} 0.83 0.01 0.01 0.00 0.00 357 Module#method_added (ruby_runtime:0} 0.83 0.01 0.01 0.00 0.00 37 <Class::Dir>#[] (ruby_runtime:0} 0.83 0.01 0.01 0.00 0.00 1101 String#<< (ruby_runtime:0} 0.83 0.36 0.01 0.00 0.35 1098 Net::ReadAdapter#call_block (/usr/local/lib/ruby/1.8/net/protocol.rb:385} 0.83 0.33 0.01 0.00 0.32 1096 IO#<< (ruby_runtime:0} ... == 1.8.6 core + no timeout block + 1M sysread == user system total real open 0.445312 0.335938 0.781250 ( 0.913736) %self total self wait child calls name 29.66 0.27 0.27 0.00 0.00 157 IO#sysread (ruby_runtime:0} 28.81 0.27 0.27 0.00 0.00 166 String#slice! (ruby_runtime:0} 28.81 0.27 0.27 0.00 0.00 156 IO#write (ruby_runtime:0} 7.63 0.07 0.07 0.00 0.00 160 String#<< (ruby_runtime:0} 1.69 0.30 0.02 0.00 0.28 157 Net::ReadAdapter#call_block (/usr/local/lib/ruby/1.8/net/protocol.rb:385} 0.85 0.01 0.01 0.00 0.00 37 <Class::Dir>#[] (ruby_runtime:0} 0.85 0.01 0.01 0.00 0.00 98 Class#inherited (ruby_runtime:0} 0.85 0.02 0.01 0.00 0.02 2 Kernel#gem_original_require (ruby_runtime:0} 0.85 0.28 0.01 0.00 0.27 157 OpenURI::Buffer#<< (/usr/local/lib/ruby/1.8/open-uri.rb:297} ... == 1.8.6 core + no timeout block + 1M sysread + slice bypass == user system total real open 0.093750 0.523438 0.617188 ( 0.717748) %self total self wait child calls name 66.67 0.23 0.23 0.00 0.00 52 IO#write (ruby_runtime:0} 13.33 0.05 0.05 0.00 0.00 52 IO#sysread (ruby_runtime:0} 4.44 0.02 0.02 0.00 0.00 55 String#<< (ruby_runtime:0} 4.44 0.02 0.02 0.00 0.00 1 TCPSocket#initialize (ruby_runtime:0} 2.22 0.01 0.01 0.00 0.00 101 Hash#[] (ruby_runtime:0} 2.22 0.01 0.01 0.00 0.00 10 String#slice! (ruby_runtime:0} 2.22 0.02 0.01 0.00 0.01 2 Kernel#gem_original_require (ruby_runtime:0} 2.22 0.25 0.01 0.00 0.24 52 OpenURI::Buffer#<< (/usr/local/lib/ruby/1.8/open-uri.rb:297} 2.22 0.01 0.01 0.00 0.00 37 <Class::Dir>#[] (ruby_runtime:0} ... Note: The change does also raises questions about the core slice method not performing as well as it might for the no-op case. == 1.8.6 core + select timeout + 1M sysread + slice bypass == user system total real open 0.070312 0.226562 0.296875 ( 0.898285) %self total self wait child calls name 64.10 0.20 0.20 0.00 0.00 49 IO#write (ruby_runtime:0} 15.38 0.05 0.05 0.00 0.00 49 IO#sysread (ruby_runtime:0} 7.69 0.02 0.02 0.00 0.00 52 String#<< (ruby_runtime:0} 5.13 0.23 0.02 0.00 0.21 49 OpenURI::Buffer#<< (/usr/local/lib/ruby/1.8/open-uri.rb:297} 2.56 0.02 0.01 0.00 0.02 2 Kernel#gem_original_require (ruby_runtime:0} 2.56 0.01 0.01 0.00 0.00 37 <Class::Dir>#[] (ruby_runtime:0} 2.56 0.02 0.01 0.00 0.01 3 Array#each (ruby_runtime:0} ... So overall these changes improve the speed from 22seconds to 0.7seconds in my test case making ruby comparable with wget's performance. The patch for this last run is attached and should be compatible with existing functionality through the use of select + raise Timeout::Error instead of timeout( @read_timeout ) { .. } Hope this helps, any questions let me know =end