https://redmine.ruby-lang.org/https://redmine.ruby-lang.org/favicon.ico?17113305112014-07-06T07:47:43ZRuby Issue Tracking SystemRuby master - Bug #10009: IO operation is 10x slower in multi-thread environmenthttps://redmine.ruby-lang.org/issues/10009?journal_id=476022014-07-06T07:47:43Zariveira (Alexandre Riveira)alexandre@objectdata.com.br
<ul></ul><p>My environment is Debian 3.2.0-4-amd64</p> Ruby master - Bug #10009: IO operation is 10x slower in multi-thread environmenthttps://redmine.ruby-lang.org/issues/10009?journal_id=476042014-07-06T13:28:26Zariveira (Alexandre Riveira)alexandre@objectdata.com.br
<ul></ul><p>Alexandre Riveira wrote:<br>
I ran the tests using Rubinius.<br>
Rubinius uses only 1 processor because taskset was applied. Results:</p>
<p>first 18164692<br>
second 10007 <==========<br>
third 18184825</p> Ruby master - Bug #10009: IO operation is 10x slower in multi-thread environmenthttps://redmine.ruby-lang.org/issues/10009?journal_id=476392014-07-08T01:21:55Znormalperson (Eric Wong)normalperson@yhbt.net
<ul></ul><p>I'll try resurrecting an old eventfd proposal and maybe also bare futexes<br>
to see if that improves things.</p> Ruby master - Bug #10009: IO operation is 10x slower in multi-thread environmenthttps://redmine.ruby-lang.org/issues/10009?journal_id=476432014-07-08T09:56:41Zariveira (Alexandre Riveira)alexandre@objectdata.com.br
<ul><li><strong>File</strong> <a href="/attachments/4525">teste_thread_schedule.py</a> <a class="icon-only icon-download" title="Download" href="/attachments/download/4525/teste_thread_schedule.py">teste_thread_schedule.py</a> added</li><li><strong>File</strong> <a href="/attachments/4526">teste_thread_schedule.rb</a> <a class="icon-only icon-download" title="Download" href="/attachments/download/4526/teste_thread_schedule.rb">teste_thread_schedule.rb</a> added</li></ul><p>Eric Wong wrote:</p>
<blockquote>
<p>I'll try resurrecting an old eventfd proposal and maybe also bare futexes<br>
to see if that improves things.</p>
</blockquote>
<p>Thanks, Eric,</p>
<p>If an application running on Rainbows! has even one thread using 100% CPU, database queries in the whole worker are severely affected. The workaround is to fork a separate worker for heavy tasks, but often this is not possible.</p>
<p>Python's GIL behaves better, but Python uses 160% CPU, while Ruby uses 100% CPU.</p> Ruby master - Bug #10009: IO operation is 10x slower in multi-thread environmenthttps://redmine.ruby-lang.org/issues/10009?journal_id=476472014-07-08T20:37:26Znormalperson (Eric Wong)normalperson@yhbt.net
<ul><li><strong>File</strong> <a href="/attachments/4528">test_thread_sched_pipe.rb</a> <a class="icon-only icon-download" title="Download" href="/attachments/download/4528/test_thread_sched_pipe.rb">test_thread_sched_pipe.rb</a> added</li><li><strong>Description</strong> updated (<a title="View differences" href="/journals/47647/diff?detail_id=34434">diff</a>)</li></ul><p>eventfd doesn't help performance (but still reduces FD count),<br>
I never expected eventfd to improve speed, though.</p>
<p>Lowering TIME_QUANTUM_USEC (in thread_pthread.c) helps with the I/O case<br>
(try it yourself if you have a 1000HZ kernel); but hurts overall<br>
throughput.</p>
<p>Attached is an I/O benchmark using pipes, without the Postgres requirement.<br>
Increasing GVL (or any lock) performance is tricky because we need to<br>
balance fairness and avoid starvation cases. The GVL was rewritten to<br>
avoid starvation in 1.9.3, so that's likely the cause of the major<br>
difference starting with 1.9.3.</p>
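<p>A minimal sketch of this kind of pipe-based benchmark follows. It is not the attached test_thread_sched_pipe.rb; the names <code>pipe_bench</code>, <code>spins</code> and <code>round_trips</code> are hypothetical. It assumes that on MRI the blocking pipe calls release the GVL, so the I/O thread has to win the lock back from the CPU-bound thread after every call.</p>

```ruby
# Hypothetical sketch, NOT the attached test_thread_sched_pipe.rb.
# One thread burns CPU while another does blocking pipe round trips,
# so the two counters show how the runtime shares time between them.
def pipe_bench(seconds)
  spins = 0
  round_trips = 0
  stop = false
  r, w = IO.pipe

  # CPU-bound thread: never blocks, only yields when preempted.
  spinner = Thread.new { spins += 1 until stop }

  # I/O thread: each iteration writes one byte and reads it back.
  io_thread = Thread.new do
    until stop
      w.write("x")
      r.read(1)
      round_trips += 1
    end
  end

  sleep seconds
  stop = true
  [spinner, io_thread].each(&:join)
  r.close
  w.close
  { spins: spins, round_trips: round_trips }
end

result = pipe_bench(1.0)
puts "spins: #{result[:spins]}, pipe round trips: #{result[:round_trips]}"
```

<p>Comparing the two counters across Ruby versions, or across builds with a different TIME_QUANTUM_USEC, should give a rough idea of how much the I/O thread is starved; the absolute numbers depend heavily on the machine and kernel.</p>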
<p>I doubt I can noticeably improve performance with futexes vs mutex/condvar.</p>
<p>How much does GVL performance between 1.9.2 and 2.1 affect real-world<br>
performance on Rainbows!/yahns apps for you? (not "hello world"-type<br>
apps).</p>
<p>I hope to make GVL optional in a few years, but that is tricky.<br>
Ironically, part of the reason I don't like GVL is I don't want to pay<br>
any threading/locking costs for tiny single-threaded apps, either :)</p> Ruby master - Bug #10009: IO operation is 10x slower in multi-thread environmenthttps://redmine.ruby-lang.org/issues/10009?journal_id=476492014-07-09T01:28:31Zariveira (Alexandre Riveira)alexandre@objectdata.com.br
<ul></ul><p>My application is not a website; it is an ERP, so it runs reports and other very heavy tasks. The system then breaks down, because a single thread using 100% CPU degrades the whole worker: every request takes at least 2 seconds, and the requests pile up.<br>
The key point is that one thread using 100% CPU leaves all the other worker threads able to make only a few requests to Postgres.</p>
<p>Ruby without the GVL: is that possible?</p>
<p>I believe Python works best because it uses part of another CPU (160%) to manage all the threads.<br>
Running the same test with PyPy, it uses 100% CPU like Ruby and shows the same problems as Ruby.</p> Ruby master - Bug #10009: IO operation is 10x slower in multi-thread environmenthttps://redmine.ruby-lang.org/issues/10009?journal_id=476502014-07-09T01:46:07Zariveira (Alexandre Riveira)alexandre@objectdata.com.br
<ul></ul><p>Some information that I consider important:<br>
With a BFS kernel, Ruby 1.9.2 works fine, as if taskset had been applied.<br>
Other kernels, such as FreeBSD and Mac OS, show similar behavior with Ruby 1.9.2.</p>
<p><a href="http://en.wikipedia.org/wiki/Brain_Fuck_Scheduler" class="external">http://en.wikipedia.org/wiki/Brain_Fuck_Scheduler</a><br>
<a href="https://wiki.archlinux.org/index.php/linux-ck" class="external">https://wiki.archlinux.org/index.php/linux-ck</a></p> Ruby master - Bug #10009: IO operation is 10x slower in multi-thread environmenthttps://redmine.ruby-lang.org/issues/10009?journal_id=476732014-07-10T06:04:17Zariveira (Alexandre Riveira)alexandre@objectdata.com.br
<ul></ul><p>Alexandre Riveira wrote:</p>
<blockquote>
<p>Some information that I consider important:<br>
With a BFS kernel, Ruby 1.9.2 works fine, as if taskset had been applied.<br>
Other kernels, such as FreeBSD and Mac OS, show similar behavior with Ruby 1.9.2.</p>
<p><a href="http://en.wikipedia.org/wiki/Brain_Fuck_Scheduler" class="external">http://en.wikipedia.org/wiki/Brain_Fuck_Scheduler</a><br>
<a href="https://wiki.archlinux.org/index.php/linux-ck" class="external">https://wiki.archlinux.org/index.php/linux-ck</a></p>
</blockquote>
<p>My results with the Linux BFS/CK kernel + taskset:</p>
<p>first 103214331<br>
second 2762 <======<br>
third 24259986</p> Ruby master - Bug #10009: IO operation is 10x slower in multi-thread environmenthttps://redmine.ruby-lang.org/issues/10009?journal_id=477232014-07-11T20:34:57Zariveira (Alexandre Riveira)alexandre@objectdata.com.br
<ul></ul><p>Eric Wong wrote:</p>
<blockquote>
<p>Lowering TIME_QUANTUM_USEC (in thread_pthread.c) helps with the I/O case<br>
(try it yourself if you have a 1000HZ kernel); but hurts overall<br>
throughput.</p>
</blockquote>
<p>Hello Eric!!!!</p>
<p>I was delighted with the result of changing TIME_QUANTUM_USEC. I changed its value to 1000; see the results:</p>
<p>ruby 2.<br>
first 17434583<br>
second 2754 <=============<br>
third 16752441</p>
<p>If there are any problems, I will try 10 * 1000.</p>
<p>It seems incredible, because there was no need to apply taskset.<br>
As this is only a microbenchmark, I'll run more tests and, if all goes well, put it into production. I will report back afterwards.</p> Ruby master - Bug #10009: IO operation is 10x slower in multi-thread environmenthttps://redmine.ruby-lang.org/issues/10009?journal_id=477242014-07-11T22:31:08Zariveira (Alexandre Riveira)alexandre@objectdata.com.br
<ul></ul><p>Alexandre Riveira wrote:</p>
<blockquote>
<p>Eric Wong wrote:</p>
<blockquote>
<p>Lowering TIME_QUANTUM_USEC (in thread_pthread.c) helps with the I/O case<br>
(try it yourself if you have a 1000HZ kernel); but hurts overall<br>
throughput.</p>
</blockquote>
<p>Hello Eric!!!!</p>
<p>I was delighted with the result of changing TIME_QUANTUM_USEC. I changed its value to 1000</p>
</blockquote>
<p>Tests complete. Without the change, my system under stress tests took 30 seconds to load a page; after the change, pages load almost instantly, all of them in less than 1 second.</p> Ruby master - Bug #10009: IO operation is 10x slower in multi-thread environmenthttps://redmine.ruby-lang.org/issues/10009?journal_id=477642014-07-14T20:10:54Znormalperson (Eric Wong)normalperson@yhbt.net
<ul></ul><p>Good to know it works for you. Keep in mind TIME_QUANTUM_USEC=1000 is<br>
very low and may cause problems on some systems, too.</p>
<p>My gut feeling is 100ms (default) is too high, but 10ms is too low<br>
(based on kosaki's comment). Maybe 20ms - 50ms is acceptable. There is<br>
a wide variety of configurations we must work with (even just on Linux).</p>
<p>Can you try 20-50ms?</p>
<p>About GVL:<br>
Replacing GVL with fine-grained locks is possible (and ko1 tried it),<br>
but performance suffered for single-thread cases.<br>
It should be possible to do with lock-free techniques, but that is<br>
difficult to get right.</p> Ruby master - Bug #10009: IO operation is 10x slower in multi-thread environmenthttps://redmine.ruby-lang.org/issues/10009?journal_id=477652014-07-15T01:24:49Zariveira (Alexandre Riveira)alexandre@objectdata.com.br
<ul></ul><p>Hi Eric !</p>
<p>Eric Wong wrote:</p>
<blockquote>
<p>Good to know it works for you. Keep in mind TIME_QUANTUM_USEC=1000 is</p>
</blockquote>
<p>What problems might I have?</p>
<blockquote>
<p>Can you try 20-50ms?</p>
</blockquote>
<p>In the application, I ran a stress test in which 5 threads overload it.</p>
<p>With 50 ms, the latency is close to 15 seconds.<br>
With 20 ms, the latency is close to 10 seconds.<br>
With 10 ms, the latency is 4 or 5 seconds.</p>
<p>The magic number is TIME_QUANTUM_USEC=1000. There is no latency in this case.</p>
<p>The results of the microbenchmark teste_thread_schedule_2 with Postgres follow.</p>
<p>TIME_QUANTUM_USEC (1000)<br>
first 22882400<br>
second 2654 <===<br>
third 22642172<br>
in 21.08 seconds</p>
<p>2654 / 21.08 is about 126 database connections per second.</p>
<p>TIME_QUANTUM_USEC (20 * 1000)<br>
first 33003617<br>
second 258 <==<br>
third 33851933<br>
in 23.07 seconds<br>
258 / 23.07 is about 11 database connections per second. I think this is a small number of connections per second, but I welcome comments.</p>
<p>TIME_QUANTUM_USEC (50 * 1000)<br>
first 42811975<br>
second 116<br>
third 42005480<br>
in 25.12 seconds</p>
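<p>The per-second rates quoted in this comment can be recomputed with a short script. This is only a sketch: the figures are copied from the results above, and the names <code>runs</code> and <code>rates</code> are illustrative.</p>

```ruby
# Figures copied from the results in this comment: the quantum in
# microseconds, the Postgres connections completed (the "second" line),
# and the wall-clock time of the run in seconds.
runs = [
  { quantum_usec: 1_000,  connections: 2654, seconds: 21.08 },
  { quantum_usec: 20_000, connections: 258,  seconds: 23.07 },
  { quantum_usec: 50_000, connections: 116,  seconds: 25.12 },
]

# Connections per second = completed connections / elapsed seconds.
rates = runs.map do |run|
  run.merge(per_second: run[:connections] / run[:seconds])
end

rates.each do |run|
  printf("TIME_QUANTUM_USEC=%-6d %8.1f connections/s\n",
         run[:quantum_usec], run[:per_second])
end
```

<p>Dividing by the elapsed time rather than the nominal 20 seconds keeps the three runs comparable, since each run took a different amount of time.</p>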
<p>116 / 25.12 is about 5 database connections per second.</p> Ruby master - Bug #10009: IO operation is 10x slower in multi-thread environmenthttps://redmine.ruby-lang.org/issues/10009?journal_id=483722014-08-16T08:38:44Znormalperson (Eric Wong)normalperson@yhbt.net
<ul></ul><p><a href="mailto:normalperson@yhbt.net" class="email">normalperson@yhbt.net</a> wrote:</p>
<blockquote>
<p>I doubt I can noticeably improve performance with futexes vs mutex/condvar.</p>
</blockquote>
<p>Totally not-speed-optimized futex-based lock/condvar implementation at</p>
<pre><code>git://bogomips.org/ruby.git (futex branch)
http://bogomips.org/ruby.git/patch?id=ae93c50c8de
</code></pre>
<p>I am not sure if my implementation is correct, but "make check" passes<br>
with both 8 cores and 1 core active (8-core Vishera). I will probably<br>
write an independent (C-only) test for more parallelism and maybe steal<br>
some from glibc (I also plan on using this futex-based lock<br>
implementation outside of Ruby).</p>
<p>Benchmarks don't seem to show much (if any) improvement, yet. Speed<br>
improvement from reimplementing GVL around bare futex interface may be<br>
possible (w/o using separate condvar/mutex layer).</p>
<p>On amd64 GNU/Linux, pthread_mutex_t is 40 bytes, but these futex-based<br>
locks only need 4 bytes. Similarly, pthread_cond_t is 48 bytes, making<br>
rb_nativethread_cond_t 56 bytes with pthreads; this futex implementation<br>
currently requires only 16 bytes for a condvar.</p>
<p>Size improvement may be noticeable for some apps with many Mutexes:<br>
the lock/cond reductions mean rb_mutex_struct is now 48 bytes instead<br>
of 128 bytes.</p> Ruby master - Bug #10009: IO operation is 10x slower in multi-thread environmenthttps://redmine.ruby-lang.org/issues/10009?journal_id=483762014-08-16T15:55:45Zariveira (Alexandre Riveira)alexandre@objectdata.com.br
<ul><li><strong>File</strong> <a href="/attachments/4626">test_thread_sched.rb</a> <a class="icon-only icon-download" title="Download" href="/attachments/download/4626/test_thread_sched.rb">test_thread_sched.rb</a> added</li></ul><p>I rewrote the test and added the --tasket and --postgres arguments, so that the same test file can be used for every case.</p>
<p>Feel free to change whatever you want.</p>
<p>I will bring news about the futex test soon.</p> Ruby master - Bug #10009: IO operation is 10x slower in multi-thread environmenthttps://redmine.ruby-lang.org/issues/10009?journal_id=483772014-08-16T17:50:55Zariveira (Alexandre Riveira)alexandre@objectdata.com.br
<ul><li><strong>File</strong> <a href="/attachments/4627">test_thread_sched.rb</a> <a class="icon-only icon-download" title="Download" href="/attachments/download/4627/test_thread_sched.rb">test_thread_sched.rb</a> added</li><li><strong>File</strong> <a href="/attachments/4628">tests.txt</a> <a class="icon-only icon-download" title="Download" href="/attachments/download/4628/tests.txt">tests.txt</a> added</li></ul><p>I added uname output to the test script for kernel/platform details.<br>
The test results follow.</p>
<p>Tests (test_thread_sched.rb --postgres) on debian-kfreebsd-amd64:</p>
<p>ruby 1.9.2<br>
name...........: 9.0-2-amd64 x86_64<br>
processor......: Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz with (4 processores)<br>
taskset........: false<br>
total..........: 101480933<br>
postgres.......: 467<br>
time...........: 20.232985931 (ideal value of 20 seconds)</p>
<p>ruby 2.1.2<br>
name...........: 9.0-2-amd64 x86_64<br>
processor......: Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz with (4 processores)<br>
taskset........: false<br>
total..........: 71870185<br>
postgres.......: 58<br>
time...........: 21.123303293 (ideal value of 20 seconds)</p>
<p>ruby 2.1.2 with TIME_QUANTUM_USEC = 1000<br>
name...........: 9.0-2-amd64 x86_64<br>
processor......: Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz with (4 processores)<br>
taskset........: false<br>
total..........: 63996063<br>
postgres.......: 2510<br>
time...........: 20.050760184 (ideal value of 20 seconds)</p> Ruby master - Bug #10009: IO operation is 10x slower in multi-thread environmenthttps://redmine.ruby-lang.org/issues/10009?journal_id=483902014-08-17T23:10:25Znormalperson (Eric Wong)normalperson@yhbt.net
<ul></ul><p>Some tests adapted from glibc:</p>
<pre><code>git clone git://80x24.org/rb_futex_test
</code></pre>
<p>tst-cond18-f/p are micro benchmarks, -f (futex version) is roughly<br>
twice as fast as the -p (pthreads version), but that doesn't seem<br>
to translate to noticeable real-world speed improvements in Ruby.</p> Ruby master - Bug #10009: IO operation is 10x slower in multi-thread environmenthttps://redmine.ruby-lang.org/issues/10009?journal_id=497252014-10-29T13:13:34Zariveira (Alexandre Riveira)alexandre@objectdata.com.br
<ul></ul><p>.</p> Ruby master - Bug #10009: IO operation is 10x slower in multi-thread environmenthttps://redmine.ruby-lang.org/issues/10009?journal_id=497262014-10-29T13:23:35Zariveira (Alexandre Riveira)alexandre@objectdata.com.br
<ul><li><strong>File</strong> <a href="/attachments/4821">test.py</a> <a class="icon-only icon-download" title="Download" href="/attachments/download/4821/test.py">test.py</a> added</li></ul><p>Attached is a Python script to compare blocking I/O in Python vs. Ruby.</p>
<p>Results:</p>
<p>Ruby with unchanged TIME_QUANTUM_USEC (100 * 1000)<br>
first..........: 32445253<br>
second.........: 30660119<br>
postgres.......: 61<br>
time...........: 1.5022704 secs</p>
<p>ruby with TIME_QUANTUM_USEC (1 * 1000)</p>
<p>first..........: 17793384<br>
second.........: 17438453<br>
postgres.......: 4638</p>
<p>python</p>
<p>first 17498064<br>
postgres 2027<br>
third 18702539</p> Ruby master - Bug #10009: IO operation is 10x slower in multi-thread environmenthttps://redmine.ruby-lang.org/issues/10009?journal_id=525532015-05-21T07:19:46Zhsbt (Hiroshi SHIBATA)hsbt@ruby-lang.org
<ul><li><strong>Assignee</strong> set to <i>ko1 (Koichi Sasada)</i></li><li><strong>Priority</strong> changed from <i>6</i> to <i>Normal</i></li></ul>