https://redmine.ruby-lang.org/https://redmine.ruby-lang.org/favicon.ico?17113305112017-06-08T20:58:08ZRuby Issue Tracking SystemRuby master - Feature #13637: [PATCH] tool/runruby.rb: test with smallest possible machine stackhttps://redmine.ruby-lang.org/issues/13637?journal_id=653212017-06-08T20:58:08ZAnonymous
<ul><li><strong>Status</strong> changed from <i>Open</i> to <i>Closed</i></li></ul><p>Applied in changeset trunk|r59047.</p>
<hr>
<p>tool/runruby.rb: test with smallest possible machine stack</p>
<p>Lets ensure none of our C functions use too much stack space and<br>
fix all excessive stack usage before releasing the next version.<br>
Reducing C stack usage should reduce conservative GC scanning<br>
time and improve performance.</p>
<p>If there are platform-dependent test failures; excessive stack<br>
usage should be fixed; rather than increasing minimum values or<br>
removing these envs from testing.</p>
<ul>
<li>tool/runruby.rb: use smallest possible machine stack size<br>
<a href="https://blade.ruby-lang.org/ruby-core/81597">[ruby-core:81597]</a> [Feature <a class="issue tracker-2 status-5 priority-4 priority-default closed" title="Feature: [PATCH] tool/runruby.rb: test with smallest possible machine stack (Closed)" href="https://redmine.ruby-lang.org/issues/13637">#13637</a>]</li>
</ul> Ruby master - Feature #13637: [PATCH] tool/runruby.rb: test with smallest possible machine stackhttps://redmine.ruby-lang.org/issues/13637?journal_id=653242017-06-09T04:35:44Zko1 (Koichi Sasada)
<ul></ul><p>I missed this ticket.<br>
I wonder there are no failures on CI.</p>
<p>Do you mean that we shouldn't use recursive call which can increase machine stack on <code>Thread</code> and <code>Fiber</code> in our tests?<br>
Now, we don't have such tests (so that we don't have failures/errors) but it is possible.</p>
<p>I agree that we should consider about machine stack size, but I'm not sure this approach is correct (at least now it seems no problem). Or if we need to introduce such recursive calls, we remove this restriction?</p>
<p>Thanks,<br>
Koichi</p> Ruby master - Feature #13637: [PATCH] tool/runruby.rb: test with smallest possible machine stackhttps://redmine.ruby-lang.org/issues/13637?journal_id=653262017-06-09T07:41:27Znormalperson (Eric Wong)normalperson@yhbt.net
<ul></ul><p><a href="mailto:ko1@atdot.net" class="email">ko1@atdot.net</a> wrote:</p>
<blockquote>
<p>I missed this ticket.<br>
I wonder there are no failures on CI.</p>
</blockquote>
<p>That is fortunate to hear :)</p>
<blockquote>
<p>Do you mean that we shouldn't use recursive call which can increase machine stack on <code>Thread</code> and <code>Fiber</code> in our tests?<br>
Now, we don't have such tests (so that we don't have failures/errors) but it is possible.</p>
</blockquote>
<p>We should reconsider our code and data structures before<br>
introducing recursion in code we control.</p>
<blockquote>
<p>I agree that we should consider about machine stack size, but I'm not sure this approach is correct (at least now it seems no problem). Or if we need to introduce such recursive calls, we remove this restriction?</p>
</blockquote>
<p>If we really need to introduce recursion; we can use<br>
assert_separately to test it.</p>
<p>However, we should avoid recursion if possible because of GC<br>
cost and potential portability + safety problems. Instead; we<br>
can redesign data structures and algorithms to avoid recursion<br>
(and maybe encourage Rubyists to do the same).</p>
<p>Coincidentally, I (finally) announced msgthr earlier on ruby-talk;<br>
which makes non-recursive modifications to previously well-known<br>
recursive algorithm:</p>
<p><a href="http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/437984" class="external">http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/437984</a></p>
<p>These were originally implemented in Perl5:</p>
<p>Mail::Thread in CPAN:</p>
<p><a href="https://rt.cpan.org/Ticket/Display.html?id=116727" class="external">https://rt.cpan.org/Ticket/Display.html?id=116727</a><br>
<a href="http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=833479" class="external">http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=833479</a></p>
<p>and public-inbox:</p>
<p><a href="https://public-inbox.org/meta/20160621031201.28089-1-e@80x24.org/t/" class="external">https://public-inbox.org/meta/20160621031201.28089-1-e@80x24.org/t/</a></p> Ruby master - Feature #13637: [PATCH] tool/runruby.rb: test with smallest possible machine stackhttps://redmine.ruby-lang.org/issues/13637?journal_id=659062017-07-24T20:28:06ZReiOdaira (Rei Odaira)Rei.Odaira@gmail.com
<ul></ul><p>Ruby CI on AIX have frequently hit SystemStackError since this change was introduced.<br>
<a href="http://rubyci.s3.amazonaws.com/aix71_ppc/ruby-trunk/recent.html" class="external">http://rubyci.s3.amazonaws.com/aix71_ppc/ruby-trunk/recent.html</a><br>
<a href="http://rubyci.s3.amazonaws.com/aix71_ppc/ruby-trunk/log/20170723T103301Z.fail.html.gz" class="external">http://rubyci.s3.amazonaws.com/aix71_ppc/ruby-trunk/log/20170723T103301Z.fail.html.gz</a></p>
<blockquote>
<p>If there are platform-dependent test failures; excessive stack usage should be fixed; rather than increasing minimum values or removing these envs from testing.</p>
</blockquote>
<p>How do you think we can fix the "excessive stack usage"?</p> Ruby master - Feature #13637: [PATCH] tool/runruby.rb: test with smallest possible machine stackhttps://redmine.ruby-lang.org/issues/13637?journal_id=659082017-07-24T21:08:34Znormalperson (Eric Wong)normalperson@yhbt.net
<ul></ul><p><a href="mailto:Rei.Odaira@gmail.com" class="email">Rei.Odaira@gmail.com</a> wrote:</p>
<blockquote>
<p>Ruby CI on AIX have frequently hit SystemStackError since this change was introduced.<br>
<a href="http://rubyci.s3.amazonaws.com/aix71_ppc/ruby-trunk/recent.html" class="external">http://rubyci.s3.amazonaws.com/aix71_ppc/ruby-trunk/recent.html</a><br>
<a href="http://rubyci.s3.amazonaws.com/aix71_ppc/ruby-trunk/log/20170723T103301Z.fail.html.gz" class="external">http://rubyci.s3.amazonaws.com/aix71_ppc/ruby-trunk/log/20170723T103301Z.fail.html.gz</a></p>
</blockquote>
<p>Either that is the 16K buffer in IO.copy_stream (see below) or<br>
OpenSSL itself is using lots of stack. I don't think we can<br>
fix OpenSSL... (curious, which version do you use?)</p>
<blockquote>
<blockquote>
<p>If there are platform-dependent test failures; excessive stack usage should be fixed; rather than increasing minimum values or removing these envs from testing.</p>
</blockquote>
<p>How do you think we can fix the "excessive stack usage"?</p>
</blockquote>
<p>Does Linux checkstack.pl work for your binaries?</p>
<p><a href="https://80x24.org/mirrors/linux.git/plain/scripts/checkstack.pl" class="external">https://80x24.org/mirrors/linux.git/plain/scripts/checkstack.pl</a><br>
(usage in comment)</p>
<p>IO.copy_stream buffer:</p>
<p>Can you try the following patch to move allocation from stack<br>
to heap? It may slow down small copies a little, but releasing<br>
GVL also hurts, so I doubt the slow down will be noticeable.</p>
<p>diff --git a/io.c b/io.c<br>
index 60af120c18..f4b3fcec4a 100644<br>
--- a/io.c<br>
+++ b/io.c<br>
@@ -10692,7 +10692,7 @@ nogvl_copy_stream_write(struct copy_stream_struct *stp, char *buf, size_t len)<br>
static void<br>
nogvl_copy_stream_read_write(struct copy_stream_struct *stp)<br>
{</p>
<ul>
<li>char buf[1024*16];</li>
</ul>
<ul>
<li>char *buf;<br>
size_t len;<br>
ssize_t ss;<br>
int ret;<br>
@@ -10702,6 +10702,13 @@ nogvl_copy_stream_read_write(struct copy_stream_struct *stp)<br>
int use_pread;</li>
</ul>
<p>copy_length = stp->copy_length;</p>
<ul>
<li>if (copy_length < 0) {</li>
<li>buf = xmalloc(16384);</li>
<li>}</li>
<li>else {</li>
<li>buf = xmalloc(copy_length > 16384 ? 16384 : copy_length);</li>
<li>}</li>
<li>
</ul>
<p>use_eof = copy_length == (off_t)-1;<br>
src_offset = stp->src_offset;<br>
use_pread = src_offset != (off_t)-1;</p>
<p>Thanks</p> Ruby master - Feature #13637: [PATCH] tool/runruby.rb: test with smallest possible machine stackhttps://redmine.ruby-lang.org/issues/13637?journal_id=659132017-07-25T02:51:39Znormalperson (Eric Wong)normalperson@yhbt.net
<ul></ul><p>Sorry, original patch was broken :x (yet "make exam" passed...)<br>
(it leaked memory and used sizeof improperly)</p>
<p>Can you try the following, instead?</p>
<p>diff --git a/io.c b/io.c<br>
index 60af120c18..0d5ca0d95b 100644<br>
--- a/io.c<br>
+++ b/io.c<br>
@@ -10692,7 +10692,7 @@ nogvl_copy_stream_write(struct copy_stream_struct *stp, char *buf, size_t len)<br>
static void<br>
nogvl_copy_stream_read_write(struct copy_stream_struct *stp)<br>
{</p>
<ul>
<li>char buf[1024*16];</li>
</ul>
<ul>
<li>char *buf;<br>
size_t len;<br>
ssize_t ss;<br>
int ret;<br>
@@ -10700,8 +10700,14 @@ nogvl_copy_stream_read_write(struct copy_stream_struct *stp)<br>
int use_eof;<br>
off_t src_offset;<br>
int use_pread;</li>
<li>size_t alloc_size = 16384;</li>
</ul>
<p>copy_length = stp->copy_length;</p>
<ul>
<li>if (copy_length > 0 && copy_length < alloc_size) {</li>
<li>alloc_size = copy_length;</li>
<li>}</li>
<li>buf = xmalloc(alloc_size);</li>
<li>
</ul>
<p>use_eof = copy_length == (off_t)-1;<br>
src_offset = stp->src_offset;<br>
use_pread = src_offset != (off_t)-1;<br>
@@ -10713,18 +10719,18 @@ nogvl_copy_stream_read_write(struct copy_stream_struct *stp)<br>
if (r == (off_t)-1 && errno) {<br>
stp->syserr = "lseek";<br>
stp->error_no = errno;</p>
<ul>
<li>
<pre><code> return;
</code></pre>
</li>
</ul>
<ul>
<li>
<pre><code> goto out;
</code></pre>
</li>
</ul>
<p>}<br>
src_offset = (off_t)-1;<br>
use_pread = 0;<br>
}</p>
<p>while (use_eof || 0 < copy_length) {</p>
<ul>
<li>
<pre><code> if (!use_eof && copy_length < (off_t)sizeof(buf)) {
</code></pre>
</li>
</ul>
<ul>
<li>
<pre><code> if (!use_eof && copy_length < (off_t)alloc_size) {
</code></pre>
</li>
</ul>
<p>len = (size_t)copy_length;<br>
}<br>
else {</p>
<ul>
<li>
<pre><code> len = sizeof(buf);
</code></pre>
</li>
</ul>
<ul>
<li>
<pre><code> len = alloc_size;
</code></pre>
</li>
</ul>
<p>}<br>
if (use_pread) {<br>
ss = maygvl_copy_stream_read(0, stp, buf, len, src_offset);<br>
@@ -10735,15 +10741,17 @@ nogvl_copy_stream_read_write(struct copy_stream_struct <em>stp)<br>
ss = maygvl_copy_stream_read(0, stp, buf, len, (off_t)-1);<br>
}<br>
if (ss <= 0) /</em> EOF or error */</p>
<ul>
<li>
<pre><code> return;
</code></pre>
</li>
</ul>
<ul>
<li>
<pre><code> goto out;
</code></pre>
</li>
</ul>
<p>ret = nogvl_copy_stream_write(stp, buf, ss);<br>
if (ret < 0)</p>
<ul>
<li>
<pre><code> return;
</code></pre>
</li>
</ul>
<ul>
<li>
<pre><code> goto out;
</code></pre>
</li>
</ul>
<p>if (!use_eof)<br>
copy_length -= ss;<br>
}<br>
+out:</p>
<ul>
<li>free(buf);<br>
}</li>
</ul>
<p>static void *</p>
<p>Thanks.</p> Ruby master - Feature #13637: [PATCH] tool/runruby.rb: test with smallest possible machine stackhttps://redmine.ruby-lang.org/issues/13637?journal_id=659462017-07-26T20:40:57ZReiOdaira (Rei Odaira)Rei.Odaira@gmail.com
<ul></ul><p>Thanks for the patch. Unfortunately, it did not solve the problem. Looks like this test does not call <code>nogvl_copy_stream_read_write()</code> but instead calls <code>copy_stream_fallback_body()</code>. As far as I read the code, there is no large array stack-allocated on that path...?</p> Ruby master - Feature #13637: [PATCH] tool/runruby.rb: test with smallest possible machine stackhttps://redmine.ruby-lang.org/issues/13637?journal_id=659632017-07-27T09:41:27Znormalperson (Eric Wong)normalperson@yhbt.net
<ul></ul><p><a href="mailto:Rei.Odaira@gmail.com" class="email">Rei.Odaira@gmail.com</a> wrote:</p>
<blockquote>
<p>Thanks for the patch. Unfortunately, it did not solve the problem. Looks like this test does not call <code>nogvl_copy_stream_read_write()</code> but instead calls <code>copy_stream_fallback_body()</code>. As far as I read the code, there is no large array stack-allocated on that path...?</p>
</blockquote>
<p>Ah, looks like you're right. Hmm.. which OpenSSL version do you<br>
use? Perhaps we can set a higher stack size for some versions<br>
of OpenSSL on AIX; I don't think we've seen this other<br>
platforms...</p>
<p>Also, are NFDBITS and HAVE_RB_FD_INIT macros defined?<br>
Platforms without them will allocate select() bitmaps on stack;<br>
which can get big.</p>
<p>Perhaps enabling the (currently Linux-only) poll()<br>
rb_wait_for_single_fd can avoid big bitmaps for you:</p>
<p>diff --git a/thread.c b/thread.c<br>
index b7ee1d8d9b..c9e52b8698 100644<br>
--- a/thread.c<br>
+++ b/thread.c<br>
@@ -3823,7 +3823,7 @@ rb_thread_fd_select(int max, rb_fdset_t * read, rb_fdset_t * write, rb_fdset_t *</p>
<ul>
<li>one we know of that supports using poll() in all places select()</li>
<li>would work.<br>
*/<br>
-#if defined(HAVE_POLL) && defined(<strong>linux</strong>)<br>
+#if defined(HAVE_POLL) && (defined(<strong>linux</strong>) || defined(_AIX))</li>
</ul>
<a name="define-USE_POLL"></a>
<h1 >define USE_POLL<a href="#define-USE_POLL" class="wiki-anchor">¶</a></h1>
<p>#endif</p>
<p>Also, does pahole work on your binaries?<br>
git clone git://git.kernel.org/pub/scm/devel/pahole/pahole.git</p>
<p>That helps find down big stack users (including OpenSSL<br>
or any other 3rd party binaries).</p> Ruby master - Feature #13637: [PATCH] tool/runruby.rb: test with smallest possible machine stackhttps://redmine.ruby-lang.org/issues/13637?journal_id=659712017-07-28T16:57:14ZReiOdaira (Rei Odaira)Rei.Odaira@gmail.com
<ul></ul><p>I think I am using openssl-1.0.1s.</p>
<p><code>HAVE_RB_FD_INIT</code> is not defined, but in fact AIX has <code>fd_mask</code>. It turned out that when defining <code>HAVE_RB_FD_INIT</code> by checking <code>fd_mask</code>, configure.in does not include sys/select.h. On AIX you have to explicitly include it to have <code>fd_mask</code>. I fixed it in r59440, and the SystemStackError disappeared.</p>
<p>Thanks much for your help!</p> Ruby master - Feature #13637: [PATCH] tool/runruby.rb: test with smallest possible machine stackhttps://redmine.ruby-lang.org/issues/13637?journal_id=662802017-08-25T11:45:43Zvo.x (Vit Ondruch)v.ondruch@tiscali.cz
<ul><li><strong>Related to</strong> <i><a class="issue tracker-1 status-5 priority-4 priority-default closed" href="/issues/13757">Bug #13757</a>: TestBacktrace#test_caller_lev segaults on PPC</i> added</li></ul>