Feature #13637
closed[PATCH] tool/runruby.rb: test with smallest possible machine stack
Description
Lets ensure none of our C functions use too much stack space and
fix all excessive stack usage before releasing the next version.
Reducing C stack usage should reduce conservative GC scanning
time and improve performance.
Hopefully there are no objections; I will commit in a few days.
If there are platform-dependent CI failures; excessive stack
usage should be fixed; rather than increasing minimum values or
removing these envs from testing.
Files
Updated by Anonymous over 7 years ago
- Status changed from Open to Closed
Applied in changeset trunk|r59047.
tool/runruby.rb: test with smallest possible machine stack
Lets ensure none of our C functions use too much stack space and
fix all excessive stack usage before releasing the next version.
Reducing C stack usage should reduce conservative GC scanning
time and improve performance.
If there are platform-dependent test failures; excessive stack
usage should be fixed; rather than increasing minimum values or
removing these envs from testing.
- tool/runruby.rb: use smallest possible machine stack size
[ruby-core:81597] [Feature #13637]
Updated by ko1 (Koichi Sasada) over 7 years ago
I missed this ticket.
I wonder there are no failures on CI.
Do you mean that we shouldn't use recursive call which can increase machine stack on Thread
and Fiber
in our tests?
Now, we don't have such tests (so that we don't have failures/errors) but it is possible.
I agree that we should consider about machine stack size, but I'm not sure this approach is correct (at least now it seems no problem). Or if we need to introduce such recursive calls, we remove this restriction?
Thanks,
Koichi
Updated by normalperson (Eric Wong) over 7 years ago
ko1@atdot.net wrote:
I missed this ticket.
I wonder there are no failures on CI.
That is fortunate to hear :)
Do you mean that we shouldn't use recursive call which can increase machine stack on
Thread
andFiber
in our tests?
Now, we don't have such tests (so that we don't have failures/errors) but it is possible.
We should reconsider our code and data structures before
introducing recursion in code we control.
I agree that we should consider about machine stack size, but I'm not sure this approach is correct (at least now it seems no problem). Or if we need to introduce such recursive calls, we remove this restriction?
If we really need to introduce recursion; we can use
assert_separately to test it.
However, we should avoid recursion if possible because of GC
cost and potential portability + safety problems. Instead; we
can redesign data structures and algorithms to avoid recursion
(and maybe encourage Rubyists to do the same).
Coincidentally, I (finally) announced msgthr earlier on ruby-talk;
which makes non-recursive modifications to previously well-known
recursive algorithm:
http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/437984
These were originally implemented in Perl5:
Mail::Thread in CPAN:
https://rt.cpan.org/Ticket/Display.html?id=116727
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=833479
and public-inbox:
https://public-inbox.org/meta/20160621031201.28089-1-e@80x24.org/t/
Updated by ReiOdaira (Rei Odaira) over 7 years ago
Ruby CI on AIX have frequently hit SystemStackError since this change was introduced.
http://rubyci.s3.amazonaws.com/aix71_ppc/ruby-trunk/recent.html
http://rubyci.s3.amazonaws.com/aix71_ppc/ruby-trunk/log/20170723T103301Z.fail.html.gz
If there are platform-dependent test failures; excessive stack usage should be fixed; rather than increasing minimum values or removing these envs from testing.
How do you think we can fix the "excessive stack usage"?
Updated by normalperson (Eric Wong) over 7 years ago
Rei.Odaira@gmail.com wrote:
Ruby CI on AIX have frequently hit SystemStackError since this change was introduced.
http://rubyci.s3.amazonaws.com/aix71_ppc/ruby-trunk/recent.html
http://rubyci.s3.amazonaws.com/aix71_ppc/ruby-trunk/log/20170723T103301Z.fail.html.gz
Either that is the 16K buffer in IO.copy_stream (see below) or
OpenSSL itself is using lots of stack. I don't think we can
fix OpenSSL... (curious, which version do you use?)
If there are platform-dependent test failures; excessive stack usage should be fixed; rather than increasing minimum values or removing these envs from testing.
How do you think we can fix the "excessive stack usage"?
Does Linux checkstack.pl work for your binaries?
https://80x24.org/mirrors/linux.git/plain/scripts/checkstack.pl
(usage in comment)
IO.copy_stream buffer:
Can you try the following patch to move allocation from stack
to heap? It may slow down small copies a little, but releasing
GVL also hurts, so I doubt the slow down will be noticeable.
diff --git a/io.c b/io.c
index 60af120c18..f4b3fcec4a 100644
--- a/io.c
+++ b/io.c
@@ -10692,7 +10692,7 @@ nogvl_copy_stream_write(struct copy_stream_struct *stp, char *buf, size_t len)
static void
nogvl_copy_stream_read_write(struct copy_stream_struct *stp)
{
- char buf[1024*16];
- char *buf;
size_t len;
ssize_t ss;
int ret;
@@ -10702,6 +10702,13 @@ nogvl_copy_stream_read_write(struct copy_stream_struct *stp)
int use_pread;
copy_length = stp->copy_length;
- if (copy_length < 0) {
- buf = xmalloc(16384);
- }
- else {
- buf = xmalloc(copy_length > 16384 ? 16384 : copy_length);
- }
use_eof = copy_length == (off_t)-1;
src_offset = stp->src_offset;
use_pread = src_offset != (off_t)-1;
Thanks
Updated by normalperson (Eric Wong) over 7 years ago
Sorry, original patch was broken :x (yet "make exam" passed...)
(it leaked memory and used sizeof improperly)
Can you try the following, instead?
diff --git a/io.c b/io.c
index 60af120c18..0d5ca0d95b 100644
--- a/io.c
+++ b/io.c
@@ -10692,7 +10692,7 @@ nogvl_copy_stream_write(struct copy_stream_struct *stp, char *buf, size_t len)
static void
nogvl_copy_stream_read_write(struct copy_stream_struct *stp)
{
- char buf[1024*16];
- char *buf;
size_t len;
ssize_t ss;
int ret;
@@ -10700,8 +10700,14 @@ nogvl_copy_stream_read_write(struct copy_stream_struct *stp)
int use_eof;
off_t src_offset;
int use_pread; - size_t alloc_size = 16384;
copy_length = stp->copy_length;
- if (copy_length > 0 && copy_length < alloc_size) {
- alloc_size = copy_length;
- }
- buf = xmalloc(alloc_size);
use_eof = copy_length == (off_t)-1;
src_offset = stp->src_offset;
use_pread = src_offset != (off_t)-1;
@@ -10713,18 +10719,18 @@ nogvl_copy_stream_read_write(struct copy_stream_struct *stp)
if (r == (off_t)-1 && errno) {
stp->syserr = "lseek";
stp->error_no = errno;
-
return;
-
goto out;
}
src_offset = (off_t)-1;
use_pread = 0;
}
while (use_eof || 0 < copy_length) {
-
if (!use_eof && copy_length < (off_t)sizeof(buf)) {
-
if (!use_eof && copy_length < (off_t)alloc_size) {
len = (size_t)copy_length;
}
else {
-
len = sizeof(buf);
-
len = alloc_size;
}
if (use_pread) {
ss = maygvl_copy_stream_read(0, stp, buf, len, src_offset);
@@ -10735,15 +10741,17 @@ nogvl_copy_stream_read_write(struct copy_stream_struct stp)
ss = maygvl_copy_stream_read(0, stp, buf, len, (off_t)-1);
}
if (ss <= 0) / EOF or error */
-
return;
-
goto out;
ret = nogvl_copy_stream_write(stp, buf, ss);
if (ret < 0)
-
return;
-
goto out;
if (!use_eof)
copy_length -= ss;
}
+out:
- free(buf);
}
static void *
Thanks.
Updated by ReiOdaira (Rei Odaira) over 7 years ago
Thanks for the patch. Unfortunately, it did not solve the problem. Looks like this test does not call nogvl_copy_stream_read_write()
but instead calls copy_stream_fallback_body()
. As far as I read the code, there is no large array stack-allocated on that path...?
Updated by normalperson (Eric Wong) over 7 years ago
Rei.Odaira@gmail.com wrote:
Thanks for the patch. Unfortunately, it did not solve the problem. Looks like this test does not call
nogvl_copy_stream_read_write()
but instead callscopy_stream_fallback_body()
. As far as I read the code, there is no large array stack-allocated on that path...?
Ah, looks like you're right. Hmm.. which OpenSSL version do you
use? Perhaps we can set a higher stack size for some versions
of OpenSSL on AIX; I don't think we've seen this other
platforms...
Also, are NFDBITS and HAVE_RB_FD_INIT macros defined?
Platforms without them will allocate select() bitmaps on stack;
which can get big.
Perhaps enabling the (currently Linux-only) poll()
rb_wait_for_single_fd can avoid big bitmaps for you:
diff --git a/thread.c b/thread.c
index b7ee1d8d9b..c9e52b8698 100644
--- a/thread.c
+++ b/thread.c
@@ -3823,7 +3823,7 @@ rb_thread_fd_select(int max, rb_fdset_t * read, rb_fdset_t * write, rb_fdset_t *
- one we know of that supports using poll() in all places select()
- would work.
*/
-#if defined(HAVE_POLL) && defined(linux)
+#if defined(HAVE_POLL) && (defined(linux) || defined(_AIX))
define USE_POLL¶
#endif
Also, does pahole work on your binaries?
git clone git://git.kernel.org/pub/scm/devel/pahole/pahole.git
That helps find down big stack users (including OpenSSL
or any other 3rd party binaries).
Updated by ReiOdaira (Rei Odaira) over 7 years ago
I think I am using openssl-1.0.1s.
HAVE_RB_FD_INIT
is not defined, but in fact AIX has fd_mask
. It turned out that when defining HAVE_RB_FD_INIT
by checking fd_mask
, configure.in does not include sys/select.h. On AIX you have to explicitly include it to have fd_mask
. I fixed it in r59440, and the SystemStackError disappeared.
Thanks much for your help!
Updated by vo.x (Vit Ondruch) about 7 years ago
- Related to Bug #13757: TestBacktrace#test_caller_lev segaults on PPC added