Actions

Copy link

Feature #13637

closed

[PATCH] tool/runruby.rb: test with smallest possible machine stack

Added by normalperson (Eric Wong) about 8 years ago. Updated about 8 years ago.

Status:

Closed

Assignee:

Target version:

[ruby-core:<unknown>]

Description

Lets ensure none of our C functions use too much stack space and
fix all excessive stack usage before releasing the next version.
Reducing C stack usage should reduce conservative GC scanning
time and improve performance.

Hopefully there are no objections; I will commit in a few days.

If there are platform-dependent CI failures; excessive stack
usage should be fixed; rather than increasing minimum values or
removing these envs from testing.

Files

0001-tool-runruby.rb-test-with-smallest-possible-machine-.patch (1.09 KB) 0001-tool-runruby.rb-test-with-smallest-possible-machine-.patch

normalperson (Eric Wong), 06/06/2017 11:02 PM

Related issues 1 (0 open — 1 closed)

Actions

Copy link

Updated by Anonymous about 8 years ago

Status changed from Open to Closed

Applied in changeset trunk|r59047.

tool/runruby.rb: test with smallest possible machine stack

If there are platform-dependent test failures; excessive stack
usage should be fixed; rather than increasing minimum values or
removing these envs from testing.

tool/runruby.rb: use smallest possible machine stack size
[ruby-core:81597] [Feature #13637]

Actions

Copy link

#2 [ruby-core:81627]

Updated by ko1 (Koichi Sasada) about 8 years ago

I missed this ticket.
I wonder there are no failures on CI.

Do you mean that we shouldn't use recursive call which can increase machine stack on Thread and Fiber in our tests?
Now, we don't have such tests (so that we don't have failures/errors) but it is possible.

I agree that we should consider about machine stack size, but I'm not sure this approach is correct (at least now it seems no problem). Or if we need to introduce such recursive calls, we remove this restriction?

Thanks,
Koichi

Actions

Copy link

#3 [ruby-core:81630]

Updated by normalperson (Eric Wong) about 8 years ago

ko1@atdot.net wrote:

I missed this ticket.
I wonder there are no failures on CI.

That is fortunate to hear :)

Do you mean that we shouldn't use recursive call which can increase machine stack on Thread and Fiber in our tests?
Now, we don't have such tests (so that we don't have failures/errors) but it is possible.

We should reconsider our code and data structures before
introducing recursion in code we control.

I agree that we should consider about machine stack size, but I'm not sure this approach is correct (at least now it seems no problem). Or if we need to introduce such recursive calls, we remove this restriction?

If we really need to introduce recursion; we can use
assert_separately to test it.

However, we should avoid recursion if possible because of GC
cost and potential portability + safety problems. Instead; we
can redesign data structures and algorithms to avoid recursion
(and maybe encourage Rubyists to do the same).

Coincidentally, I (finally) announced msgthr earlier on ruby-talk;
which makes non-recursive modifications to previously well-known
recursive algorithm:

http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/437984

These were originally implemented in Perl5:

Mail::Thread in CPAN:

https://rt.cpan.org/Ticket/Display.html?id=116727
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=833479

and public-inbox:

https://public-inbox.org/meta/20160621031201.28089-1-e@80x24.org/t/

Actions

Copy link

#4 [ruby-core:82151]

Updated by ReiOdaira (Rei Odaira) about 8 years ago

Ruby CI on AIX have frequently hit SystemStackError since this change was introduced.
http://rubyci.s3.amazonaws.com/aix71_ppc/ruby-trunk/recent.html
http://rubyci.s3.amazonaws.com/aix71_ppc/ruby-trunk/log/20170723T103301Z.fail.html.gz

If there are platform-dependent test failures; excessive stack usage should be fixed; rather than increasing minimum values or removing these envs from testing.

How do you think we can fix the "excessive stack usage"?

Actions

Copy link

#5 [ruby-core:82153]

Updated by normalperson (Eric Wong) about 8 years ago

Rei.Odaira@gmail.com wrote:

Ruby CI on AIX have frequently hit SystemStackError since this change was introduced.
http://rubyci.s3.amazonaws.com/aix71_ppc/ruby-trunk/recent.html
http://rubyci.s3.amazonaws.com/aix71_ppc/ruby-trunk/log/20170723T103301Z.fail.html.gz

Either that is the 16K buffer in IO.copy_stream (see below) or
OpenSSL itself is using lots of stack. I don't think we can
fix OpenSSL... (curious, which version do you use?)

If there are platform-dependent test failures; excessive stack usage should be fixed; rather than increasing minimum values or removing these envs from testing.

How do you think we can fix the "excessive stack usage"?

Does Linux checkstack.pl work for your binaries?

https://80x24.org/mirrors/linux.git/plain/scripts/checkstack.pl
(usage in comment)

IO.copy_stream buffer:

Can you try the following patch to move allocation from stack
to heap? It may slow down small copies a little, but releasing
GVL also hurts, so I doubt the slow down will be noticeable.

diff --git a/io.c b/io.c
index 60af120c18..f4b3fcec4a 100644
--- a/io.c
+++ b/io.c
@@ -10692,7 +10692,7 @@ nogvl_copy_stream_write(struct copy_stream_struct *stp, char *buf, size_t len)
static void
nogvl_copy_stream_read_write(struct copy_stream_struct *stp)
{

char buf[1024*16];

char *buf;
size_t len;
ssize_t ss;
int ret;
@@ -10702,6 +10702,13 @@ nogvl_copy_stream_read_write(struct copy_stream_struct *stp)
int use_pread;

copy_length = stp->copy_length;

if (copy_length < 0) {
buf = xmalloc(16384);
}
else {
buf = xmalloc(copy_length > 16384 ? 16384 : copy_length);
}

use_eof = copy_length == (off_t)-1;
src_offset = stp->src_offset;
use_pread = src_offset != (off_t)-1;

Thanks

Actions

Copy link

#6 [ruby-core:82159]

Updated by normalperson (Eric Wong) about 8 years ago

Sorry, original patch was broken :x (yet "make exam" passed...)
(it leaked memory and used sizeof improperly)

Can you try the following, instead?

diff --git a/io.c b/io.c
index 60af120c18..0d5ca0d95b 100644
--- a/io.c
+++ b/io.c
@@ -10692,7 +10692,7 @@ nogvl_copy_stream_write(struct copy_stream_struct *stp, char *buf, size_t len)
static void
nogvl_copy_stream_read_write(struct copy_stream_struct *stp)
{

char buf[1024*16];

char *buf;
size_t len;
ssize_t ss;
int ret;
@@ -10700,8 +10700,14 @@ nogvl_copy_stream_read_write(struct copy_stream_struct *stp)
int use_eof;
off_t src_offset;
int use_pread;
size_t alloc_size = 16384;

copy_length = stp->copy_length;

if (copy_length > 0 && copy_length < alloc_size) {
alloc_size = copy_length;
}
buf = xmalloc(alloc_size);

use_eof = copy_length == (off_t)-1;
src_offset = stp->src_offset;
use_pread = src_offset != (off_t)-1;
@@ -10713,18 +10719,18 @@ nogvl_copy_stream_read_write(struct copy_stream_struct *stp)
if (r == (off_t)-1 && errno) {
stp->syserr = "lseek";
stp->error_no = errno;

```
       return;
```

```
 goto out;
```

}
src_offset = (off_t)-1;
use_pread = 0;
}

while (use_eof || 0 < copy_length) {

   if (!use_eof && copy_length < (off_t)sizeof(buf)) {

   if (!use_eof && copy_length < (off_t)alloc_size) {

len = (size_t)copy_length;
}
else {

```
       len = sizeof(buf);
```

```
       len = alloc_size;
```

}
if (use_pread) {
ss = maygvl_copy_stream_read(0, stp, buf, len, src_offset);
@@ -10735,15 +10741,17 @@ nogvl_copy_stream_read_write(struct copy_stream_struct stp)
ss = maygvl_copy_stream_read(0, stp, buf, len, (off_t)-1);
}
if (ss <= 0) / EOF or error */

```
       return;
```

```
 goto out;
```

ret = nogvl_copy_stream_write(stp, buf, ss);
if (ret < 0)

```
       return;
```

```
 goto out;
```

if (!use_eof)
copy_length -= ss;
}
+out:

free(buf);
}

static void *

Thanks.

Actions

Copy link

#7 [ruby-core:82182]

Updated by ReiOdaira (Rei Odaira) about 8 years ago

Thanks for the patch. Unfortunately, it did not solve the problem. Looks like this test does not call nogvl_copy_stream_read_write() but instead calls copy_stream_fallback_body(). As far as I read the code, there is no large array stack-allocated on that path...?

Actions

Copy link

#8 [ruby-core:82197]

Updated by normalperson (Eric Wong) about 8 years ago

Rei.Odaira@gmail.com wrote:

Thanks for the patch. Unfortunately, it did not solve the problem. Looks like this test does not call nogvl_copy_stream_read_write() but instead calls copy_stream_fallback_body(). As far as I read the code, there is no large array stack-allocated on that path...?

Ah, looks like you're right. Hmm.. which OpenSSL version do you
use? Perhaps we can set a higher stack size for some versions
of OpenSSL on AIX; I don't think we've seen this other
platforms...

Also, are NFDBITS and HAVE_RB_FD_INIT macros defined?
Platforms without them will allocate select() bitmaps on stack;
which can get big.

Perhaps enabling the (currently Linux-only) poll()
rb_wait_for_single_fd can avoid big bitmaps for you:

diff --git a/thread.c b/thread.c
index b7ee1d8d9b..c9e52b8698 100644
--- a/thread.c
+++ b/thread.c
@@ -3823,7 +3823,7 @@ rb_thread_fd_select(int max, rb_fdset_t * read, rb_fdset_t * write, rb_fdset_t *

one we know of that supports using poll() in all places select()
would work.
*/
-#if defined(HAVE_POLL) && defined(linux)
+#if defined(HAVE_POLL) && (defined(linux) || defined(_AIX))

define USE_POLL¶

#endif

Also, does pahole work on your binaries?
git clone git://git.kernel.org/pub/scm/devel/pahole/pahole.git

That helps find down big stack users (including OpenSSL
or any other 3rd party binaries).

Actions

Copy link

#9 [ruby-core:82207]

Updated by ReiOdaira (Rei Odaira) about 8 years ago

I think I am using openssl-1.0.1s.

HAVE_RB_FD_INIT is not defined, but in fact AIX has fd_mask. It turned out that when defining HAVE_RB_FD_INIT by checking fd_mask, configure.in does not include sys/select.h. On AIX you have to explicitly include it to have fd_mask. I fixed it in r59440, and the SystemStackError disappeared.

Thanks much for your help!

Actions

Copy link

#10

Updated by vo.x (Vit Ondruch) almost 8 years ago

Related to Bug #13757: TestBacktrace#test_caller_lev segaults on PPC added

Actions

Copy link

Also available in: Atom PDF

Like0

Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0

Project

General

Profile

Ruby

Tags

Custom queries

Feature #13637

[PATCH] tool/runruby.rb: test with smallest possible machine stack

Updated by Anonymous about 8 years ago

Updated by ko1 (Koichi Sasada) about 8 years ago

Updated by normalperson (Eric Wong) about 8 years ago

Updated by ReiOdaira (Rei Odaira) about 8 years ago

Updated by normalperson (Eric Wong) about 8 years ago

Updated by normalperson (Eric Wong) about 8 years ago

Updated by ReiOdaira (Rei Odaira) about 8 years ago

Updated by normalperson (Eric Wong) about 8 years ago

define USE_POLL¶

Updated by ReiOdaira (Rei Odaira) about 8 years ago

Updated by vo.x (Vit Ondruch) almost 8 years ago