Bug #6341

closed

SIGSEGV: Thread.new { fork { GC.start } }.join

Added by rudolf (r stu3) almost 13 years ago. Updated over 12 years ago.

Status: Third Party's Issue
Assignee: -
Target version: -
ruby -v: ruby 1.9.3p196 (2012-04-21) [x86_64-netbsd6.0.]
Backport:
[ruby-core:44533]

Description

When running the Ruby (ruby-lang.org) test suite, I am able to provoke a segfault using a test from bootstraptest/test_thread.rb:

  begin
    Thread.new { fork { GC.start } }.join
    pid, status = Process.wait2
    $result = status.success? ? :ok : :ng
  rescue NotImplementedError
    $result = :ok
  end

It is easy to provoke the problem with this command launched in shell (1-20 times until the problem shows):
$ ruby -e "Thread.new { fork { GC.start } }.join"
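The repeated manual runs described above could be automated roughly as follows. This is only a sketch: `first_failure` is a hypothetical helper name, and the child script is adapted from the bootstraptest snippet so that the forked child's status becomes the exit status of the spawned interpreter.

```ruby
require "rbconfig"

# Run `script` in a fresh interpreter up to `attempts` times.
# Returns the first unsuccessful Process::Status, or nil if every run succeeded.
# (`first_failure` is a hypothetical helper, not part of the Ruby test suite.)
def first_failure(script, attempts: 20)
  attempts.times do
    pid = Process.spawn(RbConfig.ruby, "-e", script)
    _pid, status = Process.wait2(pid)
    return status unless status.success?
  end
  nil
end

# Adapted from the test above: propagate the forked child's result.
REPRO = "Thread.new { fork { GC.start } }.join; " \
        "_pid, st = Process.wait2; exit(st.success? ? 0 : 1)"

status = first_failure(REPRO)
if status.nil?
  puts "no failure observed"
elsif status.signaled?
  puts "interpreter killed by signal #{status.termsig}"
else
  puts "child failed, exit status #{status.exitstatus}"
end
```

A `nil` result only means the problem did not show up within the given number of attempts; it does not prove the platform is unaffected.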

Environment:

  • NetBSD-6.0_BETA amd64, from 2012/04/21
  • ruby_1_9_3 branch, revision 35416 (I've also tried trunk and it has similar problem)
  • 8 GB RAM, a lot of free memory, dual-core CPU
  • gcc version 4.5.3 (NetBSD nb2 20110806)

Backtrace and "bt full" is available in the attached gdb_session.txt file.


Files

gdb_session.txt (9.23 KB) - Backtrace - rudolf (r stu3), 04/22/2012 11:27 PM

Related issues 4 (0 open, 4 closed)

Related to Backport187 - Backport #2603: Problems caused by pthread handling on NetBSD 5.0 and later - Closed - shyouhei (Shyouhei Urabe) - 01/14/2010
Related to Backport187 - Backport #2739: ruby 1.8.7 built with pthreads hangs under some circumstances - Closed - shyouhei (Shyouhei Urabe) - 02/12/2010
Related to Ruby master - Bug #270: lazy timer thread creation - Closed - ko1 (Koichi Sasada)
Related to Ruby master - Bug #2724: fork from other than the main thread causes wrong pthread condition on NetBSD - Third Party's Issue - 02/09/2010

Updated by mame (Yusuke Endoh) over 12 years ago

  • Status changed from Open to Feedback
  • Priority changed from 5 to 3

Hello,

NetBSD is not supported currently. A patch is welcome.

The following is just my guess.

There is a constraint in POSIX that only async-signal-safe functions
can be called after fork. On many systems that conform to POSIX,
this constraint is more or less relaxed. On recent NetBSD, however,
I have heard that the constraint is actually enforced. To comply with it,
I have no idea other than prohibiting Kernel#fork on NetBSD (as on Windows).

Again, this is just my guess. I'm sorry if my guess is wrong.

--
Yusuke Endoh

Updated by rudolf (r stu3) over 12 years ago

Thanks for the comment. I am sorry, but I don't understand the sentence "NetBSD is not supported". I can see NetBSD-related statements in configure.in, which means the build infrastructure has support for NetBSD (and indeed, it builds on NetBSD). Can you please explain what you mean by that?

For the problem at hand: if I understand correctly, the mentioned test (from bootstraptest/test_thread.rb) should not be run under POSIX systems?

I have found some related notes (https://www.opengroup.org/austin/docs/austin_446.txt) about posix_spawn() supposedly being async-signal-safe. Maybe Kernel#fork should use posix_spawn() on systems which implement this function?

Updated by mame (Yusuke Endoh) over 12 years ago

rudolf (r stu3) wrote:

Thanks for the comment. I am sorry, but I don't understand the sentence "NetBSD is not supported". I can see NetBSD-related statements in configure.in, so that means, that the build infrastructure has support for NetBSD (and indeed, it builds on NetBSD).

Early versions of NetBSD were somewhat supported.
IIRC, the strict constraint I mentioned has been enforced since NetBSD 5.0.

Can you please explain, what you mean with that?

See below.

https://bugs.ruby-lang.org/projects/ruby-trunk/wiki/SupportedPlatforms

For the problem at hand: if I understand correctly, the mentioned test (from bootstraptest/test_thread.rb) should not be run under POSIX systems?

Strictly speaking, yes, I think.
But the feature works and is actually useful in many "supported" platforms.
So we won't remove the test.

I have found some related notes (https://www.opengroup.org/austin/docs/austin_446.txt) about posix_spawn() supposedly being async-signal-safe. Maybe Kernel#fork should use posix_spawn() on systems which implement this function?

posix_spawn does not substitute for fork. It is almost fork+exec.
IOW, we cannot use posix_spawn to implement Kernel#fork.

FYI, we have Kernel#spawn already.

--
Yusuke Endoh
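The difference between Kernel#fork and Kernel#spawn mentioned above can be illustrated with a small sketch. The variable name `secret` and the exit codes are made up for the illustration; the point is only that a spawned child starts a new program, while a forked child continues with a copy of the parent's state.

```ruby
require "rbconfig"

secret = 41

# Kernel#spawn starts a brand-new program. The child shares none of the
# parent's Ruby objects, so `secret` is not defined there and it exits 0.
pid = Process.spawn(RbConfig.ruby, "-e", "exit(defined?(secret) ? 1 : 0)")
_pid, spawn_status = Process.wait2(pid)

# Kernel#fork duplicates the running interpreter. The block runs in the
# child and still sees `secret`.
fork_status = nil
if Process.respond_to?(:fork)
  pid = fork { exit!(secret == 41 ? 0 : 1) }
  _pid, fork_status = Process.wait2(pid)
end

puts "spawn child ok: #{spawn_status.success?}"
puts "fork child ok:  #{fork_status.success?}" if fork_status
```

This is why spawn cannot stand in for fork: code that relies on the child inheriting the parent's in-memory state has no equivalent with spawn.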

Updated by kosaki (Motohiro KOSAKI) over 12 years ago

Hello,

I think this is a kind of documentation issue. If you use C, you can't use both threads and fork. It's obvious.
And most people assume Ruby naturally has the same limitation because Ruby is written in C. But rudolf implicitly
pointed out that some people don't think so.

So, if anyone sends me a documentation patch, I'll merge it.

Updated by rudolf (r stu3) over 12 years ago

mame (Yusuke Endoh) wrote:

rudolf (r stu3) wrote:

I don't understand the sentence "NetBSD is not supported". [...] Can you please explain, what you mean with that?

See below.
https://bugs.ruby-lang.org/projects/ruby-trunk/wiki/SupportedPlatforms

Thanks, i was not aware of that. I'll respect that.

FYI, we have Kernel#spawn already.

I didn't know that, thanks.

Updated by rudolf (r stu3) over 12 years ago

kosaki (Motohiro KOSAKI) wrote:

I think this is a kind of documentation issue. If you use C, you can't use both thread and fork. It's obvious.
And almost people think ruby naturally has the same limitation because ruby is written by C. But rudolf implicitly
pointed out some people don't think so.

So, if anyone send me a documentation patch, i'll merge it.

Yes, I agree it would help if the documentation mentioned that. But the mere existence of a test which uses both thread and fork is somewhat contradictory :)

Updated by kosaki (Motohiro KOSAKI) over 12 years ago

kosaki (Motohiro KOSAKI) wrote:

I think this is a kind of documentation issue. If you use C, you can't use both thread and fork. It's obvious.
And almost people think ruby naturally has the same limitation because ruby is written by C. But rudolf implicitly
pointed out some people don't think so.

So, if anyone send me a documentation patch, i'll merge it.

Yes, I agree it will help if the documentation will mention that. But the sole existence of a test which uses both thread and fork is somewhat contradictory :)

Just add a "skip if NetBSD" idiom to the test. :)
Our test suite isn't intended to test POSIX issues; it only serves to
detect unintended changes from careless committers.
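A "skip if NetBSD" guard of the kind suggested above might look like the sketch below. The helper names `netbsd?` and `guarded_result` and the `:skipped` value are assumptions for illustration; the thread does not show how the real test suite would express such a skip.

```ruby
require "rbconfig"

# True when the given host OS string names NetBSD.
# Defaults to the OS this interpreter was built for.
def netbsd?(host_os = RbConfig::CONFIG["host_os"])
  host_os.to_s.downcase.include?("netbsd")
end

# The thread+fork test from the report, skipped on NetBSD.
# Returns :skipped, :ok, or :ng (hypothetical result values).
def guarded_result(host_os = RbConfig::CONFIG["host_os"])
  return :skipped if netbsd?(host_os)

  Thread.new { fork { GC.start } }.join
  _pid, status = Process.wait2
  status.success? ? :ok : :ng
rescue NotImplementedError
  :ok
end
```

Passing the host OS as a parameter keeps the guard testable on any machine, at the cost of hiding the crash rather than addressing it.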

Updated by mame (Yusuke Endoh) over 12 years ago

Hello,

2012/4/24, kosaki (Motohiro KOSAKI) :

I think this is a kind of documentation issue. If you use C, you can't use
both thread and fork. It's obvious.
And almost people think ruby naturally has the same limitation because ruby
is written by C. But rudolf implicitly
pointed out some people don't think so.

No, this is not a documentation issue.
The simple code actually causes SEGV on the platform, doesn't it?
I'm against leaving such a significant problem in place and adding ad-hoc
guards to the tests.

I suggest making Kernel#fork raise a NotImplementedError on NetBSD
5.0+.
Fortunately, the tests already have a guard for NotImplementedError
because there is a supported platform that does not support
Kernel#fork (you know, Windows).
Even on that platform, SEGV is not allowed, in principle.

Note, my suggestion is based on my uncertain guess about NetBSD 5.0+.
I'm not even a user of NetBSD. I think someone should confirm whether my
guess is correct.
Kosaki-san, if you seriously want to support NetBSD, I'd like you to
be a platform maintainer for NetBSD.

--
Yusuke Endoh
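On platforms where Kernel#fork raises NotImplementedError, callers can also check for it up front, since `Process.respond_to?(:fork)` returns false when fork is not usable. The following is a minimal sketch of that idea; `run_gc_in_child` is a hypothetical helper, and falling back to Kernel#spawn is just one possible choice.

```ruby
require "rbconfig"

# Run GC.start in a child process and report whether the child succeeded.
# Uses fork where available, otherwise starts a fresh interpreter.
def run_gc_in_child
  pid =
    if Process.respond_to?(:fork)
      fork { GC.start }
    else
      Process.spawn(RbConfig.ruby, "-e", "GC.start")
    end
  _pid, status = Process.wait2(pid)
  status.success?
end

puts "child succeeded: #{run_gc_in_child}"
```

This avoids using an exception for control flow, but it does nothing for a platform that advertises fork and then crashes in the child, which is the case reported here.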

Updated by naruse (Yui NARUSE) over 12 years ago

I tried the code on NetBSD 5.1_STABLE (GENERIC) i386 Mon Oct 3 14:22:23 JST 2011 and it works.
So NetBSD recently broke something around pthread.

As far as I know, Ruby 1.9 on NetBSD 4.0 can't fork because pthread can't be mixed with fork, as mame and kosaki said.
But NetBSD 5.0 or later seems to try to make them work together.
(I reported such a problem as kern/42772.)

So this should be handled as a Third Party's Issue and reported to NetBSD.
NetBSD may fix it if they treat it as a bug.

mame (Yusuke Endoh) wrote:

2012/4/24, kosaki (Motohiro KOSAKI) :

I think this is a kind of documentation issue. If you use C, you can't use
both thread and fork. It's obvious.
And almost people think ruby naturally has the same limitation because ruby
is written by C. But rudolf implicitly
pointed out some people don't think so.

No, this is not documentation issue.
The easy code actually causes SEGV on the platform, doesn't it?
I'm against remaining such a significant problem and adding ad-hoc
guards to the tests.

Ruby shouldn't cause SEGV, but

I suggest to make Kernel#fork raise a NotImplementedError on NetBSD
5.0+.
Fortunately, the tests already have a guard for NotImplementedError
because there is a supported platform that does not support
Kernel#fork (you know, windows).
Even on the platform, SEGV is not allowed, in principle.

Note, my suggestion is based on my uncertain guess about NetBSD 5.0+.
I'm not even a user of NetBSD. I think anyone should confirm if my
guess is correct.
Kosaki-san, if you seriously want to support NetBSD, I'd like you to
be a platform maintainer for NetBSD.

as I said above, it works with NetBSD 5.1.
So this is a bug added after 5.1.

Updated by naruse (Yui NARUSE) over 12 years ago

naruse (Yui NARUSE) wrote:

as I said above, it works with NetBSD 5.1.
So this is a bug added after 5.1.

I tried it on i386-netbsdelf6.99.4 (2012-04-13), and it works.
It may be an x86_64 issue, or already fixed in current, or something else.

Updated by kosaki (Motohiro KOSAKI) over 12 years ago

On Mon, Apr 23, 2012 at 11:17 PM, Yusuke Endoh wrote:

Hello,

2012/4/24, kosaki (Motohiro KOSAKI) :

I think this is a kind of documentation issue. If you use C, you can't use
both thread and fork. It's obvious.
And almost people think ruby naturally has the same limitation because ruby
is written by C. But rudolf implicitly
pointed out some people don't think so.

No, this is not documentation issue.
The easy code actually causes SEGV on the platform, doesn't it?

Any platform may SEGV on such code. In fact, it's not NetBSD-specific.
Please think about why thread+fork requires async-signal-safe functions.

Do you want to raise NotImplementedError on all platforms?

I'm against remaining such a significant problem and adding ad-hoc
guards to the tests.

I suggest to make Kernel#fork raise a NotImplementedError on NetBSD
5.0+.
Fortunately, the tests already have a guard for NotImplementedError
because there is a supported platform that does not support
Kernel#fork (you know, windows).
Even on the platform, SEGV is not allowed, in principle.

No. It works if the user doesn't use threads. So, one option is to make
fork after Thread.new raise an exception on all platforms.

Note, my suggestion is based on my uncertain guess about NetBSD 5.0+.
I'm not even a user of NetBSD.  I think anyone should confirm if my
guess is correct.
Kosaki-san, if you seriously want to support NetBSD, I'd like you to
be a platform maintainer for NetBSD.

Nope. I'm not a NetBSD user.

Updated by naruse (Yui NARUSE) over 12 years ago

kosaki (Motohiro KOSAKI) wrote:

On Mon, Apr 23, 2012 at 11:17 PM, Yusuke Endoh wrote:

I suggest to make Kernel#fork raise a NotImplementedError on NetBSD
5.0+.
Fortunately, the tests already have a guard for NotImplementedError
because there is a supported platform that does not support
Kernel#fork (you know, windows).
Even on the platform, SEGV is not allowed, in principle.

No. it works if user don't use threads. So, one option is, fork after
thread.new raise an exception on all platform.

It is wrong.
Ruby 1.9 makes timer thread even if there is only the main thread.

Updated by rudolf (r stu3) over 12 years ago

I am not sure if the following information is relevant, but I want you to have the complete picture.

I've tried to raise this issue on a NetBSD mailing list and got the following reply (http://mail-index.netbsd.org/current-users/2012/04/22/msg019961.html):
``Memory faults in any malloc implementation are mostly due to bugs is the application (use after free, overwrite end/start allocated buffer, etc).''

When trying with rubinius (yesterday's git snapshot of master branch), it survives without a hiccup.

Updated by kosaki (Motohiro KOSAKI) over 12 years ago

kosaki (Motohiro KOSAKI) wrote:

On Mon, Apr 23, 2012 at 11:17 PM, Yusuke Endoh wrote:
 > I suggest to make Kernel#fork raise a NotImplementedError on NetBSD
 > 5.0+.
 > Fortunately, the tests already have a guard for NotImplementedError
 > because there is a supported platform that does not support
 > Kernel#fork (you know, windows).
 > Even on the platform, SEGV is not allowed, in principle.

 No. it works if user don't use threads. So, one option is, fork after
 thread.new raise an exception on all platform.

It is wrong.
Ruby 1.9 makes timer thread even if there is only the main thread.

You don't understand the issue. On other OSes, using async-signal-unsafe functions
only causes a problem with libc locks that are never released. Therefore our current
fork code works: it kills the timer thread at once.

Updated by mame (Yusuke Endoh) over 12 years ago

Hello,

2012/4/24, KOSAKI Motohiro :

Do you want raise NotImplementError on all platform?

My answer is yes, if the problem actually occurs.

So, one option is, fork after thread.new raise
an exception on all platform.

Looks good.

Note, my suggestion is based on my uncertain guess about NetBSD 5.0+.
I'm not even a user of NetBSD. I think anyone should confirm if my
guess is correct.
Kosaki-san, if you seriously want to support NetBSD, I'd like you to
be a platform maintainer for NetBSD.

Nope. I'm not NetBSD user.

Why do all the tests need to pass on a platform that
has no maintainer?

--
Yusuke Endoh

Updated by kosaki (Motohiro KOSAKI) over 12 years ago

(4/24/12 6:55 AM), Yusuke Endoh wrote:

Hello,

2012/4/24, KOSAKI Motohiro:

Do you want raise NotImplementError on all platform?

My answer is yes, if the problem occurs actually.

So, one option is, fork after thread.new raise
an exception on all platform.

Looks good.

Okay. First, I'd like to add a warning and observe how many applications
complain.

Note, my suggestion is based on my uncertain guess about NetBSD 5.0+.
I'm not even a user of NetBSD. I think anyone should confirm if my
guess is correct.
Kosaki-san, if you seriously want to support NetBSD, I'd like you to
be a platform maintainer for NetBSD.

Nope. I'm not NetBSD user.

Why do all the tests need to be passed on a platform that
there is no maintainer for?

I think "no maintainer" does not mean we should ignore such a platform. It only means
we care a little less than for a supported one.

Updated by naruse (Yui NARUSE) over 12 years ago

(2012/04/24 19:44), KOSAKI Motohiro wrote:

kosaki (Motohiro KOSAKI) wrote:

On Mon, Apr 23, 2012 at 11:17 PM, Yusuke Endoh wrote:

I suggest to make Kernel#fork raise a NotImplementedError on NetBSD
5.0+.
Fortunately, the tests already have a guard for NotImplementedError
because there is a supported platform that does not support
Kernel#fork (you know, windows).
Even on the platform, SEGV is not allowed, in principle.

No. it works if user don't use threads. So, one option is, fork after
thread.new raise an exception on all platform.

It is wrong.
Ruby 1.9 makes timer thread even if there is only the main thread.

You don't understand the issue. On other OSs, async-signal-unsafe function usage
makes unlocked libc lock issue. therefore our current fork code works.
It kill timer thread
at once.

You don't know the original issue; what you are describing seems to be a more recent one.
See also http://bugs.ruby-lang.org/issues/show/270

--
NARUSE, Yui

Updated by kosaki (Motohiro KOSAKI) over 12 years ago

On Tue, Apr 24, 2012 at 12:13 PM, NARUSE, Yui wrote:

(2012/04/24 19:44), KOSAKI Motohiro wrote:

kosaki (Motohiro KOSAKI) wrote:

On Mon, Apr 23, 2012 at 11:17 PM, Yusuke Endoh wrote:
 > I suggest to make Kernel#fork raise a NotImplementedError on NetBSD
 > 5.0+.
 > Fortunately, the tests already have a guard for NotImplementedError
 > because there is a supported platform that does not support
 > Kernel#fork (you know, windows).
 > Even on the platform, SEGV is not allowed, in principle.

 No. it works if user don't use threads. So, one option is, fork after
 thread.new raise an exception on all platform.

It is wrong.
Ruby 1.9 makes timer thread even if there is only the main thread.

You don't understand the issue. On other OSs, async-signal-unsafe function usage
makes unlocked libc lock issue. therefore our current fork code works.
It kill timer thread
at once.

You don't know the original issue, what you are saying seems recent one.
see also http://bugs.ruby-lang.org/issues/show/270

I don't? Why? I even took part in the bug #270 discussion.

Updated by normalperson (Eric Wong) over 12 years ago

KOSAKI Motohiro wrote:

(4/24/12 6:55 AM), Yusuke Endoh wrote:

2012/4/24, KOSAKI Motohiro:

Do you want raise NotImplementError on all platform?

My answer is yes, if the problem occurs actually.

So, one option is, fork after thread.new raise
an exception on all platform.

Looks good.

okey. at first, I'd like to add a warnings and observe how much apps
makes whiny.

Shouldn't the presence of the GVL allow Thread.new + fork to be safe in
pure Ruby code? Also, the async-signal-safe requirement for fork() may
be dropped in future versions of POSIX:
http://austingroupbugs.net/view.php?id=62

Updated by kosaki (Motohiro KOSAKI) over 12 years ago

On Tue, Apr 24, 2012 at 2:25 PM, Eric Wong wrote:

KOSAKI Motohiro wrote:

(4/24/12 6:55 AM), Yusuke Endoh wrote:

2012/4/24, KOSAKI Motohiro:

Do you want raise NotImplementError on all platform?

My answer is yes, if the problem occurs actually.

So, one option is, fork after thread.new raise
an exception on all platform.

Looks good.

okey. at first, I'd like to add a warnings and observe how much apps
makes whiny.

Shouldn't the presence of the GVL allow Thread.new + fork to be safe in
pure Ruby code?  Also, the async-signal-safe requirement for fork() may
be dropped in future versions of POSIX:
http://austingroupbugs.net/view.php?id=62

I think you misunderstand this URL. Currently fork() itself is listed as an
async-signal-safe function (i.e. one that can be called from a signal handler),
but it shouldn't be.
Linux and other OSes can't implement it that way because we have pthread_atfork(),
therefore the Austin Group plans to remove the false statement. We still can't call
async-signal-unsafe functions after fork in a multi-threaded environment.

Updated by naruse (Yui NARUSE) over 12 years ago

The following patch to NetBSD fixes this issue.
Thanks, @_enami.

Index: pthread.c

RCS file: /cvsroot/src/lib/libpthread/pthread.c,v
retrieving revision 1.133
diff -u -p -r1.133 pthread.c
--- pthread.c 22 Mar 2012 20:01:18 -0000 1.133
+++ pthread.c 24 Apr 2012 16:15:55 -0000
@@ -240,7 +240,7 @@ pthread__fork_callback(void)
     struct __pthread_st *self;
 
     /* lwpctl state is not copied across fork. */
-    if (_lwp_ctl(LWPCTL_FEATURE_CURCPU, &pthread__first->pt_lwpctl)) {
+    if (_lwp_ctl(LWPCTL_FEATURE_CURCPU, &pthread_self()->pt_lwpctl)) {
         err(1, "_lwp_ctl");
     }
     self = pthread__self();

Updated by rudolf (r stu3) over 12 years ago

naruse (Yui NARUSE) wrote:

Following patch to NetBSD fixes this issue.
thanks @_enami
[...]

I've tested the patch and can no longer reproduce the problem. Thanks!

Updated by naruse (Yui NARUSE) over 12 years ago

  • Status changed from Feedback to Third Party's Issue

rudolf (r stu3) wrote:

naruse (Yui NARUSE) wrote:

Following patch to NetBSD fixes this issue.
thanks @_enami
[...]

I've tested the patch and can no more reproduce the problem. Thanks!

enami committed it to NetBSD current as rev 1.134.
http://cvsweb.netbsd.org/bsdweb.cgi/src/lib/libpthread/pthread.c?rev=1.134&content-type=text/x-cvsweb-markup&only_with_tag=MAIN
Thank you for reporting!
