Bug #2025

closed

problem with pthread handling on non NPTL platform

Added by Petr.Salinger@seznam.cz (Petr Salinger) over 14 years ago. Updated over 11 years ago.

Status: Closed
Target version:
ruby -v: 1.9.1.243
Backport:
[ruby-core:25217]

Description

=begin
While trying to fix some test suite failures on GNU/kFreeBSD
(http://bugs.debian.org//cgi-bin/bugreport.cgi?bug=542927),
I observed some problems in the pthread-related code.
The hang in the first test in
http://redmine.ruby-lang.org/issues/show/1525
also affects us.

IMO, Ruby should try to work under any POSIX-conforming pthread
implementation, not only NPTL.
A code audit of this area seems to be needed.

There are some problems with the handling of fork()/exec().
Locks really should be reinitialized in the child, and
the timer should be started using pthread_once(). The current
approach is fragile and might start more than one timer thread.
http://www.opengroup.org/onlinepubs/9699919799/functions/pthread_once.html
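
To illustrate the pthread_once() suggestion, here is a minimal sketch. It is not Ruby's thread_pthread.c: the names ensure_timer_thread(), start_timer_thread(), timer_main() and stop_timer_thread() are hypothetical, and the timer body is a placeholder that returns immediately.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical names for illustration; not taken from Ruby's sources. */
static pthread_once_t timer_once = PTHREAD_ONCE_INIT;
static pthread_t timer_thread_id;
static int timer_start_count = 0;   /* how often the init routine ran */
static int timer_create_error = 0;  /* result of pthread_create() */

static void *timer_main(void *arg)
{
    (void)arg;
    /* A real timer thread would loop here; the sketch returns at once. */
    return NULL;
}

/* Runs at most once per process, however many threads race to call it. */
static void start_timer_thread(void)
{
    timer_start_count++;
    timer_create_error =
        pthread_create(&timer_thread_id, NULL, timer_main, NULL);
}

/* Safe to call from any thread, any number of times. */
int ensure_timer_thread(void)
{
    int err = pthread_once(&timer_once, start_timer_thread);
    return err != 0 ? err : timer_create_error;
}

/* The thread is joinable (the default), so joining it is well defined. */
int stop_timer_thread(void)
{
    return pthread_join(timer_thread_id, NULL);
}
```

Calling ensure_timer_thread() repeatedly starts only one thread. Note that the once-control state is inherited across fork(), so restarting the timer in a child needs separate handling.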

In general, I do not understand how the following code in thread_pthread.c:

static pthread_t timer_thread_id;
static pthread_cond_t timer_thread_cond = PTHREAD_COND_INITIALIZER;
static pthread_mutex_t timer_thread_lock = PTHREAD_MUTEX_INITIALIZER;
rb_thread_create_timer_thread()
thread_timer()

could correctly survive fork(); see also
http://www.opengroup.org/onlinepubs/009695399/functions/pthread_atfork.html
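
One possible way to keep a statically initialized lock usable across fork() is to register handlers with pthread_atfork(). The sketch below is an assumption-laden illustration, not a patch: timer_lock, timer_running and all function names are hypothetical. The child handler unlocks the mutex that the prepare handler acquired, rather than calling pthread_mutex_init() on it, and resets the state that describes a thread which does not exist in the child.

```c
#include <assert.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical state for illustration; not taken from Ruby's sources. */
static pthread_mutex_t timer_lock = PTHREAD_MUTEX_INITIALIZER;
static int timer_running = 0;

/* Take the lock before fork() so no other thread holds it mid-update. */
static void atfork_prepare(void) { pthread_mutex_lock(&timer_lock); }

/* The parent simply releases the lock again. */
static void atfork_parent(void) { pthread_mutex_unlock(&timer_lock); }

/* The child has only the forking thread: release the lock and forget
 * about the timer thread, which was not duplicated by fork(). */
static void atfork_child(void)
{
    timer_running = 0;
    pthread_mutex_unlock(&timer_lock);
}

int install_fork_handlers(void)
{
    return pthread_atfork(atfork_prepare, atfork_parent, atfork_child);
}

/* Forks and returns 0 if the child found the lock usable and the
 * timer marked as not running; returns nonzero otherwise. */
int fork_and_check_child(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        int ok = timer_running == 0
              && pthread_mutex_lock(&timer_lock) == 0
              && pthread_mutex_unlock(&timer_lock) == 0;
        _exit(ok ? 0 : 1);
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```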

I really doubt that the following code in process.c,
from rb_f_fork(VALUE obj), is correct:

  switch (pid = rb_fork(0, 0, 0, Qnil)) {
    case 0:
#ifdef linux
      after_exec();
#endif
      rb_thread_atfork();
      if (rb_block_given_p()) {
          int status;

          rb_protect(rb_yield, Qundef, &status);
          ruby_stop(status);
      }

The conditional after_exec() shouldn't be here.
There is already an "after_fork()" at line 2331,
which is executed for both parent and child.
The exception is when chfunc is not NULL;
in that case it is not executed at all.

The bug is timing dependent, i.e. there is a race condition.
Sometimes the child process has 2 timer threads, sometimes
it has the expected 1.

Getting 2 is merely more likely on linuxthreads than on NPTL;
it can happen under any pthread implementation.

Ruby should not create a thread with PTHREAD_CREATE_DETACHED and then call pthread_join() on it.
http://www.opengroup.org/onlinepubs/9699919799/functions/pthread_join.html:
"The behavior is undefined if the value specified by the thread argument
to pthread_join() does not refer to a joinable thread."
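
As a sketch of the joinable alternative (hypothetical names, not Ruby's code): a thread that will later be joined is created with PTHREAD_CREATE_JOINABLE, which is also the default, so the pthread_join() call is well defined.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical worker: stores a value and returns its argument. */
static void *worker(void *arg)
{
    int *value = arg;
    *value = 42;
    return arg;
}

/* Creates a joinable thread, waits for it, and returns 0 on success. */
int run_and_join(int *out)
{
    pthread_attr_t attr;
    pthread_t tid;
    void *result = NULL;
    int err;

    err = pthread_attr_init(&attr);
    if (err != 0)
        return err;

    /* Joinable is the default; it is set explicitly to show the intent. */
    err = pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);
    if (err == 0)
        err = pthread_create(&tid, &attr, worker, out);
    pthread_attr_destroy(&attr);
    if (err != 0)
        return err;

    /* Legal only because the thread was not created detached. */
    err = pthread_join(tid, &result);
    if (err != 0)
        return err;
    return result == out ? 0 : -1;
}
```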

Ruby should use pthread_sigmask() instead of sigprocmask()
when available, and so on.
http://www.opengroup.org/onlinepubs/9699919799/functions/pthread_sigmask.html:
"The use of the sigprocmask() function is unspecified in a
multi-threaded process."

This would work correctly on both linuxthreads/NPTL and should work on any
POSIX-conforming pthread implementation.
Ideally, Ruby would not require full conformance, but would also
accept some known exceptions, such as our getpid() difference.
=end


Related issues: 1 (0 open, 1 closed)

Related to Ruby master - Bug #1525: Deadlock in Ruby 1.9's VM caused by ConditionVariable.wait and fork? (Closed, ko1 (Koichi Sasada), 05/28/2009)