Project

General

Profile

Actions

Bug #1341

closed

pthread_cond_timedwait failing in 1.9.1-p0 thread tests on HP-UX 11i v2

Added by graza (Graham Agnew) almost 16 years ago. Updated about 12 years ago.

Status:
Rejected
Assignee:
-
Target version:
ruby -v:
ruby 1.9.1p0 (2009-01-30 revision 21907) [ia64-hpux11.23]
Backport:
[ruby-core:23082]

Description

=begin
I have been trying to compile and test 1.9.1-p0 on HP-UX 11i v2. When running the tests, the threads tests crash with the following bug:

[BUG] pthread_cond_timedwait: 22
ruby 1.9.1p0 (2009-01-30 revision 21907) [ia64-hpux11.23]

-- control frame ----------

-- Ruby level backtrace information-----------------------------------------

[NOTE]
You may encounter a bug of Ruby interpreter. Bug reports are welcome.
For details: http://www.ruby-lang.org/bugreport.html

The errno 22 means EINVAL. I put some print statements into the thread_pthread.c file to work out what was going on and it looks like there's a condition variable that is being initialised twice.
=end

Actions #1

Updated by yugui (Yuki Sonoda) over 15 years ago

  • Priority changed from Normal to 3
  • Target version changed from 1.9.1 to 2.0.0

=begin

=end

Actions #2

Updated by docwhat (Christian Höltje) about 15 years ago

=begin
This also happens when running "make test" in solaris but not "env RUBYLIB=./lib ./ruby test/ruby/test_thread.rb"

test_thread.rb ....bootstraptest.tmp.rb:6: [BUG] pthread_cond_timedwait: 22
ruby 1.9.1p243 (2009-07-16 revision 24175) [sparc-solaris2.8]

-- control frame ----------
c:0010 p:---- s:0028 b:0028 l:000027 d:000027 CFUNC :join
c:0009 p:0013 s:0024 b:0024 l:0018b8 d:000023 BLOCK bootstraptest.tmp.rb:6
c:0008 p:---- s:0020 b:0020 l:000019 d:000019 FINISH
c:0007 p:---- s:0018 b:0018 l:000017 d:000017 CFUNC :each
c:0006 p:0018 s:0015 b:0015 l:0018b8 d:002430 BLOCK bootstraptest.tmp.rb:3
c:0005 p:---- s:0012 b:0012 l:000011 d:000011 FINISH
c:0004 p:---- s:0010 b:0010 l:000009 d:000009 CFUNC :times
c:0003 p:0013 s:0007 b:0006 l:0018b8 d:001fd8 EVAL bootstraptest.tmp.rb:2
c:0002 p:---- s:0004 b:0004 l:000003 d:000003 FINISH
c:0001 p:0000 s:0002 b:0002 l:0018b8 d:0018b8 TOP

-- Ruby level backtrace information-----------------------------------------
bootstraptest.tmp.rb:6:in join' bootstraptest.tmp.rb:6:in block (2 levels) in '
bootstraptest.tmp.rb:3:in each' bootstraptest.tmp.rb:3:in block in '
bootstraptest.tmp.rb:2:in times' bootstraptest.tmp.rb:2:in '

[NOTE]
You may encounter a bug of Ruby interpreter. Bug reports are welcome.
For details: http://www.ruby-lang.org/bugreport.html

E......

It then hangs at this test.
=end

Actions #3

Updated by mame (Yusuke Endoh) over 14 years ago

  • Assignee set to mame (Yusuke Endoh)

=begin
Hi,

I guess this is the limitation of Solaris:

Maybe, HP-UX has the same limitation, though I cannot find the
evidence.

I wrote a workaround patch:

diff --git a/thread_pthread.c b/thread_pthread.c
index e6295db..7387724 100644
--- a/thread_pthread.c
+++ b/thread_pthread.c
@@ -633,6 +633,35 @@ native_sleep(rb_thread_t *th, struct timeval *tv)
(unsigned long)ts.tv_sec, ts.tv_nsec);
r = pthread_cond_timedwait(&th->native_thread_data.sleep_cond,
&th->interrupt_lock, &ts);

  •  if (r == EINVAL) {
    
  •      /* workaround for Solaris: wait by MEGA_SEC's.
    
  •       * on Solaris, pthread_cond_timedwait fails with EINVAL
    
  •       * if time is too far from now.  [Bug #1341]
    
  •       * - http://docs.sun.com/app/docs/doc/806-0630/6j9vkb8ct?a=view
    
  •       * - http://bugs.opensolaris.org/view_bug.do?bug_id=4038480
    
  •       */
    

+#define MEGA_SEC 1000000

  •      struct timeval ltv = *tv;
    
  •      r = ETIMEDOUT;
    
  •      while (r == ETIMEDOUT && ltv.tv_sec > MEGA_SEC) {
    
  •  	ts.tv_sec = tvn.tv_sec + MEGA_SEC;
    
  •  	ts.tv_nsec = tvn.tv_usec * 1000;
    
  •  	ltv.tv_sec -= MEGA_SEC;
    
  •  	r = pthread_cond_timedwait(&th->native_thread_data.sleep_cond,
    
  •  				   &th->interrupt_lock, &ts);
    
  •  	if (r && r != ETIMEDOUT) rb_bug_errno("pthread_cond_timedwait", r);
    
  •      }
    
  •      if (r == ETIMEDOUT) {
    
  •  	ts.tv_sec = tvn.tv_sec + ltv.tv_sec;
    
  •  	ts.tv_nsec = (tvn.tv_usec + ltv.tv_usec) * 1000;
    
  •  	if (ts.tv_nsec >= PER_NANO){
    
  •  		    ts.tv_sec += 1;
    
  •  		    ts.tv_nsec -= PER_NANO;
    
  •  	}
    
  •  	r = pthread_cond_timedwait(&th->native_thread_data.sleep_cond,
    
  •  				   &th->interrupt_lock, &ts);
    
  •      }
    
  •  }
     if (r && r != ETIMEDOUT) rb_bug_errno("pthread_cond_timedwait", r);
    
     thread_debug("native_sleep: pthread_cond_timedwait end (%d)\n", r);
    

--
Yusuke Endoh
=end

Actions #4

Updated by graza (Graham Agnew) over 14 years ago

=begin
Today I downloaded, patched, and compiled the latest snapshot.

On HP-UX, that patch stopped the rb_bug_errno happening, although test/ruby/test_thread.rb scripts blocked indefinitely. On being interrupted it was the following test:

#874 test_thread.rb:34:in `<top (required)>':

The code in your patch doesn't look right to me. Shouldn't the code re-fetch the time using gettimeofday each time through the loop, and then add the MEGA_SEC to that? As it is, it's wrong because it adds a MEGA_SEC to tvn, so after one MEGA_SEC it will enter a hard loop.

Cheers,
Gra.
=end

Actions #5

Updated by mame (Yusuke Endoh) over 14 years ago

=begin
Hi,

2010/5/6 Graham Agnew :

On HP-UX, that patch stopped the rb_bug_errno happening, although the test/ruby/test_io.rb and test/ruby/test_thread.rb scripts both blocked indefinitely.

The code in your patch doesn't look right to me. ?Shouldn't the code re-fetch the time using gettimeofday each time through the loop, and then add the MEGA_SEC to that? ?As it is, it's wrong because it adds a MEGA_SEC to tvn, so after one MEGA_SEC it will enter a hard loop.

Thank you for your testing! How about the following patch?

To tell the truth, I'm writing a patch without test because I
don't have HP-UX. If this is wrong again, It is really helpful
for you to correct the patch by yourself.

diff --git a/thread_pthread.c b/thread_pthread.c
index e6295db..3c13f72 100644
--- a/thread_pthread.c
+++ b/thread_pthread.c
@@ -631,8 +631,29 @@ native_sleep(rb_thread_t *th, struct timeval *tv)
int r;
thread_debug("native_sleep: pthread_cond_timedwait start (%ld, %ld)\n",
(unsigned long)ts.tv_sec, ts.tv_nsec);

  •    again:
     r = pthread_cond_timedwait(&th->native_thread_data.sleep_cond,
     			   &th->interrupt_lock, &ts);
    
  •  if (r == EINVAL) {
    
  •      /* workaround for Solaris: wait by MEGA_SEC's.
    
  •       * on Solaris, pthread_cond_timedwait fails with EINVAL
    
  •       * if time is too far from now.  [Bug #1341]
    
  •       * - http://docs.sun.com/app/docs/doc/806-0630/6j9vkb8ct?a=view
    
  •       * - http://bugs.opensolaris.org/view_bug.do?bug_id=4038480
    
  •       */
    

+#define MEGA_SEC 1000000

  •      struct timespec lts;
    
  •      r = ETIMEDOUT;
    
  •      while (r == ETIMEDOUT) {
    
  •  	gettimeofday(&tvn, NULL);
    
  •  	lts.tv_sec = tvn.tv_sec + MEGA_SEC;
    
  •  	lts.tv_nsec = tvn.tv_usec * 1000;
    
  •  	if (lts.tv_sec >= ts.tv_sec) goto again;
    
  •  	r = pthread_cond_timedwait(&th->native_thread_data.sleep_cond,
    
  •  				   &th->interrupt_lock, &lts);
    
  •  	if (r && r != ETIMEDOUT) rb_bug_errno("pthread_cond_timedwait", r);
    
  •      }
    
  •  }
     if (r && r != ETIMEDOUT) rb_bug_errno("pthread_cond_timedwait", r);
    
     thread_debug("native_sleep: pthread_cond_timedwait end (%d)\n", r);
    

--
Yusuke Endoh

=end

Actions #6

Updated by graza (Graham Agnew) over 14 years ago

=begin
Hi Yusuke,

That code looks better although I'm still getting test/ruby/test_thread.rb blocking indefinitely. I will see if I can attach to the process with a debugger and figure out what it's blocked on.

Thanks,
Gra.
=end

Actions #7

Updated by shyouhei (Shyouhei Urabe) over 14 years ago

  • Status changed from Open to Assigned

=begin

=end

Updated by ko1 (Koichi Sasada) over 13 years ago

Endo-san,

Can we close this issue?

Updated by mame (Yusuke Endoh) over 13 years ago

I guess it still reproduces on Solaris.
I have no idea about HP-UX.

--
Yusuke Endoh

Updated by mame (Yusuke Endoh) over 13 years ago

  • Status changed from Assigned to Open
  • Assignee deleted (mame (Yusuke Endoh))

Hello,

Now, following two links are dead. Do anyone know new URLs?

Here.

http://download.oracle.com/docs/cd/E19683-01/816-0216/6m6ngupgv/index.html

EINVAL
Invalid argument. For cond_init(), type is not a recognized type. For cond_timedwait(), the specified number of seconds, abstime, is greater than current_time + 100,000,000, where current_time is the current time, or the number of nanoseconds is greater than or equal to 1,000,000,000.

The problem that I now focus on is that pthread_cond_timedwait may fail
with EINVAL if an argument is greater than current_time + 100,000,000
on Solaris.
The patch of [ruby-core:29702] is too old and cannot be applied, so
I rewrote and committed a new patch at r32409. Now, "make test" passes
on Solaris. Congrats.

Unfortunately, the original issue that OP reported was a different problem.
But I guess that there is no hope of fixing the issue. At least I cannot.
So I resign the assignee of this ticket. Sorry for late action.

--
Yusuke Endoh

Updated by naruse (Yui NARUSE) over 13 years ago

  • Status changed from Open to Feedback

Feedback about HP-UX is welcome

Updated by kosaki (Motohiro KOSAKI) over 13 years ago

  • Subject changed from pthread_cond_timedwait failing in 1.9.1-p0 thread tests to pthread_cond_timedwait failing in 1.9.1-p0 thread tests on HP-UX 11i v2

Updated by ko1 (Koichi Sasada) about 12 years ago

  • Target version changed from 2.0.0 to 2.6

Please tell us if you have HP-UX.

Updated by mame (Yusuke Endoh) about 12 years ago

  • Status changed from Feedback to Rejected

HP-UX is not supported. I'm sorry, but please create a patch that works yourself.
If you provide us the patch and it looks benign to other platforms, we may apply it to trunk.

--
Yusuke Endoh

Actions

Also available in: Atom PDF

Like0
Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0