Project

General

Profile

Actions

Bug #6825

closed

forking and pthread_cond_timedwait: Invalid argument (EINVAL) on OS X / 1.9.3-p194

Added by xentronium (Mark A) over 11 years ago. Updated almost 7 years ago.

Status:
Closed
Target version:
-
[ruby-core:<unknown>]

Description

https://gist.github.com/47e48301aea114e7b1d3 here is the gist with required setup to reproduce bug. Also crash log and stdout.

It seems that forking is essential for this setup to crash. Also, if you use database connection in some way prior to forking, it might not crash (however, with more complex code it still does).

ruby 1.9.3p194 (2012-04-20 revision 35410) [x86_64-darwin10.8.0]
OS X 10.6.8

hostinfo output:

Mach kernel version:
	 Darwin Kernel Version 10.8.0: Tue Jun  7 16:33:36 PDT 2011; root:xnu-1504.15.3~1/RELEASE_I386
Kernel configured for up to 4 processors.
2 processors are physically available.
4 processors are logically available.
Processor type: i486 (Intel 80486)
Processors active: 0 1 2 3
Primary memory available: 8.00 gigabytes
Default processor set: 88 tasks, 627 threads, 4 processors
Load average: 0.55, Mach factor: 3.43

compiled with gcc version:

i686-apple-darwin10-gcc-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5666) (dot 3)
Actions #1

Updated by xentronium (Mark A) over 11 years ago

Also notably ruby 1.8.7 does not crash.

Actions #2

Updated by xentronium (Mark A) over 11 years ago

Ubuntu 11.04 doesn't crash with 1.9 ruby either.

Updated by xentronium (Mark A) over 11 years ago

Ruby 1.9.2 doesn't crash either.

Updated by xentronium (Mark A) over 11 years ago

ruby 2.0.0dev (2012-08-02 trunk 36596) [x86_64-darwin10.8.0] also does not crash.

Updated by xentronium (Mark A) over 11 years ago

OS X 10.8 seems to be unaffected.

Updated by xentronium (Mark A) over 11 years ago

Confirmed with another 10.6 / 1.9.3-p194.

Updated by xentronium (Mark A) over 11 years ago

After some more fiddling it looks like it's mysql2 problem, not ruby's. Will duplicate issue there.

Actions #8

Updated by drbrain (Eric Hodel) over 11 years ago

  • Status changed from Open to Feedback

Updated by xentronium (Mark A) over 11 years ago

Update: I greatly simplified my test-case.

https://gist.github.com/47e48301aea114e7b1d3

If I remove sleeping threads on line 31 or require 'active_record' on line 1, bug stops reproducing.

Returning back to ruby-lang, as there is no mysql2 there anymore. Crash is still with the same error.

Ideas, suggestions?

Updated by xentronium (Mark A) over 11 years ago

Oh, it also reproduced on ruby 2.0.0.dev from current git, so I guess it is still not fixed.

Updated by xentronium (Mark A) over 11 years ago

Another update: I opened up a file active_record.rb inside installed activerecord gem and completely commented it out (so that even ActiveRecord is not defined after require 'active_record'). Still crashes. I guess that takes care of gems and everything, so the problem should be between ruby, rubygems and standard library.

Line require 'active_record' is still required for whole setup to crash for some reason.

Hope that gives you some idea.

Updated by drbrain (Eric Hodel) over 11 years ago

=begin
I can't reproduce on OS X 10.8 ruby 2.0.0dev (2012-08-03 trunk 36602) [x86_64-darwin12.0.0]

I modified your script to remove require 'active_record' and altered the main thread to sleep forever. This ensures that mysql and other C extensions are not loaded. It ran for over two minutes without problems.

Can you reproduce this with require 'mysql' and not active_record?

Can you show the console output with your modified active_record.rb (the loaded features section is of particular interest).

Here is what I used:

require 'net/http'

Thread.abort_on_exception = true
class Worker
def initialize
@tasks = []
work
end

def work
  Thread.new do
    loop do
      task = nil
      task = @tasks.shift if @tasks.length > 0
      task.call if task
      sleep(0.25)
    end
  end
end

def schedule(&block)
  @tasks << block
end

end

pid = fork do
class TestLoop
def initialize
@worker = Worker.new
(1..10).map { Thread.new { loop { sleep(0.5) } } }
end

  def run
    loop do
      @worker.schedule { puts Net::HTTP.get("github.com", "/").length }
      sleep(0.25)
    end
  end
end

TestLoop.new.run

end

sleep

(I don't have mysql installed to check.)

=end

Updated by xentronium (Mark A) over 11 years ago

I can't reproduce on OS X 10.8 ruby 2.0.0dev (2012-08-03 trunk 36602) [x86_64-darwin12.0.0]

That seems right, 10.8 seems to be unaffected.

Can you show the console output with your modified active_record.rb (the loaded features section is of particular interest).

1.9.3p194 :002 > require 'active_record'
 => true 
1.9.3p194 :003 > $LOADED_FEATURES.each(&method(:puts))
enumerator.so
/Users/mark/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/x86_64-darwin10.8.0/enc/encdb.bundle
/Users/mark/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/x86_64-darwin10.8.0/enc/trans/transdb.bundle
/Users/mark/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/site_ruby/1.9.1/rubygems/defaults.rb
/Users/mark/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/x86_64-darwin10.8.0/rbconfig.rb
/Users/mark/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/site_ruby/1.9.1/rubygems/deprecate.rb
/Users/mark/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/site_ruby/1.9.1/rubygems/exceptions.rb
/Users/mark/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/site_ruby/1.9.1/rubygems/custom_require.rb
/Users/mark/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/site_ruby/1.9.1/rubygems.rb
/Users/mark/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/e2mmap.rb
/Users/mark/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/irb/init.rb
/Users/mark/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/irb/workspace.rb
/Users/mark/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/irb/inspector.rb
/Users/mark/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/irb/context.rb
/Users/mark/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/irb/extend-command.rb
/Users/mark/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/irb/output-method.rb
/Users/mark/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/irb/notifier.rb
/Users/mark/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/irb/slex.rb
/Users/mark/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/irb/ruby-token.rb
/Users/mark/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/irb/ruby-lex.rb
/Users/mark/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/irb/src_encoding.rb
/Users/mark/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/irb/magic-file.rb
/Users/mark/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/x86_64-darwin10.8.0/readline.bundle
/Users/mark/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/irb/input-method.rb
/Users/mark/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/irb/locale.rb
/Users/mark/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/irb.rb
/Users/mark/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/site_ruby/1.9.1/rubygems/version.rb
/Users/mark/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/site_ruby/1.9.1/rubygems/requirement.rb
/Users/mark/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/site_ruby/1.9.1/rubygems/platform.rb
/Users/mark/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/site_ruby/1.9.1/rubygems/specification.rb
/Users/mark/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/site_ruby/1.9.1/rubygems/path_support.rb
/Users/mark/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/site_ruby/1.9.1/rubygems/dependency.rb
/Users/mark/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/irb/completion.rb
/Users/mark/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/prettyprint.rb
/Users/mark/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/pp.rb
/Users/mark/.rvm/scripts/irbrc.rb
/Users/mark/.rvm/gems/ruby-1.9.3-p194@ruby-pthread-bug/gems/activerecord-3.2.7/lib/active_record.rb

If rubygems don't get hit (as in add gem directory to $LOAD_PATH manually), bug doesn't seem to trigger. I am not sure, whether rubygems themselves are the reason or something non-trivial needs to happen for it to show itself.

Actions #14

Updated by xentronium (Mark A) over 11 years ago

Switching require 'active_record' with require 'mysql2' still crashes the interpreter as long as required file is taken using gems mechanism.

Updated by drbrain (Eric Hodel) over 11 years ago

  • Status changed from Feedback to Open

Updated by kosaki (Motohiro KOSAKI) over 11 years ago

At least, require 'mysql2' version nor drbrain version don't crash on my Mountain Lion environment.

Updated by xentronium (Mark A) over 11 years ago

Yeah, that seems to be restricted to snow leopard.

Updated by mame (Yusuke Endoh) over 11 years ago

  • Status changed from Open to Assigned
  • Assignee set to kosaki (Motohiro KOSAKI)

Kosaki-san, do you have any idea to addres this issue?

--
Yusuke Endoh

Updated by kosaki (Motohiro KOSAKI) over 11 years ago

  • Assignee changed from kosaki (Motohiro KOSAKI) to mrkn (Kenta Murata)

Kosaki-san, do you have any idea to addres this issue?

I have no idea. unfortunately snow leopard is too old and i have no chance to get it.
@mrkn (Kenta Murata), do you have any chance to see this issue?

Updated by mrkn (Kenta Murata) over 11 years ago

I don't have snow-leopard environment, so I cannot investigate this issue.

Updated by kosaki (Motohiro KOSAKI) over 11 years ago

  • Status changed from Assigned to Closed

I don't have snow-leopard environment, so I cannot investigate this issue.

OK. Thank you.
This looks like old OS X bug but we have no way to dig. give up. I'd like to close this feature as won't fix.
Anyway snow leopard is no longer supported.

To Mark, please reopen when you find exact reason and fixing way. we are very sorry for inconvenience.

Updated by Eregon (Benoit Daloze) over 11 years ago

mrkn (Kenta Murata) wrote:

I don't have snow-leopard environment, so I cannot investigate this issue.

I do still have a snow-leopard environment and I can reproduce with
ruby 1.9.3p125 (2012-02-16 revision 34643) [x86_64-darwin10.8.0]
and trunk (r37359). I'll try to investigate if I find time.

Updated by Eregon (Benoit Daloze) over 11 years ago

  • Status changed from Closed to Assigned
  • Assignee changed from mrkn (Kenta Murata) to Eregon (Benoit Daloze)

This is also true with latest trunk (r37462).

I'm assigning to myself.

bug6825.rb:31: [BUG] pthread_cond_timedwait: Invalid argument (EINVAL)
ruby 2.0.0dev (2012-11-04 trunk 37462) [x86_64-darwin10.8.0]

-- Control frame information -----------------------------------------------
c:0005 p:---- s:0013 e:000012 CFUNC :sleep
c:0004 p:0007 s:0009 e:000008 BLOCK bug6825.rb:31 [FINISH]
c:0003 p:---- s:0007 e:000006 CFUNC :loop
c:0002 p:0005 s:0004 e:000003 BLOCK bug6825.rb:31 [FINISH]
c:0001 p:---- s:0002 e:000001 TOP [FINISH]

bug6825.rb:31:in block (2 levels) in initialize' bug6825.rb:31:in loop'
bug6825.rb:31:in block (3 levels) in initialize' bug6825.rb:31:in sleep'

Updated by Eregon (Benoit Daloze) over 11 years ago

I poked around an produced a core dump (the bug would not reproduce under gdb with a breakpoint set).

Arguments to pthread_cond_timedwait() seem valid, in particular the timespec is about 500ms in the future.
Other calls to pthread_cond_timedwait() always return ETIMEDOUT or 0.

I saw rb_thread_t::native_thread_data.sleep_cond was weirdly initialized.
It is not initialized in native_thread_init() if HAVE_PTHREAD_CONDATTR_INIT is undefined.
And it is used in any case in ubf_pthread_cond_signal().
Maybe checks for HAVE_PTHREAD_CONDATTR_INIT should not be done in native_thread_init() and native_thread_destroy() since these functions already do the right checks?
This should be unrelated though, since OS X has pthread_condattr_init().

It might be related to GVL release by multiple threads but I have no clue.
It does not seem related directly to the parallel DNS resolution, since some traces have only threads in native_cond_timedwait().
And from the "only reproducible on snow-leopard" argument, it seems snow-leopard pthread's bug.

@kosaki (Motohiro KOSAKI) @mrkn (Kenta Murata) Would it be useful if I could provide you the core dump and other info?

Updated by kosaki (Motohiro KOSAKI) over 11 years ago

@kosaki (Motohiro KOSAKI) @mrkn (Kenta Murata) Would it be useful if I could provide you the core dump and other info?

Thanks!
r35672 seems broke this area and I'll fix it soon. However there is no r35672 in 1.9.3 branch and 1.9.3 seems correct. hmm...
Could you please try 1.9.3 branch too?

Actions #26

Updated by kosaki (Motohiro KOSAKI) over 11 years ago

  • Status changed from Assigned to Closed
  • % Done changed from 0 to 100

This issue was solved with changeset r37474.
Mark, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.


  • thread_pthread.c (native_thread_init, native_thread_destroy):
    removed HAVE_PTHREAD_CONDATTR_INIT check because this silly
    #ifdef makes use-uninitialized-var issue and (2) native_cond_initialize()
    already have a right platform and caller don't need any additional care.
    [Bug #6825]

Updated by kosaki (Motohiro KOSAKI) over 11 years ago

  • Status changed from Closed to Assigned

Reopened. because r37474 only fixed trunk.

Actions #28

Updated by usa (Usaku NAKAMURA) over 11 years ago

  • Tracker changed from Bug to Backport
  • Project changed from Ruby master to Backport193
  • Assignee changed from Eregon (Benoit Daloze) to usa (Usaku NAKAMURA)

Kosaki-san, you can move a ticket to Backport because you are a committer.
So, pelase do so instead of only changing the status to Open.

Updated by Eregon (Benoit Daloze) over 11 years ago

kosaki (Motohiro KOSAKI) wrote:

@kosaki (Motohiro KOSAKI) @mrkn (Kenta Murata) Would it be useful if I could provide you the core dump and other info?

Thanks!
r35672 seems broke this area and I'll fix it soon. However there is no r35672 in 1.9.3 branch and 1.9.3 seems correct. hmm...
Could you please try 1.9.3 branch too?

Unfortunately, r37474 does not seem to solve the problem (but it was definitely a potential problem).
This is expected because snow leopard has pthread_condattr_init().
So I don't know the reason for the bug.

Actions #30

Updated by kosaki (Motohiro KOSAKI) over 11 years ago

  • Tracker changed from Backport to Bug
  • Project changed from Backport193 to Ruby master
  • Assignee changed from usa (Usaku NAKAMURA) to Eregon (Benoit Daloze)

Hi Eregon,

Oops, I'm sorry. Perhaps I'm still overlooking anything else. Can you please share me your config.h and core file and build revision number?
I'm willing to look core file myself.

Updated by Eregon (Benoit Daloze) almost 7 years ago

  • Status changed from Assigned to Closed

This bug report is on a unsupported Ruby version, closing.

Actions

Also available in: Atom PDF

Like0
Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0