Project

General

Profile

Actions

Bug #6246

closed

1.9.3-p125 intermittent segfault

Added by jshow (Jodi Showers) over 12 years ago. Updated over 12 years ago.

Status:
Closed
Target version:
-
ruby -v:
ruby 1.9.3p125 (2012-02-16 revision 34643) [x86_64-linux]
Backport:
[ruby-core:44084]

Description

I've seen this behavior in 1.9.3p0 and 1.9.3p125

Intermittently, a rails 3.2 rake task will break while compiling assets - about 1 in 10 runs will segfault. Our code is 10K's of lines of proprietary code. I am working on a simplified version you can use.

RAILS_ENV=new_staging bundle exec rake assets:precompile --trace

command: https://gist.github.com/2254457 (stacktrace and gem list)

backtrace : https://gist.github.com/2286260

Operating System : Debian Squeeze 6.0.4
Ruby installed through RVM 1.11.3


Files

noname (500 Bytes) noname Anonymous, 04/03/2012 07:53 AM
noname (500 Bytes) noname Anonymous, 04/04/2012 02:23 AM
noname (500 Bytes) noname Anonymous, 04/04/2012 11:53 AM
noname (500 Bytes) noname Anonymous, 04/04/2012 11:53 AM
valgrind_full.txt.zip (781 KB) valgrind_full.txt.zip valgrind - verbose jshow (Jodi Showers), 04/05/2012 12:36 AM

Updated by Anonymous over 12 years ago

On Tue, Apr 03, 2012 at 04:22:04AM +0900, jshow (Jodi Showers) wrote:

Issue #6246 has been reported by jshow (Jodi Showers).


Bug #6246: 1.9.3-p125 intermittent segfault
https://bugs.ruby-lang.org/issues/6246

Author: jshow (Jodi Showers)
Status: Open
Priority: Normal
Assignee:
Category:
Target version:
ruby -v: ruby 1.9.3p125 (2012-02-16 revision 34643) [x86_64-linux]

I've seen this behavior in 1.9.3p0 and 1.9.3p125

Intermittently, a rails 3.2 rake task will break while compiling assets - about 1 in 10 runs will segfault. Our code is 10K's of lines of proprietary code. I am working on a simplified version you can use.

RAILS_ENV=new_staging bundle exec rake assets:precompile --trace

command: https://gist.github.com/2254457 (stacktrace and gem list)

backtrace : https://gist.github.com/2286260

Operating System : Debian Squeeze 6.0.4
Ruby installed through RVM 1.11.3

Can you either provide the application so that we can try to reproduce
the issue, somehow reduce the list of moving parts, or run valgrind against
your app?

The size of your gem list is daunting, it could be anything in there.

If you need, I can give you the valgrind command to use. :-)

--
Aaron Patterson
http://tenderlovemaking.com/

Updated by jshow (Jodi Showers) over 12 years ago

yes please - never used valgrind but will learn.

thanks

Updated by jshow (Jodi Showers) over 12 years ago

Aaron, I ran a valgrind as follows:

valgrind /home/deploy/.rvm/rubies/ruby-1.9.3-p125/bin/ruby -Iext:test /home/deploy/.rvm/gems/ruby-1.9.3-p125@boogala/bin/bundle exec rake RAILS_ENV=new_staging assets:precompile

the results are here: https://gist.github.com/2288812

Long before the actual startup of the rake task (rails boot) there were a great number of valgrind outputs -

==11285== Use of uninitialised value of size 8
==11285== at 0x4EA3203: gc_mark_children (gc.c:1867)
==11285== by 0x4EA491C: mark_current_machine_context (gc.c:1650)
==11285== by 0x4EA4A3D: gc_marks (gc.c:2474)
==11285== by 0x4EA5DBF: rb_newobj (gc.c:2195)
==11285== by 0x4F46115: str_alloc (string.c:373)
==11285== by 0x4FA0132: vm_call0 (vm_eval.c:79)
==11285== by 0x4FA0CC9: rb_funcall (vm_eval.c:236)
==11285== by 0x4ED926E: rb_obj_alloc (object.c:1603)
==11285== by 0x4EDA835: rb_obj_dup (object.c:315)
==11285== by 0xDE7034D: path_initialize (pathname.c:41)
==11285== by 0x4FA0132: vm_call0 (vm_eval.c:79)
==11285== by 0x4ED931F: rb_class_new_instance (object.c:1636)
==11285==

I can send all of these if you feel they'll help. None of these were outputted after the actual rake task began.

Updated by Anonymous over 12 years ago

On Tue, Apr 03, 2012 at 11:15:14AM +0900, jshow (Jodi Showers) wrote:

Issue #6246 has been updated by jshow (Jodi Showers).

Aaron, I ran a valgrind as follows:

valgrind /home/deploy/.rvm/rubies/ruby-1.9.3-p125/bin/ruby -Iext:test /home/deploy/.rvm/gems/ruby-1.9.3-p125@boogala/bin/bundle exec rake RAILS_ENV=new_staging assets:precompile

Try running valgrind with these flags:

--num-callers=50 --error-limit=no --partial-loads-ok=yes --undef-value-errors=no

Post all of the errors in a gist if you don't mind.

Can you also try your code against edge 1.9.3? Nari-san has fixed a
segv error in that branch. If we're lucky it's the same issue, and we
don't have to debug anymore. ;)

--
Aaron Patterson
http://tenderlovemaking.com/

Updated by jshow (Jodi Showers) over 12 years ago

Aaron,

attached is the full output from valgrind using your options - https://gist.github.com/2296847

I'm presently looking for the edge branch - not seeing it. Do you mean head?

thanks

Updated by Anonymous over 12 years ago

On Wed, Apr 04, 2012 at 10:18:26AM +0900, jshow (Jodi Showers) wrote:

Issue #6246 has been updated by jshow (Jodi Showers).

Aaron,

attached is the full output from valgrind using your options - https://gist.github.com/2296847

I'm presently looking for the edge branch - not seeing it. Do you mean head?

I guess? I mean the ruby_1_9_3 branch:

https://github.com/ruby/ruby/tree/ruby_1_9_3

:-)

--
Aaron Patterson
http://tenderlovemaking.com/

Updated by Anonymous over 12 years ago

On Wed, Apr 04, 2012 at 10:18:26AM +0900, jshow (Jodi Showers) wrote:

Issue #6246 has been updated by jshow (Jodi Showers).

Aaron,

attached is the full output from valgrind using your options - https://gist.github.com/2296847

I'm thinking that your segv is happening in a subshell (or not the
process on which you're running valgrind).

Are there hours when I can meet you on IRC? It might be faster than
mail exchange here. :-)

--
Aaron Patterson
http://tenderlovemaking.com/

Actions #8

Updated by jshow (Jodi Showers) over 12 years ago

I'm in the north american eastern time zone

I can meet today if you can

11am eastern is doable for me - just over an hour away

In the meantime I'll start trying ruby-head

much thanks Aaron

Updated by jshow (Jodi Showers) over 12 years ago

Aaron, I've updated this gist with a run that uses a more direct command line - which I think was hiding some of the valgrind output

valgrind --num-callers=50 --error-limit=no --partial-loads-ok=yes --undef-value-errors=no rake RAILS_ENV=new_staging assets:precompile

https://gist.github.com/2296847

still it seems we're not getting much value from valgrind - it's reporting 0 errors from 0 contexts

I'm re-running now with -v and --leak-check=full for more

Updated by jshow (Jodi Showers) over 12 years ago

attached is a more verbose valgrind

$ valgrind --num-callers=50 --error-limit=no --partial-loads-ok=yes --undef-value-errors=no -v --leak-check=full rake RAILS_ENV=new_staging assets:precompile

Updated by hectormalot (Dennis de Reus) over 12 years ago

I have a similar problem when creating a new rails (3.2.3) app. It segfaults on the bundle install step. I think this is related, so I won't create a new bugreport.

Ruby -v: ruby 1.9.3p125 (2012-02-16 revision 34643) [x86_64-darwin11.3.0]

I found this when I created a new rails (3.2.3) project. Ruby gave a segfault when running bundle install. The details are in this gist: https://gist.github.com/2330262. I'm not sure when the issue arises exactly, but I've also had trouble running the tests from active_admin using ruby 1.9.3-p125, going back to 1.9.2 didn't gave any segmentation faults.

I'm sorry for not giving a more detailed bugreport, but I'm not that well versed in ruby yet. I hope that other with the same problem can add to this report.

Updated by jshow (Jodi Showers) over 12 years ago

Dennis, looking at your ruby trace, it looks like you've run into something else

It appears that you are on osx, and there are some documented (try google) about ssl and ruby - seems that the ssl libs shipped with osx are not compatible with the latest builds of ruby

Updated by mame (Yusuke Endoh) over 12 years ago

  • Status changed from Open to Assigned
  • Assignee set to authorNari (Narihiro Nakamura)

Hello,

Perhaps I could reproduce this issue with small example:

class Foo
def Foo.shutdown(*args)
p Bar.inspect.size
end
Bar = (0..300).map do |x|
x = x.to_s
ObjectSpace.define_finalizer(x, Foo.method(:shutdown))
x
end
end

This code causes segfault in my system.
The number (300) may vary depending on a system.

This looks like a bug of GC finalizer.
A finalizer invoked at process termination, seems to refer
GC'ed object, which lead to segfault. I guess.

In fact, the "logging" gem, which is included in OP's trace,
seems to use define_finalizer as a substitute for at_exit.

https://github.com/TwP/logging/blob/master/lib/logging.rb#L552

Nari, please check it.

--
Yusuke Endoh

Updated by jshow (Jodi Showers) over 12 years ago

Yusuke Endoh,

I have tried your example, but unfortunately couldn't cause a crash. I experimented with various values of as high as 3000.

I would appreciate whatever effort can be applied - I am literally stuck as 125 crashes so often it's unusable.

If I can provide more details I would be happy to do so

OT, but I don't receive any emails from redmine - I'm watching this issue, and have the general mail settings turned on. Hard to follow - I presently come back to visit regularly.

thanks.

Updated by normalperson (Eric Wong) over 12 years ago

"jshow (Jodi Showers)" wrote:

I would appreciate whatever effort can be applied - I am literally
stuck as 125 crashes so often it's unusable.

As Aaron mentioned, did you get a chance to try the ruby_1_9_3 branch?
Now that Ruby 1.9.3-p194 is out, maybe it's easier to give that a try.
Again, if we're lucky we it's the same issue we encountered with
Nokogiri and fixed in 1.9.3-p194.

OT, but I don't receive any emails from redmine - I'm watching this
issue, and have the general mail settings turned on. Hard to follow -
I presently come back to visit regularly.

I'm not sure if this redmine is configured emails users directly, but
subscribing to the ruby-core list works.

Updated by mame (Yusuke Endoh) over 12 years ago

Hello,

2012/4/24, jshow (Jodi Showers) :

I have tried your example, but unfortunately couldn't cause a crash. I
experimented with various values of as high as 3000.

It is difficult to reproduce a problem about GC with a small code.
Indeed, my example does not cause SEGV on my another system.

Can you try the following patch?
This is an ad-hoc patch that stops finalizer call at exit.
SEGV should disappear if my guess is right.
If it still causes SEGV, my example is another issue.

diff --git a/eval.c b/eval.c
index 4dddcaa..b2358f8 100644
--- a/eval.c
+++ b/eval.c
@@ -98,7 +98,6 @@ ruby_finalize_1(void)
{
ruby_sig_finalize();
GET_THREAD()->errinfo = Qnil;

  • rb_gc_call_finalizer_at_exit();
    }

void

--
Yusuke Endoh

Updated by jshow (Jodi Showers) over 12 years ago

I've been using p194 for about a day now - no crashes yet

I'll notify you here over the next while

if the crashing resumes, I'll try the patch Yusuke.

much thanks to you both.

Updated by cout (Paul Brannan) over 12 years ago

/home/deploy/.rvm/rubies/ruby-1.9.3-p125/lib/libruby.so.1.9(rb_gc_call_finalizer_at_exit+0x219) [0x7f9f37246359] gc.c:3044

/home/deploy/.rvm/rubies/ruby-1.9.3-p125/lib/libruby.so.1.9(ruby_cleanup+0x1ec) [0x7f9f37233fac] eval.c:101
/home/deploy/.rvm/rubies/ruby-1.9.3-p125/lib/libruby.so.1.9(ruby_run_node+0x25) [0x7f9f37234205] eval.c:244

So we're cleaning up, and:

#0 st_lookup (table=0x0, key=792, value=0x7fffffffc858) at st.c:330
#1 0x00007ffff7b1629f in search_method (recv=,
mid=792) at vm_method.c:374
#2 rb_method_entry_get_without_cache (recv=,
mid=792) at vm_method.c:393

Someone called a method on an object that had already been GC'd.

That someone is in the logging gem:

ObjectSpace.define_finalizer self, Logging.method(:shutdown)

Ruby makes no guarantees about the order in which finalizers are called
at exit. This can be problematic, because the finalizer may need to
perform operations on objects that have already been GC'd.

I have avoided this before in C code by explicitly checking for a GC'd
object. But since the logging gem is ruby (not C), I don't know what it
should do.

IMO, it should have one finalizer per appender, rather than a single
finalizer that cleans up all the appenders. But I'm not sure that would
fix the crash.

Paul

Updated by jshow (Jodi Showers) over 12 years ago

Since my last post, I've continued to see the crash

But today I came across this bug report for the logging gem

https://github.com/TwP/logging/issues/30

which may very well be the source

I'll update my experiences here

Updated by authorNari (Narihiro Nakamura) over 12 years ago

  • Status changed from Assigned to Closed

Hi!

I think ruby-1.9.3-p194 fixed this bug.
if you see the crash with ruby-1.9.3-p194, please reopen this ticket.

Thanks.

Actions

Also available in: Atom PDF

Like0
Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0