Bug #6246
closed1.9.3-p125 intermittent segfault
Description
I've seen this behavior in 1.9.3p0 and 1.9.3p125
Intermittently, a rails 3.2 rake task will break while compiling assets - about 1 in 10 runs will segfault. Our code is 10K's of lines of proprietary code. I am working on a simplified version you can use.
RAILS_ENV=new_staging bundle exec rake assets:precompile --trace
command: https://gist.github.com/2254457 (stacktrace and gem list)
backtrace : https://gist.github.com/2286260
Operating System : Debian Squeeze 6.0.4
Ruby installed through RVM 1.11.3
Files
Updated by Anonymous over 12 years ago
On Tue, Apr 03, 2012 at 04:22:04AM +0900, jshow (Jodi Showers) wrote:
Issue #6246 has been reported by jshow (Jodi Showers).
Bug #6246: 1.9.3-p125 intermittent segfault
https://bugs.ruby-lang.org/issues/6246Author: jshow (Jodi Showers)
Status: Open
Priority: Normal
Assignee:
Category:
Target version:
ruby -v: ruby 1.9.3p125 (2012-02-16 revision 34643) [x86_64-linux]I've seen this behavior in 1.9.3p0 and 1.9.3p125
Intermittently, a rails 3.2 rake task will break while compiling assets - about 1 in 10 runs will segfault. Our code is 10K's of lines of proprietary code. I am working on a simplified version you can use.
RAILS_ENV=new_staging bundle exec rake assets:precompile --trace
command: https://gist.github.com/2254457 (stacktrace and gem list)
backtrace : https://gist.github.com/2286260
Operating System : Debian Squeeze 6.0.4
Ruby installed through RVM 1.11.3
Can you either provide the application so that we can try to reproduce
the issue, somehow reduce the list of moving parts, or run valgrind against
your app?
The size of your gem list is daunting, it could be anything in there.
If you need, I can give you the valgrind command to use. :-)
--
Aaron Patterson
http://tenderlovemaking.com/
Updated by jshow (Jodi Showers) over 12 years ago
yes please - never used valgrind but will learn.
thanks
Updated by jshow (Jodi Showers) over 12 years ago
Aaron, I ran a valgrind as follows:
valgrind /home/deploy/.rvm/rubies/ruby-1.9.3-p125/bin/ruby -Iext:test /home/deploy/.rvm/gems/ruby-1.9.3-p125@boogala/bin/bundle exec rake RAILS_ENV=new_staging assets:precompile
the results are here: https://gist.github.com/2288812
Long before the actual startup of the rake task (rails boot) there were a great number of valgrind outputs -
==11285== Use of uninitialised value of size 8
==11285== at 0x4EA3203: gc_mark_children (gc.c:1867)
==11285== by 0x4EA491C: mark_current_machine_context (gc.c:1650)
==11285== by 0x4EA4A3D: gc_marks (gc.c:2474)
==11285== by 0x4EA5DBF: rb_newobj (gc.c:2195)
==11285== by 0x4F46115: str_alloc (string.c:373)
==11285== by 0x4FA0132: vm_call0 (vm_eval.c:79)
==11285== by 0x4FA0CC9: rb_funcall (vm_eval.c:236)
==11285== by 0x4ED926E: rb_obj_alloc (object.c:1603)
==11285== by 0x4EDA835: rb_obj_dup (object.c:315)
==11285== by 0xDE7034D: path_initialize (pathname.c:41)
==11285== by 0x4FA0132: vm_call0 (vm_eval.c:79)
==11285== by 0x4ED931F: rb_class_new_instance (object.c:1636)
==11285==
I can send all of these if you feel they'll help. None of these were outputted after the actual rake task began.
Updated by Anonymous over 12 years ago
On Tue, Apr 03, 2012 at 11:15:14AM +0900, jshow (Jodi Showers) wrote:
Issue #6246 has been updated by jshow (Jodi Showers).
Aaron, I ran a valgrind as follows:
valgrind /home/deploy/.rvm/rubies/ruby-1.9.3-p125/bin/ruby -Iext:test /home/deploy/.rvm/gems/ruby-1.9.3-p125@boogala/bin/bundle exec rake RAILS_ENV=new_staging assets:precompile
Try running valgrind with these flags:
--num-callers=50 --error-limit=no --partial-loads-ok=yes --undef-value-errors=no
Post all of the errors in a gist if you don't mind.
Can you also try your code against edge 1.9.3? Nari-san has fixed a
segv error in that branch. If we're lucky it's the same issue, and we
don't have to debug anymore. ;)
--
Aaron Patterson
http://tenderlovemaking.com/
Updated by jshow (Jodi Showers) over 12 years ago
Aaron,
attached is the full output from valgrind using your options - https://gist.github.com/2296847
I'm presently looking for the edge branch - not seeing it. Do you mean head?
thanks
Updated by Anonymous over 12 years ago
On Wed, Apr 04, 2012 at 10:18:26AM +0900, jshow (Jodi Showers) wrote:
Issue #6246 has been updated by jshow (Jodi Showers).
Aaron,
attached is the full output from valgrind using your options - https://gist.github.com/2296847
I'm presently looking for the edge branch - not seeing it. Do you mean head?
I guess? I mean the ruby_1_9_3 branch:
https://github.com/ruby/ruby/tree/ruby_1_9_3
:-)
--
Aaron Patterson
http://tenderlovemaking.com/
Updated by Anonymous over 12 years ago
On Wed, Apr 04, 2012 at 10:18:26AM +0900, jshow (Jodi Showers) wrote:
Issue #6246 has been updated by jshow (Jodi Showers).
Aaron,
attached is the full output from valgrind using your options - https://gist.github.com/2296847
I'm thinking that your segv is happening in a subshell (or not the
process on which you're running valgrind).
Are there hours when I can meet you on IRC? It might be faster than
mail exchange here. :-)
--
Aaron Patterson
http://tenderlovemaking.com/
Updated by jshow (Jodi Showers) over 12 years ago
I'm in the north american eastern time zone
I can meet today if you can
11am eastern is doable for me - just over an hour away
In the meantime I'll start trying ruby-head
much thanks Aaron
Updated by jshow (Jodi Showers) over 12 years ago
Aaron, I've updated this gist with a run that uses a more direct command line - which I think was hiding some of the valgrind output
valgrind --num-callers=50 --error-limit=no --partial-loads-ok=yes --undef-value-errors=no rake RAILS_ENV=new_staging assets:precompile
https://gist.github.com/2296847
still it seems we're not getting much value from valgrind - it's reporting 0 errors from 0 contexts
I'm re-running now with -v and --leak-check=full for more
Updated by jshow (Jodi Showers) over 12 years ago
- File valgrind_full.txt.zip valgrind_full.txt.zip added
attached is a more verbose valgrind
$ valgrind --num-callers=50 --error-limit=no --partial-loads-ok=yes --undef-value-errors=no -v --leak-check=full rake RAILS_ENV=new_staging assets:precompile
Updated by hectormalot (Dennis de Reus) over 12 years ago
I have a similar problem when creating a new rails (3.2.3) app. It segfaults on the bundle install step. I think this is related, so I won't create a new bugreport.
Ruby -v: ruby 1.9.3p125 (2012-02-16 revision 34643) [x86_64-darwin11.3.0]
I found this when I created a new rails (3.2.3) project. Ruby gave a segfault when running bundle install. The details are in this gist: https://gist.github.com/2330262. I'm not sure when the issue arises exactly, but I've also had trouble running the tests from active_admin using ruby 1.9.3-p125, going back to 1.9.2 didn't gave any segmentation faults.
I'm sorry for not giving a more detailed bugreport, but I'm not that well versed in ruby yet. I hope that other with the same problem can add to this report.
Updated by jshow (Jodi Showers) over 12 years ago
Dennis, looking at your ruby trace, it looks like you've run into something else
It appears that you are on osx, and there are some documented (try google) about ssl and ruby - seems that the ssl libs shipped with osx are not compatible with the latest builds of ruby
Updated by mame (Yusuke Endoh) over 12 years ago
- Status changed from Open to Assigned
- Assignee set to authorNari (Narihiro Nakamura)
Hello,
Perhaps I could reproduce this issue with small example:
class Foo
def Foo.shutdown(*args)
p Bar.inspect.size
end
Bar = (0..300).map do |x|
x = x.to_s
ObjectSpace.define_finalizer(x, Foo.method(:shutdown))
x
end
end
This code causes segfault in my system.
The number (300) may vary depending on a system.
This looks like a bug of GC finalizer.
A finalizer invoked at process termination, seems to refer
GC'ed object, which lead to segfault. I guess.
In fact, the "logging" gem, which is included in OP's trace,
seems to use define_finalizer as a substitute for at_exit.
https://github.com/TwP/logging/blob/master/lib/logging.rb#L552
Nari, please check it.
--
Yusuke Endoh mame@tsg.ne.jp
Updated by jshow (Jodi Showers) over 12 years ago
Yusuke Endoh,
I have tried your example, but unfortunately couldn't cause a crash. I experimented with various values of as high as 3000.
I would appreciate whatever effort can be applied - I am literally stuck as 125 crashes so often it's unusable.
If I can provide more details I would be happy to do so
OT, but I don't receive any emails from redmine - I'm watching this issue, and have the general mail settings turned on. Hard to follow - I presently come back to visit regularly.
thanks.
Updated by normalperson (Eric Wong) over 12 years ago
"jshow (Jodi Showers)" jodi@homestars.com wrote:
I would appreciate whatever effort can be applied - I am literally
stuck as 125 crashes so often it's unusable.
As Aaron mentioned, did you get a chance to try the ruby_1_9_3 branch?
Now that Ruby 1.9.3-p194 is out, maybe it's easier to give that a try.
Again, if we're lucky we it's the same issue we encountered with
Nokogiri and fixed in 1.9.3-p194.
OT, but I don't receive any emails from redmine - I'm watching this
issue, and have the general mail settings turned on. Hard to follow -
I presently come back to visit regularly.
I'm not sure if this redmine is configured emails users directly, but
subscribing to the ruby-core list works.
Updated by mame (Yusuke Endoh) over 12 years ago
Hello,
2012/4/24, jshow (Jodi Showers) jodi@homestars.com:
I have tried your example, but unfortunately couldn't cause a crash. I
experimented with various values of as high as 3000.
It is difficult to reproduce a problem about GC with a small code.
Indeed, my example does not cause SEGV on my another system.
Can you try the following patch?
This is an ad-hoc patch that stops finalizer call at exit.
SEGV should disappear if my guess is right.
If it still causes SEGV, my example is another issue.
diff --git a/eval.c b/eval.c
index 4dddcaa..b2358f8 100644
--- a/eval.c
+++ b/eval.c
@@ -98,7 +98,6 @@ ruby_finalize_1(void)
{
ruby_sig_finalize();
GET_THREAD()->errinfo = Qnil;
- rb_gc_call_finalizer_at_exit();
}
void
--
Yusuke Endoh mame@tsg.ne.jp
Updated by jshow (Jodi Showers) over 12 years ago
I've been using p194 for about a day now - no crashes yet
I'll notify you here over the next while
if the crashing resumes, I'll try the patch Yusuke.
much thanks to you both.
Updated by cout (Paul Brannan) over 12 years ago
/home/deploy/.rvm/rubies/ruby-1.9.3-p125/lib/libruby.so.1.9(rb_gc_call_finalizer_at_exit+0x219) [0x7f9f37246359] gc.c:3044
/home/deploy/.rvm/rubies/ruby-1.9.3-p125/lib/libruby.so.1.9(ruby_cleanup+0x1ec) [0x7f9f37233fac] eval.c:101
/home/deploy/.rvm/rubies/ruby-1.9.3-p125/lib/libruby.so.1.9(ruby_run_node+0x25) [0x7f9f37234205] eval.c:244
So we're cleaning up, and:
#0 st_lookup (table=0x0, key=792, value=0x7fffffffc858) at st.c:330
#1 0x00007ffff7b1629f in search_method (recv=,
mid=792) at vm_method.c:374
#2 rb_method_entry_get_without_cache (recv=,
mid=792) at vm_method.c:393
Someone called a method on an object that had already been GC'd.
That someone is in the logging gem:
ObjectSpace.define_finalizer self, Logging.method(:shutdown)
Ruby makes no guarantees about the order in which finalizers are called
at exit. This can be problematic, because the finalizer may need to
perform operations on objects that have already been GC'd.
I have avoided this before in C code by explicitly checking for a GC'd
object. But since the logging gem is ruby (not C), I don't know what it
should do.
IMO, it should have one finalizer per appender, rather than a single
finalizer that cleans up all the appenders. But I'm not sure that would
fix the crash.
Paul
Updated by jshow (Jodi Showers) over 12 years ago
Since my last post, I've continued to see the crash
But today I came across this bug report for the logging gem
https://github.com/TwP/logging/issues/30
which may very well be the source
I'll update my experiences here
Updated by authorNari (Narihiro Nakamura) over 12 years ago
- Status changed from Assigned to Closed
Hi!
I think ruby-1.9.3-p194 fixed this bug.
if you see the crash with ruby-1.9.3-p194, please reopen this ticket.
Thanks.