Bug #9473

closed

Corruption and Segmentation faults all over

Added by drasch (David Rasch) almost 11 years ago. Updated over 8 years ago.

Status:
Third Party's Issue
Assignee:
-
Target version:
-
ruby -v:
ruby 1.9.3p484 (2013-11-22 revision 43786) [x86_64-linux]
Backport:
[ruby-core:60427]

Description

We're in the process of moving from Rails 2.3 to 3.2 (both running on Ruby 1.9.3-p484).

During this process we've hit a snag: errors crop up within 2-3 hours of taking production traffic (or replays of it with siege). We cannot be certain that these errors would not occur with Rails 2.3; however, they appear more quickly and pervasively on the 3.2 branch.

These corruptions sometimes appear as the following errors, in places where they are highly improbable if not impossible:
"string contains null byte"
ActiveModel::MissingAttributeError "missing attribute: ..."
"undefined method `table_name' for false:FalseClass"

For example, this error makes little or no sense at this location:
string contains null byte
activesupport (3.2.16) lib/active_support/core_ext/class/attribute.rb:97:in `block in class_attribute'

As a result we've tried:

  1. Upgrading to Ruby 1.9.3 HEAD
  2. Removing our garbage collection tweaks
  3. Turning different areas of our codebase on and off
  4. Upgrading gems with C extensions

We've run independent tests on most of these variables but haven't been able to isolate the cause.

We're assuming these spurious errors are related to the segmentation faults we've also been seeing. I've attached some examples.
The segfaults have happened all over the place, including in GC, compile, and str_replace.

We've tried running under valgrind to identify a root cause, and on several reproductions it reports the first error at st.c:330, in st_lookup.


Files

valgrind.txt (28.9 KB) drasch (David Rasch), 02/03/2014 04:58 PM
segfault1.txt (983 KB) drasch (David Rasch), 02/03/2014 04:59 PM
segfault2.txt (1.02 MB) drasch (David Rasch), 02/03/2014 04:59 PM
segfault3.txt (1.05 MB) drasch (David Rasch), 02/03/2014 05:00 PM

Updated by drasch (David Rasch) almost 11 years ago

We've also gotten the following from valgrind:
==13233== Thread 5:
==13233== Invalid read of size 8
==13233== at 0x3F2B4326A6: __sigsetjmp (in /lib64/libc-2.12.so)
==13233== Address 0xcef8730 is not stack'd, malloc'd or (recently) free'd
==13233==
==13233== Invalid read of size 8
==13233== at 0x3F2B4326CC: __sigsetjmp (in /lib64/libc-2.12.so)
==13233== Address 0xcef8730 is not stack'd, malloc'd or (recently) free'd
==13233==
==13233== Invalid read of size 8
==13233== at 0x3F2B4326E1: __sigsetjmp (in /lib64/libc-2.12.so)
==13233== Address 0xcef8730 is not stack'd, malloc'd or (recently) free'd
==13233==
==13233== Invalid read of size 4
==13233== at 0x3F2AC0DF98: _dl_fixup (in /lib64/ld-2.12.so)
==13233== Address 0xcef8718 is not stack'd, malloc'd or (recently) free'd
==13233==
==13233== Invalid write of size 4
==13233== at 0x3F2AC0E09C: _dl_fixup (in /lib64/ld-2.12.so)
==13233== Address 0xcef871c is not stack'd, malloc'd or (recently) free'd
==13233==
==13233== Invalid read of size 4
==13233== at 0x3F2AC0DFAD: _dl_fixup (in /lib64/ld-2.12.so)
==13233== Address 0xcef874c is not stack'd, malloc'd or (recently) free'd

Updated by drbrain (Eric Hodel) almost 11 years ago

  • Tracker changed from Backport to Bug
  • Project changed from Backport193 to Ruby master
  • Priority changed from 5 to Normal

Fixed project, tracker and priority

Updated by normalperson (Eric Wong) almost 11 years ago

David Rasch wrote:

  4. Upgrading gems with C extensions

Can you reproduce this without C extensions?
Which C extensions do you run? Likely one of them is corrupting
memory, so it could be an unusual one somewhere.

It looks like one of them (Pool2/Implementation.cpp) is Passenger,
so maybe try reproducing the error with unicorn?
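
One way to answer the "which C extensions" question is to inspect `$LOADED_FEATURES` from inside the booted application. The following is only a minimal sketch: `native_extensions` is a hypothetical helper name (not part of Ruby or any gem), and it assumes that every compiled extension shows up in `$LOADED_FEATURES` with the platform's shared-object suffix. Native code loaded by other means would not be listed.

```ruby
require "rbconfig"

# Hypothetical helper: returns the loaded feature paths that look like
# compiled extensions. DLEXT is the platform's shared-object suffix
# (for example "so" on Linux).
def native_extensions(features = $LOADED_FEATURES)
  suffix = ".#{RbConfig::CONFIG['DLEXT']}"
  features.select { |path| path.end_with?(suffix) }.sort
end

# Print the list when run directly as a script.
puts native_extensions if __FILE__ == $PROGRAM_NAME
```

Calling `puts native_extensions` from a console in the running app (after all gems have been required) would give a list to compare between the Passenger and unicorn setups.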

Updated by drasch (David Rasch) almost 11 years ago

We've been running further tests, and when running our app under Unicorn instead of Passenger the problem hasn't occurred yet.

Updated by drasch (David Rasch) almost 11 years ago

We've continued to see no crashes under Unicorn. We've done further testing but aren't certain whether this is a systemic issue with Passenger or specific to our setup.

Updated by normalperson (Eric Wong) almost 11 years ago

Interesting. Have you contacted the Passenger developers about this?
Anyway, I'm happy unicorn is working well for you :)

Updated by hsbt (Hiroshi SHIBATA) over 8 years ago

  • Status changed from Open to Third Party's Issue