Backport #2326 (Closed)
1.8.7 Segmentation fault
Description
=begin
Ruby 1.8.7 crashes with a "Segmentation fault" on this tiny example:
% cat foo.rb
t1 = t2 = Time.now
while t1.sec == t2.sec
t2 = Time.now
end
%
% ruby foo.rb
foo.rb:3: [BUG] Segmentation fault
ruby 1.8.7 (2009-09-11 patchlevel 202) [i686-linux]
zsh: abort (core dumped) /home/johan/src/ruby/installs/ruby_1_8_7/bin/ruby foo.rb
The crash in the example above occurred with a Ruby I built locally from the tip of the "ruby_1_8_7" branch:
% svn info | egrep "URL|Revision|Date"
URL: http://svn.ruby-lang.org/repos/ruby/branches/ruby_1_8_7
Revision: 25631
Last Changed Date: 2009-09-11 05:23:37 +0200 (Fri, 11 Sep 2009)
I originally discovered the crash with the Ruby in Ubuntu 9.10, but since the problem also occurs with the latest revision of the "ruby_1_8_7" branch, I am reporting it here rather than to Ubuntu. The version in Ubuntu 9.10 is:
ruby 1.8.7 (2009-06-12 patchlevel 174) [i486-linux]
(I get other Ruby crashes on Ubuntu 9.10 too. The default Ubuntu setup seems to use Ruby, and it sometimes detects a Ruby crash and reports it via the GNOME desktop GUI. I have not investigated those crashes yet, so I don't know whether they are related to the one in my example. The Ubuntu crash is of course an Ubuntu problem, but I mention it here in case it is of interest to the Ruby core developers too.)
=end
Updated by vvs (Vladimir Sizikov) about 15 years ago
=begin
I can verify this crash on Ubuntu 9.10 as well, even with my own build of MRI 1.8.7, which works fine on earlier Ubuntu releases.
On the other hand, the code that triggers this crash is really, really bad too: it creates a huge number of new Time objects for up to a whole second. I'm not surprised that MRI's GC can't keep up with that. :)
=end
Updated by wyhaines (Kirk Haines) about 15 years ago
=begin
In testing, the case which causes the error can be simplified to:
Time.now while true
In general, this segfault can be triggered by creating, in a tight loop, any object that has a Data_Make_Struct call in its creation and that passes a free function into that call.
What appears to happen is that these objects are deferred when obj_free() is called during gc_sweep():
case T_DATA:
    if (DATA_PTR(obj)) {
        if ((long)RANY(obj)->as.data.dfree == -1) {
            RUBY_CRITICAL(free(DATA_PTR(obj)));
        }
        else if (RANY(obj)->as.data.dfree) {
            make_deferred(RANY(obj));
            return 1;
        }
    }
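A toy C model may help make the deferral concrete. This is a simplified sketch under stated assumptions, not MRI's gc.c: `struct toy_slot`, `enum dfree_kind` and `toy_sweep()` are hypothetical stand-ins for the heap slot, the three `dfree` cases in the excerpt above, and the sweep loop. It shows only one thing: dead slots whose data has a custom free function are flagged as deferred instead of being put back on the freelist, so a heap full of such objects can leave the freelist empty after a sweep.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the three cases of RANY(obj)->as.data.dfree. */
enum dfree_kind {
    DFREE_NONE,    /* dfree == 0: nothing to release */
    DFREE_DEFAULT, /* dfree == -1: plain free(), done immediately */
    DFREE_CUSTOM   /* user-supplied free function: slot is deferred */
};

/* Hypothetical, heavily simplified heap slot (not MRI's RVALUE). */
struct toy_slot {
    int marked;            /* reachable in this GC cycle? */
    void *data;            /* plays the role of DATA_PTR(obj) */
    enum dfree_kind dfree;
    int deferred;          /* waiting for a later finalization pass */
    struct toy_slot *next; /* freelist link */
};

/* Sweep every unmarked slot. Slots that can be released right away are
 * pushed onto *freelist; slots with a custom free function are only
 * flagged as deferred, loosely mirroring make_deferred() + "return 1".
 * Returns the number of slots added to the freelist. */
size_t toy_sweep(struct toy_slot *heap, size_t n, struct toy_slot **freelist)
{
    assert(heap != NULL && freelist != NULL);
    size_t freed = 0;
    for (size_t i = 0; i < n; i++) {
        struct toy_slot *s = &heap[i];
        if (s->marked || s->deferred)
            continue;
        if (s->data && s->dfree == DFREE_CUSTOM) {
            s->deferred = 1;     /* held back: not reusable yet */
            continue;
        }
        if (s->data && s->dfree == DFREE_DEFAULT)
            free(s->data);       /* released immediately */
        s->data = NULL;
        s->next = *freelist;     /* slot becomes available again */
        *freelist = s;
        freed++;
    }
    return freed;
}
```

In the toy, deferred slots stay unavailable on later sweeps as well; the separate finalization pass that would eventually run their free functions and release them is not modelled.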
Consequently, they don't get removed from the heap before the heap fills up. This wouldn't be a problem except that rb_newobj has the following code:
if (ruby_gc_stress || !freelist) garbage_collect();
obj = (VALUE)freelist;
freelist = freelist->as.free.next;
The problem with this check is that it doesn't go far enough. If freelist points to something, but freelist->as.free.next points to 0, then when the assignment to freelist runs, a segfault occurs.
The simple fix for this is to expand the check:
if (ruby_gc_stress || !freelist || !freelist->as.free.next) garbage_collect();
This works, and the same fix applies to both the broken 1.8.6 and 1.8.7 versions. But is it the best way to fix this? Feedback is appreciated.
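The two guards can be compared with a similarly hypothetical toy allocator in C. `struct toy_heap`, `toy_newobj()` and `toy_gc_reclaims_nothing()` are illustrative names, not Ruby's API; the `gc` callback stands in for `garbage_collect()`, and `expanded_check` switches between the original check and the expanded one proposed above. One deliberate difference from the 1.8 code: this sketch returns NULL when the freelist is still empty after collection, rather than following a null `next` pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical free slot: only the link used by the freelist. */
struct toy_free {
    struct toy_free *next;   /* plays the role of as.free.next */
};

/* Hypothetical heap state. gc stands in for garbage_collect(): it may
 * or may not manage to put slots back on the freelist. */
struct toy_heap {
    struct toy_free *freelist;
    void (*gc)(struct toy_heap *);
    int gc_runs;
};

/* Toy version of the allocation path in rb_newobj().
 * expanded_check == 0: original guard  (!freelist)
 * expanded_check != 0: proposed guard  (!freelist || !freelist->next)
 * Unlike the 1.8 code, this returns NULL instead of dereferencing a
 * null freelist when collection could not reclaim anything. */
struct toy_free *toy_newobj(struct toy_heap *h, int expanded_check)
{
    assert(h != NULL && h->gc != NULL);
    if (!h->freelist || (expanded_check && !h->freelist->next)) {
        h->gc_runs++;
        h->gc(h);
    }
    if (!h->freelist)
        return NULL;             /* nothing to hand out */
    struct toy_free *obj = h->freelist;
    h->freelist = h->freelist->next;
    return obj;
}

/* A collector that reclaims nothing, e.g. because every dead slot
 * was deferred. */
void toy_gc_reclaims_nothing(struct toy_heap *h)
{
    (void)h;
}
```

With a single slot left on the freelist and a collector that reclaims nothing, the original guard hands out the slot without collecting, while the expanded guard runs one collection first. The toy does not model whether a real collection at that point returns slots, so it says nothing about how much headroom the fix buys in MRI.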
Kirk Haines
=end
Updated by hongli (Hongli Lai) about 15 years ago
=begin
Confirmed on 1.8.7-p174 on OS X Snow Leopard. OS X's own version (1.8.7-p72) doesn't have the problem.
=end
Updated by wyhaines (Kirk Haines) about 15 years ago
=begin
On Sun, Nov 8, 2009 at 4:25 AM, Hongli Lai redmine@ruby-lang.org wrote:
> Issue #2326 has been updated by Hongli Lai.
> Confirmed on 1.8.7-p174 on OS X Snow Leopard. OS X's own version
> (1.8.7-p72) doesn't have the problem.
It seems to exist in everything on the 1.8.7 line at or after p21435: a build of 1.8.7p21334 doesn't crash, but a build of 1.8.7p21435 does.
On the 1.8.6 line, it is introduced at p21433.
Those patch levels introduced the notion of deferring GC on certain objects.
Kirk Haines
=end
Updated by wyhaines (Kirk Haines) about 15 years ago
=begin
I would like to apply this fix to 1.8.6, and see it applied upstream in the 1.8 tree, as it fixes an easily reproducible segfault condition.
Does anyone see a problem with this approach, or have a suggestion for a different, better approach for fixing the problem?
Thank you,
Kirk Haines
=end
Updated by nobu (Nobuyoshi Nakada) about 15 years ago
- Status changed from Open to Assigned
- Assignee set to shyouhei (Shyouhei Urabe)
=begin
Backport r24713.
=end
Updated by btm (Bryan McLellan) about 15 years ago
=begin
I'm sending a patch containing Kirk's fix to Debian and Ubuntu, as it is affecting a number of users on Ubuntu 9.10 (karmic).
https://bugs.launchpad.net/ruby/+bug/488115
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=557924
http://tickets.opscode.com/browse/CHEF-530
=end
Updated by shyouhei (Shyouhei Urabe) about 15 years ago
- Status changed from Assigned to Closed
=begin
Revision r24713 has been backported to 1.8.7 as revision r25801.
=end