Project

General

Profile

Actions

Bug #2258

closed

Kernel#require inside rb_require() inside rb_protect() inside SysV context fails

Added by sunaku (Suraj Kurapati) over 14 years ago. Updated almost 13 years ago.

Status:
Closed
Assignee:
-
Target version:
-
ruby -v:
ruby 1.9.1p243 (2009-07-16 revision 24175) [i686-linux]
Backport:
[ruby-core:26244]

Description

=begin
Hi,

I get the following error when I call Kernel#require()
inside a Ruby script that is being run by rb_require(),
which itself is running inside rb_protect(), which in
turn is running inside a UNIX System V context.

"[BUG] object allocation during garbage collection phase"

An example test case to reproduce this bug is attached, and
the result of running this test case is shown below.

Note that this test case runs correctly without any errors
if you remove the call to Kernel#require in "hello.rb" line 7.

Thanks for your consideration.

###############################################################

ruby -v extconf.rb && sed -i 's,-shared,,g' Makefile

###############################################################

ruby 1.9.1p243 (2009-07-16 revision 24175) [i686-linux]
checking for ruby_init() in -lruby-static... yes
checking for sys/ucontext.h... yes
creating Makefile

###############################################################

make && ./main.so

###############################################################

gcc -I. -I/usr/include/ruby-1.9.1/i686-linux -I/usr/include/ruby-1.9.1/ruby/backward -I/usr/include/ruby-1.9.1 -I. -DHAVE_SYS_UCONTEXT_H -D_FILE_OFFSET_BITS=64 -fPIC -march=i686 -mtune=generic -O2 -pipe -O2 -g -Wall -Wno-parentheses -fPIC -o main.o -c main.c
main.c: In function ‘relay_from_main_to_ruby’:
main.c:15: warning: implicit declaration of function ‘swapcontext’
main.c: In function ‘main’:
main.c:99: warning: implicit declaration of function ‘getcontext’
main.c:100: warning: implicit declaration of function ‘makecontext’
gcc -o main.so main.o -L. -L/usr/lib -Wl,-R/usr/lib -L. -Wl,--hash-style=gnu -Wl,--as-needed -rdynamic -Wl,-export-dynamic -Wl,-R -Wl,/usr/lib -L/usr/lib -lruby -lruby-static -lpthread -lrt -ldl -lcrypt -lm -lc
Main: relay_from_main_to_ruby() begin
Ruby: context begin
Ruby: relay 0
Ruby: relay_from_ruby_to_main() begin
Main: relay_from_main_to_ruby() end
Main: relay_from_main_to_ruby() begin
Ruby: relay_from_ruby_to_main() end
Ruby: relay 1
Ruby: relay_from_ruby_to_main() begin
Main: relay_from_main_to_ruby() end
Main: relay_from_main_to_ruby() begin
Ruby: relay_from_ruby_to_main() end
Ruby: relay 2
Ruby: relay_from_ruby_to_main() begin
Main: relay_from_main_to_ruby() end
Main: relay_from_main_to_ruby() begin
Ruby: relay_from_ruby_to_main() end
Ruby: require 'hello' begin

~~~~> Doing relay from /home/sun/lab/ruby-ucontext/hello.rb:3
Ruby: relay_from_ruby_to_main() begin
Main: relay_from_main_to_ruby() end
Main: relay_from_main_to_ruby() begin
Ruby: relay_from_ruby_to_main() end
~~~~> OMG, back from main!! :-)
/home/sun/lab/ruby-ucontext/hello.rb:7: [BUG] Segmentation fault
ruby 1.9.1p243 (2009-07-16 revision 24175) [i686-linux]

-- control frame ----------
c:0004 p:---- s:0010 b:0010 l:000009 d:000009 CFUNC  :require
c:0003 p:0080 s:0006 b:0006 l:000005 d:000005 TOP    /home/sun/lab/ruby-ucontext/hello.rb:7
c:0002 p:---- s:0004 b:0004 l:000003 d:000003 FINISH
c:0001 p:0000 s:0002 b:0002 l:001cac d:001cac TOP   
---------------------------
/home/sun/lab/ruby-ucontext/hello.rb:7: [BUG] object allocation during garbage collection phase
ruby 1.9.1p243 (2009-07-16 revision 24175) [i686-linux]

-- control frame ----------
c:0004 p:---- s:0010 b:0010 l:000009 d:000009 CFUNC  :require
c:0003 p:0080 s:0006 b:0006 l:000005 d:000005 TOP    /home/sun/lab/ruby-ucontext/hello.rb:7
c:0002 p:---- s:0004 b:0004 l:000003 d:000003 FINISH
c:0001 p:0000 s:0002 b:0002 l:001cac d:001cac TOP   
---------------------------
Segmentation fault
=end

Files

extconf.rb (179 Bytes) extconf.rb sunaku (Suraj Kurapati), 10/23/2009 08:37 AM
hello.rb (212 Bytes) hello.rb sunaku (Suraj Kurapati), 10/23/2009 08:37 AM
main.c (2.78 KB) main.c sunaku (Suraj Kurapati), 10/23/2009 08:37 AM
extconf.rb (206 Bytes) extconf.rb updated to handle Ruby 1.8 sunaku (Suraj Kurapati), 10/23/2009 12:13 PM
main.c (2.95 KB) main.c updated to handle Ruby 1.8 and ucontext.h location sunaku (Suraj Kurapati), 10/23/2009 12:13 PM
ruby-ucontext.tgz (12.1 KB) ruby-ucontext.tgz tarball of ruby fully inside SysV context example sunaku (Suraj Kurapati), 10/23/2009 11:35 PM
ruby-ucontext-half.tgz (15.6 KB) ruby-ucontext-half.tgz example test case with ruby_init() in main() and rb_require() in SysV context sunaku (Suraj Kurapati), 10/24/2009 12:33 AM
ruby-ucontext-full.tgz (12.1 KB) ruby-ucontext-full.tgz example test case with both ruby_init() and rb_require() in SysV context sunaku (Suraj Kurapati), 10/24/2009 12:33 AM
ruby-ucontext-full.tgz (16.6 KB) ruby-ucontext-full.tgz increased SysV context stack size to 4 MiB sunaku (Suraj Kurapati), 10/24/2009 04:01 PM
0001-make-SET_STACK_END-macro-respect-rb_thread_t.machine.patch (2.31 KB) 0001-make-SET_STACK_END-macro-respect-rb_thread_t.machine.patch patch to make SET_STACK_END respect machine_stack_maxsize sunaku (Suraj Kurapati), 10/25/2009 03:40 AM
0001-add-ruby_bind_stack-to-inform-GC-about-explicit-stac.patch (1.84 KB) 0001-add-ruby_bind_stack-to-inform-GC-about-explicit-stac.patch good, minimal solution for this problem sunaku (Suraj Kurapati), 10/25/2009 01:28 PM
ruby-ucontext-full.tgz (9.88 KB) ruby-ucontext-full.tgz updated to do more complex Ruby things in hello.rb sunaku (Suraj Kurapati), 10/25/2009 01:28 PM
ruby-ucontext-static-stack.tgz (9.91 KB) ruby-ucontext-static-stack.tgz statically allocated stack sunaku (Suraj Kurapati), 10/25/2009 01:50 PM
ruby-ucontext-dynamic-stack.tgz (10.3 KB) ruby-ucontext-dynamic-stack.tgz dynamically allocated stack sunaku (Suraj Kurapati), 10/25/2009 01:50 PM
ruby-libpcl-dynamic-stack.tgz (10 KB) ruby-libpcl-dynamic-stack.tgz sunaku (Suraj Kurapati), 10/25/2009 03:19 PM
ruby-libpcl-static-stack.tgz (10.3 KB) ruby-libpcl-static-stack.tgz sunaku (Suraj Kurapati), 10/25/2009 03:19 PM
ruby-ucontext-dynamic-stack.tgz (10.2 KB) ruby-ucontext-dynamic-stack.tgz sunaku (Suraj Kurapati), 10/25/2009 03:19 PM
ruby-ucontext-static-stack.tgz (10 KB) ruby-ucontext-static-stack.tgz sunaku (Suraj Kurapati), 10/25/2009 03:19 PM
ruby_bind_stack.patch (2.67 KB) ruby_bind_stack.patch added changelog entry; fixed stack_start bug; renamed variables sunaku (Suraj Kurapati), 10/26/2009 03:31 AM
ruby_bind_stack.patch (2.67 KB) ruby_bind_stack.patch corrected ruby-core message ID to 26244 sunaku (Suraj Kurapati), 10/26/2009 03:49 AM
ruby_bind_stack.patch (2.66 KB) ruby_bind_stack.patch updated VALUE* to void* because value at memory address has no type sunaku (Suraj Kurapati), 10/27/2009 03:02 AM

Related issues 1 (1 open0 closed)

Has duplicate Ruby master - Feature #2294: [PATCH] ruby_bind_stack() to embed Ruby in coroutineAssignedko1 (Koichi Sasada)Actions
Actions #1

Updated by sunaku (Suraj Kurapati) over 14 years ago

=begin
Hi,

Below is the result of running the example test case attached
to this issue on the Ruby Issue Tracker with Ruby @ SVN trunk.

The only difference is that I see:

[BUG] Segmentation fault

Instead of:

[BUG] object allocation during garbage collection phase

Thanks for your consideration.

###############################################################

ruby -v extconf.rb && sed -i 's,-shared,,g' Makefile

###############################################################

ruby 1.9.2dev (2009-10-23 trunk 25426) [i686-linux]
checking for ruby_init() in -lruby-static... yes
checking for sys/ucontext.h... yes
creating Makefile

###############################################################

make && ./main.so

###############################################################

gcc -I. -I/home/sun/app/ruby/include/ruby-1.9.1/i686-linux -I/home/sun/app/ruby/include/ruby-1.9.1/ruby/backward -I/home/sun/app/ruby/include/ruby-1.9.1 -I. -DHAVE_SYS_UCONTEXT_H -D_FILE_OFFSET_BITS=64 -fPIC -g -o main.o -c main.c
gcc -o main.so main.o -L. -L/home/sun/app/ruby/lib -Wl,-R/home/sun/app/ruby/lib -L. -rdynamic -Wl,-export-dynamic -lruby-static -lpthread -lrt -ldl -lcrypt -lm -lc
Main: relay_from_main_to_ruby() begin
Ruby: context begin
Ruby: relay 0
Ruby: relay_from_ruby_to_main() begin
Main: relay_from_main_to_ruby() end
Main: relay_from_main_to_ruby() begin
Ruby: relay_from_ruby_to_main() end
Ruby: relay 1
Ruby: relay_from_ruby_to_main() begin
Main: relay_from_main_to_ruby() end
Main: relay_from_main_to_ruby() begin
Ruby: relay_from_ruby_to_main() end
Ruby: relay 2
Ruby: relay_from_ruby_to_main() begin
Main: relay_from_main_to_ruby() end
Main: relay_from_main_to_ruby() begin
Ruby: relay_from_ruby_to_main() end
Ruby: require 'hello' begin

~~~~> Doing relay from /home/sun/lab/ruby-ucontext/hello.rb:3
Ruby: relay_from_ruby_to_main() begin
Main: relay_from_main_to_ruby() end
Main: relay_from_main_to_ruby() begin
Ruby: relay_from_ruby_to_main() end
~~~~> OMG, back from main!! :-)
/home/sun/lab/ruby-ucontext/hello.rb:7: [BUG] Segmentation fault
ruby 1.9.2dev (2009-10-23 trunk 25426) [i686-linux]

-- control frame ----------
c:0004 p:---- s:0010 b:0010 l:000009 d:000009 CFUNC  :require
c:0003 p:0080 s:0006 b:0006 l:000005 d:000005 TOP    /home/sun/lab/ruby-ucontext/hello.rb:7
c:0002 p:---- s:0004 b:0004 l:000003 d:000003 FINISH
c:0001 p:0000 s:0002 b:0002 l:001bdc d:001bdc TOP   
---------------------------
/home/sun/lab/ruby-ucontext/hello.rb:7:in `<top (required)>'
/home/sun/lab/ruby-ucontext/hello.rb:7:in `require'

-- C level backtrace information -------------------------------------------
./main.so(rb_vm_bugreport+0x6b) [0x8150f7d]
./main.so [0x805fbe4]
./main.so(rb_bug+0x34) [0x805fc3e]
./main.so [0x80f1f22]
[0xb779640c]
./main.so [0x8071b4c]
./main.so [0x8071bd0]
./main.so [0x80732cd]
./main.so [0x8073514]
./main.so [0x80715ca]
./main.so(rb_newobj+0x52) [0x8071683]
./main.so(rb_node_newnode+0xb) [0x8071690]
./main.so [0x80b27da]
./main.so(ruby_yyparse+0x5d3e) [0x80a8209]
./main.so [0x80abf83]
./main.so(ruby_suppress_tracing+0xab) [0x815641c]
./main.so [0x80ac0d5]
./main.so(rb_parser_compile_file+0x75) [0x80ac4fa]
./main.so [0x80f0ba7]
./main.so(rb_ensure+0x5f) [0x80644f9]
./main.so [0x80f0c9d]
./main.so(rb_load_file+0x34) [0x80f0cd7]
./main.so [0x80659a9]
./main.so(rb_require_safe+0x136) [0x80664bf]
./main.so(rb_f_require+0x1a) [0x8065d59]
./main.so [0x813ee9b]
./main.so [0x813f832]
./main.so [0x813fd3d]
./main.so [0x8144196]
./main.so [0x814e674]
./main.so(rb_iseq_eval+0x2b) [0x814ecb6]
./main.so [0x8065a08]
./main.so(rb_require_safe+0x136) [0x80664bf]
./main.so(rb_require+0x56) [0x8066613]
./main.so(rb_protect+0xc4) [0x8064429]
./main.so [0x805d09f]
./main.so [0x805d211]
/lib/libc.so.6(makecontext+0x4b) [0xb75e803b]
./main.so [0x8203140]
/lib/libc.so.6(__libc_start_main+0xe6) [0xb75c4a36]
./main.so [0x805cf41]

[NOTE]
You may have encountered a bug in the Ruby interpreter or extension libraries.
Bug reports are welcome.
For details: http://www.ruby-lang.org/bugreport.html

Aborted

=end

Updated by sunaku (Suraj Kurapati) over 14 years ago

=begin
Hi,

I am attaching updated "extconf.rb" and "main.c" files
which now support Ruby 1.8 and native ucontext.h location.

With these updated files, I have found that the example
test case fails in the same way on Ruby 1.8.6 and 1.8.7
(see output below) as it does for Ruby 1.9.1 and 1.9.2.

Thanks for your consideration.

                     Ruby 1.8.7

###############################################################

ruby -v extconf.rb && sed -i 's,-shared,,g' Makefile

###############################################################

ruby 1.8.7 (2009-06-12 patchlevel 174) [i686-linux]
checking for ruby_init() in -lruby-static... yes
checking for ruby_sysinit()... no
checking for sys/ucontext.h... yes
creating Makefile
/home/sun/.multiruby/install/1.8.7-p174/lib/ruby/1.8/mkmf.rb:1345: warning: global variable `$preload' not initialized

###############################################################

make && ./main.so

###############################################################

gcc -I. -I. -I/home/sun/.multiruby/install/1.8.7-p174/lib/ruby/1.8/i686-linux -I. -DHAVE_SYS_UCONTEXT_H -D_FILE_OFFSET_BITS=64 -fPIC -g -O2 -fPIC -c main.c
gcc -o main.so main.o -L. -L/home/sun/.multiruby/install/1.8.7-p174/lib -Wl,-R/home/sun/.multiruby/install/1.8.7-p174/lib -L. -rdynamic -Wl,-export-dynamic -Wl,-R -Wl,/home/sun/.multiruby/install/1.8.7-p174/lib -L/home/sun/.multiruby/install/1.8.7-p174/lib -lruby -lruby-static -lrt -ldl -lcrypt -lm -lc
Main: relay_from_main_to_ruby() begin
Ruby: context begin
Ruby: relay 0
Ruby: relay_from_ruby_to_main() begin
Main: relay_from_main_to_ruby() end
Main: relay_from_main_to_ruby() begin
Ruby: relay_from_ruby_to_main() end
Ruby: relay 1
Ruby: relay_from_ruby_to_main() begin
Main: relay_from_main_to_ruby() end
Main: relay_from_main_to_ruby() begin
Ruby: relay_from_ruby_to_main() end
Ruby: relay 2
Ruby: relay_from_ruby_to_main() begin
Main: relay_from_main_to_ruby() end
Main: relay_from_main_to_ruby() begin
Ruby: relay_from_ruby_to_main() end
Ruby: require 'hello' begin

~~~~> Doing relay from ./hello.rb:3
Ruby: relay_from_ruby_to_main() begin
Main: relay_from_main_to_ruby() end
Main: relay_from_main_to_ruby() begin
Ruby: relay_from_ruby_to_main() end
~~~~> OMG, back from main!! :-)
Segmentation fault




                        Ruby 1.8.6

###############################################################
# ruby -v extconf.rb && sed -i 's,-shared,,g' Makefile
###############################################################

ruby 1.8.6 (2009-08-04 patchlevel 383) [i686-linux]
checking for ruby_init() in -lruby-static... yes
checking for ruby_sysinit()... no
checking for sys/ucontext.h... yes
creating Makefile
/home/sun/.multiruby/install/1.8.6-p383/lib/ruby/1.8/mkmf.rb:1136: warning: global variable `$preload' not initialized

###############################################################
# make && ./main.so
###############################################################

gcc  -o main.so main.o -L. -L/home/sun/.multiruby/install/1.8.6-p383/lib -Wl,-R/home/sun/.multiruby/install/1.8.6-p383/lib -L.  -rdynamic -Wl,-export-dynamic    -Wl,-R -Wl,/home/sun/.multiruby/install/1.8.6-p383/lib -L/home/sun/.multiruby/install/1.8.6-p383/lib -lruby -lruby-static  -lrt -ldl -lcrypt -lm   -lc
Main: relay_from_main_to_ruby() begin
Ruby: context begin
Ruby: relay 0
Ruby: relay_from_ruby_to_main() begin
Main: relay_from_main_to_ruby() end
Main: relay_from_main_to_ruby() begin
Ruby: relay_from_ruby_to_main() end
Ruby: relay 1
Ruby: relay_from_ruby_to_main() begin
Main: relay_from_main_to_ruby() end
Main: relay_from_main_to_ruby() begin
Ruby: relay_from_ruby_to_main() end
Ruby: relay 2
Ruby: relay_from_ruby_to_main() begin
Main: relay_from_main_to_ruby() end
Main: relay_from_main_to_ruby() begin
Ruby: relay_from_ruby_to_main() end
Ruby: require 'hello' begin
~~~~> Hello World!  The time is Thu Oct 22 20:04:01 -0700 2009.
~~~~> Doing relay from ./hello.rb:3
Ruby: relay_from_ruby_to_main() begin
Main: relay_from_main_to_ruby() end
Main: relay_from_main_to_ruby() begin
Ruby: relay_from_ruby_to_main() end
~~~~> OMG, back from main!! :-)
Segmentation fault

=end
Actions #3

Updated by sunaku (Suraj Kurapati) over 14 years ago

=begin
Hi,

Below is a full GDB backtrace of the bug in Ruby 1.9.2dev.
I hope this is more useful than the output I posted so far.

Thanks for your consideration.

###############################################################

ruby -v extconf.rb && sed -i 's,-shared,,g' Makefile

###############################################################

ruby 1.9.2dev (2009-10-23 trunk 25426) [i686-linux]
checking for ruby_init() in -lruby-static... yes
checking for ruby_sysinit()... yes
checking for sys/ucontext.h... yes
creating Makefile

###############################################################

make && gdb ./main.so

###############################################################

gcc -I. -I/home/sun/app/ruby/include/ruby-1.9.1/i686-linux -I/home/sun/app/ruby/include/ruby-1.9.1/ruby/backward -I/home/sun/app/ruby/include/ruby-1.9.1 -I. -DHAVE_RUBY_SYSINIT -DHAVE_SYS_UCONTEXT_H -D_FILE_OFFSET_BITS=64 -fPIC -g -o main.o -c main.c
gcc -o main.so main.o -L. -L/home/sun/app/ruby/lib -Wl,-R/home/sun/app/ruby/lib -L. -rdynamic -Wl,-export-dynamic -lruby-static -lpthread -lrt -ldl -lcrypt -lm -lc
GNU gdb (GDB) 7.0
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu".
For bug reporting instructions, please see:
http://www.gnu.org/software/gdb/bugs/...
Reading symbols from /home/sun/lab/ruby-ucontext/main.so...done.
(gdb) run
Starting program: /home/sun/lab/ruby-ucontext/main.so
[Thread debugging using libthread_db enabled]
[New Thread 0xb7fdfb70 (LWP 1814)]
Main: relay_from_main_to_ruby() begin
Ruby: context begin
Ruby: relay 0
Ruby: relay_from_ruby_to_main() begin
Main: relay_from_main_to_ruby() end
Main: relay_from_main_to_ruby() begin
Ruby: relay_from_ruby_to_main() end
Ruby: relay 1
Ruby: relay_from_ruby_to_main() begin
Main: relay_from_main_to_ruby() end
Main: relay_from_main_to_ruby() begin
Ruby: relay_from_ruby_to_main() end
Ruby: relay 2
Ruby: relay_from_ruby_to_main() begin
Main: relay_from_main_to_ruby() end
Main: relay_from_main_to_ruby() begin
Ruby: relay_from_ruby_to_main() end
Ruby: require 'hello' begin

~~~~> Doing relay from /home/sun/lab/ruby-ucontext/hello.rb:3
Ruby: relay_from_ruby_to_main() begin
Main: relay_from_main_to_ruby() end
Main: relay_from_main_to_ruby() begin
Ruby: relay_from_ruby_to_main() end
~~~~> OMG, back from main!! :-)

Program received signal SIGSEGV, Segmentation fault.
mark_locations_array (objspace=0x82134f0, x=0x82d9000, n=-302737297) at gc.c:1312
1312	        v = *x;
(gdb) bt full
#0  mark_locations_array (objspace=0x82134f0, x=0x82d9000, n=-302737297) at gc.c:1312
        v = 0
#1  0x08071bd0 in gc_mark_locations (objspace=0x82134f0, start=0x82036c0, end=0xbffff1c0) at gc.c:1328
        n = -302518592
#2  0x080732cd in mark_current_machine_context (objspace=0x82134f0, th=0x8213290) at gc.c:2123
        stack_start = 0x82036c0
        stack_end = 0xbffff1c0
=end
Actions #4

Updated by sunaku (Suraj Kurapati) over 14 years ago

=begin
Hi,

I ran a new experiment where the Ruby interpreter is completely enclosed by a System V context (previously it was initialized on
the main() stack, and then run inside a System V context).

Attached is a tarball containing source code and output log file.

In this experiment, I noticed that Ruby 1.8.6 and 1.8.7 exhibit the
same behavior (failure is in Kernel#require inside rb_require()) as
they did in the original example test case. In contrast, Ruby 1.9
fails immediately at ruby_init().

Thanks for your consideration.
=end

Updated by sunaku (Suraj Kurapati) over 14 years ago

=begin
Hi,

For your convenience, I am attaching two tarballs:

ruby-ucontext-half.tgz - example test case with
ruby_init() in main() and
rb_require() in SysV context

ruby-ucontext-full.tgz - example test case with both
ruby_init() and rb_require()
inside SysV context

Inside the tarballs, you will find an "output.log" file
that shows the results of running ./run.sh on my system.

At this point, I believe the trouble I am having is due to
Ruby stack initialization making assumptions that are not
satisfied by the UNIX System V context containing Ruby.

I will try to hack the Ruby source code @ SVN trunk to
correct these assumptions accordingly. Any hints would
be greatly appreciated. :-)

Thanks for your consideration.
=end

Actions #6

Updated by sunaku (Suraj Kurapati) over 14 years ago

=begin
Hi,

I ran the ruby-ucontext-full.tgz example on Ruby @ SVN trunk
in a debugger and found that the real problem was being hidden
by a call to rb_longjmp(). Below is the stack trace of the
real problem, just before the long jump is executed.

The real problem is that the "stdin" value passed from
Inito_IO() to prep_stdio() is becoming corrupted somehow:

(gdb) f
#1 0x0808b2e4 in Init_IO () at io.c:9785
(gdb) p stdin
$3 = (struct _IO_FILE *) 0xb7f3b420
(gdb) down
#0 prep_stdio (f=0x8213008, fmode=1, klass=136569860, path=0x81860bd "") at io.c:6267

Notice that stdin is 0xb7f3b420 inside Init_IO(),
but inside prep_stdio() its value is 0x8213008.

This can only mean one thing: the stack pointer in Init_IO()
is radically different from the stack pointer in prep_stdio(),
and the arguments for the function call to prep_stdio() are
being placed at a different stack address than what prep_stdio()
is reading from, as illustrated in this pseudo-assembly code:

Init_IO():

 stack.push stdin # 0xb7f3b420 (correct value)
 stack.push fmode
 stack.push klass
 stack.push path
 stack.push program_counter + size_of_machine_instruction
 jump prep_stdio

prep_stdio():

 # PROBLEM: stack pointer is wrong!

 return_address = stack.pop
 path           = stack.pop
 klass          = stack.pop
 fmode          = stack.pop
 stdin          = stack.pop # 0x8213008 (wrong value!)

Any suggestions?

Thanks for your consideration.

###############################################################

The full backtrace before longjmp() is executed:

###############################################################

(gdb) bt full
#0 0xb7e23201 in siglongjmp () from /lib/libc.so.6
No symbol table info available.
#1 0x08063e73 in rb_longjmp (tag=6, mesg=136568440) at eval.c:425
at = 136588340
e = 136568440
th = 0x8213290
file = 0x0
line = 0
#2 0x08063eaa in rb_exc_raise (mesg=136568440) at eval.c:434
No locals.
#3 0x08061866 in rb_raise (exc=136570260, fmt=0x8185657 "closed stream") at error.c:1150
args = 0x8205108 "h\275'\b\264\336#\b\b0!\b\001"
mesg = 136568460
#4 0x0807a6e2 in rb_io_check_closed (fptr=0x827bd68) at io.c:248
No locals.
#5 0x08085044 in prep_stdio (f=0x8213008, fmode=1, klass=136569860, path=0x81860bd "") at io.c:6267
fptr = 0x827bd68
io = 136568500
=end

Actions #7

Updated by sunaku (Suraj Kurapati) over 14 years ago

=begin
Hi,

I inspected the assembler code for the prep_stdio() function call
inside Init_IO() and found that the problem was not a corrupted
stack pointer as I had imagined, but instead, the wrong argument
value was being passed to prep_stdio() by the assembler code:

rb_stdin = prep_stdio(stdin, FMODE_READABLE, rb_cIO, "");

Dump of assembler code from 0x808b2bd to 0x808b3bd:
0x0808b2bd <Init_IO+4171>: mov 0x8212980,%edx
0x0808b2c3 <Init_IO+4177>: mov 0x8202fa4,%eax # <=== HERE
0x0808b2c8 <Init_IO+4182>: movl $0x81860bd,0xc(%esp)
0x0808b2d0 <Init_IO+4190>: mov %edx,0x8(%esp)
0x0808b2d4 <Init_IO+4194>: movl $0x1,0x4(%esp)
0x0808b2dc <Init_IO+4202>: mov %eax,(%esp) # <=== stdin
0x0808b2df <Init_IO+4205>: call 0x8084fee <prep_stdio>

For some reason, the assembler code is passing the wrong value
for the "stdin" parameter. GDB shows that the value of "stdin"
inside Init_IO() is:

(gdb) p stdin
$8 = (struct _IO_FILE *) 0xb7f3b420

But the assembler code thinks that "stdin" is 0x8202fa4 (see "HERE" above).

Any suggestions for debugging?

Thanks for your consideration.
=end

Actions #8

Updated by sunaku (Suraj Kurapati) over 14 years ago

=begin
Sorry, I meant that the assembler code thinks that "stdin"
is 0x8213008. The value is initially 0x8202fa4 when stored
in the EAX register, but at the time of the call instruction,
the value is 0x8213008.
=end

Actions #9

Updated by sunaku (Suraj Kurapati) over 14 years ago

=begin
Hi,

I did some more debugging to find out why the value of "stdin"
was incorrect inside Ruby's prep_stdio() but correct in main().

In my compilation of the ruby-ucontext-full.tgz test case, the
value of "stdin" is always located at memory address 0x8202fa4.

Before the System V context that houses Ruby is created in
main(), the value at memory address 0x8202fa4 is 0xb7f3b420.

During the course of running the System V context, the stack
pointer occupies memory address 0x8202fa4 (which is supposed
to be reserved for the "stdin" variable!) and overwrites its
value with 0x8213008 -- which is the value of "stdin" as read
inside Ruby's prep_stdio() function.

By setting a watchpoint in GDB to stop the debugger as soon as
the value at memory address 0x8202fa4 does not equal 0xb7f3b420,
I found that the stack pointer was occupying the memory for "stdin".

In conclusion, the problem is that the System V context's
pre-allocated stack of SIGSTKSZ bytes (8192 bytes on my system)
is insuffient to support the execution of the Ruby interpreter.

I will try to malloc() a large region of heap memory according to
getrlimit() results for use as the System V context's stack and
I will post a new example test case that makes ruby-ucontext-full.tgz
behave in the same way as ruby-ucontext-half.tgz: failure at
Kernel#require inside rb_require() inside rb_protect inside SysV context.

Thanks for your consideration.
=end

Actions #10

Updated by sunaku (Suraj Kurapati) over 14 years ago

=begin
Hi,

I am attaching a new ruby-ucontext-full.tgz example test case
which statically allocates 4 MiB of memory for the System V
context's (which houses the embedded Ruby interpreter) stack.

Now the ruby-ucontext-full.tgz example fails at
the same location as ruby-ucontext-half.tgz does:

Program received signal SIGSEGV, Segmentation fault.
#0 mark_locations_array (objspace=0x86114f0, x=0x86d7000, n=-303783813) at gc.c:1312
#1 0x08071bf8 in gc_mark_locations (objspace=0x86114f0, start=0x8601680, end=0xbffff1f0) at gc.c:1328
#2 0x080732f5 in mark_current_machine_context (objspace=0x86114f0, th=0x8611290) at gc.c:2123
...

Furthermore, this change fixes the problem of Kernel#require
failing inside rb_require() for Ruby 1.8.6 and 1.8.7, as the "output.log" file inside ruby-ucontext-full.tgz shows.

I will now debug the above backtrace for Ruby @ SVN trunk.

Thanks for your consideration.
=end

Actions #11

Updated by sunaku (Suraj Kurapati) over 14 years ago

=begin
Hi,

I debugged the segfault during Kernel#require inside
rb_require() inside rb_protect() inside a SysV context,
and found that the culprit is gc_mark_locations().

It is processing (and corrupting) a small portion of
the SysV context's stack space, regardless of whether
the stack space is statically or dynamically allocated,
as shown in the GDB output below.

Notice that the "start" argument for gc_mark_locations()
is within ruby_context.uc_ss_sp and ss_sp + ss_size.

To solve this problem, I will try to add a function in
Ruby's C API that tells Ruby the boundaries of its stack.

This is strangely reminiscent of my feature request #2126.
I suppose my experiments and discoveries so far provide
make a case for exposing stack functions in the Ruby C API.

Thanks for your consideration.

###############################################################

Statically allocated SysV context stack

###############################################################

(gdb) p ruby_context.uc_stack
$19 = {ss_sp = 0x82032a0, ss_flags = 0, ss_size = 4194304}

(gdb) p /x ruby_context.uc_stack.ss_sp + ruby_context.uc_stack.ss_size
$18 = 0x86032a0

(gdb) f
#1 0x08071bf8 in gc_mark_locations (objspace=0x86114f0, start=0x8601680, end=0xbffff1f0) at gc.c:1328

###############################################################

Dynamically allocated SysV context stack

###############################################################

(gdb) p ruby_context.uc_stack
$15 = {ss_sp = 0xb79f6008, ss_flags = 0, ss_size = 4194304}

(gdb) p /x ruby_context.uc_stack.ss_sp + ruby_context.uc_stack.ss_size
$16 = 0xb7df6008

(gdb) f
#1 0x08071c00 in gc_mark_locations (objspace=0x82114f0, start=0xb7df4430, end=0xbffff1f0) at gc.c:1328

=end

Actions #12

Updated by sunaku (Suraj Kurapati) over 14 years ago

=begin
Hi,

The root cause of the problem of gc_mark_locations() processing
(and corrupting) a portion of the SysV context's stack space was
the SET_STACK_END macro. It was blindly setting machine_stack_end
without respecting the machine_stack_maxsize limit in rb_thread_t.

I am attaching a patch which solves this problem. I ran make test
after applying this patch and all tests pass successfully, but it
needs to be tested on more architectures (64-bit and opposite stack
grow direction):

PASS all 949 tests
./miniruby -I./lib -I.ext/common -I./- -r./ext/purelib.rb ./tool/runruby.rb --extout=.ext -- "./bootstraptest/runner.rb" --ruby="ruby" ./KNOWNBUGS.rb
2009-10-24 11:23:47 -0700
Driver is ruby 1.9.2dev (2009-10-23 trunk 25426) [i686-linux]
Target is ruby 1.9.2dev (2009-10-23 trunk 25426) [i686-linux]

With this patch, the ruby-ucontext-full.tgz example is able to
proceed further --- but it still does not complete successfully.
There still seems to be some amount of memory corruption occuring
at the new point of failure (see below).

Notice the fname parameter in some of these stack frames; the
"\266k\be" in "\266k\be/sun/app/ruby/lib/ruby/1.9.1/date.rb"
should really be "/home".

I will debug this memory corruption now and try to make the
ruby-ucontext-full.tgz example run successfully to completion.

Thanks for your consideration.

###############################################################

new point of failure for Kernel#require inside rb_protect()

###############################################################

Program received signal SIGSEGV, Segmentation fault.
0x080abc85 in parser_yyerror (parser=0x86b94a0, msg=0x86018c0 "syntax error, unexpected $end, expecting keyword_end") at parse.y:4897
(gdb) bt
#0 0x080abc85 in parser_yyerror (parser=0x86b94a0, msg=0x86018c0 "syntax error, unexpected $end, expecting keyword_end") at parse.y:4897
#1 0x080ab651 in ruby_yyparse (parser=0x86b94a0) at parse.c:10412
#2 0x080ac10f in yycompile0 (arg=141268128, tracing=0) at parse.y:5009
#3 0x08156601 in ruby_suppress_tracing (func=0x80ac00c , arg=141268128, always=1) at thread.c:4031
#4 0x080ac261 in yycompile (parser=0x86b94a0, f=0x86b9428 "\266k\be/sun/app/ruby/lib/ruby/1.9.1/date.rb", line=1) at parse.y:5035 #5 0x080ac686 in rb_parser_compile_file (vparser=140736600, f=0x86b9428 "\266k\be/sun/app/ruby/lib/ruby/1.9.1/date.rb", file=140736580, start=1) at parse.y:5164
#6 0x080f0d33 in load_file_internal (arg=140517824) at ruby.c:1582

=end

Actions #13

Updated by sunaku (Suraj Kurapati) over 14 years ago

=begin
Hi,

The segfault mentioned in my previous message no longer occurs
with the newer Ruby @ SVN r25457 plus my SET_STACK_END patch.

The ruby-ucontext-full.tgz example now runs to completion
without a segfault. However, an IO exception is raised
during the call to Kernel.require('date') as follows:

Ruby: require 'hello' begin

~~~~> Doing relay from /home/sun/lab/ruby-ucontext/hello.rb:3
Relay: ruby => main
Relay: main <= ruby
Relay: main => ruby
Relay: ruby <= main
~~~~> OMG, back from main!! :-)
rb_require('./hello.rb') failed with status=6
... because an exception was raised:
#<IOError: closed stream>
/home/sun/lab/ruby-ucontext/hello.rb:7:in `require'
/home/sun/lab/ruby-ucontext/hello.rb:7:in `<top (required)>'
Ruby: require 'hello' end

The corresponding GDB backtrace for this exception shows
really strange file names (which seem to be source lines
from a previously read file --- the standard "date.rb"):

#0  rb_raise (exc=140756320, fmt=0x8185b03 "closed stream") at error.c:1150
[...]
#14 0x080ac3a1 in yycompile (parser=0x86af230, f=0x86af158 "  # The method will attempt to parse a date-time from the String\n", line=1) at parse.y:5035
#15 0x080ac7c6 in rb_parser_compile_file (vparser=140736400, f=0x86af158 "  # The method will attempt to parse a date-time from the String\n", file=140736380, start=1) at parse.y:5164
#16 0x080f0e73 in load_file_internal (arg=140517824) at ruby.c:1582
#17 0x08064571 in rb_ensure (b_proc=0x80f08c3 <load_file_internal>, data1=140517824, e_proc=0x80f0f02 <restore_lineno>, data2=1) at eval.c:695
#18 0x080f0f69 in load_file (parser=140736400, fname=0x86af158 "  # The method will attempt to parse a date-time from the String\n", script=0, opt=0x86021f8) at ruby.c:1610
#19 0x080f0fa3 in rb_load_file (fname=0x86af158 "  # The method will attempt to parse a date-time from the String\n") at ruby.c:1618
#20 0x08065a21 in rb_load_internal (fname=140736600, wrap=0) at load.c:289
[...]

Is this another case of memory corruption?  Or perhaps this is
a side-effect of setjmp/longjmp being used for exception handling?

Thanks for your consideration.
=end
Actions #14

Updated by sunaku (Suraj Kurapati) over 14 years ago

=begin
Hi,

By adding a watchpoint on the filename, I found that it
was being free()d by rb_newobj_from_heap() (not directly
but via garbage_collect()) before allocating space for a
few lines read from the standard "date.rb" file (see below).

How strange! Why does the GC feel pressured to free
the filename (which is a VALUE inside Ruby's stack space)?
Is there not enough heap space for it to allocate a new object?

These are the questions I will investigate next.
Feel free to give me a hint if you know the answer.

Thanks for your consideration.

#5 0x0807315e in obj_free (objspace=0x86114f0, obj=140736600) at gc.c:2000
#6 0x08072c34 in gc_sweep (objspace=0x86114f0) at gc.c:1876
#7 0x080737c0 in garbage_collect (objspace=0x86114f0) at gc.c:2204
#8 0x08071662 in rb_newobj_from_heap (objspace=0x86114f0) at gc.c:1011
#9 0x0807171b in rb_newobj () at gc.c:1094
#10 0x080fab39 in str_alloc (klass=140776360) at string.c:378
#11 0x080fabd6 in str_new (klass=140776360, ptr=0x86bab89 " new_by_frags(elem, sg)\n end\n\n # Create a new DateTime object by parsing from a String,\n # without specifying the format.\n #\n # +str+ is a String holding a date-time representation.\n # +comp+"..., len=27) at string.c:397
#12 0x080fad61 in rb_str_new (ptr=0x86bab89 " new_by_frags(elem, sg)\n end\n\n # Create a new DateTime object by parsing from a String,\n # without specifying the format.\n #\n # +str+ is a String holding a date-time representation.\n # +comp+"..., len=27) at string.c:417
#13 0x0807e083 in rb_io_getline_fast (fptr=0x86af2e8, enc=0x8644898) at io.c:2254
=end

Actions #15

Updated by sunaku (Suraj Kurapati) over 14 years ago

=begin
Sorry, I meant that the filename is a valid VALUE in the
memory range that is scanned by mark_current_machine_context().

The filename is certainly not in Ruby's stack space.
=end

Updated by sunaku (Suraj Kurapati) over 14 years ago

=begin
Hi,

I am happy to announce that the problem is now solved! :-)

The attached 0001-add-ruby_bind_stack*.patch file adds a
ruby_bind_stack() function to the Ruby C API which allows
me to inform the GC about the stack boundaries of the
System V context in which the Ruby interperter is running:

void ruby_bind_stack(VALUE *lower, VALUE *upper);

I am also attaching an updated ruby-ucontext-full.tgz example
which makes use of the above ruby_bind_stack() function when
it is available.

Continue reading for a detailed explanation of the problem.

Thanks for your consideration.

##########################################################

Problem explanation

##########################################################

The problem was that the GC assumed that Ruby's stack
was aligned with the native pthread's stack:

upper=0xc1bff1f0
lower=0xbffff1f0

As Ruby executes, the lower boundary is adjusted to
reflect the current C stack pointer by calls to the
SET_STACK_END macro. In contrast, the upper boundary
is not updated at all.

When the GC runs, it scans from the current stack
lower boundary to the upper boundary. Since the lower
boundary was updated to reflect the C stack pointer,
we face the following situation:

(high memory address)

0xc1bff1f0 Ruby's stack upper boundary

0x086032a0 System V context's stack upper boundary

0x08601680 Ruby's stack lower boundary
(after update by SET_STACK_END)

0x082032a0 System V context's stack lower boundary

(low memory address)

To correct this situation, we need to update Ruby's
stack upper boundary to reflect the System V context's
stack upper boundary:

(high memory address)

0x086032a0 Ruby's stack upper boundary and also
System V context's stack upper boundary

0x08601680 Ruby's stack lower boundary
(after update by SET_STACK_END)

0x082032a0 System V context's stack lower boundary

(low memory address)

For machines where the stack grows upward, a similar problem
would be faced where Ruby's stack lower boundary would need
to be updated to reflect the System V context's stack lower
boundary.
=end

Updated by sunaku (Suraj Kurapati) over 14 years ago

=begin
Hi,

I am attaching two versions of the ruby-ucontext-full.tgz
example: one which uses a statically allocated stack,
and one which uses a dynamically allocated stack for the
UNIX System V context which houses the Ruby interpreter.

Thanks for your consideration.
=end

Updated by sunaku (Suraj Kurapati) over 14 years ago

=begin
Hi,

I am attaching variations of the ruby-ucontext examples
which use libpcl1 instead of UNIX System V contexts,
as well as updated versions of the ruby-ucontext examples.

Each tarball has an "output.log" file which contains the
result of running script -c ./run.sh output.log on my
machine:

uname -a
Linux yantram 2.6.31-ARCH #1 SMP PREEMPT Tue Oct 13 13:36:23 CEST 2009 i686 Intel(R) Pentium(R) D CPU 3.00GHz GenuineIntel GNU/Linux

The last section in output.log is for a Ruby 1.9.2dev which
is patched with the 0001-add-ruby_bind_stack*.patch attached
to this bug report.

Thanks for your consideration.

=end

Actions #19

Updated by sunaku (Suraj Kurapati) over 14 years ago

=begin
Hi,

I am attaching a new "ruby_bind_stack.patch" file which
fixes a bug in my previous patch, adds a Changelog entry,
and renames the variables to better reflect their function.

Thanks for your consideration.
=end

Actions #21

Updated by nobu (Nobuyoshi Nakada) over 14 years ago

=begin
Hi,

At Fri, 23 Oct 2009 08:37:40 +0900,
Suraj Kurapati wrote in [ruby-core:26244]:

I get the following error when I call Kernel#require()
inside a Ruby script that is being run by rb_require(),
which itself is running inside rb_protect(), which in
turn is running inside a UNIX System V context.

Ko1 is going to improve Fiber with ucontext. Wait for it.

--
Nobu Nakada

=end

Actions #22

Updated by sunaku (Suraj Kurapati) over 14 years ago

=begin
Hi,

Nobu Nakada wrote:

Ko1 is going to improve Fiber with ucontext. Wait for it.

This is good news. But I am confused. Earlier I tried to
implement Ruby as a C coroutine using fibers and ran into a
limitation where a Fiber created from inside one rb_protect()
call could not be resumed from inside another rb_protect() call:

http://redmine.ruby-lang.org/issues/show/2127

How should I update my "ruby-ucontext" example
test case to accomodate ko1's planned changes?

Thanks for your consideration.
=end

Actions #24

Updated by sunaku (Suraj Kurapati) over 14 years ago

=begin
Hi,

I created a new issue to track my ruby_bind_stack patch and
code examples: http://redmine.ruby-lang.org/issues/show/2294

Mr. Nobu, could you please answer my question about how
my code examples need to change in order to accomodate this:

Ko1 is going to improve Fiber with ucontext. Wait for it.

Thanks for your consideration.
=end

Actions #25

Updated by sunaku (Suraj Kurapati) over 14 years ago

=begin
Hi,

Suraj Kurapati wrote:

Mr. Nobu, could you please answer my question about [...]

Please forgive my rudeness and childish impatience. I shall
wait quietly until you feel the time is right for this issue
to warrant consideration, or even a response for that matter.

Thank you for understanding.
=end

Actions #26

Updated by nobu (Nobuyoshi Nakada) over 14 years ago

  • Status changed from Open to Closed

=begin
This issue was solved with changeset r25842.
Suraj, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.

=end

Actions

Also available in: Atom PDF

Like0
Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0