Project

General

Profile

Actions

Bug #18780

closed

Incorrect binding receiver for C API rb_eval_string()

Added by daveola (David Stellar) over 2 years ago. Updated about 2 years ago.

Status:
Closed
Assignee:
-
Target version:
-
ruby -v:
ruby 2.7.0p0 (2019-12-25 revision 647ee6f091) [x86_64-linux]
[ruby-core:108546]

Description

% ruby -v
ruby 2.7.0p0 (2019-12-25 revision 647ee6f091) [x86_64-linux]

(Though looking at the source code this problem still exists in ruby 3.0)

The rb_eval_string() is seemingly capable of everything that eval is capable of, with one slight deviation. The binding is oddly setup to be correct except for the receiver/self.

This means that while these both act the same:

ruby: eval("puts someLocalVariable")
C API: rb_eval_string("puts someLocalVariable")

These do not:

ruby: eval("puts @someInstanceVar")
C API: rb_eval_string("puts @someInstanceVar") # nil

And this is because these do not work:

ruby: eval("puts self") # self of calling context
ruby: eval("puts binding().receiver") # self of calling context
C API: rb_eval_string("puts self") # main
C API: rb_eval_string("puts binding().receiver") # main

We can see the problem in the MRI source in ruby_eval_string_from_file() which has:

return eval_string_with_cref(rb_vm_top_self(), rb_str_new2(str), NULL, file, 1);

We've passed in rb_vm_top_self instead of the correct 'self'

Although possibly the issue is in the API itself with the fact that there's no way to plumb through the receiver that you are given in the C extension method function call, i.e.:

    // My C extension that calls eval and knows what it's 'self' is.
    VALUE method_myCMethod(int argc, VALUE *argv, VALUE self) {
            rb_eval_string("...");  // <- no way to be given self?
    }

Having said that, rb_eval_string is able to determine the vast majority of the binding context, since it correctly builds the binding() object except that the receiver is set to main, so perhaps this is something that can be determined. It is something that the builtin eval is able to do, after all. So possibly this is just a failure with MRI. I don't have other rubies to test. (I'm on ruby 2.7.0 but the source relevant to this hasn't changed.)

I would argue this is a bug, because we are essentially given a corrupted result from binding() - one where we may have access to the local variables of an instance method, but one where the self itself is not properly set. That's not an actual legit binding state in the ruby code.


Related issues 2 (1 open1 closed)

Related to Ruby master - Bug #18487: Kernel#binding behaves differently depending on implementation language of items on the stackClosedActions
Related to Ruby master - Feature #15778: Expose an API to pry-open the stack frames in RubyAssignedko1 (Koichi Sasada)Actions

Updated by Eregon (Benoit Daloze) over 2 years ago

  • Status changed from Open to Rejected

rb_eval_string() is used inside C code/C extensions, there is no "caller Binding" to use, C extension frames have no Binding.
If you want to eval with a specific Binding, capture it (e.g., with Kernel#binding in Ruby code), then pass it to the C extension and call Binding#eval(code) (e.g. with rb_funcall).
Or simply call Binding#eval from Ruby code and avoid C altogether.

Updated by daveola (David Stellar) over 2 years ago

That's actually not really true, there is a caller binding, as evidenced by the fact that you can get the binding inside of the eval.

So it sets up the binding() properly, except that binding().receiver is incorrect and it effectively gives you a state that does not actually exist (the local variables are accessible, but the instance variables are not.).

The entire point of this is to come up with a C api that acts like Kernel::eval but with some modifications, so the answer of "just supply binding() to the method" is not really working.

Updated by Eregon (Benoit Daloze) over 2 years ago

It's the same as TOPLEVEL_BINDING, isn't it?
It doesn't use any information from the caller, which is a C extension, which has binding, hence "no caller binding".

so the answer of "just supply binding() to the method" is not really working.

It is, use e.g. rb_funcall(my_binding, rb_intern("eval"), rb_str_new(code)).
And my_binding comes from an argument to that method defined in the C extension.

There seems to be no C function to create a binding from a receiver, but that's also expected, a Binding is not just a receiver, it points to a frame, the receiver but also its local variables, the default definee, the constant scope, etc.

Updated by daveola (David Stellar) over 2 years ago

Eregon (Benoit Daloze) wrote in #note-3:

It's the same as TOPLEVEL_BINDING, isn't it?
It doesn't use any information from the caller, which is a C extension, which has binding, hence "no caller binding".

No, it's not. It's the same as the caller's binding, except that the receiver/self is set to TOPLEVEL.

In other words:

RUBY CODE


class Foo
  def bar
    @instanceVar = 42
    localVar = 42
    <call my C method here>
  end
end


@instanceVar = "main"
localVar = "main"
Foo.new().bar() # Create a foo and call bar which calls my C api

C CODE

  // inside the C api method:
  VALUE bind = evalReq("binding()");

This binding will be pointing to localVar = 42 but @instanceVar = "main"

How you can justify that would be interesting.

Actions #5

Updated by Eregon (Benoit Daloze) over 2 years ago

  • Related to Bug #18487: Kernel#binding behaves differently depending on implementation language of items on the stack added

Updated by Eregon (Benoit Daloze) over 2 years ago

  • Status changed from Rejected to Open

OK, finally the issue is clear.

Could you try with latest master?
With #18487 I believe you should get a RuntimeError if you call binding from a C frame/method.

That also makes it clear one should not try to get a Binding from a C frame, it must be captured from a Ruby frame and then everything is consistent.

Updated by Eregon (Benoit Daloze) over 2 years ago

https://gist.github.com/eregon/31259907c132ab61e9f136d684b1fc39 is an easy way to have a single-file repro with both C code and Ruby code, it'd be nice if you can provide a complete repro as well.

Updated by daveola (David Stellar) over 2 years ago

Eregon (Benoit Daloze) wrote in #note-6:

OK, finally the issue is clear.

Could you try with latest master?
With #18487 I believe you should get a RuntimeError if you call binding from a C frame/method.

That also makes it clear one should not try to get a Binding from a C frame, it must be captured from a Ruby frame and then everything is consistent.

This is 100% not what I was hoping for.

  1. That seems like it will break a number of things, if your binding is no longer correct. What if you try to eval some code that ends up needing to use the binding?
  2. This is inconsistent with Kernel::eval, which, while written in C, is able to get the binding of the caller.

Let's say I need to write a myEval method which is like eval, but first prints out the string that is eval'd. Or possibly checks it for certain information. You have just made this literally impossible to write in ruby.

Updated by daveola (David Stellar) over 2 years ago

You have just made this literally impossible to write in ruby.

And before you say "just have the caller pass in the binding" - keep in mind that Kernel::eval does not require this. So you would not, for example, be able to make code that would wrap or replace eval, but still have proper bindings and be able to access local variables, instance variables, scope info, etc..

    foobar = 42
    eval("foobar += 1")      # Works
    myEval("foobar += 1")    # Impossible

Updated by Eregon (Benoit Daloze) over 2 years ago

  • Status changed from Open to Rejected

I'm reluctant to reply since you seem to ignore my comments as a Ruby implementer.

Anyway, you are asking for something to get the caller's binding, that is very much on purpose not provided by Ruby.
Such a functionality breaks encapsulation in a very bad manner.
The fact calling Kernel#binding from a C extension did something like that was accidental, and has been fixed (#18487).

There have been discussions in the past and such functionality to get the caller binding should only be made available for debugging situations, in which you can use the debug_inspector C API or https://github.com/banister/debug_inspector.
That is of course slow and should only be used for debugging situations, but maybe it makes sense to use it conditionally in your case based on some configuration/option (and otherwise just alias myEval eval).

It's a fair point this makes it impossible to wrap eval and still get the same binding (without the debug_inspector C API), but Ruby will not add access to the caller binding just for that.

Updated by daveola (David Stellar) over 2 years ago

Eregon (Benoit Daloze) wrote in #note-10:

I'm reluctant to reply since you seem to ignore my comments as a Ruby implementer.

I'm sorry you feel that way, and I regret posting to the truffleruby discussion by mistake, I lost track of the many threads that are talking about this.

Anyway, you are asking for something to get the caller's binding, that is very much on purpose not provided by Ruby.

This is being discussed I now see on:
https://bugs.ruby-lang.org/issues/15778

You, in fact, said that you thought it was a good idea (3 years ago) and I hope that is still the case, because this mechanism would solve my problem.

I understand that one almost never wants to break encapsulation (except in debug), but there are cases, and eval() itself does break encapsulation and is part of ruby.

That is of course slow and should only be used for debugging situations, but maybe it makes sense to use it conditionally in your case

I hear the suggestion and I will just have to switch to using a debugger for my code to work, it's not a conditional, I need the ability to write a different version of eval() for my code, and I fought hard at my company to use ruby, so it's a shame that it has to be done with a debug hack and all of it's performance penalties, but I hear that you are not willing to add this type of a pathway.

Hopefully 15778 will get resolved some day and then none of this will be an issue anymore for me. :)

Actions #12

Updated by Eregon (Benoit Daloze) over 2 years ago

  • Related to Feature #15778: Expose an API to pry-open the stack frames in Ruby added

Updated by Eregon (Benoit Daloze) over 2 years ago

daveola (David Stellar) wrote in #note-11:

I hear the suggestion and I will just have to switch to using a debugger for my code to work, it's not a conditional, I need the ability to write a different version of eval() for my code, and I fought hard at my company to use ruby, so it's a shame that it has to be done with a debug hack and all of it's performance penalties, but I hear that you are not willing to add this type of a pathway.

Note that no debugger is needed for the debug_inspector gem, it's a regular gem/dependency.
That C API that the gem uses is meant to only be used for debugging situations, but there is no actual check for it, it relies on people using it only for valid situations.
So you should be able to use it for your use-case without issues.

Hopefully 15778 will get resolved some day and then none of this will be an issue anymore for me. :)

My understanding of that issue's discussion so far is that is unlikely.
I would personally be happy to have a Ruby API for this as proposed there, but I understand concerns of other CRuby committers to not make it too easy to use such a dangerous/breaking-encapsulation API, and so I guess it will remain a C API only (+ the gem) for a while.

Updated by alanwu (Alan Wu) over 2 years ago

Thanks for the bug report. The "binding with mixed information from two
contexts" situation from https://bugs.ruby-lang.org/issues/18780#note-4
definitely looks wrong. I wrote a reproducer and luckily, it's no
longer an issue on the master branch (ab10f111c3).

Let's say I need to write a myEval method which is like eval, but first
prints out the string that is eval'd. ... You have just made this literally
impossible to write in ruby.

This is in fact possible with the rb_binding_new() API on both 2.7.6 and on
the master branch. Here is a demo script:

puts RUBY_DESCRIPTION

require 'tmpdir'
require 'rbconfig'

# From https://gist.github.com/eregon/31259907c132ab61e9f136d684b1fc39
# courtesy of Benoit.
def inline_c_extension(c_code)
  Dir.mktmpdir('inline_c_extension') do |dir|
    File.write("#{dir}/cext.c", c_code)
    File.write("#{dir}/extconf.rb", <<~RUBY)
      require 'mkmf'
      create_makefile('cext')
    RUBY

    out = IO.popen([RbConfig.ruby, 'extconf.rb'], chdir: dir, &:read)
    raise "ruby extconf.rb failed: #{$?.inspect}\n#{out}" unless $?.success?

    out = IO.popen(['make'], chdir: dir, &:read)
    raise "make failed: #{$?.inspect}\n#{out}" unless $?.success?

    require "#{dir}/cext.#{RbConfig::CONFIG['DLEXT']}"
  end
end

inline_c_extension <<~C
#include "ruby.h"

// A wrapped version of Kernel#eval that prints the program before evaluating.
// Uses the caller's Ruby context just like Kernel#eval.
static VALUE
my_eval(VALUE self, VALUE program)
{
    ID p_id = rb_intern("p");
    ID eval_id = rb_intern("eval");
    VALUE binding;

    if (1 /* do what I understand OP wants */) {
        binding = rb_binding_new();
    }
    else {
        // Flip the if to check the "impossible binding" issue from [ruby-core:108597]
        binding = rb_eval_string("p [:rbeval_string, @foo, foo]; binding");
    }

    rb_funcall(self, p_id, 1, program);
    return rb_funcall(binding, eval_id, 1, program);
}

void
Init_cext(void)
{
    rb_define_global_function("my_eval", my_eval, 1);
}
C

@foo = :main
foo = :mainl
my_eval("p [:in_main, self, @foo, foo]")

class Foo
  def foo
    @foo = :inclass
    foo = :inclassl
    my_eval("p [:inside_foo, self, @foo, foo]")
  end

  new.foo
end

$ ruby demo.rb
ruby 3.2.0dev (2022-06-14T16:06:06Z master ab10f111c3) [x86_64-darwin21]
"p [:in_main, self, @foo, foo]"
[:in_main, main, :main, :mainl]
"p [:inside_foo, self, @foo, foo]"
[:inside_foo, #<Foo:0x000000010fcc1360 @foo=:inclass>, :inclass, :inclassl]

In terms of [#18487], my_eval in the demo is not any more powerful than
Kernel#binding and Kernel#eval since it's only getting the binding of
the Ruby frame immediately below it.

By the way, thanks to some awesome work by @shyouhei (Shyouhei Urabe), newer versions of public C headers
have API documentations. I would recommend reading those docs instead of the implementation.
For example for rb_eval_string():

/**
 * Evaluates the given string in an isolated binding.
 *
 * Here  "isolated"  means  that  the   binding  does  not  inherit  any  other
 * bindings.  This behaves same as the binding for required libraries.
 * ...

I wonder if we host the HTML version of the C API docs anywhere...

Updated by alanwu (Alan Wu) over 2 years ago

  • Status changed from Rejected to Open

I wrote a reproducer and luckily, it's no
longer an issue on the master branch (ab10f111c3).

I spoke too soon! Flipping the if in demo.rb:

ruby 3.2.0dev (2022-06-14T16:06:06Z master ab10f111c3) [x86_64-darwin21]
[:rbeval_string, :main, :mainl]
"p [:in_main, self, @foo, foo]"
[:in_main, main, :main, :mainl]
[:rbeval_string, :main, :inclassl] <<<<<< mixed context!
"p [:inside_foo, self, @foo, foo]"
[:inside_foo, main, :main, :inclassl]

I'll reopen this and look into it later.

Updated by daveola (David Stellar) over 2 years ago

alanwu (Alan Wu) wrote in #note-15:

I spoke too soon! Flipping the if in demo.rb:

I'll reopen this and look into it later.

Doesn't the change in https://bugs.ruby-lang.org/issues/18487 (raising a kernel error) imply that this doesn't need to be fixed?

Also, thanks for the pointer to rb_binding_new(), I'll take a look at that, but issue 18487 implies that it will likely be viewed as incorrect to return the caller's binding.

Updated by alanwu (Alan Wu) over 2 years ago

The issue with #18487 was that Kernel#binding used to find the
first Ruby frame on the stack and that this iterating behavior
is brittle as it basically acts as an assertion that the frames
that it skips over remain as non-Ruby frames in future versions. We
changed it so that it doesn't iterate anymore and only checks
the direct caller. It's easier to reason about since it's a
weaker assertion and there is only one frame involved.

So now binding related APIs should only consider the direct
caller. The weird rb_eval_string() scenario somewhat betrays
this new semantics because when the frame for Kernel#binding
is active the stack looks like this:

   Type Name
   C    Kernel#binding
   Ruby <eval code frame>
   C    my_eval
   Ruby Foo#foo

It returns the local from Foo#foo, so not only does it skip
over a C frame, it also skips over a Ruby frame! Now, maybe
we could understand Kernel#binding's behavior here as not
skipping frames but rather returning a binding previously
created at the time when we call rb_eval_string(), when the
stack looked like:

   C    my_eval
   Ruby Foo#foo

But the way it idiosyncratically creates a binding by mixing
contexts is still surprising. The behavior doesn't match
what the docs for rb_eval_string() would suggest either
since it's not an isolated binding.

Calling rb_binding_new() from a C method doesn't create
a new frame, and it's distinct from calling Kernel#binding
from a C method because of that. Using rb_binding_new() is
somewhat like assuming the role of Kernel#binding. One can only call
rb_binding_new() from a C method so making it raise in that
context would render it completely useless -- not desirable!

Updated by mame (Yusuke Endoh) over 2 years ago

We discussed this issue at the dev meeting.

@matz (Yukihiro Matsumoto) said self should be changed. rb_eval_string should evaluate the argument under the current context. The self should be picked from the context instead of main. If there is no caller binding (which is a main use case for rb_eval_string, as @Eregon (Benoit Daloze) said), self should be main.

We will revisit the decision if compatibility issues are discovered.

Updated by shyouhei (Shyouhei Urabe) over 2 years ago

Updating the header document.

From b968f277386649b7531a8999be54eacf3c599cdb Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E5=8D=9C=E9=83=A8=E6=98=8C=E5=B9=B3?=
 <shyouhei@ruby-lang.org>
Date: Thu, 21 Jul 2022 18:00:04 +0900
Subject: [PATCH] [DOC] update rb_eval_string() document.

---
 include/ruby/internal/eval.h | 33 ++++++++++++++++++++++++++++++---
 template/Doxyfile.tmpl       |  1 +
 2 files changed, 31 insertions(+), 3 deletions(-)

diff --git a/include/ruby/internal/eval.h b/include/ruby/internal/eval.h
index 34a53849da..b7ab66515f 100644
--- a/include/ruby/internal/eval.h
+++ b/include/ruby/internal/eval.h
@@ -28,10 +28,12 @@ RBIMPL_SYMBOL_EXPORT_BEGIN()
 
 RBIMPL_ATTR_NONNULL(())
 /**
- * Evaluates the given string in an isolated binding.
+ * Evaluates the given string.
  *
- * Here  "isolated"  means  that  the   binding  does  not  inherit  any  other
- * bindings.  This behaves same as the binding for required libraries.
+ * In case  it is called  from within a  C-backended method, the  evaluation is
+ * done under  the current binding.  However  there can be no  method.  On such
+ * situation this  function evaluates  in an  isolated binding,  like `require`
+ * runs in a separate one.
  *
  * `__FILE__`  will  be  `"(eval)"`,  and  `__LINE__`  starts  from  1  in  the
  * evaluation.
@@ -39,6 +41,31 @@ RBIMPL_ATTR_NONNULL(())
  * @param[in]  str            Ruby code to evaluate.
  * @exception  rb_eException  Raises an exception on error.
  * @return     The evaluated result.
+ *
+ * @internal
+ *
+ * @shyouhei's old tale about the birth and growth of this function:
+ *
+ * At  the beginning,  there  was no  rb_eval_string().   @shyouhei heard  that
+ * @shugo, author of  Apache httpd's mod_ruby module, requested  @matz for this
+ * API.  He wanted a way so that mod_ruby can evaluate ruby scripts one by one,
+ * separately, in each different contexts.  So  this function was made.  It was
+ * designed to be a global interpreter entry point like ruby_run_node().
+ *
+ * The  way it  is implemented  however  allows extension  libraries (not  just
+ * programs like  Apache httpd) to call  this function.  Because its  name says
+ * nothing  about the  initial design,  people  started to  think of  it as  an
+ * orthodox  way  to  call  ruby  level  `eval`  method  from  their  extension
+ * libraries.  Even our `extension.rdoc` has had a description of this function
+ * basically according to this understanding.
+ *
+ * The old  (mod_ruby like) usage still  works.  But over time,  usages of this
+ * function from extension libraries got  popular, while mod_ruby faded out; is
+ * no  longer maintained  now.  Devs  decided to  actively support  both.  This
+ * function  now auto-detects  how  it is  called, and  switches  how it  works
+ * depending on it.
+ *
+ * @see https://bugs.ruby-lang.org/issues/18780
  */
 VALUE rb_eval_string(const char *str);
 
diff --git a/template/Doxyfile.tmpl b/template/Doxyfile.tmpl
index 36c0b1c8d6..502e171384 100644
--- a/template/Doxyfile.tmpl
+++ b/template/Doxyfile.tmpl
@@ -274,6 +274,7 @@ ALIASES               += "old{1}=Old name of @ref \1.^^@deprecated Use @ref \1 i
 ALIASES               += "shyouhei=\@shyouhei"
 ALIASES               += "ko1=\@ko1"
 ALIASES               += "matz=\@matz"
+ALIASES               += "shugo=\@shugo"
 
 # Set the OPTIMIZE_OUTPUT_FOR_C tag to YES if your project consists of C sources
 # only. Doxygen will then generate output that is more tailored for C. For
-- 
2.17.1


Updated by Eregon (Benoit Daloze) over 2 years ago

Nice story :)

Probably a typo: s/old table/old tale/

Updated by shyouhei (Shyouhei Urabe) over 2 years ago

Yes, it is a typo. Thank you.

Actions

Also available in: Atom PDF

Like0
Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0