Feature #19617 (Open)
Add Method#binding and UnboundMethod#binding, similar to Proc#binding
Description
When a method is defined dynamically with define_method, it would be useful to obtain access to the defining block's binding and the local variables it has captured. For methods defined using the def keyword, the binding's local variables might be empty or might be all of the names in the method's locals table, with all values set to nil.
For UnboundMethod, it is unclear (to me) what the appropriate receiver for the binding would be, so perhaps unbound.binding.receiver should raise an exception.
Alternatively (or additionally), something like Method#defining_proc and UnboundMethod#defining_proc might be added, returning nil for def definitions and the proc for define_method definitions.
This would be a useful tool when debugging from the console. As another example, it might be used to scan a code base for dynamically generated regexps which are only reachable via the enclosed local variables and test that they are all linear time (see https://github.com/ruby/net-imap/blob/92db350b24c388d2a2104f36cac9caa49a1044df/test/net/imap/regexp_collector.rb).
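To illustrate the gap described above, here is a minimal sketch. The helper build_matcher_class and the method name digits are hypothetical names chosen for this example. It shows that the captured local is reachable through Proc#binding only while a reference to the defining proc is still at hand; the Method object itself offers no public equivalent on Ruby versions without this feature.

```ruby
# Hypothetical helper: builds a class whose method closes over a local variable.
def build_matcher_class
  pattern = /\d+/ # reachable only through the closure
  body = proc { |str| str.scan(pattern) }
  klass = Class.new { define_method(:digits, &body) }
  [klass, body]
end

klass, body = build_matcher_class
matcher = klass.new

# The dynamically defined method uses the captured pattern.
digits = matcher.digits("a1b22")

# Proc#binding exposes the locals captured by the defining block...
captured = body.binding.local_variable_get(:pattern)

# ...but Method has no public #binding (Kernel#binding is private), so once
# the proc reference is gone there is no direct way back to `pattern`.
has_binding = matcher.method(:digits).respond_to?(:binding)

puts digits.inspect
puts captured.inspect
puts has_binding
```

In real code the defining proc is usually not returned to the caller, which is exactly the situation in which Method#binding or Method#defining_proc would help.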
Updated by Eregon (Benoit Daloze) over 1 year ago
nevans (Nicholas Evans) wrote:
For methods defined using the def keyword, the binding's local variables might be empty or might be all of the names in the method's locals table, with all values set to nil.
Rather, the binding method should return nil. There is simply no captured binding for def methods.
Updated by Eregon (Benoit Daloze) over 1 year ago
This would be a useful tool when debugging from the console. As another example, it might be used to scan a code base for dynamically generated regexps which are only reachable via the enclosed local variables and test that they are all linear time
Interesting. Maybe an easier/better/more portable way to do that would be to have a flag to check that all Regexp are linear when they are executed, e.g., by the test suite?
Like https://speakerdeck.com/eregon/just-in-time-compiling-ruby-regexps-on-truffleruby?slide=19
Maybe it could be a performance warning (or even its own warning category) and then you could easily tweak the behavior, e.g. raise an exception if there is such a warning.
TruffleRuby already has a flag to enable such a warning for non-linear regexps. It is currently named --warn-truffle-regex-compile-fallback, but we'd probably want something shorter in the long term.
It could also be a flag that can be set directly by Ruby code, not necessarily or only a command-line flag.
Updated by byroot (Jean Boussier) over 1 year ago
Or iterate over all regexps through ObjectSpace.each_object(Regexp)? But I suppose it doesn't allow looking only at a specific namespace.
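A minimal sketch of that ObjectSpace approach, assuming CRuby 3.2 or later, where Regexp.linear_time? is available. The constant BACKREF_SAMPLE and the helper nonlinear_regexps are hypothetical names for this example. As noted, the scan covers every live Regexp in the process and does not say which code created or references each one.

```ruby
# Hypothetical sample: a backreference prevents linear-time matching.
# Storing it in a constant keeps it alive so ObjectSpace can find it.
BACKREF_SAMPLE = /(a)\1/

# Returns every live Regexp that the engine cannot match in linear time.
def nonlinear_regexps
  # Regexp.linear_time? was added in Ruby 3.2; report nothing on older versions.
  return [] unless Regexp.respond_to?(:linear_time?)

  ObjectSpace.each_object(Regexp).reject { |re| Regexp.linear_time?(re) }
end

nonlinear = nonlinear_regexps
puts "#{nonlinear.size} non-linear regexps currently live in this process"
```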
Updated by Eregon (Benoit Daloze) over 1 year ago
Ah BTW one concern I have with https://github.com/ruby/net-imap/blob/92db350b24c388d2a2104f36cac9caa49a1044df/test/net/imap/regexp_collector.rb is it could only work on CRuby due to using RubyVM::InstructionSequence.
That seems problematic for a security thing that really isn't CRuby-specific.
Updated by nevans (Nicholas Evans) over 1 year ago
Eregon (Benoit Daloze) wrote in #note-2:
Maybe an easier/better/more portable way to do that would be to have a flag to check that all Regexp are linear when they are executed, e.g., by the test suite?
Like https://speakerdeck.com/eregon/just-in-time-compiling-ruby-regexps-on-truffleruby?slide=19
Maybe it could be a performance warning (or even its own warning category) and then you could easily tweak the behavior, e.g. raise an exception if there is such a warning.
...
It could also be a flag that can be set directly by Ruby code, not necessarily or only a command-line flag.
Yes, I like that a lot. Perhaps something like Regexp.warn_nonlinear = true or Regexp.on_nonlinear = ->re { warn "nonlinear" }. And I think it's especially useful to warn or raise when the regexp is created, not only when it's executed. But that would be a different issue from this one.
When discussing this particular use case with others, several people suggested new regexp options to either enforce linear runtime when the regexp is created, or to allow non-linear regexps without warnings or exceptions (e.g. for simple scripts or when the inputs are safely controlled), e.g. /enforced linear/l and /allow non-linear/L. But that would also be a different issue from this one. ;)
Eregon (Benoit Daloze) wrote in #note-4:
one concern I have with https://github.com/ruby/net-imap/blob/92db350b24c388d2a2104f36cac9caa49a1044df/test/net/imap/regexp_collector.rb is it could only work on CRuby due to using RubyVM::InstructionSequence.
That seems problematic for a security thing that really isn't CRuby-specific.
It also doesn't work with CRuby 3.0 or 3.1, and it isn't inspecting the iseq for procs that are stored as constants, etc. It can't catch everything, but it's a useful first step: it already found three existing non-linear regexps, and prevented a PR that would have added a fourth. But improvements to that test are also a different issue from this one.
I assume TruffleRuby has a comparable mechanism to find regexps within the methods that define them, and PRs are welcome. But TruffleRuby already has a built-in approach for warning when non-linear regexps are executed, so it's ahead on this. We should probably enable that TruffleRuby config flag in CI... but that's a different issue from this one. ;)
byroot (Jean Boussier) wrote in #note-3:
Or iterate over all regexps through ObjectSpace.each_object(Regexp)? But I suppose it doesn't allow to only look at a specific namespace.
Yes, that is one of the reasons I didn't want to use that approach in the tests. Related but more significant for my use-case: ObjectSpace doesn't tell me what created the regexp or is holding a reference to any regexps I find. But it's still a useful technique that I used to manually gauge how thorough the tests were.
Another approach would be to use TracePoint while loading and/or testing the library. I think that should allow limiting by namespace... but it's still a different ticket from this one.
Updated by nevans (Nicholas Evans) over 1 year ago
Perhaps I shouldn't have given the Regexp use-case, since there are many other approaches we can (and should) use to audit our regexps. Although that was the immediate trigger for this ticket, it wasn't the first or the most common reason I've wanted this feature.
I think this (or something like it) should be added for all of the same reasons that Proc#binding was added. The most common use-case (for me) is simply debugging and exploring through run-time inspection on the console. The only way I know of to get captured local variables at the moment is to use TracePoint or a debug breakpoint and execute the code. But that shouldn't be required.
Besides this and data that is managed by native extensions, what other forms of Ruby state require either TracePoint or a debug breakpoint to inspect? There's already a ticket for #15778.
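The TracePoint workaround mentioned above can be sketched as follows, assuming CRuby. The names build_greeter, greet, and greeting are hypothetical. Note that the method has to be executed before its captured locals become visible, which is the requirement this ticket would like to remove.

```ruby
# Hypothetical example: a method defined with define_method that closes
# over the local variable `greeting`.
def build_greeter
  greeting = "hello"
  Class.new { define_method(:greet) { |name| "#{greeting}, #{name}" } }
end

greeter = build_greeter.new
bindings = []

# :b_call fires when a block body starts running, which includes the block
# that serves as the body of a define_method method.
trace = TracePoint.new(:b_call) { |tp| bindings << tp.binding }

result = nil
trace.enable { result = greeter.greet("world") }

# Pick the binding that can see the captured local and read its value.
found = bindings.find { |b| b.local_variable_defined?(:greeting) }
value = found && found.local_variable_get(:greeting)

puts result
puts value.inspect
```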
Updated by nevans (Nicholas Evans) over 1 year ago
Another use-case is simply to match Proc#binding. I originally assumed that Method#binding and UnboundMethod#binding already existed. And when I asked several other very knowledgeable Rubyists if they knew how to access the enclosed local variables, most of them had the same assumption.
Updated by Eregon (Benoit Daloze) over 1 year ago
I think Method#defining_proc and UnboundMethod#defining_proc make more sense: most methods (all the ones defined with def) don't have a Binding, so just having Method#binding defined would be misleading. Also, if there are other aspects of the Proc that are of interest, we wouldn't want to duplicate all those methods on Method and UnboundMethod.
I think we need a good use case though to add defining_proc, hence the discussion above about the use case you linked to.
I think this (or something like it) should be added for all of the same reasons that Proc#binding was added.
I think nobody remembers why Proc#binding was added. Some issues are even considering removing it because it causes a significant overhead to Ruby performance.
The most common use-case (for me) is simply debugging and exploring through run-time inspection on the console.
Could you give a concrete example of something you could do if that was added that you cannot do today, or that is much more difficult today?