Project

General

Profile

Actions

Feature #20999

closed

Add RubyVM object source support

Added by bkuhlmann (Brooke Kuhlmann) 3 days ago. Updated about 16 hours ago.

Status:
Rejected
Assignee:
-
Target version:
-
[ruby-core:120467]

Description

Hello. 👋

I'd like to propose adding the ability to acquire the source of any object within memory via the RubyVM. A couple use cases come to mind:

  • This would simplify the Method Source gem implementation and possibly eliminate the need for the gem.
  • Another use case is this allows DSLs, like Initable, to elegantly acquire the source code of objects and/or functions (in my case, I'm most interested in the lazy evaluation of function bodies).

⚠️ I'm also aware that the RubyVM documentation clearly stats this isn't meant for production use:

This module is for very limited purposes, such as debugging, prototyping, and research. Normal users must not use it. This module is not portable between Ruby implementations.

...but I'd like to entertain this proposed feature request, regardless. Here's an example, using the aforementioned Initable gem, where I use the RubyVM to obtain the source of a Proc:

class Demo
  include Initable[%i[req name], [:key, :default, proc { Object.new }]]
end

puts Demo.new("demo").inspect
#<Demo:0x000000014349a400 @name="demo", @default=#<Object:0x000000014349a360>>

With the above, I'm lazily obtaining the source code of the Proc in order to dynamically define the #initialize method (essentially a module_eval on Demo, simply speaking) using a nested array as specified by Method#parameters because I don't want an instance of Object until initialization is necessary.

Context

Prior to the release of Ruby 3.4.0, you could do this:

function = proc { Object.new }
ast = RubyVM::AbstractSyntaxTree.of function
ast.children.last.source

# "Object.new"

Unfortunately, with the release of Ruby 3.4.0 -- which defaults to the Prism parser -- the ability to acquire source code is a bit more complicated. For example, to achieve what is shown above, you have to do this:

function = proc { Object.new }
RubyVM::InstructionSequence.of(function).script_lines

# [
#   "function = proc { Object.new }\n",
#   "RubyVM::InstructionSequence.of(function).script_lines\n",
#   "\n",
#   ""
# ]

Definitely doable, but now requires more work to pluck "Object.new" from the body of the Proc. One solution is to use a regular expression to find and extract the first line of the result. Example:

/
  proc          # Proc statement.
  \s*           # Optional space.
  \{            # Block open.
  (?<body>.*?)  # Source code body.
  \}            # Block close.
/x

Definitely doesn't account for all use cases (like when a Proc spans multiple lines or uses do...end syntax) but will get you close.

How

I think there are a couple of paths that might be nice to support this use case.

Option A

Teach RubyVM::InstructionSequence to respond to #source which would be similar to what was possible prior to Ruby 3.4.0. Example:

function = proc { Object.new }
RubyVM::InstructionSequence.of(function).source

# "Object.new"

Option B

This is something that Samuel Williams mentioned in Feature 6012 which would be to provide a Source object as answered by Method#source and Proc#source. Example (using a Proc):

# Implementation
# Method#source (i.e. Source.new path, line_number, line_count, body)

# Usage:

function = proc { Object.new }

method.source.code      # "Object.new"
method.source.path      # "$HOME/demo.rb"
method.source.location  # [2, 0, 3, 3]

Option C

It could be nice to support both Option A and B.


Related issues 1 (1 open0 closed)

Related to Ruby master - Feature #21005: Update the source location method to include line start/stop and column start/stop detailsOpenActions

Updated by kddnewton (Kevin Newton) 3 days ago

Proc, Method, and UnboundMethod all respond to #source_location already, so there is prior art here we can lean on. Personally I'd rather see that method expanded to include columns and end line, because that simplifies this whole discussion. Further enhancing RubyVM when it is primarily meant for debugging seems like not a great direction.

If you do have the columns and end line, then it's possible to read the file and extract that source you're talking about. If RubyVM isn't defined, you can use Prism to do a slightly better educated guess. Here's a script that combines both, so that it's portable to other Ruby implementations:

def prism_callable(callable, absolute_path, lineno)
  require "prism"
  root = Prism.parse_file(absolute_path).value

  case callable
  when Method, UnboundMethod
    root.breadth_first_search do |node|
      node.start_line == lineno && node.is_a?(Prism::DefNode) &&
        node.name == callable.name
    end
  when Proc
    root.breadth_first_search do |node|
      node.start_line == lineno && (
        (node.is_a?(Prism::CallNode) && node.name == :proc) ||
          node.is_a?(Prism::LambdaNode)
      )
    end
  end
end

def source_location(callable)
  if defined?(RubyVM::InstructionSequence)
    iseq = RubyVM::InstructionSequence.of(callable)
    [iseq.absolute_path, *iseq.to_a[4][:code_location]]
  else
    absolute_path, lineno = callable.source_location
    found = prism_callable(callable, absolute_path, lineno)
    [absolute_path, found.start_line, found.start_column, found.end_line, found.end_column] if found
  end
end

def source(callable)
  location = source_location(callable)
  return nil unless location

  filepath, start_line, start_column, end_line, end_column = location
  lines = File.read(filepath).lines[(start_line - 1)..(end_line - 1)]

  lines[-1] = lines[-1].byteslice(0...end_column)
  lines[0] = lines[0].byteslice(start_column..-1)
  lines.join
end

class Foo
  def bar; end
end

p source(-> { Object.new })
p source(proc { Object.new })
p source(Foo.new.method(:bar))
p source(Foo.instance_method(:bar))

The Prism part won't work in weird edge cases like defining a method with the same name on the same line, like def foo; end; def foo; end, but those cases should be few and far between. Note that if #source_location were to be enhanced, it wouldn't be necessary to pull in Prism at all here

Updated by bkuhlmann (Brooke Kuhlmann) 3 days ago

Thanks! This clarifies confusion I had between RubyVM::InstructionSequence and Prism.

I'd rather see that method expanded to include columns and end line, because that simplifies this whole discussion.

Agreed. I think this would be most helpful for debugging and dynamic parsing of original source code. Should a different issue be opened for this?

The Prism part won't work in weird edge cases like defining a method with the same name on the same line,

For my situation, I think that's within an acceptable margin of error. 😉

[I]f #source_location were to be enhanced, it wouldn't be necessary to pull in Prism at all here

Nice. So -- to return to my question above -- should I open an issue specifically for improving #source_location to include columns and end lines?

By the way, here's what I see when running your code snippet locally:

# RubyVM::InstructionSequence
# " { Object.new }"
# "{ Object.new }"
# "def bar; end"
# "def bar; end"

# Prism
# "-> { Object.new }"
# "proc { Object.new }"
# "def bar; end"
# "def bar; end"

I prefer the former (RubyVM::InstructionSequence) because the result, especially for procs/lambdas is much easier to clean up than what Prism answers back. Plus, as you stated, this would eliminate the need to require Prism.

There is still one outstanding issue, though, which is dealing with IRB. For instance, in my description above I show how you can use the AST in Ruby 3.3.6 to obtain the source of a function:

function = proc { Object.new }
ast = RubyVM::AbstractSyntaxTree.of function
ast.children.last.source

# "Object.new"

There doesn't appear to be an equivalent way to do this in Ruby 3.4.0. Is there a way to parse in-memory objects like Ruby 3.3.6 had? I find this so useful when experimenting within IRB.

Updated by kddnewton (Kevin Newton) 2 days ago

Should a different issue be opened for this?

Yeah, I would say make that a new issue for expanding source_location or making a new method.

There is still one outstanding issue, though, which is dealing with IRB. For instance, in my description above I show how you can use the AST in Ruby 3.3.6 to obtain the source of a function:

Yeah, for that you're still going to need to rely on RubyVM::InstructionSequence.of().script_lines, because that's not exposed in any other way. It's on the iseq itself, so it would be possible to expose in another API, so it would need to be a feature request.

Just for completeness, for the example code above you would replace File.read(filepath) with RubyVM::InstructionSequence.of(callable).source_lines.join.

Updated by Eregon (Benoit Daloze) 1 day ago

bkuhlmann (Brooke Kuhlmann) wrote in #note-2:

Nice. So -- to return to my question above -- should I open an issue specifically for improving #source_location to include columns and end lines?

I would say a new ticket and a comment on https://bugs.ruby-lang.org/issues/6012 since it seems pretty similar.

This functionality is not CRuby-specific, so it shouldn't be under RubyVM.

Updated by bkuhlmann (Brooke Kuhlmann) 1 day ago

Kevin: Thanks and thanks for reminding of RubyVM::InstructionSequence.of().script_lines. I'm using that as a fallback when the absolute path of the instruction sequence can't be found.

Benoit: Thanks. I've also updated Feature 6012 as well.

Both: Thanks for all of the feedback. I've logged a new feature request here: 21005. Feel free to close this issue.

Actions #6

Updated by Eregon (Benoit Daloze) about 16 hours ago

  • Status changed from Open to Rejected
Actions #7

Updated by Eregon (Benoit Daloze) about 16 hours ago

  • Related to Feature #21005: Update the source location method to include line start/stop and column start/stop details added
Actions

Also available in: Atom PDF

Like1
Like0Like0Like0Like0Like0Like0Like0