Project

General

Profile

Actions

Feature #16142

open

Implement code_range in Proc and Method

Added by okuramasafumi (Masafumi OKURA) about 5 years ago. Updated 10 months ago.

Status:
Open
Assignee:
-
Target version:
-
[ruby-dev:50845]

Description

Abstract

Add a new method code_range as an alternative to source_location to Proc and Method

Background

I'd like to get a body from a Proc in TraceLocation gem (https://github.com/yhirano55/trace_location), in order to add what's executed to the output. There's no way to do that in current Ruby implementation, so as an alternative, I considered getting source code location of a Proc.

Proposal

I propose that Proc#code_range and Method#code_range. Other names can work as well, for example Proc#source_region. It returns an array containing filename as a first argument and position information as a second array. For example:
a_proc.position # => [(irb), [1, 5, 3, 25]]

Implementation

I've implemented a simpler version of this, see gist for more details.

https://gist.github.com/okuramasafumi/ac90bbf04a1c13b7d67954c9c5e62553

Notice I use code_location from iseq struct.

Discussion

One might say that we can simply add columns and end position to Proc#source_location. However, this can easily brake existing apps such as Pry.
It's also possible that we add additional keyword argument to Proc#source_location, for instance:
a_proc.source_location(including_range: true)
This change can also break existing apps since in old Rubies this keyword argument cannot be accepted.
Therefore, adding a new method is better in terms of backward compatibility. It might be better at readability as well.

Summary

I propose an API to get code position of Proc and Method so that we can get body of them (especially of a Proc).

Updated by osyo (manga osyo) about 5 years ago

hi.
I want to use it when I want to get the block source code.
e.g. https://github.com/osyo-manga/gem-binding-debug
https://github.com/osyo-manga/gem-binding-debug/blob/a5e19728893ddb92ec04170fcd8afbdf43db2eab/lib/binding/debug.rb#L104-L107

I think it would be nice to add a new method :)
Also, the value is different between iseq code_location and Proc#code_location .
So it ’s better to use a name different from #code_location .
(e.g. #source_region ?

MEMO

Output example when using https://gist.github.com/okuramasafumi/ac90bbf04a1c13b7d67954c9c5e62553.

# in test.rb
expr = proc {
  puts "hoge"
  puts "foo"
  puts "bar"
}

# Return [path, beg_pos.lineno, beg_pos.column, end_pos.lineno, end_pos.column]
p expr.code_location
# => ["./test.rb", 2, 12, 6, 1]

Example using RubyVM::InstructionSequence

# in test.rb
expr = proc {
  puts "hoge"
  puts "foo"
  puts "bar"
}

iseq = RubyVM::InstructionSequence.of(expr)
path = iseq.to_a[6]
code_location = iseq.to_a[4][:code_location]

p [path, *code_location]
# => ["./test.rb", 2, 12, 6, 1]

Updated by Eregon (Benoit Daloze) about 5 years ago

Just a note: "code range" is an implementation-level concept for Strings. For instance there is CR_7BIT which means all characters are < 128.
So I would recommend another method name than code_range, as I think that will be confusing.

IMHO the name should be very similar to source_location since it is highly related.

I think we should just extend source_location and fix the few existing usages that might be problematic.
Detecting whether the method takes keywords should be easy either by trying and rescuing ArgumentError, or using UnboundMethod#arity.

Updated by ioquatix (Samuel Williams) almost 5 years ago

I discussed this previously but I don't know if there is an issue for it.

I believe that returning an array of strings is kind of a bad design.

We should prefer higher level abstraction IMHO, e.g.

proc.source_location -> SourceLocation.new(path, byte_offset, line_offset, character_offset) or something similar. Then, expose some convenient methods like

source_location.read -> String of source code.

Updated by Eregon (Benoit Daloze) almost 5 years ago

Maybe we could change #source_location to return a SourceLocation and have SourceLocation#to_a so file, line = proc.source_location would still work?

Updated by ioquatix (Samuel Williams) almost 5 years ago

Maybe we could change #source_location to return a SourceLocation and have SourceLocation#to_a so file, line = proc.source_location would still work?

I've spent some time thinking about this.

I wanted to understand why we have this proposal.

  • Because source_location returns an array, so it's hard to extend with additional functionality.
  • So, we introduce new method code_range which also returns an array. But this time, the array is different.

This proposal is repeating the same mistake with a new method name. The mistake was to use an Array rather than some rich object. Because it's an array, we cannot add meaningful operations like #filename, #source and other things which would be very useful.

So, given this, I'm against adding a new method, and I'd rather try to fix the existing one.

I thought about how to do this. I looked at some existing code which uses source_location.

I feel like we can do something like the following, to maintain backwards compatibility, while supporting the proposed use case, and allowing further extensions in the future.

class SourceLocation
  def initialize(path, line_number, source_range: nil)
    @path = path
    @line_number = line_number
    @source_range = source_range
  end

  attr :path
  attr :line_number
  
  def to_ary
    [@path, @line_number]
  end
  
  def [] index
    case index
    when 0
      @path
    when 1
      @line_number
    end
  end
  
  def read
    File.open(@path) do |file|
      file.seek(@source_range.min)
      
      return file.read(@source_range.size)
    end
  end
end

module MethodSourceLocation
  def source_location
    SourceLocation.new(*super, source_range: 667...(667+30))
  end
end

Method.prepend(MethodSourceLocation)

class Test
  def foo
    return "bar"
  end
end

test = Test.new
method = test.method(:foo)

source = method.source_location.read

puts source
# def foo
#     return "bar"
#   end

In the case that methods are defined dynamically, #read might work differently, e.g. turn the AST back into source code, or refer to the code which dynamically generated it.

In some cases SourceLocation might refer to Ruby code, and in other cases, C, or Java. We might want to expose #language (for syntax highlighting) and support reading from source code of interpreter. For tools like pry, this is VERY useful.

e.g.

samuel@Fukurou ioquatix % pry
[1] pry(main)> show-source p

From: io.c (C Method):
Owner: Kernel
Visibility: private
Number of lines: 9

static VALUE
rb_f_p(int argc, VALUE *argv, VALUE self)
{
    struct rb_f_p_arg arg;
    arg.argc = argc;
    arg.argv = argv;

    return rb_uninterruptible(rb_f_p_internal, (VALUE)&arg);
}

Some other things maybe worth considering:

  • expose RDoc comments? SourceLocation#read(comments: true) or SourceLocation#comments
  • get the indentation level of the first line e.g. SourceLocation#indentation_level or SourceLocation#read(lines: true) or SourceLocation#first_line (including whitespace). It's useful for nicely printing the source code with the same formatting/indentation as the source file.
  • expose the AST (VM specific) SourceLocation#ast.

In some situations, we wanted to get the source of a method, generate the AST, rewrite it with instrumentation, and compile it back into the interpreter.

Updated by matz (Yukihiro Matsumoto) almost 5 years ago

Note: this is not a final decision:

  • Having a method to retrieve the endpoint of a method/proc is OK for me.
  • The name of the method should not be code_range, because the name is confusing.
  • Making the return value as a specific class seems overkill.
  • What do we need from the method? line numbers? offset? whatever?

Matz.

Updated by okuramasafumi (Masafumi OKURA) 10 months ago

Hi I noticed we didn't get any conclusion on this topic.
I agree that method name should be something different. It follows the Matz's question: "What do we need from the method? line numbers? offset? whatever?
"

https://docs.ruby-lang.org/en/master/IO.html#method-c-read
And we have IO.read method that accepts length and offset parameter. Since it's expected to be used with IO.read, it is convenient to return these two values.

And Matz says

Making the return value as a specific class seems overkill.

And I agree. So the return value could be either Array or Hash. I think Hash is better than Array in this case because the value such as [19, 10] doesn't make sense itself.

So what would be a good name? The combination of length and offset doesn't have a specific name as far as I know. What about location if Proc has this method?
The final version would look like this.

my_proc = proc do
  do_something
end
loc = my_proc.location
p loc # => {length:10, offset:19} maybe wrong number

p File.read(__FILE__, loc[:length], loc[:offset]) # => do_something

Updated by byroot (Jean Boussier) 10 months ago

So what would be a good name?

What about position? I think it goes well together with location.

Actions

Also available in: Atom PDF

Like0
Like0Like0Like0Like0Like0Like0Like0Like0