Feature #16142
openImplement code_range in Proc and Method
Description
Abstract¶
Add a new method code_range
as an alternative to source_location
to Proc and Method
Background¶
I'd like to get a body from a Proc in TraceLocation gem (https://github.com/yhirano55/trace_location), in order to add what's executed to the output. There's no way to do that in current Ruby implementation, so as an alternative, I considered getting source code location of a Proc.
Proposal¶
I propose that Proc#code_range
and Method#code_range
. Other names can work as well, for example Proc#source_region
. It returns an array containing filename as a first argument and position information as a second array. For example:
a_proc.position # => [(irb), [1, 5, 3, 25]]
Implementation¶
I've implemented a simpler version of this, see gist for more details.
https://gist.github.com/okuramasafumi/ac90bbf04a1c13b7d67954c9c5e62553
Notice I use code_location
from iseq struct.
Discussion¶
One might say that we can simply add columns and end position to Proc#source_location. However, this can easily brake existing apps such as Pry.
It's also possible that we add additional keyword argument to Proc#source_location
, for instance:
a_proc.source_location(including_range: true)
This change can also break existing apps since in old Rubies this keyword argument cannot be accepted.
Therefore, adding a new method is better in terms of backward compatibility. It might be better at readability as well.
Summary¶
I propose an API to get code position of Proc and Method so that we can get body of them (especially of a Proc).
Updated by osyo (manga osyo) about 5 years ago
hi.
I want to use it when I want to get the block source code.
e.g. https://github.com/osyo-manga/gem-binding-debug
https://github.com/osyo-manga/gem-binding-debug/blob/a5e19728893ddb92ec04170fcd8afbdf43db2eab/lib/binding/debug.rb#L104-L107
I think it would be nice to add a new method :)
Also, the value is different between iseq code_location
and Proc#code_location
.
So it ’s better to use a name different from #code_location
.
(e.g. #source_region
?
MEMO
Output example when using https://gist.github.com/okuramasafumi/ac90bbf04a1c13b7d67954c9c5e62553.¶
# in test.rb
expr = proc {
puts "hoge"
puts "foo"
puts "bar"
}
# Return [path, beg_pos.lineno, beg_pos.column, end_pos.lineno, end_pos.column]
p expr.code_location
# => ["./test.rb", 2, 12, 6, 1]
Example using RubyVM::InstructionSequence
# in test.rb
expr = proc {
puts "hoge"
puts "foo"
puts "bar"
}
iseq = RubyVM::InstructionSequence.of(expr)
path = iseq.to_a[6]
code_location = iseq.to_a[4][:code_location]
p [path, *code_location]
# => ["./test.rb", 2, 12, 6, 1]
Updated by Eregon (Benoit Daloze) about 5 years ago
Just a note: "code range" is an implementation-level concept for Strings. For instance there is CR_7BIT which means all characters are < 128.
So I would recommend another method name than code_range
, as I think that will be confusing.
IMHO the name should be very similar to source_location
since it is highly related.
I think we should just extend source_location
and fix the few existing usages that might be problematic.
Detecting whether the method takes keywords should be easy either by trying and rescuing ArgumentError, or using UnboundMethod#arity.
Updated by ioquatix (Samuel Williams) about 5 years ago
I discussed this previously but I don't know if there is an issue for it.
I believe that returning an array of strings is kind of a bad design.
We should prefer higher level abstraction IMHO, e.g.
proc.source_location
-> SourceLocation.new(path, byte_offset, line_offset, character_offset)
or something similar. Then, expose some convenient methods like
source_location.read
-> String
of source code.
Updated by Eregon (Benoit Daloze) about 5 years ago
Maybe we could change #source_location to return a SourceLocation
and have SourceLocation#to_a
so file, line = proc.source_location
would still work?
Updated by ioquatix (Samuel Williams) about 5 years ago
Maybe we could change #source_location to return a
SourceLocation
and haveSourceLocation#to_a
sofile, line = proc.source_location
would still work?
I've spent some time thinking about this.
I wanted to understand why we have this proposal.
- Because
source_location
returns an array, so it's hard to extend with additional functionality. - So, we introduce new method
code_range
which also returns an array. But this time, the array is different.
This proposal is repeating the same mistake with a new method name. The mistake was to use an Array rather than some rich object. Because it's an array, we cannot add meaningful operations like #filename
, #source
and other things which would be very useful.
So, given this, I'm against adding a new method, and I'd rather try to fix the existing one.
I thought about how to do this. I looked at some existing code which uses source_location
.
I feel like we can do something like the following, to maintain backwards compatibility, while supporting the proposed use case, and allowing further extensions in the future.
class SourceLocation
def initialize(path, line_number, source_range: nil)
@path = path
@line_number = line_number
@source_range = source_range
end
attr :path
attr :line_number
def to_ary
[@path, @line_number]
end
def [] index
case index
when 0
@path
when 1
@line_number
end
end
def read
File.open(@path) do |file|
file.seek(@source_range.min)
return file.read(@source_range.size)
end
end
end
module MethodSourceLocation
def source_location
SourceLocation.new(*super, source_range: 667...(667+30))
end
end
Method.prepend(MethodSourceLocation)
class Test
def foo
return "bar"
end
end
test = Test.new
method = test.method(:foo)
source = method.source_location.read
puts source
# def foo
# return "bar"
# end
In the case that methods are defined dynamically, #read
might work differently, e.g. turn the AST back into source code, or refer to the code which dynamically generated it.
In some cases SourceLocation
might refer to Ruby code, and in other cases, C, or Java. We might want to expose #language
(for syntax highlighting) and support reading from source code of interpreter. For tools like pry
, this is VERY useful.
e.g.
samuel@Fukurou ioquatix % pry
[1] pry(main)> show-source p
From: io.c (C Method):
Owner: Kernel
Visibility: private
Number of lines: 9
static VALUE
rb_f_p(int argc, VALUE *argv, VALUE self)
{
struct rb_f_p_arg arg;
arg.argc = argc;
arg.argv = argv;
return rb_uninterruptible(rb_f_p_internal, (VALUE)&arg);
}
Some other things maybe worth considering:
- expose RDoc comments?
SourceLocation#read(comments: true)
orSourceLocation#comments
- get the indentation level of the first line e.g.
SourceLocation#indentation_level
orSourceLocation#read(lines: true)
orSourceLocation#first_line
(including whitespace). It's useful for nicely printing the source code with the same formatting/indentation as the source file. - expose the AST (VM specific)
SourceLocation#ast
.
In some situations, we wanted to get the source of a method, generate the AST, rewrite it with instrumentation, and compile it back into the interpreter.
Updated by matz (Yukihiro Matsumoto) about 5 years ago
Note: this is not a final decision:
- Having a method to retrieve the endpoint of a method/proc is OK for me.
- The name of the method should not be
code_range
, because the name is confusing. - Making the return value as a specific class seems overkill.
- What do we need from the method? line numbers? offset? whatever?
Matz.
Updated by okuramasafumi (Masafumi OKURA) 12 months ago
Hi I noticed we didn't get any conclusion on this topic.
I agree that method name should be something different. It follows the Matz's question: "What do we need from the method? line numbers? offset? whatever?
"
https://docs.ruby-lang.org/en/master/IO.html#method-c-read
And we have IO.read
method that accepts length
and offset
parameter. Since it's expected to be used with IO.read
, it is convenient to return these two values.
And Matz says
Making the return value as a specific class seems overkill.
And I agree. So the return value could be either Array or Hash. I think Hash is better than Array in this case because the value such as [19, 10]
doesn't make sense itself.
So what would be a good name? The combination of length and offset doesn't have a specific name as far as I know. What about location
if Proc
has this method?
The final version would look like this.
my_proc = proc do
do_something
end
loc = my_proc.location
p loc # => {length:10, offset:19} maybe wrong number
p File.read(__FILE__, loc[:length], loc[:offset]) # => do_something
Updated by byroot (Jean Boussier) 12 months ago
So what would be a good name?
What about position
? I think it goes well together with location
.