Project

General

Profile

Actions

Misc #20260

closed

ISEQ flag for prism compiler

Added by kddnewton (Kevin Newton) 9 months ago. Updated 9 months ago.

Status:
Closed
Assignee:
-
[ruby-core:116681]

Description

In order to support error highlight, there's needs to be a way to tell which compiler generated an instruction sequence (compile.c or prism_compile.c). That's because when the file is reparsed to find the node based on the node id, we need to know which parser to use. We can't look at the current parser because it may have been dumped to binary and the parser/compile pair might not match up to the current running process.

I would like to add a flag to rb_iseq_constant_body that indicates it came from prism, as well as the :prism hash key to the misc hash in the array form of iseqs. This will allow us to determine the source of the iseq from both C and Ruby.

Since this is user-facing (albeit from an experimental library that shouldn't be relied upon) I wanted to make sure this way okay before merging.

Updated by ko1 (Koichi Sasada) 9 months ago

I'm okay to change the internal data structure because we can change them in future.

How to change the user facing interface? Changing only to_a format?
For example, errror_highlight needs to know the code which comes from.

Endoh-san propose me to raise an error if RubyVM::AST.of() is called for ISeqs from prism.

Updated by kddnewton (Kevin Newton) 9 months ago

Yes, the user-facing part for iseq is only changing RubyVM::InstructionSequence.to_a to include prism: true/false.

For error highlight to know, we will either need to change RubyVM::AST.of to know that it is coming from prism or we will need to add something to Thread::Backtrace::Location like prism?.

This code here: https://github.com/ruby/error_highlight/blob/80ede6b8ca3219310f30cff42cd17177a7e47f25/lib/error_highlight/base.rb#L55C7-L59 does:

return nil unless Thread::Backtrace::Location === loc

node = RubyVM::AbstractSyntaxTree.of(loc, keep_script_lines: true)

Spotter.new(node, **opts).spot

That would need to be either:

return nil unless Thread::Backtrace::Location === loc

if loc.prism?
  node = Prism.node_for(loc)
  PrismSpotter.new(node, **opts).spot
else
  node = RubyVM::AbstractSyntaxTree.of(loc, keep_script_lines: true)
  Spotter.new(node, **opts).spot
end

or it would need to be:

return nil unless Thread::Backtrace::Location === loc

node = RubyVM::AbstractSyntaxTree.of(loc, keep_script_lines: true)

if node.is_a?(Prism::Node)
  PrismSpotter.new(node, **opts).spot
else
  Spotter.new(node, **opts).spot
end

or maybe some other kind of API change. Either way we would need to access the prism flag that is on the iseq associated with the backtrace location. I'm fine with any kind of API that gets us there.

Updated by kddnewton (Kevin Newton) 9 months ago

Sorry to respond again, but in thinking about it, I think Prism.node_for might not be right. Maybe it should be RubyVM::PrismSyntaxTree.of, or something else under RubyVM because it is specific to CRuby.

Updated by kddnewton (Kevin Newton) 9 months ago

Ahh sorry again, but actually if we introduce Thread::Backtrace::Location#iseq, all of this can be done in Ruby, because we could make it:

return nil unless Thread::Backtrace::Location === loc

if loc.iseq.to_a[4][:prism]
  # reparse with prism
else
  node = RubyVM::AbstractSyntaxTree.of(loc, keep_script_lines: true)
  Spotter.new(node, **opts).spot
end

Updated by mame (Yusuke Endoh) 9 months ago

How about the following API?

  • Make RubyVM::AbstractSyntaxTree.of raise a RuntimeError for objects with iseq complied from Prism.
  • Introduce Thread::Backtrace::Location#node_id, which returns node_id at the current instruction, regardless of compiled from parse.y or Prism.
begin
  node = RubyVM::AbstractSyntaxTree.of(loc, keep_script_lines: true)
  Spotter.new(node, **opt).spot
rescue RuntimeError
  if $!.message =~ /prism/
    node = Prism.parse(loc.path).node_for(loc.node_id)
    PrismSpotter.new(node, **opt).spot
  else
    raise
  end
end

The rationale of this API design is as follows.

In principle, we don't want to introduce a temporal method outside of RubyVM that will be useless in the near future. Location#prism? will become useless when Prism becomes Ruby's parser successfully. Ruby VM::InstructionSequence.complied_by_prism?(loc) is a possible option. However, we need to make RubyVM::AbstractSyntaxTree.of raise an exception for the Prism-derived iseq anyway. If so, you can use it to determine if the iseq is compiled from Prism without having any extra methods.

Location#iseq is not good because it returns CRuby's internal data structure. Location#node_id is not best too, but it seems acceptable and necessary to tell Prism the node_id.

Updated by mame (Yusuke Endoh) 9 months ago

BTW, I prefer to have ErrorHighlight::Spotter.new accept both RubyVM::AST::Node and Prism::Node instead of adding ErrorHighlight::PrismSpotter. (I guess you showed it just for clarity.)

Updated by kddnewton (Kevin Newton) 9 months ago

Okay! That makes sense, thank you for the discussion. I'm also fine making ErrorHighlight::Spotter handle both, as you say it was just for illustrative purposes. I'll update this ticket when I have updated RubyVM::AST.of and Thread::Backtrace::Location.node_id in the PR.

Updated by Eregon (Benoit Daloze) 9 months ago

It would be great if ErrorHighlight can work on other Ruby implementations too, which means not relying on RubyVM.

Thread::Backtrace::Location#node_id might not be ideal because other Ruby implementations AFAIK don't keep a node_id (for footprint reasons).

TruffleRuby OTOH keeps the byte offset and byte length of each node, so this could be used to identify a given call node.
That also seems similar to the 2nd paragraph in https://github.com/ruby/prism/issues/2401.

I think ErrorHighlight could even use the byte offset + byte length to reparse just that part of the file and find where the ./[/etc is.
Then the generic implementation-agnostic API could be Thread::Backtrace::Location#{byte_offset,byte_length} (or {first_column,first_lineno,last_column,last_lineno} but it seems less efficient/convenient).
Or even Thread::Backtrace::Location#code which would return a String containing the source code of that call node.

@mame (Yusuke Endoh) WDYT, would one of these be enough for ErrorHighlight? If so it sounds ideal because it avoids needing implementation details like node_id, iseq, etc.

Updated by kddnewton (Kevin Newton) 9 months ago

@mame (Yusuke Endoh) I've made these changes here: https://github.com/ruby/ruby/pull/9934. Could you review? Thanks!

Updated by mame (Yusuke Endoh) 9 months ago

@Eregon (Benoit Daloze) I don't think it is a good idea to try to guess a node from a code range. To identify a unique node, the Prism AST would have to avoid creating multiple nodes from the exact same code range. It may be possible to carefully design the AST that way, but I am afraid if it will prevent from future syntax extensions.

Though I don't think node_id is super cool, it will no longer be an implementation detail, assuming the Prism AST. Calculating byte_offset + byte_length from node_id is trivial, and the reverse is not.

By the way, is there any advantage of the representation of byte_offset and byte_length? It is larger in size than (or at best equal to) node_id. To display to the user, we need to convert it to lineno + column. It may be useful to blindly extract the corresponding code fragment from a file, but I think such a use case is rare.

Updated by Eregon (Benoit Daloze) 9 months ago

mame (Yusuke Endoh) wrote in #note-11:

@Eregon (Benoit Daloze) I don't think it is a good idea to try to guess a node from a code range. To identify a unique node, the Prism AST would have to avoid creating multiple nodes from the exact same code range. It may be possible to carefully design the AST that way, but I am afraid if it will prevent from future syntax extensions.

I think for call nodes it would identify uniquely, but it may not be true for all nodes.

By the way, is there any advantage of the representation of byte_offset and byte_length? It is larger in size than (or at best equal to) node_id. To display to the user, we need to convert it to lineno + column. It may be useful to blindly extract the corresponding code fragment from a file, but I think such a use case is rare.

It's basically byte_offset and byte_length vs line and node_id.
If all 4 are the same integer size then it's the same in memory footprint, but from byte_offset and byte_length we also have precise column information readily available without parsing again (although that means keeping the source code in memory or rereading the file, could be avoided if a file is only US-ASCII characters then only line offsets is enough), which is nice for tools, debugger, etc.

Updated by kddnewton (Kevin Newton) 9 months ago

@matz (Yukihiro Matsumoto) The work is done to add support for the flag on the ISEQ, but I think we additionally need your approval on adding Thread::Backtrace::Location#node_id.

Updated by kddnewton (Kevin Newton) 9 months ago

  • Status changed from Open to Closed

@mame (Yusuke Endoh) remembered that we have RubyVM::AbstractSyntaxTree.node_id_for_backtrace_location(loc), so we don't actually need Thread::Backtrace::Location#node_id. In this case we've only implemented the flag for the body.

Actions

Also available in: Atom PDF

Like0
Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0