Project

General

Profile

Actions

Feature #17930

closed

Add column information into error backtrace

Added by mame (Yusuke Endoh) over 3 years ago. Updated over 3 years ago.

Status:
Closed
Target version:
-
[ruby-core:104109]

Description

Consider the following code and error.

data["data"].first["field"] #=> undefined method `[]` for nil:NilClass

There are two possibilities; the variable data is nil, or the return value of first is nil. Unfortunately, the error message is less informative to say which.

This proposal allows to help identifying which method call failed.

$ ruby -r ./sample/no_method_error_ext.rb err1.rb
err1.rb:2:in `<main>': undefined method `[]' for nil:NilClass (NoMethodError)

data["data"].first["field"]
                  ^^^^^^^^^

Proposal

I'd like to propose a feature to get column information from each Thread::BacktraceLocation. Maybe it is good to provide the following four methods:

  • Thread::BacktraceLocation#first_lineno
  • Thread::BacktraceLocation#first_column
  • Thread::BacktraceLocation#last_lineno
  • Thread::BacktraceLocation#last_column

These names came from RubyVM::AbstraceSyntaxTree::Node's methods.

Implementation

Here is a proof-of-concept implementation: https://github.com/ruby/ruby/pull/4540

See https://github.com/ruby/ruby/pull/4540/commits/6ff516f4985826e9f9c5606638001c3c420f7cad for an example usage.
(Note that, currently, you need to build ruby with ./configure cflags=-DEXPERIMENTAL_ISEQ_NODE_ID to enable the feature.)

To put it simply, this PR provides only a raw API, Thread::BacktraceLocation#node_id. To get actual column information, you need to manually identify RubyVM::AbstractSyntaxTree::Node that corresponds to Thread::BacktraceLocation#node_id.
But it would be arguable to expose "node_id", so I will wrap it as the above four methods if this is accepted.

Credit: the original implementation was done by @yui-knk (Kaneko Yuichiro).

Drawback

To use this feature, we need to enable -DEXPERIMENTAL_ISEQ_NODE_ID to add "node_id" information (a subtree ID of the original abstract syntax tree) into each byte code instruction. If we provide this feature, the option should be enabled by default. However, the option increases memory consumption.

I performed a simple experiment: I created a scaffold app by rails new, and measured the memory usage after rails s. The result was 97 MB without -DEXPERIMENTAL_ISEQ_NODE_ID, and 100 MB with the option enabled.

In my opinion, it is not so large, but requiring more gems will increase the difference. I will appriciate it if anyone could provide the actual memory increase in a more practical Rails app.

Do you think this feature deserves the memory increase?


Files

image.png (73.3 KB) image.png ttanimichi (Tsukuru Tanimichi), 05/31/2021 12:22 PM

Related issues 4 (2 open2 closed)

Related to Ruby master - Feature #18231: `RubyVM.keep_script_lines`Closedko1 (Koichi Sasada)Actions
Related to Ruby master - Feature #8751: Add offsets to method#source_locationOpenActions
Related to Ruby master - Feature #6012: Proc#source_location also return the columnAssignednobu (Nobuyoshi Nakada)Actions
Is duplicate of Ruby master - Feature #10982: Clarify location of NoMethod errorClosedActions

Updated by ttanimichi (Tsukuru Tanimichi) over 3 years ago

Hi, I like this idea!
This feature would be quite useful when you get NoMethodError in the production environment and error tracking tools like Sentry inform you of that error.
Consider the example I've attached to this comment. This method chain calls Array#[] two times in a line, and you can’t tell which receiver of .[] method calls was nil. To investigate this incident, you need to SSH into the production environment and check what the actual value is. It would be quite inconvenient.

NoMethodError

Updated by matz (Yukihiro Matsumoto) over 3 years ago

I agree with the idea too. Let's see how it works.

Matz.

Updated by mame (Yusuke Endoh) over 3 years ago

  • Status changed from Open to Assigned
  • Assignee set to mame (Yusuke Endoh)

Thank you @matz (Yukihiro Matsumoto) .

I talked with @ko1 (Koichi Sasada), and we agreed that allowing RubyVM::AbstractSyntaxTree.of(Thread::Backtrace::Location) is a good approach to allow this feature.

def target
  RubyVM::AbstractSyntaxTree.of(caller_locations[0])
end

p target #=> #<RubyVM::AbstractSyntaxTree::Node:VCALL@5:2-5:8>
# ^^^^^^ Line 5, Column 2--8

I've created a new PR. https://github.com/ruby/ruby/pull/4558

By using this feature, we can implement an error message with column information inside of did_you_mean gem. I'll create another PR later.

Updated by Eregon (Benoit Daloze) over 3 years ago

I talked with ko1 (Koichi Sasada), and we agreed that allowing RubyVM::AbstractSyntaxTree.of(Thread::Backtrace::Location) is a good approach to allow this feature.

I strongly disagree, RubyVM is a CRuby-specific API.
This effectively prevents other Ruby implementations to support this new feature for no good reason.

Thread::Backtrace::Location methods are much better and make more sense, and they don't mix unrelated things.

So +1 for this feature, but it MUST be methods on shared public API (on Thread::Backtrace::Location), not RubyVM-specific.

Besides, RubyVM::AbstractSyntaxTree.of(Thread::Backtrace::Location) seems inefficient and brittle as it likely involves parsing, requires the file to exist on the filesystem, etc.
There is no point to use RubyVM::AbstractSyntaxTree when all the information is already in the Thread::Backtrace::Location.

Updated by mame (Yusuke Endoh) over 3 years ago

I've created a PR for did_you_mean: https://github.com/ruby/did_you_mean/pull/151

Eregon (Benoit Daloze) wrote in #note-4:

There is no point to use RubyVM::AbstractSyntaxTree when all the information is already in the Thread::Backtrace::Location.

In fact there is no enough information because Thread::Backtrace::Location only knows path and node_id. @ko1 (Koichi Sasada) doesn't want to use more memory for this feature and I agree with him. I'm not a big fan of the mechanism of AST.of, but it is a completely different issue from this ticket.

My first proposal was four separate methods, but they require keeping the result of AST.of for each Thread::Backtrace::Location, which requires additional memory.

We may introduce Thread::Backtrace::Location#code_location or something which returns [first_lineno, first_column, last_lineno, last_column]. However, to show the code line clearly, we may need to know the node type (i.e., a method call or something), so I'm not 100% sure if four integers are enough for the use case.

Updated by mame (Yusuke Endoh) over 3 years ago

Ahh, I've tried but I found that the current prototype draws an underline from "the end column of the receiver of the call" to "the end column of the call", so four integers are clearly not enough. Hardcoding such a location looks very ad-hoc, so AST::Node is only a reasonable way as far as I think of.

Updated by Eregon (Benoit Daloze) over 3 years ago

mame (Yusuke Endoh) wrote in #note-5:

In fact there is no enough information because Thread::Backtrace::Location only knows path and node_id.

This seems an implementation detail, not something that should decide what the API is.

For instance, on TruffleRuby, Thread::Backtrace::Location knows column information (which is stored on nodes), and is represented as a backtrace + index (https://github.com/oracle/truffleruby/blob/master/src/main/java/org/truffleruby/core/thread/RubyBacktraceLocation.java).
There is no need to re-parse anything, and requiring reparsing for this feature seems really suboptimal (slow) and prone to issues if e.g., the file no longer exists.

Wouldn't a byte offset + byte length be almost as compact as node_id, but without needing to keep the node or to reparse?

@ko1 (Koichi Sasada) doesn't want to use more memory for this feature and I agree with him. I'm not a big fan of the mechanism of AST.of, but it is a completely different issue from this ticket.

My first proposal was four separate methods, but they require keeping the result of AST.of for each Thread::Backtrace::Location, which requires additional memory.

The footprint of a Thread::Backtrace::Location does not seem a big issue, those objects are typically GC'd pretty quickly.
Why would they require to keep the result of AST.of?
Can the column information be kept from the parser in the first place?

Ahh, I've tried but I found that the current prototype draws an underline from "the end column of the receiver of the call" to "the end column of the call", so four integers are clearly not enough. Hardcoding such a location looks very ad-hoc, so AST::Node is only a reasonable way as far as I think of.

Could you show the output of that? I'm not sure I follow the decription.

I think whatever the API is we should also design something for did_you_mean that can work on other Ruby implementations.

As far as I see, RubyVM::AbstractSyntaxTree exposes MRI internals (e.g., node types, node field names, internal node field order) and seems difficult to support on other Ruby implementations (it reaches deep in MRI internals).
did_you_mean is a public gem, so I think it should avoid using such a internal, unstable (the AST will evolve and differs per Ruby implementation), and MRI-specific API.
For instance, nd_recv = nd_call.children[0] feels hacky (should be call_node.receiver or call_node[:receiver]), and hardcoding node types feel brittle (e.g., what if there is a new call node type?)

It would be great to turn RubyVM::AbstractSyntaxTree into a proper API which can be supported on other Ruby implementations, and then it could be used by many other gems.
I think that would require to have proper names and more thinking towards making it more future proof.

Updated by ioquatix (Samuel Williams) over 3 years ago

I agree we should introduce an abstract rich model for this.

https://bugs.ruby-lang.org/issues/13383 is a similar problem.

It makes sense to me to introduce some kind of Backtrace::Location#source_location.

We should stop thinking about arrays of integers and think about a rich interface, even so far as having read method that returns the code in question (reading from disk doesn't always work after the fact).

It makes a lot more sense to me and I think easier for people to consume with a well defined interface.

Updated by mame (Yusuke Endoh) over 3 years ago

did_you_mean is a public gem, so I think it should avoid using such a internal, unstable (the AST will evolve and differs per Ruby implementation), and MRI-specific API.

That makes sense.

First I tried to implement the feature in prelude.rb. However, the approach made MJIT tests fail because it changes JIT invocation count that is checked by the tests. So @ko1 (Koichi Sasada) suggested to implement the feature in a builtin gem required by default like did_you_mean. However, indeed it has a problem you stated.

I'll try to implement it again in prelude.rb, or think of creating a private, MRI-specific default gem.

Updated by Eregon (Benoit Daloze) over 3 years ago

I'll try to implement it again in prelude.rb, or think of creating a private, MRI-specific default gem.

Why not exposing column information on Thread::Backtrace::Location?
That would be a proper public API.
Given that Thread::Backtrace::Location already exposes code locations (path and line currently), it feels natural that it can provide column information too.

Is the problem that does not show the proper place for the call and underlines the receiver too?

This feature should really be available on all Rubies, not just CRuby, so I think we need to expose proper APIs to make it work.

Another possibility might be for NoMethodError#message to already include the underlining (and be multiple lines then),
or to add a method on NoMethodError that would returns the underlying String or column information.
Of course that would be less general and be useful for fewer use cases.

Updated by Eregon (Benoit Daloze) over 3 years ago

Another thought: for the case of NoMethodError, maybe showing the arguments would be enough to differentiate in most cases which of the two calls with the same name was problematic?
NoMethodError#args already exists, so it would be the matter of showing it in the exception message.

Updated by yuki24 (Yuki Nishijima) over 3 years ago

Another thought: for the case of NoMethodError, maybe showing the arguments would be enough to differentiate in most cases which of the two calls with the same name was problematic?
NoMethodError#args already exists, so it would be the matter of showing it in the exception message.

I don't think it is ideal. If you chain #map in one line and got a nil by accident during development then checking the number of args may not be helpful.

Just FYI here is the same request from over 6 years ago: https://bugs.ruby-lang.org/issues/10982

Actions #13

Updated by mame (Yusuke Endoh) over 3 years ago

  • Is duplicate of Feature #10982: Clarify location of NoMethod error added

Updated by mame (Yusuke Endoh) over 3 years ago

If needed, we can implement Thread::Backtrace::Location#code_location by using AST.of.

class Thread::Backtrace::Location
  def code_location
    node = RubyVM::AbstractSyntaxTree.of(self)
    [
      node.first_lineno,
      node.first_column,
      node.last_lineno,
      node.last_column,
    ]
  end
end

If @Eregon (Benoit Daloze) wants this, we may introduce this as a built-in method. But as @yuki24 (Yuki Nishijima) said, I believe this is less useful because it is too rough for the underline feature.

Updated by Eregon (Benoit Daloze) over 3 years ago

mame (Yusuke Endoh) wrote in #note-14:

If needed, we can implement Thread::Backtrace::Location#code_location by using AST.of.
If @Eregon (Benoit Daloze) wants this, we may introduce this as a built-in method.

I think it is good to add, as basically Thread::Backtrace::Location-backtraces (caller_locations, backtrace_locations) then get column information, which might be nice in debugger, user-formatted backtraces (e.g. in HTML one could highlight that region of code), etc.

But as @yuki24 (Yuki Nishijima) said, I believe this is less useful because it is too rough for the underline feature.

To clarify, you mean it would show:


data["data"].first["field"].bar
^^^^^^^^^^^^^^^^^^^^^^^^^^^
json.first.fisrt.bar
^^^^^^^^^^^^^^^^

instead of:

data["data"].first["field"].bar
                  ^^^^^^^^^
json.first.fisrt.bar
          ^^^^^^

That seems enough to know which one it was, but it is indeed less clear.

@mame (Yusuke Endoh) How do you think about something like NoMethodError#code_location (or some other name) that would return the lines&cols for that second region?
I think that would be possible when we have the call node or bytecode's node_id, and then can find the receiver node's last column (& line) from there.
It might not always be available, e.g., from public_send or from rb_funcall(), but that seems unavoidable anyway.

Updated by mame (Yusuke Endoh) over 3 years ago

I brought this ticket to the agenda of today's dev-meeting, and confirmed that matz has already accepted this.

The accepted implementation approach is (1) to extend RubyVM::AST.of to accept Thread::Backtrace::Location to get the node and column information and (2) to create a new built-in MRI-specific gem (rather than did_you_mean gem) to extend Exception#message.

I'm prototyping the new gem, but I found that it is not trivial where to underline depending on context. So I will have to adjust the underline location, according to bug reports. After the specification is almost fixed, the implementation may be exposed as NameError#code_location or something in future, but at first, I would like to start this as a built-in feature.

Updated by mame (Yusuke Endoh) over 3 years ago

I've created a PR to add a new default gem to extend the error message of NameError: https://github.com/ruby/ruby/pull/4586

$ ./local/bin/ruby -e '1.time {}'
-e:1:in `<main>': undefined method `time' for 1:Integer (NoMethodError)

1.time {}
 ^^^^^
Did you mean?  times

Unfortunately it caused some test failures in minitest's test which exactly checks an error message string. I've created a PR to minitest (https://github.com/seattlerb/minitest/pull/880). After it is merged, I'll merge my PR.


BTW, I'd like to hear opinions from you, especially English native people, about the feature name. I tentatively named it "error_squiggle". Squiggle means a wavy underline used to indicate an error in code. But @pocke (Masataka Kuwabara) said that the feature name should not depend on its output appearance because we may change how to display in future.

The name is not so important because a casual user will not refer the name directly, but we need to decide it anyway. Do you have any idea about the name?

Thanks,

Updated by ioquatix (Samuel Williams) over 3 years ago

Looks interesting. Can you hide AST implementation inside Thread::Backtrace::Location and expose some meta data like column start/offset?

Updated by duerst (Martin Dürst) over 3 years ago

mame (Yusuke Endoh) wrote in #note-17:

BTW, I'd like to hear opinions from you, especially English native people, about the feature name. I tentatively named it "error_squiggle". Squiggle means a wavy underline used to indicate an error in code. But @pocke (Masataka Kuwabara) said that the feature name should not depend on its output appearance because we may change how to display in future.

The name is not so important because a casual user will not refer the name directly, but we need to decide it anyway. Do you have any idea about the name?

I think error_highlight might be best. error_emphasize is a verb, which is often best for a method name, but not necessarily here. Maybe error_emphasis could work. In a strict sense, 'highlight' is also presentation-specific (and also a verb besides being a noun), but in its wider sense, it works very well.

Updated by sawa (Tsuyoshi Sawada) over 3 years ago

mame (Yusuke Endoh) wrote in #note-17:

BTW, I'd like to hear opinions from you, especially English native people, about the feature name. I tentatively named it "error_squiggle". Squiggle means a wavy underline used to indicate an error in code. But @pocke (Masataka Kuwabara) said that the feature name should not depend on its output appearance because we may change how to display in future.

I think that, to pinpoint the origin of an error out of some things, the expression "offending ..." is usually used. So I suggest something along the lines of offending_method, offending_token, or offending_location.

Updated by Eregon (Benoit Daloze) over 3 years ago

mame (Yusuke Endoh) wrote in #note-17:

I've created a PR to add a new default gem to extend the error message of NameError: https://github.com/ruby/ruby/pull/4586

Is a gem needed here?

I like this feature, but I think it is very important to design it in a way that it can be supported on other Ruby implementations, and does not expose CRuby implementation details needlessly.

If it's a gem and it relies on RubyVM::AST it's an issue as it won't reasonably work on anything but CRuby (until RubyVM::AST is made a proper API, but there seems to be little effort in that direction) and it is suboptimal (see below).
It also makes it all too easy for Ruby users to depend on that gem or its API, and so long term it would be likely other Ruby implementations would have to support that gem, which might quite awkward.

IMHO this feature should be considered "core", to provide maximum flexibility in implementation.
For instance, TruffleRuby already has the AST available (since it executes it) and can get column information from there, so it makes no sense to re-parse to get column information, or use a gem for it.

I guess you want to have of the logic in Ruby code (rather than C) here.
How about using a "core" .rb file at the root of the repo, and define the methods/classes under an internal module?
This internal module could be accessible e.g. via Primitive.internal_module (so only accessible in core, and ObjectSpace could skip it), so it would be strictly implementation internals and no risk to leak to the user.

There is another potential issue though if the approach is to NameError.prepend(module), that will both change the ancestors of NameError which seems suboptimal, and expose that module publicly.
Could there simply be a hook in NameError#message to call to this new functionality to add the underlining?

Updated by mame (Yusuke Endoh) over 3 years ago

Thank you for suggestions about the name. For the record, @pocke (Masataka Kuwabara) and @naruse (Yui NARUSE) like "pretty_error". I'll bring this to the next dev-meeting and ask @matz (Yukihiro Matsumoto) to pick up.

Eregon (Benoit Daloze) wrote in #note-21:

How about using a "core" .rb file at the root of the repo, and define the methods/classes under an internal module?

It was a possible alternative approach that I had considered, but the gem-based approach was accepted in the previous dev-meeting. If it is a gem, we can allow users to update it by "gem update error_highlight" or something. As I said, we will have to adjust the underline location after the 3.1 release, so the update capabilty will be useful for a while. After the specification becomes stable, we might remove the gem and make the feature truly built in the core.

Updated by duerst (Martin Dürst) over 3 years ago

mame (Yusuke Endoh) wrote in #note-22:

Thank you for suggestions about the name. For the record, @pocke (Masataka Kuwabara) and @naruse (Yui NARUSE) like "pretty_error". I'll bring this to the next dev-meeting and ask @matz (Yukihiro Matsumoto) to pick up.

I'm not convinced about "pretty_error". It's mostly an adjective, not a verb. In that sense, "prettify_error" would be clearer. Also, most errors aren't pretty at all, and highlighting doesn't make them any prettier, or does it?

Updated by sawa (Tsuyoshi Sawada) over 3 years ago

To my understanding, the word "prettify" means and has been used to add cosmetics to the output, like adding line breaks and indenting or adding colors. The words "emphasis" and "highlight" are in a similar vein. The functionality in question here is not about cosmetics, but is to add information to the output, so I think it might not be appropriate to use these words.

Updated by duerst (Martin Dürst) over 3 years ago

sawa (Tsuyoshi Sawada) wrote in #note-24:

To my understanding, the word "prettify" means and has been used to add cosmetics to the output, like adding line breaks and indenting or adding colors. The words "emphasis" and "highlight" are in a similar vein. The functionality in question here is not about cosmetics, but is to add information to the output, so I think it might not be appropriate to use these words.

If all of pretty/emphasis/highlight are problematic, what's your suggestion?

I can very well imagine that instead of "squiggles" (^^^^^ or so), on a colored display some kind of actual highlighting (in the narrow sense) might be used. Both emphasis and highlight make clear that a specific part of the output is more important than the rest, which is exactly what we are talking about.

Updated by mame (Yusuke Endoh) over 3 years ago

@duerst (Martin Dürst) It sounds similar to prettyprint which is an existing built-in gem. As far as I know, there is no grammatical rules about a gem name. I don't find any consistent rule when I see prettyprint, power_assert, did_you_mean, debug, etc. Also, I agree that "prettify" is a bit confusing.

@sawa (Tsuyoshi Sawada) What language uses the word "offending" for the meaning? Could you show any reference that "offending" is used?

My current preference is power_error because the functionality is similar to power_assert, but I'm not picky about it.

Updated by sawa (Tsuyoshi Sawada) over 3 years ago

duerst (Martin Dürst) wrote in #note-25:

If all of pretty/emphasis/highlight are problematic, what's your suggestion?

I already gave suggestions.

mame (Yusuke Endoh) wrote in #note-26:

@sawa (Tsuyoshi Sawada) What language uses the word "offending" for the meaning? Could you show any reference that "offending" is used?

I had English in mind. The link https://eow.alc.co.jp/search?q=offending provides some examples:

  • offending agent 原因薬物
  • offending bacteria 原因菌
  • offending corporation 問題となっている企業
  • offending conduct 犯罪行為
  • offending drug 原因薬剤
  • offending enterprise 加害企業
  • offending environment 原因となっている環境
  • offending party 違反した当事者

And in academics (linguistics), when a (natural language) sentence is ungrammatical due to some cause, that cause is said to be the "offending word/operation/movement/..."

I have also found uses in programming languages:

Updated by mame (Yusuke Endoh) over 3 years ago

Okay thanks. As far as I take a glance at the references that you showed, the word seems to be mainly used for a "cause". However, the underline in my proposal is not necessarily a cause, so I'm unsure whether it is more appropriate.

AFAIK, Rust compiler reports an error in the style that I propose, like this: https://blog.rust-lang.org/2016/08/10/Shape-of-errors-to-come.html. I'd like to know the name that they calls the particular feature, if any. Anyone knows?

Updated by mame (Yusuke Endoh) over 3 years ago

I read the source code of rustc. https://github.com/rust-lang/rust/tree/master/compiler/rustc_errors

The feature name seems to be "rustc_errors", which is not helpful for my proposal to decide the gem name, but I found some inspiring words.

  • diagnostics
  • suggestions
  • highlight line(s)

"Diagnostics" and "suggestions" are very suitable for some analysis results, but may sound exaggerated for my proposal to just show a line that an error occurred. "Highlight" may be more suitable (but a little mundane to me :-).

Updated by matz (Yukihiro Matsumoto) over 3 years ago

The pretty from pretty print strongly suggests beautifying indentation, which the proposed feature doesn't provide.
So I vote for highlight or emphasize.

Matz.

Updated by mame (Yusuke Endoh) over 3 years ago

@matz (Yukihiro Matsumoto) Thank you for your comment! Then, I'll pick up error_highlight. I will update and then merge my PR later.

Updated by mame (Yusuke Endoh) over 3 years ago

  • Status changed from Assigned to Closed

I've merged my PR https://github.com/ruby/ruby/pull/4586, created a new repository https://github.com/ruby/error_highlight
, and released a new gem https://rubygems.org/gems/error_highlight.

I will monitor the CI, but please let me know if you notice anything.

Updated by mame (Yusuke Endoh) over 3 years ago

Just FYI: Python will have the same feature https://www.python.org/dev/peps/pep-0657/

Updated by Eregon (Benoit Daloze) over 3 years ago

mame (Yusuke Endoh) wrote in #note-33:

Just FYI: Python will have the same feature https://www.python.org/dev/peps/pep-0657/

Interesting, two things to note there:

IMHO those are better choices, a proper API might be useful for other things, and re-parsing + trying to narrow better seems inherently brittle.

Actions #35

Updated by Eregon (Benoit Daloze) over 2 years ago

Actions #36

Updated by Eregon (Benoit Daloze) about 1 month ago

  • Related to Feature #8751: Add offsets to method#source_location added
Actions #37

Updated by Eregon (Benoit Daloze) about 1 month ago

  • Related to Feature #6012: Proc#source_location also return the column added
Actions

Also available in: Atom PDF

Like0
Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0