Project

General

Profile

Actions

Feature #20024

open

SyntaxError subclasses

Added by kddnewton (Kevin Newton) 5 months ago. Updated 4 months ago.

Status:
Open
Assignee:
-
Target version:
-
[ruby-core:115491]

Description

There are many places around the Ruby ecosystem that handle syntax errors in different ways. Some provide highlighting, others provide recovery of some form, still more provide LSP metadata. In order to provide more rich information, most of them switch on the message of the error being returned, as in:

https://github.com/ruby/irb/blob/f86d9dbe2fc05ed62332069a27f4aacc59ba9634/lib/irb/ruby-lex.rb#L220-L267

Within ruby/spec, specific error messages are required for these kinds of messages in order to support this implicit interface that syntax errors have a hidden type, which is only expressed through their message. For example:

https://github.com/ruby/spec/blob/c3206f644325c026fc5b700f0ea75ce9bd2e9d02/language/if_spec.rb#L323
https://github.com/ruby/spec/blob/c3206f644325c026fc5b700f0ea75ce9bd2e9d02/language/numbered_parameters_spec.rb#L31
https://github.com/ruby/spec/blob/c3206f644325c026fc5b700f0ea75ce9bd2e9d02/language/pattern_matching_spec.rb#L210
https://github.com/ruby/spec/blob/c3206f644325c026fc5b700f0ea75ce9bd2e9d02/language/rescue_spec.rb#L262
https://github.com/ruby/spec/blob/c3206f644325c026fc5b700f0ea75ce9bd2e9d02/language/yield_spec.rb#L196

It's not clear from these specs or from the parser itself which error messages are permanent/guaranteed versus which are changeable. Either way, relying on the error message itself as opposed to the type of the error is brittle at best.

I would like to suggest instead we implement subclasses on SyntaxError that would allow tools that depend on specific syntax errors to rescue those subclasses instead of parsing the message. In addition to alleviating the need to parse error messages with regex, this would also allow for the possibility that the error messages could change in the future without breaking external tooling.

Allowing these to change would allow them to be potentially enhanced or changed by other tools - for example by providing recovery information or translating them.

This is particularly important for Prism since we are getting down to individual spec failures and some of the failures are related to the fact that we have messages like "Numbered parameter is already used in outer scope" where the spec requires /numbered parameter is already used in/. Even this case-sensitivity is causing failures, which seems like we're testing the wrong thing.

Updated by Eregon (Benoit Daloze) 5 months ago

There are dozens of different error messages for SyntaxError so that would mean dozens of subclasses.
That seems a lot to me.
Also it would be equally hard to name all these subclasses.
And we could not rename them easily either (more easily than changing messages but still).

Even this case-sensitivity is causing failures, which seems like we're testing the wrong thing.

I think we are testing what is necessary. As you see IRB does not allow for case changes there because it does not expect it.
Many other gems expect a very specific exception message (not just for SyntaxError).
So I think it's clear Prism needs to have the exact same messages whenever possible, when used as a Ruby parser for running Ruby code.

For tooling I think having another more verbose/clearer/improved set of messages would make sense (as suggested by Andrii on the Prism issue tracker). I.e. the gem could provide improved messages (maybe as an option?).

Updated by kddnewton (Kevin Newton) 5 months ago

I understand the need for compatibility with regard to Prism, but this ticket isn't about Prism specifically. I'm proposing this as a solution to the problem in general, devoid of the Prism context.

Regardless of other parsers, there exists a problem that tools are relying on error messages for behavior. We can fix that by introducing subclasses. Of course it won't fix the immediate situation for anyone, but going forward it will be a much better situation.

Updated by byroot (Jean Boussier) 5 months ago

Rather than sub classes, which have the annoying issue of having to stick around even if for some reason the case no longer exist, could we simply expose SyntaxError#issue (the name isn't very inspired) that would return a symbol?

e.g. :void_value, :number_parameter_used, etc etc.

Updated by Eregon (Benoit Daloze) 5 months ago

A Symbol per type of SyntaxError seems easier to maintain and wouldn't create so many classes/constants/etc which would be messy in docs.
I think that's a much easier improvement to do than adding a bunch of subclasses.
It's also better memory-wise as classes are not small.

Updated by Dan0042 (Daniel DeLorme) 5 months ago

byroot (Jean Boussier) wrote in #note-3:

Rather than sub classes, which have the annoying issue of having to stick around even if for some reason the case no longer exist, could we simply expose SyntaxError#issue (the name isn't very inspired) that would return a symbol?

That sounds good to me. It's much less cumbersome than dozens of subclasses, and has much better forward-compatibility. If this is introduced in ruby 3.3 but a "foo" syntax error is added to ruby 3.4, then err.is_a?(FooSyntaxError) would raise an error in ruby 3.3 but err.message_code == :foo would not.

Updated by kddnewton (Kevin Newton) 5 months ago

I think I am okay with that compromise. I don't like that you won't be able to rescue specific errors, but I understand the concern about forward compatibility.

My other desire for subclasses is that different syntax errors have different information on them. For example, _1 = 1 is a syntax error that has the additional information that you're assigning to _1. Most libraries right now parse that out of the message, but if we had separate objects then we could define the shapes of the errors to have that information as a field. For example, that would be the name or identifier field. This could probably be addressed with something like a metadata hash or something else, if we're just adding fields to SyntaxError.

Updated by zverok (Victor Shepelev) 5 months ago

I don't like that you won't be able to rescue specific errors, but I understand the concern about forward compatibility.

An aside note on this: in general, rescue clause is already looking very alike pattern-matching, and if we continue this approach (with adding causes/details to exceptions), we might once want to be able to...

rescue SyntaxError(issue: :void_value) => ex

Just a thought (but might have many interesting consequences).

Updated by jeremyevans0 (Jeremy Evans) 5 months ago

zverok (Victor Shepelev) wrote in #note-7:

An aside note on this: in general, rescue clause is already looking very alike pattern-matching, and if we continue this approach (with adding causes/details to exceptions), we might once want to be able to...

rescue SyntaxError(issue: :void_value) => ex

That's already valid syntax, which calls the SyntaxError method with an issue keyword argument (the method should return a class or module to work correctly).

Updated by kddnewton (Kevin Newton) 5 months ago

@zverok (Victor Shepelev) That's an interesting proposal but let's move that discussion to another issue. I don't want this to become a syntax discussion because that could take us in a totally different direction.

Updated by zverok (Victor Shepelev) 5 months ago

@jeremyevans0 (Jeremy Evans)

That's already valid syntax

I am aware; I just have a feeling that repurposing it with PM might still be possible (the chance that a lot of code uses this syntax with UpcasedMethods to produce exception classes dynamically is relatively low—though, I admit, not non-existent).

Anyway, I think @kddnewton (Kevin Newton) is correct: here is not the best place, and the idea is very rough, I am sorry for stealing the discussion, just got a sudden idea.

Updated by Dan0042 (Daniel DeLorme) 5 months ago

This idea of subclasses/identifier is coming up in the context of Prism and SyntaxError, but I think it applies to any kind of Exception, so I would like to see this #issue added to Exception rather than just SyntaxError. For pretty much any type of Exception I see things like this:

rescue SomeError => err
  if err.message =~ /some pattern/
    #ignore the error, or do something else
  else
    raise
  end

Ideally this SomeError would have a subclass that allows to catch the exact error subtype, but in practice that doesn't alway happen, and is often out of our control. it would be really nice to be able to compare against a Symbol rather than a string message. Especially for a message that might be expensive to generate (#18285).


(PS: @zverok (Victor Shepelev) I think your idea is very much worth exploring)

Updated by yui-knk (Kaneko Yuichiro) 5 months ago

SyntaxError includes multiple errors

SyntaxError includes multiple errors like below, in this case two errors are included into one SyntaxError. Therefore it needs to consider how to handle such cases.

begin
  eval <<~CODE
    def m
      retry
    
  CODE
rescue SyntaxError => e
  puts e.message
end

# (eval at test.rb:2):2: Invalid retry without rescue
#   retry
#   ^~~~~
# (eval at test.rb:2):3: syntax error, unexpected end-of-input, expecting `end' or dummy end

We need to avoid losing information to provide rich information to SyntaxError users. Therefore these are not options for this problem:

  • Merge multiple errors into one SyntaxError subclass. Because this is misleading if these errors are different types of errors.
  • Use only a single error and ignore following erorrs. Because information is lost.

irb use case and error tolerance parser

Regarding the irb use case, it categorizes error , which is recoverable by adding tokens to the end, as recoverable_error and error, which is recoverable by deleting tokens, as unrecoverable_error, so that irb can determine to require more input or not.

When irb was created, error tolerance parser didn't exist. Therefore irb needs to take care of how to recover the input using SyntaxError#message. However it's parser responsibility to recover errors.

irb can categorize syntax error if error tolerance parser provides information how the parser recover errors.
If only single error and recovery requires only token insert operations for the last of input, it's recoverable_error.

By the way, does the parser always raise SyntaxError for invalid input? For this use case, it's better to not raise SyntaxError but users ask to parser to have syntax errors or not after parsing.

Proposal

Considering these points, my proposal is defining parser interface.

  • Parser can run without SyntaxError mode
  • Parser provides a method to get syntax errors
  • Each syntax error includes
    • message: "syntax error, unexpected end-of-input, expecting `end' or dummy end"
    • location: (1,0)-(1,1), first/last & line/column
    • operations for recovery:
      • insert / delete
      • location of recovery

Updated by ioquatix (Samuel Williams) 5 months ago

I'm interested in this topic and proposed a while ago to add SyntaxError#diagnostics which is a more detailed information relating to parse errors and how to rectify them. I'd be interested in discussing this in more detail.

Updated by kddnewton (Kevin Newton) 5 months ago

@yui-knk (Kaneko Yuichiro) your proposal matches how prism works today. Prism never raises a syntax error, it returns a list of errors that have (exactly as you have said) a type, a message, and a location.

I agree that a single raised error is not great because it loses information. I like the idea that @ioquatix (Samuel Williams) has proposed of SyntaxError#diagnostics. With such a proposal, we could attach all syntax errors in the file to a single raised error, and you could query them appropriately.

In this case we would need to design the shape of a SyntaxError::Diagnostic class. I would imagine it would have the same fields that @yui-knk (Kaneko Yuichiro) is proposing, something like:

class SyntaxError::Location
  attr_reader :start_line, :end_line # 1-indexed
  attr_reader :start_column, :end_column # in bytes
end

class SyntaxError::Diagnostic
  attr_reader :location # SyntaxError::Location
  attr_reader :message # String
  attr_reader :type # Symbol
end

class SyntaxError
  attr_reader :diagnostics # Array[SyntaxError::Diagnostic]
end

Would that be amenable?

Updated by yui-knk (Kaneko Yuichiro) 5 months ago

For the irb use cases, irb wants to know additional input may be able to recover the current invalid input.
For example end; def m is not recoverable with additional input because end; can not be recovered by adding some tokens. Then irb raises "syntax error" for the input.
On the other hand def m is recoverable with additional input. Then irb requires next input.
Parser needs to provide information how the parser recover errors for removing dependency from irb to syntax error messages.

Updated by palkan (Vladimir Dementyev) 4 months ago

kddnewton (Kevin Newton) wrote in #note-14:

In this case we would need to design the shape of a SyntaxError::Diagnostic class. I would imagine it would have the same fields that @yui-knk (Kaneko Yuichiro) is proposing, something like:
...

I think, we can get some inspiration from Parser: diagnostic (messages — a lot of them, btw). For example, it also includes a level information (error, fatal, warn); something similar to the "recoverable" property proposed above, but more granular.

Actions

Also available in: Atom PDF

Like0
Like0Like0Like3Like0Like0Like0Like0Like0Like1Like0Like1Like0Like0Like0Like0Like0