Feature #20024
openSyntaxError subclasses
Description
There are many places around the Ruby ecosystem that handle syntax errors in different ways. Some provide highlighting, others provide recovery of some form, still more provide LSP metadata. In order to provide more rich information, most of them switch on the message of the error being returned, as in:
Within ruby/spec, specific error messages are required for these kinds of messages in order to support this implicit interface that syntax errors have a hidden type, which is only expressed through their message. For example:
https://github.com/ruby/spec/blob/c3206f644325c026fc5b700f0ea75ce9bd2e9d02/language/if_spec.rb#L323
https://github.com/ruby/spec/blob/c3206f644325c026fc5b700f0ea75ce9bd2e9d02/language/numbered_parameters_spec.rb#L31
https://github.com/ruby/spec/blob/c3206f644325c026fc5b700f0ea75ce9bd2e9d02/language/pattern_matching_spec.rb#L210
https://github.com/ruby/spec/blob/c3206f644325c026fc5b700f0ea75ce9bd2e9d02/language/rescue_spec.rb#L262
https://github.com/ruby/spec/blob/c3206f644325c026fc5b700f0ea75ce9bd2e9d02/language/yield_spec.rb#L196
It's not clear from these specs or from the parser itself which error messages are permanent/guaranteed versus which are changeable. Either way, relying on the error message itself as opposed to the type of the error is brittle at best.
I would like to suggest instead we implement subclasses on SyntaxError
that would allow tools that depend on specific syntax errors to rescue those subclasses instead of parsing the message. In addition to alleviating the need to parse error messages with regex, this would also allow for the possibility that the error messages could change in the future without breaking external tooling.
Allowing these to change would allow them to be potentially enhanced or changed by other tools - for example by providing recovery information or translating them.
This is particularly important for Prism since we are getting down to individual spec failures and some of the failures are related to the fact that we have messages like "Numbered parameter is already used in outer scope"
where the spec requires /numbered parameter is already used in/
. Even this case-sensitivity is causing failures, which seems like we're testing the wrong thing.
Updated by Eregon (Benoit Daloze) 12 months ago
There are dozens of different error messages for SyntaxError so that would mean dozens of subclasses.
That seems a lot to me.
Also it would be equally hard to name all these subclasses.
And we could not rename them easily either (more easily than changing messages but still).
Even this case-sensitivity is causing failures, which seems like we're testing the wrong thing.
I think we are testing what is necessary. As you see IRB does not allow for case changes there because it does not expect it.
Many other gems expect a very specific exception message (not just for SyntaxError).
So I think it's clear Prism needs to have the exact same messages whenever possible, when used as a Ruby parser for running Ruby code.
For tooling I think having another more verbose/clearer/improved set of messages would make sense (as suggested by Andrii on the Prism issue tracker). I.e. the gem could provide improved messages (maybe as an option?).
Updated by kddnewton (Kevin Newton) 12 months ago
I understand the need for compatibility with regard to Prism, but this ticket isn't about Prism specifically. I'm proposing this as a solution to the problem in general, devoid of the Prism context.
Regardless of other parsers, there exists a problem that tools are relying on error messages for behavior. We can fix that by introducing subclasses. Of course it won't fix the immediate situation for anyone, but going forward it will be a much better situation.
Updated by byroot (Jean Boussier) 12 months ago
Rather than sub classes, which have the annoying issue of having to stick around even if for some reason the case no longer exist, could we simply expose SyntaxError#issue
(the name isn't very inspired) that would return a symbol?
e.g. :void_value
, :number_parameter_used
, etc etc.
Updated by Eregon (Benoit Daloze) 12 months ago
A Symbol per type of SyntaxError seems easier to maintain and wouldn't create so many classes/constants/etc which would be messy in docs.
I think that's a much easier improvement to do than adding a bunch of subclasses.
It's also better memory-wise as classes are not small.
Updated by Dan0042 (Daniel DeLorme) 12 months ago
byroot (Jean Boussier) wrote in #note-3:
Rather than sub classes, which have the annoying issue of having to stick around even if for some reason the case no longer exist, could we simply expose
SyntaxError#issue
(the name isn't very inspired) that would return a symbol?
That sounds good to me. It's much less cumbersome than dozens of subclasses, and has much better forward-compatibility. If this is introduced in ruby 3.3 but a "foo" syntax error is added to ruby 3.4, then err.is_a?(FooSyntaxError)
would raise an error in ruby 3.3 but err.message_code == :foo
would not.
Updated by kddnewton (Kevin Newton) 12 months ago
I think I am okay with that compromise. I don't like that you won't be able to rescue specific errors, but I understand the concern about forward compatibility.
My other desire for subclasses is that different syntax errors have different information on them. For example, _1 = 1
is a syntax error that has the additional information that you're assigning to _1
. Most libraries right now parse that out of the message, but if we had separate objects then we could define the shapes of the errors to have that information as a field. For example, that would be the name
or identifier
field. This could probably be addressed with something like a metadata
hash or something else, if we're just adding fields to SyntaxError.
Updated by zverok (Victor Shepelev) 12 months ago
I don't like that you won't be able to rescue specific errors, but I understand the concern about forward compatibility.
An aside note on this: in general, rescue
clause is already looking very alike pattern-matching, and if we continue this approach (with adding causes/details to exceptions), we might once want to be able to...
rescue SyntaxError(issue: :void_value) => ex
Just a thought (but might have many interesting consequences).
Updated by jeremyevans0 (Jeremy Evans) 12 months ago
zverok (Victor Shepelev) wrote in #note-7:
An aside note on this: in general,
rescue
clause is already looking very alike pattern-matching, and if we continue this approach (with adding causes/details to exceptions), we might once want to be able to...rescue SyntaxError(issue: :void_value) => ex
That's already valid syntax, which calls the SyntaxError
method with an issue
keyword argument (the method should return a class or module to work correctly).
Updated by kddnewton (Kevin Newton) 12 months ago
@zverok (Victor Shepelev) That's an interesting proposal but let's move that discussion to another issue. I don't want this to become a syntax discussion because that could take us in a totally different direction.
Updated by zverok (Victor Shepelev) 12 months ago
That's already valid syntax
I am aware; I just have a feeling that repurposing it with PM might still be possible (the chance that a lot of code uses this syntax with UpcasedMethods
to produce exception classes dynamically is relatively low—though, I admit, not non-existent).
Anyway, I think @kddnewton (Kevin Newton) is correct: here is not the best place, and the idea is very rough, I am sorry for stealing the discussion, just got a sudden idea.
Updated by Dan0042 (Daniel DeLorme) 12 months ago
This idea of subclasses/identifier is coming up in the context of Prism and SyntaxError, but I think it applies to any kind of Exception, so I would like to see this #issue added to Exception rather than just SyntaxError. For pretty much any type of Exception I see things like this:
rescue SomeError => err
if err.message =~ /some pattern/
#ignore the error, or do something else
else
raise
end
Ideally this SomeError
would have a subclass that allows to catch the exact error subtype, but in practice that doesn't alway happen, and is often out of our control. it would be really nice to be able to compare against a Symbol rather than a string message. Especially for a message that might be expensive to generate (#18285).
(PS: @zverok (Victor Shepelev) I think your idea is very much worth exploring)
Updated by yui-knk (Kaneko Yuichiro) 12 months ago
SyntaxError includes multiple errors¶
SyntaxError
includes multiple errors like below, in this case two errors are included into one SyntaxError
. Therefore it needs to consider how to handle such cases.
begin
eval <<~CODE
def m
retry
CODE
rescue SyntaxError => e
puts e.message
end
# (eval at test.rb:2):2: Invalid retry without rescue
# retry
# ^~~~~
# (eval at test.rb:2):3: syntax error, unexpected end-of-input, expecting `end' or dummy end
We need to avoid losing information to provide rich information to SyntaxError
users. Therefore these are not options for this problem:
- Merge multiple errors into one
SyntaxError
subclass. Because this is misleading if these errors are different types of errors. - Use only a single error and ignore following erorrs. Because information is lost.
irb use case and error tolerance parser¶
Regarding the irb use case, it categorizes error , which is recoverable by adding tokens to the end, as recoverable_error
and error, which is recoverable by deleting tokens, as unrecoverable_error
, so that irb can determine to require more input or not.
When irb was created, error tolerance parser didn't exist. Therefore irb needs to take care of how to recover the input using SyntaxError#message
. However it's parser responsibility to recover errors.
irb can categorize syntax error if error tolerance parser provides information how the parser recover errors.
If only single error and recovery requires only token insert operations for the last of input, it's recoverable_error
.
By the way, does the parser always raise SyntaxError
for invalid input? For this use case, it's better to not raise SyntaxError
but users ask to parser to have syntax errors or not after parsing.
Proposal¶
Considering these points, my proposal is defining parser interface.
- Parser can run without SyntaxError mode
- Parser provides a method to get syntax errors
- Each syntax error includes
- message: "syntax error, unexpected end-of-input, expecting `end' or dummy end"
- location: (1,0)-(1,1), first/last & line/column
- operations for recovery:
- insert / delete
- location of recovery
Updated by ioquatix (Samuel Williams) 12 months ago
I'm interested in this topic and proposed a while ago to add SyntaxError#diagnostics
which is a more detailed information relating to parse errors and how to rectify them. I'd be interested in discussing this in more detail.
Updated by kddnewton (Kevin Newton) 12 months ago
@yui-knk (Kaneko Yuichiro) your proposal matches how prism works today. Prism never raises a syntax error, it returns a list of errors that have (exactly as you have said) a type, a message, and a location.
I agree that a single raised error is not great because it loses information. I like the idea that @ioquatix (Samuel Williams) has proposed of SyntaxError#diagnostics
. With such a proposal, we could attach all syntax errors in the file to a single raised error, and you could query them appropriately.
In this case we would need to design the shape of a SyntaxError::Diagnostic
class. I would imagine it would have the same fields that @yui-knk (Kaneko Yuichiro) is proposing, something like:
class SyntaxError::Location
attr_reader :start_line, :end_line # 1-indexed
attr_reader :start_column, :end_column # in bytes
end
class SyntaxError::Diagnostic
attr_reader :location # SyntaxError::Location
attr_reader :message # String
attr_reader :type # Symbol
end
class SyntaxError
attr_reader :diagnostics # Array[SyntaxError::Diagnostic]
end
Would that be amenable?
Updated by yui-knk (Kaneko Yuichiro) 12 months ago
For the irb use cases, irb wants to know additional input may be able to recover the current invalid input.
For example end; def m
is not recoverable with additional input because end;
can not be recovered by adding some tokens. Then irb raises "syntax error" for the input.
On the other hand def m
is recoverable with additional input. Then irb requires next input.
Parser needs to provide information how the parser recover errors for removing dependency from irb to syntax error messages.
Updated by palkan (Vladimir Dementyev) 11 months ago
kddnewton (Kevin Newton) wrote in #note-14:
In this case we would need to design the shape of a
SyntaxError::Diagnostic
class. I would imagine it would have the same fields that @yui-knk (Kaneko Yuichiro) is proposing, something like:
...
I think, we can get some inspiration from Parser: diagnostic (messages — a lot of them, btw). For example, it also includes a level information (error, fatal, warn); something similar to the "recoverable" property proposed above, but more granular.