Bug #21097
open`x = a rescue b in c` and `def f = a rescue b in c` parsed differently between parse.y and prism
Description
x = a rescue b in c
(x = (a rescue b)) in c # parse.y, prism(ruby 3.4)
x = (a rescue (b in c)) # prism(ruby 3.5)
def f = a rescue b in c #=> true(parse.y), :f(prism)
(def f = (a rescue b)) in c # parse.y
def f = (a rescue (b in c)) # prism
There is no difference between prism and parse.y parsing these codes
a rescue b in c # a rescue (b in c)
x = a rescue b # x = (a rescue b)
x = b in c # (x = b) in c
def f = a rescue b # def f = (a rescue b)
def f = b in c # (def f = a) in b
Updated by tompng (tomoya ishida) 8 months ago
not
in
and not
rescue
has the same problem
$ ruby --parser=parse.y -e "def f = not 1 in 2; p f"
false
$ ruby --parser=prism -e "def f = not 1 in 2; p f"
true
$ ruby --parser=parse.y -e "def f = not a rescue true; p f"
false
$ ruby --parser=prism -e "def f = not a rescue true; p f"
true
Updated by tenderlovemaking (Aaron Patterson) 8 months ago
- Assignee set to prism
Updated by hsbt (Hiroshi SHIBATA) 7 months ago
- Status changed from Open to Assigned
Updated by nobu (Nobuyoshi Nakada) 7 months ago
- Related to Bug #21132: Changed postposition `rescue` and `if` behavior since Ruby 3.4 added
Updated by matz (Yukihiro Matsumoto) 7 months ago
The behavior of Prism in 3.5 is close to my intention.
Matz.
Updated by kddnewton (Kevin Newton) 7 months ago
In this case, I'm not sure if the assignee should be prism, if we now have the desired behavior. @tompng (tomoya ishida) does this match your understanding?
Updated by tompng (tomoya ishida) 7 months ago
- Status changed from Assigned to Open
- Assignee deleted (
prism)
Updated by mame (Yusuke Endoh) 6 months ago
- Status changed from Open to Assigned
- Assignee set to nobu (Nobuyoshi Nakada)
Updated by yui-knk (Kaneko Yuichiro) 12 days ago
- Related to Bug #21378: variable pinning does not look for method arguments added
Updated by yui-knk (Kaneko Yuichiro) 3 days ago
I am working on this ticket and have some questions about grammar where I'd like to ask for Matz's opinions.
Precedence of not, in, rescue¶
The precedence of not, in, rescue is referred on https://bugs.ruby-lang.org/issues/21097#note-1.
Q1. Is it okay to combine them as follows?
# def f = not (1 in 2)
def f = not 1 in 2
# def f = (not a) rescue true
def f = not a rescue true
# def f = (not (a in 1)) rescue true
def f = not a in 1 rescue true
# def f = (not a) rescue (1 in 1)
def f = not a rescue 1 in 1
This is consistent with how it's interpreted when written in the body of a method definition that has an end
.
def f1
# not (1 in 2)
not 1 in 2
end
def f2
# (not a) rescue true
not a rescue true
end
def f3
# (not (a in 1)) rescue true
not a in 1 rescue true
end
def f4
# (not a) rescue (1 in 1)
not a rescue 1 in 1
end
By the way, the operator precedence derived from this interpretation is modifier_rescue < not < in
, which differs from the in < not < modifier_rescue
defined in the precedence table.
Precedence of =, in, rescue¶
I think this ticket is about making the precedence of =
, in
, and rescue
consistent.
From the desire for x = a rescue b in c
to be interpreted as x = (a rescue (b in c))
, it's thought that the precedence of these operators should be in the order of =
< modifier_rescue
< in
.
Q2. The following code does not follow this precedence in either parse.y
or prism
, so is it correct to say that it's supposed to be interpreted as follows?
x = b in c # x = (b in c)
def f = b in c # def f = (b in c )
What's happening in parse.y
¶
In the parse.y
definition, single-line pattern matching can't appear on the right-hand side of an assignment.
Furthermore, the left-hand side of single-line pattern matching (the left side of in
or =>
) is an arg
rule, and an arg
can contain an assignment.
Because of this, x = a rescue b in c
is interpreted as (x = a rescue b) in c
.
Try to make single-line pattern matching to be arg
¶
In Ruby's grammar, the right-hand side of an assignment is often the arg
rule. For example, the right-hand side of x = 1
, x = 1 + 2
, and x = obj.m
are all arg
.
These assignments have the unique characteristic that they can be arguments to a method. This means that m(x = 2, x)
is valid code.
Currently, single-line pattern matching is an expr
rule, which is not part of arg
. The arg
grammar element, as its name suggests, can be a method argument.
If we were to make single-line pattern matching an arg
, it would cause conflicts with several tokens.
Conflict on “=>”¶
Let's consider the code m(v => [1])
.
This can be interpreted as either a pattern matching =>
or a hash =>
.
Conflict on “,”¶
Let's consider the code m(v in 1, 2, 3)
.
Because the brackets []
can be omitted in an array pattern, this code can be interpreted as either an array pattern v in 1, 2, 3
or as a method call to m
with multiple arguments like m((v in 1), 2, 3)
.
Conflict on “|”¶
Let's consider the code m(v in :a | :b)
.
The |
symbol is used in pattern matching to separate patterns. However, it's also used as a binary operator.
Therefore, this code can be interpreted as either a method call to m
with a pattern matching argument of v in :a | :b
, or as a method call to m
with an argument that connects v in :a
and :b
with the binary operator |
.
Conflict on “^”¶
Let's consider the code m(v in 1, ^a)
.
It may be a bit surprising that an ambiguity arises with the ^
used for variable pinning. However, recall that the brackets []
can be omitted in an array pattern, and that a trailing comma can be used in an array pattern.
This code can therefore be interpreted in two ways: either the v in 1, ^a
part is treated as a pattern, or it's a pattern matching v in 1,
(with a trailing comma) connected to a
by the binary operator ^
.
Should the ambiguity be resolved?¶
These ambiguities aren't a parser problem; they're a grammar problem.
It's possible to resolve these ambiguities by deciding on a specific interpretation for each case. For example, we could resolve the conflict with =>
by only making in
part of the arg
rule for single-line pattern matching, or we could fix the conflict with ,
by prohibiting the omission of brackets []
in array patterns.
However, let's consider a different approach here.
An alternative approach: Allow it only in assignments that aren't arg
rule¶
Many assignments in Ruby are arg
, but there are some exceptions.
For example, a method call where the parentheses are omitted (called a command) can be written on the right-hand side of an assignment, but that assignment cannot be used as an argument.
x = cmd 1, 2 # OK
m(x = cmd 1, 2) # Syntax Error
Let's consider an approach that allows single-line pattern matching on the right-hand side only for these kinds of assignments, which I'll temporarily call "top-level assignments." For example, the implementation would be as here.
Q3. What do you think about limiting assignments with pattern matching to the top level?
What can be written in the body of an endless method definition?¶
Now, various things can be written in the body of an endless method definition.
def m = 1 + 2
def m = cmd 1, 2
def m = a in b
def f = a rescue b
def f = a rescue b in c
Let's go over a few examples of things that cannot be written in the body of an endless method definition. Since an arg
can be written in the body, we will specifically look at the production rules included in stmt
and expr
.
# expr
def m = cmd 1, 2 do end # SyntaxError
def m = !cmd 1, 2 # SyntaxError
!cmd 1, 2 # ok
x = !cmd 1, 2 # SyntaxError
# stmt
def m = x = cmd 1, 2 # SyntaxError
x = cmd 1, 2 # ok
def m = x += cmd 1, 2 # SyntaxError
x += cmd 1, 2 # ok
def m = def m2 = obj.m 1 # SyntaxError
def m2 = obj.m 1 # ok
I will provide a simple explanation for each.
In the first code example, when we write private def m = cmd 1, 2 do end
, an ambiguity arises over whether the do end
block is attached to private
or to cmd
.
The second code example is simply a matter of whether or not to allow it. Since it's allowed when it's not an assignment but prohibited on the right-hand side of an assignment, the question is which behavior to make consistent.
For the three cases starting with stmt
, a conflict occurs with and
, or
, and do
. This is because the grammar has two competing rules: command_rhs: command_call_value
and command_rhs: command_call_value modifier_rescue after_rescue stmt
.
# def m = x = cmd 1, 2 rescue (a and b)
# (def m = x = cmd 1, 2 rescue a) and b
def m = x = cmd 1, 2 rescue a and b
# private def m = x = (cmd 1, 2 do exp end)
# private (def m = x = cmd 1, 2) do exp end
private def m = x = cmd 1, 2 do exp end
For example, we can resolve this by changing command_call_value
to command
and the stmt
after rescue
to an arg
(simply changing stmt
to expr
would still leave a conflict with do...end
because expr: command_call
exists).
command_rhs : command %prec tOP_ASGN
| command modifier_rescue after_rescue arg
Q4. What can be written in the body of an endless method definition?
Associativity of modifier rescue¶
When used as a postfix operator, rescue
is a left-associative operator, but its behavior on the right-hand side of an assignment is slightly different.
On the right-hand side of an assignment, only one postfix rescue
is associated to an expression.
# (((a rescue b) rescue c) rescue d)
a rescue b rescue c rescue d
# (x = a rescue b) rescue c rescue d
x = a rescue b rescue c rescue d
Q5. In the case of an endless method definition, it uses the same binding method as a normal (non-assignment) case. Should this be left as is?
# def m = (((a rescue b) rescue c) rescue d)
def m = a rescue b rescue c rescue d
When we previously discussed the combination of and
or or
with an endless method definition, a comment in a bug report ( https://bugs.ruby-lang.org/issues/19392#note-9) mentioned the analogy between an endless method definition and an assignment.
Based on this, I thought that if we follow the behavior of an assignment, it might also be possible to interpret (def m = a rescue b) rescue c rescue d
.
Certain endless method definitions are arg
¶
Because some endless method definitions are currently defined as arg
, the following code is valid.
private :m, def m = 1 rescue 2
private x = def m = 1 rescue 2
However, the endless method definition in the following codes is a stmt
, which causes a syntax error.
private :m, def m = 1 rescue 2 in 3
private x = def m = 1 rescue 2 in 3
This difference depends on whether pattern matching is written inside the postfix rescue
, so in some cases, you can't tell the difference without reading quite far ahead.
Adding the ability to use pattern matching on the right-hand side of an assignment, defined as a stmt
, can be seen as a move towards increasing the gap between assignment that can be used as arg
and those that cannot.
At the same time, when the right-hand side is a command, it behaves similarly, and the command syntax has been in Ruby for a long time. So, one could say it's a characteristic of Ruby's grammar to want to include as many writable things as possible in the arg
rule.
It seems that the technique of writing an assignment as an argument has become widespread among people who participate in code golfing, a sport of writing code as short as possible.
x = cmd 1, 2, 3 # ok
m(x = cmd 1, 2, 3) # SyntaxError
Q6. Should this distinction be maintained? Also, in cases where there is only one argument, like private def m = 1 rescue 2 in 3
, the body can be passed as an argument even if it's not an arg
.
The list of questions¶
- Q1. Is it okay to combine them as follows?
- Q2. The following code does not follow this precedence in either
parse.y
orprism
, so is it correct to say that it's supposed to be interpreted as follows? - Q3. What do you think about limiting assignments with pattern matching to the top level?
- Q4. What can be written in the body of an endless method definition?
- Q5. In the case of an endless method definition, it uses the same binding method as a normal (non-assignment) case. Should this be left as is?
- Q6. Should this distinction be maintained? Also, in cases where there is only one argument, like
private def m = 1 rescue 2 in 3
, the body can be passed as an argument even if it's not anarg
.
Updated by naruse (Yui NARUSE) 2 days ago
Prism's behavior should be compatible with Ruby 3.3.
Unless a design change is accepted, it should not break a compatibility.
Could you change behaviors showed in this ticket to Ruby 3.3's behavior?
Updated by naruse (Yui NARUSE) 2 days ago
- Backport changed from 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN to 3.1: DONTNEED, 3.2: DONTNEED, 3.3: DONTNEED, 3.4: REQUIRED
Updated by kddnewton (Kevin Newton) about 1 hour ago
Should this now be assigned to prism since there is an incompatibility? Sorry I am not clear on the conclusion.
Updated by alanwu (Alan Wu) 40 minutes ago
matz (Yukihiro Matsumoto) wrote in #note-5:
The behavior of Prism in 3.5 is close to my intention.
Matz.
naruse (Yui NARUSE) wrote in #note-11:
Prism's behavior should be compatible with Ruby 3.3.
Unless a design change is accepted, it should not break a compatibility.
Could you change behaviors showed in this ticket to Ruby 3.3's behavior?
Looks like the behavior for 3.5 is undecided for now. In any case, the behavior for 3.4 should be consistent with 3.3, so it'd be nice to fix Prism's behavior when parsing with 3.4 grammar.