Project

General

Profile

Actions

Bug #21139

closed

Prism and parse.y parses `it = it` differently

Added by tompng (tomoya ishida) 6 months ago. Updated 1 day ago.

Status:
Feedback
Assignee:
Target version:
-
ruby -v:
ruby 3.5.0dev (2025-02-14T16:49:52Z master ee181d1bb7) +PRISM [x86_64-linux]
[ruby-core:121054]

Description

# ruby --parser=parse.y -e "42.tap { it = it; p it }"
nil
# ruby --parser=prism -e "42.tap { it = it; p it }"
42

Files

clipboard-202503081702-idzz2.png (22.6 KB) clipboard-202503081702-idzz2.png alanwu (Alan Wu), 03/08/2025 10:02 PM

Related issues 2 (0 open2 closed)

Related to Ruby - Bug #21137: Compound assignment operator with "it" parsed differently between parse.y and prismClosedprismActions
Related to Ruby - Bug #21138: The modifier expression with "it" is parsed differently in parse.y and Prism, which is unexpected in both.ClosedprismActions

Updated by matz (Yukihiro Matsumoto) 6 months ago

Assignment to it should be prohibited in the long run. Temporarily, I vote for Prism behavior.

Matz.

By the way, how can I clear assignee of the issue on Redmine?

Updated by nobu (Nobuyoshi Nakada) 6 months ago

matz (Yukihiro Matsumoto) wrote in #note-1:

Assignment to it should be prohibited in the long run. Temporarily, I vote for Prism behavior.

  1. Currently, it is an ordinary local variable if assigned syntactically.
  2. And a local variable is nil before assigned, even in its RHS, at the runtime.

The behavior in parse.y obeys above principles.

Do you want to change the rule 1, make it always special?

Actions #3

Updated by nobu (Nobuyoshi Nakada) 6 months ago

  • Related to Bug #21137: Compound assignment operator with "it" parsed differently between parse.y and prism added
Actions #4

Updated by nobu (Nobuyoshi Nakada) 6 months ago

  • Related to Bug #21138: The modifier expression with "it" is parsed differently in parse.y and Prism, which is unexpected in both. added

Updated by alanwu (Alan Wu) 5 months ago

matz (Yukihiro Matsumoto) wrote in #note-1:

By the way, how can I clear assignee of the issue on Redmine?

The null option in the menu is the very first one and is rendered with a tiny height to make it hard to mouse over. It's easier to pick it with arrow keys.
Maybe the UX could be improved by tweaking some settings? CC @hsbt (Hiroshi SHIBATA)

Updated by matz (Yukihiro Matsumoto) 5 months ago

I said I prefer Prism behavior to parse.y. But It was due to my misunderstanding. it = it should initialize the local variable named it and the value should be nil, just like other x = x assignment in Ruby.
In the future, all assignment to it could be prohibited, maybe.

Matz.

Updated by kddnewton (Kevin Newton) 5 months ago

  • Assignee set to prism
Actions #8

Updated by hsbt (Hiroshi SHIBATA) 4 months ago

  • Status changed from Open to Assigned

Updated by tenderlovemaking (Aaron Patterson) 12 days ago

Can we get some clarification on this? @matz (Yukihiro Matsumoto) likes the behavior of Prism, but this bug is assigned to the Prism team. It sounds like this should be considered a feature in parse.y.

Thanks.

-Aaron

Actions #10

Updated by tenderlovemaking (Aaron Patterson) 12 days ago

  • Status changed from Assigned to Feedback

Updated by Eregon (Benoit Daloze) 12 days ago

@tenderlovemaking (Aaron Patterson) From matz's latest reply in https://bugs.ruby-lang.org/issues/21139#note-6:

But It was due to my misunderstanding. it = it should initialize the local variable named it and the value should be nil,

Which means it should be the parse.y behavior in the OP example, and so Prism should behave the same as parse.y here, which I suppose is why Kevin re-assigned to prism.

Updated by S_H_ (Shun Hiraoka) 10 days ago

I've created a pull request to fix this Prism behavior where 42.tap { it = it; p it } returns 42 instead of nil.

https://github.com/ruby/prism/pull/3604

This change makes Prism treat the right-hand side it as a local variable in assignment context, matching parse.y's behavior.
I'm not entirely sure if there are other edge cases that need to be addressed. I would appreciate any feedback or suggestions.

Updated by vinistock (Vinicius Stock) 2 days ago

Bringing my comment from the pull request here. Reviewed with @AMomchilov

We wonder if Prism's current behaviour is more intuitive for developers. In this example

42.tap { it = it; p it }

The proposed parse.y behaviour would treat it = it as if it were it = nil (reading from a non-intiailized local variable to that same local variable). We would argue it's more intuitive if it prints 42, because it = it is creating a new local variable from the result of reading the it parameter (before it gets "shadowed" by a local variable of the same name), which is 42. If developers want the the nil behaviour, they can explicitly write it = nil instead.

Updated by mame (Yusuke Endoh) 2 days ago

@vinistock To clarify, do you want to change the scope of all variables, or only for it?

If you meant all variables, it would create a major incompatibility.
In an assignment foo = expr, the variable foo must be available in expr.
Changing this so that a variable is only defined lexically after expr would break this:

fact = -> n do
  # This call would be invalid because 'fact' is not available here
  fact.call(n - 1)
end

Changing the scope of only a variable named it seems like an ad-hoc rule.
The syntax would be very hard to explain.
I think it would be far simpler to forbid it as a local variable name, though the compatibility implications would need to be carefully considered.

Updated by vinistock (Vinicius Stock) 2 days ago

No, I don't mean all variables. However, the it case feels unintuitive because it refers to the first argument of the block. Take this other example here:

42.tap { it = it; p it }
42.tap { |x| x = x; p x }

To me, these two are exact equivalents. You are storing the first argument of the block in a local variable that shadows the name of that argument. Right now, parse.y returns different results for these two cases, which to me is quite surprising. If you run that snippet with parse.y, you get nil and 42.

The way I reason about this code is as follows:

42.tap do
  it = it
 #     ^ First step: read the value of `it`, which is the block's argument and therefore is equal to 42
 # ^     Second step: declare a local variable called `it` and store the value of the right hand side (which is 42)
  p it
 #  ^    Third step: at this point `it` no longer refers to the special variable `it`, but to a local variable `it`, for which the value is currently 42
end

I think it would be far simpler to forbid it as a local variable name, though the compatibility implications would need to be carefully considered.

Maybe this is a better path forward as these edge cases can be confusing. That said, I still think that the current behaviour in Prism is more intuitive and consistent.

Updated by jeremyevans0 (Jeremy Evans) 2 days ago · Edited

vinistock (Vinicius Stock) wrote in #note-15:

The way I reason about this code is as follows:

42.tap do
  it = it
 #     ^ First step: read the value of `it`, which is the block's argument and therefore is equal to 42
 # ^     Second step: declare a local variable called `it` and store the value of the right hand side (which is 42)
  p it
 #  ^    Third step: at this point `it` no longer refers to the special variable `it`, but to a local variable `it`, for which the value is currently 42
end

This isn't how Ruby works in general. Ruby is parsed left-to-right. When it comes across a local variable declaration, it registers the local variable. If Ruby comes across that identifier later in the same local variable scope (even the RHS of the same assignment expression), Ruby treats the identifier as a local variable reference instead of a method call.

Consider:

def x = 1
x = x
x # => nil

x is nil in the above example because by the time the RHS of the assignment is parsed, x is a local variable, so it is not treated as a method call.

When parsing it inside a block, it is not considered a local variable until it is referenced, and it as a user-defined local variable has precedence over it as a block parameter. You can tell this from the following code:

it = 1
proc{it}.call(2) # 1, not 2

Since it is declared as a user-defined variable on the LHS of the assignment before it is referenced on the RHS of the assignment, it is a user-defined local variable and not the implicit block parameter on the RHS on the assignment, and therefore should have value nil, just as any user-defined local variable would.

I think it would be far simpler to forbid it as a local variable name, though the compatibility implications would need to be carefully considered.

Maybe this is a better path forward as these edge cases can be confusing. That said, I still think that the current behaviour in Prism is more intuitive and consistent.

It think it is inconsistent with the rest of Ruby, and a bug in Prism. Consider the VM instructions:

For it = it outside of a block, both parse.y and Prism, and for it = it inside a block in parse.y:

local table (size: 1, argc: 0 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
[ 1] it@0
0000 getlocal_WC_0                          it@0                      (   1)[Li]
0002 leave

For it = it instead of a block in Prism:

local table (size: 2, argc: 1 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
[ 2] ?@0<AmbiguousArg>[ 1] it@1
0000 getlocal_WC_0                          ?@0                       (   1)[LiBc]
0002 dup
0003 setlocal_WC_0                          it@1

You could argue the Prism behavior is more useful. But if you want to make it consistent, you would have to change Ruby so that local variables are not in scope until after they are assigned, instead of after they are declared. That would break a lot of Ruby code.

Updated by jeremyevans0 (Jeremy Evans) 2 days ago · Edited

[deleted, duplicate]

Updated by vinistock (Vinicius Stock) 2 days ago

Thanks for the context, I understand your point about consistency. So this case

42.tap { |x| x = x; p x }

Works because by the time we reach x = x, the block parameter already declared x as a local and so it doesn't end up setting it equal to nil. Would it make sense to automatically declare it as a local inside the block?

Or another question, do we agree that these two should return the same result?

42.tap { it = it; p it }
42.tap { |x| x = x; p x }

Updated by AMomchilov (Alexander Momchilov) 2 days ago · Edited

@jeremyevans0 (Jeremy Evans) I see your point about how ivars take precedence over methods of the same name, but I think it is more like a block argument (well, a local variable) than a method, so I find this inconsistently really surprising:

ruby --parser=parse.y -e "42.tap {      it = it; p it }" # => nil
ruby --parser=parse.y -e "42.tap { |it| it = it; p it }" # => 42

Since it is just a shorthand for "the first block argument," then I think people would expect that it works as if you declared |it| yourself.

Updated by jeremyevans0 (Jeremy Evans) 2 days ago

vinistock (Vinicius Stock) wrote in #note-18:

Thanks for the context, I understand your point about consistency. So this case

42.tap { |x| x = x; p x }

Works because by the time we reach x = x, the block parameter already declared x as a local and so it doesn't end up setting it equal to nil. Would it make sense to automatically declare it as a local inside the block?

Or another question, do we agree that these two should return the same result?

42.tap { it = it; p it }
42.tap { |x| x = x; p x }

No. it is not a local variable until it is referenced, so the first example should always print nil (unless it was already a local variable in higher scope). Everywhere else in Ruby:

lvar = lvar

Will result in lvar being nil if was not already a local variable in scope.

To think that 42.tap { it = it; p it } should print 42 is to believe that it is a local variable before it is referenced. However, if it is never referenced, then it is never a local variable, and the block does not accept an argument:

proc{}.arity   # => 0
proc{it}.arity # => 1

You could argue that it is a local variable from the start of the block instead of where it is first referenced in the block. However, that's not how local variables work in general. Every other place in Ruby where a local variable is declared (referenced in the LHS of an assignment), it is only considered declared after the reference, not before the reference:

def x = 1
def y
  x # => 1
  x = x
  x # => nil
end

If it should be considered a local variable from the start of the block if it is referenced anywhere in the block, then in the above example, the first reference to x inside y should be a local variable reference instead of a method call, even though the x local variable is not declared until afterward.

In general, you shouldn't consider it to be a local variable. If it was a local variable, this would work:

proc{eval("it")}.call(1) # NameError

But it doesn't work, even if it is already referenced:

proc{it; eval("it")}.call(1) # NameError

If at the point of reference, it is already a local variable, Ruby treats it as it would any local variable. If it is not already a local variable, then Ruby treats it as the implicit block parameter. Inside a block, for it = it, the RHS it was previously declared as a local variable by the LHS, so it is treated as a local variable instead of as the implicit block parameter.

AMomchilov (Alexander Momchilov) wrote in #note-19:

@jeremyevans0 (Jeremy Evans) I see your point about how ivars take precedence over methods of the same name, but I think it is more like a block argument (well, a local variable) than a method, so I find this inconsistently really surprising:

ruby --parser=parse.y -e "42.tap {      it = it; p it }" # => nil
ruby --parser=parse.y -e "42.tap { |it| it = it; p it }" # => 42

Since it is just a shorthand for "the first block argument," then I think people would expect that it works as if you declared |it| yourself.

They may. However, doing so is inconsistent with how local variables are designed in Ruby. Personally, I think consistency is much more important in this case. I cannot think of a good reason to define an it local variable while also using it as the implicit block parameter. If you are only using it in the same block scope, no local variable is necessary. If you are setting it to a local variable so it is usable inside a nested block, you should pick a different variable name, as using it is more likely to cause confusion.

Updated by mame (Yusuke Endoh) 1 day ago

AMomchilov (Alexander Momchilov) wrote in #note-19:

Since it is just a shorthand for "the first block argument," then I think people would expect that it works as if you declared |it| yourself.

I can understand that expectation. However, this would mean executing an assignment to the local variable it on every block evaluation. For this performance reason, @ko1 (Koichi Sasada) is against it, as mentioned here: https://bugs.ruby-lang.org/issues/20965#note-7

jeremyevans0 (Jeremy Evans) wrote in #note-20:

Personally, I think consistency is much more important in this case.

I agree with Jeremy.

While we could debate the "intuitive" behavior of code like { it = it; p it }, I believe no one would recommend such code. Therefore, I don't see the need to make it behave "intuitively."

IMO, it should be used only for very small, obvious, single-line blocks, like { it + 1 } or { foo(it) }. I believe that any block containing an assignment should use an explicit parameter name instead of it.

Furthermore, Matz said "Assignment to it should be prohibited in the long run." (This is a difficult decision, however, as an amount of existing code uses it as a local variable name.)

Considering all the above, I believe it is better to maintain consistency with the existing semantics of x = x, rather than introducing an ad-hoc syntactic exception to satisfy an "intuitive" expectation for a piece of code that is itself not recommended.

Actions

Also available in: Atom PDF

Like0
Like0Like0Like0Like0Like0Like0Like0Like0Like1Like0Like0Like1Like1Like0Like0Like0Like0Like0Like0Like0Like0