Bug #21618
openAllow to use the build-in prism version to parse code
Description
Prism is a multi-version parser, which is a great feature.
If one calls Prism.parse("foo")
, it will always use the latest version prism knows about. This may or may not be the version that is currently executing. This is problematic when one wants to introspect code the same way that the currently running ruby version would.
Consider that some syntax will be disallowed in the next ruby version. I would have to specify the ruby version I want to parse as in order to not run into syntax errors: Prism.parse("foo", version: RUBY_VERSION)
. But doing so is not feasable because prism
is distributed out of sync with ruby itself. If the user already has prism in their lockfile, the user may run a prism version that doesn't yet know about lets say ruby 4.0 and thus raise. Similarly, it may parse as an older patch version that has subtle behaviour differences.
ripper
does not have this issue because it is always tied to the ruby version and it is not distributed as a gem, so what the user has will always be exactly what ruby shipped with. I wish for a similar API that utilizes the prism version bundled with ruby itself.
Libraries like rails
have moved from ripper to prism because of its superior developer experience but it risks breaking in unexpected ways with new prism
versions that know about more recent syntax. error_highlight
for example also suffers from the same defect.
It seems like prism currently has 34 methods that optionally take a version (per the rbi file). Many of these are trivial helper methods like Prism.parse_success?
(does the parse result have any errors?). I would be happy with supporting only the minimal functions like Prism.parse
and Prism.parse_file
. Currently I don't have a use-case for any of the other methods. Pretty much just functions to mirror RubyVM::InstructionSequence.compile_file
and RubyVM::InstructionSequence.compile
.
It would be optimal if I would be able to transparently call Prism.parse("foo", version: "current")
(or maybe even have an unspecified version mean the build-in prism version) but I am not sure how feasable that is since I'm pretty sure logic for this would have to live in ruby itself.
@kddnewton (Kevin Newton) does this feature request make sense? Do you have any other ideas/inputs?
Updated by tenderlovemaking (Aaron Patterson) 23 days ago
I think this is a good idea. I recall discussing this type of feature with @kddnewton (Kevin Newton) at one point. He had some ideas about implementing it, but I don't recall the exact ideas. The use case seems very clear though.
Updated by kddnewton (Kevin Newton) 23 days ago
Yes, this was brought up by @k0kubun (Takashi Kokubun) originally as one of the requirements. At the moment the official answer is Prism.parse_file(filepath, version: RUBY_VERSION)
, because that will work for all Ruby versions in normal maintenance right now.
That being said, I do think it would make sense to support a feature like this going forward, it just becomes difficult for other implementations, depending on how they ship Prism. It might make sense for it to be RubyVM.parse
to make it clear it's just for CRuby. I'm not sure if @Eregon (Benoit Daloze) or @enebo would have thoughts on this.
Updated by Earlopain (Earlopain _) 23 days ago
· Edited
RubyVM.parse
doesn't sound particularly nice to me. If you put it there then there have to be defined?
checks and fallback implementations for non-CRuby which to me kinda defeats the point of using prism in the first place. I would massively prefer for it to be part of the stable/shared api, if at all feasable.
Of course, I don't have any better suggestions at the moment.
Updated by kddnewton (Kevin Newton) 23 days ago
So the issue is that there is the C library and then there is the Ruby library. The C library can parse and provide an AST in C structures, but to reify it into Ruby objects requires a specific Ruby library. This is possible, but only if the runtime ships the same matching version for both. I don't think any other implementation does this, because they only need the C library + FFI (JRuby/TruffleRuby/Garnet) or the C library + native bindings (Natalie/Opal).
Of course it is always possible to include methods that are just unimplemented for other runtimes, it just becomes less portable.
Updated by Eregon (Benoit Daloze) 21 days ago
Why is Prism.parse_file(filepath, version: RUBY_VERSION)
not good enough?
In which concrete scenarios does it not work?
The prism
default gem shipped with Ruby 3.3+ will always support version: RUBY_VERSION
.
Another prism
version might be used when using bundler
or installing a newer prism
(e.g. gem install prism
).
If that prism version is newer, version: RUBY_VERSION
will work.
If that prism version is older, it's possible it doesn't recognize the RUBY_VERSION.
That case though could be seen as "that old prism doesn't support a recent Ruby", i.e. the prism dependency should be updated to be less old.
In that sense it's not different than many other gems with older versions which might not work on the newest Ruby version.
Another POV here is: either prism
is not specified in the Gemfile, and version: RUBY_VERSION
will always work,
or it's specified in the Gemfile and then it's the responsibility of the maintainers of that Gemfile to keep it recent enough, e.g. by not using a Gemfile.lock
or by regularly updating the Gemfile.lcok
, at least when supported a new Ruby version (which is the same workflow as for other gem dependencies).
I'm guessing what you are asking is to get the Prism Ruby API for the prism C parser used by the Ruby interpreter to parse Ruby code?
I think that cannot be provided easily, because it would conflict with the prism gem, of which another version might be loaded.
I suppose one could try to nest that "current Ruby" Prism Ruby API under a different module than ::Prism
, but it feels suboptimal and complicated, I think there is not a strong enough need to justify such complexity.
Updated by Eregon (Benoit Daloze) 21 days ago
From the description it seems a (the?) main concern is that Prism.parse(code)
etc use the latest version by default and not version: RUBY_VERSION
by default.
I share that concern, that feels error-prone.
Is there a use case for the default being latest?
I think not, usages would either want the current Ruby syntax or a specific one and then they would specify it explicitly (like in RuboCop).
So how about changing the default parse/syntax version of Prism then to be RUBY_VERSION?
Updated by Earlopain (Earlopain _) 21 days ago
I'm guessing what you are asking is to get the Prism Ruby API for the prism C parser used by the Ruby interpreter to parse Ruby code?
Ripper for all its faults was able to do just that. I understand that there are implementation concerns, I am just asking for something to make that possible again. I don't think parsing exactly (not just almost the same) as the ruby I am running as is an unreasonable request.
You should always bump all gems on ruby version updates. But prism is not patch level aware (that is fine) so even if you do RUBY_VERSION
it is not perfect. 3.2.0 I think had a bug with nested anonymous block parameters that had a pretty big impact. Now I am required to also bump prism with every patch version. If I run older patch versions, I would be have to run older prism versions from around the time my patch version was released.
From the description it seems a (the?) main concern is that Prism.parse(code) etc use the latest version by default
Yeah, I am not a fan of that behaviour as well. I guess this is intential, similar to how the unversioned parser translator class also uses the latest version. But changing that was not really what I was asking for.
To be clear, patch versions of ruby should never contain any big syntax changes. The impact of using any prism version that understands my major/minor version should be minimal in the majority of cases. But, I would prefer to just have something where I don't even have to worry about such things.
Prism.parse("foo", version: RUBY_VERSION)
works good enough but differently to the real intention to the point where it makes me uncomfortable.
Updated by Eregon (Benoit Daloze) 20 days ago
Earlopain (Earlopain _) wrote in #note-7:
I don't think parsing exactly (not just almost the same) as the ruby I am running as is an unreasonable request.
I think they are exactly the same, and if it's not the case that's a bug we can fix in Prism.
I think it is unreasonable to have a duplicate Ruby API just for this under another module (BTW it's quite a lot of classes and memory footprint, not great to have effectively 2 copies of Prism in the same process), + namespacing functions at the C level too (otherwise we get native symbol conflicts), that's a huge amount of complications for as far as I can see no useful difference with using version: RUBY_VERSION
.
There is no possible way to share Prism node classes, as e.g. newer/older versions might have extra node classes.
But prism is not patch level aware (that is fine) so even if you do
RUBY_VERSION
it is not perfect. 3.2.0 I think had a bug with nested anonymous block parameters that had a pretty big impact.
I think if there is a relevant difference between Ruby X.Y.Z and X.Y.(Z+1) then Prism could/should take that into account, since it already takes RUBY_VERSION
as the input.
I would guess though the only differences are bug fixes, and then is it really useful to have the buggy behavior on the buggy Ruby, isn't it better to just have the bug-fixed behavior by ensuring to use a new enough prism
gem?
BTW do you have a link for that bug?
That's before Prism became a default gem though, so of course it had less testing and usage back then.
Now I am required to also bump prism with every patch version. If I run older patch versions, I would be have to run older prism versions from around the time my patch version was released.
Only if there is a such a bug and it matters for your usage and you don't use the prism default gem (since that would have the fix).
That's a lot of if
, IOW I think it's so rare in practice it doesn't matter.
It's also a totally normal workflow to update a gem to get a bug fix.
In such a scenario it's also clearly better to use a newer prism >=A.B.C as a gem/bundle dependency than to get the broken behavior on Ruby 3.2.0 and the correct behavior on 3.2.1.
From the description it seems a (the?) main concern is that Prism.parse(code) etc use the latest version by default
Yeah, I am not a fan of that behaviour as well. I guess this is intential, similar to how the unversioned parser translator class also uses the latest version. But changing that was not really what I was asking for.
I think it's historical at this point, and maybe comes from Prism initially only having one version, but @kddnewton (Kevin Newton) would know better.
My guess is changing the default here would fix more issues than cause incompatibilities.
To be clear, patch versions of ruby should never contain any big syntax changes. The impact of using any prism version that understands my major/minor version should be minimal in the majority of cases. But, I would prefer to just have something where I don't even have to worry about such things.
Prism.parse("foo", version: RUBY_VERSION)
works good enough but differently to the real intention to the point where it makes me uncomfortable.
I think the solution there is, if there is a relevant difference between Ruby X.Y.Z and X.Y.(Z+1) then Prism should take that into account, and then it matches the intention exactly.
Another thought is if you really want to ensure you use the exact same Prism version as the one used by the interpreter, you could check that as long as that default version version is exposed somehow (e.g. it is though gemspecs). I don't think that's great though then, because it prevents other gems to use a newer Prism version (which would work just fine for your use case too).
Updated by Eregon (Benoit Daloze) 20 days ago
Taking the concerns from the description:
Earlopain (Earlopain _) wrote:
Consider that some syntax will be disallowed in the next ruby version. I would have to specify the ruby version I want to parse as in order to not run into syntax errors:
Prism.parse("foo", version: RUBY_VERSION)
. But doing so is not feasable becauseprism
is distributed out of sync with ruby itself.
If the user already has prism in their lockfile, the user may run a prism version that doesn't yet know about lets say ruby 4.0 and thus raise.
If it's locked to an older prism in lockfile, it's expected to not work. Same as any other gem really. Need to update in lockfile.
Similarly, it may parse as an older patch version that has subtle behaviour differences.
If the differences are relevant those could be handled in Prism. They are probably just bug fixes and if so then the fixed behavior seems always best.
IOW, I think they are theoretical concerns.
Is there a concrete realistic problem with using Prism.parse_file(filepath, version: RUBY_VERSION)
where the solution is not trivial?
Updated by Earlopain (Earlopain _) 20 days ago
I think they are exactly the same, and if it's not the case that's a bug we can fix in Prism.
It's already a non-trivial amount of effort to provide major/minor support. Taking patch into account is unreasonable in my eyes.
Accepting X.Y.Z as an input is just for convenience.
BTW do you have a link for that bug?
It was 3.3.0: https://bugs.ruby-lang.org/issues/20090
My guess is changing the default here would fix more issues than cause incompatibilities.
I don't think changing the default would be much of an improvement. You can't tell what people really want. I would prefer no default at all but that ship has sailed.
IOW, I think they are theoretical concerns.
They are. With ripper they were none, which is mostly what I am saying. Prism is otherwise better in every way.
For now, I will just update places to use RUBY_VERSION
since that is clearly more correct that not passing anything at all.
I don't think there would be much discussion if it were trivial to implement. It's ok to reject based on that. RUBY_VERSION
gets you 99% there but it would be nice if it could be 100%.
Updated by Eregon (Benoit Daloze) 18 days ago
· Edited
Earlopain (Earlopain _) wrote in #note-10:
I don't think changing the default would be much of an improvement. You can't tell what people really want.
I think we can establish it's only 2 use cases:
- either they want the same as the running Ruby version, i.e they just want to parse code like the current Ruby parses code (and that's why I think that should be the default),
- or they want some specific other version, in which case they really need to specify it (e.g. "3.4").
I don't know of a realistic use case to parse with "latest version supported by Prism" on an older Ruby (especially considering that changes based on the Prism version),
that seems basically always an error to me (though it might accidentally work good enough for a while), but maybe I'm missing something?
So with that I think it means the default of "latest" is basically of no use, and therefore changing that to "current" fixes it for the first use case (probably the most common), where one expects Prism.parse(code)
to parse code with the same syntax version as RUBY_VERSION.
For the second use case one needs to be explicit anyway, so the default doesn't matter there.
For now, I will just update places to use
RUBY_VERSION
since that is clearly more correct that not passing anything at all.
I think this is the way to go. If there is practical issue with that let's reconsider what to do to address it.
Updated by kddnewton (Kevin Newton) 15 days ago
I don't think these are theoretical concerns at all — this issue has bothered me since we shipped prism initially to be honest. @k0kubun (Takashi Kokubun) put it pretty succinctly in his note on this issue: https://bugs.ruby-lang.org/issues/19772#note-16. We went with option 3, but this ticket is suggesting we go with option 4.
I would very much like to solve this by exposing the exact C API that the current Ruby runtime is exposing, but I haven't quite figured out how yet. The closest I have come to a compromise of the various concerns would be to version the nodes such that we only need a new constant for the ones that change, but I haven't finished thinking about that yet. Like maybe the serialization format contains a version integer at the beginning?
The other issue discussed in here is that prism defaults to the latest Ruby syntax version. That was an intentional choice — the first thing I was building with this was a formatter, and I wanted it to understand the latest syntax no matter what the currently running version of Ruby was. I am actually open to changing this, but it will have to be in a major version, and that's something we can just discuss on the ruby/prism issue tracker instead of here since it doesn't impact ruby/ruby.
Updated by Earlopain (Earlopain _) 15 days ago
They are theoretical for now, at least to me. Perhaps I could find something if I really tried right now but I don't think that would be time well spent.
Right now we are actually at option 5 since prism can parse multiple versions of ruby syntax. If it where option 3, that would be a rather big/majorly inconvenient problem for the described use-case of that comment you linked. It's also the use-case I openend this issue with.
This is just about patch-level compatibility. Parser::CurrentRuby
for example also only provides major/minor parsing.
It's pretty ok the way it is right now already, as long as you don't do Prism.parse
without a version and expect it to use the current ruby instead.
The other issue discussed in here is that prism defaults to the latest Ruby syntax version
I am fine with either using the latest or the current ruby as the default. I think such an API would actually be better served without any default though. But yeah, we don't have to discuss that here.
If Prism.parse(foo, version: "current_ruby")
or similar would be possible, that'd be great actually. Mostly I have complained about outdated gems in the lockfile etc., with that prism itself can raise a very descriptive error if that happens.