Feature #19719
closedUniversal Parser
Description
Background¶
There are three use cases where we need a CRuby parser which is independent of other CRuby features like Object, GC. I call such a parser a Universal Parser.
- Use Universal Parser from Ruby applications.
For example RuboCop. RuboCop needs a parser to get AST of source code. Currently RuboCop uses parser gem. In this sense Universal Parser is a replacement of parser gem. - Use Universal Parser from C/C++ or other language.
For example Sorbet. Sorbet is written in C++. It has its own parser. Universal Parser can be used in such scenario. - Use Universal Parser for other Ruby implementations.
mruby, JRuby and other Ruby implementations will use Universal Parser so that they don’t need to develop & manage their own parser.
Design¶
- Implement Universal Parser by changing CRuby source code, especially parse.y and node.c.
IntroduceUNIVERSAL_PARSER
macro and implement Universal Parser by passing all necessary CRuby related functions viastruct rb_parser_config_struct
. In this step there are two build modes with/without Universal Parser. - Reduce CRuby related functions passed by
struct rb_parser_config_struct
. Some of them are copied into parse.y, e.g.rb_isspace
. Other are reimplemented, e.g.NODE_LIT
has new String struct instead ofVALUE
. - Once CRuby related functions needed for Universal Parser are minimized, replace rest CRuby function references with functions provided by
struct rb_parser_config_struct
and removeUNIVERSAL_PARSER
macro.
Release management¶
There are three options for releasing a binary for Universal Parser (“librubyparser”).
- Release it as so/dll
a. Include it into Ruby release process
b. Create another repository for release management of "librubyparser" - Release it as gem (ruby/universal_parser)
"librubyparser" has only pure C interface and data structure. If you want to use it from Ruby code, you need C extension code to translate "librubyparser" data and Ruby objects. I propose to create "universal_parser" gem for this purpose.
I prefer #1-b to #1-a because it doesn’t increase tasks for Ruby release process. I want to make #2 as a first milestone.
Updated by ioquatix (Samuel Williams) over 1 year ago
Sorry for bikeshedding, but can we please avoid using _struct
on struct tag names? It's already clear it's a struct
because you must always include the struct
keyword, i.e. struct rb_parser_config
. struct rb_parser_config
does not introduce a bare type named rb_parser_config_struct
.
Updated by Eregon (Benoit Daloze) over 1 year ago
As I discussed with @matz (Yukihiro Matsumoto) and @yui-knk (Kaneko Yuichiro) at RubyKaigi, the API, serialization to convert efficiently from C to Java, and AST format (a much cleaner AST) needed for JRuby and TruffleRuby are the ones developed in YARP.
JRuby and TruffleRuby are already working on migrating to YARP.
So for use case 3, I see this only achievable if the Universal Parser would match exactly what YARP does (and matz agreed Universal Parser should match YARP's API/AST).
So I am confused, why trying to reinvent YARP then?
Also none of the design steps mention how to make the AST any easier to consume (relevant for all 3 use cases), it is still the internal messy AST with weird abbreviations created by bison/lrama and type-unsafe RNode
unions.
That is not usable by other Ruby implementations and by tools, just like RubyVM::AbstractSyntaxTree
is not a proper API (#14844).
Also I think to pretend to be a "Universal Parser" it would need support and extended discussions with Ruby implementations authors (not just CRuby) and tool authors.
We probably need to have a discussion on the bug tracker about pros/cons of YARP and lrama Universal Parser.
First of all, I think @yui-knk (Kaneko Yuichiro) should show the status of Universal Parser and how it improves over YARP, otherwise it seems to purposefully ignore a much more advanced and established project (YARP).
Updated by jeremyevans0 (Jeremy Evans) over 1 year ago
Eregon (Benoit Daloze) wrote in #note-2:
First of all, I think @yui-knk (Kaneko Yuichiro) should show the status of Universal Parser and how it improves over YARP, otherwise it seems to purposefully ignore a much more advanced and established project (YARP).
As Universal Parser is based on lrama, and my understanding is lrama is 100% compatible and YARP is not yet 100% compatible (though it is getting close), that seems like it could be a major advantage for Universal Parser.
Both @yui-knk (Kaneko Yuichiro) and the YARP team are doing great work on a project of great difficulty (parsing Ruby). Let's try to use more cooperative and less antagonistic and prejudicial language when discussing this issue.
Updated by hsbt (Hiroshi SHIBATA) over 1 year ago
Release it as gem (ruby/universal_parser)
I'll support it. Please ping me when you need to create new repository under the ruby organization.
Updated by matz (Yukihiro Matsumoto) over 1 year ago
I agree API-wise, YARP is better, and already used from various tools, including TruffleRuby. On the other hand, Lrama is 100% compatible to the current parser (by definition).
As long as kaneko-san (@yui-knk (Kaneko Yuichiro)) is willing to try, I'd like to allow him to experiment. Of course, we should keep compatibility during the experiment.
In any case, the final API part (including AST) should be taken from YARP (or at least compatible to YARP).
Matz.
Updated by yui-knk (Kaneko Yuichiro) over 1 year ago
- Status changed from Open to Closed
Applied in changeset git|b481b673d753339204290d7582dbb91a6e14447a.
[Feature #19719] Universal Parser
Introduce Universal Parser mode for the parser.
This commit includes these changes:
- Introduce
UNIVERSAL_PARSER
macro. All of CRuby related functions
are passed viastruct rb_parser_config_struct
when this macro is enabled. - Add CI task with 'cppflags=-DUNIVERSAL_PARSER' for ubuntu.
Updated by Eregon (Benoit Daloze) over 1 year ago
IMO it's quite strange and not respectful of the Ruby development workflow to:
- create an issue about a big feature but discuss literally none of the details. Notably the background, design and release management sections are all completely unrelated, showing there are many holes in this issue description and maybe in the design of this feature.
- open this issue on the day of the meeting, at the last minute
- merge large code changes (+ 6607 lines) with 0 reviews, 0 comments, 0 design discussions
It feels like sneaking in changes to me.
Also while replacing bison and making the parser more independent of CRuby internals is useful, who is going to maintain all that extra code?
Right now I don't see anyone happy to maintain either of those, except Kaneko-san, which of course might not always be available to fix/evolve things.
Maintainability seems a crucial aspect that was not discussed at all here.
For instance, it seems pretty clear to me that Universal Parser wouldn't improve maintainability of the parser significantly (parse.y has many maintainability issues, I think this wouldn't solve most of them).
It would certainly be good if more committers and contributors would feel comfortable modifying the parser, right now it seems everyone avoids it as much as possible, and I don't see Universal Parser helping there.