Feature #20257
closedRearchitect Ripper
Description
Abstract¶
Rearchitect Ripper to provide whole semantic analysis support for Ripper and improve maintainability of Ripper.
This rearchitecture is achieved by modifying Lrama parser generator.
Background and problem¶
Ripper is used for parsing ruby code, for example irb and rdoc use Ripper for parsing source codes.
Ripper and Ruby parser share the algorithm of parsing, however internal logic of Ripper is different from parser.
The differences cause three problems:
- Ripper can not execute some semantic analysis. https://bugs.ruby-lang.org/issues/10436 is an example of this limitaion.
m(&nil) {}
raises syntax error butRipper.sexp("m(&nil) {}")
doesn't. - Ripper can not recognize regexp named capture. https://bugs.ruby-lang.org/issues/18988 is an example.
- Makes prase.y complex. For example, the implementation of
new_array_pattern
is completely different between parser and ripper.
These problems come from the fact parser and Ripper use semantic value stack differently.
Parser stores Node on the stack in many rules but Ripper stores Ruby Object returned by callback method.
Therefore Ripper can not execute semantic analysis which requires Node (#1).
Values on the stack are different then they need to implement same name functions differently (#3).
This leads different behavior like #2 because they have different match_op
function.
Proposal¶
Introduce new semantic value stack for Ripper so that Ripper can manage both Node and Ruby Object separately.
Lrama will provide some callback entry points and new special variable for actions.
Lrama will support these callback directives, specified function is called when the event happens
- %after-shift function_name
- %before-reduce function_name
- %after-reduce function_name
- %after-shift-error-token function_name
- %after-pop-stack function_name
Lrama also provides $:n
variable to access index of each grammar symbols. The variable is translated to the minus index from the top of the stack.
For example
primary: k_if expr_value then compstmt if_tail k_end
{
/*% ripper: if!($:2, $:4, $:5) %*/
/* $:2 = -5, $:4 = -3, $:5 = -2. */
}
We can implement separated stack for Ruby Object by these features.
Implementation note¶
New fields of struct parser_params¶
-
VALUE s_value
: Holds Ruby Object returned by Ripper callback method call. -
VALUE s_lvalue
: Holds Ruby Object responding to LHS of the rule. -
VALUE s_value_stack
: Stack for Ruby Object. It's actually ruby array.
These fields are added only when it's Ripper.
The role of callback functions¶
- %after-shift: Push
s_value
tos_value_stack
. - %before-reduce: Assign the first Ruby Object of RHS to
s_lvalue
(similar with$$ = $1
). - %after-reduce: Pop
s_value_stack
rhs.len
times then pushs_lvalue
tos_value_stack
. - %after-shift-error-token: Push
nil
tos_value_stack
. Thisnil
stands forerror
token. - %after-pop-stack: Pop
s_value_stack
len
times. This is needed to emulate panic mode.
Achievement¶
These bugs are fixed.
- Bug 10436 "ruby -c and ripper inconsistency: m(&nil) {}"
- Bug 18988 "Ripper cannot parse some code that has regexp named capture"
-
Bug 20055 "Ripper seems to skip some checks like
void value expression
andduplicated variable name
"
This means all of Ripper open bugs tickets which are related current architecture will be closed.
Links¶
- Lrama Implementaion: https://github.com/ruby/lrama/pull/367
- Ruby Implementaion: https://github.com/ruby/ruby/pull/9923