Feature #20257

closed

Rearchitect Ripper

Added by yui-knk (Kaneko Yuichiro) 10 months ago. Updated 9 months ago.

Status:
Closed
Assignee:
-
Target version:
-
[ruby-core:116670]

Description

Abstract

Rearchitect Ripper so that it supports the full range of semantic analysis and becomes easier to maintain.
This rearchitecture is achieved by modifying the Lrama parser generator.

Background and problem

Ripper is used for parsing Ruby code; for example, irb and rdoc use Ripper to parse source code.
Ripper and the Ruby parser share the parsing algorithm, but the internal logic of Ripper differs from that of the parser.
The differences cause three problems:

  1. Ripper cannot execute some semantic analysis. https://bugs.ruby-lang.org/issues/10436 is an example of this limitation: m(&nil) {} raises a syntax error but Ripper.sexp("m(&nil) {}") doesn't.
  2. Ripper cannot recognize regexp named captures. https://bugs.ruby-lang.org/issues/18988 is an example.
  3. They make parse.y complex. For example, the implementation of new_array_pattern is completely different between the parser and Ripper.

These problems come from the fact that the parser and Ripper use the semantic value stack differently.
In many rules the parser stores a Node on the stack, whereas Ripper stores the Ruby Object returned by a callback method.
Therefore Ripper cannot execute semantic analysis that requires a Node (#1).
Because the values on the stack differ, the parser and Ripper have to implement functions of the same name differently (#3).
This leads to different behavior like #2, because they have different match_op functions.
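
Problem #1 can be observed directly from Ruby. The following is a minimal sketch, assuming CRuby (RubyVM::InstructionSequence is CRuby-specific); parser_accepts? and ripper_accepts? are hypothetical helper names, not part of any library. On a Ruby that predates this change the two answers are expected to disagree for m(&nil) {}; on a Ruby that includes it, they should agree.

```ruby
require "ripper"

# Hypothetical helper: true when the Ruby parser accepts the source.
def parser_accepts?(source)
  RubyVM::InstructionSequence.compile(source)
  true
rescue SyntaxError
  false
end

# Hypothetical helper: Ripper.sexp returns nil when it detects a parse error.
def ripper_accepts?(source)
  !Ripper.sexp(source).nil?
end

source = "m(&nil) {}"
puts "parser accepts: #{parser_accepts?(source)}"
puts "ripper accepts: #{ripper_accepts?(source)}"
```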

Proposal

Introduce a new semantic value stack for Ripper so that Ripper can manage both Node and Ruby Object separately.
Lrama will provide some callback entry points and a new special variable for actions.

Lrama will support these callback directives; the specified function is called when the corresponding event happens:

  • %after-shift function_name
  • %before-reduce function_name
  • %after-reduce function_name
  • %after-shift-error-token function_name
  • %after-pop-stack function_name

Lrama also provides the $:n variable to access the index of each grammar symbol. The variable is translated to a negative index counted from the top of the stack.
For example:

primary: k_if expr_value then compstmt if_tail k_end
          {
          /*% ripper: if!($:2, $:4, $:5) %*/
          /* $:2 = -5, $:4 = -3, $:5 = -2. */
          }

With these features we can implement a separate stack for Ruby Objects.
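
The index translation in the example above can be reproduced with a small calculation. This is only a sketch inferred from the numbers in the comment ($:2 = -5, $:4 = -3, $:5 = -2 for a rule with six right-hand-side symbols); stack_offset is a hypothetical name and is not Lrama's actual implementation.

```ruby
# Hypothetical helper: translate $:position (1-based position within the
# right-hand side) into a negative offset from the top of the stack,
# where -1 is the topmost element.
def stack_offset(position, rhs_length)
  unless (1..rhs_length).cover?(position)
    raise ArgumentError, "position must be within 1..#{rhs_length}"
  end
  position - rhs_length - 1
end

# primary: k_if expr_value then compstmt if_tail k_end  (six symbols)
[2, 4, 5].each do |n|
  puts "$:#{n} = #{stack_offset(n, 6)}"
end
```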

Implementation note

New fields of struct parser_params

  • VALUE s_value: Holds the Ruby Object returned by a Ripper callback method call.
  • VALUE s_lvalue: Holds the Ruby Object corresponding to the LHS of the rule.
  • VALUE s_value_stack: Stack for Ruby Objects. It is actually a Ruby Array.

These fields are added only for Ripper.

The role of callback functions

  • %after-shift: Push s_value to s_value_stack.
  • %before-reduce: Assign the first Ruby Object of the RHS to s_lvalue (similar to $$ = $1).
  • %after-reduce: Pop s_value_stack rhs.len times then push s_lvalue to s_value_stack.
  • %after-shift-error-token: Push nil to s_value_stack. This nil stands for error token.
  • %after-pop-stack: Pop s_value_stack len times. This is needed to emulate panic mode.
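
The roles listed above can be modeled in plain Ruby to see how the stack evolves. This is a rough sketch, not the C implementation in parse.y: the class RipperValueStack and its method names are hypothetical, only the three field names come from the list of new fields, and returning nil for an empty RHS is an assumption.

```ruby
# Hypothetical model of the Ripper-only value stack.
class RipperValueStack
  attr_accessor :s_value, :s_lvalue
  attr_reader :s_value_stack

  def initialize
    @s_value = nil
    @s_lvalue = nil
    @s_value_stack = []
  end

  # %after-shift: push the value produced for the shifted token.
  def after_shift
    @s_value_stack.push(@s_value)
  end

  # %before-reduce: default s_lvalue to the first RHS value (like $$ = $1).
  # Assumption: an empty RHS yields nil.
  def before_reduce(rhs_len)
    @s_lvalue = rhs_len.zero? ? nil : @s_value_stack[-rhs_len]
  end

  # %after-reduce: pop the RHS values, then push the LHS value.
  def after_reduce(rhs_len)
    @s_value_stack.pop(rhs_len)
    @s_value_stack.push(@s_lvalue)
  end

  # %after-shift-error-token: nil stands for the error token.
  def after_shift_error_token
    @s_value_stack.push(nil)
  end

  # %after-pop-stack: discard values during panic-mode recovery.
  def after_pop_stack(len)
    @s_value_stack.pop(len)
  end
end

# Simulate shifting three tokens and reducing them with a three-symbol rule.
stack = RipperValueStack.new
[[:@int, "1"], [:@op, "+"], [:@int, "2"]].each do |token|
  stack.s_value = token
  stack.after_shift
end
stack.before_reduce(3)
# A rule action may overwrite s_lvalue with the result of its callback.
stack.s_lvalue = [:binary, stack.s_value_stack[-3], :+, stack.s_value_stack[-1]]
stack.after_reduce(3)
p stack.s_value_stack
```

Keeping this stack separate leaves the parser's own semantic value stack free to hold Nodes, which is what the semantic analysis requires.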

Achievement

The following bugs are fixed:

  • Bug 10436 "ruby -c and ripper inconsistency: m(&nil) {}"
  • Bug 18988 "Ripper cannot parse some code that has regexp named capture"
  • Bug 20055 "Ripper seems to skip some checks like void value expression and duplicated variable name"

This means that all open Ripper bug tickets related to the current architecture will be closed.
