Feature #19070
Enhance keep_tokens option for RubyVM::AbstractSyntaxTree parsing methods
Status: Closed
Description
Background
Implementations of the Language Server Protocol (LSP) sometimes need token information. For example, `m(1)` and `m(1, )` have the same AST structure apart from node locations, so it is impossible to detect the trailing `,` from the AST alone. However, in the latter case it might be better to suggest a list of variables for the second argument. Token information is important in such cases.
Example
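```ruby
require "pp"

# keep_tokens: requires Ruby 3.2 or later
node = RubyVM::AbstractSyntaxTree.parse(<<~STR, keep_tokens: true)
  def m(a, b = 1, *rest, &block)
  end

  m(1, )
STR

# node is the top-level SCOPE; children[2] is its body, a BLOCK of two statements
defn  = node.children[2].children[0] # DEFN for "def m ... end"
fcall = node.children[2].children[1] # FCALL for "m(1, )"

puts "defn.tokens"
pp defn.tokens
puts "\n\n"
puts "fcall.tokens"
pp fcall.tokens
puts "\n\n"

# Joining the token strings (index 2) reproduces each node's source text
puts defn.tokens.map{_1[2]}.join
puts fcall.tokens.map{_1[2]}.join
```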
The script prints the following output, where each token is [sequence_id, token_type, token_string, [first_line, first_column, last_line, last_column]]:
```
defn.tokens
[[0, :kw, "def", [1, 0, 1, 3]],
 [1, :sp, " ", [1, 3, 1, 4]],
 [2, :ident, "m", [1, 4, 1, 5]],
 [3, :lparen, "(", [1, 5, 1, 6]],
 [4, :ident, "a", [1, 6, 1, 7]],
 [5, :comma, ",", [1, 7, 1, 8]],
 [6, :sp, " ", [1, 8, 1, 9]],
 [7, :ident, "b", [1, 9, 1, 10]],
 [8, :sp, " ", [1, 10, 1, 11]],
 [9, :op, "=", [1, 11, 1, 12]],
 [10, :sp, " ", [1, 12, 1, 13]],
 [11, :int, "1", [1, 13, 1, 14]],
 [12, :comma, ",", [1, 14, 1, 15]],
 [13, :sp, " ", [1, 15, 1, 16]],
 [14, :op, "*", [1, 16, 1, 17]],
 [15, :ident, "rest", [1, 17, 1, 21]],
 [16, :comma, ",", [1, 21, 1, 22]],
 [17, :sp, " ", [1, 22, 1, 23]],
 [18, :op, "&", [1, 23, 1, 24]],
 [19, :ident, "block", [1, 24, 1, 29]],
 [20, :rparen, ")", [1, 29, 1, 30]],
 [21, :ignored_nl, "\n", [1, 30, 1, 31]],
 [22, :kw, "end", [2, 0, 2, 3]]]


fcall.tokens
[[25, :ident, "m", [4, 0, 4, 1]],
 [26, :lparen, "(", [4, 1, 4, 2]],
 [27, :int, "1", [4, 2, 4, 3]],
 [28, :comma, ",", [4, 3, 4, 4]],
 [29, :sp, " ", [4, 4, 4, 5]],
 [30, :rparen, ")", [4, 5, 4, 6]]]


def m(a, b = 1, *rest, &block)
end
m(1, )
```
Interface

- Add a `keep_tokens` option to `RubyVM::AbstractSyntaxTree.parse`, `.parse_file` and `.of`.
- Add `RubyVM::AbstractSyntaxTree::Node#tokens`, which returns the tokens for the node, including the tokens of its descendant nodes.
- Add `RubyVM::AbstractSyntaxTree::Node#all_tokens`, which returns all tokens of the input script regardless of the receiver node.
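The difference between the two new methods can be illustrated with a short sketch. It assumes CRuby 3.2 or later and uses `.parse` only, leaving `.parse_file` and `.of` aside. `Node#tokens` is scoped to the receiver's source range, while `Node#all_tokens` returns the same full token list no matter which node receives the call.

```ruby
# Sketch: assumes CRuby >= 3.2.
src = <<~RUBY
  def m(a)
  end
  m(1, )
RUBY

root = RubyVM::AbstractSyntaxTree.parse(src, keep_tokens: true)
defn, fcall = root.children[2].children # the body is a BLOCK holding DEFN and FCALL

# Node#tokens: only the tokens inside the receiver node's location range.
puts fcall.tokens.map { |tok| tok[2] }.join # → m(1, )

# Node#all_tokens: every token of the input, whichever node it is called on.
puts defn.all_tokens == fcall.all_tokens    # → true
puts "#{fcall.tokens.size} of #{root.all_tokens.size} tokens belong to the call"
```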
Implementation
Updated by Eregon (Benoit Daloze) about 2 years ago
Doesn't `Ripper.lex` already provide this information?
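For comparison, `Ripper.lex` (from the ripper standard library) returns a flat list of `[[lineno, column], event, text, state]` entries for the whole script. These entries are not tied to AST nodes, so a caller would have to filter them by node location. The sketch below does that filtering with a hypothetical helper, `ripper_tokens_for`, in the same spirit as `Node#tokens`. It assumes CRuby 3.2 or later and ASCII source, so that Ripper and AST columns agree.

```ruby
require "ripper"

# Sketch: hypothetical helper that keeps the Ripper tokens starting inside a
# node's [first, last) location range. Assumes ASCII source so that Ripper
# and AST columns agree.
def ripper_tokens_for(node, src)
  from = [node.first_lineno, node.first_column]
  to   = [node.last_lineno, node.last_column]
  Ripper.lex(src).select do |(lineno, column), _event, _text, _state|
    pos = [lineno, column]
    (from <=> pos) <= 0 && (pos <=> to) < 0
  end
end

src   = "def m(a)\nend\nm(1, )\n"
fcall = RubyVM::AbstractSyntaxTree.parse(src).children[2].children[1]
p ripper_tokens_for(fcall, src).map { |_pos, event, text, _state| [event, text] }
# → [[:on_ident, "m"], [:on_lparen, "("], [:on_int, "1"], [:on_comma, ","], [:on_sp, " "], [:on_rparen, ")"]]
```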
Updated by matz (Yukihiro Matsumoto) about 2 years ago
Sounds OK.
Matz.
Updated by yui-knk (Kaneko Yuichiro) about 2 years ago
- Status changed from Open to Closed
Applied in changeset git|d8601621edcf29e3323b90dcf04b774edd9fb45e.
Enhance keep_tokens option for RubyVM::AbstractSyntaxTree parsing methods
Implementations of the Language Server Protocol (LSP) sometimes need token information.
For example, `m(1)` and `m(1, )` have the same AST structure apart from node locations,
so it is impossible to detect the trailing `,` from the AST alone. However, in the latter case
it might be better to suggest a list of variables for the second argument.
Token information is important in such cases.
This commit adds these methods.
- Add a `keep_tokens` option to `RubyVM::AbstractSyntaxTree.parse`, `.parse_file` and `.of`.
- Add `RubyVM::AbstractSyntaxTree::Node#tokens`, which returns the tokens for the node, including the tokens of its descendant nodes.
- Add `RubyVM::AbstractSyntaxTree::Node#all_tokens`, which returns all tokens of the input script regardless of the receiver node.
[Feature #19070]
The impact on memory usage and performance is shown below:
Memory usage:
```
$ cat test.rb
root = RubyVM::AbstractSyntaxTree.parse_file(File.expand_path('../test/ruby/test_keyword.rb', __FILE__), keep_tokens: true)

$ /usr/bin/time -f %Mkb /usr/local/bin/ruby -v
ruby 3.2.0dev (2022-11-19T09:41:54Z 19070-keep_tokens d3af1b8057) [x86_64-linux]
11408kb

# keep_tokens: false
$ /usr/bin/time -f %Mkb /usr/local/bin/ruby test.rb
17508kb

# keep_tokens: true
$ /usr/bin/time -f %Mkb /usr/local/bin/ruby test.rb
30960kb
```
Performance:
```
$ cat ../ast_keep_tokens.yml
prelude: |
  src = <<~SRC
    module M
      class C
        def m1(a, b)
          1 + a + b
        end
      end
    end
  SRC
benchmark:
  without_keep_tokens: |
    RubyVM::AbstractSyntaxTree.parse(src, keep_tokens: false)
  with_keep_tokens: |
    RubyVM::AbstractSyntaxTree.parse(src, keep_tokens: true)

$ make benchmark COMPARE_RUBY="./ruby" ARGS=../ast_keep_tokens.yml
/home/kaneko.y/.rbenv/shims/ruby --disable=gems -rrubygems -I../benchmark/lib ../benchmark/benchmark-driver/exe/benchmark-driver \
    --executables="compare-ruby::./ruby -I.ext/common --disable-gem" \
    --executables="built-ruby::./miniruby -I../lib -I. -I.ext/common ../tool/runruby.rb --extout=.ext -- --disable-gems --disable-gem" \
    --output=markdown --output-compare -v ../ast_keep_tokens.yml
compare-ruby: ruby 3.2.0dev (2022-11-19T09:41:54Z 19070-keep_tokens d3af1b8057) [x86_64-linux]
built-ruby: ruby 3.2.0dev (2022-11-19T09:41:54Z 19070-keep_tokens d3af1b8057) [x86_64-linux]
warming up..
```
| |compare-ruby|built-ruby|
|:--------------------|-----------:|---------:|
|without_keep_tokens | 21.659k| 21.303k|
| | 1.02x| -|
|with_keep_tokens | 6.220k| 5.691k|
| | 1.09x| -|
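The figures in the table are presumably iterations per second, which is benchmark-driver's default metric, so higher is better. Read that way, `keep_tokens: true` parses this snippet roughly 3.5x slower on compare-ruby (21.659k / 6.220k) and 3.7x slower on built-ruby (21.303k / 5.691k). Without benchmark-driver, a rough version of the same comparison can be sketched with the standard `benchmark` library. The sketch assumes CRuby 3.2 or later. The iteration count is an arbitrary choice, and the timings depend on the machine.

```ruby
require "benchmark"

# Sketch: assumes CRuby >= 3.2. ITERATIONS is an arbitrary choice.
SOURCE = <<~RUBY
  module M
    class C
      def m1(a, b)
        1 + a + b
      end
    end
  end
RUBY
ITERATIONS = 2_000

# Map each keep_tokens setting to its measured iterations per second.
results = [false, true].to_h do |keep|
  seconds = Benchmark.realtime do
    ITERATIONS.times { RubyVM::AbstractSyntaxTree.parse(SOURCE, keep_tokens: keep) }
  end
  [keep, ITERATIONS / seconds]
end

results.each { |keep, ips| printf("keep_tokens: %-5s %10.1f i/s\n", keep, ips) }
printf("slowdown with keep_tokens: %.2fx\n", results[false] / results[true])
```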