Feature #10333
closed[PATCH 3/1] optimize: "yoda literal" == string
Description
This is a follow-up-to:
- [Feature #10326] optimize: recv << "literal string"
- [Feature #10329] optimize: foo == "literal string"
This can be slightly faster than: (string == "literal") because
we can guaranteed the "yoda literal" is already a string at
compile time.
Updated benchmarks from Xeon E3-1230 v3 @ 3.30GHz:
target 0: trunk (ruby 2.2.0dev (2014-10-06 trunk 47822) [x86_64-linux]) at "/home/ew/rrrr/b/i/bin/ruby"
target 1: built (ruby 2.2.0dev (2014-10-06 trunk 47822) [x86_64-linux]) at "/home/ew/ruby/b/i/bin/ruby"
loop_whileloop2
i = 0
while i< 6_000_000 # benchmark loop 2
i += 1
end
trunk 0.10712811909615993
trunk 0.10693809622898698
trunk 0.10645449301227927
trunk 0.10646287119016051
built 0.10612367931753397
built 0.10581812914460897
built 0.10592922195792198
built 0.10595094738528132
vm2_streq1
i = 0
foo = "literal"
while i<6_000_000 # benchmark loop 2
i += 1
foo == "literal"
end
trunk 0.47250875690951943
trunk 0.47325073881074786
trunk 0.4726782930083573
trunk 0.4727754699997604
built 0.185972370672971
built 0.1850820742547512
built 0.18558283289894462
built 0.18452610215172172
vm2_streq2
i = 0
foo = "literal"
while i<6_000_000 # benchmark loop 2
i += 1
"literal" == foo
end
trunk 0.4719057851471007
trunk 0.4715963830240071
trunk 0.47177061904221773
trunk 0.4724834677763283
built 0.18247668212279677
built 0.18143231887370348
built 0.18060296680778265
built 0.17929687118157744
raw data:
[["loop_whileloop2",
[[0.10712811909615993,
0.10693809622898698,
0.10645449301227927,
0.10646287119016051],
[0.10612367931753397,
0.10581812914460897,
0.10592922195792198,
0.10595094738528132]]],
["vm2_streq1",
[[0.47250875690951943,
0.47325073881074786,
0.4726782930083573,
0.4727754699997604],
[0.185972370672971,
0.1850820742547512,
0.18558283289894462,
0.18452610215172172]]],
["vm2_streq2",
[[0.4719057851471007,
0.4715963830240071,
0.47177061904221773,
0.4724834677763283],
[0.18247668212279677,
0.18143231887370348,
0.18060296680778265,
0.17929687118157744]]]]
Elapsed time: 6.097474559 (sec)
benchmark results:
minimum results in each 4 measurements.
Execution time (sec)
name |trunk |built
--------+-------+-------
loop_whileloop2 |0.106 |0.106
vm2_streq1* |0.366 |0.079
vm2_streq2* |0.365 |0.073
Speedup ratio: compare with the result of `trunk' (greater is better)
name |built
--------+-------
loop_whileloop2 |1.006
vm2_streq1* |4.651
vm2_streq2* |4.969
benchmark/bm_vm2_streq2.rb | 6 ++++++
compile.c | 20 +++++++++++++++++++-
insns.def | 20 ++++++++++++++++++++
test/ruby/test_string.rb | 12 ++++++++----
4 files changed, 53 insertions(+), 5 deletions(-)
create mode 100644 benchmark/bm_vm2_streq2.rb
Files
Updated by ko1 (Koichi Sasada) about 10 years ago
Comments for this ticket and the following tickets:
To continue this kind of hack, we need tons of instructions for each methods.
What do you think about it?
Basically, we need to add them more carefully. For example, persuasive explanation are needed, such as statistics (analysis by parser/compier), benchmark results for major use cases (maybe "<< 'literal'" for templates. but not sure this ticket for) .
Another idea is to make more general approach to indicate arguments (and a receiver) are string literal. It is called specialization. Specialized instructions (opt_plus and so on) is a kind of specialization by hands.
Small comments:
(1) iseq_compile_each() should not use opt_* instructions because we should be able to make instructions without opt_* insns (on/off by compile options).
(2) Name of instructions should be reconsidered.
Updated by akr (Akira Tanaka) about 10 years ago
How about to add .freeze to string literals?
i = 0
foo = "literal"
while i<6_000_000 # benchmark loop 2
i += 1
foo == "literal".freeze
end
It reduce object allocations and may have some performance gain (without new VM instructions.)
Updated by normalperson (Eric Wong) about 10 years ago
ko1@atdot.net wrote:
Comments for this ticket and the following tickets:
To continue this kind of hack, we need tons of instructions for each
methods. What do you think about it?
I am not completely happy with my current patches because of verbosity
and also icache footprint in the main VM loop. Ruby executable sizes
(even stripped) seem to get bigger with every release :<
However, perhaps the biggest performance problem is still too many
allocations and garbage objects; so I am willing to trade some code
size to reduce dynamic allocations.
Basically, we need to add them more carefully. For example, persuasive
explanation are needed, such as statistics (analysis by
parser/compier), benchmark results for major use cases (maybe "<<
'literal'" for templates. but not sure this ticket for) .
Right, we will need to find more real benchmarks.
Sadly, there are many places where garbage grows. So maybe this change
is only 1-2% overall. We may need a lot of small changes to add up to
noticeable improvements.
Another idea is to make more general approach to indicate arguments
(and a receiver) are string literal. It is called specialization.
Specialized instructions (opt_plus and so on) is a kind of
specialization by hands.
I've been thinking along these lines, too. For example, I would like to
see String#tr! and String#gsub! able to avoid allocations for literal
strings. Or even optimize: Time.now.{to_f,to_i,strftime("lit"))}
As suggested by akr, users may .freeze (or use constants), but that is
verbose and requires VM internal knowledge. My goal is to make
optimization as transparent as possible so users may write concise,
idiomatic Ruby code.
It would be great if things like ruby-trunk r47813
(unicode_norm_gen.rb: optimize concatenation)
can be done transparently, even.
Small comments:
(1) iseq_compile_each() should not use opt_* instructions because we
should be able to make instructions without opt_* insns (on/off by
compile options).
Right. I'll see about making it optional and doing it more
idiomatically. I mainly used existing opt_{aref,aset}_with compilation
as a guide.
(2) Name of instructions should be reconsidered.
OK, I do not mind changing names.
I also did informal benchmarks with my system Perl installation
(Perl 5.14.2 on Debian stable x86-64):
loop_whileloop2
use strict;
my $i = 0;
while ($i < 6_000_000) { # benchmark loop 2
$i += 1;
}
Perl 0.228s
trunk 0.10645449301227927
built 0.10581812914460897
Without the string compare, we're already faster than Perl \o/
vm2_streq1
use strict;
my $i = 0;
my $foo = "literal";
while ($i < 6_000_000) { # benchmark loop 2
$i += 1;
$foo eq "literal";
}
Perl 0.0349s
trunk 0.4726782930083573
built 0.18452610215172172
We lose to Perl without the optimization, but win with it :)
This is just a micro-benchmark, of course, but I think it's an important
data point to show gains by avoiding allocations when possible
Updated by normalperson (Eric Wong) about 10 years ago
Eric Wong normalperson@yhbt.net wrote:
ko1@atdot.net wrote:
Another idea is to make more general approach to indicate arguments
(and a receiver) are string literal. It is called specialization.
Specialized instructions (opt_plus and so on) is a kind of
specialization by hands.I've been thinking along these lines, too. For example, I would like to
see String#tr! and String#gsub! able to avoid allocations for literal
strings. Or even optimize: Time.now.{to_f,to_i,strftime("lit"))}
Maybe start moving existing iseq_compile_each optimizations to the
peephole optimizer (work-in-progress):
http://80x24.org/spew/m/ee49aae645e0953fc16fc1557dce6a09b4de4324.txt
Updated by nobu (Nobuyoshi Nakada) about 10 years ago
Updated by nobu (Nobuyoshi Nakada) about 10 years ago
- Description updated (diff)
Updated by normalperson (Eric Wong) about 10 years ago
Eric Wong normalperson@yhbt.net wrote:
Maybe start moving existing iseq_compile_each optimizations to the
peephole optimizer (work-in-progress):
http://80x24.org/spew/m/ee49aae645e0953fc16fc1557dce6a09b4de4324.txt
Part #2, generic putstring_for instruction:
http://80x24.org/spew/m/5a77be4e211c81a509573e3e1ca3bc3ca2383e68.txt
A new putstring_for instruction may replace all current uses of:
- opt_str_freeze
- opt_aref_with
- opt_aset_with
This new instruction should also be usable to implement new
optimizations to avoid rb_str_resurrect
.
Optimizations for literal hash["literal"]
(aref/lookup) and
"literal".freeze
are easily moved to the peephole optimizer.
However, it seems easier to optimize hash["literal"] = val
in iseq_compile_each
right now.
This reduces performance compared to the old opt_aref_with
and
opt_aset_with
instructions slightly, but is more elegant for in
avoiding special cases. We may decide to resurrect opt_aref_with
and opt_aset_with
if we want to recover the small performance loss
and can accept a bigger VM loop.
"".freeze
performance is probably not interesting to anyone :)
benchmark results:
minimum results in each 5 measurements.
Execution time (sec)
name | 2.1.3 | trunk | built
----------------------+--------+--------+-------
loop_whileloop2 | 0.106 | 0.106 | 0.106
vm2_hash_aref_lit* | 0.503 | 0.162 | 0.192
vm2_hash_aset_lit* | 0.587 | 0.214 | 0.241
Speedup ratio: compare with the result of `2.1.3' (greater is better)
name | trunk | built
----------------------+--------+-------
loop_whileloop2 | 1.000 | 0.998
vm2_hash_aref_lit* | 3.099 | 2.621
vm2_hash_aset_lit* | 2.741 | 2.435
Updated by normalperson (Eric Wong) about 10 years ago
nobu@ruby-lang.org wrote:
Thanks. I also noticed your `peephole_opt_send' branch which implements
opt_aset_with. Any reason it was not commited?
I'll take a closer look tomorrow since I have not managed to do
"opt_aset_with" using peephole + "putstring_for" insn [ruby-core:65539]
Updated by normalperson (Eric Wong) about 10 years ago
Eric Wong normalperson@yhbt.net wrote:
Maybe start moving existing iseq_compile_each optimizations to the
peephole optimizer (work-in-progress):
http://80x24.org/spew/m/ee49aae645e0953fc16fc1557dce6a09b4de4324.txtPart #2, generic putstring_for instruction:
http://80x24.org/spew/m/5a77be4e211c81a509573e3e1ca3bc3ca2383e68.txt
Part #3, for obj << "literal"' and
obj == "literal"' calls:
http://80x24.org/spew/m/1412827413-14385-1-git-send-email-e%4080x24.org.txt
Updated by ko1 (Koichi Sasada) about 10 years ago
On 2014/10/09 11:04, Eric Wong wrote:
A new putstring_for instruction may replace all current uses of:
Sorry, I can't find a `putstring_for' instruction. Which patch should I see?
I'm happy how ruby sources compiled to bytecode sequence.
BTW, any instructions except opt_* instrcutions don't include `_'
(underscore) characters. It is easy that which are core isntructions and
which are optional (for maybe optimization) instructions.
--
// SASADA Koichi at atdot dot net
Updated by normalperson (Eric Wong) about 10 years ago
SASADA Koichi ko1@atdot.net wrote:
On 2014/10/09 11:04, Eric Wong wrote:
A new putstring_for instruction may replace all current uses of:
Sorry, I can't find a `putstring_for' instruction. Which patch should I see?
putstring_for is in part #2:
http://80x24.org/spew/m/5a77be4e211c81a509573e3e1ca3bc3ca2383e68.txt
All patches + git also pushed to my "putstring" branch in git:
http://bogomips.org/ruby.git/log/?h=putstring
(git remote add bogomips git://bogomips.org/ruby.git &&
git fetch bogomips)
I'm happy how ruby sources compiled to bytecode sequence.
BTW, any instructions except opt_* instrcutions don't include `_'
(underscore) characters. It is easy that which are core isntructions and
which are optional (for maybe optimization) instructions.
OK, I can rename to "putstring2" or "putstringfor" (or "putstringlazy")?
You should decide the final name.
Updated by normalperson (Eric Wong) about 10 years ago
Current patch (named "opt_str_lit" insn):
http://80x24.org/spew/m/opt-str-lit-v1%40m.txt
One VM instruction now optimizes away object allocation for string
literals in the following cases:
- "lit" % obj
- str << "lit"
- "lit" + str
- str + "lit"
- "lit" * num
- "lit" === obj
- obj === "lit"
- "lit" == str
- str == "lit"
- "lit" != str
- str != "lit"
- hash["lit"]
- hash["lit"] = obj
It is also in the "opt_str_lit" branch of git://bogomips.org/ruby.git
We may rework the current BOP_ redefinition checks via generated code
to make it easier to extend to args of common operations:
- String#{gsub,tr,delete,squeeze,split,unpack})
- Hash#{include?,#key?} ...
- Array#{pack,include?,join} ...
Basically any common operation which takes a string literals.
Updated by normalperson (Eric Wong) about 10 years ago
Eric Wong normalperson@yhbt.net wrote:
Benchmark suite so far:
http://80x24.org/spew/m/opt_str_lit-v1-bench%40whir.txt
Note: this increases iseq size and can slow some performance on some
benchmarks due to the removal of the opt_{aset,aref}with VM
instructions. Since opt_str_lit is generic, many similar optimizations
may be implemented with less effort. (working on getting BOP done more
generically for sub/gsub/tr-type methods)
Updated by normalperson (Eric Wong) about 10 years ago
I've changed the way BOP_* flags are defined (they are now OM_* flags).
Everything is in "opt_str_lit-v2" branch of git://bogomips.org/ruby.git
Full diff of my work-in-progress here against trunk:
http://80x24.org/spew/m/opt-str-lit-v2%40m.txt
Interesting individual patches:
-
Auto enum generation for optimized methods:
http://bogomips.org/ruby.git/patch?id=220be6849 -
Optimized string allocations for gsub/sub/tr/tr_s(!) calls:
http://bogomips.org/ruby.git/patch?id=bc28526fa
I'll work some cleanups and adding more methods tomorrow:
* Hash#{has_key,member,key,include}?
* {Array,Hash,String}#delete
* Array#include?
* String#{squeeze,count,delete!}
* Time#strftime
...
There's a lot of potential, here :)
Updated by ko1 (Koichi Sasada) about 10 years ago
I can't find out which methods are target.
How to see the list?
I'm afraid that many optimized methods becomes overhead for non-target methods.
Updated by normalperson (Eric Wong) about 10 years ago
ko1@atdot.net wrote:
I can't find out which methods are target.
How to see the list?I'm afraid that many optimized methods becomes overhead for non-target methods.
Most comments + optimization are in compile.c (still awork-in-progress).
The peephole optimization phase rewrites many
putstring instructions to opt_str_lit instructions (same insn length).
Sorry my last (v2) patch is dirty. Still a work-in-progress.
I'll try to get an updated patch out this week.
For sub/gsub/tr* methods, the optimization is an obvious win: the only
common use for methods with those method names is with the String class;
so it is effective.
I think start_with?/end_with?/squeeze/pack/unpack fall into the same
category: common case is only one core class uses them with string
literals. This makes opt_str_lit a win with those method names.
For some method names (include?/delete/key?); I see them often used with
both core (Array/Hash) and non-core (Set/GDBM) classes.
We need to be careful with common method names and string literal.
Maybe we may use dynamic instruction rewriting if we detect uses on
non-core classes and dynamically rewrite opt_str_lit back to putstring.
Note: opt_str_lit has the same insn length as putstring;
we may rewrite iseq->iseq at runtime.
Updated by ko1 (Koichi Sasada) about 10 years ago
Thank you for your explanation. Completely agree with you.
I will try to read your patch.
Thanks again,
Koichi
Updated by normalperson (Eric Wong) about 10 years ago
Eric Wong normalperson@yhbt.net wrote:
For some method names (include?/delete/key?); I see them often used with
both core (Array/Hash) and non-core (Set/GDBM) classes.
We need to be careful with common method names and string literal.Maybe we may use dynamic instruction rewriting if we detect uses on
non-core classes and dynamically rewrite opt_str_lit back to putstring.
I think the following solves the problem of improper optimization to
applied to non-core classes:
http://bogomips.org/ruby.git/patch?id=ab08e19d7a70a0fe
We could even expand the idea for existing opt_* insns.
Having the recv_info array stick around indefinitely might get
wasteful, though. I suppose we could walk iseq->mark_ary to
pluck it out.
Note: opt_str_lit has the same insn length as putstring;
we may rewrite iseq->iseq at runtime.
I'll still try to DRY it up more to make testing easier.
Full diff: http://80x24.org/spew/m/opt-str-lit-v3%40m.txt
(broken out: "opt_str_lit-3" branch on git://bogomips.org/ruby.git)
Updated by normalperson (Eric Wong) about 10 years ago
- Status changed from Open to Rejected
obsoleted by Feature #10423