Feature #10333


[PATCH 3/1] optimize: "yoda literal" == string

Added by normalperson (Eric Wong) over 9 years ago. Updated over 9 years ago.

Status:
Rejected
Assignee:
-
Target version:
-

[ruby-core:65448]

Description

This is a follow-up-to:

  1. [Feature #10326] optimize: recv << "literal string"
  2. [Feature #10329] optimize: foo == "literal string"

This can be slightly faster than (string == "literal") because
we can guarantee the "yoda literal" is already a string at
compile time.
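
A quick illustration of why the receiver matters (a sketch added for
clarity, not part of the patch): `foo == "literal"' dispatches on foo's
class, which is unknown until runtime, while `"literal" == foo' always
dispatches on String, which the compiler can see from the literal itself.

  class Funky
    def ==(other)
      :not_a_boolean           # any class may define its own ==
    end
  end

  foo = Funky.new
  p(foo == "literal")          # => :not_a_boolean  (Funky#==, resolved at runtime)
  p("literal" == foo)          # => false           (String#==, receiver known at compile time)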

Updated benchmarks from Xeon E3-1230 v3 @ 3.30GHz:

target 0: trunk (ruby 2.2.0dev (2014-10-06 trunk 47822) [x86_64-linux]) at "/home/ew/rrrr/b/i/bin/ruby"
target 1: built (ruby 2.2.0dev (2014-10-06 trunk 47822) [x86_64-linux]) at "/home/ew/ruby/b/i/bin/ruby"


loop_whileloop2

i = 0
while i< 6_000_000 # benchmark loop 2
  i += 1
end
trunk	0.10712811909615993
trunk	0.10693809622898698
trunk	0.10645449301227927
trunk	0.10646287119016051
built	0.10612367931753397
built	0.10581812914460897
built	0.10592922195792198
built	0.10595094738528132

vm2_streq1

i = 0
foo = "literal"
while i<6_000_000 # benchmark loop 2
  i += 1
  foo == "literal"
end
trunk	0.47250875690951943
trunk	0.47325073881074786
trunk	0.4726782930083573
trunk	0.4727754699997604
built	0.185972370672971
built	0.1850820742547512
built	0.18558283289894462
built	0.18452610215172172

vm2_streq2

i = 0
foo = "literal"
while i<6_000_000 # benchmark loop 2
  i += 1
  "literal" == foo
end
trunk	0.4719057851471007
trunk	0.4715963830240071
trunk	0.47177061904221773
trunk	0.4724834677763283
built	0.18247668212279677
built	0.18143231887370348
built	0.18060296680778265
built	0.17929687118157744

raw data:

[["loop_whileloop2",
  [[0.10712811909615993,
    0.10693809622898698,
    0.10645449301227927,
    0.10646287119016051],
   [0.10612367931753397,
    0.10581812914460897,
    0.10592922195792198,
    0.10595094738528132]]],
 ["vm2_streq1",
  [[0.47250875690951943,
    0.47325073881074786,
    0.4726782930083573,
    0.4727754699997604],
   [0.185972370672971,
    0.1850820742547512,
    0.18558283289894462,
    0.18452610215172172]]],
 ["vm2_streq2",
  [[0.4719057851471007,
    0.4715963830240071,
    0.47177061904221773,
    0.4724834677763283],
   [0.18247668212279677,
    0.18143231887370348,
    0.18060296680778265,
    0.17929687118157744]]]]

Elapsed time: 6.097474559 (sec)

benchmark results:
minimum results in each 4 measurements.
Execution time (sec)

name            | trunk | built
----------------+-------+-------
loop_whileloop2 | 0.106 | 0.106
vm2_streq1*     | 0.366 | 0.079
vm2_streq2*     | 0.365 | 0.073

Speedup ratio: compare with the result of `trunk' (greater is better)

name            | built
----------------+-------
loop_whileloop2 | 1.006
vm2_streq1*     | 4.651
vm2_streq2*     | 4.969

 benchmark/bm_vm2_streq2.rb |  6 ++++++
 compile.c                  | 20 +++++++++++++++++++-
 insns.def                  | 20 ++++++++++++++++++++
 test/ruby/test_string.rb   | 12 ++++++++----
 4 files changed, 53 insertions(+), 5 deletions(-)
 create mode 100644 benchmark/bm_vm2_streq2.rb

Files

0001-optimize-yoda-literal-string.patch (6.23 KB), normalperson (Eric Wong), 10/07/2014 01:11 AM

Updated by ko1 (Koichi Sasada) over 9 years ago

Comments for this ticket and the following tickets:

  1. [Feature #10326] optimize: recv << "literal string"
  2. [Feature #10329] optimize: foo == "literal string"

To continue this kind of hack, we need tons of instructions, one for each
method. What do you think about it?

Basically, we need to add them more carefully. For example, a persuasive
explanation is needed, such as statistics (analysis by the parser/compiler)
and benchmark results for major use cases (maybe << 'literal' for templates,
though I am not sure that is what this ticket targets).

Another idea is to take a more general approach to indicate that arguments
(and a receiver) are string literals. This is called specialization.
Specialized instructions (opt_plus and so on) are a kind of specialization
by hand.

Small comments:

(1) iseq_compile_each() should not use opt_* instructions, because we should
be able to generate instruction sequences without opt_* insns (toggled
on/off by compile options).

(2) The names of the instructions should be reconsidered.

Updated by akr (Akira Tanaka) over 9 years ago

How about adding .freeze to string literals?

i = 0
foo = "literal"
while i<6_000_000 # benchmark loop 2
  i += 1
  foo == "literal".freeze
end

It reduces object allocations and may give some performance gain (without
new VM instructions).

Updated by normalperson (Eric Wong) over 9 years ago

ko1 (Koichi Sasada) wrote:

> Comments for this ticket and the following tickets:
>
>   1. [Feature #10326] optimize: recv << "literal string"
>   2. [Feature #10329] optimize: foo == "literal string"
>
> To continue this kind of hack, we need tons of instructions, one for each
> method. What do you think about it?

I am not completely happy with my current patches because of verbosity
and also icache footprint in the main VM loop. Ruby executable sizes
(even stripped) seem to get bigger with every release :<

However, perhaps the biggest performance problem is still too many
allocations and garbage objects; so I am willing to trade some code
size to reduce dynamic allocations.

> Basically, we need to add them more carefully. For example, a persuasive
> explanation is needed, such as statistics (analysis by the parser/compiler)
> and benchmark results for major use cases (maybe << 'literal' for templates,
> though I am not sure that is what this ticket targets).

Right, we will need to find more real benchmarks.

Sadly, there are many places where garbage grows. So maybe this change
is only 1-2% overall. We may need a lot of small changes to add up to
noticeable improvements.

> Another idea is to take a more general approach to indicate that arguments
> (and a receiver) are string literals. This is called specialization.
> Specialized instructions (opt_plus and so on) are a kind of specialization
> by hand.

I've been thinking along these lines, too. For example, I would like to
see String#tr! and String#gsub! able to avoid allocations for literal
strings. Or even optimize: Time.now.{to_f,to_i,strftime("lit")}

As suggested by akr, users may call .freeze (or use constants), but that is
verbose and requires knowledge of VM internals. My goal is to make the
optimization as transparent as possible so users may write concise,
idiomatic Ruby code.

It would be great if even things like ruby-trunk r47813
(unicode_norm_gen.rb: optimize concatenation)
could be done transparently.

> Small comments:
>
> (1) iseq_compile_each() should not use opt_* instructions, because we
> should be able to generate instruction sequences without opt_* insns
> (toggled on/off by compile options).

Right. I'll see about making it optional and doing it more
idiomatically. I mainly used existing opt_{aref,aset}_with compilation
as a guide.
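
For reference, a minimal sketch of what such a switch already looks like
from Ruby today (this assumes the existing specialized_instruction compile
option; the new insns would presumably honor the same toggle):

  iseq = RubyVM::InstructionSequence.compile(
    'h = {}; h["key"]',
    nil, nil, 1,
    specialized_instruction: false   # skip the opt_* rewrites for this compile
  )
  puts iseq.disasm                   # unspecialized sequence instead of opt_aref_with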

> (2) The names of the instructions should be reconsidered.

OK, I do not mind changing names.

I also did informal benchmarks with my system Perl installation
(Perl 5.14.2 on Debian stable x86-64):

loop_whileloop2

	use strict;
	my $i = 0;
	while ($i < 6_000_000) { # benchmark loop 2
		$i += 1;
	}

Perl 0.228s

trunk 0.10645449301227927
built 0.10581812914460897

Without the string compare, we're already faster than Perl \o/

vm2_streq1

 	use strict;
 	my $i = 0;
 	my $foo = "literal";
 	while ($i < 6_000_000) { # benchmark loop 2
 		$i += 1;
 		$foo eq "literal";
 	}

Perl 0.0349s

trunk 0.4726782930083573
built 0.18452610215172172

We lose to Perl without the optimization, but win with it :)
This is just a micro-benchmark, of course, but I think it's an important
data point showing the gains from avoiding allocations when possible.

Updated by normalperson (Eric Wong) over 9 years ago

Eric Wong wrote:

> ko1 (Koichi Sasada) wrote:
>
>> Another idea is to take a more general approach to indicate that arguments
>> (and a receiver) are string literals. This is called specialization.
>> Specialized instructions (opt_plus and so on) are a kind of specialization
>> by hand.
>
> I've been thinking along these lines, too. For example, I would like to
> see String#tr! and String#gsub! able to avoid allocations for literal
> strings. Or even optimize: Time.now.{to_f,to_i,strftime("lit")}

Maybe start moving existing iseq_compile_each optimizations to the
peephole optimizer (work-in-progress):
http://80x24.org/spew/m/ee49aae645e0953fc16fc1557dce6a09b4de4324.txt

Updated by nobu (Nobuyoshi Nakada) over 9 years ago

  • Description updated (diff)

Updated by normalperson (Eric Wong) over 9 years ago

Eric Wong wrote:

> Maybe start moving existing iseq_compile_each optimizations to the
> peephole optimizer (work-in-progress):
> http://80x24.org/spew/m/ee49aae645e0953fc16fc1557dce6a09b4de4324.txt

Part #2, generic putstring_for instruction:
http://80x24.org/spew/m/5a77be4e211c81a509573e3e1ca3bc3ca2383e68.txt

A new putstring_for instruction may replace all current uses of:

  • opt_str_freeze
  • opt_aref_with
  • opt_aset_with

This new instruction should also be usable to implement new
optimizations to avoid rb_str_resurrect.
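
For reference, a quick way to see those instructions on a current trunk
build (an illustrative sketch; exact listings vary by build and compile
options):

  puts RubyVM::InstructionSequence.compile('"lit".freeze').disasm        # opt_str_freeze
  puts RubyVM::InstructionSequence.compile('h = {}; h["k"]').disasm      # opt_aref_with
  puts RubyVM::InstructionSequence.compile('h = {}; h["k"] = 1').disasm  # opt_aset_with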

Optimizations for literal hash["literal"] (aref/lookup) and
"literal".freeze are easily moved to the peephole optimizer.

However, it seems easier to optimize hash["literal"] = val
in iseq_compile_each right now.

This reduces performance slightly compared to the old opt_aref_with and
opt_aset_with instructions, but is more elegant in avoiding special
cases. We may decide to resurrect opt_aref_with
and opt_aset_with if we want to recover the small performance loss
and can accept a bigger VM loop.

"".freeze performance is probably not interesting to anyone :)

benchmark results:
minimum results in each 5 measurements.
Execution time (sec)

name                | 2.1.3 | trunk | built
--------------------+-------+-------+-------
loop_whileloop2     | 0.106 | 0.106 | 0.106
vm2_hash_aref_lit*  | 0.503 | 0.162 | 0.192
vm2_hash_aset_lit*  | 0.587 | 0.214 | 0.241

Speedup ratio: compare with the result of `2.1.3' (greater is better)

name                | trunk | built
--------------------+-------+-------
loop_whileloop2     | 1.000 | 0.998
vm2_hash_aref_lit*  | 3.099 | 2.621
vm2_hash_aset_lit*  | 2.741 | 2.435

Updated by normalperson (Eric Wong) over 9 years ago

nobu (Nobuyoshi Nakada) wrote:

> Mine is https://github.com/nobu/ruby/tree/peephole-opt

Thanks. I also noticed your `peephole_opt_send' branch, which implements
opt_aset_with. Any reason it was not committed?

I'll take a closer look tomorrow since I have not managed to do
"opt_aset_with" using peephole + "putstring_for" insn [ruby-core:65539]

Updated by normalperson (Eric Wong) over 9 years ago

Eric Wong wrote:

> Maybe start moving existing iseq_compile_each optimizations to the
> peephole optimizer (work-in-progress):
> http://80x24.org/spew/m/ee49aae645e0953fc16fc1557dce6a09b4de4324.txt
>
> Part #2, generic putstring_for instruction:
> http://80x24.org/spew/m/5a77be4e211c81a509573e3e1ca3bc3ca2383e68.txt

Part #3, for `obj << "literal"' and `obj == "literal"' calls:
http://80x24.org/spew/m/1412827413-14385-1-git-send-email-e%4080x24.org.txt

Updated by ko1 (Koichi Sasada) over 9 years ago

On 2014/10/09 11:04, Eric Wong wrote:

> A new putstring_for instruction may replace all current uses of:

Sorry, I can't find a `putstring_for' instruction. Which patch should I see?

I'm happy with how Ruby source code is compiled to bytecode sequences.

BTW, no instructions except the opt_* instructions include `_' (underscore)
characters. That makes it easy to tell which are core instructions and
which are optional (optimization) instructions.

--
// SASADA Koichi at atdot dot net

Updated by normalperson (Eric Wong) over 9 years ago

SASADA Koichi wrote:

> On 2014/10/09 11:04, Eric Wong wrote:
>
>> A new putstring_for instruction may replace all current uses of:
>
> Sorry, I can't find a `putstring_for' instruction. Which patch should I see?

putstring_for is in part #2:
http://80x24.org/spew/m/5a77be4e211c81a509573e3e1ca3bc3ca2383e68.txt

All patches + git also pushed to my "putstring" branch in git:
http://bogomips.org/ruby.git/log/?h=putstring

(git remote add bogomips git://bogomips.org/ruby.git &&
git fetch bogomips)

> I'm happy with how Ruby source code is compiled to bytecode sequences.
>
> BTW, no instructions except the opt_* instructions include `_' (underscore)
> characters. That makes it easy to tell which are core instructions and
> which are optional (optimization) instructions.

OK, I can rename to "putstring2" or "putstringfor" (or "putstringlazy")?
You should decide the final name.

Updated by normalperson (Eric Wong) over 9 years ago

Current patch (named "opt_str_lit" insn):

http://80x24.org/spew/m/opt-str-lit-v1%40m.txt

One VM instruction now optimizes away object allocation for string
literals in the following cases:

  • "lit" % obj
  • str << "lit"
  • "lit" + str
  • str + "lit"
  • "lit" * num
  • "lit" === obj
  • obj === "lit"
  • "lit" == str
  • str == "lit"
  • "lit" != str
  • str != "lit"
  • hash["lit"]
  • hash["lit"] = obj

It is also in the "opt_str_lit" branch of git://bogomips.org/ruby.git

We may rework the current BOP_ redefinition checks via generated code
to make it easier to extend to args of common operations:

  • String#{gsub,tr,delete,squeeze,split,unpack}
  • Hash#{include?,key?} ...
  • Array#{pack,include?,join} ...

Basically, any common operation which takes string literals.
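
To illustrate why the redefinition checks above matter (a sketch for
illustration, not from the patch): once one of these methods is redefined,
the specialized path must fall back to a normal method call.

  class String
    def ==(other)            # user code may redefine a core method at any time
      :not_a_boolean
    end
  end

  p("lit" == "lit")          # => :not_a_boolean -- the fast path must be abandoned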

Updated by normalperson (Eric Wong) over 9 years ago

Eric Wong wrote:

> http://80x24.org/spew/m/opt-str-lit-v1%40m.txt

Benchmark suite so far:

http://80x24.org/spew/m/opt_str_lit-v1-bench%40whir.txt

Note: this increases iseq size and can reduce performance on some
benchmarks due to the removal of the opt_{aset,aref}_with VM
instructions. Since opt_str_lit is generic, many similar optimizations
may be implemented with less effort. (I am working on getting the BOP
checks done more generically for sub/gsub/tr-type methods.)

Updated by normalperson (Eric Wong) over 9 years ago

I've changed the way BOP_* flags are defined (they are now OM_* flags).
Everything is in "opt_str_lit-v2" branch of git://bogomips.org/ruby.git

Full diff of my work-in-progress here against trunk:
http://80x24.org/spew/m/opt-str-lit-v2%40m.txt

Interesting individual patches:

I'll work on some cleanups and add more methods tomorrow:

* Hash#{has_key,member,key,include}?
* {Array,Hash,String}#delete
* Array#include?
* String#{squeeze,count,delete!}
* Time#strftime
  ...

There's a lot of potential, here :)

Updated by ko1 (Koichi Sasada) over 9 years ago

I can't find out which methods are targeted.
How can I see the list?

I'm afraid that optimizing many methods becomes overhead for non-target methods.

Updated by normalperson (Eric Wong) over 9 years ago

ko1 (Koichi Sasada) wrote:

> I can't find out which methods are targeted.
> How can I see the list?
>
> I'm afraid that optimizing many methods becomes overhead for non-target
> methods.

Most comments + optimizations are in compile.c (still a work-in-progress).
The peephole optimization phase rewrites many
putstring instructions to opt_str_lit instructions (same insn length).

Sorry my last (v2) patch is dirty. Still a work-in-progress.
I'll try to get an updated patch out this week.

For sub/gsub/tr* methods, the optimization is an obvious win: the only
common use for methods with those method names is with the String class;
so it is effective.

I think start_with?/end_with?/squeeze/pack/unpack fall into the same
category: common case is only one core class uses them with string
literals. This makes opt_str_lit a win with those method names.

For some method names (include?/delete/key?), I see them often used with
both core (Array/Hash) and non-core (Set/GDBM) classes.
We need to be careful with common method names and string literals.
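
For illustration (a hypothetical sketch, not from the patch), the same call
site may see either kind of receiver:

  require 'set'

  def has_foo?(collection)
    collection.include?("foo")     # one call site, many possible receiver classes
  end

  p has_foo?(["foo", "bar"])       # Array#include?  -- core method, a safe target
  p has_foo?(Set.new(["foo"]))     # Set#include?    -- non-core, must not be assumed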

Maybe we can use dynamic instruction rewriting: if we detect use on
non-core classes, rewrite opt_str_lit back to putstring.

Note: opt_str_lit has the same insn length as putstring;
we may rewrite iseq->iseq at runtime.

Updated by ko1 (Koichi Sasada) over 9 years ago

Thank you for your explanation. Completely agree with you.

I will try to read your patch.

Thanks again,
Koichi

Updated by normalperson (Eric Wong) over 9 years ago

Eric Wong wrote:

> For some method names (include?/delete/key?), I see them often used with
> both core (Array/Hash) and non-core (Set/GDBM) classes.
> We need to be careful with common method names and string literals.
>
> Maybe we can use dynamic instruction rewriting: if we detect use on
> non-core classes, rewrite opt_str_lit back to putstring.

I think the following solves the problem of improper optimization being
applied to non-core classes:

http://bogomips.org/ruby.git/patch?id=ab08e19d7a70a0fe

We could even expand the idea for existing opt_* insns.

Having the recv_info array stick around indefinitely might get
wasteful, though. I suppose we could walk iseq->mark_ary to
pluck it out.

> Note: opt_str_lit has the same insn length as putstring;
> we may rewrite iseq->iseq at runtime.

I'll still try to DRY it up more to make testing easier.

Full diff: http://80x24.org/spew/m/opt-str-lit-v3%40m.txt
(broken out: "opt_str_lit-3" branch on git://bogomips.org/ruby.git)

Updated by normalperson (Eric Wong) over 9 years ago

  • Status changed from Open to Rejected

obsoleted by Feature #10423
