Project

General

Profile

Actions

Feature #13368

closed

Improve performance of Array#sum with float elements

Added by watson1978 (Shizuo Fujita) almost 7 years ago. Updated about 3 years ago.

Status:
Closed
Target version:
-
[ruby-dev:50045]

Description

The declaration of local variable in loop, it will initialize local variable for each run of the loop with clang generated code.
So, it shouldn't declare the local variable in heavy loop.

Array#sum with float elements will be faster around 30%.

Before

       user     system      total        real
   3.320000   0.010000   3.330000 (  3.336088)

After

       user     system      total        real
   2.590000   0.010000   2.600000 (  2.602399)

Test code

require 'benchmark'

Benchmark.bmbm do |x|
  ary = []
  10000.times { ary << Random.rand }

  x.report do
    50000.times do
      ary.sum
    end
  end

end

Patch

https://github.com/ruby/ruby/pull/1555

Updated by normalperson (Eric Wong) almost 7 years ago

+Cc ruby-core since the original post was English

wrote:

Issue #13368 has been reported by watson1978 (Shizuo Fujita).


Bug #13368: Improve performance of Array#sum with float elements
https://bugs.ruby-lang.org/issues/13368

The declaration of local variable in loop, it will initialize local variable for each run of the loop with clang generated code.
So, it shouldn't declare the local variable in heavy loop.

I disagree with this change. Different compilers optimize
differently.

Patch

https://github.com/ruby/ruby/pull/1555

Disclaimer: I used commit 3635f883096604d0e6453dc9d2484d5c92467109
after having:

fetch = +refs/pull/*:refs/remotes/ruby/pull/*

In the [remote ] section of my .git/config and did not
use any proprietary JavaScript or browser to apply your change.

Updated by normalperson (Eric Wong) almost 7 years ago

+Cc ruby-core since the original post was English

wrote:

Issue #13368 has been reported by watson1978 (Shizuo Fujita).


Bug #13368: Improve performance of Array#sum with float elements
https://bugs.ruby-lang.org/issues/13368

The declaration of local variable in loop, it will initialize local variable for each run of the loop with clang generated code.
So, it shouldn't declare the local variable in heavy loop.

I disagree with this change. Different compilers optimize
differently.

Patch

https://github.com/ruby/ruby/pull/1555

Disclaimer: I used commit 3635f883096604d0e6453dc9d2484d5c92467109
after having:

fetch = +refs/pull/*:refs/remotes/ruby/pull/*

In the [remote ] section of my .git/config and did not
use any proprietary JavaScript or browser to apply your change.

Updated by watson1978 (Shizuo Fujita) almost 7 years ago

When I filed this ticket, I tried to run benchmark on macOS + clang only.
Then, I tried to do on 2 environments in additional.

I found what my patch is effective with clang envrionment only (such as macOS or FreeBSD which use clang as default compiler).

macOS 10.12 + gcc 6.3.0

Before

       user     system      total        real
   2.750000   0.000000   2.750000 (  2.757864)

After

       user     system      total        real
   2.730000   0.010000   2.740000 (  2.740204)

Ubuntu 16.04.4 + gcc 5.4.0

Before

       user     system      total        real
   2.400000   0.000000   2.400000 (  2.395856)

After

       user     system      total        real
   2.330000   0.000000   2.330000 (  2.327889)

Updated by naruse (Yui NARUSE) almost 7 years ago

watson1978 (Shizuo Fujita) wrote:

When I filed this ticket, I tried to run benchmark on macOS + clang only.
Then, I tried to do on 2 environments in additional.

I found what my patch is effective with clang envrionment only (such as macOS or FreeBSD which use clang as default compiler).

Hmm, the result shows clang is not efficient as gcc yet this time.
And improvement should be done by clang, not us.
Or should be done by something cleaner patch.

Updated by watson1978 (Shizuo Fujita) almost 7 years ago

naruse (Yui NARUSE) wrote:

Hmm, the result shows clang is not efficient as gcc yet this time.
And improvement should be done by clang, not us.
Or should be done by something cleaner patch.

Could you report this to clang developer If you don't want to fix this ?

Updated by naruse (Yui NARUSE) almost 7 years ago

watson1978 (Shizuo Fujita) wrote:

naruse (Yui NARUSE) wrote:

Hmm, the result shows clang is not efficient as gcc yet this time.
And improvement should be done by clang, not us.
Or should be done by something cleaner patch.

Could you report this to clang developer If you don't want to fix this ?

Though I reported to clang when it breaks build like https://bugs.llvm.org//show_bug.cgi?id=8319
Clang is considered still development at performance like other inline threshold ones.
They should improve more general cases.

Actions #7

Updated by jeremyevans0 (Jeremy Evans) over 4 years ago

  • Tracker changed from Bug to Feature
  • Backport deleted (2.2: UNKNOWN, 2.3: UNKNOWN, 2.4: UNKNOWN)
Actions

Also available in: Atom PDF

Like0
Like0Like0Like0Like0Like0Like0Like0Like0