Bug #18293
closedTime.at in master branch was 25% slower then Ruby 3.0
Added by watson1978 (Shizuo Fujita) about 3 years ago. Updated about 3 years ago.
Description
− | Ruby 3.0.2 | Ruby 3.1.0-dev | result |
---|---|---|---|
Time.at | 8.223M | 6.190M | 0.753x |
Environment¶
- MacBook Pro (14inch, 2021)
- macOS 12.0
- Apple M1 Max
- Apple clang version 13.0.0 (clang-1300.0.29.3)
Ruby 3.1.0-dev¶
$ ruby -v time.rb
ruby 3.1.0dev (2021-11-08T13:15:21Z master bd2674ad33) [arm64-darwin21]
Warming up --------------------------------------
Time.at 614.843k i/100ms
Calculating -------------------------------------
Time.at 6.190M (± 0.3%) i/s - 31.357M in 5.065559s
Ruby 3.0.2¶
$ ruby -v time.rb
ruby 3.0.2p107 (2021-07-07 revision 0db68f0233) [arm64-darwin21]
Warming up --------------------------------------
Time.at 821.722k i/100ms
Calculating -------------------------------------
Time.at 8.223M (± 0.6%) i/s - 41.908M in 5.096820s
Test code¶
require 'benchmark/ips'
Benchmark.ips do |x|
x.report('Time.at') { Time.at(0) }
end
Files
patch.diff (2.52 KB) patch.diff | k0kubun (Takashi Kokubun), 11/10/2021 04:23 AM |
Updated by watson1978 (Shizuo Fujita) about 3 years ago
- ruby -v set to ruby 3.1.0dev (2021-11-08T13:15:21Z master bd2674ad33) [arm64-darwin21]
Updated by peterzhu2118 (Peter Zhu) about 3 years ago
I was able to repro this on Ubuntu 20.04 on an AMD CPU. Looks like moving these functions from time.c
to timev.rb
is the reason.
77e7082e824af8523c1e7c3bfc111c2e52e7e3b3 (Moved Time.at to builtin)
Time.at 5.663M (± 0.7%) i/s - 28.836M in 5.092655s
Time.now 5.077M (± 0.8%) i/s - 25.841M in 5.089906s
9101597d05ef645949bab3a210d8fa5e61de26e3 (Moved Time.now to builtin)
Time.at 7.755M (± 0.7%) i/s - 39.139M in 5.047441s
Time.now 5.001M (± 0.8%) i/s - 25.217M in 5.042618s
d5fb51d2d3a7ba0e16042282d0f94c5336ed2cc1 (Add time.rb as builtin)
Time.at 7.882M (± 0.7%) i/s - 39.532M in 5.015832s
Time.now 5.450M (± 0.4%) i/s - 27.564M in 5.057575s
Updated by k0kubun (Takashi Kokubun) about 3 years ago
Could anybody share use cases (preferably a link to a particular code calling it) where Time.at
's performance matters?
Updated by byroot (Jean Boussier) about 3 years ago
It's quite a hotspot in our cache serializers:
- https://github.com/Shopify/paquito/blob/aabd8740175f37b382d425da004636266d546901/lib/paquito/types.rb#L88
- https://github.com/Shopify/paquito/blob/aabd8740175f37b382d425da004636266d546901/lib/paquito/types.rb#L173
All Active Record models have at least two Time
instances, often a few more, so when you cache a few hundreds or thousand records for say a product collection page, it's not rare for Time.at
to show up in production profiles as 2-3% of total request time.
Most people however use Marshal
for this, so they'll see Time#_dump
instead.
Updated by Eregon (Benoit Daloze) about 3 years ago
I think it'd be good to understand the slow down here.
Is it due to having the method in Ruby and using Primitive.
?
It seems unexpectedly high for that.
Or maybe the fact there is a declared keyword argument is slow if the kwarg is not given?
But that's not new, e.g. read_nonblock
uses this pattern since a long time.
Updated by k0kubun (Takashi Kokubun) about 3 years ago
- File patch.diff patch.diff added
Time.at
has complicated default arguments. I believe that would result in 5 local variable assignments in VM and be the bottleneck.
If you apply the attached patch and use this benchmark script:
benchmark:
- Time.at(0)
- Time.at_c(0)
- Time.at_1arg(0)
the result is:
$ benchmark-driver -v /tmp/a.yml --rbenv=ruby
ruby: ruby 3.1.0dev (2021-11-10T01:08:48Z master af74cc7504) [x86_64-linux]
Warming up --------------------------------------
Time.at(0) 6.141M i/s - 6.164M times in 1.003748s (162.85ns/i, 618clocks/i)
Time.at_c(0) 9.847M i/s - 9.945M times in 1.009962s (101.56ns/i, 385clocks/i)
Time.at_1arg(0) 8.976M i/s - 9.089M times in 1.012540s (111.40ns/i, 423clocks/i)
Calculating -------------------------------------
Time.at(0) 6.691M i/s - 18.422M times in 2.753387s (149.47ns/i, 567clocks/i)
Time.at_c(0) 11.616M i/s - 29.541M times in 2.543150s (86.09ns/i, 327clocks/i)
Time.at_1arg(0) 10.710M i/s - 26.929M times in 2.514397s (93.37ns/i, 354clocks/i)
Comparison:
Time.at_c(0): 11615765.0 i/s
Time.at_1arg(0): 10709979.3 i/s - 1.08x slower
Time.at(0): 6690511.4 i/s - 1.74x slower
So the bottleneck seems to be not the Primitive.
call but the complicated default argument behavior.
Updated by byroot (Jean Boussier) about 3 years ago
I don't understand why we need to detect wether the parameters were passed, why can't we just have default values?
>> Time.at(946702800, 0, :usec) == Time.at(946702800)
=> true
Or am I missing something?
Updated by k0kubun (Takashi Kokubun) about 3 years ago
FYI, the current iseq is:
irb(main)[01:0]> puts RubyVM::InstructionSequence.of(Time.method(:at)).disasm
== disasm: #<ISeq:at@<internal:timev>:270 (270,2)-(272,5)> (catch: FALSE)
local table (size: 7, argc: 1 [opts: 2, rest: -1, post: 0, block: -1, kw: 1@0, kwrest: -1])
[ 7] time@0<Arg>[ 6] subsec@1<Opt=0>[ 5] unit@2<Opt=7>[ 4] in@3 [ 3] ?@4 [ 2] nosubsec@5 [ 1] nounit@6
0000 putobject true ( 270)[Li]
0002 dup
0003 setlocal_WC_0 nosubsec@5
0005 setlocal_WC_0 subsec@1
0007 putobject true
0009 dup
0010 setlocal_WC_0 nounit@6
0012 setlocal_WC_0 unit@2
0014 getlocal_WC_0 time@0 ( 271)[LiCa]
0016 getlocal_WC_0 subsec@1
0018 getlocal_WC_0 unit@2
0020 getlocal_WC_0 in@3
0022 getlocal_WC_0 nosubsec@5
0024 getlocal_WC_0 nounit@6
0026 invokebuiltin <builtin!time_s_at/6>
0028 leave ( 272)[Re]
obviously this would be slower than things like:
irb(main)[02:0]> puts RubyVM::InstructionSequence.of(Kernel.method(:Float)).disasm
== disasm: #<ISeq:Float@<internal:kernel>:171 (171,2)-(173,5)> (catch: FALSE)
local table (size: 3, argc: 1 [opts: 0, rest: -1, post: 0, block: -1, kw: 1@0, kwrest: -1])
[ 3] arg@0<Arg> [ 2] exception@1[ 1] ?@2
0000 opt_invokebuiltin_delegate_leave <builtin!rb_f_float/2>, 0 ( 172)[LiCa]
0003 leave ( 173)[Re]
I don't understand why we need to detect wether the parameters were passed, why can't we just have default values?
I'm not aware of that, but if we can eliminate nosubsec
and nounit
local variables, we could convert invokebuiltin
to opt_invokebuiltin_delegate_leave
and I guess that would be a large improvement. It would then look like:
== disasm: #<ISeq:at@<internal:timev>:274 (274,2)-(276,5)> (catch: FALSE)
local table (size: 5, argc: 1 [opts: 2, rest: -1, post: 0, block: -1, kw: 1@0, kwrest: -1])
[ 5] time@0<Arg>[ 4] subsec@1<Opt=0>[ 3] unit@2<Opt=4>[ 2] in@3 [ 1] ?@4
0000 putobject true ( 274)
0002 setlocal_WC_0 subsec@1
0004 putobject true
0006 setlocal_WC_0 unit@2
0008 opt_invokebuiltin_delegate_leave <builtin!time_s_at2/4>, 0 ( 275)[LiCa]
0011 leave ( 276)[Re]
It'd be nice if we can eliminate setlocal
for subsec
and unit
as well, but it seems inevitable as long as you use positional arguments with default values.
Updated by nobu (Nobuyoshi Nakada) about 3 years ago
- Status changed from Open to Closed
Applied in changeset git|83bdc2f00c4565a091d88399e5a315a96f6afed6.
Simplify default values of Time.at
[Bug #18293]
Updated by nobu (Nobuyoshi Nakada) about 3 years ago
- Status changed from Closed to Open
I thought and tried it but still 10% slower than 3.0.
Updated by nobu (Nobuyoshi Nakada) about 3 years ago
- Status changed from Open to Closed
Applied in changeset git|8bcff5604b15fd29f357669dd8cb65a6618c9926.
Tentative fix of subsec to Time.at
[Bug #18293]
Updated by byroot (Jean Boussier) about 3 years ago
- Status changed from Closed to Open
Updated by Eregon (Benoit Daloze) about 3 years ago
FYI in TruffleRuby when we need to know if an argument is passed or not we use a default value of undefined
which would be just Qundef
in CRuby.
It works well and it's efficient to check (identity check).
For the anecdote, that's an heritage from Rubinius.
Updated by Eregon (Benoit Daloze) about 3 years ago
On current master I get 6044704
i/s and 5368058
i/s on 3.0.2, 11.2% slower.
Updated by k0kubun (Takashi Kokubun) about 3 years ago
Current status:
$ cat /tmp/a.yml
benchmark:
- Time.at(0)
$ benchmark-driver -v --rbenv '3.0.0;before;after' /tmp/a.yml
3.0.0: ruby 3.0.0p0 (2020-12-25 revision 95aff21468) [x86_64-darwin19]
before: ruby 3.1.0dev (2021-11-10T08:42:09Z master cc33d07f46) [x86_64-darwin19]
after: ruby 3.1.0dev (2021-11-10T09:14:14Z master 8bcff5604b) [x86_64-darwin19]
Warming up --------------------------------------
Time.at(0) 4.058M i/s - 4.169M times in 1.027334s (246.42ns/i)
Calculating -------------------------------------
3.0.0 before after
Time.at(0) 4.664M 3.559M 3.957M i/s - 12.175M times in 2.610444s 3.421049s 3.077017s
Comparison:
Time.at(0)
3.0.0: 4663774.1 i/s
after: 3956598.5 i/s - 1.18x slower
before: 3558709.9 i/s - 1.31x slower
irb(main)[01:0]> puts RubyVM::InstructionSequence.of(Time.method(:at)).disasm
== disasm: #<ISeq:at@<internal:timev>:270 (270,2)-(272,5)> (catch: FALSE)
local table (size: 5, argc: 1 [opts: 2, rest: -1, post: 0, block: -1, kw: 1@0, kwrest: -1])
[ 5] time@0<Arg>[ 4] subsec@1<Opt=0>[ 3] unit@2<Opt=4>[ 2] in@3 [ 1] ?@4
0000 putobject false ( 270)
0002 setlocal_WC_0 subsec@1
0004 putobject :microsecond
0006 setlocal_WC_0 unit@2
0008 opt_invokebuiltin_delegate_leave <builtin!time_s_at/4>, 0 ( 271)[LiCa]
0011 leave ( 272)[Re]
Updated by k0kubun (Takashi Kokubun) about 3 years ago
It seems like C -> builtin conversion makes method calls with keyword arguments faster, but C seems faster on method calls without keyword arguments if the method has keyword arguments (with default values).
$ benchmark-driver -v --rbenv '3.0.0;master::after' /tmp/a.yml
3.0.0: ruby 3.0.0p0 (2020-12-25 revision 95aff21468) [x86_64-darwin19]
master: ruby 3.1.0dev (2021-11-10T09:14:14Z master 8bcff5604b) [x86_64-darwin19]
Calculating -------------------------------------
3.0.0 master
Time.at(0) 4.825M 4.025M i/s - 10.000M times in 2.072430s 2.484322s
Time.at(0, in: -"UTC") 1.388M 1.546M i/s - 10.000M times in 7.205443s 6.466879s
Comparison:
Time.at(0)
3.0.0: 4825253.4 i/s
master: 4025243.1 i/s - 1.20x slower
Time.at(0, in: -"UTC")
master: 1546341.0 i/s
3.0.0: 1387839.7 i/s - 1.11x slower
Updated by k0kubun (Takashi Kokubun) about 3 years ago
- Status changed from Open to Closed
- Assignee set to ko1 (Koichi Sasada)
Thanks to 2a3d5d661ce2cadad50aa5d72b54e134da54f5f3, master is no longer 25% slower than Ruby 3.0.0 at Time.at(0)
.
$ benchmark-driver -v --rbenv='3.0.0;master::ruby' benchmark/time_at.yml
3.0.0: ruby 3.0.0p0 (2020-12-25 revision 95aff21468) [x86_64-linux]
master: ruby 3.1.0dev (2021-11-15T06:58:56Z master f943264565) [x86_64-linux]
Warming up --------------------------------------
Time.at(0) 9.546M i/s - 9.734M times in 1.019744s (104.76ns/i, 398clocks/i)
Time.at(0, 500) 9.457M i/s - 9.680M times in 1.023522s (105.74ns/i, 370clocks/i)
Time.at(0, in: "+09:00") 1.977M i/s - 2.044M times in 1.033743s (505.75ns/i)
Time.at(0, 500, in: "+09:00") 1.821M i/s - 1.902M times in 1.044507s (549.27ns/i)
Calculating -------------------------------------
3.0.0 master
Time.at(0) 10.477M 13.172M i/s - 28.637M times in 2.733200s 2.174097s
Time.at(0, 500) 9.686M 7.931M i/s - 28.371M times in 2.929153s 3.577184s
Time.at(0, in: "+09:00") 1.823M 2.542M i/s - 5.932M times in 3.253225s 2.333703s
Time.at(0, 500, in: "+09:00") 1.959M 2.448M i/s - 5.462M times in 2.787425s 2.230970s
Comparison:
Time.at(0)
master: 13171774.0 i/s
3.0.0: 10477358.7 i/s - 1.26x slower
Time.at(0, 500)
3.0.0: 9685817.8 i/s
master: 7931166.6 i/s - 1.22x slower
Time.at(0, in: "+09:00")
master: 2541815.3 i/s
3.0.0: 1823372.9 i/s - 1.39x slower
Time.at(0, 500, in: "+09:00")
master: 2448156.1 i/s
3.0.0: 1959429.5 i/s - 1.25x slower
Updated by Eregon (Benoit Daloze) about 3 years ago
https://github.com/ruby/ruby/pull/5112 for how it works
Updated by byroot (Jean Boussier) about 3 years ago
- Status changed from Closed to Open
I'm afraid that last patch introduced a regression:
> Time.at(123, 123123)
=> 1970-01-01 00:02:03.123123 +0000
> Time.at(*[123, 123123])
<internal:timev>:271:in `at': wrong number of arguments (given 2, expected 1) (ArgumentError)
Updated by byroot (Jean Boussier) about 3 years ago
- Status changed from Open to Closed
Fixed in 1af8ed5f0a2c381c5dee4a5bcffff61f270c30d9