Project

General

Profile

Bug #12373

Optimize CSV#shift

Added by ksss (Yuki Kurihara) about 4 years ago. Updated about 3 years ago.

Status:
Closed
Priority:
Normal
Target version:
ruby -v:
ruby 2.4.0dev (2016-05-11 trunk 54976) [x86_64-darwin15]
[ruby-core:75462]

Description

I think that str1.start_with?(str2) is faster than str1[0] == str2.
Because str1.start_with?(str2) just call String#start_with?, But str1[0] == str2 call String#[], make new String object and call String#==.

(The patch is csv-shift.patch)

Benchmark results.

csv-benchmark.rb make temp CSV file and call CSV#each method(inner call CSV#shift)

$ ruby csv-benchmark.rb
Warming up --------------------------------------
       old_csv_shift     1.000  i/100ms
       new_csv_shift     1.000  i/100ms
Calculating -------------------------------------
       old_csv_shift      0.444  (± 0.0%) i/s -      3.000  in   6.759200s
       new_csv_shift      0.479  (± 0.0%) i/s -      3.000  in   6.264069s

Comparison:
       new_csv_shift:        0.5 i/s
       old_csv_shift:        0.4 i/s - 1.08x slower

string-start_with.rb is a micro benchmark for str1[0] == str2 and str1.start_with?(str2)

$ ruby string-start_with.rb
Warming up --------------------------------------
           a[0] == b    90.881k i/100ms
    a.start_with?(b)   115.557k i/100ms
Calculating -------------------------------------
           a[0] == b      1.836M (± 3.8%) i/s -      9.179M in   5.006568s
    a.start_with?(b)      3.183M (± 4.2%) i/s -     15.947M in   5.018654s

Comparison:
    a.start_with?(b):  3183386.0 i/s
           a[0] == b:  1836263.5 i/s - 1.73x slower

Of course $ make test-all TESTS="test/csv/*" passed


Files

csv-benchmark.rb (561 Bytes) csv-benchmark.rb ksss (Yuki Kurihara), 05/12/2016 12:39 AM
old_csv_shift.rb (5.1 KB) old_csv_shift.rb ksss (Yuki Kurihara), 05/12/2016 12:39 AM
new_csv_shift.rb (5.11 KB) new_csv_shift.rb ksss (Yuki Kurihara), 05/12/2016 12:39 AM
string-start_with.rb (215 Bytes) string-start_with.rb ksss (Yuki Kurihara), 05/12/2016 12:39 AM
csv-shift.patch (434 Bytes) csv-shift.patch ksss (Yuki Kurihara), 05/12/2016 12:41 AM
csv-shift-1.patch (1.15 KB) csv-shift-1.patch ksss (Yuki Kurihara), 05/12/2016 02:11 PM
csv-shift-2.patch (1.2 KB) csv-shift-2.patch ksss (Yuki Kurihara), 05/12/2016 02:11 PM
csv-shift-3.patch (2.13 KB) csv-shift-3.patch ksss (Yuki Kurihara), 05/12/2016 02:12 PM
csv-shift-hsbt.path (2.05 KB) csv-shift-hsbt.path hsbt (Hiroshi SHIBATA), 05/16/2017 08:22 AM

Updated by ksss (Yuki Kurihara) about 4 years ago

  • Description updated (diff)

Updated by nobu (Nobuyoshi Nakada) about 4 years ago

  • Description updated (diff)

You can replace str1[-1] == str2 with str1.end_with?(str2) too, three places.

Updated by ksss (Yuki Kurihara) about 4 years ago

Thank you for your reply.

You can replace str1[-1] == str2 with str1.end_with?(str2) too, three places.

Yes, That is my next plan.
I'm writing benchmark carefully.

Should I collect patches in this issue?

Updated by ksss (Yuki Kurihara) about 4 years ago

I update patches.

csv-shift-1.patch: Use s1.start_with?(s2) instead of s1[0] == s2 and use s1.end_with?(s2) instead of s1[-1] == s2.

$ ruby csv-benchmark.rb
Warming up --------------------------------------
       old_csv_shift     1.000  i/100ms
       new_csv_shift     1.000  i/100ms
Calculating -------------------------------------
       old_csv_shift      0.417  (± 0.0%) i/s -      3.000  in   7.189555s
       new_csv_shift      0.493  (± 0.0%) i/s -      3.000  in   6.080267s

Comparison:
       new_csv_shift:        0.5 i/s
       old_csv_shift:        0.4 i/s - 1.18x slower

csv-shift-2.patch: Use @double_quote_char instead of @quote_char * 2.

$ ruby csv-benchmark.rb
Warming up --------------------------------------
       old_csv_shift     1.000  i/100ms
       new_csv_shift     1.000  i/100ms
Calculating -------------------------------------
       old_csv_shift      0.443  (± 0.0%) i/s -      3.000  in   6.772626s
       new_csv_shift      0.486  (± 0.0%) i/s -      3.000  in   6.167382s

Comparison:
       new_csv_shift:        0.5 i/s
       old_csv_shift:        0.4 i/s - 1.10x slower

csv-shift-3.patch: Apply both patch csv-shift-1.patch and csv-shift-2.patch.

$ ruby csv-benchmark.rb
Warming up --------------------------------------
       old_csv_shift     1.000  i/100ms
       new_csv_shift     1.000  i/100ms
Calculating -------------------------------------
       old_csv_shift      0.436  (± 0.0%) i/s -      3.000  in   6.875614s
       new_csv_shift      0.567  (± 0.0%) i/s -      3.000  in   5.292932s

Comparison:
       new_csv_shift:        0.6 i/s
       old_csv_shift:        0.4 i/s - 1.30x slower

Updated by ksss (Yuki Kurihara) about 4 years ago

  • Assignee set to JEG2 (James Gray)

Updated by hsbt (Hiroshi SHIBATA) about 4 years ago

  • Status changed from Open to Assigned

Updated by hsbt (Hiroshi SHIBATA) about 3 years ago

  • Assignee changed from JEG2 (James Gray) to hsbt (Hiroshi SHIBATA)
#8

Updated by hsbt (Hiroshi SHIBATA) about 3 years ago

ksss (Yuki Kurihara)

I confirmed your patch and benchmark results.

Warming up --------------------------------------
           csv_shift     1.000  i/100ms
       new_csv_shift     1.000  i/100ms
Calculating -------------------------------------
           csv_shift      1.192  (± 0.0%) i/s -      6.000  in   5.034250s
       new_csv_shift      1.527  (± 0.0%) i/s -      8.000  in   5.243446s

Comparison:
       new_csv_shift:        1.5 i/s
           csv_shift:        1.2 i/s - 1.28x  slower

Should we use @double_quote_char instead of @quote_char ?
your patch named csv-shift-3.patch is only assign @double_quote_char. It's not used in csv.rb

I update replacement it in csv.rb and attched csv-shift-hsbt.patch.

Can you confirm it? If it's ok, I will merge it.

Updated by ksss (Yuki Kurihara) about 3 years ago

hsbt (Hiroshi SHIBATA)

Thank you for responding.
csv-shift-hsbt.patch looks good to me.

Should we use @double_quote_char instead of @quote_char ?
your patch named csv-shift-3.patch is only assign @double_quote_char. It's not used in csv.rb

@double_quote_char and @quote_char are different values. It can not be used instead.
@double_quote_char looks like using in CSV#shift method on csv-shift-3.patch and csv-shift-hsbt.patch.

Updated by hsbt (Hiroshi SHIBATA) about 3 years ago

Ah,

Should we use @double_quote_char instead of @quote_char ?

@quote_char is wrong. I intended to write @quote_char * 2

And, I confirmed csv-shift-3.patch again. It has no problem for me.

Thank you for your explanation.

(In Japanese: csv-shift-3.patch を確認した時に @double_quote_char 使ってないのでは...? と思い込んでいたのですが、今確認したところちゃんと置き換わっていました。失礼しました。)

Updated by ksss (Yuki Kurihara) about 3 years ago

hsbt (Hiroshi SHIBATA)

Good, Thank you for confirming and managing.

#12

Updated by hsbt (Hiroshi SHIBATA) about 3 years ago

  • Status changed from Assigned to Closed

Applied in changeset trunk|r58770.


Optimize CSV#shift.

[Bug #12373][ruby-core:75462]
Patch by Yuki Kurihara.

Benchmark:

  Warming up --------------------------------------
           csv_shift     1.000  i/100ms
       new_csv_shift     1.000  i/100ms
Calculating -------------------------------------
           csv_shift      1.192  (± 0.0%) i/s -      6.000  in 5.034250s
       new_csv_shift      1.527  (± 0.0%) i/s -      8.000  in 5.243446s

Comparison:
       new_csv_shift:        1.5 i/s
           csv_shift:        1.2 i/s - 1.28x  slower

Also available in: Atom PDF