Project

General

Profile

Actions

Bug #9221

closed

Time.parse performance becomes exponentially worse as string length grows

Added by mpelzsherman (Michael Pelz-Sherman) about 11 years ago. Updated almost 11 years ago.

Status:
Closed
Target version:
ruby -v:
ruby-2.0.0-p247
[ruby-core:58876]

Description

See attached script. Output:

parsing 12:00 PM fffffffffffffffff
2013-12-05 12:00:00 -0500
elapsed: 0.010443
parsing 12:00 PM ffffffffffffffffff
2013-12-05 12:00:00 -0500
elapsed: 0.017739
parsing 12:00 PM fffffffffffffffffff
2013-12-05 12:00:00 -0500
elapsed: 0.028127
parsing 12:00 PM ffffffffffffffffffff
2013-12-05 12:00:00 -0500
elapsed: 0.049885
parsing 12:00 PM fffffffffffffffffffff
2013-12-05 12:00:00 -0500
elapsed: 0.071379
parsing 12:00 PM ffffffffffffffffffffff
2013-12-05 12:00:00 -0500
elapsed: 0.112612
parsing 12:00 PM fffffffffffffffffffffff
2013-12-05 12:00:00 -0500
elapsed: 0.184517
parsing 12:00 PM ffffffffffffffffffffffff
2013-12-05 12:00:00 -0500
elapsed: 0.293784
parsing 12:00 PM fffffffffffffffffffffffff
2013-12-05 12:00:00 -0500
elapsed: 0.476253
parsing 12:00 PM ffffffffffffffffffffffffff
2013-12-05 12:00:00 -0500
elapsed: 0.786087
parsing 12:00 PM fffffffffffffffffffffffffff
2013-12-05 12:00:00 -0500
elapsed: 1.256976
parsing 12:00 PM ffffffffffffffffffffffffffff
2013-12-05 12:00:00 -0500
elapsed: 2.019426
parsing 12:00 PM fffffffffffffffffffffffffffff
2013-12-05 12:00:00 -0500
elapsed: 3.300646
parsing 12:00 PM ffffffffffffffffffffffffffffff
2013-12-05 12:00:00 -0500
elapsed: 5.37757
parsing 12:00 PM fffffffffffffffffffffffffffffff
2013-12-05 12:00:00 -0500
elapsed: 8.763601
parsing 12:00 PM ffffffffffffffffffffffffffffffff
2013-12-05 12:00:00 -0500
elapsed: 14.165842
parsing 12:00 PM fffffffffffffffffffffffffffffffff
2013-12-05 12:00:00 -0500
elapsed: 23.888907
...


Files

time_parse.rb (237 Bytes) time_parse.rb script to demonstrate Time.parse performance degredation mpelzsherman (Michael Pelz-Sherman), 12/05/2013 10:53 PM

Updated by duerst (Martin Dürst) about 11 years ago

This runs in no time on ruby 2.0.0p247 (2013-06-27) [i386-mingw32] (same patch level!). Can you give more details about your environment?

Updated by mpelzsherman (Michael Pelz-Sherman) about 11 years ago

duerst (Martin Dürst) wrote:

This runs in no time on ruby 2.0.0p247 (2013-06-27) [i386-mingw32] (same patch level!). Can you give more details about your environment?

Model Name: MacBook Pro
Model Identifier: MacBookPro9,1
Processor Name: Intel Core i7
Processor Speed: 2.3 GHz
Number of Processors: 1
Total Number of Cores: 4
L2 Cache (per Core): 256 KB
L3 Cache: 6 MB
Memory: 16 GB
Boot ROM Version: MBP91.00D3.B08
SMC Version (system): 2.1f173
Serial Number (system): C02JP19HF1G3

Updated by mpelzsherman (Michael Pelz-Sherman) about 11 years ago

Running ruby-2.0.0-p247 [ x86_64 ], installed using RVM.

Updated by Anonymous about 11 years ago

Reproduced on trunk:

irb(main):027:0> RUBY_DESCRIPTION
=> "ruby 2.1.0dev (2013-12-07 trunk 44044) [x86_64-darwin13.0]"
irb(main):028:0> Benchmark.measure { Time.parse("12:00 PM " + "f"*30) }.to_s
=> " 4.760000 0.000000 4.760000 ( 4.759983)\n"
irb(main):029:0> Benchmark.measure { Time.parse("12:00 PM " + "f"*31) }.to_s
=> " 7.770000 0.000000 7.770000 ( 7.787989)\n"
irb(main):030:0> Benchmark.measure { Time.parse("12:00 PM " + "f"*32) }.to_s
=> " 12.390000 0.000000 12.390000 ( 12.405786)\n"

Updated by hsbt (Hiroshi SHIBATA) about 11 years ago

  • Target version set to 2.1.0

Updated by phasis68 (Heesob Park) about 11 years ago

It can be reproduced on ruby 2.0.0-p247 i386-mingw32.

I modified a test code to extension library free version.

C:\work>ruby -v
ruby 2.0.0p247 (2013-06-27) [i386-mingw32]

C:\work>type test.rb
require 'benchmark'
str = "12:00 PM fffffffffffffffffffffffffffff"
pat=/((?:\d+\s*:\s*\d+(?:\s*:\s*\d+(?:[,.]\d*)?)?|\d+\sh(?:\s\d+m?(?:\s*\d+s?)
?)?)(?:\s*ap)?|\d+\s*ap)(?:\s*((?:gmt|utc?)?[-+]\d
+(?:[,.:]\d+(?::\d+)?)?|[[:alpha:].\s]+(?:standard|daylight)\stime\b|[[:alpha:]]
+(?:\sdst)?\b))?/i
p Benchmark.measure { pat.match(str) }

C:\work>ruby test.rb
#<Benchmark::Tms:0x24a3440 @label="", @real=3.303211, @cstime=0.0, @cutime=0.0,
@stime=0.0, @utime=3.2969999999999997, @total=3.2969999999999997>

Interestingly, this issue is not reproduced on irb session or without filename.

C:\work>ruby < test.rb
#<Benchmark::Tms:0x24cf4e0 @label="", @real=0.0, @cstime=0.0, @cutime=0.0, @stim
e=0.0, @utime=0.0, @total=0.0>

C:\work>irb
DL is deprecated, please use Fiddle
irb(main):001:0> require 'benchmark'
=> true
irb(main):002:0> str = "12:00 PM fffffffffffffffffffffffffffff"
=> "12:00 PM fffffffffffffffffffffffffffff"
<:].\s]+(?:standard|daylight)\stime\b|[[:alpha:]]+(?:\sdst)?\b))?/i
=> /((?:\d+\s*:\s*\d+(?:\s*:\s*\d+(?:[,.]\d*)?)?|\d+\sh(?:\s\d+m?(?:\s*\d+s?)?
)?)(?:\s*ap)?|\d+\s*ap)(?:\s*((?:gmt|utc?)?[-+]\d+
(?:[,.:]\d+(?::\d+)?)?|[[:alpha:].\s]+(?:standard|daylight)\stime\b|[[:alpha:]]+
(?:\sdst)?\b))?/i
irb(main):004:0> p Benchmark.measure { pat.match(str) }
#<Benchmark::Tms:0x27059a0 @label="", @real=0.0, @cstime=0.0, @cutime=0.0, @stim
e=0.0, @utime=0.0, @total=0.0>
=> #<Benchmark::Tms:0x27059a0 @label="", @real=0.0, @cstime=0.0, @cutime=0.0, @s
time=0.0, @utime=0.0, @total=0.0>

Updated by akr (Akira Tanaka) about 11 years ago

  • Status changed from Open to Assigned
  • Assignee set to tadf (tadayoshi funaba)

I assigned this issue to tadf because the pattern
is written in ext/date/date_parse.c.

Updated by phasis68 (Heesob Park) about 11 years ago

I think this is an issue of Oniguruma(regular expressions library).

Here is a shortest code of producing this issue.
p /(?:[[:alpha:]]+\s)?/i.match("f"*40)

The [:alpha:] bracket and case ignore flag combination shows bad performance.

The simple workaround is using [a-z] instead of [:alpha:].

Updated by nobu (Nobuyoshi Nakada) about 11 years ago

[[:alpha:]] doesn't equal to [a-z], the former matches unicode alphabet categories.

Actions #10

Updated by nobu (Nobuyoshi Nakada) about 11 years ago

  • Status changed from Assigned to Closed
  • % Done changed from 0 to 100

This issue was solved with changeset r44086.
Michael, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.


date_parse.c: get rid of backtrack explosion

  • ext/date/date_parse.c (parse_time): unset case-insensitive flag
    for [:alpha:], which already implies both cases, to get rid of
    backtrack explosion. [ruby-core:58876] [Bug #9221]

Updated by phasis68 (Heesob Park) about 11 years ago

I know [[:alpha:]] is not equal to [a-z].
But, it seems that the current time parsing is only for ascii string.

BTW, date_strptime.c has the same regular expression on line 567-572.
It should be modified also.

Updated by nagachika (Tomoyuki Chikanaga) about 11 years ago

  • Status changed from Closed to Open
  • Backport changed from 1.9.3: UNKNOWN, 2.0.0: UNKNOWN to 1.9.3: REQUIRED, 2.0.0: REQUIRED

Thank you for your notice. I'll re-open this ticket.
And I confirmed ruby_1_9_3 and ruby_2_0_0 have these regular expressions.

Actions #14

Updated by nobu (Nobuyoshi Nakada) about 11 years ago

  • Status changed from Open to Closed

This issue was solved with changeset r44126.
Michael, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.


date_strptime.c: get rid of backtrack explosion

  • ext/date/date_strptime.c (date__strptime_internal): unset
    case-insensitive flag for [:alpha:], which already implies both
    cases, to get rid of backtrack explosion. [ruby-core:58984]
    [Bug #9221]

Updated by nagachika (Tomoyuki Chikanaga) almost 11 years ago

  • Backport changed from 1.9.3: REQUIRED, 2.0.0: REQUIRED to 1.9.3: REQUIRED, 2.0.0: DONE

r44086 and r44126 are backported to ruby_2_0_0 branch at r44126.

Updated by usa (Usaku NAKAMURA) almost 11 years ago

  • Backport changed from 1.9.3: REQUIRED, 2.0.0: DONE to 1.9.3: DONE, 2.0.0: DONE

backported into ruby_1_9_3 at r44746.

Actions

Also available in: Atom PDF

Like0
Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0