Bug #9221
closedTime.parse performance becomes exponentially worse as string length grows
Description
See attached script. Output:
parsing 12:00 PM fffffffffffffffff
2013-12-05 12:00:00 -0500
elapsed: 0.010443
parsing 12:00 PM ffffffffffffffffff
2013-12-05 12:00:00 -0500
elapsed: 0.017739
parsing 12:00 PM fffffffffffffffffff
2013-12-05 12:00:00 -0500
elapsed: 0.028127
parsing 12:00 PM ffffffffffffffffffff
2013-12-05 12:00:00 -0500
elapsed: 0.049885
parsing 12:00 PM fffffffffffffffffffff
2013-12-05 12:00:00 -0500
elapsed: 0.071379
parsing 12:00 PM ffffffffffffffffffffff
2013-12-05 12:00:00 -0500
elapsed: 0.112612
parsing 12:00 PM fffffffffffffffffffffff
2013-12-05 12:00:00 -0500
elapsed: 0.184517
parsing 12:00 PM ffffffffffffffffffffffff
2013-12-05 12:00:00 -0500
elapsed: 0.293784
parsing 12:00 PM fffffffffffffffffffffffff
2013-12-05 12:00:00 -0500
elapsed: 0.476253
parsing 12:00 PM ffffffffffffffffffffffffff
2013-12-05 12:00:00 -0500
elapsed: 0.786087
parsing 12:00 PM fffffffffffffffffffffffffff
2013-12-05 12:00:00 -0500
elapsed: 1.256976
parsing 12:00 PM ffffffffffffffffffffffffffff
2013-12-05 12:00:00 -0500
elapsed: 2.019426
parsing 12:00 PM fffffffffffffffffffffffffffff
2013-12-05 12:00:00 -0500
elapsed: 3.300646
parsing 12:00 PM ffffffffffffffffffffffffffffff
2013-12-05 12:00:00 -0500
elapsed: 5.37757
parsing 12:00 PM fffffffffffffffffffffffffffffff
2013-12-05 12:00:00 -0500
elapsed: 8.763601
parsing 12:00 PM ffffffffffffffffffffffffffffffff
2013-12-05 12:00:00 -0500
elapsed: 14.165842
parsing 12:00 PM fffffffffffffffffffffffffffffffff
2013-12-05 12:00:00 -0500
elapsed: 23.888907
...
Files
Updated by duerst (Martin Dürst) about 11 years ago
This runs in no time on ruby 2.0.0p247 (2013-06-27) [i386-mingw32] (same patch level!). Can you give more details about your environment?
Updated by mpelzsherman (Michael Pelz-Sherman) about 11 years ago
duerst (Martin Dürst) wrote:
This runs in no time on ruby 2.0.0p247 (2013-06-27) [i386-mingw32] (same patch level!). Can you give more details about your environment?
Model Name: MacBook Pro
Model Identifier: MacBookPro9,1
Processor Name: Intel Core i7
Processor Speed: 2.3 GHz
Number of Processors: 1
Total Number of Cores: 4
L2 Cache (per Core): 256 KB
L3 Cache: 6 MB
Memory: 16 GB
Boot ROM Version: MBP91.00D3.B08
SMC Version (system): 2.1f173
Serial Number (system): C02JP19HF1G3
Updated by mpelzsherman (Michael Pelz-Sherman) about 11 years ago
Running ruby-2.0.0-p247 [ x86_64 ], installed using RVM.
Updated by Anonymous about 11 years ago
Reproduced on trunk:
irb(main):027:0> RUBY_DESCRIPTION
=> "ruby 2.1.0dev (2013-12-07 trunk 44044) [x86_64-darwin13.0]"
irb(main):028:0> Benchmark.measure { Time.parse("12:00 PM " + "f"*30) }.to_s
=> " 4.760000 0.000000 4.760000 ( 4.759983)\n"
irb(main):029:0> Benchmark.measure { Time.parse("12:00 PM " + "f"*31) }.to_s
=> " 7.770000 0.000000 7.770000 ( 7.787989)\n"
irb(main):030:0> Benchmark.measure { Time.parse("12:00 PM " + "f"*32) }.to_s
=> " 12.390000 0.000000 12.390000 ( 12.405786)\n"
Updated by hsbt (Hiroshi SHIBATA) about 11 years ago
- Target version set to 2.1.0
Updated by phasis68 (Heesob Park) about 11 years ago
It can be reproduced on ruby 2.0.0-p247 i386-mingw32.
I modified a test code to extension library free version.
C:\work>ruby -v
ruby 2.0.0p247 (2013-06-27) [i386-mingw32]
C:\work>type test.rb
require 'benchmark'
str = "12:00 PM fffffffffffffffffffffffffffff"
pat=/((?:\d+\s*:\s*\d+(?:\s*:\s*\d+(?:[,.]\d*)?)?|\d+\sh(?:\s\d+m?(?:\s*\d+s?)
?)?)(?:\s*ap)?|\d+\s*ap)(?:\s*((?:gmt|utc?)?[-+]\d
+(?:[,.:]\d+(?::\d+)?)?|[[:alpha:].\s]+(?:standard|daylight)\stime\b|[[:alpha:]]
+(?:\sdst)?\b))?/i
p Benchmark.measure { pat.match(str) }
C:\work>ruby test.rb
#<Benchmark::Tms:0x24a3440 @label="", @real=3.303211, @cstime=0.0, @cutime=0.0,
@stime=0.0, @utime=3.2969999999999997, @total=3.2969999999999997>
Interestingly, this issue is not reproduced on irb session or without filename.
C:\work>ruby < test.rb
#<Benchmark::Tms:0x24cf4e0 @label="", @real=0.0, @cstime=0.0, @cutime=0.0, @stim
e=0.0, @utime=0.0, @total=0.0>
C:\work>irb
DL is deprecated, please use Fiddle
irb(main):001:0> require 'benchmark'
=> true
irb(main):002:0> str = "12:00 PM fffffffffffffffffffffffffffff"
=> "12:00 PM fffffffffffffffffffffffffffff"
<:].\s]+(?:standard|daylight)\stime\b|[[:alpha:]]+(?:\sdst)?\b))?/i
=> /((?:\d+\s*:\s*\d+(?:\s*:\s*\d+(?:[,.]\d*)?)?|\d+\sh(?:\s\d+m?(?:\s*\d+s?)?
)?)(?:\s*ap)?|\d+\s*ap)(?:\s*((?:gmt|utc?)?[-+]\d+
(?:[,.:]\d+(?::\d+)?)?|[[:alpha:].\s]+(?:standard|daylight)\stime\b|[[:alpha:]]+
(?:\sdst)?\b))?/i
irb(main):004:0> p Benchmark.measure { pat.match(str) }
#<Benchmark::Tms:0x27059a0 @label="", @real=0.0, @cstime=0.0, @cutime=0.0, @stim
e=0.0, @utime=0.0, @total=0.0>
=> #<Benchmark::Tms:0x27059a0 @label="", @real=0.0, @cstime=0.0, @cutime=0.0, @s
time=0.0, @utime=0.0, @total=0.0>
Updated by akr (Akira Tanaka) about 11 years ago
- Status changed from Open to Assigned
- Assignee set to tadf (tadayoshi funaba)
I assigned this issue to tadf because the pattern
is written in ext/date/date_parse.c.
Updated by phasis68 (Heesob Park) about 11 years ago
I think this is an issue of Oniguruma(regular expressions library).
Here is a shortest code of producing this issue.
p /(?:[[:alpha:]]+\s)?/i.match("f"*40)
The [:alpha:] bracket and case ignore flag combination shows bad performance.
The simple workaround is using [a-z] instead of [:alpha:].
Updated by nobu (Nobuyoshi Nakada) about 11 years ago
[[:alpha:]] doesn't equal to [a-z], the former matches unicode alphabet categories.
Updated by nobu (Nobuyoshi Nakada) about 11 years ago
- Status changed from Assigned to Closed
- % Done changed from 0 to 100
This issue was solved with changeset r44086.
Michael, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.
date_parse.c: get rid of backtrack explosion
- ext/date/date_parse.c (parse_time): unset case-insensitive flag
for [:alpha:], which already implies both cases, to get rid of
backtrack explosion. [ruby-core:58876] [Bug #9221]
Updated by phasis68 (Heesob Park) about 11 years ago
I know [[:alpha:]] is not equal to [a-z].
But, it seems that the current time parsing is only for ascii string.
BTW, date_strptime.c has the same regular expression on line 567-572.
It should be modified also.
Updated by nagachika (Tomoyuki Chikanaga) about 11 years ago
- Status changed from Closed to Open
- Backport changed from 1.9.3: UNKNOWN, 2.0.0: UNKNOWN to 1.9.3: REQUIRED, 2.0.0: REQUIRED
Thank you for your notice. I'll re-open this ticket.
And I confirmed ruby_1_9_3 and ruby_2_0_0 have these regular expressions.
Updated by headius (Charles Nutter) about 11 years ago
Also affects JRuby: https://github.com/jruby/jruby/issues/1319
Updated by nobu (Nobuyoshi Nakada) about 11 years ago
- Status changed from Open to Closed
This issue was solved with changeset r44126.
Michael, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.
date_strptime.c: get rid of backtrack explosion
- ext/date/date_strptime.c (date__strptime_internal): unset
case-insensitive flag for [:alpha:], which already implies both
cases, to get rid of backtrack explosion. [ruby-core:58984]
[Bug #9221]
Updated by nagachika (Tomoyuki Chikanaga) about 11 years ago
- Backport changed from 1.9.3: REQUIRED, 2.0.0: REQUIRED to 1.9.3: REQUIRED, 2.0.0: DONE
r44086 and r44126 are backported to ruby_2_0_0 branch at r44126.
Updated by usa (Usaku NAKAMURA) almost 11 years ago
- Backport changed from 1.9.3: REQUIRED, 2.0.0: DONE to 1.9.3: DONE, 2.0.0: DONE
backported into ruby_1_9_3 at r44746.