Bug #1740
Closed
Ruby regexp uses 100% CPU
Description
=begin
I am testing on FreeBSD with
ruby 1.8.7 (2009-04-08 patchlevel 160) [i386-freebsd6]
and on my Linux notebook with
ruby 1.8.7 (2009-06-08 patchlevel 173) [x86_64-linux]
using this code:
#######################################
require 'open-uri'
$KCODE = 'u'
reg = %r{<.?div\sclass\s*=\s*.entry.?>[^<]<.?img\ssrc\s*=\s*.([^"|']).?>[^<]<.?p\sclass\s=\s*.date.?>}im
#del = %r{<(?!p|div|img)[^>]>}i
doc = open('http://www.radiokvit.com.ua/?p=1895').read
#doc.gsub!(del, ' ')
a = doc.match(reg)
p a
######################################
My Ruby process uses 100% CPU for a long time. On Linux it eventually exits normally; on FreeBSD it never exits %-(.
I have submitted another bug for FreeBSD, http://www.freebsd.org/cgi/query-pr.cgi?pr=136384, but that one concerns FreeBSD only.
This pattern was written by someone else for Perl, and I have to use it here.
=end
Updated by rue (Eero Saynatkari) over 15 years ago
=begin
Excerpts from rubymine message of Tue Jul 07 17:38:10 +0300 2009:
> reg = %r{<.?div\sclass\s*=\s*.entry.?>[^<]<.?img\ssrc\s*=\s*.([^"|']).?>[^<]<.?p\sclass\s=\s*.date.?>}im
> #del = %r{<(?!p|div|img)[^>]>}i
>
> My ruby process use 100% cpu for long time and on linux exit normaly, on
> freebsd no exit %-(.
> I'm submited another bug for freebsd
> http://www.freebsd.org/cgi/query-pr.cgi?pr=136384 but this is for freebsd only.
>
> This templates writes another man for perl and i'm must use here.

Firstly, Ruby regexps are not PCRE, so you must allow some
leeway when constructing the regexp. You cannot (necessarily)
just drop the Perl version in and expect it to work, or
to work the same.
Secondly, you should be using something like Nokogiri or
Hpricot rather than "parsing" the HTML yourself. For example,
your div matcher will fail if the attribute is quoted.
Thirdly, it has "pathological" written all over it. You
should refactor the regexp to try to get some small case
that is reproducible to illustrate the actual problem to
see if it is something that should be fixed.
I am pretty sure there was another thread about really bad
regexp performance in a pathological case a while back, if
you want to search the archives.
Eero
Magic is insufficiently advanced technology.
=end
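A small reproducible case of the kind suggested in the comment above might look like the following sketch. The pattern `(a+)+$`, the input strings, and the helper name `time_failed_match` are illustrative assumptions, not taken from the reported regexp; they only show how a failing match with nested quantifiers can be timed for growing inputs. Newer Ruby versions may optimize this particular pattern, so the timings are not guaranteed to grow.

```ruby
require 'benchmark'

# Hypothetical minimal case: nested quantifiers followed by an anchor that
# cannot match, so a backtracking engine may try many ways to split the
# run of "a" characters before giving up.
PATHOLOGICAL = /(a+)+$/

# Returns [match_result, seconds] for an input of n "a" characters plus "b".
def time_failed_match(n)
  input = ('a' * n) + 'b'
  result = nil
  seconds = Benchmark.realtime { result = PATHOLOGICAL.match(input) }
  [result, seconds]
end

[10, 14, 18].each do |n|
  result, seconds = time_failed_match(n)
  puts format('n=%2d match=%p time=%.4fs', n, result, seconds)
end
```

If the time grows sharply as `n` increases while the match keeps failing, the pattern is a candidate for the kind of pathological backtracking described here.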
Updated by nobu (Nobuyoshi Nakada) over 15 years ago
- ruby -v changed from ruby 1.8.7 (2009-06-08 patchlevel 173) [x86_64-linux] to ruby 1.8.7 (2009-04-08 patchlevel 160) [i386-freebsd6]
=begin
=end
Updated by nobu (Nobuyoshi Nakada) over 15 years ago
- Status changed from Open to Rejected
=begin
Too many backtracks consume a lot of time.
You can use (?>...) to suppress backtracking:
reg = %r{(?><div\sclass\s=\s*.entry.?>.?<img\b[^<>]\s+src\s=\s*.([^\"|\']).?>).?<p\sclass\s*=\s*.date.*?>}im
=end
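The effect of `(?>...)` mentioned in the comment above can be seen on a much smaller pattern. This is a minimal sketch with made-up patterns and input (the constant names are hypothetical), not a rewrite of the reported regexp.

```ruby
TEXT = '"Quote"'

# Without an atomic group, ".*" first swallows the closing quote, then
# gives one character back so the final '"' can match.
GREEDY = /".*"/

# Inside (?>...), whatever ".*" consumed is kept: the engine may not
# backtrack into the group, so the final '"' has nothing left to match.
ATOMIC = /"(?>.*)"/

# Excluding the quote from the atomic part lets the match succeed
# without needing to backtrack into the group at all.
ATOMIC_OK = /"(?>[^"]*)"/

p GREEDY.match(TEXT)
p ATOMIC.match(TEXT)    # => nil
p ATOMIC_OK.match(TEXT)
```

The trade-off is that an atomic group can turn a slow match into a fast failure, so the sub-pattern inside it has to be written so that it never needs to give characters back.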
Updated by paranormal (paranormal dev) over 15 years ago
=begin
I am rewriting one big program and am writing a compatibility layer until all the refactoring is done. I know this regexp is bad, but it comes from that program.
=end