Ruby Issue Tracking System (https://redmine.ruby-lang.org/)
Ruby master - Bug #17774: Quantified empty group causes regex to fail
mame (Yusuke Endoh) mame@ruby-lang.org, 2021-04-05T06:16:13Z, https://redmine.ruby-lang.org/issues/17774?journal_id=91317
<ul></ul><p>Thank you, I can reproduce the issue.</p>
<p>The issue is in the code from <a href="https://github.com/k-takata/Onigmo" class="external">onigmo</a>, so it would be helpful if you could report this issue upstream.</p>
<p>From a quick investigation, an optimization expands <code>(){4}</code> but does not expand <code>(){5}</code>, which accounts for the difference in behavior.<br>
Enabling debug output suggests that the bug is caused by the <code>USE_MONOMANIAC_CHECK_CAPTURES_IN_ENDLESS_REPEAT</code> option. The <code>(){5}</code> case works correctly with the following change, which disables the option, but I'm unsure of the performance impact.</p>
<pre><code>diff --git a/regint.h b/regint.h
index 0740429688..968ea6cde8 100644
--- a/regint.h
+++ b/regint.h
@@ -71,7 +71,6 @@
 #define USE_PERL_SUBEXP_CALL
 #define USE_CAPITAL_P_NAMED_GROUP
 #define USE_BACKREF_WITH_LEVEL /* \k<name+n>, \k<name-n> */
-#define USE_MONOMANIAC_CHECK_CAPTURES_IN_ENDLESS_REPEAT /* /(?:()|())*\2/ */
 #define USE_NEWLINE_AT_END_OF_STRING_HAS_EMPTY_LINE /* /\n$/ =~ "\n" */
 #define USE_WARNING_REDUNDANT_NESTED_REPEAT_OPERATOR
 /* !!! moved to regenc.h. */ /* #define USE_CRNL_AS_LINE_TERMINATOR */
</code></pre>
wanabe (_ wanabe) s.wanabe@gmail.com, 2021-05-01T11:06:58Z, https://redmine.ruby-lang.org/issues/17774?journal_id=91776
<ul></ul><p>The reproduction example could be a bit shorter.</p>
<pre><code>$ ruby -ve 'p "xxxx" =~ /(?:x(){5})*$/, "xxxx" =~ /(?:x(){4})*$/'
ruby 3.1.0dev (2021-05-01T02:04:17Z origin/master 121fa24a34) [x86_64-linux]
3
0
</code></pre>
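<p>To see where the behavior changes, the repetition count of the empty group can be varied. The following is only a minimal sketch: the helper name <code>probe_empty_group_repeats</code> and the range of counts are assumptions made for illustration, and the positions it reports depend on the regex engine bundled with the Ruby that runs it.</p>

```ruby
# Probe how the match position changes with the repetition count of the
# empty group. The helper name and the default range are hypothetical;
# the reported positions depend on the regex engine of the running Ruby.
def probe_empty_group_repeats(counts = 1..8, subject = "xxxx")
  counts.to_h do |n|
    # Build the pattern dynamically, e.g. /(?:x(){5})*$/ for n = 5.
    pattern = Regexp.new("(?:x(){#{n}})*$")
    [n, subject =~ pattern]
  end
end

results = probe_empty_group_repeats
results.each { |n, pos| puts "(){#{n}} -> match at #{pos.inspect}" }
```

<p>On a build affected by this issue, the entry for a count of 5 would be expected to differ from the entry for a count of 4, as in the reproduction above.</p>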
<p>This problem has already been fixed in Oniguruma, a derivative of Onigmo.<br>
<a href="https://github.com/kkos/oniguruma/commit/ca64663ca8bb34ca7dc219d18ec6e475cca9dec8" class="external">https://github.com/kkos/oniguruma/commit/ca64663ca8bb34ca7dc219d18ec6e475cca9dec8</a></p>
<pre><code>$ (git checkout ca64663ca8bb34ca7dc219d18ec6e475cca9dec8~ && autoreconf -vfi && ./configure && make -j6 && sed -i sample/simple.c -e 's/\(pattern *= [^"]*\)"[^"]*"/\1"(?:x(){5})*$"/' -e 's/\(str *= [^"]*\)"[^"]*"/\1"xxxx"/' && (cd sample; make simple)) > build.log 2>&1 && ./sample/simple
match at 3
0: (3-4)
1: (4-4)
$ (git checkout ca64663ca8bb34ca7dc219d18ec6e475cca9dec8 && autoreconf -vfi && ./configure && make -j6 && sed -i sample/simple.c -e 's/\(pattern *= [^"]*\)"[^"]*"/\1"(?:x(){5})*$"/' -e 's/\(str *= [^"]*\)"[^"]*"/\1"xxxx"/' && (cd sample; make simple)) > build.log 2>&1 && ./sample/simple
match at 0
0: (0-4)
1: (4-4)
</code></pre>
<p>I think that introducing a mechanism that exists in Oniguruma 6.x, such as <code>empty_status_mem</code> and <code>set_empty_status_check_trav</code>, may solve the problem.</p>
sawa (Tsuyoshi Sawada), 2021-05-01T12:50:10Z, https://redmine.ruby-lang.org/issues/17774?journal_id=91777
<ul></ul><p>wanabe (_ wanabe) wrote in <a href="#note-2">#note-2</a>:</p>
<blockquote>
<p>... Oniguruma, a derivative of Onigmo</p>
</blockquote>
<p>I believe it is the other way around.</p>
wanabe (_ wanabe) s.wanabe@gmail.com, 2021-05-01T20:06:54Z, https://redmine.ruby-lang.org/issues/17774?journal_id=91782
<ul></ul><p>sawa (Tsuyoshi Sawada) wrote in <a href="#note-3">#note-3</a>:</p>
<blockquote>
<p>wanabe (_ wanabe) wrote in <a href="#note-2">#note-2</a>:</p>
<blockquote>
<p>... Oniguruma, a derivative of Onigmo</p>
</blockquote>
<p>I believe it is the other way around.</p>
</blockquote>
<p>Oh, I'm very sorry, I wrote it wrong.<br>
I was aware of it, but I simply used the wrong word.</p>
jeremyevans0 (Jeremy Evans) merch-redmine@jeremyevans.net, 2021-10-13T16:43:27Z, https://redmine.ruby-lang.org/issues/17774?journal_id=94123
<ul></ul><p>I looked into fixing this by removing the define of <code>USE_MONOMANIAC_CHECK_CAPTURES_IN_ENDLESS_REPEAT</code>, as <a class="user active user-mention" href="https://redmine.ruby-lang.org/users/18">@mame (Yusuke Endoh)</a> indicated: <a href="https://github.com/ruby/ruby/commit/018922ba15eb7aea86957789d7defae9ffc43688" class="external">https://github.com/ruby/ruby/commit/018922ba15eb7aea86957789d7defae9ffc43688</a></p>
<p>It ends up breaking a few specs. For example, it changes the behavior of:</p>
<pre><code class="ruby syntaxhl" data-language="ruby"><span class="sr">/(a|\2b|())*/</span><span class="p">.</span><span class="nf">match</span><span class="p">(</span><span class="s2">"aaabbb"</span><span class="p">).</span><span class="nf">to_a</span>
<span class="c1"># Before:</span>
<span class="c1"># => ["aaabbb", "", ""]</span>
<span class="c1"># After:</span>
<span class="c1"># => ["aaa", "", ""]</span>
</code></pre>
<p>For this example, Ruby 1.8 returns <code>["aaa", "a", nil]</code>. The equivalent in Perl returns <code>["aaa", "", ""]</code>. The equivalent in Python 2 and 3 returns <code>["aaabbb", "", ""]</code>. I think the <code>["aaabbb", "", ""]</code> result seems best for a greedy match, since it matches the most characters. However, I can also see why an implementation would return one of the other results if a scan terminates when an iteration makes no forward progress.</p>
<p>Anyway, if we are OK with this behavior change for empty capture groups, I can submit the commit as a pull request. However, I think it would be better to wait for a fix in Onigmo.</p>
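<p>To check which of these behaviors a given Ruby build has, something like the following could be used. It is a rough sketch only: the constant <code>KNOWN_RESULTS</code> and the method <code>classify_empty_capture_behavior</code> are hypothetical names, and the table just restates the results quoted in this comment.</p>

```ruby
# Results for /(a|\2b|())*/ on "aaabbb" as quoted in this thread.
# The constant and method names are hypothetical, for illustration only.
KNOWN_RESULTS = {
  ["aaabbb", "", ""] => "Ruby with the option defined; Python 2 and 3",
  ["aaa", "", ""]    => "Ruby with the option removed; Perl",
  ["aaa", "a", nil]  => "Ruby 1.8",
}.freeze

def classify_empty_capture_behavior
  captures = /(a|\2b|())*/.match("aaabbb").to_a
  label = KNOWN_RESULTS.fetch(captures, "not one of the results listed in this thread")
  [captures, label]
end

captures, label = classify_empty_capture_behavior
puts "#{captures.inspect}: #{label}"
```

<p>Running this before and after a candidate change makes it easy to see whether the change alters the result for this spec.</p>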