Ruby Issue Tracking System: Ruby master - Bug #11014: String#partition doesn't return correct result on zero-width match
https://redmine.ruby-lang.org/issues/11014
2015-04-18T17:09:45Z, nobu (Nobuyoshi Nakada), nobu@ruby-lang.org
<ul><li><strong>Description</strong> updated</li><li><strong>Status</strong> changed from <i>Open</i> to <i>Assigned</i></li><li><strong>Assignee</strong> set to <i>matz (Yukihiro Matsumoto)</i></li></ul><p>These methods were taken from Python, and Python seems to behave the same way.<br>
I'm not sure what the rationale for this behavior is.</p>
2020-01-07T10:04:40Z, sawa (Tsuyoshi Sawada)
<p>The problem is not limited to <code>partition</code>; it also involves <code>split</code> and <code>scan</code>.</p>
<p>I think your regex <code>/^=*/</code> is unnecessarily complex. The same point can be made with <code>/\A/</code>, which is simpler.</p>
<p>I tried four regex patterns, <code>/\A/</code>, <code>/\A.*/</code>, <code>/\z/</code>, and <code>/.*\z/</code>, and compared the methods <code>split</code>, <code>partition</code>, and <code>scan</code> against <code>match</code>. In each of the next two groups, the first example (the <code>match</code> result split into pre-match, match, and post-match) agrees with the second and third, and the fourth (<code>scan</code>) returns just the middle element. So far, so good.</p>
<pre><code class="ruby syntaxhl" data-language="ruby"><span class="s2">"foo"</span><span class="p">.</span><span class="nf">match</span><span class="p">(</span><span class="sr">/\z/</span><span class="p">).</span><span class="nf">then</span><span class="p">{[</span><span class="n">_1</span><span class="p">.</span><span class="nf">pre_match</span><span class="p">,</span> <span class="n">_1</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">_1</span><span class="p">.</span><span class="nf">post_match</span><span class="p">]}</span> <span class="c1"># => ["foo", "", ""]</span>
<span class="s2">"foo"</span><span class="p">.</span><span class="nf">split</span><span class="p">(</span><span class="sr">/(\z)/</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="c1"># => ["foo", "", ""]</span>
<span class="s2">"foo"</span><span class="p">.</span><span class="nf">partition</span><span class="p">(</span><span class="sr">/\z/</span><span class="p">)</span> <span class="c1"># => ["foo", "", ""]</span>
<span class="s2">"foo"</span><span class="p">.</span><span class="nf">scan</span><span class="p">(</span><span class="sr">/\z/</span><span class="p">)</span> <span class="c1"># => [""]</span>
</code></pre>
<pre><code class="ruby syntaxhl" data-language="ruby"><span class="s2">"foo"</span><span class="p">.</span><span class="nf">match</span><span class="p">(</span><span class="sr">/\A.*/</span><span class="p">).</span><span class="nf">then</span><span class="p">{[</span><span class="n">_1</span><span class="p">.</span><span class="nf">pre_match</span><span class="p">,</span> <span class="n">_1</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">_1</span><span class="p">.</span><span class="nf">post_match</span><span class="p">]}</span> <span class="c1"># => ["", "foo", ""]</span>
<span class="s2">"foo"</span><span class="p">.</span><span class="nf">split</span><span class="p">(</span><span class="sr">/(\A.*)/</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="c1"># => ["", "foo", ""]</span>
<span class="s2">"foo"</span><span class="p">.</span><span class="nf">partition</span><span class="p">(</span><span class="sr">/\A.*/</span><span class="p">)</span> <span class="c1"># => ["", "foo", ""]</span>
<span class="s2">"foo"</span><span class="p">.</span><span class="nf">scan</span><span class="p">(</span><span class="sr">/\A.*/</span><span class="p">)</span> <span class="c1"># => ["foo"]</span>
</code></pre>
<p>The following two groups, however, show inconsistencies:</p>
<pre><code class="ruby syntaxhl" data-language="ruby"><span class="s2">"foo"</span><span class="p">.</span><span class="nf">match</span><span class="p">(</span><span class="sr">/\A/</span><span class="p">).</span><span class="nf">then</span><span class="p">{[</span><span class="n">_1</span><span class="p">.</span><span class="nf">pre_match</span><span class="p">,</span> <span class="n">_1</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">_1</span><span class="p">.</span><span class="nf">post_match</span><span class="p">]}</span> <span class="c1"># => ["", "", "foo"]</span>
<span class="s2">"foo"</span><span class="p">.</span><span class="nf">split</span><span class="p">(</span><span class="sr">/(\A)/</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="c1"># => ["foo"]</span>
<span class="s2">"foo"</span><span class="p">.</span><span class="nf">partition</span><span class="p">(</span><span class="sr">/\A/</span><span class="p">)</span> <span class="c1"># => ["foo", "", ""]</span>
<span class="s2">"foo"</span><span class="p">.</span><span class="nf">scan</span><span class="p">(</span><span class="sr">/\A/</span><span class="p">)</span> <span class="c1"># => [""]</span>
</code></pre>
<pre><code class="ruby syntaxhl" data-language="ruby"><span class="s2">"foo"</span><span class="p">.</span><span class="nf">match</span><span class="p">(</span><span class="sr">/.*\z/</span><span class="p">).</span><span class="nf">then</span><span class="p">{[</span><span class="n">_1</span><span class="p">.</span><span class="nf">pre_match</span><span class="p">,</span> <span class="n">_1</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">_1</span><span class="p">.</span><span class="nf">post_match</span><span class="p">]}</span> <span class="c1"># => ["", "foo", ""]</span>
<span class="s2">"foo"</span><span class="p">.</span><span class="nf">split</span><span class="p">(</span><span class="sr">/(.*\z)/</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="c1"># => ["", "foo", ""]</span>
<span class="s2">"foo"</span><span class="p">.</span><span class="nf">partition</span><span class="p">(</span><span class="sr">/.*\z/</span><span class="p">)</span> <span class="c1"># => ["", "foo", ""]</span>
<span class="s2">"foo"</span><span class="p">.</span><span class="nf">scan</span><span class="p">(</span><span class="sr">/.*\z/</span><span class="p">)</span> <span class="c1"># => ["foo", ""]</span>
</code></pre>
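<p>The comparison above can be reproduced with a short script. The helper <code>compare_methods</code> is hypothetical (not part of Ruby); it wraps the pattern in a capture group via <code>Regexp#source</code> so that <code>split</code> also returns the separator, mirroring the <code>/(\A)/</code>-style patterns used above. The <code>partition</code> result for <code>/\A/</code> depends on whether your Ruby includes the fix from this ticket, so no outputs are shown here.</p>
<pre><code class="ruby syntaxhl" data-language="ruby"># Hypothetical helper: run one pattern through match, split, partition and
# scan so the four results can be compared side by side.
def compare_methods(str, re)
  m = re.match(str)
  # Assumption: +re+ has no options such as /i or /x, which #source drops.
  captured = Regexp.new("(#{re.source})")
  {
    match:     m ? [m.pre_match, m[0], m.post_match] : nil,
    split:     str.split(captured, -1),
    partition: str.partition(re),
    scan:      str.scan(re)
  }
end

[/\z/, /\A.*/, /\A/, /.*\z/].each do |re|
  puts "#{re.inspect}: #{compare_methods("foo", re).inspect}"
end
</code></pre>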
<p>The problematic cases and their expected values (in terms of consistency) are:</p>
<pre><code class="ruby syntaxhl" data-language="ruby"><span class="s2">"foo"</span><span class="p">.</span><span class="nf">split</span><span class="p">(</span><span class="sr">/(\A)/</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="c1"># => ["foo"], expected [ "", "", "foo"]</span>
<span class="s2">"foo"</span><span class="p">.</span><span class="nf">partition</span><span class="p">(</span><span class="sr">/\A/</span><span class="p">)</span> <span class="c1"># => ["foo", "", ""], expected ["", "", "foo"]</span>
<span class="s2">"foo"</span><span class="p">.</span><span class="nf">scan</span><span class="p">(</span><span class="sr">/.*\z/</span><span class="p">)</span> <span class="c1"># => ["foo", ""], expected ["foo"]</span>
</code></pre>
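<p>The expected <code>partition</code> value listed above is exactly what <code>match</code> reports. A minimal sketch of those semantics, using a hypothetical helper <code>partition_by_match</code> (a model of the expected result, not Ruby's C implementation):</p>
<pre><code class="ruby syntaxhl" data-language="ruby"># Hypothetical helper: partition +str+ around the first match of +re+,
# exactly as MatchData reports it (pre-match, match, post-match).
def partition_by_match(str, re)
  m = re.match(str)
  # No match: like String#partition, return the whole string and two empties.
  return [str.dup, "", ""] if m.nil?
  [m.pre_match, m[0], m.post_match]
end

p partition_by_match("foo", /\A/)  # => ["", "", "foo"]
p partition_by_match("foo", /\z/)  # => ["foo", "", ""]
p partition_by_match("foo", /x/)   # => ["foo", "", ""]
</code></pre>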
<p>The case described in this issue is the <code>partition</code> case above.</p>
2020-01-07T13:23:45Z, Dan0042 (Daniel DeLorme)
<p>IIRC this has to do with zero-length matches being ignored under certain conditions, in particular when a pattern is matched repeatedly.</p>
<p>If <code>"foo".split(/\A/)</code> were <code>["","foo"]</code>,<br>
then <code>"foo".split(//)</code> would have to be <code>["","f","o","o"]</code>,<br>
and <code>"foo".split(/\G/)</code> could end up in an infinite loop matching <code>["","","","","",..."foo"]</code>.</p>
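<p>The usual way to avoid that infinite loop is to "bump along": after an empty match, resume the search one character later. A minimal sketch with a hypothetical helper <code>match_offsets</code>, which collects the start offset of each successive match; Ruby's real <code>split</code> and <code>scan</code> are written in C and handle more cases (limits, captures, multibyte characters):</p>
<pre><code class="ruby syntaxhl" data-language="ruby"># Hypothetical helper: start offsets of successive, non-overlapping matches.
def match_offsets(str, re)
  offsets = []
  pos = 0
  until pos > str.length
    m = re.match(str, pos)   # search starting at character index +pos+
    break if m.nil?
    start, finish = m.offset(0)
    offsets.push(start)
    # An empty match would be found again at the same place forever,
    # so step one character past it.
    pos = finish == start ? finish + 1 : finish
  end
  offsets
end

p match_offsets("foo", //)    # => [0, 1, 2, 3]
p match_offsets("foo", /\A/)  # => [0]
p match_offsets("foo", /o/)   # => [1, 2]
</code></pre>
<p>With <code>//</code> every position, including the end of the string, yields an empty match; whether a method keeps or drops the ones at the edges is where <code>split</code>, <code>scan</code>, and <code>gsub</code> differ.</p>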
<p><del>But I don't understand why <code>partition</code> doesn't behave like <code>match</code>.</del><br>
Ah, probably because it behaves like <code>split(rx,2)</code>.</p>
<p>Note that <code>gsub</code> behaves differently:<br>
<code>"foo".gsub(/\G/,'_') #=> "_f_o_o_"</code><br>
<code>"foo".gsub(//,'_') #=> "_f_o_o_"</code></p>
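<p>For comparison, here is the empty pattern run through three core <code>String</code> methods: <code>gsub</code> and <code>scan</code> keep the empty matches at both ends of the string, while <code>split</code> produces no leading empty field for the match at offset 0.</p>
<pre><code class="ruby syntaxhl" data-language="ruby">s = "foo"
p s.gsub(//, "_")  # => "_f_o_o_"          (empty matches at 0, 1, 2 and 3)
p s.scan(//)       # => ["", "", "", ""]   (one entry per empty match)
p s.split(//)      # => ["f", "o", "o"]    (no leading "" from the match at 0)
</code></pre>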
<p>It is explained better than I ever could here:<br>
<a href="https://www.regular-expressions.info/zerolength.html" class="external">https://www.regular-expressions.info/zerolength.html</a></p>
2020-01-14T02:49:29Z, mame (Yusuke Endoh), mame@ruby-lang.org
<p>We'd like to focus on String#partition in this ticket.</p>
<p>IMO, String#scan and #split are heavily used, so they should not be changed just for the sake of consistency. Please create another ticket if you really want to discuss them. Patch suggestions are also welcome.</p>
2020-01-16T05:31:37Z, akr (Akira Tanaka), akr@fsij.org
<p>nobu (Nobuyoshi Nakada) wrote:</p>
<blockquote>
<p>These methods were taken from Python, and Python seems to behave the same way.<br>
I'm not sure what the rationale for this behavior is.</p>
</blockquote>
<p>I couldn't confirm it.</p>
<pre><code>% python3
Python 3.7.3 (default, Apr 3 2019, 05:39:12)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> "abc".partition("")
Traceback (most recent call last):
  File "&lt;stdin&gt;", line 1, in &lt;module&gt;
ValueError: empty separator
>>>
</code></pre>
<p>The empty separator raises an error in Python.</p>
2020-01-16T05:44:07Z, akr (Akira Tanaka), akr@fsij.org
<p>I feel the current behavior is simply a bug, and <code>"abc".partition(//)</code> should return <code>["", "", "abc"]</code> instead of <code>["abc", "", ""]</code>.</p>
2020-01-16T06:45:05Z, nobu (Nobuyoshi Nakada), nobu@ruby-lang.org
<ul><li><strong>Status</strong> changed from <i>Assigned</i> to <i>Closed</i></li></ul><p>Applied in changeset <a class="changeset" title="Fix `String#partition` Split with the matched part when the separator matches the empty part at ..." href="https://redmine.ruby-lang.org/projects/ruby-master/repository/git/revisions/fce54a5404139a77bd0b7d6f82901083fcb16f1e">git|fce54a5404139a77bd0b7d6f82901083fcb16f1e</a>.</p>
<hr>
<p>Fix <code>String#partition</code></p>
<p>Split with the matched part when the separator matches the empty<br>
part at the beginning. [Bug <a class="issue tracker-1 status-5 priority-4 priority-default closed" title="Bug: String#partition doesn't return correct result on zero-width match (Closed)" href="https://redmine.ruby-lang.org/issues/11014">#11014</a>]</p>
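<p>A quick way to see whether a given Ruby includes this changeset is to partition on an empty match at the start of the string. My assumption is that the fix first shipped in Ruby 3.0, since it landed on master after 2.7.0 was released; without it, the first two lines print <code>["abc", "", ""]</code> and <code>["foo", "", ""]</code>, as reported earlier in this ticket.</p>
<pre><code class="ruby syntaxhl" data-language="ruby"># Empty matches at the very beginning: the case this changeset fixes.
p "abc".partition(//)    # => ["", "", "abc"] with the fix
p "foo".partition(/\A/)  # => ["", "", "foo"] with the fix
# Empty match at the end: correct both before and after the fix.
p "foo".partition(/\z/)  # => ["foo", "", ""]
</code></pre>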