https://redmine.ruby-lang.org/https://redmine.ruby-lang.org/favicon.ico?17113305112016-10-11T08:54:14ZRuby Issue Tracking SystemRuby master - Feature #12745: String#(g)sub(!) should pass a MatchData to the block, not a Stringhttps://redmine.ruby-lang.org/issues/12745?journal_id=608352016-10-11T08:54:14Zakr (Akira Tanaka)akr@fsij.org
<ul></ul><p>matz's old idea: add String#sg (or String#gs).<br>
ruby-dev:33553</p> Ruby master - Feature #12745: String#(g)sub(!) should pass a MatchData to the block, not a Stringhttps://redmine.ruby-lang.org/issues/12745?journal_id=608482016-10-11T11:29:14Zshyouhei (Shyouhei Urabe)shyouhei@ruby-lang.org
<ul></ul><p>We looked at this issue in developer meeting today.</p>
<p>While matz's suggestion of new method was excavated from the past, myself and others argued possibility of extending the gsub method that exists today. Proposals include:</p>
<ol>
<li>what about just passing a MatchData instead of a String. That hurts no one because almost every time when we need a block to pass to gsub etc, what is needed is $1.</li>
<li>what about passing a MatchData <em>in addition</em> to a String like gsub(){|string, md|}. It is 100% safe even when the block parameter is used.</li>
<li>what about extending the block parameter string with a module, like open-uri does, to define additional method.</li>
<li>there are of course attendees who think a separate method is the cleanest solution.</li>
</ol>
<p>My idea was #3 but in reality #2 can be a perfect and easy solution. What do you think matz?</p> Ruby master - Feature #12745: String#(g)sub(!) should pass a MatchData to the block, not a Stringhttps://redmine.ruby-lang.org/issues/12745?journal_id=608542016-10-11T15:24:36Zakr (Akira Tanaka)akr@fsij.org
<ul></ul><p>Shyouhei Urabe wrote:</p>
<blockquote>
<ol start="2">
<li>what about passing a MatchData <em>in addition</em> to a String like gsub(){|string, md|}. It is 100% safe even when the block parameter is used.</li>
</ol>
</blockquote>
<p>Surprisingly, it may be possible.</p>
<pre><code>% ruby -e '
p "abc".gsub(/b/) {|str| "(#{str})" }
String.prepend Module.new {
def gsub(*args)
super(*args) {|str| yield str, $~ }
end
}
p "abc".gsub(/b/) {|str| "(#{str})" }
p "abc".gsub(/b/) {|str, md| "(#{md.inspect})" }
'
"a(b)c"
"a(b)c"
"a(#<MatchData \"b\">)c"
</code></pre>
<p>It is important that String#gsub yields two arguments<br>
instead of an two-elements array which causes incompatiblity.</p>
<pre><code>% ruby -e '
p "abc".gsub(/b/) {|str| "(#{str})" }
String.prepend Module.new {
def gsub(*args)
super(*args) {|str| yield [str, $~] }
end
}
p "abc".gsub(/b/) {|str| "(#{str})" }
p "abc".gsub(/b/) {|str, md| "(#{md.inspect})" }
'
"a(b)c"
"a([\"b\", #<MatchData \"b\">])c"
"a(#<MatchData \"b\">)c"
</code></pre> Ruby master - Feature #12745: String#(g)sub(!) should pass a MatchData to the block, not a Stringhttps://redmine.ruby-lang.org/issues/12745?journal_id=608612016-10-11T19:21:08Zherwin (Herwin W)
<ul></ul><p>As reference: the e-mail Akira Tanaka mentioned can be found at <a href="http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-dev/33553" class="external">http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-dev/33553</a>, or with a pretty terrible translation to English at <a href="https://translate.google.com/translate?sl=ja&tl=en&u=http%3A%2F%2Fblade.nagaokaut.ac.jp%2Fcgi-bin%2Fscat.rb%2Fruby%2Fruby-dev%2F33553" class="external">https://translate.google.com/translate?sl=ja&tl=en&u=http%3A%2F%2Fblade.nagaokaut.ac.jp%2Fcgi-bin%2Fscat.rb%2Fruby%2Fruby-dev%2F33553</a></p> Ruby master - Feature #12745: String#(g)sub(!) should pass a MatchData to the block, not a Stringhttps://redmine.ruby-lang.org/issues/12745?journal_id=608622016-10-11T19:31:46Zherwin (Herwin W)
<ul></ul><p>About the suggestions by Shyouhei Urabe: even though option 2 is very pragmatic (it solves the problem, keeps backwards compatibility and is a small change), the idea of the following code looks a bit off to me:</p>
<pre><code class="ruby syntaxhl" data-language="ruby"><span class="n">str</span><span class="p">.</span><span class="nf">gsub!</span><span class="p">(</span><span class="sr">/\[(\w+)\]/</span><span class="p">)</span> <span class="p">{</span> <span class="o">|</span><span class="n">_</span><span class="p">,</span> <span class="n">m</span><span class="o">|</span> <span class="n">placeholder</span><span class="p">(</span><span class="n">m</span><span class="p">)</span> <span class="p">}</span>
</code></pre>
<p>I guess you would use either a String or a MatchObject in the block, but never both (if you'd need them both, you could get the String from the MatchObject of course). Always having to ignore the first argument to the block feels a bit like a hack. Not that I have a better suggestion though.</p>
<p>A separate method that passes a MatchData object would be another solution that feels a bit hacky: String#gsub and String#gs would be almost identical, and which method would be preferred if you don't add a block, but use a second parameter as the replacement string? And without looking at the documentation, I would have no idea what String#gs/String#sg would do, the names are very undescriptive.</p> Ruby master - Feature #12745: String#(g)sub(!) should pass a MatchData to the block, not a Stringhttps://redmine.ruby-lang.org/issues/12745?journal_id=616732016-11-25T07:18:06Zmatz (Yukihiro Matsumoto)matz@ruby.or.jp
<ul><li><strong>Status</strong> changed from <i>Open</i> to <i>Feedback</i></li><li><strong>Assignee</strong> set to <i>matz (Yukihiro Matsumoto)</i></li></ul><p>Out of Shyouhei's 4 options,</p>
<ol>
<li>not acceptable for compatibility's sake</li>
<li>not excited, may cause compatibility problem. besides that, <code>String#scan</code> cannot be enhanced by this option</li>
<li>looks nice but I have performance concern</li>
<li>the best solution, if we have a good name candidate.</li>
</ol>
<p>Matz.</p> Ruby master - Feature #12745: String#(g)sub(!) should pass a MatchData to the block, not a Stringhttps://redmine.ruby-lang.org/issues/12745?journal_id=617532016-11-26T20:45:13Zherwin (Herwin W)
<ul></ul><p>What about <code>sub_md</code> as a name?</p> Ruby master - Feature #12745: String#(g)sub(!) should pass a MatchData to the block, not a Stringhttps://redmine.ruby-lang.org/issues/12745?journal_id=618672016-12-04T17:40:20Zherwin (Herwin W)
<ul></ul><p>I posted this link to IRC (#ruby on freenode), just to see if anyone had a good name suggestion. The suggestions <code>#matchsub</code> and <code>#msub</code> were offered. I don't really like them, <code>#matchsub</code> sounds like it tries to subtitute a match (exactly what <code>#sub</code> does, <code>#msub</code> reminds me of the <code>/m</code> modifier or regex (multiline, just as <code>#gsub</code> is named after the <code>/g</code> modifier (global)) instead of a <code>MatchData</code> object. (Although I smiled at the suggestion <code>#gsubthewayitshouldhavebeenallalong</code>, the name reminds me to much of <code>mysql_real_escape_string</code> to take serious)</p>
<p>Another option that was suggested was adding a keyword argument to toggle the behaviour. I actually liked that proposal: it keeps backwards compatibility and doesn't result in an explosion of the number of methods</p> Ruby master - Feature #12745: String#(g)sub(!) should pass a MatchData to the block, not a Stringhttps://redmine.ruby-lang.org/issues/12745?journal_id=620392016-12-15T10:34:55ZEregon (Benoit Daloze)
<ul></ul><p>Maybe this functionality should just be an extra keyword argument to these methods?</p>
<p>Like<br>
str.gsub!(/[(\w+)]/, md: true)</p>
<p>Not as concise, but much better on compatibility and clarity of the intent.</p> Ruby master - Feature #12745: String#(g)sub(!) should pass a MatchData to the block, not a Stringhttps://redmine.ruby-lang.org/issues/12745?journal_id=620472016-12-16T01:00:09Zshyouhei (Shyouhei Urabe)shyouhei@ruby-lang.org
<ul></ul><p>Benoit Daloze wrote:</p>
<blockquote>
<p>Maybe this functionality should just be an extra keyword argument to these methods?</p>
<p>Like<br>
str.gsub!(/[(\w+)]/, md: true)</p>
</blockquote>
<p>We already use that feature for another purpose.</p>
<pre><code class="ruby syntaxhl" data-language="ruby"><span class="nb">p</span> <span class="s2">"emdash"</span><span class="p">.</span><span class="nf">gsub</span><span class="p">(</span><span class="sr">/md/</span><span class="p">,</span> <span class="s2">"md"</span> <span class="o">=></span> <span class="kp">true</span><span class="p">)</span> <span class="c1">#=> "etrueash"</span>
</code></pre> Ruby master - Feature #12745: String#(g)sub(!) should pass a MatchData to the block, not a Stringhttps://redmine.ruby-lang.org/issues/12745?journal_id=625742017-01-19T09:22:22Zakr (Akira Tanaka)akr@fsij.org
<ul></ul><p>How about String#gsb and String#sb ?</p>
<ul>
<li>gsb and sb are shorter and gsub and sub.</li>
<li>gsb and sb is readable as g-substitute and substitute.</li>
</ul> Ruby master - Feature #12745: String#(g)sub(!) should pass a MatchData to the block, not a Stringhttps://redmine.ruby-lang.org/issues/12745?journal_id=625852017-01-19T19:43:22Zherwin (Herwin W)
<ul></ul><p>Regarding <code>String#gsb</code> and <code>String#sb</code>: It is far from clear what the difference between <code>#sub</code> and <code>#sb</code> is just by the method names. And personally I don't care that much about the length of the method names, I prefer clearness over shortness.</p> Ruby master - Feature #12745: String#(g)sub(!) should pass a MatchData to the block, not a Stringhttps://redmine.ruby-lang.org/issues/12745?journal_id=626112017-01-20T07:50:24Zduerst (Martin Dürst)duerst@it.aoyama.ac.jp
<ul></ul><p>Herwin W wrote:</p>
<blockquote>
<p>Regarding <code>String#gsb</code> and <code>String#sb</code>: It is far from clear what the difference between <code>#sub</code> and <code>#sb</code> is just by the method names. And personally I don't care that much about the length of the method names, I prefer clearness over shortness.</p>
</blockquote>
<p>Agreed. I also think that the new methods should have longer names (maybe shorter than <code>gsub_with_match_object</code>, but still longer than just <code>gsub</code>), because most of the time, the old methods will do just fine (they did so for more than 20 years of Ruby history).</p> Ruby master - Feature #12745: String#(g)sub(!) should pass a MatchData to the block, not a Stringhttps://redmine.ruby-lang.org/issues/12745?journal_id=710102018-03-15T08:09:26Zakr (Akira Tanaka)akr@fsij.org
<ul></ul><p>How about String#subm and String#gsubm ?</p>
<p>It is bit longer but not so long.</p> Ruby master - Feature #12745: String#(g)sub(!) should pass a MatchData to the block, not a Stringhttps://redmine.ruby-lang.org/issues/12745?journal_id=710112018-03-15T08:10:09Zmatz (Yukihiro Matsumoto)matz@ruby.or.jp
<ul></ul><p><code>subm</code> and<code>gsubm</code> are acceptable.</p>
<p>Matz.</p> Ruby master - Feature #12745: String#(g)sub(!) should pass a MatchData to the block, not a Stringhttps://redmine.ruby-lang.org/issues/12745?journal_id=710182018-03-15T10:49:03ZEregon (Benoit Daloze)
<ul></ul><p>shyouhei (Shyouhei Urabe) wrote:</p>
<blockquote>
<p>Benoit Daloze wrote:</p>
<blockquote>
<p>Maybe this functionality should just be an extra keyword argument to these methods?</p>
<p>Like<br>
str.gsub!(/[(\w+)]/, md: true)</p>
</blockquote>
<p>We already use that feature for another purpose.</p>
<pre><code class="ruby syntaxhl" data-language="ruby"><span class="nb">p</span> <span class="s2">"emdash"</span><span class="p">.</span><span class="nf">gsub</span><span class="p">(</span><span class="sr">/md/</span><span class="p">,</span> <span class="s2">"md"</span> <span class="o">=></span> <span class="kp">true</span><span class="p">)</span> <span class="c1">#=> "etrueash"</span>
</code></pre>
</blockquote>
<p>Not with a Symbol though.<br>
So I think <code>gsub(/regexp/, md: true)</code> would not conflict with existing usages.</p>
<p>matz (Yukihiro Matsumoto) wrote:</p>
<blockquote>
<p><code>subm</code> and<code>gsubm</code> are acceptable.</p>
</blockquote>
<p>Those names sound very cryptic to me.</p> Ruby master - Feature #12745: String#(g)sub(!) should pass a MatchData to the block, not a Stringhttps://redmine.ruby-lang.org/issues/12745?journal_id=748032018-11-08T07:31:03Zshyouhei (Shyouhei Urabe)shyouhei@ruby-lang.org
<ul><li><strong>Related to</strong> <i><a class="issue tracker-2 status-2 priority-4 priority-default" href="/issues/6802">Feature #6802</a>: String#scan should have equivalent yielding MatchData</i> added</li></ul> Ruby master - Feature #12745: String#(g)sub(!) should pass a MatchData to the block, not a Stringhttps://redmine.ruby-lang.org/issues/12745?journal_id=748052018-11-08T07:31:31Zshyouhei (Shyouhei Urabe)shyouhei@ruby-lang.org
<ul><li><strong>Related to</strong> <i><a class="issue tracker-2 status-2 priority-4 priority-default" href="/issues/5749">Feature #5749</a>: new method String#match_all needed</i> added</li></ul> Ruby master - Feature #12745: String#(g)sub(!) should pass a MatchData to the block, not a Stringhttps://redmine.ruby-lang.org/issues/12745?journal_id=748072018-11-08T07:31:47Zshyouhei (Shyouhei Urabe)shyouhei@ruby-lang.org
<ul><li><strong>Related to</strong> <i><a class="issue tracker-2 status-7 priority-4 priority-default closed" href="/issues/5606">Feature #5606</a>: String#each_match(regexp)</i> added</li></ul> Ruby master - Feature #12745: String#(g)sub(!) should pass a MatchData to the block, not a Stringhttps://redmine.ruby-lang.org/issues/12745?journal_id=748092018-11-08T07:32:36Zshyouhei (Shyouhei Urabe)shyouhei@ruby-lang.org
<ul><li><strong>Related to</strong> <i><a class="issue tracker-2 status-6 priority-4 priority-default closed" href="/issues/546">Feature #546</a>: String#gsub と Strnig#scan のブロックパラメータの一致</i> added</li></ul> Ruby master - Feature #12745: String#(g)sub(!) should pass a MatchData to the block, not a Stringhttps://redmine.ruby-lang.org/issues/12745?journal_id=768652019-02-21T19:17:18Zsawa (Tsuyoshi Sawada)
<ul></ul><p>What about having <code>Enumerator#with_m</code>, such that</p>
<pre><code class="ruby syntaxhl" data-language="ruby"><span class="nb">gsub</span><span class="p">(</span><span class="n">pattern</span><span class="p">).</span><span class="nf">with_m</span><span class="p">{</span><span class="o">|</span><span class="n">match</span><span class="p">,</span> <span class="n">match_data</span><span class="o">|</span> <span class="n">block</span><span class="p">}</span> <span class="err">→</span> <span class="n">new_str</span>
<span class="nb">sub</span><span class="p">(</span><span class="n">pattern</span><span class="p">).</span><span class="nf">with_m</span><span class="p">{</span><span class="o">|</span><span class="n">match</span><span class="p">,</span> <span class="n">match_data</span><span class="o">|</span> <span class="n">block</span><span class="p">}</span> <span class="err">→</span> <span class="n">new_str</span>
</code></pre>
<p>And if we want to expand that to <code>scan</code>, we can introduce</p>
<pre><code class="ruby syntaxhl" data-language="ruby"><span class="nb">scan</span> <span class="err">→</span> <span class="n">enumerator</span>
</code></pre>
<p>And then do</p>
<pre><code class="ruby syntaxhl" data-language="ruby"><span class="nb">scan</span><span class="p">.</span><span class="nf">with_m</span><span class="p">(</span><span class="n">pattern</span><span class="p">){</span><span class="o">|</span><span class="p">(</span><span class="n">match</span><span class="p">,</span> <span class="o">...</span><span class="p">),</span> <span class="n">match_data</span><span class="o">|</span> <span class="n">block</span><span class="p">}</span> <span class="err">→</span> <span class="n">str</span>
</code></pre> Ruby master - Feature #12745: String#(g)sub(!) should pass a MatchData to the block, not a Stringhttps://redmine.ruby-lang.org/issues/12745?journal_id=768662019-02-21T20:02:43Zjeremyevans0 (Jeremy Evans)merch-redmine@jeremyevans.net
<ul></ul><p>sawa (Tsuyoshi Sawada) wrote:</p>
<blockquote>
<p>What about having <code>Enumerator#with_m</code>, such that</p>
<pre><code class="ruby syntaxhl" data-language="ruby"><span class="nb">gsub</span><span class="p">(</span><span class="n">pattern</span><span class="p">).</span><span class="nf">with_m</span><span class="p">{</span><span class="o">|</span><span class="n">match</span><span class="p">,</span> <span class="n">match_data</span><span class="o">|</span> <span class="n">block</span><span class="p">}</span> <span class="err">→</span> <span class="n">new_str</span>
<span class="nb">sub</span><span class="p">(</span><span class="n">pattern</span><span class="p">).</span><span class="nf">with_m</span><span class="p">{</span><span class="o">|</span><span class="n">match</span><span class="p">,</span> <span class="n">match_data</span><span class="o">|</span> <span class="n">block</span><span class="p">}</span> <span class="err">→</span> <span class="n">new_str</span>
</code></pre>
</blockquote>
<p>This doesn't make sense to add to Enumerator in my opinion, since most Enumerators will not be created from <code>gsub</code>/<code>sub</code>:</p>
<pre><code class="ruby syntaxhl" data-language="ruby"><span class="p">[</span><span class="mi">1</span><span class="p">].</span><span class="nf">each</span><span class="p">.</span><span class="nf">with_m</span><span class="p">{}</span>
</code></pre>
<p>Possibly <code>gsub</code> and <code>sub</code> could return an instance of an Enumerator subclass that supports the <code>with_m</code>, but that seems like overkill.</p>
<p>Reading through the previously discussed alternative approaches:</p>
<ul>
<li>Passing a second argument to the block breaks if the block is a lambda.</li>
<li>A keyword argument via <code>:md</code> is currently allowed but ignored. I don't think that breaks compatibility, but with the discussion of keyword arguments in <a class="issue tracker-2 status-5 priority-4 priority-default closed" title="Feature: "Real" keyword argument (Closed)" href="https://redmine.ruby-lang.org/issues/14183">#14183</a>, introducing an <code>:md</code> keyword argument may make it harder to transition such code to Ruby 3.</li>
<li>
<code>subm</code> and <code>gsubm</code> are both fairly cryptic in terms of method names, but compared to other approaches, new method names seem to be the best way to handle this if we want to add support for it.</li>
</ul>
<p>We could leave things as they are and use <code>$1</code>/<code>$2</code>/<code>$~</code> to access the captures instead the block.</p> Ruby master - Feature #12745: String#(g)sub(!) should pass a MatchData to the block, not a Stringhttps://redmine.ruby-lang.org/issues/12745?journal_id=821692019-10-18T07:20:43Zshugo (Shugo Maeda)
<ul></ul><p>jeremyevans0 (Jeremy Evans) wrote:</p>
<blockquote>
<ul>
<li>Passing a second argument to the block breaks if the block is a lambda.</li>
<li>A keyword argument via <code>:md</code> is currently allowed but ignored. I don't think that breaks compatibility, but with the discussion of keyword arguments in <a class="issue tracker-2 status-5 priority-4 priority-default closed" title="Feature: "Real" keyword argument (Closed)" href="https://redmine.ruby-lang.org/issues/14183">#14183</a>, introducing an <code>:md</code> keyword argument may make it harder to transition such code to Ruby 3.</li>
<li>
<code>subm</code> and <code>gsubm</code> are both fairly cryptic in terms of method names, but compared to other approaches, new method names seem to be the best way to handle this if we want to add support for it.</li>
</ul>
</blockquote>
<p>I prefer to the last approach, but don't like names <code>subm</code> and <code>gsubm</code>.</p>
<p>How about <code>repl</code> and <code>repl_all</code>?<br>
JavaScripts's <code>replace</code> is similar to Ruby's <code>sub</code>, but Ruby has already <code>replace</code> and it's destructive.<br>
<code>repl</code> is an abbreviation for replace. <code>grepl</code> is confusing with <code>grep</code>, so I propose <code>repl_all</code> instead.</p>
<p>And I prefer to <code>match_all</code> for the MatchData version of <code>scan</code> (<a class="issue tracker-2 status-2 priority-4 priority-default" title="Feature: new method String#match_all needed (Assigned)" href="https://redmine.ruby-lang.org/issues/5749">#5749</a>).</p> Ruby master - Feature #12745: String#(g)sub(!) should pass a MatchData to the block, not a Stringhttps://redmine.ruby-lang.org/issues/12745?journal_id=944612021-11-03T14:03:50Zjrochkind (jonathan rochkind)jonathan@dnil.net
<ul></ul><p>I would love to see this resurrected.</p>
<p>If you want the MatchData in a gsub block, at first it looks like you can just use <code>Regexp.last_match</code>.</p>
<p>But I think this, along with variables like <code>$1</code>, may not be thread-safe? See for instance related at <a href="https://bugs.ruby-lang.org/issues/12689" class="external">https://bugs.ruby-lang.org/issues/12689</a> and <a href="https://github.com/jruby/jruby/issues/3031#issuecomment-259057232" class="external">https://github.com/jruby/jruby/issues/3031#issuecomment-259057232</a> and <a href="https://github.com/jruby/jruby/issues/4868#issuecomment-347276140" class="external">https://github.com/jruby/jruby/issues/4868#issuecomment-347276140</a></p>
<p>If so, it makes having a clear fixed MatchData passed as an arg to block more than just a minor convenience, but necessary to do things like this in a thread-safe way at all? Or I may be misunderstanding.</p> Ruby master - Feature #12745: String#(g)sub(!) should pass a MatchData to the block, not a Stringhttps://redmine.ruby-lang.org/issues/12745?journal_id=944662021-11-03T23:46:37ZEregon (Benoit Daloze)
<ul></ul><p>The fix for <a class="issue tracker-1 status-1 priority-4 priority-default" title="Bug: Thread isolation of $~ and $_ (Open)" href="https://redmine.ruby-lang.org/issues/12689">#12689</a> seems clear, such variables need to be Thread-local (or even Fiber-local).<br>
That's already what TruffleRuby does (mentioned in that issue) and it seems JRuby is doing the same (<a href="https://github.com/jruby/jruby/issues/3031#issuecomment-660601045" class="external">https://github.com/jruby/jruby/issues/3031#issuecomment-660601045</a>).<br>
And this is only a problem if a given method activation has blocks and those blocks from that same method call are executed in different threads and one of them mutates $~, which seems already fairly rare.</p>