Ruby Issue Tracking System Backport193 - Backport #8210: Multibyte character interfering with end-line character within a regexhttps://redmine.ruby-lang.org/issues/8210?journal_id=381542013-04-03T15:27:38Zsawa (Tsuyoshi Sawada)
<ul></ul><p>
A different regex:</p>
<pre><code>regex4 = /[[:space:]]?\z/
</code></pre>
<p>seems to work as expected:</p>
<pre><code>"hello" =~ regex4 # => 5
"こんにちは" =~ regex4 # => 5
</code></pre>
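<p>For context on why <code>[[:space:]]</code> is not simply a synonym for <code>\s</code>: in Ruby regexps <code>\s</code> covers only ASCII whitespace, while the POSIX bracket class is Unicode-aware. The sketch below only illustrates that difference; whether it explains the different behaviour reported here is an assumption, and the variable names are made up for the example.</p>

```ruby
# \s is ASCII-only in Ruby regexps; [[:space:]] also covers Unicode spaces.
ideographic_space = "\u3000" # full-width space, common in Japanese text

ascii_class = /\s/
posix_class = /[[:space:]]/

p(ideographic_space =~ ascii_class) # nil
p(ideographic_space =~ posix_class) # 0
p(" " =~ ascii_class)               # 0
p(" " =~ posix_class)               # 0
```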
Backport193 - Backport #8210: Multibyte character interfering with end-line character within a regexhttps://redmine.ruby-lang.org/issues/8210?journal_id=381552013-04-03T15:37:51Zsawa (Tsuyoshi Sawada)
<ul></ul><p>Yet another regex:</p>
<pre><code>regex5 = /\n?$/
</code></pre>
<p>seems to work as expected:</p>
<pre><code>"hello" =~ regex5 # => 5
"こんにちは" =~ regex5 # => 5
</code></pre>
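<p>As a reminder of how the end anchors differ, here is a small sketch using an ASCII string with a trailing newline (the string and the hash are only for illustration). <code>$</code> matches at the end of a line, <code>\Z</code> at the end of the string or just before a final newline, and <code>\z</code> only at the very end of the string.</p>

```ruby
# Compare where each end anchor first matches in a string ending with "\n".
s = "hello\n"

anchors = { "$" => /$/, "\\Z" => /\Z/, "\\z" => /\z/ }
positions = Hash[anchors.map { |name, re| [name, s =~ re] }]

p positions["$"]   # 5 (end of line, before the newline)
p positions["\\Z"] # 5 (before the final newline)
p positions["\\z"] # 6 (very end of the string)
```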
Backport193 - Backport #8210: Multibyte character interfering with end-line character within a regexhttps://redmine.ruby-lang.org/issues/8210?journal_id=382782013-04-06T09:58:50Zsawa (Tsuyoshi Sawada)
<ul></ul><p>The problem seems to occur when certain tokens are combined with <code>?</code> and <code>\z</code>:</p>
<pre><code>"こんにちは" =~ /a?\z/ # => nil
"こんにちは" =~ / ?\z/ # => nil
"こんにちは" =~ /\t?\z/ # => nil
"こんにちは" =~ /\n?\z/ # => nil
"こんにちは" =~ /\s?\z/ # => nil
"こんにちは" =~ /.?\z/ # => 4
"こんにちは" =~ /\S?\z/ # => 4
"こんにちは" =~ /\W?\z/ # => 5
"こんにちは" =~ /あ?\z/ # => 5
"こんにちは" =~ /\w?\z/ # => 5
</code></pre>
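<p>The checks above can be rerun as a small script. This is a sketch, not an official test: the token list mirrors the cases listed above, and the variable names are arbitrary. Because <code>?</code> allows the token to match zero times and <code>\z</code> matches at the end of the string, every pattern should match somewhere, so a <code>nil</code> result indicates the bug.</p>

```ruby
# Try each "token?\z" pattern against a multibyte string and report the index.
text = "こんにちは"
optional_tokens = ["a", " ", "\\t", "\\n", "\\s", ".", "\\S", "\\W", "あ", "\\w"]

results = optional_tokens.map do |token|
  pattern = Regexp.new("#{token}?\\z")
  [pattern, text =~ pattern]
end

results.each do |pattern, index|
  status = index.nil? ? "BUG (no match)" : "ok"
  puts "#{pattern.inspect} => #{index.inspect} #{status}"
end
```

<p>On a Ruby build that includes a fix, none of the lines should report <code>nil</code>.</p>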
Backport193 - Backport #8210: Multibyte character interfering with end-line character within a regexhttps://redmine.ruby-lang.org/issues/8210?journal_id=382902013-04-06T19:34:18Zsawa (Tsuyoshi Sawada)
<ul></ul><p>Is this bug report wrong? If so, please note so.</p> Backport193 - Backport #8210: Multibyte character interfering with end-line character within a regexhttps://redmine.ruby-lang.org/issues/8210?journal_id=382922013-04-06T20:15:04Znaruse (Yui NARUSE)naruse@airemix.jp
<ul><li><strong>Category</strong> set to <i>M17N</i></li><li><strong>Status</strong> changed from <i>Open</i> to <i>Assigned</i></li><li><strong>Assignee</strong> set to <i>naruse (Yui NARUSE)</i></li><li><strong>Target version</strong> set to <i>2.1.0</i></li></ul><p>sawa (Tsuyoshi Sawada) wrote:</p>
<blockquote>
<p>Is this bug report wrong? If so, please note so.</p>
</blockquote>
<p>This really looks like a bug in Oniguruma/Onigmo.</p> Backport193 - Backport #8210: Multibyte character interfering with end-line character within a regexhttps://redmine.ruby-lang.org/issues/8210?journal_id=383692013-04-09T04:05:03Zacheong87 (Andrew Cheong)acheong87@gmail.com
<ul></ul><p>Contributing notes regarding this bug can be found here: <a href="http://stackoverflow.com/a/15885857/925913" class="external">http://stackoverflow.com/a/15885857/925913</a>.</p> Backport193 - Backport #8210: Multibyte character interfering with end-line character within a regexhttps://redmine.ruby-lang.org/issues/8210?journal_id=383782013-04-09T14:42:45Zrondinif (Franco Rondini)rondinif@yahoo.it
<ul></ul><p>I just edited the <a href="http://stackoverflow.com/a/15885857/1657028" class="external">answer</a>; <a href="https://gist.github.com/anonymous/5339185" class="external">test code is available</a>.</p> Backport193 - Backport #8210: Multibyte character interfering with end-line character within a regexhttps://redmine.ruby-lang.org/issues/8210?journal_id=383992013-04-10T00:41:27Zk_takata (Ken Takata)
<ul><li><strong>File</strong> <a href="/attachments/3652">fix-8210-1.diff</a> <a class="icon-only icon-download" title="Download" href="/attachments/download/3652/fix-8210-1.diff">fix-8210-1.diff</a> added</li><li><strong>File</strong> <a href="/attachments/3653">fix-8210-2.diff</a> <a class="icon-only icon-download" title="Download" href="/attachments/download/3653/fix-8210-2.diff">fix-8210-2.diff</a> added</li></ul><p>This problem was caused by optimization of \z.<br>
I wrote two patches to fix this problem.</p>
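<p>To illustrate why a byte-based shortcut for <code>\z</code> can interact badly with multibyte text, here is a rough sketch in Ruby. It is not the Onigmo code: it assumes the optimization derives a search start position from the maximum number of bytes the pattern can consume before the end of the string, and all names (<code>max_pattern_bytes</code>, <code>naive_start</code>, <code>adjusted</code>) are hypothetical.</p>

```ruby
text = "こんにちは" # 5 characters, 3 bytes each in UTF-8
bytes = text.bytes.to_a

# Hypothetical shortcut: a?\z can consume at most 1 byte,
# so begin searching 1 byte before the end of the string.
max_pattern_bytes = 1
naive_start = bytes.size - max_pattern_bytes

# UTF-8 continuation bytes have the bit pattern 10xxxxxx.
continuation = ->(byte) { (byte & 0xC0) == 0x80 }

# Move right until the position is on a character boundary.
adjusted = naive_start
adjusted += 1 while adjusted < bytes.size && continuation.call(bytes[adjusted])

p naive_start                           # 14
p continuation.call(bytes[naive_start]) # true (middle of the last character)
p adjusted                              # 15
p adjusted == bytes.size                # true
```

<p>In this sketch the adjusted start position coincides with the end of the string, so a search that expects the start to be strictly before the end of its range would have nothing left to scan.</p>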
<p>Maybe fix-8210-1.diff is more efficient than fix-8210-2.diff,<br>
but the former one tries to do backward search when 'start==range'<br>
after 'start' is adjusted. This behavior is a little bit confusing.</p> Backport193 - Backport #8210: Multibyte character interfering with end-line character within a regexhttps://redmine.ruby-lang.org/issues/8210?journal_id=384302013-04-11T12:46:53Zsawa (Tsuyoshi Sawada)
<ul></ul><p>Is either of k_takata's bug fixes going to be incorporated?</p> Backport193 - Backport #8210: Multibyte character interfering with end-line character within a regexhttps://redmine.ruby-lang.org/issues/8210?journal_id=384502013-04-11T23:31:38Znaruse (Yui NARUSE)naruse@airemix.jp
<ul></ul><p>k_takata (Ken Takata) wrote:</p>
<blockquote>
<p>This problem was caused by optimization of \z.<br>
I wrote two patches to fix this problem.</p>
<p>Maybe fix-8210-1.diff is more efficient than fix-8210-2.diff,<br>
but the former one tries to do backward search when 'start==range'<br>
after 'start' is adjusted. This behavior is a little bit confusing.</p>
</blockquote>
<p>I think -1 is more suitable because it seems to preserve the original intention better than -2.</p> Backport193 - Backport #8210: Multibyte character interfering with end-line character within a regexhttps://redmine.ruby-lang.org/issues/8210?journal_id=385132013-04-13T19:31:24Zk_takata (Ken Takata)
<ul><li><strong>File</strong> <a href="/attachments/3662">fix-8210-1-update.diff</a> <a class="icon-only icon-download" title="Download" href="/attachments/download/3662/fix-8210-1-update.diff">fix-8210-1-update.diff</a> added</li></ul><blockquote>
<p>I think -1 is more suitable because it seems to preserve the original intention better than -2.</p>
</blockquote>
<p>Thanks for your comment.<br>
I have updated onigmo's tmp/ruby-2.0.x branch.<br>
<a href="https://github.com/k-takata/Onigmo/tree/f22cf2e566712cace60d17f84d63119d7c5764ee" class="external">https://github.com/k-takata/Onigmo/tree/f22cf2e566712cace60d17f84d63119d7c5764ee</a></p>
<p>I have also attached an updated patch that can be applied to Ruby 1.9.3.</p> Backport193 - Backport #8210: Multibyte character interfering with end-line character within a regexhttps://redmine.ruby-lang.org/issues/8210?journal_id=385142013-04-13T20:30:38Znaruse (Yui NARUSE)naruse@airemix.jp
<ul><li><strong>Status</strong> changed from <i>Assigned</i> to <i>Closed</i></li><li><strong>% Done</strong> changed from <i>0</i> to <i>100</i></li></ul><p>This issue was solved with changeset r40276.<br>
Tsuyoshi, thank you for reporting this issue.<br>
Your contribution to Ruby is greatly appreciated.<br>
May Ruby be with you.</p>
<hr>
<ul>
<li>Merge Onigmo 5.13.4 f22cf2e566712cace60d17f84d63119d7c5764ee.<br>
[bug] fix problem with optimization of \z (Issue <a class="issue tracker-1 status-5 priority-4 priority-default closed" title="Bug: example issue for ruby-1.8 (Closed)" href="https://redmine.ruby-lang.org/issues/16">#16</a>) [Bug <a class="issue tracker-4 status-5 priority-4 priority-default closed" title="Backport: Multibyte character interfering with end-line character within a regex (Closed)" href="https://redmine.ruby-lang.org/issues/8210">#8210</a>]</li>
</ul> Backport193 - Backport #8210: Multibyte character interfering with end-line character within a regexhttps://redmine.ruby-lang.org/issues/8210?journal_id=385152013-04-13T20:32:43Znaruse (Yui NARUSE)naruse@airemix.jp
<ul><li><strong>Tracker</strong> changed from <i>Bug</i> to <i>Backport</i></li><li><strong>Project</strong> changed from <i>Ruby master</i> to <i>Backport200</i></li><li><strong>Category</strong> deleted (<del><i>M17N</i></del>)</li><li><strong>Status</strong> changed from <i>Closed</i> to <i>Assigned</i></li><li><strong>Assignee</strong> changed from <i>naruse (Yui NARUSE)</i> to <i>nagachika (Tomoyuki Chikanaga)</i></li><li><strong>Target version</strong> deleted (<del><i>2.1.0</i></del>)</li></ul> Backport193 - Backport #8210: Multibyte character interfering with end-line character within a regexhttps://redmine.ruby-lang.org/issues/8210?journal_id=385162013-04-13T22:17:18Zk_takata (Ken Takata)
<ul></ul><p>I think it's better to backport this patch to Ruby 1.9.3 too.</p> Backport193 - Backport #8210: Multibyte character interfering with end-line character within a regexhttps://redmine.ruby-lang.org/issues/8210?journal_id=387652013-04-20T01:40:35Znagachika (Tomoyuki Chikanaga)nagachika00@gmail.com
<ul><li><strong>Status</strong> changed from <i>Assigned</i> to <i>Closed</i></li></ul><p>This issue was solved with changeset r40384.<br>
Tsuyoshi, thank you for reporting this issue.<br>
Your contribution to Ruby is greatly appreciated.<br>
May Ruby be with you.</p>
<hr>
<p>merge revision(s) 40276: [Backport <a class="issue tracker-4 status-5 priority-4 priority-default closed" title="Backport: Multibyte character interfering with end-line character within a regex (Closed)" href="https://redmine.ruby-lang.org/issues/8210">#8210</a>]</p>
<pre><code>* Merge Onigmo 5.13.4 f22cf2e566712cace60d17f84d63119d7c5764ee.
[bug] fix problem with optimization of \z (Issue #16) [Bug #8210]
</code></pre> Backport193 - Backport #8210: Multibyte character interfering with end-line character within a regexhttps://redmine.ruby-lang.org/issues/8210?journal_id=387662013-04-20T01:45:18Znagachika (Tomoyuki Chikanaga)nagachika00@gmail.com
<ul><li><strong>Project</strong> changed from <i>Backport200</i> to <i>Backport193</i></li><li><strong>Status</strong> changed from <i>Closed</i> to <i>Assigned</i></li><li><strong>Assignee</strong> changed from <i>nagachika (Tomoyuki Chikanaga)</i> to <i>usa (Usaku NAKAMURA)</i></li></ul><p>Moved to Backport193.<br>
However, Onigmo was only merged in 2.0 and later, and I have not confirmed that this patch can be merged into ruby_1_9_3...</p> Backport193 - Backport #8210: Multibyte character interfering with end-line character within a regexhttps://redmine.ruby-lang.org/issues/8210?journal_id=393212013-05-14T10:41:29Zusa (Usaku NAKAMURA)usa@garbagecollect.jp
<ul><li><strong>Status</strong> changed from <i>Assigned</i> to <i>Closed</i></li></ul><p>This issue was solved with changeset r40713.<br>
Tsuyoshi, thank you for reporting this issue.<br>
Your contribution to Ruby is greatly appreciated.<br>
May Ruby be with you.</p>
<hr>
<ul>
<li>regexec.c (onig_search): fix problem with optimization of \z.<br>
[Backport <a class="issue tracker-4 status-5 priority-4 priority-default closed" title="Backport: Multibyte character interfering with end-line character within a regex (Closed)" href="https://redmine.ruby-lang.org/issues/8210">#8210</a>]<br>
patched by k_tanaka at <a href="/issues/8210">[ruby-core:54251]</a>.</li>
</ul> Backport193 - Backport #8210: Multibyte character interfering with end-line character within a regexhttps://redmine.ruby-lang.org/issues/8210?journal_id=393292013-05-14T18:23:22Zk_takata (Ken Takata)
<ul></ul><p>Hi usa,</p>
<blockquote>
<ul>
<li>regexec.c (onig_search): fix problem with optimization of \z.<br>
[Backport <a class="issue tracker-4 status-5 priority-4 priority-default closed" title="Backport: Multibyte character interfering with end-line character within a regex (Closed)" href="https://redmine.ruby-lang.org/issues/8210">#8210</a>]<br>
patched by k_tanaka at <a href="/issues/8210">[ruby-core:54251]</a>.</li>
</ul>
</blockquote>
<p>Thank you for merging my patch.<br>
BTW, my name is not tanaka...</p>