https://redmine.ruby-lang.org/https://redmine.ruby-lang.org/favicon.ico?17113305112017-07-17T06:39:26ZRuby Issue Tracking SystemRuby master - Feature #13750: Improve String#casecmp? and Symbol#casecmp? performance with ASCII stringhttps://redmine.ruby-lang.org/issues/13750?journal_id=658172017-07-17T06:39:26Zwatson1978 (Shizuo Fujita)watson1978@gmail.com
<ul></ul><p>String#casecmp? duplicates the string object in <code>rb_str_downcase()</code> on every call, so String#casecmp? is slower than String#casecmp.</p> Ruby master - Feature #13750: Improve String#casecmp? and Symbol#casecmp? performance with ASCII stringhttps://redmine.ruby-lang.org/issues/13750?journal_id=801022019-07-26T21:29:45Zjeremyevans0 (Jeremy Evans)merch-redmine@jeremyevans.net
<ul><li><strong>Tracker</strong> changed from <i>Bug</i> to <i>Feature</i></li><li><strong>Backport</strong> deleted (<del><i>2.2: UNKNOWN, 2.3: UNKNOWN, 2.4: UNKNOWN</i></del>)</li></ul> Ruby master - Feature #13750: Improve String#casecmp? and Symbol#casecmp? performance with ASCII stringhttps://redmine.ruby-lang.org/issues/13750?journal_id=860152020-06-07T17:15:11Zthe_spectator (Akshay Birajdar)
<ul></ul><p><a class="user active user-mention" href="https://redmine.ruby-lang.org/users/9869">@koic (Koichi ITO)</a> Made a new attempt with patch <a href="https://github.com/ruby/ruby/pull/2941" class="external">https://github.com/ruby/ruby/pull/2941</a></p> Ruby master - Feature #13750: Improve String#casecmp? and Symbol#casecmp? performance with ASCII stringhttps://redmine.ruby-lang.org/issues/13750?journal_id=860202020-06-08T13:39:53Zsawa (Tsuyoshi Sawada)
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/86020/diff?detail_id=57279">diff</a>)</li></ul> Ruby master - Feature #13750: Improve String#casecmp? and Symbol#casecmp? performance with ASCII stringhttps://redmine.ruby-lang.org/issues/13750?journal_id=900742021-01-24T07:12:35Znaruse (Yui NARUSE)naruse@airemix.jp
<ul></ul><p>To avoid that case, you have options around coderange. Coderange is cached information about whether a string (1) contains only 7-bit ASCII characters, (2) also contains 8-bit characters, (3) contains a broken byte sequence, or (4) has not been scanned yet (unknown). Some strings have already been scanned and have their coderange cached in the string object, but others have not. The question is whether this casecmp? optimization uses the cache only when it already exists and skips the scan otherwise, or scans the string whenever there is no cache. If you rely on the cache, I wonder whether strings in real applications actually have it. If you scan, I wonder whether it is still faster.</p>
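<p>As a rough Ruby-level illustration of the coderange categories described above, the sketch below classifies a string using only public String methods. The helper name <code>coderange_class</code> and its symbols are hypothetical and not part of Ruby; the "unknown" state is internal and cannot be observed from Ruby code, because asking the question triggers the scan.</p>

```ruby
# Hypothetical helper (not part of Ruby): map a string onto the coderange
# categories described above, using only public String methods.
# Calling these methods may itself scan the string, so the internal
# "unknown" state is not observable here.
def coderange_class(str)
  return :broken unless str.valid_encoding?   # (3) broken byte sequence
  str.ascii_only? ? :seven_bit : :valid       # (1) 7-bit only, (2) has 8-bit characters
end

seven_bit = coderange_class("abc")
valid     = coderange_class("\u00E0bc")                           # "àbc"
broken    = coderange_class("\xFF".b.force_encoding("UTF-8"))     # invalid UTF-8 byte
```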
<p>Imagine <code>casecmp?</code> with the following strings:</p>
<ul>
<li>"a" * 100000 + "A"</li>
<li>"a" * 100000 + "a"</li>
<li>"a" * 100000 + "À"</li>
<li>"a" * 100000 + "à"</li>
<li>"ab"</li>
</ul>
<p>Using <code>rb_enc_str_asciionly_p</code> forces a scan of the whole string when it has no coderange cache, and that scan can be overhead.</p>
<p>To avoid this trade-off, I think you need to implement a casecmp that is integrated with <code>rb_str_casemap</code>.</p>
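<p>The example strings listed above can be built and compared from Ruby as in the following sketch. It only shows observable behaviour (which strings are ASCII-only, and what <code>casecmp?</code> and <code>casecmp</code> return); it does not measure performance, and the variable names are chosen for illustration. It assumes the default UTF-8 encoding for the literals.</p>

```ruby
# Illustrative only: the long strings differ from each other just in the
# final character, so an ASCII-only check has to read 100,000 bytes before
# it can tell the ASCII cases from the non-ASCII ones.
base = "a" * 100_000

upper_ascii     = base + "A"
lower_ascii     = base + "a"
upper_non_ascii = base + "\u00C0"   # "À"
lower_non_ascii = base + "\u00E0"   # "à"
short           = "ab"

# ascii_only? needs the coderange; without a cached value it scans the string.
ascii_flags = [upper_ascii, lower_ascii, upper_non_ascii, lower_non_ascii].map(&:ascii_only?)

# casecmp? applies Unicode case folding, so both pairs compare as equal.
ascii_match     = upper_ascii.casecmp?(lower_ascii)
non_ascii_match = upper_non_ascii.casecmp?(lower_non_ascii)

# casecmp folds only A-Z/a-z, so the non-ASCII pair does not compare as equal.
casecmp_non_ascii = upper_non_ascii.casecmp(lower_non_ascii)
short_match       = short.casecmp?("AB")
```

<p>For the short string the scan cost is negligible, which is why the trade-off only matters for long inputs like the first four.</p>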