Feature #20709
Updated by zack.ref@gmail.com (Zack Deveau) 3 months ago
`String#rindex` is much slower on OSX than on Linux hosts. This appears due to the lack of a `memrchr` implementation on OSX. The [fallback solution](https://github.com/ruby/ruby/blob/e69945fc57efba5c8895c21235e109145865952d/string.c#L4348-L4402) currently performs a `memcmp` on each character in the search string looking for a substring match. ```c while (s) { if (memcmp(s, t, slen) == 0) { return s - sbeg; } if (s <= sbeg) break; s = rb_enc_prev_char(sbeg, s, e, enc); } ``` ``` ruby-master | 10 char substring (long) near the end of a search string | osx search string enc: UTF-8 search string size: 8010 user system total real index (long substring) 0.000075 0.000000 0.000075 ( 0.000074) rindex 0.000445 0.000000 0.000445 ( 0.000448) ``` Proposed improvement is available on this pull request: https://github.com/ruby/ruby/pull/11519 This patch proposes using an implementation of `memrchr` made for the interpreter on hosts that lack a native implementation. Below are benchmark results which demonstrate that this approach either improves or maintains performance of `String#rindex` for all cases we could come up with. ```c static void* rb_memrchr(const char *search_str, int chr, long search_len) { const char *ptr = search_str + search_len; do { if (*(--ptr) == chr) return (void *)ptr; } while (ptr >= search_str); return ((void *)0); } ``` Special thanks to @jhawthorn who provided some helpful feedback on this solution during development! # Benchmarks Each of the following were made with a variant of the following general purpose benchmark. Note that `String#index` speed can be [impacted by the size of the provided substring](https://github.com/ruby/ruby/blob/669d1f79d800d554918c7d5647b73a6431920ea8/re.c#L272-L274) (Denoted as long/short substring in the results, assume long substring when none is specified). ``` require "benchmark" substring = "98279827AA" the_string = "" 800.times{ the_string.concat(('a'..'z').to_a.shuffle[0,8].join)} the_string.concat(substring) 200.times{ the_string.concat(('a'..'z').to_a.shuffle[0,8].join)} puts the_string.encoding puts the_string.size Benchmark.bm do |x| x.report(:index) { 100.times { the_string.index(substring) } } x.report(:rindex) { 100.times { the_string.rindex(substring) } } end ``` ``` ruby-master | substring near the end | osx https://github.com/ruby/ruby/blob/669d1f79d800d554918c7d5647b73a6431920ea8/re.c#L272-L274 UTF-8 user system total real index 0.000075 0.000000 0.000075 ( 0.000074) rindex 0.000445 0.000000 0.000445 ( 0.000448) ------------------------------------------------------ ruby-patched | substring near the end | osx https://github.com/ruby/ruby/blob/669d1f79d800d554918c7d5647b73a6431920ea8/re.c#L272-L274 UTF-8 user system total real index (short substring) 0.000400 0.000001 0.000401 ( 0.000399) index (long substring) 0.000080 0.000000 0.000080 ( 0.000080) rindex 0.000057 0.000001 0.000058 ( 0.000057) ``` ``` ruby-master | substring in the middle | osx UTF-8 user system total real index 0.000031 0.000000 0.000031 ( 0.000030) rindex 0.001086 0.000002 0.001088 ( 0.001088) ------------------------------------------------------ ruby-patched | substring in the middle | osx UTF-8 user system total real index 0.000075 0.000000 0.000075 ( 0.000075) rindex 0.000130 0.000000 0.000130 ( 0.000133) ``` ``` ruby-master | substring in the middle with a short substring | osx https://github.com/ruby/ruby/blob/669d1f79d800d554918c7d5647b73a6431920ea8/re.c#L272-L274 UTF-8 user system total real index (short substring) 0.000243 0.000004 0.000247 ( 0.000246) rindex 0.001176 0.000005 0.001181 ( 0.001182) ------------------------------------------------------ ruby-patched | substring in the middle with a short substring | osx https://github.com/ruby/ruby/blob/669d1f79d800d554918c7d5647b73a6431920ea8/re.c#L272-L274 UTF-8 user system total real index (short substring) 0.000255 0.000005 0.000260 ( 0.000259) rindex 0.000130 0.000000 0.000130 ( 0.000133) ``` ``` ruby-master | substring at the end | osx UTF-8 user system total real index 0.000049 0.000001 0.000050 ( 0.000048) rindex 0.000008 0.000001 0.000009 ( 0.000007) ------------------------------------------------------ ruby-patched | substring at the end | osx UTF-8 user system total real index 0.000075 0.000000 0.000075 ( 0.000074) rindex 0.000006 0.000000 0.000006 ( 0.000006) ``` ``` ruby-master | substring at the start | osx UTF-8 user system total real index 0.000008 0.000001 0.000009 ( 0.000007) rindex 0.002444 0.000008 0.002452 ( 0.002452) ------------------------------------------------------ ruby-patched | substring at the start | osx UTF-8 user system total real index 0.000015 0.000000 0.000015 ( 0.000015) rindex 0.000280 0.000001 0.000281 ( 0.000280) ```