Project

General

Profile

Feature #20709

Updated by zack.ref@gmail.com (Zack Deveau) 2 months ago

`String#rindex` is much slower on OSX than on Linux hosts. This appears due to the lack of a `memrchr` implementation on OSX. The [fallback solution](https://github.com/ruby/ruby/blob/e69945fc57efba5c8895c21235e109145865952d/string.c#L4348-L4402) currently performs a `memcmp` on each character in the search string looking for a substring match. 

 ```c 
     while (s) { 
         if (memcmp(s, t, slen) == 0) { 
             return s - sbeg; 
         } 
         if (s <= sbeg) break; 
         s = rb_enc_prev_char(sbeg, s, e, enc); 
     } 
 ``` 


 ``` 
 ruby-master | 10 char substring (long) near the end of a search string | osx 

 search string enc: UTF-8 
 search string size: 8010 

                         user       system        total          real 
 index (long substring)    0.000075     0.000000     0.000075 (    0.000074) 
 rindex                    0.000445     0.000000     0.000445 (    0.000448) 
 ``` 

 Proposed improvement is available on this pull request: 
 https://github.com/ruby/ruby/pull/11519 


 



 This patch proposes using an implementation of `memrchr` made for the interpreter on hosts that lack a native implementation. Below are benchmark results which demonstrate that this approach either improves or maintains performance of `String#rindex` for all cases we could come up with. 

 ```c 
 static void* 
 rb_memrchr(const char *search_str, int chr, long search_len) 
 { 
     const char *ptr = search_str + search_len; 
     do { 
        if (*(--ptr) == chr) return (void *)ptr; 
     } while (ptr >= search_str); 

     return ((void *)0); 
 } 

 ``` 

 Special thanks to @jhawthorn who provided some helpful feedback on this solution during development! 

 # Benchmarks 

 Each of the following were made with a variant of the following general purpose benchmark. Note that `String#index` speed can be [impacted by the size of the provided substring](https://github.com/ruby/ruby/blob/669d1f79d800d554918c7d5647b73a6431920ea8/re.c#L272-L274) (Denoted as long/short substring in the results, assume long substring when none is specified). 
 ``` 
 require "benchmark" 

 substring = "98279827AA" 
 the_string = "" 
 800.times{ the_string.concat(('a'..'z').to_a.shuffle[0,8].join)} 
 the_string.concat(substring) 
 200.times{ the_string.concat(('a'..'z').to_a.shuffle[0,8].join)} 
 puts the_string.encoding 
 puts the_string.size 

 Benchmark.bm do |x| 
   x.report(:index) { 100.times { the_string.index(substring) } } 
   x.report(:rindex) { 100.times { the_string.rindex(substring) } } 
 end 
 ``` 

 ``` 
 ruby-master | substring near the end | osx 
 https://github.com/ruby/ruby/blob/669d1f79d800d554918c7d5647b73a6431920ea8/re.c#L272-L274 

 UTF-8 

        user       system        total          real 
 index    0.000075     0.000000     0.000075 (    0.000074) 
 rindex    0.000445     0.000000     0.000445 (    0.000448) 
 ------------------------------------------------------ 
 ruby-patched | substring near the end | osx 
 https://github.com/ruby/ruby/blob/669d1f79d800d554918c7d5647b73a6431920ea8/re.c#L272-L274 

 UTF-8 
                         user       system        total          real 
 index (short substring) 0.000400     0.000001     0.000401 (    0.000399) 
 index (long substring)    0.000080     0.000000     0.000080 (    0.000080) 
 rindex                    0.000057     0.000001     0.000058 (    0.000057) 
 ``` 


 ``` 
 ruby-master | substring in the middle | osx 

 UTF-8 
        user       system        total          real 
 index    0.000031     0.000000     0.000031 (    0.000030) 
 rindex    0.001086     0.000002     0.001088 (    0.001088) 
 ------------------------------------------------------ 
 ruby-patched | substring in the middle | osx 

 UTF-8 
        user       system        total          real 
 index    0.000075     0.000000     0.000075 (    0.000075) 
 rindex    0.000130     0.000000     0.000130 (    0.000133) 
 ``` 

 ``` 
 ruby-master | substring in the middle with a short substring | osx 
 https://github.com/ruby/ruby/blob/669d1f79d800d554918c7d5647b73a6431920ea8/re.c#L272-L274 

 UTF-8 
                         user       system        total          real 
 index (short substring) 0.000243     0.000004     0.000247 (    0.000246) 
 rindex                    0.001176     0.000005     0.001181 (    0.001182) 
 ------------------------------------------------------ 
 ruby-patched | substring in the middle with a short substring | osx 
 https://github.com/ruby/ruby/blob/669d1f79d800d554918c7d5647b73a6431920ea8/re.c#L272-L274 

 UTF-8 
                          user       system        total          real 
 index (short substring)    0.000255     0.000005     0.000260 (    0.000259) 
 rindex                     0.000130     0.000000     0.000130 (    0.000133) 
 ``` 

 ``` 
 ruby-master | substring at the end | osx 

 UTF-8 
        user       system        total          real 
 index    0.000049     0.000001     0.000050 (    0.000048) 
 rindex    0.000008     0.000001     0.000009 (    0.000007) 
 ------------------------------------------------------ 
 ruby-patched | substring at the end | osx 

 UTF-8 
        user       system        total          real 
 index    0.000075     0.000000     0.000075 (    0.000074) 
 rindex    0.000006     0.000000     0.000006 (    0.000006) 
 ``` 


 ``` 
 ruby-master | substring at the start | osx 

 UTF-8 
        user       system        total          real 
 index    0.000008     0.000001     0.000009 (    0.000007) 
 rindex    0.002444     0.000008     0.002452 (    0.002452) 
 ------------------------------------------------------ 
 ruby-patched | substring at the start | osx 

 UTF-8 
        user       system        total          real 
 index    0.000015     0.000000     0.000015 (    0.000015) 
 rindex    0.000280     0.000001     0.000281 (    0.000280) 
 ```

Back