Feature #15997

Updated by ioquatix (Samuel Williams) over 5 years ago

https://github.com/ruby/ruby/pull/2224 

 This PR improves the performance of fiber allocation and reuse by implementing a better stack cache. As per @ko1's request, we also increased fiber stack size to be the same as thread stack size. 

 The fiber pool manages a singly linked list of fiber pool allocations. Each fiber pool allocation contains one or more stacks (typically more, e.g. 512). The pool grows exponentially (2^N rather than N^2): the initial allocation holds 8 stacks, and subsequent allocations hold 8, 16, 32, etc.
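
The growth sequence above (8, 8, 16, 32, ...) can be sketched as follows. This is a minimal illustration, not the PR's code: the struct and function names are hypothetical, `malloc` stands in for however the stacks are really mapped, and the rule "each new allocation matches the current total" is an assumption chosen because it reproduces that sequence.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

// All names here are hypothetical; they are not the PR's identifiers.
struct pool_allocation {
    void *base;                    // start of one contiguous block of stacks
    size_t count;                  // number of stacks in this block
    struct pool_allocation *next;  // singly linked list of allocations
};

struct pool {
    struct pool_allocation *allocations;  // most recent allocation first
    size_t stack_size;                    // bytes per stack
    size_t initial_count;                 // stacks in the first allocation
    size_t total;                         // stacks across all allocations
};

// The first allocation holds initial_count stacks; each later one holds as
// many stacks as the pool already has, so the pool doubles: 8, 8, 16, 32, ...
static size_t pool_next_count(const struct pool *pool) {
    return pool->total == 0 ? pool->initial_count : pool->total;
}

// Add one allocation to the front of the list. Returns 0 on success, -1 on
// allocation failure. malloc stands in for the real memory mapping.
static int pool_expand(struct pool *pool) {
    assert(pool->stack_size > 0 && pool->initial_count > 0);
    size_t count = pool_next_count(pool);
    struct pool_allocation *allocation = malloc(sizeof(*allocation));
    if (allocation == NULL) return -1;
    allocation->base = malloc(count * pool->stack_size);
    if (allocation->base == NULL) {
        free(allocation);
        return -1;
    }
    allocation->count = count;
    allocation->next = pool->allocations;
    pool->allocations = allocation;
    pool->total += count;
    return 0;
}
```

Starting from `struct pool pool = {NULL, 4096, 8, 0};`, four calls to `pool_expand(&pool)` add allocations of 8, 8, 16 and 32 stacks. Doubling keeps the linked list short even when many fibers are live.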

 ``` 
 //
 // base = +-------------------------------+-----------------------+  +
 //        |VM Stack       |VM Stack       |                       |  |
 //        |               |               |                       |  |
 //        |               |               |                       |  |
 //        +-------------------------------+                       |  |
 //        |Machine Stack  |Machine Stack  |                       |  |
 //        |               |               |                       |  |
 //        |               |               |                       |  |
 //        |               |               | .  .  .  .            |  |  size
 //        |               |               |                       |  |
 //        |               |               |                       |  |
 //        |               |               |                       |  |
 //        |               |               |                       |  |
 //        |               |               |                       |  |
 //        +-------------------------------+                       |  |
 //        |Guard Page     |Guard Page     |                       |  |
 //        +-------------------------------+-----------------------+  v
 //
 //        +------------------------------------------------------->
 //
 //                                  count
 //
 ``` 
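
The arithmetic implied by the diagram can be sketched as below. This is an illustration under assumptions: the names are hypothetical, each stack is taken to occupy a fixed `size` bytes with the stacks placed side by side from `base`, and the concrete sizes in the usage note are example values, not figures from the PR. Where the guard page sits in memory may depend on the direction of stack growth, so the sketch computes only sizes and per-stack offsets.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical description of one column in the diagram.
struct stack_layout {
    size_t size;        // bytes per stack, guard page included ("size")
    size_t vm_size;     // bytes for the VM stack
    size_t guard_size;  // bytes for the guard page
};

// Byte offset of stack `index` from `base`; stacks are laid out side by side.
static size_t stack_offset(const struct stack_layout *layout, size_t index) {
    return index * layout->size;
}

// Bytes left for the machine stack once the VM stack and guard page are taken.
static size_t machine_stack_size(const struct stack_layout *layout) {
    assert(layout->vm_size + layout->guard_size <= layout->size);
    return layout->size - layout->vm_size - layout->guard_size;
}

// Total bytes needed for an allocation holding `count` stacks.
static size_t allocation_size(const struct stack_layout *layout, size_t count) {
    return count * layout->size;
}
```

For example, with a 1 MiB stack, a 128 KiB VM stack and a 4 KiB guard page, `machine_stack_size` leaves 913408 bytes for the machine stack, and an allocation of 8 stacks spans 8 MiB.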

 The performance improvement depends on usage: 

 ``` 
 Calculating -------------------------------------
                      compare-ruby  built-ruby
   vm2_fiber_allocate     132.900k    180.852k i/s -    100.000k times in 0.752447s 0.552939s
      vm2_fiber_count       5.317k    110.724k i/s -    100.000k times in 18.806479s 0.903145s
      vm2_fiber_reuse      160.128     347.663 i/s -     200.000 times in 1.249003s 0.575269s
     vm2_fiber_switch      13.429M     13.490M i/s -     20.000M times in 1.489303s 1.482549s

 Comparison:
                vm2_fiber_allocate
           built-ruby:    180851.6 i/s
         compare-ruby:    132899.7 i/s - 1.36x  slower

                   vm2_fiber_count
           built-ruby:    110724.3 i/s
         compare-ruby:      5317.3 i/s - 20.82x  slower

                   vm2_fiber_reuse
           built-ruby:       347.7 i/s
         compare-ruby:       160.1 i/s - 2.17x  slower

                  vm2_fiber_switch
           built-ruby:  13490282.4 i/s
         compare-ruby:  13429100.0 i/s - 1.00x  slower
 ``` 

 These benchmarks were run on a Linux server with 64GB of memory and a 4-core Xeon (Intel Xeon CPU E3-1240 v6 @ 3.70GHz). "compare-ruby" is `master`, and "built-ruby" is `master+fiber-pool`.

 Additionally, we conservatively use `madvise(free)` to avoid swap space usage for unused fiber stacks. However, if we remove this requirement, we can get a 6x to 10x performance improvement in the `vm2_fiber_reuse` benchmark. There are some options to deal with this (e.g. moving it to `GC.compact`), but as this is still a net win, I'd like to merge this PR as is.
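
To make the trade-off concrete, here is a minimal sketch of advising the kernel that an unused stack's pages can be discarded. It is not the PR's code: the function names are hypothetical, it assumes a POSIX-like system with anonymous `mmap`, and it assumes that "free" refers to `MADV_FREE`, falling back to `MADV_DONTNEED` where `MADV_FREE` is missing or rejected.

```c
#define _DEFAULT_SOURCE  // expose MAP_ANONYMOUS and madvise on glibc

#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

// Map `size` bytes of zeroed, private memory for a stack. Returns NULL on
// failure. Hypothetical helper, not the PR's function.
static void *stack_map(size_t size) {
    assert(size > 0);
    void *base = mmap(NULL, size, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return base == MAP_FAILED ? NULL : base;
}

// Tell the kernel the contents of an unused stack are no longer needed, so
// the pages can be reclaimed instead of being written to swap. The mapping
// stays valid and can be reused later. Returns 0 on success, -1 on error.
static int stack_release_pages(void *base, size_t size) {
#if defined(MADV_FREE)
    if (madvise(base, size, MADV_FREE) == 0) return 0;
    // Some kernels reject MADV_FREE; fall through to MADV_DONTNEED.
#endif
    return madvise(base, size, MADV_DONTNEED);
}
```

A pool would call something like `stack_release_pages` when a fiber's stack is returned. That extra system call on every release is the kind of cost the paragraph above weighs against swap usage.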
