Feature #16984

Updated by alanwu (Alan Wu) 2 months ago

Consider the following code: 

 module M
   def foo; end
   def bar; end
 end

 class C
   include M
 end

 The object reference graph from running the code looks like this: 

 +---+                  +-----+
 | M |------------------| foo |--+
 +---+                  +-----+  |
   |                    +-----+  |
   +--------------------| bar |  |
                        +-----+  |
 +-----------+             |     |
 | iclass(M) |-------------+     |
 +-----------+                   |
       |                         |
       +-------------------------+

 With the proposed patch applied, the graph becomes:

 +---+            +--------------+     +-----+
 | M |------------| method table |-----| foo |
 +---+            +--------------+     +-----+
 +-----------+        |      |         +-----+
 | iclass(M) |--------+      +---------| bar |
 +-----------+                         +-----+
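
The sharing between `M` and `iclass(M)` can be observed from plain Ruby, even though iclasses themselves are not visible to Ruby code. The sketch below reuses the example above, with return values added only for illustration: a method added to `M` after it has been included is still reachable through `C`.

```ruby
module M
  def foo; :foo; end
end

class C
  include M   # creates iclass(M) in C's ancestor chain
end

# Reopen M *after* it has been included and add another method.
module M
  def bar; :bar; end
end

obj = C.new
puts obj.foo.inspect
puts obj.bar.inspect              # reachable through C although added after `include`
puts C.ancestors.take(2).inspect  # C followed by M
```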


 This change has a similar effect on the constant table. In addition to this, T_ICLASS no longer
 holds a reference to an ivar table. Code that accesses the ivar table through iclasses
 is changed to access it through the object from which the iclass was made. This change
 impacts autoload and class variable lookup.
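
As a rough illustration of the kind of lookup affected, the sketch below reads a class variable that is stored on a module through a class that includes it. The names `Countable` and `Widget` are hypothetical and not part of the patch. The behaviour visible from Ruby should be the same before and after the change; only the internal path to the table differs.

```ruby
module Countable   # hypothetical module name
  @@count = 0      # class variable stored on the module itself
end

class Widget       # hypothetical class name
  include Countable
end

# The lookup starts at Widget and walks its ancestors until it reaches Countable.
puts Widget.class_variable_get(:@@count)

# Writing through the module is visible when reading through the class.
Countable.class_variable_set(:@@count, 5)
puts Widget.class_variable_get(:@@count)
```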

 ## Why? 

 The main goal of this change is to make iclasses and modules write barrier protected. At the moment, they are
 "shady", which means the GC has to do extra work to handle them. In code bases that use modules a lot,
 iclasses can easily take up a significant portion of the heap and impact GC time. Having write barriers
 for iclasses means they can age in the generational GC. Once aged, the GC can sometimes skip subgraphs
 rooted at these objects, improving performance.

 Inserting write barriers was tricky in the old setup because of the way `M` and `iclass(M)` share the
 method table. Adding a single method to `M` would create multiple edges on the object reference graph.
 To safely make `M` and `iclass(M)` write barrier protected, one would need to trigger a write barrier for
 each new edge. This would make the amount of work it takes to add a method a function of the number of
 times the target module is included or prepended.

 The new setup also factors the edges in the object reference graph. If the number of methods in a module is `M` and the number
 of times the module is prepended or included is `N`, the old setup had `M * (N+1)` edges. The new setup has
 `M + N + 1` edges instead. For large enough `M` and `N`, the new setup produces fewer edges. Having fewer
 edges is better since the GC's work is proportional to the number of edges.
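
To make the two formulas concrete, here is a small sketch that evaluates them for a few sizes. The helper names `old_setup_edges` and `new_setup_edges` are made up for this illustration. For the two-method, one-include example at the top, both setups have 4 edges; the gap opens up as `M` and `N` grow.

```ruby
# m: number of methods in the module, n: number of times it is included or prepended

def old_setup_edges(m, n)
  m * (n + 1)   # the module and every iclass reference each method directly
end

def new_setup_edges(m, n)
  m + n + 1     # the module and every iclass reference one shared method table
end

[[2, 1], [10, 10], [100, 50]].each do |m, n|
  puts format("M=%d N=%d old=%d new=%d", m, n, old_setup_edges(m, n), new_setup_edges(m, n))
end
```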

 ## Impact to GC time 

 I measured the impact to minor GC time with the following steps: 
  - load an application 
  - run `GC::Profiler.enable`
  - allocate 50 million objects 
  - run `` 
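
A minimal sketch of a measurement along these lines is shown below. It is not the script used for the numbers in this post. The helper name `measure_gc` and the default allocation count are assumptions, and reading the result through `GC::Profiler.total_time` and `GC::Profiler.raw_data` is one possible way to finish the measurement. Note that it averages over every GC run the profiler recorded, not only minor ones; `GC.stat(:minor_gc_count)` is reported alongside for reference.

```ruby
# Hypothetical helper: profile GC while allocating many short-lived objects.
def measure_gc(allocations: 1_000_000)
  GC::Profiler.clear
  GC::Profiler.enable
  minor_before = GC.stat(:minor_gc_count)

  allocations.times { Object.new }

  minor_runs = GC.stat(:minor_gc_count) - minor_before
  total_ms   = GC::Profiler.total_time * 1000.0   # total_time is in seconds
  runs       = GC::Profiler.raw_data.size         # one entry per recorded GC run

  {
    runs: runs,
    minor_runs: minor_runs,
    total_ms: total_ms,
    avg_ms: runs.zero? ? 0.0 : total_ms / runs
  }
ensure
  GC::Profiler.disable
end

puts measure_gc.inspect
```

Running this once on a build without the patch and once with it, after loading the same application, gives two averages that can be compared.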

 Here is the impact to average minor GC time on various apps: 

 |Application               | Before    | After     | Speedup ratio |
 |--------------------------|-----------|-----------|---------------|
 |CRuby's test-all suite    | 2.438ms   | 2.289ms   | 1.06          |
 |`rails new` app           | 1.911ms   | 1.798ms   | 1.06          |
 |Private app A             | 5.182ms   | 5.168ms   | 1.00          |
 |Private app B             | 185.7ms   | 107.9ms   | 1.72          |

 Private app A's heap size is about 22 MiB compared to B's 250 MiB. 
 App B boots up about 15% faster with this change. 

 ## Impact to class variable lookup 

 I included a benchmark in the patch to measure the impact to class variable lookup performance. 
 The difference seems negligible. 

 ## Conclusion 

 This change seems to reduce minor GC time for real-world applications. 


 Credits to @tenderlovemaking for coming up with the idea for this change.