Feature #16557

Updated by Eregon (Benoit Daloze) 2 months ago

Pull Request: https://github.com/ruby/ruby/pull/2859 

 ### Context 

 Real-world applications contain many duplicated Regexp literals. 

 From a rails/console in Redmine: 

 ``` 
 >> ObjectSpace.each_object(Regexp).count 
 => 6828 
 >> ObjectSpace.each_object(Regexp).uniq.count 
 => 4162 
 >> ObjectSpace.each_object(Regexp).to_a.map { |r| ObjectSpace.memsize_of(r) }.sum 
 => 4611957 # 4.4 MB total 
 >> ObjectSpace.each_object(Regexp).to_a.map { |r| ObjectSpace.memsize_of(r) }.sum - ObjectSpace.each_object(Regexp).to_a.uniq.map { |r| ObjectSpace.memsize_of(r) }.sum 
 => 1490601 # 1.42 MB could be saved 
 ``` 

 Here are the top 10 most duplicated regexps in Redmine: 

 ``` 
 147: /"/ 
 107: /\s+/ 
 103: // 
 89: /\n/ 
 83: /'/ 
 76: /\s+/m 
 37: /\d+/ 
 35: /\[/ 
 33: /./ 
 33: /\\./ 
 ``` 
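
A list like the one above can be produced with something along these lines. This is only a sketch: `top_duplicated` is a helper name made up for illustration, and it takes any collection of regexps so it can be tried outside of a Rails console.

```ruby
# Hypothetical helper: count how many copies of each distinct Regexp exist.
# Regexps with the same source and options hash to the same key.
def top_duplicated(regexps, limit = 10)
  regexps.group_by(&:itself)
         .map { |re, copies| [copies.size, re] }
         .sort_by { |count, _| -count }
         .first(limit)
end

# In a console one would pass the live objects:
top_duplicated(ObjectSpace.each_object(Regexp).to_a).each do |count, re|
  puts "#{count}: #{re.inspect}"
end
```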

 Any empty Rails application will have a similar number of regexps. 

 ### The feature 

 Since https://bugs.ruby-lang.org/issues/16377 made literal regexps frozen, it is possible to deduplicate literal regexps without changing any semantics and save a decent amount of resident memory. 
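
To illustrate why frozenness matters here, a small sketch. `accepts_state?` is a made-up helper, and a regexp frozen by hand stands in for a frozen literal so the snippet does not depend on the Ruby version. If one object were shared between several literal sites, state attached at one site would be visible at all of them; a frozen object rules that out.

```ruby
# Hypothetical helper: true if state can be attached to obj, false if it is frozen.
def accepts_state?(obj)
  obj.instance_variable_set(:@tag, :seen)
  true
rescue FrozenError
  false
end

unfrozen = Regexp.new("\\s+")         # a regexp that has not been frozen
frozen   = Regexp.new("\\s+").freeze  # stands in for a frozen literal

accepts_state?(unfrozen) # mutation possible, so sharing this object would be observable
accepts_state?(frozen)   # mutation rejected, so sharing is not observable this way
```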


 ### The patch 

 I tried implementing this feature in a way very similar to the `frozen_strings` table. It's functional, but I'm having trouble with a segfault on Linux: https://github.com/ruby/ruby/pull/2859
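
For readers who want the general idea without reading the C code, here is a rough Ruby-level model of an interning table. It is not the patch itself: `RegexpTable` and its methods are hypothetical names, and the key used here (source, options, encoding) is an assumption about what makes two literals interchangeable.

```ruby
# Hypothetical Ruby-level model of a regexp interning table.
# The actual patch lives in the VM and is written in C.
class RegexpTable
  def initialize
    @table = {}
  end

  # Returns the canonical frozen Regexp for the given one,
  # registering a frozen copy the first time a value is seen.
  def intern(regexp)
    key = [regexp.source, regexp.options, regexp.encoding]
    @table[key] ||= regexp.frozen? ? regexp : regexp.dup.freeze
  end

  def size
    @table.size
  end
end

table = RegexpTable.new
a = table.intern(Regexp.new("\\s+"))
b = table.intern(Regexp.new("\\s+"))
c = table.intern(Regexp.new("\\s+", Regexp::MULTILINE))
# a and b are the same object; c differs because its options differ.
```

A real implementation also has to drop entries when a regexp is garbage collected, which this model ignores.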
