Feature #11158
Closed
Introduce a Symbol.count API as a more efficient alternative to Symbol.all_symbols.size
Description
We're in the process of migrating a very large Rails codebase from Ruby 2.1.6 to Ruby 2.2.2, and as part of this migration we would like to track Symbol counts and Symbol GC efficiency in our metrics system. Ideally we'd start while still on 2.1 (which implies a backport to 2.1), but this would definitely be useful in 2.2 as well.
Currently the recommended and only reliable way to get the Symbol count is via Symbol.all_symbols.size, which:
- Allocates an Array
- Calls rb_ary_push while walking the whole symbol table, which isn't exactly efficient
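The allocation cost described above can be observed from plain Ruby. This is only a small illustration on a stock interpreter (no patch applied): every call to Symbol.all_symbols builds a new Array, even when only its size is wanted.

```ruby
# Two consecutive calls return two distinct Array objects,
# each holding every symbol currently in the symbol table.
first  = Symbol.all_symbols
second = Symbol.all_symbols

puts first.equal?(second) # false: a fresh Array is allocated per call
puts first.size           # the only value a metrics system actually needs
```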
Here are some benchmarks:
./miniruby -Ilib -rbenchmark -e "p Benchmark.measure { 10_000.times{ Symbol.count } }"
#<Benchmark::Tms:0x007f8bc208bdd0 @label="", @real=0.0011274919961579144, @cstime=0.0, @cutime=0.0, @stime=0.0, @utime=0.01, @total=0.01>
./miniruby -Ilib -rbenchmark -e "p Benchmark.measure { 10_000.times{ Symbol.all_symbols.size } }"
#<Benchmark::Tms:0x007fa47205a550 @label="", @real=0.3135859479953069, @cstime=0.0, @cutime=0.0, @stime=0.03, @utime=0.29, @total=0.31999999999999995>
I implemented and attached a patch for a simple Symbol.count API that just returns the symbol table size as an integer, without doing any iteration.
Please let me know if this is in line with what's expected of a core API, whether there's anything I could clean up further, and whether there's any possibility of such a change being backported to 2.1 as well (happy to create a new patch for 2.1).
Files
        
           Updated by nobu (Nobuyoshi Nakada) over 10 years ago
      Lourens Naudé wrote:
Please let me know if this is in line with what's expected of a core API, whether there's anything I could clean up further, and whether there's any possibility of such a change being backported to 2.1 as well (happy to create a new patch for 2.1).
New features are never backported to 2.2 or earlier.
        
           Updated by methodmissing (Lourens Naudé) over 10 years ago
      Makes sense, my bad, thanks for the consideration.
        
           Updated by marcandre (Marc-Andre Lafortune) over 10 years ago
      - Assignee set to matz (Yukihiro Matsumoto)
I'd recommend instead introducing Symbol.each, which would accept a block and return an Enumerator when none is given.
Symbol.each.size would then be an efficient (lazy) way of getting the number of symbols, and it would be a more versatile method in case someone wants to iterate over all Symbols for other purposes.
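The shape of that suggestion can be sketched in plain Ruby. SymbolTable below is a hypothetical stand-in module, not the proposed core method, and it is built on Symbol.all_symbols, so its size block still allocates an Array; a C implementation would presumably read the table size directly. The point is only the pattern: each returns an Enumerator with a lazily computed size when no block is given.

```ruby
# Hypothetical stand-in for the proposed Symbol.each; not part of Ruby core.
module SymbolTable
  extend Enumerable

  def self.each(&block)
    # Without a block, return an Enumerator. The block given to enum_for
    # is used by Enumerator#size, so size never has to iterate.
    return enum_for(:each) { Symbol.all_symbols.size } unless block

    Symbol.all_symbols.each(&block)
    self
  end
end

enum = SymbolTable.each
puts enum.class                      # Enumerator
puts enum.size                       # number of symbols, computed lazily
puts SymbolTable.grep(/\Aeach\z/).inspect # Enumerable methods work via each
```

Extending Enumerable is what makes grep, to_a, and count available on the module itself, which is the versatility argument made above.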
        
           Updated by methodmissing (Lourens Naudé) over 10 years ago
      Sounds good, I'll take a stab tonight.
        
           Updated by methodmissing (Lourens Naudé) over 10 years ago
- File symbol_enumerator.patch added
Please find attached the changes as per Marc-Andre's suggestions. The patch exposes Symbol.each and extends Symbol with Enumerable.
  def test_each
    x = Symbol.each.size
    assert_kind_of(Fixnum, x)
    assert_equal x, Symbol.all_symbols.size
    assert_equal x, Symbol.count
    assert_equal Symbol.to_a, Symbol.all_symbols
    answer_to_life = :bacon_lettuce_tomato
    assert_equal [:bacon_lettuce_tomato], Symbol.grep(/bacon_lettuce_tomato/)
  end
Calling size on the enumerator is super efficient.
$ ./miniruby -Ilib -rbenchmark -e "p Benchmark.measure { 10_000.times{ Symbol.each.size } }"
#<Benchmark::Tms:0x007fea32039688 @label="", @real=0.005798012993182056, @cstime=0.0, @cutime=0.0, @stime=0.0, @utime=0.01, @total=0.01>
Symbol.count isn't, though (not sure if it's possible to replace its definition with Symbol.each.size instead).
$ ./miniruby -Ilib -rbenchmark -e "p Benchmark.measure { 10_000.times{ Symbol.count } }"
#<Benchmark::Tms:0x007fa47907afb0 @label="", @real=0.36278180500085, @cstime=0.0, @cutime=0.0, @stime=0.0, @utime=0.36, @total=0.36>
Thoughts?
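A likely explanation for the gap between the two benchmarks: once Enumerable is extended onto Symbol, Symbol.count resolves to Enumerable#count, which walks every element through each, while Enumerator#size can use the size supplied when the enumerator was created. The sketch below uses a hypothetical Counted class (not from the patch) that records how many times each yields.

```ruby
# Hypothetical class used only to show the cost difference between
# Enumerator#size (lazy) and Enumerable#count (full iteration).
class Counted
  include Enumerable

  attr_reader :yields

  def initialize(n)
    @n = n
    @yields = 0
  end

  def each
    # The block passed to enum_for supplies the size without iterating.
    return enum_for(:each) { @n } unless block_given?

    @n.times do |i|
      @yields += 1
      yield i
    end
    self
  end
end

c = Counted.new(1000)
puts c.each.size # 1000
puts c.yields    # 0: size did not iterate
puts c.count     # 1000
puts c.yields    # 1000: count walked every element
```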
        
           Updated by akr (Akira Tanaka) over 10 years ago
      - Related to Feature #9963: Symbol.count added
        
           Updated by ko1 (Koichi Sasada) over 10 years ago
      - Assignee changed from matz (Yukihiro Matsumoto) to ko1 (Koichi Sasada)
        
           Updated by cesario (Franck Verrot) over 10 years ago
      Lourens Naudé wrote:
Please find attached the changes as per Marc-Andre's suggestions. Exposes Symbol.each and extends with Enumerable
Hi Lourens,
I'm not sure I fully understand why we make Symbol extend Enumerable rather than returning a new enumerator object (probably also extending Enumerable). Isn't there way too much overhead in including Enumerable in Symbol?
Thoughts?
Nice work!
        
           Updated by ko1 (Koichi Sasada) over 10 years ago
I'm not against introducing Symbol.each as a shortcut for Symbol.all_symbols.each.
However, for measurement purposes, we should introduce a new measurement API into ObjectSpace, because there are several types of symbols.
|         | immortal | mortal |
|---------|:--------:|:------:|
| static  |   (1)    |  (2)   |
| dynamic |   (3)    |  (4)   |
- Immortal symbols
  - Static immortal symbols (1)
  - Dynamic immortal symbols (3)
- Dynamic mortal symbols (4)
There are no symbols of type (2).
Currently, Symbol.all_symbols.size returns (1) + (3) + (4).
Maybe the counts of (1) and (3) (or (1) + (3), the immortal symbols) would be helpful for people who want to know the details.
        
           Updated by methodmissing (Lourens Naudé) over 10 years ago
      Thanks for the feedback - I'll take a stab and circle back.
        
           Updated by marcandre (Marc-Andre Lafortune) over 10 years ago
      Franck Verrot wrote:
I'm not sure I fully understand why we make Symbol extend Enumerable rather than returning a new enumerator object
It's not "rather than". Symbol.each without a block will return an Enumerator, whether or not we extend Enumerable.
Isn't there way too much overhead in including Enumerable in Symbol?
Not sure what you mean by overhead. There's no performance cost to it. It adds a bunch of methods to Symbol, and many won't be helpful (I doubt anyone would use Symbol.map{...}), but I'm not sure I see the downside.
        
           Updated by cesario (Franck Verrot) over 10 years ago
      Marc-Andre Lafortune wrote:
Franck Verrot wrote:
Isn't there way too much overhead in including Enumerable in Symbol?
Not sure what you mean by overhead. There's no performance cost to it. It adds a bunch of methods to Symbol, and many won't be helpful (I doubt anyone would use Symbol.map{...}), but I'm not sure I see the downside.
Sorry, I didn't phrase this well :-) I was only wondering whether including Enumerable in Symbol could lead some of us to rely on methods (like map, as you said) that weren't really thought through when each was introduced. Maybe that doesn't make sense, so feel free to ignore this comment... I'm still new to the Ruby VM internals and the way its APIs are designed :-)
Thanks!
        
           Updated by ko1 (Koichi Sasada) about 10 years ago
      - Status changed from Open to Closed
Applied in changeset r51654.
- ext/objspace/objspace.c: add a new method ObjectSpace.count_symbols.
 [Feature #11158]
- symbol.c (rb_sym_immortal_count): added to count immortal symbols.
- symbol.h: ditto.
- test/objspace/test_objspace.rb: add a test for this method.
- NEWS: describe this method.
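For the metrics use case that opened this issue, the method added by this changeset can be called as in the sketch below. It assumes a CRuby that ships ObjectSpace.count_symbols (2.3 or later) and the hash keys listed in that method's documentation; the printed numbers vary by process, so none are shown.

```ruby
require 'objspace'

# Returns a Hash of symbol counts broken down by category.
counts = ObjectSpace.count_symbols

# Keys as documented for ObjectSpace.count_symbols:
#   :mortal_dynamic_symbol, :immortal_dynamic_symbol,
#   :immortal_static_symbol, :immortal_symbol
counts.each do |kind, number|
  puts "#{kind}: #{number}"
end

# Mortal dynamic symbols are the ones Symbol GC can reclaim, so tracking
# this value over time is one way to watch Symbol GC behaviour.
puts "collectable: #{counts[:mortal_dynamic_symbol]}"
```

Unlike Symbol.all_symbols.size, this does not build an Array of every symbol just to count them.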