Project

General

Profile

Actions

Feature #2065

closed

An ancestors iterator

Added by bahuvrihi (Simon Chiang) about 15 years ago. Updated over 12 years ago.

Status:
Rejected
Target version:

Description

=begin
I have implemented DSLs that add features to a class/module that should be inherited like methods. In those cases I end up iterating ancestors to find the first time a feature has been added (the same way I imagine methods are determined).

The issue is that SomeClass.ancestors regenerates the ancestors array each time it is called. Therefore this is relatively slow:

SomeClass.ancestors.each do |ancestor|
# ...
end

It would be nice if there were a method that iterates ancestors without generating the ancestors array:

SomeClass.each_ancestor do |ancestor|
# ...
end

This could improve the performance of DSLs that want to support method-like inheritance.
=end


Files

cache_benchmark.rb (1.79 KB) cache_benchmark.rb bahuvrihi (Simon Chiang), 03/19/2010 05:37 AM
Actions #1

Updated by rogerdpack (Roger Pack) about 15 years ago

=begin
You could write your own method:

class Object
def ancestors_cached
@ancestors ||= ancestors
end
end

Would that be sufficient? (It's harder to implement that in core since it would need to be invalidated whenever a class hierarchy changes).
-r
=end

Actions #2

Updated by bahuvrihi (Simon Chiang) about 15 years ago

=begin
Actually I think that gets to the heart of the matter. It's not sufficient to cache the ancestors because I want to iterate the most current class hierarchy. Doing so allows added features to really be method-like in their inheritance.

I figure somewhere in core the class hierarchy must already be iterated to generate the ancestors array. I'm envisioning a hook that lets you access that iteration and most likely break out of it at an early point in the ancestry.

As a side note, in my use case I offer caching like you suggest as an option to improve performance. However it's exactly as you point out... hard to invalidate at the right time.

=end

Actions #3

Updated by mame (Yusuke Endoh) over 14 years ago

  • Status changed from Open to Feedback

=begin
Hi,

It would be nice if there were a method that iterates ancestors without generating the ancestors array:

SomeClass.each_ancestor do |ancestor|
# ...
end

This could improve the performance of DSLs that want to support method-like inheritance.

Show a benchmark. I bet it is not bottleneck in your application.

It is endless to provide each_* method corresponding to any methods
that returns an array.

I'll close this ticket unless there is no objection.

As a side note, in my use case I offer caching like you suggest as an option to improve performance. However it's exactly as you point out... hard to invalidate at the right time.

Cannot Module#included be used?

class Class
AncestorsCache = {}
def ancestors_cached
AncestorsCache[self] ||= ancestors
end
end

class Module
def included(x)
Class::AncestorsCache.clear
end
end

class C; end
p C.ancestors_cached #=> [C, Object, Kernel, BasicObject]

module M; end
class C; include M; end
p C.ancestors_cached #=> [C, M, Object, Kernel, BasicObject]

--
Yusuke Endoh
=end

Actions #4

Updated by bahuvrihi (Simon Chiang) over 14 years ago

=begin
Ok, I attached a benchmark that is designed to measure the time required for generating the ancestors array. It does so by comparing 'klass.ancestors.each' vs caching 'klass.ancestor' and iterating 'cache.each' (ie in the second case the ancestors array is only generated once).

Running the attached script I get results like the following:

% ruby cache_benchmark.rb
Benchmark without cache
user system total real
A.value(:one) 0.300000 0.000000 0.300000 ( 0.307086)
B.value(:two) 0.310000 0.000000 0.310000 ( 0.305512)
C.value(:three) 0.310000 0.000000 0.310000 ( 0.307264)

Benchmark with cache
user system total real
A.value(:one) 0.240000 0.000000 0.240000 ( 0.242152)
B.value(:two) 0.240000 0.000000 0.240000 ( 0.244592)
C.value(:three) 0.250000 0.000000 0.250000 ( 0.243842)
Array.new 0.050000 0.000000 0.050000 ( 0.058072)

This is for 100k iterations, so no it's not a bottleneck but likewise that's not why I make this request. I make the request because it would be significantly faster to not generate the ancestors array each time (~20% in this case, which illustrates what is probably the maximum increase for getting an each_ancestor iterator). Notice that the increase in performance corresponds neatly with the time it take to generate 100k arrays.

I imagine somewhere the in-memory ancestry is iterated to make the ancestors array, right? If so I think exposing that iterator would be helpful to DSLs that implement method-like inheritance.

Also I agree that adding each_* methods for every method that return an array would be ridiculous but I'm only asking for a specific one with a specific purpose ;)

The module include method may work to the same end -- I need to explore further -- but I still make this request because it would be less technical to use an each_ancestor iterator, and ultimately less error-prone given all the things that ruby can do with including, extending, undefining and redefining constants. Thank you for the suggestion, and for looking at this.

=end

Actions #5

Updated by mame (Yusuke Endoh) over 14 years ago

=begin
Hi Simon,

2010/3/19 Simon Chiang :

This is for 100k iterations, so no it's not a bottleneck but likewise that's not why I make this request. I make the request because it would be significantly faster to not generate the ancestors array each time (~20% in this case, which illustrates what is probably the maximum increase for getting an each_ancestor iterator).

I cannot get your point. If you admit it is not bottleneck, I wonder
why you hope it faster.
You should know adding a method is not free in terms of maintenance.

Ok, I attached a benchmark that is designed to measure the time required for generating the ancestors array.

Thanks, I could understand what you want to do.

If you want speed at any rate, how about manual search by using
Class#superclass?
It is probably faster than each_ancestor because it does not
yield a block.

module Dsl2
def self.extended(base)
base.registry ||= {}
end

 def inherited(base)
   base.registry ||= {}
 end

 attr_accessor :registry

 def set(key, value)
   registry[key] = value
 end

 def value(key)
   klass = self
   while klass
     if klass.registry.has_key?(key)
       return klass.registry[key]
     end
     klass = klass.superclass
   end
 end

end

Benchmark without cache
user system total real
A.value(:one) 0.340000 0.000000 0.340000 ( 0.340906)
B.value(:two) 0.340000 0.000000 0.340000 ( 0.349000)
C.value(:three) 0.360000 0.000000 0.360000 ( 0.357215)

Benchmark with cache
user system total real
A.value(:one) 0.240000 0.000000 0.240000 ( 0.235161)
B.value(:two) 0.230000 0.000000 0.230000 ( 0.236722)
C.value(:three) 0.240000 0.000000 0.240000 ( 0.233931)
Array.new 0.080000 0.000000 0.080000 ( 0.084003)

Benchmark with manual search
user system total real
A2.value(:one) 0.130000 0.000000 0.130000 ( 0.122530)
B2.value(:two) 0.120000 0.000000 0.120000 ( 0.120361)
C2.value(:three) 0.110000 0.000000 0.110000 ( 0.119009)

--
Yusuke ENDOH

Attachment: benchmark.rb
=end

Actions #6

Updated by bahuvrihi (Simon Chiang) over 14 years ago

=begin
So to clarify... How much any piece of code of a bottleneck depends on how frequently the code gets run, and that is application-specific. In my own application, it IS enough of a bottleneck for me to make this request because I run the code very frequently. In other applications, it may not be a bottleneck.

For DSLs that want to have method-like inheritance, the lack of an ancestors iterator is a potential and unnecessary bottleneck. An each_ancestor iterator would, as per the benchmarks, significantly speed up a useful programming pattern that I think would be used more frequently if it were inherently faster. I hope that clarifies the purpose of this request.

Using superclass would work in the example I gave you but not in general because as I understand it superclass does not visit modules. Using ancestors allows for method-like 'inheritance' because you can visit both modules and superclasses. Using superclass limits you to the class hierarchy.

=end

Actions #7

Updated by mame (Yusuke Endoh) over 14 years ago

=begin
Hi,

2010/3/20 Simon Chiang :

So to clarify... ?How much any piece of code of a bottleneck depends on how frequently the code gets run, and that is application-specific. ?In my own application, it IS enough of a bottleneck for me to make this request because I run the code very frequently. ?In other applications, it may not be a bottleneck.

For DSLs that want to have method-like inheritance, the lack of an ancestors iterator is a potential and unnecessary bottleneck. ?An each_ancestor iterator would, as per the benchmarks, significantly speed up a useful programming pattern that I think would be used more frequently if it were inherently faster. I hope that clarifies the purpose of this request.

I doubt if there is few real application whose bottleneck is
Module#ancestors.

Using superclass would work in the example I gave you but not in general because as I understand it superclass does not visit modules. ?Using ancestors allows for method-like 'inheritance' because you can visit both modules and superclasses. Using superclass limits you to the class hierarchy.

Indeed.
I'd like to make up for my stupid suggestion by giving a patch :-)
I can commit this if matz approves.

diff --git a/class.c b/class.c
index fed2edf..51bc162 100644
--- a/class.c
+++ b/class.c
@@ -763,6 +763,42 @@ rb_mod_ancestors(VALUE mod)
return ary;
}

+/*

    • call-seq:
    • mod.each_ancestor -> nil
      
    • Yields each module included in mod (including mod
    • itself).
    • module Mod
      
    •   include Math
      
    •   include Comparable
      
    • end
      
    • Mod.each_ancestor {|c| p c }  #=> Mod, Comparable, Math
      
    • Math.each_ancestor {|c| p c } #=> Math
      
  • */

+VALUE
+rb_mod_each_ancestor(VALUE mod)
+{

  • VALUE p;
  • RETURN_ENUMERATOR(mod, 0, 0);
  • for (p = mod; p; p = RCLASS_SUPER(p)) {
  • if (FL_TEST(p, FL_SINGLETON))
  •  continue;
    
  • if (BUILTIN_TYPE(p) == T_ICLASS) {
  •  rb_yield(RBASIC(p)->klass);
    
  • }
  • else {
  •  rb_yield(p);
    
  • }
  • }
  • return Qnil;
    +}

#define VISI(x) ((x)&NOEX_MASK)
#define VISI_CHECK(x,f) (VISI(x) == (f))

diff --git a/object.c b/object.c
index 0421824..6c8766e 100644
--- a/object.c
+++ b/object.c
@@ -2449,6 +2449,8 @@ rb_f_array(VALUE obj, VALUE arg)
return rb_Array(arg);
}

+extern VALUE rb_mod_each_ancestor(VALUE mod);
+
/*

  • Document-class: Class

@@ -2657,6 +2659,7 @@ Init_Object(void)
rb_define_method(rb_cModule, "include?", rb_mod_include_p, 1); /*
in class.c /
rb_define_method(rb_cModule, "name", rb_mod_name, 0); /
in variable.c /
rb_define_method(rb_cModule, "ancestors", rb_mod_ancestors, 0);
/
in class.c */

  • rb_define_method(rb_cModule, "each_ancestor",
    rb_mod_each_ancestor, 0); /* in class.c */

    rb_define_private_method(rb_cModule, "attr", rb_mod_attr, -1);
    rb_define_private_method(rb_cModule, "attr_reader",
    rb_mod_attr_reader, -1);
    diff --git a/test/ruby/test_module.rb b/test/ruby/test_module.rb
    index f905431..4330d69 100644
    --- a/test/ruby/test_module.rb
    +++ b/test/ruby/test_module.rb
    @@ -216,6 +216,15 @@ class TestModule < Test::Unit::TestCase

remove_rake_mixins(remove_json_mixins(remove_pp_mixins(String.ancestors))))
end

  • def test_each_ancestor
  • a = []
  • User.each_ancestor {|m| a << m }
  • assert_equal([User, Mixin], a)
  • a = []
  • Mixin.each_ancestor {|m| a << m }
  • assert_equal([Mixin], a)
  • end
  • CLASS_EVAL = 2
    @@class_eval = 'b'

--
Yusuke ENDOH

=end

Actions #8

Updated by now (Nikolai Weibull) over 14 years ago

=begin
On Fri, Mar 19, 2010 at 16:45, Yusuke ENDOH wrote:

I'd like to make up for my stupid suggestion by giving a patch :-)
I can commit this if matz approves.

If I understood everything so far, we’re talking about 0.05–0.07
seconds per 100 000 calls. In what piece of code is that a
bottleneck?

=end

Actions #9

Updated by now (Nikolai Weibull) over 14 years ago

=begin
On Fri, Mar 19, 2010 at 17:40, Nikolai Weibull wrote:

On Fri, Mar 19, 2010 at 16:45, Yusuke ENDOH wrote:

I'd like to make up for my stupid suggestion by giving a patch :-)
I can commit this if matz approves.

If I understood everything so far, we’re talking about 0.05–0.07
seconds per 100 000 calls.  In what piece of code is that a
bottleneck?

I also wonder how much having that extra method slows down method
lookup overall.

=end

Actions #10

Updated by mame (Yusuke Endoh) over 14 years ago

=begin
2010/3/20 Nikolai Weibull :

On Fri, Mar 19, 2010 at 17:40, Nikolai Weibull wrote:

On Fri, Mar 19, 2010 at 16:45, Yusuke ENDOH wrote:

I'd like to make up for my stupid suggestion by giving a patch :-)
I can commit this if matz approves.

If I understood everything so far, we’re talking about 0.05–0.07
seconds per 100 000 calls.  In what piece of code is that a
bottleneck?

I also want to know.

I also wonder how much having that extra method slows down method
lookup overall.

:-)

--
Yusuke ENDOH

=end

Actions #11

Updated by bahuvrihi (Simon Chiang) over 14 years ago

=begin
Well, what can I say? When you put it that way I'm sure this can all seem quite trivial in the grand scheme. Truth is most bottlenecks, including this one, are not so bad... just wait a second. Ruby will be fine with or without an each_ancestors iterator; it's my opinion it would be better with one.

But this suggestion it would slowdown method lookup overall -- is that an issue? I don't know. It is interesting because if having extra methods significantly slows down the lookup then I wonder how much faster ruby would be without the many generally unused methods there are in the language. I'm not trying to be provocative with this question, it is to me an interesting point.

Thanks for writing up a patch! I think it's an improvement but of course I'm not always right. :)

=end

Actions #12

Updated by znz (Kazuhiro NISHIYAMA) over 14 years ago

  • Category set to core
  • Target version set to 2.0.0

=begin

=end

Updated by nahi (Hiroshi Nakamura) over 12 years ago

  • Description updated (diff)
  • Assignee set to mame (Yusuke Endoh)

Updated by mame (Yusuke Endoh) over 12 years ago

  • Status changed from Feedback to Rejected

Hello,

I'm not keen to import this feature.

I'm closing the ticket. Please reopen it if you find a piece of
code in the wild whose bottleneck is solved by #each_ancestor.

--
Yusuke Endoh

Actions

Also available in: Atom PDF

Like0
Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0