Project

General

Profile

Actions

Bug #18622

open

const_get still looks in Object, while lexical constant lookup no longer does

Added by Eregon (Benoit Daloze) 4 months ago. Updated 2 months ago.

Status:
Open
Priority:
Normal
Assignee:
-
Target version:
-
ruby -v:
ruby 3.0.2p107 (2021-07-07 revision 0db68f0233) [x86_64-linux]
[ruby-core:107832]

Description

There is some inconsistency here between literal constant lookup and the meta API (const_get).

Lexical constant lookup no longer uses a special case for Object, and this is good as it avoids surprises: #11547

However, const_get still looks in Object, even though that's confusing, inconsistent and IMHO shouldn't really happen.

module ConstantSpecsTwo
  Foo = :cs_two_foo
end

module ConstantSpecs
end

p ConstantSpecs.const_get("ConstantSpecsTwo::Foo") # => :cs_two_foo
p ConstantSpecs::ConstantSpecsTwo::Foo # => const_get.rb:9:in `<main>': uninitialized constant ConstantSpecs::ConstantSpecsTwo (NameError)

I think we should change it so both behave the same (i.e., NameError).
It's like if cd /foo/bar would go to /bar if /foo/bar does not exist and /bar does.

const_get is a meta API so it cannot know the surrounding Module.nesting, but so I think it should consider the receiver of const_get as the only nesting (so just ConstantSpecs in this case, not Object).

Note this does not affect nested constants inside the const_get like Foo above (another way to look at the inconsistency that the first component is treated differently).

From https://bugs.ruby-lang.org/issues/11547#note-19


Related issues 1 (0 open1 closed)

Related to Ruby master - Feature #11547: remove top-level constant lookupClosedActions
Actions #1

Updated by Eregon (Benoit Daloze) 4 months ago

Updated by NuriYuri (Youri Nouri) 4 months ago

I always believed it was expected because ConstantSpecs.const_get("ConstantSpecsTwo::Foo") behave like:

module ConstantSpecs
  p ConstantSpecsTwo::Foo
end

If you call ConstantSpecs.const_get("ConstantSpecsTwo::Foo", false) it'll be equivalent to ConstantSpecs::ConstantSpecsTwo::Foo.

ConstantSpecs.const_get("ConstantSpecsTwo::Foo", false)
# => (irb):8:in `const_get': uninitialized constant ConstantSpecs::ConstantSpecsTwo (NameError)

Updated by Eregon (Benoit Daloze) 4 months ago

NuriYuri (Youri Nouri) wrote in #note-2:

I always believed it was expected because ConstantSpecs.const_get("ConstantSpecsTwo::Foo") behave like:

Interesting, I didn't think of it like that, but I can see that point of view.
The thing there is that code has a lookup nesting of [ConstantSpecs, Object] (Module.nesting doesn't show Object but it's there internally).

If you call ConstantSpecs.const_get("ConstantSpecsTwo::Foo", false) it'll be equivalent to ConstantSpecs::ConstantSpecsTwo::Foo.

Not really because lexical lookup does look in the ancestor chains, while that does not (but unused for this example).
e.g. class D; A=:f; end; class C < D; p A; end; p C::A works and prints :f twice.


IMHO the problematic/inconsistent/surprising behavior is this:

Enumerable.const_get(:Process) # => Process, BAD
Enumerable::Process # => uninitialized constant Enumerable::Process (NameError), GOOD
Object.const_get("Enumerable::Process") # => uninitialized constant Enumerable::Process (NameError), GOOD

i.e., it seems very similar to the pitfall explained in #11547.

Updated by fxn (Xavier Noria) 4 months ago

I do not know which is the correct definition of const_get, Ruby core knows :).

However, let me say that I have always thought about const_get as what happens during a relative constant lookup. That is, in my mind, a relative constant lookup is basically: 1) Check the nesting + 2) const_get.

Take for example:

Module.new.const_get(:String) # => String

You get String there, but it is not in the ancestors. Why? Because it is checking Object by hand, since the receiver is a module, like the relative constant lookup algorithm does.

I believe

String.const_get(:Hash) # => Hash

has to be understood that way.

Updated by NuriYuri (Youri Nouri) 4 months ago

If the current definition is correct, could we add as a feature an additional lookup parameter (or method) that says "do not lookup in Object" so we can get a sort of Module.const_get(:Constant) strictly equivalent to Module::Constant. (I believe I also have cases like that and it's not practical to have to do Object.const_get("#{module_to_look_up}::#{const_to_lookup}")...)

Updated by fxn (Xavier Noria) 4 months ago

It would not be enough.

When you do a relative constant lookup in which the cref is a module, Object is checked by hand before calling const_missing. Take for example:

module M
  String
end

Why is String found there? It is not in the nesting. It is not in the ancestors. So? Ruby checks Object by hand and finds it there.

But, Ruby does not check Object by hand in constant paths:

M::String #=> NameError

However,

M.const_get(:String) #=> String

My point is that const_get implements the logic used in relative constant lookups. And in that point of view, checking Object is not an inconsistency or bug, but expected.

Updated by Eregon (Benoit Daloze) 4 months ago

fxn (Xavier Noria) wrote in #note-6:

Why is String found there? It is not in the nesting. It is not in the ancestors. So? Ruby checks Object by hand and finds it there.

Actually I'd argue it's in the nesting (just not shown by Module.nesting), the outermost (implicit) "root" lexical constant scope is always Object (and in fact it's implemented that way in TruffleRuby, maybe in CRuby too).
What's special about this root scope is it's checked after the ancestors of the innermost module (and so for classes there is no need as Object is in their ancestors, but for modules that's not the case).

From the implementation POV, about half of usages look in Object and the other half not, I'm not sure there is a general rule for it, it seems a bit ad-hoc currently.
Maybe the rule is if we're looking after a :: then don't look in Object but otherwise do?
And A.const_get(:B) is not considered like A::B but like module A; B; end?

This issue is mostly about do we still need to look in Object, in which cases, and could there be simpler overall rules for the various constant lookups?

Updated by fxn (Xavier Noria) 4 months ago

Actually I'd argue it's in the nesting (just not shown by Module.nesting), the outermost (implicit) "root" lexical constant scope is always Object (and in fact it's implemented that way in TruffleRuby, maybe in CRuby too).

I believe that is confounding the contract with your implementation.

From the POV of the programmer, Module.nesting tells you the nesting, Object is not in the nesting. Relative constant lookup checks first the nesting, if not found there, then you check the ancestors. If not found there, and the cref is a module, you additionally check Object

This is consistent with what Flanagan & Matz says on page 261:

Ruby first attempts to resolve a constant in the lexical scope of the reference. This means that it first checks the class or module that encloses the constant reference to see if that class or module defines the constant. If not, it checks the next enclosing class or module. This continues until there are no more classes or modules. Note that top-level or "global" constants are not considered part of the lexical scope and are not considered during this part of constant lookup. The class method Module.nesting returns the list of classes and modules that are searched in this step, in the order they are searched.

If no constant definition is found in the lexically closing scope, Ruby next tries to resolve the constant in the inheritance hierarchy by checking the ancestors of the class or module that referred to the constant.

If no constant definition is found in the inheritance hierarchy, then top-level constant definitions are checked.

Now, I don't know how TruffleRuby implements lookup, but I am pretty certain Object does not belong to the nesting, and that the public contract, conceptually, is that one.

And A.const_get(:B) is not considered like A::B but like module A; B; end?

This is what I am trying to say in my previous comments. And that is why looking at Object is not inconsistent. What happens in A::B is not relevant for const_get.

This issue is mostly about do we still need to look in Object, in which cases, and could there be simpler overall rules for the various constant lookups?

You look at Object by hand, because people expect String to be available everywhere (except in BasicObject, which is an edge case). So, whether the ancestors have Object or not is a technicality, from the POV of the user, you want module M; String; end to work. And that means you need to introduce a special rule for modules.

Personally, I think the lookup is pretty intuitive and easy to understand once documented.

Updated by fxn (Xavier Noria) 4 months ago

For completeness, let me add that that the described lookup is incomplete. Like all the lookup descriptions I've seen, it obviates Module#autoload. Also, the nesting is not strictly lexical. You can push an anonymous module to the nesting of a script with

load script, true

also, you can push an arbitrary class or module to the nesting with

A::B::C.eval_family_of_methods(STRING)

In any case, my conclusion is that const_get does what it is supposed to do. It is not inconsistent.

Updated by Eregon (Benoit Daloze) 4 months ago

That makes sense, I think we should improve const_get docs to says it's like module A; B; end and not A::B (which I'd think I'm not the only one to assume).

For example I think it's nice to be able to implement mod.const_lookup("B::C") (i.e., before const_get handled :: paths) as path.split('::').inject(mod) { _1.const_get(_2) } but that's actually not fully equivalent.

Regarding Object in the nesting, I feel it's consistency with the general situation for the top-level:

  • default definee: Object (clearly the case)
  • self: main, an Object (clearly the case)
  • cref: Object (clearly the case, but that's not same as being in the nesting unfortunately)

So the top-level is very similar to:

class Object
  private
  self = main # not instance_eval, that would change the default definee
  <source code>
end

But indeed with more subtleties for constant lookup so Object is looked last, after the module/class's ancestors.

I wish we could model constant lookup with just a nesting and have the invariant cref == nesting.first, but that doesn't hold for the top-level (AFAIK only for that case), unless the root constant scope is treated specially (as it is in TruffleRuby, and as we see the top-level constant scope is also special semantically anyway).

Updated by fxn (Xavier Noria) 4 months ago

That makes sense, I think we should improve const_get docs to says it's like module A; B; end and not A::B (which I'd think I'm not the only one to assume).

Agree :). The documentation of the constants API has a lot of room for improvement. Also, lookup algorithms, you basically need to reverse engineer them (or read source code). I remember some 10 years ago in Amsterdam, talking about this, that @headius (Charles Nutter) told me: "You are going through what I had to go through myself to implement JRuby".

I documented quite a bit in the old Rails guide (https://guides.rubyonrails.org/v5.2/autoloading_and_reloading_constants.html#constants-refresher), because in order to understand the caveats of the old system, users had to understand how things work. Still, not a spec, the full precise details would not fulfill that didactic goal.

However, let me add a nuance. The current documentation says:

Checks for a constant with the given name in mod. If inherit is set, the lookup will also search the ancestors (and Object if mod is a Module).

Therefore, it is implicitly saying it is not mimicking A::B, because A::B does not check Object and does not check modules by hand.

But I agree with you, this could be more clear.

Updated by matz (Yukihiro Matsumoto) 2 months ago

For the sake of consistency, it is nice to start searching from the receiver class/module. I really like to fix the behavior, if there's no compatibility issue.
But in reality, changing the behavior possibly affect numerous usage of const_get (4000+ appearance by gem search).
I am not sure this consistency really worth the possible compatibility issues.

Matz.

Updated by austin (Austin Ziegler) 2 months ago

matz (Yukihiro Matsumoto) wrote in #note-12:

For the sake of consistency, it is nice to start searching from the receiver class/module. I really like to fix the behavior, if there's no compatibility issue.
But in reality, changing the behavior possibly affect numerous usage of const_get (4000+ appearance by gem search).
I am not sure this consistency really worth the possible compatibility issues.

I think it would be worth an experimental flag, TBH. I have a few places where I am calling const_get in some gems and my intent is just to search in the space where I am (if bare) or in the receiver class/module (if the receiver is explicit).

Updated by fxn (Xavier Noria) 2 months ago

I you want to check only in the receiver, that is mod.const_get(cname, false) already.

If you want to check from the receiver up the ancestors chain, that is mod.const_get(cname) already.

If the receiver is a module, and the constant is not found in the ancestors the method checks Object manually as it happens in relative constant lookups. This is documented.

So, a relative constant lookup could be seen as checking the nesting + const_get + const_missing. This is how I see const_get and in that sense I am personally fine with the current behavior.

@Eregon (Benoit Daloze) opened this issue because for him, the manual check in Object is surprising (conceptually, regardless of documentation).

Updated by mame (Yusuke Endoh) 2 months ago

FYI: Here is an example that will be affected by the proposed change

https://github.com/googleapis/google-cloud-ruby/blob/367ca0d9c3e7a9a972c367670e56439ff49b8d18/google-cloud-retail-v2/lib/google/cloud/retail/v2/search_service/client.rb#L64

                namespace = ["Google", "Cloud", "Retail", "V2"]
                parent_config = while namespace.any?
                                  parent_name = namespace.join "::"
                                  parent_const = const_get parent_name
                                  break parent_const.configure if parent_const.respond_to? :configure
                                  namespace.pop
                                end

Updated by Eregon (Benoit Daloze) 2 months ago

The fix of that would be to use Object.const_get parent_name instead of const_get parent_name, or to use const_get "::#{parent_name}.
The current code seems rather fragile and would break if there is any nested Google module for instance (with so many nested namespaces it does not sound too unlikely).

I'm unsure how much code this change would break.
It might also help reveal and avoid bugs with similar cases as shown in #11547.
One way to get an idea would be to make a PR, test a few popular gems locally to see if they work or not.
If that does not reveal too many issues, I think the best would be to emit a deprecation warning for one or a few versions when going through the special Object lookup for a module, and then in a later version actually remove that extra lookup.

Updated by fxn (Xavier Noria) 2 months ago

@Eregon (Benoit Daloze) in your proposal

Module.new.const_get(:String)

raises.

Would

Class.new.const_get(:String)

raise as well like mod::String does for mod != Object?

Updated by Eregon (Benoit Daloze) 2 months ago

@fxn (Xavier Noria) A good point, it applies both to module and classes, it seems mod::Name always ignores constants of Object, whether mod is module or class.

I'd like A.const_get("B::C") to behave exactly the same as A::B::C.
That also has the nice property that A.const_get("B::C") = A::B.const_get("C") = A.const_get("B").const_get("C").

If we compare to literal :::

Class.new::String # => uninitialized constant #<Class:0x0000000001b98330>::String (NameError)
Class.new.const_get :String # => String currently, but I'd like NameError

Module.new::String # => uninitialized constant #<Module:0x00000000019456f0>::String (NameError)
Module.new.const_get :String # => String currently, but I'd like NameError

So :: doesn't look into Object (String is a constant of Object) (since #11547 I think).
In other words, with :: classes do look into ancestors except for Object, and modules only look in their own ancestors (which never include Object).
That only applies to Object itself, Kernel constants for instance are considered simply whether they are in the ancestors or not.

module Kernel; K = :kernel; end

Class.new::K # => :kernel
Class.new.const_get :K # => :kernel

Module.new::K # => uninitialized constant #<Module:0x0000000001e7ba20>::K (NameError)
Module.new.const_get :K # => :kernel currently, but I'd like NameError

Module.new.include(Kernel)::K # => :kernel
Module.new.include(Kernel).const_get :K # => :kernel

I think I also understand better your relative and absolute lookups terms now.
In A::B::C, A is a relative lookup/scope lookup/nesting lookup.
And B and C are both using "absolute lookup"/"only ancestors and always ignoring constants of Object to avoid going back to the root scope inadvertently".

Updated by fxn (Xavier Noria) 2 months ago

Exactly :).

Backwards compatibility, adding a flag, a new method, etc., is something for core to consider. But in any case, I like your proposal.

Actions

Also available in: Atom PDF