Project

General

Profile

Actions

Bug #19681

closed

The final classpath of partially named modules is sometimes inconsistent once permanently named

Added by byroot (Jean Boussier) 12 months ago. Updated 11 months ago.

Status:
Closed
Assignee:
-
Target version:
-
[ruby-core:113529]

Description

Reported to me by @fxn (Xavier Noria)

m = Module.new

class m::C; end
p m::C.name # => "#<Module:0x000000010789fbe0>::C"

m::D = m::C
p m::D.name # => "#<Module:0x000000010789fbe0>::C"

M = m
p M::C.name # => "M::D"

Expected behavior:

p M::C.name # => "M::C"

Reason

When the parent is assigned its permanent classpath, we iterate over its const_table to recursively give a permanent name to all the constant it owns.

However, const_table is an id_table so it doesn't retain the insertion order, which means that if the constant was aliased,
we can no longer distinguish between the original name and its aliases, and whichever comes first in the const_table will be used as the permanent name.

Potential solution

I have a tentative fix for it in https://github.com/ruby/ruby/pull/7829. Instead of relying on the const_table key, it extract the original name from the temporary classpath.
It does feel a bit wrong to do a string search in such a place, but it does work.


Related issues 1 (0 open1 closed)

Related to Ruby master - Feature #15765: [PATCH] Module#name without global constant searchClosedActions

Updated by ioquatix (Samuel Williams) 12 months ago

Would it make sense to store insertion order or otherwise store the order (or what is an alias) so that the constant names could be resolved correctly?

I could not reproduce the bug with the given repro, I assume it only occurs sometimes.

Actions #2

Updated by Eregon (Benoit Daloze) 12 months ago

  • Description updated (diff)

Updated by Eregon (Benoit Daloze) 12 months ago

TruffleRuby remembers the given basename (C in this case) as a separate field, so that's similar to your approach.

But I think we need to validate the module is still reachable through that basename, otherwise we would name it incorrectly:

m = Module.new

class m::C; end
p m::C.name # => "#<Module:0x000000010789fbe0>::C"

m::D = m::C
p m::D.name # => "#<Module:0x000000010789fbe0>::C"

m.send :remove_const, :C

M = m
p M::D.name # => "M::D", correct

Updated by byroot (Jean Boussier) 12 months ago

@Eregon (Benoit Daloze) I don't think that's correct, as the first assigned name persists:

m = Module.new

class m::C; end
p m::C.name # => "#<Module:0x000000010789fbe0>::C"

m::D = m::C
p m::D.name # => "#<Module:0x000000010789fbe0>::C"

m.send :remove_const, :C

p m::D.name # => "#<Module:0x000000010789fbe0>::C"

So logically that should be the final name as well.

Would it make sense to store insertion order

It would mean using an st_table which uses more memory. For such an edge case I doubt it's justified.

I could not reproduce the bug with the given repro, I assume it only occurs sometimes.

Really? It depends on how C and D hash, so yeah it's somewhat random, but I tried on several ruby version and the specific script I gave always has the same result. Which Ruby did you use?

Updated by ioquatix (Samuel Williams) 12 months ago

I was trying it on ruby head. I don't think the fact I couldn't reproduce it really matters - I just thought I'd give you feedback since I tried it. It's an interesting bug.

Updated by fxn (Xavier Noria) 12 months ago

Philosopher hat on :): I am surprised that this feature (permanent names) exists at all, really. The Ruby model does not have a tree of objects, it has a graph of modules and constants and that is all.

Class and module objects are objects, they do not have the concept of belonging to anything. A constant belongs to one and only one module. Objects do not have this notion. Class and module objects are no different than Object.new. The first time a module object is assigned to a constant, it gets a name, it is a difference, yes. But from then on, they part. That name means nothing. Objects can be stored in multiple places, the original constants can be deleted, ..., you name it. In particular, you cannot even assume the object ir reachable by const getting the name.

So, all attempts at trying to rely on a tree of objects are doomed to led to edge cases and inconsistencies. You are treating the objects in a way that does not exist in the Ruby language.

Now, given permanent names are a thing, I believe this patch is more aligned with what you'd expect, it is more in line with what happens with regular classes and modules.

Updated by fxn (Xavier Noria) 12 months ago

Let me elaborate the alignment I see.

In Ruby, these lines are different:

C = Class.new
d = C
D = C

The first line creates a class object and assigns it to a constant. At that point, it gets its name, because it is the first constant assignment. This is what happens with

class m::C; end
p m::C.name # => "#<Module:0x000000010789fbe0>::C"

and we expect that "C" segment to persist. Now, I would freeze that and would become the name of the class forever, but Ruby has this concept of permanent names. OK.

What about lines (2) and (3) above? They are totally ordinary assignments with no side-effects associated in your mind. Assigning to a variable or a constant, is the same. It is no different than storing a string object in d or D.

So, it is in this sense that I believe this patch is good, because that expectation holds the same way for the snippet @byroot (Jean Boussier) shared in the description.

Updated by Eregon (Benoit Daloze) 12 months ago

byroot (Jean Boussier) wrote in #note-4:

@Eregon (Benoit Daloze) I don't think that's correct, as the first assigned name persists:

But only because you never give it a permanent name to m, so of course it just uses the anonymous/partial name it knows so far.

On all releases of CRuby I could try:

m = Module.new

class m::C; end
p m::C.name # => "#<Module:0x000000010789fbe0>::C"

m::D = m::C
p m::D.name # => "#<Module:0x000000010789fbe0>::C"

m.send :remove_const, :C

p m::D.name # => "#<Module:0x000000010789fbe0>::C"
M = m
p M::D.name # => "M::D"

So it would be an incompatibility to change that.
Maybe it doesn't matter enough though, but this can't be considered a simple bug fix given it's incompatible for that case (last line would be "M::C" instead of "M::D" with your PR IIUC).

The final/permanent name of a Module should reflect how to access it, and this should hold for e.g. A::B as long as no one does A.remove_const :B or A::B = ... (or const_set obviously).
Which is AFAIK the only 2 ways to break that (the first is private to discourage using it, the second warns, so it's clear these two should not be used lightly).

Updated by Eregon (Benoit Daloze) 12 months ago

In case it's not clear, the problem with the current PR is it would return M::C for that last line, but that's a lie, because M::C would be NameError: uninitialized constant M::C. While M::D is the correct way to refer to that class.
So I think we need to address that, to only use the basename if that is still a valid way to refer to it from the lexical parent module (M).

We could also just leave things as they are, as anyway this issue seems unsolvable if the basename has been removed, then it could be any of the constants in the lexical parent module that refer to it.

I'm also against ordering, that's too much overhead and complications for thread-safe handling of constant lookup caches (TruffleRuby uses a ConcurrentHashMap and things like putIfAbsent for module constants).

Updated by fxn (Xavier Noria) 12 months ago

@Eregon (Benoit Daloze), as I said, there is no requirement that names are reachable constant paths. So the fact that they might not be is irrelevant. They were not constant paths with a #<Module:0x000000010789fbe0> prefix for starters.

And at the same time, ordinary assigments are not expected to have side-effects, that is the real violation.

Updated by fxn (Xavier Noria) 12 months ago

@Eregon (Benoit Daloze) and BTW, your implementation have things ready and squared with the Ruby model. First assignment sets that basename (I understand). So, when the constants are recursively iterated, you use the basename of the object stored in the constant, not the constant.

Updated by Eregon (Benoit Daloze) 12 months ago

fxn (Xavier Noria) wrote in #note-10:

@Eregon (Benoit Daloze), as I said, there is no requirement that names are reachable constant paths.

For Module#name in general, no, but once there is a permanent/non-anonymous (= starts with #<) name, then that is a requirement and a guarantee, until possibly broken by remove_const/const_set (very rare and would be a bug of whoever does that).

Everyone when they see undefined method 'bar' for #<M::C:0x00007fe82b0d1450> (NoMethodError) expect they can refer to that class via M::C (e.g., in a debugger/in irb/in code).
I would go as far as to say every non-trivial Ruby code out there relies on permanent module names to be a way to refer to that module.
Otherwise you couldn't even rely on Hash to refer to the Hash class.

Updated by Eregon (Benoit Daloze) 12 months ago

fxn (Xavier Noria) wrote in #note-11:

@Eregon (Benoit Daloze) and BTW, your implementation have things ready and squared with the Ruby model. First assignment sets that basename (I understand). So, when the constants are recursively iterated, you use the basename of the object stored in the constant, not the constant.

No, by basename I mean "C", and that comes from class m::C; end.
And as you can see here, TruffleRuby just like CRuby uses the name to refer to the constant to name it. So it has the same behavior as current CRuby.
We could use givenBaseName to do something different, but it's a good question whether that should change and how.

Updated by fxn (Xavier Noria) 12 months ago

Oh yes, what I wanted to mean is that TruffleRuby has things modeled so that what I believe is the correct behaviour is easy to implement. You grab the object, if its name is not permanent ask for its basename and make it permanent, the constant is irrelevant.

Regarding "checking if the constant path exists", that is something Ruby does not check:

M = Module.new
X = M

Object.send(:remove_const :M)

X::C = Class.new
X::C.name # => "M::C"

That is fine, X::C.name is "M::C", Ruby does not verify if the constant path resolves. So, no need for that to be a criteria here either.

What happens in Ruby is that if you got a "M" in your name, it is going to be an "M" forever. That is the invariant. And that is why I believe this patch is good.

Updated by Eregon (Benoit Daloze) 12 months ago

Eregon (Benoit Daloze) wrote in #note-8:

The final/permanent name of a Module should reflect how to access it, and this should hold for e.g. A::B as long as no one does A.remove_const :B or A::B = ... (or const_set obviously).
Which is AFAIK the only 2 ways to break that (the first is private to discourage using it, the second warns, so it's clear these two should not be used lightly).

@fxn (Xavier Noria) gave me a counter-example:

M = Module.new
N = M
Object.send :remove_const, :M
p N.name # => "M"
N::C = Class.new
p N::C.name # => "M::C"

So N.name is broken, and that matches what I said.
But N::C.name is broken too, it "inherits" the broken name from its lexical parent.
So the actual semantics are a bit more complex and harder to express.

It could be nice to change the name of a Module once it's no longer a valid way to access it (so do that on remove_const/const_set). Like N.name => <invalid, was: M> or so.
But that's a separate issue, although it would help to clarify such edge cases and avoid confusion.


For this issue, I think we have 3 alternatives:

  1. Intended behavior, not a bug. Pro: more correct for https://bugs.ruby-lang.org/issues/19681#note-8, has been like that since a long time, the example in the description is very edge case. Cons: non-deterministic for such code.
  2. Use the basename if, at this point in time, it is a valid way (i.e., does mod.const_get(basename) == module_being_named) to refer to that module/class being named.
  3. Use the basename, even if the module being named is no longer reachable that way. Pros: simple. Cons: incompatible for https://bugs.ruby-lang.org/issues/19681#note-8, it gives an invalid name when there is a valid one.

Updated by fxn (Xavier Noria) 12 months ago

No, N.name is not broken! It is just the way it works in Ruby.

You cannot claim something is broken because it does not satisfy a property that nobody is claiming it is satisfied!

Updated by Eregon (Benoit Daloze) 12 months ago

By "broken" I just mean it no longer reflects a valid way to access that module, that's all, no more than that.

Updated by fxn (Xavier Noria) 12 months ago

So, I'd like to summarize the points in this discussion.

On one hand, we have a segment changed here:

m = Module.new

class m::C; end
p m::C.name # => "#<Module:0x000000010789fbe0>::C"

m::D = m::C
p m::D.name # => "#<Module:0x000000010789fbe0>::C"

M = m
p M::C.name # => "M::D"

I believe we all agree that "M::D" is unexpected.

I suspect this is not intentional, and it is indeterministic because it is the side-effect of iterating over one particular unordered table. Perhaps that is why @ioquatix (Samuel Williams) cannot reproduce. And it could be the case on paper that if we add more constants to that table, we get "M::C".

On the other hand, we have a 2nd derivative introduced by @Eregon (Benoit Daloze): What if when you visit the D constant, the C constant does not exist?

My opinion is that since for Ruby that is not a condition (as shown in a previous comment), it should not be checked here either for coherence. Ruby generally does cref.name + '::' + cname, and whether cref.name can be resolved or not is irrelevant.

Updated by fxn (Xavier Noria) 12 months ago

I have run another experiment that may weaken my point about the trailing segment being invariant:

m = Module.new
m::X = Class.new
puts m::X.name # => #<Module:0x000000010bf14a80>::X šŸ‘

n = Module.new
n::Y = m::X
puts n::Y.name # => #<Module:0x000000010bf14a80>::X šŸ‘

N = n
puts m::X.name # => "N::Y" 

There, the fact that the name of the class object does not even start with what would correspond to N is ignored. That temporary name was not even "yours", why did you edit it?

If the semantics of this are "the contents of a temporary name are ephemaral and, if made permanent, they can become anything, anytime", then what we see in this ticket is OK.

Updated by Eregon (Benoit Daloze) 12 months ago

That's a really interesting example.
There if the result would be N::X (as I believe https://github.com/ruby/ruby/pull/7829 would do), that would be very surprising, nobody ever assigned this class to N::X.
I think that shows the value of using the name in the constant table.
And using the name in the constant table implies non-determinism, because the first name is used, and it's just skipped for other names.

We can special-case if the module basename is still a way to refer to that module from the lexical parent, and that would solve the description's example (alternative 2. above).
I think that's good and limits the non-determinism to cases using remove_const or naming under a different module, and having 2+ constants referring to that module-being-named, which is pretty rare.

Updated by byroot (Jean Boussier) 12 months ago

So to clarify my thinking, my own mental model, or interpretation of Ruby's intent, is that whichever constant you assign a Module/Class to become it's name, and that's final and permanent.

C = Class.new
D = C
Object.remove_const(:C)
D.name # => "C"

That is why, by this logic, when you do:

m = Module.new
m::C = Class.new

That class's name become C, and the fact that it was assigned to an anonymous module have no impact. I think it's even more obvious with the syntax used by the original repro: class m::C; end.

Note that here I mean name (C), not classpath (A::B::C).

A module may have a permanent name before it has a permanent classpath.

That's why I implemented the patch the way I did.

I totally understand other may have different mental model though. And looking at how it was implemented historically, it's not entirely clear this behavior was fully intended, given that if I read the code right the classpath field in rb_classext_t used to be a lazily set cache, and it's not 100% certain not clearing it on remove_const was a feature or an oversight.

What is certain to me though, is that the current behavior rely on the name hash, hence is "semi-random", and is really undesirable.

I think at this stage we'll have to put this at the dev meeting agenda to get Matz opinion.

Updated by byroot (Jean Boussier) 12 months ago

it's not 100% certain not clearing it on remove_const was a feature or an oversight.

Interesting datapoint on this part:

A = Class.new
B = A
Object.send(:remove_const, :A)
p B.name

From 1.8 to 2.3 -> "B"
Since 2.4 -> "A"

So there was indeed a behavior change at some point, again not clear if intended or an oversight.

Additionally:

A = Class.new
A.name
B = A
Object.send(:remove_const, :A)
p B.name

Always returned "A". So somehow, even on 1.8, once the name had been accessed once, it was permanent.

Updated by fxn (Xavier Noria) 12 months ago

@byroot (Jean Boussier) agree we are reaching a point in which we need some authoritative answer saying: This is how it is supposed to work.

I was surprised by the edition of names you do not "own", I had never tried this. I'd expect those to be skipped. However, I realized this has a consequence for the proposed patch we need to be aware of. Let's consider

m = Module.new
m::C = Class.new

n = Module.new
n::C = Class.new
n::D = m::C

N = n

With this patch, N::C and N::D are different class objects, but their name is the same: "N::C".

For some points of view, this could be counterintuitive. For others, this might be a fine edge case, since class/mod names and constant paths are totally decoupled in the Ruby design anyway.

This decoupling is real. For example, if you have a reloadable module M and you include it in ActiveRecord::Base (this is wrong, but technically possible), on reload, the ancestor chain of ActiveRecord::Base has a module object that is different from the one stored in M, but has the same name: "M".

So, in Ruby class/mod names are not guaranteed or assumed to be unique. In that sense, it might be seen as fine.

We need an authoritative guide.

Updated by fxn (Xavier Noria) 12 months ago

One imaginary rule (in the sense that I do not know if it is real) that could be at play here could be: If you are a class/mod object reachable through a constant path, necessarily you have a permanent name. Which one? Whatever comes first wins and overrides anything. You could be called #<Module:0x000000010bf14a80>::X initially, and end up being A::B::C::D.

Such rule could explain the edition of temp names you do not "own".

Updated by fxn (Xavier Noria) 12 months ago

LOL, it turns out reproducing the non-determinism is trivial. So, on one hand we have the original example:

m = Module.new
m::C = Class.new
m::D = m::C
M = m
M::C.name # => "M::D", unexpected

In my machine, just insert a constant in between, and you get "M::C":

m = Module.new
m::C = Class.new
m::X = Class.new # <-- HERE
m::D = m::C
M = m
M::C.name # => "M::C", OK

This Ruby is ruby 3.2.1 (2023-02-08 revision 31819e82c8) +YJIT [x86_64-darwin22], but we know it does not matter. We know there is an unordered table being iterated, so we know the indeterminism lies in the procedure itself. This just demonstrates it working.

Updated by fxn (Xavier Noria) 12 months ago

OK, I think the discussion to have in the meeting is reduced to one simple question only. Let me explain.

The behaviour I have learnt regarding temp names "owned" by other anonymous modules has test coverage here, therefore this is deliberate.

That means a temp name is considered to be truly ephemeral and can eventually become anything (permanent). Therefore, my expectation that the trailing ::C should be invariant is incorrect. That test contradicts this.

So, given:

m = Module.new
m::C = Class.new
m::D = m::C
M = m
M::C.name # => "M::D"

the options are:

  1. That should be "M::C", make it happen, and add a test for it.
  2. Declare that to be undefined behaviour to keep things simple (if this concept exists in Ruby, not sure). You don't know which name you are going to get.

To me, both are valid resolutions. I personally only want to know which is your choice.

Updated by byroot (Jean Boussier) 12 months ago

has test coverage here, therefore this is deliberate.

Note that this isn't necessarily true. The spec was added by Alan as part of an optimization, but that doesn't mean it was necessarily intended behavior, as in it was trying to preserve the existing behavior, which itself wasn't necessarily desirable in the first place. It's more calcification of existing behavior rather than deliberate design.

Anyway, I'll try to sum up the issue and bring it to the next meeting.

Actions #28

Updated by byroot (Jean Boussier) 12 months ago

  • Related to Feature #15765: [PATCH] Module#name without global constant search added

Updated by matz (Yukihiro Matsumoto) 11 months ago

  • Status changed from Open to Closed

I understand the feeling the name once was "C", could be renamed to "D" later. But in reality, we don't think it's worth maintaining "C" with adding more complexity.
So I choose option 2 in #note-26. If you see the real-world benefit that worth complexity, reopen the issue, please.

Matz.

Updated by Eregon (Benoit Daloze) 11 months ago

Just to clarify, this is not undefined behavior (which has very scary semantics in the C language), it is: either "M::D" or "M::C" is an acceptable outcome for the program in the description.
So it is defined but non-deterministic behavior among a set of possible outcomes, due to an underlying hash which is unordered (and cannot easily be ordered for scalability and performance and footprint).

Updated by fxn (Xavier Noria) 11 months ago

@Eregon (Benoit Daloze) right, I was thinking in terms of "it is M::C or M::D, but which one of the two is undefined".

I was wondering if it would make sense to add a spec that makes this decision explicit, and tests that M::C.name is one of the two possible strings, for example. Could volunteer it.

Updated by Eregon (Benoit Daloze) 11 months ago

Yes, absolutely, a spec example allowing both would be welcome in ruby/spec.

Updated by fxn (Xavier Noria) 11 months ago

New spec and a couple others for the same price in https://github.com/ruby/spec/pull/1044 šŸ‘.

Updated by matz (Yukihiro Matsumoto) 11 months ago

Yes, in the C spec terms, ā€œundefined behaviorā€ really is an ā€œundefined behaviorā€, even a demon can be appeared. We can call it ā€œimplementation definedā€.

Matz.

Updated by fxn (Xavier Noria) 11 months ago

@matz (Yukihiro Matsumoto) the related spec proposed in https://github.com/ruby/spec/pull/1044 says in a commment

You get one of the two, but you don't know which one.

It is written this way because that is all CRuby can say, since we have seen that the name of the module is not predicatable by the user:

m = Module.new
m::C = Class.new
m::D = m::C
M = m
M::C.name # => "M::D"

vs

m = Module.new
m::C = Class.new
m::X = Class.new # <-- This line inserted.
m::D = m::C
M = m
M::C.name # => "M::C"

Do you like that vague wording in the spec? Or would you prefer to use some more formal language?

Actions

Also available in: Atom PDF

Like0
Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like1Like0Like1Like0Like0Like0Like0Like0Like0