Feature #19633
Allow passing block to `Kernel#autoload` as alternative to second `filename` argument
Status: open
Description
`Kernel#autoload` takes two arguments, a symbol `module` representing the constant to be autoloaded, and a `filepath` to load the constant from (with `require`).
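For readers less familiar with the current two-argument form, here is a minimal sketch of how it behaves today. The constant name, the file name, and the `$autoload_demo_loaded` flag are made up for this illustration; `Object.autoload` is used so that the receiver is explicit (at the top level, a bare `autoload(...)` call registers on `Object` as well).

```ruby
require "tmpdir"

# Hypothetical constant and file names, used only for this illustration.
dir  = Dir.mktmpdir
path = File.join(dir, "autoload_demo_b.rb")
File.write(path, <<~RUBY)
  $autoload_demo_loaded = true
  module AutoloadDemoB
  end
RUBY

$autoload_demo_loaded = false

# Register the autoload. The file is not evaluated at this point.
Object.autoload(:AutoloadDemoB, path)
loaded_before = $autoload_demo_loaded

# The first reference to the constant triggers `require path`.
AutoloadDemoB
loaded_after = $autoload_demo_loaded
```

The file is only evaluated on first reference, which is the hook point the proposal would like to express as a block instead of a path.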
Currently, Zeitwerk has to monkeypatch `Kernel#require` to fetch the loader for the file being loaded, then run the original (aliased) `require`, then run autoload callbacks. In addition to the monkeypatch, this also requires a registry (`Zeitwerk::Registry`) to map file paths to loaders, to know which loader should be used for a given autoload-triggered `require`. In fact, Zeitwerk has to assume that the monkey-patched `require` call came from an autoload trigger; there is no way to really be sure of the source.
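To make the mechanism concrete, here is a heavily simplified sketch of the pattern described above. It is not Zeitwerk's actual code: `SketchRegistry`, `sketch_original_require`, and the file and constant names are all hypothetical, and the real implementation handles more cases (directories, files required by non-absolute paths, and so on).

```ruby
require "tmpdir"

# Hypothetical, simplified stand-in for a loader registry:
# maps absolute file paths to a callback owned by some "loader".
module SketchRegistry
  CALLBACKS = {}

  def self.register(path, &callback)
    CALLBACKS[path] = callback
  end

  def self.callback_for(path)
    CALLBACKS[path]
  end
end

module Kernel
  # Keep the original implementation reachable under another name.
  alias_method :sketch_original_require, :require

  def require(path)
    required = sketch_original_require(path)
    # Run the callback only if this call actually loaded a registered file.
    if required && (callback = SketchRegistry.callback_for(path))
      callback.call(path)
    end
    required
  end
  private :require
end

dir  = Dir.mktmpdir
path = File.join(dir, "sketch_b.rb")
File.write(path, "module SketchB\nend\n")

events = []
SketchRegistry.register(path) { |file| events << file }

Object.autoload(:SketchB, path)
SketchB # autoload -> patched require -> registry lookup -> callback
```

Note that the patched `require` only sees a path. It finds the right callback through the registry and cannot tell whether the call came from an autoload or from an explicit `require`.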
If Ruby allowed passing a block as an alternative to the explicit filepath, then I think this could be improved and would eliminate the need for a monkeypatch. So something like this:
autoload(:B) do
  require "lib/b"
  # trigger callback, etc
end
I am implementing a gem called Im which is a fork of Zeitwerk, and in the case of this gem, such a feature would be even more useful. Im implements autoloads on anonymous modules by registering an autoload and then "catching" the `require` and converting it into a `load`, passing the module as the second argument (see here). This is currently quite tricky because, again, it's hard to know where a `require` came from.
In addition to removing the monkeypatch (inherited from Zeitwerk), Im would further benefit from the block argument because then it could simply access the module via a closure, rather than pulling it from a registry:
mod.autoload(:Foo) do
  load "lib/foo.rb", mod
end
I don't know how hard or easy this would be to implement, but if there is interest in this as a feature I'd be happy to look into implementing it.
Updated by shioyama (Chris Salzberg) over 1 year ago
I know you're not in favour of what I'm doing with autoloads on anonymous modules, but hoping you'd agree this would be a useful thing to have in general and for Zeitwerk specifically. I haven't actually walked through the full path of whether this would allow removing `Zeitwerk::Registry`, curious to know what you think. But removing the monkeypatch alone seems like a big win given how many gems already monkeypatch `require`.
Updated by shioyama (Chris Salzberg) over 1 year ago
@byroot (Jean Boussier) pointed out today that if it were possible to register both a require path and a block for an autoloaded constant, then this feature might also allow replacing the TracePoint class event in Zeitwerk for explicit namespaces, because you could instead register the tracepoint callback code in an autoload hook for the explicit constant.
Updated by fxn (Xavier Noria) over 1 year ago
Hey, sorry for not replying earlier, for some reason messages from Redmine often end up in my spam folder.
I know you're not in favour of what I'm doing with autoloads on anonymous modules
That is not true, why do you think that? We have barely talked about your project, and the few interactions that we had in Twitter were positive and encouraging from my side, no?
The reason Zeitwerk does not support anonymous modules today is practical; it is not something I am against, or refuse to ever support.
First, Zeitwerk does not have isolation as a goal. Zeitwerk is designed for regular applications and gems that want their code to be accessible as in regular Ruby. In particular, your files are normal and have to define things normally, don't get the nesting modified.
So, how can a regular project have an anonymous root namespace? It cannot be in a constant, is it going to be in a global variable? Who does that? I believed it was an edge case that did not deserve the time. Why the validation? Because knowing that things have an ordinary constant path simplifies some internal assumptions, so I validate at the boundary to be confident downwards that is the case.
In fact, Zeitwerk has to assume that the monkey-patched require call came from an autoload trigger;
This is not accurate. Zeitwerk does not assume that, really. It is weaker. What Zeitwerk knows is that if the file is registered, the corresponding loader is in charge of that file. That means it has to trace its loading, has to run callbacks, maintain internal state, etc. That is a fact. No matter who ran the `require`. Who ran the `require` is irrelevant.
Even more, Zeitwerk supports `require` calls made by 3rd parties! These lines. If your project has some rogue `require`s, or you migrate an existing gem to Zeitwerk, and existing client code is issuing `require`s, things will work. It is not the case that you are forced to autoload and if a rogue `require` happens somewhere the project is broken. No, it is more robust than that on purpose.
And that is one of the reasons I believe Zeitwerk would not use the proposed block. Because I want that robustness. Also, `require` has some concurrency control built in, which eases thinking about concurrency.
That is not to say I am against the block, eh? Only saying I don't quite see it used by Zeitwerk at the moment.
I would prefer to not decorate `Kernel`, indeed, and I believe what would really work for Zeitwerk would be a callback associated to `Kernel#require`: If a file is `require`d, invoke the callback with the `require` argument, and the absolute path to the file.
A different topic in this proposal is backwards compatibility. Today, `Module#autoload?` returns the file name if set, a string. With this addition, the natural return value for a block argument would be the block, but a block is not a string. I use that to know if a constant associated to a file being visited has an autoload set (maybe by another project, it is an edge case of shadowed file).
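The compatibility concern can be illustrated with a small sketch of today's behavior. The names below are hypothetical, and the `shadowed` check is only in the spirit of the shadowed-file case mentioned above, not Zeitwerk's actual logic.

```ruby
require "tmpdir"

dir  = Dir.mktmpdir
path = File.join(dir, "shadow_demo.rb")
File.write(path, "module ShadowDemo\nend\n")

Object.autoload(:ShadowDemo, path)

# While the autoload is pending, `autoload?` returns the registered file name.
pending = Object.autoload?(:ShadowDemo)

# Hypothetical check: is there already an autoload for this constant
# that points at a different file than the one being visited?
other_file = File.join(dir, "elsewhere", "shadow_demo.rb")
shadowed   = pending.is_a?(String) && pending != other_file

ShadowDemo # trigger the autoload
resolved = Object.autoload?(:ShadowDemo) # nil once the constant is loaded
```

Any code that compares the return value against a path, as `shadowed` does here, would need to change if `autoload?` could also return a block.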
Updated by fxn (Xavier Noria) over 1 year ago
Oh, regarding TracePoint.
First of all, let me stress that Ruby core confirmed the specific usage of TracePoint made by Zeitwerk is OK. If there was an alternative technique I'd consider it, but I don't think there is anything to fix here.
Now, in explicit namespaces, the problem is that we need to be called right when the class is created:
class M
  # At this point, we need to be called.
  include X # Must work if this is M::X
end
I don't think a block associated to an autoload could help there. You need to run code in the middle of the file execution. See?
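A minimal sketch of the timing issue, using the core `TracePoint` API with the `:class` event. This is not Zeitwerk's implementation: the class name `TpDemoM` and the temporary file are assumptions made for the example.

```ruby
require "tmpdir"

# Hypothetical names; the file plays the role of m/x.rb in an autoloaded tree.
dir    = Dir.mktmpdir
x_path = File.join(dir, "x.rb")
File.write(x_path, <<~RUBY)
  module TpDemoM::X
    def hello
      :hi
    end
  end
RUBY

# The :class event fires when a class or module body starts executing,
# so the callback runs before `include X` is evaluated.
tracer = TracePoint.new(:class) do |tp|
  tp.self.autoload(:X, x_path) if tp.self.name == "TpDemoM"
end

tracer.enable
class TpDemoM
  include X # resolved as TpDemoM::X through the autoload set in the callback
end
tracer.disable
```

The autoload for `X` has to be registered after `TpDemoM` exists but before its body reaches `include X`. That is why a hook that runs only before or after the whole file would not be enough here.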
Updated by shioyama (Chris Salzberg) over 1 year ago
Thanks @fxn (Xavier Noria) for your reply! I was going to update this, but I brought it up at the recent Dev Meeting before RubyKaigi and although there was interest, Matz seemed not to be in favour of it. I assume this will be closed.
That aside, I'll hijack this thread to respond to some of your other comments which are very important to me.
First of all, text can be an ambiguous medium so for the record, I would drop work on Im right now if Zeitwerk were to support the same functionality (autoloading to anonymous namespaces). I'm not dying to maintain a fork!
But at the same time, discussions about namespaces (e.g. #10320 and #19024) were getting very hypothetical, and I felt that rather than waiting for some consensus to emerge, it made sense to just forge forward to see what this world would actually look like. That's my immediate intent with Im. Longer-term, I am not sure where I am going with the project yet.
That is not true, why do you think that? We have barely talked about your project, and the few interactions that we had in Twitter were positive and encouraging from my side, no?
I'm sorry, I think we have a case of miscommunication here. I very very much appreciate your message of support on Twitter, and replied so at the time.
When I wrote "not in favour of what I'm doing with autoloads on anonymous modules", I'm referring to this sentiment:
This is all very misaligned with Ruby, in my opinion. Indeed, for me, `load file, mod` is a very edge interface that in my opinion should not be pursued further because it breaks a fundamental assumption in all existing code: That by looking at your source code, you know the nesting. I have always thought it has few use cases, and they involved cooperation caller/callee.
Your following sentence proposes a "design based on explicit cooperation", which perhaps includes what Im is doing in some slightly different way.
But my takeaway from the quote above is that using `load file, mod`, particularly in a `Kernel#require` monkeypatch (which is what Im does), would not be something you would want to do in Zeitwerk, and further that it is not something that you think should be done.
I apologize if I misread the intent, but this seems to be a natural reading of this statement. And I would completely understand that opinion! But for me, this is the only way I see to move forward bar having some changes to Ruby itself, which (aside from small changes like the one here) I'd prefer to avoid.
It cannot be in a constant, is it going to be in a global variable? Who does that?
As in Im, not all top-level namespaces are being abandoned. The goal is simply to make it possible to load to anonymous-rooted namespaces. So the global here is the registry in `Im` itself, which is top-level.
So technically-speaking everything is still "global". But there is a huge difference IMO between a top-level namespace, and a module stored in a registry mapped to a file path.
Because knowing that things have an ordinary constant path simplifies some internal assumptions, so I validate in the boundary to be confident downwards that is the case.
Completely agree with this. Again, I did not mean to criticize Zeitwerk for doing this, at all, quite the contrary it helps that this is explicit. But the commit message (which I read when I forked) is pretty clear that "Anonymous root namespaces are not supported". Since this does not say "currently supported", I assume this meant it would never be supported.
Who ran the `require` is irrelevant.
Yes, I understand this, and my statement was slightly sloppy. Thank you for clarifying this.
If your project has some rogue `require`s, or you migrate an existing gem to Zeitwerk, and existing client code is issuing `require`s, things will work.
Yes! Again, I was slightly hand-wavy in what I wrote. I understand this, and Im inherits from this and benefits from using the same approach.
That is not to say I am against the block, eh? Only saying I don't quite see it used by Zeitwerk at the moment.
Fair enough! I don't think it will be accepted anyway, so moot point, but I appreciate your clarity.
I would prefer to not decorate `Kernel`, indeed, and I believe what would really work for Zeitwerk would be a callback associated to `Kernel#require`: If a file is required, invoke the callback with the `require` argument, and the absolute path to the file.
Yes, and this was also actually brought up as an alternative at the dev meeting. I agree this would achieve the same goal for Zeitwerk. Unfortunately, for Im, this would not be helpful because I want to swap a `require` for a `load`, so not execute the `require` at all.
With this addition, the natural return value for a block argument would be the block, but a block is not a string.
This is a great point, and I hadn't considered it. Although I believe this proposal will be rejected, I want to revisit this in the future and I'll keep this in mind.
Updated by fxn (Xavier Noria) over 1 year ago
@shioyama (Chris Salzberg) ahhh, I see what you mean now re the sentiment.
So, yes, if someone asks my personal opinion about modifying the nesting externally with `load file, mod` or a new `require`, my personal take is that I don't quite see it in the Ruby language.
However, at the same time, I liked and support that you forked and tried to put that idea into practice and explore where it takes us with real code ❤️.
Let's talk anonymous namespaces for a moment in Zeitwerk. If you could assign an anonymous namespace to `/foo` in Zeitwerk, what would go in the top-level `class` or `module` keyword in `/foo/baz.rb`?
Updated by shioyama (Chris Salzberg) over 1 year ago
Let's talk anonymous namespaces for a moment in Zeitwerk. If you could assign an anonymous namespace to `/foo` in Zeitwerk, what would go in the top-level `class` or `module` keyword in `/foo/baz.rb`?
I think this is the key point where we may have differing views. Not strongly differing perhaps though. I think `/foo/baz.rb` should just have a normal structure, i.e. something like:
module Foo
  class Baz
  end
end
Im will load this under an anonymous namespace if you set up paths accordingly. I believe you would want a "design based on explicit cooperation", to use your earlier quote, in which at the file level itself you would opt in to being autoloaded under an anonymous namespace.
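As a rough illustration of the underlying mechanism (this is not Im's code), an unmodified file like the one above can be evaluated under an anonymous module with `Kernel#load`, which accepts a module as its second argument since Ruby 3.1. The temporary file path is an assumption made for the example.

```ruby
require "tmpdir"

dir  = Dir.mktmpdir
path = File.join(dir, "baz.rb")
File.write(path, <<~RUBY)
  module Foo
    class Baz
    end
  end
RUBY

mod = Module.new

# With a module as second argument (Ruby 3.1+), the file is evaluated
# with that module as its top-level namespace.
load path, mod

baz = mod::Foo::Baz
top_level_polluted = Object.const_defined?(:Foo)
```

The file itself says `module Foo`, yet `Foo` ends up under `mod` rather than under `Object`. This is the "implicit nesting" that the two positions in this thread weigh differently.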
I personally do not believe this is necessary. When I wrote "if Zeitwerk were to support the same functionality", I mean I would want to be able to (say) create an alternative loader, which like Im subclasses `Module`, and which I could assign to paths just like in Zeitwerk. Except that these paths would be loaded under the loader instead of at top-level. Nothing else would change in the files themselves.
Im is exactly what I am imagining, no more no less. I am not married to this vision, and happy to discuss further, but this is where I am coming from.
Personally I feel that "autoloading to anonymous modules" should be the more generic case here, and "autoloading to top-level" the specialized version of that. Of course in practice Ruby is not designed to make the former easy (although much easier as of 3.2!), but that to me is simply an implementation detail.
Also, related to this, I gave a talk at RubyKaigi the other day (in Japanese) about this (slides) and there was a lot of positive response. I feel there are many Ruby developers interested in this topic, who in the past have simply seen this as impossible and thus not talked about it much. It is now no longer impossible, and this is sparking a lot of interesting discussions (like this one here! :) ).
Updated by shioyama (Chris Salzberg) over 1 year ago
I wrote:
I personally do not believe this is necessary.
In fact, I realized this is not quite complete. I don't want to require file-level changes for two reasons.
One, it requires changes to every file to make a project (gem or other autoloaded collection of files) loadable under an anonymous module. Those changes may be small, but the result is significant IMO.
Concretely speaking, look at rails_on_im for an example. Here, I made some changes to Rails setup to replace Zeitwerk with Im and autoload an application under a single root namespace (`MyApp`). It mostly works (minus views, which I haven't dug into yet). This I think is a great example of the kind of change that this approach can bring, and would potentially avoid namespace collisions between an application and its gems, without any changes to the normal Rails way of doing things.
Making each file change its `class` or `module` declarations to do this would mean that experimenting with this approach would require changes to N files, where N is the total of all Zeitwerk-autoloaded files. This is a barrier to entry and a barrier to change that I do not feel brings great value, particularly in this case.
The second reason I don't want to do this is, honestly, that if you are ready to make file-level changes to autoload under an anonymous namespace, then to me you don't really need a gem anymore. Or maybe you need a slightly different one, but regardless the problem becomes quite different.
Specifically once the constraints are loosened to the point that you require the file to do something different, you might as well just change the top-level module/class declaration and avoid the constant creation altogether:
mod = Module.new
module mod::Foo
  class Baz
  end
end
Of course this is not enough, and we'd need to fetch/return `mod` here. But to me this renders the exercise much less interesting and useful. The whole reason I am interested in piggy-backing this approach on autoloading is that it eliminates boilerplate and makes it possible to capture an entire loadable file collection under a single module (the loader in Im).
This also, btw, is a reason I was hoping this request here would be accepted. I really like the idea of "object-level autoloads", i.e. you have an object (`Im::Loader` in this case) and all autoloads are defined on it, rather than on a global namespace. This seems well-aligned to the ideas of OOP. In practice though, Im has to still hold a registry mapping file paths to loaders. The proposed change here would eliminate that registry, since the `autoload` could pass the module to `load` through the closure, removing the need for any global registry.
So the loader becomes an encapsulated package whose trigger for loading code in files is `autoload`. Implementation concerns aside, I personally find this idea quite elegant.
Updated by fxn (Xavier Noria) over 1 year ago
The question was: If `/foo` is a root directory, and you want to associate to it an anonymous module as custom namespace, what would `/foo/bar.rb` look like? So, `/foo/bar.rb` is not `Foo::Bar`. We can also assume your project has more than one file.
OK, the answer in Zeitwerk is that you need the namespace to be globally available: for example
$__MY_GEM = Module.new
loader.push_dir('/foo', namespace: $__MY_GEM)
and then you cannot:
module $__MY_GEM
  class Bar
  end
end
you necessarily need to use this style:
class $__MY_GEM::Bar
end
- I have never seen this used.
- Inside that body, you cannot refer to other constants in the project using a simple constant reference as usual, because `$__MY_GEM` is not in the nesting.
- This does not solve isolation, because `$__MY_GEM` is as reachable as `MyGem` is.
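The nesting point can be checked with a short sketch. The class `Baz` and the helper method names are hypothetical additions for the illustration; `$__MY_GEM` and `Bar` follow the example above.

```ruby
$__MY_GEM = Module.new

class $__MY_GEM::Bar
end

class $__MY_GEM::Baz
  # Only Baz itself is in the lexical nesting; $__MY_GEM is not.
  NESTING = Module.nesting

  def self.sibling_via_bare_constant
    Bar # looked up in Baz, its ancestors, and Object: not found
  end

  def self.sibling_via_global
    $__MY_GEM::Bar # works, but every reference has to be qualified
  end
end

bare_lookup_failed =
  begin
    $__MY_GEM::Baz.sibling_via_bare_constant
    false
  rescue NameError
    true
  end
```

With an ordinary `module MyGem ... end` wrapper, `Bar` would resolve through the nesting. With the global-variable style, each sibling reference has to go through `$__MY_GEM` explicitly.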
This is why Zeitwerk does not support anonymous namespaces today. Would need some work and probably for nothing. I need a real use case.
Wanted to close this circle about the reason anonymous namespaces are not supported today in Zeitwerk.
I'll reflect about the rest of your comments!
Updated by shioyama (Chris Salzberg) over 1 year ago
The question was: If `/foo` is a root directory, and you want to associate to it an anonymous module as custom namespace, what would `/foo/bar.rb` look like?
Sorry, I read your comment too quickly. I actually removed the entire concept of a custom namespace in Im because it doesn't really feel useful once the root is anonymous (see this commit).
So, I answered the wrong question, but hopefully my answer is useful anyway!
Updated by fxn (Xavier Noria) over 1 year ago
Concretely speaking, look at rails_on_im for an example. Here, I made some changes to Rails setup to replace Zeitwerk with Im and autoload an application under a single root namespace (MyApp). It mostly works (minus views, which I haven't dug into yet). This I think is a great example of the kind of change that this approach can bring, and would potentially avoid namespace collisions between an application and its gems, without any changes to the normal Rails way of doing things.
Our main discrepancy is that I don't like implicit nesting. I want you to open a file and see what it defines.
Today, as far as Zeitwerk is concerned, you can scope all your autoload paths to `MyApp`. Either with `my_app` directories, or by passing `MyApp` as a namespace in the root directories configuration. If you want to avoid collisions with gems, you can do that today. And when you open the files you'll see the namespace, and a rogue `require` won't break the app, because the file does not depend on someone externally injecting a namespace (this is the "coordination" I refer to, both loader and loaded need to be in coordination).
If you could do this normally with anonymous namespaces, they would be supported too. But you cannot in current Ruby.
Updated by shioyama (Chris Salzberg) over 1 year ago
Our main discrepancy is that I don't like implicit nesting. I want you to open a file and see what it defines.
Agreed.
If you want to avoid collisions with gems, you can do that today.
Between an app and its gems, yes (at the cost of what I consider boilerplate). Between two gems, there is no way in Zeitwerk to avoid collisions other than one gem making a namespace change.
If you could do this normally with anonymous namespaces, they would be supported too. But you cannot in current Ruby.
If I understand correctly, you would not be against, say, a format whereby the file explicitly set its toplevel as anonymous, is that correct?
i.e., you don't like what looks like a top-level namespace actually being loaded under an anonymous module. But you wouldn't mind something akin to:
# foo.rb
Zeitwerk.define_module Foo
  # ...
end

# foo/bar.rb
Zeitwerk.define_module Foo
  module Bar
  end
end
Assuming `Zeitwerk.define_module` here somehow defined an anonymous-rooted namespace that would be loaded using the same autoload conventions as a named one. (So these namespaces are clearly not toplevel.)
Honestly just curious.
(Edit: there's probably a nicer format for this, I just sketched that above in 2 minutes thinking about this...)
Updated by fxn (Xavier Noria) over 1 year ago
If I understand correctly, you would not be against, say, a format whereby the file explicitly set its toplevel as anonymous, is that correct?
Correct.
For example, if given `$MyNamespace = Module.new` this was allowed in Ruby:
module $MyNamespace
  class MyClass
  end
end
anonymous namespaces would be already supported in Zeitwerk.
Anonymous namespaces are not supported because I don't have a real use case.
Updated by fxn (Xavier Noria) over 1 year ago
For example, if you tell me: I have devised this anonymous namespace manager (external to Zeitwerk, in principle) to enforce the desired separation of components in Shopify, and this would be the syntax I'd use, and this is realistic, I'd write support to accept an anonymous namespace in the `namespace` option right away.
I only need a real use case.
Updated by shioyama (Chris Salzberg) over 1 year ago
Thanks, that makes sense.
In one way we're not far apart. In another, though, I think we still are, because the difference is really between whether control is delegated to the caller or the callee. "isolation", which is what I'm after here, says it should be the caller, not the callee. Allowing the callee to control is not wrong to me, but it's prioritizing the goal that "by looking at your source code, you know the nesting" as you describe.
Of course isolation is never complete because it's Ruby, and the callee can of course declare `module ::Foo` and escape isolation. I've given up on any kind of real isolation because of concerns expressed by folks like yourself and others, which convinced me Ruby would not make this easy or simple.
That said, there are advantages to even partial isolation. I don't want a stray `module Foo` to mean that the caller now has top-level pollution that they did not expect. In the scheme you describe above, that would still be a problem.
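Both halves of this "partial isolation" can be seen in a small sketch. It uses plain `Kernel#load` with a module argument (Ruby 3.1+) and is not Im itself; `StrayDemo`, `EscapedDemo`, and the file name are hypothetical.

```ruby
require "tmpdir"

dir  = Dir.mktmpdir
path = File.join(dir, "callee.rb")
File.write(path, <<~RUBY)
  module StrayDemo      # relative: lands under the wrapper module
  end

  module ::EscapedDemo  # absolute: escapes to the real top level
  end
RUBY

wrapper = Module.new
load path, wrapper # passing a module as second argument requires Ruby 3.1+

stray_is_contained =
  wrapper.const_defined?(:StrayDemo, false) && !Object.const_defined?(:StrayDemo)
escaped_is_toplevel = Object.const_defined?(:EscapedDemo)
```

A callee that deliberately writes `::` can still reach the top level, but an ordinary relative definition no longer pollutes it. That is the caller-side control being argued for here.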
We're after different goals here, I think both are valid, but they don't really align. That's my conclusion. And if they do not entirely align, then I will continue working on Im as a fork for the foreseeable future.
One thing that would make my life easier would be if some of the "utility logic" in Zeitwerk could be extracted into a separate gem. Particularly the loader helpers for crawling directories, etc. Then Im could use that gem without the rest of Zeitwerk.
But I can imagine that would be more work for you, and I'm sure you already have enough work maintaining such a popular gem, so I don't expect you to do that. :) Just to mention in case it ever is something you consider.
Updated by fxn (Xavier Noria) over 1 year ago
Yeah, I understand.
I also have the feeling that you are using an autoloader just because it allows you to plug in some tricks towards your isolation goal. But such a feature should come from the language to be real. I mean, if isolation was possible, it should be possible without an autoloader. See what I mean?
Splitting Zeitwerk, ..., not really, the library makes sense as a whole to me.
You have a very specific problem in Shopify, now addressed with packwerk from what I understand. Let's see where your research takes you!
Updated by shioyama (Chris Salzberg) over 1 year ago
now addressed with packwerk from what I understand.
Ah, not exactly (or at least not entirely), but that's another story! :sob: