Feature #11148
Add a way to require files, but not raise an exception when the file isn't found
Description
Hi,
I'm trying to make it so that RubyGems doesn't need to put directories on $LOAD_PATH (which is why I submitted Feature #11140). I would like the require implemented in RubyGems to look up the file in a cache generated when the gem is installed, then pass a full file path to require.
The problem is that the user may have manipulated the load path somehow, so RubyGems needs to detect whether the file is on the load path. Today, the algorithm inside RubyGems looks roughly like this:
def require file
  if file_is_from_a_default_gem?(file) # this is so you can install new versions of default gems
    add_default_gem_to_loadpath
  end
  real_require file
rescue LoadError
  gem = find_gem_that_contains_file(file)
  add_gem_to_loadpath gem
  real_require file
end
Instead of adding the directory to the load path, I would like to look up the full file path in a cache that is generated when the gem is installed. With such a cache, the new implementation would look like this:
def require file
  if file_is_from_a_default_gem?(file) # this is so you can install new versions of default gems
    add_default_gem_to_loadpath
  end
  real_require file # gets slower as paths are added to LOAD_PATH
rescue LoadError
  gem = find_gem_that_contains_file(file) # use a cache so lookup is O(1)
  fully_qualified_path = gem.full_path file
  real_require fully_qualified_path # send a fully qualified path, so LOAD_PATH isn't searched
end
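A minimal sketch of the install-time cache that find_gem_that_contains_file could consult, assuming each gem's files live under a single lib directory that can be scanned once when the gem is installed. The build_file_index name and the hash layout are hypothetical, not part of RubyGems.

```ruby
# Hypothetical install-time index: maps a feature name as passed to
# require (e.g. "foo/bar") to the gem that provides it and the absolute
# path of the file, so a later lookup is a single O(1) hash access.
# gem_dirs is assumed to be { "gem-name" => "/path/to/gem/lib", ... }.
def build_file_index(gem_dirs)
  index = {}
  gem_dirs.each do |gem_name, lib_dir|
    Dir.glob("**/*.rb", base: lib_dir).sort.each do |relative|
      feature = relative.delete_suffix(".rb")
      # First gem wins; this sketch silently ignores duplicate features.
      index[feature] ||= { gem: gem_name, path: File.join(lib_dir, relative) }
    end
  end
  index
end
```

At require time, index["foo/bar"][:path] would give the fully qualified path to hand to require without searching $LOAD_PATH. Extension libraries (.so, .bundle) are left out of this sketch.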
Unfortunately, with that implementation, every call to require for a file inside a gem would raise (and rescue) an exception. I'd like to add a version of require that doesn't raise an exception. Then I could write the code like this:
def require file
  if file_is_from_a_default_gem?(file) # this is so you can install new versions of default gems
    add_default_gem_to_loadpath
  end
  found = try_require file # returns nil instead of raising when file isn't found
  if found.nil?
    gem = find_gem_that_contains_file(file) # use a cache so lookup is O(1)
    fully_qualified_path = gem.full_path file
    found = real_require fully_qualified_path # send a fully qualified path, so LOAD_PATH isn't searched
  end
  found
end
This would keep the load path small, and prevent exceptions from happening during the "normal" case.
I've attached a patch that implements try_require, but I'm not set on the name. Maybe require(file, exception: false) would work too.
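The patch itself is not reproduced here. As a rough pure-Ruby illustration of the semantics try_require is meant to have, the sketch below searches $LOAD_PATH itself and returns nil instead of raising. It handles only .rb files and skips much of what the real require does (extensions, feature-index caching); try_require_rb is a hypothetical name.

```ruby
# Hypothetical pure-Ruby approximation of try_require: look for
# "#{feature}.rb" in each $LOAD_PATH entry, in order, and require the
# first match by absolute path. Returns nil (instead of raising
# LoadError) when nothing on the load path provides the feature;
# otherwise returns whatever Kernel#require returns (true or false).
def try_require_rb(feature)
  name = feature.end_with?(".rb") ? feature : "#{feature}.rb"
  $LOAD_PATH.each do |dir|
    candidate = File.expand_path(name, dir)
    return require(candidate) if File.file?(candidate)
  end
  nil
end
```

The search is still linear in the size of $LOAD_PATH, which is why keeping the load path small matters to the proposal.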
Updated by nobu (Nobuyoshi Nakada) over 9 years ago
- Description updated
Although I had an idea to separate require into "search" and "load" steps, this may be simpler.
Updated by Eregon (Benoit Daloze) over 9 years ago
Why is that exception problematic?
Is it for performance (I suppose the cost of the search is already large), or so that you catch only the LoadError from require itself and not one raised accidentally from somewhere else? (The latter could potentially affect compatibility.)
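If the fallback keeps using rescue, the second concern can be narrowed by checking LoadError#path (available since Ruby 2.0), which holds the feature that actually failed to load. A LoadError raised by a nested require inside a file that was found then propagates instead of being swallowed. This is a sketch with a hypothetical require_or_nil helper; it does not remove the cost of raising.

```ruby
# Hypothetical helper: behaves like require, but returns nil when the
# requested feature itself cannot be found. A LoadError coming from a
# nested require (a file the requested feature depends on) is re-raised,
# because its #path names the nested feature, not the one we asked for.
def require_or_nil(feature)
  require feature
rescue LoadError => e
  raise unless e.path == feature
  nil
end
```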
Updated by tenderlovemaking (Aaron Patterson) over 9 years ago
@nobu (Nobuyoshi Nakada) I was thinking the same, but this was the smallest patch that would accomplish what I need.
@Eregon (Benoit Daloze) Yes, for performance, and to avoid catching unrelated load errors. If my plan is successful, RubyGems would stop adding directories to the load path, so searching should be relatively fast (since the load path would stay relatively small). With the current algorithm, the first require that "activates" a gem always raises an exception; then the gem gets activated (its directories are added to the load path), and none of the requires inside the gem raise. So, say, 98% of the time require doesn't raise an exception. If I stop adding directories to the load path, then about 98% of requires will raise an exception. I think that would incur a non-trivial overhead (though I don't have numbers for you right now).
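One way to get the missing numbers is a micro-benchmark that isolates the cost of signalling "not found" with raise/rescue versus returning nil. The sketch uses only core Ruby (Process.clock_gettime with a monotonic clock); the lookup methods are hypothetical stand-ins for a failed load-path search, and no timings are claimed here.

```ruby
# Stand-ins for a failed load-path search: one signals "not found" by
# raising LoadError (as require does today), the other by returning nil
# (as the proposed try_require would).
def lookup_raising(feature)
  raise LoadError, "cannot load such file -- #{feature}"
end

def lookup_returning_nil(_feature)
  nil
end

def fallback_with_rescue(feature)
  lookup_raising(feature)
rescue LoadError
  :fallback
end

def fallback_with_nil(feature)
  lookup_returning_nil(feature) || :fallback
end

# Wall-clock seconds spent running the block `iterations` times.
def measure(iterations)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  iterations.times { yield }
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

n = 100_000
rescue_time = measure(n) { fallback_with_rescue("foo/bar") }
nil_time = measure(n) { fallback_with_nil("foo/bar") }
puts format("rescue: %.3fs  nil: %.3fs  (%d iterations each)", rescue_time, nil_time, n)
```

It deliberately leaves out the search itself, which Benoit suspects already dominates, so it bounds only the exception part of the overhead.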
Updated by Eregon (Benoit Daloze) over 9 years ago
Right, makes sense. It would be great to have some data though :)
How would the cache deal with duplicate keys, that is, when multiple gems have the same relative path inside their lib/ (or other require path) directories? I suspect some gems may also rely on having their lib/ (etc.) directories in $LOAD_PATH.
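To gauge how often the duplicate-key case occurs, an index builder could record every owner of a relative path instead of keeping only the first. A sketch over a hypothetical listing of the files each gem provides (the gem names are made up):

```ruby
# Given a hypothetical listing of the files each gem provides, return the
# relative paths claimed by more than one gem, mapped to their owners in
# listing order.
def duplicate_features(files_by_gem)
  owners = Hash.new { |h, k| h[k] = [] }
  files_by_gem.each do |gem_name, files|
    files.each { |f| owners[f] << gem_name }
  end
  owners.select { |_feature, gems| gems.size > 1 }
end

listing = {
  "foo"      => ["foo.rb", "foo/common.rb"],
  "foo_fork" => ["foo.rb", "foo/extra.rb"],
}
duplicate_features(listing) # only "foo.rb" is reported, owned by "foo" and "foo_fork"
```

What to do with the reported keys (for example, deferring to $LOAD_PATH order) is the policy question the cache would have to answer.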
Is the first find_gem_that_contains_file(file) call O(number of installed gems), or is there some heuristic that matches the first component of file against a gem name?
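One possible heuristic, assuming the install-time data is kept per gem as a set of provided features: try the gem whose name equals the first path component of the feature, and only fall back to scanning every installed gem when that guess misses. find_gem_for and the index layout are hypothetical.

```ruby
require "set"

# Hypothetical lookup: features_by_gem maps gem name => Set of features
# ("foo/bar") that the gem provides.
def find_gem_for(feature, features_by_gem)
  first_component = feature.split("/").first
  guess = features_by_gem[first_component]
  # Fast path: gem named after the first path component ("foo" for "foo/bar").
  return first_component if guess&.include?(feature)

  # Fallback: O(number of installed gems) scan.
  features_by_gem.each do |gem_name, features|
    return gem_name if features.include?(feature)
  end
  nil
end
```

The guess makes the common case a constant-time check, while the fallback keeps the worst case at O(number of installed gems). A flat feature-to-path hash built at install time would avoid the scan entirely, at the cost of resolving duplicates when the hash is built.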