Feature #16978
openRuby should not use realpath for __FILE__
Description
This is the simplest test case:
$ mkdir a
$ echo "puts __FILE__" > a/test.rb
$ ln -s a b
$ ruby -Ib -e "require 'test'"
/builddir/a/test.rb
This behavior is problematic, because Ruby should not know nothing about the a
directory. It was not instructed to use it. I should always refer to the file using the original path and do not dig into the underlying details, otherwise depending on file system setup, one might be forced to used File.realpath
everywhere trying to use __FILE__
.
Updated by jeremyevans0 (Jeremy Evans) over 4 years ago
- Tracker changed from Bug to Feature
- ruby -v deleted (
ruby 2.7.1p83 (2020-03-31 revision a0c7c23c9c) [x86_64-linux]) - Backport deleted (
2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN)
I don't think this is a bug. __FILE__
is documented as follows: The path to the current file.
Which path (real, absolute, relative, expanded) is not specified.
Not using the real path would lead to behavior that depends on the first path used when requiring the file.
a/test.rb
(b
symlinked to a
):
def a
__FILE__
end
ruby -Ia -Ib -rtest -e 'a'
# /path/to/a/test.rb
ruby -Ib -Ia -rtest -e 'a'
# Current: /path/to/a/test.rb
# Your proposed: /path/to/b/test.rb
What actually happens is not the file path being converted to a real path, but the include directory being converted to a real path before the file is required (in rb_construct_expanded_load_path
). Changing this to not use a real path would probably break the code that checks that a feature hasn't been require twice. For example, this code would change behavior:
$: << 'a'
require 'test'
$: << 'b'
require 'test'
# Current: not loaded again
# Your proposed: loaded again
If you symlink the file itself and not the include directory, Ruby will attempt to require it as a separate feature.
Note that if you provide a path when requiring, Ruby already operates the way you want:
ruby -r./a/test -e 'a' # /path/to/a/test.rb
ruby -r./b/test -e 'a' # /path/to/b/test.rb
I can certainly see pros and cons from changing the behavior, but I would consider this a feature request and not a bug.
Updated by Eregon (Benoit Daloze) over 4 years ago
The "main file" (the file passed to ruby myfile.rb
is also special in that __FILE__
and $0
can both be relative paths (basically the same path as passed on the command line).
Updated by vo.x (Vit Ondruch) over 4 years ago
this code would change behavior:
Absolutely, different $LOAD_PATH
must result in different behavior.
$: << 'a' require 'test' $: << 'b' require 'test' # Current: not loaded again # Your proposed: loaded again
This is actually where I am coming from. I am not 100 % sure what is the spec of require
, but I would expect that require 'test'
works just once and it does not matter what is the current $LOAD_PATH
status. So my proposal is actually:
# Your proposed: not loaded again
Updated by vo.x (Vit Ondruch) almost 4 years ago
Can this be resolved please? This is another scenario, which should work IMO, but it does not work:
$ mkdir a/
$ echo "require_relative 'b'" > a/test.rb
$ touch b.rb
$ ll
total 4
drwxrwxr-x. 1 vondruch vondruch 14 Jan 22 17:17 a
-rw-rw-r--. 1 vondruch vondruch 0 Jan 22 17:17 b.rb
lrwxrwxrwx. 1 vondruch vondruch 9 Jan 22 17:18 test.rb -> a/test.rb
$ ruby test.rb
Traceback (most recent call last):
1: from test.rb:1:in `<main>'
test.rb:1:in `require_relative': cannot load such file -- /home/vondruch/tt/a/b (LoadError)
The point is that in the context of current directory, the b.rb
is relative to test.rb
and therefore this example should not work. It does not really matter that the test.rb
is symlink. This behavior with requie_relative
being more and more common basically prohibits usage of symlinks for Ruby code.
This is issue because gems typically does not ship their test suites. Therefore when testing packages in Fedora, we would like do something like:
$ gem unpack rspec-rails
Fetching rspec-rails-4.0.2.gem
Unpacked gem: '/home/vondruch/tt/rspec-rails-4.0.2'
$ cd rspec-rails-4.0.2/
# Link from git checkout or from separate archive
$ ln -s ~/projects/rspec/rspec-rails/spec/ .
This blows out as soon as require_relative
is used in test suite, because it cannot refer back to require_relative "../lib/rspec_rails"
, which expands to /home/vondruch/projects/rspec/rspec-rails/spec/../lib/rspec_rails
, where there is either loaded completely different file or nothing. In Fedora 1, the test suite is typically extracted from tarball, so this would crash.
Updated by akr (Akira Tanaka) almost 4 years ago
I think we should use realpath.
The location of actual file is more robust.
Also, ELF dynamic linker has a feature, $ORIGIN, to refer a shared library using a relative path.
It also resolves symlinks.
https://refspecs.linuxbase.org/elf/gabi4+/ch5.dynamic.html
Updated by Dan0042 (Daniel DeLorme) almost 4 years ago
+1 for keeping the current behavior.
I remember in 1.8 we had so many problems with double-loading code because a file could be require'd with different paths. I have no wish to go back to that mess.
Updated by vo.x (Vit Ondruch) almost 4 years ago
I have problem with double loading because of symlinks resolving into real path. If there was ensured that require "foo"
load file just once, then double loading can't happen.
In Fedora, to avoid duplication, we have openssl gem extracted into independent package, which links back to the StdLib to keep the Ruby functionality. Unfortunately, since this commit 1, which introduces require_relative
, we have issues with double loading. The simple reproducer is ruby --disable-gems -e 'require "openssl"; require "openssl/digest"
.
You can admit that Fedora is doing something unexpected, but I think that the package split was done prior the require_relative
was even introduced.
Updated by austin (Austin Ziegler) almost 4 years ago
Why not put some special handling in to require_relative
such that it checks both the pre-realpath
and realpath
versions (perhaps behind a #define
flow so that Fedora and other systems integrators can do this without impacting anyone else, assuming require_relative
is in C) so that it can handle this case?
Updated by Dan0042 (Daniel DeLorme) almost 4 years ago
Interestingly, ruby does not use realpath for __FILE__
, only for __dir__
and require_relative
$ cat test.rb
p __dir__ => __FILE__
require_relative "b"
$ cat a/b.rb
p __dir__ => __FILE__
require_relative "c"
$ ln -s a/b.rb
$ touch c.rb
$ ruby test.rb
{"/home/dan42/16978"=>"test.rb"}
{"/home/dan42/16978/a"=>"/home/dan42/16978/b.rb"}
Traceback (most recent call last):
3: from test.rb:2:in `<main>'
2: from test.rb:2:in `require_relative'
1: from /home/dan42/16978/b.rb:2:in `<top (required)>'
/home/dan42/16978/b.rb:2:in `require_relative': cannot load such file -- /home/dan42/16978/a/c (LoadError)
Updated by Eregon (Benoit Daloze) almost 4 years ago
vo.x (Vit Ondruch) wrote in #note-7:
In Fedora, to avoid duplication, we have openssl gem extracted into independent package, which links back to the StdLib to keep the Ruby functionality.
What does it look like on the filesystem?
Even if the package is split, the standard location could be used, no need for symlinks, isn't it?
Updated by vo.x (Vit Ondruch) almost 4 years ago
Eregon (Benoit Daloze) wrote in #note-10:
vo.x (Vit Ondruch) wrote in #note-7:
In Fedora, to avoid duplication, we have openssl gem extracted into independent package, which links back to the StdLib to keep the Ruby functionality.
What does it look like on the filesystem?
This is how it is constructed. Here is where the OpenSSL structure is created in traditional gem directories:
https://src.fedoraproject.org/rpms/ruby/blob/master/f/ruby.spec#_761
And here are the links back to the StdLib location:
https://src.fedoraproject.org/rpms/ruby/blob/master/f/ruby.spec#_771
Even if the package is split, the standard location could be used, no need for symlinks, isn't it?
Not sure what do you mean specifically, because of course I'd prefer if there are no symlinks. The idea is that rubygem-openssl, should be replaced by newer version of openssl released on rubygems.org, without upgrading Ruby itself, therefore the files have to be located in the gem directory. Of course there are different possibilities, how to workaround that (e.g. #14737). The only question how sustainable they are.
However, Fedora is orthogonal and just distraction to this issue, because I can imagine if I were ruby openssl developer, I'd like to replace content of Ruby build with my development version of OpenSSL from git snapshot via symlinks, e.g.:
$ cd ~
$ git clone https://github.com/ruby/openssl.git
# rm /usr/local/lib/ruby/3.0.0/openssl*
# cd /usr/local/lib/ruby/3.0.0
# ln -s ~/openssl/lib/openssl.rb .
# ln -s ~/openssl/lib/openssl .
$ /usr/local/bin/ruby --disable-gems -e 'require "openssl"; require "openssl/digest"'
/usr/local/lib/ruby/3.0.0/openssl/digest.rb:45: warning: already initialized constant OpenSSL::Digest::MD4
/builddir/openssl/lib/openssl/digest.rb:45: warning: previous definition of MD4 was here
/usr/local/lib/ruby/3.0.0/openssl/digest.rb:45: warning: already initialized constant OpenSSL::Digest::MD5
/builddir/openssl/lib/openssl/digest.rb:45: warning: previous definition of MD5 was here
/usr/local/lib/ruby/3.0.0/openssl/digest.rb:45: warning: already initialized constant OpenSSL::Digest::RIPEMD160
/builddir/openssl/lib/openssl/digest.rb:45: warning: previous definition of RIPEMD160 was here
/usr/local/lib/ruby/3.0.0/openssl/digest.rb:45: warning: already initialized constant OpenSSL::Digest::SHA1
/builddir/openssl/lib/openssl/digest.rb:45: warning: previous definition of SHA1 was here
/usr/local/lib/ruby/3.0.0/openssl/digest.rb:45: warning: already initialized constant OpenSSL::Digest::SHA224
/builddir/openssl/lib/openssl/digest.rb:45: warning: previous definition of SHA224 was here
/usr/local/lib/ruby/3.0.0/openssl/digest.rb:45: warning: already initialized constant OpenSSL::Digest::SHA256
/builddir/openssl/lib/openssl/digest.rb:45: warning: previous definition of SHA256 was here
/usr/local/lib/ruby/3.0.0/openssl/digest.rb:45: warning: already initialized constant OpenSSL::Digest::SHA384
/builddir/openssl/lib/openssl/digest.rb:45: warning: previous definition of SHA384 was here
/usr/local/lib/ruby/3.0.0/openssl/digest.rb:45: warning: already initialized constant OpenSSL::Digest::SHA512
/builddir/openssl/lib/openssl/digest.rb:45: warning: previous definition of SHA512 was here
/usr/local/lib/ruby/3.0.0/openssl/digest.rb:52:in `<class:Digest>': superclass mismatch for class Digest (TypeError)
from /usr/local/lib/ruby/3.0.0/openssl/digest.rb:16:in `<module:OpenSSL>'
from /usr/local/lib/ruby/3.0.0/openssl/digest.rb:15:in `<top (required)>'
from -e:1:in `require'
from -e:1:in `<main>'
Everybody can try the example above. Is it artificial and should be prohibited? Should it fail?
Updated by Dan0042 (Daniel DeLorme) almost 4 years ago
I think this is a bug and should be fixed, but IMO the proper fix is to use realpath for __FILE__
So in the example above when you do require "openssl/digest"
, ruby will find /usr/local/lib/ruby/3.0.0/openssl/digest.rb
and then it should check the realpath (~/openssl/lib/openssl/digest.rb
) before loading the file.
Updated by vo.x (Vit Ondruch) almost 4 years ago
Dan0042 (Daniel DeLorme) wrote in #note-12:
I think this is a bug and should be fixed, but IMO the proper fix is to use realpath for
__FILE__
This is though one thinking about this again, but you are probably right. Maybe the biggest concern is that the behavior is inconsistent. If the require
and __FILE__
used realpath
, I think the example in #16978-11 would work as you said.
Updated by mame (Yusuke Endoh) almost 4 years ago
- Related to Bug #10222: require_relative and require should be compatible with each other when symlinks are used added
Updated by ko1 (Koichi Sasada) almost 4 years ago
- Assignee set to nobu (Nobuyoshi Nakada)
Updated by jeremyevans0 (Jeremy Evans) over 3 years ago
vo.x (Vit Ondruch) wrote in #note-13:
Dan0042 (Daniel DeLorme) wrote in #note-12:
I think this is a bug and should be fixed, but IMO the proper fix is to use realpath for
__FILE__
This is though one thinking about this again, but you are probably right. Maybe the biggest concern is that the behavior is inconsistent. If the
require
and__FILE__
usedrealpath
, I think the example in #16978-11 would work as you said.
It's not difficult to make __FILE__
use realpath
:
@@ -10307,6 +10307,14 @@ numparam_nested_p(struct parser_params *p)
return 0;
}
+VALUE rb_realpath_internal(VALUE basedir, VALUE path, int strict);
+
+static VALUE
+rb_realpath__FILE__(VALUE file)
+{
+ return rb_realpath_internal(Qnil, file, 1);
+}
+
static NODE*
gettable(struct parser_params *p, ID id, const YYLTYPE *loc)
{
@@ -10326,8 +10334,19 @@ gettable(struct parser_params *p, ID id, const YYLTYPE *loc)
VALUE file = p->ruby_sourcefile_string;
if (NIL_P(file))
file = rb_str_new(0, 0);
- else
- file = rb_str_dup(file);
+ else if (compile_for_eval) {
+ file = rb_str_dup(file);
+ }
+ else {
+ int state;
+ VALUE realpath_file = rb_protect(rb_realpath__FILE__, file, &state);
+ if (state) {
+ file = rb_str_dup(file);
+ }
+ else {
+ file = realpath_file;
+ }
+ }
node = NEW_STR(file, loc);
RB_OBJ_WRITTEN(p->ast, Qnil, file);
}
Unfortunately, if __FILE__
uses realpath
, unless you also change the behavior of $0
(which I'm fairly sure would not be acceptable), you break the common idiom for executing code if the file is executed instead of required:
if __FILE__ == $0
do_something
end
Updated by byroot (Jean Boussier) over 3 years ago
Unfortunately, if FILE uses realpath, unless you also change the behavior of $0
The "entrypoint" script could behave differently, but maybe that would be confusing.
Updated by Dan0042 (Daniel DeLorme) over 3 years ago
byroot (Jean Boussier) wrote in #note-17:
The "entrypoint" script could behave differently, but maybe that would be confusing.
I think that would be acceptable. After all the "entrypoint" script already behaves differently; it's the only one where __FILE__
is not always an absolute path.
Updated by byroot (Jean Boussier) over 3 years ago
This issue seems also somewhat related to another I opened a while ago: #17593
Updated by Dan0042 (Daniel DeLorme) over 3 years ago
jeremyevans0 (Jeremy Evans) wrote in #note-16:
It's not difficult to make
__FILE__
userealpath
I read this once more a bit more carefully and I think there's a misunderstanding... the idea here is that realpath should be used in require
before checking if the file is already loaded (and then loading it). So p->ruby_sourcefile_string
would already be the realpath.
Updated by jeremyevans0 (Jeremy Evans) over 3 years ago
Dan0042 (Daniel DeLorme) wrote in #note-20:
jeremyevans0 (Jeremy Evans) wrote in #note-16:
It's not difficult to make
__FILE__
userealpath
I read this once more a bit more carefully and I think there's a misunderstanding... the idea here is that realpath should be used in
require
before checking if the file is already loaded (and then loading it). Sop->ruby_sourcefile_string
would already be the realpath.
I tried that approach first when working on #17885. It doesn't work because the loaded feature index is designed around load paths and not real paths. You would have to rewrite the loaded feature index and its interaction with autoload to switch to that approach.
Updated by vo.x (Vit Ondruch) about 3 years ago
I wonder what is the situation here given that the #17885 seems to be resolved?
Updated by retro (Josef Šimánek) almost 3 years ago
We have hit this problem as well in rubygems and it is really unexpected and confusing.
https://github.com/rubygems/rubygems/pull/5437
TL;DR; test passes calling rake test
, but not using ruby -I... test/file.rb
Updated by retro (Josef Šimánek) almost 3 years ago
In the end RubyGems decided to leave __FILE__
due to this behaviour at all. If I understand Jeremy's original reasoning well, it was actually misused. Anyway it would be great to inline __FILE__
and __dir__
to provide consistent output. https://github.com/rubygems/rubygems/pull/5444
Updated by hsbt (Hiroshi SHIBATA) 10 months ago
- Status changed from Open to Assigned