Feature #15797
closedUse realpath(3) instead of custom realpath implementation if available
Description
One reason to do this is simplicity, as this approach is ~30 lines of
code instead of ~200.
Performance wise, this performs 25%-115% better, using the following
benchmark on OpenBSD 6.5:
require 'benchmark'
f = File
pwd = Dir.pwd
Dir.mkdir('b') unless f.directory?('b')
f.write('b/a', '') unless f.file?('b/a')
args = [
["b/a", nil],
["#{pwd}/b/a", nil],
['a', 'b'],
["#{pwd}/b/a", 'b'],
["b/a", pwd]
]
args.each do |path, base|
print "File.realpath(#{path.inspect}, #{base.inspect}): ".ljust(50)
puts Benchmark.measure{100000.times{f.realpath(path, base)}}
end
Before:
File.realpath("b/a", nil): 4.330000 2.990000 7.320000 ( 7.316244)
File.realpath("/home/testr/ruby/b/a", nil): 3.560000 2.680000 6.240000 ( 6.240951)
File.realpath("a", "b"): 4.370000 3.080000 7.450000 ( 7.452511)
File.realpath("/home/testr/ruby/b/a", "b"): 3.730000 2.640000 6.370000 ( 6.371979)
File.realpath("b/a", "/home/testr/ruby"): 3.590000 2.630000 6.220000 ( 6.226824)
After:
File.realpath("b/a", nil): 1.370000 2.030000 3.400000 ( 3.400775)
File.realpath("/home/testr/ruby/b/a", nil): 1.260000 2.770000 4.030000 ( 4.024957)
File.realpath("a", "b"): 2.090000 1.990000 4.080000 ( 4.080284)
File.realpath("/home/testr/ruby/b/a", "b"): 1.400000 2.620000 4.020000 ( 4.015505)
File.realpath("b/a", "/home/testr/ruby"): 2.150000 2.760000 4.910000 ( 4.910634)
If someone could benchmark before/after with this patch on Linux and/or MacOS X,
and post the results here, I would appreciate it.
My personal reason for wanting this is that the custom realpath
implementation does not work with OpenBSD's unveil(2) system call,
which limits access to the file system, allowing for security
similar to chroot(2), without most of the downsides.
This change passes all tests except for one assertion related to
taintedness. Previously, if either argument to File.realpath
is an
absolute path, then the returned value is considered not tainted.
However, I believe that behavior to be incorrect, because if there is
a symlink anywhere in the path, the returned value can contain a
section that was taken from the file system (unreliable source) that
was not marked as untainted. Example:
Dir.mkdir('b') unless File.directory?('b')
File.write('b/a', '') unless File.file?('b/a')
File.symlink('b', 'c') unless File.symlink?('c')
path = File.realpath('c/a'.untaint, Dir.pwd.untaint)
path # "/home/testr/ruby/b/a"
path.tainted? # should be true, as 'b' comes from file system
I believe it is safer to always mark the output of realpath as tainted
to prevent this issue, which is what this commit does.
Files
Updated by nobu (Nobuyoshi Nakada) over 5 years ago
The tainted-ness issue seems a different story.
Could you make it a separate ticket?
Updated by jeremyevans0 (Jeremy Evans) over 5 years ago
nobu (Nobuyoshi Nakada) wrote:
The tainted-ness issue seems a different story.
Could you make it a separate ticket?
Added #15803 for that.
Updated by jeremyevans0 (Jeremy Evans) over 5 years ago
Attached is an updated patch against trunk, now that the taint fix in #15803 was committed. One other change in this patch is that the realpath_rec
function is no longer compiled if the native realpath(3) implementation is used.
Updated by akr (Akira Tanaka) over 5 years ago
PATH_MAX is dangerous.
Quotes from http://man7.org/linux/man-pages/man3/realpath.3.html
BUGS
The POSIX.1-2001 standard version of this function is broken by
design, since it is impossible to determine a suitable size for the
output buffer, resolved_path. According to POSIX.1-2001 a buffer of
size PATH_MAX suffices, but PATH_MAX need not be a defined constant,
and may have to be obtained using pathconf(3). And asking
pathconf(3) does not really help, since, on the one hand POSIX warns
that the result of pathconf(3) may be huge and unsuitable for
mallocing memory, and on the other hand pathconf(3) may return -1 to
signify that PATH_MAX is not bounded. The resolved_path == NULL
feature, not standardized in POSIX.1-2001, but standardized in
POSIX.1-2008, allows this design problem to be avoided.
Updated by jeremyevans0 (Jeremy Evans) over 5 years ago
akr (Akira Tanaka) wrote:
PATH_MAX is dangerous.
Quotes from http://man7.org/linux/man-pages/man3/realpath.3.html
I'm not sure if this is still a problem on modern Linux, but here's an updated patch that passes a NULL pointer as the second argument to realpath(3), and then frees the pointer after the Ruby string is created.
Also, just in case Windows starts defining realpath(3) in the future, make sure not to use this implementation on Windows, as the check for path
being absolute does not work with drive letters.
Updated by jeremyevans0 (Jeremy Evans) over 5 years ago
Attached is the latest version of this patch, with the following improvements:
- Add configure check for realpath(3), to enable this code path automatically.
- Even if realpath(3) is supported, use current code if RB_REALPATH_DIR is given, as most operating systems require the file referenced by realpath(3) exist, and RB_REALPATH_DIR doesn't require this. However, OpenBSD and FreeBSD do not require the last component of the path to exist, only the preceding components, so realpath(3) can be used in the RB_REALPATH_DIR case on OpenBSD and FreeBSD.
Updated by jeremyevans0 (Jeremy Evans) over 5 years ago
I have added a pull request for this feature (https://github.com/ruby/ruby/pull/2205), and made the necessary changes so it passes Travis and AppVeyor. This changes related to fixing encoding issues and working around issues in glibc realpath(3) returning ENOTDIR and ENOENT in some cases that the current recursive realpath implementation handles correctly (falling back to the current implementation in those cases if realpath(3) returns NULL).
From some research, it looks like realpath(3) on Mac OS 10.5 and below does not handle a NULL second argument. Does Ruby still support Mac OS X versions that old (10.5 was released October 2007)? If so, I can update configure.ac to check for realpath(3) supporting a NULL second argument if the function itself is supported.
Can anyone run the benchmark in the pull request on Linux and Mac OS X (and possibly other operating systems) and post the results?
Updated by jeremyevans0 (Jeremy Evans) over 5 years ago
I was able to run the benchmarks in a Linux virtual machine. This includes the benchmarks I added for File.realdirpath
and for File.realpath
where the paths that do not exist. File.realdirpath
always uses the emulated code on Linux, so performance is basically the same for it. For File.realpath
, this results in a 30%-146% performance improvement.
Warming up --------------------------------------
relative_nil 68.629k i/s - 70.070k times in 1.020996s (14.57μs/i)
absolute_nil 57.387k i/s - 60.762k times in 1.058804s (17.43μs/i)
relative_relative 55.483k i/s - 58.799k times in 1.059775s (18.02μs/i)
absolute_relative 57.913k i/s - 61.438k times in 1.060868s (17.27μs/i)
relative_absolute 45.609k i/s - 45.756k times in 1.003219s (21.93μs/i)
relative_nil_dir 29.184k i/s - 30.960k times in 1.060844s (34.26μs/i)
absolute_nil_dir 33.260k i/s - 35.388k times in 1.063982s (30.07μs/i)
relative_relative_dir 28.179k i/s - 29.892k times in 1.060801s (35.49μs/i)
absolute_relative_dir 32.118k i/s - 33.948k times in 1.056964s (31.13μs/i)
relative_absolute_dir 31.790k i/s - 34.008k times in 1.069771s (31.46μs/i)
relative_nil_notexist 37.898k i/s - 39.276k times in 1.036371s (26.39μs/i)
absolute_nil_notexist 34.093k i/s - 35.460k times in 1.040093s (29.33μs/i)
relative_relative_notexist 32.874k i/s - 34.464k times in 1.048382s (30.42μs/i)
absolute_relative_notexist 33.897k i/s - 35.736k times in 1.054244s (29.50μs/i)
relative_absolute_notexist 28.660k i/s - 30.480k times in 1.063515s (34.89μs/i)
Calculating -------------------------------------
new/ruby old/ruby
relative_nil 69.788k 28.397k i/s - 205.887k times in 2.950191s 7.250217s
absolute_nil 58.143k 32.227k i/s - 172.162k times in 2.961032s 5.342120s
relative_relative 55.612k 27.348k i/s - 166.447k times in 2.992992s 6.086198s
absolute_relative 58.391k 30.164k i/s - 173.738k times in 2.975413s 5.759725s
relative_absolute 45.409k 30.837k i/s - 136.827k times in 3.013203s 4.437175s
relative_nil_dir 29.215k 29.319k i/s - 87.552k times in 2.996856s 2.986143s
absolute_nil_dir 32.813k 33.347k i/s - 99.779k times in 3.040857s 2.992129s
relative_relative_dir 28.121k 28.421k i/s - 84.536k times in 3.006131s 2.974449s
absolute_relative_dir 32.306k 32.606k i/s - 96.355k times in 2.982614s 2.955131s
relative_absolute_dir 31.942k 31.896k i/s - 95.369k times in 2.985704s 2.989975s
relative_nil_notexist 38.438k 20.326k i/s - 113.692k times in 2.957835s 5.593418s
absolute_nil_notexist 34.183k 22.248k i/s - 102.279k times in 2.992088s 4.597170s
relative_relative_notexist 32.986k 19.736k i/s - 98.620k times in 2.989738s 4.996882s
absolute_relative_notexist 33.944k 22.027k i/s - 101.691k times in 2.995879s 4.616749s
relative_absolute_notexist 28.781k 21.903k i/s - 85.979k times in 2.987334s 3.925385s
Comparison:
relative_nil
new/ruby: 69787.7 i/s
old/ruby: 28397.4 i/s - 2.46x slower
absolute_nil
new/ruby: 58142.6 i/s
old/ruby: 32227.3 i/s - 1.80x slower
relative_relative
new/ruby: 55612.2 i/s
old/ruby: 27348.3 i/s - 2.03x slower
absolute_relative
new/ruby: 58391.2 i/s
old/ruby: 30164.3 i/s - 1.94x slower
relative_absolute
new/ruby: 45409.2 i/s
old/ruby: 30836.5 i/s - 1.47x slower
relative_nil_dir
old/ruby: 29319.4 i/s
new/ruby: 29214.6 i/s - 1.00x slower
absolute_nil_dir
old/ruby: 33347.2 i/s
new/ruby: 32812.8 i/s - 1.02x slower
relative_relative_dir
old/ruby: 28420.7 i/s
new/ruby: 28121.2 i/s - 1.01x slower
absolute_relative_dir
old/ruby: 32606.0 i/s
new/ruby: 32305.6 i/s - 1.01x slower
relative_absolute_dir
new/ruby: 31941.9 i/s
old/ruby: 31896.2 i/s - 1.00x slower
relative_nil_notexist
new/ruby: 38437.6 i/s
old/ruby: 20326.0 i/s - 1.89x slower
absolute_nil_notexist
new/ruby: 34183.2 i/s
old/ruby: 22248.3 i/s - 1.54x slower
relative_relative_notexist
new/ruby: 32986.2 i/s
old/ruby: 19736.3 i/s - 1.67x slower
absolute_relative_notexist
new/ruby: 33943.6 i/s
old/ruby: 22026.5 i/s - 1.54x slower
relative_absolute_notexist
new/ruby: 28781.2 i/s
old/ruby: 21903.3 i/s - 1.31x slower
Updated by jeremyevans0 (Jeremy Evans) over 5 years ago
After discussion with some OpenBSD developers, the non-POSIX behavior of realpath(3) in OpenBSD will be removed in the future (POSIX realpath(3) requires that all components of the path exist). So I've updated the pull request (https://github.com/ruby/ruby/pull/2205) to remove some of the ifdefs, and now all operating systems that implement realpath(3) will use the same behavior in Ruby for File.realpath
and File.realdirpath
.
Updated by jeremyevans0 (Jeremy Evans) over 5 years ago
I've made a couple minor tweaks to the pull request: https://github.com/ruby/ruby/pull/2205. Travis and AppVeyvor tests still pass with it. I plan to commit this in a few days, and if there is any fallout, I will address that by falling back to the emulated approach on those platforms.
Updated by jeremyevans0 (Jeremy Evans) over 5 years ago
- Status changed from Open to Closed
Pull request squashed and merged as 11c311e36fa6f27a9144b0aefe16bdffea651251.