Project

General

Profile

Feature #15797

Use realpath(3) instead of custom realpath implementation if available

Added by jeremyevans0 (Jeremy Evans) 3 months ago. Updated 16 days ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Target version:
-
[ruby-core:92425]

Description

One reason to do this is simplicity, as this approach is ~30 lines of
code instead of ~200.

Performance wise, this performs 25%-115% better, using the following
benchmark on OpenBSD 6.5:

require 'benchmark'

f = File
pwd = Dir.pwd
Dir.mkdir('b') unless f.directory?('b')
f.write('b/a', '') unless f.file?('b/a')

args = [
  ["b/a", nil],
  ["#{pwd}/b/a", nil],
  ['a', 'b'],
  ["#{pwd}/b/a", 'b'],
  ["b/a", pwd]
]

args.each do |path, base|
  print "File.realpath(#{path.inspect}, #{base.inspect}): ".ljust(50)
  puts Benchmark.measure{100000.times{f.realpath(path, base)}}
end

Before:

File.realpath("b/a", nil):                          4.330000   2.990000   7.320000 (  7.316244)
File.realpath("/home/testr/ruby/b/a", nil):         3.560000   2.680000   6.240000 (  6.240951)
File.realpath("a", "b"):                            4.370000   3.080000   7.450000 (  7.452511)
File.realpath("/home/testr/ruby/b/a", "b"):         3.730000   2.640000   6.370000 (  6.371979)
File.realpath("b/a", "/home/testr/ruby"):           3.590000   2.630000   6.220000 (  6.226824)

After:

File.realpath("b/a", nil):                          1.370000   2.030000   3.400000 (  3.400775)
File.realpath("/home/testr/ruby/b/a", nil):         1.260000   2.770000   4.030000 (  4.024957)
File.realpath("a", "b"):                            2.090000   1.990000   4.080000 (  4.080284)
File.realpath("/home/testr/ruby/b/a", "b"):         1.400000   2.620000   4.020000 (  4.015505)
File.realpath("b/a", "/home/testr/ruby"):           2.150000   2.760000   4.910000 (  4.910634)

If someone could benchmark before/after with this patch on Linux and/or MacOS X,
and post the results here, I would appreciate it.

My personal reason for wanting this is that the custom realpath
implementation does not work with OpenBSD's unveil(2) system call,
which limits access to the file system, allowing for security
similar to chroot(2), without most of the downsides.

This change passes all tests except for one assertion related to
taintedness. Previously, if either argument to File.realpath is an
absolute path, then the returned value is considered not tainted.
However, I believe that behavior to be incorrect, because if there is
a symlink anywhere in the path, the returned value can contain a
section that was taken from the file system (unreliable source) that
was not marked as untainted. Example:

Dir.mkdir('b') unless File.directory?('b')
File.write('b/a', '') unless File.file?('b/a')
File.symlink('b', 'c') unless File.symlink?('c')
path = File.realpath('c/a'.untaint, Dir.pwd.untaint)
path # "/home/testr/ruby/b/a"
path.tainted? # should be true, as 'b' comes from file system

I believe it is safer to always mark the output of realpath as tainted
to prevent this issue, which is what this commit does.


Files

use-native-realpath.patch (6.31 KB) use-native-realpath.patch jeremyevans0 (Jeremy Evans), 04/26/2019 08:59 PM
use-native-realpath-v2.patch (4.64 KB) use-native-realpath-v2.patch jeremyevans0 (Jeremy Evans), 04/28/2019 03:21 AM
use-native-realpath-v3.patch (5.18 KB) use-native-realpath-v3.patch jeremyevans0 (Jeremy Evans), 04/28/2019 04:04 AM
use-native-realpath-v4.patch (6.34 KB) use-native-realpath-v4.patch jeremyevans0 (Jeremy Evans), 05/16/2019 02:40 AM

History

Updated by nobu (Nobuyoshi Nakada) 3 months ago

The tainted-ness issue seems a different story.
Could you make it a separate ticket?

Updated by jeremyevans0 (Jeremy Evans) 3 months ago

nobu (Nobuyoshi Nakada) wrote:

The tainted-ness issue seems a different story.
Could you make it a separate ticket?

Added #15803 for that.

Updated by jeremyevans0 (Jeremy Evans) 3 months ago

Attached is an updated patch against trunk, now that the taint fix in #15803 was committed. One other change in this patch is that the realpath_rec function is no longer compiled if the native realpath(3) implementation is used.

Updated by akr (Akira Tanaka) 3 months ago

PATH_MAX is dangerous.

Quotes from http://man7.org/linux/man-pages/man3/realpath.3.html

BUGS
       The POSIX.1-2001 standard version of this function is broken by
       design, since it is impossible to determine a suitable size for the
       output buffer, resolved_path.  According to POSIX.1-2001 a buffer of
       size PATH_MAX suffices, but PATH_MAX need not be a defined constant,
       and may have to be obtained using pathconf(3).  And asking
       pathconf(3) does not really help, since, on the one hand POSIX warns
       that the result of pathconf(3) may be huge and unsuitable for
       mallocing memory, and on the other hand pathconf(3) may return -1 to
       signify that PATH_MAX is not bounded.  The resolved_path == NULL
       feature, not standardized in POSIX.1-2001, but standardized in
       POSIX.1-2008, allows this design problem to be avoided.

Updated by jeremyevans0 (Jeremy Evans) 3 months ago

akr (Akira Tanaka) wrote:

PATH_MAX is dangerous.

Quotes from http://man7.org/linux/man-pages/man3/realpath.3.html

I'm not sure if this is still a problem on modern Linux, but here's an updated patch that passes a NULL pointer as the second argument to realpath(3), and then frees the pointer after the Ruby string is created.

Also, just in case Windows starts defining realpath(3) in the future, make sure not to use this implementation on Windows, as the check for path being absolute does not work with drive letters.

Updated by jeremyevans0 (Jeremy Evans) 2 months ago

Attached is the latest version of this patch, with the following improvements:

  • Add configure check for realpath(3), to enable this code path automatically.
  • Even if realpath(3) is supported, use current code if RB_REALPATH_DIR is given, as most operating systems require the file referenced by realpath(3) exist, and RB_REALPATH_DIR doesn't require this. However, OpenBSD and FreeBSD do not require the last component of the path to exist, only the preceding components, so realpath(3) can be used in the RB_REALPATH_DIR case on OpenBSD and FreeBSD.

Updated by jeremyevans0 (Jeremy Evans) about 2 months ago

I have added a pull request for this feature (https://github.com/ruby/ruby/pull/2205), and made the necessary changes so it passes Travis and AppVeyor. This changes related to fixing encoding issues and working around issues in glibc realpath(3) returning ENOTDIR and ENOENT in some cases that the current recursive realpath implementation handles correctly (falling back to the current implementation in those cases if realpath(3) returns NULL).

From some research, it looks like realpath(3) on Mac OS 10.5 and below does not handle a NULL second argument. Does Ruby still support Mac OS X versions that old (10.5 was released October 2007)? If so, I can update configure.ac to check for realpath(3) supporting a NULL second argument if the function itself is supported.

Can anyone run the benchmark in the pull request on Linux and Mac OS X (and possibly other operating systems) and post the results?

Updated by jeremyevans0 (Jeremy Evans) about 2 months ago

I was able to run the benchmarks in a Linux virtual machine. This includes the benchmarks I added for File.realdirpath and for File.realpath where the paths that do not exist. File.realdirpath always uses the emulated code on Linux, so performance is basically the same for it. For File.realpath, this results in a 30%-146% performance improvement.

Warming up --------------------------------------
              relative_nil    68.629k i/s -     70.070k times in 1.020996s (14.57μs/i)
              absolute_nil    57.387k i/s -     60.762k times in 1.058804s (17.43μs/i)
         relative_relative    55.483k i/s -     58.799k times in 1.059775s (18.02μs/i)
         absolute_relative    57.913k i/s -     61.438k times in 1.060868s (17.27μs/i)
         relative_absolute    45.609k i/s -     45.756k times in 1.003219s (21.93μs/i)
          relative_nil_dir    29.184k i/s -     30.960k times in 1.060844s (34.26μs/i)
          absolute_nil_dir    33.260k i/s -     35.388k times in 1.063982s (30.07μs/i)
     relative_relative_dir    28.179k i/s -     29.892k times in 1.060801s (35.49μs/i)
     absolute_relative_dir    32.118k i/s -     33.948k times in 1.056964s (31.13μs/i)
     relative_absolute_dir    31.790k i/s -     34.008k times in 1.069771s (31.46μs/i)
     relative_nil_notexist    37.898k i/s -     39.276k times in 1.036371s (26.39μs/i)
     absolute_nil_notexist    34.093k i/s -     35.460k times in 1.040093s (29.33μs/i)
relative_relative_notexist    32.874k i/s -     34.464k times in 1.048382s (30.42μs/i)
absolute_relative_notexist    33.897k i/s -     35.736k times in 1.054244s (29.50μs/i)
relative_absolute_notexist    28.660k i/s -     30.480k times in 1.063515s (34.89μs/i)
Calculating -------------------------------------
                             new/ruby    old/ruby
              relative_nil    69.788k     28.397k i/s -    205.887k times in 2.950191s 7.250217s
              absolute_nil    58.143k     32.227k i/s -    172.162k times in 2.961032s 5.342120s
         relative_relative    55.612k     27.348k i/s -    166.447k times in 2.992992s 6.086198s
         absolute_relative    58.391k     30.164k i/s -    173.738k times in 2.975413s 5.759725s
         relative_absolute    45.409k     30.837k i/s -    136.827k times in 3.013203s 4.437175s
          relative_nil_dir    29.215k     29.319k i/s -     87.552k times in 2.996856s 2.986143s
          absolute_nil_dir    32.813k     33.347k i/s -     99.779k times in 3.040857s 2.992129s
     relative_relative_dir    28.121k     28.421k i/s -     84.536k times in 3.006131s 2.974449s
     absolute_relative_dir    32.306k     32.606k i/s -     96.355k times in 2.982614s 2.955131s
     relative_absolute_dir    31.942k     31.896k i/s -     95.369k times in 2.985704s 2.989975s
     relative_nil_notexist    38.438k     20.326k i/s -    113.692k times in 2.957835s 5.593418s
     absolute_nil_notexist    34.183k     22.248k i/s -    102.279k times in 2.992088s 4.597170s
relative_relative_notexist    32.986k     19.736k i/s -     98.620k times in 2.989738s 4.996882s
absolute_relative_notexist    33.944k     22.027k i/s -    101.691k times in 2.995879s 4.616749s
relative_absolute_notexist    28.781k     21.903k i/s -     85.979k times in 2.987334s 3.925385s

Comparison:
                           relative_nil
                  new/ruby:     69787.7 i/s
                  old/ruby:     28397.4 i/s - 2.46x  slower

                           absolute_nil
                  new/ruby:     58142.6 i/s
                  old/ruby:     32227.3 i/s - 1.80x  slower

                      relative_relative
                  new/ruby:     55612.2 i/s
                  old/ruby:     27348.3 i/s - 2.03x  slower

                      absolute_relative
                  new/ruby:     58391.2 i/s
                  old/ruby:     30164.3 i/s - 1.94x  slower

                      relative_absolute
                  new/ruby:     45409.2 i/s
                  old/ruby:     30836.5 i/s - 1.47x  slower

                       relative_nil_dir
                  old/ruby:     29319.4 i/s
                  new/ruby:     29214.6 i/s - 1.00x  slower

                       absolute_nil_dir
                  old/ruby:     33347.2 i/s
                  new/ruby:     32812.8 i/s - 1.02x  slower

                  relative_relative_dir
                  old/ruby:     28420.7 i/s
                  new/ruby:     28121.2 i/s - 1.01x  slower

                  absolute_relative_dir
                  old/ruby:     32606.0 i/s
                  new/ruby:     32305.6 i/s - 1.01x  slower

                  relative_absolute_dir
                  new/ruby:     31941.9 i/s
                  old/ruby:     31896.2 i/s - 1.00x  slower

                  relative_nil_notexist
                  new/ruby:     38437.6 i/s
                  old/ruby:     20326.0 i/s - 1.89x  slower

                  absolute_nil_notexist
                  new/ruby:     34183.2 i/s
                  old/ruby:     22248.3 i/s - 1.54x  slower

             relative_relative_notexist
                  new/ruby:     32986.2 i/s
                  old/ruby:     19736.3 i/s - 1.67x  slower

             absolute_relative_notexist
                  new/ruby:     33943.6 i/s
                  old/ruby:     22026.5 i/s - 1.54x  slower

             relative_absolute_notexist
                  new/ruby:     28781.2 i/s
                  old/ruby:     21903.3 i/s - 1.31x  slower

Updated by jeremyevans0 (Jeremy Evans) 30 days ago

After discussion with some OpenBSD developers, the non-POSIX behavior of realpath(3) in OpenBSD will be removed in the future (POSIX realpath(3) requires that all components of the path exist). So I've updated the pull request (https://github.com/ruby/ruby/pull/2205) to remove some of the ifdefs, and now all operating systems that implement realpath(3) will use the same behavior in Ruby for File.realpath and File.realdirpath.

Updated by jeremyevans0 (Jeremy Evans) 20 days ago

I've made a couple minor tweaks to the pull request: https://github.com/ruby/ruby/pull/2205. Travis and AppVeyvor tests still pass with it. I plan to commit this in a few days, and if there is any fallout, I will address that by falling back to the emulated approach on those platforms.

Updated by jeremyevans0 (Jeremy Evans) 16 days ago

  • Status changed from Open to Closed

Pull request squashed and merged as 11c311e36fa6f27a9144b0aefe16bdffea651251.

Also available in: Atom PDF