Project

General

Profile

Actions

Bug #19246

closed

Rebuilding the loaded feature index much slower in Ruby 3.1

Added by thomthom (Thomas Thomassen) almost 2 years ago. Updated 10 months ago.

Status:
Closed
Assignee:
-
Target version:
-
[ruby-core:111342]

Description

Some background to this issue: (This is a case that is unconventional usage of Ruby, but I hope you bear with me.)

We ship the Ruby interpreter with our desktop applications for plugin support in our application (SketchUp).

One feature we have had since, at least 2006 (maybe earlier-hard to track history beyond that) is that we had a custom alternate require method: Sketchup.require. This allows the users of our API to load encrypted Ruby files.

This originally used rb_provide to add the path to the encrypted file into the list of loaded feature. However, somewhere between Ruby 2.2 and 2.5 there was some string optimisations made and the function rb_provide would not use a copy of the string passed to it. Instead it just held on to a pointer reference. In our case that string came from user-land, being passed in from Sketchup.require and would eventually be garbage collected and cause access violation crashes.

To work around that we changed our custom Sketchup.require to push to $LOADED_FEATURES directly. There was a small penalty to the index being rebuilt after that, but it was negligible.

Recently we tried to upgrade the Ruby interpreter in our application from 2.7 to 3.1 and found a major performance reduction when using our `Sketchup.require. As in, a plugin that would load in half a second would now spend 30 seconds.

From https://bugs.ruby-lang.org/issues/18452 it sounds like there is some expected extra penalty due to changes in how the index is built. But should it really be this much?

Example minimal repro to simulate the issue:

# frozen_string_literal: true
require 'benchmark'

iterations = 200

foo_files = iterations.times.map { |i| "#{__dir__}/tmp/foo-#{i}.rb" }
foo_files.each { |f| File.write(f, "") }

bar_files = iterations.times.map { |i| "#{__dir__}/tmp/bar-#{i}.rb" }
bar_files.each { |f| File.write(f, "") }

biz_files = iterations.times.map { |i| "#{__dir__}/tmp/biz-#{i}.rb" }
biz_files.each { |f| File.write(f, "") }

Benchmark.bm do |x|
  x.report('normal') {
    foo_files.each { |file|
      require file
    }
  }
  x.report('loaded_features') {
    foo_files.each { |file|
      require file
      $LOADED_FEATURES << "#{file}-fake.rb"
    }
  }
  x.report('normal again') {
    biz_files.each { |file|
      require file
    }
  }
end
C:\Users\Thomas\SourceTree\ruby-perf>ruby27.bat
ruby 2.7.4p191 (2021-07-07 revision a21a3b7d23) [x64-mingw32]

C:\Users\Thomas\SourceTree\ruby-perf>ruby test-require.rb
       user     system      total        real
normal  0.000000   0.031000   0.031000 (  0.078483)
loaded_features  0.015000   0.000000   0.015000 (  0.038759)
normal again  0.016000   0.032000   0.048000 (  0.076940)
C:\Users\Thomas\SourceTree\ruby-perf>ruby30.bat
ruby 2.7.4p191 (2021-07-07 revision a21a3b7d23) [x64-mingw32]

C:\Users\Thomas\SourceTree\ruby-perf>ruby test-require.rb
       user     system      total        real
normal  0.000000   0.031000   0.031000 (  0.074733)
loaded_features  0.032000   0.000000   0.032000 (  0.038898)
normal again  0.000000   0.047000   0.047000 (  0.076343)
C:\Users\Thomas\SourceTree\ruby-perf>ruby31.bat
ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [x64-mingw-ucrt]

C:\Users\Thomas\SourceTree\ruby-perf>ruby test-require.rb
       user     system      total        real
normal  0.016000   0.031000   0.047000 (  0.132633)
loaded_features  1.969000  11.500000  13.469000 ( 18.395761)
normal again  0.031000   0.125000   0.156000 (  0.249130)

Right now we're exploring options to deal with this. Because the performance degradation is a blocker for us upgrading. We also have 16 years of plugins created by third party developer that makes it impossible for us to drop this feature.

Some options as-is, none of which are ideal:

  1. We revert to using rb_provide but ensure the string passed in is not owned by Ruby, instead building a list of strings that we keep around for the duration of the application process. The problem is that some of our plugin developers have on occasion released plugins that will touch $LOADED_FEATURES, and if such a plugin is installed on a user machine it might cause the application to become unresponsive for minutes. The other non-ideal issue with using rb_provide is that we're also using that in ways it wasn't really intended (from that I understand). And it's not an official API?

  2. We create a separate way for our Sketchup.require to keep track of it's loaded features, but then that would diverge even more from the behaviour of require. Replicating require functionality is not trivial and would be prone to subtle errors and possible diverge. It also doesn't address our issue that there is code out there in existing plugins that touches $LOADED_FEATURES. (And it's not something we can just ask people to clean up. From previous experience old versions stick around for a long time and is very hard to purge from circulation.)

I have two questions for the Ruby mantainers:

  1. Would it be reasonable to see an API for adding/removing/checking $LOADED_FEATURE that would allow for a more ideal implementation of a custom require functionality?

  2. Is the performance difference in rebuilding the loaded feature index really expected to be as high as what we're seeing? An increase of nearly 100 times? Is there something there that might be addressed to make the rebuild to be less expensive against? (This would really help to address our challenges with third party plugins occasionally touching the global.)

Updated by thomthom (Thomas Thomassen) almost 2 years ago

Additional note: For development purposes there are also need to remove from $LOADED_FEATURES in order to reload files without reloading the application. The application being a desktop application makes it not ideal to restart it every time you make a source code change while developing a plugin. In this case is performance impact is also a major issue as you'd constantly be hit with extremely slow reloads.

Updated by thomthom (Thomas Thomassen) almost 2 years ago

One more thing (Sorry for sounding like Colombo), we also remove items from $LOADED_FEATURES when plugins are updated. Because the application is long running and we have a plugin manager that lets the user install new versions we remove all entries in $LOADED_FEATURES that belong to the plugin being updated in order to allow the update to take effect immediately. At the moment there is no other means than manipulating the global directly. And in 3.1 the performance impact is too great to be put in front of users. (Something like rb_unprovide would be nice.)

Updated by MSP-Greg (Greg L) over 1 year ago

This does seem like a considerable slowdown on Windows.

The above code uses absolute paths with require, and also adds nonexistent files to $LOADED_FEATURES. I changed the code so that all requires are simply a File.basename,
and the items pushed to $LOADED_FEATURES are real files. Below is data for several Ruby versions, run with both Windows and WSL2/Ubuntu 22.04.

The 'normal' & normal again' benches each require 200 files. The 'loaded features' bench requires 200 files, and additionally adds 200 different file names to $LOADED_FEATURES.

Adding items to $LOADED_FEATURES has worse performance with Ruby 3.1 and later on both Windows & Ubuntu, but the Windows degradation is orders of magnitude larger.

Issue 19378 'Windows: Use less syscalls for faster require of big gems' seems to involve the same problem.

Windows

                      user     system      total        real

ruby 2.7.7p221 (2022-11-24 revision 168ec2b1e5) [x64-mingw32]
normal            0.016000   0.047000   0.063000 (  0.103496)
loaded features   0.031000   0.031000   0.062000 (  0.157406)
normal again      0.000000   0.094000   0.094000 (  0.100160)

ruby 3.0.5p211 (2022-11-24 revision ba5cf0f7c5) [x64-mingw32]
normal            0.000000   0.063000   0.063000 (  0.100240)
loaded features   0.032000   0.031000   0.063000 (  0.156637)
normal again      0.000000   0.062000   0.062000 (  0.098520)

ruby 3.1.0p0 (2021-12-25 revision fb4df44d16) [x64-mingw-ucrt]
normal            0.000000   0.062000   0.062000 (  0.174335)
loaded features   3.141000  18.860000  22.001000 ( 33.106847)
normal again      0.015000   0.265000   0.280000 (  0.400028)

ruby 3.1.3p185 (2022-11-24 revision 1a6b16756e) [x64-mingw-ucrt]
normal            0.000000   0.157000   0.157000 (  0.168443)
loaded features   3.516000  18.547000  22.063000 ( 32.569175)
normal again      0.047000   0.218000   0.265000 (  0.390879)

ruby 3.3.0dev (2023-02-27T15:23:31Z master ea830ab29d) [x64-mswin64_140]
normal            0.015000   0.094000   0.109000 (  0.175751)
loaded features   3.235000  16.640000  19.875000 ( 31.294871)
normal again      0.031000   0.203000   0.234000 (  0.386469)

                      user     system      total        real
require 800       0.015000   0.235000   0.250000 (  0.391236)  ruby 2.7.7p221
require 800       0.078000   0.156000   0.234000 (  0.391100)  ruby 3.0.5p211
require 800       0.031000   0.484000   0.515000 (  0.664927)  ruby 3.1.0p0
require 800       0.109000   0.391000   0.500000 (  0.651035)  ruby 3.1.3p185
require 800       0.062000   0.406000   0.468000 (  0.704620)  ruby 3.3.0dev

Note: Ruby master timing is similar for mingw, ucrt, & mswin builds

WSL2/Ubuntu

                      user     system      total        real

ruby 2.7.7p221 (2022-11-24 revision 168ec2b1e5) [x86_64-linux]
normal            0.001399   0.002799   0.004198 (  0.004197)
loaded features   0.065028   0.000000   0.065028 (  0.065073)
normal again      0.004481   0.000000   0.004481 (  0.004482)

ruby 3.0.4p208 (2022-04-12 revision 3fa771dded) [x86_64-linux]
normal            0.004395   0.000000   0.004395 (  0.004394)
loaded features   0.065669   0.000000   0.065669 (  0.065717)
normal again      0.004371   0.000000   0.004371 (  0.004371)

ruby 3.1.0p0 (2021-12-25 revision fb4df44d16) [x86_64-linux]
normal            0.006511   0.000000   0.006511 (  0.006505)
loaded features   0.189599   0.138964   0.328563 (  0.328602)
normal again      0.004792   0.002796   0.007588 (  0.007588)

ruby 3.1.3p185 (2022-11-24 revision 1a6b16756e) [x86_64-linux]
normal            0.004290   0.001430   0.005720 (  0.005719)
loaded features   0.098113   0.210443   0.308556 (  0.308597)
normal again      0.006900   0.000000   0.006900 (  0.006899)

ruby 3.3.0dev (2023-02-27T21:34:27Z master ba55706fb6) [x86_64-linux]
normal            0.003708   0.001236   0.004944 (  0.004942)
loaded features   0.118627   0.160026   0.278653 (  0.278693)
normal again      0.000000   0.006377   0.006377 (  0.006378)

                      user     system      total        real
require 800       0.017006   0.000000   0.017006 (  0.017004)  ruby 2.7.7p221
require 800       0.016817   0.000000   0.016817 (  0.016819)  ruby 3.0.4p208
require 800       0.023295   0.000000   0.023295 (  0.023295)  ruby 3.1.0p0
require 800       0.020107   0.000000   0.020107 (  0.020106)  ruby 3.1.3p185
require 800       0.009945   0.009533   0.019478 (  0.019476)  ruby 3.3.0dev

Updated by jeremyevans0 (Jeremy Evans) over 1 year ago

Since @matz (Yukihiro Matsumoto) asked for an update and I won't be able to attend the dev meeting, my understanding is that rebuilding the loaded feature index is much slower in Ruby 3.1+, because realpaths are calculated for every loaded feature every time the loaded feature index is rebuilt. This was done to fix #17885.

I didn't consider this a major issue at the time, because in general, the loaded feature index does not need to be rebuilt unless $LOADED_FEATURES is mutated, and it shouldn't be during normal use. Also, the slowdown is not that dramatic on platforms supporting realpath function. It's only very bad on platforms that are also emulating realpath (realpath emulation is quite slow).

One possible optimization I haven't tried yet is using the previous vm->loaded_features_realpaths hash when rebuilding the loaded feature index. Basically, first try looking at the previous hash for the realpath, and if it is found there, assume the realpath has not changed. If the realpath doesn't exist in the previous hash, then try to calculate it. I'll see if I have time today to test that.

Updated by jeremyevans0 (Jeremy Evans) over 1 year ago

jeremyevans0 (Jeremy Evans) wrote in #note-4:

One possible optimization I haven't tried yet is using the previous vm->loaded_features_realpaths hash when rebuilding the loaded feature index. Basically, first try looking at the previous hash for the realpath, and if it is found there, assume the realpath has not changed. If the realpath doesn't exist in the previous hash, then try to calculate it. I'll see if I have time today to test that.

I wasn't able to use this approach, since the hash needs to map loaded feature paths to realpaths (and vm->loaded_features_realpaths does not do that), but I was able to add a new hash that does that, and significantly improves performance even when realpath is not emulated. I submitted a pull request for it: https://github.com/ruby/ruby/pull/7699

Actions #6

Updated by jeremyevans (Jeremy Evans) over 1 year ago

  • Status changed from Open to Closed

Applied in changeset git|1f115f141dd17f75049a5e17107906c5bcc372e1.


Speed up rebuilding the loaded feature index

Rebuilding the loaded feature index slowed down with the bug fix
for #17885 in 79a4484a072e9769b603e7b4fbdb15b1d7eccb15. The
slowdown was extreme if realpath emulation was used, but even when
not emulated, it could be about 10x slower.

This adds loaded_features_realpath_map to rb_vm_struct. This is a
hidden hash mapping loaded feature paths to realpaths. When
rebuilding the loaded feature index, look at this hash to get
cached realpath values, and skip calling rb_check_realpath if a
cached value is found.

Fixes [Bug #19246]

Updated by MSP-Greg (Greg L) over 1 year ago

@jeremyevans0 (Jeremy Evans)

Thank you for working on this. Refreshed data:

ruby 2.7.8p225 (2023-03-30 revision 1f4d455848) [x64-mingw32]
                      user     system      total        real
normal            0.015000   0.078000   0.093000 (  0.103699)
loaded features   0.078000   0.078000   0.156000 (  0.157629)
normal again      0.016000   0.063000   0.079000 (  0.100165)

ruby 3.0.6p216 (2023-03-30 revision 23a532679b) [x64-mingw32]
                      user     system      total        real
normal            0.031000   0.047000   0.078000 (  0.102176)
loaded features   0.078000   0.062000   0.140000 (  0.155943)
normal again      0.000000   0.094000   0.094000 (  0.098305)

ruby 3.1.4p223 (2023-03-30 revision 957bb7cb81) [x64-mingw-ucrt]
                      user     system      total        real
normal            0.016000   0.125000   0.141000 (  0.167011)
loaded features   4.672000  24.125000  28.797000 ( 31.891813)
normal again      0.093000   0.266000   0.359000 (  0.369579)

ruby 3.2.2 (2023-03-30 revision e51014f9c0) [x64-mingw-ucrt]
                      user     system      total        real
normal            0.047000   0.110000   0.157000 (  0.173098)
loaded features   5.141000  23.828000  28.969000 ( 31.793334)
normal again      0.062000   0.297000   0.359000 (  0.369032)

ruby 3.3.0dev (2023-04-14T08:07:33Z master 6c1b604e29) [x64-mingw-ucrt]
                      user     system      total        real
normal            0.000000   0.141000   0.141000 (  0.162872)
loaded features   0.078000   0.203000   0.281000 (  0.298789)
normal again      0.031000   0.125000   0.156000 (  0.164640)

Ruby master is approx two orders of magnitude faster than Ruby 3.1 & 3.2. A bit slower than Ruby 2.7 and 3.0, but well within acceptable results.

I assume this will be backported to Ruby 3.1 & 3.2?

Thanks again.

Greg

Updated by jeremyevans0 (Jeremy Evans) over 1 year ago

  • Backport changed from 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN to 3.0: DONTNEED, 3.1: REQUIRED, 3.2: REQUIRED

I've updated the backport flags for this issue. However, as always, it is up to the branch maintainer whether or not to backport.

Updated by thomthom (Thomas Thomassen) over 1 year ago

Thank you very much for looking into this. We'll be testing the patch to see how the performance is with our application in our, somewhat special use case.

I got a follow up question: As mentioned, the reason why we modify $LOADED_FEATURES is because we provide our own require method for encrypted source code. But the current API rb_provide isn't ideal for this scenario. Partly because the string passed in needs to always live. The other is that for our reloading functionality we need to remove items from this array. If we could use C API methods designed to add/remove to this array without needing to rebuild the cache that'd give us some very nice performance boosts. And make our logic more robust.

Is this something that could be considered? (Maybe best if I log a separate ticket for this topic?)

Updated by jeremyevans0 (Jeremy Evans) over 1 year ago

thomthom (Thomas Thomassen) wrote in #note-9:

Is this something that could be considered? (Maybe best if I log a separate ticket for this topic?)

It would be best to log a separate issue stating what your use case is, and what you proposed feature is for fixing it. Even better if you can come up with a proposed patch.

Updated by byroot (Jean Boussier) over 1 year ago

But the current API rb_provide isn't ideal for this scenario. Partly because the string passed in needs to always live.

For what it's worth, I just checked the current version of rb_provide and it uses rb_fstring_cstr() which does end up copying the char *. I'm not sure which version exactly caused you issues, but I think you could go back to use rb_provide again.

Updated by nagachika (Tomoyuki Chikanaga) over 1 year ago

  • Backport changed from 3.0: DONTNEED, 3.1: REQUIRED, 3.2: REQUIRED to 3.0: DONTNEED, 3.1: REQUIRED, 3.2: DONE

Thank you for creating a backport patch and pull request. Merged into ruby_3_2.

Updated by nagachika (Tomoyuki Chikanaga) over 1 year ago

  • Backport changed from 3.0: DONTNEED, 3.1: REQUIRED, 3.2: DONE to 3.0: DONTNEED, 3.1: REQUIRED, 3.2: REQUIRED

Updated by nagachika (Tomoyuki Chikanaga) about 1 year ago

@MSP-Greg (Greg L) Thank you for investigations and providing patches! I will handle it soon.

Updated by nagachika (Tomoyuki Chikanaga) about 1 year ago

  • Backport changed from 3.0: DONTNEED, 3.1: REQUIRED, 3.2: REQUIRED to 3.0: DONTNEED, 3.1: REQUIRED, 3.2: DONE

Updated by usa (Usaku NAKAMURA) about 1 year ago

  • Backport changed from 3.0: DONTNEED, 3.1: REQUIRED, 3.2: DONE to 3.0: DONTNEED, 3.1: DONE, 3.2: DONE

ruby_3_1 8a8d889ca2ba1ace8784e49e02a6616dd5765d9e merged revision(s) 1f115f141dd17f75049a5e17107906c5bcc372e1.

Updated by nagachika (Tomoyuki Chikanaga) 12 months ago

  • Backport changed from 3.0: DONTNEED, 3.1: DONE, 3.2: DONE to 3.0: DONTNEED, 3.1: DONE, 3.2: REQUIRED

I will fill Backport field with "3.2: REQUIRED" according to the comments in https://github.com/ruby/ruby/pull/8275. I will retry to backport to ruby_3_2.

Updated by nagachika (Tomoyuki Chikanaga) 12 months ago

  • Backport changed from 3.0: DONTNEED, 3.1: DONE, 3.2: REQUIRED to 3.0: DONTNEED, 3.1: DONE, 3.2: DONE

ruby_3_2 5baf94f9131fb45d50c8c408e007a138ced46606 merged revision(s) 1f115f141dd17f75049a5e17107906c5bcc372e1.

Updated by thomthom (Thomas Thomassen) 10 months ago

byroot (Jean Boussier) wrote in #note-11:

But the current API rb_provide isn't ideal for this scenario. Partly because the string passed in needs to always live.

For what it's worth, I just checked the current version of rb_provide and it uses rb_fstring_cstr() which does end up copying the char *. I'm not sure which version exactly caused you issues, but I think you could go back to use rb_provide again.

The comment for rb_fstring_cstr:

rb_fstring_new and rb_fstring_cstr family create or lookup a frozen
 * shared string which refers a static string literal.  `ptr` must
 * point a constant string.

https://github.com/ruby/ruby/blob/c68ce6f7f5e6020409cfe689ba5946f284828d4b/string.c#L476-L486

That doesn't sound like it's doing a copy. It's expecting a static string literal. Which is why we ran into issue when we were providing non-static strings.

Updated by byroot (Jean Boussier) 10 months ago

That doesn't sound like it's doing a copy.

My mad, you are right, I missed the copy argument is passed as false.

Honestly I think we could change that. Feature names are generally small, it wouldn't be a noticeable performance degradation to assume the char * isn't constant and to always copy it.

Actions

Also available in: Atom PDF

Like0
Like0Like0Like0Like1Like1Like1Like1Like1Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like1Like1Like0Like0