Misc #17565
openPrefer use of access(2) in rb_file_load_ok() to check for existence of require'd files
Description
When using Ruby in Docker (2.5 in our case, but the code is unchanged in 15 years across all versions) with a large $LOAD_PATH some millions of calls are made to open(2)
with a mean cost of 130µsec per call, where a call to access(2)
has a cost around 5× lower (something around 28µsec).
With a Rails 5 app, without Zeitwerk, the load path is searched iteratively looking for a file to define a constant, this causes something like 2,000,000 calls to open(2)
of which 97.5% are failing with ENOENT
.
I believe that the cost of two syscalls (open(2)
only after successful access(2)
) would, in our case, at least because we would shave-off something like 1,900,000×90µsec (2.85 minutes) from the three minute boot time for our application.
I prepared a very naïve patch with a simple early-return in rb_file_load_ok
:
diff --git a/file.c b/file.c
index 3bf092c05c..c7a7635125 100644
--- a/file.c
+++ b/file.c
@@ -5986,6 +5986,16 @@ rb_file_load_ok(const char *path)
O_NDELAY |
#endif
0);
+ if (access(path, R_OK) == -1) return 0;
int fd = rb_cloexec_open(path, mode, 0);
if (fd == -1) return 0;
rb_update_max_fd(fd);
This hasn't been exhaustively tested as I simply haven't had time yet, but at least it compiled and passed make check
.
I spoke with Aaron Patterson on Twitter, who suggested maybe a wiser approach would be a heuristic approach one level higher (rb_find_file
?) which switches the strategy based on the length of the LOAD_PATH.
Alternatively, maybe the patch could be conditional, guarded somehow, and conditionally compiled only into the Rubies built for Docker, in a way that is portable to the common Ruby version managers.
I am opening this ticket to track my own work, as much as anything, with no expectation that someone implement this on my behalf. I am eager to contribute to Ruby for all the benefit I have seen from it in my career.
If someone knows hints why this may be an unsuccessful adventure, I gratefully receive any and all feedback.
Updated by leehambley (Lee Hambley) almost 4 years ago
There is also eaccess
conditionally defined depending on the platform, which does a suite more tests, probably that should be considered in preference to access
directly, those checks are probably important in some environments.
Updated by byroot (Jean Boussier) almost 4 years ago
open(2) with a mean cost of 130µsec per call, where a call to access(2) has a cost around 5× lower (something around 28µsec)
Which system was this measured on? Is this only a Docker thing?
Also was it 5x faster on missing files or on existing ones? Because intuitively you'd think for a missing file the same work would be performed (path lookup).
Also note that recent OSX are infamous for their slow open(2)
(and stat()
) calls (because of some security feature), I'm not sure wether access(2)
is impacted as well, but it could very well degrade performance on OSX.
Also ultimately, this is a problem solved by Bootsnap by avoiding the O($LOAD_PATH.size)
performance of Kernel.require
by precompiling and caching all loadable paths.
Updated by leehambley (Lee Hambley) almost 4 years ago
Which system was this measured on? Is this only a Docker thing?
Yes, recent macbook pros, couple of different configurations, all with SSDs; Docker with grpcfuse, the new driver they use for macOS and Windows sync to the "invisible" VM in the middle.
Also was it 5x faster on missing files or on existing ones? Because intuitively you'd think for a missing file the same work would be performed (path lookup).
It seems like 130µsec is the cost of calling open
in any case. I have the sumarized strace output, where mean cost per call comes out at 130µsec, but in the long-form report, the price (in case of existence, or non-existence of the file) comes out to 130±5 µsec.
Also note that recent OSX are infamous for their slow open(2) (and stat()) calls (because of some security feature), I'm not sure wether access(2) is impacted as well, but it could very well degrade performance on OSX.
Thanks for the context in that sentence, I wasn't aware of that, but makes sense. access
is deceptively simple, and the docs don't hint how careful it is, Ruby has an implementation (eaccess
) which seems to do more user/role/permission checking which according to what I understood tries to make it behave more like open
, but no clue, I need to look into it more.
Also ultimately, this is a problem solved by Bootsnap
Yes, and no, unfortunately Bootsnap has a really weird interaction with Docker, Docker can be optimized for fast reads, or writes (to some extent, the docs aren't entirely accurate regarding cached/deferred flags on Docker mounts, it was never fully implemented) but Bootsnap causes a lot of writes when booting, so Bootsnap can slow down the boot in our experience.
Bootsnap assumes that checking if the cache is up to date is fast, and it simply isn't on Docker.
...performance of Kernel.require by pre-compiling and caching all loadable paths.
I wonder how to reconcile such a caching with the other problem of Docker (which is often that you don't get reliable filesystem events to clear a cache with), and that Rails for example can be a long-living Ruby process which may need to be able to require
newly created files (controllers, models, etc) that were not in the cache.
I need to continue testing before I can formulate any proposal, but whatever should happen, should probably only be compiled for container workloads of a certain category.
Updated by byroot (Jean Boussier) almost 4 years ago
I wonder how to reconcile such a caching with the other problem of Docker (which is often that you don't get reliable filesystem events to clear a cache with)
Bootsnap doesn't rely on FS events.
Rails for example can be a long-living Ruby process which may need to be able to require newly created files
In newer setup using Zeitwerk, It's better to not have the apps path cached by Bootsnap, because Zeitwerk only require absolute paths anyway, so it's redundant. There's an option in Rails for that: config.add_autoload_paths_to_load_path = false
. I suppose it will become the default soon when the "classic" autoloader is fully removed.
Bootsnap causes a lot of writes when booting
I assume you refer to the compile cache? You should try enabling Bootsnap load_path caching, but disabling the other caches (Iseq & YAML). Honestly the Iseq cache isn't that effective anyway, the real big saver is the load_path_cache, and that one is very small, so doesn't write a lot. It does need to scan directories though.