[Feature] Dir.scan to yield dirent for efficient and composable recursive directory scanning
When you need to recursively scan a directory, you have two options. You can use Dir.glob, which is fine for small directories or simple patterns, but can easily take several seconds to complete on large repositories or with complex patterns, and it returns a very large array which tends to thrash the GC.
Or you can call Dir.foreach recursively, but then you need to stat each entry to know whether it's a directory, or even a symlink if you want to follow them. This means one syscall per directory, plus one per entry, whether file or directory. This is particularly impactful on OSX, where stat() is several times slower than on Linux because of various sandboxing features.
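For illustration, the second approach looks roughly like the sketch below. The method name scan_with_stat is made up for this example, and it uses Dir.each_child rather than Dir.foreach only to avoid filtering out "." and ".." by hand.

```ruby
# Recursively collect regular files under `root` whose names end with `ext`.
# Every entry costs one lstat call just to learn what kind of entry it is.
def scan_with_stat(root, ext = ".rb", found = [])
  Dir.each_child(root) do |name|
    path = File.join(root, name)
    stat = File.lstat(path) # one extra syscall per entry
    if stat.directory?
      scan_with_stat(path, ext, found)
    elsif stat.file? && name.end_with?(ext)
      found << path
    end
  end
  found
end
```

Because lstat does not follow symlinks, a symlinked directory is skipped here; following it would need yet another stat call on that entry.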
There's a typical example of this use case in Bootsnap.
Python added os.scandir a few years ago for exactly this purpose. It is functionally similar to Dir.each_child, except it yields DirEntry instances, which are a wrapper around the dirent struct.
I reduced the Bootsnap code into a simplified benchmark. Using os.scandir(), Python scans our main repo in a bit over 1s, which is 3 to 4 times faster than Ruby can (3-4s). For comparison's sake, Dir['**/*.rb'] also completes in about
So I believe that exposing a similar Dir.scan method, returning Dir::Entry instances with methods inspired by File::Stat such as directory?, would allow for more performant file system scanning when the query is not easily expressed with a glob pattern.
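To make the shape of the proposal more concrete, here is a pure-Ruby stand-in. Everything in it is hypothetical: DirScanSketch, its Entry class and each_ruby_file are names invented for this sketch, not an existing or agreed-upon API. The emulation still calls lstat (lazily, at most once per entry), so it shows only the interface, not the speedup; a native implementation would presumably read the entry type from the dirent where the platform provides it.

```ruby
# Hypothetical stand-in for the proposed Dir.scan; not an existing Ruby API.
module DirScanSketch
  # Mimics the proposed Dir::Entry. This emulation falls back to lstat,
  # memoized so each entry is stat-ed at most once.
  class Entry
    attr_reader :name, :path

    def initialize(dir, name)
      @name = name
      @path = File.join(dir, name)
    end

    def directory?
      lstat.directory?
    end

    def file?
      lstat.file?
    end

    def symlink?
      lstat.symlink?
    end

    private

    def lstat
      @lstat ||= File.lstat(@path)
    end
  end

  # Yields one Entry per child of `dir` ("." and ".." are not included).
  def self.scan(dir)
    return enum_for(:scan, dir) unless block_given?

    Dir.each_child(dir) { |name| yield Entry.new(dir, name) }
  end
end

# A recursive walk composed on top of the single-directory scan.
def each_ruby_file(root, &block)
  DirScanSketch.scan(root) do |entry|
    if entry.directory?
      # Pruning a whole subtree is awkward to express with a glob.
      next if entry.name == "node_modules"

      each_ruby_file(entry.path, &block)
    elsif entry.file? && entry.name.end_with?(".rb")
      block.call(entry.path)
    end
  end
end
```

The point of the design is that recursion, pruning and filtering stay in user code, while the per-entry type check no longer has to cost a separate stat call.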