Feature #8709
Closed
Dir.glob should return sorted file list
Description
On OS X, Dir.glob and Dir[] return an ordered list of files.
On Ubuntu Linux, they do not and one must manually sort them.
Returning a list of files that isn't in order fails the Principle of Least Astonishment.
I attach a unit test to demonstrate ideal behaviour.
Updated by Anonymous over 11 years ago
- Status changed from Open to Rejected
Dir.glob is documented to return filenames in filesystem order:
Note that case sensitivity depends on your system, as does the order in which the results are returned.
Updated by bmwiedemann (Bernhard M. Wiedemann) almost 5 years ago
- Status changed from Rejected to Open
There are two problems with unsorted glob:
- It is different from glob in C, bash and perl, which all sort by default. Even GNU make finally switched back to sorted wildcard/glob (https://savannah.gnu.org/bugs/index.php?52076).
- It causes problems for reproducible builds, so that developers have to patch an infinite number of callers such as https://github.com/sass/sassc-ruby/pull/178 to be able to get identical build results on identical OSes on different machines.
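The caller-side workaround mentioned above can be sketched as follows. This is only an illustration: `sorted_glob` is a hypothetical helper name, and the file names are made up for the example.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical helper: glob and then sort explicitly, so the result does not
# depend on the order in which the filesystem returns directory entries.
def sorted_glob(pattern, base:)
  Dir.glob(pattern, base: base).sort
end

Dir.mktmpdir do |dir|
  # Create the files in a deliberately non-alphabetical order.
  %w[c.rb a.rb b.rb].each { |name| FileUtils.touch(File.join(dir, name)) }

  p sorted_glob("*.rb", base: dir) # => ["a.rb", "b.rb", "c.rb"]
end
```

Every caller that needs a deterministic order has to remember to do this, which is the maintenance burden the comment describes.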
Updated by hsbt (Hiroshi SHIBATA) almost 5 years ago
- Status changed from Open to Rejected
- Backport deleted (1.9.3: UNKNOWN, 2.0.0: UNKNOWN)
Do not update the status without a maintainer's decision.
Updated by Eregon (Benoit Daloze) almost 5 years ago
- Tracker changed from Bug to Feature
- Status changed from Rejected to Open
- ruby -v deleted (ruby 1.9.3p429 (2013-05-15) [x86_64-linux] Brightbox)
I agree that always sorting the result of Dir.glob makes sense.
Non-determinism caused by Dir.glob is very annoying and IMHO doesn't feel like Ruby.
I would also expect sorting to be a low overhead compared to syscalls, so performance-wise I think it's not a big hit.
FWIW, TruffleRuby has returned sorted results for Dir.glob since 2016.
hsbt (Hiroshi SHIBATA) wrote:
Do not update the status without a maintainer's decision.
How should we rediscuss this then?
The fact that the documentation mentions it doesn't mean we should never change it.
I'll reopen as a Feature.
Updated by hsbt (Hiroshi SHIBATA) almost 5 years ago
I have no opinion about this feature.
Updated by Eregon (Benoit Daloze) almost 5 years ago
Here are some benchmark results in the ruby repository:
$ ruby -e 'p Dir["**/*"].size'
12171
$ ruby -rbenchmark -e '10.times { p Benchmark.realtime { Dir["**/*"] } }'
0.017877419999422273
0.015390422999189468
0.015255956001055893
0.015021605999208987
0.015777969998453045
0.015484851002838695
0.016179073001694633
0.015210424000542844
0.015358253996964777
0.014319942998554325
$ ruby -rbenchmark -e '10.times { p Benchmark.realtime { Dir["**/*"].sort } }'
0.017600111998035572
0.017109740001615137
0.017832364999776473
0.01726310600133729
0.018130796997866128
0.01659841600121581
0.018173008000303525
0.017528833999676863
0.017515739000373287
0.01770434499849216
So a bit slower but we can likely optimize further if desired.
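For anyone who wants to repeat the comparison as a script rather than as one-liners, here is a rough sketch. It is not the benchmark used above: it measures a temporary directory with 200 generated files instead of the ruby repository, and `mean_realtime` is a hypothetical helper. On a Ruby that already sorts by default, the first measurement includes the built-in sort, so the gap would be expected to shrink.

```ruby
require "benchmark"
require "fileutils"
require "tmpdir"

# Hypothetical helper: run the block `runs` times and return the mean
# wall-clock time in seconds.
def mean_realtime(runs)
  Array.new(runs) { Benchmark.realtime { yield } }.sum / runs
end

Dir.mktmpdir do |dir|
  # Assumed workload: 200 empty files in a single directory.
  200.times { |i| FileUtils.touch(File.join(dir, format("file%03d.txt", i))) }
  pattern = File.join(dir, "**", "*")

  plain  = mean_realtime(10) { Dir[pattern] }
  sorted = mean_realtime(10) { Dir[pattern].sort }

  puts format("plain:  %.6f s", plain)
  puts format("sorted: %.6f s", sorted)
end
```

Timings on such a small tree are noisy, so the numbers are only a rough indication.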
Updated by Eregon (Benoit Daloze) almost 5 years ago
I added this issue to the next meeting's agenda:
https://bugs.ruby-lang.org/issues/16454
Updated by bmwiedemann (Bernhard M. Wiedemann) almost 5 years ago
The benchmark numbers above show a difference of about 12%.
That is probably the worst case, because globs usually return fewer entries (though for some strange reason I get a 20% diff on a dir with 200 entries), and usually some processing is performed on the returned files that takes much longer than the sorting.
Updated by byroot (Jean Boussier) almost 5 years ago
For what it's worth I also think it should return a sorted array, because:
- Pretty much every rubyist I know has been bitten by this at least once.
- Many experienced rubyists end up always writing Dir[pattern].sort
- It's particularly prevalent because the "develop on OSX, deploy on Linux" combo is very popular.
If the performance impact is a concern, I think an extra keyword argument could be added: glob( pattern, [flags], [base: path], [sort: true] ). This way you can avoid the performance impact if you know that you don't need it.
Updated by deivid (David Rodríguez) almost 5 years ago
I got bit by this in the past too when trying to reproduce order dependent test failures (https://github.com/rubygems/rubygems/pull/2626#discussion_r254020218).
Updated by jhawthorn (John Hawthorn) almost 5 years ago
One potential issue with this is that though globs which scanned directories (ex. Dir.glob("foo/*")) would return results in an inconsistent order, globs which used purely brace expansion (ex. Dir.glob("foo/{a,b,c,d}")) would return values predictably in the order listed.
Rails versions prior to 6.0 unfortunately relied on this behaviour (6.0+ in most cases doesn't and does sorting manually). It probably shouldn't have relied on it, but it did, and I fear other libraries or tools may have done the same.
We could possibly work around that by sorting when reading directory entries rather than sorting the full result, but that's more complicated to implement and would be hard to document as an exact behaviour developers can expect/rely upon.
Updated by naruse (Yui NARUSE) almost 5 years ago
the Principle of Least Astonishment.
You shouldn't use "the Principle of Least Astonishment".
Without the term, you need to explain why the current behavior is bad and needs to change.
For example ...
The result of Dir.glob depends on the OS and filesystem. People often wrongly write code which depends on their local environment.
Though people should carefully write portable code, could we provide a guard to protect people from such pitfalls?
Many people write specs which compare the result of Dir.glob with an expected array, and those specs fail.
If Dir.glob sorted the result, people could avoid these pitfalls and reduce the cost of writing such specs.
Updated by mame (Yusuke Endoh) almost 5 years ago
Hi @jhawthorn (John Hawthorn), I'm unsure whether you agree with the proposal or not. Do you mean sorting the result may break Rails? Or not sorting the result may do so, i.e., are you against the change?
Updated by Eregon (Benoit Daloze) almost 5 years ago
@jhawthorn (John Hawthorn) Good point, I forgot to mention this.
To be correct, the sorting must respect the explicit order for {...,...} and be conceptually the same as sorting just after readdir(3), not sorting the full result.
That's also likely more efficient, due to sorting smaller arrays.
ruby/spec already captures this: 3 specs fail if sorting is done on the returned array instead of per directory.
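The difference between the two strategies can be illustrated with a small sketch. This is not how Dir.glob is implemented: it only approximates "sort per group of matches" by globbing each alternative separately, and `glob_sorted_per_pattern` is a hypothetical helper.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical helper: glob each pattern separately and sort only within
# that pattern's matches, so the order of the patterns themselves is kept.
def glob_sorted_per_pattern(patterns, base:)
  patterns.flat_map { |pattern| Dir.glob(pattern, base: base).sort }
end

Dir.mktmpdir do |dir|
  %w[a1 b0 a0 b1].each { |name| FileUtils.touch(File.join(dir, name)) }

  patterns = ["b?", "a?"] # stands in for the brace pattern "{b,a}?"
  per_pattern = glob_sorted_per_pattern(patterns, base: dir)
  whole_result = per_pattern.sort

  p per_pattern  # => ["b0", "b1", "a0", "a1"]
  p whole_result # => ["a0", "a1", "b0", "b1"]
end
```

Sorting the whole result loses the explicit b-before-a order, which is the kind of breakage the failing specs would catch.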
Updated by Eregon (Benoit Daloze) almost 5 years ago
Even C's glob(3) is sorted (by default), as @bmwiedemann (Bernhard M. Wiedemann) said:
$ man 3 glob
...
GLOB_NOSORT
Don't sort the returned pathnames. The only reason to do this is to save processing time. By default, the returned pathnames are sorted.
Updated by nobu (Nobuyoshi Nakada) almost 5 years ago
I'm for adding NOSORT option to the second argument.
Updated by matz (Yukihiro Matsumoto) almost 5 years ago
Accepted. We will add sort: false keyword option to disable sorting.
Matz.
Updated by Dan0042 (Daniel DeLorme) almost 5 years ago
It's good to sort the result of Dir["*"], but as jhawthorn pointed out the brace expansion must keep the same order. I have code that depends on this, and I'm sure many others also have code that depends on this, since it's the behavior found in the shell:
$ touch a2 a1 a0 b2 b1 b0
$ echo {a,b}?
a0 a1 a2 b0 b1 b2
$ echo {b,a}?
b0 b1 b2 a0 a1 a2
Updated by nobu (Nobuyoshi Nakada) almost 5 years ago
- Status changed from Open to Closed
Applied in changeset git|2f1081a451f21ca017cc9fdc585883e5c6ebf618.
Sort globbed results by default [Feature #8709]
Sort the results which matched single wildcard or character set in binary ascending order, unless sort: false is given. The order of an Array of pattern strings and braces are not affected.
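A short sketch of the resulting behaviour, using the file names from the shell example earlier in the thread. It assumes the change shipped in Ruby 3.0 and skips the demonstration on older versions. `sort_keyword_available?` and `glob_demo` are hypothetical helper names.

```ruby
require "fileutils"
require "tmpdir"

# Assumption: the sort: keyword is only accepted on Ruby 3.0 and later.
def sort_keyword_available?
  Gem::Version.new(RUBY_VERSION) >= Gem::Version.new("3.0")
end

# Returns the three glob results for `dir`, or nil on older Rubies.
def glob_demo(dir)
  return nil unless sort_keyword_available?

  {
    wildcard: Dir.glob("*", base: dir),               # sorted by default
    braces: Dir.glob("{b,a}?", base: dir),            # brace order is kept
    unsorted: Dir.glob("*", base: dir, sort: false)   # filesystem order
  }
end

Dir.mktmpdir do |dir|
  %w[a2 a1 a0 b2 b1 b0].each { |name| FileUtils.touch(File.join(dir, name)) }

  result = glob_demo(dir)
  if result
    p result[:wildcard] # expected: a0 a1 a2 b0 b1 b2
    p result[:braces]   # expected: b0 b1 b2 a0 a1 a2
    p result[:unsorted] # same entries, order depends on the filesystem
  else
    puts "sort: keyword not available on Ruby #{RUBY_VERSION}"
  end
end
```

With sort: false the entries are the same but their order is again up to the filesystem, so it is only worth using when the order truly does not matter.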