Bug #11412
[closed] The default filename encoding causes errors on Windows
Description
Ruby is apparently unable to find files that it has just listed as present (their names contain Japanese characters).
Demo code:
Dir.foreach('.') do |entry|
  puts "#{entry} exists? " + File.exist?(entry).to_s
end
Output:
C:\tmp\test\filenames>C:\tmp\rubybackup\ruby-2.2.2-x64-mingw32\bin\ruby.exe test.rb
. exists? true
.. exists? true
a.md exists? true
b.txt exists? true
test.rb exists? true
???.txt exists? false
Directory contents according to cmd.exe/dir:
02.08.2015  22:18    <DIR>          .
02.08.2015  22:18    <DIR>          ..
02.08.2015  22:04                 0 a.md
02.08.2015  22:04                 0 b.txt
02.08.2015  22:20                87 test.rb
02.08.2015  22:04                 0 ???.txt
The undisplayable filename contains Japanese characters and should read: 小悪党.txt
C:\tmp\test\filenames>chcp
Active code page: 850
The attached zip file contains all files needed to reproduce the problem (apart from the script, every file has a size of zero bytes).
Updated by usa (Usaku NAKAMURA) over 10 years ago
- Status changed from Open to Rejected
This is the specified behavior.
Dir.foreach returns filenames in the filesystem encoding (in your environment, probably cp850) for backward compatibility.
You can pass the encoding option to Dir.foreach:
Dir.foreach('.', encoding: 'utf-8') do |entry|
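For reference, a complete version of that suggestion might look like the sketch below. The helper name entries_with_existence is hypothetical (it is not part of Ruby), and the demo creates its own temporary directory containing a file named 小悪党.txt (written with \u escapes) rather than using the attached zip. On Windows the effect is what the reply above describes; on other platforms the entries are only tagged as UTF-8.

```ruby
require 'tmpdir'

# Hypothetical helper (not part of Ruby's API): lists a directory with an
# explicit filename encoding and records whether each entry can be found.
def entries_with_existence(dir, encoding: 'utf-8')
  result = {}
  Dir.foreach(dir, encoding: encoding) do |entry|
    result[entry] = File.exist?(File.join(dir, entry))
  end
  result
end

# Demo in a throwaway directory (an assumption made for illustration).
Dir.mktmpdir do |dir|
  name = "\u5C0F\u60AA\u515A.txt" # the filename from the report
  File.write(File.join(dir, name), '')

  # The encoding Ruby uses for filenames when none is given.
  puts "filesystem encoding: #{Encoding.find('filesystem')}"

  entries_with_existence(dir).each do |entry, exists|
    puts "#{entry} exists? #{exists}"
  end
end
```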
Updated by simoneau (Matthew Simoneau) about 10 years ago
- Subject changed from Filename encoding issues (Windows) to The default filename encoding causes errors on Windows
I'd like to see this issue reopened.
The solution of specifying UTF-8 explicitly works, but this should be the default for Ruby on Windows. Two reasons:
- This is a serious usability issue. It took me half an hour to work this out. Problems like this contribute to the perception (or reality) that Windows is a second-class platform.
- More significantly, higher-level functions like Pathname#children don't let you specify the encoding. You have to rework such code to call Dir.foreach or similar directly instead.
Updated by naruse (Yui NARUSE) almost 10 years ago
Matthew Simoneau wrote:
I'd like to see this issue reopened.
The solution of specifying UTF-8 explicitly works, but this should be the default for Ruby on Windows. Two reasons:
This is a serious usability issue. It took me half an hour to work this out. Problems like this contribute to the perception (or reality) that Windows is a second-class platform.
More significantly, higher-level functions like Pathname#children don't let you specify the encoding. You have to rework such code to call Dir.foreach or similar directly instead.
Of course, the change is under discussion, but it is still pending because of compatibility concerns.
It could be changed in Ruby 3.0.