Bug #11412

closed

The default filename encoding causes errors on Windows

Added by tokudan (Daniel Frank) over 10 years ago. Updated almost 10 years ago.

Status:
Rejected
Assignee:
-
Target version:
-
ruby -v:
ruby 2.2.2p95 (2015-04-13 revision 50295) [x64-mingw32]
[ruby-core:70217]

Description

Ruby is apparently unable to find files that it has just listed as present when their names contain Japanese characters.
Demo code:
Dir.foreach('.') do |entry|
  puts "#{entry} exists? " + File.exist?(entry).to_s
end

Output:
C:\tmp\test\filenames>C:\tmp\rubybackup\ruby-2.2.2-x64-mingw32\bin\ruby.exe test.rb
. exists? true
.. exists? true
a.md exists? true
b.txt exists? true
test.rb exists? true
???.txt exists? false

Directory contents according to cmd.exe/dir:
02.08.2015 22:18 <DIR> .
02.08.2015 22:18 <DIR> ..
02.08.2015 22:04 0 a.md
02.08.2015 22:04 0 b.txt
02.08.2015 22:20 87 test.rb
02.08.2015 22:04 0 ???.txt

The undisplayable filename contains Japanese characters and should read: 小悪党.txt

C:\tmp\test\filenames>chcp
Active code page: 850

The attached zip file contains all files necessary to reproduce the problem. Apart from the script, all files have a size of zero bytes.


Files

filenames.zip (611 Bytes): all example files and the demo script. tokudan (Daniel Frank), 08/02/2015 08:27 PM

Updated by usa (Usaku NAKAMURA) over 10 years ago #1 [ruby-core:70222]

  • Status changed from Open to Rejected

This is the specified behavior.
For backward compatibility, Dir.foreach returns filenames in the filesystem encoding (in your environment, probably cp850).

You can pass the encoding option to Dir.foreach:

Dir.foreach('.', encoding: 'utf-8') do |entry|
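
For reference, the one-line suggestion above could be expanded into a complete script along the following lines. This is only a sketch: the helper name entries_with_existence is hypothetical and not part of Ruby or of this thread, and it assumes UTF-8 is the right encoding for the names involved.

```ruby
# Hypothetical helper (not from the thread): list a directory with an explicit
# filename encoding and record whether each entry can be found again.
def entries_with_existence(dir, encoding: 'utf-8')
  results = []
  Dir.foreach(dir, encoding: encoding) do |entry|
    # Join with the directory so the check does not depend on the current directory.
    results << [entry, File.exist?(File.join(dir, entry))]
  end
  results
end

# Same report format as the demo script in the description.
entries_with_existence('.').each do |entry, found|
  puts "#{entry} exists? #{found}"
end
```

Passing encoding: makes Dir.foreach yield names tagged with that encoding instead of the filesystem default, which is what allows the later File.exist? call to refer to the same file.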

Updated by simoneau (Matthew Simoneau) about 10 years ago #2 [ruby-core:71318]

  • Subject changed from Filename encoding issues (Windows) to The default filename encoding causes errors on Windows

I'd like to see this issue reopened.

The solution of specifying UTF-8 explicitly works, but this should be the default for Ruby on Windows. Two reasons:

  1. This is a serious usability issue. It took me half an hour to work this out. Problems like this contribute to the perception (or reality) that Windows is a second-class platform.

  2. More significantly, higher-level functions like Pathname#children don't let you specify the encoding. You have to rework such code to call Dir.foreach or similar directly instead.
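
The rework described in point 2 might look roughly like the sketch below. The helper children_with_encoding is a hypothetical name, not an existing Pathname or Dir method; it approximates Pathname#children by calling Dir.foreach directly with an explicit encoding, and it assumes UTF-8 is the desired encoding.

```ruby
require 'pathname'

# Hypothetical stand-in for Pathname#children that accepts an encoding.
# Returns Pathname objects prefixed with the directory, skipping "." and "..".
def children_with_encoding(dir, encoding: 'utf-8')
  base = Pathname.new(dir)
  children = []
  Dir.foreach(base.to_s, encoding: encoding) do |entry|
    next if entry == '.' || entry == '..'
    children << base.join(entry)
  end
  children
end

children_with_encoding('.').each do |child|
  puts "#{child} exists? #{child.exist?}"
end
```

The order of the returned entries follows whatever Dir.foreach yields, so callers that need a stable order should sort the result themselves.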

Updated by naruse (Yui NARUSE) almost 10 years ago #3 [ruby-core:71366]

Matthew Simoneau wrote:

I'd like to see this issue reopened.

The solution of specifying UTF-8 explicitly works, but this should be the default for Ruby on Windows. Two reasons:

  1. This is a serious usability issue. It took me half an hour to work this out. Problems like this contribute to the perception (or reality) that Windows is a second-class platform.

  2. More significantly, higher-level functions like Pathname#children don't let you specify the encoding. You have to rework such code to call Dir.foreach or similar directly instead.

The change is, of course, under discussion, but it is still pending because of compatibility concerns.
It could be made in Ruby 3.0.
