Bug #4491

Some methods of Dir/File do not encode dirname/filename transparently

Added by yimutang (Joey Zhou) over 9 years ago. Updated over 9 years ago.

Target version:
ruby -v:
ruby 1.9.2p180 (2011-02-18) [i386-mingw32]


My Ruby version is: ruby 1.9.2p180 (2011-02-18) [i386-mingw32]

My OS is Windows 7 (Simplified Chinese), whose cmd.exe has a default code page of CP936 (aka GBK). I want my scripts to be written in UTF-8, so I've saved them as UTF-8 and added "# encoding: utf-8" on the first line.

I find that when I open a file, the filename is automatically encoded to the system's locale charmap:

puts "漢字file" ).readline #this will be fine, although the "漢字file" literal is UTF-8
puts "漢字file".encode('utf-8') ).readline #explicitly encoded
puts "漢字file".encode('gbk') ).readline
puts "漢字file".encode('big5') ).readline
puts "漢字file".encode('shift_jis') ).readline #all fine: each one is encoded to Encoding.locale_charmap before the OS is asked for the file, whatever its encoding here
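A minimal sketch of why all of the calls above find the same file, assuming (as the report describes) that the path is converted to the locale charmap before the OS is asked for it. `Encoding::GBK` stands in for `Encoding.locale_charmap` so the sketch behaves the same on any machine; the variable names are illustrative only.

```ruby
# encoding: utf-8
# "漢字file" encoded four different ways transcodes to identical GBK bytes,
# so a method that converts its path argument to the locale charmap before
# calling the OS sees one and the same file name every time.
# Assumption: Encoding::GBK stands in for Encoding.locale_charmap, which is
# GBK (CP936) on the reporter's system.

LOCALE = Encoding::GBK

# The same file name as UTF-8, GBK, Big5 and Shift_JIS strings.
variants = %w[UTF-8 GBK Big5 Shift_JIS].map { |enc| "漢字file".encode(enc) }

# Transcode each variant to the (assumed) locale charmap.
as_locale = variants.map { |s| s.encode(LOCALE) }

puts as_locale.uniq.size  # prints 1: a single name reaches the OS
```

Ruby transcodes between the legacy encodings itself (via Unicode), which is why even the Big5 and Shift_JIS literals end up as valid GBK.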

I have tested many methods of the File and Dir classes. Most of them have this convenient feature, but some do not:

p Dir.entries( "漢字dir".encode('gbk') ) # explicit encoding is required, otherwise a "No such file or directory" error is raised. Since my locale charmap is GBK, the path cannot be encoded as UTF-8, Big5 or Shift_JIS here
p Dir.glob( "漢字dir/*".encode('gbk') )
Dir.foreach( "漢字dir".encode('gbk') ) {|f| puts f} "漢字dir".encode('gbk') ).each {|f| puts f} "漢字dir".encode('gbk') ).each {|f| puts f} # all must be explicitly encoded to locale_charmap (GBK in my case)

And one File method requires explicit encoding too:

puts File.absolute_path( "漢字file".encode('gbk') )
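Until these methods convert paths themselves, a caller can do the conversion explicitly. A minimal sketch, assuming a hypothetical helper `to_locale_path` (not part of Ruby) that transcodes to the encoding reported by `Encoding.find("locale")`:

```ruby
# encoding: utf-8
# Hypothetical helper (not part of Ruby): transcode a path to the locale
# charmap before passing it to Dir.entries, Dir.glob, Dir.foreach or
# File.absolute_path, which in this report appear to hand the path to the
# OS without converting it.
def to_locale_path(path, locale = Encoding.find("locale"))
  # Already in the target encoding: return the same object unchanged.
  return path if path.encoding == locale
  path.encode(locale)
end

# On the reporter's system (locale charmap GBK), with "漢字dir" present:
#   p Dir.entries(to_locale_path("漢字dir"))
#   p Dir.glob(to_locale_path("漢字dir/*"))
#   puts File.absolute_path(to_locale_path("漢字file"))

# Locale-independent checks: an explicit target, and an ASCII-only path.
p to_locale_path("漢字dir", Encoding::GBK).encoding  # prints #<Encoding:GBK>
p Dir.entries(to_locale_path(".")).include?("..")   # prints true
```

The target encoding is a parameter so the helper can be exercised without depending on the machine's locale; in normal use the default is what matters.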

These methods may have been overlooked; they should probably behave the same way as Dir.chdir("漢字dir") or"漢字dir"), encoding the path transparently.

Best regards.


Updated by yimutang (Joey Zhou) over 9 years ago

Has this issue been forgotten? :)

Updated by naruse (Yui NARUSE) over 9 years ago

  • Category set to M17N
  • Status changed from Open to Assigned
  • Assignee set to naruse (Yui NARUSE)



Updated by naruse (Yui NARUSE) over 9 years ago

  • Assignee changed from naruse (Yui NARUSE) to usa (Usaku NAKAMURA)



Updated by usa (Usaku NAKAMURA) over 9 years ago

We now consider that this is a bug.

The current code is so complex that it is difficult to judge why it was written this way.
We need a little more time, sorry.



Updated by usa (Usaku NAKAMURA) over 9 years ago

  • Status changed from Assigned to Closed
  • % Done changed from 0 to 100

This issue was solved with changeset r31372.
Joey, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.

  • win32/{win32.c,dir.h} (rb_w32_uopendir): new API to pass UTF-8 path.

  • win32/win32.c (opendir_internal, rb_w32_opendir): extract and merge
    common part of rb_w32_opendir() and rb_w32_uopendir().

  • dir.c (do_opendir, glob_helper): encoding.

  • dir.c (dir_initialize, do_opendir): convert path to UTF-8 and call
    rb_w32_uopendir() instead of rb_w32_opendir() on Windows.
    fixes #4491, reported by Joey Zhou.
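The fix converts the path to UTF-8 and opens the directory through a new UTF-8 entry point instead of the locale-dependent one. A rough Ruby-level sketch of why that route is more general than the locale-charmap one; `normalize_dir_path` is a hypothetical helper, and the real change is C code in dir.c and win32/win32.c:

```ruby
# encoding: utf-8
# Hypothetical helper mirroring the dir.c change: whatever encoding the
# caller's string has, convert the path to UTF-8 before opening it.
def normalize_dir_path(path)
  path.encode(Encoding::UTF_8)
end

# The same directory name arriving as GBK and as Shift_JIS ends up as one
# UTF-8 string, independent of the locale charmap.
inputs = ["漢字dir".encode("GBK"), "漢字dir".encode("Shift_JIS")]
p inputs.map { |s| normalize_dir_path(s) }.uniq.size  # prints 1

# A name GBK cannot represent: the locale-charmap route has no way to
# express it, while UTF-8 (and hence a Unicode-aware opendir) can.
emoji_dir = "dir_\u{1F600}"
begin
  emoji_dir.encode("GBK")
rescue Encoding::UndefinedConversionError => e
  puts "GBK conversion failed: #{e.class}"
end
p normalize_dir_path(emoji_dir).valid_encoding?  # prints true
```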
