Bug #4491
closed
Some methods of Dir/File do not encode dirname/filename transparently
Description
=begin
My Ruby version is: ruby 1.9.2p180 (2011-02-18) [i386-mingw32]
My OS is Windows 7 (Simplified Chinese). Its cmd.exe has a default codepage of CP936 (aka GBK), but I want my scripts to be written in UTF-8, so I've saved the scripts as UTF-8 and added "# encoding: utf-8" on the first line.
I find that when I open a file, the filename is automatically converted to the system's locale charmap:
puts File.open( "漢字file" ).readline #this will be fine, although the "漢字file" literal is UTF-8
puts File.open( "漢字file".encode('utf-8') ).readline #explicitly encoded
puts File.open( "漢字file".encode('gbk') ).readline
puts File.open( "漢字file".encode('big5') ).readline
puts File.open( "漢字file".encode('shift_jis') ).readline #all fine: each name is converted to Encoding.locale_charmap before the OS is asked for the file, whatever encoding it has here
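To illustrate why the conversion matters, here is a minimal sketch showing that the same filename produces different byte sequences in each encoding. The helper name `encoded_sizes` is hypothetical and only serves this illustration; it is not part of Ruby.

```ruby
# encoding: utf-8

# Hypothetical helper: byte length of the same text in each given encoding.
def encoded_sizes(text, encodings)
  encodings.each_with_object({}) do |enc, sizes|
    sizes[enc] = text.encode(enc).bytesize
  end
end

name = "漢字file" # a UTF-8 literal, because of the magic comment above

# Han characters take three bytes each in UTF-8 but two in GBK, Big5 and
# Shift_JIS, so the byte strings handed to the OS differ.
p encoded_sizes(name, %w[UTF-8 GBK Big5 Shift_JIS])

# Converting back to a common encoding shows the text itself is unchanged.
puts name.encode('GBK').encode('UTF-8') == name
```

If a method passes these bytes to the OS without converting them to the locale charmap first, only the variant that already matches the locale can be found.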
I have tested many methods of the File and Dir classes. Most of them have this convenient feature, but some do not:
p Dir.entries( "漢字dir".encode('gbk') ) # explicit encoding is required; otherwise a "No such file or directory" error is raised. Since my locale charmap is GBK, the path must not be encoded as utf-8, big5 or shift_jis
p Dir.glob( "漢字dir/*".encode('gbk') )
Dir.foreach( "漢字dir".encode('gbk') ) {|f| puts f}
Dir.open( "漢字dir".encode('gbk') ).each {|f| puts f}
Dir.new( "漢字dir".encode('gbk') ).each {|f| puts f} # all must be explicitly encoded to locale_charmap (GBK in my situation)
One File method requires explicit encoding too:
puts File.absolute_path( "漢字file".encode('gbk') )
These methods may have been overlooked. They should probably behave the same way as Dir.chdir("漢字dir") or File.open("漢字dir"), which encode the name transparently.
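Until the affected methods convert paths themselves, the explicit encoding can be kept in one place. The following is only a sketch of such a workaround; `to_locale_path` and `entries_in` are hypothetical helper names, not Ruby APIs, and the default target assumes that `Encoding.locale_charmap` names the encoding the OS expects.

```ruby
# encoding: utf-8

# Hypothetical helper: re-encode a path into the encoding the OS expects.
# The default target is an assumption; pass another encoding name if needed.
def to_locale_path(path, target = Encoding.locale_charmap)
  path.encode(target)
end

# Hypothetical wrapper around Dir.entries that converts the path first.
def entries_in(dir, target = Encoding.locale_charmap)
  Dir.entries(to_locale_path(dir, target))
end

# Demonstration with an explicit target, so it does not depend on the locale.
converted = to_locale_path("漢字dir", "GBK")
p converted.encoding
```

On the reporter's system, `entries_in("漢字dir")` would then stand in for `Dir.entries( "漢字dir".encode('gbk') )`.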
Best regards.
Joey
=end
Updated by yimutang (Joey Zhou) almost 13 years ago
=begin
Is this issue forgotten? :)
=end
Updated by naruse (Yui NARUSE) almost 13 years ago
- Category set to M17N
- Status changed from Open to Assigned
- Assignee set to naruse (Yui NARUSE)
Updated by naruse (Yui NARUSE) almost 13 years ago
- Assignee changed from naruse (Yui NARUSE) to usa (Usaku NAKAMURA)
Updated by usa (Usaku NAKAMURA) almost 13 years ago
=begin
We now consider this to be a bug.
The current code is too complex, so it is difficult to judge why it was written this way.
We would like a little more time, sorry.
=end
Updated by usa (Usaku NAKAMURA) almost 13 years ago
- Status changed from Assigned to Closed
- % Done changed from 0 to 100
=begin
This issue was solved with changeset r31372.
Joey, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.
- win32/{win32.c,dir.h} (rb_w32_uopendir): new API to pass UTF-8 path.
- win32/win32.c (opendir_internal, rb_w32_opendir): extract and merge common part of rb_w32_opendir() and rb_w32_uopendir().
- dir.c (do_opendir, glob_helper): encoding.
- dir.c (dir_initialize, do_opendir): convert path to UTF-8 and call rb_w32_uopendir() instead of rb_w32_opendir() on Windows. fixes #4491, reported by Joey Zhou.
=end
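The idea behind the fix can be sketched in Ruby, although the actual change lives in C (dir.c and win32/win32.c). The sketch below is conceptual only: `normalize_for_wide_api` is a hypothetical name, and the final UTF-16LE step is an assumption about what a wide-character Windows API would receive, not something stated in the changeset.

```ruby
# encoding: utf-8

# Conceptual sketch, not the real implementation: whatever encoding the
# caller used, first convert the path to UTF-8 (the step the changelog
# describes), then to UTF-16LE (assumed form for a wide-character API).
def normalize_for_wide_api(path)
  utf8 = path.encode('UTF-8')
  utf8.encode('UTF-16LE')
end

from_utf8 = normalize_for_wide_api("漢字dir")
from_gbk  = normalize_for_wide_api("漢字dir".encode('GBK'))
from_sjis = normalize_for_wide_api("漢字dir".encode('Shift_JIS'))

# All three inputs end up as the same byte sequence.
puts from_utf8 == from_gbk && from_gbk == from_sjis
```

Normalizing to one encoding before calling the OS is what makes the caller's choice of string encoding irrelevant.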