Bug #5297


Either File.expand_path or File.join is corrupting string encoding

Added by luislavena (Luis Lavena) over 10 years ago. Updated about 10 years ago.

Status: Closed
Priority: Normal
Target version:
ruby -v: ruby 1.9.4dev (2011-09-07 trunk 33212) [i386-mingw32]
Backport:
[ruby-core:39355]

Description

Hello,

While working on some API improvements for Windows, I found the following issue:

https://gist.github.com/1202366

V:\fóñè>ruby -v
ruby 1.9.4dev (2011-09-07 trunk 33212) [i386-mingw32]

V:\fóñè>chcp 1252
Active code page: 1252

V:\fóñè>ruby -e "puts Encoding.default_external"
Windows-1252

V:\fóñè>irb
irb(main):001:0> a = File.expand_path "."
=> "V:/fóñè"
irb(main):002:0> a.encoding
=> #
irb(main):003:0> b = Dir.glob("../*").first
=> "../fóñè"
irb(main):004:0> b.encoding
=> #
irb(main):005:0> File.expand_path b
=> "V:/fóñè"
irb(main):006:0> c = File.expand_path b
=> "V:/fóñè"
irb(main):007:0> c.encoding
=> #
irb(main):008:0> d = File.join(a, "foo")
=> "V:/f\xF3\xF1\xE8/foo"
irb(main):009:0> d.encoding
=> #                          # <= FUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUU
irb(main):010:0> e = "#{a}/foo"
=> "V:/fóñè/foo"
irb(main):011:0> e.encoding
=> #
irb(main):012:0> File.open(d, "w+") { |f| f.puts "hi" }
Errno::ENOENT: No such file or directory - V:/fóñè/foo       # <= W.T.F.????
        from (irb):12:in `initialize'
        from (irb):12:in `open'
        from (irb):12
        from C:/Users/Luis/Tools/Ruby/ruby-head-i386-mingw32/bin/irb:12:in `<main>'
irb(main):013:0> File.open(e, "w+") { |f| f.puts "hi" }
Errno::ENOENT: No such file or directory - V:/fóñè/foo       # <= W.T.F. * 20!
        from (irb):13:in `initialize'
        from (irb):13:in `open'
        from (irb):13
        from C:/Users/Luis/Tools/Ruby/ruby-head-i386-mingw32/bin/irb:12:in `<main>'
irb(main):014:0>

It is not clear why File.join broke the encoding while File.expand_path and string interpolation did not.

Even worse, File.open failed for both the joined and the interpolated path.
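
One way to narrow this down is to compare the two strings byte by byte rather than by how irb prints them. The following is a minimal sketch, not part of the original session: the `describe` helper is hypothetical, and the path is built by hand as a Windows-1252 string so the snippet does not depend on the V: drive. On an affected build, the File.join line would presumably report a different encoding than the interpolation line while listing the same bytes.

```ruby
# Hypothetical helper: report a string's encoding tag, validity, and raw bytes.
def describe(label, str)
  hex = str.bytes.map { |b| format("%02X", b) }.join(" ")
  "#{label}: encoding=#{str.encoding} valid=#{str.valid_encoding?} bytes=#{hex}"
end

# Assumption: a Windows-1252 path like the one in the report, built by hand.
base = "V:/f\u00F3\u00F1\u00E8".encode(Encoding::Windows_1252)

joined       = File.join(base, "foo")
interpolated = "#{base}/foo"

puts describe("File.join", joined)
puts describe("interpolation", interpolated)
puts "same bytes: #{joined.bytes == interpolated.bytes}"
```

If the bytes match and only the encoding tag differs, the path data itself is intact and the problem lies in how the result is labelled.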

I'm working on a replacement function for expand_path that relies on MultiByteToWideChar + GetFullPathNameW + WideCharToMultiByte and then uses rb_filesystem_str_new_cstr to return the string.
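
The conversion steps named above can be approximated in plain Ruby to see what the round trip does to the bytes. This is only an analogue, not the fenix implementation: `String#encode` stands in for MultiByteToWideChar and WideCharToMultiByte, the GetFullPathNameW step is left out entirely, and the method name and the Windows-1252 default are assumptions made for illustration.

```ruby
# Hypothetical analogue of the code page -> wide -> code page round trip.
# The real replacement calls the Win32 API from C; nothing here does.
def roundtrip_via_wide(path, code_page = Encoding::Windows_1252)
  # Stand-in for MultiByteToWideChar: code page bytes to UTF-16LE.
  wide = path.encode(Encoding::UTF_16LE)

  # GetFullPathNameW would resolve the wide path here; omitted in this sketch.

  # Stand-in for WideCharToMultiByte: UTF-16LE back to the code page.
  wide.encode(code_page)
end

original = "V:/f\u00F3\u00F1\u00E8".encode(Encoding::Windows_1252)
result   = roundtrip_via_wide(original)

puts result.encoding
puts result.bytes == original.bytes
```

The property that matters is that the returned string carries an explicit encoding tag that matches its bytes.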

The funny thing is that the replacement works properly:

C:\Users\Luis\Projects\oss\me\fenix>ripl -Ilib
>> require "fenix"
=> true
>> Dir.chdir "V:"
=> 0
>> Dir.pwd
=> "V:/fóñè"
>> c = Fenix::File.expand_path "."
=> "V:/fóñè"
>> c.encoding
=> #
>> File.join(c, "foo").encoding
=> #
>> d = "#{c}/foo"
=> "V:/fóñè/foo"
>> d.encoding
=> #
>> File.open(d, "w") { |f| f.puts "hi" }
=> nil

Updated by luislavena (Luis Lavena) over 10 years ago

  • Status changed from Open to Closed

This has already been solved as part of another bug report.

Updated by patrickb (Patrick Bennett) about 10 years ago

Which other issue is this associated with?
Is this going to be patched back to 1.9.3? It's still present in 1.9.3p125

Updated by luislavena (Luis Lavena) about 10 years ago

Patrick Bennett wrote:

Which other issue is this associated with?
Is this going to be patched back to 1.9.3? It's still present in 1.9.3p125

Sorry, but with the released patchlevel 125 I can no longer reproduce this:

V:\fóñè>ruby -v
ruby 1.9.3p125 (2012-02-16) [i386-mingw32]

V:\fóñè>date /T
29/02/2012

V:\fóñè>time /T
02:46 p.m.

V:\fóñè>chcp
Active code page: 1252

V:\fóñè>ruby -e "puts Encoding.default_external"
Windows-1252

V:\fóñè>irb
irb(main):001:0> a = File.expand_path "."
=> "V:/fóñè"
irb(main):002:0> a.encoding
=> #
irb(main):003:0> b = Dir.glob("../*")[1]
=> "../fóñè"
irb(main):004:0> b.encoding
=> #
irb(main):005:0> c = File.expand_path b
=> "V:/fóñè"
irb(main):006:0> c.encoding
=> #
irb(main):007:0> d = File.join(a, "foo")
=> "V:/fóñè/foo"
irb(main):008:0> d.encoding
=> #
irb(main):009:0> e = "#{a}/foo"
=> "V:/fóñè/foo"
irb(main):010:0> e.encoding
=> #
irb(main):011:0> File.open(d, "w+") { |f| f.puts "hi" }
=> nil
irb(main):012:0> File.open(e, "w+") { |f| f.puts "hi" }
=> nil
irb(main):013:0> exit

Updated by patrickb (Patrick Bennett) about 10 years ago

With ruby 1.9.3p125 (2012-02-16) [i386-mingw32] File.join always converts to ASCII-8BIT for me no matter the encoding passed to it.
So, using your irb example up through the File.join call:
irb(main):001:0> a = File.expand_path "."
=> "d:/test-streams"
irb(main):002:0> a.encoding
=> #<Encoding:Windows-1252>
irb(main):003:0> b = Dir.glob("../*")[1]
=> "../2dot4DSTree.reg"
irb(main):004:0> b.encoding
=> #<Encoding:IBM437>
irb(main):005:0> c = File.expand_path b
=> "d:/2dot4DSTree.reg"
irb(main):006:0> c.encoding
=> #<Encoding:Windows-1252>
irb(main):007:0> d = File.join(a, "foo")
=> "d:/test-streams/foo"
irb(main):008:0> d.encoding
=> #<Encoding:ASCII-8BIT>
irb(main):009:0> File.join('foo','bar').encoding
=> #<Encoding:ASCII-8BIT>

The result is the same regardless of my default external code page: if I change it to 1252 as you have it, then b's encoding comes back as Windows-1252 instead of IBM437 (my default), but File.join still returns ASCII-8BIT. The fact that we're apparently using the same Ruby version is a little troubling, though.
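
For anyone hitting the same behaviour, a possible stopgap is to re-tag the File.join result with the encoding of its first argument when it comes back as ASCII-8BIT. This is a sketch of a workaround, not something proposed in this thread: `join_keeping_encoding` is a hypothetical name, and it assumes all arguments share compatible encodings, since `force_encoding` changes only the tag and never the bytes.

```ruby
# Hypothetical workaround: keep the first argument's encoding on the result.
def join_keeping_encoding(first, *rest)
  joined = File.join(first, *rest)
  binary = Encoding::ASCII_8BIT

  # Only re-tag when File.join dropped a non-binary encoding.
  if joined.encoding == binary && first.encoding != binary
    joined = joined.dup.force_encoding(first.encoding)
  end
  joined
end

base = "d:/test-streams".encode(Encoding::Windows_1252)
path = join_keeping_encoding(base, "foo")

puts path
puts path.encoding
```

On a build where File.join already preserves the encoding, the method returns the File.join result unchanged.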

Updated by luislavena (Luis Lavena) about 10 years ago

Patrick Bennett wrote:

With ruby 1.9.3p125 (2012-02-16) [i386-mingw32] File.join always converts to ASCII-8BIT for me no matter the encoding passed to it.

=> #<Encoding:ASCII-8BIT>

The result is the same regardless of my default external code page: if I change it to 1252 as you have it, then b's encoding comes back as Windows-1252 instead of IBM437 (my default), but File.join still returns ASCII-8BIT. The fact that we're apparently using the same Ruby version is a little troubling, though.

The problem is your system encoding.

For some reason, the conversion from IBM437 to Windows-1252 in Dir.glob is not working.

Please open a separate issue.

The issue described here is about File.join messing with encoding and causing File.open to fail.
