Bug #5486

closed

rb_stat() doesn’t respect input encoding

Added by now (Nikolai Weibull) over 12 years ago. Updated about 12 years ago.

Status: Closed
Target version: -
ruby -v: -
Backport:
[ruby-core:40412]

Description

rb_stat() overrides the input string’s encoding and re-encodes the path to one of several encodings through rb_str_encode_ospath(). This may be convenient for certain kinds of user input, or for input from a source file in a different encoding, but it isn’t good for other kinds of user input or for input from other methods, such as Dir.entries.

If Ruby wants us to be explicit about encodings, then Ruby shouldn’t change them behind our backs.

I suspect that this issue may appear in various other functions as well.
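For reference, the two encodings involved can be inspected from Ruby: the encoding attached to a path string, and Ruby's filesystem encoding (`Encoding.find("filesystem")`). This is a minimal sketch assuming Ruby 2.0 or later; `describe_path` is a hypothetical helper, and it only reports encodings. It does not reproduce the conversion that rb_str_encode_ospath() performs in C.

```ruby
# Hypothetical helper: report the encoding carried by a path string
# next to the encoding Ruby uses for file names. "filesystem" is one
# of the special names accepted by Encoding.find.
def describe_path(path)
  fs = Encoding.find("filesystem")
  {
    path_encoding: path.encoding,
    filesystem_encoding: fs,
    matches_filesystem: path.encoding == fs
  }
end

p describe_path("t/\u00E5")   # a UTF-8 literal
p describe_path("t/\u00E5".b) # the same bytes, tagged ASCII-8BIT
```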

Updated by usa (Usaku NAKAMURA) over 12 years ago

Sorry, I can't understand your point.
If you think there is a bug, would you show us the bug by code?

Updated by now (Nikolai Weibull) over 12 years ago

On Fri, Oct 28, 2011 at 07:28, Usaku NAKAMURA wrote:

Sorry, I can't understand your point.
If you think there is a bug, would you show us the bug by code?

That’s hard to do, but name a file in an encoding other than the
'filesystem' encoding on an NTFS filesystem. What I did was accidentally
create a file whose name was encoded in UTF-16. Then run
Dir.entries('dir').each { |e| printf "%p: %s\n", e, File.file?(e) },
where 'dir' is the directory containing this file. File.file?(e) will
return false for this file, even though it’s a file. The problem, as
explained, is in rb_stat(), which re-encodes its argument in the
'filesystem' encoding.

Updated by now (Nikolai Weibull) over 12 years ago

Actually, it’s probably easier than that. It can be done on an HFS+
filesystem (and probably any other, as well) just as easily:

% echo $LC_CTYPE
UTF-8
% mkdir t
% touch t/å
% cat > a.rb
# -*- coding: utf-8 -*-
Dir.new('t').entries.each{ |e| printf "%p, %p, %s\n", e, e.encoding,
File.file?(e) }
^D
% ruby --version
ruby 2.0.0dev (2011-10-26 trunk 33526) [x86_64-darwin10.8.0]
% ruby a.rb
".", #<Encoding:UTF-8>, false
"..", #<Encoding:UTF-8>, false
"å", #<Encoding:UTF-8>, false

I guess the problem is that Ruby assumes that it can apply an encoding
to something that it gets from the filesystem when it would probably
be better not to do so. It should probably be BINARY (that is,
ASCII-8BIT) instead of UTF-8.
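In Ruby, BINARY is an alias for ASCII-8BIT, so the two suggestions name the same encoding. A caller who wants raw bytes can request them from Dir.entries through its `encoding:` option. The sketch below assumes a Unix-like system and Ruby 2.0 or later, and it stats each name through its full path.

```ruby
require "tmpdir"

# BINARY and ASCII-8BIT refer to the same Encoding object.
p Encoding::BINARY.equal?(Encoding::ASCII_8BIT)

Dir.mktmpdir do |dir|
  File.write(File.join(dir, "\u00E5"), "")

  # Request the names as raw bytes instead of a locale-derived encoding,
  # then stat each one through its full path.
  names = Dir.entries(dir, encoding: Encoding::BINARY).reject { |n| n == "." || n == ".." }
  names.each do |name|
    p [name, name.encoding, File.file?(File.join(dir, name))]
  end
end
```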

(It turns out that this example gave the same results in 1.8.7 (minus
the e.encoding), so perhaps I’m doing something else wrong.)

Trying to do

p File.file?('t/å'.encode('UTF-16LE'))

results in

in `file?': path name must be ASCII-compatible (UTF-16LE): "t/\u00E5"
(Encoding::CompatibilityError)

I give up.
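That error comes from a check that path arguments, including the one passed to File.file?, use an ASCII-compatible encoding; UTF-16LE is not one. A minimal workaround sketch is to transcode such strings to UTF-8 before the call. `file_in_any_encoding?` is a hypothetical helper, and the sketch assumes the file name was created from a UTF-8 string, as it is here.

```ruby
require "tmpdir"

# Hypothetical helper: File.file? rejects path strings whose encoding is
# not ASCII-compatible (UTF-16LE, UTF-32, ...), so transcode those to UTF-8.
def file_in_any_encoding?(path)
  path = path.encode(Encoding::UTF_8) unless path.encoding.ascii_compatible?
  File.file?(path)
end

Dir.mktmpdir do |dir|
  name = File.join(dir, "\u00E5")
  File.write(name, "")
  utf16 = name.encode(Encoding::UTF_16LE)

  begin
    File.file?(utf16)
  rescue Encoding::CompatibilityError => e
    puts "File.file? rejected the UTF-16LE path (#{e.class})"
  end

  p file_in_any_encoding?(utf16)
end
```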

Updated by nobu (Nobuyoshi Nakada) over 12 years ago

  • ruby -v changed from ruby 1.9.3dev (2011-09-13 revision 33263) [i386-mingw32] to -

Hi,

(11/10/28 15:35), Nikolai Weibull wrote:

Actually, it’s probably easier than that. It can be done on an HFS+
filesystem (and probably any other, as well) just as easily:

It's not true.

% echo $LC_CTYPE
UTF-8
% mkdir t
% touch t/å
% cat > a.rb
# -*- coding: utf-8 -*-
Dir.new('t').entries.each{ |e| printf "%p, %p, %s\n", e, e.encoding,
File.file?(e) }
^D

`e' doesn't have directory prefix, "t/". It can't stat.

$ ruby -v -C t -e 'Dir.foreach(".") {|e| printf "%p, %p, %p\n", e, e.encoding, File.file?(e)}'
ruby 2.0.0dev (2011-10-25 trunk 33523) [universal.x86_64-darwin11.2.0]
".", #<Encoding:UTF-8>, false
"..", #<Encoding:UTF-8>, false
"å", #<Encoding:UTF-8>, true

--
Nobu Nakada

Updated by now (Nikolai Weibull) over 12 years ago

On Fri, Oct 28, 2011 at 09:20, Nobuyoshi Nakada wrote:

`e' doesn't have directory prefix, "t/".  It can't stat.

Ouch, of course. How stupid of me. That explains why it didn’t work
under 1.8.7 either.
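Following nobu's explanation, the a.rb loop works once each name is joined with the directory it was read from. A minimal sketch; `entry_report` is a hypothetical name, and running it with an argument (for example `ruby a.rb t`) inspects that directory instead of the current one.

```ruby
# Hypothetical helper: list a directory and stat each entry through its
# full path ("t/å"), not the bare name ("å") that Dir.entries returns.
def entry_report(dir)
  Dir.entries(dir).map do |e|
    [e, e.encoding, File.file?(File.join(dir, e))]
  end
end

entry_report(ARGV.fetch(0, ".")).each do |name, enc, is_file|
  printf "%p, %p, %p\n", name, enc, is_file
end
```

Running the original script from inside the directory, as nobu did with `ruby -C t`, has the same effect.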

The point still remains valid on Windows, however:

% mkdir t
% touch t/→
% ruby -v -C t -e 'Dir.foreach(".") {|e| printf "%p, %p, %p\n", e, e.encoding, File.file?(e)}'
ruby 1.9.3dev (2011-09-13 revision 33263) [i386-mingw32]
".", #<Encoding:Windows-1252>, false
"..", #<Encoding:Windows-1252>, false
"?", #<Encoding:Windows-1252>, false

Hm, I guess here the result of Dir.foreach is broken.
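Dir.foreach also accepts an `encoding:` option, which lets the caller state the encoding of the returned names instead of relying on the default (Windows-1252 above). This is a sketch, not a fix for the output above: a name that has already been replaced with "?" cannot be recovered by labeling it differently. `utf8_entries` is a hypothetical helper.

```ruby
# Hypothetical helper: collect directory entries with an explicitly
# requested encoding instead of the default filesystem encoding.
def utf8_entries(dir)
  names = []
  Dir.foreach(dir, encoding: Encoding::UTF_8) { |e| names << e }
  names
end

utf8_entries(".").each do |e|
  # valid_encoding? flags names whose bytes are not actually UTF-8.
  printf "%p, %p, %p\n", e, e.encoding, e.valid_encoding?
end
```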

Here’s another case:

% ruby -v -rfind -e 'Find.find("t").each{ |e| printf "%p, %s, %p, %p\n", e, e.dump, e.encoding, File.file?(e)}'
"t", "t", #<Encoding:UTF-8>, false
"t/?", "t/?", #<Encoding:ASCII-8BIT>, false

Equally broken, I guess.
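Unlike Dir.foreach, Find.find yields paths that already include the starting directory ("t/?" above), so in this output the failure is not caused by a missing prefix. A minimal sketch of walking a tree with it; `files_under` is a hypothetical helper.

```ruby
require "find"

# Hypothetical helper: walk a tree and keep the regular files. Find.find
# yields each path joined to its parent directory, so no File.join is needed.
def files_under(root)
  found = []
  Find.find(root) do |path|
    found << path if File.file?(path)
  end
  found
end

# Example: files_under("t").each { |path| printf "%p, %p\n", path, path.encoding }
```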

Updated by ko1 (Koichi Sasada) about 12 years ago

  • Status changed from Open to Assigned
  • Assignee set to nobu (Nobuyoshi Nakada)

Updated by nobu (Nobuyoshi Nakada) about 12 years ago

  • Category changed from core to M17N
  • Status changed from Assigned to Feedback
  • Priority changed from 5 to 3

Does this issue still occur?

Updated by now (Nikolai Weibull) about 12 years ago

On Sun, Mar 11, 2012 at 22:41, Nobuyoshi Nakada wrote:

Issue #5486 has been updated by Nobuyoshi Nakada.

Category changed from core to M17N
Status changed from Assigned to Feedback
Priority changed from High to Low

Does this issue still occur?

Yes, it still occurs against trunk:

ruby 1.9.3dev (2011-09-13 revision 33263) [i386-mingw32]

Updated by now (Nikolai Weibull) about 12 years ago

2012/3/15 U.Nakamura :

Hello,

In message "[ruby-core:43260] Re: [ruby-core:43236] [ruby-trunk - Bug #5486][Feedback] rb_stat() doesn’t respect input encoding"
   on Mar.13,2012 18:03:04, wrote:

Yes, it still occurs against trunk:

ruby 1.9.3dev (2011-09-13 revision 33263) [i386-mingw32]

It's not trunk...
It seems too old.

How can you say that? I just tested it and got the same results. I
showed you my version string above, that’s trunk.

Updated by Anonymous about 12 years ago

hi Nikolai,

try compiling an updated version of trunk from this repository:
svn co http://svn.ruby-lang.org/repos/ruby/branches/ruby_1_9_3

your version indicates it's from last year. here's a version string from
a recent compilation on my system:
% ./ruby --version
ruby 1.9.3p163 (2012-03-14 revision 35012) [x86_64-darwin11.3.0]

does that help at all?

Updated by naruse (Yui NARUSE) about 12 years ago

2012/3/15 Trevor Wennblom :

try compiling an updated version of trunk from this repository:
 svn co http://svn.ruby-lang.org/repos/ruby/branches/ruby_1_9_3

It is ruby_1_9_3 branch, not trunk.
For trunk,
svn co http://svn.ruby-lang.org/repos/ruby/trunk

--
NARUSE, Yui  

Updated by naruse (Yui NARUSE) about 12 years ago

  • Status changed from Feedback to Closed

Updated by now (Nikolai Weibull) about 12 years ago

Argh, sorry. I ran the test with the incorrect PATH, after all. Yes,
this issue has been resolved. You can close it.
