
Bug #877

closed

[win32] Ruby Standard Library (maybe something else): Wrong Encoding in Files, Directories and Environment Variables

Added by eveel (Dmitry A. Ustalov) over 15 years ago. Updated almost 13 years ago.

Status: Rejected
Target version:
ruby -v: -
[ruby-core:20557]

Description

=begin
I am from Russia, and my system language is set to Russian.

When I tried to create a directory with the Dir.mkdir method:

irb(main):002:0> Dir.mkdir "c:/ruby/проверка"
=> 0

The word "проверка" means "test" in Russian.

The directory name appears in the wrong character set (see the attached
screenshot).

irb(main):003:0> File.exists? "c:/ruby/проверка"
=> true
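Here is one possible explanation for the garbled name. It is an assumption, not a diagnosis of the screenshot. On a Russian console, irb works in the OEM code page CP866 (the ENV transcript below shows CP866 bytes), while the ANSI file functions read names as CP1251. The sketch below reinterprets the CP866 bytes of "проверка" as CP1251 to show the kind of mismatch that would result. It needs Ruby 1.9 or later, because 1.8.6 has no String#encode.

```ruby
# Assumption: the name reached Dir.mkdir as CP866 bytes but was read as CP1251.
typed = "проверка"                          # UTF-8 literal (default source encoding)
oem_bytes = typed.encode(Encoding::IBM866)  # bytes a CP866 console would produce

# Relabel the same bytes as CP1251 (no conversion), then decode for display.
garbled = oem_bytes.dup.force_encoding(Encoding::Windows_1251)
                   .encode(Encoding::UTF_8)

puts garbled # the name as a CP1251 reader would see it
```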

This is the root of many problems, for example when a program
tries to create a directory in %USERPROFILE%/Application Data:

Microsoft Windows XP [Версия 5.1.2600]
(С) Корпорация Майкрософт, 1985-2001.

C:\Documents and Settings\Администратор>irb
irb(main):001:0> $KCODE = 'utf8'
=> "utf8"
irb(main):002:0> ENV['userprofile']
=> "C:\Documents and Settings\\200\244\254\250\255\250стр\240тор"
irb(main):003:0> $KCODE = ''
=> ""
irb(main):004:0> ENV['userprofile']
=> "C:\Documents and Settings\\200\244\254\250\255\250\341\342\340\240\342\256\340"
irb(main):005:0> File.exists? ENV['userprofile']
=> false

The word "Администратор" means "Administrator" in Russian.
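You can decode the escaped bytes in the second ENV['userprofile'] output to see which code page they belong to. This sketch needs Ruby 2.0 or later (for String#b). It reads the bytes two ways: as CP866, the Russian OEM console code page, and as CP1251, the Russian ANSI code page. Only the CP866 reading gives "Администратор". That suggests the value arrived in the console code page rather than the one the file API uses, which would be consistent with File.exists? returning false. This is an inference from the transcript, not something the report confirms.

```ruby
# Bytes after "Documents and Settings\" in the $KCODE = '' output above.
raw = "\200\244\254\250\255\250\341\342\340\240\342\256\340".b

as_oem  = raw.dup.force_encoding(Encoding::IBM866)       # CP866, OEM/console
as_ansi = raw.dup.force_encoding(Encoding::Windows_1251) # CP1251, ANSI/file API

puts as_oem.encode(Encoding::UTF_8)  # => Администратор
puts as_ansi.encode(Encoding::UTF_8) # same bytes read as CP1251: not the user name
```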

Microsoft Windows XP [Version 5.1.2600].

C:\>ruby -v
ruby 1.8.6 (2007-09-24 patchlevel 111) [i386-mswin32]

Ruby was installed from http://rubyinstaller.rubyforge.org/.
=end


Files

ruby.png (103 KB) ruby.png screenshot eveel (Dmitry A. Ustalov), 12/15/2008 07:10 AM
multibyte-cant-help.png (42.5 KB) multibyte-cant-help.png another screenshot eveel (Dmitry A. Ustalov), 12/18/2008 10:18 PM
cucumber-method-is-also-wrong.png (5.28 KB) cucumber-method-is-also-wrong.png another wrong method eveel (Dmitry A. Ustalov), 12/20/2008 11:46 PM
Actions #1

Updated by luislavena (Luis Lavena) over 15 years ago

=begin
I have noticed issues with other things too, such as puts, print and the like.

Most of the File and IO functions Ruby uses on Windows are the ANSI variants, not the Wide ones. This limits its ability to process paths and filenames properly, and even to output strings containing UTF/Unicode characters.

The console code page also affects Ruby. By default it is 437, but 1252 is needed to get accented strings to work.

The Windows API calls Ruby uses need further review to track down these issues.
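Here is why the console code page matters. The same character is stored as different bytes in different code pages, so bytes written for one page show up as a different character under another. The sketch uses "é" as an arbitrary example (it is not from the thread) and requires Ruby 1.9 or later.

```ruby
text = "é"

cp1252 = text.encode(Encoding::Windows_1252) # Western ANSI code page
cp437  = text.encode(Encoding::IBM437)       # US console (OEM) code page

p cp1252.bytes # => [233]
p cp437.bytes  # => [130]

# CP1252 bytes shown on a CP437 console come out as a different character.
p cp1252.dup.force_encoding(Encoding::IBM437).encode(Encoding::UTF_8) # => "Θ"
```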

=end

Actions #2

Updated by eveel (Dmitry A. Ustalov) over 15 years ago

=begin
Is this a bug or a feature? :)

I hope this behavior on Windows will be corrected in new
versions of Ruby.

Is there a workaround for this bug?

Also, Russian uses cp1251, not cp1252.
=end

Actions #3

Updated by antares (Michael Klishin) over 15 years ago

=begin
Both cp1251 and cp1252 are Microsoft extensions of ASCII, and Ruby 1.8 assumes all strings are plain ASCII unless you use a multibyte gem or ActiveSupport. So try that. If you can get mkdir working in the Windows console (not irb), you can try calling it through Kernel#system.
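You can check the "ASCII extension" point in Ruby 1.9 or later. Both code pages share the ASCII lower half, so ASCII-only names are byte-identical. In the upper half, the same byte maps to different letters. A Ruby 1.8 string is only a sequence of bytes, so it cannot record which of the two code pages was meant.

```ruby
# Both code pages keep ASCII (0x00-0x7F) unchanged.
p Encoding::Windows_1251.ascii_compatible? # => true
p Encoding::Windows_1252.ascii_compatible? # => true
p "Dir".encode(Encoding::Windows_1251).bytes ==
  "Dir".encode(Encoding::Windows_1252).bytes # => true

# Above 0x7F they diverge: byte 0xE0 is Cyrillic "а" in CP1251, "à" in CP1252.
byte = "\xE0".b
p byte.dup.force_encoding(Encoding::Windows_1251).encode(Encoding::UTF_8) # => "а"
p byte.dup.force_encoding(Encoding::Windows_1252).encode(Encoding::UTF_8) # => "à"
```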
=end

Actions #4

Updated by eveel (Dmitry A. Ustalov) over 15 years ago

=begin
I've tried this workaround, but it doesn't seem to work.
=end

Actions #5

Updated by antares (Michael Klishin) over 15 years ago

=begin
Well, to Ruby, Cyrillic characters are just integers (like any other characters), so it passes those values through as-is, and Windows does not normalize them (OS X does, for instance). I see no way to fix this in the 1.8.x branch, and in 1.9 you already have encoding-aware strings, IO objects and so forth. But I am by no means an M17N expert and may be wrong.
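Here is a brief Ruby 1.9+ sketch of what "encoding-aware strings" add. Each String carries an encoding tag alongside its bytes, and transcoding between encodings is explicit. Mislabelled bytes can also be detected with valid_encoding? instead of being passed along silently. The sketch only illustrates the 1.9 string model; it does not show how any particular Ruby version calls the Windows file API.

```ruby
name = "проверка"
p name.encoding # => #<Encoding:UTF-8>
p name.length   # => 8  (characters)
p name.bytesize # => 16 (bytes)

# Explicit transcoding: different bytes, same characters.
cp1251 = name.encode(Encoding::Windows_1251)
p cp1251.bytesize                        # => 8
p cp1251.encode(Encoding::UTF_8) == name # => true

# CP1251 bytes wrongly labelled as UTF-8 are detectable.
p cp1251.dup.force_encoding(Encoding::UTF_8).valid_encoding? # => false
```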
=end

Actions #6

Updated by eveel (Dmitry A. Ustalov) over 15 years ago

=begin
Okay, thanks.

Perhaps I should simply take this Windows behavior into account until
Ruby 1.9 (or 2.0?) becomes stable.
=end

Actions #7

Updated by antares (Michael Klishin) over 15 years ago

=begin
The 1.9.1 branch is stable for day-to-day use, though I don't know of any available builds for Windows, and some libraries you may want to use still need to catch up. The v1_8_0 tag of Ruby dates from 2003, and Windows XP from 2001 or so. At some point, people should consider moving on, or just accept what is missing in older versions. Ruby is not unique in this regard.
=end

Actions #8

Updated by eveel (Dmitry A. Ustalov) over 15 years ago

=begin
I can't jump to 1.9.x, because I use Shoes, which _why compiles
against Ruby 1.8.x.

And I can't build Shoes with a new Ruby for every operating
system that Shoes supports.
=end

Actions #9

Updated by antares (Michael Klishin) over 15 years ago

=begin
Well, maybe others can help. Why don't you ask on the Shoes mailing list?
=end

Actions #10

Updated by eveel (Dmitry A. Ustalov) over 15 years ago

=begin
Because the bug described here doesn't apply specifically to Shoes:
many other applications that run on Win32 and work with environment
variables and the file system have the same encoding mismatch problem.

I found a dirty workaround: the application should place its folder
in %CommonProgramFiles%/AppName instead of
%USERPROFILE%/Application Data/AppName.

This method has one disadvantage: the data stored by the application
is available to every user. Sorry for going off-topic.
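Here is a minimal sketch of this workaround in Ruby 1.9+ syntax. The helper name app_data_dir is hypothetical. So is the refinement of keeping the per-user folder when its path is plain ASCII; the workaround as described always uses %CommonProgramFiles%. Taking the environment as a parameter lets a plain Hash stand in for ENV.

```ruby
# Hypothetical helper: choose where an application keeps its data.
def app_data_dir(app_name, env = ENV)
  profile = env['USERPROFILE']
  if profile && profile.ascii_only?
    # Per-user location, used only when the path is plain ASCII (our assumption).
    File.join(profile, 'Application Data', app_name)
  else
    # Workaround from the thread: shared folder, available to every user.
    File.join(env.fetch('CommonProgramFiles'), app_name)
  end
end

ascii_env = { 'USERPROFILE' => 'C:/Documents and Settings/Admin',
              'CommonProgramFiles' => 'C:/Program Files/Common Files' }
cyrillic_env = ascii_env.merge('USERPROFILE' => 'C:/Documents and Settings/Администратор')

puts app_data_dir('AppName', ascii_env)    # => C:/Documents and Settings/Admin/Application Data/AppName
puts app_data_dir('AppName', cyrillic_env) # => C:/Program Files/Common Files/AppName
```

The caller would still create the directory itself, for example with Dir.mkdir.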

The issue can be closed. Thanks for your time, comments and tips.
=end

Actions #11

Updated by luislavena (Luis Lavena) over 15 years ago

=begin
The Cucumber developers ran into a similar situation when printing encoded characters:

http://rspec.lighthouseapp.com/projects/16211-cucumber/tickets/81

They ended up using chcp and Iconv to convert the characters back and forth.
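The output-side approach can be sketched in Ruby 1.9+ with String#encode in place of Iconv. This is not the Cucumber code, which is not reproduced here. The helper names are hypothetical, and actually running chcp is left out. On an English system chcp prints a line such as "Active code page: 866". The text is localized on other systems, so the sketch parses only the digits.

```ruby
# Hypothetical helper: pull the code page number out of chcp's output.
def codepage_from_chcp(output)
  digits = output[/\d+/]
  digits && digits.to_i
end

# Hypothetical helper: transcode a string for a console using that code page.
# Characters the code page lacks become "?" instead of raising an error.
def for_console(str, codepage)
  str.encode(Encoding.find("CP#{codepage}"), undef: :replace)
end

cp = codepage_from_chcp('Active code page: 866') # => 866
$stdout.write(for_console("проверка\n", cp))
```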

=end

Actions #12

Updated by eveel (Dmitry A. Ustalov) over 15 years ago

=begin
Sorry for the long delay in answering.

Their solution is described at http://codesnippets.joyent.com/posts/show/414,
and implemented in
http://github.com/aslakhellesoy/cucumber/tree/master/lib/cucumber/formatters/unicode.rb

This method applies to output routines only, so it doesn't help here.
The actual problem is in the Windows-specific implementation of some Ruby
libraries: Ruby reads environment variables in the wrong encoding, and
works with file system objects in the wrong encoding, too.

My attempts to convert ENV['userprofile'] to a suitable charset with iconv()
(cp1251, cp1252, cp866) were unsuccessful.
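For reference, here is a conversion of the user-name bytes from the ENV transcript from CP866 to CP1251. Those bytes spell "Администратор" in CP866, while the ANSI file API expects CP1251. The sketch uses Ruby 1.9+. On 1.8, the standard-library equivalent would be Iconv.conv('CP1251', 'CP866', str). oem_to_ansi is a hypothetical helper, and it has not been verified to make File.exists? succeed on the 1.8.6 build from the report.

```ruby
# Hypothetical helper: re-encode OEM (CP866) bytes into the ANSI code page (CP1251).
def oem_to_ansi(str)
  str.dup.force_encoding(Encoding::IBM866).encode(Encoding::Windows_1251)
end

# User name bytes from the ENV['userprofile'] transcript.
tail = "\200\244\254\250\255\250\341\342\340\240\342\256\340".b
ansi = oem_to_ansi(tail)
puts ansi.bytes.map { |b| format('\\%03o', b) }.join
# => \300\344\354\350\355\350\361\362\360\340\362\356\360
```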

Perhaps this bug is unresolvable (as Michael Klishin noted above); despite
its limitations, I'll use my workaround until Ruby 1.8 is replaced by 1.9.
=end

Actions #13

Updated by shyouhei (Shyouhei Urabe) about 15 years ago

  • Assignee set to usa (Usaku NAKAMURA)

=begin

=end

Actions #14

Updated by usa (Usaku NAKAMURA) about 15 years ago

  • Category set to core
  • Status changed from Open to Rejected
  • ruby -v set to -

=begin
There is no plan to resolve the original problem in 1.8.
You must pass paths to Ruby in the encoding the Win32 file API expects.

I know it's VERY inconvenient for users in Europe, but we cannot break compatibility of command-line/path handling in the 1.8 branch.
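One practical consequence of this requirement: a path can go through the ANSI file functions only if every character in it exists in the system's ANSI code page. That is why names mixing scripts are so inconvenient. Here is a small Ruby 1.9+ sketch of that check; the helper name is hypothetical.

```ruby
# Hypothetical helper: true if every character of path exists in the code page.
def representable_in?(path, encoding)
  path.encode(encoding)
  true
rescue Encoding::UndefinedConversionError
  false
end

p representable_in?('c:/ruby/проверка', Encoding::Windows_1251) # => true
p representable_in?('c:/ruby/проверка', Encoding::Windows_1252) # => false
p representable_in?('c:/ruby/café', Encoding::Windows_1251)     # => false
```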
=end
