Feature #12650

Use UTF-8 encoding for ENV on Windows

Added by davispuh (Dāvis Mosāns) over 2 years ago. Updated about 2 years ago.

Status: Open
Priority: Normal
Assignee: -
Target version: -
[ruby-core:76668]

Description

Windows environment variables support Unicode (through the same wide-character WinAPI), so there's no reason to limit ourselves to any codepage.
Currently ENV uses the locale's encoding (the console's codepage), which cannot correctly represent characters outside that codepage.
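To make the codepage limitation concrete, here is a small sketch (not part of the attached patch) that checks whether a string can be represented in a given encoding. `representable_in?` is a hypothetical helper, and Windows-1252 is assumed here as a stand-in for a typical Western ANSI codepage.

```ruby
# Hypothetical helper (illustration only): can +str+ be converted into
# +encoding+ without hitting a character the target lacks?
def representable_in?(str, encoding)
  str.encode(encoding)
  true
rescue Encoding::UndefinedConversionError
  false
end

name = "Dāvis Mosāns" # contains Latvian "ā"
puts representable_in?(name, "UTF-8")        # true
puts representable_in?(name, "Windows-1252") # false: "ā" is not in this codepage
```

Any value containing such characters cannot survive a round trip through that codepage, which is the situation the description refers to.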

I've attached a patch which implements this and fixes bug #9715.



Related issues

Related to Ruby trunk - Bug #9715: ENV data yield ASCII-8BIT encoded strings under Windows with unicode username (Open)

History

Updated by usa (Usaku NAKAMURA) over 2 years ago

We don't want to break compatibility.
Wait for Ruby 3.

Updated by nobu (Nobuyoshi Nakada) over 2 years ago

  • Tracker changed from Bug to Feature

Updated by spatulasnout (B Kelly) over 2 years ago

Hi,

Usaku NAKAMURA wrote:

We don't want to break compatibility.
Wait for Ruby 3.

We always invoke ruby with -EUTF-8:UTF-8.

Would it make sense to enable this patch in Ruby 2.x in situations
where UTF-8 behavior has been explicitly requested?
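For reference, `-EUTF-8:UTF-8` sets both the default external and the default internal encoding to UTF-8. A quick sketch to confirm what a child interpreter ends up with, assuming the current interpreter can be re-launched via `RbConfig.ruby`:

```ruby
require "open3"
require "rbconfig"

# Launch a child interpreter with -EUTF-8:UTF-8 (external:internal) and ask
# which default encodings it ended up with.
probe = "puts Encoding.default_external.name, Encoding.default_internal.name"
out, status = Open3.capture2(RbConfig.ruby, "-EUTF-8:UTF-8", "-e", probe)
external, internal = out.lines.map(&:chomp)

puts "external=#{external} internal=#{internal} success=#{status.success?}"
```

Without the flag, `Encoding.default_internal` is normally nil, which is why an explicit request like this is a plausible opt-in signal.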


Updated by naruse (Yui NARUSE) over 2 years ago

  • Related to Bug #9715: ENV data yield ASCII-8BIT encoded strings under Windows with unicode username added

Updated by Iristyle (Ethan Brown) over 2 years ago

If you could rethink the plan to wait until Ruby 3, that would be great.

I would expect Ruby to normalize on UTF-8 strings everywhere internally, and only convert to the local codepage at the boundaries (such as when writing to the console, a file, etc.).
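A minimal sketch of that "UTF-8 inside, convert at the edges" approach. The helpers `from_boundary` and `to_boundary` are hypothetical, and Windows-1252 is an assumed example codepage:

```ruby
# Hypothetical input boundary: take raw bytes known to be in +codepage+
# and turn them into a UTF-8 string for internal use.
def from_boundary(bytes, codepage)
  bytes.dup.force_encoding(codepage).encode(Encoding::UTF_8)
end

# Hypothetical output boundary: convert an internal UTF-8 string to
# +codepage+; characters the codepage lacks become "?" instead of raising.
def to_boundary(str, codepage)
  str.encode(codepage, undef: :replace)
end

internal = from_boundary("Caf\xE9".b, "Windows-1252") # "Café" as UTF-8
legacy = to_boundary("Dāvis", "Windows-1252")

puts internal.encoding # UTF-8
p legacy.bytes         # [68, 63, 118, 105, 115], i.e. "D?vis"
```

Lossy replacement then happens only where the output medium genuinely cannot hold the text, not everywhere the string travels.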

We are tracking a number of issues in Puppet that we believe are caused by the current behavior:

Updated by thomthom (Thomas Thomassen) over 2 years ago

B Kelly wrote:

Would it make sense to enable this patch in Ruby 2.x in situations
where UTF-8 behavior has been explicitly requested?

I would like to second this request. We are also troubled by encoding issues under Windows. I'm not sure when Ruby 3 is planned for release, but we would prefer a more immediate solution.

Updated by shyouhei (Shyouhei Urabe) about 2 years ago

We looked at this issue in today's developer meeting.

First off, the attendees' understanding: ENV on Windows is managed by the kernel and is provided to a userland process as an array of wide characters. Tell me if that's wrong. Also, we already support writing UTF_8 strings into ENV because that has no backward-compatibility problem. The problem is reading from it.
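The write/read asymmetry can be observed with a round trip through a throwaway variable. `GR_DEMO_NAME` is a hypothetical name used only for this sketch; the encoding and bytes you get back depend on the platform, the locale and the Ruby version, so treat the output as an illustration rather than a specification.

```ruby
# GR_DEMO_NAME is a hypothetical variable name used only for this demo.
value = "Dāvis Mosāns"
ENV["GR_DEMO_NAME"] = value # writing a UTF-8 string is accepted

read_back = ENV["GR_DEMO_NAME"]
puts read_back.encoding     # platform-, locale- and version-dependent
puts read_back.b == value.b # may be false where a narrow codepage is involved

ENV.delete("GR_DEMO_NAME")  # clean up the demo variable
```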

Now, because of our long tradition of using the OEM codepage on Windows, it has been difficult to change the encoding of ENV to UTF_8. The tragedy is that although Windows does have chcp 65001, it is practically not used anywhere, so Windows users are left with their code pages.

I understand you want to use UTF_8. Changing the default encoding to do so is not practically possible now because of backward compatibility. I advise you to propose another way, for instance some sort of "UTF_8 mode". Would it make sense for you to set the default_internal encoding (which is nil by default)?
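A minimal sketch of the `default_internal` suggestion. Ruby's `Encoding.default_internal` documentation lists values from ENV among the strings transcoded to the default internal encoding; whether a particular value actually changes still depends on the platform and the locale, so no specific output is assumed here.

```ruby
# Opt in once, e.g. near the top of a script or an embedder's boot code.
Encoding.default_internal = Encoding::UTF_8

# ENV values read from here on are candidates for transcoding to UTF-8.
# ASCII-only values may keep a US-ASCII tag, and the exact result depends
# on the platform's external (locale) encoding.
home = ENV["HOME"] || ENV["USERPROFILE"] || ""

puts Encoding.default_internal # UTF-8
puts home.encoding
```

Because this is a process-wide setting, it also affects other external strings (for example IO reads without an explicit encoding), which is worth weighing against a narrower, ENV-only opt-in.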

Updated by thomthom (Thomas Thomassen) about 2 years ago

I would be OK with it not being the default, as long as it can be configured for the whole interpreter rather than through a magic comment that would have to be in every source file.
In our particular scenario we embed Ruby into our application, and we would like to configure the interpreter to use this "UTF-8 mode".
People writing Ruby extensions for our application already have to use hacks such as force_encoding to correct this, and it's a constant source of bugs and problems. If we could force ENV strings to be UTF-8 by default in the embedded environment we provide, that would be a great relief for us.
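The `force_encoding` workaround only relabels bytes; it does not convert them. The sketch below assumes a value arrived as Windows-1252 bytes (the name "Müller" is an illustrative example) and contrasts relabeling with transcoding:

```ruby
# Assumed input: "Müller" as raw Windows-1252 bytes.
raw = "M\xFCller".b.force_encoding("Windows-1252")

# The common hack: relabel the bytes as UTF-8 without converting them.
relabeled = raw.dup.force_encoding(Encoding::UTF_8)

# Transcoding maps each character to its UTF-8 byte sequence.
transcoded = raw.encode(Encoding::UTF_8)

puts relabeled.valid_encoding? # false: byte 0xFC is not valid UTF-8
puts transcoded == "Müller"    # true
```

Relabeling is only correct when the bytes were UTF-8 to begin with, which is why it tends to work on some machines and break on others.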
