Bug #13549

MinGW / Windows encoding - Two issues

ruby 2.5.0dev (2017-05-08 trunk 58610) [x64-mingw32]


Issue #1

The documentation for Encoding.default_internal= states:

"The locale encoding (__ENCODING__), not default_internal, is used as the encoding of created strings."

Below is code and the console output for a MinGW build. Whether the string is first assigned to a variable or used directly as a literal, it appears to be encoded as UTF-8, regardless of the locale encoding.

So, something is amiss. Is it that:

  1. The documentation is mistaken
  2. The behavior is specific to *nix builds
  3. The MinGW build is behaving incorrectly

txt = 'ABCDEF_äÖü'
puts  "filesystem   #{Encoding.find('filesystem')}" \
    "\nlocale       #{Encoding.find('locale')}" \
    "\nexternal     #{Encoding.default_external}" \
    "\ninternal     #{Encoding.default_internal}" \
    "\ntxt          #{txt.encoding.to_s}" \
    "\n'ABCDEF_äÖü' #{'ABCDEF_äÖü'.encoding.to_s}"

Console output with the default encoding

filesystem   Windows-1252
locale       IBM437
external     IBM437
txt          UTF-8

Console output after setting the code page to 1252 with chcp

filesystem   Windows-1252
locale       Windows-1252
external     Windows-1252
txt          UTF-8
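
To narrow this down, here is a minimal sketch that prints the literal's encoding next to both __ENCODING__ and the locale encoding. It assumes the code runs from a script file with no magic encoding comment; the names report and literal_matches_source are only illustrative. If the source encoding rather than the locale decides how literals are encoded, the 'literal' and 'source' rows should match even when 'locale' differs.

```ruby
# Compare the encoding given to a string literal with the encodings in play.
# Assumption: this is run from a script file without a magic encoding comment.
txt = 'ABCDEF_äÖü'

# Illustrative names; nothing here is specific to MinGW.
report = {
  'source (__ENCODING__)' => __ENCODING__,
  'locale'                => Encoding.find('locale'),
  'literal'               => txt.encoding
}
report.each { |name, enc| puts format('%-22s %s', name, enc) }

# True when the literal took the source encoding.
literal_matches_source = (txt.encoding == __ENCODING__)
puts "literal matches source encoding: #{literal_matches_source}"
```

Running this once with the default code page and once after chcp 1252 would show whether the 'literal' row ever follows the 'locale' row.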

Issue #2

In the issue "Set Encoding.default_external to UTF-8 on Windows" (#13488), Lars Kanis proposed changing Ruby's default encodings on Windows to UTF-8. Discussion showed that, at present, this would be an issue for many users.

In that thread, Nobu posted console output that showed default_external matching filesystem.

C:\Users\nobu\work\ruby\trunk\x64-mswin32_140>.\bin\ruby -e "p Encoding.default_external, Encoding.find('filesystem')"

In recent MinGW builds, I've had 8 failures and 1 error. This weekend I spent a little time patching around three failures, two of which involved encoding. The patches are dependent on the cause/fix for Issue #1, but also seem to work best when locale and default_external encodings are set equal to filesystem.

As noted above, my Windows system (standard American English Win7) has a filesystem encoding of Windows-1252, while locale and default_external are both IBM437. Why, I don't know.

Given that Nobu showed filesystem equal to default_external, would it be possible to change 'Windows' ruby so that, by default, locale and default_external are set equal to filesystem?
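
Until something like that exists in the build itself, the effect on default_external can be approximated from within a script. This is only a sketch of a workaround, not the proposed fix: it touches default_external alone, since as far as I know the locale encoding cannot be reassigned from Ruby code. The variable names are illustrative.

```ruby
# Workaround sketch: align default_external with the filesystem encoding.
# Assumption: this runs at the very start of a script, before any IO is opened.
fs_encoding = Encoding.find('filesystem')
previous    = Encoding.default_external

# Ruby may warn about this assignment when run with -w.
Encoding.default_external = fs_encoding

puts "default_external: #{previous} -> #{Encoding.default_external}"
```

The same thing can be done per invocation with the -E command-line option, for example ruby -E Windows-1252 script.rb, which leaves the script itself unchanged.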

Not being a C programmer, I cannot create a patch/PR, etc. Lastly, moving this post between my code editor and 'Visual Studio Code' caused some encoding issues. So, yes, Windows does still have encoding issues...

