Ruby Issue Tracking System https://redmine.ruby-lang.org/
Ruby master - Feature #9713: __FILE__ return unexpected encoding - breaks Dir.globhttps://redmine.ruby-lang.org/issues/9713?journal_id=461042014-04-07T18:38:55Zusa (Usaku NAKAMURA)usa@garbagecollect.jp
<ul></ul><p>The results of Dir.glob have the same encoding as its parameter.<br>
So there is no bug in the second case.</p>
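<p>A minimal sketch of that rule, assuming <code>Encoding.default_internal</code> is unset and using a hypothetical helper <code>glob_result_encodings</code> and a throwaway file name (neither is from the original report). It only illustrates that the result strings carry the encoding of the pattern:</p>

```ruby
require "tmpdir"

# Hypothetical helper: returns the encodings of the names Dir.glob yields
# for +pattern+ inside a fresh temporary directory holding one file.
def glob_result_encodings(pattern)
  Dir.mktmpdir do |dir|
    File.write(File.join(dir, "sample.txt"), "")
    Dir.chdir(dir) { Dir.glob(pattern).map(&:encoding) }
  end
end

# The same ASCII-only pattern, tagged with two different encodings.
utf8_pattern   = "*.txt".encode(Encoding::UTF_8)
cp1252_pattern = "*.txt".encode(Encoding::Windows_1252)

p glob_result_encodings(utf8_pattern)
p glob_result_encodings(cp1252_pattern)
```

<p>The second call is the situation described above: a pattern built from a Windows-1252 string yields Windows-1252 results, even when the script itself is UTF-8.</p>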
<p>But in the first case, the encoding of <code>__FILE__</code> should be Windows-1252 (the filesystem encoding)<br>
or UTF-8 (the script's encoding), I think.<br>
It may be a bug.</p> Ruby master - Feature #9713: __FILE__ return unexpected encoding - breaks Dir.globhttps://redmine.ruby-lang.org/issues/9713?journal_id=461072014-04-07T19:46:15Zthomthom (Thomas Thomassen)thomas@thomthom.net
<ul></ul><p>Usaku NAKAMURA wrote:</p>
<blockquote>
<p>But in the first case, the encoding of <code>__FILE__</code> should be Windows-1252 (the filesystem encoding)<br>
or UTF-8 (the script's encoding), I think.<br>
It may be a bug.</p>
</blockquote>
<p>Seeing how the Windows file system can use Unicode characters, I would expect <code>__FILE__</code> to be Unicode-encoded, even if the file encoding was different.<br>
The file system doesn't store file names as Windows-1252 encoded data; that's just the fallback compatibility code page for programs that don't declare themselves as Unicode capable. Ruby doesn't do this: it doesn't seem to define the UNICODE flag, but instead explicitly calls the *W variants of the file functions.</p>
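<p>The gap between Unicode file names and a legacy code page can be sketched in Ruby. The helper <code>to_code_page</code> and both file names are hypothetical, chosen only to show one name that fits Windows-1252 and one that does not:</p>

```ruby
# Hypothetical helper: transcode +name+ into +code_page+, returning nil
# when some character has no representation in that code page.
def to_code_page(name, code_page)
  name.encode(code_page)
rescue Encoding::UndefinedConversionError
  nil
end

latin = "caf\u00E9.rb" # every character exists in Windows-1252
kana  = "\u3042.rb"    # HIRAGANA LETTER A has no Windows-1252 byte

p latin.bytesize                               # size as UTF-8
p to_code_page(latin, "Windows-1252").bytesize # size as Windows-1252
p to_code_page(kana, "Windows-1252")           # not representable
```

<p>A name like the second one cannot be expressed in the code page at all, which is why a path string limited to the code page cannot cover every file the file system can hold.</p>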
<p>If I need to represent a file name in the UI, or write it to a file, in a different encoding, then I can do the appropriate transcoding. But I don't see any reason why Ruby's file-related functions under Windows should yield any strings that are not Unicode.</p> Ruby master - Feature #9713: __FILE__ return unexpected encoding - breaks Dir.globhttps://redmine.ruby-lang.org/issues/9713?journal_id=461082014-04-07T20:30:19Zusa (Usaku NAKAMURA)usa@garbagecollect.jp
<ul></ul><p>Thomas Thomassen wrote:</p>
<blockquote>
<p>If I need to represent a file name in the UI, or write it to a file, in a different encoding, then I can do the appropriate transcoding. But I don't see any reason why Ruby's file-related functions under Windows should yield any strings that are not Unicode.</p>
</blockquote>
<p>Because Ruby is 21 years old. She was born before Unicode spread.<br>
When we began to introduce m17n features into Ruby, many, many l10n scripts already existed.<br>
To maintain compatibility with these scripts, we designed the elements that are affected by encoding conservatively.<br>
The specifications of methods such as Dir.glob are the result of that judgment.</p> Ruby master - Feature #9713: __FILE__ return unexpected encoding - breaks Dir.globhttps://redmine.ruby-lang.org/issues/9713?journal_id=461092014-04-07T20:45:05Zthomthom (Thomas Thomassen)thomas@thomthom.net
<ul></ul><p>Looking at how Ruby determines the filesystem encoding:</p>
<p><a href="http://rxr.whitequark.org/mri/source/encoding.c#1267" class="external">http://rxr.whitequark.org/mri/source/encoding.c#1267</a><br>
static int enc_set_filesystem_encoding(void)</p>
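<pre><code>/* Buffer large enough for "CP" plus the decimal digits of an int. */
char cp[sizeof(int) * 8 / 3 + 4];
/* Use the ANSI code page or the OEM code page, depending on the file API mode. */
snprintf(cp, sizeof cp, "CP%d", AreFileApisANSI() ? GetACP() : GetOEMCP());
idx = rb_enc_find_index(cp);
/* Fall back to ENCINDEX_ASCII when no encoding matches that code page name. */
if (idx < 0) idx = ENCINDEX_ASCII;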
</code></pre>
<p>It chooses between the OEM code page and the ANSI code page, neither of which is Unicode. So under Windows, will Ruby always try to return strings in the ANSI or the OEM code page?<br>
I can understand the desire for compatibility, but I'd wish for better control, such as compile-time switches, so that Ruby could be set up under Windows without the need to juggle all these different encodings.</p>
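<p>From the Ruby side, the encodings in play can be inspected as follows. This is only a sketch: <code>encoding_report</code> is a hypothetical helper, and the values it prints depend on the platform, the locale and the Ruby version:</p>

```ruby
# Hypothetical helper: collect the encodings Ruby reports for this
# process and for the path of the running script.
def encoding_report
  {
    filesystem: Encoding.find("filesystem"),
    locale: Encoding.find("locale"),
    default_external: Encoding.default_external,
    default_internal: Encoding.default_internal, # nil unless set, e.g. via -E
    script_path: __FILE__.encoding,
  }
end

encoding_report.each { |name, enc| puts "#{name}: #{enc.inspect}" }
```

<p>Comparing the <code>filesystem</code> and <code>script_path</code> entries for scripts stored under ASCII and non-ASCII paths is one way to check whether <code>__FILE__</code> follows the filesystem encoding on a given setup.</p>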
<p>For <code>__FILE__</code> to use UTF-8 only when it contains characters outside the filesystem code page seems very erratic. And as can be seen with the Dir.glob function, it causes failures that cascade further down the Ruby scripts, because some functions use inconsistent encodings.</p> Ruby master - Feature #9713: __FILE__ return unexpected encoding - breaks Dir.globhttps://redmine.ruby-lang.org/issues/9713?journal_id=461142014-04-08T12:15:21Zthomthom (Thomas Thomassen)thomas@thomthom.net
<ul></ul><p>According to the docs for <code>Encoding.default_internal=</code> (<a href="http://www.ruby-doc.org/core-2.1.1/Encoding.html#method-c-default_external-3D" class="external">http://www.ruby-doc.org/core-2.1.1/Encoding.html#method-c-default_external-3D</a>), <code>__FILE__</code> should return strings in the default internal encoding, but when I use <code>-E</code> to set it I don't see this behaviour.</p> Ruby master - Feature #9713: __FILE__ return unexpected encoding - breaks Dir.globhttps://redmine.ruby-lang.org/issues/9713?journal_id=461152014-04-08T13:45:51Zthomthom (Thomas Thomassen)thomas@thomthom.net
<ul></ul><p>I'm starting to wonder if the three bugs I recently filed could all go under one: that the behaviour described under Encoding.default_internal= isn't happening for all the elements it lists. The issues I'm experiencing would apparently not have occurred if encoding behaved as described for that function.</p> Ruby master - Feature #9713: __FILE__ return unexpected encoding - breaks Dir.globhttps://redmine.ruby-lang.org/issues/9713?journal_id=461202014-04-09T06:17:08Znobu (Nobuyoshi Nakada)nobu@ruby-lang.org
<ul><li><strong>Status</strong> changed from <i>Open</i> to <i>Closed</i></li><li><strong>% Done</strong> changed from <i>0</i> to <i>100</i></li></ul><p>Applied in changeset r45539.</p>
<hr>
<p>encoding.c: fix rdoc of <code>__FILE__</code></p>
<ul>
<li>encoding.c (rb_enc_default_internal): fix rdoc. <code>__FILE__</code> is<br>
in filesystem encoding but not <code>default_internal</code>.<br>
<a href="/issues/9713">[ruby-core:61894]</a> [Bug <a class="issue tracker-2 status-7 priority-4 priority-default closed" title="Feature: __FILE__ return unexpected encoding - breaks Dir.glob (Feedback)" href="https://redmine.ruby-lang.org/issues/9713">#9713</a>]</li>
</ul> Ruby master - Feature #9713: __FILE__ return unexpected encoding - breaks Dir.globhttps://redmine.ruby-lang.org/issues/9713?journal_id=461212014-04-09T09:03:42Zthomthom (Thomas Thomassen)thomas@thomthom.net
<ul></ul><p>Nobuyoshi Nakada wrote:</p>
<blockquote>
<ul>
<li>encoding.c (rb_enc_default_internal): fix rdoc. <code>__FILE__</code> is<br>
in filesystem encoding but not <code>default_internal</code>.</li>
</ul>
</blockquote>
<p>In my test <code>__FILE__</code> is returned in the OEM encoding - not filesystem encoding.<br>
And is it by design that <code>__FILE__</code> will return a different encoding depending on its content? And is there no way to configure it to return a consistent encoding?</p> Ruby master - Feature #9713: __FILE__ return unexpected encoding - breaks Dir.globhttps://redmine.ruby-lang.org/issues/9713?journal_id=461222014-04-09T09:07:19Zusa (Usaku NAKAMURA)usa@garbagecollect.jp
<ul><li><strong>Status</strong> changed from <i>Closed</i> to <i>Assigned</i></li></ul><p>Thomas Thomassen wrote:</p>
<blockquote>
<p>In my test <code>__FILE__</code> is returned in the OEM encoding - not filesystem encoding.</p>
</blockquote>
<p>So, reopened.</p> Ruby master - Feature #9713: __FILE__ return unexpected encoding - breaks Dir.globhttps://redmine.ruby-lang.org/issues/9713?journal_id=552352015-12-04T13:35:42Zthomthom (Thomas Thomassen)thomas@thomthom.net
<ul></ul><p>Revisiting this issue again. Is there a resolution to what can be done to improve this and still satisfy compatibility concerns?</p> Ruby master - Feature #9713: __FILE__ return unexpected encoding - breaks Dir.globhttps://redmine.ruby-lang.org/issues/9713?journal_id=552372015-12-04T14:15:45Zusa (Usaku NAKAMURA)usa@garbagecollect.jp
<ul></ul><p>What I can say now is that we are planning to use UTF-8 as the filesystem encoding on Windows in Ruby 3.0.</p> Ruby master - Feature #9713: __FILE__ return unexpected encoding - breaks Dir.globhttps://redmine.ruby-lang.org/issues/9713?journal_id=552382015-12-04T14:31:49Zthomthom (Thomas Thomassen)thomas@thomthom.net
<ul></ul><p>Usaku NAKAMURA wrote:</p>
<blockquote>
<p>What I can say now is that we are planning to use UTF-8 as the filesystem encoding on Windows in Ruby 3.0.</p>
</blockquote>
<p>That's very promising to hear. I'll keep an eye out for that.<br>
Though, Ruby 3 is quite a bit away, isn't it? Anything that can be done to the v2 branch to mitigate issues? I'd be willing to offer my help.</p> Ruby master - Feature #9713: __FILE__ return unexpected encoding - breaks Dir.globhttps://redmine.ruby-lang.org/issues/9713?journal_id=838462020-01-14T09:11:09Znaruse (Yui NARUSE)naruse@airemix.jp
<ul><li><strong>Tracker</strong> changed from <i>Bug</i> to <i>Feature</i></li><li><strong>Target version</strong> set to <i>3.0</i></li><li><strong>ruby -v</strong> deleted (<del><i>ruby 2.2.0dev (2014-04-07 trunk 45528) [i386-mswin32_100] </i></del>)</li><li><strong>Backport</strong> deleted (<del><i>2.0.0: UNKNOWN, 2.1: UNKNOWN</i></del>)</li></ul> Ruby master - Feature #9713: __FILE__ return unexpected encoding - breaks Dir.globhttps://redmine.ruby-lang.org/issues/9713?journal_id=891292020-12-10T09:13:23Znaruse (Yui NARUSE)naruse@airemix.jp
<ul><li><strong>Status</strong> changed from <i>Assigned</i> to <i>Feedback</i></li><li><strong>Target version</strong> deleted (<del><i>3.0</i></del>)</li></ul>