Ruby Issue Tracking System https://redmine.ruby-lang.org/ 2019-06-13T10:02:56Z
Ruby master - Bug #15908: Detecting BOM with non-UTF encoding https://redmine.ruby-lang.org/issues/15908?journal_id=78529 2019-06-13T10:02:56Z nobu (Nobuyoshi Nakada) nobu@ruby-lang.org
<ul><li><strong>Related to</strong> <i><a class="issue tracker-1 status-5 priority-4 priority-default closed" href="/issues/15210">Bug #15210</a>: UTF-8 BOM should be removed from String in internal representation</i> added</li></ul>
Ruby master - Bug #15908: Detecting BOM with non-UTF encoding https://redmine.ruby-lang.org/issues/15908?journal_id=81251 2019-08-29T06:50:42Z duerst (Martin Dürst) duerst@it.aoyama.ac.jp
<ul><li><strong>Status</strong> changed from <i>Open</i> to <i>Closed</i></li></ul><p>Depending on usage, it may also be necessary to distinguish UTF-8 (with or without BOM), UTF-16LE without BOM, UTF-16BE with or without BOM, and so on. For Japanese, distinguishing between EUC-JP, Shift_JIS, and ISO-2022-JP has traditionally been necessary as well.</p>
<p>More complex cases need heuristics. On the other hand, some applications may not want to accept (or, as in the bootstrap phase of an XML parser, may not be allowed to accept) more than a well-defined subset of encodings.</p>
<p>This kind of processing is therefore better left to applications.</p>
<p>I'm closing this issue to not leave it dangling, but please feel free to reopen if you disagree.</p>
Ruby master - Bug #15908: Detecting BOM with non-UTF encoding https://redmine.ruby-lang.org/issues/15908?journal_id=81252 2019-08-29T06:54:12Z naruse (Yui NARUSE) naruse@airemix.jp
<ul></ul><p>I understand that, in theory, there are situations where this feature would be useful.<br>
But I don't think they exist in practice.<br>
I object to providing an additional utility to support legacy encodings.</p>
Ruby master - Bug #15908: Detecting BOM with non-UTF encoding https://redmine.ruby-lang.org/issues/15908?journal_id=81280 2019-08-30T02:46:32Z nobu (Nobuyoshi Nakada) nobu@ruby-lang.org
<ul></ul><p>I had UTF-16LE and CP932 in mind as the main use case; however, I'm a bit surprised that such texts are already extinct on Windows. :tada:</p>
Ruby master - Bug #15908: Detecting BOM with non-UTF encoding https://redmine.ruby-lang.org/issues/15908?journal_id=81286 2019-08-30T08:07:14Z duerst (Martin Dürst) duerst@it.aoyama.ac.jp
<ul></ul><p>nobu (Nobuyoshi Nakada) wrote:</p>
<blockquote>
<p>I had UTF-16LE and CP932 in mind as the main use case; however, I'm a bit surprised that such texts are already extinct on Windows. :tada:</p>
</blockquote>
<p>They are not yet extinct, unfortunately :-(. In Japan, there may be quite a few cases where this would work, but even in Japan, there are many other cases where a larger and/or different selection of encodings is needed.</p>
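<p>For readers who want to handle this at the application level, as suggested earlier in the thread, the following is a minimal sketch of BOM sniffing with a caller-chosen fallback encoding. The names <code>BOMS</code> and <code>decode_with_bom</code> are hypothetical and not part of Ruby; the default fallback of Windows-31J (CP932) is only an assumption based on the use case mentioned above, and no heuristic detection among legacy encodings is attempted.</p>

```ruby
# Hypothetical helper, not part of Ruby's core or standard library.
# Known byte order marks, checked in order. The UTF-32LE mark is listed
# before the UTF-16LE mark because it begins with the same two bytes.
# (UTF-16LE text whose first character is U+0000 is therefore ambiguous.)
BOMS = [
  ["\xEF\xBB\xBF".b,     Encoding::UTF_8],
  ["\xFF\xFE\x00\x00".b, Encoding::UTF_32LE],
  ["\x00\x00\xFE\xFF".b, Encoding::UTF_32BE],
  ["\xFF\xFE".b,         Encoding::UTF_16LE],
  ["\xFE\xFF".b,         Encoding::UTF_16BE],
].freeze

# Returns a new string with any leading BOM removed and tagged with the
# encoding that BOM implies. When no BOM is found, the bytes are tagged
# with `fallback` (an assumed default; pick what fits your data).
def decode_with_bom(bytes, fallback: Encoding::Windows_31J)
  raw = bytes.b # binary copy, so the caller's string is left untouched
  bom, enc = BOMS.find { |mark, _| raw.start_with?(mark) }
  return raw.force_encoding(fallback) unless bom

  raw.byteslice(bom.bytesize, raw.bytesize).force_encoding(enc)
end

# Usage: read the file in binary mode, then decide the encoding.
#   text = decode_with_bom(File.binread("input.txt"))
#   utf8 = text.encode(Encoding::UTF_8)
```

<p>The sketch only tags the bytes with an encoding; it does not validate them, so an application may still want to check <code>valid_encoding?</code> on the result before converting.</p>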