Bug #20526

open

File.open(encoding: "bom|utf-8") converts "\r\n" to "\n" on Windows

Added by kou (Kouhei Sutou) 8 months ago. Updated 11 days ago.

Status:
Open
Assignee:
-
Target version:
-
ruby -v:
ruby 3.2.2 (2023-03-30 revision e51014f9c0) [x64-mingw-ucrt]
[ruby-core:118182]
Tags:

Description

I'm not sure whether this is intentional, but it seems that encoding: "utf-8" doesn't change newline conversion while encoding: "bom|utf-8" does:

File.write("a.txt", "a\r\n")
File.read("a.txt").bytes # => [97, 13, 10]
File.open("a.txt", encoding: "utf-8") {|f| f.read.bytes} # => [97, 10, 10]
File.open("a.txt", encoding: "bom|utf-8") {|f| f.read.bytes} # => [97, 10] XXX: \r\n -> \n
File.open("a.txt", encoding: "bom|utf-8", universal_newline: false) {|f| f.read.bytes} # => [97, 13, 10]

Note the line marked XXX: in the code above. Is this intentional behavior?
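For anyone reproducing this, a control case with a real BOM may help separate BOM stripping from newline handling. The following is only a sketch under assumptions: it opens the file in binary mode ("rb:bom|utf-8"), where no newline conversion is expected on any platform, and the temporary directory and variable names are illustrative, not part of the original report.

```ruby
require "tmpdir"
require "fileutils"

# Illustrative fixture: UTF-8 BOM followed by "a\r\n", written byte-for-byte.
dir = Dir.mktmpdir
bom_path = File.join(dir, "bom.txt")
File.binwrite(bom_path, "\xEF\xBB\xBFa\r\n")

# Binary mode plus "bom|utf-8": the BOM is consumed, the CRLF is left alone.
bom_bytes = File.open(bom_path, "rb:bom|utf-8") { |f| f.read.bytes }
p bom_bytes # => [97, 13, 10]

FileUtils.remove_entry(dir)
```

If the text-mode variants return something other than these three bytes for the same file, the difference comes from newline conversion rather than from BOM removal.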

Updated by nobu (Nobuyoshi Nakada) 8 months ago

Probably a bug in the push-back after the BOM look-ahead.

BTW, on Windows, File.write and File.read are in text mode by default.
That file would be 4 bytes, "a\r\r\n" in binary.
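To take text-mode writing out of the picture, the fixture can be created and inspected with the binary helpers instead. This is a minimal sketch, assuming a temporary directory and illustrative variable names; File.binwrite and File.binread do no newline conversion, so the file holds exactly the three intended bytes on any platform.

```ruby
require "tmpdir"
require "fileutils"

dir = Dir.mktmpdir
path = File.join(dir, "a.txt")

# Write exactly "a", CR, LF with no text-mode translation.
File.binwrite(path, "a\r\n")

raw_bytes = File.binread(path).bytes
file_size = File.size(path)
p raw_bytes # => [97, 13, 10]
p file_size # => 3

FileUtils.remove_entry(dir)
```

With File.write on Windows the same string would be stored as the 4 bytes described above, which is what the experiments in this ticket actually read.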

Actions #2

Updated by nobu (Nobuyoshi Nakada) 8 months ago

  • Backport changed from 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN to 3.1: REQUIRED, 3.2: REQUIRED, 3.3: REQUIRED
Actions #3

Updated by kou (Kouhei Sutou) 8 months ago

  • Description updated (diff)
Actions #4

Updated by hsbt (Hiroshi SHIBATA) 8 months ago

  • Target version deleted (3.2)

Updated by YO4 (Yoshinao Muramatsu) 23 days ago

There is similar strangeness around encoding specifiers.

Preparation

RUBY_VERSION # => "3.3.5"
File.write("a.txt", "a\r\n")
File.binread("a.txt").bytes # => [97, 13, 13, 10]

Experiments

File.open("a.txt")                         {|f| f.read.bytes} # => [97, 13, 10] # expected(msvcrt[_*] newline)
File.open("a.txt", "r:utf-8")              {|f| f.read.bytes} # => [97, 13, 10] # expected
File.open("a.txt", "r", encoding: "utf-8") {|f| f.read.bytes} # => [97, 13, 10] # expected
File.open("a.txt", encoding: "utf-8")      {|f| f.read.bytes} # => [97, 10, 10] # XXX: universal newline enabled?

Omitting the mode parameter seems to enable universal newline.

File.open("a.txt", "rt:utf-8")                  {|f| f.read.bytes} # => [97, 10, 10] # expected(universal newline)
File.open("a.txt", "rt:bom|utf-8")              {|f| f.read.bytes} # => [97, 10] # XXX
File.open("a.txt", "rt", encoding: "utf-8")     {|f| f.read.bytes} # => [97, 10, 10] # expected(universal newline)
File.open("a.txt", "rt", encoding: "bom|utf-8") {|f| f.read.bytes} # => [97, 10] # XXX

XXX: This is odd because universal newline and msvcrt newline appear to be cooperating.
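The three distinct results above can be reproduced by a simple string-level model of the two conversions applied to the raw bytes "a\r\r\n". This is only a sketch to illustrate the "cooperating" observation: the helpers crt_text_mode and universal_newline are hypothetical names and are not how Ruby's IO layer is implemented.

```ruby
# Hypothetical model of msvcrt text-mode reading: CRLF becomes LF.
def crt_text_mode(str)
  str.gsub("\r\n", "\n")
end

# Hypothetical model of universal newline: CRLF and lone CR both become LF.
def universal_newline(str)
  str.gsub(/\r\n|\r/, "\n")
end

raw = "a\r\r\n" # the 4 bytes actually stored in a.txt

crt_only       = crt_text_mode(raw).bytes                    # => [97, 13, 10]
universal_only = universal_newline(raw).bytes                # => [97, 10, 10]
both           = universal_newline(crt_text_mode(raw)).bytes # => [97, 10]

p crt_only, universal_only, both
```

Under this model, [97, 13, 10] corresponds to the msvcrt conversion alone, [97, 10, 10] to universal newline alone, and [97, 10] to both being applied in sequence, which matches the XXX lines for "bom|utf-8".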

Actions #6

Updated by hsbt (Hiroshi SHIBATA) 11 days ago

  • Tags set to windows
Actions #7

Updated by hsbt (Hiroshi SHIBATA) 11 days ago

  • Tags changed from windows to win