Bug #20526
File.open(encoding: "bom|utf-8") converts "\r\n" to "\n" on Windows
Status:
Open
Assignee:
-
Target version:
-
ruby -v:
ruby 3.2.2 (2023-03-30 revision e51014f9c0) [x64-mingw-ucrt]
Tags:
Description
I'm not sure whether this is intentional behavior or not, but it seems that encoding: "utf-8" doesn't change newline conversion while encoding: "bom|utf-8" does:
File.write("a.txt", "a\r\n")
File.read("a.txt").bytes # => [97, 13, 10]
File.open("a.txt", encoding: "utf-8") {|f| f.read.bytes} # => [97, 13, 10]
File.open("a.txt", encoding: "bom|utf-8") {|f| f.read.bytes} # => [97, 10] XXX: \r\n -> \n
File.open("a.txt", encoding: "bom|utf-8", universal_newline: false) {|f| f.read.bytes} # => [97, 13, 10]
Note the XXX: line in the code above. Is this intentional behavior?
Updated by nobu (Nobuyoshi Nakada) 8 months ago
Probably a bug at push back after BOM look ahead.
BTW, on Windows, File.write and File.read are in text mode by default.
That file would be 4 bytes, "a\r\r\n" in binary.
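To illustrate the point about text mode, here is a minimal sketch that imitates the newline translation of a Windows text-mode write in plain Ruby, so it runs on any platform. The helper names simulate_text_mode_write and binary_round_trip are hypothetical and exist only for this example; the real translation happens inside the C runtime and Ruby's IO layer, not in a gsub.

```ruby
require "tmpdir"

# Hypothetical helper: imitates a Windows text-mode write, which expands
# every "\n" to "\r\n" on the way to disk. This is an illustration only.
def simulate_text_mode_write(str)
  str.gsub("\n", "\r\n")
end

# Hypothetical helper: writes and reads in binary mode, where no newline
# translation is applied on any platform.
def binary_round_trip(str)
  Dir.mktmpdir do |dir|
    path = File.join(dir, "a.txt")
    File.binwrite(path, str)
    File.binread(path)
  end
end

simulate_text_mode_write("a\r\n").bytes # => [97, 13, 13, 10]
binary_round_trip("a\r\n").bytes        # => [97, 13, 10]
```

Using File.binwrite to prepare the test file would leave exactly 3 bytes on disk and remove the extra "\r" from the picture, which may make the read-side results easier to compare.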
Updated by nobu (Nobuyoshi Nakada) 8 months ago
- Backport changed from 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN to 3.1: REQUIRED, 3.2: REQUIRED, 3.3: REQUIRED
Updated by YO4 (Yoshinao Muramatsu) 23 days ago
There is similar strangeness around encoding specifiers.
Preparation
RUBY_VERSION # => "3.3.5"
File.write("a.txt", "a\r\n")
File.binread("a.txt").bytes # => [97, 13, 13, 10]
Experiments
File.open("a.txt") {|f| f.read.bytes} # => [97, 13, 10] # expected(msvcrt[_*] newline)
File.open("a.txt", "r:utf-8") {|f| f.read.bytes} # => [97, 13, 10] # expected
File.open("a.txt", "r", encoding: "utf-8") {|f| f.read.bytes} # => [97, 13, 10] # expected
File.open("a.txt", encoding: "utf-8") {|f| f.read.bytes} # => [97, 10, 10] # XXX: universal newline enabled?
The omission of the mode parameter seems to enable universal newline.
File.open("a.txt", "rt:utf-8") {|f| f.read.bytes} # => [97, 10, 10] # expected(universal newline)
File.open("a.txt", "rt:bom|utf-8") {|f| f.read.bytes} # => [97, 10] # XXX
File.open("a.txt", "rt", encoding: "utf-8") {|f| f.read.bytes} # => [97, 10, 10] # expected(universal newline)
File.open("a.txt", "rt", encoding: "bom|utf-8") {|f| f.read.bytes} # => [97, 10] # XXX
XXX: This is odd because universal newline and msvcrt newline appear to be cooperating.
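The three distinct results above can be reproduced by a rough model of the two conversions applied to the 4 on-disk bytes. This is only a sketch under stated assumptions: msvcrt_text_read and universal_newline are hypothetical helpers that approximate each conversion with gsub, and they say nothing about where in Ruby's IO code the conversions are actually applied.

```ruby
# Bytes on disk, as reported by File.binread in the preparation step.
RAW = "a\r\r\n"

# Hypothetical model of an msvcrt text-mode read: "\r\n" becomes "\n",
# a lone "\r" is left alone.
def msvcrt_text_read(str)
  str.gsub("\r\n", "\n")
end

# Hypothetical model of universal newline: both "\r\n" and a lone "\r"
# become "\n".
def universal_newline(str)
  str.gsub(/\r\n?/, "\n")
end

msvcrt_text_read(RAW).bytes                    # => [97, 13, 10]
universal_newline(RAW).bytes                   # => [97, 10, 10]
universal_newline(msvcrt_text_read(RAW)).bytes # => [97, 10]
```

Under this model, the [97, 10] results marked XXX match the case where both conversions run in sequence, which is consistent with the observation that the two appear to be cooperating.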