Bug #20819: IO#readline does not process newlines correctly for wide character encodings - Ruby - Ruby Issue Tracking System


Added by javanthropus (Jeremy Bopp) over 1 year ago. Updated over 1 year ago.

Status: Open
Assignee:
Target version:
ruby -v: ruby 3.3.4 (2024-07-09 revision be1089c8ec) [x86_64-linux]
Backport: 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN

[ruby-core:119633]

Description

When no character conversion is performed, IO#readline in paragraph mode (an empty string as the separator) treats newlines as single ASCII bytes, even when the stream's encoding is not ASCII compatible. When character conversion is involved, however, newline handling is correct, even when converting between two ASCII-incompatible encodings.

require "tempfile"

Tempfile.open(binmode: true) do |f|
  f.set_encoding("utf-16le")
  f.write("\n\n\n\nhello\n\nworld")
  f.rewind

  # No character conversion case.
  # Expecting "hello\n\n".encode(Encoding::UTF_16LE)
  f.readline("")   # => "\0".force_encoding(Encoding::UTF_16LE) + "\n\n\nhello\n\nworld".encode(Encoding::UTF_16LE)

  f.set_encoding("utf-16le:utf-32le")
  f.rewind

  # Character conversion case.
  f.readline("")   # => "hello\n\n".encode(Encoding::UTF_32LE)
end

In the failing case, the first byte of the input is 0x0A, the low byte of the UTF-16LE newline character, so it looks like an ASCII newline. It is discarded per the normal behavior of reading paragraphs, but the null byte that follows it is not consumed, even though it is the second half of the same UTF-16LE newline character. This leaves a leading, invalid null byte in the output of IO#readline. Furthermore, the newlines between "hello" and "world" are not recognized as the pair of newline characters that ends the first paragraph, because they are not adjacent ASCII newlines: there is a null byte between them.
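The byte layout described above can be checked directly. The following is only a minimal sketch, not the code path IO#readline actually uses: `paragraph_break_bytes` and `adjacent_ascii_newlines?` are hypothetical helpers introduced here for illustration, and the second one merely imitates a scan that looks for two adjacent ASCII newline bytes.

```ruby
# Hypothetical helper: the raw bytes of "\n\n" in the given encoding.
def paragraph_break_bytes(encoding)
  "\n\n".encode(encoding).bytes
end

# Hypothetical helper: imitates a byte-wise scan that only recognizes
# two adjacent ASCII newline bytes (0x0A 0x0A) as a paragraph break.
# This is an assumption for illustration, not Ruby's internal logic.
def adjacent_ascii_newlines?(bytes)
  bytes.each_cons(2).any? { |a, b| a == 0x0A && b == 0x0A }
end

utf16 = paragraph_break_bytes(Encoding::UTF_16LE)
utf8  = paragraph_break_bytes(Encoding::UTF_8)

p utf16                            # => [10, 0, 10, 0]
p utf8                             # => [10, 10]
p adjacent_ascii_newlines?(utf16)  # => false
p adjacent_ascii_newlines?(utf8)   # => true
```

In UTF-16LE, every newline is the two-byte sequence 0x0A 0x00, so two consecutive newline characters never produce two adjacent 0x0A bytes. That is consistent with both symptoms reported above: the stray null byte at the start of the result and the missed paragraph break after "hello".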

#1 Updated by nobu (Nobuyoshi Nakada) over 1 year ago

Subject changed from "IO#readline does not process newlines correctly for non-ASCII compatible encodings" to "IO#readline does not process newlines correctly for wide character encodings"
