Bug #20663 (closed)

Reading characters from IO does not recover gracefully from bad data pushed via IO#ungetc

Added by javanthropus (Jeremy Bopp) 3 months ago. Updated 3 months ago.

Status: Rejected
Assignee: -
Target version: -
ruby -v: ruby 3.3.4 (2024-07-09 revision be1089c8ec) [x86_64-linux]
[ruby-core:118782]

Description

If bytes that amount to at least two invalid characters in the internal encoding of an IO object are pushed into the internal buffer with IO#ungetc, reading from the stream returns invalid characters composed of bytes from both the internal buffer and the converted stream data, even when the next character in the stream itself is completely valid.

require 'tempfile'

char_bytes = Tempfile.open(encoding: 'utf-8:utf-16le') do |f|
  f.write("🍣")
  f.rewind
  # Push back all but the last byte of the character's UTF-16LE encoding.
  f.ungetc("🍣".encode('utf-16le').b[0..-2])
  f.each_char.map(&:bytes)
end
puts char_bytes.inspect

The above outputs:

[[60, 216], [99, 60], [216, 99], [223]]

I expect it to output:

[[60, 216], [99], [60, 216, 99, 223]]
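
For reference, "🍣" (U+1F363) encodes to the surrogate pair D83C DF63 in UTF-16LE, so the byte sequences involved are:

p "🍣".encode('utf-16le').bytes           # => [60, 216, 99, 223] (what the stream converts to)
p "🍣".encode('utf-16le').b[0..-2].bytes  # => [60, 216, 99]      (what is pushed back via ungetc)

[60, 216] is a lone high surrogate and [99] is the truncated remainder, which is why I group the expected output that way.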

In other words, I expect it to first completely drain the internal character buffer, returning as many characters as necessary (invalid or otherwise), before reading from the stream, converting, and returning the next character. After a bit of testing, the behavior appears to be as follows (see the sketch after this list):

  1. Return the next character from the internal buffer if it is validly encoded or if the buffer holds more than one character, valid or not
  2. Otherwise, read another character from the stream, convert it, and append the converted bytes to the buffer
  3. Go back to step 1
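
The following rough simulation reproduces the observed output under that model. It is only a sketch of my reading of the behavior, not the actual IO internals; the buffer, stream, and valid names and the 2-byte code unit handling are my own:

# A hypothetical model of the steps above, not the actual IO internals.
buffer = "🍣".encode('utf-16le').b[0..-2].bytes  # bytes pushed back via ungetc
stream = ["🍣".encode('utf-16le').bytes]         # already-converted stream characters
valid  = ->(bytes) { bytes.pack('C*').force_encoding('utf-16le').valid_encoding? }

out = []
loop do
  char = buffer.first(2)                         # a UTF-16LE code unit is 2 bytes
  if buffer.size > 2 || (!char.empty? && valid.(char))
    out << buffer.shift(char.size)               # step 1: emit from the buffer
  elsif stream.any?
    buffer.concat(stream.shift)                  # step 2: convert another character
  else
    out << buffer.dup unless buffer.empty?       # EOF: flush the invalid remainder
    break
  end
end
p out  # => [[60, 216], [99, 60], [216, 99], [223]]

Running this prints [[60, 216], [99, 60], [216, 99], [223]], matching the actual output above.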

Maybe this is desired behavior, but I can't understand why. It can't recover from the kind of erroneous data demonstrated in the example above.
