Bug #20919

IO#seek and IO#pos= do not clear the character buffer in some cases while transcoding

Added by javanthropus (Jeremy Bopp) 3 months ago. Updated 12 days ago.

Status:
Open
Assignee:
-
Target version:
-
ruby -v:
ruby 3.4.0dev (2024-11-28T12:38:16Z master 3af1a04741) +PRISM [x86_64-linux]
[ruby-core:120043]

Description

When transcoding characters, IO#seek and IO#pos= only clear the internal character buffer if IO#getc is called first:

require 'tempfile'

Tempfile.open(encoding: 'utf-8:utf-16le') do |f|
  f.write('0123456789')
  f.rewind

  f.ungetc('a'.encode('utf-16le'))

  # Character buffer WILL NOT be cleared
  f.seek(2, :SET)

  f.getc       # => 'a'.encode('utf-16le'); should be '2'.encode('utf-16le')
end

Tempfile.open(encoding: 'utf-8:utf-16le') do |f|
  f.write('0123456789')
  f.rewind

  f.ungetc('a'.encode('utf-16le'))

  # Character buffer WILL NOT be cleared
  f.pos = 2

  f.getc       # => 'a'.encode('utf-16le'); should be '2'.encode('utf-16le')
end

Tempfile.open(encoding: 'utf-8:utf-16le') do |f|
  f.write('0123456789')
  f.rewind

  # Added a call to #getc here
  f.getc

  f.ungetc('a'.encode('utf-16le'))

  # Character buffer WILL be cleared now
  f.seek(2, :SET)
  # Same behavior for #pos=
  #f.pos = 2

  f.getc       # => '2'.encode('utf-16le')
end
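
The three cases above can be folded into one script that prints what each combination actually returns on the Ruby it is run with. This is only a sketch: `char_after_reposition` and `REPOSITIONERS` are hypothetical names introduced here, not part of Ruby, and the result is converted back to UTF-8 purely to make the output readable.

```ruby
require 'tempfile'

# Hypothetical helper, not part of Ruby: pushes one character back with
# ungetc, repositions the stream, and returns the next character read,
# converted to UTF-8 so it is easy to compare.
def char_after_reposition(reposition, prime_with_getc: false)
  Tempfile.open(encoding: 'utf-8:utf-16le') do |f|
    f.write('0123456789')
    f.rewind
    f.getc if prime_with_getc          # the call that changes the outcome
    f.ungetc('a'.encode('utf-16le'))
    reposition.call(f)
    f.getc.encode('utf-8')
  end
end

# The two repositioning calls named in the subject.
REPOSITIONERS = {
  'seek(2, :SET)' => ->(f) { f.seek(2, :SET) },
  'pos = 2'       => ->(f) { f.pos = 2 }
}.freeze

REPOSITIONERS.each do |label, reposition|
  [false, true].each do |primed|
    char = char_after_reposition(reposition, prime_with_getc: primed)
    puts format('%-14s getc first: %-5s next char: %p', label, primed, char)
  end
end
```

On an affected build, the rows without a prior getc would be expected to show "a" and the rows with one "2"; a build where the buffer is always cleared should show "2" in every row.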

Updated by javanthropus (Jeremy Bopp) 3 months ago

  • Subject changed from IO#seek does not clear the character buffer in some cases while transcoding to IO#seek and IO#pos= do not clear the character buffer in some cases while transcoding
  • Description updated (diff)

Updated by javanthropus (Jeremy Bopp) 3 months ago

  • Description updated (diff)

Updated by mjrzasa (Maciek Rząsa) 18 days ago · Edited

I've reproduced it without transcoding:

require 'tempfile'

Tempfile.open() do |f|
  f.write('0123456789')
  f.rewind
  f.ungetc('a')
  # Character buffer WILL NOT be cleared
  f.seek(2, :SET)
  f.getc       # => 'a'
end
# => 'a'

Updated by mjrzasa (Maciek Rząsa) 17 days ago

It works OK with StringIO (unsurprisingly)

require 'stringio'

StringIO.open() do |f|
  f.write('0123456789')
  f.rewind
  f.ungetc('a')
  # Character buffer WILL NOT be cleared
  f.seek(2)
  f.getc
end
# => "1"
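
The "1" here is probably worth a note, since it is neither 'a' nor '2'. A minimal sketch of what seems to be going on, assuming StringIO#ungetc at position 0 prepends to the underlying string rather than using a separate buffer (the variable names are only for illustration):

```ruby
require 'stringio'

io = StringIO.new
io.write('0123456789')
io.rewind
io.ungetc('a')

# At position 0 there is nothing to overwrite, so the pushed-back
# character is prepended to the underlying string.
contents = io.string.dup   # "a0123456789"

io.seek(2)
third = io.getc            # "1", the character at index 2 of the new contents

puts contents
puts third
```

So the seek lands on index 2 of "a0123456789", which is why StringIO is not directly comparable to the File case.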

Updated by mjrzasa (Maciek Rząsa) 17 days ago

I reran the tests on 3.5.0, and the issue is indeed related to transcoding:

puts "Hello dev-ruby! #{RUBY_VERSION}"

require 'tempfile'
Tempfile.open() do |f|
  f.write('0123456789')
  f.rewind
  f.ungetc('a')
  # No encoding option: the buffer is cleared here
  f.seek(2, :SET)
  puts f.getc       # prints 2
end

Tempfile.open(encoding: 'utf-8:utf-16le') do |f|
  f.write('0123456789')
  f.rewind

  f.ungetc('a'.encode('utf-16le'))

  # Character buffer WILL NOT be cleared
  f.seek(2, :SET)

  puts f.getc       # => 'a'.encode('utf-16le'); should be '2'.encode('utf-16le')
end

Tempfile.open() do |f|
  f.write('0123456789')
  f.rewind

  f.ungetc('a'.encode('utf-16le'))

  # No encoding option: the buffer is cleared here
  f.seek(2, :SET)

  puts f.getc       # prints 2
end

Tempfile.open(encoding: 'utf-8:utf-16le') do |f|
  f.write('0123456789')
  f.rewind

  f.ungetc('a')

  # Character buffer WILL NOT be cleared
  f.seek(2, :SET)

  puts f.getc       # prints a2; should print 2
end

Hello dev-ruby! 3.5.0
2
a
2
a2

So the issue occurs only when an encoding is set in the call to Tempfile.open. Also, when a character that was not encoded as UTF-16LE was pushed back with ungetc, getc returned two characters.
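
To check whether a given stream is in the transcoding situation at all, its encodings can be inspected directly. A small sketch follows; `encoding_summary` is a hypothetical helper written for this comment, not an existing API, and it only relies on IO#external_encoding, IO#internal_encoding, and String#encoding.

```ruby
require 'tempfile'

# Hypothetical helper: reports the encodings a Tempfile ends up with for
# the given open options, plus the encoding of a character read from it.
def encoding_summary(**open_opts)
  Tempfile.open(**open_opts) do |f|
    f.write('0')
    f.rewind
    {
      external: f.external_encoding,
      internal: f.internal_encoding,
      getc: f.getc.encoding
    }
  end
end

# With 'utf-8:utf-16le' the stream has an internal encoding, so reads
# are transcoded and getc returns UTF-16LE strings.
p encoding_summary(encoding: 'utf-8:utf-16le')

# Without the option the result depends on the process defaults
# (Encoding.default_external and Encoding.default_internal).
p encoding_summary
```

A non-nil internal encoding is the case where the runs above misbehaved.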

Updated by mjrzasa (Maciek Rząsa) 15 days ago

I have a draft of a fix for this one: https://github.com/ruby/ruby/pull/12714

Updated by mjrzasa (Maciek Rząsa) 12 days ago

I believe the fix is ready for review: https://github.com/ruby/ruby/pull/12714
Some CI jobs were failing (WebAssembly/Cygwin), but the failures seem unrelated to my changes and are inconsistent (after rebasing, Cygwin passed and WebAssembly failed).
