When transcoding characters, IO#seek and IO#pos= only clear the internal character buffer if IO#getc was called before IO#ungetc:
require 'tempfile'

Tempfile.open(encoding: 'utf-8:utf-16le') do |f|
  f.write('0123456789')
  f.rewind
  f.ungetc('a'.encode('utf-16le'))
  # Character buffer WILL NOT be cleared
  f.seek(2, :SET)
  f.getc # => 'a'.encode('utf-16le'); should be '2'.encode('utf-16le')
end

Tempfile.open(encoding: 'utf-8:utf-16le') do |f|
  f.write('0123456789')
  f.rewind
  f.ungetc('a'.encode('utf-16le'))
  # Character buffer WILL NOT be cleared
  f.pos = 2
  f.getc # => 'a'.encode('utf-16le'); should be '2'.encode('utf-16le')
end

Tempfile.open(encoding: 'utf-8:utf-16le') do |f|
  f.write('0123456789')
  f.rewind
  # Added a call to #getc here
  f.getc
  f.ungetc('a'.encode('utf-16le'))
  # Character buffer WILL be cleared now
  f.seek(2, :SET)
  # Same behavior for #pos=
  #f.pos = 2
  f.getc # => '2'.encode('utf-16le')
end
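The three cases above can also be run in a single loop. This is a sketch, not part of the original report: the helper `char_after_reposition` and its `reposition:`/`prime:` keywords are hypothetical names, and `prime: true` stands for the extra `#getc` call made before `#ungetc` in the third example. It prints whether the next `#getc` returned `'2'` (buffer cleared) rather than the pushed-back `'a'`; no output is shown because the result depends on whether the Ruby being run already contains a fix.

```ruby
require 'tempfile'

# Hypothetical helper, not from the report: writes '0123456789' through a
# transcoding Tempfile, optionally calls #getc once ("priming"), pushes back a
# UTF-16LE 'a', repositions to offset 2 and returns the next #getc result.
def char_after_reposition(reposition:, prime:)
  Tempfile.open(encoding: 'utf-8:utf-16le') do |f|
    f.write('0123456789')
    f.rewind
    f.getc if prime # the extra call from the third example
    f.ungetc('a'.encode('utf-16le'))
    case reposition
    when :seek then f.seek(2, :SET)
    when :pos  then f.pos = 2
    else raise ArgumentError, "unknown reposition: #{reposition.inspect}"
    end
    f.getc # Tempfile.open returns the block's value
  end
end

expected = '2'.encode('utf-16le')
[:seek, :pos].each do |how|
  [false, true].each do |prime|
    got = char_after_reposition(reposition: how, prime: prime)
    puts format('%-5s prime=%-5s got=%-8s cleared=%s',
                how, prime, got.inspect, got == expected)
  end
end
```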
Subject changed from "IO#seek does not clear the character buffer in some cases while transcoding" to "IO#seek and IO#pos= do not clear the character buffer in some cases while transcoding"
I reran the tests on 3.5.0, and the issue is indeed related to transcoding:
puts "Hello dev-ruby! #{RUBY_VERSION}"
require 'tempfile'
Tempfile.open do |f|
  f.write('0123456789')
  f.rewind
  f.ungetc('a')
  # No transcoding: #seek discards the pushed-back 'a'
  f.seek(2, :SET)
  puts f.getc # prints 2, as expected
end
Tempfile.open(encoding: 'utf-8:utf-16le') do |f|
  f.write('0123456789')
  f.rewind
  f.ungetc('a'.encode('utf-16le'))
  # Character buffer WILL NOT be cleared
  f.seek(2, :SET)
  puts f.getc # => 'a'.encode('utf-16le'); should be '2'.encode('utf-16le')
end
Tempfile.open do |f|
  f.write('0123456789')
  f.rewind
  f.ungetc('a'.encode('utf-16le'))
  # No transcoding: #seek discards the pushed-back bytes
  f.seek(2, :SET)
  puts f.getc # prints 2, as expected
end
Tempfile.open(encoding: 'utf-8:utf-16le') do |f|
  f.write('0123456789')
  f.rewind
  f.ungetc('a') # a UTF-8 'a', not a UTF-16LE one
  # Character buffer WILL NOT be cleared
  f.seek(2, :SET)
  puts f.getc # prints a2; should be '2'.encode('utf-16le')
end
Output:

Hello dev-ruby! 3.5.0
2
a
2
a2
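A condensed version of the four runs above is sketched below; `getc_after_seek` is a hypothetical helper name, not something from this thread. Printing `#inspect`, the encoding and the byte size of each result should make it easier to see what plain `puts` renders as `a` or `a2` in the transcoding cases. Only the two non-transcoding results (`2`) are taken from the output above; the transcoding results depend on the Ruby version, so they are not shown.

```ruby
require 'tempfile'

# Hypothetical helper: reproduces one of the four runs above.
# transcode: whether the Tempfile is opened with 'utf-8:utf-16le'
# pushback:  the string handed to #ungetc before seeking to offset 2
def getc_after_seek(transcode:, pushback:)
  opts = transcode ? { encoding: 'utf-8:utf-16le' } : {}
  Tempfile.open(**opts) do |f|
    f.write('0123456789')
    f.rewind
    f.ungetc(pushback)
    f.seek(2, :SET)
    f.getc
  end
end

[false, true].each do |transcode|
  ['a', 'a'.encode('utf-16le')].each do |pushback|
    got = getc_after_seek(transcode: transcode, pushback: pushback)
    # #inspect, the encoding and the byte size reveal what #puts hides
    puts "transcode=#{transcode} pushback=#{pushback.encoding}: " \
         "#{got.inspect} (#{got.encoding}, #{got.bytesize} bytes)"
  end
end
```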
So the issue happens only when an encoding is set in `.open`. Also, when a character not encoded as UTF-16LE was passed to `ungetc`, `getc` returned two characters (`a2`).
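Until a fix is available, one defensive option (an assumption, not something proposed in this thread) is to convert the character to the IO's read encoding before pushing it back, so the transcoding buffer receives a full UTF-16LE code unit instead of the single byte `a`. The helper `ungetc_in_read_encoding` is hypothetical. It only addresses the two-character result; it does not make `#seek` or `#pos=` discard the pushed-back character.

```ruby
require 'tempfile'

# Hypothetical helper, not part of Ruby: #getc returns strings in the
# internal encoding when one is set, otherwise in the external encoding,
# so the pushed-back character is converted to that encoding first.
def ungetc_in_read_encoding(io, str)
  target = io.internal_encoding || io.external_encoding || Encoding.default_external
  io.ungetc(str.encode(target))
end

Tempfile.open(encoding: 'utf-8:utf-16le') do |f|
  f.write('0123456789')
  f.rewind
  ungetc_in_read_encoding(f, 'a') # pushes "a\x00" rather than "a"
  p f.getc == 'a'.encode('utf-16le') # => true
end
```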
I believe the fix is ready for review: https://github.com/ruby/ruby/pull/12714
Some CI jobs (WebAssembly and Cygwin) were failing, but the failures do not seem related to my changes, and they are inconsistent: after rebasing, Cygwin passed and WebAssembly failed.