Bug #20919
IO#seek and IO#pos= do not clear the character buffer in some cases while transcoding
Status:
Open
Assignee:
-
Target version:
-
ruby -v:
ruby 3.4.0dev (2024-11-28T12:38:16Z master 3af1a04741) +PRISM [x86_64-linux]
Description
When transcoding characters, IO#seek and IO#pos= only clear the internal character buffer if IO#getc is called first:
require 'tempfile'

Tempfile.open(encoding: 'utf-8:utf-16le') do |f|
  f.write('0123456789')
  f.rewind
  f.ungetc('a'.encode('utf-16le'))
  # Character buffer WILL NOT be cleared
  f.seek(2, :SET)
  f.getc # => 'a'.encode('utf-16le'); should be '2'.encode('utf-16le')
end

Tempfile.open(encoding: 'utf-8:utf-16le') do |f|
  f.write('0123456789')
  f.rewind
  f.ungetc('a'.encode('utf-16le'))
  # Character buffer WILL NOT be cleared
  f.pos = 2
  f.getc # => 'a'.encode('utf-16le'); should be '2'.encode('utf-16le')
end

Tempfile.open(encoding: 'utf-8:utf-16le') do |f|
  f.write('0123456789')
  f.rewind
  # Added a call to #getc here
  f.getc
  f.ungetc('a'.encode('utf-16le'))
  # Character buffer WILL be cleared now
  f.seek(2, :SET)
  # Same behavior for #pos=
  #f.pos = 2
  f.getc # => '2'.encode('utf-16le')
end
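The three cases above differ only in how the position is changed and whether #getc is called before #ungetc. A minimal sketch that condenses them into one helper follows. The name `char_after_reposition` and its keyword arguments are hypothetical, and the result of the unprimed cases depends on whether the running Ruby is affected by this bug:

```ruby
require 'tempfile'

ENC = 'utf-16le'

# Hypothetical helper (not part of Ruby): pushes back 'a', repositions to
# offset 2, and returns whatever #getc reads next.
#   prime: call #getc once before #ungetc
#   via:   :seek uses IO#seek, anything else uses IO#pos=
def char_after_reposition(prime:, via:)
  Tempfile.open(encoding: "utf-8:#{ENC}") do |f|
    f.write('0123456789')
    f.rewind
    f.getc if prime
    f.ungetc('a'.encode(ENC))
    if via == :seek
      f.seek(2, :SET)
    else
      f.pos = 2
    end
    f.getc
  end
end

expected = '2'.encode(ENC)
[[false, :seek], [false, :pos], [true, :seek], [true, :pos]].each do |prime, via|
  got = char_after_reposition(prime: prime, via: via)
  status = got == expected ? 'buffer cleared' : 'stale character returned'
  puts "prime=#{prime} via=#{via}: bytes=#{got.bytes.inspect} (#{status})"
end
```

Comparing against `'2'.encode(ENC)` rather than printing the character avoids depending on how the terminal renders UTF-16LE bytes.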
Updated by javanthropus (Jeremy Bopp) 3 months ago
- Subject changed from IO#seek does not clear the character buffer in some cases while transcoding to IO#seek and IO#pos= do not clear the character buffer in some cases while transcoding
- Description updated (diff)
Updated by mjrzasa (Maciek Rząsa) 18 days ago
· Edited
I've reproduced it without transcoding:
Tempfile.open() do |f|
  f.write('0123456789')
  f.rewind
  f.ungetc('a')
  # Character buffer WILL NOT be cleared
  f.seek(2, :SET)
  f.getc # => 'a'
end
# => 'a'
Updated by mjrzasa (Maciek Rząsa) 17 days ago
It works OK with StringIO (unsurprisingly):
StringIO.open() do |f|
  f.write('0123456789')
  f.rewind
  f.ungetc('a')
  # Seek to absolute position 2
  f.seek(2)
  f.getc
end
# => "1"
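The "1" above (rather than "2") is consistent with how StringIO#ungetc behaves at position 0: as far as I can tell, it inserts the character into the underlying string instead of keeping it in a separate buffer, so offset 2 of the modified string holds "1". A minimal sketch that makes this visible (the variable names are illustrative only):

```ruby
require 'stringio'

io = StringIO.new
io.write('0123456789')
io.rewind
io.ungetc('a')               # at position 0 the character is prepended to the string
underlying = io.string.dup   # snapshot of the backing string after ungetc
io.seek(2)
char = io.getc

puts underlying  # backing string, now starting with the pushed-back 'a'
puts char        # character at offset 2 of the modified string
```

So StringIO is not directly comparable to the Tempfile cases: there is no stale buffer to clear, but the offsets shift by one.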
Updated by mjrzasa (Maciek Rząsa) 17 days ago
I reran the tests on 3.5.0, and it is indeed related to transcoding:
puts "Hello dev-ruby! #{RUBY_VERSION}"
require 'tempfile'

# No encoding on .open, plain 'a' pushed back
Tempfile.open() do |f|
  f.write('0123456789')
  f.rewind
  f.ungetc('a')
  f.seek(2, :SET)
  puts f.getc # prints 2 (expected)
end

# Encoding on .open, UTF-16LE 'a' pushed back
Tempfile.open(encoding: 'utf-8:utf-16le') do |f|
  f.write('0123456789')
  f.rewind
  f.ungetc('a'.encode('utf-16le'))
  # Character buffer WILL NOT be cleared
  f.seek(2, :SET)
  puts f.getc # prints a; should be '2'.encode('utf-16le')
end

# No encoding on .open, UTF-16LE 'a' pushed back
Tempfile.open() do |f|
  f.write('0123456789')
  f.rewind
  f.ungetc('a'.encode('utf-16le'))
  f.seek(2, :SET)
  puts f.getc # prints 2 (expected)
end

# Encoding on .open, plain 'a' pushed back
Tempfile.open(encoding: 'utf-8:utf-16le') do |f|
  f.write('0123456789')
  f.rewind
  f.ungetc('a')
  # Character buffer WILL NOT be cleared
  f.seek(2, :SET)
  puts f.getc # prints a2; should be '2'.encode('utf-16le')
end
Hello dev-ruby! 3.5.0
2
a
2
a2
So the issue happens when an encoding is set on .open. Also, when a non-encoded character was ungetc-ed, getc returned two characters.
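Because puts writes the raw bytes of a UTF-16LE string, terminal output such as a2 can be hard to interpret. A minimal sketch that reruns the four combinations and reports the encoding, bytes, and character count of each #getc result instead; the helper name `probe` and the `CASES` table are hypothetical, and no particular output is assumed because it depends on the Ruby version:

```ruby
require 'tempfile'

TRANSCODE = { encoding: 'utf-8:utf-16le' }.freeze

# The four combinations: options passed to Tempfile.open, and the string pushed back.
CASES = [
  { open_opts: {},        pushed: 'a' },
  { open_opts: TRANSCODE, pushed: 'a'.encode('utf-16le') },
  { open_opts: {},        pushed: 'a'.encode('utf-16le') },
  { open_opts: TRANSCODE, pushed: 'a' }
].freeze

# Hypothetical helper: runs one combination and describes what #getc returned.
def probe(open_opts:, pushed:)
  Tempfile.open(**open_opts) do |f|
    f.write('0123456789')
    f.rewind
    f.ungetc(pushed)
    f.seek(2, :SET)
    char = f.getc
    { encoding: char.encoding, bytes: char.bytes, length: char.length }
  end
end

CASES.each { |c| p probe(**c) }
```

The `length` field shows whether #getc really returned two characters or a single character whose bytes merely print as two.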
Updated by mjrzasa (Maciek Rząsa) 15 days ago
I have a draft of a fix for this one https://github.com/ruby/ruby/pull/12714
Updated by mjrzasa (Maciek Rząsa) 12 days ago
I believe the fix is ready for review https://github.com/ruby/ruby/pull/12714
Some CI jobs were failing (WebAssembly/Cygwin), but the failures do not seem to be related to my changes, and they are inconsistent (after rebasing, Cygwin passed and WebAssembly failed).