Bug #3482
StringScanner#pos returns wrong character position if used with multibyte chars
Description
=begin
The StringScanner class from 1.9's stdlib works on bytes rather than on characters. That means that if you use the return value of StringScanner#pos to extract substrings from the original string, you get incorrect results:
irb(main):001:0> require "strscan"
=> true
irb(main):002:0> str = "abcädeföghi"
=> "abcädeföghi"
irb(main):003:0> ss = StringScanner.new(str)
=> #<StringScanner 0/13 @ "abc\xC3\xA4...">
irb(main):004:0> ss.scan_until(/ä/)
=> "abcä"
irb(main):005:0> ss.pos
=> 5
irb(main):006:0> ss.scan_until(/ö/)
=> "defö"
irb(main):007:0> ss.pos
=> 10
irb(main):008:0>
After the first scan_until I expected the position to be 4, and after the second to be 8, so by the end the reported position is off by 2 (one extra byte for each multibyte character scanned).
My Ruby version is ruby 1.9.1p378 (2010-01-10 revision 26273) [x86_64-linux], but I also get the same behaviour with 1.9.2-preview3 (ruby 1.9.2dev (2010-05-31 revision 28117) [x86_64-linux]).
=end
Updated by mame (Yusuke Endoh) over 14 years ago
- Status changed from Open to Rejected
=begin
Hi,
This is the specified behaviour. See the rdoc of StringScanner#pos.
FYI, IO#pos is also byte-oriented.
I guess this is because #pos is supposed to be byte-oriented.
--
Yusuke Endoh mame@tsg.ne.jp
=end
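Since #pos is byte-oriented by design, a character index has to be derived from it. The following is a minimal sketch of one way to do that, assuming a UTF-8 string and a Ruby version that provides String#byteslice; the helper name char_pos is hypothetical and not part of strscan.

```ruby
require "strscan"

# Hypothetical helper (not part of strscan): converts the scanner's byte
# offset into a character offset by counting the characters in the
# already-scanned prefix of the string.
def char_pos(scanner)
  scanner.string.byteslice(0, scanner.pos).length
end

str = "abcädeföghi"
ss = StringScanner.new(str)

ss.scan_until(/ä/)
puts "byte pos: #{ss.pos}, char pos: #{char_pos(ss)}"

ss.scan_until(/ö/)
puts "byte pos: #{ss.pos}, char pos: #{char_pos(ss)}"

# The character offset can be used to index the original string,
# while the byte offset pairs with String#byteslice.
puts str[0, char_pos(ss)]
puts str.byteslice(0, ss.pos)
```

With the string from the report this should give character positions 4 and 8 next to the byte positions 5 and 10. Later Ruby releases also provide StringScanner#charpos, which returns the character-based position directly, so the helper is only needed where that method is unavailable.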