Feature #2645

closed

Have a method in StringScanner which returns the position in characters rather than in bytes

Added by stefanocr (Stefano Crocco) almost 15 years ago. Updated about 12 years ago.

Status:
Rejected
Target version:
[ruby-core:27792]

Description

=begin
In Ruby 1.9, StringScanner#pos returns the position as a number of bytes. I read on the ruby-talk mailing list (http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/352809) that this is because working with character-based indexes would be too slow. However, I think it would be nice if StringScanner also provided a method that returned the position in characters (even if it were slow). As I see it, the situation is the same as with StringScanner#get_byte and StringScanner#getch. I think this would be useful because, when using StringScanner, you're usually interested in characters rather than bytes.
=end
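To make the request concrete, here is a minimal sketch of what such a method could compute. The helper name char_pos is hypothetical (it is one of the names floated in this thread, not an existing StringScanner method), and the sketch assumes String#byteslice is available and that the scanner is positioned on a character boundary.

```ruby
require 'strscan'

# Hypothetical helper, not part of StringScanner: counts the characters
# in the bytes consumed so far. Assumes String#byteslice exists and that
# scanner.pos falls on a character boundary.
def char_pos(scanner)
  scanner.string.byteslice(0, scanner.pos).length
end

ss = StringScanner.new("\u00E4\u00F6\u00FC!")  # "äöü!" in UTF-8
ss.scan(/../)                                  # consumes "äö"

byte_position = ss.pos        # position in bytes
char_position = char_pos(ss)  # position in characters
```

This copies the consumed prefix on every call, so it is slow for long inputs, which matches the performance concern mentioned in the description.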


Related issues 1 (0 open, 1 closed)

Is duplicate of Ruby master - Feature #1159: I want a method in StringScanner that returns a character-based index (Rejected, aamine (Minero Aoki), 02/14/2009)
Actions #1

Updated by murphy (Kornelius Kalnbach) almost 15 years ago

=begin
+1...but what to name it?

  • char_pos
  • chpos
  • index (like String#index)

by the way, the documentation for StringScanner#pos states:

In the 'terminated' position (i.e. the string
is exhausted), this value is the length of the string.

This is not true:

irb(main):002:0> s = StringScanner.new('äöü'); s.scan(/.*/); s.pos
=> 6
irb(main):003:0> s.string.length
=> 3
=end
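The mismatch in the irb session above comes from the difference between character length and byte size. As a small sketch (the variable names are only illustrative), the terminated position can be compared against both:

```ruby
require 'strscan'

s = StringScanner.new("\u00E4\u00F6\u00FC")  # "äöü" in UTF-8, 2 bytes per character
s.scan(/.*/)                                  # consume the whole string

terminated_pos = s.pos             # byte offset at the end of the string
char_length    = s.string.length   # number of characters
byte_length    = s.string.bytesize # number of bytes
```

At the terminated position, pos agrees with String#bytesize rather than String#length.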

Actions #2

Updated by naruse (Yui NARUSE) almost 15 years ago

  • Priority changed from Normal to 3

=begin
StringScanner's pos is related to IO#pos.

Feature #1159 is also about this (but in Japanese).

A problem is:
ss = StringScanner.new("äöü")
ss.get_byte
ss.char_pos #=> what is this result?

Moreover, I doubt the use case.
Can you describe the use case in more detail?

the documentation for StringScanner#pos states:
In the 'terminated' position (i.e. the string
is exhausted), this value is the length of the string.

thanks, I fixed the doc.
=end
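The ambiguity raised above can be illustrated with a short sketch. It assumes String#byteslice is available and only inspects the bytes consumed so far; it does not propose an answer to what char_pos should return.

```ruby
require 'strscan'

ss = StringScanner.new("\u00E4\u00F6\u00FC")  # "äöü" in UTF-8
ss.get_byte                                    # consumes only the first byte of "ä"

byte_position  = ss.pos
consumed       = ss.string.byteslice(0, byte_position)
on_boundary    = consumed.valid_encoding?  # is the consumed prefix well-formed UTF-8?
```

After get_byte the scanner sits in the middle of a multibyte character, so the consumed prefix is not valid UTF-8 and a character position has no obvious value.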

Actions #3

Updated by znz (Kazuhiro NISHIYAMA) over 14 years ago

  • Category set to ext
  • Target version set to 2.0.0

=begin

=end

Actions #4

Updated by znz (Kazuhiro NISHIYAMA) over 14 years ago

  • Status changed from Open to Feedback

=begin

=end

Actions #5

Updated by gettalong (Thomas Leitner) about 14 years ago

=begin
I had a similar problem: I wanted to extract a part of a StringScanner-backed string.

Consider the following use case:

  • The StringScanner ss is used to arrive at a certain position.
  • The current position is saved, i.e. start_pos = ss.pos.
  • Then ss is used to do some scanning, arriving at a new position: end_pos = ss.pos
  • Extracting the string between start_pos and end_pos using ss.string[start_pos..end_pos] does not work in case the range contains multibyte characters.

My work-around is the following:

   # Extract the part of the StringScanner +strscan+ backed string specified by the +range+. This
   # method works correctly under Ruby 1.8 and Ruby 1.9.
   def extract_string(range, strscan)
     result = nil
     if RUBY_VERSION >= '1.9'
       begin
         enc = strscan.string.encoding
         strscan.string.force_encoding('ASCII-8BIT')
         result = strscan.string[range].force_encoding(enc)
       ensure
         strscan.string.force_encoding(enc)
       end
     else
       result = strscan.string[range]
     end
     result
   end

=end

Updated by naruse (Yui NARUSE) almost 13 years ago

  • Description updated (diff)

gettalong wrote:

I had a similar problem: I wanted to extract a part of a StringScanner-backed string.

Consider the following use case:

  • The StringScanner ss is used to arrive at a certain position.
  • The current position is saved, i.e. start_pos = ss.pos.
  • Then ss is used to do some scanning, arriving at a new position: end_pos = ss.pos
  • Extracting the string between start_pos and end_pos using ss.string[start_pos..end_pos] does not work in case the range contains multibyte characters.

You can use String#byteslice.
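A minimal sketch of that suggestion, applied to the use case quoted above. The sample string and variable names are assumptions for illustration; the slice uses a start offset and a byte length, so the byte at end_pos itself is not included.

```ruby
require 'strscan'

ss = StringScanner.new("\u00E4\u00F6\u00FC xyz")  # "äöü xyz" in UTF-8

ss.scan(/./)         # arrive at a certain position (after "ä")
start_pos = ss.pos   # byte offset
ss.scan(/../)        # do some scanning (consumes "öü")
end_pos = ss.pos     # byte offset

# Byte-based extraction: returns the text scanned between the two positions.
by_bytes = ss.string.byteslice(start_pos, end_pos - start_pos)

# Character-based indexing with byte offsets picks the wrong characters.
by_chars = ss.string[start_pos...end_pos]
```

Because byteslice works on byte offsets directly, it avoids temporarily changing the encoding of the scanned string as the workaround does.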

Updated by mame (Yusuke Endoh) over 12 years ago

  • Assignee set to naruse (Yui NARUSE)

Updated by naruse (Yui NARUSE) about 12 years ago

  • Status changed from Feedback to Rejected