Feature #2645: Have a method in StringScanner which returns the position in characters rather than in bytes - Ruby - Ruby Issue Tracking System

Feature #2645

closed

Have a method in StringScanner which returns the position in characters rather than in bytes

Added by stefanocr (Stefano Crocco) over 15 years ago. Updated almost 13 years ago.

Status:

Rejected

Assignee:

naruse (Yui NARUSE)

Target version:

2.0.0

[ruby-core:27792]

Description

=begin
In Ruby 1.9, StringScanner#pos returns the position as a number of bytes. According to a thread on the ruby-talk mailing list (http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/352809), this is because working with character-based indexes would be too slow. However, I think it would be nice if StringScanner also provided a method that returns the position in characters, even if it is slow. As I see it, the situation is the same as with StringScanner#get_byte and StringScanner#getch. This would be useful because, when using StringScanner, you are usually interested in characters rather than bytes.
=end
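To make the request concrete, here is a minimal sketch of what such a method could compute. The helper name `char_position` is hypothetical and not part of StringScanner; it assumes a UTF-8 string and simply counts the characters in the already-scanned prefix, which is linear in the prefix length.

```ruby
require 'strscan'

# Hypothetical helper (not a StringScanner method): derive the character
# position from the byte position by counting the characters in the
# prefix that has already been scanned. Cost grows with the prefix length.
def char_position(scanner)
  scanner.string.byteslice(0, scanner.pos).length
end

ss = StringScanner.new('äöü')
ss.getch                      # consumes one character, "ä" (2 bytes in UTF-8)
byte_pos = ss.pos             # position in bytes
char_pos = char_position(ss)  # position in characters
puts "byte position: #{byte_pos}, character position: #{char_pos}"
```

The linear cost of the prefix count is the kind of slowness the mailing list thread refers to.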

Related issues 1 (0 open — 1 closed)

Updated by murphy (Kornelius Kalnbach) over 15 years ago

=begin
+1, but what should it be called?

char_pos
chpos
index (like String#index)

By the way, the documentation for StringScanner#pos states:

In the 'terminated' position (i.e. the string
is exhausted), this value is the length of the string.

This is not true:

irb(main):002:0> s = StringScanner.new('äöü'); s.scan(/.*/); s.pos
=> 6
irb(main):003:0> s.string.length
=> 3
=end
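The irb session above can be restated as a small script. This is only a sketch, assuming a UTF-8 source file; it shows that the terminated position matches the byte size of the string rather than its length in characters.

```ruby
require 'strscan'

s = StringScanner.new('äöü')
s.scan(/.*/)                     # consume the whole string
at_end      = s.eos?             # scanner is in the 'terminated' position
final_pos   = s.pos              # byte offset of the scan pointer
byte_length = s.string.bytesize  # size of the string in bytes
char_length = s.string.length    # size of the string in characters
puts "pos=#{final_pos} bytesize=#{byte_length} length=#{char_length}"
```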

Updated by naruse (Yui NARUSE) over 15 years ago

Priority changed from Normal to 3

=begin
StringScanner's pos is related to IO#pos.

Feature #1159 is also about this (but in Japanese).

A problem is:
ss = StringScanner.new("äöü")
ss.get_byte
ss.char_pos #=> what is this result?

Also, I am not sure about the use case.
Can you describe it in more detail?

the documentation for StringScanner#pos states:
In the 'terminated' position (i.e. the string
is exhausted), this value is the length of the string.

Thanks, I fixed the doc.
=end
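The ambiguity raised above can be observed directly. The following sketch assumes a UTF-8 string: after get_byte the scan pointer sits in the middle of "ä", so the scanned prefix is not a valid character sequence and there is no obvious character position to report.

```ruby
require 'strscan'

ss = StringScanner.new('äöü')
ss.get_byte                               # consumes only the first byte of "ä"
byte_pos = ss.pos                         # byte offset after one byte
prefix   = ss.string.byteslice(0, byte_pos)
mid_character = !prefix.valid_encoding?   # prefix ends inside a character
puts "pos=#{byte_pos}, pointer inside a character: #{mid_character}"
```

Any character-based position method would have to pick a convention for this state, for example rounding down, rounding up, or raising an error.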

Updated by znz (Kazuhiro NISHIYAMA) over 15 years ago

Category set to ext
Target version set to 2.0.0

Updated by znz (Kazuhiro NISHIYAMA) over 15 years ago

Status changed from Open to Feedback

Updated by gettalong (Thomas Leitner) almost 15 years ago

=begin
I had a similar problem: I wanted to extract a part of a StringScanner-backed string.

Consider the following use case:

The StringScanner ss is used to arrive at a certain position.
The current position is saved, i.e. start_pos = ss.pos.
Then ss is used to do some scanning, arriving at a new position: end_pos = ss.pos
Extracting the string between start_pos and end_pos using ss.string[start_pos..end_pos] does not work if the range contains multibyte characters, because the positions are byte offsets while String#[] indexes by character.

My work-around is the following:

   # Extract the part of the StringScanner +strscan+ backed string specified by the +range+. This
   # method works correctly under Ruby 1.8 and Ruby 1.9.
   def extract_string(range, strscan)
     result = nil
     if RUBY_VERSION >= '1.9'
       begin
         enc = strscan.string.encoding
         strscan.string.force_encoding('ASCII-8BIT')
         result = strscan.string[range].force_encoding(enc)
       ensure
         strscan.string.force_encoding(enc)
       end
     else
       result = strscan.string[range]
     end
     result
   end

=end

#6 [ruby-core:43392]

Updated by naruse (Yui NARUSE) over 13 years ago

Description updated (diff)

gettalong wrote:

I had a similar problem: I wanted to extract a part of a StringScanner-backed string.

Consider the following use case:

The StringScanner ss is used to arrive at a certain position.

The current position is saved, ie. start_pos = ss.pos.

Then ss is used to do some scanning, arriving at a new position: end_pos = ss.pos

Extracting the string between start_pos and end_pos using ss.string[start_pos..end_pos] does not work in case the range contains multibyte characters.

You can use String#byteslice.
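A short sketch of the byteslice approach, assuming a UTF-8 string. The sample text and variable names are illustrative only, and the slice here stops just before end_pos (a start offset plus a byte length) rather than using the inclusive range from the quoted post.

```ruby
require 'strscan'

ss = StringScanner.new('aä-öü-z')
ss.scan(/[^-]*-/)          # arrive at a certain position (just past "aä-")
start_pos = ss.pos         # saved byte offset
ss.scan(/[^-]*/)           # scan "öü"
end_pos = ss.pos           # new byte offset

# byteslice takes byte offsets, so it matches what StringScanner#pos returns.
by_bytes = ss.string.byteslice(start_pos, end_pos - start_pos)

# String#[] takes character offsets, so byte offsets select the wrong text.
by_chars = ss.string[start_pos...end_pos]

puts "byteslice: #{by_bytes.inspect}, String#[]: #{by_chars.inspect}"
```

Unlike the force_encoding work-around, this does not mutate the scanned string, so it also works when the string is frozen.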

#7 [ruby-core:43594]

Updated by mame (Yusuke Endoh) over 13 years ago

Assignee set to naruse (Yui NARUSE)

#8 [ruby-core:48333]

Updated by naruse (Yui NARUSE) almost 13 years ago

Status changed from Feedback to Rejected
