Backport #4028
closed
substring selection and utf8 encoding problem
Added by barcala (Fco. Mario Barcala Rodríguez) about 14 years ago.
Updated over 5 years ago.
Description
=begin
Substring selection does not work with some UTF-8 encoded strings. Below is an example. The first substring is extracted correctly, but the second is not (strange characters appear at the end of the substring).
It seems to occur when the string includes letters with umlauts, accents, etc.
$ irb
ruby-1.9.1-p378 > word = "Ábaco"
=> "Ábaco"
ruby-1.9.1-p378 > substr = word[word.length-1,word.length]
=> "o"
ruby-1.9.1-p378 > word = "Coordinador de ONG's do País Valenciano"
=> "Coordinador de ONG's do País Valenciano"
ruby-1.9.1-p378 > substr = word[word.length-1,word.length]
=> "o\x00\x00\x01\x00\x01\x00\x00\x00"
=end
=begin
The same error occurs in ruby-1.9.1-p430
=end
=begin
It seems to be fixed in ruby-1.9.2-p0; I can't reproduce the error with that version.
=end
=begin
The example shown uses substring selection incorrectly. It should be:
ruby-1.9.1-p378 > word = "Ábaco"
=> "Ábaco"
ruby-1.9.1-p378 > substr = word[word.length-1,1]
=> "o"
ruby-1.9.1-p378 > word = "Coordinador de ONG's do País Valenciano"
=> "Coordinador de ONG's do País Valenciano"
ruby-1.9.1-p378 > substr = word[word.length-1,1]
=> "o"
This new example works fine, so the problem arises only when the second argument of the substring selection (the length) extends past the end of the string.
=end
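For reference, a minimal sketch of the behaviour one would expect from `String#[start, length]` on a Ruby that does not show this bug (the thread reports 1.9.2-p0 as working). It assumes the source is read as UTF-8; the variable names `overlong`, `exact`, and `last` are illustrative only. When the length runs past the end of the string, the slice should simply stop at the last character, so both forms should give the same result.

```ruby
# encoding: utf-8
# Sketch only: expected results on an interpreter without the bug.
word = "Coordinador de ONG's do País Valenciano"

# Length argument larger than the remaining characters:
# the slice is expected to be clamped at the end of the string.
overlong = word[word.length - 1, word.length]

# Length argument of exactly one character, as in the corrected example.
exact = word[word.length - 1, 1]

# A negative index is another way to ask for the last character.
last = word[-1]

puts overlong.inspect
puts overlong == exact
puts overlong.valid_encoding?
```

If `overlong` contains trailing `\x00` bytes, as in the report, the interpreter is affected; passing a length of 1 or using `word[-1]` avoids the over-long length entirely.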
- Status changed from Open to Assigned
- Assignee set to yugui (Yuki Sonoda)
- Priority changed from 5 to Normal
=begin
Confirmed:
ruby 1.9.1p430 (2010-08-16 revision 28997) [x86_64-freebsd8.1]
ruby-1.9.1-p378 > word = "Coordinador de ONG's do País Valenciano"
=> "Coordinador de ONG's do País Valenciano"
ruby-1.9.1-p378 > substr = word[word.length-1,word.length]
=> "o\x00\x00\x01\x00\x01\x00\x00\x00"
=end
- Description updated (diff)
- Status changed from Assigned to Closed