Bug #4366

closed

Substring extraction on a UTF-8 string sometimes appends garbage to the result


Added by kosaki (Motohiro KOSAKI) almost 15 years ago. Updated over 14 years ago.

Status: Closed

Assignee:

Target version: 1.9.2

ruby -v: ruby 1.9.3dev (2011-02-04 trunk 30761) [x86_64-linux]

Backport:

[ruby-dev:43170]

Description

=begin
test.rb:

# coding: utf-8
str="あいうえお"
p str[2,17]

Result:

% ./ruby -v test.rb
ruby 1.9.3dev (2011-02-04 trunk 30761) [x86_64-linux]
"うえお\u0000"

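To put numbers on the reproduction, here is a minimal sketch that locates the start of `str[2]` in the test string by counting UTF-8 lead bytes. It assumes only the standard UTF-8 rule that continuation bytes have the form 10xxxxxx. The names `TEST_STR`, `is_lead_byte` and `byte_offset_of_char` are illustrative and are not taken from string.c.

```c
#include <stddef.h>
#include <string.h>

/* "あいうえお" spelled out as UTF-8 bytes: five characters, three bytes each. */
static const char TEST_STR[] =
    "\xE3\x81\x82\xE3\x81\x84\xE3\x81\x86\xE3\x81\x88\xE3\x81\x8A";

/* A byte starts a UTF-8 character unless it is a continuation byte (10xxxxxx). */
static int is_lead_byte(unsigned char c)
{
    return (c & 0xC0) != 0x80;
}

/* Byte offset of the nth character (0-based), or len if the string is shorter. */
static size_t byte_offset_of_char(const char *s, size_t len, long nth)
{
    size_t i;
    for (i = 0; i < len; i++) {
        /* nth is decremented only when a new character begins. */
        if (is_lead_byte((unsigned char)s[i]) && nth-- == 0) break;
    }
    return i;
}
```

Calling `byte_offset_of_char(TEST_STR, strlen(TEST_STR), 2)` gives 6 out of 15 bytes, so 9 bytes remain from the start position. The expected result is therefore "うえお" with nothing after it. With 9 bytes left and a requested length of 17, both sides of the condition quoted in the analysis below hold when sizeof(VALUE) is 8, assuming the length lookup starts from that offset.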
As for my analysis:

static char *
str_utf8_nth(const char *p, const char *e, long *nthp)
{
    long nth = *nthp;

    if ((int)SIZEOF_VALUE < e - p && (int)SIZEOF_VALUE * 2 < nth) {

             ↑ the check on e - p, i.e. the string length, compares against sizeof(VALUE), not sizeof(VALUE)*2 (1)

        do {
            nth -= count_utf8_lead_bytes_with_word(s);
            s++;
        } while (s < t && (int)sizeof(VALUE) <= nth);

             ↑ this is a do loop, not a while loop (2)

So because of (1), s == t is possible, and in that case, because of (2),
count_utf8_lead_bytes_with_word() appears to be called on memory outside the string.
=end
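The interaction of (1) and (2) can be illustrated with plain address arithmetic. This is a sketch under stated assumptions, not the code from string.c and not the fix that was applied. It assumes that s is p rounded up to a word boundary and t is e rounded down to one (the quoted excerpt does not show how they are computed), and it uses a fixed `WORD` of 8 as a stand-in for sizeof(VALUE) on x86_64. The nth part of the loop condition is left out, and all function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define WORD 8 /* assumed stand-in for sizeof(VALUE) on x86_64 */

/* First word boundary at or after addr: where a word-wise scan would start. */
static uintptr_t align_up(uintptr_t addr)
{
    return (addr + (WORD - 1)) & ~(uintptr_t)(WORD - 1);
}

/* Last word boundary at or before addr: where a word-wise scan must stop. */
static uintptr_t align_down(uintptr_t addr)
{
    return addr & ~(uintptr_t)(WORD - 1);
}

/* Number of whole aligned words inside [p, e). Zero means s == t. */
static size_t whole_words(uintptr_t p, uintptr_t e)
{
    uintptr_t s = align_up(p), t = align_down(e);
    return s < t ? (size_t)(t - s) / WORD : 0;
}

/* Words read by a do { } while (s < t) loop: the body always runs once. */
static size_t words_read_do_while(uintptr_t p, uintptr_t e)
{
    uintptr_t s = align_up(p), t = align_down(e);
    size_t n = 0;
    do {
        n++;       /* one word-sized read at address s */
        s += WORD;
    } while (s < t);
    return n;
}

/* Words read by a while (s < t) { } loop: nothing is read when s == t. */
static size_t words_read_while(uintptr_t p, uintptr_t e)
{
    uintptr_t s = align_up(p), t = align_down(e);
    size_t n = 0;
    while (s < t) {
        n++;
        s += WORD;
    }
    return n;
}
```

For p = 0x1001 and e = 0x100A (a 9-byte span starting one byte past a boundary), `whole_words` returns 0 while `words_read_do_while` returns 1. That single read covers 0x1008 through 0x100F, which extends six bytes beyond e. A span of 9 bytes passes a "longer than sizeof(VALUE)" check, yet it only contains a whole aligned word when p is already aligned or sits exactly one byte before a boundary.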

