Bug #4366
closed
Substring extraction on a UTF-8 string can leave garbage at the end of the result
Description
=begin
test.rb
# coding: utf-8
str="あいうえお"
p str[2,17]
Result
% ./ruby -v test.rb
ruby 1.9.3dev (2011-02-04 trunk 30761) [x86_64-linux]
"うえお\u0000"
As for my analysis:
static char *
str_utf8_nth(const char *p, const char *e, long *nthp)
{
long nth = *nthp;
if ((int)SIZEOF_VALUE < e - p && (int)SIZEOF_VALUE * 2 < nth) {
↑ e - p, i.e. the string length, is compared against sizeof(VALUE) rather than sizeof(VALUE)*2 (1)
do {
nth -= count_utf8_lead_bytes_with_word(s);
s++;
} while (s < t && (int)sizeof(VALUE) <= nth);
↑ this is a do loop rather than a while loop (2)
So (1) makes s == t possible, and in that case (2) means that
count_utf8_lead_bytes_with_word() seems to be called on memory outside the string.
=end
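The interaction between (1) and (2) can be reproduced in isolation. The following is only a minimal sketch, not the code from string.c: the functions words_read_do_while and words_read_while are hypothetical, s and t are modelled as plain word indices, and each word is assumed to hold word_size lead bytes so that nth shrinks by a fixed amount per word. It counts how many words each loop form would read.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the loop quoted in the report.
 * [s, t) is the range of whole aligned words inside the string,
 * nth is the number of characters still to skip.
 * Returns how many words the do/while form reads. */
static size_t
words_read_do_while(size_t s, size_t t, long nth, long word_size)
{
    size_t reads = 0;
    assert(word_size > 0);
    do {
        reads++;           /* stands in for count_utf8_lead_bytes_with_word(s) */
        nth -= word_size;  /* simplification: every byte counted as a lead byte */
        s++;
    } while (s < t && word_size <= nth);
    return reads;
}

/* Same loop with the condition checked before the first read. */
static size_t
words_read_while(size_t s, size_t t, long nth, long word_size)
{
    size_t reads = 0;
    assert(word_size > 0);
    while (s < t && word_size <= nth) {
        reads++;
        nth -= word_size;
        s++;
    }
    return reads;
}
```

With an empty word range (s == t), for example words_read_do_while(1, 1, 17, 8), the do/while form still performs one read, and that read lies past the last whole word of the string. The while form performs none. When the range holds at least one word and nth is large enough, both forms read the same number of words, so the difference only shows up in the case that (1) lets through. Whether the actual fix tightened the length check, changed the loop form, or both is not stated in this report.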
Updated by kosaki (Motohiro KOSAKI) almost 14 years ago
- Status changed from Open to Closed
- % Done changed from 0 to 100
=begin
This issue was solved with changeset r30779.
Motohiro, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.
- string.c (str_utf8_nth): fixed a condition of optimized lead
byte counting. [Bug #4366][ruby-dev:43170]
=end