Bug #6509
Closed
String#gsub is too slow if receiver includes a binary
Description
=begin
String#gsub becomes slow with code like the following. Elapsed times in seconds:
- Case (A), b = "": 0.2840230464935303
- Case (B), b = "\xB9": 4.183771848678589
# -*- coding: utf-8 -*-
a = ("abcde\n"*50000).force_encoding("binary")
#b = ""
b = "\xB9".force_encoding("binary")
c = ("efghi\n"*50000).force_encoding("binary")
d = "#{a}#{b}#{c}"
start = Time.now.to_f
d.gsub(/\n/) { "" }
puts(Time.now.to_f - start)
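I profiled each case and have attached the results.
In case (B), search_nonascii is called about 200,000 times and accounts for 92% of the processing time.
In case (A), it is called only about 100,000 times, and the processing time is short.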
=end
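The comparison in the report can be reproduced with a sketch like the one below, which uses the standard `benchmark` library. The helper name `time_gsub` and its `repeat` parameter are hypothetical and not part of the original report; absolute times depend on the Ruby version and machine.

```ruby
require "benchmark"

# Hypothetical helper: builds the same receiver as in the report, with `middle`
# placed between the two ASCII halves, and returns the seconds spent in gsub.
def time_gsub(middle, repeat = 50_000)
  a = ("abcde\n" * repeat).force_encoding("binary")
  b = middle.dup.force_encoding("binary")
  c = ("efghi\n" * repeat).force_encoding("binary")
  d = "#{a}#{b}#{c}"
  Benchmark.realtime { d.gsub(/\n/) { "" } }
end

case_a = time_gsub("")      # (A): receiver contains only ASCII bytes
case_b = time_gsub("\xB9")  # (B): one non-ASCII byte in the middle
puts "A: #{case_a}"
puts "B: #{case_b}"
```

On a Ruby build affected by this bug, case (B) would be expected to take noticeably longer than case (A).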
Files
Updated by shyouhei (Shyouhei Urabe) over 12 years ago
- Category changed from core to M17N
- Status changed from Open to Assigned
- Assignee set to naruse (Yui NARUSE)
It seems to me that once dest has become non-ASCII inside str_gsub, calling search_nonascii after that point is wasted effort, but I would like to hear an expert's opinion.
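The idea in this comment can be illustrated in plain Ruby, outside the interpreter's C code. The `DestBuffer` class below is a hypothetical model, not the actual str_gsub implementation: it remembers that the destination already contains a non-ASCII byte and counts how many chunk scans it performs.

```ruby
# Hypothetical model of a destination buffer; not Ruby's actual C implementation.
class DestBuffer
  attr_reader :scans

  def initialize
    @buf = String.new(encoding: Encoding::BINARY)
    @non_ascii = false  # becomes true once any non-ASCII byte has been appended
    @scans = 0          # number of times a chunk was scanned for non-ASCII bytes
  end

  def append(chunk)
    # Scan only while the buffer is still known to be ASCII-only.
    unless @non_ascii
      @scans += 1
      @non_ascii = !chunk.ascii_only?
    end
    @buf << chunk
    self
  end

  def non_ascii?
    @non_ascii
  end

  def to_s
    @buf
  end
end

dest = DestBuffer.new
["abc".b, "\xB9".b, "def".b, "ghi".b].each { |chunk| dest.append(chunk) }
puts dest.scans  # prints 2: the chunks after "\xB9" are appended without scanning
```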
Updated by naruse (Yui NARUSE) over 12 years ago
- Status changed from Assigned to Closed
- % Done changed from 0 to 100
This issue was solved with changeset r35863.
okkez, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.
- string.c (rb_enc_cr_str_buf_cat): don't reset coderange as unknown.
the condition 'ptr_a8 && str_cr != ENC_CODERANGE_7BIT' means not
unknown, str is also ASCII-8BIT because str_encindex == ptr_encindex,
and not (str_cr == ENC_CODERANGE_UNKNOWN) and
str_cr != ENC_CODERANGE_7BIT means str_cr is valid because ASCII-8BIT
can't be broken. [ruby-dev:45688] [Bug #6509]
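The commit message relies on the fact that an ASCII-8BIT string cannot have a broken coderange. That property is observable from Ruby itself; the following is a small sketch using only public String methods, with variable names chosen for illustration.

```ruby
# A binary (ASCII-8BIT) string with a non-ASCII byte is never "broken".
binary = "abc\xB9def".b
puts binary.encoding == Encoding::BINARY  # true
puts binary.ascii_only?                   # false: it contains the byte 0xB9
puts binary.valid_encoding?               # true: any byte sequence is valid binary

# The same bytes tagged as UTF-8 are invalid, because 0xB9 is a lone
# continuation byte.
utf8 = "abc\xB9def".dup.force_encoding("UTF-8")
puts utf8.valid_encoding?                 # false
```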