Project

General

Profile

Bug #12536

SIGABRT during GC on Debian GNU/Linux 8 (x86_64) due to insufficient string buffer allocation related with term fill

Added by ngoto (Naohisa Goto) about 3 years ago. Updated almost 3 years ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Target version:
-
[ruby-dev:49699]

Description

x86_64 の Debian GNU/Linux 8.4 にて、いつの間にか、 make test-all が途中で突然死して失敗するようになりました。

[ 2025/16989] REXMLTests::EncodingTester#test_parse_utf16 = 0.18 s
[ 2026/16989] REXMLTests::EncodingTester#test_parse_utf16_with_utf8_default_inte
rnal = 0.18 s
[ 2027/16989] REXMLTests::EncodingTester#test_ticket_89 = 0.00 s
uncommon.mk:607: recipe for target 'yes-test-all' failed
make: *** [yes-test-all] Aborted

試行錯誤により範囲を絞り込んだ結果、以下のようにすると再現し、SIGABRT を吐くことが明らかとなりました。

$ ruby -r rexml/document -e 'File.open("test/rexml/data/utf16.xml") { |f| REXML::Document.new(f) }; GC.start'

GDB上で実行すると、GC中にglibcのmalloc関連関数がメモリ破壊を検知してSIGABRTで終了していることがわかりました。

$ gdb ./ruby
GNU gdb (Debian 7.7.1+dfsg-5) 7.7.1
(略)
(gdb) run -r rexml/document -e 'File.open("test/rexml/data/utf16.xml") { |f| REXML::Document.new(f) }; GC.start'
Starting program: /XXXXX/ruby -r rexml/document -e 'File.open("test/rexml/data/utf16.xml") { |f| REXML::Document.new(f) }; GC.start'

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x2aaaaaad4700 (LWP 20695)]

Program received signal SIGABRT, Aborted.
0x00002aaaab8df067 in __GI_raise (sig=sig@entry=6)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56  ../nptl/sysdeps/unix/sysv/linux/raise.c: そのようなファイルやディレクトリはありません.
(gdb) where
#0  0x00002aaaab8df067 in __GI_raise (sig=sig@entry=6)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00002aaaab8e0448 in __GI_abort () at abort.c:89
#2  0x00002aaaab9229c0 in malloc_printerr (action=<optimized out>, 
    str=0x2aaaaba13310 "malloc_check_get_size: memory corruption", 
    ptr=<optimized out>) at malloc.c:5000
#3  0x00002aaaab926d07 in malloc_check_get_size (p=<optimized out>)
    at hooks.c:114
#4  musable (mem=<optimized out>) at malloc.c:4565
#5  __malloc_usable_size (m=<optimized out>) at malloc.c:4581
#6  0x0000555555594a5a in objspace_malloc_size (objspace=0x555555a52a90, 
    ptr=0x2aaaaac6b010, hint=0) at gc.c:7637
#7  0x0000555555594f7e in objspace_xfree (objspace=0x555555a52a90, 
    ptr=0x2aaaaac6b010, old_size=0) at gc.c:7857
#8  0x000055555559530e in ruby_sized_xfree (x=0x2aaaaac6b010, size=0)
    at gc.c:7952
#9  0x000055555559532d in ruby_xfree (x=0x2aaaaac6b010) at gc.c:7959
#10 0x00005555556739ad in rb_str_free (str=93824997851520) at string.c:1220
#11 0x0000555555589586 in obj_free (objspace=0x555555a52a90, 
    obj=93824997851520) at gc.c:2165
#12 0x000055555558c350 in gc_page_sweep (objspace=0x555555a52a90, 
    heap=0x555555a52ab0, sweep_page=0x555555a57aa0) at gc.c:3435
#13 0x000055555558c7d3 in gc_sweep_step (objspace=0x555555a52a90, 
    heap=0x555555a52ab0) at gc.c:3604
#14 0x000055555558c928 in gc_sweep_rest (objspace=0x555555a52a90) at gc.c:3655
#15 0x000055555558c9e1 in gc_sweep (objspace=0x555555a52a90) at gc.c:3688
#16 0x000055555558fbbc in gc_marks_rest (objspace=0x555555a52a90) at gc.c:5508
#17 0x000055555558fd3e in gc_marks (objspace=0x555555a52a90, full_mark=1)
    at gc.c:5563
#18 0x0000555555591baa in gc_start (objspace=0x555555a52a90, full_mark=1, 
    immediate_mark=1, immediate_sweep=1, reason=9217) at gc.c:6349
#19 0x0000555555591924 in garbage_collect (objspace=0x555555a52a90, 
    full_mark=1, immediate_mark=1, immediate_sweep=1, reason=1024) at gc.c:6267
#20 0x00005555555921d4 in gc_start_internal (argc=0, argv=0x2aaaaab04038, 
    self=93824997885720) at gc.c:6582
#21 0x00005555556c7ed8 in call_cfunc_m1 (
    func=0x555555592021 <gc_start_internal>, recv=93824997885720, argc=0, 
    argv=0x2aaaaab04038) at vm_insnhelper.c:1462
#22 0x00005555556c89be in vm_call_cfunc_with_frame (th=0x555555a52690, 
    reg_cfp=0x2aaaaac03f90, calling=0x7fffffffdd80, ci=0x555555f19660, 
    cc=0x555555f9a888) at vm_insnhelper.c:1641
#23 0x00005555556c8b24 in vm_call_cfunc (th=0x555555a52690, 
    reg_cfp=0x2aaaaac03f90, calling=0x7fffffffdd80, ci=0x555555f19660, 
    cc=0x555555f9a888) at vm_insnhelper.c:1736
#24 0x00005555556c99be in vm_call_method_each_type (th=0x555555a52690, 
    cfp=0x2aaaaac03f90, calling=0x7fffffffdd80, ci=0x555555f19660, 
    cc=0x555555f9a888) at vm_insnhelper.c:2028
#25 0x00005555556ca062 in vm_call_method (th=0x555555a52690, 
    cfp=0x2aaaaac03f90, calling=0x7fffffffdd80, ci=0x555555f19660, 
    cc=0x555555f9a888) at vm_insnhelper.c:2152
#26 0x00005555556ca238 in vm_call_general (th=0x555555a52690, 
    reg_cfp=0x2aaaaac03f90, calling=0x7fffffffdd80, ci=0x555555f19660, 
    cc=0x555555f9a888) at vm_insnhelper.c:2195
#27 0x00005555556ce37f in vm_exec_core (th=0x555555a52690, initial=0)
    at insns.def:1064
#28 0x00005555556df832 in vm_exec (th=0x555555a52690) at vm.c:1653
#29 0x00005555556e0505 in rb_iseq_eval_main (iseq=0x555555acb520) at vm.c:1896
#30 0x00005555555793fc in ruby_exec_internal (n=0x555555acb520) at eval.c:244
#31 0x0000555555579525 in ruby_exec_node (n=0x555555acb520) at eval.c:308
#32 0x00005555555794f8 in ruby_run_node (n=0x555555acb520) at eval.c:300
#33 0x00005555555770dd in main (argc=5, argv=0x7fffffffe5e8) at main.c:36
(gdb) 

frame #2 では、libc内から"malloc_check_get_size: memory corruption" というエラーメッセージを出そうとしているのが見えます。(が、表示されないまま強制終了となってしまうようです。)

なお、以前通っていたリビジョンでも今の環境で再コンパイルするとダメだったので、libcなど環境の変化により、以前は検知できていなかったメモリ破壊を検出できたということのようです。

valgrind上で実行するとメモリ破壊発生箇所が特定できました。

$ valgrind ruby -r rexml/document -e 'File.open("test/rexml/data/utf16.xml") { |f| REXML::Document.new(f) }; GC.start'
==1118== Memcheck, a memory error detector
==1118== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==1118== Using Valgrind-3.10.0 and LibVEX; rerun with -h for copyright info
==1118== Command: ruby -r rexml/document -e File.open("test/rexml/data/utf16.xml")\ {\ |f|\ REXML::Document.new(f)\ };\ GC.start
==1118== 
==1118== Invalid write of size 1
==1118==    at 0x4C2F415: memset (vg_replace_strmem.c:1094)
==1118==    by 0x2292BA: rb_str_plus (string.c:1742)
==1118==    by 0x2831B7: vm_exec_core (insns.def:1404)
==1118==    by 0x293831: vm_exec (vm.c:1653)
==1118==    by 0x28B764: vm_call0_body (vm_eval.c:182)
==1118==    by 0x28B319: vm_call0 (vm_eval.c:61)
==1118==    by 0x28BD87: rb_call0 (vm_eval.c:344)
==1118==    by 0x28C8E3: rb_call (vm_eval.c:630)
==1118==    by 0x28D1D9: rb_funcallv (vm_eval.c:856)
==1118==    by 0x12FD35: rb_obj_call_init (eval.c:1319)
==1118==    by 0x19543F: rb_class_new_instance (object.c:1853)
==1118==    by 0x27BED7: call_cfunc_m1 (vm_insnhelper.c:1462)
==1118==  Address 0x94f94f7 is 0 bytes after a block of size 207,463 alloc'd
==1118==    at 0x4C28C20: malloc (vg_replace_malloc.c:296)
==1118==    by 0x148CB8: objspace_xmalloc0 (gc.c:7782)
==1118==    by 0x148E25: objspace_xmalloc2 (gc.c:7810)
==1118==    by 0x149077: ruby_xmalloc2 (gc.c:7887)
==1118==    by 0x225B49: str_new0 (string.c:699)
==1118==    by 0x225D0C: str_new (string.c:716)
==1118==    by 0x225D3B: rb_str_new (string.c:722)
==1118==    by 0x22920A: rb_str_plus (string.c:1738)
==1118==    by 0x2831B7: vm_exec_core (insns.def:1404)
==1118==    by 0x293831: vm_exec (vm.c:1653)
==1118==    by 0x28B764: vm_call0_body (vm_eval.c:182)
==1118==    by 0x28B319: vm_call0 (vm_eval.c:61)
==1118== 
==1118== 
==1118== HEAP SUMMARY:
==1118==     in use at exit: 4,716,758 bytes in 37,139 blocks
==1118==   total heap usage: 254,757 allocs, 217,618 frees, 10,809,880,914 bytes allocated
==1118== 
==1118== LEAK SUMMARY:
==1118==    definitely lost: 863,638 bytes in 6,579 blocks
==1118==    indirectly lost: 1,338,400 bytes in 12,858 blocks
==1118==      possibly lost: 2,095,303 bytes in 10,822 blocks
==1118==    still reachable: 419,417 bytes in 6,880 blocks
==1118==         suppressed: 0 bytes in 0 blocks
==1118== Rerun with --leak-check=full to see details of leaked memory
==1118== 
==1118== For counts of detected and suppressed errors, rerun with: -v
==1118== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)

以下のパッチで、このエラーは出なくなることを確認しました。

--- string.c    (revision 55539)
+++ string.c    (working copy)
@@ -1730,16 +1730,18 @@
     rb_encoding *enc;
     char *ptr1, *ptr2, *ptr3;
     long len1, len2;
+    int termlen;

     StringValue(str2);
     enc = rb_enc_check_str(str1, str2);
     RSTRING_GETMEM(str1, ptr1, len1);
     RSTRING_GETMEM(str2, ptr2, len2);
-    str3 = rb_str_new(0, len1+len2);
+    termlen = rb_enc_mbminlen(enc);
+    str3 = str_new0(rb_cString, 0, len1+len2, termlen);
     ptr3 = RSTRING_PTR(str3);
     memcpy(ptr3, ptr1, len1);
     memcpy(ptr3+len1, ptr2, len2);
-    TERM_FILL(&ptr3[len1+len2], rb_enc_mbminlen(enc));
+    TERM_FILL(&ptr3[len1+len2], termlen);

     FL_SET_RAW(str3, OBJ_TAINTED_RAW(str1) | OBJ_TAINTED_RAW(str2));
     ENCODING_CODERANGE_SET(str3, rb_enc_to_index(enc),

UTF-16LEやUTF-16BEの文字列では、TERM_FILL()により末端2バイトが0で埋められるため、2バイト余分にメモリを確保する必要がありますが、rb_str_new() はそれを想定しておらず常に+1しか確保していないため、メモリ破壊に至ったようです。

このパッチで解決したと思ったら、次に、以下のように別の場所でも同様の突然死が発生することが判明しました。やはり、UTF-16やUTF-32の文字列・正規表現が関連している場所でした。

[13718/16989] TestRegexp#test_source_unescaped = 0.00 s
[13719/16989] TestRegexp#test_to_suncommon.mk:607: recipe for target 'yes-test-a
ll' failed
make: *** [yes-test-all] Aborted

調べたところ、以下のようにすると再現しました。(TestRegexp#test_to_s の内容をassertのかわりにpにして抜粋しただけ)

$ ./ruby -e 'str = "abcd\u3042"; [:UTF_16BE, :UTF_16LE, :UTF_32BE, :UTF_32LE].each do |es|; enc = Encoding.const_get(es); rs = Regexp.new(str.encode(enc)).to_s; p("(?-mix:abcd\u3042)".encode(enc) == rs); p(enc == rs.encoding); end'

仕方ないので、string.c内のTERM_FILLやTERM_LENやtermlenに関係した場所を調べて、全部修正してみたつもりのものが、添付のパッチです。

これで、valgrind経由で実行してもメモリ破壊は検出されなくなりました。

ただし、rb_str_new() と rb_str_buf_new() にて、従来 capa+1 バイトのメモリを確保していた部分を修正するにあたり、後からどんなエンコーディングが関連付けられるかわからないため、現時点での最大と思われる数値 4 を #define TERM_LEN_MAX 4 として定義して、それを常に足すようにしています。つまり、わずかながら、従来より余計にメモリを消費します。将来 '\0' 終端は廃止との方針がChangeLogやNEWSに書いてあるので、それまでの辛抱となります。(あるいは、終端用に何バイト確保したかを struct RString 内に記録するとか?)

また、RSTRING_EMBED_LEN_MAX を使う際は常に term_len を意識する必要がありますが、RSTRING_EMBED_LEN_MAX は (メモリ容量-1) で定義されているため、RSTRING_EMBED_LEN_MAXを使う際は "+1-term_len" などとせざるを得ませんでした。これも、'\0' 終端廃止までの辛抱かと思いますが、+1を忘れそうで、罠の気がします。


Files

20160630-string.c-term-len.patch (5.66 KB) 20160630-string.c-term-len.patch ngoto (Naohisa Goto), 06/30/2016 09:29 AM

Associated revisions

Revision f2ee2237
Added by ngoto (Naohisa Goto) about 3 years ago

  • string.c: Fix memory corruptions when using UTF-16/32 strings.
    [Bug #12536] [ruby-dev:49699]

  • string.c (TERM_LEN_MAX): Macro for the longest TERM_FILL length,
    the same as largest value of rb_enc_mbminlen(enc) among encodings.

  • string.c (str_new, rb_str_buf_new, str_shared_replace): Allocate
    +TERM_LEN_MAX bytes instead of +1. This change may increase memory
    usage.

  • string.c (rb_str_new_with_class): Use TERM_LEN of the "obj".

  • string.c (rb_str_plus, rb_str_justify): Use str_new0 which is aware
    of termlen.

  • string.c (str_shared_replace): Copy +termlen bytes instead of +1.

  • string.c (rb_str_times): termlen should not be included in capa.

  • string.c (RESIZE_CAPA_TERM): When using RSTRING_EMBED_LEN_MAX,
    termlen should be counted with it because embedded strings are
    also processed by TERM_FILL.

  • string.c (rb_str_capacity, str_shared_replace, str_buf_cat): ditto.

  • string.c (rb_str_drop_bytes, rb_str_setbyte, str_byte_substr): ditto.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55547 b2dd03c8-39d4-4d8f-98ff-823fe69b080e

Revision 55547
Added by ngoto (Naohisa Goto) about 3 years ago

  • string.c: Fix memory corruptions when using UTF-16/32 strings.
    [Bug #12536] [ruby-dev:49699]

  • string.c (TERM_LEN_MAX): Macro for the longest TERM_FILL length,
    the same as largest value of rb_enc_mbminlen(enc) among encodings.

  • string.c (str_new, rb_str_buf_new, str_shared_replace): Allocate
    +TERM_LEN_MAX bytes instead of +1. This change may increase memory
    usage.

  • string.c (rb_str_new_with_class): Use TERM_LEN of the "obj".

  • string.c (rb_str_plus, rb_str_justify): Use str_new0 which is aware
    of termlen.

  • string.c (str_shared_replace): Copy +termlen bytes instead of +1.

  • string.c (rb_str_times): termlen should not be included in capa.

  • string.c (RESIZE_CAPA_TERM): When using RSTRING_EMBED_LEN_MAX,
    termlen should be counted with it because embedded strings are
    also processed by TERM_FILL.

  • string.c (rb_str_capacity, str_shared_replace, str_buf_cat): ditto.

  • string.c (rb_str_drop_bytes, rb_str_setbyte, str_byte_substr): ditto.

Revision 55547
Added by ngoto (Naohisa Goto) about 3 years ago

  • string.c: Fix memory corruptions when using UTF-16/32 strings.
    [Bug #12536] [ruby-dev:49699]

  • string.c (TERM_LEN_MAX): Macro for the longest TERM_FILL length,
    the same as largest value of rb_enc_mbminlen(enc) among encodings.

  • string.c (str_new, rb_str_buf_new, str_shared_replace): Allocate
    +TERM_LEN_MAX bytes instead of +1. This change may increase memory
    usage.

  • string.c (rb_str_new_with_class): Use TERM_LEN of the "obj".

  • string.c (rb_str_plus, rb_str_justify): Use str_new0 which is aware
    of termlen.

  • string.c (str_shared_replace): Copy +termlen bytes instead of +1.

  • string.c (rb_str_times): termlen should not be included in capa.

  • string.c (RESIZE_CAPA_TERM): When using RSTRING_EMBED_LEN_MAX,
    termlen should be counted with it because embedded strings are
    also processed by TERM_FILL.

  • string.c (rb_str_capacity, str_shared_replace, str_buf_cat): ditto.

  • string.c (rb_str_drop_bytes, rb_str_setbyte, str_byte_substr): ditto.

Revision 55547
Added by ngoto (Naohisa Goto) about 3 years ago

  • string.c: Fix memory corruptions when using UTF-16/32 strings.
    [Bug #12536] [ruby-dev:49699]

  • string.c (TERM_LEN_MAX): Macro for the longest TERM_FILL length,
    the same as largest value of rb_enc_mbminlen(enc) among encodings.

  • string.c (str_new, rb_str_buf_new, str_shared_replace): Allocate
    +TERM_LEN_MAX bytes instead of +1. This change may increase memory
    usage.

  • string.c (rb_str_new_with_class): Use TERM_LEN of the "obj".

  • string.c (rb_str_plus, rb_str_justify): Use str_new0 which is aware
    of termlen.

  • string.c (str_shared_replace): Copy +termlen bytes instead of +1.

  • string.c (rb_str_times): termlen should not be included in capa.

  • string.c (RESIZE_CAPA_TERM): When using RSTRING_EMBED_LEN_MAX,
    termlen should be counted with it because embedded strings are
    also processed by TERM_FILL.

  • string.c (rb_str_capacity, str_shared_replace, str_buf_cat): ditto.

  • string.c (rb_str_drop_bytes, rb_str_setbyte, str_byte_substr): ditto.

Revision 55547
Added by ngoto (Naohisa Goto) about 3 years ago

  • string.c: Fix memory corruptions when using UTF-16/32 strings.
    [Bug #12536] [ruby-dev:49699]

  • string.c (TERM_LEN_MAX): Macro for the longest TERM_FILL length,
    the same as largest value of rb_enc_mbminlen(enc) among encodings.

  • string.c (str_new, rb_str_buf_new, str_shared_replace): Allocate
    +TERM_LEN_MAX bytes instead of +1. This change may increase memory
    usage.

  • string.c (rb_str_new_with_class): Use TERM_LEN of the "obj".

  • string.c (rb_str_plus, rb_str_justify): Use str_new0 which is aware
    of termlen.

  • string.c (str_shared_replace): Copy +termlen bytes instead of +1.

  • string.c (rb_str_times): termlen should not be included in capa.

  • string.c (RESIZE_CAPA_TERM): When using RSTRING_EMBED_LEN_MAX,
    termlen should be counted with it because embedded strings are
    also processed by TERM_FILL.

  • string.c (rb_str_capacity, str_shared_replace, str_buf_cat): ditto.

  • string.c (rb_str_drop_bytes, rb_str_setbyte, str_byte_substr): ditto.

Revision 6734a0c3
Added by ngoto (Naohisa Goto) about 3 years ago

string.c: Add parentheses to avoid C source code ambiguity. [Bug #12536]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55551 b2dd03c8-39d4-4d8f-98ff-823fe69b080e

Revision 55551
Added by ngoto (Naohisa Goto) about 3 years ago

string.c: Add parentheses to avoid C source code ambiguity. [Bug #12536]

Revision 55551
Added by ngoto (Naohisa Goto) about 3 years ago

string.c: Add parentheses to avoid C source code ambiguity. [Bug #12536]

Revision 55551
Added by ngoto (Naohisa Goto) about 3 years ago

string.c: Add parentheses to avoid C source code ambiguity. [Bug #12536]

Revision 55551
Added by ngoto (Naohisa Goto) about 3 years ago

string.c: Add parentheses to avoid C source code ambiguity. [Bug #12536]

Revision 10e28726
Added by ngoto (Naohisa Goto) about 3 years ago

  • string.c (rb_str_subseq, str_substr): When RSTRING_EMBED_LEN_MAX is used, TERM_LEN(str) should be considered with it because embedded strings are also processed by TERM_FILL. Additional fix for [Bug #12536] [ruby-dev:49699].

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55552 b2dd03c8-39d4-4d8f-98ff-823fe69b080e

Revision 55552
Added by ngoto (Naohisa Goto) about 3 years ago

  • string.c (rb_str_subseq, str_substr): When RSTRING_EMBED_LEN_MAX is used, TERM_LEN(str) should be considered with it because embedded strings are also processed by TERM_FILL. Additional fix for [Bug #12536] [ruby-dev:49699].

Revision 55552
Added by ngoto (Naohisa Goto) about 3 years ago

  • string.c (rb_str_subseq, str_substr): When RSTRING_EMBED_LEN_MAX is used, TERM_LEN(str) should be considered with it because embedded strings are also processed by TERM_FILL. Additional fix for [Bug #12536] [ruby-dev:49699].

Revision 55552
Added by ngoto (Naohisa Goto) about 3 years ago

  • string.c (rb_str_subseq, str_substr): When RSTRING_EMBED_LEN_MAX is used, TERM_LEN(str) should be considered with it because embedded strings are also processed by TERM_FILL. Additional fix for [Bug #12536] [ruby-dev:49699].

Revision 55552
Added by ngoto (Naohisa Goto) about 3 years ago

  • string.c (rb_str_subseq, str_substr): When RSTRING_EMBED_LEN_MAX is used, TERM_LEN(str) should be considered with it because embedded strings are also processed by TERM_FILL. Additional fix for [Bug #12536] [ruby-dev:49699].

Revision a92a537b
Added by ngoto (Naohisa Goto) about 3 years ago

  • string.c: Specify termlen as far as possible.
    Additional fix for [Bug #12536] [ruby-dev:49699].

  • string.c (rb_usascii_str_new, rb_utf8_str_new): Specify termlen
    which is apparently 1 for the encodings.

  • string.c (str_new0_cstr): New static function to create a String
    object from a C string with specifying termlen.

  • string.c (rb_usascii_str_new_cstr, rb_utf8_str_new_cstr): Specify
    termlen by using new str_new0_cstr().

  • string.c (str_new_static): Specify termlen from the given encoding
    when creating a new String object is needed.

  • string.c (rb_tainted_str_new_with_enc): New function to create a
    tainted String object with the given encoding. This means that
    the termlen is correctly specified. Curretly static function.
    The function name might be renamed to rb_tainted_enc_str_new
    or rb_enc_tainted_str_new.

  • string.c (rb_external_str_new_with_enc): Use encoding by using the
    above rb_tainted_str_new_with_enc().

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55555 b2dd03c8-39d4-4d8f-98ff-823fe69b080e

Revision 55555
Added by ngoto (Naohisa Goto) about 3 years ago

  • string.c: Specify termlen as far as possible.
    Additional fix for [Bug #12536] [ruby-dev:49699].

  • string.c (rb_usascii_str_new, rb_utf8_str_new): Specify termlen
    which is apparently 1 for the encodings.

  • string.c (str_new0_cstr): New static function to create a String
    object from a C string with specifying termlen.

  • string.c (rb_usascii_str_new_cstr, rb_utf8_str_new_cstr): Specify
    termlen by using new str_new0_cstr().

  • string.c (str_new_static): Specify termlen from the given encoding
    when creating a new String object is needed.

  • string.c (rb_tainted_str_new_with_enc): New function to create a
    tainted String object with the given encoding. This means that
    the termlen is correctly specified. Curretly static function.
    The function name might be renamed to rb_tainted_enc_str_new
    or rb_enc_tainted_str_new.

  • string.c (rb_external_str_new_with_enc): Use encoding by using the
    above rb_tainted_str_new_with_enc().

Revision 55555
Added by ngoto (Naohisa Goto) about 3 years ago

  • string.c: Specify termlen as far as possible.
    Additional fix for [Bug #12536] [ruby-dev:49699].

  • string.c (rb_usascii_str_new, rb_utf8_str_new): Specify termlen
    which is apparently 1 for the encodings.

  • string.c (str_new0_cstr): New static function to create a String
    object from a C string with specifying termlen.

  • string.c (rb_usascii_str_new_cstr, rb_utf8_str_new_cstr): Specify
    termlen by using new str_new0_cstr().

  • string.c (str_new_static): Specify termlen from the given encoding
    when creating a new String object is needed.

  • string.c (rb_tainted_str_new_with_enc): New function to create a
    tainted String object with the given encoding. This means that
    the termlen is correctly specified. Curretly static function.
    The function name might be renamed to rb_tainted_enc_str_new
    or rb_enc_tainted_str_new.

  • string.c (rb_external_str_new_with_enc): Use encoding by using the
    above rb_tainted_str_new_with_enc().

Revision 55555
Added by ngoto (Naohisa Goto) about 3 years ago

  • string.c: Specify termlen as far as possible.
    Additional fix for [Bug #12536] [ruby-dev:49699].

  • string.c (rb_usascii_str_new, rb_utf8_str_new): Specify termlen
    which is apparently 1 for the encodings.

  • string.c (str_new0_cstr): New static function to create a String
    object from a C string with specifying termlen.

  • string.c (rb_usascii_str_new_cstr, rb_utf8_str_new_cstr): Specify
    termlen by using new str_new0_cstr().

  • string.c (str_new_static): Specify termlen from the given encoding
    when creating a new String object is needed.

  • string.c (rb_tainted_str_new_with_enc): New function to create a
    tainted String object with the given encoding. This means that
    the termlen is correctly specified. Curretly static function.
    The function name might be renamed to rb_tainted_enc_str_new
    or rb_enc_tainted_str_new.

  • string.c (rb_external_str_new_with_enc): Use encoding by using the
    above rb_tainted_str_new_with_enc().

Revision 55555
Added by ngoto (Naohisa Goto) about 3 years ago

  • string.c: Specify termlen as far as possible.
    Additional fix for [Bug #12536] [ruby-dev:49699].

  • string.c (rb_usascii_str_new, rb_utf8_str_new): Specify termlen
    which is apparently 1 for the encodings.

  • string.c (str_new0_cstr): New static function to create a String
    object from a C string with specifying termlen.

  • string.c (rb_usascii_str_new_cstr, rb_utf8_str_new_cstr): Specify
    termlen by using new str_new0_cstr().

  • string.c (str_new_static): Specify termlen from the given encoding
    when creating a new String object is needed.

  • string.c (rb_tainted_str_new_with_enc): New function to create a
    tainted String object with the given encoding. This means that
    the termlen is correctly specified. Curretly static function.
    The function name might be renamed to rb_tainted_enc_str_new
    or rb_enc_tainted_str_new.

  • string.c (rb_external_str_new_with_enc): Use encoding by using the
    above rb_tainted_str_new_with_enc().

Revision 61f2ee0d
Added by ngoto (Naohisa Goto) about 3 years ago

  • string.c (str_fill_term): When termlen increases, re-allocation of memory for termlen should always be needed. In this fix, if possible, decrease capa instead of realloc. [Bug #12536] [ruby-dev:49699]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55557 b2dd03c8-39d4-4d8f-98ff-823fe69b080e

Revision 55557
Added by ngoto (Naohisa Goto) about 3 years ago

  • string.c (str_fill_term): When termlen increases, re-allocation of memory for termlen should always be needed. In this fix, if possible, decrease capa instead of realloc. [Bug #12536] [ruby-dev:49699]

Revision 55557
Added by ngoto (Naohisa Goto) about 3 years ago

  • string.c (str_fill_term): When termlen increases, re-allocation of memory for termlen should always be needed. In this fix, if possible, decrease capa instead of realloc. [Bug #12536] [ruby-dev:49699]

Revision 55557
Added by ngoto (Naohisa Goto) about 3 years ago

  • string.c (str_fill_term): When termlen increases, re-allocation of memory for termlen should always be needed. In this fix, if possible, decrease capa instead of realloc. [Bug #12536] [ruby-dev:49699]

Revision 55557
Added by ngoto (Naohisa Goto) about 3 years ago

  • string.c (str_fill_term): When termlen increases, re-allocation of memory for termlen should always be needed. In this fix, if possible, decrease capa instead of realloc. [Bug #12536] [ruby-dev:49699]

Revision 3418a277
Added by ngoto (Naohisa Goto) about 3 years ago

  • string.c: Partially reverts r55547 and r55555. ChangeLog about the reverted changes are also deleted in this file. [Bug #12536] [ruby-dev:49699] [ruby-dev:49702]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55559 b2dd03c8-39d4-4d8f-98ff-823fe69b080e

Revision 55559
Added by ngoto (Naohisa Goto) about 3 years ago

  • string.c: Partially reverts r55547 and r55555. ChangeLog about the reverted changes are also deleted in this file. [Bug #12536] [ruby-dev:49699] [ruby-dev:49702]

Revision 55559
Added by ngoto (Naohisa Goto) about 3 years ago

  • string.c: Partially reverts r55547 and r55555. ChangeLog about the reverted changes are also deleted in this file. [Bug #12536] [ruby-dev:49699] [ruby-dev:49702]

Revision 55559
Added by ngoto (Naohisa Goto) about 3 years ago

  • string.c: Partially reverts r55547 and r55555. ChangeLog about the reverted changes are also deleted in this file. [Bug #12536] [ruby-dev:49699] [ruby-dev:49702]

Revision 55559
Added by ngoto (Naohisa Goto) about 3 years ago

  • string.c: Partially reverts r55547 and r55555. ChangeLog about the reverted changes are also deleted in this file. [Bug #12536] [ruby-dev:49699] [ruby-dev:49702]

Revision 5eff15d1
Added by ngoto (Naohisa Goto) about 3 years ago

  • string.c (rb_str_change_terminator_length): New function to change
    termlen and resize heap for the terminator. This is split from
    rb_str_fill_terminator (str_fill_term) because filling terminator
    and changing terminator length are different things. [Bug #12536]

  • internal.h: declaration for rb_str_change_terminator_length.

  • string.c (str_fill_term): Simplify only to zero-fill the terminator.
    For non-shared strings, it assumes that (capa + termlen) bytes of
    heap is allocated. This partially reverts r55557.

  • encoding.c (rb_enc_associate_index): rb_str_change_terminator_length
    is used, and it should be called whenever the termlen is changed.

  • string.c (str_capacity): New static function to return capacity
    of a string with the given termlen, because the termlen may
    sometimes be different from TERM_LEN(str) especially during
    changing termlen or filling terminator with specific termlen.

  • string.c (rb_str_capacity): Use str_capacity.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55575 b2dd03c8-39d4-4d8f-98ff-823fe69b080e

Revision 55575
Added by ngoto (Naohisa Goto) about 3 years ago

  • string.c (rb_str_change_terminator_length): New function to change
    termlen and resize heap for the terminator. This is split from
    rb_str_fill_terminator (str_fill_term) because filling terminator
    and changing terminator length are different things. [Bug #12536]

  • internal.h: declaration for rb_str_change_terminator_length.

  • string.c (str_fill_term): Simplify only to zero-fill the terminator.
    For non-shared strings, it assumes that (capa + termlen) bytes of
    heap is allocated. This partially reverts r55557.

  • encoding.c (rb_enc_associate_index): rb_str_change_terminator_length
    is used, and it should be called whenever the termlen is changed.

  • string.c (str_capacity): New static function to return capacity
    of a string with the given termlen, because the termlen may
    sometimes be different from TERM_LEN(str) especially during
    changing termlen or filling terminator with specific termlen.

  • string.c (rb_str_capacity): Use str_capacity.

Revision 55575
Added by ngoto (Naohisa Goto) about 3 years ago

  • string.c (rb_str_change_terminator_length): New function to change
    termlen and resize heap for the terminator. This is split from
    rb_str_fill_terminator (str_fill_term) because filling terminator
    and changing terminator length are different things. [Bug #12536]

  • internal.h: declaration for rb_str_change_terminator_length.

  • string.c (str_fill_term): Simplify only to zero-fill the terminator.
    For non-shared strings, it assumes that (capa + termlen) bytes of
    heap is allocated. This partially reverts r55557.

  • encoding.c (rb_enc_associate_index): rb_str_change_terminator_length
    is used, and it should be called whenever the termlen is changed.

  • string.c (str_capacity): New static function to return capacity
    of a string with the given termlen, because the termlen may
    sometimes be different from TERM_LEN(str) especially during
    changing termlen or filling terminator with specific termlen.

  • string.c (rb_str_capacity): Use str_capacity.

Revision 55575
Added by ngoto (Naohisa Goto) about 3 years ago

  • string.c (rb_str_change_terminator_length): New function to change
    termlen and resize heap for the terminator. This is split from
    rb_str_fill_terminator (str_fill_term) because filling terminator
    and changing terminator length are different things. [Bug #12536]

  • internal.h: declaration for rb_str_change_terminator_length.

  • string.c (str_fill_term): Simplify only to zero-fill the terminator.
    For non-shared strings, it assumes that (capa + termlen) bytes of
    heap is allocated. This partially reverts r55557.

  • encoding.c (rb_enc_associate_index): rb_str_change_terminator_length
    is used, and it should be called whenever the termlen is changed.

  • string.c (str_capacity): New static function to return capacity
    of a string with the given termlen, because the termlen may
    sometimes be different from TERM_LEN(str) especially during
    changing termlen or filling terminator with specific termlen.

  • string.c (rb_str_capacity): Use str_capacity.

Revision 55575
Added by ngoto (Naohisa Goto) about 3 years ago

  • string.c (rb_str_change_terminator_length): New function to change
    termlen and resize heap for the terminator. This is split from
    rb_str_fill_terminator (str_fill_term) because filling terminator
    and changing terminator length are different things. [Bug #12536]

  • internal.h: declaration for rb_str_change_terminator_length.

  • string.c (str_fill_term): Simplify only to zero-fill the terminator.
    For non-shared strings, it assumes that (capa + termlen) bytes of
    heap is allocated. This partially reverts r55557.

  • encoding.c (rb_enc_associate_index): rb_str_change_terminator_length
    is used, and it should be called whenever the termlen is changed.

  • string.c (str_capacity): New static function to return capacity
    of a string with the given termlen, because the termlen may
    sometimes be different from TERM_LEN(str) especially during
    changing termlen or filling terminator with specific termlen.

  • string.c (rb_str_capacity): Use str_capacity.

Revision 2bb292fc
Added by ngoto (Naohisa Goto) about 3 years ago

  • string.c (str_buf_cat): Fix capa size for embed string. Fix bug in r55547. [Bug #12536]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55691 b2dd03c8-39d4-4d8f-98ff-823fe69b080e

Revision 55691
Added by ngoto (Naohisa Goto) about 3 years ago

  • string.c (str_buf_cat): Fix capa size for embed string. Fix bug in r55547. [Bug #12536]

Revision 55691
Added by ngoto (Naohisa Goto) about 3 years ago

  • string.c (str_buf_cat): Fix capa size for embed string. Fix bug in r55547. [Bug #12536]

Revision 55691
Added by ngoto (Naohisa Goto) about 3 years ago

  • string.c (str_buf_cat): Fix capa size for embed string. Fix bug in r55547. [Bug #12536]

Revision 55691
Added by ngoto (Naohisa Goto) about 3 years ago

  • string.c (str_buf_cat): Fix capa size for embed string. Fix bug in r55547. [Bug #12536]

Revision 6f122a4f
Added by nagachika (Tomoyuki Chikanaga) almost 3 years ago

merge revision(s) 55547,55551,55552,55555,55557,55559,55575,55691,55568: [Backport #12536]

    * string.c: Fix memory corruptions when using UTF-16/32 strings.
      [Bug #12536] [ruby-dev:49699]

    * string.c (TERM_LEN_MAX): Macro for the longest TERM_FILL length,
      the same as largest value of rb_enc_mbminlen(enc) among encodings.

    * string.c (str_new, rb_str_buf_new, str_shared_replace): Allocate
      +TERM_LEN_MAX bytes instead of +1. This change may increase memory
      usage.

    * string.c (rb_str_new_with_class): Use TERM_LEN of the "obj".

    * string.c (rb_str_plus, rb_str_justify): Use str_new0 which is aware
      of termlen.

    * string.c (str_shared_replace): Copy +termlen bytes instead of +1.

    * string.c (rb_str_times): termlen should not be included in capa.

    * string.c (RESIZE_CAPA_TERM): When using RSTRING_EMBED_LEN_MAX,
      termlen should be counted with it because embedded strings are
      also processed by TERM_FILL.

    * string.c (rb_str_capacity, str_shared_replace, str_buf_cat): ditto.

    * string.c (rb_str_drop_bytes, rb_str_setbyte, str_byte_substr): ditto.

    * string.c (rb_str_subseq, str_substr): When RSTRING_EMBED_LEN_MAX
      is used, TERM_LEN(str) should be considered with it because
      embedded strings are also processed by TERM_FILL.
      Additional fix for [Bug #12536] [ruby-dev:49699].
      Additional fix for [Bug #12536] [ruby-dev:49699].

    * string.c (rb_usascii_str_new, rb_utf8_str_new): Specify termlen
      which is apparently 1 for the encodings.

    * string.c (str_new0_cstr): New static function to create a String
      object from a C string with specifying termlen.

    * string.c (rb_usascii_str_new_cstr, rb_utf8_str_new_cstr): Specify
      termlen by using new str_new0_cstr().

    * string.c (str_new_static): Specify termlen from the given encoding
      when creating a new String object is needed.

    * string.c (rb_tainted_str_new_with_enc): New function to create a
      tainted String object with the given encoding. This means that
      the termlen is correctly specified. Curretly static function.
      The function name might be renamed to rb_tainted_enc_str_new
      or rb_enc_tainted_str_new.

    * string.c (rb_external_str_new_with_enc): Use encoding by using the
      above rb_tainted_str_new_with_enc().

    * string.c (str_fill_term): When termlen increases, re-allocation
      of memory for termlen should always be needed.
      In this fix, if possible, decrease capa instead of realloc.
      [Bug #12536] [ruby-dev:49699]

    * string.c: Partially reverts r55547 and r55555.
      ChangeLog about the reverted changes are also deleted in this file.
      [Bug #12536] [ruby-dev:49699] [ruby-dev:49702]

    * string.c (rb_str_change_terminator_length): New function to change
      termlen and resize heap for the terminator. This is split from
      rb_str_fill_terminator (str_fill_term) because filling terminator
      and changing terminator length are different things. [Bug #12536]

    * internal.h: declaration for rb_str_change_terminator_length.

    * string.c (str_fill_term): Simplify only to zero-fill the terminator.
      For non-shared strings, it assumes that (capa + termlen) bytes of
      heap is allocated. This partially reverts r55557.

    * encoding.c (rb_enc_associate_index): rb_str_change_terminator_length
      is used, and it should be called whenever the termlen is changed.

    * string.c (str_capacity): New static function to return capacity
      of a string with the given termlen, because the termlen may
      sometimes be different from TERM_LEN(str) especially during
      changing termlen or filling terminator with specific termlen.

    * string.c (rb_str_capacity): Use str_capacity.

    * string.c (str_buf_cat): Fix capa size for embed string.
      Fix bug in r55547. [Bug #12536]

    * string.c: Specify termlen as far as possible.
      the termlen is correctly specified. Currently static function.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/branches/ruby_2_3@55988 b2dd03c8-39d4-4d8f-98ff-823fe69b080e

Revision 55988
Added by nagachika (Tomoyuki Chikanaga) almost 3 years ago

merge revision(s) 55547,55551,55552,55555,55557,55559,55575,55691,55568: [Backport #12536]

* string.c: Fix memory corruptions when using UTF-16/32 strings.
  [Bug #12536] [ruby-dev:49699]

* string.c (TERM_LEN_MAX): Macro for the longest TERM_FILL length,
  the same as largest value of rb_enc_mbminlen(enc) among encodings.

* string.c (str_new, rb_str_buf_new, str_shared_replace): Allocate
  +TERM_LEN_MAX bytes instead of +1. This change may increase memory
  usage.

* string.c (rb_str_new_with_class): Use TERM_LEN of the "obj".

* string.c (rb_str_plus, rb_str_justify): Use str_new0 which is aware
  of termlen.

* string.c (str_shared_replace): Copy +termlen bytes instead of +1.

* string.c (rb_str_times): termlen should not be included in capa.

* string.c (RESIZE_CAPA_TERM): When using RSTRING_EMBED_LEN_MAX,
  termlen should be counted with it because embedded strings are
  also processed by TERM_FILL.

* string.c (rb_str_capacity, str_shared_replace, str_buf_cat): ditto.

* string.c (rb_str_drop_bytes, rb_str_setbyte, str_byte_substr): ditto.

* string.c (rb_str_subseq, str_substr): When RSTRING_EMBED_LEN_MAX
  is used, TERM_LEN(str) should be considered with it because
  embedded strings are also processed by TERM_FILL.
  Additional fix for [Bug #12536] [ruby-dev:49699].
  Additional fix for [Bug #12536] [ruby-dev:49699].

* string.c (rb_usascii_str_new, rb_utf8_str_new): Specify termlen
  which is apparently 1 for the encodings.

* string.c (str_new0_cstr): New static function to create a String
  object from a C string with specifying termlen.

* string.c (rb_usascii_str_new_cstr, rb_utf8_str_new_cstr): Specify
  termlen by using new str_new0_cstr().

* string.c (str_new_static): Specify termlen from the given encoding
  when creating a new String object is needed.

* string.c (rb_tainted_str_new_with_enc): New function to create a
  tainted String object with the given encoding. This means that
  the termlen is correctly specified. Curretly static function.
  The function name might be renamed to rb_tainted_enc_str_new
  or rb_enc_tainted_str_new.

* string.c (rb_external_str_new_with_enc): Use encoding by using the
  above rb_tainted_str_new_with_enc().

* string.c (str_fill_term): When termlen increases, re-allocation
  of memory for termlen should always be needed.
  In this fix, if possible, decrease capa instead of realloc.
  [Bug #12536] [ruby-dev:49699]

* string.c: Partially reverts r55547 and r55555.
  ChangeLog about the reverted changes are also deleted in this file.
  [Bug #12536] [ruby-dev:49699] [ruby-dev:49702]

* string.c (rb_str_change_terminator_length): New function to change
  termlen and resize heap for the terminator. This is split from
  rb_str_fill_terminator (str_fill_term) because filling terminator
  and changing terminator length are different things. [Bug #12536]

* internal.h: declaration for rb_str_change_terminator_length.

* string.c (str_fill_term): Simplify only to zero-fill the terminator.
  For non-shared strings, it assumes that (capa + termlen) bytes of
  heap is allocated. This partially reverts r55557.

* encoding.c (rb_enc_associate_index): rb_str_change_terminator_length
  is used, and it should be called whenever the termlen is changed.

* string.c (str_capacity): New static function to return capacity
  of a string with the given termlen, because the termlen may
  sometimes be different from TERM_LEN(str) especially during
  changing termlen or filling terminator with specific termlen.

* string.c (rb_str_capacity): Use str_capacity.

* string.c (str_buf_cat): Fix capa size for embed string.
  Fix bug in r55547. [Bug #12536]

* string.c: Specify termlen as far as possible.
  the termlen is correctly specified. Currently static function.

Revision 63e4f212
Added by usa (Usaku NAKAMURA) almost 3 years ago

merge revision(s) 55547,55551,55552,55555,55557,55559,55568,55575,55691: [Backport #12536]

    * string.c: Fix memory corruptions when using UTF-16/32 strings.
      [Bug #12536] [ruby-dev:49699]

    * string.c (TERM_LEN_MAX): Macro for the longest TERM_FILL length,
      the same as largest value of rb_enc_mbminlen(enc) among encodings.

    * string.c (str_new, rb_str_buf_new, str_shared_replace): Allocate
      +TERM_LEN_MAX bytes instead of +1. This change may increase memory
      usage.

    * string.c (rb_str_new_with_class): Use TERM_LEN of the "obj".

    * string.c (rb_str_plus, rb_str_justify): Use str_new0 which is aware
      of termlen.

    * string.c (str_shared_replace): Copy +termlen bytes instead of +1.

    * string.c (rb_str_times): termlen should not be included in capa.

    * string.c (RESIZE_CAPA_TERM): When using RSTRING_EMBED_LEN_MAX,
      termlen should be counted with it because embedded strings are
      also processed by TERM_FILL.

    * string.c (rb_str_capacity, str_shared_replace, str_buf_cat): ditto.

    * string.c (rb_str_drop_bytes, rb_str_setbyte, str_byte_substr): ditto.

    * string.c (rb_str_subseq, str_substr): When RSTRING_EMBED_LEN_MAX
      is used, TERM_LEN(str) should be considered with it because
      embedded strings are also processed by TERM_FILL.
      Additional fix for [Bug #12536] [ruby-dev:49699].
      Additional fix for [Bug #12536] [ruby-dev:49699].

    * string.c (rb_usascii_str_new, rb_utf8_str_new): Specify termlen
      which is apparently 1 for the encodings.

    * string.c (str_new0_cstr): New static function to create a String
      object from a C string with specifying termlen.

    * string.c (rb_usascii_str_new_cstr, rb_utf8_str_new_cstr): Specify
      termlen by using new str_new0_cstr().

    * string.c (str_new_static): Specify termlen from the given encoding
      when creating a new String object is needed.

    * string.c (rb_tainted_str_new_with_enc): New function to create a
      tainted String object with the given encoding. This means that
      the termlen is correctly specified. Curretly static function.
      The function name might be renamed to rb_tainted_enc_str_new
      or rb_enc_tainted_str_new.

    * string.c (rb_external_str_new_with_enc): Use encoding by using the
      above rb_tainted_str_new_with_enc().

    * string.c (str_fill_term): When termlen increases, re-allocation
      of memory for termlen should always be needed.
      In this fix, if possible, decrease capa instead of realloc.
      [Bug #12536] [ruby-dev:49699]

    * string.c: Partially reverts r55547 and r55555.
      ChangeLog about the reverted changes are also deleted in this file.
      [Bug #12536] [ruby-dev:49699] [ruby-dev:49702]

    * string.c: Specify termlen as far as possible.
      the termlen is correctly specified. Currently static function.

    * string.c (rb_str_change_terminator_length): New function to change
      termlen and resize heap for the terminator. This is split from
      rb_str_fill_terminator (str_fill_term) because filling terminator
      and changing terminator length are different things. [Bug #12536]

    * internal.h: declaration for rb_str_change_terminator_length.

    * string.c (str_fill_term): Simplify only to zero-fill the terminator.
      For non-shared strings, it assumes that (capa + termlen) bytes of
      heap is allocated. This partially reverts r55557.

    * encoding.c (rb_enc_associate_index): rb_str_change_terminator_length
      is used, and it should be called whenever the termlen is changed.

    * string.c (str_capacity): New static function to return capacity
      of a string with the given termlen, because the termlen may
      sometimes be different from TERM_LEN(str) especially during
      changing termlen or filling terminator with specific termlen.

    * string.c (rb_str_capacity): Use str_capacity.

    * string.c (str_buf_cat): Fix capa size for embed string.
      Fix bug in r55547. [Bug #12536]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/branches/ruby_2_2@56301 b2dd03c8-39d4-4d8f-98ff-823fe69b080e

Revision 56301
Added by usa (Usaku NAKAMURA) almost 3 years ago

merge revision(s) 55547,55551,55552,55555,55557,55559,55568,55575,55691: [Backport #12536]

* string.c: Fix memory corruptions when using UTF-16/32 strings.
  [Bug #12536] [ruby-dev:49699]

* string.c (TERM_LEN_MAX): Macro for the longest TERM_FILL length,
  the same as largest value of rb_enc_mbminlen(enc) among encodings.

* string.c (str_new, rb_str_buf_new, str_shared_replace): Allocate
  +TERM_LEN_MAX bytes instead of +1. This change may increase memory
  usage.

* string.c (rb_str_new_with_class): Use TERM_LEN of the "obj".

* string.c (rb_str_plus, rb_str_justify): Use str_new0 which is aware
  of termlen.

* string.c (str_shared_replace): Copy +termlen bytes instead of +1.

* string.c (rb_str_times): termlen should not be included in capa.

* string.c (RESIZE_CAPA_TERM): When using RSTRING_EMBED_LEN_MAX,
  termlen should be counted with it because embedded strings are
  also processed by TERM_FILL.

* string.c (rb_str_capacity, str_shared_replace, str_buf_cat): ditto.

* string.c (rb_str_drop_bytes, rb_str_setbyte, str_byte_substr): ditto.

* string.c (rb_str_subseq, str_substr): When RSTRING_EMBED_LEN_MAX
  is used, TERM_LEN(str) should be considered with it because
  embedded strings are also processed by TERM_FILL.
  Additional fix for [Bug #12536] [ruby-dev:49699].
  Additional fix for [Bug #12536] [ruby-dev:49699].

* string.c (rb_usascii_str_new, rb_utf8_str_new): Specify termlen
  which is apparently 1 for the encodings.

* string.c (str_new0_cstr): New static function to create a String
  object from a C string with specifying termlen.

* string.c (rb_usascii_str_new_cstr, rb_utf8_str_new_cstr): Specify
  termlen by using new str_new0_cstr().

* string.c (str_new_static): Specify termlen from the given encoding
  when creating a new String object is needed.

* string.c (rb_tainted_str_new_with_enc): New function to create a
  tainted String object with the given encoding. This means that
  the termlen is correctly specified. Curretly static function.
  The function name might be renamed to rb_tainted_enc_str_new
  or rb_enc_tainted_str_new.

* string.c (rb_external_str_new_with_enc): Use encoding by using the
  above rb_tainted_str_new_with_enc().

* string.c (str_fill_term): When termlen increases, re-allocation
  of memory for termlen should always be needed.
  In this fix, if possible, decrease capa instead of realloc.
  [Bug #12536] [ruby-dev:49699]

* string.c: Partially reverts r55547 and r55555.
  ChangeLog about the reverted changes are also deleted in this file.
  [Bug #12536] [ruby-dev:49699] [ruby-dev:49702]

* string.c: Specify termlen as far as possible.
  the termlen is correctly specified. Currently static function.

* string.c (rb_str_change_terminator_length): New function to change
  termlen and resize heap for the terminator. This is split from
  rb_str_fill_terminator (str_fill_term) because filling terminator
  and changing terminator length are different things. [Bug #12536]

* internal.h: declaration for rb_str_change_terminator_length.

* string.c (str_fill_term): Simplify only to zero-fill the terminator.
  For non-shared strings, it assumes that (capa + termlen) bytes of
  heap is allocated. This partially reverts r55557.

* encoding.c (rb_enc_associate_index): rb_str_change_terminator_length
  is used, and it should be called whenever the termlen is changed.

* string.c (str_capacity): New static function to return capacity
  of a string with the given termlen, because the termlen may
  sometimes be different from TERM_LEN(str) especially during
  changing termlen or filling terminator with specific termlen.

* string.c (rb_str_capacity): Use str_capacity.

* string.c (str_buf_cat): Fix capa size for embed string.
  Fix bug in r55547. [Bug #12536]

History

#1

Updated by ngoto (Naohisa Goto) about 3 years ago

  • Status changed from Open to Closed

Applied in changeset r55547.


  • string.c: Fix memory corruptions when using UTF-16/32 strings.
    [Bug #12536] [ruby-dev:49699]

  • string.c (TERM_LEN_MAX): Macro for the longest TERM_FILL length,
    the same as largest value of rb_enc_mbminlen(enc) among encodings.

  • string.c (str_new, rb_str_buf_new, str_shared_replace): Allocate
    +TERM_LEN_MAX bytes instead of +1. This change may increase memory
    usage.

  • string.c (rb_str_new_with_class): Use TERM_LEN of the "obj".

  • string.c (rb_str_plus, rb_str_justify): Use str_new0 which is aware
    of termlen.

  • string.c (str_shared_replace): Copy +termlen bytes instead of +1.

  • string.c (rb_str_times): termlen should not be included in capa.

  • string.c (RESIZE_CAPA_TERM): When using RSTRING_EMBED_LEN_MAX,
    termlen should be counted with it because embedded strings are
    also processed by TERM_FILL.

  • string.c (rb_str_capacity, str_shared_replace, str_buf_cat): ditto.

  • string.c (rb_str_drop_bytes, rb_str_setbyte, str_byte_substr): ditto.

Updated by ngoto (Naohisa Goto) about 3 years ago

ただし、rb_str_new() と rb_str_buf_new() にて、従来 capa+1 バイトのメモリを確保していた部分を修正するにあたり、後からどんなエンコーディングが関連付けられるかわからないため、

に関しては、string.c内の基礎的な部分は別としますが、
本来、rb_enc_associate_index(), rb_enc_associate(), rb_enc_copy() などの正規APIを使う限りは、いつ、どんなエンコーディングが関連付けられても、それによって問題が起こってはならないはずです。

そこで調べたところ、エンコーディング設定時に上記から呼ばれる rb_str_fill_terminator() がバグっていたことが判明しました。
なお、実際の処理は string.c 内の str_fill_term() に記述されています。

具体的には、termlen が増加するときには、その文字列の現状がどうであれ、必ずtermlen相当のメモリを追加確保しないといけないのに、それを怠って、領域が溢れそうなときだけしかメモリ再確保を行ってなかったため、

エンコーディング変更でtermlen増加
→rb_str_fill_terminator()が呼ばれるが、capaからは溢れないのでメモリ再確保は行われない
→その後rb_str_catなどで文字列が追加され増える(がcapaには収まる程度なのでreallocなどは行われない)
→capaがちょうど埋まる程度まで文字列が程よく追加された所で、TERM_FILL処理が発生
→新しいtermlenの長さを0で埋めようとするが、古い(短い)termlen分しかメモリは確保されていない
→メモリ破壊

となっていたようです。

これに伴い、

現時点での最大と思われる数値 4 を #define TERM_LEN_MAX 4 として定義して、それを常に足すようにしています。

については、このバグ解決の本質ではなく不要と判断し、この部分は revert します。

これにより、不要なメモリを確保したり、EMBEDになる文字列の長さが短くなったりすることは無くなるはずです。

Updated by ngoto (Naohisa Goto) about 3 years ago

エンコーディング設定時に上記から呼ばれる rb_str_fill_terminator() がバグっていたことが判明しました。

r55575 にて、エンコーディング設定時=termlen変更時には、
rb_str_change_terminator_length() という新関数を呼ぶように変更し、
str_fill_term() は (capaに無条件に+1していた部分などを除いて)元に戻しました。
termlen変更に伴うheap容量変更と、term_fill処理は、本質的には別操作と考えるべきで、
(SHARED文字列処理の都合でコードが似てくるので、まとめたくなる気持ちはわかりますが)
1関数で両対応していると複雑になり、バグの元と考えたためです。

Updated by ngoto (Naohisa Goto) about 3 years ago

  • Backport changed from 2.1: UNKNOWN, 2.2: UNKNOWN, 2.3: UNKNOWN to 2.1: UNKNOWN, 2.2: REQUIRED, 2.3: REQUIRED

Updated by nagachika (Tomoyuki Chikanaga) almost 3 years ago

  • Backport changed from 2.1: UNKNOWN, 2.2: REQUIRED, 2.3: REQUIRED to 2.1: UNKNOWN, 2.2: REQUIRED, 2.3: DONE

ruby_2_3 r55988 merged revision(s) 55547,55551,55552,55555,55557,55559,55575,55691,55568.

Updated by usa (Usaku NAKAMURA) almost 3 years ago

  • Backport changed from 2.1: UNKNOWN, 2.2: REQUIRED, 2.3: DONE to 2.1: UNKNOWN, 2.2: DONE, 2.3: DONE

ruby_2_2 r56301 merged revision(s) 55547,55551,55552,55555,55557,55559,55568,55575,55691.

Also available in: Atom PDF