Feature #10185
closed[PATCH] iseq: free untranslated iseq->iseq at compile
Description
running "ruby -rpp -e 'pp GC.stat'", a reduction in
malloc usage is shown:
before:
:malloc_increase=>118784,
:oldmalloc_increase=>1178736,
after:
:malloc_increase=>99832,
:oldmalloc_increase=>1031976,
(numbers from x86-64)
-
compile.c (rb_iseq_translate_threaded_code): free iseq->iseq
(rb_vm_addr2insn): new function for debug
(rb_iseq_untranslate_threaded_code): ditto -
iseq.c (iseq_memsize): support iseq->iseq == 0
(rb_iseq_disasm): untranslate for dump
(iseq_data_to_ary): ditto
(rb_iseq_line_trace_each): ditto -
vm_dump.c (rb_vmdebug_debug_print_pre): ditto
I think we can remove iseq->iseq field from the rb_iseq_t structure in
the future and use transient allocation for the debug functions.
rb_iseq_untranslate_threaded_code is an uncommon code path for debug and
not a performance issue in normal code.
Files
Updated by normalperson (Eric Wong) over 9 years ago
- File iseq-iseq-diet.patch iseq-iseq-diet.patch added
Patch #2:
Subject: [PATCH] iseq (rb_iseq_t): remove iseq->iseq field
It is unnecessary and reduces transient malloc/free during compile.
For "ruby -e exit", valgrind reports over 300K reduction in
overall allocations.
before:
total heap usage: 49,622 allocs, 20,492 frees, 8,697,493 bytes allocated
after:
total heap usage: 48,935 allocs, 19,805 frees, 8,373,773 bytes allocated
Updated by normalperson (Eric Wong) about 9 years ago
https://bugs.ruby-lang.org/issues/10185#change-48562
---Files--------------------------------
iseq-tc-diet.patch (4.43 KB)
iseq-iseq-diet.patch (9.54 KB)
Ping? I hope to commit these two soon.
Updated by ko1 (Koichi Sasada) about 9 years ago
Thank you for waiting my comment.
I'm thinking about compatibility and expandability.
Now, in Ruby source, iseq->iseq is used only for disasm. So that your patch works completely.
However we have several compilers (iseq -> C) to use iseq->iseq directly.
Of course, most of people don't use this compiler.
So your proopsal is reasonable.
On the other hand, the name `iseq_encoded' is not only for direct threaded code. We can make another encoding scheme.
Summary: my proposal is:
- remain iseq->iseq (NULL as default)
- rename rb_iseq_untranslate_threaded_code to rb_iseq_decode_encoded_iseq() and fill iseq->iseq if iseq->iseq is not NULL.
- rb_iseq_decode_encoded_iseq() can call rb_iseq_untranslate_threaded_code() (one of decode scheme).
- expose (not in ruby/ but internal.h and expose symbol) rb_iseq_decode_encoded_iseq() for maniac users.
Updated by ko1 (Koichi Sasada) about 9 years ago
Koichi Sasada wrote:
- rename rb_iseq_untranslate_threaded_code to rb_iseq_decode_encoded_iseq() and fill iseq->iseq if iseq->iseq is not NULL.
decode is also strange.
const VALUE *rb_iseq_orignal_iseq() ?
Updated by normalperson (Eric Wong) about 9 years ago
This combines both patches to do translation in-place.
I couldn't make rb_iseq_original_iseq return const because of
rb_iseq_line_trace_each, but it now caches iseq->iseq field
(at the bottom of rb_iseq_t, away from hotter fields).
Also, I hope to make compile_data and iseq->iseq fields a union,
so it might be iseq->u1.compile_dat/iseq->u1.iseq in the future.
Updated by Anonymous about 9 years ago
- Status changed from Open to Closed
- % Done changed from 0 to 100
Applied in changeset r47508.
compile: translate iseq in-place
running "ruby -rpp -e 'pp GC.stat'", a reduction in
malloc usage is shown:
before:
:malloc_increase=>118784,
:oldmalloc_increase=>1178736,
after:
:malloc_increase=>99832,
:oldmalloc_increase=>1031976,
For "ruby -e exit", valgrind reports over 300K reduction in
overall allocations (and unnecessary memory copies).
before:
total heap usage: 49,622 allocs, 20,492 frees, 8,697,493 bytes allocated
after:
total heap usage: 48,935 allocs, 19,805 frees, 8,373,773 bytes allocated
(numbers from x86-64)
v2 changes based on ko1 recommendations [ruby-core:64883]:
- squashed in-place direct thread translation to avoid alloc+copy
- renamed rb_iseq_untranslate_threaded_code to rb_iseq_original_iseq,
cache new iseq->iseq_original field.
-
compile.c (rb_iseq_translate_threaded_code): modify in-place w/o copy
(rb_vm_addr2insn): new function for debug
(rb_iseq_original_iseq): ditto
(iseq_set_sequence): assign iseq_encoded directly
[Feature #10185] -
vm_core (rb_iseq_t): move original ->iseq to bottom
-
iseq.c (iseq_free, iseq_free): adjust for new layout
(rb_iseq_disasm): use original iseq for dump
(iseq_data_to_ary): ditto
(rb_iseq_line_trace_each): ditto
(rb_iseq_build_for_ruby2cext): use iseq_encoded directly -
vm_dump.c (rb_vmdebug_debug_print_pre): use original iseq