Feature #10185
closed[PATCH] iseq: free untranslated iseq->iseq at compile
Description
running "ruby -rpp -e 'pp GC.stat'", a reduction in
malloc usage is shown:
before:
:malloc_increase=>118784,
:oldmalloc_increase=>1178736,
after:
:malloc_increase=>99832,
:oldmalloc_increase=>1031976,
(numbers from x86-64)
-
compile.c (rb_iseq_translate_threaded_code): free iseq->iseq
(rb_vm_addr2insn): new function for debug
(rb_iseq_untranslate_threaded_code): ditto -
iseq.c (iseq_memsize): support iseq->iseq == 0
(rb_iseq_disasm): untranslate for dump
(iseq_data_to_ary): ditto
(rb_iseq_line_trace_each): ditto -
vm_dump.c (rb_vmdebug_debug_print_pre): ditto
I think we can remove iseq->iseq field from the rb_iseq_t structure in
the future and use transient allocation for the debug functions.
rb_iseq_untranslate_threaded_code is an uncommon code path for debug and
not a performance issue in normal code.
Files
Updated by normalperson (Eric Wong) about 10 years ago
- File iseq-iseq-diet.patch iseq-iseq-diet.patch added
Patch #2:
Subject: [PATCH] iseq (rb_iseq_t): remove iseq->iseq field
It is unnecessary and reduces transient malloc/free during compile.
For "ruby -e exit", valgrind reports over 300K reduction in
overall allocations.
before:
total heap usage: 49,622 allocs, 20,492 frees, 8,697,493 bytes allocated
after:
total heap usage: 48,935 allocs, 19,805 frees, 8,373,773 bytes allocated
Updated by normalperson (Eric Wong) about 10 years ago
https://bugs.ruby-lang.org/issues/10185#change-48562
---Files--------------------------------
iseq-tc-diet.patch (4.43 KB)
iseq-iseq-diet.patch (9.54 KB)
Ping? I hope to commit these two soon.
Updated by ko1 (Koichi Sasada) about 10 years ago
Thank you for waiting my comment.
I'm thinking about compatibility and expandability.
Now, in Ruby source, iseq->iseq is used only for disasm. So that your patch works completely.
However we have several compilers (iseq -> C) to use iseq->iseq directly.
Of course, most of people don't use this compiler.
So your proopsal is reasonable.
On the other hand, the name `iseq_encoded' is not only for direct threaded code. We can make another encoding scheme.
Summary: my proposal is:
- remain iseq->iseq (NULL as default)
- rename rb_iseq_untranslate_threaded_code to rb_iseq_decode_encoded_iseq() and fill iseq->iseq if iseq->iseq is not NULL.
- rb_iseq_decode_encoded_iseq() can call rb_iseq_untranslate_threaded_code() (one of decode scheme).
- expose (not in ruby/ but internal.h and expose symbol) rb_iseq_decode_encoded_iseq() for maniac users.
Updated by ko1 (Koichi Sasada) about 10 years ago
Koichi Sasada wrote:
- rename rb_iseq_untranslate_threaded_code to rb_iseq_decode_encoded_iseq() and fill iseq->iseq if iseq->iseq is not NULL.
decode is also strange.
const VALUE *rb_iseq_orignal_iseq() ?
Updated by normalperson (Eric Wong) about 10 years ago
This combines both patches to do translation in-place.
I couldn't make rb_iseq_original_iseq return const because of
rb_iseq_line_trace_each, but it now caches iseq->iseq field
(at the bottom of rb_iseq_t, away from hotter fields).
Also, I hope to make compile_data and iseq->iseq fields a union,
so it might be iseq->u1.compile_dat/iseq->u1.iseq in the future.
Updated by Anonymous about 10 years ago
- Status changed from Open to Closed
- % Done changed from 0 to 100
Applied in changeset r47508.
compile: translate iseq in-place
running "ruby -rpp -e 'pp GC.stat'", a reduction in
malloc usage is shown:
before:
:malloc_increase=>118784,
:oldmalloc_increase=>1178736,
after:
:malloc_increase=>99832,
:oldmalloc_increase=>1031976,
For "ruby -e exit", valgrind reports over 300K reduction in
overall allocations (and unnecessary memory copies).
before:
total heap usage: 49,622 allocs, 20,492 frees, 8,697,493 bytes allocated
after:
total heap usage: 48,935 allocs, 19,805 frees, 8,373,773 bytes allocated
(numbers from x86-64)
v2 changes based on ko1 recommendations [ruby-core:64883]:
- squashed in-place direct thread translation to avoid alloc+copy
- renamed rb_iseq_untranslate_threaded_code to rb_iseq_original_iseq,
cache new iseq->iseq_original field.
-
compile.c (rb_iseq_translate_threaded_code): modify in-place w/o copy
(rb_vm_addr2insn): new function for debug
(rb_iseq_original_iseq): ditto
(iseq_set_sequence): assign iseq_encoded directly
[Feature #10185] -
vm_core (rb_iseq_t): move original ->iseq to bottom
-
iseq.c (iseq_free, iseq_free): adjust for new layout
(rb_iseq_disasm): use original iseq for dump
(iseq_data_to_ary): ditto
(rb_iseq_line_trace_each): ditto
(rb_iseq_build_for_ruby2cext): use iseq_encoded directly -
vm_dump.c (rb_vmdebug_debug_print_pre): use original iseq