Project

General

Profile

Feature #15631

Let round_capa for ID table not allocate excess capacity for power of 2 ints >= 4

Added by ahorek (Pavel Rosický) 6 months ago. Updated 18 days ago.

Status:
Open
Priority:
Normal
Target version:
-
[ruby-core:91651]

Description

right now round_capa value is rounded up to the next power of 2

round_capa(4) -> returns 8
round_capa(8) -> returns 16
round_capa(16) -> returns 32

round_capa(5) -> returns 8
round_capa(9) -> returns 16
round_capa(17) -> returns 32
etc.

it seems wasteful to allocate the extra items capacity, so this PR changes that to

round_capa(4) -> returns 4
round_capa(8) -> returns 8
round_capa(16) -> returns 16

round_capa(5) -> returns 8
round_capa(9) -> returns 16
round_capa(17) -> returns 32
etc.

the main purpose is to reduce memory usage especially during boot

my patch also uses BUILTIN_CLZ macro instead of shifts that makes it slightly faster

here's a benchmark

require 'benchmark/ips'

Benchmark.ips do |x|
  x.config(time: 20, warmup: 3)

  x.report('struct', "Struct.new(*('a'..'z').map { |x| x.to_sym })")
end
trunk
Warming up --------------------------------------
              struct   527.000  i/100ms
Calculating -------------------------------------
              struct      5.461k (± 5.5%) i/s -    109.089k in  20.040253s

methodmising - POW2_P (github)
Warming up --------------------------------------
              struct   544.000  i/100ms
Calculating -------------------------------------
              struct      5.570k (± 4.1%) i/s -    111.520k in  20.057245s

ahorek - BUILTIN_CLZ (id_table.c.patch)
Warming up --------------------------------------
              struct   571.000  i/100ms
Calculating -------------------------------------
              struct      5.812k (± 3.6%) i/s -    116.484k in  20.070607s

discussion https://github.com/ruby/ruby/pull/2083


Files

id_table.c.patch (534 Bytes) id_table.c.patch ahorek (Pavel Rosický), 03/01/2019 01:22 PM
st.c.patch (455 Bytes) st.c.patch ahorek (Pavel Rosický), 07/19/2019 03:15 PM
st.c.patch (434 Bytes) st.c.patch v2 ahorek (Pavel Rosický), 07/21/2019 12:52 AM

History

#1

Updated by ahorek (Pavel Rosický) 6 months ago

  • Description updated (diff)

Updated by methodmissing (Lourens Naudé) 5 months ago

Thanks for raising this Pavel.

st_init_table_with_size(0) effectively also allocates additional capacity, but if and how quickly the hash tables mutate I'll investigate later.

References https://github.com/ruby/ruby/blob/trunk/st.c#L573-L578 , https://github.com/ruby/ruby/blob/trunk/st.c#L595 and https://github.com/ruby/ruby/blob/trunk/st.c#L332-L359

A simple peek suggests a total table size of 152 bytes on init, but will investigate time to mutation of these 0 sized tables this evening:

diff --git a/st.c b/st.c
index ed235c674e..f2b99d7771 100644
--- a/st.c
+++ b/st.c
@@ -615,6 +615,8 @@ st_init_table_with_size(const struct st_hash_type *type, st_index_t size)
 #ifdef ST_DEBUG
     st_check(tab);
 #endif
+    printf("# st_init_table_with_size(%d) -> %d (%d)\n", size, n, st_memsize(tab));
+
     return tab;
 }
linking miniruby
# st_init_table_with_size(0) -> 2 (152)
# st_init_table_with_size(102) -> 7 (3384)
# st_init_table_with_size(255) -> 8 (7224)
# st_init_table_with_size(0) -> 2 (152)
# st_init_table_with_size(1000) -> 10 (28728)
# st_init_table_with_size(0) -> 2 (152)
# st_init_table_with_size(1000) -> 10 (28728)
# st_init_table_with_size(0) -> 2 (152)
# st_init_table_with_size(7) -> 3 (248)
# st_init_table_with_size(15) -> 4 (440)
# st_init_table_with_size(0) -> 2 (152)
# st_init_table_with_size(0) -> 2 (152)
# st_init_table_with_size(0) -> 2 (152)
# st_init_table_with_size(0) -> 2 (152)
# st_init_table_with_size(0) -> 2 (152)
# st_init_table_with_size(0) -> 2 (152)
# st_init_table_with_size(0) -> 2 (152)
# st_init_table_with_size(0) -> 2 (152)
# st_init_table_with_size(0) -> 2 (152)
# st_init_table_with_size(0) -> 2 (152)
# st_init_table_with_size(0) -> 2 (152)
# st_init_table_with_size(0) -> 2 (152)
# st_init_table_with_size(0) -> 2 (152)
# st_init_table_with_size(0) -> 2 (152)
# st_init_table_with_size(0) -> 2 (152)
# st_init_table_with_size(0) -> 2 (152)
# st_init_table_with_size(0) -> 2 (152)
# st_init_table_with_size(0) -> 2 (152)
# st_init_table_with_size(0) -> 2 (152)
# st_init_table_with_size(0) -> 2 (152)
# st_init_table_with_size(0) -> 2 (152)
# st_init_table_with_size(0) -> 2 (152)
# st_init_table_with_size(0) -> 2 (152)
# st_init_table_with_size(0) -> 2 (152)
# st_init_table_with_size(0) -> 2 (152)
# st_init_table_with_size(0) -> 2 (152)
# st_init_table_with_size(16) -> 5 (888)
# st_init_table_with_size(0) -> 2 (152)
# st_init_table_with_size(0) -> 2 (152)
# st_init_table_with_size(0) -> 2 (152)
# st_init_table_with_size(0) -> 2 (152)
# st_init_table_with_size(0) -> 2 (152)
# st_init_table_with_size(0) -> 2 (152)
# st_init_table_with_size(0) -> 2 (152)
# st_init_table_with_size(0) -> 2 (152)
# st_init_table_with_size(0) -> 2 (152)
# st_init_table_with_size(0) -> 2 (152)
# st_init_table_with_size(0) -> 2 (152)
# st_init_table_with_size(0) -> 2 (152)
# st_init_table_with_size(0) -> 2 (152)

Updated by ko1 (Koichi Sasada) about 1 month ago

  • Assignee set to ko1 (Koichi Sasada)

funny_falcon, do you have any opinion?
if no opinion, we'll discuss it one month later and will commit it.

in fact, I can't check algorithm, so we can try it.

ahorek:
could you give me more performance measurements?

  • you should not call map in iteration (you should prepare IDs before)
  • now you only measures 26 fields. could you measure other numbers, 1 to 50, for example.

Thanks,
Koichi

Updated by ahorek (Pavel Rosický) about 1 month ago

Thanks for the review Koichi. I tested the patch on a rails app (redmine), but unfortunatelly there's no improvement.
0.1MB less memory after boot (150MB total)
No mesurable difference in performance

I'll investigate the second case. Hash creation might be a better place to optimize.

Updated by ahorek (Pavel Rosický) 30 days ago

I've attached the second patch for st_init_table_with_size. In theory it should be faster, but I can't measure any difference in ruby.

Updated by nobu (Nobuyoshi Nakada) 30 days ago

It should use SIZEOF_ST_INDEX_T and nlz_intptr.

Updated by methodmissing (Lourens Naudé) 29 days ago

Pavel added a new patch for get_power2 in https://github.com/ruby/ruby/pull/2292

Updated by ko1 (Koichi Sasada) 20 days ago

pls give us your measurements. We can't understand how it is useful.

Thanks,
Koichi

Updated by ahorek (Pavel Rosický) 18 days ago

I run several benchmark suites for both patches
https://github.com/ruby-bench/ruby-bench-suite
https://github.com/schneems/derailed_benchmarks

but all differences were within margin of error. Here's an optimized assembly comparsion that explains why:
https://user-images.githubusercontent.com/9540855/62237913-2e221380-b3d2-11e9-932a-14b09038bf91.png

saving 1-2 instructions makes no real difference.

feel free to close the issue. Thanks

Also available in: Atom PDF