Feature #15251
closedHash aset should deduplicate non tainted string
Description
I'm not sure if current behavior is expected one or a bug. So feel free to change tracker type.
Currently Hash ASET checks if non-tainted string exists in fstring table or not, if it doesn't then ruby duplicates string and freeze it. This works well for string_literal because they are already registered in fstring table, but it doesn't work for non-literal string.
Patch
https://github.com/ruby/ruby/pull/1993
O/P of attached file(test_hash_keys_deduped.rb) on trunk:
string_literal => 1
string times 1 => 1
string times 3 string times 3 string times 3 => 100
interpolated_string => 100
string add => 100
string append => 100
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
fstring
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
string_literal => 1
string times 1 => 1
string times 3 string times 3 string times 3 => 1
interpolated_string => 1
string add => 1
string append => 1
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
fstring + GC
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
string_literal => 1
string times 1 => 1
string times 3 string times 3 string times 3 => 100
interpolated_string => 100
string add => 100
string append => 100
after patch
string_literal => 1
string times 1 => 1
string times 3 string times 3 string times 3 => 1
interpolated_string => 1
string add => 1
string append => 1
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
fstring
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
string_literal => 1
string times 1 => 1
string times 3 string times 3 string times 3 => 1
interpolated_string => 1
string add => 1
string append => 1
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
fstring + GC
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
string_literal => 1
string times 1 => 1
string times 3 string times 3 string times 3 => 1
interpolated_string => 1
string add => 1
string append => 1
Benchmark result(bench_hash_aset.rb):
Trunk:
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Hash#[`string literal`.dup]=
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
0.880000 0.000000 0.880000 ( 0.880699)
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Hash#[`string non-literal`.dup]=
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
0.980000 0.000000 0.980000 ( 0.978089)
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Hash#[`random text`]=
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
3.716000 0.004000 3.720000 ( 3.722688)
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Hash#[`string literal`.dup]=
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
0.868000 0.000000 0.868000 ( 0.868405)
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Hash#[`string non-literal`.dup]=
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
0.948000 0.000000 0.948000 ( 0.948946)
Patched:
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Hash#[`string literal`.dup]=
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
0.872000 0.000000 0.872000 ( 0.872410)
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Hash#[`string non-literal`.dup]=
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
0.864000 0.000000 0.864000 ( 0.865356)
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Hash#[`random text`]=
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
3.780000 0.000000 3.780000 ( 3.779730)
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Hash#[`string literal`.dup]=
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
0.868000 0.000000 0.868000 ( 0.867957)
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Hash#[`string non-literal`.dup]=
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
0.872000 0.000000 0.872000 ( 0.873573)
Files
Updated by chopraanmol1 (Anmol Chopra) about 6 years ago
- Description updated (diff)
Updated by chopraanmol1 (Anmol Chopra) about 6 years ago
- Description updated (diff)
Updated by chopraanmol1 (Anmol Chopra) about 6 years ago
- Description updated (diff)
Updated by normalperson (Eric Wong) about 6 years ago
chopraanmol1@gmail.com wrote:
I'm not sure if current behavior is expected one or a bug. So fell free to
change tracker type.
current behavior is intentional because of a regression case:
https://bugs.ruby-lang.org/issues/9188
Currently Hash aset checks if non-tainted string exists in
fstring table or not, if it does not then ruby duplicates
string and freeze it. This works well for string_literal
because they are already registered in fstring table during
compilation, but it doesn't work for non-string literal.
Right, I prefer we always dedupe since more programs will
benefit. We can take a small regression for programs that write
uncommon/random keys to hashes.
Updated by chopraanmol1 (Anmol Chopra) about 6 years ago
- File bench_hash_aset.rb bench_hash_aset.rb added
- Description updated (diff)
@normalperson (Eric Wong) I also benchmarked so_k_nucleotide mentioned in https://bugs.ruby-lang.org/issues/9188 with following command
benchmark-driver benchmark/so_k_nucleotide.yml -e "path_to_patched_binary" -e "path_to_trunk_binary" -e "path_to_patched_binary --jit" -e "path_to_trunk_binary --jit" --repeat-count 10
Result:
Calculating -------------------------------------
path_to_patched_binary path_to_trunk_binary path_to_patched_binary --jit path_to_trunk_binary --jit
so_k_nucleotide 0.786 0.755 0.686 0.673 i/s - 1.000 times in 1.272604s 1.324481s 1.458204s 1.485638s
Comparison:
so_k_nucleotide
path_to_patched_binary: 0.8 i/s
path_to_trunk_binary: 0.8 i/s - 1.04x slower
path_to_patched_binary --jit: 0.7 i/s - 1.15x slower
path_to_trunk_binary --jit: 0.7 i/s - 1.17x slower
I think so far it looks good, let me know if I did something wrong while running the above benchmark.
Updated by normalperson (Eric Wong) about 6 years ago
- Status changed from Open to Closed
Applied in changeset trunk|r65371.
hash.c: aset deduplicates un-tainted string
We revisit [Bug #9188] since st.c is much improved since then,
and benchmarks against so_k_nucleotide seem to indicate little
or no performance change compared to before.
[ruby-core:89555] [Feature #15251]
From: Anmol Chopra chopraanmol1@gmail.com
Updated by normalperson (Eric Wong) about 6 years ago
chopraanmol1@gmail.com wrote:
I think so far it looks good, let me know if I did something wrong while running the above benchmark.
I agree, so I've committed your patch as-is for r65371.
I wanted to try a shorter patch:
https://80x24.org/spew/20181026050908.1183-1-e@80x24.org/raw
But I got some spec failures due to singleton class (below). I
haven't investigated, yet, but I think there may be an existing
bug in hash.c, because my shorter patch ought to work...
Hash#[]= duplicates string keys using dup semantics FAILED
Expected "bar"
to equal "oof"
ruby/spec/ruby/core/hash/shared/store.rb:16:in block (2 levels) in <top (required)>' ruby/spec/ruby/core/hash/element_set_spec.rb:5:in
<top (required)>'
Hash#store duplicates string keys using dup semantics FAILED
Expected "bar"
to equal "oof"
ruby/spec/ruby/core/hash/shared/store.rb:16:in block (2 levels) in <top (required)>' ruby/spec/ruby/core/hash/store_spec.rb:5:in
<top (required)>'
Updated by normalperson (Eric Wong) about 6 years ago
But I got some spec failures due to singleton class (below). I
haven't investigated, yet, but I think there may be an existing
bug in hash.c, because my shorter patch ought to work...
Maybe this is an appropriate simplification:
https://80x24.org/spew/20181026055336.25908-1-e@80x24.org/raw