@byroot (Jean Boussier)
I'd suggest running the hash related benchmarks included in ruby's repo: https://github.com/ruby/ruby/tree/master/benchmark
For most of the benchmarks, this changes doesn't make much of a different because they are either too small hashes or use strings or symbols as keys and this is only implemented for the general keys (objects, integers, etc.). Another important detail is that in the way that it is implemented right now the filling of the random matrix may be being benchmaked (so I think that the result of a finished implementation may be even better).
It does make a difference in the bighash
benchmark. I have increased the number of elements inserted in the hash a bit to make the difference clearer. I have also added an extra benchmark, equivalent to this one but with objects instead of numbers as keys: https://github.com/ruby/ruby/commit/99e0c33655dec45a9b36850b6fe2cb36ef5f4d42
Those are the results for those two benchmarks with Simple tabulation and Quadratic Probing [ https://github.com/Ana06/ruby/tree/tabulation ]:
compare-ruby: ruby 2.8.0dev (2020-05-07T16:22:38Z master 7ded8fd29a) [x86_64-linux]
built-ruby: https://github.com/Ana06/ruby/tree/tabulation
Calculating -------------------------------------ยท
compare-ruby built-ruby
bighash_obj 0.079 0.099 i/s - 1.000 times in 12.591126s 10.078022s
bighash 0.394 0.454 i/s - 1.000 times in 2.541103s 2.204265s
Comparison:
bighash_obj
built-ruby: 0.1 i/s
compare-ruby: 0.1 i/s - 1.25x slower
bighash
built-ruby: 0.5 i/s
compare-ruby: 0.4 i/s - 1.15x slower
Those are the results for those two benchmarks with Simple tabulation and the current hash implementation (without Quadratic or Linear Probing) [ https://github.com/Ana06/ruby/tree/tabulation without 95f367a ]:
compare-ruby: ruby 2.8.0dev (2020-05-07T16:22:38Z master 7ded8fd29a) [x86_64-linux]
built-ruby: https://github.com/Ana06/ruby/tree/tabulation without 95f367a
Calculating -------------------------------------
compare-ruby built-ruby
bighash_obj 0.081 0.094 i/s - 1.000 times in 12.271814s 10.585687s
bighash 0.369 0.409 i/s - 1.000 times in 2.711338s 2.445082s
Comparison:
bighash_obj
built-ruby: 0.1 i/s
compare-ruby: 0.1 i/s - 1.16x slower
bighash
built-ruby: 0.4 i/s
compare-ruby: 0.4 i/s - 1.11x slower
@matz (Yukihiro Matsumoto)
ana06 (Ana Maria Martinez Gomez) Do you need any help?
To finish it so that it can be integrated in production I do need some help. I am not sure where the good place to put this code is. Concretely:
- The filling of the random matrix shouldn't be done the first time it is needed, but it should be part of the "initialization" of Ruby (or the Hash class). But I don't know which is the best way to do this. Actually, this doesn't necessarily need to be done as I have implemented it. Simple Tabulation needs 64KB of random data. Ideally it could even be a shared memory section with other programs (I don't think we have to do something like this, but just pointing out that there may be an smarter solution).
- I have removed some functionality out by placing the Simple Tabulation code in the
any_hash
function directly. With that it is not allowed to overwrite the hash method for the hashing implementation (which maybe is ok, it is done for other objects already, related: https://bugs.ruby-lang.org/issues/16850). Using arrays as keys is also broken (at least not consistent with the previous behavior because the array id is now used). I would need some direction how to address these things (probably there is a better place to integrate this code).
There would still be some work to do, such as for example implement it for 32 bites (currently only implemented for 64). But this concretely shouldn't be a big deal.