https://redmine.ruby-lang.org/https://redmine.ruby-lang.org/favicon.ico?17113305112018-05-30T02:26:01ZRuby Issue Tracking SystemRuby master - Feature #14794: Primitive arrays (Ruby 3x3)https://redmine.ruby-lang.org/issues/14794?journal_id=722962018-05-30T02:26:01Zmrkn (Kenta Murata)muraken@gmail.com
<ul></ul><p>Use numo-narray or nmatrix for homogeneous numeric arrays.</p>
<p><a href="https://github.com/ruby-numo/numo-narray" class="external">https://github.com/ruby-numo/numo-narray</a></p>
<p><a href="https://github.com/SciRuby/nmatrix" class="external">https://github.com/SciRuby/nmatrix</a></p> Ruby master - Feature #14794: Primitive arrays (Ruby 3x3)https://redmine.ruby-lang.org/issues/14794?journal_id=723032018-05-30T13:27:52Zahorek (Pavel Rosický)
<ul></ul><p>I'm interested to improve Ruby array's performance without specifying custom types or C extensions, it should just work out of the box.</p>
<pre><code class="ruby syntaxhl" data-language="ruby"><span class="no">Numo</span><span class="o">::</span><span class="no">Int32</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="mi">100</span><span class="p">)</span>
<span class="no">Array</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="mi">100</span><span class="p">,</span> <span class="ss">type: :integer</span><span class="p">)</span>
</code></pre>
<p>or something like that isn't very idiomatic for Ruby. I don't want to care about types, but I expect that Ruby will store it efficiently and change the store strategy if necessary:</p>
<pre><code class="ruby syntaxhl" data-language="ruby"><span class="p">[</span><span class="mi">1</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="mi">3</span><span class="p">]</span> <span class="o"><<</span> <span class="s1">'test'</span>
</code></pre>
<p>narray can't handle this case or numeric overflows...</p>
<p>but yes it's faster</p>
<pre><code> Numo::Int32 max: 20723.0 i/s
Numo::Int64 max: 11975.4 i/s - 1.73x slower
ruby max (int): 8191.7 i/s - 2.53x slower
</code></pre>
<pre><code>The flexibility and dynamism of dynamically typed languages frustrates most traditional static optimisations. Just-In-Time (JIT) compilers defer optimisations until run-time, when the types of objects at specific points in a program can be identified, and specialised code can be generated. In particular, variables which reference common types such as integers can be ‘unboxed’ [8 , 24]: rather than being references to an object in the heap, they are stored directly where they are used. This lowers memory consumption, improves cache locality, and reduces the overhead on the garbage collector. Unboxing is an important technique in optimising such languages.
Dynamically typed languages therefore pay a significant performance penalty for the possibility that collections may store heterogeneously typed elements, even for programs which create no such collections. Statically typed languages can determine efficient storage representations of collections storing elements of a primitive type based on a collection’s static types.
</code></pre> Ruby master - Feature #14794: Primitive arrays (Ruby 3x3)https://redmine.ruby-lang.org/issues/14794?journal_id=723042018-05-30T13:58:40Zahorek (Pavel Rosický)
<ul></ul><p>btw: 40% of arrays on my rails app contains only primitive elements</p> Ruby master - Feature #14794: Primitive arrays (Ruby 3x3)https://redmine.ruby-lang.org/issues/14794?journal_id=940532021-10-07T00:00:20Zdsisnero (Dominic Sisneros)dsisnero@gmail.com
<ul></ul><p>saw from <a href="https://chrisseaton.com/truffleruby/rubykaigi21/" class="external">https://chrisseaton.com/truffleruby/rubykaigi21/</a> - the shape of ruby objects. Maybe if we can get this implemented it would help out with this. Or use collection storage strategies to implement this. <a href="https://tratt.net/laurie/research/pubs/html/bolz_diekmann_tratt__storage_strategies_for_collections_in_dynamically_typed_languages/" class="external">https://tratt.net/laurie/research/pubs/html/bolz_diekmann_tratt__storage_strategies_for_collections_in_dynamically_typed_languages/</a></p> Ruby master - Feature #14794: Primitive arrays (Ruby 3x3)https://redmine.ruby-lang.org/issues/14794?journal_id=950732021-12-03T02:41:42Zmrkn (Kenta Murata)muraken@gmail.com
<ul></ul><p><a class="user active user-mention" href="https://redmine.ruby-lang.org/users/11888">@ahorek (Pavel Rosický)</a> I guess what you want is like <code>pandas.Series</code>. Is my speculation is correct?</p> Ruby master - Feature #14794: Primitive arrays (Ruby 3x3)https://redmine.ruby-lang.org/issues/14794?journal_id=950782021-12-03T03:41:07Zmame (Yusuke Endoh)mame@ruby-lang.org
<ul><li><strong>Status</strong> changed from <i>Open</i> to <i>Feedback</i></li></ul><p>As far as I understand, this ticket proposes an optimization of the Array representation, for example, by using raw memory of <code>long*</code> when all the elements are Fixnums, or by using <code>double*</code> when all are Floats. The idea is interesting, but I believe we should start discussing this after anyone implements it and performs an experiment with some real-world applications like Rails.</p>
<p>Aside from SIMD, such an optimization will reduce GC time because we don't have to mark the elements of Fixnum and Float arrays. However, it may introduce additional overhead when degenerating to the current <code>VALUE*</code>representation occurs frequently. What we need is an experiment.</p> Ruby master - Feature #14794: Primitive arrays (Ruby 3x3)https://redmine.ruby-lang.org/issues/14794?journal_id=952892021-12-11T14:07:47Zahorek (Pavel Rosický)
<ul><li><strong>File</strong> <a href="/attachments/9141">clipboard-202112111356-0rasx.png</a> <a class="icon-only icon-download" title="Download" href="/attachments/download/9141/clipboard-202112111356-0rasx.png">clipboard-202112111356-0rasx.png</a> added</li></ul><p>thanks for your interest <a class="user active user-mention" href="https://redmine.ruby-lang.org/users/18">@mame (Yusuke Endoh)</a>!</p>
<p>btw I found a bug in pypy which leads to a slow performance in this benchmark which is now fixed</p>
<blockquote>
<p>performs an experiment with some real-world applications like Rails</p>
</blockquote>
<p>based on experiments on TruffleRuby we can say it won't help much for Rails. Unfortunately, the bottleneck in Rails is mostly bound by IO and hash performance.</p>
<p>but it could significantly help in other computational heavy areas like image processing, parsers, matrix multiplication...</p>
<blockquote>
<p>I guess what you want is like pandas.Series</p>
</blockquote>
<p>the same performance could be achievable by a C extension or a specialized type like Numo::Int32, but these are just performance hacks. What I'm proposing is that the language should determine, that an array [1,2,3] contains integers only, so it should be stored using raw memory of long* instead of objects which reference the actual value.<br>
<img src="https://redmine.ruby-lang.org/attachments/download/9141/clipboard-202112111356-0rasx.png" alt="" loading="lazy"></p>
<blockquote>
<p>such an optimization will reduce GC time</p>
</blockquote>
<p>sure. It'll also reduce memory consumption & cache locality.</p>
<p>knowing the type has other advantages. For instance, the original example with Array#max is hard to optimize by the compiler</p>
<pre><code>for (i = 0; i < RARRAY_LEN(ary); i++) {
v = RARRAY_AREF(ary, i); // we need to dereference the value
if (result == Qundef || OPTIMIZED_CMP(v, result, cmp_opt) > 0) {
result = v;
}
}
// a bunch of type checks for each iteration
#define OPTIMIZED_CMP(a, b, data) \
((FIXNUM_P(a) && FIXNUM_P(b) && CMP_OPTIMIZABLE(data, Integer)) ? \
(((long)a > (long)b) ? 1 : ((long)a < (long)b) ? -1 : 0) : \
(STRING_P(a) && STRING_P(b) && CMP_OPTIMIZABLE(data, String)) ? \
rb_str_cmp(a, b) : \
(RB_FLOAT_TYPE_P(a) && RB_FLOAT_TYPE_P(b) && CMP_OPTIMIZABLE(data, Float)) ? \
rb_float_cmp(a, b) : \
rb_cmpint(rb_funcallv(a, id_cmp, 1, &b), a, b))
</code></pre>
<p><a class="user active user-mention" href="https://redmine.ruby-lang.org/users/482">@mrkn (Kenta Murata)</a> did an optimization, but they're even more type checks which make the code complicated<br>
<a href="https://github.com/ruby/ruby/pull/3325/files" class="external">https://github.com/ruby/ruby/pull/3325/files</a></p>
<pre><code>case array_shape
when Integers
for (i = 0; i < RARRAY_LEN(ary); i++) {
if ((long)vmax < (long)v) {
vmax = v;
}
return vmax;
}
end
</code></pre>
<p>this pattern is very well known and easily optimizable because there're no type checks for each element anymore. There's also a high chance these values will be prefetched or read from a cache.<br>
<a href="https://godbolt.org/z/jeEa8roPn" class="external">https://godbolt.org/z/jeEa8roPn</a></p>
<p>it's applicable to most array methods, but Array#max is simple to explain.</p>
<blockquote>
<p>However, it may introduce additional overhead when degenerating to the current VALUE*representation occurs frequently</p>
</blockquote>
<p>you're right, there're bad scenarios that could actually lead to a slower performance</p>
<pre><code>array = [1,2,3] -> Integer array
array << "string" -> Object array (needs reallocation of the whole array instead of just putting a new element)
</code></pre>
<p>there're ways how to overcome this, but it's the hardest part to implement without a serious performance impact. It's a corner case that should still work, but it isn't very common. Most arrays aren't changing shapes at runtime like this.</p>
<p>besides <a href="https://github.com/oracle/truffleruby" class="external">https://github.com/oracle/truffleruby</a> and <a href="https://www.pypy.org" class="external">https://www.pypy.org</a> which have the optimization implemented, there's also an old pull request about the same idea <a href="https://github.com/topazproject/topaz/pull/555" class="external">https://github.com/topazproject/topaz/pull/555</a> (topaz is a dead project today)</p>
<p>the idea itself isn't hard to understand. I'm not as good in C to implement it to CRuby, but I'll gladly help if someone comes with initial implementation.</p>