Feature #12222

closed

Introducing basic statistics methods for Enumerable (and optimized implementation for Array)

Added by mrkn (Kenta Murata) over 5 years ago. Updated almost 4 years ago.

Status:
Closed
Priority:
Normal
Target version:
-
[ruby-core:74607]

Description

Since version 3.4, Python has provided a statistics library for calculating the mean, variance, etc. of arrays and iterators.
I would like to propose introducing similar features for the built-in Enumerable, with an optimized implementation for Array.

In particular, I want to provide Enumerable#mean and Enumerable#variance as built-in features because they should be implemented with precision-compensated algorithms.
The following example shows that, for some arrays, the simple variance algorithm yields a negative variance, so the standard deviation cannot be calculated.

class Array
  # Kahan summation
  def sum
    s = 0.0
    c = 0.0
    n = self.length
    i = 0
    while i < n
      y = self[i] - c
      t = s + y
      c = (t - s) - y
      s = t
      i += 1
    end
    s
  end

  # precision compensated algorithm
  def variance
    n = self.length
    return Float::NAN if n < 2
    m1 = 0.0
    m2 = 0.0
    i = 0
    while i < n
      x = self[i]
      delta = x - m1
      m1 += delta / (i + 1)
      m2 += delta*(x - m1)
      i += 1
    end
    m2 / (n - 1)
  end
end

ary = [ 1.0000000081806004, 1.0000000009124625, 1.0000000099201818, 1.0000000061821668, 1.0000000042644555 ]

# simple variance algorithm
a = ary.map {|x| x ** 2 }.sum
b = ary.sum ** 2 / ary.length
p (a - b) / (ary.length - 1)  #=> -2.220446049250313e-16

# precision compensated algorithm
p ary.variance  #=> 1.2248208046392579e-17

I think the precision-compensated algorithm is too complicated to expect users to implement it themselves.
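For illustration only, here is a minimal sketch of how the same one-pass update used in Array#variance above might be applied to any Enumerable, not just Array. The module name StatsSketch and the methods mean_and_variance and stdev are hypothetical names chosen for this example; they are not part of the proposal and do not reflect any actual implementation.

```ruby
# Hypothetical helper module; the names here are illustrative assumptions,
# not an API proposed in this ticket.
module StatsSketch
  module_function

  # One-pass running update of the mean and the sum of squared deviations,
  # following the same recurrence as Array#variance in the description.
  # Works with anything that responds to #each.
  # Returns [mean, unbiased sample variance].
  def mean_and_variance(enum)
    n = 0
    mean = 0.0
    m2 = 0.0
    enum.each do |x|
      n += 1
      delta = x - mean
      mean += delta / n
      m2 += delta * (x - mean)
    end
    return [Float::NAN, Float::NAN] if n == 0
    variance = n < 2 ? Float::NAN : m2 / (n - 1)
    [mean, variance]
  end

  # Standard deviation as the square root of the sample variance.
  def stdev(enum)
    Math.sqrt(mean_and_variance(enum)[1])
  end
end

# Usage with an Enumerator rather than an Array
mean, variance = StatsSketch.mean_and_variance([1, 2, 3, 4].each)
p mean      # mean of 1, 2, 3, 4
p variance  # unbiased sample variance
```

Because the update only needs one element at a time, this form does not require the length up front or random access, which is why it would suit Enumerable; an Array-specific version can use an indexed loop as in the description.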


Related issues

Related to Ruby master - Feature #12217: Introducing Enumerable#sum for precision compensated summation and revert r54237 (Closed, mrkn (Kenta Murata))