Project

General

Profile

Actions

Feature #8887

closed

min(n), max(n), min_by(n), max_by(n)

Added by akr (Akira Tanaka) about 11 years ago. Updated about 10 years ago.

Status:
Closed
Target version:
-
[ruby-core:57111]

Description

How about adding an optional argument, n, for Enumerable#{min,max,min_by,max_by} to
return minimum/maximum n elements as an array.

Example:

  • [6, 0, 3, 3, 8, 3, 5, 0, 6].min(4) #=> [0, 0, 3, 3]
  • [6, 0, 3, 3, 8, 3, 5, 0, 6].max(4) #=> [5, 6, 6, 8]
  • [6, 0, 3, 3, 8, 3, 5, 0, 6].min_by(4) {|v| (v-5)**2 } #=> [5, 6, 6, 3]
  • [6, 0, 3, 3, 8, 3, 5, 0, 6].max_by(4) {|v| (v-5)**2 } #=> [3, 8, 0, 0]

These methods are similar to sort follows first or last.

  • e.min(n) is similar to e.sort.first(n)
  • e.max(n) is similar to e.sort.last(n)
  • e.min_by(n) {...} is similar to e.sort_by {...}.first(n)
  • e.max_by(n) {...} is similar to e.sort_by {...}.last(n)

However e.min(n), e.max(n), e.min_by(n), e.max_by(n) are
less memory consuming and can be faster.
They use memory proportional to n, not e.
They doesn't sort whole e.

I feel their use is not rare.
I found several use after searching.

[ruby-talk:123508], [ruby-list:40939], [ruby-talk:273980]
http://d.hatena.ne.jp/mjh/20101024/1287901875
http://stackoverflow.com/questions/11094874/get-top-n-elements-from-ruby-array-of-hash-values
http://www.math.kobe-u.ac.jp/~kodama/tips-ruby-sized_pqueue.html
https://bitbucket.org/sterlingcamden/topn

Also, e.max(n) can be used to implement weighted random sampling.

Pavlos S. Efraimidis, Paul G. Spirakis
Weighted random sampling with a reservoir
Information Processing Letters
Volume 97, Issue 5 (16 March 2006)

  % ./ruby -e '
  module Enumerable
    def wsample(n)
      self.max_by(n) {|v| rand ** (1.0/yield(v)) }
    end
  end
  e = (-20..20).to_a*10000
  a = e.wsample(20000) {|x|
    Math.exp(-(x/5.0)**2) # normal distribution
  }
  # a is 20000 samples from e.
  p a.length
  h = a.group_by {|x| x }
  -10.upto(10) {|x| puts "*" * (h[x].length/30.0).to_i if h[x] }
  '
  20000
  *
  ***
  ******
  ***********
  ******************
  *****************************
  *****************************************
  ****************************************************
  ***************************************************************
  ********************************************************************
  ***********************************************************************
  ***********************************************************************
  **************************************************************
  ****************************************************
  ***************************************
  ***************************
  ******************
  ***********
  *******
  ***
  *

Any comments?


Files

min-n-and-max-n.patch (15.2 KB) min-n-and-max-n.patch akr (Akira Tanaka), 09/10/2013 10:28 PM
maxn.pdf (20.3 KB) maxn.pdf akr (Akira Tanaka), 09/28/2013 11:10 AM

Updated by Hanmac (Hans Mackowiak) about 11 years ago

hm i am curios in what order max(n) and max_by(n) should return the elements?

like [6, 0, 3, 3, 8, 3, 5, 0, 6].max(4):
should it return sort.last(n) #> [5, 6, 6, 8]
or is that better? [8, 6, 6, 5] because 8 is bigger than 6 ?

PS: min_by(n) and max_by(n) (and maybe sort_by(n)) should maybe implemented different without take, so that it max_by(n) picks only the first n biggest objects without sorting the entire test of the Enumerator?

also about your patch, e.min_by(n).size should respect also n and also enum_size, so it needs to return n when n < as size, and size when n > size (for size != nil)

Updated by akr (Akira Tanaka) about 11 years ago

2013/9/10 Hanmac (Hans Mackowiak) :

Issue #8887 has been updated by Hanmac (Hans Mackowiak).

hm i am curios in what order max(n) and max_by(n) should return the elements?

like [6, 0, 3, 3, 8, 3, 5, 0, 6].max(4):
should it return sort.last(n) #> [5, 6, 6, 8]
or is that better? [8, 6, 6, 5] because 8 is bigger than 6 ?

I think sorted order (minimum element first, maximum element last) is intuitive.

PS: min_by(n) and max_by(n) (and maybe sort_by(n)) should maybe implemented different without take, so that it max_by(n) picks only the first n biggest objects without sorting the entire test of the Enumerator?

The methods accumelates 4n elements, "sort" them and drop 3n elements
repeatedly.
Also, the "sort" is based on quick sort but doesn't sort a half of
divided elements.

also about your patch, e.min_by(n).size should respect also n and also enum_size, so it needs to return n when n < as size, and size when n > size (for size != nil)

As far as I understand, the size method returns number of yield
invocaitons of the enumerator.
The enumerators, e.min_by(n) and e.max_by(n), yields e.size times.
So they returns e.size which is not related to n.

For example, e.reject.size returns e.size as follows.

(1..10).to_a.reject.size #=> 10

This doesn't mean the result array size of reject method.

Tanaka Akira

Updated by akr (Akira Tanaka) about 11 years ago

  • File maxn.pdf added

slide added

Actions #4

Updated by akr (Akira Tanaka) about 11 years ago

  • File deleted (maxn.pdf)

Updated by akr (Akira Tanaka) about 11 years ago

slide updated

Updated by matz (Yukihiro Matsumoto) over 10 years ago

Agreed. I think 2.2 is a good time to add these enhancement.

Matz.

Updated by nobu (Nobuyoshi Nakada) over 10 years ago

  • Category set to core
  • Status changed from Open to Assigned
  • Assignee set to akr (Akira Tanaka)

Updated by nobu (Nobuyoshi Nakada) over 10 years ago

  • Description updated (diff)

Updated by akr (Akira Tanaka) over 10 years ago

  • Status changed from Assigned to Closed
  • % Done changed from 0 to 100

Applied in changeset r44958.


  • enum.c: Enumerable#{min,min_by,max,max_by} extended to take an
    optional argument.
    (nmin_cmp): New function.
    (nmin_block_cmp): Ditto
    (nmin_filter): Ditto.
    (nmin_i): Ditto.
    (nmin_run): Ditto.
    (enum_min): Call nmin_run if the optional argument is given.
    (nmin_max): Ditto.
    (nmin_min_by): Ditto.
    (nmin_max_by): Ditto.

  • range.c: Range#{min,max} extended to take an optional argument.
    (range_min): Call range_first if the optional argument is given.
    (range_max): Call rb_call_super if the optional argument is given.

[ruby-core:57111] [Feature #8887]

Updated by DavidEGrayson (David Grayson) about 10 years ago

Hello. I think the Ruby team should reconsider the ordering of the array returned by max and max_by when the n argument is provided. It makes much more sense to me that it would be sorted in descending order, meaning that the most extreme/special element would be first. For example, I would expect max to behave like this:

[1, 2, 10, 22, 23].max(3)  # => [23, 22, 10]

It seems that min/min_by and max/max_by should be mirrors of each other. For example, I would expect that max_by(n) { |x| x.foo } would be equivalent to min_by(n) { |x| -x.foo }, assuming that foo returns a number.

That leaves us with two choices for the sorting: put the most extreme element first, or put the most extreme element last. I think putting the most extreme element first makes the most sense. Whenever I see any list of top-ranked items, the highest-ranked item is usually first. Here are some examples from the internet: http://imgur.com/a/MPsJl

I looked around on StackOverflow.com for Ruby users who might benefit from the new argument to min/min_by/max/max_by. Almost everyone asking a question about it wanted to use something like max_by and wanted to have the most extreme element be listed first:

However, I should note that in one case, the questioner wanted the order of the original enumerable to be preserved (http://stackoverflow.com/questions/9459447), and in another case they didn't specify (http://stackoverflow.com/questions/12241165). I didn't find anyone who wanted the most extreme element to be last.

I also looked for similar features in other programming languages. The closest thing I found was Python's heapq.nlargest method, which does return the most extreme element first:

https://docs.python.org/3/library/heapq.html#heapq.nlargest

In SQL, if you are making a query to get the top 10 rows, I think you must do something like this, which puts the most extreme row first:

select * from players order by score desc limit 10

In conclusion, I think it is more natural for the max and max_by methods to return arrays with the most extreme element first. I hope the Ruby team will do this before Ruby 2.2.0 is released, since it will be hard to change it later without breaking a lot of people's code.

--David Grayson

P.S. Some other Ruby developers I know here in Las Vegas have agreed with this letter and wanted their names to be included:

  • Jason Arhart
  • Paul Grayson

Updated by jhettich (Jan Hettich) about 10 years ago

A strong case can be made for returning the elements in their original order. That ordering information is otherwise lost; whereas, the client can always sort the resulting elements in whatever order she wants. This behavior would also be more consistent with other enumerable methods, such as select.

Updated by sawa (Tsuyoshi Sawada) about 10 years ago

If descending order is to adopted, a thing that has to be made clear is how to order different objects that are in a tie with respect to the measure in question.

[0, 1, 2, 3, 4, 5, 6, 7, 8].max_by(6){|e| e % 3}

Which of the following should this return?

[1, 4, 7, 2, 5, 8] # ascending, stable
[2, 5, 8, 1, 4, 7] # descending, stable
[8, 5, 2, 7, 4, 1] # descending, reversed stable

Updated by akr (Akira Tanaka) about 10 years ago

David Grayson wrote:

Hello. I think the Ruby team should reconsider the ordering of the array returned by max and max_by when the n argument is provided. It makes much more sense to me that it would be sorted in descending order, meaning that the most extreme/special element would be first. For example, I would expect max to behave like this:

[1, 2, 10, 22, 23].max(3)  # => [23, 22, 10]

It seems reasonable.

Other opinions?

Note that the max(n) implementation doesn't maintain the order of original enumerable.
So it is difficult to preserve the original order or stablility.

Actions

Also available in: Atom PDF

Like0
Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0