Feature #1089 [ruby-core:21737]

Stable sorting for sort and sort_by

Added by Kornelius Kalnbach 329 days ago. Updated 52 days ago.

Status :Rejected Start :02/02/2009
Priority :Normal Due date :
Assigned to :Yukihiro Matsumoto % Done :

0%

Category :core
Target version :-

Description

I'd like to have stable sorting in Ruby. Stable sorting means that if two items are ==, they will retain their order in the list after sorting. In other words, stable sort does never switch two equal items.

In Ruby 1.9.1, you'd have to use something like

  enum.sort_by.with_index { |a, i| [a, i] }

to achieve a stable sort in Ruby.

It would also be nice to minimize the number of comparisons, since method calling is expensive in Ruby.

Python is using a highly optimized implementation of a variant of merge sort, which is stable and uses very few comparisons (called Timsort after Tim Peters); maybe Ruby can adopt it for sort and sort_by:

  http://svn.python.org/projects/python/trunk/Objects/listsort.txt

Introducing stable sorting would not be a problem for old code because the order of equal items after sorting is currently not specified.

History

02/02/2009 11:36 PM - Charles Nutter

Kornelius Kalnbach wrote:
> I'd like to have stable sorting in Ruby. Stable sorting means that if two items are ==, they will retain their order in the list after sorting. In other words, stable sort does never switch two equal items.
...
> It would also be nice to minimize the number of comparisons, since method calling is expensive in Ruby.
> 
> Python is using a highly optimized implementation of a variant of merge sort, which is stable and uses very few comparisons (called Timsort after Tim Peters); maybe Ruby can adopt it for sort and sort_by:
> 
>   http://svn.python.org/projects/python/trunk/Objects/listsort.txt
> 
> Introducing stable sorting would not be a problem for old code because the order of equal items after sorting is currently not specified.

I'd love to hear why you're interested in a stable sort. In JRuby, we 
recently put in place an unstable hybrid of a couple sorts so we'd have 
competitive performance. Our original sort was exactly what you talk 
about for Python, a "pretty fast" stable modified merge sort.

I'd also be interested to see how the performance of Timsort compares to 
what's currently in Ruby and JRuby as well as compared to the built-in 
JDK sort we used before. If stable sorting is desired, we could 
certainly provide a flag or flip a bit in JRuby to provide it.

02/03/2009 10:13 AM - Shyouhei Urabe

  • Status changed from Open to Feedback
  • Assigned to set to Yukihiro Matsumoto

02/06/2009 03:06 AM - Kornelius Kalnbach

Hello,

it took me some time to answer. Your question wasn't easy ;)

Charles Oliver Nutter wrote:
> I'd love to hear why you're interested in a stable sort.
Two use cases come to my mind where stable sort is needed. Excuse me for
the confusing examples, but I find it always hard to talk about this topic.

The first one is a bit pathetic. A list of different objects, some of
which are ==, but the list is already sorted. Unstable sort might change
the list order, which might surprise some people.

# Something that can be == without being the same.
class StrangeInt < Struct.new :value
  def <=> other
    self.value / 2 <=> other.value / 2
  end
  include Comparable
  def inspect
    value
  end
end
StrangeInt.new(4) == StrangeInt.new(5)  # => true

a = Array.new(10) { |i| StrangeInt.new i }
a  # => [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# stable sort would return this
a.sort_by { |x| [x.value / 2, x.value] }
# => [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

a.sort
# => [0, 1, 3, 2, 4, 5, 6, 7, 8, 9]

a.sort_by { |x| x }
# => [0, 1, 3, 2, 4, 5, 6, 7, 8, 9]

(This also shows that currently, Array#sort does not check if the list
is already sorted.)

The second example is sort_by chaining. Imagine you have a table view of
files with name, file type, and file size. When you're searching for the
biggest PDF file, you first click twice on the "size" column and then
once on the "type" column, and scroll down to the .pdf files to find the
biggest one on the top. A Rails-style variant:

  files.sort_by(&:size).reverse.sort_by(&:type)

...which doesn't work as expected with unstable sort! You need to

  files.sort_by { |file| [file.type, -file.size] }

I think stable sort feels more natural here.

And the -file.size trick doesn't even work for strings! We had such a
case in a project, and in the end we couldn't implement it the way we
wanted. With a stable sort, our implementation would become much more
clean. I think the example is a very strong case for stable sorting.

> In JRuby, we recently put in place an unstable hybrid of a couple 
> sorts so we'd have competitive performance. Our original sort was 
> exactly what you talk about for Python, a "pretty fast" stable 
> modified merge sort.
Oh noes ^_^ How much difference did it make?

> I'd also be interested to see how the performance of Timsort compares
>  to what's currently in Ruby and JRuby as well as compared to the 
> built-in JDK sort we used before. If stable sorting is desired, we 
> could certainly provide a flag or flip a bit in JRuby to provide it.
I don't know how timsort would perform compared to the current MRI
quicksort or JRuby's sort implementation. I assume it takes some time to
port it to Ruby for testing, and my experience with Ruby core
programming is limited. But I think the question of a need for stable
sorting is independent from that.

[murphy]

02/06/2009 11:38 PM - Charles Nutter

Kornelius Kalnbach wrote:
> Charles Oliver Nutter wrote:
>> I'd love to hear why you're interested in a stable sort.
> Two use cases come to my mind where stable sort is needed. Excuse me for
> the confusing examples, but I find it always hard to talk about this topic.

Your two cases make sense to me, and are similar to reasons I gave for 
us not to abandon a stable sort. I eventually overruled myself, however, 
after more published benchmark results put us slower than Ruby 1.8 for 
array sorts. Perception is important; more important than sort 
stability, I have not decided :)

>> In JRuby, we recently put in place an unstable hybrid of a couple 
>> sorts so we'd have competitive performance. Our original sort was 
>> exactly what you talk about for Python, a "pretty fast" stable 
>> modified merge sort.
> Oh noes ^_^ How much difference did it make?

I guess we ended up going with an unstable sort because it did make a 
difference; I think keeping the sort stable makes it harder to combine 
allgorithms for different classes of list (partially sorted, reversed, 
randomized) which seemed to be key to matching MRI's sort performance. I 
raised a concern about stability, but we figured if nobody had seen 
issues in MRI nobody would care if JRuby's sort became unstable.

Perhaps this is a case where having the fast unstable sort support a 
slower, possibly ruby-based, stable sort is the way to go. In general a 
ruby-based sort performs pretty well on both JRuby and Ruby 1.9. Here's 
some recent results:

http://blog.concord.org/archives/24-Comparison-of-Ruby-1.8.6-1.9-and-JRuby-running-on-Java-1.5-1.6-and-1.7.html

It certainly wouldn't be as fast as a native sort, but perhaps that's 
not a big deal?

- Charlie

02/07/2009 12:57 AM - Kornelius Kalnbach

Charles Oliver Nutter wrote:
> Your two cases make sense to me, and are similar to reasons I gave
> for us not to abandon a stable sort. I eventually overruled myself,
> however, after more published benchmark results put us slower than
> Ruby 1.8 for array sorts. Perception is important; more important
> than sort stability, I have not decided :)
actually, I'd prefer the feature of "stable sort" over "fast sort". I
use Ruby for convenience, not for speed. still: what is the tradeoff?

> I guess we ended up going with an unstable sort because it did make a
>  difference; I think keeping the sort stable makes it harder to
> combine allgorithms for different classes of list (partially sorted,
> reversed, randomized) which seemed to be key to matching MRI's sort
> performance. I raised a concern about stability, but we figured if
> nobody had seen issues in MRI nobody would care if JRuby's sort
> became unstable.
it's critical how much the difference was...if we get stability for,
say, 15% more sort time, I'd buy it. if even the best alogrithms are 50x
slower, I'd abandon my request. but I don't think this is the case; more
investigation follows.

anything that performs better than this would be appreciated (stable
sort is roughly 50 times slower):

time ruby19 -e 'Array(1..1000000).shuffle.sort'
real	0m0.529s
real	0m0.164s (without sorting)

i = 0; Array(1..1000000).shuffle.sort_by { |o| [o, i += 1] }
real	0m18.742s

Array(1..1000000).shuffle.sort_by.with_index { |o, i| [o, i] }
real	0m18.430s

Array(1..1000000).shuffle.sort_by.with_index { |*a| a }
real	0m18.383s

> Perhaps this is a case where having the fast unstable sort support a 
> slower, possibly ruby-based, stable sort is the way to go. In general
> a ruby-based sort performs pretty well on both JRuby and Ruby 1.9.
> Here's some recent results:
> 
> http://rubyurl.com/3d87
> 
> It certainly wouldn't be as fast as a native sort, but perhaps that's
>  not a big deal?
it is...using their benchmark mergesort:
-rmergesort -e 'mergesort Array(1..1000000).shuffle'
real	0m18.066s

wow, that's faster than the sort_by approach :D my own in-place merge
sort version (based on the one Java uses internally for Object[]) is
even faster:

-rmerge_sort -e 'Array(1..1000000).shuffle.merge_sort!'
real	0m10.401s

so, the best I currently get with pure Ruby is about 30x. I'm sure we
can do better with a Java or C implementation.

let's see timsort now. some Python 2.5 benchmarks:

time python2.5 -c 'from random import shuffle; list = range(1000000);
shuffle(list); sorted(list)'
real	0m2.432s
real	0m1.191s (without sorting)

so I guess timsort takes about 1.3s, which would be about 3.5x slower
than Ruby's sort. this is not very nice; but 1.3s are still much better
than 10s!

so, maybe stable sort should be optional. what options do we have?

- compiler switch, global variable and such; we don't want that

- a new method, like Enumerable#stable_sort. my code example would
  become:

  files.stable_sort_by(&:size).reverse.stable_sort_by(&:type)

- an optional parameter to sort and sort_by:

     enum.sort(stable=false)                   -> array
     enum.sort(stable=false) {| a,b | block }  -> array
     enum.sort_by(stable=false) {| obj | block }  -> array

     array.sort!(stable=false)                   -> array
     array.sort!(stable=false) {| a,b | block }  -> array
     array.sort_by!(stable=false) {| obj | block }  -> array

  I know boolean parameters are not nice, but we already use them in
  Object#methods. my code example would become:

  files.sort_by(true, &:size).reverse.sort_by(true, &:type)

none of these options feels like Ruby to me. I hope somebody has another
idea.

[murphy]

02/07/2009 01:06 AM - Kornelius Kalnbach

Charles Oliver Nutter wrote:
> In JRuby, we recently put in place an unstable hybrid of a couple
> sorts so we'd have competitive performance.
can you remember which version if JRuby still contains the stable sort?
I'd like to do benchmarks...

[murphy]

09/19/2009 05:58 AM - Marc-Andre Lafortune

  • Status changed from Feedback to Open

10/11/2009 01:42 AM - Yui NARUSE

  • Status changed from Open to Rejected

11/06/2009 12:00 PM - Kornelius Kalnbach

I'm not sure why the proposal was rejected. Can somebody explain?

If we don't want to change the #sort implementation for performance reasons, I can open another ticket for the request of an alternative, fast and stable sort method in core Ruby. I still think that Ruby should have a core method for stable sorting.

11/06/2009 04:14 PM - Yukihiro Matsumoto

Hi,

In message "Re: [ruby-core:26557] [Feature #1089] Stable sorting for sort and sort_by"
    on Fri, 6 Nov 2009 12:00:35 +0900, Kornelius Kalnbach <redmine@ruby-lang.org> writes:

|I'm not sure why the proposal was rejected. Can somebody explain?

As far as I know, sort stability is rarely required, not worthy for
slows down sorting in general.  If there's stable sort routine faster
than current one, please tell me.

|If we don't want to change the #sort implementation for performance reasons, I can open another ticket for the request of an alternative, fast and stable sort method in core Ruby. I still think that Ruby should have a core method for stable sorting.

If you really need stable sort, you can do so using sort_by pretty
easily.

 i = 0
 ary.sort_by{|x| [x, i += 1] }

							matz.

11/06/2009 04:23 PM - Yui NARUSE

Because you can't persuade Charles (and matz).

I maintain tickets which doesn't concluded for long time.
Rejecting tickets kills or reactivates it.

You can open another ticket for analternative or
reopen this with new implementation or use cases which strongly needs stable sorting.
Seeing above matz' comment, you should show use cases which needs fast and stable sort.

Also available in: Atom PDF