Project

General

Profile

Actions

Bug #252

closed

Array#sort doesn't respect overridden <=>

Added by zenspider (Ryan Davis) almost 16 years ago. Updated almost 13 years ago.

Status:
Closed
Target version:
-
ruby -v:
Backport:
[ruby-core:17708]

Description

=begin
We had this issue reported against rubinius. We're not sure whether it is a bug or a feature in MRI.

class String
def <=>(other)
self.length <=> other.length
end
end

junk = %w[these words should be sorted in the order of their length]
puts junk.sort

MRI (1.8, 1.9) does not respect new behavior of <=>.

Bug or feature?
=end

Actions #1

Updated by zenspider (Ryan Davis) almost 16 years ago

=begin
I guess I should ask: bug, feature, or undefined behavior?
=end

Actions #2

Updated by pragdave (Dave Thomas) almost 16 years ago

=begin

On Jul 9, 2008, at 6:00 PM, Ryan Davis wrote:

junk = %w[these words should be sorted in the order of their length]
puts junk.sort

MRI (1.8, 1.9) does not respect new behavior of <=>.

Bug or feature?

I seem to remember this from a while back--I seem to remember that
String is special-cased for performance.

=end

Actions #3

Updated by zenspider (Ryan Davis) almost 16 years ago

=begin

On Jul 9, 2008, at 17:27 , Dave Thomas wrote:

On Jul 9, 2008, at 6:00 PM, Ryan Davis wrote:

junk = %w[these words should be sorted in the order of their length]
puts junk.sort

MRI (1.8, 1.9) does not respect new behavior of <=>.

Bug or feature?

I seem to remember this from a while back--I seem to remember that
String is special-cased for performance.

and that's cool... but I'd like an "official" weigh in describing it
as either specification or implementation detail. ATM, I don't think
rubinius would see a performance loss if it respected the overridden
<=> method.

=end

Actions #4

Updated by vvs (Vladimir Sizikov) almost 16 years ago

=begin
This seems to be a common performance trick in MRI, esp. for Fixnums and Strings.

Looking at array.c (sort_2), indeed Fixnums and Strings are special-cased.

I also checked JRuby, and JRuby follows what MRI does (and this does improve performance for Fixnum and String arrays).
=end

Actions #5

Updated by nobu (Nobuyoshi Nakada) over 15 years ago

=begin
Hi,

At Thu, 10 Jul 2008 16:44:33 +0900,
Vladimir Sizikov wrote in [ruby-core:17720]:

This seems to be a common performance trick in MRI, esp. for Fixnums and Strings.

Looking at array.c (sort_2), indeed Fixnums and Strings are special-cased.

I've forgotten to post this patch.


Index: intern.h

--- intern.h (revision 18354)
+++ intern.h (working copy)
@@ -168,4 +168,5 @@ void rb_alias _((VALUE, ID, ID));
void rb_attr _((VALUE,ID,int,int,int));
int rb_method_boundp _((VALUE, ID, int));
+int rb_method_basic_definition_p _((VALUE, ID));
VALUE rb_dvar_defined _((ID));
VALUE rb_dvar_curr _((ID));
Index: array.c

--- array.c (revision 18354)
+++ array.c (working copy)
@@ -1686,6 +1686,24 @@ struct ary_sort_data {
VALUE *ptr;
long len;

  • int opt_methods;
  • int opt_inited;
    };

+enum {

  • sort_opt_Fixnum,
  • sort_opt_String,
  • sort_optimizable_count
    +};

+#define STRING_P(s) (TYPE(s) == T_STRING && CLASS_OF(s) == rb_cString)
+
+#define SORT_OPTIMIZABLE_BIT(type) (1U << TOKEN_PASTE(sort_opt_,type))
+#define SORT_OPTIMIZABLE(data, type) \

  • ((data->opt_inited & SORT_OPTIMIZABLE_BIT(type)) ? \
  • (data->opt_methods & SORT_OPTIMIZABLE_BIT(type)) : \
    
  • ((data->opt_inited |= SORT_OPTIMIZABLE_BIT(type)), \
    
  •  rb_method_basic_definition_p(TOKEN_PASTE(rb_c,type), id_cmp) && \
    
  •  (data->opt_methods |= SORT_OPTIMIZABLE_BIT(type))))
    

static void
ary_sort_check(data)
@@ -1719,11 +1737,11 @@ sort_2(ap, bp, data)
int n;

  • if (FIXNUM_P(a) && FIXNUM_P(b)) {
  • if (FIXNUM_P(a) && FIXNUM_P(b) && SORT_OPTIMIZABLE(data, Fixnum)) {
    if ((long)a > (long)b) return 1;
    if ((long)a < (long)b) return -1;
    return 0;
    }
  • if (TYPE(a) == T_STRING) {
  • if (TYPE(b) == T_STRING) return rb_str_cmp(a, b);
  • if (STRING_P(a) && STRING_P(b) && SORT_OPTIMIZABLE(data, String)) {
  • return rb_str_cmp(a, b);
    }

@@ -1743,4 +1761,6 @@ sort_internal(ary)
data.ary = ary;
data.ptr = RARRAY(ary)->ptr; data.len = RARRAY(ary)->len;

  • data.opt_methods = 0;
  • data.opt_inited = 0;
    qsort(RARRAY(ary)->ptr, RARRAY(ary)->len, sizeof(VALUE),
    rb_block_given_p()?sort_1:sort_2, &data);
    Index: eval.c
    ===================================================================
    --- eval.c (revision 18355)
    +++ eval.c (working copy)
    @@ -405,6 +405,7 @@ static ID id, send, respond_to;

#define NOEX_TAINTED 8
-#define NOEX_SAFE(n) ((n) >> 4)
-#define NOEX_WITH(n, v) ((n) | (v) << 4)
+#define NOEX_BASIC 16
+#define NOEX_SAFE(n) ((n) >> 5)
+#define NOEX_WITH(n, v) ((n) | (v) << 5 | (ruby_running ? 0 : NOEX_BASIC))
#define NOEX_WITH_SAFE(n) NOEX_WITH(n, ruby_safe_level)

@@ -4218,5 +4219,14 @@ module_setup(module, n)
}

-static NODE *basic_respond_to = 0;
+int
+rb_method_basic_definition_p(klass, id)

  • VALUE klass;
  • ID id;
    +{
  • NODE *node = rb_method_node(klass, id);
  • if (node && (node->nd_noex & NOEX_BASIC))
  • return 1;
  • return 0;
    +}

int
@@ -4228,5 +4238,5 @@ rb_obj_respond_to(obj, id, priv)
VALUE klass = CLASS_OF(obj);

  • if (rb_method_node(klass, respond_to) == basic_respond_to) {
  • if (rb_method_basic_definition_p(klass, respond_to)) {
    return rb_method_boundp(klass, id, !priv);
    }
    @@ -8185,6 +8195,4 @@ Init_eval()
    rb_define_method(rb_mKernel, "respond_to?", obj_respond_to, -1);
    respond_to = rb_intern("respond_to?");
  • rb_global_variable((void *)&basic_respond_to);

  • basic_respond_to = rb_method_node(rb_cObject, respond_to);

    rb_define_global_function("raise", rb_f_raise, -1);

    Index: include/ruby/intern.h
    ===================================================================
    --- include/ruby/intern.h (revision 18354)
    +++ include/ruby/intern.h (working copy)
    @@ -250,4 +250,5 @@ void rb_alias(VALUE, ID, ID);
    void rb_attr(VALUE,ID,int,int,int);
    int rb_method_boundp(VALUE, ID, int);
    +int rb_method_basic_definition_p(VALUE, ID);
    VALUE rb_eval_cmd(VALUE, VALUE, int);
    int rb_obj_respond_to(VALUE, ID, int);
    Index: include/ruby/node.h
    ===================================================================
    --- include/ruby/node.h (revision 18354)
    +++ include/ruby/node.h (working copy)
    @@ -464,5 +464,6 @@ typedef struct RNode {
    #define NOEX_PRIVATE 0x02
    #define NOEX_PROTECTED 0x04
    -#define NOEX_MASK 0x06 /* 1110 /
    +#define NOEX_MASK 0x06 /
    0110 */
    +#define NOEX_BASIC 0x08

#define NOEX_UNDEF NOEX_NOSUPER
@@ -473,5 +474,5 @@ typedef struct RNode {

#define NOEX_SAFE(n) (((n) >> 8) & 0x0F)
-#define NOEX_WITH(n, s) ((s << 8) | n)
+#define NOEX_WITH(n, s) ((s << 8) | (n) | (ruby_running ? 0 : NOEX_BASIC))
#define NOEX_WITH_SAFE(n) NOEX_WITH(n, rb_safe_level())

Index: array.c

--- array.c (revision 18354)
+++ array.c (working copy)
@@ -1447,8 +1447,30 @@ rb_ary_reverse_m(VALUE ary)
}

+struct ary_sort_data {

  • VALUE ary;
  • int opt_methods;
  • int opt_inited;
    +};

+enum {

  • sort_opt_Fixnum,
  • sort_opt_String,
  • sort_optimizable_count
    +};

+#define STRING_P(s) (TYPE(s) == T_STRING && CLASS_OF(s) == rb_cString)
+
+#define SORT_OPTIMIZABLE_BIT(type) (1U << TOKEN_PASTE(sort_opt_,type))
+#define SORT_OPTIMIZABLE(data, type) \

  • ((data->opt_inited & SORT_OPTIMIZABLE_BIT(type)) ? \
  • (data->opt_methods & SORT_OPTIMIZABLE_BIT(type)) : \
    
  • ((data->opt_inited |= SORT_OPTIMIZABLE_BIT(type)), \
    
  •  rb_method_basic_definition_p(TOKEN_PASTE(rb_c,type), id_cmp) && \
    
  •  (data->opt_methods |= SORT_OPTIMIZABLE_BIT(type))))
    

static VALUE
-sort_reentered(VALUE *klass)
+sort_reentered(VALUE ary)
{

  • if (*klass) {
  • if (RBASIC(ary)->klass) {
    rb_raise(rb_eRuntimeError, "sort reentered");
    }
    @@ -1459,5 +1481,6 @@ static int
    sort_1(const void *ap, const void *bp, void *dummy)
    {
  • VALUE retval = sort_reentered(dummy);
  • struct ary_sort_data *data = dummy;
  • VALUE retval = sort_reentered(data->ary);
    VALUE a = *(const VALUE *)ap, b = *(const VALUE *)bp;
    int n;
    @@ -1465,5 +1488,5 @@ sort_1(const void *ap, const void *bp, v
    retval = rb_yield_values(2, a, b);
    n = rb_cmpint(retval, a, b);
  • sort_reentered(dummy);
  • sort_reentered(data->ary);
    return n;
    }
    @@ -1472,20 +1495,21 @@ static int
    sort_2(const void *ap, const void *bp, void *dummy)
    {
  • VALUE retval = sort_reentered(dummy);
  • struct ary_sort_data *data = dummy;
  • VALUE retval = sort_reentered(data->ary);
    VALUE a = *(const VALUE *)ap, b = *(const VALUE *)bp;
    int n;
  • if (FIXNUM_P(a) && FIXNUM_P(b)) {
  • if (FIXNUM_P(a) && FIXNUM_P(b) && SORT_OPTIMIZABLE(data, Fixnum)) {
    if ((long)a > (long)b) return 1;
    if ((long)a < (long)b) return -1;
    return 0;
    }
  • if (TYPE(a) == T_STRING) {
  • if (TYPE(b) == T_STRING) return rb_str_cmp(a, b);
  • if (STRING_P(a) && STRING_P(b) && SORT_OPTIMIZABLE(data, String)) {

  • return rb_str_cmp(a, b);
    }

    retval = rb_funcall(a, id_cmp, 1, b);
    n = rb_cmpint(retval, a, b);

  • sort_reentered(dummy);
  • sort_reentered(data->ary);

    return n;
    @@ -1514,8 +1538,12 @@ rb_ary_sort_bang(VALUE ary)
    if (RARRAY_LEN(ary) > 1) {
    VALUE tmp = ary_make_shared(ary);

  • struct ary_sort_data data;

    RBASIC(tmp)->klass = 0;

  • data.ary = tmp;

  • data.opt_methods = 0;

  • data.opt_inited = 0;
    ruby_qsort(RARRAY_PTR(tmp), RARRAY_LEN(tmp), sizeof(VALUE),

  •     rb_block_given_p()?sort_1:sort_2, &RBASIC(tmp)->klass);
    
  •     rb_block_given_p()?sort_1:sort_2, &data);
    
    if (RARRAY(ary)->ptr != RARRAY(tmp)->ptr) {
    if (!ARY_SHARED_P(ary)) xfree(RARRAY(ary)->ptr);
    Index: vm_method.c
    ===================================================================
    --- vm_method.c (revision 18354)
    +++ vm_method.c (working copy)
    @@ -1045,4 +1045,13 @@ rb_mod_modfunc(int argc, VALUE *argv, VA
    }

+int
+rb_method_basic_definition_p(VALUE klass, ID id)
+{

  • NODE *node = rb_method_node(klass, id);
  • if (node && (node->nd_noex & NOEX_BASIC))
  • return 1;
  • return 0;
    +}

/*

  • call-seq:
    @@ -1054,6 +1063,4 @@ rb_mod_modfunc(int argc, VALUE *argv, VA
    */

-static NODE *basic_respond_to = 0;

int
rb_obj_respond_to(VALUE obj, ID id, int priv)
@@ -1061,5 +1068,5 @@ rb_obj_respond_to(VALUE obj, ID id, int
VALUE klass = CLASS_OF(obj);

  • if (rb_method_node(klass, idRespond_to) == basic_respond_to) {
  • if (rb_method_basic_definition_p(klass, idRespond_to)) {
    return rb_method_boundp(klass, id, !priv);
    }
    @@ -1109,6 +1116,4 @@ Init_eval_method(void)

    rb_define_method(rb_mKernel, "respond_to?", obj_respond_to, -1);

  • basic_respond_to = rb_method_node(rb_cObject, idRespond_to);

  • rb_register_mark_object((VALUE)basic_respond_to);

    rb_define_private_method(rb_cModule, "remove_method", rb_mod_remove_method, -1);

--
Nobu Nakada

=end

Actions #6

Updated by matz (Yukihiro Matsumoto) over 15 years ago

=begin
Hi,

In message "Re: [ruby-core:18114] Re: [Ruby 1.8 - Bug #252] Array#sort doesn't respect overridden <=>"
on Mon, 4 Aug 2008 15:44:50 +0900, Nobuyoshi Nakada writes:

|I've forgotten to post this patch.

Interesting. Could you apply the patch (to trunk)?

						matz.

=end

Actions #7

Updated by matz (Yukihiro Matsumoto) over 15 years ago

=begin
Hi,

In message "Re: [ruby-core:18177] Re: [Ruby 1.8 - Bug #252] Array#sort doesn't respect overridden <=>"
on Fri, 8 Aug 2008 06:17:41 +0900, Lin Jen-Shin writes:

|And should this behavior apply to condition (e.g. if, unless, while, etc.)
|to call :nil? as well?

No.

						matz.

=end

Actions #8

Updated by matz (Yukihiro Matsumoto) over 15 years ago

=begin
Hi,

In message "Re: [ruby-core:18124] Re: [Ruby 1.8 - Bug #252] Array#sort doesn't respect overridden <=>"
on Tue, 5 Aug 2008 04:49:27 +0900, Charles Oliver Nutter writes:

|So this is going to be a behavioral change now? My reading of the patch
|tells me it's now checking to see if <=> has been overridden and if so
|using the path that actually calls it. As Ryan pointed out, this would
|be new behavior. Can we get an official ruling on what is correct in 1.8
|and 1.9.x?

We are not going to change 1.8 semantic in the near future. We should
keep compatibility, even when it's inconsistent, as long as it's not
"wrong".

For 1.9, I think it's better to adopt the new behavior. What do you
think?

						matz.

=end

Actions #9

Updated by zenspider (Ryan Davis) over 15 years ago

=begin
If this patch was applied, we should close the bug.
=end

Actions #10

Updated by matz (Yukihiro Matsumoto) over 15 years ago

  • Status changed from Open to Closed

=begin
patch already applied.
=end

Actions

Also available in: Atom PDF

Like0
Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0