Bug #6087
closedHow should inherited methods deal with return values of their own subclass?
Description
Just noticed that we still don't have a consistent way to handle return values:
class A < Array
end
a = A.new
a.flatten.class # => A
a.rotate.class # => Array
(a * 2).class # => A
(a + a).class # => Array
Some methods are even inconsistent depending on their arguments:
a.slice!(0, 1).class # => A
a.slice!(0..0).class # => A
a.slice!(0, 0).class # => Array
a.slice!(1, 0).class # => Array
a.slice!(1..0).class # => Array
Finally, there is currently no constructor nor hook called when making these new copies, so they are never properly constructed.
Imagine this simplified class that relies on @foo
holding a hash:
class A < Array
def initialize(*args)
super
@foo = {}
end
def initialize_copy(orig)
super
@foo = @foo.dup
end
end
a = A.new.flatten
a.class # => A
a.instance_variable_get(:@foo) # => nil, should never happen
I feel this violates object orientation.
One solution is to always return the base class (Array
/String
/...).
Another solution is to return the current subclass. To be object oriented, I feel we must do an actual dup
of the object, including copying the instance variables, if any, and calling initialize_copy
. Exceptions to this would be (1) explicit documentation, e.g. Array#to_a
, or (2) methods inherited from a module (like Enumerable
methods for Array
).
I'll be glad to fix these once there is a decision made on which way to go.
Updated by trans (Thomas Sawyer) over 13 years ago
I would think these methods should be using self.class.new
for constructors thus returning the subclass. Although, that might not always possible.
Updated by marcandre (Marc-Andre Lafortune) over 13 years ago
- Assignee set to matz (Yukihiro Matsumoto)
Hi,
Thomas Sawyer wrote:
I would think these methods should be using
self.class.new
for constructors thus returning the subclass. Although, that might not always possible.
This has two problems:
- It imposes an API on the constructor of subclasses (i.e. that they accept one parameter which would be an instance of the base class)
- The builtin classes constructors doesn't even respect that, i.e.
Hash.new({1 => 2}).has_key?(1) # => false
--
Marc-André
Updated by marcandre (Marc-Andre Lafortune) over 13 years ago
I apparently forgot to mention that I prefer the second approach, i.e. the equivalent of calling dup
on the receiver.
I believe Aaron Patterson seconds this in [ruby-core:43030]
If this approach is accepted, the last remaining question is what of cases of instances of Array
/String
/... in which instance variables where set using instance_variable_set
. Should the instance variables copied over?
b = []
b.instance_variable_set(:@foo, 42)
b.flatten.instance_variable_get(:@foo) # => nil or 42?
I think that to be consistent, they should be copied (again, assuming we decide to return an instance of subclasses). In the discussion of #4136, Charles Nutter thinks it could hinder performance to do so, but I feel that cases where such objects happen to have instance variables set should be extremely rare, so I don't think it would have much effect in practice.
--
Marc-André
Updated by tenderlovemaking (Aaron Patterson) over 13 years ago
Yes, I do second this.
Updated by trans (Thomas Sawyer) over 13 years ago
This has two problems:
- It imposes an API on the constructor of subclasses (i.e. that they accept one parameter which would be an instance of the base class)
- The builtin classes constructors doesn't even respect that, i.e.
Hash.new({1 => 2}).has_key?(1) # => false
You took me a bit too literally. I only meant it should be equivalent too calling self.class.new
. In other words, it should return an instance of the subclass, not the base class. I did not mean to imply the necessary use of the constructor in this way --which (perhaps unfortunately) is not possible in some notable cases, as you point out.
Updated by shyouhei (Shyouhei Urabe) over 13 years ago
- Status changed from Open to Assigned
Updated by headius (Charles Nutter) over 13 years ago
I never noticed this before, so I'm jumping in a couple months late.
Duping the original object or copying its instance vars is wrong. Instance variables are state of an individual object, and should not be carried on to a new object as in these messages. There's no precedent for doing that other than dup'ing, which is explicitly for making a copy of the target object.
flatten et al are not returning "copies"...they're returning new instances with a different arrangement of the same elements. Therefore, those new objects should not automatically inherit instance variables from their parents.
It would be a good idea to design a formal way by which subclasses that want to propagate instance vars to new instances can do so. It just shouldn't be the default.
For the pattern that keeps coming up, where A < Array...you're doing it wrong anyway. Favor composition over inheritance :)
Updated by matz (Yukihiro Matsumoto) over 13 years ago
- Target version changed from 2.0.0 to 3.0
Array methods should return consistent values.
But we keep the behavior for now to maintain compatibility.
We will fix this (to consistently return Arrays) in 3.0.
Matz.
Updated by mame (Yusuke Endoh) almost 6 years ago
- Related to Bug #10845: Subclassing String added
Updated by nobu (Nobuyoshi Nakada) over 5 years ago
- Description updated (diff)
Updated by mame (Yusuke Endoh) over 5 years ago
Recently this ticket was discussed at dev-meeting, and matz changed his mind. I remember that matz said:
- A method that seems to return a new array that is directly related to the receiver, should return an instance of the receiver's class.
- A method that seems to return a new array that is not directly related to the receiver, should return an Array.
So, we need to decide the behavior for each method.
Updated by matz (Yukihiro Matsumoto) over 5 years ago
I used to think methods should honor subclasses, but I changed my mind that the behavior made things too complex.
So if possible I want to make every method return Array
instead of instance of a subclass. I just worry about the size of the incompatibility.
Matz.
Updated by matz (Yukihiro Matsumoto) over 5 years ago
Should we do an experiment in 3.0?
Matz
Updated by Eregon (Benoit Daloze) over 5 years ago
Much like all Enumerable methods return Array
and (of course) do not copy instance variables, I think Array methods should do the same.
This seems particularly important since Array overrides a few methods from Enumerable for optimization but that should be entirely transparent.
For example, returning a subclass in e.g. Array#map would make it inconsistent with Enumerable#map.
So I'm in favor of no subclass handling here.
We're creating a new instance, and copying the entire state from the receiver doesn't seem reasonable to me.
If people want to keep receiver state like class and @ivars, they can always use mutating methods + #dup if needed.
Updated by ko1 (Koichi Sasada) over 5 years ago
Eregon (Benoit Daloze) wrote in #note-14:
Much like all Enumerable methods return
Array
and (of course) do not copy instance variables, I think Array methods should do the same.
+1
Updated by Dan0042 (Daniel DeLorme) over 5 years ago
- A method that seems to return a new array that is directly related to the receiver, should return an instance of the receiver's class.
- A method that seems to return a new array that is not directly related to the receiver, should return an Array.
So this is the old thinking?
I used to think methods should honor subclasses, but I changed my mind that the behavior made things too complex.
And this is the new thinking? In that case +1
If a subclass needs a method to return an instance of the subclass, it can easily and safely opt-in to this behavior (similar to Hash)
class A < Array
def select(...)
A.new(super) #or e.g. dup.replace(super) depending on specifics of the subclass
end
end
On the other hand returning a subclass by default opens the door to all kinds of complexity and bugs depending on how the subclass is implemented. In particular if it has any state/ivars. ary.select
is not the same as ary.dup.select!
in that case.
Is there somewhere a complete list of methods that currently return a subclass?
For Array I think there's only this: drop, drop_while, take, take_while, flatten, uniq, slice
Updated by jeremyevans0 (Jeremy Evans) about 5 years ago
matz (Yukihiro Matsumoto) wrote in #note-13:
Should we do an experiment in 3.0?
I've added a pull request that modifies the Array methods to return Array instances instead of subclass instances: https://github.com/ruby/ruby/pull/3690
Updated by jeremyevans (Jeremy Evans) almost 5 years ago
- Status changed from Assigned to Closed
Applied in changeset git|2a294d499bf03211d02695f613f784a05943ea35.
Make Array methods return Array instances instead of subclass instances
This changes the following methods to return Array instances instead
of subclass instances:
- Array#drop
- Array#drop_while
- Array#flatten
- Array#slice!
- Array#slice/#[]
- Array#take
- Array#take_while
- Array#uniq
- Array#*
Fixes [Bug #6087]