Bug #19278
closedConstructing subclasses of Data with positional arguments
Description
I'd expect both of the following subclasses to work, but the subclass that uses positional parameters raises an exception:
Foo = Data.define
class Bar < Foo
def initialize foo:
p foo
end
end
class Baz < Foo
def initialize foo
p foo
end
end
Bar.new foo: 1 # Prints 1
Baz.new 1 # Raises ArgumentError
I'd expect the subclass that uses positional arguments to work.
Updated by zverok (Victor Shepelev) almost 2 years ago
- Status changed from Open to Feedback
It is a deliberate and documented design decision: new
handles unifying positional and keyword args, and initialize
always receives keyword args.
Note that Measure#initialize always receives keyword arguments, and that mandatory arguments are checked in initialize, not in new. This can be important for redefining initialize in order to convert arguments or provide defaults
Check it:
class Point < Data.define(:x, :y)
def initialize(*a, **kwa)
puts "args: #{a}, kwargs: #{kwa}"
super
end
end
Point.new(1, 2)
# args: [], kwargs: {:x=>1, :y=>2}
Point.new(x: 1, y: 2)
# args: [], kwargs: {:x=>1, :y=>2}
This decision was made for usability: this way, it is easy for class user to redefine #initialize
with small additions (like converting argument types or providing defaults):
class Point < Data.define(:x, :y, :z)
def initialize(x:, y:, z: 0) = super
end
p Point.new(1, 2)
#=> #<data Point x=1, y=2, z=0>
p Point.new(x: 1, y: 2)
#=> #<data Point x=1, y=2, z=0>
p Point.new(1, 2, 3)
#=> #<data Point x=1, y=2, z=3>
p Point.new(x: 1, y: 2, z: 3)
#=> <data Point x=1, y=2, z=3>
Now, imagine amount of code one would need to write in #initialize
if the difference of keyword vs positional arguments initialization would be handled there.
Updated by tenderlovemaking (Aaron Patterson) almost 2 years ago
I guess the fact that new
doesn't behave the same way as new
on every other class kind of ruins "usability" for me. As an end user, it's not clear why this works:
class Foo
def initialize x:, y:
end
end
class Bar < Foo
def initialize(x, y)
super(x:, y:)
end
end
Bar.new(1, 2)
But this doesn't:
Foo = Data.define(:x, :y)
class Bar < Foo
def initialize(x, y)
super(x:, y:)
end
end
Bar.new(1, 2)
And even more confusingly:
Foo = Data.define(:x, :y)
class Bar < Foo
def initialize(x, y)
super(x:, y:)
end
end
Foo.new(1, 2) # this works
Bar.new(1, 2) # this doesn't
I should be able to call a constructor on the subclass in the same way that I could call the constructor on the super class.
Updated by Eregon (Benoit Daloze) almost 2 years ago
It seems hard to solve to me, without losing the significant advantages that @zverok (Victor Shepelev) mentions.
Also inheriting from a Data class seems kind of an anti/rare pattern.
One can of course:
Foo = Data.define(:x, :y)
def custom
end
...
end
Updated by zverok (Victor Shepelev) almost 2 years ago
@tenderlove I assure you that I spent a lot of time thinking about possible design decisions, and I believe that while somewhat unlike other classes, the current choice is the best in usability. It is documented (the documentation might emphasize the design choice more, with that I might agree).
The constraints I was working is was:
- flexible creation (e.g.
Point.new(1, 2)
andPoint.new(x: 1, y: 2)
both works) - as clear error indication with wrong arguments as possible without creating a whole framework for "explanatory errors" (think Rust)
- ease of updating
initialize
protocol for the end user (providing defaults and conversions).
If you have better idea of how to achieve it, I will gladly consider it.
That being said, I believe that "it was surprising on the first pass" does not necessarily equal "ruins usability", and the gains are more than losses.
Also inheriting from a Data class seems kind of an anti/rare pattern.
I don't think it is 100% true (especially if we'll consider Data.define(...) do
as an inheritance). At least, even in pre-Data
times, I have a fair share of usages doing nice things like
class Point < Struct.new(:x, :y)
def distance_to(other)
# ...
end
end
I think "immutable value objects" doesn't imply "no additional convenience querying methods".
And, as I said, the new
/initialize
pair was specifically designed for usability on inheritance (or do
-block-inheritance).
When we discussed the initial protocol, Matz was concerned about the ability to provide defaults for some fields (like some additional API maybe, like Data.define(:x, :y, z: 0)
(which looks nice, but has some grave problems), and I believe that this example provides a clear and usable compromise:
class Point < Data.define(:x, :y, :z)
def initialize(x:, y:, z: 0) = super
end
That is intended.
Data
is a new thing. It has its quirks, but we put a lot into designing it as a small, easy-to-learn, and convenient API.
Updated by sam.saffron (Sam Saffron) almost 2 years ago
Just going to be a bit more explicit here about the problem for future travelers (I know I am repeating but this is the most succinct way imo to show the issue)...
Foo = Data.define :x
class Bar < Foo
def initialize(*args, **kwargs)
p args
p kwargs
end
end
Bar.new(1)
# []
# {:x=>1}
I get that usability arguably (I would argue that now is surprising and unprecedented so it is a usability bomb, not bain) suffers for inheritance, but we would gain tons of clarity this way.
I am with tenderlove here, needing to reason about new -> initialize auto coercion is somewhat scary...
I know this is a bucket of compromises given where we are at, but couldn't this just work correctly without coercion?
class Foo
attr_reader :x
def initialize(*args, **kwargs)
raise ArgumentError if args.length > 0 && kwargs.length > 0
raise ArgumentError if args.length > 1 || kwargs.length > 1
raise ArgumentError if kwargs.length == 1 && !kwargs.key?(:x)
if args.length > 0
@x = args[0]
else
@x = kwargs[:x]
end
end
end
Updated by zverok (Victor Shepelev) almost 2 years ago
@sam.saffron (Sam Saffron) your hand-maid Foo
is analogous to Foo = Data.define(:x)
, right?
Now, can you please show what if somebody wants to inherit your Foo
to provide:
- argument conversion for
x
- or, default value for
x
I believe, the best you can get with would be something like this, for data conversion:
UNDEFINED = Object.new.freeze
class MyFoo < Foo
def initialize(pos_x = UNDEFINED, x: UNDEFINED)
# 1. check that one, and exactly of them is `UNDEFINED`
# 2. do the conversion with the right one, we'd like to preserve the protocol, right?..
if pos_x == UNDEFINED
super(x: x.to_i)
else
super(pos_x.to_i)
end
end
end
Providing the default value, how would you address this?.. Please share your thoughts!
With the current protocol, it is just like that:
# Conversion
Foo = Data.define(:x) do
def initialize(x:) = super(x: x.to_i)
end
# Default value:
Foo = Data.define(:x) do
def initialize(x: 0) = super
end
# Both with now work with Foo.new(1) and Foo.new(x: 1), without breaking the contract that Data promises
Yes, the rule "Data#initialize
accepts only keyword arguments" is unlike the barest "simple class" implementation, but:
- For a good reason
- It is a rule consisting of exactly one phrase; even if one considers it "a dumb quirk," it is not something that is impossible to explain or remember
- But it is not "a dumb quirk"
Updated by tenderlovemaking (Aaron Patterson) almost 2 years ago
Eregon (Benoit Daloze) wrote in #note-3:
It seems hard to solve to me, without losing the significant advantages that @zverok (Victor Shepelev) mentions.
Also inheriting from a Data class seems kind of an anti/rare pattern.
One can of course:Foo = Data.define(:x, :y) def custom end ... end
😅 I do this all the time with structs. I assumed Data.define
would basically just be a read-only Struct.new
and that's how I got here.
zverok (Victor Shepelev) wrote in #note-6:
Yes, the rule "
Data#initialize
accepts only keyword arguments" is unlike the barest "simple class" implementation
IMO this would be a fine rule if new
also only accepted keyword arguments. The automatic coercion from positional to keyword is what got me in to this pickle. This coercion makes subclasses unable to use positional arguments with initialize (a feature which, as I pointed out, regular classes support).
I guess it's not really a big deal if I just make all of my subclasses use keyword arguments, and then only ever instantiate them using keyword parameters. But is that the "best practice"? If so, why implement the coercion feature?
Updated by zverok (Victor Shepelev) almost 2 years ago
I do this all the time with structs. I assumed Data.define would basically just be a read-only Struct.new and that's how I got here.
(Wrote a long theoretical answer, deleted it!)
Wait. I now reread your initial example more attentively, and I see the source of the confusion now: that we are adjusting the initialize
to accept the arguments the initial class knew nothing about (not redefining it to accept the same args, but differently), and the behavior is indeed hard to grasp immediately.
I am not sure something can be done about it in a consistent manner, though.
That's a source of the conveniences that are provided in other cases (I think for "simple data classes" ability to be initialized by both keyword and positional argument is crucial for their adoption; and I still believe that the convention of "initialize is easy to redefine, but it should always accept keywords" is a good enough compromise).
It seems that "completely redefine what data is accepted to constructor" would be a natural limitation of Data
, which seems more or less suitable (inheriting to provide some convenience methods is one thing, building an inheritance chain is another).
The only way I see so far to make your Baz
work is this:
Foo = Data.define
class Bar < Foo
def self.new(foo) = super(foo:) # convert positional to keyword
def initialize(foo:) = p foo # ...then they are safely passed to initialize
end
Bar.new(1)
#=> 1
BTW, note that before 3.2, for structs it was either keyword_init: false
, or keyword_init: true
, not both.
That's why you were able to easily redefine initialize
: you knew it receives either keywords or positionals, not "both and need to choose". After 3.2, it is different, and I think that people would stumble on that many times (the design decision was made differently, too, unlike Data):
S = Struct.new(:x) do
def initialize(*a, **kwa)
puts "a=#{a}, kwa=#{kwa}"
end
end
S.new(1)
# a=[1], kwa={}
S.new(x: 1)
# a=[], kwa={:x=>1}
I think it will cause a fair share of confusion in completely different ways than the design choice for Data
did :)
Updated by Eregon (Benoit Daloze) almost 2 years ago
To clarify:
Also inheriting from a Data class seems kind of an anti/rare pattern.
I meant one should use Struct.new(...) { ... }
/ Data.define(...) { ... }
and not class < Struct.new(...)
/ class < Data.define(...)
which is wasteful because it creates an extra class and makes the hierarchy needlessly one level deeper (and also causes an unnamed supercass).
Updated by zverok (Victor Shepelev) almost 2 years ago
@Eregon (Benoit Daloze) Ah, yeah, that's true.
Though I saw codebases that did that for aesthetical purposes (all "classes with methods" look the same), or to not confuse some code analysis tools (I haven't checked lately, but I believe YARD at some point was able to properly parse class Foo < Struct.new(...)
and produce docs for it, but not Foo = Struct.new(...) do
)
Updated by tenderlovemaking (Aaron Patterson) almost 2 years ago
zverok (Victor Shepelev) wrote in #note-8:
BTW, note that before 3.2, for structs it was either
keyword_init: false
, orkeyword_init: true
, not both.That's why you were able to easily redefine
initialize
: you knew it receives either keywords or positionals, not "both and need to choose". After 3.2, it is different, and I think that people would stumble on that many times (the design decision was made differently, too, unlike Data):S = Struct.new(:x) do def initialize(*a, **kwa) puts "a=#{a}, kwa=#{kwa}" end end S.new(1) # a=[1], kwa={} S.new(x: 1) # a=[], kwa={:x=>1}
I don't really understand this example. The initialize
works the same way whether you use a Struct or not:
S = Struct.new(:x) do
def initialize(*a, **kwa)
puts "a=#{a}, kwa=#{kwa}"
end
end
S.new(1)
# a=[1], kwa={}
S.new(x: 1)
# a=[], kwa={:x=>1}
class K
def initialize(*a, **kwa)
puts "a=#{a}, kwa=#{kwa}"
end
end
K.new(1)
# a=[1], kwa={}
K.new(x: 1)
# a=[], kwa={:x=>1}
I think it will cause a fair share of confusion in completely different ways than the design choice for
Data
did :)
Data
is confusing for me because it doesn't share the same behavior as a regular class (where Struct does).
Eregon (Benoit Daloze) wrote in #note-9:
To clarify:
Also inheriting from a Data class seems kind of an anti/rare pattern.
I meant one should use
Struct.new(...) { ... }
/Data.define(...) { ... }
and notclass < Struct.new(...)
/class < Data.define(...)
which is wasteful because it creates an extra class and makes the hierarchy needlessly one level deeper (and also causes an unnamed supercass).
Sure, but making a class that eventually inherits from a class defined via Struct.new
is something I normally do. I wanted to do the same thing with Data
, but found I could not because it defines new
differently.
Updated by zverok (Victor Shepelev) almost 2 years ago
I don't really understand this example. The initialize works the same way whether you use a Struct or not
It is not.
- Both 3.2+
Struct
andData
make a promise that no other class makes and is non-trivial to implement: if you have Struct/Data-derived classC
with memberx
, then bothC.new(x: 1)
andC.new(1)
work. - These promises are implemented differently (because different people worked on them :shrug:): In
Struct
, the#initialize
handles "unify positional and keywords", inData
,.new
handles. - This leads to different cost/benefit outcome: in
Struct
, it is "like in other classes" but redefining#initialize
is non-trivial, inData
, it is "one interesting quirk", but redefining#initialize
is trivial.
Can you please provide a small realistic example of how would you like to define your Struct or Data-inherited class' #initialize
to not break this contract and achieve the goal you were trying to achieve? We can proceed from there to maybe come to some compromise or better understand each other's points.
Updated by tenderlovemaking (Aaron Patterson) almost 2 years ago
- Status changed from Feedback to Rejected
zverok (Victor Shepelev) wrote in #note-12:
I don't really understand this example. The initialize works the same way whether you use a Struct or not
It is not.
- Both 3.2+
Struct
andData
make a promise that no other class makes and is non-trivial to implement: if you have Struct/Data-derived classC
with memberx
, then bothC.new(x: 1)
andC.new(1)
work.
Your example didn't demonstrate this difference.
- These promises are implemented differently (because different people worked on them :shrug:): In
Struct
, the#initialize
handles "unify positional and keywords", inData
,.new
handles.- This leads to different cost/benefit outcome: in
Struct
, it is "like in other classes" but redefining#initialize
is non-trivial, inData
, it is "one interesting quirk", but redefining#initialize
is trivial.
I really don't understand point 3. As I showed in the original post as well as as other examples, I cannot "trivially" define an initialize as I would with normal Ruby classes.
Can you please provide a small realistic example of how would you like to define your Struct or Data-inherited class'
#initialize
to not break this contract and achieve the goal you were trying to achieve? We can proceed from there to maybe come to some compromise or better understand each other's points.
To be honest, I don't really care anymore. The problems with new
on Data
seem quite clear to me, but perhaps I am wrong. Struct seems to behave in the way I would expect, so I will just use that (or write regular POROs).
Thanks for the feedback.
Updated by zverok (Victor Shepelev) almost 2 years ago
Your example didn't demonstrate this difference.
I really don't understand point 3.
To be honest, I don't really care anymore.
It is a pity, after the thread this long.
If you would've cared, you'd maybe spent a few minutes drafting an initialize
that would work the way you want and maintained the contract, and my points would probably become more obvious.
(FTR, the starting example of the ticket wasn't really illustrative because the point of creating Data
with no members and then adding instance variables in initialize
is unclear. Data
is a value object that is good for value object goals, not a generic "just base class for anything".)
But it is what it is, I think.
Updated by tenderlovemaking (Aaron Patterson) almost 2 years ago
zverok (Victor Shepelev) wrote in #note-14:
Your example didn't demonstrate this difference.
I really don't understand point 3.
To be honest, I don't really care anymore.
It is a pity, after the thread this long.
If you would've cared, you'd maybe spent a few minutes drafting an
initialize
that would work the way you want and maintained the contract, and my points would probably become more obvious.
Sorry, I think you misunderstood. I understand your points, I am just out of steam and don't care to continue the thread anymore.
Thanks.
Updated by zverok (Victor Shepelev) almost 2 years ago
- Related to Bug #19301: Fix Data class to report keyrest instead of rest parameters added