Project

General

Profile

Actions

Feature #19000

closed

Data: Add "Copy with changes method" [Follow-on to #16122 Data: simple immutable value object]

Added by RubyBugs (A Nonymous) over 1 year ago. Updated over 1 year ago.

Status:
Closed
Assignee:
-
Target version:
-
[ruby-core:109850]

Description

As requested: extracted a follow-up to #16122 Data: simple immutable value object from this comment

Proposal: Add a "Copy with changes" method to Data

Assume the proposed Data.define exists.
Seeing examples from the [Values gem]:

require "values"

# A new class
Point = Value.new(:x, :y)

# An immutable instance
Origin = Point.with(x: 0, y: 0)

# Q: How do we make copies that change 1 or more values?
right        = Origin.with(x: 1.0)
up           = Origin.with(y: 1.0)
up_and_right = right.with(y: up.y)

# In loops
movements = [
  [ :x, +0.5 ],
  [ :x, +0.5 ],
  [ :y, -1.0 ],
  [ :x, +0.5 ],
]

# position = Point(x: 1.5, y: -1.0)
position = movements.inject(Origin) do |p, (field, delta)|
  p.with(field => p.send(field) + delta)
end

Proposed detail: Call this method: #with

Money = Data.define(:amount, :currency)

account = Money.new(amount: 100, currency: 'USD')

transactions = [+10, -5, +15]

account = transactions.inject(account) { |a, t| a.with(amount: a.amount + t) }
#=> Money(amount: 120, currency: "USD")

Why add this "Copy with changes" method to the Data simple immutable value class?

Called on an instance, it returns a new instance with only the provided parameters changed.

This API affordance is now widely adopted across many languages for its usefulness. Why is it so useful? Because copying immutable value object instances, with 1 or more discrete changes to specific fields, is the proper and ubiquitous pattern that takes the place of mutation when working with immutable value objects.

Other languages

C# Records: “immutable record structs — Non-destructive mutation” — is called with { ... }
https://docs.microsoft.com/en-us/dotnet/csharp/language-reference/builtin-types/record#nondestructive-mutation

Scala Case Classes — is called #copy
https://docs.scala-lang.org/tour/case-classes.html

Java 14+ Records — Brian Goetz at Oracle is working on adding a with copy constructor inspired by C# above as we speak, likely to be called #with
https://mail.openjdk.org/pipermail/amber-spec-experts/2022-June/003461.html

Rust “Struct Update Syntax” via .. syntax in constructor
https://doc.rust-lang.org/book/ch05-01-defining-structs.html#creating-instances-from-other-instances-with-struct-update-syntax

Alternatives

Without a copy-with-changes method, one must construct entirely new instances using the constructor. This can either be (a) fully spelled out as boilerplate code, or (b) use a symmetrical #to_h to feed the keyword-args constructor.

(a) Boilerplate using constructor

Point = Data.define(:x, :y, :z)
Origin = Point.new(x: 0.0, y: 0.0, z: 0.0)

change = { z: -1.5 }

# Have to use full constructor -- does this even work?
point = Point.new(x: Origin.x, y: Origin.y, **change)

(b) Using a separately proposed #to_h method and constructor symmetry

Point = Data.define(:x, :y, :z)
Origin = Point.new(x: 0.0, y: 0.0, z: 0.0)

change = { z: -1.5 }

# Have to use full constructor -- does this even work?
point = Point.new(**(Origin.to_h.merge(change)))

Notice that the above are not ergonomic -- leading so many of our peer language communities to adopt the #with method to copy an instance with discrete changes.


Related issues 1 (0 open1 closed)

Related to Ruby master - Bug #19259: `Data#with` doesn't call `initialize` nor `initialize_copy`ClosedActions

Updated by bkuhlmann (Brooke Kuhlmann) over 1 year ago

💡 In case it's of interest, I've partially solved this problem in the Refinements gem several years back by adding the #merge and #merge! methods to Struct as a refinement. Here's the source code. I'm mostly posting this here in case it helps provide more example solutions to the problem mentioned in Alternative B above:

point = Point.new(**(Origin.to_h.merge(change)))

I've found that using keyword arguments, regardless of whether the struct was constructed using positionals or keywords, has been very effective in practice.

Updated by bdewater (Bart de Water) over 1 year ago

Example from Sorbet which uses with as well:

class Point < T::Struct
  const :x, Numeric
  const :y, Numeric
end

# An immutable instance
Origin = Point.new(x: 0, y: 0)

right = Origin.with(x: 1.0)
up = Origin.with(y: 1.0)
up_and_right = right.with(y: up.y)

Updated by RubyBugs (A Nonymous) over 1 year ago

Example from value_semantics gem which uses with as well:

Point = ValueSemantics::Struct.new do
  x Numeric
  y Numeric
end

# An immutable instance
Origin = Point.new(x: 0, y: 0)

right = Origin.with(x: 1.0)
up = Origin.with(y: 1.0)
up_and_right = right.with(y: up.y)

Updated by ufuk (Ufuk Kayserilioglu) over 1 year ago

Since Ruby already has the Object#dup method, I think we don't need to create a new method to duplicate a Data instance with potentially slightly different values.

I think allowing the dup method on Data instances to take optional keyword arguments would be the ideal API for this:

Point = Data.define(:x, :y, :z)
ORIGIN = Point.new(x: 0.0, y: 0.0, z: 0.0)

right = ORIGIN.dup(x: 1.0)
up = ORIGIN.dup(y: 1.0)
up_and_right = right.dup(y: up.y)

Updated by jeremyevans0 (Jeremy Evans) over 1 year ago

ufuk (Ufuk Kayserilioglu) wrote in #note-4:

I think allowing the dup method on Data instances to take optional keyword arguments would be the ideal API for this:

I agree. This is the pattern that Sequel uses for datasets (which are always frozen, as Data instances are), except Sequel uses clone instead of dup (for historical reasons and because it also needs to copy singleton classes).

Updated by Dan0042 (Daniel DeLorme) over 1 year ago

ufuk (Ufuk Kayserilioglu) wrote in #note-4:

I think allowing the dup method on Data instances to take optional keyword arguments would be the ideal API for this

I had a strong visceral "no" reaction to this. To a certain extent it's a logical idea, we are after all creating a kind of copy. But the dup/clone API is so old, so well established and so consistent throughout ruby, changing/extending it would be a mistake.

Updated by ufuk (Ufuk Kayserilioglu) over 1 year ago

... Sequel uses clone instead of dup (for historical reasons and because it also needs to copy singleton classes).

I think copying singleton classes into such copies might NOT be desired behaviour for Data objects. So, it might make more sense to use dup for this. That means, it should work in the following way:

Point = Data.define(:x, :y, :z)
ORIGIN = Point.new(x: 0.0, y: 0.0, z: 0.0)
ORIGIN.singleton_class.define_method(:distance_from_origin) { 0 }

right = ORIGIN.dup(x: 1.0)
up = ORIGIN.dup(y: 1.0)
up_and_right = right.dup(y: up.y)

ORIGIN.distance_from_origin # => 0
right.distance_from_origin # => undefined method `distance_from_origin' for #<data Point x=1.0, y=0.0, z=0.0> (NoMethodError)
up.distance_from_origin # => undefined method `distance_from_origin' for #<data Point x=0.0, y=1.0, z=0.0> (NoMethodError)
up_and_right.distance_from_origin # => undefined method `distance_from_origin' for #<data Point x=1.0, y=1.0, z=0.0> (NoMethodError)

Dan0042 (Daniel DeLorme) wrote in #note-6:

ufuk (Ufuk Kayserilioglu) wrote in #note-4:
... But the dup/clone API is so old, so well established and so consistent throughout ruby, changing/extending it would be a mistake.

I am not sure we would be extending the API in any incompatible way. Calling dup with no arguments would still work as it should, by creating a copy of the original object. If one supplies, optional, keyword arguments to dup, though, then one would get the special copying that is suitable for Data objects. BTW, it should be an error to supply a keyword argument that the Data class does not have as members.

Point = Data.define(:x, :y, :z)
ORIGIN = Point.new(x: 0.0, y: 0.0, z: 0.0)

cloned_origin = ORIGIN.dup # => works as expected
illegal = ORIGIN.dup(w: 1.0) # => ArgumentError

Updated by RubyBugs (A Nonymous) over 1 year ago

it should be an error to supply a keyword argument that the Data class does not have as members

+1 Agree!

Please note, performance work should be planned to ensure that such checking of the keyword args is as fast as possible and avoids unnecessary allocations.

For example, here's a real life pull request which sped up this error checking the Values gem #with method by 2.29x:
https://github.com/ms-ati/Values/commit/c3f00e3e8d67ff5d6470315ad137166177f079e2

Updated by tomstuart (Tom Stuart) over 1 year ago

While it’s undeniable that this method duplicates its receiver in some sense, I’d like to add my support for the #with naming choice.

I don’t have any technical objection to using #dup, it’s just a stylistic preference: in the interests of programmer happiness, to me it’s more natural for the name to communicate the intent of the sender (“I want an object just like ORIGIN, except #with an x of 1.0”) rather than the lower-level implementation detail of what that operation involves (“I want a shallow clone of ORIGIN via #allocate & #initialize_copy, except assigning 1.0 as the value of the @x instance variable”).

Perhaps another way of putting this is that I’d estimate most Ruby programmers rarely call Object#dup in the course of their work, whereas I would expect users of Data to need this new “copy with changes” operation somewhat frequently, especially if they’re migrating existing code from Struct. But of course I have no data to support that guess!

Updated by p8 (Petrik de Heus) over 1 year ago

I'm seeing some overlap with "Object#with to set and restore attributes around a block" https://bugs.ruby-lang.org/issues/18951

Should we have a single method Object#with_attrs that:

  • returns a copy with the attributes set by default
  • sets and restores the attributes when passed a block? (although that doesn't make sense for the immutable Data)
Origin = Point.new(x: 0, y: 0)
right = Origin.with_attrs(x: 1.0)
right.x # => 1.0

SomeLibrary.with_attrs(enabled: true) do
  SomeLibrary.enabled # => true
end
SomeLibrary.enabled # => false

Updated by nobu (Nobuyoshi Nakada) over 1 year ago

  • Description updated (diff)

At the developers meeting today, someone proposed something like this.

point = Point.new(**(Origin.to_h.merge(change)))
point = Point.new(**Origin.to_h, **change)

Updated by tomstuart (Tom Stuart) over 1 year ago

nobu (Nobuyoshi Nakada) wrote in #note-11:

At the developers meeting today, someone proposed something like this.

point = Point.new(**Origin.to_h, **change)

Agreed — this is what I do at the moment. There’s no need to #merge explicitly because “last keyword wins”, and although repeating a literal keyword will generate a warning, the same isn’t true when double-splatting the result of #to_h.

Updated by mame (Yusuke Endoh) over 1 year ago

@RubyBugs (A Nonymous) I cannot understand the following parts of your first motivative example

# In loops
movements = [
  { x: +0.5 },
  { x: +0.5 },
  { y: -1.0 },
  { x: +0.5 },
]

# position = Point(x: 1.5, y: -1.0)
position = movements.inject(Origin) { |p, move| p.with(**move) }

According to the comment, do you really expect Point.new(x: 0.0+0.5+0.5+0.5, y: 0.0-1.0)? I don't think the proposed operation should mean addition implicitly.

If it is a mistake, could you update the code example?

Updated by ufuk (Ufuk Kayserilioglu) over 1 year ago

I just put up a PR implementing this using the dup proposal: https://github.com/ruby/ruby/pull/6766

Here are my objections to other proposals:

  1. point = Point.new(**Origin.to_h, **change) works but is very verbose and not intention revealing, in my opinion. For the reader of this code, it is very hard to understand what the intended operation is.
  2. with_attrs (from https://bugs.ruby-lang.org/issues/18951): That proposal is explicitly about mutating the receiver of with_attrs, which is the exact opposite of what this "copy with changes" operation is trying to do. For that reason, I don't think we should be conflating that with this proposal.
  3. with: While I understand that there is precedent for using with from other languages, I have always found the name to not make it clear that a copy is being made. Since with implies doing something "with" original receiver, it leads less experienced folks to think that the receiver might be modified in the process. I think we have an opportunity to make the operation more clear by giving it a better name.

My reasoning for dup is to make it explicit that a copy is being created (just as how a string copy is created when "foo".dup is called) but that the copy has updated values for the supplied member fields. dup is already an existing protocol for creating shallow copies of objects, and can already be used on Data class instance even without this change, of course without being able to affect the copy in any way. Extending the existing copying protocol of Ruby to optionally add the ability to affect the copy feels like the best way forward to me.

I would be happy to hear feedback from the community on my PR and if a different name than dup is picked in the end, also would be happy to update the PR to conform.

Updated by bdewater (Bart de Water) over 1 year ago

I like dup as the method name 👍

tomstuart (Tom Stuart) wrote in #note-9:

Perhaps another way of putting this is that I’d estimate most Ruby programmers rarely call Object#dup in the course of their work, whereas I would expect users of Data to need this new “copy with changes” operation somewhat frequently, especially if they’re migrating existing code from Struct. But of course I have no data to support that guess!

I don't think dup is that rare, IME most Ruby programmers are familiar with it even if they don't use it daily. Some unscientific data points of usage in Rails apps:

Updated by RubyBugs (A Nonymous) over 1 year ago

bdewater (Bart de Water) wrote in #note-15:

I like dup as the method name 👍

Is there a way we could get more active Rubyists to weigh in? My sense is that there is a real tension in that:

  • Nearly every major value objects gem use #with
  • Most other language with value objects use #with

On the other hand, there seem to be a number of voices on this Bug thread, who while they don't necessarily currently work with code that uses this exact pattern, feel strongly that overloading the meaning of #dup is a better choice for Ruby. Even the amazing and incomparable @jeremyevans, whose Sequel gem our team depends on as well :)

While it might not generally be the practice of the Ruby community, would we consider a way to get more "working Rubyist" eyes on this question? Perhaps by getting this thread in to the Ruby Weekly News for example?

In the end, having the method is more important than the name. But it does seem important to let voices be heard?

Updated by mame (Yusuke Endoh) over 1 year ago

@RubyBugs (A Nonymous) Please check my comment https://bugs.ruby-lang.org/issues/19000#note-13 . A wrong motivation example raises the suspicion that this API is actually confusing to users.

Updated by p8 (Petrik de Heus) over 1 year ago

If dup is chosen would it make sense to always allow dup methods to take arguments for consistency?

For example, it would allow the following code in Rails

firm = Firm.first.dup
firm.account = Account.first
assert_queries(2) { firm.save! }

(https://github.com/rails/rails/blob/8e34831f97acd7448fd68a6ac130694079aea951/activerecord/test/cases/autosave_association_test.rb#L277-L279)

to be written as:

firm = Firm.first.dup(account: Account.first)
assert_queries(2) { firm.save! }

Updated by Eregon (Benoit Daloze) over 1 year ago

p8 (Petrik de Heus) wrote in #note-18:

If dup is chosen would it make sense to always allow dup methods to take arguments for consistency?

I'm rather against that, it makes the standard dup so much more complicated, and it would most likely be pretty slow in CRuby.
If it's only for Data it's a much smaller concern.

Updated by Dan0042 (Daniel DeLorme) over 1 year ago

p8 (Petrik de Heus) wrote in #note-18:

If dup is chosen would it make sense to always allow dup methods to take arguments for consistency?

It would make sense, but it would require everyone who wrote a custom #dup to extend their version to support this new API pattern. Not likely to happen.

On the other hand with #with it would be possible to implement a general version for all objects

def with(attr={})
  attr.each_with_object(self.dup) do |(k,v),obj|
    obj.send("#{k}=", v)
  end
end

(but let's keep in mind this proposal is about Data#with, not Object#with)

Updated by RubyBugs (A Nonymous) over 1 year ago

mame (Yusuke Endoh) wrote in #note-17:

@RubyBugs (A Nonymous) Please check my comment https://bugs.ruby-lang.org/issues/19000#note-13 . A wrong motivation example raises the suspicion that this API is actually confusing to users.

Quite right @mame (Yusuke Endoh), thank you for catching that! I provided two examples. The first example is intended to use the previous values in creating the next values, and that part was broken.

How is this?

##
# Example of proposed Data#with
#
# To try this out:
#
#     ruby-install ruby-3.2.0-preview3
#
##

Point = Data.define(:x, :y) do
  # Example only, too slow. Real implementation should eliminate allocations.
  def with(**args)
    raise ArgumentError unless args.keys.all? { |k| members.include? k }
    self.class.new(**(to_h.merge(args)))
  end  
end

Origin = Point.new(x: 0, y: 0)

# Q: How do we make copies that change 1 or more values?
right        = Origin.with(x: 1.0)
up           = Origin.with(y: 1.0)
up_and_right = right.with(y: up.y)

begin
  Origin.with(x: 1, z: 0)
rescue ArgumentError
  puts "#with(z: 0) raised ArgumentError as expected"
end

# In loops
movements = [
  [ :x, +0.5 ],
  [ :x, +0.5 ],
  [ :y, -1.0 ],
  [ :x, +0.5 ],
]

position = movements.inject(Origin) do |p, (field, delta)|
  p.with(field => p.send(field) + delta) 
end

# position = Point(x: 1.5, y: -1.0)

Updated by RubyBugs (A Nonymous) over 1 year ago

For comparison, you can plug in the following example code for a faster implementation of #with and it should work the same:

Point = Data.define(:x, :y) do
  # Example only, too slow. Real implementation should eliminate allocations.
  # def with(**args)
  #   raise ArgumentError unless args.keys.all? { |k| members.include? k }
  #   self.class.new(**(to_h.merge(args)))
  # end  

  # A trick to avoid `nil` for defaults, so that `nil` can be a valid value
  DEFAULT = Class.new.new.freeze

  # An example of a faster implementation that could inspire auto-generation
  def with(x: DEFAULT, y: DEFAULT)
    self.class.new(
      x: x.equal?(DEFAULT) ? self.x : x,
      y: y.equal?(DEFAULT) ? self.y : y,
    )
  end  
end
Actions #23

Updated by RubyBugs (A Nonymous) over 1 year ago

  • Description updated (diff)

Updated by mame (Yusuke Endoh) over 1 year ago

RubyBugs (A Nonymous) wrote in #note-21:

How is this?

Thanks for the update.

Now I have a question. Do you really want to write p.with(field => p.send(field) + delta)? I don't think it is very elegant. It is not very convincing (at least, to me) as a first motivation example.

Also, do you need the ability to update multiple fields at once? Both motivation examples only update a single field. This may be the result of simplifying the motivation example, though.

Looking at these motivation examples, there may be room to consider an API like p.with(field) {|old_value| old_value + delta } or something.

Updated by mame (Yusuke Endoh) over 1 year ago

Discussed at the dev meeting.

Regarding method names, @matz (Yukihiro Matsumoto) likes Data#update. However, @ko1 (Koichi Sasada) objects because "update" is used as a destructive vocabulary in Hash#update. @matz (Yukihiro Matsumoto) said Data#with is acceptable; Data#dup is not acceptable.

Updated by ufuk (Ufuk Kayserilioglu) over 1 year ago

Thanks for the discussion and the name suggestion. I updated the PR https://github.com/ruby/ruby/pull/6766 to use Data#with. I would be grateful if it could get a review.

However, that raises a question: How should Data#with behave if it is passed no arguments? Semantically, that should mean "create a copy of the current data object with no fields changed", but then that ends up being identical to what Kernel#dup does. Also, with with no arguments does not read that well.

@matz (Yukihiro Matsumoto) and @mame (Yusuke Endoh) Should it be an error?

Example:

Point = Data.define(:x, :y)
origin = Point.new(x: 0, y: 0)
new_origin = origin.with # should this be an error?

Updated by matz (Yukihiro Matsumoto) over 1 year ago

I think it should be an error. I am still not 100% sure with is the best name.

Matz.

Updated by ufuk (Ufuk Kayserilioglu) over 1 year ago

Thank you @matz (Yukihiro Matsumoto). I will update the implementation to make the no-args case an error, and point people to use dup to make identical shallow clones instead in the error message.

I read the dev meeting notes and saw that you made a point about the main of the operation not being about duplication. While I agree with that, I also feel that makes people think that they are making a cheap operation, when they are actually creating a clone with modified values. I had suggested dup to make that explicit, but happy to go with another name that is chosen that gives a similar sense of an extra clone operation happening under the hood. Another name that was suggested was a dup_with that I also find acceptable personally.

Along the same lines, I feel like update gives the wrong impression that the receiver instance will be updated, so my vote would be against it.

Updated by RubyBugs (A Nonymous) over 1 year ago

ufuk (Ufuk Kayserilioglu) wrote in #note-28:

Thank you @matz (Yukihiro Matsumoto). I will update the implementation to make the no-args case an error, and point people to use dup to make identical shallow clones instead in the error message.

Hi Matz and Ufuk, please consider that making no-args an error makes programming around this API dynamically more complicated — to generate a set of changes as a Hash and pass then goes from:

changes = {} # calculated somehow
p = Origin.with(**changes) # p == Origin is ok

to:

changes = {} # calculated somehow
p = unless changes.empty
      Origin.with(**changes)
    else
      Origin
    end

Because dynamically generated changes is an anticipated use for such an API, would we consider allowing calling #with with no arguments to return self?

Semantically one could describe this API behavior as: “an immutable value object with no changes is itself”?

As another reference point: in Scala case class this operation is called #copy, and I think it returns the original object instance in the no-args case.
https://docs.scala-lang.org/tour/case-classes.html

Updated by tomstuart (Tom Stuart) over 1 year ago

RubyBugs (A Nonymous) wrote in #note-29:

please consider that making no-args an error makes programming around this API dynamically more complicated

I second this request. In the interests of regularity, it should be fine to ask for a Data instance #with “no changes”, otherwise the programmer has to anticipate & handle that as an edge case unnecessarily if they intend to generically handle changes which originate from elsewhere rather than supplying keyword arguments inline.

Whether doing so returns self or a shallow copy shouldn’t be very important since the instances are immutable, but I agree that returning self would be more semantically predictable, slightly more efficient, and a nice way to differentiate it from the behaviour of #dup.

Updated by ufuk (Ufuk Kayserilioglu) over 1 year ago

RubyBugs (A Nonymous) wrote in #note-29:

Hi Matz and Ufuk, please consider that making no-args an error makes programming around this API dynamically more complicated

Indeed, that is true. However, personally, I find a with call with no arguments very hard to read and reason about. It just reads badly as an English construct. It is like saying, "can I have tea with".

Semantically one could describe this API behavior as: “an immutable value object with no changes is itself”?

Indeed this is a good idea for what the behaviour should be but does not alleviate the naming/readability concern that I've raised above.

As another reference point: in Scala case class this operation is called #copy, and I think it returns the original object instance in the no-args case.
https://docs.scala-lang.org/tour/case-classes.html

Indeed, it was along these lines that I'd suggested dup originally. Like I've said before, I am flexible on what the name should be and copy actually addresses the naming concern equally as well as dup in my opinion. It also makes sense to ask for a "copy" with no values changed and readability is not hurt in any way.

Updated by austin (Austin Ziegler) over 1 year ago

ufuk (Ufuk Kayserilioglu) wrote in #note-31:

RubyBugs (A Nonymous) wrote in #note-29:

As another reference point: in Scala case class this operation is called #copy, and I think it returns the original object instance in the no-args case.
https://docs.scala-lang.org/tour/case-classes.html

Indeed, it was along these lines that I'd suggested dup originally. Like I've said before, I am flexible on what the name should be and copy actually addresses the naming concern equally as well as dup in my opinion. It also makes sense to ask for a "copy" with no values changed and readability is not hurt in any way.

I think that #copy is OK, but this is more of a (checked?) #merge, right? I think that would be where my question comes in. If origin is a Point containing x and y values, then origin#copy(x: 1, y: 2, z: 3) should do…what? Should it error or just take the values that it knows? Because the shape provided to #copy is no longer a Point shape.

Updated by ufuk (Ufuk Kayserilioglu) over 1 year ago

austin (Austin Ziegler) wrote in #note-32:

If origin is a Point containing x and y values, then origin#copy(x: 1, y: 2, z: 3) should do…what? Should it error or just take the values that it knows?

The current implementation in the PR treats unknown keywords passed to this method as an error. I had already made it clear that I thought it should be an error at the end of my message earlier in this thread https://bugs.ruby-lang.org/issues/19000#note-7 and I think we are all in agreement on this point in this thread.

Updated by RubyBugs (A Nonymous) over 1 year ago

…treats unknown keywords passed to this method as an error. I had already made it clear that I thought it should be an error at the end of my message earlier in this thread … and I think we are all in agreement on this point in this thread.

+1 agreement on treating unknown keywords as error! :-)

Updated by RubyBugs (A Nonymous) over 1 year ago

Hi @mame (Yusuke Endoh)! Thank you for your questions.

mame (Yusuke Endoh) wrote in #note-24:

Thanks for the update.

Now I have a question. Do you really want to write p.with(field => p.send(field) + delta)? I don't think it is very elegant. It is not very convincing (at least, to me) as a first motivation example.

The challenge is that this operation "make a copy of an immutable value object with 0 or more fields changed" is so fundamental, it's hard to find examples that strip away every other concern.

Staying with the Point example, would translate and invert work better as examples that change multiple fields? In this example, let's introduce a 3 dimensional Point, and then 2-dimensional operations on it, as follows:

##
# Example of proposed Data#with
#
# To try this out:
#
#     ruby-install ruby-3.2.0-preview3
#
##

Point3d = Data.define(:x, :y, :z) do
  # Example only, too slow due to allocations
  def with(**args)
    raise ArgumentError unless args.keys.all? { |k| members.include? k }
    self.class.new(**(to_h.merge(args)))
  end  

  def translate_2d(dx: 0, dy: 0)
    with(x: x + dx, y: y + dy)
  end

  def invert_2d
    with(x: -x, y: -y)
  end  
end

Origin3d = Point3d.new(x: 0, y: 0, z: 0)
north = Origin3d.translate_2d(dy: 1.0)
south = north.invert_2d
east = Origin3d.translate_2d(dx: 1.0)
west = east.invert_2d

west == Point3d.new(x: -1.0, y: 0, z: 0)
# => true

mountain_height = 2.0
western_mountain = west.with(z: mountain_height)
# => #<data Point3d x=-1.0, y=0, z=2.0>
eastern_mountain = western_mountain.invert_2d
# => #<data Point3d x=1.0, y=0, z=2.0>

Also, do you need the ability to update multiple fields at once? Both motivation examples only update a single field. This may be the result of simplifying the motivation example, though.

Yes! Definitely need to update multiple fields at once

Looking at these motivation examples, there may be room to consider an API like p.with(field) {|old_value| old_value + delta } or something.

Agreed that this probably comes out of a motivating example which is trying to calculate the new value for a single field based on only the previous value of that field.

Updated by Eregon (Benoit Daloze) over 1 year ago

I also agree the 0-kwargs case should be supported for general use like data.with(**changes), whether changes is empty or not. And that reads fine of course.
It's also like Hash#merge with no arguments.
I think #with with empty kwargs should do the same as dup. So this depends on what Data#dup does.
If Data#dup returns self like immutable objects (Fixnum/Symbol/true/false/nil/etc) then let's do that too, if not let's dup in that case.
Note that Data is not truly immutable (nor always Ractor-shareable) because it can refer to mutable objects.
Currently Data#dup creates a new instance.

Updated by k0kubun (Takashi Kokubun) over 1 year ago

Summary of the discussion so far:


My personal notes:

I'm late to the party, but in my experience of using data classes in other languages, this feels like a must-have for data classes. It'd be unfortunate if Data were released without it.

Also, do you need the ability to update multiple fields at once? Both motivation examples only update a single field.

Yes! Definitely need to update multiple fields at once

I second that. In my previous company, we had an immutable class that represented a state/snapshot for stream-processing multiple events at once (with Amazon Kinesis), which is used by a function that takes a state and returns the next state, therefore not mutating it in the middle of the function. It groups multiple events and dispatches a batch operation when a certain threshold is met. Because we had multiple thresholds (e.g. max size, max duration), we naturally needed to create a copy with multiple updates when we create the next state.

I also agree the 0-kwargs case should be supported for general use like data.with(**changes), whether changes is empty or not. And that reads fine of course.
It's also like Hash#merge with no arguments.

+1. I'm not sure if it's possible to distinguish them in Ruby, but ideally data.with or data.with() should be rejected even if we accept data.with(**changes) when changes are empty. If it's feasible, then it might clear Matz's concern at #note-27.

Updated by Eregon (Benoit Daloze) over 1 year ago

k0kubun (Takashi Kokubun) wrote in #note-37:

+1. I'm not sure if it's possible to distinguish them in Ruby, but ideally data.with or data.with() should be rejected even if we accept data.with(**changes) when changes are empty.

That's not possible to differentiate and also it shouldn't be (it would be very confusing if it was different).
Because foo(**empty_hash) should always be exactly the same as foo().

If it's feasible, then it might clear Matz's concern at #note-27.

@matz (Yukihiro Matsumoto) Is it really that big a concern?
I think it's far far more important that with(**changes) works, whether changes is empty or not.
Honestly I think there is no point to prevent that, it's necessary for the with(**changes) use case and it seems pretty much harmless.

Updated by ufuk (Ufuk Kayserilioglu) over 1 year ago

Status update

@alanwu (Alan Wu) and I reworked the PR at https://github.com/ruby/ruby/pull/6766 to ensure the following:

  1. Calls to .with with no arguments is valid and returns the receiver as-is. As @Eregon (Benoit Daloze) said, there is no way to differentiate between no args and empty kwargs, so the ability to do .with(**hash) freely won the discussion over.
  2. The implementation works correctly with Data objects being frozen now.
  3. The implementation checks explicitly for keyword arguments and raises argument error for hashes passed as positional argument, as well.

Apart from the decision on the name, the PR should be good to merge. As @k0kubun (Takashi Kokubun) said above, I also feel like this should be a part of the Ruby 3.2 release.

If the only blocker is the name, @matz (Yukihiro Matsumoto) can you pick the final name in the next few days, so that we can ship this with 3.2?

Thank you everyone for the discussion and ideas.

Updated by matz (Yukihiro Matsumoto) over 1 year ago

with seems slightly better than other candidates. Data is (or can be considered as) a value object without identity. So names with identity implication (dup, copy, clone) are not ideal. I like update but I understand @ko1's concern (Hash#update modifies the receiver).

Regarding error-on-no-argument, I thought passing no argument (even via double splat **) indicates some kind of logic error, and honored fail-early principle, but as @Eregon (Benoit Daloze) said, it's not a big deal.

I am not sure @naruse (Yui NARUSE) accepts this last minute change, but I am OK with the last PR from @ufuk (Ufuk Kayserilioglu).

Matz.

Updated by naruse (Yui NARUSE) over 1 year ago

I'm also ok to merge!
This will ship as a part of Ruby 3.2!

Actions #42

Updated by ufuk (Ufuk Kayserilioglu) over 1 year ago

  • Status changed from Open to Closed

Applied in changeset git|99cee85775bc5656b942bf4e1b0568a8f40addcd.


Add copy with changes functionality for Data objects (#6766)

Implements [Feature #19000]

This commit adds copy with changes functionality for Data objects
using a new method Data#with.

Since Data objects are immutable, the only way to change them is by
creating a copy. This PR adds a with method for Data class instances
that optionally takes keyword arguments.

If the with method is called with no arguments, the behaviour is the
same as the Kernel#dup method, i.e. a new shallow copy is created
with no field values changed.

However, if keyword arguments are supplied to the with method, then
the copy is created with the specified field values changed. For
example:

    Point = Data.define(:x, :y)
    point = Point.new(x: 1, y: 2)
    point.with(x: 3) # => #<data Point x: 3, y: 2>

Passing positional arguments to with or passing keyword arguments to
it that do not correspond to any of the members of the Data class will
raise an ArgumentError.

Co-authored-by: Alan Wu

Updated by ufuk (Ufuk Kayserilioglu) over 1 year ago

Thank you @matz (Yukihiro Matsumoto) and @naruse (Yui NARUSE). I hate that this ended up being a last-minute change, but it seemed to be little risk adding a method to a new interface and the benefit seemed to greatly outweigh the risk. Otherwise, I would not have asked for consideration.

Thanks again for your understanding. 🙇‍♂️

Actions #44

Updated by Eregon (Benoit Daloze) over 1 year ago

  • Related to Bug #19259: `Data#with` doesn't call `initialize` nor `initialize_copy` added
Actions

Also available in: Atom PDF

Like3
Like0Like1Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like1Like0Like0Like0Like0Like0Like1Like0Like0Like0Like1Like1Like0Like0Like0