Feature #16122

Data: simple immutable value object

Added by zverok (Victor Shepelev) about 3 years ago. Updated 5 days ago.

Status:
Closed
Priority:
Normal
Target version:
-
[ruby-core:94508]

Description

Intro (original theoretical part of the proposal)

Value Object is a useful concept, introduced by Martin Fowler (his post, Wikipedia Entry) with the following properties (simplifying the idea):

  • representing some relatively simple data;
  • immutable;
  • compared by type & value;
  • nicely represented.

Value objects are super-useful, especially for defining APIs and their input/return values. Recently, there has been some movement towards a more immutability-friendly approach in Ruby programming, leading to several discussions and libraries built around value objects. For example, Tom Dalling's gem, Good Ruby Value object convention (disclaimer: the latter is maintained by yours truly).

I propose to introduce native value objects to Ruby as a core class.

Why not a gem?

  • I believe that concept is that simple, that nobody will even try to use a gem for representing it with, unless the framework/library used already provides one.
  • Potentially, a lot of standard library (and probably even core) APIs could benefit from the concept.

Why Struct is not enough

Core Struct class is "somewhat alike" value-object, and frequently used instead of one: it is compared by value and consists of simple attributes. On the other hand, Struct is:

  • mutable;
  • collection-alike (defines to_a and is Enumerable);
  • dictionary-alike (has [] and .values methods).

The above traits somewhat erode the semantics, making code less clear, especially when duck typing is used.
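The three traits are easy to observe on any Struct. A minimal sketch (the Point name and its members are only an illustration, not part of the proposal):

```ruby
Point = Struct.new(:x, :y)
point = Point.new(1, 2)

# Mutable: a setter is generated for every member.
point.x = 10

# Collection-alike: to_a and the Enumerable methods are available.
coords  = point.to_a                 # => [10, 2]
doubled = point.map { |v| v * 2 }    # => [20, 4]

# Dictionary-alike: members are reachable by name and by index.
by_name  = point[:y]                 # => 2
by_index = point[0]                  # => 10
```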

For example, this code snippet shows why to_a is problematic:

Result = Struct.new(:success, :content)

# Now, imagine that other code assumes `data` could be either Result, or [Result, Result, Result]
# So, ...

data = Result.new(true, 'it is awesome')

Array(data) # => expected [Result(true, 'it is awesome')], got [true, 'it is awesome']

# or...
def foo(arg1, arg2 = nil)
  p arg1, arg2
end

foo(*data) # => expected [Result(true, 'it is awesome'), nil], got [true, 'it is awesome']

Having [] and each defined on something that is thought of as "just a value" can also lead to subtle bugs, when some method checks "is the received argument collection-alike?" and the value object's author didn't think of it as a collection.

Data class: consensus proposal/implementation, Sep 2022

  • Name: Data
  • PR: https://github.com/ruby/ruby/pull/6353
  • Example docs rendering: https://zverok.space/ruby-rdoc/Data.html
  • Full API:
    • Data::define creates a new Data class; accepts only symbols (no keyword_init:, no "first argument is the class name" like the Struct had)
    • <data_class>::members: list of member names
    • <data_class>::new: accepts either keyword or positional arguments (but not a mix); converts all of them to keyword args; raises ArgumentError if there are too many positional arguments
    • #initialize: accepts only keyword arguments; the default implementation raises ArgumentError on missing or extra arguments; it is easy to redefine initialize to provide defaults or handle extra args.
    • #==
    • #eql?
    • #inspect/#to_s (same representation)
    • #deconstruct
    • #deconstruct_keys
    • #hash
    • #members
    • #to_h

Historical original proposal

  • Class name: Struct::Value: a lot of Rubyists are used to having Struct as a quick "something-like-value" drop-in, so an alternative, stricter implementation that is part of the Struct API will be quite discoverable; alternative: just Value
  • Class API is copying Struct's one (most of the time -- even reuses the implementation), with the following exceptions (note: the immutability is not the only difference):
    • Not Enumerable;
    • Immutable;
    • Doesn't think of itself as "almost hash" (doesn't have to_a, values and [] methods);
    • Can have empty members list (fun fact: Struct.new('Foo') creating member-less Struct::Foo, is allowed, but Struct.new() is not) to allow usage patterns like:
class MyService
  Success = Struct::Value.new(:results)
  NotFound = Struct::Value.new
end

NotFound here, unlike, say, Object.new.freeze (another pattern for creating an "empty typed value object"), has a nice inspect, #<value NotFound>, and is created consistently with Success, making the code more readable. And if it evolves to have some attributes, the code change will be easy.

Patch is provided

Sample rendered RDoc documentation


Files

struct_value.patch (18.6 KB), added by zverok (Victor Shepelev), 08/23/2019 05:40 PM

Related issues: 1 (0 open, 1 closed)

Related to Ruby master - Feature #16769: Struct.new(..., immutable: true) (Rejected)

Updated by Eregon (Benoit Daloze) about 3 years ago

This sounds interesting to me.
What would a simple implementation of Struct::Value.new look like in Ruby code?
I'm not quite sure what the available API is since it's all described as Struct - some methods.

Updated by zverok (Victor Shepelev) about 3 years ago

@Eregon (Benoit Daloze), here is a rendered version of the class docs: https://zverok.github.io/ruby-rdoc/Struct-Value.html

Basically, it is what is said on the tin: like Struct, just leaner.

Updated by zverok (Victor Shepelev) about 3 years ago

What would a simple implementation of Struct::Value.new look like in Ruby code?

Oh, I've probably answered the wrong question... But I am not quite sure I understand yours.

Theoretically, it is just something like this (ignoring any input validation and the fact that Struct's implementation has optimized storage and other tricks):

class Struct::Value
  def self.new(*args, keyword_init: false)
    name, *members = args.first.is_a?(String) ? args : [nil, *args]
    Class.new(self) do
      @members = members

      def self.new(*args)
        allocate.tap { |o| o.__send__(:initialize, *args) }
      end

      members.each { |m| define_method(m) { instance_variable_get("@#{m}") }}
    end.tap { |cls| const_set(name, cls) if name }
  end
# ....

So (if that's what you're asking), it produces an object of a different class, Struct::Value, unrelated to Struct but sharing most of the implementation.

Updated by matz (Yukihiro Matsumoto) about 3 years ago

  • Status changed from Open to Feedback

The typical solution is Struct.new(...).freeze. This doesn't require any enhancement. The other option is Struct.new(..., immutable: true). It looks simpler than the proposed Struct::Value.

Matz.

Updated by zverok (Victor Shepelev) about 3 years ago

@matz (Yukihiro Matsumoto) Sorry for not sharing more detailed reasoning which led to the current proposal (I explained the "final reasons" in its text, but it is too terse).

So, it went as follows:

1. First, I really wanted just Struct.new(..., immutable: true) (and even experimented for some time with a private monkey-patch, doing just that)

2. But in fact, to be a proper convenient "value object", it is also bad for the container to mimic Enumerable, and especially bad to implement to_a. Simple example:

Result = Struct.new(:success, :content)

# Now, imagine that other code assumes `data` could be either Result, or [Result, Result, Result]
# So, ...

data = Result.new(true, 'it is awesome')

Array(data) # => expected [Result(true, 'it is awesome')], got [true, 'it is awesome']

# or...
def foo(arg1, arg2 = nil)
  p arg1, arg2
end

foo(*data) # => expected [Result(true, 'it is awesome'), nil], got [true, 'it is awesome']

3. And generally, some random value object "duck typing" itself as a collection seems not really appropriate.

4. The same, I believe, is related to supporting [:foo] and ['foo'] accessors: convenient for "general content object" that Struct is, but for "just value" it could seem an unnecessary expansion of the interface.

5. Finally, an empty-member Value is allowed, while an empty-member Struct somehow is not (I don't know if that is by design or just a bug; as I mentioned above, Struct.new('Name') IS allowed, but Struct.new is NOT).

So, considering all the points above, it could be multiple settings: immutable: true, enumerable: false, hash_accessors: false (point (5) could probably just be fixed for Struct, too) -- which is not that convenient if you are defining 3-5 types in a row, and requires some cognitive effort both from the writer (errm, what options did I use last time to set it up as a "good" value object?..) and the reader (ugh, what's this struct with so many settings?..).

So I eventually decided to propose going another way.

Updated by Dan0042 (Daniel DeLorme) about 3 years ago

If I understand correctly, the idea is to have X=Struct::Value.new(:x,:y,:z) which is strictly equivalent to

class X
  def initialize(x=nil, y=nil, z=nil)
    @x,@y,@z = x,y,z
  end
  attr_reader :x, :y, :z
  #and other methods based on x,y,z attributes:
  #def ==(other)
  #def eql?(other)
  #def hash 
end

Or was there some nuance I didn't catch?

Updated by zverok (Victor Shepelev) about 3 years ago

@Dan0042 (Daniel DeLorme) you are (probably) missing #inspect, #==, #eql?, #hash, #to_h and a bunch of other methods that are pretty trivial, but also important for the "value object".
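To make the size of that "bunch of other methods" concrete, here is a hand-written sketch of such a class. It is only an illustration under assumed names (Coordinate and its members are hypothetical), not the proposed implementation:

```ruby
# Hypothetical hand-rolled value object; every method below is something
# the proposed class would generate automatically.
class Coordinate
  attr_reader :x, :y, :z

  def initialize(x = nil, y = nil, z = nil)
    @x, @y, @z = x, y, z
    freeze # later instance-variable assignment raises FrozenError
  end

  # Member values in declaration order (also enables array patterns).
  def deconstruct
    [x, y, z]
  end

  def to_h
    { x: x, y: y, z: z }
  end

  # Equal when the class matches and all members are ==.
  def ==(other)
    other.class == self.class && other.deconstruct == deconstruct
  end

  # Stricter variant used by Hash lookups: members must be eql?.
  def eql?(other)
    other.class == self.class && other.deconstruct.eql?(deconstruct)
  end

  def hash
    [self.class, *deconstruct].hash
  end

  def inspect
    "#<#{self.class.name} x=#{x.inspect} y=#{y.inspect} z=#{z.inspect}>"
  end
  alias to_s inspect
end
```

Each method is trivial, but all of them have to be kept in sync with the member list by hand.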

Updated by mame (Yusuke Endoh) about 3 years ago

I couldn't understand what is "value object", and I found: https://martinfowler.com/bliki/ValueObject.html
Please do not assume that everybody knows such an essay ;-)
No one pointed out the article during the developer's meeting, so we cannot understand what you want.

I have some comments:

  • Why don't you start it with a gem? It may be useful for your case, but I'm not sure if it is useful for many people so that it deserves a built-in feature. And the design of Struct::Value is not clear to me (e.g., non-Enumerable is a trade-off; is it really useful for many cases?). If your gem becomes popular enough, we can import it as a built-in feature.
  • The behavior of Struct::Value is too different from Struct. Another class name (like ValueClass or NamedTuple or what not) looks more suitable.
  • What you (first) want is called "structural equality" in other languages (OCaml, F#, C#, TypeScript, Kotlin, as far as I know). Also it resembles "namedtuple" in Python. You may want to study them.

BTW, I understand the motivation of the proposal. I want "structural equality" in Ruby. Personally, I often write:

class Point3D
  include StructuralEquality
  def initialize(x, y, z)
    @x, @y, @z = x, y, z
  end
end

foo1 = Point3D.new(1, 2, 3)
foo2 = Point3D.new(1, 2, 3)
p foo1 == foo2 #=> true
h = { foo1 => "ok" }
p h[foo2] #=> "ok"

(The definition of StructuralEquality is here: https://github.com/mame/ruby-type-profiler/blob/436a10787fc74db47a8b2e9db995aa6ef7c16311/lib/type-profiler/utils.rb#L8-L31 )

But, I'm unsure if it deserves a built-in feature.
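For readers who do not follow the link, a module with this behavior could be sketched as below. This is an assumed stand-in written for illustration, not the linked definition; it compares objects by class and by their instance variables:

```ruby
# Hypothetical stand-in for a StructuralEquality mixin.
module StructuralEquality
  def ==(other)
    other.class == self.class && other.structure == structure
  end

  def eql?(other)
    other.class == self.class && other.structure.eql?(structure)
  end

  def hash
    [self.class, structure].hash
  end

  protected

  # Pairs of [ivar name, value], sorted so assignment order does not matter.
  def structure
    instance_variables.sort.map { |name| [name, instance_variable_get(name)] }
  end
end

class Point3D
  include StructuralEquality

  def initialize(x, y, z)
    @x, @y, @z = x, y, z
  end
end

foo1 = Point3D.new(1, 2, 3)
foo2 = Point3D.new(1, 2, 3)
p foo1 == foo2   # prints true
h = { foo1 => "ok" }
p h[foo2]        # prints "ok"
```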

Updated by zverok (Victor Shepelev) about 3 years ago

@mame (Yusuke Endoh) I understand your concerns. I'll update the description today or tomorrow to include all the terminology and detailed rationale behind the proposal.

Updated by zverok (Victor Shepelev) about 3 years ago

  • Description updated (diff)

@mame (Yusuke Endoh), @matz (Yukihiro Matsumoto), I updated the description, tried to include a proper rationale for every design decision made.

Updated by naruse (Yui NARUSE) about 3 years ago

I believe that concept is that simple, that nobody will even try to use a gem for representing it with, unless the framework/library used already provides one.

I'm using immutable_struct.gem.

Updated by zverok (Victor Shepelev) about 3 years ago

@naruse (Yui NARUSE) Of course, there are several good gems with more-or-less similar functionality. But, from hard experience, large codebases tend to look with great caution at "small utility" gems to avoid dependency bloat, and tend to depend only on large, non-trivial functionality. But if it is a part of the language core, it is beneficial for everyone.

Updated by Dan0042 (Daniel DeLorme) about 3 years ago

Question: you say "Doesn't think of itself as almost hash" but at the same time you say it should have to_h. Isn't that a contradiction? What exactly are you looking for?

Naming suggestion: BasicStruct (in parallel to Object and BasicObject)

Updated by zverok (Victor Shepelev) about 3 years ago

@Dan0042 (Daniel DeLorme)

Question: you say "Doesn't think of itself as almost hash" but at the same time you say it should have to_h. Isn't that a contradiction?

Nope. An object that has to_h is not "something hash-like", it is just "something that can be represented as a Hash" (for example, to save it to JSON). The same way that all Ruby objects respond to to_s but that doesn't make them "something like String".

But "mimicking" some of the Hash API (with [] and values and values_at) makes the object responsibility less focused.

Updated by Dan0042 (Daniel DeLorme) about 3 years ago

Ok I see what you meant. BTW Struct#values_at follows the Array rather than Hash API, because Struct also thinks of itself as a tuple :-/

 Struct.new(:x).new(42).values_at(0)  #=> [42]
 Struct.new(:x).new(42).values_at(:x) #=> TypeError

Updated by palkan (Vladimir Dementyev) about 3 years ago

zverok (Victor Shepelev) wrote:

Why not a gem?

  • I believe that concept is that simple, that nobody will even try to use a gem for representing it with, unless the framework/library used already provides one.

If a concept is popular and there is a well-designed gem that implements it then people use it. For example, a lot of people use dry-initializer, which is also dead-simple and provides the functionality that could easily be implemented from scratch (and even could be useful as a part of the standard library).

If there is still no such gem, then there is not enough demand for the feature itself.

So, why push it to the core?

Updated by zverok (Victor Shepelev) about 3 years ago

@palkan (Vladimir Dementyev) I have a strong feeling of "value object notion should be a part of the language, not an externally implemented optional thingy", but it is not easy to rationalize it.

Maybe the thing is that "value object" is a notion most useful at API borders (and it is not just utility usability, but a conceptual one: "our API accepts this and that type of value objects and returns this and that type of them"). And I believe "this is a concept of the language" makes a huge difference in using, documenting and explaining your APIs, compared to "well, we use that external gem, developed by some random dude, to bloat our dependencies, because it is a teensy bit more convenient."

In other words, I am proposing to introduce the concept, not implementation.

Updated by Dan0042 (Daniel DeLorme) almost 3 years ago

zverok (Victor Shepelev) wrote:

So, considering all the points above, it could be either multiple settings: immutable: true, enumerable: false, hash_accessors: false

I think that's a great idea. That way it's possible for everyone to mix and match the behavior they want in their structs. For example, let's say I want a struct to be mutable but not enumerable (because of the Array(mystruct) bug shown above); the Struct::Value approach doesn't work. If you find yourself always repeating the same options, it's trivial to write your own ValueStruct helper function.

Or maybe Struct could include a few built-in helpers

  • Struct.Value => immutable: true, enumerable: false, hash_accessors: false
  • Struct.Basic => immutable: false, enumerable: false, hash_accessors: false
  • Struct.Tuple => immutable: false, enumerable: true, hash_accessors: false
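Since Struct.new accepts none of these keywords today, a user-defined helper can only approximate the Value combination by removing methods by hand. A rough sketch, where value_struct and Outcome are hypothetical names and only positional construction is handled:

```ruby
# Hypothetical helper approximating
# immutable: true, enumerable: false, hash_accessors: false.
def value_struct(*members)
  Struct.new(*members) do
    # immutable: drop the generated setters and freeze each instance
    members.each { |name| undef_method(:"#{name}=") }

    def initialize(*)
      super
      freeze
    end

    # not enumerable, no hash accessors: remove the collection-style API
    # (to_h stays: it converts the value without making it hash-like)
    collection_api = Enumerable.instance_methods - [:to_h] +
                     %i(each each_pair to_a values values_at dig [] []=)
    collection_api.uniq.each { |name| undef_method(name) }
  end
end

Outcome = value_struct(:success, :content)
outcome = Outcome.new(true, "it is awesome")

wrapped = Array(outcome)   # one-element array holding the struct itself
```

With to_a undefined, Array(outcome) wraps the object instead of unpacking its members, which is the behavior the original description expected.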

Updated by matz (Yukihiro Matsumoto) almost 3 years ago

I like the idea of helpers in https://bugs.ruby-lang.org/issues/16122#note-18.
We need to discuss further the combination of attributes (immutable, enumerable, etc.)

Matz.

Updated by k0kubun (Takashi Kokubun) over 2 years ago

Updated by Eregon (Benoit Daloze) over 2 years ago

We already have Struct.new(..., keyword_init: true).
I think having other variants like immutable: true, enumerable: false, hash_accessors: false is consistent and flexible.

Having only the helpers like Struct.Value would restrict to a few combinations, and still need to handle keyword_init:.

I think Struct::Value.new could be a nice helper for immutable: true, enumerable: false, hash_accessors: false.
The others seem more specific, less common to use, and I would rather let people choose the configuration they want with keyword arguments for Struct.new().

Implementation-wise and conceptually, I think it's also nicer if Struct::Value.new(...) is implemented as Struct.new(..., immutable: true, enumerable: false, hash_accessors: false).

Updated by Eregon (Benoit Daloze) over 2 years ago

In my view, Struct.new is the perfect example to generate a custom class in Ruby.
I think making it customizable with new keyword arguments is both elegant and simple.

OTOH I think having N "subclasses" with different behaviors invites confusion about what differs between them and forces duplication in implementation code.

Updated by ko1 (Koichi Sasada) 9 months ago

I don't use Enumerable features of Struct classes, but I don't have any trouble by having Enumerable.
Why do you want to remove Enumerable features?
I can not find any benefits.

Updated by zverok (Victor Shepelev) 9 months ago

@ko1 (Koichi Sasada), the initial ticket provides some explanations:

For example, this code snippet shows why to_a is problematic:

Result = Struct.new(:success, :content)

# Now, imagine that other code assumes `data` could be either Result, or [Result, Result, Result]
# So, ...

data = Result.new(true, 'it is awesome')

Array(data) # => expected [Result(true, 'it is awesome')], got [true, 'it is awesome']

# or...
def foo(arg1, arg2 = nil)
  p arg1, arg2
end

foo(*data) # => expected [Result(true, 'it is awesome'), nil], got [true, 'it is awesome']

That's about just #to_a method, but I think that in general, considering duck typing, it is undesirable that the object that the developer thinks of as an "atomic" will be duck-typed as a collection (#respond_to?(:each)). In general, you never know when "is it one thing, or is it an enumeration of things" will be crucial in code, and I think it is important to underline Struct::Value is one thing.

I believe there are good reasons why #each was removed from String, for example.
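The respond_to?(:each) concern can be reproduced with a small sketch. The each_item helper and the Response struct are hypothetical, standing in for any library code that accepts "one thing or a collection of things":

```ruby
# Hypothetical helper: anything responding to #each is treated as a
# collection, everything else as a single item.
def each_item(arg, &block)
  if arg.respond_to?(:each)
    arg.each(&block)
  else
    block.call(arg)
  end
end

Response = Struct.new(:success, :content)

seen = []
each_item(Response.new(true, "it is awesome")) { |item| seen << item }
p seen   # prints [true, "it is awesome"]: the members, not one Response
```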

Updated by ko1 (Koichi Sasada) 9 months ago

zverok (Victor Shepelev) wrote in #note-24:

@ko1 (Koichi Sasada), the initial ticket provides some explanations:

Sorry, I found it just after commenting.

It seems not related to "immutability".

Updated by zverok (Victor Shepelev) 9 months ago

@ko1 (Koichi Sasada)

It seems not related to "immutability".

Yes, I covered this, too (I know it is a large wall of text, sorry!), in Concrete proposal section:

Class API is copying Struct's one (most of the time -- even reuses the implementation), with the following exceptions (note: the immutability is not the only difference)

Updated by Dan0042 (Daniel DeLorme) 9 months ago

matz (Yukihiro Matsumoto) wrote in #note-19:

I like the idea of helpers in https://bugs.ruby-lang.org/issues/16122#note-18.
We need to discuss further the combination of attributes (immutable, enumerable, etc.)

Having helpers would definitely provide a nice easy experience. But since the important thing here is the optional settings, disagreement/bikeshed on the helpers should not prevent or delay a decision on the immutable/enumerable/hash_accessors settings. It should be ok to first decide on those settings, and in a second step decide on the helpers. After all once the settings are available, it's trivial for anyone to define their own helpers.

So regarding those helpers I was thinking of something like Struct.Value(:x, :y) but there's also the Struct::Value.new(:x, :y) syntax that simulates a subclass. Having a Value is the main topic of this ticket, but personally I'm more interested in Basic that behaves more like a simple C struct. It's easier to use and reason about if you don't have to worry about accidental conversion to array and auto-splatting bugs. I'm not particularly attached to Tuple but I thought it was a good name to make it explicit when we want a splattable struct where the ordering of the fields is important, like x,y,z = *tuple.

Updated by mame (Yusuke Endoh) 8 months ago

Discussed on the dev-meeting.

@matz (Yukihiro Matsumoto) is now negative to allow settings. Having various semantics in one Struct class will bring confusion rather than usability. keyword_init settings will be no longer needed after Ruby 3.2. (See #16806 and c956f979e5d05900315d2753d5c3b1389af8dae4)

Instead, he seems positive to provide one strict version of Struct. His current preference is:

  • Has: field reader methods, deconstruct_keys, deconstruct, ==, eql?, hash
  • Does not have: field writer methods like writer=, each and Enumerable, to_a, each_pair, values, [], []=, dig, members, values_at, select, filter, size, to_h

But he couldn't seem to decide on a name. Struct::Value seems acceptable to him, but he wanted to seek a better name. Devs suggested Tuple, NamedTuple, and Record, but none of them seemed to fit for him.

Updated by Eregon (Benoit Daloze) 8 months ago

ValueStruct or ImmutableStruct or FrozenStruct maybe?
ImmutableStruct would probably only make sense if values are made immutable too, which doesn't seem proposed here.

I think the nesting of Struct::Value feels a bit weird, especially with the existing behavior of Struct.new("Foo", :a) defining Struct::Foo.
But not really against it.

Updated by matheusrich (Matheus Richard) 8 months ago

Some more alternatives to get the ideas rolling: Unit and Item (might be paired with Struct)

I also like Box.

Updated by Dan0042 (Daniel DeLorme) 8 months ago

Eregon (Benoit Daloze) wrote in #note-29:

ValueStruct or ImmutableStruct or FrozenStruct maybe?

Those are good ideas. Or to highlight the "pared-down" aspect of this strict version of Struct: SimpleStruct / PlainStruct / BasicStruct (parallel to Object vs BasicObject).

Updated by myronmarston (Myron Marston) 8 months ago

I'm quite fond of this proposal--I basically never use Struct unless I specifically need mutability and have been using the values gem for years, which has a simple implementation of about 100 lines:

https://github.com/tcrayford/Values/blob/master/lib/values.rb

It offers a number of core features that I'd hope any stdlib equivalent would also provide:

  • Instantiation via positional arguments (ValueClass.new(1, 2))
  • Instantiation via keyword arguments (ValueClass.with(foo: 1, bar: 2))
  • Ability to make a copy with one or more attributes updated: value.with(foo: 1)
  • ==/eql?/hash defined for value-based equality semantics
  • Readable to_s/inspect/pretty_print
  • Easy conversion to a hash with to_h

Most engineers I've worked with have referred to this category of objects as "value objects" so I think "Value" in the name is good...but I don't care a whole lot about the name. Kotlin (another language I use) offers a similar feature and calls them data classes:

https://kotlinlang.org/docs/data-classes.html

If this is adopted, it'd also be great to see what stdlib types can be safely ported to build on this type--things like Date/Time/URI, etc. (It may of course be hard or impossible to port these to use the new feature while retaining backwards compatibility.)

Updated by dsisnero (Dominic Sisneros) 8 months ago

+1 -
Also, are there plans to have a flag in C or a different shape so that the VMs can make this fast?

Updated by mame (Yusuke Endoh) about 2 months ago

@nobu (Nobuyoshi Nakada) proposed Data, which used to be a class for extension library authors, but deprecated since ruby 2.5 and removed since 3.0. We might reuse it now.

Summarise the proposed name candidates:

  • Struct::Value
  • ImmutableStruct
  • FrozenStruct
  • Unit
  • Item
  • Box
  • SimpleStruct
  • PlainStruct
  • BasicStruct
  • ValueClass (provided by values gem?)
  • Value (provided by values gem)
  • Data

Updated by mame (Yusuke Endoh) about 2 months ago

BTW, I personally wanted Struct to store the field values simply in instance variables rather than the hidden storage. For example:

FooBar = Struct::Value.new(:foo, :bar)

class FooBar
  def foo_plus_bar
    # These bare "foo" and "bar" are not visually obvious
    # whether they are a method call or local variable access

    foo + bar

    # We can write it as follows, but it is a bit verbose

    self.foo + self.bar

    # If they are stored in instance variables,
    # it is obvious that they are field access
    
    @foo + @bar
  end
end

I know it is impossible to change Struct for compatibility reason, but if we introduce a new Struct-like class, I wonder if we can change this too?

Updated by Eregon (Benoit Daloze) about 2 months ago

mame (Yusuke Endoh) wrote in #note-35:

BTW, I personally wanted Struct to store the field values simply in instance variables rather than the hidden storage.
I know it is impossible to change Struct for compatibility reason, but if we introduce a new Struct-like class, I wonder if we can change this too?

FWIW, TruffleRuby used to use ivars for Struct but changed to "hidden ivars" for compatibility.
Hidden ivars probably adds a bit more flexibility in the implementation but also means e.g. attr_reader can't be used directly to implement Struct::Value.
I'd think it'd be nice if we can share implementation code between Struct and Struct::Value, so it seems best to use the same representation from that POV.

A problem with ivars is Struct allows members which are not valid ivar names (IIRC), so ivars can't be used internally, or Kernel#instance_variables will not necessarily be all Struct::Value attributes.

Updated by k0kubun (Takashi Kokubun) about 2 months ago

My enthusiastic +1 for Data.

I've used Kotlin and its Data classes like @myronmarston (Myron Marston), and I feel calling it a Data class is somewhat accepted by the community. On the other hand, calling it Struct::Value feels like a workaround to avoid a conflict with existing names. I'm not sure if @zverok (Victor Shepelev) likes Data over his own proposal, but note that data appears in his local variable name as well.

Updated by baweaver (Brandon Weaver) about 2 months ago

k0kubun (Takashi Kokubun) wrote in #note-37:

My enthusiastic +1 for Data.

I've used Kotlin and its Data Classes like @myronmarston (Myron Marston), and I feel calling it a Data class is somewhat accepted by the community. On the other hand, calling it Struct::Value feels like a workaround to avoid a conflict with existing names. I'm not sure if @zverok (Victor Shepelev) likes Data over his own proposal, but note that data appears in his local variable name as well.

+1 as well. It's similar to the idea of Case Class in Scala as well, and I think the name Data is reasonable. Happy to see that Struct is looking to deprecate keyword_init in favor of accepting both styles as well, both are welcome changes.

These will be especially useful with pattern matching features

Updated by zverok (Victor Shepelev) about 2 months ago

I'm not sure if @zverok (Victor Shepelev) likes Data over his own proposal, but note that data appears in his local variable name as well.

It is OK, I think, save for some clumsiness when you try to speak in plurals (wrong "datas" or right-yet-not-obvious "datum").

I was never too sure about the name anyway.

If the rest is OK, I'll rebase my PR and update naming on the weekend.

Updated by myronmarston (Myron Marston) about 2 months ago

If “data” is the naming direction folks like, I think the class name should be DataClass. This aligns with kotlin (where data is a keyword before the class keyword) and reads better, IMO: DataClass.new gives you a new class whose purpose is to hold data. Data.new sounds like it gives you a new data which sounds weird.

Updated by k0kubun (Takashi Kokubun) about 2 months ago

I can live with DataClass too, but I still can't forget the beautiful shortness of Data. DataClass.new feels like you wanted to be so right that the name ended up being a bit verbose. To address that point, I thought of Data.class, which looks cool, but I guess such an interface doesn't exist in Ruby yet and DataClass.new is more "correct" in the OOP world.

Updated by mame (Yusuke Endoh) about 2 months ago

At the dev meeting, @matz (Yukihiro Matsumoto) rejected all name candidates except Struct::Value and Data.

  • He wants to avoid the names already used by gems: ImmutableStruct, ValueClass, Value
  • Short common nouns would be conflicting: Unit, Item, Box
  • The main purpose of the new class is immutability, not "frozen", not "plain", not simplicity: FrozenStruct, SimpleStruct, PlainStruct
  • He doesn't plan to make the old Struct inherit from the new one, so BasicStruct like BasicObject is not suitable

Incidentally, my proposal in #note-35 was rejected because an instance variable is weak against a typo. (A misspelled reader method raises NameError, but a misspelled instance variable returns nil implicitly.)
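The difference is easy to reproduce with a plain class. This is a minimal sketch; the FooBar class and the deliberately misspelled names are for illustration only:

```ruby
class FooBar
  attr_reader :foo, :bar

  def initialize(foo, bar)
    @foo, @bar = foo, bar
  end

  # Typo in a reader call: fails loudly with NameError.
  def misspelled_reader
    fooo
  end

  # Typo in an instance variable: silently evaluates to nil.
  def misspelled_ivar
    @fooo
  end
end

obj = FooBar.new(1, 2)
p obj.misspelled_ivar          # prints nil

begin
  obj.misspelled_reader
rescue NameError => e
  puts "raised #{e.class}"     # prints "raised NameError"
end
```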

Updated by mame (Yusuke Endoh) about 2 months ago

This is my personal opinion. I think Data is a good choice since there are few compatibility issues at this time despite the short and simple name. If we were to use a slightly different word for this, such as DataClass, I don't see much point in choosing this word.

Updated by zverok (Victor Shepelev) about 2 months ago

At the dev meeting, @matz (Yukihiro Matsumoto) rejected all name candidates except Struct::Value and Data.

So, as far as I can understand, we only should choose one of two now, right?
I like Struct::Value slightly more, but not to the point of spending one more year discussing it :)

Let's stick with Data then, and I'll prepare the final PR.

Updated by matz (Yukihiro Matsumoto) about 2 months ago

I am not 100% satisfied with any of the candidates, but Struct::Value and Data are better than others.
Struct::Value can cause conflict when someone is using Struct.new("Value", :foo, :bar) (this old-style code creates Struct::Value class).
Data is a little ambiguous, but we can probably get used to it.

Matz.

Updated by zverok (Victor Shepelev) about 2 months ago

Umm wait.

Data is actually a plural form. While using it as singular is acceptable in modern English, in this case we don't have a plural for it.

I believe it will be a problem while writing docs, tutorials and discussing things. "Let's define a data here. Now, let's define some more ???"

Updated by k0kubun (Takashi Kokubun) about 2 months ago

It consists of multiple members, so calling it data itself doesn't seem like a problem to me. For documentation, you could say a data class or data classes.

Updated by Eregon (Benoit Daloze) about 2 months ago

The main purpose of the new class is immutability, not "frozen", not "plain", not simplicity: FrozenStruct, SimpleStruct, PlainStruct

Immutable AFAIK means "deeply frozen", while frozen means "shallow frozen" (= Kernel#freeze/frozen?).
This new struct-like class will not ensure every value is immutable, so it won't guarantee the Struct::Value instance is immutable.
So in terms of documentation and semantics the new class will create frozen but not immutable instances.
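The distinction can be shown with today's Struct and Kernel#freeze. A small sketch (the Tagged struct is a hypothetical example):

```ruby
Tagged = Struct.new(:name, :tags)
item = Tagged.new("post", ["ruby"]).freeze

# Shallow freeze: reassigning a member is rejected...
begin
  item.tags = ["other"]
rescue FrozenError => e
  puts "rejected: #{e.class}"   # prints "rejected: FrozenError"
end

# ...but the object a member refers to can still be mutated.
item.tags << "core"
p item.tags                     # prints ["ruby", "core"]
```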

Regarding the name, I like Data as well.
If we want to avoid the confusion of Data.new returning a class and not an instance of Data, we could have another name for "create a Data subclass with these fields", maybe Data.for(:a, :b) or Data.new_class(:a, :b)/Data.new_subclass(:a, :b) or so.
I have seen many people being confused with Struct.new returning a subclass, so I think it is something worth considering for the new struct-like class.

Updated by matz (Yukihiro Matsumoto) about 1 month ago

We are going to implement Data class in the following spec:

  • D = Data.def(:foo, :bar) to define a data class (subclass of Data)
  • Data.new raises exception (unlike Struct)
  • d = D.new(1,2) to create Data instance
  • Or d = D.new(foo:1, bar:2)
  • D.new(1) or D.new(1,2,3) raises ArgumentError
  • D.new(foo:1) or D.new(foo:1,bar:2,baz:3) raises ArgumentError
  • Instead of D.new(...) you may want to use D[...]
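
The spec above can be tried out as follows. This is a sketch that assumes Ruby 3.2 or later, where the defining method shipped as Data.define rather than Data.def (the rename is agreed on later in this thread). The variable names are illustrative only.

```ruby
# Assumes Ruby 3.2+, where the defining method is Data.define.
D = Data.define(:foo, :bar)

by_position = D.new(1, 2)
by_keyword  = D.new(foo: 1, bar: 2)
by_brackets = D[1, 2]

by_position == by_keyword    # => true
by_position == by_brackets   # => true

# Data itself cannot be instantiated.
data_new_error =
  begin
    Data.new
  rescue NoMethodError => e
    e
  end

# Wrong arity or unknown members are rejected with ArgumentError.
bad_calls = [
  -> { D.new(1) },
  -> { D.new(1, 2, 3) },
  -> { D.new(foo: 1) },
  -> { D.new(foo: 1, bar: 2, baz: 3) },
]
all_rejected = bad_calls.all? do |call|
  begin
    call.call
    false
  rescue ArgumentError
    true
  end
end
```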

We need further discussion regarding the following:

  • default value to initialize Data
  • how to call initialize method for D class
  • whether we will introduce Struct.def

Matz.

Updated by zverok (Victor Shepelev) about 1 month ago

@matz (Yukihiro Matsumoto) Thanks for the decisions!

A few questions:

  1. I am a bit concerned about using def, which is strongly associated with defining methods. I wouldn't want it to be a blocker (it would be really cool to have Data by 3.2), but can we consider our options here? Off the top of my head, I can think of define (also used in method contexts, but less strongly associated with them than def), setup, create or generate.
  2. "default value to initialize Data": I am not sure what you mean by that; can you please elaborate?
  3. "how to call initialize method for D class". What options do we have here? Is there a necessity for deviation from how other classes work?..

Updated by matz (Yukihiro Matsumoto) about 1 month ago

  1. I slightly regret making Struct.new create a subclass rather than an instance, so this time I didn't choose new. create or generate would have a similar issue to new. define might be a candidate. I still prefer a shorter one, but you can try to persuade me.
  2. Sometimes it's handy to fill data members (e.g., foo) with a default value. But we still have no idea how to specify those default values. And if default values are available, there could be an issue with modifying the value (e.g., if an empty array is given as a default value, what happens when it's modified?).
  3. When we allow mixing positional and keyword initializers, implementing initialize may be a bit complex. We may unify the 2 kinds of initializers into a keyword initializer in D#new, but that's an implementation detail and needs to be discussed later.
D=Data.def(:foo, :bar)
class D
  # if arguments to `new` are passed directly
  def initialize(*args,**kwd)
     # ... checking positional and keyword initializers
  end
  # instead
  def initialize(**kwd) # only takes keyword arguments
     # ... initialization becomes much simpler
  end
end
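
The concern in point 2 about a modified default value can be sketched with today's Struct. DEFAULT_TAGS and Entry are hypothetical names used only for this illustration; no default-value API is being proposed here.

```ruby
# Hypothetical default shared by every instance that omits `tags`.
DEFAULT_TAGS = []

Entry = Struct.new(:name, :tags) do
  def initialize(name, tags = DEFAULT_TAGS)
    super
  end
end

first = Entry.new("first")
first.tags << :changed        # mutates the shared default Array

second = Entry.new("second")
second.tags                   # => [:changed]
```

Because both instances reference the same Array, a change made through one leaks into every later instance. A default-value feature would presumably need to copy or freeze the default to avoid this.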

Oh, I forgot. During the discussion, someone came up with an idea of Anonymous Data, data instance without creating a subclass (kinda like (Named)Tuples in other languages). This is another topic.

Matz.

Updated by k0kubun (Takashi Kokubun) about 1 month ago

  • Subject changed from Struct::Value: simple immutable value object to Data: simple immutable value object
  • Status changed from Feedback to Assigned
  • Assignee set to zverok (Victor Shepelev)

Updated by k0kubun (Takashi Kokubun) about 1 month ago

Data.new aside, Data.def and Data.define are my current top-2 preferences. I'd be happy with either of these at this point.

But thinking about this further, I might like Data.define slightly more. It's a trade-off with shortness, but Data.define sounds more natural and it's still short enough, thanks to Data being chosen instead of Struct::Value. Even if it were to be ported to Struct, Struct.define doesn't seem too long either.

Updated by Eregon (Benoit Daloze) about 1 month ago

I like Data.define as well, we are "defining a new subclass of Data".
def makes me think of "define a method" since that's what the keyword of the same name does, but Data.def does not define a method but a class.

Updated by austin (Austin Ziegler) about 1 month ago

Eregon (Benoit Daloze) wrote in #note-54:

I like Data.define as well, we are "defining a new subclass of Data".
def makes me think of "define a method" since that's what the keyword of the same name does, but Data.def does not define a method but a class.

Elixir uses defmodule for module declarations. Why not Data.defclass or Data.deftype or even Data.defshape?

Updated by Eregon (Benoit Daloze) about 1 month ago

austin (Austin Ziegler) wrote in #note-55:

Why not Data.defclass or Data.deftype or even Data.defshape?

Because those do not use a proper Ruby naming (no _ to separate words, aside: looks like Python matplotlib methods), they are longer and read less nicely.
We already have some agreement on Data.define, let's not discuss another year on other names please.

Updated by zverok (Victor Shepelev) about 1 month ago

Disregard that, looked in the wrong place.

Hmm, folks, working on the implementation on the weekend, I am a bit confused. I believe I saw that in 3.2 we decided to make Struct more flexible by accepting positional OR keyword args regardless of keyword_init: true, but currently I can't find anything like that either in master or in the tracker.
Am I misremembering?.. (It is OK by me to implement params handling for Data initialization separately, I just somehow believed the same is already done for Struct)

Updated by RubyBugs (A Nonymous) about 1 month ago

@Eregon (Benoit Daloze) This is great news!

At Panorama Education, we maintain a fork of the tcrayford/values gem for this purpose here: https://github.com/ms-ati/Values/tree/panoramaed-2.0.x

We hope that a Ruby-native solution might address some of the same needs we have in the gem:

1) Copy with changes method

The above gem calls this method #with. Called on an instance, it returns a new instance with only the provided parameters changed.

This API affordance is now widely adopted across languages for its usefulness, because copying with discrete changes is the proper pattern that replaces mutation for immutable value objects, for example:

C# Records: “immutable record structs — Non-destructive mutation” — is called with
https://docs.microsoft.com/en-us/dotnet/csharp/language-reference/builtin-types/record#nondestructive-mutation

Scala Case Classes — is called copy
https://docs.scala-lang.org/tour/case-classes.html

Java 14+ Records — Brian Goetz at Oracle is working on adding a with copy constructor inspired by C# above as we speak:
https://mail.openjdk.org/pipermail/amber-spec-experts/2022-June/003461.html

Rust “Struct Update Syntax” via .. syntax in constructor
https://doc.rust-lang.org/book/ch05-01-defining-structs.html#creating-instances-from-other-instances-with-struct-update-syntax
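
In Ruby terms, a copy-with-changes method could look roughly like the sketch below. It uses a keyword-initialized Struct as a stand-in, and the with helper defined here is hypothetical: it is written for this illustration and is not the gem's implementation.

```ruby
# Hypothetical `with` helper on a keyword-initialized Struct.
Coord = Struct.new(:lat, :lng, keyword_init: true) do
  def with(**changes)
    # Build a new frozen instance from the current members plus the changes.
    self.class.new(**to_h.merge(changes)).freeze
  end
end

home  = Coord.new(lat: 50.0, lng: 36.2).freeze
moved = home.with(lng: 30.5)

moved.lat   # => 50.0
moved.lng   # => 30.5
home.lng    # => 36.2, the original is untouched
```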

Updated by RubyBugs (A Nonymous) about 1 month ago

2. Highly optimized hash and eql?

Because we use value objects in loops, profiling led us to the commits in our fork of tcrayford/values, which optimize these methods, so that using Value objects in Sets and as Hash keys will be as performant as possible.
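
The behavior being optimized is the value-based hash and eql? contract that Hash and Set rely on. A minimal sketch using Struct, which already follows this contract (Key is a made-up name):

```ruby
require "set"

Key = Struct.new(:x, :y)

lookup = { Key.new(1, 2) => "found" }
lookup[Key.new(1, 2)]                        # => "found"

Key.new(1, 2).eql?(Key.new(1, 2))            # => true
Key.new(1, 2).hash == Key.new(1, 2).hash     # => true
Set[Key.new(1, 2), Key.new(1, 2)].size       # => 1
```

Every Hash lookup or Set insertion calls these two methods, which is why their cost matters in tight loops.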

Updated by RubyBugs (A Nonymous) about 1 month ago

3. Keyword arg constructors

Because a non-mutating copy-with-changes such as #with will need to take keyword arguments to indicate which values to change, it’s useful to provide a constructor form which also accepts keyword arguments

Updated by RubyBugs (A Nonymous) about 1 month ago

4. Conversion from and to Hash

For example: a keyword arg constructor accepting (**hsh), and a #to_h method
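
A sketch of that round trip, again using a keyword-initialized Struct as a stand-in (Size is a made-up name):

```ruby
Size = Struct.new(:width, :height, keyword_init: true)

original = Size.new(width: 3, height: 4)
as_hash  = original.to_h            # Hash with :width and :height keys
restored = Size.new(**as_hash)

restored == original                # => true
```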

Updated by k0kubun (Takashi Kokubun) about 1 month ago

RubyBugs (A Nonymous) wrote in #note-58:

We hope that a Ruby-native solution might address some of the same needs we have in the gem:

Can you please file a separate ticket to discuss Data extensions that don't exist in Struct? There could be so many Data extension ideas, such as default values mentioned in #note-49, but discussing all that in a single ticket would make it harder to follow all of such discussions.

zverok (Victor Shepelev) wrote in #note-39:

If the rest is OK, I'll rebase my PR and update naming on the weekend.

Have you filed a PR by the way? It seems like you only attached a patch and didn't file one. While Data.def could still be changed to Data.define (#note-51) depending on how the discussion goes, the code change for it wouldn't be so hard. I think it's nice to confirm its look and feel by actually playing with it early, which would be useful for discussing Data.def vs Data.define as well.

Updated by RubyBugs (A Nonymous) 27 days ago

k0kubun (Takashi Kokubun) wrote in #note-62:

Can you please file a separate ticket to discuss Data extensions that don't exist in Struct? There could be so many Data extension ideas, such as default values mentioned in #note-49, but discussing all that in a single ticket would make it harder to follow all of such discussions.

Thanks @k0kubun (Takashi Kokubun)! I've filed follow-up ticket here for the "Copy with changes" method #with:
https://bugs.ruby-lang.org/issues/19000

Updated by RubyBugs (A Nonymous) 27 days ago

I've filed a 2nd follow-up ticket here for the Symmetric #to_h method whose values can be fed to a keyword-args constructor

Updated by shugo (Shugo Maeda) 26 days ago

If we choose define instead of new, why not use Class.define to return a new immutable Struct-like class?

  • The name Data doesn't imply immutability, so Class.define is OK too.
  • It's clearer that Class.define returns a class.

Updated by k0kubun (Takashi Kokubun) 26 days ago

shugo (Shugo Maeda) wrote in #note-65:

why not use Class.define to return a new immutable Struct-like class?

Given that Matz is also thinking about introducing Struct.define, if we do the immutable one with Class.define, we'd need to also pass immutable: true or immutable: false for either of these, which is longer and harder to use. Otherwise you'd need to have Struct.define for mutable one and Class.define for immutable one, which seems inconsistent.

It's clearer that Class.define returns a class.

I feel Data class is a fairly different concept from normal Class because it has special fields used for comparison and deconstruction. You shouldn't have any mutable thing under Data.define while you're free to do such things in Class.new. To make it easier to notice such difference, it feels cleaner to me to separate Class, Struct, and Data this way.

Updated by Eregon (Benoit Daloze) 25 days ago

Agreed with @k0kubun (Takashi Kokubun). Also Class.define wouldn't make it clear it defines a data class and creates Data (subclass) instances.

Updated by zverok (Victor Shepelev) 25 days ago

Pull request: https://github.com/ruby/ruby/pull/6353

Copying from its description:

Example docs rendering: Data

Design and implementation decisions made:

1. The "define data object" method is called Data.define. As per Matz:

define might be a candidate. But I still prefer shorter one (e.g. def), but you can try to persuade me.

There were a few quite reasonable arguments towards define in that ticket. To add to them, my PoV:

  • I believe that nowadays (adding new APIs to the mature language), it is better to use full English words to remove confusion and increase readability;
  • def is strongly associated in Ruby with "defining a method" and became a separate word in Rubyist's dictionary, not a "generic shortcut for 'define'"
  • I believe that the "definition of the new type" (unlike the definition of a new method) is a situation where clarity is more important than saving 3 chars; and somewhat rarer.

2. new accepts keyword and positional args; they are converted to keyword args there, and checked in initialize

Measure = Data.define(:amount, :unit)
Measure.new(1, 'km') # => OK
Measure.new(amount: 1, unit: 'km') # => OK
Measure.new(1) # ArgumentError
Measure.new(amount: 1) # ArgumentError
Measure.new(1, 'km', 'h') # ArgumentError
Measure.new(amount: 1, unit: 'km', comment: 'slow') #=> ArgumentError

The fact that initialize accepts only keyword args and checks them makes it easy to define custom initialize with defaults, for example (it is all explicitly documented, see link above):

Measure = Data.define(:amount, :unit) do
  def initialize(amount:, unit: '-none-') = super
end

Measure[1] #=> #<data Measure amount=1, unit="-none-">

(This might be enough to avoid inventing a separate API for default values, but this discussion can be postponed.)

3. I didn't introduce any additional APIs yet (e.g. something like with which was discussed). So, the full API of the Data as rendered by RDoc is:
[image: RDoc rendering of the Data class API]

I believe it is enough for the introduction, and then the feedback might be accepted for how to make it more convenient.

4. I wrote custom docs for Data class instead of copy-pasting/editing docs for Struct. I have a strong belief that the approach to docs used is more appropriate:

  1. For separate methods, instead of detailed documenting of every behavior quirk by spelling it in many phrases, I provided the explanation of what it does logically and examples of how it can/should be used.
  2. For the entire class, I don't see a point in the recently introduced formal structure of "What's here" with many nested sections, at least for small classes. It sacrifices the lightweight-yet-enough explanations that can be consumed almost instantly, towards the feeling of "there is a book-worth of information to read," making the documentation user's experience more laborious.

Probably it is not a good place to discuss those approaches in general, but at least for the Data class, I believe the chosen approach is superior. If the core team believes it is not true, I think my changes can be merged and then formal docs provided by team members who consider them valuable.

Updated by Eregon (Benoit Daloze) 25 days ago

Looks good to me.

Regarding overriding initialize and calling super, that would not work if we define an optimized initialize instance method on the subclass by Data.define (it would result in NoMethodError or the slower generic initialize).
That can be worked around on the VM side by defining the optimized initialize in a module, but that's extra overhead/footprint.
It is what is done for Struct though in https://github.com/oracle/truffleruby/blob/master/src/main/ruby/truffleruby/core/struct.rb

In general using keyword arguments for initialize causes a non-trivial overhead (lots of generic Hash operations),
unless initialize is optimized and generated per Data subclass, where it can then use literal keyword arguments which are much better optimized.

So I think it would be best to specialize initialize per subclass.
Concretely we would define initialize on Data subclasses, but not on Data itself. That keeps us the opportunity to specialize initialize per subclass.
This also means, to override initialize with defaults, one would need to use alias to call the original initialize (much better for memory footprint, no extra subclass), or subclass the Data subclass to use super.
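
The alias approach can be sketched with a plain class standing in for a data class whose generated initialize lives directly on the class. Reading and generated_initialize are hypothetical names, and this is not how Data.define is implemented.

```ruby
# Stand-in for a data class whose optimized initialize is defined directly
# on the class itself (hypothetical; not the actual Data implementation).
class Reading
  attr_reader :amount, :unit

  def initialize(amount:, unit:)
    @amount = amount
    @unit = unit
  end
end

# Adding a default without `super`: keep the original under another name.
class Reading
  alias_method :generated_initialize, :initialize

  def initialize(amount:, unit: "-none-")
    generated_initialize(amount: amount, unit: unit)
  end
end

Reading.new(amount: 1).unit              # => "-none-"
Reading.new(amount: 1, unit: "km").unit  # => "km"
```

Since the original method is defined on the same class, there is no ancestor for super to reach, which is why the alias is needed in this arrangement.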

Updated by zverok (Victor Shepelev) 25 days ago

@Eregon (Benoit Daloze) Yeah, those are valuable observations!

The specialized initialize also looks more reasonable to me, actually, but I followed what the Struct does so far.

I am not sure whether we have a C-level API for passing keyword args one-by-one, not by packing to Hash and unpacking back?.. My C-fu might not be strong enough for defining a specialized initializer.

@k0kubun (Takashi Kokubun) @matz (Yukihiro Matsumoto) is it something we want to handle from the beginning? (I assume it might be, as changing it later would break the compatibility for redefined initialize)

Updated by Eregon (Benoit Daloze) 25 days ago

zverok (Victor Shepelev) wrote in #note-70:

I am not sure whether we have a C-level API for passing keyword args one-by-one, not by packing to Hash and unpacking back?.. My C-fu might not be strong enough for defining a specialized initializer.

I think it doesn't need to be done from the start, but to leave the possibility to do it we should define initialize on the subclass and not on Data.
So I suggest to just define it on the subclass and initially it's fine to simply use Hash operations and optimize it later.
For instance it might be easier to make this work through evaling some Ruby code, assuming the member names are valid local variable names.
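
A rough sketch of the eval idea, in pure Ruby: generate an initialize with literal keyword parameters for each new class. define_record is a hypothetical helper written for this illustration, and it only works when every member name is a valid local variable name.

```ruby
# Hypothetical helper, not part of Ruby: builds a frozen record class whose
# initialize is generated from source text with literal keyword parameters.
def define_record(*members)
  Class.new do
    attr_reader(*members)

    params  = members.map { |m| "#{m}:" }.join(", ")
    assigns = members.map { |m| "@#{m} = #{m}" }.join("; ")
    class_eval("def initialize(#{params}); #{assigns}; freeze; end")
  end
end

Record = define_record(:amount, :unit)
record = Record.new(amount: 1, unit: "km")

record.amount    # => 1
record.frozen?   # => true
```

Missing or unknown keywords are rejected by the generated method signature itself, so no generic Hash handling is needed in the initializer body.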

Regarding creating a new Data subclass instance, I wonder if we should support both positional and kwargs,
or if we should only support keyword arguments for simplicity and performance (since anyway we need kwargs for initialize as you said).
We can always add creation with positional arguments after if desired.

Updated by zverok (Victor Shepelev) 25 days ago

Regarding creating a new Data subclass instance, I wonder if we should support both positional and kwargs, or if we should only support keyword arguments

Am I understanding correctly that you propose to only leave Measure.new(amount: 1, unit: 'km') syntax, and ditch Measure.new(1, 'km') one?..

If so, I am positive that we should support both, and I believe that's @matz (Yukihiro Matsumoto) 's position too. I believe that needing to write Measure[amount: 10, unit: 'km'] instead of Measure[10, 'km'] for trivial data classes would be a significant barrier towards adoption of the new feature.

Note that with features that pattern-matching provides, even 1-attribute Data makes sense for strong/expressive typing, e.g. Result[1] vs. Error["brrr"], and adding one more name to write here would be incredibly irritating.
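
A sketch of that pattern-matching use, assuming Ruby 3.2 or later with Data.define. Success, Failure and describe are made-up names for this illustration (the comment above uses Result and Error).

```ruby
# Assumes Ruby 3.2+. One-attribute data classes used as expressive types.
Success = Data.define(:value)
Failure = Data.define(:message)

def describe(outcome)
  case outcome
  in Success(value:) then "ok: #{value}"
  in Failure(message:) then "failed: #{message}"
  end
end

describe(Success[1])        # => "ok: 1"
describe(Failure["brrr"])   # => "failed: brrr"
```

Construction uses the short positional form, while matching goes through the member names.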

So my stance (again, as far as I understand, it is @matz (Yukihiro Matsumoto) 's, too) for new language features is that optimization can be postponed if it is non-trivial, but finding a good and expressive API cannot.

Anyway, it is implemented already :)

Updated by Eregon (Benoit Daloze) 25 days ago

Indeed, that's what I meant. Alright, I guess we need to support positional arguments too then.
Because that's implemented in the subclass .new it should be possible to optimize it pretty well.

Updated by nobu (Nobuyoshi Nakada) 25 days ago

zverok (Victor Shepelev) wrote in #note-68:

Pull request: https://github.com/ruby/ruby/pull/6353

Very nice.

I don't think the keyword_init and "naming as the first argument" features are needed for the new Data. So I guess that splitting the rb_struct_s_def function would be better than extracting it to define_struct with a bool flag.

Updated by zverok (Victor Shepelev) 15 days ago

@nobu (Nobuyoshi Nakada) Thanks!

I've applied all suggestions for the code review, except for this one (I've answered why it is done that way), and define_struct one. My reasoning is this:

  • I am not sure that "naming as a first argument" is a widely used feature, but it seems nice, so I left it for Data too (and tests confirming its existence); I imagine that in some systems, doing Data.define('MyType', :members) might be preferred way;
  • If we leave that aside, the differences due to a bool flag are very small (some 5-6 lines of 70-lines method), so it seems that keeping it all together is the most straightforward.

But please tell me if you disagree, and I'll change the implementation.

Updated by ufuk (Ufuk Kayserilioglu) 15 days ago

zverok (Victor Shepelev) wrote in #note-75:

  • I am not sure that "naming as a first argument" is a widely used feature, but it seems nice, so I left it for Data too (and tests confirming its existence); I imagine that in some systems, doing Data.define('MyType', :members) might be preferred way;

In my opinion, this is a good time to break free from this old API and start with a better design. The fact that the class_name argument defines a constant under Struct is a little too magical, needlessly pollutes the namespace, and leads to name clashes. I would prefer it if Data didn't inherit the same thing from Struct and had a more purpose designed API from the start.

I can also see from @matz (Yukihiro Matsumoto) 's previous message in this thread that he considers that syntax as "old-style". Moreover, the existence of the class_name parameter was one of his reasons against the name Struct::Value. It would have been horrible to not be able to use it if it was the best name for the concept, just because it could clash with someone else's magical struct class. Luckily Data was a better name for it. I'd rather that we don't paint ourselves into similar corners in the future.

matz (Yukihiro Matsumoto) wrote in #note-45:

Struct::Value can cause conflict when someone is using Struct.new("Value", :foo, :bar) (this old-style code creates Struct::Value class).

Updated by nobu (Nobuyoshi Nakada) 14 days ago

As @ufuk (Ufuk Kayserilioglu) wrote 🙏, I don’t think the behavior is worth keeping.
In the case you want a name, you can assign it to a constant.

Updated by zverok (Victor Shepelev) 13 days ago

@ufuk (Ufuk Kayserilioglu) @nobu (Nobuyoshi Nakada) Makes sense, right.
I adjusted the PR and removed the unification into the define_struct method. They are pretty different now (this part probably can be extracted to a common one, but I am not really sure it is necessary)

Updated by zverok (Victor Shepelev) 13 days ago

  • Description updated (diff)

Updated by nobu (Nobuyoshi Nakada) 13 days ago

zverok (Victor Shepelev) wrote in #note-78:

They are pretty different now (this part probably can be extracted to a common one, but I am not really sure it is necessary)

I wonder about the “weird name” members…

Updated by zverok (Victor Shepelev) 13 days ago

I wonder about the “weird name” members…

Oh right. Left a note to self and missed it myself 🤦
I adjusted the tests (though the note was left even before the test_edge_cases method was added, so most of it was tested already).

Updated by matz (Yukihiro Matsumoto) 13 days ago

  • Description updated (diff)

Could you summarize the up-to-date proposed specification of Data class, please?
For the record, I accept define instead of def.

Matz.

Updated by zverok (Victor Shepelev) 10 days ago

  • Description updated (diff)

@matz (Yukihiro Matsumoto) I've updated the ticket text with the description of the implemented API and links.

Thank you!

Updated by ioquatix (Samuel Williams) 10 days ago

I'd like to know how complicated it would be to support an interface like this:

Header = Data.define(:type, :length)

# ...
buffer = IO::Buffer...
header = Header.new

buffer.unpack_into(header, :U16, :U32)
# internally, call rb_iv_set(header, 0, value); rb_iv_set(header, 1, value)

What I'm asking for, is for some kinds of objects, can we consider the attributes to be indexed for efficiently writing into them in order?

If so, can we expose that interface, e.g. rb_iv_set_indexed(VALUE object, int index, VALUE value) or something.

For objects that don't support it, raising an exception would be fine. I imagine, both Struct and Data can support it.

I can make separate issue, but I felt like this was a good place to start the discussion.

Updated by matz (Yukihiro Matsumoto) 9 days ago

@zverok (Victor Shepelev) The summary looks OK. I accepted.

@ioquatix (Samuel Williams) Your proposal should be handled separately. Could you submit a new one?

Matz.

Updated by nobu (Nobuyoshi Nakada) 9 days ago

@zverok (Victor Shepelev) Could you add the (simple) NEWS entry too?

Updated by nobu (Nobuyoshi Nakada) 5 days ago

  • Status changed from Assigned to Closed