Feature #22100: Native Union Types in Ruby

Added by bogdan (Bogdan Gusiev) 5 days ago. Updated 4 days ago.

Status:
Feedback
Assignee:
-
Target version:
-
[ruby-core:125664]

Description

Summary

Add a UnionType class to Ruby's standard library and extend Class#| to
construct one, enabling expressive, composable type-checking syntax throughout
the language.

String | Integer          # => UnionType(Integer | String)
value.is_a?(String | Integer)
case value
when String | Integer then ...
end

Motivation

1. Type-checking sugar that every Ruby developer already writes by hand

Runtime type validation is ubiquitous in Ruby codebases. The current idioms
are verbose and inconsistent:

# Common patterns in the wild today
raise TypeError unless value.is_a?(String) || value.is_a?(Integer)
raise TypeError unless [String, Integer].any? { |t| value.is_a?(t) }
raise TypeError unless String === value || Integer === value

A union type collapses all of these into a single, readable expression:

raise TypeError unless value.is_a?(String | Integer)

This is not a niche use-case. Any method that accepts multiple types — a
common pattern in Ruby's own standard library — benefits immediately:

# Hypothetical standard library
def write(data)
  raise TypeError, "expected String or IO" unless data.is_a?(String | IO)
  ...
end

The case/when integration comes for free because UnionType implements
===, making union branches in case expressions natural and zero-cost to
adopt.
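To make the === mechanism concrete, here is a minimal sketch of how such a class could hook into case/when. It is an illustration only, not the reference gem's code: the `types` reader, the flattening logic, and the `classify` helper are assumptions made for this example. Members are kept in insertion order rather than sorted, and the nil/true/false sugar is left out.

```ruby
# Hypothetical sketch of the proposed class; not the reference implementation.
class UnionType
  attr_reader :types

  def initialize(*types)
    # Flatten nested unions so that (A | B) | C holds three classes.
    @types = types.flat_map { |t| UnionType === t ? t.types : [t] }.uniq.freeze
    freeze
  end

  # case/when calls this method: true if any member class matches the value.
  def ===(value)
    @types.any? { |t| t === value }
  end

  # Allows chaining such as String | Integer | Symbol.
  def |(other)
    UnionType.new(self, other)
  end
end

class Class
  # Relies on Class#| being undefined today, as the proposal states.
  def |(other)
    UnionType.new(self, other)
  end
end

# Hypothetical helper showing union branches in a case expression.
def classify(value)
  case value
  when String | Integer then "scalar"
  when Array | Hash     then "collection"
  else "other"
  end
end

classify("x")   # => "scalar"
classify([1])   # => "collection"
classify(:sym)  # => "other"
```

Note that each `when String | Integer` branch in this sketch builds a new object every time the case expression runs.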

2. RBS and Sorbet already model this concept; Ruby itself should too

Ruby's own type annotation language RBS uses | for union types as
first-class syntax:

def process: (String | Integer) -> void

Sorbet expresses the same idea with T.any:

sig { params(value: T.any(String, Integer)).void }
def process(value) = ...

Both tools have converged on the same semantic. Having the concept in static
annotations but not in runtime Ruby creates a gap: developers must translate
String | Integer from their type signatures into verbose is_a? chains by
hand, and the two can drift out of sync.

Sorbet does not accept the nil literal here and requires the class constant
NilClass instead, and T.nilable wraps only a single type, so a multi-type
nullable needs one of these verbose forms:

T.any(String, Integer, NilClass)     # Sorbet — nil literal not accepted
T.nilable(T.any(String, Integer))    # Sorbet alternative, extra nesting

With a native UnionType the expression stays flat and readable:

String | Integer | nil               # UnionType — matches RBS exactly

Comparison with dry-types sum types. dry-schema uses dry-types' |
operator for multi-type fields:

required(:value).value(Dry::Types['integer'] | Dry::Types['string'])

Dry::Types['integer'] is a Constrained<Nominal<Integer>> object — a
class check with no coercion, semantically equivalent to what UnionType
provides. For already-typed data (parsed JSON, domain objects) a native
UnionType would be a simpler drop-in:

required(:value).value(Integer | String)   # hypothetical, with native UnionType

Construction-time optimization is also worth noting. A UnionType prunes
redundant members at construction: Integer | Numeric collapses to Numeric
immediately, so every subsequent === check is against the minimal set of
classes. User-space code using Array#any? cannot do this without re-running
the deduplication on every call. A native type is also a known, stable shape
that the VM could treat specially in the future — the same path that gave
Integer, Symbol, and true/false their fast paths.
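The pruning step can be sketched in a few lines of current Ruby. The helper name `prune_union_members` is hypothetical and the reference gem may do this differently; the sketch only relies on Module#<, which returns true when the receiver is a strict descendant of the argument.

```ruby
# Hypothetical helper: drop every class that is a strict descendant of
# another member, so only the most general classes remain.
def prune_union_members(classes)
  unique = classes.uniq
  unique.reject do |klass|
    # Module#< is true for strict descendants, false or nil otherwise.
    unique.any? { |other| klass < other }
  end
end

prune_union_members([Integer, Numeric])         # => [Numeric]
prune_union_members([String, Integer, Float])   # => [String, Integer, Float]
```

Because this runs once at construction, later === checks only walk the pruned list.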

3. Config-style type validation is a widespread, unsolved pattern

Many Ruby libraries and frameworks define configuration schemas as plain
hashes, with a :type key holding an array of valid classes:

# ActiveModel-style validators
validates :amount, type: [Integer, Float]

# Schema definitions (dry-schema, Grape, GraphQL-Ruby, etc.)
params do
  requires :id,   type: [String, Integer]
  optional :meta, type: [Hash, NilClass]
end

# Home-grown config validation
SCHEMA = {
  timeout: { type: [Integer, Float],  default: 30 },
  host:    { type: [String, NilClass], default: nil },
}

Today these arrays have no standard protocol. Each library re-implements the
same loop:

Array(config[:type]).any? { |t| value.is_a?(t) }

A UnionType gives this pattern a first-class home. Libraries could accept
either an array or a UnionType transparently via ===, and authors could
write schemas that are self-documenting and immediately executable:

SCHEMA = {
  timeout: { type: Integer | Float,  default: 30 },
  host:    { type: String  | NilClass, default: nil },
}

SCHEMA.each do |key, rule|
  raise TypeError, "#{key} must be #{rule[:type]}" unless rule[:type] === config[key]
end
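On the library side, accepting either an array or a single matcher could look roughly like the sketch below. `type_matches?` and `validate_config` are hypothetical names for this example. The code runs on current Ruby because it only relies on ===, so a UnionType (if one existed) would take the same branch as a plain class.

```ruby
# Hypothetical helper: arrays are checked member by member, anything else
# (a class today, a UnionType under this proposal) is asked via ===.
def type_matches?(type, value)
  if type.is_a?(Array)
    type.any? { |t| t === value }
  else
    type === value
  end
end

# Hypothetical validator: falls back to the rule's default for missing keys.
def validate_config(config, schema)
  schema.each do |key, rule|
    value = config.fetch(key, rule[:default])
    next if type_matches?(rule[:type], value)

    raise TypeError, "#{key} must be one of #{Array(rule[:type]).inspect}, got #{value.class}"
  end
  true
end

schema = {
  timeout: { type: [Integer, Float],   default: 30 },
  host:    { type: [String, NilClass], default: nil },
  retries: { type: Integer,            default: 3 },
}

validate_config({ timeout: 1.5, host: "example.org" }, schema)   # => true
```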

4. Literal-value sugar for the three Ruby singletons

Ruby has exactly three values that are singletons of their own class:
nil (NilClass), true (TrueClass), and false (FalseClass).
Because the literal and the class are interchangeable conceptually, the
| operator accepts all three as shorthand:

String | nil    # => UnionType(String | nil)    same as String | NilClass
String | true   # => UnionType(String | true)   same as String | TrueClass
String | false  # => UnionType(String | false)  same as String | FalseClass

# Common real-world pattern: nullable type
def greet(name)
  raise TypeError unless name.is_a?(String | nil)
  "Hello, #{name || "stranger"}!"
end

These three are the complete set. No other Ruby literal has a distinct
singleton class, so no further sugar is needed or planned.

Footgun note: writing nil | String returns true because NilClass#|
is the boolean OR operator. The sugar only works with the union type on the
left: String | nil. This mirrors how Ruby already treats nil | x today
and is a known trade-off.
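The asymmetry can be checked on current Ruby without any new classes. The same applies to true and false on the left-hand side, since TrueClass#| and FalseClass#| are boolean operators as well. The `results` hash below exists only to collect the values for this example.

```ruby
# Runs on current Ruby: with a singleton on the left, | is boolean OR.
results = {
  "nil | String"   => (nil | String),    # String is truthy
  "false | String" => (false | String),
  "true | String"  => (true | String),   # TrueClass#| always returns true
  "nil | nil"      => (nil | nil),
}

results["nil | String"]   # => true
results["nil | nil"]      # => false
```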

Proposed additions

Addition                         Description
UnionType class                  Immutable value object wrapping a sorted set of classes
Class#|                          Returns UnionType.new(self, other); accepts nil, true, false as sugar
UnionType#===                    Enables case/when
Object#is_a? / kind_of?          Accept UnionType as argument
Object#instance_of?              Accept UnionType as argument
UnionType#&                      Intersection of two union types
UnionType#cover?                 True if a class is covered by the union
UnionType includes Enumerable    Full iteration over member classes

Reference implementation

A working gem implementation is available at
https://github.com/bogdan/ruby-union-type

Compatibility

Class#| is not currently defined in Ruby, so no existing code is broken.
Object#is_a? is extended in a backwards-compatible way: non-UnionType
arguments fall through to the original C implementation.
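The fall-through behaviour can be approximated in pure Ruby with Module#prepend. This is only a sketch: `UnionStandIn` and `UnionAwareIsA` are hypothetical names, the stand-in class implements nothing but ===, and kind_of? and instance_of? are left untouched. A real implementation would presumably live in C.

```ruby
# Hypothetical stand-in for the proposed UnionType; only === is needed here.
class UnionStandIn
  def initialize(*types)
    @types = types
  end

  def ===(value)
    @types.any? { |t| t === value }
  end
end

# Hypothetical extension, prepended so that `super` reaches the built-in is_a?.
module UnionAwareIsA
  def is_a?(type)
    # Module#=== is used for this check to avoid re-entering is_a? itself.
    return type === self if UnionStandIn === type

    super
  end
end

Object.prepend(UnionAwareIsA)

string_or_integer = UnionStandIn.new(String, Integer)

1.is_a?(string_or_integer)     # => true
:sym.is_a?(string_or_integer)  # => false
1.is_a?(Numeric)               # => true, handled by the original method
```

Arguments that are neither a class, a module, nor the stand-in still reach the original method and raise TypeError as before.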

Updated by nobu (Nobuyoshi Nakada) 4 days ago Actions #1 [ruby-core:125678]

  • Status changed from Open to Feedback

bogdan (Bogdan Gusiev) wrote:

case value
when String | Integer then ...
end

Why not?

case value
when String, Integer then ...
end

Even if you prefer is_a?, why not just extend this method to accept multiple arguments?

value.is_a?(String, Integer)

Updated by zverok (Victor Shepelev) 4 days ago Actions #2 [ruby-core:125680]

As String | Integer already works in pattern matching, it seems to cover most of the proposed cases:

# instead of:
raise TypeError unless value.is_a?(String) || value.is_a?(Integer)
raise TypeError unless [String, Integer].any? { |t| value.is_a?(t) }
raise TypeError unless String === value || Integer === value
# we can write:
raise TypeError unless value in String | Integer
# or just:
value => String | Integer # raises NoMatchingPatternError if not matched

# instead of
case value
when String | Integer

# you can just write
case value
in String | Integer

...and so on.

The only drawback of that approach is that patterns aren't values and therefore can't be put in variables/constants or produced dynamically from the "list of types" argument. So

# Instead of this
SCHEMA = {
  timeout: { type: [Integer, Float],  default: 30 },
  host:    { type: [String, NilClass], default: nil },
}
# The possible approach is this:
SCHEMA = {
  timeout: { type: -> { it in Integer | Float },  default: 30 },
  host:    { type: -> { it in String | NilClass }, default: nil },
}

...and when the type list is dynamic, there is no way of turning it into a pattern, to the best of my understanding.

I remember some discussions about that (can patterns be produced and stored as regular values) after the pattern-matching introduction, but, to the best of my understanding, no way forward was discovered yet.

Updated by byroot (Jean Boussier) 4 days ago Actions #3 [ruby-core:125685]

if you prefer is_a?, why not just extend this method to accept multiple arguments?

That is something I wanted many times.

UnionType

My concern with this proposal is that it's purely a dynamic/runtime declaration, with little to no way for the compiler to constantize these.

So:

raise TypeError unless value.is_a?(Integer | Float | nil | false)

Will have to allocate multiple UnionType objects on every execution: first UnionType(Integer, Float), then UnionType(Integer, Float, nil), and finally UnionType(Integer, Float, nil, false). That is 3 allocations and 4 method calls for something that is functionally a constant.

And given this is intended for type checking, it would likely end up used in lots of code, making Ruby even more allocation heavy than it already is.

There might be some trickery we could pull in the compiler to try to optimize/cache this, but we'd need to ensure #| hasn't been redefined, etc.

Updated by bogdan (Bogdan Gusiev) 4 days ago Actions #4 [ruby-core:125700]

Why not?

case value
when String, Integer then ...
end

Even if you prefer is_a?, why not just extend this method to accept multiple arguments?

value.is_a?(String, Integer)

That's a fair argument. This proposal only makes sense if we plan a better future for type hinting.

I imagine the following happening too:

Element = Integer | String | nil
value.is_a?(Element | Array[Element] | Hash[String | Symbol, Element])

This has established itself as a convenient schema-definition notation, e.g. in TypeScript.
On the opposite side, we have the same concept in Sorbet, with a syntax so complex that I feel sad when I use it:

Element = T.type_alias { T.nilable(T.any(Integer, String)) }

value.is_a?(
  T.any(
    Element,
    T::Array[Element],
    T::Hash[T.any(String, Symbol), Element]
  )
)

However, I can imagine this going much further with:

DefaultUrlOptions = UnionType[Hash[:host | :port | :protocol, Object]]
# OR
DefaultUrlOptions = UnionType[{host: String, port: String | Integer | nil, protocol: String | nil}]

As String | Integer already works in pattern matching, it seems to cover most of the proposed cases:

I had never heard of this feature, and it seems to do exactly that, apart from the runtime downside you mentioned:

The only drawback of that approach is that patterns aren't values and therefore can't be put in variables/constants or produced dynamically from the "list of types" argument.

I believe this is what we actually need: pattern-matching expressions are already very powerful, and there is no need to invent yet another syntax for the same thing. https://gist.github.com/bogdan/686880702176ba0a0fe8f148ad2576b4

I remember some discussions about that (can patterns be produced and stored as regular values) after the pattern-matching introduction, but, to the best of my understanding, no way forward was discovered yet.

It would be very interesting to read the discussion on that topic if you can point me to it.

My concern with this proposal is that it's purely a dynamic/runtime declaration, with little to no way for the compiler to constantize these.

That's a pretty valid concern. I don't know how much those allocations would cost, but I will trust your expertise that it would be significant.
