Feature #19272

closed

Hash#merge: smarter protocol depending on passed block arity

Added by zverok (Victor Shepelev) almost 2 years ago. Updated over 1 year ago.

Status:
Rejected
Assignee:
-
Target version:
-
[ruby-core:111461]

Description

Usage of Hash#merge with a "conflict resolution block" is almost always clumsy: because the block accepts |key, old_val, new_val| arguments, and many trivial usages just combine the old and new values in some way, something that should be "intuitively trivial" becomes longer than it should be:

# I just want a sum!
{apples: 1, oranges: 2}.merge(apples: 3, bananas: 5) { |_, o, n| o + n }

# I just want a group!
{words: %w[I just]}.merge(words: %w[want a group]) { |_, o, n| [*o, *n] }

# I just want to unify flags!
{'file1' => File::READABLE, 'file2' => File::READABLE | File::WRITABLE}
  .merge('file1' => File::WRITABLE) { |_, o, n| o | n }

# ...or, vice versa:
{'file1' => File::READABLE, 'file2' => File::READABLE | File::WRITABLE}
  .merge('file1' => File::WRITABLE, 'file2' => File::WRITABLE) { |_, o, n| o & n }

It is especially noticeable in the last two examples, but the general problem is that there is too much "unnecessary" punctuation, in which the essential part can get lost.

There are proposals like #19148 that struggle to define another method (what would the name be? isn't it just merging?).

But I've been thinking, can't the implementation be chosen based on the arity of the passed block?.. Prototype:

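class Hash
  # Keep the original implementation reachable under another name.
  alias old_merge merge

  # Accept any number of hashes, as the built-in Hash#merge does.
  def merge(*others, &block)
    return old_merge(*others) unless block
    if block.arity.abs == 2
      # Arity 2 or -2: call the block with (old_value, new_value) only.
      old_merge(*others) { |_, o, n| block.call(o, n) }
    else
      # Any other arity: the classic (key, old_value, new_value) protocol.
      old_merge(*others, &block)
    end
  end
end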

I.e.: if, and only if, the passed block is of arity 2 (the prototype checks block.arity.abs == 2, so -2 also counts), treat it as an operation on old and new values. Otherwise, proceed as before (maintaining backward compatibility).

Usage:

{apples: 1, oranges: 2}.merge(apples: 3, bananas: 5, &:+)
#=> {:apples=>4, :oranges=>2, :bananas=>5}

{words: %w[I just]}.merge(words: %w[want a group], &:concat)
#=> {:words=>["I", "just", "want", "a", "group"]}

{'file1' => File::READABLE, 'file2' => File::READABLE | File::WRITABLE}
  .merge('file1' => File::WRITABLE, &:|)
#=> {"file1"=>5, "file2"=>5}

{'file1' => File::READABLE, 'file2' => File::READABLE | File::WRITABLE}
  .merge('file1' => File::WRITABLE, 'file2' => File::WRITABLE, &:&)
#=> {"file1"=>0, "file2"=>4}

# If necessary, the old protocol still works:
{apples: 1, oranges: 2}.merge(apples: 3, bananas: 5) { |k, o, n| k == :apples ? 0 : o + n }
# => {:apples=>0, :oranges=>2, :bananas=>5}

As far as I can remember, Ruby core doesn't have methods like this (that change implementation depending on the arity of the passed callable), but I think I have seen this approach in other languages. I can't remember particular examples, but I have always found this idea appealing.

Updated by sawa (Tsuyoshi Sawada) almost 2 years ago

Using numbered parameters, we can do slightly better:

{apples: 1, oranges: 2}.merge({apples: 3, bananas: 5}){_2 + _3}

although I am neutral about the proposal.

Updated by zverok (Victor Shepelev) almost 2 years ago

@sawa (Tsuyoshi Sawada) I didn't mention the solution with numbered parameters because I believe it to be even more cryptic than the one with named parameters.

The reader needs to remember at all times what the protocol of the merge block is (merge with a block is not used every day, so it is not a given) and what that first argument we are ignoring was.

With named parameters, we can at least give a hint (in some codebases, I use _k, o, n, which is more like a "note to self"; in others, I prefer _key, oldval, newval or something like that).

Updated by nobu (Nobuyoshi Nakada) over 1 year ago

zverok (Victor Shepelev) wrote:

E.g.: If, and only if, the passed block is of arity 2, treat it as an operation on old and new values. Otherwise, proceed as before (maintaining backward compatibility.)

Usage:

{apples: 1, oranges: 2}.merge(apples: 3, bananas: 5, &:+)
#=> {:apples=>4, :oranges=>2, :bananas=>5}

:+.to_proc is a proc that just calls the + method on the first argument, passing the rest.
That means its arity is not deterministic.

{words: %w[I just]}.merge(words: %w[want a group], &:concat)
#=> {:words=>["I", "just", "want", "a", "group"]}

In this example, you expect Array#concat to be called on the old values, but the arity of Array#concat is -1, not 2.
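The two arities being compared here can be checked directly. This is only a sketch, assuming Ruby 3.0 or later (where Symbol#to_proc returns a lambda); the variable names are illustrative:

```ruby
# Arity of the method itself: Array#concat accepts any number of arrays.
concat_method_arity = Array.instance_method(:concat).arity

# Arity of the proc that &:concat actually hands to merge:
# one required argument (the receiver) followed by a rest argument.
concat_proc = :concat.to_proc
concat_proc_arity = concat_proc.arity

puts "Array#concat method arity: #{concat_method_arity}" # -1
puts "(&:concat) proc arity:     #{concat_proc_arity}"   # -2 on Ruby 3.0+
puts "(&:concat) is a lambda:    #{concat_proc.lambda?}"
```

So the number a method reports and the number the corresponding symbol proc reports are not the same thing, and neither of them is 2.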

Updated by zverok (Victor Shepelev) over 1 year ago

@nobu (Nobuyoshi Nakada) All of my examples work with my reference implementation. You can try it yourself.

:any_symbol.to_proc.arity is -2, corresponding to the following lambda:

->(first, *rest) { first.send(symbol, *rest) }

The behavior corresponds, too:

def fake_to_proc(symbol) = ->(first, *rest) { first.send(symbol, *rest) }

:+.to_proc.arity #=> -2
fake_to_proc(:+).arity #=> -2

:+.to_proc.parameters       #=> [[:req], [:rest]]
fake_to_proc(:+).parameters #=> [[:req, :first], [:rest, :rest]]

:+.to_proc.call(1)
# `+': wrong number of arguments (given 0, expected 1) (ArgumentError) -- on handling +, not calling the lambda
fake_to_proc(:+).call(1)
# `+': wrong number of arguments (given 0, expected 1) (ArgumentError)

:+.to_proc.call(1, 2)       #=> 3
fake_to_proc(:+).call(1, 2) #=> 3

Therefore:

  • Any :+.to_proc.arity is -2
  • Which is not a bug/accident, but a proper reporting of arity/parameters
  • Which actually made me think about this idea with merge :)
  • Which works with the reference implementation.

Updated by nobu (Nobuyoshi Nakada) over 1 year ago

zverok (Victor Shepelev) wrote in #note-7:

  • Any :+.to_proc.arity is -2
  • Which is not a bug/accident, but a proper reporting of arity/parameters

That -2 means just unlimited.

  • Which actually made me think about this idea with merge :)

.abs == 2? 😅

Updated by zverok (Victor Shepelev) over 1 year ago

That -2 means just unlimited.

Well, it is obviously not my call to decide what it means, but I interpret it as "2 explicitly declared params (plus some unpacking probably happening)". I mean, it is not exactly the same as -1 or -3, right?..

So I believe it is a good enough heuristic for this case because when somebody provides an old-style block, its arity would be:

proc { |key, oldval, newval| }.arity #=> 3

I.e., definitely not 2 or -2.

So, yeah, arity.abs == 2 is a lousy heuristic, but my estimate is that it should be enough to provide a reasonable distinction and to simplify the most common cases.
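To see what the heuristic would select, here is a rough sketch that prints the arity of a few block shapes, assuming Ruby 3.0 or later. The helper values_only_protocol? is hypothetical and only mirrors the prototype's block.arity.abs == 2 check:

```ruby
# Hypothetical helper mirroring the prototype's check (block.arity.abs == 2).
def values_only_protocol?(callable)
  callable.arity.abs == 2
end

# A few block shapes, keyed by how they would be written at the call site.
samples = {
  "proc { |key, oldval, newval| }" => proc { |key, oldval, newval| },
  "proc { |oldval, newval| }"      => proc { |oldval, newval| },
  "proc { _1 + _2 }"               => proc { _1 + _2 },
  "proc { _2 + _3 }"               => proc { _2 + _3 },
  ":+.to_proc"                     => :+.to_proc,
  "proc { |first, *rest| }"        => proc { |first, *rest| },
  "proc { |*args| }"               => proc { |*args| },
}

# Print each shape with its arity and whether the heuristic would pick it.
samples.each do |label, callable|
  puts format("%-32s arity=%2d  values-only? %s",
              label, callable.arity, values_only_protocol?(callable))
end
```

The old-style three-parameter block and { _2 + _3 } report 3, and |*args| reports -1, so those keep the existing protocol; the two-parameter shapes, the symbol proc, and |first, *rest| report 2 or -2 and would be switched to the values-only call.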

Updated by Eregon (Benoit Daloze) over 1 year ago

-2 means one required argument plus a rest argument (e.g. p method(def m(a,*); end).arity prints -2).

I think using this new behavior for -2 is too hacky.

For arity == 2, it seems more reasonable, and the examples above could use _1 + _2, etc.
Although changing for arity 2 could break code like a.merge(b) { |k,old| old }.
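A small sketch of that breakage, under the assumption that the new protocol applies to arity == 2 only. The method merge_by_arity is a hypothetical stand-alone version of the proposed dispatch, written as a plain method so that Hash#merge itself is left untouched:

```ruby
# Hypothetical stand-alone version of the proposed dispatch.
def merge_by_arity(hash, other, &block)
  return hash.merge(other) unless block
  if block.arity == 2
    # Proposed protocol: the block sees only (old_value, new_value).
    hash.merge(other) { |_key, old_val, new_val| block.call(old_val, new_val) }
  else
    # Existing protocol: the block sees (key, old_value, new_value).
    hash.merge(other, &block)
  end
end

a = { x: 1 }
b = { x: 2 }

# Today the block receives (key, old, new) and silently drops the third
# argument, so the parameter named `old` really is the old value.
current = a.merge(b) { |k, old| old }

# Under the proposal the same block would receive (old, new), so the
# parameter named `old` is bound to the new value instead.
proposed = merge_by_arity(a, b) { |k, old| old }

puts current[:x]  # 1
puts proposed[:x] # 2
```

The block text is identical in both calls, yet the result silently flips from "keep the old value" to "keep the new value".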

Updated by matz (Yukihiro Matsumoto) over 1 year ago

  • Status changed from Open to Rejected

It looks nice at first sight but may cause a compatibility issue, as @Eregon (Benoit Daloze) mentioned.

Matz.
