Feature #20300
openHash: set value and get pre-existing value in one call
Description
When using a Hash, sometimes you want to set a new value, and see what was already there. Today, you have to do this in two steps:
h = { k: "old value" }
# 1. Do a look-up for `:k`.
old_value = h[:k]
# 2. Do another look-up for `:k`, even though we just did that!
h[:k] = "new value"
use(old_value)
This requires two separate Hash
look-ups for :k
. This is fine for symbols, but is expensive if computing #hash
or #eql?
is expensive for the key. It's impossible to work around this today from pure Ruby code.
One example use case is Set#add?
. See https://bugs.ruby-lang.org/issues/20301 for more details.
I propose adding Hash#exchange_value
, which has semantics are similar to this Ruby snippet:
class Hash
# Exact method name TBD.
def exchange_value(key, new_value)
old_value = self[key]
self[key] = new_value
old_value
end
end
... except it'll be implemented in C, with modifications to tbl_update
that achieves this with a hash-lookup.
I'm opening to alternative name suggestions. @nobu (Nobuyoshi Nakada) came up with exchange_value
, which I think is great.
Here's a PR with a PoC implementation: https://github.com/ruby/ruby/pull/10092
h = { k: "old value" }
# Does only a single hash look-up
old_value = h.exchange_value(:k, "new value")
use(old_value)
Updated by AMomchilov (Alexander Momchilov) 8 months ago
- Description updated (diff)
Updated by AMomchilov (Alexander Momchilov) 8 months ago
- Description updated (diff)
Updated by AMomchilov (Alexander Momchilov) 8 months ago
- Description updated (diff)
Updated by AMomchilov (Alexander Momchilov) 8 months ago
- Description updated (diff)
Updated by nobu (Nobuyoshi Nakada) 8 months ago
The name update_value
doesn't feel to return the previous value, to me.
What about Hash#exchange_value
?
Updated by Eregon (Benoit Daloze) 8 months ago
FWIW, concurrent-ruby calls this get_and_set
: https://ruby-concurrency.github.io/concurrent-ruby/master/Concurrent/Map.html#get_and_set-instance_method
Hash#exchange_value
sounds fine to me too.
Updated by rubyFeedback (robert heiler) 8 months ago · Edited
Not so sure about these names. Note that get_and_set() may have people wonder why it is not set_and_get(). :D
In regards to .update_value() people could ask why the old value is returned rather than the new one.
I guess exchange_value is a bit more specific than the other variants, as people can say "I put in value x, as the new value, and get, in exchange, value y which was the old one".
I don't have a better name suggestion though, but just from the alternatives so far I think nobu's suggestion is at the least better than the other ones, in my opinion.
I don't have any particular opinion on the suggested functionality; in my own code I kind of keep setters and getters usually separate, or, e. g. first set and then get (or get first, assign to a variable, then set to a new value). It would be nice if we could have some kind of "universal idiom" for the desired functionality though - could make it easier if different programming languages, at the least OOP-centric ones, would use the same name/terminology for that. That would make naming things easier too, or at the least more consistent across programming languages, just like setters and getters are fairly well-established idioms in OOP-centric languages (ruby is kind of multi-paradigm; I don't know how functional programming languages solve this issue of setting and obtaining a new or old value in one go, so I can't comment on that).
Updated by Eregon (Benoit Daloze) 8 months ago
- Related to Bug #20301: `Set#add?` does two hash look-ups added
Updated by Eregon (Benoit Daloze) 8 months ago
- Related to Feature #17342: Hash#fetch_set added
Updated by MaxLap (Maxime Lapointe) 8 months ago
In my mind, get_and_set
is clear as it is in the order the operations are done.
But if you want clearer, get_then_set
puts emphasis on the order.
Things like exchange
and swap
can sound like swapping the value between two keys.
Honestly, I'm not sure I ever needed this. If fetch_set
was refused, which I frequently need, this likely won't pass either.
Updated by matheusrich (Matheus Richard) 8 months ago
Rust calls this method insert
:
Inserts a key-value pair into the map.
If the map did not have this key present, None is returned.
If the map did have this key present, the value is updated, and the old value is returned.
let mut map = HashMap::new(); assert_eq!(map.insert(37, "a"), None); assert_eq!(map.is_empty(), false); map.insert(37, "b"); assert_eq!(map.insert(37, "c"), Some("b")); assert_eq!(map[&37], "c");
Updated by AMomchilov (Alexander Momchilov) 8 months ago
@matheusrich (Matheus Richard) I like that name in general, but it's really similar to the existing Hash#store
and the distinction is non-obvious.
Swift calls this updateValue(_:forKey:)
, which returns a Value?
("the value that was replaced, or nil
if a new key-value pair was added."), but it has a similar problem.
I think #exchange_value
is the best recommendation yet, because it makes it reasonably easy to guess what the return value would be, even without reading the docs.
Updated by matheusrich (Matheus Richard) 8 months ago
Some other examples from other languages:
-
Java and Kotlin call this
put
. -
Groovy calls it
putAt
-
Elixir has an interesting
get_and_update
, which is very descriptive, but also accepts a function argument, which can be used to operate on the old value before returning it. I could see us supporting both cases:
hash.get_and_update(:key, :new_value)
hash.get_and_update(:key) { |old| old.upcase }
-
Clojure does something similar with
update-in
. -
Racket calls it
hash-update
. -
Haskell calls this
insertLookupWithKey
😅
I like exchange_value
too. While a bit more verbose, having the _value
suffix opens the opportunity to add a similar method for keys in the future
Another suggestion is replace_value
, since we already have replace
on hash.
Updated by AMomchilov (Alexander Momchilov) 8 months ago · Edited
@matheusrich (Matheus Richard) Thanks for putting together that list!
In general, I like the idea of having this method support an optional block, but we would need to benchmark it to ensure it doesn't have an undesired slow down in the exchange_value(:key, :new_value)
case.
Really, this API's main goal is performance (since it doesn't do anything you couldn't already do with a combination of #has_key?
, #[]
and #[]=
), so if an optional block arg slows that down, it would entirely defeat its benefit. That can still exist as an option, it would just need to be a different method.
Updated by matheusrich (Matheus Richard) 8 months ago
#get_and_transform
could be a name for the version with the block, if performance is impacted significantly.
Updated by AMomchilov (Alexander Momchilov) 8 months ago
- Description updated (diff)
Updated by nobu (Nobuyoshi Nakada) 8 months ago
I want ENV.exchange_value
rather than Hash#exchange_value
.
Updated by mame (Yusuke Endoh) 8 months ago
Discussed at developers meeting. @shyouhei (Shyouhei Urabe) pointed out that Set#add?
could be improved with a single lookup by looking at the change in size, without this proposal.
https://bugs.ruby-lang.org/issues/20301#note-8
@matz (Yukihiro Matsumoto) said that we need another good use case to introduce this if it is sufficient for Set#add?
.
Updated by Eregon (Benoit Daloze) 8 months ago · Edited
That implementation is incorrect (i.e. less correct than current implementation), and Hash#exchange_value
is needed to be thread-safe for Set, see https://bugs.ruby-lang.org/issues/20301.
Updated by Eregon (Benoit Daloze) 8 months ago · Edited
nobu (Nobuyoshi Nakada) wrote in #note-17:
I want
ENV.exchange_value
rather thanHash#exchange_value
.
Agreed it's a generally useful thing (e.g. to swap an env var and later restore it).
So I think we should add it for both Hash and ENV (ENV has all Hash
methods anyway).
Updated by AMomchilov (Alexander Momchilov) 8 months ago
nobu (Nobuyoshi Nakada) wrote in #note-17:
I want
ENV.exchange_value
rather thanHash#exchange_value
.
Is there a reason to put it on ENV
, but not on Hash
? That seems like a needless restriction to me.
Updated by shyouhei (Shyouhei Urabe) 8 months ago
I'm neutral to this particular method (could be handy). If thread safety is in mind perhaps atomic compare-and-exchange could be better than just exchange, though.
Updated by nobu (Nobuyoshi Nakada) 7 months ago · Edited
AMomchilov (Alexander Momchilov) wrote in #note-21:
nobu (Nobuyoshi Nakada) wrote in #note-17:
I want
ENV.exchange_value
rather thanHash#exchange_value
.Is there a reason to put it on
ENV
, but not onHash
? That seems like a needless restriction to me.
ENV
is a hash-like object, so I'm for Hash#exchange_value
too.
Updated by mame (Yusuke Endoh) 7 months ago
@matz (Yukihiro Matsumoto) said he was not sure what a name is good for this method because its true motivation is unclear.
It was originally intended as a method to improve the efficiency of Set#add?
, but the use case was shifted to a method for thread safety. This history makes the use case less persuasive.
If the main purpose is thread safety, we want to respect the terminology of the parallel computing area. If it is just an internal method for efficiency, a long and verbose name may be preferred. (If it is a daily-use method, a short and convenient name may be preferred.)
You may want to explain the concrete example of the use case of thread safety for Hash value exchange, this proposed API is really sufficient for that example use case, and what API and name are given to similar feature in other languages.
Updated by AMomchilov (Alexander Momchilov) 6 months ago
mame (Yusuke Endoh) wrote in #note-24:
It was originally intended as a method to improve the efficiency of
Set#add?
, but the use case was shifted to a method for thread safety. This history makes the use case less persuasive.
The immediate need and use case is for that performance improvement to Set.
The discussion of thread safety is just an incidental benefit of this approach, as compared to the count-based alternative that was proposed.
If it is just an internal method for efficiency, a long and verbose name may be preferred.
I would be okay with this, if there’s no appetite for a public, short-named API for this. Perhaps some underscored name that communicates that this is “library internal” to the stdlib.
You may want to explain the concrete example of the use case of thread safety for Hash value exchange, this proposed API is really sufficient for that example use case,
I don’t think I can make a better case than the concrete implementation I’ve already proposed, and its immediate application to the Set performance problem I’ve identified and fixed, using this API.
If others want to chime in with the use cases in they have in mind, I’d love to hear those.
and what API and name are given to similar feature in other languages.
Let’s first decide if we want this API, in any form. Once we agree on that, I’d be happy to dive into details like naming, and comparisons to other languages.