Feature #17342
closedHash#fetch_set
Added by MaxLap (Maxime Lapointe) almost 4 years ago. Updated 9 months ago.
Description
I would like to propose adding the fetch_set
method to Hash
. It behaves just like fetch
, but when using the default value (2nd argument or the block), it also sets the value in the Hash for the given key.
We often use the pattern cache[key] ||= calculation
. This pattern however has a problem when the calculation could return false or nil, as in those case, the calculation is repeated each time.
I believe the best practice in that case is:
cache.fetch(key) { cache[key] = calculation }
With my suggestion, it would be:
cache.fetch_set(key) { calculation }
In these examples, each part is very short, so the fetch
case is still clean. But as each part gets longer, the need to repeat cache[key] becomes more friction.
Here is a more realistic example:
# Also using the key argument to the block to avoid repeating the
# long symbol, adding some indirection
RequestStore.store.fetch(:monitor_value_is_delayed?) do |key|
RequestStore.store[key] = !MonitorValue.where('date >= ?', Time.now - 5.minutes).exists?
end
RequestStore.store.fetch_set(:monitor_value_is_delayed?) do
!MonitorValue.where('date >= ?', Time.now - 5.minutes).exists?
end
There is a precedent for such a method: Python has it, but with a quite confusing name: setdefault(key, default_value)
. This does not set a default for the whole dictionary as the name would make you think, it really just does what is proposed here. https://docs.python.org/3/library/stdtypes.html#dict.setdefault
Updated by chrisseaton (Chris Seaton) almost 4 years ago
Thanks I've always wanted this feature. Whenever I write cache.fetch(key) { cache[key] = calculation }
(or more often cache.fetch(key) { |k| cache[k] = calculation }
) I think 'surely this pattern is worth making simpler.'
Updated by duerst (Martin Dürst) almost 4 years ago
I think the feature in general is okay, but I have two concerns:
-
The name very easily suggests that the method is fetching a set, rather than fetching and setting at the same time.
-
Why do we need a block for the second parameter? Can't that just be an ordinary parameter?
Updated by chrisseaton (Chris Seaton) almost 4 years ago
#fetch_or_set
could be a good name.
Why do we need a block for the second parameter? Can't that just be an ordinary parameter?
Because we don't want to do the work of calculating the initial value if it isn't needed because the value is already set.
Updated by phluid61 (Matthew Kerwin) almost 4 years ago
chrisseaton (Chris Seaton) wrote in #note-3:
#fetch_or_set
could be a good name.Why do we need a block for the second parameter? Can't that just be an ordinary parameter?
Because we don't want to do the work of calculating the initial value if it isn't needed because the value is already set.
I can see the utilitiy, it would be good if it had a similar signature to #fetch :
hsh = {}
hsh.fetch_or_set(:a, 1) # => 1, hsh = {:a => 1}
hsh.fetch_or_set(:b) {|key| key.to_s } # => "b", hsh = {:a => 1, :b => "b"}
The only reason I don't suggest it as a keyword argument to #fetch is because of the edge of case of including the kwarg and not including the default value/block, otherwise hsh.fetch(:a, 1, store_default: true)
would be okay.
Updated by MaxLap (Maxime Lapointe) almost 4 years ago
I didn't put an example using 2 parameters instead of a block, but yes, that option is also available. It has the same signature as a normal fetch
would. In the examples, calculation was meant to be a complex thing, possibly a method call, which we want to avoid doing if the key is already in the Hash.
I understand the idea that it could be about fetching a set, but I don't see what set that would be. So it doesn't feel ambiguous to me. I did consider the fetch_or_set
alternative, but core methods normally prefer to be shorter when the general idea is pretty clear. But it's not a bad name and I would still be happy to have the feature if that was it.
Updated by nobu (Nobuyoshi Nakada) almost 4 years ago
XXX_or_YYY doesn't feel a good name in general.
Why does it need to be a method of Hash
, and built-in?
Why not a separate class?
class Cache < Hash
def fetch(key, &block)
super(key) {self[key] = yield(key)}
end
end
Updated by jbeschi (jacopo beschi) almost 4 years ago
fetch_set
mixes the concept of query with the concept of command and I think it's not a good approach.
Moreover I don't really see any big advantage in writing
cache.fetch_set(key) { calculation }
vs
cache.fetch(key) { cache[key] = calculation }
Therefore I don't understand why we should add it to the language.
Updated by Eregon (Benoit Daloze) almost 4 years ago
Another name for this is compute_if_absent
.
It's notably the name used in concurrent-ruby:
http://ruby-concurrency.github.io/concurrent-ruby/1.1.5/Concurrent/Map.html#compute_if_absent-instance_method
And in Java, where it's even defined on Map (not just ConcurrentMap):
https://docs.oracle.com/javase/8/docs/api/java/util/Map.html#computeIfAbsent-K-java.util.function.Function-
Having it as a built-in method, it also makes it possible to avoid computing the key's #hash
twice, and potentially avoid redoing the lookup in the Hash.
Another way to do this pattern is to use the default block:
RequestStore.store = Hash.new do |h, k|
h[k] = !MonitorValue.where('date >= ?', Time.now - 5.minutes).exists?
end
RequestStore.store[:monitor_value_is_delayed?]
Which already works fine.
And it has the advantage that if multiple places want to read from the Hash they don't have to repeat the code.
Is there a case this pattern wouldn't work and where Hash#fetch_set
would work?
This pattern can be made to work with parallelism too, see Idiomatic Concurrent Hash Operations, page 83.
Regarding concurrency and parallelism, we need to define the semantics if we add this method.
Of course, the assignment should not be performed if there is already a key, it must be "put if absent" semantics
(cache.fetch(key) { cache[key] = calculation }
is actually breaking that).
The question is whether the given block can be executed multiple times for a given key.
If not, it requires synchronization while calling the block, which can lead to deadlocks.
If yes, it doesn't require synchronization while calling the block which seems safer, but it means the block can be called multiple times.
Updated by MaxLap (Maxime Lapointe) almost 4 years ago
I forgot to mention, but this pattern can also be more performant, as the key only needs to be hashed once for both the fetching and setting part. It's minor, but I do think of it everything I write the fetch
pattern, and when the key is complex, this could be meaningful. (Dang, Eregon beat me to this by 2 minutes!)
nobu (Nobuyoshi Nakada) wrote in #note-6:
Why not a separate class?
class Cache < Hash def fetch(key, &block) super(key) {self[key] = yield(key)} end end
The Hash that we are using is not always under our control. In my second example, I'm using the request_store
gem, which exposes as Hash
(RequestStore.store
), as cache. It would be quite dirty to monkey patch this. I could add that custom Hash in the store
, but that would be losing the cleaner and shorter code benefits that I'm trying to achieve with this feature request.
Doing this class is also less performant instead of being more. (This is a very minor point)
Also: If I see somewhere cache.fetch(key) { calculation }
I will instantly be worried:
- Did the person forget to set the key?
- Is it only set elsewhere?
=> Oh wait, it's a custom class that does something different. - Will one of my colleague copy this pattern without remembering that this is a different class?
So if I was to make a different class, I would still use a different name just to avoid the frictions it would cause.
jbeschi (jacopo beschi) wrote in #note-7:
fetch_set
mixes the concept of query with the concept of command and I think it's not a good approach.
Maybe in name it appears so, but not in spirit. All this does is set a value if one isn't already there, and in common Ruby spirit, it returns something that can be useful, which is the value at the key.
This is where the Python name for the function comes from, setdefault
sets a value "by default" for a single key, and is often used to replace cache[key] ||= []
since Python doesn't support this syntax.
Eregon (Benoit Daloze) wrote in #note-8:
Another name for this is
compute_if_absent
.
True, however my thinking is this could also be used with a 2nd argument, like fetch
, in which case there is no "computing" to speak of.
Another way to do this pattern is to use the default block:
RequestStore.store = Hash.new do |h, k| h[k] = !MonitorValue.where('date >= ?', Time.now - 5.minutes).exists? end RequestStore.store[:monitor_value_is_delayed?]
Which already works fine.
And it has the advantage that if multiple places want to read from the Hash they don't have to repeat the code.
Is there a case this pattern wouldn't work and whereHash#fetch_set
would work?
RequestStore is used for lots of different things, setting a default like that means that if I use it for RequestStore.store[:last_count]
, and that wasn't set, then I would instead be doing this MonitorValue check. And I can only use your pattern once per Hash.
This pattern can be made to work with parallelism too, see Idiomatic Concurrent Hash Operations, page 83.
Regarding concurrency and parallelism, we need to define the semantics if we add this method.
Of course, the assignment should not be performed if there is already a key, it must be "put if absent" semantics
(cache.fetch(key) { cache[key] = calculation }
is actually breaking that).
I don't understand what you mean, how is it breaking that? You mean in the case of threading, where there could be 2 assignments if 2 threads go in at the same time? I don't think it's the job of the Hash to deal with this.
The question is whether the given block can be executed multiple times for a given key.
If not, it requires synchronization while calling the block, which can lead to deadlocks.
If yes, it doesn't require synchronization while calling the block which seems safer, but it means the block can be called multiple times.
I don't consider Hash to be a concurrency primitive in Ruby. So I wouldn't put any synchronization here. If synchronization is needed, it can be done from inside the block (or the block of the classic fetch
pattern).
Updated by Eregon (Benoit Daloze) almost 4 years ago
MaxLap (Maxime Lapointe) wrote in #note-9:
The Hash that we are using is not always under our control. In my second example, I'm using the
request_store
gem, which exposes asHash
(RequestStore.store
), as cache.
I see, https://github.com/steveklabnik/request_store, so you might want a different block per key essentially, and so a single default block isn't enough.
It would be quite dirty to monkey patch this.
There is Hash#default_proc=
but it would be dirty in this case indeed.
I don't understand what you mean, how is it breaking that? You mean in the case of threading, where there could be 2 assignments if 2 threads go in at the same time? I don't think it's the job of the Hash to deal with this.
Yes, exactly.
Many gems assume Hash is thread-safe, so in my opinion it needs to be thread-safe, and it is at least in CRuby.
Even without threads you need to consider cases like the block itself would already set the key, and you shouldn't set it again.
I don't consider Hash to be a concurrency primitive in Ruby. So I wouldn't put any synchronization here. If synchronization is needed, it can be done from inside the block (or the block of the classic
fetch
pattern).
This method fits very clearly for caches, and caches are typically used for multiple thread, so I think we should consider the concurrency semantics.
Concurrency can also happen for Hash when mutating while iterating, or some methods can be called recursively, so even without threads it needs to be considered.
Agreed no synchronization when calling the block is better, and the block can make its own synchronization to avoid running twice per key if really needed.
On CRuby, the GIL will be enough to do a check if the key is already set just before setting it from the result of the block, but the code needs to be careful there is no Ruby call between that check and assignment, or it would break with multiple threads.
Updated by nobu (Nobuyoshi Nakada) almost 4 years ago
Eregon (Benoit Daloze) wrote in #note-8:
Having it as a built-in method, it also makes it possible to avoid computing the key's
#hash
twice, and potentially avoid redoing the lookup in the Hash.
It is true ideally, but no one can guarantee the hash value never change.
MaxLap (Maxime Lapointe) wrote in #note-9:
The Hash that we are using is not always under our control. In my second example, I'm using the
request_store
gem, which exposes asHash
(RequestStore.store
), as cache. It would be quite dirty to monkey patch this. I could add that custom Hash in thestore
, but that would be losing the cleaner and shorter code benefits that I'm trying to achieve with this feature request.
module Caching
refine Hash do
def cache(key)
fetch(key) {self[key] = yield(key)}
end
end
end
using Caching
RequestStore.store.cache(:monitor_value_is_delayed?) do
!MonitorValue.where('date >= ?', Time.now - 5.minutes).exists?
end
Updated by marcandre (Marc-Andre Lafortune) almost 4 years ago
On Thu, Nov 26, 2020 at 9:05 PM nobu@ruby-lang.org wrote:
It is true ideally, but no one can guarantee the hash value never change.
Please, let's be serious. hash[obj] ||= something_that_changes_obj
is nonsense code.
Updated by shyouhei (Shyouhei Urabe) almost 4 years ago
+1 for the feature. I have had chances to write what is proposed here several times.
marcandre (Marc-Andre Lafortune) wrote in #note-12:
On Thu, Nov 26, 2020 at 9:05 PM nobu@ruby-lang.org wrote:
It is true ideally, but no one can guarantee the hash value never change.
Please, let's be serious.
hash[obj] ||= something_that_changes_obj
is nonsense code.
Yes I’m quite sure he is dead serious. There are idiots who write nonsense codes. As a language ruby has to be fool-proof.
Updated by marcandre (Marc-Andre Lafortune) almost 4 years ago
I think this is a good feature. I like the fact that the this can not be implemented in pure Ruby as efficiently (i.e. with hash calculated once).
An alternative name is fetch_init
, as it may gives a better idea that it is meant to initialize the value (once) for the key. That being said, fetch_set
is a good name too.
Updated by Eregon (Benoit Daloze) almost 4 years ago
shyouhei (Shyouhei Urabe) wrote in #note-13:
Yes I’m quite sure he is dead serious. There are idiots who write nonsense codes. As a language ruby has to be fool-proof.
I think such semantics (calling #hash only once) could be part of the method documentation.
OTOH calling #hash
twice is probably not very expensive, and for some cases (e.g. Integer, String) can be optimized by a JIT or by memoizing the hash on the instance.
I don't like the name fetch_set
much (does it fetch a Set?).
I think fetch_or_set
or cache
are better.
Updated by matz (Yukihiro Matsumoto) almost 4 years ago
- Status changed from Open to Feedback
Most of the case, hash[:key] ||= init
works. The exception is that init
value being false. But it should be rare.
So could you explain the concrete use-case for fetch_set()
?
Besides that I am not fully satisfied with the name fetch_set()
.
Matz.
Updated by marcandre (Marc-Andre Lafortune) almost 4 years ago
matz (Yukihiro Matsumoto) wrote in #note-16:
Most of the case,
hash[:key] ||= init
works. The exception is thatinit
value being false. But it should be rare.
We could look into public codebases for better numbers; it is not the most common case, but it is definitely encountered it with some frequency (more with nil
than false
).
So could you explain the concrete use-case for
fetch_set()
?
One additional use-case would be a thread-safe equivalent to Ractor.current[:my_key] ||= init
.
Besides that I am not fully satisfied with the name
fetch_set()
.
Did you consider fetch_init
?
Updated by Dan0042 (Daniel DeLorme) almost 4 years ago
marcandre (Marc-Andre Lafortune) wrote in #note-17:
One additional use-case would be a thread-safe equivalent to
Ractor.current[:my_key] ||= init
.
+1
Updated by MaxLap (Maxime Lapointe) almost 4 years ago
matz (Yukihiro Matsumoto) wrote in #note-16:
Most of the case,
hash[:key] ||= init
works. The exception is thatinit
value being false. But it should be rare.
The problem is not only with false
, but with nil
too.
Anytime you want to cache something and nil (or false) is a possible value (often meaning the data is missing), this problem arises, as it means these cases are recalculated each time, to again indicate that the data is missing.
# If there is no config with that internal_id, each time you try to use the
# cached version, which is meant to be fast, it would actually do a query to the database
def self.cached_by_internal_id(internal_id)
my_cache[internal_id] ||= Config.where(internal_id: internal_id).first
end
It is very easy to forget that nil values means the cache is not going to act as a cache. It probably happens in many places, but is not critical, just makes things a little slower from redoing work many times.
As for the name, there are other options that were suggested:
- fetch_init
- cache
Updated by p8 (Petrik de Heus) almost 4 years ago
Elixir has put_new and put_new_lazy:
https://hexdocs.pm/elixir/Map.html#put_new_lazy/3
Maybe store_if_new or store_new ?
cache.store_new(key) { calculation }
Updated by austin (Austin Ziegler) almost 4 years ago
My emails appear not to be making it through the mailing list gateway into the tickets, so…
I’m mostly doing Elixir these days, and one of the caching modules I use has as its main interface fetch_or_set
, so I think that’s the best choice if this is added.
On the other hand, Nobu’s refinement implementation (#note-11) looks pretty good (and I say this not using refinements yet, even though they’ve been around for a while).
put_new
doesn’t work like this in Elixir, but it is similar in some ways (put only if the key doesn’t exist).
The correct Ruby implementation for this would be something like
hash.key?(key) || hash[key] = value
The advantage of a fetch_or_set
method is that it can be implemented atomically for multithreading runtimes.
Updated by shyouhei (Shyouhei Urabe) almost 4 years ago
MaxLap (Maxime Lapointe) wrote in #note-19:
matz (Yukihiro Matsumoto) wrote in #note-16:
Most of the case,
hash[:key] ||= init
works. The exception is thatinit
value being false. But it should be rare.The problem is not only with
false
, but withnil
too.Anytime you want to cache something and nil (or false) is a possible value (often meaning the data is missing), this problem arises,
Matz says it's rare.
I'm not sure if that is true, but so far nobody shows any evidence about its rarity.
Updated by Eregon (Benoit Daloze) 9 months ago
- Related to Feature #20300: Hash: set value and get pre-existing value in one call added
Updated by Dan0042 (Daniel DeLorme) 9 months ago
It's confusing that this ticket, #17342, is striked out even though the status is "Feedback".
Another name for this could be Hash#add
; at least that's the terminology used in memcached and I believe some other caching servers.