Project

General

Profile

Actions

Feature #15563

open

#dig that throws an exception if a key doesn't exist

Added by 3limin4t0r (Johan Wentholt) almost 6 years ago. Updated over 4 years ago.

Status:
Open
Assignee:
-
Target version:
-
[ruby-core:91265]

Description

Ruby 2.3.0 introduced #dig for Array, Hash and Struct. Both Array and Hash have #fetch, which works the same way as #[], but instead of returning the default value, raises an exception (unless a second argument or block is given). Hash also has #fetch_values, which works the same way as #values_at, raising an exception if a key is missing. For #dig, there is no such option.

My proposal is to add a method that works the same way as #dig, but instead of using the #[] accessor, uses #fetch.

This method would look something like this:

module Traversable
  def traverse(key, *others)
    value = fetch(key)
    return value if others.empty?

    if value.respond_to?(__method__, true)
      value.send(__method__, *others)
    else
      raise TypeError, "#{value.class} does not have ##{__method__} method"
    end
  end
end

Array.include(Traversable)
Hash.include(Traversable)

The exception raised is taken from #dig ([1].dig(0, 1) #=> TypeError: Integer does not have #dig method).

yaml = YAML.load_file('some_file.yml')

# this change is meant to make chaining #fetch calls more easy
yaml.fetch('foo').fetch('bar').fetch('baz')

# would be replaced with
yaml.traverse('foo', 'bar', 'baz')

I'm curious to hear what you guys think about the idea as a whole and the method name.


Related issues 2 (2 open0 closed)

Is duplicate of Ruby master - Feature #12282: Hash#dig! for repeated applications of Hash#fetchOpenActions
Is duplicate of Ruby master - Feature #14602: Version of dig that raises error if a key is not presentOpenActions

Updated by 3limin4t0r (Johan Wentholt) almost 6 years ago

  • Description updated (diff)

I just discovered that #dig also call private methods. I updated the provided examples to do the same.

hash = { b: 'b' }
hash.singleton_class.send(:private, :dig)
{ a: hash }.dig(:a, :b)
#=> 'b'

Updated by shevegen (Robert A. Heiler) almost 6 years ago

I have no particular pro or con against the feature itself as such; I myself do not use or need .dig so I
can not speak much about it. But I believe one problem with the proposal here is the name.

I think a name such as "dig_e" would be very, very rare to see in ruby. Of course I have no idea how
matz thinks about it, but I would recommend to you to also consider alternative names; or perhaps
let it handle just through arguments, whatever may seem to fit better.

Short names are sometimes really, really great, such as p and pp; but I think one overall concern may
be to not lose too much of the meaning. Off the top of my head, I can only think of FileUtils having
odd/very short method names, and this is mostly because it sort of "simulates" how coreutils utilities
such as "mkdir -p" and similar work.

If you look at recent changes in ruby, you may notice the :exception key - :e would be shorter than
that too, but I think it may not be a primary goal at all times to be too overly succinct, so if that is
a valid reasoning then I think this may explain why :exception would be used, and no shorter
variant. A similar reasoning could apply to the case here - but again, ultimately you have to see what
matz thinks about it not how others may think about it. :)

Updated by jwmittag (Jörg W Mittag) almost 6 years ago

shevegen (Robert A. Heiler) wrote:

I have no particular pro or con against the feature itself as such; I myself do not use or need .dig so I
can not speak much about it. But I believe one problem with the proposal here is the name.

I think a name such as "dig_e" would be very, very rare to see in ruby. Of course I have no idea how
matz thinks about it, but I would recommend to you to also consider alternative names; or perhaps
let it handle just through arguments, whatever may seem to fit better.

There is a well-established convention in Ruby, when you have a pair of methods that does similar things in different ways, to name them foo and foo!. For example, select and select!, Process::exit and Process::exit!, and so on.

So, one possibility would be dig!.

Updated by matz (Yukihiro Matsumoto) almost 6 years ago

I am against dig! for this purpose. When we have two versions of a method (foo and foo!), the bang version should be more dangerous than the non-bang version. dig! is not the case.

And with whatever name, we need the real-world use-case for a new method. "We don't have fetch counterpart of dig" is not good enough.

Matz.

Actions #5

Updated by k0kubun (Takashi Kokubun) almost 6 years ago

  • Is duplicate of Feature #12282: Hash#dig! for repeated applications of Hash#fetch added

Updated by k0kubun (Takashi Kokubun) almost 6 years ago

Personally I've hit a real-world use-case of this feature many times.

I often manage structured configs with nested YAML files and load them from Ruby. With current Ruby, to avoid an unhelpful exception NoMethodError, I assert the existence of the deep keys using a Hash#fetch chain like this:

config = YAML.load_file('config.yml')
config.fetch('production').fetch('environment').fetch('SECRET_KEY_BASE') #=> an exception like: KeyError: key not found: "SECRET_KEY_BASE"

If we had such a method, we would be able to easily write (let's say it's named Hash#deep_fetch instead of #dig!):

config.deep_fetch('production', 'environment', 'SECRET_KEY_BASE')

and the best part is that we could get a more helpful error message like "key not found: production.environment.SECRET_KEY_BASE" whose nested information isn't available with a Hash#fetch method chain.


By the way, if we had this, I would like to have a keyword argument default: like the second optional argument of Hash#fetch:

config.deep_fetch('production', 'environment', 'SECRET_KEY_BASE', default: '002bbfb0a35d0fd05b136ab6333dc459')

we want to safely manage the credentials only for production, so sometimes we don't want to manage credentials in (safely-managed originally-encrypted) YAML file for development environment and just want to return the unsafe thing as a default value.

(edit: s/fetch_keys/deep_fetch/, like I proposed 2 years ago)

Updated by 3limin4t0r (Johan Wentholt) almost 6 years ago

My scenario would be similar to k0kubuns scenario.

# The connection translates the request to JSON and parses the response
# from JSON into the correct objects. In this case a nested hash structure.
response = connection.send(request)

# assign shortcuts
report = response
         .fetch('Era.Common.NetworkMessage.ConsoleApi.Reports.RpcGenerateReportResponse')
         .fetch('report')

column_data   = report.fetch('data').fetch('columns')
column_labels = report.fetch('rendering').fetch('table').fetch('columns')

# build report
report_data = column_data.each_with_object({}) do |column, data|
  column_id       = column.fetch('header').fetch('column_id')
  data[column_id] = column.fetch('values')
end

report = column_labels.each_with_object({}) do |column, data|
  label       = column.fetch('label').fetch('literal')
  column_id   = column.fetch('column_id')
  data[label] = report_data.fetch(column_id)
end

From the above scenario you can see that having this new functionality would help clean things up.

The reason I use #fetch here is because the API to which I'm talking might change its structure. Getting an error as soon as possible reduces debug time. If #dig where used, nil would be returned when the structure is invalid. This would most of the time raise an exception somewhere else that then needs to be traced back to its source (the changed response structure).

My preference goes out to dropping the "returning nil if any intermediate step is nil" description (as described in point 2 of the feature proposal). Otherwise, when a key is present but the value is set to nil it will short circuit out of the method. Dropping this part of the #dig description would ensure the full path is traversed.

I also had a look at the linked feature proposal. I find the name #deep_fetch the most descriptive. #fetch_keys sounds like it will fetch multiple keys on a single hash (basically what #fetch_values does). #fetch_all suffers from the same problem. If the eventual version always traverses the full path (see point 2 of the feature proposal) #traverse could be an option.

Updated by walerian (Walerian Sobczak) over 5 years ago

I would suggest #retrieve. It's just a stronger #fetch, and the dictionary definition reflects its meaning:

retrieve (verb)

  1. get or bring (something) back from somewhere
  2. find or extract

The name is still short and simple, but also idiomatic and meaningful at the same time.

config = YAML.load_file('config.yml')

# so instead of this:
config.fetch('production').fetch('environment').fetch('SECRET_KEY_BASE')

# we would have:
config.retrieve('production', 'environment', 'SECRET_KEY_BASE')
Actions #9

Updated by k0kubun (Takashi Kokubun) over 5 years ago

  • Is duplicate of Feature #14602: Version of dig that raises error if a key is not present added
Actions #10

Updated by knu (Akinori MUSHA) over 5 years ago

I thought adding an optional block to dig could be an idea.

obj.dig(*keys) { |dug_keys, rest_keys|
  # raise yourself or return something
}

But the costs required for extending the dig method as such in Array or Hash-like classes would be a bit too high.

Updated by 3limin4t0r (Johan Wentholt) almost 5 years ago

  • Description updated (diff)

Changed naming from #dig_e to #traverse. Removed alternative options, and kept the strict version since #fetch is also strict. Possibility to expand upon this with a block or :default keyword argument is possible, but should be its own feature request.

Updated by 3limin4t0r (Johan Wentholt) almost 5 years ago

  • Description updated (diff)

Removed the two discussion points in the previous edit. This edit adjusts the last sentence to reflect this.

Updated by amcaplan (Ariel Caplan) almost 5 years ago

matz (Yukihiro Matsumoto) wrote:

I am against dig! for this purpose. When we have two versions of a method (foo and foo!), the bang version should be more dangerous than the non-bang version. dig! is not the case.

And with whatever name, we need the real-world use-case for a new method. "We don't have fetch counterpart of dig" is not good enough.

Matz.

I think that I, along with others, have gotten used to bang methods being subject to more errors rather than mutating state, due to how ActiveSupport uses them. Ruby is not Rails, though, so this is a good reminder :)

I like #deep_fetch, which also happens to be a gem that already does this (https://rubygems.org/gems/deep_fetch). It was designed for implicitly validating that API formats have not changed, which says something about the need for this. I think having an easy way to dig while doing that constant validation would encourage writing code that notifies us right away when our assumptions are wrong.

As an alternative, maybe #fetch_strict or (though I like this less) a named parameter passed to dig like my_array.dig(:foo, :bar, :baz, strict: true). The latter option currently isn't possible since it's unclear whether strict: true is a key or a named parameter, but that problem will go away in Ruby 3.

Updated by robb (Robb Shecter) over 4 years ago

amcaplan (Ariel Caplan) wrote in #note-13:

matz (Yukihiro Matsumoto) wrote:

[...] with whatever name, we need the real-world use-case for a new method. "We don't have fetch counterpart of dig" is not good enough.

[...] implicitly validating that API formats have not changed...

I have a similar real-world use-case: I use chained #fetchs when importing JSON: My code needs the JSON to have the correct attributes and structure, and when it's tedious to explicitly check and handle every potential missing attribute. I can either: let my import script just crash with the clear exception message, or choose to handle exceptions in just one place. (Handling nils is more complex and less clear.). From my production app:

      # @param from_json [Hash]
      # @return [nil]
      def import_objects(from_json:)
        import_chapter from_json.fetch('statuteTree').fetch('chapter0')
        import_titles from_json.fetch('statuteTree').fetch('titles')
      end

    # ...

        # 3. Decide: Are my children Sections or Sub-chapters? Then,
        #    import them.
        child_nodes = chapter_json.fetch('content').fetch('contents')

    
    # ...


      # @param nrs_json [Hash]
      # @return [Boolean]
      def simple_subchapter_content?(nrs_json)
        nrs_json.fetch('children').fetch('tag') == 'SubChapterSections'
      end


      # @param nrs_json [Hash]
      # @return [Boolean]
      def simple_chapter_content?(nrs_json)
        nrs_json.fetch('content').fetch('tag') == 'SimpleChapterContent'
      end

I wrote this gem for chained #fetchs on Hashes and Arrays and has a very clean one-line implementation using #reduce: https://github.com/dogweather/digbang to make it easier me for the do the above. Caveat, it uses the Rails-like #dig! naming, but this shows the real-world use-case for a checked #dig.

Updated by robb (Robb Shecter) over 4 years ago

matz (Yukihiro Matsumoto) wrote in #note-4:

And with whatever name, we need the real-world use-case for a new method.

Here is another real-world use-case, from my public.law apps which parse JSON:

      def insert_subparts(for_id:, with_map:)
        sub_documents = JSON.parse(File.read("#{for_id}.json")).fetch('result').fetch('documents').fetch('documents').fetch('items')
        insert_sub_documents from_tree:    sub_documents,
                             under_parent: with_map.fetch(for_id)
      end

I'm sure you can see why I've created syntactic/semantic sugar for all these #fetch invocations.

Actions #16

Updated by sawa (Tsuyoshi Sawada) over 4 years ago

  • Subject changed from #dig that throws an exception if an key doesn't exist to #dig that throws an exception if a key doesn't exist
  • Description updated (diff)
Actions

Also available in: Atom PDF

Like1
Like0Like0Like0Like0Like0Like1Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0