Feature #11999
closed
MatchData#to_h to get a Hash from named captures
Added by sorah (Sorah Fukumori) about 9 years ago.
Updated over 8 years ago.
Description
class MatchData
  def to_h
    names.map { |n| [n, self[n]] }.to_h
  end
end
p '12'.match(/(?<a>.)(?<b>.)(?<c>.)?/).to_h #=> {"a"=>"1", "b"=>"2", "c"=>nil}
Sometimes I want to get a Hash from named captures, but currently I have to combine #names and #captures. How about adding MatchData#to_h as a convenience method?
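For comparison, the #names + #captures workaround looks roughly like this. It is a minimal sketch that assumes every group name appears only once in the pattern. With a duplicated name, #names lists the name once while #captures has one entry per group, so the two arrays stop lining up.

```ruby
# Workaround: pair MatchData#names with MatchData#captures.
# Assumption: each group name appears only once in the pattern.
m = '12'.match(/(?<a>.)(?<b>.)(?<c>.)?/)
h = m.names.zip(m.captures).to_h
h # => {"a" => "1", "b" => "2", "c" => nil}
```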
One consideration is the behavior when multiple groups share the same name:
/(?<a>.)(?<a>.)/
MatchData#[] returns the last one, and my attached patch follows that behavior.
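A quick illustration of that duplicate-name case, using a made-up input string "01". The point is that #names and #captures disagree in length, while MatchData#[] resolves the name to the last group called "a".

```ruby
# Two groups share the name "a".
m = /(?<a>.)(?<a>.)/.match("01")
m.names    # => ["a"]       each name is listed once
m.captures # => ["0", "1"]  one entry per group
m[:a]      # => "1"         the last group named "a" wins
```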
I agree. Please add this feature. I have also wanted to do the same thing.
I don't think to_h is appropriate, because MatchData is not always able to convert to Hash/Map.
Are there any other name candidates?
Matz.
> is not always able to convert to Hash/Map.
Ah, agreed. How about MatchData#named_captures?
I'm not sure this name is the best; suggestions welcome.
Shota Fukumori wrote:
> > is not always able to convert to Hash/Map.
> Ah, agreed. How about MatchData#named_captures?
> I'm not sure this name is the best; suggestions welcome.
I think #named_captures is the best name, since that is precisely what it returns (i.e. it never includes numbered captures).
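For reference, the method was eventually merged as MatchData#named_captures, which, as far as I know, shipped in Ruby 2.4. This short check shows that it returns only named groups. A pattern with purely numbered groups gives an empty Hash.

```ruby
# Requires Ruby 2.4 or later.
named = /(?<year>\d+)-(?<month>\d+)/.match("2016-01")
named.named_captures    # => {"year" => "2016", "month" => "01"}

# Only numbered groups: nothing is returned.
numbered = /(\d+)-(\d+)/.match("2016-01")
numbered.named_captures # => {}
```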
Matthew Kerwin wrote:
> I think #named_captures is the best name, since that is precisely what it returns (i.e. it never includes numbered captures).
Could it make sense to include numbered captures in the hash, too? Just thinking aloud.
Martin Dürst wrote:
> Matthew Kerwin wrote:
> > I think #named_captures is the best name, since that is precisely what it returns (i.e. it never includes numbered captures).
> Could it make sense to include numbered captures in the hash, too? Just thinking aloud.
I thought so myself, but the regular expression engine currently records numbered captures only when the pattern has no named captures.
Note: A regexp can't use named backreferences and numbered backreferences simultaneously.
-- http://ruby-doc.org/core-2.1.1/Regexp.html#class-Regexp-label-Capturing
I guess this is part of the spec.
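The documented rule can be seen directly. Once a pattern contains a named group, a plain (...) group is treated as non-capturing. The input "xy" is only an illustration.

```ruby
# The unnamed (.) group does not capture because a named group exists.
m = /(?<a>.)(.)/.match("xy")
m[0]       # => "xy"
m.captures # => ["x"]
m.size     # => 2  (whole match plus the single named capture)
```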
Also interesting: a regexp combined with | where both alternatives have a capture with the same name:
reg = /(?<a>b)|(?<a>x)/ # => /(?<a>b)|(?<a>x)/
reg.match("abc") # => #<MatchData "b" a:"b" a:nil>
reg.match("abc").captures # => ["b", nil]
reg.match("abc")[:a] # => "b"
reg.match("xyz") # => #<MatchData "x" a:nil a:"x">
reg.match("xyz").captures # => [nil, "x"]
reg.match("xyz")[:a] # => "x"
(Also note that MatchData#inspect shows the capture a twice.)
Such cases need to be kept in mind when adding a new method to MatchData.
@Shota: I still need to test your patch, but my case is a little different from yours.
Because one of the captures can be nil, [:a] seems to pick the first non-nil value in my case (or is it the last non-nil one?).
Especially with your patch:
reg = /(?<a>b)|(?<a>x)/ # => /(?<a>b)|(?<a>x)/
reg.match("abc") # => #<MatchData "b" a:"b" a:nil>
reg.match("abc").to_h #=> {"a" => "b"} or {"a" => nil}
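For what it's worth, the documentation for the method as released (MatchData#named_captures, Ruby 2.4 or later) states that when several groups share a name, the last *matched* value is used. An alternative that did not participate therefore does not overwrite one that did. Here is a quick check on the alternation example above:

```ruby
# Requires Ruby 2.4 or later.
reg = /(?<a>b)|(?<a>x)/
reg.match("abc").named_captures # => {"a" => "b"}
reg.match("xyz").named_captures # => {"a" => "x"}
```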
Yukihiro Matsumoto wrote:
> I don't think to_h is appropriate, because MatchData is not always able to convert to Hash/Map.
> Are there any other name candidates?
I think it can always be converted to a Hash: even without named captures, the groups are numbered from 1, so the numbers can serve as keys.
irb(main):001:0> /(a)(b)(c)/.match("abc")
=> #<MatchData "abc" 1:"a" 2:"b" 3:"c">
irb(main):002:0> /(a)(b)(c)/.match("abc").to_h
=> {1=>"a", 2=>"b", 3=>"c"}
Yui NARUSE wrote:
> Yukihiro Matsumoto wrote:
> > I don't think to_h is appropriate, because MatchData is not always able to convert to Hash/Map.
> > Are there any other name candidates?
> I think it can always be converted to a Hash: even without named captures, the groups are numbered from 1, so the numbers can serve as keys.
> irb(main):001:0> /(a)(b)(c)/.match("abc")
> => #<MatchData "abc" 1:"a" 2:"b" 3:"c">
> irb(main):002:0> /(a)(b)(c)/.match("abc").to_h
> => {1=>"a", 2=>"b", 3=>"c"}
I did some experimenting of my own along these lines and came up with this: https://github.com/phluid61/mug/blob/master/lib/mug/matchdata/hash.rb
The only real weirdness arises from the fact that positional captures don't happen at all if there's a named capture group in the Regexp; but given the resulting mutual exclusivity, the code itself becomes pretty straightforward.
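A rough sketch of that mutual-exclusivity idea follows. The helper name match_to_hash is hypothetical, and this is not the mug code linked above. It branches once on #names: if the pattern has no names, it keys by 1-origin group number; otherwise it keys by name.

```ruby
# Hypothetical helper, not the mug implementation.
# Named and numbered captures are mutually exclusive in a Ruby regexp,
# so a single branch on #names is enough.
def match_to_hash(m)
  if m.names.empty?
    # Numbered groups only: key by 1-origin group number.
    (1...m.size).map { |i| [i, m[i]] }.to_h
  else
    # Named groups: MatchData#[] returns the last matched group per name.
    m.names.map { |n| [n, m[n]] }.to_h
  end
end

match_to_hash(/(a)(b)/.match("ab"))     # => {1 => "a", 2 => "b"}
match_to_hash(/(?<x>a)(b)/.match("ab")) # => {"x" => "a"}
```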
- Assignee set to sorah (Sorah Fukumori)
Updated patch (11999-2.diff).
- Status changed from Open to Closed
Applied in changeset r53863.
- re.c: Add MatchData#named_captures
  [Feature #11999] [ruby-core:72897]
- test/ruby/test_regexp.rb (test_match_data_named_captures): Test for above.
- NEWS: News about MatchData#named_captures.
Shouldn't this produce Symbol keys?
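As merged, the keys are Strings, the same kind of names that MatchData#names returns. If Symbol keys are preferred, one workaround is to convert them afterwards. This sketch assumes Ruby 2.5 or later for Hash#transform_keys and does not change the method itself.

```ruby
# Requires Ruby 2.5 or later for Hash#transform_keys.
m = /(?<a>.)(?<b>.)/.match("12")
m.named_captures                          # => {"a" => "1", "b" => "2"}
m.named_captures.transform_keys(&:to_sym) # => {a: "1", b: "2"}
```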