Feature #18583
Open
Pattern-matching: API for custom unpacking strategies?
Description
I started to think about it when discussing https://github.com/ruby/strscan/pull/30.
The thing is, usage of StringScanner for many complicated parsers invokes some kind of branching.
In pseudocode, the "ideal API" would allow to write something like this:
case <what next matches>
in /regexp1/ => value_that_matched
  # use value_that_matched
in /regexp2/ => value_that_matched
  # use value_that_matched
# ...
Intuitively, it seems there should be some way of implementing this, but we fall short. We can write a StringScanner-specific matcher object that defines its own #=== and use it with pinning:
case scanner
in ^(Matcher.new(/regexp1/)) => value_that_matched
  # ...
But there is no API to tell how the match result will be unpacked; the whole StringScanner will simply be put into value_that_matched.
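A minimal sketch of that limitation, runnable on Ruby 3.1+ (where ^(expr) pin expressions are available). The Matcher class here is hypothetical, written only for illustration; it uses StringScanner#check to test the upcoming text without advancing the scanner.

```ruby
require "strscan"

# Hypothetical matcher: succeeds when the scanner's upcoming text matches.
class Matcher
  def initialize(regexp)
    @regexp = regexp
  end

  def ===(scanner)
    # StringScanner#check returns the matched String or nil and does not advance.
    !scanner.check(@regexp).nil?
  end
end

scanner = StringScanner.new("42 apples")

bound =
  case scanner
  in ^(Matcher.new(/\d+/)) => value_that_matched
    value_that_matched
  end

# The binding is the scanner itself, not the matched text "42".
puts bound.equal?(scanner)
```

So the pattern can decide whether the branch matches, but it has no say in what ends up in the variable.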
So I thought it might be possible to define some kind of API for pattern-like objects: a method with a signature like try_match_pattern(value), which by default is implemented as return value if self === value, but can be redefined to return something different, such as a part of the object or the object transformed somehow.
This will open some interesting (if maybe uncanny) possibilities: not just slicing out the necessary part, but something like
value => ^(type_caster(Integer)) => int_value
So... Just a discussion topic!
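To make the idea a bit more concrete, here is a rough emulation in plain Ruby. Nothing in it exists in Ruby today: try_match_pattern is the hypothetical method from the description, and unpack, ScanMatcher and IntegerCaster are made-up names standing in for what case/in would do internally.

```ruby
require "strscan"

# Hypothetical stand-in for the dispatch case/in would perform under the proposal.
# Objects without try_match_pattern get the described default:
# bind the value itself when === succeeds.
def unpack(pattern, value)
  if pattern.respond_to?(:try_match_pattern)
    pattern.try_match_pattern(value)
  elsif pattern === value
    value
  end
end

# Hypothetical scanner matcher: unpacks to the matched text, not the scanner.
class ScanMatcher
  def initialize(regexp)
    @regexp = regexp
  end

  def try_match_pattern(scanner)
    scanner.scan(@regexp) # matched String, or nil when nothing matches
  end
end

# Hypothetical caster: unpacks to the converted Integer, or nil on failure.
class IntegerCaster
  def try_match_pattern(value)
    Integer(value, exception: false)
  end
end

scanner = StringScanner.new("foo=1")
word = unpack(ScanMatcher.new(/\w+/), scanner) # slices out the matched part
int_value = unpack(IntegerCaster.new, "42")    # transforms the value
plain = unpack(String, "plain")                # default: the value itself
```

This sketch uses nil to mean "no match", so it cannot bind nil or false; a real protocol would need a different way to signal failure.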
Updated by zverok (Victor Shepelev) over 2 years ago
One simpler example is that matching something against regexps with capture groups is still quite annoying!
case string
when /{{(.+?)}}/
  content = Regexp.last_match[1] # looking into global value isn't exactly elegant, right?
We could probably bend it towards
case string
in /{{(.+?)}}/ => content # the matched group
This, though, raises a question of several match groups, at which point one starts to want more:
case string
in /{{(.+?): (.+?)}}/ => [key, value]
  # use key and value
in /{{=(?<named>.+?)}}/ => {named:}
  # use named
...so... IDK.
Updated by hmdne (hmdne -) over 2 years ago
# looking into global value isn't exactly elegant, right?
It's not global; it's Fiber-local, and so are $1 and friends. This may not be communicated well enough in the documentation, though...
[1] pry(main)> z = Fiber.new { /(.)/ =~ 'test' }
=> #<Fiber:0x00007f698a2897e0 (pry):1 (created)>
[2] pry(main)> z.resume
=> 0
[3] pry(main)> Regexp.last_match
=> nil
[4] pry(main)>
Updated by palkan (Vladimir Dementyev) over 2 years ago
This, though, raises a question of several match groups, at which point one starts to want more:
case string
in /{{(.+?): (.+?)}}/ => [key, value]
  # use key and value
in /{{=(?<named>.+?)}}/ => {named:}
  # use named
...so... IDK.
This one could be achieved via guards:
case val
in /(foo|bar)/ if $~ in [val]
  puts val
in /(?<named>\d+)/ if $~ in {named: }
  puts named
end
That would require adding MatchData#{deconstruct,deconstruct_keys}, though:
refine MatchData do
  alias deconstruct captures
  def deconstruct_keys(*)
    named_captures.transform_keys(&:to_sym)
  end
end
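For reference, Ruby 3.2 added MatchData#deconstruct and MatchData#deconstruct_keys to core, so on recent versions the guard approach works without a refinement. A small sketch follows, with a backfill for older Rubies; describe_token is a hypothetical helper name, and the braces are escaped in the regexps only for readability.

```ruby
# Backfill for Rubies older than 3.2, where MatchData lacks these methods.
class MatchData
  alias_method :deconstruct, :captures unless method_defined?(:deconstruct)

  unless method_defined?(:deconstruct_keys)
    def deconstruct_keys(_keys)
      named_captures.transform_keys(&:to_sym)
    end
  end
end

# Hypothetical helper: the regexp decides the branch, and the guard
# destructures $~ (the MatchData set by the regexp match).
def describe_token(string)
  case string
  in /\{\{=(?<named>.+?)\}\}/ if ($~ in { named: })
    "named: #{named}"
  in /\{\{(.+?): (.+?)\}\}/ if ($~ in [key, value])
    "pair: #{key} => #{value}"
  else
    "no match"
  end
end

puts describe_token("{{=title}}")
puts describe_token("{{name: Ada}}")
```

The guard still reads $~, so this does not remove the implicit state; it only makes destructuring it shorter.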
Regarding the original proposal (the unpacking API), I think it could bring more confusion than value. Adding one more implicit layer (in addition to #deconstruct and #deconstruct_keys, which could also be overridden) would make pattern matching even more magical in a bad sense.
Updated by ntl (Nathan Ladd) 6 months ago · Edited
Could the match operator, =~, be used as a general complement to ===?
Example (following original sketch from @zverok (Victor Shepelev)):
class Matcher
  def initialize(regexp)
    @regexp = regexp
  end
  def ===(obj)
    @regexp.match?(obj)
  end
  def =~(obj)
    match_data = @regexp.match(obj)
    match_data
  end
end
case "some string"
in ^(Matcher.new(/(?<some_named_capture>some) string/)) => match_data
  some_named_capture = match_data[:some_named_capture]
  puts "Match: #{some_named_capture}"
end
The implementation of =~ would be optional in my view; not implementing it on whatever implements === would just cause Ruby to behave as it does now:
class Matcher
  def initialize(regexp)
    @regexp = regexp
  end
  def ===(obj)
    @regexp.match?(obj)
  end
end
case "some string"
in ^(Matcher.new(/(?<some_named_capture>some) string/)) => match_variable
  # match_variable is just "some string"
  puts match_variable.inspect
end
This would add =~ to the pattern matching protocol that currently comprises ===, deconstruct and deconstruct_keys. It would make === significantly more useful, and regular expressions provide a compelling example of why: when matching a string against a regular expression pattern, the string is already in lexical scope, but the match data provides new, useful information that only comes into existence upon a successful match:
subject = "some string"
case subject
in ^(Matcher.new(/(?<some_named_capture>some) string/)) => match_data
  # Capturing the match data variable instead of the original string doesn't make the original string inaccessible:
  puts "Match subject: #{subject.inspect}"
  # match_data provides additional useful information:
  some_named_capture = match_data[:some_named_capture]
  puts "Match data: :#{some_named_capture}"
end
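The dispatch described above can be approximated in today's Ruby with a small helper. This is only a sketch of the proposed behavior, not something Ruby does: proposed_bind and custom_match_operator? are hypothetical names, and the owner check exists because every object inherited a default =~ before Ruby 3.2.

```ruby
# Hypothetical helper: true when the pattern defines its own =~ rather than
# inheriting the default one that Object had before Ruby 3.2.
def custom_match_operator?(pattern)
  pattern.respond_to?(:=~) &&
    ![Object, Kernel].include?(pattern.method(:=~).owner)
end

# Hypothetical stand-in for the proposed dispatch: === decides whether the
# branch matches, and =~ (when defined) supplies the value to bind.
# Returns [matched, bound_value].
def proposed_bind(pattern, subject)
  return [false, nil] unless pattern === subject

  bound = custom_match_operator?(pattern) ? (pattern =~ subject) : subject
  [true, bound]
end

class Matcher
  def initialize(regexp)
    @regexp = regexp
  end

  def ===(obj)
    @regexp.match?(obj)
  end

  def =~(obj)
    @regexp.match(obj)
  end
end

matcher = Matcher.new(/(?<some_named_capture>some) string/)
matched, match_data = proposed_bind(matcher, "some string")
puts match_data[:some_named_capture] if matched

# A pattern without its own =~ keeps today's behavior: the subject is bound.
_, plain = proposed_bind(String, "some string")
```

One thing this sketch surfaces: a bare Regexp already defines =~, which returns an Integer index, so the rule as written would bind that index rather than the string. The proposal would presumably need to say how existing =~ implementations are treated.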
I also suspect this could be embedded into the pattern matching syntax itself, which could allow for some highly useful possibilities. One example that leaps to mind is reifying primitive data parsed from JSON into a data structure:
require "json"
SomeStruct = Struct.new(:some_attr, :some_other_attr, keyword_init: true) do
  def self.===(data)
    data.is_a?(Hash) && data.key?(:some_attr) && data.key?(:some_other_attr)
  end
  def self.=~(data)
    new(**data)
  end
end
some_json = <<JSON
{
  "some_attr": "some value",
  "some_other_attr": "some other value"
}
JSON
# Parse JSON into raw (primitive) data
some_data = JSON.parse(some_json, symbolize_names: true)
case some_data
in SomeStruct => some_struct
  # some_struct is a reified data structure (SomeStruct) built from some_data
  puts some_struct.inspect
end
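For comparison, a sketch of how the same reification can be written with the existing protocol, using a hash pattern and building the struct in the branch body. Reified and reify are hypothetical names chosen for this example, and keyword_init: true is an assumption made so the struct can be built from keyword arguments on any Ruby version.

```ruby
require "json"

# Hypothetical struct for illustration.
Reified = Struct.new(:some_attr, :some_other_attr, keyword_init: true)

# Hypothetical helper: the hash pattern checks the shape and extracts values;
# constructing the struct stays an explicit step in the branch body.
def reify(data)
  case data
  in { some_attr: String => some_attr, some_other_attr: String => some_other_attr }
    Reified.new(some_attr: some_attr, some_other_attr: some_other_attr)
  else
    nil
  end
end

some_json = '{"some_attr": "some value", "some_other_attr": "some other value"}'
some_data = JSON.parse(some_json, symbolize_names: true)
some_struct = reify(some_data)
puts some_struct.inspect
```

The difference from the proposal is that the construction step is visible at the match site rather than hidden behind the pattern.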