Feature #18757
Open
Introduce %R percent literal for anchored regular expression patterns
Description
When defining regular expression patterns, it's often the case that you want to anchor with \A and \z to match the full text input, rather than ^ and $, respectively, which may (unintentionally) match text including newlines. This is especially true in the context of a web application such as a Rails app. Unfortunately, \A and \z reduce the legibility of a regular expression.
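To make the difference concrete, here is a minimal sketch in current Ruby. The input string and variable names are made up for illustration. In Ruby, ^ and $ anchor at line boundaries, while \A and \z anchor at the start and end of the whole string.

```ruby
# Hypothetical user input that hides extra content after a newline.
input = "sales@example.com\n<script>alert(1)</script>"

line_anchored   = /^sales@.*?$/i    # ^ and $ match at line boundaries
string_anchored = /\Asales@.*?\z/i  # \A and \z match only at string boundaries

line_result   = line_anchored.match?(input)
string_result = string_anchored.match?(input)

puts "line-anchored:   #{line_result}"    # prints "line-anchored:   true"
puts "string-anchored: #{string_result}"  # prints "string-anchored: false"
```

The line-anchored pattern accepts the input because its first line matches on its own. The string-anchored pattern rejects it.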
For example, take this ActionMailbox usage:
class ApplicationMailbox < ActionMailbox::Base
  routing %r{\Areplies\+.*?@ruby-lang\.org\z}i => :replies
  routing %r{\Asales@.*?\z}i => :leads
end
At first glance, it may look as if the second route matches Asales, but on closer inspection that's not the case. To improve legibility, a developer may choose to use ^ instead of \A, because a pattern anchored with \A and \z is harder to read, and \A is the worse offender of the two. In other cases, developers forget to use \A and \z over ^ or $ when validating or matching against user input.
I propose that Ruby introduce a new percent notation, %R{}, for defining interpolated regular expression patterns that automatically anchor a pattern with \A and \z.
For example, the code above would look like this:
class ApplicationMailbox < ActionMailbox::Base
  routing %R{replies\+.*?@ruby-lang\.org}i => :replies
  routing %R{sales@.*?}i => :leads
end
This is much more readable, and it's safer: developers using %R{} are not going to accidentally use ^ or $ instead of \A and \z, respectively (the former being vulnerable to matching input data containing newlines).
This is especially useful in pattern matching data where some values may be a symbol or a string, depending on where the data originated (internally vs externally):
data = { type: :foo, id: 1 } # Could also be: { type: 'foo', id: 1 }
case data
in type: %R(foo), id:
  # ...
else
end
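For comparison, the same match can be written today with explicit anchors. The sketch below assumes Ruby 3.0 or later, where pattern matching is stable. The classify method and its return strings are hypothetical names used only for illustration. It relies on Regexp#=== accepting both a Symbol and a String.

```ruby
# Hypothetical helper: the :type value may arrive as a Symbol or a String.
def classify(data)
  case data
  in { type: /\Afoo\z/, id: }
    # Regexp#=== converts a Symbol operand to a String before matching.
    "foo:#{id}"
  else
    "unknown"
  end
end

puts classify({ type: :foo, id: 1 })         # prints "foo:1"
puts classify({ type: "foo", id: 2 })        # prints "foo:2"
puts classify({ type: :foobar, id: 3 })      # prints "unknown"
puts classify({ type: "foo\nbar", id: 4 })   # prints "unknown"
```

The last case is the one a pattern written with ^ and $ would wrongly accept.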
Formally, the new anchored regex percent notation would work as follows:
re = %R(test)
# => /\Atest\z/
re.match?('test') # => true
re.match?('testing') # => false
re.match?('a test') # => false
re.match?(:test) # => true
re.match?(:testing) # => false
re.match?(:a_test) # => false
This would also be useful for data validation purposes, where a developer could clean up patterns that previously used regular expressions with \A...\z and ^...$, such as with Rails model validations, e.g. validates_format(with: %R{[-a-z0-9]+}).
I do understand that an uppercase %R would behave differently from other percent notations (i.e. lowercase is typically non-interpolated, uppercase interpolated), but since %r already allows interpolation, I figured it was okay to be a bit different. Regardless, I'm open to other syntax suggestions.
Updated by zeke (Zeke Gabrielse) over 2 years ago
- Description updated
Updated by zeke (Zeke Gabrielse) over 2 years ago
- Subject changed from Introduce %R for anchored regular expression patterns to Introduce %R percent literal for anchored regular expression patterns
Updated by jrochkind (jonathan rochkind) over 2 years ago
I do find \A and \z cumbersome and confusing for a common use case. (You didn't mention the need to avoid getting confused with \Z and \z too!).
Instead of new syntax, how about just a new stdlib method, Regexp.anchored(/whatever/), that would simply add left/right anchoring? Just ordinary new method.
Alternately, I suppose I could see a new flag on the end of /whatever/a (for (a)nchored). Not sure if adding a new flag has issues. (Not totally sure if a is already used or not).
Adding new features without adding new syntax is preferable to adding new syntax.
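As a rough illustration of the method-based alternative, here is a minimal sketch of how such a helper might behave. The name anchored is hypothetical, and the helper is written as a plain method rather than as Regexp.anchored, which does not exist in Ruby. It wraps the source in a non-capturing group so that alternations are anchored on both sides, which means the result is /\A(?:test)\z/ rather than the /\Atest\z/ shown in the description. It does not try to handle extended-mode (/x) patterns that end in a comment.

```ruby
# Hypothetical helper, not part of Ruby: returns a copy of the given Regexp
# anchored to the start and end of the string.
def anchored(pattern)
  # Regexp#options may carry extra internal bits; Regexp.new ignores them.
  Regexp.new("\\A(?:#{pattern.source})\\z", pattern.options)
end

re = anchored(/test/)
puts re.match?("test")     # prints "true"
puts re.match?("testing")  # prints "false"
puts re.match?("a test")   # prints "false"
puts re.match?(:test)      # prints "true"

# The group keeps both branches of an alternation anchored.
puts anchored(/a|b/).match?("ab")  # prints "false"

# Flags such as /i carry over from the original pattern.
puts anchored(/sales@.*?/i).match?("SALES@example.com")  # prints "true"
```

A helper like this needs no parser changes, at the cost of being more verbose at the call site than a literal.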