Feature #18757

Introduce %R percent literal for anchored regular expression patterns

Added by zeke (Zeke Gabrielse) almost 2 years ago. Updated almost 2 years ago.

Status: Open
Assignee: -
Target version: -
[ruby-core:108416]

Description

When defining regular expression patterns, it's often the case that you want to anchor with \A and \z to match the full text input, rather than with ^ and $, respectively, which anchor at line boundaries and may therefore (unintentionally) match input that contains newlines. This is especially true in the context of a web application such as a Rails app. Unfortunately, \A and \z reduce the legibility of a regular expression.
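The difference between the two kinds of anchor is easy to check in plain Ruby. The sketch below uses a made-up slug pattern and a hypothetical multi-line input; neither is taken from a real application.

```ruby
# Hypothetical input: a valid-looking first line followed by an injected second line.
input = "valid-slug\n<script>alert(1)</script>"

line_anchored   = /^[-a-z0-9]+$/   # ^ and $ match at line boundaries
string_anchored = /\A[-a-z0-9]+\z/ # \A and \z match only at the string boundaries

line_anchored.match?(input)          # => true  (the first line alone satisfies the pattern)
string_anchored.match?(input)        # => false (the whole string must match)
string_anchored.match?("valid-slug") # => true
```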

For example, take this ActionMailbox usage:

class ApplicationMailbox < ActionMailbox::Base
  routing %r{\Areplies\+.*?@ruby-lang\.org\z}i => :replies
  routing %r{\Asales@.*?\z}i                   => :leads
end

At first glance, it may look as if the second route matches Asales, but closer inspection shows that's not the case. Because readability suffers when a pattern is anchored with \A and \z, and especially with \A, a developer may choose to use ^ instead to improve legibility. In other cases, developers simply forget to use \A and \z rather than ^ and $ when validating or matching against user input.

I propose that Ruby introduce a new percent-notation, %R{}, for defining interpolated regular expression patterns that are automatically anchored with \A and \z.

For example, the above will look like below:

class ApplicationMailbox < ActionMailbox::Base
  routing %R{replies\+.*?@ruby-lang\.org}i => :replies
  routing %R{sales@.*?}i                   => :leads
end

This is much more readable, and it's safer — developers using %R{} are not going to accidentally use ^ or $ instead of \A and \z, respectively (the former being vulnerable to matching input data containing newlines).

This is especially useful in pattern matching data where some values may be a symbol or a string, depending on where the data originated (internally vs externally):

data = { type: :foo, id: 1 } # Could also be: { type: 'foo', id: 1 }

case data
in type: %R(foo), id:
  # ...
else
end
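For comparison, the same match can be written today with an explicitly anchored literal, since a Regexp in a pattern is compared with === and Regexp#=== accepts symbols as well as strings. This is a minimal sketch; the classify method and its return values are hypothetical and exist only for illustration.

```ruby
# Hypothetical helper that classifies a record by its :type value,
# which may arrive as a Symbol (internal data) or a String (external data).
def classify(data)
  case data
  in { type: /\Afoo\z/, id: }
    "foo-#{id}"
  else
    "unknown"
  end
end

classify({ type: :foo, id: 1 })     # => "foo-1"
classify({ type: "foo", id: 2 })    # => "foo-2"
classify({ type: "foobar", id: 3 }) # => "unknown"
```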

Formally, the new anchored regex percent notation would work as follows:

re = %R(test)
# => /\Atest\z/

re.match?('test')    # => true
re.match?('testing') # => false
re.match?('a test')  # => false
re.match?(:test)     # => true
re.match?(:testing)  # => false
re.match?(:a_test)   # => false
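Until such a literal exists, the behavior can be approximated with an ordinary method. The sketch below is an assumption about how one might emulate it: the anchored helper and FLAG_MASK constant are hypothetical names, not part of Ruby or of this proposal. It also wraps the source in a non-capturing group so that top-level alternation stays inside the anchors, which differs slightly from the /\Atest\z/ form shown above.

```ruby
# Only the user-facing flags (i, x, m) are carried over to the new pattern.
FLAG_MASK = Regexp::IGNORECASE | Regexp::EXTENDED | Regexp::MULTILINE

# Hypothetical helper (not part of Ruby): returns a copy of the pattern
# anchored with \A and \z. The (?:...) group keeps `a|b` fully anchored.
def anchored(pattern)
  Regexp.new("\\A(?:#{pattern.source})\\z", pattern.options & FLAG_MASK)
end

re = anchored(/test/)
re.match?("test")    # => true
re.match?("testing") # => false
re.match?("a test")  # => false
re.match?(:test)     # => true

alt = anchored(/foo|bar/i)
alt.match?("BAR")    # => true
alt.match?("foobar") # => false
```

Without the group, a pattern such as foo|bar would be parsed as \Afoo or bar\z, leaving each branch anchored on one side only.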

This would also be useful for data validation purposes, where a developer could clean up patterns that previously used regular expressions with \A...\z and ^...$, such as with Rails model validations, e.g. validates_format(with: %R{[-a-z0-9]+})

I do understand that an uppercase %R would behave differently from other percent notations (i.e. lowercase is typically non-interpolated, uppercase interpolated), but since %r already allows interpolation, I figured it was okay to be a bit different. Regardless, I'm open to other syntax suggestions.
