Feature #18788
closedSupport passing Regexp options as String to Regexp.new
Description
Current situation¶
Regexp.new
takes an integer as second argument which needs to be ORed together from multiple constants:
Regexp.new('foo', Regexp::IGNORECASE | Regexp::MULTILINE | Regexp::EXTENDED) # => /foo/imx
Any other non-nil value is treated as i
flag:
Regexp.new('foo', Object.new) # => /foo/i
Suggestion¶
Regexp.new
should support passing the regexp flags not only as an Integer, but also as a String, like so:
Regexp.new('foo', 'i') # => /foo/i
Regexp.new('foo', 'imx') # => /foo/imx
# edge cases
Regexp.new('foo', 'iii') # => /foo/i
Regexp.new('foo', '') # => /foo/
# unsupported flags should probably emit a warning
Regexp.new('foo', 'jmq') # => /foo/m
Regexp.new('foo', '-m') # => /foo/m
Reasons¶
- The constants are a bit cumbersome to use, particularly when building the regexp from variable data:
def make_regexp(regexp_body, opt_string)
opt_int = 0
opt_int |= Regexp::IGNORECASE if opt_string.include?('i')
opt_int |= Regexp::MULTILINE if opt_string.include?('m')
opt_int |= Regexp::EXTENDED if opt_string.include?('x')
Regexp.new(regexp_body, opt_int)
end
- Passing a String is already silently accepted, and people might get the wrong impression that it works:
Regexp.new('foo', 'i') # => /foo/i
... but it doesn't really work:
Regexp.new('foo', 'x') # => /foo/i
Backwards compatibility¶
This change would not be fully backwards compatible.
Code that relies on the second argument being a String which does not contain "i" in order to make the Regexp case insensitive would break.
Note: originally I suggested supporting Symbols in the same way as Strings, but removed that in light of the discussion.
Updated by Dan0042 (Daniel DeLorme) over 2 years ago
+1, I've often wanted that.
Updated by sawa (Tsuyoshi Sawada) over 2 years ago
I think this is a good idea, but I also think Regexp.new
is not used so often so it does not matter if we have such feature or not. Most of the time, a regex is created using a regex literal. What is the use case of Regexp.new
?
Updated by janosch-x (Janosch Müller) over 2 years ago
I agree that the use cases are fairly limited. They are not totally uncommon, though.
Regexp.new
is mostly needed for recursion on Ruby and metaprogramming, and is used this way e.g. in rubocop, ruby_parser, and parser.
Sometimes it is used to build Regexps based on input, e.g. in capybara, prawn, and psych. There might also be a few CMSes that allow admins to type in validation patterns.
Many people are also seemingly unaware that Regexp literals support interpolation, or maybe they just find this interpolation hard to read. Either way, they often use Regexp.new
instead, passing it an interpolated or concatenated String or Regexp.union
output, as can be seen e.g. in css_parser, haml, net-ssh, or uri. (css_parser actually uses the only coincidentally working "i" as a second argument.)
In some other cases, Regexp.new
is used to avoid a SyntaxError
or a warning on older Rubies, e.g. sinatra does this.
Updated by duerst (Martin Dürst) over 2 years ago
Please don't allow symbols. It may look cute (in some cases), but the options are essentially a set of single letters, and that's not what Symbols are about.
If you really want symbols, please make it something like
Regexp.new('foo', :ignorecase, :multiline, :extend) # => /foo/imx
Even something a bit shorter, such as the following, would be okay with me:
Regexp.new('foo', :ignore, :multi, :ext) # => /foo/imx
Updated by janosch-x (Janosch Müller) over 2 years ago
Please don't allow symbols. It may look cute (in some cases), but the options are essentially a set of single letters, and that's not what Symbols are about.
My reasoning for supporting them was not cuteness but avoiding confusion.
If we change only the processing of Strings, we will have this behavior:
Regexp.new('foo', 'i') # => /foo/i
Regexp.new('foo', 'm') # => /foo/m
Regexp.new('foo', :i) # => /foo/i # looks like it also works
Regexp.new('foo', :m) # => /foo/i # slightly surprising
I'm happy to support only Strings, though. In this case we might want to consider raising an ArgumentError or emitting a deprecation warning when a Symbol is passed, or even for anything that is not nil/true/false/Int/String?
please make it something like
Regexp.new('foo', :ignorecase, :multiline, :extend) # => /foo/imx
I don't think this is a viable option. Regexp.new
already accepts up to 3 arguments. The third one is undocumented as far as I can tell, but it is used in the wild. If a String starting with "n" or "N" is passed as third argument, the Regexp encoding is set to ASCII. Arguably that makes consistency another reason for my proposal.
Updated by nobu (Nobuyoshi Nakada) over 2 years ago
I agree that symbols should be disallowed.
The examples uses arbitrary combinations of flags and the order of the flags doesn't have meanings.
In other words, it is a set of chars and OK for a string.
But a symbol is not such thing.
For the compatibility sake, we'll need migration period to warn anything other than integer and valid string, and then error.
Updated by nobu (Nobuyoshi Nakada) over 2 years ago
Regexp.new("(?#{options}:#{code})")
Regexp.new(code, eval("//#{options}").options)
Updated by janosch-x (Janosch Müller) over 2 years ago
Thank you for the explanation regarding symbols!
we'll need migration period to warn anything other than inter and valid string
I think the optional third argument should be also deprecated as described in #18797. Otherwise we might still want to allow nil
as "stopgap" value for the second argument.
Regexp.new(code, eval("//#{options}").options)
This feels too unsafe for some of the cases mentioned above.
Regexp.new("(?#{options}:#{code})")
This is clever and will work for most of the use cases mentioned above. Just a few minor downsides:
- The group options syntax isn't so well-known and universally understood.
- A few use cases might need to preserve notation (e.g. codemod).
- The result may become verbose in case of recursion or nesting of Regexps.
Updated by janosch-x (Janosch Müller) over 2 years ago
- Description updated (diff)
Updated by matz (Yukihiro Matsumoto) over 2 years ago
Accepted. Unknown flags should raise errors.
Matz.
Updated by nobu (Nobuyoshi Nakada) over 2 years ago
- Status changed from Open to Closed
Applied in changeset git|1e9939dae24db232d6f3693630fa37a382e1a6d7.
[Feature #18788] Support options as String
to Regexp.new
Regexp.new
now supports passing the regexp flags not only as an
Integer
, but also as a `String. Unknown flags raise errors.