Actions

Copy link

Feature #19694

open

Add Regexp#timeout= setter

Feature #19694: Add Regexp#timeout= setter

Added by aharpole (Aaron Harpole) about 3 years ago. Updated about 3 years ago.

Status:

Open

Assignee:

Target version:

[ruby-core:113660]

Description

Abstract¶

In addition to allowing for a Regexp timeout to be set on individual instances by setting a timeout argument in Regexp.new, I'm proposing that we also allow setting the timeout on Regexp objects with a #timeout= setter.

Background¶

To be able to roll out a global Regexp timeout for a large application, there are inevitably some individual regexes for which a different timeout is appropriate. While the timeout keyword argument was added to Regexp.new, this isn't always a viable option.

In the case of regex literal syntax (/ab*/ or %r{ab*}, for instance), it's not possible to set a timeout at all right now without converting to Regexp.new, which may be awkward depending on the contents of the regex.

It also is desirable from time to time to be able to set a timeout for a regex object after it's been initialized.

Finally, because we offer a Regexp#timeout getter, for consistency it would be nice to also offer a setter.

The introduction of a Regexp#timeout= setter was mentioned as a possible way to set individual timeouts in https://bugs.ruby-lang.org/issues/19104#Specification.

Proposal¶

I propose that we add the method Regexp#timeout=. It works the same way the timeout argument works in Regexp.new, taking either a float or nil.

This makes it relatively easy to add timeouts to specific regex literals (regex literals are frozen by default so you do have to dup them first):

emoji_filter_pattern = %r{
  (?<!#{Regexp.quote(ZERO_WIDTH_JOINER)})
  #{EmojiFilter.unicodes_pattern}
  (?!#{Regexp.union(EmojiFilter::MODIFIER_CHAR_MAP.keys.map { |k| Regexp.quote k })})
}x.dup
emoji_filter_pattern.timeout = 1.0
emoji_filter_pattern.freeze

Implementation¶

This setter has been implemented in https://github.com/ruby/ruby/pull/7847.

Evaluation¶

It's just a setter, so pretty straightforward in terms of implementation and use.

Discussion¶

It's worth considering other options for overriding Regexp.timeout. I'd love to see something like the following for overriding regexp timeouts as well:

Regexp.timeout = 1.0

Regexp.with_timeout(5.0) do
  evaluate_slower_regexes
end

It's possible to implement something like Regexp.with_timeout but it's not thread-safe by default since it would involve overwriting Regexp.timeout.

Summary¶

Regexp instances have a getter for timeout, and adding a corresponding setter adds consistency and will make it easier for developers to adopt adding a global Regexp.timeout by making it simpler to adjust timeouts on a regex by regex basis.

It's a minor change but the added consistency and flexibility help us optimize for developer happiness.

Related issues 1 (0 open — 1 closed)

Updated by austin (Austin Ziegler) about 3 years ago Actions
Copy link
#1 [ruby-core:113661]

Since you have to .dup a regexp literal, it might be better to just call Regexp.new:

emoji_filter_pattern = %r{
  (?<!#{Regexp.quote(ZERO_WIDTH_JOINER)})
  #{EmojiFilter.unicodes_pattern}
  (?!#{Regexp.union(EmojiFilter::MODIFIER_CHAR_MAP.keys.map { |k| Regexp.quote k })})
}x
emoji_filter_pattern = Regexp.new(emoji_filter_pattern, timeout: 1.0).freeze

I am a little surprised that regex literals are frozen, but Regexp.new regexen aren’t.

Updated by byroot (Jean Boussier) about 3 years ago Actions
Copy link
#2 [ruby-core:113666]

I am a little surprised that regex literals are frozen, but Regexp.new regexen aren’t.

It's because:

def foo
  /foo/.object_id
end

Always return the same instance. So if it's mutable, it allow to leak state where it's not expected.

Another reason if that deduplicating literal regexp could save a decent amount of memory: #16557. They were frozen to allow deduplicating them, but the implementation is not there yet.

Updated by nobu (Nobuyoshi Nakada) about 3 years ago Actions
Copy link
#3 [ruby-core:113668]

I agree with @austin (Austin Ziegler), calling new with timeout: feels simpler and better than dup then set.

Updated by Eregon (Benoit Daloze) about 3 years ago Actions
Copy link
#4 [ruby-core:113671]

On TruffleRuby all Regexp instances are frozen, notably because they are all globally cached and deduplicated, so this pattern with dup + setter cannot work.
Also it seems clearly better that a Regexp timeout is passed when creating the Regexp, not after, semantically.

I do get the point that Regexp.new is less nice than /.../.
However if you are on 3.2 and the Regexp is linear, then the timeout is most likely unnecessary, and so in general I think it's more effective to work on making a Regexp linear than to use individual timeouts.

Regexp.with_timeout(5.0) do could be a way, as you show, although it introduces fiber-local state just to pass probably to a single Regexp, so it feels like too much complexity and footprint for that.

I think Regexp.timeout= is insufficent anyway to prevent ReDoS, because it needs to be a fairly high value to deal with e.g. variance in timings and CPU usage, scheduling, etc, and if a user manages to hit the timeout on every request they could likely still cause a ReDoS.
I think the real solution is to make as many Regexps as possible linear, and warn when a Regexp is not linear (I still need to file a proposal for that).

Updated by byroot (Jean Boussier) about 3 years ago Actions
Copy link
#5 [ruby-core:113672]

I do get the point that Regexp.new is less nice than /.../.

The main issue with Regexp.new is that it compile the regexp on each invocation, which is slow. So you can't just translate from one syntax to the other, you also have to make sure to keep it in a constant.

On TruffleRuby all Regexp instances are frozen, notably because they are all globally cached and deduplicated, so this pattern with dup + setter cannot work.

I assume you could have an internal regexp object that is deduplicated and immutable, and the actual Regexp object that is exposed could be a tuple of (internal_regexp, timeout).

Updated by Eregon (Benoit Daloze) about 3 years ago Actions
Copy link
#6 [ruby-core:113674]

byroot (Jean Boussier) wrote in #note-5:

I assume you could have an internal regexp object that is deduplicated and immutable, and the actual Regexp object that is exposed could be a tuple of (internal_regexp, timeout).

Not really, it would be very messy and have a performance cost.
The reason is we want to embed Regexp objects in the context-independent AST (for persisting JITed code), and so it must not refer to context-specific state.
We could have two types, one for immutable regexps and one for mutable regexps, we do this for String, but it is a nightmare.

Similarly, Ractor wants to be able to use Regexp objects on any Ractor, the requirement for that is that as many Regexp as possible are immutable.
Probably we should make all Regexps immutable in CRuby too for consistency (both across Rubies and between literal/non-literal) and it would benefit Ractor.

Updated by janosch-x (Janosch Müller) about 3 years ago Actions
Copy link
#7 [ruby-core:113697]

I think it is better if no code can mutate the timeout of the Regexps that are passed into it, even if that affected only dupped or non-literal Regexps.

If it is really necessary to set custom timeouts on literals, maybe something like this could work:

regexp = Regexp.with_timeout(2.0) { /foo/ }
regexp.timeout # => 2.0

Updated by Eregon (Benoit Daloze) about 3 years ago Actions
Copy link
#8 [ruby-core:113701]

janosch-x (Janosch Müller) wrote in #note-7:

regexp = Regexp.with_timeout(2.0) { /foo/ }
regexp.timeout # => 2.0

That, and as proposed in the description, doesn't really work if literal Regexps are created at parse time, before execution.
This is the case on CRuby:

$ ruby --disable-gems -e 'pp ObjectSpace.count_objects; O=Object.new; R=/a/' | grep REGEX
 :T_REGEXP=>3,
$ ruby --disable-gems -e 'pp ObjectSpace.count_objects; O=Object.new' | grep REGEX     
 :T_REGEXP=>2,

and it is the case on TruffleRuby as well.

Also it could be confusing for [2.0, 4.0].each { |t| Regexp.with_timeout(t) { /foo/ } } (it would either set it to 2.0 or to the global timeout, never to 4.0).

Furthermore, even if that worked, it would then break Regexp interning, where the timeout at one place would affect another literal Regexp with the same pattern.

Basically, I think there is no way besides Regexp.new(pattern, timeout: t) if you want a custom timeout for a Regexp.
Literal Regexp are created too early to set anything.

And adding state (timeout=) to Regexp feels wrong, since most instances are already immutable, and they might become all immutable.

I would suggest to close this, Regexp.new("a", timeout: 2.0) already works and I think there is no alternative that works well to set the timeout per Regexp.

Updated by janosch-x (Janosch Müller) about 3 years ago Actions
Copy link
#9 [ruby-core:113704]

I guess the only noteworthy argument for a change goes like this:

A custom timeout only being available on Regexp::new might lead people to write less performant code.

I'm not sure this is a strong argument. The big Regexps that take a noteworthy time to compile are often those with interpolation, as seen in the OP, and I assume these aren't so easy to pre-compile or deduplicate anyway.

Maybe there could be an instance method, Regexp#with_timeout, that creates and memoizes a copy with a specific timeout? This also sounds a bit hacky, though...

Updated by Eregon (Benoit Daloze) about 3 years ago Actions
Copy link
#10

Related to Feature #19720: Warning for non-linear Regexps added

Updated by nobu (Nobuyoshi Nakada) about 3 years ago Actions
Copy link
#11 [ruby-core:113838]

janosch-x (Janosch Müller) wrote in #note-9:

I guess the only noteworthy argument for a change goes like this:

A custom timeout only being available on Regexp::new might lead people to write less performant code.

I made a patch to improve Regexp.new(/RE/) (and Regexp#dup).
https://github.com/nobu/ruby/tree/re_copy

Comparison against master 441302be1a.

Iteration per second (i/s):

	compare-ruby	built-ruby
dup	16.684k	817.776k
	-	49.02x
string	16.520k	16.538k
	-	1.00x
regexp	16.365k	842.451k
	-	51.48x

I'm not sure this is a strong argument. The big Regexps that take a noteworthy time to compile are often those with interpolation, as seen in the OP, and I assume these aren't so easy to pre-compile or deduplicate anyway.

A regexp with interpolation can be replaced with Regexp.new at almost same performance.

Updated by byroot (Jean Boussier) about 3 years ago Actions
Copy link
#12 [ruby-core:113843]

I made a patch to improve Regexp.new(/RE/) (and Regexp#dup).

Interesting. Given that literal regexp are frozen, and even for unfrozen ones most of their state is immutable, have you considered using Copy on Write at the Ruby object level, like Array#dup / String#dup ?

Even if copying the bytes is relatively fast, if used in a tight loop it may cause some malloc churn.

Updated by Eregon (Benoit Daloze) about 3 years ago Actions
Copy link
#13 [ruby-core:113848]

janosch-x (Janosch Müller) wrote in #note-9:

A custom timeout only being available on Regexp::new might lead people to write less performant code.

I think it is very well known and easy to know though profiling that one should always cache the result of Regexp.new (e.g., in a constant).
The docs should point this out though.

I'm not sure this is a strong argument. The big Regexps that take a noteworthy time to compile are often those with interpolation, as seen in the OP, and I assume these aren't so easy to pre-compile or deduplicate anyway.

Actually TruffleRuby inline-caches interpolated Regexp creation, so if it's the same pattern then the same Regexp is used.
In the OP's case it seems easy to store the Regexp in a constant.

Maybe there could be an instance method, Regexp#with_timeout, that creates and memoizes a copy with a specific timeout? This also sounds a bit hacky, though...

I think timeouts are inefficient and insufficient to address ReDoS. IMO the real solution is #19720.

Updated by nobu (Nobuyoshi Nakada) about 3 years ago Actions
Copy link
#14 [ruby-core:113856]

byroot (Jean Boussier) wrote in #note-12:

I made a patch to improve Regexp.new(/RE/) (and Regexp#dup).

Interesting. Given that literal regexp are frozen, and even for unfrozen ones most of their state is immutable, have you considered using Copy on Write at the Ruby object level, like Array#dup / String#dup ?

Do you mean compiled pattern and so on in OnigRegexType?
They are never changed once initialized until destruction, "Copy-on-Write" won't be a proper word.
Currently timelimit is embedded at the same level as other fields, so the struct must be reconfigured to share other fields.

Even if copying the bytes is relatively fast, if used in a tight loop it may cause some malloc churn.

It is faster than re-parsing the source at least.

Updated by byroot (Jean Boussier) about 3 years ago Actions
Copy link
#15 [ruby-core:113857]

They are never changed once initialized until destruction, "Copy-on-Write" won't be a proper word.

Right, I just meant directly sharing that structure, or more accurately having some T_STRUCT simply keeping a reference to another T_STRUCT. Making Regexp.dup and Regexp.new(Regexp) a simple embeded T_STRUCT allocation, like for String#dup.

Actions

Copy link

Also available in: PDF Atom

Project

General

Profile

Ruby

Custom queries

Feature #19694

Add Regexp#timeout= setter

Abstract¶

Background¶

Proposal¶

Implementation¶

Evaluation¶

Discussion¶

Summary¶

Updated by austin (Austin Ziegler) about 3 years ago Actions
Copy link
#1 [ruby-core:113661]

Updated by byroot (Jean Boussier) about 3 years ago Actions
Copy link
#2 [ruby-core:113666]

Updated by nobu (Nobuyoshi Nakada) about 3 years ago Actions
Copy link
#3 [ruby-core:113668]

Updated by Eregon (Benoit Daloze) about 3 years ago Actions
Copy link
#4 [ruby-core:113671]

Updated by byroot (Jean Boussier) about 3 years ago Actions
Copy link
#5 [ruby-core:113672]

Updated by Eregon (Benoit Daloze) about 3 years ago Actions
Copy link
#6 [ruby-core:113674]

Updated by janosch-x (Janosch Müller) about 3 years ago Actions
Copy link
#7 [ruby-core:113697]

Updated by Eregon (Benoit Daloze) about 3 years ago Actions
Copy link
#8 [ruby-core:113701]

Updated by janosch-x (Janosch Müller) about 3 years ago Actions
Copy link
#9 [ruby-core:113704]

Updated by Eregon (Benoit Daloze) about 3 years ago Actions
Copy link
#10

Updated by nobu (Nobuyoshi Nakada) about 3 years ago Actions
Copy link
#11 [ruby-core:113838]

Updated by byroot (Jean Boussier) about 3 years ago Actions
Copy link
#12 [ruby-core:113843]

Updated by Eregon (Benoit Daloze) about 3 years ago Actions
Copy link
#13 [ruby-core:113848]

Updated by nobu (Nobuyoshi Nakada) about 3 years ago Actions
Copy link
#14 [ruby-core:113856]

Updated by byroot (Jean Boussier) about 3 years ago Actions
Copy link
#15 [ruby-core:113857]

Project

General

Profile

Ruby

Custom queries

Feature #19694

Add Regexp#timeout= setter

Abstract¶

Background¶

Proposal¶

Implementation¶

Evaluation¶

Discussion¶

Summary¶

Updated by austin (Austin Ziegler) about 3 years ago ActionsCopy link #1 [ruby-core:113661]

Updated by byroot (Jean Boussier) about 3 years ago ActionsCopy link #2 [ruby-core:113666]

Updated by nobu (Nobuyoshi Nakada) about 3 years ago ActionsCopy link #3 [ruby-core:113668]

Updated by Eregon (Benoit Daloze) about 3 years ago ActionsCopy link #4 [ruby-core:113671]

Updated by byroot (Jean Boussier) about 3 years ago ActionsCopy link #5 [ruby-core:113672]

Updated by Eregon (Benoit Daloze) about 3 years ago ActionsCopy link #6 [ruby-core:113674]

Updated by janosch-x (Janosch Müller) about 3 years ago ActionsCopy link #7 [ruby-core:113697]

Updated by Eregon (Benoit Daloze) about 3 years ago ActionsCopy link #8 [ruby-core:113701]

Updated by janosch-x (Janosch Müller) about 3 years ago ActionsCopy link #9 [ruby-core:113704]

Updated by Eregon (Benoit Daloze) about 3 years ago ActionsCopy link #10

Updated by nobu (Nobuyoshi Nakada) about 3 years ago ActionsCopy link #11 [ruby-core:113838]

Updated by byroot (Jean Boussier) about 3 years ago ActionsCopy link #12 [ruby-core:113843]

Updated by Eregon (Benoit Daloze) about 3 years ago ActionsCopy link #13 [ruby-core:113848]

Updated by nobu (Nobuyoshi Nakada) about 3 years ago ActionsCopy link #14 [ruby-core:113856]

Updated by byroot (Jean Boussier) about 3 years ago ActionsCopy link #15 [ruby-core:113857]

Updated by austin (Austin Ziegler) about 3 years ago Actions
Copy link
#1 [ruby-core:113661]

Updated by byroot (Jean Boussier) about 3 years ago Actions
Copy link
#2 [ruby-core:113666]

Updated by nobu (Nobuyoshi Nakada) about 3 years ago Actions
Copy link
#3 [ruby-core:113668]

Updated by Eregon (Benoit Daloze) about 3 years ago Actions
Copy link
#4 [ruby-core:113671]

Updated by byroot (Jean Boussier) about 3 years ago Actions
Copy link
#5 [ruby-core:113672]

Updated by Eregon (Benoit Daloze) about 3 years ago Actions
Copy link
#6 [ruby-core:113674]

Updated by janosch-x (Janosch Müller) about 3 years ago Actions
Copy link
#7 [ruby-core:113697]

Updated by Eregon (Benoit Daloze) about 3 years ago Actions
Copy link
#8 [ruby-core:113701]

Updated by janosch-x (Janosch Müller) about 3 years ago Actions
Copy link
#9 [ruby-core:113704]

Updated by Eregon (Benoit Daloze) about 3 years ago Actions
Copy link
#10

Updated by nobu (Nobuyoshi Nakada) about 3 years ago Actions
Copy link
#11 [ruby-core:113838]

Updated by byroot (Jean Boussier) about 3 years ago Actions
Copy link
#12 [ruby-core:113843]

Updated by Eregon (Benoit Daloze) about 3 years ago Actions
Copy link
#13 [ruby-core:113848]

Updated by nobu (Nobuyoshi Nakada) about 3 years ago Actions
Copy link
#14 [ruby-core:113856]

Updated by byroot (Jean Boussier) about 3 years ago Actions
Copy link
#15 [ruby-core:113857]