Eregon (Benoit Daloze) wrote in #note-2:
Maybe an easier/better/more portable way to do that would be to have a flag to check that all Regexp are linear when they are executed, e.g., by the test suite?
Like https://speakerdeck.com/eregon/just-in-time-compiling-ruby-regexps-on-truffleruby?slide=19
Maybe it could be a performance warning (or even its own warning category) and then you could easily tweak the behavior, e.g. raise an exception if there is such a warning.
...
It could also be a flag that can be set directly by Ruby code, not necessarily or only a command-line flag.
Yes, I like that a lot. Perhaps something like Regexp.warn_nonlinear = true
or Regexp.on_nonlinear = ->re { warn "nonlinear" }
. And I think it's especially useful to warn or raise when the regexp is created, not only when it's executed. But that would be a different issue from this one.
When discussing this particular use case with others, several people suggested new regexp options to either enforce linear runtime when the regexp is created, or to allow non-linear regexps without warnings or exceptions (e.g. for simple scripts or when the inputs are safely controlled), e.g. /enforced linear/l
and /allow non-linear/L
. But that would also be a different issue from this one. ;)
Eregon (Benoit Daloze) wrote in #note-4:
one concern I have with https://github.com/ruby/net-imap/blob/92db350b24c388d2a2104f36cac9caa49a1044df/test/net/imap/regexp_collector.rb is it could only work CRuby due to using RubyVM::InstructionSequence.
That seems problematic for a security thing that really isn't CRuby-specific.
It also doesn't work with CRuby 3.0 or 3.1, and it isn't inspecting the iseq for procs that are stored as constants, etc. It can't catch everything, but it's a useful first step: it already found three existing non-linear regexps, and prevented a PR that would have added a fourth. But improvements to that test are also a different issue from this one.
I assume TruffleRuby has a comparable mechanism to find regexps within the methods that define them, and PRs are welcome. But TruffleRuby already has a built in approach for warning on non-linear regexps are executed, so it's ahead on this. We should probable enable that TruffleRuby config flag in CI... but that's a different issue from this one. ;)
byroot (Jean Boussier) wrote in #note-3:
Or iterate over all regexps through ObjectSpace.each_object(Regexp)
? But I suppose it doesn't allow to only look at a specific namespace.
Yes, that is one of the reasons I didn't want to use that approach in the tests. Related but more significant for my use-case: ObjectSpace doesn't tell me what created the regexp or is holding a reference to any regexps I find. But it's still a useful technique that I used to manually gauge how thorough the tests were.
Another approach would be to use TracePoint while loading and/or testing the library. I think that should allow limiting by namespace... but it's still a different ticket from this one.