Feature #21636

Proposal to Introduce a Dedicated Warning Category for Regular Expressions

Added by alexanderadam (Alexander Adam) about 22 hours ago. Updated about 5 hours ago.

Status: Open
Assignee: -
Target version: -
[ruby-core:<unknown>]

Description

Hi folks,

while working on adding regex support to the marcel gem (see PR: https://github.com/rails/marcel/pull/132), I ran into regex warnings triggered by some of Tika's regular expressions. These warnings are usually valid, but in this case I'd like to silence them for a single call.

Currently, to suppress these warnings, I override $VERBOSE. That silences all warnings, including potentially important ones, which makes it a blunt instrument that might mask unrelated issues.
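A minimal sketch of the blunt approach described above, assuming a hypothetical helper named `silence_warnings` (not part of marcel): it sets `$VERBOSE` to `nil` for the duration of a block and restores the previous value afterwards, so every warning raised inside the block is dropped, not only the regex ones.

```ruby
# Hypothetical helper: silences *all* warnings for the duration of the
# block by setting $VERBOSE to nil (the setting at which Ruby emits none).
def silence_warnings
  previous = $VERBOSE
  $VERBOSE = nil
  yield
ensure
  # Restore the caller's verbosity even if the block raises.
  $VERBOSE = previous
end

silence_warnings do
  Regexp.new("[aa]") # any warning emitted here is suppressed, regex or not
end
```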

Fortunately, Ruby already supports dedicated warning categories, as detailed in these discussions and pull requests:

Given this existing framework, I would like to propose introducing a dedicated warning category for regular expressions. This would allow libraries and applications to selectively suppress only regex-related warnings, for example with Warning[:regexp] = false.

Such a feature would provide more granular control over warnings without compromising the visibility of other important warnings.
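For comparison, the category API that already exists can be exercised as below. This is a sketch assuming Ruby 3.0 or later, where `Kernel#warn` accepts `category:`. `:deprecated` is a real category; `:regexp` is only the one proposed here, so looking it up currently raises `ArgumentError`.

```ruby
# Toggle an existing warning category (Ruby 3.0+).
previous = Warning[:deprecated]

Warning[:deprecated] = true
warn("shown: deprecated category enabled", category: :deprecated)

Warning[:deprecated] = false
warn("hidden: deprecated category disabled", category: :deprecated)

Warning[:deprecated] = previous

# :regexp is the proposed category; it does not exist (yet),
# so asking for it raises ArgumentError.
regexp_category_supported =
  begin
    Warning[:regexp]
    true
  rescue ArgumentError
    false
  end
puts "regexp category supported? #{regexp_category_supported}"
```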

Actions #1

Updated by mame (Yusuke Endoh) about 10 hours ago

Could you please elaborate on your use case?

I'm not entirely clear on which specific warnings you want to suppress. Looking at the patch's comments, it seems you want to suppress warnings about "character class overlaps." This warning should be issued when the regular expression is compiled. However, the guard is placed around the call to String#match?. Does String#match? ever issue warnings about character class overlaps?
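A small experiment to check where the warning fires, sketched with a hypothetical `WarningRecorder` module that captures messages instead of printing them. It enables `$VERBOSE` so that verbose-only warnings are not filtered out, then compares the number of warnings recorded during compilation with the number recorded during matching. Note that it swallows every warning in the process once prepended, so it is only meant as a demonstration.

```ruby
# Hypothetical helper: records warning messages instead of printing them,
# so we can see at which step a warning is emitted.
module WarningRecorder
  def self.messages
    @messages ||= []
  end

  def warn(msg, category: nil)
    WarningRecorder.messages << msg # swallowed, not printed
  end
end
Warning.singleton_class.prepend(WarningRecorder)

# Enable verbose mode so verbose-only warnings are not filtered out.
$VERBOSE = true

before_compile = WarningRecorder.messages.size
re = Regexp.new("[aa]") # compile step
after_compile = WarningRecorder.messages.size
re.match?("a")          # match step
after_match = WarningRecorder.messages.size

puts "warnings while compiling: #{after_compile - before_compile}"
puts "warnings while matching:  #{after_match - after_compile}"
```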


Are you looking to suppress all warnings related to regular expressions? For instance, a future warning might be added to announce that "the meaning of this regular expression will change in the future." Would you really want to suppress that kind of warning as well?

If you only want to suppress specific warnings about character class overlaps, and not all regexp-related warnings, overriding the Warning.warn method might be a suitable solution.

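# Drop only the duplicated-range warning; pass everything else through.
def Warning.warn(msg)
  return if msg.include?("warning: character class has duplicated range")
  super(msg)
end

Regexp.new("[aa]") # this warning, if emitted, is filtered out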
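The override above applies process-wide. If the goal is to silence the warning for a single call only, one possible sketch is to gate the filter on a fiber-local flag. `RegexpWarningFilter` and the `:suppress_regexp_dup_warnings` key are hypothetical names, and the sketch assumes Ruby 3.0+, where `Warning.warn` accepts `category:`. Because the warning is emitted at compile time, the block wraps `Regexp.new`, not the later `match?`.

```ruby
# Hypothetical filter: drops only duplicated-range warnings, and only
# while a block passed to RegexpWarningFilter.suppress is running.
module RegexpWarningFilter
  PATTERN = "character class has duplicated range"

  def self.suppress
    previous = Thread.current[:suppress_regexp_dup_warnings] # fiber-local
    Thread.current[:suppress_regexp_dup_warnings] = true
    yield
  ensure
    Thread.current[:suppress_regexp_dup_warnings] = previous
  end

  def warn(msg, category: nil)
    return if Thread.current[:suppress_regexp_dup_warnings] && msg.include?(PATTERN)
    super # forwards msg and category: to the original Warning.warn
  end
end
Warning.singleton_class.prepend(RegexpWarningFilter)

# Only compilation needs wrapping; matching does not warn.
re = RegexpWarningFilter.suppress { Regexp.new("[aa]") }
re.match?("a")
```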

By the way, this might be unsolicited advice, but just in case:

Are you trying to run regular expressions written for Java's regex engine with Ruby's regex engine? If so, are you sure there are no issues with incompatibilities between the regular expressions?

For example, the regex /^bar/ will match "foo\nbar" in Ruby, but it seems it won't match by default in Java. There appear to be many other subtle incompatibilities as well.
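A short Ruby illustration of the anchor difference mentioned above. The mapping to Java is an approximation: `\A` and `\Z` are used here as the Ruby counterparts of Java's default (non-`MULTILINE`) `^` and `$`.

```ruby
text = "foo\nbar"

# In Ruby, ^ and $ always match at line boundaries...
p text.match?(/^bar/)  # => true
p text.match?(/foo$/)  # => true

# ...while \A and \Z anchor to the start and end of the whole string
# (\Z also allows one trailing newline), roughly like Java's default ^ and $.
p text.match?(/\Abar/) # => false
p text.match?(/foo\Z/) # => false
```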

Rather than blindly importing Apache Tika's regular expressions, I think a better approach would be to review them individually and modify them as necessary (including fixing duplicated ranges in character classes).

Actions #2

Updated by alexanderadam (Alexander Adam) about 5 hours ago

Thank you so much for your feedback! 🙏

You were absolutely right: I simplified the PR so that it now explicitly runs only on the tested types, which makes the outcome far more predictable.
So I'm no longer so keen on my proposal. 😉
