Bug #21870
Open
Regexp: Warnings when using slightly overlapping \p{...} classes
Description
$VERBOSE = true
# warning: character class has duplicated range: /[\p{Word}\p{S}]/
regex = /[\p{Word}\p{S}]/
As far as I can tell, this is a perfectly valid and non-redundant set of Unicode properties, but I am still being spammed with warnings. Using /(?:\p{Word}|\p{S})/ works around the warning, but it is slower (see benchmarks below) and also less clear.
The two classes do overlap somewhat, but I think the deeper issue is that there is no convenient way to express this union without falling back to raw Unicode ranges.
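To make the comparison concrete, here is a minimal sketch that compiles both spellings at run time, prints whatever warning each one produces, and checks that they agree on a few sample characters. The helper `regexp_warnings` and the constant names are hypothetical, introduced only for this illustration. The warning text you see depends on the Ruby version, so no output is shown for it.

```ruby
require "stringio"

# Hypothetical helper (not part of Ruby): compile `source` at run time with
# $VERBOSE enabled and return whatever was written to $stderr meanwhile.
def regexp_warnings(source)
  old_verbose = $VERBOSE
  old_stderr = $stderr
  $VERBOSE = true
  $stderr = StringIO.new
  Regexp.new(source)
  $stderr.string
ensure
  $VERBOSE = old_verbose
  $stderr = old_stderr
end

CLASS_SOURCE = '[\p{Word}\p{S}]'
ALT_SOURCE   = '(?:\p{Word}|\p{S})'

# Print any warning each spelling produces (varies by Ruby version).
[CLASS_SOURCE, ALT_SOURCE].each do |source|
  puts "#{source}: #{regexp_warnings(source).inspect}"
end

CLASS_RE = Regexp.new(CLASS_SOURCE)
ALT_RE   = Regexp.new(ALT_SOURCE)

# A letter, a digit, an underscore, a plus sign, a euro sign (U+20AC),
# a space, and an exclamation mark.
SAMPLES = ["a", "7", "_", "+", "\u20AC", " ", "!"]

# Both spellings should accept and reject the same sample characters.
SAME_RESULTS = SAMPLES.all? { |ch| CLASS_RE.match?(ch) == ALT_RE.match?(ch) }
puts "same results on samples: #{SAME_RESULTS}"
```

This only spot-checks a handful of characters; it does not prove the two patterns are equivalent over all of Unicode.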
For a similar example, consider /[\p{Word}\p{Cf}]/, where the two classes overlap only on ZWJ and ZWNJ. Even with this very small overlap, Ruby issues a warning, even though neither class can be removed without changing the meaning of the regexp. The regexp is valid and, as far as I can tell, has no practical issues: Onigmo seems to be capable of merging overlapping codepoint ranges.
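One way to check the size of that overlap yourself is to scan every code point and list the ones matched by both properties. This is a rough sketch: `overlap` and the constant names are hypothetical, and the result depends on the Unicode data bundled with your Ruby, so no expected output is shown.

```ruby
WORD_CHARS   = /\p{Word}/
FORMAT_CHARS = /\p{Cf}/

# Hypothetical helper: return the code points in `range` that both patterns match.
def overlap(range, first, second)
  range.select do |cp|
    # Surrogates cannot be encoded as valid UTF-8 strings, so skip them.
    next false if (0xD800..0xDFFF).cover?(cp)

    ch = [cp].pack("U")
    first.match?(ch) && second.match?(ch)
  end
end

# Scan all of Unicode; this takes a moment because it tests every code point.
SHARED = overlap(0x0000..0x10FFFF, WORD_CHARS, FORMAT_CHARS)
puts SHARED.map { |cp| format("U+%04X", cp) }.join(" ")
```

ZWNJ is U+200C and ZWJ is U+200D, so those are the values to look for in the printed list.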
This warning was introduced back in 2009 with #1831, to help surface instances of things like /[:lower:]/ instead of /[[:lower:]]/, but even then the reporter suggested only warning if the class both begins and ends with :.
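For reference, the mistake that #1831 targets looks like this, followed by a rough sketch of the narrower check suggested there (flag only a top-level class that begins with "[:" and ends with ":]"). The method `suspicious_posix_bracket?` is hypothetical and deliberately simplistic; it is not how Onigmo or any existing linter works, and it ignores edge cases such as a literal "]" as the first character of a class.

```ruby
BARE   = Regexp.new('[:lower:]')    # a plain set containing ':', 'l', 'o', 'w', 'e', 'r'
NESTED = Regexp.new('[[:lower:]]')  # the POSIX lowercase class

puts BARE.match?("a")    # false: "a" is not one of : l o w e r
puts BARE.match?(":")    # true
puts NESTED.match?("a")  # true
puts NESTED.match?(":")  # false

# Hypothetical check: does `source` contain a top-level bracket expression
# that begins with "[:" and ends with ":]"?
def suspicious_posix_bracket?(source)
  depth = 0
  i = 0
  while i < source.length
    case source[i]
    when "\\"
      i += 1 # skip the escaped character
    when "["
      if depth.zero? && source[i + 1] == ":"
        close = source.index("]", i)
        return true if close && close > i + 2 && source[close - 1] == ":"
      end
      depth += 1
    when "]"
      depth -= 1 if depth.positive?
    end
    i += 1
  end
  false
end

puts suspicious_posix_bracket?('[:lower:]')        # true
puts suspicious_posix_bracket?('[[:lower:]]')      # false
puts suspicious_posix_bracket?('[\p{Word}\p{S}]')  # false
```

Under this narrower rule the original typo is still caught, while overlapping \p{...} classes pass silently.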
Is it appropriate to warn here? Is this a job better left to a static linter like RuboCop, which didn't exist at the time #1831 was opened? Or would it perhaps be better to warn only in the very specific case that #1831 was opened to address?