Bug #21870: Regexp: Warnings when using slightly overlapping \p{...} classes

Added by jneen (Jeanine Adkisson) 16 days ago. Updated 6 days ago.

Status: Open
Assignee: -
Target version: -
[ruby-core:124714]

Description

$VERBOSE = true
# warning: character class has duplicated range: /[\p{Word}\p{S}]/
regex = /[\p{Word}\p{S}]/

As far as I can tell, this is a perfectly valid and non-redundant set of Unicode properties, but I am still being spammed with warnings. Using /(?:\p{Word}|\p{S})/ is a workaround of sorts, but it is slower (see benchmarks below) and also less clear.

They do overlap somewhat, but I think the deeper issue is that there is no convenient way to express this without falling back to raw Unicode ranges.
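As a rough spot check of the claim that the alternation workaround matches the same characters as the bracketed class, the following sketch compares the two forms on a handful of sample characters. The variable names and the sample list are illustrative assumptions, not part of the original report, and agreement on a few samples is of course not a proof of equivalence.

```ruby
# Two spellings of "word character or symbol" (variable names are illustrative).
class_form       = Regexp.new('[\p{Word}\p{S}]')
alternation_form = Regexp.new('(?:\p{Word}|\p{S})')

# A small, arbitrary sample: letters, digits, symbols, whitespace, punctuation.
samples = ["a", "_", "9", "+", "$", "\u00A9", " ", "!", "\n"]

# True when both patterns give the same answer for every sample character.
agree = samples.all? do |ch|
  class_form.match?(ch) == alternation_form.match?(ch)
end

puts agree
```

Only the bracketed form is reported to trigger the warning under $VERBOSE = true; the matching behaviour of the two forms is expected to be the same.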

For a similar example, consider /[\p{Word}\p{Cf}]/, whose two classes overlap precisely on ZWJ and ZWNJ. Even with this very small overlap, Ruby issues a warning, even though neither class can be removed without changing the meaning of the regexp. The regexp is valid and, as far as I can tell, has no practical issues: Onigmo seems to be capable of handling overlapping codepoint ranges.
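One way to inspect such an overlap is to list the code points that both properties match. The sketch below does this for the General Punctuation block, which contains ZWNJ (U+200C) and ZWJ (U+200D). The helper name `overlap` is hypothetical, and the result may differ between Ruby versions, since whether \p{Word} matches the Join_Control characters is the subject of the related Bug #21503.

```ruby
# Hypothetical helper: code points in `range` that both patterns match.
def overlap(pattern_a, pattern_b, range)
  range.select do |cp|
    ch = [cp].pack("U")                   # encode the code point as UTF-8
    next false unless ch.valid_encoding?  # skip surrogates and the like
    pattern_a.match?(ch) && pattern_b.match?(ch)
  end
end

word_chars   = Regexp.new('\p{Word}')
format_chars = Regexp.new('\p{Cf}')

# General Punctuation block (U+2000..U+206F).
shared = overlap(word_chars, format_chars, 0x2000..0x206F)
puts shared.map { |cp| format("U+%04X", cp) }.join(" ")
```

Passing 0..0x10FFFF as the range would cover all of Unicode, at the cost of roughly two million match calls.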

This warning was introduced back in 2009 with #1831, to help surface mistakes such as writing /[:lower:]/ instead of /[[:lower:]]/, but even then the reporter suggested only warning if the class both begins and ends with :.
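To illustrate the mistake that warning was meant to catch, here is a minimal sketch of how the two spellings differ, assuming the usual behaviour that /[:lower:]/ is parsed as a plain set of the characters ":", "l", "o", "w", "e" and "r". The variable names are illustrative.

```ruby
# A plain character set containing ':', 'l', 'o', 'w', 'e', 'r'.
char_set = Regexp.new('[:lower:]')

# The POSIX bracket expression for lowercase letters.
posix_class = Regexp.new('[[:lower:]]')

["a", "l", ":", "Z"].each do |ch|
  puts format("%-4s char_set=%-5s posix_class=%s",
              ch.inspect, char_set.match?(ch), posix_class.match?(ch))
end
```

The repeated ":" in the first pattern is what makes the set contain a duplicate, which is how the warning surfaces this typo.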

Is it appropriate to warn here? Is this a job best left to a static linter like RuboCop, which didn't exist at the time #1831 was opened? Or would it perhaps be better to warn only in the very specific case that #1831 was opened to address?


Related issues 1 (0 open, 1 closed)

Related to Ruby - Bug #21503: \p{Word} does not match on \p{Join_Control} while docs say it does (Closed)