Bug #21503

closed

\p{Word} does not match on \p{Join_Control} while docs say it does

Added by procmarco (Marco Concetto Rudilosso) 7 days ago. Updated 4 days ago.

Status: Closed
Assignee: -
Target version: -
[ruby-core:122665]

Description

The docs say that \p{Word} matches the equivalent of [\p{M}\p{Nd}\p{Pc}\p{Alpha}\p{Join_Control}], which is also how the Unicode spec defines it.

However, that does not seem to be the case:

irb(main):018> REGEX = /\p{Word}/u
=> /\p{Word}/
irb(main):019> "\u200D".gsub(REGEX, "-")
=> "‍"
irb(main):020> REGEX2 = /\p{Join_Control}/u
=> /\p{Join_Control}/
irb(main):021> "\u200D".gsub(REGEX2, "-")
=> "-"

There are two possible solutions here: change either the docs or the code.
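
To make the mismatch easier to check on a given Ruby build, here is a minimal sketch that compares \p{Word} with the character class the docs describe, for both Join_Control characters (U+200C and U+200D). The names DOCUMENTED_WORD, JOIN_CONTROLS and word_property_report are made up for this illustration and are not part of Ruby. The value in the `word` column depends on the Ruby version being run, so no expected output is given for it.

```ruby
# The equivalence the docs describe for \p{Word}, written out explicitly.
# (DOCUMENTED_WORD is a hypothetical name used only in this sketch.)
DOCUMENTED_WORD = /[\p{M}\p{Nd}\p{Pc}\p{Alpha}\p{Join_Control}]/

# The two Join_Control characters:
# U+200C ZERO WIDTH NON-JOINER and U+200D ZERO WIDTH JOINER.
JOIN_CONTROLS = ["\u200C", "\u200D"].freeze

# For each character, record whether \p{Word} and the documented
# equivalent class match it.
def word_property_report(chars)
  chars.map do |ch|
    {
      codepoint: format("U+%04X", ch.ord),
      word: ch.match?(/\p{Word}/),           # result varies by Ruby version
      documented: ch.match?(DOCUMENTED_WORD) # matches via \p{Join_Control}
    }
  end
end

word_property_report(JOIN_CONTROLS).each { |row| puts row }
```

If `word` and `documented` differ for a row, that build shows the behavior reported here.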


Related issues 1 (0 open, 1 closed)

Related to Ruby - Bug #19417: Regexp \p{Word} and [[:word:]] do not match Unicode Other_Number character (Closed)

Updated by procmarco (Marco Concetto Rudilosso) 7 days ago

What I mean is that the current implementation of \p{Word} does not seem to match \p{Join_Control} characters, even though it should and the docs say it does.

Updated by mame (Yusuke Endoh) 7 days ago

  • Related to Bug #19417: Regexp \p{Word} and [[:word:]] do not match Unicode Other_Number character added

Updated by naruse (Yui NARUSE) 4 days ago

It looks like \p{Word} was updated in TR#18 Version 15.
https://www.unicode.org/reports/tr18/tr18-15.html
The fix looks good.

Updated by hsbt (Hiroshi SHIBATA) 4 days ago

  • Status changed from Open to Closed
  • Backport changed from 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN to 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: REQUIRED