Bug #14137
closed
Windows / MinGW - Regexp - Character Properties - General Category
Description
While testing RDoc on AppVeyor with the recently added literals.kpeg file, I hit several errors across Ruby versions 2.2 through trunk.
It seems that the \p{} constructs listed here under 'General Category' generate an "invalid character property name {**}" error for many of the listed constructs.
Conversely, the constructs listed before them (e.g. \p{Alpha}, \p{Lower}, \p{Space}, etc.) seem to work.
I briefly looked at the regexp tests, and they don't seem to test these.
Are these unavailable on Windows?
Updated by duerst (Martin Dürst) over 6 years ago
There is a C preprocessor flag USE_UNICODE_PROPERTIES that is used e.g. in enc/unicode/10.0.0/name2ctype.h. I have never actually seen this, but it may be possible that your version of Ruby is compiled without this flag on. I don't see any reason why this should be Windows-specific; these properties are useful independent of the OS.
Updated by jeremyevans0 (Jeremy Evans) almost 3 years ago
- Status changed from Open to Closed
I tested this using RubyInstaller versions on Windows. This appears related to regexp encoding, and not a bug, with the same behavior between Ruby 2.0 and 3.0:
C:\>c:\Ruby30-x64\bin\ruby -e "p(/\p{L}/.match('a'))"
-e:1: invalid character property name {L}: /\p{L}/
C:\>c:\Ruby30-x64\bin\ruby -e "p(/\p{L}/u.match('a'))"
#<MatchData "a">
C:\>c:\Ruby30-x64\bin\ruby -Ku -e "p(/\p{L}/.match('a'))"
#<MatchData "a">
C:\>c:\Ruby200-x64\bin\ruby -e "p(/\p{L}/.match('a'))"
-e:1: invalid character property name {L}: /\p{L}/
C:\>c:\Ruby200-x64\bin\ruby -e "p(/\p{L}/u.match('a'))"
#<MatchData "a">
C:\>c:\Ruby200-x64\bin\ruby -Ku -e "p(/\p{L}/.match('a'))"
#<MatchData "a">
The documentation for this feature (https://docs.ruby-lang.org/en/master/doc/regexp_rdoc.html#label-Character+Properties) says: "A Unicode character's General Category value can also be matched", which I think implies this should only work for Unicode regexps, and not other regexps. So I think the current behavior is expected and not a bug.
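The encoding dependence can be reproduced on any platform, without a Windows console, by tagging the pattern source with an explicit encoding before compiling it. The following is only a minimal sketch: property_supported? is a hypothetical helper written for this illustration (it is not part of Ruby or RDoc), and Windows-1252 is an assumed stand-in for a typical non-Unicode Windows locale encoding.

```ruby
# Hypothetical helper (not part of Ruby or RDoc): reports whether a pattern
# compiles when its source string is tagged with the given encoding.
def property_supported?(pattern, encoding)
  # dup so that a frozen string literal can be re-tagged safely
  Regexp.new(pattern.dup.force_encoding(encoding))
  true
rescue RegexpError
  # e.g. "invalid character property name {L}"
  false
end

# General Category property with a Unicode source encoding
p property_supported?('\p{L}', Encoding::UTF_8)

# Same property with an assumed non-Unicode (Windows-style) source encoding
p property_supported?('\p{L}', Encoding::Windows_1252)

# POSIX-style property name with the same non-Unicode encoding
p property_supported?('\p{Alpha}', Encoding::Windows_1252)
```

If the analysis above is right, only the second call should report a failure, which matches the pattern in the original report: \p{Alpha} compiles in a non-Unicode regexp while \p{L} does not.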
Updated by duerst (Martin Dürst) over 2 years ago
I agree with @jeremyevans0 (Jeremy Evans), but would like to add that
ruby -e 'p (/\p{L}/.match("a"))'
will also produce #<MatchData "a"> in any environment that uses UTF-8. That covers almost all current Linux/Unix, ... versions, and also Windows if you first run the command chcp 65001.
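For code that has to run the same way whatever the console code page is, the /u flag shown in the transcripts above seems the more portable choice, since it fixes the regexp's encoding to UTF-8 instead of inheriting it from the source or locale. A small sketch, where LETTER and first_letter are hypothetical names used only for this illustration:

```ruby
# The /u flag fixes this regexp's encoding to UTF-8, so the General Category
# property \p{L} is available regardless of the source or locale encoding.
LETTER = /\p{L}/u

# Hypothetical helper: returns the first letter found in text, or nil.
def first_letter(text)
  m = LETTER.match(text)
  m && m[0]
end

p LETTER.encoding          # the encoding the regexp was compiled with
p first_letter('a')        # ASCII letter
p first_letter("1\u00e9")  # non-ASCII letter after a digit
p first_letter('123')      # no letter at all
```

Running chcp 65001 changes the environment for everything in that console session, while /u only affects the one pattern, so which of the two fits better depends on how much of the program relies on Unicode properties.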