Bug #4014

closed

Case-Sensitivity of Property Names Depends on Regexp Encoding

Added by runpaint (Run Paint Run Run) over 13 years ago. Updated about 13 years ago.

Status:
Closed
Target version:
-
ruby -v:
ruby 1.9.3dev (2010-10-28 trunk 29616) [x86_64-linux]
Backport:
[ruby-core:33000]

Description

=begin
A ticket filed against Read Ruby reminded me of the following inconsistency: in Unicode regexps, property names are case-insensitive; in all other encodings, property names are case-sensitive. The confusion was compounded by the reporter's IRB using UTF-8 for regexps while external scripts used US-ASCII: a seemingly identical pattern succeeded in the former case but failed in the latter.

 run@paint:~$ ruby -e 'p /\p{ascii}/u'
 /\p{ascii}/
 run@paint:~$ ruby -e 'p /\p{ascii}/n'
 -e:1: invalid character property name {ascii}: /\p{ascii}/
 run@paint:~$ ruby -e 'p /\p{ASCII}/n'
 /\p{ASCII}/n
 run@paint:~$ ruby -e 'p /\p{ASCII}/u'
 /\p{ASCII}/
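The same check can be scripted so that it is easy to rerun on other builds. This is only a sketch: the helper name compiles? and the constants UNICODE and BINARY are mine, and I am assuming that Regexp::FIXEDENCODING with a UTF-8 source string is comparable to the /u flag and Regexp::NOENCODING to the /n flag. The results depend on the Ruby version, so no expected output is shown.

```ruby
# Hypothetical helper: true if the pattern source compiles under the given
# options, false if the regexp engine rejects it (e.g. an invalid property name).
def compiles?(source, options)
  Regexp.new(source, options)
  true
rescue RegexpError
  false
end

UNICODE = Regexp::FIXEDENCODING # assumption: with a UTF-8 source, comparable to /u
BINARY  = Regexp::NOENCODING    # assumption: comparable to /n

# Try each spelling of the property name under both encodings.
RESULTS = %w[ascii ASCII].flat_map do |name|
  source = "\\p{#{name}}".encode('UTF-8')
  [[name, :u, compiles?(source, UNICODE)],
   [name, :n, compiles?(source, BINARY)]]
end

RESULTS.each do |name, flag, ok|
  puts format('\p{%s} with /%s: %s', name, flag, ok ? 'compiles' : 'rejected')
end
```

If the inconsistency is present, only the lowercase name under /n should be reported as rejected.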

All regexps, regardless of their encoding, support the POSIX bracket names, e.g. xdigit, as properties with the \p{} and \P{} escapes. Unicode regexps normalise the property name by converting it to lowercase and ignoring ' ' and '_'; the other encodings perform no such normalisation. Accordingly, a \p{posix} escape, where posix is a name defined in http://www.opengroup.org/onlinepubs/007908799/xbd/re.html , is case-sensitive in all non-Unicode encodings. Note that this also affects encodings that have other property names in common with Unicode. For example, both Shift_JIS and Unicode define Katakana and Hiragana, yet only Unicode ignores case.

I would prefer that \p{} and \P{} always ignore the case of their arguments. Unicode regexps would override this behaviour so as to ignore ' ' and '_', too.
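To make the proposal concrete, here is a minimal sketch of the name normalisation I have in mind, written in plain Ruby. The method normalize_property_name and its unicode: keyword are hypothetical names for illustration only; this is not the regexp engine's actual lookup code.

```ruby
# Hypothetical model of the proposed lookup key for a \p{...} property name.
# Every encoding folds case; Unicode regexps additionally drop ' ' and '_'.
def normalize_property_name(name, unicode: false)
  key = name.downcase
  key = key.delete(' _') if unicode # remove spaces and underscores
  key
end

normalize_property_name('ASCII')                     # => "ascii"
normalize_property_name('Old_Italic')                # => "old_italic"
normalize_property_name('Old_Italic', unicode: true) # => "olditalic"
```

Under this scheme \p{ascii} and \p{ASCII} would resolve to the same property in every encoding, while spellings that differ only by ' ' or '_' would remain interchangeable in Unicode regexps alone.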
=end
