Feature #19930

open

[Documentation] class Regexp: Character Classes ranges

Added by noraj-acceis (Alexandre ZANNI) about 1 year ago. Updated about 1 year ago.

Status: Open
Assignee: -
Target version: -
[ruby-core:115070]

Description

cf. https://ruby-doc.org/3.2.2/Regexp.html#class-Regexp-label-Character+Classes

POSIX bracket expressions are also similar to character classes. They provide a portable alternative to the above, with the added benefit that they encompass non-ASCII characters. For instance, /\d/ matches only the ASCII decimal digits (0-9); whereas /[[:digit:]]/ matches any character in the Unicode Nd category.

Reading this description, one would expect the shorthand metacharacters to be ASCII-only and the POSIX bracket expressions to be Unicode-aware. But because bracket expressions follow POSIX, this is not uniformly true. For example, [:xdigit:] matches only the ASCII range [A-Fa-f0-9], not the Hex_Digit Unicode property, which also includes characters from the Halfwidth and Fullwidth Forms block such as U+FF10 (FULLWIDTH DIGIT ZERO). The description is therefore confusing, since it suggests that [[:xdigit:]] encompasses non-ASCII characters too. Conversely, [:space:] matches [\p{Z}\t\r\n\v\f] (that is, \s plus \p{Z} (Separator)), while its description mentions only [:blank:], newline, and carriage return.
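To make the inconsistency concrete, here is a minimal sketch that probes a few sample characters against the shorthand and POSIX forms. The `CLASSES` table and the `matching_classes` helper are hypothetical names introduced for illustration only; they are not part of Ruby. The strings are assumed to be UTF-8, which is Ruby's default source encoding.

```ruby
# Sample characters chosen for illustration.
FULLWIDTH_ZERO = "\uFF10" # FULLWIDTH DIGIT ZERO, Unicode category Nd
NBSP           = "\u00A0" # NO-BREAK SPACE, Unicode category Zs

# Hypothetical lookup table: display name => pattern under test.
CLASSES = {
  '\d'           => /\d/,
  '[[:digit:]]'  => /[[:digit:]]/,
  '\h'           => /\h/,
  '[[:xdigit:]]' => /[[:xdigit:]]/,
  '\s'           => /\s/,
  '[[:space:]]'  => /[[:space:]]/
}.freeze

# Hypothetical helper: names of the classes that match the given character.
def matching_classes(char)
  CLASSES.select { |_name, pattern| pattern.match?(char) }.keys
end

p matching_classes("7")            # both digit forms and both hex-digit forms
p matching_classes(FULLWIDTH_ZERO) # only [[:digit:]]; [[:xdigit:]] stays ASCII-only
p matching_classes(NBSP)           # only [[:space:]]; \s stays ASCII-only
```

So [[:digit:]] and [[:space:]] reach beyond ASCII while [[:xdigit:]] does not, which is the mismatch with the quoted description.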

My point is that it is hard to determine which ranges each character class covers from the Ruby Regexp documentation alone. To learn the exact behavior, I have to read the source code or at least the POSIX spec.

My feature request is to add a comparison table like the one at https://www.regular-expressions.info/posixbrackets.html (for Java), with these columns: the POSIX bracket expression, the description, the exact ASCII range, the exact Unicode range, the shorthand metacharacter (ASCII), and the long escape sequence (Unicode). That way, readers would know precisely what to expect from the documentation alone.
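Until such a table exists, the range columns could be derived empirically. The following is a rough sketch, not an authoritative reference: `to_ranges` and `matched_ranges` are hypothetical helpers, the scan is limited to the Basic Multilingual Plane for speed, and the results depend on the Ruby version in use.

```ruby
# Hypothetical helper: collapse sorted code points into contiguous ranges.
def to_ranges(codepoints)
  codepoints.slice_when { |a, b| b != a + 1 }.map { |run| (run.first..run.last) }
end

# Hypothetical helper: code point ranges matched by +pattern+, up to +limit+.
# Assumption: scanning only U+0000..U+FFFF is enough for a first overview.
def matched_ranges(pattern, limit = 0xFFFF)
  hits = (0..limit).select do |cp|
    next false if (0xD800..0xDFFF).cover?(cp) # surrogates are not characters
    pattern.match?([cp].pack("U"))            # pack("U") yields a UTF-8 string
  end
  to_ranges(hits)
end

# Format ranges as U+XXXX-U+YYYY for a documentation-style table cell.
def format_ranges(ranges)
  ranges.map { |r| format("U+%04X-U+%04X", r.first, r.last) }.join(", ")
end

puts "[[:xdigit:]]  #{format_ranges(matched_ranges(/[[:xdigit:]]/))}"
puts "\\d            #{format_ranges(matched_ranges(/\d/))}"
puts "[[:digit:]]   #{matched_ranges(/[[:digit:]]/).size} separate ranges"
```

Running the same scan for each bracket expression and its shorthand would give the raw data for the ASCII and Unicode range columns.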


Files

Screenshot_20231017_154208.png (145 KB), added by noraj-acceis (Alexandre ZANNI), 10/17/2023 01:42 PM