Feature #19317

open

Unicode ICU Full case mapping

Added by noraj (Alexandre ZANNI) over 1 year ago. Updated over 1 year ago.

Status: Assigned
Target version: -
[ruby-core:111696]

Description

As announced in the Case Mapping documentation, Ruby's support for Unicode case mapping is not complete yet.

Unicode support in Ruby is pretty awesome: it works by default nearly everywhere, and things are implemented the right way and work as expected by the Unicode Technical Reports (UTRs).

But some features are still missing.

To reach ICU Full Case Mapping support, a few points need to be enhanced.

context-sensitive case mapping

"ΣΣ".downcase # returns σσ instead of σς

Output examples in ECMAScript:

Σ    ➡️ σ
Σa   ➡️ σa
aΣ   ➡️ aς
aΣa  ➡️ aσa
ΣA   ➡️ σa
aΣ a ➡️ aς a
Σ1   ➡️ σ1
aΣ1  ➡️ aς1
ΣΣ   ➡️ σς
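
The Final_Sigma condition behind these outputs can be sketched in plain Ruby. This is only an illustration under stated assumptions: `downcase_with_final_sigma` is a hypothetical helper, not part of `String`, and the sketch assumes the regexp engine accepts the `\p{Cased}` and `\p{Case_Ignorable}` properties. A real implementation would presumably live inside the case-mapping code itself.

```ruby
# Hypothetical helper (not an existing String method): lowercase a string
# while applying the Unicode Final_Sigma condition to capital sigma.
def downcase_with_final_sigma(str)
  # Only consider a sigma that is NOT followed by
  # (case-ignorable characters)* + a cased letter.
  converted = str.gsub(/Σ(?!\p{Case_Ignorable}*\p{Cased})/u) do
    before = Regexp.last_match.pre_match
    # It is final only if it is preceded by a cased letter,
    # optionally followed by case-ignorable characters.
    before.match?(/\p{Cased}\p{Case_Ignorable}*\z/u) ? "ς" : "σ"
  end
  # Every remaining Σ is non-final and becomes σ through the default mapping.
  converted.downcase
end

puts downcase_with_final_sigma("ΣΣ")   # σς
puts downcase_with_final_sigma("aΣa")  # aσa
puts downcase_with_final_sigma("aΣ a") # aς a
```

The lookahead is handled by the regexp and the lookbehind by a second match on the text before the sigma, because the "preceded by" part of the condition has variable length.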

language-sensitive case mapping

  • Lithuanian rules
  • Turkish and Azeri
"I".downcase # => "i"
"I".downcase(:turkic) # => "ı"
"i\u0307".upcase # => "İ"
"i\u0307".upcase(:lithuanian) # => "İ" instead of "I"
  • using some standard locale / language codes
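
As a rough illustration of the Lithuanian uppercasing rule, the sketch below removes a combining dot above (U+0307) that directly follows a soft-dotted letter before uppercasing. `upcase_lithuanian` is a hypothetical helper, not an existing method, and it is a simplification: it only handles i, j and į immediately followed by U+0307, whereas the full After_Soft_Dotted condition also allows some intervening combining marks.

```ruby
# Hypothetical helper: approximate the Lithuanian uppercasing rule from
# Unicode SpecialCasing. Simplified on purpose, see the note above.
def upcase_lithuanian(str)
  # Drop U+0307 when it directly follows a soft-dotted letter (i, j, į),
  # then apply the default full case mapping.
  str.gsub(/(?<=[ijį])\u0307/u, "").upcase
end

default_result    = "i\u0307".upcase             # "I" followed by U+0307
lithuanian_result = upcase_lithuanian("i\u0307") # plain "I"

p default_result.codepoints.map { |c| format("U+%04X", c) }
p lithuanian_result.codepoints.map { |c| format("U+%04X", c) }
```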

Admittedly, there are only a few language-sensitive rules for now (for Lithuanian, Turkish and Azeri), but why:

  • add a :turkic symbol and not an :azeri one?
  • use arbitrary full English language names (why turkic and not turkish?) rather than ICU locale IDs, which are built from:
    • Language code: ISO 639 standard
    • Script code: Unicode ISO 15924 Registry
    • Country code: ISO 3166 standard

So I would rather see something like this:

"placeholder".upcase(locale: :tr_TR)
"placeholder".upcase(lang: :tr)
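
To make the proposal concrete, here is a minimal sketch of how such keywords could be layered on top of the option symbols that `String#upcase` accepts today. Everything named here is an assumption for illustration: `upcase_for` and `CASE_OPTION_BY_LANG` are hypothetical, and the mapping simply takes the language subtag of a locale ID and ignores script and country.

```ruby
# Hypothetical mapping from ISO 639 language codes to the existing
# case-mapping option symbols.
CASE_OPTION_BY_LANG = { tr: :turkic, az: :turkic, lt: :lithuanian }.freeze

# Hypothetical wrapper: accepts either a locale ID (:tr_TR, "tr-TR")
# or a bare language code (:tr).
def upcase_for(str, locale: nil, lang: nil)
  # Fall back to the language subtag of the locale when lang is not given.
  lang ||= locale.to_s.split(/[_-]/).first if locale
  option = CASE_OPTION_BY_LANG[lang.to_s.downcase.to_sym]
  option ? str.upcase(option) : str.upcase
end

puts upcase_for("i", locale: :tr_TR) # İ
puts upcase_for("i", lang: :az)      # İ
puts upcase_for("i", lang: :en)      # I
```

Languages without special rules fall through to the default mapping, so callers would not need to know which locales are special.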

Related issues 1 (0 open, 1 closed)

Related to Ruby master - Feature #10085: Add non-ASCII case conversion to String#upcase/downcase/swapcase/capitalize - Closed - duerst (Martin Dürst)