Bug #21559
Unicode normalization nfd -> nfc -> nfd is not reversible
Updated by nobu (Nobuyoshi Nakada) 10 months ago
Updated by ima1zumi (Mari Imaizumi) 10 months ago
- Assignee set to ima1zumi (Mari Imaizumi)
This looks like a bug. Per Unicode TR15, the identity toNFD(x) == toNFD(toNFC(x)) must be maintained. https://unicode.org/reports/tr15/#Design_Goals
It seems the NFC process combines characters across U+11930, even though its Canonical_Combining_Class (CCC) is 0, which should block composition across it.
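A minimal sketch of how the TR15 identity can be checked from Ruby with the built-in `String#unicode_normalize`. The helper name `nfd_roundtrip_ok?` is hypothetical. The sample strings use ordinary Latin characters rather than the U+11930 sequence from this report, so they illustrate the normalization rules rather than reproduce the bug.

```ruby
# Hypothetical helper: checks the UAX #15 design goal
# toNFD(x) == toNFD(toNFC(x)) for a single string.
def nfd_roundtrip_ok?(str)
  nfd = str.unicode_normalize(:nfd)
  nfd == str.unicode_normalize(:nfc).unicode_normalize(:nfd)
end

# Ordinary composition: "e" + COMBINING ACUTE ACCENT (U+0301) -> U+00E9.
p "e\u0301".unicode_normalize(:nfc) == "\u00E9"    # => true

# A CCC-0 character (here "-") between the base letter and the combining
# mark blocks composition, so NFC must leave the sequence unchanged.
p "e-\u0301".unicode_normalize(:nfc) == "e-\u0301" # => true

# Marks with different CCCs (U+0323 is 220, U+0302 is 230) are reordered
# by NFD and recomposed by NFC; the round trip must return the NFD form.
p nfd_roundtrip_ok?("a\u0302\u0323")               # => true
```

Running the same helper over strings containing U+11930 is one way to probe for this regression; the actual regression test is in the commit referenced later in this thread.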
Updated by duerst (Martin Dürst) 10 months ago
- Assignee changed from ima1zumi (Mari Imaizumi) to duerst (Martin Dürst)
@ima1zumi (Mari Imaizumi) I'm not sure this is even allowed, but I'm sure I'm responsible for this behavior and want to fix it myself, so I'm changing the Assignee to myself.
Updated by ima1zumi (Mari Imaizumi) 10 months ago
@duerst (Martin Dürst) Thank you, I appreciate you taking care of it.
Updated by duerst (Martin Dürst) 8 months ago
- Status changed from Open to Closed
Added regression test at https://github.com/ruby/ruby/commit/a122d7a58e91ed6cd531e906cb398688d7cc8b17
and fix at https://github.com/ruby/ruby/commit/e4c8e3544237b8c0efba6b945173dc66552d641c.
Many thanks to Tomoya Ishida for finding this bug.
Updated by duerst (Martin Dürst) 8 months ago
- Backport changed from 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN to 3.2: DONTNEED, 3.3: DONTNEED, 3.4: DONTNEED
Backport would only be needed if the upgrade to Unicode 16.0.0 (see https://bugs.ruby-lang.org/issues/20724) is backported.
Updated by duerst (Martin Dürst) 8 months ago
Note to potential backporters: https://github.com/ruby/ruby/commit/bd51b20c50 should also be backported.