Feature #15182
Closed
Update extended grapheme cluster implementation for Unicode 11
Description
Reported by naruse (Yui NARUSE) at https://bugs.ruby-lang.org/issues/14802#change-74213:
The definition of extended grapheme cluster changed in Unicode 11 (Unicode® Standard Annex #29,
Unicode Text Segmentation, revision 33: https://www.unicode.org/reports/tr29/tr29-33.html).
This affects Regexp /\X/, which is hardcoded in node_extended_grapheme_cluster() in regparse.c.
( CRLF
| Prepend*
( RI-sequence | Hangul-Syllable | !Control )
( Grapheme_Extend | SpacingMark )*
| . )
crlf
| Control
| precore* core postcore*
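As a rough illustration of what the rule means for /\X/, the following sketch checks a few inputs that fall under the crlf, Control, and core postcore* alternatives. It assumes a Ruby version that provides String#each_grapheme_cluster; the helper name clusters is hypothetical and only used here.

```ruby
# Split a string into extended grapheme clusters using the /\X/ regexp.
# (Hypothetical helper for illustration only.)
def clusters(str)
  str.scan(/\X/)
end

crlf      = "\r\n"      # crlf alternative: CR LF stays together
combining = "e\u0301"   # core postcore*: base letter plus combining acute accent
control   = "a\u0000b"  # Control alternative: the control character stands alone

[crlf, combining, control].each do |s|
  via_regexp = clusters(s)
  via_method = s.each_grapheme_cluster.to_a
  # Both approaches are expected to give the same segmentation.
  puts "#{s.inspect}: #{via_regexp.length} cluster(s), agree: #{via_regexp == via_method}"
end
```

Emoji sequences are deliberately left out of this sketch, since their segmentation depends on the Unicode and emoji data version the Ruby build was compiled with.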
Updated by duerst (Martin Dürst) about 6 years ago
- Blocks Feature #14802: Update Unicode data to Unicode Version 11.0.0 added
Updated by duerst (Martin Dürst) almost 6 years ago
- Blocked by Feature #15341: Provide emoji version as RbConfig::CONFIG['UNICODE_EMOJI_VERSION'] added
Updated by duerst (Martin Dürst) almost 6 years ago
- Blocked by Bug #15343: String#each_grapheme_cluster wrongly splits some emoji (genie, zombie, wrestling) added
Updated by duerst (Martin Dürst) almost 6 years ago
- Status changed from Open to Closed
- Assignee set to duerst (Martin Dürst)
Implemented through a long series of patches, centered on regparse.c.
Related patches start at r65085 and end at r66269. The main patch is r66213.
New tests are at test/ruby/enc/test_grapheme_breaks.rb and test/ruby/enc/test_emoji_breaks.rb.
enc/unicode.c is also modified.