Feature #15182

closed

Update extended grapheme cluster implementation for Unicode 11

Added by duerst (Martin Dürst) about 6 years ago. Updated almost 6 years ago.

Status: Closed
Target version:
[ruby-core:89224]

Description

Reported by naruse (Yui NARUSE) at https://bugs.ruby-lang.org/issues/14802#change-74213:

The definition of extended grapheme cluster changed in Unicode 11 (Unicode® Standard Annex #29,
Unicode Text Segmentation, revision 33: https://www.unicode.org/reports/tr29/tr29-33.html).
This affects Regexp /\X/, which is hardcoded in node_extended_grapheme_cluster() in regparse.c.

Old definition (as hardcoded so far):

  ( CRLF
  | Prepend*
    ( RI-sequence | Hangul-Syllable | !Control )
    ( Grapheme_Extend | SpacingMark )*
  | . )

New definition (Unicode 11):

  crlf
  | Control
  | precore* core postcore*
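
As a rough illustration of what the definition above governs, the following sketch splits a few strings with /\X/ and with String#each_grapheme_cluster and prints how many clusters each one yields. The helper name clusters_via_regexp and the sample strings are assumptions made for this example; they do not come from the issue. The ZWJ emoji sample is expected to form a single cluster only on Ruby versions that include this change.

```ruby
# Hypothetical helper (not part of Ruby): split a string into
# extended grapheme clusters using the /\X/ regexp escape.
def clusters_via_regexp(str)
  str.scan(/\X/)
end

# Illustrative samples chosen for this sketch, not taken from the issue.
SAMPLES = {
  "CRLF"             => "\r\n",                                    # crlf
  "combining acute"  => "e\u0301",                                 # core + postcore
  "Hangul jamo L+V"  => "\u1100\u1161",                            # hangul-syllable
  "emoji ZWJ family" => "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}", # ZWJ sequence
}

SAMPLES.each do |label, str|
  via_regexp = clusters_via_regexp(str)
  via_method = str.each_grapheme_cluster.to_a
  puts "#{label}: #{via_regexp.size} cluster(s), " \
       "methods agree: #{via_regexp == via_method}"
end
```

Comparing both entry points is a cheap sanity check, since the issue is about /\X/ while the related bug reports are phrased in terms of String#each_grapheme_cluster.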

Related issues 3 (0 open, 3 closed)

Blocks Ruby master - Feature #14802: Update Unicode data to Unicode Version 11.0.0 - Closed - duerst (Martin Dürst)
Blocked by Ruby master - Feature #15341: Provide emoji version as RbConfig::CONFIG['UNICODE_EMOJI_VERSION'] - Closed - matz (Yukihiro Matsumoto)
Blocked by Ruby master - Bug #15343: String#each_grapheme_cluster wrongly splits some emoji (genie, zombie, wrestling) - Closed - duerst (Martin Dürst)

#1

Updated by duerst (Martin Dürst) about 6 years ago

  • Blocks Feature #14802: Update Unicode data to Unicode Version 11.0.0 added

#2

Updated by duerst (Martin Dürst) almost 6 years ago

  • Blocked by Feature #15341: Provide emoji version as RbConfig::CONFIG['UNICODE_EMOJI_VERSION'] added

#3

Updated by duerst (Martin Dürst) almost 6 years ago

  • Blocked by Bug #15343: String#each_grapheme_cluster wrongly splits some emoji (genie, zombie, wrestling) added

Updated by duerst (Martin Dürst) almost 6 years ago

  • Status changed from Open to Closed
  • Assignee set to duerst (Martin Dürst)

Implemented through a long series of patches, centered on regparse.c.

Related patches start at r65085 and end at r66269. The main patch is r66213.
New tests are at test/ruby/enc/test_grapheme_breaks.rb and test/ruby/enc/test_emoji_breaks.rb.
enc/unicode.c is also modified.
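
The test files named above are not reproduced here. As a minimal sketch of how break tests of this kind can be driven by data, the following parses one line in the format of Unicode's GraphemeBreakTest.txt (÷ marks a boundary, × marks no boundary, # starts a comment) and compares the expected clusters with what String#each_grapheme_cluster returns. The method name parse_break_test_line and the sample line are assumptions for illustration, not code from the actual tests.

```ruby
# Hypothetical parser for one line in GraphemeBreakTest.txt format.
# Returns the expected clusters as an array of strings, or nil for
# lines that contain only a comment or whitespace.
def parse_break_test_line(line)
  data = line.sub(/#.*/, '').strip   # drop the trailing comment
  return nil if data.empty?

  clusters = []
  current = +''                      # unfrozen, UTF-8 encoded buffer
  data.split.each do |token|
    case token
    when "÷"                         # boundary: close the current cluster
      clusters << current unless current.empty?
      current = +''
    when "×"                         # no boundary: keep accumulating
      next
    else                             # hexadecimal code point
      current << token.hex.chr(Encoding::UTF_8)
    end
  end
  clusters
end

# Sample line written for this sketch (same syntax as the data file).
line = "÷ 0065 × 0301 ÷ 000D × 000A ÷\t# e + combining acute, then CRLF"
expected = parse_break_test_line(line)
actual = expected.join.each_grapheme_cluster.to_a
puts(actual == expected ? "ok" : "mismatch: #{actual.inspect}")
```

Joining the expected clusters and re-splitting them keeps the check independent of how the clusters were produced, so the same loop could run over every line of the data file.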
