Feature #19908: Update to Unicode 15.1 - Ruby - Ruby Issue Tracking System

Added by nobu (Nobuyoshi Nakada) over 2 years ago. Updated 10 months ago.

Status: Closed
Assignee: duerst (Martin Dürst)
Target version: (none)

[ruby-core:114936]

Description

Unicode 15.1 has been released.

The current enc-unicode.rb seems to fail because of the Indic_Conjunct_Break properties, which carry values.

I'm not sure how these properties should best be handled.
Should they be written as /\p{InCB_Linker}/ or as /\p{InCB=Linker}/, as in the comments in that file?
https://github.com/nobu/ruby/tree/unicode-15.1 uses the former.
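
To make the two spellings concrete, here is a minimal sketch of how one might probe which property names a given Ruby build accepts. The helper name `property_supported?` is hypothetical and introduced only for illustration; it relies solely on `Regexp.new` raising `RegexpError` for an unknown property name. Which `InCB` spelling is accepted, if any, depends on the Ruby version, so no output is shown.

```ruby
# Hypothetical helper: returns true if this Ruby accepts \p{<name>} in a regexp.
def property_supported?(name)
  Regexp.new("\\p{#{name}}")
  true
rescue RegexpError
  # Onigmo rejects unknown names with "invalid character property name".
  false
end

candidates = [
  "Grapheme_Cluster_Break=Extend", # existing long "name=value" form
  "InCB_Linker",                   # underscore spelling from the branch
  "InCB=Linker"                    # "=" spelling from the comments
]

candidates.each do |name|
  puts "#{name}: #{property_supported?(name) ? 'accepted' : 'rejected'}"
end
```

Running this on builds before and after the update is one way to see which spelling was finally adopted.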

Related issues 4 (1 open — 3 closed)

Updated by nobu (Nobuyoshi Nakada) over 2 years ago
#1

Related to Bug #10416: Create mechanism for updating of Unicode data files downstreams when we want added

Updated by hsbt (Hiroshi SHIBATA) about 2 years ago
#2

Target version deleted (~~3.3~~)

Updated by duerst (Martin Dürst) about 2 years ago
#3 [ruby-core:115899]

There is a more serious issue than whether to use '_' or '=' in the property name: Unicode 15.1 makes some significant changes to grapheme clusters.

Our implementation (function 'node_extended_grapheme_cluster' in regparse.c) is based on Unicode 11.0, in particular https://www.unicode.org/reports/tr29/tr29-33.html#Grapheme_Cluster_Boundaries. This is quite a bit different from the current version at https://www.unicode.org/reports/tr29/tr29-43.html#Grapheme_Cluster_Boundaries. One major difference is that for Unicode 11.0, there was a regular expression for grapheme clusters, which I just implemented in the above function. Unicode 15.1 just says that it's possible to use a regular expression, but doesn't give this regular expression.

From reading through https://www.unicode.org/versions/Unicode15.1.0/#Migration, that's the main issue affecting Ruby.
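
As a rough illustration of what is at stake, the sketch below counts grapheme clusters for a few strings using `String#grapheme_clusters` and the `\X` regexp escape. The sample strings are my own choice, not from the thread. Under the Unicode 15.0 rules the sequence KA + VIRAMA + SSA splits into two clusters, while the 15.1 rules keep it together as one; what a particular Ruby build prints for it depends on its implementation, so that count is deliberately not stated.

```ruby
require "rbconfig"

# Unicode version of the character data this Ruby was built with.
puts "Unicode data version: #{RbConfig::CONFIG['UNICODE_VERSION']}"

samples = {
  "e + combining acute" => "e\u0301",
  "CR LF"               => "\r\n",
  "KA + VIRAMA + SSA"   => "\u0915\u094D\u0937" # Devanagari conjunct
}

samples.each do |label, text|
  by_method = text.grapheme_clusters.length
  by_regexp = text.scan(/\X/).length
  puts "#{label}: #{by_method} cluster(s) via grapheme_clusters, #{by_regexp} via \\X"
end
```

The first two samples form a single cluster under both the old and the new rules; only the third is sensitive to the change.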

Updated by duerst (Martin Dürst) about 2 years ago
#4 [ruby-core:115906]

@nobu (Nobuyoshi Nakada):
We have Grapheme_Cluster_Break=..., so I think '=' may be appropriate. But Grapheme_Cluster_Break=... uses a long, explicit name. So shouldn't it be Indic_Conjunct_Break=..., not just InCB=...?

Updated by duerst (Martin Dürst) about 2 years ago
#5

Related to Bug #20150: Memory leak in grapheme clusters added

Updated by janosch-x (Janosch Müller) about 2 years ago
#6 [ruby-core:116056]

Isn't this the updated regular expression?

 ccs-base :=     [\p{L}\p{N}\p{P}\p{S}\p{Zs}]
 ccs-extend :=  [\p{M}\p{Join_Control}]
 extended_base :=       ccs-base
 | hangul-syllable
-crlf :=        CR LF
+crlf :=        CR LF | CR | LF
 legacy-core := hangul-syllable
 | ri-sequence
 | xpicto-sequence
 legacy-postcore :=    [Extend ZWJ]
 core :=        hangul-syllable
 | ri-sequence
 | xpicto-sequence
+| conjunctCluster
 | [^Control CR LF]
 postcore :=    [Extend ZWJ SpacingMark]
 precore :=     Prepend
 hangul-syllable :=    L* (V+ | LV V* | LVT) T*
 | L+
 | T+
 xpicto-sequence :=     \p{Extended_Pictographic} (Extend* ZWJ \p{Extended_Pictographic})*
+conjunctCluster :=     \p{InCB=Consonant} ([\p{InCB=Extend} \p{InCB=Linker}]* \p{InCB=Linker} [\p{InCB=Extend} \p{InCB=Linker}]* \p{InCB=Consonant})+
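
The added `conjunctCluster` rule can be tried out without `\p{InCB=...}` support by substituting small explicit character classes. The sketch below assumes a hand-picked Devanagari subset (consonants U+0915..U+0939, the virama U+094D as linker, and the nukta U+093C plus ZWJ U+200D as extenders). These sets and the constant names are illustrative assumptions, not the full property data from the Unicode Character Database, and this is not the code used in regparse.c.

```ruby
# Hand-picked Devanagari stand-ins, NOT the complete InCB property data.
CONSONANT     = "[\u0915-\u0939]"       # stands in for \p{InCB=Consonant}
LINKER        = "\u094D"                # stands in for \p{InCB=Linker}
EXT_OR_LINKER = "[\u093C\u200D\u094D]"  # stands in for [\p{InCB=Extend} \p{InCB=Linker}]

# conjunctCluster := Consonant ([Extend Linker]* Linker [Extend Linker]* Consonant)+
CONJUNCT_CLUSTER = Regexp.new(
  "#{CONSONANT}(?:#{EXT_OR_LINKER}*#{LINKER}#{EXT_OR_LINKER}*#{CONSONANT})+"
)

ksha       = "\u0915\u094D\u0937" # KA, VIRAMA, SSA: consonants joined by a linker
no_linker  = "\u0915\u0937"       # KA, SSA: no linker between the consonants

puts "ksha matched as a whole: #{ksha[CONJUNCT_CLUSTER] == ksha}"
puts "no_linker matched: #{!no_linker[CONJUNCT_CLUSTER].nil?}"
```

The rule requires at least one linker between two consonants, which is why the second string does not match.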

Updated by duerst (Martin Dürst) almost 2 years ago
#7 [ruby-core:116099]

@janosch-x (Janosch Müller) You are correct, thanks! I noticed it a few days ago, but hadn't yet gotten around to writing about it here. You beat me to it!

Updated by hsbt (Hiroshi SHIBATA) over 1 year ago
#8 [ruby-core:119128]

Unicode 16.0 has been released.

https://www.unicode.org/versions/Unicode16.0.0/

Should we move to this instead of 15.1?

Updated by duerst (Martin Dürst) over 1 year ago
#9

Precedes Feature #20724: Update to Unicode 16.0 added

Updated by duerst (Martin Dürst) over 1 year ago
#10 [ruby-core:119130]

hsbt (Hiroshi SHIBATA) wrote in #note-8:

> Unicode 16.0 has been released.
>
> Should we move to this instead of 15.1?

I think it's more prudent to do 15.1 first, then 16.0. I hope to be able to work on this soon. I created a separate issue for 16.0.

Updated by hsbt (Hiroshi SHIBATA) over 1 year ago
#11 [ruby-core:119131]

> I think it's more prudent to do 15.1 first, then 16.0.

Agreed, thanks!

Updated by hsbt (Hiroshi SHIBATA) over 1 year ago
#12

Has duplicate Feature #19171: Update Unicode data to Unicode Version 15.1 added

Updated by ima1zumi (Mari Imaizumi) about 1 year ago
#13 [ruby-core:120460]

@duerst (Martin Dürst)

I'm interested in working on this issue. Are you planning to start it? If not, I'd like to try.

Updated by mame (Yusuke Endoh) 10 months ago
#14 [ruby-core:121281]

@duerst (Martin Dürst) What do you think?

Updated by ima1zumi (Mari Imaizumi) 10 months ago
#15 [ruby-core:121291]

I have created a PR to update it.

https://github.com/ruby/ruby/pull/12798

Updated by naruse (Yui NARUSE) 10 months ago
#16 [ruby-core:121364]

The change looks good to me.
Since you have already contributed to reline and shown your engineering skill, and now also want to contribute to ruby/ruby, I think you should have commit rights for ruby/ruby and commit this change yourself.

@matz (Yukihiro Matsumoto) What do you think?

Updated by ima1zumi (Mari Imaizumi) 10 months ago
#17 [ruby-core:121365]

@naruse (Yui NARUSE)
Thank you so much for your review and recommending me. I’d be happy to take on commit rights and commit this change myself.

Updated by mame (Yusuke Endoh) 10 months ago
#18 [ruby-core:121366]

I'd also like to introduce ima1zumi-san as a candidate for committer. She has been actively working on irb and reline, has deep knowledge and a strong interest in character encoding, and is highly recognized, as she was endorsed by @naruse (Yui NARUSE), the maintainer of Ruby's encoding system. With her contributions extending towards Ruby itself, I support her nomination.

Updated by kosaki (Motohiro KOSAKI) 10 months ago
#19 [ruby-core:121367]

Updated by k0kubun (Takashi Kokubun) 10 months ago
#20 [ruby-core:121368]

Updated by matsuda (Akira Matsuda) 10 months ago
#21 [ruby-core:121369]

Updated by mrkn (Kenta Murata) 10 months ago
#22 [ruby-core:121370]

Updated by alanwu (Alan Wu) 10 months ago
#23 [ruby-core:121371]

Updated by matz (Yukihiro Matsumoto) 10 months ago
#24 [ruby-core:121385]

#note-16 Approved.

Matz.

Updated by hsbt (Hiroshi SHIBATA) 10 months ago
#25 [ruby-core:121386]

@ima1zumi (Mari Imaizumi) Could you send me the required information? See https://github.com/ruby/ruby/wiki/Committer-How-To#how-to-register-you-as-a-committer for details.

Updated by ima1zumi (Mari Imaizumi) 10 months ago
#26 [ruby-core:121387]

@hsbt (Hiroshi SHIBATA)
I've sent an email to cvs-admin and opened https://github.com/ruby/git.ruby-lang.org/pull/91

Updated by hsbt (Hiroshi SHIBATA) 10 months ago
#27 [ruby-core:121391]

Thanks, I've finished preparing your account.

Updated by ima1zumi (Mari Imaizumi) 10 months ago
#28

Status changed from Assigned to Closed

Applied in changeset git|e63c516046b6dbf2f684454b68013b4eea12e94a.

[Feature #19908] Update Unicode headers to 15.1.0
