Misc #21975: Add "UTF-八" as an alias for UTF-8 encoding - Ruby - Ruby Issue Tracking System

Updated by jinroq (Jinroq SAITOH) about 2 months ago
#1 [ruby-core:125168]

ko1 (Koichi Sasada) wrote:

In Japan, legal texts must write all characters - including digits - using full-width or kanji forms. As a result, the encoding name "UTF-8" appears as "UTF-八" (八 = eight in kanji) in official government notices.

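Specifically, it appears in a notice issued by the Digital Agency and the Ministry of Internal Affairs and Communications (令和8年デジタル庁・総務省告示第12号, i.e. Digital Agency and MIC Notice No. 12 of Reiwa 8 (2026)), which defines character sets and encoding for local government information systems:

地方公共団体情報システムの標準化に関する法律第七条第一項に規定する各地方公共団体情報システムに共通する基準のうち電磁的記録において用いられる用語及び符号の相互運用性の確保その他の地方公共団体情報システムに係る互換性の確保に関する標準を定める命令第三条第二号の規定に基づき行政事務標準文字の文字セット及び地方公共団体情報システム間の連携のための文字符号化方式を定める告示

(Rough English rendering: "Notice specifying the character set of the Administrative Affairs Standard Characters and the character encoding scheme for interoperation between local government information systems, pursuant to Article 3, item 2 of the Order establishing standards for ensuring the interoperability of terms and codes used in electromagnetic records and other compatibility of local government information systems, among the standards common to local government information systems prescribed in Article 7, paragraph 1 of the Act on the Standardization of Local Government Information Systems".)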

Reference: https://www.digital.go.jp/assets/contents/node/basic_page/field_ref_resources/d12bde7e-a950-493b-987c-0f8d4bbd1b6b/66117898/20260324_laws_notice_text_02.pdf

This patch (https://github.com/ruby/ruby/pull/16623) adds "UTF-八" as an encoding alias for UTF-8, so that Ruby is compliant with Japanese law:

# encoding: UTF-八

p __ENCODING__ #=> #<Encoding:UTF-8>

p Encoding.find("UTF-八")              #=> #<Encoding:UTF-8>
p "hello".encode("UTF-八")             #=> "hello"
p "こんにちは".force_encoding("UTF-八") #=> "こんにちは"

p "こんにちは".encode("Ｕ
                      Ｔ
                      Ｆ
                      ｜
                      八") #=> "こんにちは"

Should "UTF-" also be full-width ("ＵＴＦ−")?
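
For readers who want to experiment without the patch, the behaviour shown above can be approximated in plain Ruby. The following is only a minimal sketch: `find_legal_encoding` and `KANJI_DIGITS` are hypothetical names introduced here, not part of Ruby or of the pull request. It assumes that NFKC normalization is an acceptable way to fold full-width letters and the full-width hyphen back to ASCII, and that the vertical form uses a full-width bar (U+FF5C) in place of the hyphen.

```ruby
# Hypothetical helper for illustration; not part of Ruby or of the proposed patch.
KANJI_DIGITS = {
  "〇" => "0", "一" => "1", "二" => "2", "三" => "3", "四" => "4",
  "五" => "5", "六" => "6", "七" => "7", "八" => "8", "九" => "9"
}.freeze

def find_legal_encoding(name)
  # Undo vertical layout: drop newlines and indentation.
  compact = name.gsub(/[[:space:]]/, "")
  # Fold full-width letters, digits and U+FF0D to their ASCII counterparts.
  folded = compact.unicode_normalize(:nfkc)
  # Assumption: in vertical writing the hyphen shows up as a bar.
  hyphenated = folded.tr("|", "-")
  # Kanji numerals have no compatibility mapping, so translate them by table.
  ascii = hyphenated.gsub(/[〇一二三四五六七八九]/, KANJI_DIGITS)
  Encoding.find(ascii)
end

p find_legal_encoding("UTF-八")                      #=> #<Encoding:UTF-8>
p find_legal_encoding("\uFF35\uFF34\uFF26\uFF0D八")  #=> #<Encoding:UTF-8>
```

Because the helper only rewrites the name before calling `Encoding.find`, an unknown name still raises `ArgumentError` as usual.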

Updated by duerst (Martin Dürst) about 2 months ago
#2 [ruby-core:125173]

Thanks to @ko1 (Koichi Sasada) for this timely news. It looks like the Japanese government has recently been taking some steps that in some ways have felt long overdue. On December 22, 2025, they changed the Romanization used by the Government from 'Kunrei' to 'Hepburn' (see e.g. https://en.wikipedia.org/wiki/Hepburn_romanization). Kunrei reflects the structure of the Japanese syllabaries (Hiragana, Katakana), but Hepburn makes it easier for foreigners to pronounce Japanese words more or less correctly.

Anyway, with respect to @ko1's proposal, I think it's a good idea to allow "UTF-八" (and probably also full-width "ＵＴＦ-八") as an alternative to "UTF-8" for internal Ruby use. However, it shouldn't be used on the Internet unless it is formally registered (see https://www.iana.org/assignments/character-sets/character-sets.xhtml).

As the expert reviewer for that registry (rather than as a Rubyist) I would have to reject such a registration because currently, "charset" names have to be US-ASCII. Rewriting the relevant RFCs (not to speak of all the software that uses them) would be a lot of work :-).
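
The US-ASCII constraint mentioned above is easy to check from Ruby. This is a rough sketch only: `ascii_charset_name?` is a hypothetical helper, and "ASCII only" is used as a coarse proxy, since the registry's actual rules are stricter than that.

```ruby
# Hypothetical helper; a coarse proxy for "could this even be a registered charset name?"
def ascii_charset_name?(name)
  name.ascii_only?
end

p ascii_charset_name?("UTF-8")   #=> true
p ascii_charset_name?("UTF-八")  #=> false
```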

Updated by Dan0042 (Daniel DeLorme) about 2 months ago
#3 [ruby-core:125175]

duerst (Martin Dürst) wrote in #note-2:

I think it's a good idea to allow "UTF-八" (and probably also full-width "ＵＴＦ-八") as an alternative to "UTF-8" for internal Ruby use.

Indeed, but I believe "ＵＴＦ－八" would be a better alias here, since the hyphen does indeed have a fullwidth version (U+FF0D), distinct from the prolonged sound mark ー (U+30FC).
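
The difference between these look-alike characters can be inspected directly. The sketch below is illustrative only (`HYPHEN_LIKE` is a name made up for this example); it prints each code point and what NFKC normalization turns it into, using escapes so the characters are unambiguous.

```ruby
# Illustrative table of hyphen-like characters; the constant name is made up here.
HYPHEN_LIKE = {
  "hyphen-minus"           => "\u002D",
  "fullwidth hyphen-minus" => "\uFF0D",
  "prolonged sound mark"   => "\u30FC"
}.freeze

HYPHEN_LIKE.each do |label, ch|
  normalized = ch.unicode_normalize(:nfkc)
  printf("%-24s U+%04X -> NFKC U+%04X\n", label, ch.ord, normalized.ord)
end
```

Only the fullwidth hyphen-minus folds to the ASCII hyphen under NFKC; the prolonged sound mark stays as it is, so an alias spelled with it would not normalize to "UTF-8".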

Updated by duerst (Martin Dürst) about 2 months ago
#4 [ruby-core:125178]

Tracker changed from Feature to Misc

Dan0042 (Daniel DeLorme) wrote in #note-3:

Indeed, but I believe "ＵＴＦ－八" would be a better alias here

Fully agree. I was just too lazy to go figure out the full-width hyphen, sorry.

Updated by ima1zumi (Mari Imaizumi) about 1 month ago
#5 [ruby-core:125255]

Status changed from Open to Rejected

While I appreciate the proposal, I must reject it for two reasons:

First, on consistency: the vertical writing is allowed only inside string arguments, while the rest of Ruby code remains horizontal. This is inconsistent.

Second, on environmental impact: the vertical writing form requires a lot of whitespace, which is not eco-friendly. 😛
