Project

General

Profile

Feature #4073

HKSCS-2008

Added by oCameLo (oCameLo oTnTh) over 8 years ago. Updated about 8 years ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Target version:
[ruby-core:33256]

Description

=begin
I suspect that Big5-HKSCS in Ruby missed out some mappings during use it, so I extracted a Big5-HKSCS conversion table from c_951.nls (HKSCS-2001, [1]) and used it to check big5-hkscs-tbl.rb. Except the characters were assigned to PUA in HKSCS-2001, the tbl file missed out 8 characters:

 cF9E9 => u255E # '╞'
 cF9EA => u256A # '╪'
 cF9EB => u2561 # '╡'
 cF9F9 => u2550 # '═'
 cF9FA => u256D # '╭'
 cF9FB => u256E # '╮'
 cF9FC => u2570 # '╰'
 cF9FD => u256F # '╯'

And these 8 characters were included in CP951 but not in the tbl file:

 cA15A => u2574 # '╴'
 cA1C3 => uFFE3 # ' ̄'
 cA1C5 => u02CD # 'ˍ'
 cA1FE => uFF0F # '/'
 cA240 => uFF3C # '\'
 cA2CC => u5341 # '十'
 cA2CE => u5345 # '卅'
 cA3E1 => u20AC # '€'

HKSCS is just a supplementary set of Big5, but they didn't finger out which kind of Big5 base on. Consider for compatibility, I think CP951 is a good choice. Also, HKSCS-2008 ([2]) was released for a long time. So I tried to combine CP951 and HKSCS-2008 together to make a new tbl file, but I met some problems.

First, HKSCS-2008 included 3 ligatures:

 c8862 => <00CA,0304>
 c8864 => <00CA,030C>
 c88A3 => <00EA,0304>
 c88A5 => <00EA,030C>

Does Ruby support this, if so, how to code?

Then, these mappings were included in CP951:

 cA2CC, cA451 => u5341
 cA2CE, cA4CA => u5345
 cA2A5, cF9E9 => u255E
 cA2A6, cF9EA => u256A
 cA2A7, cF9EB => u2561
 cA2A4, cF9F9 => u2550
 cA27E, cF9FA => u256D
 cA2A1, cF9FB => u256E
 cA2A2, cF9FC => u2570
 cA2A3, cF9FD => u256F

 u5341 => cA451
 u5345 => cA4CA
 u255E => cF9E9
 u256A => cF9EA
 u2561 => cF9EB
 u2550 => cF9F9
 u256D => cF9FA
 u256E => cF9FB
 u2570 => cF9FC
 u256F => cF9FD

Same question, does Ruby support this, if so, how to code?

Thanks.

[1] http://www.microsoft.com/hk/hkscs/
[2] http://www.ogcio.gov.hk/ccli/chs/hkscs/mapping_table_2008.html
=end


Files

big5_hkscs_2008.patch.gz (311 KB) big5_hkscs_2008.patch.gz oCameLo (oCameLo oTnTh), 11/22/2010 07:18 PM

Related issues

Related to Ruby master - Feature #1784: More encoding (Big5 series) support?Closed07/16/2009Actions

History

#1

Updated by naruse (Yui NARUSE) over 8 years ago

=begin
Current Ruby's table doesn't have one way conversion, it is wrong.
So I imported tables from ICU.

Now following will pass in trunk.
./ruby -e'p ["A15AA1C3A1C5A1FEA240A2CCA2CEA3E1"].pack("H*").encode("utf-8","cp951")'
./ruby -e'p ["F9E9F9EAF9EBF9F9F9FAF9FBF9FCF9FD"].pack("H*").encode("utf-8","big5-hkscs")'

HKSCS is just a supplementary set of Big5, but they didn't finger out which kind of Big5 base on. Consider for compatibility, I think CP951 is a good choice. Also, HKSCS-2008 ([2]) was released for a long time. So I tried to combine CP951 and HKSCS-2008 together to make a new tbl file, but I met some problems.

For interoperability, we consider compatibility to another implementations.
So can you propose HKSCS-2008 support to GNU libiconv?
We want to follow them.

First, HKSCS-2008 included 3 ligatures:
Then, these mappings were included in CP951:

See enc/trans/.
1:N conversion is specified by simply bytes to bytes mapping.
To specify fallback mapping, split tables from UCS and to UCS.

Anyway we can import simple ucm file, so use it.
http://userguide.icu-project.org/conversion/data
=end

#2

Updated by oCameLo (oCameLo oTnTh) over 8 years ago

=begin
Oops, I just made a new transition table for HKSCS-2008...

That's OK, I'll check those new transition tables later.
=end

#3

Updated by oCameLo (oCameLo oTnTh) over 8 years ago

=begin
CP950 and CP951 in ICU are fine. But HKSCS is too outdated, it's just HKSCS-1999, even older than CP951.

If we need follow another implementations, why not libiconv?

I've written to libiconv to ask for HKSCS-2008 update.
=end

#4

Updated by naruse (Yui NARUSE) over 8 years ago

=begin

CP950 and CP951 in ICU are fine. But HKSCS is too outdated, it's just HKSCS-1999, even older than CP951.

If we need follow another implementations, why not libiconv?

Just because ICU's one is it.
If you have libiconv based one, I can import it.
Do you need all versions?

  • BIG5-HKSCS:1999
  • BIG5-HKSCS:2001
  • BIG5-HKSCS:2004

I've written to libiconv to ask for HKSCS-2008 update.

Thanks, I'll experimentally add your patch as Big5-HKSCS:2008.
=end

#5

Updated by oCameLo (oCameLo oTnTh) over 8 years ago

=begin
Just the-most-update version HKSCS please. If someone needs old versions, Iconv can help.

Actually, GBK, CP950 and Big5-UAO are enough for me. But it might be difficult to understand Chinese Encoding problems by foreigners, I just think I could do something for Ruby on this.
=end

#6

Updated by naruse (Yui NARUSE) over 8 years ago

=begin
I see.

Anyway I succeeded to generate ucm files from libiconv, so I'll add BIG5-HKSCS:2004.
=end

#7

Updated by naruse (Yui NARUSE) over 8 years ago

=begin
Just a question,

If you are a real user of Big5-HKSCS, when you use this?
One time conversion for old data? or communicating with old system?
=end

#8

Updated by oCameLo (oCameLo oTnTh) over 8 years ago

=begin

Anyway I succeeded to generate ucm files from libiconv, so I'll add BIG5-HKSCS:2004.

Big5-HKSCS in libiconv is HKSCS-2008 now:

http://git.savannah.gnu.org/gitweb/?p=libiconv.git;a=commit;h=fd7d5707b506de291acbbefd170281b8226eb379
http://git.savannah.gnu.org/gitweb/?p=libiconv.git;a=blobdiff;f=lib/encodings.def;h=017434351770dabb88f1e4f270eb5933c9f6f92c;hp=b5fda5480f06f4a6160986bddfb7bc48a45dbb78;hb=fd7d5707b506de291acbbefd170281b8226eb379;hpb=8b58085a2a26445b3ec86e811862f3fd4c70eefb

Just a question,

If you are a real user of Big5-HKSCS, when you use this?
One time conversion for old data? or communicating with old system?

Unfortunately, not just limited to "old data".

Most computer users don't understand what Encoding is, not even many webmasters. There's also so many web sites use "Big5", they might mean CP950, Big5-HKSCS or Big5-UAO.

Bruno (author of libiconv) said it's a "big mess around Big5", I agree with that ran deep.
=end

#9

Updated by naruse (Yui NARUSE) over 8 years ago

  • Status changed from Open to Closed
  • % Done changed from 0 to 100

=begin
This issue was solved with changeset r29922.
oCameLo, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.

=end

Also available in: Atom PDF