Feature #4073
HKSCS-2008
Status: Closed
Description
=begin
I suspected that Ruby's Big5-HKSCS was missing some mappings while I was using it, so I extracted a Big5-HKSCS conversion table from c_951.nls (HKSCS-2001, [1]) and used it to check big5-hkscs-tbl.rb. Apart from the characters that HKSCS-2001 assigns to the PUA (Private Use Area), the tbl file is missing 8 characters:
cF9E9 => u255E # '╞'
cF9EA => u256A # '╪'
cF9EB => u2561 # '╡'
cF9F9 => u2550 # '═'
cF9FA => u256D # '╭'
cF9FB => u256E # '╮'
cF9FC => u2570 # '╰'
cF9FD => u256F # '╯'
In addition, these 8 characters are included in CP951 but not in the tbl file:
cA15A => u2574 # '╴'
cA1C3 => uFFE3 # '￣'
cA1C5 => u02CD # 'ˍ'
cA1FE => uFF0F # '／'
cA240 => uFF3C # '＼'
cA2CC => u5341 # '十'
cA2CE => u5345 # '卅'
cA3E1 => u20AC # '€'
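Lists like the two above can be re-checked against whatever tables a given Ruby build ships. A minimal Ruby sketch, assuming the `cXXXX => uXXXX` notation used in this report; `parse_mapping` and `unmapped` are hypothetical helper names, not part of Ruby:

```ruby
# Hypothetical helpers for checking "cXXXX => uXXXX" lines against
# Ruby's built-in transcoders.

# Turn "cF9E9 => u255E # '╞'" into [source bytes, expected UTF-8 string].
def parse_mapping(line)
  m = line.match(/\Ac(\h+)\s*=>\s*u(\h+)/) or return nil
  [[m[1]].pack("H*"), [m[2].hex].pack("U")]
end

# Return the mapping lines whose bytes do NOT decode to the expected
# character when read as +encoding+ and converted to UTF-8.
def unmapped(lines, encoding)
  lines.select do |line|
    pair = parse_mapping(line)
    next false unless pair # skip lines that are not mappings
    bytes, expected = pair
    actual = begin
      bytes.encode("UTF-8", encoding)
    rescue EncodingError
      nil # undefined/invalid bytes, or no converter for this encoding
    end
    actual != expected
  end
end

list = ["cF9E9 => u255E", "cA3E1 => u20AC"]
p unmapped(list, "Big5-HKSCS")
```

An empty array means every listed code converts as expected in the Ruby being tested.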
HKSCS is only a supplementary set to Big5, but the standard doesn't specify which Big5 variant it is based on. For compatibility, I think CP951 is a good choice. Also, HKSCS-2008 ([2]) has been out for a long time. So I tried to combine CP951 and HKSCS-2008 into a new tbl file, but I ran into some problems.
First, HKSCS-2008 includes 4 ligatures, i.e. single codes that each map to a sequence of two Unicode code points:
c8862 => <00CA,0304>
c8864 => <00CA,030C>
c88A3 => <00EA,0304>
c88A5 => <00EA,030C>
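Does Ruby support this? If so, how should it be written in the table?
Second, CP951 includes the following duplicate mappings: both codes in each pair decode to the same Unicode character, but encoding from Unicode produces only one of them: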
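Why these need a 1:N entry can be seen on the Unicode side: none of the four sequences has a precomposed character, so normalization cannot shrink them to one code point. A small Ruby check (the constant name `MULTI` is hypothetical; the data is the list above):

```ruby
# The four HKSCS-2008 codes listed above. Each decodes to TWO Unicode
# code points: a base letter plus a combining mark.
MULTI = {
  "\x88\x62".b => "\u00CA\u0304", # Ê + combining macron
  "\x88\x64".b => "\u00CA\u030C", # Ê + combining caron
  "\x88\xA3".b => "\u00EA\u0304", # ê + combining macron
  "\x88\xA5".b => "\u00EA\u030C"  # ê + combining caron
}.freeze

MULTI.each do |bytes, str|
  # NFC keeps both code points because Unicode has no precomposed
  # character for any of these combinations.
  nfc = str.unicode_normalize(:nfc)
  cps = nfc.codepoints.map { |cp| format("u%04X", cp) }.join(",")
  puts "c#{bytes.unpack1('H*').upcase} => <#{cps}> (#{nfc.length} code points)"
end
```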
cA2CC, cA451 => u5341
cA2CE, cA4CA => u5345
cA2A5, cF9E9 => u255E
cA2A6, cF9EA => u256A
cA2A7, cF9EB => u2561
cA2A4, cF9F9 => u2550
cA27E, cF9FA => u256D
cA2A1, cF9FB => u256E
cA2A2, cF9FC => u2570
cA2A3, cF9FD => u256F
u5341 => cA451
u5345 => cA4CA
u255E => cF9E9
u256A => cF9EA
u2561 => cF9EB
u2550 => cF9F9
u256D => cF9FA
u256E => cF9FB
u2570 => cF9FC
u256F => cF9FD
Same question: does Ruby support this, and if so, how should it be written?
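The behaviour wanted here is asymmetric: decoding is many-to-one, while encoding has a single preferred code per character, so the duplicate codes are one-way. A minimal in-memory Ruby model of part of the table above; `TO_UCS`, `FROM_UCS` and `round_trip` are hypothetical names, not Ruby's transcoder API:

```ruby
# Decoding (to UCS): two byte codes map to the same character.
TO_UCS = {
  "A2CC" => "\u5341", "A451" => "\u5341",
  "A2CE" => "\u5345", "A4CA" => "\u5345",
  "A2A5" => "\u255E", "F9E9" => "\u255E"
}.freeze

# Encoding (from UCS): each character has exactly one preferred code.
FROM_UCS = {
  "\u5341" => "A451",
  "\u5345" => "A4CA",
  "\u255E" => "F9E9"
}.freeze

# Decode then re-encode; a duplicate code comes back as the preferred one.
def round_trip(code)
  FROM_UCS.fetch(TO_UCS.fetch(code))
end

TO_UCS.each_key do |code|
  back = round_trip(code)
  note = back == code ? "round trip" : "one-way (re-encodes as #{back})"
  printf("c%s => u%04X  %s\n", code, TO_UCS[code].ord, note)
end
```

Keeping the two directions in separate tables is what makes the one-way entries expressible at all.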
Thanks.
[1] http://www.microsoft.com/hk/hkscs/
[2] http://www.ogcio.gov.hk/ccli/chs/hkscs/mapping_table_2008.html
=end
Updated by naruse (Yui NARUSE) about 14 years ago
=begin
Ruby's current table has no one-way conversions, which is wrong.
So I imported tables from ICU.
Now the following works in trunk:
./ruby -e'p ["A15AA1C3A1C5A1FEA240A2CCA2CEA3E1"].pack("H*").encode("utf-8","cp951")'
./ruby -e'p ["F9E9F9EAF9EBF9F9F9FAF9FBF9FCF9FD"].pack("H*").encode("utf-8","big5-hkscs")'
> HKSCS is just a supplementary set of Big5, but they didn't finger out which kind of Big5 base on. Consider for compatibility, I think CP951 is a good choice. Also, HKSCS-2008 ([2]) was released for a long time. So I tried to combine CP951 and HKSCS-2008 together to make a new tbl file, but I met some problems.
For interoperability, we consider compatibility with other implementations.
So could you propose HKSCS-2008 support to GNU libiconv?
We want to follow them.
> First, HKSCS-2008 included 3 ligatures:
> Then, these mappings were included in CP951:
See enc/trans/.
A 1:N conversion is specified simply as a bytes-to-bytes mapping.
To specify a fallback mapping, split the table into separate from-UCS and to-UCS tables.
Anyway, we can import plain ucm files, so please use that format.
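A minimal Ruby sketch that writes such mappings as .ucm CHARMAP lines, following the ICU conventions described at the link below: `|0` marks a round-trip mapping, `|3` a reverse fallback (bytes to Unicode only), and several `<Uxxxx>` items on one line form a 1:N mapping. `ucm_line` is a hypothetical helper; the header section a complete .ucm file needs is omitted, and whether Ruby's table generator accepts every one of these forms is an assumption not confirmed in this thread.

```ruby
# Hypothetical generator for ICU .ucm CHARMAP lines.
#   |0 = round trip, |3 = reverse fallback (bytes -> Unicode only)
def ucm_line(codepoints, hex_bytes, flag)
  uni = codepoints.map { |cp| format("<U%04X>", cp) }.join
  bytes = hex_bytes.scan(/\h\h/).map { |b| "\\x#{b.upcase}" }.join
  "#{uni} #{bytes} |#{flag}"
end

lines = [
  ucm_line([0x5341], "A451", 0),        # preferred code: round trip
  ucm_line([0x5341], "A2CC", 3),        # duplicate code: decode only
  ucm_line([0x00CA, 0x0304], "8862", 0) # one code -> two code points
]
puts "CHARMAP", lines, "END CHARMAP"
```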
http://userguide.icu-project.org/conversion/data
=end
Updated by oCameLo (oCameLo oTnTh) about 14 years ago
- File big5_hkscs_2008.patch.gz big5_hkscs_2008.patch.gz added
=begin
Oops, I just made a new transition table for HKSCS-2008...
That's OK, I'll check those new transition tables later.
=end
Updated by oCameLo (oCameLo oTnTh) about 14 years ago
=begin
CP950 and CP951 in ICU are fine. But ICU's HKSCS is too outdated: it's only HKSCS-1999, even older than CP951.
If we need to follow another implementation, why not libiconv?
I've written to the libiconv maintainers to ask for an HKSCS-2008 update.
=end
Updated by naruse (Yui NARUSE) about 14 years ago
=begin
> CP950 and CP951 in ICU are fine. But HKSCS is too outdated, it's just HKSCS-1999, even older than CP951.
> If we need follow another implementations, why not libiconv?
Only because ICU's tables were the ones at hand.
If you have libiconv based one, I can import it.
Do you need all versions?
- BIG5-HKSCS:1999
- BIG5-HKSCS:2001
- BIG5-HKSCS:2004
> I've written to libiconv to ask for HKSCS-2008 update.
Thanks, I'll experimentally add your patch as Big5-HKSCS:2008.
=end
Updated by oCameLo (oCameLo oTnTh) about 14 years ago
=begin
Just the most up-to-date version of HKSCS, please. If someone needs older versions, Iconv can help.
Actually, GBK, CP950 and Big5-UAO are enough for me. But Chinese encoding problems can be hard for non-Chinese speakers to understand, so I thought I could do something for Ruby here.
=end
Updated by naruse (Yui NARUSE) about 14 years ago
=begin
I see.
Anyway, I succeeded in generating ucm files from libiconv, so I'll add BIG5-HKSCS:2004.
=end
Updated by naruse (Yui NARUSE) about 14 years ago
=begin
Just a question,
If you are a real user of Big5-HKSCS, when do you use it?
A one-time conversion of old data, or communication with old systems?
=end
Updated by oCameLo (oCameLo oTnTh) about 14 years ago
=begin
> Anyway I succeeded to generate ucm files from libiconv, so I'll add BIG5-HKSCS:2004.
Big5-HKSCS in libiconv is now HKSCS-2008:
http://git.savannah.gnu.org/gitweb/?p=libiconv.git;a=commit;h=fd7d5707b506de291acbbefd170281b8226eb379
http://git.savannah.gnu.org/gitweb/?p=libiconv.git;a=blobdiff;f=lib/encodings.def;h=017434351770dabb88f1e4f270eb5933c9f6f92c;hp=b5fda5480f06f4a6160986bddfb7bc48a45dbb78;hb=fd7d5707b506de291acbbefd170281b8226eb379;hpb=8b58085a2a26445b3ec86e811862f3fd4c70eefb
> Just a question,
> If you are a real user of Big5-HKSCS, when you use this?
> One time conversion for old data? or communicating with old system?
Unfortunately, not just limited to "old data".
Most computer users don't understand what an encoding is, and neither do many webmasters. There are also many web sites that declare "Big5" when they may actually mean CP950, Big5-HKSCS or Big5-UAO.
Bruno (the author of libiconv) called it a "big mess around Big5", and I strongly agree.
=end
Updated by naruse (Yui NARUSE) about 14 years ago
- Status changed from Open to Closed
- % Done changed from 0 to 100
=begin
This issue was solved with changeset r29922.
oCameLo, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.
=end