UnicodeIUC18
Unicode Standard Conference Board Past Conferences Call for Papers Sponsors Showcase
Registration Accommodation Travel Program Talks and Papers Next Conference
Abstract

Conversion Between Hong Kong Supplementary Character Set (HKSCS) and Unicode

Linus Toshihiro Tanaka - Oracle Corporation

Intended Audience: Software Engineer
Session Level: Intermediate

Two written forms of Chinese are well recognized in the computer industry: Simplified Chinese, used primarily in Mainland China, and Traditional Chinese, used primarily in Taiwan. Chinese is also one of the primary languages in three other places: Hong Kong, Macau, and Singapore. Hong Kong's written Chinese is normally treated as Traditional Chinese. However, more than 1,000 characters are used in Hong Kong and Mainland China but are not frequently used in Taiwan. Hong Kong's written Chinese may therefore be regarded as lying somewhere between Traditional and Simplified Chinese, though much closer to Traditional Chinese. In addition, a few thousand characters used in Hong Kong are not used, or not frequently used, in other Chinese-speaking countries and regions. Some of these Hong Kong-specific characters are not included even in Unicode 3.0.

To address these two issues, the Hong Kong government (now the Hong Kong S.A.R. government) defined the Government Common Character Set (GCCS), based on Taiwan's Big-5 encoded character set. GCCS added around 3,000 characters to Taiwan's Big-5. About half of them are included in China's GBK encoded character set and are therefore also in Unicode 2.1. The remaining half were not included in Taiwan's Big-5, China's GBK, or Unicode 2.1. Some of these Hong Kong-specific characters have since been included in Unicode 3.0, but others are still missing from it.

In September 1999, the Hong Kong S.A.R. government defined the Hong Kong Supplementary Character Set (HKSCS), the successor to GCCS. Unlike GCCS, HKSCS defines precise mappings between HKSCS and Unicode 2.1, and between HKSCS and Unicode 3.0.

Oracle has implemented HKSCS in Oracle8i Release 3 (8.1.7). The implementation handles the mapping between HKSCS and Unicode 3.0 as well as the compatibility mapping between HKSCS and Unicode 2.1. Although the Hong Kong S.A.R. government defined HKSCS very carefully, a small number of implementation-dependent issues remain.
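The idea of maintaining one mapping per Unicode version, plus a compatibility path for the older one, can be illustrated with a minimal Python sketch. This is not Oracle's implementation. Every table entry below is a made-up placeholder rather than real HKSCS data, and the function names `to_unicode` and `to_hkscs` are hypothetical. The sketch also assumes that a character lacking a code point in a given Unicode version is represented by a stand-in value in the Private Use Area.

```python
from typing import Dict

# HYPOTHETICAL placeholder tables: these are NOT real HKSCS assignments.
# Keys are two-byte HKSCS codes, values are Unicode code points.
HKSCS_TO_UNICODE_3_0: Dict[int, int] = {
    0x8840: 0x4E10,  # placeholder: encoded in both Unicode versions
    0x8841: 0x3400,  # placeholder: first encoded in Unicode 3.0
    0x8842: 0xE000,  # placeholder: unencoded in both, Private Use Area stand-in
}
HKSCS_TO_UNICODE_2_1: Dict[int, int] = {
    0x8840: 0x4E10,
    0x8841: 0xE001,  # placeholder: PUA stand-in, since 2.1 lacks the character
    0x8842: 0xE000,
}

TABLES = {"2.1": HKSCS_TO_UNICODE_2_1, "3.0": HKSCS_TO_UNICODE_3_0}

# Reverse tables for converting Unicode back to HKSCS.
UNICODE_3_0_TO_HKSCS = {cp: code for code, cp in HKSCS_TO_UNICODE_3_0.items()}
UNICODE_2_1_TO_HKSCS = {cp: code for code, cp in HKSCS_TO_UNICODE_2_1.items()}


def to_unicode(hkscs_code: int, version: str = "3.0") -> int:
    """Map an HKSCS code to a code point of the requested Unicode version."""
    if version not in TABLES:
        raise ValueError(f"unsupported Unicode version: {version}")
    table = TABLES[version]
    if hkscs_code not in table:
        raise ValueError(f"unmapped HKSCS code: {hkscs_code:#06x}")
    return table[hkscs_code]


def to_hkscs(code_point: int) -> int:
    """Map a code point to HKSCS, accepting Unicode 3.0 or 2.1 values.

    The 3.0 table is consulted first; the 2.1 table acts as a
    compatibility fallback for data converted under the older mapping.
    """
    for table in (UNICODE_3_0_TO_HKSCS, UNICODE_2_1_TO_HKSCS):
        if code_point in table:
            return table[code_point]
    raise ValueError(f"no HKSCS code for U+{code_point:04X}")
```

With these placeholder tables, `to_unicode(0x8841, "3.0")` and `to_unicode(0x8841, "2.1")` return different code points, while `to_hkscs` maps both of them back to the same HKSCS code. Accepting both values on the way back is one possible design for a compatibility mapping; it only works if the stand-in values of the older version never collide with code points assigned to other characters in the newer one.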

In this paper, I explain the specific issues that arise when implementing HKSCS and how Oracle has addressed them.



International Unicode Conferences are organized by Global Meeting Services, Inc., (GMS). GMS is pleased to be able to offer the International Unicode Conferences under an exclusive license granted by the Unicode Consortium. All responsibility for conference finances and operations is borne by GMS. The independent conference board serves solely at the pleasure of GMS and is composed of volunteers active in Unicode and in international software development. All inquiries regarding International Unicode Conferences should be addressed to info@global-conference.com.

Unicode and the Unicode logo are registered trademarks of Unicode, Inc. Used with permission.

13 December 2000, Webmaster