The Hitchhiker's Guide to Chinese Encodings
Thomas Emerson - Basis Technology Corporation
Intended Audience: Software Engineer
Session Level: Intermediate
This presentation gives an overview and analysis of the plethora of
Chinese character encodings, describing their similarities and
differences and how they map to various versions of
Unicode. For example, how does Big Five compare with Big 5+ and
Microsoft CP950? What about the various extensions to Big Five? How
do the HKSCS, ETen, and HKUST EUDC extensions to Big 5 compare and
map to Unicode? How does one round-trip each of these? And then there
is CNS-11643...
Unfortunately, dealing with Simplified Chinese is no simpler: what is
the relationship between GB 2312:80 and GB 12345:90 (GB 12345 is the
traditional analog to GB 2312), and how does GB 12345 compare with Big
5? For that matter, how does GB 2312:80 compare with GBK, Microsoft
CP936, and GB18030? And how do all of these map to Unicode 2.1 and
3.1? What does this all mean for the poor programmer who has to try
to deal with China?
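
As a small taste of the GB 2312 / GBK / GB18030 relationship, the sketch below checks which of the three encodings can round-trip a given character through Unicode. It is only an illustration, not material from the presentation: it assumes Python's built-in `gb2312`, `gbk`, and `gb18030` codecs (whose mapping tables may differ in detail from other vendors' tables), and the helper names `round_trips` and `supported_by` are made up for this example.

```python
# Codec names are Python's built-in codecs; helper names are hypothetical.
GB_FAMILY = ["gb2312", "gbk", "gb18030"]


def round_trips(ch, codec):
    """Return True if ch survives Unicode -> codec -> Unicode unchanged."""
    try:
        return ch.encode(codec).decode(codec) == ch
    except UnicodeError:
        # Raised when the character has no mapping in this codec.
        return False


def supported_by(ch, codecs=GB_FAMILY):
    """List the codecs (in the order given) that can round-trip ch."""
    return [codec for codec in codecs if round_trips(ch, codec)]


if __name__ == "__main__":
    samples = [
        "\u4e2d",  # simplified-era character present in GB 2312
        "\u6f22",  # traditional form, absent from GB 2312
        "\u3400",  # CJK Extension A, outside GBK
    ]
    for ch in samples:
        print(f"U+{ord(ch):04X}", supported_by(ch))
```

With these three sample characters, each later encoding accepts everything the earlier one does plus more, which is the superset relationship the questions above are probing.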
At the end of this presentation it is expected that you will leave
with a better understanding of how these encodings relate and how to
deal with them when authoring Chinese-language applications.