The Hitchhiker's Guide to Chinese Encodings
Thomas Emerson - Basis Technology Corporation
Intended Audience: Software Engineer
Session Level: Intermediate
This paper presents an overview and analysis of the plethora of
Chinese character encodings, describing their similarities and
differences, and describing how they map to various versions of
Unicode. For example, how does Big 5 compare with Big 5+ and Microsoft
CP950? What about the various extensions to Big Five? How do the
HKSCS, Eten and HKUST EUDC extensions to Big 5 compare and map to
Unicode? How does one round-trip each of these? And then there is
CNS-11643...
Unfortunately, dealing with Simplified Chinese is no simpler: what is the
relationship between GB 2312:80 and GB 12345:90 (GB 12345 is the
traditional analog to GB 2312) and how does GB 12345 compare with Big
5? For that matter, how does GB 2312:80 compare with GBK and Microsoft
CP936? And how do all of these map to Unicode 2.1 and 3.0.1? What do
all these mean for the poor programmer who has to try and deal with
them?
At the end of this presentation, you will leave with a better understanding
of how these encodings relate and how to deal with them when authoring
Chinese-language applications.
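
As a small illustration of the round-trip question raised above, the following Python sketch checks whether a character survives an encode/decode cycle under several of the encodings mentioned. The helper name `round_trips` and the sample characters are assumptions made for this example and do not come from the paper. The results reflect the mapping tables bundled with Python's standard codecs, which are only one interpretation of each encoding and may differ from other vendors' tables.

```python
# Hypothetical helper (not from the paper): reports whether `text`
# survives encode -> decode unchanged under the named Python codec.
def round_trips(text: str, codec: str) -> bool:
    try:
        return text.encode(codec).decode(codec) == text
    except UnicodeError:
        # Raised when the codec has no mapping for a character.
        return False


# Sample characters chosen for illustration: the simplified and
# traditional forms of "Han".
samples = {"simplified": "\u6c49", "traditional": "\u6f22"}

# Codec names as registered in Python's standard library.
codecs_to_try = ["gb2312", "gbk", "cp936", "big5", "cp950"]

for label, ch in samples.items():
    for codec in codecs_to_try:
        print(f"{label} U+{ord(ch):04X} via {codec}: {round_trips(ch, codec)}")
```

A check of this kind only shows whether a mapping exists in one implementation's tables. It does not show whether two implementations agree on the mapping, which is where most of the trouble with vendor extensions arises.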