A Generalized Mechanism for Unicode Metadata
Intended Audience: Software Engineer, Systems Analyst
Session Level: Intermediate
The many competing motivations for selecting codepoints
in the Unicode standard threaten the supreme purpose of a
character encoding: data. Digital data is immensely convenient
because the advantages of its great simplicity outweigh
the losses incurred by representing knowledge imperfectly.
Increases in computing power permit us to begin recovering
what has been left out. Yet the very richness of the collection
of Unicode characters has made the interpretation of text
more difficult. Algorithms and reports are now necessary to
understand raw streams of Unicode characters.
We propose a general mechanism for conveying metadata
within Unicode. The conceptual boundary between codepoints
and text processing is sharpened. The approach is both flexible
and extendable. Furthermore, algorithms such as the bidirectional
algorithm can be recast in such a way that they
become detectable and reversible.