Handling Epigraphic Materials and Manuscripts: Character Encoding or Text Encoding?
Deborah Anderson - Department of Linguistics, UC Berkeley
Intended Audience: Software Engineer
Session Level: Intermediate, Advanced
Ancient texts survive on a number of media (papyri, rock inscriptions,
cuneiform tablets, etc.). To capture the information for
online scholarly texts, one can include a photo in, for example, JPEG
image format to show the full document. In the absence of (or in addition
to) a photo, scholarly editions often include a version of the
texts with the characters written out. The symbols are then transcribed or
transliterated into Latin letters, sometimes with diacritics. To manipulate the
symbols more fully for data processing, character encoding with Unicode
and markup (e.g., following the Text Encoding Initiative guidelines) can be
used. Markup is used for rulings between lines of text and can also carry
editorial commentary on the texts. A proposal for epigraphic materials
called "Epidoc" has been put forward. Epidoc captures nearly all the
editorial content in markup, including brackets and parentheses. Is such a
technique the best (and/or the only) choice for the scholarly community to
embrace? This presentation will give examples of epigraphic and
papyrological materials currently employed in print editions and will
discuss possible approaches for handling these types of materials on the
Web. A plain-text version in Unicode is recommended alongside a full
text-encoding scheme, as this would make the text more immediately
available to a wider audience.
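
To make the contrast between the two approaches concrete, the following is a minimal sketch of how a plain-text Unicode rendering might be derived from a marked-up transcription. The element names (supplied, expan, ex, unclear) follow common TEI usage, but the sample text, the bracket conventions chosen, and the helper to_plain_text are assumptions made for illustration only; they are not taken from the presentation or from any particular edition.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the element names follow TEI usage, but the
# text itself is invented for this illustration.
MARKUP = (
    '<ab>imp<supplied reason="lost">erator</supplied> '
    '<expan>Caes<ex>ar</ex></expan> '
    '<unclear>Aug</unclear></ab>'
)

# Assumed editorial sigla: [ ] for restored text, ( ) for the expanded
# part of an abbreviation.
BRACKETS = {"supplied": ("[", "]"), "ex": ("(", ")")}

# U+0323 COMBINING DOT BELOW, assumed here as the mark for unclear letters.
DOT_BELOW = "\u0323"


def to_plain_text(elem):
    """Flatten an element into plain text, turning markup into sigla."""
    open_, close = BRACKETS.get(elem.tag, ("", ""))
    text = elem.text or ""
    if elem.tag == "unclear":
        # Character-level encoding: one combining dot under each letter.
        text = "".join(ch + DOT_BELOW for ch in text)
    parts = [open_, text]
    for child in elem:
        parts.append(to_plain_text(child))
        parts.append(child.tail or "")  # text that follows the child element
    parts.append(close)
    return "".join(parts)


plain = to_plain_text(ET.fromstring(MARKUP))
print(plain)
```

In this sketch the brackets and the dots below live in the character stream itself, so the result can be read, searched, and copied without an XML-aware tool, while the marked-up source remains available for fuller processing.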