Nineteenth International Unicode Conference

Handling Epigraphic Materials and Manuscripts: Character Encoding or Text Encoding?

Deborah Anderson - Department of Linguistics, UC Berkeley

Intended Audience:	Software Engineer
Session Level:	Intermediate, Advanced

Ancient texts survive on a variety of media (papyri, rock inscriptions, cuneiform tablets, etc.). To capture this information in online scholarly editions, one can include a photograph of the full document, for example in JPEG format. In the absence of a photograph, or in addition to one, scholarly editions often include a version of the text with the characters written out; the symbols are then transcribed or transliterated into Latin letters, sometimes with diacritics. To manipulate the symbols more fully in data processing, one can use character encoding with Unicode together with markup (e.g., following the Text Encoding Initiative guidelines). Markup is used for rulings between lines of text and can also carry editorial commentary on the texts. A proposal for epigraphic materials called "Epidoc" has been put forward; it captures nearly all editorial content in markup, including brackets and parentheses. Is such a technique the best (or the only) choice for the scholarly community to embrace? This presentation gives examples of epigraphic and papyrological materials as they appear in current print editions and discusses possible approaches to handling such materials on the Web. It recommends providing a plain-text version in Unicode alongside a full text-encoding scheme, since this makes the text more immediately available to a wider audience.
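
To make the contrast between the two approaches concrete, the following is a minimal sketch in Python of how a plain-text Unicode version might be derived from a marked-up transcription. It is not taken from the presentation or from the Epidoc proposal: the element names `ab`, `ex`, and `supplied`, the `BRACKETS` mapping, and the sample text are all illustrative assumptions. The only convention relied on is the print-edition practice of using square brackets for text restored by the editor and parentheses for expanded abbreviations.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from editorial markup to the signs used in print editions.
# The element names are assumptions made for this sketch.
BRACKETS = {
    "supplied": ("[", "]"),  # text restored by the editor
    "ex": ("(", ")"),        # letters added when expanding an abbreviation
}


def to_plain_text(elem):
    """Flatten a marked-up element into plain text with editorial brackets."""
    opening, closing = BRACKETS.get(elem.tag, ("", ""))
    parts = [opening, elem.text or ""]
    for child in elem:
        parts.append(to_plain_text(child))  # bracketed content of the child
        parts.append(child.tail or "")      # text following the child element
    parts.append(closing)
    return "".join(parts)


# Invented sample: an abbreviation expanded and a damaged word restored.
sample = "<ab>imp<ex>erator</ex> Caes<supplied>ar</supplied></ab>"
plain = to_plain_text(ET.fromstring(sample))
print(plain)  # imp(erator) Caes[ar]
```

In a sketch like this the markup remains the fuller record, while the derived string can be read, searched, or exchanged by anyone with Unicode-capable software and no knowledge of the markup scheme.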

22 Jun 2001, Webmaster