Nineteenth International Unicode Conference

Language Processing Issues with Unicode Data

Richard Youatt - American University of Armenia Corporation

Intended Audience:	Manager, Software Engineer, Systems Analyst, Marketer, Academia/Education
Session Level:	Intermediate

Unicode/ISO 10646 and the programming languages that manipulate the elements of those character sets have opened up a new realm of technical possibilities. These have been applied primarily in e-commerce, software globalization, and the organizational and administrative needs of large multinational organizations. At the same time, they have opened a door to Information Technology and the World Wide Web for the lesser-known cultures and languages of the world.

Even among those concerned with minority rights and cultures, computer literacy and access to the Information Highway have received more attention than the purer linguistic issues: the benefits of technology-assisted research in language processing and computer-assisted linguistics. This presentation addresses some of those issues, drawing upon theoretical and practical work with the Digital Library of Classical Armenian Literature at the American University of Armenia, and looks at some of the generic issues of language processing with Unicode data.

The primary conclusion is that linguistic and historical research has yet to take full advantage of the "technology boost" that is now available, and that doing so requires a multidisciplinary and international approach to project work. Progress in technology does not necessarily enhance linguistic skills or promote conceptual and intellectual progress in language studies. Linguists engaged in technology-assisted research are as concerned with the semantics and etymology of language as with the ability to manipulate the elements of the ISO 10646/Unicode repertoire, which does not in fact meet their full needs.

22 Jun 2001, Webmaster