Project Details
Machine Translation and Automated Analysis of Cuneiform Languages
Applicant
Professor Dr. Christian Chiarcos
Subject Area
Egyptology and Ancient Near Eastern Studies
General and Comparative Linguistics, Experimental Linguistics, Typology, Non-European Languages
Image and Language Processing, Computer Graphics and Visualisation, Human Computer Interaction, Ubiquitous and Wearable Computing
Term
from 2017 to 2021
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 329145082
The history and culture of ancient Mesopotamia, home of the first empires and birthplace of writing, are mostly known through literary and royal inscriptions. Yet administrative texts, which make up well over 90% of all cuneiform documents, have received much less attention: even when transliterated and digitized, most remain untranslated and are therefore inaccessible to scholars in even closely related fields. But these texts are unique and deeply insightful socio-historical witnesses, as they document the day-to-day management of early state economies. Because of their vast numbers, translating them by hand appears to be an unachievable task. From the 21st century BC alone, we have access to more than 67,000 digital transcriptions, routinely produced by specialists in a particular subset of documents; without translation, however, they are difficult to interpret even for specialists in other branches of Assyriology. MTAAC combines recent developments in machine learning (ML) with statistical and neural machine translation (MT) to facilitate the analysis of this material, thereby fundamentally expanding its accessibility to the Humanities and Social Sciences.
The main outcome is a methodology, its implementation, and a body of translated and analyzed texts, released under open licenses. Beyond cuneiform studies, we set an example for processing a host of comparable datasets in different historical philologies. Because the texts are so numerous, we supplement human labor with automated solutions. Statistical and neural approaches to Natural Language Processing have been maturing in the last decades and enjoy wide usage, but have rarely been applied to even major historical languages. We aim to bridge this gap, set an example for ML and MT in the Humanities, and facilitate studies of cuneiform languages.
To increase re-usability, we adapt and develop community-maintained specifications based on linked open data formalisms, and propose rules of best practice for collaboration with other digital humanities actors such as museums and portals for various strands of philology. PI Heather Baker, University of Toronto, Canada, leads the work on language-specific aspects in MTAAC. Co-PI Robert Englund, UCLA, director of the Cuneiform Digital Library Initiative, is in charge of data management and hosting. Co-PI Christian Chiarcos, Goethe University Frankfurt, Germany, is responsible for ML, MT and data integration. Methodologies are developed collaboratively. MTAAC provides unified access to a highly representative corpus of early writing, and will employ MT and ML to facilitate its context-sensitive semantic interpretation. The project will foster unprecedented scholarly cooperation among researchers in a variety of disciplines. As a result, lines of communication to the heritage of civilizations dead for many millennia will be made accessible to the networked public, contributing to a deeper appreciation and understanding of modern culture and its historical roots.
DFG Programme
Research Grants
International Connection
Canada, USA
Co-Investigators
Professorin Heather D. Baker; Dr. Robert K. Englund