Project Details

Machine Translation and Automated Analysis of Cuneiform Languages

Subject Area: Egyptology and Ancient Near Eastern Studies
General and Comparative Linguistics, Experimental Linguistics, Typology, Non-European Languages
Image and Language Processing, Computer Graphics and Visualisation, Human Computer Interaction, Ubiquitous and Wearable Computing
Term: 2017 to 2021
Project identifier: Deutsche Forschungsgemeinschaft (DFG) - Project number 329145082

Final Report Year: 2023

Final Report Abstract

Mesopotamia is the birthplace of writing, yet the majority of cuneiform texts remain underexplored: certain genres (e.g., literature) have received far more attention than the 90% of texts that deal with administrative matters. Although most of these administrative texts remain untranslated and inaccessible to non-specialists, they are insightful socio-historical witnesses that document early state economies. Because of their vast numbers, human translation appears unachievable. To close this gap and provide unprecedented access to these sources, MTAAC provided methods for automated annotation, machine translation and information extraction, and applied established best practices (Linked Open Data). The work covered:

• Pre-processing and Morphology
• Morphological and Syntactic Annotation
• Machine Translation and Evaluation
• Information Extraction
• Interoperability and Infrastructure work
• Methodology, Applied Use of Data, and Dissemination

Despite challenges such as staff turnover and delays in manual annotation, MTAAC achieved all project objectives: it delivered tools for processing Sumerian text, for its linguistic and semantic annotation, and for automatically producing translations. Standard evaluation metrics (BLEU, F1) were applied during the process. Data and software were released into the public domain or under open licences. MTAAC contributed to the infrastructure development of the Cuneiform Digital Library Initiative, so that the project's results become more easily accessible to Digital Assyriology. In the wider context of Digital Humanities, the project contributed to the development of community standards (OntoLex, Linguistic Linked Open Data). Scientific communication measures included numerous publications and the organization of tutorials and workshops, and the project also received public media coverage. Overall, MTAAC succeeded in delivering cutting-edge tools for machine translation and automated analysis of Sumerian cuneiform text, and in making tools and data available in an open, FAIR and re-usable way.
