Machine Translation and Automated Analysis of Cuneiform Languages
General and Comparative Linguistics, Experimental Linguistics, Typology, Non-European Languages
Image and Language Processing, Computer Graphics and Visualisation, Human Computer Interaction, Ubiquitous and Wearable Computing
Final Report Abstract
Mesopotamia is the birthplace of writing, but the majority of cuneiform texts remain underexplored, as certain genres (e.g., literature) have received more attention than the 90% of texts dealing with administrative matters. Although most of these have remained untranslated and inaccessible to non-specialists, they are insightful socio-historical witnesses that document early state economies. Because of their vast numbers, human translation appears to be unachievable. MTAAC therefore provided automated annotation, machine translation and information extraction methods, and applied established best practices (Linked Open Data), to close this gap and provide unprecedented access to these sources. The work comprised:
• Pre-processing and Morphology
• Morphological and Syntactic Annotation
• Machine Translation and Evaluation
• Information Extraction
• Interoperability and Infrastructure work
• Methodology, Applied Use of Data, and Dissemination
Despite challenges such as staff turnover and delays in manual annotation, MTAAC achieved all project objectives: it delivered tools for processing Sumerian text, for its linguistic and semantic annotation, and for automatically producing translations. Standard evaluation metrics (BLEU, F1) were applied during the process. Data and software were released under public domain or open licences. MTAAC contributed to the infrastructure development of the Cuneiform Digital Library Initiative, so that its results become more easily accessible to Digital Assyriology. In the wider context of Digital Humanities, it contributed to the development of community standards (OntoLex, Linguistic Linked Open Data). Scientific communication measures included numerous publications and the organization of tutorials and workshops, and the project also received public media coverage. Overall, MTAAC was successful in delivering cutting-edge tools for machine translation and automated analysis of Sumerian cuneiform text, and in making tools and data available in an open, FAIR and re-usable way.
Publications
-
Machine Translation and Automated Analysis of the Sumerian Language. Proceedings of the Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, 10-16. Association for Computational Linguistics.
Pagé-Perron, Émilie; Sukhareva, Maria; Khait, Ilya & Chiarcos, Christian
-
MTAAC: Machine Translation and Automated Analysis of Cuneiform Languages (Canadian Society for Mesopotamian Studies Symposium “Digital Cuneiform: New Approaches to the Study of Ancient Near Eastern Written Sources,” Toronto, September 30, 2017)
Pagé-Perron, Émilie & Reckling, Lucas
-
Annotating a Low-Resource Language with LLOD Technology: Sumerian Morphology and Syntax. Information, 9(11), 290.
Chiarcos, Christian; Khait, Ilya; Pagé-Perron, Émilie; Schenk, Niko; Jayanth; Fäth, Christian; Steuer, Julius; Mcgrath, William & Wang, Jinyan
-
Annotating Sumerian: (Semi)automatic Tagging of Morphology and Syntax for a Low-Resource Language (Seminar for Comparative-Historical Linguistics, Institute of Linguistics of the Russian Academy of Sciences, Moscow, October 4, 2018)
Khait, Ilya
-
Annotating Sumerian: A LLOD-enhanced Workflow for Cuneiform Corpora. Proceedings of the 6th Workshop on Linked Data in Linguistics (LDL-2018): Towards Linguistic Data Science. Miyazaki, Japan. 2018
Chiarcos, Christian, Khait, Ilya, Pagé-Perron, Émilie, Schenk, Niko, Jayanth & Reckling, Lucas
-
Cuneiform Collections as (Linked) Data (workshop “Collections as Data,” Digital Libraries Forum, Las Vegas, October 15–17, 2018)
Pagé-Perron, Émilie
-
Introducing the MTAAC project: Machine Translation and Automated Analysis of Cuneiform Languages (64th Rencontre Assyriologique Internationale, Innsbruck, July 20, 2018)
Baker, Heather D.
-
Le projet MTAAC: traduction et analyse automatique de textes cunéiformes (Conférence de l’Association des études du Proche-Orient ancien, Université du Québec à Montréal, March 13, 2018)
Pagé-Perron, Émilie
-
Machine Translation and Automated Analysis of Cuneiform Languages (contributions to workshop “Building International Bridges Through Digital Scholarship: The Trans-Atlantic Platform Digging Into Data Challenge Experience,” DH2018 Conference, Mexico City, June 25, 2018)
Baker, Heather D. & Pagé-Perron, E.
-
Pre-requisites and Workflow for the Machine Translation of the Sumerian Language. workshop “Future Philologies: Digital Directions in Ancient World Text,” Institute for the Study of the Ancient World, New York, April 20, 2018
Pagé-Perron, Émilie
-
Recent Developments in Natural Language Processing for Cuneiform Languages. workshop “Thinking Digital in Cuneiform Studies: Methods, Problems, Perspectives,” Venice, March 27–28, 2018
Pagé-Perron, Émilie
-
Towards a Linked Open Data Edition of Sumerian Corpora. In: Proceedings of the Language Resources and Evaluation Conference (LREC-2018), Miyazaki, Japan. 2018
Chiarcos, Christian, Pagé-Perron, Émilie, Khait, Ilya, Schenk, Niko & Reckling, Lucas
-
CoNLL-Merge: Efficient Harmonization of Concurrent Tokenization and Textual Variation. In: Eskevich et al. (eds.), Proceedings of the 2nd Conference on Language, Data and Knowledge, Leipzig, May 2019, Dagstuhl Open Access Series in Informatics, 70
Christian Chiarcos & Niko Schenk
-
Lemmatizing Low-resource, Less-researched Languages: The Linked Open Text Reader and Annotator. poster presented at the Digital Humanities Conference 2019 (DH2019), Utrecht, July 2019
Maxim Ionov & Christian Chiarcos
-
Linking the TEI: Approaches, Limitations, Use Cases. paper presented at Digital Humanities Conference 2019 (DH2019), Utrecht, July 2019
Christian Chiarcos & Maxim Ionov
-
New Technologies for Cuneiform Studies: Processing & Linking Textual Data (University of California Berkeley, April 8, 2019)
Pagé-Perron, Émilie
-
The MTAAC Project: Computational Analysis for Sumerian (Broadening Horizons 6 Conference, Freie Universität Berlin, June 24–28, 2019)
Khait, Ilya & Pagé-Perron, Émilie
-
Towards the First Machine Translation System for Sumerian Transliterations. Proceedings of the 28th International Conference on Computational Linguistics, 3454-3460. International Committee on Computational Linguistics.
Punia, Ravneet; Schenk, Niko; Chiarcos, Christian & Pagé-Perron, Émilie
-
A Next Generation Infrastructure for Digital Assyriology. The Association for Computers and the Humanities Conference, Virtual, July 21-23 (2021)
Émilie Pagé-Perron & Christian Chiarcos
-
Digital Library Organization and Accessibility. Future Philology: Digitization and Beyond, University of Oxford, September 30 - October 1 (2021)
Émilie Pagé-Perron
-
How Low is Too Low? A Computational Perspective on Extremely Low-Resource Languages. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: Student Research Workshop, 2021. Association for Computational Linguistics.
Bansal, Rachit; Choudhary, Himanshu; Punia, Ravneet; Schenk, Niko; Pagé-Perron, Émilie & Dahl, Jacob
-
Building ETCSANS: The Electronic Text Corpus of Syntactically Annotated Neo-Sumerian. paper presented at DH2022. Responding to Asian Diversity. Tokyo, Japan, July 2022
Christian Chiarcos & Émilie Pagé-Perron
-
When linguistics meets web technologies. Recent advances in modelling linguistic linked data. Semantic Web, 13(6), 987-1050.
Khan, Anas Fahad; Chiarcos, Christian; Declerck, Thierry; Gifu, Daniela; García, Elena González-Blanco; Gracia, Jorge; Ionov, Maxim; Labropoulou, Penny; Mambrini, Francesco; McCrae, John P.; Pagé-Perron, Émilie; Passarotti, Marco; Muñoz, Salvador Ros & Truică, Ciprian-Octavian
-
Getting LOADed: Practical Considerations, Tools, and Workflows for Producing Linked Open Assyriological Data. The Ancient World Goes Digital, 335-370. BRILL.
Nurmikko-Fuller, Terhi
