Project Details
Unravelling Linguistic Knowledge via Multilingual Embedding Spaces and Latent Information (B06)
Subject Area
Applied Linguistics, Computational Linguistics
General and Comparative Linguistics, Experimental Linguistics, Typology, Non-European Languages
Term
since 2015
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 232722074
Embeddings (mono- and multilingual, static or contextualised) are the workhorses of modern language technologies. They capture semantic, grammatical, morphological and other information. Multilingual embeddings are especially promising: word and sentence translations lie close together in multilingual embedding space, and such embeddings allow fine-tuning as well as few- and zero-shot learning. They also constitute the core technology underpinning our previous research on translationese in Phase II. In Phase III, B6 focuses on (i) information spreading in embedding spaces, (ii) translationese subspaces and (iii) extracting tacit background knowledge from translation data, particularly for situations where isomorphism between spaces does not and should not hold. It also investigates how (i) to (iii) affect information density-based approaches to translation.
DFG Programme
Collaborative Research Centres
Subproject of
SFB 1102: Information Density and Linguistic Encoding
Applicant Institution
Universität des Saarlandes
Project Heads
Dr. Cristina España i Bonet; Professor Dr. Josef van Genabith; Dr. Raphael Rubino, until 5/2019