Project Details
A Resource Efficient Cross-linguistic Approach to Figurative Meaning Assessment in LLMs
Applicant
Dr. Maria Berger
Subject Area
Applied Linguistics, Computational Linguistics
Data Management, Data-Intensive Systems, Computer Science Methods in Business Informatics
Term
since 2026
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 579372772
LLMs can find good mappings for word meanings that are well encoded by large amounts of data. However, they are not able to interpret figurative meaning, and it is unclear whether future architectures will be. In fact, there will always be a lack of data representing transferred meaning. Models can handle established proverbs: “birds of a feather flock together” is translated into “gleich und gleich gesellt sich gern” (DE). However, they struggle with less famous figures: “the biter is sometimes bitten” is translated literally. We also do not know whether LLMs are capable of correctly interpreting figurative meaning in less-studied languages, because relevant studies are missing. To address these issues, we conduct an evaluation using parallel multilingual figurative corpora, pursuing three objectives. First, we conduct a robust assessment of multilingual LLMs to test their ability to capture figurative meaning. Using existing corpora, we apply downstream tasks such as machine translation and zero-shot figurative meaning NLI. A challenge is that corpora are available for some tasks (e.g., metaphor prediction) but limited for others (e.g., parallel proverb detection in German and Chinese). We use literal rephrasing and back-translation strategies to supplement existing resources. We want to assess how LLMs behave depending on the available data and whether they are able to cope with figurative meaning. Second, we examine internal representations of LLMs to understand how translation affects a figure’s meaning and how we can measure meaning through encoding. This can be achieved by probing, i.e., testing whether a model performs a certain task. To this end, we apply techniques including neuron activation probing and activation vector transformation (sparse autoencoders, SAEs), and complement the probing by applying layer freezing, where activations are projected into a smaller parameter space.
This will tell us how the inner mechanisms of an LLM affect its output. Ultimately, it will allow us to extrapolate missing meaning representations and thus crack open LLMs’ black-box behavior. Third, we aim to enhance the cross-lingual capabilities of LLMs. Going beyond pure evaluation, we distill transferable aspects of figurative meaning across languages. It is particularly important to understand the limitations of today's architectures, which perform well in resource-rich languages. Since figurative meaning will never be fully represented by LLMs across all languages, we attempt to transfer language-agnostic elements of figurative meaning. We define a low-difficulty challenge, cross-lingual metaphor NLI, and a high-difficulty challenge, understanding proverbs and idiomatic usage. The latter is critical because languages lexicalize figurative expressions differently depending on their tradition, so parallel corpora barely exist. We offer test suites for both difficulty levels. Since our approaches share resources, data processing and model training are very efficient.
DFG Programme
Priority Programmes
