Distributional Approaches to Modelling Semantic Relatedness: Generalisation, Evaluation, Visualisation

Applicant Privatdozentin Dr. Sabine Schulte im Walde

Subject Area General and Comparative Linguistics, Experimental Linguistics, Typology, Non-European Languages
Image and Language Processing, Computer Graphics and Visualisation, Human Computer Interaction, Ubiquitous and Wearable Computing

Funding Funded from 2011 to 2017

Project Identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 192349223

Year of Creation 2017

Summary of Project Results

The project Distributional Approaches to Semantic Relatedness (SemRel) explored the potential and the limits of distributional approaches to lexical semantics. Phase 1 distinguished three types of semantic relatedness, to shed light on distributional modelling from different perspectives. Phase 2 studied semantic relatedness from a meta-level perspective, across relatedness types, word classes, word senses and feature types. Our work was performed within an interdisciplinary framework spanning theoretical, cognitive and computational linguistics: each type of relatedness (paradigmatic relations, preposition senses and compound compositionality) received input and feedback from human judgements, and was applied to statistical machine translation (SMT).

Our main contributions from the project include:
• an extensive collection of distributional information and an interface for German subcategorisation information;
• a substantial collection of human judgements regarding paradigmatic semantic relations, compositionality ratings, association and feature norms;
• a novel framework based on hard as well as soft clustering to identify ambiguous words, and a visualisation tool to explore their features;
• an assessment of evaluation measures for soft clustering, and the development of our own measures;
• various soft-clustering and/or multi-modal approaches to predict relatedness and identify salient features;
• various approaches to predict the compositionality of German multi-word expressions, with specific attention to linguistic and empirical properties;
• various approaches to distinguish between paradigmatic relations, using both count and neural predict models;
• a hierarchical SMT system integrating syntactico-semantic subcategorisation information;
• a phrase-based SMT system making use of synthetic phrases to model noun phrase and prepositional phrase complements;
• a phrase-based SMT system combining approaches to model morphology, syntax and lexical choice;
• a phrase-based SMT system integrating compositionality ratings;
• an SMT system with support-verb constructions.

Overall, we demonstrated a large potential for distributional information to model the various types of semantic relatedness, and to integrate them into an SMT model. We also showed that default features may serve as a first approximation of salient distributional properties, but are outperformed by phenomenon-related features. Contrasting text-based with multi-modal variants provided first insights into the strengths and complementary properties of the modalities. Most surprising for us was the difficulty of capturing all three semantic relatedness types within one framework, as the two types of multi-word expressions already showed very different behaviour across various models. Distinguishing between the salient features and relating them to human judgements in a reasonable way remains a major challenge. Concerning ambiguity in vector spaces, our work as well as discussions in a reading group and in a workshop showed that, in addition to defining appropriate techniques, the underlying gold standard ratings also need to take ambiguity into account to allow a reasonable assessment of the models.

Project-Related Publications (Selection)

A Multimodal LDA Model integrating Textual, Cognitive and Visual Modalities. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1146–1157, Seattle, WA, 2013
Stephen Roller and Sabine Schulte im Walde
Using Subcategorization Knowledge to improve Case Prediction for Translation to German. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL), pages 593–603, Sofia, Bulgaria, 2013
Marion Weller, Alexander Fraser, and Sabine Schulte im Walde
Chasing Hypernyms in Vector Spaces with Entropy. In Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics (EACL), pages 38–42, Gothenburg, Sweden, 2014
Enrico Santus, Alessandro Lenci, Qin Lu, and Sabine Schulte im Walde
Combining Word Patterns and Discourse Markers for Paradigmatic Relation Classification. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (ACL), pages 524–530, Baltimore, MD, 2014
Michael Roth and Sabine Schulte im Walde
Association Norms for German Noun Compounds and their Constituents. Behavior Research Methods, 47(4):1199–1221, 2015
Sabine Schulte im Walde and Susanne R. Borgwaldt
How to Account for Idiomatic German Support Verb Constructions in Statistical Machine Translation. In Proceedings of the 11th Workshop on Multiword Expressions (MWE), pages 19–28, Denver, Colorado, USA, 2015
Fabienne Cap, Manju Nirmal, Marion Weller, and Sabine Schulte im Walde
Integrating Distributional Lexical Contrast into Word Embeddings for Antonym-Synonym Distinction. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL), pages 454–459, Berlin, Germany, 2016
Kim-Anh Nguyen, Sabine Schulte im Walde, and Thang Vu
The Role of Modifier and Head Properties in Predicting the Compositionality of English and German Noun-Noun Compounds: A Vector-Space Perspective. In Proceedings of the 5th Joint Conference on Lexical and Computational Semantics (*SEM), pages 148–158, Berlin, Germany, 2016
Sabine Schulte im Walde, Anna Hätty, and Stefan Bott
Addressing Problems across Linguistic Levels in SMT: Combining Approaches to Model Morphology, Syntax and Lexical Choice. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics (EACL), pages 625–630, Valencia, Spain, 2017
Marion Weller-Di Marco, Alexander Fraser, and Sabine Schulte im Walde
Factoring Ambiguity out of the Prediction of Compositionality for German Multi-Word Expressions. In Proceedings of the 13th Workshop on Multiword Expressions (MWE), pages 66–72, Valencia, Spain, 2017
Stefan Bott and Sabine Schulte im Walde
