Joining graph- and vector-based sense representations for semantic end-user information access (JOIN-T 2)
Final Report Abstract
In the 2010s, natural language processing (NLP) research led to numerous breakthroughs, particularly in natural language understanding. Lexical semantics is one of the key areas of language technology (LT) and has produced a large number of publications addressing the representation of machine-readable knowledge along orthogonal dimensions, such as manual versus automatic acquisition, lexical versus conceptual levels, and dense versus sparse vectors and matrices. Nevertheless, there was a strong demand for research on combining these dimensions, in order to unite their individual advantages in a common model or resource that would enable improved performance on complex language technology tasks. We explored approaches to representing meaning based on the duality of graphs and vectors and on the hypothesis that both graph-based and vector-based representations of lexical items should be used equally and jointly to characterize their meaning. To this end, we developed frameworks and resources that integrate the above-mentioned dimensions and, in particular, combine the interpretability of manually created, sparsely represented items with the accuracy and high coverage of dimensionality-reduced, dense, neural embeddings. In the first project phase (JOIN-T1) we obtained notable results in the field of distributional semantics, in particular: i) linking distributional and ontological semantic representations is possible with high accuracy, and ii) disambiguating contextual lexical units to their meanings is possible with high accuracy using graph-based representations, but only at high computational cost, which has so far prevented scaling to very large corpora.
Building on our work on combining ontologies with graph-based distributional semantics in JOIN-T1, we extended our focus for JOIN-T2 to i) connecting to dimensionality-reduced, neural, dense vector representations (embeddings) from text and knowledge bases in a joint model, ii) extending coverage to low-frequency and emerging entities by processing web-scale corpora, and iii) leveraging the joint benefits of a simultaneous lexical, distributional and ontological representation for complex natural language processing tasks, such as entity- and event-centered browsing of document collections. Due to the rapid developments in the field of compositional representations through large, pre-trained language models, we adapted our research direction to the latest approaches and models, and therefore deviated from the individual work packages defined in the proposal. In summary, we gained new insights in the area of contextualized representations and made progress especially in modeling and contextually detecting the meanings of words, semantic frames and entities.
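The core idea of disambiguating a word occurrence against a vector-based sense inventory can be illustrated with a minimal sketch. This is not the project's implementation: the sense labels and the toy vectors below are hypothetical stand-ins for real sense vectors (e.g. centroids of contextualized occurrence embeddings) and for the embedding of the target occurrence; the sketch only shows the nearest-neighbour matching step.

```python
# Minimal sketch of nearest-neighbour word sense disambiguation over dense
# vectors. The sense inventory and all vectors are hypothetical toy data
# standing in for real (e.g. contextualized) embeddings.
import numpy as np

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def disambiguate(occurrence_vec, sense_inventory):
    """Return the sense id whose vector is most similar to the occurrence."""
    return max(sense_inventory,
               key=lambda s: cosine(occurrence_vec, sense_inventory[s]))

# Hypothetical sense vectors for "bank" (e.g. centroids of example contexts).
senses = {
    "bank#finance": np.array([0.9, 0.1, 0.0]),
    "bank#river":   np.array([0.1, 0.9, 0.2]),
}
# Hypothetical embedding of "bank" in a monetary context.
occurrence = np.array([0.8, 0.2, 0.1])
print(disambiguate(occurrence, senses))  # -> bank#finance
```

In practice the sense vectors would come from an induced or linked sense inventory and the occurrence vector from a contextualized encoder, but the matching step remains this simple similarity maximization.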
Publications
- Pelevina, Maria; Arefiev, Nikolay; Biemann, Chris & Panchenko, Alexander: Making Sense of Word Embeddings. Proceedings of the 1st Workshop on Representation Learning for NLP.
- Glavaš, Goran & Ponzetto, Simone Paolo: Dual Tensor Model for Detecting Asymmetric Lexico-Semantic Relations. Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, 1757-1767.
- Panchenko, Alexander; Ruppert, Eugen; Faralli, Stefano; Ponzetto, Simone Paolo & Biemann, Chris: Unsupervised Does Not Mean Uninterpretable: The Case for Word Sense Induction and Disambiguation. Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers, 86-98.
- Biemann, Chris; Faralli, Stefano; Panchenko, Alexander & Ponzetto, Simone Paolo: A framework for enriching lexical semantic resources with distributional semantics. Natural Language Engineering, 24(2), 265-312.
- Panchenko, Alexander; Ruppert, Eugen; Faralli, Stefano; Ponzetto, Simone Paolo & Biemann, Chris: Building a Web-Scale Dependency-Parsed Corpus from CommonCrawl. Proceedings of the Eleventh International Conference on Language Resources and Evaluation. Miyazaki, Japan.
- Nanni, Federico; Ponzetto, Simone Paolo & Dietz, Laura: Entity-Aspect Linking. Proceedings of the 18th ACM/IEEE on Joint Conference on Digital Libraries, 49-58.
- Ustalov, Dmitry; Panchenko, Alexander; Kutuzov, Andrey; Biemann, Chris & Ponzetto, Simone Paolo: Unsupervised Semantic Frame Induction using Triclustering. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 55-62.
- Wiedemann, Gregor; Remus, Steffen; Chawla, Avi & Biemann, Chris: Does BERT Make Any Sense? Interpretable Word Sense Disambiguation with Contextualized Embeddings. Proceedings of the 15th Conference on Natural Language Processing (KONVENS), 161-170. Erlangen, Germany, 2019.
- Aly, Rami; Acharya, Shantanu; Ossa, Alexander; Köhn, Arne; Biemann, Chris & Panchenko, Alexander: Every Child Should Have Parents: A Taxonomy Refinement Algorithm Based on Hyperbolic Term Embeddings. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 4811-4817.
- Sevgili, Özge; Shelmanov, Artem; Arkhipov, Mikhail; Panchenko, Alexander & Biemann, Chris: Neural entity linking: A survey of models based on deep learning. Semantic Web, 13(3), 527-570.
