Project Details

Joining graph- and vector-based sense representations for semantic end-user information access (JOIN-T 2)

Subject Area Image and Language Processing, Computer Graphics and Visualisation, Human Computer Interaction, Ubiquitous and Wearable Computing
Term from 2014 to 2019
Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 259256643
 
Final Report Year 2023

Final Report Abstract

In the 2010s, natural language processing (NLP) research produced numerous breakthroughs, particularly in natural language understanding. Lexical semantics is one of the key areas of language technology (LT) and has given rise to a large number of publications addressing the representation of machine-readable knowledge along orthogonal dimensions, such as manual versus automatic acquisition, lexical versus conceptual levels, and dense versus sparse vectors and matrices. Nevertheless, there was a strong demand for research on combining these dimensions, in order to unite their individual advantages in a common model or resource that would improve performance on complex language technology tasks. We explored approaches to representing meaning based on the duality of graphs and vectors, and on the hypothesis that graph-based and vector-based representations of lexical items should be used equally and jointly to characterize their meaning. To this end, we developed frameworks and resources that integrate the above-mentioned dimensions and, in particular, combine the interpretability of manually created, sparsely represented items with the accuracy and high coverage of dimensionality-reduced, dense, neural embeddings. In the first project phase (JOIN-T1) we made notable advances in the field of distributional semantics, in particular: i) linking distributional and ontological semantic representations is possible with high accuracy, and ii) disambiguating lexical units in context to their meanings is possible with high accuracy using graph-based representations, but only at high computational cost, which has so far prevented scaling them to very large corpora.
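The joint use of graph- and vector-based sense representations can be illustrated with a minimal sketch. All names, the toy embeddings, and the two-sense inventory below are hypothetical illustrations, not the project's actual resources: each sense of an ambiguous word is a cluster of neighbours in a semantic graph, a dense sense vector is pooled from the neighbours' word embeddings, and a word in context is disambiguated by comparing the context vector against each sense vector.

```python
import numpy as np

# Toy word embeddings (in practice: large pre-trained vectors, e.g. word2vec).
EMB = {
    "fruit":    np.array([1.0, 0.1, 0.0]),
    "tree":     np.array([0.9, 0.2, 0.1]),
    "orchard":  np.array([0.8, 0.1, 0.0]),
    "linux":    np.array([0.0, 0.1, 1.0]),
    "software": np.array([0.1, 0.2, 0.9]),
    "computer": np.array([0.0, 0.3, 1.0]),
}

# Hypothetical sense inventory for the ambiguous word "apple": each sense
# is a set of semantically related neighbours taken from a sense graph.
SENSES = {
    "apple#fruit":   ["fruit", "tree", "orchard"],
    "apple#company": ["linux", "software", "computer"],
}

def sense_vector(neighbours):
    """Dense sense vector = mean embedding of the sense's graph neighbours."""
    return np.mean([EMB[w] for w in neighbours], axis=0)

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def disambiguate(context_words):
    """Pick the sense whose vector is closest to the averaged context vector."""
    ctx = np.mean([EMB[w] for w in context_words if w in EMB], axis=0)
    return max(SENSES, key=lambda s: cosine(sense_vector(SENSES[s]), ctx))
```

For a context such as `["software", "computer"]`, the company sense wins; for `["fruit", "tree"]`, the fruit sense does. The sparse graph side keeps each sense interpretable (its neighbour list can be inspected), while the dense vector side makes the comparison cheap, avoiding the expensive graph-based inference noted above.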
Building on our JOIN-T1 work on combining ontologies with graph-based distributional semantics, we extended our focus in JOIN-T2 to i) connecting to dimensionality-reduced, neural, dense vector representations (embeddings) from text and knowledge bases in a joint model, ii) extending coverage to low-frequency and emerging entities by processing Web-scale corpora, and iii) leveraging the joint benefits of a simultaneous lexical, distributional, and ontological representation for complex natural language processing tasks, such as entity- and event-centered browsing of document collections. Due to rapid developments in compositional representations through large pre-trained language models, we adapted our research direction to the latest approaches and models, and therefore deviated from individual work packages defined in the proposal. In summary, we gained new insights into contextualized representations and made progress especially in modeling and contextually detecting the meanings of words, semantic frames, and entities.

Publications

