Project Details

Negative Knowledge at Web Scale

Subject Area Data Management, Data-Intensive Systems, Computer Science Methods in Business Informatics
Image and Language Processing, Computer Graphics and Visualisation, Human Computer Interaction, Ubiquitous and Wearable Computing
Term from 2021 to 2024
Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 453095897
 
Final Report Year 2024

Final Report Abstract

Knowledge Graphs (KGs) about real-world entities and their properties are an important asset in many AI applications. Web-scale KGs store almost exclusively positive statements and largely omit negative ones. Because open-world KGs are incomplete, an absent statement is treated as unknown rather than false. In this project, we made the following contributions:

1. We made the case for selective materialization of negative statements about entities in encyclopedic (well-canonicalized) open-world KGs, and formally defined three types of negative statements: grounded, universally absent, and conditional negative statements. We presented the peer-based negation inference method to compile lists of salient negatives about entities. The method computes relevant peers for a given input entity and uses their positive statements to set expectations for the input entity. An expectation that does not hold is an immediate candidate negative, which is then scored using frequency, importance, and unexpectedness metrics.

2. We proposed the pattern-based query log extraction method to extract salient negatives from rich textual sources. This method extracts salient negatives about an entity by harvesting large corpora, namely search engine query logs, using a few handcrafted patterns with negative keywords.

3. We introduced the UnCommonsense method to generate salient negative phrases about everyday concepts in less-canonicalized commonsense KGs. This method is designed to handle negation inference, scrutiny, and ranking of short natural language phrases. It computes comparable concepts for a given target concept, infers candidate negatives by comparing their positive statements, and scrutinizes these candidates against the KG itself as well as against Language Models (LMs) as an external source of knowledge. Finally, candidates are ranked using semantic-similarity-aware frequency measures.

4. To facilitate exploring our methods and their results, we implemented two prototype systems. Wikinegata showcases the peer-based method: users can explore negative statements about 500K entities of 11 classes and adjust different parameters of the peer-based inference method. They can also query the KG using triple patterns with negated predicates. In the UnCommonsense system, users can closely inspect what the method produces at every step, as well as browse negatives about 8K everyday concepts.

Moreover, using the peer-based negation inference method, we created the first large-scale dataset on demographics and outliers in communities of interest, and showed its usefulness in use cases such as identifying under-represented groups. We released all datasets and code produced in this project.
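
The peer-based negation inference described in the abstract can be illustrated with a minimal sketch. This is not the project's released code: the function names (select_peers, peer_based_negatives), the toy KG, the overlap-based peer selection, and the use of peer frequency as the only score are all assumptions made for illustration. The abstract also mentions importance and unexpectedness metrics, which are left out here.

```python
from collections import Counter

# Toy KG (illustrative data only): entity -> set of (predicate, object) statements.
KG = {
    "Stephen Hawking": {("occupation", "physicist"), ("award", "Copley Medal")},
    "Albert Einstein": {
        ("occupation", "physicist"),
        ("award", "Nobel Prize in Physics"),
        ("award", "Copley Medal"),
    },
    "Richard Feynman": {
        ("occupation", "physicist"),
        ("award", "Nobel Prize in Physics"),
    },
}


def select_peers(target, kg, n=2):
    """Hypothetical peer selection: entities sharing the most statements with the target."""
    target_facts = kg[target]
    overlap = {
        entity: len(facts & target_facts)
        for entity, facts in kg.items()
        if entity != target
    }
    # Sort by descending overlap, then by name for a deterministic order.
    return sorted(overlap, key=lambda e: (-overlap[e], e))[:n]


def peer_based_negatives(target, kg, n_peers=2, top_k=3):
    """Return candidate negatives for the target, scored by relative peer frequency."""
    peers = select_peers(target, kg, n_peers)
    if not peers:
        return []
    # Positive statements of the peers set the expectations for the target.
    counts = Counter()
    for peer in peers:
        counts.update(kg[peer])
    # An expectation the target does not satisfy becomes a candidate negative.
    target_facts = kg[target]
    candidates = {
        stmt: count / len(peers)
        for stmt, count in counts.items()
        if stmt not in target_facts
    }
    ranked = sorted(candidates.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


if __name__ == "__main__":
    for statement, score in peer_based_negatives("Stephen Hawking", KG):
        print(statement, score)
```

On this toy KG, the only expectation that the target entity fails to meet is the statement (award, Nobel Prize in Physics), which both peers hold, so it is returned as the single candidate negative with a peer frequency of 1.0. Under the open-world assumption such a candidate is only a plausible negative, not a verified one.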

