Project Details
Negative Knowledge at Web Scale
Applicant
Dr. Simon Razniewski
Subject Area
Data Management, Data-Intensive Systems, Computer Science Methods in Business Informatics
Image and Language Processing, Computer Graphics and Visualisation, Human Computer Interaction, Ubiquitous and Wearable Computing
Term
from 2021 to 2024
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 453095897
Structured knowledge is crucial in a range of applications such as question answering, dialogue systems, and recommender systems. The required knowledge is usually stored in knowledge bases (KBs), and recent years have seen rising interest in KB construction, querying, and maintenance. Some KBs focus on lexical information, others on geospatial knowledge, activities, or common sense. Most prominently, however, KBs capture encyclopedic knowledge; notable projects include Wikidata, DBpedia, and the Google Knowledge Graph. These KBs store positive statements such as “Saarbrücken is the capital of the Saarland” and are a key asset for many knowledge-intensive AI applications.

A major limitation of all these KBs is their inability to deal with negative information. At present, all major knowledge bases contain only positive information, whereas a statement such as “Tom Cruise did not win an Oscar” can only be deduced through inferences that require substantial assumptions. Because KBs generally contain only a subset of what is true, users often have to guess whether a statement missing from a KB is false or merely unknown to the KB. Being unable to formally distinguish false statements from unknown ones poses challenges in a variety of applications. In medicine, for instance, it is important to distinguish between knowing that a biochemical reaction between two substances is absent and having no knowledge about such a reaction at all. In corporate integrity, it is important to know whether a person was never employed by a certain competitor, while in anti-corruption investigations, the absence of family relations needs to be ascertained.
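To make the difference between “false” and “unknown” concrete, the following is a minimal sketch of a toy KB that stores positive and negative statements separately and answers queries under two readings. Under an open-world reading, a statement missing from the KB is unknown; under a closed-world reading, it is simply taken to be false. The class `ToyKB`, its methods, the predicate names, and the substances `substance_A` to `substance_C` are all hypothetical and chosen only for illustration; they do not reflect the data model of Wikidata, DBpedia, or any other existing KB.

```python
from enum import Enum


class Truth(Enum):
    TRUE = "true"
    FALSE = "false"
    UNKNOWN = "unknown"


# A statement is a (subject, predicate, object) triple.
Triple = tuple[str, str, str]


class ToyKB:
    """Hypothetical KB that stores positive and negative statements separately."""

    def __init__(self) -> None:
        self.positive: set[Triple] = set()
        self.negative: set[Triple] = set()

    def add(self, triple: Triple, holds: bool = True) -> None:
        # Record the statement as known-true (holds=True) or known-false.
        (self.positive if holds else self.negative).add(triple)

    def query_open_world(self, triple: Triple) -> Truth:
        # Open-world reading: absence from the KB means "unknown", not "false".
        if triple in self.positive:
            return Truth.TRUE
        if triple in self.negative:
            return Truth.FALSE
        return Truth.UNKNOWN

    def query_closed_world(self, triple: Triple) -> Truth:
        # Closed-world reading: anything not stored as true is taken to be false.
        return Truth.TRUE if triple in self.positive else Truth.FALSE


kb = ToyKB()
kb.add(("Saarbrücken", "capitalOf", "Saarland"))
kb.add(("Tom Cruise", "won", "Oscar"), holds=False)
kb.add(("substance_A", "reactsWith", "substance_B"), holds=False)

unseen = ("substance_A", "reactsWith", "substance_C")
print(kb.query_open_world(("Tom Cruise", "won", "Oscar")))  # Truth.FALSE
print(kb.query_open_world(unseen))    # Truth.UNKNOWN
print(kb.query_closed_world(unseen))  # Truth.FALSE
```

Only the explicitly stored negative statement lets the open-world query return a definite “false”; for the unseen reaction, the closed-world reading silently turns missing knowledge into a negative claim, which is exactly the conflation the medical example warns against.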
In the domain of (fake) news, there is an important distinction between rumors whose truth is unknown (such as “Malaysia Airlines Flight 370 was hijacked”) and claims established to be false (“Obama was born in Kenya”).

While negative information has received great attention in logic and database theory, it is still absent from current web-scale knowledge bases. For instance, Wikidata, DBpedia, and YAGO contain only positive information and at best allow limited inferences about negation via schema constraints. Similarly, text extraction and statistical inference have so far tackled only positive information. In this project, we aim to overcome the current restriction of knowledge bases to positive information through research encompassing three components: (i) statistical inference techniques for generating negative information, (ii) web-validation and joint consolidation techniques for resolving contradictions and inconsistencies, and (iii) ranking techniques that make it possible to retrieve the negative information relevant to specific use cases.
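As a rough illustration of how components (i) and (iii) might interact, the sketch below uses one simple heuristic that is assumed here and not taken from the project description: compare an entity with a set of similar “peer” entities, treat facts that many peers have but the entity lacks as candidate negative statements, and rank those candidates by the fraction of peers that have them. The function `candidate_negatives`, the `min_support` threshold, the dictionary layout of `kb`, and all entity names are hypothetical; how peers are selected is left open and simply assumed to be given.

```python
from collections import Counter

# A fact describing an entity, as a (predicate, object) pair.
Fact = tuple[str, str]


def candidate_negatives(
    entity: str,
    peers: list[str],
    kb: dict[str, set[Fact]],
    min_support: float = 0.5,
) -> list[tuple[Fact, float]]:
    """Rank facts held by many peers but absent for `entity` (hypothetical heuristic)."""
    if not peers:
        return []
    own = kb.get(entity, set())
    # Count how many peers hold each fact.
    counts: Counter[Fact] = Counter()
    for peer in peers:
        counts.update(kb.get(peer, set()))
    ranked = []
    for fact, n in counts.items():
        support = n / len(peers)
        if fact not in own and support >= min_support:
            ranked.append((fact, support))
    # Highest peer support first; ties broken by the fact itself for determinism.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


kb = {
    "actor_X": {("won", "Golden Globe")},
    "actor_1": {("won", "Oscar"), ("won", "Golden Globe")},
    "actor_2": {("won", "Oscar")},
    "actor_3": {("won", "Oscar"), ("memberOf", "Academy")},
    "actor_4": {("won", "BAFTA")},
}
peers = ["actor_1", "actor_2", "actor_3", "actor_4"]
print(candidate_negatives("actor_X", peers, kb))  # [(('won', 'Oscar'), 0.75)]
```

Because the KB is itself incomplete, the output is only a list of plausible negative statements: a missing fact for `actor_X` may still be true in reality, so such candidates would need checking, for example against web sources as in component (ii), before being stored as negative knowledge.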
DFG Programme
Research Grants