Automatic Fact Checking for Biomedical Information in Social Media and Scientific Literature
Final Report Abstract
Research on automatic fact verification, i.e., computational models that distinguish correct information from misinformation or disinformation, has largely focused on the news domain and on social media posts. Such models assess the truthfulness of a text either by analyzing linguistic features that suggest an intention to deceive or by comparing it with other sources that make statements on the same content. Most studies address politically relevant topics.
The biomedical domain is of similar social relevance. In social media, various actors and medical laypersons share reports on treatments and their successes and failures, such as the (disproven) treatment of viral infections with deworming agents or disinfectants. There are also reports of (disproven) links between treatments and adverse effects, such as the claim that vaccination causes autism. Unlike other areas relevant for automated fact checking, however, the biomedical domain benefits from a large body of reliable scientific articles.
The aim of the FIBISS project was therefore to develop and evaluate methods that extract biomedical claims from social media and compare them with reliable sources. One challenge is that social media posts typically do not use technical language, so the vocabularies of social media and of the scientific literature have to be bridged. FIBISS therefore set out to develop generalizing information extraction methods. In the course of the project, large language models became prominent as a further methodological approach. The project was adapted accordingly to optimize general representations of claims so that they are suitable for comparison in automatic fact-checking procedures.
As a result, we contribute text corpora for developing and evaluating automated biomedical fact-checking systems. We propose methods that automatically reformulate claims so that they can be verified automatically. Furthermore, we present approaches that automatically assess the credibility of claims, even independently of existing evidence.
Link to the final report
https://doi.org/10.34657/17863
Publications
-
Claim Detection in Biomedical Twitter Posts. Proceedings of the 20th Workshop on Biomedical Language Processing, 131-142. Association for Computational Linguistics.
Wührl, Amelie & Klinger, Roman
-
CoVERT: A Corpus of Fact-checked Biomedical COVID-19 Tweets. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 244–257, Marseille, France. European Language Resources Association
Isabelle Mohr, Amelie Wührl & Roman Klinger
-
Entity-based Claim Representation Improves Fact-Checking of Medical Content in Tweets. In Proceedings of the 9th Workshop on Argument Mining, pages 187–198, Online and in Gyeongju, Republic of Korea. International Conference on Computational Linguistics
Amelie Wührl & Roman Klinger
-
Recovering Patient Journeys: A Corpus of Biomedical Entities and Relations on Twitter (BEAR). In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 4439–4450, Marseille, France. European Language Resources Association
Amelie Wührl & Roman Klinger
-
An Entity-based Claim Extraction Pipeline for Real-world Biomedical Fact-checking. Proceedings of the Sixth Fact Extraction and VERification Workshop (FEVER), 29-37. Association for Computational Linguistics.
Wuehrl, Amelie; Grimminger, Lara & Klinger, Roman
-
UNIDECOR: A Unified Deception Corpus for Cross-Corpus Deception Detection. Proceedings of the 13th Workshop on Computational Approaches to Subjectivity, Sentiment, & Social Media Analysis, 39-51. Association for Computational Linguistics.
Velutharambath, Aswathy & Klinger, Roman
-
Can Factual Statements Be Deceptive? The DeFaBel Corpus of Belief-based Deception. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 2708–2723, Torino, Italia. ELRA and ICCL
Aswathy Velutharambath, Amelie Wührl & Roman Klinger
-
How Entangled is Factuality and Deception in German? Findings of the Association for Computational Linguistics: EMNLP 2024, 9538-9554. Association for Computational Linguistics.
Velutharambath, Aswathy; Wuehrl, Amelie & Klinger, Roman
-
Understanding Fine-grained Distortions in Reports of Scientific Findings. Findings of the Association for Computational Linguistics: ACL 2024, 6175-6191. Association for Computational Linguistics.
Wuehrl, Amelie; Wright, Dustin; Klinger, Roman & Augenstein, Isabelle
-
What Makes Medical Claims (Un)Verifiable? Analyzing Entity and Relation Properties for Fact Verification. Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), 2046-2058. Association for Computational Linguistics.
Wuehrl, Amelie; Menchaca Resendiz, Yarik; Grimminger, Lara & Klinger, Roman
