Moral Hallucinations in Large Language Models — Their Argumentative Structure and Ethical Implications

Applicants Professor Dr. Annette Hautli-Janisz; Professor Dr. Karoline Reinhardt

Subject Area Practical Philosophy
Applied Linguistics, Computational Linguistics

Term since 2026

Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 579280134

Project Description

We start from three observations. First, many people use chatbots based on Large Language Models (LLMs) for controversial and ethically relevant questions that go far beyond the private sphere of individual decision-making, e.g., ‘Is it wrong to lie in order to protect someone’s feelings?’ (Wester et al. 2025). Second, recent research shows that participants rate an LLM’s moral advice as superior to the advice of other people (Aharoni et al. 2024) and even to that of expert ethicists (Dillion et al. 2025). Third, LLMs have been shown to contain moral biases (Takemoto et al. 2024; Xu et al. 2025; among many others) and to exhibit different moral codes than humans (Marraffini et al. 2024; Garcia et al. 2024; Bonagiri et al. 2024). While current research focuses on a simplistic analysis of LLM responses to moral questions, e.g., yes/no responses or moral vs. immoral judgements (Jha et al. 2024; Ji et al. 2024; among others), one crucial property of LLM-generated reasoning is overlooked: moral hallucinations. These hallucinations are not reducible to conventional AI hallucinations, since they are not only about factual inaccuracies or lack of faithfulness to sources. Instead, they involve distortions within patterns of moral reasoning and thus constitute a qualitatively different issue. The consequences are potentially far-reaching: when LLMs distort moral concepts, they might undermine not only the content of advice but also the structural foundations of individual and collective moral judgment. In this project, we therefore bring together methods from philosophy, in particular Applied Ethics of AI, and computational linguistics, especially argument mining, to conceptualize, benchmark, ethically assess and automatically identify LLM-generated moral hallucinations.
We structure this research around three research questions: (RQ1) What are the constitutive elements of LLM-generated ‘moral hallucinations’, and what are the ethical consequences if the moral claims that LLMs produce are not just biased but hallucinatory? (RQ2) What are the core argumentative structures and reasoning patterns of moral hallucinations, and how do they compare to moral reasoning in genuine philosophical theory? (RQ3) What are the wider ethical implications when an LLM-based system is used for moral advice-seeking, and how can we set up a computational model so that it can automatically flag moral hallucinations and components thereof? In answering these questions, we set up a benchmark of moral hallucinations that contains a fine-grained reasoning and argumentation analysis, identify the ethical implications of moral hallucinations, and develop a computational model to identify these hallucinations in unseen data.

DFG Programme Priority Programmes

Subproject of SPP 2556: Robust Assessment & Safe Applicability of Language Modeling: Foundations for a New Field of Language Science & Technology (LaSTing)
