Project Details

Moral Hallucinations in Large Language Models — Their Argumentative Structure and Ethical Implications

Subject Area Practical Philosophy; Applied Linguistics, Computational Linguistics
Term since 2026
Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 579280134
 
We start from three observations. First, many people use chatbots based on Large Language Models (LLMs) for controversial and ethically relevant questions that go far beyond the private sphere of individual decision-making, e.g., ‘Is it wrong to lie in order to protect someone’s feelings?’ (Wester et al. 2025). Second, recent research shows that participants rate an LLM’s moral advice as superior to the advice of other people (Aharoni et al. 2024) and even to that of expert ethicists (Dillion et al. 2025). Third, LLMs have been shown to contain moral biases (Takemoto et al. 2024; Xu et al. 2025; among many others) and to exhibit different moral codes than humans (Marraffini et al. 2024; Garcia et al. 2024; Bonagiri et al. 2024). While current research focuses on a simplistic analysis of LLM responses to moral questions, e.g., yes/no responses or moral vs. immoral judgements (Jha et al. 2024; Ji et al. 2024; among others), one crucial property of LLM-generated reasoning is overlooked: moral hallucinations. These hallucinations are not reducible to conventional AI hallucinations, since they are not only about factual inaccuracies or lack of faithfulness to sources. Instead, they involve distortions within patterns of moral reasoning and thus constitute a qualitatively different issue. The consequences are potentially far-reaching: when LLMs distort moral concepts, they might undermine not only the content of advice but also the structural foundations of individual and collective moral judgment. In this project, we therefore bring together methods from philosophy, in particular Applied Ethics of AI, and computational linguistics, especially argument mining, to conceptualize, benchmark, ethically assess, and automatically identify LLM-generated moral hallucinations.
We structure this research around the following three research questions: (RQ1) What are the constitutive elements of LLM-generated ‘moral hallucinations’, and what are the ethical consequences if the moral claims that LLMs produce are not just biased but hallucinatory? (RQ2) What are the core argumentative structures and reasoning patterns of moral hallucinations, and how do they compare to moral reasoning in genuine philosophical theory? (RQ3) What are the wider ethical implications when an LLM-based system is used for moral advice-seeking, and how can we set up a computational model so that it can automatically flag moral hallucinations and components thereof? In answering these questions, we set up a benchmark of moral hallucinations that contains a fine-grained reasoning and argumentation analysis, identify the ethical implications of moral hallucinations, and develop a computational model to identify these hallucinations in unseen data.
DFG Programme Priority Programmes
 
 

