Project Details
The evaluation of empathy-related linguistic performance in large language models: Comparing surprisal values for next-word predictions in human EEG and LLMs.
Applicant
Professor Dr. Markus Werning
Subject Area
Methods in Artificial Intelligence and Machine Learning
Human Cognitive and Systems Neuroscience
Theoretical Philosophy
Term
since 2026
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 579381973
The capacity to empathize with another agent’s emotions is a key feature of human intelligence, whereas it is widely held that artificial intelligences (AIs), including even those based on the most advanced large language models (LLMs), are not capable of empathy. The project will investigate this difference with the aim of exploring whether the contrast between human and artificial intelligence may need to be softened. The project will focus on the relationship between the processing of emotion-related linguistic tasks and the capacity to empathize with another agent’s emotions. Given that, for humans, this relationship is quite close (see below), what does this imply for LLM-based AIs that show an increasing ability to succeed in emotion-related linguistic tasks? Does this speak in favor of AIs possessing some capacity for empathy? Regarding the linguistic tasks, we investigate probabilistic next-word predictions by measuring semantic surprisal for emotion words. In humans, we conduct two language-based electrophysiological experiments and focus on the N400 component of the event-related potential (ERP) as a measure of semantic surprisal. Regarding LLMs, we read out the probability distributions of next-word predictions using open-source tools (such as Hugging Face) with various LLMs (GPT-2, GPT-3, GPT-4, BERT, LLaMA). How well do the read-out probability distributions serve as predictors of semantic surprisal values in humans? What are the critical differences between LLMs that might improve their human-like predictive quality? To account for the capacity for empathy in human subjects, we use a battery of behavioral tests (MET, IRI, EmBody/EmFace, AQ). The project will be completed with a theoretical investigation of the structure of emotions, emotion-related language, and the role of empathy.
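As a rough illustration of the surprisal measure mentioned above, the sketch below uses the standard definition of surprisal as the negative log probability of a word given its context. It is not the project's pipeline: the function names, the three-word vocabulary, and the scores in `toy_logits` are hypothetical placeholders, and a real analysis would take the next-word scores from an LLM rather than from a hand-written dictionary.

```python
import math


def softmax(logits):
    """Turn raw next-word scores into a probability distribution."""
    peak = max(logits.values())  # subtract the maximum for numerical stability
    exps = {word: math.exp(score - peak) for word, score in logits.items()}
    total = sum(exps.values())
    return {word: value / total for word, value in exps.items()}


def surprisal_bits(distribution, word):
    """Surprisal of `word` in bits: -log2 P(word | context)."""
    return -math.log2(distribution[word])


# Hypothetical scores for candidate next words after a context such as
# "After winning the prize, she felt ..." (illustrative values only).
toy_logits = {"happy": 2.0, "sad": 1.0, "angry": 0.0}
dist = softmax(toy_logits)

for candidate in toy_logits:
    print(candidate, round(surprisal_bits(dist, candidate), 3))
```

Under this definition, a less expected emotion word receives a higher surprisal value, which is the quantity that would then be compared against N400 amplitudes.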
DFG Programme
Priority Programmes
