Evaluating, Explaining, and Enabling Ethical Multi-Agent Systems of Large Language Models (E4-MALM)

Applicants Professor Dr. Anne Lauscher; Dr. Jae Hee Lee

Subject Area Image and Language Processing, Computer Graphics and Visualisation, Human Computer Interaction, Ubiquitous and Wearable Computing

Term since 2026

Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 579327477

Project Description

Large language models (LLMs) have recently transformed the state of the art in natural language processing. They have been shown to be powerful in isolation, and even more so when deployed in so-called multi-agent setups, where individual LLM-based agents cooperate. However, recent work shows that such multi-agent interactions can lead to unpredictable and undesired emergent behaviors, potentially resulting in harmful outcomes (e.g., unsafe system decisions).

Our project addresses this problem by investigating the ethical reliability and safety of multi-agent systems built from large language models (MALMs). We will (i) develop a robust evaluation framework that measures and stress-tests the ethical behavior of MALMs at three levels (individual agents, their interactions, and overall system convergence), with a focus on socially salient failure modes such as “toxic agreement” in social-simulation scenarios; (ii) produce causal, mechanistic explanations that connect macro-level interaction patterns to micro-level features, neurons, and attention heads, yielding actionable “mechanism cards”; and (iii) design novel parameter-efficient alignment interventions (e.g., activation steering, LoRA/QLoRA, rank-one edits) that improve the safety of MALMs while preserving core capabilities. Our work contributes to the aims of LaSTing (SPP 2556) by delivering robust assessment methods, actionable insights into internal system mechanisms, and novel methods that enhance the safe applicability of MALMs.

DFG Programme Priority Programmes

Subproject of SPP 2556: Robust Assessment & Safe Applicability of Language Modeling: Foundations for a New Field of Language Science & Technology (LaSTing)
