Project Details
Understanding and Overcoming Architectural Limitations in Neural Language Models
Applicant
Professor Dr. Michael Hahn
Subject Area
Image and Language Processing, Computer Graphics and Visualisation, Human Computer Interaction, Ubiquitous and Wearable Computing
Methods in Artificial Intelligence and Machine Learning
Term
since 2025
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 560456343
Large Language Models (LLMs) have demonstrated remarkable capabilities in understanding language, revolutionizing the field of Natural Language Processing (NLP). However, they still exhibit limitations in logical reasoning. Recent research suggests that these limitations stem from fundamental constraints of the underlying machine learning architecture, specifically the Transformer architecture. The goal of this project is to develop a deeper theoretical understanding of the capabilities and limitations of these architectures and, building on that understanding, to design, implement, and evaluate new architectures with enhanced logical reasoning capabilities.

In Work Packages (WPs) 1-2, we will develop a robust theoretical framework to rigorously analyze the limitations of logical reasoning in Transformer and related architectures. This includes formalizing the types of reasoning tasks these models can or cannot solve and identifying the architectural features responsible for these limitations. In WPs 3-5, we will build on the findings from WPs 1-2 to design, implement, and evaluate new neural architectures that overcome the identified limitations. The innovations will include:

1) WP 3: Adaptive position encodings that improve the handling of longer or more structured input data.
2) WP 4: New approaches to Chain-of-Thought (CoT) reasoning, grounded in the theoretical framework, that enable more robust multi-step logical reasoning.
3) WP 5: Architectures that dynamically adjust the number of layers used for computation based on task complexity, facilitating more flexible and efficient logical reasoning.

This project will make fundamental contributions to both the theoretical and practical aspects of LLM development. The outcomes will not only advance the state of the art in NLP but will also have broader implications for Artificial Intelligence more generally.
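As background for WP 3, the sketch below implements the standard fixed sinusoidal position encoding of the original Transformer, the baseline that adaptive position encodings would generalize. The function name and dimensions are illustrative only and not part of the project plan.

```python
import math

def sinusoidal_position_encoding(seq_len: int, d_model: int) -> list[list[float]]:
    """Fixed sinusoidal position encoding (illustrative baseline).

    PE[pos, 2i]   = sin(pos / 10000**(2i / d_model))
    PE[pos, 2i+1] = cos(pos / 10000**(2i / d_model))
    """
    pe = []
    for pos in range(seq_len):
        row = []
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            row.append(math.sin(angle))  # even dimension
            row.append(math.cos(angle))  # odd dimension
        pe.append(row[:d_model])  # trim if d_model is odd
    return pe
```

Because the encoding is a fixed function of absolute position, it extrapolates poorly beyond training lengths, which is one motivation for the adaptive encodings proposed in WP 3.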
DFG Programme
Emmy Noether Independent Junior Research Groups
Major Instrumentation
GPU-Server
