Project Details

Understanding and Overcoming Architectural Limitations in Neural Language Models

Subject Area Image and Language Processing; Computer Graphics and Visualisation; Human Computer Interaction; Ubiquitous and Wearable Computing; Methods in Artificial Intelligence and Machine Learning
Term since 2025
Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 560456343
 
Large Language Models (LLMs) have demonstrated remarkable capabilities in understanding language, revolutionizing the field of Natural Language Processing (NLP). However, they still exhibit limitations in logical reasoning. Recent research suggests that these limitations stem from fundamental constraints in the underlying machine learning architecture - specifically, the Transformer architecture. The goal of this project is to develop a deeper theoretical understanding of the capabilities and limitations of these architectures and, based on this, to design, implement, and evaluate new architectures that enhance logical reasoning capabilities.

In Work Packages (WPs) 1-2, we will develop a robust theoretical framework to rigorously analyze the limitations of logical reasoning in Transformer and related architectures. This includes formalizing the types of reasoning tasks that these models can or cannot solve and identifying the architectural features responsible for these limitations.

In Work Packages 3-5, we will build on the findings from WPs 1-2 to design, implement, and evaluate new neural architectures that overcome the identified limitations. The innovations will include:

1) WP 3: Adaptive position encodings to improve the handling of longer or more structured input data.
2) WP 4: New approaches to Chain-of-Thought (CoT) reasoning, based on the theoretical framework, enabling more robust multi-step logical reasoning.
3) WP 5: Architectures that dynamically adjust the number of layers used for computation based on task complexity, facilitating more flexible and efficient logical reasoning.

This project will make fundamental contributions to both the theoretical and practical aspects of LLM development. The outcomes will not only advance the state of the art in NLP but will also have broader implications for Artificial Intelligence.
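The dynamic-depth idea described for WP 5 can be illustrated as an early-exit loop: layers are applied one at a time, and computation stops once an intermediate prediction is sufficiently confident. This is a minimal NumPy sketch under illustrative assumptions - the toy layers, the entropy-based exit criterion, and all names (`dynamic_depth_forward`, `threshold`) are hypothetical and do not reflect the project's actual design.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def entropy(p):
    # Shannon entropy of a probability vector; low entropy = confident.
    return -np.sum(p * np.log(p + 1e-12))

def dynamic_depth_forward(h, layers, classifier, max_layers, threshold):
    """Apply layers sequentially, exiting early once the intermediate
    prediction's entropy drops below `threshold` (a toy proxy for
    adjusting depth to task complexity)."""
    for depth, layer in enumerate(layers[:max_layers], start=1):
        h = layer(h)
        probs = softmax(classifier @ h)
        if entropy(probs) < threshold:
            break  # confident enough: skip the remaining layers
    return probs, depth

# Toy setup: random tanh "layers" standing in for Transformer blocks.
rng = np.random.default_rng(0)
dim, n_classes, n_layers = 8, 4, 6
layers = [
    lambda h, W=rng.standard_normal((dim, dim)) / np.sqrt(dim): np.tanh(W @ h)
    for _ in range(n_layers)
]
classifier = rng.standard_normal((n_classes, dim))

probs, depth = dynamic_depth_forward(
    rng.standard_normal(dim), layers, classifier,
    max_layers=n_layers, threshold=1.0,
)
```

A lower `threshold` demands more confidence before exiting and so tends to use more layers; `threshold = 0` recovers the fixed-depth model.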
DFG Programme Emmy Noether Independent Junior Research Groups
Major Instrumentation GPU-Server
