Project Details
Structural generalization in transformer-based LLMs
Subject Area
Applied Linguistics, Computational Linguistics
Methods in Artificial Intelligence and Machine Learning
Term
since 2026
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 579333431
Today’s large language models (LLMs) are overwhelmingly based on the transformer architecture. In contrast to earlier neural language models, transformers have made it possible to scale the size of the training data up to trillions of tokens. This has enabled LLMs to perform extremely well both on real-world tasks and on many academic evaluations. There is an intuitive sense that LLMs have mastered the syntax of natural language, at least for well-digitized languages like English. At the same time, there is increasing evidence that transformers struggle with structural generalization: they do not reliably learn to make correct predictions on test instances that are structurally more complex than the training data. This problem plagues even heavily pretrained LLMs and is consistent with initial theoretical findings about the learning capabilities of transformers. For the special case of length generalization, the PIs have previously given a formal characterization of the class of tasks for which transformers correctly generalize from shorter training instances to longer test instances.
In this project, we will investigate the ability of transformers to perform structural generalization. We will distinguish the tasks on which transformers can learn from finite training data to make judgments on instances of arbitrary structural complexity from those on which they cannot, and we will attempt to explain _why_ transformers struggle with structural generalization on the latter. We will focus primarily on judgments about syntactic structure, specifically determining whether a sentence is grammatically correct (acceptability) and answering simple questions about it (comprehension). Structural generalization on these tasks is exactly the ability that a learner needs to obtain linguistic competence from finite observations, and our findings will therefore contribute immediately to our understanding of the potential of transformers as linguistic competence models.
At the same time, structural generalization is at the heart of many other complex tasks that are currently being pursued with LLMs. Our findings in the focused domain of grammaticality will therefore have a wider impact on understanding and improving the performance of transformers more broadly.
DFG Programme
Priority Programmes
International Connection
USA
Cooperation Partners
Professor William Merrill, Ph.D.; Dr. Yuekun Yao
