Project Details

Structural generalization in transformer-based LLMs

Subject Area: Applied Linguistics, Computational Linguistics; Methods in Artificial Intelligence and Machine Learning
Term: since 2026
Project identifier: Deutsche Forschungsgemeinschaft (DFG) - Project number 579333431
 
Today’s large language models (LLMs) are overwhelmingly based on the transformer architecture. In contrast to earlier neural language models, transformers have made it possible to scale training data to trillions of tokens. This has enabled LLMs to perform extremely well both on real-world tasks and on many academic evaluations. There is an intuitive sense that LLMs have mastered the syntax of natural language, at least for well-digitized languages like English. At the same time, there is increasing evidence that transformers struggle with structural generalization: they do not reliably learn to make correct predictions on test instances that are structurally more complex than the training data. This problem affects even heavily pretrained LLMs and is consistent with initial theoretical findings about the learning capabilities of transformers. For the special case of length generalization, the PIs have previously given a formal characterization of the class of tasks for which transformers trained on shorter instances correctly generalize to longer test instances. In this project, we will investigate the ability of transformers to perform structural generalization. We will distinguish the tasks on which transformers can learn from finite training data to make judgments on instances of arbitrary structural complexity from those on which they cannot, and we will attempt to explain _why_ transformers struggle with structural generalization on the latter. We will focus primarily on judgments about syntactic structure, specifically on determining whether a sentence is grammatically correct (acceptability) and on answering simple questions about it (comprehension). Structural generalization on these tasks is exactly the ability a learner needs to acquire linguistic competence from finite observations. Our findings will therefore contribute directly to our understanding of the potential of transformers as models of linguistic competence.
At the same time, structural generalization is at the heart of many other complex tasks currently being tackled with LLMs. Our findings in the focused domain of grammaticality will therefore have a wider impact on understanding and improving the performance of transformers more broadly.
DFG Programme: Priority Programmes
International Connection: USA