Project Details

Inducing syntactic structure

Term since 2024
Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 545523981
 
The starting point of this project is the observation that (i) across syntactic theories, treebank formats and languages, a large variety of syntactic structures have been proposed; and (ii) self-supervised contextual language models (LMs) have been shown to capture syntactic information to a certain extent, though it is unclear how these models generalize. In this project, we want to remain neutral with respect to the underlying theory and to induce syntactic constituency structure from LMs in an unsupervised way. We will experiment with different types of neural network architectures that make different assumptions about the overall hierarchical structures we extract. Our central research questions are:

Q1 How can we automatically learn syntactic structure from processing raw text?
Q2 How do the emerging structures relate to established constituency from linguistic theory?
Q3 How useful are the emerging structures for NLP applications?

To address Q1, we will induce syntactic structure in an unsupervised way from raw text. We will focus on grouping tokens into phrases and on the categories of these phrases, i.e., our principal focus is on constituency structure. However, we will also look into identifying the syntactic heads of constituents, which will also allow us to induce a dependency structure. We will perform syntax induction on a range of different languages. Concerning Q2, we will compare our results to a range of existing syntactic theories and annotation schemes. In this way we hope, on the one hand, to find empirical evidence for certain assumptions made in syntactic theory and, on the other hand, to identify a constituency format that emerges from text data and might therefore be a good candidate for syntactic parsing and annotation. Q3 aims at assessing the latter.
Ideally, a syntactic annotation format should be such that it contains enough syntactic detail to provide valuable information for downstream tasks while being sufficiently general and learnable to allow for high quality annotation and parsing. In order to evaluate the usefulness of the emerging syntactic structures in NLP contexts, we will integrate the results from different induction approaches into supervised parsing architectures and into several downstream tasks.
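To illustrate what inducing constituency structure from raw text can look like (this is one known technique from the unsupervised-parsing literature, not the project's committed method), the sketch below builds a binary constituency tree from per-gap "syntactic distances" between adjacent tokens, splitting recursively at the largest gap. In practice such distances would be predicted from LM representations; here they are hand-set for illustration.

```python
def distance_parse(tokens, dists):
    """Build a binary constituency tree by recursively splitting
    the token span at the largest syntactic distance between
    adjacent tokens. dists[k] is the gap between tokens k and k+1."""
    if len(tokens) == 1:
        return tokens[0]
    # Index of the largest gap; constituents form on either side of it.
    i = max(range(len(dists)), key=lambda k: dists[k])
    left = distance_parse(tokens[:i + 1], dists[:i])
    right = distance_parse(tokens[i + 1:], dists[i + 1:])
    return (left, right)

# Toy example with hand-set distances (an assumption for illustration):
# the largest gap after "cat" yields an (NP, VP)-like top split.
tokens = ["the", "cat", "sat", "on", "the", "mat"]
dists = [1, 3, 2, 1, 1]
tree = distance_parse(tokens, dists)
print(tree)
# → (('the', 'cat'), ('sat', ('on', ('the', 'mat'))))
```

The recursion makes the dependence on the distance signal explicit: the quality of the induced tree rests entirely on how well the distances reflect phrase boundaries, which is exactly what the learned models must supply.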
DFG Programme Research Grants
International Connection Canada
Cooperation Partner Professor Dr. Hassan Sajjad
 
 
