Project Details

Seq2Vec-LM: Transformer-based Sequence-to-Vector Language Models

Subject Area Methods in Artificial Intelligence and Machine Learning; Image and Language Processing, Computer Graphics and Visualisation, Human Computer Interaction, Ubiquitous and Wearable Computing
Term since 2026
Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 574835023
 
In the Seq2Vec-LM project, we will work on novel transformer-based language models that output only a single representation vector for an input text. In contrast, common transformer-based language models like GPT or BERT work as sequence-to-sequence models that output many representation vectors: one for each input token. Such sequence-to-sequence models are well-suited for tasks where a sequence is expected as output, like translation, question answering, or text summarization. However, for tasks like text classification or semantic search, where often only a single representation vector is required to determine the output, the sequence-to-sequence architecture is not optimal for at least two reasons. First, when a sequence-to-sequence model is applied in such a "single-vector scenario", many representation vectors are computed but only one is finally used (e.g., that of the special [CLS] token in the case of BERT), a rather inefficient approach that potentially wastes resources. Second, sequence-to-sequence models are usually pre-trained to predict individual tokens, so their token representation vectors may not be well suited for deriving a single representation vector for an entire text, possibly resulting in suboptimal effectiveness.

To address these potential efficiency and effectiveness issues of sequence-to-sequence models in single-vector scenarios, we will work on novel sequence-to-vector models that compute only a single representation vector for a whole input text. In preliminary work, we have already shown that a sequence-to-vector model can be as effective as established sequence-to-sequence models for semantic search while being substantially more efficient. In the project, we now aim to further increase the efficiency and effectiveness of sequence-to-vector models by analyzing different architectural configurations, tokenization variants, and (pre-)training strategies. Beyond that, we will also investigate the applicability of sequence-to-vector models to generative tasks like text summarization or question answering and to extractive tasks like named entity recognition or text classification.
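To make the first issue concrete, the following Python sketch shows the single-vector scenario with a BERT-style encoder, using the Hugging Face Transformers library. The snippet and the model name "bert-base-uncased" are illustrative assumptions, not artifacts of the project: a representation vector is computed for every input token, but only the [CLS] vector is kept.

import torch
from transformers import AutoModel, AutoTokenizer

# Illustrative only: "bert-base-uncased" stands in for any BERT-style
# sequence-to-sequence encoder; it is not a model from the project.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("A short example text.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One representation vector per input token is computed ...
token_vectors = outputs.last_hidden_state   # shape: (1, seq_len, 768)

# ... but a single-vector task keeps only the [CLS] token's vector.
cls_vector = token_vectors[:, 0]            # shape: (1, 768)

A sequence-to-vector model as proposed in the project would instead return the single text representation directly, without materializing one vector per token.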
DFG Programme Research Grants
